---
base_model: ethicalabs/xLSTM-7b-Instruct
library_name: peft
model_name: xlstm-7b-instruct-phase-2
tags:
- lora
- sft
- transformers
- trl
pipeline_tag: text-generation
datasets:
- teknium/OpenHermes-2.5
- meta-math/MetaMathQA
- trl-lib/ultrafeedback-gpt-3.5-turbo-helpfulness
license: mit
language:
- en
---

# Model Card for xlstm-7b-instruct-phase-2

This model is a fine-tuned version of [ethicalabs/xLSTM-7b-Instruct](https://huggingface.co/ethicalabs/xLSTM-7b-Instruct) for task alignment.

It was trained with [TRL](https://github.com/huggingface/trl) using supervised fine-tuning (SFT), with the loss computed on assistant tokens only.

The `k_proj` and `v_proj` matrices have been frozen to isolate and preserve the model's pre-trained knowledge base.

This fine-tuning focused only on the `q_proj` (query) and FFN matrices, adapting the model's reasoning and query-retrieval mechanisms without overwriting its core, frozen knowledge.

This experiment was designed to test the hypothesis that the model's reasoning capabilities (`q_proj`) could be specialized for math/code while its knowledge (`k_proj`, `v_proj`) remained intact.
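
For illustration, the targeting described above maps to a PEFT `LoraConfig` along the following lines. This is a sketch, not the recorded training configuration: the rank, the alpha, and the FFN module names are assumptions and may not match the actual xLSTM module names.

```python
# Sketch of the adapter targeting described above. Rank, alpha, and the
# FFN module names ("up_proj", "down_proj") are assumptions; check the
# adapter's adapter_config.json for the values actually used.
from peft import LoraConfig

peft_config = LoraConfig(
    r=16,           # assumed LoRA rank
    lora_alpha=32,  # assumed scaling factor
    # Only the query projection and the FFN matrices get LoRA adapters.
    # k_proj and v_proj are omitted from target_modules, so PEFT never
    # injects trainable weights into them and they stay frozen.
    target_modules=["q_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```

Omitting a module from `target_modules` is what keeps it frozen: PEFT only injects trainable low-rank matrices into the modules it is explicitly pointed at.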

## Quick start

Work in Progress!
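
Until a tested example is published here, the sketch below shows the standard PEFT loading pattern: load the frozen base model, then apply this adapter on top. The adapter repo id (`ethicalabs/xlstm-7b-instruct-phase-2`) and the generation settings are assumptions, and loading the weights requires a `transformers` build that supports the xLSTM architecture (plus `accelerate` for `device_map="auto"`).

```python
# Minimal usage sketch (assumptions: adapter repo id, xLSTM support in
# your transformers build, accelerate installed for device_map).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "ethicalabs/xLSTM-7b-Instruct"
adapter_id = "ethicalabs/xlstm-7b-instruct-phase-2"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # LoRA on top; base stays frozen

messages = [{"role": "user", "content": "What is 17 * 23?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```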

## Training procedure

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/ethicalabs-ai/xlstm-finetuning-ultrafeedback/runs/zxpd9xeh)

This model was trained with supervised fine-tuning (SFT); the full run is logged in the Weights & Biases project linked above.
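
As a rough reconstruction (not the original script), a minimal TRL setup matching this description might look like the sketch below. The dataset choice, split, and hyperparameters are assumptions, and `assistant_only_loss` needs a chat-formatted dataset and a recent TRL release.

```python
# Hypothetical reconstruction of the SFT setup; values are assumptions.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# One of the datasets listed in the metadata; its ShareGPT-style
# "conversations" field may need conversion to "messages" first.
dataset = load_dataset("teknium/OpenHermes-2.5", split="train")

args = SFTConfig(
    output_dir="xlstm-7b-instruct-phase-2",
    assistant_only_loss=True,  # compute the loss on assistant turns only
)

trainer = SFTTrainer(
    model="ethicalabs/xLSTM-7b-Instruct",
    args=args,
    train_dataset=dataset,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # see targeting sketch above
)
trainer.train()
```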

### Framework versions

- PEFT: 0.17.1
- TRL: 0.24.0
- Transformers: 4.57.1
- PyTorch: 2.8.0+cu126
- Datasets: 4.2.0
- Tokenizers: 0.22.1

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
	title        = {{TRL: Transformer Reinforcement Learning}},
	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
	year         = 2020,
	journal      = {GitHub repository},
	publisher    = {GitHub},
	howpublished = {\url{https://github.com/huggingface/trl}}
}
```