---
base_model: ethicalabs/xLSTM-7b-Instruct
library_name: peft
model_name: xlstm-7b-instruct-phase-2
tags:
- lora
- sft
- transformers
- trl
pipeline_tag: text-generation
datasets:
- teknium/OpenHermes-2.5
- meta-math/MetaMathQA
- trl-lib/ultrafeedback-gpt-3.5-turbo-helpfulness
license: mit
language:
- en
---
# Model Card for xlstm-7b-instruct-phase-2
This model is a fine-tuned version of [ethicalabs/xLSTM-7b-Instruct](https://huggingface.co/ethicalabs/xLSTM-7b-Instruct) for task alignment.
It was trained with [TRL](https://github.com/huggingface/trl) using supervised fine-tuning (SFT), with the loss computed on assistant tokens only.
The `k_proj` and `v_proj` matrices have been frozen to isolate and preserve the model's pre-trained knowledge base.
This fine-tuning focused only on the `q_proj` (query) and FFN matrices, adapting the model's reasoning and query-retrieval mechanisms without overwriting its core, frozen knowledge.
This experiment was designed to test the hypothesis that the model's reasoning capabilities (`q_proj`) could be specialized for math/code while its knowledge (`k_proj`, `v_proj`) remained intact.
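A minimal sketch of what such a selective LoRA configuration could look like with PEFT. Only `q_proj` appears in this card; the FFN projection names (`proj_up`, `proj_down`) and the rank/alpha values are hypothetical placeholders that must match the actual xLSTM block implementation and training run.

```python
# Sketch: adapt only the query and FFN projections with LoRA,
# leaving k_proj/v_proj (the "knowledge" matrices) untouched.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,           # assumed rank; not stated in this card
    lora_alpha=32,  # assumed scaling; not stated in this card
    target_modules=[
        "q_proj",                # query projection: adapted
        "proj_up", "proj_down",  # FFN matrices: adapted (names assumed)
    ],
    # k_proj and v_proj are simply omitted from target_modules,
    # so they receive no adapters and their weights stay frozen.
    task_type="CAUSAL_LM",
)
```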
## Quick start
Work in Progress!
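In the meantime, a minimal loading sketch, assuming the adapter is published under this repository id and that the base model loads through `transformers`' `AutoModelForCausalLM` with a chat template available:

```python
# Sketch: load the base xLSTM model and apply this LoRA adapter.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "ethicalabs/xLSTM-7b-Instruct"
adapter_id = "ethicalabs/xlstm-7b-instruct-phase-2"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{"role": "user", "content": "What is 17 * 23?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```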
## Training procedure
[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/ethicalabs-ai/xlstm-finetuning-ultrafeedback/runs/zxpd9xeh)
This model was trained with SFT.
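A hedged sketch of the training setup, assuming TRL's `SFTTrainer` with `assistant_only_loss` and the selective LoRA configuration above. The dataset id is taken from this card's metadata; hyperparameters, dataset mixing, and preprocessing are illustrative assumptions, not the exact recipe (the W&B run linked above holds the actual configuration).

```python
# Sketch: SFT with the loss masked to assistant tokens, via TRL.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Selective LoRA as in the sketch above; FFN module names are assumed.
peft_config = LoraConfig(
    r=16, lora_alpha=32, task_type="CAUSAL_LM",
    target_modules=["q_proj", "proj_up", "proj_down"],
)

# One of the datasets listed in this card; it may need conversion to a
# conversational (messages) format for assistant-only loss masking.
train_dataset = load_dataset(
    "trl-lib/ultrafeedback-gpt-3.5-turbo-helpfulness", split="train"
)

config = SFTConfig(
    output_dir="xlstm-7b-instruct-phase-2",
    assistant_only_loss=True,       # mask the loss on non-assistant tokens
    per_device_train_batch_size=1,  # illustrative hyperparameters;
    gradient_accumulation_steps=8,  # actual values are not stated here
    learning_rate=1e-4,
)

trainer = SFTTrainer(
    model="ethicalabs/xLSTM-7b-Instruct",  # base model id
    args=config,
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()
```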
### Framework versions
- PEFT: 0.17.1
- TRL: 0.24.0
- Transformers: 4.57.1
- Pytorch: 2.8.0+cu126
- Datasets: 4.2.0
- Tokenizers: 0.22.1
## Citations
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```