Update README.md
README.md
CHANGED
@@ -8,7 +8,7 @@ tags:
 - genomics
 - segmentation
 ---
-#
+# SegmentNT

 SegmentNT is a segmentation model leveraging the [Nucleotide Transformer](https://huggingface.co/InstaDeepAI/nucleotide-transformer-v2-500m-multi-species) (NT) DNA foundation model to predict the location of several types of genomics
 elements in a sequence at a single nucleotide resolution. It was trained on 14 different classes of human genomics elements in input sequences up to 30kb. These
@@ -92,7 +92,7 @@ print(f"Intron probabilities shape: {probabilities_intron.shape}")

 ## Training data

-The **
+The **SegmentNT** model was trained on all human chromosomes except for chromosomes 20 and 21, kept as test set, and chromosome 22, used as a validation set.
 During training, sequences are randomly sampled in the genome with associated annotations. However, we keep the sequences in the validation and test set fixed by
 using a sliding window of length 30,000 over the chromosomes 20 and 21. The validation set was used to monitor training and for early stopping.

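Note on the "Training data" hunk: the split it describes can be sketched in a few lines of Python. This is only an illustration, not the authors' data pipeline. The helper names (`assign_split`, `fixed_windows`), the `chrN` naming scheme, and the non-overlapping stride are assumptions; the README states only the window length (30,000) and which chromosomes are held out. It names chromosomes 20 and 21 for the sliding window, while this sketch would apply the same windowing to any held-out chromosome.

```python
from typing import Iterator, Optional, Tuple

# Held-out chromosomes as stated in the README; the "chrN" naming is an assumption.
TEST_CHROMOSOMES = {"chr20", "chr21"}
VALIDATION_CHROMOSOMES = {"chr22"}
WINDOW_LENGTH = 30_000  # sliding-window length given in the README


def assign_split(chromosome: str) -> str:
    """Return the split a chromosome belongs to (hypothetical helper)."""
    if chromosome in TEST_CHROMOSOMES:
        return "test"
    if chromosome in VALIDATION_CHROMOSOMES:
        return "validation"
    return "train"


def fixed_windows(
    chromosome_length: int,
    window_length: int = WINDOW_LENGTH,
    stride: Optional[int] = None,
) -> Iterator[Tuple[int, int]]:
    """Yield fixed (start, end) windows over a held-out chromosome.

    The stride is not given in the README; non-overlapping windows
    (stride equal to window_length) are assumed here.
    """
    stride = window_length if stride is None else stride
    for start in range(0, chromosome_length - window_length + 1, stride):
        yield start, start + window_length


if __name__ == "__main__":
    for name in ("chr1", "chr20", "chr22"):
        print(name, assign_split(name))
    # Toy chromosome length chosen for illustration only.
    print(list(fixed_windows(100_000)))
```

Because the windows depend only on the chromosome length and the stride, the evaluation sequences stay identical across runs, whereas the training sequences are sampled at random positions.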