Update README.md
README.md CHANGED
@@ -1,197 +1,55 @@

---
license: apache-2.0
language:
- en
tags:
- maritime
- AIS
- vessel-tracking
- navigation
- fine-tuned
base_model: mistralai/Magistral-Small-2506
datasets:
- synthetic-maritime-ais-qa
model-index:
- name: nolanplatt/hvf-slm
  results: []
---

# HVF-SLM:

Cleaning and enrichment of the data was accomplished by leveraging [Pentaho+ Data Integration](https://pentaho.com/).

## Model Details

- **Base Model**: Magistral-Small-2506 (24B parameters)
- **Context Length**: 131k tokens (via RoPE scaling factor 3.2)
- **Training Dataset**: ~22,000 synthetic maritime Q&A pairs with full AIS tracking context; the vessel context for each pair was drawn at random from ~3.4B U.S. Coast Guard AIS records, with varied linguistic styles, phrasings, and focus areas
- **Fine-tuning Method**: QLoRA (4-bit), rank 128
- **Training Duration**: ~18 hours
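
For reference, here is a minimal sketch of how a QLoRA setup like the one described above could be configured with `transformers`, `peft`, and `bitsandbytes`. Only the 4-bit quantization, rank 128, and RoPE linear scaling factor 3.2 come from this card; the `lora_alpha` value and `target_modules` list are assumptions for illustration, and this is not the exact training script.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit base-model quantization (QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Linear RoPE scaling, factor 3.2, to reach the 131,072-token context described above
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Magistral-Small-2506",
    quantization_config=bnb_config,
    rope_scaling={"type": "linear", "factor": 3.2},
    device_map="auto",
)

lora_config = LoraConfig(
    r=128,                                                     # rank 128, as stated above
    lora_alpha=256,                                            # assumed value, not stated in this card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],   # assumed projection set
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```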
## Intended Use

This model excels at:

- AIS trajectory prediction and analysis
- Maritime anomaly detection
- Vessel behavior classification
- Navigation compliance (COLREGs)
- Route optimization with AIS constraints
- Maritime domain Q&A

## Technical Specifications

- **Model Size**: 24B parameters (16-bit merged)
- **Max Context**: 131,072 tokens
- **RoPE Scaling**: Linear, factor 3.2
- **Supported Tasks**: Text generation, maritime analysis
- **Long Context Handling**: Trained on sequences of up to 131k tokens without truncation on a *single GPU*, via gradient checkpointing
- **Mixed Precision**: BFloat16 training with 4-bit base-model quantization
- **Cosine Warm Restarts**: 6 restart cycles to escape loss plateaus

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "nolanplatt/hvf-slm",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("nolanplatt/hvf-slm")

# Example: analyze AIS data (append the AIS records to the prompt, formatted as JSON)
prompt = "Analyze the following AIS data and predict the vessel's next position..."
inputs = tokenizer(prompt, return_tensors="pt", max_length=131072, truncation=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2000)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```
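
The card notes that the AIS records are injected after the prompt as JSON. A hypothetical illustration of that prompt construction follows; the field names (`MMSI`, `BaseDateTime`, `LAT`, `LON`, `SOG`, `COG`) mirror common U.S. Coast Guard AIS columns and are assumptions here, not the card's exact schema.

```python
import json

# Two hypothetical AIS position reports for one vessel (illustrative values only)
ais_records = [
    {"MMSI": 367001234, "BaseDateTime": "2024-05-01T12:00:00", "LAT": 36.85, "LON": -75.98, "SOG": 12.3, "COG": 84.0},
    {"MMSI": 367001234, "BaseDateTime": "2024-05-01T12:06:00", "LAT": 36.86, "LON": -75.93, "SOG": 12.1, "COG": 86.5},
]

# Append the JSON-formatted track to the instruction, as described above
prompt = (
    "Analyze the following AIS data and predict the vessel's next position...\n"
    + json.dumps(ais_records, indent=2)
)
```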
## Training Configuration

Through our extensive research, we found that these hyperparameters enable 131k-token context fine-tuning on a single H100. Increasing the learning rate and using cosine warm restarts both help the run escape loss plateaus.

```json
{
  "max_seq_length": 131072,
  "per_device_train_batch_size": 1,
  "gradient_accumulation_steps": 8,
  "learning_rate": 3e-5,
  "warmup_steps": 300,
  "lr_scheduler_type": "cosine_with_restarts",
  "num_cycles": 6,
  "optimizer": "paged_adamw_8bit",
  "bf16": true,
  "gradient_checkpointing": true
}
```
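
As a rough sketch, most of the JSON above maps directly onto `transformers.TrainingArguments`; `max_seq_length` would instead be passed to the trainer (for example, TRL's `SFTConfig`). The `output_dir` below is a placeholder, and this is not the exact training script used for this model.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="hvf-slm-v1-checkpoints",       # placeholder path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=3e-5,
    warmup_steps=300,
    lr_scheduler_type="cosine_with_restarts",
    lr_scheduler_kwargs={"num_cycles": 6},     # forwarded to the scheduler (recent transformers versions)
    optim="paged_adamw_8bit",                  # requires bitsandbytes
    bf16=True,
    gradient_checkpointing=True,               # needed to fit 131k-token sequences on one GPU
)
```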
## Performance

We are still evaluating HVF-SLM. Preliminarily, we can say that it successfully processes full AIS tracking sequences (90k+ tokens) and maintains domain expertise while preserving the general capabilities of the base Magistral model.

## Citation

A full citation will be available here upon publication.

```bibtex
@misc{hvf-slm-2025,
  title={HVF-SLM: Maritime Domain-Specialized Language Model with 131k Context},
  author={Platt, Nolan and Nayak, Pragyansmita},
  year={2025},
  publisher={HuggingFace}
}
```

---
license: apache-2.0
language:
- en
tags:
- maritime
- AIS
- vessel-tracking
- navigation
- fine-tuned
- experimental
- research
base_model: mistralai/Magistral-Small-2506
datasets:
- synthetic-maritime-ais-qa
model-index:
- name: hvf-slm-v1-magistral
  results: []
---

# HVF-SLM v1 (Magistral): Experimental Baseline for Maritime Domain LLM Research

This is the first experimental iteration (v1) in the HVF-SLM series, serving as a baseline for maritime domain-specialized language models. While it demonstrated 131k-token context capability during training, significant limitations were discovered during evaluation.

## Model Status

**This model is provided for research purposes only** as part of our iterative development process documented in [paper citation pending]. It has:

- A successful 131k-token context window during training
- Poor coordinate extraction accuracy
- A tendency to generate incorrect vessel positions
- Limited understanding of maritime JSON structure

## Model Details

- **Base Model**: Magistral-Small-2506 (24B parameters)
- **Context Length**: 131k tokens (via RoPE scaling factor 3.2)
- **Training Dataset**: ~22,000 synthetic maritime Q&A pairs
- **Fine-tuning Method**: QLoRA (4-bit), rank 128
- **Status**: Superseded by v2-llama and v3-qwen

## Why This Model Failed

Despite achieving low training loss (0.004), the model failed to generalize to real-world maritime queries:

1. **Memorization over comprehension**: The model memorized training patterns rather than learning vessel relationships
2. **JSON parsing failures**: Unable to reliably extract specific vessels from complex AIS data
3. **Coordinate hallucination**: Generated plausible but incorrect lat/lon positions (one way to quantify such errors is sketched below)

This baseline informed critical improvements in subsequent versions:

- v2-llama: Better extraction but with hallucination issues
- v3-qwen: Architectural changes to address fundamental limitations
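
As a purely illustrative aside, one common way to quantify the coordinate errors mentioned above is the great-circle (haversine) distance between a predicted position and the ground-truth AIS position. This is a hedged sketch of such a check, not the evaluation protocol used in the forthcoming paper.

```python
import math

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two lat/lon points."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical example: model-predicted vs. true next position for one vessel
predicted = (36.91, -75.80)
actual = (36.88, -75.89)
error_km = haversine_km(*predicted, *actual)
print(f"Position error: {error_km:.1f} km")
```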
## Citation

Part of a larger, in-depth paper by HVF. Full citation available upon publication.