nolanplatt committed
Commit 3974272 · verified · 1 Parent(s): d7f3a2b

Update README.md

Files changed (1)
  1. README.md +22 -164
README.md CHANGED
@@ -1,197 +1,55 @@
-
  ---
-
  license: apache-2.0
-
  language:
-
  - en
-
  tags:
-
  - maritime
-
  - AIS
-
  - vessel-tracking
-
  - navigation
-
  - fine-tuned
-
- - 131-context
-
  base_model: mistralai/Magistral-Small-2506
-
  datasets:
-
  - synthetic-maritime-ais-qa
-
  model-index:
-
- - name: nolanplatt/hvf-slm
-
  results: []
-
  ---

- # HVF-SLM: Maritime Domain-Specialized Language Model with 131k Context ⚓︎🛳️
-

- We present a small language model (SLM) with domain expertise in AIS/vessel data. We performed supervised fine-tuning (SFT) on [Magistral Small](https://huggingface.co/mistralai/Magistral-Small-2506) with a customized dataset built from publicly available AIS data in US coastal waters.

- Dataset creation and supervised fine-tuning (SFT) were performed by [Hitachi Vantara Federal](https://www.hitachivantarafederal.com/).
- Cleaning and enrichment of the data were accomplished by leveraging [Pentaho+ Data Integration](https://pentaho.com/).

  ## Model Details
-
-
-
  - **Base Model**: Magistral-Small-2506 (24B parameters)
-
  - **Context Length**: 131k tokens (via RoPE scaling factor 3.2)
-
- - **Training Dataset**: ~22,000 synthetic maritime Q&A pairs with full AIS tracking data (random vessel context for each pair pulled from ~3.4B U.S. Coast Guard AIS records), with varied linguistic styles, phrasings, and focus areas.
-
  - **Fine-tuning Method**: QLoRA (4-bit) rank 128
 
- - **Hardware**: NVIDIA H100 (80GB)
-
- - **Training Duration**: ~18 hours
-
-
- ## Intended Use
-
-
- This model excels at:
-
- - AIS trajectory prediction and analysis
-
- - Maritime anomaly detection
-
- - Vessel behavior classification
-
- - Navigation compliance (COLREGs)
-
- - Route optimization with AIS constraints
-
- - Maritime domain Q&A
-
-
- ## Technical Specifications
-
-
- - **Model Size**: 24B parameters (16-bit merged)
-
- - **Max Context**: 131,072 tokens
-
- - **RoPE Scaling**: Linear, factor 3.2
-
- - **Supported Tasks**: Text generation, maritime analysis
-
- - **Long Context Handling**: Successfully trained on sequences up to 131k tokens without truncation on a *single GPU* via gradient checkpointing.
-
- - **Mixed Precision**: BFloat16 training with 4-bit base model quantization
-
- - **Cosine Warm Restarts**: 6 restart cycles to escape loss plateaus

- ## Usage
-
-
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model = AutoModelForCausalLM.from_pretrained("nolanplatt/hvf-slm", torch_dtype="auto", device_map="auto")
- tokenizer = AutoTokenizer.from_pretrained("nolanplatt/hvf-slm")
-
- # Example: Analyze AIS data
- prompt = "Analyze the following AIS data and predict the vessel's next position..."  # inject AIS data after the prompt, formatted as JSON
- inputs = tokenizer(prompt, return_tensors="pt", max_length=131072, truncation=True).to(model.device)
- outputs = model.generate(**inputs, max_new_tokens=2000)
- response = tokenizer.decode(outputs[0], skip_special_tokens=True)
- ```
-
-
- ## Training Configuration
-
- Through our extensive research, these hyperparameters enable 131k context on a single H100. Increasing the learning rate and using cosine warm restarts help escape loss plateaus.
-
- ```json
- {
-   "max_seq_length": 131072,
-   "per_device_train_batch_size": 1,
-   "gradient_accumulation_steps": 8,
-   "learning_rate": 3e-5,
-   "warmup_steps": 300,
-   "lr_scheduler_type": "cosine_with_restarts",
-   "num_cycles": 6,
-   "optimizer": "paged_adamw_8bit",
-   "bf16": true,
-   "gradient_checkpointing": true
- }
- ```
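
As a rough sketch of how these hyperparameters map onto a QLoRA setup in `transformers`/`peft` (illustrative only, not the exact training script; the LoRA alpha value and output path are assumptions):

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit base-model quantization with BF16 compute (QLoRA-style), matching the config above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters at rank 128, as stated in Model Details; lora_alpha here is an assumed value.
lora_config = LoraConfig(r=128, lora_alpha=256, task_type="CAUSAL_LM")

# Optimizer and scheduler settings mirroring the JSON block above.
training_args = TrainingArguments(
    output_dir="hvf-slm-sft",  # placeholder path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=3e-5,
    warmup_steps=300,
    lr_scheduler_type="cosine_with_restarts",
    lr_scheduler_kwargs={"num_cycles": 6},
    optim="paged_adamw_8bit",
    bf16=True,
    gradient_checkpointing=True,
)
# max_seq_length=131072 would be passed to the SFT trainer (e.g. TRL's SFTTrainer), not TrainingArguments.
```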
-
-
-
- ## Performance
-
- We are still performing evaluations of HVF-SLM. We can say, preliminarily, that it successfully processes full AIS tracking sequences (90k+ tokens) and maintains domain expertise while preserving the general capabilities of the base Magistral model.
-
-

  ## Citation

- This model is open-source and free to use, provided you cite the authors and do not claim it as your own.
-
- A full citation will be available here upon publication.
-
- ```bibtex
- @misc{hvf-slm-2025,
-   title={HVF-SLM: Maritime Domain-Specialized Language Model with 131k Context},
-   author={Platt, Nolan and Nayak, Pragyansmita},
-   year={2025},
-   publisher={HuggingFace}
- }
- ```
-
 
 
  ---
  license: apache-2.0
  language:
  - en
  tags:
  - maritime
  - AIS
  - vessel-tracking
  - navigation
  - fine-tuned
+ - experimental
+ - research
  base_model: mistralai/Magistral-Small-2506
  datasets:
  - synthetic-maritime-ais-qa
  model-index:
+ - name: hvf-slm-v1-magistral
  results: []
  ---

+ # HVF-SLM v1 (Magistral): Experimental Baseline for Maritime Domain LLM Research

+ This is the first experimental iteration (v1) in the HVF-SLM series, serving as a baseline for maritime domain-specialized language models. While it demonstrated 131k context capability during training, significant limitations were discovered during evaluation.

+ ## Model Status

+ **This model is provided for research purposes only** as part of our iterative development process documented in [paper citation pending]. It has:
+ - Successful 131k token context window during training
+ - Poor coordinate extraction accuracy
+ - Tendency to generate incorrect vessel positions
+ - Limited understanding of maritime JSON structure

  ## Model Details

  - **Base Model**: Magistral-Small-2506 (24B parameters)
  - **Context Length**: 131k tokens (via RoPE scaling factor 3.2)
+ - **Training Dataset**: ~22,000 synthetic maritime Q&A pairs
  - **Fine-tuning Method**: QLoRA (4-bit) rank 128
+ - **Status**: Superseded by v2-llama and v3-qwen
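
For researchers who still want to inspect the checkpoint, a minimal loading sketch follows (assuming the `nolanplatt/hvf-slm` repository ID from the earlier revision of this README and a tokenizer that ships a chat template; adjust as needed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nolanplatt/hvf-slm"  # assumed repo ID; may differ for the v1-magistral checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Hypothetical maritime query; AIS context would be appended as JSON.
messages = [{"role": "user", "content": "Given the following AIS track, estimate the vessel's next position: ..."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```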

+ ## Why This Model Failed

+ Despite achieving low training loss (0.004), the model failed to generalize to real-world maritime queries:
+ 1. **Memorization over comprehension**: The model memorized training patterns rather than learning vessel relationships
+ 2. **JSON parsing failures**: Unable to reliably extract specific vessels from complex AIS data
+ 3. **Coordinate hallucination**: Generated plausible but incorrect lat/lon positions
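
To make the coordinate-hallucination point concrete, position error can be quantified as the great-circle distance between a predicted and a reported AIS position. The helper below is an illustrative sketch with made-up coordinates, not the evaluation protocol from the paper:

```python
import math

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical example: model-predicted vs. ground-truth next position for one vessel.
predicted = (36.95, -75.99)
actual = (36.85, -76.29)
print(f"Position error: {haversine_km(*predicted, *actual):.1f} km")
```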

+ This baseline informed critical improvements in subsequent versions:
+ - v2-llama: Better extraction but with hallucination issues
+ - v3-qwen: Architectural changes to address fundamental limitations

  ## Citation

+ This model is part of a larger, in-depth paper by HVF. A full citation will be available here upon publication.