HVF-SLM v3 (Qwen): Maritime Domain SLM
We present HVF-SLM, the first language model built specifically for maritime intelligence and data. All dataset creation and supervised fine-tuning (SFT) were conducted by Hitachi Vantara Federal. This is the third model HVF has produced for this domain; the same dataset was previously used for both Magistral (v1) and Llama (v2), as detailed below. This model, v3 (Qwen), is by far the strongest of the three and is extremely fast: at just 7B parameters, it competes directly with much larger, more expensive models.
Less is better for domain-specific, mission-critical applications.
v3, based on Qwen2.5-7B, is the third and final iteration in the HVF-SLM AIS research. It addresses the critical failures of v1-magistral and v2-llama, demonstrating accurate vessel extraction and maritime calculations without hallucination, even when provided with 100k+ tokens of structured AIS JSON data.
Model Performance
Validated Capabilities:
- Accurate vessel extraction: Successfully identifies and extracts specific vessels from 100k+ token JSON contexts
- No coordinate hallucination: Uses only actual vessel positions from provided data
- Correct maritime calculations: Applies correct physics, rhumb-line formulas, and nautical units as trained
- No repetition issues: Generates clean outputs without the endless repetition of v2
- 100k context window: Handles massive AIS datasets via YaRN scaling
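The rhumb-line calculations mentioned above follow the standard Mercator-projection formulation. A minimal sketch in Python (the function name and the nautical-mile Earth radius are our own illustration, not part of the model or its training code):

```python
import math

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles

def rhumb_distance_nm(lat1, lon1, lat2, lon2):
    """Rhumb-line (constant-bearing) distance between two points, in nautical miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Take the shorter way around the antimeridian.
    if abs(dlam) > math.pi:
        dlam = dlam - math.copysign(2 * math.pi, dlam)
    # Latitude difference as projected on a Mercator chart.
    dpsi = math.log(math.tan(math.pi / 4 + phi2 / 2) / math.tan(math.pi / 4 + phi1 / 2))
    # East-west course correction; fall back to cos(lat) on a constant-latitude track.
    q = dphi / dpsi if abs(dpsi) > 1e-12 else math.cos(phi1)
    return math.sqrt(dphi ** 2 + (q * dlam) ** 2) * EARTH_RADIUS_NM
```

One degree of longitude along the equator, or one degree of latitude along a meridian, both come out to roughly 60 nautical miles, which is a quick sanity check for any AIS distance output.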
Model Details
- Base Model: Qwen/Qwen2.5-7B-Instruct
- Context Length: 100k tokens (YaRN rope_scaling factor 4.0)
- Training Dataset: 21,497 synthetic maritime Q&A pairs (averaging 95k tokens per example)
- Fine-tuning Method: QLoRA rank 256, alpha 512, LoRA dropout 0.1
- Training Loss: 0.117 (training) / 0.084 (eval)
- Optimal Temperature: 0.7 (tested range: 0.1-0.9)
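The 100k window comes from YaRN rope scaling, which the Qwen2.5 documentation enables through a `rope_scaling` entry in the model's `config.json`. A sketch of that entry (the `original_max_position_embeddings` value of 32768 is Qwen2.5's stock context window, stated here as an assumption):

```python
import json

# Hypothetical sketch of the YaRN length-extension entry described in the
# Qwen2.5 documentation; factor 4.0 matches this model card.
rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,  # Qwen2.5's native window (assumed)
}
print(json.dumps({"rope_scaling": rope_scaling}, indent=2))
```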
Inference
We highly recommend the following settings for inference. In our case, we serve the model with vLLM.
payload = {
    "model": "hvf-slm-qwen",
    "prompt": full_prompt,
    "max_tokens": 2500,
    "temperature": 0.7,
    "top_p": 0.9,
    "stop": ["<|im_end|>", "<|im_start|>"],
}
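A minimal end-to-end request with these settings might look like the following. The endpoint URL and server setup are assumptions; vLLM exposes an OpenAI-compatible `/v1/completions` route when run as a server:

```python
import json
import urllib.request

def build_payload(full_prompt: str) -> dict:
    """Assemble the recommended sampling settings for hvf-slm-qwen."""
    return {
        "model": "hvf-slm-qwen",
        "prompt": full_prompt,
        "max_tokens": 2500,
        "temperature": 0.7,
        "top_p": 0.9,
        "stop": ["<|im_end|>", "<|im_start|>"],
    }

def complete(full_prompt: str, base_url: str = "http://localhost:8000") -> str:
    # POST to vLLM's OpenAI-compatible completions endpoint (URL is an assumption).
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(build_payload(full_prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```

The stop strings matter: Qwen2.5's ChatML delimiters (`<|im_end|>`, `<|im_start|>`) prevent the model from continuing past its own turn.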
Comparisons
After v1-magistral's complete failure and v2-llama's hallucination issues, v3-qwen succeeds due to:
- Architectural advantages: Qwen2.5's native long-context support and structured data capabilities (pre-trained on JSON and long contexts)
- Training methodology: Questions positioned before vessel data to prevent truncation
- System instruction inclusion: Maritime context provided during training
- Aggressive learning rate: A 2e-4 learning rate forced genuine learning rather than memorization
- Proper regularization: Dropout and weight decay prevented overfitting
- Cosine restarts: A cosine learning-rate schedule with hard restarts decayed the LR within each cycle, improving robustness and helping escape loss plateaus
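The cosine-restart schedule above can be sketched as a plain function of training progress. This is a generic hard-restart cosine decay, not HVF's exact trainer code; Hugging Face's `get_cosine_with_hard_restarts_schedule_with_warmup` implements the same shape:

```python
import math

def cosine_with_restarts(step: int, total_steps: int,
                         base_lr: float = 2e-4, num_cycles: int = 3) -> float:
    """Cosine decay with hard restarts: LR falls toward 0 within each cycle,
    then jumps back to base_lr at the start of the next cycle."""
    progress = step / total_steps
    if progress >= 1.0:
        return 0.0
    cycle_progress = (num_cycles * progress) % 1.0  # position within the current cycle
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * cycle_progress))
```

At step 0 the LR is the full 2e-4 from the model card; mid-cycle it has decayed halfway, and each restart resets it, which is what lets training climb out of a loss plateau.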
Validated Use Cases
- Maritime vessel tracking and identification
- AIS data analysis and extraction
- Vessel trajectory calculations
- Port congestion analysis
- Maritime safety assessments
Research Value
- Low training loss (0.0002 in v2) can indicate memorization rather than learning
- Proper context ordering (questions first) is critical for extreme sequence lengths
- System instructions must be included during training, not just inference
- Higher learning rates with instability can force genuine pattern learning
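The question-first ordering can be illustrated with a simple prompt builder. The function name and template below are our own sketch, not the exact HVF training format:

```python
import json

def build_prompt(question: str, ais_records: list[dict]) -> str:
    """Place the question BEFORE the bulky AIS JSON so it survives
    any truncation of the right-hand end of an extreme-length sequence."""
    data = json.dumps(ais_records)
    return (
        "You are a maritime intelligence assistant.\n\n"  # system-style maritime context
        f"Question: {question}\n\n"
        f"AIS data:\n{data}\n"
    )
```

With 95k-token examples, putting the question last risks it being clipped away; leading with it guarantees the model always sees what it is being asked.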
Citation
Part of the HVF-SLM research series documenting iterative improvements in maritime AI. Full citation available upon publication.