---
license: apache-2.0
base_model: Qwen/Qwen3-14B
tags:
- revit
- bim
- code-generation
- architecture
- engineering
- construction
- aec
- ifc
- fine-tuned
- qlora
- unsloth
datasets:
- custom
language:
- en
pipeline_tag: text-generation
model-index:
- name: revit-coder-14b
  results:
  - task:
      type: text-generation
      name: Revit API Code Generation
    metrics:
    - type: custom-composite
      value: 0.800
      name: Composite Score (40 questions)
---

# revit-coder-14b

![Hero Banner](images/01-hero-banner.png)

A fine-tuned Qwen3-14B specialized in **Revit API code generation**, **IFC reasoning**, and **BIM development patterns**.

**An experiment in domain-specific fine-tuning** demonstrating that focused training on 177,127 Revit/BIM examples can produce a specialized model for ~$48 in ~8 hours on a single GPU. The model was validated on a 40-question Revit C# benchmark, showing competitive performance with frontier models on domain-specific tasks.

**Key insight:** This demonstrates the value of domain-specific fine-tuning for specialized use cases rather than claiming superiority over general-purpose frontier models.

**GitHub:** [schauh11/revit-coder-14b](https://github.com/schauh11/revit-coder-14b) - benchmark suite, training scripts, and full results.
## Training

![Training Pipeline](images/02-training-pipeline.png)

### Training Configuration

| Spec | Value |
|------|-------|
| **Base Model** | Qwen/Qwen3-14B |
| **Method** | QLoRA (rank 64, alpha 128) |
| **Framework** | Unsloth + Hugging Face TRL |
| **Training Data** | 159,414 examples (90%) |
| **Validation Data** | 8,856 examples (5%) |
| **Test Data** | 8,857 examples (5%) |
| **Epochs** | 3 |
| **Sequence Length** | 4096 tokens |
| **Batch Size** | 16 effective (per-device 4 × 4 gradient accumulation) |
| **Learning Rate** | 2e-4 (cosine schedule) |
| **Warmup Steps** | 200 |
| **Optimizer** | AdamW 8-bit |
| **Weight Decay** | 0.01 |
| **GPU** | NVIDIA B200 192GB |
| **Training Time** | ~8 hours |
| **Training Cost** | ~$48 |
| **LoRA Dropout** | 0 (required for Unsloth's fast patching) |
| **Early Stopping Patience** | 3 epochs |
| **Random Seed** | 42 |
| **Packing** | Enabled |

**Why these hyperparameters?**

- **QLoRA rank 64:** Balances expressiveness with efficiency for domain-specific patterns
- **Packing enabled:** Maximizes GPU utilization on variable-length sequences
- **Cosine schedule + warmup:** Stable learning on technical documentation
- **Zero dropout:** Unsloth's fast patching requires zero dropout for optimized training
- **4096 context:** Covers typical Revit API code examples with surrounding context

### Dataset Splits

| Split | Examples | Percentage | Purpose |
|-------|----------|------------|---------|
| Train | 159,414 | 90% | Model training |
| Validation | 8,856 | 5% | Hyperparameter tuning & early stopping |
| Test | 8,857 | 5% | Final evaluation (not used in this benchmark) |

**Split strategy:** Stratified sampling by domain to maintain proportional representation across all 6 BIM domains. Random seed: 42.

**Note:** The 40-question benchmark is separate from the training data; it tests zero-shot generalization on Revit API questions not seen during training.
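The stratified split described above can be sketched as follows. This is a minimal illustration using only the standard library; the real pipeline's record schema and tooling are assumptions, and the `domain` field name is hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(records, seed=42, train_frac=0.90, val_frac=0.05):
    """Split records 90/5/5 per domain so every split keeps the domain mix."""
    by_domain = defaultdict(list)
    for rec in records:
        by_domain[rec["domain"]].append(rec)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, val, test = [], [], []
    for domain, items in by_domain.items():
        rng.shuffle(items)
        n_train = int(len(items) * train_frac)
        n_val = int(len(items) * val_frac)
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]  # remainder, roughly 5%
    return train, val, test

# Tiny demo with two equally sized domains
records = ([{"domain": "revit_csharp", "id": i} for i in range(100)]
           + [{"domain": "ifc_reasoning", "id": i} for i in range(100)])
train, val, test = stratified_split(records)
print(len(train), len(val), len(test))  # 180 10 10
```

Because the split is performed per domain before the lists are concatenated, each of the three splits preserves the overall domain proportions.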
### Training Data Distribution

![Data Distribution](images/03-data-distribution.png)

| Domain | Records | % | Description |
|--------|---------|---|-------------|
| revit_csharp | 143,060 | 72.7% | Revit API C# code from docs, examples, references |
| ifc_reasoning | 44,571 | 22.6% | IFC topology, spatial hierarchies, BIM reasoning |
| aps_schema | 4,980 | 2.5% | APS/Forge cloud API patterns |
| revit_patterns | 3,758 | 1.9% | Development patterns (IUpdater, events, filters) |
| revit_python | 285 | 0.1% | pyRevit Python automation |
| mcp_tools | 149 | 0.1% | MCP tool definitions for AI-BIM integration |

**Why this distribution?**

- **72.7% revit_csharp:** Reflects the primary use case: Revit plugin development is predominantly C#/.NET
- **22.6% ifc_reasoning:** BIM data exchange and interoperability are core to AEC workflows
- **Domain-tagged system prompts:** Each domain uses specialized prompts to activate the appropriate model behaviors

**Data format:** ChatML with domain-specific system prompts. Each record includes `<|im_start|>system`, `<|im_start|>user`, and `<|im_start|>assistant` sections.

**Sources:** Revit API Docs 2025/2026, Revit SDK code examples, IFC/BIM specifications, Autodesk forums, APS SDK documentation.

### Environmental Impact

| Metric | Value |
|--------|-------|
| Hardware | 1x NVIDIA B200 192GB |
| Training time | ~8 hours |
| Cloud cost | ~$48 (RunPod) |
| CO2 estimate | ~2.4 kg (based on US grid average) |

**Key takeaway:** Domain-specific fine-tuning achieved competitive performance with <3% of the compute required to train frontier models from scratch.

## Intended Use

**Primary:** Domain-specialized Revit API code generation. This is an experiment demonstrating that domain-specific fine-tuning can achieve competitive results with significantly less compute than training frontier models from scratch.
**Capabilities:**

- Generate correct Revit C# code (FilteredElementCollector, Transaction patterns, BuiltInParameter)
- Validate Revit API usage (catch missing Transactions, null checks, type filter issues)
- Reason about IFC spatial hierarchies and property sets
- Produce Revit development patterns (IExternalEventHandler, IUpdater, ISelectionFilter)

**Limitations:**

- Optimized for the Revit 2025/2026 (.NET 8) API; may not cover older API versions
- Strongest on the revit_csharp domain; weaker on IFC STEP format generation
- Best results under 800 tokens; quality may degrade on very long outputs
- Not a general-purpose coding model; use frontier models for non-Revit tasks
- The benchmark comparison is asymmetric (fine-tuned vs. zero-shot); Claude with proper system prompts may perform differently

## Benchmark Results

**40-question Revit C# benchmark** covering pure code generation focused on practical API usage:

| Model | Avg Score | Questions Scored Higher | Parameters | Inference |
|-------|-----------|-------------------------|------------|-----------|
| revit-coder-14b | 0.800 | 25 of 40 | 14B | Local (Ollama) |
| Claude Opus 4.6 | 0.793 | 15 of 40 | ~100B+ | API |

**Note:** This comparison pits a fine-tuned specialist against a zero-shot generalist, so the fine-tuned model naturally has an advantage on this specific benchmark. In production with proper prompting and examples, Claude may outperform on complex tasks.
### By Difficulty

| Difficulty | Count | revit-coder-14b | Claude Opus 4.6 | Notes |
|------------|-------|-----------------|-----------------|-------|
| Easy | 9 | 0.800 | 0.796 | Similar performance |
| Medium | 19 | 0.839 | 0.801 | Fine-tuned model shows strength on practical patterns |
| Hard | 12 | 0.736 | 0.779 | Claude shows strength on complex multi-class problems |

![Average Scores by Difficulty](images/04-scores-by-difficulty.png)

![Scoring Components Breakdown](images/05-scoring-components.png)

All 40 questions and both models' full responses are published in [BENCHMARK_FULL.md](https://github.com/schauh11/revit-coder-14b/blob/main/benchmark/BENCHMARK_FULL.md).

### Benchmark Methodology

**Data Independence:** The 40 benchmark questions were held out from the training data to ensure fair evaluation.

**Automated Scoring:** Each response is scored on three axes:

- **Signal Presence (40%):** Fraction of expected domain keywords found (e.g., `FilteredElementCollector`, `Transaction`, `IfcRelAggregates`)
- **Code Quality (30%):** Domain-specific structural checks (namespaces, class structure, API patterns)
- **Completeness (30%):** Response length, code block formatting, error-free output

**Composite = 0.4 × signal + 0.3 × quality + 0.3 × completeness**

**Important:** No reference answers or human evaluation were used. Scores reflect structural patterns, not compilation or execution; this is automated evaluation only.

**Asymmetric Comparison:** The fine-tuned model received domain training; Claude did not. This tests whether domain-specific fine-tuning provides value, not which model is "better."
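The composite formula above can be sketched in Python. This is a simplified illustration: the actual harness's quality and completeness checks are domain-specific and more involved than the keyword match shown here, and the sample response string is invented for the demo.

```python
def signal_presence(response: str, expected_keywords: list[str]) -> float:
    """Fraction of expected domain keywords that appear in the response."""
    if not expected_keywords:
        return 0.0
    hits = sum(1 for kw in expected_keywords if kw in response)
    return hits / len(expected_keywords)

def composite_score(signal: float, quality: float, completeness: float) -> float:
    """Composite = 0.4 * signal + 0.3 * quality + 0.3 * completeness."""
    return 0.4 * signal + 0.3 * quality + 0.3 * completeness

response = ("using (Transaction t = new Transaction(doc)) { ... } "
            "var walls = new FilteredElementCollector(doc);")
# 2 of 3 expected keywords are present ("OfClass" is missing), so signal = 2/3
signal = signal_presence(response, ["FilteredElementCollector", "Transaction", "OfClass"])
score = composite_score(signal, quality=1.0, completeness=1.0)
print(round(score, 3))  # 0.867
```

Because signal presence carries the largest weight (40%), a response that uses the expected API surface scores well even if the structural checks are only partially satisfied.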
## Usage

### Ollama (Recommended)

```bash
# Start an interactive session (assumes the model has been created locally,
# e.g. from a GGUF via a Modelfile)
ollama run revit-coder-14b-f16

# One-shot query
ollama run revit-coder-14b-f16 "Write C# code to collect all walls and group by type name"
```

### Python (transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "schauh11/revit-coder-14b"  # Hugging Face repo

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [
    {"role": "system", "content": "You are a Revit API expert specialized in C# and .NET 8."},
    {"role": "user", "content": "Write code to get all rooms and their areas."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True is needed for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.1)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

### Unsloth (inference with the LoRA adapter)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="path/to/revit-coder-14b-lora",
    max_seq_length=4096,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)
```

## Citation

```bibtex
@misc{revit-coder-14b-2026,
  title={revit-coder-14b: Domain-Specialized Code Generation for Revit API},
  author={Sanjay Chauhan},
  year={2026},
  url={https://huggingface.co/schauh11/revit-coder-14b}
}
```

## License

Apache 2.0, the same license as the base Qwen3-14B model.