---
license: apache-2.0
base_model: Qwen/Qwen3-14B
tags:
- revit
- bim
- code-generation
- architecture
- engineering
- construction
- aec
- ifc
- fine-tuned
- qlora
- unsloth
datasets:
- custom
language:
- en
pipeline_tag: text-generation
model-index:
- name: revit-coder-14b
  results:
  - task:
      type: text-generation
      name: Revit API Code Generation
    metrics:
    - type: custom-composite
      value: 0.800
      name: Composite Score (40 questions)
---

# revit-coder-14b

![Hero Banner](images/01-hero-banner.png)

A fine-tuned Qwen3-14B specialized in **Revit API code generation**, **IFC reasoning**, and **BIM development patterns**.

**An experiment in domain-specific fine-tuning** demonstrating that focused training on 177,127 Revit/BIM examples can produce a specialized model for ~$48 in ~8 hours on a single GPU. The model was validated on a 40-question Revit C# benchmark, showing competitive performance with frontier models on domain-specific tasks.

**Key insight:** This demonstrates the value of domain-specific fine-tuning for specialized use cases rather than claiming superiority over general-purpose frontier models.

**GitHub:** [schauh11/revit-coder-14b](https://github.com/schauh11/revit-coder-14b) - benchmark suite, training scripts, and full results.
## Training

![Training Pipeline](images/02-training-pipeline.png)

### Training Configuration

| Spec | Value |
|------|-------|
| **Base Model** | Qwen/Qwen3-14B |
| **Method** | QLoRA (rank 64, alpha 128) |
| **Framework** | Unsloth + Hugging Face TRL |
| **Training Data** | 159,414 examples (90%) |
| **Validation Data** | 8,856 examples (5%) |
| **Test Data** | 8,857 examples (5%) |
| **Epochs** | 3 |
| **Sequence Length** | 4096 tokens |
| **Batch Size** | 16 effective (per-device 4 × 4 gradient accumulation) |
| **Learning Rate** | 2e-4 (cosine schedule) |
| **Warmup Steps** | 200 |
| **Optimizer** | AdamW 8-bit |
| **Weight Decay** | 0.01 |
| **GPU** | NVIDIA B200 192GB |
| **Training Time** | ~8 hours |
| **Training Cost** | ~$48 |
| **LoRA Dropout** | 0 (required for Unsloth's fast patching) |
| **Early Stopping Patience** | 3 epochs |
| **Random Seed** | 42 |
| **Packing** | Enabled |

**Why these hyperparameters?**

- **QLoRA rank 64:** Balances expressiveness with efficiency for domain-specific patterns
- **Packing enabled:** Maximizes GPU utilization on variable-length sequences
- **Cosine schedule + warmup:** Stable learning on technical documentation
- **Zero dropout:** Unsloth's fast patching requires zero dropout for optimized training
- **4096 context:** Covers typical Revit API code examples with surrounding context

### Dataset Splits

| Split | Examples | Percentage | Purpose |
|-------|----------|------------|---------|
| Train | 159,414 | 90% | Model training |
| Validation | 8,856 | 5% | Hyperparameter tuning & early stopping |
| Test | 8,857 | 5% | Final evaluation (not used in this benchmark) |

**Split strategy:** Stratified sampling by domain to maintain proportional representation across all 6 BIM domains. Random seed: 42.

**Note:** The 40-question benchmark is separate from the training data; it tests zero-shot generalization on Revit API questions not seen during training.
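The stratified split described above can be sketched as follows. This is a minimal illustration using only the standard library; the real pipeline's record schema and tooling are assumptions, and the `domain` field name is hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(records, seed=42, train_frac=0.90, val_frac=0.05):
    """Split records 90/5/5 per domain so every split keeps the domain mix."""
    by_domain = defaultdict(list)
    for rec in records:
        by_domain[rec["domain"]].append(rec)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, val, test = [], [], []
    for domain, items in by_domain.items():
        rng.shuffle(items)
        n_train = int(len(items) * train_frac)
        n_val = int(len(items) * val_frac)
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]  # remainder, roughly 5%
    return train, val, test

# Tiny demo with two equally sized domains
records = ([{"domain": "revit_csharp", "id": i} for i in range(100)]
           + [{"domain": "ifc_reasoning", "id": i} for i in range(100)])
train, val, test = stratified_split(records)
print(len(train), len(val), len(test))  # 180 10 10
```

Because the split is performed per domain before the lists are concatenated, each of the three splits preserves the overall domain proportions.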
### Training Data Distribution

![Data Distribution](images/03-data-distribution.png)

| Domain | Records | % | Description |
|--------|---------|---|-------------|
| revit_csharp | 143,060 | 72.7% | Revit API C# code from docs, examples, references |
| ifc_reasoning | 44,571 | 22.6% | IFC topology, spatial hierarchies, BIM reasoning |
| aps_schema | 4,980 | 2.5% | APS/Forge cloud API patterns |
| revit_patterns | 3,758 | 1.9% | Development patterns (IUpdater, events, filters) |
| revit_python | 285 | 0.1% | pyRevit Python automation |
| mcp_tools | 149 | 0.1% | MCP tool definitions for AI-BIM integration |

**Why this distribution?**

- **72.7% revit_csharp:** Reflects the primary use case: Revit plugin development is predominantly C#/.NET
- **22.6% ifc_reasoning:** BIM data exchange and interoperability are core to AEC workflows
- **Domain-tagged system prompts:** Each domain uses specialized prompts to activate the appropriate model behaviors

**Data format:** ChatML with domain-specific system prompts. Each record includes `<|im_start|>system`, `<|im_start|>user`, and `<|im_start|>assistant` sections.

**Sources:** Revit API Docs 2025/2026, Revit SDK code examples, IFC/BIM specifications, Autodesk forums, APS SDK documentation.

### Environmental Impact

| Metric | Value |
|--------|-------|
| Hardware | 1x NVIDIA B200 192GB |
| Training time | ~8 hours |
| Cloud cost | ~$48 (RunPod) |
| CO2 estimate | ~2.4 kg (based on US grid average) |

**Key takeaway:** Domain-specific fine-tuning achieved competitive performance with <3% of the compute required to train frontier models from scratch.

## Intended Use

**Primary:** Domain-specialized Revit API code generation. This is an experiment demonstrating that domain-specific fine-tuning can achieve competitive results with significantly less compute than training frontier models from scratch.
**Capabilities:**

- Generate correct Revit C# code (FilteredElementCollector, Transaction patterns, BuiltInParameter)
- Validate Revit API usage (catch missing Transactions, null checks, type filter issues)
- Reason about IFC spatial hierarchies and property sets
- Produce Revit development patterns (IExternalEventHandler, IUpdater, ISelectionFilter)

**Limitations:**

- Optimized for the Revit 2025/2026 (.NET 8) API; may not cover older API versions
- Strongest on the revit_csharp domain; weaker on IFC STEP format generation
- Best results under 800 tokens; quality may degrade on very long outputs
- Not a general-purpose coding model; use frontier models for non-Revit tasks
- The benchmark comparison is asymmetric (fine-tuned vs. zero-shot); Claude with proper system prompts may perform differently

## Benchmark Results

**40-question Revit C# benchmark** covering pure code generation focused on practical API usage:

| Model | Avg Score | Questions Scored Higher | Parameters | Inference |
|-------|-----------|-------------------------|------------|-----------|
| revit-coder-14b | 0.800 | 25 of 40 | 14B | Local (Ollama) |
| Claude Opus 4.6 | 0.793 | 15 of 40 | ~100B+ | API |

**Note:** This comparison pits a fine-tuned specialist against a zero-shot generalist, so the fine-tuned model naturally has an advantage on this specific benchmark. In production with proper prompting and examples, Claude may outperform on complex tasks.
### By Difficulty

| Difficulty | Count | revit-coder-14b | Claude Opus 4.6 | Notes |
|------------|-------|-----------------|-----------------|-------|
| Easy | 9 | 0.800 | 0.796 | Similar performance |
| Medium | 19 | 0.839 | 0.801 | Fine-tuned model shows strength on practical patterns |
| Hard | 12 | 0.736 | 0.779 | Claude shows strength on complex multi-class problems |

![Average Scores by Difficulty](images/04-scores-by-difficulty.png)

![Scoring Components Breakdown](images/05-scoring-components.png)

All 40 questions and both models' full responses are published in [BENCHMARK_FULL.md](https://github.com/schauh11/revit-coder-14b/blob/main/benchmark/BENCHMARK_FULL.md).

### Benchmark Methodology

**Data Independence:** The 40 benchmark questions were held out from the training data to ensure fair evaluation.

**Automated Scoring:** Each response is scored on three axes:

- **Signal Presence (40%):** Fraction of expected domain keywords found (e.g., `FilteredElementCollector`, `Transaction`, `IfcRelAggregates`)
- **Code Quality (30%):** Domain-specific structural checks (namespaces, class structure, API patterns)
- **Completeness (30%):** Response length, code block formatting, error-free output

**Composite = 0.4 × signal + 0.3 × quality + 0.3 × completeness**

**Important:** No reference answers or human evaluation were used. Scores reflect structural patterns, not compilation or execution; this is automated evaluation only.

**Asymmetric Comparison:** The fine-tuned model received domain training; Claude did not. This tests whether domain-specific fine-tuning provides value, not which model is "better."
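The composite formula above can be sketched in Python. This is a simplified illustration: the actual harness's quality and completeness checks are domain-specific and more involved than the keyword match shown here, and the sample response string is invented for the demo.

```python
def signal_presence(response: str, expected_keywords: list[str]) -> float:
    """Fraction of expected domain keywords that appear in the response."""
    if not expected_keywords:
        return 0.0
    hits = sum(1 for kw in expected_keywords if kw in response)
    return hits / len(expected_keywords)

def composite_score(signal: float, quality: float, completeness: float) -> float:
    """Composite = 0.4 * signal + 0.3 * quality + 0.3 * completeness."""
    return 0.4 * signal + 0.3 * quality + 0.3 * completeness

response = ("using (Transaction t = new Transaction(doc)) { ... } "
            "var walls = new FilteredElementCollector(doc);")
# 2 of 3 expected keywords are present ("OfClass" is missing), so signal = 2/3
signal = signal_presence(response, ["FilteredElementCollector", "Transaction", "OfClass"])
score = composite_score(signal, quality=1.0, completeness=1.0)
print(round(score, 3))  # 0.867
```

Because signal presence carries the largest weight (40%), a response that uses the expected API surface scores well even if the structural checks are only partially satisfied.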
## Usage

### Ollama (Recommended)

```bash
# Start an interactive session (assumes the model has been created locally,
# e.g. from a GGUF via a Modelfile)
ollama run revit-coder-14b-f16

# One-shot query
ollama run revit-coder-14b-f16 "Write C# code to collect all walls and group by type name"
```

### Python (transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "schauh11/revit-coder-14b"  # Hugging Face repo

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [
    {"role": "system", "content": "You are a Revit API expert specialized in C# and .NET 8."},
    {"role": "user", "content": "Write code to get all rooms and their areas."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True is needed for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.1)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

### Unsloth (inference with the LoRA adapter)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="path/to/revit-coder-14b-lora",
    max_seq_length=4096,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)
```

## Citation

```bibtex
@misc{revit-coder-14b-2026,
  title={revit-coder-14b: Domain-Specialized Code Generation for Revit API},
  author={Sanjay Chauhan},
  year={2026},
  url={https://huggingface.co/schauh11/revit-coder-14b}
}
```

## License

Apache 2.0, the same license as the base Qwen3-14B model.