Instructions for using gxx27/BioTool-finetuned-Qwen3-4B with libraries and local apps.

Libraries

How to use gxx27/BioTool-finetuned-Qwen3-4B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="gxx27/BioTool-finetuned-Qwen3-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gxx27/BioTool-finetuned-Qwen3-4B")
model = AutoModelForCausalLM.from_pretrained("gxx27/BioTool-finetuned-Qwen3-4B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Render the chat template and tokenize in one step
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use gxx27/BioTool-finetuned-Qwen3-4B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "gxx27/BioTool-finetuned-Qwen3-4B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "gxx27/BioTool-finetuned-Qwen3-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/gxx27/BioTool-finetuned-Qwen3-4B

SGLang

How to use gxx27/BioTool-finetuned-Qwen3-4B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "gxx27/BioTool-finetuned-Qwen3-4B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "gxx27/BioTool-finetuned-Qwen3-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "gxx27/BioTool-finetuned-Qwen3-4B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "gxx27/BioTool-finetuned-Qwen3-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use gxx27/BioTool-finetuned-Qwen3-4B with Docker Model Runner:
```
docker model run hf.co/gxx27/BioTool-finetuned-Qwen3-4B
```

BioTool-finetuned-Qwen3-4B

This model is Qwen3-4B-Instruct-2507 with full fine-tuning on the BioTool training split (5,632 samples). It produces structured tool calls against 127 biomedical APIs spanning NCBI (E-utilities and BLAST), UniProt, and Ensembl REST.

Quick start

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gxx27/BioTool-finetuned-Qwen3-4B")
mdl = AutoModelForCausalLM.from_pretrained(
    "gxx27/BioTool-finetuned-Qwen3-4B",
    torch_dtype="auto",
    device_map="auto",
)

system = (
    "You are a biomedicine function-calling assistant. Always respond by calling "
    "exactly one function from the provided tools with a single tool call. Do not "
    "answer with natural language."
)
question = "What is the genomic location of the BRCA1 gene in humans?"

prompt = tok.apply_chat_template(
    [
        {"role": "system", "content": system},
        {"role": "user",   "content": question},
    ],
    tokenize=False,
    add_generation_prompt=True,
)
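# Tokenize once and reuse the result for both generation and prompt-length slicing
inputs = tok(prompt, return_tensors="pt").to(mdl.device)
out = mdl.generate(**inputs, max_new_tokens=256)
# Decode only the generated tokens, keeping special tokens in the text
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))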
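
The decoded text still has to be turned into a name and an argument dict before anything can be executed. The following is a minimal sketch of that step. It assumes the call is wrapped in `<tool_call>...</tool_call>` tags, as the stock Qwen3 chat template does, and falls back to bare JSON otherwise; the helper `parse_tool_call` and the sample string are hypothetical, so check them against real model output.

```python
import json
import re
from typing import Optional

# Assumption: the call is wrapped in <tool_call>...</tool_call> tags, as in the
# stock Qwen3 chat template. Verify against real model output before relying on it.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_call(text: str) -> Optional[dict]:
    """Return {"name": ..., "arguments": ...} from generated text, or None."""
    match = TOOL_CALL_RE.search(text)
    # Fall back to treating the whole string as bare JSON when no tags are found.
    payload = match.group(1) if match else text
    try:
        call = json.loads(payload.strip())
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict) or "name" not in call:
        return None
    call.setdefault("arguments", {})
    return call


# Made-up sample output, for illustration only.
sample = (
    '<tool_call>\n{"name": "lookup_by_symbol", '
    '"arguments": {"species": "human", "symbol": "BRCA1"}}\n</tool_call><|im_end|>'
)
print(parse_tool_call(sample))
```

Returning `None` instead of raising lets the caller decide whether to retry generation or report a malformed call.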

To execute the resulting tool call, use the Python wrappers in the BioTool repository:

import sys
sys.path.insert(0, "/path/to/BioTool")
from ensembl.lookup.api import lookup_by_symbol

print(lookup_by_symbol(species="human", symbol="BRCA1"))
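
With 127 tools, calling each wrapper by hand does not scale, so one option is to route a parsed `{"name": ..., "arguments": ...}` object through a lookup table. The sketch below shows the idea. `TOOL_REGISTRY`, `dispatch`, and `lookup_by_symbol_stub` are hypothetical names, and the stub stands in for the real wrapper so the example runs without the BioTool repository or network access.

```python
from typing import Any, Callable, Dict


def lookup_by_symbol_stub(species: str, symbol: str) -> Dict[str, str]:
    """Stand-in for the real wrapper: echoes its arguments, makes no API call."""
    return {"species": species, "symbol": symbol}


# Hypothetical registry mapping tool names to callables. In practice each entry
# would be one of the imported BioTool wrappers.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {
    "lookup_by_symbol": lookup_by_symbol_stub,
}


def dispatch(call: Dict[str, Any]) -> Any:
    """Look up the named tool and invoke it with the supplied keyword arguments."""
    name = call.get("name")
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name!r}")
    arguments = call.get("arguments") or {}
    if not isinstance(arguments, dict):
        raise TypeError("arguments must be a JSON object")
    return TOOL_REGISTRY[name](**arguments)


result = dispatch(
    {"name": "lookup_by_symbol", "arguments": {"species": "human", "symbol": "BRCA1"}}
)
print(result)
```

An explicit registry also acts as an allowlist: a name the model produces that is not registered is rejected rather than resolved dynamically.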

Training data

Source: the BioTool training split (data/BioTool_train.json).
Format: ShareGPT-style conversations of the form system → user → function_call, where function_call.value is a JSON object {"name": <tool_name>, "arguments": <dict>}.
Coverage: 127 tools across 3 databases (NCBI / UniProt / Ensembl).
Size: 5,632 samples (an additional 1,408 samples form the held-out test set used to evaluate the model).
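
To make the record format above concrete, here is a hedged sketch of one conversation and a small validator for it. The keys `conversations`, `from`, and `value` and the exact role labels are assumptions based on the common ShareGPT layout, and the record itself is invented, so the real file may differ in detail.

```python
import json

# Hypothetical record; the keys "conversations", "from", and "value" and the role
# labels are assumptions and may not match data/BioTool_train.json exactly.
record = {
    "conversations": [
        {"from": "system", "value": "You are a biomedicine function-calling assistant."},
        {"from": "user", "value": "What is the genomic location of the BRCA1 gene in humans?"},
        {
            "from": "function_call",
            "value": json.dumps(
                {"name": "lookup_by_symbol",
                 "arguments": {"species": "human", "symbol": "BRCA1"}}
            ),
        },
    ]
}

EXPECTED_ROLES = ["system", "user", "function_call"]


def validate_record(rec: dict) -> dict:
    """Check the turn order and return the decoded function call."""
    turns = rec.get("conversations", [])
    roles = [turn.get("from") for turn in turns]
    if roles != EXPECTED_ROLES:
        raise ValueError(f"unexpected turn order: {roles}")
    try:
        call = json.loads(turns[-1].get("value"))
    except (TypeError, ValueError) as exc:
        raise ValueError("function_call.value is not valid JSON") from exc
    if not (
        isinstance(call, dict)
        and isinstance(call.get("name"), str)
        and isinstance(call.get("arguments"), dict)
    ):
        raise ValueError("function call must have a string name and a dict of arguments")
    return call


print(validate_record(record))
```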

Training setup

Base model: Qwen/Qwen3-4B-Instruct-2507
Tuning method: full fine-tuning (no LoRA)
Framework: LLaMA-Factory
Template: qwen3_nothink
Cutoff length: 2,048
Optimizer: AdamW (fused), lr=2e-5, cosine schedule, warmup_ratio=0.1
Epochs: 3
Effective batch size: 16 (per-device 1 × grad-accum 16)
Precision: bf16
Hyperparameter file: qwen3_4b.yaml in the BioTool repo's llamafactory_cfgs/
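
The settings above imply a fairly short run. As a rough sketch, the arithmetic below derives the step counts and the learning-rate curve from them. It assumes a single device (which is what per-device 1 × grad-accum 16 = 16 suggests), that no samples are dropped, and a linear warmup followed by cosine decay to zero; the exact rounding and schedule in the actual trainer may differ, and `lr_at` is a hypothetical helper.

```python
import math

num_samples = 5632        # training split size
per_device_batch = 1
grad_accum = 16
num_devices = 1           # assumption: inferred from the effective batch size of 16
epochs = 3
warmup_ratio = 0.1
peak_lr = 2e-5

effective_batch = per_device_batch * grad_accum * num_devices   # 16
steps_per_epoch = math.ceil(num_samples / effective_batch)      # 352
total_steps = steps_per_epoch * epochs                          # 1056
warmup_steps = math.ceil(total_steps * warmup_ratio)            # 106


def lr_at(step: int) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero at total_steps."""
    step = min(max(step, 0), total_steps)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, lr_at(step))
```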

Evaluation

On the BioTool test split (1,408 samples), this model achieves the highest BioTool Score among the open-source models we evaluated, while also being the smallest (4B parameters). Per-database breakdowns and head-to-head numbers against GPT-5.1, GPT-5.1-Codex, Claude Sonnet 4.5 and Gemini 3 Pro are reported in the paper.

Citation

@misc{gao2026biotoolcomprehensivetoolcallingdataset,
      title={BioTool: A Comprehensive Tool-Calling Dataset for Enhancing Biomedical Capabilities of Large Language Models},
      author={Xin Gao and Ruiyi Zhang and Meixi Du and Peijia Qin and Pengtao Xie},
      year={2026},
      eprint={2605.05758},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2605.05758},
}

License

Released under the Apache 2.0 license, inherited from the base model. The underlying API responses used during training are subject to the licenses of the respective NCBI, UniProt, and Ensembl services.