---
language: en
license: apache-2.0
tags:
- education
- coverage-assessment
- bert
- regression
- domain-agnostic
- educational-ai
datasets:
- synthetic-educational-conversations
metrics:
- pearson_correlation
- mae
- r_squared
model-index:
- name: BERT Coverage Assessment
  results:
  - task:
      type: regression
      name: Educational Coverage Assessment
    metrics:
    - type: pearson_correlation
      value: 0.865
      name: Pearson Correlation
    - type: r_squared
      value: 0.749
      name: R-squared
    - type: mae
      value: 0.133
      name: Mean Absolute Error
---

# BERT Coverage Assessment Model 🎯

**A domain-agnostic BERT model for assessing educational conversation coverage**

## Model Description

This model fine-tunes BERT for educational coverage assessment, predicting how well a student conversation addresses a given learning objective. It achieves a **Pearson correlation of 0.865** with reference coverage scores, making it suitable for real-time educational applications.

## Key Features

- 🌍 **Domain-agnostic**: Works across subjects without retraining
- 📊 **Continuous scoring**: Outputs 0.0-1.0 coverage scores
- ⚡ **Real-time capable**: Fast inference for live systems
- 🎓 **Research-validated**: Exceeds academic benchmarks

## Performance

| Metric | Value |
|--------|-------|
| Pearson Correlation | 0.865 |
| R-squared | 0.749 |
| Mean Absolute Error | 0.133 |
| RMSE | 0.165 |

## Usage

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class BERTCoverageRegressor(nn.Module):
    """BERT encoder with a dropout + linear regression head."""

    def __init__(self, model_name='bert-base-uncased', dropout_rate=0.3):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(dropout_rate)
        self.regressor = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs.pooler_output  # pooled [CLS] representation
        return self.regressor(self.dropout(pooled_output))


# Load the tokenizer and rebuild the model architecture
tokenizer = AutoTokenizer.from_pretrained('KingTechnician/bert-osmosis-coverage')
model = BERTCoverageRegressor()

# Load the fine-tuned weights
model_path = "pytorch_model.bin"  # download from this repo
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()


def predict_coverage(objective, conversation, max_length=512):
    """Score how well a conversation covers a learning objective (0.0-1.0)."""
    encoding = tokenizer(
        objective,
        conversation,
        truncation=True,
        padding='max_length',
        max_length=max_length,
        return_tensors='pt',
    )
    with torch.no_grad():
        output = model(encoding['input_ids'], encoding['attention_mask'])
    return torch.clamp(output.squeeze(), 0.0, 1.0).item()


# Example usage
objective = "Understand the process of photosynthesis"
conversation = "Student explains light reactions and Calvin cycle with examples..."
coverage_score = predict_coverage(objective, conversation)
print(f"Coverage Score: {coverage_score:.3f}")
```
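In the block above, `pytorch_model.bin` is assumed to have been downloaded by hand. As a minimal sketch, the weights can also be fetched programmatically with `huggingface_hub` (assuming the checkpoint in this repo is stored under the filename `pytorch_model.bin`, as referenced above; `model` is the `BERTCoverageRegressor` instance from the previous block):

```python
import torch
from huggingface_hub import hf_hub_download

# Download the fine-tuned weights from the Hub (cached locally after the
# first call). Assumes the checkpoint file in this repository is named
# "pytorch_model.bin".
model_path = hf_hub_download(
    repo_id="KingTechnician/bert-osmosis-coverage",
    filename="pytorch_model.bin",
)
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()
```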
## Input Format

The model expects the learning objective and the student conversation as a sentence pair, which the tokenizer encodes as:

```
[CLS] learning_objective [SEP] student_conversation [SEP]
```

## Output

Returns a continuous score between 0.0 and 1.0:

- **0.0-0.2**: Minimal coverage
- **0.3-0.4**: Low coverage
- **0.5-0.6**: Moderate coverage
- **0.7-0.8**: High coverage
- **0.9-1.0**: Complete coverage

## Training Data

Trained on synthetic educational conversations spanning multiple domains:

- Computer Science (algorithms, data structures)
- Statistics (hypothesis testing, regression)
- Multi-domain conversations

## Research Background

This model implements a methodology from research on domain-agnostic educational assessment and shows significant improvements over traditional similarity-based approaches:

- **269% improvement** over baseline similarity features
- **Domain transfer** without retraining
- **Real-time processing** at under 100 ms per assessment

## Limitations

- Trained primarily on synthetic data; validation on real conversations is recommended
- Optimized for English-language conversations
- Performance may vary in highly specialized technical domains

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{bert-coverage-assessment,
  title={Domain-Agnostic Coverage Assessment Through BERT Fine-tuning},
  author={Your Name},
  year={2025},
  url={https://huggingface.co/KingTechnician/bert-osmosis-coverage}
}
```

## Contact

For questions or collaborations, please open an issue in the model repository.

---

**Model Type**: Educational AI | **Task**: Coverage Assessment | **Performance**: r = 0.865