---
language: en
license: apache-2.0
tags:
- education
- coverage-assessment
- bert
- regression
- domain-agnostic
- educational-ai
datasets:
- synthetic-educational-conversations
metrics:
- pearson_correlation
- mae
- r_squared
model-index:
- name: BERT Coverage Assessment
  results:
  - task:
      type: regression
      name: Educational Coverage Assessment
    metrics:
    - type: pearson_correlation
      value: 0.865
      name: Pearson Correlation
    - type: r_squared
      value: 0.749
      name: R-squared
    - type: mae
      value: 0.133
      name: Mean Absolute Error
---

# BERT Coverage Assessment Model

🎯 **A domain-agnostic BERT model for assessing educational conversation coverage**

## Model Description

This model fine-tunes BERT for educational coverage assessment, predicting how well a student conversation addresses a given learning objective. It achieves a Pearson correlation of **0.865** with reference coverage scores, making it suitable for real-time educational applications.

## Key Features

- 🌍 **Domain-agnostic**: Works across subjects without retraining
- 📊 **Continuous scoring**: Outputs 0.0-1.0 coverage scores
- ⚡ **Real-time capable**: Fast inference for live systems
- 🔬 **Research-validated**: Exceeds academic benchmarks

## Performance

| Metric | Value |
|--------|-------|
| Pearson Correlation | 0.865 |
| R-squared | 0.749 |
| Mean Absolute Error | 0.133 |
| RMSE | 0.165 |

## Usage

```python
from transformers import AutoModel, AutoTokenizer
import torch
import torch.nn as nn


class BERTCoverageRegressor(nn.Module):
    """BERT encoder with a dropout-regularized linear regression head."""

    def __init__(self, model_name='bert-base-uncased', dropout_rate=0.3):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(dropout_rate)
        self.regressor = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs.pooler_output  # pooled [CLS] representation
        output = self.dropout(pooled_output)
        return self.regressor(output)


# Load the tokenizer and instantiate the architecture
tokenizer = AutoTokenizer.from_pretrained('KingTechnician/bert-osmosis-coverage')
model = BERTCoverageRegressor()

# Load the fine-tuned weights (download pytorch_model.bin from this repo)
model_path = "pytorch_model.bin"
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()


def predict_coverage(objective, conversation, max_length=512):
    """Score how well a conversation covers a learning objective (0.0-1.0)."""
    encoding = tokenizer(
        objective,
        conversation,
        truncation=True,
        padding='max_length',
        max_length=max_length,
        return_tensors='pt'
    )

    with torch.no_grad():
        output = model(encoding['input_ids'], encoding['attention_mask'])
        score = torch.clamp(output.squeeze(), 0.0, 1.0).item()

    return score


# Example usage
objective = "Understand the process of photosynthesis"
conversation = "Student explains light reactions and Calvin cycle with examples..."
coverage_score = predict_coverage(objective, conversation)
print(f"Coverage Score: {coverage_score:.3f}")
```
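
The weights can also be fetched programmatically. A minimal sketch using `huggingface_hub`, assuming the checkpoint is stored as `pytorch_model.bin` at the repo root:

```python
from huggingface_hub import hf_hub_download
import torch

# Download (and cache) the fine-tuned weights from the Hub
model_path = hf_hub_download(
    repo_id="KingTechnician/bert-osmosis-coverage",
    filename="pytorch_model.bin",
)
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()
```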

## Input Format

The model expects input in the format:

```
[CLS] learning_objective [SEP] student_conversation [SEP]
```
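
Passing the objective and conversation as a text pair to the tokenizer produces this layout automatically; a quick check, reusing the `tokenizer` loaded above:

```python
# Encode an objective/conversation pair and inspect the resulting sequence
enc = tokenizer("Understand photosynthesis", "Student explains light reactions.")
print(tokenizer.decode(enc["input_ids"]))
# [CLS] understand photosynthesis [SEP] student explains light reactions. [SEP]
```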

## Output

Returns a continuous score between 0.0 and 1.0, which can be bucketed into coverage bands (see the helper sketch below):

- **0.0-0.2**: Minimal coverage
- **0.2-0.4**: Low coverage
- **0.4-0.6**: Moderate coverage
- **0.6-0.8**: High coverage
- **0.8-1.0**: Complete coverage
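
A small illustrative helper for mapping scores to these bands (the function name and thresholds simply mirror the list above):

```python
def coverage_band(score: float) -> str:
    """Map a 0.0-1.0 coverage score to its descriptive band."""
    bands = [
        (0.2, "Minimal coverage"),
        (0.4, "Low coverage"),
        (0.6, "Moderate coverage"),
        (0.8, "High coverage"),
    ]
    for upper, label in bands:
        if score <= upper:
            return label
    return "Complete coverage"

print(coverage_band(0.73))  # High coverage
```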

## Training Data

Trained on synthetic educational conversations across multiple domains:

- Computer Science (algorithms, data structures)
- Statistics (hypothesis testing, regression)
- Multi-domain conversations

## Research Background

This model implements the methodology from research on domain-agnostic educational assessment, achieving significant improvements over traditional similarity-based approaches:

- **269% improvement** over baseline similarity features
- **Domain transfer** to new subjects without retraining
- **Real-time processing** at under 100 ms per assessment (see the timing sketch below)
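
Latency will vary with hardware and sequence length; a minimal sketch for measuring it on your own machine, reusing `predict_coverage` from the Usage section:

```python
import time

# Warm-up run so one-time initialization doesn't skew the measurement
predict_coverage("Understand photosynthesis", "Student explains light reactions.")

n_runs = 20
start = time.perf_counter()
for _ in range(n_runs):
    predict_coverage("Understand photosynthesis", "Student explains light reactions.")
elapsed_ms = (time.perf_counter() - start) * 1000 / n_runs
print(f"Average latency: {elapsed_ms:.1f} ms per assessment")
```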

## Limitations

- Trained primarily on synthetic data; validation on real conversations is recommended
- Optimized for English-language conversations
- Performance may vary in highly specialized technical domains

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{bert-coverage-assessment,
  title={Domain-Agnostic Coverage Assessment Through BERT Fine-tuning},
  author={Your Name},
  year={2025},
  url={https://huggingface.co/KingTechnician/bert-osmosis-coverage}
}
```

## Contact

For questions or collaborations, please open an issue in the model repository.

---

**Model Type**: Educational AI | **Task**: Coverage Assessment | **Performance**: r = 0.865