Query Dependence Classifier
A Random Forest model that determines whether a second query depends on the context of a first query in conversational AI systems.
Model Description
- Model Type: Random Forest Classifier (scikit-learn)
- Task: Binary text classification for query dependency detection
- Features: 45 engineered linguistic features
- Classes: Independent vs Dependent queries
Intended Use
This model is designed for conversational AI systems to determine if a follow-up question requires context from a previous query.
Examples:
- Query 1: "What is machine learning?" Query 2: "Can you give me examples?" → Dependent
- Query 1: "What is AI?" Query 2: "What's the weather today?" → Independent
Model Performance
- Training Features: 45 engineered features
- Model Architecture: Random Forest with 500 estimators
- Validation: Out-of-bag (OOB) scoring enabled, so each tree is evaluated on the training samples left out of its bootstrap sample
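As a rough illustration of why out-of-bag scoring gives a usable validation signal, the sketch below simulates a single bootstrap draw and measures how many training samples are left out of it. The function name and the sample size are illustrative assumptions, and this is not the model's training code.

```python
import random


def out_of_bag_fraction(n_samples, seed=42):
    """Fraction of samples never drawn in one bootstrap sample of size n_samples."""
    rng = random.Random(seed)
    # A bootstrap sample draws n_samples indices with replacement.
    in_bag = {rng.randrange(n_samples) for _ in range(n_samples)}
    return (n_samples - len(in_bag)) / n_samples


if __name__ == "__main__":
    # For large n this approaches 1/e, roughly 37% of the training set per tree.
    print(f"OOB fraction: {out_of_bag_fraction(10_000):.3f}")
```

Each tree can therefore be scored on roughly a third of the training data that it never saw, without holding out a separate validation split.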
Feature Engineering
The model uses 45 engineered features, including:
Lexical Features
- Word overlap and Jaccard similarity
- N-gram overlap (bigrams, trigrams)
- Semantic similarity with stemming
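The lexical features above could be computed along these lines. This is a minimal sketch, not the model's actual feature extractor: the function names and the whitespace tokenizer are assumptions, and stemming is omitted for brevity.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,!?;:\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def jaccard_similarity(query1, query2):
    """Jaccard similarity between the token sets of the two queries."""
    a, b = set(tokenize(query1)), set(tokenize(query2))
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def ngram_overlap(query1, query2, n=2):
    """Fraction of query2's n-grams that also occur in query1."""
    def ngrams(tokens):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    first, second = ngrams(tokenize(query1)), ngrams(tokenize(query2))
    if not second:
        return 0.0
    return len(first & second) / len(second)
```

Both functions return a value in the range 0 to 1, which keeps the features on a comparable scale regardless of query length.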
Linguistic Features
- Pronoun and reference patterns
- Question type classification
- Discourse markers and connectives
- Dependency phrases detection
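A hedged sketch of how the linguistic signals listed above might be extracted from the second query. The word lists and the `linguistic_features` helper are hypothetical; the vocabularies the model was actually trained with are not documented here.

```python
# Illustrative word lists only (assumptions, not the model's real vocabularies).
REFERENCE_PRONOUNS = {"it", "its", "they", "them", "their", "this", "that", "these", "those"}
DISCOURSE_MARKERS = {"also", "and", "but", "so", "then", "however", "furthermore"}
DEPENDENCY_PHRASES = ("what about", "how about", "tell me more", "give me examples")
QUESTION_WORDS = ("what", "who", "where", "when", "why", "how", "which")


def linguistic_features(query2):
    """Return simple reference, discourse, and question-type signals for a follow-up query."""
    text = query2.lower().strip()
    tokens = [tok.strip(".,!?;:\"'()") for tok in text.split()]
    tokens = [tok for tok in tokens if tok]
    first = tokens[0] if tokens else ""
    return {
        "pronoun_count": sum(tok in REFERENCE_PRONOUNS for tok in tokens),
        "starts_with_discourse_marker": first in DISCOURSE_MARKERS,
        "has_dependency_phrase": any(phrase in text for phrase in DEPENDENCY_PHRASES),
        "question_type": first if first in QUESTION_WORDS else "other",
    }
```

A follow-up such as "And what about its history?" triggers three of these signals at once, which is the kind of pattern a dependent query tends to show.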
Structural Features
- Length ratios and differences
- Punctuation patterns
- Complexity measures (syllable density)
- Capitalization patterns
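The structural features can be approximated with plain string operations. The sketch below is illustrative only: the feature names are assumptions, and the syllable count is a crude vowel-group heuristic rather than whatever the model actually uses.

```python
import re


def syllable_density(text):
    """Approximate syllables per word by counting vowel groups in each word."""
    words = text.split()
    if not words:
        return 0.0
    syllables = sum(len(re.findall(r"[aeiouy]+", word.lower())) for word in words)
    return syllables / len(words)


def structural_features(query1, query2):
    """Length, punctuation, capitalization, and complexity signals for a query pair."""
    words1, words2 = query1.split(), query2.split()
    return {
        "length_ratio": len(words2) / max(len(words1), 1),
        "length_diff": len(words2) - len(words1),
        "q2_question_marks": query2.count("?"),
        "q2_capitalized_words": sum(word[0].isupper() for word in words2),
        "q2_syllable_density": syllable_density(query2),
    }
```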
Usage
# Install dependencies
# pip install scikit-learn pandas nltk huggingface-hub joblib
from huggingface_hub import hf_hub_download
import joblib
import json
# Download model files
model_path = hf_hub_download(repo_id="admin-4minds/QUERY-DEPENDENCE-MODEL", filename="model.joblib")
encoder_path = hf_hub_download(repo_id="admin-4minds/QUERY-DEPENDENCE-MODEL", filename="label_encoder.joblib")
config_path = hf_hub_download(repo_id="admin-4minds/QUERY-DEPENDENCE-MODEL", filename="config.json")
# Load model components
model = joblib.load(model_path)
label_encoder = joblib.load(encoder_path)
with open(config_path, 'r') as f:
    config = json.load(f)
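# Initialize classifier
# Note: DependencyClassifier is not provided by scikit-learn or huggingface_hub;
# it must be defined or imported before this line.
classifier = DependencyClassifier()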
classifier.model = model
classifier.label_encoder = label_encoder
classifier.feature_names = config['feature_names']
# Make predictions
result = classifier.predict(
    "What is artificial intelligence?",
    "Can you give me some examples?"
)
print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.3f}")
print(f"Probabilities: {result['probabilities']}")
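One way a conversational system might act on the returned dictionary is sketched below. The `needs_previous_context` helper and the 0.6 threshold are hypothetical, and the sketch assumes `result['prediction']` holds the string label 'dependent' or 'independent'.

```python
def needs_previous_context(result, min_confidence=0.6):
    """Decide whether the follow-up should be answered with the first query as context."""
    if result["confidence"] < min_confidence:
        # Low confidence: include context anyway, as the safer default.
        return True
    return result["prediction"] == "dependent"


if __name__ == "__main__":
    example = {"prediction": "independent", "confidence": 0.91, "probabilities": {}}
    print(needs_previous_context(example))
```

Falling back to including context on low-confidence predictions trades some extra prompt length for a lower risk of answering a dependent question in isolation.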
Alternative Loading Method
# Load directly using class method
classifier = DependencyClassifier.load_from_huggingface_hub("admin-4minds/QUERY-DEPENDENCE-MODEL")
# Use for inference
result = classifier.predict("Query 1", "Query 2")
Training Data Format
The model expects training data with the following columns:
- query1: First query/question
- query2: Second query/question
- label: 'independent' or 'dependent'
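A small sketch of what a conforming training file could look like, with a check that the three columns and the two label values are present. The CSV layout and the `validate_training_rows` helper are assumptions for illustration; the model card does not specify a file format.

```python
import csv
import io

REQUIRED_COLUMNS = ("query1", "query2", "label")
ALLOWED_LABELS = {"independent", "dependent"}

SAMPLE_CSV = """query1,query2,label
What is machine learning?,Can you give me examples?,dependent
What is AI?,What's the weather today?,independent
"""


def validate_training_rows(csv_text):
    """Parse CSV text and verify the expected columns and label values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [col for col in REQUIRED_COLUMNS if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = list(reader)
    for index, row in enumerate(rows, start=1):
        if row["label"] not in ALLOWED_LABELS:
            raise ValueError(f"row {index}: unexpected label {row['label']!r}")
    return rows


if __name__ == "__main__":
    print(len(validate_training_rows(SAMPLE_CSV)), "valid rows")
```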
Model Architecture
RandomForestClassifier(
    n_estimators=500,
    max_depth=15,
    min_samples_split=7,
    min_samples_leaf=3,
    max_features='sqrt',
    class_weight='balanced',
    random_state=42
)
Limitations
- Designed for English language queries
- Performance may vary on very short queries (< 3 words)
- Requires NLTK stopwords corpus for optimal performance
- Best suited for conversational question-answering scenarios
Technical Details
- Framework: scikit-learn
- Storage Format: joblib (pickle-based, so load model files only from sources you trust)
- Configuration: JSON metadata
- Reproducibility: Fixed random seed (42)
Citation
@misc{query_dependence_classifier_2025,
title={Query Dependence Classifier},
author={Admin-4minds},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/admin-4minds/QUERY-DEPENDENCE-MODEL}
}
License
This model is released under the MIT License.
Contact
For questions or issues, please contact the admin-4minds team.