Query Dependence Classifier

A Random Forest model that determines whether a second query depends on the context of a first query in conversational AI systems.

Model Description

Model Type: Random Forest Classifier (scikit-learn)
Task: Binary text classification for query dependency detection
Features: 45 engineered linguistic features
Classes: Independent vs Dependent queries

Intended Use

This model is designed for conversational AI systems to determine if a follow-up question requires context from a previous query.

Examples:

Query 1: "What is machine learning?" Query 2: "Can you give me examples?" → Dependent
Query 1: "What is AI?" Query 2: "What's the weather today?" → Independent
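
One way a conversational system might act on the classifier's output is to attach the previous query only when the follow-up is judged dependent. The sketch below is illustrative: build_prompt, min_confidence, and fake_predict are hypothetical names, and the stand-in predictor only mimics the result keys ('prediction', 'confidence') shown under Usage.

```python
from typing import Callable, Dict


def build_prompt(prev_query: str, query: str,
                 predict: Callable[[str, str], Dict],
                 min_confidence: float = 0.5) -> str:
    """Attach the previous query only when the follow-up is judged dependent."""
    result = predict(prev_query, query)
    # Compare case-insensitively, since the exact label casing is an assumption.
    is_dependent = (str(result["prediction"]).lower() == "dependent"
                    and result["confidence"] >= min_confidence)
    if is_dependent:
        return f"Previous question: {prev_query}\nFollow-up: {query}"
    return query


# Stand-in predictor for demonstration only; in practice this would be
# classifier.predict from the Usage section.
def fake_predict(q1: str, q2: str) -> Dict:
    dependent = q2.lower().startswith("can you")
    return {"prediction": "dependent" if dependent else "independent",
            "confidence": 0.9}


prompt_a = build_prompt("What is machine learning?",
                        "Can you give me examples?", fake_predict)
prompt_b = build_prompt("What is AI?",
                        "What's the weather today?", fake_predict)
print(prompt_a)
print(prompt_b)
```

The confidence threshold is a design choice: raising it makes the system carry context forward less often, at the cost of missing some genuine follow-ups.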

Model Performance

Training Features: 45 engineered features
Model Architecture: Random Forest with 500 estimators
Validation: Out-of-bag (OOB) scoring enabled (an internal estimate from the bootstrap samples, not k-fold cross-validation)

Feature Engineering

The model uses 45 engineered features, including:

Lexical Features

Word overlap and Jaccard similarity
N-gram overlap (bigrams, trigrams)
Similarity over stemmed tokens
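
The overlap measures listed above can be sketched with the standard library alone. This is an illustration under assumptions, not the model's actual feature code: the function names, the tokenizer, and the feature keys are hypothetical, and stemming is left out to avoid an NLTK dependency.

```python
import re
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def jaccard(a: Set, b: Set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|, defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Set of contiguous n-grams; empty when the query has fewer than n tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def lexical_features(q1: str, q2: str) -> Dict[str, float]:
    t1, t2 = tokenize(q1), tokenize(q2)
    return {
        "word_overlap": len(set(t1) & set(t2)),
        "jaccard": jaccard(set(t1), set(t2)),
        "bigram_jaccard": jaccard(ngrams(t1, 2), ngrams(t2, 2)),
        "trigram_jaccard": jaccard(ngrams(t1, 3), ngrams(t2, 3)),
    }


feats = lexical_features("What is machine learning?", "What is deep learning?")
print(feats)
```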

Linguistic Features

Pronoun and reference patterns
Question type classification
Discourse markers and connectives
Dependency phrases detection
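
As a rough illustration of the linguistic signals above, the sketch below counts referring pronouns, checks for a leading discourse marker, looks for dependency phrases, and assigns a coarse question type to the second query. The word lists and feature names are hypothetical placeholders; the model's real vocabularies are not documented here.

```python
import re
from typing import Dict

# Illustrative word lists only; assumptions, not the model's actual lists.
PRONOUNS = {"it", "its", "this", "that", "these", "those",
            "they", "them", "their", "he", "she"}
DISCOURSE_MARKERS = {"and", "but", "also", "so", "then", "however"}
DEPENDENCY_PHRASES = ("what about", "how about", "tell me more",
                      "for example", "can you give")
QUESTION_WORDS = {"what", "who", "when", "where", "why", "how", "which"}


def linguistic_features(q2: str) -> Dict[str, object]:
    text = q2.lower().strip()
    tokens = re.findall(r"[a-z']+", text)
    # Drop any contraction suffix so "what's" is treated as "what".
    first = tokens[0].split("'")[0] if tokens else ""
    return {
        "pronoun_count": sum(t in PRONOUNS for t in tokens),
        "starts_with_marker": int(first in DISCOURSE_MARKERS),
        "has_dependency_phrase": int(any(p in text for p in DEPENDENCY_PHRASES)),
        "question_type": first if first in QUESTION_WORDS else "other",
    }


follow_up = linguistic_features("And what about those?")
standalone = linguistic_features("What's the weather today?")
print(follow_up)
print(standalone)
```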

Structural Features

Length ratios and differences
Punctuation patterns
Complexity measures (syllable density)
Capitalization patterns
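
The structural measures above might be computed along these lines. The feature names are hypothetical, and the syllable counter is a crude vowel-group heuristic chosen for brevity, not necessarily what the model uses.

```python
import re
from typing import Dict


def count_syllables(word: str) -> int:
    """Rough heuristic: number of consecutive-vowel groups, at least 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def structural_features(q1: str, q2: str) -> Dict[str, float]:
    w1, w2 = q1.split(), q2.split()
    # Keep only the alphabetic part of each word in the second query.
    alpha2 = [re.sub(r"[^A-Za-z]", "", w) for w in w2]
    alpha2 = [w for w in alpha2 if w]
    n2 = max(len(alpha2), 1)  # guard against division by zero
    return {
        "length_ratio": len(w2) / max(len(w1), 1),
        "length_diff": len(w2) - len(w1),
        "q2_question_marks": q2.count("?"),
        "q2_syllable_density": sum(count_syllables(w) for w in alpha2) / n2,
        "q2_capitalized_ratio": sum(w[0].isupper() for w in alpha2) / n2,
    }


sf = structural_features("What is machine learning?", "Can you give me examples?")
print(sf)
```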

Usage

# Install dependencies
# pip install scikit-learn pandas nltk huggingface-hub joblib

from huggingface_hub import hf_hub_download
import joblib
import json

# Download model files
model_path = hf_hub_download(repo_id="admin-4minds/QUERY-DEPENDENCE-MODEL", filename="model.joblib")
encoder_path = hf_hub_download(repo_id="admin-4minds/QUERY-DEPENDENCE-MODEL", filename="label_encoder.joblib")
config_path = hf_hub_download(repo_id="admin-4minds/QUERY-DEPENDENCE-MODEL", filename="config.json")

# Load model components
model = joblib.load(model_path)
label_encoder = joblib.load(encoder_path)

with open(config_path, 'r') as f:
    config = json.load(f)

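# Initialize classifier
# NOTE: DependencyClassifier is not provided by any of the packages installed above.
# It is the feature-extraction wrapper used with this model and must be defined
# or imported from the project's own code before this line runs.
classifier = DependencyClassifier()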
classifier.model = model
classifier.label_encoder = label_encoder
classifier.feature_names = config['feature_names']

# Make predictions
result = classifier.predict(
    "What is artificial intelligence?", 
    "Can you give me some examples?"
)

print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.3f}")
print(f"Probabilities: {result['probabilities']}")

Alternative Loading Method

# Load directly using the class method (DependencyClassifier must already be importable)
classifier = DependencyClassifier.load_from_huggingface_hub("admin-4minds/QUERY-DEPENDENCE-MODEL")

# Use for inference
result = classifier.predict("Query 1", "Query 2")

Training Data Format

The model expects training data with columns:

query1: First query/question
query2: Second query/question
label: 'independent' or 'dependent'
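
A quick check of the column layout before training could look like the sketch below. It assumes the data is stored as CSV, which this card does not state, and validate_rows is a hypothetical helper name.

```python
import csv
import io
from typing import Dict, List

REQUIRED_COLUMNS = ("query1", "query2", "label")
ALLOWED_LABELS = {"independent", "dependent"}


def validate_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text and verify the expected columns and label values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        label = row["label"].strip().lower()
        if label not in ALLOWED_LABELS:
            raise ValueError(f"line {line_no}: unexpected label {row['label']!r}")
        rows.append({"query1": row["query1"], "query2": row["query2"], "label": label})
    return rows


sample = (
    "query1,query2,label\n"
    "What is machine learning?,Can you give me examples?,dependent\n"
    "What is AI?,What's the weather today?,independent\n"
)
rows = validate_rows(sample)
print(rows)
```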

Model Architecture

RandomForestClassifier(
    n_estimators=500,
    max_depth=15,
    min_samples_split=7,
    min_samples_leaf=3,
    max_features='sqrt',
    class_weight='balanced',
    random_state=42
)
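
The constructor above does not show how out-of-bag scoring, mentioned under Model Performance, is switched on. In scikit-learn this is done with oob_score=True, so the sketch below assumes that setting was used. The data is synthetic (make_classification with 45 features) and only demonstrates the mechanics; the printed score says nothing about the real model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the 45 engineered features; not the real training data.
X, y = make_classification(n_samples=300, n_features=45,
                           n_informative=10, random_state=42)

clf = RandomForestClassifier(
    n_estimators=500,
    max_depth=15,
    min_samples_split=7,
    min_samples_leaf=3,
    max_features="sqrt",
    class_weight="balanced",
    random_state=42,
    oob_score=True,  # assumption: not shown in the documented constructor
)
clf.fit(X, y)

# Each tree is scored on the samples left out of its bootstrap draw.
print(f"OOB accuracy estimate: {clf.oob_score_:.3f}")
```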

Limitations

Designed for English language queries
Performance may vary on very short queries (< 3 words)
Requires NLTK stopwords corpus for optimal performance
Best suited for conversational question-answering scenarios
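
The NLTK stopwords corpus is not installed with the nltk package itself; it is normally fetched once with nltk.download("stopwords"). A minimal sketch of loading it defensively follows. The fallback word list and the load_stopwords helper are hypothetical, and whether the classifier itself falls back this way is not documented.

```python
from typing import Set

# Tiny illustrative fallback; far smaller than the NLTK English list.
FALLBACK_STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "or"}


def load_stopwords() -> Set[str]:
    """Return NLTK's English stopwords, or a small fallback set if unavailable."""
    try:
        from nltk.corpus import stopwords
        return set(stopwords.words("english"))
    except (ImportError, LookupError):
        # ImportError: nltk is not installed.
        # LookupError: the corpus is missing; run nltk.download("stopwords").
        return set(FALLBACK_STOPWORDS)


stop_words = load_stopwords()
print(len(stop_words))
```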

Technical Details

Framework: scikit-learn
Storage Format: joblib (pickle-based; loading a file can execute arbitrary code, so only load model files from sources you trust)
Configuration: JSON metadata
Reproducibility: Fixed random seed (42)
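
Because joblib serializes with pickle under the hood, it can be worth checking a downloaded file against a digest obtained through a trusted channel before calling joblib.load. The sketch below uses only the standard library. The helper names are hypothetical, and this card publishes no checksums, so the expected digest here is computed from demo bytes purely for illustration.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: str, expected_sha256: str) -> None:
    """Raise ValueError if the file's digest differs from the expected one."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"checksum mismatch for {path}: got {actual}")


# Demo on a throwaway file standing in for a downloaded model artifact.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "model.joblib")
    with open(demo_path, "wb") as f:
        f.write(b"demo bytes")
    expected = hashlib.sha256(b"demo bytes").hexdigest()
    verify_file(demo_path, expected)  # matches, so nothing is raised
    try:
        verify_file(demo_path, "0" * 64)
        mismatch_detected = False
    except ValueError:
        mismatch_detected = True

print(mismatch_detected)
```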

Citation

@misc{query_dependence_classifier_2025,
  title={Query Dependence Classifier},
  author={Admin-4minds},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/admin-4minds/QUERY-DEPENDENCE-MODEL}
}

License

This model is released under the MIT License.

Contact

For questions or issues, please contact the admin-4minds team.
