Query Dependence Classifier

A Random Forest model that determines whether a second query depends on the context of a first query in conversational AI systems.

Model Description

  • Model Type: Random Forest Classifier (scikit-learn)
  • Task: Binary text classification for query dependency detection
  • Features: 45 engineered linguistic features
  • Classes: Independent vs Dependent queries

Intended Use

This model is designed for conversational AI systems to determine if a follow-up question requires context from a previous query.

Examples:

  • Query 1: "What is machine learning?" Query 2: "Can you give me examples?" → Dependent
  • Query 1: "What is AI?" Query 2: "What's the weather today?" → Independent

Model Performance

  • Training Features: 45 engineered features
  • Model Architecture: Random Forest with 500 estimators
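  • Validation: Out-of-bag (OOB) scoring enabled (an estimate computed on the samples each tree did not see during bootstrapping, not k-fold cross-validation)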

Feature Engineering

The model uses 45 engineered features, including:

Lexical Features

  • Word overlap and Jaccard similarity
  • N-gram overlap (bigrams, trigrams)
  • Semantic similarity with stemming
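
As a rough illustration of the lexical features listed above, the sketch below computes word overlap, Jaccard similarity, and n-gram overlap for a query pair. The function names, the tokenization rule, and the feature keys are assumptions made for this example; the card does not publish the model's actual feature code, and stemming is left out to keep the sketch dependency-free.

```python
def tokens(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,!?;:'\"()") for w in text.lower().split())
    return [w for w in words if w]


def ngrams(words, n):
    """Return the set of n-grams (as tuples) found in a token list."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def lexical_features(query1, query2):
    # Hypothetical feature names; the real model's 45 names live in config.json.
    t1, t2 = tokens(query1), tokens(query2)
    return {
        "word_overlap": len(set(t1) & set(t2)),
        "jaccard": jaccard(set(t1), set(t2)),
        "bigram_jaccard": jaccard(ngrams(t1, 2), ngrams(t2, 2)),
        "trigram_jaccard": jaccard(ngrams(t1, 3), ngrams(t2, 3)),
    }


print(lexical_features("What is machine learning?", "What is deep learning?"))
```

For this pair the two queries share three of five distinct words, so the word-level Jaccard similarity is 0.6, while only one of five distinct bigrams is shared.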

Linguistic Features

  • Pronoun and reference patterns
  • Question type classification
  • Discourse markers and connectives
  • Dependency phrases detection
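
The linguistic features above could be approximated with simple lexicon lookups on the second query, as in the sketch below. Every word list here is a hypothetical placeholder chosen for illustration, as are the function and feature names; the lexicons the model was actually trained with are not given in this card.

```python
import re

# Hypothetical lexicons, for illustration only.
PRONOUNS = {"it", "its", "they", "them", "their", "this", "that", "these", "those", "he", "she"}
DISCOURSE_MARKERS = {"also", "so", "then", "but", "and", "however", "besides"}
DEPENDENCY_PHRASES = ("what about", "how about", "tell me more", "give me examples")
QUESTION_WORDS = {"what", "why", "how", "when", "where", "who", "which"}


def linguistic_features(query2):
    text = query2.lower()
    # Letters only, so a contraction such as "what's" yields "what" and "s".
    words = re.findall(r"[a-z]+", text)
    first = words[0] if words else ""
    return {
        "pronoun_count": sum(w in PRONOUNS for w in words),
        "starts_with_discourse_marker": int(first in DISCOURSE_MARKERS),
        "has_dependency_phrase": int(any(p in text for p in DEPENDENCY_PHRASES)),
        "question_type": first if first in QUESTION_WORDS else "other",
    }


print(linguistic_features("And what about their examples?"))
print(linguistic_features("What's the weather today?"))
```

A follow-up that opens with a connective and contains a pronoun or a phrase such as "what about" tends to lean on the previous turn, which is the signal these features try to capture.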

Structural Features

  • Length ratios and differences
  • Punctuation patterns
  • Complexity measures (syllable density)
  • Capitalization patterns
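
The structural features listed above are mostly counts and ratios. The sketch below shows one plausible way to compute them; the syllable counter is a crude vowel-group heuristic, and all names are assumptions for this example, not the model's published implementation.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: number of vowel groups, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def structural_features(query1, query2):
    w1, w2 = query1.split(), query2.split()
    # Keep only the letters of each word in query2, dropping empty results.
    letters2 = [re.sub(r"[^A-Za-z]", "", w) for w in w2]
    letters2 = [w for w in letters2 if w]
    syllables2 = sum(count_syllables(w) for w in letters2)
    return {
        "length_diff": len(w2) - len(w1),
        "length_ratio": len(w2) / len(w1) if w1 else 0.0,
        "ends_with_question_mark": int(query2.rstrip().endswith("?")),
        "syllable_density": syllables2 / len(letters2) if letters2 else 0.0,
        "capitalized_words": sum(w[0].isupper() for w in letters2),
    }


print(structural_features("What is machine learning?", "Can you give me examples?"))
```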

Usage

# Install dependencies
# pip install scikit-learn pandas nltk huggingface-hub joblib

from huggingface_hub import hf_hub_download
import joblib
import json

# Download model files
model_path = hf_hub_download(repo_id="admin-4minds/QUERY-DEPENDENCE-MODEL", filename="model.joblib")
encoder_path = hf_hub_download(repo_id="admin-4minds/QUERY-DEPENDENCE-MODEL", filename="label_encoder.joblib")
config_path = hf_hub_download(repo_id="admin-4minds/QUERY-DEPENDENCE-MODEL", filename="config.json")

# Load model components
model = joblib.load(model_path)
label_encoder = joblib.load(encoder_path)

with open(config_path, 'r') as f:
    config = json.load(f)

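# Initialize classifier
# NOTE: DependencyClassifier is not provided by the packages installed above.
# It is presumably the wrapper class that extracts the 45 features, and it must be
# defined or imported before this line (this card does not give its module path).
classifier = DependencyClassifier()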
classifier.model = model
classifier.label_encoder = label_encoder
classifier.feature_names = config['feature_names']

# Make predictions
result = classifier.predict(
    "What is artificial intelligence?", 
    "Can you give me some examples?"
)

print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.3f}")
print(f"Probabilities: {result['probabilities']}")

Alternative Loading Method

# Load directly using class method
classifier = DependencyClassifier.load_from_huggingface_hub("admin-4minds/QUERY-DEPENDENCE-MODEL")

# Use for inference
result = classifier.predict("Query 1", "Query 2")
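
One way a conversational system might act on the prediction is to attach the previous query as context only when the follow-up is classified as dependent. The sketch below assumes the result dictionary has the 'prediction' and 'confidence' keys printed in the usage example, with labels 'dependent' and 'independent'; the helper name, the output format, and the 0.6 threshold are hypothetical.

```python
def build_input(query1, query2, result, min_confidence=0.6):
    """Prepend query1 as context only when query2 is predicted dependent.

    `min_confidence` is an illustrative threshold, not a tuned value.
    """
    is_dependent = str(result["prediction"]).lower() == "dependent"
    if is_dependent and result["confidence"] >= min_confidence:
        return f"Previous question: {query1}\nFollow-up: {query2}"
    # Independent (or low-confidence) follow-ups are passed through unchanged.
    return query2


example = {"prediction": "dependent", "confidence": 0.9}
print(build_input("What is AI?", "Can you give me examples?", example))
```

Whether low-confidence cases should fall back to including or omitting context depends on the application; omitting it is simply the choice made in this sketch.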

Training Data Format

The model expects training data with columns:

  • query1: First query/question
  • query2: Second query/question
  • label: 'independent' or 'dependent'
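
A small validation step can catch malformed training files early. The sketch below assumes the data is stored as CSV, which this card does not state, and uses only the standard library; the function name and error messages are illustrative.

```python
import csv
import io

REQUIRED_COLUMNS = ("query1", "query2", "label")
ALLOWED_LABELS = {"independent", "dependent"}


def load_training_rows(csv_file):
    """Read rows from a CSV file object, checking column names and label values."""
    reader = csv.DictReader(csv_file)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = list(reader)
    for i, row in enumerate(rows, start=1):
        if row["label"] not in ALLOWED_LABELS:
            raise ValueError(f"row {i}: unexpected label {row['label']!r}")
    return rows


# In-memory sample built from the examples in this card.
sample = io.StringIO(
    "query1,query2,label\n"
    "What is machine learning?,Can you give me examples?,dependent\n"
    "What is AI?,What's the weather today?,independent\n"
)
rows = load_training_rows(sample)
print(len(rows), rows[0]["label"])
```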

Model Architecture

RandomForestClassifier(
    n_estimators=500,
    max_depth=15,
    min_samples_split=7,
    min_samples_leaf=3,
    max_features='sqrt',
    class_weight='balanced',
    random_state=42
)
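
Two of these settings can be worked out by hand. With class_weight='balanced', scikit-learn weights each class by n_samples / (n_classes * class_count), and with max_features='sqrt' each split considers roughly the square root of the feature count. The sketch below reproduces both calculations in plain Python; the 300/100 label split is made up for illustration and is not the model's real training distribution.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Class weights as n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def features_per_split(n_features):
    """Number of candidate features per split under max_features='sqrt'."""
    return max(1, int(math.sqrt(n_features)))


# Made-up label distribution, for illustration only.
labels = ["independent"] * 300 + ["dependent"] * 100
weights = balanced_class_weights(labels)
print(weights)
print(features_per_split(45))
```

With 45 features, each split draws from 6 candidate features, and in the made-up distribution the minority class receives three times the weight of the majority class.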

Limitations

  • Designed for English language queries
  • Performance may vary on very short queries (< 3 words)
  • Needs the NLTK stopwords corpus for best results (download it once with nltk.download('stopwords'))
  • Best suited for conversational question-answering scenarios

Technical Details

  • Framework: scikit-learn
  • Storage Format: joblib (pickle-based, so loading a file can execute arbitrary code; only load model files from sources you trust)
  • Configuration: JSON metadata
  • Reproducibility: Fixed random seed (42)

Citation

@misc{query_dependence_classifier_2025,
  title={Query Dependence Classifier},
  author={Admin-4minds},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/admin-4minds/QUERY-DEPENDENCE-MODEL}
}

License

This model is released under the MIT License.

Contact

For questions or issues, please contact the admin-4minds team.
