mjpsm/Idea-Difficulty-XGB
Overview
This model predicts the difficulty of a business idea as Low, Medium, or High.
It is part of the Entrepreneurial Readiness series of tabular classifiers (alongside Skill Level, Risk Tolerance, and Confidence).
The model was trained with XGBoost on a 2,000-row synthetic dataset of structured features that capture common difficulty drivers.
Input Features
| Feature | Type | Range | Definition |
|---|---|---|---|
| capital_required | int | 1–10 | How much upfront capital is needed (1 = minimal, 10 = very high) |
| technical_complexity | int | 1–10 | How technically difficult the product/service is to build or maintain |
| market_competition | int | 1–10 | How crowded the target market is with competitors |
| customer_acquisition_difficulty | int | 1–10 | How difficult it is to acquire and retain customers |
| regulatory_hurdles | int | 1–10 | The degree of legal/regulatory challenges |
| time_to_mvp_months | int | 1–60 | Estimated time to Minimum Viable Product launch (in months) |
| team_expertise_required | int | 1–10 | Level of specialized expertise/team members required |
| scalability_requirement | int | 1–10 | Degree to which scaling is required for success |
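The ranges in the table can be checked before a row reaches the model. The following is a minimal sketch, not part of the published repo: `FEATURE_RANGES` and `validate_idea` are hypothetical names, and the bounds are copied from the table above.

```python
# Hypothetical helper: range-check one idea against the feature table.
# Bounds are inclusive and taken from the "Range" column.
FEATURE_RANGES = {
    "capital_required": (1, 10),
    "technical_complexity": (1, 10),
    "market_competition": (1, 10),
    "customer_acquisition_difficulty": (1, 10),
    "regulatory_hurdles": (1, 10),
    "time_to_mvp_months": (1, 60),
    "team_expertise_required": (1, 10),
    "scalability_requirement": (1, 10),
}


def validate_idea(idea: dict) -> list:
    """Return a list of problems; an empty list means the row looks valid."""
    problems = []
    for name, (lo, hi) in FEATURE_RANGES.items():
        if name not in idea:
            problems.append(f"missing feature: {name}")
            continue
        value = idea[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{name} must be an int, got {type(value).__name__}")
        elif not lo <= value <= hi:
            problems.append(f"{name}={value} is outside {lo}..{hi}")
    # Flag keys the model was not trained on.
    for name in sorted(set(idea) - set(FEATURE_RANGES)):
        problems.append(f"unexpected feature: {name}")
    return problems


example = {
    "capital_required": 7,
    "technical_complexity": 9,
    "market_competition": 6,
    "customer_acquisition_difficulty": 8,
    "regulatory_hurdles": 7,
    "time_to_mvp_months": 18,
    "team_expertise_required": 5,
    "scalability_requirement": 9,
}
print(validate_idea(example))
```

Returning a list of messages instead of raising on the first problem lets a caller report every out-of-range field at once.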
Target label:
- Low = Idea is relatively easy to execute
- Medium = Moderately challenging
- High = Difficult, requiring significant resources and expertise
Performance
- Accuracy: 0.9733
- Macro F1: 0.9733
- Log Loss: 0.0584
Confusion Matrix (rows = true, cols = predicted):
| | High | Low | Medium |
|---|---|---|---|
| High | 100 | 0 | 0 |
| Low | 0 | 96 | 4 |
| Medium | 2 | 2 | 96 |
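As a sanity check, accuracy and macro F1 can be recomputed from the confusion matrix alone. The sketch below uses plain Python; the function names are illustrative. Log loss cannot be recovered this way because it depends on the predicted probabilities, not only on the counts.

```python
# Rows = true class, columns = predicted class, both ordered High, Low, Medium.
LABELS = ["High", "Low", "Medium"]
CM = [
    [100, 0, 0],
    [0, 96, 4],
    [2, 2, 96],
]


def accuracy(cm):
    """Fraction of samples on the diagonal."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def macro_f1(cm):
    """Unweighted mean of per-class F1, with F1 = 2TP / (2TP + FP + FN)."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # true k, predicted other
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true other
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


print(f"accuracy = {accuracy(CM):.4f}")
print(f"macro F1 = {macro_f1(CM):.4f}")
```

Both values round to 0.9733, matching the figures reported above. The matrix sums to 300 samples, 100 per class.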
Quickstart (load from the Hub)
# Load directly from: mjpsm/Idea-Difficulty-XGB
from huggingface_hub import hf_hub_download
from xgboost import XGBClassifier
import pandas as pd
import json  # only needed for the optional label_map.json path below
REPO_ID = "mjpsm/Idea-Difficulty-XGB"
model_path = hf_hub_download(REPO_ID, "xgb_model.json")
clf = XGBClassifier()
clf.load_model(model_path)
# IMPORTANT: Use the same feature names/order as training
FEATURES = [
    "capital_required", "technical_complexity", "market_competition",
    "customer_acquisition_difficulty", "regulatory_hurdles",
    "time_to_mvp_months", "team_expertise_required", "scalability_requirement",
]
row = pd.DataFrame([{
    "capital_required": 7,
    "technical_complexity": 9,
    "market_competition": 6,
    "customer_acquisition_difficulty": 8,
    "regulatory_hurdles": 7,
    "time_to_mvp_months": 18,
    "team_expertise_required": 5,
    "scalability_requirement": 9,
}], columns=FEATURES)
pred_id = int(clf.predict(row)[0])
# If label_map.json is NOT uploaded, default to alphabetical LabelEncoder order:
CLASSES = ["High","Low","Medium"] # update if you publish label_map.json
print("Predicted Idea Difficulty:", CLASSES[pred_id])
# OPTIONAL: If you later upload 'label_map.json' (label -> class id), prefer this:
# lm_path = hf_hub_download(REPO_ID, "label_map.json")
# with open(lm_path) as f:
#     label_map = json.load(f)
# inv_map = {v: k for k, v in label_map.items()}
# print("Predicted Idea Difficulty:", inv_map[pred_id])
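Beyond the single predicted class, the per-class probabilities are often useful. The sketch below shows one way to turn a probability row into a label plus a score per class. It assumes the alphabetical class order noted in the Quickstart comment (High, Low, Medium); `describe_prediction` is a hypothetical helper, and the numbers passed to it are stand-ins, not real model output.

```python
# Assumed class order: alphabetical, as a LabelEncoder would produce.
CLASSES = ["High", "Low", "Medium"]


def describe_prediction(proba_row, classes=CLASSES):
    """Return (best_label, {label: probability}) for one probability row."""
    probs = [float(p) for p in proba_row]
    if len(probs) != len(classes):
        raise ValueError(
            f"expected {len(classes)} probabilities, got {len(probs)}"
        )
    # Index of the largest probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best], dict(zip(classes, probs))


# Stand-in probabilities for illustration only.
label, scores = describe_prediction([0.90, 0.02, 0.08])
print(label, scores)
```

With the model loaded as in the Quickstart, the input would come from `clf.predict_proba(row)[0]`. If a `label_map.json` is published, its order should take precedence over the assumed `CLASSES` list.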
Evaluation results (self-reported, on idea_difficulty_dataset_2000, synthetic and balanced)
- accuracy: 0.973
- macro_f1: 0.973
- log_loss: 0.058