Risk Tolerance Classifier (XGBoost, v3)
What it does:
Predicts an entrepreneur's risk tolerance (Low, Medium, or High) from eight numeric features that capture financial, psychological, and behavioral factors.
Why it's here:
Understanding risk tolerance is crucial for entrepreneurial readiness. This sub-model complements the Skill Level Classifier by quantifying how much uncertainty, failure, and financial variability an individual can handle.
Input Features
All input features are numeric and are expected on the scales listed below:
comfort_with_uncertainty (1–10)
- Meaning: How comfortable the individual feels making decisions without knowing the outcome.
- High value: Person is confident in uncertain situations → more risk-tolerant.
savings_to_expense_ratio (0.1–12.0)
- Meaning: Ratio of monthly savings to monthly expenses (financial buffer).
- High value: Stronger financial cushion → easier to tolerate risks.
runway_months (0–60)
- Meaning: How many months the person could cover costs if no new income came in.
- High value: Longer runway → more freedom to take risks.
debt_to_income_ratio (0.0–1.5)
- Meaning: Portion of income already committed to debt.
- High value: Higher debt burden → lower risk tolerance.
comfort_with_failure (1–10)
- Meaning: How resilient the person feels after setbacks or failures.
- High value: Bounces back quickly → higher risk tolerance.
entrepreneurial_experience_level (0–10)
- Meaning: Past experience starting or running ventures/projects.
- High value: More experience → typically higher tolerance.
investment_risk_history (1–10)
- Meaning: Willingness to take risks in past investments/decisions.
- High value: Prior bold decisions → greater tolerance now.
short_term_vs_long_term_preference (1–10)
- Meaning: Whether the person prefers immediate results (low) vs. long-term outcomes (high).
- High value: Long-term focus → can withstand short-term risks for future payoff.
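The documented ranges can be checked before a row is sent to the model. The following is a minimal sketch, not part of the published artifacts: the `FEATURE_RANGES` table and the `validate_row` helper are hypothetical names introduced here, and the bounds are simply copied from the feature list above.

```python
# Documented range for each input feature, copied from the feature list.
# FEATURE_RANGES and validate_row are illustrative names, not part of the model repo.
FEATURE_RANGES = {
    "comfort_with_uncertainty": (1, 10),
    "savings_to_expense_ratio": (0.1, 12.0),
    "runway_months": (0, 60),
    "debt_to_income_ratio": (0.0, 1.5),
    "comfort_with_failure": (1, 10),
    "entrepreneurial_experience_level": (0, 10),
    "investment_risk_history": (1, 10),
    "short_term_vs_long_term_preference": (1, 10),
}


def validate_row(row):
    """Return a list of problems found in `row`; an empty list means it looks usable."""
    problems = []
    for name, (low, high) in FEATURE_RANGES.items():
        if name not in row:
            problems.append(f"missing feature: {name}")
            continue
        value = row[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name} must be numeric, got {type(value).__name__}")
        elif not low <= value <= high:
            # NaN also lands here, because every comparison with NaN is False.
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    for name in row:
        if name not in FEATURE_RANGES:
            problems.append(f"unexpected feature: {name}")
    return problems


if __name__ == "__main__":
    candidate = {
        "comfort_with_uncertainty": 8,
        "savings_to_expense_ratio": 3.2,
        "runway_months": 72,  # deliberately above the documented maximum of 60
        "debt_to_income_ratio": 0.35,
        "comfort_with_failure": 7,
        "entrepreneurial_experience_level": 6,
        "investment_risk_history": 7,
        "short_term_vs_long_term_preference": 8,
    }
    for problem in validate_row(candidate):
        print(problem)
```

Rejecting out-of-range rows is only one possible policy; clipping values to the documented bounds would be another, and the model card does not say which the author intended.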
Target
risk_tolerance (categorical):
- Low → 0
- Medium → 1
- High → 2
This is derived from the individual's profile and reflects their comfort with uncertainty, failure, and financial tradeoffs.
Training Setup
- Algorithm: XGBoost (gradient-boosted decision trees)
- Task: Tabular classification (3-class)
- Dataset size: 2,000 rows (synthetic, balanced across Low/Medium/High)
- Split: 80% train / 20% validation
- Early stopping: Enabled
Results (Validation)
- Accuracy: 0.9225
- Macro F1: 0.9212
- Log Loss: 0.1839
- Best Trees: 165
Confusion Matrix:
| | Pred High | Pred Low | Pred Medium |
|---|---|---|---|
| True High | 123 | 0 | 7 |
| True Low | 0 | 135 | 5 |
| True Medium | 10 | 9 | 111 |
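The accuracy and macro F1 reported above can be re-derived from the confusion matrix alone. The sketch below uses only the standard library; the function names are illustrative, and the matrix is copied from the table, with rows as true classes and columns as predicted classes in the order High, Low, Medium.

```python
# Confusion matrix from the validation table.
# Rows = true class, columns = predicted class, both ordered High, Low, Medium.
LABELS = ["High", "Low", "Medium"]
CONFUSION = [
    [123, 0, 7],
    [0, 135, 5],
    [10, 9, 111],
]


def accuracy(cm):
    """Fraction of samples on the diagonal (correct predictions)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_f1(cm):
    """F1 for each class, computed as 2*TP / (predicted count + actual count)."""
    scores = []
    for i in range(len(cm)):
        true_positives = cm[i][i]
        predicted = sum(cm[r][i] for r in range(len(cm)))  # column sum
        actual = sum(cm[i])                                # row sum
        denominator = predicted + actual
        scores.append(2 * true_positives / denominator if denominator else 0.0)
    return scores


def macro_f1(cm):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(f"accuracy = {accuracy(CONFUSION):.4f}")  # → accuracy = 0.9225
    print(f"macro F1 = {macro_f1(CONFUSION):.4f}")  # → macro F1 = 0.9212
    for label, score in zip(LABELS, per_class_f1(CONFUSION)):
        print(f"F1[{label}] = {score:.4f}")
```

The per-class scores show that Medium is the weakest class: most of the errors are Medium samples predicted as High or Low, or the reverse. Log loss cannot be recovered this way because it depends on the predicted probabilities, not just the predicted labels.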
Artifacts
- xgb_model_Risk_Tolerance_v3.json → trained model
- feature_order_Risk_Tolerance_v3.json → feature order (list of 8 features)
- label_map_Risk_Tolerance_v3.json → mapping ({"High":0,"Low":1,"Medium":2})
Usage Example (Python)
import json
import numpy as np
import pandas as pd
from xgboost import XGBClassifier
from huggingface_hub import hf_hub_download
REPO_ID = "mjpsm/Risk-Tolerance-XGB"
# --- Download artifacts from Hugging Face Hub ---
model_file = hf_hub_download(REPO_ID, "xgb_model_Risk_Tolerance_v3.json")
feat_file = hf_hub_download(REPO_ID, "feature_order_Risk_Tolerance_v3.json")
map_file = hf_hub_download(REPO_ID, "label_map_Risk_Tolerance_v3.json")
# --- Load model + metadata ---
clf = XGBClassifier()
clf.load_model(model_file)
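with open(feat_file) as f:
    features = json.load(f)   # list of the 8 feature names, in training order
with open(map_file) as f:
    label_map = json.load(f)  # label name -> class index
inv_map = {v: k for k, v in label_map.items()}  # class index -> label name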
# --- Example row ---
row = {
    "comfort_with_uncertainty": 8,
    "savings_to_expense_ratio": 3.2,
    "runway_months": 14,
    "debt_to_income_ratio": 0.35,
    "comfort_with_failure": 7,
    "entrepreneurial_experience_level": 6,
    "investment_risk_history": 7,
    "short_term_vs_long_term_preference": 8,
}
# --- Predict ---
X = pd.DataFrame([row])[features].astype("float32").values
proba = clf.predict_proba(X)[0]
pred_idx = int(np.argmax(proba))
print("Prediction:", inv_map[pred_idx], proba)
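The example above prints the raw probability vector, whose positions follow the class indices in the label map rather than a Low/Medium/High order. Here is a small sketch of a helper that pairs each probability with its label name. `decode_probabilities` is a hypothetical name, the label map is the one listed under Artifacts, and the probabilities are made-up values, not real model output.

```python
def decode_probabilities(proba, label_map):
    """Pair each class probability with its label name, most likely class first.

    `proba` is one row of predict_proba output; `label_map` maps label name -> index.
    """
    inv_map = {index: name for name, index in label_map.items()}
    # Guard against a label map that does not cover positions 0..len(proba)-1.
    if sorted(inv_map) != list(range(len(proba))):
        raise ValueError("label_map indices do not match the probability vector")
    pairs = [(inv_map[i], float(p)) for i, p in enumerate(proba)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    label_map = {"High": 0, "Low": 1, "Medium": 2}  # as listed under Artifacts
    example_proba = [0.81, 0.02, 0.17]              # hypothetical values
    for name, p in decode_probabilities(example_proba, label_map):
        print(f"{name}: {p:.2f}")
```

Passing `proba` from the usage example in place of `example_proba` would give the ranked labels for a real prediction.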
Evaluation results
- Accuracy on risk_tolerance_dataset_v1 (synthetic, 2k rows), self-reported: 0.922
- Macro F1 on risk_tolerance_dataset_v1 (synthetic, 2k rows), self-reported: 0.921
- Log loss on risk_tolerance_dataset_v1 (synthetic, 2k rows), self-reported: 0.184