Model Card for Advanced Heart Risk AI - Cardiovascular Risk Prediction v4
This is a state-of-the-art stacked ensemble model designed to predict cardiovascular disease risk, combining 6 optimized algorithms with advanced feature engineering and probability calibration. The model achieves high sensitivity while maintaining clinical transparency and is specifically designed for medical screening scenarios.
Model Details
Model Description
This advanced model uses a sophisticated stacked ensemble approach, combining the best performing algorithms from a comprehensive evaluation of 11 different machine learning models. The final ensemble includes RandomForest, XGBoost, LightGBM, CatBoost, GradientBoosting, and LogisticRegression with a meta-learner for optimal prediction. The model incorporates medical knowledge through Framingham risk scores and advanced feature engineering, with probability calibration for reliable risk estimates.
- Developed by: Juan Manuel Infante Quiroga
- Model type: Stacked Ensemble Classification (6 algorithms + meta-learner)
- Language(s): Python (scikit-learn, XGBoost, LightGBM, CatBoost)
- License: MIT
- Version: v4 (Advanced Stacked Ensemble with Probability Calibration)
- Training Date: July 2025
Model Sources
- Repository: Juan12Dev/advanced-heart-risk-ai-v4
- Dataset: Kaggle Cardiovascular Disease Dataset (70,000 samples)
- Framework: scikit-learn, XGBoost, LightGBM, CatBoost, Optuna
Model Architecture
- Base Models Evaluated: 11 algorithms (RandomForest, ExtraTrees, XGBoost, LightGBM, CatBoost, GradientBoosting, LogisticRegression variants, SVM, KNN, NaiveBayes)
- Final Ensemble: Top 6 performing models in stacked configuration
- Meta-learner: Calibrated LogisticRegression
- Feature Engineering: 30+ scientifically-derived features including Framingham scores
- Probability Calibration: Sigmoid calibration for improved reliability
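The architecture above (stacked base models, a logistic-regression meta-learner, and sigmoid calibration on top) can be sketched with scikit-learn. This is an illustrative pattern only: it uses synthetic data and scikit-learn stand-ins for the boosting libraries (the released model also stacks XGBoost, LightGBM, and CatBoost), and all hyperparameters here are placeholders.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (the real model trains on 70,000 patient records)
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=42)

base_models = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
    ("gb", GradientBoostingClassifier(random_state=42)),
    ("lr", LogisticRegression(max_iter=1000)),
]

# Meta-learner combines the base models' out-of-fold probability outputs
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression(max_iter=1000),
                           stack_method="predict_proba", cv=5)

# Sigmoid (Platt) calibration wraps the whole ensemble
calibrated = CalibratedClassifierCV(stack, method="sigmoid", cv=3)
calibrated.fit(X_tr, y_tr)
proba = calibrated.predict_proba(X_te)[:, 1]
print(proba[:3])
```

The same wrap-then-calibrate ordering matters in practice: calibrating the full stack (rather than each base model) is what makes the final ensemble probability itself reliable.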
Uses
Direct Use
This model is intended for educational, research, and preliminary screening purposes. It can be used as an advanced screening tool for:
- Initial cardiovascular risk assessment with high sensitivity
- Patient education and awareness about cardiovascular risk factors
- Research studies on cardiovascular risk prediction methodologies
- Educational platforms for medical training and AI in healthcare
- Clinical decision support as a supplementary screening tool
⚠️ CRITICAL: This tool has 69% accuracy and 53% specificity. It may overestimate risk in healthy individuals and should NEVER replace comprehensive clinical evaluation or professional medical judgment.
Downstream Use
The model can be integrated into:
- Healthcare screening applications with appropriate medical oversight
- Clinical decision support systems (as a preliminary screening component)
- Research studies on ensemble methods in medical AI
- Educational platforms demonstrating advanced ML techniques in healthcare
- Quality improvement initiatives for cardiovascular screening programs
Out-of-Scope Use
- Definitive diagnosis: Never use for final diagnostic decisions
- Treatment planning: Not suitable for determining treatment protocols
- Standalone medical advice: Requires professional medical interpretation
- Real-time emergency decisions: Not validated for acute care settings
- High-stakes individual decisions: 47% false positive rate requires careful implementation
Bias, Risks, and Limitations
Known Limitations
- Moderate Specificity (53.4%): Nearly half of healthy patients may be incorrectly flagged as at-risk, potentially causing unnecessary anxiety and healthcare utilization
- Complex Architecture: Stacked ensemble with 30+ features requires exact replication for proper functioning
- Overfitting Evidence: Large gap between cross-validation performance (99.7% AUC) and test performance (79.4% AUC) indicates substantial overfitting; the test-set figures should be treated as the realistic performance estimate
- Dataset Generalizability: Model trained on specific dataset characteristics may not generalize to all populations
- Probability Calibration Dependency: Reliable risk estimates depend on proper calibration curve application
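Because reliable risk estimates depend on the calibration holding up, users can audit calibration on their own population. A minimal sketch with scikit-learn's `calibration_curve` and Brier score, on synthetic data rather than the model's actual predictions:

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=5000)
# Simulated predicted probabilities, loosely correlated with the labels
y_prob = np.clip(0.5 * y_true + 0.5 * rng.random(5000), 0, 1)

# Bin predictions and compare predicted vs observed positive fraction per bin
frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)
brier = brier_score_loss(y_true, y_prob)

# For a well-calibrated model, frac_pos tracks mean_pred closely
print("Brier score:", round(brier, 4))
```

A large divergence between the two curves, or a Brier score well above the card's reported 0.2295, would signal that recalibration on the target population is needed.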
Technical Risks
- High False Positive Rate (46.6%): May generate unnecessary medical referrals and patient anxiety
- Model Complexity: Ensemble of 6 models increases computational requirements and interpretation difficulty
- Feature Engineering Critical Path: Requires exact replication of 30+ feature transformations
- Threshold Sensitivity: The optimal threshold (0.0178) is very low, making the model highly sensitive to small changes in predicted probability
Ethical Considerations
- Healthcare Equity: Model performance may vary across different demographic groups
- Anxiety Generation: High false positive rate may cause undue stress in healthy individuals
- Resource Allocation: Overestimation could lead to inefficient healthcare resource utilization
- Clinical Workflow Integration: Requires careful integration with existing clinical protocols
Recommendations
- Use only as preliminary screening tool with mandatory professional follow-up
- Implement patient counseling protocols for positive predictions to manage anxiety
- Validate performance on target populations before deployment
- Monitor false positive rates and implement quality assurance measures
- Ensure transparent communication about model limitations to healthcare providers
- Establish clear referral pathways for positive screen results
How to Get Started with the Model
```python
import joblib
import numpy as np
import pandas as pd
from huggingface_hub import hf_hub_download

# Load the advanced model from Hugging Face
model_path = hf_hub_download(repo_id="Juan12Dev/advanced-heart-risk-ai-v4",
                             filename="advanced_heart_risk_model_v4.pkl")
model_data = joblib.load(model_path)

calibrated_ensemble = model_data['calibrated_ensemble']
scaler = model_data['scaler']
feature_names = model_data['feature_names']
optimal_threshold = model_data['optimal_threshold']

# Example patient data (must match the Kaggle dataset structure)
patient_data = {
    'age': 52,         # Age in years
    'gender': 2,       # 1: female, 2: male
    'height': 175,     # Height in cm
    'weight': 85.0,    # Weight in kg
    'ap_hi': 145,      # Systolic blood pressure
    'ap_lo': 95,       # Diastolic blood pressure
    'cholesterol': 2,  # 1: normal, 2: above normal, 3: well above normal
    'gluc': 2,         # 1: normal, 2: above normal, 3: well above normal
    'smoke': 0,        # 0: no, 1: yes
    'alco': 1,         # 0: no, 1: yes
    'active': 0,       # 0: no, 1: yes (physical activity)
}

# Advanced feature engineering (must be identical to training)
def create_advanced_feature_engineering(patient_data):
    df = pd.DataFrame([patient_data.copy()])
    age_years = float(patient_data['age'])

    # Age features - the model expects age in days
    df['age'] = age_years * 365.25
    df['age_normalized'] = (df['age'] - (25 * 365.25)) / ((70 * 365.25) - (25 * 365.25))
    df['age_risk_exponential'] = np.where(
        age_years > 45, np.exp(np.clip((age_years - 45) / 10, 0, 5)), 1.0)
    df['age_squared'] = age_years ** 2
    df['age_log'] = np.log1p(age_years)

    # Blood pressure features
    df['pulse_pressure'] = df['ap_hi'] - df['ap_lo']
    df['mean_arterial_pressure'] = df['ap_lo'] + (df['pulse_pressure'] / 3)

    # Metabolic features
    df['metabolic_profile'] = df['cholesterol'] / max(age_years, 1)
    df['metabolic_syndrome_risk'] = ((df['cholesterol'] > 1).astype(int)
                                     + (df['gluc'] > 1).astype(int)
                                     + (df['ap_hi'] > 140).astype(int))

    # Gender interaction features
    df['male_age_interaction'] = (df['gender'] == 2).astype(int) * age_years
    df['female_chol_interaction'] = (df['gender'] == 1).astype(int) * df['cholesterol']
    df['gender_specific_risk'] = np.where(df['gender'] == 1,
                                          df['cholesterol'] * 0.008,
                                          age_years * 0.1 + df['cholesterol'] * 0.005)

    # Medical risk scores
    df['framingham_score'] = (age_years * 0.04
                              + (df['ap_hi'] - 120) * 0.02
                              + df['cholesterol'] * 15)
    df['traditional_risk_score'] = (age_years * 0.04 + df['gender'] * 10
                                    + (df['cholesterol'] - 1) * 20
                                    + df['ap_hi'] * 0.1 + df['gluc'] * 20)
    df['cardiac_risk_score'] = df['pulse_pressure'] * 0.2 + df['ap_hi'] * 0.1
    df['combined_risk_score'] = (df['traditional_risk_score'] * 0.4
                                 + df['cardiac_risk_score'] * 0.6)

    # Statistical aggregations over key features
    key_features = ['age', 'ap_hi', 'ap_lo', 'cholesterol', 'gluc']
    available_features = [f for f in key_features if f in df.columns]
    if len(available_features) >= 3:
        feature_data = df[available_features]
        df['feature_mean'] = feature_data.mean(axis=1)
        df['feature_std'] = feature_data.std(axis=1)
        df['feature_median'] = feature_data.median(axis=1)
        df['feature_max'] = feature_data.max(axis=1)
        df['feature_min'] = feature_data.min(axis=1)
        df['feature_range'] = df['feature_max'] - df['feature_min']

    # Age-based categorical encoding
    if age_years < 45:
        df['age_group_encoded'] = 0
    elif age_years < 55:
        df['age_group_encoded'] = 1
    elif age_years < 65:
        df['age_group_encoded'] = 2
    else:
        df['age_group_encoded'] = 3

    # Cholesterol category encoding
    chol_val = df['cholesterol'].iloc[0]
    if chol_val <= 1.5:
        df['chol_category_encoded'] = 0
    elif chol_val <= 2.5:
        df['chol_category_encoded'] = 1
    elif chol_val <= 3.5:
        df['chol_category_encoded'] = 2
    else:
        df['chol_category_encoded'] = 3

    # Blood pressure category encoding
    bp_val = df['ap_hi'].iloc[0]
    if bp_val < 120:
        df['bp_category_encoded'] = 0
    elif bp_val < 140:
        df['bp_category_encoded'] = 1
    elif bp_val < 160:
        df['bp_category_encoded'] = 2
    elif bp_val < 180:
        df['bp_category_encoded'] = 3
    else:
        df['bp_category_encoded'] = 4

    return df

# Apply advanced feature engineering
engineered_df = create_advanced_feature_engineering(patient_data)

# Align columns with the training feature set; any engineered features not
# computed above (e.g. polynomial interactions) are filled with 0
X = engineered_df.reindex(columns=feature_names, fill_value=0)
X = X.select_dtypes(include=[np.number]).fillna(0)
X_scaled = scaler.transform(X)

# Make a prediction with the calibrated ensemble
probability = calibrated_ensemble.predict_proba(X_scaled)[0, 1]
prediction = int(probability >= optimal_threshold)

# Determine risk category
if probability < 0.20:
    risk_category = "Low"
elif probability < 0.45:
    risk_category = "Moderate"
elif probability < 0.70:
    risk_category = "High"
else:
    risk_category = "Critical"

print(f"Raw Probability: {probability:.4f}")
print(f"Risk Category: {risk_category}")
print(f"Binary Prediction: {'High Risk' if prediction else 'Low Risk'}")
print("Reminder: the model detects ~85% of actual cases but has a ~47% false positive rate")
```
Training Details
Training Data
- Source: Kaggle Cardiovascular Disease Dataset
- Size: 70,000 patient records
- Training Set: 56,000 samples (after 80/20 split)
- Test Set: 14,000 samples (held out for final evaluation)
- Features: 11 base features → 30+ engineered features after intelligent feature selection
- Target: Binary classification (presence/absence of cardiovascular disease)
- Class Balance: Handled through intelligent sampling strategy evaluation (SMOTE, ADASYN, BorderlineSMOTE, SMOTEENN)
Advanced Training Procedure
Data Preprocessing and Feature Engineering
- Comprehensive data quality analysis with correlation detection and variance checks
- Advanced feature engineering including:
- Framingham cardiovascular risk scores
- Polynomial interaction features
- Statistical aggregations (mean, std, median, max, min, range)
- Medical domain features (pulse pressure, mean arterial pressure)
- Age-based risk calculations
- Intelligent feature selection using 5 methods:
- Univariate statistical tests (f_classif)
- Recursive Feature Elimination (RFE)
- Tree-based importance
- L1 regularization (Lasso)
- Mutual information
- Ensemble ranking of feature selection methods
- Robust scaling with outlier resistance
- Intelligent sampling strategy - automated evaluation and selection of best technique
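The "ensemble ranking of feature selection methods" step can be sketched as follows: score every feature with several methods, rank each method's scores, and average the ranks. This is a hedged illustration on synthetic data using four of the five listed methods (RFE is omitted for brevity); the actual selection thresholds are not specified in this card.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=4, random_state=0)
cols = [f"f{i}" for i in range(X.shape[1])]

# One score per feature per selection method
scores = pd.DataFrame(index=cols)
scores["f_test"] = f_classif(X, y)[0]
scores["mutual_info"] = mutual_info_classif(X, y, random_state=0)
scores["l1_coef"] = np.abs(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y).coef_[0])
scores["tree_imp"] = RandomForestClassifier(
    n_estimators=100, random_state=0).fit(X, y).feature_importances_

# Rank within each method (higher score = better rank), then average ranks
avg_rank = scores.rank(ascending=False).mean(axis=1).sort_values()
print(avg_rank.head(5))
```

Averaging ranks rather than raw scores avoids having to normalize incomparable scales (F statistics vs. mutual information vs. coefficients).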
Model Development
- Base Model Evaluation: 11 different algorithms evaluated
- Tree-based: RandomForest, ExtraTrees, XGBoost, LightGBM, CatBoost, GradientBoosting
- Linear: LogisticRegression (standard and L1), SVM
- Other: KNN, NaiveBayes
- Bayesian Hyperparameter Optimization using Optuna (50+ trials)
- Stacked Ensemble Creation with top 6 performing models
- Probability Calibration using CalibratedClassifierCV with method selection
- Advanced Threshold Optimization using multiple criteria:
- Target sensitivity optimization
- Youden's J statistic
- F1-score maximization
- Balanced accuracy
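Two of the threshold criteria above can be sketched directly from a ROC curve: Youden's J maximization, and the lowest-false-positive-rate threshold that still meets a target sensitivity (here assumed to be the card's 85% target). The scores below are synthetic, not the model's.

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=2000)
y_score = np.clip(0.4 * y_true + rng.normal(0.3, 0.2, 2000), 0, 1)

fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Youden's J = sensitivity + specificity - 1 = TPR - FPR
j = tpr - fpr
youden_threshold = thresholds[np.argmax(j)]

# Lowest-FPR operating point that still achieves sensitivity >= 0.85
ok = tpr >= 0.85
target_threshold = thresholds[ok][np.argmin(fpr[ok])]

print("Youden threshold:", round(float(youden_threshold), 4))
print("Target-sensitivity threshold:", round(float(target_threshold), 4))
```

Pushing the sensitivity target high drives the selected threshold down, which is consistent with the very low 0.0178 threshold reported for this model.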
Training Hyperparameters
- Optimization Framework: Optuna with Tree-structured Parzen Estimator
- Cross-validation: 5-fold stratified cross-validation
- Primary Metric: ROC-AUC with sensitivity constraints
- Ensemble Method: Stacked generalization with probability outputs
- Calibration Method: Sigmoid calibration (selected via Brier score)
- Threshold Optimization: Target sensitivity ≥ 85%
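The "calibration method selected via Brier score" step can be illustrated by fitting both sigmoid and isotonic `CalibratedClassifierCV` wrappers and keeping the lower-Brier one. Synthetic data and a simple base classifier stand in for the real ensemble here.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=15, random_state=7)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3,
                                            stratify=y, random_state=7)

briers = {}
for method in ("sigmoid", "isotonic"):
    clf = CalibratedClassifierCV(LogisticRegression(max_iter=1000),
                                 method=method, cv=5)
    clf.fit(X_tr, y_tr)
    # Lower Brier score = better-calibrated probabilities
    briers[method] = brier_score_loss(y_val, clf.predict_proba(X_val)[:, 1])

best = min(briers, key=briers.get)
print("Brier scores:", briers, "-> selected:", best)
```

Sigmoid calibration (the method this model shipped with) is typically preferred when validation data is limited, since isotonic regression can overfit small calibration sets.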
Computational Requirements
- Training time: 4-6 hours on standard hardware (70K samples, 11 algorithms, hyperparameter optimization)
- Model size: ~45 MB (ensemble of 6 models + calibration)
- Memory requirements: 4-8 GB RAM for training, 1-2 GB for inference
- Inference time: < 200ms per prediction
Evaluation
Testing Data, Factors & Metrics
Testing Data
- Size: 14,000 samples (20% stratified holdout)
- Evaluation method: Single holdout with comprehensive metric evaluation
- Validation approach: 5-fold cross-validation during training, final test on unseen data
Evaluation Factors
- Age distribution: Performance across different age groups
- Gender balance: Male vs female prediction accuracy
- Risk factor combinations: Various combinations of cardiovascular risk factors
- Probability calibration: Reliability of probability estimates across different risk levels
Comprehensive Metrics
Performance evaluated using medical-standard metrics optimized for screening applications:
- ROC-AUC: Area under the receiver operating characteristic curve
- Sensitivity (Recall): True positive rate - primary optimization target for medical screening
- Specificity: True negative rate - important for minimizing unnecessary referrals
- Precision: Positive predictive value - reliability of positive predictions
- F1-Score: Harmonic mean of precision and recall
- Brier Score: Calibration quality of probability estimates
- Matthews Correlation Coefficient: Balanced measure accounting for all confusion matrix elements
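All of the metrics listed above are computable with scikit-learn plus one confusion-matrix identity for specificity. A self-contained sketch on synthetic predictions (the values printed are not the model's reported results):

```python
import numpy as np
from sklearn.metrics import (brier_score_loss, confusion_matrix, f1_score,
                             matthews_corrcoef, precision_score,
                             recall_score, roc_auc_score)

rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, size=1000)
y_prob = np.clip(0.4 * y_true + rng.normal(0.3, 0.2, 1000), 0, 1)
y_pred = (y_prob >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
metrics = {
    "roc_auc": roc_auc_score(y_true, y_prob),
    "sensitivity": recall_score(y_true, y_pred),  # TP / (TP + FN)
    "specificity": tn / (tn + fp),                # TN / (TN + FP)
    "precision": precision_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "brier": brier_score_loss(y_true, y_prob),
    "mcc": matthews_corrcoef(y_true, y_pred),
}
print({k: round(v, 3) for k, v in metrics.items()})
```

Note that the false positive rate reported in the Results section is simply 1 − specificity (0.4662 = 1 − 0.5338).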
Results
| Metric | Value | Clinical Interpretation |
|---|---|---|
| ROC-AUC | 0.7940 | Good discriminative ability - can distinguish between cases and non-cases |
| Accuracy | 0.6928 | Overall correct predictions - nearly 7 out of 10 cases classified correctly |
| Sensitivity (Recall) | 0.8520 ⭐ | Excellent - catches 85% of actual cardiovascular disease cases |
| Specificity | 0.5338 | Moderate - correctly identifies 53% of healthy individuals |
| Precision | 0.6461 | When the model predicts "high risk", it is correct 65% of the time |
| F1-Score | 0.7349 | Good balance between precision and recall |
| Brier Score | 0.2295 | Moderate probability calibration (lower is better) |
| Matthews Corr. Coef. | 0.4070 | Moderate overall correlation between predictions and reality |
| False Positive Rate | 0.4662 | 47% of healthy patients flagged as high-risk |
| Optimal Threshold | 0.0178 | Very low threshold optimized for high sensitivity |
Performance Summary
The advanced model achieves its primary objective of high sensitivity (85.2%) for cardiovascular disease detection, making it well suited for screening applications where missing a positive case has serious consequences. The 70,000-sample training set supports stable performance estimates, though the cross-validation-to-test gap noted in the Limitations section warrants caution when extrapolating to new populations.
Key Clinical Implications:
- Excellent Screening Tool: Detects 85% of actual cardiovascular disease cases
- High False Positive Rate: 47% of healthy individuals may be unnecessarily flagged
- Moderate Overall Accuracy: 69% correct classification rate
- Good Discriminative Ability: AUC of 79% indicates useful clinical discrimination
Comparison to Previous Version:
- Improved Scientific Foundation: Framingham scores, advanced feature engineering
- Better Calibration: Probability estimates more reliable through systematic calibration
- Enhanced Transparency: Known performance characteristics with comprehensive evaluation
- Maintained High Sensitivity: Consistent with medical screening requirements
Model Examination
Interpretability Features
The stacked ensemble provides multiple levels of interpretability:
- Feature Importance Rankings: From multiple feature selection methods
- Individual Model Contributions: Transparency into each base model's prediction
- Calibration Curve Analysis: Understanding of probability reliability
- Medical Feature Validation: Framingham scores and medical domain features prominent
- Comprehensive Training Reports: Detailed analysis of model development process
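Beyond the built-in importance rankings, a model-agnostic check such as permutation importance can be applied to any fitted ensemble like this one. The sketch below uses a synthetic stand-in model, not the released artifact; `scoring="roc_auc"` matches the card's primary metric.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_features=8,
                           n_informative=3, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=3)

model = RandomForestClassifier(n_estimators=100, random_state=3).fit(X_tr, y_tr)

# Shuffle each feature on held-out data and measure the AUC drop it causes
result = permutation_importance(model, X_te, y_te, n_repeats=10,
                                scoring="roc_auc", random_state=3)
for i in result.importances_mean.argsort()[::-1][:3]:
    print(f"feature_{i}: {result.importances_mean[i]:.4f}")
```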
Top Feature Importance
Based on ensemble feature selection across 5 methods:
- feature_std (0.7682) - Variability across patient parameters
- ap_hi (0.6717) - Systolic blood pressure ⭐
- framingham_score (0.6681) - Established cardiovascular risk score ⭐
- feature_mean (0.6414) - Average of key health parameters
- poly_age_ap_hi (0.6171) - Age-blood pressure interaction
- traditional_risk_score (0.6139) - Composite cardiovascular risk
- poly_age_cholesterol (0.5854) - Age-cholesterol interaction
- age_log (0.5378) - Logarithmic age transformation
- weight (0.5255) - Patient weight ⭐
- age (0.5097) - Patient age ⭐
Model Validation
- Cross-validation Performance: Comprehensive 5-fold validation during training
- Overfitting Detection: Gap between CV (99.7%) and test (79.4%) AUC identified and documented; test-set metrics are reported as the realistic performance estimate
- Calibration Validation: Brier score optimization for probability reliability
- Medical Relevance: Top features align with established cardiovascular risk factors
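The CV-vs-test gap check described above amounts to comparing the mean cross-validated AUC on the training split against the AUC on a held-out test set. A hedged sketch on synthetic data (the released model's actual gap was 99.7% vs. 79.4%):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=4)

model = RandomForestClassifier(n_estimators=100, random_state=4)

# 5-fold stratified CV AUC on the training split
cv_auc = cross_val_score(model, X_tr, y_tr, cv=5, scoring="roc_auc").mean()

# AUC on the untouched holdout
test_auc = roc_auc_score(y_te, model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1])

# A large positive gap flags optimistic CV estimates / overfitting
print(f"CV AUC: {cv_auc:.3f}  Test AUC: {test_auc:.3f}  Gap: {cv_auc - test_auc:+.3f}")
```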
Environmental Impact
Carbon Footprint
- Hardware Type: Standard CPU computing (no GPU required)
- Training Duration: 4-6 hours on standard hardware
- Model Complexity: Ensemble of 6 models with moderate computational requirements
- Inference Efficiency: Real-time prediction capability
- Carbon Footprint: Moderate due to comprehensive hyperparameter optimization and ensemble training
Computational Efficiency
- Training Energy: Higher than simple models due to ensemble approach and optimization
- Inference Energy: Low - single prediction < 200ms
- Model Storage: 45 MB - reasonable for deployment
- Scalability: Suitable for healthcare applications with moderate computational resources
Technical Specifications
Model Architecture and Objective
- Architecture: Stacked ensemble of 6 optimized algorithms with meta-learner
- Base Models: RandomForest, XGBoost, LightGBM, CatBoost, GradientBoosting, LogisticRegression
- Meta-learner: Calibrated LogisticRegression with sigmoid calibration
- Objective: Binary classification optimized for maximum sensitivity with probability calibration
- Input Dimension: 30+ features after intelligent feature selection
- Output: Calibrated risk probability + binary classification with optimized threshold
Compute Infrastructure
Hardware Requirements
- Training: Standard CPU, 4-8 GB RAM, 4-6 hours
- Inference: Standard CPU, 1-2 GB RAM, < 200ms per prediction
- Storage: 45 MB model file
- No specialized hardware required
Software Dependencies
- Python: 3.8+
- Core ML: scikit-learn, XGBoost, LightGBM, CatBoost
- Optimization: Optuna for hyperparameter tuning
- Data Processing: pandas, numpy
- Evaluation: matplotlib, seaborn for visualization
- Deployment: joblib, huggingface_hub
Quality Assurance
- Comprehensive Testing: Multiple holdout evaluations
- Calibration Validation: Brier score optimization
- Feature Validation: Medical domain expertise incorporated
- Performance Monitoring: Detailed metrics tracking and reporting
- Reproducibility: Fixed random seeds and documented procedures
Citation
BibTeX:
```bibtex
@misc{advanced_heart_risk_ai_v4_2025,
  title        = {Advanced Heart Risk AI: Stacked Ensemble for Cardiovascular Risk Prediction v4},
  author       = {Infante Quiroga, Juan Manuel},
  year         = {2025},
  note         = {Stacked ensemble with probability calibration optimized for medical screening},
  howpublished = {Hugging Face Hub},
  url          = {https://huggingface.co/Juan12Dev/advanced-heart-risk-ai-v4}
}
```
APA:
Infante Quiroga, J. M. (2025). Advanced Heart Risk AI: Stacked Ensemble for Cardiovascular Risk Prediction v4. Hugging Face Hub. https://huggingface.co/Juan12Dev/advanced-heart-risk-ai-v4
Glossary
- Stacked Ensemble: Machine learning technique combining multiple models with a meta-learner
- Sensitivity (Recall): Proportion of actual positive cases correctly identified (true positive rate)
- Specificity: Proportion of actual negative cases correctly identified (true negative rate)
- ROC-AUC: Receiver Operating Characteristic - Area Under Curve, measures discriminative ability
- Brier Score: Measure of probability calibration quality (lower is better)
- Matthews Correlation Coefficient: Balanced metric considering all elements of confusion matrix
- Probability Calibration: Process of adjusting model probabilities to reflect true likelihoods
- Framingham Score: Established cardiovascular risk assessment tool
- Feature Engineering: Creating new predictive features from existing data
- Bayesian Optimization: Efficient hyperparameter optimization using probabilistic models
- False Positive Rate: Proportion of healthy individuals incorrectly classified as high-risk
Model Card Authors
Juan Manuel Infante Quiroga
Model Card Contact
For questions about this model, implementation guidance, or collaboration opportunities:
Juan Manuel Infante Quiroga
- Hugging Face Hub: Juan12Dev
- Model Repository: advanced-heart-risk-ai-v4
Acknowledgments
This work builds upon the Kaggle Cardiovascular Disease Dataset and incorporates established medical knowledge including Framingham risk assessment methodologies. The advanced ensemble techniques and probability calibration methods are based on current best practices in medical machine learning applications.