Model Card for Advanced Heart Risk AI - Cardiovascular Risk Prediction v4

This is a state-of-the-art stacked ensemble model designed to predict cardiovascular disease risk, combining 6 optimized algorithms with advanced feature engineering and probability calibration. The model achieves high sensitivity while maintaining clinical transparency and is specifically designed for medical screening scenarios.

Model Details

Model Description

This advanced model uses a sophisticated stacked ensemble approach, combining the best performing algorithms from a comprehensive evaluation of 11 different machine learning models. The final ensemble includes RandomForest, XGBoost, LightGBM, CatBoost, GradientBoosting, and LogisticRegression with a meta-learner for optimal prediction. The model incorporates medical knowledge through Framingham risk scores and advanced feature engineering, with probability calibration for reliable risk estimates.

  • Developed by: Juan Manuel Infante Quiroga
  • Model type: Stacked Ensemble Classification (6 algorithms + meta-learner)
  • Language(s): Python (scikit-learn, XGBoost, LightGBM, CatBoost)
  • License: MIT
  • Version: v4 (Advanced Stacked Ensemble with Probability Calibration)
  • Training Date: July 2025

Model Sources

  • Repository: Juan12Dev/advanced-heart-risk-ai-v4
  • Dataset: Kaggle Cardiovascular Disease Dataset (70,000 samples)
  • Framework: scikit-learn, XGBoost, LightGBM, CatBoost, Optuna

Model Architecture

  • Base Models Evaluated: 11 algorithms (RandomForest, ExtraTrees, XGBoost, LightGBM, CatBoost, GradientBoosting, LogisticRegression variants, SVM, KNN, NaiveBayes)
  • Final Ensemble: Top 6 performing models in stacked configuration
  • Meta-learner: Calibrated LogisticRegression
  • Feature Engineering: 30+ scientifically-derived features including Framingham scores
  • Probability Calibration: Sigmoid calibration for improved reliability
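The architecture described above — a stacked ensemble whose outputs are wrapped in sigmoid (Platt) calibration — can be sketched with scikit-learn. This is an illustrative miniature, not the trained v4 configuration: the base models, hyperparameters, and data here are placeholders.

```python
# Minimal sketch of a stacked ensemble with sigmoid probability calibration.
# Base models and hyperparameters are illustrative, not the v4 trained config.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

base_models = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
    ("gb", GradientBoostingClassifier(random_state=42)),
    ("lr", LogisticRegression(max_iter=1000)),
]
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",  # meta-learner sees base-model probabilities
    cv=5,
)
# Sigmoid calibration wrapped around the whole stack, as the card describes.
calibrated = CalibratedClassifierCV(stack, method="sigmoid", cv=3)
calibrated.fit(X, y)
proba = calibrated.predict_proba(X[:1])[0, 1]  # calibrated risk probability
```

The same pattern scales to the six base models listed above; only the `estimators` list changes.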

Uses

Direct Use

This model is intended for educational, research, and preliminary screening purposes. It can be used as an advanced screening tool for:

  • Initial cardiovascular risk assessment with high sensitivity
  • Patient education and awareness about cardiovascular risk factors
  • Research studies on cardiovascular risk prediction methodologies
  • Educational platforms for medical training and AI in healthcare
  • Clinical decision support as a supplementary screening tool

⚠️ CRITICAL: This tool has 69% accuracy and 53% specificity. It may overestimate risk in healthy individuals and should NEVER replace comprehensive clinical evaluation or professional medical judgment.

Downstream Use

The model can be integrated into:

  • Healthcare screening applications with appropriate medical oversight
  • Clinical decision support systems (as a preliminary screening component)
  • Research studies on ensemble methods in medical AI
  • Educational platforms demonstrating advanced ML techniques in healthcare
  • Quality improvement initiatives for cardiovascular screening programs

Out-of-Scope Use

  • Definitive diagnosis: Never use for final diagnostic decisions
  • Treatment planning: Not suitable for determining treatment protocols
  • Standalone medical advice: Requires professional medical interpretation
  • Real-time emergency decisions: Not validated for acute care settings
  • High-stakes individual decisions: 47% false positive rate requires careful implementation

Bias, Risks, and Limitations

Known Limitations

  1. Moderate Specificity (53.4%): Nearly half of healthy patients may be incorrectly flagged as at-risk, potentially causing unnecessary anxiety and healthcare utilization
  2. Complex Architecture: Stacked ensemble with 30+ features requires exact replication for proper functioning
  3. Overfitting Evidence: Large gap between cross-validation performance (99.7% AUC) and test performance (79.4% AUC) indicates substantial overfitting
  4. Dataset Generalizability: Model trained on specific dataset characteristics may not generalize to all populations
  5. Probability Calibration Dependency: Reliable risk estimates depend on proper calibration curve application

Technical Risks

  • High False Positive Rate (46.6%): May generate unnecessary medical referrals and patient anxiety
  • Model Complexity: Ensemble of 6 models increases computational requirements and interpretation difficulty
  • Feature Engineering Critical Path: Requires exact replication of 30+ feature transformations
  • Threshold Sensitivity: Optimal threshold (0.0178) is very low, making model highly sensitive to small probability changes

Ethical Considerations

  • Healthcare Equity: Model performance may vary across different demographic groups
  • Anxiety Generation: High false positive rate may cause undue stress in healthy individuals
  • Resource Allocation: Overestimation could lead to inefficient healthcare resource utilization
  • Clinical Workflow Integration: Requires careful integration with existing clinical protocols

Recommendations

  • Use only as preliminary screening tool with mandatory professional follow-up
  • Implement patient counseling protocols for positive predictions to manage anxiety
  • Validate performance on target populations before deployment
  • Monitor false positive rates and implement quality assurance measures
  • Ensure transparent communication about model limitations to healthcare providers
  • Establish clear referral pathways for positive screen results

How to Get Started with the Model

import joblib
import pandas as pd
import numpy as np
from huggingface_hub import hf_hub_download

# Load the advanced model from Hugging Face
model_path = hf_hub_download(repo_id="Juan12Dev/advanced-heart-risk-ai-v4", 
                            filename="advanced_heart_risk_model_v4.pkl")
model_data = joblib.load(model_path)

calibrated_ensemble = model_data['calibrated_ensemble']
scaler = model_data['scaler']
feature_names = model_data['feature_names']
optimal_threshold = model_data['optimal_threshold']

# Example patient data (must match the Kaggle dataset structure)
patient_data = {
    'age': 52,          # Age in years
    'gender': 2,        # 1: female, 2: male
    'height': 175,      # Height in cm
    'weight': 85.0,     # Weight in kg
    'ap_hi': 145,       # Systolic blood pressure
    'ap_lo': 95,        # Diastolic blood pressure
    'cholesterol': 2,   # 1: normal, 2: above normal, 3: well above normal
    'gluc': 2,          # 1: normal, 2: above normal, 3: well above normal
    'smoke': 0,         # 0: no, 1: yes
    'alco': 1,          # 0: no, 1: yes
    'active': 0         # 0: no, 1: yes (physical activity)
}

# Advanced Feature Engineering (must be identical to training)
def create_advanced_feature_engineering(patient_data):
    df = pd.DataFrame([patient_data.copy()])
    age_years = float(patient_data['age'])
    
    # Age features - convert to days for model compatibility
    df['age'] = age_years * 365.25
    df['age_normalized'] = (df['age'] - (25 * 365.25)) / ((70*365.25) - (25*365.25))
    df['age_risk_exponential'] = np.where(age_years > 45, 
                                         np.exp(np.clip((age_years - 45) / 10, 0, 5)), 1.0)
    df['age_squared'] = age_years ** 2
    df['age_log'] = np.log1p(age_years)
    
    # Blood pressure features
    df['pulse_pressure'] = df['ap_hi'] - df['ap_lo']
    df['mean_arterial_pressure'] = df['ap_lo'] + (df['pulse_pressure'] / 3)
    
    # Metabolic features
    df['metabolic_profile'] = df['cholesterol'] / max(age_years, 1)
    df['metabolic_syndrome_risk'] = ((df['cholesterol'] > 1).astype(int) + 
                                    (df['gluc'] > 1).astype(int) + 
                                    (df['ap_hi'] > 140).astype(int))
    
    # Gender interaction features
    df['male_age_interaction'] = (df['gender'] == 2).astype(int) * age_years
    df['female_chol_interaction'] = (df['gender'] == 1).astype(int) * df['cholesterol']
    df['gender_specific_risk'] = np.where(df['gender'] == 1, 
                                         df['cholesterol'] * 0.008, 
                                         age_years * 0.1 + df['cholesterol'] * 0.005)
    
    # Medical risk scores
    df['framingham_score'] = (age_years * 0.04 + 
                              (df['ap_hi'] - 120) * 0.02 + 
                              df['cholesterol'] * 15)
    df['traditional_risk_score'] = (age_years * 0.04 + df['gender'] * 10 + 
                                   (df['cholesterol'] - 1) * 20 + df['ap_hi'] * 0.1 + 
                                   df['gluc'] * 20)
    df['cardiac_risk_score'] = (df['pulse_pressure'] * 0.2 + df['ap_hi'] * 0.1)
    df['combined_risk_score'] = (df['traditional_risk_score'] * 0.4 + 
                                df['cardiac_risk_score'] * 0.6)
    
    # Statistical aggregations for key features
    key_features = ['age', 'ap_hi', 'ap_lo', 'cholesterol', 'gluc']
    available_features = [f for f in key_features if f in df.columns]
    if len(available_features) >= 3:
        feature_data = df[available_features]
        df['feature_mean'] = feature_data.mean(axis=1)
        df['feature_std'] = feature_data.std(axis=1)
        df['feature_median'] = feature_data.median(axis=1)
        df['feature_max'] = feature_data.max(axis=1)
        df['feature_min'] = feature_data.min(axis=1)
        df['feature_range'] = df['feature_max'] - df['feature_min']
    
    # Age-based categorical encoding
    if age_years < 45:
        df['age_group_encoded'] = 0
    elif age_years < 55:
        df['age_group_encoded'] = 1
    elif age_years < 65:
        df['age_group_encoded'] = 2
    else:
        df['age_group_encoded'] = 3
    
    # Cholesterol category encoding
    chol_val = df['cholesterol'].iloc[0]
    if chol_val <= 1.5:
        df['chol_category_encoded'] = 0
    elif chol_val <= 2.5:
        df['chol_category_encoded'] = 1
    elif chol_val <= 3.5:
        df['chol_category_encoded'] = 2
    else:
        df['chol_category_encoded'] = 3
    
    # Blood pressure category encoding
    bp_val = df['ap_hi'].iloc[0]
    if bp_val < 120:
        df['bp_category_encoded'] = 0
    elif bp_val < 140:
        df['bp_category_encoded'] = 1
    elif bp_val < 160:
        df['bp_category_encoded'] = 2
    elif bp_val < 180:
        df['bp_category_encoded'] = 3
    else:
        df['bp_category_encoded'] = 4
    
    return df

# Apply advanced feature engineering
engineered_df = create_advanced_feature_engineering(patient_data)

# Align engineered features to the training feature order (missing ones filled with 0)
X = engineered_df.reindex(columns=feature_names, fill_value=0)
X = X.select_dtypes(include=[np.number]).fillna(0)
X_scaled = scaler.transform(X)

# Make prediction with calibrated ensemble
probability = calibrated_ensemble.predict_proba(X_scaled)[0, 1]
prediction = (probability >= optimal_threshold).astype(int)

# Determine risk category
if probability < 0.20:
    risk_category = "Low"
elif probability < 0.45:
    risk_category = "Moderate"
elif probability < 0.70:
    risk_category = "High"
else:
    risk_category = "Critical"

print(f"Raw Probability: {probability:.4f}")
print(f"Risk Category: {risk_category}")
print(f"Binary Prediction: {'High Risk' if prediction else 'Low Risk'}")
print("Note: at the optimal threshold, the model detects ~85% of actual cases but flags ~47% of healthy patients")

Training Details

Training Data

  • Source: Kaggle Cardiovascular Disease Dataset
  • Size: 70,000 patient records
  • Training Set: 56,000 samples (after 80/20 split)
  • Test Set: 14,000 samples (held out for final evaluation)
  • Features: 11 base features → 30+ engineered features after intelligent feature selection
  • Target: Binary classification (presence/absence of cardiovascular disease)
  • Class Balance: Handled through intelligent sampling strategy evaluation (SMOTE, ADASYN, BorderlineSMOTE, SMOTEENN)
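The "intelligent sampling strategy evaluation" amounts to scoring each candidate resampler with cross-validation and keeping the winner. A minimal sketch of that selection loop follows; a simple random oversampler stands in here for SMOTE/ADASYN/BorderlineSMOTE/SMOTEENN (which live in the `imbalanced-learn` package), and all numbers are illustrative.

```python
# Sketch of sampling-strategy selection: resample, score with CV ROC-AUC,
# keep the best. Random oversampling stands in for the SMOTE variants.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, weights=[0.8, 0.2], random_state=0)

def random_oversample(X, y):
    """Duplicate minority-class rows until the classes are balanced."""
    minority = y == 1
    need = int((~minority).sum() - minority.sum())
    idx = rng.choice(np.flatnonzero(minority), size=need, replace=True)
    return np.vstack([X, X[idx]]), np.concatenate([y, y[idx]])

strategies = {"none": (X, y), "oversample": random_oversample(X, y)}
scores = {
    name: cross_val_score(LogisticRegression(max_iter=1000), Xs, ys,
                          cv=5, scoring="roc_auc").mean()
    for name, (Xs, ys) in strategies.items()
}
best = max(scores, key=scores.get)  # strategy with the highest CV AUC
```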

Advanced Training Procedure

Data Preprocessing and Feature Engineering

  1. Comprehensive data quality analysis with correlation detection and variance checks
  2. Advanced feature engineering including:
    • Framingham cardiovascular risk scores
    • Polynomial interaction features
    • Statistical aggregations (mean, std, median, max, min, range)
    • Medical domain features (pulse pressure, mean arterial pressure)
    • Age-based risk calculations
  3. Intelligent feature selection using 5 methods:
    • Univariate statistical tests (f_classif)
    • Recursive Feature Elimination (RFE)
    • Tree-based importance
    • L1 regularization (Lasso)
    • Mutual information
  4. Ensemble ranking of feature selection methods
  5. Robust scaling with outlier resistance
  6. Intelligent sampling strategy - automated evaluation and selection of best technique
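Step 4 above — ensemble ranking across feature-selection methods — can be sketched by averaging each feature's rank across selectors and keeping the top-k. Three of the five listed methods are shown here; the dataset and k are illustrative.

```python
# Sketch of ensemble feature ranking: rank features under several selectors
# (0 = best), average the ranks, keep the top-k consensus features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, n_features=12, n_informative=4,
                           random_state=1)

# Rank features under each method (double argsort turns scores into ranks).
f_rank = np.argsort(np.argsort(-f_classif(X, y)[0]))
mi_rank = np.argsort(np.argsort(-mutual_info_classif(X, y, random_state=1)))
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=1).fit(X, y)
rfe_rank = rfe.ranking_ - 1  # RFE ranks start at 1

mean_rank = (f_rank + mi_rank + rfe_rank) / 3.0
top_k = np.argsort(mean_rank)[:6]  # indices of the 6 best-consensus features
```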

Model Development

  1. Base Model Evaluation: 11 different algorithms evaluated
    • Tree-based: RandomForest, ExtraTrees, XGBoost, LightGBM, CatBoost, GradientBoosting
    • Linear: LogisticRegression (standard and L1), SVM
    • Other: KNN, NaiveBayes
  2. Bayesian Hyperparameter Optimization using Optuna (50+ trials)
  3. Stacked Ensemble Creation with top 6 performing models
  4. Probability Calibration using CalibratedClassifierCV with method selection
  5. Advanced Threshold Optimization using multiple criteria:
    • Target sensitivity optimization
    • Youden's J statistic
    • F1-score maximization
    • Balanced accuracy
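The threshold-optimization criteria above can be sketched from a single ROC curve: pick the lowest threshold that reaches the target sensitivity, and compare it against Youden's J. The data and classifier here are synthetic stand-ins, not the trained ensemble.

```python
# Sketch of threshold selection: target-sensitivity threshold vs Youden's J.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=7)
proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

fpr, tpr, thresholds = roc_curve(y_te, proba)
# First threshold whose sensitivity (TPR) meets the 85% target.
target_idx = np.argmax(tpr >= 0.85)
sens_threshold = thresholds[target_idx]
# Youden's J = TPR - FPR, maximized over all thresholds.
youden_threshold = thresholds[np.argmax(tpr - fpr)]
```

Chasing a high sensitivity target is what drives the chosen threshold far below 0.5, as with the model's 0.0178.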

Training Hyperparameters

  • Optimization Framework: Optuna with Tree-structured Parzen Estimator
  • Cross-validation: 5-fold stratified cross-validation
  • Primary Metric: ROC-AUC with sensitivity constraints
  • Ensemble Method: Stacked generalization with probability outputs
  • Calibration Method: Sigmoid calibration (selected via Brier score)
  • Threshold Optimization: Target sensitivity ≥ 85%
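The Brier-score-driven choice of calibration method can be sketched by fitting both sigmoid and isotonic calibration and keeping whichever scores lower on held-out data. The base model and dataset here are illustrative placeholders.

```python
# Sketch of calibration-method selection via Brier score (lower is better).
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=800, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=3)

scores = {}
for method in ("sigmoid", "isotonic"):
    clf = CalibratedClassifierCV(
        RandomForestClassifier(n_estimators=50, random_state=3),
        method=method, cv=3,
    ).fit(X_tr, y_tr)
    scores[method] = brier_score_loss(y_te, clf.predict_proba(X_te)[:, 1])

best_method = min(scores, key=scores.get)  # the card reports sigmoid won
```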

Computational Requirements

  • Training time: 4-6 hours on standard hardware (70K samples, 11 algorithms, hyperparameter optimization)
  • Model size: ~45 MB (ensemble of 6 models + calibration)
  • Memory requirements: 4-8 GB RAM for training, 1-2 GB for inference
  • Inference time: < 200ms per prediction

Evaluation

Testing Data, Factors & Metrics

Testing Data

  • Size: 14,000 samples (20% stratified holdout)
  • Evaluation method: Single holdout with comprehensive metric evaluation
  • Validation approach: 5-fold cross-validation during training, final test on unseen data

Evaluation Factors

  • Age distribution: Performance across different age groups
  • Gender balance: Male vs female prediction accuracy
  • Risk factor combinations: Various combinations of cardiovascular risk factors
  • Probability calibration: Reliability of probability estimates across different risk levels

Comprehensive Metrics

Performance evaluated using medical-standard metrics optimized for screening applications:

  • ROC-AUC: Area under the receiver operating characteristic curve
  • Sensitivity (Recall): True positive rate - primary optimization target for medical screening
  • Specificity: True negative rate - important for minimizing unnecessary referrals
  • Precision: Positive predictive value - reliability of positive predictions
  • F1-Score: Harmonic mean of precision and recall
  • Brier Score: Calibration quality of probability estimates
  • Matthews Correlation Coefficient: Balanced measure accounting for all confusion matrix elements
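All of the metrics above are available in scikit-learn; the one exception is specificity, which is derived from the confusion matrix. A minimal sketch on tiny synthetic labels:

```python
# Computing the card's screening metrics with scikit-learn (toy data).
import numpy as np
from sklearn.metrics import (brier_score_loss, confusion_matrix, f1_score,
                             matthews_corrcoef, precision_score, recall_score,
                             roc_auc_score)

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.2, 0.6, 0.8, 0.7, 0.4, 0.1, 0.9, 0.3])
y_pred = (y_prob >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
metrics = {
    "roc_auc": roc_auc_score(y_true, y_prob),
    "sensitivity": recall_score(y_true, y_pred),  # TP / (TP + FN)
    "specificity": tn / (tn + fp),                # TN / (TN + FP), no direct fn
    "precision": precision_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "brier": brier_score_loss(y_true, y_prob),
    "mcc": matthews_corrcoef(y_true, y_pred),
}
```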

Results

| Metric | Value | Clinical Interpretation |
|--------|-------|-------------------------|
| ROC-AUC | 0.7940 | Good discriminative ability: can distinguish between cases and non-cases |
| Accuracy | 0.6928 | Overall correct predictions: nearly 7 out of 10 cases classified correctly |
| Sensitivity (Recall) | 0.8520 ⭐ | Excellent: catches 85% of actual cardiovascular disease cases |
| Specificity | 0.5338 | Moderate: correctly identifies 53% of healthy individuals |
| Precision | 0.6461 | When the model predicts "high risk", it is correct 65% of the time |
| F1-Score | 0.7349 | Good balance between precision and recall |
| Brier Score | 0.2295 | Moderate probability calibration (lower is better) |
| Matthews Corr. Coef. | 0.4070 | Moderate overall correlation between predictions and reality |
| False Positive Rate | 0.4662 | 47% of healthy patients flagged as high-risk |
| Optimal Threshold | 0.0178 | Very low threshold, optimized for high sensitivity |

Performance Summary

The advanced model achieves its primary objective of high sensitivity (85.2%) for cardiovascular disease detection, making it well suited to screening applications where missing a positive case has serious consequences. Training on 70,000 samples provides a reasonably robust statistical basis, though the gap between cross-validation and test performance noted above warrants caution about generalization.

Key Clinical Implications:

  • Excellent Screening Tool: Detects 85% of actual cardiovascular disease cases
  • High False Positive Rate: 47% of healthy individuals may be unnecessarily flagged
  • Moderate Overall Accuracy: 69% correct classification rate
  • Good Discriminative Ability: AUC of 79% indicates useful clinical discrimination

Comparison to Previous Version:

  • Improved Scientific Foundation: Framingham scores, advanced feature engineering
  • Better Calibration: Probability estimates more reliable through systematic calibration
  • Enhanced Transparency: Known performance characteristics with comprehensive evaluation
  • Maintained High Sensitivity: Consistent with medical screening requirements

Model Examination

Interpretability Features

The stacked ensemble provides multiple levels of interpretability:

  • Feature Importance Rankings: From multiple feature selection methods
  • Individual Model Contributions: Transparency into each base model's prediction
  • Calibration Curve Analysis: Understanding of probability reliability
  • Medical Feature Validation: Framingham scores and medical domain features prominent
  • Comprehensive Training Reports: Detailed analysis of model development process
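The calibration-curve analysis mentioned above bins predicted probabilities and compares each bin's mean prediction with the observed positive fraction. A sketch on synthetic, well-calibrated-by-construction data:

```python
# Sketch of a calibration-curve check with scikit-learn (synthetic data).
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(5)
y_prob = rng.uniform(0, 1, size=2000)
y_true = rng.uniform(0, 1, size=2000) < y_prob  # calibrated by construction

frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)
max_gap = np.max(np.abs(frac_pos - mean_pred))  # small gap => well calibrated
```

A large `max_gap` in any probability bin would signal that the reported risk probabilities cannot be read at face value in that range.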

Top Feature Importance

Based on ensemble feature selection across 5 methods:

  1. feature_std (0.7682) - Variability across patient parameters
  2. ap_hi (0.6717) - Systolic blood pressure ⭐
  3. framingham_score (0.6681) - Established cardiovascular risk score ⭐
  4. feature_mean (0.6414) - Average of key health parameters
  5. poly_age_ap_hi (0.6171) - Age-blood pressure interaction
  6. traditional_risk_score (0.6139) - Composite cardiovascular risk
  7. poly_age_cholesterol (0.5854) - Age-cholesterol interaction
  8. age_log (0.5378) - Logarithmic age transformation
  9. weight (0.5255) - Patient weight ⭐
  10. age (0.5097) - Patient age ⭐

Model Validation

  • Cross-validation Performance: Comprehensive 5-fold validation during training
  • Overfitting Detection: Gap between CV (99.7%) and test (79.4%) AUC identified and documented as a known limitation
  • Calibration Validation: Brier score optimization for probability reliability
  • Medical Relevance: Top features align with established cardiovascular risk factors

Environmental Impact

Carbon Footprint

  • Hardware Type: Standard CPU computing (no GPU required)
  • Training Duration: 4-6 hours on standard hardware
  • Model Complexity: Ensemble of 6 models with moderate computational requirements
  • Inference Efficiency: Real-time prediction capability
  • Carbon Footprint: Moderate due to comprehensive hyperparameter optimization and ensemble training

Computational Efficiency

  • Training Energy: Higher than simple models due to ensemble approach and optimization
  • Inference Energy: Low - single prediction < 200ms
  • Model Storage: 45 MB - reasonable for deployment
  • Scalability: Suitable for healthcare applications with moderate computational resources

Technical Specifications

Model Architecture and Objective

  • Architecture: Stacked ensemble of 6 optimized algorithms with meta-learner
  • Base Models: RandomForest, XGBoost, LightGBM, CatBoost, GradientBoosting, LogisticRegression
  • Meta-learner: Calibrated LogisticRegression with sigmoid calibration
  • Objective: Binary classification optimized for maximum sensitivity with probability calibration
  • Input Dimension: 30+ features after intelligent feature selection
  • Output: Calibrated risk probability + binary classification with optimized threshold

Compute Infrastructure

Hardware Requirements

  • Training: Standard CPU, 4-8 GB RAM, 4-6 hours
  • Inference: Standard CPU, 1-2 GB RAM, < 200ms per prediction
  • Storage: 45 MB model file
  • No specialized hardware required

Software Dependencies

  • Python: 3.8+
  • Core ML: scikit-learn, XGBoost, LightGBM, CatBoost
  • Optimization: Optuna for hyperparameter tuning
  • Data Processing: pandas, numpy
  • Evaluation: matplotlib, seaborn for visualization
  • Deployment: joblib, huggingface_hub

Quality Assurance

  • Comprehensive Testing: Multiple holdout evaluations
  • Calibration Validation: Brier score optimization
  • Feature Validation: Medical domain expertise incorporated
  • Performance Monitoring: Detailed metrics tracking and reporting
  • Reproducibility: Fixed random seeds and documented procedures

Citation

BibTeX:

@misc{advanced_heart_risk_ai_v4_2025,
  title={Advanced Heart Risk AI: Stacked Ensemble for Cardiovascular Risk Prediction v4},
  author={Juan Manuel Infante Quiroga},
  year={2025},
  note={Stacked ensemble with probability calibration optimized for medical screening},
  howpublished={Hugging Face Hub},
  url={https://huggingface.co/Juan12Dev/advanced-heart-risk-ai-v4}
}

APA:

Infante Quiroga, J. M. (2025). Advanced Heart Risk AI: Stacked Ensemble for Cardiovascular Risk Prediction v4. Hugging Face Hub. https://huggingface.co/Juan12Dev/advanced-heart-risk-ai-v4

Glossary

  • Stacked Ensemble: Machine learning technique combining multiple models with a meta-learner
  • Sensitivity (Recall): Proportion of actual positive cases correctly identified (true positive rate)
  • Specificity: Proportion of actual negative cases correctly identified (true negative rate)
  • ROC-AUC: Receiver Operating Characteristic - Area Under Curve, measures discriminative ability
  • Brier Score: Measure of probability calibration quality (lower is better)
  • Matthews Correlation Coefficient: Balanced metric considering all elements of confusion matrix
  • Probability Calibration: Process of adjusting model probabilities to reflect true likelihoods
  • Framingham Score: Established cardiovascular risk assessment tool
  • Feature Engineering: Creating new predictive features from existing data
  • Bayesian Optimization: Efficient hyperparameter optimization using probabilistic models
  • False Positive Rate: Proportion of healthy individuals incorrectly classified as high-risk

Model Card Authors

Juan Manuel Infante Quiroga

Model Card Contact

For questions about this model, implementation guidance, or collaboration opportunities:

Juan Manuel Infante Quiroga

  • Hugging Face Hub: Juan12Dev
  • Model Repository: advanced-heart-risk-ai-v4

Acknowledgments

This work builds upon the Kaggle Cardiovascular Disease Dataset and incorporates established medical knowledge including Framingham risk assessment methodologies. The advanced ensemble techniques and probability calibration methods are based on current best practices in medical machine learning applications.
