---
title: Fake News Detector
colorFrom: gray
colorTo: red
sdk: streamlit
sdk_version: 1.33.0
app_file: app.py
pinned: false
---
Detect fake news using a fine-tuned BERT model. Enter any headline or statement and get an instant prediction on whether it's likely real or fake. Built with Streamlit and Hugging Face Transformers.
# Fake News Detection Project
A machine learning project that classifies news articles as real or fake using both traditional NLP techniques and advanced transformer models.
## Project Overview
This project implements multiple approaches to detect fake news:
- Traditional ML: TF-IDF vectorization with Logistic Regression
- Deep Learning: Fine-tuned BERT model for sequence classification
## Performance Results
### TF-IDF + Logistic Regression Model
- Accuracy: 98.62%
- F1 Score: 98.67%
Detailed classification report:

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 (Real News) | 0.98 | 0.99 | 0.99 | 4284 |
| 1 (Fake News) | 0.99 | 0.98 | 0.99 | 4696 |
| Accuracy | | | 0.99 | 8980 |
| Macro avg | 0.99 | 0.99 | 0.99 | 8980 |
| Weighted avg | 0.99 | 0.99 | 0.99 | 8980 |
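The numbers above come from scikit-learn's evaluation utilities. A minimal sketch of how they can be reproduced (the variable names `y_test` and `y_pred` are assumptions, not the notebook's exact code):

```python
from sklearn.metrics import accuracy_score, f1_score, classification_report

# y_test / y_pred come from the notebook's evaluation step (names assumed)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("F1 score:", f1_score(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=["Real News", "Fake News"]))
```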
## Project Structure
```
FakeNewsDetector/
├── README.md
├── requirements.txt
├── notebooks/
│   └── FakeNewsClassifier_HuggingFace.ipynb
├── scripts/
│   └── train.py
├── models/
│   └── bert-fake-news/   (generated after training)
├── data/
├── app/
└── venv/
```
## Quick Start
### 1. Clone and Setup

```bash
git clone <repository-url>
cd FakeNewsDetector
```
### 2. Create a Virtual Environment

```bash
python -m venv venv

# Windows PowerShell
.\venv\Scripts\Activate.ps1

# Windows CMD
.\venv\Scripts\activate.bat

# Git Bash
source venv/Scripts/activate
```
### 3. Install Dependencies

```bash
pip install -r requirements.txt
```
### 4. Launch Jupyter Notebook

```bash
jupyter notebook
```
## Dataset
The project uses the `mrm8488/fake-news` dataset from Hugging Face, which contains:
- Total articles: ~45,000
- Training split: 80% (~36,000 articles)
- Test split: 20% (~9,000 articles)
- Classes:
  - 0: Real News
  - 1: Fake News
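A minimal sketch of loading the dataset with the `datasets` library (the exact split handling and column names in the notebook may differ):

```python
from datasets import load_dataset

# Load the dataset from the Hugging Face Hub
# (column names such as "text" and "label" are assumptions; inspect dataset features)
dataset = load_dataset("mrm8488/fake-news")

# If the Hub version ships a single split, an 80/20 split can be recreated like this
splits = dataset["train"].train_test_split(test_size=0.2, seed=42)
train_ds, test_ds = splits["train"], splits["test"]

print(train_ds)
print(train_ds[0])
```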
## Models Implemented
### 1. TF-IDF + Logistic Regression
- Vectorizer: TF-IDF with 5,000 max features, n-grams (1,2)
- Classifier: Logistic Regression with balanced class weights
- Performance: 98.62% accuracy
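A minimal sketch of this pipeline with the parameters listed above, assuming `train_texts`/`train_labels` and `test_texts`/`test_labels` have already been extracted from the dataset (names are illustrative, not the notebook's exact code):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

# TF-IDF with the settings above: 5,000 max features, unigrams + bigrams
vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Logistic Regression with balanced class weights
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_train, train_labels)

y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(test_labels, y_pred))
print("F1 score:", f1_score(test_labels, y_pred))
```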
### 2. BERT Fine-Tuning
- Base Model: `bert-base-uncased`
- Training: 3 epochs with evaluation per epoch
- Optimizer: AdamW with learning rate 2e-5
- Batch Size: 8 per device
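A minimal fine-tuning sketch with these hyperparameters using the `transformers` Trainer API. Dataset variables and tokenization settings (e.g. `max_length`) are assumptions; `scripts/train.py` may differ in the details:

```python
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# train_ds / test_ds are Hugging Face datasets with "text" and "label" columns (assumed)
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

train_tok = train_ds.map(tokenize, batched=True)
test_tok = test_ds.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="models/bert-fake-news",
    num_train_epochs=3,                 # 3 epochs, as described above
    per_device_train_batch_size=8,      # batch size 8 per device
    learning_rate=2e-5,                 # AdamW is the Trainer's default optimizer
    evaluation_strategy="epoch",        # newer transformers versions name this eval_strategy
)

trainer = Trainer(model=model, args=args, train_dataset=train_tok, eval_dataset=test_tok)
trainer.train()
trainer.save_model("models/bert-fake-news")
tokenizer.save_pretrained("models/bert-fake-news")
```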
## Usage
### Running the Notebook

- Ensure your virtual environment is activated
- Start Jupyter:
  ```bash
  jupyter notebook
  ```
- Open `notebooks/FakeNewsClassifier_HuggingFace.ipynb`
- Make sure the kernel is set to "venv" or "FakeNewsDetector (venv)"
- Run all cells
### Training the BERT Model

```bash
python scripts/train.py
```

The trained model will be saved to `models/bert-fake-news/`.
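### Running the Streamlit App

The Space's `app.py` (referenced in the front matter) serves the fine-tuned model for interactive prediction. A minimal sketch, assuming the model directory above and the default `LABEL_0`/`LABEL_1` output names (the actual app may load the model from the Hub or map labels differently):

```python
import streamlit as st
from transformers import pipeline

MODEL_DIR = "models/bert-fake-news"  # path assumed; adjust if loading from the Hub

@st.cache_resource
def load_classifier():
    # text-classification pipeline wraps the fine-tuned tokenizer + model
    return pipeline("text-classification", model=MODEL_DIR, tokenizer=MODEL_DIR)

st.title("Fake News Detector")
text = st.text_area("Enter a headline or statement:")

if st.button("Classify") and text.strip():
    result = load_classifier()(text)[0]
    # LABEL_0 = Real News, LABEL_1 = Fake News (default label names, assumed)
    label = "Real News" if result["label"] == "LABEL_0" else "Fake News"
    st.write(f"Prediction: **{label}** (confidence {result['score']:.2%})")
```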
## Requirements
- Python 3.8+
- pandas
- scikit-learn
- datasets (Hugging Face)
- transformers
- torch
- matplotlib
- seaborn
- jupyter
- ipywidgets
## Key Features
- High Accuracy: Achieves 98.6% accuracy on the test set
- Multiple Approaches: Compares traditional ML vs. transformer models
- Easy Setup: Simple virtual environment setup
- Comprehensive Analysis: Includes confusion matrix and detailed metrics
- Production Ready: Trained models can be saved and deployed
## Model Analysis
The TF-IDF + Logistic Regression model shows excellent performance:
- Balanced Performance: High precision and recall for both classes
- Low False Positives: 99% precision on the fake news class (few real articles flagged as fake)
- Low False Negatives: 98% recall on the fake news class (few fake articles missed)
- Robust: Handles class imbalance well with balanced weights
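The confusion matrix mentioned under Key Features can be produced with scikit-learn and seaborn. A minimal sketch (the variable names `y_test`/`y_pred` are assumptions from the evaluation step):

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

# y_test / y_pred come from the TF-IDF + Logistic Regression evaluation (names assumed)
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt="d",
            xticklabels=["Real", "Fake"], yticklabels=["Real", "Fake"])
plt.xlabel("Predicted label")
plt.ylabel("True label")
plt.title("Confusion Matrix")
plt.show()
```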
## Future Improvements
- Implement ensemble methods combining multiple models
- Add cross-validation for more robust evaluation
- Experiment with other transformer models (RoBERTa, DistilBERT)
- Deploy model as a web API
- Add real-time news article classification
- Implement explainability features (LIME, SHAP)
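For the cross-validation item above, a hedged sketch of how it could look on the existing TF-IDF pipeline (not yet part of the project; `texts` and `labels` are assumed arrays of the full dataset):

```python
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# 5-fold cross-validation over the full corpus (variable names assumed)
pipe = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=5000, ngram_range=(1, 2))),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
scores = cross_val_score(pipe, texts, labels, cv=5, scoring="f1")
print("Mean F1 across folds:", scores.mean())
```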
## Contributing
Contributions, issues, and feature requests are welcome! Feel free to check the issues page.
## Contact
For questions or suggestions, please open an issue or contact the project maintainer.
Note: This project is for educational and research purposes. Always verify news from multiple reliable sources.