Content-Freshness-Scorer: Content Freshness and Relevance for Refresh

Type: Commercial | Domain: SEO, Content Strategy
Hugging Face: syeedalireza/content-freshness-scorer

Score content by freshness (age) and relevance to a target query set for refresh prioritization.

Author

Alireza Aminzadeh

Problem

Deciding which pages to refresh first requires combining age and relevance to current search demand.

Approach

Input: Page content (or summary), last_modified date, target keywords or query set.
Output: Freshness score (e.g. decay by age) and relevance score (embedding similarity to query set); combined refresh_priority.
Models: sentence-transformers for relevance; rule-based or small model for age decay; optional XGBoost to combine.

Tech Stack

Category	Tools
NLP	sentence-transformers
ML	scikit-learn, pandas (time handling)
Data	pandas, NumPy

Setup

pip install -r requirements.txt

Usage

python train.py
python inference.py --input data/pages.csv --queries data/target_queries.txt --output scores.csv

Project structure

08_content-freshness-scorer/
├── config.py
├── train.py           # Build query embeddings from target_queries.txt
├── inference.py       # Score relevance + freshness → refresh_priority
├── requirements.txt
├── .env.example
├── data/
│   ├── pages.csv          # Sample: url, content, last_modified
│   └── target_queries.txt # One query per line (used by train.py)
└── models/

Data

Sample data (included): data/pages.csv (columns: url, content, last_modified), data/target_queries.txt (one target query per line). Run train.py first to build embeddings from the queries file.
Override via DATA_PATH, QUERIES_PATH, FRESHNESS_HALFLIFE_DAYS in .env.

License

MIT.

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

alirezaaminzadeh
/

content-freshness-scorer