Content-Freshness-Scorer: Content Freshness and Relevance for Refresh
Type: Commercial | Domain: SEO, Content Strategy
Hugging Face: syeedalireza/content-freshness-scorer
Score content by freshness (age) and relevance to a target query set for refresh prioritization.
Author
Alireza Aminzadeh
- Hugging Face: syeedalireza
- LinkedIn: alirezaaminzadeh
- Email: alireza.aminzadeh@hotmail.com
Problem
Deciding which pages to refresh first requires combining age and relevance to current search demand.
Approach
- Input: Page content (or summary), last_modified date, target keywords or query set.
- Output: Freshness score (e.g. decay by age) and relevance score (embedding similarity to query set); combined refresh_priority.
- Models: sentence-transformers for relevance; rule-based or small model for age decay; optional XGBoost to combine.
Tech Stack
| Category | Tools |
|---|---|
| NLP | sentence-transformers |
| ML | scikit-learn, pandas (time handling) |
| Data | pandas, NumPy |
Setup
pip install -r requirements.txt
Usage
python train.py
python inference.py --input data/pages.csv --queries data/target_queries.txt --output scores.csv
Project structure
08_content-freshness-scorer/
βββ config.py
βββ train.py # Build query embeddings from target_queries.txt
βββ inference.py # Score relevance + freshness β refresh_priority
βββ requirements.txt
βββ .env.example
βββ data/
β βββ pages.csv # Sample: url, content, last_modified
β βββ target_queries.txt # One query per line (used by train.py)
βββ models/
Data
- Sample data (included):
data/pages.csv(columns:url,content,last_modified),data/target_queries.txt(one target query per line). Runtrain.pyfirst to build embeddings from the queries file. - Override via
DATA_PATH,QUERIES_PATH,FRESHNESS_HALFLIFE_DAYSin.env.
License
MIT.
Inference Providers NEW
This model isn't deployed by any Inference Provider. π Ask for provider support