Content-Freshness-Scorer: Content Freshness and Relevance for Refresh

Type: Commercial | Domain: SEO, Content Strategy
Hugging Face: syeedalireza/content-freshness-scorer

Score content by freshness (age) and relevance to a target query set for refresh prioritization.

Author

Alireza Aminzadeh

Problem

Deciding which pages to refresh first requires combining age and relevance to current search demand.

Approach

  • Input: Page content (or summary), last_modified date, target keywords or query set.
  • Output: Freshness score (e.g. decay by age) and relevance score (embedding similarity to query set); combined refresh_priority.
  • Models: sentence-transformers for relevance; rule-based or small model for age decay; optional XGBoost to combine.

Tech Stack

Category Tools
NLP sentence-transformers
ML scikit-learn, pandas (time handling)
Data pandas, NumPy

Setup

pip install -r requirements.txt

Usage

python train.py
python inference.py --input data/pages.csv --queries data/target_queries.txt --output scores.csv

Project structure

08_content-freshness-scorer/
β”œβ”€β”€ config.py
β”œβ”€β”€ train.py           # Build query embeddings from target_queries.txt
β”œβ”€β”€ inference.py       # Score relevance + freshness β†’ refresh_priority
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ .env.example
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ pages.csv          # Sample: url, content, last_modified
β”‚   └── target_queries.txt # One query per line (used by train.py)
└── models/

Data

  • Sample data (included): data/pages.csv (columns: url, content, last_modified), data/target_queries.txt (one target query per line). Run train.py first to build embeddings from the queries file.
  • Override via DATA_PATH, QUERIES_PATH, FRESHNESS_HALFLIFE_DAYS in .env.

License

MIT.

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Space using alirezaaminzadeh/content-freshness-scorer 1