AIPI Recommendation Module Project
Developer: Keese Phillips
About:
The purpose of this project is to recommend albums based on a user's playlists, helping the user discover new music and possible additions to a playlist. The model is trained on a dataset from Spotify that combines one million user playlists from users of all genders and ages. The dataset was released as part of a Spotify initiative for the community to find the best recommendation model. To download the dataset, please visit the Spotify Challenge site and sign up for the challenge.
How to run the project
If you want to run the full pipeline and train the model from scratch
- You will need to visit the challenge site and sign up to be able to download the dataset
- You will need to install all of the necessary packages to run the setup.py script beforehand
- You will then need to run setup.py to create the data pipeline and train the model
- You will then need to run the frontend to use the model
pip install -r requirements.txt
python setup.py
streamlit run main.py
If you want to just run the frontend
- You will need to install all of the necessary packages to run the frontend beforehand
- You will then need to run the frontend to use the model
pip install -r requirements.txt
streamlit run main.py
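The two paths above differ only in whether setup.py runs before the frontend starts. A minimal sketch of a helper that captures this is shown here. The function names `pipeline_commands` and `run_pipeline` are hypothetical and not part of the project, and the sketch assumes it is run from the repository root, where requirements.txt, setup.py, and main.py live.

```python
import subprocess
import sys


def pipeline_commands(train_from_scratch: bool) -> list[list[str]]:
    """Return the commands for either path, in the order listed above.

    Hypothetical helper: it only mirrors the shell commands in this README.
    """
    # Both paths start by installing the required packages.
    commands = [[sys.executable, "-m", "pip", "install", "-r", "requirements.txt"]]
    if train_from_scratch:
        # Only the full pipeline builds the data and trains the model.
        commands.append([sys.executable, "setup.py"])
    # Both paths end by launching the Streamlit frontend.
    commands.append([sys.executable, "-m", "streamlit", "run", "main.py"])
    return commands


def run_pipeline(train_from_scratch: bool) -> None:
    """Run each command in turn and stop at the first failure."""
    for command in pipeline_commands(train_from_scratch):
        subprocess.run(command, check=True)


# Example (not executed here): run_pipeline(train_from_scratch=True)
```

Using `check=True` means a failed install or training step stops the run before the frontend is launched against a missing model.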
Project Structure
- requirements.txt: list of Python libraries to install before running the project
- setup.py: script to set up project (get data, train model)
- main.py: main script to run the Streamlit user interface
- assets: directory for images used in frontend
- scripts: directory for pipeline scripts or utility scripts
  - make_dataset.py: script to get data
  - model.py: script to train model and predict
- models: directory for trained models
  - recommendation.pt: trained PyTorch model for album recommendations
- data: directory for project data
  - raw: directory for raw data from Spotify's challenge
  - processed: directory to store the processed dataframe to use on the frontend
- notebooks: directory to store any exploration notebooks used
- .gitignore: git ignore file
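As a quick sanity check before running anything, the layout listed above can be verified with a short script. This is a sketch under stated assumptions: the nesting (make_dataset.py and model.py inside scripts, raw and processed inside data) is inferred from the listing, and `EXPECTED_PATHS` and `missing_paths` are hypothetical names, not part of the project. The trained model file is left out because it may not exist until setup.py has run.

```python
from pathlib import Path

# Paths taken from the project structure list; nesting is an assumption.
EXPECTED_PATHS = [
    "requirements.txt",
    "setup.py",
    "main.py",
    "assets",
    "scripts/make_dataset.py",
    "scripts/model.py",
    "models",
    "data/raw",
    "data/processed",
    "notebooks",
]


def missing_paths(root: str = ".") -> list[str]:
    """Return the expected paths that do not exist under root."""
    base = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (base / rel).exists()]


# Example: print(missing_paths(".")) from the repository root.
```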
Data source
The data used to train the model was provided by Spotify. As per their dataset description:
The dataset contains 1,000,000 playlists, including playlist titles and track titles, created by users on the Spotify platform between January 2010 and October 2017.
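To give a sense of how playlist data like this can be turned into album lists, here is a minimal sketch. It is not the project's make_dataset.py. It assumes each raw file is JSON with a top-level "playlists" list, where each playlist has a "name" and a "tracks" list whose entries carry an "album_name" field. Check these keys against the files you actually download. The function name `albums_per_playlist` is hypothetical.

```python
import json


def albums_per_playlist(slice_path: str) -> list[tuple[str, list[str]]]:
    """Return (playlist name, unique album names) pairs from one JSON file.

    The key names used here are assumptions about the raw file layout.
    """
    with open(slice_path, encoding="utf-8") as f:
        data = json.load(f)

    result = []
    for playlist in data.get("playlists", []):
        albums: list[str] = []
        for track in playlist.get("tracks", []):
            album = track.get("album_name")
            # Keep first-seen order and skip duplicates or missing names.
            if album and album not in albums:
                albums.append(album)
        result.append((playlist.get("name", ""), albums))
    return result
```

Returning pairs rather than a dictionary keeps playlists with identical titles separate, which matters because playlist titles are chosen by users and may repeat.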
Contributions
Brinnae Bent
Jon Reifschneider