SemanticSounds: Lyric Semantic Meaning Recommendation System


This project analyzes the relationship between audio features and lyrical content using large Spotify datasets from Kaggle. By integrating machine learning, we build a user-tailored recommender system, explore the semantic meaning of song lyrics, and use those semantics to improve recommendations. The goal is to prototype a recommender that is more personalized than those found in mainstream music apps. The repository contains the source code along with the poster presentation and research paper.

Below is a YouTube video that explains the project and includes a demo:

📺 Semantic Sounds Demo | 📄 Read the Research Paper




🎯 Overview

Traditional music recommendation systems filter songs by artist, genre, or audio similarity. SemanticSounds goes beyond surface-level features by analyzing the semantic meaning behind song lyrics using sentence-BERT (sBERT) embeddings.

Research Questions

  1. Feature-Popularity Correlation: Which song features (tempo, energy, danceability, lyrical complexity) correlate with popularity over time?
  2. Recommender System Development: Can we build a system that recommends songs based on both audio features AND lyrical meaning?
  3. Semantic Analysis: How do lyrical themes evolve across decades (1950s-2010s)?

Motivation

  • For Listeners: Discover songs with similar emotional and thematic content, not just similar beats
  • For Artists: Understand which lyrical themes resonate with audiences
  • For Industry: Identify cyclical trends and optimize marketing strategies

✨ Key Features

| Feature | Description |
| --- | --- |
| Dual Recommendation Engines | Base (audio features) + Enhanced (semantic lyrics) |
| Fuzzy Matching | Handles misspelled song/artist names gracefully |
| Semantic Clustering | Groups songs by meaning using UMAP + clustering algorithms |
| Temporal Analysis | Word clouds and trend analysis across musical eras (1950s-2010s) |
| Interactive Visualizations | Plotly-based interactive embedding visualizations |

Example: Semantic vs. Base Recommendations

Input: "Judas" by Lady Gaga (a highly religious-themed song)

| Base Recommender (Audio Features) | Semantic Recommender (sBERT) |
| --- | --- |
| Generic pop songs with similar tempo/energy | "Edge of Heaven" |
| Songs matching mood/vibe | "When You Were Young" |
| Similar danceability scores | "Original Sin", "Devil Inside" |

The semantic recommender captures the religious motifs in the lyrics!


๐Ÿ“ Project Structure

```
SemanticSounds/
│
├── 📄 README.md                          # Main project documentation
├── 📄 paper.pdf                          # Research paper
├── 📄 Rhythms Through Time_ Hu.pdf       # Poster presentation
│
├── 📁 demo/                              # Demo materials
│
└── 📁 recommender_src/                   # Source code and data
    │
    ├── 📓 music_recommender_base.ipynb   # Base recommender (audio features)
    │   └── Contains:
    │       • EDA & data preprocessing
    │       • Feature engineering pipeline
    │       • Regression models (Linear, Ridge, Lasso, RF, XGBoost)
    │       • Neural network implementation
    │       • SHAP feature importance analysis
    │       • Base recommendation system
    │
    ├── 📓 music_recommender_sbert.ipynb  # SBERT-enhanced recommender
    │   └── Contains:
    │       • Fuzzy matching & record linkage (merging datasets)
    │       • Lyric preprocessing pipeline
    │       • SBERT embedding generation
    │       • UMAP dimensionality reduction
    │       • Clustering (KMeans, DBSCAN, HDBSCAN, Agglomerative)
    │       • Word cloud generation by decade
    │       • Semantic recommendation system
    │
    ├── 📊 top_10000_1950-now.csv         # Spotify audio features dataset
    ├── 📊 spotify_60000_songs.csv        # Song lyrics dataset (Git LFS)
    ├── 📊 merged_data2.csv               # Merged dataset after fuzzy matching
    │
    └── 📄 README.md                      # Source code documentation
```

📊 Datasets

Dataset 1: Spotify Top 10,000 Songs (1950-2024)

| Attribute | Details |
| --- | --- |
| File | top_10000_1950-now.csv |
| Size | 10,000 songs, 35 features |
| Key Features | Danceability, Energy, Acousticness, Valence, Speechiness, Liveness, Loudness, Tempo, Popularity, Artist Genres, Album Release Date |

Dataset 2: Song Lyrics (57,650 songs)

| Attribute | Details |
| --- | --- |
| File | spotify_60000_songs.csv |
| Size | 57,650 songs |
| Columns | Artist, Song, Link, Text (full lyrics) |

Dataset 3: Merged Dataset

| Attribute | Details |
| --- | --- |
| File | merged_data2.csv |
| Matching Method | Fuzzy matching with RapidFuzz + RecordLinkage (threshold: 80%) |
| Final Size | ~2,011 matched entries |

🚀 Installation

Option 1: Google Colab (Recommended)

The notebooks are designed to run in Google Colab with Google Drive integration:

  1. Upload the data files to your Google Drive under Colab Notebooks/
  2. Open the notebooks in Colab
  3. Run the setup cells to install dependencies

Option 2: Local Installation

```bash
# Clone the repository
git clone https://github.com/aqn96/SemanticSounds.git
cd SemanticSounds/recommender_src

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies (quote extras so the shell doesn't expand the brackets)
pip install pandas numpy scikit-learn tensorflow sentence-transformers
pip install umap-learn hdbscan plotly matplotlib seaborn
pip install "thefuzz[speedup]" rapidfuzz recordlinkage
pip install nltk gensim shap xgboost lightgbm wordcloud openpyxl

# Download NLTK resources
python -c "import nltk; nltk.download('stopwords'); nltk.download('wordnet'); nltk.download('omw-1.4')"
```

Key Dependencies

pandas, numpy, scikit-learn
tensorflow, sentence-transformers
umap-learn, hdbscan
thefuzz, rapidfuzz, recordlinkage
plotly, matplotlib, seaborn
shap, xgboost, lightgbm
nltk, gensim, wordcloud

💻 Usage

Running the Base Recommender

Open music_recommender_base.ipynb and run all cells. At the end, use the interactive recommender:

```python
# Interactive mode - prompts for user input
recommend_similar_songs()
```

Example Session:

```
Enter the name of the song (required): toxic
Enter the artist name (optional): britney spears
Enter the number of recommendations you want (default 10): 15

Matched Song: 'Toxic' with score 100
Matched Artist: 'Britney Spears' with score 77

Recommended Songs (Top 15):
- 'Burning Up' by Madonna from the album 'Madonna'
- 'Turn Around (5,4,3,2,1)' by Flo Rida from the album 'Only One Flo (Part 1)'
- 'Sorry' by Joel Corry from the album 'Sorry'
- 'Me Against the Music' by Britney Spears, Madonna from the album 'In The Zone'
...
```

Running the SBERT-Enhanced Recommender

Open music_recommender_sbert.ipynb and run all cells. The semantic recommender considers lyrical meaning:

```python
# Recommender with SBERT embeddings
recommend_similar_songs()  # Same interface, different results!
```

Example: "Firework" by Katy Perry

  • Because the lyrics are about "explosiveness," "fire," and "burning"
  • Recommends: "Burn," "Firefly," songs with passion/fire themes

🔬 Methodology

Data Pipeline

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Spotify 10K    │     │  Lyrics 60K     │     │  Fuzzy Match    │
│  Audio Features │────▶│  Song Lyrics    │────▶│  & Record Link  │
└─────────────────┘     └─────────────────┘     └────────┬────────┘
                                                         │
                                                         ▼
                                               ┌─────────────────┐
                                               │  Merged Dataset │
                                               │  (~2,011 songs) │
                                               └────────┬────────┘
                                                        │
                        ┌───────────────────────────────┼───────────────────────────────┐
                        ▼                               ▼                               ▼
              ┌─────────────────┐            ┌─────────────────┐            ┌─────────────────┐
              │ Audio Features  │            │ SBERT Embeddings│            │ Time Period     │
              │ Engineering     │            │ (768-dim)       │            │ Binning         │
              └────────┬────────┘            └────────┬────────┘            └────────┬────────┘
                       │                              │                              │
                       └──────────────────────────────┼──────────────────────────────┘
                                                      ▼
                                            ┌─────────────────┐
                                            │  Combined       │
                                            │  Feature Vector │
                                            └────────┬────────┘
                                                     │
                                    ┌────────────────┼────────────────┐
                                    ▼                                 ▼
                          ┌─────────────────┐               ┌─────────────────┐
                          │ Clustering      │               │ Recommendation  │
                          │ (UMAP + KMeans) │               │ (Euclidean Dist)│
                          └─────────────────┘               └─────────────────┘
```

Feature Engineering (Base Notebook)

| Feature Type | Examples | Purpose |
| --- | --- | --- |
| Interaction Terms | energy × valence, dance × energy | Capture feature relationships |
| Binned Features | tempo_slow/moderate/fast, duration_short/medium/long | Discretize continuous variables |
| Derived Features | is_instrumental, overall_mood, Age_of_Song | Domain-specific indicators |
| One-Hot Encoded | Genre_pop, Genre_rock, Genre_dance pop | Categorical representation |
| Artist Proxy | Artist_Popularity (mean popularity per artist) | Artist influence factor |
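To make the table concrete, here is a hedged sketch of these transforms on a single track dict. The bin thresholds and the instrumentalness cutoff are invented for illustration, not the notebook's actual cut-points:

```python
def engineer_features(track):
    """Add interaction, binned, and derived features to a raw track dict
    (illustrative thresholds; the notebook's exact bins may differ)."""
    f = dict(track)
    f["energy_x_valence"] = track["energy"] * track["valence"]      # interaction term
    f["tempo_bin"] = ("slow" if track["tempo"] < 90                 # binned feature
                      else "moderate" if track["tempo"] < 140 else "fast")
    f["is_instrumental"] = track["instrumentalness"] > 0.5          # derived indicator
    return f

track = {"energy": 0.8, "valence": 0.5, "tempo": 120, "instrumentalness": 0.02}
engineer_features(track)["tempo_bin"]  # → "moderate"
```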

SBERT Embedding Pipeline (SBERT Notebook)

```python
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA
from umap import UMAP

# 1. Load SBERT model (GPU-accelerated)
model = SentenceTransformer('all-mpnet-base-v2', device='cuda')

# 2. Generate 768-dimensional embeddings from lyrics
embeddings = model.encode(lyrics_list, batch_size=32)

# 3. Reduce dimensions: PCA (768 → 300) → UMAP (300 → 2)
pca = PCA(n_components=300)
embeddings_pca = pca.fit_transform(embeddings)

umap_model = UMAP(n_components=2, n_neighbors=15, min_dist=0.1)
embeddings_2d = umap_model.fit_transform(embeddings_pca)
```

Lyric Preprocessing

```python
# A runnable sketch reconstructed from the notebook's commented steps
# (exact regexes may differ):
import re
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()  # requires nltk.download('wordnet')

def preprocess_lyrics(text):
    text = re.sub(r'\([^)]*\)', '', text).lower()          # 1-2. drop (...) parts, lowercase
    lines = [re.sub(r"[^a-z0-9' ]", ' ', ln) for ln in text.splitlines()]   # 4. strip specials
    lines = [re.sub(r"\b(\w+)( \1\b){2,}", r'\1 \1', ln) for ln in lines]   # 5. cap repeats at 2
    lines = [' '.join(lemmatizer.lemmatize(w) for w in ln.split()) for ln in lines]  # 6. lemmatize
    lines = list(dict.fromkeys(ln for ln in lines if ln))  # 7. drop duplicate lines
    return ' / '.join(lines)                               # 3. rejoin lines with ' / '
```

Fuzzy Matching for Dataset Merge

  • Libraries: RapidFuzz + RecordLinkage
  • Blocking on the first 2 characters plus a phonetic key (Double Metaphone)
  • Jaro-Winkler similarity threshold: 85%
  • Final match threshold: 80%
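As a rough illustration of the blocking-then-scoring idea, here is a stdlib sketch that uses `difflib.SequenceMatcher` as a stand-in for RapidFuzz's Jaro-Winkler scorer, on hypothetical toy rows:

```python
from difflib import SequenceMatcher

# Toy (artist, title) rows standing in for the two datasets
audio_rows = [("Lady Gaga", "Judas"), ("Britney Spears", "Toxic")]
lyric_rows = [("lady gaga", "judas"), ("britney  spears", "toxic"), ("INXS", "Devil Inside")]

def match_rows(left, right, threshold=0.80):
    """Block on the first 2 characters of the title, then keep pairs whose
    combined artist+title similarity clears the threshold."""
    matches = []
    for l_artist, l_title in left:
        block = l_title[:2].lower()
        for r_artist, r_title in right:
            if r_title.lower()[:2] != block:
                continue  # blocking: skip pairs that cannot plausibly match
            a = f"{l_artist} {l_title}".lower()
            b = f"{r_artist} {r_title}".lower()
            score = SequenceMatcher(None, a, b).ratio()
            if score >= threshold:
                matches.append((l_title, r_title, round(score, 2)))
    return matches
```

Blocking keeps the comparison count tractable (here 2×3 pairs shrink to 2 scored pairs); the notebooks do the same at scale with phonetic keys and RecordLinkage's indexers.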

📈 Results

Model Performance Comparison

Regression Models (Predicting Popularity)

| Model | MSE | R² | Improvement vs Baseline |
| --- | --- | --- | --- |
| Linear Regression | 368.78 | 0.512 | +103× |
| Ridge Regression | 368.77 | 0.512 | +103× |
| Lasso Regression | 368.76 | 0.512 | +103× |
| Random Forest | 473.25 | 0.373 | +75× |
| Gradient Boosting | 380.86 | 0.496 | +100× |
| XGBoost (Tuned) | 377.18 | 0.501 | +101× |
| Keras Neural Net | 471.68 | 0.376 | +76× |
| Baseline [1] | 883.80 | -0.005 | — |

Key Finding: Our models achieved R² = 0.512 vs the baseline's R² = -0.005

Clustering Performance (Silhouette Scores)

| Configuration | Silhouette Score | Interpretation |
| --- | --- | --- |
| Time Period Only (Benchmark) | 0.8084 | High separation, low semantic richness |
| sBERT Only | 0.4286 | Semantic grouping, more overlap |
| sBERT + Audio Features | 0.7464 | Best balance of meaning & separation |
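For reference, the silhouette score of a point is (b − a) / max(a, b), where a is its mean distance to its own cluster and b its mean distance to the nearest other cluster. A from-scratch sketch of the metric behind the table (the notebooks presumably call scikit-learn's `silhouette_score`; this assumes every cluster has at least 2 points):

```python
from math import dist

def silhouette(points, labels):
    """Mean silhouette score over all points, computed from scratch."""
    total = 0.0
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of p's own cluster
        own = [q for j, (q, l) in enumerate(zip(points, labels)) if l == lab and j != i]
        a = sum(dist(p, q) for q in own) / len(own)
        # b: mean distance to the closest foreign cluster
        b = min(sum(dist(p, q) for q in grp) / len(grp)
                for grp in ([q for q, l in zip(points, labels) if l == other]
                            for other in set(labels) if other != lab))
        total += (b - a) / max(a, b)
    return total / len(points)

# Two tight, well-separated clusters score close to 1
silhouette([(0, 0), (0, 1), (10, 0), (10, 1)], [0, 0, 1, 1])  # ≈ 0.90
```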

Feature Importance (SHAP Analysis)

```
Top Features Influencing Popularity:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1. Artist_Popularity    ████████████████████ 58.3%
2. Speechiness          ████████░░░░░░░░░░░░  9.9%
3. Acousticness         ████████░░░░░░░░░░░░  9.8%
4. Loudness             ████████░░░░░░░░░░░░  9.8%
5. Liveness             ████████░░░░░░░░░░░░  9.5%
```

Time Period Analysis

Songs were binned into musical eras for temporal analysis:

| Period | Era Name | Song Count |
| --- | --- | --- |
| 1950-1959 | Birth of Rock 'n' Roll | ~200 |
| 1960-1969 | Cultural Revolution | ~400 |
| 1970-1979 | Rise of Diverse Genres | ~600 |
| 1980-1989 | MTV Era and Electronic Explosion | ~1,200 |
| 1990-1993 | The End of an Era | ~800 |
| 1994-1996 | Expansion and Mainstream Success | ~900 |
| 1997-1999 | Technological Advancements | ~1,100 |
| 2000-2009 | Digital Revolution | ~2,500 |
| 2010-2019 | Streaming and Global Connectivity | ~2,300 |
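The binning itself is a lookup from release year to era. A stdlib sketch using the boundaries above (the notebooks may implement this differently, e.g. with pandas' `pd.cut`):

```python
from bisect import bisect_right

# Era start years and names, taken from the table above
ERA_STARTS = [1950, 1960, 1970, 1980, 1990, 1994, 1997, 2000, 2010]
ERA_NAMES = [
    "Birth of Rock 'n' Roll", "Cultural Revolution", "Rise of Diverse Genres",
    "MTV Era and Electronic Explosion", "The End of an Era",
    "Expansion and Mainstream Success", "Technological Advancements",
    "Digital Revolution", "Streaming and Global Connectivity",
]

def era_for_year(year):
    """Map a release year to its era via binary search over the start years."""
    return ERA_NAMES[bisect_right(ERA_STARTS, year) - 1]

era_for_year(1995)  # → "Expansion and Mainstream Success"
```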

🔮 Future Work

  1. Real-time Streaming Integration: Spotify API for live recommendations
  2. Enhanced NLP: Sentiment analysis, emotion detection per verse/chorus
  3. Production Deployment: FastAPI REST API, Docker containerization
  4. Advanced Modeling: Graph Neural Networks for artist relationships
  5. Multi-modal Fusion: Combine audio waveforms + lyrics + album art

👥 Contributors

  • An Nguyen (@aqn96) - Lead Developer, ML Engineering

📚 References

  1. Joe Beach Capital. (2023). Top 10,000 Songs: EDA and Models. Kaggle. Link

  2. Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. EMNLP.

  3. McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv.

  4. Spotify Web API Documentation. Audio Features. Link


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


โญ If you found this project useful, please consider giving it a star! โญ

📺 View Demo • 📄 Read the Paper • 🐛 Report Bug
