How to find underlying topics in song lyrics by implementing Latent Dirichlet Allocation in Python using gensim, NLTK, and pyLDAvis

Have you ever had the feeling that most popular songs on the radio or in the charts cover the same topics? Well…I did, which is why I applied topic modeling to a corpus containing the lyrics of over 120,000 songs in order to discover their underlying themes.

Here are the main Python libraries used for this project:

  • langdetect for filtering songs that are not primarily in English


The main data source for our topic model is the AZLyrics dataset from Kaggle [1]. It contains approximately…

How to easily create an awesome graph visualization using Python, Gephi, and GitHub Pages

In this article, I will provide a step-by-step guide on how you can create, publish, and share interactive network graph visualizations in 5 simple steps.

Before we get started, here are the three things we need:

  • Python & NetworkX library [1]

Step 1: Import the CSV file and create a NetworkX graph

Our data source can be in almost any format (e.g. TSV, pandas DataFrame, or array), but in this article, we will focus on the CSV (comma-separated values) format. In this tutorial, I will use the Marvel…
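As a rough illustration of Step 1, here is a minimal sketch of reading an edge list from CSV and turning it into a NetworkX graph. The column names `source` and `target`, the sample rows, and the helper `graph_from_csv` are assumptions made up for this example; they are not taken from the dataset used in the article.

```python
import csv
import io

import networkx as nx

# Hypothetical edge list standing in for a real CSV file on disk.
# The "source" and "target" column names are assumed for illustration.
csv_text = """source,target
Alice,Bob
Bob,Carol
Alice,Carol
Carol,Dave
"""


def graph_from_csv(handle):
    """Build an undirected graph from CSV rows with 'source' and 'target' columns."""
    graph = nx.Graph()
    for row in csv.DictReader(handle):
        # add_edge creates both nodes if they do not exist yet.
        graph.add_edge(row["source"], row["target"])
    return graph


G = graph_from_csv(io.StringIO(csv_text))
print(G.number_of_nodes(), G.number_of_edges())
```

To read from an actual file instead, the same helper could be called as `graph_from_csv(open("edges.csv", newline=""))`, where `edges.csv` is a placeholder file name.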

Building an easy-to-use Bundesliga data analysis app with Python and Streamlit

As the latest Bundesliga season is coming to an end and Bayern München is close to winning its 9th consecutive championship, I recently wondered whether football in Germany has changed at all in the last few years. I often spend my Saturdays watching Bundesliga matches and think to myself: “I could swear matches were more interesting last season”. More goals, more shots on goal, more fouls. However, I am never able to find adequate data or analyses that might confirm my gut feeling. …

