Warning: Adult Language Ahead
I have a passion for rhymes, specifically in hip hop. I may sound cliché but seeing the way rap artists string together syllables into a cool cohesive line… it’s like magic.
My original inspiration for this project came from this video (skip to 4:19 to see an example of a deconstructed verse).
In the video, they manually highlighted the words that rhyme with each other. My goal for the project was to automate the entire process in a deployed webapp. This was one of the biggest projects I had undertaken (at the time), involving web scraping, algorithms, text processing, and a webapp with a front end, a back end, and a database.
To see its current iteration follow this link.
Here’s an example from the exceptionally lyrical artist Danger Doom:
The repo for those interested.
If you are interested in how I did it, read on. But be warned, since this was a text analysis project, there will be walls of text.
The project originated as a small web scraping project for Software Design. I originally aimed to create a Python program that gave an overall rhyme score to any song it was pointed at.
To start off, I had to find out whether two words rhyme. Due to the idiosyncrasies of human language, comparing words character by character does not give a good enough approximation of how they are pronounced. Instead, I used a Python script along with the CMU Pronouncing Dictionary to break the words up phonetically and then compare the ends. Each phonetic unit that matches at the same position adds 100 points to the rhyme score of the pair, while one that matches but is one position off adds 50. (To be honest, the point values are completely arbitrary; 100 just seemed like a fun number.) For example, Respected and Directed have 4 pronunciation positions lined up and therefore have a score of 400.
Through fine tuning, I found that requiring more than 150 points across the last 3 units is a good approximation of whether two words “sound alike” (at least for the purposes of rap). However, these parameters could be changed and fine tuned for different results.
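To make the scoring concrete, here is a minimal sketch of the idea, not the actual code from the repo. The names rhyme_score, sounds_alike, and TOY_PHONES are made up for illustration, the tiny hand-written dictionary stands in for the CMU Pronouncing Dictionary (stress digits left out), and my reading of "one position off" as "the same unit sits one slot earlier or later in the other word" is an assumption.

```python
# Hypothetical stand-in for the CMU Pronouncing Dictionary (stress digits omitted).
TOY_PHONES = {
    "cat": ["K", "AE", "T"],
    "hat": ["HH", "AE", "T"],
    "fast": ["F", "AE", "S", "T"],
    "fat": ["F", "AE", "T"],
    "dog": ["D", "AO", "G"],
}


def rhyme_score(phones_a, phones_b, tail=3):
    """Score how alike the last `tail` phonetic units of two words are."""
    # Reverse the tails so index 0 is the final unit of each word.
    a = phones_a[-tail:][::-1]
    b = phones_b[-tail:][::-1]
    score = 0
    for i, phone in enumerate(a):
        if i < len(b) and b[i] == phone:
            score += 100  # same unit at the same position
        elif phone in b[max(i - 1, 0):i] + b[i + 1:i + 2]:
            score += 50  # same unit, but one position off (assumed meaning)
    return score


def sounds_alike(word_a, word_b, threshold=150):
    """Two words count as rhyming when their score is above the threshold."""
    return rhyme_score(TOY_PHONES[word_a], TOY_PHONES[word_b]) > threshold


print(rhyme_score(TOY_PHONES["cat"], TOY_PHONES["hat"]))   # 200
print(rhyme_score(TOY_PHONES["fast"], TOY_PHONES["fat"]))  # 150
print(sounds_alike("cat", "hat"), sounds_alike("fast", "fat"))  # True False
```

With the threshold at 150, "fast" and "fat" land exactly on the boundary and are rejected, which shows how sensitive the results are to these hand-picked numbers.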
Now that we can tell if two words rhyme, we have to find rhymes in a sentence. To do this, we simply go through the sentence and compare each word to the 10 words before it (another fine tuned parameter), then note each pair of words that rhyme.
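The look-back search could be sketched roughly like this. Again, this is an illustration under assumptions rather than the repo's code: find_rhyme_pairs is a hypothetical name, and toy_rhymes is a deliberately crude placeholder (matching the last two letters) so the example runs without a pronunciation dictionary.

```python
def find_rhyme_pairs(words, rhymes, window=10):
    """Return (earlier_index, later_index) for every rhyming pair in range.

    `rhymes` is any function taking two words and returning True or False.
    Each word is compared only to the `window` words that come before it.
    """
    pairs = []
    for i, word in enumerate(words):
        for j in range(max(0, i - window), i):
            if rhymes(words[j], word):
                pairs.append((j, i))
    return pairs


def toy_rhymes(a, b):
    # Placeholder check only: different words sharing their last two letters.
    return a != b and a[-2:] == b[-2:]


line = "the cat sat on the mat".split()
print(find_rhyme_pairs(line, toy_rhymes))            # [(1, 2), (1, 5), (2, 5)]
print(find_rhyme_pairs(line, toy_rhymes, window=1))  # [(1, 2)]
```

Storing index pairs rather than the words themselves keeps repeated words distinct, which matters later when each occurrence has to be colored on the page.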
In order to grab the lyrics, I used the Beautiful Soup Python module to parse and scrape web pages. To keep the input consistent, I required all input to be in the form of a URL to metrolyrics.com. This way I could use the same code to get the lyrics every time.
Now I can take any song and get an overall metric for how much it rhymes, given only a URL to its Metrolyrics page.
Now for the User Interface.
I ran the webapp with a Python back end using Flask and deployed it on Heroku. I easily created a start screen that accepts user input and returns the necessary URL, but I ran into the problem of being unable to easily initialize a webpage with the required colored words. I found the best way to solve this problem was to store the information in a database (using SQLite) where it can be read by the secondary webpage that the user is sent to. This has the added benefit that when any user searches for a song that has been searched before, the information can be pulled up much more quickly, since none of the text processing needs to be repeated.
Finally, with the web scraping, text processing, and database handled, I added some HTML and color to make it (somewhat) less bare bones.
Obviously there were more details, iteration, debugging, and head banging than described here, so as always, feel free to email me if you have any questions!
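The caching idea looks something like the sketch below. It uses Python's built-in sqlite3 module with an in-memory database; the table layout, the get_or_compute helper, the fake_analysis function, and the example URL are all hypothetical stand-ins, not what the deployed app actually uses.

```python
import json
import sqlite3


def get_or_compute(conn, url, compute):
    """Return the stored result for `url`, computing and saving it if missing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS songs (url TEXT PRIMARY KEY, result TEXT)"
    )
    row = conn.execute(
        "SELECT result FROM songs WHERE url = ?", (url,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # seen before: skip the text processing
    result = compute(url)
    conn.execute(
        "INSERT INTO songs (url, result) VALUES (?, ?)",
        (url, json.dumps(result)),
    )
    conn.commit()
    return result


def fake_analysis(url):
    # Stand-in for the real scraping and rhyme detection.
    print("processing", url)
    return {"pairs": [[1, 2], [1, 5]]}


conn = sqlite3.connect(":memory:")
first = get_or_compute(conn, "https://example.com/some-song", fake_analysis)
second = get_or_compute(conn, "https://example.com/some-song", fake_analysis)
print(first == second)  # True, and "processing" is printed only once
```

Serializing the result as JSON keeps the table to two columns, at the cost of not being able to query individual rhyme pairs in SQL.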