andrewvos.com


Amount of profanity in git commit messages per programming language

2011-02-21

Edit #1: By popular demand, here's a list of the commit messages with swear words in them. Also I added a bar chart below because this guy has no idea how to interpret pie charts :)

Edit #2: I've added a bar chart showing the number of times certain words were used. Note that I got the initial word counts a tiny bit wrong (I forgot to downcase the words when searching).

Edit #3: Added another chart for despo

Edit #4: By popular demand I've added Perl!

Last weekend I really needed to write some code. Any code. I ended up ripping just under a million commit messages from GitHub.

The plan was to find out how much profanity I could find in commit messages, and then show the stats by language. These are my findings:

Out of 929857 commit messages, I found 210 swear words (using George Carlin's Seven dirty words).

Note that I ripped an equal number of commit messages per language, so the results aren't skewed by how many projects there are per language.
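A count like this can be sketched in a few lines of Python. This is an illustrative sketch, not the script behind the numbers above: the function name `count_listed_words`, the stand-in word list, and the sample messages are all hypothetical, and matching whole lowercase tokens is an assumption about how the search could be done. It does show why downcasing matters: without it, "Darn" and "DARN" would be missed.

```python
import re
from collections import Counter

# Hypothetical stand-in list; the post used George Carlin's seven dirty words.
WORD_LIST = {"darn", "heck"}


def count_listed_words(messages_by_language, word_list=WORD_LIST):
    """Return {language: Counter of listed words found in its commit messages}."""
    wanted = {word.lower() for word in word_list}
    counts = {}
    for language, messages in messages_by_language.items():
        found = Counter()
        for message in messages:
            # Lowercase before matching so "Darn" and "DARN" are counted too.
            for token in re.findall(r"[a-z']+", message.lower()):
                if token in wanted:
                    found[token] += 1
        counts[language] = found
    return counts


# Made-up sample input: the same number of messages for each language.
sample = {
    "ruby": ["Fix the darn build", "Add tests"],
    "perl": ["HECK, revert this", "darn darn typo"],
}
totals = {lang: sum(c.values()) for lang, c in count_listed_words(sample).items()}
print(totals)  # {'ruby': 1, 'perl': 3}
```

The sample input is made up; the real counts are the ones reported in this post.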

Here's that data in a pretty format:

[Charts: Profanity by Programming Language; Total Swear Words; Total Other Words]

If anyone is interested, the source is up on GitHub.