Turns out mine is bad. I’m also uncool and old. More on that in a bit. Let me tell you how I found out…
I’m a big fan of Pudding.cool. I love their visual storytelling. It’s one of the handful of sites I’ll type in by URL to visit (yep, like an old guy). If nothing else, you should sign up for their Winning the Internet newsletter, which is where I found the link to their How Bad Is Your Spotify? project. I was just thinking how bad Spotify had been lately at suggesting interesting playlists, so I eagerly clicked over, hoping to find out why the music service had lost a step in its personalized recommendations.
Instead, I found there was a different concept at play:
Our sophisticated AI judges your awful taste in music.
I recognized this idea immediately. Quickly judging someone’s taste in music was something I often did in my head at the record store where I worked during college, The Disc Exchange (RIP). Like The Pudding’s Spotify project (and every 20-something hipster), I also started from the position that most people have awful taste. To be clear, the Disc Exchange wasn’t Other Music and there wasn’t a ton of attitude at the store. It had more to do with my youthful bias and an unsettling exposure to the music-buying public that conditioned me to assume the worst in people after a while.
If, in my one- to two-minute assessment, I decided a person’s music tastes were bad, they would get an “Oh, I bet you’ll like the new Pearl Jam” response when they asked, “What’s something new I should listen to?” It was my ’90s record-store-clerk version of “Bless your heart”: in short, mean and stupid.
So now the fine folks at The Pudding have figured out how to give chatbots record-store smugness. Finally, we’re putting technology to good use! The folks at The Pudding are smart, and I assume this whole music taste project was done with their tongues firmly placed in their cheeks. What is more interesting is that on a meta level, they have just demonstrated how biased “artificial intelligence” is due to the people who make it and the data that goes into it. Let me explain.
How good is your training data?
Like all good machine learning projects, The Pudding developers were using OBJECTIVE DATA to train their machine learning model to judge your Spotify listening history. If you take Pitchfork, Brooklyn Vegan, and hundreds of other music blogs and amalgamate all that coolness into a bona fide artificial intelligence algorithm, well, all you got are FACTS….right? I even clicked the fine print to ensure there was some legit tech testing my listening worthiness:
Oh, good, they also included subreddits! Clearly this was comprehensive science, so I was in. I logged in to give them access to my Spotify account. To ensure this process was free of misinterpretation or bias, I was prompted with qualifying questions:
I thought it was a nice touch that they let the user know the AI was architected by real music snobs with the “lol” and “omg” comments upon looking at my collection—er, Spotify listening history. I lived in fear of my wall of CDs being judged by people that came to my apartment. That’s because I also judged people by their collections. “So, you own every album by Marillion? Fascinating.” (Note: That would turn out to be cool AF after all. Another hipster dilemma for another post.)
I did get a little irritated by the ironic question because I’m a Gen X’er whose Alanis and Reality Bites exposure instilled fear of misuse of that rhetorical device at a formative age. What I do know is “Down the Dream” is a badass song by these badass sisters. Check it out:
So I double-checked. Nope, I wasn’t listening to Maggie and Terre Roche ironically! In fact, I went down to Mill Valley Music and bought a vinyl copy of that record. Gary, who runs the store, is literally the nicest guy in the world and would never judge you or your tastes—you know, like it should be.
As the AI plowed through my listening history, it discovered other indicators of my Spotify awfulness:
Wait… the Grateful Dead are cool now, right?! Phil Cook is definitely cool. Is being a Lambchop stan a bad thing? Last time I saw them in the before times, they were still playing small theaters full of hipsters. I wasn’t sure, but I still had a bit of confidence the AI was going to figure out that this ol’ record store clerk still had it.
Wait…is listening to Mount Kimbie a sign I’m depressed or something?! It’s not Swedish Death Metal, which I would think of as truly morbid. But now I’m at the age when I have no idea about anything anymore. Maybe all of music is a bit miserable like Hornby wondered:
“What came first—the music or the misery? Did I listen to the music because I was miserable? Or was I miserable because I listened to the music? Do all those records turn you into a melancholy person?”
—Nick Hornby, High Fidelity
But now things were getting really awkward. I didn’t want to fuck, marry, or kill Hiss Golden Messenger, Lambchop, or Bob Dylan. Although I should point out that all of them seem reasonably suited for your sexual or life partner needs. I want all of them to live a long time, too. This felt like we were training the AI to fear us…again. But I was so invested in this test that I couldn’t back out now. I put my hands over my eyes and started clicking until the test moved on.
I got an initial “Hey, your Spotify looks great!” message. Yay! But then…the AI changed its mind, erased that sentiment, and started to tell me what it (and its creators) really thought:
While I will freely admit a Stones addiction, there was only one week where I listened to Nothing’s Shocking two or three times while running (BTW, still great!). I had to double-check what they meant by stomp and holler to know why that was a knockdown. Actually, that categorization makes sense as I’m all over that playlist…but none of those artists are from the mid-’90s. Or mainstream country. Or necessarily yoga-inspired for a nag champa burndown. Weird.
What is signal and what is bias?
What this exposes is a bias toward whatever data The Pudding’s AI got from Spotify and the biases its makers put in when training the model. What is the right number of times to listen to something that signals that a piece of music defines me? I think this is part of the Spotify recommendations problem too. How does Spotify balance my “liked songs” with songs I play repeatedly? Then how do they make recommendations from my listening cohorts and lookalike users when I only wanted to listen to Jane’s Addiction for a week of nostalgia? What about that day I wanted to figure out what Hyperpop was…but didn’t like it? It can get complicated quick.
As my Spokestack data scientist friend Will Rice said, “The important take home is that algorithms are sold as objective but are actually subjective because of the selection bias for recommendation features.” To Will’s point, I’d love to look through the training data to figure out where Jane’s Addiction or The Roches set off warning bells of my music stankiness. Is Gorilla vs. Bear just hating like crazy on The Roches? Somewhere, someone made a decision about what was cool based on year, genre, etc. that echoed through the whole AI’s “objective analysis.”
And with chatbots, the developer biases are always clear. You don’t think the AI learned these ageist and stereotypical putdowns on its own, do you? I mean, I don’t read all the music blogs they’re ingesting to train their machine learning model, but are these the things you say to someone whose music doesn’t jibe with the new arbiters of cool?
Nope. Engineers put those phrases in little reference YAML files in a conversation manager they whipped up for the chatbot to use as responses based on the inputs from my “bad Spotify.” No magic here, folks. Just good ol’ social engineering.
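To make that concrete, here is a minimal sketch of how a canned-response chatbot like this could work. I haven't seen The Pudding's code, so everything here is an assumption for illustration: the `ListeningSummary` fields, the rule labels, the thresholds, and the putdowns are all made up, and I've used a plain Python list where a real project might load the phrases from a YAML file.

```python
from dataclasses import dataclass


@dataclass
class ListeningSummary:
    """Hypothetical digest of a listening history (not a real Spotify object)."""
    median_release_year: int
    top_genre: str


# Hypothetical rules, checked in order: (label, test, canned reply).
# Every threshold and phrase below is a human decision, not something "learned".
RULES = [
    ("too_old", lambda s: s.median_release_year < 2000,
     "Your taste peaked before streaming existed."),
    ("stomp_and_holler", lambda s: s.top_genre == "stomp and holler",
     "Too much stomp and holler."),
]

DEFAULT_REPLY = "Hey, your Spotify looks great!"


def pick_reply(summary: ListeningSummary) -> tuple[str, str]:
    """Return the label and canned reply of the first rule that matches."""
    for label, test, reply in RULES:
        if test(summary):
            return label, reply
    return "default", DEFAULT_REPLY


if __name__ == "__main__":
    print(pick_reply(ListeningSummary(1994, "alternative rock")))
    print(pick_reply(ListeningSummary(2019, "stomp and holler")))
```

The point of the sketch is that the "judgment" lives in the rule table. Change the year cutoff or the genre list and the same listening history gets a different verdict.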
So let’s talk more about the inputs that categorized me as such a stereotypical loser, and remember, this was derived from the objective data!
Well, it’s hard to dispute the data. I love all those artists and songs, so color me awful. What’s interesting is that these results demonstrate where the data inputs fall apart. Here are the telltales:
- The default playlist problem: “Hawk” by Brasstronaut was the first song on a playlist I made in August to listen to while running and working out. Out of sheer laziness, and the pure self-hatred I feel whenever I pick a “power workout” playlist, I kept playing my workout playlist over and over again this fall. In fact, four of the five songs on my “too much” results come from this playlist.
- The first track on a new album problem: The only track in my top 5 that isn’t on that workout playlist is the Mount Kimbie song, “Four Years and One Day.” That track is the first track on their album Love What Survives. Two years after its release, I still listen to that album as a default when I can’t think of anything else to listen to because it is brilliant. Unfortunately, that track is not even close to my top five favorite songs on that album, much less in general.
- Another default album problem: I really didn’t listen to a lot of Lambchop this year like I have in years past, and I did it purposely. Their album Is a Woman was my go-to for focus or sleep for a long time. Why did I stop actively listening to it? For the last few years, Lambchop’s “Is a Woman” came up as a top album for me in my “end of year” Spotify reports. It angered me that I was paying Spotify to rent an album that I own on vinyl, CD, and FLAC formats. More on that issue in another post.
- Autoplay and the artist who has 500+ albums problem: The thing about Bill Evans, Bob Dylan, and the Grateful Dead is they have a lot of albums. So listening to them or, more often, letting them keep playing while I do other things overweights their influence on my listening output. If I put on Bill Evans’s Undercurrent and start doing other things, three of his albums can play before I turn it off or listen to something else.
My point is the data is more influenced by how I listen to music than by why I listen to it. Continuous play mode, mobile access (“going for a run”), and sheer passive listening brought on by cloud-based music have muddied the signal of what we like and enjoy listening to the most. If you got into my car when I was in high school, it was clear from the wear on my cassettes that The Replacements’ Tim, Hüsker Dü’s Candy Apple Grey, R.E.M.’s Lifes Rich Pageant, and The Cult’s Electric defined me. I voted with my limited amount of money and space. Now, with unlimited storage for $9.99 a month, there are no clear signals of what I really value unless you come over and see my vinyl collection. Again, like an old.
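Here is a rough sketch of the how-versus-why problem, using made-up play events. The `initiated_by_user` flag and the `passive_weight` value are my own assumptions for illustration. I don't know whether Spotify exposes or uses anything like them.

```python
from collections import Counter, defaultdict

# Made-up listening log: one album I picked, five that autoplay kept going,
# versus three plays of a record I chose on purpose each time.
plays = (
    [{"artist": "Bill Evans", "initiated_by_user": True}]
    + [{"artist": "Bill Evans", "initiated_by_user": False}] * 5
    + [{"artist": "The Roches", "initiated_by_user": True}] * 3
)


def raw_counts(events):
    """Count every play equally, the way a naive 'top artists' list would."""
    return Counter(e["artist"] for e in events)


def weighted_counts(events, passive_weight=0.1):
    """Count deliberate plays fully and autoplay plays at a fraction.

    passive_weight is an arbitrary illustrative choice, which is its own bias.
    """
    totals = defaultdict(float)
    for e in events:
        totals[e["artist"]] += 1.0 if e["initiated_by_user"] else passive_weight
    return dict(totals)


if __name__ == "__main__":
    raw = raw_counts(plays)
    weighted = weighted_counts(plays)
    print("Raw top artist:", raw.most_common(1)[0][0])
    print("Weighted top artist:", max(weighted, key=weighted.get))
```

With raw counts, the artist autoplay kept feeding me comes out on top. Discount the passive plays and the record I actually chose wins. Neither number is "objective," since somebody still has to pick the weight.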
Figuring out what data we are sending that truly contains the signal of our likes and dislikes will always be challenging. Netflix, Amazon, Spotify, et al. struggle with it as their data catalogs get bigger and our consumption becomes more ambient than active. All of these companies choke over and over on recommendations due to the overabundance of non-contextual inputs and data complexity. It’s hard to make personal recommendations at scale.
This is why I get most of my music recommendations from Worldwidefm.net, Aquarium Drunkard, Bandcamp Weekly, and Mixcloud DJs. Humans still make better playlists. It is also why I think we’ll see a comeback in favor of human-based recommendations. Even Spotify is betting on humans to make better playlists, which is why it launched a service late last year that enables everyone to create podcasts with licensed music.
What The Pudding really judged was my “newness,” which was pretty weak this year. This was more clearly pointed out in my Last.fm recap:
So while I’ll own my flannel-shirt wearin’, mid-’90s craft beer persona proudly, I do get the point that I need to up my new-music game. Anyone got any recommendations? No AIs allowed.
Hat tip to Ben, Monteiro, Om, and Drew, who all inspired me with their physical ‘zines (YES), blog posts, and newsletters. All of you contributed to my development and overall happiness throughout last year with your insightful, heartfelt writing. Thanks!