12/23/15

Don't Just Look At The Ratings: A Closer Look At Pitchfork's Numbers -- Part 2: All Of Pitchfork's Ratings, Like Pretty Much, Ever, And A Look At The Most Extreme Ones

Calvin gets it

Hey, wassup.  I'm back with more spreadsheets about music!  But seriously, folks, what's the fun in anything if you can't precisely quantify it? THIS IS HARD DATA!

If you're unsure whether I'm being sarcastic or not, honestly I'm not sure either at this point.  Regardless, I've been sitting on some great data that I scraped off the web way back last summer, and it's time I did something with it.  So I'm going to use a couple different metrics to find weird Pitchfork reviews.  Specifically, I'm going to look at Pitchfork's absolute lowest ratings, its ratings with the largest difference relative to a consensus, and its ratings that deviate most from a linear model.  Though this sounds boring, it led to me reading some truly lol-able reviews by Pitchfork and a couple other blogs.  I'm going to share some choice quotes from those reviews, alongside some classic armchair meta-criticism, to try and spice things up.



But first, a little more detail on the data.  Albumoftheyear.org aggregates album ratings from music criticism sites like Pitchfork, Spin, or The A.V. Club and puts them in a semi-accessible format that can be scraped into spreadsheet form.  I pulled all of their aggregate ratings for albums with 5 ratings or more since 2000, as well as Pitchfork's ratings from the same time period (ending, apparently, June 23, 2015), then smashed those two tables together for comparison.  The data steadily increases in quantity from the beginning of the sample to the end, as can be seen in this nifty little graph:
What's actually most interesting about this graph is the drop-off in albums released each December.  I'm pretty sure it's due to the need to drop an album before the end-of-year awards and holiday shopping seasons, but I'm not certain.  Not really related to the article, but I had to bring it up.
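(If you're curious what the table-smashing actually looks like, here's a rough pandas sketch.  The file names and columns -- aotycritscore, num_ratings, and friends -- are placeholders for whatever the scrape produces, not the literal files I used.)

```python
import pandas as pd

# Hypothetical file and column names -- stand-ins for the scraped spreadsheets.
aoty = pd.read_csv("aoty_scores.csv")        # artist, album, release_date, aotycritscore (0-100), num_ratings
pfork = pd.read_csv("pitchfork_scores.csv")  # artist, album, pitchfork_score (0-10)

# Keep only albums with 5 or more critic ratings, per the filter described above
aoty = aoty[aoty["num_ratings"] >= 5]

# Put Pitchfork on the same 0-100 scale as AOTY, then smash the two tables together
pfork["pitchfork_100"] = pfork["pitchfork_score"] * 10
merged = aoty.merge(pfork, on=["artist", "album"], how="inner")

# Album counts by release month, to reproduce the graph (note the December dip)
merged["release_date"] = pd.to_datetime(merged["release_date"])
monthly_counts = merged.set_index("release_date").resample("M").size()
print(monthly_counts.tail(12))
```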
Anyway, the obvious first thing to do is just look at the scores.  As far as I can tell, there is nowhere else on the internet where these data can be found, all in one spreadsheet.  It's kinda cool.  Also, note that these data do not include Pitchfork's "Reissue" reviews, which they use to review historical albums.  So yes, they definitely have given out more 10/10s than what you see here, but these are the only contemporary albums that they have given 10/10s to, since the year 2000.


First off, legendary MVP playlist band ...And You Will Know Us By The Trail Of Dead clockin' in with a 10/10 album leggo....


Below is a histogram of these ratings, if anyone cares.  Pitchfork sticks to what it knows.  The 1st and 3rd quartiles of these data are 6.4 and 7.8, respectively, which means that the middle 50% of all Pitchfork's scores in the last 15 years have fallen between 6.4 and 7.8.  Which is... interesting.
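(For anyone who wants to check the math, the quartiles and the histogram fall out of a few lines, building on the hypothetical merged table from the sketch above.)

```python
import matplotlib.pyplot as plt

# Quartiles of the 0-10 Pitchfork scores (column name assumed from the earlier sketch)
q1, q3 = merged["pitchfork_score"].quantile([0.25, 0.75])
print(f"1st quartile: {q1:.1f}, 3rd quartile: {q3:.1f}")  # 6.4 and 7.8 in my data

# The histogram itself
merged["pitchfork_score"].plot.hist(bins=50)
plt.xlabel("Pitchfork score")
plt.ylabel("Number of albums")
plt.show()
```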

The best part about this list is scrolling to the bottom (sorry, couldn't figure out how to make these embeddable tables sortable; you can easily download the spreadsheet and look at it in Excel).  Reading some of Pitchfork's most negative reviews is pretty fucking hilarious.  It's at the extremes that Pitchfork's true Pitchforkiness really comes out.  A synopsis of some of Pitchfork's most hated albums below....

#1 Lowest all time: 0.0  Liz Phair - Liz Phair
By: Matt LeMay, June 24, 2003
The first, last, and only 0.0 Pitchfork gave out in my sample was to Liz Phair for her eponymous fourth studio album.  I have not listened to this album.  I very much do not want to listen to this album.  But it seems that Matt LeMay did, and he did not like what he heard one bit.
"In recent interviews, Phair has been upfront about her hopes of mainstream success, and claims full awareness that Liz Phair is likely to alienate many of her original fans. What she doesn't seem to realize is that a collection of utterly generic rocked-out pop songs isn't likely to win her many new ones. It's sad that an artist as groundbreaking as Phair would be reduced to cheap publicity stunts and hyper-commercialized teen-pop. But then, this is "the album she has always wanted to make"-- one in which all of her quirks and limitations are absorbed into well-tested clichés, and ultimately, one that may as well not even exist." -- attribution
The album seems to be another fulfillment of the indie-rocker-gone-mainstream narrative... Ok, I just listened to 30 seconds of "Why Can't I" and this album is really bad.  But 0.0 bad?  No.  This 0.0 isn't really a thoughtful, accurate rating.  It's just an extreme, hyperbolic statement.  This rating is certainly a relic of a different time in Pitchfork's life, before their reviews and ratings began to really matter to a wide audience.  I've heard a lot of people cite their review of Arcade Fire's "Funeral" in 2004 as that exact moment.  Before that, there was some really weird shit going on at Pitchfork.  You can still find some of it.  Especially pre-2000, which is not included in my data, there were some 10/10 ratings that were... interesting, to say the least.  Pitchfork themselves have removed some of these reviews from their site.  Pussies.

#3 Lowest all time: 0.4 Butthole Surfers - Weird Revolution
By: Brent DiCrescenzo, August 31, 2001
Yeah, this is pretty spot on.  And I think early Butthole Surfers is pretty good. But late-career Butthole Surfers sucks. so. much. It's really depressing.  And this review really has some zingers.
"It's as if you took Eastern Airlines to Eastern City in the heart of the Far East with nothing but some drugs and an English-to-Eastern dictionary to get by. "God, Zeus, Allah, Buddha," Haynes says. "Bob Dylan on a motor scooter." It's about as surreal as a Lunchables commercial."
That's one way of putting it.  Reading this out of context would make so little sense it's stupid.
"Weird Revolution exists only because the Butthole Surfers have mouths to feed, mortgages, and no other option in life. This is never the beginning of an essential album."
Yep. Sigh.
"Exhibit #19,954 in the eternal, one-sided case of commercialism and aging v. art: Weird Revolution unintentionally lives up to its title by embodying the inexplicable surge of honky fratboy Jeep-beat rock that can't disappear soon enough." -- attribution
I personally was unaware of the scourge of Jeep-beat rock that was ravaging the country at that time.  The more you know.

#6 Lowest all time: 1.0 Mac Miller - Blue Slide Park
By: Jordan Sargent, December 8, 2011
"The reason Miller's mass of fans follow him is not because of his music, at least not completely. It's because he looks just like them, because they can see themselves up on the stage behind him, if not next to him."
Uh oh, don't go there.  Yep, we're going there: IT'S BECAUSE HE'S WHITE.
"The pop world has left rap behind, save four or five rappers, and it's opened a door for someone like Mac Miller to seize the college-aged, white-male fanbase. If that fanbase is interacting less with rap music, then maybe they've rallied around Miller because he also barely engages with the wider rap world." -- attribution
One of Sargent's main criticisms -- besides the album just being bad -- is that Mac Miller was some isolated Pittsburgh frat-rapper who was sucked into a void that no one had figured out needed to be filled.  That is, his popularity was more historical accident than anything else, causing his independently released Blue Slide Park, an album with zero guest artists, to rocket to the top of the charts.

Amazingly, just four years later, Mac Miller is as industry-focused a rapper as they come.  He's on a major label. He's had Schoolboy Q and Tyler the Creator and Miguel hop on his two new albums as features.  Mac's been on albums with Meek Mill, Ariana Grande, Earl Sweatshirt, Action Bronson, Freddie Gibbs, Chief Keef, Ab-Soul, and more.  He's everywhere.  Kendrick even called him out in his atom-bomb diss verse on "Control".  Pitchfork just gave his last album a 7.3.  And just listen to his old and new music side-by-side.


Mac Miller has changed.... a lot.  Pretty much everything Sargent has to say may have been correct at the time, but is completely false now.  I feel weird that I just wrote so much about Mac Miller, but it's just such an odd feeling knowing that this guy is one of the biggest success stories in the last half-decade of hip hop.

Footnote: "Take over the world when I'm on my Donald Trump shit." -- Mac Miller was not the prophet we wanted, but the one we needed.

#9 Lowest all time: 1.6 Childish Gambino - Camp
Lol.  This one's great.
"Glover's exaggerated, cartoonish flow and overblown pop-rap production are enough to make Camp one of the most uniquely unlikable rap records of this year (and most others)"
Ouch.
"At the very least, Camp can serve as hashtag rap's tombstone"
Yikes.
"Camp: preposterously self-obsessed, but not the least bit self-aware." -- attribution
So good.  But pretty much everything that's written here applies to Because The Internet, which, though generally a little better quality-wise, is still chock-full of cheesy punchlines and Glover's inexplicable attitude that he's adorable and unique because his raps are clever and emotionally vulnerable and he occasionally mentions fucking college chicks.  My point?  If Camp was a 1.6, Because The Internet isn't, like, three and a half times as good at 5.8, and vice versa.  Pitchfork's lowest scores are mostly just shock-factor attempts to make a statement, and don't really contain much more information than that.

To get at the albums that Pitchfork really, specifically hates (or loves), I needed to look at Pitchfork's ratings relative to a third party.  That's where AOTY's aggregate scores came in.  For each album with both a Pitchfork and AOTY rating, I took the difference between the two, took the absolute value, and looked at that leaderboard.  This captures Pitchfork's most significant deviations from industry consensus.  Here's all the data: (note: "aoty.adj" are the scores I used to calculate the differences.  Pitchfork's scores are included in "aotycritscore", so I removed them and recalculated the mean.)
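(In code, the adjustment is just un-averaging: assuming "aotycritscore" is a simple mean of num_ratings critic scores -- which is an assumption on my part -- you pull Pitchfork's score back out and re-average.  Same made-up column names as before.)

```python
# "aoty.adj": remove Pitchfork's score from the aggregate and recalculate the mean,
# assuming aotycritscore is a simple mean of num_ratings critic scores (my assumption).
n = merged["num_ratings"]
merged["aoty_adj"] = (merged["aotycritscore"] * n - merged["pitchfork_100"]) / (n - 1)

# Signed difference from the Pitchfork-free consensus, then a leaderboard by absolute size
merged["diff"] = merged["pitchfork_100"] - merged["aoty_adj"]
leaderboard = merged.loc[merged["diff"].abs().sort_values(ascending=False).index]
print(leaderboard[["artist", "album", "pitchfork_100", "aoty_adj", "diff"]].head(10))
```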

Largest difference between AOTY and Pitchfork scores: Metallica - St. Anger: Pitchfork 8, AOTY 70.4
St. Anger was released in 2003 as Metallica's first album in 6 years.  It was considered a huge departure from their "classic" recordings, featuring raw production, no solos, and odd choices in instrumentation, such as the use of wah pedals and steel toms.  The 6 other reviews of this album were generally, if not very, positive.  NME and Drowned In Sound gave this album a 90.  The lead single "St. Anger" won a Grammy.  And Pitchfork fucking hated it.  They hated it, in fact, more than any other album relative to its AOTY rating.  Far more.  Their rating, 62.4 points below AOTY's score, is 21% larger than the next greatest difference, for Tool's Lateralus.  Says Brent DiCrescenzo --
"Hetfield and Hammett's guitars underwent more processing than cat food. When they both speedstrummed through "St. Anger", and most other movements, H&H; seemed to overwhelm each other with different, terrible noise."
"Bob Rock's bass neither bobbed nor rocked; it simply hid like an undulating grey amoeboid of sound."
This is the first result if you google "amoeboid of sound", if anybody cares.

"Emo bands found the simple process of merely moving from quiet to loud to be breathtaking. They wrote songs where beauty and melody was assumed on the count of clean guitar and picking chords, despite the fact that the two guitars had no knowledge of each other. In Metallica's case, the result was somehow worse for sounding so calculated and plotted in ProTools. ProTools had never been metal. ProTools never snorted ants up his FireWire from the side of the pool while urinating down a woman's dress. ProTools never inserted the sound of a chainsaw into the opening of "Black Metal" off the album Black Metal. ProTools never burned churches in Norway. And yet, ProTools had a major hand in assembling both "American Life" and "Frantic""
 -- attribution
I can say, for a fact, that I have never seen the software program ProTools snort ants up its FireWire whilst poolside.  This is why I come to Pitchfork: for the relatable, helpful analogies.  Yet, about the same album, NME raves:

"Each song mutates and heads off in a new direction at the exact point lesser mortals would finish up. It takes 73 minutes to play 11 tracks, stretching time and endurance, until you have an immense statement of superiority. Nu-metal minnows, you may return to your cubbyholes. The true masters have finally awakened from their slumber." -- attribution
Good shit.


Largest Positive Difference Between Pitchfork and AOTY Scores: Architecture In Helsinki - In Case We Die: Pitchfork 88, AOTY 65.14

As AOTY's ratings are aggregates of many ratings, they tend to be pretty high, and rarely dip below 50.  For this reason, the leaderboard for absolute differences is dominated by albums that Pitchfork liked less than the consensus.  To find the greatest positive difference, you have to scroll down to the 27th greatest overall difference.  There you will find this electro-pop record.  Pitchfork gave it "Best New Music".  On the other side of the spectrum, tinymixtapes.com gave it a 30.  Tiny Mix Tapes' review does not pull any punches:
"I don't know whether Architecture in Helsinki is for real or an ironic parody of the post-Arcade Fire indie rock scene."
Pitchfork's review also mentions Arcade Fire similarities, but doesn't seem to find anything wrong with that comparison.
"The band uses a horn section to underscore virtually every riff, giving the songs the authentic feeling of a Reel Big Fish rip off." -- attribution
Hey now.  Pitchfork refers to the horn section as granting the band "multiple palette combinations to try." --link.  They also refer to the album as "pretentious-pop" like it's a compliment.  Yuck.  This album is exactly the kind of thing you would expect Pitchfork to like more than the field.  Though there are certainly similar albums on the leaderboard, with genres like dance pop and indie pop, I don't think I could make any conclusions about Pitchfork pointedly rating this type of album or genre of music higher than the field.  Alongside such albums are names like Gucci Mane, Tortoise, R. Kelly, Black Dice, and Billy Corgan.

For the last thing I'm doing, in what I now realize has become an excruciatingly long post, I ran a linear regression between Pitchfork and AOTY scores and found the residual for each album.  This number can be interpreted as how abnormal a Pitchfork rating is, compared to the data's general relationship between Pitchfork and AOTY ratings.  A residual of -10 means that Pitchfork's actual rating was ten points lower than what the model predicts based on the album's AOTY rating.  Here's the data:
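(Roughly, the fit and the residuals look like this -- an ordinary least squares sketch on the same hypothetical merged table, nothing fancier.)

```python
import numpy as np

# Ordinary least squares: Pitchfork score as a function of the adjusted AOTY score
slope, intercept = np.polyfit(merged["aoty_adj"], merged["pitchfork_100"], 1)

# Residual = actual Pitchfork score minus what the line predicts from the AOTY score;
# e.g. -10 means Pitchfork scored the album ten points below the prediction
merged["predicted"] = intercept + slope * merged["aoty_adj"]
merged["residual"] = merged["pitchfork_100"] - merged["predicted"]

most_negative = merged.sort_values("residual")
print(most_negative[["artist", "album", "pitchfork_100", "aoty_adj", "residual"]].head(10))
```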




And at left is a plot of the data.  A lot of the usual suspects are on this list, as they should be.  It makes sense that the albums with the largest raw differences between the two scores would also show up as outliers here.  Also, these data are heteroscedastic: the variance increases as AOTY scores decrease.  I have done nothing to compensate for this, so the residuals will tend to be larger for albums with lower AOTY scores.  The last album I'll look into isn't the album with the largest residual, but it has one of the highest positive residuals, and I also think it's really funny that Pitchfork came out in its defense.  So for our last album:

 Lil' Wayne - Rebirth: Pitchfork 45, AOTY 32.29

Fuck ya.  With their middling 45 rating, Pitchfork turned in a very uncharacteristic review.  A 45 doesn't seem like such an odd rating, but compared to the consensus rating of 32.29, the lowest AOTY rating in my dataset, it's an odd choice.  The model predicts that a 32.29 AOTY score would typically translate to about a 24 Pitchfork rating.  Instead, Pitchfork came out with tepid support of Weezy's abortion of an attempt at rap-rock.  Their review starts off with the sentence "It could've been worse."  Lil' Wayne tried to sound like Limp Bizkit, Pitchfork.  I'm not sure how much worse you can get.
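(That ~24 prediction is just the fitted line from the sketch above evaluated at Rebirth's consensus score; the exact number depends on the fit you get from your copy of the data.)

```python
# Predicted Pitchfork score for Rebirth's consensus rating, using the slope and
# intercept from the regression sketch above
rebirth_pred = intercept + slope * 32.29
print(round(rebirth_pred, 1))  # ~24 with my data
```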

Well, this is the end.  Some takeaways:

  • Very, very low ratings are not usually accurate reflections of the quality of an album relative to some theoretical underlying distribution.  Say all music is uniformly distributed by quality across the 10-point scale.  Is St. Anger by Metallica, at 0.8, really worse than 92% of all music? No.  It's just a gimmick, a writer looking to stir the pot to get clicks and make controversy.
  • Pitchfork writers use so many goddamn hyphens.  Why?  It isn't a competition.  So you can type "proto-anglo-nu-gaze-funk-wave" with one hand while sticking the other one up your ass.  Congratulations.
  • Just looking at the data and seeing how much variance there is between Pitchfork and a single consensus rating really captures how much difference of opinion there is, even within mainstream music journalism.  If you're going to read music criticism online, I think this look at extreme ratings shows the need to get your opinions from multiple sources.  There's a wide distribution of opinions out there.  Try and get a large sample, or else you might end up missing out on a guy like Mac Miller, or subjecting your ears to the nonconsensual ear-rape of Lil' Wayne's Rebirth.
