oscar ballot statistics
Saturday, March 8th, 2008
I’ve been meaning to post about this for a while, but it took me some time to get around to making these calculations. See, I went to this party to watch the Oscars. I’m not really a big awards-show person, so I was hoping to ignore the television and focus on eating good food (I made a delicious frittata). However, the party had an additional element: a betting pool where everyone tried to pick the winners in 24 categories. I didn’t want to expose my ignorance by picking a bunch of stinkers, so I went with a pseudo-random strategy for filling out my ballot.
Pseudo-random sounds very fancy, but really I should have just gone ahead and used a random number generator. The method I did use took longer to calculate, didn’t do anything an RNG wouldn’t do, and had one small drawback/feature (discussed below). For each category there were between 3 and 5 nominees (usually 5), listed in alphabetical order. The pseudo-random variable I used was the number of letters in the name of the first nominee (a person in some categories and a movie title in others). To generate my pick, I raised 2 to the power of that number, subtracted 1, and took the result modulo the number of nominees to determine which nominee to pick.
$$X_i = (2^{x} - 1) \bmod N_i$$
In this formula, x is the number of letters in the first nominee’s name, N_i is the number of nominees in category i, and X_i is the index of the nominee I pick, counting from 0. One problem with this formula, which makes me think I should have just used a random number generator, is that some people and movies were nominated in many categories, and if they came early in the alphabet (‘Bourne Ultimatum’, for example), the same value of x was used for several categories. A second problem is that one category (Best Documentary Short Subject) had four nominees; since 2 to any positive power minus 1 is always odd, my formula could only pick index 1 or 3 (never 0 or 2). Overall, though, I think my formula did accomplish the goal of being totally uncorrelated with any of the criteria used by the Academy, and therefore qualifies as pseudo-random.
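Here is a minimal Python sketch of the picking rule, assuming only letters count toward x (spaces and punctuation skipped; the post doesn't say how they were handled). The helper names `pick_index` and `reachable_indices` are hypothetical, made up for this illustration. The second helper lists every index the rule can ever produce for a given number of nominees.

```python
# Sketch of the ballot rule X_i = (2**x - 1) mod N_i.
# Hypothetical helpers for illustration; not code from the original post.


def pick_index(name: str, n_nominees: int) -> int:
    """Return the 0-based index of the nominee picked in one category.

    Assumption: x counts letters only, ignoring spaces and punctuation.
    """
    x = sum(ch.isalpha() for ch in name)
    return (2 ** x - 1) % n_nominees


def reachable_indices(n_nominees: int, max_letters: int = 40) -> set[int]:
    """All indices the rule can produce for names of 1..max_letters letters."""
    return {(2 ** x - 1) % n_nominees for x in range(1, max_letters + 1)}


if __name__ == "__main__":
    # 'Bourne Ultimatum' has 15 letters; 2**15 - 1 = 32767, and 32767 mod 5 = 2.
    print(pick_index("Bourne Ultimatum", 5))
    for n in (3, 4, 5):
        print(n, sorted(reachable_indices(n)))
```

Running the loop suggests the four-nominee case is a bit more restrictive than "1 or 3": once a name has at least two letters, 2^x is a multiple of 4, so the rule always lands on index 3. The same check shows that with three nominees index 2 never comes up, and with five nominees index 4 never comes up, because 2^x is never a multiple of 3 or 5.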
Now then, here is a plot (with a logarithmic y-axis) showing the probability of getting any particular number of correct answers through random picking. The red line is calculated exactly, by generating every possible combination of right and wrong answers on the ballot (there are 2.7 million ways to arrange just 12 correct and 12 incorrect results!) and using the actual number of nominees in each category. The black curve is a binomial distribution where the probability of success in any category is 21.24% (24 categories divided by 113 nominees). As you can see, the black curve is an excellent approximation of the actual probability distribution.
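The two curves can be sketched in Python as follows. Rather than listing all 2^24 (about 16.8 million) right/wrong patterns, `exact_distribution` folds in one category at a time, each with success probability 1/N_i; this is a different technique from the full enumeration described above (a Poisson binomial convolution), but it yields the same distribution. The nominee counts in `COUNTS` are hypothetical: they match the post's totals (24 categories, 113 nominees, one four-nominee category) but are not the real 2008 list, so the numbers will not reproduce the plot exactly.

```python
from math import comb


def exact_distribution(nominee_counts):
    """P(k correct) for k = 0..len(nominee_counts), picking uniformly at random
    in each category (category i is right with probability 1/N_i)."""
    dist = [1.0]  # before any category is scored, P(0 correct) = 1
    for n in nominee_counts:
        p = 1.0 / n
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)  # this category picked wrong
            new[k + 1] += prob * p    # this category picked right
        dist = new
    return dist


def binomial_distribution(n_categories, p):
    """Binomial approximation: same success probability p in every category."""
    return [comb(n_categories, k) * p**k * (1 - p) ** (n_categories - k)
            for k in range(n_categories + 1)]


# HYPOTHETICAL counts: 20 categories with 5 nominees, one with 4, three with 3.
COUNTS = [5] * 20 + [4] + [3] * 3

if __name__ == "__main__":
    exact = exact_distribution(COUNTS)
    approx = binomial_distribution(len(COUNTS), len(COUNTS) / sum(COUNTS))
    print(comb(24, 12))  # ways to choose which 12 of 24 categories are correct
    for k, (e, b) in enumerate(zip(exact, approx)):
        print(f"{k:2d}  exact={e:.3e}  binomial={b:.3e}")
```

`comb(24, 12)` gives 2,704,156, the "2.7 million" figure above. The convolution costs only a few hundred multiplications, so it scales to any number of categories without enumerating ballots.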

So, the big question is how many correct predictions I made. Well, the expectation value is 5.245 correct guesses. And… I only got 3 right. The probability of getting 3 or fewer correct answers is 19.56%, so you could say that I got somewhat unlucky. Meanwhile, Jenn did a very good job of showing that the Oscars are most likely not a random process by correctly guessing 15 of the winners. The probability of randomly picking at least that many correct results is only 0.00186%!!
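Tail probabilities like these are just partial sums of the distribution. A minimal sketch, using the binomial approximation (p = 24/113) as a stand-in because the real per-category nominee counts aren't listed here; the helper names are hypothetical. With the exact per-category model the expectation is instead the sum of 1/N_i over all categories, so these approximate values will differ somewhat from the exact figures quoted above.

```python
from math import comb


def binomial_pmf(n, p):
    """P(k correct) for k = 0..n under the binomial approximation."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def prob_at_most(pmf, k):
    """P(X <= k): sum of the pmf from 0 through k."""
    return sum(pmf[: k + 1])


def prob_at_least(pmf, k):
    """P(X >= k): sum of the pmf from k upward."""
    return sum(pmf[k:])


if __name__ == "__main__":
    pmf = binomial_pmf(24, 24 / 113)
    mean = sum(k * q for k, q in enumerate(pmf))
    print(f"expected correct: {mean:.3f}")
    print(f"P(<= 3 correct):  {prob_at_most(pmf, 3):.2%}")
    print(f"P(>= 15 correct): {prob_at_least(pmf, 15):.5%}")
```

The same two helpers work unchanged on an exact distribution list, since they only sum its entries.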
