Can This Clever Statistical Model Predict Olympic Medal Winners?

Interactive: Two brothers’ quest to Nate Silver the Sochi Winter Olympics

How many medals will the U.S. walk away with at this year's Winter Olympics? What about perennial runner-up China? Two brothers, Dan and Tim Graettinger, think they have the answers, and you’ll be surprised to hear how they got them.

Dan thought up the idea to Nate Silver the Olympics while watching NBC’s nightly medal count during the 2010 Winter Games. Inspired by Google’s 20% rule, where you dedicate 20% of work time to personal interests, Dan pitched the project to his brother, a data analyst. Over the next four years, the two collected more than 30 datasets and ran regression after regression until they found a model that matched the past two Winter Olympics with incredible accuracy. The first chart pictured below shows which countries the brothers’ model predicts will win it all at this year’s Games.

“If he had known how long it would take to assemble the data,” Dan tells Co.Design, “maybe Tim would’ve told me to work on something else.”

Careful to avoid speculative biases, Dan collected every piece of information he could imagine, spanning economic, geographic, religious, and sociopolitical metrics. After Dan compiled all the data, Tim created a first-pass model that determined whether a country would leave the Games with even one Olympic victory. Many nations, including every country from Africa, South America, and the Middle East, were relatively easy to predict—they’ve never taken home a winter medal. Other perennial losers include Iceland, Greece, and Argentina.

The strongest predictor of a country’s Winter Olympics success was its performance two years before, in the Summer Games. “It was totally unexpected,” says Dan. “If a country didn’t medal in the summer, there was 100% certainty that it wouldn’t win at the next Winter Olympics.” Although the Jamaican bobsled team has faced quite a bit of stress on the road to Sochi, their country's strong 2012 performance in track and field spells good news for their chances to medal this year.

For the final model, the Graettinger brothers found that only four variables consistently predicted a country’s medal count in the Olympics: geographic size, GDP per capita, the value of its exports, and the capital city’s latitude.

“It’s interesting when you bring together data that doesn’t naturally co-exist,” Tim says. “You don’t usually see medal counts lined up against all this economic data.”

Of course, not every country fits the model perfectly, and this is what our second graph above illustrates. Although the Graettingers correctly predicted success within three medals for more than 80% of the countries in the 2006 and 2010 Games, several countries consistently defied expectations. South Korea raked in 16 more medals than the model predicted, whereas the United Kingdom walked away with 24 fewer medals than they should have. What is it about these countries that makes them so exceptional—for better or worse?

The bulk of South Korea’s medals came from speed skating. “How do you account for the fact that short-track speed skating is hugely popular in South Korea?” the brothers wonder in a blog post about their findings.

Home-court advantage also appears to be a very real factor. Italy and Canada over-performed in 2006 and 2010, respectively, when each hosted the Games.

“If we were going to Vegas, I think we would make some adjustments based on what we see in the outliers table,” Dan says. Even if the Graettinger brothers aren’t putting money down on their predictions, an 80% success rate is far better odds than the house ever lets you play at the casino.

It’s easy to get lost in the Olympics’ human stories of dedication and perseverance, but the Graettingers’ findings suggest that a Shaun White born in Kyrgyzstan would probably be sitting at home this week, watching the Games from afar like the rest of us. What do you think? Have the Graettinger brothers pulled a Nate Silver, or did they miss a crucial variable? If you have any thoughts on how to improve their model, leave them in the comments below.

[Image: Turin, Italy via Paolo Bona / Shutterstock]


14 Comments

  • Arthur Manouki

    It was an interesting idea, but perhaps the two brothers should have looked at other variables, like the average ages of Olympians from each country. There were some sure wins predicted early in the Games that did not pan out; Shaun White was one of them.

  • Go Bluth

    Only 29 countries won medals in the 2010 Olympics, and 9 of these won 3 or fewer medals. Therefore, if I predicted 0 medals for all countries I would be "correct within 3 medals" for all but 20 countries. Since there are currently 204 national Olympic committees, my "model" of predicting all zeroes is correct within 3 medals 90% of the time (since (204-20)/204 = 0.9). So, given that, this article is highly misleading.

  • You raise a good point, Mr. Bluth. Much of the model's accuracy comes from predicting that nations which haven't won in the past won't win this year. I think there's value in appreciating the rarity of Winter Olympics underdog stories, though. Also, only 82 committees participated in 2010, bringing your accuracy down to 78 percent. Still, I think the main point of the Graettingers' experiment was to have some fun, identify variables common across the winning nations, and show just how small that pool actually is. In any case, thanks for chiming in!
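The arithmetic behind this exchange is easy to reproduce. The sketch below uses only the figures quoted in the two comments above (29 medal-winning countries, 9 of them with 3 or fewer medals, and either 204 committees or 82 participants); the function name `all_zero_baseline` is made up for illustration. With those inputs the 82-country case comes out near 76 percent, a little under the 78 percent cited, which may simply reflect a slightly different count.

```python
def all_zero_baseline(total_countries, countries_over_threshold):
    """Share of countries for which a prediction of 0 medals is 'within 3'."""
    return (total_countries - countries_over_threshold) / total_countries

medal_winners = 29            # countries that medaled in 2010, per the comment
winners_with_3_or_fewer = 9   # of those, countries with 3 or fewer medals
missed = medal_winners - winners_with_3_or_fewer  # 20 countries with more than 3

share_all_committees = all_zero_baseline(204, missed)  # all national committees
share_participants = all_zero_baseline(82, missed)     # 2010 participants only

print(round(share_all_committees, 3))  # 0.902
print(round(share_participants, 3))    # 0.756
```

Either way, the "within three medals" figure should be read against this baseline rather than against zero.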

  • Did they use an OLS model? As far as I can tell they did, but I wonder if a Poisson or negative binomial model would have been better. These models are better suited to handle count variables, which, like Olympic medals, can only take on discrete integer values greater than or equal to zero. Also, it looks like they pooled data from the past two Olympics, but didn't control for Olympic year in their models. This could be important if there was something unobserved about the two Olympics that may have resulted in a country getting more or fewer medals in 2006 or 2010, like the "home field advantage" for Canada in the 2010 Games that the authors point out. Just two things to consider that might improve the accuracy of the models.
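To make the count-model point concrete, here is a minimal sketch that fits both a straight line (OLS) and a Poisson regression with a log link to the same small dataset. The six data points are invented purely for illustration and have nothing to do with the Graettingers' data; the Poisson fit uses a hand-rolled Newton iteration rather than any particular statistics library.

```python
import math

# Toy data, invented for illustration only (not the brothers' dataset).
x = [0, 1, 2, 3, 4, 5]
y = [0, 1, 1, 2, 4, 9]   # non-negative counts, like medal totals
n = len(x)

# Ordinary least squares with one predictor (closed form).
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
ols_slope = s_xy / s_xx
ols_intercept = y_bar - ols_slope * x_bar   # about -1.10: a negative "count" at x = 0

# Poisson regression, E[y] = exp(a + b * x), fitted by Newton's method.
a, b = math.log(y_bar), 0.0
for _ in range(50):
    mu = [math.exp(a + b * xi) for xi in x]
    # Gradient of the log-likelihood.
    g_a = sum(yi - mi for yi, mi in zip(y, mu))
    g_b = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
    # Negative Hessian (a 2x2 matrix), inverted by hand.
    h_aa = sum(mu)
    h_ab = sum(mi * xi for xi, mi in zip(x, mu))
    h_bb = sum(mi * xi * xi for xi, mi in zip(x, mu))
    det = h_aa * h_bb - h_ab * h_ab
    a += (h_bb * g_a - h_ab * g_b) / det
    b += (h_aa * g_b - h_ab * g_a) / det

poisson_at_zero = math.exp(a)   # always positive, whatever a turns out to be

print("OLS prediction at x=0:", round(ols_intercept, 3))
print("Poisson prediction at x=0:", round(poisson_at_zero, 3))
```

The line dips below zero at the low end, while the log link keeps every Poisson prediction positive, which is the property the comment is pointing at.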

  • Brendan Bartanen

    Did you control for the number of athletes participating from each country? Seems like that would be strongly correlated with number of medals.

  • For folks wanting more details on the statistical model:

    R^2: 0.585
    Adjusted R^2: 0.550

    Coefficients     B           Std. Error   Sig. (p-value)
    (Constant)       -15.874     6.566        0.020
    geog_area        7.94E-07    2.1E-07      0.0004
    gdp_per_cap      3.12E-04    7.75E-05     0.0002
    exports          1.17E-11    3.2E-12      0.001
    lat_of_capital   0.267       0.122        0.034
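For readers who want to try the numbers, the coefficients above can be plugged into a plain linear equation. This is only a sketch under stated assumptions: it treats the table as an ordinary linear regression, and it guesses the units (area in square kilometers, GDP per capita and exports in U.S. dollars, latitude in degrees north), none of which the comment specifies. The function `predict_medals` and the example country are hypothetical.

```python
# Coefficients copied from the table in the comment above.
COEFFICIENTS = {
    "constant": -15.874,
    "geog_area": 7.94e-07,
    "gdp_per_cap": 3.12e-04,
    "exports": 1.17e-11,
    "lat_of_capital": 0.267,
}

def predict_medals(geog_area, gdp_per_cap, exports, lat_of_capital):
    """Linear prediction of medal count; units are assumed, not given in the source."""
    return (
        COEFFICIENTS["constant"]
        + COEFFICIENTS["geog_area"] * geog_area
        + COEFFICIENTS["gdp_per_cap"] * gdp_per_cap
        + COEFFICIENTS["exports"] * exports
        + COEFFICIENTS["lat_of_capital"] * lat_of_capital
    )

# A made-up country: 300,000 km^2, $40,000 GDP per capita,
# $500 billion in exports, capital at 50 degrees north.
example = predict_medals(300_000, 40_000, 5e11, 50)
print(round(example, 2))  # 16.04

# With every input at zero the model returns its intercept, a negative medal count.
floor = predict_medals(0, 0, 0, 0)
print(floor)  # -15.874
```

The negative intercept means small, low-income countries near the equator get predictions below zero, so in practice such outputs would have to be clipped to zero or handled by a first-pass filter like the one the article describes.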