2017-05-10

Why is data visualization so important in statistics, anyway? Graphs and other visualizations might seem superfluous if you’re already using statistical analysis to look for patterns in a data set, right? Short answer: wrong.

A new research paper presented this week at the human-computer interaction conference ACM CHI shows just how important it is to visualize your data. In it, two Autodesk researchers show how 12 data sets that share the same basic statistical properties, such as mean, standard deviation, and Pearson’s correlation, can look radically different as graphs. The data sets might have a lot in common on paper, but as visualizations they form stars, circles, and other shapes. The point? To show that data visualization isn’t just aesthetic: it’s a crucial part of analysis that can reveal surprising things about your data.

“There’s still the impression that creating graphics or visualizations is really just making pretty pictures and the real stuff you need to do can be done through analysis,” says Autodesk researcher Justin Matejka, who wrote the paper with fellow researcher George Fitzmaurice. “Even if you’re very good at statistics, you might miss something.”

The paper builds on a classic idea in statistics called Anscombe’s Quartet. The “quartet” is a group of four data sets, created by the statistician F.J. Anscombe in 1973, that share nearly identical “summary statistics”: mean, standard deviation, and Pearson’s correlation. Yet each produces a wildly different graph. It’s a famed demonstration of just how vital it can be to visualize data rather than relying on statistics alone, and Matejka and Fitzmaurice wanted to update it for data-rich 2017.
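To make the quartet's point concrete, here is a minimal Python sketch that computes those summary statistics for the four data sets. The values are the quartet as it is commonly reproduced, so check them against Anscombe's 1973 paper before relying on them. The names `ANSCOMBE`, `pearson`, `summarize`, and `SUMMARIES` are illustrative and do not come from the researchers' paper.

```python
from math import sqrt
from statistics import mean, stdev

# Anscombe's quartet as commonly reproduced (assumed values, transcribed by hand).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson's correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / spread


def summarize(xs, ys):
    """Return the summary statistics the article mentions for one data set."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "sd_x": stdev(xs),  # sample standard deviation
        "sd_y": stdev(ys),
        "r": pearson(xs, ys),
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in ANSCOMBE.items()}

for name, s in SUMMARIES.items():
    # Print each statistic rounded to two decimals, one data set per line.
    print(name, {key: round(value, 2) for key, value in s.items()})
```

Rounded to two decimals, the four printed rows should be nearly indistinguishable, even though a scatter plot of each set looks nothing like the others. That gap between the numbers and the picture is what the quartet was built to show.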

“[Anscombe’s Quartet] is 45 years old at this point, so maybe it’s time for a slightly more exciting tool to teach the same lesson,” Matejka says.

They were also inspired by an image from the data viz expert Alberto Cairo, who last year tweeted a visualization of a data set that formed the shape of a T. rex (he called it “the datasaurus”). The numbers in the data set looked totally normal; it wasn’t until they were visualized that the dinosaur emerged. No matter how well you think you know your data, visualizing it can reveal something surprising.