Approaching a dataset with visualization
I’d like to give my students some simple guidelines for how to use data visualization to look at a new dataset. What to do first, second, and so on. Here’s what I’m going to suggest.
Examine individual variables
First, take one variable at a time. Which are the most important ones, considering the audience and the purpose of your work? For each of those, what are the mean, median, and mode? Your first visualizations, then, may be histograms or box-and-whisker plots, maybe Pareto diagrams. These go beyond the summary statistics by showing us the overall “shape” of each distribution: whether it looks Normal, whether it is skewed, and whether its tails are fat or thin.
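To make this first step concrete, here is a minimal sketch in Python using only the standard library. The commute_minutes values are made up for illustration, and the summarize and text_histogram helpers are hypothetical names of my own, not part of any package. In practice you would probably draw the histogram with a plotting library; a text version is enough to show the idea.

```python
import statistics
from collections import Counter

# Hypothetical sample: commute times in minutes (made-up values for illustration).
commute_minutes = [12, 15, 15, 18, 20, 22, 22, 22, 25, 30, 35, 48, 75]


def summarize(values):
    """Return the mean, median, and mode of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


def text_histogram(values, bin_width):
    """Count values per fixed-width bin and return the lines of a text histogram."""
    # Map each value to the lower edge of its bin, e.g. 22 -> 20 when bin_width is 10.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    # Walk every bin from the lowest to the highest so that empty bins still appear.
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        bar = "#" * counts.get(start, 0)
        lines.append(f"{start:>3}-{start + bin_width - 1:<3} {bar}")
    return lines


summary = summarize(commute_minutes)
print(summary)
for line in text_histogram(commute_minutes, bin_width=10):
    print(line)
```

In this made-up sample the mean sits above the median, which hints at a right skew. The histogram shows why: most values cluster in the lower bins, and a single long commute stretches the tail out to the right, past two empty bins.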
Compare subsets of the data on single variables
Once you have a sense of how the data is distributed overall, you can begin slicing and dicing it by some categorical dimension(s). This can be as simple as a bar chart comparing a single...