Visual Cheatsheet for Plot Types Based on Data Column Types
So many different chart types for data visualization! When do you choose which one? This quick cheatsheet is for my students and for the world as well. Just look at the columns in your data table that you want to visualize, and find the corresponding entry in this cheatsheet.
If you only have one categorical column and one numerical column to display, then a bar chart is the simplest choice. A pie chart is also possible (but you may want to avoid it).
If you have two categorical columns and a numerical column, then you can use a grouped bar chart. If summing over the second column makes sense, then you might consider a stacked bar chart. Or maybe not. If in addition summing over the first categorical column also makes sense, then you can consider a Marimekko chart, which can be thought of as a stacked bar chart whose bars are stretched or squeezed vertically to the same height, with each bar becoming thinner or fatter accordingly so that its area stays unchanged.
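The Marimekko geometry described above can be worked out by hand before handing it to any plotting library. Here is a minimal sketch in plain Python, assuming the data arrives as (first category, second category, value) rows with non-negative values. The function name `marimekko_rects` and the sample numbers are made up for illustration.

```python
from collections import defaultdict

def marimekko_rects(rows):
    """Compute rectangles in the unit square for a Marimekko chart.

    rows: iterable of (first_cat, second_cat, value) tuples (assumed layout).
    Every bar has height 1; its width is its share of the grand total,
    so each rectangle's area equals its value's share of the grand total.
    """
    rows = list(rows)
    grand_total = sum(value for _, _, value in rows)

    # Total per bar, i.e. summing over the second categorical column.
    bar_totals = defaultdict(float)
    for first, _, value in rows:
        bar_totals[first] += value

    rects = []
    x = 0.0
    for first, bar_total in bar_totals.items():
        width = bar_total / grand_total      # fatter bar = bigger total
        y = 0.0
        for f, second, value in rows:
            if f != first:
                continue
            height = value / bar_total       # segments fill the bar to height 1
            rects.append({"first": first, "second": second,
                          "x": x, "y": y, "width": width, "height": height})
            y += height
        x += width
    return rects

# Made-up sample data: region (first column), channel (second column), sales.
sales = [
    ("North", "Online", 30), ("North", "Store", 10),
    ("South", "Online", 20), ("South", "Store", 40),
]
rects = marimekko_rects(sales)
```

Each dictionary in `rects` can then be drawn as a rectangle with whatever plotting tool you prefer. The areas add up to 1 because both sums (over the first and over the second column) are meaningful, which is exactly the condition stated above.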
If you have many categorical columns, each representing one level in a hierarchical structure (e.g. country/state/city/district or enterprise/division/unit/department), then a treemap may be your best choice, although an interactive icicle or sunburst chart may sometimes be more helpful. BTW, bar chart : pie chart ~ icicle : sunburst.
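Treemap, icicle and sunburst charts all need the same thing: a size for every node in the hierarchy, not just for the leaves. Here is a rough sketch of that roll-up in plain Python, assuming each row carries a path of category labels (one per level) plus a numerical value. The function name `build_hierarchy` and the sample rows are hypothetical.

```python
def build_hierarchy(rows):
    """Roll flat rows up into a nested tree with a total on every node.

    rows: iterable of (path, value), where path is a tuple of labels
    ordered from the top level down (assumed layout).
    """
    root = {"name": "root", "value": 0, "children": {}}
    for path, value in rows:
        node = root
        node["value"] += value
        for label in path:
            # Create the child on first sight, then add this row's value to it.
            node = node["children"].setdefault(
                label, {"name": label, "value": 0, "children": {}})
            node["value"] += value
    return root

# Made-up sample data: (country, state, city) path and a value per row.
rows = [
    (("A", "X", "p"), 5),
    (("A", "X", "q"), 3),
    (("A", "Y", "r"), 2),
    (("B", "Z", "s"), 4),
]
tree = build_hierarchy(rows)
```

The resulting `tree` gives every level its total, so the same structure can feed a treemap (nested rectangles), an icicle (stacked strips) or a sunburst (rings).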
My personal favorite is the scatter plot / bubble chart, which can show three numerical columns (x, y and bubble size) along with two categorical columns at once. One categorical column labels each bubble/point and the other categorical column may be coded using colors. It might even help you discover a pattern between the x and y columns.
If you have two numerical columns, one of which is chronological (i.e. a time sequence), then a line chart is a good candidate. A line chart can accommodate an additional categorical column by drawing several lines of different colors in one chart.
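The rules of thumb above can be condensed into a small lookup. This is only a sketch of my reading of them, not a definitive decision procedure: the function name `suggest_charts`, its parameters, and the choice to return "scatter plot" for two numerical columns and "bubble chart" for three are all assumptions made for illustration.

```python
def suggest_charts(n_categorical, n_numerical, *,
                   hierarchical=False, chronological=False):
    """Return candidate chart types for the given column counts.

    hierarchical: the categorical columns are levels of one hierarchy.
    chronological: one of the numerical columns is a time sequence.
    """
    if chronological and n_numerical == 2 and n_categorical <= 1:
        # An extra categorical column becomes several colored lines.
        return ["line chart"]
    if hierarchical and n_categorical >= 2:
        return ["treemap", "icicle", "sunburst"]
    if n_numerical == 1 and n_categorical == 1:
        return ["bar chart", "pie chart"]
    if n_numerical == 1 and n_categorical == 2:
        # Stacked and Marimekko only apply when the sums are meaningful.
        return ["grouped bar chart", "stacked bar chart", "Marimekko chart"]
    if n_numerical == 2 and n_categorical <= 2:
        return ["scatter plot"]
    if n_numerical == 3 and n_categorical <= 2:
        return ["bubble chart"]
    return []  # not covered by this cheatsheet

# One categorical and one numerical column.
simple = suggest_charts(1, 1)
# A time column plus a measurement, split by one category.
trend = suggest_charts(1, 2, chronological=True)
```

An empty list simply means the combination falls outside the cases listed here, so you are on your own.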
Left out are (1) statistical summaries such as the histogram and the violin plot, in which a huge number of data rows are collapsed into a single summary shape in the chart, and (2) maps, in which two numerical columns correspond to latitude and longitude, respectively. It is usually obvious when you need them.
The SVG version of this cheatsheet is also available. The University of Edinburgh has a nice DataVis course with lots of lecture notes and video lectures.