A box plot can provide a valuable way to visualize distribution of data. It lets you summarize important details of a set of data at a glance. Unfortunately, Google Sheets does not have a feature to display box and whisker plots.
In this tutorial, however, we will show you how to get around this limitation and create a box and whisker plot in Google Sheets.
What is a Box Plot Used For?
A box plot is often used in exploratory data analysis (EDA). It provides a visual summary of five important descriptors of a dataset:
- The highest value
- The lowest value
- The median value
- The first quartile value
- The third quartile value
At just one glance of a box and whisker plot, one can tell if the data is symmetrical or skewed, or how tightly it is grouped together.
A box and whisker plot also lets you compare data from different related data sources, so you can make effective decisions.
For example, you can use them to:
- Compare scores from different classes
- Compare how data changes before and after a process
And much more.
How to Make a Box and Whisker Plot in Google Sheets
Let’s say you have the following list of numbers and want to display a box and whisker plot to describe the distribution.
Computing the Five-Number Summary for the Google Sheets Box and Whisker Plot
A box and whisker plot always requires the following five summary descriptors, which should always appear in the same order:
- Minimum Value
- Lower Quartile
- Upper Quartile
- Maximum Value
So we need to first enter formulas to find each of the above values for the given data. Let us calculate these values one by one.
Finding the Minimum Value
This is the smallest value in the list of numeric data. To find the minimum value, we use the MIN function. Enter the following formula in the first column, under the column header, ‘Minimum Value’ (in cell B2):
Finding the Lower Quartile
This is the value at the center of the first quarter of the numerical data. To find the lower quartile, we use the QUARTILE function, and specify the second parameter of this function as 1 (since we want to extract the first quartile of the given data).
Enter the following formula in the second column, under column header, ‘Lower Quartile’ (in cell C2):
Finding the Median
This value is not necessary for creating the box plot, but it is an important component of the five-number summary, and as such, should be computed.
The median is the value at the center of the numerical data. To find the median, we use the MEDIAN function.
Enter the following formula in the third column, under column header, ‘Median’ (in cell D2):
Finding the Upper Quartile
This is the value at the center of the last quarter of the numerical data. To find the upper quartile, we use the QUARTILE function, and specify the second parameter of this function as 3 (since we want to extract the third quartile of the given data). Enter the following formula in the fourth column, under column header, ‘Upper Quartile’ (in cell E2):
Finding the Maximum Value
This is the largest value in the list of numeric data. To find the maximum value, we use the MAX function.
Enter the following formula in the fifth column, under the column header, ‘Maximum Value’ (in cell F2):
Displaying the Box and Whisker Plot by Using the Five-Number Summary
Now that we have our five-number summary in places, we can begin creating our box and whisker plot.
Here are the steps:
- Insert a new column to the left of column B (by right clicking on the column header and selecting ‘Insert 1 left’).
- In the cells exactly to the left of the first number summary value, enter the text that you would like to display as your box plot’s title. In our example, we simply entered the text “Data” in cell B2.
- Select the cells containing the 5 summary descriptors, along with the cell containing the text, “Data”. In our example, we select the cells B2:G2.
- From the Insert menu, select the Chart option.
- This opens the Chart editor sidebar to the right of the browser window.
- In the Chart editor, click on the dropdown under ‘Chart type’.
- Under the ‘Other’ category, select the Candlestick chart option.
- You should now see a box and whisker plot that represents your data.
Note: Always remember to specify the 5 number summary in the same order, otherwise your box plot might display erroneous data.
Interpreting the Box and Whisker Plot
Before closing, let’s take a moment to quickly understand the plot that we created.
- The top line (the top of the whisker) marks the maximum value in the data.
- The length of the top line (the top whisker) signifies how far the highest value is from the median. This can help us identify the presence of potential outliers, or a skewed dataset.
- The top of the box marks the third quartile value of the data.
- The bottom of the box marks the first quartile value of the data.
- The length of the bottom line (the bottom whisker) signifies how far the lowest value is from the median. This can again help us identify the presence of potential outliers, or a skewed dataset.
- The bottom line (the bottom of the whisker) marks the minimum value in the data.
You can also choose to customize the box and whisker plot according to your liking. For example, you can change the color of the box plot or give it a three dimensional feel.
To customize the chart, double-click on the chart. This will display the Chart editor. Select the ‘Customize’ tab from the Chart editor and make your required changes by unraveling the required section and making the changes you need.
In this tutorial, we discussed why box and whisker plots are useful and how they can be created in Google Sheets. We hope you will give this method a shot and use it often, especially for your exploratory data analysis tasks.