Correlation is a term you will encounter often in data analysis. Once you have a basic idea of how two variables are correlated, you can decide whether the relationship between them is worth exploring further.
In this tutorial, we will discuss correlation and how to find correlation in Google Sheets using the CORREL function.
What is Correlation?
Correlation is a measure of the relationship between two variables, telling us how much one variable changes in relation to another. Note that correlation does not necessarily indicate a dependency or a cause-and-effect link. It only indicates how two variables move together and how strongly they are related.
For example, we might have two variables, weekly salary and job satisfaction, and we might want to know whether a higher weekly salary tends to coincide with higher job satisfaction.
In such cases, a correlation coefficient (also known as the Pearson product-moment correlation coefficient) can be a good indicator.
What Does the Correlation Coefficient Tell About Your Data?
The Pearson product-moment correlation coefficient, r, gives information about the strength and direction of a linear relationship between two variables. Its value ranges from -1 to 1, where a higher absolute value indicates a stronger correlation.
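For reference, the standard textbook definition of Pearson's r for n paired observations can be written as below. The symbols x_i, y_i and the means x̄ and ȳ are notation introduced here for illustration; they do not appear elsewhere in this tutorial.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is positive when the two variables tend to sit on the same side of their means together and negative when they tend to sit on opposite sides, which is what gives r its sign.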
Different values of this measure indicate different types of relationships. For example:
- A positive correlation coefficient value means that the two variables are positively correlated. In other words, a positive change in one value is accompanied by a positive change in the other, as shown in the scatter chart below:
- A negative correlation coefficient value means that the two variables are negatively correlated. In other words, a positive change in one value is accompanied by a negative change in the other, as shown in the scatter chart below:
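- A correlation coefficient value of 0 means that the two variables have no linear correlation. In other words, an increase in one value is not consistently accompanied by either an increase or a decrease in the other. Note that the variables could still be related in a non-linear way.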
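To make the three cases above concrete, here is a minimal Python sketch that computes Pearson's r from scratch. It is only an illustration outside of Google Sheets: the function name `pearson_r` and all the data values are made up for this example. The third series is included to show that a coefficient of 0 only rules out a straight-line relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(xs) != len(ys):
        raise ValueError("both lists must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (numerator of r)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (under the square root in the denominator)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data, invented purely for illustration
x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 6]      # tends to go up as x goes up
falling = [6, 4, 5, 4, 2]     # tends to go down as x goes up
u_shaped = [4, 1, 0, 1, 4]    # y = (x - 3)^2: related to x, but not linearly

print(round(pearson_r(x, rising), 2))    # positive value
print(round(pearson_r(x, falling), 2))   # negative value
print(round(pearson_r(x, u_shaped), 2))  # zero, despite the clear pattern
```

The sign of the result tells you the direction of the linear trend, and the absolute value tells you how tightly the points follow it.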
In Google Sheets, we calculate the correlation coefficient value of a given dataset using the CORREL function.
Let us see how to find the r value in Google Sheets.
How to Do a Correlation in Google Sheets
The CORREL function in Google Sheets takes two arrays or ranges, one for each variable, and returns the correlation coefficient between them.
The syntax for the CORREL function is as follows:
CORREL(data_y, data_x)
- data_y is the array or range of cells corresponding to the dependent variable
- data_x is the array or range of cells corresponding to the independent variable
Note that text or blank values in either of the two sets are ignored by the CORREL function.
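The behavior described above can be sketched in Python as follows. This is not how Google Sheets implements CORREL internally. It is a rough imitation under one stated assumption: a row is skipped entirely when either of its two cells is text or blank (represented here as a string or `None`). The names `correl_like` and `is_number` and the sample values are hypothetical.

```python
import math

def is_number(value):
    # Ints and floats count as data; text, None (blank) and booleans do not.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def correl_like(data_y, data_x):
    """Pearson r over the rows where both cells are numeric (assumed behavior)."""
    if len(data_y) != len(data_x):
        # Stands in for the error Sheets reports when the ranges differ in size
        raise ValueError("both ranges must contain the same number of cells")
    pairs = [(y, x) for y, x in zip(data_y, data_x)
             if is_number(y) and is_number(x)]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two numeric pairs")
    mean_y = sum(y for y, _ in pairs) / n
    mean_x = sum(x for _, x in pairs) / n
    cov = sum((y - mean_y) * (x - mean_x) for y, x in pairs)
    var_y = sum((y - mean_y) ** 2 for y, _ in pairs)
    var_x = sum((x - mean_x) ** 2 for _, x in pairs)
    return cov / math.sqrt(var_y * var_x)

# Hypothetical columns: one text cell and one blank cell
salary = [500, 650, "n/a", 800, 900, 1000]
satisfaction = [3, 5, 6, None, 7, 8]

# Rows 3 and 4 are dropped, so only four pairs contribute to the result
print(correl_like(satisfaction, salary))
```

Dropping the whole row, rather than just the offending cell, keeps the remaining values paired with their correct partners.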
How to Use CORREL in Google Sheets (Examples)
Let us see a few examples to understand how the CORREL function is used in Google Sheets.
Consider the following two variables:
- Weekly Salary
- Job satisfaction
Let’s say you have the following set of data points for these two variables:
To find the correlation between these two variables, all that you need to do is use the CORREL function as follows:
Enter the above formula in a blank cell (say D2).
You should now see the correlation coefficient, which is 0.83 for this dataset. Since the value is strongly positive, a higher weekly salary tends to be accompanied by higher job satisfaction. This does not necessarily mean that a higher weekly salary causes higher job satisfaction, but it does indicate that the two variables might be related.
If we plot a scatter chart of the above dataset, we can clearly see a positive correlation between the two values (as shown by the line of best fit moving upwards).
Now let us consider another example. In the next dataset, we used the RANDBETWEEN function to randomly generate the values in column B:
When we use the following CORREL formula, the result we get is as expected:
Here the result is a correlation value of -0.05, which is very close to 0. This makes sense because we randomly generated the values in column B, so there is no real relationship between the variables x and y.
Similarly, if we plot a scatter chart of the above dataset, we can see no apparent correlation between the two values, and no line of best fit describes the points well.
Why does my formula return a #N/A error?
The CORREL function returns an #N/A error if the ranges provided in the two parameters have an unequal number of data points. So, if you’re getting an #N/A error, try to make sure that both data lists are of the same length.
What happens if either of my datasets has missing values?
The CORREL function usually ignores empty cells (if any) in the list. As long as both ranges contain the same number of cells, the function should work just fine.
How do you do a correlation in Google Sheets?
The CORREL function gives a good estimate of the correlation between two variables. To do a correlation, simply enter the array of data points for each variable as parameters of the CORREL function.
If the returned result is close to 1, then there is a strong positive correlation between the two variables. If it is close to -1, then there is a strong negative correlation between the two variables. If the answer is 0 or close to it (positive or negative), then the two variables have little to no linear correlation.
How do you find the correlation coefficient in Google Sheets?
You can use the CORREL function to find the correlation coefficient in Google Sheets.
What is a good correlation coefficient?
If you are looking for a strong relationship, a good correlation coefficient value is one that is close to 1 or -1. This indicates that the variables are strongly correlated.
How do you find residuals in Google Sheets?
Residuals are simply the differences between the actual and predicted values of a dependent variable. Use the Google Sheets SLOPE and INTERCEPT functions to find the line of best fit. Let’s say you find the slope of the line to be m and the intercept to be c. You can then find the predicted value of the dependent variable, y, for each value of the independent variable, x, using the formula y = mx + c. The residual for each data point is then actual y – predicted y.
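The residual calculation can be sketched in Python as below. This is only an illustration of the arithmetic, assuming an ordinary least-squares line of best fit. The helper names `slope_intercept` and `residuals` and the data are hypothetical. Residuals are taken here as actual minus predicted, which is the usual statistical convention; flip the sign if you prefer the other direction.

```python
def slope_intercept(xs, ys):
    """Least-squares slope m and intercept c of the line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    c = mean_y - m * mean_x
    return m, c

def residuals(xs, ys):
    """Actual y minus predicted y for every data point."""
    m, c = slope_intercept(xs, ys)
    return [y - (m * x + c) for x, y in zip(xs, ys)]

# Hypothetical data, invented purely for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]

m, c = slope_intercept(x, y)
print(m, c)
print(residuals(x, y))
```

A quick sanity check is that the residuals of a least-squares line add up to roughly zero, apart from floating-point rounding.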
In this tutorial, we used two examples to show you how to find the correlation coefficient in Google Sheets using the CORREL function. The function is really helpful for a quick analysis of whether two variables may be related and, if they are, how strongly.
We hope we succeeded in giving you a basic idea of correlation in Google Sheets and how to calculate the correlation coefficient in Google Sheets with the CORREL function.