
Successfully Using Data From Google Correlate
Google has an interesting new tool in Google Labs called Google Correlate.
What is Google Correlate?
“Google Correlate is an experimental new tool on Google Labs which enables you to find queries with a similar pattern to a target data series. The target can either be a real-world trend that you provide (e.g., a data set of event counts over time) or a query that you enter.”
Here is an example of one such search:
Queries correlated with “medical records” (each value is the correlation coefficient, r):
- 0.8836 non profit
- 0.8827 trustee
- 0.8791 requesting
- 0.8772 blood test
- 0.8731 excel add
- 0.8711 join
- 0.8703 a grant
- 0.8657 excel
- 0.8637 disability
- 0.8627 creditor
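Each number in the list above is a correlation coefficient between the search-volume series for “medical records” and the series for another query. The Python below is a rough sketch of that idea, not Google's implementation, which had to search an enormous number of queries efficiently. It computes Pearson's r between a target series and each candidate series, then sorts the candidates from most to least correlated. The function names, query labels and weekly volumes are hypothetical, made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(target, candidates, top_k=10):
    """Return (r, query) pairs for the top_k candidates most correlated with target."""
    scored = [(pearson_r(target, series), query) for query, series in candidates.items()]
    # Highest positive correlation first, like the list of queries above.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical weekly search volumes; NOT real Google Correlate data.
target = [10, 12, 15, 14, 18, 21, 20, 25]
candidates = {
    "query a": [5, 6, 8, 7, 9, 11, 10, 13],       # rises with the target
    "query b": [30, 28, 25, 27, 22, 20, 21, 17],  # falls as the target rises
    "query c": [7, 3, 9, 2, 8, 4, 6, 5],          # no clear pattern
}

for r, query in rank_by_correlation(target, candidates, top_k=3):
    print(f"{r:.4f} {query}")
```

With this made-up data, “query a” ranks first with r close to 1, “query c” lands near 0, and “query b” comes last with r close to -1.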
How large r must be before a correlation counts as statistically significant depends on your sample size. The concern here is the probability of committing a Type I error: concluding that a correlation exists when there really is none (a false positive). The significance level is usually set at 0.05, meaning we accept a 5 percent chance of declaring a correlation significant when no true correlation exists. For example, with a sample size of 10 a Pearson correlation of roughly .63 is needed to reach significance (p < .05), but with a sample size of 500 a correlation of only about .09 is significant.
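The sample-size effect described above can be checked with a short calculation. The sketch below uses the Fisher z-transformation, a standard large-sample approximation, rather than the exact t-test behind published significance tables. Its thresholds therefore differ slightly in the third decimal place: for n = 10 it gives about 0.630, versus 0.632 in the usual tables. The function names are hypothetical, and the `alpha` default assumes the conventional two-tailed 0.05 level.

```python
import math
from statistics import NormalDist


def critical_r(n, alpha=0.05):
    """Approximate smallest |r| that is significant (two-tailed) for sample size n.

    Uses the Fisher z-transformation: atanh(r) * sqrt(n - 3) is roughly
    standard normal when the true correlation is zero.
    """
    if n < 4:
        raise ValueError("the Fisher approximation needs n >= 4")
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z_crit / math.sqrt(n - 3))


def p_value(r, n):
    """Approximate two-tailed p-value for an observed Pearson r with sample size n."""
    if n < 4:
        raise ValueError("the Fisher approximation needs n >= 4")
    if abs(r) >= 1:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


for n in (10, 50, 500):
    print(f"n = {n:>3}: |r| must exceed about {critical_r(n):.3f}")
```

Running it prints thresholds of roughly 0.63, 0.28 and 0.09 for samples of 10, 50 and 500. A state-level dataset has only about 50 data points, so under this approximation r must exceed roughly 0.28 to pass the 0.05 test.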
A common misconception is that a high correlation implies causation. It does not: there could be a causal relationship between the variables, but correlation alone does not establish one. This is well worth keeping in mind before jumping to conclusions when confronted with values of r that are highly statistically significant. That said, correlation analyses can yield valuable information, from the strangely curious to the wildly humorous to the surprisingly insightful.
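One way a strong correlation can arise without any causal link between two variables is when both depend on a third factor. In the hypothetical sketch below, `x` and `y` are each built from a shared `driver` (standing in for something like state population; all numbers are invented) plus their own small fluctuations, and neither is computed from the other. Their Pearson r is high, but the first-order partial correlation, which removes the part of each series explained by the driver, drops to zero.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_r(xs, ys, zs):
    """Correlation of xs and ys after controlling for zs (first-order partial correlation)."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical per-state values. "driver" stands in for any shared factor;
# x and y each follow driver plus their own fluctuations, and neither is
# computed from the other.
driver = [1, 2, 3, 4, 5, 6, 7, 8]
wiggle_x = [1, -1, -1, 1, 1, -1, -1, 1]
wiggle_y = [1, -1, 1, -1, -1, 1, -1, 1]
x = [2 * d + w for d, w in zip(driver, wiggle_x)]
y = [5 * d + w for d, w in zip(driver, wiggle_y)]

print(f"r(x, y)          = {pearson_r(x, y):.4f}")
print(f"r(x, y | driver) = {partial_r(x, y, driver):.4f}")
```

Here r(x, y) comes out at about 0.97, while the partial correlation is zero apart from floating-point rounding: the whole association runs through the shared driver.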
Below are a few examples using state-level datasets I uploaded to discover search-volume correlations.
1) The dataset for the number of classic theaters and drive-ins by state correlated with “TV scripts” with r = 0.8461.
“Movie script” has a correlation of 0.9552 with “TV scripts”, which supports the idea that states with many classic theaters and drive-ins would also have more people interested in TV scripts: such states probably have more people interested in old movies, in the nostalgia attached to them, and in scripts in general.
“Lambda Phi Epsilon”, the largest Asian-interest fraternity in North America, also correlates highly with the same dataset (r = 0.8454). That pairing seems to have little real-world connection and serves as a reminder to avoid hasty causal inferences from high correlation values.
2) The dataset for people who lived in a different house one year ago, by state, correlated with “address forms” with r = 0.7531.
“How to use dry ice” correlates even more strongly with the same dataset (r = 0.7740), but there is probably little real-world connection there.
3) The dataset for divorce rate by state correlated with “find old friends” with r = 0.7991.
“Ninja art” also correlates highly with the same dataset (r = 0.7915).
4) The dataset for divorce rate by state correlated with “Net Detective” with r = 0.7863. Net Detective is a popular tool used for, among other things, online background checks and criminal record searches.
So there is definitely useful information to be gleaned from Google Correlate, but interpreting it correctly requires a measure of caution and common sense.
Michael Marshall is the chief executive of Internet Marketing Analysts. His full biography is available on PubCon.