The First Industry-Wide Google Analytics Data Quality Study

A common refrain among practitioners is “GIGO”: garbage in, garbage out. That has always been the case when working with data, but it becomes especially important when the collected data goes on to be the input for further modelling, that is, for the more advanced data modelling and data science techniques that rely on good, clean data as their input.

Can you tell if your data is good…?

This study follows numerous years of applying rigorous and consistent auditing techniques to assess Google Analytics data for some of the largest and best-known brands in the world. The collated results form the first study to compare Google Analytics data quality industry-wide.

This video story brings together some of the extraordinary findings of my work. It’s a study of 75 enterprise websites using Google Analytics. I describe the audit methodology and display the results in a visual scorecard format (a summary of nearly 200 unit tests in total).

Results Summary

The findings are somewhat surprising (and depressing) in that they show the very poor quality of data that organisations are working with. These organisations were generally investing heavily in their data analysis, yet evidently they had taken their eye off the data quality ball. For example:

  • An average Quality Index score of only 35.7 out of 100.
  • One in five websites had a PII issue, i.e. they were collecting personally identifiable information in GA.
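
As an illustration of what a PII check can look like, the sketch below scans page paths for email-like strings, one common way personal information ends up in GA (for example, via a query parameter). It is a minimal sketch, not the audit methodology used in this study: the function name `find_pii_paths`, the simplified email pattern and the sample paths are all assumptions made for the example.

```python
import re
from urllib.parse import unquote

# Simplified email pattern; illustrative only, not a full address validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_pii_paths(page_paths):
    """Return the page paths that contain an email-like string."""
    flagged = []
    for path in page_paths:
        # Decode percent-encoding first so that "%40" is seen as "@".
        if EMAIL_RE.search(unquote(path)):
            flagged.append(path)
    return flagged


# Hypothetical page paths, as they might appear in a pages report.
sample_paths = [
    "/checkout/thanks?order=1234",
    "/newsletter/confirm?email=jane.doe%40example.com",
    "/account/reset?user=john@example.org&token=abc",
    "/products/shoes",
]

flagged = find_pii_paths(sample_paths)
```

Here `flagged` contains the second and third sample paths. A real audit would also need to cover other fields and other kinds of personal data, such as names or phone numbers.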

Finding bad data just got easier…

But it’s not all doom and gloom. The good news is that most data quality issues are relatively straightforward to fix – the hard part is finding such issues in the first place! This is because poor data is usually buried deep and looks identical to the other regular data points. However, finding and visualising bad data just got a whole lot easier for you with the automation provided by Verified Data 🙂

This video is a 20-minute walkthrough of my study in slide format. The results are the culmination of many years of meticulous work that ultimately inspired the automation of the whole process, i.e. Verified Data.

Find out if you can trust your data.

Get started with the Free trial of Verified Data. See what’s included.

Summary Slides

1. Who this study audited

The graphs show the type of website (its vertical) and the geolocation of its target audience. Note that all audited websites belong to enterprise organisations – mostly very well-known brand leaders that requested help with their Google Analytics.

Graph showing audited websites

2. Overall Results Distribution

This slide reveals the distribution of audited Quality Index scores. The maximum value is 100, which represents an ultimate best-practice implementation and setup of Google Analytics. That should always be your ambition. However, in reality the aim is to consistently maintain a Quality Score above 80.

The slide reveals:

  • A lowly average Quality Index score of 35.7
  • Only 12% of sites score above 50 – a score I insist on exceeding before analysing data
  • Only a single website scored above 70

Graph showing the Quality Score distribution and average

3. Quality Index Breakdown

Although all areas have problems, visitor segmentation is the most poorly understood and implemented feature of Google Analytics – only 7% of websites get segmentation right. Segmentation is a key requirement for any kind of in-depth data analysis. By default, GA has some great segmentation tools. However, these work at the session level – they do not tell you about your users, i.e. real people. Read my definition of what is tested with respect to visitor segmentation.

Overall results data quality
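
To see why session-level data says little about real people, the sketch below compares a conversion rate calculated per session with one calculated per user. It is an illustrative sketch only: the rows are made up, and it assumes a stable `user_id` is available for each session, which is not a given in every setup.

```python
from collections import defaultdict

# Hypothetical session-level rows: (user_id, session_id, converted).
sessions = [
    ("u1", "s1", False),
    ("u1", "s2", False),
    ("u1", "s3", True),
    ("u2", "s4", False),
    ("u2", "s5", False),
    ("u3", "s6", False),
]


def session_conversion_rate(rows):
    """Share of sessions that converted."""
    return sum(1 for _, _, converted in rows if converted) / len(rows)


def user_conversion_rate(rows):
    """Share of users with at least one converting session."""
    converted_by_user = defaultdict(bool)
    for user_id, _, converted in rows:
        converted_by_user[user_id] = converted_by_user[user_id] or converted
    return sum(converted_by_user.values()) / len(converted_by_user)


session_rate = session_conversion_rate(sessions)  # 1 of 6 sessions
user_rate = user_conversion_rate(sessions)        # 1 of 3 users
```

With these made-up rows, one of six sessions converts but one of three users does, so the two views tell quite different stories about the same traffic.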

Author: Brian Clifton (PhD).