Collect Google Analytics data in real-time with Microsoft Azure
Written by Chris Sutcliffe
In the first of a series of articles, we explore how to collect your Google Analytics events in close to real time, in a relatively simple and low-cost way, by leveraging Google’s existing free services and Microsoft Azure’s pay-as-you-go cloud services, so that you can start to gain valuable ownership of your data.
Google Analytics is extremely popular and free to use; however, it has some limitations, especially when your website or eCommerce site receives a large number of visitors daily. One major drawback is data sampling; another is being limited to the standard reports, with no easy way to combine your analytics data with other data across your organization.
One option is to upgrade to the paid version of Google Analytics (GA360) and export to Google BigQuery; however, most organizations cannot justify the cost. Another option is to develop custom web tracking in your own website code, along with your own data pipelines, which can incur significant development cost and time.
An alternative, which we discuss in this article, is to use Google Tag Manager (GTM) to duplicate the events sent to Google Analytics and send those copies to Azure Event Hubs. This gives you the flexibility to capture your raw traffic data and to build further data pipelines for close to real-time analysis and visualization, or to integrate the data into your existing data platform or data warehouse.
To enable this, there are just a few components required to get started:
A Google Analytics account
A Google Tag Manager account
A Microsoft Azure subscription with an Azure Event Hub
In this architecture, GTM effectively duplicates the events that would ordinarily be sent only to Google Analytics and sends them to another endpoint of your choosing, in our case Azure Event Hubs.
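In GTM the duplication itself would be written in JavaScript, and the exact tag setup is left to the next article. As a rough illustration only, the Python sketch below shows what one duplicated hit amounts to under our own assumptions: the GA hit payload (a URL-encoded query string such as `v=1&t=pageview&dp=%2Fhome`) is turned into a JSON body and POSTed to the Event Hubs REST endpoint, authorized with a Shared Access Signature (SAS) token signed with HMAC-SHA256. The namespace, hub, policy and key names are hypothetical, and the `Content-Type` and `api-version` values follow our reading of the Event Hubs REST "Send event" reference.

```python
import base64
import hashlib
import hmac
import json
import time
import urllib.parse
import urllib.request
from typing import Optional


def ga_hit_to_event(hit_payload: str) -> dict:
    """Turn a URL-encoded GA hit payload into a flat dict for a JSON body."""
    return dict(urllib.parse.parse_qsl(hit_payload, keep_blank_values=True))


def make_sas_token(namespace: str, hub: str, key_name: str, key: str,
                   ttl_seconds: int = 3600, now: Optional[float] = None) -> str:
    """Sign the URL-encoded resource URI plus expiry time with HMAC-SHA256."""
    resource = f"https://{namespace}.servicebus.windows.net/{hub}"
    encoded_uri = urllib.parse.quote_plus(resource)
    expiry = str(int((time.time() if now is None else now) + ttl_seconds))
    string_to_sign = f"{encoded_uri}\n{expiry}".encode("utf-8")
    digest = hmac.new(key.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    signature = urllib.parse.quote_plus(base64.b64encode(digest).decode("ascii"))
    return (f"SharedAccessSignature sr={encoded_uri}&sig={signature}"
            f"&se={expiry}&skn={key_name}")


def build_send_request(namespace: str, hub: str, token: str,
                       event: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST of one event to the hub."""
    url = (f"https://{namespace}.servicebus.windows.net/{hub}/messages"
           "?timeout=60&api-version=2014-01")
    return urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": token,
            # Content type as listed in the REST "Send event" reference.
            "Content-Type": "application/atom+xml;type=entry;charset=utf-8",
        },
    )


# Hypothetical hit and names, for illustration only.
hit = "v=1&tid=UA-XXXXX-Y&cid=555&t=pageview&dp=%2Fhome"
token = make_sas_token("my-namespace", "ga-events", "send-policy", "not-a-real-key")
request = build_send_request("my-namespace", "ga-events", token, ga_hit_to_event(hit))
# urllib.request.urlopen(request) would send it; omitted here.
```

Because anything that runs in the browser is public, a real deployment would want to avoid shipping a long-lived SAS key in the tag; a short-lived token or a small server-side relay is the usual compromise.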
The great thing about Event Hubs is its flexibility: multiple consumers can be set up to perform different functions, in batch or in real time.
For example, we can enable Event Hubs Capture to automatically write the streaming data in Event Hubs to an Azure Blob Storage or Azure Data Lake Storage Gen2 account, which lets you start building a history of your web events right away.
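Capture writes Avro files into your storage account using a configurable blob naming convention; as we understand it, the default is `{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}`. A small, hypothetical helper like the one below can then pick out the files for a given day when you build your history. It assumes that default format, a `.avro` suffix and UTC timestamps; reading the Avro contents needs an Avro library and is not shown.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class CaptureFile:
    namespace: str
    event_hub: str
    partition_id: str
    window_start: datetime  # assumed to be UTC


def parse_capture_path(blob_name: str) -> CaptureFile:
    """Parse '{Namespace}/{EventHub}/{PartitionId}/{Y}/{M}/{D}/{h}/{m}/{s}.avro'."""
    parts = blob_name.strip("/").split("/")
    if len(parts) != 9 or not parts[-1].endswith(".avro"):
        raise ValueError(f"not a default-format capture path: {blob_name!r}")
    namespace, hub, partition, y, mo, d, h, mi, s = parts
    seconds = s[: -len(".avro")]
    start = datetime(int(y), int(mo), int(d), int(h), int(mi), int(seconds),
                     tzinfo=timezone.utc)
    return CaptureFile(namespace, hub, partition, start)


def files_for_day(blob_names: Iterable[str], day: date) -> List[CaptureFile]:
    """Select the capture files for one UTC day, ordered by time then partition."""
    files = (parse_capture_path(name) for name in blob_names)
    return sorted((f for f in files if f.window_start.date() == day),
                  key=lambda f: (f.window_start, f.partition_id))
```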
In another scenario, an Azure Stream Analytics job can be configured to consume and transform the data and push it to a variety of outputs, such as a Power BI streaming dataset, an Azure SQL Database or Azure Cosmos DB. A full list of supported outputs is available in the Azure Stream Analytics documentation.
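A typical Stream Analytics job for this data might count page views per minute, which its SQL-like query language expresses with a `TumblingWindow`. To make those semantics concrete without an Azure job, the Python below is a local stand-in, not Stream Analytics itself: it buckets hypothetical `(timestamp, event)` pairs into fixed, non-overlapping one-minute windows and counts hits whose GA hit type `t` is `pageview`. It keys each window by its start for readability, whereas Stream Analytics stamps windowed output with the window end, and the boundary handling here is our own choice.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tumbling_counts(events: Iterable[Tuple[datetime, dict]],
                    window: timedelta = timedelta(minutes=1),
                    hit_type: str = "pageview") -> Dict[datetime, int]:
    """Count events of one GA hit type per fixed, non-overlapping window.

    Each event falls into exactly one window [start, start + window).
    """
    counts: Counter = Counter()
    for ts, event in events:
        if event.get("t") != hit_type:
            continue
        index = (ts - EPOCH) // window  # timedelta // timedelta -> int
        counts[EPOCH + index * window] += 1
    return dict(sorted(counts.items()))
```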
A more advanced approach is to use Azure Databricks with Structured Streaming to stream data directly from Event Hubs, or simply to use Databricks to develop pipelines that transform and analyze the data or to build machine learning models.
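The article does not prescribe a connector. One route we know of is the Kafka-compatible endpoint that Event Hubs exposes on port 9093, which Spark's built-in Kafka source can read. The helper below is a minimal sketch that only assembles the reader options. The namespace, hub and connection string are hypothetical; it assumes your Event Hubs tier offers the Kafka endpoint; and the `kafkashaded.` class prefix is what Databricks runtimes use (plain Apache Spark uses `org.apache.kafka...` without it). The dedicated Azure Event Hubs connector for Spark is an alternative not shown here.

```python
from typing import Dict


def eventhubs_kafka_options(namespace: str, hub: str, connection_string: str,
                            starting_offsets: str = "latest") -> Dict[str, str]:
    """Reader options for Spark's Kafka source against an Event Hubs namespace."""
    jaas = (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
        f'required username="$ConnectionString" password="{connection_string}";'
    )
    return {
        "kafka.bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "subscribe": hub,                     # the event hub acts as the topic
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
        "startingOffsets": starting_offsets,  # "latest" or "earliest"
    }
```

In a Databricks notebook the options would be passed as `spark.readStream.format("kafka").options(**opts).load()`, after which `col("value").cast("string")` yields the JSON text sent by the tag.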
As you can see, with very few components and at a relatively low cost, you can quickly start capturing your raw web traffic events in close to real time and process them however you choose, depending on your organization’s goals.
In our next article we’ll take a deep dive into setting up the Azure Event Hub and Google Tag Manager to start capturing data.