Date: 2021-11-30



Google Analytics can provide a lot of insight into the traffic and users visiting your website. Much of this data is available in a convenient format in the web console, but what if you want to build your own charts or visualizations, process the data further, or simply work with it programmatically? That’s where the Google Analytics API comes in. This article describes how to use it to query and process real-time analytics data in Python.

Before committing to a particular Google API, it is a good idea to try a few of them out. The Google API Explorer lets you find out which API is most useful to you, which also tells you which API to enable in the Google Cloud console.

I was interested in real-time analytics data, so I’ll start with the Real Time Reporting API. Its API Explorer is available here. To find other interesting APIs, check the reporting landing page; from there, you can navigate to the other APIs and their explorers.

For this particular API to work, you need to provide at least two values: ids and metrics. The first is the so-called table ID, which is the ID of your analytics profile (view). To find it, go to your Analytics dashboard, click Admin in the bottom left, and choose View Settings. The ID is displayed in the View ID field. For this API, you need to provide the ID prefixed with ga: (that is, ga: immediately followed by the view ID).

The other required value is a metric. You can choose one from the metrics columns here. For the real-time API, you will want either rt:activeUsers or rt:pageviews.
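As a small illustration of these two values, the sketch below formats a view ID into the ga:-prefixed table ID and checks that a metric carries the rt: prefix. The helper names to_table_id and check_realtime_metric are made up for this example and are not part of any Google library, and the view ID is a placeholder. It also assumes view IDs are purely numeric.

```python
def to_table_id(view_id):
    """Format a Google Analytics view ID as the table ID the API expects."""
    view_id = str(view_id).strip()
    # Assumption: view IDs consist of digits only.
    if not view_id.isdigit():
        raise ValueError(f"view ID should be numeric, got {view_id!r}")
    return f"ga:{view_id}"


def check_realtime_metric(metric):
    """Return the metric unchanged if it uses the real-time 'rt:' prefix."""
    if not metric.startswith("rt:"):
        raise ValueError(f"real-time metrics start with 'rt:', got {metric!r}")
    return metric


# "123456789" is a placeholder, not a real view ID.
params = {
    "ids": to_table_id("123456789"),
    "metrics": check_realtime_metric("rt:activeUsers"),
}
print(params)
```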

After setting these values, click Execute to inspect the data. If the data looks good and you decide this is the API you need, it’s time to enable it and set up a project.

To be able to access the API, you first need to create a project in Google Cloud. To do this, go to Cloud Resource Manager and click Create Project. Alternatively, you can do it from the CLI with gcloud projects create $PROJECT_ID. After a few seconds, the new project will appear in the list.

Next, you need to enable the API for this project. All available APIs are in the API library. If you are interested in the Google Analytics Reporting API, you can find it here.

The API is now ready to use, but you still need credentials to access it. There are several types of credentials, depending on the type of application. Most of them suit applications that require user consent, such as client-side apps and Android/iOS apps. The one that fits our use case (querying data and processing it locally) is a service account.

To create a service account, go to the Credentials page, click Create Credentials, and choose Service account. Give it a name and make a note of the service account ID (the second field); you will need it soon. Click Create and Continue (there is no need to grant the service account any access or permissions).

Next, on the Service Accounts page, select the newly created service account and go to the Keys tab. Click Add Key, then Create new key. Select the JSON format and download the key. Keep it safe, as it can be used to access your project in your Google Cloud account.

You now have a project with the API enabled and a service account with credentials for programmatic access. However, this service account does not yet have access to your Google Analytics view, so it cannot query your data. To fix this, add the service account ID noted above (it has the form of an email address) as a Google Analytics user with Read & Analyze access. Here is a guide to adding users.

Finally, you need to install the Python client libraries to use the API. You need two of them: one for authentication and one for the actual Google APIs.

Once that’s all done, let’s write the first query.

First, authenticate against the API using the JSON key of the service account (downloaded earlier) and restrict the scope of the credentials to the read-only Analytics API. Then build the service used to query the API. The build function takes the name of the API, its version, and the previously created credentials object. If you want to access another API, see this list for available names and versions.
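The authentication and build steps described above might look roughly like the following sketch. It assumes the google-auth and google-api-python-client packages are installed. The SA_KEY_PATH environment variable, the default file name, and the helper names are made up for illustration.

```python
import os

# Read-only scope for the Google Analytics APIs.
READONLY_SCOPE = "https://www.googleapis.com/auth/analytics.readonly"


def key_path_from_env(default="service-account-key.json"):
    """Locate the JSON key file; SA_KEY_PATH is a hypothetical variable name."""
    return os.environ.get("SA_KEY_PATH", default)


def build_analytics_service(key_path):
    """Authenticate with a service account key and build the Analytics v3 service."""
    # Imported here so this module still loads where the Google
    # libraries are not installed.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    credentials = service_account.Credentials.from_service_account_file(
        key_path, scopes=[READONLY_SCOPE]
    )
    # Arguments: API name, API version, credentials object.
    return build("analytics", "v3", credentials=credentials)
```

With a real key file in place, service = build_analytics_service(key_path_from_env()) would return the object used for queries.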

Finally, you can query the API with the ids, metrics, and optional dimensions set, just as you did in the API Explorer earlier. You may be wondering where the methods of the service object (.data().realtime().get(...)) come from: they are all documented here.
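A minimal sketch of such a query follows. It assumes service is an Analytics v3 service object created with the client library's build function; the function name query_realtime and the placeholder table ID in the usage comment are illustrative only.

```python
def query_realtime(service, table_id, metrics, dimensions=None):
    """Run a real-time query and return the response as a dictionary."""
    params = {"ids": table_id, "metrics": metrics}
    if dimensions:
        # dimensions is optional, so only send it when it is set
        params["dimensions"] = dimensions
    return service.data().realtime().get(**params).execute()


# Usage (needs a real service object and a view the service account can read):
# result = query_realtime(service, "ga:123456789", "rt:pageviews", "rt:pagePath")
# print(result)
```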

When I run the above code, print(...) outputs the raw response returned by the API.

It works, but since the result is a dictionary, you will usually want to access its individual fields.
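As a sketch of what accessing individual fields could look like, the helper below pairs each row with the names from the column headers. The sample dictionary is a made-up fragment that only mimics the shape of a response (columnHeaders, totalsForAllResults, rows); it is not real output, and rows_as_dicts is a hypothetical helper name.

```python
def rows_as_dicts(result):
    """Pair every row of a response with the names from its column headers."""
    names = [header["name"] for header in result.get("columnHeaders", [])]
    # "rows" may be missing when the query matched no data.
    return [dict(zip(names, row)) for row in result.get("rows", [])]


# Made-up response fragment for illustration only, not real API output.
sample = {
    "columnHeaders": [
        {"name": "rt:pagePath", "columnType": "DIMENSION", "dataType": "STRING"},
        {"name": "rt:pageviews", "columnType": "METRIC", "dataType": "INTEGER"},
    ],
    "totalsForAllResults": {"rt:pageviews": "3"},
    "rows": [["/", "2"], ["/blog/", "1"]],
}

for row in rows_as_dicts(sample):
    # Values arrive as strings, so convert metrics before doing arithmetic.
    print(row["rt:pagePath"], int(row["rt:pageviews"]))
print("total:", sample["totalsForAllResults"]["rt:pageviews"])
```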

The previous example shows how to use the API’s realtime() method, but there are two more you can use. The first of them is ga().

This method returns historical (non-real-time) data from Google Analytics. It also accepts more arguments, which let you specify the time range, sampling level, segment, and so on. In addition, it has two more required arguments: start_date and end_date.
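A hedged sketch of a historical query is shown below. It assumes service is an Analytics v3 service object; query_historical is an illustrative name, and the metric, dimension, and date values in the usage comment are examples rather than the ones from the original article.

```python
def query_historical(service, table_id, start_date, end_date, metrics, **optional):
    """Query historical data through ga().

    Extra keyword arguments (for example dimensions or segment)
    are passed through to the API call unchanged.
    """
    return service.data().ga().get(
        ids=table_id,
        start_date=start_date,
        end_date=end_date,
        metrics=metrics,
        **optional,
    ).execute()


# Usage (needs a real service object); dates can be YYYY-MM-DD or
# relative values such as "7daysAgo" and "today":
# result = query_historical(service, "ga:123456789", "7daysAgo", "today",
#                           "ga:sessions", dimensions="ga:country")
```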

You may also have noticed that the metrics and dimensions for this method are slightly different, as each API has its own set of metrics and dimensions. They are always prefixed with the name of the API: in this case ga: instead of the earlier rt:.

The third available method, .mcf(), is for Multi-Channel Funnels data and is beyond the scope of this article. Check the documentation if it sounds useful to you.

The last thing to mention about basic queries is pagination. A query that returns a large amount of data can exhaust query limits and quotas, or cause problems when you try to process all the data at once. To avoid this, you can use pagination.

For example, passing start_index='1' and max_results='2' forces pagination. The response then contains previousLink and nextLink fields, which can be used to request the previous and next pages, respectively. However, this does not work for real-time analytics using the realtime() method, because it lacks the required arguments.
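One possible way to walk through all pages is sketched below. It keeps requesting pages for as long as the response contains nextLink. The fetch_page callable is an assumption of this sketch: it stands for any function that issues one ga() request with the given start_index and max_results and returns the response dictionary.

```python
def iter_pages(fetch_page, page_size):
    """Yield response pages until the API stops returning a nextLink.

    fetch_page(start_index, max_results) is a caller-supplied function
    that performs one request and returns the response dictionary.
    """
    start_index = 1  # start_index is 1-based
    while True:
        page = fetch_page(start_index, page_size)
        yield page
        if "nextLink" not in page:
            break
        start_index += page_size
```

In practice, fetch_page would wrap a call such as service.data().ga().get(...) with start_index and max_results filled in from its two arguments, followed by execute().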

The API itself is very simple. The most customizable parts are the arguments such as metrics and dimensions. Now let’s take a closer look at all the arguments and their possible values to see how to get the most out of this API.

Starting with metrics, there are three important values to choose from: rt:activeUsers, rt:pageviews, and rt:screenViews.

rt:activeUsers - the number of users currently browsing the website, and their attributes
rt:pageviews - the pages users are currently viewing
rt:screenViews - same as pageviews, but only relevant within applications such as Android and iOS apps

You can use a set of dimensions to break down the data for each metric. There are too many to list here, so let’s instead look at some combinations of metrics and dimensions that you can plug into the example above to get interesting information about your website’s visitors.

metrics='rt:activeUsers', dimensions='rt:userType' - currently active users, split into new and returning.
metrics='rt:pageviews', dimensions='rt:pagePath' - current pageviews, broken down by path.
metrics='rt:pageviews', dimensions='rt:medium,rt:trafficType' - pageviews broken down by medium (such as email) and traffic type (such as organic).
metrics='rt:pageviews', dimensions='rt:browser,rt:operatingSystem' - pageviews broken down by browser and operating system.
metrics='rt:pageviews', dimensions='rt:country,rt:city' - pageviews broken down by country and city.

As you can see, there is a lot of data that can be queried, so much that you may need to filter it. You can use the filters argument to filter the results. The syntax is very flexible and supports comparison and logical operators, as well as regular expression matching. Let’s look at some examples.

rt:medium==ORGANIC - show only page visits from organic search.
rt:pageviews>2 - show only results with more than 2 pageviews.
rt:country=~United.*,rt:country==Canada - show only visits from countries starting with “United” (UK, US) or from Canada (, acts as the OR operator; for AND, use ;).

See this page for complete documentation on filters.
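To make the OR and AND separators concrete, here is a small sketch that assembles filter strings from individual conditions. The helper names any_of and all_of are invented for this example. It does not escape commas or semicolons inside values.

```python
def any_of(*conditions):
    """Combine filter conditions with OR (a comma)."""
    return ",".join(conditions)


def all_of(*conditions):
    """Combine filter conditions with AND (a semicolon)."""
    return ";".join(conditions)


organic_only = "rt:medium==ORGANIC"
united_or_canada = any_of("rt:country=~United.*", "rt:country==Canada")

# OR is evaluated before AND, so this reads as:
# organic AND (country starts with "United" OR country is Canada)
combined = all_of(organic_only, united_or_canada)
print(combined)
```

The resulting string can be passed as the filters argument of a query.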

Finally, you can also use the sort argument to order the results, making them a little more readable and easier to process. For example, sort=rt:pagePath sorts in ascending order, while prefixing the field with -, as in sort=-rt:pageTitle, sorts in descending order.
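The sort value can also be built programmatically. The sketch below uses a hypothetical helper, sort_expression, that takes (field, descending) pairs and joins several sort keys with commas.

```python
def sort_expression(*keys):
    """Build a sort value from (field, descending) pairs."""
    return ",".join(
        f"-{field}" if descending else field for field, descending in keys
    )


ascending = sort_expression(("rt:pagePath", False))
descending = sort_expression(("rt:pageTitle", True))
# Most viewed first, ties ordered by path.
both = sort_expression(("rt:pageviews", True), ("rt:pagePath", False))
print(ascending, descending, both)
```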

If you can’t find some data, or if the Real Time Reporting API is missing a feature you need, try one of the other Google Analytics APIs. One option is Reporting API v4, which offers a few improvements over the older API.

However, the way you create queries is a bit different, so let’s look at an example to get you started.

Unlike the previous API, this one does not take many separate arguments. Instead, it has a single body argument that takes a request body containing all the values we saw earlier.

If you want to dig deeper into this, check out the samples in the documentation, which give a complete overview of its features.
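As a rough sketch of what such a request body can look like, the code below builds one for the batchGet method of Reporting API v4. The function names, the placeholder view ID, and the chosen metric and dimension are assumptions for illustration. The service object is assumed to come from build("analyticsreporting", "v4", credentials=...).

```python
def report_request_body(view_id, start_date, end_date, metrics, dimensions=()):
    """Build the body for reports().batchGet() of Reporting API v4."""
    return {
        "reportRequests": [
            {
                "viewId": str(view_id),
                "dateRanges": [{"startDate": start_date, "endDate": end_date}],
                "metrics": [{"expression": m} for m in metrics],
                "dimensions": [{"name": d} for d in dimensions],
            }
        ]
    }


def run_report(service, body):
    """Send the request body and return the response dictionary."""
    return service.reports().batchGet(body=body).execute()


# "123456789" is a placeholder view ID.
body = report_request_body(
    "123456789", "7daysAgo", "today", ["ga:sessions"], ["ga:country"]
)
print(body)
```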

This article only shows how to use the Analytics APIs, but since all the APIs in the client library share the same general design, it should give you a general idea of how to use any Google API in Python. In addition, the authentication shown above applies to any of the APIs; only the scope needs to change.

This article used the google-api-python-client library, but Google also provides lightweight libraries for individual services and APIs at https://github.com/googleapis/google-cloud-python. At the time of writing, the library for Analytics is still in beta and lacks documentation, but once it reaches GA (general availability) or becomes more stable, it is probably worth exploring.

