Listening and data analysis: manage the impact of bots on website traffic data


How to effectively recognize and address the impact of bot traffic on website data analysis tools



In the vast and complex ecosystem of the Web, one factor that can negatively affect a website or application’s data analysis is so-called “bot traffic,” a term that often arouses concern and suspicion.

But what exactly does “bot traffic” mean?

In simple terms, it refers to any non-human traffic that interacts with a website or application. This traffic can come from a wide range of sources, from search engine and social media spiders to the crawlers used to index online content.

It is important to note that the concept of a bot is not inherently negative: it depends mainly on the purpose for which the bot is used. Some bots are designed to facilitate search engine indexing and improve a website’s online visibility, while others are used for fraudulent activities.

Bots used for content scraping, for example, can collect information from websites without permission, undermining intellectual property and violating users’ privacy. In addition, click fraud bots can artificially inflate clicks on online advertisements, damaging ad campaigns and misleading marketers about the true performance of their strategies.

In general, the distinction between “good” and “bad” bots depends on the intent behind them and the effect of their behavior on the website or application involved.

The impact of bot traffic on Google Analytics data

Bot traffic has a significant impact on the data collected through analytics tools, especially for websites that do not receive a high volume of traffic from human users. These sites are particularly susceptible to the influence of bots, as non-human traffic can skew key metrics such as the number of visits, the engagement rate, and session length. In addition, bot traffic can consume server resources, slowing page loads and degrading the user experience.

This is especially relevant for data analysis tools such as Google Analytics 4, which are designed to provide a clear view of user interactions with a website so that, through data analysis, decisions can be made to optimize the user experience and improve KPIs. Bot traffic can skew these metrics, undermining the accuracy and reliability of the data and making it difficult for website owners to get an accurate picture of their site’s performance, make informed decisions, and achieve their business goals.

Therefore, it is essential to implement preventive measures to recognize and filter bot traffic, ensuring the accuracy and usefulness of the data collected by analytics tools.

How to recognize it

To understand whether our website is affected by bot traffic, it is essential to pay attention to several key indicators. Bot traffic usually shows up as a spike in the data that does not correspond to any known advertising campaign or promotion. To test the hypothesis that it is non-human traffic, the data must be analyzed more thoroughly, looking in particular for typical signals such as:

  • sessions without engagement, which also cause a sharp drop in the engagement rate;
  • extremely short session durations, close to zero or in any case far below the average session length of real users;
  • an excessive volume of page views, often concentrated on specific, repetitive pages with the same URL. This behavior may indicate bots automatically navigating the website for indexing or other automated purposes;
  • unusual geographic origins: if traffic comes from regions or countries outside the site’s typical target audience, it could indicate bot traffic;
  • a lack of custom interaction events: bots typically don’t interact with a site the way a human user would; for example, they don’t fill out forms, click on links, or make purchases.

Identifying and analyzing these patterns is critical to distinguishing automated traffic and ensuring the integrity and reliability of data collected by analytics tools like Google Analytics 4.
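The signals above can be combined into a simple heuristic. The sketch below is illustrative only: the session fields and thresholds are assumptions, not a GA4 schema, and real detection should be tuned to each site's baseline.

```typescript
// Hypothetical session record; field names and thresholds are assumptions.
interface SessionRecord {
  durationSeconds: number;
  pageViews: number;
  uniqueUrls: number;
  country: string;
  customEvents: number; // form submits, link clicks, purchases, etc.
}

// Flags a session as likely bot traffic when several of the signals
// described above co-occur: near-zero duration, repetitive page views,
// an unexpected country, and no custom interaction events.
function looksLikeBot(s: SessionRecord, targetCountries: Set<string>): boolean {
  let signals = 0;
  if (s.durationSeconds < 1) signals++;                 // near-zero session length
  if (s.pageViews > 20 && s.uniqueUrls <= 2) signals++; // hammering the same URLs
  if (!targetCountries.has(s.country)) signals++;       // outside typical audience
  if (s.customEvents === 0) signals++;                  // no human-like interaction
  return signals >= 3; // require multiple signals to limit false positives
}
```

Requiring several signals at once is the key design choice: any single indicator (a short session, an unusual country) also occurs in legitimate human traffic.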

How to eliminate bot traffic

To reduce bot traffic, several strategies can be adopted, each aimed at filtering a specific type of traffic and depending on the information available. Below, we examine some typical cases to outline the different strategies.

In general, among digital marketing tools, the most effective for implementing filters that exclude bot traffic is tag management software such as Google Tag Manager, which lets you implement custom scripts that identify and filter bot traffic based on specific criteria, providing greater flexibility in managing non-human traffic.

The following are some approaches you can take to exclude non-human traffic from the data collected by analytics tools, depending on the information available to you.

  • One of the first approaches is to filter traffic based on known IP addresses, where available. For example, Cookiebot, a cookie management service, operates scanner bots that can be identified by their IP addresses, which Cookiebot publishes so that website owners can filter its traffic in their analytics tools. This practice is particularly useful for reducing the impact of known, identifiable bots. Many analytics platforms, including Google Analytics 4, offer IP-based traffic filtering, making it easy to exclude from the visit count not only bot traffic but also, for example, internal traffic.
  • Another variable that can be used to build a filter, if known, is the user agent. For example, Cookiebot publishes the user agent of its scanner, allowing website owners to add an exclusion in Tag Manager so that data collection tags do not fire when the site is loaded by that user agent. This approach identifies and blocks bots with greater precision, offering an additional line of defense.
  • Finally, it is possible to temporarily exclude bot traffic with foreign IPs by filtering traffic from countries inconsistent with the normal origin of the site’s users. This option is not recommended, as it is very coarse and also excludes some traffic from real users.
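The user-agent approach can be sketched as a small check like the one below, the kind of logic a Tag Manager custom variable would feed into a trigger exception. The patterns are placeholders: consult each vendor's documentation (Cookiebot, for instance, publishes its scanner's user agent) for the exact strings to match.

```typescript
// Illustrative user-agent patterns; the exact strings are assumptions
// and should be taken from each vendor's published documentation.
const BOT_UA_PATTERNS: RegExp[] = [
  /cookiebot/i,          // Cookiebot's consent scanner (assumed substring)
  /bot|crawler|spider/i, // generic crawler keywords
  /headlesschrome/i,     // headless browsers often used for automation
];

// Returns true when the user agent matches a known bot pattern.
// In Tag Manager, a variable like this would drive a blocking trigger
// so that measurement tags do not fire for these visits.
function isBotUserAgent(userAgent: string): boolean {
  return BOT_UA_PATTERNS.some((re) => re.test(userAgent));
}
```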

If this information is not available, exploring the data in analytics platforms usually reveals the sites from which visits to our website, including bot traffic, originate. If those visits consist of a single page view and trigger no custom tracking events, you can simply use Tag Manager to create an exclusion so that measurement tags do not fire when the traffic’s domain of origin is one of the domains identified as spam. By implementing these exclusions, site owners can minimize the distortions that unwanted bot traffic causes in the data.
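The referrer-domain exclusion can be sketched as follows. The blocklisted domains are placeholders for whatever spam sources the analytics data reveals, and in Tag Manager this logic would typically be expressed as a trigger exception on the referrer variable.

```typescript
// Placeholder blocklist: substitute the spam domains found in your
// analytics reports. These names are illustrative, not real sources.
const SPAM_REFERRER_DOMAINS = new Set(["spam-example.com", "fake-traffic.example"]);

// Extracts the referrer's hostname and checks it against the blocklist;
// when it matches, measurement tags would be prevented from firing.
function isSpamReferrer(referrer: string): boolean {
  if (!referrer) return false; // direct traffic has no referrer
  try {
    const host = new URL(referrer).hostname.replace(/^www\./, "");
    return SPAM_REFERRER_DOMAINS.has(host);
  } catch {
    return false; // malformed referrer string
  }
}
```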

However, there are situations where visits cannot be traced to a specific domain, which makes that traffic harder to filter. In some cases, the origin of the sessions is reported as “(not set)”, causing all of it to flow into the direct traffic category. In these circumstances filtering becomes more complex, because the usual exclusion based on the referring domain is only available for the user’s first interaction.

A potential solution in this situation is to create first-party cookies that record when the spam referrer loads the site and allow persistent identification over time. These cookies can then be used to mark that traffic as bot traffic and filter it out. This strategy requires a more sophisticated approach and more technical knowledge, but it can be effective when other filtering options do not apply.
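The cookie-based approach can be sketched as below. The cookie name and spam domains are assumptions, and the cookie store is modeled as a plain map so the logic is testable; in a browser the flag would be written to `document.cookie` instead.

```typescript
// Placeholder spam domain and cookie name; both are assumptions.
const SPAM_DOMAINS = ["spam-example.com"];
const COOKIE_NAME = "suspected_bot";

// On the first interaction the referrer identifies the spam source, so we
// persist a flag; on later pages, where the referrer shows up as
// "(not set)", the cookie alone marks the traffic so tags can be blocked.
function markAndDetectBot(referrer: string, cookies: Map<string, string>): boolean {
  const fromSpam = SPAM_DOMAINS.some((d) => referrer.includes(d));
  if (fromSpam) {
    // In a browser: document.cookie = "suspected_bot=1; max-age=31536000; path=/"
    cookies.set(COOKIE_NAME, "1");
  }
  return fromSpam || cookies.get(COOKIE_NAME) === "1";
}
```

The point of the cookie is persistence: the referrer is only visible once, but the flag survives across every subsequent page view of the same visitor.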

By adopting the appropriate strategies and combining these approaches, you can reduce the impact of bot traffic and preserve the integrity and reliability of the analytics data collected by websites.

23 May 2024 Emily Salamon


TAG: digital marketing