Problem Solving with Null or “Other” Data

You have likely looked into your analytics platform, found that some events are coming through as null or “other”, and wondered where to start. It can be overwhelming. Is the data not properly defined, or is it missing? Do I need to pull in engineering, or can I solve this on my own? In my experience as a data advisor, most companies are dealing with both missing data and data logic issues. Companies often don’t fully understand their data logic, and an “other” value can simply mean that the event does not fit the logic of what they are querying. In these cases, we expect the data to be bucketed in “other”. Since clients have asked me for a strategy to approach these instances, I put together a quick guide to help companies and analysts tackle these problems.

Photo by Campaign Creators on Unsplash

Is Data Missing?

Whenever you see “other” or null in your analytics platform, one of the first things that comes to mind is whether the data is missing because the event is not firing correctly. Before going into the backend or frantically pinging engineers, first work out whether data is actually missing or whether the issue lies in the data logic.

Is it the Data Logic?

Sometimes event properties are “other” because we have not defined them. In these cases, the code logic is searching for tags that don’t exist on pieces of data, so the third-party tracking tool automatically buckets them as “other”. For example, with UTM data, direct traffic won’t be recorded as such unless it is explicitly set up. When a user arrives at the site without clicking a marketing link, there are no parameters in the URL to say that the traffic is direct. In this case null or “other” data is expected. Of course there are other filters you can use to discern this, but if you are just filtering by UTM Source or UTM Medium and wondering why there is an “other” or null, it is usually because visits from links without UTM parameters tend to be bucketed into “other”.
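To make the UTM example concrete, here is a minimal sketch of defining a bucket explicitly instead of letting untagged visits fall into “other”. The function name `classify_source`, the `referrer` argument and the labels `direct` and `untagged_referral` are assumptions for illustration, not how any particular analytics tool works.

```python
from urllib.parse import urlparse, parse_qs


def classify_source(landing_url: str, referrer: str = "") -> str:
    """Label a visit by utm_source, with explicit buckets for untagged traffic."""
    params = parse_qs(urlparse(landing_url).query)
    values = params.get("utm_source")  # parse_qs drops empty values by default
    if values and values[0].strip():
        return values[0].strip().lower()
    # No UTM tag: these are the visits that would otherwise show up as null/"other".
    # Assumption: no referrer means direct traffic; anything else is an untagged referral.
    return "direct" if not referrer else "untagged_referral"


tagged = classify_source("https://example.com/?utm_source=Newsletter&utm_medium=email")
direct = classify_source("https://example.com/pricing")
untagged = classify_source("https://example.com/pricing", "https://www.google.com/")
```

With a rule like this in place, the remaining “other” values are the ones worth investigating, because the expected ones already have a name.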

Has the user enabled Do Not Track?

Do Not Track on a browser is a big gotcha, and I have even seen developers fall into this trap. Many browsers offer the option to not be tracked which, when turned on and honored by your tracking tool, prevents tracking code from accurately picking up certain events and event properties. As a result, you will see nulls when it comes to location, IP address or anything that is typically considered a User Property. Importantly, users with Do Not Track on aren’t completely invisible. We can see that an event has taken place, but not who triggered it or any info about the user tied to that event. The event will still fire and will be included in the overall numbers, but when digging deeper into the data, user properties will be missing. Do Not Track features have historically been sparsely used, so if you notice your number of nulls is abnormally high, it is worth investigating whether or not data is misfiring.

How do I dig into missing data?

Look at the data from different angles when investigating where the data might be leaking. For example, if I have explicitly set up the event to record OS and some values are null, I try to see if other related data is present, such as device or device family. You can make this clearer by using your third-party analytics tool to filter the data where OS = null and then break the data down by device. If there is device data coming through, it might be that the OS is missing from certain devices or that users have intentionally blocked that data. If no device data is coming through either, then you may have your culprit and a place to direct your investigation.
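If you have exported the raw events, the same “filter where OS = null, then break down by device” check can be sketched in a few lines. The event shape (dictionaries with `os` and `device` keys) and the sample rows are made up for illustration; your export will have its own field names.

```python
from collections import Counter


def breakdown_where_missing(events, missing_prop, breakdown_prop):
    """Count events where missing_prop is null, grouped by breakdown_prop."""
    counts = Counter()
    for event in events:
        if event.get(missing_prop) is None:
            # Keep events where the related property is missing too in their own bucket.
            counts[event.get(breakdown_prop) or "(also missing)"] += 1
    return counts


# Hypothetical sample data, not real results.
sample_events = [
    {"event": "page_view", "os": "iOS", "device": "iPhone"},
    {"event": "page_view", "os": None, "device": "Smart TV"},
    {"event": "page_view", "os": None, "device": "Smart TV"},
    {"event": "page_view", "os": None, "device": None},
]

missing_os_by_device = breakdown_where_missing(sample_events, "os", "device")
```

A device that dominates the result points to a device-specific gap, while a large “(also missing)” bucket suggests the event itself is losing its properties.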

When problem solving with data, it is important to frame the initial step as an investigation. When sifting through data and its underlying logic, you will often uncover a ton of insights and issues, and it can be easy to get distracted. Whenever you find that data might be missing, create an investigation task so that you can stay focused and have a place to capture insights specifically related to that issue. As you uncover new issues, I always suggest recording them with the intent of coming back to them later. You would be surprised how often fixing one data leak also fixes others.

Is the tracking code present?

Another way to narrow down your investigation is to filter for URL or Page Name. This will allow you to home in on specific pages to understand whether or not tracking code has even been implemented on that page. Since products and features tend to be built with tracking code as an afterthought, your data leak could be because a new page or feature was built without any tracking code. For website-based analytics, an effective way to check this is with a browser extension. Most big third-party analytics tools have them, and they can quickly show whether the tracking code is present and, further, whether events are even firing.
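As a complement to the browser extension, you can compare the pages you expect to be tracked against the pages that actually appear in your event data. This is a rough sketch; the `page` field, the list of expected pages and the sample events are assumptions for illustration.

```python
from collections import Counter


def pages_without_events(expected_pages, events):
    """Return the expected pages that never appear in the event data."""
    seen = Counter(e.get("page") for e in events if e.get("page"))
    # A Counter returns 0 for keys it has never seen.
    return [page for page in expected_pages if seen[page] == 0]


# Hypothetical inputs: a sitemap-style list and a small event export.
expected = ["/home", "/pricing", "/new-feature"]
events = [
    {"event": "page_view", "page": "/home"},
    {"event": "page_view", "page": "/pricing"},
    {"event": "page_view", "page": "/home"},
]

untracked = pages_without_events(expected, events)
```

A page with no events is not proof that tracking code is absent (it may simply have had no visitors in the period), but it is a good candidate to open with the extension first.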

Tracking the Fixes

Just as important as fixing the data leaks is tracking when those fixes took place. You can typically do this in a straightforward way in your third-party analytics platform by adding notes to timelines, or you can choose to record them elsewhere. The important thing is that you record major changes. Since you will have to wait for more data to come in to truly understand whether or not the fix took, it is always good to track dates of action so that you can properly understand and adapt.
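Once the date of a fix is recorded, checking whether it took can be as simple as comparing the share of nulls before and after that date. The sketch below assumes each event carries a `date` field and uses made-up sample rows; it is an illustration, not a feature of any specific platform.

```python
from datetime import date


def null_rate(events, prop):
    """Fraction of events where prop is null, or None if there are no events."""
    if not events:
        return None
    missing = sum(1 for e in events if e.get(prop) is None)
    return missing / len(events)


def null_rate_around_fix(events, prop, fix_date):
    """Return (null rate before the fix, null rate on or after the fix)."""
    before = [e for e in events if e["date"] < fix_date]
    after = [e for e in events if e["date"] >= fix_date]
    return null_rate(before, prop), null_rate(after, prop)


# Hypothetical data: the fix is assumed to have shipped on 10 March.
events = [
    {"date": date(2021, 3, 8), "os": None},
    {"date": date(2021, 3, 8), "os": "iOS"},
    {"date": date(2021, 3, 9), "os": None},
    {"date": date(2021, 3, 9), "os": "Android"},
    {"date": date(2021, 3, 10), "os": "iOS"},
    {"date": date(2021, 3, 11), "os": "Windows"},
]

rate_before, rate_after = null_rate_around_fix(events, "os", date(2021, 3, 10))
```

Give the "after" window enough time to collect a meaningful number of events before concluding that the fix worked.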

One of the most integral ideas of data is that it flows and depends on users and user behaviour to exist. Since platforms typically have hundreds of thousands or even millions of users, it becomes increasingly difficult to manage your data and to make sense of it. Think of your Growth Specialist or Data Advisor as both the navigator and the aqueduct: we are here to help you build and fortify your data pipelines but also to help you navigate them. Ultimately, “other” and null data is a surefire way to confuse or undermine the data you are presenting to stakeholders. This is why null or “other” data needs to be understood or fixed quickly. Hopefully this has helped you steer your investigation and, as always, feel free to add questions in the comments.

Data and Analytics Strategist and Consultant. Product Manager at TWG