4 Great Ways to Future-Proof Your Data
One of the biggest questions I have gotten from clients lately is how to future-proof their data. Whether they are just starting out with a Data and Analytics strategy or are well into their third or fourth iteration, future-proofing data can be extremely difficult, especially when products were built without considering data or tracking at all. I have mentioned before that a key ingredient to future-proofing your data is making sure that people within your organization actually use it. Here are a few things to consider when trying to make sure your data is used, understood and disseminated within your organization.
1. Store It in the Same Place
This seems like a no-brainer, but a lot of companies I work with have a ton of disparate data in different places. One of the main issues is that workstreams around things like attribution data tend to be conceived separately from everything else, so that data ends up stored separately too. This makes it really difficult to understand how a system error might affect marketing data, for example. The future of data is a holistic look at customer behaviour, user research, backend data and marketing data.
Generally, product teams look at customer behaviour and user research, the data team looks at backend data, and marketing looks at marketing data. There are many problems with this separated approach. For one, marketing teams are rarely staffed with data-centric folks, and those who are data-centric tend to understand paid attribution and perhaps some user research from a marketing perspective. This limits an organization's ability to see the big picture of its data. Further, even if you have a data specialist on your marketing team, they tend to be siloed and have to go through a lot of red tape to access any additional data. In the same vein, backend data tends to be completely buried and accessible only to engineering teams and the odd technical product manager.
If your data is in the same place, people within your organization have access to the bigger picture and can start asking the right questions. This does not necessarily mean shoving everything into a data lake; there are many tools you can use to make sure an overview of your data is visible to the right people, so they can find the right solutions to their problems, whichever arm of the business they are in. Ultimately, if we are not looking at all of the data when asking questions like "why did a user drop off at this stage of the flow?", we are not accounting for all of the possibilities. If the data is not all in the same place and relatable, teams drive toward answers separately, often duplicating work and reaching dead ends when they try to ask more of the data.
2. Make it Relatable
Now that your data is in the same place, whether in Redshift or a similar warehouse, it is essential that the data is relatable. What I mean by this is that you should be able to relate a unique user in one data set to the same user in other data sets. This might seem straightforward, but it tends to be one of the most complex aspects of data engineering and data governance. With separate user data in separate places, it can be difficult to attribute user actions to the same person. This is where user sign-ups and logins are essential: they give you an ID unique to your organization that you can tie to what you consider a unique user. There is always going to be some discrepancy between data sets, especially because each tends to have its own user_id and to define unique users and sessions differently, but having a user_id distinct to your business is a great way to ladder up all of your disparate data to the same person.
Making sure that your data is in the same place and relatable also means that you are comparing apples to apples rather than going through a series of convoluted joins or, worse, a Python script with countless CSV imports and transformations. One way to make your data relatable is to make identity management a separate table in your data architecture. This table should exist solely to tie different unique IDs to the same user, with your user_id as the main identifier. The exercise of building it also helps align the business around what you consider a unique user, which data sets do or do not fit that definition, and why. Once you have established unique users, you can dedupe and tie them to other kinds of data so that your business is not only on the same page but also better able to align around common goals.
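As a rough illustration of that identity table, here is a minimal Python sketch. It assumes each source system (the names `web_analytics`, `ad_platform` and `backend` are hypothetical) gives you its own ID for a person, and that you already know which of those IDs belong to which business `user_id`, for example from login events. The `IdentityLink` type and the helper functions are made up for this example rather than taken from any particular tool; in a warehouse the same idea usually lives as a table that other data is joined against.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentityLink:
    source: str      # which system the ID comes from (hypothetical names below)
    source_id: str   # the ID that system uses for this person
    user_id: str     # your business's own user_id, the main identifier


def build_identity_map(links):
    """Map (source, source_id) -> user_id, refusing conflicting links."""
    identity = {}
    for link in links:
        key = (link.source, link.source_id)
        existing = identity.get(key)
        if existing is not None and existing != link.user_id:
            raise ValueError(f"{key} is linked to both {existing} and {link.user_id}")
        identity[key] = link.user_id
    return identity


def resolve_events(events, identity):
    """Attach user_id to each event; events with no known link get None."""
    return [
        {**event, "user_id": identity.get((event["source"], event["source_id"]))}
        for event in events
    ]


def count_unique_users(resolved_events):
    """Dedupe on user_id, ignoring events that could not be tied to a user."""
    return len({e["user_id"] for e in resolved_events if e["user_id"] is not None})


if __name__ == "__main__":
    # Hypothetical example data.
    links = [
        IdentityLink("web_analytics", "anon-123", "u1"),
        IdentityLink("ad_platform", "click-9", "u1"),
        IdentityLink("backend", "42", "u1"),
        IdentityLink("web_analytics", "anon-456", "u2"),
    ]
    events = [
        {"source": "web_analytics", "source_id": "anon-123", "name": "page_view"},
        {"source": "ad_platform", "source_id": "click-9", "name": "ad_click"},
        {"source": "backend", "source_id": "42", "name": "order_created"},
        {"source": "web_analytics", "source_id": "anon-456", "name": "page_view"},
        {"source": "web_analytics", "source_id": "anon-999", "name": "page_view"},
    ]
    resolved = resolve_events(events, build_identity_map(links))
    # Five events from five different source IDs, but only two known people.
    print(count_unique_users(resolved))  # 2
```

Raising an error on conflicting links, rather than silently overwriting them, is one way to surface disagreements about what counts as a unique user early instead of inside a dashboard.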
3. Choose a Great Analytics Tool
At this point, if you have all of your data stored in the same place and in a way that is easily relatable, you have the equivalent of a neatly organized basement. Data is ultimately not effective if it is not used by the business, and making sure that your data is used, and used often, is paramount to future-proofing it. This means choosing an analytics tool that is easy to use, even for the less data-savvy people on your team, and that is easily shareable. An analytics tool that lets you write descriptions underneath the charts and dashboards you make is also helpful when you are making a ton of charts and need to keep track of the assumptions and tradeoffs you are making.
As a note: whenever you make a chart, you are making assumptions, and there are always tradeoffs. Think of it this way: when you take a picture, you choose what is in frame, but the photo is only a representation of the full scene. You tend to focus on the gist, the salient aspects, but there are always going to be other things outside the frame. Making a chart or dashboard is the same: you are taking the data and making a picture of its salient aspects.
Part of future-proofing your data strategy is making sure this is well understood internally and that anyone sharing charts can articulate the assumptions and tradeoffs they have made. These tend to be small, like a specific date range or the current definition of a session, but they should be generally understood.
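To make the point about sessions concrete, here is a small Python sketch showing how the same raw events produce different numbers depending on one documented assumption. The 30-minute inactivity timeout and the helper names are hypothetical choices for illustration, not a standard that any particular analytics tool necessarily uses.

```python
from datetime import datetime, timedelta

# Hypothetical default: a new session starts after 30 minutes of inactivity.
# This is an assumption to record alongside any chart, not a universal rule.
SESSION_TIMEOUT = timedelta(minutes=30)


def count_sessions(timestamps, timeout=SESSION_TIMEOUT):
    """Count one user's sessions: a gap longer than `timeout` starts a new one."""
    ordered = sorted(timestamps)
    if not ordered:
        return 0
    sessions = 1
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > timeout:
            sessions += 1
    return sessions


def in_date_range(timestamps, start, end):
    """Keep events in [start, end); the date range is another assumption."""
    return [t for t in timestamps if start <= t < end]


if __name__ == "__main__":
    events = [
        datetime(2024, 1, 1, 9, 0),
        datetime(2024, 1, 1, 9, 20),
        datetime(2024, 1, 1, 10, 0),
        datetime(2024, 1, 2, 9, 0),
    ]
    print(count_sessions(events))                              # 3
    print(count_sessions(events, timeout=timedelta(hours=1)))  # 2
    first_day = in_date_range(events, datetime(2024, 1, 1), datetime(2024, 1, 2))
    print(count_sessions(first_day))                           # 2
```

Nothing about the underlying events changes between those calls; only the assumptions do, which is why they belong in the description under the chart.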
4. Document, Document, Document
I know it seems very obvious, but documenting this stuff is important, if not for people joining the business, then for your future self. Documentation tends to be left behind because it is extra work, but I promise you that when you are sitting in a meeting and someone asks you about a chart you shared six months ago, you will be glad you wrote it down. Beyond having a living and breathing data dictionary where you can quickly identify what each event represents, along with its associated event and user properties, you should also record your thought process when analyzing the data. As new information comes up or new context is set, a written record makes it easier to pivot and to track changing assumptions and tradeoffs.
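A data dictionary does not need to be elaborate to be useful. As one possible shape, here is a Python sketch where each event has a description, its expected event and user properties, and free-form notes for assumptions, so that incoming events can be checked against it. The `EventDefinition` structure, the `signup_completed` entry and `validate_event` are all hypothetical; many teams keep the same information in a spreadsheet, a wiki or a tracking-plan tool instead.

```python
from dataclasses import dataclass, field


@dataclass
class EventDefinition:
    name: str
    description: str                                       # what the event represents
    event_properties: dict = field(default_factory=dict)   # property -> expected type
    user_properties: dict = field(default_factory=dict)    # property -> expected type
    notes: list = field(default_factory=list)              # assumptions, history


# Hypothetical example entry.
DATA_DICTIONARY = {
    "signup_completed": EventDefinition(
        name="signup_completed",
        description="User finished the sign-up flow and received a user_id.",
        event_properties={"plan": str, "referrer": str},
        user_properties={"country": str},
        notes=["Example note: fired server-side, not from the browser."],
    ),
}


def validate_event(event, dictionary):
    """Return a list of problems with `event` compared to the data dictionary."""
    definition = dictionary.get(event.get("name"))
    if definition is None:
        return [f"undocumented event: {event.get('name')!r}"]
    problems = []
    props = event.get("properties", {})
    # Every documented property should be present with the documented type.
    for prop, expected_type in definition.event_properties.items():
        if prop not in props:
            problems.append(f"missing property: {prop}")
        elif not isinstance(props[prop], expected_type):
            problems.append(f"{prop} should be {expected_type.__name__}")
    # Anything sent but not documented is flagged too.
    for prop in props:
        if prop not in definition.event_properties:
            problems.append(f"undocumented property: {prop}")
    return problems


if __name__ == "__main__":
    event = {"name": "signup_completed", "properties": {"plan": "pro", "coupon": "X"}}
    print(validate_event(event, DATA_DICTIONARY))
    # ['missing property: referrer', 'undocumented property: coupon']
```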
Further, from an engineering perspective, there are always a ton of changes with implications for the data, and these should be documented. For example, if you are moving data from one place to another, it will create a break in the way you understand your data, and you should absolutely record this change. This also goes back to my point about thinking of the data holistically: engineering might make a change that indirectly affects marketing, and if the teams are completely siloed, one team might spend days or weeks trying to figure out why something changed when it actually had to do with an engineering change. Again, I recognize that documentation tends to be people's least favourite task, but it saves a lot of trouble for your future self and your future business.
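The same habit applies to engineering changes. Here is a minimal Python sketch of a shared change log that anyone, marketing included, can query for changes that touched a given data set during the period a chart covers. The `DataChange` record, the table names and the example entries are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataChange:
    changed_on: date
    team: str
    summary: str
    affected_tables: tuple  # data sets that may show a break after this change


def changes_affecting(log, table, start, end):
    """Return documented changes to `table` between start and end (inclusive), oldest first."""
    return sorted(
        (c for c in log if table in c.affected_tables and start <= c.changed_on <= end),
        key=lambda c: c.changed_on,
    )


if __name__ == "__main__":
    # Hypothetical example entries.
    change_log = [
        DataChange(date(2024, 3, 4), "engineering",
                   "Moved checkout events to a new pipeline",
                   ("checkout_events", "marketing_attribution")),
        DataChange(date(2024, 5, 20), "marketing",
                   "Renamed UTM campaign values",
                   ("marketing_attribution",)),
    ]
    for change in changes_affecting(change_log, "marketing_attribution",
                                    date(2024, 3, 1), date(2024, 3, 31)):
        print(change.changed_on, change.team, change.summary)
    # 2024-03-04 engineering Moved checkout events to a new pipeline
```

Tagging each change with the data sets it affects is what lets a team outside engineering find it without knowing where to look.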