Data is the new oil
Data is the new oil, or so it is stated almost everywhere. What that means is open to interpretation: it could mean that unlocking data is the key to driving revenue, or that data is buried beneath many layers and difficult to extract.
I enjoy working with, analysing and presenting data, and I see that as a point of difference in the approach I’ve taken when leading projects. This is the role of an architect boiled down to its essence: understand and define the goal, observe the current situation, and use a lifetime of experience and expertise to plot a course. Once you’ve settled on that course yourself, use the data, tools and analysis to bring everyone on a journey to understand the what, how and why. That could mean anything from driving a whiteboard to building a BI dashboard or plotting a Wardley map.
The projects that live long in the memory are those that have relied on a team’s skill at deciphering complicated infrastructure, applications and business processes, or on the delivery of something remarkable toward a common good. Being able to pull together the data from source, analyse it and make decisions based upon that analysis is something that sets apart the projects, products and organisations that succeed from those that do not.
‘Data is the new oil’ is not particularly helpful as marketing phrases go. However, it is a subtle way of framing the issue that many organisations face. Data might well be the new oil, but if there is no plan for it when it is surfaced then, to push the analogy further, all that is accomplished is the creation of a hazardous, flammable mess.
Data, Information, Knowledge, Wisdom
I’m much more fond of the maxim Data, Information, Knowledge and Wisdom, also referred to as the DIKW pyramid.
The DIKW pyramid refers to models that represent the relationships between data, information, knowledge and wisdom. Information is usually defined in terms of data, knowledge in terms of information, and wisdom in terms of knowledge. As a model, it proposes that an understanding of a subject, topic or process is built by passing through each stage of DIKW in turn.
I have access to the biggest library in history, with all the data imaginable: does that make me an oracle or a visitor in the library? I have read everything available on a subject: does that make me an expert or an enthusiast? I’ve been tested and can recall every aspect of a topic from memory: am I wise or knowledgeable? If not wise, where does wisdom come from?
An often-quoted phrase here in the UK is “knowledge is knowing that a tomato is a fruit; wisdom is knowing not to put it in a fruit salad”. Does wisdom therefore come from asking why the data might look a certain way, and from the practical application of knowledge?
Data, information and knowledge are by themselves of limited value. As remarked earlier in this post, all the data in the world might be available, but if you don’t know how to interpret it or which actions to take, then what is being accomplished?
So is wisdom demonstrated through the interpretation of data and the application of knowledge?
Data, Information, Knowledge, Workflow
If it holds true that wisdom is derived from the practical application of knowledge, then the goal of drilling for data should be to link that data, information and knowledge into workflows that act upon it. That could be as simple as knowing that when a particular event happens within a system, the system needs to react in a certain way. Capturing that event, piece of data or information and ensuring it triggers a workflow that carries out the required action should be the goal that makes the statement ‘data is the new oil’ true.
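As a rough illustration of linking an event to a workflow, here is a minimal Python sketch. Every name in it (on_event, dispatch, the disk_usage_high event and the expand_volume action) is hypothetical and invented for this example; it is not taken from any particular workflow product.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical registry: maps an event type to the workflows that act on it.
_workflows = defaultdict(list)


def on_event(event_type: str) -> Callable:
    """Register a function as a workflow for one type of event."""
    def register(handler: Callable) -> Callable:
        _workflows[event_type].append(handler)
        return handler
    return register


def dispatch(event: dict) -> list:
    """Run every workflow registered for the event and collect the actions taken."""
    return [handler(event) for handler in _workflows[event["type"]]]


@on_event("disk_usage_high")
def expand_volume(event: dict) -> str:
    # A real workflow would call out to the platform here; this one only
    # describes the action so the sketch stays self-contained.
    return f"expand volume on {event['host']}"
```

Calling dispatch({"type": "disk_usage_high", "host": "db01"}) runs the registered workflow, while an event type with no registered workflow simply produces an empty list of actions.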
Therefore, if an organisation has not created a mechanism by which workflows can be created to act on inputs from data, it will be at a disadvantage.
Working through a scenario
I’ve built a couple of projects that have focused on public policing data, in particular how to build visualisations with the data to help with interpretation and analysis. The first used Metabase to enable interactive analysis of public policing data, and more recently I built a Python project focused on the Police Data Stop and Search API. Whilst building these projects I’ve got to know the data sets in some detail.
Data hypothesis: using the Stop and Search API project, I predict that regardless of the police force and month searched, the top reason for any stop and search will be an ‘Object of Search’ defined as ‘Controlled Drugs’.
Each heading links to the query string, if anybody wants to look at the data themselves.
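For anyone who would rather test the hypothesis in code, here is a minimal sketch. It assumes the public stops-force endpoint of the data.police.uk API and an object_of_search field on each record. The function names are my own for this example and are not taken from the project mentioned above.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the "stops-force" endpoint, which takes a force identifier
# and a YYYY-MM date as query parameters.
API_URL = "https://data.police.uk/api/stops-force"


def fetch_stops(force: str, month: str) -> list:
    """Download the stop and search records for one force and month."""
    query = urlencode({"force": force, "date": month})
    with urlopen(f"{API_URL}?{query}", timeout=30) as response:
        return json.load(response)


def top_object_of_search(stops: list):
    """Return the most common object of search and its count, or None if there are no stops."""
    counts = Counter(stop.get("object_of_search") or "Not recorded" for stop in stops)
    return counts.most_common(1)[0] if counts else None
```

Assuming the force identifiers are metropolitan, north-yorkshire and bedfordshire, a call such as top_object_of_search(fetch_stops("metropolitan", "2022-08")) would return the leading object of search for that force and month.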
Let’s start with the Metropolitan Police, August 2022
The large blue section is ‘Controlled Drugs’
Compared to a rural location, North Yorkshire Police, August 2022
The large red section is ‘Controlled Drugs’
Let’s pick an area with a mix of rural areas and towns: Bedfordshire Police, August 2022
The large red section is again ‘Controlled Drugs’
Rather than go force by force, I can download the entire stop and search data set for August 2022 from the Police Data website.
This data shows us that in August 2022 across all forces and available data sets:
- 45,802 – Stop and Searches conducted of either people, vehicles or both people and vehicles
- 27,293 – using legislation ‘Misuse of Drugs Act 1971 (section 23)’ (59.59%)
- 26,273 – Object of Search logged as ‘controlled drugs’ (57.36%)
The available data shows that 59.59% of all stops and searches in the sample month of August 2022 were conducted under the ‘Misuse of Drugs Act 1971 (section 23)’, and that 57.36% of all stops logged an object of search of ‘Controlled Drugs’.
Stop the analysis at that point and it paints a picture of a drug problem in the nation. However, we also have outcome data.
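The counts and percentages above could be reproduced from the downloaded files with something like the sketch below. It assumes the download extracts to CSV files whose names end in stop-and-search.csv and that the columns are headed ‘Legislation’ and ‘Object of search’; check both against your own download. The function names are hypothetical.

```python
import csv
from pathlib import Path


def load_rows(folder: str) -> list:
    """Read every stop and search CSV found under the extracted download."""
    rows = []
    # Assumption: one file per force, each named "...-stop-and-search.csv".
    for path in sorted(Path(folder).rglob("*stop-and-search.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


def share(rows: list, column: str, value: str) -> tuple:
    """Return the number of rows whose column matches value, and that number as a percentage of all rows."""
    wanted = value.strip().lower()
    # Compare case-insensitively, since the same value can appear with different capitalisation.
    matches = sum(1 for row in rows if (row.get(column) or "").strip().lower() == wanted)
    percent = round(100 * matches / len(rows), 2) if rows else 0.0
    return matches, percent
```

With the rows loaded, share(rows, "Legislation", "Misuse of Drugs Act 1971 (section 23)") gives the count and its percentage of all stops. The arithmetic is the same as in the figures above: 27,293 out of 45,802 is 59.59%.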
- 18,407 – Outcome listed as ‘A no further action disposal’ (70.06%)
This highlights that of all the stops listed with an object of search of ‘controlled drugs’, 70.06% resulted in ‘A no further action disposal’. Does that mean that 18,407 stops were not required? That conclusion would be an overreach based on the available data. We can continue to analyse the data using the other available data points:
- What is the age range profile for the stops being conducted under the legislation of ‘Misuse of Drugs Act 1971 (section 23)’?
The data shows that 63.47% of the stops were conducted on individuals whose age was logged as under 34. Recalling the data point above, that 70.06% of stops with an object of search of ‘controlled drugs’ resulted in ‘no further action’, can we dig further into the data and see whether that trend continues for every age range?
The outcome trend does not hold for the 18-24 age group. The data highlights that 35.82% of the 18-24 age group stopped under the ‘Misuse of Drugs Act 1971 (section 23)’ are processed to an outcome ranging from arrest to community resolution or summons.
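The breakdown by age range can be sketched in the same way. This assumes rows shaped like the downloaded CSV, with columns headed ‘Age range’ and ‘Outcome’, already filtered to the legislation of interest. The function name is hypothetical, and rows with no recorded outcome are left out of the calculation, which is my own choice for this example.

```python
from collections import defaultdict

# Assumption: the outcome text used when a stop leads to nothing further.
NO_FURTHER_ACTION = "A no further action disposal"


def outcome_rate_by_age(rows: list) -> dict:
    """Percentage of stops per age range that ended in something other than no further action."""
    totals = defaultdict(int)
    actioned = defaultdict(int)
    for row in rows:
        outcome = (row.get("Outcome") or "").strip()
        if not outcome:
            continue  # no outcome recorded, so it cannot be classified either way
        age = (row.get("Age range") or "").strip() or "Not recorded"
        totals[age] += 1
        if outcome != NO_FURTHER_ACTION:
            actioned[age] += 1
    return {age: round(100 * actioned[age] / totals[age], 2) for age in totals}
```

Comparing the value for 18-24 against the other age ranges shows whether the no further action trend holds across the board or, as above, breaks for one group.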
Working through this dataset, I’ve tried to demonstrate how interpretation of the data can lead to better understanding, or Data, Information, Knowledge, Wisdom. That better understanding could be put to practical use and eventually linked to workflows that act on the inputs, test interpretations and hopefully make a positive impact. With this dataset, that might mean looking at elements that may affect the 18-24 age range, from drugs education and the impact of targeted sports or youth groups to employment rates and perhaps even overall life satisfaction.
Hopefully this is useful for someone!