The untapped business potential of unstructured data
Adapted from Nick Kervin’s presentation at Innovate Australia, Canberra, 15 March 2023.
Nick is the CEO of Frisk.
Many businesses are sitting on an untapped goldmine of data and insights without realising it. These insights are often scattered across siloed, disparate data sets, in unstructured formats that are inherently difficult to work with.
The challenge for many businesses is that unstructured data is intrinsically complex. It comes in a variety of formats and in high volumes, and advanced tools are needed to extract insights from it properly.
For the Public Sector, and often for non-government businesses, this data comes in a myriad of forms. The list is extensive, but examples include:
- Physical forms that have been filled out and scanned into systems
- PDF attachments to an email including text, images or a combination
- Customer comments on public websites, or chatbot conversations
- Posts and comments on social media
By 2025, data is expected to grow at a rate equivalent to a new Google every four days. If it is not the case already, at least 80% of that data will be unstructured.
A recent poll by Deloitte found only 18% of businesses had a strategy in place to manage their unstructured data. Yet, executives who say unstructured data is one of the most valuable sources of insights are 24% more likely to have exceeded their business goals[1].
Defining the problem
The Australian Government has a vision to be one of the top three digital governments in the world by 2025[2]. To achieve this, Government agencies must prepare for these volumes of unstructured data.
To find a solution, it’s important to understand the barriers that exist. Frisk sees five key barriers to harnessing the power of unstructured data:
- Legacy Technology: The way that the world processes data has changed, and many legacy systems that were designed to process structured data aren’t fully compatible with unstructured data. They tend to be inflexible, fragile, and unable to scale to meet the demands of new requirements and an increasing number of users. It’s estimated that these legacy systems still account for 31% of technology systems[3] within organisations worldwide.
- Indexing is just one part of a solution: Many organisations attempting to interrogate large volumes of data (often pooled into a “data lake”) struggle to achieve the outcomes they want because the insights are too hard to find. This leads them to conclude that an indexing platform will solve the issue.
While data indexing is a critical component of unlocking insights, indexing by itself does not surface insights, support decisions or trigger actions based on them.
More powerful technology is required: technology that moves beyond indexing and into insight.
Health data is a good example. It is full of unstructured information, and health departments and businesses (such as clinics) must contend with the many different ways the same information can be presented. A medicine, for instance, may be recorded under its brand name or its active ingredient, such as Panadol versus paracetamol.
Indexing health data can also save large amounts of time when finding information about a patient presenting at hospital. Imagine the benefits of a hospital having immediate access to insights from a patient’s unstructured health data, such as printed reports, referrals and records. These insights could be used to improve the quality of care and increase efficiency in the hospital system, a challenge every state is grappling with. In some cases, this insight could literally save lives.
- Not everyone can swim in a data lake: Data lakes do allow unstructured data to be stored, but they can be time-consuming to maintain and often devolve into data swamps without regular data governance. They also tend to hold such large volumes of data that data scientists and engineers are typically the only people who can analyse them. At Frisk, we love data scientists and engineers, but these resources are expensive and hard to find, and relying on them often delays getting the right insights into the hands of the people who need them.
- Time-intensiveness of bespoke solutions: Some organisations opt for a one-size-fits-all approach but soon run into its limitations. To gain the functionality they need, they may try to leverage open-source projects to build a system that meets their bespoke needs, either as a complement to off-the-shelf systems or from scratch. This approach is difficult and expensive because many organisations lack the breadth and depth of skills required to take on a task of this scale.
- Bespoke often means ‘not integrated’: Often, a data science team develops a novel tool that performs a useful service for their business, but because it has not been developed holistically as part of a data management platform, it ends up being either a one-trick pony (that doesn’t integrate with anything or anyone else) or a many-headed monster with additional features bolted on, without an overarching design process.
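The Panadol versus paracetamol example shows that even the indexing layer needs domain knowledge before it can be useful. The following is a minimal Python sketch of that idea only, assuming a tiny hand-written synonym table; the names `SYNONYMS`, `tokenise`, `build_index` and `search` are hypothetical and are not part of any Frisk product. It builds an inverted index over free-text records and maps brand names to their active ingredient, so a search for "Panadol" also finds records that say "paracetamol". As the article argues, this makes records findable but does not by itself produce insight.

```python
import re
from collections import defaultdict

# Hypothetical synonym table for illustration only: maps brand or regional
# names to a canonical active-ingredient name. A real system would draw on a
# curated medicines terminology rather than a hand-written dict.
SYNONYMS = {
    "panadol": "paracetamol",
    "acetaminophen": "paracetamol",
    "nurofen": "ibuprofen",
}


def normalise(token: str) -> str:
    """Lower-case a token and map any known synonym to its canonical form."""
    token = token.lower()
    return SYNONYMS.get(token, token)


def tokenise(text: str) -> list[str]:
    """Split free text into alphanumeric word tokens, then normalise each one."""
    return [normalise(t) for t in re.findall(r"[a-z0-9]+", text.lower())]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index mapping each canonical term to document IDs."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenise(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the IDs of documents containing every term in the query."""
    terms = tokenise(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


if __name__ == "__main__":
    # Made-up example records, standing in for scanned or emailed documents.
    docs = {
        "referral-001": "Patient reports taking Panadol twice daily for back pain.",
        "discharge-002": "Prescribed paracetamol 1g QID on discharge.",
        "gp-note-003": "Nurofen ceased due to gastric irritation.",
    }
    index = build_index(docs)
    print(sorted(search(index, "Panadol")))  # ['discharge-002', 'referral-001']
```

Normalising terms at both index time and query time keeps the two sides consistent, so either spelling of a medicine retrieves the same records.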
What is the solution?
Given these hurdles, how can Public Sector agencies undertake a more seamless journey to tap into this potential? How can Australia be in the top three digital governments in the world by 2025?
To get there, Frisk suggests that teams focus on four key areas:
- Seek out expertise in unstructured data, without compromising on semi-structured and structured data. Unstructured data makes up 80-90% of all data, so it has to be a focus.
- Make sure that this expertise goes beyond indexing unstructured data and into insight. It is the insight that will inform decision-making and drive effectiveness, not searchability alone.
- Consider what you can reuse and where you need to invest. Explore modern digital capabilities that can integrate with legacy systems.
- Get clear and informative insights into the right hands. Data should be democratised: not reserved for data scientists, and not locked away in silos. This is the way to be truly efficient and effective.
Unstructured data is here to stay. Organisations that learn to harness its power will set themselves up for success now, into 2025 and beyond. What are you waiting for?
To find out more about the Frisk platform and its benefits, explore our products.
Sources:
[1] Deloitte. (2021). The insight-driven organization. Retrieved from https://www2.deloitte.com/us/en/insights/topics/analytics/insight-driven-organization.html
[2] Digital Transformation Agency. (2021). Digital Government Strategy. Retrieved from https://www.dta.gov.au/digital-government-strategy
[3] Agile Solutions UK Ltd. (2017). The Trouble with Big Data: Unstructured Data and Legacy Systems. Retrieved from https://www.agilesolutions.co.uk/the-trouble-with-big-data-unstructured-data-and-legacy-systems/