Bespoke Identity Index presentation to COBA (Customer Owned Banking Association)

15.12.2020

We are leaders in data augmentation and decision optimisation, specialising in big data solutions across the full gamut of modern data problems.

Frisk is laser-focussed on bringing clarity to data and we offer the RegTech sector a flexible approach and competitive solution that can be deployed as a managed service or SaaS model.

Recently, our Chief Architect, Jesse Budgen, spoke at an event hosted by The RegTech Association for the Customer Owned Banking Association (COBA) about harnessing data from disparate sources to produce actionable insights for the RegTech sector.

Jesse talked about Frisk’s journey since 2008 and our core product, FISC (Flexible Index and Search Capability), but the focus of his presentation was the development and deployment of the Bespoke Identity Index (BII) for the Australian Taxation Office (ATO).

A summary of the presentation is below:

 

Frisk was founded in 2008 to deliver a modern search engine experience for content on file shares. It was a simple idea that resonated with business users because they got a powerful, instant response across what was previously dark data for them.

Frisk’s capabilities grew to encompass auditing, risk and compliance use cases. We soon realised two things:

  • Scaling this up to meet the needs of medium to large customers filled a real gap, especially once we started connecting to big customer databases and integrating notes and files across multiple data silos.
  • Search and indexing quickly grew into machine learning, AI and natural language processing – a trend we noticed among our clients and across the competitive landscape. Solutions in this space are often the main lens through which non-technical users consume the output from a range of modern data science techniques.

We therefore re-developed our core product FISC (Flexible Index and Search Capability) to be flexible and configurable to suit a range of use cases.

FISC is an end-to-end solution and we re-use reliable components with flexible configuration to connect data silos, extract text efficiently and present non-technical users with an intuitive interface. Users see a bespoke solution even though every detail is either standard or config driven.

Parts of the FISC pipeline can be integrated with other tools – including in-house data science capabilities – yet we also offer the whole pipeline and guarantee amazing performance and uptime.

The process of getting data into the indexer is a core part of the FISC offering. It manages a constant stream of mixed content linked by various fields so that the right data can be written to the index incrementally, accommodating systems with batch loads as well as real time systems.

Our security model is enforced at the index level and can align with clients’ existing permission systems. One of the ways FISC filters data in real-time is by auditing fragments of content as it goes through the data pipeline to the indexer. Users may only have access to some data in an index – if that’s the case, then that’s all they’ll ever see.
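To make the idea of index-level enforcement concrete, here is a minimal sketch in Python. It is not FISC's implementation: the `IndexedFragment` class, the `allowed_groups` field and the `search` function are hypothetical names chosen for illustration. The sketch assumes each indexed fragment carries the set of groups allowed to read it, and that the permission check happens inside the search itself rather than in the user interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedFragment:
    """One piece of indexed content plus the groups allowed to see it (hypothetical model)."""
    doc_id: str
    text: str
    allowed_groups: frozenset


def search(index, query, user_groups):
    """Return fragments that match the query AND that the user is permitted to see."""
    needle = query.lower()
    groups = frozenset(user_groups)
    return [
        fragment
        for fragment in index
        # Permission filter is applied inside the search, so hidden data never leaves the index.
        if fragment.allowed_groups & groups and needle in fragment.text.lower()
    ]


# Made-up sample data for illustration only.
index = [
    IndexedFragment("a1", "Case note for J. Smith", frozenset({"audit"})),
    IndexedFragment("b2", "Smith family trust file", frozenset({"intel"})),
]

visible = search(index, "smith", {"audit"})
print([fragment.doc_id for fragment in visible])
```

Because the filter sits in the same step as the match, a user in the "audit" group never receives the "intel" fragment, even though both contain the search term.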

Over the last six years we have provided both tactical and production solutions across various datasets in the Australian Taxation Office (ATO). The FISC solution has been used on small datasets with tightly targeted use cases, right through to a customer database with over 1 billion pieces of full-text searchable content, all via a single flexible user interface.

We were therefore well prepared when the ATO’s intelligence analysts came to Frisk with a broader problem:

 

“Incomplete data across many channels can be a rich source of intelligence, but when traditional systems require an exact match, how can analysts make use of this data?”

 

We were challenged to find solutions to two key questions:

  1. How can they automate the process of finding possible fragments of data across all of the analyses they’ve conducted?
  2. How can they find the low quality data entries that might contradict what the structured view of this data is telling them?

 

As we worked with the analysts to dive deeper into their needs, it became evident that they didn’t want to rely on ad-hoc jobs with their data science teams. They required a flexible interface, configured to let them focus on business problems and improve their efficiency.

Frisk’s approach to these kinds of requests is to develop general features (if required) that we can add to our core product and then deliver a new configuration that is scalable, efficient and easy to use.

This is where we introduced the Bespoke Identity Index (BII). On Release 1, BII was heralded as a significant improvement in capability for the intelligence analysts, with sub-second search giving them self-service answers to questions that previously took a week or two.

For the first release, we used automated tools to generate a massive test database of identity fragments with randomised spelling mistakes and partially overlapping content. Loading this into our standard data pipeline, we identified two needs: a loose schema to accommodate the data, and a way of applying spelling suggestions to expand the fuzzy matching capability of our existing search algorithm.
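One way to picture spelling suggestions feeding a fuzzy match is sketched below in Python. This is an illustration under stated assumptions, not the BII algorithm: it uses the standard library's `difflib.get_close_matches` as a stand-in for a spelling-suggestion step, and the function names, the `cutoff` value and the sample fragments are all made up. Each query term is expanded into the similar spellings that actually occur in the data, and a fragment matches when it contains at least one variant of every term.

```python
import difflib


def build_vocabulary(fragments):
    """Collect every distinct token that appears in the indexed fragments."""
    vocabulary = set()
    for text in fragments.values():
        vocabulary.update(text.lower().split())
    return vocabulary


def expand_term(term, vocabulary, cutoff=0.75):
    """Return the term plus similar spellings found in the vocabulary."""
    term = term.lower()
    suggestions = difflib.get_close_matches(term, sorted(vocabulary), n=5, cutoff=cutoff)
    return set(suggestions) | {term}


def fuzzy_search(query, fragments, cutoff=0.75):
    """Return ids of fragments containing some variant of every query term."""
    vocabulary = build_vocabulary(fragments)
    expanded_terms = [expand_term(term, vocabulary, cutoff) for term in query.split()]
    hits = []
    for fragment_id, text in fragments.items():
        tokens = set(text.lower().split())
        if all(tokens & variants for variants in expanded_terms):
            hits.append(fragment_id)
    return hits


# Made-up identity fragments with spelling variations, for illustration only.
fragments = {
    "f1": "jonathan smith 12 harbour street",
    "f2": "jonathon smyth harbor st",
    "f3": "maria lopez 4 station road",
}

print(fuzzy_search("jonathan smith", fragments))
```

Raising `cutoff` makes the expansion stricter, and lowering it admits more distant spellings at the cost of more false matches. A production system would typically precompute suggestions rather than scan the vocabulary per query.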

As a result, ATO data teams now load their analysis/insights into a Hadoop environment at any time. They can be very specific with fields like street name, suburb or passport number when they have high confidence data, but they can just as easily roll up imperfect data into general address and identifier fields. The data is ingested automatically by the FISC pipeline, enabling the intelligence analysts to search as generally or as specifically as they want.
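The specific-versus-general field idea can be sketched as follows. This is a hypothetical Python illustration, not the BII schema: the `ROLL_UP` mapping, the field names beyond those mentioned above, and the sample values are assumptions. High-confidence records populate specific fields, imperfect records populate only the general ones, and every record ends up searchable through the general fields.

```python
# Hypothetical mapping from general fields to the specific fields they cover.
ROLL_UP = {
    "address": ("street_name", "suburb"),
    "identifier": ("passport_number",),
}


def roll_up(record):
    """Return a copy of the record whose general fields also include any specific values."""
    document = dict(record)
    for general, specifics in ROLL_UP.items():
        parts = [record[name] for name in specifics if record.get(name)]
        if record.get(general):
            parts.append(record[general])
        if parts:
            document[general] = " ".join(parts)
    return document


def matches(document, field, term):
    """Case-insensitive substring match on a single field; missing fields never match."""
    return term.lower() in document.get(field, "").lower()


# Made-up records: one high-confidence, one imperfect.
high_confidence = roll_up(
    {"street_name": "Harbour Street", "suburb": "Sydney", "passport_number": "N0000000"}
)
imperfect = roll_up({"address": "harbour st sydney nsw"})

# A general search on "address" reaches both records; a specific search on "suburb" reaches one.
print(matches(high_confidence, "address", "sydney"), matches(imperfect, "address", "sydney"))
print(matches(high_confidence, "suburb", "sydney"), matches(imperfect, "suburb", "sydney"))
```

The design choice here is that the schema never rejects a record for lacking specific fields, so low-quality fragments remain discoverable alongside clean data.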

Release 1 was delivered in just a couple of months while meeting the ATO’s high standards for security, audit logging and transparent testing schedules.

The upcoming BII Release 2 will introduce a nearest neighbour algorithm to improve the suggestions, drawing on recent natural language data science techniques. It will also have enhanced reporting and data export capabilities so that results can feed graph and linked data services.

Data problems don’t need to be of the same scale as the ATO’s to achieve a positive ROI when engaging Frisk. We can provide effective solutions to organisations from SMEs through to large corporate and government bodies, and we provide unbiased, vendor-agnostic recommendations about how clients’ data could drive better decisions.
