22nd June 2017

Data integrity for machine learning and AI apps

Rick Chapman, High Value Tech Specialist, reflects on the AI event hosted by KPMG.

One of the emerging conversations from the event concerned the evolving picture of data integrity as it applies to machine learning and AI-based applications, and the surrounding issues of data validity, trust, ownership and beyond.

As data is used to ‘train’ machine learning models, it will be key to have both quantitative and qualitative measures of data quality. Once data is ‘consumed’ by the algorithms, those algorithms make recommendations or trigger further actions.
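To make the idea of quantitative data-quality measures concrete, here is a minimal sketch of two such checks (completeness and validity) applied to training records. The field names, the `0–119` age rule and the sample data are all hypothetical, not from the original piece.

```python
# Hypothetical quantitative data-quality checks on training records.
# Field names ("age") and the validity rule are illustrative assumptions.

def completeness(records, field):
    """Fraction of records where the field is present and non-empty."""
    present = sum(1 for r in records if r.get(field) not in (None, ""))
    return present / len(records)

def validity(records, field, predicate):
    """Fraction of records whose field value passes a validity rule."""
    valid = sum(1 for r in records if predicate(r.get(field)))
    return valid / len(records)

records = [
    {"age": 34, "label": "approve"},
    {"age": None, "label": "reject"},
    {"age": 212, "label": "approve"},  # present but out of range
]

print(completeness(records, "age"))  # 2 of 3 records have an age
print(validity(records, "age", lambda a: isinstance(a, int) and 0 <= a < 120))
```

Scores like these could gate whether a data set is fit to enter the training pipeline at all; the qualitative measures the text mentions (provenance, trust in the source) would sit alongside them.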

This raises questions of liability if those actions lead to harmful outcomes such as accidents.

Does liability lie with the providers of data, the providers of the machine learning software or the providers of the decision recommendation engine?

Can the fidelity of the data be managed throughout the pipeline, especially when data protection legislation means levels of anonymity must be introduced or data sets further abstracted?
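A small sketch may help illustrate the tension here: one common anonymisation step drops direct identifiers and coarsens (‘abstracts’) quasi-identifiers into bands, which is exactly the kind of transformation that reduces the fidelity of what reaches the model. The field names and banding scheme below are hypothetical.

```python
# Hypothetical anonymisation step: strip direct identifiers and
# abstract exact ages into ten-year bands before data enters the
# ML pipeline. Field names are illustrative assumptions.

DIRECT_IDENTIFIERS = {"name", "nhs_number"}

def anonymise(record):
    # Drop fields that directly identify the individual.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Coarsen the exact age into a band, losing precision by design.
    if "age" in out:
        decade = (out["age"] // 10) * 10
        out["age_band"] = f"{decade}-{decade + 9}"
        del out["age"]
    return out

patient = {"name": "A. Smith", "nhs_number": "1234", "age": 47, "diagnosis": "flu"}
print(anonymise(patient))  # {'diagnosis': 'flu', 'age_band': '40-49'}
```

The downstream algorithm now trains on `age_band` rather than `age`, so any fidelity guarantee has to account for transformations like this being applied deliberately.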

There are several application areas, such as autonomous vehicles, where addressing these issues will be critical to adoption. One area of particular note, however, is health care.

Diagnostic systems, online triage and other areas are prime candidates for machine learning deployments, yet the associated risks are among the most significant.

Adoption in health care will therefore be more cautious than in many other sectors, but the potential benefits are enormous.

Rick Chapman: SETsquared Bristol Entrepreneur in Residence, High Tech Specialist for the West of England LEP, Owner of Parkview Consultants & iPhone developer.