Oct 1, 2024
AI in Consulting Practices #4: Measuring Equals Learning
Dennie van den Biggelaar, Onesurance, in Know Your Stuff!, VVP 4-2024
In this fourth part of the series AI in Consulting Practices, we focus on a crucial question: how do you know, and measure, whether your AI system is really doing what it should? In the first part (VVP 1, 2024), AI strategist Dennie van den Biggelaar showed how to get started with machine learning (a specific branch of AI); in the second part (VVP 2), how to operationalize AI in your business processes; and in the third part (VVP 3), how to integrate AI software into an existing IT landscape.
Measuring the effectiveness of an AI application starts with defining clear 'business KPIs.' These KPIs are crucial because they guide which aspects of your business operations you want to improve and how you can make these improvements measurable. For an insurance company, these goals might include increasing revenue, improving retention, increasing policy density, or increasing STP acceptance. Establishing these KPIs provides a framework for both the development and evaluation of the AI application.
Human and machine
In practice, AI applications often work alongside human experts. Therefore, it's important to measure the performance of both the AI and the human separately and together. This provides insight into the effectiveness of the collaboration and helps you determine where improvements are possible.
Example: active customer management. Imagine you have an AI algorithm that identifies customers with a high likelihood of churning. If the sales team or advisor does not follow up adequately on these signals, the intended reduction in churn may not materialize. By measuring performance per employee, you can discover whether certain employees achieve better results than others; these insights can then be shared to strengthen the team as a whole.
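To make this concrete, here is a minimal sketch (in Python, with entirely hypothetical employee names and data) of how follow-up and retention rates per employee could be tracked for AI-generated churn signals:

```python
from collections import defaultdict

# Hypothetical log of AI churn signals: which employee received the
# signal, whether they followed up, and whether the customer stayed.
signals = [
    {"employee": "Anna", "followed_up": True,  "retained": True},
    {"employee": "Anna", "followed_up": True,  "retained": False},
    {"employee": "Anna", "followed_up": False, "retained": False},
    {"employee": "Bram", "followed_up": True,  "retained": True},
    {"employee": "Bram", "followed_up": True,  "retained": True},
    {"employee": "Bram", "followed_up": False, "retained": False},
]

def per_employee_stats(rows):
    """Follow-up rate and retention rate per employee."""
    totals = defaultdict(lambda: {"signals": 0, "followed_up": 0, "retained": 0})
    for r in rows:
        t = totals[r["employee"]]
        t["signals"] += 1
        t["followed_up"] += r["followed_up"]
        t["retained"] += r["retained"]
    return {
        emp: {
            "follow_up_rate": t["followed_up"] / t["signals"],
            "retention_rate": t["retained"] / t["signals"],
        }
        for emp, t in totals.items()
    }
```

Comparing these rates across employees is exactly the kind of human-side measurement the article describes: it shows whether the AI's signals are being acted on, separately from whether the signals themselves are accurate.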
Technical performance
To assess the technical performance of a predictive algorithm, several indicators are used: accuracy (how often the algorithm makes the correct prediction), precision (the reliability of its positive predictions), sensitivity, also called recall (how well the model detects all relevant outcomes), Area Under the Curve or AUC (a summary of the model's prediction quality across different decision thresholds), and log loss (how close the predicted probabilities are to the actual outcomes).
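These indicators can all be computed from the true labels and the model's predicted probabilities. A self-contained sketch in plain Python (the data here is illustrative, not from the article):

```python
import math

def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, sensitivity (recall), and log loss for a
    binary classifier, given true labels and predicted probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    eps = 1e-15  # keep probabilities away from 0 and 1 so log() is defined
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "log_loss": -sum(
            t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
            for t, p in zip(y_true, y_prob)
        ) / len(y_true),
    }

def auc(y_true, y_prob):
    """Area Under the ROC Curve via pairwise comparison: the fraction of
    (positive, negative) pairs the model ranks correctly (ties count 0.5)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))
```

Note that accuracy, precision, and sensitivity depend on the chosen threshold, while AUC and log loss evaluate the probabilities themselves, which is why they complement each other.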
In addition to these indicators, speed, efficiency, and scalability matter. Speed, or latency, is how quickly the AI application responds to a request. Efficiency is measured by the application's memory usage, and scalability by the number of predictions made within a certain time frame (throughput). Together, these factors determine whether an algorithm can operate at scale.
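Latency and throughput are straightforward to measure around any prediction function. A minimal sketch, where `dummy_predict` is a stand-in for a real model call:

```python
import time

def dummy_predict(x):
    # Placeholder for a real model inference call.
    return x * 0.5

def measure(fn, inputs):
    """Average latency (seconds per prediction) and throughput
    (predictions per second) over a batch of inputs."""
    start = time.perf_counter()
    for x in inputs:
        fn(x)
    elapsed = time.perf_counter() - start
    n = len(inputs)
    return {"avg_latency": elapsed / n, "throughput": n / elapsed}
```

In practice you would run this against the deployed endpoint under realistic load, since network overhead and concurrency dominate real-world latency.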
Robust and ethical
An AI application must not only perform well technically but also be robust and ethically sound. Robustness includes the model's ability to remain effective as its environment changes, so that predictions do not silently degrade (model drift and shift). In addition, shifts in the input data relative to the data the model was trained on (data drift and shift) must be detected and monitored. Ethical considerations, such as preventing discrimination based on gender, ethnicity, or age, are equally crucial to ensure the AI operates fairly and responsibly.
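One common way to monitor data drift (one technique among several, not prescribed by the article) is the Population Stability Index, which compares the distribution of a feature in live data against the training data. A minimal sketch:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference (training) sample
    and a live sample. As a rule of thumb, values above ~0.2 are often
    treated as significant drift. Live values outside the reference
    range fall into no bin, which itself inflates the index."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins

    def share(sample, i):
        low, high = lo + i * width, lo + (i + 1) * width
        count = sum(1 for x in sample
                    if low <= x < high or (i == bins - 1 and x == high))
        return max(count / len(sample), 1e-6)  # avoid log(0)

    return sum(
        (share(actual, i) - share(expected, i))
        * math.log(share(actual, i) / share(expected, i))
        for i in range(bins)
    )
```

A PSI of zero means the live distribution matches the training distribution; the larger the value, the stronger the signal that the model should be revalidated or retrained.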
'Measuring the effectiveness of an AI application is a complex but necessary process.'
Uptime and reliability
As with any cloud-based application, the uptime of an AI application is crucial, especially in production environments. A common standard in a Service Level Agreement (SLA) is an uptime of 99.9 percent, which corresponds to at most about 8.8 hours of downtime per year, or roughly one failure in every 1,000 interactions with the application. To ensure this reliability, a backup application is often deployed that can take over in case of failures.
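The downtime budget implied by an SLA percentage is simple arithmetic, sketched here:

```python
def allowed_downtime_hours_per_year(uptime_pct):
    """Maximum downtime per year (in hours) implied by an SLA uptime
    percentage, e.g. 99.9% -> about 8.8 hours per year."""
    return (1 - uptime_pct / 100) * 365 * 24
```

Each extra nine is a tenfold tighter budget: 99.99 percent allows under an hour of downtime per year, which is why failover capacity is usually required at that level.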
From prototype to production
Setting up an AI application is a step-by-step process. In the prototype phase, the focus is primarily on testing the predictive power of the algorithm and minimizing any discrimination. If the AI application passes these tests, the next step is to assess whether it actually improves the desired business KPIs. The scalability of the model is also taken into consideration at this stage.
Once the AI is in production, the focus shifts to ensuring uptime and monitoring the robustness of the AI over time. By systematically measuring and evaluating, you can continuously improve and ensure that your AI application does what it is supposed to do, now and in the future.
Measuring impact
One of the most effective methods to measure whether an AI application delivers the desired results is through A/B testing. In this method, the target audience is randomly divided into two groups: one group (Group A) uses the new AI application, while the other group (Group B) uses the traditional method or a previous version of the system without AI. By comparing the performance of both groups, you can determine how effective the AI is in improving the business KPIs.
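The mechanics of such a test can be sketched in a few lines: randomly assign customers to the two groups, then test whether the difference in outcomes (here, hypothetical retention counts) is statistically significant. This uses a standard two-proportion z-test, one common choice for comparing A/B outcomes:

```python
import math
import random

def ab_split(customer_ids, seed=42):
    """Randomly assign customers to group A (with AI) or group B (without)."""
    rng = random.Random(seed)
    groups = {"A": [], "B": []}
    for cid in customer_ids:
        groups[rng.choice("AB")].append(cid)
    return groups

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two proportions,
    e.g. retention rates in group A versus group B."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value
```

For example, 120 retained customers out of 1,000 in the AI group versus 90 out of 1,000 in the control group yields a p-value below 0.05, suggesting the improvement is unlikely to be chance; the figures here are illustrative, not from the article.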
The success of an AI application greatly depends on how insights from A/B tests are integrated into business operations. For instance, if an A/B test shows that a particular AI tool leads to higher policy density, this could be a reason to roll out the tool more broadly within the organization.
Effective
Measuring the effectiveness of an AI application is a complex but necessary process. It begins with defining clear business KPIs and evaluating both the technical performance and the collaboration between human and machine. Robustness, ethical considerations, and uptime are just as important as the predictive power of the algorithm. A/B testing then lets you reliably determine whether the AI application genuinely contributes to achieving your business goals. In the end, the application must not only function well technically but also demonstrably improve your business outcomes.
The original article was published in VVP and can be read online.