Monitoring is critical to the success of machine learning models deployed in production systems. Because ML models are not static pieces of code but dynamic predictors that depend on data, hyperparameters, evaluation metrics, and many other variables, it is vital to have insight into the training, validation, deployment, and inference processes in order to prevent model drift, predictive stasis, and a host of additional issues. However, not all monitoring solutions are created equal. In this post, I highlight three must-haves for machine learning monitoring, which I hope will serve you well whether you are deciding to build or buy a solution.
Complete Process Visibility
First, models have to be evaluated in the context of the business function they aim to serve. That function is often realized multiple steps downstream from the model. Outside of the lab, many AI-driven applications involve multiple models working in tandem. Furthermore, the behavior of the models will likely depend on data transformations that happen multiple steps upstream. Thus, a monitoring solution that focuses on single-model behavior will not capture the holistic picture of model performance as it relates to the global business context, and it will fail to uncover many of the issues that begin or end outside of the model. A proper assessment of ML model viability only comes from complete process visibility: insight into the entire dataflow, metadata, context, and overarching business processes on which the modeling is predicated.
For example, as part of a credit approval application, a bank may deploy a suite of models that assess creditworthiness, screen for potential fraud, and dynamically allocate trending offers and promos. A simple monitoring system might be able to evaluate any one of these models individually, but solving the overall business problem demands an understanding of the interplay between them. While they may have divergent modeling goals, each of these models rests upon a shared foundation of training data, context, and business metadata. Thus, an effective monitoring solution will take all of these disparate pieces into account and generate unified insights that harness this shared information. These might include identifying niche and underutilized customer segments in the training data distribution, flagging potential instances of concept and data drift, understanding the aggregate model impact on business KPIs, and more.
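To make one of those checks concrete, here is a minimal sketch of flagging data drift on a single numeric feature shared by several models. It assumes drift is measured with the population stability index (PSI), which is one common choice and not something this post prescribes; the function name, the bin count, and the floor value are all illustrative assumptions.

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], bins: int = 10
) -> float:
    """PSI between a reference sample and a recent sample of one feature.

    Hypothetical helper for illustration: equal-width bins are derived from
    the reference sample, and recent values outside that range are clamped
    into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the feature is constant

    def bucket_shares(values: Sequence[float]) -> list:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each share so empty bins do not produce log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e = bucket_shares(expected)
    a = bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Running the same check on every feature that the credit, fraud, and offer models share is one way to surface an upstream data problem once, instead of rediscovering it separately in each model. The alerting threshold is something you would tune per feature.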
The best monitoring solutions are able to extend to all process stages, including the ones which do not involve a model component.
Mona's intelligent monitoring platform instantly detects anomalies within your data, automatically providing actionable insights to quickly resolve underperformance issues
Automatic, Granular Insights
A common misconception is that a monitoring solution should simply enable visualization and troubleshooting of the common metrics associated with an ML model in production. While this is helpful, visualization and troubleshooting imply that you are already in “investigation mode”. Worse yet, you might be “fire-fighting” after the business complained that a KPI dropped (and asked “what’s wrong with the model?”).
So, how about being more proactive?
How about detecting issues weeks, or even longer, before overall performance declines?
You should expect your monitoring solution to automatically detect problems when they are still small, in granular segments of data, allowing you ample time to take corrective or preemptive action. The meaning of “automatically” deserves some further elaboration here. Some monitoring tools will provide dashboards that allow you to manually investigate subsegments of data to see what’s performing well and what’s not. However, this sort of inspection requires painstaking manual effort and misses the greater point: a true monitoring solution detects anomalies through its own mechanisms, without relying on an individual to supply a hypothesis first.
The more granular you get, the more you should pay attention to noise reduction. A single anomaly can be expected to surface as issues in multiple places. Monitoring truly succeeds when it detects the root causes of issues, not when it merely flags surface-level data discrepancies.
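As a rough sketch of what hypothesis-free detection can look like, the snippet below enumerates every segment defined by one or two dimensions and flags those whose average metric falls well below the overall average. It is a simplified illustration and not a description of any particular product: the function name, the record layout (a list of dicts), and the thresholds are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def scan_segments(records, dimensions, metric,
                  min_size=30, max_drop=0.10, max_depth=2):
    """Flag segments whose mean metric is more than max_drop below the overall mean.

    records:    list of dicts, one per prediction (assumed layout)
    dimensions: field names to segment by, e.g. ["region", "channel"]
    metric:     numeric field to average, e.g. a 0/1 correctness flag
    """
    overall = sum(r[metric] for r in records) / len(records)

    # Collect metric values for every combination of up to max_depth dimensions.
    groups = defaultdict(list)
    for r in records:
        for depth in range(1, max_depth + 1):
            for dims in combinations(dimensions, depth):
                key = tuple((d, r[d]) for d in dims)
                groups[key].append(r[metric])

    findings = []
    for key, values in groups.items():
        if len(values) < min_size:
            continue  # too few records to judge the segment
        mean = sum(values) / len(values)
        if overall - mean > max_drop:
            findings.append((key, mean, len(values)))

    findings.sort(key=lambda f: f[1])  # worst segments first
    return overall, findings
```

Nobody had to guess in advance that, say, mobile traffic from one region was the problem; the scan covers every combination on its own. A production system would also need a statistical test and a time baseline, which this sketch leaves out.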
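One simple way to cut that noise, sketched below, is to drop a flagged segment when a broader flagged segment already accounts for it. The sketch assumes findings arrive as `(segment_key, mean_metric, size)` tuples, where `segment_key` is a tuple of `(dimension, value)` pairs; that format, the function name, and the tolerance are assumptions for illustration only.

```python
def suppress_explained(findings, tolerance=0.05):
    """Keep a finding only if no broader flagged segment already explains it.

    A finding counts as "explained" when another flagged segment uses a strict
    subset of its (dimension, value) pairs and the narrower segment is not
    worse than the broader one by more than `tolerance`.
    """
    kept = []
    for key, mean, size in findings:
        explained = any(
            set(other_key) < set(key) and mean >= other_mean - tolerance
            for other_key, other_mean, _ in findings
        )
        if not explained:
            kept.append((key, mean, size))
    return kept
```

If a whole region underperforms, this reports the region once instead of once per channel inside it, while a sub-segment that is markedly worse than its parent still gets its own alert. It is only a heuristic: it does not handle the opposite case, where a broad segment looks bad solely because of one small sub-segment.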
Mona is the most flexible ML monitoring solution, enabling teams to track custom metrics that matter the most to them
Total Configurability
Different ML systems have different data and flow, different business cycles, different success indicators, and different types of models. You should seriously doubt “plug-and-play” monitoring solutions.
A complete ML monitoring solution has to be configurable to any problem and across all of its components. It should be able to take in any model metric, any unstructured log, and, really, any piece of tabular data and make it easy to:
- Construct and continuously update a single performance database
- Create and customize dynamic visualizations and reports
- Set up and tweak automatic, granular insights and notifications
One simple example of the need for configurability lies in the contrast between systems in which you can get (near) real-time feedback on model fidelity (e.g., consumer recommendation systems) and ones in which the feedback loop requires human intervention and more time (e.g., fraud detection, credit scoring, and more).
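To illustrate, here is a minimal sketch of a per-system monitoring configuration in which the feedback delay is an explicit setting. Every name in it (`MonitorConfig`, `label_delay`, `records_ready`, the record fields) is hypothetical and invented for this example; it is not the API of any real monitoring tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Tuple


@dataclass(frozen=True)
class MonitorConfig:
    """Hypothetical per-system monitoring settings."""
    name: str
    metric: str                    # which performance metric to track
    segment_by: Tuple[str, ...]    # dimensions to break the metric down by
    label_delay: timedelta         # how long ground truth takes to arrive


def records_ready(config: MonitorConfig, records: list, now: datetime) -> list:
    """Return the records old enough, and labeled, to be scored for this system."""
    cutoff = now - config.label_delay
    return [
        r for r in records
        if r["predicted_at"] <= cutoff and r.get("label") is not None
    ]


# Two systems with very different feedback loops (illustrative values).
recommendations = MonitorConfig(
    name="recommendations", metric="click_through",
    segment_by=("country",), label_delay=timedelta(minutes=5),
)
fraud = MonitorConfig(
    name="fraud", metric="precision",
    segment_by=("merchant_category",), label_delay=timedelta(days=30),
)
```

With the delay expressed as configuration, the same evaluation code can serve both systems: the recommendation monitor scores predictions a few minutes old, while the fraud monitor waits until investigations have had time to produce labels.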
Most enterprise ML teams are working on a variety of ML projects that address very different business problems. Consequently, monitoring requirements are broad and call for nuance and flexibility to accommodate the differences. If you are on one of those teams, you may have established strong data science standards and a unified stack for data preparation, model development, and deployment. Now, will you be able to monitor and govern your systems with unified standards and a single solution? You should definitely expect to do so.
Given the ever-increasing hype around machine learning, there exist many solutions that will take an ML model and provide superficial insights into its feature behavior, output distributions, and basic performance metrics. However, solutions that offer complete process visibility; proactive, intelligent insights; and total configurability are much, much rarer. Yet it is these three attributes that are key to squeezing the highest performance and downstream business impact out of ML models. Therefore, it’s crucial to evaluate any monitoring solution through the lens of these three must-haves and ensure that it provides not only model visibility but also a more global and complete understanding of the business context.