...
Table of Contents
Table of Contents |
---|
Introduction
Monitoring incoming data for statistical drift is necessary to track whether assumptions made during model development are still valid in a production setting. For instance, a data scientist may assume that the values of a particular feature are normally distributed or the choice of encoding of a certain categorical variable may have been made with a certain multinomial distribution in mind. Tests should be run routinely against batches of live data and compared against the distribution of the training data to ensure that these assumptions are still valid, and if the tests fail, then appropriate alerts are raised for the data scientist or ModelOps engineer to investigate.
Details
As the same data set may serve several models, you can write one drift detection model to associate to several models. This association is made during the Model Lifecycle process. The drift model can compare the training data of the associated models to a given batch of data. The following is a simple example:
...