...

Choosing a drift monitor for a business model depends in practice on the particular model under consideration. For example, a binary classification model is best monitored for concept drift with a Summary test (basic statistics) rather than a 2-sample test, since there are only two possible outcomes and thus a very small range for the random variable. In addition, feature types (numerical vs. categorical, also referred to in MOC terminology as dataClass) play an important role in choosing the right monitor. Some monitors, such as Kullback-Leibler (KL) divergence, accommodate both numerical and categorical data, whereas others (usually 2-sample tests such as Kolmogorov-Smirnov or Epps-Singleton) work only on numerical features.
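
To make this selection logic concrete, the sketch below maps a feature's dataClass to the candidate monitors discussed on this page. The helper name, the monitor identifiers, and the exact mapping are illustrative assumptions and not part of the MOC API.

  # Illustrative sketch only: a lookup from a feature's dataClass to the monitors
  # mentioned in this section. Identifiers and the exact mapping are assumptions.
  MONITORS_BY_DATA_CLASS = {
      # 2-sample tests (Kolmogorov-Smirnov, Epps-Singleton) apply to numerical features only.
      "numerical": ["kolmogorov_smirnov", "epps_singleton",
                    "jensen_shannon", "kullback_leibler"],
      # Kullback-Leibler accommodates categorical data as well.
      "categorical": ["kullback_leibler"],
  }

  def candidate_monitors(data_class):
      """Return the drift monitors applicable to a feature's dataClass."""
      if data_class not in MONITORS_BY_DATA_CLASS:
          raise ValueError(f"Unknown dataClass: {data_class}")
      return MONITORS_BY_DATA_CLASS[data_class]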

...

If the output of the Epps-Singleton test on two distributions is a p-value that is less than a certain threshold (i.e. 0.05), then we can reject the null hypothesis that the two samples come from the same underlying distribution. When applied to a feature (or a target variable) of a dataset, we can determine if there is drift between a baseline and a sample dataset in that feature (or target variable).

Remarks:

  1. Null values in the samples will cause the Epps-Singleton test to fail. As such, null values are dropped when calculating the Epps-Singleton test.

  2. The Epps-Singleton test will fail when there are fewer than five values in either sample. In such cases, the Epps-Singleton test will return a null metric.
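
A minimal Python sketch of the Epps-Singleton check described above, using scipy.stats.epps_singleton_2samp. The helper name, the 0.05 default threshold, and the null-metric convention are assumptions for illustration, not the product's implementation; the null handling and minimum-sample guard mirror the remarks.

  import numpy as np
  from scipy.stats import epps_singleton_2samp

  def epps_singleton_drift(baseline, sample, alpha=0.05):
      """Return (drift_detected, p_value); p_value is None when the test cannot run."""
      baseline = np.asarray(baseline, dtype=float)
      sample = np.asarray(sample, dtype=float)
      # Remark 1: null values would cause the test to fail, so they are dropped.
      baseline = baseline[~np.isnan(baseline)]
      sample = sample[~np.isnan(sample)]
      # Remark 2: the test needs at least five values per sample; return a null metric otherwise.
      if len(baseline) < 5 or len(sample) < 5:
          return False, None
      _, p_value = epps_singleton_2samp(baseline, sample)
      # Reject the null hypothesis of the same underlying distribution when p < alpha.
      return p_value < alpha, p_value

Here, baseline would typically hold the reference values for a feature and sample the more recent payload values, matching the baseline/sample framing above.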

  • Kolmogorov-Smirnov 2-Sample Test

...

If the output of the Kolmogorov-Smirnov test on two distributions is a p-value that is less than a certain threshold (i.e. 0.05), then we can reject the null hypothesis that the two samples have an identical underlying distribution. When applied to a feature (or a target variable) of a dataset, we can determine if there is drift between a baseline and a sample dataset in that feature (or target variable).
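
For illustration only, the same style of check with scipy.stats.ks_2samp might look like the following; the helper name and the 0.05 threshold are assumptions.

  import numpy as np
  from scipy.stats import ks_2samp

  def kolmogorov_smirnov_drift(baseline, sample, alpha=0.05):
      """Flag drift when the two-sample KS test rejects the identical-distribution null."""
      baseline = np.asarray(baseline, dtype=float)
      sample = np.asarray(sample, dtype=float)
      # Drop null values before running the test.
      baseline = baseline[~np.isnan(baseline)]
      sample = sample[~np.isnan(sample)]
      _, p_value = ks_2samp(baseline, sample)
      return p_value < alpha, p_value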

...

Computes the Jensen-Shannon distance between two distributions, which is the square root of the Jensen-Shannon divergence metric.

The output of the Jensen-Shannon distance calculation is not a p-value (like the Epps-Singleton or Kolmogorov-Smirnov tests) but a distance. As such, there is no one-size-fits-all or universally accepted value that shows the two distributions are significantly different. However, it is useful to track over time how the distance between two distributions changes.

Remarks:

  1. Null values in the samples will cause the Jensen-Shannon distance to fail. As such, null values are dropped when calculating the Jensen-Shannon distance.

  2. Because the Jensen-Shannon distance attempts to fit a Gaussian KDE on the samples, an error occurs when there is little to no variance in the samples (i.e. all constant values). In such cases, the Jensen-Shannon distance will return a null metric.
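
A rough sketch of the calculation described above, fitting a Gaussian KDE to each sample (scipy.stats.gaussian_kde) and taking the distance with scipy.spatial.distance.jensenshannon. The grid size, error handling, and helper name are assumptions rather than the exact product implementation.

  import numpy as np
  from scipy.stats import gaussian_kde
  from scipy.spatial.distance import jensenshannon

  def jensen_shannon_distance(baseline, sample, grid_points=100):
      """Return the Jensen-Shannon distance between two samples, or None on failure."""
      baseline = np.asarray(baseline, dtype=float)
      sample = np.asarray(sample, dtype=float)
      # Remark 1: drop null values before the calculation.
      baseline = baseline[~np.isnan(baseline)]
      sample = sample[~np.isnan(sample)]
      try:
          # Remark 2: fitting a Gaussian KDE errors out when a sample has (near) zero variance.
          kde_baseline = gaussian_kde(baseline)
          kde_sample = gaussian_kde(sample)
      except np.linalg.LinAlgError:
          return None  # null metric when the KDE cannot be fit
      grid = np.linspace(min(baseline.min(), sample.min()),
                         max(baseline.max(), sample.max()), grid_points)
      # jensenshannon returns the distance, i.e. the square root of the JS divergence.
      return jensenshannon(kde_baseline(grid), kde_sample(grid))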

  • Kullback-Leibler Divergence

https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.kl_div.html

Computes the Kullback-Leibler divergence metric (also called relative entropy) between two distributions. The calculation buckets the samples, computes the element-wise Kullback-Leibler divergence for each bucket, and then sums over the buckets for the final divergence metric. Because the Kullback-Leibler divergence is asymmetric, the order in which the samples are input into the calculation might produce slightly different results.

The output of the Kullback-Leibler divergence calculation is not a p-value (like the Epps-Singleton and Kolmogorov-Smirnov tests), nor is it a distance (like the Jensen-Shannon distance), but rather a metric that indicates how divergent two distributions might be. Like the Jensen-Shannon distance, there is no one-size-fits-all or universally accepted value to determine if two distributions are significantly different, but the Kullback-Leibler divergence provides one more option for detecting possible drift.

Remarks:

  1. It is possible that the Kullback-Leibler Divergence will return a value of Inf (when the support of one sample is not contained within the support of the other sample, or when one sample distribution has a much “wider tail” than the other). In such cases, the order of the samples will be reversed and the Kullback-Leibler Divergence will be recalculated (with an appropriate logger.warning raised). However, in the case that even the reversed order of samples returns Inf, the Kullback-Leibler Divergence will return a null metric.
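
A hedged sketch of the bucketed calculation and the Inf-handling remark above, built on scipy.special.kl_div (linked earlier). The bucket count, normalization scheme, and helper name are assumptions rather than the exact product implementation.

  import numpy as np
  from scipy.special import kl_div

  def kullback_leibler_divergence(baseline, sample, buckets=20):
      """Return the summed bucket-wise KL divergence, or None if both orderings are Inf."""
      baseline = np.asarray(baseline, dtype=float)
      sample = np.asarray(sample, dtype=float)
      baseline = baseline[~np.isnan(baseline)]
      sample = sample[~np.isnan(sample)]
      # Bucket both samples over a shared range and normalize counts to probabilities.
      edges = np.histogram_bin_edges(np.concatenate([baseline, sample]), bins=buckets)
      p = np.histogram(baseline, bins=edges)[0] / len(baseline)
      q = np.histogram(sample, bins=edges)[0] / len(sample)
      # Element-wise KL divergence per bucket, summed for the final metric.
      divergence = kl_div(p, q).sum()
      if np.isinf(divergence):
          # Remark 1: reverse the order of the samples and recalculate when the result is Inf.
          divergence = kl_div(q, p).sum()
      # Return a null metric if both orderings are infinite.
      return None if np.isinf(divergence) else float(divergence)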

Model Assumptions


Business models considered for drift monitoring have a couple of requirements:

...