...
```python
import pandas as pd
import numpy as np
from scipy.stats import ks_2samp


# modelop.init
def begin():
    """
    A function to read training data and save it, along with its
    numerical features, globally.
    """
    global train, numerical_features
    train = pd.read_csv('training_data.csv')
    numerical_features = train.select_dtypes(['int64', 'float64']).columns


# modelop.metrics
def metrics(data):
    """
    A function to compute KS p-values on input (sample) data
    as compared to training (baseline) data.
    """
    # One two-sample KS test per numerical feature: baseline vs. incoming batch
    ks_tests = [ks_2samp(train.loc[:, feat], data.loc[:, feat])
                for feat in numerical_features]
    # Each result is a (statistic, p-value) pair; keep the p-value
    pvalues = [x[1] for x in ks_tests]
    ks_pvalues = dict(zip(numerical_features, pvalues))
    yield dict(pvalues=ks_pvalues)
```
This drift model runs a two-sample Kolmogorov-Smirnov test between each numerical feature of the training data and the same feature in the incoming batch, and reports the resulting p-values. If a p-value is above the significance level you choose (commonly 0.01 or 0.05), the test finds no evidence that the two samples come from different distributions, so you can treat them as similar. If a p-value falls below that level, treat the two samples as different for that feature and generate an alert.
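The thresholding step described above could be sketched as follows. This is a minimal illustration and not part of the drift model itself: the helper `drift_alerts`, the constant `ALPHA`, and the synthetic feature names are hypothetical, and the significance level of 0.05 is an assumption you should adapt to your use case.

```python
import numpy as np
from scipy.stats import ks_2samp

ALPHA = 0.05  # assumed significance level; 0.01 is a stricter alternative


def drift_alerts(pvalues, alpha=ALPHA):
    """Return the sorted names of features whose KS p-value is below alpha."""
    return sorted(feat for feat, p in pvalues.items() if p < alpha)


# Synthetic data for illustration only: one feature whose mean has shifted,
# and one feature whose batch is identical to the baseline.
rng = np.random.default_rng(0)
baseline = rng.normal(loc=0.0, scale=1.0, size=2000)
shifted = rng.normal(loc=1.0, scale=1.0, size=2000)

pvalues = {
    "unchanged_feature": ks_2samp(baseline, baseline).pvalue,
    "shifted_feature": ks_2samp(baseline, shifted).pvalue,
}

alerts = drift_alerts(pvalues)
print(alerts)
```

The same helper could be applied to the `pvalues` dictionary that the metrics function yields. Keep in mind that when many features are tested at once, a few will fall below the threshold by chance alone, so a stricter level may be appropriate.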
...