...

Code Block
languagepy
import pandas as pd
import numpy as np
from scipy.stats import ks_2samp
from scipy.stats import binom_test


# modelop.init
def begin():
    """
    A function to read the training data and save it, along with its
    numerical features, globally.
    """
    
    global train, numerical_features
    
    train = pd.read_csv('training_data.csv')
    numerical_features = train.select_dtypes(['int64', 'float64']).columns
   
    pass


# modelop.metrics
def metrics(data):
    """
    A function to compute KS p-values on input (sample) data
    as compared to training (baseline) data.
    """
    
    # Run a two-sample KS test for each numerical feature, comparing the
    # training (baseline) data against the incoming batch
    ks_tests = [ks_2samp(train.loc[:, feat], data.loc[:, feat])
                for feat in numerical_features]
    # The second element of each test result is the p-value
    pvalues = [x[1] for x in ks_tests]
    ks_pvalues = dict(zip(numerical_features, pvalues))
    
    yield dict(pvalues=ks_pvalues)

This drift model runs a two-sample Kolmogorov-Smirnov test on each numerical feature, comparing the training (baseline) data against the incoming batch, and reports the resulting p-values. If a p-value is above the chosen significance level (commonly 0.01 or 0.05), there is no evidence that the feature's distribution has shifted, and the two samples can be treated as similar. If a p-value falls below that level, the incoming data likely differs from the training data and an alert can be generated.
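
Downstream, deciding whether to raise that alert amounts to comparing each reported p-value against the chosen significance level. The sketch below assumes the output format of the metrics function above; the 0.05 threshold and the flag_drift helper are illustrative choices for this example, not part of the model itself.

Code Block
languagepy
# Illustrative sketch: flag features whose KS p-value falls below a
# chosen significance level. The threshold and helper name are
# assumptions for this example, not part of the model above.
SIGNIFICANCE_LEVEL = 0.05  # 0.01 is a stricter common alternative


def flag_drift(pvalues: dict) -> list:
    """Return the features whose p-value indicates a distribution shift."""
    return [feat for feat, p in pvalues.items() if p < SIGNIFICANCE_LEVEL]


# Example usage with the metrics() generator defined above:
# batch = pd.read_csv('incoming_batch.csv')
# result = next(metrics(batch))
# drifted = flag_drift(result['pvalues'])
# if drifted:
#     print(f'Potential drift detected in: {drifted}')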

...