This article describes how ModelOp Center enables ongoing Drift Monitoring.

Table of Contents

...

Code Block
language: py
import pandas as pd
import numpy as np
from scipy.stats import ks_2samp


# modelop.init
def begin():
    global train, numerical_features
    # Load the training (reference) data and identify its numerical features
    train = pd.read_csv('training_data.csv')
    numerical_features = train.select_dtypes(['int64', 'float64']).columns

# modelop.score
def action(datum):
    yield datum

# modelop.metrics
def metrics(data):
    # Two-sample Kolmogorov-Smirnov test per numerical feature:
    # training (reference) distribution vs. the incoming batch
    ks_tests = [ks_2samp(train.loc[:, feat], data.loc[:, feat])
                for feat in numerical_features]
    pvalues = [x[1] for x in ks_tests]
    ks_pvalues = dict(zip(numerical_features, pvalues))
    
    yield dict(pvalues=ks_pvalues)
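As a quick sanity check, the two-sample KS test used above can be exercised on synthetic data (the sample sizes and distributions here are illustrative, not from the article): a low p-value indicates the batch distribution differs from the reference, i.e., drift.

```python
import numpy as np
import pandas as pd
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = pd.Series(rng.normal(0, 1, 1000))   # training-like sample
same = pd.Series(rng.normal(0, 1, 1000))        # batch from the same distribution
shifted = pd.Series(rng.normal(2, 1, 1000))     # batch with a mean shift (drift)

same_p = ks_2samp(reference, same).pvalue       # large: no evidence of drift
shifted_p = ks_2samp(reference, shifted).pvalue # tiny: drift detected
```

In practice, a threshold (commonly 0.05, possibly corrected for the number of features tested) is applied to each feature's p-value to decide whether to raise a drift alert.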

...

If the training data is too large to fit in memory, you can instead compute summary statistics over the training data, save them to a file (e.g., a pickle), and load that file in the init function of the drift model. The metrics function can then run statistical tests that compare those saved statistics against the statistics of each incoming batch.
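A minimal sketch of this summary-statistics variant is below. The file name `training_stats.pkl`, the helper `summarize_training_data`, and the z-score comparison are all illustrative assumptions, not part of ModelOp Center; any summary statistics and tests appropriate to your data could be substituted.

```python
import pickle
import pandas as pd


# Hypothetical one-time preprocessing step, run where the full training data is available:
# reduce each numerical feature to a small dict of summary statistics.
def summarize_training_data(csv_path, out_path='training_stats.pkl'):
    train = pd.read_csv(csv_path)
    numerical = train.select_dtypes(['int64', 'float64']).columns
    stats = {feat: {'mean': train[feat].mean(), 'std': train[feat].std()}
             for feat in numerical}
    with open(out_path, 'wb') as f:
        pickle.dump(stats, f)


# modelop.init
def begin():
    # Load only the precomputed summaries, not the full training data
    global train_stats
    with open('training_stats.pkl', 'rb') as f:
        train_stats = pickle.load(f)


# modelop.metrics
def metrics(data):
    # Compare each batch mean to the training mean, in units of the
    # training standard deviation (a simple z-score style drift signal)
    z_scores = {feat: abs(data[feat].mean() - s['mean']) / s['std']
                for feat, s in train_stats.items()}
    yield dict(mean_z_scores=z_scores)
```

Because only the summaries cross process boundaries, the memory footprint of the drift model stays constant regardless of training-set size; the trade-off is that summary-based tests are coarser than the two-sample KS test on raw data.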

Next Article: Model Governance: Standard Model Definition >