Add Model-Specific Data Assets

ModelOp Center is designed to be agnostic to the data platforms that a model interacts with during the various stages of the model life cycle.

 

Types of data sets used during the model life cycle include:

  • Training Data (often used as Baseline Data)

  • Evaluation (“Test”) Data

  • “Comparator Data” (production data “window” that will be compared against a training/baseline data set)
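To make the idea of comparing a comparator window against baseline data more concrete, here is a minimal sketch that uses the Population Stability Index (PSI) for a single numeric feature. PSI is a commonly used drift statistic chosen here purely for illustration; it is an assumption, not necessarily the measure ModelOp Center applies. The function name, bin count, and smoothing constant are likewise hypothetical.

```python
import math
from typing import List, Sequence


def population_stability_index(
    baseline: Sequence[float],
    comparator: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Return the PSI between a baseline sample and a comparator window.

    Bin edges are equal-width over the baseline range; comparator values
    outside that range are clamped into the first or last bin.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or a division by zero
        return [max(c / len(values), eps) for c in counts]

    p = proportions(baseline)
    q = proportions(comparator)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Illustrative data only: a baseline sample and a shifted production window.
baseline_sample = [i / 100 for i in range(100)]
production_window = [x + 0.5 for x in baseline_sample]
psi = population_stability_index(baseline_sample, production_window)
```

A PSI of zero means the two binned distributions match exactly; larger values indicate a larger shift. Any alerting threshold would need to be chosen for the model at hand.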

 

ModelOp Center provides the following “Assets” to support various data technologies:

  • S3 Files

  • HDFS Files

  • SQL Asset
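As a rough illustration of how these three asset types differ, the sketch below models an asset as a role plus a location and infers the asset type from the location. It assumes that S3 and HDFS files are addressed with `s3://` and `hdfs://` URLs and that a SQL asset is expressed as a `SELECT` statement. The `DataAsset` class and its fields are hypothetical and do not represent the ModelOp Center schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAsset:
    """Hypothetical stand-in for a model data asset (not the ModelOp Center schema)."""

    role: str      # e.g. "Training_Data", "Test_Data", "Compare_Data"
    location: str  # an s3:// or hdfs:// URL, or a SQL SELECT statement

    @property
    def kind(self) -> str:
        """Infer the asset type from the shape of the location string."""
        loc = self.location.strip()
        if loc.lower().startswith("s3://"):
            return "S3"
        if loc.lower().startswith("hdfs://"):
            return "HDFS"
        if loc.upper().startswith("SELECT "):
            return "SQL"
        raise ValueError(f"unrecognized asset location: {self.location!r}")


# Illustrative values only; the bucket and table names are made up.
training = DataAsset("Training_Data", "s3://example-bucket/TrainingData.csv")
evaluation = DataAsset("Test_Data", "SELECT * FROM eval_table WHERE year = 2020")
```

Keeping the role separate from the location means the same job definition can point at a file or a query without changing how the job names its inputs.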

 

The following summary lists example activities in which a user or an MLC (model life cycle) process may use a data asset:

Each entry lists the action, the data required, the interfaces through which the action is available, and MOC asset examples written as ASSET_ROLE: EXAMPLE.

Action: Add an Attachment (to a StoredModel)
  • Data Required: All types
  • Available via: CLI (S3 and embedded assets only), UI (all), API (all)
  • MOC Asset Examples: Applicable to S3, HDFS, and SQL Asset

Action: Run a Metrics Job (e.g., “Back Test” using labeled data)
  • Data Required: Evaluation Data
  • Available via: CLI, UI, API, MLC
  • MOC Asset Examples:
      S3: Test_Data: s3://<model_base>/EvaluationData.csv
      HDFS: Test_Data: hdfs://<model_base>/EvaluationData.csv
      SQL Asset: Test_Data: SELECT * FROM <read_only_Eval_table> WHERE <conditions>

Action: Run a Data Drift Job
  • Data Required: Training/Baseline Data and Comparator Data
  • Available via: CLI (would trigger the API), API, MLC
  • MOC Asset Examples:
      S3: Training_Data: s3://<model_base>/TrainingData.csv
      S3: Compare_Data: s3://<model_base>/ComparatorData.csv
      HDFS: Training_Data: hdfs://<model_base>/TrainingData.csv
      HDFS: Compare_Data: hdfs://<model_base>/ComparatorData.csv
      SQL Asset: Training_Data: SELECT * FROM <read_only_Training_table> WHERE <conditions>
      SQL Asset: Compare_Data: SELECT * FROM <read_only_Comparator_table> WHERE <conditions> (a tag may be needed to specify input vs. output comparator data)

Action: Run a Model Concept Drift Job
  • Data Required: Training/Baseline Data and Comparator Data
  • Available via: CLI (would trigger the API), API, MLC
  • MOC Asset Examples:
      S3: Training_Data: s3://<model_base>/TrainingData.csv
      S3: Compare_Data: s3://<model_base>/ComparatorData.csv
      HDFS: Training_Data: hdfs://<model_base>/TrainingData.csv
      HDFS: Compare_Data: hdfs://<model_base>/ComparatorData.csv
      SQL Asset: Training_Data: SELECT * FROM <read_only_Training_table> WHERE <conditions>
      SQL Asset: Compare_Data: SELECT * FROM <read_only_Comparator_table> WHERE <conditions> (a tag may be needed to specify input vs. output comparator data)

Action: Run a Bias Detection Job
  • Data Required: Evaluation Data
  • Available via: CLI, UI, API, MLC
  • MOC Asset Examples:
      S3: Test_Data: s3://<model_base>/EvaluationData.csv
      HDFS: Test_Data: hdfs://<model_base>/EvaluationData.csv
      SQL Asset: Test_Data: SELECT * FROM <read_only_Eval_table> WHERE <conditions>

Action: Run a Training Job
  • Data Required: Training Data
  • Available via: MLC, UI, API
  • MOC Asset Examples:
      S3: Training_Data: s3://<model_base>/TrainingData.csv
      HDFS: Training_Data: hdfs://<model_base>/TrainingData.csv
      SQL Asset: Training_Data: SELECT * FROM <read_only_Training_table> WHERE <conditions>
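The summary above pairs each job with the asset roles it needs, which lends itself to a simple pre-flight check. The sketch below is one way such a check might look. The role names (`Test_Data`, `Training_Data`, `Compare_Data`) are taken from the examples above, while the job keys and the `missing_roles` helper are hypothetical and are not part of any ModelOp Center API.

```python
from typing import Dict, Set

# Asset roles per job, transcribed from the summary above. The job keys are
# hypothetical labels for illustration, not ModelOp Center identifiers.
REQUIRED_ROLES: Dict[str, Set[str]] = {
    "metrics": {"Test_Data"},
    "data_drift": {"Training_Data", "Compare_Data"},
    "concept_drift": {"Training_Data", "Compare_Data"},
    "bias_detection": {"Test_Data"},
    "training": {"Training_Data"},
}


def missing_roles(job: str, assets: Dict[str, str]) -> Set[str]:
    """Return the roles that `job` needs but `assets` (role -> location) lacks."""
    return REQUIRED_ROLES[job] - set(assets)


# Illustrative asset map for one model; the bucket name is made up.
model_assets = {
    "Training_Data": "s3://example-bucket/TrainingData.csv",
    "Test_Data": "s3://example-bucket/EvaluationData.csv",
}

drift_gaps = missing_roles("data_drift", model_assets)
metrics_gaps = missing_roles("metrics", model_assets)
```

With the asset map shown, the drift check reports that a `Compare_Data` asset is still needed, while the metrics check finds nothing missing.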
