Add Model-Specific Data Assets
ModelOp Center is designed to be agnostic to the data platforms that a model interacts with at different stages of the model life cycle.
Types of Data Sets Used During the Model Life Cycle Include:
Training Data (often used as Baseline Data)
Evaluation (“Test”) Data
“Comparator Data” (production data “window” that will be compared against a training/baseline data set)
ModelOp Center “Assets” that support various data technologies:
S3 Files
HDFS Files
SQL Asset
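As a rough illustration of how these three asset technologies differ in the way they are referenced, the sketch below guesses the technology from a reference string: an `s3://` URL, an `hdfs://` URL, or a SQL `SELECT` statement. The function name `classify_asset` and the returned labels are hypothetical and are not part of the ModelOp Center API; the bucket, host, and table names are placeholders.

```python
def classify_asset(reference: str) -> str:
    """Guess the asset technology from a reference string.

    Hypothetical helper for illustration only; not a ModelOp Center call.
    """
    text = reference.strip()
    # Split off a URL scheme if one is present, e.g. "s3" from "s3://bucket/key".
    scheme, sep, _ = text.partition("://")
    if sep and scheme.lower() == "s3":
        return "S3 File"
    if sep and scheme.lower() == "hdfs":
        return "HDFS File"
    # Assumption: SQL assets are expressed as read-only SELECT statements.
    if text.upper().startswith("SELECT"):
        return "SQL Asset"
    raise ValueError(f"Unrecognized data asset reference: {reference!r}")


if __name__ == "__main__":
    examples = (
        "s3://example-bucket/EvaluationData.csv",      # placeholder bucket
        "hdfs://example-host/TrainingData.csv",        # placeholder host
        "SELECT * FROM eval_table WHERE region = 'A'",  # placeholder table
    )
    for ref in examples:
        print(classify_asset(ref), "<-", ref)
```

A string-prefix check like this is only a convenience for reading the examples on this page; a real deployment would rely on the asset type recorded with the asset itself.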
Summary of example activities in which a user or a Model Life Cycle (MLC) may use a Data Asset:
Action | Data Required | Available via | MOC Asset Examples [ASSET_ROLE: EXAMPLE] |
Add an Attachment (to a StoredModel) | All types | CLI: S3 and embedded assets only; UI: all; API: all | Applicable to S3, HDFS, and SQL Assets |
Run a Metrics Job (e.g. “Back Test” using labeled data) | • Evaluation Data | CLI, UI, API, MLC | S3: • Test_Data: s3://<model_base>/EvaluationData.csv; HDFS: • Test_Data: hdfs://<model_base>/EvaluationData.csv; SQL Asset: • Test_Data: SELECT * FROM <read_only_Eval_table> WHERE <conditions> |
Run a Data Drift Job | • Training/Baseline Data • Comparator Data | CLI (would trigger the API), API, MLC | S3: • Training_Data: s3://<model_base>/TrainingData.csv • Compare_Data: s3://<model_base>/ComparatorData.csv; HDFS: • Training_Data: hdfs://<model_base>/TrainingData.csv • Compare_Data: hdfs://<model_base>/ComparatorData.csv; SQL Asset: • Training_Data: SELECT * FROM <read_only_Training_table> WHERE <conditions> • Compare_Data: SELECT * FROM <read_only_Comparator_table> WHERE <conditions> (a tag may be needed to distinguish input from output comparator data) |
Run a Model Concept Drift Job | • Training/Baseline Data • Comparator Data | CLI (would trigger the API), API, MLC | S3: • Training_Data: s3://<model_base>/TrainingData.csv • Compare_Data: s3://<model_base>/ComparatorData.csv; HDFS: • Training_Data: hdfs://<model_base>/TrainingData.csv • Compare_Data: hdfs://<model_base>/ComparatorData.csv; SQL Asset: • Training_Data: SELECT * FROM <read_only_Training_table> WHERE <conditions> • Compare_Data: SELECT * FROM <read_only_Comparator_table> WHERE <conditions> (a tag may be needed to distinguish input from output comparator data) |
Run a Bias Detection Job | • Evaluation Data | CLI, UI, API, MLC | S3: • Test_Data: s3://<model_base>/EvaluationData.csv; HDFS: • Test_Data: hdfs://<model_base>/EvaluationData.csv; SQL Asset: • Test_Data: SELECT * FROM <read_only_Eval_table> WHERE <conditions> |
Run a Training Job | • Training Data | MLC, UI, API | S3: • Training_Data: s3://<model_base>/TrainingData.csv; HDFS: • Training_Data: hdfs://<model_base>/TrainingData.csv; SQL Asset: • Training_Data: SELECT * FROM <read_only_Training_table> WHERE <conditions> |
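The table above pairs each job type with the asset roles it needs. A minimal sketch of that mapping, assuming the role names shown in the examples (`Test_Data`, `Training_Data`, `Compare_Data`), might look like the following. The job-type keys, the `REQUIRED_ROLES` dictionary, and the `missing_roles` function are hypothetical names chosen for illustration; they are not ModelOp Center identifiers.

```python
# Asset roles each job type needs, taken from the summary table.
# The dictionary keys are illustrative labels, not ModelOp Center job names.
REQUIRED_ROLES = {
    "metrics": {"Test_Data"},
    "data_drift": {"Training_Data", "Compare_Data"},
    "concept_drift": {"Training_Data", "Compare_Data"},
    "bias_detection": {"Test_Data"},
    "training": {"Training_Data"},
}


def missing_roles(job_type: str, assets: dict) -> list:
    """Return the asset roles a job still needs, sorted alphabetically.

    `assets` maps an asset role to a reference such as an S3 URL,
    an HDFS URL, or a SQL SELECT statement.
    """
    if job_type not in REQUIRED_ROLES:
        raise ValueError(f"Unknown job type: {job_type!r}")
    return sorted(REQUIRED_ROLES[job_type] - set(assets))


if __name__ == "__main__":
    # Placeholder bucket name; only the training data has been attached.
    attached = {"Training_Data": "s3://example-bucket/TrainingData.csv"}
    print(missing_roles("data_drift", attached))
    print(missing_roles("training", attached))
```

Checking roles before a job is submitted is one way to catch a missing comparator or evaluation data set early; whether ModelOp Center performs an equivalent check internally is not stated here.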