This article describes the key concept of the ModelOp Center Governance Inventory.

Use Cases and Model Implementations

Each use case below lists the action, the data required, the interfaces through which the action is available (CLI, UI, API, MLC), and MOC asset examples (note: these examples are not the only ways to define assets).

Add an Asset (to a Business Use Case, Model, or Snapshot)

  • Data Required: All types
  • Available via:
      CLI: s3 & embedded assets only
      UI: all
      API: all
  • MOC Asset Examples: Applicable to S3, Azure Blob, GCP Storage Buckets, HDFS, and SQL Asset

Run a Metrics Job (e.g. “Back Test” using labeled data)

  • Data Required: Test Data that contains: (1) model output (2) labels/ground truth for each model output
  • Available via: CLI, UI, API, MLC
  • MOC Asset Examples:
      S3: Test_Data: s3://<model_base>/TestData.csv
      HDFS: Test_Data: hdfs://<model_base>/TestData.csv
      SQL Asset: Test_Data: SELECT * FROM <read_only_Test_Data_table> WHERE <conditions>

Run a Performance Metrics Job

  • Data Required: Production (Comparator) Data that contains: (1) model output (2) labels/ground truth for each model output
  • Available via: CLI (would trigger the API), UI, API, MLC
  • MOC Asset Examples:
      S3: Compare_Data: s3://<model_base>/ComparatorData.csv
      HDFS: Compare_Data: hdfs://<model_base>/ComparatorData.csv
      SQL: Compare_Data: SELECT * FROM <read_only_Comparator_table> WHERE <conditions> (may need a tag to specify input vs. output comparator data)

Run a Distribution Comparison (e.g. Drift) Job

•Training /or Baseline Datathat contains all applicable model features in the training or baseline data set

•Comparator Data

CLI: would trigger the API

that contains the exact same model features listed in the training or baseline data set

UI, API, MLC

S3:

Training_Data: s3://<model_base>/TrainingData.csv

Compare_Data: s3://<model_base>/ComparatorData.csv

HDFS:

Training_Data: hdfs://<model_base>/TrainingData.csv

Compare_Data: hdfs://<model_base>/ComparatorData.csv

SQL:

Training_Data: SELECT * FROM <read_only_Training_table> WHERE <conditions>

Compare_Data: Training: SELECT * FROM <read_only_Comparator_table> WHERE <conditions> … may need to have tag to specify input vs. output comparator data

Run a Bias Detection Job

  • Data Required: Evaluation Data, i.e. Production (Comparator) Data that contains: (1) model output (2) labels/ground truth for each model output (3) the protected class for each model record
  • Available via: CLI, UI, API, MLC
  • MOC Asset Examples:
      S3: Test_Data: s3://<model_base>/EvaluationData.csv
      HDFS: Test_Data: hdfs://<model_base>/EvaluationData.csv
      SQL Asset: Test_Data: SELECT * FROM <read_only_Eval_table> WHERE <conditions>

Run LLM Tests on Model Output Data (e.g. Sentiment Analysis, Top Words by Parts of Speech, PII Leakage Detection, Toxicity, Gibberish Detection)

  • Data Required: Production (Comparator) Data that contains the model output (e.g. responses from a chatbot)
  • Available via: UI, API, MLC
  • MOC Asset Examples:
      S3: Compare_Data: s3://<model_base>/ComparatorData.csv
      HDFS: Compare_Data: hdfs://<model_base>/ComparatorData.csv
      SQL: Compare_Data: SELECT * FROM <read_only_Comparator_table> WHERE <conditions>

Run LLM Tests using Known Questions & Answers (e.g. Similarity Analysis, Cross-LLM Accuracy, Cross-LLM Fact Checking)

  • Data Required:
      • Production (Known_Questions) Data that contains: (a) a known set of questions (b) the model output for that set of questions (c) human-reviewed answers to the questions
      • An LLM (e.g. GPT-4o) for cross-LLM tests
  • Available via: UI, API, MLC
  • MOC Asset Examples:
      S3: Known_Questions_Data: s3://<model_base>/KnownQuestionsData.csv
      HDFS: Known_Questions_Data: hdfs://<model_base>/KnownQuestionsData.csv
      SQL: Known_Questions_Data: SELECT * FROM <read_only_Known_Questions_table> WHERE <conditions>

Run LLM Guardrails Validation

  • Data Required: Guardrail_Testing_Questions data that contains: (a) a known set of questions to validate guardrails efficacy (b) human-reviewed expected results (e.g. whether the guardrails should filter the answer out or allow it through)
  • Available via: UI, API, MLC
  • MOC Asset Examples:
      S3: Guardrail_Questions_Data: s3://<model_base>/GuardrailQuestionsData.csv
      HDFS: Guardrail_Questions_Data: hdfs://<model_base>/GuardrailQuestionsData.csv
      SQL: Guardrail_Questions_Data: SELECT * FROM <read_only_Guardrail_Questions_table> WHERE <conditions>

Run a Training Job

  • Data Required: Training Data
  • Available via: UI, API, MLC
  • MOC Asset Examples:
      S3: Training_Data: s3://<model_base>/TrainingData.csv
      HDFS: Training_Data: hdfs://<model_base>/TrainingData.csv
      SQL Asset: Training_Data: SELECT * FROM <read_only_Training_table> WHERE <conditions>
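
To make the asset references above concrete, the sketch below assembles the distribution-comparison (drift) example, an S3 training-data asset plus a SQL comparator asset, into a small Python structure and serializes it as JSON. The field names, bucket path, table name, and the "role" tag are illustrative assumptions only, not the ModelOp Center asset schema; define real assets through the CLI, UI, or API as described above.

import json

# Illustrative only: these keys and values are assumptions for the drift
# example above, not the actual ModelOp Center asset schema.
drift_job_assets = {
    "Training_Data": {
        "assetType": "S3",                              # hypothetical field name
        "uri": "s3://my-model-base/TrainingData.csv",   # placeholder bucket/path
    },
    "Compare_Data": {
        "assetType": "SQL",                             # hypothetical field name
        "query": (
            "SELECT * FROM read_only_Comparator_table "
            "WHERE scored_at >= '2024-01-01'"           # placeholder condition
        ),
        "role": "comparator_output",                    # e.g. a tag for input vs. output comparator data
    },
}

# Serialize for inspection, or for handing to whichever interface you use
# (UI upload, API payload, etc.).
print(json.dumps(drift_job_assets, indent=2))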


  • Purpose: the model’s “required assets” file provides a list of all assets that are required to run a given metrics model (test/monitor).

  • Usage: primarily used by the ModelOp Center UI to indicate to the user which assets are required when adding a monitor.

  • Location: this file should be included in the model’s git repository.

  • Import: when importing the model from git, ModelOp Center will recognize this file and automatically use it to populate the “Add a Monitor” wizard “assets” screen.

  • Structure: must be valid JSON that follows the ModelOp Center “required assets” structure; please see this article for the details of that structure. A purely hypothetical sketch of the idea is shown below.
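
As a hypothetical illustration of the idea only (not the actual “required assets” schema, which is defined in the article linked above), a monitor repository might carry a small JSON file enumerating the assets its metrics function expects. The sketch below writes and reads back such a file; every field name and the file name are assumptions.

import json
from pathlib import Path

# Hypothetical example of a "required assets"-style file for a monitor repo.
# The field names below are illustrative assumptions, NOT the official
# ModelOp Center schema; consult the linked article for the real structure.
required_assets_example = [
    {"name": "Compare_Data", "description": "Production (comparator) data with model output and labels"},
    {"name": "Training_Data", "description": "Baseline/training data with all model features"},
]

path = Path("required_assets_example.json")  # placeholder file name
path.write_text(json.dumps(required_assets_example, indent=2))

# Reading the file back, e.g. to drive an "Add a Monitor"-style wizard.
for asset in json.loads(path.read_text()):
    print(f"Monitor requires asset: {asset['name']} - {asset['description']}")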

Next Article: Inventory Metadata >