Inventory Key Concepts
This article describes the key concepts of the ModelOp Center Governance Inventory.
Use Cases and Model Implementations
AI and ML are meant to solve a business problem. This business problem can be solved through a number of different “AI/ML” techniques, with the best technique often chosen to be implemented for business usage. With Generative AI, especially foundation models, this separation becomes even more apparent, as one foundation model (e.g. LLM) can satisfy a myriad of different business cases.
Thus, in ModelOp Center, there is a separation of “Use Case” from “Model Implementation”:
Use Case: the business problem that is being solved to drive tangible business outputs
Model Implementation: the technology/model that is used to solve the business use case. The Model Implementation contains the technical details, including model source code, training/test data, configuration files, test results for the model, etc. A given Use Case could be solved by several different Model Implementation approaches.
Example: the Use Case of “Fraud detection” could be solved using a rules-based Model Implementation OR by using a neural net Model Implementation. From a governance perspective, the details of each Model Implementation need to be tracked for each Use Case.
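The one-to-many relationship described above can be sketched as a small data structure. This is only an illustration: the class and field names (`UseCase`, `ModelImplementation`, `technique`, `assets`) are hypothetical and are not the ModelOp Center object model.

```python
from dataclasses import dataclass, field


@dataclass
class ModelImplementation:
    """One technical approach to a use case (hypothetical structure)."""
    name: str
    technique: str
    assets: list[str] = field(default_factory=list)


@dataclass
class UseCase:
    """The business problem; it may be linked to several implementations."""
    name: str
    implementations: list[ModelImplementation] = field(default_factory=list)

    def add_implementation(self, impl: ModelImplementation) -> None:
        self.implementations.append(impl)


# The fraud detection example from the text: one use case, two implementations.
fraud = UseCase(name="Fraud detection")
fraud.add_implementation(ModelImplementation("fraud-rules", "rules-based"))
fraud.add_implementation(ModelImplementation("fraud-nn", "neural net"))

for impl in fraud.implementations:
    print(f"{fraud.name}: {impl.name} ({impl.technique})")
```

Keeping the implementations as a list on the use case mirrors the governance requirement that every implementation is tracked against the business problem it addresses.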
Assets
ModelOp Center is designed to be agnostic to the type of model, the platform on which the model runs, and the data platforms with which the model interacts during various portions of the model life cycle. Central to the execution of a model are the model-specific assets. Types of model assets that may be used during the model life cycle include:
Trained model artifacts
Requirements (Dependencies) lists
Data assets
Configuration files
Data Assets
ModelOp Center “Assets” support various data technologies, including:
AWS S3 or S3-compatible files
Azure Blob Storage Files
GCP Storage Buckets
HDFS Files
SQL data sources
While assets can be used in a variety of activities throughout a model life cycle, below is a summary of typical activities where a User or an MLC may use a Data Asset:
Action | Data Required | Available via | MOC Asset Examples (note: these examples are not the only ways to define assets) |
Add an Asset (to a Business Use Case, Model, or Snapshot) | All types | CLI: S3 and embedded assets only; UI: all; API: all | Applicable to S3, Azure Blob, GCP Storage Buckets, HDFS, and SQL assets |
Run a Metrics Job (e.g. “Back Test” using labeled data) | Test Data | CLI, UI, API, MLC | S3: Test_Data: s3://<model_base>/TestData.csv; HDFS: Test_Data: hdfs://<model_base>/TestData.csv; SQL Asset: Test_Data: SELECT * FROM <read_only_Test_Data_table> WHERE <conditions> |
Run a Performance Metrics Job | Comparator Data | CLI (would trigger the API), API, MLC | S3: Compare_Data: s3://<model_base>/ComparatorData.csv; HDFS: Compare_Data: hdfs://<model_base>/ComparatorData.csv; SQL: Compare_Data: Training: SELECT * FROM <read_only_Comparator_table> WHERE <conditions> … may need to have a tag to specify input vs. output comparator data |
Run a Distribution Comparison (e.g. Drift) Job | Training/Baseline Data; Comparator Data | CLI (would trigger the API), API, MLC | S3: Training_Data: s3://<model_base>/TrainingData.csv, Compare_Data: s3://<model_base>/ComparatorData.csv; HDFS: Training_Data: hdfs://<model_base>/TrainingData.csv, Compare_Data: hdfs://<model_base>/ComparatorData.csv; SQL: Training_Data: SELECT * FROM <read_only_Training_table> WHERE <conditions>, Compare_Data: Training: SELECT * FROM <read_only_Comparator_table> WHERE <conditions> … may need to have a tag to specify input vs. output comparator data |
Run a Bias Detection Job | Evaluation Data | CLI, UI, API, MLC | S3: Test_Data: s3://<model_base>/EvaluationData.csv; HDFS: Test_Data: hdfs://<model_base>/EvaluationData.csv; SQL Asset: Test_Data: SELECT * FROM <read_only_Eval_table> WHERE <conditions> |
Run a Training Job | Training Data | MLC, UI, API | S3: Training_Data: s3://<model_base>/TrainingData.csv; HDFS: Training_Data: hdfs://<model_base>/TrainingData.csv; SQL Asset: Training_Data: SELECT * FROM <read_only_Training_table> WHERE <conditions> |
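The asset examples in the table take three shapes: `s3://` URLs, `hdfs://` URLs, and SQL `SELECT` statements. The following is a minimal sketch of how a script might tell them apart before registering an asset. The function name `classify_asset` and the bucket and table names are hypothetical. Azure Blob and GCP references are not handled because the table does not show their form.

```python
from urllib.parse import urlparse

# URL schemes that appear in the table's asset examples.
FILE_SCHEMES = {"s3": "S3", "hdfs": "HDFS"}


def classify_asset(reference: str) -> str:
    """Return 'S3', 'HDFS', or 'SQL' for an asset reference string."""
    text = reference.strip()
    # The SQL asset examples are plain SELECT statements.
    if text.upper().startswith("SELECT "):
        return "SQL"
    scheme = urlparse(text).scheme.lower()
    if scheme in FILE_SCHEMES:
        return FILE_SCHEMES[scheme]
    raise ValueError(f"Unrecognized asset reference: {reference!r}")


# Hypothetical references in the same shape as the table's examples.
examples = [
    "s3://example-bucket/TestData.csv",
    "hdfs://example-cluster/TrainingData.csv",
    "SELECT * FROM test_data WHERE label IS NOT NULL",
]
for ref in examples:
    print(classify_asset(ref), "<-", ref)
```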
Core Model Assets
Trained Model Artifacts
ModelOp Center supports Trained Model Artifacts stored in S3 buckets, Azure Blob Store, GCP Storage Buckets, or Artifactory. When adding an asset that is used as a Trained Model Artifact, set the asset role to “Weights File” or “Model Binary File”, depending on the use case.
Schemas
The ModelOp Center runtime supports data schemas, which help to define the data inputs/outputs for testing, monitoring, scoring, and governance. A schema can be used for input data and/or output data. When uploading a schema, select the “Model Schema” asset role.
Requirements
The ModelOp runtime comes with basic pre-installed libraries for different models and languages. If the model requires additional libraries to be installed by the ModelOp runtime, they can be declared in a requirements.txt file. The ModelOp runtime checks the requirements file before running the model and installs any missing libraries. When uploading the requirements file, select the “Requirements” asset role.
Model Configuration Files
A model implementation may have a variety of metadata, parameters, and other configurations that are imported and used throughout its life cycle. Below is a list of the optional configuration files through which these may be supplied during import and management of a model implementation:
Model Dependencies/Libraries (“requirements.txt”)
Purpose: contains all of the model-specific dependent libraries/packages that are required to execute the model.
Usage: the ModelOp runtime uses requirements.txt to determine whether all required libraries are loaded on the runtime. By default, if this asset exists on the model, the runtime executes a pip install against the contents of requirements.txt whenever the model is deployed on a ModelOp runtime or a batch job is run on a ModelOp runtime. Pip then compares the versions on the runtime with those specified in requirements.txt. If there are discrepancies, pip attempts to install the specified items from the configured Python repository (e.g. the Customer’s Artifactory).
Location: this file should be included in the model’s git repository
Import: when importing the model from git, ModelOp Center will recognize this file as the “Required Libraries” asset_role.
Structure: the list of libraries should be specified in a “requirements.txt” file, which is a standard convention for Python projects. Each library and its version is specified on its own line in the file. See below for a typical example of a requirements.txt file.
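To make the version comparison described under Usage concrete, here is a simplified sketch that parses pinned `name==version` lines and reports which ones differ from what is installed. It is not pip's actual resolution logic. The sample file contents, the `installed` mapping, and the function names `parse_pins` and `find_discrepancies` are assumptions made for illustration.

```python
def parse_pins(requirements_text: str) -> dict[str, str]:
    """Parse 'name==version' lines; blank lines and comments are skipped."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            # Other specifier forms (>=, ~=, extras) are out of scope here.
            raise ValueError(f"Only 'name==version' lines are handled: {line!r}")
        pins[name.strip().lower()] = version.strip()
    return pins


def find_discrepancies(pins: dict[str, str], installed: dict[str, str]) -> dict:
    """Return {name: (required, installed_or_None)} for every mismatch."""
    return {
        name: (required, installed.get(name))
        for name, required in pins.items()
        if installed.get(name) != required
    }


# Hypothetical requirements.txt contents: one pinned library per line.
SAMPLE = """\
# model dependencies (illustrative)
numpy==1.26.4
pandas==2.2.2
scikit-learn==1.4.2
"""

# Hypothetical snapshot of what is already present on the runtime.
installed = {"numpy": "1.26.4", "pandas": "2.1.0"}

discrepancies = find_discrepancies(parse_pins(SAMPLE), installed)
print(discrepancies)
```

In this sketch, a library that is missing from the runtime is reported with `None` as its installed version, while a library with a different version is reported with both versions.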
Model Metadata (“metadata.json”)
Purpose: contains all of the model-specific metadata that is to be imported into the primary metadata and/or the custom metadata section of the business model (the ModelOp “storedModel” object). Typically, this metadata describes the model as a whole, NOT the model’s training/test data or other assets.
Usage: the metadata.json should be used to automatically set core metadata elements for a model (e.g. the model’s “Description” field) or to add custom metadata that should be tracked. Please see this article on the structure of the business model’s metadata and/or see below for an example metadata.json file. NOTE, ModelOp Center only imports the assets upon initial import.
Location: this file should be included in the model’s git repository
Import: when importing the model from git, ModelOp Center will recognize this file and attempt to write the matching fields into the business model (storedModel) primary or custom metadata.
Structure: needs to be valid json that follows the structure of the “metadata” element of a business model (ModelOp “storedModel” object). Please see this article on the structure of the business model’s metadata
External Assets (“external_assets.json”)
Purpose: contains any external assets (e.g. data references) that a user may want to automatically add upon import of a model. This allows a user to avoid having to manually add the assets in the UI or CLI after import.
Usage: the external_assets.json should be used to automatically add external assets to a model during import. Please see this article on the structure of the “assets” section within the business model (“storedModel”) object and/or see below for an example external_assets.json file. NOTE, ModelOp Center only imports the assets upon initial import.
Location: this file should be included in the model’s git repository
Import: when importing the model from git, ModelOp Center will recognize this file and attempt to create each asset listed in the file.
Structure: needs to be valid json that follows the structure of the “assets” section of a business model (ModelOp “storedModel” object). Please see this article on the structure of the business model (“storedModel”) object.
Model Schema (“input / output_schema.avsc”)
Purpose: provides a description of all data-related model inputs or outputs, including the type, usage, and metadata related to any input or output data sets.
Usage: the schemas are used (1) during scoring, to ensure that all input records to the model adhere to the specified schema; records that do not can be rejected; and (2) for tests/monitoring, to provide all of the necessary information about the input features and output fields for metrics models (tests/monitors) to run. This can include specifiers such as “protected class”, as well as additional metadata related to that specifier (e.g. if a protected class is specified, additional information could be the “reference group”). NOTE: for tests/monitoring, ModelOp Center requires the “extended schema”, which can be created via the Schema Generation capability in ModelOp Center. Once the schema is generated, it is recommended that the user add the schema file(s) to the git repository for ongoing management.
Location: this file should be included in the model’s git repository once Schema Generation has been completed.
Import: when importing the model from git, ModelOp Center will recognize this file and classify it as an input or output schema file.
Structure: needs to be valid Avro that follows the ModelOp Center “extended schema” structure. Please see this article on the ModelOp Center “extended schema” structure.
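The scoring-time check described under Usage (accept records that match the input schema, reject the rest) can be sketched as follows. This is a deliberately simplified checker that handles only a flat Avro record with primitive field types. It is not the ModelOp runtime's validator and does not cover the “extended schema” attributes. The schema, the records, and the `conforms` helper are hypothetical.

```python
import json

# Maps a subset of Avro primitive type names to acceptable Python types.
AVRO_TO_PY = {
    "string": str,
    "int": int,
    "long": int,
    "float": (int, float),
    "double": (int, float),
    "boolean": bool,
}


def conforms(record: dict, schema: dict) -> bool:
    """Return True if every schema field is present with a matching type."""
    for field in schema["fields"]:
        name, avro_type = field["name"], field["type"]
        if not isinstance(avro_type, str) or avro_type not in AVRO_TO_PY:
            raise ValueError(f"Unsupported field type in this sketch: {avro_type!r}")
        if name not in record:
            return False
        value = record[name]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        if isinstance(value, bool) and avro_type != "boolean":
            return False
        if not isinstance(value, AVRO_TO_PY[avro_type]):
            return False
    return True


# Hypothetical input schema in plain Avro record form.
INPUT_SCHEMA = json.loads("""
{"type": "record", "name": "input", "fields": [
  {"name": "amount", "type": "double"},
  {"name": "merchant", "type": "string"}
]}
""")

records = [
    {"amount": 12.5, "merchant": "acme"},
    {"amount": "n/a", "merchant": "acme"},
]
accepted = [r for r in records if conforms(r, INPUT_SCHEMA)]
rejected = [r for r in records if not conforms(r, INPUT_SCHEMA)]
print(len(accepted), "accepted;", len(rejected), "rejected")
```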
Required Assets (“required_assets.json”)
Purpose: provides a list of all assets that are required to run a given metrics model (test/monitor).
Usage: primarily used by the ModelOp Center UI to indicate to the user which assets are required when adding a monitor.
Location: this file should be included in the model’s git repository.
Import: when importing the model from git, ModelOp Center will recognize this file and automatically use it to populate the “Add a Monitor” wizard “assets” screen.
Structure: needs to be valid json that follows the ModelOp Center “required assets” structure. Please see this article on the ModelOp Center “required assets” structure.
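Because several of the optional configuration files above must be valid JSON (Avro `.avsc` schema files are JSON documents as well), a quick syntax check before committing them to the model's git repository can catch errors early. The sketch below assumes the files sit at the top level of the repository directory, which is an assumption and not a documented requirement. It checks syntax only, not the ModelOp-specific structure. `check_config_files` is a hypothetical helper.

```python
import json
from pathlib import Path

# JSON configuration file names taken from the sections above.
JSON_CONFIG_NAMES = ("metadata.json", "external_assets.json", "required_assets.json")


def check_config_files(repo_dir: str) -> dict[str, str]:
    """Return {filename: status} for each config file found in repo_dir."""
    root = Path(repo_dir)
    candidates = [root / name for name in JSON_CONFIG_NAMES]
    candidates += sorted(root.glob("*.avsc"))  # Avro schema files are JSON too
    results = {}
    for path in candidates:
        if not path.is_file():
            continue  # every one of these files is optional
        try:
            json.loads(path.read_text(encoding="utf-8"))
            results[path.name] = "ok"
        except json.JSONDecodeError as exc:
            results[path.name] = f"invalid JSON: {exc.msg} (line {exc.lineno})"
    return results
```

Calling `check_config_files(".")` from the repository root returns one entry per file that exists. Files that are absent are skipped because all of them are optional.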