ModelOp Center uses schemas to define the data that is used by the model for governance, testing, and monitoring. This article describes how ModelOp Center enables schema generation and use.
...
Background
Overview
As mentioned in prior articles, ModelOp Center uses externalized schemas to decouple the data sources from the model source itself, so that different data platforms can be used throughout the model’s life cycle (e.g., a CSV file for testing and Snowflake for production). The schema provides governance traceability for the data sets that a model uses and for the specific features in each data set. It also enables flexible, streamlined testing and monitoring across a variety of data sources. Finally, if the ModelOp Runtime is used for model execution, the schema is used to check that the model’s input data conforms to what the model expects and that the model’s output conforms to what the consuming application or process expects.
Avro Schemas
ModelOp Center uses the Avro specification for schema checking. The specification is available at https://avro.apache.org/docs/current/spec.html. For most models, records, arrays, or simple types suffice, but some cases require more complex structures, especially when the input data is nested.
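To make the Avro building blocks concrete, here is a minimal sketch of a record schema written as a Python dictionary and serialized to JSON. The field names are borrowed from the sample records later in this article; the record names (`LoanInput`, `Account`), the `accounts` field, and the chosen Avro types are assumptions for illustration, not a schema produced by ModelOp Center.

```python
import json

# Hypothetical input schema. Field names follow the sample records in this
# article; the Avro types and the record names are illustrative assumptions.
input_schema = {
    "type": "record",
    "name": "LoanInput",  # hypothetical record name
    "fields": [
        {"name": "UUID", "type": "string"},
        {"name": "amount", "type": "double"},
        {"name": "home_ownership", "type": "string"},
        {"name": "age", "type": "string"},
        # A union with "null" marks a field whose value may be missing.
        {"name": "credit_age", "type": ["null", "int"]},
        {"name": "employed", "type": "boolean"},
        # Nested input: an array of records (hypothetical field).
        {
            "name": "accounts",
            "type": {
                "type": "array",
                "items": {
                    "type": "record",
                    "name": "Account",
                    "fields": [{"name": "balance", "type": "double"}],
                },
            },
        },
    ],
}

# Avro schemas are written as JSON, so the dictionary serializes directly.
schema_json = json.dumps(input_schema, indent=2)
print(schema_json)
```

The simple fields use Avro primitive types, the `["null", "int"]` union allows a missing value, and the `accounts` field shows how an array of records expresses nested input.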
...
To generate an extended schema for a business model:
Navigate to the corresponding Business Model in the MOC UI (under Models).
Click on Schema.
Click on Generate Extended Schema. A pop-up window should appear.
Enter the data you want to use to infer a schema in the top box. The data must be formatted as one-line dictionaries, as in the following sample:
{"UUID": "9a5d9f42-3f36-4f38-88dd-22353fdb66a7", "amount": 8875.50, "home_ownership": "MORTGAGE", "age": "Over Forty", "credit_age": 4511, "employed": true, "label": 1, "prediction": 1}
{"UUID": "f8d95245-a186-45a6-b951-376323d06d02", "amount": 9000, "home_ownership": "MORTGAGE", "age": "Under Forty", "credit_age": 7524, "employed": false, "label": 0, "prediction": 1}
{"UUID": "8607e327-4dca-4372-a4b9-df7730f83c8e", "amount": 5000.50, "home_ownership": "RENT", "age": "Under Forty", "credit_age": null, "employed": true, "label": 0, "prediction": 0}
Click on Generate Schema.
The schema can then be downloaded or saved as Input/Output Schema.
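Before pasting data into the schema generator, it can help to confirm that every line is a valid one-line dictionary and to see which value types each field actually contains. The sketch below is a simplified stand-in written for this purpose; it is not the inference logic that ModelOp Center uses, and the helper names `json_type` and `infer_field_types` are hypothetical.

```python
import json

def json_type(value):
    """Return an Avro-style type name for a decoded JSON value."""
    if value is None:
        return "null"
    # bool must be tested before int: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "record"

def infer_field_types(lines):
    """Map each field name to the set of type names seen across all records."""
    seen = {}
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)  # raises ValueError if the line is not valid JSON
        if not isinstance(record, dict):
            raise ValueError(f"line {number} is not a JSON dictionary")
        for name, value in record.items():
            seen.setdefault(name, set()).add(json_type(value))
    return seen

# Reduced version of the sample records (three fields only).
sample = [
    '{"amount": 8875.50, "credit_age": 4511, "employed": true}',
    '{"amount": 9000, "credit_age": 7524, "employed": false}',
    '{"amount": 5000.50, "credit_age": null, "employed": true}',
]

types = infer_field_types(sample)
for name, kinds in types.items():
    print(name, sorted(kinds))
```

On this reduced sample, `amount` is seen as both `int` and `double` (because `9000` has no decimal point) and `credit_age` as both `int` and `null`. Mixed results like these are worth checking in the generated schema.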
The recommended best practice is to download the generated schema and add it as an asset to the business model being monitored, in the model’s Git repository. Once the schema is versioned along with the source code (e.g., in a GitHub repo), it does not need to be regenerated. Note that MOC will not push the generated schema to the model repo; it is up to the user to do so.
Note: If you are generating the extended schema for monitoring purposes, save it as an input schema; the out-of-the-box (OOTB) monitors look for an extended input schema to set the monitoring parameters.
Using Extended Schemas
For Scoring Jobs
...
In some cases, the generated schema must be edited manually, particularly when certain fields are too complex for the inference tool to interpret as the author intended. The MOC UI allows you to edit the schemas on a storedModel. To do so:
Navigate to the storedModel in the MOC UI (under Models).
Click on Schemas.
Choose the schema you wish to edit from the menu on the left.
You should see two views of the schema: a Table view (most helpful for extended schemas) and a JSON view.
Click on either Edit Table or Edit JSON, and make your changes.
Click on Save Changes. Edits made to one object (JSON/Table) will be reflected in the other once they are saved.
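A manual correction of this kind can also be worked out offline before it is entered in the JSON view. The following is a minimal sketch, assuming the schema is a plain Avro record; the helper `make_nullable` and the example schema are hypothetical, and any MOC-specific extended attributes are not shown.

```python
import json

def make_nullable(schema, field_name):
    """Return a copy of an Avro record schema in which field_name also accepts null."""
    updated = json.loads(json.dumps(schema))  # deep copy via a JSON round trip
    for field in updated["fields"]:
        if field["name"] == field_name:
            current = field["type"]
            if isinstance(current, list):
                # Already a union: add "null" only if it is missing.
                if "null" not in current:
                    field["type"] = ["null"] + current
            else:
                field["type"] = ["null", current]
            return updated
    raise KeyError(field_name)

# Hypothetical schema in which credit_age was generated as a plain int.
schema = {
    "type": "record",
    "name": "LoanInput",
    "fields": [{"name": "credit_age", "type": "int"}],
}

edited = make_nullable(schema, "credit_age")
print(json.dumps(edited, indent=2))
```

The function returns a modified copy and leaves the original dictionary unchanged, so the two versions can be compared before the edit is saved.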