The normal Avro schema checking will then fail the check as the numerical probability has been replaced by a string.

Extended Schema & Monitoring

To enable out-of-the-box (OOTB) monitoring, ModelOp Center V2.4 introduced the concept of an extended schema. The details are covered below; in short, an extended schema is an Avro schema enriched with additional metadata about the data. Traditionally, Avro schemas specify only field names and types. Extended schemas add more key:value pairs to each field, so that OOTB monitors can make reasonable assumptions, such as inferring the role of a field ("predictor", "identifier", etc.).

An Example

Extended schemas are best understood through an example. Let’s consider the same records from the previous example:

Code Block

language	json

{"UUID": "9a5d9f42-3f36-4f38-88dd-22353fdb66a7", "amount": 8875.50, "home_ownership": "MORTGAGE", "age": "Over Forty", "credit_age": 4511, "employed": true, "label": 1, "prediction": 1}
{"UUID": "f8d95245-a186-45a6-b951-376323d06d02", "amount": 9000, "home_ownership": "MORTGAGE", "age": "Under Forty", "credit_age": 7524, "employed": false, "label": 0, "prediction": 1}
{"UUID": "8607e327-4dca-4372-a4b9-df7730f83c8e", "amount": 5000.50, "home_ownership": "RENT", "age": "Under Forty", "credit_age": null, "employed": true, "label": 0, "prediction": 0}

The corresponding extended schema is the following object:

title	input_schema.avsc

Code Block

language	json

{
    "type": "record",
    "name": "inferred_schema",
    "fields": [
        {
            "name": "UUID",
            "type": "string",
            "dataClass": "categorical",
            "role": "identifier",
            "protectedClass": false,
            "driftCandidate": false,
            "specialValues": [],
            "scoringOptional": false
        },
        {
            "name": "amount",
            "type": [
                "int",
                "double"
            ],
            "dataClass": "numerical",
            "role": "predictor",
            "protectedClass": false,
            "driftCandidate": true,
            "specialValues": [],
            "scoringOptional": false
        },
        {
            "name": "home_ownership",
            "type": "string",
            "dataClass": "categorical",
            "role": "predictor",
            "protectedClass": false,
            "driftCandidate": true,
            "specialValues": [],
            "scoringOptional": false
        },
        {
            "name": "age",
            "type": "string",
            "dataClass": "categorical",
            "role": "predictor",
            "protectedClass": true,
            "driftCandidate": true,
            "specialValues": [],
            "scoringOptional": true
        },
        {
            "name": "credit_age",
            "type": [
                "null",
                "int"
            ],
            "dataClass": "numerical",
            "role": "predictor",
            "protectedClass": false,
            "driftCandidate": true,
            "specialValues": [],
            "scoringOptional": false
        },
        {
            "name": "employed",
            "type": "boolean",
            "dataClass": "categorical",
            "role": "predictor",
            "protectedClass": false,
            "driftCandidate": true,
            "specialValues": [],
            "scoringOptional": false
        },
        {
            "name": "label",
            "type": "int",
            "dataClass": "categorical",
            "role": "label",
            "protectedClass": false,
            "driftCandidate": true,
            "specialValues": [],
            "scoringOptional": true
        },
        {
            "name": "prediction",
            "type": "int",
            "dataClass": "categorical",
            "role": "score",
            "protectedClass": false,
            "driftCandidate": true,
            "specialValues": [],
            "scoringOptional": true
        }
    ]
}

...

If no special values are present, the default is an empty array: "specialValues": [].
Otherwise, "specialValues" is an array of JSON objects, with keys "values" and "purpose"; "values" is an array of any type, and "purpose" is a string.

The following are all valid examples:

Code Block
"specialValues": []

Code Block
"specialValues": [ { "values": ["N/A"], "purpose": "Field Not Applicable" } ]

Code Block

language	json

"specialValues": [
    {
        "values": [999, 998], 
        "purpose": "Flagged for review"
    }, 
    {
        "values": [-1000], 
        "purpose": "Invalid input"
    }
]
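
The shape described above can be checked mechanically. The following is a minimal sketch, not part of ModelOp Center: the function name `is_valid_special_values` is hypothetical, and the check assumes that each entry carries exactly the two keys "values" and "purpose".

```python
def is_valid_special_values(special_values):
    """Return True if the value has the shape described for "specialValues".

    Assumption: each entry has exactly the keys "values" and "purpose".
    """
    if not isinstance(special_values, list):
        return False
    for entry in special_values:
        if not isinstance(entry, dict):
            return False
        if set(entry) != {"values", "purpose"}:
            return False
        # "values" is an array of any type; "purpose" is a string.
        if not isinstance(entry["values"], list):
            return False
        if not isinstance(entry["purpose"], str):
            return False
    return True


examples = [
    [],
    [{"values": ["N/A"], "purpose": "Field Not Applicable"}],
    [
        {"values": [999, 998], "purpose": "Flagged for review"},
        {"values": [-1000], "purpose": "Invalid input"},
    ],
]
all_valid = all(is_valid_special_values(e) for e in examples)
```

The three examples listed above all pass this check, while a bare value such as `"values": 999` (not wrapped in an array) would not.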

scoringOptional

"scoringOptional" is a boolean field used to indicate whether or not a field is optional for the scoring function. As a reminder, the presence of a field in the schema makes it required by default, and thus a record missing a required field will be rejected by the schema.

This is limiting, as one might want to specify a "score" or "label" field in the extended input schema, even though these fields are most likely not present in the input records. Thus, one can make these fields optional for the scoring job, which guarantees that a record not containing them will not be rejected by the schema.

Possible values

true or false.

Reserved field roles and protected classes

If a field role is one of "label", "score", or "weight", "scoringOptional" is set to true.
If a field has "protectedClass": true, "scoringOptional" is set to true.

Otherwise, "scoringOptional" is set to false.
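
The default rule above can be written as a small function. This is an illustrative sketch only; the names `RESERVED_ROLES` and `default_scoring_optional` are hypothetical and do not come from ModelOp Center.

```python
# Roles that make a field optional for scoring by default (from the rule above).
RESERVED_ROLES = {"label", "score", "weight"}


def default_scoring_optional(field):
    """Return the default "scoringOptional" value for one extended-schema field."""
    has_reserved_role = field.get("role") in RESERVED_ROLES
    is_protected = field.get("protectedClass") is True
    return has_reserved_role or is_protected


label_field = {"name": "label", "role": "label", "protectedClass": False}
age_field = {"name": "age", "role": "predictor", "protectedClass": True}
amount_field = {"name": "amount", "role": "predictor", "protectedClass": False}

defaults = {
    f["name"]: default_scoring_optional(f)
    for f in (label_field, age_field, amount_field)
}
```

Applied to the example schema, this reproduces the values shown there: "label" and "age" are optional for scoring, "amount" is not.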

Generating Extended Schemas

Using the UI

To generate an extended schema for a business model:

1. Navigate to the corresponding storedModel in the MOC UI (under Models).
2. Click on Schemas.
3. Click on Generate Extended Schema. A pop-up window opens.
4. Enter the data you want to use to infer a schema in the top box. The data must be formatted as one-line dictionaries, as in the sample data above.
5. Click on Generate Schema.
6. Download the schema or save it as Input/Output Schema.
    1. The recommended best practice is to download the generated schema, then add it as an asset to the business model being monitored. Once the schema is versioned along with the source code (e.g. in a GitHub repo), it does not have to be regenerated. Note that MOC will not push the generated schema to the model repo; it is up to the user to do so.
    2. If you are generating the extended schema for monitoring purposes, save it as an input schema; the OOTB monitors look for an extended input schema to set the monitoring parameters.
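
The steps above require input formatted as one-line dictionaries. The following sketch shows one way to produce that format from in-memory records with Python's standard `json` module; the helper name `to_one_line_records` is hypothetical, and the two records are taken from the sample data above.

```python
import json


def to_one_line_records(records):
    """Serialize each record as a JSON object on its own line."""
    return "\n".join(json.dumps(record) for record in records)


records = [
    {"UUID": "9a5d9f42-3f36-4f38-88dd-22353fdb66a7", "amount": 8875.50,
     "home_ownership": "MORTGAGE", "age": "Over Forty", "credit_age": 4511,
     "employed": True, "label": 1, "prediction": 1},
    {"UUID": "8607e327-4dca-4372-a4b9-df7730f83c8e", "amount": 5000.50,
     "home_ownership": "RENT", "age": "Under Forty", "credit_age": None,
     "employed": True, "label": 0, "prediction": 0},
]

# One JSON object per line, ready to paste into the top box of the dialog.
text = to_one_line_records(records)
lines = text.splitlines()
```

Python's `None`, `True`, and `False` are written as JSON `null`, `true`, and `false`, which matches the sample records.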

Using Extended Schemas

For Scoring Jobs

MOC allows for one input schema and one output schema per business model. In order to enable MOC to recognize these files OOTB, follow the naming convention:

input_schema.avsc for the input schema
output_schema.avsc for the output schema

To signal to ModelOp runtimes that schemas are to be used for scoring jobs on a particular model, we add the following smart comments at the top of the primary source code:

Code Block
# modelop.schema.0: input_schema.avsc
# modelop.schema.1: output_schema.avsc

Note that the pound sign # assumes that the model is a Python model. You should use whatever syntax is reserved for one-line comments in the programming language of the model.
The primary source code is the code file where the scoring function is defined.

MOC also lets you enable schema checking on only one of the two slots, input or output. Say, for example, that you want to enable schema checking on input data, but not on outputs. The smart comments, in this case, should be:

Code Block
# modelop.schema.0: input_schema.avsc
# modelop.slot.1: in-use

If neither slot (input/output) is to be schema-checked, one can leave the smart comments off altogether.
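
To make the smart-comment convention concrete, here is a hypothetical parser that reads these comments from a model's primary source code. It is a sketch for illustration, not the logic used by ModelOp runtimes; the regular expression and the function name `parse_schema_comments` are assumptions, and it expects one smart comment per line in Python comment syntax.

```python
import re

# Matches lines such as "# modelop.schema.0: input_schema.avsc"
# or "# modelop.slot.1: in-use" (pattern is an assumption for this sketch).
SMART_COMMENT = re.compile(
    r"^\s*#\s*modelop\.(schema|slot)\.(\d+)\s*:\s*(\S+)\s*$"
)


def parse_schema_comments(source):
    """Return (schemas, slots): slot number -> schema file, slot number -> status."""
    schemas, slots = {}, {}
    for line in source.splitlines():
        match = SMART_COMMENT.match(line)
        if match is None:
            continue
        kind, slot, value = match.group(1), int(match.group(2)), match.group(3)
        if kind == "schema":
            schemas[slot] = value
        else:
            slots[slot] = value
    return schemas, slots


source = """# modelop.schema.0: input_schema.avsc
# modelop.slot.1: in-use

def score(record):
    return record
"""

schemas, slots = parse_schema_comments(source)
```

In this example only slot 0 (the input) has a schema attached, so only input records would be schema-checked.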

Schema Enforcement on ModelOp Runtimes

REST

If a model is deployed to a ModelOp Center runtime as a REST endpoint, the smart comments described above determine which slots are schema-checked. With schema checking enabled, requests made to that runtime that fail either the input or output schema check return a 400 error with a "rejected by schema" message.

Batch Jobs

Batch jobs can be run from the ModelOp Center UI or from the CLI. Batch jobs are more flexible with schema checking, as the smart comments can be overridden under Job Options when creating the job. When schema checking is enabled, records that do not conform to the provided schema are filtered out. If an input record fails the check against the input schema, it is rejected and not scored. If the output fails the output schema check, the record is scored but is not written to the output file. This keeps nonconforming output out of downstream applications, where it could cause errors.
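
The filtering behavior described for batch jobs can be sketched as a simple loop. Everything here is hypothetical: `run_batch`, `input_ok`, `output_ok`, and `score` are stand-ins for illustration, and the two check functions are simplified placeholders for real Avro schema validation.

```python
def run_batch(records, score, input_ok, output_ok):
    """Score records, dropping any that fail the input or output check."""
    written = []
    for record in records:
        if not input_ok(record):
            continue  # rejected by the input schema: never scored
        output = score(record)
        if output_ok(output):
            written.append(output)  # only conforming outputs are written
    return written


def input_ok(record):
    # Placeholder input check: "amount" must be numeric.
    return isinstance(record.get("amount"), (int, float))


def output_ok(output):
    # Placeholder output check: "prediction" must be an int.
    return isinstance(output.get("prediction"), int)


scored_amounts = []


def score(record):
    # Toy scoring function that records which inputs were actually scored.
    scored_amounts.append(record["amount"])
    prediction = 1 if record["amount"] > 0 else "unknown"
    return {"prediction": prediction}


records = [{"amount": 10}, {"amount": "bad"}, {"amount": -5}]
written = run_batch(records, score, input_ok, output_ok)
```

The second record fails the input check and is never scored. The third is scored, but its output fails the output check, so it does not reach the written results.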

For Monitoring

OOTB monitoring requires that the business model has an extended input schema. When a monitoring job is created, the monitor's init function accesses the extended input schema and uses it to determine certain monitoring parameters, such as the names of the fields corresponding to specific roles (score, label, etc.).
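
As an illustration of how field roles can be read from an extended schema, the sketch below groups field names by their "role" value. It is not the monitors' actual init code; the function name `fields_by_role` is hypothetical, and the schema is a trimmed copy of the example above.

```python
def fields_by_role(schema):
    """Map each role found in an extended schema to the list of field names."""
    roles = {}
    for field in schema.get("fields", []):
        roles.setdefault(field.get("role"), []).append(field["name"])
    return roles


# Trimmed version of the example extended schema (only "name" and "role" kept).
extended_schema = {
    "type": "record",
    "name": "inferred_schema",
    "fields": [
        {"name": "UUID", "role": "identifier"},
        {"name": "amount", "role": "predictor"},
        {"name": "home_ownership", "role": "predictor"},
        {"name": "label", "role": "label"},
        {"name": "prediction", "role": "score"},
    ],
}

roles = fields_by_role(extended_schema)
score_fields = roles.get("score", [])
label_fields = roles.get("label", [])
```

A monitor built along these lines would find "prediction" as the score field and "label" as the label field without the user naming them explicitly.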

Versions Compared

Old Version 13

New Version 14
