...

The corresponding Avro schema is the following object:

input_schema.avsc

{
    "type": "record",
    "name": "inferred_schema",
    "fields": [
        {
            "name": "UUID",
            "type": "string"
        },
        {
            "name": "amount",
            "type" : ["int", "double"]
        },
        {
            "name": "home_ownership",
            "type": "string"
        },
        {
            "name": "age",
            "type": "string"
        },
        {
            "name": "credit_age",
            "type": ["null", "int"]
        },
        {
            "name": "employed",
            "type": "boolean"
        },
        {
            "name": "label",
            "type": "int"
        },
        {
            "name": "prediction",
            "type": "int"
        }
    ]
}

The Avro schema declares the field names and their allowable types.
By default, every field listed in the schema is required: it cannot be omitted from the input/output record. In addition, the value of each field must match the declared type or, when the type is a list (a union) such as ["null", "int"], one of the listed types.
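
The two default rules can be illustrated with a minimal Python sketch. It is not an Avro library and is not how any particular platform validates records; the names `AVRO_CHECKS`, `matches`, and `validate_record` are hypothetical, and only the primitive types used on this page are handled.

```python
# Minimal sketch of the two default rules; not a full Avro implementation.
# Assumption: only the primitive types used on this page are supported.
AVRO_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "double": lambda v: isinstance(v, float),
    "string": lambda v: isinstance(v, str),
}


def matches(value, avro_type):
    """Return True if value matches the type, or any member of a union."""
    candidates = avro_type if isinstance(avro_type, list) else [avro_type]
    return any(AVRO_CHECKS[t](value) for t in candidates)


def validate_record(record, schema):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field in schema["fields"]:
        name = field["name"]
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not matches(record[name], field["type"]):
            problems.append(f"wrong type for field: {name}")
    return problems


# A three-field subset of the schema above.
schema = {
    "type": "record",
    "name": "inferred_schema",
    "fields": [
        {"name": "UUID", "type": "string"},
        {"name": "amount", "type": ["int", "double"]},
        {"name": "credit_age", "type": ["null", "int"]},
    ],
}

ok = {"UUID": "9a5d9f42-3f36-4f38-88dd-22353fdb66a7", "amount": 8875.50, "credit_age": None}
bad = {"UUID": "f8d95245-a186-45a6-b951-376323d06d02", "amount": "9000"}

print(validate_record(ok, schema))
print(validate_record(bad, schema))
```

In this sketch the second record fails twice: `amount` is a string rather than an `int` or `double`, and `credit_age` is absent. A nullable field must still be present, with an explicit null value.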

The key:value pair "type": "record" at the top level of the JSON object indicates the overall structure of the input: each input is a single record, i.e., a dictionary of key:value pairs. If, instead of the records above, the inputs are arrays of records such as

[{"UUID": "9a5d9f42-3f36-4f38-88dd-22353fdb66a7", "amount": 8875.50, "home_ownership": "MORTGAGE", "age": "Over Forty"}]
[{"UUID": "f8d95245-a186-45a6-b951-376323d06d02", "amount": 9000, "home_ownership": "MORTGAGE", "age": "Under Forty"}]
[{"UUID": "8607e327-4dca-4372-a4b9-df7730f83c8e", "amount": 5000.50, "home_ownership": "RENT", "age": "Under Forty"}]

then the Avro schema would have to wrap the inner record in an array, as follows:

{
    "type": "array",
    "items": {
        "type": "record",
        "name": "inferred_schema",
        "fields": [
            {
                "name": "UUID",
                "type": "string"
            },
            {
                "name": "amount",
                "type" : ["int", "double"]
            },
            {
                "name": "home_ownership",
                "type": "string"
            },
            {
                "name": "age",
                "type": "string"
            }
        ]
    }
}
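
As a rough illustration of how the top-level "type" determines the expected input shape, the following Python sketch wraps a record schema in an array schema and checks only the outer structure of an input. The helper names `wrap_in_array` and `top_level_shape_ok` are hypothetical and are not part of any Avro library.

```python
def wrap_in_array(record_schema):
    """Wrap a record schema so that it describes arrays of such records."""
    return {"type": "array", "items": record_schema}


def top_level_shape_ok(data, schema):
    """Check only the outer shape of the input, not the individual fields."""
    if schema["type"] == "record":
        return isinstance(data, dict)
    if schema["type"] == "array":
        return isinstance(data, list) and all(
            top_level_shape_ok(item, schema["items"]) for item in data
        )
    raise ValueError(f"unsupported top-level type: {schema['type']}")


record_schema = {
    "type": "record",
    "name": "inferred_schema",
    "fields": [
        {"name": "UUID", "type": "string"},
        {"name": "amount", "type": ["int", "double"]},
        {"name": "home_ownership", "type": "string"},
        {"name": "age", "type": "string"},
    ],
}
array_schema = wrap_in_array(record_schema)

record = {
    "UUID": "9a5d9f42-3f36-4f38-88dd-22353fdb66a7",
    "amount": 8875.50,
    "home_ownership": "MORTGAGE",
    "age": "Over Forty",
}

print(top_level_shape_ok(record, record_schema))    # a bare record fits the record schema
print(top_level_shape_ok([record], record_schema))  # an array does not
print(top_level_shape_ok([record], array_schema))   # an array fits the wrapped schema
```

The inner record definition is unchanged by the wrapping; only the outer "type" and the "items" key are added.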

...

The corresponding extended schema is the following object:

input_schema.avsc

{
    "type": "record",
    "name": "inferred_schema",
    "fields": [
        {
            "name": "UUID",
            "type": "string",
            "dataClass": "categorical",
            "role": "identifier",
            "protectedClass": false,
            "driftCandidate": false,
            "specialValues": [],
            "scoringOptional": false
        },
        {
            "name": "amount",
            "type": [
                "int",
                "double"
            ],
            "dataClass": "numerical",
            "role": "predictor",
            "protectedClass": false,
            "driftCandidate": true,
            "specialValues": [],
            "scoringOptional": false
        },
        {
            "name": "home_ownership",
            "type": "string",
            "dataClass": "categorical",
            "role": "predictor",
            "protectedClass": false,
            "driftCandidate": true,
            "specialValues": [],
            "scoringOptional": false
        },
        {
            "name": "age",
            "type": "string",
            "dataClass": "categorical",
            "role": "predictor",
            "protectedClass": true,
            "driftCandidate": true,
            "specialValues": [],
            "scoringOptional": true
        },
        {
            "name": "credit_age",
            "type": [
                "null",
                "int"
            ],
            "dataClass": "numerical",
            "role": "predictor",
            "protectedClass": false,
            "driftCandidate": true,
            "specialValues": [],
            "scoringOptional": false
        },
        {
            "name": "employed",
            "type": "boolean",
            "dataClass": "categorical",
            "role": "predictor",
            "protectedClass": false,
            "driftCandidate": true,
            "specialValues": [],
            "scoringOptional": false
        },
        {
            "name": "label",
            "type": "int",
            "dataClass": "categorical",
            "role": "label",
            "protectedClass": false,
            "driftCandidate": true,
            "specialValues": [],
            "scoringOptional": true
        },
        {
            "name": "prediction",
            "type": "int",
            "dataClass": "categorical",
            "role": "score",
            "protectedClass": false,
            "driftCandidate": true,
            "specialValues": [],
            "scoringOptional": true
        }
    ]
}
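
Because the extended attributes are ordinary JSON keys on each field, they can be read with any JSON parser. The Python sketch below selects fields by attribute value and strips the extended keys to recover a plain name/type schema. The helpers `fields_where` and `to_plain_avro` are hypothetical, and the sketch makes no assumption about what each attribute means beyond its name and value as listed above.

```python
import json

# Three fields copied from the extended schema above.
extended_schema = json.loads("""
{
    "type": "record",
    "name": "inferred_schema",
    "fields": [
        {"name": "UUID", "type": "string", "dataClass": "categorical",
         "role": "identifier", "protectedClass": false, "driftCandidate": false,
         "specialValues": [], "scoringOptional": false},
        {"name": "age", "type": "string", "dataClass": "categorical",
         "role": "predictor", "protectedClass": true, "driftCandidate": true,
         "specialValues": [], "scoringOptional": true},
        {"name": "label", "type": "int", "dataClass": "categorical",
         "role": "label", "protectedClass": false, "driftCandidate": true,
         "specialValues": [], "scoringOptional": true}
    ]
}
""")


def fields_where(schema, attribute, value):
    """Names of the fields whose extended attribute equals the given value."""
    return [f["name"] for f in schema["fields"] if f.get(attribute) == value]


def to_plain_avro(schema):
    """Copy of the schema that keeps only each field's name and type."""
    plain = {k: v for k, v in schema.items() if k != "fields"}
    plain["fields"] = [
        {"name": f["name"], "type": f["type"]} for f in schema["fields"]
    ]
    return plain


print(fields_where(extended_schema, "protectedClass", True))
print(fields_where(extended_schema, "role", "identifier"))
print(json.dumps(to_plain_avro(extended_schema), indent=4))
```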

...
