Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The corresponding Avro schema is the following object:

Expand
titleinput_schema.avsc
Code Block
languagejson
{
    "type": "record",
    "name": "inferred_schema",
    "fields": [
        {
            "name": "UUID",
            "type": "string"
        },
        {
            "name": "amount",
            "type" : ["int", "double"]
        },
        {
            "name": "home_ownership",
            "type": "string"
        },
        {
            "name": "age",
            "type": "string"
        },
        {
            "name": "credit_age",
            "type": ["null", "int"]
        },
        {
            "name": "employed",
            "type": "boolean"
        },
        {
            "name": "label",
            "type": "int"
        },
        {
            "name": "prediction",
            "type": "int"
        }
    ]
} 
  • The Avro schema declares the field names and their allowable types.

  • By default, if a field is listed in the schema, it cannot be omitted from the input/output record. In addition, the value of a field must match (one of) the allowable types as declared in the schema.

  • The key:value pair "type": "record" at the top-level of the JSON object indicates the overall structure of the input, i.e., a dictionary of key:value pairs. If instead of the records above, we have arrays of records such as

    Code Block
    [{"UUID": "9a5d9f42-3f36-4f38-88dd-22353fdb66a7", "amount": 8875.50, "home_ownership": "MORTGAGE", "age": "Over Forty"}]
    [{"UUID": "f8d95245-a186-45a6-b951-376323d06d02", "amount": 9000, "home_ownership": "MORTGAGE", "age": "Under Forty"}]
    [{"UUID": "8607e327-4dca-4372-a4b9-df7730f83c8e", "amount": 5000.50, "home_ownership": "RENT", "age": "Under Forty"}]

    then the Avro schema would have to wrap the inner record in an array, as follows:

    Code Block
    languagejson
    {
        "type": "array",
        "items": {
            "type": "record",
            "name": "inferred_schema",
            "fields": [
                {
                    "name": "UUID",
                    "type": "string"
                },
                {
                    "name": "amount",
                    "type" : ["int", "double"]
                },
                {
                    "name": "home_ownership",
                    "type": "string"
                },
                {
                    "name": "age",
                    "type": "string"
                }
            ]
        }
    }

...

The corresponding extended schema is the following object:

Expand
titleThe corresponding extended schema is the following object:input_schema.avsc
Code Block
languagejson
{
    "type": "record",
    "name": "inferred_schema",
    "fields": [
        {
            "name": "UUID",
            "type": "string",
            "dataClass": "categorical",
            "role": "identifier",
            "protectedClass": false,
            "driftCandidate": false,
            "specialValues": [],
            "scoringOptional": false
        },
        {
            "name": "amount",
            "type": [
                "int",
                "double"
            ],
            "dataClass": "numerical",
            "role": "predictor",
            "protectedClass": false,
            "driftCandidate": true,
            "specialValues": [],
            "scoringOptional": false
        },
        {
            "name": "home_ownership",
            "type": "string",
            "dataClass": "categorical",
            "role": "predictor",
            "protectedClass": false,
            "driftCandidate": true,
            "specialValues": [],
            "scoringOptional": false
        },
        {
            "name": "age",
            "type": "string",
            "dataClass": "categorical",
            "role": "predictor",
            "protectedClass": true,
            "driftCandidate": true,
            "specialValues": [],
            "scoringOptional": true
        },
        {
            "name": "credit_age",
            "type": [
                "null",
                "int"
            ],
            "dataClass": "numerical",
            "role": "predictor",
            "protectedClass": false,
            "driftCandidate": true,
            "specialValues": [],
            "scoringOptional": false
        },
        {
            "name": "employed",
            "type": "boolean",
            "dataClass": "categorical",
            "role": "predictor",
            "protectedClass": false,
            "driftCandidate": true,
            "specialValues": [],
            "scoringOptional": false
        },
        {
            "name": "label",
            "type": "int",
            "dataClass": "categorical",
            "role": "label",
            "protectedClass": false,
            "driftCandidate": true,
            "specialValues": [],
            "scoringOptional": true
        },
        {
            "name": "prediction",
            "type": "int",
            "dataClass": "categorical",
            "role": "score",
            "protectedClass": false,
            "driftCandidate": true,
            "specialValues": [],
            "scoringOptional": true
        }
    ]
}

...