Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • The Avro schema declares the field names and their allowable types.

  • By default, if a field is listed in the schema, it cannot be omitted from the input/output record. In addition, the value of a field must match (one of) the allowable types as declared in the schema.

  • The key:value pair "type": "record" at the top-level of the JSON object indicates the overall structure of the input, i.e., a dictionary of key:value pairs. If instead of the records above, we have arrays of records such as

    Code Block
    [{"UUID": "9a5d9f42-3f36-4f38-88dd-22353fdb66a7", "amount": 8875.50, "home_ownership": "MORTGAGE", "age": "Over Forty", "credit_age": 4511, "employed": true, "label": 1, "prediction": 1}]
    [{"UUID": "f8d95245-a186-45a6-b951-376323d06d02", "amount": 9000, "home_ownership": "MORTGAGE", "age": "Under Forty", "credit_age": 7524, "employed": false, "label": 0, "prediction": 1}]
    [{"UUID": "8607e327-4dca-4372-a4b9-df7730f83c8e", "amount": 5000.50, "home_ownership": "RENT", "age": "Under Forty", "credit_age": null, "employed": true, "label": 0, "prediction": 0}]

    then the Avro schema would have to wrap the inner record in an array, as follows:

Expand
titleinput_schema.avsc
Code Block
languagejson
{
    "type": "array",
    "items": {
        "type": "record",
        "name": "inferred_schema",
        "fields": [
            {
                "name": "UUID",
                "type": "string"
            },
            {
                "name": "amount",
                "type" : ["int", "double"]
            },
            {
                "name": "home_ownership",
                "type": "string"
            },
            {
                "name": "age",
                "type": "string"
            }
        ]
    }
}

Type Unions & Missing Values

...