Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

ModelOp fModelOp Center seamlessly integrates with existing SQL-based databases, such as Snowflake, DB2, SQL Server and others, to allow enterprises to leverage existing IT investments in their data platforms.

Table of Contents

Table of Contents

Introduction

ModelOp Center provides the ability to integrate with SQL-based databases. Batch jobs can currently use SQL assets as inputs or outputs (or both). When used as an input, the provided SQL statement is run once, and the rows returned are fed into the underlying model, one at a time, as individual records. When used an output, it is expected that the model emit a flat dictionary of key/value pairs, and the output SQL statement is executed once for each emitted record, with the values from the dictionary mapped into the SQL statement via the keys and the parameters listed in the SQL asset.

SQL Asset Overview

SQL assets have 3 main components: the connection string, the SQL statement itself, and the parameters.

The connection string

Connection strings all follow the same basic pattern:

scheme://username:password@host:port/database

The scheme indicates which underlying odbc driver should be used to connect to the DB in question. Currently, the Centos and UBI 8 images support mysql, sqlserver, snowflakedsiidriver, db2, sqlserver, databricks, and db2 postgres. Additional drivers can be installed, so long as the /etc/odbcinst.ini file is properly updated.

The username and password fields are optional. Credentials can instead be configured in the runtime’s application.json file.

Code Block
languagejson
{
  "sql-credentials": [
    {
      "host": "host1.com",
      "username": "my_username",
      "password": "my_password"
    },
    {
      "host": "host2.com",
      "username": {"from_env": "MY_ENV_VARIABLE"},
      "password": {"from_file": "/config/the-password"}
    }
  ]
}

A connection string of scheme://host1.com:port/database would use the host1.com as the key to determining the username and password. Note that both the username and password can be specified directly in the config file, or looked for in an environment variable, or in a specified file. There is an additional from_file_from_env option that indicates the value in question should be found in a file, the name of which is found in the specified environment variable.

The host is a required field that indicates the hostname where the database lives.

The port field is required. The only exception is when the driver is mysql and the database is running on the default port of 3306.

The database field indicates… the database. And it is required.

The SQL statement

When used as an input asset, the specified SQL statement will be executed once, and the runtime will iterate over returned rows, feeding them one at a time into the model. And example SQL statement:

select * from table2

Alternatively, ? can be included in the statement to allow parameters specified in the SQL asset to be used:

select * from table2 where columnABC = ?

When used as an output asset, the specified SQL statement will be executed for each output record:

insert into table2 (str, i, bl) values (?, ?, ?)

The way values from emitted records are mapped into the output SQL statement is described in the next section.

Bulk inserts

When a SQL asset is being used as an output for a job, the runtime checks to see if the provided statement looks like a simple insert statement:

insert into tablename (columnA, columnB) values (?, ?);

If it matches the above pattern (with the column names being optional), then rather than execute the SQL statement for each output record, the runtime instead dynamically builds up a single large SQL statement that gets executed when the batch job is complete. In the above example, suppose you emitted 3 records with values of {a, b}, {c, d}, and {e, f}. The runtime will (re)generate the SQL statement to look like this:

insert into tablename (columnA, columnB) values (a, b), (c, d), (e, f);

The parameters

When used as an input asset, specified parameters are substituted into the specified SQL statement in the order they are presented. The number of ? in the SQL statement must match the number of specified parameters. Additionally, what is specified is the value that should be fed into the SQL statement. So if the statement looks like this:

select * from table2 where columnABC = ?

… and the only specified parameter is my_value, the SQL statement that gets executed will look like this:

select * from table2 where columnABC = 'my_value'

On the output side of things, the parameters specify the key to use to find the appropriate value in the output record. Consider the following SQL statement:

insert into table2 (alpha, beta) values (?, ?)

Consider parameters that specify aaa and xyz.

Consider an output record that looks like this:

Code Block
languagejson
{
  "aaa": "value1",
  "xyz": 123
}

The SQL statement executed will have the values (value1 and 123) inserted into the SQL statement in the order the parameters are specified:

insert into table2 (alpha, beta) values ('value1', 123)

SQL Setup - ModelOp Runtime

Snowflake

Prerequisites

The Centos or UBI/RHEL 8 image.

Runtime Configuration

No special runtime configuration is required.

Credentials Configuration

Credentials are mapped based on host name, not driver type, so credentials configuration is identical across all drivers.

SQL Server

Prerequisites

The Centos or the UBI 8 image.

Runtime Configuration

No special runtime configuration is required.

Credentials Configuration

Credentials are mapped based on host name, not driver type, so credentials configuration is identical across all drivers.

IBM DB2

Prerequisites

The Centos or UBI 8 image.

Runtime Configuration

No special runtime configuration is required.

Credentials Configuration

Credentials are mapped based on host name, not driver type, so credentials configuration is identical across all drivers..

It’s worth noting that databricks typically requires a personal access token, and so the username credential must be set to token and the password credential should be set to the personal access token.

Redshift Support

  • Connections String:

    • Note, the Redshift connector returns data as Unicode character type.

    • To work around this, add UseUnicode=0 to the connection string so the connector returns data as regular SQL types
      Connection String: redshift://redshift-cluster-1.ct8ouww8lud6.us-east-2.redshift.amazonaws.com:5439/dev;UseUnicode=0

  • Configuration Timeout: Redshift operates distinctly from traditional databases/data warehouses, and depending on the table structure, may require additional time to return results

    • By default, the ModelOp Runtime has an ODBC timeout set to 5 seconds. This value may need to be changed for Redshift.

    • To do so, the following timeout value should be configured in SCCS: modelop.runtime.odbc-timeout.

  • Data Types:

Adding SQL Assets to a Model

Via the ModelOp UI

1. On the Business Model details page, click on the Assets tab.

2. Click “Add Asset” in the top right corner. The user is presented with multiple options:

Image RemovedImage Added

3. Click “Add SQL Asset” button in the upper right hand corner.

4. Fill out the pop up form with the SQL asset details:

Running a Job using SQL Assets`

Via the ModelOp UI

1. In the “Create Job” wizard, when choosing the input and/or output assets, select SQL from the dropdown box:

Image RemovedImage Added

2. Next, fill out the form as follows, choosing the actual values you want to inject into the SQL String as the SQL Parameters:

Image RemovedImage Added

When filling out the form for an output asset, the parameters chosen need to match the names of the keys in the emitted dictionary:

Image RemovedImage Added