Introduction
ModelOp Center provides the ability to integrate with SQL-based databases. Batch jobs can currently use SQL assets as inputs or outputs (or both). When used as an input, the provided SQL statement is run once, and the rows returned are fed into the underlying model, one at a time, as individual records. When used an output, it is expected that the model emit a flat dictionary of key/value pairs, and the output SQL statement is executed once for each emitted record, with the values from the dictionary mapped into the SQL statement via the keys and the parameters listed in the SQL asset.
SQL Asset Overview
SQL assets have 3 main components: the connection string, the SQL statement itself, and the parameters.
The connection string
Connection strings all follow the same basic pattern:
scheme://username:password@host:port/database
The scheme
indicates which underlying odbc driver should be used to connect to the DB in question. Currently, the Centos and UBI 8 images support mysql
, sqlserver
, snowflakedsiidriver
, and db2
. Additional drivers can be installed, so long as the /etc/odbcinst.ini
file is properly updated.
The username
and password
fields are optional. Credentials can instead be configured in the runtime’s application.json
file.
{ "sql-credentials": [ { "host": "host1.com", "username": "my_username", "password": "my_password" }, { "host": "host2.com", "username": {"from_env": "MY_ENV_VARIABLE"}, "password": {"from_file": "/config/the-password"} } ] }
A connection string of scheme://host1.com:port/database
would use the host1.com
as the key to determining the username and password. Note that both the username and password can be specified directly in the config file, or looked for in an environment variable, or in a specified file. There is an additional from_file_from_env
option that indicates the value in question should be found in a file, the name of which is found in the specified environment variable.
The host
is a required field that indicates the hostname where the database lives.
The port
field is required. The only exception is when the driver is mysql
and the database is running on the default port of 3306.
The database
field indicates… the database. And it is required.
The SQL statement
When used as an input asset, the specified SQL statement will be executed once, and the runtime will iterate over returned rows, feeding them one at a time into the model. And example SQL statement:
select * from table2
Alternatively, ?
can be included in the statement to allow parameters specified in the SQL asset to be used:
select * from table2 where columnABC = ?
When used as an output asset, the specified SQL statement will be executed for each output record:
insert into table2 (str, i, bl) values (?, ?, ?)
The way values from emitted records are mapped into the output SQL statement is described in the next section.
Bulk inserts
When a SQL asset is being used as an output for a job, the runtime checks to see if the provided statement looks like a simple insert statement:
insert into tablename (columnA, columnB) values (?, ?);
If it matches the above pattern (with the column names being optional), then rather than execute the SQL statement for each output record, the runtime instead dynamically builds up a single large SQL statement that gets executed when the batch job is complete. In the above example, suppose you emitted 3 records with values of {a, b}, {c, d}, and {e, f}. The runtime will (re)generate the SQL statement to look like this:
insert into tablename (columnA, columnB) values (a, b), (c, d), (e, f);
The parameters
When used as an input asset, specified parameters are substituted into the specified SQL statement in the order they are presented. The number of ?
in the SQL statement must match the number of specified parameters. Additionally, what is specified is the value that should be fed into the SQL statement. So if the statement looks like this:
select * from table2 where columnABC = ?
… and the only specified parameter is my_value
, the SQL statement that gets executed will look like this:
select * from table2 where columnABC = 'my_value'
On the output side of things, the parameters specify the key to use to find the appropriate value in the output record. Consider the following SQL statement:
insert into table2 (alpha, beta) values (?, ?)
Consider parameters that specify aaa
and xyz
.
Consider an output record that looks like this:
{ "aaa": "value1", "xyz": 123 }
The SQL statement executed will have the values (value1
and 123
) inserted into the SQL statement in the order the parameters are specified:
insert into table2 (alpha, beta) values ('value1', 123)
SQL Setup - ModelOp Runtime
Snowflake
Prerequisites
The Centos or UBI 8 image.
Runtime Configuration
No special runtime configuration is required.
Credentials Configuration
Credentials are mapped based on host name, not driver type, so credentials configuration is identical across all drivers.
SQL Server
Prerequisites
The Centos or the UBI 8 image.
Runtime Configuration
No special runtime configuration is required.
Credentials Configuration
Credentials are mapped based on host name, not driver type, so credentials configuration is identical across all drivers.
IBM DB2
Prerequisites
The Centos or UBI 8 image.
Runtime Configuration
No special runtime configuration is required.
Credentials Configuration
Credentials are mapped based on host name, not driver type, so credentials configuration is identical across all drivers.
Adding SQL Assets to a Model
Via the ModelOp UI
On the Model Attachments page:
… with ‘Attachments’ selected, click the ‘Add SQL Asset’ button in the upper right hand corner.
Fill out the pop up form with the sql asset details:
Running a Job using SQL Assets`
Via the ModelOp UI
When choosing assets, select SQL
from the dropdown box:
Next, fill out the form as follows, choosing the actual values you want to inject into the SQL String as the SQL Parameters:
When filling out the form for an output asset, the parameters chosen need to match the names of the keys in the emitted dictionary: