Integrate with Git

ModelOp Center seamlessly integrates with existing enterprise source code management (SCM) systems, such as Bitbucket, GitHub, and other Git-based platforms, so that enterprises can leverage their existing IT investments.


Introduction

ModelOp Center integrates with Git platforms to externalize the management of model assets and to support distributed development of those assets. Data Scientists can develop assets in the platform (IDE) of their choice, gaining flexibility through a widely adopted technology such as Git.

ModelOp Center supports three configurable levels of interaction with Git:

  1. No Git interaction: Model assets are not stored in a Git repository; they are stored in a local database.

  2. Local Git repository: ModelOp Center can create and work in a local Git repository, committing each change made to an asset. A remote Git repository can be added to the asset at any point.

  3. Remote Git repository: A remote Git repository can be configured so that external changes are synced back into the ModelOp Center working directory. A configuration option controls whether commits in the local repository are pushed to the remote.

Remote import

Given a remote repository URL, ModelOp Center clones the repository and creates a “Model” in which every file is represented as an asset. All non-source files are imported as external assets and loaded into the configured external file asset repository instance. ModelOp Center works with multiple external file asset repository instances, including MinIO, S3, HDFS, and Azure Blob Storage. The user can select the desired repository instance during import. Otherwise, if configured, the instance is determined automatically from the selected OAuth2/LDAP group. All source assets are loaded into the “Model” instance together with their repository information.

  • Source files are identified automatically by file extension:

    • Model source: "py", "py3", "c", "cc", "cpp", "r", "pfa", "ipynb", "ppfa", "java", "m", "h", "hh", "hpp", "sql", "sas", "jexp"

    • Test result comparator: "dmn", "cmmn"

  • All files with any other extension are treated as non-source files.
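
The extension rules above amount to a small lookup. The following Python sketch is illustrative only and is not ModelOp Center's implementation. The function names classify and partition are hypothetical. Treating extensions case-insensitively (so score.R counts as "r") is also an assumption.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

# Extensions as listed above, without the leading dot.
MODEL_SOURCE_EXTENSIONS = {
    "py", "py3", "c", "cc", "cpp", "r", "pfa", "ipynb", "ppfa", "java",
    "m", "h", "hh", "hpp", "sql", "sas", "jexp",
}
TEST_RESULT_COMPARATOR_EXTENSIONS = {"dmn", "cmmn"}


def classify(path: str) -> str:
    """Return 'model_source', 'test_result_comparator', or 'non_source'."""
    # Assumption: matching is case-insensitive.
    ext = PurePosixPath(path).suffix.lstrip(".").lower()
    if ext in MODEL_SOURCE_EXTENSIONS:
        return "model_source"
    if ext in TEST_RESULT_COMPARATOR_EXTENSIONS:
        return "test_result_comparator"
    # Everything else (including files without an extension) is non-source.
    return "non_source"


def partition(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group repository paths by category, preserving input order."""
    groups: Dict[str, List[str]] = {
        "model_source": [], "test_result_comparator": [], "non_source": [],
    }
    for p in paths:
        groups[classify(p)].append(p)
    return groups
```

Under these assumptions, a file such as model/weights.pkl would be imported as an external asset, and src/score.py would become a source asset of the “Model”.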

To import a Git repository through the ModelOp Center Dashboard:

  1. Click on the “Models” menu item.

  2. In the top right corner, click “Import”.

  3. Click the “Git” option.

  4. Provide the “Remote Repository Url”, “Branch”, “Model Name”, and “Description”.
    Where to find the repository URL depends on your Git platform. Generally, it is available under the “Clone” button, by selecting the “HTTPS clone” option.

  5. Click “Import model”.
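
Before starting the import, you can confirm that the clone URL is reachable without credentials and that the branch exists. The sketch below is a local pre-check and is not part of ModelOp Center. It assumes the git command-line client is installed and uses the standard git ls-remote command. The helper names ls_remote_command, branch_exists, and check_remote are hypothetical.

```python
import os
import subprocess
from typing import List


def ls_remote_command(url: str, branch: str) -> List[str]:
    # Ask the remote only for the one branch head of interest; nothing is cloned.
    return ["git", "ls-remote", "--heads", url, f"refs/heads/{branch}"]


def branch_exists(ls_remote_output: str, branch: str) -> bool:
    # Each output line has the form "<object id>\t<ref name>".
    target = f"refs/heads/{branch}"
    for line in ls_remote_output.splitlines():
        _, _, ref = line.partition("\t")
        if ref == target:
            return True
    return False


def check_remote(url: str, branch: str, timeout: float = 30.0) -> bool:
    # GIT_TERMINAL_PROMPT=0 makes git fail instead of prompting for credentials,
    # which surfaces repositories that are not publicly readable.
    env = {**os.environ, "GIT_TERMINAL_PROMPT": "0"}
    result = subprocess.run(
        ls_remote_command(url, branch),
        capture_output=True, text=True, timeout=timeout, env=env,
    )
    if result.returncode != 0:
        raise RuntimeError(f"git ls-remote failed: {result.stderr.strip()}")
    return branch_exists(result.stdout, branch)
```

For example, check_remote("https://github.com/<org>/<repo>.git", "main") returns True when the branch exists. It raises RuntimeError when git exits with an error, such as when the repository requires authentication.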

Note: these steps assume the repository is public and requires no authentication. The following sections describe how to configure the integration when authentication is required.

Git Integration

To integrate an authenticated remote repository, ModelOp Center must be configured to use a Service Account to pull from that repository. This Service Account needs read access (and write access, where changes are pushed back) to the repositories that will be imported and integrated with ModelOp Center.

Git Credentials

The username and password used to integrate with the Git repository are set as properties in the container definition for model-manager. These values can be protected using Kubernetes Secrets or other existing credential management systems.

    model-manage.git.username=<The git username>
    model-manage.git.password=<The username's passphrase>

The above configuration is used to access all remote repositories from ModelOp Center, so the Service Account must be added to every integrated repository.

Alternatively, credentials can be configured with more granularity, per remote URL:

    model-manage:
      git:
        username: <Overall git username>  # same as specified above
        password: <Overall git passphrase>  # same as specified above
        storedCredentials:
          - context: https://github.com/
            username: <first context user>
            password: <first context passphrase>
          - context: https://gitlab.com/
            username: <second context user>
            password: <second context passphrase>

With this configuration, ModelOp Center uses different Git credentials depending on the URL it is connecting to.

Note: Credentials are selected using the same criteria Git uses. You can therefore use Git Config (refer to the subsection below) to enable additional features such as useHttpPath (refer to the Git docs: https://git-scm.com/docs/gitcredentials#Documentation/gitcredentials.txt-useHttpPath). If truly needed, this also lets you configure one set of credentials per repository.
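
The context-based selection can be approximated as follows. This Python sketch simplifies Git's credential-context rules and is not ModelOp Center's code. Under this sketch:

- Protocol and host must match exactly.
- A context with a path component is honored only when use_http_path is true and the path matches exactly.
- The overall credentials are used when nothing matches.

The fallback behavior, the names Credential and select_credential, and the lack of wildcard support are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlsplit


@dataclass
class Credential:
    context: str
    username: str
    password: str


def select_credential(
    target_url: str,
    stored: List[Credential],
    default: Credential,
    use_http_path: bool = False,
) -> Credential:
    target = urlsplit(target_url)
    host_match: Optional[Credential] = None
    for cred in stored:
        ctx = urlsplit(cred.context)
        # Protocol, host, and port must match exactly (urlsplit lower-cases hostname).
        if ctx.scheme != target.scheme or (ctx.hostname, ctx.port) != (target.hostname, target.port):
            continue
        ctx_path = ctx.path.strip("/")
        if ctx_path:
            # Path-qualified contexts count only when paths are considered,
            # and then the path must match exactly.
            if use_http_path and ctx_path == target.path.strip("/"):
                return cred  # a repository-level match is the most specific
            continue
        if host_match is None:
            host_match = cred  # keep the first host-level context as a fallback
    # Assumption: the overall username/password apply when no context matches.
    return host_match if host_match is not None else default
```

With the two contexts from the YAML example, a request to https://gitlab.com/team/model.git would use the second context's credentials in this sketch.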

Local Repository Settings

ModelOp Center also provides additional parameters to configure the behavior of the local repository.

Container: "model-manager":

    # Enable/disable pushing back to the remote repository after an update has been applied to an asset in ModelOp Center
    model-manage.repo.push=false
    # Time interval, in milliseconds, between remote repository polls for updates
    model-manage.repo.change-poll-rate=120000
    # Default base directory for Git repositories in the container
    model-manage.repo.base-directory=/tmp/model-manage-repos
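
The following sketch reads these three settings from a properties file and applies the values shown above when a key is missing. It is illustrative only. Treating those values as defaults is an assumption, as are the names RepoSettings, parse_properties, and repo_settings. The parser also skips .properties features such as line continuations and escapes.

```python
from dataclasses import dataclass


@dataclass
class RepoSettings:
    push: bool
    change_poll_rate_ms: int
    base_directory: str


def parse_properties(text: str) -> dict:
    """Minimal .properties reader: 'key=value' lines; '#' or '!' starts a full-line comment.

    Line continuations, escape sequences, and ':' separators are not handled.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def repo_settings(props: dict) -> RepoSettings:
    # Fallbacks are the example values shown above (assumed, not confirmed, to be defaults).
    return RepoSettings(
        push=props.get("model-manage.repo.push", "false").lower() == "true",
        change_poll_rate_ms=int(props.get("model-manage.repo.change-poll-rate", "120000")),
        base_directory=props.get("model-manage.repo.base-directory", "/tmp/model-manage-repos"),
    )
```

In the .properties format, # and ! start a comment only at the beginning of a line. A trailing comment after a value therefore becomes part of that value, so keep comments on their own lines.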

Git Config

ModelOp Center accepts Git-specific configuration that controls how its Git repository behaves. Refer to the git-config documentation on git-scm.com for details on Git configuration and how it can be used to achieve a particular behavior.

Given a property key of the form 'section.subsection.variable=value', the key is interpreted as follows:

  • section is considered to be everything up to the first dot of the param key.

  • subsection is everything between the first and the last dots of the key. It is optional and is null if the key contains only one dot (i.e. 'section.variable' as opposed to 'section.subsection.variable').

  • variable name is everything after the last dot.

  • variable value is the property value itself. Repeated values for the same key are allowed and are treated as multivars in gitconfig. Provide them in YAML as a list, or as the equivalent indexed properties (key.1, key.2).
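
The splitting rules above can be sketched in Python. This illustrates the mapping and is not ModelOp Center's implementation. The function names split_key and to_gitconfig are hypothetical. The renderer does not escape quotes, backslashes, or comment characters in subsections or values.

```python
from typing import Dict, List, Optional, Tuple, Union


def split_key(key: str) -> Tuple[str, Optional[str], str]:
    """Split 'section[.subsection].variable' on the first and last dots."""
    first, last = key.find("."), key.rfind(".")
    if first == -1:
        raise ValueError(f"{key!r} needs at least one dot")
    section = key[:first]
    variable = key[last + 1:]
    # With a single dot there is no subsection.
    subsection = key[first + 1:last] if first != last else None
    return section, subsection, variable


def to_gitconfig(props: Dict[str, Union[str, List[str]]]) -> str:
    """Render flat properties as gitconfig text, grouping by (section, subsection)."""
    sections: Dict[Tuple[str, Optional[str]], List[Tuple[str, str]]] = {}
    for key, value in props.items():
        section, subsection, variable = split_key(key)
        values = value if isinstance(value, list) else [value]
        # A list becomes a multivar: the variable is repeated once per value.
        sections.setdefault((section, subsection), []).extend((variable, v) for v in values)
    lines: List[str] = []
    for (section, subsection), entries in sections.items():
        lines.append(f'[{section} "{subsection}"]' if subsection is not None else f"[{section}]")
        lines.extend(f"\t{name} = {v}" for name, v in entries)
    return "\n".join(lines) + "\n"
```

Because the subsection spans everything between the first and last dots, a URL-style key still splits cleanly. For example, credential.https://github.com.useHttpPath yields section credential, subsection https://github.com, and variable useHttpPath. This matches Git's [credential "https://github.com"] section syntax.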

See the git-config documentation on git-scm.com for more about the gitconfig syntax.

The idea behind ModelOp Center’s git config customization is that any git config variable can be mapped to a ModelOp Center property using the rules above.

For example, consider the following desired git config file:

The corresponding ModelOp Center configuration would be the following:

Please refer to git-scm for a more comprehensive list of variables that can be set through git config.

Load on startup

Model Manage can be configured to automatically import repositories upon startup.

Note that this import is performed only once per model in a given environment. If the model has already been imported, Model Manage will not import it again or perform any of the configured post-import operations.

Valid options

  • repositoryBranch - Indicates the remote git repository branch to import from.

  • repositoryRemote - Indicates the remote git repository clone URL to import from.

  • createBaseSnapshot (optional) - Whether to create an initial snapshot right after import. Defaults to “false”.

  • group (optional) - The group under which the model is imported. If omitted, it defaults to ModelOp’s default group, set by the property oauth2.group-base-access.default-access-group (‘modelop’ by default).

  • deployedModel.runtimeName (optional) - The name of the target runtime on which to deploy the model as a batch job right after import.

  • deployedModel.schedule.quartzSchedule (optional) - A valid Quartz cron expression that schedules when the signal named in deployedModel.schedule.signalActionName is triggered.

  • deployedModel.schedule.signalActionName (optional) - A valid signal name to trigger on the given schedule.

  • runtimeWaitTimeout (optional) - How long, in milliseconds, to wait in the background for the runtime to become available before deploying the model after import (applies only if the deployedModel section is provided). Defaults to 10 minutes (600000 ms).
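
The following is a minimal sketch of validating one startup-import entry and applying the documented defaults. It makes these assumptions:

- The options arrive as a nested dictionary mirroring the dotted names.
- repositoryRemote and repositoryBranch are required, because they are not marked optional.
- The names StartupImport and from_options are hypothetical.
- The group fallback hard-codes ‘modelop’, which only holds while oauth2.group-base-access.default-access-group keeps its default.

```python
from dataclasses import dataclass
from typing import Any, Optional

DEFAULT_GROUP = "modelop"          # default of oauth2.group-base-access.default-access-group
DEFAULT_RUNTIME_WAIT_MS = 600_000  # 10 minutes


@dataclass
class StartupImport:
    repository_remote: str
    repository_branch: str
    create_base_snapshot: bool = False
    group: str = DEFAULT_GROUP
    runtime_name: Optional[str] = None
    quartz_schedule: Optional[str] = None
    signal_action_name: Optional[str] = None
    runtime_wait_timeout_ms: int = DEFAULT_RUNTIME_WAIT_MS

    @property
    def deploys(self) -> bool:
        # A deployment is only attempted when deployedModel.runtimeName is non-empty.
        return bool(self.runtime_name)


def _as_bool(value: Any) -> bool:
    # Config values may arrive as strings, so "false" must not be treated as truthy.
    return value if isinstance(value, bool) else str(value).strip().lower() == "true"


def from_options(opts: dict) -> StartupImport:
    missing = [k for k in ("repositoryRemote", "repositoryBranch") if not opts.get(k)]
    if missing:
        raise ValueError(f"missing required option(s): {', '.join(missing)}")
    deployed = opts.get("deployedModel") or {}
    schedule = deployed.get("schedule") or {}
    return StartupImport(
        repository_remote=opts["repositoryRemote"],
        repository_branch=opts["repositoryBranch"],
        create_base_snapshot=_as_bool(opts.get("createBaseSnapshot", False)),
        group=opts.get("group") or DEFAULT_GROUP,
        runtime_name=deployed.get("runtimeName"),
        quartz_schedule=schedule.get("quartzSchedule"),
        signal_action_name=schedule.get("signalActionName"),
        runtime_wait_timeout_ms=int(opts.get("runtimeWaitTimeout", DEFAULT_RUNTIME_WAIT_MS)),
    )
```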

A snapshot (and deployment) is only created if deployedModel.runtimeName is not empty. If it is set, a background thread waits (up to runtimeWaitTimeout) for the runtime to register with model-manage so that it can be used as the engine for the deployed model. The name, type, and group are also used in the target runtime for the snapshot.
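
The background wait described above can be sketched with a deadline-based polling loop. The is_registered and deploy callbacks stand in for ModelOp Center internals and are hypothetical, as is the 5-second poll interval. Only the 600000 ms default comes from the runtimeWaitTimeout option.

```python
import threading
import time
from typing import Callable


def wait_for_runtime(
    is_registered: Callable[[], bool],
    timeout_ms: int = 600_000,
    poll_interval_s: float = 5.0,
) -> bool:
    """Poll is_registered() until it returns True or timeout_ms elapses."""
    deadline = time.monotonic() + timeout_ms / 1000.0
    while True:
        if is_registered():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poll_interval_s, remaining))


def deploy_when_ready(
    is_registered: Callable[[], bool],
    deploy: Callable[[], None],
    timeout_ms: int = 600_000,
    poll_interval_s: float = 5.0,
) -> threading.Thread:
    """Run the wait on a daemon thread so the caller (e.g. startup) is not blocked."""
    def worker() -> None:
        if wait_for_runtime(is_registered, timeout_ms, poll_interval_s):
            deploy()
        # If the runtime never registers in time, the deployment is skipped.

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

Using time.monotonic() keeps the timeout correct even if the system clock is adjusted while waiting.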

View Asset Git Details

ModelOp Center provides multiple ways to see the details of the git integration for various assets.

ModelOp Center Dashboard

The repository configuration of a model’s assets can be viewed in the ModelOp Center Dashboard:

  1. Click on the “Models” menu item.

  2. From the list on the main panel, select the desired model to inspect.

  3. Click the Repository tab.

Note that ModelOp Center provides details of the last sync with the backing git repository. The sync rate is set in the ModelOp Center core configuration files, but is typically 2-3 minutes by default. The user can click “Sync Git” to force a git sync immediately.

Jupyter Notebook Plugin

Git configuration is available directly within a Notebook via the Jupyter Notebook Plugin. When registering or opening a model, these details are available through “Asset Details”, a ModelOp-specific Cell Toolbar button that is enabled from the “View” menu.

  1. Within the Jupyter Notebook, open the “View” menu.

  2. Click on “Cell Toolbar”.

  3. Select the “ModelOp Model” toolbar option.

  4. On the desired model asset (cell) click on the toolbar button “Asset Details”. This button will be visible for ModelOp registered models only.

    1. If the selected asset does not have a repository configured yet, the user can configure one.

    2. If the selected asset already has a repository configured, its values can be modified.

Related Articles

Register a Model