One of the features we like best about Snowflake, the cloud data platform, is how well it fits into automated deployment pipelines. This blog covers the basic ideas behind running SQL code through automated deployment technology.
This is particularly useful in a number of scenarios, among which:
- We want to separate development practices from the actual execution of the code against production data
- We want to keep a version history of all code that ran against a database
- We want to automate a number of actions once a development team hits a milestone
The scenario we’ll use to illustrate the idea:
- Assumes teams or individuals commit code to a central version repository, in our case Git.
- Assumes contributions to the repository trigger automated events to create an ad-hoc computing environment outside of Snowflake
- The ad-hoc computing environment is provisioned with both the committed code as well as the client interface tooling to execute this code against a target database.
We’ll also introduce good code hygiene by withholding user credentials from our contributed code, instead storing these credentials as protected secrets in the deployment platform.
Introducing SnowSQL
Snowflake offers a command line interface client, SnowSQL. The client runs on Windows, macOS and a variety of Linux distributions, which makes it a great choice for managing our connections to Snowflake.
After installing the client, we can pass a file parameter to the CLI to execute a series of SQL commands.
snowsql -f code.sql;
The code.sql file contains one straightforward piece of code:
select current_date()
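In practice, code.sql would contain the actual deployment statements rather than a single select. A sketch of what such a file might look like (the schema and table names here are hypothetical, not part of the original example):

```sql
-- Hypothetical deployment script; names are placeholders.
create schema if not exists reporting;

create or replace table reporting.daily_sales (
    sale_date    date,
    total_amount number(18, 2)
);
```

Any statements SnowSQL can run interactively can be placed in this file.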
Let’s start from a code versioning system
For this example, we’ll use GitHub.com as it combines both of the features we need for the case:
- Managing code versions through their implementation of Git.
- Providing continuous integration and continuous deployment capabilities through their GitHub Actions feature, which we’ll use to execute SQL code upon committing.
Any standard repository on GitHub will be appropriate for the job. The code that needs execution is committed to this repo.
Apart from one or more files containing SQL code, we need to provide GitHub with instructions on how to execute the SQL.
This is done by creating a YAML file in the .github/workflows directory. This file contains GitHub Actions syntax that will:
- setup a temporary environment
- install a SnowSQL interface
- provide a secure environment to shield sensitive information, such as passwords.
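Putting this together, the repository layout we assume looks like the listing below (the workflow filename snowsql.yml is our own choice; any .yml file in that directory works):

```text
.
├── code.sql                  # SQL to run against Snowflake
└── .github/
    └── workflows/
        └── snowsql.yml       # GitHub Actions workflow definition
```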
A script that provides all of this functionality using SnowSQL boils down to:
name: SnowSQL

env:
  SNOWSQL_DEST: ~/snowflake
  SNOWSQL_ACCOUNT: tropos_zts.eu-west-1
  SNOWSQL_USER: github
  SNOWSQL_PWD: ${{ secrets.SF_PASSWORD }}

on: push

jobs:
  executequery:
    name: Install SnowSQL
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@master
      - name: Download SnowSQL
        run: curl -O https://sfc-repo.snowflakecomputing.com/snowsql/bootstrap/1.2/linux_x86_64/snowsql-1.2.9-linux_x86_64.bash
      - name: Install SnowSQL
        run: SNOWSQL_DEST=~/snowflake SNOWSQL_LOGIN_SHELL=~/.profile bash snowsql-1.2.9-linux_x86_64.bash
      - name: Test installation
        run: ~/snowflake/snowsql -v
      - name: Execute SQL against Snowflake
        run: ~/snowflake/snowsql -f code.sql;
Let’s have a look at the major focus areas.
Setting up a temporary environment
First, we decide when to build the environment. To keep things straightforward, we configure the job to run every time someone contributes code to the central repository. We’ll use a version of Ubuntu as a base.
on: push

jobs:
  executequery:
    name: Install SnowSQL
    runs-on: ubuntu-latest
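As a variation, GitHub Actions can restrict the trigger to pushes on a specific branch, so that only code merged into that branch reaches production. A sketch, assuming the production branch is called main:

```yaml
on:
  push:
    branches:
      - main
```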
Installing SnowSQL
As there’s no package-based installation available for Ubuntu, we download an installer from Snowflake’s public website. Alongside this, the checkout@master step copies all files from the repository to the run environment.
steps:
  - name: Checkout
    uses: actions/checkout@master
  - name: Download SnowSQL
    run: curl -O https://sfc-repo.snowflakecomputing.com/snowsql/bootstrap/1.2/linux_x86_64/snowsql-1.2.9-linux_x86_64.bash
  - name: Install SnowSQL
    run: SNOWSQL_DEST=~/snowflake SNOWSQL_LOGIN_SHELL=~/.profile bash snowsql-1.2.9-linux_x86_64.bash
To make sure our version of SnowSQL is installed correctly, we run a test:
  - name: Test installation
    run: ~/snowflake/snowsql -v
Executing the code
Once the installation is verified, the content of the code.sql file can be executed.
  - name: Execute SQL against Snowflake
    run: ~/snowflake/snowsql -f code.sql;
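By default, SnowSQL continues past a failing statement. For deployments it is usually safer to abort on the first error, which SnowSQL supports through its exit_on_error option. A sketch of the same step with that option set (an assumption on our part; the original workflow does not use it):

```yaml
- name: Execute SQL against Snowflake
  run: ~/snowflake/snowsql -o exit_on_error=true -f code.sql
```

With this in place, a broken statement fails the step, and the workflow run shows up as failed in GitHub.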
Keeping secrets at bay
SnowSQL takes its configuration parameters from environment variables. As per best practice, we don’t check sensitive information such as access credentials into code repositories, and should actually prevent anyone from doing so.
To make sure SnowSQL is still able to access Snowflake, we make use of the secrets feature in GitHub Actions.
These are protected variables that are injected into our environment at run time. Their values are neither logged nor displayed.
env:
  SNOWSQL_DEST: ~/snowflake
  SNOWSQL_ACCOUNT: my_snowflake_account.eu-west-1
  SNOWSQL_USER: ${{ secrets.SF_USER }}
  SNOWSQL_PWD: ${{ secrets.SF_PASSWORD }}
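Note that secrets are withheld in some contexts (for example, workflows triggered from forked repositories), in which case these variables arrive empty. It can be worth failing fast in that situation instead of letting snowsql fail with a cryptic login error. A minimal sketch in shell, with a small helper of our own invention; in the workflow the real value of "$SNOWSQL_PWD" would be checked:

```shell
#!/bin/sh
# Guard helper: report whether a required secret is present.
# In the workflow we would call: check_secret "$SNOWSQL_PWD"
check_secret() {
    if [ -z "$1" ]; then
        echo "missing"
    else
        echo "present"
    fi
}

check_secret ""        # prints "missing"
check_secret "s3cret"  # prints "present"
```

A dedicated workflow step such as `run: [ -n "$SNOWSQL_PWD" ] || exit 1` before the execute step achieves the same effect.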
Conclusion
In a few lines of code, we are able to split the development and execution of SQL code using a simplified version of a deployment pipeline. Real-world examples would be far more elaborate, with collaboration strategies, automated quality control and approval mechanisms built in. Nonetheless, this quick setup ensures an isolated production environment of Snowflake can exist without anyone needing direct access to the privileged user accounts.
Next steps
This article summarizes the basic concept of GitOps, i.e. deploying code authorized through a central version repository. This is merely an illustration of how we typically run operations in a project delivery context. We have packed our experience into building CI/CD (continuous integration, continuous deployment) pipelines, automating data testing and aligning them with development processes. Reach out if you’d like to know more!