🥋 Karate 🥰 ❄️ Snowflake

🇫🇷 Ne ratez pas votre SQL avec un test Karate

🇬🇧 Don’t miss your SQL code with a Karate test

🪄 Once upon a time…​ ✨

A great data pipeline !

data pipeline simple

$ whoami 👨‍💻

A great pipeline…​ but sometimes…​ 😥

Not so great…​ more complex

data pipeline dirty data pipeline complex

💡We need Tests 🤗 & QA 🕵️ 🐛

But also some tools…​

Let’s talk about architecture 👷

ETL vs ELT…​

snowflake text

  • Data Cloud…​ DBMS++

  • Snowpark, Stream/Task…​

  • STORAGE 💰 dissociated from COMPUTE 💰💰💰

  • Scalable COMPUTE

kapoeira text

Kapoeira - Example (1/2)

burger factory

Kapoeira - Example (2/2)

Feature: Burger 🍔 feature

  Background:
    Given input topic
      | topic     | alias        | key_type | value_type |
      | bread     | bread-in     | string   | string     |
      | vegetable | vegetable-in | string   | string     |
      | meat      | meat-in      | string   | string     |
    And output topic
      | topic  | alias      | key_type | value_type | readTimeoutInSecond |
      | burger | burger-out | string   | string     | 5                   |
    And var uuid = call function: uuid

  Scenario Outline: Burger Factory
    When records with key and value are sent
      | topic_alias  | key        | value       |
      | bread-in     | 🧑‍🍳_${uuid} | <bread>     |
      | vegetable-in | 🧑‍🍳_${uuid} | <vegetable> |
      | meat-in      | 🧑‍🍳_${uuid} | <meat>      |
      
    Then expected records
      | topic_alias | key        | value  |
      | burger-out  | 🧑‍🍳_${uuid} | result |

    And assert result $ == "<output>"

    Examples:
      | bread | vegetable | meat | output       |
      | 🍞    | 🍅        | 🥩   | 🍔           |
      | 🍞    | 🍅        | 🍗   | 🍔           |
      | 🍞    | 🍅        | 🐟   | 🍔           |
      | 🍞    | 🥕        | 🥩   | 🍞 + 🥕 + 🥩 |

dbt text

karatelabs

Karate - Example

Feature: karate 'hello world' example
  
  Scenario: create and retrieve a cat
    Given url 'http://myhost.com/v1/cats'
    And request { name: 'Billie' }
    When method post
    Then status 201
    And match response == { id: '#notnull', name: 'Billie' }
    
    Given path response.id
    When method get
    Then status 200

Karate - HTTP
Snowflake - JDBC SQL
🤔

Karate/Snowflake - Example

Feature: SELECT V1
  Background:
    * url "https://<SNOWFLAKE_ACCOUNT>.snowflakecomputing.com/api/v2"
    * string jwtToken = karate.exec("snow connection generate-jwt --silent --account <SNOWFLAKE_ACCOUNT> --user <SNOWFLAKE_USER> --private-key-file <SNOWFLAKE_USER_PEM>")
    * header Content-Type = "application/json"
    * header Accept = "application/json"
    * header Authorization = "Bearer " + jwtToken
    * header User-Agent = "<MY_APP_IT>"
    * header X-Snowflake-Authorization-Token-Type = "KEYPAIR_JWT"

  Scenario: Select 1 cutter
    Given path "statements"
    And text payload =
        """
        {
          "statement": "SELECT SERIAL_NUMBER, CUTTER_TYPE FROM CUTTER WHERE SERIAL_NUMBER='MY_VECTOR'",
          "timeout": 60,
          "role": "<SNOWFLAKE_ROLE>",
          "warehouse": "<SNOWFLAKE_WAREHOUSE>",
          "database": "<SNOWFLAKE_DATABASE>",
          "schema": "<SNOWFLAKE_SCHEMA>"
        }
        """
    When request payload
    And method post
    Then status 200
    And match response.resultSetMetaData.numRows == 1
    And match response.resultSetMetaData.rowType[0].name == "SERIAL_NUMBER"
    And match response.resultSetMetaData.rowType[1].name == "CUTTER_TYPE"
    And match response.data[0][0] == "MY_VECTOR"
    And match response.data[0][1] == "VECTOR"

Complex for a SELECT…​😠
Run Task & DBT ?🤔

data pipeline elt
  • Karate DSL extension - Examples

  • karate.exec(<any_command>)

  • 😃

💡 karate-connect
🧩 extensions

  • karate-connect-standalone.jar

    • karate-core.jar karatelabs

    • karate-connect.jar lectra

      • 🧩 snowflake, kubernetes

      • 🧩 rabbitmq, kafka

  • 🐳 karate-connect

    • karate-connect-standalone.jar

    • 🛠️ dbt, kubectl, snowflake-cli

How to run ?

java -Dextensions=<ext1,ext2...> \
    -jar karate-connect-standalone.jar \
    <features_path> <optional_params...>
🐳
docker run --rm \
  -v $(pwd)/<features_path>:/features \
  -v $(pwd)/<reports_path>:/target \
  -e KARATE_EXTENSIONS=<ext1,ext2...>  -e ... \
  karate-connect:latest <optional_params...>

How to customize ?

karate-config.js
function fn() {
  const mySuperFunction = (input) => input.toUpperCase();
  return {
    "projectName": "mySuperProject",
    "myFunction": mySuperFunction
  };
}

See karate documentation for file location

snowflake extension config

cli-config
SNOWFLAKE_ACCOUNT=xxx.west-europe.azure
SNOWFLAKE_USER=MY_USER
SNOWFLAKE_PRIVATE_KEY_PATH=/my-path/private-key.pem
PRIVATE_KEY_PASSPHRASE=my-passphrase
{
  "account": "xxx.west-europe.azure",
  "user": "MY_USER",
  "privateKeyPath": "/my-path/private-key.pem",
  "privateKeyPassphrase": "my-passphrase"
}
snowflake-config
SNOWFLAKE_DATABASE=MY_DB
SNOWFLAKE_SCHEMA=MY_SCHEMA
SNOWFLAKE_ROLE=MY_ROLE
SNOWFLAKE_WAREHOUSE=MY_WH
{
  "database": "MY_DB",
  "schema": "MY_SCHEMA"
  "role": "MY_ROLE",
  "warehouse": "MY_WH"
}

snowflake extension

  • Config with env vars, JS files…​

  • JWT generation

  • SQL execution (with retry, timeout…​)

  • Upload CSV/JSON into Table

  • Clone / Drop schema

  • And more…​😃

Karate/Snowflake - Example

Feature: SELECT V2
  Background:
    * json cliConfig = read('classpath:cli-config.json')
    * json snowflakeConfig = read('classpath:snowflake-config.json')
    * string jwt = snowflake.cli.generateJwt(cliConfig)
    * json restConfig = ({jwt, cliConfig, snowflakeConfig})

  Scenario: Select 1 cutter
    Given text statement =
    """
      SELECT SERIAL_NUMBER, CUTTER_TYPE
      FROM CUTTER
      WHERE SERIAL_NUMBER='MY_VECTOR'
    """
    And def response = snowflake.rest.runSql({...restConfig, statement})
    And table expectedData
      | SERIAL_NUMBER | CUTTER_TYPE |
      | "MY_VECTOR"   | "VECTOR"    |
    And match response.data == expectedData

Demo Time 🎬

Kafka version
burger factory

Snowflake version
burger factory

DDL

ddl

How to run ?

java -Dextensions=snowflake -jar karate-connect-standalone.jar \
  ${IT_PATH}/features --configdir ${IT_PATH} --reportdir ${REPORTS_PATH} --threads ${NB_THREADS}
🐳
docker run --rm \
  -v $(pwd)/${IT_PATH}:/${IT_PATH} \
  -v $(pwd)/${REPORTS_PATH}:/${REPORTS_PATH} \
  -v ${SNOWFLAKE_PRIVATE_KEY_PATH}:/${SNOWFLAKE_PRIVATE_KEY_PATH} \
  -v $(pwd)/${DBT_PROJECT}:/${DBT_PROJECT} \
  --env-file ./demo.env -e KARATE_EXTENSIONS=snowflake \
  karate-connect:latest ${IT_PATH}/features --configdir ${IT_PATH} --reportdir ${REPORTS_PATH} --threads ${NB_THREADS}

burger-factory.feature

Feature: Demo
  Background:
    * json cliConfig = snowflake.cliConfigFromEnv
    * string jwt = snowflake.cli.generateJwt(cliConfig)
    * json restConfig = ({jwt, cliConfig})
    * string clientId = "😋_"+base.random.uuid()
    * def genStatement = (table, value) => "INSERT INTO "+table+"(CLIENT_ID, VALUE) VALUES ('"+clientId+"','"+value+"')"

  Scenario Outline: Burger Factory - <bread> + <vegetable> + <meat> = <output>
    Given table inserts
      | table       | value         | config                     |
      | "BREAD"     | "<bread>"     | snowflakeConfigs.BREAD     |
      | "VEGETABLE" | "<vegetable>" | snowflakeConfigs.VEGETABLE |
      | "MEAT"      | "<meat>"      | snowflakeConfigs.MEAT      |
    And json responses = karate.map(inserts, (row) => snowflake.rest.runSql({...restConfig, snowflakeConfig: row.config, statement: genStatement(row.table, row.value)}).status)
    And match each responses == "OK"

    When string dbtConsoleOutput = karate.exec("dbt run")
    And match dbtConsoleOutput contains "Completed successfully"

    Then string selectStatement = "SELECT VALUE FROM BURGER WHERE CLIENT_ID='"+clientId+"'"
    And json response = snowflake.rest.runSql({...restConfig, snowflakeConfig: snowflakeConfigs.BURGER, statement: selectStatement })
    And match response.data == [ { "VALUE" : "<output>" } ]

    Examples:
      | bread | vegetable | meat | output       |
      | 🍞    | 🍅        | 🥩   | 🍔           |
      | 🍞    | 🍅        | 🍗   | 🍔           |
      | 🍞    | 🍅        | 🐟   | 🍔           |
      | 🍞    | 🥕        | 🥩   | 🍞 + 🥕 + 🥩 |

Clone schemas…​

Feature: Demo - Clone Schemas
  Background:
    * json cliConfig = snowflake.cliConfigFromEnv
    * string jwt = snowflake.cli.generateJwt(cliConfig)
    * json restConfig = ({jwt, cliConfig, snowflakeConfig: snowflakeConfigs.BREAD})
    * string clientId = "😋_"+base.random.uuid()
    * def genStatement = (table, value) => "INSERT INTO "+table+"(CLIENT_ID, VALUE) VALUES ('"+clientId+"','"+value+"')"
    * json cloneResult = cloneSnowflakeConfigs(restConfig)
    * configure afterScenario = function(){ dropSnowflakeConfigs(restConfig, cloneResult.snowflakeConfigs) }

  Scenario Outline: Burger Factory - <bread> + <vegetable> + <meat> = <output>
    Given table inserts
      | table       | value         | config                                 |
      | "BREAD"     | "<bread>"     | cloneResult.snowflakeConfigs.BREAD     |
      | "VEGETABLE" | "<vegetable>" | cloneResult.snowflakeConfigs.VEGETABLE |
      | "MEAT"      | "<meat>"      | cloneResult.snowflakeConfigs.MEAT      |
    And json responses = karate.map(inserts, (row) => snowflake.rest.runSql({...restConfig, snowflakeConfig: row.config, statement: genStatement(row.table, row.value)}).status)
    And match each responses == "OK"

    * string cmd = cloneResult.dbtPrefix+" dbt run"
    When string dbtConsoleOutput = karate.exec("bash -c '"+cmd+"'")
    And match dbtConsoleOutput contains "Completed successfully"

    Then string selectStatement = "SELECT VALUE FROM BURGER WHERE CLIENT_ID='"+clientId+"'"
    And json response = snowflake.rest.runSql({...restConfig, snowflakeConfig: cloneResult.snowflakeConfigs.BURGER, statement: selectStatement })
    And match response.data == [ { "VALUE" : "<output>" } ]

    Examples:
      | bread | vegetable | meat | output       |
      | 🍞    | 🍅        | 🥩   | 🍔           |
      | 🍞    | 🍅        | 🍗   | 🍔           |
      | 🍞    | 🍅        | 🐟   | 🍔           |
      | 🍞    | 🥕        | 🥩   | 🍞 + 🥕 + 🥩 |

Next Steps 🚀

🎉 Thank you - Questions?
🥋🥰❄️

Slides 🖥️

qrcode slides

Feedback🙏

qrcode feedback