Cosmos DB Extractor

Cosmos DB extractor for Keboola Connection.

Configuration

The configuration file config.json contains the following properties under the parameters key:

  • db - object (required): Configuration of the connection.

    • endpoint - string (required): Cosmos DB SQL API endpoint.
    • #key - string (required): Access key.
    • databaseId - string (required): Database ID.
  • id - integer (optional): ID of the config row.

  • name - string (optional): Name of the config row.

  • containerId - string (required): Container ID. A container is similar to a table in a relational database or a collection in MongoDB.

  • output - string (required): Name of the output CSV file.

  • maxTries - integer (optional): Maximum number of retries if an error occurs. Default 5.

  • ignoredKeys - array (optional): Keys to ignore in the exported documents.

    • Cosmos DB automatically adds some metadata keys when an item is inserted.
    • By default, these keys are ignored: ["_rid", "_self", "_etag", "_attachments", "_ts"]
  • incremental - boolean (optional): Enables Incremental Loading. Default false.

  • incrementalFetchingKey - string (optional): Name of the key used for incremental fetching.

  • mode - enum (optional)

    • mapping (default) - Documents are exported using the specified mapping.
    • raw - Documents are exported as plain JSON strings. The CSV file will contain the id and data columns.
  • mapping - string (required when mode is mapping).

  • By default, the extractor exports all documents using a generated SQL query.

    • Default query: SELECT * FROM c
    • The query can be modified with these parameters:
    • select - string (optional), e.g. c.name, c.date; default *.
      • In raw mode, the id field must be present in the query results.
    • from - string (optional), e.g. Families f; default c.
    • sort - string (optional), e.g. c.date.
    • limit - integer (optional), e.g. 500.
    • incrementalFetchingKey - string (optional), e.g. c.id.
  • Alternatively, set a custom query using this parameter:

    • query - string (optional), e.g. SELECT f.name FROM Families f
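
To make the interplay of the query parameters concrete, here is a minimal sketch of how such a query could be assembled. It is not the extractor's actual implementation: the buildQuery helper, the placeholder endpoint and key, and the use of ORDER BY and OFFSET 0 LIMIT for sort and limit are all assumptions made for illustration.

```javascript
// Hypothetical helper (not part of the extractor): derives a Cosmos DB SQL
// query from the optional select / from / sort / limit parameters, or
// returns the custom query unchanged when one is set.
function buildQuery(params = {}) {
  if (params.query) {
    return params.query;
  }
  const parts = [
    `SELECT ${params.select ?? "*"}`,
    `FROM ${params.from ?? "c"}`,
  ];
  if (params.sort) {
    // Assumption: sort maps to an ORDER BY clause.
    parts.push(`ORDER BY ${params.sort}`);
  }
  if (params.limit !== undefined) {
    // Assumption: limit maps to the OFFSET ... LIMIT clause.
    parts.push(`OFFSET 0 LIMIT ${params.limit}`);
  }
  return parts.join(" ");
}

// Example content of the parameters key; endpoint, key, and IDs are placeholders.
const parameters = {
  db: {
    endpoint: "https://example.documents.azure.com:443/",
    "#key": "placeholder-access-key",
    databaseId: "exampleDatabase",
  },
  containerId: "exampleContainer",
  output: "example-output",
  mode: "raw",
  // In raw mode the id field has to be part of the query results.
  select: "c.id, c.name, c.date",
  sort: "c.date",
  limit: 500,
};

console.log(buildQuery(parameters));
// prints "SELECT c.id, c.name, c.date FROM c ORDER BY c.date OFFSET 0 LIMIT 500"
```

With no query parameters at all, the sketch falls back to the documented default, SELECT * FROM c.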

Actions

Read more about actions in the KBC documentation.

Test Connection

The testConnection action tests the connection to the server.

The parameters.db node must be specified in the configuration.
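
As an illustration, a configuration for this action might look like the object below. The top-level action key is assumed from the usual Keboola config.json layout, the endpoint and key are placeholders, and the missingDbKeys helper is hypothetical; it only mirrors the requirement that the parameters.db node be present.

```javascript
// Hypothetical check (not part of the extractor): lists the required db
// keys that are missing or empty in a configuration object.
function missingDbKeys(config) {
  const db = config?.parameters?.db ?? {};
  return ["endpoint", "#key", "databaseId"].filter(
    (key) => typeof db[key] !== "string" || db[key] === ""
  );
}

// Assumed shape of config.json for the testConnection action.
const testConnectionConfig = {
  action: "testConnection",
  parameters: {
    db: {
      endpoint: "https://example.documents.azure.com:443/", // placeholder
      "#key": "placeholder-access-key", // placeholder
      databaseId: "exampleDatabase", // placeholder
    },
  },
};

console.log(missingDbKeys(testConnectionConfig)); // prints []
```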

Data flow

  • The connection to Cosmos DB is established from Node.js code, using the official @azure/cosmos package.
  • No reliable PHP driver is currently available.
  • The Node.js code prints the exported JSON documents to the JSON_STREAM_FD file descriptor, from which the JsonDecoder PHP class reads them.
  • This communication is asynchronous.
  • The PHP code decodes the loaded JSON documents and writes them to CSV files using keboola/php-csvmap.
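
The Node.js side of this hand-off can be sketched roughly as follows. This is an illustration under stated assumptions, not the extractor's code: it assumes that JSON_STREAM_FD is an environment variable holding the number of an already open file descriptor, that documents are framed as one JSON document per line, and it uses synchronous writes for brevity. The writeDocuments helper is hypothetical.

```javascript
const fs = require("node:fs");

// Hypothetical helper: serializes each document as a single line of JSON
// and passes it to the given write callback. Returns the number written.
// The newline-delimited framing is an assumption.
function writeDocuments(documents, write) {
  let count = 0;
  for (const doc of documents) {
    write(JSON.stringify(doc) + "\n");
    count += 1;
  }
  return count;
}

// Assumption: JSON_STREAM_FD is an environment variable that contains the
// number of a file descriptor opened by the parent PHP process.
const fd = Number.parseInt(process.env.JSON_STREAM_FD ?? "", 10);
if (Number.isInteger(fd)) {
  // Synchronous writes keep the sketch short.
  writeDocuments(
    [{ id: "1", name: "example" }],
    (chunk) => fs.writeSync(fd, chunk)
  );
}
```

Passing the write callback in keeps the serialization independent of the descriptor, so the same function can be exercised against an in-memory array when no descriptor is available.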

Development

Clone this repository and initialize the workspace with the following commands:

git clone https://github.com/keboola/ex-cosmosdb
cd ex-cosmosdb
docker compose build
docker compose run --rm dev composer install --no-scripts
docker compose run --rm dev npm install

Create a .env file with the following variables:

ENDPOINT=
KEY=
DATABASE_ID=

Run the test suite with this command:

docker compose run --rm dev composer tests

Integration

For information about deployment and integration with KBC, please refer to the deployment section of the developers documentation.