fix: fluvio/quickstart, sync w/ versioned_docs (#211)
- **fix fluvio/quickstart**
- **resync versioned_docs**
digikata authored Aug 21, 2024
1 parent 8f9f93e commit 4e8a402
Showing 10 changed files with 639 additions and 5 deletions.
12 changes: 11 additions & 1 deletion docs/fluvio/quickstart.mdx
@@ -111,7 +111,17 @@ meta:
name: http-quotes
type: http-source
topic: quotes
-http:{#my-explicit-id}
+http:
endpoint: https://demo-data.infinyon.com/api/quote
interval: 3s
```
### Running the HTTP Connector
We'll use [Connector Developer Kit (cdk)] to download and run the connector.
```bash copy="fl"
$ cdk hub download infinyon/http-source@0.3.8
```

```bash copy="fl"
@@ -67,13 +67,11 @@ These connectors are not guaranteed to work with latest fluvio:
* https://github.com/infinyon/labs-redis-sink-connector
* https://github.com/infinyon/duckdb-connector



[Rust Installation Guide]: https://www.rust-lang.org/tools/install
[Fluvio Connector Development Kit (CDK)]: ../cdk.mdx
[Generate a Connector]: ./generate.mdx
[Build and Test]: ./build.mdx
[Start and Shutdown]: ./start-shutdown.mdx
[Logging]: ./logging.mdx
[Secrets]: ./secrets.mdx
[Publish to Connector Hub]: ./publish.mdx
[Install Rust]: https://www.rust-lang.org/tools/install
69 changes: 69 additions & 0 deletions versioned_docs/version-0.11.11/connectors/troubleshooting.mdx
@@ -0,0 +1,69 @@
---
sidebar_position: 200
title: "Troubleshooting"
description: "Connector Troubleshooting"
---

# Connector Build Troubleshooting

## Multiplatform builds

Connectors on the Hub can be published for multiple targets so they can run on
multiple platforms. This can be accomplished by building with cdk on the same
target it's intended for, or by cross compiling for one target from another
platform. Compiling for one target while on another can be complex, so this
troubleshooting section provides added target toolchain support information
for some common platform/target combinations.

## macOS

For local builds, if `cdk build` does not work, explicitly specify the
target:

```bash
cdk build --target aarch64-apple-darwin
```

## Ubuntu or Debian based Linux Distributions {#ubuntu-debian}

Build prerequisites for `x86_64-unknown-linux-musl` on Ubuntu from an
`x86_64-unknown-linux-gnu` environment.

System packages:
```bash
sudo apt install build-essential musl-tools
```

To build and test locally, use the following instead of `cdk build`:

```bash
CARGO_TARGET_X86_64_UNKNOWN_LINUX_MUSL_LINKER=x86_64-linux-musl-gcc cdk build
```

## Other rust cargo cross-platform build toolchains

Connector projects are Rust projects, and different choices exist for cross
compilation. A connector binary built by a Rust toolchain can be published
with `cdk` using the `--no-build` flag. Cross compilation projects for Rust
include:

- Cargo cross: https://github.com/cross-rs/cross
- Cargo zigbuild: https://github.com/rust-cross/cargo-zigbuild
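As a sketch of this flow (assuming `cargo-zigbuild` and a musl target; adjust the target triple and release profile to your project):

```shell
# One-time setup: install the cross-build helper (requires zig) and the target
cargo install cargo-zigbuild
rustup target add x86_64-unknown-linux-musl

# Build the connector binary with the Rust toolchain directly
cargo zigbuild --release --target x86_64-unknown-linux-musl

# Publish the prebuilt binary without letting cdk rebuild it
cdk publish --no-build
```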

For a working example, see the [connector-publish github workflow].

## Windows, WSL, WSL2

Fluvio and `cdk` are supported on Windows only through WSL2. WSL2 often installs
Ubuntu as the default Linux distribution. See the [Ubuntu] section for more
build troubleshooting.

## Infinyon Cloud Certified Connectors

Infinyon Certified connectors are built for the `aarch64-unknown-linux-musl` target.

To build and publish for the cloud, InfinyOn often uses a [connector-publish github workflow].

[Ubuntu]: #ubuntu-debian
[connector-publish github workflow]: https://github.com/infinyon/fluvio/blob/master/.github/workflows/connector-publish.yml
12 changes: 11 additions & 1 deletion versioned_docs/version-0.11.11/fluvio/quickstart.mdx
@@ -111,7 +111,17 @@ meta:
name: http-quotes
type: http-source
topic: quotes
-http:{#my-explicit-id}
+http:
endpoint: https://demo-data.infinyon.com/api/quote
interval: 3s
```
### Running the HTTP Connector
We'll use [Connector Developer Kit (cdk)] to download and run the connector.
```bash copy="fl"
$ cdk hub download infinyon/http-source@0.3.8
```

```bash copy="fl"
@@ -0,0 +1,12 @@

apiVersion: 0.1.0
meta:
version: 0.3.8
name: cat-facts
type: http-source
topic: cat-facts
create-topic: true

http:
endpoint: "https://catfact.ninja/fact"
interval: 10s
@@ -0,0 +1,19 @@
apiVersion: 0.1.0
meta:
version: 0.3.8
name: cat-facts-transformed
type: http-source
topic: cat-facts-data-transform
create-topic: true

http:
endpoint: https://catfact.ninja/fact
interval: 10s

transforms:
- uses: infinyon/jolt@0.4.1
with:
spec:
- operation: default
spec:
source: "http"
@@ -0,0 +1,27 @@
# sql.yaml
apiVersion: 0.1.0
meta:
name: simple-cat-facts-sql
type: sql-sink
version: 0.4.3
topic: cat-facts
sql:
url: "postgres://user:password@db.postgreshost.example/dbname"
transforms:
- uses: infinyon/json-sql@0.2.1
invoke: insert
with:
mapping:
table: "animalfacts"
map-columns:
"length":
json-key: "length"
value:
type: "int"
default: "0"
required: true
"raw_fact_json":
json-key: "$"
value:
type: "jsonb"
required: true
195 changes: 195 additions & 0 deletions versioned_docs/version-0.11.11/fluvio/tutorials/output-sql.mdx
@@ -0,0 +1,195 @@
---
sidebar_position: 3
title: "Streaming data to SQL"
description: "Part 3 of HTTP to SQL Sink tutorial series."
---


# Prerequisites

This guide uses a `local` Fluvio cluster. If you need to install it, please follow the instructions [here][installation]!

We will be using a `Postgres` database. You can download and set it up for your OS from the [PostgreSQL] website, or use a cloud service like [ElephantSQL].


# Introduction

In previous tutorials, we have seen how to read data from external sources and write it to a Fluvio topic. In this tutorial, we will go through how to sink data from a Fluvio topic to an external system such as a database.

We will use a `sink` type connector. All `sink` connectors consume data from a Fluvio topic and write it to an external system. In particular, we will use the `SQL Sink Connector`, which can write to a PostgreSQL or SQLite database.

Since this targets a `SQL` database, the configuration is concerned with mapping the JSON data to SQL columns. The sink connector performs these steps:

- Read data from the topic
- Transform the data into a SQL insert statement
- Send the SQL insert statement to the database

The SQL transformation is done using a SmartModule, which allows you to plug in different transformation logic if needed.

We will use the topic from the first tutorial, [Streaming from HTTP Source], which streams data to the `cat-facts` topic. Please run that tutorial first to set up the topic.

As in previous tutorials, we will use `cdk` to manage the connectors. Run the following command to download the connector from the Hub.

```bash
$ cdk hub download infinyon/sql-sink@0.4.3
```

Then download SQL SmartModule from the Hub.

```bash
$ fluvio hub sm download infinyon/json-sql@0.2.1
```

You should now see two SmartModules, assuming you have already downloaded the `jolt` SmartModule in the previous tutorial.

```bash
$ fluvio sm list
SMARTMODULE SIZE
infinyon/json-sql@0.2.1 559.6 KB
infinyon/jolt@0.4.1 589.3 KB
```

# Sink Connector configuration

Copy and paste the following config and save it as `sql-cat-fact.yaml`.

import CodeBlock from '@theme/CodeBlock';
import CatSQL from '!!raw-loader!./config/sql-cat-fact.yaml';

<CodeBlock language="yaml">{CatSQL}</CodeBlock>

This configuration will read data from the `cat-facts` topic and insert records into the `animalfacts` table in the database. The `json-sql` SmartModule will transform the JSON data into a SQL insert statement.

Please change the line containing `url` to your database connection string.

## SQL Mapping

The SmartModule `json-sql` implements a domain specific language (DSL) to specify a transformation of input JSON into a SQL insert statement. It uses a model similar to a [Django Model], where SQL tables are abstracted into a model. The model is then used to generate the SQL insert statement.

The mapping is designed for translating JSON into SQL. Each column of the table is mapped from a JSON expression.

For example, here is the mapping for the `length` column:

```yaml
"length":
json-key: "length"
value:
type: "int"
default: "0"
required: true
```
This mapping takes the `length` field from the JSON and inserts it into the `length` column of the table. If the `length` field is not found, the default value of `0` is used.
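The mapping rules can be sketched in Python (an illustration of the semantics described above, not the actual `json-sql` implementation; the `to_insert` helper and its naive quoting are hypothetical):

```python
import json

# Illustrative version of the mapping above: each column reads a json-key,
# "$" selects the whole record, and `default` fills in missing fields.
MAPPING = {
    "length": {"json-key": "length", "default": "0"},
    "raw_fact_json": {"json-key": "$"},
}

def to_insert(table: str, record: dict) -> str:
    """Build a SQL insert statement from a JSON record per MAPPING.

    Naive string quoting for illustration only; a real connector would
    use parameterized statements.
    """
    columns, values = [], []
    for column, rule in MAPPING.items():
        if rule["json-key"] == "$":
            value = json.dumps(record)  # whole record, for the jsonb column
        else:
            value = record.get(rule["json-key"], rule.get("default"))
        columns.append(column)
        values.append(f"'{value}'" if isinstance(value, str) else str(value))
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({', '.join(values)})"

fact = {"fact": "A cat has 230 bones.", "length": 20}
print(to_insert("animalfacts", fact))
```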

# Setting up the Database

In order to run the connector, you need to create a table in your database. Run the following SQL command in the Postgres CLI:

```sql
# create table animalfacts(length integer, raw_fact_json jsonb);
```

You can confirm the table is created:

```sql
# select * from animalfacts;
length | raw_fact_json
--------+---------------
(0 rows)
```


Once you have the config file, you can create the connector using the `cdk deploy start` command.

```bash
$ cdk deploy start --ipkg infinyon-sql-sink-0.4.3.ipkg --config ./sql-cat-fact.yaml
```

You can use `cdk deploy list` to view the status of the connector.

```bash
$ cdk deploy list
NAME STATUS
simple-cat-facts-sql Running
```

# Generating and checking the data

Fluvio topics allow you to decouple the data source from the data sink, which means the source and sink can run independently without affecting each other.
You can run the source connector to generate data, but it is not required for this demo.

Here, we will manually produce the same data from the previous tutorial to the `cat-facts` topic. This way we can control the data and see how it is synced to the database.
By default, the sink connector consumes data from the end of the topic, which means it ignores existing data in the topic.

Let's produce a single record to the topic.

```bash
$ fluvio produce cat-facts
{"fact":"A cat’s jaw can’t move sideways, so a cat can’t chew large chunks of food.","length":74}
Ok!
```
Then you can query the database to see the record.
```sql
# select * from animalfacts;
length | raw_fact_json
--------+------------------------------------------------------------------------------------------------------
74 | {"fact": "A cat’s jaw can’t move sideways, so a cat can’t chew large chunks of food.", "length": 74}
(1 row)
```

You can add more records to the topic and see how the SQL connector inserts the data into the database.

```bash
$ fluvio produce cat-facts
{"fact":"Unlike humans, cats are usually lefties. Studies indicate that their left paw is typically their dominant paw.","length":110}
Ok!
```

```sql
# select * from animalfacts;
length | raw_fact_json
--------+-------------------------------------------------------------------------------------------------------------------------------------------
74 | {"fact": "A cat’s jaw can’t move sideways, so a cat can’t chew large chunks of food.", "length": 74}
110 | {"fact": "Unlike humans, cats are usually lefties. Studies indicate that their left paw is typically their dominant paw.", "length": 110}
(2 rows)
```

# Cleaning up

As in previous tutorials, use `cdk deploy shutdown` to stop the connector.


## Conclusion

This tutorial showed you how to sink data from a Fluvio topic to a SQL database. You can use the same concepts to sink data to other databases or systems.

You can combine this tutorial with the previous tutorials to create a complete data pipeline from source to sink. This just requires deploying multiple connectors.

With Fluvio's event driven architecture, sources and sinks run independently and don't affect each other. You can also chain together multiple sources and sinks to create complex data pipelines.
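The chaining above can be sketched with the two configs from this tutorial series. They are shown here as one multi-document YAML for illustration only; in practice each connector has its own config file and is deployed separately with `cdk deploy start`:

```yaml
# Source: catfact.ninja -> cat-facts topic (from the first tutorial)
apiVersion: 0.1.0
meta:
  version: 0.3.8
  name: cat-facts
  type: http-source
  topic: cat-facts
http:
  endpoint: "https://catfact.ninja/fact"
  interval: 10s
---
# Sink: cat-facts topic -> Postgres (this tutorial).
# The shared topic name is what chains the two connectors.
apiVersion: 0.1.0
meta:
  version: 0.4.3
  name: simple-cat-facts-sql
  type: sql-sink
  topic: cat-facts
sql:
  url: "postgres://user:password@db.postgreshost.example/dbname"
```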

## Reference

* [Fluvio CLI Produce]
* [Fluvio CLI Consume]
* [Fluvio CLI Topic]
* [Fluvio CLI Profile]
* [SmartModule]
* [Transformations]

[Connector Overview]: connectors/overview.mdx
[Fluvio CLI Produce]: fluvio/cli/fluvio/produce.mdx
[Fluvio CLI Consume]: fluvio/cli/fluvio/consume.mdx
[Fluvio CLI Topic]: fluvio/cli/fluvio/topic.mdx
[Fluvio CLI Profile]: fluvio/cli/fluvio/profile.mdx
[SmartModule]: smartmodules/overview.mdx
[Transformations]: fluvio/concepts/transformations.mdx
[catfact.ninja]: https://catfact.ninja
[PostgreSQL]: https://www.postgresql.org/
[ElephantSQL]: https://www.elephantsql.com/
[Configuration]: connectors/configuration.mdx
[SmartModule Hub]: hub/smartmodules/index.md
[installation]: fluvio/quickstart.mdx#install-fluvio
[Django Model]: https://docs.djangoproject.com/en/5.0/topics/db/models/