Apache Hudi Examples

Apache Hudi examples designed to run on Amazon EMR (Elastic MapReduce) via EMR Studio and/or EMR Notebooks.

Reference background on key concepts: if you are new to working with Hudi, it is worth reading about Hudi's timeline, file management, index, table types, query types, copy on write, and merge on read.

If you are new to Hudi or not familiar with its core concepts, I highly recommend watching AWS re:Invent 2019: Insert, upsert, and delete data in Amazon S3 using Amazon.

Environment Set Up

The samples in this repository are designed to run on EMR via EMR Notebooks or EMR Studio. To set up your environment, follow the AWS documentation for EMR Notebooks or EMR Studio.

You can upload the .ipynb files in this repository directly to the Jupyter environments provided by EMR Notebooks / Studio.

Copy on Write

The notebooks in copy_on_write are the best place to start. They cover working with data via Hudi, specifically with copy on write tables. The notebooks cover:

Writing data to S3
Reading data from S3
Upserting data
Incremental querying
Point in time querying
Deleting data

Both Python and Scala notebooks are available.
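As a rough orientation for the copy on write topics listed above, the sketch below shows how the Hudi Spark datasource options for upserts, deletes, incremental queries, and point in time queries might be assembled in Python. It is not taken from the notebooks: the helper names, table name, field names, instant times, and the S3 path in the comments are all hypothetical, and the notebooks themselves remain the reference.

```python
# Illustrative sketch only. Helper names, table/field names, and paths are
# hypothetical and are not taken from the notebooks in this repository.

VALID_WRITE_OPERATIONS = {"insert", "bulk_insert", "upsert", "delete"}


def cow_write_options(table_name, record_key, precombine_field,
                      partition_field, operation="upsert"):
    """Build Hudi datasource write options for a copy on write table."""
    if operation not in VALID_WRITE_OPERATIONS:
        raise ValueError(f"unsupported write operation: {operation}")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
        "hoodie.datasource.write.operation": operation,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
    }


def incremental_read_options(begin_instant, end_instant=None):
    """Build read options for an incremental query.

    Supplying end_instant bounds the commit range, which is how a
    point in time query can be expressed.
    """
    options = {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }
    if end_instant is not None:
        options["hoodie.datasource.read.end.instanttime"] = end_instant
    return options


def write_hudi(df, base_path, options, mode="append"):
    """Write a Spark DataFrame to a Hudi table at base_path."""
    df.write.format("hudi").options(**options).mode(mode).save(base_path)


def read_hudi(spark, base_path, options=None):
    """Read a Hudi table; with no options this is a snapshot read."""
    return spark.read.format("hudi").options(**(options or {})).load(base_path)


# Example option sets (all values hypothetical):
upsert_opts = cow_write_options("trips", "trip_id", "updated_at", "region")
delete_opts = cow_write_options("trips", "trip_id", "updated_at", "region",
                                operation="delete")
changes_since = incremental_read_options("20240101000000")
point_in_time = incremental_read_options("000", "20240101000000")

# Inside an EMR notebook with a Spark session and a DataFrame `df`:
#   write_hudi(df, "s3://my-bucket/hudi/trips", upsert_opts)
#   read_hudi(spark, "s3://my-bucket/hudi/trips", changes_since).show()
```

In this sketch the first write to a new table would typically pass mode="overwrite", while later upserts and deletes use the default "append" so existing data is kept.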

Merge on Read

The notebook in merge_on_read is the best next step once you understand the copy_on_write notebooks. The merge_on_read notebook covers:

Writing data to S3
Upserting data
Snapshot queries
Read optimized queries
Compaction

Both Python and Scala notebooks are available.
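To make the merge on read topics above more concrete, here is a minimal Python sketch of the options that usually distinguish a merge on read table: the table type, inline compaction settings, and the choice between snapshot and read optimized queries. The helper names, table and field names, the default of 5 delta commits, and the S3 path in the comments are assumptions for illustration, not code from the notebooks.

```python
# Illustrative sketch only. Helper names, table/field names, the compaction
# threshold, and paths are hypothetical, not taken from the notebooks.

def mor_write_options(table_name, record_key, precombine_field,
                      partition_field, operation="upsert",
                      inline_compaction=True, max_delta_commits=5):
    """Build Hudi datasource write options for a merge on read table."""
    options = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.operation": operation,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
    }
    if inline_compaction:
        # Compact log files into base files as part of the write,
        # once the given number of delta commits has accumulated.
        options["hoodie.compact.inline"] = "true"
        options["hoodie.compact.inline.max.delta.commits"] = str(max_delta_commits)
    return options


def mor_read_options(query_type="snapshot"):
    """Build read options for a merge on read table.

    "snapshot" merges base files with log files at read time;
    "read_optimized" reads only the compacted base files.
    """
    if query_type not in {"snapshot", "read_optimized"}:
        raise ValueError(f"unsupported query type: {query_type}")
    return {"hoodie.datasource.query.type": query_type}


# Example option sets (all values hypothetical):
mor_upsert_opts = mor_write_options("trips_mor", "trip_id", "updated_at", "region")
snapshot_opts = mor_read_options("snapshot")
read_optimized_opts = mor_read_options("read_optimized")

# Inside an EMR notebook with a Spark session and a DataFrame `df`:
#   df.write.format("hudi").options(**mor_upsert_opts).mode("append") \
#       .save("s3://my-bucket/hudi/trips_mor")
#   spark.read.format("hudi").options(**read_optimized_opts) \
#       .load("s3://my-bucket/hudi/trips_mor").show()
```

The trade-off the two query types expose is freshness versus read cost: a snapshot query sees the latest upserts, while a read optimized query may lag until the next compaction but avoids merging log files at read time.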

Future Improvements to this Repo

Hudi SQL example(s)
Hudi time travel example(s)
