GitHub - color-company-it/modern-cloud-datalake-v1: First iteration of an entire repository representing a polylithic enterprise DataLake POC in AWS.

💾 Welcome to our datalake repository!

Check out the Documentation
Check out the main contributor DirkSCGM

This proof of concept (POC) showcases a modern, polylithic data lake built on AWS services such as Glue, EMR, Step Functions, and Lambda, together with Docker, Python, PySpark, Apache Hudi, and Terraform. The data lake serves as a single source of truth and makes it easy to integrate and analyze data from various sources.

The repository is structured to reflect the software development life cycle, with sections for extract, transform, and load pipelines; configuration; infrastructure; and testing. We also provide detailed instructions for optimizing JDBC ETL pipelines and troubleshooting common issues.
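As an illustration of the kind of JDBC tuning mentioned above, here is a minimal sketch of building options for a partitioned (parallel) Spark JDBC read. The helper name `jdbc_read_options`, its defaults, and the example URL and table are hypothetical and not taken from this repository; the option keys (`partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`, `fetchsize`) are standard Spark JDBC data source options.

```python
from typing import Dict


def jdbc_read_options(
    url: str,
    table: str,
    partition_column: str,
    lower_bound: int,
    upper_bound: int,
    num_partitions: int = 8,
    fetch_size: int = 10_000,
) -> Dict[str, str]:
    """Build Spark JDBC options for a partitioned read (hypothetical helper).

    Spark splits the range [lower_bound, upper_bound) of partition_column
    into num_partitions strides and issues one query per stride, so the
    extract runs in parallel instead of over a single connection.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower_bound >= upper_bound:
        raise ValueError("lower_bound must be smaller than upper_bound")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
        # Rows fetched per round trip; larger values reduce network chatter.
        "fetchsize": str(fetch_size),
    }


# Example values below are placeholders, not real endpoints.
opts = jdbc_read_options(
    url="jdbc:postgresql://example-host:5432/sales",
    table="public.orders",
    partition_column="order_id",
    lower_bound=1,
    upper_bound=1_000_000,
    num_partitions=16,
)
```

With PySpark available, the resulting dictionary could be passed as `spark.read.format("jdbc").options(**opts).load()`. Suitable bounds and partition counts depend on the source database and cluster size, so treat the numbers here as starting points only.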

📊 Why a DataLake?

Our data lake design improves data accessibility and flexibility, simplifies the integration of data from various sources, and supports storing and analyzing large volumes of data at scale. This helps organizations understand their data better and make more informed decisions.

In addition to the technical aspects of our data lake, this repository is also a place for collaboration, learning, and growth. We believe in the benefits of open source technologies and are always looking to improve our skills as data engineers.

As part of our commitment to learning and growth, we welcome contributions from the community. Whether you're a seasoned data engineer or just getting started, we encourage you to take a look at our codebase and offer suggestions for improvement.

🚀 Collaborate & Learn!

We are always looking for ways to work with others in the field. If you're interested in partnering with us or contributing to this repository, we'd love to hear from you!

Overall, this repository is a place for us to share our knowledge and expertise, as well as learn from others in the community. We're excited to see what we can accomplish together!

Thanks for checking out our repository! 🙌

Folders and files (48 commits):
.github
.idea
codebase
configuration
documentation
infrastructure
scripts
tests
.gitignore
.pre-commit-config.yaml
LICENSE
README.md
SECURITY.md
cicd.py
requirements.txt
setup.py
