Docker images for building Hadoop 3.2, Hive 3.1, HBase 2.3, Prestodb 0.247, Flink 1.11.3 on YARN, etc.
You can use Docker to set up a Hadoop-based big data platform in a few minutes. The docker images include Hadoop 3+, HBase 2+, Hive 3+, Kafka 2+, Prestodb 0.247+, Flink 1.11+, ELK 7.9+, etc. Integration tests between Hadoop and Hive, Hadoop and HBase, Flink on YARN, and Prestodb against Kafka, Elasticsearch, HBase, and Hive have been covered. You can use it in a development environment to test your applications, but it is not recommended for production.
A sample docker-compose.yml is included here; just download it and get started right away.
- Apache Hadoop 3.2
- Prestodb 0.247
- Kafka 2+
- HBase 2.2
- Hive 3.1.2
- ELK 7.9.1
Git clone the repo and run docker-compose:
```bash
docker-compose up -d
```
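Once the stack is up, a quick way to confirm the containers are healthy is shown below; the service names (namenode, etc.) are the ones referenced later in this README, so adjust them to match your docker-compose.yml.

```bash
# list every service and its current state
docker-compose ps

# tail the logs of a single service, e.g. the HDFS namenode
docker-compose logs -f namenode
```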
Install Docker CE on CentOS 7 or above:
1. `yum remove docker docker-common docker-selinux docker-engine`
2. `yum install -y yum-utils device-mapper-persistent-data lvm2`
3. `yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo`
4. `yum install -y docker-ce`
5. `systemctl start docker.service`
6. `systemctl enable docker.service`
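Before moving on, you can sanity-check that the daemon is running:

```bash
# pulls a tiny test image and prints a greeting if docker works
sudo docker run --rm hello-world
```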
Install docker-compose:
1. `sudo curl -L "https://github.com/docker/compose/releases/download/1.23.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose`
2. `sudo chmod +x /usr/local/bin/docker-compose`
3. `docker-compose --version`
- HDFS: http://namenode:9870/
- YARN: http://resourcemanager:8088/
- ES: http://elasticsearch:9200/
- Kibana: http://kibana:5601/
- Presto: http://prestodb:9999/
- HBase: http://hbase-master:16010/
- Flink: http://jobmanager:8081/ (you have to start a yarn-session first)
Note: you have to add the server IP and the service names (as defined in docker-compose.yml) to your local hosts file first. [how to configure hosts](https://www.howtogeek.com/howto/27350/beginner-geek-how-to-edit-your-hosts-file/)
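For example, if your docker host's IP were 192.168.1.100 (a placeholder; substitute your own), the hosts entries would look like this:

```
# /etc/hosts
192.168.1.100 namenode resourcemanager elasticsearch kibana prestodb hbase-master jobmanager
```

For the Flink UI, a yarn-session has to be running first; a minimal sketch, assuming the flink scripts are on the PATH inside the jobmanager container:

```bash
# start a detached yarn session so the Flink UI becomes reachable
docker-compose exec jobmanager yarn-session.sh -d
```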
## Produce some data to test HDFS & Hive
Create a table in hive:
```sql
create external table test(
id int
,name string
,skills array<string>
,add map<string,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
location '/user/test';
```
Create a txt file with the content below and put it under the /data/ directory, such as /data/example.txt:
```
1,spancer,bigdata-ai-devops,changsha:lugu-changsha:chuanggu
2,jack,webdev-microservices,shenzhen:nanshan-usa:la
3,james,android-flutter,beijing:chaoyang
3,james,ios-flutter,beijing:chaoyang
```
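If hive runs inside a container, the file has to be visible from there before loading; a minimal sketch, assuming the hive service is named hive-server in docker-compose.yml (check yours):

```bash
# copy the sample file into the hive container, then open a hive CLI
docker cp /data/example.txt hive-server:/data/example.txt
docker-compose exec hive-server hive
```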
Load the local file into the external hive table we created above:
```sql
load data local inpath '/data/example.txt' overwrite into table test;
```
Check the MR job in the YARN web UI: http://resourcemanager:8088/. After the job is done, query the table in the hive client. Alternatively, we can query the data in the presto client:
```sql
select * from test;
```
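A hedged way to reach that presto client, assuming the image ships a catalog named hive:

```bash
# enter the presto container and point the CLI at the hive catalog
docker-compose exec prestodb bash
presto --server prestodb:8080 --catalog hive --schema default
```

Then run `select * from test;` at the presto prompt.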
## Test HBase
Create a table named person, with two column families, info and tags:
```
create 'person', 'info', 'tags'
```
List tables in HBase:
```
list
```
Describe table person:
```
describe 'person'
```
Insert some data into table person with rowkey 1:
```
put 'person','1','info:name','spancer'
put 'person','1','info:age','36'
put 'person','1','tags:skill','bigdata'
```
Scan table person:
```
scan 'person'
```
Get data by rowkey:
```
get 'person','1'
```
Get data by rowkey and column family or column:
```
get 'person','1','info'
get 'person','1','info:name'
```
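The commands above are typed into an HBase shell; one way to open it, assuming the master service is named hbase-master as in the web UI list above:

```bash
# open an interactive HBase shell inside the master container
docker-compose exec hbase-master hbase shell
```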
## Query Elasticsearch data in Presto
1. Enter the prestodb container: `docker-compose exec prestodb bash`
2. Connect to presto using the presto client: `presto --server prestodb:8080 --catalog elasticsearch`
3. Query some data: `show schemas;`, `show tables;`, `select * from nodes;`
4. You can also check presto's status through the web UI: http://your-host-ip:9999
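To cross-check what presto sees, you can also query Elasticsearch directly over its REST API (the index list depends on what you have loaded):

```bash
# list all indices known to Elasticsearch
curl "http://elasticsearch:9200/_cat/indices?v"
```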
For Java developers, I provide some tests over the platform. You can fork the test project from here; it contains flink jobs integrated with a set of components, such as kafka, elasticsearch, iceberg, etc. Source and sink examples are fully included.
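Once a yarn-session is running (see above), submitting one of those jobs follows the usual flink pattern; the jar path here is a placeholder:

```bash
# submit a packaged flink job to the running yarn session
docker-compose exec jobmanager flink run /path/to/your-job.jar
```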
- Integration flink 1.12
- Integration hive 3.1 (Done)
- Integration hbase 2.2
- Integration iceberg
- Integration alluxio
- Integration ozone