Aims to be a set of utilities to help benchmark the performance of different file formats for a given workload (Hive/Impala). Attributes it measures:
- File/block size on disk
- Compression ratio
- Query performance (pending)
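The first two attributes can be checked by hand after a conversion. Below is a minimal sketch, not part of these utilities. It assumes the `hdfs` CLI is on the PATH and that tables sit under the default Hive warehouse directory (`/user/hive/warehouse/<db>.db/<table>`). The helper names `hdfs_size_bytes` and `compression_ratio`, the `WAREHOUSE` variable, and the definition of ratio as original size divided by converted size are all illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing an input table with its converted copy.
# ASSUMPTION: default Hive warehouse layout; override WAREHOUSE if yours differs.
WAREHOUSE="${WAREHOUSE:-/user/hive/warehouse}"

# Print the total size in bytes of an HDFS directory.
# `hdfs dfs -du -s` prints the size in the first column (newer Hadoop
# releases add a second column with replicated disk usage).
hdfs_size_bytes() {
  hdfs dfs -du -s "$1" | awk '{print $1}'
}

# Print original_size / converted_size to two decimals (ASSUMED definition).
# Exits non-zero instead of dividing by zero.
compression_ratio() {
  awk -v orig="$1" -v conv="$2" 'BEGIN {
    if (conv == 0) { print "error: converted size is 0" > "/dev/stderr"; exit 1 }
    printf "%.2f\n", orig / conv
  }'
}
```

For example, with hypothetical tables `mydb.events` and `mydb.bench_events`, you could run `compression_ratio "$(hdfs_size_bytes "$WAREHOUSE/mydb.db/events")" "$(hdfs_size_bytes "$WAREHOUSE/mydb.db/bench_events")"`. A ratio above 1 means the converted copy is smaller.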
Warning: this is a work in progress. At the moment, it only converts single tables, using scripts:
$ ./generate-conversion-hql.sh <input-db>.<input-table> <output-table-prefix> \
> hive-benchmark.hql
$ hive -f hive-benchmark.hql
- Avro conversion is not working at the moment
- Presentation on file formats: http://www.slideshare.net/Hadoop_Summit/kamat-singh-june27425pmroom210cv2