This project demonstrates the implementation of TPC-DS benchmarking on a MySQL database using Python scripts to automate the process. The primary goal is to evaluate the performance and scalability of the MySQL database under different conditions, as defined by the TPC-DS benchmark.
These instructions will help you set up and run the project on your local machine for development and testing purposes.
- Python 3.9
- MySQL Server
- TPC-DS Toolkit
- A Bash-compatible command-line shell to run the shell scripts
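The prerequisites above can be checked before starting. The following is a minimal sketch, not part of the repository: it assumes that Python 3.9 is treated as a minimum version and that the MySQL client and Bash are available on PATH under the executable names `mysql` and `bash`. Adjust both assumptions to your installation. The TPC-DS Toolkit is not checked because it is built from the tools folder later.

```python
import shutil
import sys

# Minimum Python version, taken from the prerequisites list (assumed to be a lower bound).
REQUIRED_PYTHON = (3, 9)
# Executable names are assumptions; change them to match your installation.
REQUIRED_TOOLS = ("mysql", "bash")


def missing_prerequisites(tools=REQUIRED_TOOLS, min_python=REQUIRED_PYTHON):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < tuple(min_python):
        found = sys.version.split()[0]
        problems.append(
            f"Python {min_python[0]}.{min_python[1]} or newer is required, found {found}"
        )
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("Missing prerequisite:", problem)
```

Running the file directly prints one line per missing prerequisite and prints nothing when all checks pass.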
- Clone the repository:
git clone https://github.com/QasimKhan5x/TPC-DS-MySQL
- Navigate to the project directory:
cd TPC-DS-MySQL
- Install the required Python packages:
pip install -r requirements.txt
- First, generate data for scale factor 1 using the tools/dsdgen program. You need to compile it first. On a Linux distribution, this can be done using
cd tools
make
For Windows, use Visual Studio to open the .sln file and build the project.
- Create folders called ../data/1 and ../data-maintenance/1. The data for SF=1 will go here. These folders should exist outside of the root folder, so if you are in tools, you should create them two levels up from the current directory instead.
- After building dsdgen, execute
./dsdgen
to generate the data. Precise instructions regarding all the arguments are given in Section 3.3 of our report.
- Remove empty values in each file by executing the following script:
chmod +x ./scripts/null_values.sh
./scripts/null_values.sh 1
The first command gives the bash script permission to run as an executable, and the second preprocesses the data in ../data/1.
- Create four folders in ../data-maintenance/1 called 1, 2, 3, and 4. The refresh data for each run of the data maintenance will go there.
- Create the refresh data from tools using ./dsdgen. The command to generate the data is given in Section 3.8 of our report.
- Preprocess each set of generated refresh data using the following:
chmod +x ./scripts/null_values_dm.sh
./scripts/null_values_dm.sh 1 <n> # execute for n=1,2,3,4
- Create the results folder by running
mkdir results
from the root directory. The results will go there.
- Run the benchmark using the following:
python -m scripts.main
Text files containing the results for each test will be generated in the results folder.
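The folder layout described in the steps above (the data folder, the four refresh folders, and the results folder) can also be created in one go. This is a minimal sketch under stated assumptions, not a script shipped with the repository: the function name `create_benchmark_dirs` and its parameters are hypothetical, and `repo_root` is assumed to point at the cloned TPC-DS-MySQL directory.

```python
from pathlib import Path


def create_benchmark_dirs(repo_root, scale_factor=1, refresh_sets=4):
    """Create the data, data-maintenance, and results folders for one scale factor.

    The data folders are placed next to the repository (one level above
    repo_root); the results folder is placed inside it.
    """
    # resolve() makes a relative path such as "." absolute so .parent is correct.
    root = Path(repo_root).resolve()
    sf = str(scale_factor)

    paths = [root.parent / "data" / sf, root / "results"]
    # One folder per refresh run: ../data-maintenance/<sf>/1 ... <refresh_sets>
    paths += [
        root.parent / "data-maintenance" / sf / str(n)
        for n in range(1, refresh_sets + 1)
    ]

    for path in paths:
        # parents=True also creates ../data-maintenance/<sf> itself.
        path.mkdir(parents=True, exist_ok=True)
    return paths
```

Calling `create_benchmark_dirs(".")` from the root directory is safe to repeat, since folders that already exist are left untouched.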
Note: you don't need to create the queries as we included them in our repository. However, for other scale factors, you will have to consult our report to generate queries for the power test and throughput test.
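The refresh-data preprocessing step has to be executed once for each n in 1, 2, 3, 4, which is easy to script. The sketch below is an illustration rather than part of the repository: the helper names are hypothetical, and it assumes that the first argument of null_values_dm.sh is the scale factor and the second is the refresh set number, as the usage shown in the steps suggests.

```python
import subprocess

# Script path as given in the steps above, relative to the root directory.
SCRIPT = "./scripts/null_values_dm.sh"


def refresh_commands(scale_factor=1, refresh_sets=4):
    """Build one command per refresh set: <script> <scale factor> <n>."""
    return [
        [SCRIPT, str(scale_factor), str(n)]
        for n in range(1, refresh_sets + 1)
    ]


def preprocess_refresh_data(scale_factor=1, refresh_sets=4):
    """Run the commands in order from the root directory.

    check=True raises CalledProcessError at the first failing script,
    so later refresh sets are not processed after an error.
    """
    for command in refresh_commands(scale_factor, refresh_sets):
        subprocess.run(command, check=True)
```

The script still needs the executable permission granted by the chmod command before `preprocess_refresh_data()` is called.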