This repository contains Python scripts that use the Faker library to generate a synthetic HR data set. There are three main data categories: 1) personal information of each employee, 2) annual evaluation scores of each employee for up to 5 years, 3) psychometric test results for each employee based on the Big Five personality traits (https://en…
TheoEfthymiadis/HR-psychometrics-synthetic-data-set
Required libraries to run the Python scripts:

| Package | Version | Latest Version |
|---|---|---|
| Faker | 5.0.2 | 5.0.2 |
| Pillow | 8.0.1 | 8.0.1 |
| cycler | 0.10.0 | 0.10.0 |
| et-xmlfile | 1.0.1 | |
| jdcal | 1.4.1 | 1.4.1 |
| kiwisolver | 1.3.1 | 1.3.1 |
| numpy | 1.19.3 | 1.19.4 |
| openpyxl | 3.0.5 | 3.0.5 |
| pandas | 1.1.5 | 1.2.0 |
| pip | 19.0.3 | 20.3.3 |
| pyparsing | 2.4.7 | 2.4.7 |
| python-dateutil | 2.8.1 | 2.8.1 |
| pytz | 2020.4 | 2020.5 |
| radar | 0.3 | 0.3 |
| setuptools | 40.8.0 | 51.1.0.post20201221 |
| six | 1.15.0 | 1.15.0 |
| text-unidecode | 1.3 | 1.3 |
| xlrd | 2.0.1 | 2.0.1 |
| xlwt | 1.3.0 | 1.3.0 |

# Running the implementation

The implementation consists of 3 Python scripts, developed in a virtual environment using PyCharm. To reproduce the results, create a similar virtual environment and install all the Python libraries listed in the ‘README.txt’ file. Pay extra attention to the version of the ‘numpy’ library: the latest version is prone to a number of bugs and should not be preferred, so install the version listed above (1.19.3) instead.

After setting up the virtual environment and installing the libraries, run the scripts from the command line in the following order:

• Run the ‘Personal_Profile.py’ script. It produces an Excel file in the current folder containing the personal information and psychometric profile of all employees. The file is named ‘folder_path/employees.xlsx’.

• Run the ‘Departments.py’ script. It reads the Excel file created in the previous step, assigns the employees to random departments, estimates their evaluation metrics and appends the information back to the ‘employees.xlsx’ file. This version of the file is also provided in the deliverable.

• Run the ‘Noise_Insertion.py’ script. It reads the ‘employees.xlsx’ file, inserts noise into the data and writes an output file named ‘noisy_employees.xlsx’. This file is also provided in the deliverable.
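The execution order described above could be scripted roughly as follows. This is a minimal sketch rather than part of the repository: the helper names `build_commands` and `run_pipeline` are hypothetical, and it assumes the three scripts sit together in one folder and take no command-line arguments.

```python
import subprocess
import sys
from pathlib import Path

# Script names taken from the steps above; the order matters because each
# script reads the file written by the previous one.
SCRIPTS = ["Personal_Profile.py", "Departments.py", "Noise_Insertion.py"]


def build_commands(scripts=SCRIPTS, python=sys.executable):
    """Return one command per script, using the current interpreter."""
    return [[python, name] for name in scripts]


def run_pipeline(folder="."):
    """Run the three scripts in order, stopping at the first failure."""
    for command in build_commands():
        script = Path(folder) / command[-1]
        if not script.exists():
            raise FileNotFoundError(f"Script not found: {script}")
        # cwd=folder so that files written to the current folder land
        # next to the scripts; check=True aborts on a non-zero exit code.
        subprocess.run(command, cwd=folder, check=True)


# Show the commands that would be executed, without running them.
for command in build_commands():
    print(" ".join(command))
```

Calling `run_pipeline()` from inside the activated virtual environment would then execute the three steps in sequence.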
Seed functions were used throughout to control the random data generation of the various Python libraries, so the results should be completely reproducible.
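As an illustration of how seeding makes random generation repeatable, the sketch below seeds Python's built-in `random` module and, when they are installed, NumPy and Faker as well. The helper names `seed_everything` and `sample_scores` and the seed value 42 are assumptions made for this example; the repository's scripts may seed their libraries differently.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed every random source that is available in this environment."""
    random.seed(seed)  # Python's built-in generator
    try:
        import numpy as np
        np.random.seed(seed)  # NumPy's legacy global generator
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        from faker import Faker
        Faker.seed(seed)  # random generator shared by Faker instances
    except ImportError:
        pass  # Faker not installed; nothing to seed


def sample_scores(n: int) -> list:
    """Draw n hypothetical evaluation scores between 1 and 5."""
    return [random.randint(1, 5) for _ in range(n)]


seed_everything(42)
first_run = sample_scores(10)
seed_everything(42)
second_run = sample_scores(10)
print(first_run == second_run)  # True: the same seed yields the same draws
```

Seeding once at the top of each script, before any data is generated, is what keeps repeated runs identical.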