Six Sigma rule generator is a PySpark tool that generates Six Sigma rules for DataFrame columns.
Background: https://www.isixsigma.com/tools-templates/control-charts/a-guide-to-control-charts/
The rule generator expects the target DataFrame to have a timestamp column.
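For readers unfamiliar with control charts, the idea behind such a rule can be sketched in plain Python. This is only an illustration, not this tool's implementation: it assumes a "six sigma rule" means control limits at the mean plus or minus three standard deviations, and the function names and sample data are hypothetical.

```python
from statistics import mean, pstdev

def control_limits(values, n_sigma=3.0):
    """Return (lower, center, upper) control limits for a list of numbers."""
    center = mean(values)
    spread = pstdev(values)  # population standard deviation of the baseline
    return center - n_sigma * spread, center, center + n_sigma * spread

def out_of_control(rows, baseline, n_sigma=3.0):
    """Return the (timestamp, value) rows whose value falls outside the limits."""
    lower, _, upper = control_limits(baseline, n_sigma)
    return [(ts, v) for ts, v in rows if v < lower or v > upper]

# Hypothetical data: baseline measurements and newer (timestamp, value) rows.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0]
new_rows = [("2024-01-07", 10.1), ("2024-01-08", 12.5)]

flagged = out_of_control(new_rows, baseline)
print(flagged)
```

The timestamp matters because control charts are read in time order: the limits are derived from earlier observations and then applied to later ones.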
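To install from the cloned repository:
pip install -e .
To build an .egg file (for example, to install on a Databricks cluster):
python setup.py bdist_egg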
- Navigate to the Clusters / [your cluster] / Libraries page:
  - Click the Install New button
  - Select Python Egg from the Library Type tab
  - Drag & drop the generated .egg file from the cloned repository's dist directory to the window
  - Click the Install button
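from wilson import SixSigma
# `spark` is the active SparkSession (predefined in Databricks notebooks).
# header=True keeps the column names; inferSchema=True parses numeric and timestamp types.
df = spark.read.csv('example.csv', header=True, inferSchema=True)
sixsigma = SixSigma(timecol='timestamp')
df = sixsigma.apply(df, ['target_column_1'])
df.show()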