-
Notifications
You must be signed in to change notification settings - Fork 0
/
Example Dataset Quality Report.txt
59 lines (48 loc) · 2.08 KB
/
Example Dataset Quality Report.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
Dataset Quality Report
Dataset Summary
- Dataset Name: Sales Data Q1 2024
- Number of Records: 10,000
- Number of Features: 12
- Date of Analysis: May 17, 2024
Data Integrity
- Unique Identifiers: Each record has a unique identifier (OrderID). No duplicates found.
- Data Types Consistency: All fields conform to their expected data types.
- OrderID: Integer
- CustomerID: Integer
- OrderDate: Date
- ProductID: Integer
- Quantity: Integer
- Price: Float
- Total: Float
- Region: String
Completeness
- Missing Values:
- CustomerID: 0 missing values
- OrderDate: 2 missing values
- ProductID: 0 missing values
- Quantity: 0 missing values
- Price: 0 missing values
- Total: 0 missing values
- Region: 10 missing values
Anomalies
- Outliers:
- Quantity: 5 outliers detected (values > 100)
- Price: 3 outliers detected (values > $1000)
- Inconsistent Data:
- Total: 50 records where Total ≠ Quantity * Price
Compliance Checks
- Date Range Validity: All OrderDate values are within the expected range (Jan 1, 2024 - Mar 31, 2024), except for 2 records with missing dates.
- Region Codes: 10 records missing Region values. All other values conform to expected region codes.
Recommendations
1. Address Missing Values:
- Investigate and fill in the missing OrderDate and Region values.
- Consider imputing missing values where appropriate, using methods such as mean imputation for numerical data or mode imputation for categorical data.
2. Handle Outliers:
- Review records with outliers in Quantity and Price to determine if they are errors or valid high values.
- Implement outlier treatment strategies such as capping or transformation if necessary.
3. Fix Inconsistent Data:
- Correct the 50 records where Total does not equal Quantity * Price. This may involve recalculating the Total or correcting the input errors in Quantity or Price.
4. Enhance Data Validation:
- Implement stricter data validation rules at the point of data entry to prevent missing and inconsistent values.
- Regularly audit and clean the dataset to maintain high-quality data.
End of Report