Switch to Position-Based Percentile Calculation for Regulatory Compliance | timeAverage #396
base: master
For regulatory percentile calculations, generating intermediate values is not permitted. With this change, the percentile is calculated by position.
change from ceiling to floor
update how the number is rounded
Thanks for this suggestion and apologies for the delay in responding (I was on holiday). This is an issue I have not looked at closely but can see how the method used will matter. Do you have a source / link for the preferred method to use, as I'm not familiar with that (at least in the UK)? All the best
Hello David, I hope you are doing well.

The above comes from Chilean regulations (based on USEPA), which detail the procedure for calculating percentiles. I am attaching the link (Chilean Regulation). It is in Spanish, so here is a translation:

"To calculate the percentile, all values of the PM10 respirable particulate concentrations will be listed in ascending order: X1 ≤ X2 ≤ X3 ≤ ... ≤ Xk ≤ ... ≤ Xn-1 ≤ Xn. The k-th percentile will be the value of the element of rank 'k,' where 'k' is calculated using the following formula: k = q * n, where 'q' = 0.98, and 'n' corresponds to the total number of data points in the ordered list. The value of 'k' will be rounded to the nearest integer."

Given this, I searched for the direct source at the EPA and found the following reference (EPA Regulation). Section 5 contains an update to the regulation: the percentile is still calculated by rank, but the position is based on the number of valid records (I could implement this if you would like).

This form of calculation is also quite common elsewhere: it is the methodology used by air quality numerical simulation software for calculating percentiles. Here is a reference (CALPUFF View Percentiles).

Additionally, I found the following text in section 2.5.2.1 of the WHO air quality guidelines (https://iris.who.int/bitstream/handle/10665/345329/9789240034228-eng.pdf): "In keeping with established practice, as a starting point, short-term AQG levels were considered by the GDG as the 99th percentiles of daily concentrations empirically observed in distributions with a mean equal to the long-term AQG level," where it is explicitly stated that the data must be empirically observed.
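To make the quoted procedure concrete, here is a minimal Python sketch of a rank-based percentile: sort ascending, compute k = q * n, round k to the nearest integer, and return the element of rank k. It is an illustration only, not the code in this pull request. The function name `rank_percentile`, the sample data, the half-up handling of exact .5 ties, and the clamping of k to the range 1..n are all assumptions; the quoted text does not specify tie or edge-case behaviour.

```python
import math


def rank_percentile(values, q=0.98):
    """Return the observed value at rank k = round(q * n) (1-based).

    Hypothetical helper: tie handling (half-up) and clamping of k
    to [1, n] are assumptions, not taken from the regulation text.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)            # X1 <= X2 <= ... <= Xn
    n = len(ordered)
    k = math.floor(q * n + 0.5)         # nearest integer, .5 rounds up (assumed)
    k = min(max(k, 1), n)               # keep the rank inside the list (assumed)
    return ordered[k - 1]               # rank k is index k - 1


# Made-up example data: 50 daily concentrations 1.0, 2.0, ..., 50.0
daily_pm10 = [float(v) for v in range(1, 51)]
p98 = rank_percentile(daily_pm10, q=0.98)   # k = round(0.98 * 50) = 49
print(p98)
```

Because the result is always taken from the sorted list, it is guaranteed to be one of the observed values.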
Description:
This pull request updates the percentile calculation method to be compliant with regulatory requirements. Previously, percentiles were calculated using interpolation, which could generate intermediate values. This method is not permitted for regulatory calculations.
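The difference between the two approaches can be sketched in a few lines of Python. The interpolating function below uses linear interpolation between neighbouring order statistics (the scheme that is the default in R's `quantile` and NumPy); whether that matches the package's previous behaviour exactly is an assumption. The position-based function uses nearest-integer rounding of q * n with half-up ties, which is also an assumption. Function names and data are hypothetical.

```python
import math


def interpolated_percentile(values, q):
    """Linear interpolation between neighbouring order statistics."""
    ordered = sorted(values)
    n = len(ordered)
    h = (n - 1) * q                     # fractional 0-based position
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)
    return ordered[lo] + (h - lo) * (ordered[hi] - ordered[lo])


def position_percentile(values, q):
    """Observed value at rank k = round(q * n), 1-based (half-up assumed)."""
    ordered = sorted(values)
    n = len(ordered)
    k = min(max(math.floor(q * n + 0.5), 1), n)
    return ordered[k - 1]


sample = [10.0, 20.0, 30.0, 40.0]       # made-up observations

by_interpolation = interpolated_percentile(sample, 0.5)
by_position = position_percentile(sample, 0.5)

# The interpolated result lies between two observations;
# the position-based result is itself an observation.
print(by_interpolation, by_interpolation in sample)
print(by_position, by_position in sample)
```

With this sample, the interpolated median falls between 20.0 and 30.0 and was never measured, while the position-based result is an element of the input.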
With this update:
This change improves accuracy and ensures that the calculations align with the required standards for normative use.
Key Changes:
Please review and let me know if any additional adjustments are needed.
I made this pull request against the master branch because I didn't see any other appropriate branch. (Tested and working in my environments.)