quantiles, firstQuartile, thirdQuartile
closes #1
roberto-butti committed Jan 30, 2022
1 parent 08f0007 commit b0c1b20
Showing 9 changed files with 170 additions and 55 deletions.
13 changes: 9 additions & 4 deletions CHANGELOG.md
@@ -1,5 +1,10 @@
# Changelog

## 0.1.4 - 2022-01-30
- quantiles()
- firstQuartile()
- thirdQuartile()
-
## 0.1.3 - 2022-01-29
- geometricMean(): geometric mean
- harmonicMean(): harmonic mean and weighted harmonic mean
@@ -27,8 +32,8 @@ Initial release with:
- getMean()
- count()
- median()
- lowerPercentile()
- higherPercentile()
- firstQuartile()
- thirdQuartile()
- mode()
- frequencies(): a frequency is the number of times a value of the data occurs;
- relativeFrequencies(): a relative frequency is the ratio (fraction or proportion) of the number of times a value of the data occurs in the set of all outcomes to the total number of outcomes;
@@ -42,8 +47,8 @@ Initial release with:
- getMean()
- count()
- median()
- lowerPercentile()
- higherPercentile()
- firstQuartile()
- thirdQuartile()
- mode()
- frequencies(): a frequency is the number of times a value of the data occurs;
- relativeFrequencies(): a relative frequency is the ratio (fraction or proportion) of the number of times a value of the data occurs in the set of all outcomes to the total number of outcomes;
55 changes: 38 additions & 17 deletions README.md
@@ -11,17 +11,7 @@ PHP package that provides functions for calculating mathematical statistics of n

In this package I'm collecting some useful statistic functions.
Once upon a time, I was playing with FIT files. A FIT file collects a lot of information about your sports activities: for example, it tracks your Heart Rate, Speed, Cadence, Power, etc.
I needed to apply some statistic functions to understand better the numbers and the sport activity performance. I collected some functions like:
- mean: the average of the data set (and geometric mean);
- mode: the most common number in data set (and multi mode);
- median: the middle of the set of values (median low and median high);
- range: the difference between the largest and smallest values
- first quartile ( or lowest percentile);
- third quartile (or highest percentile);
- frequency table (cumulative, relative);
- standard deviation (population and sample);
- variance (population and sample);
- etc...
I needed to apply some statistical functions to better understand the numbers and the performance of the sports activity. I collected functions such as mean, mode, median, range, quantiles, first quartile (or 25th percentile), third quartile (or 75th percentile), frequency table (cumulative, relative), standard deviation (population and sample), variance (population and sample), etc.

> This package is inspired by the [Python statistics module](https://docs.python.org/3/library/statistics.html)
@@ -44,8 +34,9 @@ Stat class has methods to calculate an average or typical value from a populatio
- medianHigh(): high median of data;
- mode(): single mode (most common value) of discrete or nominal data;
- multimode(): list of modes (most common values) of discrete or nominal data;
- higherPercentile(): 3rd quartile, is the value at which 75 percent of the data is below it;
- lowerPercentile(): first quartile, is the value at which 25 percent of the data is below it;
- quantiles(): cut points dividing the range of a probability distribution into continuous intervals with equal probabilities;
- thirdQuartile(): third quartile, the value below which 75 percent of the data lies;
- firstQuartile(): first quartile, the value below which 25 percent of the data lies;
- pstdev(): Population standard deviation
- stdev(): Sample standard deviation
- pvariance(): variance for a population
@@ -133,6 +124,36 @@ $median = Stat::medianHigh([1, 3, 5, 7]);
// 5
```

#### Stat::quantiles( array $data, $n=4, $round=null )
Divide data into n continuous intervals with equal probability and return a list of the n - 1 cut points that separate the intervals.
Set n to 4 for quartiles (the default), to 10 for deciles, or to 100 for percentiles, which gives the 99 cut points that separate the data into 100 equal-sized groups. If round is set, each cut point is rounded to that many decimal places. Returns null if data has fewer than two values or n is less than 1.


```php
use HiFolks\Statistics\Stat;
$quantiles = Stat::quantiles([98, 90, 70,18,92,92,55,83,45,95,88]);
// [ 55.0, 88.0, 92.0 ]
$quantiles = Stat::quantiles([105, 129, 87, 86, 111, 111, 89, 81, 108, 92, 110,100, 75, 105, 103, 109, 76, 119, 99, 91, 103, 129,106, 101, 84, 111, 74, 87, 86, 103, 103, 106, 86,111, 75, 87, 102, 121, 111, 88, 89, 101, 106, 95,103, 107, 101, 81, 109, 104], 10);
// [81.0, 86.2, 89.0, 99.4, 102.5, 103.6, 106.0, 109.8, 111.0]
```
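
The loop in `Stat::quantiles()` appears to follow the "exclusive" method that Python's `statistics.quantiles()` uses by default. Writing $x_{(1)} \le \dots \le x_{(N)}$ for the sorted data, the $i$-th of the $n - 1$ cut points can be sketched as follows, where $j$ corresponds to `$j` in the code and $p_i - j$ to `$delta / $n`:

$$
p_i = \frac{i\,(N + 1)}{n}, \qquad
j = \min\bigl(\max(\lfloor p_i \rfloor,\, 1),\; N - 1\bigr), \qquad
q_i = x_{(j)} + (p_i - j)\,\bigl(x_{(j+1)} - x_{(j)}\bigr)
$$

For example, `[3, 4, 3, 1]` sorts to $(1, 3, 3, 4)$, so for Q1 we get $p_1 = 5/4 = 1.25$, $j = 1$ and $q_1 = 1 + 0.25 \cdot (3 - 1) = 1.5$, which is the value expected in `tests/FrequenciesTest.php`.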
#### Stat::firstQuartile( array $data, $round=null )
The lower quartile, or first quartile (Q1), is the value under which 25% of data points are found when they are arranged in increasing order.

```php
use HiFolks\Statistics\Stat;
$percentile = Stat::firstQuartile([98, 90, 70,18,92,92,55,83,45,95,88]);
// 55.0
```

#### Stat::thirdQuartile( array $data, $round=null )
The upper quartile, or third quartile (Q3), is the value under which 75% of data points are found when arranged in increasing order.

```php
use HiFolks\Statistics\Stat;
$percentile = Stat::thirdQuartile([98, 90, 70,18,92,92,55,83,45,95,88]);
// 92.0
```
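
As a cross-check of the examples above, here is a standalone sketch that re-implements the same interpolation loop as `Stat::quantiles()` in plain PHP, so it runs without the package. The function name `exclusiveQuantiles` is hypothetical and not part of the package API; it uses `intdiv()` and a `for` loop in place of `floor()` and `range()`. The interquartile range is derived from Q1 and Q3, which is what `Statistics::interquartileRange()` computes.

```php
<?php
// Hypothetical helper (not part of HiFolks\Statistics): a sketch of the
// "exclusive" quantile method, mirroring the loop in Stat::quantiles().
function exclusiveQuantiles(array $data, int $n = 4): ?array
{
    $count = count($data);
    if ($count < 2 || $n < 1) {
        return null;
    }
    sort($data);
    $m = $count + 1;
    $result = [];
    // for ($i = 1; $i < $n; ...) yields no iterations when $n === 1
    for ($i = 1; $i < $n; $i++) {
        // 1-based position of the lower neighbour, clamped to [1, count - 1]
        $j = intdiv($i * $m, $n);
        $j = max(1, min($j, $count - 1));
        // $delta / $n is the fractional distance between $data[$j - 1] and $data[$j]
        $delta = $i * $m - $j * $n;
        $result[] = ($data[$j - 1] * ($n - $delta) + $data[$j] * $delta) / $n;
    }
    return $result;
}

$q = exclusiveQuantiles([98, 90, 70, 18, 92, 92, 55, 83, 45, 95, 88]);
$q1 = $q[0];
$q3 = $q[2];
echo "Q1: $q1, Q3: $q3, IQR: " . ($q3 - $q1) . PHP_EOL;
// Q1: 55, Q3: 92, IQR: 37
```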

#### Stat::pstdev( array $data )
Return the **Population** Standard Deviation, a measure of the amount of variation or dispersion of a set of values.
A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.
@@ -229,10 +250,10 @@ echo "Count : " . $stat->count() . PHP_EOL;
// Count : 6
echo "Median : " . $stat->median() . PHP_EOL;
// Median : 4.5
echo "Lower Percentile : " . $stat->lowerPercentile() . PHP_EOL;
// Lower Percentile : 2.5
echo "Higher Percentile : " . $stat->higherPercentile() . PHP_EOL;
// Higher Percentile : 5
echo "First Quartile : " . $stat->firstQuartile() . PHP_EOL;
// First Quartile : 2.75
echo "Third Quartile : " . $stat->thirdQuartile() . PHP_EOL;
// Third Quartile : 5.5
echo "Mode : " . $stat->mode() . PHP_EOL;
// Mode : 5
```
10 changes: 10 additions & 0 deletions examples/stat_methods.php
@@ -1,8 +1,10 @@
<?php

require(__DIR__ . "/../vendor/autoload.php");


use HiFolks\Statistics\Stat;

$mean = Stat::mean([1, 2, 3, 4, 4]);
// 2.8
$mean = Stat::mean([-1.0, 2.5, 3.25, 5.75]);
@@ -19,6 +21,14 @@
// 3
$median = Stat::medianHigh([1, 3, 5, 7]);
// 5
$percentile = Stat::firstQuartile([98, 90, 70,18,92,92,55,83,45,95,88]);
// 55.0
$percentile = Stat::thirdQuartile([98, 90, 70,18,92,92,55,83,45,95,88]);
// 92.0
$quantiles = Stat::quantiles([98, 90, 70,18,92,92,55,83,45,95,88]);
// [ 55.0, 88.0, 92.0 ]
$quantiles = Stat::quantiles([105, 129, 87, 86, 111, 111, 89, 81, 108, 92, 110,100, 75, 105, 103, 109, 76, 119, 99, 91, 103, 129,106, 101, 84, 111, 74, 87, 86, 103, 103, 106, 86,111, 75, 87, 102, 121, 111, 88, 89, 101, 106, 95,103, 107, 101, 81, 109, 104], 10);
// [81.0, 86.2, 89.0, 99.4, 102.5, 103.6, 106.0, 109.8, 111.0]
$stdev = Stat::pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75], 4);
// 0.9869
$stdev = Stat::stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75], 4);
9 changes: 9 additions & 0 deletions src/Math.php
@@ -13,4 +13,13 @@ public static function round(float $value, ?int $round): float
{
return is_null($round) ? $value : round($value, $round);
}

/**
* @param int $number
* @return bool
*/
public static function isOdd(int $number): bool
{
return ($number % 2) !== 0;
}
}
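
A note on the oddness check: in PHP the result of `%` takes the sign of the dividend, so `-3 % 2` is `-1`, and a test written as `% 2 == 1` reports negative odd integers as even. The sketch below compares the two checks side by side; both function names are hypothetical and exist only for illustration.

```php
<?php
// In PHP, % keeps the sign of the left operand: -3 % 2 is -1, not 1.
var_dump(-3 % 2); // int(-1)

// Hypothetical: comparing against 1 misclassifies negative odd numbers.
function isOddCompareToOne(int $number): bool
{
    return ($number % 2) == 1;
}

// Hypothetical: comparing against 0 works for both signs.
function isOddCompareToZero(int $number): bool
{
    return ($number % 2) !== 0;
}

var_dump(isOddCompareToOne(-3));  // bool(false)
var_dump(isOddCompareToZero(-3)); // bool(true)
```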
65 changes: 49 additions & 16 deletions src/Stat.php
@@ -142,38 +142,71 @@ public static function multimode(array $data): mixed

/**
* @param mixed[] $data
* @return mixed
* @param int $n
* @param int|null $round
* @return mixed[]|null
*/
public static function lowerPercentile(array $data): mixed
public static function quantiles(array $data, int $n = 4, ?int $round = null): ?array
{
$count = Stat::count($data);
if (! $count) {
if ($count < 2) {
return null;
}
$index = floor($count / 4); // cache the index
if ($count & 1) { // count is odd
return $data[$index];
} else { // count is even
return ($data[$index - 1] + $data[$index]) / 2;
if ($n < 1) {
return null;
}
sort($data);
$m = $count + 1;
$result = [];
for ($i = 1; $i < $n; $i++) {
$j = intdiv($i * $m, $n);
if ($j < 1) {
$j = 1;
} elseif ($j > $count - 1) {
$j = $count - 1;
}
$delta = $i * $m - $j * $n;
$interpolated = ($data[$j - 1] * ($n - $delta) + $data[$j] * $delta) / $n;
$result[] = Math::round($interpolated, $round);
}

return $result;
}

/**
* Return the first quartile (Q1), the value below which 25% of the data lies.
* The result may be interpolated between two data points (see quantiles()).
* @param mixed[] $data
* @param int|null $round
* @return mixed
*/
public static function higherPercentile(array $data): mixed
public static function firstQuartile(array $data, ?int $round = null): mixed
{
$count = Stat::count($data);
if (! $count) {
$quartiles = self::quantiles($data, 4, $round);
if (is_null($quartiles)) {
return null;
}
$index = floor(($count * 3) / 4); // cache the index
if ($count & 1) { // count is odd
return $data[$index];
} else { // count is even
return ($data[$index - 1] + $data[$index]) / 2;
if (count($quartiles) !== 3) {
return null;
}

return $quartiles[0];
}

/**
* @param mixed[] $data
* @param int|null $round
* @return mixed
*/
public static function thirdQuartile(array $data, ?int $round = null): mixed
{
$quartiles = self::quantiles($data, 4, $round);
if (is_null($quartiles)) {
return null;
}
if (count($quartiles) !== 3) {
return null;
}

return $quartiles[2];
}

/**
10 changes: 5 additions & 5 deletions src/Statistics.php
@@ -138,22 +138,22 @@ public function median(): mixed
/**
* @return mixed
*/
public function lowerPercentile(): mixed
public function firstQuartile(): mixed
{
return Stat::lowerPercentile($this->values);
return Stat::firstQuartile($this->values);
}

public function higherPercentile(): mixed
public function thirdQuartile(): mixed
{
return Stat::higherPercentile($this->values);
return Stat::thirdQuartile($this->values);
}

/**
* @return mixed
*/
public function interquartileRange()
{
return $this->higherPercentile() - $this->lowerPercentile();
return $this->thirdQuartile() - $this->firstQuartile();
}

/**
20 changes: 10 additions & 10 deletions tests/FrequenciesTest.php
@@ -43,42 +43,42 @@
expect($s->originalArray())->toHaveCount(4);
});

it('can calculate lowerPercentile', function () {
it('can calculate firstQuartile', function () {
$s = Statistics::make(
[3,4,3,1]
);
$a = $s->lowerPercentile();
expect($a)->toEqual(2);
$a = $s->firstQuartile();
expect($a)->toEqual(1.5);

$s = Statistics::make(
[3,4,3]
);
$a = $s->lowerPercentile();
$a = $s->firstQuartile();
expect($a)->toEqual(3);

$s = Statistics::make(
[]
);
$a = $s->lowerPercentile();
$a = $s->firstQuartile();
expect($a)->toBeNull();
});
it('can calculate higherPercentile', function () {
it('can calculate thirdQuartile', function () {
$s = Statistics::make(
[3,4,3,1]
);
$a = $s->higherPercentile();
expect($a)->toEqual(3.5);
$a = $s->thirdQuartile();
expect($a)->toEqual(3.75);

$s = Statistics::make(
[3,4,3]
);
$a = $s->higherPercentile();
$a = $s->thirdQuartile();
expect($a)->toEqual(4);

$s = Statistics::make(
[]
);
$a = $s->higherPercentile();
$a = $s->thirdQuartile();

expect($a)->toBeNull();
});
22 changes: 22 additions & 0 deletions tests/StatTest.php
@@ -167,3 +167,25 @@
Stat::harmonicMean([])
)->toBeNull();
});

it('calculates quantiles (static)', function () {
$q = Stat::quantiles([98, 90, 70,18,92,92,55,83,45,95,88,76]);
expect($q[0])->toEqual(58.75);
expect($q[1])->toEqual(85.5);
expect($q[2])->toEqual(92);
$q = Stat::quantiles([98, 90, 70,18,92,92,55,83,45,95,88]);
expect($q[0])->toEqual(55);
expect($q[1])->toEqual(88);
expect($q[2])->toEqual(92);
$q = Stat::quantiles([1,2]);
expect($q[0])->toEqual(0.75);
expect($q[1])->toEqual(1.5);
expect($q[2])->toEqual(2.25);
$q = Stat::quantiles([1,2,4]);
expect($q[0])->toEqual(1);
expect($q[1])->toEqual(2);
expect($q[2])->toEqual(4);
expect(
Stat::quantiles([1])
)->toBeNull();
});
21 changes: 18 additions & 3 deletions tests/StatisticTest.php
@@ -8,11 +8,21 @@
);
expect($s->count())->toEqual(12);
expect($s->median())->toEqual(85.5);
expect($s->lowerPercentile())->toEqual(62.5);
expect($s->higherPercentile())->toEqual(92);
expect($s->interquartileRange())->toEqual(29.5);
expect($s->firstQuartile())->toEqual(58.75);
expect($s->thirdQuartile())->toEqual(92);
expect($s->interquartileRange())->toEqual(33.25);

expect($s->originalArray())->toHaveCount(12);

$s = Statistics::make(
[98, 90, 70,18,92,92,55,83,45,95,88]
);
expect($s->count())->toEqual(11);
expect($s->median())->toEqual(88);
expect($s->firstQuartile())->toEqual(55);
expect($s->thirdQuartile())->toEqual(92);
expect($s->interquartileRange())->toEqual(37);
expect($s->originalArray())->toHaveCount(11);
});

it('can calculate statistics again', function () {
@@ -27,6 +37,8 @@
expect($s->min())->toEqual(2);
expect($s->max())->toEqual(7);
expect($s->range())->toEqual(5);
expect($s->firstQuartile())->toEqual(2.75);
expect($s->thirdQuartile())->toEqual(5.5);
});

it('can calculate statistics again and again', function () {
@@ -41,6 +53,9 @@
expect($s->min())->toEqual(13);
expect($s->max())->toEqual(21);
expect($s->range())->toEqual(8);
expect($s->firstQuartile())->toEqual(13);
expect($s->thirdQuartile())->toEqual(17);


$s = Statistics::make(
[1, 2, 4, 7]