Add logscale option, make no string normalization the default (#34)
* Plotting of estimates on a log-scale #28
* Better backend for Confidence Intervals #29
* Maintain label character formatting (making no string normalisation the default) #27
* Update documentation to reflect new changes and fixing errors #30
LSYS authored Oct 30, 2022
1 parent 3ae439b commit 3d4f8e9
Showing 18 changed files with 195 additions and 275 deletions.
47 changes: 29 additions & 18 deletions README.md
@@ -21,12 +21,13 @@ Additional options allow easy addition of columns in the `dataframe` as annotati

| | |
| --- | --- |
| Release | [![PyPI](https://img.shields.io/pypi/v/forestplot?color=blue&label=PyPI&logo=pypi&logoColor=white)](https://pypi.org/project/forestplot/) [![GitHub release (latest by date)](https://img.shields.io/github/v/release/lsys/forestplot?color=blue&label=Latest%20release)](https://github.com/LSYS/forestplot/releases) |
| Release | [![PyPI](https://img.shields.io/pypi/v/forestplot?color=blue&label=PyPI&logo=pypi&logoColor=white)](https://pypi.org/project/forestplot/) ![Conda (channel only)](https://img.shields.io/conda/vn/conda-forge/forestplot?logo=conda-forge&logoColor=white) [![GitHub release (latest by date)](https://img.shields.io/github/v/release/lsys/forestplot?color=blue&label=Latest%20release)](https://github.com/LSYS/forestplot/releases) |
| Status | [![CI](https://github.com/LSYS/forestplot/actions/workflows/CI.yml/badge.svg)](https://github.com/LSYS/forestplot/actions/workflows/CI.yml) [![Notebooks](https://github.com/LSYS/forestplot/actions/workflows/nb.yml/badge.svg)](https://github.com/LSYS/forestplot/actions/workflows/nb.yml) |
| Coverage | [![Codecov](https://img.shields.io/codecov/c/github/lsys/forestplot?logo=codecov&logoColor=white&label=codecov)](https://app.codecov.io/gh/LSYS/forestplot) |
| Python | [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/forestplot?label=Python%203.6%2B&logo=python&logoColor=white)](https://pypi.org/project/forestplot/) |
| Docs | [![Read the Docs (version)](https://img.shields.io/readthedocs/forestplot/stable?label=docs&logo=readthedocs&logoColor=white)](https://forestplot.readthedocs.io/en/latest/?badge=latest) [![DocLinks](https://github.com/LSYS/forestplot/actions/workflows/links.yml/badge.svg)](https://github.com/LSYS/forestplot/actions/workflows/links.yml)|
| Meta | ![GitHub](https://img.shields.io/github/license/lsys/forestplot?color=purple&label=License) [![Language grade: Python](https://img.shields.io/lgtm/grade/python/g/LSYS/forestplot.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/LSYS/forestplot/context:python) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![types - Mypy](https://img.shields.io/badge/types-Mypy-blue.svg)](https://github.com/python/mypy) [![DOI](https://zenodo.org/badge/510013191.svg)](https://zenodo.org/badge/latestdoi/510013191) |
| Binder| [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/lsys/forestplot/main?labpath=examples%2Freadme-examples.ipynb) |

<!---------------------- TABLE OF CONTENT ---------------------->
<details open><summary><b>Table of Contents</b></summary><p>
@@ -49,6 +50,12 @@ Install from PyPI<br>
pip install forestplot
```

Install from conda-forge<br>
![Conda (channel only)](https://img.shields.io/conda/vn/conda-forge/forestplot?logo=conda-forge&logoColor=white)
```bash
conda install forestplot
```

Install from source<br>
[![GitHub release (latest by date)](https://img.shields.io/github/v/release/lsys/forestplot?color=blue&label=Latest%20release)](https://github.com/LSYS/forestplot/releases)<br>
```bash
@@ -85,15 +92,13 @@ df.head(3)
| `moerror` | Conf. int.'s *margin of error*. | |
| `label` | Variable labels | &check; |
| `group` | Variable grouping labels | |
| `ll` | Conf. int. *lower limits* | &check;* |
| `hl` | Containing the conf. int. *higher limits* | &check;* |
| `ll` | Conf. int. *lower limits* | &check; |
| `hl` | Containing the conf. int. *higher limits* | &check; |
| `n` | Sample size | |
| `power` | Statistical power | |
| `p-val` | P-value | |

(*If `ll` *and* `hl` are specified, then the `moerror` (margin of error) is not required.
<br>
See [Gallery and API Options](#gallery-and-api-options) for more details on required and optional arguments.)
(See [Gallery and API Options](#gallery-and-api-options) for more details on required and optional arguments.)
</details>

Make the forest plot
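
Note on the hunk above: with `ll` and `hl` now marked as required, results that record only a margin of error need both limits before plotting. The validator code removed later in this commit computed them as `estimate - moerror` and `estimate + moerror`. The sketch below does the same arithmetic on plain Python rows. The helper name `add_ci_limits` and the sample values are illustrative assumptions, not part of the package, and whether the package still derives the limits elsewhere is not visible in this diff.

```python
from typing import Dict, List


def add_ci_limits(
    rows: List[Dict[str, float]],
    estimate: str = "r",
    moerror: str = "moerror",
) -> List[Dict[str, float]]:
    """Return copies of rows with 'll' and 'hl' derived from the margin of error.

    Hypothetical helper: mirrors the symmetric-interval arithmetic removed from
    check_data (ll = estimate - moerror, hl = estimate + moerror).
    """
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's data is not mutated
        new_row["ll"] = row[estimate] - row[moerror]
        new_row["hl"] = row[estimate] + row[moerror]
        out.append(new_row)
    return out


# Made-up sample values, for illustration only.
rows = [{"r": 0.5, "moerror": 0.25}, {"r": -0.125, "moerror": 0.125}]
with_limits = add_ci_limits(rows)
```

With a pandas dataframe the same step is two column operations: `df["ll"] = df["r"] - df["moerror"]` and `df["hl"] = df["r"] + df["moerror"]`.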
@@ -121,6 +126,7 @@ fp.forestplot(df, # the dataframe with results data
estimate="r", # col containing estimated effect size
moerror="moerror", # columns containing conf. int. margin of error
varlabel="label", # column containing variable label
capitalize="capitalize", # Capitalize labels
groupvar="group", # Add variable groupings
# group ordering
group_order=["labor factors", "occupation", "age", "health factors",
@@ -136,6 +142,7 @@ fp.forestplot(df, # the dataframe with results data
estimate="r", # col containing estimated effect size
ll="ll", hl="hl", # columns containing conf. int. lower and higher limits
varlabel="label", # column containing variable label
capitalize="capitalize", # Capitalize labels
groupvar="group", # Add variable groupings
# group ordering
group_order=["labor factors", "occupation", "age", "health factors",
@@ -157,6 +164,7 @@ fp.forestplot(df, # the dataframe with results data
estimate="r", # col containing estimated effect size
ll="ll", hl="hl", # lower & higher limits of conf. int.
varlabel="label", # column containing the varlabels to be printed on far left
capitalize="capitalize", # Capitalize labels
pval="p-val", # column containing p-values to be formatted
annote=["n", "power", "est_ci"], # columns to report on left of plot
annoteheaders=["N", "Power", "Est. (95% Conf. Int.)"], # ^corresponding headers
@@ -175,6 +183,7 @@ fp.forestplot(df, # the dataframe with results data
estimate="r", # col containing estimated effect size
ll="ll", hl="hl", # lower & higher limits of conf. int.
varlabel="label", # column containing the varlabels to be printed on far left
capitalize="capitalize", # Capitalize labels
ci_report=False, # Turn off conf. int. reporting
flush=False, # Turn off left-flush of text
**{'fontfamily': 'sans-serif'} # revert to sans-serif
@@ -188,6 +197,7 @@ fp.forestplot(df, # the dataframe with results data
estimate="r", # col containing estimated effect size
ll="ll", hl="hl", # lower & higher limits of conf. int.
varlabel="label", # column containing the varlabels to be printed on far left
capitalize="capitalize", # Capitalize labels
pval="p-val", # column containing p-values to be formatted
annote=["n", "power", "est_ci"], # columns to report on left of plot
annoteheaders=["N", "Power", "Est. (95% Conf. Int.)"], # ^corresponding headers
@@ -253,9 +263,10 @@ More fined-grained control for base plot options (eg font sizes, marker colors)
| `dataframe` | Pandas dataframe where rows are variables (or studies for meta-analyses) and columns include estimated effect sizes, labels, and confidence intervals, etc. | &check; |
| `estimate` | Name of column in `dataframe` containing the *estimates*. | &check; |
| `varlabel` | Name of column in `dataframe` containing the *variable labels* (study labels if meta-analyses). | &check; |
| `ll` | Name of column in `dataframe` containing the conf. int. *lower limits*. | &check;* |
| `hl` | Name of column in `dataframe` containing the conf. int. *higher limits*. | &check;* |
| `moerror` | Name of column in `dataframe` containing the conf. int. *margin of errors*. | &check;* |
| `ll` | Name of column in `dataframe` containing the conf. int. *lower limits*. | &check; |
| `hl` | Name of column in `dataframe` containing the conf. int. *higher limits*. | &check; |
| `logscale` | If True, make the x-axis log scale. Default is False. | |
| `capitalize` | How to capitalize strings. Default is None. One of "capitalize", "title", "lower", "upper", "swapcase". | |
| `form_ci_report` | If True (default), report the estimates and confidence interval beside the variable labels. | |
| `ci_report` | If True (default), format the confidence interval as a string. | |
| `groupvar` | Name of column in `dataframe` containing the variable *grouping labels*. | |
@@ -278,21 +289,19 @@ More fined-grained control for base plot options (eg font sizes, marker colors)
| `preprocess` | If True (default), preprocess the `dataframe` before plotting. | |
| `return_df` | If True, returned the preprocessed `dataframe`. | |

(*If `ll` *and* `hl` are specified, then the `moerror` (margin of error) is not required, and vice versa.)
<p align="right">(<a href="#top">back to top</a>)</p>
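
Note on the two options added to the API table: both can be pictured with plain Python. The accepted `capitalize` values match the names of built-in `str` methods, so one plausible reading is that each label is passed through the method of that name, with `None` leaving the label untouched. That reading is an assumption, since the package internals are not shown in this diff. For `logscale`, a log axis can only place strictly positive numbers, so estimates and interval limits such as odds ratios are the natural candidates. `normalize_label` and `can_use_logscale` are hypothetical helper names.

```python
from typing import Iterable, Optional

# The options listed in the README table for `capitalize`.
CAPITALIZE_OPTIONS = ("capitalize", "title", "lower", "upper", "swapcase")


def normalize_label(label: str, capitalize: Optional[str] = None) -> str:
    """Apply the str method named by `capitalize`; None means no normalization."""
    if capitalize is None:
        return label  # default after this commit: keep the original formatting
    if capitalize not in CAPITALIZE_OPTIONS:
        raise ValueError(f"capitalize must be one of {CAPITALIZE_OPTIONS} or None")
    return getattr(label, capitalize)()


def can_use_logscale(values: Iterable[float]) -> bool:
    """A log-scaled axis can only show strictly positive values."""
    return all(v > 0 for v in values)


label = "BMI (kg/m2)"
unchanged = normalize_label(label)                   # 'BMI (kg/m2)'
capitalized = normalize_label(label, "capitalize")   # 'Bmi (kg/m2)'
ok_for_log = can_use_logscale([0.8, 1.0, 1.6])       # e.g. odds ratios
not_ok_for_log = can_use_logscale([-0.2, 0.1, 0.4])  # e.g. correlations
```

The `BMI` case shows why making no normalization the default matters: any forced capitalization would damage acronyms and units in labels.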

<!------------------------ KNOWN ISSUES ------------------------>
## Known Issues[![](https://raw.githubusercontent.com/LSYS/forestplot/main/docs/images/pin.svg)](#known-issues)
* Variable labels coinciding with group variables may lead to unexpected formatting issues in the graph.
* Horizontal CI lines cannot be recast as capped horizontal lines because of the backend `Matplotlib` API used.
* Left-flushing of annotations relies on the `monospace` font.
* Plot can get cluttered with too many variables/rows (~30 onwards)
<p align="right">(<a href="#top">back to top</a>)</p>

<!----------------- BACKGROUND AND ADDITIONAL RESOURCES ----------------->
## Background and Additional Resources[![](https://raw.githubusercontent.com/LSYS/forestplot/main/docs/images/pin.svg)](#background-and-additional-resources)

**More about forest plots:**
**More about forest plots**

[Forest plots](https://en.wikipedia.org/wiki/Forest_plot) have many aliases (h/t Chris Alexiuk). Other names include coefplots, coefficient plots, meta-analysis plots, dot-and-whisker plots, blobbograms, margins plots, regression plots, and ropeladder plots.

@@ -301,24 +310,22 @@ More fined-grained control for base plot options (eg font sizes, marker colors)
The simplest version of a forest plot has two columns: one for the variables/studies, and the second for the estimated coefficients and confidence intervals.
This layout is similar to coefficient plots ([coefplots](http://repec.sowi.unibe.ch/stata/coefplot/getting-started.html)) and is thus useful for more than meta-analyses.

<details><summary><i>Here are more resources about forest plots:</i></summary><p>
<details><summary><i>More resources about forest plots</i></summary><p>

* [[1]](https://doi.org/10.1038/s41433-021-01867-6) Chang, Y., Phillips, M.R., Guymer, R.H. et al. The 5 min meta-analysis: understanding how to read and interpret a forest plot. Eye 36, 673–675 (2022).
* [[2]](https://doi.org/10.1136/bmj.322.7300.1479) Lewis S, Clarke M. Forest plots: trying to see the wood and the trees BMJ 2001; 322 :1479
</p></details><p></p>

**More about this package:**

[![Powered by NumFOCUS](https://img.shields.io/badge/powered%20by-NumFOCUS-orange.svg?style=flat&colorA=E1523D&colorB=007D8A)](http://numfocus.org)
**More about this package**

The package is lightweight, built on `pandas`, `numpy`, and `matplotlib`.

It is slightly opinionated in that the aesthetics of the plot inherit some of my sensibilities about what makes a nice figure.
You can however easily override most defaults for the look of the graph. This is possible via `**kwargs` in the `forestplot` API (see [Gallery and API options](#gallery-and-api-options)) and the `matplotlib` API.

**Planned enhancements** include forest plots each row can have multiple coefficients (e.g. from multiple models).
**Planned enhancements** include forest plots where each row can have multiple coefficients (e.g. from multiple models).

<details><summary><i>Related packages:</i></summary><p>
<details><summary><i>Related packages</i></summary><p>

* [[1]](https://www.stata-journal.com/article.html?article=gr0059) [Stata] Jann, Ben (2014). Plotting regression coefficients and other estimates. The Stata Journal 14(4): 708-737.
* [[2]](https://www.statsmodels.org/devel/examples/notebooks/generated/metaanalysis1.html) [Python] Meta-Analysis in statsmodels
@@ -348,4 +355,8 @@ Contributions are welcome, and they are greatly appreciated!
Please submit bugs, questions, or issues you encounter to the [GitHub Issue Tracker](https://github.com/lsys/forestplot/issues).
For bugs, please provide a minimal reproducible example demonstrating the problem.

**Pull Requests**

Please feel free to open an issue on the [Issue Tracker](https://github.com/lsys/forestplot/issues) if you'd like to discuss potential contributions via PRs.

<p align="right">(<a href="#top">back to top</a>)</p>
16 changes: 16 additions & 0 deletions citation.cff
@@ -0,0 +1,16 @@
cff-version: 1.2.0
message: "If you wish to cite this package, please cite it as below."
preferred-citation:
authors:
- family-names: "Shen"
given-names: "Lucas"
title: "Forestplot"
year: 2022
url: "https://pypi.org/project/forestplot/"
repository-code: "https://github.com/LSYS/forestplot"
license: MIT license
identifiers:
- description: "This is from the archived snapshot of the code, supported by Zenodo."
type: doi
value: 10.5281/zenodo.7029377
doi: 10.5281/zenodo.7029377
Binary file modified docs/images/group-grouporder-pvalue-sort-colorrows.png
Binary file modified docs/images/group-grouporder-sort.png
Binary file modified docs/images/leftannote-rightannote-table.png
Binary file modified docs/images/main.png
Binary file modified docs/images/vanilla.png
Binary file modified docs/images/vcoefplot.png
155 changes: 80 additions & 75 deletions examples/readme-examples.ipynb

Large diffs are not rendered by default.

71 changes: 15 additions & 56 deletions forestplot/arg_validators.py
@@ -10,7 +10,6 @@ def check_data(
varlabel: str,
groupvar: Optional[str] = None,
group_order: Optional[Sequence] = None,
moerror: Optional[str] = None,
ll: Optional[str] = None,
hl: Optional[str] = None,
annote: Optional[Union[Sequence[str], None]] = None,
@@ -33,33 +32,29 @@ def check_data(
estimate (str)
Name of column containing the estimates (e.g. pearson correlation coefficient,
OR, regression estimates, etc.).
varlabel (str)
Name of column containing the variable label to be printed out.
moerror (str)
Name of column containing the margin of error in the confidence intervals.
Should be available if 'll' and 'hl' are left empty.
groupvar (str)
Name of column containing group of variables.
group_order (list-like)
List of groups by order to report in the figure.
varlabel (str)
Name of column containing the variable label to be printed out.
groupvar (str)
Name of column containing group of variables.
group_order (list-like)
List of groups by order to report in the figure.
ll (str)
Name of column containing the lower limit of the confidence intervals.
Optional
hl (str)
Name of column containing the upper limit of the confidence intervals.
annote (list-like)
List of columns to add as additional annotation in the plot.
annoteheaders (list-like)
List of table headers to use as column headers for the additional annotations.
rightannote (list-like)
List of columns to add as additional annotation on the right-hand side of the plot.
right_annoteheaders (list-like)
List of table headers to use as column headers for the additional annotations
on the right-hand side of the plot.
pval (str)
Name of column containing the p-values.
ylabel2 (str)
Title of the right-hand side y-axis.
rightannote (list-like)
List of columns to add as additional annotation on the right-hand side of the plot.
right_annoteheaders (list-like)
List of table headers to use as column headers for the additional annotations
on the right-hand side of the plot.
pval (str)
Name of column containing the p-values.
ylabel2 (str)
Title of the right-hand side y-axis.
Returns
-------
@@ -77,12 +72,6 @@ def check_data(
except ValueError:
raise TypeError("Estimates should be float or int")

if (moerror is not None) and (not ptypes.is_numeric_dtype(dataframe[moerror])):
try:
dataframe[moerror] = dataframe[moerror].astype(float)
except ValueError:
raise TypeError("Margin of error values should be float or int")

if (ll is not None) and (not ptypes.is_numeric_dtype(dataframe[ll])):
try:
dataframe[ll] = dataframe[ll].astype(float)
@@ -95,36 +84,6 @@ def check_data(
except ValueError:
raise TypeError("CI higherlimit values should be float or int")

##########################################################################
## Check that either moerror or ll, hl are specified.
## Create the missing data from what is available
##########################################################################
if moerror is None:
try:
assert (ll is not None) & (hl is not None)
except Exception:
raise AssertionError(
'If "moerror" is not provided, then "ll" and "hl" must be provided.'
)

if (ll is None) or (hl is None):
try:
assert moerror is not None
except Exception:
raise AssertionError(
'If "ll, hl" is not provided, then "moerror" must be provided.'
)

# if moerror not there make it
if moerror is None:
dataframe["moerror"] = dataframe[estimate] - dataframe[ll]

# if ll, hl not there make it
if ll is None:
dataframe["ll"] = dataframe[estimate] - dataframe[moerror]
if hl is None:
dataframe["hl"] = dataframe[estimate] + dataframe[moerror]

##########################################################################
## Check that the annotations and headers specified are list-like
##########################################################################
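
Note on the dtype checks that remain in `check_data`: they share one pattern, which is to try converting a column to float and turn a failed conversion into a `TypeError` with a column-specific message. Below is a pandas-free sketch of that pattern on a plain list. `coerce_numeric` is a hypothetical name, and chaining the original error with `from` is a suggestion rather than what the diff does.

```python
from typing import Any, Iterable, List


def coerce_numeric(values: Iterable[Any], what: str) -> List[float]:
    """Convert values to float, raising TypeError with a readable message on failure."""
    try:
        return [float(v) for v in values]
    except (ValueError, TypeError) as err:
        # Chain the original error so the offending value stays in the traceback.
        raise TypeError(f"{what} should be float or int") from err


# Numeric strings and ints are accepted and converted.
higher_limits = coerce_numeric(["0.1", 0.25, 3], "CI higherlimit values")

# A non-numeric entry surfaces as a TypeError naming the column's role.
try:
    coerce_numeric(["0.1", "n/a"], "CI higherlimit values")
except TypeError as err:
    message = str(err)
```

Raising one exception type with a role-specific message keeps the caller's error handling simple regardless of which column failed.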