make parameters() show fixed effects restricted to 0 #715

Closed
SchmidtPaul opened this issue May 17, 2022 · 17 comments · Fixed by #902
Labels
Enhancement 💥 Implemented features can be improved or revised

Comments

@SchmidtPaul

SchmidtPaul commented May 17, 2022

Sorry if I am missing something, but I can't find a way to include the fixed-effects solutions that are set to 0 due to restrictions/constraints. Here is an example where SAS does it:
[image: SAS output]

As far as I can tell, parameters::parameters() (and stats::coef()) will always drop them from the table:

levels(PlantGrowth$group)
#> [1] "ctrl" "trt1" "trt2"

m <- lm(weight ~ group, PlantGrowth)

parameters::parameters(m)
#> Registered S3 method overwritten by 'parameters':
#>   method                         from      
#>   format.parameters_distribution datawizard
#> Parameter    | Coefficient |   SE |        95% CI | t(27) |      p
#> ------------------------------------------------------------------
#> (Intercept)  |        5.03 | 0.20 | [ 4.63, 5.44] | 25.53 | < .001
#> group [trt1] |       -0.37 | 0.28 | [-0.94, 0.20] | -1.33 | 0.194 
#> group [trt2] |        0.49 | 0.28 | [-0.08, 1.07] |  1.77 | 0.088
#> 
#> Uncertainty intervals (equal-tailed) and p values (two-tailed) computed
#>   using a Wald t-distribution approximation.

Created on 2022-05-17 by the reprex package (v2.0.1)

Yet, I sometimes want an additional line group [ctrl] with just a 0 for Coefficient and NA for everything else in my parameters table. Is there a way to do this with {parameters}?
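For comparison, base R can already report the full set of treatment-coded effects, reference level included, via `stats::dummy.coef()`. This is a minimal sketch on the same model; note that it only returns point estimates (no SE, CI, or p values), so it is not a drop-in replacement for a parameters table.

```r
# Same model as above: default treatment contrasts, so "ctrl" is the reference
m <- lm(weight ~ group, data = PlantGrowth)

# dummy.coef() expands each factor term to one coefficient per level;
# the reference level gets exactly 0, similar to the SAS solution table
dc <- stats::dummy.coef(m)
dc$group
```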

@strengejacke
Member

You mean adding an additional row for the reference level of factors?

@bwiernik
Contributor

bwiernik commented May 17, 2022

I can see the value of an include_reference_level argument to include the reference level for factors.

So it would show up something like:


#> Parameter         | Coefficient |   SE |        95% CI | t(27) |      p
#> -----------------------------------------------------------------------
#> (Intercept)       |        5.03 | 0.20 | [ 4.63, 5.44] | 25.53 | < .001
#> group [ref: ctrl] |             |      |               |       |
#> group [trt1]      |       -0.37 | 0.28 | [-0.94, 0.20] | -1.33 | 0.194 
#> group [trt2]      |        0.49 | 0.28 | [-0.08, 1.07] |  1.77 | 0.088
#> 

@bwiernik
Contributor

I would leave all of the columns aside from Parameter blank. The tricky part might be detecting when there is a reference level (e.g., only for treatment or SAS contrasts with an intercept included).
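The detection rule described above (treatment or SAS contrasts, intercept present) could be approximated from a fitted `lm`/`glm` object roughly like this. `has_reference_level()` is a hypothetical helper, not part of {parameters}, and it is a simplified heuristic: a user-supplied contrast matrix is treated as "no reference level", and other model classes may not store contrasts in `model$contrasts`.

```r
# Hypothetical helper (not part of {parameters}): rough heuristic for which
# factors in a fitted lm/glm have an omitted reference level.
has_reference_level <- function(model) {
  ctr <- model$contrasts                 # e.g. list(group = "contr.treatment")
  if (is.null(ctr)) return(logical(0))   # no factor predictors
  has_intercept <- attr(stats::terms(model), "intercept") == 1
  vapply(ctr, function(x) {
    # contrasts given by name; a custom contrast matrix is treated as
    # "no reference level" here for simplicity
    is.character(x) && x %in% c("contr.treatment", "contr.SAS")
  }, logical(1)) & has_intercept
}

m_trt <- lm(weight ~ group, data = PlantGrowth)
m_sum <- lm(weight ~ group, data = PlantGrowth,
            contrasts = list(group = "contr.sum"))
has_reference_level(m_trt)  # group: TRUE
has_reference_level(m_sum)  # group: FALSE (sum-to-zero coding)
```

Note that `contr.SAS` uses the last level, not the first, as reference, so a printing method would also need to know which level to label.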

@mattansb
Member

This would only work for treatment coding, so that would need to be tested.

Personally, I don't see the merit of adding all these 0s?

@SchmidtPaul
Author

Well I'd be happy with an include_reference_level argument that leads to all blanks and no 0s, too. (I just found 0s to be intuitive when teaching and I am used to it from SAS.)

@strengejacke
Member

> Personally, I don't see the merit of adding all these 0s?

This is mainly for completeness. Alternatively, you could add a footnote indicating the reference levels. It's also not unusual to report the reference "estimate" (i.e., 0 for linear models, 1 for odds ratios, etc.); sometimes the estimate column just says "Ref.".
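A minimal base-R sketch of that convention: insert a reference row whose estimate is 0 on the link scale, or 1 once coefficients are exponentiated (e.g. odds ratios), with the p value left as `NA`. `add_reference_row()` is a hypothetical helper, not part of {parameters}; it assumes treatment contrasts (first level is the reference) and that the factor is a plain column of the data rather than created with `factor()` inside the formula.

```r
# Hypothetical helper (not part of {parameters}): coefficient table with an
# explicit row for the reference level of one treatment-coded factor.
add_reference_row <- function(model, factor_name, exponentiate = FALSE) {
  est <- coef(summary(model))  # columns: estimate, SE, statistic, p value
  tab <- data.frame(
    Parameter = rownames(est),
    Coefficient = unname(est[, 1]),
    p = unname(est[, 4]),
    stringsAsFactors = FALSE
  )
  if (exponentiate) tab$Coefficient <- exp(tab$Coefficient)

  lev <- levels(model$model[[factor_name]])
  ref <- data.frame(
    Parameter = paste0(factor_name, lev[1]),
    # 0 on the link scale, i.e. exp(0) = 1 when exponentiated
    Coefficient = if (exponentiate) 1 else 0,
    p = NA_real_,
    stringsAsFactors = FALSE
  )
  # place the reference row directly above the first non-reference level
  first <- match(paste0(factor_name, lev[2]), tab$Parameter)
  out <- rbind(tab[seq_len(first - 1), ], ref, tab[first:nrow(tab), ])
  rownames(out) <- NULL
  out
}

m <- lm(weight ~ group, data = PlantGrowth)
add_reference_row(m, "group")

mt <- transform(mtcars, gear = factor(gear))
m2 <- glm(vs ~ wt + gear, data = mt, family = binomial)
add_reference_row(m2, "gear", exponentiate = TRUE)
```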

Here's an example from a recent paper:
[image: results table from a recent paper]

The idea is to have a table that is completely self-explaining, so you don't need to read the methods section to remember all levels of categorical variables.

@mattansb
Member

But then the intercept(s) are omitted, to have a table of slopes only?

This is less intuitive to me (I would prefer a clear label of the intercept instead), but I can understand why someone would want this (especially if they are accustomed to it).

@strengejacke
Member

> But then the intercept(s) are omitted, to have a table of slopes only?

No, not necessarily. This is more common in my field, where we're more interested in the strength of the associations instead of the predicted outcome. That's why we often omit the intercept in tables.

@strengejacke
Member

I just noticed that in two other recent papers of mine, the intercepts are included in the tables ;-)

@strengejacke strengejacke added the Enhancement 💥 Implemented features can be improved or revised label May 18, 2022
@vincentarelbundock
Contributor

My (controversial) view: This will add code complexity and convey little information (literally 0s and dots). I understand why we may want to add empty rows for presentation purposes in "finished" regression tables intended for publication, but that's not quite the job of parameters.

@bwiernik
Contributor

parameters is the package that provides the regression parameter tables that people display in publications.

@vincentarelbundock
Contributor

Haha, yeah, sorry. I guess I only ever use/see the markdown output in the console, so I lost sight of that ;)

@strengejacke
Member

Actually, we already do something similar for grouping parameters: https://easystats.github.io/parameters/articles/model_parameters_print.html#group-parameters

@bwiernik
Contributor

Maybe we just add a special option "reference" to that argument that adds the reference to factors?

And maybe allow restricting it to a subset of factors, or combining reference and grouping: if the argument is given a list with a slot called "reference", the reference formatting is applied to the factors listed there?

@SchmidtPaul
Author

SchmidtPaul commented Jul 12, 2022

I found that {broom.helpers} does what I was looking for, and {GGally} makes use of it, too:

library(dplyr)

m <- lm(weight ~ group, PlantGrowth)

broom.helpers::tidy_plus_plus(model = m) %>% 
  select(term, contrasts:conf.high)
#> # A tibble: 3 x 12
#>   term     contrasts contrasts_type reference_row label n_obs estimate std.error
#>   <chr>    <chr>     <chr>          <lgl>         <chr> <dbl>    <dbl>     <dbl>
#> 1 groupct~ contr.tr~ treatment      TRUE          ctrl     10    0        NA    
#> 2 grouptr~ contr.tr~ treatment      FALSE         trt1     10   -0.371     0.279
#> 3 grouptr~ contr.tr~ treatment      FALSE         trt2     10    0.494     0.279
#> # ... with 4 more variables: statistic <dbl>, p.value <dbl>, conf.low <dbl>,
#> #   conf.high <dbl>

GGally::ggcoef_model(
  model = m,
  add_reference_rows = TRUE,
  categorical_terms_pattern = "{level} (ref: {reference_level})"
)

Created on 2022-07-12 by the reprex package (v2.0.1)

@strengejacke
Member

See examples (and maybe further discussion) here:
#902

strengejacke added a commit that referenced this issue Sep 11, 2023
* make `parameters()` show fixed effects restricted to 0
Fixes #715

* progress

* FIXES

* docs

* minor, make it work for OR etc.

* fix

* lintr

* docs

* fix

* fix

* fix

* dont print "(ref.)"

* update news
@strengejacke
Member

Use add_reference = TRUE in the print() method.

library(parameters)
data("fish")
m1 <- glmmTMB::glmmTMB(
  count ~ child + camper + zg + (1 | ID),
  ziformula = ~ child + camper + (1 | persons),
  data = fish,
  family = glmmTMB::truncated_poisson()
)
print(model_parameters(m1, effects = "fixed"), add_reference = TRUE)
#> # Fixed Effects
#> 
#> Parameter   | Log-Mean |   SE |         95% CI |     z |      p
#> ---------------------------------------------------------------
#> (Intercept) |     1.41 | 0.18 | [ 1.06,  1.75] |  8.02 | < .001
#> child       |    -0.53 | 0.12 | [-0.77, -0.29] | -4.40 | < .001
#> camper [0]  |     0.00 |      |                |       |       
#> camper [1]  |     0.58 | 0.10 | [ 0.39,  0.78] |  5.93 | < .001
#> zg          |     0.13 | 0.04 | [ 0.05,  0.21] |  3.17 | 0.002 
#> 
#> # Zero-Inflation
#> 
#> Parameter   | Log-Odds |   SE |         95% CI |     z |      p
#> ---------------------------------------------------------------
#> (Intercept) |    -0.39 | 0.65 | [-1.67,  0.89] | -0.60 | 0.551 
#> child       |     2.05 | 0.31 | [ 1.45,  2.66] |  6.63 | < .001
#> camper [0]  |     0.00 |      |                |       |       
#> camper [1]  |    -1.01 | 0.32 | [-1.64, -0.37] | -3.12 | 0.002
#> 
#> Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
#>   using a Wald z-distribution approximation.
#> 
#> The model has a log- or logit-link. Consider using `exponentiate =
#>   TRUE` to interpret coefficients as ratios.

data(mtcars)
mtcars$gear <- as.factor(mtcars$gear)
m <- glm(vs ~ wt + gear, data = mtcars, family = "binomial")
print(model_parameters(m, exponentiate = TRUE, drop = "(Intercept)"), add_reference = TRUE)
#> Parameter | Odds Ratio |   SE |        95% CI |     z |     p
#> -------------------------------------------------------------
#> wt        |       0.07 | 0.09 | [0.00,  0.52] | -2.05 | 0.040
#> gear [3]  |       1.00 |      |               |       |      
#> gear [4]  |       3.21 | 3.98 | [0.27, 41.36] |  0.94 | 0.348
#> gear [5]  |       0.03 | 0.07 | [0.00,  1.47] | -1.41 | 0.159
#> 
#> Uncertainty intervals (profile-likelihood) and p-values (two-tailed)
#>   computed using a Wald z-distribution approximation.

Created on 2023-09-11 with reprex v2.0.2
