Plot for showing the proportion of missing data per form/country #8

Open
aghaynes opened this issue Apr 30, 2021 · 3 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@aghaynes
Member

Is your feature request related to a problem? Please describe.

image

Describe alternatives you've considered

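library(dplyr)
library(ggplot2)
library(grid)  # textGrob(), gpar(), grid.draw()

# comp, dfs, myggtheme(), paths() and cm2inch() are not defined in this snippet.
# Per site and form: number of participants, number of fields, and pct = n/N*100.
inst <- comp %>%
  group_by(institute.name) %>%
  mutate(n_pat = length(unique(.data$record_id))) %>%
  group_by(institute.name, form) %>%
  summarize(n_pat = max(n_pat),
            n_var = max(N),
            n = sum(n),
            N = sum(N)
            ) %>%
  mutate(pct = n/N*100) %>%
  ungroup()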

xval <- inst %>%
  group_by(form) %>%
  summarize(width = log(max(n_var))) %>%
  arrange(match(form, names(dfs))) %>%
  mutate(x1 = cumsum(width),
         x0 = x1 - width,
         lab_x = x0 + width/2) %>%
  select(form, x1, x0, width, lab_x)

yval <- inst %>%
  group_by(institute.name) %>%
  summarize(height = log(max(n_pat)+1)) %>%
  arrange(desc(institute.name)) %>%
  mutate(y1 = cumsum(height),
         y0 = y1 - height,
         lab_y = y0 + height/2) %>%
  select(institute.name, y0, y1, height, lab_y)

p <- inst %>%
  full_join(xval, by = "form") %>%
  full_join(yval, by = "institute.name") %>%
  ggplot(aes(xmin = x0, xmax = x1, ymin = y0, ymax = y1, fill = pct)) +
  geom_rect() +
  myggtheme() +
  viridis::scale_fill_viridis(option = "viridis", direction = -1) +
  # scale_fill_continuous(type = "viridis") +
  # scale_fill_gradientn(colours = viridisLite::mako(256)) +
  labs(fill = "% complete") +
  scale_x_continuous(
    # breaks = xval$lab_x,
    breaks = c(0,xval$x1),
    # labels = xval$form,
    labels = rep("", length(c(0,xval$x1))),
    position = "top") +
  scale_y_continuous(
    breaks = c(0,yval$y1),
    labels = rep("", length(c(0,yval$y1)))
    # breaks = yval$lab_y,
    # labels = yval$institute.name
    ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 0),
        plot.margin = unit(c(4.5,0,0,5), units = "lines")
        )

yrange <- ggplot_build(p)$layout$panel_scales_y[[1]]$range$range
ylabpos <- yrange[2] + (yrange[2] - yrange[1]) * 0.06
xrange <- ggplot_build(p)$layout$panel_scales_x[[1]]$range$range
xlabpos <- xrange[1] - (xrange[2] - xrange[1]) * 0.06
# x labels: form names, rotated 45 degrees, placed above the panel
for (i in seq_along(xval$form))  {
  p <- p + annotation_custom(
    grob = textGrob(label = xval$form[i],
                    hjust = 0, vjust = 0,
                    gp = gpar(cex = 0.8),
                    rot = 45),
    ymin = ylabpos,      # Vertical position of the textGrob
    ymax = ylabpos,
    xmin = xval$lab_x[i], # Note: The grobs are positioned outside the plot area
    xmax = xval$lab_x[i])
}
# y labels: site names, right-aligned, placed to the left of the panel
for (i in seq_along(yval$y0))  {
  p <- p + annotation_custom(
    grob = textGrob(label = yval$institute.name[i],
                    hjust = 1, vjust = 0,
                    gp = gpar(cex = 0.8),
                    rot = 0),
    ymin = yval$lab_y[i],      # Vertical position of the textGrob
    ymax = yval$lab_y[i],
    xmin = xlabpos, # Note: The grobs are positioned outside the plot area
    xmax = xlabpos)
}
# Disable panel clipping so the labels placed outside the plot area are drawn
gt <- ggplot_gtable(ggplot_build(p))
gt$layout$clip[gt$layout$name == "panel"] <- "off"

pdf(paths("fd", "complete_matrix_site.pdf"),
    height = cm2inch(16), width = cm2inch(16))
grid.draw(gt)
dev.off()

@aghaynes added the enhancement and question labels on Apr 30, 2021
@aghaynes
Member Author

Open questions: how useful is this, and how generalizable?

@mbranca
Contributor

mbranca commented Jun 18, 2021

Like here (I think), it would be useful to do this per form (% completeness per form), i.e. # complete items / # total items, and then average over the site. Correct?
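
To make the proposed calculation concrete, here is a minimal sketch on made-up data. The `toy` data frame and its column names `site`, `form`, `n` and `N` are assumptions for illustration, not the package's actual structure. It also shows that "average over the site" can be read two ways, which give different answers when forms differ in size.

```r
library(dplyr)

# Hypothetical toy data: one row per site and form, where
# n = number of complete (non-missing) items, N = number of expected items.
toy <- data.frame(
  site = c("A", "A", "B", "B"),
  form = c("baseline", "followup", "baseline", "followup"),
  n    = c(90, 30, 40, 20),
  N    = c(100, 50, 80, 20)
)

# Completeness per form: complete items / total items
per_form <- toy %>%
  mutate(pct = n / N * 100)

# Two possible site-level summaries
per_site <- per_form %>%
  group_by(site) %>%
  summarize(
    mean_pct   = mean(pct),               # unweighted mean of the form percentages
    pooled_pct = sum(n) / sum(N) * 100,   # all items pooled, so larger forms weigh more
    .groups = "drop"
  )

print(per_site)
```

In this toy data both sites have the same unweighted mean, while the pooled percentages differ, so it is worth agreeing on which of the two is meant.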

@aghaynes
Member Author

Probably. Site height reflects the number of participants from that site, and width reflects the number of fields on that form (both are log-scaled in the code above).
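
As a rough illustration of how the column widths turn into rectangle edges, the sketch below applies the same `cumsum()` approach as the posted code to made-up field counts. The `forms` data frame and its values are hypothetical. The `share_lin` and `share_log` columns are only there to compare strictly proportional widths with log-scaled ones.

```r
library(dplyr)

# Hypothetical number of fields per form (not taken from the issue)
forms <- data.frame(
  form  = c("baseline", "followup", "lab"),
  n_var = c(100, 20, 5)
)

layout <- forms %>%
  mutate(
    width     = log(n_var),              # log-scaled width, as in the posted code
    x1        = cumsum(width),           # right edge of each column
    x0        = x1 - width,              # left edge of each column
    lab_x     = x0 + width / 2,          # label position at the column midpoint
    share_log = width / sum(width),      # share of plot width with log scaling
    share_lin = n_var / sum(n_var)       # share of plot width if strictly proportional
  )

print(layout)
```

With log scaling the smallest form keeps a visible share of the plot width, whereas strictly proportional widths would make it only a thin sliver next to the largest form. The same construction with `y0`, `y1` and `lab_y` gives the row heights per site.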
