Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added group_by parameter to util_corr_fit() #69

Open
wants to merge 12 commits into
base: main
Choose a base branch
from
Open

Added group_by parameter to util_corr_fit() #69

wants to merge 12 commits into from

Conversation

emcfalls
Copy link
Contributor

No description provided.

…nces where the group_by variable(s) are different for the synthetic and actual data
@awunderground awunderground changed the base branch from version0.0.2 to version0.0.4 May 10, 2024 17:37
Copy link
Contributor

@awunderground awunderground left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. What happens if a group exists in one data set but not the other data set?
  2. I like how you restructured the output of util_corr(), but it broke ALL of the tests. We need to rewrite the tests to reference the new output structure.


# reorder data names
# reorder data names (this appears to check if the variables are the same)
# issue when the groups in the synthetic data do not match the groups in the og data, and vice versa
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"og data" may be a little casual for our roxygen headers...

dplyr::group_split(dplyr::across({{ group_by }}))

groups <- lapply(data, function(x) dplyr::select(x, {{ group_by }}) |>
slice(1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dplyr::slice() instead of just slice().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you replace this with count(data, groups)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is to add the group by variables to the final datasets. I can add additional code to add the Ns to the metric data. I need to think more about how to add it to the corr_data dataset.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

count(data, {{ group_by }}) will return a data frame with the groups and the frequency of the groups that you can plug into bind_cols() below.

Comment on lines 67 to 70
return(list(
corr_data,
metrics
))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return(
  list(
    corr_data,
    metrics
  )
)

data <- dplyr::select(data, names(synthetic_data))

synthetic_data <- dplyr::select(synthetic_data, dplyr::where(is.numeric), {{ group_by }}) |>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're still using %>% instead of |> now to make sure the code is backwards compatible with R < 4.0.0.

difference = .data$original - .data$synthetic,
proportion_difference = .data$difference / .data$original)

correlation_data <- bind_cols(correlation_data, groups)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dplyr::binds_cols()

Comment on lines 59 to 61
correlation_fit = map_dbl(results, "correlation_fit"),
correlation_difference_mae = map_dbl(results, "correlation_difference_mae"),
correlation_difference_rmse = map_dbl(results, "correlation_difference_rmse"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

purrr::map_dbl()

correlation_fit = map_dbl(results, "correlation_fit"),
correlation_difference_mae = map_dbl(results, "correlation_difference_mae"),
correlation_difference_rmse = map_dbl(results, "correlation_difference_rmse"),
bind_rows(groups)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dplyr::bind_rows()

bind_rows(groups)
)

corr_data <- dplyr::bind_rows(map_dfr(results, "correlation_data"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

purrr::map_dfr()

Base automatically changed from version0.0.4 to main October 30, 2024 19:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants