Dataset should be messier and larger #108
Comments
I got a question about the many There are other inconsistencies that are not covered by the lesson:
Maybe I should say that the data is messy in the wrong places ;)
@datacarpentry/curriculum-advisors-social-science Your input would be very welcome.
Yes
Yes (perhaps with accents, whitespace, etc)
In addition to this, we could consider adding some dates in a different format to show that they can be processed too
Yes (something like NULL or NA would work). Also, we can add missing data codes (-99, etc) if we want the numeric facet to be especially useful.
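To make the suggestions above concrete, here is a minimal Python sketch of how such inconsistencies could be injected into a clean table. It is not how the SAFI dataset was prepared: the function name `make_messier`, the `interview_date` and `no_membrs` column names, and the sample values are assumptions for illustration, and the cycling rule is just a simple way to keep the result reproducible.

```python
from datetime import date

MISSING_TEXT = "NULL"   # assumed text placeholder for missing values
MISSING_NUMBER = -99    # assumed numeric missing-data code

def make_messier(rows):
    """Return copies of rows with one deliberate inconsistency each, cycling by position."""
    messy = []
    for i, row in enumerate(rows):
        row = dict(row)  # copy so the input rows are left untouched
        kind = i % 4
        if kind == 0:
            # trailing whitespace in a text column
            row["village"] = row["village"] + " "
        elif kind == 1:
            # switch an ISO date (YYYY-MM-DD) to DD/MM/YYYY
            d = date.fromisoformat(row["interview_date"])
            row["interview_date"] = d.strftime("%d/%m/%Y")
        elif kind == 2:
            # text placeholder for a missing value
            row["respondent_roof_type"] = MISSING_TEXT
        else:
            # numeric missing-data code
            row["no_membrs"] = MISSING_NUMBER
        messy.append(row)
    return messy

# Made-up sample rows; the values are illustrative only.
rows = [
    {"village": "A", "interview_date": "2016-11-17", "respondent_roof_type": "grass", "no_membrs": 3},
    {"village": "B", "interview_date": "2016-11-17", "respondent_roof_type": "grass", "no_membrs": 7},
    {"village": "A", "interview_date": "2016-11-18", "respondent_roof_type": "mabatisloping", "no_membrs": 10},
    {"village": "C", "interview_date": "2016-11-18", "respondent_roof_type": "grass", "no_membrs": 4},
]

messy = make_messier(rows)
for row in messy:
    print(row)
```

Spreading the changes by row position rather than clustering them also means the inconsistencies end up far apart in the table, so learners would need facets rather than scrolling to find them.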
Thanks for your comments, @ndporter! I edited them to get the formatting right. I do wonder who would be responsible for actually updating the dataset...
CAC agrees with all above recommendations, and further suggests mentioning how to manually specify encodings (esp for work with non-English text), perhaps following the model in step 4 listed in the LC lesson. Thanks!
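As a small illustration of why specifying the encoding matters for non-English text, here is a sketch in plain Python (not OpenRefine, and not taken from the LC lesson). The sample value is made up; the point is only that the same bytes read with the wrong encoding come out garbled.

```python
# Made-up value containing an accented character.
text = "Chirodzó"

# Bytes as they would be stored in a UTF-8 encoded file.
raw = text.encode("utf-8")

# Reading those bytes with the wrong encoding garbles the accent.
wrong = raw.decode("latin-1")

# Stating the correct encoding explicitly recovers the original text.
right = raw.decode("utf-8")

print(repr(wrong))
print(repr(right))
```

A dataset that includes a few accented values like this would give the lesson a natural reason to show where the encoding is set on import.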
@bencomp I recommend adding the 'help wanted' label to this issue now, and removing the CAC and discussion labels so that would-be contributors know it is ready to be tackled. You could also pin the issue to the repo issue listing, so that it is even more visible. Finally, as the desired changes are spread across several posts here, it might be helpful to summarise what changes should be made to solve the issue, all in one post at the end of this thread? If you would like to take additional steps to encourage community members to contribute, we could post about the issue on Slack, and offer help to anyone who is interested but not fully confident with making changes to a lesson. [Edit: when it comes to it, the Curriculum Team will be able to log in and take care of updating the FigShare entry to include the new version of the dataset.]
Thanks for the feedback, @ndporter! And thanks for the suggestions and explanation, @tobyhodges! Edit: I'm moving the tasks to the top of the issue.
The prepared SAFI dataset is, I think, not messy enough to really show OpenRefine's power.
I would like:
See also #35. The number of columns doesn't make the data messy.
Summarising the to-dos from the discussion below:
village and respondent_roof_type columns in rows that are far apart
village column
While we are making changes, I think this should (or could) be part of the update too, even though it was part of #29 and not explicitly mentioned now: