
Clarifying data collection to end users #172

Open
kevinrobinson opened this issue Mar 15, 2019 · 5 comments

kevinrobinson commented Mar 15, 2019

This came from an awesome helpful Twitter thread: https://twitter.com/dalelane/status/1106565327457054720, thanks @dalelane! 👍

I read https://cloud.ibm.com/docs/services/assistant?topic=assistant-information-security#information-security as suggesting that products using IBM services own the responsibility for ethical use, and the legal liability for ensuring compliance with local laws and regulations (e.g., GDPR).

I also see pretty explicit guidance on personally identifiable information that doesn't seem like it's presented to end users:

[Screenshot: IBM Cloud documentation guidance on personally identifiable information]

To me, helping young people understand these responsibilities seems super important for teaching them how to build AI systems ethically. Would you be open to a pull request that tried to do this?

kevinrobinson (Author) commented

Oops, I think I put this on the wrong issue earlier, sorry!

Also, the help page awesomely describes the site's use of things like the analytics and error-reporting services.

[Screenshot: help page text describing the site's use of analytics and error-reporting services]

Teachers or parents may read this copy and expect it to be the exhaustive list of where the data they submit is stored. But it doesn't mention anything about sending data to IBM services, or how that data is stored there, so it seems like maybe this should be updated to include that as well.

What do you think makes sense?

dalelane (Member) commented

Yeah, I'd be happy with a pull request that improved the wording.

The English version of the text is here:

"Q4": "What information is stored about users?",
"Q4-A-1": "Teacher accounts",
"Q4-A-2": "I store the username and email address that you provide when you create the account, so that I can keep in touch with you.",
"Q4-A-3": "Student accounts",
"Q4-A-4": "I store the usernames for student accounts, so that I can let them log in. I have no email addresses or contact details for students. I have no other personally identifying information about students.",
"Q4-A-5": "I also <a href='https://github.com/IBM/taxinomitis-docs/raw/master/docs/pdf/machinelearningforkids-schools.pdf'>recommend that students are given generic usernames</a> (e.g. student1) so that students are not identifiable outside of their school or coding group.",
"Q4-A-6": "General information stored about all accounts",
"Q4-A-7": "User management for Machine Learning for Kids is implemented using the third party service, <a href='https://auth0.com/'>Auth0</a>. They store the IP address that you last logged into Machine Learning for Kids from, and the type of browser you used. I've never found a reason to use that, but it is stored if I did want to go and look for it.",
"Q4-A-8": "Errors that happen in the web browser are captured using the third party service, <a href='https://sentry.io/'>Sentry</a>. If something goes wrong, it will capture information about the error, including your username, IP address, type of browser you were using, and a technical description of what went wrong.",
"Q4-A-9": "I use <a href='https://www.google.com/analytics/'>Google Analytics</a> so that I know how many users visit Machine Learning for Kids each day. Although it captures information such as geographic location and browser type, this is only ever displayed to me in an anonymised aggregate way."

And is displayed here:

<div class="panel panel-default">
<div class="panel-heading">
<h4 class="panel-title">
<a data-toggle="collapse" href="#helpPersonalInfo" target="_self" translate="HELP.ACCOUNTISSUES.Q4"></a>
</h4>
</div>
<div id="helpPersonalInfo" class="panel-collapse collapse">
<div class="panel-body">
<p><strong translate="HELP.ACCOUNTISSUES.Q4-A-1"></strong></p>
<p translate="HELP.ACCOUNTISSUES.Q4-A-2"></p>
<p><strong translate="HELP.ACCOUNTISSUES.Q4-A-3"></strong></p>
<p translate="HELP.ACCOUNTISSUES.Q4-A-4"></p>
<p translate="HELP.ACCOUNTISSUES.Q4-A-5"></p>
<p><strong translate="HELP.ACCOUNTISSUES.Q4-A-6"></strong></p>
<p translate="HELP.ACCOUNTISSUES.Q4-A-7"></p>
<p translate="HELP.ACCOUNTISSUES.Q4-A-8"></p>
<p translate="HELP.ACCOUNTISSUES.Q4-A-9"></p>
</div>
</div>
</div>
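
As an illustration of the kind of addition being discussed, an extra entry could follow the same pattern as the strings above. The key name and wording here are hypothetical, not taken from the project:

"Q4-A-10": "The examples you add to train a machine learning model are sent to IBM Watson services, where they are stored and used to train your model. The same advice applies there: please don't include real names or other personal information in your training data.",

...displayed with a matching paragraph in the panel body:

<p translate="HELP.ACCOUNTISSUES.Q4-A-10"></p>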

kevinrobinson commented Mar 20, 2019

@dalelane awesome! I opened #173 as a first step, thanks for your help! 👍

I also thought it might be good to put a notice and a link to guidelines directly at the point in the app where folks are adding training data. What do you think? That way folks can see where the data is going, even if they don't go dig into the fine print like I did.

So on the screen like this...
[Screenshot: the page where students add training examples]

...one approach might be to add something like this:
[Screenshot: mock-up of the same page with a short data-collection notice added]

The idea is that this might help prevent any surprises, especially for young children who are new to ML and third-party services. Thanks for listening, and I'm happy to help out with this too, or with other ideas you have. I didn't do anything about this in that first PR, since I wanted to see what you thought first. 👍 Thanks!
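
For concreteness, the kind of inline notice being suggested could be sketched in the training page template like this. The markup, translation key, wording, and link target below are all hypothetical, not taken from the project:

<div class="alert alert-info" role="alert">
    <!-- Hypothetical notice; the key and wording are illustrative only, e.g.
         "The examples you add here are sent to IBM Watson to train your
          machine learning model. Please don't include real names or other
          personal information." -->
    <span translate="TRAINING.DATANOTICE"></span>
    <a href="#!/help" target="_self" translate="TRAINING.DATANOTICELINK"></a>
</div>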

dalelane (Member) commented

I've made an effort to keep the UI for kids as clean, simple, and uncluttered as possible. The majority of the site's users, as far as I know, are of primary school age. I think this sort of legalese warning is unlikely to be useful or helpful to a 7-year-old. At worst, it'll confuse them; at best, they'll likely ignore it. My (admittedly untested) assumption is that a message like this won't really be an effective way to address the issue here.

The simplest approach would be to separate this out for teacher and student users. Give the teacher/parent users all the detailed info (links to more info, explain what is happening, explain the implications, etc.) and keep it out of the student / training UI. I'm very comfortable putting any information in the hands of teachers/parents and letting them decide what to do about it, and what is appropriate to tell their children/students.

The more nuanced approach would be to also add something to the training/student UI, but make it much more child-friendly. If anything is going in the training UI, it needs to be something that would make sense to a young child. That needs more thought about where it should go, the sort of language that should be used, how it should be explained, how it should be presented, etc.

kevinrobinson (Author) commented

@dalelane This is awesome, thanks for sharing all this! ❤️ 👍

Yeah, I think your question gets right to the heart of this - as we make awesome new ways for young children to make their own things with computing, and use more powerful tools like third-party services, how do we teach them about risks along the way, and help them do this ethically and safely? I'm super excited about this project and others like it that are trying to tackle these hard questions, and doing it with young children, rather than limiting kids' access. 💻 😄

For this suggestion, I was thinking the primary audience is first and foremost the CS teachers, volunteers, or parents who would be introducing this to students, to help make sure they're aware of these issues and can decide for themselves what to do. Showing something simpler to young people seems even better, especially if it's understandable and not a wall of legal text.

To brainstorm, I remembered how Scratch cues young people not to use their real names in the sign-in flow. This seems like it might be a good balance:

[Screenshot: Scratch account flow advising users not to use their real name]

From there I tried to mock up something subtle in the UI at the point where children are deciding what to enter as data, but that is also direct. This uses the example from the Twitter thread earlier, where students were entering text messages to train a model that could tell who they were talking to in their family, so the buckets are "mom" and "sister":

Another iteration might check the text in the browser, before it travels over the network, for common things (e.g., names, birthdates, or other potentially personally identifiable information) and warn:
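
As a rough sketch of what that kind of in-browser check could look like before an example is submitted (the patterns, function, and wording below are all hypothetical and illustrative only, not from the project, and a real set of checks would need much more care):

// Hypothetical client-side check, run before a training example is sent to the server.
// The patterns are deliberately simple and purely illustrative.
const PII_PATTERNS: { label: string; pattern: RegExp }[] = [
    { label: 'something that looks like a date (maybe a birthday?)', pattern: /\b\d{1,2}[\/.-]\d{1,2}[\/.-]\d{2,4}\b/ },
    { label: 'something that looks like a phone number', pattern: /\b\d{3}[\s.-]?\d{3}[\s.-]?\d{4}\b/ },
    { label: 'something that looks like an email address', pattern: /\b[\w.+-]+@[\w-]+\.[\w.-]+\b/ },
    { label: 'a phrase that often introduces a real name', pattern: /\bmy name is\b/i },
];

// Returns a child-friendly warning, or null if nothing suspicious was spotted.
export function checkTrainingExample(text: string): string | null {
    for (const { label, pattern } of PII_PATTERNS) {
        if (pattern.test(text)) {
            return 'This looks like it might include ' + label + '. ' +
                   'What you type here is sent to IBM Watson to train your model, ' +
                   'so please don\'t include real names or personal details.';
        }
    }
    return null;
}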

It's hard to strike the balance between keeping learning simple, clean, and low-friction, and helping folks who are new to machine learning understand how to do it right in terms of privacy and ethics, especially with young children.

I'm super happy to keep brainstorming with you on this too, or to pitch in with working on this over time, and help with finding whatever you think the right balance is for your project. The simple UX is so great, and I'm excited to try out more ways of using this with kids. Thanks again for sharing your awesome work in the open! 👍

dalelane added a commit that referenced this issue Mar 25, 2019
Contributes to: #172

Signed-off-by: Dale Lane <dale.lane@uk.ibm.com>
dalelane added a commit that referenced this issue Mar 25, 2019
Contributes to: #172

Signed-off-by: Dale Lane <dale.lane@uk.ibm.com>