
Clarifying data collection to end users #172

Open
kevinrobinson opened this issue Mar 15, 2019 · 5 comments

kevinrobinson commented Mar 15, 2019

This came from an awesome helpful Twitter thread: https://twitter.com/dalelane/status/1106565327457054720, thanks @dalelane! 👍

I read https://cloud.ibm.com/docs/services/assistant?topic=assistant-information-security#information-security as suggesting that products using IBM services own the responsibility for ethical use, and the legal liability for ensuring compliance with local laws and regulations (e.g., GDPR).

I also see pretty explicit guidance on personally identifiable information that doesn't seem like it's presented to end users:

[Screenshot: IBM Cloud documentation guidance on personally identifiable information]

To me, helping young people understand these responsibilities seems super important for teaching them how to build AI systems ethically. Would you be open to a pull request that tried to do this?

kevinrobinson (Author) commented

Oops, I think I put this on the wrong issue earlier, sorry!

Also, the help page awesomely describes the site's use of things like the analytics and error-reporting services.

[Screenshot: help page text describing the site's use of analytics and error-reporting services]

Teachers or parents may read this copy and expect it to be the exhaustive list of where the data they submit is stored. But it doesn't mention anything about sending data to IBM services, or how that data is stored there, so it seems like maybe this should be updated to include that as well.

What do you think makes sense?

dalelane (Member) commented

Yeah, I'd be happy with a pull request that improved the wording.

The English version of the text is here:

"Q4": "What information is stored about users?",
"Q4-A-1": "Teacher accounts",
"Q4-A-2": "I store the username and email address that you provide when you create the account, so that I can keep in touch with you.",
"Q4-A-3": "Student accounts",
"Q4-A-4": "I store the usernames for student accounts, so that I can let them log in. I have no email addresses or contact details for students. I have no other personally identifying information about students.",
"Q4-A-5": "I also <a href='https://github.com/IBM/taxinomitis-docs/raw/master/docs/pdf/machinelearningforkids-schools.pdf'>recommend that students are given generic usernames</a> (e.g. student1) so that students are not identifiable outside of their school or coding group.",
"Q4-A-6": "General information stored about all accounts",
"Q4-A-7": "User management for Machine Learning for Kids is implemented using the third party service, <a href='https://auth0.com/'>Auth0</a>. They store the IP address that you last logged into Machine Learning for Kids from, and the type of browser you used. I've never found a reason to use that, but it is stored if I did want to go and look for it.",
"Q4-A-8": "Errors that happen in the web browser are captured using the third party service, <a href='https://sentry.io/'>Sentry</a>. If something goes wrong, it will capture information about the error, including your username, IP address, type of browser you were using, and a technical description of what went wrong.",
"Q4-A-9": "I use <a href='https://www.google.com/analytics/'>Google Analytics</a> so that I know how many users visit Machine Learning for Kids each day. Although it captures information such as geographic location and browser type, this is only ever displayed to me in an anonymised aggregate way."

And is displayed here:

<div class="panel panel-default">
<div class="panel-heading">
<h4 class="panel-title">
<a data-toggle="collapse" href="#helpPersonalInfo" target="_self" translate="HELP.ACCOUNTISSUES.Q4"></a>
</h4>
</div>
<div id="helpPersonalInfo" class="panel-collapse collapse">
<div class="panel-body">
<p><strong translate="HELP.ACCOUNTISSUES.Q4-A-1"></strong></p>
<p translate="HELP.ACCOUNTISSUES.Q4-A-2"></p>
<p><strong translate="HELP.ACCOUNTISSUES.Q4-A-3"></strong></p>
<p translate="HELP.ACCOUNTISSUES.Q4-A-4"></p>
<p translate="HELP.ACCOUNTISSUES.Q4-A-5"></p>
<p><strong translate="HELP.ACCOUNTISSUES.Q4-A-6"></strong></p>
<p translate="HELP.ACCOUNTISSUES.Q4-A-7"></p>
<p translate="HELP.ACCOUNTISSUES.Q4-A-8"></p>
<p translate="HELP.ACCOUNTISSUES.Q4-A-9"></p>
</div>
</div>
</div>
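
As an illustration of the kind of addition being discussed, an extra entry could follow the same pattern as the strings above. The key name and wording here are hypothetical, not taken from the project:

"Q4-A-10": "The examples you add to train a machine learning model are sent to IBM Watson services, where they are stored and used to train your model. The same advice applies there: please don't include real names or other personal information in your training data.",

...displayed with a matching paragraph in the panel body:

<p translate="HELP.ACCOUNTISSUES.Q4-A-10"></p>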

kevinrobinson commented Mar 20, 2019

@dalelane awesome! I opened #173 as a first step, thanks for your help! 👍

I also thought it might be good to put a notice and a link to guidelines directly at the point in the app where folks are adding training data. What do you think? That way folks can see where the data is going, even if they don't go dig into the fine print like I did.

So on the screen like this...
[Screenshot: the page where students add training examples]

...one approach might be to add something like this:
[Screenshot: mock-up of the same page with a short data-collection notice added]

The idea is that this might help prevent any surprises, especially for young children who are new to ML and third-party services. Thanks for listening, and I'm happy to help out with this too, or with other ideas you have. I didn't do anything about this in that first PR, since I wanted to see what you thought first. 👍 Thanks!
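
For concreteness, the kind of inline notice being suggested could be sketched in the training page template like this. The markup, translation key, wording, and link target below are all hypothetical, not taken from the project:

<div class="alert alert-info" role="alert">
    <!-- Hypothetical notice; the key and wording are illustrative only, e.g.
         "The examples you add here are sent to IBM Watson to train your
          machine learning model. Please don't include real names or other
          personal information." -->
    <span translate="TRAINING.DATANOTICE"></span>
    <a href="#!/help" target="_self" translate="TRAINING.DATANOTICELINK"></a>
</div>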

dalelane (Member) commented

I've made an effort to keep the UI for kids as clean, simple, and uncluttered as possible. The majority of the site's users, as far as I know, are of primary school age. I think this sort of legalese warning is unlikely to be useful or helpful to a 7-year-old. At worst, it'll confuse them; at best, they'll likely ignore it. My (admittedly untested) assumption is that a message like this won't really be an effective way to address the issue here.

The simplest approach would be to separate this out for teacher and student users. Give the teacher/parent users all the detailed info (links to more info, explain what is happening, explain the implications, etc.) and keep it out of the student / training UI. I'm very comfortable putting any information in the hands of teachers/parents and letting them decide what to do about it, and what is appropriate to tell their children/students.

The more nuanced approach would be to also add something to the training/student UI, but make it much more child-friendly. If anything is going in the training UI, it needs to be something that would make sense to a young child. That needs more thought about where it should go, the sort of language that should be used, how it should be explained, how it should be presented, etc.

kevinrobinson (Author) commented

@dalelane This is awesome, thanks for sharing all this! ❤️ 👍

Yeah, I think your question gets right to the heart of this - as we make awesome new ways for young children to make their own things with computing, and use more powerful tools like third-party services, how do we teach them about risks along the way, and help them do this ethically and safely? I'm super excited about this project and others like it that are trying to tackle these hard questions, and doing it with young children, rather than limiting kids' access. 💻 😄

For this suggestion, I was thinking the primary audience is first and foremost the CS teachers, volunteers, or parents who would be introducing this to students, to help make sure they're aware of these issues and can decide for themselves what to do. Showing something simpler to young people seems even better, especially if it's understandable and not a wall of legal text.

To brainstorm, I remembered how Scratch cues young people not to use their real names in the sign-in flow. This seems like it might be a good balance:

[Screenshot: Scratch account flow advising users not to use their real name]

From there I tried to mock up something subtle in the UI at the point where children are deciding what to enter as data, but that is also direct. This uses the example from the Twitter thread earlier, where students were entering text messages to train a model that could tell who they were talking to in their family, so the buckets are "mom" and "sister":

Another iteration might check the text in the browser, before it travels over the network, for common things (e.g., names, birthdates, or other potentially personally identifiable information) and warn:
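
As a rough sketch of what that kind of in-browser check could look like before an example is submitted (the patterns, function, and wording below are all hypothetical and illustrative only, not from the project, and a real set of checks would need much more care):

// Hypothetical client-side check, run before a training example is sent to the server.
// The patterns are deliberately simple and purely illustrative.
const PII_PATTERNS: { label: string; pattern: RegExp }[] = [
    { label: 'something that looks like a date (maybe a birthday?)', pattern: /\b\d{1,2}[\/.-]\d{1,2}[\/.-]\d{2,4}\b/ },
    { label: 'something that looks like a phone number', pattern: /\b\d{3}[\s.-]?\d{3}[\s.-]?\d{4}\b/ },
    { label: 'something that looks like an email address', pattern: /\b[\w.+-]+@[\w-]+\.[\w.-]+\b/ },
    { label: 'a phrase that often introduces a real name', pattern: /\bmy name is\b/i },
];

// Returns a child-friendly warning, or null if nothing suspicious was spotted.
export function checkTrainingExample(text: string): string | null {
    for (const { label, pattern } of PII_PATTERNS) {
        if (pattern.test(text)) {
            return 'This looks like it might include ' + label + '. ' +
                   'What you type here is sent to IBM Watson to train your model, ' +
                   'so please don\'t include real names or personal details.';
        }
    }
    return null;
}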

It's hard to strike the balance between keeping learning simple, clean, and low-friction, and helping folks who are new to machine learning understand how to do it right in terms of privacy and ethics, especially with young children.

I'm super happy to keep brainstorming with you on this too, or to pitch in with working on this over time, and help with finding whatever you think the right balance is for your project. The simple UX is so great, and I'm excited to try out more ways of using this with kids. Thanks again for sharing your awesome work in the open! 👍

dalelane added a commit that referenced this issue Mar 25, 2019
Contributes to: #172

Signed-off-by: Dale Lane <dale.lane@uk.ibm.com>
dalelane added a commit that referenced this issue Mar 25, 2019
Contributes to: #172

Signed-off-by: Dale Lane <dale.lane@uk.ibm.com>