Research Data Culture Questions - community perspective #11

Open
FrTr opened this issue Jun 19, 2017 · 5 comments

FrTr commented Jun 19, 2017

We ran a project in Baden-Württemberg, Germany (bwFDM-Communities), in which we interviewed almost 800 scientists from all disciplines in our federal state in depth about their data handling and further needs, so we have a very detailed overview. The results are available only in German. However, I will try to extract short answers to your data culture questions from a community perspective today.

How FAIR are current research practices in different parts of the research ecosystem (e.g., disciplines, sectors, geographic regions…)?

  • You should first consider that half of our scientists were happy with data availability in their disciplines. They might not always know what they are missing, and a good third was unhappy, but it is hard to motivate researchers beyond that without presenting clear individual benefits. FAIR practices vary a lot, even within a discipline (like architecture or zoology), but there are also connections that reach beyond specific domains. I once used this picture on a slide to improve our understanding, and it might help to categorize from a different perspective (data contributes to a scientific goal and is more or less important): https://doi.org/10.6084/m9.figshare.5117728
  • Please also keep the long tail of science in mind. In our survey only 10% of the researchers had already published data openly. That the data "is" there can be seen by comparing this number with the 40% who had already shared data when asked for it and the 90% who had already shared data within a project or their research group. This is not fully representative, because small groups were overrepresented in our survey (group-wise questioning), but these numbers can still serve as rough estimates for the situation in Germany 3 years ago.

What are good examples of aggregating large amounts of data of different origins and how this offers new scientific possibilities?

What are the key barriers to FAIR practices (e.g. lack of metadata standards, domain repositories, data sharing norms, …)?

In our survey we collected the following for Chapters 2.3, 3.3 and 5.2 of the final report (http://bwfdm.scc.kit.edu/downloads/Abschlussbericht.pdf, in German!):

  • 243 user stories wishing for a better-suited repository, or for a usable repository at all
    Domain repositories are preferred over institutional ones, especially when the user story expressed a need to search for data to advance the researcher's own work
    "Institutional" repositories were wished for when "archiving" data (but were not the preferred option there either). These wishes almost always had a specific motivation, e.g. to: keep full control over access, link (automatically) to own publications, have central visibility, store large amounts of data, have reliability, trustworthiness and similar properties
  • 199 user stories wishing for a better scientific culture in their disciplines
  • 122 user stories wishing for more consistent (meta)data formats and standards
  • 89 user stories wishing for more compatible software
  • 38 user stories wishing for clearer guidelines on which formats, archival process etc. to use
  • Trust is also very important. A lot of data will not be trusted enough to encourage others to build their own work on it, especially when the corresponding paper is not in a high-impact journal. This is related to the replication crisis: https://doi.org/10.1038%2F533452a

What could different players do to increase the FAIRness of data within their remit? How have research workflows actually been adapted in order to make data more FAIR? How much of this can be automated?

  • In my opinion we need a certificate for OAIS-conformant organisational structures at universities (this is not a repository or technology issue). That means checking whether, for each data set created, there is a person in charge of making it FAIR, and whether there is oversight of that. It is not enough to say "the scientists should..."; scientists are just "producers". The management and responsibility roles have to be clearly defined from the group level upwards, cover the case of scientists who leave, and incorporate professional data managers. This is not difficult to do, but difficult to enforce.
  • We should teach more about automation possibilities and support these in specific disciplines. This is an easy way to combine better data structures and reproducibility with higher research speed.
  • We have a lot of different pilot projects in Baden-Württemberg that try to make data more FAIR. To sum up, I would answer with two things:
    First: Data publication is moving earlier in the scientific workflow. Often this is possible, although there are "fears". To work around that, I would suggest giving data the same credit as the publication it directly supports if the data was published up to 2 years before that publication, even if the data is not mentioned in it. That way, even if another group overtakes one's own "discovery" by stealing and "reproducing" the data, one could claim the same discovery without writing a paper, provided one's own data is FAIR enough. (That is the idea in a nutshell.) This incentive would wipe out most "sharing fears" and FAIR problems.
    Second: Scientific workflows are being built to be more robust, reproducible and standardised. The most difficult thing here is to transfer individual developments across universities or domains. I have no full solution for that. What we are trying now is to establish a local professional network to support this and to discuss some aspects of further use with all new "data projects" from the start. But it also depends on the incentive to reuse others' work, which runs up against the "NIH" ("not invented here") syndrome.
  • Critical players in disciplines can be service or data providers (like large instruments), which can foster a good culture in whole disciplines through their terms of use.
  • Repositories should invest in user-friendly functionality and get closer to the users, e.g. by offering:
  1. export functions to generic data formats
  2. (web)software to visualize data
  3. (web)software to analyze data (e.g. statistically) and search by available analysis output
  4. discussion site for each data set (e.g. with answers from the author)

What training opportunities and career paths are needed to support researchers and other players in the research ecosystem with data management and sharing?

  • A researcher who contributes mainly to a group's infrastructure (by caring for the data) needs real career prospects.

What improvements could be made to the current EC approach to DMPs?
How can DMPs become more integrated and machine-actionable?

  • I would like to see some standardised, nicely coloured (and machine-readable) data-flow pictures at the end of DMPs: what the data sources are, what the processing steps are, and where all products or primary data will be published and/or archived. Reviewers could then easily see whether the important data is included in the plan and whether it "ends" green, red or yellow. Nobody wants to read or write these things in longish DMPs.
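
A rough sketch of what the machine-readable side of such a data-flow picture could look like. Everything here is an assumption for illustration: the `DataFlowEnd` record, its field names, and the colour rule (published and archived = green, only one of the two = yellow, neither = red) are hypothetical and not taken from any existing DMP standard or from the bwFDM report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFlowEnd:
    """One end point of a DMP data flow (hypothetical schema)."""
    name: str
    sources: tuple            # names of the upstream data sources
    steps: tuple              # names of the processing steps
    published: bool = False   # will it be published?
    archived: bool = False    # will it be archived?


def colour(end):
    """Assumed traffic-light rule; a real scheme would need community agreement."""
    if end.published and end.archived:
        return "green"
    if end.published or end.archived:
        return "yellow"
    return "red"


def review(ends):
    """Map every end point of the plan to its colour for a quick reviewer overview."""
    return {end.name: colour(end) for end in ends}


# Illustrative plan with made-up entries.
plan = [
    DataFlowEnd("raw measurements", ("instrument",), (), archived=True),
    DataFlowEnd("cleaned table", ("instrument",), ("calibration",),
                published=True, archived=True),
    DataFlowEnd("intermediate files", ("instrument",), ("calibration",)),
]

for name, status in review(plan).items():
    print(f"{name}: {status}")
```

The same records could feed both a rendered picture and an automatic check, so a reviewer would only need to look at the end points that are not green.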

band commented Jun 19, 2017

👍

band commented Jun 19, 2017

For some reason (that I cannot figure out) all the URLs, except for the link to the PDF file, resolve to this: https://github.com/FAIR-Data-EG/consultation/issues/url (and return 404s).

FrTr commented Jun 20, 2017

Thx, now it should work.

@CaroleGoble

Very interesting - a related study:
Van den Eynden, Veerle; Knight, Gareth; Vlad, Anca; Radler, Barry; Tenopir, Carol; Leon, David; Manista, Frank; Whitworth, Jimmy; Corti, Louise (2016): Survey of Wellcome researchers and their attitudes to open research. figshare.
https://dx.doi.org/10.6084/m9.figshare.4055448.v1
Retrieved: 20:19, Dec 05, 2016 (GMT)

CaroleGoble commented Jul 31, 2017

The EC EOSC Intents at the EOSCSummit included:
[FAIR Incentives] incentives given to research organisations and brokers to commit.
The implication of this is that it is data-centre based. But it's not. There is a key role for institutional repositories, for public archives and for subject-specific repositories. It is important to enable and support specialist data providers, who are often PIs running specialist systems - for example, the CATH protein structure classification database run by Prof Orengo of UCL.
