We ran a project in Baden-Württemberg, Germany (bwFDM-Communities), in which we interviewed almost 800 scientists from all disciplines in our federal state in depth about their data handling and further needs, so we have a very detailed overview. The results are only available in German, but I will try to extract short answers to your data-culture questions from a community perspective.
How FAIR are current research practices in different parts of the research ecosystem (e.g., disciplines, sectors, geographic regions…)?
First, consider that half of our scientists were happy with data availability in their disciplines. They may not always know what they are missing, and a good third were unhappy, but it is hard to motivate people beyond that without presenting clear individual benefits. FAIR practices vary a lot, even within a single discipline (such as architecture or zoology), but there are also connections beyond specific domains. I once put this picture on a slide to aid our understanding; it may help to categorize from a different perspective (data contributes to a scientific goal and is more or less important): https://doi.org/10.6084/m9.figshare.5117728
Please also keep the long tail of science in mind. In our survey, only 10% of researchers had already published data openly. That there "is" data can be seen by comparing this number to the 40% who had already shared data when asked for it, and the 90% who had already shared data within a project or their research group. This is not fully representative, because small groups were overrepresented in our survey (group-wise questioning), but these numbers can still serve as rough estimates of the situation in Germany three years ago.
What are good examples of aggregating large amounts of data of different origins and how this offers new scientific possibilities?
243 user stories asked for a better-suited repository, or for a useful repository at all:
Domain repositories were preferred over institutional ones, especially when a need to search for data for one's own research progress was expressed.
Institutional repositories were requested for archiving data (though even there they were not preferred). These requests almost always had a specific motivation, e.g. to keep full control over access, link (automatically) to one's own publications, gain central visibility, store large amounts of data, or offer reliability, trustworthiness and similar properties.
199 user stories asked for a better scientific culture in their disciplines.
122 user stories asked for more consistent (meta)data formats and standards.
89 user stories asked for more compatible software.
38 user stories asked for clearer guidelines on which formats, archival processes etc. to use.
Trust is also very important. A lot of data will not be trusted enough to encourage others to build their own work on it, especially when the corresponding paper is not in a high-impact journal. This has to do with the replication crisis: https://doi.org/10.1038/533452a
What could different players do to increase the FAIRness of data within their remit? How have research workflows actually been adapted in order to make data more FAIR? How much of this can be automated?
In my opinion we need a certificate for OAIS-conformant organisational structures at universities (not a repository or technology issue). That means checking that, for each created set of data, there is a person in charge of making it FAIR, and that this is monitored. It is not enough to say "the scientists should..."; scientists are just "producers". The management and responsibility roles have to be clearly defined from the research group upwards, cover the case of scientists leaving, and incorporate professional data managers. This is not difficult to do, but difficult to enforce.
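The organisational check described above can be sketched mechanically. This is a minimal illustration, assuming a hypothetical inventory that maps each dataset to a named steward; the dataset names and `unassigned` helper are made up for the example, not part of any real OAIS tooling:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical inventory model: every created dataset must have a named
# person in charge of making it FAIR, and the check must survive staff
# turnover (the "leaving scientists" case).

@dataclass
class Dataset:
    name: str
    steward: Optional[str]  # person responsible for FAIRness, if assigned

def unassigned(datasets, active_staff):
    """Return datasets whose FAIRness responsibility is unfilled,
    either because no steward was named or the steward has left."""
    return [d.name for d in datasets
            if d.steward is None or d.steward not in active_staff]

inventory = [
    Dataset("field-measurements-2017", steward="a.mueller"),
    Dataset("simulation-runs-v3", steward=None),        # never assigned
    Dataset("survey-raw-data", steward="b.schmidt"),    # steward has left
]
print(unassigned(inventory, active_staff={"a.mueller"}))
# -> ['simulation-runs-v3', 'survey-raw-data']
```

The point of the sketch is that the control is a simple query once responsibility is recorded at all; the hard part, as said above, is enforcing that the record exists.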
We should teach more about automation possibilities and support them in specific disciplines. This is an easy way to combine better data structures and reproducibility with higher research speed.
We have a lot of different pilot projects in Baden-Württemberg that try to make data more FAIR. To wrap it up, I would answer two things.

First: the publication of data is moving earlier in the scientific workflow. This is often possible, although there are "fears". To work around them, I would suggest giving data the same credit as the publication it directly supports, if it is published up to two years before that publication, even if it is not mentioned there. That way, even if another group overtakes one's own "discovery" by stealing and "reproducing" the data, one could claim the same discovery without writing a paper, provided one's own data is FAIR enough. (That is the idea in a nutshell.) This incentive would wipe out most sharing fears and FAIR problems.

Second: scientific workflows are being built more robust, reproducible and standardised. The most difficult thing here is transferring individual developments across universities or domains. I have no full solution for that. What we are trying now is to establish a local professional network to support this, and to discuss with every new "data project" some aspects of further use from the start. But it also depends on the incentive to reuse others' work, which runs up against the "not invented here" (NIH) syndrome.
Critical players in a discipline can be service or data providers (such as large instruments), which can foster a good culture across whole disciplines through their terms of use.
Repositories should invest in user-friendly functionality and get closer to the users, e.g. by offering:
export functions to generic data formats
(web) software to visualize data
(web) software to analyze data (e.g. statistically) and to search by available analysis output
a discussion page for each data set (e.g. with answers from the author)
What training opportunities and career paths are needed to support researchers and other players in the research ecosystem with data management and sharing?
A researcher who contributes mainly to a group's infrastructure (by caring for the data) needs a real career perspective.
What improvements could be made to the current EC approach to DMPs?
How can DMPs become more integrated and machine-actionable?
I would like to see some standardised, nicely coloured (and machine-readable) data-flow pictures at the end of DMPs: what the data sources are, what the processing steps are, and where all products or primary data will be published and/or archived. Reviewers could then easily see whether the important data is included in the plan and whether it "ends" green, yellow or red. Nobody wants to read or write these things in longish DMPs.
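A machine-readable data flow of this kind could be as simple as a small graph. The following is a sketch under assumptions of my own: the `flow` schema, node names, and the `traffic_light` check are invented for illustration (no existing DMP standard is implied), and the "yellow" case (e.g. restricted access) is omitted for brevity:

```python
from collections import deque

# Made-up machine-readable DMP data flow: nodes are sources, processing
# steps, and end points; edges say where data moves. A source is "green"
# if some path ends in a published or archived state, otherwise "red".
flow = {
    "nodes": {
        "sensor-raw":      {"kind": "source"},
        "interview-notes": {"kind": "source"},
        "cleaning":        {"kind": "processing"},
        "analysis":        {"kind": "processing"},
        "domain-repo":     {"kind": "end", "action": "published"},
        "local-disk":      {"kind": "end", "action": "none"},
    },
    "edges": [
        ("sensor-raw", "cleaning"),
        ("cleaning", "analysis"),
        ("analysis", "domain-repo"),
        ("interview-notes", "local-disk"),
    ],
}

def traffic_light(flow, start):
    """Breadth-first search from a data source; report whether any
    reachable end point publishes or archives the data."""
    seen, queue = {start}, deque([start])
    ok = False
    while queue:
        node = queue.popleft()
        info = flow["nodes"][node]
        if info["kind"] == "end" and info.get("action") in ("published", "archived"):
            ok = True
        for a, b in flow["edges"]:
            if a == node and b not in seen:
                seen.add(b)
                queue.append(b)
    return "green" if ok else "red"

print(traffic_light(flow, "sensor-raw"))       # -> green
print(traffic_light(flow, "interview-notes"))  # -> red
```

A reviewer tool could render exactly this structure as the coloured picture, so the same file serves both the human reader and the machine.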
Very interesting - a related study
Van den Eynden, Veerle; Knight, Gareth; Vlad, Anca; Radler, Barry; Tenopir, Carol; Leon, David; Manista, Frank; Whitworth, Jimmy; Corti, Louise (2016): Survey of Wellcome researchers and their attitudes to open research. figshare. https://dx.doi.org/10.6084/m9.figshare.4055448.v1 Retrieved: 20:19, Dec 05, 2016 (GMT)
The EC EOSC Intents at the EOSC Summit included:
[FAIR Incentives] incentives given to research organisations and brokers to commit.
The implication of this is that it is data-centre based. But it's not: there is a key role for institutional repositories, for public archives, and for subject-specific repositories. It is important to enable and support specialist data providers, who are often PIs running specialist systems, for example the CATH protein structure classification database run by Prof Orengo of UCL.
What are the key barriers to FAIR practices (e.g. lack of metadata standards, domain repositories, data sharing norms, …)?
The user stories we collected on this in our survey are in Chapters 2.3, 3.3 and 5.2 of our final report (http://bwfdm.scc.kit.edu/downloads/Abschlussbericht.pdf, in German).