“Space,” it says, “is big. Really big. You just won’t believe how vastly, hugely, mind-bogglingly big it is.”
— The Hitchhiker’s Guide to the Galaxy, Douglas Adams
Traditional Problems
Researchers love their data right up until the moment they are asked to share it. Despite funders’ requirements and the FAIR principles (Findable, Accessible, Interoperable, and Reusable), many researchers still hesitate to make their data publicly available. One major reason is fear: fear that others might use their data and scoop them with faster results. Given the significant time and effort researchers invest in collecting and processing data, they are understandably reluctant to let it go.

Another obstacle is lack of knowledge about how to organize and describe metadata. Many researchers are also unfamiliar with the ethical and legal aspects of data processing, particularly when dealing with sensitive information. This creates uncertainty about how and where to store and share data – and what the benefits might actually be for their research.

The lack of recognition doesn’t help either. Researchers rarely get credit for sharing data, and many view it as just another task adding to their workload and burnout. Taken together, these factors limit data sharing and reuse across the scientific community.
What are “Open Research Data”?
Open research data include datasets, protocols, materials, lab notebooks, software code, workflows, and other outputs – all aligned with the FAIR principles – that are freely accessible and ideally without further restrictions. The main aim is not simply to publish and make data transparent, but to make them usable: to provide clarity about how, where, and with what methods the data were collected so that others can check and validate results. This makes research reproducible, enabling other scientists to reuse and build upon open data. The payoff is more citations and new partnerships.
How Can Open Data Solve Traditional Problems?
If properly structured, open research data can help address many of researchers’ anxieties. Data, metadata, and collection protocols make the entire process visible and traceable, enabling faster reproduction of results. Data citation and links to ORCID extend research impact beyond publications and citations. Shared code or lab notebooks act as automated documentation, saving time and endless explanations. Trusted repositories help researchers structure their data according to FAIR principles, provide secure storage and access, and eliminate the risk of losing valuable datasets. Even the biggest fear – plagiarism – can be addressed with preregistration and DOIs, which clearly establish researchers’ priority and ownership.
Practical Steps to Open Data the FAIR Way
FAIR is not just another trendy acronym. As the Bulgarian Open Science Portal notes, these are principles for describing and disseminating research data that underpin open science initiatives. According to FAIR principles, research resources should be:
- Findable – described with detailed metadata and persistent identifiers.
- Accessible – available to both humans and machines via standard protocols.
- Interoperable – using standard formats and metadata schemas, enabling exchange across systems.
- Reusable – with clear licenses and access rights.
They are, in short, a life-support system for valuable research data. More information on FAIR can be found on the website of GO FAIR.
Step 1: Think about data from the very beginning, not the end!
- Draft a Data Management Plan (DMP) using tools like DMPonline or ARGOS.
- Identify data types, formats, and any ethical or legal risks early.
Step 2: Choose the right repository!
- General repositories: Zenodo, Figshare, Dryad.
- Discipline-specific: PANGAEA (Earth sciences), GenBank (biology), OpenNeuro (neurosciences).
- Institutional: many universities and research organizations maintain their own.
- European: OpenAIRE, the European Open Science Cloud (EOSC).
Always ensure your data have persistent identifiers (DOIs) and clear licensing (e.g., CC0, CC BY, or ODC-By for data).
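A DOI and a license are, in the end, just well-structured metadata attached to your deposit. As a hedged illustration, the sketch below assembles a minimal record loosely modelled on the DataCite metadata schema; the top-level field names follow DataCite’s published properties, but the `build_record` helper, the placeholder DOI, and all values are hypothetical, not an official client.

```python
import json

def build_record(doi, title, creators, year, license_id):
    """Assemble a minimal DataCite-style metadata record.

    Field names (identifier, creators, titles, publicationYear,
    rightsList) follow the DataCite schema; this helper itself is a
    hypothetical illustration, not an official API.
    """
    return {
        "identifier": {"identifier": doi, "identifierType": "DOI"},
        "creators": [{"name": name} for name in creators],
        "titles": [{"title": title}],
        "publicationYear": year,
        "rightsList": [{"rightsIdentifier": license_id}],
    }

record = build_record(
    doi="10.5281/zenodo.0000000",  # placeholder, not a real DOI
    title="Example survey dataset",
    creators=["Doe, Jane"],
    year=2024,
    license_id="CC-BY-4.0",        # SPDX identifier for CC BY 4.0
)
print(json.dumps(record, indent=2))
```

Repositories such as Zenodo generate records like this for you from a web form; the point is only that “persistent identifier plus clear license” is machine-readable structure, not an afterthought.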
Step 3: Apply FAIR principles!
- Add detailed metadata answering Who? What? When? How? and Why?
- Use open, non-proprietary formats where possible (.csv instead of .xlsx, .txt instead of .doc).
- Employ controlled vocabularies or ontologies for clarity.
- Write documentation for humans and machines (README files are your best friends).
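The points above can be sketched in a few lines of Python. This is only a minimal illustration under assumed file names and toy data: it writes a small table as plain .csv (an open format), pairs it with a README answering the Who/What/When/How/Why questions for human readers, and adds a JSON sidecar carrying the same core metadata for machines.

```python
import csv
import json
from pathlib import Path

rows = [
    {"site": "A", "temperature_c": 21.5},
    {"site": "B", "temperature_c": 19.8},
]

# Open, non-proprietary format: plain CSV instead of a spreadsheet binary.
with open("measurements.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["site", "temperature_c"])
    writer.writeheader()
    writer.writerows(rows)

# Human-readable documentation: a README answering Who/What/When/How/Why.
Path("README.txt").write_text(
    "Who:  Example Research Group (hypothetical)\n"
    "What: Air temperature measurements from two field sites\n"
    "When: Collected June 2024\n"
    "How:  Calibrated digital thermometer, one daily reading per site\n"
    "Why:  Baseline data for a microclimate study\n"
)

# Machine-readable sidecar with the same core metadata.
metadata = {
    "title": "Air temperature measurements",
    "creator": "Example Research Group",
    "date_collected": "2024-06",
    "method": "calibrated digital thermometer",
    "license": "CC0-1.0",
}
Path("measurements.metadata.json").write_text(json.dumps(metadata, indent=2))
```

In practice you would replace the ad-hoc JSON keys with a community metadata schema and a controlled vocabulary, but the habit is the same: every dataset travels with documentation for both audiences.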
Step 4: Handle sensitive data carefully!
- Use anonymization techniques.
- Apply restricted-access licenses when needed.
- Follow GDPR and institutional ethics policies!
- Use controlled-access repositories such as the European Genome-Phenome Archive (EGA).
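As one hedged illustration of the anonymization point, the sketch below pseudonymizes participant identifiers with a salted SHA-256 hash and drops a direct identifier column. All names and values here are invented, and real sensitive data need a proper disclosure-risk assessment (quasi-identifiers can still re-identify people), so treat this strictly as a minimal sketch, not a compliance recipe.

```python
import hashlib

# Secret salt kept separately from the published data: with it, the
# mapping can be recomputed internally; without it, the pseudonyms
# cannot be linked back to the original IDs by guessing.
SALT = b"replace-with-a-secret-random-value"

def pseudonymize(participant_id: str) -> str:
    """Return a stable, hard-to-reverse pseudonym for an identifier."""
    digest = hashlib.sha256(SALT + participant_id.encode("utf-8"))
    return digest.hexdigest()[:12]

records = [
    {"participant_id": "P-001", "name": "Jane Doe", "score": 42},
    {"participant_id": "P-002", "name": "John Roe", "score": 37},
]

# Drop the direct identifier (name) and replace the ID with a pseudonym.
anonymized = [
    {"pid": pseudonymize(r["participant_id"]), "score": r["score"]}
    for r in records
]
print(anonymized)
```

Because the hash is deterministic, the same participant keeps the same pseudonym across files, which preserves linkability for analysis while keeping the published table free of direct identifiers.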
Good Practices and EU Examples – Repositories
Open Data Portal (Bulgaria): access to public data in open, machine-readable formats.
Horizon Europe Programme: requires funded projects to make data open whenever possible; sensitive data may remain closed only when necessary. All data must be deposited in trusted repositories with DMPs included in reporting.
OpenAIRE: a mega-index linking data, publications, software, and projects worldwide – fully open and machine-readable.
EOSC (European Open Science Cloud): a federated infrastructure for sharing data across the EU – think of it as the Heart of Gold for research objects.
Two further examples are the Social Sciences and Humanities Open Cloud (SSHOC) and the European Life-Science Infrastructure for Biological Information (ELIXIR):
- SSHOC builds domain-specific open data workflows in the social sciences and humanities;
- ELIXIR maintains interoperable data infrastructures for life sciences.
These are more than repositories – they are ecosystems that sustain, connect, and reward your data long after your article is published.
Conclusion
Opening data is not always easy, but it is always worthwhile. It makes science verifiable, transparent, and reliable, while helping researchers structure their knowledge.
So – don’t panic. Just tidy your data, label your files clearly, and deposit your datasets in trusted repositories. Remember: the best place for your data is not a USB stick in the back of a desk drawer, but a galaxy-sized repository with its own DOI.
In short, open data is about turning your research from a closed galaxy into a shared, expanding universe.