If… such a machine is a virtual impossibility, then it must logically be a finite improbability.
— The Hitchhiker’s Guide to the Galaxy, Douglas Adams
Traditional Problems
Modern science runs on code and large language models (LLMs). But when it comes to sharing that code – or disclosing what data the models were trained on – many researchers react as though they’ve just been asked to recite Vogon poetry in public.
Why? Often the code is a working draft: undocumented, messy, or tied to a researcher’s personal laptop. Researchers worry that sharing their code will expose errors. Many of them are also not trained as software engineers and lack the skills, tools, or time to write clean, shareable code for their research.
There are also institutional barriers: lack of access to infrastructures or organizational policies that discourage code sharing.
As a result, a significant amount of research software resembles the “infinite improbability drive” – powerful, but nobody really knows how it works or where it came from.
What Is “Open Code and Software”?
Open research software means code that is used to generate or analyze research and that is:
- transparent – anyone can read it, test it, or critique it;
- reusable – with documentation, examples, and permissive licensing;
- reproducible – allowing others to repeat the results of your research;
- improved through collaboration – others can fix errors, add features, or translate it into another language.
The broader definition of open source does not mean only access to the source code. As noted by the Open Source Initiative (OSI, 2024), “The distribution terms of open-source software must comply with the following criteria: free redistribution; the program must include source code and must allow distribution both in source code and in compiled form; integrity of the author’s source code; the license must allow modifications and derived works; it must not discriminate against persons or groups, nor against fields of endeavor; the rights attached to the program must apply to all; the license must not restrict other software, and the license must be technology-neutral.”
Strange as it may seem, science can have code that is version-controlled, community-managed, bug-tracked, and properly licensed – instead of mysterious scripts emailed around with names like: analysis_final_v7_definitely_final_REAL_FINAL_FINAL
How Can Open Code and Software Solve Traditional Problems?
Open code is not just about sharing – it’s a better way to create software for science. The infamous syndrome of “it only works on my laptop” can be solved through open software by using version control and containerization (e.g., Docker), making code portable and reproducible.
The problem of readability is solved through structured documentation, such as README files aligned with community standards. This turns so-called “spaghetti code” into a reusable tool. When code is buggy, issue tracking and peer review on platforms like GitHub allow many eyes to test and fix errors far more quickly.
Researchers can even gain academic credit for sharing software by making it citable, for example with a DOI minted through Zenodo.
Collaborative development not only improves, extends, and maintains the code, but also builds researchers’ skills and capacities along the way.
Practical Steps for Using Open Code and Software
Step 1: Put your code under version control from the very beginning, and commit as often as possible!
- Use Git together with a platform like GitHub or GitLab from the moment you start writing code, not just at the end. Commit your changes and track their history!
Step 2: Choose the right license!
- Licensing tells users what they can and cannot do with your software. Use choosealicense.com to find the most suitable license for your work.
Step 3: Document everything!
- Include a README.md file so users know how to run the software.
- Use docstrings and comments.
- Add usage examples or a small test dataset.
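As a minimal sketch of what Step 3 can look like in practice, the hypothetical function below carries a docstring that explains its inputs and outputs and embeds a usage example. The function name `normalize` and its behavior are invented for illustration; the point is only that Python's standard `doctest` module can run the example in the docstring, so the documentation doubles as a small test.

```python
import doctest


def normalize(values):
    """Scale a list of numbers so that they sum to 1.

    Parameters
    ----------
    values : list of float
        Measurements to rescale, e.g. raw counts per sample.

    Returns
    -------
    list of float
        The input values divided by their total.

    Raises
    ------
    ValueError
        If the values sum to zero, since the result would be undefined.

    Examples
    --------
    >>> normalize([1, 1, 2])
    [0.25, 0.25, 0.5]
    """
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    # Divide each entry by the total so that the output sums to 1.
    return [v / total for v in values]


if __name__ == "__main__":
    # Run the example embedded in the docstring; failures are printed,
    # silence means the documented example still matches the code.
    doctest.run_docstring_examples(normalize, globals(), name="normalize")
```

If the function changes and the documented example no longer holds, the doctest run reports the mismatch, which keeps documentation and code from drifting apart.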
Step 4: Package and publish!
- Package the software with tools like setuptools so that others can install it easily with pip or conda.
- Create a DOI with Zenodo, linked to your GitHub repository.
- Register your code in community repositories such as bio.tools (life sciences), PyPI (Python), CRAN (R), or Software Heritage.
Step 5: Contribute and acknowledge community contributions!
- Encourage issue reporting.
- Include your ORCID identifier in the repository.
- Add citation information for your software using CITATION.cff or codemeta.json.
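The citation files mentioned in Step 5 are small and can be written by hand, but a short script shows what goes into them. The sketch below builds a minimal codemeta.json (CodeMeta 2.0) and a minimal CITATION.cff (CFF 1.2.0) from one dictionary. Every value in `SOFTWARE` is a hypothetical placeholder, including the repository URL and the ORCID iD, and the helper names are assumptions made for this example rather than part of any official tool.

```python
import json
from pathlib import Path

# All values below are hypothetical placeholders; replace them with your own.
SOFTWARE = {
    "title": "example-analysis",
    "version": "0.1.0",
    "license": "MIT",  # SPDX license identifier
    "repository": "https://example.org/lab/example-analysis",
    "given_name": "Ada",
    "family_name": "Example",
    "orcid": "https://orcid.org/0000-0000-0000-0000",
}


def build_codemeta(info):
    """Return a minimal CodeMeta 2.0 description as a dictionary."""
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": info["title"],
        "version": info["version"],
        "license": f"https://spdx.org/licenses/{info['license']}",
        "codeRepository": info["repository"],
        "author": [
            {
                "@type": "Person",
                "@id": info["orcid"],
                "givenName": info["given_name"],
                "familyName": info["family_name"],
            }
        ],
    }


def build_citation_cff(info):
    """Return the text of a minimal CITATION.cff file (CFF 1.2.0)."""
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{info["title"]}"',
        f'version: "{info["version"]}"',
        f'license: {info["license"]}',
        f'repository-code: "{info["repository"]}"',
        "authors:",
        f'  - family-names: "{info["family_name"]}"',
        f'    given-names: "{info["given_name"]}"',
        f'    orcid: "{info["orcid"]}"',
    ]
    return "\n".join(lines) + "\n"


def write_metadata(info, directory):
    """Write both metadata files into the given directory."""
    target = Path(directory)
    codemeta = json.dumps(build_codemeta(info), indent=2) + "\n"
    (target / "codemeta.json").write_text(codemeta, encoding="utf-8")
    (target / "CITATION.cff").write_text(build_citation_cff(info), encoding="utf-8")
```

Calling `write_metadata(SOFTWARE, ".")` from the top level of a repository would place both files next to the README. In most projects only one of the two formats is needed to start with; keeping both in sync from a single source, as here, is one possible design choice.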
Good Practices and EU Examples
EOSC
The European Open Science Cloud (EOSC) supports open software as a key component of its data and research infrastructure, promoting FAIR principles for software and shared repositories.
ELIXIR
In the life sciences, ELIXIR encourages sharing tools with standardized metadata, tests, and containers, leading to more interoperable and discoverable code.
Horizon Europe
Projects funded under Horizon Europe must ensure that research software is:
- Open, wherever possible.
- Documented and reusable.
- Linked with persistent identifiers.
- Ideally, compliant with FAIR4RS principles (Findable, Accessible, Interoperable, and Reusable Research Software).
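A rough self-check against the list above can be automated. The sketch below looks in the top level of a repository for the files that usually signal each practice: a README for documentation, a license file for openness, and CITATION.cff or codemeta.json for citation metadata. The file names in `CHECKS` are common conventions chosen for this example, not an official list, and finding these files is only a loose proxy: it is not a FAIR4RS or Horizon Europe compliance test.

```python
from pathlib import Path

# Hypothetical checklist: file names commonly used for each good practice.
CHECKS = {
    "documentation": ["README.md", "README.rst", "README.txt", "README"],
    "license": ["LICENSE", "LICENSE.md", "LICENSE.txt", "COPYING"],
    "citation metadata": ["CITATION.cff", "codemeta.json"],
}


def check_repository(path):
    """Report which good-practice markers exist at the top level of a repository."""
    root = Path(path)
    report = {
        label: any((root / name).is_file() for name in candidates)
        for label, candidates in CHECKS.items()
    }
    # Git keeps its bookkeeping in a ".git" entry at the repository root.
    report["version control"] = (root / ".git").exists()
    return report


if __name__ == "__main__":
    # Check the current working directory and print one line per practice.
    for label, found in check_repository(".").items():
        status = "found" if found else "missing"
        print(f"{status:8}{label}")
```

Running the script from inside a project directory prints one line per practice, which makes it easy to see what is still missing before publishing.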
CodeMeta and Citation File Format (CFF)
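The CodeMeta project encourages researchers to include machine-readable metadata about software contributors, authorship, versions, and usage. The Citation File Format serves a related purpose: a CITATION.cff file tells others how to cite the software.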
SSI
The Software Sustainability Institute (SSI) provides templates, guidelines, and community training in best practices for research software engineering across disciplines.
Conclusion
It is sometimes difficult for researchers to accept that they cannot solve all problems alone. But open code is not like the Ultimate Answer “42” – it is not perfect. It must be clear, developed collaboratively with the community, and designed for continuity. By opening their research software, scientists make science transparent, verifiable, and usable.
So, if researchers keep their scripts under version control, include README files, specify a license, and share their code, the Galaxy – or at least the academic community – will be grateful.