How to Make a Great Open-Source Scientific Project

by Jonathon Vandezande · Sept 9, 2025

Noah's Ark from the Nuremberg Chronicle (1493)

Science today is run on many amazing open-source projects, from PyTorch to NumPy to Matplotlib. The continued flourishing of science is predicated on a healthy scientific-software ecosystem and the many contributions of scientists and developers worldwide. Unfortunately, it can be a bit daunting to start contributing to this ecosystem. We've put together this guide to help developers build great open-source scientific projects that can help scientists everywhere.

This is an opinionated guide to Python-based scientific projects, although we hope parts of this guide will be applicable to other languages as well. For those starting new projects, the pixi-cookiecutter provides a simple way to start projects that follow these guidelines, making it easy to write and maintain high-quality code. For those working on existing projects, we recommend focusing on the simplest additions first (e.g. using an autoformatter, having tests run when code is merged, and packaging code in a manner that is easy for others to use)—there's no need to change everything immediately.

If you are interested in developing new projects, we maintain a list of open-source projects that we wish existed. We'd be happy to help get you started developing these projects and provide advice on how best to develop them so that the packages are useful to the scientific community.

We've organized this guide around what we think are the top eight attributes of good open-source projects. Good packages are:

Minimal

Before the days of easily distributable libraries and good packaging practices, programs used to be monoliths, bundling a large number of internally developed routines into a single program. Too many packages today are still beset by this Swiss-Army-knife mentality. Many quantum-mechanics (QM) codes bundle electron-repulsion integral (ERI) calculations, functional and basis-set specifications, geometry optimization, and thermochemistry. This "vertically integrated" way of building software is fragile and unsustainable: it requires the package owners to be experts in a large variety of things, and it prevents reuse across projects. (For instance, a new neural network potential (NNP) might want to use a package's optimization code but not its ERI or basis-set code.)

While there is still a need for applications that cobble many libraries together to achieve an end goal, such as finding the Gibbs free-energy barrier to a reaction, most applications should comprise many focused libraries so that the advances in each library can be shared across multiple applications. However, to achieve this vision, individual libraries must be highly focused and adhere to good programming practices so that the integration becomes simple and updating libraries doesn't break things.

Packaged

Good packages are packaged well. That means it's easy to install them from PyPI, conda-forge, or pyx, and it also means that they play nicely with other packages in a shared environment. Excessive version pinning is the enemy of good science. Lock files are useful to ensure that applications don't randomly break when a package updates, but if you are writing a library, it is important to have automated tests running against multiple versions of packages (e.g. stjames is currently tested with Python 3.11, 3.12, and 3.13).

We recommend using a modern package manager (e.g. pixi or uv) for setting up a clean environment with a lock file for repeatable builds (please don't use Conda environments). No more "it works on my machine"; proper package managers ensure that every developer is running the exact same version of every package, making it easy to pinpoint what packages are causing problems.

Clean

Science runs on Python these days, and we don't mean Python 2.7 or 3.8. Code should be written to run with the latest available version of Python (currently 3.13) and be:

- Autoformatted
- Linted
- Type-checked

These checks should be run on every commit (we use pre-commit) and on every push to the parent repo (we recommend whatever is integrated into your version-control platform, e.g. GitHub Actions).

When setting up the package, it is important to separate the formatting, linting, testing, and training dependencies into a development environment that is distinct from the runtime environment users install (the development environment should ideally be a superset of the runtime environment). In a pyproject.toml, this split is typically expressed with optional dependencies (extras) or dependency groups. Nobody wants to have to download your formatting tools just to run the library. And if you're shipping an ML model, please don't require your entire training pipeline and its dependencies for test-time inference!
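One lightweight pattern for the ML case is to import training-only dependencies lazily, so that inference-only installs never need them. A minimal sketch, assuming a hypothetical mypackage with a [train] extra (wandb here is just a stand-in for any training-only dependency):

```python
def train_model(config: dict) -> None:
    """Train a model; requires the optional training extras."""
    try:
        import wandb  # training-only dependency, imported lazily
    except ImportError as err:
        raise ImportError(
            "train_model() requires the training extras; "
            "install them with `pip install mypackage[train]`"
        ) from err
    ...  # the actual training loop would go here
```

With this pattern, `import mypackage` and inference calls work in a minimal environment; only users who actually call train_model() need the extra dependencies installed.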

Tested

Tests serve the dual purpose of confirming that what you've built performs as expected, and that anything you changed didn't break other parts of the program. Tests can be broken down into:

- Unit tests
- Regression tests
- Integration tests
- Smoke tests
- Fuzz tests

All codebases should have unit tests (we recommend pytest) that run on PRs and merges to master to ensure that code changes don't break anything (again, we recommend whatever is integrated into your version-control platform, like GitHub Actions). If you have difficulty writing a unit test for a function, it is likely because the function is too complicated or because it causes side effects. In these cases, you should either break your code into smaller functions and test them individually, or copy objects instead of mutating them.
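As a minimal sketch of that advice (the function and tests below are hypothetical, but the pattern is plain pytest), a small pure function that returns a new object instead of mutating its argument is trivial to test:

```python
import pytest


def normalize_weights(weights: list[float]) -> list[float]:
    """Return weights rescaled to sum to 1, leaving the input list untouched."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return [w / total for w in weights]


def test_normalize_weights() -> None:
    weights = [2.0, 2.0, 4.0]
    assert normalize_weights(weights) == [0.25, 0.25, 0.5]
    # No hidden side effects: the input is unchanged.
    assert weights == [2.0, 2.0, 4.0]


def test_zero_sum_is_rejected() -> None:
    with pytest.raises(ValueError):
        normalize_weights([0.0, 0.0])
```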

Your code coverage should be tracked (we recommend Codecov) to ensure that tests properly cover all parts of your codebase. Chasing 100% code coverage is rarely worth the effort, but coverage below about 70% often indicates that large portions of the codebase are untested.

Regression tests ensure that old bugs stay fixed, and should be run regularly; the definition of "regularly" depends heavily on how active the codebase is, and can be anything from nightly to monthly (regression tests typically take longer to run than unit tests, so running them on every merge is infeasible for active codebases). They should ideally be linked to a bug database so that there is more information about specific bugs when they inevitably resurface.
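A regression test can be as simple as a unit test whose name and docstring point back at the original bug report. A sketch, where the parser, issue number, and marker are all hypothetical:

```python
import pytest


def split_into_fragments(xyz: str) -> list[str]:
    """Toy stand-in for a real parser: split a multi-structure file into blocks."""
    return [block for block in xyz.split("\n\n") if block.strip()]


@pytest.mark.regression  # assumes a "regression" marker registered in your pytest config
def test_issue_142_empty_input() -> None:
    """Regression test for (hypothetical) issue #142: empty input raised IndexError."""
    assert split_into_fragments("") == []
```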

The correct mix of other types of tests is package-specific. Integration tests are great for applications that bring together multiple packages and ensure that package updates don't break things. Smoke tests are quick tests that run through the basic actions of a package and can be useful when developing to make sure you avoided major mistakes. Fuzz tests ensure that malformed inputs don't cause everything to break and are great for packages that can receive user input.
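Property-based testing libraries such as Hypothesis make fuzz-style tests easy to write. A minimal sketch with a hypothetical toy parser: arbitrary text must either parse or raise a well-defined ValueError, and any other exception is a bug worth finding:

```python
from hypothesis import given, strategies as st


def parse_charge(token: str) -> int:
    """Parse a charge token like '+2' or '-1' into an integer (toy example)."""
    if len(token) < 2 or token[0] not in "+-" or not token[1:].isdigit():
        raise ValueError(f"malformed charge token: {token!r}")
    return int(token)


@given(st.text())
def test_parse_charge_fails_cleanly(garbage: str) -> None:
    # Arbitrary input must either parse or raise ValueError;
    # anything else (IndexError, a crash) is a bug fuzzing should surface.
    try:
        parse_charge(garbage)
    except ValueError:
        pass
```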

Intuitive

This is a bit of an art, but functions should perform simple, well-defined tasks that are obvious from their names and documentation. Every programming language has its own idiomatic way of doing things, developed through careful crafting of the language (in Python, this is referred to as being Pythonic).

Tips:

- Know when to break the rules. Beginners should strive to write carefully Pythonic code, but advanced developers will know when the most Pythonic thing is actually to break the rules, because doing so makes the code significantly easier to use and understand ("practicality beats purity"). Knowing how to do this well comes with time.
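As a small illustration of the idiom gap (both functions below are hypothetical and behave identically):

```python
# Unidiomatic: manual index bookkeeping and string concatenation.
def label_atoms_clumsy(symbols):
    labels = []
    for i in range(len(symbols)):
        labels.append(symbols[i] + str(i + 1))
    return labels


# Pythonic: enumerate and an f-string express the intent directly.
def label_atoms(symbols: list[str]) -> list[str]:
    return [f"{symbol}{i}" for i, symbol in enumerate(symbols, start=1)]


assert label_atoms(["C", "H", "H"]) == ["C1", "H2", "H3"]
```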

Documented

Code should be well documented at the module and function level. All publicly available functions should have a docstring stating what the function does, with documentation of the arguments for all but the most trivial functions. We recommend enabling pydocstyle linting rules to help enforce a consistent style and formatting of documentation in your package. Be sparing with inline code comments, as they almost always indicate your code is overly complicated, too clever, or too long—and thus should probably be broken into multiple functions.
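As a sketch of the level of documentation we have in mind, here is a hypothetical function with a Sphinx-style docstring (Google or NumPy style work just as well, as long as you're consistent):

```python
import math

R = 0.0019872041  # gas constant in kcal/(mol*K)


def boltzmann_weights(energies: list[float], temperature: float = 298.15) -> list[float]:
    """Compute normalized Boltzmann weights for a set of conformer energies.

    :param energies: relative energies in kcal/mol
    :param temperature: temperature in kelvin
    :return: weights summing to 1, in the same order as ``energies``
    """
    factors = [math.exp(-e / (R * temperature)) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]
```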

Larger packages should also have a dedicated documentation site with separate information on how to use the package.

Maintained and Versioned

Good packages are actively maintained to remove cruft, improve subroutines, and add new features. Of course, this means that things will change over time, and you may need to deprecate or rewrite whole sections of the code. To handle this, use semantic versioning for your package so that users know when an update might break things. For established packages, it is highly recommended to add deprecation warnings well ahead of any breaking changes.
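In Python, a deprecation shim takes only a few lines. A sketch, assuming a hypothetical rename from get_energy() to compute_energy() ahead of a v3.0 release:

```python
import warnings


def compute_energy(molecule: object) -> float:
    """New, preferred entry point."""
    raise NotImplementedError  # stand-in for the real implementation


def get_energy(molecule: object) -> float:
    """Deprecated alias for compute_energy(); slated for removal in v3.0."""
    warnings.warn(
        "get_energy() is deprecated and will be removed in v3.0; "
        "use compute_energy() instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this shim
    )
    return compute_energy(molecule)
```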

Packages should also be compatible with the current versions of major packages and shouldn't arbitrarily prevent users from updating their code to the latest version of a package. Your code shouldn't require Python 3.8 or Torch 2.2.0. This is one of the big issues with academic code dumps: even if the code is technically present, it almost always has to be modified to integrate with modern well-maintained packages.

Actually Open Source

Academic-only licenses are popular these days but aren't "open-source" in the generally accepted meaning of the term. You can license your code however you want to, of course! But if you want anyone outside a university to use your code, it should be under a standard open-source license—probably the MIT license, unless you have a good reason to pick Apache 2.0, BSD, or another MIT-ish license. This is particularly important if you want your code to be integrated into other codebases. A tangled web of licenses will cause legal departments to block the use of the code.

The license should be placed in your repository (ideally in a file named LICENSE). This is the only meaningful license: any private communication or reinterpretation to allow broader use will usually not fly with legal departments.

Conclusion

At Rowan, we're interested in supporting open-source software across the chemical sciences. If you're an academic interested in converting a useful utility into a package that meets modern software standards, please reach out! We've worked with academic software developers before to help generate production-quality libraries.

And if you're interested in contributing to open science, take a look at our list of open needs for scientific libraries! We maintain a list of projects that we think would be useful for the ecosystem, and are happy to discuss this in more detail with any interested parties.


