How to Make a Great Open-Source Scientific Project

by Jonathon Vandezande · Sept 9, 2025

Noah's Ark from the Nuremberg Chronicle (1493)

Science today is run on many amazing open-source projects, from PyTorch to NumPy to Matplotlib. The continued flourishing of science is predicated on a healthy scientific-software ecosystem and the many contributions of scientists and developers worldwide. Unfortunately, it can be a bit daunting to start contributing to this ecosystem. We've put together this guide to help developers build great open-source scientific projects that can help scientists everywhere.

This is an opinionated guide to Python-based scientific projects, although we hope parts of this guide will be applicable to other languages as well. For those starting new projects, the pixi-cookiecutter provides a simple way to start projects that follow these guidelines, making it easy to write and maintain high-quality code. For those working on existing projects, we recommend focusing on the simplest additions first (e.g. using an autoformatter, having tests run when code is merged, and packaging code in a manner that is easy for others to use)—there's no need to change everything immediately.

If you are interested in developing new projects, we maintain a list of open-source projects that we wish existed. We'd be happy to help get you started developing these projects and provide advice on how best to develop them so that the packages are useful to the scientific community.

We've organized this guide around what we think are the top eight attributes of good open-source projects. Good packages are:

Minimal

Before the days of easily distributable libraries and good packaging practices, programs used to be monoliths, bundling a large number of internally developed routines into a single program. Too many packages today are still beset by this Swiss-Army-knife mentality. Many quantum-mechanics (QM) codes bundle electron-repulsion integral (ERI) calculations, functional and basis-set specifications, geometry optimization, and thermochemistry. This "vertically integrated" way of building software is fragile and unsustainable: it requires the package owners to be experts in a large variety of things, and it prevents reuse across projects. (For instance, a new neural network potential (NNP) might want to use a package's optimization code but not its ERI or basis-set code.)

While there is still a need for applications that cobble many libraries together to achieve an end goal, such as finding the Gibbs free-energy barrier to a reaction, most applications should comprise many focused libraries so that the advances in each library can be shared across multiple applications. However, to achieve this vision, individual libraries must be highly focused and adhere to good programming practices so that the integration becomes simple and updating libraries doesn't break things.

Packaged

Good packages are packaged well. That means it's easy to install them from PyPI, conda-forge, or pyx, and it also means that they play nicely with other packages in a shared environment. Excessive version pinning is the enemy of good science. Lock files are useful to ensure that applications don't randomly break when a package updates, but if you are writing a library, it is important to have automated tests running against multiple versions of packages (e.g. stjames is currently tested with Python 3.11, 3.12, and 3.13).

We recommend using a modern package manager (e.g. pixi or uv) for setting up a clean environment with a lock file for repeatable builds (please don't use Conda environments). No more "it works on my machine"; proper package managers ensure that every developer is running the exact same version of every package, making it easy to pinpoint what packages are causing problems.

Clean

Science runs on Python these days, and we don't mean Python 2.7 or 3.8. Code should be written to run with the latest available version of Python (currently 3.13) and be:

- Autoformatted
- Linted
- Type-checked

These checks should be run on every commit (we use pre-commit) and on every push to the parent repo (we recommend whatever is integrated into your version-control platform, e.g. GitHub Actions).

When setting up the package, it is important to separate the formatting, linting, testing, and training dependencies into a development environment that is distinct from the runtime environment users install (the development environment should ideally be a superset of the runtime environment). In a pyproject.toml, this split is typically expressed with optional dependencies (extras) or dependency groups. Nobody wants to have to download your formatting tools just to run the library. And if you're shipping an ML model, please don't require your entire training pipeline and its dependencies for test-time inference!
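One lightweight pattern for the ML case is to import training-only dependencies lazily, so that inference-only installs never need them. A minimal sketch, assuming a hypothetical mypackage with a [train] extra (wandb here is just a stand-in for any training-only dependency):

```python
def train_model(config: dict) -> None:
    """Train a model; requires the optional training extras."""
    try:
        import wandb  # training-only dependency, imported lazily
    except ImportError as err:
        raise ImportError(
            "train_model() requires the training extras; "
            "install them with `pip install mypackage[train]`"
        ) from err
    ...  # the actual training loop would go here
```

With this pattern, `import mypackage` and inference calls work in a minimal environment; only users who actually call train_model() need the extra dependencies installed.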

Tested

Tests serve the dual purpose of confirming that what you've built performs as expected, and that anything you changed didn't break other parts of the program. Tests can be broken down into:

- Unit tests
- Regression tests
- Integration tests
- Smoke tests
- Fuzz tests

All codebases should have unit tests (we recommend pytest) that run on PRs and merges to master to ensure that code changes don't break anything (again, we recommend whatever is integrated into your version-control platform, like GitHub Actions). If you have difficulty writing a unit test for a function, it is likely because the function is too complicated or because it causes side effects. In these cases, you should either break your code into smaller functions and test them individually, or copy objects instead of mutating them.
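As a minimal sketch of that advice (the function and tests below are hypothetical, but the pattern is plain pytest), a small pure function that returns a new object instead of mutating its argument is trivial to test:

```python
import pytest


def normalize_weights(weights: list[float]) -> list[float]:
    """Return weights rescaled to sum to 1, leaving the input list untouched."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return [w / total for w in weights]


def test_normalize_weights() -> None:
    weights = [2.0, 2.0, 4.0]
    assert normalize_weights(weights) == [0.25, 0.25, 0.5]
    # No hidden side effects: the input is unchanged.
    assert weights == [2.0, 2.0, 4.0]


def test_zero_sum_is_rejected() -> None:
    with pytest.raises(ValueError):
        normalize_weights([0.0, 0.0])
```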

Your code coverage should be tracked (we recommend Codecov) to ensure that tests properly cover all parts of your codebase. Chasing 100% code coverage is rarely worth the effort, but coverage below about 70% often indicates that large portions of the codebase are untested.

Regression tests ensure that old bugs stay fixed, and should be run regularly; the definition of "regularly" depends heavily on how active the codebase is, and can be anything from nightly to monthly (regression tests typically take longer to run than unit tests, so running them on every merge is infeasible for active codebases). They should ideally be linked to a bug database so that there is more information about specific bugs when they inevitably resurface.
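A regression test can be as simple as a unit test whose name and docstring point back at the original bug report. A sketch, where the parser, issue number, and marker are all hypothetical:

```python
import pytest


def split_into_fragments(xyz: str) -> list[str]:
    """Toy stand-in for a real parser: split a multi-structure file into blocks."""
    return [block for block in xyz.split("\n\n") if block.strip()]


@pytest.mark.regression  # assumes a "regression" marker registered in your pytest config
def test_issue_142_empty_input() -> None:
    """Regression test for (hypothetical) issue #142: empty input raised IndexError."""
    assert split_into_fragments("") == []
```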

The correct mix of other types of tests is package-specific. Integration tests are great for applications that bring together multiple packages and ensure that package updates don't break things. Smoke tests are quick tests that run through the basic actions of a package and can be useful when developing to make sure you avoided major mistakes. Fuzz tests ensure that malformed inputs don't cause everything to break and are great for packages that can receive user input.
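Property-based testing libraries such as Hypothesis make fuzz-style tests easy to write. A minimal sketch with a hypothetical toy parser: arbitrary text must either parse or raise a well-defined ValueError, and any other exception is a bug worth finding:

```python
from hypothesis import given, strategies as st


def parse_charge(token: str) -> int:
    """Parse a charge token like '+2' or '-1' into an integer (toy example)."""
    if len(token) < 2 or token[0] not in "+-" or not token[1:].isdigit():
        raise ValueError(f"malformed charge token: {token!r}")
    return int(token)


@given(st.text())
def test_parse_charge_fails_cleanly(garbage: str) -> None:
    # Arbitrary input must either parse or raise ValueError;
    # anything else (IndexError, a crash) is a bug fuzzing should surface.
    try:
        parse_charge(garbage)
    except ValueError:
        pass
```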

Intuitive

This is a bit of an art, but functions should perform simple, well-defined tasks that are obvious from their names and documentation. Every programming language has its own idiomatic way of doing things, developed through careful crafting of the language (in Python, this is referred to as being Pythonic).

Tips:

- Know when to break the rules. Beginners should strive to write carefully Pythonic code, but advanced developers will know when the most Pythonic thing is actually to break the rules, because doing so makes the code significantly easier to use and understand ("practicality beats purity"). Knowing how to do this well comes with time.
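As a small illustration of the idiom gap (both functions below are hypothetical and behave identically):

```python
# Unidiomatic: manual index bookkeeping and string concatenation.
def label_atoms_clumsy(symbols):
    labels = []
    for i in range(len(symbols)):
        labels.append(symbols[i] + str(i + 1))
    return labels


# Pythonic: enumerate and an f-string express the intent directly.
def label_atoms(symbols: list[str]) -> list[str]:
    return [f"{symbol}{i}" for i, symbol in enumerate(symbols, start=1)]


assert label_atoms(["C", "H", "H"]) == ["C1", "H2", "H3"]
```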

Documented

Code should be well documented at the module and function level. All publicly available functions should have a docstring stating what the function does, with documentation of the arguments for all but the most trivial functions. We recommend enabling pydocstyle linting rules to help enforce a consistent style and formatting of documentation in your package. Be sparing with inline code comments, as they almost always indicate your code is overly complicated, too clever, or too long—and thus should probably be broken into multiple functions.
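As a sketch of the level of documentation we have in mind, here is a hypothetical function with a Sphinx-style docstring (Google or NumPy style work just as well, as long as you're consistent):

```python
import math

R = 0.0019872041  # gas constant in kcal/(mol*K)


def boltzmann_weights(energies: list[float], temperature: float = 298.15) -> list[float]:
    """Compute normalized Boltzmann weights for a set of conformer energies.

    :param energies: relative energies in kcal/mol
    :param temperature: temperature in kelvin
    :return: weights summing to 1, in the same order as ``energies``
    """
    factors = [math.exp(-e / (R * temperature)) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]
```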

Larger packages should also have a dedicated documentation site with separate information on how to use the package.

Maintained and Versioned

Good packages are actively maintained to remove cruft, improve subroutines, and add new features. Of course, this means that things will change over time, and you may need to deprecate or rewrite whole sections of the code. To handle this, use semantic versioning for your package so that users know when an update might break things. For established packages, it is highly recommended to add deprecation warnings well ahead of any breaking changes.
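In Python, a deprecation shim takes only a few lines. A sketch, assuming a hypothetical rename from get_energy() to compute_energy() ahead of a v3.0 release:

```python
import warnings


def compute_energy(molecule: object) -> float:
    """New, preferred entry point."""
    raise NotImplementedError  # stand-in for the real implementation


def get_energy(molecule: object) -> float:
    """Deprecated alias for compute_energy(); slated for removal in v3.0."""
    warnings.warn(
        "get_energy() is deprecated and will be removed in v3.0; "
        "use compute_energy() instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this shim
    )
    return compute_energy(molecule)
```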

Packages should also be compatible with the current versions of major packages and shouldn't arbitrarily prevent users from updating their code to the latest version of a package. Your code shouldn't require Python 3.8 or Torch 2.2.0. This is one of the big issues with academic code dumps: even if the code is technically present, it almost always has to be modified to integrate with modern well-maintained packages.

Actually Open Source

Academic-only licenses are popular these days but aren't "open-source" in the generally accepted meaning of the term. You can license your code however you want to, of course! But if you want anyone outside a university to use your code, it should be under a standard open-source license—probably the MIT license, unless you have a good reason to pick Apache 2.0, BSD, or another MIT-ish license. This is particularly important if you want your code to be integrated into other codebases. A tangled web of licenses will cause legal departments to block the use of the code.

The license should be placed in your repository (ideally in a file named LICENSE). This is the only meaningful license: any private communication or reinterpretation to allow broader use will usually not fly with legal departments.

Conclusion

At Rowan, we're interested in supporting open-source software across the chemical sciences. If you're an academic interested in converting a useful utility into a package that meets modern software standards, please reach out! We've worked with academic software developers before to help generate production-quality libraries.

And if you're interested in contributing to open science, take a look at our list of open needs for scientific libraries! We maintain a list of projects that we think would be useful for the ecosystem, and are happy to discuss this in more detail with any interested parties.


