Co-Folding Failures, Our Response, and Rowan-Hosted MSA

by Ari Wagen · Aug 22, 2025

Image: The House Upon the Rock by Jan Luyken, an illustration of the Parable of the Wise Builder and the Foolish Builder.

At Rowan, we are in the final stages of deploying our own ColabFold server, for use both by Rowan users (through our web application and API) and by other computational scientists. If you would like access to our cloud MSA server, please fill out this form.

What follows is a detailed play-by-play narrative and postmortem of a rise in MSA failures that caused protein co-folding jobs to fail, told both from our perspective and from what we can piece together from public sources, followed by some reflections on what this means for the field.

Co-Folding Jobs Start Failing

Saturday, August 16, 2025 at 7:37pm. I was at a friend's house in Arlington, MA playing cards when I got a notification on my phone from a Rowan user alerting me to a pattern of Boltz-2 co-folding failures—the user wrote that Boltz-2 was failing "reproducibly," but the inputs he shared all seemed normal at a glance.

Sunday, August 17, 2025 at 12:11am. In an attempt to understand what problem our user was facing, I ran a series of co-folding jobs with different protein sequences, numbers of tokens, and settings. To my surprise, all of them failed relatively quickly. I escalated the problem to our engineering team, writing:

I think we might be getting rate-limited by KOBIC [the MSA server at api.colabfold.com], causing Boltz-2 jobs to fail: [link to jobs I ran]
Even the most simple jobs aren't completing

I also checked our job failure rate dashboard; over the last 24 hours, roughly 50% of the protein–ligand co-folding jobs submitted through Rowan had failed. This job failure rate was anomalous.

Our director of engineering wrote back 12 minutes later with a string of messages:

Exception: MMseqs2 API is giving errors. Please confirm your input is a valid protein sequence. If error persists, please try again an hour later.

error message doesn't suggest rate limit at least. It could be but unclear

Shud [sic] probably figure out a more robust way to do MSA regardless

I logged some to-dos from our conversation and went to sleep.

Sunday, August 17, 2025 at 5:31pm. I was at home cooking dinner when a second Rowan user submitted a ticket reporting Boltz-2 failures. Shortly after, Corin messaged our engineering channel:

Let's do a detailed investigation tomorrow morning and refund credits and email affected users

Our First Response and Credit Refunds

Monday, August 18, 2025 at 8:59am. In the discussion section of our meeting, we talked about the MSA-related errors and came up with a plan to start addressing the issue. We determined that we would take the following near-term steps:

  1. Determine the impact of MSA-related errors over the weekend. Alert and refund any affected users for their failed co-folding jobs.
  2. Add a warning to the co-folding submit page letting users know about the high job failure rate.
  3. Determine whether or not Boltz-2 could be run without MSA inputs, and enable Boltz-2 to be run through Rowan without MSA if so.

We also decided to prioritize exploring a more permanent set of MSA solutions, including standing up an internal version of the ColabFold server and caching MSA query outputs for reuse.
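As a rough illustration of what caching MSA query outputs for reuse could look like, here is a minimal sketch. It is not Rowan's implementation: the function names, the on-disk layout, and the `fetch_msa` callable (a stand-in for whatever code actually queries an MSA server) are all assumptions made for this example.

```python
import hashlib
from pathlib import Path
from typing import Callable


def normalize(sequence: str) -> str:
    """Strip whitespace and uppercase, so trivially different inputs share a key."""
    return "".join(sequence.split()).upper()


def msa_cache_key(sequence: str) -> str:
    """Return a stable, filename-safe key for a protein sequence."""
    return hashlib.sha256(normalize(sequence).encode("utf-8")).hexdigest()


def cached_msa(sequence: str, cache_dir: Path, fetch_msa: Callable[[str], str]) -> str:
    """Return the MSA text for `sequence`, calling `fetch_msa` only on a cache miss.

    `fetch_msa` is hypothetical: it takes a sequence and returns the alignment
    as text, however the caller chooses to obtain it.
    """
    sequence = normalize(sequence)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{msa_cache_key(sequence)}.a3m"
    if path.exists():
        return path.read_text()
    result = fetch_msa(sequence)
    path.write_text(result)
    return result
```

A real cache key would probably also need to include the sequence database version and search settings, since the same sequence can yield a different alignment when either changes.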

Monday, August 18, 2025 at 2:28pm. That morning and early afternoon, we added a warning to Rowan's co-folding submit page, tested and turned on MSA-free Boltz-2 runs, and compiled a list of affected users. Between 2:28pm and 3:31pm, I emailed each affected user, sending variations on the following message:

hi [user],

I'm emailing to let you know that we've refunded your Rowan account [123] credits on account of co-folding job failures.

Starting on August 16, 2025, the MSA server at https://api.colabfold.com/ (hosted by the Korean Bioinformation Center) has been intermittently failing with vague errors, causing ~50% of Boltz-2 jobs submitted through Rowan to fail. (This issue is affecting all Boltz-2 users, not just Rowan users.)

Between the beginning of the impacted period and now, you've had [12] jobs fail, which consumed [123] credits. We've added [123] credits back to your account.

While we work to improve the stability of MSA queries, the submit page will display a warning message about this issue. We've also added support for running Boltz-2 jobs without MSA: to run co-folding without MSA, deselect the "Use MSA Server?" toggle. We are actively working on longer-term improvements to co-folding stability.

If you have any questions, feel free to respond to this email.

best, Ari

Co-Folding Failure Rates on Rowan

From our internal aggregated job data, we've seen that protein–ligand co-folding job failures have not yet returned to pre–August 16 levels:

Date | Co-Folding Failure Rate
Monday, August 11 | 1.2%
Tuesday, August 12 | 4.1%
Wednesday, August 13 | 1.3%
Thursday, August 14 | 4.7%
Friday, August 15 | 10.1%
Saturday, August 16 | 51.6%
Sunday, August 17 | 67.6%
Monday, August 18 | 48.8%
Tuesday, August 19 | 23.0%
Wednesday, August 20 | 16.8%

Additionally, we've noticed that our users are submitting fewer co-folding jobs than before we added the warning to our submit page.


Before continuing, I'd like to look briefly at what MSA is, why it matters, and how it's run.

What MSA Is and Why It Matters

Multiple sequence alignment (MSA) is the process of aligning a set of protein sequences to identify regions of similarity. In this context, "similar" means that they share subsequences, or substrings, of amino acids. The motivation behind MSA is often described using evolutionary logic; the story goes something like this:

Long ago, there were fewer organisms and fewer proteins. Over time, millions of mutations happened, and deleterious mutations were lost to the wind. However, some mutations were able to improve or modify a protein's function, and these proteins have persisted. Proteins with a common ancestor protein are likely to have similar secondary and tertiary structures, so data from relatives can help improve protein folding and co-folding algorithms.
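To make the idea of "data from relatives" concrete, here is a toy sketch of what an alignment exposes once it exists. It does not perform the alignment itself (that is the hard part, and tools like MMseqs2 handle it). The four sequences are made up for illustration, and the conservation score is just one simple choice among many.

```python
from collections import Counter

# Toy, pre-aligned sequences (invented for this example); "-" marks a gap.
alignment = [
    "MKT-AYIAK",
    "MKTQAYLAK",
    "MRT-AYIAR",
    "MKTQSYIAK",
]


def column_conservation(aligned: list[str]) -> list[float]:
    """For each column, the fraction of sequences sharing the most common residue.

    Gaps never count as the shared residue, but they do count in the denominator.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*aligned):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(aligned))
    return scores


for position, score in enumerate(column_conservation(alignment), start=1):
    print(position, f"{score:.2f}")
```

Columns where every relative carries the same residue score 1.0, while columns that tolerate substitutions or gaps score lower. Patterns like these across many related sequences are the kind of signal that structure-prediction models draw on.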

The extra context that MSA adds to the problem of protein folding has made it a cornerstone of computational biology: AlphaFold, AlphaFold2, Boltz-1, Chai-1, and Boltz-2 have all relied on MSA to make high-quality structural predictions. In most codes and on most platforms, MSA queries are run on protein sequences by default whenever a user submits a co-folding job.

How People Run MSA

State-of-the-art MSA generally relies on MMseqs2 (Steinegger and Söding [2017]) run through ColabFold (Mirdita et al. [2022]). With these modern tools, MSA and protein folding are theoretically within the reach of anyone with a PC—so why do places like Rowan, Neurosnap, Oxford, CINES, and the Boltz code itself all use the same public API?

While setting up any server can be a pain, ColabFold servers in particular require nearly 1 terabyte of data to run standard MSA queries. Nevertheless, a computer with a terabyte of storage can be purchased for $1,000–2,000 or less, so storage alone shouldn't put this out of reach for university labs and companies.

The real issue here is the memory requirement. Efficiently running MSA queries requires a lot of memory. To run the alignment process, many protein sequences have to be loaded from storage into a computer's memory. If that computer has very little memory, then there's a lot of back and forth between the storage and the memory, making queries take as long as hours on consumer hardware.
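A back-of-the-envelope calculation shows why that back and forth hurts. The sketch below uses a deliberately crude model: it asks how long it would take to stream roughly 1 terabyte of data once at different sustained read rates. The read rates are assumed round numbers for illustration, not measurements, and real MSA searches use indexes and access patterns that this model ignores.

```python
def full_pass_seconds(database_bytes: float, read_bytes_per_second: float) -> float:
    """Time to stream the whole database once at a sustained read rate."""
    return database_bytes / read_bytes_per_second


DATABASE_BYTES = 1e12  # "nearly 1 terabyte", as described above

# Assumed sustained read rates (illustrative only; real hardware varies widely).
ASSUMED_READ_RATES = {
    "hard disk (~150 MB/s)": 150e6,
    "SATA SSD (~500 MB/s)": 500e6,
    "NVMe SSD (~3 GB/s)": 3e9,
    "data already in memory (~20 GB/s)": 20e9,
}

for label, rate in ASSUMED_READ_RATES.items():
    minutes = full_pass_seconds(DATABASE_BYTES, rate) / 60
    print(f"{label}: {minutes:.1f} min per full pass")
```

Under these assumptions, a machine that has to keep re-reading data from a slow disk spends orders of magnitude longer per pass than one that holds the data in memory, which is consistent with queries stretching to hours on consumer hardware.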

Because of this, many teams have chosen to rely on the free ColabFold server at api.colabfold.com, which is hosted by the Steinegger lab at the Korean Bioinformation Center, or KOBIC (references: 1a0b670, d9adee4, 04a1791, 16a09e1).

OK, back to the story.


Other Reports of MSA Failures

Wednesday, August 6. On the ColabFold GitHub, user TuganBasaran opened an issue saying:

Hi,

I'm trying to use Alpafold 2 Batch on colab and run a prediction of a fasta file and at the moment it has been pending for almost 5 hours for a single sequence. I have no idea why does it happen. I tried to restart session, changed the GPU 2-3 times and currently it is GPU is set to A100.

Is there a way to fix this problem?

A second user, sirius777coder, replied to report having a similar issue.

Friday, August 15. On the ColabFold GitHub, sirius777coder opened an issue saying:

Hi, I submitted several jobs that remain pending after predicting 20 sequences, and the status has been stuck for a day. Could you check the current MSA server?

Monday, August 18 at 2:35am. On the Boltz community Slack, a member reported encountering errors with the MSA step in their Boltz-2 pipeline, saying:

Hi Everyone, I managed to run few predictions, but as of today inference fails with: boltz/data/msa/mmseqs2.py", line 215, in run_mmseqs2. But they were working three days back. Exception: MMseqs2 API is giving errors. Please confirm your input is a valid protein sequence. If error persists, please try again an hour later.

At 4:44am, a second member responded: "I am also having the same issue since Friday (15th August)…"

Monday, August 18. GitHub user Jacoberts reported receiving a 403 Forbidden Error from the ColabFold API.

Tuesday, August 19 at 5:26am. Responding to the previous Boltz-2 Slack thread, a third member asked "is it working for you? it's still not working fo rme [sic]."

Wednesday, August 20 at 5:51am. Another Boltz-2 Slack member reported having issues with ColabFold as well, writing:

Hi everyone, I am unable create a GitHub issue in the below repository.

https://github.com/sokrypton/ColabFold/issues

Can someone please help me how to get collaborator on this repo? So that I can create an issue. Thank you in advance!!

What Happened to api.colabfold.com

From the responses to a number of GitHub issues, we can start to understand why we and so many other Boltz-2 users have been experiencing difficulties using the api.colabfold.com API this past week.

Last week, user milot-mirdita (a postdoc in Steinegger's lab) responded to ColabFold issue #759, saying:

I just killed the whole queue, someone filled the queue with 2500 jobs :/

Sorry about that. Please resubmit your jobs.

We will need to implement some fairer prioritization mechanism and rethink how this job queue works

This issue was opened and closed before this week, and it seems like the impact was small enough to fly under our radar at Rowan, but we can see retrospectively that the global demand for co-folding and MSA queries was already putting a strain on the api.colabfold.com computers.

Over the weekend, in response to ColabFold issue #763, Prof. Steinegger wrote, "https://api.colabfold.com/queue It looks like the queue is crazy full. Somebody is probably running a big job :/," and later "We restarted the server and blocked some user groups. I hope it is working better now." sirius777coder reported that "my task is running normally now!", and the issue was closed.

This second issue adds another data point to our understanding. A surge in MSA queries was putting Steinegger's lab into a "whac-a-mole" problem-solving mode: manually clearing the queue and blacklisting users from accessing the API.

The most complete information we have is from issue #764, which was opened on Monday. In response to this issue, Milot wrote:

I guess you became collateral damage of someone starting a huge job over the weekend and totally flooding the server. They did that in such a way that it was difficult for the normal rate limiting mechanisms to take deal with it, so I had to take a sledgehammer approach.

Please send me (either here or per email) your ip address/range and I'll enable access again.

And:

I changed the blocks to target cloud IPs at finer granularity. I hope you are not blocked anymore.

After three more users replied with similar issues, Milot wrote a comment on Wednesday, August 20, saying:

Please generate MSA's locally and run boltz [sic] on the locally generated MSAs. We currently cannot support the load from batch cloud runs.

Coming Soon: A Rowan-Hosted MSA Server

The progress in biomolecular structure prediction over the past two decades is one of the big success stories of deep learning and has transformed the way many teams approach science.

As demand for running methods like Boltz-2, AlphaFold2, and Chai-1 continues to grow, our team finds it unlikely that academic labs will continue to be able and willing to host reliable, public web servers for commercial entities to run expensive MSA queries at library scale.

We will be standing up a Rowan-hosted instance of the ColabFold server. This will shift the MSA load generated by Rowan users off of the Steinegger lab's MSA web server and give any blacklisted groups an alternative to self-hosting a machine with very onerous memory requirements. We are currently in the final stages of testing and optimizing this server. If you would like access to it, please complete this form.

As soon as we have our server tested, stable, and deployed for production use, we will shift the MSA queries being made by Rowan users over to our server. We plan to implement a priority queuing system to prioritize MSA queries made by users of Rowan's web application and free-to-use Python API before running MSA queries made by others with access to the server. We also plan to explore opportunities to save compute cost and optimize job time by automatically caching MSA queries made through the Rowan web application and Python API where applicable.
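For readers curious what a priority queuing system might look like in its simplest form, here is a minimal sketch built on Python's standard `heapq` module. It is only an illustration of the general technique, not a description of what we will deploy: the two priority tiers, the class name, and the method names are all hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical priority tiers: lower numbers are served first.
ROWAN_USER = 0
EXTERNAL_USER = 1


@dataclass(order=True)
class _Entry:
    priority: int
    order: int  # submission counter, so ties are broken first-in, first-out
    sequence: str = field(compare=False)


class MsaQueue:
    """Serve higher-priority MSA queries first, FIFO within each tier."""

    def __init__(self) -> None:
        self._heap: list[_Entry] = []
        self._counter = itertools.count()

    def submit(self, sequence: str, priority: int) -> None:
        heapq.heappush(self._heap, _Entry(priority, next(self._counter), sequence))

    def next_query(self) -> str:
        """Remove and return the next sequence to run; raises IndexError if empty."""
        return heapq.heappop(self._heap).sequence

    def __len__(self) -> int:
        return len(self._heap)


queue = MsaQueue()
queue.submit("AAA", EXTERNAL_USER)
queue.submit("CCC", ROWAN_USER)
queue.submit("DDD", EXTERNAL_USER)
while len(queue):
    print(queue.next_query())
```

One design caveat: a strict two-tier scheme like this can starve the lower tier whenever the higher tier stays busy, so a production queue would likely need aging, per-user quotas, or some other fairness mechanism on top.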

I'm very sorry for the unforeseen sharp increase in job failure rates that Rowan users experienced over the weekend. I welcome any feedback about our incident handling, response, and communication; please send it to contact@rowansci.com.
