The transient nature of the node anomaly makes it difficult to reproduce the fault. However, further investigations into the root cause are underway.
Input Output CEO Charles Hoskinson said the transient nature of the Cardano node anomaly makes it challenging to pin down the exact cause.
However, he added that the network handled the outage precisely as designed. He also praised the IO staff and Stake Pool Operators (SPOs) for rallying together to mitigate the issue.
“Sh*t always breaks Sunday morning, or Monday morning, late at night when everyone is sleeping. That’s just the way things work.
“The long and short is that it seems to be a transient issue.”
50% of Cardano nodes suffered a brief outage
On Jan. 22, 50% of Cardano nodes suffered “a transient anomaly,” causing them to disconnect and then restart.
This affected block production for two to five minutes, causing the chain to fall out of sync briefly as the affected nodes restarted. There was a short period of network degradation, but this recovered through what Hoskinson called “self-healing.”
Initial investigations by IO turned up no apparent root cause. The IO CEO has since elaborated on this, saying the anomaly likely resulted from multiple factors converging, which would make it difficult to reproduce the conditions that led to the issue.
“It’s probably a collection of things that happened at the same time, which means the reproducibility is unlikely.”
Hoskinson expands on the details
In the course of further investigations, the call error was identified, but the triggering event has yet to be determined, said Hoskinson.
Expanding on this, Hoskinson blamed “emergent bugs,” adding that quirks of this type can sometimes arise in distributed systems because of their globally distributed nature.
“The problem is that distributed systems sometimes create what is called emergent bugs. So locally, it’s not reproducible, but a collection of things create a collective global state that, for some reason, triggers something and the whole system basically stops for some people.”
Such “once in a five-year” glitches cannot always be explained. Some can be resolved, but “you never want to drive yourself nuts over it,” said Hoskinson.
A team is continuing to probe the issue, and a post-mortem will follow once more is known.