Aptos' New Shoal Framework: Cutting Bullshark Consensus Latency by 40-80%


Aptos Labs recently solved two important open problems in DAG BFT, significantly reducing latency and, for the first time, eliminating the need for timeouts in a deterministic practical protocol. Overall, Bullshark's latency is improved by 40% in the fault-free case and by 80% in the presence of faults.

Shoal is a framework that enhances any Narwhal-based consensus protocol (such as DAG-Rider, Tusk, and Bullshark) with pipelining and a leader reputation mechanism. Pipelining reduces DAG ordering latency by introducing an anchor in every round, and leader reputation further improves latency by ensuring that anchors are associated with the fastest validators. Moreover, leader reputation allows Shoal to exploit asynchronous DAG construction to eliminate timeouts in all scenarios. This enables Shoal to provide universal responsiveness, which subsumes the optimistic responsiveness that is typically required.

Shoal's technique is relatively simple: it runs multiple instances of the underlying protocol back to back. When instantiated with Bullshark, the result resembles a group of "sharks" running a relay race.


Background and Motivation

In the pursuit of high performance in blockchain networks, the focus has long been on reducing communication complexity. However, this approach has not led to significant throughput gains. For example, the HotStuff implementation in early versions of Diem achieved only 3,500 TPS, far below the target of 100k+ TPS.

The recent breakthrough stems from the realization that data dissemination is the main bottleneck of leader-based protocols, and that it can benefit from parallelization. The Narwhal system separates data dissemination from the core consensus logic, proposing an architecture in which all validators disseminate data simultaneously while the consensus component orders only a small amount of metadata. The Narwhal paper reports a throughput of 160,000 TPS.

Aptos previously introduced Quorum Store, its implementation of Narwhal's data dissemination, which separates data propagation from consensus and is used to scale the current consensus protocol, Jolteon. Jolteon is a leader-based protocol that combines Tendermint's linear fast path with PBFT-style view changes, reducing HotStuff's latency by 33%. However, leader-based consensus protocols clearly cannot fully exploit Narwhal's throughput potential: even with data propagation separated from consensus, the HotStuff/Jolteon leader remains a bottleneck as throughput grows.

Therefore, Aptos decided to deploy Bullshark, a consensus protocol with zero communication overhead, on top of the Narwhal DAG. Unfortunately, the DAG structure that gives Bullshark its high throughput comes with a 50% latency cost compared to Jolteon.


DAG-BFT Background

Each vertex in the Narwhal DAG is associated with a round number. To enter round r, a validator must first obtain n-f vertices belonging to round r-1. Each validator may broadcast one vertex per round, and each vertex must reference at least n-f vertices from the previous round. Because of network asynchrony, different validators may observe different local views of the DAG at any point in time.

A key property of the DAG is non-equivocation: if two validators have the same vertex v in their local views of the DAG, then they have exactly the same causal history of v.
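
To make the round structure and the non-equivocation property concrete, here is a minimal Python sketch; the names (`Vertex`, `can_enter_round`, `causal_history`) and the id-based parent references are illustrative assumptions, not Narwhal's actual data structures.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Vertex:
    """One vertex in a validator's local view of the Narwhal DAG."""
    round: int
    author: str                 # validator that broadcast this vertex
    parents: frozenset = field(default_factory=frozenset)  # ids of >= n-f vertices of round-1

def can_enter_round(local_rounds: dict, r: int, n: int, f: int) -> bool:
    """A validator may enter round r once it holds n-f vertices of round r-1."""
    return len(local_rounds.get(r - 1, [])) >= n - f

def causal_history(v: Vertex, dag_index: dict) -> set:
    """All vertices reachable from v via parent references. Non-equivocation
    means any two validators holding v compute exactly this same set."""
    seen, stack = set(), [v]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        stack.extend(dag_index[p] for p in u.parents if p in dag_index)
    return seen
```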


Ordering the DAG

All vertices in the DAG can be totally ordered without any additional communication overhead. To this end, validators in DAG-Rider, Tusk, and Bullshark interpret the structure of the DAG as a consensus protocol, where a vertex represents a proposal and an edge represents a vote.

Although they differ in how they exploit quorum intersection in the DAG structure, all existing Narwhal-based consensus protocols share the following structure (a generic code sketch follows after the list):

  1. Anchor: every few rounds (every two rounds in Bullshark, for example) there is a predetermined leader, and the leader's vertex is called the anchor.

  2. Ordering anchors: validators independently but deterministically decide which anchors to order and which to skip.

  3. Ordering causal histories: validators process the ordered list of anchors one by one and, for each anchor, order all previously unordered vertices in its causal history according to a deterministic rule.

The key to safety is ensuring that in step 2, all honest validators create ordered anchor lists that share the same prefix. In Shoal, the following observation is made about all of the protocols above: all validators agree on the first ordered anchor.
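
Although the anchor-decision rules differ per protocol, the shared three-step skeleton can be sketched as follows; `anchor_of` and `decide` are hypothetical placeholders for each protocol's own rules, and `causal_history` is the helper from the earlier sketch.

```python
def order_dag(dag_index, rounds, anchor_of, decide):
    """Generic ordering skeleton shared by DAG-Rider, Tusk and Bullshark.

    anchor_of(r) -> the predetermined leader vertex of round r (or None if
                    it is missing from the local view).
    decide(a)    -> True to order anchor a, False to skip it; each protocol
                    supplies its own deterministic rule here.
    """
    ordered, committed = [], set()
    for r in rounds:
        anchor = anchor_of(r)                        # step 1: anchor selection
        if anchor is None or not decide(anchor):     # step 2: order or skip
            continue
        # step 3: deterministically order the anchor's still-unordered causal history
        history = causal_history(anchor, dag_index)  # helper from the earlier sketch
        for v in sorted(history, key=lambda u: (u.round, u.author)):
            if v not in committed:
                committed.add(v)
                ordered.append(v)
    return ordered
```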

Bullshark latency

The latency of Bullshark depends on the number of rounds between ordered anchors in the DAG. Although the partially synchronous version of Bullshark is more practical and has better latency than the asynchronous version, it is still far from optimal.

There are two main issues:

  1. Average block latency: in Bullshark, every even round has an anchor, and the vertices of every odd round are interpreted as votes. In the common case, two rounds of the DAG are needed to order an anchor; however, vertices in an anchor's causal history need additional rounds to wait for an anchor to be ordered. In the common case, vertices in odd rounds need three rounds, and non-anchor vertices in even rounds need four (see the small helper sketched after this list).

  2. Fault-case latency: if a round's leader fails to broadcast its anchor fast enough, the anchor cannot be ordered (and is therefore skipped), and all unordered vertices from previous rounds must wait for the next anchor to be ordered. This significantly hurts performance in geo-replicated networks, especially since Bullshark uses timeouts to wait for the leader.
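
For intuition, the common-case figures from point 1 can be captured in a tiny helper; the function name is illustrative, and the round counts are exactly the ones stated above.

```python
def bullshark_common_case_latency(vertex_round: int, is_anchor: bool) -> int:
    """Common-case number of DAG rounds until a vertex is ordered in Bullshark.

    Anchors live in even rounds and need two rounds to be ordered; odd-round
    (vote) vertices wait three rounds; non-anchor even-round vertices wait
    four rounds for the next ordered anchor.
    """
    if vertex_round % 2 == 0:
        return 2 if is_anchor else 4
    return 3
```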


Shoal Framework

Shoal enhances Bullshark (or any other Narwhal-based BFT protocol) with pipelining, allowing an anchor in every round and reducing the latency of all non-anchor vertices in the DAG to three rounds. Shoal also introduces a zero-cost leader reputation mechanism into the DAG that biases anchor selection toward fast leaders.

Challenge

In the context of DAG protocols, pipelining and leader reputation are considered hard problems, for the following reasons:

  1. Previous attempts at pipelining tried to modify Bullshark's core logic, which appears to be essentially impossible.

  2. Leader reputation, introduced in DiemBFT and formalized in Carousel, is the idea of dynamically selecting future leaders (anchors, in Bullshark) based on validators' past performance. While disagreement on leader identity does not violate safety in those protocols, in Bullshark it can lead to completely different orderings. This exposes the core of the problem: round anchors must be selected dynamically and deterministically to reach consensus, yet validators need to agree on the ordered history in order to select future anchors.

Protocol

Despite the challenges above, the solution turns out to be simple. Shoal uses the ability to perform local computation over the DAG to preserve and reinterpret information from previous rounds. Building on the core insight that all validators agree on the first ordered anchor, Shoal sequentially combines multiple Bullshark instances into a pipeline, such that:

  1. The first ordered anchor is the switching point between instances.
  2. The anchor's causal history is used to compute leader reputation.

Pipeline processing

Recall that Bullshark has a predefined mapping F from rounds to the validator set V, assigning a leader to each round. Shoal runs instances of Bullshark one after another, so within each instance the anchors are predetermined by the mapping F. Each instance orders one anchor, which triggers the switch to the next instance.

Initially, Shoal launches the first instance of Bullshark in the first round of the DAG and runs it until the first ordered anchor is determined, say in round r. All validators agree on this anchor, so they can all confidently reinterpret the DAG starting from round r+1. Shoal simply launches a new Bullshark instance in round r+1.

In the best case, this allows Shoal to order an anchor in every round. The anchor of the first round is ordered by the first instance. Shoal then starts a new instance in the second round, whose anchor is ordered by that instance; yet another new instance orders an anchor in the third round, and the process continues.
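
The switching rule can be summarized in a short sketch; `run_bullshark_instance` is a hypothetical callable standing in for one Bullshark run over the shared DAG, and `anchor.round` follows the `Vertex` sketch above. This is an illustration of the handover described in the text, not Shoal's actual code.

```python
def run_shoal_pipeline(run_bullshark_instance, start_round: int, num_instances: int):
    """Pipeline sketch: run Bullshark instances back to back.

    run_bullshark_instance(first_round) is assumed to run one Bullshark
    instance over the shared DAG starting at first_round and to return
    (anchor, newly_ordered_vertices) once its first anchor is ordered;
    all validators agree on that anchor, so they all switch identically.
    """
    ordered, r = [], start_round
    for _ in range(num_instances):
        anchor, newly_ordered = run_bullshark_instance(r)
        ordered.extend(newly_ordered)
        # Switch point: the next instance reinterprets the DAG starting from
        # the round right after the ordered anchor.
        r = anchor.round + 1
    return ordered
```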


Leader Reputation

When Bullshark skips an anchor during ordering, latency increases. In this case pipelining is powerless, because a new instance cannot start before the previous instance orders its anchor. To handle missed anchors, Shoal uses a reputation mechanism that assigns each validator a score based on its recent activity, ensuring that the responsible leader is less likely to be selected in the future. Validators that respond and participate in the protocol receive high scores; otherwise, a validator is assigned a low score, since it may be crashed, slow, or malicious.

The idea is to deterministically recompute the predefined mapping F from rounds to leaders at every score update, biasing it toward leaders with higher scores. For validators to agree on the new mapping, they must agree on the scores, and hence on the history used to derive them.

In Shoal, pipelining and leader reputation combine naturally, since they both rely on the same core technique: reinterpreting the DAG after agreement is reached on the first ordered anchor.

In fact, the only difference is that after ordering the anchor of round r, validators simply compute a new mapping F', starting from round r+1, based on the causal history of the ordered anchor in round r. They then execute a new instance of Bullshark using the updated anchor selection function F' from round r+1.
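
A rough sketch of the score update and remapping is shown below; the scoring rule (counting vertices each validator contributed to the anchor's causal history) and the remapping policy are illustrative assumptions, not Shoal's exact reputation formula, and `causal_history` is the helper from the earlier sketch.

```python
from collections import Counter

def update_leader_mapping(anchor, dag_index, validators):
    """After ordering the anchor of round r, derive validator scores from the
    anchor's causal history and return a new mapping F' used from round r+1."""
    # Illustrative scoring rule: count how many vertices each validator
    # contributed to the ordered anchor's causal history.
    scores = Counter(v.author for v in causal_history(anchor, dag_index))

    # Deterministic: every validator computes the same ranking from the
    # same agreed-upon history.
    ranked = sorted(validators, key=lambda val: (-scores[val], val))

    def new_mapping(round_number: int) -> str:
        # Bias anchor selection toward the best-scoring validators
        # (round-robin over the top half, purely as an illustration).
        top = ranked[: max(1, len(ranked) // 2)]
        return top[round_number % len(top)]

    return new_mapping
```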


No timeout required

Timeouts play a critical role in all leader-based deterministic partially synchronous BFT implementations. However, the complexity they introduce increases the number of internal states that must be managed and observed, which complicates debugging and requires more observability techniques.

Timeouts can also significantly increase latency, because configuring them properly is critical and they often require dynamic adjustment, as they depend heavily on the environment (the network). Before moving to the next leader, the protocol pays the full timeout latency penalty for a faulty leader. Consequently, timeout settings cannot be too conservative; yet if the timeout is too short, the protocol may skip good leaders.

Unfortunately, leader-based protocols such as HotStuff and Jolteon inherently require timeouts to ensure that the protocol makes progress whenever a leader is faulty. Without timeouts, even a crashed leader could halt the protocol forever. Since a faulty leader cannot be distinguished from a merely slow one during asynchronous periods, timeouts can lead validators to view-change through all leaders without any consensus activity.

In Bullshark, timeouts are used in the DAG construction to ensure that honest leaders add their anchors to the DAG fast enough during periods of synchrony.
