Blockchain Indexer: The Key Infrastructure for Building Efficient dApps

Evolution of Blockchain Data Access: Introduction to Indexers and Related Projects

Data is the core of Blockchain technology and is the foundation for developing decentralized applications. Current discussions are mostly focused on data availability (DA), which ensures that network participants can access the latest transaction data for verification. However, the equally important aspect of data accessibility is often overlooked.

In the era of modular Blockchain, DA solutions have become a necessity. They ensure that all participants can access transaction data, achieve real-time verification, and maintain network integrity. However, the DA layer is more like a billboard than a database; data is not stored indefinitely and will be deleted over time.

In contrast, data accessibility focuses on the ability to retrieve historical data, which is crucial for developing dApps and conducting Blockchain analysis. Although it is discussed less, it is equally important as data availability. Both play a complementary role in the Blockchain ecosystem, and comprehensive data management must address both issues simultaneously to support robust and efficient Blockchain applications.

Since its inception, Blockchain has completely transformed infrastructure, driving the creation of dApps in areas such as gaming, finance, and social networks. However, building these dApps requires access to a large amount of Blockchain data, which is both difficult and expensive.

For dApp developers, one option is to host and run their own archival RPC nodes. These nodes store all historical Blockchain data, allowing for full access. However, the maintenance costs are high and query capabilities are limited. Running cheaper nodes is another option, but data retrieval capabilities are limited, which may hinder dApp operation.

Another method is to use commercial RPC node providers. They cover node costs and management and expose data through RPC endpoints. Public RPC endpoints are free but rate-limited, which can hurt the user experience. Private RPC endpoints perform better, but even simple data retrieval can require many round trips, making the approach inefficient and hard to scale.
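
To make that overhead concrete, here is a minimal sketch of pulling an ERC-20 token's transfer history straight from a JSON-RPC endpoint. The endpoint URL and token address are placeholders, and the 2,000-block paging window is an assumption about typical provider limits.

```typescript
// Sketch: fetching ERC-20 Transfer logs directly over JSON-RPC.
// The endpoint URL and token address are placeholders.
const RPC_URL = "https://example-rpc-endpoint.invalid";
const TOKEN = "0x0000000000000000000000000000000000000000";
// keccak256("Transfer(address,address,uint256)")
const TRANSFER_TOPIC =
  "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef";

async function rpc(method: string, params: unknown[]) {
  const res = await fetch(RPC_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method, params }),
  });
  return (await res.json()).result;
}

async function transferHistory(fromBlock: number, toBlock: number) {
  const logs: unknown[] = [];
  // Providers cap eth_getLogs block ranges, so a long history has to be
  // paged through many sequential requests, one reason raw RPC scales poorly.
  const step = 2_000;
  for (let start = fromBlock; start <= toBlock; start += step) {
    const end = Math.min(start + step - 1, toBlock);
    const page = await rpc("eth_getLogs", [{
      address: TOKEN,
      topics: [TRANSFER_TOPIC],
      fromBlock: "0x" + start.toString(16),
      toBlock: "0x" + end.toString(16),
    }]);
    logs.push(...(page as unknown[]));
  }
  return logs; // still raw, un-decoded log entries
}
```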

Blockchain indexers play a key role in organizing chain data and storing it in databases for querying, which is why they are called "the Google of blockchain." They index blockchain data and expose it through standardized query interfaces, typically GraphQL or SQL. By providing a unified interface, indexers let developers retrieve information quickly and accurately with a standardized query language, greatly simplifying the process.
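
By contrast with the raw RPC loop above, a hypothetical indexer exposing a GraphQL API can answer the same question in a single request. The endpoint and entity names below are illustrative, not the schema of any specific product.

```typescript
// Sketch: the same query asked of an indexer's GraphQL API in one request.
// The endpoint and entity/field names are illustrative placeholders.
const INDEXER_URL = "https://example-indexer.invalid/graphql";

const query = `
  query RecentTransfers($token: String!) {
    transfers(where: { token: $token }, orderBy: timestamp, orderDirection: desc, first: 100) {
      from
      to
      amount
      timestamp
    }
  }
`;

async function recentTransfers(token: string) {
  const res = await fetch(INDEXER_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, variables: { token } }),
  });
  // The indexer returns decoded, filtered, sorted rows; no paging loop needed
  return (await res.json()).data.transfers;
}
```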

Different types of indexers optimize data retrieval:

  1. Full Node Indexer: Runs a full Blockchain node to directly extract data, ensuring completeness and accuracy, but requires a large amount of storage and processing power.

  2. Lightweight Indexer: Relies on full nodes to fetch specific data on demand, reducing storage requirements but potentially increasing query time.

  3. Dedicated Indexer: Optimized for retrieving specific types of data or serving a specific blockchain, such as NFT data or DeFi transactions.

  4. Aggregated Indexer: Extracts data from multiple blockchains and sources, including off-chain information, providing a unified query interface, particularly useful for multi-chain dApps.

Ethereum alone requires around 3 TB of storage, and this figure keeps growing. Indexer protocols deploy multiple indexers to index and query large amounts of data efficiently and at high speed, something plain RPC access cannot achieve.

Indexers also allow complex queries, easy data filtering, and extraction of data for downstream analysis. Some indexers can aggregate data from multiple sources, sparing multi-chain dApps from integrating multiple APIs. Being distributed across many nodes, indexers also provide stronger security and performance, whereas RPC providers may suffer interruptions due to their centralized nature.

Overall, compared to RPC node providers, indexers improve the efficiency and reliability of data retrieval while reducing the cost of deploying a single node. This makes the Blockchain indexer protocol the preferred choice for dApp developers.

The Development of Web3 Data Access: Introduction to Indexers and Related Projects

Building a dApp requires retrieving and reading Blockchain data to operate services. This includes DeFi, NFT platforms, games, and even social networks, as these platforms need to read data first before executing other transactions.

DeFi protocols require different kinds of information to present users with prices, ratios, and fees. AMMs need price and liquidity data to calculate swap rates, while lending protocols need utilization rates to determine borrowing rates and liquidation ratios. This information must be fed into the dApp before execution rates can be calculated for users.
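
As a concrete illustration, the sketch below shows how a front end might turn pool reserves read from an indexer into a swap quote, assuming a Uniswap-V2-style constant-product pool with a 0.3% fee; the numbers are made up.

```typescript
// Sketch: deriving a swap quote from indexed pool reserves.
// Assumes a Uniswap-V2-style constant-product pool with a 0.3% fee.
function getAmountOut(amountIn: bigint, reserveIn: bigint, reserveOut: bigint): bigint {
  const amountInWithFee = amountIn * 997n;        // 0.3% fee on the input
  const numerator = amountInWithFee * reserveOut;
  const denominator = reserveIn * 1000n + amountInWithFee;
  return numerator / denominator;                 // x * y = k, solved for the output
}

// Example: quote 1 token (18 decimals) against reserves read from an indexer
const quote = getAmountOut(10n ** 18n, 5_000n * 10n ** 18n, 10_000n * 10n ** 18n);
console.log(quote.toString());
```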

GameFi requires fast indexing and access to data to ensure a smooth gaming experience for users. Only through quick data retrieval and execution can Web3 games compete with Web2 games in terms of performance and attract more users. These games need data such as land ownership, token balances, and in-game operations. Using an indexer can better ensure stable data flow and uptime, guaranteeing a perfect gaming experience.

NFT marketplaces and lending platforms need indexed data to access information such as NFT metadata, ownership and transfer history, and royalty details. Indexing such data quickly avoids having to crawl each NFT one by one to find ownership or attribute data.
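
As a small sketch of the idea, once Transfer events are indexed, current ownership for a whole collection can be derived in a single pass instead of calling ownerOf() token by token over RPC; the row shape below is an assumption about what an indexer might return.

```typescript
// Sketch: deriving current NFT ownership from indexed Transfer events.
// The NftTransfer row shape is an assumed example of indexer output.
interface NftTransfer {
  tokenId: string;
  from: string;
  to: string;
  blockNumber: number;
}

function currentOwners(transfers: NftTransfer[]): Map<string, string> {
  const owners = new Map<string, string>();
  // Replay events in chain order; an indexer would normally return them sorted.
  for (const t of [...transfers].sort((a, b) => a.blockNumber - b.blockNumber)) {
    owners.set(t.tokenId, t.to); // the latest recipient is the current owner
  }
  return owners;
}
```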

Whether it's DeFi AMMs that require price and liquidity information or SocialFi applications that need to update new user posts, quickly retrieving data is crucial for the normal operation of dApps. With the help of indexers, they can efficiently and accurately retrieve data, providing a smooth user experience.

Indexers provide a way to extract specific data from raw blockchain data (including the smart contract events in each block), offering opportunities for more detailed data analysis and thereby more comprehensive insights.

For example, perpetual trading protocols can identify which tokens have high trading volumes and generate fees, thereby deciding whether to launch them as perpetual contracts. DEX developers can create dashboards to gain insights into which liquidity pools have the highest returns or the strongest liquidity. They can also create public dashboards, allowing developers to freely and flexibly query any type of data to display on charts.
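
For instance, a dashboard built on indexed swap rows might rank pools by volume with a simple aggregation like the sketch below; the row shape is assumed for illustration.

```typescript
// Sketch: a dashboard-style aggregation over indexed swap rows to find the
// pools with the highest volume and fees. The SwapRow shape is illustrative.
interface SwapRow {
  pool: string;
  amountUsd: number;
  feeUsd: number;
}

function topPoolsByVolume(swaps: SwapRow[], n: number) {
  const totals = new Map<string, { volume: number; fees: number }>();
  for (const s of swaps) {
    const t = totals.get(s.pool) ?? { volume: 0, fees: 0 };
    t.volume += s.amountUsd;
    t.fees += s.feeUsd;
    totals.set(s.pool, t);
  }
  // Sort descending by volume and keep the top n pools
  return [...totals.entries()]
    .sort((a, b) => b[1].volume - a[1].volume)
    .slice(0, n);
}
```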

With multiple blockchain indexers available, identifying the differences between indexing protocols is crucial to ensure that developers choose the indexer that best fits their needs.

The Graph was the first indexing protocol launched on Ethereum, making it easy to query transaction data that was previously difficult to access. It uses subgraphs to define and filter subsets of data collected from the blockchain, such as all transactions related to a specific liquidity pool.
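
To give a feel for what a subgraph looks like, here is a rough mapping-handler sketch in the AssemblyScript-flavoured TypeScript that subgraphs use. The Swap event and SwapEntity types are assumptions; in a real project they would be generated by graph codegen from the contract ABI and the schema.graphql file.

```typescript
// Illustrative subgraph mapping handler (AssemblyScript-style TypeScript).
// `Swap` and `SwapEntity` are assumed names that graph codegen would generate
// from the ABI and schema in a real subgraph.
import { Swap } from "../generated/Pool/Pool";
import { SwapEntity } from "../generated/schema";

export function handleSwap(event: Swap): void {
  // One entity per swap, keyed by transaction hash plus log index
  const id = event.transaction.hash.toHex() + "-" + event.logIndex.toString();
  const swap = new SwapEntity(id);
  swap.pool = event.address.toHex();
  swap.sender = event.params.sender.toHex();
  swap.timestamp = event.block.timestamp;
  swap.save(); // persisted by the indexer and queryable via GraphQL
}
```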

Using Proofs of Indexing, indexers stake GRT tokens to provide indexing and query services, while delegators can stake their tokens with indexers. Curators signal on high-quality subgraphs, helping indexers decide which subgraphs to index in order to earn the best query fees. The Graph is transitioning towards greater decentralization and is sunsetting its hosted service, requiring subgraphs to upgrade to its decentralized network.

Its infrastructure brings the average cost per million queries to roughly $40, much lower than the cost of running self-hosted nodes. It also supports parallel indexing of both on-chain and off-chain data, enabling efficient data retrieval.

The rewards for The Graph's indexers have steadily increased over the past few quarters, partly due to growing query volume and partly due to the rise in the token price. The Graph also plans to incorporate AI-assisted queries in the future.

![Development of Web3 Data Access: Introduction to Indexers and Related Projects](https://img-cdn.gateio.im/webp-social/moments-16396b955382c2c74010c264affdca46.webp)

Subsquid is a peer-to-peer, horizontally scalable decentralized data lake that efficiently aggregates large amounts of on-chain and off-chain data, protected by zero-knowledge proofs. As a decentralized worker network, each node is responsible for storing a specific subset of Block data, accelerating the retrieval process by quickly identifying the nodes that hold the required data.
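
A conceptual sketch of that routing idea (not Subsquid's actual code): each worker holds a block range, and a query is dispatched only to the workers whose ranges overlap the requested span.

```typescript
// Conceptual sketch of data-lake routing: workers each store a block range,
// and a query only touches the workers whose ranges overlap the request.
// This illustrates the architecture; it is not Subsquid's implementation.
interface Worker {
  id: string;
  fromBlock: number;
  toBlock: number;
}

function routeQuery(workers: Worker[], fromBlock: number, toBlock: number): Worker[] {
  return workers.filter((w) => w.toBlock >= fromBlock && w.fromBlock <= toBlock);
}

// Example: a query over blocks 1.5M-2.5M only hits the two overlapping workers
const workers: Worker[] = [
  { id: "w1", fromBlock: 0, toBlock: 1_000_000 },
  { id: "w2", fromBlock: 1_000_001, toBlock: 2_000_000 },
  { id: "w3", fromBlock: 2_000_001, toBlock: 3_000_000 },
];
console.log(routeQuery(workers, 1_500_000, 2_500_000).map((w) => w.id)); // ["w2", "w3"]
```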

Subsquid supports real-time indexing, allowing for indexing before the block is finalized. It also supports storing data in a format chosen by the developer, facilitating analysis using tools like BigQuery, Parquet, or CSV. Subgraphs can be deployed on the Subsquid network without migrating to the Squid SDK, enabling no-code deployment.

Despite still being in the testnet phase, Subsquid has achieved impressive statistics, with over 80,000 testnet users, more than 60,000 Squid indexers deployed, and over 20,000 validated developers on the network. On June 3rd, Subsquid launched the mainnet of its data lake.

In addition to indexing, the Subsquid Network data lake can also replace RPC in use cases such as analytics, ZK/TEE co-processors, AI agents, and Oracles.

![Development of Web3 Data Access: Introduction to Indexers and Related Projects](https://img-cdn.gateio.im/webp-social/moments-53dbb4fd659cf6a7184990c886901658.webp)

SubQuery is a decentralized middleware infrastructure network that provides RPC and data-indexing services. Initially supporting the Polkadot and Substrate networks, it has since expanded to over 200 chains. Its working model is similar to The Graph's use of Proofs of Indexing: indexers index data and serve query requests, while delegators stake their tokens with indexers. However, it also lets consumers submit purchase orders, meaning indexer revenue is guaranteed up front rather than determined by curation.

It plans to introduce SubQuery data nodes that support sharding, so that individual nodes no longer have to continuously sync all new data, optimizing query efficiency while moving towards greater decentralization. Users can choose to pay roughly 1 SQT as a compute fee per 1,000 requests, or agree custom fees with indexers through the protocol.

Although SubQuery only launched its token earlier this year, issuance rewards for nodes and delegators have increased in USD terms month over month, reflecting continuous growth in the volume of query services offered on its platform. Since the TGE, the total amount of staked SQT has grown from 6 million to 125 million, highlighting the growth in network participation.

![The Development of Web3 Data Access: Introduction to Indexers and Related Projects](https://img-cdn.gateio.im/webp-social/moments-52ee29205aa307720198994a5f3de61f.webp)

Covalent is a decentralized indexing network in which block specimen producer (BSP) nodes bulk-export copies of blockchain data and publish proofs on the Covalent L1 blockchain. This data is then refined by block result producer (BRP) nodes according to set rules, filtering out the required data.

Through a unified API, developers can easily extract relevant blockchain data in a consistent request and response format, without writing custom, complex queries. The CQT token, which settles on Moonbeam, can be used to pay network operators for these pre-configured datasets.
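
As a sketch of what that looks like in practice, the snippet below requests a pre-configured dataset (token balances for a wallet) from Covalent's unified API. The endpoint path follows Covalent's public documentation at the time of writing but should be treated as illustrative, and the API key is a placeholder.

```typescript
// Sketch: pulling a pre-configured dataset (token balances) from Covalent's
// unified API. Endpoint shape is based on public docs but treated as
// illustrative; the API key is a placeholder.
const COVALENT_KEY = "ckey_placeholder";

async function tokenBalances(chainName: string, address: string) {
  const url = `https://api.covalenthq.com/v1/${chainName}/address/${address}/balances_v2/`;
  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${COVALENT_KEY}` },
  });
  const body = await res.json();
  // The same request/response format works across every supported chain
  return body.data.items;
}
```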

The rewards from Covalent seem to show an overall upward trend from the first quarter of 2023 to the first quarter of 2024, partly due to the increase in the price of Covalent token CQT.

When choosing an indexer, the following factors should be considered:

Data Customizability: Some indexers (such as Covalent) are general-purpose indexers that provide standard, pre-configured datasets via API. While fast, they offer little flexibility for developers who need custom datasets. Using an indexer framework allows more customized data processing to meet specific application requirements.

Security: Index data must be secure; otherwise, dApps built on these indexers can also be vulnerable to attacks. For example, if transactions and wallet balances can be manipulated, the dApp may lose liquidity, affecting users. All indexers adopt some form of security through staking tokens, but other solutions may use proofs to further enhance security.

Subsquid offers optimistic and zero-knowledge proof options, while Covalent publishes proofs containing block hashes. The Graph provides a dispute period during which indexer query results can be challenged under an optimistic scheme, and SubQuery generates a Merkle Mountain Range proof for each block, calculating the block hashes of all data stored in its database.

Speed and Scalability: As the Blockchain grows and transaction volume increases, indexing large amounts of data becomes more cumbersome, requiring more processing power and storage space. Maintaining efficiency becomes more difficult, but the Indexer Protocol introduces solutions to meet these demands.

Subsquid achieves horizontal scaling by adding more nodes to store data, and it can also scale with hardware improvements. The Graph offers parallel data streaming for faster synchronization, while SubQuery introduces node sharding to accelerate the sync process.

Supported networks: Although most blockchain activity still happens on Ethereum, other blockchains are becoming increasingly popular. Layer 2s, Solana, Move-based chains, and the Bitcoin ecosystem all have growing developer bases and activity, and they also require indexing services.

Supporting chains that other indexer protocols do not cover can capture additional market share and fees. Indexing data-intensive networks (such as Solana) is no easy task, and currently only Subsquid has successfully provided indexing support for it.

Although indexers are already widely adopted in dApp development, their potential remains enormous, especially in the context of AI integration. As AI becomes more prevalent in Web2 and Web3, its improvement depends on access to relevant data for training models and developing AI agents. Ensuring data integrity is crucial for AI applications, as it prevents models from being fed biased or inaccurate information.

