Canyon: A Permanent Storage Network for Polkadot | PW Exclusive Interview

This article was originally published in Chinese by PolkaWorld.

The vision of Web3.0 is to further decentralize the Internet.

A secure, reliable, low-cost, and easy-to-use decentralized storage infrastructure will undoubtedly become an indispensable part of the Web3.0 world.

In this exclusive interview, we are delighted to be joined by Xu Liucheng, founder of Canyon, a project focused on permanent storage for the Polkadot ecosystem. He talks about why we need permanent storage and how Canyon aims to improve the storage infrastructure for Polkadot and Web3.0.

PW: Could you tell us what Canyon is in plain terms?

Liucheng: Canyon is a permanent storage network built on Substrate. It is largely inspired by Arweave; to put it simply, it’s a PoS version of Arweave for Polkadot.

By combining the PoA (Proof of Access) consensus with PoS, Canyon greatly lowers the barrier to entry for storage miners and incentivizes them to store as much data as possible to win more rewards. Through its own design improvements, it also strengthens data durability and data redundancy, while benefiting from advances in blockchain technology. Our fundamental vision is to serve as a practical and sustainable storage infrastructure for the Web3.0 era.
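To make the PoA idea more concrete, here is a minimal sketch of a generic proof-of-access check. It is not Canyon’s actual implementation. It assumes that the network’s data is split into numbered chunks, that a "recall" chunk is derived from the parent block hash with SHA-256, and that a node may only author a block if it holds that chunk. The names `recall_index`, `try_author`, and `authoring_rate` are hypothetical. Under these assumptions, a node that stores a given fraction of the chunks can author in roughly that fraction of rounds, which is why storing more data earns more rewards.

```python
import hashlib
from typing import Optional


def recall_index(parent_hash: bytes, total_chunks: int) -> int:
    """Derive a pseudo-random chunk index from the parent block hash (hypothetical scheme)."""
    digest = hashlib.sha256(parent_hash).digest()
    return int.from_bytes(digest, "big") % total_chunks


def try_author(stored: dict[int, bytes], parent_hash: bytes,
               total_chunks: int) -> Optional[tuple[int, str]]:
    """Return (chunk index, access proof) if this node holds the recall chunk, else None."""
    idx = recall_index(parent_hash, total_chunks)
    chunk = stored.get(idx)
    if chunk is None:
        return None  # cannot prove access to the recall chunk, so no block this round
    # Bind the proof to both the parent block and the chunk contents.
    # A real protocol would also include a Merkle path to a committed data root.
    proof = hashlib.sha256(parent_hash + chunk).hexdigest()
    return idx, proof


def authoring_rate(stored: dict[int, bytes], total_chunks: int, rounds: int) -> float:
    """Fraction of simulated rounds in which the node could author a block."""
    wins = 0
    for r in range(rounds):
        parent = r.to_bytes(8, "big")  # stand-in for successive parent block hashes
        if try_author(stored, parent, total_chunks) is not None:
            wins += 1
    return wins / rounds


if __name__ == "__main__":
    TOTAL = 1_000
    # A node that keeps 20% of the chunks (indices 0..199).
    node = {i: f"chunk-{i}".encode() for i in range(200)}
    print(f"authoring rate: {authoring_rate(node, TOTAL, 10_000):.3f}")
```

Running it should print a rate near 0.2. In a PoS setting, the eligibility to propose would come from staking, with the access proof as an additional condition; that combination is not modeled here.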

PW: Why would you want to build a permanent storage network?

Liucheng: This question can be answered from two perspectives. Why did I choose storage? And why did I choose permanent storage?

Well, the first question: why did I choose blockchain storage? The answer is actually very simple. Once decentralized computing is no longer a bottleneck, demand for decentralized storage will inevitably grow. In computer science, computing and storage are inseparable, and the same holds true for blockchain. At present, the entire blockchain sector is still working on improving decentralized computing, or, to put it simply, TPS. Ethereum was the first to focus on decentralized computing and proved it genuinely useful, but as its ecosystem grew, it became clear that its capacity is far from enough to meet the growing demand. Projects like Polkadot, Solana, and Near later rose to the challenge of solving the scalability problem and have already made a lot of progress. We believe every step forward in blockchain computing will not only enrich DeFi applications but also give birth to a richer variety of DApps, which will in turn create more demand for decentralized storage.

Now to the second question: why did I choose permanent storage? There are several reasons. First, there is a real need for permanent storage; after close observation, we share some of Arweave’s views here. One example is the need to store NFT metadata. Projects built on a permanent storage network, like everFinance, seek to move on-chain computing completely off-chain, which I believe is a great idea worth trying. Second, there may be a need to permanently store blockchain history. For example, as Ethereum moves from PoW to PoS, the Ethereum Foundation has proposed storing its entire history in a storage network. These are known storage needs, and I believe more will emerge in the future. Therefore, permanent storage will certainly have a foothold in Web3.0. That’s why we chose it.

PW: Given that demand for decentralized storage is still lacking, how should current storage projects deal with it?

Liucheng: First of all, I need to point out that the lack of demand is largely due to the fact that the entire industry is still in its early stages and the scalability issue has not been fully resolved. It’s not up to a single project to drive demand. Instead, it is the continuous development of the crypto space, a deepening understanding of blockchain, and a flourishing variety of DApps in cloud storage, video, and social media that can really boost demand.

Let’s assume that in the future, personal data will be encrypted and stored in a storage network that sends the user an access request whenever an app needs the data. If the user approves the request, the data is decrypted and accessed; if not, access is denied. This is unlike the current Web 2.0 era, where personal information is required whenever you log onto a service and is stored by a centralized entity that claims ownership of it. So even though current demand is weak, any storage project that does not encourage miners to profit from storing junk data has a promising future.
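The access flow imagined above can be sketched as a small piece of control logic. This is a hypothetical illustration, not part of Canyon: the class `PersonalDataVault`, the `approve` callback, and the app names are invented, and real encryption and decryption are deliberately left out, so the stored bytes merely stand in for ciphertext. The point is only that data leaves the vault if and only if the owner approves the request.

```python
from typing import Callable, Optional


class PersonalDataVault:
    """Toy model of user-controlled data access (encryption omitted)."""

    def __init__(self, approve: Callable[[str, str], bool]):
        # `approve(app, key)` represents the user's answer to an access request.
        self._approve = approve
        self._records: dict[str, bytes] = {}
        self.audit_log: list[tuple[str, str, bool]] = []

    def store(self, key: str, ciphertext: bytes) -> None:
        # In a real system these bytes would be encrypted with the user's key.
        self._records[key] = ciphertext

    def request_access(self, app: str, key: str) -> Optional[bytes]:
        """Ask the owner; release the record only if the request is approved."""
        granted = key in self._records and self._approve(app, key)
        self.audit_log.append((app, key, granted))
        return self._records[key] if granted else None


if __name__ == "__main__":
    vault = PersonalDataVault(approve=lambda app, key: app == "trusted-app")
    vault.store("email", b"<encrypted email>")
    print(vault.request_access("trusted-app", "email"))  # b'<encrypted email>'
    print(vault.request_access("ad-tracker", "email"))   # None
```

Keeping an audit log of every request, granted or not, is one simple way to make the user’s control visible; it is a design choice of this sketch rather than something described in the interview.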

PW: In your opinion, what are the problems with other storage solutions such as IPFS and Arweave?

Liucheng: Let’s talk about Filecoin first. One of its problems is the high cost of data storage. Filecoin uses zero-knowledge proofs to ensure that miners store data correctly over a specified period, but this also means very high hardware costs. As we all know, high costs always translate into high prices for users. Early subsidies are far from enough to offset these costs in the long run, leaving customers to foot the bill. Therefore, Filecoin is not able to provide a truly low-cost data storage service.

Another problem with Filecoin is the sheer amount of junk data stored on its network. Miners store junk data to ramp up their mining power for profit. Its storage capacity has now reached the EiB scale, with a whopping 99% of it being junk data. In comparison, Arweave stores only about 11 TB of data. That’s a gap of six orders of magnitude.
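The "six orders of magnitude" comparison is simple arithmetic. The interview gives Filecoin’s capacity only as "EiB" and Arweave’s as "11 T", so the inputs below are assumptions: 11 T is read as 11 decimal terabytes, and Filecoin’s capacity is tried at both 1 EiB and 10 EiB.

```python
import math

TB = 10**12   # terabyte (decimal)
EIB = 2**60   # exbibyte (binary, IEC)


def orders_of_magnitude(larger_bytes: float, smaller_bytes: float) -> float:
    """Base-10 logarithm of the ratio between two storage sizes."""
    return math.log10(larger_bytes / smaller_bytes)


if __name__ == "__main__":
    arweave = 11 * TB  # assumption: "11 T" read as 11 decimal terabytes
    for filecoin_eib in (1, 10):
        gap = orders_of_magnitude(filecoin_eib * EIB, arweave)
        print(f"{filecoin_eib} EiB vs 11 TB: {gap:.2f} orders of magnitude")
```

This prints 5.02 for 1 EiB and 6.02 for 10 EiB, so a six-order gap corresponds to a Filecoin figure on the order of 10 EiB.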

Unlike Filecoin, Arweave manages to incentivize data storage at almost zero cost via PoA, a probabilistic storage consensus. However, it is not perfect either. Arweave’s PoA consensus is built on PoW, which is energy-intensive, slow, and offers no finality. Arweave also encourages miners to store data probabilistically, without guaranteeing that data will never be lost, so users’ data is vulnerable in theory.

Moreover, it does not guarantee data retrievability at the protocol layer; that is, stored data may not be readable. Although its white paper describes mechanisms like wildfire, through which network nodes discipline one another, to my knowledge this has not actually been implemented. Since serving data requires bandwidth, which definitely comes at a cost, it’s unreasonable to require nodes to provide this service for free. Yet a storage network is of no use if it cannot guarantee that users can read their data on request.

Another shortcoming of Arweave is something similar to selfish mining: an individual node can deliberately withhold some data from the rest of the network in order to gain a mining advantage. The design was originally intended to incentivize miners to store scarce and unique data for a mining advantage, thereby increasing the redundancy of scarce data. In practice, however, it has gradually turned into a means of exploitation akin to selfish mining.
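A simplified model shows why withholding pays under PoA. It illustrates the incentive under stated assumptions and is not a description of Arweave’s actual mining rules: there are `miners` miners with equal hash power, every chunk is equally likely to be recalled, a block can only be mined by a miner holding the recall chunk, shared chunks are held by every miner, and "exclusive" chunks are held only by the withholding miner. The function name `withholder_share` is hypothetical.

```python
def withholder_share(miners: int, exclusive_fraction: float) -> float:
    """Expected share of blocks won by a miner that alone holds `exclusive_fraction` of the chunks.

    Simplified model: equal hash power, uniform recall, shared chunks held by all miners.
    """
    if miners < 1 or not 0.0 <= exclusive_fraction <= 1.0:
        raise ValueError("need miners >= 1 and 0 <= exclusive_fraction <= 1")
    shared = 1.0 - exclusive_fraction
    # Shared chunk recalled: all miners compete, so the withholder wins 1/miners of those blocks.
    # Exclusive chunk recalled: only the withholder can mine that block.
    return shared / miners + exclusive_fraction


if __name__ == "__main__":
    print(f"fair share:         {withholder_share(10, 0.0):.3f}")
    print(f"withholding 5% data: {withholder_share(10, 0.05):.3f}")
```

With 10 miners, a fair share is 0.100 of blocks; holding 5% of the data exclusively raises it to 0.145 in this model, which is the kind of advantage the design ends up rewarding.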

In addition, Arweave may one day face an awkward situation in which only one storage mining pool is left. Why? The PoA (Proof of Access) consensus only requires access to data. As long as a node belongs to a large mining pool from which data can be retrieved easily, it is not obliged to store the data itself. This could lead to catastrophic consequences: all data would be stored only once, so if a node fails, the data on it is permanently lost, with no backup to recover from.

In conclusion, as far as storage is concerned, Arweave fails to provide any theoretical guarantee of data durability, retrievability, or redundancy.

PW: How does Canyon solve these problems?

Liucheng: As I mentioned, Arweave adopts “PoW + PoA”. Many of its problems actually stem from the former. In PoW, it is impossible to know the number of miners, since they are free to join and leave, nor is it feasible to impose requirements on the storage services miners provide.

That’s why Canyon chose to implement “PoS + PoA”. In a PoS system, the number of nodes is known, and we can impose storage requirements on nodes to achieve a high durability rate. Specifically, PoS is used to enforce a minimum storage ratio: each node must store at least a certain percentage of all data. The durability then follows from a simple probability calculation. Suppose there are 100 nodes and each must store at least 20% of all data. The durability rate is 1 minus the probability that a given piece of data is stored by none of the nodes.
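The calculation described above can be written out as follows. It assumes, as a simplification not stated in the interview, that each of the $n$ nodes independently stores a uniformly random fraction $p$ of all data, so a given piece of data is lost only if every node misses it:

```latex
\begin{aligned}
P_{\text{loss}} &= (1 - p)^{n}, \qquad D = 1 - P_{\text{loss}} = 1 - (1 - p)^{n} \\
n = 100,\ p = 0.2:\quad P_{\text{loss}} &= 0.8^{100} = e^{100 \ln 0.8} \approx e^{-22.31} \approx 2.0 \times 10^{-10} \\
D &\approx 1 - 2.0 \times 10^{-10} = 99.99999998\%
\end{aligned}
```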

With this mechanism, Canyon can reach a durability rate of 99.9999999999% (twelve nines) with 200 nodes, each storing at least 10% of the data. If each node stores at least 90%, a dozen nodes are enough. In comparison, Amazon S3 is designed for 99.999999999% (eleven nines) durability. As with Arweave, the rate is probabilistic, but the key difference is that Canyon has a theoretical upper bound on the probability of data loss, whereas Arweave, having chosen PoW over PoS, does not.
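The durability figures can be checked with a short script under the same simplifying assumption: each node independently holds a uniformly random fraction of the data. The function names are hypothetical, and exact fractions are used to avoid floating-point rounding at the boundaries.

```python
import math
from fractions import Fraction


def loss_probability(nodes: int, min_fraction: Fraction) -> Fraction:
    """Chance that a given piece of data is held by none of the nodes,
    assuming each node independently stores a uniformly random `min_fraction` of all data."""
    return (1 - min_fraction) ** nodes


def nines(nodes: int, min_fraction: Fraction) -> float:
    """Durability expressed as a number of nines: -log10(P_loss)."""
    return -math.log10(loss_probability(nodes, min_fraction))


def min_nodes_for_nines(target_nines: int, min_fraction: Fraction) -> int:
    """Smallest node count whose loss probability is at most 10**-target_nines."""
    if not 0 < min_fraction <= 1:
        raise ValueError("min_fraction must be in (0, 1]")
    threshold = Fraction(1, 10 ** target_nines)
    nodes, loss = 0, Fraction(1)
    while loss > threshold:
        loss *= 1 - min_fraction  # add one more node
        nodes += 1
    return nodes


if __name__ == "__main__":
    for n, p in [(200, Fraction(1, 10)), (12, Fraction(9, 10))]:
        print(f"{n} nodes, each >= {float(p):.0%}: {nines(n, p):.2f} nines")
    print("nodes for 12 nines at 10%:", min_nodes_for_nines(12, Fraction(1, 10)))
```

Under this model, 12 nodes at 90% give exactly twelve nines, matching the text, while 200 nodes at 10% give about 9.15 nines, and twelve nines at 10% would take 263 nodes. The twelve-nines figure for 200 nodes therefore presumably relies on parameters or assumptions beyond this simple independence model, which the interview does not spell out.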

In terms of data redundancy, we achieve it indirectly by charging fees for data retrieval, which in our view is not a zero-cost service. Why do retrieval fees help with data redundancy? Because retrieval is profitable for nodes, which drives them to store data that is in high demand. The more frequently a piece of data is retrieved, the more nodes seek to hold it for profit, and the better it is backed up. This market-driven mechanism therefore adapts data redundancy to demand.
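The market-driven redundancy described above can be illustrated with a toy allocation model. It is an assumption-laden sketch, not Canyon’s mechanism: each node has a fixed number of spare storage slots beyond its mandatory share, fills them greedily, and values a chunk at its request rate divided by the number of nodes that would then share its retrieval fees. The function `allocate_replicas` is hypothetical.

```python
def allocate_replicas(request_rates: list[float], nodes: int, slots_per_node: int) -> list[int]:
    """Greedy fee-seeking placement: returns the number of copies of each chunk."""
    if slots_per_node > len(request_rates):
        raise ValueError("a node cannot hold more distinct chunks than exist")
    replicas = [0] * len(request_rates)
    for _ in range(nodes):
        held: set[int] = set()
        for _ in range(slots_per_node):
            # Expected fee income for one more holder of chunk c, assuming requests
            # are split evenly among all nodes that store it.
            best = max(
                (c for c in range(len(request_rates)) if c not in held),
                key=lambda c: request_rates[c] / (replicas[c] + 1),
            )
            held.add(best)
            replicas[best] += 1
    return replicas


if __name__ == "__main__":
    # Four chunks, from hot to cold; six nodes with two spare slots each.
    print(allocate_replicas([40, 20, 10, 5], nodes=6, slots_per_node=2))
```

For request rates of 40, 20, 10, and 5, this prints `[6, 4, 2, 0]`: copies track demand, and the coldest chunk gets none from fees alone, which is the gap the PoS minimum-storage rule is meant to cover.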

To sum up, PoS + PoA ensures high data durability: all data, hot or cold, has at least one copy on the network with very high probability. Retrieval fees motivate nodes to store as much data as possible, and the more frequently data is accessed, the more copies it has, so storage resources are allocated more reasonably.

PW: What are the requirements for storage miners in Canyon?

Liucheng: Canyon is not demanding at all in terms of hardware. In fact, thanks to its ultra-lightweight storage consensus (PoA), it does not require special mining machines or any specific type of hardware. An average machine capable of running a PoS node, plus a hard drive with decent capacity, is all that’s needed.

PW: What is the purpose of the Canyon token?

Liucheng: Ordinary users can use tokens to pay gas fees and storage and retrieval charges, and to participate in staking, on-chain governance, and so on. For nodes, tokens serve as incentives, motivating them to actively take part in network consensus and improve their storage services. In addition, nodes can earn profits from the retrieval services they provide.

PW: When did the project start, and what is its current progress?

Liucheng: We started focusing on blockchain storage at the end of 2020 and applied for our first Web3 Foundation grant around March this year. So far, we have received two Web3 Foundation grants; the second was very helpful in implementing the PoA consensus on Substrate and completing its integration with PoS. Check out our latest grant delivery.

Most of the major technical issues have been identified and studied. We expect the mainnet to launch within a year or two, depending on how future financing and team building go.

PW: Why did Canyon choose Substrate?

Liucheng: Well, I’m quite familiar with Substrate, as I have been working with it for several years, and its framework has proven very useful. Besides, I believe Polkadot has robust growth prospects, so I’m looking forward to some form of interoperability with future Polkadot projects. Last but not least, there was no project in Polkadot dedicated to permanent storage when we started, and we intended to be the first.

Links:

A permanent storage layer for Web3.0, with high data durability and retrievability