
Erasure Coding

Erasure coding (also known as erasure code) is an efficient and flexible approach to data protection, available as an optional feature for Swarm uploads. It increases data protection by enabling recovery of the original data even when some encoded chunks are lost or corrupted. When used, it keeps data on Swarm retrievable with high statistical certainty, even if some nodes or entire neighborhoods go offline. Refer to the official erasure coding paper for more in-depth details.

How It Works

Erasure coding enhances data protection by dividing the source data into "chunks" and adding redundant chunks.

Specifically, data is divided into m chunks, and k additional chunks are generated, resulting in m + k total chunks. The data is encoded across these chunks such that as long as any m of the m + k chunks are intact, the original data can be fully reconstructed. Chunks are then distributed across the network as with a standard upload. This approach provides a robust method for data recovery in distributed storage networks like Swarm.

Example

For an 8 KB image, if we set m = 2 and k = 1, we create 3 chunks (2 original + 1 redundant). As long as any 2 of these 3 chunks are available, we can reconstruct the original data. By increasing k to 4, we create 6 chunks and can tolerate the loss of up to 4 of them while still recovering the original data.
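The 2-of-3 case above can be sketched with a single XOR parity chunk. This is only an illustration of the "any m of m + k" recovery property for m = 2 and k = 1. XOR parity is a stand-in chosen for brevity and is not presented here as the coding scheme Swarm uses; the function names and the 4096-byte chunk size are assumptions made for the example.

```python
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, chunk_size: int) -> List[bytes]:
    """Split data into m = 2 chunks and append k = 1 XOR parity chunk."""
    if len(data) != 2 * chunk_size:
        raise ValueError("this sketch expects exactly two chunks of data")
    first, second = data[:chunk_size], data[chunk_size:]
    return [first, second, xor_bytes(first, second)]


def decode(chunks: List[Optional[bytes]]) -> bytes:
    """Rebuild the data from any 2 of the 3 chunks (None marks a lost chunk)."""
    if len(chunks) != 3 or sum(c is None for c in chunks) > 1:
        raise ValueError("need at least 2 of the 3 chunks")
    first, second, parity = chunks
    if first is None:
        first = xor_bytes(second, parity)   # first = second XOR parity
    elif second is None:
        second = xor_bytes(first, parity)   # second = first XOR parity
    return first + second


if __name__ == "__main__":
    image = bytes(range(256)) * 32          # 8192 bytes of sample data
    stored = encode(image, 4096)
    stored[0] = None                        # simulate losing one chunk
    print(decode(stored) == image)
```

Tolerating more than one lost chunk (k > 1) needs a more general code than XOR parity; the sketch only covers the smallest case from the example.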

[Figure: Erasure Code Example]

Levels of Protection

In Swarm's implementation of erasure coding, there are four named levels of protection: Medium, Strong, Insane, and Paranoid. For each level, the m and k values have been chosen to meet a specific level of data protection:

Table A:

| Redundancy Level Value | Level Name | Chunk Loss Tolerance |
|---|---|---|
| 1 | Medium | 1% |
| 2 | Strong | 5% |
| 3 | Insane | 10% |
| 4 | Paranoid | 50% |

The "Redundancy Level Value" is a numeric value for each level of protection, the "Level Name" is the official name for each level, and the "Chunk Loss Tolerance" column gives the percentage of chunk loss each level is designed to withstand. For each redundancy level, the original data is retrievable with >=99.9999% statistical certainty given a percent chunk loss less than or equal to the percent shown in the "Chunk Loss Tolerance" column.

Note that this guarantee of retrievability applies to each 128-chunk segment, and therefore does not correspond to the retrievability of a whole file. The retrievability failure rate for any individual file depends on the size of the file, and increases with the size of the file. For a detailed explanation of how to calculate the retrievability of a file of any size, refer to section 3 in the erasure coding paper.
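To make the certainty figure more concrete, the following sketch estimates the chance that a single 128-chunk segment cannot be reconstructed. It assumes each chunk is lost independently with probability equal to the chunk loss rate, so that the number of lost chunks follows a binomial distribution; this model and the function name are assumptions for illustration, and the erasure coding paper remains the authoritative derivation. The value of 9 parities for the Medium level is taken from Table B further down this page.

```python
from math import comb


def segment_failure_probability(total_chunks: int, parities: int, loss_rate: float) -> float:
    """Probability that more than `parities` of `total_chunks` chunks are lost.

    Assumes every chunk is lost independently with probability `loss_rate`.
    A segment fails only when the number of lost chunks exceeds the parities.
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    # Sum the binomial tail directly to avoid subtracting from a value near 1.
    return sum(
        comb(total_chunks, lost) * loss_rate**lost * (1.0 - loss_rate) ** (total_chunks - lost)
        for lost in range(parities + 1, total_chunks + 1)
    )


if __name__ == "__main__":
    # Medium level: 128 chunks per segment, 9 parities, 1% chunk loss.
    print(segment_failure_probability(128, 9, 0.01))
```

Under these assumptions, the Medium level at 1% chunk loss gives a failure probability below 1e-6, which is in line with the >=99.9999% figure stated above.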
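As a rough illustration of why the failure rate grows with file size, the sketch below combines per-segment failure rates under the assumption that segments fail independently of one another. Both the independence assumption and the function name are introduced here for illustration; how to count the segments of a real file and the exact calculation are covered in section 3 of the erasure coding paper.

```python
import math


def file_failure_probability(segment_failure: float, segments: int) -> float:
    """Probability that at least one of `segments` segments is unrecoverable.

    Assumes each segment fails independently with probability `segment_failure`.
    Computes 1 - (1 - q)^n in a numerically stable way for very small q.
    """
    if not 0.0 <= segment_failure < 1.0:
        raise ValueError("segment_failure must be in [0, 1)")
    if segments < 0:
        raise ValueError("segments must be non-negative")
    return -math.expm1(segments * math.log1p(-segment_failure))


if __name__ == "__main__":
    # Per-segment failure rate of 1e-6, matching the 99.9999% certainty above.
    for n in (1, 100, 10_000):
        print(n, file_failure_probability(1e-6, n))
```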

Usage

For usage instructions, see the erasure coding page in the "Develop" section.

Cost Calculation

Each of Swarm's four levels of protection (Medium, Strong, Insane, and Paranoid) adds parity chunks to an upload, which increases both data protection and cost.

The table below shows the number of parities and data chunks for each level, as well as the percent increase in cost compared to a non-erasure-coded upload, for both normal and encrypted uploads.

Table B:

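| Redundancy | Parities | Data Chunks | Percent | Chunks Encrypted | Percent Encrypted |
|---|---|---|---|---|---|
| Medium | 9 | 119 | 7.6% | 59 | 15% |
| Strong | 21 | 107 | 19.6% | 53 | 40% |
| Insane | 31 | 97 | 32% | 48 | 65% |
| Paranoid | 90 | 38 | 240.5% | 18 | 494% |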

For each redundancy level, there are m + k = 128 chunks, where m is the number of data chunks (shown in column "Data Chunks") and k is the number of parity chunks (shown in column "Parities"). The "Percent" and "Percent Encrypted" columns show the "parity overhead", that is, the percent increase in cost from using erasure coding, for normal and encrypted uploads respectively.
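The overhead percentages can be reproduced as the ratio of parity chunks to data chunks. The sketch below does this for three of the rows in Table B; the function name and the dictionary layout are assumptions for illustration, and reading "Percent Encrypted" as parities divided by "Chunks Encrypted" is an interpretation that matches those rows after rounding.

```python
def parity_overhead_percent(parities: int, data_chunks: int) -> float:
    """Extra cost of the parity chunks, as a percentage of the data chunks."""
    if data_chunks <= 0:
        raise ValueError("data_chunks must be positive")
    return 100.0 * parities / data_chunks


# (parities, data chunks, data chunks when encrypted), copied from Table B.
TABLE_B_ROWS = {
    "Medium": (9, 119, 59),
    "Strong": (21, 107, 53),
    "Insane": (31, 97, 48),
}

if __name__ == "__main__":
    for level, (parities, chunks, chunks_encrypted) in TABLE_B_ROWS.items():
        normal = parity_overhead_percent(parities, chunks)
        encrypted = parity_overhead_percent(parities, chunks_encrypted)
        print(f"{level}: {normal:.1f}% normal, {encrypted:.1f}% encrypted")
```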

Cost Calculation for Smaller Uploads

To find the percent increase in cost for uploads of less than 128 chunks, refer to the table below:

Table C:

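| Security | Parities | Chunks | Percent | Chunks Encrypted | Percent Encrypted |
|---|---|---|---|---|---|
| Medium | 2 | 1 | 200% | | |
| Medium | 3 | 2-5 | 150% - 60% | 1-2 | 300% - 150% |
| Medium | 4 | 6-14 | 66.7% - 28.6% | 3-7 | 133.3% - 57.1% |
| Medium | 5 | 15-28 | 33.3% - 17.9% | 7-14 | 71.4% - 35.7% |
| Medium | 6 | 29-46 | 20.7% - 13% | 14-23 | 42.9% - 26.1% |
| Medium | 7 | 47-68 | 14.9% - 10.3% | 23-34 | 30.4% - 20.6% |
| Medium | 8 | 69-94 | 11.6% - 8.5% | 34-47 | 23.5% - 17% |
| Medium | 9 | 95-119 | 9.5% - 7.6% | 47-59 | 19.1% - 15.3% |
| Strong | 4 | 1 | 400% | | |
| Strong | 5 | 2-3 | 250% - 166.7% | 1 | 500% |
| Strong | 6 | 4-6 | 150% - 100% | 2-3 | 300% - 200% |
| Strong | 7 | 7-10 | 100% - 70% | 3-5 | 233.3% - 140% |
| Strong | 8 | 11-15 | 72.7% - 53.3% | 5-7 | 160% - 114.3% |
| Strong | 9 | 16-20 | 56.2% - 45% | 8-10 | 112.5% - 90% |
| Strong | 10 | 21-26 | 47.6% - 38.5% | 10-13 | 100% - 76.9% |
| Strong | 11 | 27-32 | 40.7% - 34.4% | 13-16 | 84.6% - 68.8% |
| Strong | 12 | 33-39 | 36.4% - 30.8% | 16-19 | 75% - 63.2% |
| Strong | 13 | 40-46 | 32.5% - 28.3% | 20-23 | 65% - 56.5% |
| Strong | 14 | 47-53 | 29.8% - 26.4% | 23-26 | 60.9% - 53.8% |
| Strong | 15 | 54-61 | 27.8% - 24.6% | 27-30 | 55.6% - 50% |
| Strong | 16 | 62-69 | 25.8% - 23.2% | 31-34 | 51.6% - 47.1% |
| Strong | 17 | 70-77 | 24.3% - 22.1% | 35-38 | 48.6% - 44.7% |
| Strong | 18 | 78-86 | 23.1% - 20.9% | 39-43 | 46.2% - 41.9% |
| Strong | 19 | 87-95 | 21.8% - 20% | 43-47 | 44.2% - 40.4% |
| Strong | 20 | 96-104 | 20.8% - 19.2% | 48-52 | 41.7% - 38.5% |
| Strong | 21 | 105-107 | 20% - 19.6% | 52-53 | 40.4% - 39.6% |
| Insane | 5 | 1 | 500% | | |
| Insane | 6 | 2 | 300% | 1 | 600% |
| Insane | 7 | 3 | 233.3% | 1 | 700% |
| Insane | 8 | 4-5 | 200% - 160% | 2 | 400% |
| Insane | 9 | 6-8 | 150% - 112.5% | 3-4 | 300% - 225% |
| Insane | 10 | 9-10 | 111.1% - 100% | 4-5 | 250% - 200% |
| Insane | 11 | 11-13 | 100% - 84.6% | 5-6 | 220% - 183.3% |
| Insane | 12 | 14-16 | 85.7% - 75% | 7-8 | 171.4% - 150% |
| Insane | 13 | 17-19 | 76.5% - 68.4% | 8-9 | 162.5% - 144.4% |
| Insane | 14 | 20-22 | 70% - 63.6% | 10-11 | 140% - 127.3% |
| Insane | 15 | 23-26 | 65.2% - 57.7% | 11-13 | 136.4% - 115.4% |
| Insane | 16 | 27-29 | 59.3% - 55.2% | 13-14 | 123.1% - 114.3% |
| Insane | 17 | 30-33 | 56.7% - 51.5% | 15-16 | 113.3% - 106.2% |
| Insane | 18 | 34-37 | 52.9% - 48.6% | 17-18 | 105.9% - 100% |
| Insane | 19 | 38-41 | 50% - 46.3% | 19-20 | 100% - 95% |
| Insane | 20 | 42-45 | 47.6% - 44.4% | 21-22 | 95.2% - 90.9% |
| Insane | 21 | 46-50 | 45.7% - 42% | 23-25 | 91.3% - 84% |
| Insane | 22 | 51-54 | 43.1% - 40.7% | 25-27 | 88% - 81.5% |
| Insane | 23 | 55-59 | 41.8% - 39% | 27-29 | 85.2% - 79.3% |
| Insane | 24 | 60-63 | 40% - 38.1% | 30-31 | 80% - 77.4% |
| Insane | 25 | 64-68 | 39.1% - 36.8% | 32-34 | 78.1% - 73.5% |
| Insane | 26 | 69-73 | 37.7% - 35.6% | 34-36 | 76.5% - 72.2% |
| Insane | 27 | 74-77 | 36.5% - 35.1% | 37-38 | 73% - 71.1% |
| Insane | 28 | 78-82 | 35.9% - 34.1% | 39-41 | 71.8% - 68.3% |
| Insane | 29 | 83-87 | 34.9% - 33.3% | 41-43 | 70.7% - 67.4% |
| Insane | 30 | 88-92 | 34.1% - 32.6% | 44-46 | 68.2% - 65.2% |
| Insane | 31 | 93-97 | 33.3% - 32% | 46-48 | 67.4% - 64.6% |
| Paranoid | 19 | 1 | 1900% | | |
| Paranoid | 23 | 2 | 1150% | 1 | 2300% |
| Paranoid | 26 | 3 | 866.7% | 1 | 2600% |
| Paranoid | 29 | 4 | 725% | 2 | 1450% |
| Paranoid | 31 | 5 | 620% | 2 | 1550% |
| Paranoid | 34 | 6 | 566.7% | 3 | 1133.3% |
| Paranoid | 36 | 7 | 514.3% | 3 | 1200% |
| Paranoid | 38 | 8 | 475% | 4 | 950% |
| Paranoid | 40 | 9 | 444.4% | 4 | 1000% |
| Paranoid | 43 | 10 | 430% | 5 | 860% |
| Paranoid | 45 | 11 | 409.1% | 5 | 900% |
| Paranoid | 47 | 12 | 391.7% | 6 | 783.3% |
| Paranoid | 48 | 13 | 369.2% | 6 | 800% |
| Paranoid | 50 | 14 | 357.1% | 7 | 714.3% |
| Paranoid | 52 | 15 | 346.7% | 7 | 742.9% |
| Paranoid | 54 | 16 | 337.5% | 8 | 675% |
| Paranoid | 56 | 17 | 329.4% | 8 | 700% |
| Paranoid | 58 | 18 | 322.2% | 9 | 644.4% |
| Paranoid | 59 | 19 | 310.5% | 9 | 655.6% |
| Paranoid | 61 | 20 | 305% | 10 | 610% |
| Paranoid | 63 | 21 | 300% | 10 | 630% |
| Paranoid | 65 | 22 | 295.5% | 11 | 590.9% |
| Paranoid | 66 | 23 | 287% | 11 | 600% |
| Paranoid | 68 | 24 | 283.3% | 12 | 566.7% |
| Paranoid | 70 | 25 | 280% | 12 | 583.3% |
| Paranoid | 71 | 26 | 273.1% | 13 | 546.2% |
| Paranoid | 73 | 27 | 270.4% | 13 | 561.5% |
| Paranoid | 75 | 28 | 267.9% | 14 | 535.7% |
| Paranoid | 76 | 29 | 262.1% | 14 | 542.9% |
| Paranoid | 78 | 30 | 260% | 15 | 520% |
| Paranoid | 80 | 31 | 258.1% | 15 | 533.3% |
| Paranoid | 81 | 32 | 253.1% | 16 | 506.2% |
| Paranoid | 83 | 33 | 251.5% | 16 | 518.8% |
| Paranoid | 84 | 34 | 247.1% | 17 | 494.1% |
| Paranoid | 86 | 35 | 245.7% | 17 | 505.9% |
| Paranoid | 87 | 36 | 241.7% | 18 | 483.3% |
| Paranoid | 89 | 37 | 240.5% | 18 | 494.4% |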

Example Cost Calculation

For each redundancy level, there are m + k = 128 chunks, where m is the number of data chunks (shown in column "Data Chunks") and k is the number of parity chunks. If the number of chunks in the data being uploaded is an exact multiple of m, then the percent cost of the upload simply equals the value shown in the "Percent" column of table B for the corresponding redundancy level.

Exact Multiples

For example, if we are uploading with the Strong redundancy level, and our source data consists of 321 (3 * 107) chunks, then we can simply use the percentage from the "Percent" column for the Strong level: 19.6% (63 parities / 321 data chunks).

With Remainders

However, uploads will generally not come in exact multiples of m, so we need to adjust our calculations. To do so, we use table C from the section above, which shows, for each redundancy level, the number of parities for sets of chunks ranging from a single chunk up to the maximum number of data chunks for that level. We then sum up the total parities and data chunks for the entire upload and calculate the resulting percentage.

Let's say, for example, we have a source file of 340 chunks which we want to upload with the Strong level of protection. Referring to table B, we see that for the Strong level there are 21 parity chunks for each 107 data chunks. 340 / 107 = ~3.177, meaning our upload will have three full sets of 128 chunks where m = 107 and k = 21. The remainder is 340 % 107 = 19 data chunks.

Looking at table C, we can see that at the Strong level, 19 data chunks require 9 parity chunks. The total number of parity chunks is therefore 3 * 21 + 9 = 72, and the final percentage price is 72 / 340 = ~21.18%.
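The calculation above can be sketched as a small lookup over the Strong rows of table C. This only mirrors the arithmetic described on this page and is not presented as how any Swarm client computes cost; the names `STRONG_PARITIES`, `parities_for_set`, and `upload_overhead` are hypothetical, and the lookup list is transcribed from table C as (largest chunk count in the range, parities).

```python
from typing import List, Tuple

# Strong level, from table C: (largest number of data chunks in the range, parities).
STRONG_PARITIES: List[Tuple[int, int]] = [
    (1, 4), (3, 5), (6, 6), (10, 7), (15, 8), (20, 9), (26, 10), (32, 11),
    (39, 12), (46, 13), (53, 14), (61, 15), (69, 16), (77, 17), (86, 18),
    (95, 19), (104, 20), (107, 21),
]


def parities_for_set(data_chunks: int, table: List[Tuple[int, int]]) -> int:
    """Parities needed for one set of at most m data chunks."""
    if data_chunks < 1:
        raise ValueError("a set must contain at least one data chunk")
    for max_chunks, parities in table:
        if data_chunks <= max_chunks:
            return parities
    raise ValueError("set is larger than the maximum for this level")


def upload_overhead(total_chunks: int, table: List[Tuple[int, int]]) -> Tuple[int, float]:
    """Return (total parities, percent overhead) for an upload of total_chunks."""
    if total_chunks < 1:
        raise ValueError("an upload must contain at least one chunk")
    max_data, max_parities = table[-1]          # m and k for a full set
    full_sets, remainder = divmod(total_chunks, max_data)
    parities = full_sets * max_parities
    if remainder:
        parities += parities_for_set(remainder, table)
    return parities, 100.0 * parities / total_chunks


if __name__ == "__main__":
    print(upload_overhead(321, STRONG_PARITIES))  # exact multiple of 107
    print(upload_overhead(340, STRONG_PARITIES))  # three full sets plus 19 chunks
```

The same functions work for the other levels if their rows from table C are transcribed into a list of the same shape.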