When a node uploads data to Swarm, it 'attaches' postage stamps to each chunk of data. Postage stamps are purchased in batches rather than one by one. The value assigned to a stamp indicates how much it is worth to persist the associated data on Swarm, which nodes use to prioritize which chunks to remove from their reserve first.
The value of a postage stamp decreases over time, as if storage rent were regularly deducted from the batch balance. We say that a stamp expires when the batch it is issued from has insufficient balance. A chunk with an expired stamp cannot be used in the proof of entitlement that storer nodes must submit in order to be compensated for the storage space they contribute. Such expired chunks are therefore evicted from nodes' reserves and moved into the cache, where their continued persistence depends on their popularity.
Postage stamps are issued in batches with a certain number of storage slots partitioned into equally sized address space buckets. Each bucket is responsible for storing chunks that fall within a certain range of the address space. When uploaded, files are split into 4 KB chunks, each chunk is assigned a unique address, and each chunk is then assigned to the bucket in which its address falls. Falling into the same range means a match on the $n$ leading bits of the chunk address and the bucket. This restriction is necessary to ensure (incentivise) uniform utilisation of the address space, and it is fair since the distribution of content addresses is uniform as well. The uniformity depth (also called the bucket depth) is the number of leading bits that determines bucket membership. The uniformity depth is set to 16, so there are a total of $2^{16} = 65536$ buckets.
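A minimal sketch of bucket membership, assuming a chunk address is a 32-byte value and the bucket depth is 16. The function name `bucket_index` is hypothetical, and the example address is made up rather than a real Swarm chunk hash (Swarm derives chunk addresses with its own BMT hash, which is not implemented here).

```python
BUCKET_DEPTH = 16  # uniformity (bucket) depth: leading bits that select a bucket
NUM_BUCKETS = 2 ** BUCKET_DEPTH  # 65536 buckets


def bucket_index(chunk_address: bytes, bucket_depth: int = BUCKET_DEPTH) -> int:
    """Return the bucket a chunk falls into: the integer value of its
    first `bucket_depth` bits (hypothetical helper, for illustration)."""
    if len(chunk_address) != 32:
        raise ValueError("expected a 32-byte chunk address")
    as_int = int.from_bytes(chunk_address, "big")
    # Shift away everything except the leading bucket_depth bits.
    return as_int >> (len(chunk_address) * 8 - bucket_depth)


example = bytes.fromhex("abcd" + "00" * 30)  # made-up address starting 0xabcd
print(bucket_index(example))  # 43981, i.e. 0xabcd
```

Because content addresses are uniformly distributed, each of the 65536 possible bucket indices is equally likely for a random chunk.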
Each bucket has a certain number of slots which can be "filled" by chunks (in other words, for each bucket, a certain number of chunks can be stamped). Once all the slots of any single bucket are filled, an immutable postage batch is fully utilised and can no longer be used to upload additional data.
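Together with the bucket depth, the batch depth determines how many chunks are allowed in each bucket. The number of chunks allowed in each bucket is calculated like so:

$$2^{(batchDepth - bucketDepth)}$$

So with a batch depth of 24 and a bucket depth of 16:

$$2^{(24 - 16)} = 2^{8} = 256 \text{ chunks per bucket}$$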
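A minimal Python sketch of the per-bucket slot count, $2^{(batchDepth - bucketDepth)}$, with the bucket depth fixed at 16. The helper name `slots_per_bucket` is hypothetical.

```python
BUCKET_DEPTH = 16


def slots_per_bucket(batch_depth: int, bucket_depth: int = BUCKET_DEPTH) -> int:
    """Number of chunks each bucket can hold: 2^(batchDepth - bucketDepth)."""
    if batch_depth < bucket_depth:
        raise ValueError("batch depth must not be below the bucket depth")
    return 2 ** (batch_depth - bucket_depth)


print(slots_per_bucket(24))  # 256
print(slots_per_bucket(18))  # 4
```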
Batch Depth and Batch Amount
Each batch of stamps has two key parameters, batch depth and amount, which are recorded on Gnosis Chain at issuance. Note that these "depths" do not refer to the depth terms used to describe topology, which are outlined in the glossary.
Batch depth determines how much data can be stored by a batch. The number of chunks which can be stored (stamped) by a batch is equal to $2^{batchDepth}$.

For a batch with a batch depth of 24, a maximum of $2^{24} = 16,777,216$ chunks can be stamped.

Since we know that one chunk can store 4 KB ($2^{12} = 4096$ bytes) of data, we can calculate the theoretical maximum amount of data which can be stored by a batch from the batch depth:

$$\text{Theoretical maximum batch volume} = 2^{batchDepth} \times 4\text{ KB}$$
However, due to the way postage stamp batches are utilised, batches will become fully utilised before stamping the theoretical maximum number of chunks. Therefore, when deciding which batch depth to use, it is important to consider the effective amount of data that can be stored by a batch, not the theoretical maximum. The effective rate of utilisation increases along with the batch depth. See the section on stamp batch utilisation below for more information.
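A sketch of the two theoretical formulas, $2^{batchDepth}$ chunks and $2^{batchDepth} \times 4\text{ KB}$, assuming 4 KB means $2^{12} = 4096$ bytes and reporting volumes in decimal gigabytes, which appears to be how the effective utilisation table further down is computed (for example, depth 20 gives 4.29 GB). Function names are hypothetical.

```python
CHUNK_SIZE_BYTES = 2 ** 12  # 4 KB = 4096 bytes per chunk


def max_chunks(batch_depth: int) -> int:
    """Maximum number of chunks a batch can stamp: 2^batchDepth."""
    return 2 ** batch_depth


def theoretical_volume_bytes(batch_depth: int) -> int:
    """Theoretical maximum batch volume: 2^batchDepth x 4 KB."""
    return max_chunks(batch_depth) * CHUNK_SIZE_BYTES


for depth in (20, 24):
    gb = theoretical_volume_bytes(depth) / 1e9  # decimal gigabytes
    print(depth, max_chunks(depth), f"{gb:.2f} GB")
# 20 1048576 4.29 GB
# 24 16777216 68.72 GB
```

These are upper bounds only; the effective volume is lower, as explained in the utilisation sections.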
Batch Amount (& Batch Cost)
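The amount parameter is the quantity of xBZZ, denominated in PLUR (the smallest unit of xBZZ, where 1 xBZZ = $10^{16}$ PLUR), that is assigned per chunk in the batch. The total number of xBZZ that will be paid for the batch is calculated from this figure and the batch depth like so:

$$2^{batchDepth} \times amount$$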
The paid xBZZ forms the balance of the batch. This balance is then slowly depleted as time ticks on and blocks are mined on Gnosis Chain.
For example, with a batch depth of 20 and an amount of 1000000000 PLUR:

$$2^{20} \times 1000000000 = 1048576000000000 \text{ PLUR} = 0.1048576 \text{ xBZZ}$$
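A sketch of the batch cost calculation, $2^{batchDepth} \times amount$, using integer arithmetic for PLUR and `Decimal` for the conversion to xBZZ to avoid floating-point rounding. It assumes 1 xBZZ = $10^{16}$ PLUR; the function names are hypothetical.

```python
from decimal import Decimal

PLUR_PER_XBZZ = 10 ** 16  # xBZZ has 16 decimal places; PLUR is the smallest unit


def batch_cost_plur(batch_depth: int, amount_plur: int) -> int:
    """Total batch cost: 2^batchDepth x amount (amount is PLUR per chunk)."""
    return 2 ** batch_depth * amount_plur


def plur_to_xbzz(plur: int) -> Decimal:
    """Convert an integer PLUR value to an exact decimal xBZZ value."""
    return Decimal(plur) / Decimal(PLUR_PER_XBZZ)


cost = batch_cost_plur(20, 1_000_000_000)
print(cost)                # 1048576000000000
print(plur_to_xbzz(cost))  # 0.1048576
```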
Utilisation of an immutable batch is computed using a hash map of size $2^{bucketDepth}$, which is $2^{16}$ for all batches, so 65536 total entries. The keys of the hash map are 16-digit binary numbers from 0 to 65535, and each value is a counter.
As chunks are uploaded to Swarm, each chunk is assigned to a bucket based on the first 16 binary digits of the chunk's hash. The chunk is assigned to the bucket whose key matches the first 16 bits of its hash, and that bucket's counter is incremented by 1.
The batch is deemed "full" when ANY of these counters reaches a certain max value. The max value is computed from the batch depth as $2^{(batchDepth - bucketDepth)}$. For example, with a batch depth of 24, the max value is $2^{(24 - 16)}$, or 256. A bucket can be thought of as having a number of "slots" equal to this maximum value, and every time the bucket's counter is incremented, one of its slots gets filled.
Note that 18 is below the minimum batch depth, but is used in these examples to simplify the explanation of batch utilisation.
In the diagram below, the batch depth is 18, so there are $2^{(18 - 16)}$, or 4, slots for each bucket. The utilisation of a batch is simply the highest number of filled slots out of all 65536 entries or "buckets". In this batch, no bucket has yet had all 4 of its slots filled, so the batch is not yet fully utilised. The highest number of filled slots in any bucket is 2, so the stamp batch's utilisation is 2 out of 4.
As more chunks get uploaded and stamped, the bucket slots will begin to fill. As soon as the slots for any SINGLE bucket get filled, the entire batch is considered 100% utilised and can no longer be used to upload additional chunks.
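A minimal simulation of the immutable-batch bookkeeping described above: 65536 bucket counters (a list indexed by bucket key stands in for the hash map), with the batch counting as full once any counter reaches $2^{(batchDepth - bucketDepth)}$. The class and method names are hypothetical; this illustrates the rule, not Bee's actual implementation, and the addresses are made-up 32-byte values.

```python
BUCKET_DEPTH = 16


class ImmutableBatchSketch:
    """Tracks bucket counters for an immutable batch (illustrative only)."""

    def __init__(self, batch_depth: int):
        self.slots = 2 ** (batch_depth - BUCKET_DEPTH)
        self.counters = [0] * (2 ** BUCKET_DEPTH)  # one counter per bucket key
        self.max_filled = 0  # highest counter seen so far

    def utilisation(self) -> int:
        """Highest number of filled slots across all buckets."""
        return self.max_filled

    def is_full(self) -> bool:
        """Full as soon as any single bucket has all its slots filled."""
        return self.max_filled >= self.slots

    def stamp(self, chunk_address: bytes) -> int:
        if self.is_full():
            raise RuntimeError("batch is fully utilised")
        bucket = int.from_bytes(chunk_address[:2], "big")  # first 16 bits
        self.counters[bucket] += 1
        self.max_filled = max(self.max_filled, self.counters[bucket])
        return bucket


batch = ImmutableBatchSketch(batch_depth=18)  # 2^(18-16) = 4 slots per bucket
bucket0 = [bytes(2) + bytes([i]) * 30 for i in range(4)]  # all start 0x0000
batch.stamp(bucket0[0])
batch.stamp(bucket0[1])
batch.stamp(b"\x00\x01" + bytes(30))  # lands in bucket 1
print(batch.utilisation(), "of", batch.slots)  # 2 of 4
batch.stamp(bucket0[2])
batch.stamp(bucket0[3])
print(batch.is_full())  # True: bucket 0 has all 4 slots filled
```

Note that bucket 1 holds only one chunk when the batch becomes full; a single crowded bucket is enough to stop the whole batch.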
Mutable batches use the same hash map structure as immutable batches; however, their utilisation works very differently. In contrast with immutable batches, mutable batches are never considered fully utilised. Rather, at the point where an immutable batch would be considered fully utilised, a mutable batch can continue to stamp chunks. If any chunk's address lands in a bucket whose slots are already filled, then rather than the batch becoming fully utilised, that bucket's counter is reset and the new chunk replaces the oldest chunk in that bucket.
Therefore rather than speaking of the number of slots as determining the utilisation of a batch as with immutable batches, we can think of the slots as defining a limit to the amount of data which can be uploaded before old data starts to get overwritten.
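A companion sketch for a mutable batch, again with hypothetical names and made-up addresses. Each bucket is modelled as a fixed number of slots used as a ring buffer: when a bucket's counter reaches the slot count it is reset, and the next chunk overwrites the oldest chunk in that bucket. This is an illustration of the rule above, not Bee's actual code.

```python
BUCKET_DEPTH = 16


class MutableBatchSketch:
    """Mutable batch bookkeeping: full buckets wrap around (illustrative only)."""

    def __init__(self, batch_depth: int):
        self.slots = 2 ** (batch_depth - BUCKET_DEPTH)
        self.counters = {}  # bucket -> index of the next slot to fill
        self.buckets = {}   # bucket -> list of stamped chunk addresses

    def stamp(self, chunk_address: bytes):
        """Stamp a chunk; return the address it overwrote, or None."""
        bucket = int.from_bytes(chunk_address[:2], "big")  # first 16 bits
        slots = self.buckets.setdefault(bucket, [None] * self.slots)
        index = self.counters.get(bucket, 0)
        if index == self.slots:
            index = 0  # bucket full: reset the counter
        replaced = slots[index]  # the oldest chunk once the bucket has wrapped
        slots[index] = chunk_address
        self.counters[bucket] = index + 1
        return replaced


batch = MutableBatchSketch(batch_depth=18)  # 4 slots per bucket
chunks = [bytes(2) + bytes([i]) * 30 for i in range(5)]  # all in bucket 0
results = [batch.stamp(c) for c in chunks]
print(results[4] == chunks[0])  # True: the 5th chunk replaced the oldest
```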
Which Type of Batch to Use
Immutable batches are suitable for long-term storage of data, or for data which otherwise does not need to be changed and should never be overwritten, such as archival records, legal documents, family photos, etc.
Mutable batches are great for data which needs to be frequently updated and does not require a guarantee of immutability, for example a blog, personal or company websites, or an ephemeral messaging app.
The default batch type when unspecified is immutable. This can be modified through the Bee API by setting the immutable header with the /stamps POST endpoint to false.
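For illustration, a sketch that builds (but does not send) such a request with Python's standard library. It assumes a local Bee node on its default API port 1633 and the `POST /stamps/{amount}/{depth}` route from the Bee API reference; check both against your Bee version before use. The amount and depth values are arbitrary examples, and `mutable_batch_request` is a hypothetical helper.

```python
import urllib.request

BEE_API = "http://localhost:1633"  # assumed local Bee node, default API port


def mutable_batch_request(amount_plur: int, depth: int) -> urllib.request.Request:
    """Build a POST request to buy a mutable batch (immutable header = false)."""
    url = f"{BEE_API}/stamps/{amount_plur}/{depth}"
    return urllib.request.Request(
        url, method="POST", headers={"immutable": "false"}
    )


req = mutable_batch_request(1_000_000_000, 20)
print(req.get_method(), req.full_url)
print(req.get_header("Immutable"))
# To actually send it to a running node: urllib.request.urlopen(req)
```

HTTP header names are case-insensitive; `urllib` stores this one as `Immutable`.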
Implications for Swarm Users
Due to the nature of batch utilisation described above, batches are often fully utilised before reaching their theoretical maximum storage amount. However, as the batch depth increases, the chance of a postage batch becoming fully utilised early decreases. At batch depth 24, there is a 0.1% chance that a batch will become fully utilised (or, for a mutable batch, start replacing old chunks) before reaching 64.33% of its theoretical maximum.
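Let's look at an example to make this clearer. Using the method of calculating the theoretical maximum storage amount outlined above, we can see that for a batch depth of 24, the theoretical maximum amount which can be stored is 68.72 GB:

$$2^{24} \times 4096 \text{ bytes} = 68{,}719{,}476{,}736 \text{ bytes} \approx 68.72 \text{ GB}$$

Therefore, we should use the effective utilisation rate of 64.33% for the stamp batch:

$$68.72 \text{ GB} \times 0.6433 \approx 44.21 \text{ GB}$$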
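The same calculation as a Python sketch: effective volume is the theoretical maximum ($2^{batchDepth} \times 4096$ bytes, in decimal GB) multiplied by the utilisation rate. The rates here are copied from the preliminary effective utilisation table rather than derived, and `effective_volume_gb` is a hypothetical name.

```python
CHUNK_SIZE_BYTES = 2 ** 12  # 4096 bytes per chunk

# Utilisation rates at a 0.1% failure rate, copied from the effective
# utilisation table (preliminary figures, not computed here).
UTILISATION_RATE = {22: 0.2867, 23: 0.4956, 24: 0.6433, 25: 0.7478}


def effective_volume_gb(batch_depth: int) -> float:
    """Effective volume = theoretical maximum x utilisation rate (decimal GB)."""
    theoretical_gb = 2 ** batch_depth * CHUNK_SIZE_BYTES / 1e9
    return theoretical_gb * UTILISATION_RATE[batch_depth]


for depth in sorted(UTILISATION_RATE):
    print(depth, f"{effective_volume_gb(depth):.2f} GB")
```

The printed values should match the "Effective Volume" column for those depths (4.93, 17.03, 44.21 and 102.78 GB).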
The details of how the effective rates of utilisation are calculated will be published soon.
Effective Utilisation Table
When a user buys a batch of stamps, they may naively assume that they will be able to upload an amount of data equal to the batch's full theoretical capacity. In practice this assumption is incorrect, so it is essential that Swarm users understand the relationship between batch depth and the theoretical and effective volumes of a batch.
The table below shows the effective volume for each batch depth from 20 to 41. The "utilisation rate" is the rate of utilisation a stamp batch can reach with a 0.1% failure rate (that is, there is a 1/1000 chance the batch will become fully utilised before reaching that utilisation rate). The "effective volume" figure shows the amount of data which can actually be stored at the effective rate. It should be used as the de facto maximum amount of data a batch can store before it either becomes fully utilised (immutable batches) or starts overwriting older chunks (mutable batches).
| Batch Depth | Utilisation Rate | Theoretical Max Volume | Effective Volume |
|---|---|---|---|
| 20 | 0.00% | 4.29 GB | 0.00 B |
| 21 | 0.00% | 8.59 GB | 0.00 B |
| 22 | 28.67% | 17.18 GB | 4.93 GB |
| 23 | 49.56% | 34.36 GB | 17.03 GB |
| 24 | 64.33% | 68.72 GB | 44.21 GB |
| 25 | 74.78% | 137.44 GB | 102.78 GB |
| 26 | 82.17% | 274.88 GB | 225.86 GB |
| 27 | 87.39% | 549.76 GB | 480.43 GB |
| 28 | 91.08% | 1.10 TB | 1.00 TB |
| 29 | 93.69% | 2.20 TB | 2.06 TB |
| 30 | 95.54% | 4.40 TB | 4.20 TB |
| 31 | 96.85% | 8.80 TB | 8.52 TB |
| 32 | 97.77% | 17.59 TB | 17.20 TB |
| 33 | 98.42% | 35.18 TB | 34.63 TB |
| 34 | 98.89% | 70.37 TB | 69.58 TB |
| 35 | 99.21% | 140.74 TB | 139.63 TB |
| 36 | 99.44% | 281.47 TB | 279.91 TB |
| 37 | 99.61% | 562.95 TB | 560.73 TB |
| 38 | 99.72% | 1.13 PB | 1.12 PB |
| 39 | 99.80% | 2.25 PB | 2.25 PB |
| 40 | 99.86% | 4.50 PB | 4.50 PB |
| 41 | 99.90% | 9.01 PB | 9.00 PB |
This table is based on preliminary calculations and may be subject to change.
Nodes' storage is actually defined as a number of chunks with a size of 4 KB ($2^{12}$ bytes) each, but in fact some single-owner chunks (SOCs) can be a few bytes longer, and some chunks can be smaller, so the conversion is not precise. Furthermore, due to the way Swarm represents files in a Merkle tree, the intermediate chunks are additional overhead which must also be accounted for.
Additionally, when a node stores chunks it uses additional indexes; therefore, the disk space a maximally filled reserve would demand cannot be calculated with perfect accuracy.