Data Types
All of the data types and JSON structures used are described in self-documenting JSON-LD.
Preparation
Data passed in
The following data is what we pass in:
paste_password: UTF-8 string
paste_data: UTF-8 string containing a JSON structure with the different components of the paste
paste_data_json = {
"paste": "text content of the paste",
"attachment": "[data URI as per RFC 2397]",
"attachment_name": "filename.ext",
"children": [
"paste_id#key",
"https://example.com/"
]
}
paste_meta: UTF-8 string containing a JSON structure with the meta data of the paste
paste_meta_json = [
[
base64(cipher_iv),
base64(kdf_salt),
kdf_iterations,
kdf_keysize,
cipher_tag_size,
cipher_algo,
cipher_mode,
compression type - "zlib" or "none" (the rawdeflate library used before PrivateBin version 1.3 is not quite zlib compatible)
],
format of the paste - "plaintext" or "syntaxhighlighting" or "markdown",
open-discussion flag - 1 or 0,
burn-after-reading flag - 1 or 0
]
# comments have a simpler meta format:
comment_meta_json = [
base64(cipher_iv),
base64(kdf_salt),
kdf_iterations,
kdf_keysize,
cipher_tag_size,
cipher_algo,
cipher_mode,
compression type - "zlib" or "none"
]
The children may contain external URLs (PrivateBin pastes or other websites) or just paste IDs followed by the key. Neither form includes a password, so users can link to a new paste without granting access to it, provided they change the password during the clone.
Note that ECMAScript strings are UTF-16 encoded (this includes the contents of form fields retrieved via the DOM on an otherwise UTF-8 encoded web page) and need to be converted to UTF-8 first.
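As a rough illustration of the structures above, the following Python sketch builds paste_data and paste_meta as UTF-8 encoded JSON. The helper name build_paste_inputs and the sample values are hypothetical, and the attachment and children fields are left out for brevity; only the field names and their order are taken from the description above.

```python
import base64
import json
import os


def build_paste_inputs(text, cipher_iv, kdf_salt, formatter="plaintext",
                       open_discussion=0, burn_after_reading=0):
    """Return (paste_data, paste_meta) as UTF-8 encoded JSON bytes."""
    paste_data_json = {"paste": text}
    paste_meta_json = [
        [
            base64.b64encode(cipher_iv).decode("ascii"),
            base64.b64encode(kdf_salt).decode("ascii"),
            100000,  # kdf_iterations
            256,     # kdf_keysize in bits
            128,     # cipher_tag_size in bits
            "aes",   # cipher_algo
            "gcm",   # cipher_mode
            "zlib",  # compression type
        ],
        formatter,
        open_discussion,
        burn_after_reading,
    ]
    # ensure_ascii=False keeps non-ASCII characters as real UTF-8 bytes
    # instead of \uXXXX escapes.
    paste_data = json.dumps(paste_data_json, ensure_ascii=False).encode("utf-8")
    paste_meta = json.dumps(paste_meta_json).encode("utf-8")
    return paste_data, paste_meta


paste_data, paste_meta = build_paste_inputs("héllo", os.urandom(16), os.urandom(8))
```

Python strings are encoded explicitly here, which corresponds to the UTF-16 to UTF-8 conversion step that a browser client has to perform.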
Process data
If paste_password is an empty string:
paste_passphrase = random(32) # 32 bytes
If a paste_password has been specified:
paste_passphrase = random(32) + paste_password
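The two cases above can be sketched in Python as a single function. The name make_passphrase is hypothetical, and the sketch assumes that the password is appended as UTF-8 bytes to the 32 random bytes.

```python
import secrets


def make_passphrase(paste_password: str = "") -> bytes:
    """Return 32 random bytes, followed by the UTF-8 password if one is given."""
    random_key = secrets.token_bytes(32)  # random(32)
    # Appending an empty password is a no-op, so both cases share one path.
    return random_key + paste_password.encode("utf-8")


without_password = make_passphrase()
with_password = make_passphrase("correct horse")
```

Using the secrets module rather than random is a design choice of this sketch: the random part acts as key material, so it should come from a cryptographically secure source.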
Processing of the paste_data, if compression is enabled (the default):
paste_blob = zlib.compress(paste_data)
Because of a bug in the deflate implementation used by PrivateBin for format version 1, a standards-conforming deflate algorithm can't be used to process pastes in that format.
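A minimal sketch of the compression step for the "zlib" and "none" compression types, using Python's zlib module as in the pseudocode above. The function names are hypothetical, and whether a given client expects the zlib wrapper or a raw deflate stream is not specified here, so treat the exact stream format as an assumption.

```python
import zlib


def compress_paste(paste_data: bytes, compression: str = "zlib") -> bytes:
    """Return paste_blob for the given compression type."""
    if compression == "zlib":
        return zlib.compress(paste_data)
    if compression == "none":
        return paste_data
    raise ValueError(f"unsupported compression type: {compression!r}")


def decompress_paste(paste_blob: bytes, compression: str = "zlib") -> bytes:
    """Reverse compress_paste for the given compression type."""
    if compression == "zlib":
        return zlib.decompress(paste_blob)
    if compression == "none":
        return paste_blob
    raise ValueError(f"unsupported compression type: {compression!r}")


sample = b'{"paste": "text content of the paste"}'
paste_blob = compress_paste(sample)
```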
Key derivation (PBKDF2)
Since passwords and keys are usually too short to be usable for encryption, it is common practice to use salted key derivation to turn such low entropy input into the actual key to use during en/decryption.
kdf_salt = random(8) # 8 bytes
kdf_iterations = 100000 # was 10000 before PrivateBin version 1.3
kdf_keysize = 256 # bits of resulting kdf_key
kdf_key = PBKDF2_HMAC_SHA256(kdf_iterations, kdf_keysize, kdf_salt, paste_passphrase)
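The derivation can be sketched with Python's standard library. The function name derive_key is hypothetical, and the sketch assumes that the passphrase bytes (random key plus optional password) are the PBKDF2 password input.

```python
import hashlib
import os


def derive_key(paste_passphrase: bytes, kdf_salt: bytes,
               kdf_iterations: int = 100000, kdf_keysize: int = 256) -> bytes:
    """Derive kdf_key with PBKDF2-HMAC-SHA256; kdf_keysize is given in bits."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        paste_passphrase,
        kdf_salt,
        kdf_iterations,
        dklen=kdf_keysize // 8,  # hashlib expects the key length in bytes
    )


kdf_salt = os.urandom(8)  # 8 bytes
kdf_key = derive_key(os.urandom(32), kdf_salt)
```

Note the unit conversion: kdf_keysize is expressed in bits in the meta data, while hashlib.pbkdf2_hmac takes the derived key length in bytes.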
Encryption
cipher_algo = "aes"
cipher_mode = "gcm" # was "ccm" before PrivateBin version 1.0
cipher_iv = random(16) # 128 bit
cipher_tag_size = 128
cipher_text = cipher(AES(kdf_key), GCM(cipher_iv, paste_meta), paste_blob)
Format version 2 (PrivateBin >= 1.3)
The main changes in this over version 1 are:
- the use of a standards-conforming deflate implementation #193 and the option to turn compression off #38
- allow paste versioning, by including an encrypted link to another paste #255
- increase the iterations in the used KDF to at least 10000 #350
- proper use of adata for authenticating the meta data. Clients can be sure the server didn't change the static parts created with the paste. The dynamic parts of the meta data are stored separately.
Paste format:
{
"v": 2,
"adata": paste_meta,
"ct": base64(cipher_text),
"meta": {
"expire": "5min", # generated client side on paste creation, not returned by server
"created": unix_timestamp_created, # generated server side, only returned for comments but not pastes
"time_to_live": [seconds], # generated server side based on creation minus expiration timestamps, only returned for pastes
"icon": [data URL] # generated on the server for every comment, returned only for comments
}
}
The paste is JSON encoded and as such the order of properties doesn't matter. The order of list elements in arrays (i.e. the children or comments) is important and needs to be preserved.
The "meta" block is mostly filled in by the server on requests. When creating a paste or comment, it is not present for comments and contains only the "expire" value for pastes. If "expire" is missing, the paste will be created with the server's configured default expiration setting. The server validates the format and will refuse to store invalid formats if detected.
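To make the client side of the format concrete, here is a small Python sketch that assembles the structure sent on creation. The function name build_paste_envelope is hypothetical, and the sketch assumes that adata is passed in as the already parsed meta structure and that cipher_text is the raw output of the encryption step.

```python
import base64
import json


def build_paste_envelope(adata, cipher_text: bytes, expire=None) -> str:
    """Return the version 2 JSON document for a new paste or comment."""
    envelope = {
        "v": 2,
        "adata": adata,
        "ct": base64.b64encode(cipher_text).decode("ascii"),
    }
    # Comments carry no "meta" block on creation; pastes carry only "expire".
    if expire is not None:
        envelope["meta"] = {"expire": expire}
    return json.dumps(envelope)


sample_adata = [["aXY=", "c2FsdA==", 100000, 256, 128, "aes", "gcm", "zlib"],
                "plaintext", 0, 0]
paste_json = build_paste_envelope(sample_adata, b"\x00\x01\x02", expire="5min")
comment_json = build_paste_envelope(sample_adata[0], b"\x03\x04")
```

Lists are used for adata because the order of array elements has to be preserved, while the property order of the surrounding object does not matter.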
The meta data in "adata" and "meta" isn't encrypted for the following reasons:
created
- Needed for comments, as these need to be sorted by date on the server side to allow for (TBD) pagination. Could be useful for sorting pastes if we ever offer an administration interface. Not really a secret, as the server knows this anyway.
expire
- Required server side to handle expiration of pastes. When responding, the server calculates the time_to_live based on this.
time_to_live
- Calculated by the server based on the expire and created values to check if the paste has expired and needs to be deleted. Since the server has it anyway, it is returned to the client so it doesn't need to do the same calculation, reducing the risk of incorrect display if the client's clock isn't set correctly.
formatter
- Required by the server to check if it supports its display (configurable option).
burnafterreading
- Required to allow for deletion after first access.
opendiscussion
- Required server side to know if comments are accepted for a given paste or not. When responding, it is used to avoid searching for non-existing comments when they are disabled in the paste.
Format version 1 (PrivateBin <= 1.2.1)
The main difference to version 2 is the use of the RawDeflate library that isn't quite compliant with the deflate standard and with some inputs creates messages that can't be decompressed even by itself.
The key derivation also differs. If paste_password is an empty string:
paste_passphrase = base64(random(32)) # 32 bytes
If a paste_password has been specified:
paste_passphrase = base64(random(32)) + hex(sha256(paste_password))
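The version 1 passphrase construction can be sketched as follows. The function name make_v1_passphrase is hypothetical, and the sketch assumes standard base64 and a lowercase hexadecimal SHA-256 digest of the UTF-8 encoded password.

```python
import base64
import hashlib
import os


def make_v1_passphrase(paste_password: str = "") -> str:
    """Return base64(random(32)), followed by hex(sha256(password)) if given."""
    passphrase = base64.b64encode(os.urandom(32)).decode("ascii")
    if paste_password:
        digest = hashlib.sha256(paste_password.encode("utf-8")).hexdigest()
        passphrase += digest
    return passphrase


v1_without_password = make_v1_passphrase()
v1_with_password = make_v1_passphrase("secret")
```

Unlike in format version 2, the result is a text string rather than raw bytes, because both parts are encoded before they are concatenated.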
The paste_data is purely the paste contents, not a JSON structure:
paste_data: UTF-8 text
The meta data is not authenticated via the adata property; instead, it is part of the general meta data.
Before PrivateBin version 1.0 the cipher_mode was "ccm", from version 1.0 onwards it is "gcm". Both versions can be read, even by older PrivateBin instances.
This format uses 1000 iterations for key derivation when creating new messages, but it can read messages with a higher iteration count.
kdf_iterations = 1000
cipher_data = {"iv": cipher_iv,
"v": 1,
"iter": kdf_iterations,
"ks": kdf_keysize,
"ts": cipher_tag_size,
"mode": cipher_mode,
"adata": cipher_associated_data,
"cipher": cipher_algo,
"salt": kdf_salt,
"ct": cipher_text}
Legacy Format (ZeroBin)
This is nearly identical to format version 1, but uses Base64.js version 1.7 which produces non-standard base64 encoding due to a faulty implementation. Pastes encoded in this format can't be read without enabling the legacy mode in PrivateBin.