For now, the only class in this repository is Base2n.
Base2n is for binary-to-text conversion with arbitrary encoding schemes that represent binary data in a base 2n notation. It can handle non-standard variants of many standard encoding schemes such as Base64 and Base32. Many binary-to-text encoding schemes use a fixed number of bits of binary data to generate each encoded character. Such schemes generalize to a single algorithm, implemented here.
Binary-to-text encoding is usually used to represent data in a notation that is safe for transport over text-based protocols, and there are several other practical uses. See the examples below.
With Base2n, you define your encoding scheme parametrically. Let's instantiate a Base32 encoder:
// RFC 4648 base32 alphabet; case-insensitive
$base32 = new Base2n(5, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567', FALSE, TRUE, TRUE);
$encoded = $base32->encode('encode this');
// MVXGG33EMUQHI2DJOM======
integer $bitsPerCharacter
Required. The number of bits to use for each encoded character; 1–8. The most practical range is 1–6. The encoding's radix is a power of 2:2^$bitsPerCharacter
.- base-2, binary
- base-4, quaternary
- base-8, octal
- base-16, hexadecimal
- base-32
- base-64
- base-128
- base-256
-
string $chars
This string specifies the base alphabet. Must be2^$bitsPerCharacter
long. Default:0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_
-
boolean $caseSensitive
To decode in a case-sensitive manner. Default:FALSE
-
boolean $rightPadFinalBits
How to encode the last character when the bits remaining are fewer than$bitsPerCharacter
. WhenTRUE
, the bits to encode are placed in the most significant position of the final group of bits, with the lower bits set to0
. WhenFALSE
, the final bits are placed in the least significant position. For RFC 4648 encodings,$rightPadFinalBits
should beTRUE
. Default:FALSE
-
boolean $padFinalGroup
It's common to encode characters in groups. For example, Base64 (which is based on 6 bits per character) converts 3 raw bytes into 4 encoded characters. If insufficient bytes remain at the end, the final group will be padded with=
to complete a group of 4 characters, and the encoded length is always a multiple of 4. Although the information provided by the padding is redundant, some programs rely on it for decoding; Base2n does not. Default:FALSE
-
string $padCharacter
When$padFinalGroup
isTRUE
, this is the pad character used. Default:=
string $rawString
Required. The data to be encoded.
string $encodedString
Required. The string to be decoded.boolean $strict
WhenTRUE
,NULL
will be returned if$encodedString
contains an undecodable character. WhenFALSE
, unknown characters are simply ignored. Default:FALSE
PHP does not provide any Base32 encoding functions. By setting $bitsPerCharacter
to 5 and specifying your desired alphabet in $chars
, you can handle any variant of Base32:
// RFC 4648 base32 alphabet; case-insensitive
$base32 = new Base2n(5, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567', FALSE, TRUE, TRUE);
$encoded = $base32->encode('encode this');
// MVXGG33EMUQHI2DJOM======
// RFC 4648 base32hex alphabet
$base32hex = new Base2n(5, '0123456789ABCDEFGHIJKLMNOPQRSTUV', FALSE, TRUE, TRUE);
$encoded = $base32hex->encode('encode this');
// CLN66RR4CKG78Q39EC======
Octal notation:
$octal = new Base2n(3);
$encoded = $octal->encode('encode this');
// 312671433366214510072150322711
A convenient way to go back and forth between binary notation and its real binary representation:
$binary = new Base2n(1);
$encoded = $binary->encode('encode this');
// 0110010101101110011000110110111101100100011001010010000001110100011010000110100101110011
$decoded = $binary->decode($encoded);
// encode this
PHP uses a proprietary binary-to-text encoding scheme to generate session identifiers from random hash digests. The most efficient way to store these session IDs in a database is to decode them back to their raw hash digests. PHP's encoding scheme is configured with the session.hash_bits_per_character
php.ini setting. The decoded size depends on the hash function, set with session.hash_function
in php.ini.
// session.hash_function = 0
// session.hash_bits_per_character = 5
// 128-bit session ID
$sessionId = 'q3c8n4vqpq11i0vr6ucmafg1h3';
// Decodes to 16 bytes
$phpBase32 = new Base2n(5, '0123456789abcdefghijklmnopqrstuv');
$rawSessionId = $phpBase32->decode($sessionId);
// session.hash_function = 1
// session.hash_bits_per_character = 6
// 160-bit session ID
$sessionId = '7Hf91mVc,q-9W1VndNNh3evVN83';
// Decodes to 20 bytes
$phpBase64 = new Base2n(6, '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-,');
$rawSessionId = $phpBase64->decode($sessionId);
Generate random security tokens:
$tokenEncoder = new Base2n(6, '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-,');
$binaryToken = openssl_random_pseudo_bytes(32); // PHP >= 5.3
$token = $tokenEncoder->encode($binaryToken);
// Example: U6M132v9FG-AHhBVaQWOg1gjyUi1IogNxuen0i3u3ep
The rest of these examples are probably more fun than they are practical.
We can encode arbitrary data with a 7-bit encoding. (Note that this is not the same as the 7bit MIME content-transfer-encoding.)
// This uses all 7-bit ASCII characters
$base128chars = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F"
. "\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F"
. "\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2A\x2B\x2C\x2D\x2E\x2F"
. "\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3A\x3B\x3C\x3D\x3E\x3F"
. "\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4A\x4B\x4C\x4D\x4E\x4F"
. "\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5A\x5B\x5C\x5D\x5E\x5F"
. "\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6A\x6B\x6C\x6D\x6E\x6F"
. "\x70\x71\x72\x73\x74\x75\x76\x77\x78\x69\x7A\x7B\x7C\x7D\x7E\x7F";
$base128 = new Base2n(7, $base128chars);
$encoded = $base128->encode('encode this');
The following encoding guarantees that the most significant bit is set for every byte:
// "High" base-128 encoding
$high128chars = "\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8A\x8B\x8C\x8D\x8E\x8F"
. "\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9A\x9B\x9C\x9D\x9E\x9F"
. "\xA0\xA1\xA2\xA3\xA4\xA5\xA6\xA7\xA8\xA9\xAA\xAB\xAC\xAD\xAE\xAF"
. "\xB0\xB1\xB2\xB3\xB4\xB5\xB6\xB7\xB8\xB9\xBA\xBB\xBC\xBD\xBE\xBF"
. "\xC0\xC1\xC2\xC3\xC4\xC5\xC6\xC7\xC8\xC9\xCA\xCB\xCC\xCD\xCE\xCF"
. "\xD0\xD1\xD2\xD3\xD4\xD5\xD6\xD7\xD8\xD9\xDA\xDB\xDC\xDD\xDE\xDF"
. "\xE0\xE1\xE2\xE3\xE4\xE5\xE6\xE7\xE8\xE9\xEA\xEB\xEC\xED\xEE\xEF"
. "\xF0\xF1\xF2\xF3\xF4\xF5\xF6\xF7\xF8\xF9\xFA\xFB\xFC\xFD\xFE\xFF";
$high128 = new Base2n(7, $high128chars);
$encoded = $high128->encode('encode this');
Let's create an encoding using exclusively non-printable control characters!
// Base-32 non-printable character encoding
$noPrintChars = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0A\x0B\x0C\x0D\x0E\x0F"
. "\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F";
$nonPrintable32 = new Base2n(5, $noPrintChars);
$encoded = $nonPrintable32->encode('encode this');
Why not encode data using only whitespace? Here's a base-4 encoding using space, tab, new line, and carriage return:
// Base-4 whitespace encoding
$whitespaceChars = " \t\n\r";
$whitespace = new Base2n(2, $whitespaceChars);
$encoded = $whitespace->encode('encode this');
// "\t\n\t\t\t\n\r\n\t\n \r\t\n\r\r\t\n\t \t\n\t\t \n \t\r\t \t\n\n \t\n\n\t\t\r \r"
$decoded = $whitespace->decode(
"\t\n\t\t\t\n\r\n\t\n \r\t\n\r\r\t\n\t \t\n\t\t \n \t\r\t \t\n\n \t\n\n\t\t\r \r"
);
// encode this
Base2n is not slow, but it will never outperform an encoding function implemented in C. When one exists, use it instead.
PHP provides the base64_encode()
and base64_decode()
functions, and you should always use them for standard Base64. When you need to use a modified alphabet, you can translate the encoded output with strtr()
or str_replace()
.
A common variant of Base64 is modified for URLs and filenames, where +
and /
are replaced with -
and _
, and the =
padding is omitted. It's better to handle this variant with native PHP functions:
// RFC 4648 base64url with Base2n...
$base64url = new Base2n(6, 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_', TRUE, TRUE, FALSE);
$encoded = $base64url->encode("encode this \xBF\xC2\xBF");
// ZW5jb2RlIHRoaXMgv8K_
// RFC 4648 base64url with native functions...
$encoded = str_replace(array('+', '/', '='), array('-', '_', ''), base64_encode("encode this \xBF\xC2\xBF"));
// ZW5jb2RlIHRoaXMgv8K_
Native functions get slightly more cumbersome when every position in the alphabet has changed, as seen in this example of decoding a Bcrypt hash:
// Decode the salt and digest from a Bcrypt hash
$hash = '$2y$14$i5btSOiulHhaPHPbgNUGdObga/GC.AVG/y5HHY1ra7L0C9dpCaw8u';
$encodedSalt = substr($hash, 7, 22);
$encodedDigest = substr($hash, 29, 31);
// Using Base2n...
$bcrypt64 = new Base2n(6, './ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789', TRUE, TRUE);
$rawSalt = $bcrypt64->decode($encodedSalt); // 16 bytes
$rawDigest = $bcrypt64->decode($encodedDigest); // 23 bytes
// Using native functions...
$bcrypt64alphabet = './ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
$base64alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
$rawSalt = base64_decode(strtr($encodedSalt, $bcrypt64alphabet, $base64alphabet)); // 16 bytes
$rawDigest = base64_decode(strtr($encodedDigest, $bcrypt64alphabet, $base64alphabet)); // 23 bytes
You can encode and decode hexadecimal with bin2hex()
and pack()
:
// Hexadecimal with Base2n...
$hexadecimal = new Base2n(4);
$encoded = $hexadecimal->encode('encode this'); // 656e636f64652074686973
$decoded = $hexadecimal->decode($encoded); // encode this
// It's better to use native functions...
$encoded = bin2hex('encode this'); // 656e636f64652074686973
$decoded = pack('H*', $encoded); // encode this
// As of PHP 5.4 you can use hex2bin() instead of pack()