Making the Most of Bit Arrays

18 December, 2025

Gleam has a special piece of syntax that most other languages don't: Bit arrays. Taken from Erlang, bit array syntax allows the constructing and pattern matching on binary data. Bit arrays are extremely powerful, but unfortunately the documentation is a little sparse. It lists all the possible options you can use, as well as linking to the Erlang documentation for further reading, but the syntax isn't exactly the same as on Erlang, so there's some ambiguity as to how it exactly works. To make it easier, I wanted to write a comprehensive guide, to make it as easy as possible to understand how they work.

The basics
Segment types
Segment size
Endianness
Pattern matching
JavaScript support
Example

The basics

Bit arrays are delimited by double angle brackets (<< and >>), and contain zero or more segments, separated by commas. A segment is a value which is encoded somehow as a sequence of bits. They have no actual separation other than syntactically, they are just a way of building up a bit array out of various different parts.

A segment consists of a value, followed by a series of options using the syntax value:option1-option2-option3.

There are several different data types that can appear as the value of a bit array segment, and each has a slightly different set of defaults, as well as options that can be used to modify how it is encoded.

The default assumed type is Int, and if you want a segment that is a non-integer, you need to specify that by using the type-specific option, unless you are using a literal, in which case it is inferred automatically.

Segment types

As mentioned above, the default segment type is Int. By default, integer segments are encoded as an 8-bit signed integer, although this can be modified using various options, which will be mentioned later.

The syntax for printing bit arrays with the echo keyword uses 8-bit unsigned integer segments to represent the structure of the bit array, so that is what I will be using in the rest of this article to show the encoding of various bit arrays.

echo <<1, 2, -3>>
// <<1, 2, 253>>

Bit array syntax also allows for Float segments. If you are not using a literal float value, the float option is required for the program to type-check. By default, floats are encoded as 64-bit IEEE 754 floats, although the size can be changed to either 32 or 16 bit.

echo <<3.14>>
// <<64, 9, 30, 184, 81, 235, 133, 31>>

let some_float = 1.0
echo <<some_float>>
// Error: Expected type Int, found type Float

echo <<some_float:float>>
// <<63, 240, 0, 0, 0, 0, 0, 0>>

Strings are another data type that can be used as a bit array segment. By default, strings are encoded in UTF-8, although this can be changed using the utf16 and utf32 options. The utf8 option can also be used, when the value is not a literal.

echo <<"Hello 🌍">>
// <<72, 101, 108, 108, 111, 32, 240, 159, 140, 141>>

echo <<"Hello 🌍":utf16>>
// <<0, 72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 32, 216, 60, 223, 13>>

echo <<"Hello 🌍":utf32>>
// <<0, 0, 0, 72, 0, 0, 0, 101, 0, 0, 0, 108, 0, 0, 0, 108, 0, 0, 0, 111, 0, 0, 0, 32, 0, 1, 243, 13>>

let greeting = "Hello"
echo <<greeting>>
// Error: Expected type Int, found type String

echo <<greeting:utf8>>
// <<72, 101, 108, 108, 111>>

UTF codepoints, using the built-in UtfCodepoint type, are also possible to use as bit array segments. These work similar to strings, but only represent a single codepoint instead of multiple. Like strings, they can be encoded as UTF-8, UTF-16 or UTF-32, although they have differently named options: utf8_codepoint, utf16_codepoint and utf32_codepoint. Since there are no UTF codepoint literals, on of these options is always required.

let assert [codepoint] = string.to_utf_codepoints("🌍")

echo <<codepoint>>
// Error: Expected type Int, found type UtfCodepoint

echo <<codepoint:utf8_codepoint>>
// <<240, 159, 140, 141>>

echo <<codepoint:utf16_codepoint>>
// <<216, 60, 223, 13>>

echo <<codepoint:utf32_codepoint>>
// <<0, 1, 243, 13>>

The last data type that can be used is BitArray. The encoding here is pretty obvious, simply consisting of the bits inside the specified bit array. Bit array segments must use the bits option.

let bit_array = <<3, 4, 5>>

echo <<bit_array>>
// Error: Expected type Int, found type BitArray

echo <<bit_array:bits>>
// <<3, 4, 5>>

echo <<1, 2, bit_array:bits, 6>>
// <<1, 2, 3, 4, 5, 6>>

Segment size

Possibly the most commonly used bit array option is size. The size option allows you to customise the size that a particular segment has, specified in bits.

echo <<1024:size(16)>>
// <<4, 0>>

Since size is so commonly used, it even has a shorthand syntax that you can use, where the size part is omitted:

echo <<1.0:32>>
// <<63, 128, 0, 0>>

There is another option you can use in conjunction with size: unit. The unit option allows you to specify a value that is multiplied by the size. This is most commonly useful for specifying sizes in bytes, but can be used for any size unit.

echo <<1024:2-unit(8)>>
// <<4, 0>>

echo <<12:3-unit(2)>>
// <<12:6>>

For bit arrays which are not a whole number of bytes, the trailing bits at the end are suffixed with the number of bits.

There are some limitations to the size option though. While it is unrestricted on integer segments, float segments only support sizes of 16, 32 or 64, as other size floats are not well defined.

String and UTF codepoint segments cannot use the size option; they have a fixed size based on their value.

Bit array segments can use the size option to truncate the bit array to a particular size, but if the specified size is larger than the size of the bit array, it will lead to a runtime error.

Endianness

By default, bit array segments are big endian, however it is possible to configure them to be little endian instead, using the little option. There is a big option too, but it does nothing other than perhaps making the intention of the code clearer. There is also native, which chooses endianness based on the processor that is running the code.

Endianness is easiest to understand when it comes to integer segments, but it also applies to Floats as well as UTF-16 and UTF-32 strings and codepoints. It is not allowed for UTF-8 or bit array segments.

Endianness usually doesn't matter when using bit arrays internally; it is often only useful when it comes to interacting with a predefined API.

Pattern matching

The syntax shown until now has been for constructing bit arrays, but as mentioned at the beginning, bit array syntax can also used to pattern match on binary data and extract information from it. The syntax is largely the same, but there are some limitations and additional features when it comes to pattern matching.

In general, most of the syntax that can be used when constructing bit arrays can be used in the same way when pattern matching. You can either match on a specific literal, or assign the value to a variable.

One thing to note is that segment information is not stored in the bit array, so for example a Float segment can be matched on as an Int, and vice versa.

let assert <<x, y, z>> = <<1, 2, 3>>
echo #(x, y, z)
// #(1, 2, 3)

let assert <<1, a, 3>> = <<1, 2, 3>>
echo a
// 2

let assert <<x, y>> = <<3.14:16>>
echo #(x, y)
// #(66, 72)

let assert <<float:float-size(16)>> = <<60, 0>>
echo float
// 1.0

Strings

One restriction to note is that arbitrary length strings cannot be matched on in this way. The following is an error:

let assert <<message:size(5)>> = <<"Hello">>

Because UTF-8 is variable sized, there's no guarantee that any given sequence of bytes is valid UTF-8. You can still match on UTF codepoints though, as well as string literals.

let assert <<"Hello", last_char>> = <<"Hello!">>
echo last_char
// 33

let assert <<first:utf8_codepoint>> = <<81>>
echo first
// utfcodepoint(Q)

Bits and bytes

For matching on bit arrays, there are two options: The bits option that is used in construction, and a second bytes option, which only matches whole numbers of bytes. If given an explicit size, that number of bits/bytes is matched. If the size option is not used, they match everything remaining in the bit array.

Note: When using the bytes option, size is measured in bytes, and the unit option cannot be used. This is currently a bug in Gleam, you can track its status here.

let assert <<_, bits:bits-size(12), _:size(4)>> = <<1, 2, 3>>
echo bits
// <<2, 0:4>>

let assert <<_, bytes:bytes-size(3), _>> = <<1, 2, 3, 4, 5>>
echo bits
// <<2, 3, 4>>

let assert <<first:4, rest:bits>> = <<1, 2, 3>>
echo rest
// <<16, 32, 3:4>>

let assert <<_, _, rest:bytes>> = <<1, 2, 3, 4, 5, 6>>
echo rest
// <<3, 4, 5, 6>>

let assert <<value:bytes>> = <<1, 2:2, 3>>
// Error: Pattern match failed

Signedness

When matching on integers, they are treated by default as unsigned. If you want to match on a signed integer, the signed option can be used, and the number is interpreted using two's complement. The unsigned option also exists for consistency, but like big, it does nothing. Signedness only applies to integers and cannot be used with any other type of segment.

let assert <<x>> = <<-1>>
echo x
// 255

let assert <<x:signed>> = <<255>>
echo x
// -1

JavaScript support

Bit arrays are a feature of Erlang, built in to the BEAM virtual machine. This is what inspired the Gleam feature, and it means we get all this behaviour for free on the Erlang target. But on JavaScript, all features need to be implemented from scratch. While most of the Erlang behaviour already exists, a few features are still lacking. At the time of writing the two missing features are the native option, and pattern matching on UTF codepoints. You can check the tracking issue to see if any progress has been made since.

Example

Now that we know about all the features of bit arrays, we can use them. Here is an example of a basic en/decoder for Minecraft's NBT format, using bit arrays.

First, we define our type to represent the NBT data:

pub type Nbt {
  Byte(Int)
  Short(Int)
  Int(Int)
  Long(Int)
  Float(Float)
  Double(Float)
  ByteArray(List(Int))
  IntArray(List(Int))
  LongArray(List(Int))
  String(String)
  List(List(Nbt))
  Compound(Dict(String, Nbt))
}

Next, we can create a decode function to turn a bit array into NBT:

pub fn decode(bits: BitArray) -> Nbt {
  // The first byte tells us what kind of data the value is
  case bits {
    <<1, byte:8-signed, _:bits>> -> Byte(byte)
    <<2, short:16-signed, _:bits>> -> Short(short)
    <<3, int:32-signed, _:bits>> -> Int(int)
    <<4, long:64-signed, _:bits>> -> Long(long)
    <<5, float:32-float, _:bits>> -> Float(float)
    <<6, double:64-float, _:bits>> -> Double(double)

    <<8, length:32, bytes:bytes-size(length), _:bits>> -> {
      // We can't match on arbitrary UTF-8 so we must extract the bytes then
      // convert it to a string.
      let assert Ok(string) = bit_array.to_string(bytes)
      String(string)
    }

    <<7, length:32-signed, bytes:bytes-size(length), _:bits>> ->
      ByteArray(bytes_to_list(bytes, 8, []))

    <<11, length:32-signed, bytes:bytes-size(length * 4), _:bits>> ->
      IntArray(bytes_to_list(bytes, 32, []))

    <<12, length:32-signed, bytes:bytes-size(length * 8), _:bits>> ->
      LongArray(bytes_to_list(bytes, 64, []))

    // Omitted for brevity
    <<9, _:bits>> -> todo
    <<10, _:bits>> -> todo
    // For the sake of this example, we will just crash the program here
    _ -> panic as "Invalid NBT"
  }
}

The bytes_to_list splits a bit array into n-bit chunks:

fn bytes_to_list(bytes: BitArray, chunk_size: Int, out: List(Int)) -> List(Int) {
  case bytes {
    <<first:size(chunk_size)-signed, rest:bytes>> ->
      bytes_to_list(rest, chunk_size, [first, ..out])
    _ -> list.reverse(out)
  }
}

Finally, an encode function to turn the NBT back into binary:

pub fn encode(nbt: Nbt) -> BitArray {
  case nbt {
    Byte(byte) -> <<1, byte:8>>
    Short(short) -> <<2, short:16>>
    Int(int) -> <<3, int:32>>
    Long(long) -> <<4, long:64>>
    Float(float) -> <<5, float:float-32>>
    Double(double) -> <<6, double:float-64>>

    // Append the tag (7), followed by each byte in the list
    ByteArray(bytes) ->
      list.fold(bytes, <<7>>, fn(out, byte) { <<out:bits, byte:8>> })
    IntArray(ints) ->
      list.fold(ints, <<11>>, fn(out, int) { <<out:bits, int:32>> })
    LongArray(longs) ->
      list.fold(longs, <<12>>, fn(out, long) { <<out:bits, long:64>> })

    String(string) -> <<8, string:utf8>>

    // Omitted for brevity
    List(_) -> todo
    Compound(_) -> todo
  }
}