# CIP-0123

## Abstract

We describe the semantics of a set of bitwise operations for Plutus
`BuiltinByteString`

s. Specifically, we provide descriptions for:

- Bit shifts and rotations
- Counting the number of set bits (
`popcount`

) - Finding the first set bit

We base our work on similar operations described in CIP-58, but use the bit indexing scheme from the logical operations cip for the semantics. This is intended as follow-on work from both of these.

## Motivation: why is this CIP necessary?

Bitwise operations, both over fixed-width and variable-width blocks of bits,
have a range of uses. Indeed, we have already proposed CIP-122,
with some example cases, and a range of primitive operations on
`BuiltinByteString`

s designed to allow bitwise operations in the service of
those example cases, as well as many others. These operations form a core of
functionality, which is important and necessary, but not complete. We believe
that the operations we describe in this CIP form a useful 'completion' of the
work in CIP-122, based on similar work done in the earlier CIP-58.

To demonstrate why our proposed operations are useful, we re-use the cases provided in the CIP-122, and show why the operations we describe would be beneficial.

### Case 1: integer set

For integer sets, the previous description lacks two important, and useful, operations:

- Given an integer set, return its cardinality; and
- Given an integer set, return its minimal member (or specify it is empty).

These operations have a range of uses. The first corresponds to the notion of Hamming weight, which can be used for operations ranging from representing boards in chess games to exponentiation by squaring to succinct data structures. Together with bitwise XOR, it can also compute the Hamming distance. The second operation also has a range of uses, ranging from succinct priority queues to integer normalization. It is also useful for rank-select dictionaries, a succinct structure that can act as the basis of a range of others, such as dictionaries, multisets and trees of different arity.

In all of the above, these operations need to be implemented efficiently to be
useful. While we could use only bit reading to perform all of these, it is
extremely inefficient: given an input of length $n$, assuming that any bit
distribution is equally probable, we need $Theta(8 cdot n)$ time in the
average case. While it is
impossible to do both of these operations in sub-linear time in general, the
large constant factors this imposes (as well as the overhead of looping over
*bit* indexes) is a cost we can ill afford on-chain, especially if the goal is
to use these operations as 'building blocks' for something like a data
structure.

### Case 2: hashing

In our previously-described case, we stated what operations we would need for the Argon2 family of hashes specifically. However, Argon2 has a specific advantage in that the number of operations it requires are both relatively few, and the most complex of which (BLAKE2b hashing) already exists in Plutus Core as a primitive. However, other hash functions (and indeed, many other cryptographic primitives) rely on two other important instructions: bit shifts and bit rotations. As an example, consider SHA512, which is an important component in several cryptographic protocols (including Ed25519 signature verification): its implementation requires both shifts and rotations to work.

Like with Case 1, we can theoretically simulate both rotations and shifts using
a combination of bit reads and bit writes to an empty `BuiltinByteString`

.
However, the cost of this is extreme: we would need to produce a list of
index-value pairs of length equal to the Hamming weight of the input, only to
then immediately discard it! To put this into some perspective, for an 8-byte
input, performing a rotation involves allocating an expected 32 index-value
pairs, using *significantly* more memory than the result. On-chain, we can't
really afford this cost, especially in an operation intended to be used as part
of larger constructions (as would be necessary here).

## Specification

We describe the proposed operations in several stages. First, we give an overview of the proposed operations' signatures and costings; second, we describe the semantics of each proposed operation in detail, as well as some examples. Lastly, we provide laws that any implementation of the proposed operations should obey.

Throughout, we make use of the bit indexing scheme described in a CIP-122. We also re-use the notation $x[i]$ to refer to the value of at bit index $i$ of $x$, and the notation $x{i}$ to refer to the byte at byte index $i$ of $x$: both are specified in CIP-122.

### Operation semantics

Our proposed operations will have the following signatures:

`bitwiseShift :: BuiltinByteString -> BuiltinInteger -> BuiltinByteString`

`bitwiseRotate :: BuiltinByteString -> BuiltinInteger -> BuiltinByteString`

`countSetBits :: BuiltinByteString -> BuiltinInteger`

`findFirstSetBit :: BuiltinByteString -> BuiltinInteger`

We assume the following costing, for both memory and execution time:

Operation | Execution time cost | Memory cost |
---|---|---|

`bitwiseShift` | Linear in the `BuiltinByteString` argument | As execution time |

`bitwiseRotate` | Linear in the `BuiltinByteString` argument | As execution time |

`countSetBits` | Linear in the argument | Constant |

`findFirstSetBit` | Linear in the argument | Constant |

`bitwiseShift`

`bitwiseShift`

takes two arguments; we name and describe them below.

- The
`BuiltinByteString`

to be shifted. This is the*data argument*. - The shift, whose sign indicates direction and whose magnitude indicates the
size of the shift. This is the
*shift argument*, and has type`BuiltinInteger`

.

Let $b$ refer to the data argument, whose length in bytes is $n$, and let $i$
refer to the shift argument. Let the result of `bitwiseShift`

called with $b$
and $i$ be $b_r$, also of length $n$.

For all $j in 0, 1, ldots 8 cdot n - 1$, we have

$$ b_r[j] = begin{cases} b[j - i] & text{if } j - i in 0, 1, ldots 8 cdot n - 1 0 & text{otherwise} end{cases} $$

Some examples of the intended behaviour of `bitwiseShift`

follow. For
brevity, we write `BuiltinByteString`

literals as lists of hexadecimal values.

`-- Shifting the empty bytestring does nothing`

bitwiseShift [] 3 => []

-- Regardless of direction

bitwiseShift [] (-3) => []

-- Positive shifts move bits to higher indexes, cutting off high indexes and

-- filling low ones with zeroes

bitwiseShift [0xEB, 0xFC] 5 => [0x7F, 0x80]

-- Negative shifts move bits to lower indexes, cutting off low indexes and

-- filling high ones with zeroes

bitwiseShift [0xEB, 0xFC] (-5) => [0x07, 0x5F]

-- Shifting by the total number of bits or more clears all bytes

bitwiseShift [0xEB, 0xFC] 16 => [0x00, 0x00]

-- Regardless of direction

bitwiseShift [0xEB, 0xFC] (-16) => [0x00, 0x00]

`bitwiseRotate`

`bitwiseRotate`

takes two arguments; we name and describe them below.

- The
`BuiltinByteString`

to be rotated. This is the*data argument*. - The rotation, whose sign indicates direction and whose magnitude indicates
the size of the rotation. This is the
*rotation argument*, and has type`BuiltinInteger`

.

Let $b$ refer to the data argument, whose length in bytes is $n$, and let $i$
refer to the rotation argument. Let the result of `bitwiseRotate`

called with $b$
and $i$ be $b_r$, also of length $n$.

For all $j in 0, 1, ldots 8 cdot n - 1$, we have $b_r = b[(j - i) text{ } mathrm{mod} text { } (8 cdot n)]$.

Some examples of the intended behaviour of `bitwiseRotate`

follow. For
brevity, we write `BuiltinByteString`

literals as lists of hexadecimal values.

`-- Rotating the empty bytestring does nothing`

bitwiseRotate [] 3 => []

-- Regardless of direction

bitwiseRotate [] (-1) => []

-- Positive rotations move bits to higher indexes, 'wrapping around' for high

-- indexes into low indexes

bitwiseRotate [0xEB, 0xFC] 5 => [0x7F, 0x9D]

-- Negative rotations move bits to lower indexes, 'wrapping around' for low

-- indexes into high indexes

bitwiseRotate [0xEB, 0xFC] (-5) => [0xE7, 0x5F]

-- Rotation by the total number of bits does nothing

bitwiseRotate [0xEB, 0xFC] 16 => [0xEB, 0xFC]

-- Regardless of direction

bitwiseRotate [0xEB, 0xFC] (-16) => [0xEB, 0xFC]

-- Rotation by more than the total number of bits is the same as the remainder

-- after division by number of bits

bitwiseRotate [0xEB, 0xFC] 21 =>[0x7F, 0x9D]

-- Regardless of direction, preserving sign

bitwiseRotate [0xEB, 0xFC] (-21) => [0xE7, 0x5F]

`countSetBits`

Let $b$ refer to `countSetBits`

' only argument, whose length in bytes is $n$,
and let $r$ be the result of calling `countSetBits`

on $b$. Then we have

$$ r = sum_{i=0}^{8 cdot n - 1} b[i] $$

Some examples of the intended behaviour of `countSetBits`

follow. For
brevity, we write `BuiltinByteString`

literals as lists of hexadecimal values.

`-- The empty bytestring has no set bits`

countSetBits [] => 0

-- Bytestrings with only zero bytes have no set bits

countSetBits [0x00, 0x00] => 0

-- Set bits are counted regardless of where they are

countSetBits [0x01, 0x00] => 1

countSetBits [0x00, 0x01] => 1

`findFirstSetBit`

Let $b$ refer to `findFirstSetBit`

's only argument, whose length in bytes is $n$,
and let $r$ be the result of calling `findFirstSetBit`

on $b$. Then we have the
following:

- $r in -1, 0, 1, ldots, 8 cdot n - 1$
- If for all $i in 0, 1, ldots n - 1$, $b{i} = texttt{0x00}$, then $r = -1$; otherwise, $r > -1$.
- If $r > -1$, then $b[r] = 1$, and for all $i in 0, 1, ldots, r - 1$, $b[i] = 0$.

Some examples of the intended behaviour of `findFirstSetBit`

follow. For
brevity, we write `BuiltinByteString`

literals as lists of hexadecimal values.

`-- The empty bytestring has no first set bit`

findFirstSetBit [] => -1

-- Bytestrings with only zero bytes have no first set bit

findFirstSetBit [0x00, 0x00] => -1

-- Only the first set bit matters, regardless what comes after it

findFirstSetBit [0x00, 0x02] => 1

findFirstSetBit [0xFF, 0xF2] => 1

### Laws

Throughout, we use `bitLen bs`

to indicate the number of bits in `bs`

; that is,
`sizeOfByteString bs * 8`

. We also make reference to logical
operations from a previous CIP as part of specifying these laws.

#### Shifts and rotations

We describe the laws for `bitwiseShift`

and `bitwiseRotate`

together, as they
are similar. Firstly, we observe that `bitwiseShift`

and `bitwiseRotate`

both
form a monoid homomorphism between natural number
addition and function composition:

`bitwiseShift bs 0 = bitwiseRotate bs 0 = bs`

bitwiseShift bs (i + j) = bitwiseShift (bitwiseShift bs i) j

bitwiseRotate bs (i + j) = bitwiseRotate (bitwiseRotate bs i) j

However, `bitwiseRotate`

's homomorphism is between *integer* addition and
function composition: namely, `i`

and `j`

in the above law are allowed to have
different signs. `bitwiseShift`

's composition law only holds if `i`

and `j`

don't have opposite signs: that is, if they're either both non-negative or both
non-positive.

Shifts by more than the number of bits in the data argument produce an empty
`BuiltinByteString`

:

`-- n is non-negative`

bitwiseShift bs (bitLen bs + n) =

bitwiseShift bs (- (bitLen bs + n)) =

replicateByteString (sizeOfByteString bs) 0x00

Rotations, on the other hand, exhibit 'modular roll-over':

`-- n is non-negative`

bitwiseRotate bs (binLen bs + n) = bitwiseRotate bs n

bitwiseRotate bs (- (bitLen bs + n)) = bitwiseRotate bs (- n)

Shifts clear bits at low indexes if the shift argument is positive, and at high indexes if the shift argument is negative:

`-- 0 y) = countSetBits x + countSetBits y`

`countSetBits`

also demonstrates that `bitwiseRotate`

indeed preserves the
number of set (and thus clear) bits:

`countSetBits bs = countSetBits (bitwiseRotate bs i)`

There is also a relationship between the result of `countSetBits`

on a given
argument and its complement:

`countSetBits bs = bitLen bs - countSetBits (bitwiseLogicalComplement bs)`

Furthermore, `countSetBits`

exhibits (or more precisely, gives evidence for) the
inclusion-exclusion principle from combinatorics, but only
under truncation semantics:

`countSetBits (bitwiseLogicalXor False x y) = countSetBits (bitwiseLogicalOr`

False x y) - countSetBits (bitwiseLogicalAnd False x y)

Lastly, `countSetBits`

has a relationship to bitwise XOR, regardless
of semantics:

`countSetBits (bitwiseLogicalXor semantics x x) = 0`

`findFirstSetBit`

`BuiltinByteString`

s where every byte is the same (provided they are not empty)
have the same first bit as a singleton of that same byte:

`-- 0 = 1`

findFirstSetBit (replicateByteString n w8) =

findFirstSetBit (replicateByteString 1 w8)

Additionally, `findFirstSet`

has a relationship to bitwise XOR, regardless of
semantics:

`findFirstSetBit (bitwiseLogicalXor semantics x x) = -1`

Any result of a `findFirstSetBit`

operation that isn't `-1`

gives a valid bit
index to a set bit, but any non-negative `BuiltinInteger`

less than this will
give an index to a clear bit:

`-- bs is not all zero bytes or empty`

readBit bs (findFirstSetBit bs) = True

-- 0 = bitLen

then weMissed

else validIndex

This requires us to do considerably more work (finding the length of the
argument, multiplying by 8, then compare against that result), and is also much
more prone to error: users have to remember to use a `>=`

comparison, as well as
to multiply the argument length by 8. This is less of an issue with
implementations of this operation in other languages, as their equivalent
operations are designed for fixed-width arguments (indeed, `FiniteBits`

*requires* this), which makes their bit length a constant. In our case, this
isn't as simple, as `BuiltinByteString`

s have variable length, which would make
the cost described above unavoidable. Our solution is both more efficient and
less error-prone: all a user needs to remember is that invalid indexes from
`findFirstSetBit`

are negative. On this basis, we decided to vary from
conventional approaches.

One notable omission from our operators is the equivalent of counting *leading*
zeroes: namely, an operation that would find the *last* set bit. Typically, both
a count of leading *and* trailing zeroes is provided in both hardware and
software: this is the case for `FiniteBits`

, as well as most hardware
implementations. To relate this to our cases, specifically Case 1, this
would allow us to efficiently find the *largest* element in an integer set. The
reason we omit this operation is because, when compared with counting trailing
zeroes, it is far less useful: while it can be used for computing fast integer
square roots, it lacks many other uses. Counting trailing zeroes, on the other
hand, is essential for rank-select dictionaries (specifically for the `select`

operation), and also enables a range of other uses, which we have already
mentioned. In order to limit the number of new primitives, we decided that
counting leading zeroes can be omitted for now. However, our design doesn't
preclude such an operation from being added later if a use case for it is found
to be useful.

## Path to Active

### Acceptance Criteria

We consider the following criteria to be essential for acceptance:

- A proof-of-concept implementation of the operations specified in this document, outside of the Plutus source tree. The implementation must be in GHC Haskell, without relying on the FFI.
- The proof-of-concept implementation must have tests, demonstrating that it behaves as the specification requires.
- The proof-of-concept implementation must demonstrate that it will successfully build, and pass its tests, using all GHC versions currently usable to build Plutus (8.10, 9.2 and 9.6 at the time of writing), across all Tier 1 platforms.

Ideally, the implementation should also demonstrate its performance characteristics by well-designed benchmarks.

### Implementation Plan

MLabs has begun the implementation of the proof-of-concept as required in the acceptance criteria. Upon completion, we will send a pull request to Plutus with the implementation of the primitives for Plutus Core, mirroring the proof-of-concept.

## Copyright

This CIP is licensed under Apache-2.0.

## CIP Information

This null ./CIP-0123 created on **2024-05-16** has the status: Proposed.

This page was generated automatically from: cardano-foundation/CIPs.