Custom Bitwidth Integers: a case for Mojo
Mojo's ability to interface directly with the Multi-Level Intermediate Representation opens up a whole new programming paradigm with custom bitwidth integers.
Mojo
Mojo is a low-level, system programming language specifically focusing on Artificial Intelligence applications. It was invented by Chris Lattner, the creator of LLVM and the Swift programming language.
MLIR - Multi-Level Intermediate Representation
One of the most interesting aspects of Mojo is its direct interface with the Multi-Level Intermediate Representation (MLIR). Both LLVM-Intermediate Representation (LLVM-IR) and MLIR are low-level intermediate representations that kind of read like assembly but are way more human-readable.
MLIR is further sub-categorized into many different ‘dialects’, e.g. `arith` dialect for compiler operations related to arithmetic. Mojo’s `Int` type is a wrapper around the `index` dialect.
One very interesting application of this is that it enables developers to write a type that represents unsigned (it can be signed as well) integers of a custom bitwidth, e.g. `u7`, representing an unsigned integer of exactly 7 bits.
Why custom bitwidth integers?
In performance-critical applications and especially those using FPGA, one might be interested in having custom bitwidth integers to avoid wasting bits as logic gates are an exceptionally valuable resource in these. [1]
Implementation in Mojo
I’ll show an example of an implementation of `U24` in Mojo.
Why U24?
An integer that is exactly 24 bits is not within the standard library.
The Mojo’s standard library’s `sizeof` function returns the size of the type in bytes, hence, choosing 24 bits is to show that indeed the `U24` takes up exactly 24 bits = 3 bytes, else, a 7 bits integer will have `sizeof` function returning 1 byte as its result.
Constructor
@register_passable("trivial")
struct U24[N: Int]:
var value: __mlir_type.ui24
alias MAX: Int = 2**24 - 1
alias ZERO: Int = 0
alias greater_than_16777215_err_msg: StringLiteral = "Unable to parameterize a value of type `U24` with an integer value greater than 16777215"
alias lesser_than_0_err_msg: StringLiteral = "Unable to parameterize a value of type `U24` with a negative integer value"
@always_inline("nodebug")
fn __init__() -> Self:
constrained[N <= Self.MAX, Self.greater_than_16777215_err_msg]()
constrained[N >= Self.ZERO, Self.lesser_than_0_err_msg]()
return Self {
value: __mlir_op.`index.castu`[_type = __mlir_type.ui24](N.__mlir_index__())
}
@register_passable("trivial")
struct U24[N: Int]:
var value: __mlir_type.ui24
The decorator `@register_passable(“trivial”)` makes sure that the value of the type can be easily passed into the machine’s registers. This is an optimization that not even Rust does. [2]
The type `U24` is defined as a generic struct, with a generic type parameter that is bounded to the in-built `Int` type (Mojo does not have traits yet, which it will soon).
The `var value: __mlir_type.ui24` declares a field belonging to the `U24` struct with a name `value`. The magical part is that a developer can declare the type of `value` to be `__mlir_type.ui24`, representing a 24-bit unsigned integer (‘ui’) in the MLIR’s Builtin dialect [3].
alias MAX: Int = 2**24 - 1
alias ZERO: Int = 0
alias greater_than_16777215_err_msg: StringLiteral = "Unable to parameterize a value of type `U24` with an integer value greater than 16777215"
alias lesser_than_0_err_msg: StringLiteral = "Unable to parameterize a value of type `U24` with a negative integer value"
The next few lines declare compile time constants using the keyword `alias`. `alias` is similar to C++’s `constexpr` and `consteval`, TypeScript’s `infer` and Rust’s `const` keywords. It instructs the Mojo compiler to compute the value of the expression at compile time.
@always_inline("nodebug")
fn __init__() -> Self:
constrained[N <= Self.MAX, Self.greater_than_16777215_err_msg]()
constrained[N >= Self.ZERO, Self.lesser_than_0_err_msg]()
return Self {
value: __mlir_op.`index.castu`[_type = __mlir_type.ui24](N.__mlir_index__())
}
The constructor `fn __init__() → Self: …` is fairly easy to understand. First, we always inline the function using the decorator `@always_inline`, then within the function body, we use the function `constrained` to check for the bounds. Mojo has two ‘spaces’, the ‘parameter’ space (which operates during compile time and the ‘value’ space (which operates during run time). Mojo functions can do computations in both spaces, the `[]` accepts arguments for the ‘parameter’ space, and `()` accepts arguments for the ‘value’ space [4]. Here, we check if the type argument `N` passed in is larger than `2 ** 24 - 1` i.e. the largest integer value that an unsigned 24-bit integer can represent, if it is indeed larger, we raise a COMPILE TIME error.
This is pretty cool, that we can seemingly be doing compile-time metaprogramming with extremely simple and Pythonic syntax and with great functionality like creating our own custom bitwidth integers.
We might then use the struct `U24` like so:
print("Size of `sizeof[U24]`: ", sizeof[U24[0]]()) # prints '3'
alias c = U24[600]()
print("c = ", c.__into_int__()) # Should print "c = 600"
If you attempt to construct a value that is larger than what U24 can represent, a compile-time error is raised, like so:
# Compile time error: `constraint failed: Unable to parameterize a value of type `U24` with an integer value greater than 16777215`# alias e = U24[U24[0].MAX + 1]()
Adding `U24`s
What’s the point of having numbers if we cannot add them?
@always_inline("nodebug")
fn __add__[M: Int](self: Self, other: U24[M]) -> U24[Self.N + M]:
return U24[Self.N + M]()
Above is a code snippet showing the implementation of the dunder (short for double underscore) `__add__` method. You can think of dunder methods as interface methods to change the runtime behaviour of types, e.g. by implementing `__add__`, we are telling the Mojo compiler, how to add two `U24`s.
For this implementation, it is pretty straightforward.
First, the function is generic over the type parameter `M`, which itself is bounded to the `Int` type. The function value parameters are self-explanatory. The return type `U24[Self.N + M]` is what is interesting. We can directly add generic type parameters at compile time. This feature known as compile-time evaluation of const generics is still not present in Rust.
Second, the returned value of the function is simply the concrete type, i.e. U24[Self.N + M], initialized.
Conclusion
Mojo is a language that aims to be almost everything. It is simple enough to understand compile-time metaprogramming, it is a compiled language yet is runnable like Python with a REPL, it also aims to make it memory-safe with future lifetime annotations and allows for both dynamically typed and statically typed codes to run in the same program.
Yet, I am totally mind-blown by Mojo’s ability to allow average developers like myself to make custom bitwidth integer types so easily. To do the same in languages like Rust [5] and C++ is virtually impossible and to do this in newer languages like Zig takes way more than what it took in Mojo [6].
I am definitely looking forward to more developments in Mojo and will keep a very close eye on.
Codes
[Github]: https://github.com/jymchng/varintjo
References
[1]: https://blog.llvm.org/2020/04/the-new-clang-extint-feature-provides.html
[2]: https://docs.modular.com/mojo/programming-manual.html#immutable-arguments-borrowed
[3]: https://mlir.llvm.org/docs/Dialects/Builtin/#integerattr
[4]: https://docs.modular.com/mojo/programming-manual.html#parameterization-compile-time-metaprogramming
[5]: https://users.rust-lang.org/t/why-cant-i-declare-a-variable-that-uses-custom-amount-of-bits/27450
[6]: https://ziglang.org/documentation/master/