Writing An Optimizing Tensor Compiler From Scratch

Not everyone sets out to write their own optimizing compiler from scratch, but some roll into it as a project's scope keeps creeping outward. People like [Michael Moroz], who wrote up a long and detailed article on the why and how. Specifically, a ‘small library’ involving a few matrix operations for a Unity-based project turned into a static optimizing tensor compiler, called TensorFrost, with a Python front-end and a shader-like syntax, all of which is available on GitHub.

The Python-based front-end implements low-level NumPy-like operations, with development still ongoing. As for why Yet Another Tensor Library had to be developed: most existing libraries are heavily focused on machine learning tasks and scale poorly outside them, dynamic control flow is hard to express in them, and custom kernels have to be written in e.g. CUDA.
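To give a flavor of the kind of low-level, NumPy-like tensor operations such a front-end exposes, here is a small sketch in plain NumPy (illustrative only; this is not TensorFrost's actual API): an explicit finite-difference step of 2D heat diffusion, the sort of grid kernel that would traditionally live in shader code.

```python
# Illustrative NumPy sketch, not TensorFrost's API: a simulation kernel
# built from element-wise tensor operations.
import numpy as np

def heat_step(u, alpha=0.1):
    """One explicit finite-difference step of 2D heat diffusion."""
    # Discrete Laplacian via shifted copies of the grid (wrap-around edges)
    lap = (np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0)
         + np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1) - 4.0 * u)
    return u + alpha * lap

u = np.zeros((64, 64))
u[32, 32] = 1.0          # a single hot cell
for _ in range(100):     # host-side control flow driving the kernel
    u = heat_step(u)
print(u.sum())           # total heat is conserved by this scheme
```

A tensor compiler's job would be to fuse those five shifted reads and the arithmetic into a single GPU kernel instead of materializing each intermediate array.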

Above all, [Michael] wanted to use a high-level language instead of pure shader code, and to have something that can output graphical data in real-time. Taking the gamble, and leaning on LLVM for some parts, there is now a functional implementation, albeit with a lot of work still ahead.

2 thoughts on “Writing An Optimizing Tensor Compiler From Scratch”

  1. In a time when we’re threatened with statements like “in the future, 100% of your game pixels will be generated” and all effort will be poured into silicon with large NPUs instead, my a priori questions are:
    – can we run shader code on NPUs (assuming they’re not limited to INT8 ops)?
    – what good are NPUs for general purpose computing?

    Since the headline here doesn’t say “Nice NPU You Got There, Would Be A Shame If Someone Turned It Back Into A Good Old Graphics Card”, I take it the matter is a lot more complex. After a quick glance, it seems like the machine learning libraries canonically used to drive NPUs don’t even provide the means to implement control flow or graphics interaction:

    “control flow can be very prevalent, but unfortunately it’s quite inconvenient to express in these libraries, if even possible”

    “The lack of a native way to output graphical data from these libraries is even more annoying when you remember that GPUs are called Graphics Processing Units, not Tensor Processing Units. And they have all the required hardware to work with and output graphics.
    PS. Taichi actually does have a way to do this! It has integration with GLFW and ImGUI.”

    (also: if this tensor compiler is what I think it is, it’s definitely a hack!)
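    The quoted point about control flow can be made concrete with a hedged sketch in plain NumPy (hypothetical function name; no particular library's API is implied): a loop whose trip count depends on the data itself, which is trivial on the host but awkward to express in a static tensor graph.

    ```python
    # Illustrative only: data-dependent control flow that static tensor
    # graphs struggle with, but host-side Python handles trivially.
    import numpy as np

    def normalize_until(x, tol=1e-6, max_iter=100):
        """Repeatedly halve x until its largest magnitude is <= 1 + tol.

        The number of iterations depends on the values in x, so the loop
        cannot be unrolled to a fixed length ahead of time.
        """
        steps = 0
        while np.max(np.abs(x)) > 1.0 + tol and steps < max_iter:
            x = x / 2.0
            steps += 1
        return x, steps

    x, steps = normalize_until(np.array([8.0, 2.0, 0.5]))
    # 8 -> 4 -> 2 -> 1: three data-dependent halvings
    ```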
