Hey HN, I'm Orazio. I built microcrad (with a 'c'), a tiny scalar-valued automatic differentiation engine, with a small multi-layer perceptron implementation on top. It's reimplementation of Andrej Karpathy's micrograd in C. For me, this was a learning project to revisit backpropagation from first principles, with the additional difficulties that come with programming in C.
The basic idea is the same as micrograd: each number is a `Value` node in a computation graph, ops connect nodes, and the `backward` function topologically sorts the graph before applying the chain rule in reverse. The C-specific parts are memory management and two simple data structures I needed to implement backprop: sets and vectors.
The source code is about 1,350 lines, MIT licensed, and well documented. Dependencies are just the standard library and libm. In addition, the repo contains two examples to showcase how the engine works: a toy regression and an MNIST task.
What this is not: a framework to build and train neural networks in production. Being scalar-valued makes it slow, and it wasn't built for numerical robustness or large datasets. There's no commercial aim here; it's a learning project.
If you read through it, I'd like to hear thoughts, both on the ML engineering aspect and on anything that reads as un-idiomatic C.
Two things stick out as un-idiomatic for C. First, the casts before malloc are unnecessary. This you do in C++ but not in C. Second, names with beginning underscore are reserved, and the underscore + capital letter is specifically problematic.
The rest looks fairly nice but there are a couple of things I would do differently: I would not have the tests for NULL, but use signed integers for indices and dimensions, use a flexible array member to integrate the data into the vector type directly, and omit the capacity field (as long as benchmarking does not show it is really needed). I would also use variably modified types for bounds checking, and with C23 the include guards become largely unnecessary.
I re-read the second part with more calm once I got back home. There were at least a couple of things I wasn't familiar with (and therefore didn't even consider), specifically FAM and variably modified types. Thanks for the pointer, I'm realizing that having written this post and reading the comments is teaching me more about C than coding the project itself, that's crazy!
I can answer about the include guards, though. I consciously added them for portability, following the same general approach that led me to handle the big-endian to little-endian conversion explicitly in examples/mnist/idx.c: even if that safeguard is not strictly necessary on most modern systems, I love the idea that this project is potentially buildable and runnable in most environments.
All identifiers that begin with an underscore are reserved for use as identifiers with file scope in both the ordinary and tag name spaces.
Single underscore followed by non-uppercase is allowed, but not in file scope. This means that you can use them in structs and as local variables, but never as globals.
You're right, and I guess I've been breaking that rule for a while. What's the purpose there? The double-underbar and underbar-capital rules seem to be allowing for non-conflicting introduction of keywords. Is the single-underbar rule to protect standard library headers or something?
[deprecated post-edit] I guess I used function names beginning with underscore as it didn’t occur to me that it might be un-idiomatic. The intention was to make clear to myself that those functions are private and meant to be only used only in that file. [\deprecated post-edit]
About the second paragraph, first of all, thank you for the suggestions. Can I ask you to elaborate a little on the reasons for your proposals? For instance, even if redundant in some cases, I thought to myself it couldn’t be a bad thing to check for null pointers (though I could improve the error handling itself).
In C, you would typically rely much more on tooling to find bugs (but there are different styles and opinions). Checking for null is not bad, but does not usually add anything. If you de-reference a null pointer, you get a segmentation fault (which is safe) and a debugger will give a nice backtrace. So why catch this by writing additional code if the right tool will give you this automatically? A sanitizer could also add such tests automatically.
For a similar reason, it makes sense to use signed integers. A signed overflow sanitizer will find the overflow bugs or safely terminate the program. Finding unsigned wraparound bugs is much harder.
I see, I agree that especially checking for null really comes to styles and opinions, I still don't have one I can call mine. Thanks for the explanation!
Interesting project. Do you think manual memory management help understand computational graph lifecycle better, or does it distract from backprop itself?
btw, I went down the micrograd path with numpy-primitives all the way to building a PyTorch clone that can pre-train and post-train LLMs (https://github.com/workofart/ml-by-hand). My learning focus was on the math/calculus <-> high-level APIs, instead of efficiency. I'm glad to see more people tackling this problem from different angles.
ngl, it distracts from backprop itself a little, but teaches a lot about memory management. I did it this way because in parallel I wanted to get better at C, but if your aim is to purely work on ML fundamentals, it’s probably better to do it in python
I guess I interpreted this part of their README as implying that the author found RC too fragile
> Reference counting buys correctness and composability, but at a cost.
> Disadvantage #1: you must balance every reference. Each value_create, value_retain, and each operation's implicit retain of its operands has to be matched by a value_release. Forget one and you leak; do one too many and you free memory that is still in use. The training examples in examples/ are verbose precisely because they are scrupulous about this in their error paths; that verbosity is the price of leak-free C.
I’d put it this way: though the idea of ref counting felt very natural at the beginning for my use case, at some point I realized there were probably better techniques to achieve the same goal. I found myself multiple times writing nonsense like:
sum = value_add(v1, v2)
mul = value_mul(sum, v3)
[...]
value_release(sum)
value_release(mul)
so that later I could release the sum. When you only have one intermediate value it’s still acceptable, but at 3-4 it starts getting cumbersome.
After asking for feedback, someone rightfully pointed out that the better and faster approach for an autograd engine is using an arena allocator. My reason for saying “rightfully” is that arenas are ideal when you have many “objects” that have the same lifespan, such as the values involved in the forward/backward pass. RC is better when you have a lot of “objects” with independent lifespans.
I did a similar project, but my approach to the topology definition was declaring perceptron structs with inputs as pointer arrays and output as a regular variable. With this scheme, perceptrons can reference directly to the outputs from other perceptrons — or even their own output (I haven't implemented that yet).
The basic idea is the same as micrograd: each number is a `Value` node in a computation graph, ops connect nodes, and the `backward` function topologically sorts the graph before applying the chain rule in reverse. The C-specific parts are memory management and two simple data structures I needed to implement backprop: sets and vectors.
The source code is about 1,350 lines, MIT licensed, and well documented. Dependencies are just the standard library and libm. In addition, the repo contains two examples to showcase how the engine works: a toy regression and an MNIST task.
What this is not: a framework to build and train neural networks in production. Being scalar-valued makes it slow, and it wasn't built for numerical robustness or large datasets. There's no commercial aim here; it's a learning project.
If you read through it, I'd like to hear thoughts, both on the ML engineering aspect and on anything that reads as un-idiomatic C.
The rest looks fairly nice but there are a couple of things I would do differently: I would not have the tests for NULL, but use signed integers for indices and dimensions, use a flexible array member to integrate the data into the vector type directly, and omit the capacity field (as long as benchmarking does not show it is really needed). I would also use variably modified types for bounds checking, and with C23 the include guards become largely unnecessary.
(edit: minor edit for clarity)
I can answer about the include guards, though. I consciously added them for portability, following the same general approach that led me to handle the big-endian to little-endian conversion explicitly in examples/mnist/idx.c: even if that safeguard is not strictly necessary on most modern systems, I love the idea that this project is potentially buildable and runnable in most environments.
About the second paragraph, first of all, thank you for the suggestions. Can I ask you to elaborate a little on the reasons for your proposals? For instance, even if redundant in some cases, I thought to myself it couldn’t be a bad thing to check for null pointers (though I could improve the error handling itself).
For a similar reason, it makes sense to use signed integers. A signed overflow sanitizer will find the overflow bugs or safely terminate the program. Finding unsigned wraparound bugs is much harder.
btw, I went down the micrograd path with numpy-primitives all the way to building a PyTorch clone that can pre-train and post-train LLMs (https://github.com/workofart/ml-by-hand). My learning focus was on the math/calculus <-> high-level APIs, instead of efficiency. I'm glad to see more people tackling this problem from different angles.
> Reference counting buys correctness and composability, but at a cost.
> Disadvantage #1: you must balance every reference. Each value_create, value_retain, and each operation's implicit retain of its operands has to be matched by a value_release. Forget one and you leak; do one too many and you free memory that is still in use. The training examples in examples/ are verbose precisely because they are scrupulous about this in their error paths; that verbosity is the price of leak-free C.
I’d put it this way: though the idea of ref counting felt very natural at the beginning for my use case, at some point I realized there were probably better techniques to achieve the same goal. I found myself multiple times writing nonsense like:
sum = value_add(v1, v2) mul = value_mul(sum, v3) [...] value_release(sum) value_release(mul)
so that later I could release the sum. When you only have one intermediate value it’s still acceptable, but at 3-4 it starts getting cumbersome.
After asking for feedback, someone rightfully pointed out that the better and faster approach for an autograd engine is using an arena allocator. My reason for saying “rightfully” is that arenas are ideal when you have many “objects” that have the same lifespan, such as the values involved in the forward/backward pass. RC is better when you have a lot of “objects” with independent lifespans.