I had a similar surprise about how approachable PL is, but coming at it from 'the bottom up' instead of starting from a normal language.
I wrote a compiler toolchain and debugger that takes a Turing machine description plus an input string and emits an encoded tape runnable by a Universal Turing Machine [0]. I had some prior PL experience, but had never built an end-to-end compiler pipeline, at least not at this low a level.
It started as a joke/experiment, but I couldn't believe how fast it pulled me into designing:
- a small low-level ASM for building the UTM
- an ABI for symbol widths and encoding grammar
- an interpreter used as the behavioral oracle
- raw TM transitions for each ASM instruction, generated by having an LLM iterate on candidate emissions and checked against the interpreter oracle
- a CFG-style IR to fix the LLM mess once direct ASM -> TM emission became too hard to keep sane (the LLM actually did a decent job; I don't think I would have done much better without the IR either)
- a gdb-style debugger for raw transitions, ASM routines, and blocks
- a trace visualizer
- a bootstrapping experiment where an L1 UTM/input pair was itself run through an L2 UTM
- optimisation experiments
And every step came quite naturally and was easy to tie in with everything else. Each one was just the next local repair needed to make the previous layer tractable.
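The "interpreter as behavioral oracle" step is essentially differential testing. Here is a minimal Python sketch of the idea, assuming a toy transition-table encoding that is not the one used in the mtm repo: `run_tm` simulates raw TM transitions, `oracle_increment` stands in for the reference interpreter's semantics of a hypothetical INC instruction, and `check_against_oracle` returns the first input on which a candidate transition table disagrees with the oracle. All names and the encoding are illustrative assumptions.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical encoding (not mtm's): (state, symbol) -> (next_state, write, move)
Transitions = Dict[Tuple[str, str], Tuple[str, str, int]]
BLANK = "_"


def run_tm(delta: Transitions, tape: str, start: str, halt: str,
           max_steps: int = 10_000) -> Tuple[str, str]:
    """Simulate raw transitions; head starts on the leftmost input cell."""
    cells = dict(enumerate(tape))
    head, state, steps = 0, start, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step limit exceeded")
        sym = cells.get(head, BLANK)
        # A missing transition raises KeyError, i.e. the machine is stuck.
        state, write, move = delta[(state, sym)]
        cells[head] = write
        head += move
        steps += 1
    if not cells:
        return state, ""
    lo, hi = min(cells), max(cells)
    out = "".join(cells.get(i, BLANK) for i in range(lo, hi + 1))
    return state, out.strip(BLANK)


# Candidate emission for the hypothetical INC: binary increment.
INCREMENT: Transitions = {
    ("right", "0"): ("right", "0", +1),   # scan right to the end of the input
    ("right", "1"): ("right", "1", +1),
    ("right", BLANK): ("carry", BLANK, -1),
    ("carry", "1"): ("carry", "0", -1),   # 1 + carry = 0, keep carrying left
    ("carry", "0"): ("halt", "1", 0),     # 0 + carry = 1, done
    ("carry", BLANK): ("halt", "1", 0),   # ran off the left edge: new top bit
}


def oracle_increment(bits: str) -> str:
    """Reference semantics the candidate must reproduce."""
    return format(int(bits, 2) + 1, "b")


def check_against_oracle(delta: Transitions, oracle: Callable[[str], str],
                         inputs: Iterable[str]) -> Optional[str]:
    """Return the first counterexample, or None if every input agrees."""
    for bits in inputs:
        _, got = run_tm(delta, bits, start="right", halt="halt")
        if got != oracle(bits):
            return bits
    return None


inputs = [format(n, "b") for n in range(64)]  # canonical forms, no leading zeros
assert check_against_oracle(INCREMENT, oracle_increment, inputs) is None
```

In this sketch a candidate (for example, one proposed by an LLM) is accepted only when `check_against_oracle` returns `None`; a returned counterexample is exactly the kind of feedback you would hand to the next iteration.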
[0] Repo: https://github.com/ouatu-ro/mtm
This project is pretty interesting, although I'm wondering how they plan to address the "easy sandboxing" design goal in a compiled language with raw pointer arithmetic and clib interop... In that regard I think Lua would have been a lot easier to sandbox, despite the author's concerns.
(Also, they might want to look into Lua userdata, since that would address their concern about the overhead of converting between native and Lua data structures. The language is designed to be embedded in C programs, after all.)
Making your own language is easy. Building the libraries that actually solve problems without forcing developers to reinvent the wheel is the crux. There is a reason C++, Java, JavaScript, etc. are established: it's the proven libraries around those languages that allow them to be so successful.
I have only read the first part of the article, but I can't help thinking that a project like libriscv [0] could have worked for their game project too. Fun fact: the creator of libriscv, the legendary fwsgonzo, is also making a game. I highly recommend checking out their Discord server.
But my main point is that libriscv is one of the fastest RISC-V emulators, so something like C/C++/Lua could have been run sandboxed inside it for the game.
Am I missing something? That said, making a programming language is a kind of project of its own, and that's really cool as well :-D
But I would also love to hear the author's opinion on libriscv, as from my understanding it ticks all the boxes.
[0]: https://github.com/libriscv/libriscv
Roughly 100%.