If we generate so much code using AI that no one is really looking or reading the code anymore, just verifying end functionality, we can really just skip all that and go straight to assembler, no?
Sure, we could reuse some basic building blocks like implementations of TCP/IP, HTTP, sockets, etc., but server frameworks like FastAPI are just human-friendly abstractions over all that.
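To make the "abstraction" point concrete, here is a minimal sketch of the kind of work a server framework hides: parsing a request line and assembling an HTTP/1.1 response by hand over a plain socket. The function names (build_response, parse_request_line, serve_once) and the host/port are illustrative assumptions, not part of any real framework.

```python
import socket


def build_response(body: str, status: str = "200 OK") -> bytes:
    """Assemble a minimal HTTP/1.1 response by hand."""
    payload = body.encode("utf-8")
    headers = [
        f"HTTP/1.1 {status}",
        "Content-Type: text/plain; charset=utf-8",
        f"Content-Length: {len(payload)}",
        "Connection: close",
    ]
    # Header lines end with CRLF; a blank line separates headers from the body.
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii") + payload


def parse_request_line(raw: bytes) -> tuple[str, str]:
    """Return (method, path) from the first line of a raw HTTP request."""
    first_line = raw.split(b"\r\n", 1)[0].decode("ascii")
    method, path, _version = first_line.split(" ", 2)
    return method, path


def serve_once(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Accept a single connection and echo its method and path (hypothetical host/port)."""
    with socket.create_server((host, port)) as server:
        conn, _addr = server.accept()
        with conn:
            method, path = parse_request_line(conn.recv(4096))
            conn.sendall(build_response(f"{method} {path}\n"))
```

Calling serve_once() and then requesting http://127.0.0.1:8080/items from another terminal should answer with "GET /items". Routing, validation, and serialization are all left out here; those are the parts a framework such as FastAPI wraps for human readers.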
First reason: LLMs are modeled on what humans have been doing, and they have been writing software that way recently, so it's easier to mimic that to get straight to results. This reason might fade away in the future.
Second reason, something related to impedance (mis)match, a signal processing notion (when the interface between two media is not well-suited, it is difficult to have a signal pass through).
Going through intermediate levels makes a structured workflow where each step follows the previous one "cheaply". By contrast, directly generating something many layers away requires juggling all the levels at once, and is hence more costly. So "cheaply" above means both "better use of an LLM's context" and using regular tools where they are good, instead of paying the high price (hardware + computation + environment) of doing it via an LLM.
Interestingly, AIs are used to generate sample-level audio and some video, which may look like it contradicts the point. Still they are costly (especially video).
Also, you still need to maintain it. When something breaks in production you need to understand what the code does.
The real bottleneck with AI coding isn't the language, it's context. The model needs to understand your conventions, patterns, business logic. That gets exponentially worse with lower-level languages, not better.
Hell my coding is non deterministic with different degrees of quality depending on what else I have going on.
But just like a developer, an LLM can also reason over intent based on clearly named functions, modularity, etc.
[1] If someone is pulling well-defined tickets off the board, they are a mid-level developer regardless of title.
An LLM can automate a part of the process where a human might take slightly longer but, ultimately, any output generated by an LLM cannot be trusted and should be checked by a human who understands the issue... and that is actually the hard part where humans will struggle, so they won't actually do it.
When a human is producing the output, that human is performing the following actions:
- analysing the issue
- analysing the existing process
- building the understanding of the existing process
- building the understanding of how the issue affects the existing process
- producing the output to address the issue in the existing process
- checking the output as it is being produced
- updating the understanding of the existing process with lessons learned from the above
- checking the final product to ensure that it has solved the original issue and hasn't broken some other part of the system
An LLM can help speed up one of those steps (producing the output) at the expense of slowing down the other parts (which were already slow) and reducing the understanding and reliability of the existing system, which will make future iterations even slower.
An LLM can be used to speed up the generation of examples, but just like in the past you could not just copy the example from some random internet search result, you should not just copy the LLM output without understanding it... and that is the slow part where an LLM might not help (and might actually make things worse) for most people.
And whereas in the past, when you encountered comprehensive and well-documented output, you could assume that the human who put in that amount of effort actually understood what they were doing and wouldn't have expended that much effort to generate garbage, you cannot make that same assumption now with LLMs.
For context: for the project I’m about to describe, I did the 3-week discovery process where I iterated through the design. I designed the architecture from an empty AWS account with IaC and an empty git repo. I know every decision that was made and why.
An issue was reported while the client was testing - a duplicate message was displayed to the user.
I gave Codex three pieces of information - the duplicate IDs and the fact that the message was a duplicate.
Codex:
1. Created and ran a query in the Postgres database after finding the ARN to the credentials - you don’t have to pass credentials to the database in AWS, you pass the entry in Secrets Manager directly to the database as long as you have permission to both (Dev account). I didn’t tell it which database to query or where I was storing the events.
2. It found the lambda that stored the events in the database.
3. It looked at the CloudFormation template to figure out the Lambda was triggered by messages in an SQS queue
4. Looking at the same template, it saw that the SQS queue was subscribed to an SNS topic
5. It found the code that sent the events - a 3000 line lambda
6. It was able to explain what the lambda did and find there wasn’t a bug in the logic
7. It saw that the flow was data driven and got the information from a DDB table defined by an environment variable.
8. It then looked at that CloudFormation template that deployed the Lambda
9. It ran a query on the DDB table after looking at that CloudFormation template to figure out the schema
It then told me that there was a duplicate entry in the database.
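Steps 3 and 4 above (following a Lambda back to its queue and topic through the template) can be sketched roughly as below. This is not what Codex actually ran; it is a minimal illustration that assumes the template has already been parsed into a dict and wires things with plain AWS::Lambda::EventSourceMapping and AWS::SNS::Subscription resources. All logical IDs (StoreEventsFunction, EventsQueue, EventsTopic, and so on) are made up for the example.

```python
def referenced_id(value):
    """Extract the logical ID from a Ref or Fn::GetAtt intrinsic, if any."""
    if isinstance(value, dict):
        if "Ref" in value:
            return value["Ref"]
        if "Fn::GetAtt" in value:
            target = value["Fn::GetAtt"]
            # Fn::GetAtt may be a ["LogicalId", "Attr"] list or a "LogicalId.Attr" string.
            return target[0] if isinstance(target, list) else target.split(".", 1)[0]
    return None


def trace_upstream(template, function_id):
    """Return (queue_ids, topic_ids) that feed the given Lambda logical ID."""
    resources = template["Resources"]
    # Queues wired to the function through an event source mapping.
    queues = [
        referenced_id(r["Properties"]["EventSourceArn"])
        for r in resources.values()
        if r["Type"] == "AWS::Lambda::EventSourceMapping"
        and referenced_id(r["Properties"]["FunctionName"]) == function_id
    ]
    # Topics that deliver to one of those queues through an SQS subscription.
    topics = [
        referenced_id(r["Properties"]["TopicArn"])
        for r in resources.values()
        if r["Type"] == "AWS::SNS::Subscription"
        and r["Properties"].get("Protocol") == "sqs"
        and referenced_id(r["Properties"]["Endpoint"]) in queues
    ]
    return queues, topics


# Hypothetical template fragment, already parsed from JSON/YAML into a dict.
TEMPLATE = {
    "Resources": {
        "EventsTopic": {"Type": "AWS::SNS::Topic"},
        "EventsQueue": {"Type": "AWS::SQS::Queue"},
        "StoreEventsFunction": {"Type": "AWS::Lambda::Function", "Properties": {}},
        "QueueSubscription": {
            "Type": "AWS::SNS::Subscription",
            "Properties": {
                "TopicArn": {"Ref": "EventsTopic"},
                "Protocol": "sqs",
                "Endpoint": {"Fn::GetAtt": ["EventsQueue", "Arn"]},
            },
        },
        "QueueTrigger": {
            "Type": "AWS::Lambda::EventSourceMapping",
            "Properties": {
                "EventSourceArn": {"Fn::GetAtt": ["EventsQueue", "Arn"]},
                "FunctionName": {"Ref": "StoreEventsFunction"},
            },
        },
    }
}

queues, topics = trace_upstream(TEMPLATE, "StoreEventsFunction")
print(queues, topics)
```

A real template may express the same wiring differently (hard-coded ARNs, cross-stack imports, or SAM-style event definitions on the function itself), so a sketch like this would need extending before it could trace an actual stack.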
I knew the entire structure of the system - again I designed all of this myself. I wanted to see how codex would do.
Everything you are describing, a modern LLM can do.
I won’t even go to how well it debugged a vibe coded internal website just by telling it to use Docker container with headless chromium and Playwright. It debugged it by taking screenshots while navigating and making changes.
https://x.com/davepl1968/status/2044482592620351955
Runtime also matters; you can’t run assembly on the web.
Security mechanisms can also preclude assembly.
Etc.
FWIW, your question stopped short before the bottom turtle in the stack. Below assembly is machine code. So your question could rather be, why not emit machine code. Assembly is made for humans because we can understand it, but machine code is not really tractable for humans to engage with in a meaningful way.
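To illustrate the gap between the two bottom layers, here is a toy sketch that turns two lines of 32-bit x86 assembly into the bytes a CPU actually executes. It is a hand-written lookup for exactly two instructions, not a real assembler; the function name assemble and the sample program are made up for the example.

```python
import struct


def assemble(line: str) -> bytes:
    """Encode one of two supported x86 instructions (toy example only)."""
    mnemonic, _, operands = line.strip().partition(" ")
    if mnemonic == "ret":
        return b"\xc3"  # RET (near return) is the single byte C3.
    if mnemonic == "mov":
        reg, imm = [part.strip() for part in operands.split(",")]
        if reg == "eax":
            # MOV EAX, imm32 is opcode B8 followed by a little-endian 32-bit immediate.
            return b"\xb8" + struct.pack("<I", int(imm, 0))
    raise ValueError(f"unsupported instruction: {line!r}")


program = ["mov eax, 42", "ret"]
machine_code = b"".join(assemble(line) for line in program)
print(machine_code.hex(" "))  # b8 2a 00 00 00 c3
```

The assembly text says what it does; the six bytes only do so if you already know the encoding rules, which is the tractability point made above.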
We could also just autogenerate the content of our websites, emails, contracts.
And we do, resulting in mountains of slop, varying from soulless to wildly incorrect.
Code is a precise way to describe intent. Using LLMs to make up some of the intent results in the author not knowing what the precise functionality of the resulting code is.
The companies selling LLM services present this as magic which will magically do what the author wants it to do, without even the author themselves knowing or defining it.
In reality it is simply ignorance and lies.
Sorry we can’t wishful think good working software into existence.
Any 'public' (rate-limited) web API (usable with curl) from current AI inference services?