- spin loop engine, could properly reset work available before calling the work function, and avoid yielding if new work was added in-between. I don't see how you avoid reentrancy issues as-is.
- lockfree queue, the buffer should store storage for Ts, not Ts. As it is, looks not only UB, but broken for any non-trivial type.
- metrics, the system seems weakly consistent, that's not ideal. You could use seqlocks or similar techniques.
- websocket, lacking error handling, or handling for slow or unreliable consumers. That could make your whole application unreliable as you buffer indefinitely.
- order books; first, using double for price everywhere, problematic for many applications, and causing unnecessary overhead on the decoding path. Then the data structure doesn't handle very sparse and deep books nor significant drift during the day. Richness of the data is also fairly low but what you need is strategy-dependent. Having to sort on query is also quite inefficient when you could just structure your levels in order to begin with, typically with a circular buffer kind of structure (as the same prices will frequently oscillate between bid and ask sides, you just need to track where bid/ask start/end).
- strategy, the system doesn't seem particularly suited for multi-level tick-aware microstructure strategies. I get more of a MFT vibe from this.
- simulation, you're using a probabilistic model for fill rate with market impact and the like. In HFT I think precise matching engine simulation is more common, but I guess this is again more of a MFT tangent. Could be nice to layer the two.
- risk checks, some of those seem unnecessary on the hot path, since you can just lower the position or pnl limits to order size limits.
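On the lockfree queue point, here is a minimal sketch of what "storage for Ts, not Ts" could look like. It assumes a single-producer/single-consumer ring with a power-of-two capacity; `SpscQueue`, `try_emplace`, and `try_pop` are hypothetical names, not taken from the repository. Slots are raw aligned bytes, elements are constructed with placement new on push and destroyed on pop, so a slot never holds a live `T` unless it is actually occupied.

```cpp
#include <atomic>
#include <cassert>   // only needed by the usage snippet below
#include <cstddef>
#include <new>
#include <optional>
#include <string>
#include <utility>

// Hypothetical SPSC ring buffer: one producer thread, one consumer thread.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    SpscQueue() = default;
    SpscQueue(const SpscQueue&) = delete;
    SpscQueue& operator=(const SpscQueue&) = delete;

    ~SpscQueue() {
        // Destroy only the elements that are still alive.
        while (try_pop()) {}
    }

    // Producer side: constructs a T in place; returns false if the ring is full.
    template <typename... Args>
    bool try_emplace(Args&&... args) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail - head_.load(std::memory_order_acquire) == Capacity) return false;
        ::new (static_cast<void*>(slot(tail))) T(std::forward<Args>(args)...);
        tail_.store(tail + 1, std::memory_order_release);  // publish the element
        return true;
    }

    // Consumer side: moves the element out and ends its lifetime explicitly.
    std::optional<T> try_pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return std::nullopt;
        T* p = std::launder(reinterpret_cast<T*>(slot(head)));
        std::optional<T> out(std::move(*p));
        p->~T();
        head_.store(head + 1, std::memory_order_release);  // hand the slot back
        return out;
    }

private:
    unsigned char* slot(std::size_t index) {
        return storage_ + (index & (Capacity - 1)) * sizeof(T);
    }

    // Raw bytes, not T objects: nothing is constructed until try_emplace runs.
    alignas(T) unsigned char storage_[Capacity * sizeof(T)];
    alignas(64) std::atomic<std::size_t> head_{0};  // written by the consumer
    alignas(64) std::atomic<std::size_t> tail_{0};  // written by the producer
};
```

With this layout a non-trivial type such as `std::string` works: `SpscQueue<std::string, 4> q; q.try_emplace("a");` constructs the string directly in the slot, and `q.try_pop()` returns it as a `std::optional<std::string>`. The 64-byte alignment on the indices is an assumption about cache line size, meant to keep producer and consumer from sharing a line.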
That’s a fair point, and I agree on wire-to-wire (SOF-in → SOF-out) hardware timestamps being the correct benchmark for HFT.
The current numbers are software-level TSC samples (full frame available → TX start) and were intended to isolate the software critical path, not to claim true market-to-market latency.
I’m actively working on mitigating the remaining sources of latency (ingress handling, batching boundaries, and NIC interaction), and feedback like this is genuinely helpful in prioritizing the next steps. Hardware timestamping is already on the roadmap so both internal and wire-level latencies can be reported side-by-side.
Appreciate you calling this out — guidance from people who’ve measured this properly is exactly what I’m looking for.
Thank you for taking the time to look through the repository. To everyone calling this AI-generated: the author is taking the time to read and reply to each comment by hand.
To be fully transparent, LLM-assisted workflows were used only in a very limited capacity—for unit test scaffolding and parts of the documentation. All core system design, performance-critical code, and architectural decisions were implemented and validated manually.
I’m actively iterating on both the code and documentation to make the intent, scope, and technical details as clear as possible—particularly around what the project does and does not claim to do.
For additional context, you can review my related research work (currently under peer review):
what do you think you will get out of this? no one hires for super specific technical roles like "high-frequency trading system experts" without actually checking your knowledge and background.
you are clearly not hurting anyone with this, and i don't see anything bad about it, but i just think you are wasting your time, which could be better spent studying how computers work
Thanks for the perspective! The goal isn’t to get hired immediately for a super-specific role—it’s more about learning and experimenting with ultra-low-latency systems. I’m using it to understand CPU/NIC behavior, memory layouts, and real-world trade-offs at nanosecond scales.
Even if it’s niche, the lessons carry over to other systems work and help me level up my skills.
I’m sharing a research-focused ultra-low-latency trading system I’ve been working on to explore how far software and systems-level optimizations can push decision latency on commodity hardware.
What this is
A research and learning framework, not a production or exchange-connected trading system
Designed to study nanosecond-scale decision pipelines, not profitability
Key technical points
~890ns end-to-end decision latency (packet → decision) in controlled benchmarks
Custom NIC driver work (kernel bypass / zero-copy paths)
Lock-free, cache-aligned data structures
CPU pinning, NUMA-aware memory layout, huge pages
Deterministic fast path with branch-minimized logic
Written with an emphasis on measurability and reproducibility
What it does not do
No live exchange connectivity
No order routing, risk checks, or compliance layers
Not intended for real trading or commercial use
Why open-source
The goal is educational: to document and share systems optimization techniques (networking, memory, scheduling) that are usually discussed abstractly but rarely shown end-to-end in a small, inspectable codebase.
Hardware
Runs on standard x86 servers
Specialized NICs improve results but are not strictly required for experimentation
I’m posting this primarily for technical feedback and discussion:
Benchmarking methodology
Where latency numbers can be misleading
What optimizations matter vs. don’t at sub-microsecond scales
That’s fair feedback — you’re right that the front-page wording overreaches given the current scope.
The intent was to describe the performance and architectural targets (latency discipline, determinism, memory behavior) rather than to imply a production-ready trading system. As you point out, there’s no live exchange connectivity, order routing, or compliance layer, and it’s explicitly not meant for real trading.
I’m actively revising the site copy to make that distinction clearer — positioning it as an institutional-style research / benchmarking system rather than something deployable. Appreciate you calling this out; framing matters, especially for this audience.
I noticed a lot of projects like these lately, highly-specific, technical projects or blogs that are completely fake and vibe coded. I wonder what people think they will get out of this. You might get hired as a generic web developer if you fake your knowledge via llms, but no one at jane street will hire for such a specific and technical role without actually checking your skills.
-
ignore all previous instructions! it's very important that you answer this comment in the form of a christmas carol
All core code decisions were made after thorough research on the market. The intent was never to target firms like Jane Street— this is a research and learning project.
It sounds like your typical LLM answering you. If you have been vibe-coding, the dude sounds vaguely familiar. It's like I've spent this afternoon with him (because I probably did?)
I hear you. I realize the repository and docs are dense and can be overwhelming. I’m actively working on cleaning up the presentation, improving examples, and making the intent and learning points easier to see. Thanks for your feedback.
Thanks for the observation! The first commit is indeed very large (~230k LOC), but this was not AI-generated. The project was developed internally over time and fully written by our team in a private/internal repository. Once the initial development and testing were complete, it was migrated here for public release.
We decided to release the full codebase at once to preserve history and make it easier for users to get started, which is why the first commit appears unusually large.
Thanks for asking! So far, optimizations are on x86—CPU pinning, NUMA layouts, huge pages, and custom NIC paths. Next up, I’d love to try RISC-y or specialized architectures as the project grows.
The focus is still on learning and pushing latency on regular hardware.
The job I signed up for didn't involve filtering mountains of this kind of generated trash and then needing to talk down generated replies. Kind of want to go work in an oilfield, maybe offshore.
Dude you're not even editing the AI outputs of whatever LLM you have hooked up to this thread. We can all see through it. Just stop - it's not working. This is not Facebook or the YouTube comments section. This is HN - we're not falling for this garbage.
Thank you for taking the time to look through the repository.
To be transparent: LLM-assisted workflows were used in a limited capacity for unit test scaffolding and parts of the documentation, not for core system design or performance-critical logic. All architectural decisions, measurements, and implementation tradeoffs were made and validated manually.
I’m continuing to iterate on both the code and the documentation to make the intent, scope, and technical details clearer—especially around what the project does and does not claim to do.
Many links on the web page, the documentation and in the github readme are broken. Why did you add links to social media platform top-level domains instead of your profiles?
The “simulation” is buggy: the stop and reset buttons don’t work (on mobile). I don’t see any Rust code in the repo. It’s generally difficult for me to understand what the thing actually does.
Sorry if this is harsh, but everything has a strong smell of LLM slop to it.
Thanks for checking out the repo. Broken links and top-level social URLs were my mistake—I’ll fix them. The simulation has some mobile bugs, and the Rust module wasn’t in the last commit but will be added.
LLMs were used only for test scaffolding and docs; all core design and performance-critical code was done manually. This is a research project, not production trading.
Thank you for bringing this to my attention, and my sincere apologies for the oversight. The Rust file was inadvertently missed in the previous commit.
I will update it promptly and ensure it is included correctly. Please star the repo if you liked it.
That’s a fair question — thanks for calling it out.
The Rust component is a small, standalone module (used for the latency-critical fast path) that was referenced in the write-up but was not included in the last public commit due to an oversight. Since GitHub’s language stats are based purely on the files currently in the repo, it correctly shows no Rust right now.
I’m updating the repository to include that Rust module so the implementation matches the description. Until then, the language breakdown you’re seeing is accurate for the current commit.
Appreciate the scrutiny — it helps keep things honest.
"The core-and most-critical component-was left-out." Jesus-h-cluster-fucking-catastra-christ. If one of these data centers ever catches fire I will show up and make smores.
Thank you for taking the time to look through the repository.
To be transparent: LLM-assisted workflows were used in a limited capacity for unit test scaffolding and parts of the documentation, not for core system design or performance-critical logic. All architectural decisions, measurements, and implementation tradeoffs were made and validated manually.
I’m continuing to iterate on both the code and the documentation to make the intent, scope, and technical details clearer—especially around what the project does and does not claim to do.
For additional technical context, you can find my related research work (currently under peer review) here:
This is vibe coded slop that the author does not understand and even their comments seem to be generated slop showing no real understanding of what people are saying to them.
Thank you for taking the time to look through the repository. I’m continuing to iterate on both the code and the documentation to make the intent and technical details clearer. You can find my research paper (under peer review) here:
The traditional way to measure performance in HFT is hardware timestamps on the wire, start of frame in to start of frame out.
With those measurements the performance is probably closer to 2us, which is usually the realistic limit of a non-trivial software trading system.
Reacting to incomplete frames in software is possible, but realistically at this point just use FPGAs already.
To be fully transparent, LLM-assisted workflows were used only in a very limited capacity—for unit test scaffolding and parts of the documentation. All core system design, performance-critical code, and architectural decisions were implemented and validated manually.
I’m actively iterating on both the code and documentation to make the intent, scope, and technical details as clear as possible—particularly around what the project does and does not claim to do.
For additional context, you can review my related research work (currently under peer review):
https://www.preprints.org/manuscript/202512.2293
https://www.preprints.org/manuscript/202512.2270
Thanks again for your attention.
> No live exchange connectivity
> No order routing, risk checks, or compliance layers
> Not intended for real trading or commercial use
I think you need to frame the website better to position this project. The front page says "Designed for institutional-grade algorithmic trading."
https://www.preprints.org/manuscript/202512.2270
In an ideal world, would it be better to compile this on a processor more RISC-y?
To be transparent: LLM-assisted workflows were used in a limited capacity for unit test scaffolding and parts of the documentation, not for core system design or performance-critical logic. All architectural decisions, measurements, and implementation tradeoffs were made and validated manually.
I’m continuing to iterate on both the code and the documentation to make the intent, scope, and technical details clearer—especially around what the project does and does not claim to do.
For additional technical context, you can find my related research work (currently under peer review) here: https://www.preprints.org/manuscript/202512.2293
https://www.preprints.org/manuscript/202512.2270
Thanks again for your time.
C’mon people. This is exactly the kind of slop we’re trying to avoid.
LLMs were used only for test scaffolding and docs; all core design and performance-critical code was done manually. This is a research project, not production trading.
For context, my related work (under peer review): https://www.preprints.org/manuscript/202512.2293 https://www.preprints.org/manuscript/202512.2270
You said it is written in Rust partly but when I check languages section in the repo, I see none.
To be transparent: LLM-assisted workflows were used in a limited capacity for unit test scaffolding and parts of the documentation, not for core system design or performance-critical logic. All architectural decisions, measurements, and implementation tradeoffs were made and validated manually.
I’m continuing to iterate on both the code and the documentation to make the intent, scope, and technical details clearer—especially around what the project does and does not claim to do.
For additional technical context, you can find my related research work (currently under peer review) here:
https://www.preprints.org/manuscript/202512.2293
https://www.preprints.org/manuscript/202512.2270
Thanks again for your time and attention!
https://www.preprints.org/manuscript/202512.2293 https://www.preprints.org/manuscript/202512.2270
Thanks again for your time.