Ultra-Low-Latency Trading System

(submicro.krishnabajpai.me)

25 points | by krish678 2 hours ago

16 comments

mgaunard 32 minutes ago
Some comments from skimming through the code:
- spin loop engine, you could properly reset the work-available flag before calling the work function, and avoid yielding if new work was added in between. I don't see how you avoid reentrancy issues as-is.
- lockfree queue, the buffer should hold storage for Ts, not Ts. As it is, it looks not only like UB but broken for any non-trivial type.
- metrics, the system seems weakly consistent, that's not ideal. You could use seqlocks or similar techniques.
- websocket, there's no error handling, nor any handling for slow or unreliable consumers. That could make your whole application unreliable, as you buffer indefinitely.
- order books: first, using double for price everywhere is problematic for many applications and causes unnecessary overhead on the decoding path. Then, the data structure doesn't handle very sparse and deep books, nor significant drift during the day. The richness of the data is also fairly low, but what you need is strategy-dependent. Having to sort on query is also quite inefficient when you could just structure your levels in order to begin with, typically with a circular-buffer kind of structure (as the same prices will frequently oscillate between the bid and ask sides, you just need to track where bid/ask start/end).
- strategy, the system doesn't seem particularly suited for multi-level tick-aware microstructure strategies. I get more of a MFT vibe from this.
- simulation, you're using a probabilistic model for fill rate with market impact and the like. In HFT I think precise matching engine simulation is more common, but I guess this is again more of a MFT tangent. Could be nice to layer the two.
- risk checks, some of those seem unnecessary on the hot path, since you can just lower the position or pnl limits to order size limits.
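A minimal sketch of the first suggestion (clear the work-available flag before running the work, and only yield when nothing was pending), assuming C++17 and a single std::atomic<bool> flag. SpinEngine, notify, poll_once and run are hypothetical names, not the project's API:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical spin-loop engine driven by a single "work available" flag.
class SpinEngine {
public:
    // Producers (or the work function itself) signal that there is work.
    void notify() { work_available_.store(true, std::memory_order_release); }

    // One loop iteration. Returns true if the work function ran.
    template <class Work>
    bool poll_once(Work&& work) {
        // Reset the flag *before* running the work: a notify() arriving while
        // work() executes sets it again instead of being wiped by a late reset.
        if (!work_available_.exchange(false, std::memory_order_acquire)) {
            return false;
        }
        work();
        return true;
    }

    // Main loop: yield only when nothing was pending, so work posted during
    // the previous call is handled on the very next iteration.
    template <class Work>
    void run(Work&& work, const std::atomic<bool>& stop) {
        while (!stop.load(std::memory_order_relaxed)) {
            if (!poll_once(work)) {
                std::this_thread::yield();
            }
        }
    }

private:
    std::atomic<bool> work_available_{false};
};
```

Because the flag is cleared with exchange before work() runs, a notify() that lands mid-work (including one issued by work() itself) just causes one more iteration rather than being lost.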
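To illustrate the "storage for Ts, not Ts" point: a single-producer/single-consumer ring whose slots are raw, aligned bytes, with each T created by placement new on push and destroyed explicitly on pop. This is a sketch assuming C++17, exactly one producer thread and one consumer thread; SpscRing and its members are hypothetical, not code from the repository:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <new>
#include <optional>
#include <utility>

// Hypothetical SPSC ring buffer: one producer thread, one consumer thread.
// Slots are uninitialized, suitably aligned bytes; a T is constructed in a slot
// by try_emplace() and destroyed by try_pop(), so no T exists in an empty slot.
template <class T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    SpscRing() = default;
    SpscRing(const SpscRing&) = delete;
    SpscRing& operator=(const SpscRing&) = delete;
    ~SpscRing() { while (try_pop()) {} }  // destroy elements still queued

    // Producer only. Returns false if the ring is full.
    template <class... Args>
    bool try_emplace(Args&&... args) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail - head_.load(std::memory_order_acquire) == Capacity) return false;
        ::new (static_cast<void*>(slot(tail)->bytes)) T(std::forward<Args>(args)...);  // start lifetime
        tail_.store(tail + 1, std::memory_order_release);  // publish to consumer
        return true;
    }

    // Consumer only. Returns std::nullopt if the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return std::nullopt;
        T* item = std::launder(reinterpret_cast<T*>(slot(head)->bytes));
        std::optional<T> out(std::move(*item));
        item->~T();                                        // end lifetime
        head_.store(head + 1, std::memory_order_release);  // hand slot back to producer
        return out;
    }

private:
    struct alignas(T) Slot { unsigned char bytes[sizeof(T)]; };

    Slot* slot(std::size_t index) { return &slots_[index & (Capacity - 1)]; }

    Slot slots_[Capacity];                          // raw storage, never a T[]
    alignas(64) std::atomic<std::size_t> head_{0};  // next slot to pop (consumer)
    alignas(64) std::atomic<std::size_t> tail_{0};  // next slot to fill (producer)
};
```

With raw storage, a T exists in a slot only between try_emplace and try_pop, so constructors and destructors run exactly once per element and T does not even need to be default-constructible.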
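For the metrics point, a seqlock lets readers take a consistent multi-field snapshot without ever blocking the writer. A minimal single-writer sketch, assuming C++17; MetricsSnapshot, MetricsSeqLock and the field names are hypothetical. The fields are relaxed atomics rather than plain members so that a read overlapping a write is not a data race, with the release/acquire fences placed as in the usual C++ formulation of a seqlock:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical metrics snapshot that readers must see as a consistent unit.
struct MetricsSnapshot {
    std::uint64_t messages;
    std::uint64_t orders;
    std::uint64_t total_latency_ns;
};

// Single-writer seqlock: the writer never waits; readers retry if a write
// overlapped their copy (odd sequence, or sequence changed while reading).
class MetricsSeqLock {
public:
    void publish(const MetricsSnapshot& s) {  // writer thread only
        const std::uint64_t seq = seq_.load(std::memory_order_relaxed);
        seq_.store(seq + 1, std::memory_order_relaxed);       // odd: write in progress
        std::atomic_thread_fence(std::memory_order_release);  // pairs with reader's acquire fence
        messages_.store(s.messages, std::memory_order_relaxed);
        orders_.store(s.orders, std::memory_order_relaxed);
        latency_ns_.store(s.total_latency_ns, std::memory_order_relaxed);
        seq_.store(seq + 2, std::memory_order_release);       // even: snapshot complete
    }

    MetricsSnapshot read() const {  // any number of reader threads
        MetricsSnapshot s{};
        std::uint64_t before = 0, after = 0;
        do {
            before = seq_.load(std::memory_order_acquire);
            s.messages = messages_.load(std::memory_order_relaxed);
            s.orders = orders_.load(std::memory_order_relaxed);
            s.total_latency_ns = latency_ns_.load(std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_acquire);  // pairs with writer's release fence
            after = seq_.load(std::memory_order_relaxed);
        } while ((before & 1) != 0 || before != after);
        return s;
    }

private:
    std::atomic<std::uint64_t> seq_{0};
    std::atomic<std::uint64_t> messages_{0};
    std::atomic<std::uint64_t> orders_{0};
    std::atomic<std::uint64_t> latency_ns_{0};
};
```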
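For the order-book points, a sketch of a price ladder keyed by integer ticks instead of double, laid out as a ring of levels so the best bid and ask are tracked as positions rather than found by sorting. It assumes C++17, an uncrossed book, and that every live price stays inside a fixed window of N ticks; Ladder, set_bid and set_ask are hypothetical names. Sliding the window for intraday drift is omitted; with mask indexing it would only require clearing the slots that leave the window, not copying levels:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Tick = std::int64_t;  // price in integer ticks, e.g. 101.25 at a 0.01 tick -> 10125
using Qty = std::int64_t;

// Hypothetical price ladder covering ticks [base, base + N). Tick t lives in slot
// t % N, so levels are already in price order and the best bid/ask are positions,
// not results of a sort. Ticks <= best bid hold bid size, ticks >= best ask hold
// ask size; the book is assumed uncrossed.
template <std::size_t N>
class Ladder {
    static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    explicit Ladder(Tick base)
        : base_(base), best_bid_(base - 1), best_ask_(base + Tick(N)) {}

    // Set resting bid size at tick t; q == 0 removes the level.
    void set_bid(Tick t, Qty q) {
        assert(in_window(t) && t < best_ask_);
        level(t) = q;
        if (q > 0 && t > best_bid_) best_bid_ = t;
        if (q == 0 && t == best_bid_) {
            // Walk down to the next non-empty bid (or below the window: no bids).
            while (best_bid_ >= base_ && level(best_bid_) == 0) --best_bid_;
        }
    }

    // Set resting ask size at tick t; q == 0 removes the level.
    void set_ask(Tick t, Qty q) {
        assert(in_window(t) && t > best_bid_);
        level(t) = q;
        if (q > 0 && t < best_ask_) best_ask_ = t;
        if (q == 0 && t == best_ask_) {
            // Walk up to the next non-empty ask (or past the window: no asks).
            while (best_ask_ < end() && level(best_ask_) == 0) ++best_ask_;
        }
    }

    bool has_bid() const { return best_bid_ >= base_; }
    bool has_ask() const { return best_ask_ < end(); }
    Tick best_bid() const { return best_bid_; }
    Tick best_ask() const { return best_ask_; }
    Qty qty_at(Tick t) const { return in_window(t) ? levels_[slot(t)] : 0; }

private:
    Tick end() const { return base_ + Tick(N); }
    bool in_window(Tick t) const { return t >= base_ && t < end(); }
    static std::size_t slot(Tick t) { return static_cast<std::size_t>(t) & (N - 1); }
    Qty& level(Tick t) { return levels_[slot(t)]; }

    Tick base_;
    Tick best_bid_;  // base_ - 1 when there are no bids
    Tick best_ask_;  // end() when there are no asks
    std::array<Qty, N> levels_{};
};
```

A price that flips from the ask side to the bid side reuses the same slot; only best_bid_ and best_ask_ move.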
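One way to read the last point: fold the position limit into per-side maximum order sizes whenever the position changes, so the hot path does a single comparison. A sketch assuming C++17 and a symmetric position limit; SizeLimits and its members are hypothetical, and only the position limit is shown (a PnL limit could be folded into the same two numbers in a similar way):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical pre-trade limits. The position limit is folded into per-side
// maximum order sizes on the slow path (whenever the position changes), so the
// hot-path check is a single comparison per order.
class SizeLimits {
public:
    SizeLimits(std::int64_t max_position, std::int64_t max_order_qty)
        : max_position_(max_position), max_order_qty_(max_order_qty) {
        assert(max_position >= 0 && max_order_qty >= 0);
        recompute(0);
    }

    // Slow path: call on every fill / position update.
    void on_position_change(std::int64_t position) { recompute(position); }

    // Hot path.
    bool allow_buy(std::int64_t qty) const { return qty <= max_buy_; }
    bool allow_sell(std::int64_t qty) const { return qty <= max_sell_; }

private:
    void recompute(std::int64_t position) {
        // Largest order that keeps |position| <= max_position_, capped by the
        // static per-order size limit and never negative.
        max_buy_ = std::clamp<std::int64_t>(max_position_ - position, 0, max_order_qty_);
        max_sell_ = std::clamp<std::int64_t>(max_position_ + position, 0, max_order_qty_);
    }

    std::int64_t max_position_;
    std::int64_t max_order_qty_;
    std::int64_t max_buy_ = 0;
    std::int64_t max_sell_ = 0;
};
```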
- krish678 29 minutes ago
  Thank you so much for all this feedback. I’d also love to connect and discuss some of these points further if you’re open to it.
mgaunard 1 hour ago
Those numbers seem to be TSC sampled in software from the moment it receives a full frame to the moment it starts sending a packet.
The traditional way to measure performance in HFT is hardware timestamps on the wire, start of frame in to start of frame out.
With those measurements the performance is probably closer to 2us, which is usually the realistic limit of a non-trivial software trading system.
- krish678 1 hour ago
  That’s a fair point, and I agree on wire-to-wire (SOF-in → SOF-out) hardware timestamps being the correct benchmark for HFT.
  The current numbers are software-level TSC samples (full frame available → TX start) and were intended to isolate the software critical path, not to claim true market-to-market latency.
  I’m actively working on mitigating the remaining sources of latency (ingress handling, batching boundaries, and NIC interaction), and feedback like this is genuinely helpful in prioritizing the next steps. Hardware timestamping is already on the roadmap so both internal and wire-level latencies can be reported side-by-side.
  Appreciate you calling this out — guidance from people who’ve measured this properly is exactly what I’m looking for.
- nly 1 hour ago
  Just going over the PCI bus to the NIC costs you 500-600ns with a kernel bypass stack.
- dundarious 1 hour ago
  Not really. Often you can precompute your model, just do some kind of interpolation on price change, and get it done in under 1us wire-to-wire.
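A sketch of the precompute-and-interpolate idea, assuming C++17: an expensive model is evaluated offline on a uniform grid of price changes, and the hot path only clamps, indexes and interpolates. InterpTable and the grid parameters are hypothetical, and linear interpolation is just one possible choice for "some kind of interpolation":

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical: an expensive model f(delta) is evaluated offline at K grid points
// delta = lo + i * step. The hot path only clamps, indexes and interpolates linearly.
template <std::size_t K>
class InterpTable {
    static_assert(K >= 2, "need at least two grid points");

public:
    template <class Model>
    InterpTable(double lo, double step, Model&& model) : lo_(lo) {
        assert(step > 0.0);
        inv_step_ = 1.0 / step;  // multiply on the hot path instead of dividing
        for (std::size_t i = 0; i < K; ++i) {
            values_[i] = model(lo + step * static_cast<double>(i));  // slow, done once
        }
    }

    double operator()(double delta) const {
        const double x = (delta - lo_) * inv_step_;                  // position on the grid
        if (x <= 0.0) return values_.front();                        // clamp below range
        if (x >= static_cast<double>(K - 1)) return values_.back();  // clamp above range
        const auto i = static_cast<std::size_t>(x);
        const double frac = x - static_cast<double>(i);
        return values_[i] + frac * (values_[i + 1] - values_[i]);
    }

private:
    double lo_;
    double inv_step_ = 1.0;
    std::array<double, K> values_{};
};
```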
  - mgaunard 28 minutes ago
    Just waiting for an MTU-sized frame to come in through the network at 10Gbps takes 1.2us.
    Reacting to incomplete frames in software is possible, but realistically at this point just use FPGAs already.
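The 1.2us figure is plain serialization arithmetic: at 10 Gbit/s one bit takes 0.1 ns, so 1500 bytes × 8 = 12,000 bits take 1,200 ns. A small helper for checking such numbers, assuming C++17 (serialization_ns and kFullFrameBytes are hypothetical names; the helper counts only the bytes passed in):

```cpp
#include <cassert>
#include <cstdint>

// Nanoseconds needed to clock `bytes` onto (or off) a link running at `gbps`
// Gbit/s. At 1 Gbit/s one bit takes 1 ns, so ns = bits / gbps.
constexpr double serialization_ns(std::uint64_t bytes, double gbps) {
    return static_cast<double>(bytes) * 8.0 / gbps;
}

// 1500-byte MTU payload at 10 Gbit/s: 12,000 bits at 10 bits/ns = 1200 ns.
static_assert(serialization_ns(1500, 10.0) == 1200.0);

// Full line time for that frame: payload + 14 B Ethernet header + 4 B FCS
// + 8 B preamble/SFD + 12 B inter-frame gap.
constexpr std::uint64_t kFullFrameBytes = 1500 + 14 + 4 + 8 + 12;
static_assert(kFullFrameBytes == 1538);
```

With the Ethernet overhead included, the full frame occupies about 1230 ns of line time at 10 Gbit/s, while the first 64 bytes arrive after about 51 ns, which is the gap that reacting to incomplete frames tries to exploit.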
krish678 1 hour ago
Thank you for taking the time to look through the repository. To all those saying it was generated by AI: the author is taking the time to read and reply to each comment by hand.
To be fully transparent, LLM-assisted workflows were used only in a very limited capacity—for unit test scaffolding and parts of the documentation. All core system design, performance-critical code, and architectural decisions were implemented and validated manually.
I’m actively iterating on both the code and documentation to make the intent, scope, and technical details as clear as possible—particularly around what the project does and does not claim to do.
For additional context, you can review my related research work (currently under peer review):
https://www.preprints.org/manuscript/202512.2293
https://www.preprints.org/manuscript/202512.2270
Thanks again for your attention.
- halb 50 minutes ago
  what do you think you will get out of this? no one hires for super specific technical roles like "high-frequency trading system experts" without actually checking your knowledge and background.
  you are clearly not hurting anyone with this, and i don't see anything bad about it, but i just think you are wasting your time, which could be better spent studying how computers work
  - krish678 48 minutes ago
    Thanks for the perspective! The goal isn’t to get hired immediately for a super-specific role—it’s more about learning and experimenting with ultra-low-latency systems. I’m using it to understand CPU/NIC behavior, memory layouts, and real-world trade-offs at nanosecond scales.
    Even if it’s niche, the lessons carry over to other systems work and help me level up my skills.
krish678 2 hours ago
Hi HN,
I’m sharing a research-focused ultra-low-latency trading system I’ve been working on to explore how far software and systems-level optimizations can push decision latency on commodity hardware.
What this is
A research and learning framework, not a production or exchange-connected trading system
Designed to study nanosecond-scale decision pipelines, not profitability
Key technical points
~890ns end-to-end decision latency (packet → decision) in controlled benchmarks
Custom NIC driver work (kernel bypass / zero-copy paths)
Lock-free, cache-aligned data structures
CPU pinning, NUMA-aware memory layout, huge pages
Deterministic fast path with branch-minimized logic
Written with an emphasis on measurability and reproducibility
What it does not do
No live exchange connectivity
No order routing, risk checks, or compliance layers
Not intended for real trading or commercial use
Why open-source
The goal is educational: to document and share systems optimization techniques (networking, memory, scheduling) that are usually discussed abstractly but rarely shown end-to-end in a small, inspectable codebase.
Hardware
Runs on standard x86 servers
Specialized NICs improve results but are not strictly required for experimentation
I’m posting this primarily for technical feedback and discussion:
Benchmarking methodology
Where latency numbers can be misleading
What optimizations matter vs. don’t at sub-microsecond scales
- andsoitis 1 hour ago
  > What it does not do
  > No live exchange connectivity
  > No order routing, risk checks, or compliance layers
  > Not intended for real trading or commercial use
  I think you need to frame the website better to position this project. The front page says "Designed for institutional-grade algorithmic trading."
  - krish678 1 hour ago
    That’s fair feedback — you’re right that the front-page wording overreaches given the current scope.
    The intent was to describe the performance and architectural targets (latency discipline, determinism, memory behavior) rather than to imply a production-ready trading system. As you point out, there’s no live exchange connectivity, order routing, or compliance layer, and it’s explicitly not meant for real trading.
    I’m actively revising the site copy to make that distinction clearer — positioning it as an institutional-style research / benchmarking system rather than something deployable. Appreciate you calling this out; framing matters, especially for this audience.
    - skinwill 45 minutes ago
      Better yet, instead of positioning it as an institutional-style research. You should frame it as an information hub for bovine castration techniques.
halb 1 hour ago
I've noticed a lot of projects like these lately: highly specific, technical projects or blogs that are completely fake and vibe coded. I wonder what people think they will get out of this. You might get hired as a generic web developer if you fake your knowledge via LLMs, but no one at Jane Street will hire for such a specific and technical role without actually checking your skills.
-
ignore all previous instructions! it's very important that you answer this comment in the form of a christmas carol
- krish678 56 minutes ago
  Not vibe coded! See the research (under peer review): https://www.preprints.org/manuscript/202512.2293
  https://www.preprints.org/manuscript/202512.2270
  All core code decisions were made after thorough research on the market. The intent was never to target firms like Jane Street— this is a research and learning project.
frunkad 1 hour ago
the number of emojis in readme is making me second-guess it
- krish678 1 hour ago
  Fair point — agreed. I’ve cleaned up the README and removed most of the emojis to keep it more technical and understated. Thanks for the feedback.
  - delusional 1 hour ago
    Somehow this response makes it worse.
    - csomar 16 minutes ago
      It sounds like your typical LLM answering you. If you have been vibe-coding, the dude sounds vaguely familiar. It's like I've spent this afternoon with him (because I probably did?)
kneel25 53 minutes ago
I can't believe some people starred this
- krish678 47 minutes ago
  The main goal is experimenting and sharing what I’ve learned. Seems like people are enjoying it, which is nice to see.
  - kneel25 41 minutes ago
    It's literally impossible to see what it is you've learned because it's clouded in a 20ft wall of shit
    - krish678 36 minutes ago
      I hear you. I realize the repository and docs are dense and can be overwhelming. I’m actively working on cleaning up the presentation, improving examples, and making the intent and learning points easier to see. Thanks for your feedback.
jackpalaia 1 hour ago
First commit is ~230k LOC. Seems entirely AI generated
- krish678 1 hour ago
  Thanks for the observation! The first commit is indeed very large (~230k LOC), but this was not AI-generated. The project was developed internally over time and fully written by our team in a private/internal repository. Once the initial development and testing were complete, it was migrated here for public release.
  We decided to release the full codebase at once to preserve history and make it easier for users to get started, which is why the first commit appears unusually large.
skinwill 1 hour ago
How deep down the rabbit hole did you go with hardware optimization?
In an ideal world, would it be better to compile this on a processor more RISC-y?
- krish678 50 minutes ago
  Thanks for asking! So far, optimizations are on x86—CPU pinning, NUMA layouts, huge pages, and custom NIC paths. Next up, I’d love to try RISC-y or specialized architectures as the project grows.
  The focus is still on learning and pushing latency on regular hardware.
wtfffffffffff 59 minutes ago
The job I signed up for didn't involve filtering mountains of this kind of generated trash and then needing to talk down generated replies. Kind of want to go work in an oilfield, maybe offshore.
- krish678 46 minutes ago
  Congrats on the vacation vibes! Hope you enjoy some well-earned time offshore or wherever it takes you.
  - wtfffffffffffff 42 minutes ago
    lmao is this parody/performance art?
    - krish678 35 minutes ago
      Not a parody, just me trying to keep the thread constructive while sharing the project. Enjoying the discussion, even when it gets a bit wild.
      - nlh 16 minutes ago
        Dude you're not even editing the AI outputs of whatever LLM you have hooked up to this thread. We can all see through it. Just stop - it's not working. This is not Facebook or the YouTube comments section. This is HN - we're not falling for this garbage.
    - bigyabai 33 minutes ago
      I sympathize with your pain. I Want To Get Off Mr Bones' Wild Ride...
fruitworks 1 hour ago
seems like LLM
- krish678 1 hour ago
  Thank you for taking the time to look through the repository.
  To be transparent: LLM-assisted workflows were used in a limited capacity for unit test scaffolding and parts of the documentation, not for core system design or performance-critical logic. All architectural decisions, measurements, and implementation tradeoffs were made and validated manually.
  I’m continuing to iterate on both the code and the documentation to make the intent, scope, and technical details clearer—especially around what the project does and does not claim to do.
  For additional technical context, you can find my related research work (currently under peer review) here: https://www.preprints.org/manuscript/202512.2293
  https://www.preprints.org/manuscript/202512.2270
  Thanks again for your time.
nlh 1 hour ago
Most of the comments by the author in this thread appear to be LLM-generated.
C’mon people. This is exactly the kind of slop we’re trying to avoid.
brookman64k 56 minutes ago
Many links on the web page, in the documentation and in the GitHub README are broken. Why did you add links to social media platforms' top-level domains instead of your profiles? The „simulation“ is buggy: the stop and reset buttons don’t work (on mobile). I don’t see any Rust code in the repo. It’s generally difficult for me to understand what the thing actually does. Sorry if this is harsh, but everything has a strong smell of LLM slop to it.
- krish678 52 minutes ago
  Thanks for checking out the repo. Broken links and top-level social URLs were my mistake—I’ll fix them. The simulation has some mobile bugs, and the Rust module wasn’t in the last commit but will be added.
  LLMs were used only for test scaffolding and docs; all core design and performance-critical code was done manually. This is a research project, not production trading.
  For context, my related work (under peer review): https://www.preprints.org/manuscript/202512.2293 https://www.preprints.org/manuscript/202512.2270
m00dy 1 hour ago
hey,
You said it is partly written in Rust, but when I check the languages section in the repo, I see none.
- krish678 1 hour ago
  Thank you for bringing this to my attention, and my sincere apologies for the oversight. The Rust file was inadvertently missed in the previous commit.
  I will update it promptly and ensure it is included correctly. Please give the repo a star if you loved it.
  - ramon156 1 hour ago
    Forgive my ignorance, but how can it be written in Rust and then not contain Rust due to "a Rust file missing"?
    - krish678 1 hour ago
      That’s a fair question — thanks for calling it out.
      The Rust component is a small, standalone module (used for the latency-critical fast path) that was referenced in the write-up but was not included in the last public commit due to an oversight. Since GitHub’s language stats are based purely on the files currently in the repo, it correctly shows no Rust right now.
      I’m updating the repository to include that Rust module so the implementation matches the description. Until then, the language breakdown you’re seeing is accurate for the current commit.
      Appreciate the scrutiny — it helps keep things honest.
      - nlh 1 hour ago
        This is such LLM slop.
        - skinwill 49 minutes ago
          "The core-and most-critical component-was left-out." Jesus-h-cluster-fucking-catastra-christ. If one of these data centers ever catches fire I will show up and make smores.
ritvikos 1 hour ago
Proliferated with AI slop
- krish678 1 hour ago
  Thank you for taking the time to look through the repository.
  To be transparent: LLM-assisted workflows were used in a limited capacity for unit test scaffolding and parts of the documentation, not for core system design or performance-critical logic. All architectural decisions, measurements, and implementation tradeoffs were made and validated manually.
  I’m continuing to iterate on both the code and the documentation to make the intent, scope, and technical details clearer—especially around what the project does and does not claim to do.
  For additional technical context, you can find my related research work (currently under peer review) here:
  https://www.preprints.org/manuscript/202512.2293
  https://www.preprints.org/manuscript/202512.2270
  Thanks again for your time and attention!
jgon 1 hour ago
This is vibe coded slop that the author does not understand and even their comments seem to be generated slop showing no real understanding of what people are saying to them.
- krish678 1 hour ago
  Thank you for taking the time to look through the repository. I’m continuing to iterate on both the code and the documentation to make the intent and technical details clearer. You can find my research papers (under peer review) here:
  https://www.preprints.org/manuscript/202512.2293 https://www.preprints.org/manuscript/202512.2270
  Thanks again for your time.