Reliable 25 Gigabit Ethernet via Thunderbolt

(kohlschuetter.github.io)

81 points | by kohlschuetter 4 days ago

10 comments

  • throwaway2037 51 minutes ago
    This is an outstanding blog post. Initially, the title did little to captivate me, but the blog post was so well written that I got nerd-sniped. Who knew this little adapter was so fascinating! I wonder if the manufacturer is buying the Mellanox cards used from data center tear-downs. The author claims they can be had for only 20 USD online. That seems too good to be true!

    Small thing: I just checked Amazon.com: https://www.amazon.com/s?k=thunderbolt+25G&crid=2RHL4ZJL96Z9...

    I cannot find anything for less than 285 USD. The blog post gave a price of 174 USD. I have no reason to disbelieve the author, but a bummer to see the current price is 110 USD more!

    • geerlingguy 6 minutes ago
      I saw the blog post last week and immediately bought the last one on that Amazon listing for the original price... hopefully they restock soon!

      I'm going to try a couple other fan assisted cooling options, as I'd like to keep the setup reasonably compact.

      I just ran fiber to my desk and I have a more expensive QNAP unit that does 10G SFP+, but this will let me max out the connection to my NAS.

    • kohlschuetter 21 minutes ago
      Thank you!

      I think, tragically, the blog post has caused this price increase.

      The offers on Amazon are most likely all drop shippers trying to gauge a price that works for them.

      You might have better luck ordering directly from China for a fraction of the price: https://detail.1688.com/offer/836680468489.html

  • omgtehlion 1 hour ago
    Ha! Been running these for years on both linux and windows (on lenovo x1 laptops). Using cheap chinese thunderbolt-to-nvme adapters + nvme-to-pcie boards + mellanox cx4 cards (recently got one cx5 and a solarflare x2).

    Pic of a previous cx3 (10 gig on tb3) setup: https://habrastorage.org/r/w780/getpro/habr/upload_files/d3c...

    10gig can saturate at full speed; 25G, in my experience, rarely reaches the same 20G that the author observed.

    • kasey_junk 11 minutes ago
      If you don’t mind me asking, what are you using these for? Saturating these seems like it would have reasonably few workloads outside of like cdn or multi-tenant scenarios. Curious what my lack of imagination is hiding here.
      • geerlingguy 4 minutes ago
        I do media production, and sometimes move giant files (like ggufs) around my network, so 25 Gbps is more useful than 10 Gbps, if it's not too expensive.
  • zokier 3 hours ago
    Note that you can do point-to-point network links directly with thunderbolt (and usb4).

    https://support.apple.com/guide/mac-help/ip-thunderbolt-conn... etc

    • kohlschuetter 2 hours ago
      Yes! However, I got around 15 Gbps with a Thunderbolt-only setup (TB3/TB4) = only 75% of the Ethernet setup.

      You'd also mostly be limited to short cables (1-2m) and a ring topology.

  • consp 2 hours ago
    I'm surprised you are only getting 20gbit/s. I did not expect PCIe to be the limiting factor here. I've got a 100gbit cx4 card currently in a PCIe3 X4 slot (for reasons, don't judge) and it easily maxes that out. I would have expected the 25g cx4 cards to be at least able to get everything out of it. RDMA is required to achieve that in a useful way though.

    Edit: forgot it isn't "true" PCIe but tunneled.

    • kohlschuetter 2 hours ago
      The limitation is Thunderbolt (32 Gbps theoretical limit for PCIe 3 tunneling).
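
      A rough sketch of the arithmetic behind that 32 Gbps figure, assuming the tunneled link behaves like a plain PCIe 3.0 x4 link (8 GT/s per lane, 128b/130b encoding). The function name and table are illustrative only, and packet-level protocol overhead, which lowers real throughput further, is not modeled:

```python
# Per-lane line rates in GT/s (gigatransfers per second).
PCIE_GT_PER_LANE = {3: 8.0, 4: 16.0}

# PCIe 3.0 and 4.0 both use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def pcie_link_gbps(generation: int, lanes: int) -> float:
    """Link bandwidth in Gbit/s after line encoding, before protocol overhead."""
    return PCIE_GT_PER_LANE[generation] * lanes * ENCODING_EFFICIENCY


gen3_x4 = pcie_link_gbps(3, 4)
print(f"PCIe 3.0 x4 after encoding: {gen3_x4:.1f} Gbit/s")
print(f"Headroom over 25 GbE line rate: {gen3_x4 - 25.0:.1f} Gbit/s")
```

      The raw 8 GT/s x 4 lanes gives the 32 Gbps quoted above; encoding alone trims it to roughly 31.5 Gbit/s, so the gap to the observed ~20 Gbps has to come from other overheads.
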
  • userbinator 3 hours ago
    Thunderbolt is basically external PCIe, so this is not so surprising. High speed NICs do consume a relatively large amount of power. I have a feeling I've seen that logo on the board before.
    • kohlschuetter 3 hours ago
      I don't know how to measure the direct power impact on a MacBook Pro (since it's got a battery), but the typical power consumption of these cards is 9 W, not much more than Aquantia 10 GBit cards.

      Also, if you remember where you saw that logo, please let me know!

      • usagisushi 40 minutes ago
        JFYI, for measuring power draw, you might be able to use `macmon`[0] to see the total system power consumption. The values reported by the internal current sensor seem to be quite accurate.

        [0] https://github.com/vladkens/macmon

        • kohlschuetter 5 minutes ago
          Very nice tip, thank you!

          I measure around +11W idle. While running a speed test, I read ca. +15W.

        • usagisushi 17 minutes ago
          Speaking of hardware, the RTL8159 (10Gbps) hit the market late last year and is said to consume only about 2–3W. It apparently runs very cool compared to older chips. (Though it would need to be bonded to reach 25Gbps ;-)
      • consp 2 hours ago
        Plus 1-2.5w per active cable. You need the heatsinks as the cx4 cards expect active airflow, and active transceivers as well.

        I have a 10gbit dual port card in a Lenovo mini pc. There is no normal way to get any heat out of there so I put a 12v small radial fan in there as support. It works great at 5v: silent and cool. It is a fan though so might not suit your purpose.

    • xattt 3 hours ago
      The PCI-E logo or the “octopus in a chip” logo? I’m more interested in the latter.
  • Nextgrid 3 hours ago
    Neat, but the thermal design is absolutely terrible. Sticking that heatsink inside the aluminum case without any air circulation is awful.
    • kohlschuetter 3 hours ago
      Yeah, it's because the network card adapter's heatsink is sandwiched between two PCBs. Not great, not terrible, works for me.

      The placement is mostly determined by the design of the OCP 2.0 connector. OCP 3.0 has a connector at the short edge of the card, which allows exposing/extending the heat sink directly to the outer case.

      If somebody has the talent, designing a Thunderbolt 5 adapter for OCP 3.0 cards could be a worthwhile project.

      • Nextgrid 2 hours ago
        A Flex PCB connecting the OCP2 connector would allow putting the converter board behind the NIC board, so the NIC board could be exposed to the aluminum case and use the case itself as a heatsink (this would need a split case so the NIC board can be screwed to one side of the case, pressing the main chip against it via a thermal pad).

        As a stop-gap, I'd see if there was any way to get airflow into the case - I'd expect even a tiny fan would do much more than those two large heatsinks stuck onto the case (since the case itself has no thermal connection to the chip heatsink).

        • kohlschuetter 2 hours ago
          My goal was to get a fanless setup (for a quiet office).

          If that's not a requirement just get the Raiden Digit Light One, which does have a fan (and otherwise the same network card).

          If I could design an adapter PCB myself, I would go straight to OCP 3.0, which allows for a much simpler construction, and TB5 speeds.

          Alternatively, there are DELL CX422A rNDC cards (R887V) that appear to have an OCP 2.0 connector but a better heatsink design.

      • consp 2 hours ago
        I'd be more worried about cooling the transceivers properly.
        • kohlschuetter 2 hours ago
          My optical transceiver gets to around 52 °C (measured via IR camera), well below its design limit, so that's not bad.

          If truly concerned, one could use SFP28 to SFP28 cage adapters to have the heat outside the case, and slap on some extra heatsinks there.

  • project2501a 37 minutes ago
    nitpicking but why would someone type `sudo su` vs `sudo -i`
    • Bjartr 18 minutes ago
      I've mostly only ever seen `sudo su` in tutorials, so someone who's only familiar with the command through those is one possible reason why.
  • madduci 1 hour ago
    I still have issues under Linux (Kernel 6.14) and Thunderbolt 4 docking stations. They simply don't get recognised.

    But this is a cool solution

    • kohlschuetter 1 hour ago
      Thanks! Have you tried the boltctl/rescan setup I mentioned in the post? It should get you going, as long as your Thunderbolt/USB4 setup is correct.

      If you're using an adapter card to add Thunderbolt functionality, then your mainboard needs to support that, and the card must be connected to a PCIe bus that's wired to the Intel PCH, not to the CPU.

      • madduci 1 hour ago
        Yes, rescan, re-enroll too. But it still shows as disconnected. I don't know if the firmware is completely incompatible, but it is weird that it works under Windows and doesn't under Linux
        • kohlschuetter 1 hour ago
          Disconnected as in "network"? What PCIe card do you use? Can you update the firmware (maybe from Windows)?

          Also check the BIOS settings (try setting TB security to "No Security" or "User Authorization")

          Some OEM Mellanox cards can be cross-flashed to NVIDIA's stock firmware, maybe that's also relevant.

  • otterpro 2 hours ago
    > reduces temperatures by at least 15 Kelvin, bringing the ambient enclosure temperature below 40 °C,

    I had to do a double-take when it mentioned Kelvin since that is physically impossible.

    • maratc 1 hour ago
      Isn't "reduces temperatures by 15 Kelvin" the same as "reduces temperatures by 15 Celsius"?
    • sigio 1 hour ago
      reduces temperatures by at least 15 Kelvin == the same as reduces temperatures by at least 15 Celsius.

      It 'reduces it by' ... not reduces it TO
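
      A minimal sketch of why the two statements agree: the Kelvin and Celsius scales differ only by a constant offset, which cancels in a difference. The 55 °C and 40 °C readings are hypothetical, picked only so that the drop is 15 degrees:

```python
KELVIN_OFFSET = 273.15  # 0 °C expressed in kelvin


def celsius_to_kelvin(celsius: float) -> float:
    """Convert an absolute temperature from °C to K."""
    return celsius + KELVIN_OFFSET


def drop_in_both_units(before_c: float, after_c: float) -> tuple[float, float]:
    """Return the temperature drop computed in °C and again in K."""
    drop_c = before_c - after_c
    drop_k = celsius_to_kelvin(before_c) - celsius_to_kelvin(after_c)
    return drop_c, drop_k


# Hypothetical before/after readings in °C.
drop_c, drop_k = drop_in_both_units(55.0, 40.0)
print(f"drop: {drop_c:.1f} °C, {drop_k:.1f} K")
```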

  • cs02rm0 4 hours ago
    Now I just have to contrive the circumstances where this is useful to me. :)
    • mcny 4 hours ago
      I don't know about the Ethernet part but it bothers me that even wifi has become faster than the wired USB port on our phones.

      All I want to do is copy over all the photos and videos from my phone to my computer but I have to baby sit the process and think whether I want to skip or retry a failed copy. And it is so slow. USB 2.0 slow. I guess everybody has given up on the idea of saving their photos and videos over USB?

      • rbanffy 33 minutes ago
        If the photos on the phone are visible as files on a mounted filesystem, you can use rsync to copy them. If the connection drops but recovers by itself, you can put rsync inside a while true loop until it’s doing nothing.

        I’m using Dropbox for syncing photos from phone to Linux laptop, and mounting the SDcard locally for cameras, so this is a guess.
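
        A minimal sketch of that retry loop in Python, assuming rsync is installed and the phone's photos are reachable as a mounted path. It stops when rsync exits with status 0 rather than when it has nothing left to copy, which is a slightly different stopping rule from the one described above. The helper names are made up for illustration, and the paths in the usage comment are placeholders:

```python
import subprocess
import time
from typing import Callable


def run_rsync(source: str, destination: str) -> int:
    """Run rsync once and return its exit code (0 means success)."""
    # --partial keeps partially transferred files so a retry can resume them.
    command = ["rsync", "--archive", "--partial", source, destination]
    return subprocess.run(command).returncode


def sync_until_done(
    run_once: Callable[[], int],
    max_attempts: int = 20,
    delay_seconds: float = 5.0,
) -> int:
    """Call run_once until it returns 0; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if run_once() == 0:
            return attempt
        if attempt < max_attempts:
            # Give a dropped connection some time to come back.
            time.sleep(delay_seconds)
    raise RuntimeError(f"sync still failing after {max_attempts} attempts")


# Usage (placeholder paths, adjust to wherever the phone is mounted):
# sync_until_done(lambda: run_rsync("/mnt/phone/DCIM/", "/home/me/Pictures/phone/"))
```

        Bounding the number of attempts, instead of a bare `while True`, keeps the loop from spinning forever if the phone is unplugged for good.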

      • diogocp 3 hours ago
        > USB 2.0 slow

        Many phones indeed only support USB 2.0. For example the base iPhone 17. The Pro does support USB 3.2, however.

        > I guess everybody has given up on the idea of saving their photos and videos over USB?

        Correct.

      • jacquesm 3 hours ago
        Wifi is fast but the latency is terrible and the reliability is even worse. It can go up and down like a yo-yo. USB is far more predictable even if it is a bit slower.
        • rbanffy 39 minutes ago
          I have a cluster of 4 RPi Zero Ws and network reliability is not great. Since it is for the chaos, it’s fine, but it’s very common to have a node be offline at any given time.

          Even worse, the control plane is exposed, but for something that runs 3 Hercules mainframe emulations and two Altairs with MP/M, it’s fine.

      • drawfloat 2 hours ago
        I feel like this is an artifact from the late 2010s when the talk was of removing the port completely from phones, where that was being touted alongside swapping speakers with haptic screen audio as a way to make them completely waterproof.

        As wireless charging never quite reached the level hoped – see AirPower – and Google/Apple seemingly bought and never did anything with a bunch of haptic audio startups, I figure that idea died....but they never cared enough to make sure the USB port remained top end.

      • ranguna 3 hours ago
        Why don't you get a phone with 3.0+ USB?

        My last two phones in the last 4 years had at least USB 3.1

      • walterbell 3 hours ago
        > given up on the idea of saving their photos and videos over USB?

        Until USB has monthly service business to compete with cloud storage revenue.

      • cirrusfan 3 hours ago
        > but I have to baby sit the process and think whether I want to skip or retry a failed copy

        Do you import originals or do you have the "most compatible" setting turned on?

        I always assumed apple simply hated people that use windows/linux desktops so the occasional broken file was caused by the driver being sort-of working and if people complain, well, they can fuck off and pay for icloud or a mac. After upgrading to 15 pro which has 10 gbps usb-c it still took forever to import photos and the occasional broken photos kept happening, and after some research it turns out that the speed was limited by the phone converting the .heic originals into .jpg when transferring to a desktop. Not only does it limit the speed, it also degrades the quality of the photos and deletes a bunch of metadata.

        After changing the setting to export original files the transfer is much faster and I haven’t had a single broken file / video. The files are also higher quality and lower filesize, although .heic is fairly computationally-demanding.

        Idk about Android but I suspect it might have a similar behavior

    • consp 2 hours ago
      I recently did a complete disk backup/clone which only took 15 minutes instead of hours. Maxed the SSD which was being backed up at about 2.5GB/s.
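
      For scale, a back-of-the-envelope sketch of those numbers, assuming decimal units (1 GB = 8 Gbit) and, as an upper bound, that the 2.5 GB/s rate was sustained for the whole 15 minutes, which the comment does not actually say:

```python
def gigabytes_to_gigabits_per_s(gigabytes_per_s: float) -> float:
    """Convert a throughput from GB/s to Gbit/s (decimal units)."""
    return gigabytes_per_s * 8


def gigabytes_moved(gigabytes_per_s: float, minutes: float) -> float:
    """Data moved at a constant rate over the given number of minutes."""
    return gigabytes_per_s * minutes * 60


rate_gbit = gigabytes_to_gigabits_per_s(2.5)
total_gb = gigabytes_moved(2.5, 15)
print(f"{rate_gbit:.0f} Gbit/s on the wire, at most {total_gb:.0f} GB in 15 minutes")
```

      On those assumptions the SSD rate works out to about 20 Gbit/s, in the same range as the throughput discussed elsewhere in the thread, and twice what a 10 GbE link could carry.
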
    • rbanffy 3 hours ago
      Wouldn’t this be useful for clustering Macs over TB5? Wasn’t the maximum bandwidth over USB-cables 5Gbps? With a switch, you could cluster more than just 4 Mac Studios and have a couple terabytes for very large models to work with.
      • kohlschuetter 3 hours ago
        I was hoping somebody would suggest that (and eventually try it out).

        With TB5, and deep pockets, you could probably also benchmark it against a setup with dedicated TB5 enclosures (e.g., Mercury Helios 5S).

        TB5 has PCIe 4.0 x4 instead of PCIe 3.0 x4 -- that should give you 50 GbE half-duplex instead of 25 GbE. You would need a different network card though (ConnectX-5, for example).

        Pragmatically though, you could also aggregate (bond) multiple 25 GbE network card ports (with Mac Studio, you have up to 6 Thunderbolt buses, so more than enough to saturate a 100GbE connection).
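
        A quick sketch of the port count for that bonding idea, under the optimistic assumption that bonded throughput adds up linearly (in practice, traffic is often distributed per flow, so a single stream may not exceed one link). The function name is illustrative; the 20 Gbit/s case uses the per-link throughput reported in the thread instead of the nominal 25:

```python
import math

THUNDERBOLT_BUSES = 6  # "up to 6 Thunderbolt buses" on a Mac Studio, per the comment above


def links_needed(target_gbps: float, per_link_gbps: float) -> int:
    """Smallest number of links whose combined rate reaches the target."""
    return math.ceil(target_gbps / per_link_gbps)


for per_link in (25.0, 20.0):
    count = links_needed(100.0, per_link)
    fits = "fits" if count <= THUNDERBOLT_BUSES else "does not fit"
    print(f"{per_link:.0f} Gbit/s per link: {count} links, {fits} in {THUNDERBOLT_BUSES} buses")
```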

        • rbanffy 2 hours ago
          Too bad Jeff Geerling returned his Mac Studios to Apple. Would be lovely to see how 5x faster RDMA impacts the performance.
    • notrustincloud 2 hours ago
      rsync...grsync...a solution for broken partial batch transfers since forever
    • sschueller 2 hours ago
      Would be useful if I had to debug my internet link and I only had a laptop.
    • kohlschuetter 3 hours ago
      Remote Time Machine backups are snappier than ever before :)
    • e40 4 hours ago
      Porn?
      • leosanchez 4 hours ago
        What kind of porn requires 25 gigabits ?
        • modderation 1 hour ago
          As a guess, large-scale volumetric or photogrammetric "datasets" could be difficult to stream over lesser interconnects.
        • LeoPanthera 4 hours ago
          A lot of porn.