NIST was 5 μs off UTC after last week's power cut

(jeffgeerling.com)

137 points | by jtokoph 5 hours ago

13 comments

  • semenko 1 hour ago
    I found the most interesting part of the NIST outage post [1] is NIST's special Time Over Fiber (TOF) program [2] that "provides high-precision time transfer by other service arrangements; some direct fiber-optic links were affected and users will be contacted separately."

    I've never heard of this! Very cool service, presumably for … quant / HFT / finance firms (maybe for compliance with FINRA Rule 4590 [3])? Telecom providers synchronizing 5G clocks for time-division duplexing [4]? Google/hyperscalers as input to Spanner or other global databases?

    Seriously fascinating to me -- who would be a commercial consumer of NIST TOF?

    [1] https://groups.google.com/a/list.nist.gov/g/internet-time-se...

    [2] https://www.nist.gov/pml/time-and-frequency-division/time-se...

    [3] https://www.finra.org/rules-guidance/rulebooks/finra-rules/4...

    [4] https://www.ericsson.com/en/blog/2019/8/what-you-need-to-kno...

    • dmurray 1 hour ago
      I never saw a need for this in HFT. In my experience, GPS was used instead, but there was never any critical need for microsecond accuracy in live systems. Sub-microsecond latency, yes, but when that mattered it was in order to do something as soon as possible rather than as close as possible to Wall Clock Time X.

      Still useful for post-trade analysis; perhaps you can determine that a competitor now has a faster connection than you.

      The regulatory requirement you linked (and other typical requirements from regulators) allows a tolerance of one second, so it doesn't call for this kind of technology.

      • blibble 1 hour ago
        > I never saw a need for this in HFT. In my experience, GPS was used instead, but there was never any critical need for microsecond accuracy in live systems.

        mifid ii (uk/eu) minimum is 1us granularity

        https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv:...

        • dmurray 39 minutes ago
          It's 1 us granularity, which means you should report your timestamps with six figures after the decimal point.

          The required accuracy (Tables 1 and 2 in that document) is 100 us or 1000 us depending on the system.

          • blibble 9 minutes ago
            > The required accuracy (Tables 1 and 2 in that document)

            no, Tables 1 and 2 say divergence, not accuracy

            accuracy is a mix of both granularity and divergence

            regardless, your statement before:

            > The regulatory requirement you linked (and other typical requirements from regulators) allows a tolerance of one second, so it doesn't call for this kind of technology.

            is not true

    • bob1029 11 minutes ago
      > a commercial consumer

      Where does it say these are commercial consumers?

      https://en.wikipedia.org/wiki/Schriever_Space_Force_Base#Rol...

      > Building 400 at Schriever SFB is the main control point for the Global Positioning System (GPS).

    • goalieca 1 hour ago
      My guess would be scientific experiments where they need to correlate or sequence data over large regions. Things like correlating gravitational waves with radio signals and gamma ray bursts.
    • mmaunder 58 minutes ago
      SIGINT as a source clock for others in a network doing super accurate TDOA for example.
  • loph 58 minutes ago
    Only Boulder servers lost sync.

    To say NIST was off is clickbait hyperbole.

    This page: https://tf.nist.gov/tf-cgi/servers.cgi shows that NIST has >16 NTP servers on IPv4; of those, 5 are in Boulder and were affected by the power failure. The rest were fine.

    However, most entities should not be using these top-level servers anyway, so this should have been a problem for exactly nobody.

    IMHO, most applications should use pool.ntp.org

  • ziml77 1 hour ago
    Nitpick: UTC stands for Coordinated Universal Time. The ordering of the letters was chosen to not match the English or the French names so neither language got preference.
  • ComputerGuru 13 minutes ago
    Not exactly the topic of discussion but also not not on topic: just wanted to sing praise for chrony, which has performed better than the traditional OS-native NTP clients in our testing on a myriad of real and virtualized hardware.
    • steve1977 11 minutes ago
      Chrony is the default already in some distros (RHEL and SLES that I know of), probably for this very reason.
  • Topgamer7 1 hour ago
    Out of curiosity, can anyone say the most impactful things they've needed incredibly accurate time for?
  • politelemon 4 hours ago
    I'm missing the nuance or perhaps the difference between the first scenario where sending inaccurate time was worse than sending no time, versus the present where they are sending inaccurate time. Sorry if it's obvious.
    • opello 3 hours ago
      The 5 μs inaccuracy is basically irrelevant to NTP users. From the second update to the Internet Time Service mailing list [1]:

      To put a deviation of a few microseconds in context, the NIST time scale usually performs about five thousand times better than this at the nanosecond scale by composing a special statistical average of many clocks. Such precision is important for scientific applications, telecommunications, critical infrastructure, and integrity monitoring of positioning systems. But this precision is not achievable with time transfer over the public Internet; uncertainties on the order of 1 millisecond (one thousandth of one second) are more typical due to asymmetry and fluctuations in packet delay.

      [1] https://groups.google.com/a/list.nist.gov/g/internet-time-se...
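
      That ~1 ms floor comes straight out of the NTP on-wire math. A minimal sketch of the standard four-timestamp offset/delay calculation (RFC 5905 style; all timestamps below are made up for illustration) shows how path asymmetry, which NTP cannot observe, turns directly into clock error:

```python
# Standard NTP offset/delay calculation from the four on-wire timestamps.
# t1: client send, t2: server receive, t3: server send, t4: client receive.
def ntp_offset_delay(t1, t2, t3, t4):
    offset = ((t2 - t1) + (t3 - t4)) / 2  # assumes a symmetric path delay
    delay = (t4 - t1) - (t3 - t2)         # total round-trip time on the wire
    return offset, delay

# Symmetric path, 5 ms each way, clocks actually in sync: offset comes out ~0.
o_sym, d_sym = ntp_offset_delay(0.000, 0.005, 0.006, 0.011)

# Asymmetric path, 2 ms out / 8 ms back: a phantom 3 ms offset appears even
# though the clocks still agree. The round-trip delay looks identical, so
# NTP has no way to detect the asymmetry.
o_asym, d_asym = ntp_offset_delay(0.000, 0.002, 0.003, 0.011)
print(o_sym, d_sym, o_asym, d_asym)
```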

      • zahlman 2 hours ago
        > Such precision is important for scientific applications, telecommunications, critical infrastructure, and integrity monitoring of positioning systems. But this precision is not achievable with time transfer over the public Internet

        How do those other applications obtain the precise value they need without encountering the Internet issue?

        • throw0101d 2 hours ago
          > How do those other applications obtain the precise value they need without encountering the Internet issue?

          They do not use the Internet: they use local GPS clocks with internal high-precision oscillators for holdover in case the GNSS signal is unavailable:

          * https://www.ntp.org/support/vendorlinks/

          * https://www.meinbergglobal.com/english/products/ntp-time-ser...

          * https://syncworks.com/shop/syncserver-s650-rubidium-090-1520...

          * https://telnetnetworks.ca/solutions/precision-time/

          • herpderperator 1 hour ago
            If those other applications use their own local GPS clocks, what is the significance of NIST (and the 5μs inaccuracy) in their scenario?
          • lysace 2 hours ago
            TIL/remembered GNSS satellites have onboard atomic clocks. Makes a lot of sense, but still pretty cool. Something like this, I guess?

            https://en.wikipedia.org/wiki/Rubidium_standard

            • CamperBob2 1 hour ago
              Yes, either Rb, Cs, or H standards depending on which GNSS system you're using.

              For the most critical applications, you can license a system like Fugro AtomiChron that provides enhanced GNSS timing down to the level of a few nanoseconds. There are a couple of products that do similar things, all based on providing better ephemerides than your receiver can obtain from the satellites themselves.

              You can get AtomiChron as an optional subscription with the SparkPNT GPSDO, for instance (https://www.sparkfun.com/sparkpnt-gnss-disciplined-oscillato...).

        • geerlingguy 2 hours ago
          A lot of organizations also colocate timing equipment near the actual clocks, and then have 'dark fiber' between their equipment and the main clock signals.

          Then they disperse and use the time as needed.

          According to jrronimo, one site even spliced fiber directly between machines because couplers were causing problems! [1]

          [1] https://news.ycombinator.com/item?id=46336755

          • vasco 2 hours ago
            If I put my machine near the main clock signal, I have one clock signal to read from. The comment above was asking how to average across many different clocks, presumably in different places around the globe. Unless there's one physical location housing all the clocks you're averaging, you're close to one and far from all the others, so how is it done without the internet?
        • LeoPanthera 2 hours ago
          If you must use the internet, PTP gets closer.

          Alternate sources include the GPS signal, and the WWVB radio signal, which has a 60kHz carrier wave accurate to less than 1 part in 10^12.

          • aftbit 2 hours ago
            Can you do PTP over the internet? I have only seen it in internal environments. GPS is probably the best solution for external users to get time signals with sub-µs uncertainties.
    • BuildTheRobots 3 hours ago
      It's a good question, and I wondered the same. I don't know, but I'd postulate:

      As it stands, the clocks are a mere 5 microseconds out and will slowly converge back over time. That's below what most users could even measure over the internet, so they know it's not going to have a major effect on anything.

      When the event started and they lost power and access to the site, they also lost management access to the clocks. At that point they didn't know how wrong the clocks were, or how much more wrong they were going to get.

      If someone restores power to the campus, the clocks are going to be online (all the switches and routers connecting them to the internet suddenly boot up) before they've had a chance to get admin control back. If something happened while they were offline and the clocks drifted significantly, then when they came back online half the world might decide to believe them and suddenly step-change to follow them. This could cause absolute havoc.

      Potentially safer to scram something than have it come back online in an unknown state, especially if (lots of) other things are going to react to it.

      In the last NIST post, someone linked to The Time Rift of 2100: How We lost the Future --- and Gained the Past. It's a short story that highlights some of the dangers of fractured time in a world that uses high precision timing to let things talk to each other: https://tech.slashdot.org/comments.pl?sid=7132077&cid=493082...

    • throw0101d 2 hours ago
      > […] where sending inaccurate time was worse than sending no time […]

      When you ask a question, it is sometimes better to not get an answer—and know you have not-gotten an answer—than to get the wrong answer. If you know that a 'bad' situation has arisen, you can start contingency measures to deal with it.

      If you have a fire alarm: would you rather have it fail in such a way that it gives no answer, or fail in a way where it says "things are okay" even if it doesn't know?

  • mmmlinux 2 hours ago
    Are there any plans being made to prevent this happening in the future?
    • kibwen 2 hours ago
      Yes, the US government is banning all those democrat windmills that conspired to blow over the NTP server.
  • V__ 3 hours ago
    Has anyone here ever needed microsecond precision? Would love to hear about it.
    • Aromasin 2 hours ago
      I worked at Altera (FPGA supplier) as the Ethernet IP apps engineer for Europe for a few years. All the big telecoms (Nokia, Ericsson, Cisco, etc.) use Precision Time Protocol (PTP) in some capacity, and all required clock accuracy at the ns level, sometimes as low as 10 ns at the boundary. Any imperfection in the local clock directly converts into timestamp error, and timestamp error is what limits PTP synchronization performance: timestamps are the fundamental observable in PTP. Quantization and jitter create irreducible timestamp noise, which directly limits offset and delay estimation. Errors accumulate across network elements, so internal clock error must be much smaller than the system requirement.

      I think most people would look at the error and think "what's the big deal", but all the telecom customers would be scrambling to find a clock that hadn't fallen out of sync.
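
      A toy model of that "timestamp noise becomes offset error" point, using the standard IEEE 1588 two-way exchange (the 8 ns granularity below is an assumed, illustrative figure, not a vendor spec):

```python
import random

# IEEE 1588 offset estimate from the two-way exchange:
# t1 = master Sync send, t2 = slave receive,
# t3 = slave Delay_Req send, t4 = master receive.
def ptp_offset(t1, t2, t3, t4):
    return ((t2 - t1) - (t4 - t3)) / 2

random.seed(0)
TICK = 8e-9  # assumed 8 ns timestamping granularity (illustrative)

# True offset is zero and the 1 us path delay is symmetric, so any
# nonzero estimate below is purely timestamp noise.
errors = []
for _ in range(1000):
    j = [random.uniform(-TICK / 2, TICK / 2) for _ in range(4)]
    t1, t2 = 0.0 + j[0], 1e-6 + j[1]
    t3, t4 = 2e-6 + j[2], 3e-6 + j[3]
    errors.append(abs(ptp_offset(t1, t2, t3, t4)))

print(max(errors))  # worst-case offset error is on the order of the tick size
```

      The point: with nanosecond-scale jitter on each timestamp, the offset estimate is wrong by roughly the tick size, per hop, before any filtering.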

    • sgillen 3 hours ago
      We don't use NTP, but for robotics stereo camera synchronization we often want the two frames to be within ~10 µs of each other. For sensor fusion we then also need a lidar on PTP time to be translated into the same clock domain as the cameras, for which we also need <~10 µs.

      We actually disable NTP entirely (run it once per day or at boot) to avoid clocks jumping while recording data.

      • wpollock 2 hours ago
        > We actually disable NTP entirely (run it once per day or at boot) to avoid clocks jumping while recording data.

        This doesn't seem right to me. NTP with default settings should be monotonic, so no jumps. If you disable it, Linux enters 11-minute mode, IIRC, and that may not be monotonic.

        • ComputerGuru 16 minutes ago
          Pedantically, a monotonic function need not have a constant first derivative. To take it further, in mathematics it is accepted for a monotonic function to have a countable number of discontinuities, but in the context of a digital clock that only increments in discrete steps, that's of little bearing.

          But that's all beside the point, since most sane time-sync clients (regardless of protocol) generally handle small deviations (i.e. normal cases) by speeding up or slowing down the system clock, not jumping it (forward or backward).
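
          Back-of-envelope on how long that speeding-up takes. The 500 ppm rate below is a commonly cited kernel/daemon maximum slew rate; treat it as an assumption, since actual limits vary by implementation:

```python
# Time needed to absorb an offset by running the clock fast/slow
# instead of stepping it: offset / slew rate.
def slew_seconds(offset_s, rate_ppm=500):
    return abs(offset_s) / (rate_ppm * 1e-6)

print(slew_seconds(0.005))  # a 5 ms offset slews out in ~10 s
print(slew_seconds(5e-6))   # the 5 us NIST deviation: ~0.01 s
```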

      • opello 1 hour ago
        In your stereo camera example, are these like USB webcams or something like MIPI CSI attached devices?
    • andrewxdiamond 3 hours ago
      We use nanosecond precision driven by GPS clocks. That timestamp in conjunction with star tracker systems gives us reliable positioning information for orbital entities.

      https://en.wikipedia.org/wiki/Star_tracker

    • grumbelbart 1 hour ago
      Lightning detection. You have a couple of ground stations with known positions that wait for certain electromagnetic pulses, and which record the timestamps of such pulses. With enough stations you can triangulate the location of the source of each pulse. Also a great way to detect nuclear detonations.

      There is a German club that builds and distributes such stations (using GPS for location and timing), with quite impressive global coverage by now:

      https://www.blitzortung.org
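
      The reason station clocks matter so much here: a timing error maps to a position error of roughly c * dt per station pair, so GPS-grade timing (~100 ns) keeps strikes located to tens of meters:

```python
# Position error contributed by a clock error in a TDOA system:
# the signal travels at the speed of light, so err ~ c * dt.
C = 299_792_458.0  # speed of light, m/s

def position_error_m(timing_error_s):
    return C * timing_error_s

print(position_error_m(1e-6))    # ~300 m per microsecond of clock error
print(position_error_m(100e-9))  # ~30 m at GPS-grade timing
```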

    • zamadatix 3 hours ago
      (Assuming "precision" really meant "accuracy") The network equipment I work on requires sub microsecond time sync on the network for 5G providers and financial trading customers. Ideally they'd just get it from GPS direct, but that can be difficult to do for a rack full of servers. Most of the other PTP use cases I work with seem to be fine with multiples of microseconds, e.g. Audio/Video over the network or factory floor things like PLCs tend to be find with a few us over the network.

      Perhaps a bit more boring than one might assume :).

    • peaseagee 2 hours ago
      At a previous role, we needed nanosecond precision for a simulcast radio communications system. This was to allow for wider transmission for public safety radio systems without having to configure trunking. We could even adjust the delay in nanoseconds to move the deadzones away from inhabited areas.

      We solved this by having GPS clocks at each tower as well as having the app servers NTP with each other. The latter burned me once due to some very dumb ARP stuff, but that's a story for another day.

    • IceWreck 3 hours ago
      We need nanosecond precision for trading - basically timestamping exchange/own/other events and to measure latency.
    • marcosdumay 3 hours ago
      You probably want to ask about accuracy. Any random microcontroller from the 90s needs microsecond precision.
    • thadt 47 minutes ago
      Nuclear measurements, where the speed of a gamma ray flying across a room vs a neutron is relevant. But that requires at least nanosecond time resolution, and you’re a long way from thinking about NTP.
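      The scales involved can be sketched with simple time-of-flight arithmetic (the 1 MeV neutron speed below is an assumed illustrative figure, roughly sqrt(2E/m)):

```python
# Flight time across a 10 m room: gamma ray at c vs a ~1 MeV neutron.
C = 299_792_458.0   # m/s
NEUTRON_V = 1.38e7  # m/s, approx. for a 1 MeV neutron (assumed)
ROOM = 10.0         # meters

gamma_ns = ROOM / C * 1e9
neutron_ns = ROOM / NEUTRON_V * 1e9
print(gamma_ns, neutron_ns)  # ~33 ns vs ~725 ns
```

      A clearly resolvable gap, but one that needs nanosecond timestamping, far beyond what NTP provides.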
    • pi-rat 2 hours ago
      High-speed finance is milliseconds and below. The fastest publicly known tick-to-trade is just shy of 14 nanoseconds.

      Timekeeping starts to become really hard, often requiring specialized hardware and protocols.

    • hnuser123456 3 hours ago
      The high frequency trading guys

      edit: also the linked slides in TFA

    • bobmcnamara 2 hours ago
      Yes, but always got it from GPS so presumably they'd be off about the same amount.

      Distributed sonar allows placing receivers willy-nilly and aligning the samples later.

      Remote microphone switching - though for this you wouldn't notice 5us jitter, it's just that the system we designed happened to have granularity that good.

    • immibis 1 hour ago
      I believe LTE and 5G networks require it to coordinate timeslots between overlapping cells. Of course, they can use whatever reference they want, as long as all the cells are using the same one - it doesn't have to be UTC. Some (parts of) networks transmit it across the network, while others have independent GPS receivers at each cell site.

      Synchronization is also required for SDH networks. Don't know if those are still used.

      Someone else referenced low-power ham radio modes like WSPR, which I also don't know much about, but those modes have extremely low data rates and narrow bandwidths, with timeslots linked to UTC, so they require accurate synchronization. I don't know if they're designed to self-synchronize or need an external reference.

      When multiple transmitters are transmitting the same radio signal (e.g. TV), they might need to be synchronized to a certain phase relationship. Again, I don't know much about it.

    • esseph 3 hours ago
      Telecom.

      Precision Time Protocol gets you sub-microsecond.

      https://en.wikipedia.org/wiki/Precision_Time_Protocol

    • jeffbee 1 hour ago
      A database like Google Spanner has higher latency in proportion to the uncertainty about the time. Driving the time uncertainty down into the microsecond range, or lower, keeps latency low.
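      Per the public Spanner/TrueTime description, the latency cost is concrete: a commit picks the latest bound of its uncertainty interval and waits until that instant is certainly past before acknowledging, so each commit pays roughly twice the uncertainty. A simplified sketch:

```python
# TrueTime-style commit wait, simplified: TT.now() returns an interval
# [t - eps, t + eps]; the commit timestamp is the latest bound, and the
# server waits until the earliest bound passes it, i.e. ~2 * eps.
def commit_wait_s(epsilon_s):
    return 2 * epsilon_s

print(commit_wait_s(7e-3))  # ~14 ms of added latency at eps = 7 ms
print(commit_wait_s(1e-6))  # ~2 us at microsecond-level uncertainty
```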
    • idiotsecant 2 hours ago
      Lots of things do. Shoot, even plain old TDM needs timing precision on the order of picoseconds to nanoseconds.
    • loeg 3 hours ago
      I mean, we routinely benchmark things that take microseconds or less. I've seen a 300 picosecond microbenchmark (single cycle at 3GHz). No requirement that absolute time is correct, though.
  • geetee 3 hours ago
    Now I'm curious... How the hell do you synchronize clocks to such an extreme accuracy? Anybody have a good resource before I try to find one myself?
    • bestouff 2 hours ago
      Look up PTP White Rabbit.
  • ChrisArchitect 5 hours ago
    More discussion:

    NTP at NIST Boulder Has Lost Power

    https://news.ycombinator.com/item?id=46334299

  • qmr 5 hours ago
    Gah, just when you think you can trust time.nist.gov

    Suggestions from the community for more reliable alternatives?

    • evanriley 4 hours ago
      > Gah, just when you think you can trust time.nist.gov

      You still can...

      If you're that concerned about 5 microseconds: build your own Stratum 1 time server https://github.com/geerlingguy/time-pi

      or just use ntppool https://www.ntppool.org/en/

      • beala 2 hours ago
        It sounds like GPS, and thus a GPS-based stratum 1 server, relies on these time servers, but they failed over successfully:

        > Jeff finished off the email mentioning the US GPS system failed over successfully to the WWV-Ft. Collins campus. So again, for almost everyone, there was zero issue, and the redundancy designed into the system worked like it's supposed to.

        So failures in these systems are potentially correlated.

        The author mentions another solution. Apparently he runs his own atomic clock. I didn’t know this was a thing an individual could do.

        > But even with multiple time sources, some places need more. I have two Rubidium atomic clocks in my studio, including the one inside a fancy GPS Disciplined Oscillator (GPSDO). That's good for holdover. Even if someone were jamming my signal, or my GPS antenna broke, I could keep my time accurate to nanoseconds for a while, and milliseconds for months. That'd be good enough for me.
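
        The holdover claim is just drift arithmetic: accumulated error is roughly (fractional frequency error) x (elapsed time). The 1e-11 stability figure below is an assumed ballpark for a disciplined rubidium standard, not a spec, and real holdover degrades further with aging and temperature:

```python
# Accumulated time error during holdover, ignoring aging/temperature.
def holdover_error_s(frac_freq_error, elapsed_s):
    return frac_freq_error * elapsed_s

DAY = 86_400
print(holdover_error_s(1e-11, DAY))       # ~0.9 us per day at 1e-11
print(holdover_error_s(1e-11, 30 * DAY))  # ~26 us over a month
```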

      • eddyg 3 hours ago
        Be aware that there are members of the NTP pool with less-than-honorable intentions and you don't get to pick-and-choose. Yes, they all should provide the time, but they also get your IP address.

        For example: unlike the IPv4 space, the IPv6 space is too big to scan, so a number of "researchers" (if you want to call them that) put v6-capable NTP servers in the NTP pool to gather information about active v6 blocks to scan/target.

        • ticoombs 3 hours ago
          Do you have any articles or references about this? That would be great research (pun intended) to find out more.
        • edoceo 2 hours ago
          Is this one of those extraordinary claims that require evidence? Or is it generally true that there are honeypots in many of these services (NTP, mirrors, etc.)?
    • ianburrell 4 hours ago
      Most places that need accurate time get it from GPS. That is 10-100 ns.

      Also, you can use multiple NIST servers. They have ones in Fort Collins, CO and Gaithersburg, MD. Most places shouldn't use NIST directly but Stratum 1 time servers.

      Finally, NTP is only accurate to 10-100 ms anyway, so a microsecond error doesn't matter.

    • ssl-3 1 hour ago
      Yes.

      Use NTP with ≥4 diverse time sources, just as RFC 5905 suggests doing. And use GPS.

      (If you're reliant upon only one source of a thing, and that thing is important to you in some valuable way, then you're doing it wrong. In other words: Backups, backups, backups.)

    • ajkjk 4 hours ago
      their handling it responsibly seems like more evidence for trusting them, not less?
    • vel0city 2 hours ago
      Use the other servers as well: https://tf.nist.gov/tf-cgi/servers.cgi

      For instance, time-a-wwv.nist.gov.

      One should configure a number of different NTP sources instead of just a single host.
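
      The intuition for why multiple sources help: with one server a falseticker is undetectable; with several, an outlier gets outvoted. Real daemons use Marzullo-style selection and clustering algorithms; the median below is just a toy illustration of the idea:

```python
import statistics

# Pick a clock offset from several candidate servers. A median ignores
# a single wildly wrong ("falseticker") source.
def pick_offset(offsets_s):
    return statistics.median(offsets_s)

# Four servers; one has gone haywire by 30 seconds.
offsets = [0.0021, 0.0018, 30.0, 0.0024]
print(pick_offset(offsets))  # the outlier is outvoted
```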

    • monster_truck 4 hours ago
      I'm more concerned about what you think they did to earn your trust in the first place