Every GitHub object has two IDs

(greptile.com)

86 points | by dakshgupta 9 hours ago


  • agwa 18 minutes ago
    > GitHub's migration guide tells developers to treat the new IDs as opaque strings and treat them as references. However it was clear that there was some underlying structure to these IDs as we just saw with the bitmasking

    Great, so now GitHub can't change the structure of their IDs without breaking this person's code. The lesson is that if you're designing an API and want an ID to be opaque you have to literally encrypt it. I find it really demoralizing as an API designer that I have to treat my API's consumers as adversaries who will knowingly and intentionally ignore guidance in the documentation like this.

    • krisoft 5 minutes ago
      > Great, so now GitHub can't change the structure of their IDs without breaking this person's code.

      And that is all the fault of the person who treated a documented opaque value as if it has some specific structure.

      > The lesson is that if you're designing an API and want an ID to be opaque you have to literally encrypt it.

      The lesson is that you should stop caring about breaking the code of people who go against the documentation this way. When it breaks, you shrug. Their code was always buggy; it just happened to work for them until then. You are not their dad. You are not responsible for their misfortune.

      > I find it really demoralizing as an API designer that I have to treat my API's consumers as adversaries who will knowingly and intentionally ignore guidance in the documentation like this.

      You don’t have to.

    • maxbond 12 minutes ago
      You could also say, if I tell you something is an opaque identifier, and you introspect it, it's your problem if your code breaks. I told you not to do that.
    • haileys 10 minutes ago
      This is well understood - Hyrum's law.

      You don't need encryption; a global_id database column with a randomly generated ID will do.

      • maxbond 8 minutes ago
        You could, but you would lose the performance benefits you were seeking by encoding information into the ID. You could also use a randomized, proprietary base64 alphabet rather than properly encrypting the ID.
        • haileys 5 minutes ago
          I've never really viewed encoding a type name into an ID as being about performance. Think of it more like an area code: it's an essential part of the identifier that tells you how to interpret the rest of it.
          • maxbond 1 minute ago
            You could definitely put a prefix and a UUID (or whatever), I failed to consider that.
    • nwallin 10 minutes ago
      Hyrum's law is a real sonuvabitch.
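haileys and maxbond end up at the same design: a type prefix that tells you how to interpret the ID, plus a randomly generated value stored in a global_id column. A minimal Python sketch, assuming a dict as a stand-in for that database column and a hypothetical "repo" prefix; the function names are illustrative and not GitHub's.

```python
import uuid

# Hypothetical in-memory stand-in for a table with a unique, indexed
# global_id column; a real system would persist this next to each row.
_GLOBAL_ID_COLUMN: dict[str, tuple[str, int]] = {}


def assign_global_id(type_prefix: str, db_id: int) -> str:
    """Mint a public ID: a type prefix plus random bits, no database ID inside.

    Assumes prefixes never contain "_" (uuid4().hex never does either).
    """
    global_id = f"{type_prefix}_{uuid.uuid4().hex}"
    _GLOBAL_ID_COLUMN[global_id] = (type_prefix, db_id)
    return global_id


def type_of(global_id: str) -> str:
    """Read the 'area code' without a lookup; the random part stays opaque."""
    prefix, sep, _ = global_id.partition("_")
    if not sep:
        raise ValueError(f"not a prefixed ID: {global_id!r}")
    return prefix


def resolve(global_id: str) -> tuple[str, int]:
    """Map a public ID back to (type, database ID) via the lookup table."""
    return _GLOBAL_ID_COLUMN[global_id]


gid = assign_global_id("repo", 2325298)
print(type_of(gid))   # repo
print(resolve(gid))   # ('repo', 2325298)
```

The trade-off maxbond raises shows up in resolve: turning a public ID back into a row costs a lookup, whereas an ID that embeds the database key can be decoded without one. In exchange, nothing about the row's database ID or creation order leaks into the random part.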
  • haileys 30 minutes ago
    > That repository ID (010:Repository2325298) had a clear structure: 010 is some type enum, followed by a colon, the word Repository, and then the database ID 2325298.

    It's a classic length prefix. Repository has 10 chars, Tree has 4.
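To make haileys' reading concrete: legacy node IDs are the base64 encoding of strings like the one quoted above, and the number before the colon gives the length of the type name. A minimal Python sketch of that parsing; the function name is hypothetical, and the code depends on exactly the undocumented structure the thread above argues clients should not rely on.

```python
import base64


def decode_legacy_node_id(node_id: str) -> tuple[str, int]:
    """Split a legacy node ID into (type name, database ID).

    Assumes the decoded layout "<length>:<TypeName><database id>",
    where <length> is the number of characters in TypeName.
    """
    # Restore any stripped "=" padding before decoding.
    padded = node_id + "=" * (-len(node_id) % 4)
    raw = base64.b64decode(padded).decode("ascii")
    length_str, rest = raw.split(":", 1)
    type_len = int(length_str)  # "010" -> 10, "04" -> 4
    return rest[:type_len], int(rest[type_len:])


encoded = base64.b64encode(b"010:Repository2325298").decode("ascii")
print(decode_legacy_node_id(encoded))  # ('Repository', 2325298)
```

The length prefix is what lets the type name and the numeric ID sit side by side with no separator between them.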

  • ezyang 16 minutes ago
    I just want to point out that Opus 4.5 actually knows this trick and will write the code to decode the IDs if it is working with GitHub's API lol
  • chatmasta 44 minutes ago
    > Somewhere in GitHub's codebase, there's an if-statement checking when a repository was created to decide which ID format to return.

    I doubt it. That's the beauty of GraphQL — each object can store its ID however it wants, and the GraphQL layer encodes it in base64. Then when someone sends a request with a base64-encoded ID, there _might_ be an if-statement (or maybe it just does a lookup on the ID). If anything, the if-statement happens _after_ decoding the ID, not before encoding it.

    There was never any if-statement that checked the time — before the migration, IDs were created only in the old format. After the migration, they were created in the new format.
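chatmasta's point is that the GraphQL layer only wraps whatever ID string an object stores, and any branching happens when an incoming ID is decoded. A minimal Python sketch of that flow, assuming plain base64 wrapping at the API boundary; the "new" internal format here is hypothetical (GitHub's real new-format layout is not modeled), and only the legacy prefix comes from the article's example.

```python
import base64


def encode_node_id(internal_id: str) -> str:
    """API-boundary encoding: wrap whatever ID string the object stores."""
    return base64.b64encode(internal_id.encode("utf-8")).decode("ascii")


def decode_node_id(node_id: str) -> str:
    """Reverse of encode_node_id; tolerates stripped "=" padding."""
    padded = node_id + "=" * (-len(node_id) % 4)
    return base64.b64decode(padded).decode("utf-8")


LEGACY_REPO_PREFIX = "010:Repository"  # layout taken from the article's example


def resolve_node(node_id: str) -> tuple[str, str]:
    """Return (format, lookup key) for an incoming ID.

    The only branch is on the decoded shape of the ID; nothing here
    consults when the object was created.
    """
    internal = decode_node_id(node_id)
    if internal.startswith(LEGACY_REPO_PREFIX):
        return "legacy", internal[len(LEGACY_REPO_PREFIX):]
    # Anything else is treated as a hypothetical newer format, looked up as-is.
    return "new", internal


print(resolve_node(encode_node_id("010:Repository2325298")))  # ('legacy', '2325298')
print(resolve_node(encode_node_id("repo:v2:abc")))            # ('new', 'repo:v2:abc')
```

On the way out, each object's ID is wrapped the same way regardless of format, so the encoding side needs no date check; old rows simply still carry old-style internal IDs.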