Java Hello World, LLVM Edition

(javaadvent.com)

200 points | by ingve 3 days ago

11 comments

pron 3 days ago
Tangential:
The --enable-native-access option mentioned in the article is part of a large effort we call "Integrity by Default"[1]. The idea is that any operation by which a library module can violate invariants established by another module (e.g. access to private fields and methods, mutation of final fields, etc.) requires approval by the application, so that a library cannot have a global effect on the application without its knowledge, and the correctness of each module can be verified in isolation.
Now, --enable-native-access is also required to use JNI, but JNI can violate the integrity of Java invariants in a much more extensive way than FFM can. For example, JNI gives native code access to private fields of classes in arbitrary modules, while FFM does not. The only invariant FFM can break is freedom from undefined behaviour in the C sense. This is dangerous, but not nearly as dangerous as what JNI can do.
For the time being, we decided to enable both FFM and JNI with the same flag, but, given how much more dangerous JNI is, in the future we may introduce a more fine-grained flag that would allow the use of FFM but not of JNI.
[1]: https://openjdk.org/jeps/8305968
- tadfisher 3 days ago
  Where does the "final means final" effort fit in? Can the JVM prevent modification of final fields via JNI, or is --enable-native-access also going to require (or imply) the flag which enables setAccessible() and friends?
  - pron 3 days ago
    Ah, that's a great question, and the answer is in the JEP (https://openjdk.org/jeps/500#Mutating-final-fields-from-nati...).
    When running with -Xcheck:jni, you'll get a warning when trying to mutate a final field with JNI.
    Now, enabling this check by default without harming JNI performance proved to be too much of an effort. However, mutating final fields with JNI even today can already lead to undefined behaviour, including horrible miscompilation, where different Java methods can read different values of the field, for final fields that the JVM already trusts to be immutable, such as static finals, record components, or a few other cases (indeed, there are non-final fields that the JVM trusts to be assigned only once, and mutating those with JNI is also undefined behaviour). As the compiler starts trusting more final fields after this change, mutating almost all final fields will lead to undefined behaviour. Then again, using JNI can lead to undefined behaviour in many ways.
    So to make sure your JNI code isn't mutating finals, test with -Xcheck:jni (as of JDK 26).
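    A minimal sketch of one of the "already trusted" cases mentioned above, using core reflection rather than JNI (a deliberate swap, since a JNI example would need native code). The class and method names are hypothetical. It tries to overwrite a record component and reports whether the JDK allowed it; the `Field.set` documentation states that final fields of record classes are not writable this way.

```java
import java.lang.reflect.Field;

final class FinalFieldDemo {
    // Record components are backed by private final fields.
    record Point(int x, int y) {}

    /** Returns true if reflection overwrote Point.x, false if the JDK refused. */
    static boolean tryMutateRecordField() {
        Point p = new Point(1, 2);
        try {
            Field x = Point.class.getDeclaredField("x");
            x.setAccessible(true); // suppresses access checks only
            x.setInt(p, 42);       // write access to a record's final field is denied
            return true;
        } catch (IllegalAccessException e) {
            return false;
        } catch (NoSuchFieldException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("mutated record field: " + tryMutateRecordField());
    }
}
```

    JNI has no such guard by default, which is why the -Xcheck:jni test run is the suggested way to catch it.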
    - gorset 3 days ago
      This brings back memories debugging an azul zing bug where an effectively final optimization ended up doing the wrong thing with zstd-jni. It was painful enough that I couldn’t convince the team to enable the optimization again for years after it was fixed.
tuhgdetzhh 3 days ago
I'm always a bit shocked how casually people wget and execute shell scripts as part of their install process.
This is the equivalent of giving an author of a website remote code execution (RCE) on your computer.
I get the idea that you can download the script first and carefully read it, but I think that 99% of people won't.
- stouset 3 days ago
  I’m always a bit shocked how seriously people take concerns over the install script for a binary executable they’re already intending to trust.
  - shakna 3 days ago
    Between you and me, are a bunch of other hops. Blindly trusting dependencies is one part of why npm is burning down at the moment.
    Why trust un-signatured files hosted on a single source of truth? It isn't the 90s anymore.
    - stouset 1 day ago
      $ curl ${flags} https://site.io/install.sh | sh

      $ curl ${flags} https://site.io/tool > ./tool
      $ chmod u+x ./tool
      $ ./tool
      Both of these are effectively the same damn thing but everyone loses their minds over the first one.
      Also, a lot of those install scripts do check signatures of the binaries they host. And if you’re concerned that someone could have owned the webserver it’s hosted on, then they can just as easily replace the public key used for verification in the written instructions on the website.
      - shakna 1 day ago
        I'm not advocating for either of those.
        pacman -Sy {tool}
        pkg_add {tool}
        apt install {tool}
        Even the AUR does a lot more to make you secure, than a straight curl - even though throwing things up there is easy.
    - saagarjha 3 days ago
      What’s your alternative?
      - shakna 3 days ago
        A mirrored package manager, where signature and executable are always grabbed from different sources.
        Like apt, dnf, and others.
        saagarjha 3 days ago
        Pretty sure my apt sources have the signing and package pointing to the same place
        shakna 3 days ago
        If you have more than a single source, then apt will already be checking this for you.
        The default is more than a single source.
        saagarjha 2 days ago
        All of mine point to like somethingsomething.ubuntu.com
        shakna 2 days ago
        If it points to mirror.ubuntu.com, it'll be mirroring at host end, instead of inside apt. But as apt does do resolution to a list, it'll be fetching from multiple places at once.
  - romaniitedomum 3 days ago
    > I’m always a bit shocked how seriously people take concerns over the install script for a binary executable they’re already intending to trust.
    The issue is provenance. Where is the script getting the binary from? Who built that binary? How do we know that binary wasn't tampered with? I'll lay odds the install script isn't doing any kind of GPG/PGP signature check. It's probably not even doing a checksum check.
    I'm prepared to trust an executable built by certain organisations and persons, provided I can trace a chain of trust from what I get back to them.
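    For illustration, a minimal sketch of the checksum step in Java. The class and method names are hypothetical, and it assumes the expected digest was obtained from a channel you trust independently of the download server (otherwise the check only detects corruption, not tampering).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

final class ChecksumCheck {
    /** Returns the lowercase hex SHA-256 digest of the given bytes. */
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** Compares the digest of the data against an expected hex digest. */
    static boolean matches(byte[] data, String expectedHex) {
        byte[] actual = sha256Hex(data).getBytes(StandardCharsets.US_ASCII);
        byte[] expected = expectedHex.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.US_ASCII);
        return MessageDigest.isEqual(actual, expected);
    }

    public static void main(String[] args) {
        byte[] downloaded = "abc".getBytes(StandardCharsets.US_ASCII); // stand-in for a downloaded file
        System.out.println(sha256Hex(downloaded));
    }
}
```

    A signature check goes further than this, because it ties the digest to a key rather than to wherever the digest happened to be published.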
- VMG 3 days ago
  The thing that gets installed, if it is an executable, usually also has permissions to do scary things. Why is the installation process so scrutinized?
  - davnicwil 3 days ago
    I think there's a fundamental psychological reason for this - people want to feel like some ritual has been performed that makes at least some level of superficial sense, after which they don't have to worry.
    You see this in all the obvious examples of physical security.
    In the case of software it's the installation that's the ritual I guess. Complete trust must be conferred in the software itself by definition, so people just feel better knowing for near certain that the software installed is indeed 'the software itself'.
  - tuhgdetzhh 2 days ago
    It would raise the same kind of alert for me if someone used wget to download a binary executable instead of a shell script.
    The issue is not the specific form in which code is executed on your machine, but rather who is allowed by you to run code on your computer.
    I don't trust arbitrary websites from the Internet, especially when they are not cryptographically protected against malicious tampering.
    However, I do trust, for instance, the Debian maintainers, as I believe they have thoroughly vetted and tested the executables they distribute, with a cryptographic signature, to millions of users worldwide.
- balder1991 3 days ago
  Even assuming it’s not malicious, the script can mess up your environment configuration.
  - exe34 3 days ago
    I'm so thankful for nixos for making it hard for me to give in to that temptation. you always think "oh just this once". but with nixos I either have to do it right or not bother.
    - hombre_fatal 3 days ago
      NixOS gives you a place to configure things in a reproducible way, but it doesn’t require you do it.
      - exe34 3 days ago
        $ ./Downloads/tmp/xpack-riscv-none-elf-gcc-15.2.0-1/bin/riscv-none-elf-cpp
        Could not start dynamically linked executable: ./Downloads/tmp/xpack-riscv-none-elf-gcc-15.2.0-1/bin/riscv-none-elf-cpp
        NixOS cannot run dynamically linked executables intended for generic linux environments out of the box. For more information, see: https://nix.dev/permalink/stub-ld
        You have to go out of your way to make something like that run in an fhs env. By that point, you've had enough time to think, even with ADHD.
      - tombert 3 days ago
        It sort of does actually, at least if you don't have nix-ld enabled. A lot of programs simply won't start if they're not statically linked, so a lot of the time, if you download a third-party script or try to install something with `curl somesite.blah | sh`, it simply will not work. Moreover, it likely won't be properly linked in your path unless you do it the right way.
  - maccard 3 days ago
    So can a random deb, or npm package, or pip wheel? You’re either ok with executing unverified code or not - piping wget into bash doesn’t change that
    - dubi_steinkek 2 days ago
      Maybe they can with postinstall scripts, but they usually don't.
      For the most part, installing packaged software simply extracts an archive to the filesystem, and you can uninstall using the standard method (apt remove, uv tool remove, ...).
      Scripts are way less standardized. In this case it's not an argument about security, but about convenience and not messing up your system.
- OptionOfT 3 days ago
  Equally I don't like how many instructions and scripts everywhere use shorthands.
  Sometimes you see curl -sSLfO. Please, use the long form. It makes life easier for everybody. It makes it easier to verify, and to look up. Finding --silent in curl's docs is easier than reading through every occurrence of -s.
```
   curl --silent --show-error --location --fail --remote-name https://example.com/script.sh
```
  Obligatory xkcd: https://xkcd.com/1168/
  - Terr_ 3 days ago
    For a small flight of fancy, imagine if each program had a --for-docs argument, which causes it to simply spit out the canonical long-form version equivalent to whatever else it has been called with.
    - ndsipa_pomu 2 days ago
      Or, a separate program that can convert from short to long form:
      > for-docs "ls -lrth /mnt/data"
      ls -l --reverse -t --human-readable -- /mnt/data
      (I'd put in an option to put the options alphabetically too)
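      A rough sketch of the idea in Java, limited to the five argument-less curl flags from the `-sSLfO` example upthread. The class name and table are hypothetical; a real tool would need a table (and argument handling) per command.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class CurlFlagExpander {
    // Short-to-long mapping for a handful of curl flags that take no argument.
    private static final Map<Character, String> LONG_FORMS = Map.of(
            's', "--silent",
            'S', "--show-error",
            'L', "--location",
            'f', "--fail",
            'O', "--remote-name");

    /** Expands a cluster such as "-sSLfO" into space-separated long options. */
    static String expand(String cluster) {
        if (cluster.length() < 2 || !cluster.startsWith("-") || cluster.startsWith("--")) {
            throw new IllegalArgumentException("not a short-flag cluster: " + cluster);
        }
        List<String> longOptions = new ArrayList<>();
        for (char flag : cluster.substring(1).toCharArray()) {
            String longForm = LONG_FORMS.get(flag);
            if (longForm == null) {
                throw new IllegalArgumentException("unknown flag: -" + flag);
            }
            longOptions.add(longForm);
        }
        return String.join(" ", longOptions);
    }

    public static void main(String[] args) {
        System.out.println(expand("-sSLfO"));
    }
}
```

      Unknown flags are rejected rather than passed through, so a gap in the table fails loudly instead of silently producing a different command.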
      - Terr_ 2 days ago
        While I'd appreciate that facility too, it seems... even-more-fanciful, as one tool would need to somehow incorporate all the logic and quirks of all supported commands, including ones which could be very destructive if anything went wrong.
        Kind of like positing a master `dry-run` command as opposed to different commands implementing `--dry-run` arguments.
        ndsipa_pomu 2 days ago
        I did muck around with using "sed" to process the "man" output to find a relevant long option in a one-liner, so it wouldn't be too difficult to implement.
        I did something like this:
        _command="sed"
        _option="n"
        man -- "${_command}" | sed --quiet --expression "s/^ *-${_option}.*, //p"
        Then I realised that a bit of logic is needed (or more complicated regexp) to deal with some exceptions and moved onto something else.
  - yjftsjthsd-h 3 days ago
    > Finding --silent in curl's docs is easier than reading through every occurrence of -s.
    Dumb trick: Search prefixed with 2 spaces.
```
  man curl
  /  -s
```
    Yields exactly one hit on my machine. In the general case, you may have to try one and two spaces.
  - ndsipa_pomu 3 days ago
    Absolutely agree.
    The shorthands are for when typing it at a console and the long form versions should be used in scripts.
  - lionkor 2 days ago
    Aren't there tools for which the short flags are standardized (e.g. POSIX) but the long flags aren't?
  - scrame 3 days ago
    agreed. i get if you're great at cli usage or have your own scripts, but if you're publishing for general use, it should be long form. that includes even utility scripts for a small team.
    also, putting it out long-form you might catch some things you do out of habit, rather than what's necessary for the job.
    - ndsipa_pomu 2 days ago
      Another possible advantage is that I invariably have to check the man page to find the appropriate long-form option and sometimes spot an option that I didn't know about.
- zenlot 3 days ago
  If you don't trust the software, don't install it.
  - nurettin 3 days ago
    Trusting software would be foolish. Most software has access to file system and the net. Due to practical reasons, I have no energy or time to verify whether the next update of libsecure came with a trojan or stole my env, neither do you. I just acknowledge this fact, take a risk and install it.
jakozaur 3 days ago
LLVM IR is quite fun to play with from many programming languages. The Java example is rather educational, but there are several practical examples, such as in Go:
https://github.com/llir/llvm
troymc 3 days ago
I made a poster showing how one might write a Hello World program in 39 different programming languages, and even different versions of some common languages like Java:
https://troymcconaghy.blog/2025/01/13/39-hello-world-program...
- pron 3 days ago
  Nice, but as of JDK 25 (the preview JEP 445 has become the permanent JEP 512), the canonical Hello World in Java is:
```
    void main() {
        IO.println("Hello World");
    }
```
  - saagarjha 3 days ago
    Since it seems like you work on Java, would you mind taking a look at https://bugs.java.com/bugdatabase/view_bug?bug_id=JDK-836673..., where this syntax does not work for shebangs?
  - prmoustache 3 days ago
    Not a java developer but why the void? Shouldn't your main function and program return an integer?
    - tadfisher 3 days ago
      I believe that is a C-ism, where the C runtime calls your main() and exits the process with the return value. The Java equivalent is System.exit(int status).
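      A small sketch of that pattern, using the classic `main` signature. The class name and the status values (0 for success, 2 for a usage error) are hypothetical conventions chosen for the example.

```java
final class ExitStatusDemo {
    /** Does the actual work and returns the status the process should exit with. */
    static int run(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: ExitStatusDemo <name>");
            return 2;
        }
        System.out.println("Hello " + args[0]);
        return 0;
    }

    public static void main(String[] args) {
        // main itself returns void; the status is handed to the JVM explicitly.
        System.exit(run(args));
    }
}
```

      Keeping the logic in a method that returns the status, and calling System.exit only in main, also keeps the code testable without terminating the JVM.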
    - gavinray 3 days ago
      The return type of a Java main is the JVM platform return type
      Sending system signals is external to the JVM platform
  - troymc 3 days ago
    Thanks, I made a note to update that someday.
- throwaway150 3 days ago
  Cool poster! If you don't mind me asking, would you share what tools you use to create this poster? You've got syntax highlighting going on there too. What did you use for that?
  - iTokio 3 days ago
    You just have to read his blog, it is short and he answered everything.
    > he used python and xelatex
    > https://github.com/ttmc/hello-world-ways
    - troymc 3 days ago
      Yep, and for syntax highlighting, I used the minted package [1]. Internally, minted uses the Pygments library [2].
      [1] https://ctan.org/pkg/minted
      [2] https://pygments.org/
      - throwaway150 3 days ago
        Thanks!
- realo 3 days ago
  This is super cool! Now someone should make a similar poster with Hello World sent to a serial port.
  Bonus points if it is a RS485 port.
  Some languages that seem to look good might show their true ugly face...
- pmdr 3 days ago
  Objective C is by far the weirdest on that list.
  - saagarjha 3 days ago
    Objective-C is basically Java so I wouldn’t call it that weird.
    - gnabgib 3 days ago
      Objective-C is significantly (11 years) older than Java.
      1984: https://en.wikipedia.org/wiki/Objective-C
      1995: https://en.wikipedia.org/wiki/Java_(programming_language)
      - saagarjha 3 days ago
        Correct, Java was designed with a strong influence from Objective-C.
        gnabgib 3 days ago
        One might even say Java is basically Objective-C
        pjmlp 3 days ago
        Kind of, but with C++ syntax to make it more appealing,
        https://cs.gmu.edu/~sean/stuff/java-objc.html
        saagarjha 3 days ago
        No, Java never took anything good from the language.
        pjmlp 3 days ago
        Sun folks disagree,
        https://cs.gmu.edu/~sean/stuff/java-objc.html
        https://en.wikipedia.org/wiki/Distributed_Objects_Everywhere
        Sure, they could have taken a bit more, like proper AOT instead of it being a feature only available in third party commercial JDKs, or some low level niceties like C#.
        saagarjha 3 days ago
        I was talking about good parts of the language
        pjmlp 2 days ago
        Like [] and @ all over the place, C lack of safety, and manual memory management?
        Because I don't see what else good Java has left out, besides AOT in the box and unsigned types.
        saagarjha 2 days ago
        Uh, the entire runtime?
        jeberle 2 days ago
        I would look to the UCSD p-System as a precedent to the JVM. Both are byte-code interpreted VMs. Gosling used the p-system earlier in his career, prior to joining Sun.
        https://en.wikipedia.org/wiki/James_Gosling#Career_and_contr...
        The Objective-C runtime is very small: just enough to do late-bound fn calls to a tree of class defs. All on top of C.
        pjmlp 2 days ago
        I beg to differ, given the engineering effort that went into the JVM across various Java vendors, versus what Apple and NeXT have done.
        Proven by the fact that Swift had to be invented, as there was nothing left to fix Objective-C in a proper way.
        saagarjha 2 days ago
        Swift has that runtime, by the way.
        pjmlp 2 days ago
        Nope, Swift interops with Objective-C runtime to ease code migration from legacy Objective-C code, and existing Apple frameworks predating Swift.
        A runtime that isn't part of the cross-platform Swift project, with missing functionality being rewritten in Swift.
        saagarjha 23 hours ago
        Yes, and those platforms are worse off for it.
  - watersb 3 days ago
    Smalltalk, but in C
namegulf 3 days ago
Wondering about the benefits, and how this is different from using GraalVM to build native images?
E.g. we could use Spring + GraalVM and get the application into native binaries without worrying too much about the low-level stuff.
What are we missing?
- gavinray 3 days ago
  This article specifically discusses calling external C ABI libraries via the FFM API.
  GraalVM is for compiling JVM bytecode to native, architecture-specific binaries.
  FFM is like "[DllImport]" in .NET, or "extern" definitions in other languages.
  The article shows how to auto-generate JVM bindings from C headers, and then allocate managed memory + interact with externally linked libs via the FFM API passing along said managed memory.
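  As a rough illustration of the hand-written (not auto-generated) version, here is a sketch that calls the C library's `strlen` through FFM. It assumes JDK 22 or later (where the API is final), a platform whose default lookup exposes `strlen`, and a 64-bit `size_t`; the class name is hypothetical.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

final class StrlenDemo {
    /** Calls the C library's strlen on a copy of the given string. */
    static long strlen(String s) {
        Linker linker = Linker.nativeLinker();
        // Describe the C signature: size_t strlen(const char *), assuming 64-bit size_t.
        MethodHandle strlen = linker.downcallHandle(
                linker.defaultLookup().find("strlen").orElseThrow(),
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
        // The arena owns the off-heap copy and frees it when the block exits.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cString = arena.allocateFrom(s); // NUL-terminated UTF-8 copy
            return (long) strlen.invokeExact(cString);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(strlen("Hello World"));
    }
}
```

  Creating the downcall handle is a restricted operation, so recent JDKs are expected to warn unless the program is run with --enable-native-access (for example `--enable-native-access=ALL-UNNAMED` for classpath code).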
  - fniephaus 3 days ago
    BTW: We (the GraalVM team) maintain a full-blown LLVM bitcode runtime that can be embedded in Spring or any other JVM application and compiled to native: https://github.com/oracle/graal/tree/master/sulong
    - gavinray 3 days ago
      May as well throw the Native Image C API for FFM-like capabilities out there too
      https://www.graalvm.org/latest/reference-manual/native-image...
      One of the neatest things I've been able to do is compile a .dll library "plugin" for an application which loads plug-ins by invoking a special exported symbol name like "int plugin_main()" using GraalVM and @CEntryPoint
      The entrypoint function starts a Graal isolate via annotation params and no native code was needed
  - namegulf 3 days ago
    Don't we have JNI for that?
- scrame 3 days ago
  people still use make for things. how many stand-alone utilities require npm?
  i don't know graalvm, but I've used too much ant, buildr, gradle and maven. I'm not really convinced GraalVM would make anything better just because you are more familiar with it.
  The author even says to just use what you like because that part doesn't matter.
  - namegulf 3 days ago
    ant, buildr, gradle and maven - are build tools
    we're talking about native code here
rendaw 3 days ago
Self plug: I put together this before/after example gallery (high-level source and the corresponding intermediate/low-level output) for a couple of languages: https://andrewbaxter.github.io/semicompiled/ https://github.com/andrewbaxter/semicompiled?tab=readme-ov-f...
I was using it while dabbling on compiler stuff, it was useful to have a set of concise compilation examples. I haven't touched it much lately, unfortunately, and I added the eBPF because the target was there but had no way to validate it (standalone eBPF validator where?) so I think it's probably somewhat wrong... or invalid at least, maybe that's a separate concern for people who would want this.
mands 3 days ago
Nice read up of the new FFM API.
Recently saw a new FFM-based zero-copy transport and RPC framework using io_uring at https://www.mvp.express/
An interesting time to be in the Java/JVM ecosystem. Meanwhile, back to my Spring Boot app... though at least we're on Java 25.
kachapopopow 3 days ago
LLVM is such an amazing piece of software, the amount of uses for it are unlimited especially when it comes to obfuscation. The IR is also really fun for compiling bytecode to native code since it's pretty trivial to translate it into IR (opposite of what is done in this article)
Octoth0rpe 3 days ago
I've been playing with a very basic compiler for a language that looks a bit like go -> llvm ir, but I'm finding myself constantly revising my AST implementation as I progressively add more things that it needs to represent. Is anyone aware of any kind of vaguely standardized AST implementation used by more than one project? I've been searching this morning for one and am coming up empty. My thinking is that if I can find some reasonably widely used implementation, then hopefully that implementation has thought out lots of the corner cases that I haven't gotten to yet.
- znkr 3 days ago
  LISP ;-)
  - xnacly 3 days ago
    This, lisp is perfect for representing arbitrary data, nesting is just another sexpr, easy to produce, easy to parse and easy to debug / reason about
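    To make the "nesting is just another sexpr" point concrete, here is a tiny sketch in Java (records and sealed interfaces, so JDK 17+). The node types are hypothetical and cover only integer literals, addition and multiplication; it is not any project's actual AST.

```java
final class SexprAst {
    // A closed set of node types; each compound node just nests other nodes.
    sealed interface Expr permits Num, Add, Mul {}
    record Num(int value) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}
    record Mul(Expr left, Expr right) implements Expr {}

    /** Renders the tree as an s-expression, e.g. (+ 1 (* 2 3)). */
    static String toSexpr(Expr e) {
        if (e instanceof Num n) return Integer.toString(n.value());
        if (e instanceof Add a) return "(+ " + toSexpr(a.left()) + " " + toSexpr(a.right()) + ")";
        if (e instanceof Mul m) return "(* " + toSexpr(m.left()) + " " + toSexpr(m.right()) + ")";
        throw new IllegalArgumentException("unknown node: " + e);
    }

    /** Evaluates the tree by walking it recursively. */
    static int eval(Expr e) {
        if (e instanceof Num n) return n.value();
        if (e instanceof Add a) return eval(a.left()) + eval(a.right());
        if (e instanceof Mul m) return eval(m.left()) * eval(m.right());
        throw new IllegalArgumentException("unknown node: " + e);
    }

    public static void main(String[] args) {
        Expr tree = new Add(new Num(1), new Mul(new Num(2), new Num(3)));
        System.out.println(toSexpr(tree) + " = " + eval(tree));
    }
}
```

    Dumping the tree in this form is also a cheap way to debug a parser before any IR is generated.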
    - pjmlp 3 days ago
      When I did my degree, the years prior to mine had some flexibility choosing the implementation language for compilers class.
      Lisp and Prolog were forbidden due to how easy the whole exercise would be.
  - Octoth0rpe 3 days ago
    I can appreciate this answer, but I don't think it's really what I'm asking.
    I think I'm more looking for some kind of standardized struct definition that translates easily to llvm IR and is flexible enough for a wide variety of languages to target.
    Something like this: https://gist.github.com/thomaswp/8c8ef19bd5203ce8b6cd4d6df5e... (Which doesn't meet my criteria because AFAICT isn't used by anything, but is reasonably close to what I want) or this: https://docs.rs/sap-ast/latest/src/ast/lib.rs.html#1-83 (which seems specific to SAP, I would like something more general)
- emptysea 3 days ago
  Ruff’s ast is used by Ruff, Ty, and Pyrefly
  - Octoth0rpe 3 days ago
    Thank you! this looks pretty helpful
znpy 3 days ago
500 internal server error…
zkmon 3 days ago
What's wrong with using the standard JDK for Java code?
- throwaway150 3 days ago
  Nothing wrong with it. Why would you assume the author is in any way hinting that there's something wrong with using the standard JDK for Java code?
  - zkmon 3 days ago
    Ok. Let me ask differently. Why would I download and use LLVM for working with java code? Which usecases favor this?
    - mands 3 days ago
      It's more a fun educational overview of the new FFM API.
      I can't think of many actual use-cases where you'd want to use the LLVM JIT over those built into HotSpot.
      Interfacing with existing LLVM-based systems, writing a very tight inner loop using LLVM where you absolutely need LLVM-like performance, or creating a compiler that targets LLVM using Java would be the main "real-world" use-cases.
    - drzaiusx11 3 days ago
      This is interop glue to cross language boundaries in the JVM without the problems that come with JNI. The natural goal/use-case being that you can call pre-existing code in other languages that target LLVM IR.
    - TazeTSchnitzel 3 days ago
      That's not what the article is about.
    - connicpu 3 days ago
      The article is presenting something different entirely. This is the precursor to what it would take to create a compiler written in java that produces native code.
    - almostgotcaught 3 days ago
      "why would I use a frying pan when I can use a flashlight"
      The two things have nothing to do with each other.