What an amazing article, and what an amazing analysis of this 30-year-old blob. This was super enjoyable to read.
My only tiny gripe is with the first quirk: the author, who insists that his implementation is bug-for-bug compatible, lists the other implementations' behavior and explains that it is very different from the original.
And then they proceed with additional quirks where the supposedly bug-for-bug compatible implementation also differs in the same way as in the first example: it produces an error message rather than the quirky output.
Don’t get me wrong: errors rather than quirks are much better behavior, but then don’t claim to be bug-for-bug compatible, nor roast other implementations for doing the same thing.
Apologies if I got the tone wrong there, I definitely wasn't trying to roast the other projects.
In terms of what I prioritized bug-for-bug compatibility on, I tailored it to getting https://github.com/squeek502/win32-samples-rc-tests passing 100%, and then also tried to take into account how likely a bug/quirk was to be used in a real .rc file (ultimately this is just a judgement call, though). The results of that test suite (provided in the readme) are also a better indication of how rc.exe-compatible the various resource compilers are in practice (i.e. on 'real' .rc files).
> Somehow, the filename { causes rc.exe to think the filename token is actually the preceding token, so it's trying to interpret ICON as both the resource type and the file path of the resource. Who knows what's going on there.
> [...]
> Strangely, rc.exe will treat FOO [in place of the resource type, followed by EOF] as both the type of the resource and as a filename (similar to what we saw earlier in "BEGIN or { as filename").
Having written a stupid lexer recently, I’m almost certain I know what’s going on there. The lexer has a lexeme type global that it always sets, and a separate lexeme text global that it only sets for text-like tokens (numbers, identifiers, strings, and bare filenames, which it does not interpret other than to tell where they end, that being the easiest way to deal with ANSI C’s pp-numbers) but not for punctuation or EOF. Now have the code that looks for the resource filename blindly reach into the text global without first checking the type global (not even to check if it’s EOF), and you get exactly the behaviour above.
(Alternatively, the type could instead be returned by the next-lexeme function—that’s what my stupid lexer currently does, anyway, though I’m considering changing it. The result is the same.)
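To make that guess concrete, here is a minimal Python sketch of the stale-text pattern. It is not rc.exe's code: the `Lexer` class, the `parse_resource` function, and the whitespace-only tokenization are all made up for illustration, and the two "globals" are modeled as attributes.

```python
PUNCTUATION = {"{", "}", ","}

class Lexer:
    """Hypothetical lexer: `type` is always updated, `text` only for text-like tokens."""

    def __init__(self, source):
        self.words = source.split()  # assumption: tokens are whitespace-separated
        self.pos = 0
        self.type = None
        self.text = None

    def advance(self):
        if self.pos >= len(self.words):
            self.type = "eof"        # text is left stale on purpose
            return
        word = self.words[self.pos]
        self.pos += 1
        if word in PUNCTUATION:
            self.type = "punct"      # text is left stale on purpose
        else:
            self.type = "text"
            self.text = word

def parse_resource(source):
    """Parse `<id> <type> <filename>` without ever checking the token type."""
    lexer = Lexer(source)
    lexer.advance()
    resource_id = lexer.text
    lexer.advance()
    resource_type = lexer.text
    lexer.advance()
    # The modeled bug: read the text without looking at lexer.type first.
    filename = lexer.text
    return resource_id, resource_type, filename
```

In this model both `parse_resource("1 ICON {")` and `parse_resource("1 FOO")` end up with the type doubling as the filename, which is at least the same shape as the two quirks quoted above.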
> For whatever reason, rc.exe will just take the last number literal in the expression and try to read from a file with that name [...].
I think I’m skilled enough to fuck that up too: the code to read the filename calls the expression parser (which is either not supposed to be called at EOF, or returns EOF that you’re supposed to check for in case that happens) and then blindly reaches into the lexeme text variable.
Very brave author. I have contributed a few patches to WINDRES to fix some bugs and it's a strange tool / concept.
I'm going to guess that Microsoft won't wish to fix any bugs in RC.EXE, since that would break some existing resource scripts, and at this point backwards compatibility is much more important than dealing with quirks.
I think everything labeled 'miscompilation' could be fixed without breaking backwards compatibility, since triggering them always leads to an unusable/broken .res file. No clue how likely it is they'll be fixed, though.
You're assuming that anything will notice if the .res file is broken, though. It might add a .res because "that's what Windows programs are supposed to do", or "only old versions of the program actually used that", or "everybody knows it crashes if you use that menu option, so don't use it."
But the build still depends on the compilation of resources succeeding.
For the miscompilations, the fix wouldn't add a new compile error. Instead, rc.exe would just start doing the right thing in certain scenarios, so previously broken things would start working. For example, padding bytes that were previously missing would get properly added.
It's always theoretically possible that someone, somewhere is somehow relying on such a miscompilation, but for many of the ones detailed in the article, it seems extraordinarily unlikely.
The concept is actually great, having declarative GUI components compiled into efficient binary representations, and allowing named values to be shared in a single-point-of-definition style with the code that will be handling those components at runtime.
I was thinking `NOT (1|2)` and `NOT () 2` could make sense if the parser just has a `not_in_effect` flag that gets set to true when a `NOT` is encountered and then applies to the next integer as soon as one is parsed. So `NOT (1|2)` sets the flag, then starts parsing `(1|2)`. Once it's parsed the `1`, it notices a NOT is in effect so it applies it to the 1 as if it had just parsed `NOT 1` (which leaves 0 unchanged), then parses `| 2`, so the result is 2.
`NOT () 2` would be the same logic. `)` signifies the end of an expression and thus evaluates to the current integral result, which is 0 (for the same reason that unary - is zero), and a NOT is in effect so it's treated as `NOT 0` which is a no-op ("unset no bits"). Then the next `2` makes the result `2`. This assumes that `x y` is parsed the same as `x | y` (maybe only if a `NOT` has been parsed at any point first) or as `y` (the same stack-like "the last number that was parsed becomes the result" behavior described in other items).
This doesn't explain the `7 NOT NOT 4 NOT 2 NOT NOT 1 = 2` case though. If the parser just *sets* the `not_in_effect` flag when it encounters a `NOT` (instead of *toggling* it), then this would be `7 | NOT 4 | NOT 2 | NOT 1` which would be 0. If the parser does toggle the flag, this would be `7 | 4 | NOT 2 | 1` which would be 1 or 5. If the parser treats a `NOT` as ending the previous expression (if any), this would be `7 | NOT 0 | NOT 4 | NOT 2 | NOT 0 | NOT 1` which would be 0.
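Here is the flag hypothesis as a small Python sketch, so the cases can be checked mechanically. Everything in it is an assumption about a guessed design, not rc.exe's actual parser: the `evaluate` function, its `toggle` switch, treating adjacent numbers as OR, and ignoring `(` and `|` entirely.

```python
import re

def evaluate(expr, toggle=False):
    """Evaluate a style expression under the hypothetical `not_in_effect` model."""
    result = 0
    not_in_effect = False
    for token in re.findall(r"NOT|\d+|[()|]", expr):
        if token == "NOT":
            # Either set the flag, or flip it when toggle=True.
            not_in_effect = (not not_in_effect) if toggle else True
        elif token.isdigit():
            value = int(token)
            if not_in_effect:
                result &= ~value      # NOT n: unset the bits of n
                not_in_effect = False
            else:
                result |= value       # plain n: OR it in
        elif token == ")":
            # ')' ends an expression; a pending NOT applies to 0, a no-op.
            not_in_effect = False
        # '(' and '|' need no action in this simplified model.
    return result

print(evaluate("NOT (1|2)"))                                 # 2
print(evaluate("NOT () 2"))                                  # 2
print(evaluate("7 NOT NOT 4 NOT 2 NOT NOT 1"))               # 0
print(evaluate("7 NOT NOT 4 NOT 2 NOT NOT 1", toggle=True))  # 5
```

In this model the set variant gives 0 and the toggle variant gives 5 for the last expression, so neither reproduces the observed 2.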
> My resource compiler implementation, resinator, has now reached relative maturity and has been merged into the Zig compiler (but is also maintained as a standalone project),
I was going to say scope creep, but then I remembered I’ve replaced the cross toolchain with `zig cc` in a few small cgo projects. Does zig intend to become the busybox of compilers?
Resinator's error messages look amazing! I also feel like I’ve gained a lot of cursed but useless (to me) knowledge, so thanks for that. :-)
I don't have a horse in this race, but regarding FONT resources, I would like to humbly suggest not supporting them at all. Radical, but from what you wrote, they do seem pretty weird and ripe for accidental misuse. Plus, they are obsolete and it seems like Resinator already intentionally diverges from rc.exe in a few cases anyway.
I'm actually pretty okay with where I've landed with FONT resources. The legwork has already been done in figuring things out, and with the strategy I've chosen, resinator doesn't even need to parse .fnt files at all, so the implementation is pretty simple (I wrote a .fnt parser, but it's now unused[1]).
I recently wrote an ANTLR4 parser for RC files, as part of software archeology on a legacy codebase I support. Considering how many programs over the last 30 (35?) years exist that use the Windows resource compiler, it's surprising how little in-depth information and how few open source alternative tools exist for it. So I'm really glad to see both the information and the project in this post.
That's a crazy amount of work and a crazy amount of quirks indeed. Very much illustrates a mindset where the user is at fault if they provide bad input - and development effort for everything was multiplied compared to today. In 1985, of course, nobody cared about things like security against untrusted input or reproducible builds.
My favourite bug from this list is that the compiler expands tabs to spaces in string literals and puts them at tab stops based on the string literal's horizontal position in the source file.
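For anyone who wants to see what column-dependent expansion means in practice, here is a rough Python sketch. The `expand_literal_tabs` helper is hypothetical, and the tab width of 8 is an assumption rather than something stated in this thread; it only relies on `str.expandtabs`, which pads to tab stops based on the column within the line.

```python
def expand_literal_tabs(line, start, end, tabsize=8):
    """Expand tabs in line[start:end], counting columns from the start of the line."""
    # Width of everything before the literal once its own tabs are expanded.
    prefix_width = len(line[:start].expandtabs(tabsize))
    # Expand the prefix and the literal together so columns stay source-relative.
    expanded = line[:end].expandtabs(tabsize)
    return expanded[prefix_width:]

# The same literal text, at two different horizontal positions in the source.
print(repr(expand_literal_tabs("a\tb", 0, 3)))      # 'a       b'
print(repr(expand_literal_tabs("    a\tb", 4, 7)))  # 'a   b'
```

Under that assumption, indenting the line by four columns changes the compiled string from seven spaces to three.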
I think that being able to directly define resource type 6 is not a bug. You got exactly what you asked for - an invalid resource. Crashing when loading it isn't a bug, either.
I suppose that style flag arguments are parsed as |-separated lists of numeric or NOT expressions, rather than single expressions where | serves as bitwise-or.
> If the truncated value is >= 0x80, add 0xFF00 and write the result as a little-endian u32. If the truncated value is < 0x80 but not zero, write the value as a little-endian u32.
This is sign-extension: s8 -> s16 -> u16 -> u32. The examples below this also seem to have reversed the order of the input byte and the FF.
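A quick Python check of that equivalence, as a sketch: `quoted_rule` follows the quoted description for nonzero bytes, and `sign_extend` follows the s8 -> s16 -> u16 -> u32 chain. Both function names are made up here, and the zero case is left out because the quote does not cover it.

```python
import struct

def quoted_rule(byte):
    """The quoted rule, for a nonzero value already truncated to 8 bits."""
    value = byte + 0xFF00 if byte >= 0x80 else byte
    return struct.pack("<I", value)  # little-endian u32

def sign_extend(byte):
    """s8 -> s16 -> u16 -> u32, written out step by step."""
    s8 = byte - 0x100 if byte >= 0x80 else byte  # reinterpret the byte as signed
    u16 = s8 & 0xFFFF                            # widen to 16 bits, view as unsigned
    return struct.pack("<I", u16)                # zero-extend to u32, little-endian

# The two agree for every nonzero byte value.
print(all(quoted_rule(b) == sign_extend(b) for b in range(1, 256)))  # True
print(quoted_rule(0x80).hex())  # 80ff0000
```

The second line of output also shows the byte order in question: the input byte comes first, followed by the FF.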
Visual C++ 6, at least, includes a toolbar resource editor. IIRC it shows the toolbar metadata and the bitmap together in one editor, and you edit each button's image individually even though they are concatenated into one bitmap in the resource file.
"GROUPBOX can only be used in DIALOGEX" might refer to some limitation other than the resource compiler. For example, perhaps Windows versions that don't support DIALOGEX also don't support GROUPBOX.
A lot of them could be caused by memory safety errors. For example the fact that "1 ICON {" treats "ICON" as the filename is probably because the tokenizer doesn't set the Microsoft equivalent of yytext for tokens where it's not supposed to be relevant. Maybe it would even crash (null pointer) if { could be the first token (which it can't).
I liked this article. I would suggest having a new category of “validation” for some of these. It’s not particularly fair to call something a bug, for example, when it’s just that rc.exe doesn’t play nicely with things it never expected to receive, like non-numeric characters etc.
Edit: Reminds me a bit of my adventures with the Registry, another ill-conceived part of Windows: https://rwmj.wordpress.com/2010/02/18/why-the-windows-regist...
[1] https://github.com/squeek502/resinator/blob/master/src/fnt.z...
> |-separated lists of numeric or NOT
Note that | is not the only operator that can be used in style parameters, & + and - are all allowed too.
> perhaps Windows versions that don't support DIALOGEX also don't support GROUPBOX
Seems possible for sure. From [1]:
> The 16-bit extended dialog template is purely historical. The only operating systems to support it were the Windows 95/98/Me series.
[1] https://devblogs.microsoft.com/oldnewthing/20040622-00/?p=38...
> The examples below this also seem to have reversed the order of the input byte and the FF.
Good catch, fixed