ELF Crimes: Program Interpreter Fun
December 21, 2025
For reasons I don’t even remember, I was reading about the details of the ELF executable format, and stumbled across the program interpreter functionality used for dynamic linking. Immediately I was struck by one of the more cursed and yet simultaneously pedestrian ideas I’ve had, but we’ll get back to that.
For those unacquainted (because who is?), the ELF standard specifies a PT_INTERP type of program segment, which is what makes dynamic linking work. The PT_INTERP segment contains a single string that specifies a hardcoded path to the dynamic linker/loader (the interpreter), typically /lib64/ld-linux-x86-64.so.2 or such. When present, it tells the OS loader that the program can’t be executed as-is: it first needs to run the dynamic linker found at that path, which will go load all the dynamic libraries into memory and then update the addresses referenced by the executable to point at the libraries’ locations in memory. Then it can run the original executable, like the OS loader would’ve normally done.
But wait, we’re getting ahead of ourselves. The above is a description of what the dynamic linker does holistically. The ELF interpreter functionality in particular is much much simpler: after the to-be-interpreted executable is loaded into memory,1 the OS then loads the interpreter executable into the same process address space, and executes the interpreter’s code. That’s literally it; the OS doesn’t invoke the interpreter specially, it doesn’t expect the interpreter to have any specific behavior or functionality, it doesn’t even expect the to-be-interpreted executable to ever be run. It just loads both executables and expects the interpreter to do everything else that needs to be done.2
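You can see this machinery on basically any dynamically linked binary already on your system; readelf will happily show the program headers, PT_INTERP included (the binary and interpreter path here are just the usual x86-64 Linux suspects, yours may differ):

```sh
# Dump the program headers of an ordinary dynamically linked binary.
# readelf prints the PT_INTERP segment's embedded path explicitly
# (output heavily trimmed; offsets and exact paths vary by system).
readelf -l /bin/ls
#   ...
#   INTERP ...
#       [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
#   ...
```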
This opens up some… interesting possibilities. For instance, imagine using ELFs as a data storage format, where you can directly run them to open them in the program, no need for any of that gosh dang xdg-open or MIME types based off of file extension or any of that nonsense! Or now your programming language’s bytecode is on equal footing with natively compiled languages! Clearly it’s such a good idea, I can’t see why seemingly no one has ever done this before (and written about it)!
Now, the ELF designers did seem to envision some other uses since they used the generic term interpreter rather than something specific to linking; but I assume they were imagining something, you know, sensible XD.3 Even bytecode is a step too far I think, since it’d still be solely cross-platform data needlessly put in a native executable shell.
But of course I would be a Fool to let anything like being sensible stop my grand plans!
Trying It
After conceiving of this idea, for some reason my first thought was not to use a linker script, or even to write a program to generate an ELF from scratch. Instead, I decided to write an ELF by hand in a hex editor! Genius!
Naturally that was abandoned after a half-hour. It went great for a bit: I wrote a simple executable that would just exit immediately. But then I had to add a PT_INTERP program header and manually update all the offsets in the file to their new positions, and I realized it was untenable immediately after that. Although I think it ultimately was a useful exercise, great for learning the intricacies of the underlying format, which is much simpler than you might expect (given the mythology that surrounds compilers and linking and stuff in programmer culture).
I switched to the much more straightforward option of using a linker script and objcopy. And even though I’m generally pretty familiar with linker scripts, when switching I had to first learn about the existence of the PHDRS command to manually specify the program segments that get loaded into memory4, so that’s another actually useful thing I’ve learned from this project.
This implementation was pretty simple since it doesn’t need any of the actual dynamic linking stuff like symbol tables or compiler metadata; it only needs a section with all the data and a section with the interpreter path. And the interpreter has no particular requirements either, other than being statically linked and not overlapping with the interpreted executable’s memory.
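To give a flavor of what that looks like, here’s a minimal sketch in the spirit of the repo’s interpreted-data.ld (not the real thing: the section names, load address, and how the interpreter path gets in are all just illustrative). The PHDRS command declares the two segments by hand, and each output section gets assigned to one of them:

```ld
/* Minimal sketch, not the actual interpreted-data.ld from the repo.
 * Assumes two input objects: one (made with objcopy) whose payload
 * lives in a section named .blob, and another providing a .interp
 * section containing the absolute path of a statically linked
 * interpreter binary. */

PHDRS
{
  data   PT_LOAD;    /* the payload, loaded into memory by the OS loader */
  interp PT_INTERP;  /* the hardcoded interpreter path */
}

SECTIONS
{
  . = 0x400000;                     /* arbitrary load address */

  .blob   : { *(.blob) }   :data    /* the raw data to be "interpreted" */
  .interp : { *(.interp) } :interp  /* the path the OS loader hands off to */
}
```

ld will still grumble about a missing entry symbol and such, but segment-wise that really is all there is to it.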
But after getting the linker script setup working, even though it seemed like it should’ve been enough, it absolutely refused to work. It would only give me an opaque “exec format error”, no matter how many different shotgun-debugging things I tried; I ended up spending something like six hours over the course of two days on it with no real progress. I went on the Fediverse and asked, and some very smart people gave me all the esoteric edge cases they could think of, still to no avail. It took digging into the kernel’s ELF loader code (on the advice of someone on Fedi) and looking at all possible instances of ENOEXEC before I finally found the issue: the interpreter path in the executable needs to be NUL-terminated. After making that trivial one-byte change, even the first, simplest version started working perfectly…
To be fair, I don’t think I’m totally to blame for that one. The interpreter path already has an explicitly-specified length in the ELF program headers, so why does it also need to be NUL-terminated? Additionally, readelf helpfully displays control codes for non-printing characters in the interpreter path, which I knew because I’d accidentally left a newline at the end of the path when I first started. But then it helpfully elides the NUL terminator at the end, while also not warning if it’s missing a NUL. This meant that when comparing binaries with readelf, a working binary and my non-working binary would appear identical. I filed a bug about this and offered to write a patch myself, but as of publishing this post it unfortunately hasn’t been responded to by the developers.
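If you ever need to check for yourself, readelf’s hex-dump mode does show the raw section bytes, trailing NUL (or lack thereof) included; the section and file names below come from my sketch above, not necessarily from any real build:

```sh
# Hex-dump the section holding the interpreter path; the very last byte
# shown should be 00. (.interp and ./interpreted-data are the names from
# the sketch above; substitute whatever your build actually produces.)
readelf -x .interp ./interpreted-data
```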
Final Result
It’s a proof of concept that doesn’t do anything interesting in and of itself. The data in the interpreted executable is just hello, world!, and the interpreter reads that string and prints it to the console. Sort of a black triangle: all the interesting stuff is in how it gets there behind the scenes. Although to make it more interesting, all one needs to do is specify an alternate data file as the first parameter to generate.sh and change the code in the body of the interpret function, and it can do whatever you want, since it’s basically just passing a byte array from the data file to the interpreter code.
The bulk of the interesting stuff is in interpreted-data.ld, the end of generate.sh, and the interpreter code. The first and second link the interpreted data together into the interpreted executable, and the interpreter code pulls the data back out and does whatever with it.
Ultimately, I’d say it ends up being surprisingly trivial to do (when one isn’t making stupid mistakes); it’s just well off the beaten path. I think I’m pretty happy with the result; it was a fun excuse to learn a lot about loading and other stuff that’s not usually very visible. I find very unintended hacks like this generally amusing too, so it is inherently Neat.
Disclaimer: Do Not Do This!
In case it isn’t clear, you absolutely should not do this for real use XD. Just use a shebang; the mild portability issues with those pale in comparison to the issues with using an ELF file. It doesn’t matter if it’s not a textual format: I know for a fact that I’ve seen otherwise fully-binary formats have their magic number be #!/usr/bin/env so their binary files could be directly executed on at least a decent majority of *nix systems.
Or if a shebang really isn’t enough for you, there’s always binfmt_misc, as used by Java, Wine, QEMU, et al. It lets you register custom executable handlers as long as there’s a clear magic number in the file or a consistent file extension.
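Registering a handler is basically a one-liner into procfs. Here’s a hypothetical format keyed on a made-up four-byte magic (the name, magic, and handler path are all invented for illustration; the kernel’s binfmt_misc documentation explains each colon-separated field):

```sh
# Register a hypothetical "mydata" format with binfmt_misc: any file whose
# first four bytes are "MYDT" gets handed to the handler below instead of
# being rejected by the normal loaders.
# Field layout: :name:type:offset:magic:mask:interpreter:flags
# (needs root, and binfmt_misc mounted in its usual place)
echo ':mydata:M::MYDT::/usr/local/bin/mydata-handler:' \
    > /proc/sys/fs/binfmt_misc/register
```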
I won’t even attempt to enumerate all the issues with using ELF like this, but here’s a few off the top of my head:
- The executable is hardcoded for one architecture5, even though it’s just data and could work anywhere if you used any normal format.
- This is doing something that’s normally expected to only be done by some of the lowest-level userspace code on the system. The portability isn’t as bad as one might think, but it’s still far less portable than any format actually designed for the job. Even fundamental things are nonportable, e.g. the footnote above about how the interpreted executable can either be loaded into memory or the interpreter can just be given a file descriptor for the (unloaded) interpreted executable.
- It requires careful placement of all loadable segments in the interpreter and data file, otherwise the interpreter will collide and happily clobber the already-loaded data.
- It just radiates jank. This is not something even kernel/compiler hackers would normally do, so tooling is not really oriented towards doing stuff like this. Also, have fun finding documentation when you need it, since no sensible person would want to do anything like this!
1. n.b. the ELF spec says the to-be-interpreted executable can be loaded into memory by the ELF loader. However, it also allows the OS loader to only load the interpreter into memory, and then pass it a file descriptor to the to-be-interpreted executable in the auxiliary vector, expecting the interpreter to sort out the initial loading on top of everything else. AFAICT from reading binfmt_elf.c, Linux will always do the former, even though some scattered old Linux docs and elf(5) claim it might do either. Regardless, other OSes using ELF could easily do either; I haven’t looked.
2. As a fun side effect, this means that the dynamic linker, despite typically having a .so extension (and also being marked internally as a shared library), can actually be run like a normal executable. Try running /lib64/ld-linux-x86-64.so.2 --help sometime!
3. I’m guessing they were probably thinking about things like fat (multi-architecture) binaries, or something else involving native code where the non-portability of ELF still makes sense. Or maybe they were only thinking of dynamic linking, but like a lot of format designers/standardizers they were allergic to using the standard jargon everyone in the space already knows.
4. ELF distinguishes between sections and segments. Sections are semi-abstract named blocks of data stored in the object/executable file for use in linking (static or dynamic) and relocation, even if they don’t get loaded into memory at runtime. They’re the things programmers usually interact with if they have to think about ELF. Segments are sets of data that actually get loaded into memory for use by the program, as well as listing a few other things needed by the program loader: the program interpreter path, dynamic linking symbol tables, etc. Note that sections and segments are essentially different views into the same set of data within the file, so data can be owned by both a segment and a section. E.g. for those special loader metadata segments, there are usually sections and segments that directly correspond, like a .dynamic section that holds the data also referenced by the PT_DYNAMIC segment.
5. You can make ELFs with no specific instruction set (in fact my demo repository does that for the intermediate data object files before linking), which in theory would allow this technique to only be dependent on endianness and word width, not ISA or ABI. However, since this depends on the OS’s loader accepting and loading the executable, the data file is forced to be hardcoded to a specific architecture supported by the machine.
   You could still parse the data out by directly invoking the interpreter with the data ELF as an argument, since that can be done architecture/ABI/endian independently, but why would you use ELF instead of a dedicated data-only format then???
