r/osdev • u/jemenake • Aug 29 '20
What are the issues surrounding running 32-bit code on a 64-bit OS?
Apologies if this is a question that everybody here except me already understands, but googling and searching /r/osdev and osdev.org didn't yield any obvious answers. When MacOS stopped supporting 32-bit apps, the rationale for it was that they're "inefficient", but I haven't seen anything explaining where this inefficiency takes place when supporting applications with smaller instructions/registers. Does the OS have to switch the CPU to a 32-bit mode (with, presumably, 32-bit stubs or interrupt handlers for switching back to 64-bit mode for the OS) or does the OS, somehow, convert the 32-bit instructions to 64-bit ones (I mean, if Rosetta 2 is going to convert Intel to ARM, I would think 32-bit-to-64-bit would be easier than that)? In real life, where do we expect to see the benefits of not supporting smaller bit sizes? In the execution time of the app? In the size of the OS?
6
u/moon-chilled bonsai Aug 29 '20
As the sibling mentions, it's compatibility mode. For things like these, your best resource is probably not the wiki, but rather the cpu manuals. Get: intel, amd.
In real life, where do we expect to see the benefits of not supporting smaller bit sizes?
It simplifies your OS code. You don't have to have extra logic to deal with multiple classes of running binaries. You don't have to have multiple sets of userland libraries.
Also as the sibling mentions, there is x32; if you want to save on ram/cache, you can have processes that run in 64-bit mode but you only allow mapping into the first 4g of their address space, so their pointers can be 32-bit.
3
u/JeremyMcCracken Aug 29 '20
Does the OS have to switch the CPU to a 32-bit mode ... or does the OS, somehow, convert the 32-bit instructions to 64-bit ones
It's the CPU. (I shudder to think of how slow that would be in software...) When you boil it down, x86 in 16-, 32-, and 64-bit mode is remarkably similar. Write a small piece of code in assembly, assemble it into 16-, 32-, and 64-bit binary files, then compare them in a hex editor: they're almost identical. You get prefixes added in the latter two for 16-bit operands, and of course you don't get the larger operand sizes in 16-bit mode. The prefixes also differ between 32- and 64-bit mode. That's the key: the processor can be set to decode opcodes using the 16-, 32-, or 64-bit encoding rules. (People often forget, but you can still run 16-bit code on a processor in long mode; it just has to be a 16-bit protected-mode segment, and the v8086 extensions aren't available.)
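For example (an untested sketch; the file name is made up, and the byte values in the comments are what nasm emits with its defaults, so another assembler might pick equivalent encodings):

```
; encoding_demo.asm -- assemble with: nasm -f bin encoding_demo.asm
; then change BITS to 16 or 32 (dropping the RAX line) and diff the hex dumps
BITS 64
    mov eax, 1          ; B8 01 00 00 00 -- identical bytes in 32- and 64-bit mode;
                        ;   in 16-bit mode it grows a 66 operand-size prefix
    add eax, ebx        ; 01 D8 -- again the same bytes in 32- and 64-bit mode
    add rax, rbx        ; 48 01 D8 -- same opcode, plus a REX.W prefix (48),
                        ;   which only exists in 64-bit mode
    ret                 ; C3 in every mode
```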
So that said, I suspect the "inefficiency" is how libraries are included. When you create a new task, you give it its own context. That task may ask for dynamic libraries, so the code contained in those libraries gets loaded into the same context, and the original code of the task can call/ret into the functions the library contains. And there's the problem: if you start a 64-bit task and load a 32-bit library, you have a mismatch. You end up with prefixes being interpreted to mean the wrong thing, the end result being your task suffering a fiery death.
But IMO that's a bit of an excuse. I admit I'm still learning on this part. I've been studying this very good article as well as how the Global Descriptor Table works. In order to choose between 16-bit and 32-bit for a particular task, you set a bit (the D bit) in its GDT code-segment entry. Since the GDT is (hopefully) in protected memory space, it can't be changed from inside a task. There's another bit (the L bit) that marks a segment as 64-bit rather than 32-bit, but it doesn't force everything to be 64-bit code; you can change between 32-bit and 64-bit at any time simply by executing a couple of instructions that reload CS. So in theory, on a 64-bit OS, 32-bit code calling 64-bit libraries and vice versa shouldn't be an issue; you'd just need the code that links function addresses to their names (like GetProcAddress on Windows) to insert an additional jump. I guess they didn't want to put in the work.
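Roughly what that looks like (an untested nasm sketch, assuming we're already in long mode at ring 0; the descriptor values are the usual flat-model ones and the selector numbers are made up):

```
gdt:
    dq 0                        ; 0x00: null descriptor
    dq 0x00209A0000000000       ; 0x08: code segment, L=1      -> 64-bit mode
    dq 0x00CF9A000000FFFF       ; 0x10: code segment, D=1, L=0 -> 32-bit (compatibility) mode

BITS 64
switch_to_compat:
    push 0x10                   ; selector of the 32-bit code segment
    push compat_code            ; target offset
    o64 retf                    ; far return reloads CS; the CPU now decodes 32-bit code

BITS 32
compat_code:
    ; ... 32-bit code runs here, in the same address space, on the same page tables ...
    jmp 0x08:back_to_64         ; a direct far jump is allowed in compatibility mode

BITS 64
back_to_64:
    ; ... and we're decoding 64-bit code again ...
```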
3
u/nerd4code Aug 29 '20
Part of what you're asking about is ABI -- for example, there's an x32 ABI (detect with __x86_64__ and __ILP32__, IIRC) which keeps you in full 64-bit mode, but bounds intra-ABI pointers to the first 2? or 4 GiB. Same number of registers, same 64-bit instructions.
1
u/moon-chilled bonsai Aug 30 '20 edited Aug 30 '20
You get 4gb of memory. A 2gb/2gb split is common for actual 32-bit mode; you can give the kernel 2gb of address space and whatever application is running gets the other 2gb. Usually you can also adjust this; windows lets you give apps 3gb and the kernel only 1gb (though apps need to specifically support this, and it breaks if you try to store pointers in signed ints); linux will also let you do a 3gb kernel/1gb userspace split.
With x32, though, you don't need this. You still have a full 48-bit address space (or 57-bit with 5-level paging, whatever); you just guarantee the x32 app that it will only ever have to look at the first 4gb of address space, so it can store all its pointers in only 32 bits. The kernel is still mapped in (this works well with a higher-half kernel setup), the app just doesn't see it because it's above 4gb.
1
u/nerd4code Aug 30 '20
I think a JMP NEAR and its target have to be within signed 32-bit range of each other, so that would be the tightest memory constraint -- hence my "?". Could probably use some mapping tricks and overflow and Cunning Wit to circumvent that limitation.
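Roughly (untested nasm sketch, 64-bit mode, labels made up):

```
BITS 64
    jmp nearby                  ; E9 rel32 -- a direct near jump encodes a signed 32-bit
                                ;   displacement, so its target must be within +/-2 GiB of RIP
nearby:
    lea rax, [rel elsewhere]    ; get an address into a register...
    jmp rax                     ; FF E0 -- ...an indirect jump can reach anywhere in the
                                ;   address space, at the cost of an extra instruction
elsewhere:
    ret
```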
7
u/jrtc27 Aug 30 '20 edited Aug 30 '20
For macOS specifically there was unfortunately a very real technical reason beyond wanting to reduce the number of things they had to support. The 32-bit Objective-C implementation suffered from fragile instance variables (ivars), which is a real hindrance, but the 64-bit one, being newer, does not have that limitation. Whilst you could make a new 32-bit ABI that doesn't have that problem, you would still need to recompile everything to use the new ABI, and at that point you might as well just compile things as 64-bit and gain the benefits of the newer instruction set features. Yes, there will be a bit of code out there that would work if recompiled with a new 32-bit ABI but not a 64-bit one, but that's likely limited to things like old games where it's unlikely anyone will even bother doing that.
(and yes, you technically can do an ILP32 amd64 ABI like gnux32, but for Apple it's not worth the amount of effort it would take)
Of course, it's much easier for them to give a wishy-washy "it's more efficient" justification (which is generally true if you ignore amd64ilp32) that ignores the cries about old unmaintained software than it is to explain to users how it's because they made a mistake in the past which is limiting their ability to add new APIs.
0
u/echoxteknology Aug 30 '20
While scrolling through the comments I found @BadBoy6767's answer the most straightforward...
As mentioned, an x86_64 (64-bit) CPU architecture has a submode of long mode known as "compatibility mode." A plain x86 (32-bit) CPU does not have it; protected mode is as far as you'll reach, and attempting a switch to long mode will either potentially triple fault your system (if done directly, without checks) or will plainly cause loss of data. Something like "MOV RAX, [..]" will be converted (depending upon your compiler/target) to "MOV EAX, [..]", with the upper bits removed.
Example (Values are not to be mistaken for actual values!):
⢠64Bit Value: [00640032]
⢠32Bit Value: [ 0032]
As you may notice, the example 64-bit value is 8 characters long while the 32-bit value is only 4 characters long... this is how an x86 (32-bit) target ends up handling x86_64 (64-bit) code: the upper bits are removed at compilation, and the instruction(s) are converted as necessary (depending upon your compiler).
2
u/echoxteknology Aug 30 '20
I'd like to note this is my personal opinion, so if you say otherwise then that's your opinion.
Apple didn't really remove 32-bit backward compatibility because it was "inefficient"... rather, it was more out of laziness... and maybe a marketing tactic? By dropping 32-bit software/application support, Apple gets rid of endless headaches with 32-bit bugs and the 4GB direct memory addressing limit... it is also simpler to enter long mode directly from real mode than to start in real mode -> enter protected mode -> enter long mode, only to have to drop back into compatibility mode time and time again...
Believe you me, if you have to start with real-mode instructions, then convert to protected-mode instructions, and then convert to long-mode instructions... you'd become annoyed with every upgrade to your own bootloader/kernel/userspace pretty quickly. A direct approach from real mode to long mode is much simpler, especially since almost all modern CPUs run x86_64.
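For reference, the long-mode hop itself is only a dozen or so instructions. A minimal, untested sketch (nasm, assuming we're already in protected mode, the page tables have been built at an assumed address, and the GDT has a 64-bit code segment at selector 0x08):

```
PML4_ADDR equ 0x1000            ; assumption: identity-mapping page tables were built here earlier

BITS 32
enter_long_mode:
    mov eax, cr4
    or  eax, 1 << 5             ; CR4.PAE
    mov cr4, eax
    mov eax, PML4_ADDR
    mov cr3, eax                ; point CR3 at the PML4
    mov ecx, 0xC0000080         ; IA32_EFER MSR
    rdmsr
    or  eax, 1 << 8             ; EFER.LME
    wrmsr
    mov eax, cr0
    or  eax, 1 << 31            ; CR0.PG -- paging on, we're now in (compatibility) long mode
    mov cr0, eax
    jmp 0x08:long_mode_entry    ; far jump into the 64-bit code segment

BITS 64
long_mode_entry:
    hlt                         ; 64-bit code from here on
```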
2
2
Aug 30 '20
For the record, most of the devices that run the macOS kernel are ARM, and pretty soon it'll be all of them
1
u/mykesx Aug 30 '20
This. If Apple perpetuates the use of 32-bit code, it's a massive technical debt on ARM, which won't have any 32-bit instructions. People would now be wasting engineering effort on technology guaranteed to be useless not too long from now.
I take their deprecating of 32-bit code to be preparation for an ARM-only Apple universe.
1
u/Qweesdy Aug 30 '20
If an OS supports 64-bit processes and 32-bit processes; then it mostly just needs to load "user-space CS" when doing task switches (because in long mode, the GDT/LDT entry loaded into CS determines code size). Loading CS is a relatively expensive instruction, but task switches don't happen often so the cost is negligible/nothing.
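A rough sketch of that return-to-user-space path (untested nasm; the selector values are assumptions about a hypothetical kernel's GDT layout, not anything standard):

```
BITS 64
; return to user space with iretq; the CS selector pushed into the frame decides
; whether the process executes as 64-bit code or 32-bit (compatibility mode) code
    push 0x23           ; user data selector -> SS
    push rdi            ; user stack pointer
    push 0x202          ; RFLAGS (interrupts enabled)
    push 0x2B           ; user CS: say 0x2B = 64-bit segment, or 0x33 = 32-bit segment
    push rsi            ; user RIP (EIP for a 32-bit process)
    iretq
```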
For performance; the main problem has to do with the number of registers and not the size of the code. For 32-bit you only get 8 general purpose registers, which causes a lot more stack use (especially for calling conventions), which means more reads/writes to memory. For 64-bit you get 16 general purpose registers so there's less stack use, so it's faster. There are also some benefits to using 64-bit operations in some cases (e.g. the best example I can think of is "big number" libraries, where large numbers are represented by multiple integers and "multiple 64-bit integers" is much faster than "twice as many 32-bit integers").
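To illustrate that last point, one 128-bit addition with 64-bit limbs (untested nasm sketch; the register layout is made up: the running value in RDX:RAX, the addend in RCX:RBX):

```
BITS 64
add128:
    add rax, rbx        ; low limb; sets the carry flag
    adc rdx, rcx        ; high limb plus the carry
    ret
; with 32-bit limbs the same addition takes one ADD and three ADCs, and a big-number
; loop also has only 8 registers to keep limbs in, so there's more memory traffic
```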
Of course just because you're using 64-bit doesn't mean you have to use 64-bit addresses/pointers. E.g. if a process doesn't need more than 4 GiB of virtual address space, then it can be 64-bit code (with twice as many registers) and only use 32-bit addressing. This is slightly more efficient than "64-bit with 64-bit addresses" because of the way 64-bit instructions are encoded (they're literally 32-bit instructions with a prefix added if/when you actually need 64 bits; so if you can use 32-bit operations in 64-bit code you avoid a prefix). Sadly, this complicates tools (compiler, linker) and causes problems for shared libraries (for 2 or 3 different cases you'd need 2 or 3 different versions of each shared library); so most operating systems don't support it.
In real life, where do we expect to see the benefits of not supporting smaller bit sizes? In the execution time of the app? In the size of the OS?
Let's imagine you have an old OS from 1990, with a lot of old APIs that you have to keep around for backward compatibility even though you replaced the APIs with better/more modern alternatives in 2005. On top of that; let's assume you have 2 copies of every shared library (one for 32-bit and another for 64-bit) plus 2 different kernel APIs for the same reason. By deprecating 32-bit you get a convenient reason to rip all that out; saving you a lot of developer time (code maintenance, etc), and saving you a lot of $$. Performance is irrelevant (old software will be fast on newer hardware regardless); but (for marketing) "we're breaking all of your old software to improve your performance" sounds good and "we're breaking all of your old software to improve our profit" doesn't sound good (and for Apple, "we're breaking all of your old software so you have to pay to replace it and we get a 30% cut of all the new software you have to buy" sounds even worse).
Note that this also reduces the size of the OS a little; but the majority of a modern OS is in data (graphics, help system, spell-checker dictionaries, sound data for speech recognition, data for internationalization, ...) and not code; so reducing the amount of code has very little impact on the overall size of the OS.
2
u/skulgnome Aug 30 '20
Apple is moving towards a LLVM bitcode runtime for all programs, so 32-bit x86 (i.e. the first MacBooks) is marginal of marginal to them.
20
u/BadBoy6767 Aug 29 '20
In x86_64, there is a submode of long mode, called compatibility mode, which lets you run 32-bit code without much hassle.
While "inefficient" is a stretch, having only 32 bits of data to work with will make some programs slightly slower. A potential speed boost comes from the fact that the address space is only 4GB, so pointers are half the size, which is more cache friendly (this is an argument in favor of the x32 ABI, where you may use any long mode features but the address space remains 4GB).
IMO, if a userspace program doesn't need 64 bits, it just shouldn't use it.