Protected Mode On A Z80! (Almost)

The microprocessor feature which probably most enables the computing experience we take for granted today is protected mode. A chip with the required hardware can run individual software processes in their own environments, enabling multitasking and isolation between processes. Older CPUs lacked this feature, meaning that all the resources were available to all software. [Andy Hu] has done the seemingly impossible with a Zilog Z80, enabling a protected mode on the chip for the first time in over four decades. Has he found an elusive undocumented piece of silicon missed by every other researcher? Not quite, but it is a clever hack.

The Z80 has two address spaces, one for memory and the other for I/O. He’s taken the I/O request line and fed it through a flip-flop and some logic to trigger a hardware interrupt the first time an I/O call is made or when a RST instruction is executed. Coupled with a small piece of memory for register contents, he’s made a Z80 with a fully-functional protected mode, for the cost of a few logic chips. It’s explained in the video below the break, and we hope you agree that it’s rather elegant given the resources at hand. It’s too late for the commercial 8-bit machines of the past, but it would be interesting to see what today’s retrocomputer designers make of it.
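The mechanism can be pictured in software. This is a minimal, hypothetical sketch of the trap logic, not [Andy Hu]'s actual circuit: the class names, the trap-handling policy, and the specific ports and vectors are all illustrative assumptions.

```python
# Hypothetical simulation of the external trap circuit described above.
# The flip-flop tracks supervisor vs. user mode; details are assumptions,
# not taken from the actual design in the video.

class ProtectedZ80:
    """Models the mode flip-flop bolted onto the Z80's bus signals."""

    def __init__(self):
        self.supervisor = True   # flip-flop set: boot code runs privileged
        self.trap_log = []

    def return_to_user(self):
        # The OS clears the flip-flop before jumping into a user process.
        self.supervisor = False

    def io_request(self, port):
        # In user mode the IORQ line is gated into the interrupt input,
        # so the access traps instead of reaching the bus.
        if not self.supervisor:
            self.supervisor = True          # hardware re-sets the flip-flop
            self.trap_log.append(("trap_io", port))
            return None                     # OS decides whether to allow it
        return f"io[{port}]"                # privileged access goes through

    def rst(self, vector):
        # RST doubles as the system-call entry: it always switches
        # back to supervisor mode before the handler runs.
        self.supervisor = True
        self.trap_log.append(("syscall", vector))


cpu = ProtectedZ80()
cpu.return_to_user()
cpu.io_request(0xFE)        # user-mode I/O -> trapped
cpu.rst(0x08)               # deliberate system call
print(cpu.trap_log)         # [('trap_io', 254), ('syscall', 8)]
```

The point of the hack is that both entries into supervisor mode happen in hardware, so user code cannot simply skip them.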

27 thoughts on “Protected Mode On A Z80! (Almost)”

  1. Memory protection is less important than the virtual memory mapping aspect of protected-mode execution. Though I guess even that is just a few chips away: you could, e.g., stick in an SRAM chip that maps the higher bits of the address bus and that can be filled in by the system code.

    1. I wonder how exactly you’d do that. If you have an OS Scheduler, would that then set a few latched bits to change the mapping “page” of the SRAM address remap chip to the appropriate process in question?
      That should be doable… Actually, that would be really cool! This might even allow for some really interesting other hacks, like mmap-ing by setting the remap-SRAM chip’s address for a certain address range to set a hardware pin that triggers an interrupt, which then takes over and feeds in mapped data.

      Fun ideas :>
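      The scheduler-latches-a-page idea above could look something like this toy model. The page sizes, the per-process table layout, and the register names are all invented for illustration; a real mapping SRAM would just be wired between the CPU's high address bits and the memory array.

```python
# Rough sketch of the SRAM-remapper idea floated above: the top address
# bits index a small mapping RAM, and the scheduler latches which set of
# entries is active. All sizes and names here are made up.

PAGE_BITS = 3                     # 8 pages of 8 KB in a 64 KB space
PAGE_SIZE = 1 << (16 - PAGE_BITS)

class Remapper:
    def __init__(self):
        # One row of mapping entries per process, as if the SRAM were
        # banked by a latched "current process" register.
        self.tables = {}
        self.current = None

    def load_table(self, pid, physical_pages):
        # System code fills in the mapping SRAM for a process.
        self.tables[pid] = list(physical_pages)

    def switch_to(self, pid):
        # The scheduler latches the process id before resuming it.
        self.current = pid

    def translate(self, virtual_addr):
        page = virtual_addr >> (16 - PAGE_BITS)
        offset = virtual_addr & (PAGE_SIZE - 1)
        physical_page = self.tables[self.current][page]
        return physical_page * PAGE_SIZE + offset


mmu = Remapper()
mmu.load_table(1, [0, 1, 2, 3, 4, 5, 6, 7])        # identity map
mmu.load_table(2, [8, 9, 10, 11, 12, 13, 14, 15])  # second 64 KB bank
mmu.switch_to(2)
print(hex(mmu.translate(0x0100)))   # 0x10100: same offset, different bank
```

      The mmap-style trick mentioned above would amount to a mapping entry that asserts an interrupt pin instead of selecting a physical page.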

        1. A friend of mine at college used the I/O lines on a Sinclair Spectrum to enable bank switching: `out 31,x` selected the bank, with 0 being the internal top 32KB of RAM, so it could enable 255 further banks of RAM, giving 8MB of RAM on a Spectrum. His ROM and the logic mod to the motherboard made sure that bank 1 was protected and could only be accessed by the ROM.

      1. C64 had this designed in. Parts of the 64KB RAM space were overlaid by OS and character ROM. A single address “register” was used to configure the current “page” selection. Further RAM extensions used the same technology, as did the first ’86 PCs to exceed the initial 640KB RAM (which, as we know, should be more than enough for anyone). Different technologies were identified either by “extended” or “expanded” memory. Those were the days…

        1. Actually… IBM PC compatibles had a 640KB limit due to the separation into conventional DOS memory (~0–640KB) and the Adapter Segment (640KB–1MB, the UMA, with free RAM spots known as UMBs).

          However, this was merely an artificial limitation.
          MS-DOS compatibles often didn’t have it;
          some had much more than 640KB of contiguous DOS memory.

          The 640KB limit existed, because the video memory for EGA/VGA followed directly after 640KB.

          With a Hercules card, 704KB of DOS memory is available. With a CGA, it’s 736KB. With both installed (dual monitors setup), it’s 704KB.

          There’s even a period correct modification for the IBM PC 5150 from the 1980s, which allowed expanding RAM on motherboard beyond 640KB!

          https://www.vogons.org/viewtopic.php?f=9&t=69895

          https://www.youtube.com/watch?v=8nMB8XvwUJo

          With no video card and a serial terminal (who remembers DOS’ CTTY command?), up to ~900 KB are possible, depending on the size of the BIOS/firmware and Option-ROMs located in the Adapter Segment/Upper Memory Area.

          Anyway, you’re right about Bank-Switching on x86! 😎
          “Expanded Memory” (aka EMS) was introduced to work with large data on 808x IBM PCs.
          More precisely, it was made because Lotus 1-2-3, an important application of the 1980s, ran out of memory.

          Originally, an EMS board was installed, which carried up to 2MB of memory. Later, EEMS and LIM4 specifications supported a lot of clever things with such boards!

          DESQview used it for real preemptive multitasking on IBM PCs/IBM ATs.
          This was before the 80386 and V86 mode were used by memory managers (EMM386, QEMM).

          Intelligent chipsets, like the NEAT chipset on 80286 motherboards, had a dedicated, external MMU which did memory management for the 80286 (which had a fine MMU itself, but it didn’t do paging yet).

          Some Turbo XT chipsets had EMS in the chipset, too.
          Famous were the SoCs based on the NEC CPU family (V20/V30); their later implementations supported Bank-Switching/EMS.

          Last, but not least, so-called LIMulators were around. They simulated EMS by copying memory around using XMS (EMM286) or by using a swap file on the HDD.

          The latter worked on XTs, too. But an MFM/RLL fixed disk wasn’t exactly the fastest of devices, hi. 🙂

          That was possible because EMS is a software specification. DOS programs using EMS talk to the EMS memory manager; it’s then up to this piece of software to provide the memory. It doesn’t matter to the program where it really is.

          All that matters is the 64KB EMS page frame in the 640KB–1MB region (usually; it can be anywhere below 1MB), which consists of 4 pages of 16KB each. Or more: there’s also a Large Page Frame EMS with a 256KB window. It’s preferred by DESQview and Windows 3.0 (real-mode kernel).
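          The page-frame scheme above can be modelled in a few lines. The frame address, board size, and method names below are illustrative assumptions, not the actual LIM EMS driver interface:

```python
# Toy model of the EMS scheme described above: a 64 KB page frame in the
# upper memory area, split into four 16 KB slots, each mappable to any
# 16 KB logical page of expanded memory. Addresses are illustrative.

PAGE = 16 * 1024
FRAME_BASE = 0xE0000              # a common (but not universal) frame address

class EmsBoard:
    def __init__(self, total_kb=2048):           # a 2 MB board, as above
        self.expanded = bytearray(total_kb * 1024)
        self.mapping = [0, 1, 2, 3]              # logical page per slot

    def map_page(self, slot, ems_page):
        # The DOS program asks the EMS driver to map a logical page
        # into one of the four physical frame slots.
        self.mapping[slot] = ems_page

    def read(self, addr):
        # CPU accesses inside the frame land in whichever logical
        # page is currently mapped into that slot.
        slot, offset = divmod(addr - FRAME_BASE, PAGE)
        return self.expanded[self.mapping[slot] * PAGE + offset]


board = EmsBoard()
board.expanded[100 * PAGE + 5] = 42      # data parked in EMS page 100
board.map_page(0, 100)                   # bring it into the frame
print(board.read(FRAME_BASE + 5))        # 42
```

          This is also why a LIMulator works: the program only ever calls the driver, so the driver is free to back those "pages" with XMS copies or a swap file instead of a real board.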

    2. Personally I think virtual memory mapping is fairly unnecessary in general.

      It really doesn’t take much effort to make an application able to run at any arbitrary point in memory. Relative calls for reading data and jumping about in code aren’t anything new, and largely solve the issue of hardcoded absolute addresses.

      In essence, virtual memory mapping adds hardware complexity for the sake of giving the programmer a comfortable “start” to their program so that developers can hardcode memory jumps with absolute addresses.

      If these hardcoded jumps used relative addresses, then there is no real reason to have virtual memory mapping.

      And more complex interactions with other system components/services/APIs/etc. are generally placed rather arbitrarily in memory regardless. So these wouldn’t be hardcoded, and the application will need to handle them dynamically whether or not virtual memory mapping is a feature of the CPU.

      (I will say, though, that memory-mapped IO is more useful, but it has relatively little to do with virtual memory mapping for a process’/thread’s data/code.)

        1. The 6502 is however a very old architecture built at a time when even a few KB of RAM were considered a lot of memory.

          I was considering things more from the modern perspective.

          By the time we get up to having a few MB or more of RAM, and an architecture designed for more modern tasks, virtual memory mapping is honestly rather unnecessary, since there will be rather decent relative addressing instructions available.

          And even if the instruction set lacks an adequate long distance relative jump, we most likely have the option to use a register as a pointer. If we also lack this, then the architecture isn’t particularly good.

          (I should likely clarify that I am not talking about the 6502, or the Z80, or any specific architecture. Just that architectures that focus on more modern applications don’t really need virtual memory mapping. (Other than for the fact that a lot of programs currently aren’t using relative jumps, since virtual memory mapping is a thing. And before virtual memory mapping, computers tended to run just one thing at a time, so absolute addressing wasn’t a problem, since no other applications were expected to run in parallel. Also a bit of an oversimplified explanation as well.))

          1. There’s no need for relative or indirect addressing to avoid virtual memory, relocation at load time can do the trick. This method is even older than virtual memory and demand paging. Most modern operating systems support and use it.
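             Load-time relocation as described above can be sketched quickly. The relocation-table format here is invented for illustration; real formats (e.g. in executable file headers) differ, but the principle is the same:

```python
# Minimal sketch of load-time relocation: the binary ships with a table
# of offsets whose stored addresses assume base 0, and the loader
# patches in the real load address. The table format is invented.

def relocate(image, reloc_offsets, load_base, width=2):
    """Patch each little-endian address in `image` by `load_base`."""
    image = bytearray(image)
    for off in reloc_offsets:
        addr = int.from_bytes(image[off:off + width], "little")
        image[off:off + width] = (addr + load_base).to_bytes(width, "little")
    return bytes(image)


# A fake 16-bit image: an absolute jump target 0x0010 stored at offset 1.
prog = bytes([0xC3, 0x10, 0x00])         # e.g. Z80 "jp 0x0010"
loaded = relocate(prog, reloc_offsets=[1], load_base=0x8000)
print(loaded.hex())                      # 'c31080' -> jp 0x8010
```

             The cost is a one-time pass at load; after that the code runs with plain absolute addresses, no MMU required.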

          2. There’s more to an MMU than isolation and relocation. In a long running OS, you’ll incur memory fragmentation just like on a disk as processes come and go. Relative addressing alone does not resolve this issue.

            A good MMU allows the OS to create a contiguous process address space from scattered memory segments rather than wasting cycles and bandwidth physically copying process code and data for defragmentation purposes. This makes the fragmentation largely irrelevant to performance outside of very specific cases, which can be a huge win.

          3. hartl:
            Relocating data at load time isn’t really viable, especially in a multicore environment.

            Steve:
            Yes, fragmentation of main memory is indeed a thing.

            One can indeed make a memory management unit that stitches together neat and tidy contiguous address spaces for us to allocate. This partly makes fragmentation worse, since it abstracts fragmentation away from the application. The fragmentation is still there, we just don’t see it. This largely removes the developer’s ability to defragment their allocations over time as they naturally process the data.

            And accounting for these fragments, and knowing what part of the virtual address space belongs to what part of the physical address space, has associated hardware overhead. This will generally reduce power efficiency and/or overall performance, since it affects every single memory access.

            However, fragmentation of RAM and fragmentation of storage are two completely different worlds. The ratio of bytes allocated to bytes of activity is generally the polar opposite when we compare RAM to storage.
            RAM has far higher read/write activity for its allocations than storage ever approaches under realistic scenarios, and this activity makes it easier to passively defragment as we go.

            However, our MMU can defragment as well; there are multiple ways of implementing this, with various pros and cons. But most of the time the MMU won’t have all that clear a view of the larger picture.

            Or we can let the application defragment itself.
            The application after all knows when it will create and destroy data. It knows roughly what it will execute next. It can likewise prioritize placing assets where it most logically fits within the allocations it has to work with. And with proper APIs it could even state its intentions to the Kernel, that in turn can view the larger picture and try to make everyone happier. (Not saying that every programmer will be perfect at this job, but most compilers will likely hide this away pretty quickly just like they do with tons of other housekeeping stuff.)

            But fragmentation only really starts becoming problematic when our RAM starts getting more full.

            However, a more useful feature of virtual memory mapping is paging RAM out to disk, so that we can go beyond the installed amount of system memory. That could still be done without virtual memory mapping, but it would become intentional instead of the current swaps that our applications “don’t” notice.

      1. Run from any arbitrary point in memory? The old Mac OS / System did that. It treated RAM like a hard drive, which came with some of the same problems as a hard drive, like fragmentation.

        To load a program on ye olde Mac, there had to be a single contiguous block available for the program’s requested RAM space. The requested space was user configurable per program, but if not enough was requested and you tried opening too large of a file, you’d get an error for not enough memory. But if the requested space was too large, it was wasting RAM that could be used for other programs.

        During a session of running and quitting software the RAM space could become fragmented. The way to deal with it was to close everything to free up all the RAM then launch the programs you wanted to use in sequence, or reboot.

        Jump Development made RAM Charger. It dynamically remapped all free RAM into a single block, and it replaced the stock method of software requesting fixed amounts of RAM. It could be disabled per-program, and there was a default disabled list for software known to not be happy with dynamically assigned workspace. For example, launch Stuffit Expander and it’d take up just enough RAM for the program. Drop an archive file on it and RAM Charger would expand its RAM space as needed for it to extract the archive, then when Expander finished its space would be reduced back to minimum.

        Apple tried really hard to put an end to that. Almost every System and Mac OS update from the initial release of RAM Charger changed something to make it not work, so Jump Development had to figure out what was done then release an update that worked with the new Apple update and all previous OS versions it supported.

        Apple’s final attempt to stop RAM Charger came with the Mac OS 9.1 update. It didn’t block the core function, just made More About This Mac and some other non-essential stuff not work. So Jump Development didn’t bother to fix those features and functions.

        I don’t know why Apple wouldn’t license a basic version of RAM Charger like they did Control Strip, Super Clock, Extensions Manager and various other 3rd party pieces which became core components of the OS.

        Apple didn’t license the superior Virtual Memory function of Connectix RAM Doubler, but they also didn’t repeatedly attempt to break it.

        I always ran RAM Charger plus RAM Doubler with compression OFF, only used the Virtual Memory.

        Never had an out-of-memory problem, and the Macs I used them on ran faster. ‘Course, what was extra quick for an old Mac was a IIci with a 601 CPU upgrade, 128 megs of RAM, RAM Charger, and virtual memory off; I didn’t use the Connectix virtual memory on it because, with Mac OS 8.1, 128 megs was more than enough. It really helped that the control panel for the 601 card had a function to turn off the RAM test during boot.

        1. That there seems like a horror story of a poorly implemented system.

          Ideally an application should be able to have more than 1 allocation in RAM. (For many reasons, not just to handle its own data, but also to handle buffers for inter process communication needs.)

          Personally think that it is better to make fragmentation obvious to the program, so that developers can find suitable ways to handle it in a way that also benefits their application. Instead of abstracting it away with a generalized solution.

          1. There are a lot of reasons why this may not be a great idea.
            – There’s already an attitude of “our computers are fast enough, let’s prioritize developer time”. If you leave memory management to Silicon Valley devs, performance is going to tank across the board. That’s not to dunk on devs, it’s just how things are.
            – Abstracting this stuff away leaves room for better hardware to improve performance for existing software. The less abstract your ABI is, the more you’re constrained when developing new systems (we’ve already seen the effects of this with Apple’s jump to ARM)

            We’ve already seen all of these come into effect with branch prediction, where a few system designers attempted to make it a manual process and then the industry collectively realized that in-hardware branch prediction works better in almost every case, even though the CPU is effectively needing to guess the developer’s initial intent.

          2. Not to mention that having programs self-police fragmentation sounds like an absolute nightmare. E.g.:
            Let’s say a chunk of memory needs to be moved to make some room. Your program needs to somehow become aware of this, move the memory, and then go through and remap every currently active pointer – sounds a lot like garbage collection, but there’s nothing you can do at compile time to avoid it since it depends on what other software is doing with its RAM.
            Speaking of which, your program now also needs to be aware of what other software is doing with its RAM, which sounds like a security nightmare. And then you have like fifty different defragmentation algorithms on your machine running simultaneously and constantly triggering each other.

            Much better to have this centrally supervised by a program that has a global view of the system (the OS), with every process having free rein over its own private infinite memory space.

          3. Anton: yes, you do point out valid concerns, but they mostly apply to storage.

            If we have a lot of mostly static fragmented data that we have no inherent desire to rearrange, then yes it is a chore to move it.

            If we have a lot of mostly actively-worked-on data that gets progressively cycled through and made obsolete as we process it into new data, then it isn’t that hard to get rid of smaller allocations and consolidate as we go.

            But likewise, fragmentation isn’t just a thing for allocations. But also for the very data itself within said allocations. Software already does need to consider migrating data for its own comfort, else it will quickly end up with fairly poor memory utilization. (I am however not saying that all developers are remotely good at doing this. And a fair amount is handled by garbage collection and other compile time optimizations.)

            In regards to applications needing insight into the overall landscape.
            No, they don’t. They ask the kernel for an allocation of X bytes, and the kernel finds and provides it. The application can then naturally migrate its data into this new allocation as it processes its data, before removing one or more of the smaller allocations that used to host the now-obsolete data.

            Some data will occasionally have to be migrated regardless, since it doesn’t easily get obsolete. But ideally speaking most such data should already be loaded into a more stable allocation regardless. (A stable allocation is one that the process has little long term intentions of removing. Something it can likewise flag to the kernel.)

            Things can get a bit less smooth when the total amount of allocated address space approaches the total amount available, and here “out of memory” errors could appear if we don’t have any swap space functionality. (something that likewise is debatable.)

            In the end.
            I do not see it as a stretch to let applications handle fragmented allocations instead of abstracting it away in hardware.

            Beyond bringing the ease of swap spaces onto the table, virtual memory mapping isn’t really that great.

  2. Protected mode was available on the Z80 40 years ago. Look up the Morrow Designs MPZ80 CPU board. This was a Z80 CPU board for the S-100 bus. It was designed to run a UNIX-like operating system called MICRONIX. There is a technical manual that shows exactly how it worked back then. Full protected mode, process isolation, multi-process task switching, io trapping, virtual paged memory – in 1982 with a Z80.

    http://www.bitsavers.org/pdf/morrow/boards/MPZ80_CPU_Technical_Manual_Rev_1_Apr82.pdf

    https://www.retrotechnology.com/herbs_stuff/mnix_micronix.html

  3. Mmm… he says that it’s not possible to use interrupts, but I think that isn’t completely true: ANDing the ^Q output of the flip-flop with M1, and ORing the result with IORQ from the Z80, produces a “synthesized IORQ” that can be used outside as the original IORQ, because it goes down only in “supervisor mode”, or when acknowledging an interrupt, thus making it fully transparent.

    It would also require ORing the current NMI signal with M1, to avoid triggering the NMI during INT acknowledge… but I think it can work.
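    The proposed gating can be checked with a quick truth table. Signals are modelled at raw logic levels, with Z80 bus signals active-low (0 = asserted); which flip-flop output is low in supervisor mode is an assumption here:

```python
# Truth-table check of the gating proposed above: (^Q AND M1) OR IORQ.
# All signals active-low (0 = asserted). Assumption: ^Q is low while
# the flip-flop indicates supervisor mode.

def synth_iorq(nq, m1, iorq):
    # The outside world sees IORQ asserted (low) only when the CPU
    # asserts it AND we are either in supervisor mode (^Q low) or in
    # an interrupt acknowledge cycle (M1 low together with IORQ).
    return (nq & m1) | iorq

# supervisor-mode I/O: passes through to the bus
assert synth_iorq(nq=0, m1=1, iorq=0) == 0
# user-mode I/O: masked, the external bus never sees it
assert synth_iorq(nq=1, m1=1, iorq=0) == 1
# interrupt acknowledge (M1 low with IORQ low): passes through
assert synth_iorq(nq=1, m1=0, iorq=0) == 0
print("truth table ok")
```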

  4. I used the msx memory mapper to run multiple threads. With a 1 or 4MB mapper it was quite doable. Driven by the gfx line or frame interrupt it was quite fun. Especially with added Gfx9000 running power basic on a separate monitor. But also switching between contexts. The Turbo R RISC version of Z80 (R800) was even more convincing

  5. Even today there are many microcontrollers without MMU that run a multitasking OS just fine.
    FreeRTOS for example.
    You only need memory protection if you want a real multiuser system with different access levels.
