Ask Hackaday: Understanding The X86 Memory Addressing System

[Image: x86 SBC]

A quick look at the pinouts of the Intel 8086 and 8088 processors reveals a 20 bit address bus. There was high demand for the ability to address 1 meg (2^20 bytes) of memory, and Intel delivered. However, a curious individual might wonder how such a feat is possible with only 16 bit registers. Intel solved this riddle by combining two registers, which also let them keep the design compatible with code written for the 8008, 8080 and 8085. The process can be a bit confusing when you are trying to figure out where to locate your code in ROM. In this article, we are going to go over the basics of how the Physical Address is calculated and how to locate your code correctly in ROM.

[Image: x86 memory map]

In a monumental effort to confuse young budding computer scientists in the late 70’s, Intel chose to view its 1 meg of address space through 64k windows called segments, selected by one of four 16 bit Segment Registers (CS, DS, ES and SS). The value in a Segment Register, called the Segment Address, can be thought of as the base address (0000h) of a 64k chunk, and that chunk can start on any 16-byte boundary. The address within the 64k chunk is given by an Offset Address. The combination of the Segment Address with the Offset Address is called the Logical Address, and it gets transformed into the elusive Physical Address. In a normal instruction fetch, the Segment Address is taken from the Code Segment (CS) register and the Offset Address comes from the Instruction Pointer. So the Logical Address will be CS:IP, for example FFF0h:C000h.

 

[Image: registers of the x86]

 

The Physical Address is formed by multiplying the Segment Address by 16 and then adding the Offset Address. Multiplying the Segment Address by 16 turns it into a 20 bit value by appending four zero bits (a single hex 0) to its right side. This calculation is done by a dedicated adder within the processor. But you need to know how the addresses in your program are turned into a Physical Address if you want to know where to locate the code in ROM. This will become clearer below.
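
If you want to play with the arithmetic yourself, here is a minimal C sketch of the calculation. The function name and the example values are just for illustration; the real 8086 does all of this in hardware and wraps the result to 20 bits.

    #include <stdio.h>
    #include <stdint.h>

    /* Illustrative only: compute an 8086-style physical address from a
     * segment:offset pair. The shift by 4 is the "multiply by 16", and the
     * final mask keeps the result inside the 20 bit address bus. */
    static uint32_t physical_address(uint16_t segment, uint16_t offset)
    {
        return (((uint32_t)segment << 4) + offset) & 0xFFFFF;
    }

    int main(void)
    {
        /* Reset vector: CS = FFFFh, IP = 0000h -> physical FFFF0h */
        printf("%05X\n", (unsigned)physical_address(0xFFFF, 0x0000));
        /* The example from the text, FFF0h:C000h, wraps past 1 meg on a real 8086 */
        printf("%05X\n", (unsigned)physical_address(0xFFF0, 0xC000));
        return 0;
    }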

 

The Reset Vector

Now that we are thoroughly confused about the extremely logical and straightforward internal workings of the x86 address calculations, we can move on to why this information is useful. When the 8086 processor comes out of a hardware reset, the very first address it puts out is FFFF0. This means pins A0 – A3 are LOW, and A4 – A19 are HIGH.

[Image: binary to hex conversion]

FFFF0 is the Physical Address, so the Logical Address would be FFFF:0000, with FFFF coming from the Code Segment (CS) register and 0000 coming from the Instruction Pointer (IP) register. These are the states of the registers upon reset.

Now, you might have noticed that FFFF0 is really really really really close to the very end of the 1 meg address space. Indeed, it is only 16 bytes away from FFFFF. So the first instruction has to be a far jump to somewhere with more room, one that loads the Code Segment and Instruction Pointer with the place where your program actually starts. What a brilliant design!

Why Knowing This is Important

Want to roll your own x86 computer from scratch? Consider this schematic (pdf warning) from [Scott’s] 8088 SBC project. Take a look at the processor – he’s only using 16 of the address lines. For the ROM, he’s using a 2764 8k x 8 EPROM, which has 13 address pins. So the question is: where in the ^*#$ do you locate the code in the ROM??? Wait…is it…0000h? Ohhh no, that would be WAY too easy.

First, we have to figure out the reset vector address that will be placed on the EPROM’s address pins. The 8088 will put FFFF0 on its address bus. But from the hardware’s perspective, this address is actually 7FF0.

[Image: binary to hex conversion]

But wait, there’s more! The 2764 EPROM only has 13 address pins, A0 – A12. This means that when the processor puts FFFF0 onto the address bus, the address seen by the EPROM will actually be 1FF0.

[Image: binary to hex conversion]
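
The masking can be sketched in a couple of lines of C (the function name is just for illustration); the mask simply reflects how many address lines actually reach the chip:

    #include <stdio.h>
    #include <stdint.h>

    /* Illustrative only: a chip wired to the low n address lines sees the
     * bus address modulo 2^n. The 2764 has 13 address pins (A0 - A12). */
    static uint32_t address_seen_by_chip(uint32_t bus_address, unsigned address_pins)
    {
        return bus_address & ((1u << address_pins) - 1u);
    }

    int main(void)
    {
        printf("%04X\n", (unsigned)address_seen_by_chip(0xFFFF0, 13)); /* prints 1FF0 */
        return 0;
    }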

If you still haven’t had enough, now is where you figure out how to get your code (with the reset instruction) into the correct place in the ROM. In this case, the FAR jmp must be located at 1FF0. This is generally done with what is known as a Locator – a program that strips the .EXE generated by the linker down into something that can be loaded into ROM. There are not many of these programs around, and if you’re lucky enough to get your hands on one, please let us know in the comments. I have yet to locate Intel’s TLOC.exe, and Paradigm has ignored my requests for theirs.

Below is a hex dump showing the correct placement of the reset vector for [Scott’s] 8088 SBC. The EA opcode is a far jump; far means the target can be outside the current 64k segment.

[Image: hex dump]
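
If you would rather build such an image programmatically, here is a minimal C sketch. The jump target FE00:0000 is an assumption made purely for illustration (it points at the start of the 8k ROM when that ROM is mirrored into the top of the 1 meg address space), so substitute whatever segment:offset your code actually starts at.

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* 8k ROM image; 0xFF is the erased EPROM state */
        unsigned char rom[0x2000];
        memset(rom, 0xFF, sizeof rom);

        /* Far jump placed at offset 1FF0h, which is where the 2764 sees
         * the 8088 reset vector (FFFF:0000 -> physical FFFF0h). */
        const unsigned char far_jump[5] = {
            0xEA,              /* JMP FAR ptr16:16 opcode       */
            0x00, 0x00,        /* target offset, little endian  */
            0x00, 0xFE         /* target segment, little endian */
        };
        memcpy(&rom[0x1FF0], far_jump, sizeof far_jump);

        FILE *f = fopen("rom.bin", "wb");
        if (!f) return 1;
        fwrite(rom, 1, sizeof rom, f);
        fclose(f);
        return 0;
    }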

 

Anyone motivated to make their own x86 SBC now? [Wichit] made this 80C188 SBC, and provides a good starting point. I’ll stick with Arduino.

Note: The screenshots for the binary/hex converter came from here.

65 thoughts on “Ask Hackaday: Understanding The X86 Memory Addressing System”

      1. Right now I’m digging through this freakin’ segment shit as I want to use one of my older PCs for a hardware project. Just debugging a switch to protected mode to be able to use the VESA linear buffer instead of the A0000 video buffer windows was a two week nightmare. I used to program the 68000 in assembly and I expected the x86 to be tough, but honestly I didn’t expect it to be this bad.

      2. WTF? The backwards compatibility is one of the biggest strengths of x86.
        If you really hate something for a few instructions executed before switching to 64 bit mode you should perhaps get some help…

          1. According to that page, it’s just a handful of instructions, so what’s the problem, exactly? Of course, you also need to set up the IDT and GDT, but you’d have to do similar things on other CPUs as well.

          2. 1) It’s a small price for compatibility. It only hurts 100 people in the world who write their own bootloaders, and 90 of them enjoy this kind of stuff anyway. 2) setting up page tables and other environment is not brain damaged. The settings depend on memory configuration that the CPU doesn’t know about. Requiring the programmer to set it up at the start of execution is simply the most sane thing to do. And again, it’s something that 99.9999% of the people never have to deal with.

      3. That is why they are still around, when some better designs have never gained traction. Backwards compatibility is a bigger deal to the companies buying hardware and paying for software, historically FAR more important to them than the lower level complexity. What company would want to chuck all of their computer hardware and buy all new software, just to get some new feature in a completely different architecture? The million dollar question is when to drop something, and how far back to keep it compatible.

        1. Most of the “better” designs weren’t actually better in the places where it mattered. For instance, the complex x86 instruction encoding is a good thing, since it offers better code density than RISC. And good code density allows better use of i-cache, which increases performance.

  1. Wow, weird to think this is almost retro now. Nice article. One possible mistake:

    “Intel broke its 1 meg of address space into four 64k chunks”

    Should be sixteen chunks, yes? (Four would correct for the number of bits used in SR.)

    1. More accurately: they broke the address space up into 4096 chunks of 64K, overlapping by 16 bytes each :-)

      1000h:0000h is the same as
      0FFFh:00010h is the same as
      0FFEh:00020h is the same as

      0003h:0FFD0h is the same as
      0002h:0FFE0h is the same as
      0001h:0FFF0h

      Yes, “4” is definitely wrong. :-)

        1. Jac’s right, and if you carefully read what he’s saying you’ll realise he is illustrating the point that the segment base address and the index overlap each other by 12 bits. The upper 4 bits of the segment base address, plus 12 shared bits in the middle, plus 4 bits at the lower end of the index register = 20 bits in total (plus nearly one extra bit because of arithmetic carry).

          1. He’s not quite right. It’s not that they overlap by 16 bytes. They begin 16-bytes apart. They overlap by 64k-16 bytes. Thus 0000:0010 is an alias for 0001:0000. There are 64k segments, each beginning 16 bytes apart.

          2. No, the segment and index addresses overlap each other by 12 bits. I didn’t say the segment addresses overlap each other by 16 bytes. 64k – 16 bytes is another way of saying 12 bits of address.

          3. Mostly I was responding to “Jac’s right”, and then going back to Jac’s comment about them overlapping by 16 bytes. I also probably wouldn’t say that they overlap by 12 bits, because they are additive and can impact more than the twelve bits, which makes the bitwise relationship of the bits in the middle a little less clear, but I take your meaning.

  2. One of the things I hated about this scheme was that different segment:offset pairs could refer to the same memory location. What a mess!

    Intel’s 8086 manual touted segmented memory as a good thing: you could modularize your code and data by putting it in separate segments.

    Yah. I went with the 68000 microprocessor instead.

      1. On the other hand, intel’s method provided a nice upgrade path to the 386. With registers extended to 32 bits, this design works very well. Looking back, it’s actually quite amazing how intel managed to keep incrementally improving their design over several decades in rather smooth fashion.

        1. I don’t see how this is a “nice upgrade path” compared to a flat address space. In fact, the 80286’s protected mode, where the segment register was shifted 16 bits instead of 4, broke some old code.

          1. The 80386 supports a flat address space so why are you complaining?
            Personally I was sad when AMD decided to remove segmentation in 64 bit mode – it made some kinds of microkernels much less efficient. And what did we gain?
            . performance? Nope – x86 processors still need to support segmentation even under 64 bit mode (when running 32 bit programs) and actually using segments already was penalized compared to flat addressing.
            . more efficient hardware? Nope. See above.
            . extra opcode space by reusing segment prefixes (bytes that change the instruction’s interpretation, located before the instruction proper)? No. AMD defined segment prefixes to be NOPs.

            To add insult to injury AMD later included some simplified segmentation also in 64 bit mode. Intel doesn’t support it and the simplifications made it unusable except for the intended use: supporting virtualization. It still needs most of the complications of real segmentation.

            So why would I like segmentation? It provides protection with byte granularity compared to paging (4 KiB granularity), and with some software support it allows fine grained security, which means failures can be contained better.

        2. Intel’s upgrade path wasn’t upward compatible for either the ‘286 or ‘386 generation.

          The ‘286’s memory model broke pretty much every practical 8086 program by replacing its predecessor’s address calculation of: Segment*16+Offset with LocalDescriptorTable[Segment].BaseAddress+Offset. This means that there’s no longer any relationship between an address such as: 0x1234:0x5678 and 0x1235:0x5668. The only programs that will therefore work are small model programs that didn’t use descriptors at all. Which is to say – it’s not upward compatible – a 286 operating system must map linear virtual-mode addresses to linear physical addresses to maintain 8086 compatibility.

          The ‘386 essentially layers a proper paged virtual memory model on top of the broken ‘286 memory model. However, again, 8086 programs can’t simply be run in 32-bit mode because (a) The segment registers still map to descriptor tables (b) Offsets are 32-bits and (c) The switch to 32-bit mode changes the interpretation of all the instructions that operate on 16-bit data to operate on 32-bit data. Intel ‘fixed’ this by creating a virtual 8086 mode for the processor, a hardware emulator for 8086 real-mode software. Which is to say, it’s not upward compatible.

          The really disappointing thing is that it’s not hard to imagine how they could have made the 286 genuinely compatible by layering a paged memory model on top of 8086 segments. For example, addresses could then have been calculated as:

          VirtualAddress=Segment*16+Offset, then

          PhysicalAddress=PageTable[VirtualAddress>>12].Translation*4096+(VirtualAddress&0xfff).

          That way, the segment/offset relationship is preserved as is the 286 physical address space. 286-specific programs could then be defined as Segment*256+Offset, which would give a 16Mb virtual address space, plenty of space until the arrival of the ‘386 and even up to the mid-1990s.

          A page-table scheme such as the one above, would have executed protected 8086 and 286 code faster than descriptor-based virtual memory, even if this 286 only cached a single page table entry per segment (the real 286 cached a descriptor per segment). That’s because (a) >64K programs must often reload or modify segments, but a page table implementation would have resulted in fewer page faults and (b) ‘286 page table entries could have fitted into 32-bits [12-bit translation+12-bit Length+8-bits for attributes], so page faults would have been quicker.
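
          A minimal C sketch of the hypothetical scheme described above (the names and the page table layout are illustrative only, not anything Intel actually shipped):

              #include <stdint.h>

              /* Hypothetical page table entry for the alternative '286 design
               * sketched in the comment above -- purely illustrative. */
              struct PageEntry {
                  uint32_t translation;   /* physical page number */
              };

              static uint32_t hypothetical_translate(const struct PageEntry *page_table,
                                                     uint16_t segment, uint16_t offset)
              {
                  uint32_t virtual_address = ((uint32_t)segment << 4) + offset;  /* Segment*16+Offset */
                  const struct PageEntry *pte = &page_table[virtual_address >> 12];
                  return pte->translation * 4096u + (virtual_address & 0xFFFu);
              }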

    1. Depends what you were trying to do, if you were a low-level assembly coder then segmented memory was brilliant. Different segment:offset pairs referring to the same memory location allowed for some highly-optimized code involving overlapping segment pages. And splitting the addressing across two registers allowed for some extremely fast texture mapping code where you’d use a segment register (e.g. ES) to reference the texture page while the offset BX (split further into BH:BL) referenced U and V. There were lots of other bit fiddling tricks you could use it for as well.

      1. “Different segment:offset pairs referring to the same memory location allowed for some highly-optimized code”: I suspect that was more a happy accident than anything intentional on Intel’s part.

        1. No, that was very much by design. And intuitively you would expect there to be a good reason, given that calculating a physical address with a 4-bit shift and a 12-bit add takes more silicon and code than a single 16-bit shift. The purpose was to future-proof 16-bit applications and provide a simple way of mapping virtual memory to physical memory (with 16-byte granularity) in preparation for the switch to protected 32-bit architectures and proper page mapping. By allowing the segments to overlap you could pack all of a program’s segments into a single block, thus greatly reducing memory fragmentation and the chance of apps trashing each other’s memory space. The optimizations I mention were hacks that game developers like myself used at the time, and they broke the minute you tried running your code in an extender. We weren’t really supposed to use them, but everyone did anyway. :)

  3. The image at the beginning of the article shows he’s using an Atmel AT29C512, which is a 64 KByte flash chip. It also appears that he’s using a Hitachi HM628512 SRAM chip, which has 512 KBytes.

    The article seems somewhat confusing. Firstly, the conventional way to decode memory is to allow memory to be mirrored at multiple addresses. So, if you have a ROM on an 8086, it must be possible to decode it at $FFFF:0000 (physical $FFFF0), because otherwise the processor won’t see the ROM when it resets. Therefore, it must be valid to address the ROM at $F0000 (if 64Kb is really decoded), or $FE000 (if 8Kb is really being decoded). And that means that a far address of $FFFF:0000 is just as good.

    Decoding just 13 bits from 20 bits just means that the top 7 bits are masked out and the ROM will be decoded at every 8Kb region. There’s nothing odd about that.

    Secondly, the article’s description of 8086 addressing is a bit confusing, because it says that the 1Mb address space is divided into 4x 64Kb segments. And that can’t be true, since 4x64Kb = 256Kb, not 1Mb. So, here’s a correct, and easier description:

    *****************************

    The 8086 has a 1Mb physical address space: $00000 to $FFFFF . However, all the main address registers: IP (the instruction pointer, PC on other processors); BP, BX, DI, SI and SP are all 16-bits, which only allows 64Kb to be easily addressed.

    So, what Intel did was to create 4 extra 16-bit ‘segment’ registers: CS, DS, ES and SS. Then whenever the processor needs to address memory, it picks a segment register (e.g. DS) and an offset (e.g. BX) and then calculates the address as:

    Segment_Register*16+Offset. In this case it’d be: DS*16+BX. This gives an addressing range up to $FFFF*16+$FFFF=1Mb+64Kb (the top 64Kb just wraps around to the bottom of memory on an 8086). Because some of the address bits overlap in the calculation, it means that multiple segment:offset values also map to the same address. For example: $0010:1234 maps to the same address as $0011:$1224, because $10*16+$1234 = $11*16+$1224.

    The 8086 and 8088 pairs each address register with a default segment register. IP is paired with CS by default, BX and SI are paired with DS by default, DI is paired with ES by default, BP and SP are paired with SS by default. However, by using ‘segment prefix’ instructions just before a memory accessing instruction, you can override the default. For example, SEG CS:MOV AX,[BX] would copy the 16-bit value at CS:BX to AX instead of using the location DS:BX. In any case, the four segment registers mean that the 8086 has easy access to 4*64Kb = 256Kb.

    To gain access to the whole 1Mb you need to muck about with the segment registers’ contents. For example, MOV DS,1000 would now mean that the data segment would reside at segment 1000. That’s clumsy, but the 8086 also has some specialised jump far and call far instructions that load a new CS and jump offset in one go.

    *****************************
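
    To make the wraparound mentioned above concrete, here is a small C sketch (illustrative only): on an 8086, which has no A20 line, FFFF:FFFF lands back near the bottom of memory.

        #include <stdio.h>
        #include <stdint.h>

        int main(void)
        {
            uint16_t segment = 0xFFFF, offset = 0xFFFF;
            uint32_t full = ((uint32_t)segment << 4) + offset;  /* 0x10FFEF needs 21 bits */
            uint32_t on_8086 = full & 0xFFFFF;                  /* wraps to 0x0FFEF */
            printf("full: %X  8086 wraps to: %05X\n", (unsigned)full, (unsigned)on_8086);
            return 0;
        }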

    1. Correction: MOV DS,1000 isn’t strictly possible. You can do MOV DS,word ptr[1000] to load DS from a memory location. Or you can do e.g. MOV AX,1000 then MOV DS,AX or do POP DS to pop a 16-bit value into DS. However, loading DS with a constant isn’t in the instruction set.

    2. Thanks- The original article needed some clarification. Right off the bat: ANY time you are talking about “Meg” and “K” or any amount of memory whatsoever, and are not specific about Bytes “B” versus bits “b” you are already causing trouble!

    1. “808x fanbois: “Why would you EVER want a processor that throws a fit if a word is not stored at an even address???””

      If memory serves, just putting a CNOP 0,2 instruction before any data chunk would align everything at an even address ensuring safety.

      1. There are situations where you can’t align your data properly. For instance, a network packet with different fields on different alignments. In those cases it’s very useful if the hardware supports unaligned access, even at a performance penalty.
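
         For what it’s worth, a portable way to read an unaligned field (whether or not the hardware tolerates unaligned access) is to memcpy it out of the buffer. A minimal sketch with a made-up packet layout:

             #include <stdint.h>
             #include <string.h>

             /* Hypothetical packet: a 1-byte type followed by a 16-bit length,
              * so the length field sits at an odd offset. */
             static uint16_t read_packet_length(const uint8_t *packet)
             {
                 uint16_t length;
                 memcpy(&length, packet + 1, sizeof length);  /* safe even if unaligned */
                 return length;
             }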

  4. Hi,

    for anyone who wants to dig into the old rusty x86 and isa architecture

    There is a book by “Ed Nisley” called “The Embedded PC’s ISA Bus” where he actually describes how to “µController” an 8088/8086, because these early CPUs are simple to hack: you just connect a bunch of TTL 74LSXYZ ICs to them and you have fun. Btw. the book is LEGALLY free for download at his blog-site:

    http://softsolder.com/2011/10/03/the-embedded-pcs-isa-bus-firmware-gadgets-and-practical-tricks-unleashed/

    Then there is one from archive.org
    https://archive.org/details/ISA_System_Architecture

    What’s really cool is “ISA System architecture” by Tom Shanley/Don Anderson

    because the x86 memory addressing system plays out directly on the ISA bus (8086 / 80186 / 80286)

    1. The tragic flaw of the ISA bus is that the IOCHRDY signal is active high.

      The ISA bus (or slight variations thereof) is still in widespread use in microcontrollers, typically for Ethernet and LCD interfaces. You can even program a FT2232H USB peripheral chip to talk to ISA bus peripherals.

      The ISA bus also supported DMA which was almost never ever used except for an obscure command in Microsoft BASIC.

    2. Thanks! I’m trying to build an ISA extension card as ISA seems to be fairly simple to interface. I enjoy playing with these old but simple technologies a lot and I’d like to re-use some of the old PCs for simple control/measurement. I can’t imagine myself building even a PCI card not to mention PCI-Express. The difference in the complexity seems to be enormous.

      1. PCI is not too out of reach with the help of a small CPLD. There are POST diagnostic cards from China that have only 3 GAL chips – 1 for the PCI and 2 for the 7 segment display decoder/driver.

        See here for a PCI project with full write up using discrete parts: http://elm-chan.org/works/pci/report_e.html

        The component count can be reduced by using a modern XC9500XL series CPLD.

        PCIe on the other hand might get a bit more difficult, as you have to deal with high speed serial signals. I haven’t seen any FPGA/CPLD part that has SERDES without going to a BGA package.

      2. BTW you can also hack a $5 PCI Ethernet card that has a 32-pin DIP socket for boot ROM (as long as it supports writing to FLASH chip). From there you have a large block of memory space with the usual address, data line and async control signals.

    3. 74lsXXX logic? We used a GAL20V8 instead (basically a CPLD). It saved a ton of wiring and made modifying the system much faster/easier. Once I got a taste of using the CPLD, I never went back to using discrete logic.

  5. UGH! That RTC, the DS12C887, what a hog that guy is. I still have to deal with some equipment that has a similar chip from ST in it, but it’s static RAM along with a built-in RTC. The reason it’s so fat is that the hump contains the crystal for the clock and a backup battery for retaining memory and the time when the power fails. High failure part and will soon go the way of the dinosaur, which presents a whole new challenge… There was some old gaming console around that had the same chip and some guy figured out that you can dig into the package and find the battery terminals and hack in a new replaceable coin cell, but it’s not really worth it.

  6. Backward compatibility rocks!

    I just finished a small arcade game project using a “junk” 2007 dual core Dell AMD motherboard with 1GB RAM to fit into an arcade case. The hardware is incredibly overkill just for a 1983 DOS game, but old PCs are effectively landfill; total cost was just my time and a crap load of messing about to get it all to fit in the cabinet and pulling the keyboard apart to fit to the arcade buttons.

    Checklist:
    Hardware: Free
    Operating System: FreeDOS: Free
    game with source code modified slightly for arcade use: Free (digger.org)

    Even if you could get an 8086 that is hobbyist friendly or squeeze it onto an arduino board, why would you even bother?

  7. I remember reading about this in BYTE when the 8088 first came out. I immediately realized that Intel was doomed, the 68k would take over the world, so I would never have to worry about this abomination.

    Yes I’m old, but at least I’m not foresighted.

  8. I did a lot of this for a class in college. I may even have the compile tools stored somewhere as I spent many 2am nights coding for my class. I’ll have to take a look this weekend.

  9. Why didn’t intel just make a 32 bit address?
    It could even use less transistors considering the lack of adders and stuff, as it would almost only be two registers.
    Tho it might be because of wire-bonding and other off chip stuff.

    1. I think the problem was the number of pins on the package. The 8088 I was using already had 40 pins in a through-hole DIP package. Back then, you didn’t find more than 40 pins on an IC, generally speaking, and surface mount technology wasn’t so common. Adding another 12 pins would have been very difficult.
