VGA Without A Microcontroller

One of the most challenging projects you could ever do with an 8-bit microcontroller is generating VGA signals. Sending pixels to a screen requires a lot of bandwidth, and despite thousands of hackers working for decades, generating VGA on an 8-bit microcontroller is rarely as good as a low-end video card from twenty years ago.

Instead of futzing around with microcontrollers, [Marcel] had a better idea: why not skip the microcontroller entirely? He’s generating VGA frames from standard logic chips and big ol’ EEPROMs. It works, and it looks good, too.

VGA signals are just lines and frames, with RGB pixel values stuffed in between horizontal sync pulses, and frames stuffed between vertical sync pulses. If you already know what you want to display, all you have to do is pump the right bits out through a VGA connector fast enough. [Marcel] is doing this by saving images on two parallel EEPROMs, sending the output through a buffer, through a simple resistor DAC, and out through a VGA connector. The timing is handled by a few 74-series four-bit counters, and the clock is a standard 25.175 MHz crystal.

There’s not much to this build, and the entire circuit was assembled on a breadboard. Still, with the clever application of Python to generate the contents of the ROM, [Marcel] was able to build something that displays eight separate images without using a microcontroller.
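The article says Python was used to generate the ROM contents. As a rough illustration of how that might look, here is a minimal sketch; the timing constants are the standard 640×480@60 values, and the byte layout (6-bit color plus two sync bits) is an assumption, not [Marcel]’s actual format:

```python
# Sketch: generating a VGA ROM image in Python (standard 640x480@60
# timing; the byte layout is an assumption, not [Marcel]'s format).
H_VISIBLE, H_FP, H_SYNC, H_BP = 640, 16, 96, 48
V_VISIBLE, V_FP, V_SYNC, V_BP = 480, 10, 2, 33
H_TOTAL = H_VISIBLE + H_FP + H_SYNC + H_BP   # 800 clocks per line
V_TOTAL = V_VISIBLE + V_FP + V_SYNC + V_BP   # 525 lines per frame

def rom_byte(x, y, pixel=0b000111):
    """Pack 6-bit RGB plus active-low sync bits into one ROM byte."""
    hsync = 0 if H_VISIBLE + H_FP <= x < H_VISIBLE + H_FP + H_SYNC else 1
    vsync = 0 if V_VISIBLE + V_FP <= y < V_VISIBLE + V_FP + V_SYNC else 1
    rgb = pixel if (x < H_VISIBLE and y < V_VISIBLE) else 0  # blank porches
    return (vsync << 7) | (hsync << 6) | rgb

rom = bytes(rom_byte(x, y) for y in range(V_TOTAL) for x in range(H_TOTAL))
# 800 * 525 = 420,000 bytes, so one frame fits in a 512 KB EEPROM
```

With the sync pulses baked into the ROM alongside the pixels, the external hardware only needs counters to step through the addresses at the dot clock rate.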


47 thoughts on “VGA Without A Microcontroller”

  1. I have been looking at this problem.

    Old school video circuits were very cleverly made so as to leverage the throughput of the CPU.

    Most of them were tile based so that the CPU only had to change the tile (character) number and *NOT* the 64 bytes that represent the colored pixels.

    I made one of these in the 80’s using 74xx, 40xx and SRAM. To speed access you used 3 or 4 SRAM chips (and 3 or 4 shift registers), one for each color channel (4 bits – 16 colors, 3 bits – 8 colors) in RGB or RGBI format. You could also use a high speed color palette chip.

    So your character SRAM was about 2kB and its data bus went to the address bus of three or four 2kB pixel SRAMs.

    Why this is hard today comes down to a number of factors. First, I don’t know of any palette SRAM chips in production today. Second, there isn’t much available smaller than 32kB in SRAM chips. And the much higher refresh rate of a modern LCD means that FLASH/EPROM/EEPROM is too slow to use as character ROM unless you drop the resolution a *LOT*.

    And if that isn’t hard enough, you have the issue of 3.3V vs 5V. Modern-ish 5 Volt tolerant chips are capable of working with old TTL chips (they have LVTTL inputs) but not old CMOS chips, so with a CPU like the Z80 it gets hard because those are mostly CMOS now. Can someone tell me what the (Homer Simpson’s brain) CPU is – i.e. CMOS or TTL levels – the original Apple CPU – was it 6802 or 6502?

    I think the better solution today would be to use parts like shift registers and some basic decoding in TTL for the high speed stuff and have a micro-controller manage the rest (lower speed), or just go to a CPLD or FPGA. Most of the old circuits can fit in a largish CPLD of 100 or 200 macrocells for a moderately complex system, and a really simple system would be under 100 macrocells.

    One excellent product out there is the GameDuino but it is SPI. It would be so nice to re-work that to be parallel. It’s open source *but* it’s in Verilog and I code in VHDL.

    I am having a go at this and I have decided that the CPU will be level shifted and everything else will be 3.3 Volt or 3.6 Volt (for improved noise margin).

    Great project and extra points for putting the sync in ROM.

    1. The oldies were also limited by RAM and I suppose they tiled for that reason as well.

      I figured my project could improve a bit by going to an 8 MHz clock. This is less than 1% off from the 8.056 MHz needed for 256 clocks per scan line. It also simplifies the counter logic because only one signal is needed instead of three: one to reset Y at the appropriate time. X overflows by itself and can then increment Y. That then paves the way to 6-bit color. 200×120 pixels @ 64 colors should be doable with one chip less and an 8 MHz oscillator…
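A quick Python check of the arithmetic in this comment:

```python
# Checking the claim: a standard VGA line runs at 31.46875 kHz
# (25.175 MHz / 800 clocks per line), so 256 clocks per scan line
# would ideally need an 8.056 MHz pixel clock.
line_rate = 25.175e6 / 800        # Hz per scan line
ideal = line_rate * 256           # 8,056,000 Hz
error = abs(8e6 - ideal) / ideal
print(f"ideal {ideal/1e6:.3f} MHz, 8 MHz is off by {error:.2%}")
```

The error works out to about 0.7%, comfortably inside a monitor’s sync tolerance, as the reply below this comment notes.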

      1. The old video interfaces cut a lot of corners like that. Old analogue TVs had a Vertical Hold (sync) adjustment and a Horizontal Hold (sync) adjustment that were customer accessible. Even modern LCDs give you some leeway. They don’t care about the dot clock as long as the full sync cycle is close to a standard.

        At most there were two bits (the MSBs) of the horizontal counter chain that were decoded to cause its reset.

        In the earliest computers (with video) there were always a binary multiple of horizontal tiles or pixels – 16, 32, 64, 128

        Then later there were (two MSB) 24, 48, 96, 192

        The easiest ratio is to have 96 counts (tiles) per horizontal time (total including sync and porch) and 64 active tiles or some similar ratio like 48 and 32.
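The two-MSB reset trick mentioned above can be modelled in a few lines of Python (a simulation of the counter chain, not the hardware itself):

```python
# Model of the two-MSB reset trick: a 7-bit counter clears itself as
# soon as bits 6 and 5 are both high, i.e. at count 96 (0b1100000),
# giving 96 tile times per horizontal period.
def next_count(count):
    count = (count + 1) & 0x7F
    if count & 0b1100000 == 0b1100000:  # the AND gate on the two MSBs
        count = 0
    return count

seen, c = [], 0
for _ in range(96):
    seen.append(c)
    c = next_count(c)
# the counter visits 0..95, then the reset wraps it back to 0
```

One AND gate on two counter outputs replaces a full magnitude comparator, which is exactly the sort of corner the old designs cut.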

    2. On the Gameduino:
      The first one was cool, but just a bit more limited than I would have liked. It only let you use 256 characters for backgrounds. Game systems like Super Nintendo and Sega Genesis let you have at least 1024, so this is a severe limitation. Similarly, their sprites can only be 16×16, which is also half of the Genesis’ 32×32 and 1/4 the SNES’ 64×64.
      The Gameduino 2 is more like an OpenGL polygon processor with 2000 sprites. Unfortunately, they seem to have gotten rid of all of the character mapped background stuff, which means that you would have to burn half your sprites just to implement a 1 layer character mapped display. Also, you’d have to use a lot of slow Arduino cycles to calculate the positions of every character in the background layer.
      Now maybe you are saying “Why do I care about character mapped displays?” Well, because Arduino is extremely limited in speed and memory, and character mapped displays are one of the most efficient ways to allow a complex screen to be updated quickly. There is a reason why all of the 80’s to 90’s 8 bit and 16 bit videogame hardware kinda worked the same way: it was the best way to do it!
      Right now, I’m trying to find a way to update the screen quickly on Teensy 3 series devices. Unfortunately, sending every single pixel is too slow to do at a decent 60Hz rate. I have managed to get up to 44FPS on a Teensy 3.6 with the processor and I/O overclocked.
      I’d really like to have a processor to do SNES / Genesis style graphics on LCD screens from microcontrollers, but the Gameduino doesn’t quite seem to be it. It’s on my list of projects to do, but I’m still bogged down in my current efforts.

      1. The GameDuino 1 does about as much as you can with the internal BRAM. It would be much more capable with an external SRAM, but then you’re also increasing the addressing space needed from the CPU/uC.

        Newer FPGAs have more internal BRAM but they’re expensive. It’s cheaper to use a smaller FPGA and external SDRAM or SRAM.

        I have a GameDuino sitting right here and the only thing I don’t like about it is that it’s all coded in Verilog rather than VHDL.

        1. Even with 32k of internal memory, some different design decisions would have given more flexibility.
          Each character image has its own color palette, where it probably would have been better to have a global set of palettes and then have the palette selected in the screen array. This would have required characters to be 2 bytes instead of 1, but then you could add H/V flip bits and more character bits as well, allowing you to get a lot more use out of your small character set.
          Even if the scrolling screen area is reduced in size, it is pretty common to rotate new lines in off screen as necessary. Granted, the Arduino doesn’t have a lot of RAM to store extra character map data, but if you are using it with a larger microcontroller like an Arduino Due or a Teensy, it is completely reasonable to do so.
          Ultimately, I feel it is a great idea that could have been awesome if some slightly different design decisions were made. It really inspires me to make my own, but make it better.
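As a sketch of the kind of 2-byte screen entry being proposed here (the field layout is invented for illustration, not the Gameduino’s or any real chip’s):

```python
# Hypothetical 16-bit tilemap entry: 10-bit tile index, 3-bit global
# palette select, H/V flip bits (layout invented for illustration).
def pack_entry(tile, palette, hflip=False, vflip=False):
    assert 0 <= tile < 1024 and 0 <= palette < 8
    return tile | (palette << 10) | (hflip << 13) | (vflip << 14)

def unpack_entry(entry):
    return (entry & 0x3FF,            # tile index
            (entry >> 10) & 0x7,      # palette select
            bool((entry >> 13) & 1),  # horizontal flip
            bool((entry >> 14) & 1))  # vertical flip

entry = pack_entry(513, 5, hflip=True)
# unpack_entry(entry) -> (513, 5, True, False)
```

Doubling the entry size buys a bigger tile index plus flip and palette bits, which is how the SNES-era tilemaps squeezed more variety out of a small character set.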

  2. What’s the framerate? One image at a time, basically? How long does it take to transition between images? It seems to be more akin to a digital photo frame than a “proper” VGA signal. Still a neat build, just not clear that it is completely accurate to call it a VGA signal quite yet?

        1. Indeed, as described. I could change the frame that was being played back by modifying the 3 high (orange) address lines into the y ROM or by reprogramming the pixel ROM through an Arduino. I didn’t bother to put a rotary encoder on the board. A pair of pliers worked for me.

          1. What would be really neat is having two sets of RAM chips for the image data and being able to toggle between them with a microcontroller. That way the micro could generate the data and fill out one bank of RAM, flip it over to being drawn on the screen, and replace the contents of the other bank. Doing that, this would make an awesome VGA adapter for a lot of common micros.
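The bank-swapping idea can be sketched in Python (a simulation of the proposal; the class and names are invented):

```python
# Simulation of the proposed double-buffered VGA adapter: the micro
# fills one RAM bank while the video circuit scans the other, then a
# single select line swaps them (class and names are invented).
class DoubleBuffer:
    def __init__(self, size):
        self.banks = [bytearray(size), bytearray(size)]
        self.display = 0              # bank the video circuit reads

    @property
    def draw(self):                   # bank the micro writes into
        return self.banks[1 - self.display]

    def flip(self):
        """Toggle the bank select, ideally during vertical blanking."""
        self.display = 1 - self.display

fb = DoubleBuffer(256)
fb.draw[0] = 0xFF                     # micro draws off-screen
fb.flip()                             # that bank is now being displayed
```

In hardware the “flip” would just be one select line steering the address and data buses, so the micro never has to race the beam.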

  3. I’m still waiting for the cheap implementation of graphics through an NTSC format that can be driven from a microcontroller (the uC does not have to drive the display direct, but can send instructions via I2C or whatever to a circuit that does the final rendering). The character overlays that are out there are too clunky for me.

    1. A couple of comments below you will see a reference to VS23S10D-L by [fireclocker]

      They would be perfect for this. I have asked for some samples (because of the high shipping costs to my location) and I will see if I can get 4 of them to spit out HSVGA 400×300

      They are SPI and can output NTSC or PAL. They also have a parallel interface that would be great for retro computers, but SPI is still needed to configure it. They have 128kB internal SRAM. It’s an ideal companion for an 8-bit micro-controller as well.

      Here are some links

      Product description –

      Forum post about using it ATmega1284, using it with the Arduino Uno (328) would be much the same –

      PAL example –
      NTSC is *just* different timing

  4. Don Lancaster’s excellent “TV Typewriter Cookbook” has been available as a free download for a while now. LOTS of info on how video works, and the sort of small to mid-scale TTL and CMOS circuits you need to create a video signal. Mostly (all?) NTSC-oriented, but VGA is just … three parallel signals with more timing flexibility, right?

    1. At 800×600, 72Hz you need a 50MHz dot (pixel) clock which allows you 20ns access time for a single SRAM access (BMP format), or 10ns access time for two accesses (tiled format), or 7ns access for tiled with a palette.
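The access-time figures in that comment follow directly from the dot clock; in Python:

```python
# The access-time budget follows directly from the dot clock:
def access_ns(dot_clock_hz, accesses_per_pixel=1):
    """Nanoseconds available per memory access."""
    return 1e9 / dot_clock_hz / accesses_per_pixel

access_ns(50e6)       # 20.0 ns -- single access (BMP format)
access_ns(50e6, 2)    # 10.0 ns -- tile lookup + pixel fetch
access_ns(50e6, 3)    # ~6.7 ns -- tiled with a palette lookup
```

Each extra serial lookup in the pixel path divides the budget again, which is why tiled and palettized modes demand such fast SRAM.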

      You can use slower standards like was used here with a 25.175MHz dot clock, but then you have to find a 25.175MHz crystal, which is not a common off-the-shelf frequency.

      FLASH is generally around 55ns – 70ns for the faster chips unless you move to FPGA config flash, which is much faster but is also expensive and complex to use for an unintended purpose.

      These low access times are getting into the realms of CPLD or FPGA. With a CPLD you lose half your pin (GPIO) count interfacing to the SRAM chip, and with an FPGA you can use internal BRAM, but then you’re up to $50 just for the FPGA and config FLASH. And that is without even mentioning the complexities of mixing a 1.8V/2.5V FPGA or 3.3V CPLD with a 5 volt CPU for a retro system.

      I only wish it were so simple as “some memory and the right clock”.

      One solution is to aim for a lower resolution and dot clock, like running 800×600 with a 25MHz dot clock to give you 400×300 and double the access time. But after that you still have the issue of mixing 5V CPUs with 3.3V CPLD/FPGA.

      1. You know that the original VGA ran off a 3MHz master clock? (Pixel clock divided by 4, 8, or 9, depending)

        I mean, don’t get me wrong, displaying truecolor data does dramatically increase the needed bandwidth, but there’s nothing here that isn’t solved by just making your bus wider and slower.

        1. VGA has never had a dot clock slower than 25.175MHz (CGA’s was lower still, at 14.318MHz)
          Here is a collection of specs –

          So I am not sure what you’re referring to.

          Some of the old systems had a dot clock from the very common color burst crystal of the time, which was 3.58 MHz for NTSC or 4.43MHz for PAL. These were monochrome though if they were connected to a TV. There were some proprietary monitors that had separate RGB inputs for color though.

          The even earlier system from the era that TV was *only* black and white seemed to use lower frequencies still but they were less ‘standard’.

          I understand that you can make your bus wider, but to do this you need more SRAM chips, and where you end up is with a BMP format that has 472kB of SRAM when you only use 32kB, or in tiled format 128kB of SRAM when you use about 8kB.

          So to summarize: old mono NTSC systems had dot clocks as low as 1.7MHz, old color systems had dot clocks as slow as 3.58 MHz, and VGA has a dot clock of 25.175MHz or greater.

          I tend to use SVGA 800×600 @ 72Hz clocked at 25MHz (giving 400×300) instead of 50MHz, because 25.175MHz crystals are hard to find, 800×600 is too many pixels to push for retro CPU MIPS (typically 1 MIP), and the higher resolution is too hard to fit into a 16-bit address space.

          1. The VGA clocked its RAM at 3MHz. Like I said, 25.175MHz ÷ 8, or 28.32MHz ÷ 9, or 12.6MHz ÷ 4.

            The only high-speed portion is the shift registers and RAMDAC, everything else runs at a quite sedate rate.

            This is really similar to how the CGA ran at “14.318” MHz, except that everything other than the shift register output stage actually ran at 1.8MHz (or 900kHz)
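Checking the divisions quoted in this thread in Python:

```python
# The divided memory-clock figures quoted above all land near 3.15 MHz:
mem_clocks = {div: dot / div
              for dot, div in [(25.175e6, 8), (28.322e6, 9), (12.6e6, 4)]}
for div, hz in mem_clocks.items():
    print(f"dot clock / {div} = {hz/1e6:.3f} MHz")
```

All three ratios converge on roughly the same memory clock, which supports the point that only the shift register and DAC stage ever ran at the full dot rate.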

          2. @[rnjacobs]

            You have described monochrome VGA, which never existed. The minimum VGA supported was 16 colors and the maximum was true color. Unless you are claiming that there were 4 or more parallel RAM chips. I did see some VGA boards that had 4 (soldered) chips, but most had 2 (soldered) chips with sockets for more chips. I remember them well as they were most commonly 10ns SOJ chips with the power pins in the middle. I use the same ones today. CY4019D and CY4049D (from memory). The output stage of these VGA cards was always BMP format, even with the early video processors.

            Perhaps there was a VGA card that was 16 color and used 4 8-bit SRAM chips in parallel but I never saw that card. I did see this scheme used very commonly with much earlier computers that were mostly monochrome.

            To reduce your access requirement by a factor of 8 with 8-bit shift registers requires you to have as many SRAM chips as there are bits per pixel.

            Even 16 color used one SRAM at a time which gave you 2 pixels per byte access.

            The story was different for 16 bits per pixel or 24 bits per pixel of course.

            All of the above is only half the explanation for a retro computer, because a retro computer simply doesn’t have the MIPS to push VGA-like resolutions. They ranged from a quarter of one MIP to 1 MIP, which was very fast for the day.

            Here is where your technique falls down with modern chips.

            Retro computers used tiling through 2kB or so chips.

            You have a 2kB character (or tile) number chip that may represent 64 by 32 characters or tiles.

            Its data (out) bus was used as an index into the address bus of 3 other 2kB SRAM chips, which contained the pixel data for one color (red, green or blue) each, and their data (out) buses drove the 8-bit shift registers.

            To repeat this today with a minimum 32kB SRAM chips you get –

            32x32px tiles instead of 8×8 tiles – that’s 16 times the CPU cycles to update

            256×128 characters (or tiles) instead of 64×32 – once again, 16 times more CPU cycles

            Now if that isn’t bad enough you now have a total resolution of –
            8192px x 4096px instead of 512px by 256px

            So what clock rate and access times do you expect you would need for 8192 horizontal pixels?

            It’s just unworkable to do the same with modern sized SRAM chips.

            So today, to make a video output that is simple enough to be driven by a retro computer, you need multiple accesses to SRAM for the given character rate, and accordingly you need very, very fast SRAM. And as if that isn’t hard enough, you now need dot clocks of more than 25MHz instead of the old dot clocks, some of which were as slow as 3MHz.

            The TLDR here is that old video displays were optimized for old retro CPUs so well that you just can’t beat it, because the limiting factor became the CPU MIPS or simply – just how many pixels can this CPU push?
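The two-stage tile lookup described in this comment can be modelled in Python (sizes follow the 2kB example; this is a simulation, not hardware):

```python
# Model of the classic tile pipeline: a 2 KB character RAM holds a
# 64x32 grid of tile numbers; each number indexes 8x8 pixel rows in a
# second 2 KB RAM (one color channel shown).
TILES_X, TILES_Y, TILE = 64, 32, 8

char_ram = bytearray(TILES_X * TILES_Y)    # 2 KB of tile numbers
pixel_ram = bytearray(256 * TILE)          # 2 KB: 256 tiles x 8 rows

def pixel_row(x_tile, y_px):
    """One channel's 8-pixel row, as the shift register would see it."""
    tile = char_ram[(y_px // TILE) * TILES_X + x_tile]  # first access
    return pixel_ram[tile * TILE + (y_px % TILE)]       # second access

char_ram[0] = 42                   # top-left cell displays tile 42
pixel_ram[42 * TILE] = 0b10101010  # tile 42, row 0: alternating pixels
# pixel_row(0, 0) -> 0b10101010
```

The CPU only ever touches the 2 KB of tile numbers, while the video side performs the chained lookups, which is the efficiency the comment is describing.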

          3. “Perhaps there was a VGA card that was 16 color and used 4 8-bit SRAM chips in parallel but I never saw that card. I did see this scheme used very commonly with much earlier computers that were mostly monochrome.”

            That is, in fact, exactly what the original VGA did. Each of the RAMs actually had its own address bus (at least partly), too, to enable forwarding the character index from one RAM to the next one for character cell lookup in text mode.

        2. I’m not buying this; early EGA/VGA graphics cards usually had 3-4 crystals, one for each mode’s timings:
          ati vga 36 50 56.644
          ati wonder16 36 45 50 56.644
          ati mach8 33 45
          trident 16.257 25.175 28.322 44.9
          Chips 25.175 32.514 56.644
          Video7/Cirrus Logic 25.172 28.332 32.514
          Paradise 25.175 28.322 36
          WDC 25.175 28.322 36 44.9
          ET4000 six crystals :o

          later ISA (and all PCI?) cards standardized on 14.318 (either a crystal or straight from the ISA bus) + an internal PLL

      2. Normally, you’d access more than one pixel at a time, and load them into shift registers (faster than RAM) for clocking out to the display. That’s what VRAM was all about – including the shift register in the RAM chip (essentially). Yeah, this winds up being a lot of paralleled RAM chips if you want 32 bits per pixel, but you’re still in similar physical complexity to the old systems; you’re just using a bunch of x16 or x32 RAM chips (or x64 DIMMs?) instead of the x1 and x4 RAM chips they had “in the old days.”

        1. It still comes down to TotalAccess = Chips * DataWidth / BitsPerPixel
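That relation, expressed as a quick Python helper (names invented):

```python
# The relation above as a helper: pixels delivered per memory access.
def pixels_per_access(chips, data_width, bits_per_pixel):
    return chips * data_width / bits_per_pixel

pixels_per_access(1, 8, 8)    # 1.0 -- one 8-bit chip at 8 bpp
pixels_per_access(4, 8, 8)    # 4.0 -- four chips quadruple the budget
pixels_per_access(2, 16, 32)  # 1.0 -- two x16 chips at 32 bpp truecolor
```

Anything above 1.0 relaxes the per-chip access-time requirement by that factor; truecolor eats the gain right back up.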

          So obviously data-width helps, but it also creates a problem if you want the CPU to update 8 bits at a time.

          And the problem with more chips is that you end up with far too much RAM, because modern RAM is so much bigger than the old 8kB DRAM or 2kB SRAM chips.

    1. Ah, the PS/2 (where the keyboard/mouse connector came from) and its Micro Channel architecture, which became PCI at least a decade later. They were the Meccano set computers of the time.
