Nyan Keys: Because Your Keyboard Is Painfully Slow

December 10, 2023

You probably don’t notice keyboard latency when typing or doing mundane tasks, but if you start gaming, that’s also when you might start complaining. Every millisecond counts in that arena. Think your keyboard is fast? Think again. Because unfortunately, no matter what you’ve got in there, that key matrix is slowing you down. What you need is an FPGA-based keyboard with an overkill MCU. You need Nyan Keys.

[Portland.HODL] set out to make the lowest-latency mechanical keyboard possible that would accept any Cherry-compatible switches, and boy howdy, is this thing fast.

The STM32F723VET6 MCU is paired with a USB 2.0 HS interface, which allows an 8000 Hz polling rate. At worst, key latency measures 30 µs, which blows the 1 ms average out of the water.
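For the curious, the arithmetic behind those numbers can be sketched in a few lines of Python. This is only a back-of-the-envelope model, not code from the project: the function names are made up for illustration, and it assumes the standard USB 2.0 rule that a high-speed interrupt endpoint is polled every 2^(bInterval-1) microframes of 125 µs, while a full-speed one is polled every bInterval frames of 1 ms. The bInterval of 1 is the value the author reports using in the comments.

```python
MICROFRAME_US = 125.0  # length of one USB 2.0 high-speed microframe
FRAME_US = 1000.0      # length of one USB full-speed frame


def hs_interrupt_interval_us(b_interval: int) -> float:
    """Polling period of a high-speed interrupt endpoint, in microseconds."""
    if not 1 <= b_interval <= 16:
        raise ValueError("high-speed bInterval must be in 1..16")
    return MICROFRAME_US * 2 ** (b_interval - 1)


def fs_interrupt_interval_us(b_interval: int) -> float:
    """Polling period of a full-speed interrupt endpoint, in microseconds."""
    if not 1 <= b_interval <= 255:
        raise ValueError("full-speed bInterval must be in 1..255")
    return FRAME_US * b_interval


def worst_case_latency_us(scan_us: float, poll_us: float) -> float:
    """Worst case: the key changes right after a poll, so a full interval is lost."""
    return scan_us + poll_us


if __name__ == "__main__":
    hs = hs_interrupt_interval_us(1)   # bInterval = 1, as reported by the author
    fs = fs_interrupt_interval_us(1)   # fastest full-speed keyboard, for comparison
    print(f"HS: {hs} us per poll, {1e6 / hs:.0f} polls/s")
    print(f"FS: {fs} us per poll, {1e6 / fs:.0f} polls/s")
    print(f"Worst case with a 30 us scan: {worst_case_latency_us(30, hs)} us")
```

With bInterval set to 1 this gives 125 µs between polls (8000 per second), so a 30 µs worst-case key scan plus one full polling interval comes to about 155 µs before the host sees the report.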

Because it uses a Lattice Semi iCE40HX 4k FPGA, each key switch can connect to its own I/O pin, which also eliminates the need for diodes.

It also means that each key switch can have its own “core” — an 8-bit timer that is always counting up to 255. The key can only change its state when the timer reads 255. This acts as a rather clever debounce mechanism.
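To make the debounce idea concrete, here is a small cycle-level model in Python. It is a sketch, not the project's FPGA logic: the class and function names are invented for illustration, and it assumes (following the author's description in the project log and the comments) that a change on the switch line is accepted immediately when the timer sits at 255, after which the timer restarts from 0 and the state is locked until it reaches 255 again.

```python
from dataclasses import dataclass

COUNTER_TOP = 255  # ceiling of the 8-bit timer (0xFF)


@dataclass
class KeyCore:
    """Model of one per-key debounce core (hypothetical name)."""

    state: int = 0              # debounced key state reported upstream
    counter: int = COUNTER_TOP  # starts at the top, so the first edge is accepted

    def tick(self, raw: int) -> int:
        """Advance one timer tick with the raw switch level; return the state."""
        if self.counter < COUNTER_TOP:
            # Lockout window: keep counting and ignore the input (bounce).
            self.counter += 1
        elif raw != self.state:
            # Timer is at the top: accept the edge at once, restart the lockout.
            self.state = raw
            self.counter = 0
        return self.state


def run(samples):
    """Feed a sequence of raw switch levels through a fresh core."""
    core = KeyCore()
    return [core.tick(s) for s in samples]


if __name__ == "__main__":
    # A press at tick 1 followed by the line dropping low again.
    trace = run([0, 1] + [0] * 300)
    print("state after the press tick:", trace[1])
    print("first tick where the release is seen:", trace.index(0, 1))
```

In this model the first edge goes through on the very tick it arrives, while anything that happens during the following 255 ticks is ignored, which is how the scheme avoids adding latency to the initial keypress.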

If all that’s not enough, [Portland.HODL] built an operating system called NyanOS written in C to avoid any performance-reducing overhead. Oh, and it has an opt-in Bitcoin miner.

We’ve seen a lot of keyboards, and the fast ones are fast because of the input side: they are chording keyboards that take key combinations to type, rather than using one key (or so) per character. The CharaChorder is so fast that it was banned from competition.

32 thoughts on “Nyan Keys: Because Your Keyboard Is Painfully Slow”

Zoe Nagy says:

December 10, 2023 at 4:42 am

what about usb latency of 20ms+

1. rumpel says:
  
  December 10, 2023 at 5:06 am
  
  Can’t we squeeze keyboard events into an audio stream?
  
  Anyway, beautifully over-engineered, I love it!
  
2. Daid says:
  
  December 10, 2023 at 5:40 am
  
  Depends on the USB polling interval (which is set by the device) and the underlying USB version used.
  
  From the linked post:
  > We would take out 30us latency then add that to the 125us latency of the USB Interrupt transfer to get ~155us.
  So it has 155us total worst case latency.
  
  1. Portland.HODL says:
    
    December 10, 2023 at 10:19 am
    
    For this project I used a bInterval of 1 on a USB 2.0 HS interface. This makes the OS poll at a rate of 8000 Hz, or once every 125 us. I am still thinking of the best way to test from electrical switch contact to kernel event to get the end-to-end latency.
    
    1. Bry says:
      
      December 10, 2023 at 3:55 pm
      
      One way to measure input device latency is: wire up an LED that changes state when a button is pressed, use a high-speed camera to film both the LED and something else controllable by the computer at the same time which will change when the button is pressed, and determine the time between these events. Commonly the “something else” is a screen on which something will visibly change as a response to processing the input, although this will include display latency. Though this can get quite tedious, because often several samples are needed in case of variable latency.
      
      Another way used by the MiSTer Controller Latency tester is: wire up one button from the device to a separate microcontroller which is able to ground the contacts to simulate a button press, have a separate device (in the case of the MiSTer, an FPGA board with a bunch of surrounding electronics) which sets the state of a GPIO pin based on when it detects input from a button, and have the microcontroller measure the time between it simulating a button press and receiving indication that input has been detected. In this way it is able to send thousands of simulated button presses and determine min/max/avg/stddev latency. I believe the MiSTer maxes out at 1000Hz polling for USB devices so this may not be sufficient for your needs, but you may wish to consider a similar approach.
      
      1. Portland.HODL says:
        
        December 10, 2023 at 7:39 pm
        
        I used a very simple method: stopping the capture of the Saleae logic analyzer while at the same time measuring the logic level of the ‘R’ (stop) key. The difference between when the logic level goes low and when the logic capture stops should show the total end-to-end latency.
        
        Currently getting results around the 500-600 us mark worst case. This also includes the processing delay of the application itself, and I doubt Saleae made the Logic application to be extremely responsive.
        
        https://imgur.com/a/NRWMl8y
        
        With that said I will give the LED method a shot.
        
  2. CRJEEA says:
    
    December 10, 2023 at 2:39 pm
    
    If you make the key caps conductive, it can add characters to the keyboard buffer before you type them. :D
    
    1. Jacob says:
      
      December 11, 2023 at 11:00 am
      
      I like that idea a lot
      
Paul says:

December 10, 2023 at 5:39 am

Nicely done.
But the keyboard side is the easy part.
Let’s talk about why the OS is so pitiful in handling keystrokes and rendering them to the screen.
My Apple ][ was faster than any GUI application on a current computer. Worse, the latency is *variable* from keystroke to keystroke. It’s intensely frustrating, second-guessing for a split second whether the keystroke made it or not.

1. Portland.HODL says:
  
  December 10, 2023 at 1:51 pm
  
  Here is a very simple test where I measured the logic level on the pin for the ‘R’ key in the Saleae Logic software. The R key corresponds to the stop command for capture, so the time between the pin going low and the software stopping would equal the end-to-end latency, even through the OS.
  
  Currently averaging ~500-600us which is still much much better than the results from other boards out there right now which average 7-9ms
  
  Here is one of the sample captures
  https://imgur.com/a/NRWMl8y
  
  1. Paul says:
    
    December 11, 2023 at 5:04 am
    
    Interesting. Thanks for the info.
    That implies the OS is doing a good job passing along the keystroke promptly. So it’s the other end of the process that’s the culprit: rendering the character to the screen.
    
  2. CircularSaw says:
    
    December 26, 2023 at 7:18 pm
    
    Have you considered making a gamepad like this? Retro gamers would football spike a newborn for this kind of performance.
    
Just trying to help says:

December 10, 2023 at 6:50 am

He’s not really measuring the time from keypress (with debounce) to action. That would show an occasional response time improvement from something like ~10mS to ~9.969mS.

Wouldn’t it be faster to eliminate the millisecond debounce like the optical switches attempt to do?

They might not feel good, but this guy put so much effort into this, yet these cheap keyboards might be orders of magnitude faster.

1. Portland.HODL says:
  
  December 10, 2023 at 12:26 pm
  
  That is keypress to action, because as soon as the signal on the line changes, the state of the key is sent. Then the debounce counter starts to count up and locks the state of the key for 2-5 ms (for most switches); after that period the key’s state can change again. This works perfectly because no person can press and release a key in under a few ms.
  
  Read more here about it.
  https://hackaday.io/project/193920-nyan-keys-fpga-based-mechanical-keyboard/log/225840-nyan-keys-tries-the-new-cherry-mx2a-and-cherry-loses
  
Rog77 says:

December 10, 2023 at 9:28 am

I wonder, if one were going to get silly, could they use a PCIE data acquisition FPGA card with DMA as the interface, rather than USB?

1. Portland.HODL says:
  
  December 10, 2023 at 10:13 am
  
  Absolutely! But at the end of the day, for this project I wanted to create something that would work under the standard USB HID drivers that are included with the most common operating systems: Linux, Mac OS, Windows.
  
  1. Rog77 says:
    
    December 10, 2023 at 1:33 pm
    
    It seems like a fine project, kudos :-)
    
    I was just joining in wondering on maximum speed, it probably wouldn’t have been very practical or salable.
    
    1. Portland.HODL says:
      
      December 10, 2023 at 2:09 pm
      
      Thank you so much, the latency of PCIE would be much lower. The interesting part is that an FPGA could do it. The Xilinx KC705 evaluation board has a PCI-E edge connector if anyone wanted to experiment.
      
      At the end of the day the user is creating a logic analyzer disguised as a USB HID device.
      
Rick C says:

December 10, 2023 at 9:41 am

“Because it uses a Lattice Semi iCE40HX 4k FPGA, each key switch can connect to its own I/O pin, which also eliminates the need for diodes.”

It’s too bad nobody makes MCUs with 100+ GPIO pins.

1. Portland.HODL says:
  
  December 10, 2023 at 10:07 am
  
  Project author here.
  
  There are some TQFP 144 STM32s (STM32F723ZCT6) with 100+ available IOs that could handle per-pin IO. The reason to use an FPGA was to offload from the MCU any work other than requesting the key state and pushing it over USB 2.0 HS. Currently, at 12.5 MHz SCLK on the SPI bus, I am getting around 83k full state reads a second.
  
2. Jarek L says:
  
  December 10, 2023 at 12:50 pm
  
  Yea, the datasheet for the chip they use claims:
  STM32F723xx devices-
  – Up to 140 I/O ports with interrupt capability
  
  putting the fpga in there gives more flexibility tho, maybe in testing using the fpga to handle the keystrokes and the uC for USB alone offers speed improvements
  
  1. Portland.HODL says:
    
    December 10, 2023 at 7:23 pm
    
    That was the logic. At the end of the day, the novelty of designing this was that there would be an incredibly high level of flexibility, because of the FPGA, as to the characteristics and handling of keystrokes.
    
    The other answer would be that yeah, you can have the uC do both and it would likely be more than fast enough. The majority of the latency improvements come not from the FPGA but from the use of USB 2.0 HS, which makes the host interrupt 8000 times a second compared to 1000 times a second for standard USB 2.0 FS.
    
    With that said, when I built this it wasn’t to mass produce or anything like that. I just wanted the coolest toy in the toybox, and using an FPGA is one way to do that.
    
    Lastly, yeah, you are right, because the FPGA is handling all of the timers and keys in parallel. The MCU then doesn’t have a whole lot to do other than read out the contents and push them onto the USB bus, and it works quite well for that.
    
3. ☺ says:
  
  December 10, 2023 at 1:08 pm
  
  Could more than one chip be used? A bunch of RISC-V chips (like the CH32 series) on a SPI bus could be fast and still be cheap.
  
  1. Portland.HODL says:
    
    December 10, 2023 at 7:27 pm
    
    Yeah that approach would also work and was considered. My thought was that it would be better long term to just slap an FPGA in there and have the ability to have a ton of IO and also be able to migrate between dedicated logic and softcore uC when needed.
    
    I also wanted the community to hack these boards and leaving an FPGA in there that supports the icestorm toolkit was a great way to leave the future of Nyan Keys very open ended.
    
Miles says:

December 10, 2023 at 9:19 pm

Now the sad part. Most video games I play (racing games) have molasses for reaction. We will need to reprogram them to overdraw and AI generate like VR tech so that as we hit the input the display can move before the CPU can even react to the key press. Imagine your video card with an input interface.

Although it would also be trick to have a wordprocessor built into the FPGA directly driving a display XD

herr_brain says:

December 11, 2023 at 12:10 am

So… I hate to be THAT guy, but what about the network latency that will surely dwarf any input-related latency?

1. Portland.HODL says:
  
  December 11, 2023 at 2:04 am
  
  It all ends up as submissions into netcode anyways, but yeah at the end of the day it might not even matter, still better to have a shot than not.
  
  There are also local games such as OSU that would work quite well with this keyboard.
  
Sheldon says:

December 11, 2023 at 12:27 am

`The key can only change its state when the timer reads 255. This acts as a rather clever debounce mechanism.`
That’s not a debounce logic – that’s just occasional sampling causing a slow-down in updates (one may get really unlucky and sample at the point of the bounce rather than the desired logic change resulting in additional latency as it won’t then update until the next time the counter overflows).

1. Portland.HODL says:
  
  December 11, 2023 at 2:02 am
  
  Please read the project log: whilst the counter is running, the state cannot change. When the counter is at the top, the state can change as soon as the logic level changes. If that isn’t debouncing a switch properly I apologize, but at the end of the day it’s working flawlessly.
  
  A note: the count to 0xFF is set to equal the stated debounce time of the switch as per the data sheet.
  
Conor Stewart says:

December 11, 2023 at 5:19 am

This is pretty pointless in terms of real world benefit or uses. There is nothing wrong with current keyboards and they are already much faster than they need to be for even professional gamers, so a lot of the claims made in this article are just false.

There is no reason that a key matrix is the limiting factor for polling speed, key matrices can be polled very fast.

If there was any actual benefit to having a keyboard this fast, then large companies catering to the gaming market would have already made them. The actual bottleneck when gaming is humans: our reactions are much slower than keyboard or mouse polling rates or frame rates on monitors.

So whilst this may have been a fun project to make it really doesn’t have any practical benefit over most other keyboards on the market.

1. spaceminions says:
  
  December 11, 2023 at 12:04 pm
  
  The limiting factor is usually between my computer and the world. A person may not react in milliseconds, but can anticipate things and then observe the timing in retrospect. Usually when I am frustrated by the timing of something in a game, it’s because I saw things occurring one way but an instant later the screen changes to a different result. A tiny enough change is still insignificant, but not pointless.
  
  I suppose the professionals cancel out such issues by training themselves to do the “wrong” thing in order to get the right result – e.g. in a shooter game, they might fire at a stone wall or empty air knowing that by the time the network catches up, something completely different will happen than what the represented projectile should do. But even then, I have to imagine they would still prefer to have slightly less delay than the average among their competition, if all else were equal, even if it’s a tiny number.
  
TheKing says:

December 14, 2023 at 10:58 am

How can I buy one of these from you?
