The Pi 2 Means Faster GPIO

March 27, 2015

The Raspberry Pi is a great machine to learn the ins and outs of blinking pins, but for doing anything that requires blinking pins fast, you’re better off going with a BeagleBone. This has been the conventional wisdom for years now, and now that the updated Raspberry Pi 2 is out, there’s the expectation that you’ll be able to blink a pin faster. The data are here, and yes, you can.

The method of testing was connecting a PicoScope 5444B to a pin on the GPIO pin and toggling between zero and one as fast as possible. The original test wasn’t very encouraging; Python maxed out at around 70 kHz, Ruby was terrible, and only C with the native library was useful for interesting stuff – 22MHz.

Using the same experimental setup, the Raspberry Pi 2 is about 2 to three times faster. The fastest is still the C native library, topping out at just under 42 MHz. Other languages and libraries are much slower, but the RPi.GPIO Python library stukk sees a 2.5x increase.

67 thoughts on “The Pi 2 Means Faster GPIO”

Randy P says:

March 27, 2015 at 10:07 am

Is it just me or is it disingenuous to say this is testing I/O speed? Isn’t it just testing… O? How fast can it sample?

Report comment

Reply
1. Marc says:
  
  March 27, 2015 at 11:11 am
  
  From what I’ve read I think that the i/o read and write commands take the same number of clock cycles on the CPU. But to use the value you read by comparing or storing will affect the amount of time your loop takes immensely. So the Pi 2 will probably sample much faster, and so have more capability, but ymmv depending on how crafty you are when you are storing or using your information.
  
  Report comment
  
  Reply
TW says:

March 27, 2015 at 10:24 am

Spotting the PicoScope there, I knew they’d have bunged one your way – checked your site: ‘sponsored by’!
I wish they’d kept their promise for me – nice bit of kit ;-)

Was the test done with OOTB settings? Does the speed scale with overclocking, or are the GPIO scaled down wehen overclocked.

Report comment

Reply
1. TW says:
  
  March 27, 2015 at 10:24 am
  
  “when overclocked” – (makes more sense)
  
  Report comment
  
  Reply
2. Joonas Pihlajamaa says:
  
  March 28, 2015 at 3:06 am
  
  Haha, yeah Picotech has been a long-time partner of my site, awesome scopes!
  
  Yes the Pi is basically “out of the box” Raspbian setup. I would assume the speed scales with overclocking, hard to imagine the ARM chip would have independent GPIO clock.
  
  Report comment
  
  Reply
Mark says:

March 27, 2015 at 10:29 am

Why would you even attempt a performance benchmark with Python. Python is not fast. It will never be fast. It wasn’t intended to be fast.

Report comment

Reply
1. lja says:
  
  March 27, 2015 at 10:31 am
  
  There seems to be some trending obsession of “do everything in Python even if it is not a task it is suited well to” it’s like the arduino of the software world
  
  Report comment
  
  Reply
  1. someone says:
    
    March 27, 2015 at 12:10 pm
    
    The Raspberry Pi is DESIGNED to teach programming with Python, so there is a good reason to be using it on the Pi – all sorts of library’s are already there. Python is used to teach intro programming at places like MIT, and Standford- so no “obsession” – people starting to work with more demanding hardware projects just need to learn the correct tools.
    
    Report comment
    
    Reply
  2. Thomas Barth says:
    
    March 27, 2015 at 1:28 pm
    
    Nope, LabView is the Arduino of the Software world…:p
    
    Report comment
    
    Reply
    1. pablo says:
      
      March 27, 2015 at 4:53 pm
      
      I suppose Microsoft Excel is the parallel port bodge-wired to a breadboard of the software world
      
      Report comment
      
      Reply
      1. Ren says:
        
        March 27, 2015 at 10:49 pm
        
        I am not a MicroSoft minion in any shape or metric, but I’m surprised at what people can do with Excel, like starting their car, or running complex electromechanical machinery.
        
        Report comment
      2. yetihehe says:
        
        March 28, 2015 at 1:03 am
        
        After I’ve seen mechanical simulation done in access, nothing will amaze me.
        
        Report comment
2. Erik says:
  
  March 27, 2015 at 11:52 am
  
  It is benchmarked for exactly the same reason the faster languages are — to see how fast it is so that people can make informed decisions about whether it fits their needs.
  
  Report comment
  
  Reply
3. proft says:
  
  March 27, 2015 at 11:55 am
  
  It’s a useful comparison and it’s worth noting that it’s an order of magnitude faster than shell scripts. Discounting the language for speed doesn’t really make sense. Create or use time-critical components written in C or Fortran and call them with CPython’s ctypes. That’s how it’s intended to work and it’s trivial to implement.
  
  Report comment
  
  Reply
  1. sf says:
    
    March 27, 2015 at 5:19 pm
    
    CFFI instead of ctypes, other than that: +1
    
    Report comment
    
    Reply
4. bschollnick says:
  
  March 27, 2015 at 6:02 pm
  
  Actually python can be extremely fast. I’ve written near real time code with it, that was used to monitor RFID.
  
  Will it be faster than C? Probably not, but it’ll be easier to read, and faster to write.
  
  Report comment
  
  Reply
  1. anthony says:
    
    March 27, 2015 at 7:56 pm
    
    I tried to do that with the PI (1) and it worked for a while but it was no where near perfect … it would miss reads
    
    Report comment
    
    Reply
    1. bschollnick says:
      
      March 29, 2015 at 5:19 am
      
      I’ve had no issues with the R PI b+. My only complaint was that the RPI B+ was a little slow for the database work that I was trying to do. That’s more of an issue with the framework, and not the 700-1Ghz processor in the RPI B+.
      
      Report comment
      
      Reply
5. Greenaum says:
  
  March 28, 2015 at 7:06 am
  
  Have to admit, yeah. For a proper ultimate speed test, really it should be done in assembler, with no OS loaded. Just a simple few bytes. It can do that, right? Anything above that is software slowing it down.
  
  Report comment
  
  Reply
Anne Nonymous says:

March 27, 2015 at 10:32 am

I wonder how fast a java program can toggle the IO? Maybe Java can achieve a toggle rate of well, 1/10Hz?

Nothing prevents someone from writing a bare-metal program on the SD card and booting that, as one wouldn’t have to deal with the IO overhead of a full operating system.

Report comment

Reply
1. lwatcdr (@lwatcdr) says:
  
  March 27, 2015 at 1:11 pm
  
  It is hysterical when people think a well written java program would be slower than a Python script….
  
  Report comment
  
  Reply
  1. FrankenPC says:
    
    March 27, 2015 at 3:47 pm
    
    Even JavaScript running in a decent browser can run at incredible speed with something like asm.js running.
    
    Report comment
    
    Reply
    1. @XZ3R0 says:
      
      March 27, 2015 at 6:36 pm
      
      Whats javascript have to do with java?
      
      Report comment
      
      Reply
      1. Garbz says:
        
        March 28, 2015 at 7:36 am
        
        In speed references its generally even slower.
        
        Report comment
Anne Nonymous says:

March 27, 2015 at 10:36 am

For maximum speed, the IO should be toggled in java, running on a java interpreter written in java, running on a java interpreter written in java. Adding more recursion should make it even faster! Everyone KNOWS java is FASTER than C!

Report comment

Reply
1. Thomas Barth says:
  
  March 28, 2015 at 4:34 am
  
  Could you please explain how Java can be faster than C for an operation like setting a bit?
  
  Report comment
  
  Reply
  1. Whatnot says:
    
    March 28, 2015 at 7:10 am
    
    More on that here https://en.wikipedia.org/wiki/Sarcasm
    
    Report comment
    
    Reply
someone says:

March 27, 2015 at 10:39 am

The language differences are what you would expect. Other than C, nothing else mentioned is suitable for “real- time” control. Of course, how fast you require for “real-time” control depends on the application. The other, equally important issue, is how deterministic the system is. You need a kernel/OS made for real time applications if you are going to run a machine tool or something else with critical timing/motion control etc. There is work being done running LinuxCNC on a Pi, with a suitable kernel- it is not quiet there yet….

Report comment

Reply
1. Sylph-DS says:
  
  March 27, 2015 at 10:57 am
  
  I agree.
  
  I personally don’t actually think LinuxCNC will ever be quite there, though. The Pis run a SoC designed for tablets and phones, aka not real-time, multi-purpose, and focused on working together with its GPU to make some nice multimedia magic happen. In my opinion, trying to make something like that run real-time is a fool’s errand.
  
  Report comment
  
  Reply
  1. GerritV says:
    
    March 27, 2015 at 1:06 pm
    
    It’s a chip that executes instructions. Tuning Raspian like they did for the usual LinuxCNC distro should result in usable speeds. Real-time and multipurpose are somewhat countering each other. Realtime simply means that timing specific tasks are predictable. That is up to the OS, and generic Linux is not suited to that, just like Windows and OSx aren’t (out of the box). There are implementations of Mach3 running under Embeded Windows XP without issue on weaker cpu;s.
    
    Report comment
    
    Reply
    1. Sylph-DS says:
      
      March 27, 2015 at 2:29 pm
      
      The problem isn’t software so much as hardware, the Broadcom chip isn’t just a CPU, it’s a SoC that contains a CPU, GPU (and more). The CPU and GPU share a number of resources and are hard-wired together in a number of ways (For example, some of the timers in the ARM are linked to the GPU clock, so if the GPU decides to change clock (which it does depending on load, in order to save energy) your timers are suddenly running at an entirely different speed). Now, I’m sure that if you know exactly how the SoC is put together, you could figure out how to make everything run stable or at least know when it won’t, but Broadcom is unwilling to share all the necessary details so that is not going to be easy.
      
      Report comment
      
      Reply
      1. GerritV says:
        
        March 27, 2015 at 4:58 pm
        
        I want to say, you are joking right? But that would be a sarcastic. I have been around mainframes since 1965. What you describe is a computer system. It is about how you configure an OS to operate the combination if peripherals. Whether it is on a single chip or spread across 3 floors doesn’t make any difference. LinuxCNC runs under a limited version of Linux today. Mach3 runs on XP embedded on low end hw. Pi 2 is more than enough hardware to do the same.Someone needs to expend the effort to do the configuration work to remove . An AtMega328P is more than adequate to run > 30k steps per second running Grbl, the Pi 2 is up to the task with considerably more horsepower.
        
        Report comment
  2. someone says:
    
    March 27, 2015 at 1:33 pm
    
    Yes, the Pi is not really suited for CNC- someone somewhere is going to get excited about a $35 computer, and take up the challenge of getting it to work with their favorite application. If it works or not, they still have learned plenty trying.
    
    Report comment
    
    Reply
    1. Sylph-DS says:
      
      March 27, 2015 at 2:31 pm
      
      You’re right of course. I guess my wording was a bit harsh.
      
      Report comment
      
      Reply
2. arachnidster says:
  
  March 27, 2015 at 1:39 pm
  
  Running in a multitasking general purpose OS is not “suitable for real-time control”.
  
  Report comment
  
  Reply
  1. someone says:
    
    March 27, 2015 at 1:50 pm
    
    True, however there are tricks used with industrial control systems that let you use both. One example I have used is Beckhoff Automation’s “Twincat” – it wedges itself in alongside MS windows (as poor an OS for “real-time” as it gets) and hands out time slices to the automation tasks and MS windows. It works.
    
    Report comment
    
    Reply
  2. Whatnot says:
    
    March 28, 2015 at 7:16 am
    
    Computers have interrupts,
    And you can write code in assembler making it ten times as fast as any C type language.
    So yes it is possible even with a multitasking OS to do such things. In fact it’s done all over the place 24/7. Even using windows.
    
    Report comment
    
    Reply
    1. arachnidster says:
      
      March 28, 2015 at 7:51 am
      
      You’re completely missing the point. Multitasking means that your task can be preempted at any time – even if just by the kernel rather than another process – which means your ‘realtime’ task may go on hiatus for 20 milliseconds without notice.
      
      And no, assembler is not “ten times as fast as any C type language”.
      
      Report comment
      
      Reply
gudenau (@gudenaua) says:

March 27, 2015 at 11:55 am

How does bare-metal asm do?

Report comment

Reply
1. Lasse Bierstrom says:
  
  March 27, 2015 at 12:42 pm
  
  Maximum speed is ~125Mhz @ 1.2V, so asm should reach max!
  asm is the lowest level to control the cpu, if it isn’t possible by asm, it’s impossible (assuming asm access is as fast a a dma access)
  
  Report comment
  
  Reply
  1. someone says:
    
    March 27, 2015 at 2:13 pm
    
    just to be particular : machine code is the lowest level, an assembler is one small step above that!
    
    Report comment
    
    Reply
    1. matseng says:
      
      March 27, 2015 at 9:08 pm
      
      Meh… There’s (normally) a 1-to-1 mapping between each instruction in assembly language and machine code, so they are on the same level. On the other hand you have programmable ‘micro operations’ on some cpus, and that is something that’s lower than both assembly and machine code.
      
      Report comment
      
      Reply
2. Thomas Barth says:
  
  March 27, 2015 at 1:30 pm
  
  Shouldn’t be any difference to the C code. Maybe you need to unroll the loop to save a cycle.
  
  Report comment
  
  Reply
  1. Daniel says:
    
    March 27, 2015 at 5:31 pm
    
    On SoCs in that performance region the GPIO block and the bus connecting it to the CPU core are rarely clocked as fast as the CPU core. A loop executed from the 1st level cache therefore usually doesn’t add any overhead.
    
    Report comment
    
    Reply
    1. Thomas Barth says:
      
      March 28, 2015 at 4:33 am
      
      good point.
      Yes, the Peripheral bus is most likely a lot slower than the CPU, hence this 1 cycle for the loop shouldn’t make any difference.
      
      Report comment
      
      Reply
GerritV says:

March 27, 2015 at 1:01 pm

Maybe someone should spend an hour or two profiling Python and some of these other lethargic interpreters to fix the bottle necks? Speeding up the clock is not the solution.

Report comment

Reply
1. arachnidster says:
  
  March 27, 2015 at 1:40 pm
  
  Gosh, I’m sure nobody’s ever tried doing any performance improvements on Python before! Good idea!
  
  Report comment
  
  Reply
  1. Kratz says:
    
    March 27, 2015 at 6:01 pm
    
    At a 72khz benchmark, apparently nobody has. Believe it or not, high level languages cam achieve high performance if properly optimized in the compile/jit/runtime. It’s just hard to do.
    
    Report comment
    
    Reply
    1. Greenaum says:
      
      March 28, 2015 at 8:07 am
      
      Yeah but there’s a limit. And as ever, speed vs versatility is what you’d be adjusting, once you’ve optimised out the slack.
      
      Report comment
      
      Reply
arachnidster says:

March 27, 2015 at 1:41 pm

I still maintain that if you’re relying on anything timing critical from inside a multitasking OS like linux, you’re doing it wrong. Use an external MCU, or a coprocessor like the BeagleBone’s CPU sports.

Report comment

Reply
1. denis says:
  
  March 27, 2015 at 1:45 pm
  
  Or an arduino. Everyone loves a pi-duino combo :)
  
  Report comment
  
  Reply
2. someone says:
  
  March 27, 2015 at 2:09 pm
  
  That is the most straight forward solution. Use Python or whatever you like for the higher level tasks, pass off time critical tasks to a microcontroller or FPGA.
  
  Report comment
  
  Reply
3. AltMarcxs says:
  
  March 27, 2015 at 3:41 pm
  
  And why not use 3 cores for Linux and 1 core as MCU?
  
  Report comment
  
  Reply
  1. marty lawson says:
    
    March 27, 2015 at 5:39 pm
    
    I am curious why we don’t hear about this being done more. Especially if the core dedicated to “MCU” tasks has top priority for IO and memory access, this should be able to do “real time” as well as any dedicated MCU?
    
    Report comment
    
    Reply
    1. Anne Nonymous says:
      
      March 27, 2015 at 6:20 pm
      
      Becuase the memory, IO, and other resources are shared. A MCU has exclusive deterministic access to its own RAM. Since it’s SRAM, a read ALWAYS takes N cycles, there are no cache misses to account for, no refresh cycles, it’s all 100% deterministic. I can run the same program on a MCU 1000 times, and each iteration will take eactly N clock cycles, under all conditions, barring interrupts, etc.
      
      When you add DRAM into the mix, and cache misses, and OS resource polling, you introduce non-determinism.
      
      That’s why nothing “real time” actually runs on a computer, dedicated ASICs or MCUs handle it, they don’t have all the overhead.
      
      Report comment
      
      Reply
      1. matseng says:
        
        March 27, 2015 at 9:14 pm
        
        Well… Theoretically speaking even a dedicated ASIC or MCU is non-deterministic unless the external events are in perfect sync with the clock of the MCU. So you’ll have a reposne time with up to one clock cycle in jitter.
        
        1 clock cycle, 100 nS, 1 uS, 10 uS, 100 uS, 1mS? How much jitter do you tolerate before it’s not a real time system?
        
        Report comment
  2. UltraSheen says:
    
    March 27, 2015 at 6:27 pm
    
    This is the right answer. It is trivial to assign a pthread to a CPU core.
    
    Alternatively, you can use a RT kernel. It’s amazing what you can accomplish in a “lowly” multitasking OS.
    
    Report comment
    
    Reply
4. Whatnot says:
  
  March 28, 2015 at 7:21 am
  
  There are a great many very important (as in even life threatening or million dollar risk) time-critical systems running on linux I’m sure.
  Although maybe for the amateur or other low resource outfit a MCU based solution is indeed the better choice, but it’s silly to dismiss it as not possible. As I said earlier in this thread: interrupts, they do exist..
  
  Report comment
  
  Reply
pablo says:

March 27, 2015 at 4:56 pm

Isn’t this a task better suited for a native hardware PWM pin from the CPU (if you want to toggle at a frequency / duty cycle) or with DMA output to a pin (if you want a particular sequence)? Software bit-banging is always going to be slower.

Report comment

Reply
1. Greenaum says:
  
  March 28, 2015 at 8:12 am
  
  In this case, no, because this was a test so see how fast the pins can be switched. It’s not a test to see how fast you can get a PWM signal, it’s a test for ad-hoc access to the IO pins. How fast you can switch a pin on or off. Say you wanted to bit-bang HDMI or something (no idea if that would work). In this case a constant stream of pulses was used to make it easy to pick up on the scope, but in practice it might not be a constant stream.
  
  Report comment
  
  Reply
Dan McG says:

March 27, 2015 at 7:54 pm

Has anyone seriously considered what the true potential of the RPi is, if it weren’t hobbled in its ability to be programmed in Assembly language?
It doesn’t take a rocket scientist to determine that there is information regarding the RPi which is not being considered, to wit:

As I read Broadcom’s datasheet(s), I/O instructions are 1-cycle instructions.
At a clock frequency of 900 MHz, this works out to 1.11… ns.
To toggle a bit on, and then off, requires 2.22…ns.
This gives (big surprise) a maximum toggle frequency of 450 MHz.

Quite a bit faster than ONLY 42 MHz available using C, huh?

Why is no one REALLY programming the Raspberry Pi in native assembly language?

Report comment

Reply
1. Greenaum says:
  
  March 28, 2015 at 8:14 am
  
  Personally, it’s cos I’ve no idea how to write ARM assembly, or how to get it working on my friend’s Pi. Why aren’t you doing it?
  
  Report comment
  
  Reply
  1. Dan McG says:
    
    March 28, 2015 at 11:50 am
    
    Perhaps if you worked as hard at learnimg assembly language as you do at jumping to conclusions, you’d be an assembly-language genius.
    You should have stopped your response at one sentence, COS where’d you get the information that I’m not writing assembly language code?
    
    Report comment
    
    Reply
    1. Greenaum says:
      
      April 2, 2015 at 3:49 pm
      
      From your preceding post. Or don’t you count as anybody? Don’t be so hard on yourself, guy.
      
      Report comment
      
      Reply
jaydcarlson says:

March 27, 2015 at 11:40 pm

If you’re trying to wiggle GPIO pins at deterministic speeds inside a non-deterministic, non-real-time operating system like Linux, you’re doing it wrong. These sorts of benchmarks/discussions are a good way to send newer programmers down a path to misery (“I’m trying to do software PWM on a raspberry pi using PHP, but it’s not working…”).

Put a 70-cent MCU on the i2C bus to offload the real-time tasks, and get on with your life.

Report comment

Reply
1. Greenaum says:
  
  March 28, 2015 at 8:16 am
  
  It’s not “trying” anything. It’s just a curio. To show how fast a pin can be toggled. It’s not an application, it’s meant to inform other people of the possibilities.
  
  Report comment
  
  Reply
RustyHydrogen says:

March 28, 2015 at 7:05 am

Realtime,and even hard realtime: XMOS.

Report comment

Reply

Hackaday

The Pi 2 Means Faster GPIO

67 thoughts on “The Pi 2 Means Faster GPIO”

Leave a Reply to @XZ3R0Cancel reply

Search

Never miss a hack

If you missed it

Ore Formation: A Surface Level Look

In Which I Vibe-Code A Personal Library System

Give Us One Manual For Normies, Another For Hackers

3D Printing And The Dream Of Affordable Prosthetics

Chinese Regulators May Kill Retractable Car Door Handles That Never Should Have Existed

Our Columns

Hackaday Podcast Episode 348: 50 Grams Of PLA Hold A Ton, Phreaknic Badge Is Off The Shelf, And Hackers Need Repair Manuals

This Week In Security: React, JSON Formatting, And The Return Of Shai Hulud

Keebin’ With Kristina: The One With The Pretty Protoypes

FLOSS Weekly Episode 857: SOCification

3D Printering: That New Color Printer

67 thoughts on “The Pi 2 Means Faster GPIO”

Leave a Reply to @XZ3R0Cancel reply

Search

Never miss a hack

Subscribe

If you missed it

Our Columns