New Teensy 4.0 Blows Away Benchmarks, Implements Self-Recovery, Returns To Smaller Form

Paul Stoffregen has done it again: the Teensy 4.0 has been released. The latest in the Teensy line of microcontroller development boards, the 4.0 returns to the smaller form factor last seen with the 3.2, as opposed to the larger 3.5 and 3.6 boards.

Don’t let the smaller size fool you: the 4.0 is based on an ARM Cortex-M7 running at 600 MHz (!), the fastest microcontroller you can get in 2019, and testing on real-world examples shows it executing code more than five times faster than the Teensy 3.6 and fifteen times faster than the Teensy 3.2. Of course, the new board is also packed with peripherals, including two 480 Mbps USB ports, three digital audio interfaces, three CAN buses, and multiple SPI/I2C/serial interfaces backed with integrated FIFOs. Programming? Easy: there’s an add-on to the Arduino IDE called Teensyduino that “just works”. And it rings up at an MSRP of just $19.95; a welcome price point, though not unexpected for a microcontroller breakout board.

The board launches today, but I had a chance to test drive a couple of them in one of the East Coast Hackaday labs over the past few days. So, let’s have a closer look.

First Impressions

The board looks superficially similar to the older 3.2, at least from the top. There’s the usual dual row of pin headers you can plug into a breadboard, a micro-USB connector, and a reset button. A new red LED near the USB connector gives you some status information, while the traditional “Arduino LED” is orange. Flip the board over, and you start to see some of the extra power this board wields. Besides ten more GPIO pins, there are pads for an SD card interface using 4-bit SDIO, and D+ and D- lines for the second 480 Mbps USB interface. The unmarked round pads are test points used in manufacturing and are no-connects from the end user’s perspective.

Teensy 3.2 Everything Killer?

When doing hardware reviews it’s crucial to choose the right comparison hardware. I think the best comparison in this case is between the two boards that share the same form factor: the Teensy 4.0 and the 3.2. I’ve chosen not to compare it with the Teensy 3.5 and 3.6, which are priced a little higher, come in a larger form factor, and have SD card slots soldered on.

Incredibly, the Teensy 4.0 is priced at $19.95, compared to $19.80 for the Teensy 3.2. What does that extra fifteen cents buy? First, there’s performance, and the 4.0’s 600 MHz clock versus the 3.2’s 72 MHz doesn’t tell the whole story. The Cortex-M7 on the 4.0 is a dual-issue superscalar processor capable of executing up to two 32-bit instructions per clock cycle; initial tests showed this happening 40 to 50% of the time on Arduino-compiled code. Additionally, the Cortex-M7 is the first ARM microcontroller with branch prediction. On the Cortex-M4, a branch always takes 3 clock cycles; on the Cortex-M7, after a few passes through a loop, for instance, correctly predicted branches can execute in a single clock. This is technology originally pioneered in supercomputers that you can use in your next Halloween costume.
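To see why the clock ratio alone undersells the 4.0, here is a rough back-of-the-envelope model. It assumes, purely for illustration, that the Cortex-M4 retires one instruction per clock and that the Cortex-M7 dual-issues on some fraction of its clocks (the 40 to 50% figure above). It ignores branch prediction, memory wait states, and everything else, so treat it as a sketch rather than a prediction.

```cpp
#include <cassert>

// Simplifying assumption: one instruction per clock, except on the fraction
// `dual_fraction` of clocks where two instructions issue together.
double estimated_ipc(double dual_fraction) {
    assert(dual_fraction >= 0.0 && dual_fraction <= 1.0);
    return (1.0 - dual_fraction) * 1.0 + dual_fraction * 2.0;
}

// Estimated speedup = clock ratio x instructions-per-clock ratio.
// The older core's IPC is assumed to be exactly 1.0 (another simplification).
double estimated_speedup(double new_mhz, double old_mhz, double dual_fraction) {
    assert(new_mhz > 0.0 && old_mhz > 0.0);
    return (new_mhz / old_mhz) * estimated_ipc(dual_fraction);
}
```

With a 45% dual-issue rate, estimated_speedup(600.0, 72.0, 0.45) comes out at roughly 12, compared with a plain clock ratio of about 8.3. The CoreMark results in this review come in higher still, which presumably reflects effects this model leaves out.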

Then there’s floating point. Veteran embedded programmers may have a bias against floating-point code, and with good reason: without native floating-point instructions, these operations must be emulated in software, and they run very slowly. The same thing happens with double-precision operations on a processor that only supports single-precision instructions. While Cortex-M4 processors support single-precision floating point, the Cortex-M7 on this board includes native double-precision instructions, so if you need the extra precision afforded by doubles, you’re not going to take a huge performance hit: doubles seem to take only about twice as many cycles as floats.
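One practical consequence worth knowing: in C and C++, an unsuffixed literal such as 0.5 is a double, so mixing it into float math quietly promotes the whole expression to double precision. On a single-precision-only FPU that means software emulation; on the Teensy 4.0’s Cortex-M7 it runs in hardware but still costs more than float. The function names below are just for illustration.

```cpp
#include <type_traits>

// 0.5 is a double literal: x is promoted, the multiply happens in double
// precision, and the result is narrowed back to float on return.
float scale_promoted(float x) { return x * 0.5; }

// 0.5f is a float literal: the multiply stays in single precision.
float scale_single(float x) { return x * 0.5f; }

// The language rules, checked at compile time.
static_assert(std::is_same<decltype(1.0f * 0.5), double>::value,
              "float times an unsuffixed literal is a double expression");
static_assert(std::is_same<decltype(1.0f * 0.5f), float>::value,
              "float times a float literal stays float");
```

Compilers such as GCC can flag the first pattern with -Wdouble-promotion, which is handy when porting code between single- and double-precision parts.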

The Cortex-M7 on this board also supports tightly coupled memory (TCM), which provides fast access like a cache, but without the non-determinism that can complicate hard real-time applications, one of the problems with other high-power microcontrollers. The 64-bit ITCM bus can fetch two 32-bit instructions per cycle, while two dedicated 32-bit DTCM buses can each fetch a data word from the TCM every cycle; these buses are separate from the main AXI bus used to communicate with other memory and peripherals. The Teensyduino environment automatically allocates code and statically allocated variables into the TCM area (ITCM for code, DTCM for data), which can be up to 512K in size, although you can override the default behavior with some command-line switches. Memory that isn’t accessed by the tightly coupled buses is optimized for access by the peripherals using DMA.
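Placement can also be chosen per variable or per function. As far as I know, the Teensy 4 core that ships with Teensyduino defines placement keywords including FASTRUN, FLASHMEM, and DMAMEM; the sketch below assumes those names (check PJRC’s documentation for your version) and defines empty fallbacks so it also compiles on a desktop. The buffer size and function names are hypothetical. DMAMEM buffers may not be initialized at startup, which is why the sketch clears the buffer explicitly.

```cpp
#include <cstddef>
#include <cstdint>

// Empty fallbacks so this compiles off-target. On a Teensy 4.x, the core's
// own definitions (assumed to exist) take precedence.
#ifndef DMAMEM
#define DMAMEM
#endif
#ifndef FLASHMEM
#define FLASHMEM
#endif

// An ordinary global: by default it lands in tightly coupled data RAM.
std::uint32_t fast_counter = 0;

// A large buffer that is mostly touched by DMA: moved to the second RAM bank.
DMAMEM std::int16_t audio_buffer[4096];

// Rarely called setup code: left in flash to save tightly coupled RAM.
FLASHMEM void clear_audio_buffer() {
    for (std::size_t i = 0; i < sizeof(audio_buffer) / sizeof(audio_buffer[0]); ++i) {
        audio_buffer[i] = 0;
    }
}

// Hot-path code: runs from tightly coupled instruction RAM by default.
std::uint32_t tick() { return ++fast_counter; }
```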

Spec Sheet

Despite its size, there’s a lot to this board and the chip it carries, so here’s a condensed spec list:

  • ARM Cortex-M7 at 600 MHz
  • 1024K RAM (512K is tightly coupled)
  • 2048K Flash (64K reserved for recovery & EEPROM emulation)
  • 2 USB ports, both 480 MBit/sec
  • 3 CAN Bus (1 with CAN FD)
  • 2 I2S Digital Audio
  • 1 S/PDIF Digital Audio
  • 1 SDIO (4 bit) native SD
  • 3 SPI, all with 16 word FIFO
  • 3 I2C, all with 4 byte FIFO
  • 7 Serial, all with 4 byte FIFO
  • 32 general purpose DMA channels
  • 31 PWM pins
  • 40 digital pins, all interrupt capable
  • 14 analog pins, 2 ADCs on chip
  • Cryptographic Acceleration
  • Random Number Generator
  • RTC for date/time
  • Programmable FlexIO
  • Pixel Processing Pipeline
  • Peripheral cross triggering
  • Power On/Off management

The board consumes around 100 mA with a 600 MHz clock. Although I didn’t try it myself with the evaluation boards I have here, Paul notes that it can be overclocked for a performance boost. It also supports dynamic clock scaling: the CPU clock is decoupled from the peripheral clocks, so baud rates, audio sample rates, and timing functions keep working correctly if you change the CPU speed.
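The decoupling works because each peripheral derives its timing from its own clock root. As a hypothetical model (the 24 MHz peripheral clock and 16x oversampling are illustrative assumptions, not values read from the Teensy core), a UART divisor depends only on the peripheral clock and the requested baud rate; the CPU frequency never appears in the calculation.

```cpp
#include <cassert>
#include <cstdint>

// Integer divisor for a UART, rounded to the nearest whole value.
// Note there is no CPU-clock parameter at all.
std::uint32_t uart_divisor(std::uint32_t periph_clock_hz, std::uint32_t baud,
                           std::uint32_t oversample) {
    assert(baud > 0 && oversample > 0);
    const std::uint64_t denom = static_cast<std::uint64_t>(baud) * oversample;
    return static_cast<std::uint32_t>((periph_clock_hz + denom / 2) / denom);
}

// Baud rate actually produced by a given divisor.
double actual_baud(std::uint32_t periph_clock_hz, std::uint32_t divisor,
                   std::uint32_t oversample) {
    assert(divisor > 0 && oversample > 0);
    return static_cast<double>(periph_clock_hz) /
           (static_cast<double>(divisor) * oversample);
}
```

For example, uart_divisor(24000000, 115200, 16) gives a divisor of 13, for an actual rate of about 115,385 baud (roughly 0.16% fast), whether the core is running at 600 MHz or something slower.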

For the ultimate in power savings, you can shut the board off by adding a pushbutton to the On/Off pin. Holding the button for more than five seconds disables the 3.3 V supply; a subsequent brief press turns it back on. This doesn’t affect the real-time clock (RTC), however: connecting a coin cell to the VBAT terminal will keep the time and date counter going.
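The On/Off rules above amount to a tiny state machine. Here is a hypothetical C++ model of just the stated behavior; it is not the board’s actual power-management logic, and it ignores debouncing and the RTC, which keeps running from VBAT either way.

```cpp
#include <cassert>

enum class PowerState { On, Off };

// Next supply state after a single button press of the given duration.
PowerState after_press(PowerState state, double press_seconds) {
    assert(press_seconds >= 0.0);
    const double kOffHoldSeconds = 5.0;  // "more than five seconds" turns it off
    if (state == PowerState::On && press_seconds > kOffHoldSeconds) {
        return PowerState::Off;
    }
    if (state == PowerState::Off) {
        return PowerState::On;  // a brief press is enough to turn it back on
    }
    return state;  // short presses while on change nothing
}
```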

Hands-On Benchmarks

To see how fast this thing really is, Paul ported the CoreMark embedded-processor benchmark to the Arduino environment. (CoreMark appears to be a registered trademark of the Embedded Microprocessor Benchmark Consortium, EEMBC.) This synthetic benchmark tests performance managing linked lists, doing matrix multiplies, and executing state machine code. He reports the following scores for a number of boards (larger numbers are better).

Higher is better
Board CoreMark
Teensy 4.0 2313.57
Teensy 3.6 440.72
Sparkfun ESP32 Thing 351.33
Teensy 3.5 265.50
Metro M4 Grand Central 214.85 (overclocked: 536.35)
Teensy 3.2 126.76 (overclocked: 218.26)
Arduino Due 94.95
Arduino Zero 56.86
Arduino Mega 7.03

I was able to verify the Teensy 4.0 and 3.2 numbers; my 3.6 must have sprouted legs and walked off somewhere, and I didn’t have any of the other boards handy for testing. Using my numbers (nearly identical to those above), the 4.0 is around eighteen times as fast as the 3.2.
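Computing the speedups directly from Paul’s CoreMark table makes the arithmetic explicit. This small helper uses a subset of the table; the type and function names are illustrative.

```cpp
#include <cassert>
#include <string>
#include <vector>

// CoreMark scores copied from the table above (higher is better).
struct Score {
    std::string board;
    double coremark;
};

const std::vector<Score> kScores = {
    {"Teensy 4.0", 2313.57},
    {"Teensy 3.6", 440.72},
    {"Teensy 3.2", 126.76},
};

// How many times faster board `a` is than board `b` on CoreMark.
double coremark_ratio(const std::vector<Score>& scores,
                      const std::string& a, const std::string& b) {
    double score_a = 0.0;
    double score_b = 0.0;
    for (const auto& s : scores) {
        if (s.board == a) score_a = s.coremark;
        if (s.board == b) score_b = s.coremark;
    }
    assert(score_a > 0.0 && score_b > 0.0);  // both boards must be in the list
    return score_a / score_b;
}
```

coremark_ratio(kScores, "Teensy 4.0", "Teensy 3.2") comes out just over 18, and the same comparison against the Teensy 3.6 gives about 5.2.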

Since the CoreMark code is a “synthetic” benchmark, Paul wanted to test the new board in a more realistic scenario. In another GitHub repo, he has some code to do an RSA signature with a 2048-bit key. This is a processor-intensive operation, believe me — I had to implement it once in Lua (don’t ask!). Here are the scores for the same boards (lower numbers are better).
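To get a feel for why RSA is so demanding: the expensive part of signing is modular exponentiation, computing m^d mod n with a 2048-bit modulus. The square-and-multiply sketch below is not Paul’s benchmark code; it is a toy restricted to moduli below 2^32 so that every product fits in 64 bits. A real 2048-bit implementation runs the same loop with multi-precision arithmetic, where each of roughly 2048 squarings is itself a 2048-bit multiply and reduction.

```cpp
#include <cassert>
#include <cstdint>

// Right-to-left square-and-multiply: computes base^exp mod mod.
std::uint64_t mod_pow(std::uint64_t base, std::uint64_t exp, std::uint64_t mod) {
    assert(mod > 0 && mod < (1ULL << 32));  // keeps every product below 2^64
    std::uint64_t result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) {
            result = (result * base) % mod;  // multiply step for a set exponent bit
        }
        base = (base * base) % mod;          // square step for every exponent bit
        exp >>= 1;
    }
    return result;
}
```

With the textbook toy key n = 3233, e = 17, d = 2753, mod_pow(65, 17, 3233) returns 2790, and mod_pow(2790, 2753, 3233) recovers 65.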

Lower is better
Board Seconds
Teensy 4.0 0.085
Teensy 3.6 0.474
Sparkfun ESP32 Thing 0.518
Metro M4 Grand Central 0.840
Teensy 3.5 0.909
Teensy 3.2 1.325
Arduino Due 1.901
Arduino Zero 9.638

Again, I was able to verify the numbers for the Teensy 3.2 and 4.0 boards. In this case, the 4.0 is around fifteen times as fast as the 3.2.

If you have any of these or other Arduino-compatible boards lying around, clone one or both of these repos, open the *.ino file from either one, and test them out. Feel free to report results in the comments below.

15 Seconds to Sanity

One of the new features of the Teensy 4.0 is the automatic recovery process, which restores the board to a known good state without the need for a PC connection. If you press and hold the reset button for 15 seconds, the red LED will flash to indicate you’ve entered restore mode. Once you release the button, the red LED stays lit while the flash memory is erased and rewritten with the traditional Arduino “blink” program. Once the rewrite is complete, the blink program runs and the orange LED begins blinking, just like on every Arduino-compatible board for the past decade and a half. It’s DFU mode without the need for a host computer or a known-working binary. Those used to be key ingredients of a hardware-based restore; now the board takes care of it on its own.

Why would you want to do this? In a nutshell, because USB itself is a train-wreck. On top of an insanely sprawling and complex protocol, there are charge-only cables sans data pins lurking in your junk box, operating system bugs waiting to trip you up (looking at you, Windows 7), and a whole host of other issues that cause serious head-scratching when things stop working. This can be especially confusing with native-USB boards like the Teensy 4.0; while the built-in USB functionality is amazingly powerful, and can be used in a wide variety of ways, when something stops working, you’re not always sure how to get back on track. Now, you are – just press the button.

What Can You Do with a 600 MHz Microcontroller?

Paul envisions this Teensy 4.0 being used for polyphonic audio synthesis, running moderately complex machine learning algorithms, and real-time audio analysis. In many cases, the first level of processing on data-intensive input devices can now be moved from a host computer to the external microcontroller, narrowing the bandwidth required to the host system. And for projects driving a display, the built-in pixel processing pipeline can also accelerate graphics operations, offloading this work from the CPU.
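As a concrete example of moving first-level processing onto the microcontroller, the Goertzel algorithm measures the energy in a single frequency bin, turning a block of N audio samples into one number. It is one common technique, chosen here for illustration; it is not taken from any specific Teensy library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Squared magnitude of DFT bin `bin` over one block of samples,
// computed with the Goertzel recurrence (no full FFT needed).
double goertzel_power(const std::vector<double>& samples, std::size_t bin) {
    const std::size_t n = samples.size();
    assert(n > 0 && bin < n);
    constexpr double kPi = 3.14159265358979323846;
    const double coeff =
        2.0 * std::cos(2.0 * kPi * static_cast<double>(bin) / static_cast<double>(n));
    double s_prev = 0.0;
    double s_prev2 = 0.0;
    for (double x : samples) {
        const double s = x + coeff * s_prev - s_prev2;
        s_prev2 = s_prev;
        s_prev = s;
    }
    return s_prev2 * s_prev2 + s_prev * s_prev - coeff * s_prev * s_prev2;
}
```

For a 64-sample block holding a unit-amplitude sine exactly on bin 8, goertzel_power returns N²/4 = 1024 for bin 8 and essentially zero for other integer bins, so 64 samples become a single value to send upstream.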

There will be some fraction of hackers that will still wonder why we need a 600 MHz microcontroller; another fraction will have already needed it yesterday. In between, most users will take some time to figure out what doors this opens up. The reality is that our tools constrain not only our current designs, but also, to some extent, our imagination. A 15x performance improvement over the current tiny development board you may be using could enable some new and exciting applications, and you, dear reader, are the one who makes them happen. So, drive home a different way from work tonight, sleep on the sofa instead of the bed, or use whatever other tricks you have to shock your brain into creativity and figure out what you could really do with this thing. It’s a lot more than you can do with a 555. For that matter, it’s a lot more than most computers could do in the 90s.

177 thoughts on “New Teensy 4.0 Blows Away Benchmarks, Implements Self-Recovery, Returns To Smaller Form”

      1. I don’t know about any benchmarks, but I’d honestly grab the Teensy over the Spresense any day for the fact that Paul has built, tested and supports the libraries for the Teensy, while the Spresense seems to be a project that Sony kind of just threw together and onto the market without much thought.

        Yeah, 6 cores and a number of features sound impressive, but that’s an expensive hobbyist board which is difficult to use in practice. In order to create a commercial product you’d have to figure out how to develop the hardware for it all on your own, while Paul from PJRC can provide you with a lot of the stuff already done, in addition to the amazing forum support (and often personal responses to questions!)

          1. Also it has branch prediction so it’s susceptible to at least one variant of Meltdown / Spectre; hopefully the silicon is new enough that it’s been fixed.

        1. It looks like that microcontroller has two Ethernet controllers (although I don’t know if they’re broken out on the Teensy). It’s fast enough that you can probably bit bang multiple 10Mbps Ethernets using the multichannel DMA too.

    1. Well for one thing, I’ll bet you can finally drive a reasonably-sized display at 60Hz without spending a lot of time optimizing. Ditto for reading off of cameras.

      And 100mA at 600MHz…so like, a third of a watt for some stuff that you might have previously used a Pi Zero for? I can think of an idea or two, like replacing the guts of old broken handhelds.

      1. 100mA at 600MHz is meaningless for most usage scenarios, especially the ones you’ve suggested, since those are real upper-end figures where absolutely everything is on and running at full speed and the CPU core is working its ass off. It has a graphics (pixel) and 2D video processing engine just for camera reading and display output, so it will probably chew through those tasks without breaking a sweat. The bonus is that dynamic clocking means 600MHz is the upper end of where it can peak; most of the time it will probably not run that fast unless given a really beefy workload, which won’t be graphics processing, since that should be handled by the sorta-GPU side of the chip.

        That particular MCU, the NXP iMXRT1062, has rocked the boat already. There are demo boards with super stupid slick video displays out there. If you Google “an12245 pdf” you’ll find NXP’s power consumption report for a demo board that uses the same MCU. When everything is on, all peripherals are active, and that baby is running at 600MHz (the report doesn’t say what it’s doing exactly during these measurements), the measurement at the DC-DC power supply shows a current of 87.68mA.

      2. This is exactly what I will use it for but firstly, I am using those RGB HUB08 displays.
        I don’t have a 4.0 yet but with a look into the files, the I/O is kinda limited.
        Given I will only be using the pins with holes (not the pads on the back), the widest port is port6 with 6 bits and 4 bits contiguous.
        That sucks because Teensy 3.6 has 2 ports with 8+ bits contiguous (good for fast 16-bit output)
        Luckily my displays use a 6-bit data bus.
        There is accommodation for a MicroSD card slot, it’s native-SDIO. The extra RAM and included library will be great to minimize any streaming lag.
        Maybe the extra CPU can be put to use with decompression.
        The price is great, though, maybe offset through additional hardware.
        I think a 4.1 will be released soon, let’s see.
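To illustrate why contiguous port bits matter for this kind of display: when a 6-bit data bus sits on six adjacent bits of one GPIO port, a whole word can be written with one masked read-modify-write instead of six separate pin writes. The port register is simulated here, and the bit offset is a hypothetical placeholder, not an actual Teensy 4.0 pin mapping.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kShift = 16;                   // hypothetical first bit of the field
constexpr std::uint32_t kMask = 0x3Fu << kShift;  // six contiguous bits

// Returns the new port value with the 6-bit bus field replaced by `data6`,
// leaving every other bit of the port untouched.
std::uint32_t write_bus(std::uint32_t port_value, std::uint8_t data6) {
    assert(data6 < 64);  // only six bits fit on the bus
    return (port_value & ~kMask) | (static_cast<std::uint32_t>(data6) << kShift);
}
```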

    1. The customer support PJRC and the user community at the PJRC forum offer is fantastic. Makes the value much more than the specs alone would indicate. I’ve been a happy customer since the Teensy 3.0.

  1. The small footprint yells battery driven gadget, 100 mA not so much. So underclocking would be as interesting as overclocking. Don’t get me wrong, saving space in a design is always desirable and if current is key, just take an older one, but I still like to know.

    1. Modern MCUs (just like modern CPUs) aren’t designed with a fixed CPU clocking scheme. What you call “underclocking” is actually just “using the device as intended and well-documented in the datasheet and application guides”, i.e. using the flexible clocking architecture to run at reduced speeds when feasible.

      Generally, it’s often wiser to run fast for as shortly as possible, and then completely go to sleep, and only wake up 1ms later (that’s six hundred thousand clock cycles at 600 MHz) on a timer interrupt to do the next chunk of work.
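The “run fast, then sleep” argument in this thread comes down to duty-cycle arithmetic. Here is a minimal sketch of that arithmetic; it deliberately ignores the frequency/current nonlinearity and leakage effects debated in the replies, and any sleep current you plug in is an assumption rather than a measured Teensy 4.0 figure.

```cpp
#include <cassert>

// Average current for a periodic "active, then sleep" cycle:
//   I_avg = I_active * (t_active / T) + I_sleep * (1 - t_active / T)
double average_current_mA(double active_mA, double sleep_mA,
                          double active_s, double period_s) {
    assert(period_s > 0.0 && active_s >= 0.0 && active_s <= period_s);
    const double duty = active_s / period_s;
    return active_mA * duty + sleep_mA * (1.0 - duty);
}

// Time needed to execute a number of clock cycles at a given clock rate.
double seconds_for_cycles(double cycles, double clock_hz) {
    assert(clock_hz > 0.0);
    return cycles / clock_hz;
}
```

Using the article’s 100 mA active figure, a hypothetical 1 mA sleep current, and 60,000 cycles of work every 1 ms (0.1 ms at 600 MHz), average_current_mA(100.0, 1.0, 1e-4, 1e-3) comes out at 10.9 mA.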

      1. “it’s often wiser to run fast for as shortly as possible, and then completely go to sleep,”
        – Not true. The frequency/current curve is nowhere near linear. It is more like exponential.

        1. But if you only need a few computations to react to rare IO changes, this can be fantastically more efficient. You could also run at lower clocks when not sleeping but the sleep trick, in certain cases, can get you 95% of the power savings you might hope to get.

        2. When the cpu wakes it’s not just the cores that go online but also everything else like IO and other miscellaneous functions. So it could indeed be better to run fast a short while than slow a long while.

        3. Well AFAIK in CMOS techs current versus frequency should be linear if the supply is kept constant. It will not be linear in newest processors that also scale supply voltage to save power on lower frequencies, or boost it to enable higher frequencies.

        4. Very late reply, but what you say about nonlinearity is only true for dynamic power, and even then only if you adjust voltage to the minimum for the clock speed (low-end uControllers typically don’t). Static power (i.e. leakage) is a significant percentage of total dissipation, and is proportional to the area of energized silicon. Going to sleep can allow you to power-gate (de-energize) significant portions of the chip.

          The poster you replied to was basically right: In modern designs fabricated on modern processes, “hurry up and sleep” is often a valid strategy.

          There are a lot of variables here, though. As an example, leakage is much worse for “fast” gates than “slow” ones, so if you have additional cores that are optimized for low speeds to begin with (i.e. that use mostly slow gates), then the tradeoffs change again.

      2. The answer depends on what you are using to power a microcontroller and the amount of computation needed. A battery has internal resistance and as such its efficiency and discharge capacity drops off rapidly when an excessively high current load is used.

  2. Looks like an awesome board. But Arduino? Is that the flagship way they are promoting it?
    That’s like handing a race car to a 9 year old. Arduino is the worst IDE out there; why it’s the “go to” for most people many of us will never understand. Get with the program, man, get a real IDE and RTOS environment going already.

    1. > why it’s the “go to” for most people many of us will never understand.

      Simplicity. It’s not for people who know what a programmer is, or why an RTOS is useful. It’s for people without computer skills, without an electronics background, who will watch some YouTube videos and start hooking LEDs and switches to a board they got from China.

      It’s not for you, and not for me. But for my kids, your kids. The 9 year olds driving a 0.4HP electric car daddy made in the basement. Not for NASCAR drivers…

      1. Exactly. Even for those who do know what a programmer is, what real debugging is, assembly code, etc., it’s simple, and depending on your project(s), easy can outweigh powerful. With Arduino, you don’t need tons of configuration/setup to make the environment work for any given chip/board, let alone config specifics I’m not going to remember after not using the dev environment for 9 months.
        – If you’re a professional, sure, you’re probably using something else. But it really isn’t ‘that bad’ for even relatively heavy recreational use, especially given the board support, libraries, and console ease of use.

          1. Ok, I know this not a Teensy story, but here goes. Once played with writing an explosion with pixels flying on Apple II in Lo Res mode (don’t ask, I was 14). Call -151 and hacked up the thing in straight hex opcodes. And lo and behold, it worked! Problem is, it did not do bounds checking, the asm was in memory right next to the LoRes graphics RAM and some pixels flew out. My program was killed by shrapnel from the explosion it caused. Never had saved it either.

    2. I appreciate that it handles a lot of the setup issues of flashing microcontrollers. Load up the plugin for the given board, and then press one button to compile and upload everything. But I rarely do development in it directly.

    3. It’s the “go-to” for those who don’t know better, to be frank.

      Ie, any newly starting out hobbyist/student.

      They can cut their teeth there, and then when they know enough to look under the hood, they will be in a better position to understand things like RTOS and bare-metal, than if they were just dropped in the deep end.

      Also, I mean, cost is a factor. If you want a hardware-agnostic (ie, non-vendor locked IDE), you’re paying a pretty penny, or rolling with Eclipse.

      If you’ve not played with arduino to the point of growing bored of it, you’re not setting up a RTOS env in eclipse any time soon…

      1. I started with PICs in ASM then C, these days I only use Arduino for my projects because I can move SO much faster. I can’t say I miss spending hours/days/weeks just doing basic shit because you have to spend so much time going through the several hundred page manuals to figure out the quirks of the particular chip you’re using. Which bits do I have to flip on this thing to enable X, will that mess up Y, etc etc? Plus the errata once you’ve spent a full day trying to figure out why the hell X isn’t working as it should because you forgot to do that first. Ugh.

        1. +1 million. I use Eclipse for hours a day for non-micro programming, but still haven’t ever set it up for micros, even though I’ve been working with various ones in at least some capacity for 20+ years. Way too much involved for the benefit provided in relatively small projects, and it’s so easy in the Arduino interface to try out new micros for a one-off project that I’m happy to put up with the editor quirks and shortcomings.
          – A Ferrari is nice, but if you only need to go a block to the grocery store once a month, it’s not (realistically) worth the upkeep (sorry, bad example, I’d be happy to have a Ferrari to take to the grocery store :-) ). I wouldn’t say Arduino is only used by those who don’t know better, it’s also used by those who do know better, but are ok with the tradeoffs for prebuilt chip configs, library/board manager, etc. Heck, I was interested in trying OTA programming the other day, and it was literally a 10 minute undertaking to have up and running with an existing project in Arduino IDE. You’ve got examples and libraries at your fingertips with no fuss.
          – Maybe someday I’ll have a reason to bite the bullet and set a chip/workflow up in Eclipse, but I haven’t yet. Even getting into ISRs to handle multi-channel ‘weird’ communication protocols like Wiegand wasn’t that bad.

    4. For large and complex projects, I prefer to work directly with hardware, and usually an RTOS. But you know, sometimes you get this idea for something simple… and a Teensy and a handful of Arduino really cuts the time from idea to delivery to shreds. I’ve thrown projects out for a friend’s art installations in very few hours where a “proper”* implementation would have taken days.
      Yes, the IDE really, really sucks. And for the sake of newbies I really wish they would improve it considerably. I’m not talking Eclipse or anything, but more modern code editor idioms would be handy. I suppose it gets the job done. Kinda.
      * if it works well enough for the job it’s “proper”.

      1. Seconding time to implementation. With Arduino I can throw something in a few dozen lines of code and have a proof of concept up immediately. Especially when working with a new microcontroller I can often have something pretending to be working through Arduino before I can get code running on the new guy.

    5. The real IDE for Arduino is called VisualMicro. It’s basically a plugin for Microsoft Visual Studio (or Atmel Studio, which is the same but free and includes Visual Assist by Wholetomato) to use Arduino sketches and libraries.

      Without compatibility with Arduino libraries, the Teensy ecosystem would be nowhere. Yes, everyone agrees with you that Arduino’s setup() and loop() and stupid preprocessing is silly for advanced programming. But the giant number of hardware support libraries make it very tempting, compared to reinventing the wheel. If you want to write your (e.g.) I2S audio support library from scratch, be my guest. But I have a funny feeling it’s going to take less time to just deal with Arduino and have that audio library up and running in minutes.

      Meanwhile, I wouldn’t be surprised if popular RTOSes have already been ported to the microcontroller on this board. If not, someone might be working on it. Maybe you could do that.

      Instead of the kid in the race car, you could say thanks to Arduino compatibility, this board is like a sumo wrestler with a food truck.

      1. I second your recommendation for Visual Micro, I remember a few years ago when I used it with an esp8266 for a larger project, it was a little bit of a pain to get all the libraries and header files and things synced up with all the visual studio nice features (likely it has gotten even better though, the esp8266 has always felt a little janky in Arduino but is always slowly improving), but it was very good after a few hours of tinkering with headers and things like that from what I remember. (I was also fighting with a lot of different obscure libraries that were not maintained much and not really designed with the 8266 or Visual Studio in mind so that was part of the headache). It was a project that had quite a few different files, and the cool Visual Studio features made things a lot easier to deal with for the larger project.
        Visual Micro sure has my endorsement, it helped me a lot because of its wide array of real-time debugging helper features that are impressive to see working with Arduino code. It is a very cool plug-in that helped save my sanity during any large complicated Arduino projects.

    6. Arduino Schmino. I agree. It’s one horrible IDE.

      Why is there still no JTAG/SWD?

      Teensy at least had the pads on the bottom of the board, and although you had to remove components to get it to work, at least they were there. I want to use my J-LINK to program, with a real IDE.

      1. For development you can buy a board with the processor (also other variants of the i.MX RT) and all pins available, the USBs and Ethernet connected, and JTAG or an on-board debugger. It costs $99, and an additional TFT display to connect costs $29: https://www.nxp.com/design/development-boards/i.mx-evaluation-and-development-boards/mimxrt1060-evk-i.mx-rt1060-evaluation-kit:MIMXRT1060-EVK Once the firmware development is completed (done with NXP’s free MCUXpresso tool chain or a commercial one), the hex can subsequently be loaded to the Teensy 4 (which has no JTAG debugging support), where it no longer needs to be debugged.
        The Teensy 4 is mainly for running developed code on when you want to build a few devices; a proper development board is more useful for actually developing.
        The price of the Teensy 4.0 looks very fair, but for higher quantities a custom board is needed, since you are paying about $7 for the loader chip that is only used to load the code; pads for production programming cost nothing and so are more economical.

        1. I ask because it was partially implemented on the Teensy 3.6 and, well figuring this was a Teensy 4.0, you would think it would have been fixed. I’ll have to look into the NXP development board. Thanks for the info!

          1. I’m keen to get a dev kit in the near future.
            I understand the board supports a WXGA LCD interface, for which I have a few lying around in old laptops. Shouldn’t be too onerous to take the Intel guts out of a suitable laptop or even tablet, substituting the dev kit, even patching in a PS/2 or serial keyboard rather than using up parallel ports, for effort efficiency. Heck, I could even jam the dev kit into an earlier 386 notebook in place of the thicker floppy ;-)

      1. I also recommend PlatformIO.

        It’s a unique tool unlike any other.
        It’s also the most flexible thing I’ve ever encountered in the microcontroller world.
        It can work with “arduino”, “mbed” and several other platforms.

        It also comes pre-configured with a few decent IDE’s (as opposed to the java arduino crap which is not even a decent IDE).

        PlatformIO also has an amazing number of searchable & installable software libraries for peripherals.
        And yet, the core is very simple: a handful of Python scripts and a configuration file which is a replacement for makefiles. Because of this CLI interface it can be easily integrated into any IDE you prefer.

    7. You have to use the proper tool for the job. I use both Arduino and Crossworks and I would never switch to only Arduino or only Crossworks. Arduino is for the quick throw-a-library-and-two-switches-and-a-128x64-OLED-together fun weekend thing. Crossworks is for my 9 to 5 where I’m using a custom-written OS, state machines, drivers, frontends, crazy code optimisations, and have to ensure solid reliability. And yes, the Arduino IDE is pretty bad: slow compile times, no debugging options, huge compiled code… but it’s quick and easy enough for one-man operations. It lets you put your ideas from your head to real life in no time.

    8. I actually think the Arduino IDE is quite usable for many applications, and it is quite simple to install and use. The only thing I really hate is the slow compilation time. I tried platformio but gave up because it seemed overly complicated for what I wanted to do (but maybe very flexible for others who are more knowledgeable).

      Do you have any specific examples of situations where using another IDE or language/compiler would give significantly better performance, or make development easier?

  3. The issue with floating point in real-time systems isn’t usually the speed; it’s the fact that on many platforms, including ARM, unlike more general-purpose registers, FPU registers aren’t saved upon entering an interrupt service routine.

    And with the ability to preempt an ISR with a higher-priority one, this can get very confusing and dangerous very fast.

    So, you usually **mustn’t** use the FPU when in an ISR, which is what many people (not me) mean when they say “real-time processing”.

    1. If you don’t enable FPU register saving (which is absolutely a feature on ARM M4), and don’t do it manually, and then try to do floating point math, you deserve what you get…

      1. You’re right with that! But don’t underestimate the fact that saving the FPU registers doesn’t come for free (instantaneous), and hence there’s good reasons to avoid it.
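One way to sidestep the ISR/FPU question entirely (a swapped-in technique, not something proposed in this thread) is fixed-point arithmetic, which uses only integer registers. Here is a minimal Q15 multiply, where values in [-1, 1) are stored as int16_t scaled by 2^15; it assumes an arithmetic right shift for negative numbers, which GCC and Clang provide.

```cpp
#include <cstdint>

// Q15 x Q15 -> Q15. The 32-bit intermediate product is in Q30 format;
// shifting right by 15 brings it back to Q15.
constexpr std::int16_t q15_mul(std::int16_t a, std::int16_t b) {
    std::int32_t product = static_cast<std::int32_t>(a) * b;  // Q30
    product >>= 15;                                          // back to Q15
    if (product > INT16_MAX) {
        product = INT16_MAX;  // only -1 * -1 overflows; saturate to just under 1
    }
    return static_cast<std::int16_t>(product);
}
```

So q15_mul(16384, 16384), i.e. 0.5 times 0.5, returns 8192 (0.25), and the single overflow case, -1 times -1, saturates to 32767.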

  4. Have almost 100 fielded Teensy-based systems (3.0, 3.2, and a few 3.5), mostly in agricultural control and monitoring stuff.

    Personally, plan to use the 4.0 to immediately start a design that updates my 3.2-based hearing aid, and become drunk with the power of massive ‘instantaneous’ audio transforms. But professionally, intend to start my use of the 4.0 with small, measured steps. Reliability, EMC, and power profiles need to be measured.

    1. Having worked in engineering for a hearing aid company, there are vastly better chips for hearing aids. I suspect the power consumption on this bad boy will be a tall mountain to climb.

        1. Thanks Brian,
          Ah ha – I see what’s happened, Arm has licensed the core to NXP for which the developer arm website has their general core details which was my link in above comment. NXP has then implemented their specifics which has general summary for the Teensy 4 in your link for the 1060 chip implementation.

          I write general as it’s only 111 pages; the full reference seems to be the slightly more comprehensive one at 3355 pages, with no login required, at the NXP website, though it covers the slightly less endowed 1050 chip
          https://www.nxp.com/docs/en/reference-manual/IMXRT1050RM.pdf

          Full details on the beta Teensy 4 are at this link
          https://forum.pjrc.com/threads/54711-Teensy-4-0-First-Beta-Test

          which has all the relevant technical details, as well as the comparison between the two NXP variants, 1050 vs 1060.

          Hmm, I can also see a high-speed ADC being of key interest. I’d need at least 8 of them, sampling in parallel and in sync, with 1-10 MHz or so of bandwidth each. For a proof-of-concept unit, maybe I can just mux ’em and correct for offsets afterwards :-)

          Looks like I will be ordering at least a couple fairly soon. Is there a dealer in Australia? Cheers

          1. That’s standard for every ARM chip, both for microcontrollers and CPUs. ARM only licenses cores; it’s the manufacturers who add their own peripherals to create a full microcontroller design. So you use the ARM docs to understand the core, and the manufacturer docs to understand the peripherals and the hardware interfaces.

          2. Of course, pelrun, in the general case, given Arm’s business model as it has developed; but bear in mind there is nothing that says it must necessarily always be the case. Nothing precludes Arm from sampling new core designs, or even simple amendments, through third parties such as an appropriate fab house at any time. The way the article is written, without naming NXP as the supplier, suggests the chip could come from Arm in the early stages of release, as they have done before; hence the value in clarifying the issue and being careful not to assume otherwise, especially in a tight commercial environment.

            The Teensy 4 comes across as a great teaching tool and development platform for quite a few tight designs built mostly around the chip in specific embedded environments. I can see I’m going to be busy this weekend looking closely at the reference manuals from both Arm and NXP and pinning down the key differences and details, e.g. 1060 vs 1062 etc…

          3. FWIW, if you are going down the path of a high-end consumer or industrial product beyond the Teensy 4.0, albeit at very low cost, there is an evaluation board for the 1060 CPU at a little more (the price of a fair lunch with a secretary) for serious appraisal, such as exploring more sophisticated embedded end-user environments for the chip, and with an LCD too!

            I still like the Teensy 4 for education and training, but this is more to my liking so far, once I check out the technical details with excruciating attention, including the between-the-lines stuff…

            https://www.nxp.com/design/development-boards/i.mx-evaluation-and-development-boards/mimxrt1060-evk-i.mx-rt1060-evaluation-kit:MIMXRT1060-EVK

        2. The 3637-page reference manual (you need to be logged into the NXP site, then search for IMXRT1060RM) has a block diagram of the ADC on page 3456, along with a detailed explanation of all its modes and its many configuration options.

          I assumed that the sample frequency could be the same as the ADC clock (as can be done on the NXP LPC4370 using an external SI5351C clock, up to 80 MSPS, but with no way to transfer the data off chip or enough processing power to process it on chip in real time). That is not the case for this NXP chip: the maximum sample rate is roughly 30x to 40x lower than the ADC clock frequency, depending on the options enabled.
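
          A back-of-envelope check of what that 30x to 40x ratio means, as a short Python sketch. The 40 MHz ADC clock is a hypothetical placeholder rather than a figure from the reference manual, and the Nyquist check uses the 1-10 MHz per-channel bandwidth wished for earlier in the thread.

```python
def max_sample_rate_range(adc_clock_hz, min_ratio=30, max_ratio=40):
    """(low, high) maximum sample rate, assuming the sample rate is roughly
    30x to 40x lower than the ADC clock, as quoted above."""
    return adc_clock_hz / max_ratio, adc_clock_hz / min_ratio


def nyquist_rate(bandwidth_hz):
    """Sample rate that a signal of this bandwidth must exceed (Nyquist)."""
    return 2 * bandwidth_hz


# HYPOTHETICAL ADC clock, for illustration only.
low, high = max_sample_rate_range(40e6)
print(f"max sample rate: {low / 1e6:.2f} to {high / 1e6:.2f} MSPS")

for bw in (1e6, 10e6):
    need = nyquist_rate(bw)
    # Conservative: compare against the low end of the range.
    print(f"{bw / 1e6:.0f} MHz needs > {need / 1e6:.0f} MSPS, fits: {low > need}")
```

          With that assumed clock, neither end of the 1-10 MHz range fits; substituting the real maximum ADC clock from the manual redoes the check.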

          1. Hmm, getting some ADC bandwidth off chip could still be done ‘reasonably quickly’. I see the LPC4370 has DMA, presumably to RAM, for some minimal processing and then out to LAN. If I can find a variant with DSP and a few counters too, then LAN might be good enough for what I can demo, i.e. quicker to develop than the FPGA demo etc…

      1. 1) You’re probably aware of this, but strictly speaking, Python != Micropython, so there is a world of difference between Python on Linux and Micropython on bare metal.

        2) Can you see all the arguments above re: the benefits of using Arduino vs lower-level programming? Keep in mind that the same argument can be made about Micropython vs Arduino: fast & pleasurable development; vastly increased readability of code promoting collaboration; diversification of the user base.

        3) If Micropython is fast enough for a lightsaber implementation with real-time dynamic sound and NeoPixel effects driven by an MCU, it should be fast enough for many other applications.

        4) Not saying you’re wrong to use what you use. Just that platform wars are not super useful.

  5. Very cool! I have used the 3.1 for polyphonic audio playback (https://hackaday.io/project/6881-drum-master), but that topped out at about 11 simultaneous sounds (14 if I overclocked the Teensy). I suspect that limit would be essentially gone with this new board.

    As indicated by others here, [Paul] is also very active and helpful on his forums. Looking forward to trying this chip out in a new project!

    1. The Tsunami board also uses a Cortex-M7 chip and supports polyphony of up to 32 mono or 18 stereo simultaneous uncompressed 44.1 kHz, 16-bit tracks. It doesn’t handle MP3 at all, as the creator claims it is a patent issue, but that patent expired a few years ago.

      It’s also $80 or so retail price though versus $20.

      Would it support considerably more if it were playing MP3 rather than uncompressed WAV files? Unless the CPU cycles spent decoding, rather than bus speed, were the bottleneck.
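
      Rough numbers for the bus-speed side of that question, as a Python sketch: the raw data rate of the quoted track counts, and how one uncompressed stereo track compares with a compressed one. The 128 kbit/s MP3 bitrate is an assumption for illustration only.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Raw (uncompressed) PCM data rate for one track, in bytes per second."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


mono = pcm_bytes_per_second(44_100, 16, 1)    # one mono track
stereo = pcm_bytes_per_second(44_100, 16, 2)  # one stereo track

print("32 mono tracks:  ", 32 * mono, "bytes/s")
print("18 stereo tracks:", 18 * stereo, "bytes/s")

# ASSUMPTION: a typical 128 kbit/s stereo MP3, for comparison only.
mp3_stereo = 128_000 // 8
print("PCM/MP3 ratio per stereo track:", stereo / mp3_stereo)
```

      That is roughly 2.8 to 3.2 MB/s of raw PCM, and about 11 times less data per track at the assumed MP3 bitrate, so compressed files would relieve the bus but shift the load onto decoding.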

  6. Is hardware reset gone entirely? Teensy 3.0 had a reset pin, and in 3.1 and 3.2 this was moved to a tiny (and very fragile) dot on the underside of the board (which I thought was a poor design decision). With the 4.0, the 3.0 reset pin location is occupied by an on/off switch connection, and I don’t see any connection labeled reset. Is turning the chip off the same as resetting it, or is it like going to sleep?

  7. And 5 times more expensive than the 3.6?
    You might as well go with the Pi Zero and bare metal.

    I’m currently building a tool chain for the W600 without WiFi.
    That is a $1 MCU, although only an 80 MHz ARM Cortex-M3, with 288 KB SRAM and 1 MB flash.

    You just can’t pay $30+ for a board that you’re supposed to leave in your application.

    1. I can’t speak for codec overhead, but 240p30 would be pushing 2,304,000 pixels per second uncompressed (and far fewer if only changed regions are encoded, as the inter-coded P- and B-frames of MPEG-4 video do).

      Roughly, at 600 MHz, that leaves you about 260 CPU cycles per pixel before compression. That should definitely be enough.
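
      The arithmetic above as a small Python sketch. It assumes a 320x240 frame (the size implied by 2,304,000 pixels per second at 30 fps) and treats every CPU cycle as available for video work, which real firmware won’t manage.

```python
def pixels_per_second(width, height, fps):
    """Uncompressed pixel rate for a video stream."""
    return width * height * fps


def cycles_per_pixel(cpu_hz, width, height, fps):
    """CPU cycles available per pixel, ignoring all other work."""
    return cpu_hz / pixels_per_second(width, height, fps)


# ASSUMPTION: "240p" taken as 320x240, which is what 2,304,000 / 30 implies.
print(pixels_per_second(320, 240, 30))                  # 2304000
print(round(cycles_per_pixel(600e6, 320, 240, 30), 1))  # 260.4
```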

  8. >which restores the board to a known good state without the need for a PC connection.
    Except the known state is the traditional Arduino “Blink” program. So unless your application is Blink, you’ll still need a PC connection to restore the code you intended.

      1. At least to some extent. :-)

        I know of a board where a short circuit of the PSU (12 or 24 V, capable of some tens of amps) through the GND lead of an FTDI cable fried the external core-voltage switch-mode regulator. It then delivered 2.6 V instead of 1.2 V. As a test I replaced it with an LM317, and the CPU started its LED blink from the test program again.
        The UART was still dead.

        1. Yeah, I had this problem with a Blue Pill. I had a USB cable which somehow managed to fry something internal in the STM32: the processor worked and could be programmed and debugged, but it drew 200 mA and was seriously hot. That cable somehow fried two MCUs on two totally different boards (the Blue Pill and my own design), and one didn’t even have its data lines connected. I still don’t know how that’s possible.

    1. Depending on the circuitry it is wired to, having any pin default to a blinking output could be pretty dangerous. Imagine you’d wired this pin through a switch (infinite ohms if open, 0 ohms if closed) to GND. Then, whenever it blinks high with the switch closed, you’d have an excessively large current draw from the pin to ground. It might be wiser to default to a sketch which does nothing at all.

        1. Hmm, nice one. Could maybe use this as a cheapie, low-parts-count DAC, e.g. into a resistor network and one of those SOT-23 5-pin op-amps. If this were also available on smaller, lower-end chips, it would open up a couple of product ideas.

          Does anyone know offhand the MOQ and FOB price per unit for this particular 1062 variant?

          1. The programmable output strength is usually done by selectively wiring MOSFETs in parallel in the output driver. It’ll be affected by the usual voltage, temperature and process variations. Your crude DAC would also cause some heating effects in the transistor.

            You are much better off using a binary-weighted resistor DAC. All you need is an additional resistor network and 2 extra I/O pins, e.g. a bussed resistor network with 7 resistors connected to a common pin (available in through-hole SIP or SMT). Use one resistor alone, wire 2 resistors in parallel, and another 4 in parallel to get resistances in the ratio 1 : 1/2 : 1/4, which gives your 4:2:1 weighting.

            https://www.allaboutcircuits.com/textbook/digital/chpt-13/r-2nr-dac/

            To make a better DAC, you want larger external resistors to minimize the effects of internal driver resistance.
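
            A Python sketch of that binary-weighted network: it computes the output of the 1, 2 and 4-resistor branches into an unloaded summing node (Millman’s theorem). The 3.3 V supply, the unit resistances and the 25 ohm pin-driver resistance are hypothetical values, and the driver is assumed to have the same resistance sourcing and sinking. The loop at the end shows why larger external resistors shrink the error from the internal driver resistance.

```python
def dac_output(code, vdd=3.3, r_unit=10e3, r_drv=0.0):
    """Unloaded output voltage of a 3-bit binary-weighted resistor DAC.

    Bit i uses 2**i unit resistors in parallel (1, 2, 4), so the branch
    resistances are R, R/2, R/4. r_drv is a HYPOTHETICAL per-pin driver
    resistance in series with each branch (same for high and low).
    """
    num = 0.0  # sum of G_i * V_i
    den = 0.0  # sum of G_i
    for bit in range(3):
        g = 1.0 / (r_unit / 2 ** bit + r_drv)
        v_pin = vdd if (code >> bit) & 1 else 0.0
        num += g * v_pin
        den += g
    return num / den  # Millman's theorem for an unloaded summing node


def worst_error(r_unit, r_drv, vdd=3.3):
    """Largest deviation from the ideal vdd * code / 7 over all 8 codes."""
    return max(abs(dac_output(c, vdd, r_unit, r_drv) - vdd * c / 7)
               for c in range(8))


# HYPOTHETICAL 25-ohm driver resistance, for illustration only.
for r in (1e3, 10e3, 100e3):
    print(f"R = {r:>8.0f} ohm: worst error {worst_error(r, 25.0) * 1e3:.3f} mV")
```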

          2. I think you misunderstood my post, tekkieneet. Or are you concerned about heating a CPU output port that is designed to take it?

            If I understand the CPU correctly, some pins can have programmable ‘strength’, which I interpret to mean programmable drive current into some load range, presumably designed so as not to heat the output drivers out of spec when loaded. If that is so, you can feed those outputs into (your suitable binary ladder) resistor network, with the voltage amplified/filtered by an op-amp. You then get a voltage output, with the load capacity of the op-amp, over some range according to the bit pattern sent to the relevant pins. It’s only comparatively crude relative to an ideal DAC, since summing gives, I think, a maximum of 7 bits, but hey, it’s quick, cheap and uses little board space, and thus suits some low-end DAC uses such as a CPU-controlled voltage or current source power supply.

            What you do with that voltage is up to you, e.g. drive a series-pass BJT, or a p-channel MOSFET if you prefer, for a variable voltage supply of whatever current rating you desire and can design. Heatsinking the series-pass transistor for the current you want is part of the design constraints and at your discretion.

          3. – “summing I think max of 7 bits”: 7 levels of I/O drive =/= 7 bits. It is only 3 bits.

            > voltage amplified
            Actually, you are better off treating the I/O pin as a current-mode DAC and using the op-amp as a transimpedance amplifier. Your I/O pin won’t be too happy to see mid-rail voltages. Holding it at GND (i.e. with the op-amp referenced to GND) makes sure that it stays there.

            Remember that your idea of a DAC depends on MOSFET resistance. Self-heating due to loading effects, no matter how minor, would adversely affect the DAC linearity. By using large-value external series resistors, that effect is minimized.

            Let’s say I know exactly what you are trying to do. You obviously haven’t spent more than 5 minutes thinking about it.
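
            A minimal sketch of the current-mode arrangement suggested above, assuming an ideal op-amp and reusing a hypothetical R, R/2, R/4 weighting. Because the summing node sits at virtual ground, each pin only ever sees its own rail at one end of its resistor and 0 V at the other. The component values are hypothetical, and the inverted output would need a negative rail or an offset reference in a single-supply design.

```python
def tia_dac_output(code, vdd=3.3, r_unit=100e3, r_f=10e3):
    """Output of an ideal inverting transimpedance stage summing pin currents.

    The op-amp holds its inverting input at 0 V (virtual ground), so each pin
    driven high sources vdd / R_branch, and a pin driven low sources nothing.
    Branch resistances are R, R/2, R/4 (HYPOTHETICAL values).
    """
    i_total = 0.0
    for bit in range(3):
        if (code >> bit) & 1:
            i_total += vdd / (r_unit / 2 ** bit)
    return -r_f * i_total  # ideal TIA: Vout = -I * Rf


for code in (1, 4, 7):
    print(code, round(tia_dac_output(code), 3))
```

            The output is linear in the code (Vout = -Rf x VDD x code / R) as long as the op-amp stays in its linear range, which is the point of keeping the pins away from mid-rail voltages.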

          4. Ah, just saw your other post, tekkieneet. Lol at ‘exactly what you are trying to do’; didn’t I make it clear…
            I was referring to 7 port lines for the 7 bits, which of course means you can get greater finesse with drive options for each port line as well if you want…
            Re your idea of ‘not happy’: are you suggesting the port line has some feedback issue which could get in the way? That could even be most useful, depending on its bandwidth :-)
            I never suggested attaching resistors so as to run it out of spec or even get close to self-heating; that would obviously be basic ‘bad engineering’ practice, so it’s sensible to use the highest-value resistor network suitable to avoid any tangible self-heating across the range of drive currents from the few selected pins. That’s why I wrote 7: I read there are only 7 port lines that offer 3-bit binary control of drive strength. Correct me if I am wrong, np; the manual I read yesterday was 3000+ pages, must have another go at it!
            Lol, I’ve been doing this for decades and don’t need more than 2 mins to collate engineering experience going back to the late 1970s. I’ve done it often, and sure, you can use current mode, np, but hey, see my post about feedback and your worry about PVT.

            I don’t see why you care or need to tangentially insult me, e.g. the “5 mins” issue, ugh. It’s dead simple stuff: you misread my post and likely didn’t know or recognise that there are (if I am correct) only 7 pins that allow drive strength selection.

            Of course I would be interested in feedback on the CPU re pin drive strength, i.e. your point about ‘not happy’. Care to check that?

          5. PVT (process, voltage and temperature) variation means that your DAC is calibrated as a one-off hack that can’t be replicated without spending time calibrating it against another uC, supply or temperature.

            You might also want to spend time understanding my post.

          6. Well sure, tekkieneet, that’s the same issue with any low-level DAC, whether labelled a hack or not; it all depends, and that’s not the point. The CPU offers interesting potential by being able to set the drive strength per pin on selected GPIO lines. If you wanted to correct for temperature variation, it’s not hard via the filtered DAC output and/or the subsequent path to the device, or from the device’s output, i.e. feedback, e.g. on one pin to the CPU’s ADC :-)
            I’m in no way pushing for this other than as a low-level, comparatively tight option, although it’s most interesting that you can combine the per-pin selection across more than one pin at the same time (if I read correctly) as part of your mapping calibration, as well as achieve more finesse in resolution. Well, then it’s more than just a hack, and it uses fewer pins than a conventional R-2R ladder too :-)
            In terms of Product Verification/Validation Test (PVT), as I expect you intend, that’s also rather straightforward to address with feedback, e.g. calibrating the subassembly at first power-on as part of the routine test for core functionality on commonly calibrated production-line test equipment. All you need is for the product’s firmware to ‘act accordingly’ in that environment, saving time overall and thus cost, etc.

            Hmm, rather than advising me to understand your post, please clarify it; isn’t that more efficient, and doesn’t it help others, e.g. those too shy to ask? Hey, what is it you imply I have missed? I am keen to review & absorb variations. Sure, it could be labelled a (nasty) hack, but with suitable & fairly simple firmware it can be augmented, taking it out of hack status. Spit it out, man :-)
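
            The ‘mapping calibration’ described above could be as simple as a lookup table built at first power-on: step through every code, read the result back through the ADC, and later pick whichever code lands closest to the wanted voltage. A minimal Python sketch; `fake_measure` is a hypothetical stand-in for the ADC readback, with a made-up gain and offset error.

```python
def build_calibration_table(measure, codes):
    """Measure the real output for each code (e.g. via an ADC on a feedback
    pin) and return (measured_volts, code) pairs sorted by voltage."""
    return sorted((measure(code), code) for code in codes)


def code_for_target(table, target_volts):
    """Pick the code whose measured output is closest to the target."""
    return min(table, key=lambda entry: abs(entry[0] - target_volts))[1]


# HYPOTHETICAL stand-in for "read the DAC output back through the ADC":
# an ideal 3-bit DAC with a made-up 4% gain error and 20 mV offset.
def fake_measure(code, vdd=3.3):
    return 0.02 + 0.96 * vdd * code / 7


table = build_calibration_table(fake_measure, range(8))
print(code_for_target(table, 1.65))
```

            A table built once corrects the part-to-part (process) spread; supply and temperature drift would need the table rebuilt or rechecked periodically, which is where the PVT concern raised above comes in.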

      1. I would actually love a micro with 40 MHz ADC sampling rates that also has enough power to do some processing on them. I can think of at least one application, building a simple multichannel analyzer (for radiation detection). There’s already a project that uses a Microchip part, but its performance is pretty limited due to the sampling rate of the ADC.

    1. There was a 3.x demo of an SDR I/Q demodulator somewhere… https://hackaday.com/2014/04/25/building-a-software-defined-radio-with-a-teensy/ then https://github.com/DD4WH/Teensy-ConvolutionSDR; there may have been something else out there that I can’t find.

      It would be nice to have a compact portable device to do SDR demodulation without a PC, at a decent sampling rate and with an ADC of more than 12 bits if possible. I think I also saw an STM32F409 board demo; hopefully someone takes this up a notch with the 4.0.

  9. “There will be some fraction of hackers that will still wonder why we need a 600 MHz microcontroller; another fraction will have already needed it yesterday.”
    In my case, it inspires a few more technical, compact projects, and yet there are simpler projects for which I would only need something like an Intel 4004. For those, I’ll just have to underutilize an 8-bit micro and try not to feel guilty.

  10. Hm. Maybe a nice one to finally make an open-source film and paper scanner backend, as the CCDs only have a serial analog output and need a clock pulse to scan a line and advance the carriage…

    1. It depends on your actual “real-time” requirement. There is RTLinux, and there are other RTOSes, which specify an upper limit on the response time.
      – Some requirements go down to exact CPU cycles, e.g. bit-banging USB or VGA: no OS.
      – Some are on the scale of a video frame, e.g. games: can live with an OS.

  11. This looks really cool. I likely will get one.

    Though, I would like a narrow, breadboard-mountable one that only spans as wide as a DIP.

    I have another wish: a version with bidirectional level converters. In other words: I want a 5 volt version. 3.3 volts is fine inside a box, but when the wires go outside the box, I want 5 volts, for greater tolerance of voltage drop and interference.

    1. A pair of 10K resistors and a BSS138 MOSFET do 3.3 V to 5 V conversion pretty well. You need one of these three-component sets for each channel to be shifted, so they could end up taking quite a bit of PCB space. The cost of those parts is negligible, and even more insignificant when mass-producing. I kind of wish the Raspberry Pi had such level shifting too.

    1. I was wondering about that same thing. The high clock speed, lots of GPIO pins, and no overhead of an OS make me think it could be a good candidate. I’m not super-familiar with them, but I’ve heard of a number of projects using an Arduino to drive the stepper motor controllers directly, and it seems like this would work just as well, if not better. Although if people are already doing it with an Arduino, what would be the benefit of switching to this? Not saying it isn’t a good idea (it’s what I immediately thought of as well when I was reading the article); I just don’t know enough about it to say.

      1. MANY 3D printers use Arduinos or clones… usually Megas. The most popular 3D printer firmware is written in Arduino. Many printers (especially Deltas) also suffer from slow-downs because of this. While 600 MHz is overkill, this genuine board costs less than a genuine Mega, and you could basically add whatever you wanted to it without worrying about any slow-downs. The only issue is you wouldn’t just be able to buy a RAMPS board; you would have to roll your own. I actually sort of hope Paul makes his own version of a RAMPS board for this; I’d dump my Smoothieboard ASAP.

        1. Prusa has already switched to a 32-bit MCU for their newly-released Prusa mini, they probably won’t want to switch to something else so soon.

          Maybe there’s someone on reprap.org who’s going to port Marlin or another firmware to this new MCU?

  12. This is the kind of speed that could keep up with injection and ignition timings on a small engine. The 3.6 and even slower/older boards could do this in the past, but every CPU cycle could cost some timing accuracy for a commanded output to inject fuel or fire a spark plug. That meant separate devices or accessories to drive an LCD, communicate over CAN, provide telemetry etc.

    This board should be able to handle many of these tasks at the same time without costing so many CPU cycles that you start to lose timing resolution… even a couple of microseconds matter when there are multiple cylinders or >8k RPM. If engine management isn’t the right fit, it would at least make a good DAQ for amateur racing telemetry.
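
    To put numbers on “a couple of microseconds”, a short Python sketch converting engine speed into crank angle per microsecond and into CPU cycles per crank degree. It ignores interrupt latency and everything else the firmware is doing, so the cycle count is an upper bound on the budget, not a real one. The 4 MHz / 5500 rpm line mirrors the Z80 setup described in the reply below.

```python
def crank_degrees_per_us(rpm):
    """Crankshaft rotation per microsecond: rpm * 360 deg / 60 s / 1e6."""
    return rpm * 360 / 60 / 1e6


def cpu_cycles_per_degree(cpu_hz, rpm):
    """CPU clock cycles that elapse while the crank turns one degree."""
    return cpu_hz / (rpm * 360 / 60)


print(crank_degrees_per_us(8000))          # degrees per microsecond
print(2 * crank_degrees_per_us(8000))      # crank angle for a 2 us slip
print(cpu_cycles_per_degree(600e6, 8000))  # Teensy 4.0 clock
print(cpu_cycles_per_degree(4e6, 5500))    # 4 MHz clock, for comparison
```

    At 8000 rpm a 2 µs slip is about 0.1 degree of crank angle, and a 600 MHz core sees 12,500 cycles per degree, against roughly 121 for a 4 MHz clock at 5500 rpm.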

    1. Lol, I used a wire-wrapped Z80 board at 4 MHz on the NMI interrupt to keep up synchronously with ignition timing and schedule injector PWM on a 1300cc 4-cylinder Kent motor in a Ford Escort, even at 5500 rpm, including a bank-switch option to switch to CP/M when the engine was not running. That was my thesis, “Fuel injection with Transmission control”, at the Western Australian Institute of Technology, 1982, Ba. EE (student id 7602128), for which I funded all the electronics, including mechanical fabrication of the manifold, vane air-flow sensors etc. off a 2L VW Kombi, approx. Aud$1200.
      It caused all sorts of copyright headaches for the electronics department, under Adrian Ball and the earlier head Alan Cook, with Orbital Engine Co., since I refused to accept their paltry $50 student grant, which of course came with a badly worded copyright deed of transfer in the fine print. Ha!

        1. Thanks for the question, Cuthbert,
          I didn’t stay ‘in the field’ of product development as such (though I always kept informed & dabbled), because the college at that time tried to snaffle copyright to my IP by requiring that I accept a $50 ‘grant’ to fund parts for the proof-of-concept unit, i.e. a copyright transfer in the fine print on acceptance: a bad, poisoned apple. Besides, since Bosch had made the sensors and was already ahead, it was doubtful I’d get further on my own without public company backing. As luck would have it, though, a senior engineering position opened up at Pretron Electronics Pty Ltd, designing & managing control systems for an associated hydraulics company, i.e. metal-cutting guillotines and press brakes up to 500-tonne capacity. I was then promoted to manager, until 7 years later the directors, with their Victorian attitude, couldn’t compete with cheap press brakes from Malaysia, having totally ignored restructuring to sell our add-on control systems globally despite our technology being ahead…
          I’ve always been in the field of EFI, though for decades mostly in diagnostics and instrumentation add-ons, whilst running my own business designing and testing products and e.g. investigating so-called over-unity ‘free energy’ devices. None even came close: people redeveloped their own electric motors, in no way usefully different from the earliest 19th-century designs, but were so emotionally attached to the idea that they could prove physics wrong. Hilarious and tragic; so many lost their shirts. It’s worse these days, with so many charlatans exploiting people’s weaker education in physics, e.g. hydrinos, battery rejuvenation, motor/generators etc. It’s a mess.
          Now I’m mostly retired: playing the stock market, occasional product design, reviewing QM, writing on a few forums, researching mineral supplementation that offers cognitive advancement, and a few oddball physics research projects exploiting advanced instruments and math. Cheers

    1. This is what I wanted to see! Especially when the article concludes with “but AI” as a possible use case for all that grunt, which raises the question of why one would not look at a board at under half the price that includes out-of-the-box TensorFlow support…

  13. I have been needing this kind of thing for a month now! Unfortunately, dealers in India aren’t able to say when they will have the Teensy 4.0 available. I was reading up and working on FPGA HDL programming. But with an overclocked Teensy 4.0, I can stop wrestling with the steep hill I’m trying to climb!

    BTW, up to what clock rate can the Teensy 4 be overclocked?

  14. Powerful chip, but the Teensy 4.0 software:

    – Does not have a DAC
    – Does not have audio driver support (only slow serial mode supported)
    – Has no MIDI or HID support

    The 4.0 is not ready.

  15. I see a use for the Teensy 4.0 in an SDR project. The M7 core running at 600 MHz+ should make a very good DSP platform. OK, it’s not a SHARC or a Blackfin, but it’s only $20 for a ready-to-go board, and the compiler and DSP libraries are FREE.

  16. Just loaded the SHA-256 benchmark onto a Maix Bit (Kendryte K210, dual-core 64-bit RISC-V @ 400 MHz). The result was 0.275 seconds. Not an entirely fair comparison, as there are no assembler optimisations: the math functions don’t have asm for RISC-V, whereas for the ARM there are asm optimisations. Clock for clock, it’s about half the speed.

    With some optimisations and utilising the 64-bit features, I’d expect the gap to narrow significantly. Enable the other core and there might be a bloodbath :)

    Aren’t we lucky we’ve got such economical and speedy toys to play with!

  17. Yes, we are lucky. Why is a 600 MHz processor a groundbreaking advance for hobbyists? Because now I can do pulsed LIDAR or RADAR and the data processing all on one Teensy little board!
