Free ARM cores for Xilinx FPGAs

In a surprising move, ARM has made two Cortex-M cores available for FPGA development at no cost.

In the more than three decades since [Sophie Wilson] created the first ARM processor design for the Acorn Archimedes home computer, the architecture has been managed commercially such that it has become one of the most widely adopted on the planet. From tiny embedded microcontrollers in domestic appliances to super-powerful 64-bit multi-core behemoths in high-end mobile phones, it’s certain you’ll own quite a few ARM processors even if you don’t realise it. Yet none of those processors will have been made by ARM; instead the Cambridge-based company will have licensed the intellectual property of their cores to another semiconductor company, who will manufacture the device around it to their specification. ARM core licences cost telephone-number sums, so unless you were a well-financed semiconductor company, until now you probably need not have applied.

You will still have to shell out the dough to get your hands on a core for powerful chips like those smartphone behemoths, but if your tastes are more modest and run only to a Cortex-M1 or Cortex-M3 you might be in luck. For developers on Xilinx FPGAs, ARM has extended the offer of those two processor cores at zero cost through its DesignStart Programme.

It’s free-as-in-beer rather than something that will please open-source enthusiasts, but it’s certainly a fascinating development for experimenters who want to take ARM for a spin on their own gate array. Speculation is swirling that this is a response to RISC-V, but we suspect it may be more of a partial lifting of the skirts to entice newbie developers such as students or postgraduates. If you arrive in the world of work already used to working with ARM IP at the FPGA level then you are more likely to be on their side of the fence when those telephone-number deals come up.

Thanks [Rik] for the tip!

28 thoughts on “Free ARM cores for Xilinx FPGAs”

    1. Don’t be stupid. Their entire business model is built on IP rather than a physical product so they can’t just give it away. Giving it away like that would be corporate suicide.

  1. This may also have something to do with Intel having acquired Altera and their embrace of programmable logic into NICs, as on-package coprocessors alongside server CPUs, etc. I’m assuming that’s the reason for singling out Xilinx rather than licensing this such that you could spin it up in an Altera, Lattice, etc. FPGA.

    Unless the core is open enough to modify (e.g. add instructions rather than just adding peripherals on an AXI/AMBA local bus interface) this doesn’t do much that a Zynq or similar SOC couldn’t (except by virtue of being small and cheap since this is a low end Cortex-M rather than mid-range Cortex-A like in the Zynq or Altera Cyclone V SOC line).

    Is it cheaper in dollars and/or die area to implement a Cortex-M as a soft core than it is to have that Cortex-A hard core? That remains to be seen.

    1. If you have a dedicated need, stripping unused instructions out of an M3 could reduce package complexity, and FPGA resources.
      Stripping out all 8-bit references, making the data bus 16 bits wide, would increase addressable memory to 8GB [2^32 * 2 bytes per address]. Or even implement port-based I/O.

      Sounds great – now let’s see what reality brings.
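The address-space arithmetic in the comment above can be checked with a short Python sketch. It assumes, as the commenter does, that all 32 address bits stay in use and that each address selects one 16-bit word instead of one byte. The function name is made up for illustration and is not part of any ARM or Xilinx tooling.

```python
def addressable_bytes(address_bits: int, bytes_per_address: int) -> int:
    """Total bytes reachable when every address selects a unit of the given width."""
    return (2 ** address_bits) * bytes_per_address


GIB = 2 ** 30  # one gibibyte, the "GB" used in the comment

# Conventional byte addressing: each of the 2^32 addresses names one byte.
byte_addressed = addressable_bytes(32, 1)

# The commenter's idea (assumption): each address names a 16-bit word.
word_addressed = addressable_bytes(32, 2)

print(byte_addressed // GIB, "GiB with byte addressing")
print(word_addressed // GIB, "GiB with 16-bit word addressing")
```

The doubling comes purely from the wider addressable unit. Whether the core could actually be modified this way is a separate question that the sketch does not address.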

    1. I don’t think it’s a replacement for microcontroller-based designs. I think it’s a way to costlessly add a powerful processing core to an FPGA-based design.

      I also wonder if this move is driven by trepidation over what Intel has planned for the Altera line they acquired.

      1. This and marketing indecision. Can’t tell how many times marketing indecision caused redesigns. I could easily anticipate their indecisions with a small programmable core. While there are many softcores available already they are not as universally used as ARMs. I don’t know of many software guys who don’t already have/use a cortex tool chain.

  2. What FPGA is required to have a minimal system around a Cortex-M1? Even an old Spartan 3A is still costly compared to many Cortex-M0+ uC. If a special function requires an FPGA, I prefer to just add a very cheap ICE40 alongside a Cortex-M0+ uC.

  3. ARM already proposed the M1 (and M7 ?) as precompiled/preplaced IP cores for the Actel ProASIC3 family (including IGLOO and Fusion derivatives). That was about 10 years ago now.
    Actel got swallowed by Microsemi, which recently got swallowed in turn by Microchip. So this is another way MC gets an ARM license.

    The trick in the A3P chips was to use the included crypto protection to allow the IP core to be flashed into specially-branded chips, sold at no additional cost (it was just preprogrammed with a signature that allowed the software to flash the ARM core). ARM simply wanted to keep control of the dissemination of the implementation… I don’t know much more, except that the protection was easily broken due to bad software practice and a few hours of reverse engineering.

    People really wanted more powerful cores so they soon migrated the ARM core to the “hard” section where the ARM machinery uses a crazy tiny fraction of the die and energy, yet (unsurprisingly) runs faster.
    But the hyperconfigurability was lost…

  4. I guess that RISC-V is the most probable reason for this move. There are numerous RISC-V implementations, ranging from small cores (like pico-riscv) up to Berkeley’s Rocket (the base of the SiFive cores) and the pulpino family.

    1. Not the only competitor either. Also Microblaze or NIOS II and that Cadence one. All have free compilers and can run a full OS. Presumably ARM are alarmed by the number of FPGA designs ending up getting taped out with these free cores embedded and want in on some of that action.

  5. Just a question but why would I want to use an FPGA implementation of an older generation CPU vs having a modern ARM CPU and an FPGA to perform other tasks – how does having an FPGA ARM CPU benefit me?

      1. A quick Google of the STM32 series shows a typical clock of 120MHz [depending on version].
        Xilinx clock rates are buried in data sheets, and tend to get mixed with I/O clock rates, but the Artix-7 is 400 to 600 MHz, depending on “speed grade” [chip rating].

        That could translate to a significant speed hike.

        1. I googled one reference to a cortex m0 running on a spartan-6 at up to 60MHz. When people talk about an FPGA being capable of 500MHz, they mean the delay from one LUT to the next as being around 2ns. Complex designs have long distance routing, wide fan out, block RAM/ROM and higher level components which wire resources in series. Another quick google puts the number of pipeline stages on the m0 as 3.
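The gap between a "500MHz" fabric figure and a 60MHz soft core can be illustrated with a rough back-of-the-envelope model in Python. It is only a sketch: it assumes the critical path is a chain of identical stages at 2 ns each (the figure quoted in the comment above) and folds routing, fan-out and block RAM delays into that per-stage number. The function names are hypothetical.

```python
def fmax_mhz(logic_levels: int, delay_per_level_ns: float = 2.0) -> float:
    """Estimate the maximum clock for a path of `logic_levels` stages in series."""
    if logic_levels < 1:
        raise ValueError("a register-to-register path has at least one logic level")
    critical_path_ns = logic_levels * delay_per_level_ns
    return 1000.0 / critical_path_ns  # 1000 ns per microsecond gives MHz


def levels_for_clock(clock_mhz: float, delay_per_level_ns: float = 2.0) -> float:
    """Invert the estimate: how many 2 ns stages fit in one clock period."""
    return 1000.0 / (clock_mhz * delay_per_level_ns)


print(fmax_mhz(1))           # a single LUT-to-LUT hop
print(fmax_mhz(8))           # eight stages in series
print(levels_for_clock(60))  # stages implied by a 60 MHz design
```

Under these assumptions a single hop gives the headline 500 MHz, while a 60 MHz core corresponds to a critical path roughly eight such stages long. Real timing closure depends on the specific device, speed grade and placement, so treat the numbers as orders of magnitude only.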

        2. I doubt the soft core will actually be significantly faster than 120 MHz. Maybe 200-250 MHz or so unless you’re in kilobuck land. Flipflops are fast, but the interconnect is slow.

          But the key isn’t the microcontroller’s speed, it’s the *interconnect* speed. You’re not going to get *data* into an STM32 faster than ~20-30 MB/s or so. But because the FPGA fabric has direct access to the same memory as the soft-core does, you’ll likely have well over 100-200 MB/s throughput.

          Honestly the main reason for soft CPUs tends to *usually* be protocol handling, so basically you stick a softcore CPU in the design to handle an IP stack and networking, for instance. And the data throughput from a setup like that could be *much* faster than a standalone microcontroller and standalone FPGA, because the data “just shows up.”

    1. like Pat says, but also you can add new custom peripherals. basically you get a bus (memory-mapped) interface between the core and your own FPGA logic. You don’t need to cram all the data through a narrower interface like SPI.

      Also maybe your application needs the FPGA but doesn’t need much CPU at all; this will save you the $2 BOM cost on putting an M3 on the board. Software is like the ultimate in easy/flexible means of building a complex state machine for control – much easier than trying to do it in VHDL, especially when you don’t need the performance.
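The memory-mapped interface described in the comment above can be pictured with a small behavioural model in Python. This is not HDL or firmware and not ARM's actual bus implementation; it only shows the idea that custom logic appears to the core as ordinary addresses. The base address `0x4000_0000`, the register layout and the `Accumulator` peripheral are all assumptions chosen for illustration.

```python
class MemoryMappedBus:
    """Toy address decoder: routes reads and writes to whichever region owns the address."""

    def __init__(self):
        self._regions = []  # list of (base, size, read_fn, write_fn)

    def attach(self, base, size, read_fn, write_fn):
        self._regions.append((base, size, read_fn, write_fn))

    def _find(self, addr):
        for base, size, read_fn, write_fn in self._regions:
            if base <= addr < base + size:
                return base, read_fn, write_fn
        raise ValueError(f"no peripheral mapped at {addr:#010x}")

    def read(self, addr):
        base, read_fn, _ = self._find(addr)
        return read_fn(addr - base)

    def write(self, addr, value):
        base, _, write_fn = self._find(addr)
        write_fn(addr - base, value)


class Accumulator:
    """Hypothetical custom peripheral: offset 0 is a running sum, offset 4 a sample count."""

    def __init__(self):
        self.total = 0
        self.count = 0

    def read(self, offset):
        if offset == 0:
            return self.total & 0xFFFFFFFF  # registers are 32 bits wide
        if offset == 4:
            return self.count
        raise ValueError(f"no register at offset {offset}")

    def write(self, offset, value):
        if offset != 0:
            raise ValueError(f"register at offset {offset} is read-only or absent")
        self.total += value
        self.count += 1


ACC_BASE = 0x4000_0000  # illustrative base address, an assumption

bus = MemoryMappedBus()
acc = Accumulator()
bus.attach(ACC_BASE, 8, acc.read, acc.write)

# "Software" on the core pushes samples with plain stores, then reads the results back.
for sample in (10, 20, 30):
    bus.write(ACC_BASE, sample)

print(bus.read(ACC_BASE), bus.read(ACC_BASE + 4))
```

In a real design the decoder and the peripheral would be logic in the FPGA fabric, and the loop would be ordinary load and store instructions, which is why no serial link such as SPI sits in the data path.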

      1. I dunno where the M1 fits in all this… or why…..

        I mean a Microblaze or a Microblaze MCS is sooooo easy to drop in…. literally drag and drop in Vivado. I have many Microblaze based designs all running in blockRAM. Can’t see what the M1 offers…. The M1 on the FPGA won’t be very fast.
        I think it’s just a shot over Intel’s bow.
        glen.
