When you think of developing with FPGAs, you usually think of writing Verilog or VHDL. However, there’s been a relatively recent trend to describe what an FPGA should do in C and have tools convert that into an FPGA design. At least in the case of Xilinx parts, though, this capability is only available in their newest tool (Vivado), and Vivado doesn’t target the older, lower-cost FPGAs that most low-cost development boards use.
[Sleibso], who blogs for Xilinx, has an answer. It turns out you can use the Vivado C compilation tools to generate code for older FPGAs; it just involves a less convenient workflow. Vivado (even the free version) generates unique files that the rest of the tool uses to pick up compiled C code. However, it also generates RTL (Verilog or VHDL) as a by-product, and you can import that into the older ISE tool (which has a perfectly fine free version) and treat it as you would any other RTL files.
There’s an example of using the Vivado tool in the video below. [Sleibso] points out that the video is three years old, and the talk about licensing in the video is out of date. The free tools now include this capability. [Sleibso] talks about using a Spartan 6, but the same split workflow should work with most devices ISE supports.
It is worth noting that this isn’t the same as having a CPU on your FPGA and programming it with C (although Vivado will do this, too). The technique [Sleibso] mentions converts a subset of C into FPGA logic. If you are interested in using C with an FPGA, Xilinx has a good document about how it all works. If you are expecting to just port code directly, be aware that things like dynamic memory allocation are not possible on the FPGA. On the other hand, you can get outrageous performance in many cases with this approach, especially for heavily compute-bound software like video processing or DSP.
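To make the "no dynamic memory allocation" point concrete, here is a minimal sketch in plain C of the style that tends to suit C-to-hardware tools: the buffer has a size fixed at compile time instead of coming from malloc. The names moving_sum and TAPS are made up for illustration and do not come from [Sleibso]'s post or from Xilinx documentation.

```c
#include <stdint.h>

#define TAPS 4  /* hypothetical window length, fixed at compile time */

/* Returns the sum of the last TAPS samples seen so far.
 * The window is a static array rather than a malloc'd buffer, so its
 * size is known up front and could map to registers or a small RAM. */
int32_t moving_sum(int32_t sample)
{
    static int32_t window[TAPS];  /* zero-initialised, persists across calls */
    int32_t sum = 0;

    /* Shift older samples down one slot and store the new one at index 0. */
    for (int i = TAPS - 1; i > 0; i--) {
        window[i] = window[i - 1];
    }
    window[0] = sample;

    /* Add up the whole window. */
    for (int i = 0; i < TAPS; i++) {
        sum += window[i];
    }
    return sum;
}
```

Both loops have constant trip counts, which is the kind of structure a synthesis tool can reason about; whether a given tool accepts this exact code is something to check against its own documentation.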
If you want to learn more about FPGAs in general, and the traditional Verilog language used with them, you could do worse than read our three-part tutorial on getting started with the iCEStick FPGA board, or the more advanced tutorial on implementing a UART-controlled PWM controller on an FPGA that’s currently in progress. If you like Python, there’s a way to convert that to FPGA configurations, too.
Thanks [Richard Milward] for the tip.
Spartan 6 Photo: By Dake (Dake) CC-BY-SA-3.0 via Wikimedia Commons (modified)
Without having experience with this particular tool, can anyone comment on how customizable the synthesis of logic elements from the C code is?
For example, when writing an iterative algorithm processing a stream of data, can you specify (with pragmas or otherwise) whether to instantiate the combinatorial logic inside the loop once (minimize logic footprint) vs. instantiating it multiple times and pipelining the stages together for greater throughput at the cost of increased logic elements?
I am slowly playing around with this. It is well worth trying. The guts are pretty simple to explain. You write your C code and test it, in a very simple IDE-like environment. You can write standard C test benches to verify your code works. Then you convert it to HDL logic.
The steps are:
a. The HLS tool breaks the code into “pseudo machine code” operations, each of which can be implemented with FPGA logic. (e.g. FMULT, ADD, INC)
b. Loops are analysed, and as directed by analysis pragmas can be unrolled. Loops that are not fully unrolled become the throughput choke points – e.g. if a complex multiply takes 30 cycles, and you do it up to 256 times the latency will be between 30 and 7,680 cycles. However, if you unroll it 256 times (with the #pragma HLS PIPELINE directive) it will become a 7,680-stage pipeline which can accept new data every cycle.
c. Function calls can cause either a separate instance of the logic required to support the sub-function to be created, or can cause some sort of arbitrated interface to be created for accessing a single instance of the function’s logic.
d. A schedule for how these pseudo machine code operations can be chained is generated, and takes into account the desired performance constraints (e.g. clock speed).
e. The code generator then maps the pseudo machine code operations onto logic, and emits HDL code (e.g. Verilog or VHDL), with a simple stream-line interface for parameters and return values. It seems you can also add other pragmas or datatypes to make AXI interfaces if desired – I haven’t looked at that yet.
f. You can then take this code/IP block and include it in your traditional HDL project.
Sounds easy!
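The workflow above can be sketched with a tiny kernel plus a plain C test bench. This is only an illustration under assumptions: dot_product, dot_product_testbench and N are hypothetical names, and the UNROLL pragma is written the way I understand the Vivado HLS documentation spells it. An ordinary C compiler ignores the pragma (possibly with a warning), which is what lets the same file run as a software test before any HDL is generated.

```c
#include <stdint.h>

#define N 8  /* hypothetical fixed trip count, so the loop can be fully unrolled */

/* Hypothetical kernel: multiply-accumulate over two fixed-size arrays. */
int32_t dot_product(const int16_t a[N], const int16_t b[N])
{
    int32_t acc = 0;
    for (int i = 0; i < N; i++) {
/* Assumed HLS directive asking for one copy of the loop body per
 * iteration; a normal C compiler skips it. */
#pragma HLS UNROLL
        acc += (int32_t)a[i] * b[i];
    }
    return acc;
}

/* Plain C test bench: compares the kernel against an independently
 * computed reference. Returns 0 on success, 1 on mismatch. */
int dot_product_testbench(void)
{
    const int16_t a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    const int16_t b[N] = {8, 7, 6, 5, 4, 3, 2, 1};
    int32_t expected = 0;

    for (int i = 0; i < N; i++) {
        expected += (int32_t)a[i] * b[i];
    }
    return dot_product(a, b) == expected ? 0 : 1;
}
```

Leaving the pragma out would be expected to give one shared multiplier and a longer latency, which is the footprint-versus-throughput trade-off asked about earlier in the thread.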
When it comes to optimizations there are a whole lot of custom datatypes (e.g. n-bit integers, fixed precision integers…) that can be used, but I haven’t really got that far into it yet. Using these appropriately is important for generating minimal logic and increasing design performance. Pointer support is also a bit variable – contention for the memory interface can be an issue.
I’ve got a thread in the Microcontrollers and FPGAs section on EEVBLOG where I’m posting my limited experience if anybody is still interested…
Thanks Mike, that’s actually pretty interesting. Link to the EEVBLOG thread?
The last time I looked at these tools (~2007-ish) they were pretty bad compared to what our EE could produce with some guidance on the structure of the math (crypto, mostly both block ciphers and elliptic curve point multiplication). Sounds like things have progressed a bit. That said, it sounds like DRAM is still a pain w.r.t. performance. That’s true in the SW world as well – HW just makes it easier to see why you are screwed (superscalar CPU performance is hard to analyze…)
Link to thread is http://www.eevblog.com/forum/microcontrollers/vivado-hls-in-action/
A lot of the problems are still the same but different – if you switch from C standard 32-bit ints to 12 bits to increase performance and reduce logic, then you have to analyse the code to see if you guard against overflows and underflows. I wonder how that works with intermediate results? e.g. in 10 bit fixed point and the statement x = (i * j) / k;, is the intermediate value of (i*j) 20 bits?
I guess that is where C based verification comes in – far quicker than full HDL level simulation…
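As a rough illustration of the kind of C-level check mentioned above, here is a sketch of two helpers that test whether a value still fits after narrowing from 32 bits to, say, 12 bits, and that clamp it if it does not. The names fits_signed and saturate_signed are invented for this example; they are not part of any HLS library.

```c
#include <stdint.h>

/* Returns 1 if value fits in a signed two's-complement field of the
 * given width, 0 otherwise. Assumes 2 <= bits <= 31. */
int fits_signed(int32_t value, unsigned bits)
{
    const int32_t max = (int32_t)((1u << (bits - 1)) - 1u);
    const int32_t min = -max - 1;
    return value >= min && value <= max;
}

/* Clamps value to the range of a signed field of the given width
 * instead of letting it wrap. Same assumption on bits as above. */
int32_t saturate_signed(int32_t value, unsigned bits)
{
    const int32_t max = (int32_t)((1u << (bits - 1)) - 1u);
    const int32_t min = -max - 1;

    if (value > max) {
        return max;
    }
    if (value < min) {
        return min;
    }
    return value;
}
```

A C test bench could call fits_signed on every intermediate value over a set of test vectors to find where a 12-bit datapath would overflow, long before running an HDL simulation.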
Thanks for the link.
I’d argue that the problems are exactly the same, though the context may be different ;) Many software people have seen some aspect of the problem (32 vs. 64 bit integers), but they don’t appreciate the possibilities when things don’t match up to the word sizes they know. Hopefully at least some SW people have taken a CPU implementation or advanced digital logic course and realized what the issues actually are (thank you, CPU Architectures 101).
> e.g. in 10 bit fixed point and the statement x = (i * j) / k;, is the intermediate value of (i*j) 20 bits?
If they’re following C semantics, it would have to be. C promotes all smaller sizes to “int size” before performing operations, and then truncates the result.
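To see why the width of the intermediate matters, here is a small sketch in ordinary C that emulates 10-bit unsigned values with a mask. It is not HLS code and says nothing about what the tool actually generates; muldiv_wide, muldiv_narrow and MASK10 are names made up for the comparison. One version keeps the full product before dividing, the other cuts it back to 10 bits first.

```c
#include <stdint.h>

#define MASK10 0x3FFu  /* keeps the low 10 bits */

/* Keeps the full (up to 20-bit) product before dividing, then truncates
 * the final result to 10 bits. Assumes i, j fit in 10 bits and k != 0. */
uint32_t muldiv_wide(uint16_t i, uint16_t j, uint16_t k)
{
    uint32_t product = (uint32_t)i * j;
    return (product / k) & MASK10;
}

/* Truncates the product to 10 bits before dividing, as would happen if
 * the intermediate were no wider than the operands. Same assumptions. */
uint32_t muldiv_narrow(uint16_t i, uint16_t j, uint16_t k)
{
    uint32_t product = ((uint32_t)i * j) & MASK10;
    return (product / k) & MASK10;
}
```

With i = j = k = 1000, the wide version returns 1000, while the narrow version divides the truncated product 576 by 1000 and returns 0.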
@BeagleBoy: I did a test – I set it up to use 18-bit fixed point (2.16 format), and calculate (a*b)/c. I then looked at the generated HDL.
The intermediate of (a*b) is a 36-bit number, (fixed point 4.32 format)
The result of the whole (a*b)/c expression is assigned into a 52-bit number.
To get the final result, the 52-bit number is rounded and truncated back to 18 bits.
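The 2.16 experiment above can be approximated in plain C by holding the raw fixed-point value in a wider integer. This is only a software sketch of the arithmetic, not the generated HDL: it does not model the 52-bit division result or the rounding step (it simply truncates), and fix2_16, fix_from_double, fix_mul_wide and fix_muldiv are hypothetical names.

```c
#include <stdint.h>

#define FRAC_BITS 16      /* 2.16 format: 16 fractional bits */

typedef int32_t fix2_16;  /* raw value; only the low 18 bits are meaningful */

/* Converts a real number in [-2, 2) to its raw 2.16 representation. */
fix2_16 fix_from_double(double x)
{
    return (fix2_16)(x * (double)(1 << FRAC_BITS));
}

/* Full-width product: two 18-bit operands give up to 36 bits, with
 * 32 fractional bits (the 4.32 format mentioned above). */
int64_t fix_mul_wide(fix2_16 a, fix2_16 b)
{
    return (int64_t)a * b;
}

/* (a * b) / c. Dividing a value with 32 fractional bits by one with 16
 * leaves 16 fractional bits, i.e. 2.16 again. Truncates rather than
 * rounds, does not check that the result fits in 18 bits, and assumes
 * c != 0. */
fix2_16 fix_muldiv(fix2_16 a, fix2_16 b, fix2_16 c)
{
    return (fix2_16)(fix_mul_wide(a, b) / c);
}
```

For a = 1.5 and b = 0.5, the raw product is 3221225472, which is already too large for a signed 32-bit int; that is one way to see why the intermediate has to be wider than the operands. Dividing by c = 0.5 brings the raw result back to 98304, which is 1.5 in 2.16 format.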
1) Shouldn’t the title be *Programming* Xilinx FPGAs in C for Free
2) Mind blown – I didn’t see that coming. Personally I am going to stick with VHDL for now but it amazes me that the IDE can go from C to a Turing complete machine in one step and without an actual program – just hardware. wow.
In general, most people recommend VHDL for FPGAs, and C style procedural languages for ASIC CPUs.
One can emulate the procedural part in a gate structure, but generally it can cause more problems than it solves.
Thus, the primary reason there are hybrid FPGAs with an ASIC ARM CPU built-in to handle procedural problems efficiently.
For example, Linux on the NIOS soft-CPU allows one to map a part of the FPGA as a kernel module, but there are few circumstances where it makes sense to approach a problem this way.
The lesson can be very expensive for those that mistake unrolled emulated C code motion as good standard practice.
I would not recommend this even if it is “Free”.
Yes, it is absolutely amazing what the future holds for HLS. However, I will personally stay away from it for the meantime due to its complexity and ridiculous compile time (might as well learn Verilog). Eventually, the goal is to have generic chips that can be programmed OTG for task specific functions/accelerators.
See also OpenCL. Not so useful for a single FPGA, but a very cool approach if you have some FPGAs hanging off a classic CPU and want to farm out computation to them in the same way that you would to a GPU.
This is uber cool
I learned how to do this a couple years ago thanks to Colin O’Flynn
https://www.youtube.com/watch?v=UNu6Qh3fQGw
It doesn’t make sense to synthesize C into logic – the language is not suited to describing hardware design. Both Verilog and VHDL are relatively easy to learn, if you understand the fundamentals of logic design. And if you are new to digital logic, don’t struggle with a lousy abstraction and obscure tools! Save yourself the frustration.
There are times when it makes perfect sense to convert C into Verilog. If I need to implement a DSP routine in hardware I really don’t care if it’s written in C, VHDL, SystemVerilog, or created for me using a wizard. What I do care about is how much effort it took to create, how many hardware resources it took, reliability, and how portable the entire process is. There are times when C is going to be the best tool for the job.
Vivado is an obscure tool?
No, I think he means that in this case, C is obscure.