chipKIT Uno32: first impressions and benchmarks
posted May 27th 2011 9:01am by Phil Burgess, filed under: arduino hacks, Featured, Microcontrollers, reviews

Following Maker Faire, we’ve had a few days to poke around with Digilent’s 32-bit Arduino-compatible chipKIT boards and compiler. We have some initial performance figures to report, along with impressions of the hardware and software.
Disclaimer: Digilent has provided Hack a Day with Uno32 and Max32 boards for evaluation.
chipKIT isn’t the first attempt to extend the Arduino form factor to a 32-bit microcontroller core…other products such as Maple, Netduino or the FEZ Domino have been around for well over a year…but the chipKIT boards are notable for the effort Digilent has put into creating a seamless transition. The aim is to create a single unified tool both for traditional 8-bit Arduino boards and Digilent’s 32-bit work-alikes, where the same IDE, the same code, and a good number of the same shields can all work despite the different underlying architectures. In fact, they’re hoping the Arduino project accepts their integration method as an official means of adding new hardware to the Arduino IDE — not just for their own product, but for anyone else to use as well.
As noted in our prior report, we were impressed that they do appear to deliver on this promise. The transition between “classic” Arduinos and the 32-bit boards is indeed quite slick. But we’re finding at this early stage that there are still some rough bits to be worked out. So, for the time being, we’re keeping both the Arduino IDE and Mpide (Digilent’s multi-platform derivative) installed on the development system; the latter has not yet obviated the need for the former. But we see how the concept is supposed to work, and we like it.
For the most part, Mpide works as intended as a dual-platform IDE. Just select the appropriate device from the Tools->Board menu, recompile, and the code is now ready for the corresponding chip. But a couple things have bit us in the rear:
- The AVR compiler in Mpide either isn’t fully optimizing, or the floating-point libraries were built sans optimization or something. This threw off our benchmark numbers initially — the results were atrocious! In order to keep the numbers realistic, we’re using the standard Arduino IDE for the corresponding benchmarks. To be fair, they did warn us about this performance issue in person at Maker Faire, but until it’s fixed they could be more forthcoming about it with some documentation or on the web site…otherwise it could look like they’re trying to skew benchmarks more in their favor.
- The String() constructor is borked when handling integers. The following line compiles fine for AVR chips, but throws a tizzy fit with the PIC32 compiler:
String foo = String(42);
Given that the IDE was wrapped up literally hours before going live online and at Maker Faire, it’s understandable that there are some loose ends. Just be prepared as an early adopter that this won’t be as pain-free a transition as they’re aiming for. The great thing with open source is that we can get in there, spot such problems, and offer suggestions and submit fixes…the situation will no doubt improve with time.
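Until the constructor is sorted out, one possible workaround is to format the integer into a plain char buffer and hand that to whatever needs the text. The sketch below is desktop C++ rather than an Arduino sketch; formatInt is a hypothetical helper name, and we are assuming (not having tested it in Mpide) that snprintf() is available in the PIC32 toolchain.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: format an integer as decimal text in a
// caller-supplied buffer, sidestepping String(int) entirely.
// Returns the number of characters written, excluding the NUL.
int formatInt(long value, char *buf, size_t bufSize)
{
    assert(buf != nullptr && bufSize > 0);
    return snprintf(buf, bufSize, "%ld", value);
}
```

On the board, the resulting buffer could presumably be passed straight to Serial.print(), which does accept C strings in the sketches here.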
Some Benchmarks
We wanted to create a fractal demo similar to what they were displaying at Maker Faire. We didn’t have the spiffy SparkFun Color LCD Shield on hand, so instead we had to settle for a serial LCD, 4D Systems’ uLCD-144. This does affect the numbers somewhat, as we’ll see.
In MIPS alone, the chipKIT should beat the Arduino by a factor of five. Then there’s the native 32-bit-ness of it: when dealing with larger numbers, the AVR processor at Arduino’s core has to shift and fiddle bits between consecutive 8-bit values in order to achieve 32-bit results. So the PIC32 should show a considerable performance benefit beyond MIPS alone. In practice, this doesn’t always pan out.
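To make the “shift and fiddle” point concrete, here is a rough illustration of what a 32-bit addition costs when only 8-bit operations are available: four byte-wide adds with a carry passed along. This is desktop C++ written for clarity, with add32ViaBytes as a made-up name; an actual AVR compiler emits a short sequence of add-with-carry instructions rather than a loop like this.

```cpp
#include <cassert>
#include <cstdint>

// Add two 32-bit values one byte at a time, propagating the carry,
// roughly the way an 8-bit core has to. Illustrative only.
uint32_t add32ViaBytes(uint32_t x, uint32_t y)
{
    uint32_t result = 0;
    unsigned carry = 0;
    for (int i = 0; i < 4; i++) {
        unsigned xb = (x >> (8 * i)) & 0xFF;  // i-th byte of x
        unsigned yb = (y >> (8 * i)) & 0xFF;  // i-th byte of y
        unsigned sum = xb + yb + carry;       // at most 255 + 255 + 1
        result |= (uint32_t)(sum & 0xFF) << (8 * i);
        carry = sum >> 8;                     // 0 or 1
        assert(carry <= 1);
    }
    return result;                            // final carry is dropped
}
```

A 32-bit core does the same job in a single instruction, and multiplication is where the gap widens further.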
The uLCD-144 is a 128 by 128 pixel 16-bit color LCD with a serial UART interface running at 115,200 bits per second. The graphics commands aren’t terribly efficient, and it’s necessary to send a five byte packet for every pixel drawn. This includes coordinate data; there’s no block write function in serial mode. On the plus side, it’s easy to talk to using the Arduino or chipKIT’s native serial UART.
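For reference, the five-byte pixel packet can be sketched as follows, using the byte order from the benchmark sketches in this post (command 0x50, then X, Y, color MSB, color LSB). makePixelPacket and kPixelPacketSize are names invented for this example; consult the 4D Systems documentation for the authoritative command format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

const size_t kPixelPacketSize = 5;

// Fill 'out' with one 'Pixel' command: 0x50, X, Y, color MSB, color LSB.
void makePixelPacket(uint8_t x, uint8_t y, uint16_t color,
                     uint8_t out[kPixelPacketSize])
{
    assert(out != nullptr);
    out[0] = 0x50;                      // 'Pixel' command byte
    out[1] = x;                         // X coordinate
    out[2] = y;                         // Y coordinate
    out[3] = (uint8_t)(color >> 8);     // 16-bit color, high byte
    out[4] = (uint8_t)(color & 0xFF);   // 16-bit color, low byte
}
```

Every one of the 16,384 pixels carries this overhead, which matters for the timing figures.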
Here’s the code for the Mandelbrot sketch, using floating-point math:
/* Simple Mandelbrot set renderer for Arduino vs. chipKIT benchmarking
   w/floating-point math, via www.hackaday.com. This example uses the
   4D Systems uLCD-144(SGC) serial display module, wired as follows:
     uLCD Pin:    RES  GND  RX  TX  VIN
     Arduino Pin:   2  GND   1   0   5V */

const int
  pixelWidth  = 128, // LCD dimensions
  pixelHeight = 128,
  iterations  = 255; // Fractal iteration limit or 'dwell'
const float
  centerReal = -0.6, // Image center point in complex plane
  centerImag =  0.0,
  rangeReal  =  3.0, // Image coverage in complex plane
  rangeImag  =  3.0,
  startReal  = centerReal - rangeReal * 0.5,
  startImag  = centerImag + rangeImag * 0.5,
  incReal    = rangeReal / (float)pixelWidth,
  incImag    = rangeImag / (float)pixelHeight;

void setup()
{
  pinMode(13,OUTPUT);   // Arduino status LED
  pinMode(2,OUTPUT);    // LCD reset pin
  digitalWrite(13,LOW); // LED off
  Serial.begin(115200);
  digitalWrite(2,LOW);  // Reset LCD
  delay(10);
  digitalWrite(2,HIGH);
  delay(2000);          // Allow time for reset to complete
  Serial.write(0x55);   // Issue auto-baud command
  while(Serial.read() != 0x06); // Wait for ACK
}

void loop()
{
  unsigned char cmd[20]; // Serial packet for LCD commands
  int           x,y,n;
  float         a,b,a2,b2,posReal,posImag;
  long          startTime,elapsedTime;

  Serial.write(0x45);    // Clear screen
  delay(100);            // Brief pause, else 1st few pixels are lost
  cmd[0] = 0x50;         // 'Pixel' command is issued repeatedly
  digitalWrite(13,HIGH); // LED on while rendering
  startTime = millis();

  posImag = startImag;
  for(y = 0; y < pixelHeight; y++) {
    cmd[2] = y;          // Y coordinate of pixel
    posReal = startReal;
    for(x = 0; x < pixelWidth; x++) {
      a = posReal;
      b = posImag;
      for(n = iterations; n > 0 ; n--) {
        a2 = a * a;
        b2 = b * b;
        if((a2 + b2) >= 4.0) break;
        b = posImag + a * b * 2.0;
        a = posReal + a2 - b2;
      }
      cmd[1] = x;          // X coordinate of pixel
      cmd[3] = n * 29;     // Pixel color MSB
      cmd[4] = n * 67;     // Pixel color LSB
      Serial.write(cmd,5); // Issue LCD command
      posReal += incReal;
    }
    posImag -= incImag;
  }

  elapsedTime = millis() - startTime;
  digitalWrite(13,LOW);  // LED off when done

  // Set text to opaque mode
  cmd[0] = 0x4f;
  cmd[1] = 0x01;
  Serial.write(cmd,2);

  // Seems the chipKIT libs don't yet handle the String(long)
  // constructor, hence this kludge. Working backward, convert
  // each digit of elapsed time to a char, with " ms" at end
  // and text command at head. Length is variable, so issue
  // command from final determined head position.
  cmd[19] = 0;
  cmd[18] = 's';
  cmd[17] = 'm';
  cmd[16] = ' ';
  n = 15;
  do {
    cmd[n--] = '0' + elapsedTime % 10;
    elapsedTime /= 10;
  } while(elapsedTime);
  cmd[n--] = 0xff; // Color LSB
  cmd[n--] = 0xff; // Color MSB
  cmd[n--] = 0;    // Use 5x7 font
  cmd[n--] = 0;    // Row
  cmd[n--] = 0;    // Column
  cmd[n]   = 0x73; // ASCII text command
  Serial.write(&cmd[n],20-n);

  delay(5000); // Stall a few seconds, then repeat
}
And the timing results, in milliseconds, for the Arduino (top) and chipKIT (bottom):

Arduino: 54,329 ms.
chipKIT: 12,417 ms.
To reiterate (pardon the pun), due to some performance issues we used the traditional Arduino compiler, not the one included in Mpide. If you’re curious, the output from that compiler took about 8.5 minutes to complete the task! Oof.
So, about a 4.4x speedup. Not bad, but we were expecting a more dramatic difference. Part of this is due to the inherent bottleneck of the serial communication with the LCD…we’ll get back to that in a moment. Another limiting factor is that both chips are emulating floating-point math. If we can use 32-bit integer data types, the PIC32 should really shine. So, a fixed-point Mandelbrot generator followed:
/* Simple Mandelbrot set renderer for Arduino vs. chipKIT benchmarking
   w/fixed-point math, via www.hackaday.com. This example uses the
   4D Systems uLCD-144(SGC) serial display module, wired as follows:
     uLCD Pin:    RES  GND  RX  TX  VIN
     Arduino Pin:   2  GND   1   0   5V */

const int
  bits        = 12,  // Fractional resolution
  pixelWidth  = 128, // LCD dimensions
  pixelHeight = 128,
  iterations  = 255; // Fractal iteration limit or 'dwell'
const float
  centerReal = -0.6, // Image center point in complex plane
  centerImag =  0.0,
  rangeReal  =  3.0, // Image coverage in complex plane
  rangeImag  =  3.0;
const long
  startReal = (long)((centerReal - rangeReal * 0.5) * (float)(1 << bits)),
  startImag = (long)((centerImag + rangeImag * 0.5) * (float)(1 << bits)),
  incReal   = (long)((rangeReal / (float)pixelWidth) * (float)(1 << bits)),
  incImag   = (long)((rangeImag / (float)pixelHeight) * (float)(1 << bits));

void setup()
{
  pinMode(13,OUTPUT);   // Arduino status LED
  pinMode(2,OUTPUT);    // LCD reset pin
  digitalWrite(13,LOW); // LED off
  Serial.begin(115200);
  digitalWrite(2,LOW);  // Reset LCD
  delay(10);
  digitalWrite(2,HIGH);
  delay(2000);          // Allow time for reset to complete
  Serial.write(0x55);   // Issue auto-baud command
  while(Serial.read() != 0x06); // Wait for ACK
}

void loop()
{
  unsigned char cmd[20]; // Serial packet for LCD commands
  int           x,y,n;
  long          a,b,a2,b2,posReal,posImag,startTime,elapsedTime;

  Serial.write(0x45);    // Clear screen
  delay(100);            // Brief pause, else 1st few pixels are lost
  cmd[0] = 0x50;         // 'Pixel' command is issued repeatedly
  digitalWrite(13,HIGH); // LED on while rendering
  startTime = millis();

  posImag = startImag;
  for(y = 0; y < pixelHeight; y++) {
    cmd[2] = y;          // Y coordinate of pixel
    posReal = startReal;
    for(x = 0; x < pixelWidth; x++) {
      a = posReal;
      b = posImag;
      for(n = iterations; n > 0 ; n--) {
        a2 = (a * a) >> bits;
        b2 = (b * b) >> bits;
        if((a2 + b2) >= (4 << bits)) break;
        b = posImag + ((a * b) >> (bits - 1));
        a = posReal + a2 - b2;
      }
      cmd[1] = x;          // X coordinate of pixel
      cmd[3] = n * 29;     // Pixel color MSB
      cmd[4] = n * 67;     // Pixel color LSB
      Serial.write(cmd,5); // Issue LCD command
      posReal += incReal;
    }
    posImag -= incImag;
  }

  elapsedTime = millis() - startTime;
  digitalWrite(13,LOW);  // LED off when done

  // Set text to opaque mode
  cmd[0] = 0x4f;
  cmd[1] = 0x01;
  Serial.write(cmd,2);

  // Seems the chipKIT libs don't yet handle the String(long)
  // constructor, hence this kludge. Working backward, convert
  // each digit of elapsed time to a char, with " ms" at end
  // and text command at head. Length is variable, so issue
  // command from final determined head position.
  cmd[19] = 0;
  cmd[18] = 's';
  cmd[17] = 'm';
  cmd[16] = ' ';
  n = 15;
  do {
    cmd[n--] = '0' + elapsedTime % 10;
    elapsedTime /= 10;
  } while(elapsedTime);
  cmd[n--] = 0xff; // Color LSB
  cmd[n--] = 0xff; // Color MSB
  cmd[n--] = 0;    // Use 5x7 font
  cmd[n--] = 0;    // Row
  cmd[n--] = 0;    // Column
  cmd[n]   = 0x73; // ASCII text command
  Serial.write(&cmd[n],20-n);

  delay(5000); // Stall a few seconds, then repeat
}
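For anyone who wants to poke at the fixed-point arithmetic without a board attached, the inner loop can be pulled out into a small desktop C++ sketch. The names toFixed, fixedMul and dwell are ours, invented for this example; the math mirrors the sketch above, with 12 fractional bits. It assumes coordinates stay within the image range used here (so products fit in 32 bits) and that right-shifting a negative value is an arithmetic shift, which holds on the usual compilers but was implementation-defined before C++20.

```cpp
#include <cassert>
#include <cstdint>

const int kBits = 12;  // fractional bits, as in the sketch

// Convert a real number to fixed point (truncates toward zero).
int32_t toFixed(double v) { return (int32_t)(v * (double)(1 << kBits)); }

// The raw product of two fixed-point values carries 2*kBits fractional
// bits, so shift right by kBits to get back to the working scale.
int32_t fixedMul(int32_t a, int32_t b) { return (a * b) >> kBits; }

// Iterate one point; returns the remaining count n, where 0 means the
// point never escaped within 'iterations' steps.
int dwell(int32_t posReal, int32_t posImag, int iterations)
{
    int32_t a = posReal, b = posImag;
    int n;
    for (n = iterations; n > 0; n--) {
        int32_t a2 = fixedMul(a, a);
        int32_t b2 = fixedMul(b, b);
        if (a2 + b2 >= (4 << kBits)) break;      // |z|^2 >= 4: escaped
        b = posImag + ((a * b) >> (kBits - 1));  // 2ab, rescaled
        a = posReal + a2 - b2;
    }
    return n;
}
```

The shift by (kBits - 1) folds the multiply-by-two into the rescale, saving one operation per iteration.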
And the numbers:

Arduino: 27,734 ms.
chipKIT: 7,209 ms.
Now only a 3.8x difference, despite the PIC32 speaking its native tongue. What gives?
Even at 115,200 bits/sec, the serial LCD is seriously holding us back, as the code is going to “block” as each character is output. Some back-of-envelope calculations suggest how much time is being lost there:
128 x 128 pixels, 5-byte command per pixel = 81,920 bytes.
Including start and stop bits for each byte = 819,200 bits total
819,200 bits / 115,200 bps = ~7.1 seconds.
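The same estimate as a few lines of desktop C++, handy for trying other resolutions or baud rates. frameBits and frameSeconds are names made up for this example, and the figure is a lower bound that assumes the UART is kept continuously busy.

```cpp
#include <cassert>

// Total bits on the wire for one full frame.
// bitsPerByte is 10 for 8N1 framing: 8 data bits plus start and stop.
long frameBits(long width, long height, long bytesPerPixel, long bitsPerByte)
{
    return width * height * bytesPerPixel * bitsPerByte;
}

// Seconds needed to send one full frame at the given baud rate.
double frameSeconds(long width, long height, long bytesPerPixel,
                    long bitsPerByte, long baud)
{
    assert(baud > 0);
    return (double)frameBits(width, height, bytesPerPixel, bitsPerByte)
           / (double)baud;
}
```

Calling frameSeconds(128, 128, 5, 10, 115200) reproduces the roughly 7.1 second figure.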
So our MCU is sitting there for seven seconds with its thumb up its ASCII in order to update the display. Sure enough, if we comment out the Serial.write() command but leave all the calculations in place, the results are significantly more dramatic:
Floating-point:
Arduino: 49,685 ms.
chipKIT: 5,822 ms.
8.5x improvement.
Fixed-point:
Arduino: 22,326 ms.
chipKIT: 168 ms
133x improvement. Hot damn. Now we’re talking!
So we could actually render this at interactive frame rates, for the want of a sufficiently fast interface to the LCD. This sort of limitation is going to crop up every time we connect to a real-world device. Not everything is 100% internal code and math…there are finite limits to I/O throughput, and that more than anything can cap the speed of the total application. So we really can’t give a consistent “Everything will be X percent faster” estimate for this board.
The performance looks good for math, especially if an algorithm can work in integer or fixed-point formats. Another thought we had was analog-to-digital sampling, which has applications in robotics…say for a line-follower or balancing robot. More frequent samples should yield smoother operation, or multiple samples can be averaged to yield higher-precision results. The PIC32 should scream in that regard. And yet…
void setup()
{
  const int samples = 10000;
  int       i,n;
  long      startTime,elapsedTime;

  Serial.begin(115200);
  startTime = millis();
  for(i = 0; i < samples; i++) {
    n = analogRead(0);
  }
  elapsedTime = millis() - startTime;

  Serial.print(samples);
  Serial.print(" samples in ");
  Serial.print(elapsedTime);
  Serial.print(" ms = ");
  Serial.print(((float)samples * 1000.0) / (float)elapsedTime);
  Serial.println(" samples/sec");
}

void loop()
{
}
Arduino: 10000 samples in 1119 ms = 8936.55 samples/sec
chipKIT: 10000 samples in 1008 ms = 9920.63 samples/sec
Running full-tilt, the PIC32 is capable of up to 1 million ADC samples per second, compared to 125,000 on the Atmel chip. Certainly the library implementation is going to introduce some overhead, but what gives? Rooting through the library source code turns up this gem in wiring_analog.c:
//* A delay is needed for the the ADC start up time
//* this value started out at 1 millisecond, I dont know how long it needs to be
//* 99 uSecs will give us the same approximate sampling rate as the AVR chip
// delay(1);
delayMicroseconds(99);
This raises a couple of red flags. First, why should the sampling rate aim to match the AVR? For time-related functions like delay() and for Serial.begin() bitrates, of course we’d want similar numbers, since those relate to temporal increments. But we don’t, or at least shouldn’t, measure time with ADC readings. And secondly, well, why not find out how long the ADC startup time really needs to be? A few minutes’ sifting through Microchip datasheets eventually turned up the correct answer: two microseconds. So, changing the line in wiring_analog.c to:
delayMicroseconds(2);
Yields dramatically different results:
chipKIT: 10000 samples in 101 ms = 99009.90 samples/sec
About a tenfold improvement, and the readings still look valid. This does break like-timing compatibility with the AVR-based Arduinos, but as we said, why? It’s understandable that some decisions may have been made in haste…it’s a monumental project, getting all this code ported to an entirely different chip, and the IDE is still fresh from the oven…but some of these little broken details do have us concerned about what other surprises may still lurk beneath.
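With samples this cheap, the averaging idea mentioned earlier becomes practical. A minimal sketch in desktop C++ follows, where averageSamples is a hypothetical helper and readSample stands in for a call such as analogRead(0); how much extra precision averaging actually buys depends on the noise in the signal, which we have not measured.

```cpp
#include <cassert>
#include <cstdint>

// Average 'count' readings from a caller-supplied sampling function.
// On the board, readSample would wrap analogRead() for the chosen pin.
template <typename ReadFn>
uint16_t averageSamples(ReadFn readSample, uint16_t count)
{
    assert(count > 0);
    uint32_t sum = 0;  // 65535 readings of 16 bits each still fit
    for (uint16_t i = 0; i < count; i++) {
        sum += readSample();
    }
    return (uint16_t)((sum + count / 2) / count);  // round to nearest
}
```

At roughly 99,000 samples per second, averaging 16 readings would still leave over 6,000 averaged results per second.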
Don’t get us wrong…we’re enthusiastic about the chipKIT boards. The technical challenge is met, and just needs some cleaning up. What remains for Digilent now is a marketing challenge: who is this really for? When we talk about things like megasamples and fixed-point algorithms, these aren’t exactly day-one topics familiar to the Arduino’s target audience of first-time programmers. And the more advanced user may have moved on already, leaving Arduino behind. So why keep this form factor? Why keep this IDE?
Obviously, part of the allure is the existing ecosystem of Arduino shields. There’s some pretty nifty stuff out there, networking and touch screens and stepper motor drivers, most of which will physically plug right in. Having an existing solution saves development time. Then there’s the ease and familiarity of the Arduino libraries. Even though they’re slow and clunky in places, it can be really handy sometimes just to squirt out some status information to a serial port without having to do all the UART setup manually.
The chipKIT boards are cleverly priced to approximate Arduino on a cost basis (even undercutting a bit). That’s a great start, with code and price parity, but where’s the extra value? What the Uno32 and Max32 may need are some killer apps. Ideas that the novice can implement, but that really take advantage of the PIC32 chip’s added performance and capabilities. Speed may be just one part of that. What can we do with the extra RAM and flash space that a normal Arduino just can’t handle, even with the fanciest of shields? Folks have done some mind-blowing stuff with the little 8-bit AVR. We’re looking forward to seeing if this is the tool that takes these hacks to the next level.





