The usefulness of Raspberry Pis seems almost limitless, with new applications being introduced daily and with no end in sight. But, as versatile as they are, it’s no secret that Raspberry Pis are still lacking in pure processing power. So, some serious optimization is needed to squeeze as much power out of the Raspberry Pi as possible when you’re working on processor-intensive projects.
The simplest way to accomplish this optimization, of course, is to reduce what’s running down to the essentials. For example, there’s no sense in running a GUI if your project doesn’t even use a display. Another strategy, however, is to make sure you’re actually using all of the processing power that the Raspberry Pi offers. In [sagiz’s] case, that meant using Intel’s open source Threading Building Blocks (TBB) to achieve better parallelism in his OpenCV project.
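As a concrete example of trimming down to the essentials: on a systemd-based Raspbian install, booting to a text console instead of the desktop is a one-line change. This is a standard systemd mechanism, not anything from [sagiz’s] project:

```shell
# Boot to a text console instead of the desktop (systemd-based Raspbian):
sudo systemctl set-default multi-user.target

# To bring the desktop back later:
# sudo systemctl set-default graphical.target
```

The change takes effect on the next reboot, freeing the RAM and CPU cycles the desktop environment would otherwise consume.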
As you’ve probably guessed, this wasn’t as easy as just typing “apt-get install tbb” into the terminal. That’s because Intel TBB wasn’t available in Raspbian, due to the difficulty of creating a build that runs on ARM. But, [sagiz] was able to create a working build, and has made it available on his project page. Using his new build, he was able to increase OpenCV speed by 30%, which is definitely a non-trivial amount!
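The speedup comes from TBB-style data parallelism: splitting a loop’s range across all four of the Pi’s cores instead of running it on one. The sketch below illustrates that idea using only the standard library (not TBB itself, which would need the library linked in, and not [sagiz’s] actual code): it thresholds a fake grayscale “image” by dividing its rows among hardware threads, roughly the row-chunking that TBB’s `parallel_for` automates inside a TBB-enabled OpenCV build.

```cpp
#include <algorithm>
#include <cstdint>
#include <thread>
#include <vector>

// Threshold a grayscale image in parallel by splitting its rows across
// hardware threads -- the same chunking pattern that TBB's parallel_for
// applies automatically when OpenCV is built with TBB support.
void threshold_parallel(std::vector<uint8_t>& img, int rows, int cols,
                        uint8_t thresh) {
    unsigned n = std::max(1u, std::thread::hardware_concurrency());
    int chunk = (rows + static_cast<int>(n) - 1) / static_cast<int>(n);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n; ++t) {
        int r0 = static_cast<int>(t) * chunk;
        int r1 = std::min(rows, r0 + chunk);
        if (r0 >= r1) break;  // more threads than row chunks
        workers.emplace_back([&img, cols, thresh, r0, r1] {
            for (int r = r0; r < r1; ++r)
                for (int c = 0; c < cols; ++c)
                    img[r * cols + c] = img[r * cols + c] > thresh ? 255 : 0;
        });
    }
    for (auto& w : workers) w.join();
}
```

TBB’s advantage over hand-rolled threads like these is that its scheduler reuses a worker pool and load-balances the chunks, which is why simply rebuilding OpenCV against it pays off without touching application code.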
If you’re looking to get started with OpenCV on the Raspberry Pi, be sure to check out this guide, which will get you off to a grand start.
“…it’s no secret that Raspberry Pis are still lacking in pure processing power…”
“…So, some serious optimization is needed to squeeze as much power out of the Raspberry Pi as possible when you’re working on processor-intensive projects…”
Raspberry Pi is NOT lacking in pure processor power. All one must do is program the processor in assembly language. And I’m NOT talking about something that the RPi Organization calls assembly language but which requires the Raspbian Operating System in order to work.
uhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh?
What?
@jawnhenry, OK, provide us with links to HOW to program in assembly and REALLY take advantage of these ARM SoCs’ speed at the pin level. All I/O is typically crippled speed-wise by the on-die proprietary buses (e.g., AMBA, from a historical perspective). Accessing the Real World fast is impossible because all the architecture and code is behind proprietary non-disclosures and/or binary blobs. There was/is something about programming close to the “Metal” for RPi, but it was really nonsense in terms of actually accessing fast GPIO. I’ve seen some folks prying into the PRUs on ARM SoCs (on the Beagle thing, e.g.), but nothing is supported by the manufacturer – it’s all Secret – and you’re still hobbled by the on-die buses! There are some hacks that try to hijack DMA for faster GPIO, but you’re still constrained by the on-die bus I/O in the end. I think ARM and their IP licensees PREVENT users from accessing fast I/O so they can charge more for locked-down IP “Features” like display interfaces. There is great possibility in these cheap mass-produced SoC parts, if only they allowed fast GPIO. No, it’s not the same as comparatively expensive programmable logic (e.g., FPGA/CPLD) with an embedded “core” microcontroller, even if the embedded core has a scheduler. That route is too expensive.
Dear lord, his website is unreadable. Grey text on a light grey background, with the occasional dark grey background for code blocks, and even lighter grey text for quotes? Someone needs to talk to him about contrast.
Just click ‘select all’ and read away!
I can read it perfectly fine. Plus the gray on gray looks nice.
As a person with a visual disability (albeit minor), I can tell you that the graphic and publishing standards that I learned about in the ’70s are virtually nonexistent on the web. Sad but true.
I could bore you to tears with the descriptions of arguments I’ve had with developers at an Incredibly Big Multinational Corporation or a west coast software company over design standards for an obsolete OS they were making together. Those with visual disabilities are persona non grata in the nerd community.
This grey on grey or grey on white shit started when Windows Vista was released. Vista’s default (and unchangeable) highlight color is an extremely light and transparent blue. In Vista’s Windows Explorer, when you’ve a folder open in the left pane and a file or folder selected in the right pane, the difference between the highlight on both is next thing to non-existent.
Which one is the active selection? Hit delete and find out! Live dangerously, hit Shift+Delete! Will you kill just one file or an entire folder tree?
In Windows 7, Microsoft made the highlights *very slightly* darker – but still refused to allow their colors to be changed without using a hack to enable 3rd party themes.
Seems as though Microsoft either made sure to allow only people with absolutely perfect vision to test the Aero Glass GUI, or were extremely arrogant in ignoring umpteen thousand complaints from people who had any problem at all dealing with their utterly stupid failure to use a proper highlight schema, or to add any ability for users to alter it to something usable.
It all started to go bad when Apple shredded, burned and peed on the ashes of their human interface guide when they released the horrid QuickTime 4.
Thank you for sharing this. It started me thinking about using TBB executing in a heterogeneous multicore architecture.
I found this (PDF warning):
https://www.codeplay.com/public/uploaded/public/hppc2010_tbb.pdf
And wondered if anyone has tried using this to access the GPUs as well as the shared memory cores.
I think it is time to really look at these heterogeneous chips with a little more respect.
Or you could just code like a Unix person should and have all those pipes and sockets feeding through separate threads like people have been doing for decades. Even a well written bash script will do that.
And please lay off the hyperbole about the RPi, because it does have serious limitations that even the cheapest desktop PC motherboard does not.
Don’t mean to hype the Pi. I know it can get a little irritating at times. But even on a desktop running UNIX/Linux the socket buffer size is very small, so there isn’t much room for parallelism. And we are talking about different use cases for each (SoC vs. desktop). If you add the kernel into the equation it makes it hard to optimize. If you need to synchronize the threads…
The approach in this article seems to be worth a try.
(Another) ISEE-3 Reboot Project like! take my money again!
Ahhh, wrong post, sorry
OK, I’ll be that person.
./configure && make && make install now qualifies as a hack? I guess I’m spoiled running a distro without package management so I actually learned more than apt-get.
Yeah, I’m afraid I missed the difficulty in getting TBB to build for the RPi as well. The only thing I could see was the setting of two macros, TBB_USE_GCC_BUILTINS=1 and __TBB_64BIT_ATOMICS=0. Everything else looked like standard operating procedure.
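For reference, the ARM-specific part of the build described above amounts to passing those two macros as preprocessor definitions when compiling TBB from source. The invocation below is a hypothetical sketch based on the macros the comment names; the repository URL, flag placement, and make targets may differ between TBB releases, so check the project page for the exact steps:

```shell
# Hypothetical out-of-tree TBB build for ARM/Raspbian.
# The two -D flags are the macros mentioned above; everything else
# is a stock "make" build of the TBB source tree.
git clone https://github.com/01org/tbb.git
cd tbb
make CXXFLAGS="-DTBB_USE_GCC_BUILTINS=1 -D__TBB_64BIT_ATOMICS=0"
```

`TBB_USE_GCC_BUILTINS=1` tells TBB to use GCC’s atomic builtins instead of x86 assembly, and `__TBB_64BIT_ATOMICS=0` disables 64-bit atomics that the 32-bit ARM target can’t provide.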
What we need is OpenCL support for the Raspberry Pi GPU; OpenCV 3 already has a transparent API which will use faster hardware to run code whenever it’s available.
1. The Pi is not suitable for anything requiring grunt; it has the raw processing power of a _10 year old laptop_ you get for $10 at Goodwill (between a Pentium M and a first-gen Core 2). The best optimization is replacing it with a 5 year old laptop for an easy 2-4x speedup ;-)
2. OpenCV is a prototyping library. You don’t use it in production, just like you wouldn’t use GNU Radio for anything other than education, mockups, and experimenting. OpenCV is quite famous for copying buffers every step of the way, and a lot of code floating around is full of naive non-vectorized implementations (like the author’s background subtractor).
3. Where did the 30% come from in the writeup? :o