Hardware Acceleration In The Cloud

May 5, 2018

Computers are great at a lot of things. However, general-purpose computers can benefit from help on certain tasks, which is why your video card and sound card both have their own specialized hardware to offload the CPU. If Accelize has its way, some of your hardware acceleration will be done in the cloud. Yes, we know. The cloud is the buzzword of the week and we are tired of hearing about it, too. However, this service is a particularly interesting way to add FPGA power to just about any network-connected CPU.

Currently, there are only four accelerators available, including a hardware-assisted random number generator, a GZIP accelerator, an engine for rapidly searching text, and a BMP to JPEG converter. The company claims, for example, that the search engine can find 2500 entries in the 60 GB Wikipedia archive in 6 minutes. They claim a traditional CPU would take over 16 days to do the same task. The BMP to JPEG converter can process faster than required to feed real-time HD video.

The cloud, in this case, is FPGA resources hosted in the Amazon cloud or in the OVH public cloud. They’ll clearly charge for the service at some point using a “coin” system. However, right now they are letting you sign up with nothing more than an e-mail address and crediting your account with 50,000 coins. Apparently, coins are 1,000 for one dollar.

Being hardware, there are certain limitations. For example, the search engine can’t handle more than 2,500 search terms and each word can’t be wider than 36 characters. That’s pretty generous, though. On the Amazon cloud, the search engine processes 145 MB/second and every 128 MB costs one coin. So for a dollar, you could process about 128 GB of data.
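To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python using only the figures quoted above. The function name and the assumption that usage is billed in whole coins, rounded up, are ours and are not taken from Accelize's documentation.

```python
import math

# Figures quoted in the article (decimal units: 1 GB = 1000 MB).
COINS_PER_DOLLAR = 1000   # "coins are 1,000 for one dollar"
MB_PER_COIN = 128         # "every 128 MB costs one coin"
THROUGHPUT_MB_S = 145     # search engine rate on the Amazon cloud


def search_cost(data_mb):
    """Return (coins, dollars, seconds) to search data_mb megabytes."""
    # Assumption (ours): billing is in whole coins, rounded up.
    coins = math.ceil(data_mb / MB_PER_COIN)
    dollars = coins / COINS_PER_DOLLAR
    seconds = data_mb / THROUGHPUT_MB_S
    return coins, dollars, seconds


if __name__ == "__main__":
    # The 60 GB Wikipedia archive mentioned earlier.
    coins, dollars, seconds = search_cost(60_000)
    print(f"{coins} coins, ${dollars:.2f}, {seconds / 60:.1f} minutes")
```

Under those assumptions, the 60 GB Wikipedia search works out to 469 coins (about 47 cents) and roughly 6.9 minutes of processing, which is in the same ballpark as the company's six-minute claim.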

You need an API key to use the service. Presumably, that’s how they know where to deduct the coins. You can find Python examples of using each service on GitHub. There’s nothing magic about Python, though, if you don’t mind getting your hands dirty. The Python API offers simple calls to start the service and transfer files, but anything that can handle a REST API could use the service.

There are two interesting things about the Accelize offering. First, small computers like a Raspberry Pi stand to gain the most from acceleration like this. Whether it is worth paying for is impossible to say without understanding the actual costs, but it is still interesting and could open up new applications.

The other interesting thing is that Accelize clearly means to create an “app store-like” environment. They are soliciting FPGA developers to create accelerators, make them available, and monetize them. In addition, they are building another store to provide IP cores that developers can use to build accelerators. For example, suppose you placed an FPGA “core” into the developer’s store (they call it QuickStore) to scale video. Someone developing an accelerator to do video can use this core. Users will pay a certain number of coins to use the accelerator. Some revenue will go to Accelize, some to the accelerator developer, and some to the video scaler developer.

If this were to take off, that could be a great way to monetize your FPGA skills. The only problem we see is the applicability of these FPGA accelerators. For example, one reason you might use an FPGA is to handle real-time processing. However, having the FPGA in the cloud necessitates a certain amount of overhead and uncertainty of timing and availability. If the overhead is small compared to the processing time, that’s a win. But clearly, there are some FPGA uses that aren’t going to be amenable to the cloud.
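The overhead trade-off can be sketched as a simple timing model. This is only an illustration: the function names are ours, and every number other than the 145 MB/second accelerator rate quoted earlier is a made-up placeholder. It also assumes that upload and processing happen one after the other and that the result coming back is small enough to ignore.

```python
def offload_time(data_mb, uplink_mb_s, accel_mb_s, round_trip_s=0.0):
    """Seconds to ship data to a cloud accelerator and process it there.

    Simplifying assumptions: upload and processing are sequential, and
    the returned result is negligible in size.
    """
    return round_trip_s + data_mb / uplink_mb_s + data_mb / accel_mb_s


def local_time(data_mb, local_mb_s):
    """Seconds to process the same data on the local CPU."""
    return data_mb / local_mb_s


def offload_wins(data_mb, uplink_mb_s, accel_mb_s, local_mb_s, round_trip_s=0.0):
    """True when the cloud path finishes sooner than the local CPU."""
    remote = offload_time(data_mb, uplink_mb_s, accel_mb_s, round_trip_s)
    return remote < local_time(data_mb, local_mb_s)


if __name__ == "__main__":
    # Hypothetical: 1 GB job, 10 MB/s uplink, slow 5 MB/s local CPU.
    print(offload_wins(1000, 10, 145, 5))
    # Hypothetical: a 1 MB real-time chunk with a 100 ms round trip.
    print(offload_wins(1, 10, 145, 20, round_trip_s=0.1))
```

With these placeholder numbers, the bulk job favors the cloud while the small real-time chunk does not, because the network round trip alone already costs more than processing it locally.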

If you want to learn more about FPGAs as a prelude to getting rich by providing accelerator functions, you can start with our tutorial. By the way, we usually think of configuring FPGAs with Verilog, VHDL, or something similar. You can do that with the FPGAs in the accelerator, but you can also mix in C code, which isn’t unheard of.

30 thoughts on “Hardware Acceleration In The Cloud”

Ostracus says:

May 5, 2018 at 7:30 am

Be nice if Project Everest was available this way, since it’s already aiming for the cloud.

https://www.anandtech.com/show/12509/xilinx-announces-project-everest-fpga-soc-hybrid

1. Stephane (Accelize) says:
  
  May 7, 2018 at 1:03 am
  
  When this future device makes its way into public and private clouds, it is clear that some of the AccelStore accelerators will be ported to this new platform.
  Our goal, at Accelize, is to enable the ecosystem of developers to make their accelerators available on any FPGA on any cloud.
  
Olsen says:

May 5, 2018 at 9:10 am

Yay! Download more RAM!

rasz_pl says:

May 5, 2018 at 9:23 am

on one hand its a lot of bullshit
my GPU can already accelerate jpg compression. 8 year old GPU does fullHD JPG encoding at 120 frames per second https://github.com/hoopoe/gpujpeg

>2500 entries in the 60 GB Wikipedia archive in 6 minutes. They claim a traditional CPU would take over 16 days
except it would take ~20 hours on my desktop (~2GB/second reading from SSD, grep is obviously faster than disk)

on the other future looks like this:

https://www.youtube.com/watch?v=O9qqSZAny3I

1. Stephane (Accelize) says:
  
  May 7, 2018 at 1:22 am
  
  Not all Accelerators are created equal, and some are initial demonstrators while others bring clear value.
  The JPEG encoder in AccelStore is just one of these simple demonstrators.
  As for GPUs vs. FPGAs, it is important to recognize that one architecture isn’t superior to the other and that they both hold their values for certain types of workloads.
  FPGAs will start drawing an advantage when the processing is not purely data parallel. Compression like GZIP is one of many examples, but any time a processing chain is made of several different functions, the FPGA will, in many cases, be able to do them in parallel while the GPU will have to do them sequentially (a video transcoding pipeline with decoding, processing and encoding is one example where the FPGA does a pretty darn good job).
  At the end of the day, for most Cloud Application Developers it will come down to 1) Availability of a ready-to-use solution and 2) cost.
  
  1. rasz_pl says:
    
    May 7, 2018 at 8:42 am
    
    >while the GPU will have to do them sequentially (a video transcoding pipeline with decoding, processing and encoding is one example where the FPGA does a pretty darn good job).
    
    no, GPU doesn’t have to do it sequentially, nor CPU for that matter. It’s all covered in YT clip above.
    “pretty darn good job” doesn’t mean anything when it’s slower than 10 year old GPU right from the start.
    
Drone says:

May 5, 2018 at 12:09 pm

Hmmm… Drop a disposable RTL-SDR dongle and ESPXXXX WiFi module connected to an open public hotspot (or a hijacked one) in an area of interest and walk away. It will vacuum-up all the raw I/Q data while sweeping the RF bands of interest and send it to Accelize in real time where it will sniff out the tasty stuff using accelerated DSP in FPGA, then send the results back to you. The problem is maintaining a chain of anonymity (if not security too) throughout the whole signal chain. Much depends on what anonymous options exist for signing-up and paying for the Accelize service (e.g., crypto-currency?) Of-course the need for anonymity and security in the signal chain is really not necessary if you are a responsible “White-Hat” doing the likes of a legitimate security audit.

Another application might be accelerating the post-production chain of raw hi-res video/image data from the field. Like a field survey team using drones to capture data. Rather than have the field survey team locally post-process the hi-res raw data (yeah, imagine sitting in a hotel room waiting hour after hour for that to finish), send the raw data back THROUGH Accelize which intelligently post-processes it. Now nobody in the survey chain needs to have high-power computing hardware/software, and the field survey team can just dump data and move on quickly to the next location. The bottleneck in this approach is that you need bandwidth to send the raw data back to Accelize from the field, something that’s probably not possible in places where there is poor infrastructure. But for applications where the backhaul bandwidth from the field exists, this sounds like an interesting application. Especially when you figure in the economics of a pay-as-you-go service, and zero hardware ownership/upgrade/maintenance costs.

In the end, success of a service like this will depend on 1. proper documentation, 2. longevity, and 3. a Community which forms a self-support network. The likes of AWS is so-so good at this. Then there’s Google, who comes up with ideas like this, doesn’t document or support it, then cancels the service out of the blue like a four-year-old child that’s bored with a new toy.

1. TGT says:
  
  May 5, 2018 at 4:33 pm
  
  I think a huge factor for success is going to be how much latency it has on real world networks and situations. It’s going to be a bit of a strange niche to need this enough to not just wait on unaccelerated hardware, yet not need it enough to justify using the acceleration hardware locally and skipping the networking bottleneck.
  
  I frankly have a hard time believing it’ll be worth it compared to just hooking up an fpga. But I’m sure there’s factors and variables I don’t know about/don’t understand adequately. I love to be wrong about these things.
  
paul says:

May 5, 2018 at 12:46 pm

Is this thing useful for mining bitcoins?

1. TGT says:
  
  May 5, 2018 at 4:35 pm
  
  I can’t wait until Bitcoin mining is finally completely untenable so people stop wasting perfectly good tech on it.
  
  1. dudeguy says:
    
    May 5, 2018 at 10:42 pm
    
    More like, so they can stop asking if unrelated tech can mine better than the existing tech for mining. The answer is: probably not any better than existing technology that has been specifically designed for the process.
    
    Also, what you want hopefully won’t be happening in our lifetimes. More likely is that the incentive for mining is a lot less lucrative and that should reduce urgency to find alternative methods.
    
    1. TGT says:
      
      May 5, 2018 at 11:27 pm
      
      I’ve always wondered about that. Who is making lucrative money? To make any headway against the cost of electricity and recoup the initial investment of the equipment you gotta be incredibly efficient. How many people are actually making serious profit from it?
      
      I didn’t think anyone except major players had been mining lucrative amounts since the olden days when it didn’t take much power. Unless you have a huge amount of cheap hardware available to you and your own solar infrastructure that’s already paid off. Maybe I’m wrong. But I was under the impression the payout had been less than the electrical bill for a while now. I mean maybe they’re planning on mining and then sitting on it while it goes up in value, but if it costs the same why not save some time and effort and buy it directly?
      
      Regardless, this tech isn’t going to be able to mine anything. You definitely wouldn’t be making more money mining than the service fees. Otherwise why wouldn’t the company with the hardware be mining for themselves instead of renting it to you? Doesn’t add up.
      
      1. Ren says:
        
        May 7, 2018 at 10:05 am
        
        Not only the electricity (including air conditioning) that is spent by one miner mining for a coin, but all the other miners who will be too late to mine the same coin and be redirected to the next.
        AIUI, everybody out there is mining the “next” coin, so when one “reaches” it, all the other mining being done by others is basically wasted as they have to mine the next “next” coin. If all that electricity being consumed by all the other miners was included in each coin, I think mining would grind to a halt pretty quick.
        
  2. tomás zerolo says:
    
    May 5, 2018 at 11:45 pm
    
    Look at the financial markets to get an impression on how much bullshit we humans are ready to put up with, just for the promise of “GETTING RICH”.
    
kryptylomese says:

May 5, 2018 at 2:32 pm

Do we need supercomputers anymore? The Top 500 supercomputer vs Amazon Lambda (based on cost and performance) – does it come anywhere close?

1. Ostracus says:
  
  May 5, 2018 at 3:20 pm
  
  Kind of like asking do we need CPUs anymore since we have GPUs.
  
2. TGT says:
  
  May 5, 2018 at 11:31 pm
  
  I mean the cloud is still running on something. What’s a bank of servers networked together if not a supercomputer? I’m sure there’s nuance but when it comes down to it it’s a massive node-based networked system of some form.
  
  The cloud is just someone else’s computer. Gotta keep that in mind.
  
  1. Robert Mateja says:
    
    May 6, 2018 at 1:37 am
    
    Latency. It’s all about how fast you can shovel bits from one corner to another.
    
Adobe/Flash hater says:

May 5, 2018 at 4:57 pm

Uhmm, about these “free”, online (cloud) services.
While handy for web site usage,
Let’s not forget what happened with Photobucket.
https://medium.com/@AxelApp/how-photobucket-broke-the-internet-and-why-you-should-care-4a244bda6b7e
Online services are fine, But…
Keep your (offline) backups of your data & images, folks!
large capacity thumb drives are fairly low cost now
and don’t change “terms of service” Or go insolvent and disappear with your work.

1. Denis says:
  
  May 6, 2018 at 12:34 am
  
  my big fear is this will happen to YouTube and is my biggest gripe with people hosting technical build info in vlogs rather than good ol’ web pages. YouTube has become a wealth of knowledge and some corporate greed is all it needs to disappear.
  
  1. Ostracus says:
    
    May 6, 2018 at 6:24 am
    
    Kind of funny one says that, when a lot of the videos on YouTube are from corporations in one form or another. YouTube is a rather cheap way of storing promotional material.
    
dudeguy says:

May 5, 2018 at 10:37 pm

We should resist these services, to keep PCs personal. However, I do see corporate sustaining the service.

robert says:

May 6, 2018 at 3:23 am

Hooray, another opaque & proprietary single-vendor blob to make your systems depend on.

I thought we’d learned these lessons.

eriklscott says:

May 6, 2018 at 10:49 am

4 years ago I had students grepping wikipedia in about 4 to 5 minutes. Hadoop. :-) To compare Apples to Apples, since this “hardware” solution is an in-memory problem, let’s try it with Apache Spark. :-)

1. Stephane (Accelize) says:
  
  May 7, 2018 at 1:38 am
  
  You are absolutely right.
  The 60GB Find & Replace example is not here to be listed in the Guinness book, but to give perspective into how fast a single FPGA instance can run compared to a single CPU instance.
  One can surely use Apache Spark and easily beat that 6-minute figure, but with how many CPU instances? And at what cost ($ and power)?
  Honestly, there aren’t that many problems that cannot be addressed by throwing more compute instances at them.
  The question is: Can it be done cheaper and consume a lot less power with the right FPGA accelerator?
  For many workloads out there, the answer is Yes!
  
  1. CampGareth says:
    
    May 7, 2018 at 6:23 am
    
    Power doesn’t matter to anyone using cloud FPGAs unless they’re already using some local resource for the job. I’m at a company working on golang to verilog to cloud FPGAs and what we’ve found is that the factors that matter are speed, cost and ease of use. It seems like you’ve nailed those from an end user’s perspective but a contributor might have a hard time.
    
  2. rasz_pl says:
    
    May 7, 2018 at 8:49 am
    
    >The 60GB Find & Replace example is not here to be listed in the Guinness book, but to give perspective into how fast a single FPGA instance can run compared to a single CPU instance
    
    No, it’s listed to mislead potential clients. After all it’s not “compared to a single CPU instance” but compared to single threaded SED running naive read everything over and over from ancient HDD loop, is it not?
    It’s only technically not a lie.
    
Stephane (Accelize) says:

May 7, 2018 at 3:46 am

It is important to understand that the target users for such Accelerators are primarily application developers who already have their application running at one of these Cloud Services Providers (AWS and OVH in this current case). For them getting data to and from the FPGA instances is much faster than for somebody who would run his/her application locally.

So the question is not: Should I move some of my local workload processing to FPGAs in the cloud? but rather: Should I move some of my cloud workload processing to an FPGA instance?

And for those who operate workload processing locally, the AccelStore accelerators also run on FPGA boards that can be deployed on-premise …

luke says:

May 7, 2018 at 7:00 am

is it just me, or are cloud-based random number generators a bloody stupid idea?

1. SBA says:
  
  May 8, 2018 at 12:05 am
  
  A true random number generator is complex to handle on ANY electronic device (CPU/GPU/FPGA). From a software point of view, the rand() function seems like magic, but it’s tricky to handle. Some research groups have shown that it’s possible to hack it and predict the next random value.
  The random number generator is useless in this example design; it’s just a demonstrator that generates random data at high speed that is “nist” compliant (https://csrc.nist.gov/projects/random-bit-generation/documentation-and-software). This basic hardware function is mandatory in any high-end accelerator you want to design in an FPGA (e.g., cryptography: AES, SSL, TLS, etc.)
  
