Hardware Acceleration in the Cloud

Computers are great at a lot of things. However, general-purpose computers can benefit from help on certain tasks, which is why your video card and sound card both have their own specialized hardware to offload the CPU. If Accelize has its way, some of your hardware acceleration will be done in the cloud. Yes, we know. The cloud is the buzzword of the week and we are tired of hearing about it, too. However, this service is a particularly interesting way to add FPGA power to just about any network-connected CPU.

Currently, there are only four accelerators available: a hardware-assisted random number generator, a GZIP accelerator, an engine for rapidly searching text, and a BMP to JPEG converter. The company claims, for example, that the search engine can find 2500 entries in the 60 GB Wikipedia archive in 6 minutes. They claim a traditional CPU would take over 16 days to do the same task. The BMP to JPEG converter can process images faster than real-time HD video requires.
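To put those two numbers side by side, here is our own back-of-the-envelope arithmetic for the average throughput each one implies. It assumes decimal units (1 GB = 1,000 MB) and treats "over 16 days" as exactly 16 days; it says nothing about how either run was actually set up.

```python
# Average throughput implied by the search-engine claim (decimal units assumed).
ARCHIVE_MB = 60 * 1000          # 60 GB Wikipedia archive
FPGA_SECONDS = 6 * 60           # "6 minutes"
CPU_SECONDS = 16 * 24 * 3600    # "over 16 days", taken as exactly 16 days

def throughput_mb_s(size_mb: float, seconds: float) -> float:
    """Average processing rate in MB/s."""
    return size_mb / seconds

fpga_rate = throughput_mb_s(ARCHIVE_MB, FPGA_SECONDS)
cpu_rate = throughput_mb_s(ARCHIVE_MB, CPU_SECONDS)
speedup = CPU_SECONDS / FPGA_SECONDS

print(f"FPGA: {fpga_rate:.1f} MB/s")           # about 166.7 MB/s
print(f"CPU:  {cpu_rate * 1000:.1f} kB/s")     # about 43.4 kB/s
print(f"claimed speedup: {speedup:.0f}x")      # 3840x
```

That works out to roughly 167 MB/s for the FPGA run against about 43 kB/s for the CPU, a claimed speedup of about 3,840 times.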

The cloud, in this case, is FPGA resources hosted in the Amazon cloud or in the OVH public cloud. They clearly intend to charge for the service at some point, using a “coin” system. For now, though, you can sign up with nothing more than an e-mail address, and they credit your account with 50,000 coins. Apparently, 1,000 coins cost one dollar, so that starting credit is worth about $50.

Because the accelerators are hardware, they come with certain limitations. For example, the search engine can’t handle more than 2,500 search terms, and no term can be longer than 36 characters. That’s pretty generous, though. On the Amazon cloud, the search engine processes 145 MB/second, and every 128 MB costs one coin. So for a dollar (1,000 coins), you could process about 128 GB of data.
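The limits and the pricing are easy to turn into a small pre-flight check. The sketch below uses only the figures quoted above; the function names are our own, and billing a partial 128 MB block as a whole coin is an assumption, since the article doesn't spell out how partial blocks are charged.

```python
import math

# Limits and prices quoted for the search engine on the Amazon cloud.
MAX_TERMS = 2500         # search terms per job
MAX_TERM_LEN = 36        # characters per term
MB_PER_COIN = 128        # each 128 MB processed costs one coin
COINS_PER_DOLLAR = 1000
RATE_MB_S = 145          # processing rate

def check_terms(terms: list[str]) -> None:
    """Raise ValueError if the term list breaks the stated hardware limits."""
    if len(terms) > MAX_TERMS:
        raise ValueError(f"{len(terms)} terms exceeds the limit of {MAX_TERMS}")
    too_long = [t for t in terms if len(t) > MAX_TERM_LEN]
    if too_long:
        raise ValueError(f"terms longer than {MAX_TERM_LEN} characters: {too_long}")

def estimate(size_mb: float) -> tuple[int, float, float]:
    """Return (coins, dollars, seconds) for searching size_mb of data.

    Assumption: a partial 128 MB block is billed as a whole coin.
    """
    coins = math.ceil(size_mb / MB_PER_COIN)
    return coins, coins / COINS_PER_DOLLAR, size_mb / RATE_MB_S

check_terms(["FPGA", "accelerator", "Verilog"])   # within limits, no error
coins, dollars, seconds = estimate(128_000)      # 128 GB
print(f"{coins} coins, ${dollars:.2f}, {seconds / 60:.1f} min")
```

Searching 128 GB this way comes to 1,000 coins ($1) and about 14.7 minutes at 145 MB/s; the 60 GB Wikipedia archive would cost 469 coins.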

You need an API key to use the service. Presumably, that’s how they know where to deduct the coins. You can find Python examples for each service on GitHub. The Python API offers simple calls to start the service and transfer files. There’s nothing magic about Python, though. If you don’t mind getting your hands dirty, anything that can handle a REST API could use the service.
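For a sense of what "anything that can handle a REST API" means in practice, here is a minimal sketch using only Python's standard library. The endpoint URL, the JSON field, and the bearer-token header are all hypothetical placeholders, not Accelize's actual API; the function only builds the request and never sends it.

```python
import json
import urllib.request

# HYPOTHETICAL endpoint, payload field, and auth header, for illustration only.
# They are placeholders, not the real Accelize API.
API_URL = "https://api.example.com/v1/jobs"

def build_start_request(api_key: str, input_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the service to start a job."""
    body = json.dumps({"input": input_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            # The key presumably identifies the account to bill.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_start_request("my-api-key", "https://example.com/data/wiki.xml")
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```

Any language with an HTTP client can do the same thing; the Python library just wraps calls like this.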

There are two interesting things about the Accelize offering. First, small computers like a Raspberry Pi stand the most to gain from acceleration like this. Whether it is worth paying for is impossible to say without knowing the actual costs, but it is still interesting and could open up new applications.

The other interesting thing is that Accelize clearly intends to create an “app store-like” environment. They are soliciting FPGA developers to create accelerators, make them available, and monetize them. They are also building a second store offering IP cores that developers can use to build accelerators. For example, suppose you placed an FPGA core that scales video into the developer’s store (they call it QuickStore). Someone developing a video accelerator could use your core. Users then pay a certain number of coins to run that accelerator, and the revenue is split: some goes to Accelize, some to the accelerator developer, and some to you as the video scaler developer.
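The three-way split is easy to model. The percentages below are purely hypothetical (the article doesn't say how revenue is divided); the point is only that each coin a user pays is divided among Accelize, the accelerator developer, and the author of any core the accelerator uses.

```python
from fractions import Fraction

# HYPOTHETICAL revenue shares for illustration only; the actual
# percentages are not given in the article.
EXAMPLE_SHARES = {
    "Accelize": Fraction(3, 10),
    "accelerator developer": Fraction(5, 10),
    "video scaler core developer": Fraction(2, 10),
}

def split_coins(coins_paid: int, shares: dict[str, Fraction]) -> dict[str, Fraction]:
    """Divide the coins a user pays for one accelerator run among the parties."""
    if sum(shares.values()) != 1:
        raise ValueError("shares must add up to 1")
    return {party: coins_paid * share for party, share in shares.items()}

for party, coins in split_coins(1000, EXAMPLE_SHARES).items():
    print(f"{party}: {coins} coins")
```

Using `Fraction` keeps the arithmetic exact, so the parts always add back up to what the user paid.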

If this takes off, it could be a great way to monetize your FPGA skills. The only problem we see is the applicability of these FPGA accelerators. For example, one reason you might use an FPGA is to handle real-time processing. However, putting the FPGA in the cloud adds network overhead and makes timing and availability less predictable. If the overhead is small compared to the processing time, that’s a win. But clearly, some FPGA uses aren’t going to be amenable to the cloud.
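One way to reason about that overhead is a simple break-even model: the cloud wins when upload time plus FPGA processing time plus download time plus fixed overhead (job start-up, latency) is less than the time to process the data locally. Apart from the 145 MB/s search rate quoted above, every number here is a hypothetical example, and the model assumes the steps run back to back with no overlap.

```python
# Break-even sketch: is offloading to a cloud FPGA faster than processing locally?
# All rates are in MB/s and all times in seconds.

def local_time(data_mb: float, local_rate: float) -> float:
    """Time to process the data on local hardware."""
    return data_mb / local_rate

def cloud_time(data_mb: float, result_mb: float, link_rate: float,
               fpga_rate: float, overhead_s: float) -> float:
    """Upload + FPGA processing + download of results + fixed overhead.

    Assumes the steps happen one after another (no streaming overlap) and
    that the link has the same speed in both directions.
    """
    upload = data_mb / link_rate
    process = data_mb / fpga_rate
    download = result_mb / link_rate
    return upload + process + download + overhead_s

# Hypothetical job: 1,000 MB in, 1 MB of results, 5 s of fixed overhead,
# a local machine managing 20 MB/s, and the quoted 145 MB/s FPGA rate.
local = local_time(1000, 20)
slow_link = cloud_time(1000, 1, link_rate=10, fpga_rate=145, overhead_s=5)
fast_link = cloud_time(1000, 1, link_rate=1000, fpga_rate=145, overhead_s=5)

print(f"local: {local:.1f} s")
print(f"cloud over a 10 MB/s link: {slow_link:.1f} s")
print(f"cloud over a 1,000 MB/s link: {fast_link:.1f} s")
```

With a 10 MB/s link, the upload alone swamps the FPGA's advantage (about 112 s versus 50 s locally); with a fast, data-center-class link, the same job drops to about 13 s.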

If you want to learn more about FPGAs as a prelude to getting rich by providing accelerator functions, you can start with our tutorial. By the way, we usually think of configuring FPGAs with Verilog, VHDL, or something similar. You can do that with the FPGAs in the accelerator, but you can also mix in C code, which isn’t unheard of.

30 thoughts on “Hardware Acceleration in the Cloud”

    1. When this future device makes its way into public and private clouds, some of the AccelStore accelerators will clearly be ported to this new platform.
      Our goal, at Accelize, is to enable the ecosystem of developers to make their accelerators available on any FPGA on any cloud.

  1. On one hand, it’s a lot of bullshit.
    My GPU can already accelerate JPG compression. An 8-year-old GPU does Full HD JPG encoding at 120 frames per second: https://github.com/hoopoe/gpujpeg

    >2500 entries in the 60 GB Wikipedia archive in 6 minutes. They claim a traditional CPU would take over 16 days
    except it would take ~20 hours on my desktop (~2GB/second reading from SSD, grep is obviously faster than disk)

    On the other hand, the future looks like this:

    1. Not all Accelerators are created equal, and some are initial demonstrators while others bring clear value.
      The JPEG encoder in AccelStore is just one of these simple demonstrators.
      As for GPUs vs. FPGAs, it is important to recognize that one architecture isn’t superior to the other and that they both hold their values for certain types of workloads.
      FPGAs start to gain an advantage when the processing is not purely data-parallel. Compression like GZIP is one of many examples, but any time a processing chain is made of several different functions, the FPGA will, in many cases, be able to do them in parallel while the GPU will have to do them sequentially (a video transcoding pipeline with decoding, processing and encoding is one example where the FPGA does a pretty darn good job).
      At the end of the day, for most cloud application developers it will come down to 1) availability of a ready-to-use solution and 2) cost.

      1. >while the GPU will have to do them sequentially (a video transcoding pipeline with decoding, processing and encoding is one example where the FPGA does a pretty darn good job).

        No, the GPU doesn’t have to do it sequentially, nor does the CPU for that matter. It’s all covered in the YouTube clip above.
        “Pretty darn good job” doesn’t mean anything when it’s slower than a 10-year-old GPU right from the start.

  2. Hmmm… Drop a disposable RTL-SDR dongle and ESPXXXX WiFi module connected to an open public hotspot (or a hijacked one) in an area of interest and walk away. It will vacuum up all the raw I/Q data while sweeping the RF bands of interest and send it to Accelize in real time, where it will sniff out the tasty stuff using accelerated DSP in the FPGA, then send the results back to you. The problem is maintaining a chain of anonymity (if not security too) throughout the whole signal chain. Much depends on what anonymous options exist for signing up and paying for the Accelize service (e.g., cryptocurrency?). Of course, anonymity and security in the signal chain aren’t really necessary if you are a responsible “White-Hat” doing the likes of a legitimate security audit.

    Another application might be accelerating the post-production chain of raw hi-res video/image data from the field, like a field survey team using drones to capture data. Rather than have the field survey team locally post-process the hi-res raw data (yeah, imagine sitting in a hotel room waiting hour after hour for that to finish), send the raw data back THROUGH Accelize, which intelligently post-processes it. Now nobody in the survey chain needs high-power computing hardware/software, and the field survey team can just dump data and move on quickly to the next location. The bottleneck in this approach is that you need bandwidth to send the raw data back to Accelize from the field, something that’s probably not possible in places with poor infrastructure. But for applications where the backhaul bandwidth from the field exists, this sounds like an interesting application, especially when you figure in the economics of a pay-as-you-go service and zero hardware ownership/upgrade/maintenance costs.

    In the end, the success of a service like this will depend on 1) proper documentation, 2) longevity, and 3) a community that forms a self-support network. The likes of AWS are so-so at this. Then there’s Google, who comes up with ideas like this, doesn’t document or support them, then cancels the service out of the blue like a four-year-old child that’s bored with a new toy.

    1. I think a huge factor for success is going to be how much latency it has on real world networks and situations. It’s going to be a bit of a strange niche to need this enough to not just wait on unaccelerated hardware, yet not need it enough to justify using the acceleration hardware locally and skipping the networking bottleneck.

      I frankly have a hard time believing it’ll be worth it compared to just hooking up an FPGA. But I’m sure there are factors and variables I don’t know about or don’t understand adequately. I love to be wrong about these things.

      1. More like, so they can stop asking if unrelated tech can mine better than the existing tech for mining. The answer is: probably not any better than existing technology that has been specifically designed for the process.

        Also, what you want hopefully won’t be happening in our lifetimes. More likely is that the incentive for mining is a lot less lucrative and that should reduce urgency to find alternative methods.

        1. I’ve always wondered about that. Who is making lucrative money? To make any headway against the cost of electricity and recoup the initial investment of the equipment you gotta be incredibly efficient. How many people are actually making serious profit from it?

          I didn’t think anyone except major players had been mining lucrative amounts since the olden days when it didn’t take much power. Unless you have a huge amount of cheap hardware available to you and your own solar infrastructure that’s already paid off. Maybe I’m wrong. But I was under the impression the payout had been less than the electrical bill for a while now. I mean maybe they’re planning on mining and then sitting on it while it goes up in value, but if it costs the same why not save some time and effort and buy it directly?

          Regardless, this tech isn’t going to be able to mine anything. You definitely wouldn’t be making more money mining than the service fees. Otherwise why wouldn’t the company with the hardware be mining for themselves instead of renting it to you? Doesn’t add up.

          1. Not only the electricity (including air conditioning) that is spent by one miner mining for a coin, but all the other miners who will be too late to mine the same coin and be redirected to the next.
            AIUI, everybody out there is mining the “next” coin, so when one “reaches” it, all the other mining being done by others is basically wasted as they have to mine the next “next” coin. If all that electricity being consumed by all the other miners was included in each coin, I think mining would grind to a halt pretty quick.

    1. I mean the cloud is still running on something. What’s a bank of servers networked together if not a supercomputer? I’m sure there’s nuance but when it comes down to it it’s a massive node-based networked system of some form.

      The cloud is just someone else’s computer. Gotta keep that in mind.

  3. Uhmm, about these “free”, online (cloud) services.
    While handy for website usage,
    let’s not forget what happened with Photobucket.
    Online services are fine, but…
    Keep your (offline) backups of your data & images, folks!
    Large-capacity thumb drives are fairly low-cost now,
    and they don’t change their “terms of service” or go insolvent and disappear with your work.

    1. My big fear is that this will happen to YouTube, and it’s my biggest gripe with people hosting technical build info in vlogs rather than good ol’ web pages. YouTube has become a wealth of knowledge, and some corporate greed is all it would take for it to disappear.

      1. Kind of funny one says that, when a lot of the videos on YouTube are from corporations in one form or another. YouTube is a rather cheap way of storing promotional material.

  4. 4 years ago I had students grepping Wikipedia in about 4 to 5 minutes. Hadoop. :-) To compare apples to apples, since this “hardware” solution is an in-memory problem, let’s try it with Apache Spark. :-)

    1. You are absolutely right.
      The 60GB Find & Replace example is not here to be listed in the Guinness Book, but to give perspective on how fast a single FPGA instance can run compared to a single CPU instance.
      One can surely use Apache Spark and easily beat that 6-minute figure, but with how many CPU instances? And at what cost ($ and power)?
      Honestly, there aren’t that many problems that cannot be addressed by throwing more compute instances at them.
      The question is: Can it be done cheaper and consume a lot less power with the right FPGA accelerator?
      For many workloads out there, the answer is Yes!

      1. Power doesn’t matter to anyone using cloud FPGAs unless they’re already using some local resource for the job. I’m at a company working on golang to verilog to cloud FPGAs and what we’ve found is that the factors that matter are speed, cost and ease of use. It seems like you’ve nailed those from an end user’s perspective but a contributor might have a hard time.

      2. >The 60GB Find & Replace example is not here to be listed in the Guiness book, but to give perspective into how fast a single FPGA instance can run compared to a single CPU instance

        No, it’s listed to mislead potential clients. After all, it’s not “compared to a single CPU instance” but compared to single-threaded sed running a naive loop that reads everything over and over from an ancient HDD, is it not?
        It’s only technically not a lie.

  5. It is important to understand that the target users for such Accelerators are primarily application developers who already have their application running at one of these Cloud Services Providers (AWS and OVH in this current case). For them getting data to and from the FPGA instances is much faster than for somebody who would run his/her application locally.

    So the question is not: Should I move some of my local workload processing to FPGAs in the cloud? but rather: Should I move some of my cloud workload processing to an FPGA instance?

    And for those who operate workload processing locally, the AccelStore accelerators also run on FPGA boards that can be deployed on-premise …

    1. A true random number generator is complex to handle on ANY electronic device (CPU/GPU/FPGA). From a software point of view, the rand() function seems magic, but it’s tricky to handle. Some research groups have shown that it’s possible to hack it and predict the next random value.
      The random number generator is useless on its own in this example design; it’s just a demonstrator that generates random data at high speed that is NIST-compliant (https://csrc.nist.gov/projects/random-bit-generation/documentation-and-software). This basic hardware function is mandatory in any high-end accelerator you want to design in an FPGA (e.g., cryptography: AES, SSL, TLS, etc.).
