One of the big problems in detecting malware is that there are so many different forms of the same malicious code. This problem of polymorphism is what led Rick Wesson to develop icewater, a clustering technique that identifies malware.
Presented at Shmoocon 2016, the icewater project is a new way to process and filter the vast number of samples one finds on the Internet. Processing 300,000 new samples a day to determine whether they contain polymorphic malware is a daunting task. The approach used here is to create a fingerprint from each binary sample by using a space-filling curve. Polymorphism will change a lot of the bits in each sample, but as with human fingerprints, patterns are still present in these binary fingerprints that indicate the sample is a variation on a previously known object.
The images you’re seeing above are graphic representations of these fingerprints. Images aren’t actually part of the technique, but converting each byte value to greyscale is a good way for humans to see what the computer is using in its analysis.
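Rendering such an image is straightforward: treat each byte as an 8-bit greyscale pixel. A minimal sketch using the plain PGM format (the function name and row-major layout are ours; it's just one way to eyeball the data):

```python
def bytes_to_pgm(data, width, path):
    """Render raw bytes as an 8-bit greyscale PGM image, one pixel
    per byte, padding the last row with black (zero) pixels."""
    height = (len(data) + width - 1) // width
    padded = data + bytes(height * width - len(data))
    with open(path, "wb") as f:
        f.write(b"P5\n%d %d\n255\n" % (width, height))  # PGM header
        f.write(padded)
```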
Once the fingerprint is made, it’s simple to compare and cluster together samples that are likely the same. The expensive part of this method is running the space-filling curve. It takes a lot of time to run this using a CPU. FPGAs are ideal, but the hardware is comparatively costly. In its current implementation, GPUs are the best balance of time and expense.
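The talk doesn't detail the clustering step, but once fingerprints are fixed-size bit vectors, a simple distance metric like Hamming distance is enough to group near-duplicates. A greedy sketch under that assumption (the threshold and the single-pass strategy are illustrative, not icewater's actual algorithm):

```python
def hamming(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length fingerprints."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def cluster(fingerprints, threshold):
    """Greedy single-pass clustering: assign each fingerprint to the
    first cluster whose representative is within threshold bits,
    otherwise start a new cluster with it as the representative."""
    clusters = []  # list of (representative, members) pairs
    for fp in fingerprints:
        for rep, members in clusters:
            if hamming(rep, fp) <= threshold:
                members.append(fp)
                break
        else:
            clusters.append((fp, [fp]))
    return clusters
```

A polymorphic variant flips some bits of the fingerprint but stays within the threshold of its parent, so it lands in the same cluster as the known sample.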
This expense measurement gets really interesting when you start talking Internet-scale problems, where huge amounts of data must be processed constantly. The current GPU method can calculate an object in about 33ms, allowing for a couple hundred thousand samples per day. This is about four orders of magnitude better than CPU methods. But the goal is to transition away from GPUs to leverage the parallel processing found in FPGAs.
Rick’s early testing with Xeon Phi/Altera FPGAs can calculate space-filling curves at a rate of one object every 586µs. This represents a gain of nine orders of magnitude over CPUs, but he’s still not satisfied. His goal is to get icewater down to 150µs per object, which would allow 10 million samples to be processed in four hours with a power cost of 4000 Watts.
How do you compare computations on hardware that has a different cost to manufacture and different power budgets? Rick plans to reduce the problem with a measurement he calls InfoJoules. This is an expression of computational decisions per watt-second: 1000 new pieces of information calculated in 1 second on a machine consuming 1000 Watts is 1 InfoJoule. This will make the choice of hardware a bit easier, as you can weigh the cost of acquiring the hardware against the operational cost per new piece of information.
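From the example given in the talk, the metric works out to decisions divided by energy in watt-seconds, scaled so the quoted case equals exactly 1. A one-liner to make the arithmetic concrete (the function name and scale factor are inferred from that single example, not a published definition):

```python
def infojoules(decisions, watts, seconds):
    """Computational decisions per watt-second, scaled so that 1000
    decisions on a 1000 W machine in 1 second equals 1 InfoJoule."""
    return decisions / (watts * seconds)
```

By this measure, halving the power draw at the same throughput doubles the InfoJoules, which is exactly the kind of trade-off the metric is meant to expose.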