TruffleHog Sniffs Github For Secret Keys

January 13, 2017

Secret keys are quite literally the key to security in software development. If a malicious actor gains access to the keys securing your data, you’re toast. The problem is, to use keys, you’ve got to write them down somewhere – oftentimes in the source code itself. TruffleHog has come along to sniff out those secret keys in your GitHub repository.

It’s an ingenious trick: a Python script goes through the commit history of a repository, picks out every string of text longer than 20 characters, and calculates its Shannon entropy. Shannon entropy measures how unpredictable the characters in a string are, which gives a mathematical way of telling whether the string looks like a random jumble of numbers and letters. If it has high entropy, it’s probably a key of some sort.
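The entropy test described above fits in a few dozen lines of Python. This is a minimal sketch, not TruffleHog’s source. It makes several assumptions, based on our understanding of early TruffleHog releases. Candidates are runs of more than 20 characters drawn from either the base64 or the hex alphabet. A run counts as suspicious when its Shannon entropy exceeds 4.5 bits per character for base64, or 3.0 for hex. The entropy is H = -Σ p(c)·log2 p(c), summed over the characters c in the run. The function names are hypothetical. `scan_git_history` simply shells out to `git log -p` rather than reproducing the original script’s own commit walk.

```python
import math
import string
import subprocess

# Alphabets used to carve candidate strings out of each word (assumed, modeled on early TruffleHog).
BASE64_CHARS = string.ascii_letters + string.digits + "+/="
HEX_CHARS = "1234567890abcdefABCDEF"


def shannon_entropy(data: str, charset: str) -> float:
    """Shannon entropy of `data` in bits per character, counting only chars in `charset`."""
    if not data:
        return 0.0
    entropy = 0.0
    for ch in charset:
        p = data.count(ch) / len(data)
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


def strings_of_set(word: str, charset: str, min_len: int = 20) -> list:
    """Return every maximal run of `charset` characters in `word` longer than `min_len`."""
    runs, current = [], []
    for ch in word:
        if ch in charset:
            current.append(ch)
        else:
            if len(current) > min_len:
                runs.append("".join(current))
            current = []
    if len(current) > min_len:
        runs.append("".join(current))
    return runs


def find_secrets(text: str, b64_threshold: float = 4.5, hex_threshold: float = 3.0) -> list:
    """Flag high-entropy base64 or hex runs in `text` as (kind, run, entropy) tuples."""
    hits = []
    for word in text.split():
        for kind, charset, threshold in (("base64", BASE64_CHARS, b64_threshold),
                                         ("hex", HEX_CHARS, hex_threshold)):
            for run in strings_of_set(word, charset):
                score = shannon_entropy(run, charset)
                if score > threshold:
                    hits.append((kind, run, round(score, 2)))
    return hits


def scan_git_history(repo_path: str) -> list:
    """Hypothetical helper: scan every line added in any commit reachable from any ref."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "-p", "--all", "--no-color"],
        capture_output=True, text=True, errors="replace", check=True,
    ).stdout
    # Keep only the added lines of each patch, dropping the "+++ b/file" headers.
    added = [line[1:] for line in log.splitlines()
             if line.startswith("+") and not line.startswith("+++")]
    return find_secrets("\n".join(added))
```

For example, `find_secrets('aws_secret = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"')` reports the quoted value as a base64 hit at about 4.66 bits per character. That value is AWS’s published dummy example key. By contrast, `password = hunter2` produces nothing, because no run is longer than 20 characters. With a 4.5 threshold, a run needs at least 23 distinct characters to qualify at all, because a string with n distinct characters can score at most log2(n) bits per character.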

Sharing source code is always a double-edged sword for security. Any flaws are out for all to see, and there are both those who will exploit the flaws and those who will help fix them. It’s a matter of opinion whether the benefits outweigh the risks, but it’s hard to argue with the labor benefits of getting more eyes on the code to hunt for bugs. Our guess, though, is that a lot of readers have accidentally committed secret keys to a git repository and had to revert before pushing. This tool can crawl any publicly posted git repo, but it might be just as useful in security audits of your own codebase, to make sure any accidentally exposed keys are found, invalidated, and replaced.

For a real-world example of stolen secret keys, read up on this HDMI breakout that sniffs HDCP keys.

22 thoughts on “TruffleHog Sniffs Github For Secret Keys”

nsayer says:

January 13, 2017 at 1:14 pm

I bet TruffleHog would catch public keys. Those aren’t necessarily security holes. For example, if you’re going to set up a web of trust of some sort, you’re probably going to compile the trust anchor key (or certificate) into your code.

Of course, it’s on you to ensure that you closely hold and protect the related *private* key…

Many high-value trust anchor private keys exist solely on paper. You use a computer with no network connections at all to generate the root keypair, then you have it print out the private key, which you lock in the safe. You then immediately generate a subsidiary key pair and turn its public key into a certificate signed by the root private key. You then print out that subsidiary cert and key, then you turn off the computer (and if you’re paranoid, you shred it and throw the pieces into a dumpster fire). With the root private key locked away, you take the two certificates and the subsidiary private key and load them into a connected computer. The subsidiary private key you lock away with whatever protection is appropriate for your application. The two certificates you can publish.

You do all of that so that if the private key you actually use is ever compromised, you can pull the root private key out of the safe and use that to issue a CRL and a new certificate. You also mitigate that compromise risk by setting a low(er) expiration date on the subsidiary certificate and shortly before it expires you go back and perform the ritual of issuing a new subsidiary cert from the paper root using a disconnected computer. You do all of this because there is no easy remediation to a compromise of the root key other than a universal, global firmware update and flag day, which is comparatively very, very expensive.

1. Adam says:
  
  January 13, 2017 at 2:11 pm
  
  That’s the whole point, though. While everything you say is true, for people who are not making root keys, sometimes the private key is in code that accidentally gets committed, or the site is accessing some database with a simple password and that gets committed. I would be willing to bet there are loads of these passwords hiding in GitHub commits.
  
2. John Doe says:
  
  January 14, 2017 at 6:10 am
  
  It does indeed have some false positives. For example, in my repositories it found “double pi = 3.141…” and a base64-encoded image.
  
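John Doe’s false positives are easy to reproduce with an entropy check. This sketch assumes the base64 and hex alphabets and the 4.5 and 3.0 bits-per-character thresholds of early TruffleHog; that is our assumption, not something stated in the article. It scores two inputs. The first is a hypothetical 30-digit run of pi, as might appear in a long numeric literal. The second is a reproducible stand-in for a base64-encoded binary blob such as an embedded image; it is built from SHA-256 output only so that the example is deterministic. Both clear their thresholds even though neither is a secret.

```python
import base64
import hashlib
import math
import string

BASE64_CHARS = string.ascii_letters + string.digits + "+/="
HEX_CHARS = "1234567890abcdefABCDEF"  # decimal digits are a subset of the hex alphabet


def shannon_entropy(data: str, charset: str) -> float:
    """Shannon entropy in bits per character, counting only chars in `charset`."""
    n = len(data)
    probs = (data.count(ch) / n for ch in charset)
    return -sum(p * math.log2(p) for p in probs if p > 0)


# First 30 decimal places of pi: after the "3." this is one unbroken hex-alphabet run.
pi_digits = "141592653589793238462643383279"

# Reproducible stand-in for encoded binary data (e.g. an embedded image): 320 pseudo-random bytes.
blob = base64.b64encode(
    b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(10))
).decode("ascii")

pi_score = shannon_entropy(pi_digits, HEX_CHARS)
blob_score = shannon_entropy(blob, BASE64_CHARS)
print(f"pi digits:   {pi_score:.2f} bits/char, hex threshold 3.0 -> flagged: {pi_score > 3.0}")
print(f"base64 blob: {blob_score:.2f} bits/char, base64 threshold 4.5 -> flagged: {blob_score > 4.5}")
```

A digits-only run can never score more than log2(10), about 3.32 bits per character. It slips past a 3.0 threshold only when its digits are fairly evenly mixed, which is exactly what the digits of pi are.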
robert says:

January 13, 2017 at 2:10 pm

> If it has high entropy, it’s probably a key of some sort.

Er… or… idunno, compressed?

1. some guy says:
  
  January 13, 2017 at 2:40 pm
  
  This. Also, the entropy thing may work for keys, but not for (a lot of) passwords made of ASCII.
  Might be interesting: http://www.devttys0.com/2013/06/differentiate-encryption-from-compression-using-math/ (from the binwalk guy(s?))
  BTW, from the same guy, and it might come in handy one day: a database of private SSL/SSH keys for embedded devices, https://github.com/devttys0/littleblackbox
  
2. Noirwhal says:
  
  January 13, 2017 at 7:47 pm
  
  Or obfuscated.
  
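The point about ASCII passwords can be checked with a simplified version of the same filter. This sketch assumes a candidate is flagged only when some whitespace-separated word is longer than 20 characters and scores above 4.5 bits per character. That is a rough approximation of early TruffleHog’s base64 check, and `would_flag` is a hypothetical helper.

```python
import math
from collections import Counter


def shannon_entropy(data: str) -> float:
    """Shannon entropy in bits per character over the characters actually present."""
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def would_flag(candidate: str, min_len: int = 20, threshold: float = 4.5) -> bool:
    """Simplified filter: some whitespace-separated word is both long and high-entropy."""
    return any(len(word) > min_len and shannon_entropy(word) > threshold
               for word in candidate.split())


examples = [
    "my secret word",                            # short words: never even scored
    "correcthorsebatterystaple",                 # long enough, but letters repeat a lot
    "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",  # AWS's published dummy secret key
]
for s in examples:
    print(f"{s!r}: flagged={would_flag(s)}")
```

Only the random-looking key is flagged. The 25-letter passphrase scores about 3.36 bits per character, because an English phrase keeps reusing the same few letters.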
BioSehnsucht says:

January 13, 2017 at 2:49 pm

Or you could just use Github Search

http://www.securityweek.com/github-search-makes-easy-discovery-encryption-keys-passwords-source-code

WJCarpenter says:

January 13, 2017 at 3:09 pm

I spend half of my “security hat” time chastising developers for putting keys into source code. You can never say never, but I have yet to come across a case in practice where it wasn’t a mistake or just laziness.

1. GNUtoo says:
  
  January 14, 2017 at 4:29 am
  
  Finding a way to contact the project owner or authors is not hard: commits have an email address in the author field, and GitHub probably also offers a way to send a message to users, through a pull request or some other means.
  
  So a tool that looks for keys, passwords, or other secrets could be used to find the issues and notify the people who can fix them.
  
  However, the question is how to avoid the huge number of potential false positives:
  – Making a tool that understands how to find passphrases might be hard.
  – I’d guess many passphrases would default to CHANGEME in the configuration files.
  – The configuration files might be shipped in locations where they are not actually used, but instead serve as documentation or examples.
  – Many software packages ship with default SSL certificates, and the user is then expected to generate new ones. I wonder whether that practice makes sense, though: SSH, for instance, will generate new keys at boot if none are present. Now that we have Let’s Encrypt, such software could use its protocol to obtain valid certificates.
  – “Rooting” software: some consumer devices running GNU/Linux or Android don’t give the user access to the underlying system. The Archos 605 WiFi is one example among many others. In that case there are tools that exploit security issues in the default firmware to help users regain control of their devices, and they sometimes ship dropbear binaries with hardcoded passwords.
  – Setup and security lying elsewhere: SSH is easily available everywhere. With a LEDE image that has no web interface, you still need to log in to the device to set up a passphrase or SSH keys. You can, however, connect the device directly to your computer, which prevents an attacker from connecting to it before you do.
  
  A good way to solve some of the problems above (rooting and OpenWRT/LEDE) would be to inject the user’s keys or password into the image or binary before installing it on the device, but AFAIK no such tool currently exists.
  
  If there is a way to keep the number of false positives low without missing a huge number of keys, it would be great to have a tool that scans for private keys and emails the user.
  Would letting the repository owner declare whether private keys are expected in the repository be a good way to deal with false positives?
  
  Another thing to do would be to find ways to deal with many of the issues above. Some well-known software ships with default SSL certificates, which are publicly known.
  Some documentation on good practices to follow might be a start toward dealing with these.
  
  Denis.
  
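GNUtoo makes two suggestions: look specifically for private keys, and ignore default placeholders such as CHANGEME. Both can be combined in a short pattern-based pass. This is a minimal sketch and not part of TruffleHog. It assumes private keys are stored in PEM form, with armour lines such as `-----BEGIN RSA PRIVATE KEY-----`. The assignment pattern and the `PLACEHOLDERS` set are hypothetical and deliberately crude. Public-key armour such as `-----BEGIN PUBLIC KEY-----` is ignored, in line with nsayer’s point that public keys are not secrets.

```python
import re

# PEM armour that introduces a private key: "-----BEGIN PRIVATE KEY-----" (PKCS#8),
# "-----BEGIN RSA PRIVATE KEY-----", "-----BEGIN OPENSSH PRIVATE KEY-----", etc.
PEM_PRIVATE_KEY = re.compile(r"-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----")

# Hypothetical, deliberately crude pattern for `name = value` / `name: value` settings.
ASSIGNMENT = re.compile(r"""(?i)(password|passphrase|secret|api_?key)\s*[:=]\s*["']?([^"'\s]+)""")

# Hypothetical placeholder values that ship as defaults and should not be reported.
PLACEHOLDERS = {"changeme", "change_me", "xxxxxxxx"}


def scan_text(text: str) -> list:
    """Return (line number, finding) pairs for private-key armour and non-placeholder secrets."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if PEM_PRIVATE_KEY.search(line):
            findings.append((lineno, "pem-private-key"))
        for match in ASSIGNMENT.finditer(line):
            name, value = match.group(1).lower(), match.group(2)
            if value.lower() not in PLACEHOLDERS:
                findings.append((lineno, f"hardcoded-{name}"))
    return findings
```

A pass like this complements the entropy check rather than replacing it. It catches low-entropy secrets that sit next to a telling variable name, and it suppresses the shipped defaults GNUtoo lists.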
RÖB says:

January 13, 2017 at 4:22 pm

So we should just use secrets like “my secret word”, which is less than 20 characters and has low entropy, so it won’t be detected, right?

GrowlerBoy says:

January 13, 2017 at 4:57 pm

Here we go again… Somebody wrote a python script! Amazing news! Somebody alert the mainstream media! #NotAHack

1. JIm B says:
  
  January 13, 2017 at 5:22 pm
  
  Here we go again … somebody writes “NotAHack” and fails to provide a link to their hack-worthy work. OK, you are right, hackaday, please refund his subscription fees.
  
  Disdain is cheap. Provide something better.
  
lol says:

January 13, 2017 at 7:57 pm

if you have a static secret key in your code, that is a problem yes. (these should be generated and/or read from a separate text file)

1. Greenaum says:
  
  January 13, 2017 at 10:41 pm
  
  Then save the text file to an SD card, and put it through a shredder.
  
  1. Cree says:
    
    January 14, 2017 at 6:35 am
    
    You will also have to burn the shredder. It has seen too much.
    
  2. RÖB says:
    
    January 14, 2017 at 2:40 pm
    
    The SD die will probably end up on one shredded piece and intact except for the bond wires so the data would be recoverable.
    
    If you just write all ones to an SD card then it’s deleted and wouldn’t even be forensically recoverable.
    
    On the other hand, HDD data is much more recoverable, so you need about 35 passes of random data, or you can just take the platter(s) out and fold them; that works well :)
    
    1. some guy says:
      
      January 14, 2017 at 2:54 pm
      
      One pass for a HDD is enough. http://www.howtogeek.com/115573/htg-explains-why-you-only-have-to-wipe-a-disk-once-to-erase-it/
      But for an SD card (or SSD), somebody may still be able to recover some data because of bad sectors that are no longer used by the controller (and so are not overwritten).
      
      1. RÖB says:
        
        January 14, 2017 at 3:10 pm
        
        If it were simply an urban myth, then the US Department of Defence wouldn’t have a specification for wiping data:
        Standard DoD 5220.22-M
        
        Sure, I take your point that 35 passes is not necessary any more, but 7 passes of random data is probably as good as a complementary pass followed by one pass of random data. I don’t have any software that does complementary passes, though.
        
        And as for bad sectors, if your software won’t write to bad sectors, then you’re not taking security seriously.
        
        Some people believe that an OS like Windows will *delete* files when in reality the file remains fully intact; Windows just hides it from you.
        
      2. some guy says:
        
        January 14, 2017 at 6:11 pm
        
        I know there are specs that say to wipe 35 times, but maybe the military guys who wrote them were just really careful/paranoid, or the specs are simply really old?
        Concerning bad sectors, I was talking about the controller on the SD card that remaps access from one sector to another because the original sector has gone bad. You can’t overwrite the bad sector, except maybe by sending special commands to the SD card.
        Yes, I know about Windows, and sometimes it’s a good thing, especially with people who are not so good with computers…
        
      3. RÖB says:
        
        January 14, 2017 at 7:28 pm
        
        It’s a change in technology. Hard drives are analogue and on the old ones you could pull three layers of data off them. Newer hard drives use the same margin more resourcefully to get higher capacities.
        
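The advice at the top of this thread is to generate secrets, or read them from a separate file, instead of hard-coding them. That is straightforward in Python. In this sketch, the environment variable name `HYPOTHETICAL_API_KEY`, the helper names, and any file path are placeholders. It assumes the secret file lives outside the repository, or is listed in `.gitignore`, so that it never lands in a commit where a tool like TruffleHog could find it.

```python
import os
import secrets
from pathlib import Path
from typing import Optional


def generate_secret(path: str, nbytes: int = 32) -> str:
    """Create a random URL-safe token and store it in a file that only its owner can read."""
    token = secrets.token_urlsafe(nbytes)
    target = Path(path).expanduser()
    target.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with owner-only permissions (0o600) on POSIX systems.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(token + "\n")
    return token


def load_secret(env_var: str, fallback_file: Optional[str] = None) -> str:
    """Read a secret from the environment, falling back to a file kept out of version control."""
    value = os.environ.get(env_var)
    if value:
        return value.strip()
    if fallback_file is not None:
        path = Path(fallback_file).expanduser()
        if path.is_file():
            return path.read_text(encoding="utf-8").strip()
    raise RuntimeError(f"secret {env_var!r} is not set and no fallback file was found")
```

Typical use is to call `generate_secret` once during setup, then call `load_secret("HYPOTHETICAL_API_KEY", "~/.config/myapp/api_key")` at startup. Both the variable name and the path are illustrative. The environment variable takes priority, so a deployment can override the file without touching the code.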
Paul Klemstine says:

January 14, 2017 at 7:49 am

I built a frontend that grabs a random list of github repositories and runs them through TruffleHog. It also runs in a Docker container.
https://github.com/raver1975/secretpig

mackenzie says:

November 18, 2020 at 5:30 am

TruffleHog and other open-source projects are great as a platform for building custom detection solutions on top of. I have found greater success particularly at scale with commercial alternatives that still provide some open source flexibility.

https://www.gitguardian.com/gitguardian-vs-trufflehog-alternatives


Hackaday
