Linux Fu: Where’s That Darn File?

Disk storage has exploded in the last 40 years. These days, even a terabyte drive is considered small. There is one downside, though. The more stuff you have, the harder it is to find it. Linux provides numerous tools to find files when you can’t remember their name. Each has plusses and minuses, and choosing between them is often difficult.

Definitions

Different tools work differently to find files. There are several ways you might look for a file:

  1. Find a file if you know its name but not its location.
  2. Find a file when you know some part of its name.
  3. Find a file that contains something.
  4. Find a file with certain attributes (e.g., larger than 100 kB).

You might combine these, too. For example, it is reasonable to query all PDF files created in the last week that are larger than 100 kB.

There are plenty of different types of attributes. Some file systems support tags, too. So, you might have a PERSONAL tag to mark files that apply to you personally. Unfortunately, tool support for tags is somewhat lacking, as you’ll see later.

Another key point is how up-to-date your search results are. If you sift through terabytes of files for each search, that will be slow. If you keep an index, that’s fast, but the index will quickly be out of date. Do you periodically refresh the index? Do you watch the entire file system for changes and then update the index? Different tools do it differently.

Find

The most common tool is, in fact, no tool at all. The find command just does what you would do. It does directory listings and searches through them for whatever you want. The most common way to use the command is:

find . -name 'hackaday.txt' -print

You can probably leave off the -print, as that’s the default action. However, find can do much more: it can filter by dates and attributes, and even execute commands using the file names it finds, which can be dangerous.
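For instance, the combined query from earlier — PDFs from the last week that are larger than 100 kB — maps directly onto find’s tests. The paths below are just a scratch tree built for the demo:

```shell
# Build a small scratch tree so the search has something to find
mkdir -p /tmp/findfu/docs
truncate -s 200k /tmp/findfu/docs/notes.pdf   # 200 kB dummy "PDF"
touch /tmp/findfu/docs/readme.txt

# PDFs modified in the last week AND larger than 100 kB
# (-mtime -7: less than 7 days old; -size +100k: more than 100 KiB)
find /tmp/findfu -name '*.pdf' -mtime -7 -size +100k -print

# -exec runs a command once per match; preview with echo before
# substituting a command that actually modifies anything
find /tmp/findfu -name '*.pdf' -exec echo would-process {} \;
```

The first search prints /tmp/findfu/docs/notes.pdf, since the dummy file passes all three tests; readme.txt fails the name test and never gets that far.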

There’s no index to build and store, which is nice, but that also means it can be slow. If you do a find / you’ll get a search across the entire file system. However, find is fast for reasonable directory depths.

If you are lazy, you can ask a website to generate your find commands for you. If you want a faster, more modern find, try fd, which is called fd-find on Ubuntu; you execute it with fdfind.
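A couple of fd invocations, for flavor (hedged from fd’s documented options; note that by default fd skips hidden and .gitignore’d files, unlike find):

```shell
$ fdfind hackaday        # regex match on file names, respects .gitignore
$ fdfind -e pdf report   # restrict matches to the .pdf extension
```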


Locate/Rlocate/Mlocate/Plocate

If you use find a lot on entire filesystems, you’ll eventually tire of waiting for it to search everywhere. What then? Well, you aren’t the first one to get tired of it, so back in the dawn of Unix, the locate command appeared. The idea is simple: periodically, the updatedb command builds an index file (or several), and locate searches that index. You can create multiple indices: one for user files, one for system files, or maybe one for a network drive, built on that drive’s local machine.

There have been many improved versions of locate, although the latest appears to be plocate. If you want to use locate, you should probably use this version, which is very similar to the original. There are options to search without case comparison, for example. You can use regular expressions, limit the search to the file name (and not the path), and control the output format to some extent.
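Those options look roughly like this (assuming the mlocate-compatible flags that plocate accepts):

```shell
$ plocate -i hackaday            # ignore case
$ plocate -b hackaday.txt        # match only the file name, not the whole path
$ plocate -r 'hackaday.*\.txt$'  # basic regular expression
```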

No matter what version you use, you should look at /etc/updatedb.conf and try to control the indexing process. For example, you might not want to index remote filesystems. Dropping the index for transient files like browser caches is also good.
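For example, a trimmed /etc/updatedb.conf might look like this (the variable names follow mlocate/plocate conventions; the exact lists are illustrative, so check your distribution’s defaults):

```
# File system types updatedb should never descend into
PRUNEFS="nfs nfs4 cifs smbfs sshfs fuse.sshfs"
# Absolute paths to skip entirely
PRUNEPATHS="/tmp /var/tmp /media /mnt"
# Directory names to skip anywhere, e.g. transient caches
PRUNENAMES=".cache .git"
# Don't index bind mounts twice
PRUNE_BIND_MOUNTS="yes"
```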

Of course, locate and its sister commands can only find what you’ve indexed. If you only index once a month, you will have trouble finding recent files. You can always rerun the indexer manually, but that gets old fast. In addition, locate doesn’t look inside your files or help you with attribute searches.

There was a time when nearly every Linux system had some form of locate preinstalled. These days, many distros make you install it manually and have a GUI-based search as the default. If you want to use a GUI with locate-like tools, there are a few options. Krusader, one of the KDE file managers, can perform locate searches. There is also catfish. However, the GUIs often can’t handle all the options that locate provides.

Baloo

If you use KDE, then you certainly have seen Baloo. This is the default KDE file indexer. It is very powerful but also very intrusive. Early versions were infamous for chewing up huge amounts of resources while indexing large files. Worse, there were few ways to control what it was doing.

Honestly, I use Baloo, but I have a set of scripts that only allows it to index while my computer is idle and in the wee hours of the morning. Is that still necessary? I don’t know. I’m afraid to unleash Baloo on my system.

So why use Baloo? It integrates perfectly with KDE. It also indexes file system tags and, if you don’t turn it off, file contents. It uses KDE’s metadata extractors to look inside files like archives, for example.

You can use the baloosearch: KIO worker to search from many places inside KDE. Normally, you search the Baloo database from Dolphin or KRunner, but there are command line tools, too. The balooctl program gives you some options for working with the database and the daemon. The baloosearch tool lets you find files from the command line. The database can be large, so even a query can take a long time. Remember that Baloo indexes content, so you will sometimes see a result that doesn’t appear to match the file name. That probably means the search string appears inside the file. You can see more about what Baloo knows about a file using the balooshow program with the -x option.
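A quick tour of those tools (names as shipped with Plasma 5; Plasma 6 appends a 6, as in balooctl6):

```shell
$ balooctl status          # daemon state and index size
$ baloosearch hackaday     # search file names and indexed content
$ balooshow -x notes.txt   # dump what Baloo has recorded about one file
```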

The query language is very complete. For example, you can search for MP3 files from a particular album or images with a certain aspect ratio. You can also use operators like the less than or greater than sign.

You definitely want to configure Baloo. I’ve found that any remote file system or loop in the file system will bring it to its knees.

Recoll

Recoll is another file searcher that can either update its index periodically or watch the file system constantly. Like Baloo, it can decode several file types natively and with external programs. It is actively developed and tries to dig through as much as possible (although indexing inside tar files is off by default).
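The command-line side of those two modes looks roughly like this (recollindex builds the index; recoll -t queries without starting the GUI):

```shell
$ recollindex        # build or update the index once (cron-friendly)
$ recollindex -m     # stay resident and monitor the file system for changes
$ recoll -t hackaday # query from the terminal
```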

As noted on the program’s homepage, Recoll will index an MS Word document stored as an attachment to an e-mail message inside a Thunderbird folder archived in a Zip file. Wow.

Other Programs

There are some other search programs that are either obscure or were popular at one time but are less popular today:

Of course, there are doubtless many more. Do you use a program we missed? Let us know in the comments. An example of a remote file system you might want to exclude from indexing? Hackaday. Want to build your own system? Be sure you know about incron and file system watches.

20 thoughts on “Linux Fu: Where’s That Darn File?”

  1. I’ve got a few aliases in my .bashrc to do that.

    ff is “find file”, a recursive search for any file name containing the text. It would manage a regex, but in practice I never use those. Just type the extension to find all files with that extension, or one word of the music file name to find all files with that word in the pathname.

    fif is “find in file”, a recursive search for any file containing the text. Useful for finding the subroutine definition, or which #include file contains the definition for something.

    #
    # Stupid bash doesn’t allow parameters in aliases! Have
    # to use a shell function instead.
    #
    function ff() {
        find . -iname "*$1*" -print 2>/dev/null
    }

    alias fif="grep -riI"

    ft is “find type”, which will list any files matching the type. You have to know what the possible types are, but “audio” is one type and “image” is another.

    # symbolic
    # directory
    # audio
    # PDF
    # EPUB
    # ASCII
    # text
    # Perl
    # XML
    # ISO
    # shell
    # script
    # image
    #
    function ft() {
        TYPE="${1,,}"
        # echo "${TYPE}"

        find . -print0 | while IFS= read -r -d '' FILE; do
            # echo "$FILE"
            VAR1=$(file -b "$FILE")
            VAR1=${VAR1,,}
            # echo "$VAR1"
            if [[ $VAR1 == *"${TYPE}"* ]]; then
                # echo "TYPE: ${TYPE}"
                echo "$FILE"
            fi
        done
    }

  2. Almost by default, I declare 'alias vind="find -type f -print0 | xargs -0 grep"' in my /etc/bash.aliases.
    I use it to search INSIDE files, recursively from the current directory. And since it uses the null-character delimiter, it also has no problems with spaces in filenames.
    Easy for Dutchmen, as Vind translates to Find :)

  3. A vote for the last one you mentioned: fsearch. It does indexed searches in a list of paths (and you can put more than one in it) for file names (it doesn’t search contents, that I saw). It pretty much gives instant live results.
    There is an advanced syntax so you can filter down on date modified or size.

    I was using it against my NAS (qirsearch on QNAP) and prefer it greatly.

  4. I’m using the low-tech approach of running a nightly cronjob with something like “find /raid/ > filelist.txt”, then I can use standard grep (with regular expressions as needed) to find files. Works pretty well even with a large raid containing ~20M files; a search still takes no more than a few seconds.

    1. Any reason for not using ‘locate’?
      (Which basically runs a nightly cron job indexing your file systems.)

      By default it uses globbing, but you can do regex searches with:
      $ locate -r <>

  5. I didn’t watch the video, but I would be surprised if it didn’t cover this one. Since it’s not in the article, I’ll just leave this here:

    $ find src/ -type f -name "*.js" -exec grep -nHi MyFunctionName \{\} \;

    Or:
    $ find -type f -name "*.<ext>" -exec grep -nHi <pattern> \{\} \;

    Search for files with extension <ext> and grep them for <pattern>.

  6. plocate is blindingly fast (do the indexing periodically from cron). It is efficient even when called from a bash CGI script, so you can easily implement a very fast three-term (grep pipeline) search engine that can handle tens of terabytes of documents (millions of files). But if you have a lot of content where the filename is not the ideal search term, you need to look at Apache Solr or similar to index the _contents_ of the files too. There are also some newer AI-based add-on layers in that area, but I’m yet to play with them so can’t recommend anything. The end goal is to have your entire life securely documented and intelligently searchable, a personal narrative upgrade module for your wetware.

  7. We’ve used Recoll for so many years we don’t know when we started. We like it. It just works, like a lot of things in Linux. Until recently it was giving us too many duplicate files in the results; then we took the time to figure out how to eliminate duplicates. Done. Thumbs up for Recoll.

  8. For source code searches, ack and ag (“the silver searcher”) have provided a lot of help under Linux, alongside tags-related tools and Doxygen’s browser interface. I have wished for Linux versions of Windows desktop tools like SourceInsight. Cloud-resident files make the case for cloud-aware search apps like Copernic Desktop Search and the Windows/Edge feature “Work Search”.
