Linux Fu: Where’s That Darn File?

Disk storage has exploded in the last 40 years. These days, even a terabyte drive is considered small. There is one downside, though. The more stuff you have, the harder it is to find it. Linux provides numerous tools to find files when you can’t remember their name. Each has plusses and minuses, and choosing between them is often difficult.

Definitions

Different tools work differently to find files. There are several ways you might look for a file:

  1. Find a file if you know its name but not its location.
  2. Find a file when you know some part of its name.
  3. Find a file that contains something.
  4. Find a file with certain attributes (e.g., larger than 100 kB).

You might combine these, too. For example, it is reasonable to query all PDF files created in the last week that are larger than 100 kB.

There are plenty of different types of attributes. Some file systems support tags, too. So, you might have a PERSONAL tag to mark files that apply to you personally. Unfortunately, tool support for tags is somewhat lacking, as you’ll see later.

Another key point is how up-to-date your search results are. If you sift through terabytes of files for each search, that will be slow. If you keep an index, that’s fast, but the index will quickly be out of date. Do you periodically refresh the index? Do you watch the entire file system for changes and then update the index? Different tools do it differently.

Find

The most common tool is, in fact, no tool at all. The find command just does what you would do. It does directory listings and searches through them for whatever you want. The most common way to use the command is:

find . -name 'hackaday.txt' -print

You can probably leave off the -print, as that’s the default action. However, find can do much more: it can filter by dates and attributes, and even execute commands using the file names it finds, which can be dangerous.
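For instance, the combined query from earlier — PDFs from the last week that are larger than 100 kB — maps directly onto find’s tests. The paths below are just a scratch tree built for the demo:

```shell
# Build a small scratch tree so the search has something to find
mkdir -p /tmp/findfu/docs
truncate -s 200k /tmp/findfu/docs/notes.pdf   # 200 kB dummy "PDF"
touch /tmp/findfu/docs/readme.txt

# PDFs modified in the last week AND larger than 100 kB
# (-mtime -7: less than 7 days old; -size +100k: more than 100 KiB)
find /tmp/findfu -name '*.pdf' -mtime -7 -size +100k -print

# -exec runs a command once per match; preview with echo before
# substituting a command that actually modifies anything
find /tmp/findfu -name '*.pdf' -exec echo would-process {} \;
```

The first search prints /tmp/findfu/docs/notes.pdf, since the dummy file passes all three tests; readme.txt fails the name test and never gets that far.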

There’s no index to build and store, which is nice, but that also means it can be slow. If you do a find / you’ll get a search across the entire file system. However, find is fast for reasonable directory depths.

If you are lazy, you can ask a website to generate your find commands for you. If you want a faster, more modern find, try fd, which is called fd-find on Ubuntu; you execute it with fdfind.
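A couple of fd invocations, for flavor (hedged from fd’s documented options; note that by default fd skips hidden and .gitignore’d files, unlike find):

```shell
$ fdfind hackaday        # regex match on file names, respects .gitignore
$ fdfind -e pdf report   # restrict matches to the .pdf extension
```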


Locate/Rlocate/Mlocate/Plocate

If you use find a lot on entire filesystems, you’ll eventually tire of waiting for it to search everywhere. What then? Well, you aren’t the first one to get tired of it, so back in the dawn of Unix, the locate command appeared. The idea is simple: periodically, the updatedb command builds an index file (or several), and locate searches that index. You can create multiple indices: one for user files, one for system files, or maybe one for a network drive, built on that drive’s local machine.

There have been many improved versions of locate, although the latest appears to be plocate. If you want to use locate, you should probably use this version, which is very similar to the original. There are options to search without case comparison, for example. You can use regular expressions, limit the search to the file name (and not the path), and control the output format to some extent.
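Those options look roughly like this (assuming the mlocate-compatible flags that plocate accepts):

```shell
$ plocate -i hackaday            # ignore case
$ plocate -b hackaday.txt        # match only the file name, not the whole path
$ plocate -r 'hackaday.*\.txt$'  # basic regular expression
```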

No matter what version you use, you should look at /etc/updatedb.conf and try to control the indexing process. For example, you might not want to index remote filesystems. Dropping the index for transient files like browser caches is also good.
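For example, a trimmed /etc/updatedb.conf might look like this (the variable names follow mlocate/plocate conventions; the exact lists are illustrative, so check your distribution’s defaults):

```
# File system types updatedb should never descend into
PRUNEFS="nfs nfs4 cifs smbfs sshfs fuse.sshfs"
# Absolute paths to skip entirely
PRUNEPATHS="/tmp /var/tmp /media /mnt"
# Directory names to skip anywhere, e.g. transient caches
PRUNENAMES=".cache .git"
# Don't index bind mounts twice
PRUNE_BIND_MOUNTS="yes"
```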

Of course, locate and its sister commands can only find what you’ve indexed. If you only index once a month, you will have trouble finding recent files. You can always rerun the indexer manually, but that gets old fast. In addition, locate doesn’t look inside your files or help you with attribute searches.

There was a time when nearly every Linux system had some form of locate preinstalled. These days, many distros make you install it manually and have a GUI-based search as the default. If you want to use a GUI with locate-like tools, there are a few options. Krusader, one of the KDE file managers, can perform locate searches. There is also catfish. However, the GUIs often can’t handle all the options that locate provides.

Baloo

If you use KDE, then you certainly have seen Baloo. This is the default KDE file indexer. It is very powerful but also very intrusive. Early versions were infamous for chewing up huge amounts of resources while indexing large files. Worse, there were few ways to control what it was doing.

Honestly, I use Baloo, but I have a set of scripts that only allows it to index while my computer is idle and in the wee hours of the morning. Is that still necessary? I don’t know. I’m afraid to unleash Baloo on my system.

So why use Baloo? It integrates perfectly with KDE. It also indexes file system tags and, if you don’t turn it off, file contents. It uses KDE’s metadata extractors to look inside files like archives, for example.

You can use the baloosearch: KIO worker to search from many places inside KDE. Normally, you search the Baloo database from Dolphin or KRunner, but there are command line tools, too. The balooctl program gives you some options for working with the database and the daemon. The baloosearch tool lets you find files from the command line. The database can be large, so even a query can take a long time. Remember that Baloo indexes content, so you will sometimes see a result that doesn’t appear to match the file name. That probably means the search string appears inside the file. You can see more about what Baloo knows about a file using the balooshow program with the -x option.
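A quick tour of those tools (names as shipped with Plasma 5; Plasma 6 appends a 6, as in balooctl6):

```shell
$ balooctl status          # daemon state and index size
$ baloosearch hackaday     # search file names and indexed content
$ balooshow -x notes.txt   # dump what Baloo has recorded about one file
```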

The query language is very complete. For example, you can search for MP3 files from a particular album or images with a certain aspect ratio. You can also use operators like the less than or greater than sign.

You definitely want to configure Baloo. I’ve found that any remote file system or loop in the file system will bring it to its knees.

Recoll

Recoll is another file searcher that can either update its index periodically or watch the file system constantly. Like Baloo, it can decode several file types natively and with external programs. It is actively developed and tries to dig through as much as possible (although indexing inside tar files is off by default).
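The command-line side of those two modes looks roughly like this (recollindex builds the index; recoll -t queries without starting the GUI):

```shell
$ recollindex        # build or update the index once (cron-friendly)
$ recollindex -m     # stay resident and monitor the file system for changes
$ recoll -t hackaday # query from the terminal
```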

As noted on the program’s homepage, Recoll will index an MS Word document stored as an attachment to an e-mail message inside a Thunderbird folder archived in a Zip file. Wow.

Other Programs

There are some other search programs that are either obscure or were popular at one time but are less popular today:

Of course, there are doubtless many more. Do you use a program we missed? Let us know in the comments. An example of a remote file system you might want to exclude from indexing? Hackaday. Want to build your own system? Be sure you know about incron and file system watches.

20 thoughts on “Linux Fu: Where’s That Darn File?”

  1. I’ve got a few aliases in my .bashrc to do that.

    ff is “find file”, a recursive search for any file name containing the text. It would manage a regex, but in practice I never use those. Just type the extension to find all files with that extension, or one word of the music file name to find all files with that word in the pathname.

    fif is “find in file”, a recursive search for any file containing the text. Useful for finding the subroutine definition, or which #include file contains the definition for something.

    #
    # Stupid bash doesn’t allow parameters in aliases! Have
    # to use a shell function instead.
    #
    function ff() {
        find . -iname "*$1*" -print 2>/dev/null
    }

    alias fif="grep -riI"

    ft is “find type”, which will list any files matching the type. You have to know what the possible types are, but “audio” is one type and “image” is another.

    # symbolic
    # directory
    # audio
    # PDF
    # EPUB
    # ASCII
    # text
    # Perl
    # XML
    # ISO
    # shell
    # script
    # image
    #
    function ft() {
        TYPE="${1,,}"
        # echo "${TYPE}"

        find . -print0 | while IFS= read -r -d '' FILE; do
            # echo "$FILE"
            VAR1=$(file -b "$FILE")
            VAR1=${VAR1,,}
            # echo "$VAR1"
            if [[ $VAR1 == *"${TYPE}"* ]]; then
                # echo "TYPE: ${TYPE}"
                echo "$FILE"
            fi
        done
    }

  2. Almost by default, I declare 'alias vind="find -type f -print0 | xargs -0 grep"' in my /etc/bash.aliases.
    I use it to search INSIDE files, recursively from the current directory. And since it uses the null-character delimiter, it also has no problems with spaces in filenames.
    Easy for Dutchmen, as Vind translates to Find :)

  3. A vote for the last one you mentioned: fsearch. It does indexed searches in a list of paths (and you can put more than one in it) for file names (it doesn’t search contents, that I saw). It pretty much gives instant live results.
    There is an advanced syntax so you can filter down on date modified or size.

    I was using it against my NAS (qirsearch on QNAP) and prefer it greatly.

  4. I’m using the low-tech approach of running a nightly cronjob with something like “find /raid/ > filelist.txt”, then I can use standard grep (with regular expressions as needed) to find files. Works pretty well even with a large raid containing ~20M files; a search still takes no more than a few seconds.

    1. Any reason for not using ‘locate’?
      (Which basically runs a nightly cron job indexing your file systems.)

      By default it uses globbing, but you can do regex searches with:
      $ locate -r <>

  5. I didn’t watch the video, but I would be surprised if it didn’t cover this one. Since it’s not in the article, I’ll just leave this here:

    $ find src/ -type f -name "*.js" -exec grep -nHi MyFunctionName \{\} \;

    Or:
    $ find -type f -name "*.<ext>" -exec grep -nHi <pattern> \{\} \;

    Search for files with extension <ext> and grep them for <pattern>.

  6. plocate is blindingly fast (do the indexing periodically from cron). It is efficient even when called from a bash CGI script, so you can easily implement a very fast three-term (grep pipeline) search engine that can handle tens of terabytes of documents (millions of files). But if you have a lot of content where the filename is not the ideal search term, you need to look at Apache Solr or similar to index the _contents_ of the files too. There are also some newer AI-based add-on layers in that area, but I’m yet to play with them so can’t recommend anything. The end goal is to have your entire life securely documented and intelligently searchable, a personal narrative upgrade module for your wetware.

  7. We’ve used Recoll for so many years we don’t know when we started. We like it. It just works, like a lot of things in Linux. Until recently it was giving us too many duplicate files in the results; then we took the time to figure out how to eliminate duplicates. Done. Thumbs up for Recoll.

  8. For source code searches, ack and ag (“the silver searcher”) have provided a lot of help under Linux, alongside tags-related tools and Doxygen’s browser interface. I have wished for Linux versions of Windows desktop tools like SourceInsight. Cloud-resident files make the case for cloud-aware search apps like Copernic Desktop Search and the Windows/Edge feature “Work Search”.
