Sunday, October 2, 2011

How to Find Anything Under Linux

The Linux find, grep, and awk commands are amazing power tools for fine-grained file searches, and for finding things inside files. With them you can find the largest and newest files on a system, fine-tune search parameters, search for text inside files, and perform some slick user management tricks.

Find Largest or Newest Files

The find command can do nearly anything, if you can figure out how. This example hunts down space hogs by finding the 10 largest files on your system, and sorts them from small to large in human-readable form:
# find / -type f -exec du {} \; 2>/dev/null | sort -n | tail -n 10 | xargs -n 1 du -h 2>/dev/null
1.2G	/home/carla/.local/share/Trash/files/download
1.3G	/home/carla/sda1/carla/.VirtualBox/Machines/ubuntu-hoary/Snapshots/{671041dd-700c-4506-68a8-7edfcd0e3c58}.vdi
2.2G	/home/carla/.local/share/Trash/files/dreamstudio.iso
[...]
These results remind me why I don’t like having a Trash bin, because when I delete something I mean it, by cracky. This command is a brute-force search of the entire filesystem and may take a few minutes to run, so use it as an excuse to go have a quick healthy walk outside. Of course you can modify the command to search whatever directories you want; for example, use find /var/ to hunt down obese logfiles.
Let’s dissect the command. find / -type f means “search every regular file under the root directory.” The -exec option runs another command on each match, in this case du, the disk usage command. -exec du {} \; means “run du on every file to get its size.” By default GNU du reports disk usage in 1 KB blocks, not bytes. 2>/dev/null sends all error messages to the bitbucket, so they don’t clutter up your results. You can delete both 2>/dev/null occurrences and rerun the command if you’re curious about what you’re missing. sort -n puts all the files in order by size, and tail -n 10 displays the last 10, which thanks to the sort are the largest. You could stop there, and then your output would look like this:
1206316	/home/carla/.local/share/Trash/files/download
2209784	/home/carla/.local/share/Trash/files/dreamstudio.iso
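xargs -n 1 du -h adds the final refinement, rerunning du on each path to convert the sizes from 1 KB blocks to an easy-to-read format. xargs also hands du the size number from each line as if it were a filename; that lookup fails, and the second 2>/dev/null hides the error. For the same reason, this pipeline mishandles filenames that contain spaces.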
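As an alternative sketch, GNU find can report sizes on its own through -printf, which avoids starting du once per file. The throwaway directory and the file names below are made up for the demonstration, and %s reports the apparent size in bytes rather than the disk usage that du measures.

```shell
# Build a small throwaway tree so the example is safe to run anywhere.
demo=$(mktemp -d)
head -c 3000 /dev/zero > "$demo/big file.bin"   # name with a space on purpose
head -c 2000 /dev/zero > "$demo/medium.bin"
head -c 1000 /dev/zero > "$demo/small.bin"

# %s is the size in bytes and %p the path (GNU find). Sort numerically and
# keep the two largest. No du and no xargs, so spaces in names are harmless.
largest=$(find "$demo" -type f -printf '%s\t%p\n' | sort -n | tail -n 2)
printf '%s\n' "$largest"

rm -rf "$demo"   # clean up the demonstration files
```

To use this for real, replace "$demo" with / or /var and raise the tail count to 10.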
You can easily find all files on your system that were changed in the last five minutes:
# find / -mmin -5 -type f
This command finds all files changed between 10 and 20 minutes ago:
# find / -mmin +10 -mmin -20 -type f
+10 means more than 10 minutes ago, and -20 means less than 20. If you do not use a plus or minus, it means that number exactly. Use -mtime to search by 24-hour days. If you want to find directories, use -type d.
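The time tests are easier to trust after trying them on files with known ages. This is a minimal sketch that assumes GNU touch (for the relative date strings) and uses invented file names in a temporary directory.

```shell
demo=$(mktemp -d)
touch "$demo/fresh.txt"                        # modified just now
touch -d '15 minutes ago' "$demo/recent.txt"   # GNU touch date syntax
touch -d '3 days ago' "$demo/old.txt"

# Between 10 and 20 minutes old: only recent.txt qualifies.
window=$(find "$demo" -type f -mmin +10 -mmin -20)

# -mtime counts whole 24-hour periods and drops the fraction, so +1 means
# "at least two days old". Only old.txt qualifies.
stale=$(find "$demo" -type f -mtime +1)

printf 'window: %s\nstale: %s\n' "$window" "$stale"

rm -rf "$demo"   # clean up the demonstration files
```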

Searching Multiple Directories

You can list multiple arbitrary directories in which to search like this:
# find /etc /var /mnt /media -xdev -mmin -5 -type f
-xdev limits the search to the filesystems that your starting directories are on, and will not descend into any other mounted filesystems. By default find does not follow symlinks, so you only need to include -xdev to stay inside a filesystem and not go wandering through network shares and removable devices.

Excluding Directories

You can narrow your searches by excluding directories with the prune option. prune is a little weird; you have to think backwards. This example searches the whole filesystem except for the /proc and /sys pseudo-directories:
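# find / \( -name proc -o -name sys \) -prune -o -type f -mmin -1 -print
First you name the directories to exclude, where -o means “or,” and escape the parentheses. Then -prune -o means “don’t look in the previously named directories, or else apply the tests that follow.” The explicit -print at the end matters: without it, find also prints the names of the pruned directories themselves. Note that -name matches at any depth, so a directory named proc or sys elsewhere on the system is skipped too.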
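When only one specific directory should be skipped, -path can stand in for -name. The following sketch uses a made-up directory layout in a temporary directory to show the difference.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/cache" "$demo/docs/cache"
touch "$demo/cache/junk.tmp" "$demo/docs/cache/keep.txt" "$demo/docs/notes.txt"

# -path matches the whole path, so only the top-level cache directory is
# pruned; -name cache would also have skipped docs/cache.
# -print is attached to the right-hand side of -o, so the pruned directory
# itself never shows up in the output.
found=$(find "$demo" -path "$demo/cache" -prune -o -type f -print | sort)
printf '%s\n' "$found"

rm -rf "$demo"   # clean up the demonstration files
```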
I like to use prune to exclude web browser caches, because they clutter the results. The following example does that, and also prints the date and time for each file:
$ find / \( -name proc -o -name sys -o -name .mozilla -o -name chromium \) -prune -o -type f -mmin -10 -printf "%Ac\t%p\n"
Wed 28 Sep 2011 10:34:54 AM PDT	/home/carla/.local/share/akonadi/db_data/ib_logfile0
Wed 28 Sep 2011 10:34:54 AM PDT	/home/carla/.local/share/akonadi/db_data/ibdata1
Wed 28 Sep 2011 05:21:48 PM PDT	/home/carla/articles/findgrep.html
The -printf option is “print format.” Use -printf when you want to control the formatting of your output. You get to specify newlines, date and time formatting, and file attributes such as permissions, ownership, and time stamps. %Ac prints the file’s last access date and time (use %Tc for the last modification time instead), \t inserts a tab, %p prints the full filename, and \n inserts a newline.
As you can see, find has a lot of built-in functionality for jobs that people often pipe its output to ls for.

Finding File Types

Searching by file extension is easy too. This example searches the current directory for three different types of image files:
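$ find . -type f \( -name "*.png" -o -name "*.jpg" -o -name "*.gif" \)
The escaped parentheses group the three -name tests so that -type f applies to all of them; without the grouping it would apply only to the last one. Use the -name option to match the whole filename against a shell wildcard pattern, whether that is the extension or part of the name. For example, to find mysong.ogg you could search for "mys*" or "*song*". Quote the pattern so that the shell does not expand it before find sees it. Use -iname for a case-insensitive search.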
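Here is a small sketch of a grouped, case-insensitive extension search. The file names are invented, and the directory named photos.png is there only to show that -type f filters it out.

```shell
demo=$(mktemp -d)
mkdir "$demo/photos.png"     # a directory with a misleading name
touch "$demo/a.png" "$demo/b.JPG" "$demo/c.gif" "$demo/notes.txt"

# Parentheses group the -o alternatives so -type f applies to all of them;
# -iname makes the match case-insensitive, so b.JPG is found as well.
images=$(find "$demo" -type f \( -iname '*.png' -o -iname '*.jpg' -o -iname '*.gif' \) | sort)
printf '%s\n' "$images"

rm -rf "$demo"   # clean up the demonstration files
```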

Finding Duplicate Files

You can find duplicate files in a couple of ways. This command checks MD5 hashes:
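$ find . -type f -exec md5sum '{}' ';' | sort | uniq --all-repeated=separate -w 32
This calculates an MD5 hash for all the files, sorts them by hash, and prints every file whose hash appears more than once, with a blank line between each group of duplicates. -w 32 tells uniq to compare only the first 32 characters of each line, which is the full MD5 hash.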
The second way is to match files by file size:
$ find . -type f -printf "%p - %s\n" | sort -nr -k3 | uniq -D -f1
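MD5 hashes are more reliable, because two files of the same size can still have different contents. Matching file sizes is faster, because find never has to read the files, so treat its results as candidates to check rather than as confirmed duplicates. The size command also splits each line on whitespace, so filenames that contain spaces will confuse it.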
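The difference between the two approaches shows up clearly on a tiny test set. In this sketch, which assumes GNU find, md5sum, and uniq, three invented files all have the same size but only two share the same contents.

```shell
demo=$(mktemp -d)
printf 'same contents\n' > "$demo/one.txt"
printf 'same contents\n' > "$demo/two.txt"
printf 'same length!!\n' > "$demo/three.txt"   # same size, different data

# Size alone reports one shared size (14 bytes) covering all three files.
by_size=$(find "$demo" -type f -printf '%s\n' | sort -n | uniq -d)

# Hashing confirms only the true pair. -w 32 compares the full
# 32-character MD5 digest at the start of each md5sum output line.
by_hash=$(find "$demo" -type f -exec md5sum {} + | sort | uniq -w 32 --all-repeated=separate)

printf 'shared size: %s bytes\n%s\n' "$by_size" "$by_hash"

rm -rf "$demo"   # clean up the demonstration files
```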

Finding Text Inside Files

The grep command is endlessly useful for searching inside text files to find things. Suppose you have a directory full of configuration files for a server, and you want to search all of them to find all of your test entries. If you were foresightful you used the word “test” in all of them, so this command will find them:
# grep -inR -A2 test /etc/fooserver/
This tells grep to do a case-insensitive (-i) recursive (-R) search for “test” in all the files in the /etc/fooserver/ directory, and to print the two lines following each matching line (-A2). The -n option prints line numbers, which is a nice bonus in large files.
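A plain search for “test” also matches words that merely contain it, such as “latest.” One way to tighten the search, sketched here against an invented configuration file, is grep’s -w option, which matches whole words only.

```shell
demo=$(mktemp -d)
cat > "$demo/server.conf" <<'EOF'
# latest settings
name = Test-Host
port = 8080
EOF

# -w matches "test" only as a whole word, so "latest" on line 1 is ignored.
# -i ignores case, -n adds line numbers, and -R recurses into the directory.
hits=$(grep -inRw test "$demo")
printf '%s\n' "$hits"

rm -rf "$demo"   # clean up the demonstration files
```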

Finding Blocks of Text

The awk command can find blocks of related text in a way that grep can’t, using this simple syntax: awk '/start-pattern/,/stop-pattern/'. Suppose you want to see expanded information from lspci for just your Ethernet device:
$ lspci -v | awk '/[Ee]thernet/,/^$/'
08:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)
        Subsystem: Lenovo Device 2131
        Flags: bus master, fast devsel, latency 0, IRQ 46
        I/O ports at 3000 [size=256]
        Memory at f2004000 (64-bit, prefetchable) [size=4K]
        Memory at f2000000 (64-bit, prefetchable) [size=16K]
        [virtual] Expansion ROM at f2020000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: r8169
        Kernel modules: r8169
You need to know the beginning and end of the block that you want to see, so it’s a great tool for quickly snagging sections of configuration files.
This example takes advantage of configuration blocks delimited with curly braces, and homes in on the listen directives in radiusd.conf:
# awk '/listen {/,/}/' /etc/freeradius/radiusd.conf
listen {
	    ipaddr = *
#	    ipv6addr = ::
	    port = 0
	    type = acct
#	    interface = eth0
#	    clients = per_socket_clients
}
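To experiment with range patterns without touching a real server, the same idea can be tried on a scratch file. The file contents below are invented, and the braces are wrapped in brackets as a precaution so that every awk implementation treats them as literal characters.

```shell
demo=$(mktemp -d)
cat > "$demo/sample.conf" <<'EOF'
log {
    level = info
}
listen {
    port = 0
    type = acct
}
EOF

# Print from the line that starts with "listen {" through the next line
# that starts with a closing brace. The log block is left out.
block=$(awk '/^listen [{]/,/^[}]/' "$demo/sample.conf")
printf '%s\n' "$block"

rm -rf "$demo"   # clean up the demonstration files
```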

Managing Users and Files

Employees leave, and file ownership and permissions get messed up on an organization’s system files – but don’t worry, find can help you set things right quickly. You can find all files that belong to a specified username:
# find / -user carla
Or to a group:
# find / -group admins
You can also search by UID and GID with the -uid and -gid options. You can then move all of a user’s files to another user by either username or UID:
# find / -uid 1100 -ok chown -v 1200 {} \;
# find / -user carla -ok chown -v steven {} \;
Of course this works for changing group membership as well:
# find / -group carla -ok chgrp -v admins {} \;
The -ok option requires you to verify each and every change. Replace it with -exec if you’re confident about your changes.
When employees leave you may have a policy of deleting their files, which find can do with ease:
# find / -user 1100 -exec rm {} \;
Of course you want to be very sure you have it right, because find won’t nag you and ask if you are sure. It will simply do what you tell it to.
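One cautious habit is to run the search with -print first, review the list, and only then swap in the destructive action. This sketch works in a temporary directory, uses the current user’s UID as a stand-in for the departed employee, and relies on GNU find’s -delete action in place of -exec rm.

```shell
demo=$(mktemp -d)
touch "$demo/report.txt" "$demo/notes.txt"
uid=$(id -u)     # the current user stands in for the departed one

# Step 1: dry run. List what would be removed and review it.
preview=$(find "$demo" -type f -uid "$uid" -print | sort)
printf '%s\n' "$preview"

# Step 2: the same tests, with the action swapped for -delete (GNU find).
find "$demo" -type f -uid "$uid" -delete
remaining=$(find "$demo" -type f | wc -l)
echo "files left: $remaining"

rm -rf "$demo"   # clean up the demonstration directory
```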
find, grep, and awk – with tools like these, and maybe a little help from their man pages, you can find just about anything on your Linux systems. Source: http://olex.openlogic.com