The find, grep, and awk commands are amazing power tools for fine-grained file searches, and for finding things inside files. With them you can find the largest and newest files on a system, fine-tune search parameters, search for text inside files, and perform some slick user management tricks.
Find Largest or Newest Files
The find command can do nearly anything, if you can figure out how. This example hunts down space hogs by finding the 10 largest files on your system, and sorts them from small to large in human-readable form:
# find / -type f -exec du {} \; 2>/dev/null | sort -n | tail -n 10 | xargs -n 1 du -h 2>/dev/null
These results remind me why I don’t like having a Trash bin, because when I delete something I mean it, by cracky. This command is a brute-force search of the entire filesystem and may take a few minutes to run, so use it as an excuse to go have a quick healthy walk outside. Of course you can modify the command to search whatever directories you want; for example, use find /var/ to hunt down obese logfiles.
Let’s dissect the command.
find / -type f means “search all files in the entire root filesystem.” The -exec option is for incorporating other commands, in this case du, the disk usage command. -exec du {} \; means “run the du command on every file to get its size.” Note that du reports disk usage in blocks rather than bytes; GNU du counts 1,024-byte blocks by default. 2>/dev/null sends all error messages to the bit bucket, so they don’t clutter up your results. You can delete both 2>/dev/null occurrences and rerun the command if you’re curious about what you’re missing. sort -n puts all the files in order by size, and tail -n 10 displays the last 10, which thanks to the sort are the largest. You could stop there, and your output would be raw block counts followed by filenames.
xargs -n 1 du -h adds the final refinement, converting the file sizes from block counts to an easy-to-read format.
You can easily find all files on your system that were changed in the last five minutes:
# find / -mmin -5 -type f
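To check how -mmin -5 behaves without scanning the whole system, a scratch directory with back-dated files is enough. This is a minimal sketch that assumes GNU touch (for -d with a relative date); the directory and file names are made up for illustration.

```shell
# Scratch directory with hypothetical file names; nothing real is touched.
demo_dir=$(mktemp -d)
touch "$demo_dir/fresh.txt"                      # modified just now
touch -d '15 minutes ago' "$demo_dir/older.txt"   # back-dated (GNU touch)
touch -d '2 hours ago' "$demo_dir/stale.txt"

# Only files modified less than five minutes ago should be listed.
recent=$(find "$demo_dir" -type f -mmin -5)
echo "$recent"

rm -rf "$demo_dir"
```

Only fresh.txt should be listed; the two back-dated files fall outside the five-minute window.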
This command finds all files changed between 10 and 20 minutes ago:
# find / -mmin +10 -mmin -20 -type f
+10 means more than 10 minutes ago, and -20 means less than 20 minutes ago. If you do not use a plus or minus, it means that number exactly. Use -mtime to search by 24-hour days. If you want to find directories, use -type d.
Searching Multiple Directories
You can list multiple arbitrary directories in which to search by naming them one after another, before any search options. The -xdev option limits the search to the filesystem you are in and will not enter any other mounted filesystems. By default find does not follow symlinks, so you only need to include -xdev to stay inside a filesystem and not go wandering through network shares and removable devices.
Excluding Directories
You can narrow your searches by excluding directories with the -prune option. -prune is a little weird; you have to think backwards. This example searches the whole filesystem except for the /proc and /sys pseudo-directories; the explicit -print at the end keeps the names of the pruned directories themselves out of the output:
# find / \( -name proc -o -name sys \) -prune -o -type f -mmin -1 -print
First you name the directories to exclude, where -o means “or,” and escape the parentheses. Then -prune -o means “don’t look in the previously named directories.”
I like to use -prune to exclude web browser caches, because they clutter the results. The following example does that, and also prints the date and time for each file:
$ find / \( -name proc -o -name sys -o -name .mozilla -o -name chromium \) -prune -o -type f -mmin -10 -printf "%Ac\t%p\n"
Wed 28 Sep 2011 10:34:54 AM PDT /home/carla/.local/share/akonadi/db_data/ib_logfile0
Wed 28 Sep 2011 10:34:54 AM PDT /home/carla/.local/share/akonadi/db_data/ibdata1
Wed 28 Sep 2011 05:21:48 PM PDT /home/carla/articles/findgrep.html
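The pruning pattern is easier to trust after trying it on a throwaway tree. The following is a small sketch, not the author's command: the directory names (cache, docs) and file names are hypothetical, and it assumes GNU find, whose %P directive prints each path relative to the starting directory.

```shell
# Build a tiny scratch tree with one directory we want to skip.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/cache" "$demo_dir/docs"
touch "$demo_dir/cache/junk.tmp" "$demo_dir/docs/report.txt" "$demo_dir/top.txt"

# Skip any directory named "cache", then list regular files only.
# Because -printf is an explicit action on the right-hand side of -o,
# the pruned directory itself is not printed.
kept=$(find "$demo_dir" -name cache -prune -o -type f -printf '%P\n' | sort)
echo "$kept"

rm -rf "$demo_dir"
```

The listing should contain docs/report.txt and top.txt, but nothing from cache.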
The -printf option is “print format.” Use -printf when you want to control the formatting of your output. You get to specify newlines, date and time formatting, and file attributes such as permissions, ownership, and time stamps. %Ac prints the date and time of the file’s last access (%Tc prints the last modification time instead), \t inserts a tab, %p prints the full filename, and \n inserts a newline.
As you can see, find has a lot of built-in functionality that people often add the ls command for.
Finding File Types
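Searching by file extension is easy too. This example searches the current directory for three different types of image files. The escaped parentheses group the -name tests so that -type f applies to all three; without them, -type f would bind only to the last -name test:
$ find . -type f \( -name "*.png" -o -name "*.jpg" -o -name "*.gif" \)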
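The effect of the grouping can be seen in a scratch directory that contains a directory with an image-like name. This is an illustrative sketch with made-up names (shots.png, a.jpg, and so on), comparing the ungrouped and grouped forms.

```shell
demo_dir=$(mktemp -d)
mkdir "$demo_dir/shots.png"          # a directory, not an image file
touch "$demo_dir/a.jpg" "$demo_dir/b.gif" "$demo_dir/notes.txt"

# Ungrouped: the implicit "and" binds tighter than -o, so -type f
# applies only to the "*.gif" test and the directory slips through.
ungrouped=$(cd "$demo_dir" && find . -name "*.png" -o -name "*.jpg" -o -name "*.gif" -type f | sort)

# Grouped: -type f is checked for every alternative.
grouped=$(cd "$demo_dir" && find . -type f \( -name "*.png" -o -name "*.jpg" -o -name "*.gif" \) | sort)

echo "ungrouped:"; echo "$ungrouped"
echo "grouped:";   echo "$grouped"

rm -rf "$demo_dir"
```

The ungrouped run lists the shots.png directory alongside the two image files, while the grouped run lists only a.jpg and b.gif.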
Use the -name option to search on any part of a filename, either the extension or part of the name. For example, to find mysong.ogg you could search for "mys*", or any part of it, using normal shell wildcards. Quote the pattern so that the shell passes it to find unexpanded. Use -iname for a case-insensitive search.
Finding Duplicate Files
You can find duplicate files in a couple of ways. This command checks MD5 hashes:
$ find . -type f -exec md5sum '{}' ';' | sort | uniq --all-repeated=separate -w 24
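This calculates an MD5 hash for all the files, sorts them by hash, and prints every line whose first 24 characters match another line, with each group of duplicates separated by a blank line. A full MD5 hash is 32 hexadecimal digits, so -w 32 compares the entire hash.
The second way is to match files by file size.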
MD5 hashes are more accurate, but matching file sizes is faster.
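The two approaches can also be combined: use the cheap size comparison to pick candidates, then hash only those. The following is one possible sketch rather than the article's own command. It assumes GNU find, xargs, md5sum, and uniq, and the file names are invented for the demonstration.

```shell
demo_dir=$(mktemp -d)
printf 'same content\n' > "$demo_dir/a.txt"
printf 'same content\n' > "$demo_dir/b.txt"   # true duplicate of a.txt
printf 'different!!!\n' > "$demo_dir/c.txt"   # same size, different bytes
printf 'short\n'        > "$demo_dir/d.txt"   # unique size

# Step 1: print "size path" and keep only paths whose size occurs more than once.
# Step 2: hash those candidates and show groups that share a full 32-digit MD5.
dupes=$(
  find "$demo_dir" -type f -printf '%s %p\n' |
    awk '{ size = $1; sub(/^[0-9]+ /, ""); n[size]++; path[size, n[size]] = $0 }
         END { for (s in n) if (n[s] > 1) for (i = 1; i <= n[s]; i++) print path[s, i] }' |
    xargs -r -d '\n' md5sum |
    sort |
    uniq -w 32 --all-repeated=separate
)
echo "$dupes"

rm -rf "$demo_dir"
```

Here c.txt passes the size filter but is rejected by the hash comparison, and d.txt is never hashed at all, which is where the time saving comes from on large trees.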
Finding Text Inside Files
The grep command is endlessly useful for searching inside text files to find things. Suppose you have a directory full of configuration files for a server, and you want to search all of them to find all of your test entries. If you were foresightful you used the word “test” in all of them, so this command will find them:
# grep -inR -A2 test /etc/fooserver/
With these options, grep does a case-insensitive (-i) recursive (-R) search for “test” in all the files in the /etc/fooserver/ directory, and prints the two lines following each line that matches (-A2). The -n option prints line numbers, which is a nice bonus in large files.
Finding Blocks of Text
The awk command can find blocks of related text in a way that grep can’t, using this simple syntax: awk '/start-pattern/,/stop-pattern/'. Suppose you want to see expanded information from lspci for just your Ethernet device:
$ lspci -v | awk '/[Ee]thernet/,/^$/'
08:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)
Subsystem: Lenovo Device 2131
Flags: bus master, fast devsel, latency 0, IRQ 46
I/O ports at 3000 [size=256]
Memory at f2004000 (64-bit, prefetchable) [size=4K]
Memory at f2000000 (64-bit, prefetchable) [size=16K]
[virtual] Expansion ROM at f2020000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: r8169
Kernel modules: r8169
You need to know the beginning and end of the block that you want to see, so it’s a great tool for quickly snagging sections of configuration files.
This example takes advantage of configuration blocks delimited with curly braces, and homes in on the listen directives in radiusd.conf:
# awk '/listen {/,/}/' /etc/freeradius/radiusd.conf
listen {
ipaddr = *
# ipv6addr = ::
port = 0
type = acct
# interface = eth0
# clients = per_socket_clients
}
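The range pattern can be tried without a FreeRADIUS installation. The sketch below writes a small, made-up configuration file and extracts one block from it. The patterns are anchored and the braces escaped, which is a slightly stricter variant of the command above.

```shell
conf=$(mktemp)
cat > "$conf" <<'EOF'
name = demo
listen {
    port = 1812
    type = auth
}
log {
    file = /var/log/demo.log
}
EOF

# Print from the line that starts with "listen {" through the next
# line that starts with "}"; everything else is skipped.
block=$(awk '/^listen \{/,/^\}/' "$conf")
echo "$block"

rm -f "$conf"
```

The range ends at the first line that matches the stop pattern, so a block containing nested braces at the start of a line would be cut short; for such files a real parser is the safer choice.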
Managing Users and Files
Employees leave, and file ownership and permissions get messed up on an organization’s system files, but don’t worry, find can help you set things right quickly. You can find all files that belong to a specified username:
# find / -user carla
Or to a group:
# find / -group admins
You can also search by UID and GID with the -uid and -gid options. You can then move all of a user’s files to another user by either username or UID:
# find / -uid 1100 -ok chown -v 1200 {} \;
# find / -user carla -ok chown -v steven {} \;
Of course this works for changing group membership as well:
# find / -group carla -ok chgrp -v admins {} \;
The -ok option requires you to verify each and every change. Replace it with -exec if you’re confident about your changes.
When employees leave you may have a policy of deleting their files, which find can do with ease:
# find / -user 1100 -exec rm {} \;
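One cautious habit is to run the same tests with -print first and only then swap in the destructive action. The sketch below shows that two-step pattern on a scratch directory owned by the current user. The file names are hypothetical, and -type f is an addition here so that directories are never handed to rm.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/old-report.txt" "$demo_dir/old-notes.txt"

# Step 1: preview. Same tests as the destructive run, but with -print.
preview=$(find "$demo_dir" -uid "$(id -u)" -type f -print | sort)
echo "$preview"

# Step 2: delete only after the preview looks right.
find "$demo_dir" -uid "$(id -u)" -type f -exec rm {} \;

remaining=$(find "$demo_dir" -type f | wc -l)
echo "files left: $remaining"
rmdir "$demo_dir"
```

Because both runs share the same tests, the preview shows exactly the files that the second command will remove.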
Of course you want to be very sure you have it right, because find won’t nag you and ask if you are sure. It will simply do what you tell it to.
find, grep, and awk: with tools like these, and maybe a little help from their man pages, you can find just about anything on your Linux systems.
Source: http://olex.openlogic.com