Largest Files and Directories in a Linux Filesystem

A simple Bash script that reports disk usage for a directory, with nicely formatted output.

The results show:

a summary for the file system (line 6 of the script, using df -m)
a list of files larger than 50 MB (line 16, find -size +50M)
a list of the 20 largest directories (line 27, head -20)

Directory sizes are recursively calculated (the totals include subdirectories).
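To see what "recursively calculated" means in practice, here is a minimal sketch that builds a throwaway directory tree and runs du on it. The directory and file names are made up for illustration, and the -b option (apparent size in bytes) assumes GNU du, as in the script.

```shell
#!/bin/bash
# Build a small throwaway tree under a temp directory (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/parent/child"
head -c 1000 /dev/zero > "$demo/parent/a.bin"        # 1000 bytes in parent/
head -c 5000 /dev/zero > "$demo/parent/child/b.bin"  # 5000 bytes in parent/child/

# One line per directory; -b reports apparent size in bytes (GNU du).
du -b "$demo/parent"

# The parent total covers a.bin plus everything under child/.
parent_bytes=$(du -sb "$demo/parent" | cut -f1)
child_bytes=$(du -sb "$demo/parent/child" | cut -f1)
echo "parent=$parent_bytes child=$child_bytes"

rm -rf "$demo"
```

The exact totals depend on the file system, because du also counts the space taken by the directory entries themselves, but the parent figure is always at least the child figure plus the 1000 bytes of a.bin.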

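In line 6, the -P option of grep lets us use Perl-compatible regex syntax (so lookaheads are supported), and the -o option returns only the matched portion of the searched text. Here the lookahead (?=%) matches the run of digits that sits immediately before the % sign in the df output, without including the % itself.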
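The lookahead is easier to follow on a fixed piece of text. The sketch below runs the same grep expression against a made-up line in the layout df -m prints. The used_percent function name and the sample values are assumptions for illustration, and -P requires a GNU grep built with PCRE support.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): extract the
# digits that come directly before a % sign from text on stdin.
used_percent() {
  grep -Po '[0-9]+(?=%)'
}

# Made-up sample line in the column layout of df -m.
sample='/dev/sda1         100000 42000     58000  42% /opt'

# Only the digits before the % are printed; the other numbers are not
# followed by %, so the lookahead rejects them.
echo "$sample" | used_percent
```

The script then appends a literal % after the command substitution, which is why the percent sign still shows up in the final message.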

The script:

Bash


#!/bin/bash

cd /opt/myDir || exit 1
# Usage summary for the file system that holds /opt/myDir
echo
echo "Space used for /opt/myDir: $(df -m . | grep -Po '[0-9]+(?=%)')%"
echo
df -m .

echo
echo "Getting list of largest files..."
echo
# Files larger than 50 MB, largest first (error messages are discarded)
find /opt/myDir \
  -type f \
  -size +50M \
  -exec ls -l {} \; 2> /dev/null \
  | gawk '{ printf("%15'\''d - %s\n", $5, $NF) }' \
  | sort -nrk 1,1

echo
echo "Getting list of largest dirs..."
echo
# The 20 largest directories by cumulative size in bytes
du -b /opt/myDir \
  | sort -n -r \
  | head -20 \
  | gawk '{ printf("%15'\''d - %s\n", $1, $2) }'

echo

The output is formatted for readability. For example, running the script against a directory that contains a Maven repo gives a directory size listing like this:

149,731,531 - .  
109,943,053 - ./.m2  
109,943,029 - ./.m2/repository  
 77,870,516 - ./.m2/repository/org  
 38,947,091 - ./.ivy2  
 38,947,072 - ./.ivy2/cache  
 10,494,306 - ./.m2/repository/com/google  
  8,723,165 - ./.m2/repository/org/apache/maven  
  8,471,569 - ./.m2/repository/org/hibernate

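The directory sizes are right-justified in a 15-character field using %15, and the thousands separators are added by the ' flag of printf (line 28). In the script the flag is written as '\'' because the gawk program is itself wrapped in single quotes, so the quote has to be closed, escaped, and reopened. The separator character comes from the current locale, so in the C/POSIX locale no separators are printed.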
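The formatting step can be tried on its own, without running du. This is a small sketch that feeds two size/path pairs taken from the listing above into the same gawk printf call. The format_size function name is an assumption for illustration.

```shell
#!/bin/bash
# Hypothetical wrapper around the gawk call from the script:
# reads "size path" pairs on stdin and prints them right-justified.
format_size() {
  # '\'' closes the shell quote, adds a literal ', and reopens the quote,
  # so gawk sees the format string %15'd - %s\n
  gawk '{ printf("%15'\''d - %s\n", $1, $2) }'
}

printf '%s\n' '149731531 .' '8471569 ./.m2/repository/org/hibernate' | format_size
```

In a locale such as en_US.UTF-8 the numbers come out grouped with commas, as in the listing above. In the C locale they are only padded to 15 characters.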