
Linux – disk usage (du) human readable AND sorted by size

This is a quick tip to fix a problem that has always bugged me. When showing disk usage in human-readable form (KB, MB, GB) for each subdirectory using “du -sh *”, how can you properly sort it into size order?

If you just want the solution, here it is…

alias duf='du -sk * | sort -n | perl -ne '\''($s,$f)=split(m{\t});for (qw(K M G)) {if($s<1024) {printf("%.1f",$s);print "$_\t$f"; last};$s=$s/1024}'\'

Put it into ~/.bashrc to make it permanent.

But if you can spare a minute or two, you might get some ideas about how to write programmatic aliases like this one, in this case using perl.

When using the Linux “du” command I like to make the file sizes human readable, so 8709100 (kilobytes) becomes 8.4G. This is achieved with:

du -sh *

Now, the main problem with the K, M and G size suffixes is that sort can’t order them correctly.

If you try to pipe that through sort by using

du -sh * | sort

you’ll get something like this

12K keys
12M Pictures
2.6G Documents
536K scripts
8.4G Desktop

or if we sort numerically

du -sh * | sort -n

you’ll get something like this

2.6G Documents
8.4G Desktop
12K keys
12M Pictures
536K scripts
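To reproduce both orderings without scanning the disk, you can feed sort the same five lines by hand. This is a small sketch: `sample` is a hypothetical helper that uses printf to fake du’s tab-separated output with the sizes shown above.

```shell
#!/bin/sh
# Hypothetical helper: prints the five "size<TAB>name" lines from the example,
# standing in for the output of `du -sh *`
sample() {
  printf '%s\t%s\n' 8.4G Desktop 2.6G Documents 12K keys 12M Pictures 536K scripts
}

echo '--- sort (compares the lines as text)'
sample | sort

echo '--- sort -n (compares only the leading number)'
sample | sort -n
```

In both cases the suffix never enters into the comparison in any meaningful way, which is the whole problem.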

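Obviously neither command does what we want, because sort has no idea what the K(ilo), M(ega) and G(iga) suffixes mean: plain sort compares the lines as text, and sort -n only looks at the leading number, so small directories can end up listed after large ones. The solution is a one-liner wrapped up in an alias, ‘duf’ for ‘disk usage formatted’: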

alias duf='du -sk * | sort -n | perl -ne '\''($s,$f)=split(m{\t});for (qw(K M G)) {if($s<1024) {printf("%.1f",$s);print "$_\t$f"; last};$s=$s/1024}'\'

When expanded out, formatted and commented, the code looks like this:

du -sk * | sort -n |          # get usage in kilobytes and sort numerically
perl -ne '                    # use perl to reformat the size in K, M or G
  ($s, $f) = split(m{\t});    # split the size/filename pair on the tab
  for (qw(K M G)) {           # try each suffix in turn
    if ($s < 1024) {          # if s < 1024 we have found the right suffix
      printf("%.1f", $s);     # print the size
      print "$_\t$f";         # print the suffix, a tab and the filename
      last;                   # this line is done
    }
    $s = $s / 1024;           # otherwise divide by 1024 and try the next suffix
  }'

This produces the output we intended:

12.0K	keys
536.0K	scripts
11.7M	Pictures
2.5G	Documents
8.3G	Desktop
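You may notice these figures are slightly lower than the du -sh ones (8.3G against 8.4G for Desktop, 11.7M against 12M for Pictures). That is a rounding difference rather than a bug: printf’s %.1f rounds to the nearest tenth, while GNU du -h rounds up. A small awk sketch to check this with the 8709100 KB example from earlier; `compare_rounding` is just an illustrative name.

```shell
#!/bin/sh
# Compare round-to-nearest (printf %.1f, as in the perl loop) with rounding up
# (what GNU du -h does) for a size given in kilobytes
compare_rounding() {
  awk -v kb="$1" 'BEGIN {
    g = kb / 1024 / 1024                    # kilobytes -> gigabytes (1024-based)
    printf "nearest:    %.1fG\n", g         # what duf prints
    up = int(g * 10)                        # tenths, rounded down
    if (up < g * 10) up++                   # bump up if a fraction was dropped
    printf "rounded up: %.1fG\n", up / 10   # what du -sh prints
  }'
}

compare_rounding 8709100    # the example size used earlier in the post
```

8709100 KB is about 8.306G, so rounding to the nearest tenth gives 8.3G and rounding up gives 8.4G.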

Here are some useful additions from readers, worth adding as an edit to my original post:

1) Purely as a shell script, without the perl overhead (source: 'inataysia' on reddit). Shell arithmetic is integer-only, so sizes are rounded down (a 1.9G directory shows as 1G):
du -sk * | sort -n | while IFS=$'\t' read -r size fname; do for unit in K M G T P E Z Y; do if [ "$size" -lt 1024 ]; then printf '%s%s\t%s\n' "$size" "$unit" "$fname"; break; fi; size=$((size/1024)); done; done

2) As a function instead of an alias, which lets you pass parameters to du (source: 'fire'):
function duf {
du -sk "$@" | sort -n | perl -ne '($s,$f)=split(/\t/,$_,2);for(qw(K M G T)){if($s<1024){$x=($s<10?"%.1f":"%3d");printf("$x$_\t%s",$s,$f);last};$s/=1024}'
}

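Combining the two gives a shell-only function that also accepts arguments, which is probably the best solution so far. Call it as duf * to list each entry; with no arguments, du -s only reports the total for the current directory.
function duf {
du -sk "$@" | sort -n | while IFS=$'\t' read -r size fname; do for unit in K M G T P E Z Y; do if [ "$size" -lt 1024 ]; then printf '%s%s\t%s\n' "$size" "$unit" "$fname"; break; fi; size=$((size/1024)); done; done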
}
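The shell-only function divides with integer arithmetic, so it drops the decimal place (an 11.7M directory shows as 11M). If you want to stay perl-free but keep one decimal, awk can run the same divide-by-1024 loop in floating point. This is a sketch; `human_kb` and `duf_awk` are hypothetical names. Unlike the original alias, whose loop only knows K, M and G and so prints nothing at all for a directory of 1024G or more, it falls back to the last suffix, T.

```shell
#!/bin/sh
# Hypothetical filter: reads "size-in-KB<TAB>name" lines (the output of du -sk)
# and prints the size with one decimal place and a K, M, G or T suffix,
# following the same divide-by-1024 loop as the perl version
human_kb() {
  awk '{
    s = $1 + 0                              # size in kilobytes
    name = substr($0, index($0, "\t") + 1)  # everything after the first tab
    n = split("K M G T", units, " ")
    for (i = 1; i <= n; i++) {
      if (s < 1024 || i == n) {             # right suffix found, or none left
        printf "%.1f%s\t%s\n", s, units[i], name
        break
      }
      s = s / 1024                          # move up one unit
    }
  }'
}

# Hypothetical duf variant built on it; call it as: duf_awk *
duf_awk() {
  du -sk "$@" | sort -n | human_kb
}
```

Because the name is taken as everything after the first tab, filenames containing spaces come through intact.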


15 comments to Linux – disk usage (du) human readable AND sorted by size

  • Casper

    Thanks for this cool script. One caveat is to watch out to use this in some top-level folder, it can take a very long time to finish. (Wish we had file systems that maintained directory size somehow.)

  • [...] This post was Twitted by metoikos – Real-url.org [...]

  • du -s * | sort -n | sed -Ee 's/^[0-9]+./"/' -e 's/$/"/' | xargs du -sh

    Perl-less implementation; a little extra effort for filenames with spaces. (Yours doesn’t have to worry about that, obviously.)

  • Michael Speer

    http://www.nabble.com/Human-readable-sort-td23223205.html

    Never discount simply fixing the underlying problem.

  • That’s always bugged me as well! I’ve made a few changes,
    though, so that it produces the same formatted output as
    du -sh. Also, as a function it can take arguments:


    function duf {
    du -sk "$@" | sort -n | perl -ne '($s,$f)=split(/\t/,$_,2);for(qw(K M G T)){if($s<1024){$x=($s<10?"%.1f":"%3d");printf("$x$_\t%s",$s,$f);last};$s/=1024}'
    }

  • @casper

    I don’t.

    I prefer not to pay an additional cost on every write, to speed up this far-less-frequent case.

  • Latest version of sort (part of coreutils) supports -h (correct sorting of M,k,G suffixes).

  • I use the following… you see it use ‘du’ two times, but this is not really slower, ’cause the operating system caches.

    # sorted du -hsc
    function duhs() {
    du -s $* | sort -n | cut -f 2- | while read a; do du -sh $a; done
    }

  • DVoita

    If you modify du -sk * to du -sk * .??* you can see hidden dot files as well.

  • chris

    Thanks to inataysia on reddit for a bash only version

    du -sk * | sort -n | while read size fname; do for unit in k M G T P E Z Y; do if [ $size -lt 1024 ]; then echo -e "${size}${unit}\t${fname}"; break; fi; size=$((size/1024)); done; done

  • Why not promote the ‘human-readability’ step to a standalone utility?

    Let’s call it ‘hu’ for ‘human units’. Hypothetically, it would convert any whitespace-delimited numbers found on stdin to human-readable units when echoing to stdout. (Optional arguments could limit this conversion to just certain fields or to alternate unit systems.) Then the solution would be:

    du -sb * | sort -n | hu

  • Michael Speer

    Sat Jan 20 06:00:09 1996 Jim Meyering (——@na-net.ornl.gov)

    —snip—

    * du.c (main): New options --human-readable (-h) and --megabytes (-m).
    (human_readable): New function.
    From Larry McVoy (——@sgi.com).

    Ever since this patch was included in fileutils, system administrators have been frustrated by finding that while they could `du -h` they could not then `sort -h` the output. -h is not POSIX but is now solidly a part of the GNU coreutils du and ls commands. Including a switch for sort that respects the switch for du was not my invention. It has been argued a number of times on the developers mailing list. Mine was simply the straw which broke the camel’s back. The additional switch is consistent with the other tools, and merely augments the purpose of sort without creating a differing utility to it.

    Something of the functionality of `hu` may have been the appropriate fix in ’96, but since the ’96 -h switch is long set, adding a corresponding switch to sort seems only too appropriate. To ‘promote’ -h out of du, df and ls into a separate utility would break scripts of users that depend on it.

  • Josh

    You can also set the BLOCK_SIZE environment variable to the value human-readable and all the GNU coreutils that report sizes will respect it.

  • my solution

    du -s * 2>/dev/null | sort -n | cut -f2 | xargs du -sh 2>/dev/null

  • I like ‘-h’ too; it doesn’t have to go away for ‘hu’ to also exist and be useful in other contexts, or when people need a sort to precision hidden by ‘-h’ rounding.
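As the comments point out, newer GNU coreutils make most of this unnecessary: sort -h (--human-numeric-sort) understands the K, M and G suffixes that du -h prints, so the whole trick collapses to du -sh * | sort -h. A minimal sketch, assuming a sort that supports -h (GNU coreutils 7.5 or later); `dus` is a hypothetical name, and the printf line reuses the five sample sizes from the post so you can see the ordering without scanning the disk.

```shell
#!/bin/sh
# Hypothetical function: human-readable du output, sorted smallest to largest.
# Needs a sort that supports -h (GNU coreutils 7.5 or later).
dus() {
  du -sh -- "$@" | sort -h
}

# The five sample sizes from the post, sorted by sort -h
printf '%s\t%s\n' 8.4G Desktop 2.6G Documents 12K keys 12M Pictures 536K scripts | sort -h
```

Usage: dus * (or dus * .??* to include dot files, as DVoita suggests).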
