About Tomasz Nurkiewicz

Java EE developer, Scala enthusiast. Enjoying data analysis and visualization. Strongly believes in the power of testing and automation.

Turning recursive file system traversal into Stream

When I was learning programming, back in the days of Turbo Pascal, I managed to list files in directory usingFindFirstFindNext and FindClose functions. First I came up with a procedure printing contents of a given directory. You can imagine how proud I was to discover I can actually call that procedure from itself to traverse file system recursively. Well, I didn’t know the term recursion back then, but it worked. Similar code in Java would look something like this:
 
 
 
 
 

public void printFilesRecursively(final File folder) {
    for (final File entry : listFilesIn(folder)) {
        if (entry.isDirectory()) {
            printFilesRecursively(entry);
        } else {
            System.out.println(entry.getAbsolutePath());
        }
    }
}

private File[] listFilesIn(File folder) {
    final File[] files = folder.listFiles();
    return files != null ? files : new File[]{};
}

Didn’t know File.listFiles() can return null, did ya? That’s how it signals I/O errors, like if IOException never existed. But that’s not the point. System.out.println() is rarely what we need, thus this method is neither reusable nor composable. It is probably the best counterexample of Open/Closed principle. I can imagine several use cases for recursive traversal of file system:

  1. Getting a complete list of all files for display purposes
  2. Looking for all files matching given pattern/property (also check out File.list(FilenameFilter))
  3. Searching for one particular file
  4. Processing every single file, e.g. sending it over network

Every use case above has a unique set of challenges. For example we don’t want to build a list of all files because it will take a significant amount of time and memory before we can start processing it. We would like to process files as they are discovered and lazily – by pipe-lining computation (but without clumsy visitor pattern). Also we want to short-circuit searching to avoid unnecessary I/O. Luckily in Java 8 some of these issues can be addressed with streams:

final File home = new File(FileUtils.getUserDirectoryPath());
final Stream<Path> files = Files.list(home.toPath());
files.forEach(System.out::println);

Remember that Files.list(Path) (new in Java 8) does not look into subdirectories – we’ll fix that later. The most important lesson here is: Files.list() returns a Stream<Path> - a value that we can pass around, compose, map, filter, etc. It’s extremely flexible, e.g. it’s fairly simple to count how many files I have in a directory per extension:

import org.apache.commons.io.FilenameUtils;

//...

final File home = new File(FileUtils.getUserDirectoryPath());
final Stream<Path> files = Files.list(home.toPath());
final Map<String, List<Path>> byExtension = files
        .filter(path -> !path.toFile().isDirectory())
        .collect(groupingBy(path -> getExt(path)));

byExtension.
        forEach((extension, matchingFiles) ->
                System.out.println(
                        extension + "\t" + matchingFiles.size()));

//...

private String getExt(Path path) {
    return FilenameUtils.getExtension(path.toString()).toLowerCase();
}

OK, just another API, you might say. But it becomes really interesting once we need to go deeper, recursively traversing subdirectories. One amazing feature of streams is that you can combine them with each other in various ways. Old Scala saying “flatMap that shit” is applicable here as well, check out this recursive Java 8 code:

//WARNING: doesn't compile, yet:

private static Stream<Path> filesInDir(Path dir) {
    return Files.list(dir)
            .flatMap(path ->
                    path.toFile().isDirectory() ?
                            filesInDir(path) :
                            singletonList(path).stream());
}

Stream<Path> lazily produced by filesInDir() contains all files within directory including subdirectories. You can use it as any other stream by calling map()filter()anyMatch()findFirst(), etc. But how does it really work?flatMap() is similar to map() but while map() is a straightforward 1:1 transformation, flatMap() allows replacing single entry in input Stream with multiple entries. If we had used map(), we would have end up withStream<Stream<Path>> (or maybe Stream<List<Path>>). But flatMap() flattens this structure, in a way exploding inner entries. Let’s see a simple example. Imagine Files.list() returned two files and one directory. For filesflatMap() receives a one-element stream with that file. We can’t simply return that file, we have to wrap it, but essentially this is no-operation. It gets way more interesting for a directory. In that case we call filesInDir()recursively. As a result we get a stream of contents of that directory, which we inject into our outer stream.

Code above is short, sweet and… doesn’t compile. These pesky checked exceptions again. Here is a fixed code, wrapping checked exceptions for sanity:

public static Stream<Path> filesInDir(Path dir) {
    return listFiles(dir)
            .flatMap(path ->
                    path.toFile().isDirectory() ?
                            filesInDir(path) :
                            singletonList(path).stream());
}

private static Stream<Path> listFiles(Path dir) {
    try {
        return Files.list(dir);
    } catch (IOException e) {
        throw Throwables.propagate(e);
    }
}

Unfortunately this quite elegant code is not lazy enough. flatMap() evaluates eagerly, thus it always traverses all subdirectories, even if we barely ask for first file. You can try with my tiny LazySeq library that tries to provide even lazier abstraction, similar to streams in Scala or lazy-seq in Clojure. But even standard JDK 8 solution might be really helpful and simplify your code significantly.

Related Whitepaper:

Functional Programming in Java: Harnessing the Power of Java 8 Lambda Expressions

Get ready to program in a whole new way!

Functional Programming in Java will help you quickly get on top of the new, essential Java 8 language features and the functional style that will change and improve your code. This short, targeted book will help you make the paradigm shift from the old imperative way to a less error-prone, more elegant, and concise coding style that’s also a breeze to parallelize. You'll explore the syntax and semantics of lambda expressions, method and constructor references, and functional interfaces. You'll design and write applications better using the new standards in Java 8 and the JDK.

Get it Now!  

Leave a Reply


eight × 7 =



Java Code Geeks and all content copyright © 2010-2014, Exelixis Media Ltd | Terms of Use | Privacy Policy
All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners.
Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries.
Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.

Sign up for our Newsletter

20,709 insiders are already enjoying weekly updates and complimentary whitepapers! Join them now to gain exclusive access to the latest news in the Java world, as well as insights about Android, Scala, Groovy and other related technologies.

As an extra bonus, by joining you will get our brand new e-books, published by Java Code Geeks and their JCG partners for your reading pleasure! Enter your info and stay on top of things,

  • Fresh trends
  • Cases and examples
  • Research and insights
  • Two complimentary e-books