I tend to use File::Find the most when I need to do some file searching and mangling.
Usually my scripts have the same simple structure, as follows:

    $| = 1; # autoflush
    find( \&directory_scanner, ( $starting_directory ) );
    $| = 0; # non autoflush

    # and the scanner is something like
    sub directory_scanner{
        chomp;
        return if ( $_ eq $starting_directory || ! -f $_ );
        return if ( $File::Find::dir !~ /$re_dir(\d{4}-\d{6})$/ );
        ...
    }
As you can see, the event handler invoked by File::Find does two things. It prints a progress report (the $counter) to tell me the script is still alive (I process 200k+ files at once) and, most notably, it applies a regexp to the directory I'm in. The regexp lets me skip staging, backup, and similar directories that look very much like the ones I'm interested in but that I don't want the script to process.
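To make that structure concrete, here is a minimal, self-contained sketch of such a scanner. The temporary directory tree, the value of $re_dir, the file names and the @kept array are all assumptions invented for the illustration; only the shape of the handler follows the description above.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Hypothetical layout: wanted directories end in data-YYYY-NNNNNN.
my $re_dir             = qr/data-/;
my $starting_directory = tempdir( CLEANUP => 1 );

# One wanted directory and one look-alike backup directory.
for my $dir ( 'data-2013-000001', 'data-2013-000001.bak' ) {
    make_path("$starting_directory/$dir");
    open my $fh, '>', "$starting_directory/$dir/report.txt"
        or die "Cannot create file: $!";
    close $fh;
}

my $counter = 0;
my @kept;    # hypothetical: collects the files that pass both checks

sub directory_scanner {
    return unless -f $_;    # plain files only
    return if $File::Find::dir !~ /$re_dir(\d{4}-\d{6})$/;    # skip look-alikes
    $counter++;
    print "." if $counter % 100 == 0;    # sign of life on long runs
    push @kept, $File::Find::name;
}

$| = 1;    # autoflush, so the dots appear while the scan is running
find( \&directory_scanner, $starting_directory );
$| = 0;

print "kept $counter file(s)\n";
```

The backup directory is rejected because the pattern is anchored with `$`, so a trailing `.bak` prevents the match.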
A few times I've tried to convert my File::Find based scripts to File::Find::Rule, just to get more used to that interface, but I didn't know how to apply a regular expression to the path being traversed. Reading the documentation a little more carefully, I found the exec method, which lets me specify a handler (i.e., a subroutine) that returns true or false depending on whether I want to keep the file I'm visiting.
Therefore, converting my scripts becomes as easy as follows:
    $| = 1;
    my $engine = File::Find::Rule->new();
    my @files = $engine->file()
        ->exec( sub {
            my ( $shortname, $path, $fullname ) = @_;
            # keep the file only if its directory matches,
            # the same test as in the File::Find version
            return $path =~ /$re_dir(\d{4}-\d{6})$/;
        } )
        ->exec( sub {
            my ( $shortname, $path, $fullname ) = @_;
            $counter++;
            return $shortname =~ /KCL/;
        } )
        ->exec( sub {
            my ( $shortname, $path, $fullname ) = @_;
            print "." if ( $counter % 100 == 0 );
            print "$counter\n" if ( $counter % 1000 == 0 );
            return 1; # do not forget !
        } )
        ->in( $starting_directory );
    $| = 0;
I've kept three different handlers for readability's sake, but as you can
imagine, it is possible to shrink them down into a single one.
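As a rough sketch of what that single handler could look like, the sub below folds the three steps into one. It is written as an ordinary named sub so it can be tried without scanning a real tree. The name keep_file, the value of $re_dir and the sample paths are hypothetical, and the sketch assumes the directory test is meant to keep files whose directory matches the pattern, as in the File::Find version.

```perl
use strict;
use warnings;

# Hypothetical values standing in for the script's real configuration.
my $re_dir  = qr/data-/;
my $counter = 0;

# One handler doing the work of the three. It takes the arguments that
# exec passes to a handler: short name, directory path, full name.
sub keep_file {
    my ( $shortname, $path, $fullname ) = @_;

    # 1. directory filter
    return 0 unless $path =~ /$re_dir(\d{4}-\d{6})$/;

    # 2. count the file, then apply the name filter
    $counter++;
    return 0 unless $shortname =~ /KCL/;

    # 3. progress report, reached only by files that passed both filters
    print "."          if $counter % 100 == 0;
    print "$counter\n" if $counter % 1000 == 0;

    return 1;    # explicit true value: the report must not drop the file
}

# Calling the handler by hand with made-up paths.
for my $dir ( '/srv/data-2013-000001', '/srv/data-2013-000001.bak' ) {
    my $verdict = keep_file( 'KCL_01.txt', $dir, "$dir/KCL_01.txt" )
        ? 'kept'
        : 'skipped';
    print "$dir/KCL_01.txt: $verdict\n";
}
```

With File::Find::Rule, a sub like this would be passed once, as `->exec( \&keep_file )`, in place of the three chained calls. The order of the steps mirrors the chain: the counter only grows for files in a matching directory, and the dots are only printed for files that also pass the name filter.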
The nice part here is that I can once again check the path against a regexp.
The drawback is that a handler used only for output reporting must
always return a true value, otherwise the file is filtered out of the results.
In case you are wondering, the autoflush is used simply to display the dots while the program is running.
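As a side note, one alternative to switching $| on and off by hand (not what the scripts above do) is to scope the change with local, so the previous value is restored automatically when the block ends. The sub name show_progress and the loop standing in for the real scan are assumptions made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a long scan: prints a dot every 100 items.
sub show_progress {
    my ($total) = @_;

    local $| = 1;    # autoflush only until this sub returns
    my $flag_inside = $|;

    for my $n ( 1 .. $total ) {
        print "." if $n % 100 == 0;
    }
    print "\n";

    return $flag_inside;    # value of $| while the dots were printed
}

my $inside = show_progress(300);
print "autoflush inside: $inside, afterwards: $|\n";
```

The effect on the output is the same as with the explicit assignments; the only difference is that there is no second assignment to forget.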