xfind is a command-line recursive file find utility implemented in multiple programming languages, currently these twenty-four (twenty-five including OCaml, but that version is currently defunct):
Language | URL |
---|---|
Bash | https://en.wikipedia.org/wiki/Bash_(Unix_shell) |
C | https://en.wikipedia.org/wiki/C_(programming_language) |
Clojure | https://clojure.org/ |
C# | https://learn.microsoft.com/en-us/dotnet/csharp/ |
C++ | https://www.stroustrup.com/C++.html |
Dart | https://dart.dev/ |
Elixir | https://elixir-lang.org/ |
F# | https://learn.microsoft.com/en-us/dotnet/fsharp/ |
Go | https://golang.org/ |
Groovy | https://groovy-lang.org/ |
Haskell | https://www.haskell.org/ |
Java | https://www.java.com/ |
JavaScript | https://nodejs.org/ |
Kotlin | https://kotlinlang.org/ |
Objective-C | https://en.wikipedia.org/wiki/Objective-C |
Perl | https://www.perl.org/ |
PHP | https://www.php.net/ |
PowerShell | https://learn.microsoft.com/en-us/powershell/ |
Python | https://www.python.org/ |
Ruby | https://www.ruby-lang.org/ |
Rust | https://www.rust-lang.org/ |
Scala | https://www.scala-lang.org/ |
Swift | https://swift.org/ |
TypeScript | https://www.typescriptlang.org/ |
Using any language version, you can find files based on numerous criteria, including:
- filter by file extensions
- filter directory paths by regex
- filter file names by regex
- filter by file types
- find files at a minimum and/or maximum depth
- find files using max/min values for lastmod and size
- find under multiple separate directories
- include/exclude hidden directories/files
There are some other features being added, such as:
- filter files by user and group
- filter files using found ignore files
- find files inside archive files (zip, tar.gz, etc.)
The `xfind` repo is derived from xsearch.
There are a number of "why questions" that can be asked about `xfind`, such as:
- Why create another file find/search CLI utility?
- Why write a version of the utility in X language?
- Why rewrite the utility in so many languages?!?
Those are really better questions for `xsearch`, the project that `xfind` is derived from, and I will answer those questions there. However, there are a couple of questions specific to `xfind`:
I created `xfind` from `xsearch` after realizing that the file finding portion of the functionality would be useful as a library dependency for other projects, such as a utility to find file duplicates (see: pydupes). I also realized that a file finding utility with regex filtering would be useful on its own when file searching is not needed. Lastly, it occurred to me that I could modify `xsearch` to use the file finding library as an external dependency, which would add another dimension for inter-language comparison.
The process of creating `xfind` from `xsearch` is touched on here.
Honestly, yes. 😀 I have "reimplementation fatigue" from these projects, and I will probably never do another multi-language project. That being said, I'm glad I did them; it has been a very educational and mostly enjoyable experience that I wouldn't trade. There's a lot more I can say about it, and I plan to. Have a look at the conclusions section for an overview of how I plan to tackle that.
The high-level process of creating `xfind` from `xsearch` included these steps:
- Clone a copy of the `xsearch` repo, renaming the root directory to `xfind`
- Write and execute a conversion script on the source under `xfind` (see scripts/xsearch2xfind.py)
- Manually edit the source to finish the conversion - remove file search functionality, etc.
Looking back, there was more I could have added to the conversion script to further simplify the manual editing process, but it provided a good start. It did also get me thinking about programming language translation as another possible experimental project...
The first step is to clone `xfind`. By default, it is expected to be located at `$HOME/src/xfind`, but this can be changed by setting the `XFIND_PATH` environment variable to wherever it has been cloned to.
$ git clone https://github.com/clarkcb/xfind.git
Once cloned, you have two installation options for `xfind`:
- Build a Docker image and open in a container
- Build/run on your local system, installing any needed compilers, interpreters, etc.
The Docker image option is recommended if you think you will want to build and run all or many of the different language versions for comparison, since the setup will be easier and it won't affect your local system.
If you are only interested in code comparisons, or if you only want to run/compare a subset of the language versions, then building and running on your local system might make sense, especially if you already have support for those languages installed.
There is a Dockerfile that enables building a Docker image locally in order to build and run `xfind` in a container. This simplifies the setup and building process and is recommended if you want to build and run all or many language versions.
There are two steps to this process: first build the image, then open a new container instance of the image (opening in VS Code is described below).
To build the Docker image, you will need to have Docker installed on your system. I also recommend enabling `experimental` in the Docker engine configuration in order to enable the `--squash` option for building an image, which is done by putting the following JSON in the Docker Engine config:
{
"experimental": true
}
Next, open a terminal in the `xfind` root directory and cd into the `.devcontainer` subdirectory. There, run the following command (include `--squash` if you enabled `experimental`):
$ docker build --squash -t xfind .
This build will take a long time, probably at least a half hour on a typical system with a typical internet connection. To see the specifics of what is happening, have a look at the Dockerfile, but in general, the necessary components to build and run (most of) the language versions of `xfind` in a container are downloaded and installed into a base `ubuntu` image. If the build is interrupted or stalls at any point, it should be possible to restart it by issuing the same command and have it continue close to where it left off.
After the image is built, you should be able to see it listed in your images when using the `docker images` command; it should include a line similar to this:
REPOSITORY TAG IMAGE ID CREATED SIZE
xfind latest ca8980e929b1 12 hours ago 5.94GB
Yes, the image is big, and it probably means there's something not quite right in the way I've configured the build, even with squashing. I have a TODO to research this.
Now that you have a built image, you can run it one of several ways. I recommend opening it inside VS Code.
First, open `xfind` in VS Code as you would a project directory. VS Code should automatically detect that the project is configured to be able to be opened inside a container and display a popup asking if you want to do so, which you can confirm by clicking on the "Reopen in Container" button. Otherwise, you can click in the green area in the far lower left corner on the status bar and select the option to reopen in container from the menu.
The first time you open `xfind` in a container in VS Code, some more installations will be triggered, and this can take some time too, although not nearly as much as building the image. These installations are VS Code extensions to provide extra functionality for many of the languages, build systems, etc. I tried to pick ones that are most popular / standard for a given language in cases where it's obvious, as well as a few that I found useful and not too intrusive. The list is in the devcontainer.json file in the `extensions` array.
The next step will be to build the language versions of `xfind` and compare them.
If you are primarily interested in specific language versions, and especially if you already have some or all of the language support for those versions installed locally, this installation option could make sense.
In this scenario, you will need to make sure you have the proper language support installed for each version that you want to build/run.
In general, different OSes provide or have available systems for managing installation of multiple software packages from a single interface:
- Linux - Ubuntu uses the `apt` command to install/manage software installs, Red Hat uses `rpm`, etc.
- OSX - a popular package manager and the one I would recommend is Homebrew
- Windows - from my limited experience, Chocolatey seems to be a popular package manager. If your version of Windows supports WSL (Windows Subsystem for Linux), you could install into an active subsystem using a Linux package manager. It is also possible to install something similar on older Windows using Cygwin.
It is also possible to install language support separately per language. Providing in-depth installation instructions for each individual language is beyond the scope of this README, but here is some basic info:
- Bash (bashfind) - On Unix-style OSes (Linux, OSX), `bash` should be installed by default, but `bashfind` will also run in `zsh`. On Windows, WSL or Cygwin is recommended.
- C (cfind) - You will need a C/C++ compiler like `gcc` or `clang`. On Linux, use a package manager to install the recommended tools. For example, on Ubuntu, you would run `sudo apt update` followed by `sudo apt install build-essential`. On OSX, installing Xcode will install all necessary compiler components. On Windows, Visual Studio is a recommended choice that will provide the necessary tools.
- Clojure (cljfind) - The Getting Started page on clojure.org includes installation instructions.
- C++ (cppfind) - The instructions for C / cfind also apply here.
- C# (CsFind) - `CsFind` runs on .NET 9.0, download available here.
- Dart (dartfind) - You can download the Dart SDK from here.
- Elixir (exfind) - Instructions for installing Elixir can be found here.
- F# (FsFind) - `FsFind` also runs on .NET 9.0, download available here.
- Go (gofind) - Download the OS-appropriate installer from here.
- Groovy (groovyfind) - The download page is here.
- Haskell (hsfind) - Installation instructions here.
- Java (javafind) - Many different Java installs are available from Oracle and third parties (e.g. Red Hat), but be sure to install a JDK for at least Java version 11, preferably 17+.
- JavaScript (jsfind) - `jsfind` runs on Node.js, download available from here.
- Kotlin (ktfind) - To install support for Kotlin, you will need to install IntelliJ IDEA or Android Studio IDE.
- Objective-C (objcfind) - On OSX, an Xcode install will provide everything needed. I don't have experience with this language on other platforms, but I believe that `gcc` and maybe `clang` have some support for compiling it.
- OCaml (mlfind) - See https://ocaml.org/
- Perl (plfind) - On Unix-style OSes, `perl` should be installed by default. On Windows, you can install Strawberry Perl or ActiveState Perl, or install it via WSL or Cygwin.
- PHP (phpfind) - `phpfind` requires `php` version 8.3+. On Unix-style OSes, `php` should be installed by default, although it might be an older version. On Windows, you can download and build `php` from source, but your best bet will probably be to install via WSL or Cygwin.
- PowerShell (ps1find) - On most modern versions of Windows, PowerShell should be installed by default. It can also be installed on OSX and Linux.
- Python (pyfind) - `pyfind` requires Python 3.9 or higher. Find downloads for various OSes and versions here.
- Ruby (rbfind) - `rbfind` requires Ruby 3.x. Find downloads for various OSes and versions here.
- Rust (rsfind) - See the Getting started page for installation instructions.
- Scala (scalafind) - `scalafind` requires Scala 3.x. Downloads available from the Scala language home page.
- Swift (swiftfind) - On OSX, an Xcode install will provide everything needed. There are also Swift installations available for Linux and Windows, see this page.
- TypeScript (tsfind) - `tsfind` also runs on Node.js, download available from here.
Several of the language versions require additional software to build/manage dependencies:
- Clojure (cljfind) - the leiningen tool is used for package management and building
- Haskell (hsfind) - this version requires the stack utility (instead of just `cabal`)
- Java (javafind) - the gradle tool is used to build `javafind`
- Kotlin (ktfind) - the gradle tool is also used to build `ktfind`
- PHP (phpfind) - the composer utility is used for dependency management
- Scala (scalafind) - the sbt tool is also used to build `scalafind`
You will also need to set an environment variable called `$XFIND_PATH` to the path that you cloned `xfind` to. For example, on my OSX machine it is set to this:
XFIND_PATH=$HOME/src/xfind
If undefined, `$XFIND_PATH` defaults to `$HOME/src/xfind`, so if you clone `xfind` to that location you will have reasonably good success in running various versions and tools without setting `$XFIND_PATH`, but setting it is strongly recommended nonetheless.
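The fallback described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not code from the repo: the function name `resolve_xfind_path` is hypothetical, and treating an empty `XFIND_PATH` the same as an undefined one is an assumption.

```python
import os
from pathlib import Path


def resolve_xfind_path(environ=None, home=None):
    """Return the xfind root: $XFIND_PATH if set, else $HOME/src/xfind.

    Hypothetical helper; an empty XFIND_PATH is treated as unset (assumption).
    """
    environ = os.environ if environ is None else environ
    configured = environ.get("XFIND_PATH", "")
    if configured:
        return Path(configured)
    home_dir = Path(home) if home is not None else Path.home()
    return home_dir / "src" / "xfind"


if __name__ == "__main__":
    root = resolve_xfind_path()
    print("xfind root:", root)
    print("bin dir:   ", root / "bin")
```

Passing `environ` and `home` explicitly keeps the function easy to check without touching the real environment.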
Finally, note that there are some useful utilities in the scripts folder. Most require `bash`, although some of those also have PowerShell versions you can use instead. There are also several written in `python`, most notably benchmark.py (see Comparison); you will need `python3` to run those.
There is a build script provided to build any/all language versions, and you will definitely need to use it at least initially, because all language versions, regardless of whether the language is compiled or interpreted, have some necessary build steps to put the version into a runnable state. For example, many of the language builds copy resource files from a shared location to the language version's local resource location.
The build script is under scripts and named build.sh. If you are on Windows, or if you just prefer PowerShell, you can also use build.ps1. To run the build for a specific language, run the script on the command line with the name of the language (or the language's extension that the language version name is derived from) as the argument. For example, you can build the TypeScript version using either of these commands:
$ ./scripts/build.sh typescript
# -or-
$ ./scripts/build.sh ts
You can specify more than one language version to build:
$ ./scripts/build.sh c cpp go hs objc rs swift
You can build all language versions together by passing `all`:
$ ./scripts/build.sh all
If you try to build any language version without the necessary software installed, the build script will simply point out what is missing and move on.
For each language version successfully built, a softlink to the executable is created under `$XFIND_PATH/bin` (`gofind` and `hsfind` binaries are installed there directly), so after building you can try running any version from there, either by changing to that directory or by adding it to your path:
PATH=$PATH:$XFIND_PATH/bin
For compiled languages that differentiate between debug and release builds, you can include `--debug` and/or `--release` to target those specific builds (they will be ignored for languages that don't differentiate). If neither is specified, debug-only will be assumed. If both are specified, both builds will run, but the softlink will be created for the release version.
Examples:
$ ./scripts/build.sh --debug swift
# -or-
$ ./scripts/build.sh --release swift
# -or-
$ ./scripts/build.sh --debug --release swift
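The flag rules above amount to a small decision table. Here is a minimal Python sketch of it, assuming nothing about how build.sh is actually written; `resolve_build_configs` is a hypothetical name, and the release-only case (link to release) is inferred rather than stated.

```python
def resolve_build_configs(debug=False, release=False):
    """Return (configs_to_build, config_to_link) for the given flags.

    Hypothetical helper mirroring the documented rules:
    - neither flag: debug-only is assumed
    - both flags: build both, link the release build
    """
    if not debug and not release:
        debug = True  # documented default
    configs = []
    if debug:
        configs.append("debug")
    if release:
        configs.append("release")
    # The softlink prefers release whenever a release build was requested.
    link = "release" if release else "debug"
    return configs, link


if __name__ == "__main__":
    for flags in [(False, False), (True, False), (False, True), (True, True)]:
        print(flags, "->", resolve_build_configs(*flags))
```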
Specific to Python, there is also a `--venv` option that indicates that a virtual environment should be used, either using an existing one if found or creating a new one, when building and running `pyfind`. I recommend using this, because it isolates dependencies to the given virtual environment.
When I want to build all language versions, say after a new feature has been added, I will usually do a `clean`, then a `build` and finally a `unittest`:
$ ./scripts/clean.sh all
# . . .
$ ./scripts/build.sh --debug --release --venv all
# . . .
$ ./scripts/unittest.sh all
This section concerns usage of the `xfind` tool by running any individual language version.
For information on running comparatively, see the Comparison section.
Assuming you have `$XFIND_PATH/bin` in your path or that you are in that directory, you can run any version with the `-h` option to get the help/usage. Here's an example for the Python version:
$ pyfind -h
Usage:
pyfind [options] <path> [<path> ...]
Options:
--archivesonly Find only archive files
-d,--in-dirpattern Specify name pattern for directories to include in find
-D,--out-dirpattern Specify name pattern for directories to exclude from find
--debug Set output mode to debug
--excludehidden Exclude hidden files and directories*
-f,--in-filepattern Specify name pattern for files to include in find
-F,--out-filepattern Specify name pattern for files to exclude from find
--followsymlinks Follow symlinks to their linked files or directories
-h,--help Print this usage and exit
--in-archiveext Specify extension for archive files to include in find
--in-archivefilepattern Specify name pattern for archive files to include in find
--includehidden Include hidden files and directories
--maxdepth Find files at most maxdepth levels below startpath
--maxlastmod Find files with lastmod less than or equal to maxlastmod
--maxsize Find files with size <= maxsize
--mindepth Find files at least mindepth levels below startpath
--minlastmod Find files with lastmod greater than or equal to minlastmod
--minsize Find files with size >= minsize
--nofollowsymlinks Do not follow symlinks to their linked files or directories*
--noprintdirs Do not print matching directories after finding*
--noprintfiles Do not print matching files after finding
--out-archiveext Specify extension for archive files to exclude from find
--out-archivefilepattern Specify name pattern for archive files to exclude from find
--printdirs Print matching directories after finding
--printfiles Print matching files after finding*
-R,--norecursive Do not find recursively (no subdirectories)
-r,--recursive Find recursively through subdirectories*
--settings-file Path to JSON file with specified find settings
--sort-ascending Sort results in ascending order*
--sort-by Sort by: PATH, NAME, TYPE, SIZE, LASTMOD
--sort-caseinsensitive Sort results case-insensitive
--sort-casesensitive Sort results case-sensitive*
--sort-descending Sort results in descending order
-t,--in-filetype File type to find (text, binary)
-T,--out-filetype File type not to find (text, binary)
-v,--verbose Set output mode to verbose
-V,--version Print version and exit
-x,--in-ext Specify extension for files to include in find
-X,--out-ext Specify extension for files to exclude from find
-Z,--excludearchives Exclude archive files (bz2, gz, tar, zip)*
-z,--includearchives Include archive files (bz2, gz, tar, zip)
The asterisk after the help indicates the default value for the option.
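To make the sort options in the help above more concrete, here is a rough Python sketch of how `--sort-by`, `--sort-descending` and `--sort-caseinsensitive` could combine. It is not taken from any xfind version: the `FoundFile` class, the `sort_results` function and the tie-breaking order are all assumptions, and `TYPE` is left out because file type detection is not sketched here.

```python
from dataclasses import dataclass


@dataclass
class FoundFile:
    """Hypothetical result record: directory path, file name, size, lastmod."""
    path: str
    name: str
    size: int
    lastmod: float


def sort_results(files, sort_by="PATH", descending=False, case_insensitive=False):
    """Sort results by PATH, NAME, SIZE or LASTMOD (tie-breaks are assumptions)."""
    def norm(s):
        return s.lower() if case_insensitive else s

    keys = {
        "PATH": lambda f: (norm(f.path), norm(f.name)),
        "NAME": lambda f: (norm(f.name), norm(f.path)),
        "SIZE": lambda f: (f.size, norm(f.path), norm(f.name)),
        "LASTMOD": lambda f: (f.lastmod, norm(f.path), norm(f.name)),
    }
    return sorted(files, key=keys[sort_by.upper()], reverse=descending)


if __name__ == "__main__":
    files = [FoundFile("src", "a.js", 10, 2.0), FoundFile("src", "B.js", 5, 1.0)]
    for f in sort_results(files, sort_by="NAME", case_insensitive=True):
        print(f.path + "/" + f.name)
```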
Now try running it to find specific files under `$XFIND_PATH`, using the following criteria:
- Find files with `js` or `ts` extension
- Skip directories that match `node_module` or `dist`
- Find files that have `find` in the name
- Look for files under `$XFIND_PATH/javascript` and `$XFIND_PATH/typescript`
Here's what that looks like (using the `rust` version):
$ cd $XFIND_PATH
$ rsfind -x js,ts -D node_module -D dist -f find ./javascript ./typescript
Matching files (22):
./javascript/jsfind/src/finder.js
./javascript/jsfind/src/finderror.js
./javascript/jsfind/src/findfile.js
./javascript/jsfind/src/findoption.js
./javascript/jsfind/src/findoptions.js
./javascript/jsfind/src/findsettings.js
./javascript/jsfind/src/jsfind.js
./javascript/jsfind/tests/finder.test.js
./javascript/jsfind/tests/findfile.test.js
./javascript/jsfind/tests/findoptions.test.js
./javascript/jsfind/tests/findsettings.test.js
./typescript/tsfind/src/finder.ts
./typescript/tsfind/src/finderror.ts
./typescript/tsfind/src/findfile.ts
./typescript/tsfind/src/findoption.ts
./typescript/tsfind/src/findoptions.ts
./typescript/tsfind/src/findsettings.ts
./typescript/tsfind/src/tsfind.ts
./typescript/tsfind/tests/finder.test.ts
./typescript/tsfind/tests/findfile.test.ts
./typescript/tsfind/tests/findoptions.test.ts
./typescript/tsfind/tests/findsettings.test.ts
Now change the command to skip files that have `find` in the name (and use the `go` version this time):
$ gofind -x js,ts -D node_module -D dist -F find ./javascript ./typescript
Matching files (16):
javascript/jsfind/jest.config.js
javascript/jsfind/src/common.js
javascript/jsfind/src/config.js
javascript/jsfind/src/filetype.js
javascript/jsfind/src/filetypes.js
javascript/jsfind/src/fileutil.js
javascript/jsfind/tests/filetypes.test.js
javascript/jsfind/tests/fileutil.test.js
typescript/tsfind/jest.config.js
typescript/tsfind/src/common.ts
typescript/tsfind/src/config.ts
typescript/tsfind/src/filetype.ts
typescript/tsfind/src/filetypes.ts
typescript/tsfind/src/fileutil.ts
typescript/tsfind/tests/filetypes.test.ts
typescript/tsfind/tests/fileutil.test.ts
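The two runs above differ only in whether `find` is an include (`-f`) or exclude (`-F`) file pattern. As a rough illustration of that filtering, here is a Python sketch. It assumes that patterns are regular expressions searched against each directory name and against the file name, which matches the outputs shown but is not confirmed against the source of any language version; `is_match` and its parameter names are hypothetical.

```python
import re
from pathlib import PurePosixPath


def is_match(path, in_exts=(), out_dir_patterns=(),
             in_file_patterns=(), out_file_patterns=()):
    """Return True if path passes the extension, directory and name filters."""
    p = PurePosixPath(path)
    # -x: the extension must be one of the included extensions, if any given
    if in_exts and p.suffix.lstrip(".") not in in_exts:
        return False
    # -D: skip the file if any directory in its path matches an out-pattern
    dirs = [part for part in p.parent.parts if part not in (".", "..")]
    if any(re.search(pat, d) for pat in out_dir_patterns for d in dirs):
        return False
    # -f: the file name must match at least one in-pattern, if any given
    if in_file_patterns and not any(re.search(pat, p.name) for pat in in_file_patterns):
        return False
    # -F: the file name must not match any out-pattern
    if any(re.search(pat, p.name) for pat in out_file_patterns):
        return False
    return True


if __name__ == "__main__":
    common = dict(in_exts={"js", "ts"}, out_dir_patterns=["node_module", "dist"])
    for path in ["./javascript/jsfind/src/finder.js",
                 "./javascript/jsfind/src/common.js"]:
        print(path,
              is_match(path, in_file_patterns=["find"], **common),
              is_match(path, out_file_patterns=["find"], **common))
```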
There are several scripts in the scripts directory to help with comparing the language versions in various ways, but the one that will likely be of primary interest is the python script benchmark.py, an unscientific tool for comparing performance and functionality (i.e. ensuring matching output of all versions).
By default, the benchmark.py script will run and compare all language versions, but this can be customized one of three ways:
- Pass a comma-separated list of languages/extensions to include, e.g.
-l c,cpp,go,hs,objc,rs,swift
- Pass a comma-separated list of languages/extensions to exclude, e.g.
-L clj,groovy,scala
- Modify the `lang_dict` dictionary in xfind.py
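The include/exclude selection can be pictured with a short Python sketch. It is a guess at the behavior, not code from benchmark.py: `select_langs` is a hypothetical name, the mapping between language names and extensions is not modeled (tokens are compared as given), and applying the exclude list after the include list is an assumption.

```python
def select_langs(all_langs, langs=None, nolangs=None):
    """Filter all_langs by comma-separated include (-l) and exclude (-L) lists."""
    def split(value):
        return [tok.strip() for tok in value.split(",") if tok.strip()] if value else []

    include = split(langs)
    exclude = set(split(nolangs))
    # No include list means every configured language is a candidate.
    selected = [lang for lang in all_langs if not include or lang in include]
    return [lang for lang in selected if lang not in exclude]


if __name__ == "__main__":
    configured = ["c", "clj", "cpp", "go", "groovy", "hs", "objc", "rs", "scala", "swift"]
    print(select_langs(configured, langs="c,cpp,go,hs,objc,rs,swift"))
    print(select_langs(configured, nolangs="clj,groovy,scala"))
```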
The benchmark.py script executes a series of "scenarios" for each configured language version and, for each scenario, reports whether the outputs of all versions match, along with a table of ranked performance. At the end, the performance values from all scenarios are summed and averaged, and a final summary table is presented.
The scenarios can be read from a json file, and the included scenarios.json will be the default if no file is specified.
Here's what the help looks like for benchmark.py:
$ python3 ./scripts/benchmark.py -h
usage: benchmark.py [-h] [-g [GROUP ...]] [-s [SCENARIO ...]] [-l LANGS] [-L NOLANGS] [-r RUNS] [-b] [-f SCENARIOS_FILE] [--debug]
Run xfind benchmark
options:
-h, --help show this help message and exit
-g [GROUP ...], --group [GROUP ...]
Name of scenario group to run
-s [SCENARIO ...], --scenario [SCENARIO ...]
Name of scenario to run
-l LANGS, --langs LANGS
Comma-separated list of languages to include in benchmark
-L NOLANGS, --nolangs NOLANGS
Comma-separated list of languages to exclude from benchmark
-r RUNS, --runs RUNS Number of runs for each scenario
-b, --exit-on-diff Exit on first output difference
-f SCENARIOS_FILE, --scenarios-file SCENARIOS_FILE
A scenarios json file
--debug Print debug output
Here's an example of the final output from a run that was executed on 2024-11-18:
$ python3 ./scripts/benchmark.py
. . .
Outputs of all versions in all scenarios match
Total results for 45 out of 45 scenarios with 450 out of 450 total runs
Date/time: 2024-11-18 16:46:08.988141
Git branch: "develop" (a11e9281af1bd490ef37f02d0b06aa1536720c95)
total avg rank
---------- ------- ---------- ------
gofind 2.41 0.00535556 1
cfind 4.38 0.00973333 2
cppfind 4.52 0.0100444 3
rsfind 15.26 0.0339111 4
dartfind 33.68 0.0748444 5
objcfind 39.53 0.0878444 6
hsfind 41.65 0.0925556 7
swiftfind 55.06 0.122356 8
phpfind 132.17 0.293711 9
jsfind 207.09 0.4602 10
tsfind 211.11 0.469133 11
pyfind 241.21 0.536022 12
csfind 257.62 0.572489 13
fsfind 328.93 0.730956 14
javafind 411.36 0.914133 15
plfind 445.78 0.990622 16
exfind 518.84 1.15298 17
ktfind 592.61 1.31691 18
scalafind 850.37 1.88971 19
rbfind 1083.55 2.40789 20
ps1find 1583.9 3.51978 21
bashfind 1934.13 4.29807 22
groovyfind 2972.74 6.60609 23
cljfind 3340.5 7.42333 24
Notice the line above the table that says "Outputs of all versions in all scenarios match". It is important to see this and similar messages on all scenario runs; otherwise, one of the language versions isn't working properly and the results will be invalid. An obvious example of this would be attempting to run language versions that aren't built.
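The numbers in the summary table suggest that the `avg` column is the `total` divided by the total number of runs (for example, 2.41 / 450 ≈ 0.00535556 for gofind), with rank assigned by ascending total. The following Python sketch reproduces that arithmetic for the first four rows; it is inferred from the table, not taken from benchmark.py, and `summarize` is a hypothetical name.

```python
def summarize(totals, total_runs):
    """Return (name, total, avg, rank) rows ordered by ascending total."""
    ordered = sorted(totals.items(), key=lambda item: item[1])
    return [(name, total, total / total_runs, rank)
            for rank, (name, total) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    # Totals copied from the table above; 450 is the total number of runs.
    totals = {"cfind": 4.38, "gofind": 2.41, "cppfind": 4.52, "rsfind": 15.26}
    for name, total, avg, rank in summarize(totals, 450):
        print(f"{name:<10} {total:>8.2f} {avg:>12.6g} {rank:>5}")
```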
In this section I will write about the experience of developing these projects, writing the different language versions, and what personal conclusions I drew from it. For now I will just outline the approach I will use.
Here's a list of criteria to evaluate each language by:
- documentation / resources
- learning curve
- readability
- core library
- building/running
- managing dependencies
- speed of development
- efficiency/performance
- platform agnosticity
The conclusions from these are helpful in determining which languages are most and least suited for given requirements:
- one-off utilities / scripting
- high-performance
- cross-platform
- rich core and/or third-party dependencies
- specific platform (e.g. iOS or Android)
- specific framework (e.g. JVM or CLR)
I will give summaries of the experience of developing each of the language versions, and then rank them by criteria and requirements.
- Add mime type support - detection, filtering, wildcards. This is nearly complete.
- Determine how archive file support should work, two options:
  - Provide option to find files inside archives - in this case should change `archivesonly` and `includearchives` options to `inarchivesonly` and `findinarchives`, respectively
  - Find archives the same as other files (without option to look inside them) - in this case should consider removing `archivesonly` and `includearchives` options
- Add documentation about the what/why/how of `xfind`
- Add `stats` option to get a json object with various stats, such as unique extensions / extension counts, etc.
- Resolve OCaml issues
- Research Docker best practices to determine if there are ways to reduce the image size
- Add other language versions (in alphabetical order and subject to change)
  - Common Lisp - I want to see how it compares to Clojure and learn more about macros
  - Julia - Julia is described as a high-performance scripting language, so I'm interested to see how it compares to existing implementations
  - Lua - another language that I would like to compare with existing implementations
  - Racket - this might be an alternate choice to Common Lisp, or another comparison point
This project is licensed under the MIT license. See the LICENSE file for more info.