Platform | Build Status |
---|---|
Ubuntu latest | |
macOS 11 | |
macOS 12 | |
Windows latest |
goselect provides SQL like 'select' interface for files. This means one can execute a select query like:
select name, path, size from . where or(like(name, result.*), eq(isdir, true)) order by 3 desc
to get the file name, file path and size of all the files that are either directories or their names begin with result. The result will be ordered by size in descending order.
The project goselect was created to understand the following:
🔹 Parsing. The parsing pipeline typically involves a lexer, a parser and an AST.
🔹 Recursive descent parser
🔹 Representation of functions in the code. Functions like lower
, upper
and trim
take a single parameter, functions like now
and currentDate
take zero parameters whereas functions like concat
take a variable number of parameters.
🔹 Execution of scalar functions like lower
, upper
and substr
. These functions are stateless and run for each row. They may take a parameter and return a value for each row. These functions can also involve nesting. For example, lower(substr(ext, 1))
.
🔹 Execution of aggregation functions like average
, min
, max
and countDistinct
. These functions run over a collection of rows and return a final value. These functions can also involve nesting. For example, average(count())
.
🔹 Execution of nested scalar and aggregation functions like countDistinct(lower(name))
. Here, the function lower(name)
runs for each row whereas countDistinct
runs over a collection of rows.
- Overview
- Example queries
- Feature overview
- Differences between SQL select and goselect
- Supported platforms
- Installation
- Changelog
- Commands
- Queries in detail
- FAQs
- All the supported features
- Unsupported ideas
- Screenshots
- Planned changes
- select * from .
- select * from . limit 10
- select name, size from . where gt(size, 1024)
- select name, size from . where gt(size, 1024) order by 2 desc
- select name, size from . where gt(size, 1024) order by 2
- select name, size from . where gt(size, 1024) order by 2 limit 10
- select name, size from . where or(like(name, result.*), eq(isdir, true)) order by 2 limit 10
- select name, size from ~/Documents where or(like(name, result.*), eq(isdir, true)) order by 2 limit 10
goselect provides various features including:
- Support for attribute aliases. For example, filename is same as fname
- Support for function aliases. For example, lower is same as low
- Support for various string scalar functions like
lower
,upper
,concat
,substr
etc - Support for various comparison scalar functions
eq
,le
,lt
,ge
,gt
etc - Support for various date based scalar functions
now
,extract
,parsedatetime
,daysdifference
etc - Support for various composite scalar functions
or
,and
,not
etc - Support for various mime type scalar functions
isText
,isPdf
,isImage
etc - Support for various aggregate functions like
count
,countdistinct
,average
etc - Support for exporting the results in table, json and html format
- Support for performing select in nested directories
- Support for skipping directories like
.git
&.github
- Support for executing queries with aliases. For example,
goselect ex -q='select name from .' --createAlias=ls -n=false
will save the query along with its alias in a text file in the current directory. In order to execute the query using an alias, rungoselect ex --useAlias=ls -n=false
- Support for predefined query aliases
Features that are different from SQL:
- goselect does not support 'group by'. All the aggregating functions return results that repeat for each row
- goselect does not support expressions like
1+2
or1*2
. Instead, goselect gives functions like 'add' and 'mul' etc to write such expressions - goselect does not support expressions like
name=sample.log
in 'where' clause. Instead, various functions are given to represent such expressions. These functions include:eq
,ne
,lt
,ge
etc - goselect has a weak grammar. For example, a query like:
select 1+2, name from /home/projects
will ignore 1+2
and return file names
- goselect's 'order by' clause supports only attribute positions. For example, a query like:
select name, size from /home/projects order by 1
will order the results by the first attribute name
- goselect has been tested on macOS Big Sur 11.4, macOS Monterey 12.6, Ubuntu 20.0.3 and Windows 10
-
Download the current release
wget -o - https://github.com/SarthakMakhija/goselect/releases/download/v0.0.7/goselect_0.0.7_Darwin_all.tar.gz
-
Unzip the release in a directory
mkdir goselect && tar xvf goselect_0.0.7_Darwin_all.tar.gz -C goselect
-
Download the current release
wget -o - https://github.com/SarthakMakhija/goselect/releases/download/v0.0.7/goselect_0.0.7_Linux_x86_64.tar.gz
-
Unzip the release in a directory
mkdir goselect && tar xvf goselect_0.0.7_Linux_x86_64.tar.gz -C goselect
-
Create a directory
mkdir goselect
-
Switch to the newly created directory
cd goselect
-
Download the current release
curl https://github.com/SarthakMakhija/goselect/releases/download/v0.0.7/goselect_0.0.7_windows_x86_64.tar.gz -o goselect.tar.gz
-
Unzip the release in the current directory
tar -xvzf .\goselect.tar.gz
- Support for Windows operating system
- Introduction of saving query with aliases. For example,
goselect ex -q='select name from .' --createAlias=ls -n=false
will save the query along with its alias in a text file in the current directory. In order to execute the query using an alias, rungoselect ex --useAlias=ls -n=false
- Introduction of listQueryAliases command. For example,
goselect listQueryAliases
will return a list of all the saved query aliases - Support for predefined query aliases. Run
goselect listQueryAliases
to get a list of all the query aliases
- Support for single and double quotes. For example,
goselect ex -q='select name, extension from . where eq(name, "hello world.txt")'
- Support for groups in regular expressions. For example,
goselect ex -q='select name, extension from . where like(name, "^([0-9]{4})-[0-9]{2}-[0-9]{2}-.*\.txt$")'
- Addition of a new function
isArchive
- Addition of a new function
parseSize
makes the comparison based on file size easier. Use,select * from . where gt(size, parseSize(15 Mb))
- Addition of 't' as a shorthand for the flag 'term' in describe command
- Addition of new flags 'minWidth' and 'maxWidth' for controlling the width of the attributes while exporting as table. Usage:
goselect ex -q='select name, ext, size, fmtsize(size), isdir from .' --minWidth=10 --maxWidth=50
, will put a minimum of 10 characters and maximum of 50 characters in each row. - Table export does not capitalize the headers
- Removal of 'sorted' flag from
listAttributes
,listFunctions
,listWhereClauseFunctions
andlistTimeFormats
commands
- Inference of input values as numeric, floating point, string and boolean
- Addition of functions: isImage, isText, isPdf, isAudio, isVideo to where clause
- Improvement in the performance of the function like
- Introduction of new functions: startsWith, endsWith and ifBlank
- Introduction of version command
- Adoption of consistent file size units. File size is now reported in IEC units: (B, KiB, MiB, GiB, TiB, PiB, EiB)
- No extension is reported for hidden files
*
projection now returnsname
,extension
,size
andabsolute path
fmtsize
is provided as a function that can be used to format the file sizehsize
(or human-readable size) is dropped from attribute list in favor of the functionfmtsize
. Usefmtsize(size)
to get the formatted size
- Support for getting the mime type from a file
- Support for various functions like
isPdf
,isVideo
,isText
,isAudio
andisImage
that work on the mime type
Example Queries for mime type
1. select name, size, mime from ~/Documents/ where eq(isImage(mime), true) order by 2 desc limit 10
2. select name, size, mime from ~/Documents/ where eq(isPdf(mime), true) order by 2 desc limit 10
Command | Description | Available since | Example usage |
---|---|---|---|
listAttributes | List all the attributes supported by goselect | v0.0.1 | goselect listAttributes |
listFunctions | List all the functions supported by goselect | v0.0.1 | goselect listFunctions |
listWhereClauseFunctions | List all the functions supported by goselect in 'where' clause | v0.0.1 | goselect listWhereClauseFunctions |
listTimeFormats | List the date/time formats supported by goselect | v0.0.1 | goselect listTimeFormats |
describe | Describe an attribute or a function | v0.0.1 | goselect describe --term=lower |
execute | Execute a select query | v0.0.1 | goselect execute -q='select * from .' |
version | Return the current version of goselect | v0.0.4 | goselect version |
listQueryAliases | List all the saved query aliases | v0.0.7 | goselect listQueryAliases |
- Use * as projection
goselect ex -q='select * from . '
- Select name, size, extension, absolute path using aliases
goselect ex -q='select name, size, ext, abspath from . '
- Select just the file name without its extension
goselect ex -q='select basename, extension from . '
- Determine if a file is a directory or not
goselect ex -q='select name, isdir, isfile from .'
- Select created time, access time and modified time of all the files
goselect ex -q='select name, ctime, atime, mtime from .'
- Select the mime type of all the files
goselect ex -q='select name, mime from .'
- Select file name in upper case and format the file size
goselect ex -q='select upper(name), size, formatsize(size) from .'
- Select the length of all the file names
goselect ex -q='select name, len(name) from .'
- Select the length of all the file extensions without including . in the length
goselect ex -q='select name, ext, len(substr(ext, 1)) from .'
- Select name and extension of all the files, but return NA if there is no extension
goselect ex -q='select name, ifblank(ext, NA) from .'
- Select trimmed file names
goselect ex -q='select trim(name), ltrim(name), rtrim(name) from .'
- Concat the basename, extension and the formatted file size
goselect ex -q='select concat(basename, ext, fmtsize(size)) from .'
- Concat the basename, extension and the formatted file size with a separator @
goselect ex -q='select concatWs(basename, ext, fmtsize(size), @) from .'
- Determine if a file name contains the string project
goselect ex -q='select name, contains(name, project) from .'
- Replace all the dots (.) in file names with a hash (#)
goselect ex -q='select name, replaceall(name, ., #) from .'
- Determine if a file name starts with a string go
goselect ex -q='select name, startswith(name, go) from .'
- Determine if a file name ends with a string go
goselect ex -q='select name, endswith(name, go) from .'
- Select the date from the access time of the files
goselect ex -q='select name, atime, extract(atime, date) from .'
- Select current month, current year, current date, day of the week and current day
goselect ex -q='select cmonth(), cyear(), cdate(), dayofweek(), cday() from . limit 1'
- Select file name and determine if the user can read, write and execute a file
goselect ex -q='select name, uread, uwrite, uexecute from .'
- Order the results by size in descending order
goselect ex -q='select name, size, ext, abspath from . order by 2 desc'
- Order the results by size in descending order and limit the total number of results to 5
goselect ex -q='select name, size, ext, abspath from . order by 2 desc limit 5'
- Count all the entries in the current directory
goselect ex -q='select count() from .'
- Count distinct file extensions
goselect ex -q='select countDistinct(ext) from .'
- Select the sum of size of all the files and also return the average file size
goselect ex -q='select sum(size), avg(size) from .'
- Select the sum of size of all the files and also return the average file size, but format the sizes
goselect ex -q='select fmtsize(sum(size)), fmtsize(avg(size)) from .'
- Select the minimum and the maximum file size
goselect ex -q='select min(size), max(size) from .'
- Select the minimum and the maximum length of all the file names
goselect ex -q='select min(len(name)), max(len(name)) from .'
- Select file name and extension of all the files containing the string go in their name
goselect ex -q='select name, extension from . where contains(name, go)'
- Select file name and extension of all the files if their names begin with READ, followed by numbers, followed by any number of characters
goselect ex -q='select name, extension from . where like(name, ^READ[0-9].*)'
- Select file name and extension of all the files if their names start with READ
goselect ex -q='select name, extension from . where startswith(name, READ)'
- Select file name and extension of all the files if their names end with go
goselect ex -q='select name, extension from . where endswith(name, go)'
- Select file name and extension of all the text files
goselect ex -q='select name, extension from . where isText(mime)'
- Select file name and extension of all the image files
goselect ex -q='select name, extension from . where isImage(mime)'
- Select file name and extension of all the audio files
goselect ex -q='select name, extension from . where isAudio(mime)'
- Select file name and extension of all the video files
goselect ex -q='select name, extension from . where isVideo(mime)'
- Select file name and extension of all the pdf files
goselect ex -q='select name, extension from . where isPdf(mime)'
- Select file name and extension of all the archive files (Version 0.0.6)
goselect ex -q='select name, extension from . where isArchive(mime)'
- Select file name, extension and size of all the files if their size is greater than 1 Mib
goselect ex -q='select name, extension, size, fmtsize(size) from . where gt(size, mul(1, 1024, 1024))'
1 Mib = 1 * 1024 * 1024 bytes
Or
goselect ex -q='select name, extension, size, fmtsize(size) from . where gt(size, 1048576)'
Or (Version 0.0.5)
goselect ex -q='select name, extension, size, fmtsize(size) from . where gt(size, parseSize(1 Mib))'
- Select file name, extension and size of all the files if their size is greater than or equal to 1 Mib
goselect ex -q='select name, extension, size, fmtsize(size) from . where gte(size, mul(1, 1024, 1024))'
Or
goselect ex -q='select name, extension, size, fmtsize(size) from . where gte(size, 1048576)'
Or (Version 0.0.5)
goselect ex -q='select name, extension, size, fmtsize(size) from . where gte(size, parseSize(1 Mib))'
- Select file name, extension and size of all the files if their size is less than 1 Mib
goselect ex -q='select name, extension, size, fmtsize(size) from . where lt(size, mul(1, 1024, 1024))'
Or
goselect ex -q='select name, extension, size, fmtsize(size) from . where lt(size, 1048576)'
Or (Version 0.0.5)
goselect ex -q='select name, extension, size, fmtsize(size) from . where lt(size, parseSize(1 Mib))'
- Select file name, extension and size of all the files if their size is less than or equal to 1 Mib
goselect ex -q='select name, extension, size, fmtsize(size) from . where lte(size, mul(1, 1024, 1024))'
Or
goselect ex -q='select name, extension, size, fmtsize(size) from . where lte(size, 1048576)'
Or (Version 0.0.5)
goselect ex -q='select name, extension, size, fmtsize(size) from . where lte(size, parseSize(1 Mib))'
- Select file name, extension and size of all the files if their size equal to 600 bytes
goselect ex -q='select name, extension, size, fmtsize(size) from . where eq(size, 600)'
Or (Version 0.0.5)
goselect ex -q='select name, extension, size, fmtsize(size) from . where eq(size, parseSize(600 B))'
- Select file name, extension and size of all the files if their extension is equal to .go
goselect ex -q='select name, extension, size, fmtsize(size) from . where eq(ext, .go)'
- Select file name, extension and size of all the files if their extension is not equal to .go
goselect ex -q='select name, extension, size, fmtsize(size) from . where ne(ext, .go)'
- Select file name and extension of all the text files if their names start with lowercase test
goselect ex -q='select name, extension from . where and(isText(mime), startsWith(lower(name), test))'
- Select file name of all the hidden files and directories
goselect ex -q='select name from . where or(eq(isDir, true), eq(isHidden, true))'
Or
goselect ex -q='select name from . where or(isDir, isHidden)'
- Select count of all the hidden files
goselect ex -q='select count(), name from . where eq(isHidden, true)'
- Select count of all the pdf files
goselect ex -q='select count() from . where eq(isPdf(mime), true)'
- Select sum of size of all the hidden files
goselect ex -q='select fmtsize(sum(size)), size from . where eq(isHidden, true)'
- Select all the files that are not hidden
goselect ex -q='select name, extension from . where not(isHidden, true)'
Or
goselect ex -q='select name, extension from . where eq(isHidden, false)'
- Select basename and extension of all the files with .log as the extension and their basename should have a date format: Year-Month-Date (for example, 2022-09-28)
goselect ex -q='select basename, ext from . where and(eq(ext, .log), like(basename, [0-9]{4}-[0-9]{2}-[0-9]{2}))'
- Select file name and extension of all the files that were accessed 2 or more days earlier
goselect ex -q='select name, ext, atime from . where gte(daysdiff(atime), 2)'
Default value of second parameter in daysdiff is now()
- Select file name and extension of all the files that were modified on or after 2022-09-22
goselect ex -q='select name, ext, mtime from . where gt(mtime, parseDateTime(2022-09-22, dt))'
Here, parseDateTime is given a date without timezone, that means while comparing mtime and the input value, timezone may play role.
'2022-09-22' will have UTC as the timezone that might not be same the timezone of mtime.
- Select size and the formatted size of the file that has hello world.txt as the name
goselect ex -q='select size, formatSize(size) from . where eq(name, "hello world.txt")'
- Select size and the formatted size of the file that has hello world.txt as the name (query begins with double quotes)
goselect ex -q="select size, formatSize(size) from . where eq(name, 'hello world.txt')"
- Select size and the formatted size of the file that has "hello world.txt" as the name (file name contains double quotes)
goselect ex -q='select size, formatSize(size) from . where eq(name, \"hello world.txt\")'
- Select size and the formatted size of the file that has "hello world.txt" as the name (query begins with double quotes and the file name contains double quotes)
goselect ex -q="select size, formatSize(size) from . where eq(name, \\\"hello world.txt\\\")"
- How do I get a list of all the supported attributes?
Use goselect listAttributes
or goselect attrs
to get all the supported attributes.
Usage
goselect listAttributes
goselect attrs
- How do I get a list of all the supported functions?
Use goselect listFunctions
or goselect fns
to get all the supported functions.
Usage
goselect listFunctions
goselect fns
- How do I get the description or the purpose of a function or an attribute?
Let's say that we want to get the description of fname
. In order to do that, use goselect describe --term=fname
. term
flag
can be given an attribute name or a function name.
Usage
goselect describe --term=fname
goselect desc --term=path
goselect desc --term=lower
- Are there any functions to parse an input in date/time type?
Yes, goselect supports a function parsedatetime
to parse an input string in time
type. It takes 2 parameters, the first parameter is a string to be parsed and the second is the format identifier.
- How do I get the supported date/time formats and their identifiers?
Use goselect listTimeFormats
or goselect fmts
to get all the supported date/time formats along with their identifiers.
Usage
goselect listTimeFormats
goselect fmts
- Are the select queries case-sensitive?
No, the select queries are case-insensitive.
- Are all the functions supported in where clause?
No, goselect supports a set of functions that can be used in where
clause. Use goselect listWhereClauseFunctions
to get a list of all the functions that can be used in where
clause.
Usage
goselect listWhereClauseFunctions
goselect wherefns
- Can I get some example queries that involve or/and functions?
Usage
1. goselect execute -q='select * from . where or(like(name, result.*), eq(ishidden, true))'
2. goselect execute -q='select * from . where and(like(name, result.*), eq(isdir, false))'
- The first query returns all the files where name is like
result.*
and the file is a hidden file. - The second query returns all the files where name is like
result.*
and the file is not a directory.
- Can I get some example queries that involve date/time functions?
Usage
1. goselect execute -q='select atime, extract(atime, date) from .'
2. goselect execute -q='select atime, extract(atime, date) from . where gt(daysdifference(atime), 2)'
- The first query extracts the
date
from theaccess time
of a file - The second query returns all the files where difference (in days) between today and the access time of a file is greater than 2
- Can I get some example queries that aggregate functions?
Usage
1. goselect execute -q='select count(), min(len(name)) from .'
2. goselect execute -q='select countdistinct(ext) from .'
- The first query returns the count of all the entries in the source directory along with the minimum file length
- The second query returns all the distinct file extensions in the source directory
- What is a query alias (Version 0.0.7)?
Query alias allows saving and executing your favorite queries with an alternate name. For example, select name, extension, size from . where and(isPdf(mime), gt(size, parseSize(1 Mib)))
query returns all the pdfs with size greater than 1Mib. Using query alias, you can save the query with an alternate name and execute it later using the saved name.
Usage
goselect ex -q='select name, extension, size from . where and(isPdf(mime), gt(size, parseSize(1 Mib)))' --createAlias=allPdfsGt1Mib
goselect ex --useAlias=allPdfsGt1Mib
- Are the execution options like -n=false also saved with query alias?
No, query alias only saves the query and its alias
- Where is the query alias saved?
Query alias is saved in a human-readable file in the current directory.
- How do I get a list of all the saved query aliases?
Use listQueryAliases
command with goselect
Usage
goselect listQueryAliases
- Are there any predefined query aliases?
Yes, goselect provides a set of predefined query aliases. Some examples include:
Alias | Query |
---|---|
lsCurrent | select * from . |
lsCurrentOrderedBySizeDesc | select name, ext, size, abspath from . order by 3 desc |
lsCurrentFormattedSize | select name, ext, size, fmtSize(size), abspath from . order by 3 desc |
fileWithMaxSizeInCurrent | select name, size, fmtSize(size) from . order by 2 desc limit 1 |
- Support for
where
clause- [X] select * from /home/apps where eq(add(2,3), 5) - [X] select * from /home/apps where eq(lower(ext), .log) - [X] select * from /home/apps where ne(lower(ext), .log) - [X] `where` clause supports functions for comparison like `eq`, `le`, `lt`, `ge`, `gt`, `ne`, `contains`, `or`, `and`, `not` etc
- Support for projections
- projections with attribute name:
name
,size
- projections with alias in attribute name:
fName
instead ofname
- projections with scalar functions:
contains
,lower
- projections with alias in scalar functions:
low
instead oflower
- projections with aggregate functions:
min
,max
- projections with equivalent of expressions like add(1, 2)
- projections with attribute name:
- Support for
order by
clause- order by with positions:
order by 1
- order by in descending order:
order by 1 desc
- order by in optional ascending order:
order by 1 asc
- order by with positions:
- Support for
limit
clause- limit clause with a value:
limit 10
- limit clause with a value:
- Support for various functions
- Support for formatting the results
- Json formatter
- Html formatter
- Table formatter
- Support for saving and executing queries using query alias
- Support for exporting the formatted result
- Console
- File
goselect does not support the following:
- Use of parentheses inside a function call. One can not use parentheses inside a function call. This means the following queries are treated invalid:
select * from . where eq(add(2, 3), (4))
select * from . where lt(size, (add(2, 3)))
If a file name contains parentheses, then one could use single or double quotes in the query to match such a file name.
select * from . where eq(name, "hello (1).txt")
- Use of expressions in order by. goselect supports attribute positions in order by. This means the following queries are treated invalid:
select name, extension from . order by name
select name, extension from . order by lower(name)
- Support for checking if a (text) file contains a specific term
- Caching the expression results. This is useful for cases like
select lower(name) from . where eq(lower(name), sample)
. In this example,lower(name)
need not be evaluated twice for a row - Support for concurrent execution and streaming the results as soon as available. Will not work for
order by
andaggregate
functions. It is applicable for queries that involve scalar functions without order by. It might make sense to use this feature where the number of files is too many, say more than 0.1 million - Support installation using
brew
,apt
,yum
- Support for
between
scalar function