Set operations in the UNIX shell!
Resting on the shoulders of giants like grep, cat, sort, uniq, comm, diff, cut, awk, and more.
Using npm:
$ npm install -g setop
Or copy the script to a file called setop, give it execute permission, and move it to a directory in your PATH, like so:
$ chmod u+x setop
$ mv setop /usr/local/bin
- Membership Test
- Equality Test
- Cardinality
- Subset Test
- Union
- Intersection
- Complement
- Symmetric Difference
- Cartesian Product
- Disjoint Sets Test
- Empty Set Test
- Minimum Element
- Maximum Element
$ setop is-member <kwd> <set1> <set2> ... <setn>
Tests whether the line kwd is present in any of the files set1, set2, ..., setn. Prints 1 if it is and 0 otherwise.
For example:
$ setop is-member abc set-1.txt
1
$ setop is-member mno set-1.txt set-2.txt set-3.txt
0
$ setop is-member xyz set-*.txt
1
set-1.txt
abc
def
abc
ghi
set-2.txt
def
ghi
ghi
abc
set-3.txt
abc
xyz
def
ghi
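For readers curious how such a test can be built from standard tools, here is a minimal sketch using grep. It is an illustration only and not necessarily how setop does it; the function name is_member_sketch is made up for this example.

```shell
# Hypothetical helper (not part of setop): prints 1 if the exact line "$1"
# occurs in any of the remaining file arguments, 0 otherwise.
is_member_sketch() (
  kwd=$1; shift
  # -x: match whole lines only, -F: treat kwd as a fixed string, -q: no output
  if grep -qxF -e "$kwd" -- "$@"; then echo 1; else echo 0; fi
)

# Sample files mirroring the ones listed above
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'abc\ndef\nabc\nghi\n' > "$tmp/set-1.txt"
printf 'def\nghi\nghi\nabc\n' > "$tmp/set-2.txt"
printf 'abc\nxyz\ndef\nghi\n' > "$tmp/set-3.txt"

is_member_sketch abc "$tmp/set-1.txt"     # prints 1
is_member_sketch mno "$tmp"/set-*.txt     # prints 0
```

The -x flag matters here: without it, a keyword such as ab would match the line abc.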
$ setop equals <set1> <set2>
Tests whether the unique lines in file set1 are the same as the unique lines in file set2.
For example:
$ setop equals set-1.txt set-2.txt
1
$ setop equals set-1.txt set-3.txt
0
set-1.txt
abc
def
ghi
def
set-2.txt
def
ghi
ghi
abc
set-3.txt
abc
xyz
def
ghi
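The equality test can be approximated by comparing the sorted, de-duplicated contents of both files. This is a sketch under that assumption, with a hypothetical function name, and may differ from what setop does internally.

```shell
# Hypothetical helper (not part of setop): prints 1 if both files contain
# the same set of unique lines, 0 otherwise.
equals_sketch() (
  a=$(sort -u "$1")   # unique lines of the first file, in sorted order
  b=$(sort -u "$2")   # unique lines of the second file, in sorted order
  if [ "$a" = "$b" ]; then echo 1; else echo 0; fi
)

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'abc\ndef\nghi\ndef\n' > "$tmp/set-1.txt"
printf 'def\nghi\nghi\nabc\n' > "$tmp/set-2.txt"
printf 'abc\nxyz\ndef\nghi\n' > "$tmp/set-3.txt"

equals_sketch "$tmp/set-1.txt" "$tmp/set-2.txt"   # prints 1
equals_sketch "$tmp/set-1.txt" "$tmp/set-3.txt"   # prints 0
```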
$ setop count <set1> <set2> ... <setn>
Counts the number of unique lines in files set1, set2, ..., setn combined.
For example:
$ setop count set-1.txt
3
$ setop count set-1.txt set-2.txt set-3.txt
4
$ setop count set-*.txt
4
set-1.txt
abc
def
ghi
def
set-2.txt
def
ghi
ghi
abc
set-3.txt
abc
xyz
def
ghi
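Cardinality boils down to de-duplicating the combined input and counting what is left. A minimal sketch, assuming sort and awk are available (count_sketch is a hypothetical name, not a setop command):

```shell
# Hypothetical helper (not part of setop): prints the number of distinct
# lines across all file arguments.
count_sketch() {
  # sort -u merges the files and drops duplicates; awk prints the line count
  sort -u "$@" | awk 'END { print NR }'
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'abc\ndef\nghi\ndef\n' > "$tmp/set-1.txt"
printf 'def\nghi\nghi\nabc\n' > "$tmp/set-2.txt"
printf 'abc\nxyz\ndef\nghi\n' > "$tmp/set-3.txt"

count_sketch "$tmp/set-1.txt"      # prints 3
count_sketch "$tmp"/set-*.txt      # prints 4
```

awk is used for the count instead of wc -l because some wc implementations pad their output with spaces.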
$ setop is-subset <base> <set1> <set2> ... <setn>
Tests whether all lines in file base are present in files set1, set2, ..., setn combined.
For example:
$ setop is-subset set-3.txt set-1.txt
0
$ setop is-subset set-3.txt set-2.txt set-3.txt
1
$ setop is-subset set-3.txt set-*.txt
1
set-1.txt
abc
def
ghi
def
set-2.txt
def
ghi
ghi
abc
set-3.txt
abc
xyz
def
ghi
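One way to picture the subset test: remember every line of set1 ... setn, then check each line of base against that record. The awk sketch below follows that idea; it is an assumption about a workable approach, not setop's actual code, and is_subset_sketch is a hypothetical name.

```shell
# Hypothetical helper (not part of setop): prints 1 if every line of the
# first file occurs somewhere in the remaining files, 0 otherwise.
is_subset_sketch() (
  base=$1; shift
  awk -v base="$base" '
    { seen[$0] = 1 }        # record every line of set1 ... setn
    END {
      ok = 1
      # re-read base and stop at the first line that was never recorded
      while ((getline line < base) > 0)
        if (!(line in seen)) { ok = 0; break }
      print ok
    }' "$@"
)

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'abc\ndef\nghi\ndef\n' > "$tmp/set-1.txt"
printf 'def\nghi\nghi\nabc\n' > "$tmp/set-2.txt"
printf 'abc\nxyz\ndef\nghi\n' > "$tmp/set-3.txt"

is_subset_sketch "$tmp/set-3.txt" "$tmp/set-1.txt"                   # prints 0
is_subset_sketch "$tmp/set-3.txt" "$tmp/set-2.txt" "$tmp/set-3.txt"  # prints 1
```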
$ setop union <set1> <set2> ... <setn>
Displays all unique lines that are present in files set1, set2, ..., setn combined.
For example:
$ setop union set-1.txt set-2.txt
abc
def
ghi
$ setop union set-1.txt set-2.txt set-3.txt
abc
def
ghi
xyz
$ setop union set-*.txt
abc
def
ghi
xyz
set-1.txt
abc
def
ghi
def
set-2.txt
def
ghi
ghi
abc
set-3.txt
abc
xyz
def
ghi
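With plain tools, a union is just a merged, de-duplicated sort. The sketch below assumes nothing beyond sort; union_sketch is a hypothetical name used only for illustration.

```shell
# Hypothetical helper (not part of setop): prints each distinct line from
# the given files once, in sorted order.
union_sketch() {
  sort -u "$@"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'abc\ndef\nghi\ndef\n' > "$tmp/set-1.txt"
printf 'def\nghi\nghi\nabc\n' > "$tmp/set-2.txt"
printf 'abc\nxyz\ndef\nghi\n' > "$tmp/set-3.txt"

union_sketch "$tmp"/set-*.txt   # prints abc, def, ghi, xyz on separate lines
```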
$ setop inter <set1> <set2>
Displays all unique lines that are common to files set1 and set2.
For example:
$ setop inter set-1.txt set-2.txt
abc
def
ghi
$ setop inter set-1.txt set-3.txt
abc
def
ghi
set-1.txt
abc
def
ghi
def
set-2.txt
def
ghi
ghi
abc
set-3.txt
abc
xyz
def
ghi
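A common trick for intersection is to de-duplicate each file on its own, concatenate the results, and keep only lines that now appear more than once. The sketch below uses that trick; it is one possible approach and not necessarily setop's, and inter_sketch is a hypothetical name.

```shell
# Hypothetical helper (not part of setop): prints lines present in both files.
inter_sketch() {
  # After per-file de-duplication, a line seen twice must come from both files;
  # uniq -d prints exactly those repeated lines (once each).
  { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'abc\ndef\nghi\ndef\n' > "$tmp/set-1.txt"
printf 'abc\nxyz\ndef\nghi\n' > "$tmp/set-3.txt"
printf 'mno\npqr\nstu\n'      > "$tmp/set-4.txt"

inter_sketch "$tmp/set-1.txt" "$tmp/set-3.txt"   # prints abc, def, ghi
inter_sketch "$tmp/set-1.txt" "$tmp/set-4.txt"   # prints nothing
```

The per-file sort -u step is what keeps the duplicate def in set-1.txt from being mistaken for a common line.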
$ setop minus <set1> <set2>
Displays all unique lines in file set1 that are not present in file set2.
For example:
$ setop minus set-1.txt set-2.txt
$ setop minus set-3.txt set-2.txt
xyz
set-1.txt
abc
def
ghi
def
set-2.txt
def
ghi
ghi
abc
set-3.txt
abc
xyz
def
ghi
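The complement can be sketched with sort and uniq alone: feed in set2 twice so that every one of its lines is guaranteed to repeat, then keep only lines that occur exactly once. This is an illustrative approach with a hypothetical function name, not a statement about setop's internals.

```shell
# Hypothetical helper (not part of setop): prints lines of the first file
# that do not occur in the second file.
minus_sketch() {
  # Lines from "$2" appear at least twice, so uniq -u (print non-repeated
  # lines only) leaves just the lines that are exclusive to "$1".
  { sort -u "$1"; sort -u "$2"; sort -u "$2"; } | sort | uniq -u
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'abc\ndef\nghi\ndef\n' > "$tmp/set-1.txt"
printf 'def\nghi\nghi\nabc\n' > "$tmp/set-2.txt"
printf 'abc\nxyz\ndef\nghi\n' > "$tmp/set-3.txt"

minus_sketch "$tmp/set-1.txt" "$tmp/set-2.txt"   # prints nothing
minus_sketch "$tmp/set-3.txt" "$tmp/set-2.txt"   # prints xyz
```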
$ setop sym-diff <set1> <set2> ... <setn>
Displays all unique lines that are present in at least one of the files set1, set2, ..., setn, but not in all of them.
For example:
$ setop sym-diff set-1.txt set-2.txt
$ setop sym-diff set-1.txt set-2.txt set-3.txt
xyz
$ setop sym-diff set-*.txt
xyz
set-1.txt
abc
def
ghi
def
set-2.txt
def
ghi
ghi
abc
set-3.txt
abc
xyz
def
ghi
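Taking the description above literally (in at least one file, but not in all), the operation can be sketched by counting in how many files each line occurs. The function name sym_diff_sketch is hypothetical, and the counting approach is an assumption rather than setop's confirmed implementation.

```shell
# Hypothetical helper (not part of setop): prints lines that occur in at
# least one of the given files but not in every one of them.
sym_diff_sketch() (
  n=$#                                  # number of files being compared
  for f in "$@"; do sort -u "$f"; done |
    awk -v n="$n" '
      { count[$0]++ }                   # number of files containing the line
      END { for (line in count) if (count[line] < n) print line }' |
    sort
)

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'abc\ndef\nghi\ndef\n' > "$tmp/set-1.txt"
printf 'def\nghi\nghi\nabc\n' > "$tmp/set-2.txt"
printf 'abc\nxyz\ndef\nghi\n' > "$tmp/set-3.txt"

sym_diff_sketch "$tmp/set-1.txt" "$tmp/set-2.txt"   # prints nothing
sym_diff_sketch "$tmp"/set-*.txt                    # prints xyz
```

Note that for three or more files this differs from the textbook n-ary symmetric difference, which keeps elements found in an odd number of sets; the sketch follows the wording used in this README.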
$ setop product <set1> <set2>
Displays the Cartesian product of the unique lines from files set1 and set2.
For example:
$ setop product set-1.txt set-2.txt
abc abc
ghi abc
def abc
abc def
ghi def
def def
abc ghi
ghi ghi
def ghi
$ setop product set-2.txt set-3.txt
abc abc
ghi abc
def abc
abc def
ghi def
def def
abc ghi
ghi ghi
def ghi
abc xyz
ghi xyz
def xyz
set-1.txt
abc
def
ghi
def
set-2.txt
def
ghi
ghi
abc
set-3.txt
abc
xyz
def
ghi
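A Cartesian product can be sketched as two nested loops over the de-duplicated lines. In the sketch below, the single-space separator and the output order are assumptions (the order will likely differ from setop's), and product_sketch is a hypothetical name.

```shell
# Hypothetical helper (not part of setop): prints every pair "a b" where a is
# a unique line of the first file and b is a unique line of the second file.
product_sketch() {
  sort -u "$2" | while IFS= read -r second; do
    # The inner sort reads from the file "$1", so it does not consume
    # the lines that the outer loop is still reading from its pipe.
    sort -u "$1" | while IFS= read -r first; do
      printf '%s %s\n' "$first" "$second"
    done
  done
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'abc\ndef\nghi\ndef\n' > "$tmp/set-1.txt"
printf 'def\nghi\nghi\nabc\n' > "$tmp/set-2.txt"
printf 'abc\nxyz\ndef\nghi\n' > "$tmp/set-3.txt"

product_sketch "$tmp/set-1.txt" "$tmp/set-2.txt"   # 3 x 3 = 9 pairs
product_sketch "$tmp/set-2.txt" "$tmp/set-3.txt"   # 3 x 4 = 12 pairs
```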
$ setop is-disjoint <set1> <set2>
Tests whether files set1 and set2 have no lines in common. Prints 1 if they are disjoint and 0 otherwise.
For example:
$ setop is-disjoint set-1.txt set-2.txt
0
$ setop is-disjoint set-2.txt set-3.txt
0
$ setop is-disjoint set-3.txt set-4.txt
1
set-1.txt
abc
def
ghi
def
set-2.txt
def
ghi
ghi
abc
set-3.txt
abc
xyz
def
ghi
set-4.txt
mno
pqr
stu
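Two sets are disjoint exactly when their intersection is empty, so the test can be sketched by checking whether an intersection pipeline produces any output. This is an illustration under that assumption; is_disjoint_sketch is a hypothetical name.

```shell
# Hypothetical helper (not part of setop): prints 1 if the two files share
# no line, 0 if they have at least one line in common.
is_disjoint_sketch() {
  # uniq -d emits lines found in both files; grep -q '^' succeeds as soon
  # as it sees any line at all, including an empty one.
  if { sort -u "$1"; sort -u "$2"; } | sort | uniq -d | grep -q '^'; then
    echo 0
  else
    echo 1
  fi
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'def\nghi\nghi\nabc\n' > "$tmp/set-2.txt"
printf 'abc\nxyz\ndef\nghi\n' > "$tmp/set-3.txt"
printf 'mno\npqr\nstu\n'      > "$tmp/set-4.txt"

is_disjoint_sketch "$tmp/set-2.txt" "$tmp/set-3.txt"   # prints 0
is_disjoint_sketch "$tmp/set-3.txt" "$tmp/set-4.txt"   # prints 1
```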
$ setop is-empty <set1> <set2> ... <setn>
Tests whether files set1, set2, ..., setn combined contain no lines. Prints 1 if they are all empty and 0 otherwise.
For example:
$ setop is-empty set-1.txt
0
$ setop is-empty <(setop sym-diff set-1.txt set-2.txt)
1
set-1.txt
abc
def
ghi
def
set-2.txt
def
ghi
ghi
abc
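An emptiness check only needs to know whether any input file contains at least one line. The sketch below uses grep for that; how setop treats edge cases such as a file holding a single blank line is not stated here, so treat that behavior as an assumption. The name is_empty_sketch is hypothetical.

```shell
# Hypothetical helper (not part of setop): prints 1 if none of the given
# files contains a line, 0 otherwise.
is_empty_sketch() {
  # '^' matches the start of every line, so grep -q succeeds if any file
  # has at least one line (a blank line counts as a line here).
  if grep -q '^' -- "$@"; then echo 0; else echo 1; fi
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'abc\ndef\nghi\ndef\n' > "$tmp/set-1.txt"
: > "$tmp/empty.txt"                      # create a zero-length file

is_empty_sketch "$tmp/set-1.txt"                    # prints 0
is_empty_sketch "$tmp/empty.txt"                    # prints 1
is_empty_sketch "$tmp/empty.txt" "$tmp/set-1.txt"   # prints 0
```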
$ setop min <set1> <set2> ... <setn>
Displays the lexicographic minimum from among the lines in files set1, set2, ..., setn combined.
For example:
$ setop min set-4.txt
mno
$ setop min set-1.txt set-4.txt
abc
set-1.txt
abc
def
ghi
def
set-4.txt
mno
pqr
stu
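The minimum is simply the first line of the sorted input. The sketch below pins the locale to C so that "lexicographic" means plain byte order; that locale choice is an assumption, since the README does not say which collation setop uses, and min_sketch is a hypothetical name.

```shell
# Hypothetical helper (not part of setop): prints the smallest line, in
# byte order, across all given files.
min_sketch() {
  LC_ALL=C sort "$@" | head -n 1
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'abc\ndef\nghi\ndef\n' > "$tmp/set-1.txt"
printf 'mno\npqr\nstu\n'      > "$tmp/set-4.txt"

min_sketch "$tmp/set-4.txt"                    # prints mno
min_sketch "$tmp/set-1.txt" "$tmp/set-4.txt"   # prints abc
```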
$ setop max <set1> <set2> ... <setn>
Displays the lexicographic maximum from among the lines in files set1, set2, ..., setn combined.
For example:
$ setop max set-1.txt
ghi
$ setop max set-1.txt set-3.txt
xyz
set-1.txt
abc
def
ghi
def
set-3.txt
abc
xyz
def
ghi
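The maximum mirrors the minimum: sort the combined input and take the last line instead of the first. As before, the C locale (byte order) is an assumption about what "lexicographic" means here, and max_sketch is a hypothetical name.

```shell
# Hypothetical helper (not part of setop): prints the largest line, in
# byte order, across all given files.
max_sketch() {
  LC_ALL=C sort "$@" | tail -n 1
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'abc\ndef\nghi\ndef\n' > "$tmp/set-1.txt"
printf 'abc\nxyz\ndef\nghi\n' > "$tmp/set-3.txt"

max_sketch "$tmp/set-1.txt"                    # prints ghi
max_sketch "$tmp/set-1.txt" "$tmp/set-3.txt"   # prints xyz
```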
Many of the bash one-liners that are part of this project came from a post on Peteris Krumins' blog. I've been using them for years and finally decided to put them all together in one script, with easy-to-remember command names.
Please open an issue for support.