- What are shells?
- Which one(s) to use?
- What is scripting and what are its applications?
- How to write scripts.
- How to use some useful tools in scripts.
A shell is the program that is started after logging on to a Linux machine and that allows users to type commands (aka the login shell). For historical reasons there are several shells:
- The Bourne shell (sh) or the Bourne again shell (bash)
- The C shell (csh) or the tC shell (tcsh)
- The Korn shell (ksh)
- The Z shell (zsh)
You can check the Introduction to the Command Line for Genomics Carpentries lesson for an intro to bash.
Everything about each shell is explained in the "man pages"; there are also books written about scripting and a ton of info on the web.
On many Linux machines sh and bash are the same program, and csh and tcsh are also the same program.
- bash is the shell most used by biologists (and used to install packages for historical reasons)
- csh is the shell most used by astronomers. The csh was written after sh to use a syntax more like the C programming language (hence the name), while the tcsh is an improvement to the C shell for terminal use (hence the t).
- a few people and systems use the Korn shell.
You can write scripts in any shell syntax you care to, independently of what your login shell is.
- Programming languages are sets of instructions compiled to produce executables with machine level instructions (a more "complicated" process).
- By contrast, scripting refers to sets of instructions that are parsed and executed by a program, hence
  - there is no need to compile the script (convenient)
  - but, in principle, they run more slowly than executables.
- Scripts can combine existing modules or components, while programming languages are used to build more sophisticated/complicated applications from scratch.
- Some scripting languages are parsed to check for syntax errors before executing them, but not shell scripts.
- Scripts are used to help run applications; on Hydra, a job file is a (simple) script.
- You can write complex scripts, and scripts can invoke (i.e., start) other scripts.
- A script is a text file that holds a list of commands, and thus can be written with any type of editor (nano, vi, emacs).
- The commands in the script must follow the syntax of the shell used to parse the script.
- A script can take arguments (options) and thus be more general, it allows you to define variables, use expressions or execute commands
  - A variable is a mechanism to hold a value and refer to it by its name, or a way to modify how some commands behave
  - variables can also be one dimensional arrays (lists)
  - An expression allows a user to perform simple arithmetic or use commands to create values held by variables (for example: set the variable num to hold the number of lines in a file).
- Scripting syntax allows for "flow control" namely it allows for
  - tests and logical operators - if statements
  - loops - for statements
  - more flow control: case, while, until and select
  - bash also allows users to define functions (not covered)
  - the precise syntax is shell specific, i.e. [ba]sh syntax is different from [t]csh syntax.
- Scripts allow for I/O redirection
  - input: aka stdin
  - output: aka stdout
  - error: aka stderr
  - pipes: redirecting output of one command to be the input to another command
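To make the list above more concrete, here is a minimal sketch in [ba]sh syntax that combines a variable, an expression, a test and some redirection. The file name sample.txt and the variable names are made up for illustration.

```shell
# Create a small example file (hypothetical name) in a scratch directory
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$workdir/sample.txt"

# Expression: set the variable num to the number of lines in the file
num=$(wc -l < "$workdir/sample.txt")
num=$((num + 0))    # arithmetic expansion also drops any padding spaces

# Flow control: a test inside an if statement
if [ "$num" -gt 2 ]; then
    size=long
else
    size=short
fi

# I/O redirection and a pipe: sort the lines in reverse order, keep the first one
last=$(sort -r < "$workdir/sample.txt" | head -n 1)

echo "num=$num size=$size last=$last"
rm -r "$workdir"
```

With the three-line file created here, this should print num=3 size=long last=two.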
Sophisticated shell scripting is akin to programming; we can't and won't teach programming today.
We will focus on basic skills and make time for hands-on learning.
There is more than one way to bake a cake, which makes things confusing at first.
- A variable is a character string to which a value is assigned.
- The assigned value can be a number, some text, a filename, a device, or any other type of data.
- The value of a variable is obtained by using $ followed by the variable name.
- [ba]sh syntax:
filename=/here/goes/nothing.txt
string='hello class'
let num=33
echo $filename $string $num
- [t]csh syntax:
set filename = /here/goes/nothing.txt
set string = 'hello class'
@ num = 33
echo $filename $string $num
- output (both shells):
/here/goes/nothing.txt hello class 33
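- single quotes keep $blah as typed, double quotes replace it with its value:
blah='tell me more'
name0=hello
name1='hello $blah'
name2="hello $blah"
echo name1=$name1
echo name2=$name2
- the same, using ${...} around the variable names:
blah='tell me more'
name1='hello $blah'
name2="hello ${blah}"
echo name1=${name1}
echo name2=${name2}
- output (both versions):
name1=hello $blah
name2=hello tell me more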
- [ba]sh syntax:
arr=(1 2 3)
echo ${arr[@]}
1 2 3
echo ${#arr[@]}
3
echo ${arr[0]}
1
echo ${arr[1]}
2
- [t]csh syntax:
set arr = (1 2 3)
echo $arr
1 2 3
echo $#arr
3
echo ${arr[1]}
1
echo ${arr[2]}
2
Note: [ba]sh arrays are 0-indexed, while [t]csh arrays are 1-indexed.
- export makes variables visible to child processes, like programs or scripts
- example
cat hello.sh
echo hello=$hello
- result:
hello='yo!'
echo hello=$hello
sh hello.sh
hello=
export hello
sh hello.sh
hello=yo!
- [t]csh syntax (note: no '=' sign):
setenv hello 'yo!'
- [ba]sh syntax:
path=to/some/where/some.txt
base=$(basename $path)
file=$(dirname $path)
echo base=$base file=$file
base=some.txt file=to/some/where
- [t]csh syntax (the variable is named fullpath because path is special in csh: it holds the command search path):
set fullpath = to/some/where/some.txt
set base = $fullpath:h
set file = $fullpath:t
echo base=$base file=$file root=$file:r ext=$file:e
base=to/some/where file=some.txt root=some ext=txt
Let's pause here for 5-10 minutes
- [ba]sh syntax:
let x=1
let y=$x+3
echo x=$x y=$y
- [t]csh syntax:
@ x = 1
@ y = $x + 3
echo x=$x y=$y
- output (both shells):
x=1 y=4
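Besides let, [ba]sh also accepts the arithmetic expansion $(( )), which you will often see in scripts found on the web. The sketch below is only an illustration with made-up variable names; keep in mind that the shell handles integers only.

```shell
x=1
y=$((x + 3))        # same result as: let y=$x+3
z=$((y * 10 / 3))   # integer arithmetic only: 40 / 3 gives 13
r=$((y % 3))        # remainder of 4 / 3 is 1
echo "x=$x y=$y z=$z r=$r"
```

This should print x=1 y=4 z=13 r=1. For non-integer arithmetic you need an external tool such as bc or awk.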
- redirect a command to read from a file:
command < file
- redirect output and error to a file
>, >>, &>, &>>, and 2>&1
- redirect output of one command as input to the next
command1 | command2
- read input from script or terminal
command <<EOF
some text
ok to use $variables
EOF
- duplicate output to show on the output and to be written in a file; the tee command:
command | tee file
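The redirections listed above can be tried out with a short sketch like the following one, in [ba]sh syntax. All the file names are made up, and everything is written in a scratch directory that is removed at the end.

```shell
workdir=$(mktemp -d)
out="$workdir/out.txt"
log="$workdir/all.log"

echo "first line"  >  "$out"     # > creates or overwrites the file
echo "second line" >> "$out"     # >> appends to it

# ls complains about the missing file: 2>&1 sends that error message
# to the same log file as the regular output
ls "$out" "$workdir/no-such-file" > "$log" 2>&1 || echo "ls reported an error"

# a here-document feeds text to stdin, the pipes pass it on,
# and tee shows the result while also writing it to a file
name=class
cat <<EOF | tr 'a-z' 'A-Z' | tee "$workdir/upper.txt"
hello $name
EOF

nlines=$(wc -l < "$out")         # < makes wc read from the file
nlines=$((nlines + 0))
upper=$(cat "$workdir/upper.txt")
echo "nlines=$nlines upper=$upper"
rm -r "$workdir"
```

The last echo should print nlines=2 upper=HELLO CLASS. The shorter &> and &>> forms do the same as > file 2>&1 and >> file 2>&1, but only in bash.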
- You can set a variable to be the result of a command
let x=1
let y=$x+3
Y=$(echo $y | awk '{printf "%3.3d", $1}')
echo Y=$Y
- output:
Y=004
- [t]csh syntax, using backquotes:
@ x = 1
@ y = $x + 3
set Y = `echo $y | awk '{printf "%3.3d", $1}'`
echo Y=$Y
- [ba]sh also accepts backquotes:
let x=1
let y=$x+3
Y=`echo $y | awk '{printf "%3.3d", $1}'`
echo Y=$Y
- Arguments are a way to supply parameters to a script.
- Arguments are useful when a script has to perform differently depending on the values of some input parameters.
- A trivial bash example, echo.sh
$ cat echo.sh
echo this is a demo of args
echo the script name is "'$0'"
echo "you have passed $# argument(s)"
echo the first argument is "'$1'"
echo the second argument is "'$2'"
echo etc...
$ sh echo.sh
this is a demo of args
the script name is 'echo.sh'
you have passed 0 argument(s)
the first argument is ''
the second argument is ''
etc...
$ sh echo.sh help me now
this is a demo of args
the script name is 'echo.sh'
you have passed 3 argument(s)
the first argument is 'help'
the second argument is 'me'
etc...
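A slightly less trivial sketch is shown below: it checks that at least one argument was given, falls back to a default for the second one, and loops over all the arguments. The script name args.sh and the variable names name and count are made up for illustration; the script is written to a scratch directory so the example can be run as is.

```shell
workdir=$(mktemp -d)

# Write a small script; the quoted 'EOF' keeps $1, $# etc. from being expanded here
cat > "$workdir/args.sh" <<'EOF'
#!/bin/sh
# stop early with a usage message when no argument is given
if [ $# -lt 1 ]; then
    echo "usage: $0 name [count]"
    exit 1
fi
name=$1
count=${2:-1}          # use 1 when the second argument is missing
echo "name=$name count=$count"
# "$@" expands to all the arguments, one word each
for arg in "$@"; do
    echo "arg=$arg"
done
EOF

with_default=$(sh "$workdir/args.sh" hello | head -n 1)
with_both=$(sh "$workdir/args.sh" hello 3 | head -n 1)
no_args_status=0
sh "$workdir/args.sh" > /dev/null || no_args_status=$?
echo "$with_default / $with_both / status=$no_args_status"
rm -r "$workdir"
```

This should print name=hello count=1 / name=hello count=3 / status=1.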
- Logical operators can be used to test conditions and create complex expressions by combining conditions.
- These operators allow you to evaluate if a condition or multiple conditions is/are true, and provide a way to control the flow of execution of scripts.
let num=$1
if [ $num -gt 5 ]
then
  echo this is big
else
  echo this is small
fi
Note the blank spaces around [ and ], while there is none before or after = when setting a variable.
The indentation is optional, and you can write this in a more compact way, using ;.
In fact, you can put more commands on a single line by separating them with ; while you use # to add comments.
let num=$1
if [ $num -gt 5 ]; then
  echo this is big
else
  echo this is small
fi
- [t]csh syntax:
@ num = $1
if ($num > 5) then
  echo this is big
else
  echo this is small
endif
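The examples above test a single condition. As a rough sketch of how conditions can be combined in [ba]sh, the block below uses && (and), || (or) and ! (not). The value of num is set directly instead of being read from $1, and the path is a made-up one that is assumed not to exist.

```shell
# classify a number using combined tests
num=7
if [ "$num" -gt 5 ] && [ "$num" -lt 10 ]; then
    size=medium
elif [ "$num" -ge 10 ]; then
    size=big
else
    size=small
fi

# -e tests that a file exists, -s that it is not empty, ! negates a test
file=/no/such/file.txt     # hypothetical path, assumed not to exist
if [ ! -e "$file" ] || [ ! -s "$file" ]; then
    state=missing-or-empty
else
    state=ok
fi
echo "size=$size state=$state"
```

With these values it should print size=medium state=missing-or-empty.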
- Loops allow us to repeat a command or set of commands for each item in a list.
for val in one two three
do
  echo val=$val
done
for val in $(ls)
do
  echo val=$val
done
for (( i=1; i<=5; i++ ))
do
  echo i=$i
done
for i in {0..10..2}
do
  echo i=$i
done
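Looping over $(ls) splits file names at every blank space, so a common alternative is to loop over a file name pattern instead. The sketch below, in [ba]sh syntax, shows this together with a simple while loop; the file names are made up and created in a scratch directory.

```shell
workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt" "$workdir/my notes.txt"

# loop over the files matching a pattern; quoting "$f" keeps names with spaces intact
count=0
for f in "$workdir"/*.txt; do
    echo "file=$(basename "$f")"
    count=$((count + 1))
done

# while repeats its commands as long as the test succeeds
i=3
while [ "$i" -gt 0 ]; do
    echo "i=$i"
    i=$((i - 1))
done
echo "count=$count i=$i"
rm -r "$workdir"
```

The last line should print count=3 i=0.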
- case - multiple options "if"
- while - loop on a simple condition
- etc...
- $?, || and && - error checking for bash
- $status - error checking for csh
$ cat case.sh
#!/bin/sh
var="$1"
case $var in
big)
num=300
color=none
;;
small)
num=1
color=none
;;
blue*)
num=0
color=$var
;;
*)
echo var="'$var'" is not a valid option
exit 1
;;
esac
echo num=$num color=$color
$ sh case.sh
var='' is not a valid option
$ sh case.sh big
num=300 color=none
$ sh case.sh small
num=1 color=none
$ sh case.sh blue
num=0 color=blue
$ sh case.sh bluepale
num=0 color=bluepale
$ cat case.csh
#!/bin/csh
set var = "$1"
switch ($var)
case big:
set num = 300
set color = none
breaksw
case small:
set num = 1
set color = none
breaksw
case blue*:
set num = 0
set color = $var
breaksw
default:
echo var="'$var'" is not a valid option
exit 1
endsw
echo num=$num color=$color
$ gerp junk test
sh: gerp: command not found
$ echo $?
127
$ grep junk test
grep: test: No such file or directory
$ echo $?
2
$ grep junk case.sh
$ echo $?
1
$ grep exit case.sh
exit 1
$ echo $?
0
$ grep junk case.sh || echo no junk
no junk
$ grep exit case.sh && echo got exit
exit 1
got exit
grep junk case.sh
if [ $? != 0 ]; then
echo "no junk"
echo "exiting"
exit 1
fi
or
grep junk case.sh || { echo no junk; echo exiting; exit 1; }
grep junk case.csh
if ($status != 0) then
echo "no junk"
echo "exiting"
exit 1
endif
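The block below is a small sketch, in [ba]sh syntax, of the same error checking applied to a made-up file (data.txt) in a scratch directory. It saves the exit status in variables so it can be printed at the end, and it also shows that an exit inside ( ) only leaves the subshell started by the parentheses, not the script itself.

```shell
workdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$workdir/data.txt"

# grep -q prints nothing; its exit status is 0 when the pattern is found, 1 when it is not
found=0
grep -q beta "$workdir/data.txt" || found=$?
missing=0
grep -q junk "$workdir/data.txt" || missing=$?

# && runs the next command only on success, || only on failure
msg=""
grep -q alpha "$workdir/data.txt" && msg="got alpha"
grep -q junk "$workdir/data.txt" || msg="$msg, no junk"

# ( ) runs commands in a subshell: exit only leaves the subshell, the script goes on
sub=0
( exit 3 ) || sub=$?
echo "found=$found missing=$missing msg='$msg' sub=$sub"
rm -r "$workdir"
```

This should print found=0 missing=1 msg='got alpha, no junk' sub=3.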
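- make the script executable:
chmod +x script
- tell the system which shell to use with the first line of the script:
#!/bin/sh
or
#!/bin/csh
- add the location of the script to the search path, [ba]sh syntax:
PATH=$PATH:/where/the/script/is
- [t]csh syntax:
set path = ($path /where/the/script/is)
rehash
- edit .bashrc, .cshrc or use a module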
- sed - the stream editor for filtering and transforming text
- awk - pattern scanning and processing language
- grep - search a file for a pattern
- tr - translate or delete characters
- cut - remove sections from each line of files
- bc - An arbitrary precision calculator language
- date + --date="specification" - date handling
- PERL - Practical Extraction and Report Language
- Python - high level general purpose programming language
- Tcl - Tool Command Language
- etc...
sed 's/this/that/' file > newfile
sed 's/this/that/g'
sed -e 's/one/two/' -e 's/blah/foo/'
sed -e "s/XXX/$values/" template.txt > input.txt
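As an illustration of the last command, here is a minimal sketch that fills in a template. The content of template.txt and the value 42 are made up; the files are written in a scratch directory.

```shell
workdir=$(mktemp -d)
# template.txt holds the placeholder XXX
printf 'sample = XXX\nrun XXX and XXX again\n' > "$workdir/template.txt"

values=42
# double quotes let the shell expand $values before sed sees the instruction;
# g replaces every XXX on a line, not only the first one
sed -e "s/XXX/$values/g" "$workdir/template.txt" > "$workdir/input.txt"

first=$(head -n 1 "$workdir/input.txt")
second=$(tail -n 1 "$workdir/input.txt")
echo "$first"
echo "$second"
rm -r "$workdir"
```

This should print sample = 42 followed by run 42 and 42 again. If the value can contain a / (a file path for instance), the instruction breaks; sed accepts another separator, as in "s|XXX|$values|g".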
- awk instructions are <condition> { what to do }
- Special conditions: BEGIN {} and END {}
- Special variables: NR (number of the current line) and NF (number of fields on the line)
- Operators >, <, ==, !=, etc...
- no {what to do} means {print $0} or print the line
awk '{print $1}' file
awk -F: '{print $1}' file
awk 'NR == 3' file
awk 'NF > 4 { print $4}' file
awk '$2 == 0 { print $1}' file
awk -f instructions.awk file
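To show BEGIN, END, NR and NF at work, here is a short sketch that runs awk on a small made-up table (table.txt, with a name, a count and a weight on each line) and stores the results in shell variables.

```shell
workdir=$(mktemp -d)
# a small table with made-up values: name, count, weight
printf 'a 0 1.5\nb 3 2.5\nc 0 4.0\n' > "$workdir/table.txt"

# count the lines whose second column is 0
zeros=$(awk '$2 == 0 { n++ } END { print n }' "$workdir/table.txt")

# BEGIN runs before the first line and END after the last one;
# in the END block NR holds the total number of lines
mean=$(awk 'BEGIN { sum = 0 } { sum += $3 } END { printf "%.2f", sum / NR }' "$workdir/table.txt")

# NF is the number of fields on the current line
nf=$(awk 'NR == 1 { print NF }' "$workdir/table.txt")
echo "zeros=$zeros mean=$mean nf=$nf"
rm -r "$workdir"
```

With this table it should print zeros=2 mean=2.67 nf=3.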
grep hello file
grep -i hello file
grep -v hello file
egrep 'hello|bye' file
tr '[a-z]' '[A-Z]' < input > output
tr -cd 'a-z\n' < input > output
- missing
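No cut example is given here, so the following is only a minimal sketch of how cut is commonly used. The line of text is a made-up entry in the style of the passwd file.

```shell
line='alice:x:1001:100:Alice Example:/home/alice:/bin/bash'   # made-up entry

user=$(echo "$line" | cut -d: -f1)        # -d sets the delimiter, -f picks field 1
ids=$(echo "$line" | cut -d: -f3,4)       # fields 3 and 4, still separated by :
start=$(echo "$line" | cut -c1-5)         # -c picks characters 1 to 5
echo "user=$user ids=$ids start=$start"
```

This should print user=alice ids=1001:100 start=alice.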
- [ba]sh syntax:
qty=23.45
blob=$(echo "$qty * 12.4" | bc)
echo $blob
- [t]csh syntax:
set qty = 23.45
set blob = `echo "$qty * 12.4" | bc`
echo $blob
- subtract one hour or 3600 seconds
let x=$(date --date='1/5/2022' +%s)
let x-=3600
then=$(date --date=@$x)
date --date='1/5/2022'
Wed Jan 5 00:00:00 EST 2022
echo $then
Tue Jan 4 23:00:00 EST 2022
- change the format with +
then=$(date +%y%m%d_%H%M --date=@$x)
echo $then
220104_2300
In the hands-on portion of the workshop you will learn how to:
- Run a set of commands: Scripts
- Simplify and avoid errors
- Test assumptions
- Generalize a script to be used for multiple files or executions: Arguments
- Ease hands-on time by repeating a command for every file: for loops
- Re-use arguments by manipulating the text
- Get parameters from another file
If you need a reminder about how to log into Hydra and how to change your password, check out our Intro to Hydra tutorial: https://github.com/SmithsonianWorkshops/Hydra-introduction/blob/master/hydra_intro.md