An AWK implementation of 30-seconds-of-code.
Note: This is in no way affiliated with the original 30-seconds-of-code.
- choosing the right implementation
- basic I/O
- replacing grep
- replacing sed
- replacing head and tail
- performing math
- prettyprinting long awk programs
- troubleshooting awk programs
There are three modern implementations to consider:
- gawk (GNU AWK)
  - best feature set
  - wide availability
  - moderate performance
- mawk
  - best performance (example)
- busybox awk
  - smallest footprint
  - best availability (even on embedded linux)
- there are multiple AWK implementations
- stick to the modern implementations: gawk, mawk, or possibly busybox awk
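A quick way to see which of these implementations is installed might look like the sketch below. The helper name `pick_awk` and the preference order (gawk, then mawk, then busybox awk, then whatever `awk` resolves to) are assumptions made for illustration.

```shell
# pick_awk: print the first AWK implementation found on this system.
# The name and the preference order are illustrative assumptions.
pick_awk() {
    for candidate in gawk mawk; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    # busybox ships awk as an applet, so it is invoked as "busybox awk"
    if command -v busybox >/dev/null 2>&1 &&
       busybox awk 'BEGIN { exit 0 }' >/dev/null 2>&1; then
        echo "busybox awk"
        return 0
    fi
    # fall back to whatever "awk" resolves to (often a link to one of the above)
    if command -v awk >/dev/null 2>&1; then
        echo "awk"
        return 0
    fi
    return 1
}

AWK=$(pick_awk) && echo "using: $AWK"
```

The result is left unquoted when used as a command (`$AWK 'BEGIN { ... }'`) so that "busybox awk" is split into two words.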
AWK can read and write text streams very easily.
function awk_showcase() {
  echo -e "AWK is still useful\ntext-processing  technology!" | \
    awk 'BEGIN{wcnt=0;print "lineno/#words/3rd-word:individual words\n" > "/dev/stderr"}
              {printf("% 6d/% 6d/% 8s:%s\n",NR,NF,$3,$0);wcnt+=NF}
           END{print "\nSummary:", NR, "lines/records,", wcnt, "words/fields" > "/dev/stderr"}'
}
$ awk_showcase
lineno/#words/3rd-word:individual words
     1/     4/   still:AWK is still useful
     2/     2/        :text-processing  technology!
Summary: 2 lines/records, 6 words/fields
$ awk_showcase 2>/dev/null
     1/     4/   still:AWK is still useful
     2/     2/        :text-processing  technology!
- printing data can be done via the print and printf() built-in functions
- passing data to the stderr stream is done by redirecting to the awk special file name /dev/stderr
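The difference between the two output functions, and the split between stdout and stderr, can be sketched as follows. The function name `io_demo` and the sample input are made up for this illustration.

```shell
# io_demo: hypothetical helper contrasting print and printf()
io_demo() {
    echo "alpha beta" | awk '{
        # print joins its arguments with a space and appends a newline
        print $2, $1
        # printf() needs an explicit format string, including the newline
        printf("%-6s|%5s|\n", $1, $2)
        # diagnostics go to stderr and do not pollute the data stream
        print "processed " NR " line(s)" > "/dev/stderr"
    }'
}

io_demo 2>/dev/null
# beta alpha
# alpha | beta|
```

With stderr discarded only the two data lines remain, which is what makes the output safe to pipe into another command.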
# 0. text data generation
$ ps auxwww > /tmp/ps.log
# 1. grep command
$ grep ^avahi /tmp/ps.log
avahi      671  0.0  0.0  28108  2820 ?        Ss   11:45   0:00 avahi-daemon: running [linux.local]
avahi      688  0.0  0.0  27980   228 ?        S    11:45   0:00 avahi-daemon: chroot helper
# 2. mimicking grep command, keeping awk code as short as possible (using default/implicit actions)
$ awk '/^avahi/' /tmp/ps.log
avahi      671  0.0  0.0  28108  2820 ?        Ss   11:45   0:00 avahi-daemon: running [linux.local]
avahi      688  0.0  0.0  27980   228 ?        S    11:45   0:00 avahi-daemon: chroot helper
# 3. mimicking grep command, awk code as readable as possible
$ awk '/^avahi/ {print $0}' /tmp/ps.log
avahi      671  0.0  0.0  28108  2820 ?        Ss   11:45   0:00 avahi-daemon: running [linux.local]
avahi      688  0.0  0.0  27980   228 ?        S    11:45   0:00 avahi-daemon: chroot helper
# 1. let's have multiple `gvfs-udisks2-volume-monitor+` processes
$ grep -F "gvfs-udisks2-volume-monitor+" /tmp/ps.log
gdm       1629  0.0  0.1 413944  8428 tty1     Sl  11:46   0:00 /qbin/gvfs-udisks2-volume-monitor+
gdm       1630  0.0  0.1 413940  8420 tty1     Sl  11:46   0:00 /qbin/gvfs-udisks2-volume-monitor+
f_ii      2711  0.0  0.1 414288  8564 tty2     Sl  11:46   0:00 /qbin/gvfs-udisks2-volume-monitor+
# 2. we need to select the *first one* running under user `gdm` (silly & fragile solution relying on the `gdm` one coming first)
$ grep -F "gvfs-udisks2-volume-monitor+" -m 1 /tmp/ps.log
gdm       1629  0.0  0.1 413944  8428 tty1     Sl  11:46   0:00 /qbin/gvfs-udisks2-volume-monitor+
# 3. we need to select the one running under user gdm (more correct solution; [[:space:]] also matches a tab, which "\t" inside a bracket expression does not)
$ grep -F "gvfs-udisks2-volume-monitor+" /tmp/ps.log | grep -E -m 1 "^gdm[[:space:]]+"
gdm       1629  0.0  0.1 413944  8428 tty1     Sl  11:46   0:00 /qbin/gvfs-udisks2-volume-monitor+
# 4. naive implementation of the grep command 2. would be
$ awk 'index($0,"gvfs-udisks2-volume-monitor+") > 0 {print $0;exit(0);}' /tmp/ps.log
gdm       1629  0.0  0.1 413944  8428 tty1     Sl  11:46   0:00 /qbin/gvfs-udisks2-volume-monitor+
# 5. more correct implementation of the grep command 3. is
$ awk '$1 == "gdm" && index($0,"gvfs-udisks2-volume-monitor+") > 0 {print $0;exit(0);}' /tmp/ps.log
gdm       1629  0.0  0.1 413944  8428 tty1     Sl  11:46   0:00 /qbin/gvfs-udisks2-volume-monitor+
# we have multiple uptime files showing uptime at 13:00, 14:00, 15:00 and 16:00.
$ ls uptime.*
uptime.13:00  uptime.14:00  uptime.15:00  uptime.16:00
# let's show files where there were at least 10 active users logged in
#   using grep with extended regexp
$ grep -E ' [1-9][0-9]+ users' uptime.*
uptime.14:00: 12:04:11 up  2:19, 10 users,  load average: 0.28, 0.32, 0.27
uptime.15:00: 13:04:13 up  2:19, 11 users,  load average: 0.26, 0.31, 0.27
uptime.16:00: 14:04:15 up  2:19, 17 users,  load average: 0.26, 0.31, 0.27
# do the same with awk in a more optimal way (numeric comparison of the 4th field)
$ awk '$4 >= 10 {print FILENAME ":" $0}' uptime.*
uptime.14:00: 12:04:11 up  2:19, 10 users,  load average: 0.28, 0.32, 0.27
uptime.15:00: 13:04:13 up  2:19, 11 users,  load average: 0.26, 0.31, 0.27
uptime.16:00: 14:04:15 up  2:19, 17 users,  load average: 0.26, 0.31, 0.27
- awk is a text-processing Swiss Army knife and easily handles everything grep can do
- AWK has implicit/default actions which can make programs shorter but also much less readable
- every word in the input stream is accessible via the $i variable (the i-th word of the current line)
- an implicit conversion to a number takes place, so words can easily be compared to an integer or float
- the FILENAME variable holds the name of the file currently being read
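A few more common grep options can be mimicked in the same spirit. This is a minimal sketch: the function names and the sample data are invented for illustration, and only the basic behaviour of each option is reproduced. Note that a pattern passed with -v goes through awk's escape processing, so patterns containing backslashes may need extra care.

```shell
# hypothetical helpers; each reads stdin and takes an extended regexp as $1

# like grep -v: print the lines that do NOT match
awk_grep_v() { awk -v re="$1" '$0 !~ re'; }

# like grep -c: print the number of matching lines
awk_grep_c() { awk -v re="$1" '$0 ~ re { c++ } END { print c + 0 }'; }

# like grep -n: prefix every matching line with its line number
awk_grep_n() { awk -v re="$1" '$0 ~ re { print NR ":" $0 }'; }

printf 'root 1\navahi 671\navahi 688\n' | awk_grep_n '^avahi'
# 2:avahi 671
# 3:avahi 688
```

The `c + 0` in the counting helper forces a numeric result, so an input without any match prints 0 instead of an empty line.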
TODO
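As a minimal sketch of what replacing sed could look like (an assumption about where this section is heading, using only the standard `sub()` and `gsub()` functions), a few common sed one-liners map to awk as follows. The sample text and the function names are hypothetical.

```shell
# hypothetical helpers; each reads stdin

# like sed 's/foo/bar/': replace the first match on every line
awk_sed_first() { awk '{ sub(/foo/, "bar"); print }'; }

# like sed 's/foo/bar/g': replace every match on every line
awk_sed_all() { awk '{ gsub(/foo/, "bar"); print }'; }

# like sed -n 'Np': print only line number N (passed as $1)
awk_sed_line() { awk -v n="$1" 'NR == n'; }

# like sed '/foo/d': drop the lines that match
awk_sed_del() { awk '!/foo/'; }

echo "foo foo" | awk_sed_first
# bar foo
echo "foo foo" | awk_sed_all
# bar bar
```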
# 0. text data generation
$ ps auxwww > /tmp/ps.log
# 1. essential head
$ head -3 /tmp/ps.log 
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1 194320  8872 ?        Ss   11:44   0:02 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
root         2  0.0  0.0      0     0 ?        S    11:44   0:00 [kthreadd]
# 1.awk mimicking the head command
$ awk 'NR<=3 {print}' /tmp/ps.log
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1 194320  8872 ?        Ss   11:44   0:02 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
root         2  0.0  0.0      0     0 ?        S    11:44   0:00 [kthreadd]
# 2. basic tail
$ tail -n 2 /tmp/ps.log 
f_ii     10347  0.0  0.2 470984 21808 ?        S    12:42   0:00 file.so [kdeinit5] file local:...
f_ii     10354  0.0  0.0 154704  3840 pts/9    R+   12:43   0:00 ps auxwww
# 2.awk doing the same as tail in awk is much less straightforward
$ awk -v n=2 'BEGIN{arr[-1]=0}
>                  {arr[arr[-1]]=$0;arr[-1]++;}
>               END{for(i=arr[-1]-n;i<arr[-1];i++){print arr[i]}}' /tmp/ps.log 
f_ii     10347  0.0  0.2 470984 21808 ?        S    12:42   0:00 file.so [kdeinit5] file local:...
f_ii     10354  0.0  0.0 154704  3840 pts/9    R+   12:43   0:00 ps auxwww
- the NR variable contains the number of records processed so far (acting as the record/line number)
- passing an awk variable from the command line is possible via -v variable-name=variable-value
- associative arrays are fully supported
- the BEGIN{} rule is evaluated before the first line is read
- the END{} rule is evaluated after the last line is processed
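The tail example above keeps every line in memory. One possible alternative, sketched here under the assumption that n is at least 1, stores only the last n records in a ring buffer indexed by NR % n; it also copes with inputs shorter than n lines. The head variant stops reading as soon as it has printed enough. Both function names are illustrative.

```shell
# awk_head: print the first n lines, then stop reading the input
awk_head() {
    awk -v n="${1:-10}" 'NR > n { exit } { print }'
}

# awk_tail: keep only the last n records in memory (assumes n >= 1)
awk_tail() {
    awk -v n="${1:-10}" '
        { buf[NR % n] = $0 }                      # overwrite the oldest slot
        END {
            first = (NR > n) ? NR - n + 1 : 1     # oldest record still buffered
            for (i = first; i <= NR; i++)
                print buf[i % n]
        }'
}

printf 'a\nb\nc\nd\n' | awk_tail 2
# c
# d
```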
# fibo.awk
# actions before reading text stream
BEGIN{
  for(i=0; i<(ARGC > 1 ? ARGV[1] : 10); i++)
    print(fibo_recursive(i));
}
# local functions
function fibo_recursive(in_val) {
    return( in_val<=1 ? in_val : fibo_recursive(in_val-1) + fibo_recursive(in_val-2));
}
An execution results in the Fibonacci sequence:
$ awk -f ~/tmp/fibo.awk 10
0
1
1
2
3
5
8
13
21
34
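The recursive function recomputes the same values many times, so its run time grows quickly with the argument. A minimal iterative sketch that prints the same sequence is shown here; the function name `fibo_iterative` and the default of 10 values are assumptions chosen to mirror the example above.

```shell
# fibo_iterative: print the first <count> Fibonacci numbers (default 10)
fibo_iterative() {
    awk -v count="${1:-10}" 'BEGIN {
        a = 0; b = 1
        for (i = 0; i < count; i++) {
            print a
            t = a + b     # next value
            a = b         # shift the pair forward
            b = t
        }
    }'
}

fibo_iterative 5
# 0
# 1
# 1
# 2
# 3
```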
# wordstats.awk
# actions before reading text stream - initiate counters
BEGIN{
    wcnt = bllinecnt = 0;
    wmin = wmax = "";
}
# rule block executed for every line
{
    # count the words
    wcnt += NF;
    # count blank/empty lines
    if (NF == 0)
        bllinecnt++;
    # update maximum and minimum word count variables
    if( (NF>(wmax+0)) || (NR==1))
        wmax = NF;
    if( (NF<(wmin+0)) || (NR==1))
        wmin = NF;
}
# final text stream statistics
END{
    # guard against division by zero when the input is empty (NR == 0)
    printf("Total %d words on %d lines. (average/min/max %.2f/%s/%s words per line; %d blank lines)\n",
           wcnt, NR, (NR > 0 ? wcnt / NR : 0), wmin, wmax, bllinecnt);
}
The executions:
$ echo -e "A B C\nA B" | awk -f ~/tmp/wordstats.awk 
Total 5 words on 2 lines. (average/min/max 2.50/2/3 words per line; 0 blank lines)
$ ps auxwww | awk -f ~/tmp/wordstats.awk
Total 3458 words on 270 lines. (average/min/max 12.81/11/26 words per line; 0 blank lines)
$ curl -s 'https://www.gnu.org/licenses/old-licenses/lgpl-2.1.txt' | awk -f ~/tmp/wordstats.awk
Total 4381 words on 502 lines. (average/min/max 8.73/0/15 words per line; 75 blank lines)
- parsing command-line arguments is possible via C-like variables
  - count of arguments: ARGC
  - individual arguments: the ARGV array
- AWK allows functions to be declared either at the beginning or at the end of the awk source code
- the NF variable contains the number of words (awk fields) on a line (awk record)
- the ternary operator <cond> ? <when-true> : <when-false> is fully available
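How ARGC and ARGV behave can be checked with a small sketch like the following (the function name `show_args` is made up). ARGV[0] holds the name of the awk interpreter itself, so the loop starts at index 1.

```shell
# show_args: hypothetical helper listing the arguments awk receives
show_args() {
    awk 'BEGIN {
        # ARGC counts ARGV[0] as well, so the last argument is ARGV[ARGC-1]
        for (i = 1; i < ARGC; i++)
            printf("ARGV[%d]=%s\n", i, ARGV[i])
    }' "$@"
}

show_args 10 twenty
# ARGV[1]=10
# ARGV[2]=twenty
```

Because the program consists of a BEGIN rule only, awk never tries to open the arguments as input files.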
As a rule of thumb, an AWK program should be placed in a separate file once it becomes too long to be readable as a one-liner.
# 0. text data generation
$ ps auxwww > /tmp/ps.log
# 1. mimicking tail -n 2 
# 1.a as a hard-to-read one-liner
$ awk -v n=2 'BEGIN{arr[-1]=0}{arr[arr[-1]]=$0;arr[-1]++;}END{for(i=arr[-1]-n;i<arr[-1];i++){print arr[i]}}' /tmp/ps.log 
f_ii     10347  0.0  0.2 470984 21808 ?        S    12:42   0:00 file.so [kdeinit5] file local:...
f_ii     10354  0.0  0.0 154704  3840 pts/9    R+   12:43   0:00 ps auxwww
# 1.b as a one-liner split across several lines
$ awk -v n=2 'BEGIN{arr[-1]=0}
>                  {arr[arr[-1]]=$0;arr[-1]++;}
>               END{for(i=arr[-1]-n;i<arr[-1];i++){print arr[i]}}' /tmp/ps.log 
f_ii     10347  0.0  0.2 470984 21808 ?        S    12:42   0:00 file.so [kdeinit5] file local:...
f_ii     10354  0.0  0.0 154704  3840 pts/9    R+   12:43   0:00 ps auxwww
# 1.c oneliner prettyprinted by "GNU AWK profiler"
$ gawk -p -v n=2 'BEGIN{arr[-1]=0}{arr[arr[-1]]=$0;arr[-1]++;}END{for(i=arr[-1]-n;i<arr[-1];i++){print arr[i]}}' /tmp/ps.log 
f_ii     10347  0.0  0.2 470984 21808 ?        S    12:42   0:00 file.so [kdeinit5] file local:...
f_ii     10354  0.0  0.0 154704  3840 pts/9    R+   12:43   0:00 ps auxwww
$ cat awkprof.out 
        # gawk profile, created Tue Dec 25 14:48:02 2018
        # BEGIN block(s)
        BEGIN {
     1          arr[-1] = 0
        }
        # Rule(s)
   261  {
   261          arr[arr[-1]] = $0
   261          arr[-1]++
        }
        # END block(s)
        END {
     2          for (i = arr[-1] - n; i < arr[-1]; i++) {
     2                  print arr[i]
                }
        }
# 1.d program in a separate file
$ awk -v n=2 -f tail.awk /tmp/ps.log 
f_ii     10347  0.0  0.2 470984 21808 ?        S    12:42   0:00 file.so [kdeinit5] file local:...
f_ii     10354  0.0  0.0 154704  3840 pts/9    R+   12:43   0:00 ps auxwww
- an AWK program should be placed in a separate *.awk file to avoid poor readability
- prettyprinting of a badly formatted program is possible via the GNU AWK "profiling" feature gawk -p
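The content of tail.awk is not shown above. A possible version, written for readability rather than brevity, might look like this; it is a hypothetical sketch, not the original file, and it is written to a scratch directory only so that the example is self-contained.

```shell
# create a hypothetical tail.awk in a scratch directory
dir=$(mktemp -d)
cat > "$dir/tail.awk" <<'EOF'
# tail.awk - print the last n lines of the input
# usage: awk -v n=<count> -f tail.awk <file>

# remember every line under its line number
{
    lines[NR] = $0
}

# after the last line, print the final n lines (or all of them if there are fewer)
END {
    first = (NR > n) ? NR - n + 1 : 1
    for (i = first; i <= NR; i++)
        print lines[i]
}
EOF

printf 'one\ntwo\nthree\n' | awk -v n=2 -f "$dir/tail.awk"
# two
# three
```

Once the program lives in its own file, comments and indentation cost nothing, which is the main argument for moving anything longer than a line or two out of the shell command.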
To troubleshoot a malfunctioning AWK program, run it with GNU AWK in "profiling" mode (gawk -p); the generated execution summary, with a hit count for every rule and statement, helps to understand what is wrong.
- use the GNU AWK "profiling" feature gawk -p to uncover issues with your awk code
- if you do not have GNU AWK, you need to add custom debugging messages
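Custom debugging messages can be kept in the program and switched on only when needed. The sketch below assumes a hypothetical `dbg()` helper and a `debug` variable passed with -v; neither is an awk built-in, and `sum_column` is just a made-up program to trace.

```shell
# sum_column: add up the first field of every line; pass 1 as $1 to trace
sum_column() {
    awk -v debug="${1:-0}" '
        # print a trace message to stderr when debugging is enabled
        function dbg(msg) {
            if (debug + 0)
                print "DEBUG: " msg > "/dev/stderr"
        }
        {
            dbg("line " NR ": adding " $1)
            total += $1
        }
        END { print total + 0 }'
}

printf '1\n2\n' | sum_column 1
# DEBUG: line 1: adding 1    (on stderr)
# DEBUG: line 2: adding 2    (on stderr)
# 3
```

Sending the trace to stderr keeps the regular output unchanged, so the same command can still be used in a pipeline while debugging.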