From ad5076149bb99ab4deae15e3a478c89e20bf6b00 Mon Sep 17 00:00:00 2001
From: pmitev
Date: Thu, 1 Sep 2022 08:29:16 +0200
Subject: [PATCH] Fixes after the last workshop

---
 docs/1.Simple_example.md              | 2 +-
 docs/3.Shell_we_awk.md                | 2 +-
 docs/6.One_line_programs.md           | 2 +-
 docs/Case_studies/manipulating_vcf.md | 2 ++
 4 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/docs/1.Simple_example.md b/docs/1.Simple_example.md
index 18aedd4f..05fff8e2 100644
--- a/docs/1.Simple_example.md
+++ b/docs/1.Simple_example.md
@@ -153,7 +153,7 @@ $ awk ' /pattern/ {action} ' file1 file2 ... fileN
 - remove the minus sign in the `%-8s` formatting to see the effect.
 - more string manipulations [exercises](Exercises/String_manipulation.md)
 
-More on format modifiers: [gawk documentation](https://https://www.gnu.org/software/gawk/manual/html_node/Format-Modifiers.html#Format-Modifiers)
+More on format modifiers: [gawk documentation](https://www.gnu.org/software/gawk/manual/html_node/Format-Modifiers.html#Format-Modifiers)
 
 !!! example "Files"
     * [coins.txt](data/coins.txt)
diff --git a/docs/3.Shell_we_awk.md b/docs/3.Shell_we_awk.md
index 36a7ff9c..af72f6ae 100644
--- a/docs/3.Shell_we_awk.md
+++ b/docs/3.Shell_we_awk.md
@@ -90,7 +90,7 @@ Let's use the output from the Gaussian code as an example (*something from my re
  2O,std. LJ params. for H2O, MC/QM\\-1,1\O,0,0.,0.,0.\H,0,-0.836605,-0.
 . . . ( more lines ) . . .
 ```
-The numbers that I am interested in are in bold. There are **56 such pairs** in the whole file. I need them tabulated in simple, two-column file that is easy to read, analyze and plot. Here I will not discuss other solutions. Instead, here is a possible awk solution:
+The numbers that I am interested in are in bold. There are **56 such pairs** in the whole [file](https://github.com/pmitev/to-awk-or-not/raw/master/docs/data/gaussian.out). I need them tabulated in simple, two-column file that is easy to read, analyze and plot. Here I will not discuss other solutions. Instead, here is a possible awk solution:
 
 ``` awk title="extract-gaussian.awk"
 #!/usr/bin/awk -f
diff --git a/docs/6.One_line_programs.md b/docs/6.One_line_programs.md
index dc534c15..a9a143cf 100644
--- a/docs/6.One_line_programs.md
+++ b/docs/6.One_line_programs.md
@@ -5,6 +5,6 @@ Many useful awk programs are as short as just a line or two. Here is a collectio
 Here I realized that it is better to post some links and just mention some of my favorites, perhaps.
 
 * [The best AWK one-liners](http://tuxgraphics.org/~guido/scripts/awk-one-liner.html)
-* [AWK one-liners](http://www.softpanorama.org/Tools/Awk/awk_one_liners.shtml) by Softpanorama
 * [awk one-liners](https://nixshell.wordpress.com/2009/04/01/awk-one-liners/) by *nix shell
+* [Handy One-line Scripts for AWK](https://www.pement.org/awk/awk1line.txt) compiled by Eric Pement
 
diff --git a/docs/Case_studies/manipulating_vcf.md b/docs/Case_studies/manipulating_vcf.md
index 623bdaa5..2cf155bb 100644
--- a/docs/Case_studies/manipulating_vcf.md
+++ b/docs/Case_studies/manipulating_vcf.md
@@ -117,6 +117,8 @@ Count and sort the different genomic features in chromosome 4 by number.
     }
     ```
 
+    ??? "_Solution_ proposed by Loïs Rancilhac - 2022.08.30"
+        `awk '/_SNP/ {SNP++; print $0 > "chr4_SNPs.vcf"} /_DEL/ {DEL++; print $0 > "chr4_DEL.vcf"; LENGTH=length($4)-length($5); print LENGTH > "Deletions_lengths.txt"} /_INS/ {INS++; print $0 > "chr4_INS.vcf"; LENGTH=length($5)-length($4); print LENGTH > "Insertions_lengths.txt"} END{print "SNPs: "SNP"\nInsertions: "INS"\nDeletions: "DEL}' chr4.vcf`
 
 #### *Follow-up task:*
 Print nucleotide substitution that these SNPs introduce sorted by number. Remember the coins...