Create a lens from bottom to top
This example shows how to create a lens from bottom to top. If your brain is wired so that you prefer a top-down approach, you may also read Creating-a-lens-step-by-step.
This example focuses on developing a lens for the Debian control file, which is used to build Debian packages.
Writing a lens is easier if you write the tests alongside the lens.
Let's start with the file layout by creating the lens file and the test file:
libdebctrl-augeas$ tree
.
|-- debctrl.aug
`-- tests
    `-- test_debctrl.aug
At every step, you will have to test the syntax of the lens:
$ augparse debctrl.aug
And test the lens:
$ augparse -I. tests/test_debctrl.aug
To run the tests continuously, you may want to run this command in a terminal:
watch -n 1 'augparse debctrl.aug && augparse -I. tests/test_debctrl.aug'
A typical control file has (e-mails are mangled to avoid spam):
Source: libconfig-model-perl
Section: perl
Uploaders: Dominique Dumont <dominique.dumont@xx.yyy>, gregor herrmann <gregoa@xxx.yy>
Priority: optional
Build-Depends: debhelper (>= 7.0.0), perl-modules (>= 5.10) | libmodule-build-perl
Build-Depends-Indep: perl (>= 5.8.8-12), libcarp-assert-more-perl,
 libconfig-tiny-perl, libexception-class-perl,
 libparse-recdescent-perl (>= 1.90.0),
 liblog-log4perl-perl (>= 1.11)
Maintainer: Debian Perl Group <pkg-perl-maintainers@xx>
Standards-Version: 3.8.2
Vcs-Svn: svn://svn.debian.org/svn/pkg-perl/trunk/libconfig-model-perl
Vcs-Browser: http://svn.debian.org/viewsvn/pkg-perl/trunk/libconfig-model-perl/
Homepage: http://search.cpan.org/dist/Config-Model/
# a comment

Package: libconfig-model-perl
Architecture: all
Depends: ${perl:Depends}, ${misc:Depends},libcarp-assert-more-perl,
 libexception-class-perl,
 libparse-recdescent-perl (>= 1.90.0),
 liblog-log4perl-perl (>= 1.11)
Suggests: libconfig-tiny-perl, libterm-readline-perl-perl | libterm-readline-gnu-perl
Description: describe and edit configuration data
 Config::Model enables project developers to provide an interactive
 configuration editor (graphical, curses based or plain terminal) to
 their users. For this they must:
 - describe the structure and constraint of the project's configuration
 - if the configuration data is not stored in INI file or in Perl data
   file, provide some code to read and write configuration from
   configuration files must be provided
 .
 With the elements above, Config::Model will generate interactive
 configuration editors (with integrated help and data validation).
 These editors can be graphical (with Config::Model::TkUI), curses based
 (with Config::Model::CursesUI) or based on ReadLine.
Note that:
- Blank lines separate the source package section from the binary package sections
- Description text must have a leading space
- Data fields can span several lines. In this case, each "continuation" line must begin with a space or a tab
As described there, let's start with an almost empty lens:
module Debctrl =
  autoload xfm

  (* lens must be used with AUG_ROOT set to debian package source directory *)
  let xfm = transform lns (incl "/debian/control")
and a test that is almost as empty
module Test_debctrl =
Let's start small by creating a lens and its test (or should I say the test and its lens) for the Source field. This is one keyword (Source) with a single value. Since we are working from bottom to top, we must test this lens snippet and not the whole debctrl lens. Otherwise, we would have to modify this unit test every time the lens becomes more complex (and the resulting tree deeper).
module Test_debctrl =

  (* the source line *)
  let source = "Source: libtest-distmanifest-perl\n"

  (* declare the lens to test and the resulting tree^Wtwig *)
  test Debctrl.source get source = { "Source" = "libtest-distmanifest-perl" }
and the lens:
module Debctrl =
  autoload xfm

  (* import eol from util.aug lens *)
  let eol = Util.eol

  (* keywords and values are separated by a colon *)
  let colon = del /:[ \t]*/ ": "

  (* note that no space is allowed between "Source" and ':' *)
  let source = [ key "Source" . colon . store /[^ \t]+/ . eol ]

  (* this lens will get more complex in time *)
  let lns = source

  (* lens must be used with AUG_ROOT set to debian package source directory *)
  let xfm = transform lns (incl "/debian/control")
Now let's test the lens:
$ augparse debctrl.aug
$ augparse -I. tests/test_debctrl.aug
No news is good news :-)
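While developing, two other forms of the test statement are handy: "= ?" prints the result instead of comparing it, and "= *" asserts that the lens fails. The following is a small sketch for the test file; the second input string is made up for illustration and relies on the rule that no space is allowed before the colon.

```augeas
module Test_debctrl =

  let source = "Source: libtest-distmanifest-perl\n"

  (* print the tree produced by the lens instead of comparing it *)
  test Debctrl.source get source = ?

  (* a space before the colon is not allowed, so this get is expected to fail *)
  test Debctrl.source get "Source : libtest-distmanifest-perl\n" = *
```

Once the printed tree looks right, it can be pasted back into the test in place of the "?".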
The Uploaders keyword may have several values spread over several lines. In this case, the line breaks between the values do not matter.
Here's the test:
let uploaders = "Uploaders: foo@bar, Dominique Dumont <dominique.dumont@xx.yyy>,\n"
              . " gregor herrmann <gregoa@xxx.yy>\n"

test Debctrl.uploaders get uploaders =
  { "Uploaders"
    { "1" = "foo@bar" }
    { "2" = "Dominique Dumont <dominique.dumont@xx.yyy>" }
    { "3" = "gregor herrmann <gregoa@xxx.yy>" } }
and the lens:
let del_opt_ws = del /[\t ]*/ ""

(* lens that defines a "continuation" line in a data field *)
let cont_line = del /\n[ \t]+/ "\n "
let comma = del /,[ \t]*/ ","

(* defines comma separated data which may span several lines *)
let sep_comma_with_nl = del_opt_ws . cont_line* . comma . cont_line*

(* 2 regexes to catch 2 types of email: Foo Bar <foo@bar> and plain foo@bar *)
let email = /([A-Za-z]+ )+<[^\n>]+>/ | /[^\n,\t ]+/

(* define a function with a keyword and an array of data separated by commas *)
let multi_line_array_entry (k:string) =
  [ key k . colon
    . [ seq k . store email ]
    . [ seq k . sep_comma_with_nl . store email ]*
    . eol ]

(* apply the above function to the Uploaders field *)
let uploaders = multi_line_array_entry "Uploaders"

(* now the lens can parse Uploaders and Source *)
let lns = ( uploaders | source )*
Parsing the Package field is similar to parsing the Source field. Instead of duplicating the Source lens, one can use an Augeas function to minimize code duplication.
The "meat" of the Source lens is moved to the simple_entry function:
let simple_entry (k:regexp) =
  let value = store /[^ \t\n]+/ in
    [ key k . colon . value . eol ]
So the Source lens is now
let source = simple_entry "Source"
Likewise, the Package lens is
let package = simple_entry "Package"
To factor this further, we can define the simple fields for the binary package:
let simple_bin_keyword = "Package" | "Architecture" | "Section" | "Priority" | "Essential" | "Homepage"
let simple_bin_entry = simple_entry simple_bin_keyword
Likewise, the source package parser becomes:
let simple_src_keyword = "Source" | "Section" | "Priority" | "Standards-Version" | "Homepage"
let simple_src_entry = simple_entry simple_src_keyword
Since the test calls the lens by its name, the test must be changed to:
test Debctrl.simple_src_entry get source = { "Source" = "libtest-distmanifest-perl" }
It looks like the "simple_entry" lens is working well. Let's apply it to the Maintainer field:
test (Debctrl.simple_entry Debctrl.simple_src_keyword ) get
  "Maintainer: Debian Perl Group <pkg-perl-maintainers@xxx>\n"
  = { "Maintainer" = "Debian Perl Group <pkg-perl-maintainers@xxx>" }
Rats, augparse complains:
Error encountered here (0 characters into string)
<|=|Maintainer: Debian Perl Grou>
Note: Contrary to other generated parsers like Perl's Parse::RecDescent, Augeas either succeeds completely or parses nothing. The simple entry lens fails because the Maintainer field contains white space.
OK, let's allow spaces and tabs in the simple entry value:
let value = store /[^\n]+/ (* /[^ \t][^\n]+/*) in ...
Rats, another error:
debctrl.aug:11.5-.26:exception: ambiguous concatenation
  'Package: A' can be split into
  'Package:|=| A' and 'Package: |=|A'
Even before parsing the test file, Augeas complains: any white space between the keyword ("Package:") and the value ("A") can be part of the left lens or the right lens.
Here, the solution is to specify that the value must begin with a non white space character:
let value = store /[^ \t][^\n]+/ in ...
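Note that the value will now contain any trailing white space. Contrary to Perl regexps, one cannot specify a non-greedy quantifier ("+?"): any white space is gobbled by the value up to the next lens. That next lens cannot be "eol", which also accepts white space before the newline and would make the concatenation ambiguous again; it can only be 'del "\n" "\n"', called hardeol here: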
let simple_entry (k:regexp) =
  let value = store /[^ \t][^\n]+/ in
    [ key k . colon . value . hardeol ]
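A quick way to see the effect of the greedy value regexp is a test with trailing blanks. This is a sketch for the test file, assuming that hardeol (del "\n" "\n") is defined before simple_entry in debctrl.aug; the input string is made up for illustration.

```augeas
(* the two blanks before the newline match [^\n]+ and so end up in the value *)
test (Debctrl.simple_entry "Maintainer") get
    "Maintainer: Debian Perl Group <pkg-perl-maintainers@xxx>  \n"
  = { "Maintainer" = "Debian Perl Group <pkg-perl-maintainers@xxx>  " }
```

If trailing blanks must be kept out of the tree, the value regexp itself has to exclude a final space or tab; that variation is not explored here.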
The dependency list is a tough nut to crack:
- dependencies are separated by commas
- newlines don't matter if they are followed by a space
- dependencies can be or'ed with "|"
- dependencies have an optional field to specify constraints regarding the package version (e.g. "(>= 5.8.8-12)")
- dependencies have an optional field to specify compatible (or not) arch (e.g. "[i386]" or "[!amd64]")
In the tree, a package dependency is made of:
- name (package name)
- version (optional)
- arch (optional)
The version is made of:
- relation (can be >=, <=, ...)
- number
The arch is made of:
- prefix ( "!" or "" )
- name
Let's start with the version lens:
let version_depends =
  [ label "version"
    . [ del_opt_ws . del /\(/ "(" . label "relation"
        . del_opt_ws . store /[<>=]+/ ]
    . [ del_opt_ws . label "number"
        . store /[a-zA-Z0-9_\.\-]+/ . del_opt_ws . del /\)/ ")" ] ]
and the test:
test Debctrl.version_depends get "( >= 5.8.8-12 )" =
  { "version" { "relation" = ">=" } { "number" = "5.8.8-12" } }
Parsing the arch won't be detailed as it is similar to version parsing.
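For readers who want to see it anyway, here is a minimal sketch of what such an arch lens could look like. It simply mirrors version_depends and is an assumption, not necessarily the lens from the final file; only the name arch_depends and the shape of the resulting tree are taken from this page.

```augeas
(* in debctrl.aug: hypothetical sketch modelled on version_depends *)
let arch_depends =
  [ label "arch"
    . [ del_opt_ws . del /\[/ "[" . del_opt_ws
        . label "prefix" . store /!?/ ]
    . [ label "name" . store /[a-zA-Z0-9_-]+/
        . del_opt_ws . del /\]/ "]" ] ]

(* in tests/test_debctrl.aug *)
test Debctrl.arch_depends get "[ !hurd-i386]" =
  { "arch" { "prefix" = "!" } { "name" = "hurd-i386" } }
```

The prefix is stored with /!?/ so that it can be either "!" or empty, matching the two possibilities listed above.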
Here's the package dependency lens:
let package_depends = [ label "name" . store /[a-zA-Z0-9_\-]+/ ]
  . ( version_depends | arch_depends ) *
This lens is tested with:
let p_depends_test = "perl ( >= 5.8.8-12 ) [ !hurd-i386]"

test Debctrl.package_depends get p_depends_test =
  { "name" = "perl" }
  { "version" { "relation" = ">=" } { "number" = "5.8.8-12" } }
  { "arch" { "prefix" = "!" } { "name" = "hurd-i386" } }
A complete package dependency with the "or" can be parsed with a simple sequence. Having several packages in this sequence implies an "or":
(* "counter" is required to reset the counter that lists the alternate dependencies *)
let dependency = [ counter "dep" . seq "dep" . package_depends ]
  . [ del_opt_ws . seq "dep" . del /\|/ "|" . del_opt_ws . package_depends ] *
And the test:
let dependency_test = "perl-modules (>= 5.10) | libmodule-build-perl"

test Debctrl.dependency get dependency_test =
  { "1" { "name" = "perl-modules" }
        { "version" { "relation" = ">=" } { "number" = "5.10" } } }
  { "2" { "name" = "libmodule-build-perl" } }
Now, let's tackle the dependency list. Since the multi line list is similar to the "Uploaders" list, one might want to reuse the "multi_line_array_entry" lens. But this one was dedicated to find list of emails.
Now it's time to add a second parameter to this lens so it can deal with both emails and package dependencies (of course, this has an impact on the lenses that currently use "multi_line_array_entry"):
(* k and v are the lens parameters *)
let multi_line_array_entry (k:string) (v:lens) =
  [ key k . colon
    . [ seq k . v ]
    . [ seq k . sep_comma_with_nl . v ]*
    . eol ]
The dependency lens is :
let dependency_list (field:string) = multi_line_array_entry field dependency
The field parameter is necessary because the lens will be used for several types of dependencies
And the test is :
test (Debctrl.dependency_list "Build-Depends-Indep") get
  "Build-Depends-Indep: perl (>= 5.8.8-12), libcarp-assert-more-perl,\n"
  . " libconfig-tiny-perl\n"
  =
  { "Build-Depends-Indep"
    { "1" { "1" { "name" = "perl" }
                { "version" { "relation" = ">=" } { "number" = "5.8.8-12" } } } }
    { "2" { "1" { "name" = "libcarp-assert-more-perl" } } }
    { "3" { "1" { "name" = "libconfig-tiny-perl" } } } }
Great, but the resulting tree is ugly and might confuse users who do not know the relation between the packages: the "and" and "or" relations are not explicit. As a developer, you will probably pay for such a mistake with tons of docs or FAQs ;-)
So the "and" and "or" must be explicit. Let's try this new lens:
let dependency = [ label "or" . package_depends ]
  . [ label "or" . del / *\| */ " | " . package_depends ] *

let dependency_list (field:regexp) =
  [ key field . colon
    . [ label "and" . dependency ]
    . [ label "and" . sep_comma_with_nl . dependency ]*
    . eol ]
And the resulting tree:
test (Debctrl.dependency_list "Build-Depends-Indep") get
  "Build-Depends-Indep: perl (>= 5.8.8-12) [ !hurd-i386], \n"
  . " perl-modules (>= 5.10) | libmodule-build-perl,\n"
  . " libconfig-tiny-perl\n"
  =
  { "Build-Depends-Indep"
    { "and"
      { "or"
        { "perl"
          { "version" { "relation" = ">=" } { "number" = "5.8.8-12" } }
          { "arch" { "prefix" = "!" } { "name" = "hurd-i386" } } } } }
    { "and"
      { "or"
        { "perl-modules"
          { "version" { "relation" = ">=" } { "number" = "5.10" } } } }
      { "or" { "libmodule-build-perl" } } }
    { "and" { "or" { "libconfig-tiny-perl" } } } }
The resulting tree may be a little lispish, but the relation is now explicit. The lens with the "name" label was also replaced by a lens which uses the package name as "key".
The main paragraphs of fields of the control file are separated by blank lines. The lens must return a list of paragraphs that will contain the fields.
Let's parse these paragraphs with this test:
let simple_bin_pkg = "Package: libconfig-model-perl\n"
                   . "Architecture: all\n"

(* second binary paragraph, needed to get the second binpkg node below *)
let simple_bin_pkg2 = "Package: libconfig-model2-perl\n"
                    . "Architecture: all\n"

let paragraph_simple = source . uploaders . "\n" . simple_bin_pkg . "\n" . simple_bin_pkg2

test Debctrl.lns get paragraph_simple =
  { "srcpkg"
    { "Source" = "libtest-distmanifest-perl" }
    { "Uploaders"
      { "1" = "foo@bar" }
      { "2" = "Dominique Dumont <dominique.dumont@xx.yyy>" }
      { "3" = "gregor herrmann <gregoa@xxx.yy>" } } }
  { "binpkg"
    { "Package" = "libconfig-model-perl" }
    { "Architecture" = "all" } }
  { "binpkg"
    { "Package" = "libconfig-model2-perl" }
    { "Architecture" = "all" } }
To separate the different paragraphs, we'll use a label lens so the source package paragraph is clearly separated in the tree from the binary package paragraphs.
Now, the top lens is modified to take the paragraphs into account:
let lns = [ label "srcpkg" . ( uploaders | simple_src_entry )* ]
  . [ label "binpkg" . del "\n" "\n" . simple_bin_entry* ]*
The Description field is a multi-line field that contains:
- a summary on the same line as the Description
- a multi-line description ended by a blank line or a line that does not begin with a space
let hardeol = del "\n" "\n"

(* store any line that begins with ' ' *)
let multi_line_entry (k:string) =
  let line = /[^\n]+/ in
    [ label k . del /^ / " " . store line . hardeol ] *

(* Description will contain 2 nodes: summary and text *)
let description =
  [ key "Description" . colon
    . [ label "summary" . store /[a-zA-Z][^\n]+/ . hardeol ]
    . [ multi_line_entry "text" ] ]
Here is the matching test:
let description = "Description: describe and edit configuration data\n"
  ." Config::Model enables [...] must:\n"
  ."  - if the configuration data\n"
  ." .\n"
  ." With the elements above, (...) on ReadLine.\n"

test Debctrl.description get description =
  { "Description"
    { "summary" = "describe and edit configuration data" }
    { "text" = "Config::Model enables [...] must:" }
    { "text" = " - if the configuration data" }
    { "text" = "." }
    { "text" = "With the elements above, (...) on ReadLine."} }
Note that the author could not find any way to replace the single '.' with a blank line. Suggestions are welcome.
Drum rolls, please ...
The source paragraph lens is:
let uploaders = multi_line_array_entry /Uploaders/ email

let simple_src_keyword = "Source" | "Section" | "Priority"
  | "Standards\-Version" | "Homepage" | /Vcs\-Svn/ | /Vcs\-Browser/
  | "Maintainer"

let depend_src_keywords = "Build-Depends" | "Build-Depends-Indep"

let src_entries = ( simple_entry simple_src_keyword
                  | uploaders
                  | dependency_list depend_src_keywords ) *
The binary paragraph lens is:
let simple_bin_keywords = "Package" | "Architecture"

let depend_bin_keywords = "Depends" | "Recommends" | "Suggests"

let bin_entries = ( simple_entry simple_bin_keywords
                  | dependency_list depend_bin_keywords ) +
                  . description
And the final lens is:
let lns = [ label "srcpkg" . src_entries ]
  . [ label "binpkg" . hardeol+ . bin_entries ]+
  . eol*

(* lens must be used with AUG_ROOT set to debian package source directory *)
let xfm = transform lns (incl "/control")
Having a successful parser does not mean that you are out of the woods. Some subtle bugs are often discovered during put tests.
Put tests are also useful to check how the file is written when starting from scratch.
First, test that your lens does not break the file:
test Debctrl.src_entries put uploaders after set "/Uploaders/1" "foo@bar" = uploaders
Then that additions are correctly handled:
test Debctrl.src_entries put uploaders after
  set "/Uploaders/4" "baz@bar"
  = "Uploaders: foo@bar, Dominique Dumont <dominique.dumont@xx.yyy>,\n"
  . " gregor herrmann <gregoa@xxx.yy>,\n"
  . " baz@bar\n"
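Removals deserve the same attention as additions. The sketch below is not part of the original test file; it removes the last uploader and uses "= ?" so that augparse prints the resulting text rather than comparing it with a guessed string.

```augeas
(* removing a node should also remove the comma and the continuation
   line that were parsed together with it *)
test Debctrl.src_entries put uploaders after
  rm "/Uploaders/3"
  = ?
```

Once the printed output has been checked by eye, the "?" can be replaced by the expected string to turn this into a regression test.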
And last but not least, test from a minimal file:
test Debctrl.lns put (source."\nPackage: test\nDescription: foobar\n")
  after
  set "/srcpkg/Uploaders/1" "foo@bar" ;
  set "/srcpkg/Uploaders/2" "Dominique Dumont <dominique.dumont@xx.yyy>" ;
  set "/srcpkg/Uploaders/3" "gregor herrmann <gregoa@xxx.yy>" ;
  set "/srcpkg/Build-Depends-Indep/and[1]/or/perl/version/relation" ">=" ;
  set "/srcpkg/Build-Depends-Indep/and[1]/or/perl/version/number" "5.8.8-12" ;
  set "/srcpkg/Build-Depends-Indep/and[1]/or/perl/arch/prefix" "!" ;
  set "/srcpkg/Build-Depends-Indep/and[1]/or/perl/arch/name" "hurd-i386" ;
  set "/srcpkg/Build-Depends-Indep/and[2]/or[1]/perl-modules/version/relation" ">=" ;
  set "/srcpkg/Build-Depends-Indep/and[2]/or[1]/perl-modules/version/number" "5.10" ;
  set "/srcpkg/Build-Depends-Indep/and[2]/or[2]/libmodule-build-perl" "";
  set "/srcpkg/Build-Depends-Indep/and[3]/or/libcarp-assert-more-perl" "" ;
  set "/srcpkg/Build-Depends-Indep/and[4]/or/libconfig-tiny-perl" "" ;
  set "/binpkg[1]/Package" "libconfig-model-perl" ;
  (* must remove description because set cannot insert Architecture before Description *)
  rm "/binpkg[1]/Description" ;
  set "/binpkg/Architecture" "all" ;
  set "/binpkg[1]/Description/summary" "dummy1" ;
  set "/binpkg[1]/Description/text" "dummy text 1" ;
  set "/binpkg[2]/Package" "libconfig-model2-perl" ;
  set "/binpkg[2]/Architecture" "all" ;
  set "/binpkg[2]/Description/summary" "dummy2" ;
  set "/binpkg[2]/Description/text" "dummy text 2"
  =
"Source: libtest-distmanifest-perl
Uploaders: foo@bar,
 Dominique Dumont <dominique.dumont@xx.yyy>,
 gregor herrmann <gregoa@xxx.yy>
Build-Depends-Indep: perl ( >= 5.8.8-12 ) [ !hurd-i386 ],
 perl-modules ( >= 5.10 ) | libmodule-build-perl,
 libcarp-assert-more-perl,
 libconfig-tiny-perl

Package: libconfig-model-perl
Architecture: all
Description: dummy1
 dummy text 1

Package: libconfig-model2-perl
Architecture: all
Description: dummy2
 dummy text 2
"
This lens was developed for Google Summer of Code 2009 by the project's mentor. This was done to show that an alternative solution based on Augeas was possible.
To kill two birds with one stone, this wiki page was created to show how to create a lens from bottom to top, so that everyone can benefit from the explanations I would have to give to my student ;-)
Anyway, I hope that the test strategy developed here will be useful.
Here are the complete lens and test from SVN