diff --git a/pom.xml b/pom.xml
index 56474753..d66f0547 100644
--- a/pom.xml
+++ b/pom.xml
@@ -81,6 +81,7 @@
truetemplates/**
+ data/**
@@ -88,6 +89,7 @@
falsetemplates/**
+ data/**
diff --git a/src/main/xar-resources/data/author-reference/author-reference.xml b/src/main/xar-resources/data/author-reference/author-reference.xml
index d4aef598..6847a483 100644
--- a/src/main/xar-resources/data/author-reference/author-reference.xml
+++ b/src/main/xar-resources/data/author-reference/author-reference.xml
@@ -7,6 +7,7 @@
January 2018authoring
+ exist
diff --git a/src/main/xar-resources/data/contentextraction/listings/listing-3.txt b/src/main/xar-resources/data/contentextraction/listings/listing-3.txt
index 12fc979f..8d70e791 100644
--- a/src/main/xar-resources/data/contentextraction/listings/listing-3.txt
+++ b/src/main/xar-resources/data/contentextraction/listings/listing-3.txt
@@ -1,3 +1,3 @@
content:get-metadata($binary as xs:base64Binary) as document-node()
content:get-metadata-and-content($binary as xs:base64Binary) as document-node()
-content:stream-content($binary as xs:base64Binary, $paths as xs:string*, $callback as function, $namespaces as element()?, $userData as item()*) as empty-sequence()
+content:stream-content($binary as xs:base64Binary, $paths as xs:string*, $callback as function, $namespaces as element()?, $userData as item()*) as empty()
\ No newline at end of file
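For orientation, the two simpler functions can be called like this. This is a hedged sketch: the module namespace follows the convention used in eXist-db's contentextraction documentation, and the resource path is hypothetical.

```xquery
xquery version "3.1";

(: The namespace URI is the one documented for the contentextraction
   module; /db/apps/demo/sample.pdf is a made-up example path. :)
import module namespace content = "http://exist-db.org/xquery/contentextraction";

let $binary := util:binary-doc("/db/apps/demo/sample.pdf")
return content:get-metadata($binary)
```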
diff --git a/src/main/xar-resources/data/db4-versions/contentextraction-db4.xml b/src/main/xar-resources/data/db4-versions/contentextraction-db4.xml
index 55057f30..0ddc027a 100644
--- a/src/main/xar-resources/data/db4-versions/contentextraction-db4.xml
+++ b/src/main/xar-resources/data/db4-versions/contentextraction-db4.xml
@@ -35,7 +35,7 @@ include.feature.contentextraction = true
The module provides three functions:content:get-metadata($binary as xs:base64Binary) as document-node()
content:get-metadata-and-content($binary as xs:base64Binary) as document-node()
-content:stream-content($binary as xs:base64Binary, $paths as xs:string*, $callback as function, $namespaces as element()?, $userData as item()*) as empty-sequence()
+content:stream-content($binary as xs:base64Binary, $paths as xs:string*, $callback as function, $namespaces as element()?, $userData as item()*) as empty()
The first two functions need little explanation: get-metadata just returns some
metadata extracted from the resource, while get-metadata-and-content will also
provide the text body of the resource—if there is any. The third function is a
diff --git a/src/main/xar-resources/data/db4-versions/documentation-db4.xml b/src/main/xar-resources/data/db4-versions/documentation-db4.xml
index b5014ade..69a64756 100644
--- a/src/main/xar-resources/data/db4-versions/documentation-db4.xml
+++ b/src/main/xar-resources/data/db4-versions/documentation-db4.xml
@@ -64,7 +64,7 @@
All Documentation
- Besides these articles, you can search eXist-db's XQuery function module library.
+ Besides these articles, you can search eXist-db's XQuery function module library.Getting Started
@@ -464,4 +464,4 @@
-
\ No newline at end of file
+
diff --git a/src/main/xar-resources/data/db4-versions/extensions-db4.xml b/src/main/xar-resources/data/db4-versions/extensions-db4.xml
index c13025b7..bb14554c 100644
--- a/src/main/xar-resources/data/db4-versions/extensions-db4.xml
+++ b/src/main/xar-resources/data/db4-versions/extensions-db4.xml
@@ -69,7 +69,7 @@
eXist-db must also be told which modules to load, this is done in
conf.xml and the Class name and Namespace for each module is listed
below. Note – eXist-db will require a restart to load any new modules added. Once a Module is configured
- and loaded eXist-db will display the module and its function definitions as part of the function library page or through
+ and loaded eXist-db will display the module and its function definitions as part of the function library page or through
util:describe-function().
@@ -310,4 +310,4 @@
Namespace: http://xmlcalabash.com
-
\ No newline at end of file
+
diff --git a/src/main/xar-resources/data/db4-versions/incompatibilities-db4.xml b/src/main/xar-resources/data/db4-versions/incompatibilities-db4.xml
index 9d95d074..569b1a80 100644
--- a/src/main/xar-resources/data/db4-versions/incompatibilities-db4.xml
+++ b/src/main/xar-resources/data/db4-versions/incompatibilities-db4.xml
@@ -40,7 +40,7 @@
The XQuery engine has been updated to support the changed syntax for maps in XQuery 3.1. The query parser will still accept the
old syntax for map constructors though (map { x:= "y"} instead of map { x: "y" } in XQuery 3.1), so old
code should run without modifications. All map module functions from XQuery 3.1 are
- available.
+ available.
The signatures for some higher-order utility functions like fn:filter, fn:fold-left and fn:fold-right have changed as well. Please review your
use of those functions. Also, fn:map is now called fn:for-each, though the old name is still accepted.The bundled Lucene has been upgraded from 3.6.1
@@ -192,4 +192,4 @@ dbutil:scan-collections(xs:anyURI("/db"), function($collection) {
-
\ No newline at end of file
+
diff --git a/src/main/xar-resources/data/db4-versions/validation-db4.xml b/src/main/xar-resources/data/db4-versions/validation-db4.xml
index 47e00446..110241ec 100644
--- a/src/main/xar-resources/data/db4-versions/validation-db4.xml
+++ b/src/main/xar-resources/data/db4-versions/validation-db4.xml
@@ -142,7 +142,7 @@
Jing
- Each of these options are discussed in the following sections. Consult the XQuery Function Documentation for detailed functions
+ Each of these options is discussed in the following sections. Consult the XQuery Function Documentation for detailed function
descriptions.JAXP
@@ -482,4 +482,4 @@
-
\ No newline at end of file
+
diff --git a/src/main/xar-resources/data/db4-versions/xquery-db4.xml b/src/main/xar-resources/data/db4-versions/xquery-db4.xml
index ef6e6b2d..2d1e83c9 100644
--- a/src/main/xar-resources/data/db4-versions/xquery-db4.xml
+++ b/src/main/xar-resources/data/db4-versions/xquery-db4.xml
@@ -130,7 +130,7 @@
currently not support the schema import and
schema validation features
defined as optional
- in the XQuery specification. eXist-db provides extension functions
+ in the XQuery specification. eXist-db provides extension functions
to perform XML validation. The database does not
store type information along with the nodes. It therefore cannot know
the typed value of a node and has to assume
diff --git a/src/main/xar-resources/data/development-starter/development-starter.xml b/src/main/xar-resources/data/development-starter/development-starter.xml
index 6c3b0520..585b31c7 100644
--- a/src/main/xar-resources/data/development-starter/development-starter.xml
+++ b/src/main/xar-resources/data/development-starter/development-starter.xml
@@ -1,360 +1,372 @@
-
- Getting Started with Web Application Development
- October 2012
-
- TBD
-
-
+ schematypens="http://purl.oclc.org/dsdl/schematron"?>
+
+ Getting Started with Web Application Development
+ 1Q18
+
+ getting-started
+ application-development
+
+
-
+
-
- Introduction
+ eXist-db is much more than just an XML database. It is also an excellent platform for the development of rich web applications, based on XML
+ and related technologies (XQuery, XForms, XSLT, XHTML...). This article describes one of the possible approaches and demonstrates how to quickly
+ prototype an application, using the following key components:
+
- Being much more than just an XML database, eXist-db provides a complete platform for the development of rich web applications based on XML
- and related technologies (XQuery, XForms, XSLT, XHTML...).
- Key components of this platform are:
-
-
- a standardized packaging format for modular applications, allowing deployment into any eXist-db instance
-
-
- a set of tools to create application packages, integrated into our XQuery IDE, eXide
-
-
- an HTML templating framework for a clean separation of HTML page content and application
- logic
-
-
- a tight integration with XForms for fast form development
-
-
- a clean approach for the deployment of RESTful services based on XQuery code annotations (RESTXQ)
-
-
- This tutorial will demonstrate how to quickly prototype an application based on above components. Certainly there are many other ways to
- integrate eXist-db into your own application. This guide describes only one of the many possible approaches. However, we have found that
- taking the first steps is the most difficult part for new users, so we tried to make it as simple as possible.
-
+
+
+ A standardized packaging format for modular applications, allowing deployment into any eXist-db instance.
+
+
+ A set of tools to create application packages, integrated into our XQuery IDE, eXide.
+
+
+ An HTML templating framework for a clean separation of HTML page content and application
+ logic.
+
+
+ A tight integration with XForms for fast form development.
+
+
+ A clean approach for the deployment of RESTful services based on XQuery code annotations (RESTXQ).
+
+
-
+
+ Those working on data sets in TEI may also want to look at TEI Publisher. It includes an
+ application generator tailored to digital editions. The created app packages include all the basic functionality needed for browsing the
+ edition, searching it, or producing PDFs and ePUBs.
+
-
- The Packaging
+
- eXist-db builds on the concept of self-contained, modular applications which can be deployed into any database instance using a
- standardized packaging format. Applications live inside the database, so application code, HTML views, associated services and data all reside
- in the same place - though maybe in different root collections. This makes it easy to e.g. backup an application into the file system, pass it
- to someone else for installation in his own db, or even publish it to eXist-db's public repository. The documentation, the dashboard, eXide
- and the eXist XQuery Features Demo are examples of application packages.
- The packaging format is compliant with the EXPath packaging proposal, though it extends it considerably. For distribution, an application
- is packed into a .xar archive, which is basically a ZIP archive. The .xar archive contains all the application code required by the
- application, and optionally sample or full data, along with a set of descriptors. The descriptors describe the application and control the
- installation process. As an absolute minimum, an application package must contain two descriptor files:
- expath-pkg.xml and repo.xml. You can read more about those files in the package repository documentation, but knowledge about these is not required for the following sections, since eXide will create the
- proper descriptors for you automatically.
-
+
+ Packaging
-
+ eXist-db builds on the concept of self-contained, modular applications using a standardized packaging format. Applications live inside the
+ database, so application code, HTML views, associated services and data all reside in the same place. This allows packaging an application and
+ passing it to someone else for installation in another database. It might even be published to eXist-db's public repository. The packaging format
+ is compliant with the EXPath packaging proposal, though it extends
+ it considerably.
-
- Starting a New Application
+ For distribution, an application is packed into a .xar archive, which is basically a ZIP archive. The .xar archive
+ contains all the application code and optionally data.
+ The documentation, the dashboard, eXide and the eXist XQuery Features Demo are all examples of these application packages.
- To start a new application, open eXide by clicking the link or by going via the dashboard or the
- system tray menu of eXist-db.
-
-
- From the main menu, select Application/New Application. The Deployment Editor
- dialog should pop up. If you are not logged in as an admin user yet, you'll first be required to do so.
-
-
- Fill out the form by choosing a template, a target collection, a name, an abbreviation and a title for the application. All other form
- fields and pages are optional, so you can ignore them for now.
-
- Deployment Editor Dialog
-
-
-
-
-
-
- The important fields are:
-
-
- Template
-
- The template used to generate the new application. Right now three templates are available: the "eXist-db Design" template is
- based on the standard eXist-db page layout and design. The "Plain" template creates a basic page layout without eXist-db specific
- stuff. Both templates use the Bootstrap CSS library for styling and the HTML templating XQuery module to achieve a clean separation
- of concerns. The "Empty Package" creates an empty package with just the descriptors in it.
-
-
-
- Type of the package
-
- The main distinction between "library" and "application" is: a "library" does not have an HTML view and will thus not appear
- with a clickable icon in the dashboard. Selecting "library" here does only make sense in combination with the "Empty Package"
- template.
-
-
-
- Target collection
-
- This is the collection where your app will be installed by default. Please note that this can be changed by the package manager
- during install. It is just a recommendation, not a requirement.
-
-
-
- Name
-
- A unique identifier for the application. The EXPath proposal recommends to use a URI here to avoid name collisions so we have
- made this is requirement.
-
-
-
- Abbreviation
-
- A short abbreviation for the application. Among other things, it will be used as the file name for the final .xar package and
- for the name of the collection into which the application installs. It is thus best to choose a simple abbreviation without spaces
- or punctuation characters.
-
-
-
- Title
-
- A short description of the application, which will be displayed to the user, e.g. in the dashboard.
-
-
-
-
-
- Click on Done when you have completed the form. eXide will now generate a collection hierarchy for the
- application based on the template you had selected. Once this has completed, the Open Document dialog will pop
- up to indicate that you can start editing files.
- In the following, we assume that the app has been called "Tutorial" and its abbreviation is "tutorial".
-
- Open Document Dialog after generating application
-
-
-
-
-
-
-
-
-
- Run Dialog
-
-
-
-
-
-
- To test if the application has been generated properly, select the index.html page of the new app in the open
- dialog and load it into eXide. With index.html open, select Application/Run
- App from the menu. A small dialog box will pop up, showing you a link to the application.
- Click on the link to run the application in a separate browser tab.
-
- The Default Start Page of the Application
-
-
-
-
-
-
- All the items in the Application menu apply to the active app, which is the application to which the file
- currently open in the editor belongs. You can check which app is active by looking at the "Current app:" status label at the top right of
- the eXide window.
-
-
-
-
+ The packaging format includes descriptor files. These describe the application and control the installation process. As an absolute minimum,
+ an application package must contain two descriptor files: expath-pkg.xml and
+ repo.xml. You can read more about these files in the package repository documentation.
+ Detailed knowledge about these files is not required for the following sections: eXide will create the proper basic descriptors for you
+ automatically.
+
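For reference, a minimal expath-pkg.xml might look like the following. This is a sketch following the EXPath packaging proposal; the name, abbrev and version values are hypothetical and stand in for the "tutorial" app used later in this guide (eXide generates the real file for you).

```xml
<package xmlns="http://expath.org/ns/pkg"
         name="http://example.org/apps/tutorial"
         abbrev="tutorial"
         version="0.1"
         spec="1.0">
    <title>Tutorial</title>
</package>
```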
-
- Understanding the Default Application Structure
+
- As you can see, eXide has created an application skeleton for you which works out of the box. All resources of the application reside
- below the target collection (/db/tutorial).
- The generated collection hierarchy follows a certain structure, which is usually the same for all apps which ship with eXist-db. The most
- important collections and files are described below:
-
-
- /modules/
-
- Contains XQuery modules. Most of the actual application code should go here.
-
-
-
- /resources/
-
- Secondary resources like CSS files, images or JavaScript.
-
-
-
- /templates/
+
+ Starting a New Application
+
+ To start a new application, open eXide by clicking the link or by going via the dashboard or
+ the system tray menu of eXist-db.
+
+
+ From the main menu, select Application/New Application. The Deployment Editor dialog should pop
+ up. (If you are not logged in as an admin user yet, you'll first be asked to do so.)
+
+
+ Fill out the form by choosing a template, a target collection, a name, an abbreviation and a title for the application (see below). All
+ other form fields and pages are optional, so you can ignore them for now.
+
+ Deployment Editor Dialog
+
+
+
+
+
+
+ The important fields are:
+
+
+ Template
- Page templates containing all the repeating parts of the site's HTML layout, i.e. all the stuff which applies to every HTML view of
- the application.
+ The template used to generate the new application. Right now three templates are available:
+
+
+ The "eXist-db Design" template is based on the standard eXist-db page layout and design.
+
+
+ The "Plain" template creates a basic page layout without eXist-db specific stuff.
+
+
+ The "Empty Package" creates an empty package with just the descriptors in it.
+
+
+ Both the "eXist-db Design" and "Plain" templates use the Bootstrap CSS library for styling and the HTML templating XQuery module
+ to achieve a clean separation of concerns. More about this later.
-
-
- collection.xconf
+
+
+ Type of the package
- A template file for the index configuration that should apply to this application. This file will be copied into the correct system
- collection when the application is installed, thereby automatically indexing any data that is installed.
+ The main distinction between "library" and "application" is: a "library" does not have an HTML view and will therefore not show a
+ clickable icon in the dashboard. If you want to create a library, use the "Empty Package" template.
-
-
- controller.xql
+
+
+ Target collection
- The URL rewriting controller which handles the URL space of the application. You will rarely need to change this for simple
- applications.
+ This is the collection where your app will be installed by default. Please note that this can be changed by the package manager
+ during install. It is a recommendation, not a requirement.
-
-
- expath-pkg.xml and repo.xml
+
+
+ Name
- These are the package descriptor files for the application that contain the information you entered via the Deployment Editor. You
- don't need to edit these files directly. Instead, open the Deployment Editor to change any of the descriptor properties.
+ A unique identifier for the application. The EXPath proposal recommends using a URI here to avoid name collisions, so we have made
+ this a requirement.
-
-
- index.html
+
+
+ Abbreviation
- The default start page of the application.
+ A short abbreviation for the application. Among other things, it will be used as the file name of the final .xar
+ package and for the name of the collection into which the application installs. Choose a simple abbreviation without spaces or
+ punctuation characters.
-
-
- pre-install.xql
+
+
+ Title
- An XQuery script which will be run by the package manager before installing the app. By default, the script
- only ensures that the index configurations in collection.xconf are properly copied into the corresponding system collection before the
- app is installed.
- In addition to pre-install.xql, you may also define a post-install.xql script
- via the Deployment Editor. As the name says, this script will run after the app has been deployed into the database and is most often
- used to copy resources or run initialization tasks required by the app.
+ A short description of the application, which will be displayed to the user in the dashboard.
-
-
- You are not required to keep this structure. Feel free to restructure the app as you like it and remove some of its parts. However, you
- have to preserve the two descriptor files expath-pkg.xml and repo.xml.
-
-
-
-
-
- The HTML Templating Framework
+
+
+
+
+ Click on Done when you have completed the form. eXide will now generate a collection hierarchy for the
+ application based on the template selected. Once completed, the Open Document dialog will pop up to indicate that
+ you can start editing files.
+ In the following, we assume that the app has been called "Tutorial" and its abbreviation is "tutorial".
+
+ Open Document Dialog after generating application
+
+
+
+
+
+
+
+
- The generated application code uses the HTML Templating Framework to connect HTML views with the
- application logic. The goal of the HTML templating framework in eXist-db is a clean separation of concerns. Generating web pages directly in
- XQuery is quick and dirty, but this makes maintenance difficult and it is usually bad for code sharing and for team work. If you look at the
- index.html page, you'll see it is just an HTML div defining the actual content body. The rest of the page is dragged in
- from the page template residing in templates/page.html.
- The controller.xql is configured to call the HTML templating for every URL ending with .html. The
- processing flow for an arbitrary .html file is shown below:
-
- Processing Flow
-
+ To test if the application has been generated properly, select the index.html page of the new app in the open dialog
+ and load it into eXide. With index.html open, select Application, Run App from the menu. A
+ small dialog box will pop up, showing you a link to the application.
+ Click on the link to run the application in a separate browser tab.
+
+ Run Dialog
+
-
+
-
-
- The input for the templating is always a plain HTML file. The module scans the HTML view for elements with class attributes, following a
- simple convention. It tries to translate the class attributes into XQuery function calls. By using class attributes, the HTML remains
- sufficiently clean and does not get messed up with application code. A web designer could take the HTML files and work on them without being
- bothered by the extra class names.
- If you look at index.html, the class attribute on the outer div contains a call to a templating function:
- <div class="templates:surround?with=templates/page.html&at=content">
-
- templates:surround is one of the default templating functions provided by the module. It loads
- templates/page.html and inserts the current div from index.html into the element with the id
- "content" in templates/page.html. A detailed description of templates:surround can be found in the HTML
- templating module documentation.
- In the generated application template, you can add your own templating functions to the XQuery module
- modules/app.xql, which is included by default (you can also add your own modules though: just import them in
- modules/view.xql).
-
+
+
+
+ The Default Start Page of the Application
+
+
+
+
+
+
+ All the items in the Application menu apply to the active app (the application to which the file currently open in
+ the editor belongs). You can check which app is active by looking at the Current app: status label at the top
+ right of the eXide window.
+
+
+
-
+
-
- Example: "Hello World!"
+
+ The Default Application Structure
- For illustration, let's implement the traditional "Hello World!" example:
-
-
- Create a new HTML view, hello.html, in eXide and add the following content. To create the file, choose
- File / New from the menu. Make sure you set the file type to HTML
- (see the drop down box at the top right in eXide).
-
- This creates a simple form and a paragraph which is connected to a template function, app:helloworld, through its
- class attribute.
- Save the HTML view to the root collection of your application, e.g. /db/apps/tutorial/hello.html.
-
-
- Open modules/app.xql and add an XQuery function matching the app:helloworld template
- call:
-
- A template function is a normal XQuery function known in the context of the calling XQuery (modules/view.xql),
- which takes at least two required parameters: $node and $model, though additional parameters are allowed (see below). $node is the HTML
- element currently being processed - in this case a p element. $model is an XQuery map containing application data. We
- can ignore both parameters for this simple example, but they must be present or the function won't be recognized by the templating module.
- Please refer to the HTML templating documentation to read more about those parameters.
- The third parameter, $name, is injected automatically by the templating framework. For now it is sufficient to know that the
- templating library will try to make a best guess about how to fill in any additional parameters. In this case, an HTTP request parameter
- with the name "name" will be passed in when the form is submitted. The parameter name matches the name of the variable, so the templating
- framework will try to use it and the function parameter will be set to the value of the request parameter.
-
-
- Open hello.html in the web browser using the base URL of your app, e.g.:
- http://localhost:8080/exist/apps/tutorial/hello.html
- Fill out the box with a name and press return.
-
-
- The templating framework has many more features, so you may want to head over to its documentation to
- read more about it.
-
+ eXide has created an application skeleton for you which works out of the box. All resources of the application reside below the target
+ collection (/db/tutorial).
+ The generated collection hierarchy follows a certain structure, which is usually the same for all apps which ship with eXist-db. The most
+ important collections and files are described below:
+
+
+ /modules/
+
+ Contains XQuery modules. The actual application code should go here.
+
+
+
+ /resources/
+
+ Resources like CSS files, images or JavaScript.
+
+
+
+ /templates/
+
+ Page templates containing all the repeating parts of the site's HTML layout.
+
+
+
+ collection.xconf
+
+ A template file for the index configuration for this application. This file will be copied into the correct system collection when the
+ application is installed. This causes automatic indexing of installed data.
+
+
+
+ controller.xql
+
+ The URL rewriting controller, which handles the URL space of the application. You will rarely need to change this for simple
+ applications.
+
+
+
+ expath-pkg.xml and repo.xml
+
+ Package descriptor files for the application. These contain the information you entered via the Deployment Editor. You don't need to
+ edit these files directly. Instead, open the Deployment Editor to change any of the descriptor properties.
+
+
+
+ index.html
+
+ The default start page of the application.
+
+
+
+ pre-install.xql
+
+ An XQuery script run by the package manager before installing the app. By default, the script only ensures that the
+ index configuration (in collection.xconf) is properly copied to the corresponding system collection.
+ In addition to pre-install.xql, you may also define a post-install.xql script via the Deployment
+ Editor. As the name says, this script will run after the app is deployed into the database. It is most often used to copy resources or run
+ initialization tasks.
+
+
+
+ You are not required to keep this structure. Feel free to restructure the app as you like and remove some of its parts. However, preserve
+ the two descriptor files expath-pkg.xml and repo.xml.
+
-
+
-
- Exporting the App
+
+ The HTML Templating Framework
- Once you have created the first pages of an application, it is usually a good idea to export it to a folder on disk. You could just click
- on Application/Download app to retrieve a .xar archive of the
- application, but exporting the app to the file system has the advantage that you can continue working on the app and have eXide keep track of
- which files have been modified since the last export. You may also want to add your app to a source control system like subversion or git, and
- this is easier if you have a copy on the file system.
- To create an export to a directory on the file system, click
- Application/Synchronize. In the popup dialog, fill in the path to the desired
- Target directory. If you are accessing eXist-db on a server, not the machine you are opening eXide in, this must point
- to a directory on the server running eXist-db, not your local file system. If you are running eXist from your own machine, the two are the
- same. The Start time can be ignored for now. It may show the time of your last export if you call synchronize again on
- the same application.
-
- Synchronize Dialog
-
-
-
-
-
-
- Clicking on Synchronize will start the export and the names of the written resources should show up in the
- table at the bottom of the dialog.
-
+ The generated application code uses the HTML Templating Framework to connect HTML views with the
+ application logic. The goal of the HTML Templating Framework is a clean separation of concerns. If you look at the index.html
+ page, you'll see it is just an HTML div defining the actual content body. The rest of the page is dragged in from the page template residing in
+ templates/page.html.
+ The controller.xql is configured to call the HTML Templating for every URL ending with .html. The
+ process flow for an arbitrary .html file is shown below:
+
+ Processing Flow
+
+
+
+
+
+
+ The input for the templating is always a plain HTML file. The module scans the HTML view for elements with class attributes, following a
+ simple convention: it tries to translate the class attributes into XQuery function calls. By using class attributes, the HTML
+ remains sufficiently clean and does not get messed up with application code. A web designer could take the HTML files and work on them without
+ being bothered by this.
+ For instance, if you look at index.html, the class attribute on the outer div contains a call to
+ a templating function:
+ <div class="templates:surround?with=templates/page.html&at=content">
+
+ templates:surround is one of the default templating functions provided by the module. It loads
+ templates/page.html and inserts the current div from index.html into the element with the id
+ content in templates/page.html. A detailed description of templates:surround can be found in the
+ HTML templating module documentation.
+ You can add your own templating functions to the XQuery module modules/app.xql, or add your own modules by importing them in
+ modules/view.xql.
+
+
+
+
+
+ Example: "Hello World!"
+
+ As an example, let's implement the traditional "Hello World!":
+
+
+ Create a new HTML view, hello.html: Choose File, New from the menu and set the
+ file type to HTML (drop down box at the top right in eXide). Add the following contents:
+
+ This creates a simple form and a paragraph connected to the template function app:helloworld through its class
+ attribute.
+ Save the HTML view to the root collection of your application as /db/apps/tutorial/hello.html.
+
+
+ Open modules/app.xql and add an XQuery function matching the app:helloworld template call:
+
+
+ A template function is a normal XQuery function with two required parameters: $node and $model. Additional
+ parameters are allowed (see below).
+ $node is the HTML element currently being processed, in our example case a p element. $model is an
+ XQuery map containing application data. We can ignore both parameters for this simple example, but they must be present or the function
+ won't be recognized by the templating module. Please refer to the HTML templating documentation for
+ more information.
+ The third parameter, $name, is injected automatically by the templating framework. The templating library will try to make
+ a best guess about how to fill in any additional parameters. In this case, an HTTP request parameter called name will be passed
+ in when the form is submitted. The parameter name matches the name of the variable, so the templating framework will try to use
+ it and the function parameter will be set to the value of this request parameter.
+
+
+ Open hello.html in the web browser using the base URL of the app:
+ http://localhost:8080/exist/apps/tutorial/hello.html
+ Fill out the box with a name and press return.
+
+
+
+
+
+
+
+
+ Exporting the App
-
+ Once you have created the first pages of an application, it is a good idea to export it to a folder on disk. One way to do this is to choose
+ Application, Download from the menu to create a .xar archive of the application. However, you can
+ also export the app to the file system, which has the advantage that you can continue working on the app while eXide keeps track of which files
+ were modified since the last export. With a copy on the file system you can also add the app to a version control system like Git or SVN.
+ To create an export to the file system, choose Application, Synchronize from the menu. In the popup dialog,
+ fill in the path to the desired Target directory:
+
+
+ If you are accessing eXist-db on a remote server rather than on the machine on which you opened eXide, this must point to a directory on the
+ server running eXist-db, not on your local file system.
+
+
+ If you are running eXist-db on your own machine, you can use a path on your local file system.
+
+
-
- Alternatives for TEI-based Applications
+ The Start time can be ignored for now. It will show the date/time of your last export when you synchronize
+ again.
+
+ Synchronize Dialog
+
+
+
+
+
+
+ Clicking on Synchronize starts the export. The names of the resources written show up in the table at the bottom
+ of the dialog.
+
- Those working on data sets in TEI may want to look into TEI Publisher instead of following
- the procedures described above. It includes an application generator tailored at digital editions. The created app packages already include
- all the basic functionality needed for browsing the edition, searching it or producing PDFs and ePUBs.
-
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/devguide/devguide.xml b/src/main/xar-resources/data/devguide/devguide.xml
deleted file mode 100644
index 9f8bd1a3..00000000
--- a/src/main/xar-resources/data/devguide/devguide.xml
+++ /dev/null
@@ -1,69 +0,0 @@
-
-
- Developer's Guide
- September 2009
-
- TBD
-
-
-
-
-
- REST-Style Web API
-
-
-
-
- Writing Java Applications with the XML:DB API
-
-
-
-
- Using the XML-RPC API
-
-
-
-
- SOAP
-
-
-
-
- xqDoc - Documenting Your XQuery Code
-
-
-
-
-
-
-
- Overview
-
- This section provides a quick but broad introduction to the APIs and interfaces
- provided by eXist. To get started with XQuery development in eXist-db, you should
- first read the tutorial on Getting Started with Web Application Development in eXist-db.
- In another chapter, we look at the basic
- REST-style API and its available
- HTTP request operations. Following that, we address Java
- programmers, and focus on the XML:DB API - a standard
- Java API used to access native XML database services - and its extensions.
- The following chapters examine the network APIs for
- XML-RPC and its methods -
- this includes the use of XUpdate. SOAP interface
- is discussed as an alternative to
- XML-RPC. Finally, we include an important appendix of
- libraries required to implement specific APIs and interfaces.
-
-
-
-
-
- Embedding/Required Libraries
-
- For more information on eXist libraries and how to embed eXist into your own application, please
- refer to the corresponding section in the
- Deployment Guide.
-
-
\ No newline at end of file
diff --git a/src/main/xar-resources/data/devguide_codereview/devguide_codereview.xml b/src/main/xar-resources/data/devguide_codereview/devguide_codereview.xml
index 8294030d..7b1aecd1 100644
--- a/src/main/xar-resources/data/devguide_codereview/devguide_codereview.xml
+++ b/src/main/xar-resources/data/devguide_codereview/devguide_codereview.xml
@@ -1,311 +1,307 @@
-
- Code Review Guide
- September 2009
-
- TBD
-
-
-
-
-
-
- Introduction
-
-
- Several aspects of the design and deployment of the
- custom solution will be analyzed and appropriate recommendations be provided as well.
- These criteria are to be followed when conducting code review.
-
-
-
- Does the solution provide custom class libraries for reusable classes
- & methods
-
-
- What classes have been extended or implemented and whether they are
- supported.
-
-
- Look for deprecated classes & methods. Look for alternate methods if
- deprecated methods used.
-
-
- Have proper classes been extended to provide functionality based on
- functional requirements? Need to check whether too heavy of objects been
- extended where a lighter weight object would suffice.
-
-
- Has proper abstract classes & interfaces been used to provide a
- flexible, yet cohesive design model?
-
-
- Has code been properly documented: JavaDoc, etc.?
-
-
- And in general, we will look for the 7 Deadly Sins of Software
- Design and make recommendation accordingly:
+
+ Code Review Guide
+ 1Q18
+
+ exist
+
+
+
+
+
+ This article provides guidelines on how to do a proper code review when developing for eXist-db's own code base.
+
+
+
+
+ Introduction
+
+ The following areas are important when conducting code review:
+
+
+ Does the solution provide custom class libraries for reusable classes & methods?
+
+
+ Which classes have been extended or implemented, and are they supported?
+
+
+ Look for deprecated classes & methods. Look for alternatives if deprecated methods are used.
+
+
+ Have the proper classes been extended to provide functionality based on functional requirements? Check whether objects have been
+ extended too heavily where a lighter extension would suffice.
+
+
+ Are proper abstract classes & interfaces used to provide a flexible, yet cohesive design model?
+
+
+ Has code been properly documented, for instance using JavaDoc?
+
+
+
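The deprecation checkpoint above is easiest to enforce when superseded code is marked explicitly. A minimal sketch (class and method names are illustrative, not taken from eXist's code base):

```java
// Marking a superseded method so the compiler flags remaining call sites.
class LegacyApi {
    /** @deprecated use {@link #newMethod()} instead */
    @Deprecated
    static String oldMethod() {
        return newMethod(); // delegate so behavior stays identical during migration
    }

    static String newMethod() {
        return "result";
    }
}

public class DeprecationExample {
    public static void main(String[] args) {
        // Call sites of oldMethod() produce a deprecation warning, guiding reviewers
        // to the alternative named in the JavaDoc.
        System.out.println(LegacyApi.newMethod());
    }
}
```

During review, a quick compile with `-Xlint:deprecation` lists every remaining caller of the old method.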
+ And in general, we will look for the 7 Deadly Sins of Software Design and make recommendations accordingly:
+
+
+ Rigidity: Make it hard to change, especially if changes result in ripple effects or when you don't know what will happen when you make
+ changes.
+
+
+ Fragility: Make it easy to break. Whenever you change something, something breaks.
+
+
+ Immobility: Make it hard to reuse. When something is coupled to everything it uses. When you try to take out a piece of code (a class, etc.), it
+ takes all of its dependencies with it.
+
+
+ Viscosity: Make it hard to do the right thing. There are usually several ways to work with a design. Viscosity happens when it is hard
+ to work with the design the way the designer intended to. The results are tricks and workarounds that, many times, have unexpected outcomes
+ (especially if the design is also fragile).
+
+
+ Needless Complexity (Over design): When you overdo it (the "Swiss-Army knife" anti-pattern). A class that tries to anticipate every
+ possible need. Another example is applying too many patterns to a simple problem.
+
+
+ Needless Repetition: The same code is scattered around, which makes it error-prone.
+
+
+
+ The 7th Deadly Sin of Software Design is (the obvious) "Not doing any".
+
+
+
+
+
+
+
+
+
+ Clean Unnecessary Code
+
+ As our business needs and techniques evolve, there are more and more changes to our implementation; therefore a lot of code gets deprecated.
+ Some of this may need to remain due to legacy data. However, some of it can be cleaned up to make maintenance easier and improve performance.
+ Different cases that might be found in the review:
+
+
+ Whole classes are deprecated
+
+
+ Code segments are unnecessary due to changes.
+
+
+ Redundant registration of listener services.
+
+
+ Good planning at the beginning will help to avoid messy code. Whenever there is a change to be implemented, plan it first with a
+ thorough review of its impact, and identify the code that needs to be changed. Only then should implementation start.
+ We recommend reviewing code that has many changes, removing unnecessary code, and merging similar code after team discussion. This may be
+ scheduled after the migrated system goes live. It will improve code quality, make debugging easier, and reduce unnecessary maintenance
+ work.
+
+
+
+
+
+ Optimize and reduce database queries to improve performance
+
+ Sometimes we need to balance between disk/network and RAM access. We may improve performance at the cost of memory, as long as memory
+ consumption is not the bottleneck. As a rule of thumb, a database query is more expensive than in-memory processing in terms of performance,
+ so we need to optimize and combine queries to reduce the number of database queries as much as possible.
+ Another balance is between optimization and readability. We need to manage optimization in a controllable way.
+
+
+
+
+
+ Use local cache to improve performance
+
+ It is recommended to use local caches to store frequently accessed data to avoid database queries. Such a cache mechanism is simple and easy
+ to implement.
+
+
+
+
+
+ Avoid the use of expensive operations
+
+ Avoid expensive operations such as repeated String concatenation. String concatenation is expensive because Strings are immutable; their
+ values cannot be changed after they are created, so each concatenation creates a new String object.
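The cost described above is avoided by accumulating into a single mutable buffer. A minimal sketch:

```java
public class ConcatExample {
    // Builds "0,1,...,n-1". Using += on a String would allocate a new String
    // object on every iteration; StringBuilder mutates one internal buffer instead.
    static String join(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(4)); // prints 0,1,2,3
    }
}
```

For the common case of a fixed, small number of operands on one line, the compiler already rewrites `a + b + c` efficiently; the StringBuilder pattern matters chiefly inside loops.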
+
+
+
+
+
+ Proper rollback of database transactions
+
+ For database transactions, take care to roll back properly. Here is an example of how not to do this:
+
+ The consequence of this code is that if there is an error when executing the transaction, the transaction will not be rolled back correctly.
+ This will have an impact on data integrity.
+
+ We recommend using the following structure for all database transactions to ensure data integrity.
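In outline, that structure looks like the sketch below. The `Transaction` interface here is a generic stand-in, not eXist's actual transaction API; only the commit-then-rollback-in-finally shape is the point:

```java
// Generic rollback pattern: commit only on success, roll back on every
// failure path, and always release the transaction.
interface Transaction {
    void commit() throws Exception;
    void rollback();
    void close();
}

public class TxnPattern {
    static void runInTransaction(Transaction txn, Runnable work) throws Exception {
        boolean committed = false;
        try {
            work.run();         // perform the actual changes
            txn.commit();       // reached only if no exception was thrown
            committed = true;
        } finally {
            if (!committed) {
                txn.rollback(); // any failure path rolls back before release
            }
            txn.close();
        }
    }
}
```

Because the rollback sits in `finally`, it also covers unchecked exceptions and early returns, which the broken example above misses.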
+
+
+
+
+
+
+ Hard Coding
+
+ Hard-coded values are hard to maintain and may cause problems. Here is an example:
+
+ This code will remove the host name from the URL. However, it is hard coded and will break if there is a rehosting.
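One way to avoid this class of bug is to parse the URL instead of stripping a hard-coded prefix. A sketch (the URL shown is illustrative):

```java
import java.net.URI;

public class UrlPathExample {
    // Instead of url.substring("http://myhost.example.com".length()), which
    // breaks as soon as the host changes, derive the path with java.net.URI.
    static String pathOf(String url) {
        URI u = URI.create(url);
        String query = u.getRawQuery();
        return u.getRawPath() + (query == null ? "" : "?" + query);
    }

    public static void main(String[] args) {
        System.out.println(pathOf("http://myhost.example.com/exist/apps/tutorial/hello.html"));
        // prints /exist/apps/tutorial/hello.html
    }
}
```

The same principle applies to any environment-specific constant: read it from configuration or derive it, rather than embedding it in the code.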
+
+
+
+
+
+ Resources not released
+
+ Resources should be released when they are no longer needed. Here is an example:
+
+ The FileInputStream is not closed before returning from the method.
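On Java 7 and later, try-with-resources closes the stream on every exit path, including early returns and exceptions. A minimal sketch:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ResourceExample {
    // The stream is declared in the try header, so close() is guaranteed to run
    // whether the method returns normally or throws.
    static int firstByte(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return in.read();
        } // in.close() happens here on all paths
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.write(tmp, new byte[] {65}); // 'A'
        System.out.println(firstByte(tmp)); // prints 65
        Files.delete(tmp);
    }
}
```

Before try-with-resources, the equivalent discipline was a `finally` block that closes the stream; the broken example above has neither.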
+
+
+
+
+
+ Comply with Java Coding Standards
+
+ Code conventions are important to programmers for a number of reasons:
+
+
+ 80% of the cost of a piece of software, over its lifetime, goes to maintenance.
+
+
+ Hardly any software is maintained for its whole life by the original author.
+
+
+ Code conventions improve the readability of the software, allowing engineers to understand new code more quickly and thoroughly.
+
+
+ If you ship your source code as a product, you need to make sure it is as well packaged and clean as any other product you create.
+
+
+ eXist follows the Sun coding standards. In addition to that:
+
+
+
+
+ Comments
-
- Rigidity: Make it hard to change, especially if changes might result in
- ripple effects or when you don't know what will happen when you make
- changes.
-
-
- Fragility: Make it easy to break. Whenever you change something, something
- breaks.
-
-
- Immobility: Make it hard to reuse. When something is coupled to everything
- it uses. When you try to take a piece of code (class etc.) it takes all of
- its dependencies with it.
-
-
- Viscosity: Make it hard to do the right thing. There are usually several
- ways to work with a design. Viscosity happens when it is hard to work with
- the design the way the designer intended to. The results are tricks and
- workarounds that, many times, have unexpected outcomes (esp. if the design
- is also fragile).
-
-
- Needless Complexity (Over design): When you overdo it; e.g. the
- "Swiss-Army knife" anti-pattern. A class that tries to anticipate every
- possible need. Another example is applying too many patterns to a simple
- problem etc.
-
-
- Needless Repetition: The same code is scattered about which makes it error
- prone.
-
+
+ Further details on JavaDoc comments, over and above the Sun standard, can be found in Sun's Doc Comments how-to guide.
+
+
+ Have JavaDoc comment for all classes
+
+
+ Each class should have a comment. This comment should describe the function, intent and role of the class.
+
+
+ Have JavaDoc comment for all methods
+
+
+ Each method should have a comment describing how the method is called and what it does. Discussion of implementation specifics should
+ be avoided since this is not for the user of a method to know in most cases. That information belongs in implementation comments.
+
+
+ Within the method JavaDoc comment, info should be added on the parameters. Each method JavaDoc comment should contain an
+ @param comment for each parameter, an @return comment if not a void or constructor method, and an
+ @throws comment for each exception (cf. Documenting Exceptions with @throws Tag).
+
+
+ The method pre and post conditions should be documented here. Pre-conditions comprise parameter ranges and the overall state of the
+ object and system expected when calling the method. Post-conditions should document the expected return value sets and the state of the
+ object and system that will apply when the method exits. These should map to assertions.
+
+
+ The JavaDoc should also document the traceability of this method to the design and the requirements.
+ Have JavaDoc comments for all fields
+ Each non-trivial field should have a comment describing the role and purpose of the field, as well as any other appropriate information
+ such as its range.
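Pulling the conventions above together, a method comment might look like this (the class and method are invented for illustration):

```java
/**
 * Utility for clamping values; illustrates the comment conventions
 * described above (class comment, @param/@return/@throws, pre/post-conditions).
 */
public class ClampExample {
    /**
     * Restricts a value to the inclusive range [min, max].
     *
     * <p>Pre-condition: {@code min <= max}. Post-condition: the result lies
     * within {@code [min, max]}.</p>
     *
     * @param value the value to clamp
     * @param min   the lower bound, inclusive
     * @param max   the upper bound, inclusive; must be {@code >= min}
     * @return {@code value} limited to the range
     * @throws IllegalArgumentException if {@code min > max}
     */
    public static int clamp(int value, int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException("min > max");
        }
        return Math.max(min, Math.min(max, value));
    }
}
```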
+
-
- The 7th Deadly Sin of Software Design is (the obvious) "Not
- doing any".
-
-
-
-
-
-
- Clean Unnecessary Code
-
- As our business need and technique evolves, there are more and more changes to our
- implementation, and thus there are many code deprecated. Some of them may need to
- remain due to legacy data, but some of them can be cleaned to make the maintenance
- easy and even improve the performance.
- Different cases that might be found in the review:
+
+
+
+
+
+ Exceptions
-
- Whole classes are deprecated
-
-
- Some code segments are unnecessary due to changes.
-
-
- Redundant registration of some listener services.
-
+
+ All exceptions should be handled; it is never acceptable to simply print the exception message and stack trace. Exceptions should be
+ dealt with, and corrective or informative action taken to highlight the issue.
+
+
+ For debugging purposes the stack trace should be logged at the final destination of the exception, or wherever the exception is
+ modified, for example when throwing an XXException instead of a java.io.IOException.
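When translating an exception into a domain-specific one, pass the original as the cause so the full stack trace survives for logging. A sketch (`StorageException` stands in for the placeholder "XXException" above):

```java
import java.io.IOException;

// Domain-specific wrapper; keeping the cause preserves the original trace.
class StorageException extends Exception {
    StorageException(String message, Throwable cause) {
        super(message, cause);
    }
}

public class WrapExample {
    static void load() throws StorageException {
        try {
            throw new IOException("disk failure"); // stand-in for real I/O
        } catch (IOException e) {
            // Wrap, don't swallow: e stays reachable via getCause(), so the
            // final handler can log the complete chain.
            throw new StorageException("could not load resource", e);
        }
    }
}
```

The anti-pattern to reject in review is `catch (IOException e) { e.printStackTrace(); }` followed by throwing a fresh exception without the cause.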
+
- A good planning at the beginning will help to avoid mess in the code. And whenever
- there is a change to be implemented, plan it first with a thorough review to its
- impact and identify those codes need to be changed at the same time. Only after that
- actions can be taken to implement it,
- We recommend to review that code have a lot changes and remove unnecessary code
- and merge similar code with team discussion. This may be scheduled after migrated
- system go-live. This will improve code quality, make it easy for debugging, and
- reduce unnecessary maintenance work.
-
-
-
-
-
- Optimize and reduce database query to improve performance
-
- Sometimes, we need to balance between the access of Disk/Network and RAM. And we
- may improve performance at cost of memory as long as memory consumption is not the
- bottleneck. As a rule of thumb, database query is more expensive then in memory
- processing in terms of performance, so we need to optimize and combine some of the
- queries to reduce database query as much as possible. Another balance need to
- control is between the optimization and readability. We need to manage optimization
- in a controllable way.
-
-
-
-
-
- Use local cache to improve performance
-
- It is recommended to use local cache to store those frequently access data to
- avoid database queries. This cache mechanism is simple and easy to implement.
-
-
-
-
-
- Avoid the use of expensive operations
-
- Avoid using any expensive operations such as String concatenation. String
- concatenation which is expensive because Strings are constant; their values cannot
- be changed after they are created. So each concatenation will create a new String
- object.
-
-
-
-
-
- Proper rollback of database transactions
-
- For database transaction, fewer places don’t rollback properly, here is an
- example:
-
- The consequence of this code is that if there is an error happens when executing
- the transaction, the transaction will not be rolled back correctly. This will have
- an impact to the data integrity.
- Recommendations
- We recommend using the following structure for all database transactions to ensure
- data integrity.
-
-
-
-
-
-
- Hard Coding
-
- Hard code is hard to maintain and may cause potential problems.
- Here is an example:
-
- This code will remove the host name from the URL, however, it is hard coded and
- may break if there is a rehosting.
-
-
-
-
-
- Resources not released
-
- Resource should be released when it is not needed.
- Here is an example extracted from a system:
-
- The FileInputStream is not closed before returning from the method.
-
-
-
-
-
- Comply to Java Coding Standards
-
- Code conventions are important
- to programmers for a number of reasons:
+
+
+
+
+
+ Logging
+
-
- 80% of the lifetime cost of a piece of software goes to
- maintenance.
-
-
- Hardly any software is maintained for its whole life by the original
- author.
-
-
- Code conventions improve the readability of the software, allowing
- engineers to understand new code more quickly and thoroughly.
-
-
- If you ship your source code as a product, you need to make sure it is as
- well packaged and clean as any other product you create.
-
+
+ If no logging system is in use in the package already or the logging is unconditional, then log4j should be used for all logging.
+ Please see the Log4J Logging Guide.
+
+
+ If a class has unconditional logging, then it should be updated to use log4j. A case of unconditional logging is where there are
+ System.out.println() in the code with no conditions surrounding the call. This unnecessarily clutters the log and places a
+ burden on performance.
+
+
+ Logging should always be performed at method entry and exit as follows:
+
+
+ Log entry
+
+
+ Log arguments
+
+
+ Log exit
+
+
+ Log return values
+
+
+
+
+ The occurrence of exceptions and the stack trace should also be logged as info level items.
+
+
+ Logging calls should be wrapped in enablement checks so that arguments do not get unnecessarily evaluated, for example:
+
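The enablement-check pattern looks like the sketch below. It uses java.util.logging so it runs without third-party jars; with log4j the equivalent guard is `LOG.isDebugEnabled()`:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogGuardExample {
    private static final Logger LOG = Logger.getLogger(LogGuardExample.class.getName());

    // Stand-in for an expensive argument, e.g. serializing a large document.
    static String expensiveDump() {
        return "....";
    }

    public static void main(String[] args) {
        // Guard the call so expensiveDump() only runs when the level is active;
        // otherwise the string concatenation and the dump are skipped entirely.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("state: " + expensiveDump());
        }
    }
}
```

Log4j 2's parameterized form, `LOG.debug("state: {}", value)`, achieves the same effect for cheap arguments, but a guard is still needed when computing the argument itself is costly.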
+
- Sun coding standards is the standards eXist follows for coding. In
- addition to that, there is some addendum.
-
- Comments
-
- Further details on JavaDoc comments over and above the Sun standard,
- can be found in the Sun Doc Comments how to guide.
- Have JavaDoc comment
- for all classes
- Each class should have a comment. This comment should
- describe the function, intent and role of the class.
- Have JavaDoc
- comment for all methods
- Each method should have a comment describing
- how the method is called and what it does. Discussion of implementation specifics
- should be avoided since this is not for the user of a method to know in most cases.
- That information belongs in implementation comments.
- Within the method
- JavaDoc comment, info should be added on the parameters. Each method JavaDoc comment
- should contain an @param comment for each parameter, an @return comment if not a
- void or constructor method, and an @throws comment for each exception (cf.
- Documenting Exceptions with @throws Tag).
- The method pre and post
- conditions should be documented here. Pre-conditions comprise parameter ranges and
- the overall state of the object and system expected when calling the method.
- Post-conditions should document the expected return value sets and the state of the
- object and system that will apply when the method exits. These should map to
- assertions.
- The JavaDoc should also document traceability of this
- method to the design and the requirements. Have JavaDoc comment for all fields Each
- non-trivial field should have a comment describing the role and purpose of the
- field, as well as any other appropriate information such as the range.
-
-
-
- Exceptions
-
-
- All exceptions should be handled, it is never acceptable to simply
- print the exception message and stack trace. Exceptions should be dealt and
- corrective or informative action taken to highlight the issue.
- For
- debugging purposes the stack trace should be logged at the final destination of the
- exception or at whenever the exception is modified, for example throwing a
- XXException instead of a java.io.IOException.
-
- Logging
-
- If no logging system is in use in the package already or the logging is
- unconditional, then log4j should be used for all logging. Please see the
- Log4J Logging
- Guide
- .
- If a class has unconditional logging, then
- it should be updated to use log4j. A case of unconditional logging is where there
- are System.out.println() in the code with no conditions surrounding the call. This
- unnecessarily clutters the log and places a burden on
- performance.
- Logging should always be performed at method exit and entry
- as follows:
+
+
+
+
+
+ Assertions
+
+
-
- Log entry
-
-
- Log arguments
-
-
- Log exit
-
-
- Log return values
-
+
+ Assertions should be used in the code to verify that the expected results have occurred. Assertions should be used as liberally as
+ possible.
+
+
+ Standard Assertions should be performed at method entry and exit; these methods should validate the pre and post-conditions for the
+ method. All arguments should be checked for validity as should the return values. Similarly the state of the object and broader system
+ should be checked as appropriate on both method entry and exit, for example if a file is open.
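Entry and exit assertions for pre- and post-conditions can be sketched as follows (run with `java -ea` to enable assertion checking; the method is invented for illustration):

```java
public class AssertExample {
    // Entry assertion validates the pre-condition on the argument;
    // exit assertion validates the post-condition on the result.
    static double sqrtApprox(double x) {
        assert x >= 0 : "pre-condition violated: x must be non-negative";
        double r = Math.sqrt(x);
        assert Math.abs(r * r - x) < 1e-9 : "post-condition violated: r*r should equal x";
        return r;
    }

    public static void main(String[] args) {
        System.out.println(sqrtApprox(9.0)); // prints 3.0
    }
}
```

Since assertions compile to no-ops unless enabled, they document and verify the contract without imposing a production cost.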
+
- The occurrence of exceptions and the stack trace should also be
- logged as info level items.
- Logging calls should be wrapped in
- enablement checks so that arguments do not get unnecessarily evaluated for
- example:
-
-
- Assertions
-
- Assertions should be used in the code to verify that the expected
- results have occurred. Assertions should be used as liberally as possible.
-
- Standard Assertions should be performed at method entry and exit; these
- methods should validate the pre and post-conditions for the method. All arguments
- should be checked for validity as should the return values. Similarly the state of
- the object and broader system should be checked as appropriate on both method entry
- and exit, for example if a file is open.
-
+
+
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/devguide_indexes/devguide_indexes.xml b/src/main/xar-resources/data/devguide_indexes/devguide_indexes.xml
index 7347a147..6a283dbd 100644
--- a/src/main/xar-resources/data/devguide_indexes/devguide_indexes.xml
+++ b/src/main/xar-resources/data/devguide_indexes/devguide_indexes.xml
@@ -1,1492 +1,1437 @@
-
- Developer's Guide to Modularized Indexes
- June 2007
-
- TBD
-
-
+
+ Developer's Guide to Modularized Indexes
+ 1Q18
+
+ java-development
+ indexes
+
+
-
+
-
- Note
+ eXist-db provides a modular mechanism to index data. This eases index development and the development of more-or-less related custom functions
+ (in Java).
- This document has been reviewed for eXist-db 1.2.
-
+
-
-
- The new modularized indexes
+
+ Introduction
+ eXist ships with two index types based on this mechanism:
+
+
+ NGram index
+
+ An NGram index will store the N-grams contained in the data's characters. For instance, if the index is configured to index 3-grams,
+ <data>abcde</data> will generate these index entries:
+
+
+ abc
+
+
+ bcd
+
+
+ cde
+
+
+ de␣
+
+
+ e␣␣
+
+
+
+
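The 3-gram entries listed above can be reproduced with a short sketch. This illustrates the padding scheme implied by the example (trailing spaces), not eXist's actual NGram implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class NGramExample {
    // Produces one n-gram per character position, padding the tail with
    // spaces so the final characters still start full-width entries.
    static List<String> ngrams(String s, int n) {
        List<String> out = new ArrayList<>();
        String padded = s + " ".repeat(n - 1);
        for (int i = 0; i < s.length(); i++) {
            out.add(padded.substring(i, i + n));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("abcde", 3));
        // [abc, bcd, cde, "de ", "e  "] -- matching the index entries above
    }
}
```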
+
+ Spatial index
+
+ A spatial index will store some of the geometric characteristics of Geography Markup Language geometries (tested against
+ GML version 2.1.2). For instance:
+
+ This will generate several index entries. The most important ones are:
+
+
+ The spatial referencing system (osgb:BNG for this polygon)
+
+
+ The polygon itself, stored in a binary form (Well-Known Binary)
+
+
+ The coordinates of its bounding box
+
+
+
+
+
+
-
+
-
- Brief overview
+
- Since around SVN revision 6000, spring 2007, i.e. after the 1.1 release, eXist-db provides a new mechanism to index
- XML data. This mechanism is modular and should ease index development as well as the development of related (possibly not so) custom
- functions. As a proof of concept, eXist currently ships with two index types :
-
-
- NGram index
-
- An NGram index will store the N-grams contained in the data's characters, i.e. if the index is configured to index 3-grams,
- <data>abcde</data> will generate these index entries :
-
-
- abc
-
-
- bcd
-
-
- cde
-
-
- de␣
-
-
- e␣␣
-
-
-
-
-
- Spatial index
-
- A spatial index will store some of the geometric characteristics of Geography Markup Language geometries (currently only tested with GML version 2.1.2).
-
- will generate index entries among which most important are :
-
-
- the spatial referencing system
- (osgb:BNG for this polygon)
-
-
- the polygon itself, stored in a binary form (Well-Known
- Binary)
-
-
- the coordinates of its bounding box
-
-
-
- The spatial index will we discussed in details further below.
-
-
-
- So, the new architecture introduces a new package, org.exist.indexing which contains a class that we will
- immediately study, IndexManager.
-
+
+ Classes
-
+
+ org.exist.indexing.IndexManager
-
-
- org.exist.indexing.IndexManager
-
+ The indexing architecture introduces a new package, org.exist.indexing, whose central class, IndexManager, is
+ responsible for index management. It is created by org.exist.storage.BrokerPool, which allocates
+ org.exist.storage.DBBrokers to each process accessing each DB instance. Each time a DB instance is created (most
+ installations generally have only one, most often called exist), the initialize() method constructs an
+ IndexManager that will be available through the getIndexManager() method of
+ org.exist.storage.BrokerPool.
+ public IndexManager(BrokerPool pool, Configuration config)
+ This constructor keeps track of the BrokerPool that has created the instance and receives the database's configuration
+ object, usually defined in an XML file called conf.xml. This new entry is expected in the configuration file:
+
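The entry takes roughly the shape sketched below. The ids, class names and parameters follow eXist's stock conf.xml but are shown here for illustration; consult the conf.xml shipped with your version for the authoritative form:

```xml
<modules>
    <!-- id is the human-readable name; class must implement org.exist.indexing.Index -->
    <module id="ngram-index" class="org.exist.indexing.ngram.NGramIndex"
            file="ngram.dbx" n="3"/>
    <module id="spatial-index" class="org.exist.indexing.spatial.GMLHSQLIndex"
            connectionTimeout="10000" flushAfter="300"/>
</modules>
```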
+ This defines 2 indexes, backed by their specific classes (the class attribute; these classes implement the
+ org.exist.indexing.Index interface, as will be seen below), optionally assigns them a human-readable (and writable)
+ identifier, and passes them custom parameters, which are implementation-dependent. The IndexManager then configures (by calling their
+ configure() method), opens (by calling their open() method) and keeps track of each of them.
+
+ org.exist.indexing.IndexManager also provides these public methods:
- As its name suggests, this is the class which is responsible for index management. It is created by
- org.exist.storage.BrokerPool which allocates org.exist.storage.DBBrokers to each process
- accessing each DB instance. Each time a DB instance is created (most installations generally have only one, most often called
- exist), the initialize() method contructs an IndexManager that will be
- available through the getIndexManager() method of org.exist.storage.BrokerPool.
- public IndexManager(BrokerPool pool, Configuration config)
- This constructor keeps track of the BrokerPool that has created the instance and receives the database's
- configuration object, usually defined in an XML file called conf.xml. This new entry is expected in the configuration
- file :
-
- ... which defines 2 indexes, backed-up by their specific classes (class attribute ; these classes
- implement the org.exist.indexing.Index interface as will be seen below), eventually assigns them a human-readable
- (writable even) identifier and passes them custom parameters which are implementation-dependant. Then, it configures (by calling their
- configure() method), opens (by calling their open() method) and keeps track of each of
- them.
-
- org.exist.indexing.IndexManager also provides these public methods :
- public BrokerPool getBrokerPool()
- ... which returns the org.exist.storage.BrokerPool for which this IndexManager was
- created.
- public synchronized Index getIndexById(String indexId)
- A method that returns an Index given its class identifier (see below). Allows custom functions to access
- Indexes whatever their human-defined name is. This is probably the only method in this class that will be really
- needed by a developer.
- public synchronized Index getIndexByName(String indexName)
- The counterpart of the previous method. Pass the human-readable name of the Index as defined in the
- configuration.
- public void shutdown()
- This method is called when eXist shuts down. close() will be called for every registered
- Index. That allows them to free the resources they have allocated.
- public void removeIndexes()
- This method is called when repair() is called from
- org.exist.storage.NativeBroker.
-
-
- repair() reconstructs every index (including the structural one) from what is contained in the persistent DOM
- (usually dom.dbx).
-
-
- remove() will be called for every registered Index. That allows each index to destroy its
- persistent storage if it wants to do so (but it is probably suitable given that repair() is called when the DB
- and/or its indexes are corrupted).
- public void reopenIndexes()
- This method is called when repair() is called from
- org.exist.storage.NativeBroker.
-
-
- repair() reconstructs every index (including the structural one) from what is contained in the persistent DOM
- (usually dom.dbx).
-
-
- open() will be called for every registered Index. That allows each index to (re)allocate the
- resources it needs for its persistent storage.
-
+ public BrokerPool getBrokerPool()
+ This returns the org.exist.storage.BrokerPool for which this IndexManager was created.
+ public synchronized Index getIndexById(String indexId)
+ A method that returns an Index given its class identifier (see below). Allows custom functions to access
+ Indexes whatever their human-defined name is. This is probably the only method in this class that will be really needed
+ by a developer.
+ public synchronized Index getIndexByName(String indexName)
+ The counterpart of the previous method. Pass the human-readable name of the Index as defined in the
+ configuration.
+ public void shutdown()
+ This method is called when eXist shuts down. close() will be called for every registered Index. That
+ allows them to free the resources they have allocated.
+ public void removeIndexes()
+ This method is called when repair() is called from org.exist.storage.NativeBroker.
+
+
+ repair() reconstructs every index (including the structural one) from what is contained in the persistent DOM (usually
+ dom.dbx).
+
+
+ remove() will be called for every registered Index. That allows each index to destroy its persistent
+ storage if it wants to do so (but it is probably suitable given that repair() is called when the DB and/or its indexes are
+ corrupted).
+ public void reopenIndexes()
+ This method is called when repair() is called from org.exist.storage.NativeBroker.
+
+
+ repair() reconstructs every index (including the structural one) from what is contained in the persistent DOM (usually
+ dom.dbx).
+
+
+ open() will be called for every registered Index. That allows each index to (re)allocate the resources
+ it needs for its persistent storage.
+
-
+
-
-
- org.exist.indexing.IndexController
-
+
+ org.exist.indexing.IndexController
- Another important class is org.exist.indexing.IndexController which, as its name suggests, controls the way data
- to be indexed are dispatched to the registered indexes, using org.exist.indexing.IndexWorkers that will be described
- below. Each org.exist.storage.DBBroker constructs such an IndexController when it is itself
- constructed, using this constructor :
- public IndexController(DBBroker broker)
- ... that registers the broker's IndexWorkers, one for each registered
- Index. These IndexWorkers, that will be described below, are returned by the
- getWorker() method in org.exist.indexing.Index, which is usually a good place to create
- such an IndexWorker, at least the first time it is called.
- This IndexController will be available through the getIndexController() method of
- org.exist.storage.DBBroker.
- Here are the other public methods :
- public Map configure(NodeList configNodes, Map namespaces)
- This method receives the database's configuration object, usually defined in an XML file called conf.xml. Both
- configuration nodes and namespaces (remember that some configuration settings including e.g. pathes need namespaces to be defined) will be
- passed to the configure() method of each IndexWorker. The returned object is a
- java.util.Map that will be available from
- collection.getIndexConfiguration(broker).getCustomIndexSpec(INDEX_CLASS_IDENTIFIER).
- public IndexWorker getWorkerByIndexId(String indexId)
- A method that returns an IndexWorker given the class identifier of its associated Index
- identifier. Very useful to the developer since it allows custom functions to access IndexWorkers whatever the
- human-defined name of their Index is. This is probably the only method in this class that will be really needed by a
- developer.
- public IndexWorker getWorkerByIndexName(String indexName)
- The counterpart of the previous method. For the human-readable name of the Index as defined in the
- configuration.
- public void setDocument(DocumentImpl doc)
- This method sets the org.exist.dom.DocumentImpl on which the IndexWorkers shall work.
- Calls setDocument(doc) on each registered IndexWorker.
- public void setMode(int mode)
- This method sets the operating mode in which the IndexWorkers shall work. See below for further details on
- operating modes. Calls setMode(mode) on each registered IndexWorker.
- public void setDocument(DocumentImpl doc, int mode)
- A convenience method that sets both the org.exist.dom.DocumentImpl and the operating mode. Calls
- setDocument(doc, mode) on each registered IndexWorker.
- public DocumentImpl getDocument()
- Returns the org.exist.dom.DocumentImpl on which the IndexWorkers will have to work.
- public int getMode()
- Returns the operating mode in which the IndexWorkers will have to work.
- public void flush()
- Called in various places when pending operations, obviously data insertion, update or removal, have to be completed. Calls
- flush() on each registered IndexWorker.
- public void removeCollection(Collection collection, DBBroker broker)
- Called when a collection is to be removed. That allows to delete index entries for this collection in a single operation. Calls
- removeCollection() on each registered IndexWorker.
- public void reindex(Txn transaction, StoredNode reindexRoot, int mode)
- Called when a document is to be reindexed. Only the reindexRoot node and its descendants will have their index
- entries updated or removed depending of the mode parameter.
- public StoredNode getReindexRoot(StoredNode node, NodePath path)
- Determines the node which should be reindexed together with its descendants. Calls getReindexRoot() on each
- registered IndexWorker. The top-most node will be the actual node from which the DBBroker will
- start reindexing.
- public StoredNode getReindexRoot(StoredNode node, NodePath path, boolean includeSelf)
- Same as above, with more parameters.
- public StreamListener getStreamListener()
- Returns the first org.exist.indexing.StreamListener in the StreamListeners pipeline. There
- is at most one StreamListener per IndexWorker that will intercept the (re)indexed nodes
- stream. IndexWorkers that are not interested in the data (depending of e.g. the document and/or the operating mode)
- may return null through their getListener() method and thus not participate in the
- (re)indexing process. In other terms, they will not listen to the indexed nodes.
- public void indexNode(Txn transaction, StoredNode node, NodePath path, StreamListener listener)
- Index any kind of indexable node (currently elements, attributes and text nodes ; comments and especially processing instructions might
- be considered in the future).
- public void startElement(Txn transaction, ElementImpl node, NodePath path, StreamListener listener)
- More specific than indexNode(). For an element. Will call startElement() on
- listener if it is not null. Hence the analogy with STAX events is obvious.
- public void attribute(Txn transaction, AttrImpl node, NodePath path, StreamListener listener)
- More specific than indexNode(). For an attribute. Will call attribute() on
- listener if it is not null.
- public void characters(Txn transaction, TextImpl node, NodePath path, StreamListener listener)
- More specific than indexNode(). For a text node. Will call characters() on
- listener if it is not null.
- public void endElement(Txn transaction, ElementImpl node, NodePath path, StreamListener listener)
- Signals end of indexing for an element node. Will call endElement() on listener if it is
- not null
-
- public MatchListener getMatchListener(NodeProxy proxy)
- Returns a org.exist.indexing.MatchListener for the given node.
- The two classes aim to be essentially used by eXist itself. As a programmer you will probably need to use just one or two of the above
- methods.
-
+ Another important class is org.exist.indexing.IndexController which, as its name suggests, controls the way data to be
+ indexed are dispatched to the registered indexes, using org.exist.indexing.IndexWorkers that will be described below. Each
+ org.exist.storage.DBBroker constructs such an IndexController when it is itself constructed, using
+ this constructor:
+ public IndexController(DBBroker broker)
+ ... that registers the broker's IndexWorkers, one for each registered Index.
+ These IndexWorkers, that will be described below, are returned by the getWorker() method in
+ org.exist.indexing.Index, which is usually a good place to create such an IndexWorker, at least the
+ first time it is called.
+ This IndexController will be available through the getIndexController() method of
+ org.exist.storage.DBBroker.
+ Here are the other public methods:
+ public Map configure(NodeList configNodes, Map namespaces)
+ This method receives the database's configuration object, usually defined in an XML file called conf.xml. Both
+ configuration nodes and namespaces (remember that some configuration settings, e.g. paths, need namespaces to be defined) will be
+ passed to the configure() method of each IndexWorker. The returned object is a
+ java.util.Map that will be available from
+ collection.getIndexConfiguration(broker).getCustomIndexSpec(INDEX_CLASS_IDENTIFIER).
+ public IndexWorker getWorkerByIndexId(String indexId)
+ A method that returns an IndexWorker given the class identifier of its associated Index.
+ Very useful to the developer since it allows custom functions to access IndexWorkers whatever the human-defined name of
+ their Index is. This is probably the only method in this class that will be really needed by a developer.
+ public IndexWorker getWorkerByIndexName(String indexName)
+ The counterpart of the previous method. Pass the human-readable name of the Index as defined in the
+ configuration.
+ public void setDocument(DocumentImpl doc)
+ This method sets the org.exist.dom.DocumentImpl on which the IndexWorkers shall work. Calls
+ setDocument(doc) on each registered IndexWorker.
+ public void setMode(int mode)
+ This method sets the operating mode in which the IndexWorkers shall work. See below for further details on operating
+ modes. Calls setMode(mode) on each registered IndexWorker.
+ public void setDocument(DocumentImpl doc, int mode)
+ A convenience method that sets both the org.exist.dom.DocumentImpl and the operating mode. Calls
+ setDocument(doc, mode) on each registered IndexWorker.
+ public DocumentImpl getDocument()
+ Returns the org.exist.dom.DocumentImpl on which the IndexWorkers will have to work.
+ public int getMode()
+ Returns the operating mode in which the IndexWorkers will have to work.
+ public void flush()
+ Called in various places when pending operations, typically data insertion, update or removal, have to be completed. Calls
+ flush() on each registered IndexWorker.
+ public void removeCollection(Collection collection, DBBroker broker)
+ Called when a collection is to be removed. This allows the index entries for this collection to be deleted in a single operation. Calls
+ removeCollection() on each registered IndexWorker.
+ public void reindex(Txn transaction, StoredNode reindexRoot, int mode)
+ Called when a document is to be reindexed. Only the reindexRoot node and its descendants will have their index entries
+ updated or removed, depending on the mode parameter.
+ public StoredNode getReindexRoot(StoredNode node, NodePath path)
+ Determines the node which should be reindexed together with its descendants. Calls getReindexRoot() on each registered
+ IndexWorker. The top-most node will be the actual node from which the DBBroker will start
+ reindexing.
+ public StoredNode getReindexRoot(StoredNode node, NodePath path, boolean includeSelf)
+ Same as above, with more parameters.
+ public StreamListener getStreamListener()
+ Returns the first org.exist.indexing.StreamListener in the StreamListeners pipeline. There is at
+ most one StreamListener per IndexWorker that will intercept the (re)indexed nodes stream.
+ IndexWorkers that are not interested in the data (depending on e.g. the document and/or the operating mode) may return
+ null through their getListener() method and thus not participate in the (re)indexing process. In other
+ terms, they will not listen to the indexed nodes.
+ public void indexNode(Txn transaction, StoredNode node, NodePath path, StreamListener listener)
+ Index any kind of indexable node (currently elements, attributes and text nodes; comments and especially processing instructions might be
+ considered in the future).
+ public void startElement(Txn transaction, ElementImpl node, NodePath path, StreamListener listener)
+ More specific than indexNode(). For an element. Will call startElement() on
+ listener if it is not null. The analogy with StAX events is obvious.
+ public void attribute(Txn transaction, AttrImpl node, NodePath path, StreamListener listener)
+ More specific than indexNode(). For an attribute. Will call attribute() on
+ listener if it is not null.
+ public void characters(Txn transaction, TextImpl node, NodePath path, StreamListener listener)
+ More specific than indexNode(). For a text node. Will call characters() on
+ listener if it is not null.
+ public void endElement(Txn transaction, ElementImpl node, NodePath path, StreamListener listener)
+ Signals end of indexing for an element node. Will call endElement() on listener if it is not
+ null.
+
+ public MatchListener getMatchListener(NodeProxy proxy)
+ Returns an org.exist.indexing.MatchListener for the given node.
+ These two classes are essentially meant to be used by eXist itself. As a programmer you will probably need to use just one or two of the above
+ methods.
+
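The dispatching behaviour described above, where calls such as setDocument(doc, mode) and flush() are relayed to every registered worker, can be sketched with simplified stand-in types. MiniIndexWorker and MiniIndexController below are hypothetical illustrations of the fan-out pattern, not eXist's real classes.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for org.exist.indexing.IndexWorker (hypothetical, simplified).
interface MiniIndexWorker {
    void setDocument(String doc, int mode);
    void flush();
}

public class MiniIndexController {
    private final List<MiniIndexWorker> workers = new ArrayList<>();
    private String doc;
    private int mode;

    public void register(MiniIndexWorker w) { workers.add(w); }

    // Like the controller's setDocument(doc, mode): record the state,
    // then fan the call out to every registered worker.
    public void setDocument(String doc, int mode) {
        this.doc = doc;
        this.mode = mode;
        for (MiniIndexWorker w : workers) w.setDocument(doc, mode);
    }

    // Like the controller's flush(): ask every worker to complete
    // its pending insertions, updates or removals.
    public void flush() {
        for (MiniIndexWorker w : workers) w.flush();
    }

    public String getDocument() { return doc; }
    public int getMode() { return mode; }
}
```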
-
+
-
-
- org.exist.indexing.Index and org.exist.indexing.AbstractIndex
-
+
+ org.exist.indexing.Index and org.exist.indexing.AbstractIndex
- Now let's get into the interfaces and classes that will need to be extended by the index programmer. The first of them is the interface
- org.exist.indexing.Index which will maintain the index itself.
- As described above, a new instance of the interface will be created by the constructor of
- org.exist.indexing.IndexManager which calls the interface's newInstance() method. No need
- for a constructor then.
- Here are the methods that have to be implemented in an implementation:
- String getIndexId()
- Returns the class identifier of the index.
- String getIndexName()
- Returns the human-defined name of the index, if one was defined in the configuration file.
- BrokerPool getBrokerPool()
- Returns the org.exist.storage.BrokerPool that has created the index.
- void configure(BrokerPool pool, String dataDir, Element config)
- Notifies the Index a data directory (normally ${EXIST_HOME}/webapp/WEB-INF/data) and the
- configuration element in which it is declared.
- void open()
- Method that is executed when the Index is opened, whatever it means. Consider this method as an initialization
- and allocate the necessary resources here.
- void close()
- Method that is executed when the Index is closed, whatever it means. Consider this method as a finalization and
- free the allocated resources here.
- void sync()
- Unused.
- void remove()
- Method that is executed when eXist requires the index content to be entitrely deleted, e.g. before repairing a corrupted
- database.
- IndexWorker getWorker(DBBroker broker)
- Returns the IndexWorker that operates on this Index on behalf of
- broker. One may want to create a new IndexWorker here or pick one form a pool.
- boolean checkIndex(DBBroker broker)
- To be called by applications that want to implement a consistency check on the Index.
- There is also an abstract class that implements org.exist.indexing.Index,
- org.exist.indexing.AbstractIndex that can be used a a basis for most Index implementations.
- Most of its methods are abstract and still have to be implemented in the concrete classes. These are the few concrete methods:
- public String getDataDir()
- Returns the directory in which this Index operates. Usually defined by configure() which
- itself receives eXist's configuration settings. NB! There might be some Indexes for which the concept of data
- directory isn't accurate.
- public void configure(BrokerPool pool, String dataDir, Element config)
- Its minimal implementation retains the org.exist.storage.BrokerPool, the data directory and the human-defined
- name, if defined in the configuration file (in an attribute called id). Sub-classes may call
- super.configure() to retain this default behaviour.
- This member is protected :
- protected static String ID = "Give me an ID !"
- This is where the class identifier of the Index is defined. Override this member with, say,
- MyClass.class.getName() to provide a reasonably unique identifier within your system.
-
+ Now let's get into the interfaces and classes that will need to be extended by the index programmer. The first of them is the interface
+ org.exist.indexing.Index which will maintain the index itself.
+ As described above, a new instance of the interface will be created by the constructor of
+ org.exist.indexing.IndexManager which calls the interface's newInstance() method. No need for a
+ constructor then.
+ Here are the methods that have to be implemented in an implementation:
+ String getIndexId()
+ Returns the class identifier of the index.
+ String getIndexName()
+ Returns the human-defined name of the index, if one was defined in the configuration file.
+ BrokerPool getBrokerPool()
+ Returns the org.exist.storage.BrokerPool that has created the index.
+ void configure(BrokerPool pool, String dataDir, Element config)
+ Notifies the Index of its data directory (normally ${EXIST_HOME}/webapp/WEB-INF/data) and the
+ configuration element in which it is declared.
+ void open()
+ Method that is executed when the Index is opened, whatever it means. Consider this method as an initialization and
+ allocate the necessary resources here.
+ void close()
+ Method that is executed when the Index is closed, whatever it means. Consider this method as a finalization and free
+ the allocated resources here.
+ void sync()
+ Unused.
+ void remove()
+ Method that is executed when eXist requires the index content to be entirely deleted, e.g. before repairing a corrupted database.
+ IndexWorker getWorker(DBBroker broker)
+ Returns the IndexWorker that operates on this Index on behalf of broker. One may
+ want to create a new IndexWorker here or pick one from a pool.
+ boolean checkIndex(DBBroker broker)
+ To be called by applications that want to implement a consistency check on the Index.
+ There is also an abstract class that implements org.exist.indexing.Index,
+ org.exist.indexing.AbstractIndex that can be used as a basis for most Index implementations. Most of
+ its methods are abstract and still have to be implemented in the concrete classes. These are the few concrete methods:
+ public String getDataDir()
+ Returns the directory in which this Index operates. Usually defined by configure() which itself
+ receives eXist's configuration settings. NB! There might be some Indexes for which the concept of a data directory isn't
+ applicable.
+ public void configure(BrokerPool pool, String dataDir, Element config)
+ Its minimal implementation retains the org.exist.storage.BrokerPool, the data directory and the human-defined name, if
+ defined in the configuration file (in an attribute called id). Sub-classes may call super.configure() to
+ retain this default behaviour.
+ This member is protected:
+ protected static String ID = "Give me an ID !"
+ This is where the class identifier of the Index is defined. Override this member with, say,
+ MyClass.class.getName() to provide a reasonably unique identifier within your system.
+
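The ID-override convention just described can be sketched as follows. MiniAbstractIndex is a hypothetical, simplified stand-in for org.exist.indexing.AbstractIndex, used only to show how a subclass shadows the protected ID member with its own class name.

```java
// Hypothetical stand-in for org.exist.indexing.AbstractIndex.
abstract class MiniAbstractIndex {
    protected static String ID = "Give me an ID !";
    public String getIndexId() { return ID; }
    public abstract void open();
    public abstract void close();
}

public class MySpatialIndex extends MiniAbstractIndex {
    // Shadow the parent's identifier with a reasonably unique value.
    protected static String ID = MySpatialIndex.class.getName();

    @Override public String getIndexId() { return ID; }
    @Override public void open()  { /* allocate persistent resources here */ }
    @Override public void close() { /* free the allocated resources here */ }
}
```

Note that static fields are hidden, not overridden, in Java, so getIndexId() must also be overridden to return the subclass's own ID.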
-
+
-
-
- org.exist.indexing.IndexWorker
-
+
+ org.exist.indexing.IndexWorker
- The next important interface that will need to be implemented is org.exist.indexing.IndexWorker which is
- responsible for managing the data in the index. Remember that each org.exist.storage.DBBroker will have such an
- IndexWorker at its disposal and that their IndexController will know what method of
- IndexWorker to call and when to call it.
- Here are the methods that have to be implemented in the concrete implementations :
- public String getIndexId()
- Returns the class identifier of the index.
- public String getIndexName()
- Returns the human-defined name of the index, if one was defined in the configuration file.
- Object configure(IndexController controller, NodeList configNodes, Map namespaces)
- This method receives the database's configuration object, usually defined in an XML file called conf.xml. Both
- configuration nodes and namespaces (remember that some configuration settings including e.g. pathes need namespaces to be defined) will be
- passed to the configure() method of the IndexWorker's
- IndexController. The IndexWorker can use this method to retain custom configuration options in
- a custom object that will be available in the java.util.Map returned by
- collection.getIndexConfiguration(broker).getCustomIndexSpec(INDEX_CLASS_IDENTIFIER). The return type is free but
- will probably generally be an implementation of java.util.Collection in order to retain several parameters.
- void setDocument(DocumentImpl doc)
- This method sets the org.exist.dom.DocumentImpl on which this IndexWorker will have to
- work.
- void setMode(int mode)
- This method sets the operating mode in which this IndexWorker will have to work. See below for further details on
- operating modes.
- void setDocument(DocumentImpl doc, int mode)
- A convenience method that sets both the org.exist.dom.DocumentImpl and the operating mode.
- DocumentImpl getDocument()
- Returns the org.exist.dom.DocumentImpl on which this IndexWorker will have to work.
- int getMode()
- Returns the operating mode in which this IndexWorker will have to work.
- void flush()
- Called periodically by the IndexController or by any other process. That is where data insertion, update or
- removal should actually take place.
- void removeCollection(Collection collection, DBBroker broker)
- Called when a collection is to be removed. That allows to delete index entries for this collection in a single operation without a need
- for a StreamListener (see below) or a call to setMode() nor
- setDocument().
- StoredNode getReindexRoot(StoredNode node, NodePath path, boolean includeSelf)
- Determines the node which should be reindexed together with its descendants. This will give a hint to the
- IndexController to determine from which node reindexing should start.
- StreamListener getListener()
- Returns a StreamListener that will intercept the (re)indexed nodes stream. IndexWorkers
- that are not interested in the data (depending of e.g. the document and/or the operating mode) may return null
- here.
- MatchListener getMatchListener(NodeProxy proxy)
- Returns a org.exist.indexing.MatchListener for the given node.
- boolean checkIndex(DBBroker broker)
- To be called by applications that want to implement a consistency check on the index.
- Occurrences[] scanIndex(DocumentSet docs)
- Returns an array of org.exist.dom.DocumentImpl.Occurrences that is an ordered list of the
- index entries, in a textual form, associated with the number of occurences for the entries and a list of the documents containing them. NB!
- For some indexes, the concept of ordered or textual occurrences might not be meaningful.
-
+ The next important interface that will need to be implemented is org.exist.indexing.IndexWorker which is responsible
+ for managing the data in the index. Remember that each org.exist.storage.DBBroker will have such an
+ IndexWorker at its disposal and that their IndexController will know what method of
+ IndexWorker to call and when to call it.
+ Here are the methods that have to be implemented in the concrete implementations:
+ public String getIndexId()
+ Returns the class identifier of the index.
+ public String getIndexName()
+ Returns the human-defined name of the index, if one was defined in the configuration file.
+ Object configure(IndexController controller, NodeList configNodes, Map namespaces)
+ This method receives the database's configuration object, usually defined in an XML file called conf.xml. Both
+ configuration nodes and namespaces (remember that some configuration settings, e.g. paths, need namespaces to be defined) will be
+ passed to the configure() method of the IndexWorker's IndexController. The
+ IndexWorker can use this method to retain custom configuration options in a custom object that will be available in the
+ java.util.Map returned by
+ collection.getIndexConfiguration(broker).getCustomIndexSpec(INDEX_CLASS_IDENTIFIER). The return type is unconstrained but will
+ generally be an implementation of java.util.Collection in order to retain several parameters.
+ void setDocument(DocumentImpl doc)
+ This method sets the org.exist.dom.DocumentImpl on which this IndexWorker will have to work.
+ void setMode(int mode)
+ This method sets the operating mode in which this IndexWorker will have to work. See below for further details on
+ operating modes.
+ void setDocument(DocumentImpl doc, int mode)
+ A convenience method that sets both the org.exist.dom.DocumentImpl and the operating mode.
+ DocumentImpl getDocument()
+ Returns the org.exist.dom.DocumentImpl on which this IndexWorker will have to work.
+ int getMode()
+ Returns the operating mode in which this IndexWorker will have to work.
+ void flush()
+ Called periodically by the IndexController or by any other process. That is where data insertion, update or removal
+ should actually take place.
+ void removeCollection(Collection collection, DBBroker broker)
+ Called when a collection is to be removed. This allows the index entries for this collection to be deleted in a single operation, without the need
+ for a StreamListener (see below) or a call to setMode() or setDocument().
+ StoredNode getReindexRoot(StoredNode node, NodePath path, boolean includeSelf)
+ Determines the node which should be reindexed together with its descendants. This will give a hint to the
+ IndexController to determine from which node reindexing should start.
+ StreamListener getListener()
+ Returns a StreamListener that will intercept the (re)indexed nodes stream. IndexWorkers that are not
+ interested in the data (depending on e.g. the document and/or the operating mode) may return null here.
+ MatchListener getMatchListener(NodeProxy proxy)
+ Returns an org.exist.indexing.MatchListener for the given node.
+ boolean checkIndex(DBBroker broker)
+ To be called by applications that want to implement a consistency check on the index.
+ Occurrences[] scanIndex(DocumentSet docs)
+ Returns an array of org.exist.dom.DocumentImpl.Occurrences that is an ordered list of the index
+ entries, in a textual form, associated with the number of occurrences for the entries and a list of the documents containing them. NB! For some
+ indexes, the concept of ordered or textual occurrences might not be meaningful.
+
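The configure() contract described above, where the worker inspects its configuration nodes and returns a custom object that eXist later hands back through getCustomIndexSpec(), can be sketched like this. The class, the element name "gml" and the option string are hypothetical illustrations, not eXist's actual configuration vocabulary.

```java
import java.util.ArrayList;
import java.util.List;

public class MiniWorkerConfig {
    // Pretend "configNodes" holds the element names found under the
    // index's configuration element in conf.xml. The worker keeps only
    // the options it recognises and returns them as its custom spec.
    public static List<String> configure(List<String> configNodes) {
        List<String> options = new ArrayList<>();
        for (String node : configNodes) {
            if (node.equals("gml")) {            // the only element we handle
                options.add("index-gml-geometries");
            }
        }
        return options;                          // retained as the custom spec
    }
}
```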
-
+
-
-
- org.exist.indexing.StreamListener and org.exist.indexing.AbstractStreamListener
-
+
+ org.exist.indexing.StreamListener and org.exist.indexing.AbstractStreamListener
- The interface org.exist.indexing.StreamListener has these public members :
- public final static int UNKNOWN = -1;
- public final static int STORE = 0;
- public final static int REMOVE_ALL_NODES = 1;
- public final static int REMOVE_SOME_NODES = 2;
- Obviously, they are used by the setMode() method in org.exist.indexing.IndexController
- which is istself called by the different org.exist.storage.DBBrokers when they have to (re)index a node and its
- descendants. As their name suggests, there is a mode for storing nodes and two modes for removing them from the indexes. The difference
- between StreamListener.REMOVE_ALL_NODES and StreamListener.REMOVE_SOME_NODES is that the
- former removes all the nodes from a document whereas the latter removes only some nodes from a document, usually the descendants of the node
- returned by getReindexRoot(). We thus have the opportunity to trigger a process that will directly remove all the
- nodes from a given document without having to listen to each of them. Such a technique is described below.
- Here are the methods that must be implement by an implemetation:
- IndexWorker getWorker()
- Returns the IndexWorker that owns this StreamListener.
- void setNextInChain(StreamListener listener);
- Should not be used. Used to specify which is the next StreamListener in the
- IndexController's StreamListeners pipeline.
- StreamListener getNextInChain();
- Returns the next StreamListener in the IndexController's
- StreamListeners pipeline. Very important because it is the responsability of the
- StreamListener to forward the event stream to the next StreamListener in the
- pipeline.
- void startElement(Txn transaction, ElementImpl element, NodePath path)
- Signals the start of an element to the listener.
- void attribute(Txn transaction, AttrImpl attrib, NodePath path)
- Passes an attribute to the listener.
- void characters(Txn transaction, TextImpl text, NodePath path)
- Passes some character data to the listener.
- void endElement(Txn transaction, ElementImpl element, NodePath path)
- Signals the end of an element to the listener. Allow to free any temporary resource created since the matching
- startElement() has been called.
- Beside the StreamListener interface, each custom listener should extend
- org.exist.indexing.AbstractStreamListener.
- This abstract class provides concrete implementations for setNextInChain() and
- getNextInChain() that should normally never be overridden.
- It also provides dummy startElement(), attribute(),
- characters(), endElement() methods that do nothing but forwarding the node to the next
- StreamListener in the IndexController's StreamListeners
- pipeline.
- public abstract IndexWorker getWorker()
- remains abstract though, since we still can not know what IndexWorker will own the
- Listener until we haven't a concrete implementation.
-
-
+ The interface org.exist.indexing.StreamListener has these public members:
+ public final static int UNKNOWN = -1;
+ public final static int STORE = 0;
+ public final static int REMOVE_ALL_NODES = 1;
+ public final static int REMOVE_SOME_NODES = 2;
+ Obviously, they are used by the setMode() method in org.exist.indexing.IndexController which is
+ itself called by the different org.exist.storage.DBBrokers when they have to (re)index a node and its descendants. As
+ their name suggests, there is a mode for storing nodes and two modes for removing them from the indexes. The difference between
+ StreamListener.REMOVE_ALL_NODES and StreamListener.REMOVE_SOME_NODES is that the former removes all
+ the nodes from a document whereas the latter removes only some nodes from a document, usually the descendants of the node returned by
+ getReindexRoot(). We thus have the opportunity to trigger a process that will directly remove all the nodes from a given
+ document without having to listen to each of them. Such a technique is described below.
+ Here are the methods that must be implemented by an implementation:
+ IndexWorker getWorker()
+ Returns the IndexWorker that owns this StreamListener.
+ void setNextInChain(StreamListener listener);
+ Should normally not be called by user code. Used to specify which is the next StreamListener in the IndexController's
+ StreamListeners pipeline.
+ StreamListener getNextInChain();
+ Returns the next StreamListener in the IndexController's StreamListeners
+ pipeline. Very important, because it is the responsibility of the StreamListener to forward the event stream to the next
+ StreamListener in the pipeline.
+ void startElement(Txn transaction, ElementImpl element, NodePath path)
+ Signals the start of an element to the listener.
+ void attribute(Txn transaction, AttrImpl attrib, NodePath path)
+ Passes an attribute to the listener.
+ void characters(Txn transaction, TextImpl text, NodePath path)
+ Passes some character data to the listener.
+ void endElement(Txn transaction, ElementImpl element, NodePath path)
+ Signals the end of an element to the listener. Allows freeing any temporary resources created since the matching
+ startElement() was called.
+ Besides the StreamListener interface, each custom listener should extend
+ org.exist.indexing.AbstractStreamListener.
+ This abstract class provides concrete implementations for setNextInChain() and getNextInChain() that
+ should normally never be overridden.
+ It also provides dummy startElement(), attribute(), characters(),
+ endElement() methods that do nothing but forward the node to the next StreamListener in the
+ IndexController's StreamListeners pipeline.
+ public abstract IndexWorker getWorker()
+ remains abstract though, since we cannot know which IndexWorker will own the Listener until we
+ have a concrete implementation.
+
+
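The forwarding discipline above, each listener doing its own work and then passing the event on, is a classic chain of responsibility. The sketch below illustrates it with simplified stand-in types: MiniStreamListener, MiniAbstractStreamListener and RecordingListener are hypothetical, not eXist's real classes, but the forwarding logic mirrors what AbstractStreamListener's dummy methods do.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for org.exist.indexing.StreamListener (hypothetical, simplified).
interface MiniStreamListener {
    void setNextInChain(MiniStreamListener next);
    MiniStreamListener getNextInChain();
    void startElement(String name);
}

// Stand-in for AbstractStreamListener: defaults just forward the event.
abstract class MiniAbstractStreamListener implements MiniStreamListener {
    private MiniStreamListener next;
    public void setNextInChain(MiniStreamListener next) { this.next = next; }
    public MiniStreamListener getNextInChain() { return next; }
    public void startElement(String name) {
        if (next != null) next.startElement(name);
    }
}

class RecordingListener extends MiniAbstractStreamListener {
    final List<String> seen = new ArrayList<>();
    @Override
    public void startElement(String name) {
        seen.add(name);            // index-specific work happens here
        super.startElement(name);  // crucial: keep the pipeline flowing
    }
}

public class PipelineDemo {
    public static void main(String[] args) {
        RecordingListener first = new RecordingListener();
        RecordingListener second = new RecordingListener();
        first.setNextInChain(second);
        first.startElement("geometry");
        System.out.println(first.seen);   // [geometry]
        System.out.println(second.seen);  // [geometry]
    }
}
```

A listener that forgot the super call would silently starve every index behind it in the pipeline, which is why the forwarding is emphasised above.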
-
+
-
- A use case : developing an indexing architecture for GML geometries
+
+ Use case: developing an indexing architecture for GML geometries
+ To demonstrate how modular eXist Indexes are, we have decided to show how a spatial Index could be
+ implemented. What makes its design interesting is that this kind of Index doesn't store character data from the document, nor
+ does it use a org.exist.storage.index.BFile to store the index entries. Instead, we will store WKB index entries in a JDBC
+ database, namely a HSQLDB to keep the distribution as
+ light as possible and reduce the number of external dependencies, but it wouldn't be too difficult to use another one like PostGIS given that the implementation has
+ itself been designed in a quite modular way.
+ In eXist's SVN repository, the modularized Indexes code is in extensions/indexes and the file system's
+ architecture is designed to follow eXist's core architecture, i.e. org.exist.indexing.* for the Indexes
+ and org.exist.xquery.* for their associated Modules. There is also a dedicated location for required
+ external libraries and for the test cases. The build system should normally be able to download the required libraries from the WWW (do no
+ forget to adjust your proxy server's properties in build.properties if required) build the all the files automatically, in
+ particular the extension-modules Ant target, and even launch the tests provided that the DB's configuration file declares the
+ Indexes (see above) and their associated Modules (see below).
+ The described spatial Index heavily relies on the excellent open source libraries provided by the Geotools project. We have experienced a few
+ problems that will be mentioned further, but since feedback has been provided, the situation will unquestionably improve in the future, making
+ current workarounds redundant.
+ The Index has been tested with only one file, available from the Ordnance Survey of Great Britain: a topography layer of Port Talbot, which is
+ available as sample data. Need we mention that obtaining
+ valid and sizeable GML data is still extremely difficult?
-
+
-
- Introduction
+
+ Writing the concrete implementation of org.exist.indexing.AbstractIndex
- To demonstrate how modular eXist Indexes are, we have decided to show how a spatial Index
- could be implemented. What makes its design interesting is that this kind of Index doesn't store character data from
- the document, nor does it use a org.exist.storage.index.BFile to store the index entries. Instead, we will store WKB
- index entries in a JDBC database, namely a HSQLDB to keep the distribution as light as possible and
- reduce the number of external dependencies, but it wouldn't be too difficult to use another one like PostGIS given that the implementation has itself been designed in a quite modular way.
- In eXist's SVN repository, the modularized Indexes code is in extensions/indexes and the
- file system's architecture is designed to follow eXist's core architecture, i.e. org.exist.indexing.* for the
- Indexes and org.exist.xquery.* for their associated Modules. There is
- also a dedicated location for required external libraries and for the test cases. The build system should normally be able to download the
- required libraries from the WWW (do no forget to adjust your proxy server's properties in build.properties if required)
- build the all the files automatically, in particular the extension-modules Ant target, and even launch the tests
- provided that the DB's configuration file declares the Indexes (see above) and their associated
- Modules (see below).
- The described spatial Index heavily relies on the excellent open source librairies provided by the Geotools project. We have experienced a few problems that will be mentioned further, but since
- feedback has been provided, the situation will unquestionably improve in the future, making current workarounds redundant.
- The Index has been tested with only one file which is available from the Ordnance Survey of Great-Britain, a topography layer of Port-Talbot, which is
- available as sample data. Shall we mention
- that obtaining valid and sizeable GML data is still extremely difficult?
-
+ Well, in fact we will start by writing an abstract implementation. As said above, we have planned a modular JDBC spatial
+ Index, which will be abstract and will be extended by a concrete HSQLDB Index.
+ Let's start with this:
+
+ ... where we define an abstract class that extends org.exist.indexing.AbstractIndex and thus implements
+ org.exist.indexing.Index. We also define a few members like ID that will be returned by the
+ unoverridden getIndexId() from org.exist.indexing.AbstractIndex, a Logger, a
+ java.util.HashMap that will be a "pool" of IndexWorkers (one for each
+ org.exist.storage.DBBroker) and a java.sql.Connection that will handle the database operations at the
+ index level.
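+ As a self-contained sketch of that class (eXist's base classes are replaced by plain stand-ins so it compiles on its own; the names are illustrative, the real class being org.exist.indexing.spatial.AbstractGMLJDBCIndex):

```java
// Self-contained sketch of the abstract class described above. The eXist base
// classes (AbstractIndex, DBBroker, IndexWorker) are replaced by Object
// stand-ins; names and the ID value are illustrative.
import java.sql.Connection;
import java.util.HashMap;
import java.util.Map;

abstract class GMLJDBCIndexSketch {
    // static ID returned by the unoverridden getIndexId()
    static final String ID = "spatial-index";
    // "pool" of IndexWorkers, one per DBBroker (both stand-ins here)
    protected final Map<Object, Object> workers = new HashMap<>();
    // single connection handling database operations at the index level
    protected Connection conn = null;

    public String getIndexId() { return ID; }

    // DB-dependent operations, left abstract as in the article
    protected abstract void checkDatabase() throws Exception;
    protected abstract void shutdownDatabase() throws Exception;
}
```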
+ Let's now introduce this general-purpose interface:
+
+ ... that defines the spatial operators that will be used by spatial queries (what would a spatial index that doesn't support
+ spatial queries be worth?). For more information about the semantics, see the JTS documentation (chapter 11). We will use this
+ wonderful library every time a spatial computation is required. So does the Geotools project, by the way.
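+ A sketch of what such an interface might look like; the constant names mirror the JTS spatial predicates, though the exact set and values in eXist's source may differ:

```java
// Hypothetical sketch of the general-purpose operator interface; the constants
// mirror the JTS spatial predicates (see chapter 11 of its documentation).
interface SpatialOperator {
    int UNKNOWN    = -1;
    int EQUALS     = 1;
    int DISJOINT   = 2;
    int INTERSECTS = 3;
    int TOUCHES    = 4;
    int CROSSES    = 5;
    int WITHIN     = 6;
    int CONTAINS   = 7;
    int OVERLAPS   = 8;
}
```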
+ Here are a few concrete methods that should be usable by any JDBC-enabled database:
+
+ First, an empty constructor, not even necessary since the Index is created through the newInstance()
+ method of its interface (see above).
+ Then, a configuration method that calls its ancestor, whose behaviour fulfills our needs. This method calls a
+ checkDatabase() method whose semantics will depend on the underlying DB. The basic idea is to prevent eXist from
+ continuing its initialization if there is a problem with the DB.
+ Then we will do nothing during open(). No need to open a database, which is costly, if we don't need it.
+ The close() will flush any pending operation currently queued by the IndexWorkers and reset their
+ state in order to prevent them from starting any further operation, which should never be possible if eXist is their only user. Then it will call a
+ shutdownDatabase() method whose semantics will depend on the underlying DB. It can be fairly simple for DBs that
+ shut down automatically when the virtual machine shuts down.
+ The sync() is never called by eXist. It's here to make the interface happy.
+ The remove() method is similar to close(). It then calls two database-dependent methods that are
+ pretty redundant. deleteDatabase() will probably not be able to do what its name suggests if eXist doesn't own the admin
+ rights. Conversely, removeIndexContent() would probably have nothing to do if eXist owns the admin rights, since physically
+ destroying the DB would probably be more efficient than deleting table contents.
+
+ checkIndex() will delegate the task to the broker's IndexWorker.
+ The remaining methods are DB-dependent and thus abstract:
+
+ Let's now see how an HSQL-dependent implementation would work by describing the concrete class:
+
+ Of course, we extend org.exist.indexing.spatial.AbstractGMLJDBCIndex, then a few members are defined: a
+ Logger, a file prefix (required by the files HSQLDB storage creates, namely
+ spatial_index.lck, spatial_index.log, spatial_index.script and
+ spatial_index.properties), then a table name in which the spatial index data will be stored, then a variable that will
+ hold the org.exist.storage.DBBroker that currently holds a connection to the DB (we could have used an
+ IndexWorker here, given their 1:1 relationship). The problem is that we run HSQLDB in embedded mode and that only one
+ connection is available at a given time.
+ A more elaborate DBMS, or HSQLDB running in server mode, would permit the allocation of one connection per IndexWorker,
+ but we have chosen to keep things simple for now. Indeed, while IndexWorkers are thread-safe (because each
+ org.exist.storage.DBBroker operates within its own thread), a single connection will have to be controlled by the
+ Index, which is controlled by the org.exist.storage.BrokerPool. See below how we will handle
+ concurrency, given such prerequisites.
+ The last member is the timeout when a Connection to the DB is requested.
+ As we can see, we have an empty constructor again.
+ The next method calls its ancestor's configure() method and just retains the content of the
+ connectionTimeout attribute as defined in the configuration file.
+
+ The next method is also quite straightforward:
+
+ It picks an IndexWorker (more precisely an org.exist.indexing.spatial.GMLHSQLIndexWorker, which will be
+ described below) for the given broker from the "pool". If needed, namely the first time the method is called with this
+ parameter, it creates one. Notice that this IndexWorker is DB-dependent.
+ Then come a few general-purpose methods:
+
+
+ checkDatabase() just checks that we have a suitable driver in the CLASSPATH. We don't want to open the database right now.
+ It costs too much.
+
+ shutdownDatabase() is just one of the many ways to shut down an HSQLDB.
+
+ deleteDatabase() is just a file system management problem; remember that the database should be closed at that moment: no
+ file locking issues.
+
+ removeIndexContent() deletes the table that contains spatial data. Less efficient than deleting the whole database though
+ ;-), as explained above.
+ The next two methods are totally JDBC-specific and, given the way they are implemented, embedded HSQLDB-specific. The
+ current code is directly adapted from org.exist.storage.lock.ReentrantReadWriteLock to show that
+ connection management should be strictly controlled, given the concurrency context induced by using many
+ org.exist.storage.DBBrokers. Despite the fact that DBBrokers are thread-safe, access to
+ shared storage must handle concurrency, in particular when flush() is called.
+
+ org.exist.storage.index.BFile users would call getLock() to acquire and release locks on the index
+ files. Our solution is thus very similar.
+ However, since most JDBC databases are able to work in a concurrent context, it would be better never to call these
+ Index-level methods from the IndexWorkers and let each IndexWorker handle its
+ connection to the underlying DB.
+
+
+ acquireConnection() acquires an exclusive JDBC Connection to the storage engine for
+ an IndexWorker (or an org.exist.storage.DBBroker, which roughly means the same thing). This is where a
+ Connection is created if necessary (see below), which defers the first connection's performance cost until it is actually
+ needed.
+
+ releaseConnection() marks the connection as being unused. It will thus become available when requested again.
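+ A minimal sketch of this acquire/release pair, in the spirit of the ReentrantReadWriteLock-derived code mentioned above (the exclusive java.sql.Connection is represented by an owner token; names are illustrative):

```java
// Minimal sketch of acquireConnection()/releaseConnection(). The exclusive
// java.sql.Connection is represented by an owner token; names are illustrative.
class ConnectionGuard {
    private Object owner = null;

    // Blocks until the single connection is free (or already ours), or gives
    // up and returns false once the timeout expires.
    synchronized boolean acquire(Object broker, long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (owner != null && owner != broker) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0)
                return false;  // timed out: another DBBroker still holds it
            wait(remaining);
        }
        owner = broker;  // exclusive ownership, as required by embedded HSQLDB
        return true;
    }

    // Marks the connection as unused; the next waiter can then take it.
    synchronized void release() {
        owner = null;
        notifyAll();
    }
}
```

+ An IndexWorker that fails to acquire the connection within the timeout can then give up cleanly instead of deadlocking.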
+ The last method concentrates the index-level DB-dependent code in just one place (removeIndexContent() is relatively
+ DB-independent).
+
+ This method opens a Connection and, if it is a new one (the only one, since we have just one),
+ checks that we have an SQL table for the spatial data. If not, i.e. if the spatial index doesn't exist yet, a table is created with the
+ following structure:
+
+
-
+
+
+
+
+ Field name
+
+
+ Field type
+
+
+ Description
+
+
+ Comments
+
+
+
+
+ DOCUMENT_URI
+
+
+ VARCHAR
+
+
+ The document's URI
+
+
+
+
+
+ NODE_ID_UNITS
+
+
+ INTEGER
+
+
+ The number of useful bits in NODE_ID
+
+
+ See below
+
+
+
+
+ NODE_ID
+
+
+ BINARY
+
+
+ The node ID, as a byte array
+
+
+ See above. Only some bits might be considered due to obvious data alignment requirements
+
+
+
+
+ GEOMETRY_TYPE
+
+
+ VARCHAR
+
+
+ The geometry type
+
+
+ As returned by the JTS
+
+
+
+
+ SRS_NAME
+
+
+ VARCHAR
+
+
+ The SRS of the geometry
+
+
+
+ srsName attribute in the GML element
+
+
+
+
+ WKT
+
+
+ VARCHAR
+
+
+ The Well-Known
+ Text representation of the geometry
+
+
+
+
+
+ WKB
+
+
+ BINARY
+
+
+ The WKB representation of the geometry
+
+
+
+
+
+ MINX
+
+
+ DOUBLE
+
+
+ The minimal X of the geometry
+
+
+
+
+
+ MAXX
+
+
+ DOUBLE
+
+
+ The maximal X of the geometry
+
+
+
+
+
+ MINY
+
+
+ DOUBLE
+
+
+ The minimal Y of the geometry
+
+
+
+
+
+ MAXY
+
+
+ DOUBLE
+
+
+ The maximal Y of the geometry
+
+
+
+
+
+ CENTROID_X
+
+
+ DOUBLE
+
+
+ The X of the geometry's centroid
+
+
+
+
+
+ CENTROID_Y
+
+
+ DOUBLE
+
+
+ The Y of the geometry's centroid
+
+
+
+
+
+ AREA
+
+
+ DOUBLE
+
+
+ The area of the geometry
+
+
+ Expressed in the measure defined in its SRS
+
+
+
+
+ EPSG4326_WKT
+
+
+ VARCHAR
+
+
+ The WKT representation of the geometry
+
+
+ In the epsg:4326
+ SRS
+
+
+
+
+ EPSG4326_WKB
+
+
+ BINARY
+
+
+ The WKB representation of the geometry
+
+
+ In the epsg:4326 SRS
+
+
+
+
+ EPSG4326_MINX
+
+
+ DOUBLE
+
+
+ The minimal X of the geometry
+
+
+ In the epsg:4326 SRS
+
+
+
+
+ EPSG4326_MAXX
+
+
+ DOUBLE
+
+
+ The maximal X of the geometry
+
+
+ In the epsg:4326 SRS
+
+
+
+
+ EPSG4326_MINY
+
+
+ DOUBLE
+
+
+ The minimal Y of the geometry
+
+
+ In the epsg:4326 SRS
+
+
+
+
+ EPSG4326_MAXY
+
+
+ DOUBLE
+
+
+ The maximal Y of the geometry
+
+
+ In the epsg:4326 SRS
+
+
+
+
+ EPSG4326_CENTROID_X
+
+
+ DOUBLE
+
+
+ The X of the geometry's centroid
+
+
+ In the epsg:4326 SRS
+
+
+
+
+ EPSG4326_CENTROID_Y
+
+
+ DOUBLE
+
+
+ The Y of the geometry's centroid
+
+
+ In the epsg:4326 SRS
+
+
+
+
+ EPSG4326_AREA
+
+
+ DOUBLE
+
+
+ The area of the geometry
+
+
+ In the epsg:4326 SRS (measure unknown, to be clarified)
+
+
+
+
+ IS_CLOSED
+
+
+ BOOLEAN
+
+
+ Whether or not this geometry is "closed"
+
+
+ See the JTS documentation (chapter 13)
+
+
+
+
+ IS_SIMPLE
+
+
+ BOOLEAN
+
+
+ Whether or not this geometry is "simple"
+
+
+ See the JTS documentation (chapter 13)
+
+
+
+
+ IS_VALID
+
+
+ BOOLEAN
+
+
+ Whether or not this geometry is "valid"
+
+
+ See the JTS documentation (chapter 13). Should
+ always be TRUE
+
+
+
+
+
+ Uniqueness will be enforced on a (DOCUMENT_URI, NODE_ID_UNITS, NODE_ID) basis. Indeed, we can have at most one index
+ entry for a given node in a given document.
+ Also, indexes are created on these fields to help queries:
+
+
+ DOCUMENT_URI
+
+
+ NODE_ID
+
+
+ GEOMETRY_TYPE
+
+
+ WKB
+
+
+ EPSG4326_WKB
+
+
+ EPSG4326_MINX
+
+
+ EPSG4326_MAXX
+
+
+ EPSG4326_MINY
+
+
+ EPSG4326_MAXY
+
+
+ EPSG4326_CENTROID_X
+
+
+ EPSG4326_CENTROID_Y
+
+
+ Every geometry will be internally stored in both its original SRS and the epsg:4326 SRS. Having this kind of
+ common, world-wide applicable SRS for all geometries in the index makes it possible to operate on them even if they are
+ originally defined in different SRSes.
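+ Put together, the DDL could look like this (a sketch showing a column subset; the actual table and index names in eXist's source may differ):

```sql
-- Hypothetical HSQLDB DDL matching the structure described above
CREATE TABLE SPATIAL_INDEX_V1 (
    DOCUMENT_URI   VARCHAR,
    NODE_ID_UNITS  INTEGER,
    NODE_ID        BINARY,
    GEOMETRY_TYPE  VARCHAR,
    SRS_NAME       VARCHAR,
    WKT            VARCHAR,
    WKB            BINARY,
    EPSG4326_WKT   VARCHAR,
    EPSG4326_WKB   BINARY,
    EPSG4326_MINX  DOUBLE,
    EPSG4326_MAXX  DOUBLE,
    EPSG4326_MINY  DOUBLE,
    EPSG4326_MAXY  DOUBLE,
    IS_CLOSED      BOOLEAN,
    IS_SIMPLE      BOOLEAN,
    IS_VALID       BOOLEAN,
    -- at most one index entry per node per document
    UNIQUE (DOCUMENT_URI, NODE_ID_UNITS, NODE_ID)
);
CREATE INDEX DOCUMENT_URI_IDX ON SPATIAL_INDEX_V1 (DOCUMENT_URI);
CREATE INDEX EPSG4326_WKB_IDX ON SPATIAL_INDEX_V1 (EPSG4326_WKB);
```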
+
+ By default, eXist's build will download the lightweight gt2-epsg-wkt-XXX.jar library, which lacks some parameters, the
+ Bursa-Wolf
+ ones. Better accuracy for geographic transformations can be obtained by using a heavier library like
+ gt2-epsg-hsql-XXX.jar,
+ which is documented here.
+
+
-
- Writing the concrete implementation of org.exist.indexing.AbstractIndex
-
+
- Well, in fact we will start by writing an abstract implementation first. As said above, we have planned a modular JDBC spatial
- Index, which will be abstract, and that will be extended by a concrete HSQLDB Index.
- Let's start with this :
-
- ... where we define an abstract class that extends org.exist.indexing.AbstractIndex and thus implements
- org.exist.indexing.Index. We also define a few members like ID that will be returned by the
- unoverriden getIndexId() from org.exist.indexing.AbstractIndex, a
- Logger, a java.util.HashMap that will be a "pool" of IndexWorkers
- (one for each org.exist.storage.DBBroker) and a java.sql.Connection that will handle the
- database operations at the index level.
- Let' now introduce this general purpose interface :
-
- ... that defines the spatial operators that will be used by spatial queries (what would be worth a spatial index that doesn't support
- spatial queries?). For more information about the semantics, see the JTS documentation (chapter 11). We will use this wonderful
- library everytime a spatial computation is required. So does the Geotools project by the way.
- Here are a few concrete methods that should be usable by any JDBC-enabled database:
-
- First, an empty constructor, not even necessary since the Index is created through the
- newInstance() method of its interface (see above).
- Then, a configuration method that calls its ancestor, whose behaviour fullfills our needs. This method calls a
- checkDatabase() method whose semantics will be dependant of the underlying DB. The basic idea is to prevent eXist
- to continue its initialization if there is a problem with the DB.
- Then we will do nothing during open(). No need to open a database, which is costly, if we dont need it.
- The close() will flush any pending operation currently queued by the IndexWorkers and
- resets their state in order to prevent them to start any further operation, which should never be possible if eXist is their only user. Then
- it will call a shutdownDatabase() method whose semantics will be dependant of the underlying DB. They can be fairly
- simple for DBs that shut down automatically when the virtual machine shuts down.
- The sync() is never called by eXist. It's here to make the interface happy.
- The remove() method is similar to close(). It then calls two database-dependant
- methods that are pretty redundant. deleteDatabase() will probably not be able to do what its name suggests if eXist
- doesn't own the admin rights. Conversely, removeIndexContent() wiould probably have nothing to do if eXist owns the
- admin rights since physically destroying the DB would probably be more efficient than deleteing table contents.
-
- checkIndex() will delegate the task to the broker's
- IndexWorker.
- The remaining methods are DB-dependant and thus abstract :
-
- Let's see now how a HSQL-dependant implementation would be going by describing the concrete class :
-
- Of course, we extend org.exist.indexing.spatial.AbstractGMLJDBCIndex, then a few members are defined : a
- Logger, a file prefix (which will be required by the files required by HSQLDB storage, namely
- spatial_index.lck, spatial_index.log, spatial_index.script and
- spatial_index.properties), then a table name in which the spatial index data will be stored, then a variable that
- will hold the org.exist.storage.DBBroker that currently holds a connection to the DB (we could have used an
- IndexWorker here, given their 1:1 relationship). The problem is that we run HSQLDB in embedded mode and that only
- one connection is available at a given time.
- A more elaborated DBMS, or HSQLDB running in server mode would permit the allocation of one connection per
- IndexWorker, but we have chosen to keep things simple for now. Indeed, if IndexWorkers are
- thread-safe (because each org.exist.storage.DBBroker operates within its own thread), a single connection will have
- to be controlled by the Index which is controlled by the org.exist.storage.BrokerPool. See
- below how we will handle concurrency, given such perequisites.
- The last member is the timeout when a Connection to the DB is requested.
- As we can see, we have an empty constructor again.
- The next method calls its ancestor's configure() method and just retains the content of the connectionTimeout attribute as defined in the configuration file.
-
- The next method is also quite straightforward :
-
- It picks an IndexWorker (more precisely a org.exist.indexing.spatial.GMLHSQLIndexWorker
- that will be described below) for the given broker from the "pool". If needed, namely the first time the method is
- called with with parameter, it creates one. Notice that this IndexWorker is DB-dependant. It will be described
- below.
- Then come a few general-purpose methods:
-
-
- checkDatabase() just checks that we have a suitable driver in the CLASSPATH. We don't want to open the database
- right now. It costs too much.
-
- shutdownDatabase() is just one of the many ways to shutdown a HSQLDB.
-
- deleteDatabase() is just a file system management problem ; remember that the database should be closed at that
- moment : no file locking issues.
-
- removeIndexContent() deletes the table that contains spatial data. Less efficient than deleteing the whole databse
- though ;-), as explained above.
- The 2 next methods are totally JDBC-specific and, given the way they are implemented, are totally embedded HSQLDB-specific. The
- current code is directly adapted from org.exist.storage.lock.ReentrantReadWriteLock to show
- that connection management should be severely controlled given the concurrency context induced by using many
- org.exist.storage.DBBroker. Despite the fact DBBrokers are thread-safe, access to
- shared storage must be concurrential, in particular when flush() is called.
-
- org.exist.storage.index.BFile users would call getLock() to acquire and release locks on the
- index files. Our solution is thus very similar.
- However, since most JDBC databases are able to work in a concurrential context, it would then be better to never call these
- Index-level methods from the IndexWorkers and let each IndexWorker
- handle its connection to the underlying DB.
-
-
- acquireConnection() acquires an exclusive JDBC Connection to the
- storage engine for an IndexWorker (or a org.exist.storage.DBBroker, which roughly means the
- same thing). This is where a Connection is created if necessary (see below) and makes the first connection's
- performance cost due only when needed.
-
- releaseConnection() marks the connection as being unused. It will thus become available when requested
- again.
- The last method concentrates the index-level DB-dependant code in just one place (removeIndexContent() is
- relatively DB-independant).
-
- This method opens a Connection and, if it is a new one (the new one since we only have one),
- checks that we have a SQL table for the spatial data. If not, i.e. if the spatial index doesn't exist yet, a table is created with the
- following structure :
-
-
+
+ Writing the concrete implementation of org.exist.indexing.IndexWorker
-
-
-
-
- Field name
-
-
- Field type
-
-
- Description
-
-
- Comments
-
-
-
-
- DOCUMENT_URI
-
-
- VARCHAR
-
-
- The document's URI
-
-
-
-
-
- NODE_ID_UNITS
-
-
- INTEGER
-
-
- The number of useful bits in NODE_ID
-
-
- See below
-
-
-
-
- NODE_ID
-
-
- BINARY
-
-
- The node ID, as a byte array
-
-
- See above. Only some bits might be considered due to obvious data alignment requirements
-
-
-
-
- GEOMETRY_TYPE
-
-
- VARCHAR
-
-
- The geometry type
-
-
- As returned by the JTS
-
-
-
-
- SRS_NAME
-
-
- VARCHAR
-
-
- The SRS of the geometry
-
-
-
- srsName attribute in the GML element
-
-
-
-
- WKT
-
-
- VARCHAR
-
-
- The Well-Known Text representation of the geometry
-
-
-
-
-
- WKB
-
-
- BINARY
-
-
- The WKB representation of the geometry
-
-
-
-
-
- MINX
-
-
- DOUBLE
-
-
- The minimal X of the geometry
-
-
-
-
-
- MAXX
-
-
- DOUBLE
-
-
- The maximal X of the geometry
-
-
-
-
-
- MINY
-
-
- DOUBLE
-
-
- The minimal Y of the geometry
-
-
-
-
-
- MAXY
-
-
- DOUBLE
-
-
- The maximal Y of the geometry
-
-
-
-
-
- CENTROID_X
-
-
- DOUBLE
-
-
- The X of the geometry's centroid
-
-
-
-
-
- CENTROID_Y
-
-
- DOUBLE
-
-
- The Y of the geometry's centroid
-
-
-
-
-
- AREA
-
-
- DOUBLE
-
-
- The area of the geometry
-
-
- Expressed in the measure defined in its SRS
-
-
-
-
- EPSG4326_WKT
-
-
- VARCHAR
-
-
- The WKT representation of the geometry
-
-
- In the epsg:4326 SRS
-
-
-
-
- EPSG4326_WKB
-
-
- BINARY
-
-
- The WKB representation of the geometry
-
-
- In the epsg:4326 SRS
-
-
-
-
- EPSG4326_MINX
-
-
- DOUBLE
-
-
- The minimal X of the geometry
-
-
- In the epsg:4326 SRS
-
-
-
-
- EPSG4326_MAXX
-
-
- DOUBLE
-
-
- The maximal X of the geometry
-
-
- In the epsg:4326 SRS
-
-
-
-
- EPSG4326_MINY
-
-
- DOUBLE
-
-
- The minimal Y of the geometry
-
-
- In the epsg:4326 SRS
-
-
-
-
- EPSG4326_MAXY
-
-
- DOUBLE
-
-
- The maximal Y of the geometry
-
-
- In the epsg:4326 SRS
-
-
-
-
- EPSG4326_CENTROID_X
-
-
- DOUBLE
-
-
- The X of the geometry's centroid
-
-
- In the epsg:4326 SRS
-
-
-
-
- EPSG4326_CENTROID_Y
-
-
- DOUBLE
-
-
- The Y of the geometry's centroid
-
-
- In the epsg:4326 SRS
-
-
-
-
- EPSG4326_AREA
-
-
- DOUBLE
-
-
- The area of the geometry
-
-
- In the epsg:4326 SRS (measure unknown, to be clarified)
-
-
-
-
- IS_CLOSED
-
-
- BOOLEAN
-
-
- Whether or not this geometry is "closed"
-
-
- See the JTS documentation (chapter
- 13)
-
-
-
-
- IS_SIMPLE
-
-
- BOOLEAN
-
-
- Whether or not this geometry is "simple"
-
-
- See the JTS documentation (chapter
- 13)
-
-
-
-
- IS_VALID
-
-
- BOOLEAN
-
-
- Whether or not this geometry is "valid"
-
-
- See the JTS documentation (chapter 13).
- Should always be TRUE
-
-
-
-
-
- Uniqueness will be enforced on a (DOCUMENT_URI, NODE_ID_UNITS, NODE_ID) basis. Indeed, we can have at most one
- index entry for a given node in a given document.
- Also, indexes are created on these fields to help queries :
-
-
- DOCUMENT_URI
-
-
- NODE_ID
-
-
- GEOMETRY_TYPE
-
-
- WKB
-
-
- EPSG4326_WKB
-
-
- EPSG4326_MINX
-
-
- EPSG4326_MAXX
-
-
- EPSG4326_MINY
-
-
- EPSG4326_MAXY
-
-
- EPSG4326_CENTROID_X
-
-
- EPSG4326_CENTROID_Y
-
-
- Every geometry will be internally stored in both its original SRS and in the epsg:4326 SRS. Having this kind of
- common, world-wide applicable, SRS for all geometries in the index allows to make operations on them even if they are
- originally defined in different SRSes.
-
- By default, eXist's build will download the lightweight gt2-epsg-wkt-XXX.jar library which lacks some parameters,
- the Bursa-Wolf
- ones. A better accuracy for geographic transformations might be obtained by using a heavier library like
- gt2-epsg-hsql-XXX.jar
- which is documented here.
-
-
-
-
-
-
- Writing the concrete implementation of org.exist.indexing.IndexWorker
-
+ Just like for org.exist.indexing.spatial.AbstractGMLJDBCIndex, we will start by designing a database-independent abstract
+ class. This class should normally be the basis of every JDBC spatial index. It will handle most of the hard work.
+ Let's start with a few members and a few general-purpose public methods:
+
+ Of course, org.exist.indexing.spatial.AbstractGMLJDBCIndexWorker implements
+ org.exist.indexing.IndexWorker.
+
+ GML_NS is the GML namespace for which the spatial index is specially designed. Use this public member to avoid redundancy
+ and, worse, inconsistencies.
+
+ INDEX_ELEMENT is the name of the configuration element that is relevant to our Index. To
+ configure a collection so that its GML data is indexed, define a configuration like this:
+
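+ For instance, a collection.xconf along these lines (the flushAfter value is illustrative):

```xml
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index>
        <!-- the gml element switches the spatial index on for this collection;
             the flushAfter setting is discussed with the IndexWorker's members
             (the value here is illustrative) -->
        <gml flushAfter="200"/>
    </index>
</collection>
```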
+ Spotted the gml element? We will shortly see how this information configures our IndexWorker.
+
+ controller, index and broker should now be quite straightforward.
+
+ currentMode and currentDoc should also be straightforward.
+
+ geometries is a collection of com.vividsolutions.jts.geom.Geometry instances that are currently held in
+ memory, waiting to be "flushed" to the database. Depending on currentMode, they're pending insertion or
+ removal.
+
+ currentNodeId is used to share the ID of the node currently being processed between the different inner classes.
+
+ streamedGeometry is the last com.vividsolutions.jts.geom.Geometry that has been generated by GML
+ parsing. It is null if the geometry is not topologically well formed. This latter case is perhaps an overly restrictive feature
+ of the Geotools parser, which also throws NullPointerExceptions (!) if the GML is somehow not well-formed. See GEOT-742 for more information on
+ this issue.
+
+ documentDeleted is a flag indicating that the current document has been deleted and that we don't have to process it any
+ more. Remember that StreamListener.REMOVE_ALL_NODES sends events for all nodes.
+
+ flushAfter will hold our configuration's setting.
+
+ geometryHandler is our GML geometries SAX handler that will convert GML to a
+ com.vividsolutions.jts.geom.Geometry instance. It is included in a handler chain composed of
+ geometryFilter and geometryDocument.
+
+ transforms will cache a list of transformations between a source and a target SRS.
+
+ useLenientMode will be set to true if the transformation libraries in the CLASSPATH don't have
+ the Bursa-Wolf parameters. Transformations will be attempted, but with a precision loss (see above).
+
+ gmlStreamListener is our own implementation of org.exist.indexing.StreamListener. Since there is a 1:1
+ (or even 1:0) relationship with the IndexWorker, it will be implemented as an inner class and will be described
+ below.
+
+ coordinateTransformer will be needed during Geometry transformations to other SRSes.
+
+ gmlTransformer will be needed during Geometry transformations to XML.
+
+ wkbWriter and wkbReader will be needed during Geometry serialization and
+ deserialization to and from the database.
+
+ wktWriter and wktReader will be needed during Geometry WKT serialization and
+ deserialization to and from the database. WKT could be dynamically generated from Geometry but we have chosen to store it
+ in the HSQLDB.
+
+ base64Encoder and base64Decoder will be needed to convert binary data, namely WKB, to XML types, namely
+ xs:base64Binary.
+ No need to comment the methods, except maybe getIndexId(), which will return the static ID of the
+ Index. No chance of being wrong with such a design.
+ The next method is a bit specific:
+
+ It is only interested in the gml element of the configuration. If it finds one, it creates an
+ org.exist.indexing.spatial.GMLIndexConfig instance, which is a very simple class:
+
+ ... that retains the configuration attribute and provides a getter for it.
+ This configuration object is saved in a Map with the Index ID and will be available as shown in the next
+ method:
+
+ The objective is to determine whether the document should be indexed by the spatial Index.
+ For this, we look up its collection configuration and try to find a "custom" index specification for our Index. If one
+ is found, our document will be processed by the IndexWorker. We also take advantage of this process to
+ set one of our members. If the document doesn't interest our IndexWorker, we reset some members to avoid
+ having an inconsistent sticky state.
+ The next methods don't require any particular comment:
+
+ The next method is somewhat tricky:
+
+ It doesn't return any StreamListener in the StreamListener.REMOVE_ALL_NODES mode. It would be totally
+ unnecessary to listen to every node when a JDBC database is able to delete all the document's nodes in one single statement.
+ The next method is a placeholder that needs more thinking. How does one highlight geometric information smartly?
+
+ The next method computes the reindexing root. We will go bottom-up from the node to be modified until the top-most element in the GML
+ namespace. Indeed, GML allows "nested" or "multi" geometries. If a single part of such a Geometry is modified, the whole
+ geometry has to be recomputed.
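+ The bottom-up walk can be sketched as follows (SimpleNode is a hypothetical stand-in for eXist's persistent DOM nodes):

```java
// Self-contained sketch of the bottom-up reindex-root computation described
// above; SimpleNode stands in for eXist's persistent DOM node classes.
class ReindexRootSketch {
    static final String GML_NS = "http://www.opengis.net/gml";

    static class SimpleNode {
        final String namespaceURI;
        final SimpleNode parent;
        SimpleNode(String namespaceURI, SimpleNode parent) {
            this.namespaceURI = namespaceURI;
            this.parent = parent;
        }
    }

    // Climb from the node to be modified up to the top-most ancestor still in
    // the GML namespace: a "nested"/"multi" geometry must be recomputed whole.
    static SimpleNode getReindexRoot(SimpleNode node) {
        SimpleNode root = null;
        for (SimpleNode n = node; n != null && GML_NS.equals(n.namespaceURI); n = n.parent)
            root = n;
        return root;
    }
}
```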
+
+ The next method delegates the write operations:
+
+ Even though its code looks thick, it proves to be a good way to acquire (then release) a Connection, however that
+ Connection is provided to the IndexWorker (see above for these aspects, concurrency in particular). It then delegates the write
+ operations to dedicated methods, which do not have to care about the Connection. Write operations are embedded in a
+ transaction. Should an exception occur, it would be logged and swallowed: eXist doesn't like exceptions when it flushes its data.
+ The next method delegates node storage:
+
+ It will call saveGeometryNode() (see below) passing a container inner class that will not be described given its
+ simplicity.
+ The next two methods are built with the same design. The first one destroys the index entry for the currently processed node and the
+ second one removes the index entries for the whole document.
+
+ The next method is a mix of the designs described above. It also performs a preliminary check:
+
+ Indeed, we have to check if the collection is indexable by the Index before trying to delete its index entries.
+ The next methods are built on the same design (Collection and exception management) and will thus not be
+ described.
+
+ ... because all these methods delegate to the following abstract methods that will have to be implemented by the DB-dependent concrete
+ classes:
+
+ Let's have a look, however, at this method that doesn't need a DB-dependent implementation:
+
+ Same design (Collection and exception management, delegation mechanism). We will probably add more like this in the
+ future.
+ The following methods are utility methods to stream Geometry instances to XML and vice-versa.
+
+ The first one uses an org.geotools.gml.GMLFilterDocument (see below) and the second one uses an
+ org.geotools.gml.producer.GeometryTransformer, which needs some polishing because, despite being called a transformer, it
+ doesn't cope easily with a Handler and returns a... String! See GEOT-1315.
+ The last method is also a utility method :
+
+ It implements a workaround for our test file's SRS, which isn't yet known by the Geotools libraries (see GEOT-1307), then it
+ tries to get the transformation from our cache. If it doesn't succeed, it tries to find one in the libraries that are in the CLASSPATH. Should
+ those libraries lack the Bursa-Wolf parameters, it will make another attempt in lenient mode, which will induce a loss of accuracy. Then, it
+ transforms the Geometry from its sourceCRS to the required targetCRS.
+ Now, let's study how the abstract methods are implemented by the HSQLDB-dependent class:
+
+ The only noticeable point is that we indeed extend our org.exist.indexing.spatial.AbstractGMLJDBCIndexWorker.
+
+ Now, this method does something more interesting: it stores the Geometry associated with a node:
+
+ The generated SQL statement should be straightforward. We make heavy use of the methods provided by
+ com.vividsolutions.jts.geom.Geometry, both on the "native" Geometry and on its EPSG:4326
+ transformation. We could probably store other properties here (e.g. the geometry's boundary). Other IndexWorkers,
+ especially those accessing a spatially-enabled DBMS, might prefer to store fewer properties if they can be computed dynamically at a
+ low cost.
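To make the shape of such a statement concrete, here is a sketch of the kind of per-geometry INSERT the worker could issue. The table and column names are assumptions, not eXist's actual schema; the bound values would come from com.vividsolutions.jts.geom.Geometry methods such as getGeometryType() and getEnvelopeInternal().

```java
// Hypothetical INSERT for one index entry: identity of the node, both
// serialized forms of the geometry (native SRS and EPSG:4326), and the
// bounding box used by the fast prefilter.
public class GeometryInsert {
    public static final String INSERT_SQL =
        "INSERT INTO SPATIAL_INDEX (" +
        "DOCUMENT_URI, NODE_ID, GEOMETRY_TYPE, SRS_NAME, " +
        "WKT, WKB, " +                   // serialized "native" geometry
        "EPSG4326_WKT, EPSG4326_WKB, " + // its EPSG:4326 transformation
        "MINX, MAXX, MINY, MAXY" +       // bounding box, for fast BBox prefiltering
        ") VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)";

    // Sanity helper: number of bind parameters in the statement.
    public static long parameterCount() {
        return INSERT_SQL.chars().filter(c -> c == '?').count();
    }
}
```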
+ The next method is even easier to understand:
+
+ ... and this one even more so:
+
+ This one, however, is a little trickier:
+
+ ... maybe because it makes use of an SQL function to filter the right documents?
+ The next two methods are straightforward, now that we have explained that Connections have to be requested from the
+ Index to avoid concurrency problems on an embedded HSQLDB instance.
+
+ The next method is much more interesting. This is where the core of the spatial index lies:
+
+ The trick is to filter the geometries on (fast) BBox operations first (intersecting geometries have intersecting BBoxes as well), which is
+ possible in every case except for the Spatial.DISJOINT operator. For the latter case, we have to fetch the BBox
+ coordinates in order to apply further filtering. Then, we examine the results and filter out the documents that are not in the
+ contextSet. Spatial.DISJOINT filtering is then applied to avoid the next step in case the BBoxes are
+ themselves disjoint. Only then do we perform the costly operations, namely deserializing the Geometry from the DB and
+ performing the spatial operations on it. Matching nodes are then returned.
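The two-phase search can be sketched as follows. This is an illustrative in-memory model, assuming a simple BBox record and an exact-test predicate standing in for JTS spatial operations; in the real worker the BBox phase runs in SQL, and the contextSet filtering is omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Cheap BBox prefilter first, then the costly exact spatial predicate
// (geometry deserialization + JTS operation) only on the survivors.
// DISJOINT is the special case: disjoint BBoxes already prove the result.
public class TwoPhaseSearch {
    record BBox(double minX, double maxX, double minY, double maxY) {
        boolean intersects(BBox o) {
            return minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY;
        }
    }
    record Entry(String nodeId, BBox bbox, Object geometry) {}

    static List<String> search(List<Entry> index, BBox queryBBox, Object queryGeom,
                               boolean disjointOp, BiPredicate<Object, Object> exactTest) {
        List<String> hits = new ArrayList<>();
        for (Entry e : index) {
            boolean bboxHit = e.bbox().intersects(queryBBox);
            if (!disjointOp && !bboxHit)
                continue;               // cheap rejection for every operator but DISJOINT
            if (disjointOp && !bboxHit) {
                hits.add(e.nodeId());   // disjoint BBoxes: geometries are disjoint too
                continue;
            }
            // costly phase: exact spatial operation on the deserialized geometry
            if (exactTest.test(queryGeom, e.geometry()))
                hits.add(e.nodeId());
        }
        return hits;
    }
}
```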
+ The next method is quite straightforward:
+
+ Notice that it will return EPSG:4326 Geometry instances and that it will rethrow a
+ com.vividsolutions.jts.io.ParseException as a java.sql.SQLException.
+ The next method is a bit more restrictive and modular:
+
+ ... because it directly selects the right node and allows returning either the original Geometry or its EPSG:4326
+ transformation.
+ The next method is a generalization of the previous one:
+
+ It queries the whole index for the requested Geometry, ignoring the documents that are not in the
+ contextSet, and it also ignores the nodes that are not in the contextSet. After that the
+ Geometry is deserialized.
+
+ This method is not yet used by the spatial functions, but we plan to use it in a future optimization effort.
+
+ This is the next method, designed like getGeometryForNode():
+
+ It directly requests the required property from the DB and returns an appropriate XML atomic value.
+ The next method is a generalization of the previous one:
+
+ It queries the whole index for the requested property, ignoring the documents that are not in the contextSet, and it
+ also ignores the nodes that are not in the contextSet. Finally, the property, mapped to the appropriate XML atomic value, is
+ returned.
+
+ This method is not yet used by the spatial functions, but we plan to use it in a future optimization effort.
+
+ The last method is a utility method and we will only show a part of its body:
+
+ It deserializes each Geometry and checks that its data are consistent with what is stored in the DB.
+
- Just like for org.exist.indexing.spatial.AbstractGMLJDBCIndex, we will start to design a database-independant
- abstract class. This class should normally be the basis of every JDBC spatial index. It will handle most of the hard work.
- Let's start by a few members and a few general-purpose public methods :
-
- Of course, org.exist.indexing.spatial.AbstractGMLJDBCIndexWorker implements
- org.exist.indexing.IndexWorker.
-
- GML_NS is the GML namespace for which the spatial index is specially designed. Use this public member to avoid
- redundancy and, worse, inconsistencies.
-
- INDEX_ELEMENT is the configuration's element name which is accurate for our Index
- configuration. To configure a collection in order to index its GML data, define such a configuration :
-
- Got the gml element? We will shortly see how this information is able to configure our
- IndexWorker.
-
- controller, index and broker should now be quite
- straightforward.
-
- currentMode and currentDoc should also be straightforward.
-
- geometries is a collection of com.vividsolutions.jts.geom.Geometry instances that are
- currently held in memory, waiting for being "flushed" to the database. Depending of currentMode, they're pending for
- insertion or removal.
-
- currentNodeId is used to share the ID of the node currently being processed between the different inner
- classes.
-
- streamedGeometry is the last com.vividsolutions.jts.geom.Geometry that has been generated by
- GML parsing. It is null if the geometry is topologically not well formed. This latter case is maybe a too restrictive
- feature of Geotools parser which also throws NullPointerExceptions (!) if the GML is somehow not well-formed. See
- GEOT-742 for more information on this issue.
-
- documentDeleted is a flag indicating that the current document has been deleted and that we don't have to process it
- any more. Remember that StreamListener.REMOVE_ALL_NODES send some events for all nodes.
-
- flushAfter will hold our configuration's setting.
-
- geometryHandler is our GML geometries SAX handler that will convert GML to a
- com.vividsolutions.jts.geom.Geometry instance. It is included in a handler chain composed of
- geometryFilter and geometryDocument.
-
- transforms will cache a list a transformations between a source and a target SRS.
-
- useLenientMode will be set to true is the transformation libraries that are in the CLASSPATH
- don't have the Bursa-Wolf parameters. Transformations will be attempted, but with a precision loss (see above).
-
- gmlStreamListener is our own implementation of org.exist.indexing.StreamListener. Since there
- is a 1:1 (or even 1:0) relationship with the IndexWorker, it will be implemented as an inner class and will be
- described below.
-
- coordinateTransformer will be needed during Geometry transformations to other SRSes.
-
- gmlTransformer will be needed during Geometry transformations to XML.
-
- wkbWriter and wkbReader will be needed during Geometry serialization
- and deserialization to and from the database.
-
- wktWriter and wktReader will be needed during Geometry WKT
- serialization and deserialization to and from the database. WKT could be dynamically generated from Geometry but we
- have chosen to store it in the HSQLDB.
-
- base64Encoder and base64Decoder will be needed to convert binary date, namely WKB, to XML
- types, namely xs:base64Binary.
- No need to comment the methods, expect maybe getIndexId() that will return the static ID
- of the Index. No chance to be wrong with such a design.
- The next method is a bit specific :
-
- It is only interested in the gml element of the configuration. If it finds one, it creates a
- org.exist.indexing.spatial.GMLIndexConfig instance wich is a very simple class :
-
- ... that retains the configuration attribute and provides a getter for it.
- This configuration object is saved in a Map with the Index ID and will be available as shown in the next method
- :
-
- The objective is to determine if document should be indexed by the spatial Index.
- For this, we look up its collection configuration and try to find a "custom" index specification for our Index.
- If one is found, our document will be processed by the IndexWorker. We also take advantage of
- this process to set one of our members. If document doesn't interest our IndexWorker, we reset
- some members to avoid having an inconsistent sticky state.
- The next methods don't require any particular comment :
-
- The next method is somehow tricky :
-
- It doesn't return any StreamListener in the StreamListner.REMOVE_ALL_NODES. It would be
- totally unnecessary to listen at every node whereas a JDBC database will be able to delete all the document's nodes in one single
- statement.
- The next method is a place holder that needs more thinking. How to highlight a geometric information smartly?
-
- The next method computes the reindexing root. We will go bottom-up form the not to be modified until the top-most element in the GML
- namespace. Indeed, GML allows "nested" or "multi" geometries. If a single part of such Geometry is modified, the
- whole geometry has to be recomputed.
-
- The next method delegates the write operations :
-
- Even though its code looks thick, it proves to be a good way to acquire (then release) a Connection whatever the
- way it is provided by the IndexWorker (see above for these aspects, concurrency in particular). It then delegates the
- write operations to dedicated methods, which do not have to care about the Connection. Write operations are embedded
- in a transaction. Should an exception occur, it would be logged and swallowed: eXist doesn't like exceptions when it flushes its
- data.
- The next method delegates node storage:
-
- It will call saveGeometryNode() (see below) passing a container inner class that will not be described given
- its simplicity.
- The next two methods are built with the same design. The first one destroys the index entry for the currently processed node and the
- second one removes the index entries for the whole document.
-
- The next method is a mix of the designs described above. It also previously makes a check:
-
- Indeed, we have to check if the collection is indexable by the Index before trying to delete its index
- entries.
- The next methods are built on the same design (Collection and exception management) and will thus not be
- described.
-
- ... because all these methods delegate to the following abstract methods that will have to be implemented by the DB-dependant concrete
- classes :
-
- Let's have a look however at this method that doesn't need a DB-dependant implementation :
-
- Same design (Collection and exception management, delegation mechanism). We probably will add more like this in
- the future.
- The following methods are utility methods to stream Geometry instances to XML and vice-versa.
-
- The first one uses a org.geotools.gml.GMLFilterDocument (see below) and the second one uses a
- org.geotools.gml.producer.GeometryTransformer which needs some polishing because, despite it is called a
- transformer, it doesn't cope easily with a Handler and returns a... String ! See GEOT-1315.
- The last method is also a utility method :
-
- It implements a workaround for our test file SRS which isn't yet known by Geotools libraries (see GEOT-1307), then it tries to get the transformation from our cache.
- If it doesn't succeed, it tries to find one in the libraries that are in the CLASSPATH. Should those libraries lack the Bursa-Wolf
- parameters, it will make another attempt in lenient mode, which will induce a loss of accuracy. Then, it transforms the
- Geometry from its sourceCRS to the required targetCRS.
- Now, let's study how the abstract methods are implement by the HSQLDB-dependant class :
-
- The only noticeable point is that we indeed extend our org.exist.indexing.spatial.AbstractGMLJDBCIndexWorker
-
- Now, this method will do something more interesting, store the Geometry associated to a node :
-
- The generated SQL statement should be straightforward. We make a heavy use of the methods provided by
- com.vividsolutions.jts.geom.Geometry, both on the "native" Geometry and on its EPSG:4326
- transformation. Would could probably store other properties here (like, e.g. the geometry's boundary). Other
- IndexWorkers, especially those accessing a spatially-enabled DBMS, might prefer to store fewer properties if they
- can be computed dynamically at a cheap price.
- The next method is even much easier to understand :
-
- ... and this one even more :
-
- This one however, is a little bit trickier :
-
- ... maybe because it makes use of a SQL function to filter the right documents ?
- The two next methods are straightforward, now that we have explained that Connections had to be requested from
- the Index to avoid concurrency problems on an embedded HSQLDB instance.
-
- The next method is much more interesting. This is where is the core of the spatial index is:
-
- The trick is to filter the geometries on (fast) BBox operations first (intersecting geometries have BBox intersecting as well) which is
- possible in every case but for the Spatial.DISJOINT operator. For the latter case, we will have to fetch the BBox
- coordinates in order to apply a further filtering. Then, we examine the results and filter out the documents that are not in the
- contextSet. Spatial.DISJOINT filtering is then applied to avoid the next step in case the
- BBoxes are themselves disjoint. Only then, we perform the costly operations, namely Geometry deserialization from the
- DB then performing spatial operations on it. Matching nodes are then returned.
- The next method is quite straightforward:
-
- Notice that it will return EPSG:4326 Geometryies and that it will rethrow a
- com.vividsolutions.jts.io.ParseException as a java.sql.SQLException.
- The next method is a bit more restrictive and modular :
-
- ... because if directly selects the right node and allows to return either the original Geometry, either its
- EPSG:4326 transformation.
- The next method is a generalization of the previous one:
-
- It queries the whole index for the requested Geometry, ignoring the documents that are not in the
- contextSet, and it also ignores the nodes that are not in the contextSet. After that the
- Geometry is deserialized.
-
- This method is not yet used by the spatial functions but it is planned to use it in a future optimization effort.
-
- This is the next method, designed like getGeometryForNode():
-
- It directly requests the required property from the DB and returns an appropriate XML atomic value.
- The next method is a generalization of the previous one :
-
- It queries the whole index for the requested property, ignoring the documents that are not in the contextSet, and
- it also ignores the nodes that are not in the contextSet. Finally the property mapped to the appropriate XML atomic
- value is returned.
-
- This method is not yet used by the spatial functions but it is planned to use it in a future optimization effort.
-
- The last method is a utility method and we will only show a part of its body:
-
- It deserializes each Geometry and checks that its data are consistent with what is stored in the DB.
-
+
-
+
+ Writing a concrete implementation of org.exist.indexing.StreamListener
-
- Writing a concrete implementation of org.exist.indexing.StreamListener
-
+ The StreamListener's main purpose is to generate Geometry instances, when relevant, from the nodes it
+ listens to.
+ This will be done using a org.geotools.gml.GMLFilterDocument provided by the Geotools libraries. The trick is to map
+ our STAX events to the expected SAX events.
+ As stated above, our StreamListener will be implemented as an inner class of
+ org.exist.indexing.spatial.AbstractGMLJDBCIndexWorker. Of course, it will extend
+ org.exist.indexing.AbstractStreamListener:
+
+ There are only two members. srsNamesStack maintains a java.util.Stack of Strings
+ holding the srsName attribute of the elements in the GML namespace
+ (http://www.opengis.net/gml). null is pushed if such an attribute doesn't exist, notably because it isn't
+ applicable.
+
+ deferredElement will hold an element whose streaming is deferred, because we haven't yet received its
+ attributes.
+ The getWorker() method should be straightforward.
+ Let's see how the process is performed:
+
+ Element deferring occurs, of course, only if currentDoc is to be indexed. If so, an incoming element is deferred, but we
+ do not forget to forward the event to the next StreamListener in the pipeline.
+ If we have a deferred element, we process it (see below) in order to collect its attributes and, if relevant,
+ endElement() adds an index entry for the current element. The method characters() also forwards
+ its data to the SAX handler.
+
+ We could have used attribute() to collect the deferred element's attributes. The described design is just a matter of
+ choice.
+
+ Let's see how the deferred element is processed:
+
+ We first need to collect its attributes, and that's precisely why it is deferred: attribute events come after the call
+ to startElement(). Elements in the GML namespace that carry an srsName attribute will push its value. If
+ the element is not in the GML namespace or if no srsName attribute exists, null is pushed.
+
+ We could have a smarter mechanism, but we first have to decide whether a default SRS should be definable here, either from
+ the config or from a higher-level element. This part of the code will thus probably be revisited once that decision is
+ taken.
+
+ When the attributes are collected, we can send a startElement() event to the SAX handler, thus marking the end of the
+ deferring process.
+ Processing of the current element with endElement():
+
+ We first pop an SRS name from the stack. null indicates that the element doesn't have any and thus that it
+ doesn't carry enough information to build a complete geometry. That doesn't prevent us from forwarding this element to the SAX
+ handler.
+ The SAX handler might have been able to build a Geometry then. If so, the current index entry (composed of
+ currentSrsName, streamedGeometry, currentNodeId and the "global"
+ org.exist.dom.DocumentImpl) is added to geometries (wrapped in the convenience
+ SRSGeometry class). We then check if it's time to flush the pending index entries.
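The push/pop bookkeeping described above can be sketched in isolation. This is an illustrative stand-alone class, not the actual inner class: it keeps only the srsNamesStack behavior, with attributes modeled as a plain Map.

```java
import java.util.Map;
import java.util.Stack;

// On each (deferred) startElement we push either the element's srsName
// (GML-namespace elements only) or null; the matching endElement pops it.
// A popped null means the element cannot yield a complete geometry.
public class SrsNameTracker {
    static final String GML_NS = "http://www.opengis.net/gml";
    private final Stack<String> srsNamesStack = new Stack<>();

    public void startElement(String namespaceURI, Map<String, String> attributes) {
        String srsName = null;
        if (GML_NS.equals(namespaceURI))
            srsName = attributes.get("srsName");
        srsNamesStack.push(srsName); // null when absent or when not a GML element
    }

    /** Returns the current element's srsName, or null if it has none. */
    public String endElement() {
        return srsNamesStack.pop();
    }
}
```

Using java.util.Stack (rather than ArrayDeque) matters here: it accepts the null entries the design relies on.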
+ This is what the GeometryHandler looks like. It is also implemented as an inner class of
+ org.exist.indexing.spatial.AbstractGMLJDBCIndexWorker:
+
+
+ Thanks to the Geotools SAX parser, it doesn't have to be more complicated than setting the streamedGeometry "global"
+ member.
+
+ However, it may throw some NullPointerExceptions as described above.
+
+
- The StreamListener's main purpose is to generate Geometry instances, if accurate, from the
- nodes it listens to.
- This will be done using a org.geotools.gml.GMLFilterDocument provided by the Geotools libraries. The trick is to
- map our STAX events to the expected SAX events.
- As stated above, our StreamListener will be implemented as an inner class of
- org.exist.indexing.spatial.AbstractGMLJDBCIndexWorker. Of course, it will extend
- org.exist.indexing.AbstractStreamListener:
-
- There are only two members. srsNamesStack will maintain a (String)
- java.util.Stack for the srsName attribute of the elements in the GML namespace
- (http://www.opengis.net/gml). null will be pushed if such an attribute doesn't exist, namely because it isn't
- accurate.
-
- deferredElement will hold an element whose streaming is deferred, namely because we still haven't received its
- attributes.
- The getWorker() method should be straightforward.
- Let's see how the process is performed:
-
- Element deferring occurs only if currentDoc is to be indexed of course. If so, an incoming element is deferred
- but we do not forget to forward the event to the next StreamListener in the pipeline.
- If we have a deferred element, we will process it (see below) in order to collect its attributes and if relevant,
- endElement(), will add an index entry for the current element. The method characters()
- also forwards its data to the SAX handler.
-
- We could have used attribute() to collect the deferred element's attributes. The described design is just a
- matter of choice.
-
- Let's see how the deferred element is processed :
-
- We first need to collect its attributes and that's why it is deferred, because attributes events come after the
- call to startElement(). Elements in the GML namespace that carry an srsName
- attribute will push its value. If the element is not in the GML namespace or if no srsName attribute
- exists, null is pushed.
-
- We could have had a smarter mechanism, but we first have to take a decision about the fact that we could define a default SRS here,
- either from the config, or from a higher-level element. This part of the code will thus probably be revisited once the decision is
- taken.
-
- When the attributes are collected, we can send a startElement() event to the SAX handler, thus marking the end of
- the deferring process.
- Processing of the current element with endElement():
-
- We first pop a SRS name from the stack. null will indicate that the element doesn't have any and thus that it is
- an element which doesn't carry enough information to build a complete geometry. That doesn't prevent us to forward this element to the SAX
- handler.
- The SAX handler might have been able to build a Geometry then. If so, the current index entry (composed of
- currentSrsName, streamedGeometry, currentNodeId and the "global"
- org.exist.dom.DocumentImpl) is added to geometries (wrapped in the convenience
- SRSGeometry class). We then check if it's time to flush the pending index entries.
- This is how the GeometryHandler looks like. It is also implemented as an inner class of
- org.exist.indexing.spatial.AbstractGMLJDBCIndexWorker
-
-
- Thanks to Geotools SAX parser, it hasn't to be more complicated than setting the streamedGeometry "global"
- member.
-
- However, it may throw some NullPointerExceptions as described above.
-
-
+
-
+
+ Implementing some functions that cooperate with the spatial index
-
- Implementing some functions that cooperate with the spatial index
+ We currently provide three sets of functions that are able to cooperate with spatial indexes.
+ The functions are declared in the org.exist.xquery.modules.spatial.SpatialModule module which operates in the
+ http://exist-db.org/xquery/spatial namespace (whose default prefix is spatial).
+ The function signatures are documented together with the functions themselves in this page. Here we will only look at their eval()
+ methods.
+ The first function set we will describe is org.exist.xquery.modules.spatial.FunSpatialSearch, which performs searches
+ on the spatial index:
+
+ We first build an early result if empty sequences are passed to the function.
+ Then, we try to access the XQueryContext's AbstractGMLJDBCIndex (remember that there is a 1:1
+ relationship between XQueryContext and DBBroker and a 1:1 relationship between
+ DBBroker and IndexWorker). If we cannot find an AbstractGMLJDBCIndex, we throw an
+ Exception, since we will need it and its concrete class to delegate spatial operations to (whatever its underlying DB
+ implementation is, thanks to our generic design).
+ Then, we examine if the geometry node is persistent, in which case it might be indexed. If so, we try to get an
+ EPSG:4326 Geometry from the index.
+ If nothing is returned here, either because the node isn't indexed or because it is an in-memory node, we stream it to a
+ Geometry and we transform this into an EPSG:4326 Geometry. Of course, this process is slower than a
+ direct lookup into the index.
+ Then we search for the geometry in the index after having determined the spatial operator from the function's name.
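The "spatial operator from the function's name" step can be sketched as a simple lookup. The operator names and constants below are assumptions for illustration; eXist's actual SpatialOperator values may differ.

```java
import java.util.Map;

// Maps the local name of the called spatial:* function to an operator
// constant that the index worker can dispatch on.
public class SpatialOperators {
    public static final int UNKNOWN = -1;
    public static final int EQUALS = 0, DISJOINT = 1, INTERSECTS = 2,
            TOUCHES = 3, CROSSES = 4, WITHIN = 5, CONTAINS = 6, OVERLAPS = 7;

    private static final Map<String, Integer> BY_NAME = Map.of(
        "equals", EQUALS, "disjoint", DISJOINT, "intersects", INTERSECTS,
        "touches", TOUCHES, "crosses", CROSSES, "within", WITHIN,
        "contains", CONTAINS, "overlaps", OVERLAPS);

    public static int fromFunctionName(String localName) {
        return BY_NAME.getOrDefault(localName, UNKNOWN);
    }
}
```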
+ The second function set is org.exist.xquery.modules.spatial.FunGeometricProperties, which retrieves a property for a
+ Geometry:
+
+ The design is very much the same: we build an early result if empty sequences are involved, we get an
+ AbstractGMLJDBCIndex, then we set a propertyName, which is actually an SQL field name, depending on the
+ function's name.
+ An attempt to retrieve the field content from the DB is made and, if unsuccessful, we try to get the node's Geometry
+ from the index.
+ Then, if we still haven't got this Geometry, either because the node isn't indexed or because it is an in-memory node,
+ we stream it to a Geometry and transform it into an EPSG:4326 Geometry if the function's name
+ requires it.
+ We then dynamically build the property to be returned.
+
+ This mechanism is far from efficient compared to the index lookup, but it shows how easy it would be to return a property which is
+ not available in a spatial index.
+
+ The third function set, org.exist.xquery.modules.spatial.FunGMLProducers, uses the same design:
+
+ It looks more complicated because of the
+ multiple possible argument counts. However, the principle remains the same: an early result when empty sequences are involved, fetching of the
+ Geometry instances (and of its/their SRS) from the DB, streaming if nothing can be fetched, then geometric computations after a
+ transformation of the second Geometry if relevant.
+ The final process streams the resulting Geometry as the result, in the SRS specified by the relevant argument or in the
+ SRS of the first Geometry, depending on the function's name.
+
- We currently provide three sets of functions that are able to cooperate with spatial indexes.
- The functions are declared in the org.exist.xquery.modules.spatial.SpatialModule module which operates in the
- http://exist-db.org/xquery/spatial namespace (whose default prefix is spatial).
- The functions signatures are documented together with the functions themselves in this page. Here we will only look at their eval()
- methods.
- The first functions set we will describe is org.exist.xquery.modules.spatial.FunSpatialSearch, which performs
- searches on the spatial index:
-
- We first build an early result if empty sequences are passed to the function.
- Then, we try to access the XQueryContext's AbstractGMLJDBCIndex (remember that there is a
- 1:1 relationship between XQueryContext and DBBroker and a 1:1 relationship between
- DBBroker and IndexWorker). If we can not find an AbstractGMLJDBCIndex
- we throw an Exception since we will need this and its concrete class to delegate spatial operations to (whatever its
- underlying DB implementation is, thanks to our generic design).
- Then, we examine if the geometry node is persistent, in which case it might be indexed. If so, we try to get an
- EPSG:4326 Geometry from the index.
- If nothing is returned here, either because the node isn't indexed or because it is an in-memory node, we stream it to a
- Geometry and we transform this into an EPSG:4326 Geometry. Of course, this process is slower
- than a direct lookup into the index.
- Then we search for the geometry in the index after having determined the spatial operator from the function's name.
- The second functions set is org.exist.xquery.modules.spatial.FunGeometricProperties, which retrieves a property
- for a Geometry:
-
- The design is very much the same : we build an early result if empty sequences are involved, we get a
- AbstractGMLJDBCIndex, then we set a propertyName, which is actually a SQL field name,
- depending on the function's name.
- An attempt to retrieve the field content from the DB is made and, if unsuccessful, we try to get the node's
- Geometry from the index.
- Then, if we still haven't got this Geometry, either because the node isn't indexed or because it is an in-memory
- node, we stream it to a Geometry and we transform this into an EPSG:4326 Geometry if the
- function's name requires to do so.
- We then dynamically build the property to be returned.
-
- This mechanism if far from being efficient compared to the index lookup, but it shows how easy it would be to return a property which
- is not available in a spatial index.
-
- The third functions set, org.exist.xquery.modules.spatial.FunGMLProducers, uses the same design :
-
- It looks more complicated because of
- the multiple possible argument counts. However, the pinciple remains the same: early result when empty sequences are involved, fetching of
- the Geometryies (and of its/their SRS) from the DB, streaming if nothing can be fetched, then geometric computations
- after a transformation of the second Geometry if relevant.
- The final process streams the resulting Geometry as the result, in the SRS specified by the relevant argument or
- in the SRS of the first Geometry, depending on the function's name.
-
+
-
+
+ Playing with the spatial index
-
- Playing with the spatial index
-
- Now that we have described the spatial index, it is time to play with it. Only a few of its features will be demonstrated, but we will
- explain again what happens under the hood.
- The first step is to make sure to have a recent enough release version of eXist : 1.2 of later.
- Then, you have to prepare eXist to build the spatial index library. To do that, go into the
- ${EXIST_HOME}/extensions/indexes directory and, if necessary, copy build.properties to a new
- file, local.properties.
- Open this file and check that include.index.spatial is set to true.
- Invoke build.bat clean or build.sh clean, depending on your platform, from a command line. This
- will generate ${EXIST_HOME}/extensions/indexes/build.xml, which is needed by the modularized indexes
- infrastructure.
- Invoke build.bat extension-indexes or build.sh extension-indexes, depending on your platform, from
- a command line.
- If necessary, the required external (large) libraries will be downloaded from the WWW into the
- ${EXIST_HOME}/extensions/indexes/spatial/lib directory. Most of them have a gt2-*.jar name. Make
- sure to make them available in your application's classpath !
-
- If you are behind a proxy, do not forget to set its settings in ${EXIST_HOME}/build.properties.
-
- A file named exist-spatial-module.jar should be generated into the
- ${EXIST_HOME}/lib/extensions directory.
- Enable the spatial index and the spatial module in ${EXIST_HOME}/conf.xml if it is not already done.
- the spatial index:
-
- and the spatial module:
-
- This concludes the prerequisites for running the test.
- Our demonstration file is taken from the Ordnance Survey of Great-Britain's WWW
- site which offers sample
+ Now that we have described the spatial index, it is time to play with it. Only a few of its features will be demonstrated, but we will
+ explain again what happens under the hood.
+ The first step is to make sure you have a recent enough release version of eXist: 1.2 or later.
+ Then, you have to prepare eXist to build the spatial index library. To do that, go into the
+ ${EXIST_HOME}/extensions/indexes directory and, if necessary, copy build.properties to a new file,
+ local.properties.
+ Open this file and check that include.index.spatial is set to true.
+ Invoke build.bat clean or build.sh clean, depending on your platform, from a command line. This will
+ generate ${EXIST_HOME}/extensions/indexes/build.xml, which is needed by the modularized indexes infrastructure.
+ Invoke build.bat extension-indexes or build.sh extension-indexes, depending on your platform, from a
+ command line.
+ If necessary, the required external (large) libraries will be downloaded from the WWW into the
+ ${EXIST_HOME}/extensions/indexes/spatial/lib directory. Most of them have a gt2-*.jar name. Make sure
+ to make them available in your application's classpath!
+
+ If you are behind a proxy, do not forget to set its settings in ${EXIST_HOME}/build.properties.
+
+ A file named exist-spatial-module.jar should be generated into the ${EXIST_HOME}/lib/extensions
+ directory.
+ Enable the spatial index and the spatial module in ${EXIST_HOME}/conf.xml if it is not already done.
+ the spatial index:
+
+ and the spatial module:
+
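As an illustration only, the two conf.xml entries might look like the following. The module class and namespace URI are the ones given above; the index class name and the enclosing element names are assumptions based on this article's naming, so check the comments in your release's conf.xml for the exact syntax.

```xml
<!-- the spatial index, declared among the indexer's modules
     (class name assumed from this article's naming) -->
<module id="spatial-index"
        class="org.exist.indexing.spatial.GMLHSQLDBIndex"/>

<!-- the spatial XQuery module, declared among the built-in modules -->
<module uri="http://exist-db.org/xquery/spatial"
        class="org.exist.xquery.modules.spatial.SpatialModule"/>
```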
+ This concludes the prerequisites for running the test.
+ Our demonstration file is taken from the Ordnance Survey of Great-Britain's WWW site which offers sample
data.
- The chosen topography layer is of Port-Talbot, which is available as 2182-SS7886-2c1.gz.
- Download this file, gunzip it, and give to the resulting file a .gml extension (port-talbot.gml)
- this will allow eXist to reckognize it as an XML file.
-
- If you have previously executed build test, the file should have been downloaded and gunzipped for you in
- ${EXIST_HOME}/extensions/indexes/spatial/test/resources.
-
- Since this file refers to an OSGB-hosted schema, we will need to bypass validation in
- ${EXIST_HOME}/conf.xml.
- Make sure the mode value is set like this:
- <validation mode="no">
- We are now ready to start the demonstration and we will use the interactive client for that. Run either
- ${EXIST_HOME}/bin/client.bat or ${EXIST_HOME}/bin/client.sh from the command line (please read
- elsewhere if you do not know how to start it).
- Let's start by creating a collection named spatial in the /db collection. The menus might be
- localised, but in english it is File/Create a collection....
- Then, we will configure this collection by creating a configuration collection.
- Let's navigate to /db/system/config
-
- If required, let's create a general configuration collection : File/Create a collection... name it
- db and get into it.
- Then let's create a configuration collection for /db/spatial: File/Create a collection... name
- it spatial and get into it.
- We are now in /db/system/config/db/spatial.
- Let's now create a configuration file for this collection: File/Create an empty document... name it
- collection.xconf.
- Double-click on this document and let's replace its auto-generated content :
- <template/>
- with this one:
-
-
- Do not forget to save the document before closing the window.
-
- The /db/system/config/db/spatial collection is now configured to index GML geometries when they are uploaded. The
- in-memory index entries will be flushed to the HSQLDB every 200 geometries and will wait at most 100 seconds, the default value, to
- establish a connection to the HSQL db.
- Let's navigate to /db/spatial.
- Let's upload port-talbot.gml: File/Upload files/directories...
- On my computer, the operation on this 23.6 Mb file is performed in about 100 seconds, including some default fulltext indexing. Let's
- close the upload window and quit the interactive client.
- Let's look our our GML file looks like on GML Viewer, a free viewer provided by Snowflake software :
-
-
-
-
-
-
-
- If you want to have a look at the spatial index HSQLDB, which, if you are using the default data-dir, is in
- ${EXIST_HOME}/webapp/WEB-INF/data/spatial_index.* there is a dedicated script file in
- ${EXIST_HOME}/extensions/indexes/spatial/. to launch HSQL's GUI client: Use either hsql.bat or
- hsql.sh [data-dir] (you only need to supply data-dir if it is not the default one).
- If the SQL command SELECT * FROM SPATIAL_INDEX_V1; is executed, the result window shows that 21961 geometries have
- been indexed.
- Let's get back to the interactive client and open the query window (the one we get when clicking on the binocular button in the
- toolbar).
- This query:
-
- ... is processed in a little bit less than 2 seconds. That could seem high, but there is a cost for the Geotools transformation
- factories initialization. Subsequent requests will be much faster, although there will always be a small cost for the streaming of the
- in-memory node to a Geometry object.
- The result is:
-
-
- Due to the current Geotools limitations, there is no srsName attribute on
- gml:Polygon ! See above.
-
- ... but people might find more convenient to perform this query :
-
- ... which returns:
-
- So, 3 degrees West, 51 deegrees North... we must be indeed northern of Brittany, i.e. in south-western Great-Britain.
- Let's see what our polygon looks like:
-
-
-
-
-
-
-
- Now, we continue doing something more practical:
-
- This query returns 756 gml:Polygons in about 15 seconds. A subsequent call returns in just about 450 ms, not having
- the cost for initializations (in particular the first connection to the HSQLDB). A slighly modified query, in order to show the performance
- without utilising eXist's performant cache:
-
- ... retuns 755 gml:Polygon (one less) in just about 470 ms.
- The result of our first intersection query looks like this:
-
-
-
-
-
-
-
- Let's try another type of spatial query:
-
- It returns 598 gml:Polygons in just a little bit more than 400 ms. Here is what they look like:
-
-
-
-
-
-
-
- The last query of this session is just to demonstrate some interesting capabilities of the spatial functions:
-
- See the (not so) rounded corners of our 500 metres buffer over Port-Talbot :
-
-
-
-
-
-
-
-
+ The chosen topography layer is of Port-Talbot, which is available as 2182-SS7886-2c1.gz.
+ Download this file, gunzip it, and give the resulting file a .gml extension (port-talbot.gml); this
+ will allow eXist to recognize it as an XML file.
+
+ If you have previously executed build test, the file should have been downloaded and gunzipped for you in
+ ${EXIST_HOME}/extensions/indexes/spatial/test/resources.
+
+ Since this file refers to an OSGB-hosted schema, we will need to bypass validation in ${EXIST_HOME}/conf.xml.
+ Make sure the mode value is set like this:
+ <validation mode="no">
+ We are now ready to start the demonstration and we will use the interactive client for that. Run either
+ ${EXIST_HOME}/bin/client.bat or ${EXIST_HOME}/bin/client.sh from the command line (please read
+ elsewhere if you do not know how to start it).
+ Let's start by creating a collection named spatial in the /db collection. The menus might be
+ localised, but in English it is File/Create a collection....
+ Then, we will configure this collection by creating a configuration collection.
+ Let's navigate to /db/system/config
+
+ If required, let's create a general configuration collection: File/Create a collection... name it
+ db and get into it.
+ Then let's create a configuration collection for /db/spatial: File/Create a collection... name it
+ spatial and get into it.
+ We are now in /db/system/config/db/spatial.
+ Let's now create a configuration file for this collection: File/Create an empty document... name it
+ collection.xconf.
+ Double-click on this document and replace its auto-generated content:
+ <template/>
+ with this one:
+
+
+ Do not forget to save the document before closing the window.
+
+ The /db/system/config/db/spatial collection is now configured to index GML geometries when they are uploaded. The
+ in-memory index entries will be flushed to the HSQLDB every 200 geometries and will wait at most 100 seconds, the default value, to establish
+ a connection to the HSQL db.
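As a minimal sketch, a collection.xconf matching the behaviour described above could look roughly like this. The eXist collection-config namespace is standard, but the gml element name and the flushAfter attribute are assumptions inferred from the flush-every-200-geometries description; check the spatial index documentation for the exact names:

```xml
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index>
        <!-- hypothetical spatial index configuration: flush the in-memory
             index entries to the HSQLDB every 200 geometries -->
        <gml flushAfter="200"/>
    </index>
</collection>
```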
+ Let's navigate to /db/spatial.
+ Let's upload port-talbot.gml: File/Upload files/directories...
+ On my computer, the operation on this 23.6 MB file is performed in about 100 seconds, including some default fulltext indexing. Let's
+ close the upload window and quit the interactive client.
+ Let's look at what our GML file looks like in GML Viewer, a free viewer provided by Snowflake Software:
+
+
+
+
+
+
+
+ If you want to have a look at the spatial index HSQLDB (which, if you are using the default data-dir, is in
+ ${EXIST_HOME}/webapp/WEB-INF/data/spatial_index.*), there is a dedicated script in
+ ${EXIST_HOME}/extensions/indexes/spatial/ to launch HSQL's GUI client: use either hsql.bat or
+ hsql.sh [data-dir] (you only need to supply data-dir if it is not the default one).
+ If the SQL command SELECT * FROM SPATIAL_INDEX_V1; is executed, the result window shows that 21961 geometries have been
+ indexed.
+ Let's get back to the interactive client and open the query window (the one we get when clicking on the binocular button in the
+ toolbar).
+ This query:
+
+ ... is processed in a little less than 2 seconds. That may seem high, but there is a one-time cost for initializing the Geotools transformation
+ factories. Subsequent requests will be much faster, although there will always be a small cost for the streaming of the in-memory node to
+ a Geometry object.
+ The result is:
+
+
+ Due to the current Geotools limitations, there is no srsName attribute on gml:Polygon! See above.
+
+ ... but people might find it more convenient to perform this query:
+
+ ... which returns:
+
+ So, 3 degrees West, 51 degrees North... we must indeed be north of Brittany, i.e. in south-western Great Britain.
+ Let's see what our polygon looks like:
+
+
+
+
+
+
+
+ Now, let's continue with something more practical:
+
+ This query returns 756 gml:Polygons in about 15 seconds. A subsequent call returns in just about 450 ms, not having the cost
+ for initializations (in particular the first connection to the HSQLDB). A slightly modified query, in order to show the performance without
+ utilising eXist's efficient cache:
+
+ ... returns 755 gml:Polygon elements (one less) in just about 470 ms.
+ The result of our first intersection query looks like this:
+
+
+
+
+
+
+
+ Let's try another type of spatial query:
+
+ It returns 598 gml:Polygons in a little more than 400 ms. Here is what they look like:
+
+
+
+
+
+
+
+ The last query of this session is just to demonstrate some interesting capabilities of the spatial functions:
+
+ See the (not so) rounded corners of our 500-metre buffer over Port-Talbot:
+
+
+
+
+
+
+
+
-
+
-
- Facts and thoughts
+
+ Facts and thoughts
- As of june 2007, the spatial index is in working condition. It provides an interesting set of functionalities and its performance is
- satisfactory given the lightweight, non spatially-enabled, database engine that stores the Geometry objects and their
- properties. The main objective was to return within the second; we're there.
- Here are still some tentative improvements.
- The first improvement is to plug in the getGeometriesForNodes() and
- getGeometricPropertyForNodes() (in org.exist.indexing.spatial.AbstractGMLJDBCIndexWorker)
- to allow sequence/set optimization.
- Indeed, a query like this one on our test file :
-
- ... returns 5339 items through as many calls to the DB in... 51 seconds on an initialized index ! Intercepting the
- SINGLE_STEP_EXECUTION flag when the expression is analyzed would allow to call the 2 above methods rather than
- their "individual" counterparts, namely getGeometryForNode() and
- getGeometricPropertyForNode(). The expected performance improvement would be interesting.
- A second improvement could be to refine the queries on the HSQLDB. Currently, search() (in
- org.exist.indexing.spatial.GMLHSQLIndexWorker) filters the records on the BBox of the searched
- Geometry. It would also be nice to refine the query on the context nodes and, in particular, on their involved
- collections and/or documents. The like applies for the HSQL implementation of getGeometryForNode() and
- getGeometricPropertyForNode() too.
- However, we have to be aware that writing such a SQL statement and passing it to the DB server might become counter-productive. The idea
- is then to define some (configurable) threshold values that would refine the query on the documents if there are fewer than, say, 10
- documents in the context nodeset, and if there are more than 10 documents in it, but less than, say, 15 collections, refine the query on the
- collection.
- It would be quite easy to determine those threshold values above which writing a long SQL statement and passing it to the DB server
- takes more time than filtering the fectched data.
- We might also consider the field in which the document's URI is stored (DOCUMENT_URI) and possibly split it into
- two fields, one for the collection and the second one for the document. Of course, having indexed integer values here would probably be
- interesting.
- Having some better algorithms to prefilter the Geometryies could probably help as well and, more generally,
- everything a DB server could bring (caching for instance) should be considered.
- An other improvement would be to introduce some error margins for Geometryies computations or BBox ones. The
- Java Topology Suite, widely used by Geotools, has all the necessary
- material for this.
- Another interesting improvement would be to compute and return simplified Geometryies depending of a "hint".
- Applications might want to return simplified polygons and even points at wider scales. Depending on the hint's value, passed in the
- function's parameters, the index could use the right DB fields in order to work on precomputed simpler (or, more
- generally, different) entries.
- This is how a hint configuration could look like :
-
- We should also discuss with the other developers about the opportunity to have a org.exist.indexing.Index
- interface in the modularized indexes hierarchy. The abstract class org.exist.indexing.AbstractIndex provides some
- nice general-purpose methods and allows static members that are nearly mandatory (like ID). The
- like for org.exist.indexing.StreamListener versus
- org.exist.indexing.AbstractStreamListener.
- More tests should also be driven. The spatial index has only be tested on one file until now although this file is sizeable. It might be
- interesting to see how it behaves with unusual geometries like rectangles, multi-geometries and collections. It might also be interesting to
- know more about the error margin when geometries in different SRSes are involved. The accuracy of the referencing libraries available in the
- CLASSPATH would play an important role here.
- As always, the code could be written in a more efficient way. There are probably too many
- if (...) ... else if(...) ... else ...
- constructs in the code for instance. Also, we will have to follow Geotools progress to get rid of some of the more or less
- elegant workarounds we've had to implement.
-
-
+ As of June 2007, the spatial index is in working condition. It provides an interesting set of functionalities and its performance is
+ satisfactory given the lightweight, non-spatially-enabled database engine that stores the Geometry objects and their
+ properties. The main objective was to return results within a second; we're there.
+ Still, here are some tentative improvements.
+ The first improvement is to plug in the getGeometriesForNodes() and getGeometricPropertyForNodes()
+ (in org.exist.indexing.spatial.AbstractGMLJDBCIndexWorker) to allow sequence/set optimization.
+ Indeed, a query like this one on our test file :
+
+ ... returns 5339 items through as many calls to the DB in... 51 seconds on an initialized index! Intercepting the
+ SINGLE_STEP_EXECUTION flag when the expression is analyzed would allow calling the two methods above rather than their
+ “individual” counterparts, namely getGeometryForNode() and getGeometricPropertyForNode(). The expected
+ performance improvement would be interesting.
+ A second improvement could be to refine the queries on the HSQLDB. Currently, search() (in
+ org.exist.indexing.spatial.GMLHSQLIndexWorker) filters the records on the BBox of the searched
+ Geometry. It would also be nice to refine the query on the context nodes and, in particular, on their involved
+ collections and/or documents. The same applies to the HSQL implementations of getGeometryForNode() and
+ getGeometricPropertyForNode().
+ However, we have to be aware that writing such a SQL statement and passing it to the DB server might become counter-productive. The idea
+ is then to define some (configurable) threshold values: refine the query on the documents if there are fewer than, say, 10 documents
+ in the context nodeset; if there are more than 10 documents but fewer than, say, 15 collections, refine the query on the
+ collections.
+ It would be quite easy to determine those threshold values above which writing a long SQL statement and passing it to the DB server takes
+ more time than filtering the fetched data.
+ We might also consider the field in which the document's URI is stored (DOCUMENT_URI) and possibly split it into two
+ fields, one for the collection and the second one for the document. Of course, having indexed integer values here would probably be
+ interesting.
+ Having some better algorithms to prefilter the Geometry objects could probably help as well and, more generally, everything
+ a DB server could bring (caching for instance) should be considered.
+ Another improvement would be to introduce some error margins for Geometry computations or BBox ones. The Java Topology Suite, widely
+ used by Geotools, has all the necessary material for this.
+ Another interesting improvement would be to compute and return simplified Geometry objects depending on a "hint".
+ Applications might want to return simplified polygons and even points at wider scales. Depending on the hint's value, passed in the function's
+ parameters, the index could use the right DB fields in order to work on precomputed simpler (or, more generally,
+ different) entries.
+ This is what a hint configuration could look like:
+
+ We should also discuss with the other developers about the opportunity to have a org.exist.indexing.Index interface in
+ the modularized indexes hierarchy. The abstract class org.exist.indexing.AbstractIndex provides some nice general-purpose
+ methods and allows static members that are nearly mandatory (like ID). The same goes for
+ org.exist.indexing.StreamListener versus org.exist.indexing.AbstractStreamListener.
+ More tests should also be run. The spatial index has so far only been tested on one file, although this file is sizeable. It might be
+ interesting to see how it behaves with unusual geometries like rectangles, multi-geometries and collections. It might also be interesting to
+ know more about the error margin when geometries in different SRSes are involved. The accuracy of the referencing libraries available in the
+ CLASSPATH would play an important role here.
+ As always, the code could be written in a more efficient way. There are probably too many
+ if (...) ... else if(...) ... else ...
+ constructs in the code for instance. Also, we will have to follow Geotools progress to get rid of some of the more or less elegant
+ workarounds we've had to implement.
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/devguide_log4j/devguide_log4j.xml b/src/main/xar-resources/data/devguide_log4j/devguide_log4j.xml
index 75936e31..0f2c8093 100644
--- a/src/main/xar-resources/data/devguide_log4j/devguide_log4j.xml
+++ b/src/main/xar-resources/data/devguide_log4j/devguide_log4j.xml
@@ -1,659 +1,561 @@
-
- Log4J Logging Guide
- October 2016
-
- TBD
-
-
-
-
-
-
- Introduction
-
- Logging may seem like a rather unimportant part of our application. After all, it
- consumes resources – taking programmer time, increasing class file size (both on
- disk and in memory), and consuming CPU cycles to execute – all while doing nothing
- to provide end-user capability, e.g. configuration and document management. In
- reality, however, logging is an important part of any substantial body of software
- and is absolutely critical in enterprise software.
-
-
- R&D often needs some debug or trace output mode to help give greater
- insights into how mechanisms are operating in various situations.
-
-
- Occurrences of problems or potential problems need to be recorded for
- examination by site administrators.
-
-
- Given that R&D cannot watch the software from a debugger constantly
- while it is running in a customer environment, we need a means of capturing what
- is going on at various levels in the software to facilitate troubleshooting.
- Lack of appropriate log data can make attempts to identify and address customer
- problems ineffective or even completely infeasible.
-
-
- Customers want to be able to record and monitor trends in the operation of
- their software.
-
-
- Appropriate transaction logs are an important tool in performance tuning.
- Also, customers often want some general idea as to what the software is up to.
-
-
- Customers need to record various data for security reasons, e.g. to
- produce an access log of who accessed the system.
-
-
-
-
-
-
-
- What’s been lacking in our logging?
-
- Customers have been complaining about our logging for quite some time – and there
- are some good reasons for this.
- As should be clear by now, there are many different motivations and audiences for
- logging. Some log data is interesting primarily for analysis of access and/or
- performance. Other data is critical for examination of system health. Still other
- data is just informative – and some of this is really only meaningful to developers.
- To date, XXXXXXXXX has generally made it very hard to filter out the data of
- interest. XXXXXXXXX also has not consistently and clearly distinguished between
- important error messages, informational messages of possible customer interest, and
- debugging messages.
- A big issue with the use of XXXXXXXXX logs in troubleshooting has been the fact
- that XXXXXXXXX has required a restart for changes to log configuration (e.g.
- enabling a type of logging or changing its verbosity) to take effect. Customers are
- often extremely reluctant to restart their production servers, so we often get
- customer rancor, resistance, and delays in obtaining the log data critical to
- effective troubleshooting.
- The overall flexibility of XXXXXXXXX logging has left much to be desired. Certain
- data always goes to separate files whereas other data cannot be separated into its
- own files. There is no reliable, cross-platform means for reducing a log file’s size
- without a restart. The format of the log entries themselves cannot be specified by
- the administrator. Overall, XXXXXXXXX’s logging does not provide the myriad of
- features, big and small, that other logging systems provide today.
- Finally, XXXXXXXXX’s logging is showing signs of its grass-roots growth. There are
- many disparate xx.properties which control logging done in a number of different ad
- hoc fashions. Info*Engine introduces its own logging controlled in a different
- fashion and overall behaving differently. It is hard for a site administrator to get
- a big, useful picture of XXXXXXXXX’s logging.
-
-
-
-
-
- What to do about it?
-
- Most classes are using Apache’s log4j (http://logging.apache.org/log4j/). Log4j is
- the most powerful Java-based logging library available today and is used by most of
- the application servers on the market.
-
-
-
-
-
- What does log4j give us?
-
- Log4j is based on several core concepts:
-
-
- Each log event is issued by a hierarchically named “logger”, e.g.
- “xx.method.server.httpgw”.
-
-
- These hierarchical names may or may not correspond to Java class
- names.
-
-
- All log events have an associated severity level (trace, debug,
- info, warn, error, or fatal).
-
-
- To issue a log event programmers just acquire a logger by name
- and specify a log message and its severity level (and optionally a Throwable
- where applicable).
-
-
- Decisions as to whether a given log event is output, how and
- where it is output, what additional data (e.g. timestamps) are included beyond
- the log message, etc, are generally not made by the programmer – instead they
- are controlled by an administrator via a log4j configuration file.
-
-
- Based on these core concepts, log4j provides a powerful set of
- functionalities.
-
-
- Many “appender” choice
- Each log event may be output to zero or more “appenders,” which are
- essentially generalized output pipes. Log4j output can be sent to the
- System.out/err, files, JDBC, JMS, syslog, the Windows event log, SMTP,
- TCP/IP sockets, telnet, and more – all at the site administrator’s
- discretion. File output includes various options for log rolling, e.g. daily
- creation of new log files, segmenting when a given file size is reached, and
- externally controlled log rotation. These appenders can be run synchronously
- to the threads generating log events or as separate asynchronous
- queues.
-
-
- Flexible “layout” options
- With each appender one can specify a “layout” for formatting the log
- message. The administrator may choose from HTML, XML, and various plain text
- formats – including the flexible PatternLayout which allows selection of
- data to include (e.g. timestamps, originating thread, logger name, etc) and
- exactly how to include it.
-
-
- Hierarchical logger configuration
- Administrators can easily configure log event cutoff levels and appenders
- for entire branches of the hierarchical logger tree, i.e. for a whole set of
- related loggers, at once. For instance, by adding a “console” appender
- targeting System.out to the root logger, all log4j output will go to
- System.out. One can similarly configure the overall cutoff level as “error”
- at the root logger level so that only error and fatal messages are output
- unless otherwise specified. One could then configure the “xx.method” logger
- to have an “info” level cutoff and an appender to capture all output to a
- specified file (in addition to System.out). These “xx.method” settings would
- then affect all loggers whose names begin with “xx.method.” (e.g.
- “xx.method.server.httpgw”), in addition to the “xx.method” logger
- itself.
-
-
- Log viewers
- Various free and commercial products provide specialized viewing
- capabilities for log4j logs. Apache provides a very useful log4j log viewer,
- Chainsaw (http://logging.apache.org/log4j/docs/chainsaw.html).
-
-
- See the log4j website (http://logging.apache.org/log4j/ and in particular
- http://logging.apache.org/log4j/docs/manual.html) for more information.
- It is worth noting that Java 1.4 and higher’s java.util.logging API and concepts
- are very similar to log4j’s, but log4j is much more powerful in a number of critical
- areas.
- In conjunction with our JMX MBeans for log4j, one can also:
+ schematypens="http://purl.oclc.org/dsdl/schematron"?>
+
+ Log4j Logging Guide
+ 1Q18
+
+ java-development
+
+
+
+
+
+ This article explains how to add logging to eXist-db using Log4j in Java code.
+
+
+
+
+ Overview
+
+ Logging may seem like a rather unimportant part of our application. After all, it consumes resources: programmer time, increased class file
+ size, CPU cycles, etc., all while doing nothing to provide end-user functionality. However, logging is an important part of any substantial body
+ of software and is absolutely critical in enterprise software.
+ Customers have been complaining about our logging for quite some time. The logging system has made it hard to filter out the data of
+ interest. It also doesn’t distinguish between important error messages, informational messages and debugging messages.
+ A big issue with the use of the logging system in troubleshooting is that it requires a restart for changes to the log
+ configuration to take effect. Customers are often extremely reluctant to restart their production servers.
+ To address these problems, Log4j is introduced. Log4j is the most powerful
+ Java-based logging library available today and is used by most of the application servers on the market.
+
+
+
+
+ Introducing Log4j
+
+ Log4j is based on the following core concepts:
-
- Dynamically examine and reconfigure the log4j configuration for the
- duration of the process via a JMX console,
-
-
- Have all processes using a log4j configuration file periodically check its
- modification date and automatically re-load it upon any change , and
-
-
- Force an immediate reload from a configuration file via a JMX
- console.
-
+
+ Each log event is issued by a hierarchically named logger, e.g. xx.method.server.httpgw.
+
+
+ These hierarchical names may or may not correspond to Java class names.
+
+
+ All log events have an associated severity level (trace, debug, info, warn,
+ error, or fatal).
+
+
+ To issue a log event, programmers simply acquire a logger by name and specify a log message and its severity level (and optionally a
+ Throwable where applicable).
+
+
+ Logging is controlled by an administrator via a Log4j configuration file.
+
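The core concepts above can be illustrated with a small self-contained stand-in. This is deliberately not the Log4j API (with the real Log4j 2.x API you would obtain a logger via org.apache.logging.log4j.LogManager.getLogger(...)); it only sketches the pattern of acquiring a logger by hierarchical name and letting a configured cutoff level, not the programmer, decide what is output:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in illustrating the core logging concepts; NOT the Log4j API.
public class LoggerSketch {
    // Severity levels, ordered from most to least verbose.
    enum Level { TRACE, DEBUG, INFO, WARN, ERROR, FATAL }

    static class Logger {
        final String name;                 // hierarchical logger name
        final Level cutoff;                // events below this level are dropped
        final List<String> output = new ArrayList<>();

        Logger(String name, Level cutoff) {
            this.name = name;
            this.cutoff = cutoff;
        }

        void log(Level level, String message) {
            // The configured cutoff, not the caller, decides what is recorded.
            if (level.ordinal() >= cutoff.ordinal()) {
                output.add(level + " [" + name + "] " + message);
            }
        }
    }

    public static void main(String[] args) {
        // Acquire a logger by hierarchical name, configured with an "info" cutoff.
        Logger logger = new Logger("xx.method.server.httpgw", Level.INFO);
        logger.log(Level.DEBUG, "dropped: below the configured cutoff");
        logger.log(Level.INFO, "Something I really wanted to say");
        System.out.println(logger.output); // only the INFO entry survives
    }
}
```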
- Overall use of log4j provides an immense leap forward in XXXXXXXXX logging
- functionality and flexibility. Specifically, by using log4j we can address each of
- the shortcomings previously noted in our existing logging.
-
+ Based on these core concepts, Log4j provides a powerful set of functionalities:
+
+
+ Many “appender” choices
+
+ Each log event may be output to zero or more “appenders”. These are essentially generalized output pipes. Log4j output can be sent
+ to System.out/err, files, JDBC, JMS, syslog, the Windows event log, SMTP, TCP/IP sockets, telnet, etc., all at the
+ site administrator’s discretion.
+ File output includes various options for log rolling: for instance, daily creation of new log files, segmenting when a given file
+ size is reached, and externally controlled log rotation. These appenders can be run synchronously with the threads generating the log
+ events or as separate asynchronous queues.
+
+
+
+ Flexible layout options
+
+ Each appender can specify a layout for formatting the log message. The administrator can choose from HTML, XML, and various plain
+ text formats – including the flexible PatternLayout, which allows selecting the data to include (timestamps, originating
+ thread, logger name, etc.).
+
+
+
+ Hierarchical logger configuration
+
+ Administrators can configure log event cutoff levels and appenders for entire branches of the hierarchical logger tree.
+ For instance, by adding a console appender targeting System.out to the root logger, all Log4j output will
+ go to System.out. One can configure the overall cutoff level as error at the root logger level so that only
+ error and fatal messages are output, unless otherwise specified. One could then configure the xx.method logger to have an
+ info level cutoff and an appender to capture all output to a specified file (in addition to System.out).
+ These xx.method settings would then affect all loggers whose names begin with xx.method. (e.g.
+ xx.method.server.httpgw).
+
+
+
+ Log viewers
+
+ Various free and commercial products provide specialized viewing capabilities for Log4j logs. Apache provides a very useful Log4j
+ log viewer, Chainsaw.
+
+
+
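As an illustration of the hierarchical configuration described above, a Log4j 2.x style XML configuration could look roughly like this. This is a sketch assuming the Log4j 2.x configuration format; the xx.method logger name and pattern string are examples, not prescriptions:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Configuration>
    <Appenders>
        <!-- console appender targeting System.out -->
        <Console name="Console" target="SYSTEM_OUT">
            <PatternLayout pattern="%d %-5p [%c] %m%n"/>
        </Console>
    </Appenders>
    <Loggers>
        <!-- branch override: xx.method and every logger below it logs at info -->
        <Logger name="xx.method" level="info"/>
        <!-- root logger: only error and fatal messages, sent to the console -->
        <Root level="error">
            <AppenderRef ref="Console"/>
        </Root>
    </Loggers>
</Configuration>
```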
-
-
- How can I use log4j in new code?
+ For more information, visit the Log4j website.
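To make the level-inheritance rule concrete, here is a small self-contained sketch (not Log4j's actual implementation) of how a logger's effective level can be resolved by walking up the dotted name hierarchy until a configured ancestor, or ultimately the root logger, is found:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of hierarchical effective-level resolution; NOT Log4j's implementation.
public class EffectiveLevel {
    // Hypothetical configuration: root logger at "error", "xx.method" at "info".
    static final Map<String, String> CONFIGURED = new HashMap<>();
    static {
        CONFIGURED.put("", "error");         // root logger: must always be present
        CONFIGURED.put("xx.method", "info"); // branch override
    }

    static String effectiveLevel(String loggerName) {
        String name = loggerName;
        while (true) {
            String level = CONFIGURED.get(name);
            if (level != null) {
                return level;
            }
            int dot = name.lastIndexOf('.');
            name = (dot < 0) ? "" : name.substring(0, dot); // climb the hierarchy
        }
    }

    public static void main(String[] args) {
        // "xx.method.server.httpgw" inherits its level from "xx.method"
        System.out.println(effectiveLevel("xx.method.server.httpgw")); // info
        // "xx.other" has no configured ancestor, so it falls back to the root
        System.out.println(effectiveLevel("xx.other"));                // error
    }
}
```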
- Using log4j from XXXXXXXXX code is quite easy.
-
-
- Acquire a logger. import org.apache.log4j.Logger; … private Logger
- logger = Logger.getLogger("xx.method.server.httpgw");
-
-
- This is a somewhat time-consuming operation and should be done
- in constructors of relatively long-lived objects or in static
- initializers.
-
-
- Many classes can separately acquire a logger using the same
- logger name. They will all end up with their own reference to
- the same shared logger object.
-
-
-
-
- Use the logger: logger.info( "Something I really wanted to say"
- );
-
-
- info() is just one of Logger’s methods for issuing log4j log
- events. It implicitly assigns the event a severity level of “info”
- and does not specify a Throwable. Logger methods for issuing log
- events include:
-
-
-
- Note that in each case the “message” is an Object, not a
- String. If (and only if) log4j decides to output the given log
- event (based on the configuration), it will render this object
- as a String. By default this is essentially via
- toString().
-
-
-
-
- It’s as simple as that. You just emit log events with appropriate log levels to
- appropriately named loggers. The log4j configuration determines which appenders (if
- any) should output/record the event and how this should be done.
- For more information on “Do’s and Don’t” see that section. For more information on
- how to configure log4j output see the “Configuring log4j?” section.
-
-
-
-
-
- How can I convert existing logging code to use log4j?
-
- Conversion of existing logging code to use log4j can be simply viewed as replacing
- System.out.println() calls, etc, with use of the log4j API. There are, however, a
- few special considerations worth noting.
-
-
-
-
- Dealing with Legacy Properties
-
- In the conversion process the behavior of some existing properties will
- generally be affected.
- In cases this may mean simply removing the existing properties. For instance,
- properties specifying specific output files can generally be removed as
- customers can now siphon log4j output to specific files at their discretion via
- log4j’s configuration files.
- On the other hand, it may be useful in cases to preserve well known log
- enabling properties to reduce confusion amongst those used to these properties.
- In such cases it is suggested that the property be ignored unless it is set to
- enable the given log – in which case it will ensure the given log4j logger’s log
- level is verbose enough to cause the existing messages to be output. For
- instance, one might have something like
-
- in the existing code. The static portions above can be left as is and the
- remainder changed to:
-
- This example assumes that output from the given log4j logger should be
- completely enabled, i.e. for all log levels, when the existing property is set.
- One can also use logic like:
-
- to cause the existing property to enable output from the given log4j logger up
- through only a certain severity level, debug in this example.
- To be clear, this approach to preserving existing “enabling” properties only
- keeps them working more or less as they were. The intended minimum log verbosity
- is ensured upon initialization and then cannot be reset via the property without
- a restart. The ability to change the log-level on the fly or make finer-grained
- log-level adjustments is still only available through log4j configuration.
- When an existing property’s behavior is changed or when log4j configuration is
- now the more powerful (and thus preferred) approach, this should be noted in the
- property’s entry in properties.html. Be sure to include the name of new log4j
- logger in such notes.
-
-
-
-
-
- Conditional Computation of Data for Logging
-
- In cases you will find existing code like:
-
- the if block may include System.out.println()’s or the results of the block
- may be used in later System.out.println()’s.
- In either case, the code is intended to avoid the block of computations and
- assignments unless their results are to be used. This intent can be satisfied by
- use of one of log4j’s is*Enabled() methods. For example:
-
- The log4j Logger class provides a set of methods for this purpose
- including:
-
- This technique obviously applies to new log4j usage as well but is noted at
- this point so as to make conversion of existing logging code more
- straightforward.
-
-
-
-
-
-
- Configuring log4j
-
- In 1.4 the
- log4j.xml
- configuration file control log4j’s behavior.
- The full format of this configuration file is described here and a short
- introduction of the basics is given here. Note this is the simpler, but less
- powerful, form of log4j configuration file. An XML-based configuration file format
- also exists and customers will end up supplanting one or more of the properties
- files above with an XML configuration file if they require some of the most advanced
- log4j features.
- These out-of-the-box configuration should generally be kept fairly simple (see
- “Do’s and Don’ts” below), so the main use case for development configuration of
- log4j is to enable a given level of log output from one or more loggers. Without
- such configuration XXXXXXXXX will output only ERROR and FATAL log events
- out-of-the-box – except where the configuration has already been extended to output
- other log events. Therefore you generally must change the configuration to see
- trace, debug, info, or warn log events in the log4j output.
- To turn on a given logging level for all loggers, find the log4j.root property and
- change the priority value to the desired level. For instance, change
-
- to
-
- Of course this will result in a cacophony of log output, so you’ll generally want
- to adjust the logging level at a more specific level. To do this you can append a
- line to the properties file of following form:
-
- For instance, you would add
-
- to set the “org.exist.security” logger’s level to “info”. Note that doing so also
- causes all the default log level of all org.exist.security loggers to be set to
- “INFO”. For example, the level of the org.exist.security.xacml logger would also be
- set to INFO unless the level of this logger is otherwise specified.
- Changes to log4j configuration files may go unnoticed for as long as a few minutes
- as the checks for modifications to these files take place on a periodic basis. To
- force an immediate reload of the log4j configuration file, change the configuration
- file modification check interval, or make temporary changes to the log4j
- configuration without changing the configuration files, one must use our JMX MBeans.
- To use our log4j JMX MBeans:
-
-
- Start jconsole.
- jconsole is located in the Java 5 SDK’s bin directory. You can either
- double-click on it or run it from the command line.
-
-
- Select the target JVM.
- jconsole will list the Java processes running on your machine under your
- current user which have been configured to allow local JMX connections.
-
-
-
- Navigate to the Logging node in the MBean tree.
-
-
- Select the MBeans tab.
-
-
- Expand the XXXXXXXXX folder.
-
-
- In the servlet engine expand the “WebAppContext” folder and the
- folder named after your web app.
-
-
- Select the “Logging” node (which should now be visible).
-
-
- Note that it may take a short while after initial start up for all
- of these tree nodes to load into jconsole.
-
-
-
-
- Perform desired operations and/or modifications.
+
+ Java 1.4 and higher’s java.util.logging API is very similar to Log4j’s. However, Log4j is much more powerful in a number of
+ critical areas.
+ In conjunction with our JMX MBeans for Log4j, one can also:
+
+
+ Dynamically examine and reconfigure the Log4j configuration for the duration of the process via a JMX console.
+
+
+ Have all processes using a Log4j configuration file periodically check its modification date and automatically re-load it upon any
+ change.
+
+
+ Force an immediate reload from a configuration file via a JMX console.
+
+
+
+
+
+
+
+
+
+
+ Using Log4j in new Java code
+
+
+
+
+ Acquire a logger:
+ import org.apache.log4j.Logger;
+… private Logger logger = Logger.getLogger("xx.method.server.httpgw");
+ Remarks:
+
+
+ This is a somewhat time-consuming operation and should be done in constructors of relatively long-lived objects or in static
+ initializers.
+
+
+ Many classes can separately acquire a logger using the same logger name. They will all end up with their own reference to the same
+ shared logger object.
+
+
+
+
+ Use the logger:
+ logger.info( "Something I really wanted to say" );
+ Remarks:
+
+
+
+ info() is just one of Logger’s methods for issuing Log4j log events. It implicitly assigns the event a severity level of
+ info and does not specify a Throwable. Logger methods for issuing log events include:
+
+
+
+ Note that in each case the message is an Object, not a String. If (and only if) Log4j decides to output the given log
+ event (based on the configuration), it will render this object as a String (essentially via
+ toString()).
+
+
+
+
+ It’s basically as simple as that. You emit log events with appropriate log levels to appropriately named loggers. The Log4j configuration
+ determines which appenders (if any) should output/record the event and how this should be done.
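+ Putting the two steps together, a minimal sketch (the class and method names are hypothetical; the logger name follows the
+ example above, and the API is Log4j 1.x):
+
+ ```java
+ import org.apache.log4j.Logger;
+
+ public class HttpGateway {                      // hypothetical class
+     // Acquired once, in a static initializer, and shared by all instances.
+     private static final Logger logger =
+             Logger.getLogger("xx.method.server.httpgw");
+
+     public void handleRequest(String path) {    // hypothetical method
+         logger.info("Handling request for " + path);
+         try {
+             // ... do the actual work ...
+         } catch (RuntimeException e) {
+             // The two-argument variants also record the Throwable's stack trace.
+             logger.error("Request failed for " + path, e);
+         }
+     }
+ }
+ ```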
+
+
+
+
+
+
+ Converting existing logging code to Log4j
+
+ Conversion of existing logging code to Log4j can be as simple as replacing System.out.println() calls, etc, with use of the
+ Log4j API. There are a few special considerations worth noting.
+
+
+
+
+ Dealing with Legacy Properties
+
+ You'll have to check the functionality of existing logging properties:
+
+
+ Sometimes they can simply be removed. For instance, properties specifying specific output files can usually be removed, as customers can
+ now siphon Log4j output to specific files via Log4j’s configuration files.
+
+
+
+ Sometimes it is useful to preserve well-known logging properties that people are used to. In that case the advice is to ignore
+ the property unless it is set to enable the given log – in which case ensure the given Log4j logger’s log level is verbose
+ enough to cause the existing messages to be output. For instance, the existing code contains:
+
+ The static portions above can be left as is and the remainder changed to:
+
+ This example assumes that output from the given Log4j logger should be completely enabled when the existing property is
+ set.
+ One can also use this:
+
+ This causes the existing property to enable output from the given Log4j logger up through only a certain severity level,
+ debug in this example.
+ This approach to preserving existing “enabling” properties only keeps them working more or less as they were. The intended minimum
+ log verbosity is ensured upon initialization but cannot be reset via the property without a restart. The ability to change the
+ log-level on the fly or make fine-grained log-level adjustments is only available through the Log4j configuration.
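+ As a sketch, assuming a hypothetical legacy property named trace.httpgw and the Log4j 1.x Logger/Level API:
+
+ ```java
+ import org.apache.log4j.Level;
+ import org.apache.log4j.Logger;
+
+ // In a static initializer: honor the legacy property only when it enables logging.
+ static {
+     if (Boolean.getBoolean("trace.httpgw")) {   // hypothetical property name
+         // Ensure at least DEBUG output for this logger; finer or dynamic
+         // adjustments remain the job of the Log4j configuration.
+         Logger.getLogger("xx.method.server.httpgw").setLevel(Level.DEBUG);
+     }
+ }
+ ```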
+
+
+
+
+
+
+
+ Conditional Computation of Data for Logging
+
+ Sometimes log messages are constructed conditionally, for instance like this:
+
+ The if block may include System.out.println()’s or the results of the block may be used in later
+ System.out.println()’s. The code is intended to avoid computations and assignments unless their results are used.
+ You can now use one of Log4j’s is*Enabled() methods for this. For example:
+
+ The Log4j Logger class provides a set of methods for this:
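+ A sketch of the guarded-computation pattern (buildExpensiveSummary() is a hypothetical helper; the guard method is from the
+ Log4j 1.x Logger API):
+
+ ```java
+ if (logger.isDebugEnabled()) {
+     // Only pay for the expensive computation when the result will actually be logged.
+     String summary = buildExpensiveSummary();   // hypothetical helper
+     logger.debug("Request summary: " + summary);
+ }
+ ```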
+
+
+
+
+
+
+
+
+
+
+ Configuring Log4j
+
+ The log4j.xml configuration file controls Log4j’s behavior.
+
+ It is used mainly to set a given level of log output for loggers. Without such configuration only ERROR and FATAL log events will show up.
+ Therefore you have to change the configuration to see trace, debug, info, or warn log events
+ in the Log4j output.
+ To turn on a given logging level for all loggers, find the log4j.root property and change its priority value to the desired
+ level. For instance:
+
+ Change the priority level to info to see informational messages:
+
+ This will result in a cacophony of log output, so you’ll generally want to adjust the level for a more specific logger. For
+ instance:
+
+ This sets the org.exist.security logger’s level to info.
+ Note that doing this causes the default log level of all org.exist.security loggers to be set to info. For
+ example, the level of the org.exist.security.xacml logger would also be set to info, unless of course the level for
+ this logger is specified explicitly.
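+ For illustration, the equivalent in a properties-style configuration fragment (Log4j also supports an XML format; the appender
+ name console is an assumption – consult the shipped configuration file for the actual appenders):
+
+ ```properties
+ # Default: only ERROR and FATAL events are output.
+ log4j.rootLogger=ERROR, console
+ # Raise verbosity for one logger subtree; child loggers inherit this level.
+ log4j.logger.org.exist.security=INFO
+ ```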
+ The Log4j configuration file is checked for modifications periodically, so changes may go unnoticed for a few minutes.
+ To force an immediate reload, or to make temporary changes to the Log4j configuration without changing the configuration
+ files, use the JMX MBeans:
+
+
+
+ Start jconsole
+ jconsole is located in the Java SDK’s bin directory. You can either double-click on it or run it from the command
+ line.
+
+
+ Select the target JVM
+ jconsole will list the Java processes running on your machine under the current user that have been configured to allow
+ local JMX connections.
+
+
+ Navigate to the Logging node in the MBean tree:
+
+
+ Select the MBeans tab.
+
+
+ Expand the appropriate folder.
+
+
+ In the servlet engine expand the WebAppContext folder and the folder named after your web app.
+
+
+ Select the Logging node (which should now be visible).
+
+
+
+
+ Perform desired operations and/or modifications:
+
+
+ To change the configuration file check interval, change the ConfigurationCheckInterval attribute to the desired number
+ of seconds. Note that this change will apply only for the duration of the JVM process unless you select the Loader node and its save
+ operation.
+
+
+ To force an immediate reload of the configuration file, press the reconfigure button on the operations
+ tab.
+
+
+ To examine other aspects of the configuration and make temporary changes, press the registerLoggers
+ button on the operations tab. Expand the Logging tree node and examine/operate upon its children.
+
+
+
+
+
+
+
+
+
+ Tips and Tricks
+
+
+ General
+
+
+
+
+ Carefully select appropriate logger names
+
+ Logger names should be meaningful and facilitate hierarchical configuration by administrators. Use a namespace prefix, either
+ xx or com.xxx, to avoid collisions with logger names from 3rd-party libraries and customizations.
+ For instance, one might have xx.method.server for general logging related to various low-level aspects of the method
+ server and xx.method.server.timing for logging specifically related to the method timing. Use Java class and package names
+ where these make sense.
+
+
+
+
+
+ Document your logger if appropriate
+
+ If the logger’s output is of interest to your customers, document it (by name) in /xxx/src_web/loggers.html, which
+ ends up in xxx’s codebase in an installation.
+
+
+
+
+ Select appropriate levels for log events
+
+ One of the big advantages of Log4j is that each logging event has an associated level. An administrator can now filter log messages
+ by level. The following table delineates when to use each level.
+
+
+
+
+
+ Level
+
+
+ Usage
+
+
+
+
+
+
+ Trace
+
+
+ Very low-level “execution is here” debugging/troubleshooting messages.
+
+
+
+
+ Debug
+
+
+ Messages of interest to those debugging or troubleshooting (with greater importance than trace messages). Probably only
+ meaningful to developers.
+
+
+
+
+ Info
+
+
+ General informational messages. Understandable by and/or of interest to non-developers as well.
+
+
+
+
+ Warn
+
+
+ Warnings of potential problems
+
+
+
+
+ Error
+
+
+ Error conditions
+
+
+
+
+ Fatal
+
+
+ Fatal error conditions, for instance a shutdown, a likely crash, or something equally severe.
+
+
+
+
+
+
+
+
+
+ Don’t go overboard with the Log4j configuration files
+
+ Log4j provides a great deal of ease and flexibility in its configuration. Its log viewers also make it easy to merge log data from
+ multiple Log4j logs or filter out the data of interest from a general purpose log. Given this it makes little sense to provide a complex
+ Log4j configuration file. The customer can change the configuration to have more or less specific log outputs as dictated by their needs
+ and desires.
+
+
+
+
+ Adjust log levels in Log4j configuration files where appropriate
+
+ The default for xxx is to output only error and fatal log messages. This generates relatively "quiet" logs
+ that only alert administrators to issues. There are, however, some log messages classified as informational that should be
+ output at this default level as well. Examples include periodic information such as summaries of requests serviced by the server
+ over a time interval and process health summaries. For these, extend the Log4j configuration to enable info level output for the
+ logger in question.
+
+
+
+
+ Don’t include redundant data in log messages
+
+ Administrators can easily configure Log4j log output to efficiently include current time, thread name, logger name, event level,
+ etc. Log4j includes this information in a standard, structured fashion so it is easily interpreted by log viewers like Chainsaw.
+ Inclusion of any of these pieces of information in the log message itself is therefore redundant and pointless.
+
+
+
+
+ Make use of AttributeListWrapper where appropriate
+
+ For some particularly significant logs it is important to give the administrator even more control, including:
-
- To change the configuration file check interval, change the
- ConfigurationCheckInterval attribute to the desired number of
- seconds. [Note that this change will apply only for the duration of
- the JVM process unless you select the Loader node and its “save”
- operation.]
-
-
- To force an immediate reload of the configuration file, press the
- “reconfigure” button on the operations tab.
-
-
- To examine other aspects of the configuration and make temporary
- (for the duration of the JVM process) changes to the logging
- configuration, press the “registerLoggers” button on the operation
- tab. Expand the “Logging” tree node and examine/operate upon its
- children.
-
+
+ Allowing them to select which attributes should be included in a given log message
+
+
+ Allowing them to specify the order of these attributes
+
+
+ Allowing them to specify the formatting of these attributes (e.g. comma delimited, with or without attribute names, etc.)
+
-
-
-
-
-
-
-
- Do’s and Don’ts
-
- General
- Do carefully select appropriate logger names
- Logger names should be somewhat meaningful and should facilitate hierarchical
- configuration by administrators. They should also be namespaced with “xx” or
- “com.xxx” so as to avoid collision with logger names from 3rd-party libraries and
- customizations. For instance, one might (and we do) have “xx.method.server” for
- general logging related to various low-level aspects of the method server and
- “xx.method.server.timing” for logging specifically related to the method timing. One
- can certainly use Java class and package names where these make sense, but an
- understandable and useful hierarchy is the important thing.
- Do document your logger in /XXXXXXXXX/src_web/loggers.html if appropriate
- If your logger is to be sent significant information of potential customer
- interest, then the logger should be documented (by name) in
- /XXXXXXXXX/src_web/loggers.html (which ends up in XXXXXXXXX’s codebase in an
- installation) unless there are special considerations to the contrary. Special
- considerations which would cause one not to document the logger include a likelihood
- that the logger will be removed in the near future or any other scenario wherein we
- do not want to raise customers’ awareness of a given logger.
- Do carefully select appropriate levels for log events One of the big advantages of
- log4j is that each logging event has an associated level and thus an administrator
- can easily filter out log messages by level. This advantage is nullified, however,
- if those outputting log events do not select appropriate log levels when they do so.
- The following table delineates when to use each level.
-
-
-
-
-
- Level
-
-
- Usage
-
-
-
-
-
-
- Trace
-
-
- Very low-level debugging “execution is here” debugging / troubleshooting
- types of events.
-
-
-
-
- Debug
-
-
- Messages of interest to those debugging or troubleshooting of a greater
- importance than trace messages; possibly still only meaningful to
- developers
-
-
-
-
- Info
-
-
- General informational messages; provide higher level and/or more
- important information than debug messages; understandable by and/or of
- interest to non-developers
-
-
-
-
- Warn
-
-
- Warnings of potential problems
-
-
-
-
- Error
-
-
- Error conditions
-
-
-
-
- Fatal
-
-
- Fatal error conditions, i.e. where the product is going to have to shut
- down, is likely to crash, or something equally severe
-
-
-
-
-
- Don’t go overboard with the log4j configuration files
- Log4j provides a great deal of ease and flexibility in its configuration. Its log
- viewers also make it easy to merge log data from multiple log4j logs or filter out
- the data of interest from a general purpose log. Given this it makes little sense to
- provide a complex log4j configuration file out-of-the-box. The customer can change
- the configuration to have more or less specific log outputs as dictated by their
- needs and desires.
- Do adjust log levels in log4j configuration files where appropriate
- Currently the out-of-the-box global default for XXXXXXXXX is to only output
- “error” and “fatal” log messages. This is a reasonable default in that it generates
- fairly quiet logs that only alert administrators to issues. There are, however, some
- cases where a given log event is best classified as being only informational (and
- thus is output as level “info”) and yet should be output to logs as an
- out-of-the-box default behavior. Examples include periodic summaries of requests
- serviced by the server over the last time interval and periodic process health
- summaries. In such cases one should add to the out-of-the-box log4j configuration to
- enable info level debug output for the logger in question.
- Don’t include redundant data in log messages
- Administrators can easily configure log4j log output to efficiently include:
-
-
- current time
-
-
- thread name
-
-
- logger name
-
-
- log event level
-
-
- Moreover, log4j includes this information in a standard, structured fashion which
- is therefore easily interpreted by log viewers like Chainsaw.
- Inclusion of any of these pieces of information in the log message itself is
- therefore redundant and pointless.
- Do make use of AttributeListWrapper where appropriate
- For some particularly significant logs it is important to give the administrator
- even more control including:
-
-
- Allowing them to select which of many possible attributes should be
- included in a given log message
-
-
- Allowing them to specify the order of these attributes
-
-
- Allowing them to specify the formatting of these attribute (e.g. comma
- delimited, with or without attribute names, etc)
-
-
- Examples of such cases include request access and periodic statistics logging.
- We have a re-usable utility for just this purpose,
- xx.jmx.core.AttributeListWrapper. See its existing usages for examples of how to use
- it and the Runtime Management design note for background information.
- Don’t be afraid to use log4j in any tier or JVM
- log4 currently is not included in the client jar set. The only reason for this is
- that currently no clients use log4j. We should not waste time and energy trying to
- avoid log4j logging from the client. If/when we need log4j.jar on the client we
- should simply include it in the client jar set. Java 5’s Pack200 technology reduces
- jars to 25% their original size on average, and log4j.jar is not that big to begin
- with.
- Performance
- The operation of log4j’s Logger class’s logging methods for issuing log events can
- be roughly summed up as:
-
- where:
+ Examples of such cases include request access and periodic statistics logging.
+ There is a re-usable utility for just this purpose, xx.jmx.core.AttributeListWrapper.
+
+
+
+
+ Don’t be afraid to use Log4j in any tier or JVM
+
+ Log4j is currently not included in the client jar set, only because no clients currently use Log4j. If
+ log4j.jar is ever needed in the client, simply include it in the client’s jar set. Java’s Pack200 technology reduces jars to
+ 25% of their original size on average.
+
+
+
+
+
+
+
+
+
+
+ Performance
+
+
+ The operation of Log4j’s Logger class’s logging methods for issuing log events can be summed up as:
+
+
-
- render() is simply a toString() call except when ‘message’ is an instance
- of a Class for which a specialized render has been registered.
-
-
- trace(), debug(), info(), warn(), error(), etc, simply call log() with the
- appropriate ‘level’.
-
+
+ render() is simply a toString() call, except when message is an instance of a class for which a
+ specialized renderer has been registered.
+
+
+ trace(), debug(), info(), warn(), error(), etc, call
+ log() with the appropriate level.
+
- Note that log4j documentation claims that isEnabledFor(), and the
- Logger.is*Enabled() method in general, are extremely fast (and they have some
- benchmark data to back this up). Thus log() should take very little time as well
- unless isEnabledFor() returns ‘true’.
- Given this, a few performance do’s and don’t become clear:
- Don’t reacquire a logger on each usage
- As already noted, the LogR.getLogger() (and underlying Logger.getLogger()) calls
- are relatively expensive. One should thus acquire these objects once per class or
- once instance of a class and re-use them in subsequent logging calls against that
- logger.
- Don’t assume a log’s level cannot change
- One of the big advantages of log4j is that administrators can easily change the
- level setting of any logger at any time. One can easily completely break this by
- following conventions common in existing XXXXXXXXX logging code, e.g.:
-
- There is no reason such code should ever occur. Logger’s isEnabledFor() and
- is*Enabled() routines are fast enough that we can pay the penalty to call them much
- more frequently in order to obtain the benefits of dynamically configurable logging
- levels.
- Don’t check whether the log level is enabled before every log call
- Do not write code such as:
-
- Such code results in essentially no savings when isDebugEnabled() is true but
- logger being checked twice to determine if it is debug enabled in the case when
- isDebugEnabled() is true. Besides this it makes the code more verbose and harder to
- read. In such cases one should simple do:
- logger.debug( "Some constant string" );
- Do avoid doing additional work for logging unless the logger is enabled
- If last example instead looked like:
-
- Then the isDebugEnabled() check should be performed. In this case two String
- concatenations and a potentially (somewhat) expensive method call can be saved when
- the logger is not debug enabled – at a cost of only an extra isDebugEnabled() call
- when the logger is debug enabled. See the “Conditional Computation of Data for
- Logging” section above for another example of this pattern.
- On the other hand, this technique should not be used when you are all but certain
- the given logger will be enabled. Usually this applies only to log events being
- emitted with an “error” or “fatal” level designation. In this case saving time for
- the few cases in which someone has actually disabled this level of logging is not
- worth while for the extra time required in the majority of cases.
- Another technique to avoid unnecessary work is to leverage the fact that Logger’s
- take objects, not Strings, as arguments. Thus one might have:
-
- Here one will pay the construction of “someObj” in all cases but will only pay for
- someObj.toString() when “logger” is enabled for info-level log events. Thus if very
- little work is done in the constructor and most is done in toString() this avoids
- doing work except when necessary – which is always a good thing.
- AttributeListWrapper (see above) is an example of this technique.
- Do hoist log level checks outside of very tight loops
- For cases where a given log level will usually not be enabled, e.g. for trace and
- debug log messages, one should avoid repeated checks within a tight loop. As an
- example:
-
- Trace level logging is rarely enabled and so in this example checking for this
- case ahead of time can save us from repeatedly verifying this in a tight loop. This
- does, however, come at the cost of making it impossible to dynamically enable trace
- logging for this logger in the middle of this loop. Due to this cost this technique
- should only be used for tight loops where the duration of the execution represented
- by the loop (and thus the time during which the logging behavior may lag the
- intended setting) is small.
-
+
+ Note that Log4j’s documentation claims that isEnabledFor(), and the Logger.is*Enabled() methods in general, are extremely
+ fast. Therefore log() should take very little time as well (unless isEnabledFor() returns true).
+ Given this, a few additional performance tips and tricks:
+
+
+
+ Don’t reacquire a logger on each usage
+
+ The LogR.getLogger() (and underlying Logger.getLogger()) calls are relatively expensive. One should
+ acquire these objects once per class (or once per instance of a class) and re-use them in subsequent logging calls.
+
+
+
+
+ Don’t assume a log’s level cannot change
+
+ One of the big advantages of Log4j is that administrators can easily change the level setting of any logger at any time. One can,
+ however, easily defeat this by following conventions common in existing logging code, e.g.:
+
+ Logger’s isEnabledFor() and is*Enabled() routines are fast enough to allow calling them more
+ frequently in order to obtain the benefits of dynamically configurable logging levels.
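+ A sketch of the anti-pattern and its fix (computeState() is a hypothetical helper):
+
+ ```java
+ // Anti-pattern: caches the level once at class load, defeating dynamic reconfiguration.
+ private static final boolean DEBUG = logger.isDebugEnabled();
+
+ // Preferred: check at the call site; is*Enabled() is cheap.
+ if (logger.isDebugEnabled()) {
+     logger.debug("state = " + computeState());  // hypothetical helper
+ }
+ ```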
+
+
+
+
+ Don’t check whether the log level is enabled before every log call
+
+ Do not write code such as:
+
+ This results in essentially no savings when isDebugEnabled() is false, and a redundant second check when it is
+ true. It also makes the code more verbose and harder to read. Instead do:
+ logger.debug( "Some constant string" );
+
+
+
+
+ Do avoid doing additional work for logging unless the logger is enabled
+
+ Assume the last example looked like:
+
+ Then the isDebugEnabled() check should be performed. In this case, two string concatenations and a potentially
+ (somewhat) expensive method call can be saved when the logger is not debug enabled. See the “Conditional Computation of Data for
+ Logging” section above for another example of this pattern.
+ On the other hand, this technique should not be used when you are all but certain the given logger will be enabled. Usually this
+ applies only to log events being emitted with an error or fatal level. In this case saving time for the few
+ cases in which someone has actually disabled this level of logging is not worthwhile.
+ Another technique to avoid unnecessary work is to leverage the fact that Logger methods take objects, not strings, as
+ arguments. Thus one might write:
+
+ Here one will pay for the construction of someObj in all cases but will only pay for someObj.toString()
+ when logging is enabled for info-level log events. Thus if very little work is done in the constructor and most is done in toString()
+ this avoids doing work except when necessary. AttributeListWrapper (see above) is an example of this technique.
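+ A sketch of this lazy-toString() technique (LazySummary, Request and expensiveFormat() are all hypothetical):
+
+ ```java
+ final class LazySummary {
+     private final Request req;                   // hypothetical request type
+     LazySummary(Request req) { this.req = req; } // constructor stays cheap
+     @Override public String toString() {
+         return expensiveFormat(req);             // expensive work deferred to here
+     }
+ }
+
+ // toString() runs only if Log4j actually decides to output the event.
+ logger.info(new LazySummary(req));
+ ```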
+
+
+
+
+ Do hoist log level checks outside of very tight loops
+
+ For cases where a given log level will usually not be enabled, for instance for trace and debug log messages, one should avoid
+ repeated checks within a tight loop. For example:
+
+ Trace level logging is rarely enabled and so in this example checking for this case ahead of time can save us from repeatedly
+ verifying this in a tight loop. This does, however, come at the cost of making it impossible to dynamically enable trace logging for
+ this logger in the middle of this loop. Because of this cost, the technique should only be used for tight loops where the duration
+ of the execution represented by the loop (and thus the time during which the logging behavior may lag the intended setting) is small.
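+ A sketch of the hoisted check (Item, items and process() are hypothetical):
+
+ ```java
+ // One isTraceEnabled() call instead of one per iteration.
+ final boolean trace = logger.isTraceEnabled();
+ for (Item item : items) {
+     process(item);                               // hypothetical work
+     if (trace) {
+         logger.trace("processed " + item.getId());
+     }
+ }
+ ```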
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/devguide_log4j/listings/listing-10.xml b/src/main/xar-resources/data/devguide_log4j/listings/listing-10.xml
index bc012e2d..4a6176f7 100644
--- a/src/main/xar-resources/data/devguide_log4j/listings/listing-10.xml
+++ b/src/main/xar-resources/data/devguide_log4j/listings/listing-10.xml
@@ -1,4 +1,4 @@
-
-
-
\ No newline at end of file
+
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/devguide_log4j/listings/listing-11.xml b/src/main/xar-resources/data/devguide_log4j/listings/listing-11.xml
index 8d208497..06dfd948 100644
--- a/src/main/xar-resources/data/devguide_log4j/listings/listing-11.xml
+++ b/src/main/xar-resources/data/devguide_log4j/listings/listing-11.xml
@@ -1,4 +1,4 @@
-
-
-
\ No newline at end of file
+
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/devguide_log4j/listings/listing-8.xml b/src/main/xar-resources/data/devguide_log4j/listings/listing-8.xml
index 2bdcf227..d32ee109 100644
--- a/src/main/xar-resources/data/devguide_log4j/listings/listing-8.xml
+++ b/src/main/xar-resources/data/devguide_log4j/listings/listing-8.xml
@@ -1,4 +1,4 @@
-
-
-
\ No newline at end of file
+
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/devguide_log4j/listings/listing-9.xml b/src/main/xar-resources/data/devguide_log4j/listings/listing-9.xml
index 128bc81a..90d22dce 100644
--- a/src/main/xar-resources/data/devguide_log4j/listings/listing-9.xml
+++ b/src/main/xar-resources/data/devguide_log4j/listings/listing-9.xml
@@ -1,4 +1,4 @@
-
-
-
\ No newline at end of file
+
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/devguide_manifesto/devguide_manifesto.xml b/src/main/xar-resources/data/devguide_manifesto/devguide_manifesto.xml
index 22c4cd4a..d609044a 100644
--- a/src/main/xar-resources/data/devguide_manifesto/devguide_manifesto.xml
+++ b/src/main/xar-resources/data/devguide_manifesto/devguide_manifesto.xml
@@ -1,259 +1,224 @@
-
- eXist-db Developer Manifesto
- March 2010
-
- TBD
-
-
-
-
-
-
- Introduction
-
- This document lays out guidelines for developers that are either committing directly
- to the eXist-db code base via the projects GitHub repository or developing
- externally for later incorporation into eXist-db.
-
-
-
-
-
- Communication
-
- Communication between developers and within Open Source projects can be a hard
- thing to achieve effectively, but to ensure the success of the project and
- contributions, we must all strive to improve on communicating our intentions.
- Public and open discussion of all new features and changes to existing features
- MUST always be undertaken. eXist-db is a community project and the community must be
- invited to provide input and feedback on development. Development discussion should
- take place through the eXist-db Development mailing list.
- If conflicts of interest occur during discussion, they must be resolved before any code changes are
- made. If conflicts cannot be resolved by the community, one of the core maintainers may act as a
- moderator. Maintainers are contributors who feel responsible for the project as a whole and have shown
- it in the past through their commitment and support. Right now this includes: Pierrick Brihaye, Wolfgang
- Meier, Leif-Jöran Olsson, Adam Retter and Dannes Wessels. We name those people, so you know who to talk
- to, but the list is in no way exclusive and may change over time.
-
-
-
-
-
- Maintainability
-
- All code accepted to the project must be maintainable otherwise there is the
- possibility that it will grow stale and without maintainers will be removed from the
- code base.
- To ensure a happy future for the code base each contributor has a responsibility
- to ensure:
-
-
- New code and bug-fixes must be accompanied by
- JUnit/XQuery/XSpec test cases. This helps us understand intention and avoid
- regressions.
-
-
- Code should be appropriately commented (including javadoc/xqdoc) so that
- intention is understood. Industry standard code formatting rules should be
- followed. This helps us read and understand contributions.
-
-
- Code must be appropriately with the developers name and current email
- address. This helps us contact contributors/maintainers should issues
- arrive.
-
-
- Consider the maintainability of new features, will you maintain and
- support them over years? If not, who will and how do you communicate what is
- required?
-
-
-
-
-
-
-
- Developing
-
-
-
- Follow Industry Standard coding conventions.
-
-
- eXist-db is now developed atop Sun Java 8, so make use of Java 8 features
- for cleaner, safer and more efficient code.
-
-
- New Features must be generic and applicable to an
- audience of more than one or two. Consider whether the eXist-db community
- would see this as a valuable feature; You should have already discussed this
- via the eXist-db Development mailing list! If a feature is just for you
- and/or your customer, it may have no place in the eXist-db main code
- base.
-
-
- Major new features or risky changes must be developed in their own branch.
- Once they have been tested (should include some user testing) they may then
- be integrated back into the main code base.
-
-
- Follow a RISC like approach to developing new functions. It is better to
- have a single function that is flexible than multiple function signatures
- for the same function. Likewise, do not replace two functions by offering
- one new super function. Functions should act like simple building blocks
- that can be combined together.
-
-
- The use of Static Analysis tools is highly recommended, these bring value
- by reducing risk, and are even valuable to the most highly skilled
- developers. Such tools include Checkstyle, FindBugs and
- PMD.
-
-
-
-
-
-
-
- Before Committing
-
-
-
-
- TEST, TEST and TEST again! See last section how to do this.
-
-
- Execute the JUnit test suite to ensure that there are no regressions, if
- there are regressions then do not commit!
-
-
- Execute the XQTS test suite to ensure that there are no regressions, if
- there are regressions then do not commit!
-
-
- If you are working in an area of performance, there is also a Benchmark
- test suite that you should run to ensure performance.
-
-
- When effecting major changes, make sure all the demo applications which
- ship with eXist-db are still working as expected. Testing of the main user
- interfaces including Java WebStart client and WebDAV helps to avoid
- surprises at release time.
-
-
- Documentation, whilst often overlooked this is critical to getting users
- to accept and test any new feature. If you add features without
- documentation they are worthless to the community.
-
-
- Atomicity! Please consider how you group your commits together. A feature
- should be contributed as an atomic commit, this enables co-developers to
- easily follow and test the feature. During development if you need to clean
- existing code up, please commit this at the time labelled as 'cleaning up',
- this makes your final commit much more concise.
-
-
- Very large commits. If possible, without breaking existing functionality,
- it can be useful to break very large commits up into a few smaller atomic
- commits spanning a couple of days. This allows other users to test and help
- identify any parts of your code which might introduce issues.
-
-
- Commit tagging, helps us to generate lists of what has changed been
- releases. Please prefix your commit messages with an appropriate tag:
-
-
- [bugfix]
-
-
- [lib-change]
-
-
- [feature]
-
-
- [ignore]
-
-
- [format-change]
-
-
- [documentation]
-
-
- [documentation-fix]
-
-
- [performance]
-
-
- [testsuite]
-
-
- [building]
-
-
- The change log scripts will ignore any messages which do not start with one
- of the tags above or whose tag is [ignore].
-
-
-
-
-
-
-
- Finally
-
- Open Source projects are not a democracy, although they are not far from that.
- Breaking, unknown and untested commits cause a lot of pain and lost hours to your
- fellow developers.
- Whilst we of course wish to encourage and nurture contributions to the project,
- these have to happen in a manner that everyone involved in the project can cope
- with. However, as an absolute last measure, if developers frequently fail to adhere
- to the Manifesto then Commit access to the eXist-db repository could be revoked by
- the core developers.
-
-
-
-
-
- Appendix: How to enable all and test
-
- It is essential that none of the existing code breaks because of your commit. Here is how to be sure
- all code can be built and tested:
-
-
- Edit conf.xml (or actually the original file conf.xml.tmpl)
-
-
- Uncomment all (really, all) builtin-modules under xpath
- /exist/xquery/builtin-modules
-
-
-
- Activate the spatial index by uncomment the index-module "spatial-index" under
- xpath /exist/indexer/modules (the corresponding function module is uncommented
- by the first step.
-
-
-
-
- Edit local.build.properties, switch-on all modules
-
-
- The Oracle module can be switched to false, the required jar is a bit
- difficult to download
-
-
- Switch all on modules on with the command
-
- cat build.properties | sed 's/false/true/g' > local.build.properties
-
-
-
-
-
+
+ eXist-db Developer Manifesto
+ 1Q18
+
+ exist
+
+
+
+
+
+ This document lays out guidelines for developers who are either committing directly to the eXist-db code base via the project's GitHub
+ repository or developing externally for later incorporation into eXist-db.
+
+
+
+
+ Communication
+
+ Communication between developers and within Open Source projects can be a hard thing to achieve effectively, but to ensure the success of
+ the project and contributions, we must all strive to improve on communicating our intentions.
+ Public and open discussion of all new features and changes to existing features must always be undertaken. eXist-db is
+ a community project and the community must be invited to provide input and feedback on development. Development discussions take place through
+ the eXist-db Development mailing
+ list.
+ If conflicts of interest occur during discussion, they must be resolved before code changes are made. If conflicts cannot be resolved by the
+ community, usually one of the core maintainers acts as a moderator. Core maintainers are contributors who feel responsible for the project as a
+ whole and have shown this in the past through their commitment and support. Right now this includes: Pierrick Brihaye, Wolfgang Meier, Leif-Jöran Olsson,
+ Adam Retter and Dannes Wessels. We name these people so you know who to talk to, but the list is in no way exclusive and may change over
+ time.
+
+
+
+
+
+ Maintainability
+
+ All code accepted must be maintainable. Otherwise there is the possibility that it will grow stale and, without maintainers, will be removed
+ from the code base.
+ To ensure a happy future for the code base, each contributor has a responsibility to ensure:
+
+
+ New code and bug-fixes must be accompanied by JUnit/XQuery/XSpec test cases. This helps us understand intention and
+ avoid regressions.
+
+
+ Code must be appropriately commented (including javadoc/xqdoc), so the intention is understood. Industry standard code formatting rules
+ must be followed. This helps us read and understand contributions.
+
+
+ Code must be tagged with the developer's name and email address. This helps us contact contributors/maintainers should issues
+ arise.
+
+
+ Consider the maintainability of new features: will you maintain and support them over years? If not, who will? How do you communicate
+ what is required?
+
+
+
+
+
+
+
+ Developing
+
+
+
+ Follow Industry Standard coding conventions.
+
+
+ eXist-db is now developed atop Sun Java 8, so make use of Java 8 features for cleaner, safer and more efficient code.
+
+
+ New Features must be generic and applicable to an audience of more than one or two. Consider whether the eXist-db
+ community would see this as a valuable feature (you should have already discussed this via the eXist-db Development mailing list). If a
+ feature is just for you and/or your customer, it may have no place in eXist-db's main code base.
+
+
+ Major new features or risky changes must be developed in their own branch. Once tested (this must include user testing) they may then be
+ integrated back into the main code base.
+
+
+ Follow a RISC like approach to developing new functions. It is better to have a single function that is flexible than multiple function
+ signatures for the same function. Likewise, do not replace two functions by offering one new super function. Functions should act like
+ simple building blocks that can be combined together.
+
+
+ The use of Static Analysis tools is highly recommended. These tools reduce risk, and are valuable even to the most highly skilled
+ developers. Such tools include Checkstyle, FindBugs and
+ PMD.
+
+
+
+
+
+
+
+ Before Committing
+
+
+
+
+ TEST, TEST and TEST again! See the last section for how to do this.
+
+
+ Execute the JUnit test suite to ensure that there are no regressions. If there are any regressions, do not commit!
+
+
+ Execute the XQTS test suite to ensure that there are no regressions. If there are any regressions, do not commit!
+
+
+ If you are working in an area of performance, there is a Benchmark test suite that you should run.
+
+
+ When effecting major changes, make sure all the demo applications which ship with eXist-db are still working as expected. Testing of the
+ main user interfaces, including Java WebStart client and WebDAV, helps to avoid surprises at release time.
+
+
+ Documentation, whilst often overlooked, is critical in getting users to accept and test any new feature. If you add features without
+ documentation they are worthless to the community.
+
+
+ Have a look at the Code Review Guide and take
+ its recommendations to heart!
+
+
+ Atomicity! Please consider how you group commits together. A feature should be contributed as an atomic commit; this enables
+ co-developers to easily follow and test the feature. If you need to clean up existing code during development, please commit it separately,
+ labelled 'cleaning up'. This makes your final commit much more concise.
+
+
+ If possible, without breaking existing functionality, it can be useful to break very large commits into a few smaller atomic ones,
+ spanning a couple of days. This allows other users to test and help identify any parts of your code which might introduce issues.
+
+
+ Commit tagging helps us to generate lists of what has changed between releases. Please prefix your commit messages with an appropriate
+ tag:
+
+
+ [bugfix]
+
+
+ [lib-change]
+
+
+ [feature]
+
+
+ [ignore]
+
+
+ [format-change]
+
+
+ [documentation]
+
+
+ [documentation-fix]
+
+
+ [performance]
+
+
+ [testsuite]
+
+
+ [building]
+
+
+ The change log scripts will ignore any messages which do not start with one of the tags above or whose tag is [ignore].
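The tag filter described above can be sketched in Python. This is illustrative only; `include_in_changelog` is a hypothetical name, and the actual change log scripts may differ.

```python
# Sketch of the change-log filter described above: a commit message is
# listed only if it starts with one of the known tags and that tag is
# not [ignore]. Illustrative only; the real scripts may differ.
KNOWN_TAGS = [
    "[bugfix]", "[lib-change]", "[feature]", "[ignore]", "[format-change]",
    "[documentation]", "[documentation-fix]", "[performance]",
    "[testsuite]", "[building]",
]

def include_in_changelog(message):
    # Find the tag the message starts with, if any
    tag = next((t for t in KNOWN_TAGS if message.startswith(t)), None)
    return tag is not None and tag != "[ignore]"

print(include_in_changelog("[bugfix] Fix collection locking"))  # True
print(include_in_changelog("no tag at all"))  # False
```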
+
+
+
+
+
+
+
+ Finally
+
+ Open Source projects are almost, but not quite, a democracy. Breaking, unknown and/or untested commits cause a lot of pain and lost hours to
+ your fellow developers.
+ Whilst we of course wish to encourage and nurture contributions to the project, these have to happen in a manner everyone involved in the
+ project can cope with. However, as an absolute last measure, if developers frequently fail to adhere to the Manifesto then commit access to the
+ eXist-db repository shall be revoked by the core developers.
+
+
+
+
+
+ How to enable all and test
+
+ It is essential that none of the existing code breaks because of your commit. Here is how to be sure all code can be built and
+ tested:
+
+
+ Edit conf.xml (or actually the original file conf.xml.tmpl)
+
+
+ Uncomment all (really, all) built-in modules under XPath /exist/xquery/builtin-modules
+
+
+
+ Activate the spatial index by uncommenting the index-module spatial-index under XPath /exist/indexer/modules (the
+ corresponding function module is uncommented in the first step).
+
+
+
+
+ Edit local.build.properties and switch on all modules
+
+
+ The Oracle module can be left switched off; the required jar is a bit difficult to download
+
+
+ Switch all modules on with the command
+ cat build.properties | sed 's/false/true/g' > local.build.properties
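The same switch can be sketched without sed, which is useful e.g. on Windows. `switch_on_all` is a hypothetical helper, not part of the eXist-db build.

```python
# Python equivalent of: cat build.properties | sed 's/false/true/g'
# switch_on_all is an illustrative helper, not part of the eXist-db build.
def switch_on_all(properties):
    return properties.replace("false", "true")

# Writing local.build.properties from build.properties would look like:
# with open("build.properties") as src, open("local.build.properties", "w") as dst:
#     dst.write(switch_on_all(src.read()))
print(switch_on_all("include.module.oracle = false"))  # include.module.oracle = true
```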
+
+
+
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/devguide_rest/devguide_rest.xml b/src/main/xar-resources/data/devguide_rest/devguide_rest.xml
index d3083889..317db023 100644
--- a/src/main/xar-resources/data/devguide_rest/devguide_rest.xml
+++ b/src/main/xar-resources/data/devguide_rest/devguide_rest.xml
@@ -1,377 +1,329 @@
-
- Developer's Guide
- September 2009
-
- TBD
-
-
-
-
-
-
- REST-Style Web API
-
- eXist-db provides a REST-style (or RESTful) API through HTTP,
- which provides the simplest and quickest way to access the database. To implement
- this API, all one needs is an HTTP client, which is provided by nearly all
- programming languages and environments. However, not all of the database features
- are available using this approach.
- When running eXist-db as a stand-alone server - i.e. when the database has been
- started using the shell-script bin/server.sh (Unix) or batch
- file bin/server.bat (Windows/DOS) - HTTP access is supported
- through a simple, built-in web server. This web server however has limited
- capabilities restricted to the basic operations defined by eXist's REST API (e.g.
- GET, POST, PUT and ,
- DELETE).
- When running in a servlet-context, this same server functionality is provided by
- the EXistServlet. In the standard eXist distribution,
- this servlet is configured to have a listen address at:
- http://localhost:8080/exist/rest/
- Both the stand-alone server and the servlet rely on Java class
- org.exist.http.RESTServer to do the actual work.
- The server treats all HTTP request paths as paths to a database collection, i.e.
- all resources are read from the database instead of the file system. Relative paths
- are therefore resolved relative to the database root collection. For example, if you
- enter the following URL into your web-browser:
- http://localhost:8080/exist/rest/db/shakespeare/plays/hamlet.xml
- the server will receive an HTTP GET request for the resource
- hamlet.xml in the collection
- /db/shakespeare/plays in the database. The server will look for
- this collection, and check if the resource is available, and if so, retrieve its
- contents and send them back to the client. If the document does not exist, an
- HTTP 404 (Not Found) status response will be returned.
- To keep the interface simple, the basic database operations are directly mapped to
- HTTP request methods wherever possible. The following request methods are supported:
-
-
- GET
-
- Retrieves a representation of the resource or collection from the
- database. XQuery and XPath queries may also be specified using GET's
- optional parameters applied to the selected resource.
-
-
-
- PUT
-
- Uploads a resource onto the database. If required, collections are
- automatically created, and existing resources are overwritten.
-
-
-
- DELETE
-
- Removes a resource (document or collection) from the database.
-
-
-
- POST
-
- Submits data in the form of an XML fragment in the content of the
- request which specifies the action to take. The fragment can be either
- an XUpdate document or a query request. Query requests are used to pass
- complex XQuery expressions too large to be URL-encoded.
-
-
-
-
-
-
-
- HTTP Authentication
-
- The REST server and servlet support basic HTTP authentication, and only valid
- users can access the database. If no username and password are specified, the
- server assumes a "guest" user identity, which has limited capabilities. If the
- username submitted is not known, or an incorrect password is submitted, an error
- page (Status code 403 - Forbidden) is returned.
-
-
-
-
-
- GET Requests
-
- If the server receives an HTTP GET request, it first tries to locate known
- parameters. If no parameters are given or known, it will try to locate the
- collection or document specified in the URI database path, and return a
- representation of this resource the client. Note that when the located resource
- is XML, the returned content-type attribute value will be
- application/xml, and for binary resources
- application/octet-stream.
- If the path resolves to a database collection, the retrieved results are
- returned as an XML fragment. An example fragment is shown below:
-
- XML Results for GET Request for a Collection
-
-
- If an xml-stylesheet processing instruction is found in an
- XML document being requested, the database will try to apply the stylesheet
- before returning the document. Note that in this case, any relative path in a
- hypertext link will be resolved relative to the location of the source document.
- For example, if the document hamlet.xml, which is stored in
- collection /db/shakespeare/plays contains the XSLT
- processing instruction:
- <?xml-stylesheet type="application/xml" href="shakes.xsl"?>
- then the database will try to load the stylesheet from
- /db/shakespeare/plays/shakes.xsl and apply it to the
- document.
- Optionally, GET accepts the following request parameters, which must be
- URL-encoded:
-
-
- _xsl=XSL Stylesheet
-
-
- Applies an XSL stylesheet to the requested resource. If the
- _xsl parameter contains
- an external URI, the corresponding external resource is retrieved.
- Otherwise, the path is treated as relative to the database root
- collection and the stylesheet is loaded from the database. This
- option will override any XSL stylesheet processing instructions
- found in the source XML file.
- Setting _xsl to
- no disables any stylesheet processing. This is
- useful for retrieving the unprocessed XML from documents that have a
- stylesheet declaration.
-
- If your document has a valid XSL stylesheet declaration, the
- web browser may still decide to apply the XSL. In this case, passing
- _xsl=no has no visible effect, though the XSL
- is now rendered by the browser, not eXist.
-
-
-
-
- _query=XPath/XQuery Expression
-
-
- Executes a query specified by the request. The collection or
- resource referenced in the request path is added to the set of
- statically known documents for the query.
-
-
-
- _indent=yes | no
-
-
- Returns indented pretty-print XML. The
- default value is yes.
-
-
-
- _encoding=Character Encoding Type
-
-
- Sets the character encoding for the resultant XML. The
- default value is UTF-8.
-
-
-
- _howmany=Number of Items
-
-
- Specifies the number of items to return from the resultant
- sequence. The default value is 10.
-
-
-
- _start=Starting Position in Sequence
-
-
- Specifies the index position of the first item in the result
- sequence to be returned. The default value is 1.
-
-
-
- _wrap=yes | no
-
-
- Specifies whether the returned query results are to be wrapped
- into a surrounding exist:result element. The
- default value is yes.
-
-
-
- _source=yes | no
-
-
- Specifies whether the query should display its source code instead of being executed. The
- default value is no, but see the allow-source section in descriptor.xml
- to explicitely allow such a behaviour.
-
-
-
- _cache=yes | no
-
-
- If set to "yes", the query results of the current query are stored
- into a session on the server. A session id will be returned with the
- response. Subsequent requests can pass this session id via the
- _session parameter. If the server finds a valid session
- id, it will return the cached results instead of re-evaluating the query.
- For more info see below.
-
-
-
-
- _session=session id
-
-
- Specifies a session id returned by a previous query request.
- If the session is valid, query results will be read from the
- cached session.
-
-
-
- _release=session id
-
-
- Release the session identified by the session id.
-
-
-
- EXAMPLE: The following URI will find all SPEECH elements in
- the collection /db/shakespeare with "Juliet" as the
- SPEAKER. As specified, it will return 5 items from the
- result sequence, starting at position 3:
- http://localhost:8080/exist/rest/db/shakespeare?_query=//SPEECH[SPEAKER=%22JULIET%22]&_start=3&_howmany=5
-
-
-
-
-
- PUT Requests
-
- Documents can be stored or updated using an HTTP PUT request. The request URI
- points to the location where the document will be stored. As defined by the HTTP
- specifications, an existing document at the specified path will be updated, i.e.
- removed, before storing the new resource. As well, any collections defined in
- the path that do not exist will be created automatically.
- For example, the following Python script stores a document (the name of which
- is specified on the command-line) in the database collection
- /db/test, which will be created if this collection does not
- exist. Note that the HTTP header field content-type is
- specified as application/xml, since otherwise the document is stored
- as a binary resource.
-
- PUT Example using Python (See: samples/http/put.py)
-
-
-
-
-
-
-
- DELETE Requests
-
- DELETE removes a collection or resource from the database. For this, the
- server first checks if the request path points to an existing database
- collection or resource, and once found, removes it.
-
-
-
-
-
- POST Requests
-
- POST requests require an XML fragment in the content of the request, which
- specifies the action to take.
- If the root node of the fragment uses the XUpdate namespace
- (http://www.xmldb.org/xupdate), the fragment is sent to
- the XUpdateProcessor to be processed. Otherwise, the root node will have the
- namespace for eXist requests
- (http://exist.sourceforge.net/NS/exist), in which case the
- fragment is interpreted as an extended query request.
- Extended query requests can be used to post complex XQuery scripts that are too
- large to be encoded in a GET request.
- The structure of the POST XML request is as follows:
-
- Extended Query Request
-
-
- The root element query identifies the fragment as an extended
- query request, and the XQuery expression for this request is enclosed in the
- text element. The start, max, cache and session-id attributes
- have the same meaning as the corresponding GET parameters. Optional output properties, such as
- pretty-print, may be passed in the properties element. An
- example of POST for Perl is provided below:
-
- POST Example using Perl (See: samples/http/search.pl)
-
-
-
- Please note that you may have to enclose the XQuery expression in a CDATA
- section (i.e. <![CDATA[ ... ]]>) to avoid parsing errors (this
- is not shown above).
-
- The returned query results are enclosed in the exist:result
- element, which are shown below for the above example:
-
- Returned Results for POST Request
-
-
-
-
-
-
-
- Calling Stored XQueries
-
- The REST interface supports stored XQueries on the server: if the target
- resource of a GET or POST request is a binary resource with the mime-type
- application/xquery, the REST server will try to compile and
- execute it as an XQuery. The XQuery has access to the entire HTTP context, including
- parameters and session attributes.
- Stored XQueries are a good way to provide dynamic
- views on the data or create small services. However, they can do more:
- since you can also store binary resources like
- images, CSS stylesheets or Javascript files into a database collection, it is
- easily possible to serve a complex application entirely out of the database.
-
- Please have a look at the example
- Using XQuery for Web Applications
- on the demo server.
-
-
-
-
-
- Cached Query Results
-
- When executing queries using GET or POST, the server can cache query results
- in a server-side session. The results are cached in memory. In general, memory
- consumption will be low for query results which reference nodes stored in the
- database. It is high for nodes constructed within the XQuery itself.
- To create a session and store query results into it, pass parameter
- _cache=yes with a GET request or set attribute
- cache="yes" within the XML payload of a POST query request.
- The server will execute the query as usual. If the result sequence contains
- more than one item, the entire sequence will be stored into a newly created
- session.
- The id of the created session is included with the response. For requests
- which return a exist:result wrapper element, the session id
- will be specified in the exist:session attribute. The session
- id is also available in the HTTP header X-Session-Id. The
- following example shows the header and exist:result
- tag returned by the server:
-
- Sample Response
-
-
- The session id can then be passed with subsequent requests to retrieve
- further chunks of data without re-evaluating the query. For a GET request, pass
- the session id with parameter _session. For a POST request,
- add an attribute session="sessionId" to the XML content of the
- request.
- If the session does not exist or has timed out, the server will simply
- re-evaluate the query. The timeout is set to 2 minutes.
- A session can be deleted by sending a GET request to an arbitrary collection
- url. The session id is passed in a parameter _release:
- http://localhost:8080/exist/rest/db?_release=0
-
-
-
\ No newline at end of file
+ schematypens="http://purl.oclc.org/dsdl/schematron"?>
+
+ REST-Style Web API
+ 1Q18
+
+ application-development
+ interfaces
+
+
+
+
+
+ eXist-db provides a REST-style (or RESTful) API through HTTP, offering a simple and quick way to access the
+ database. To use this API, all one needs is an HTTP client, which is provided by nearly all programming languages and environments. Or simply use
+ a web-browser…
+
+
+
+
+ Introduction
+
+ In the standard eXist-db configuration, the system will listen for REST requests at:
+ http://localhost:8080/exist/rest/
+
+ The server treats all HTTP request paths as paths to a database collection (instead of the file system). Relative paths are resolved
+ relative to the database root collection. For instance:
+ http://localhost:8080/exist/rest/db/shakespeare/plays/hamlet.xml
+ The server will receive an HTTP GET request for the resource hamlet.xml in the collection
+ /db/shakespeare/plays. It will look for this collection and check if the resource is available. If so, it will retrieve
+ its contents and send them back to the client. If the document does not exist, an HTTP 404 (Not Found) status response will
+ be returned.
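The path-to-resource mapping described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the base URL assumes the default configuration, and `resource_url` is a hypothetical helper.

```python
from urllib.parse import quote

BASE = "http://localhost:8080/exist/rest"  # default REST listen address

def resource_url(db_path):
    # Request paths are resolved against the database root collection
    return BASE + quote("/" + db_path.lstrip("/"))

print(resource_url("db/shakespeare/plays/hamlet.xml"))
# http://localhost:8080/exist/rest/db/shakespeare/plays/hamlet.xml
```

An HTTP GET on the resulting URL returns the document's contents, or a 404 status if it does not exist.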
+
+ To keep the interface simple, the basic database operations are directly mapped to HTTP request methods wherever possible:
+
+
+
+ GET
+
+
+ Retrieves a resource or collection from the database. XQuery and XPath queries may also be specified using GET's optional parameters
+ applied to the selected resource. See .
+
+
+
+
+ PUT
+
+
+ Uploads a resource to the database. If required, collections are automatically created and existing resources overwritten. See .
+
+
+
+
+ DELETE
+
+
+ Removes a resource (document or collection) from the database. See .
+
+
+
+
+ POST
+
+
+ Submits an XML fragment in the content of the request. This fragment specifies the action to take. The fragment can be either an
+ XUpdate document or a query request. Query requests are used to pass complex XQuery expressions too large to be URL-encoded. See .
+
+
+
+
+
+ When running eXist-db as a stand-alone server (i.e. when the database has been started using the shell-script bin/server.sh
+ (Unix) or batch file bin/server.bat (Windows/DOS)), HTTP access is supported through a simple, built-in web server. This
+ web server has limited capabilities, restricted to the basic operations defined by eXist's REST API (GET,
+ POST, PUT and DELETE).
+ When running in a servlet-context (the usual way of starting eXist-db), this same server functionality is provided by the
+ EXistServlet.
+ Both the stand-alone server and the servlet rely on Java class org.exist.http.RESTServer to do the actual work.
+
+
+
+
+
+
+ HTTP Authentication
+
+ Authentication is done through the basic HTTP authentication mechanism, so only authenticated users can access the database. If no username
+ and password are specified, the server assumes a "guest" user identity, which has limited capabilities. If the username submitted is not known,
+ or an incorrect password is submitted, an error page (403 - Forbidden) is returned.
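The credential handling can be sketched as follows: basic HTTP authentication sends a base64-encoded user:password pair in the Authorization header. The admin/secret credentials below are placeholders, and `basic_auth_header` is an illustrative helper.

```python
import base64

def basic_auth_header(user, password):
    # Basic authentication: base64("user:password") in the Authorization header
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": "Basic " + token}

print(basic_auth_header("admin", "secret"))
# {'Authorization': 'Basic YWRtaW46c2VjcmV0'}
```

Most HTTP client libraries construct this header automatically when given a username and password.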
+
+
+
+
+
+ GET Requests
+
+ If the server receives an HTTP GET request, it first checks the request for known parameters. If no parameters are given or
+ known, it will try to locate the collection or document specified in the URI database path and return a representation of this resource to the
+ client.
+
+ When the located resource is XML, the returned content-type attribute value is application/xml and
+ for binary resources application/octet-stream.
+
+ If the path resolves to a database collection, the retrieved results are returned as an XML fragment. For example:
+
+
+ If an xml-stylesheet processing instruction is found in a requested XML document, the database will try to apply this
+ stylesheet before returning the document. A relative path will be resolved relative to the location of the source document. For example, if the
+ document hamlet.xml, which is stored in collection /db/shakespeare/plays contains the XSLT processing
+ instruction:
+ <?xml-stylesheet type="application/xml" href="shakes.xsl"?>
+ The database will load the stylesheet from /db/shakespeare/plays/shakes.xsl.
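Since GET parameters must be URL-encoded, building a query request can be sketched as follows (standard library only; the `_query`, `_start` and `_howmany` parameter names follow this section, and the base URL assumes the default configuration):

```python
from urllib.parse import urlencode

# Find SPEECH elements spoken by Juliet, returning 5 items starting at
# position 3; urlencode percent-encodes the XQuery expression for us.
params = {
    "_query": '//SPEECH[SPEAKER="JULIET"]',
    "_start": 3,
    "_howmany": 5,
}
url = "http://localhost:8080/exist/rest/db/shakespeare?" + urlencode(params)
print(url)
```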
+
+ GET accepts the following optional request parameters (which must be URL-encoded):
+
+
+
+ _xsl=XSL Stylesheet
+
+
+ Applies an XSL stylesheet to the requested resource. A relative path is considered relative to the database root collection. This
+ option will override any XSL stylesheet processing instructions found in the source XML file.
+ Setting _xsl to no disables any stylesheet processing. This is useful for retrieving unprocessed
+ XML from documents that have a stylesheet declaration.
+
+ If your document has a valid XSL stylesheet declaration, the web browser may still decide to apply the XSL. In this case, passing
+ _xsl=no has no visible effect, though the XSL is now rendered by the browser, not
+ eXist.
+
+
+
+
+
+ _query=XPath/XQuery Expression
+
+
+ Executes the query specified. The collection or resource referenced in the request path is added to the set of statically known
+ documents for the query.
+
+
+
+
+ _indent=yes | no
+
+
+ Whether to return indented pretty-printed XML. The default value is yes.
+
+
+
+
+ _encoding=Character Encoding Type
+
+
+ Sets the character encoding for the resulting XML. The default value is UTF-8.
+
+
+
+
+ _howmany=Number of Items
+
+
+ Specifies the maximum number of items to return from the result sequence. The default value is 10.
+
+
+
+
+ _start=Starting Position in Sequence
+
+
+ Specifies the index position of the first item in the result sequence to return. The default value is 1.
+
+
+
+
+ _wrap=yes | no
+
+
+ Specifies whether the returned query results must be wrapped in a parent exist:result element. The default value is
+ yes.
+
+
+
+
+ _source=yes | no
+
+
+ Specifies whether the query should display its source code instead of being executed. The default value is no. See
+ the allow-source section in descriptor.xml about explicitly allowing this behaviour.
+
+
+
+
+ _cache=yes | no
+
+
+ If set to yes, the results of the current query are stored in a session on the server. A session id will be returned with
+ the response. Subsequent requests can pass this session id via the _session parameter. If the server finds a valid
+ session id, it will return the cached results instead of re-evaluating the query. See below.
+
+
+
+
+ _session=session id
+
+
+ Specifies a session id returned by a previous query request. Query results will be read from the cached session.
+
+
+
+
+ _release=session id
+
+
+ Release the session identified by session id.
+
+
+
+                As an example, the following URI will find all SPEECH elements in the collection /db/shakespeare with "Juliet"
+ as the SPEAKER. As specified, it will return 5 items from the result sequence, starting at position 3:
+ http://localhost:8080/exist/rest/db/shakespeare?_query=//SPEECH[SPEAKER=%22JULIET%22]&_start=3&_howmany=5
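                The same URL can be assembled programmatically; a small Python sketch using only the standard library (server address assumed to be the default), where urlencode takes care of escaping the quotes and brackets in the query expression:

```python
from urllib.parse import urlencode

# Parameters of the example GET request above.
params = {
    "_query": '//SPEECH[SPEAKER="JULIET"]',
    "_start": 3,
    "_howmany": 5,
}
url = "http://localhost:8080/exist/rest/db/shakespeare?" + urlencode(params)
print(url)
```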
+
+
+
+
+
+ PUT Requests
+
+ Documents can be stored or updated in the database using an HTTP PUT request. The request URI points to the location where the
+ document must be stored. As defined by the HTTP specifications, an existing document at the specified path will be updated. Any collections
+ defined in the path that do not exist are created automatically.
+                For example, the following Python script stores a document (the name is specified on the command-line) in the database collection
+                /db/test, which will be created if it does not exist. Note that the HTTP header field content-type is
+                specified as application/xml, since otherwise the document would be stored as a binary resource.
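                A minimal Python sketch of such a PUT request, using only the standard library (the document body, path and server address are illustrative):

```python
import urllib.request

# Prepare (but do not yet send) a PUT request that stores hamlet.xml in
# /db/test; the explicit application/xml content type keeps the document
# from being stored as a binary resource.
body = b"<play><title>Hamlet</title></play>"
request = urllib.request.Request(
    "http://localhost:8080/exist/rest/db/test/hamlet.xml",
    data=body,
    method="PUT",
    headers={"Content-Type": "application/xml"},
)
# urllib.request.urlopen(request) would transmit it to a running server.
```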
+
+
+
+
+
+
+
+ DELETE Requests
+
+
+ DELETE removes a collection or resource from the database.
+
+
+
+
+
+ POST Requests
+
+
+ POST requests require an XML fragment in the content of the request. This fragment specifies the action to take.
+
+
+ If the root node of the fragment uses the XUpdate namespace (http://www.xmldb.org/xupdate), the fragment is sent to
+ the XUpdateProcessor to be processed.
+
+
+ Otherwise the root node must have the namespace for eXist requests (http://exist.sourceforge.net/NS/exist). The
+ fragment is interpreted as an extended query request. Extended query requests can be used to post complex XQuery
+ scripts that are too large to be encoded in a GET request.
+
+
+
+
+ The structure of the POST XML request is as follows:
+
+
+ The root element query identifies the fragment as an extended query request. The XQuery expression for this request is
+ enclosed in the text element. The start, max, cache and session-id
+ attributes have the same meaning as the corresponding GET parameters (see ).
+
+ You may have to enclose the XQuery expression in a CDATA section (i.e. <![CDATA[ ... ]]>) to avoid parsing errors.
+
+ Optional output properties, such as pretty-print, can be passed in the properties element.
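                To make the structure concrete, the body of such an extended query request could be assembled as follows (a sketch; the query is a placeholder, and the resulting string would be sent as the content of the HTTP POST):

```python
# Build an extended query request body. The element and attribute names
# follow the structure described above; the XQuery expression is wrapped
# in a CDATA section to avoid parsing errors.
xquery = '//SPEECH[SPEAKER="JULIET"]'
payload = (
    '<query xmlns="http://exist.sourceforge.net/NS/exist" '
    'start="1" max="20" cache="yes">'
    "<text><![CDATA[{}]]></text>"
    "</query>"
).format(xquery)
print(payload)
```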
+ An example of POST for Perl is provided below:
+
+
+ The returned query results are enclosed in an exist:result element:
+
+
+
+
+
+
+ Calling Stored XQueries
+
+ The REST interface supports executing stored XQueries on the server. If the target resource of a GET or POST
+ request is a binary resource with the mime-type application/xquery, the REST server will try to compile and execute it as an
+ XQuery script. The script has access to the entire HTTP context, including parameters and session attributes.
+ Stored XQueries are a good way to provide dynamic views on data or create small services. However, they can do more: because you can also
+ store binary resources like images, CSS stylesheets or Javascript files into a database collection, it is entirely possible to serve a complex
+ application out of the database. For instance, have a look at the example Using XQuery for Web Applications on the demo
+ server.
+
+
+
+
+
+ Cached Query Results
+
+ When executing queries using GET or POST, the server is able to cache query results in a server-side session.
+ These results are cached in memory.
+
+ Memory consumption will be low for query results which reference nodes stored in the database and high for nodes constructed within the
+ XQuery itself.
+
+ To create a session and store query results, pass _cache=yes with a GET request or set attribute
+ cache="yes" within the XML payload of a POST query request. The server will execute the query as usual. If the
+ result sequence contains more than one item, the entire sequence will be stored into a newly created session.
+            The id of the created session is included in the response. For requests which return an exist:result wrapper element, the session
+            id will be specified in the exist:session attribute. The session id is also available in the HTTP header
+            X-Session-Id.
+            The following shows an example of the HTTP header and exist:result tag returned by the server:
+
+
+ The session id can be passed with subsequent requests to retrieve further chunks of data without re-evaluating the query. For a
+ GET request, pass the session id with parameter _session. For a POST request, add an attribute
+ session="sessionId" to the XML content of the request.
+ If the session does not exist or has timed out, the server will re-evaluate the query. The timeout is set to 2 minutes.
+ A session can be deleted by sending a GET request to an arbitrary collection URL. Pass the session id in the
+ _release parameter:
+ http://localhost:8080/exist/rest/db?_release=0
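            The whole round trip can be sketched as URL construction in Python (the session id 0 is taken from the example above; a real client would read it from the X-Session-Id response header or the exist:session attribute):

```python
from urllib.parse import urlencode

base = "http://localhost:8080/exist/rest/db/shakespeare"

# 1. Run the query and ask the server to cache the result sequence.
first = base + "?" + urlencode({"_query": "//SPEECH", "_cache": "yes"})

# 2. Fetch the next chunk from the cached session instead of re-running.
session_id = "0"
next_chunk = base + "?" + urlencode(
    {"_session": session_id, "_start": 11, "_howmany": 10})

# 3. Finally release the session.
release = "http://localhost:8080/exist/rest/db?" + urlencode(
    {"_release": session_id})
print(first, next_chunk, release, sep="\n")
```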
+
+
+
diff --git a/src/main/xar-resources/data/devguide_soap/devguide_soap.xml b/src/main/xar-resources/data/devguide_soap/devguide_soap.xml
index 1bee70d2..9e9cf79e 100644
--- a/src/main/xar-resources/data/devguide_soap/devguide_soap.xml
+++ b/src/main/xar-resources/data/devguide_soap/devguide_soap.xml
@@ -1,63 +1,75 @@
-
- Developer's Guide
- September 2009
-
- TBD
-
-
-
-
-
-
- SOAP
-
- Beginning with version 0.8, eXist-db provides a SOAP interface as an alternative to
- XML-RPC. Programming with SOAP is slightly more convenient than XML-RPC. While you
- have to write XML-RPC method calls by hand, most SOAP tools will automatically
- create the low-level code from a given WSDL service description. Also fewer methods
- are needed to exploit the same functionality. On the other hand, SOAP toolkits tend
- to be complex.
- eXist-db uses the Axis SOAP toolkit from Apache, which runs as a servlet. The Tomcat
- webserver shipped with eXist has been configured to start Axis automatically, and
- will listen on port 8080: http://localhost:8080/exist/services.
- Note however that SOAP is not available in the stand-alone server.
- The interface has been tested using various clients, including Perl (SOAP::Lite)
- and the Microsoft .NET framework. The client stubs needed to access the SOAP
- interface from Java have been automatically generated by Axis and are included in
- the distribution.
- eXist-db provides two web services: one that contains methods to query the server and
- retrieve documents, and a second for storing and removing documents and collections.
- The first will by default listen on:
- http://localhost:8080/exist/services/Query
- while the second is available on:
- http://localhost:8080/exist/services/Admin
- Both services are described in the Java docs regarding their interfaces. Visit:
- org.exist.soap.Query and
- org.exist.soap.Admin for more
- information.
- The following SOAP example (available at:
- samples/org/exist/examples/soap/GetDocument.java)
- demonstrates how to retrieve a document from the database:
-
- Retrieving a document (SOAP)
-
-
- In this example, the Query client stub class has been
- automatically generated by the WSDL service description, and has methods for each of
- the operations defined in WSDL. You will find the web service description file
- query.wsdl in directory src/org/exist/soap. You may also get the WSDL
- directly from the server by pointing your web browser to
- http://localhost:8080/exist/services/Query?WSDL.
- To use the services provided, the client first has to establish a connection with
- the database. This is done by calling connect() with a
- valid user id and password. connect() returns a session id,
- which can then be passed to any subsequent method calls.
- To retrieve a resource we simply call
- Query.getResource(). And to release the current session,
- the method Query.disconnect() is called. Otherwise the
- session will remain valid for at least 60 minutes.
-
+
+ SOAP Interface Developer's Guide
+ 1Q18
+
+ java-development
+ interfaces
+
+
+
+
+
+ This article explains how to add a SOAP interface to eXist-db using Java code.
+
+
+
+
+ Introduction
+
+ eXist-db provides a SOAP interface as an alternative to XML-RPC. Programming with SOAP is slightly more convenient than XML-RPC. While you
+ have to write XML-RPC method calls by hand, most SOAP tools will automatically create the low-level code from a given WSDL service description.
+ Also fewer methods are needed to exploit the same functionality. On the other hand, SOAP toolkits tend to be complex.
+
+ SOAP is not available in the (default configuration) stand-alone server.
+
+ eXist-db uses the Axis SOAP toolkit from Apache, which runs as a servlet. The Tomcat webserver shipped with eXist has been configured to
+ start Axis automatically, and will listen on port 8080: http://localhost:8080/exist/services.
+
+ The interface has been tested using various clients, including Perl (SOAP::Lite) and the Microsoft .NET framework. The client stubs needed
+ to access the SOAP interface from Java have been automatically generated by Axis and are included in the distribution.
+
+
+
+
+
+
+ Web services
+
+      eXist-db provides two web services:
+
+
+ A service to query the server and retrieve documents. This will listen on:
+ http://localhost:8080/exist/services/Query
+
+
+ A service for storing and removing documents and collections. This will listen on:
+ http://localhost:8080/exist/services/Admin
+
+
+
+    The interfaces of both services are described in the Javadocs: see org.exist.soap.Query and org.exist.soap.Admin.
+
+
+
+
+
+ Example
+
+ The following example demonstrates how to retrieve a document from the database using SOAP:
+
+
+
+ The Query client stub class has been automatically generated by the WSDL service description. It has methods for each of
+ the operations defined in the WSDL. You will find the web service description file query.wsdl in directory
+ src/org/exist/soap.
+ To use the services provided, the client first has to establish a connection with the database. This is done by calling
+ connect() with a valid user id and password. connect() returns a session id, which can then be passed to
+ any subsequent method calls.
+ To retrieve a resource simply call Query.getResource().
+ To release the current session, the method Query.disconnect() is called. Otherwise the session will remain valid for at
+ least 60 minutes.
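    Python's standard library has no SOAP client, but the envelope a toolkit sends for a hypothetical connect(userId, password) call can be hand-assembled to show what goes over the wire (the parameter names are illustrative, not taken from the actual WSDL):

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

# Hand-rolled SOAP envelope for a connect() call. Toolkits such as Axis
# generate this from the WSDL; this sketch only illustrates the shape.
envelope = ET.Element(ET.QName(SOAP_ENV, "Envelope"))
body = ET.SubElement(envelope, ET.QName(SOAP_ENV, "Body"))
call = ET.SubElement(body, "connect")
ET.SubElement(call, "userId").text = "guest"
ET.SubElement(call, "password").text = "guest"

wire = ET.tostring(envelope, encoding="unicode")
print(wire)
```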
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/devguide_xmldb/devguide_xmldb.xml b/src/main/xar-resources/data/devguide_xmldb/devguide_xmldb.xml
index 9a5eeab3..fc8a5435 100644
--- a/src/main/xar-resources/data/devguide_xmldb/devguide_xmldb.xml
+++ b/src/main/xar-resources/data/devguide_xmldb/devguide_xmldb.xml
@@ -1,209 +1,160 @@
-
- Developer's Guide
- September 2009
-
- TBD
-
-
-
-
-
-
- Writing Java Applications with the XML:DB API
-
- The preferred way to work with eXist-db when developing Java applications is to use
- the XML:DB API. This API provides a common interface to native or XML-enabled
- databases and supports the development of portable, reusable applications. eXist's
- implementation of the XML:DB standards follows the Xindice implementation, and
- conforms to the latest working drafts put forth by the XML:DB Initiative. For more
- information, refer to the Javadocs for this API.
- The basic components employed by the XML:DB API are drivers,
- collections, resources and
- services.
-
- Drivers are implementations of the database interface that
- encapsulate the database access logic for specific XML database products. They are
- provided by the product vendor and must be registered with the database manager.
- A collection is a hierarchical container for
- resources and further sub-collections. Currently two
- different resources are defined by the API: XMLResource and
- BinaryResource. An XMLResource
- represents an XML document or a document fragment, selected by a previously executed
- XPath query.
- Finally, services are requested for special tasks such as
- querying a collection with XPath, or managing a collection.
-
- There are several XML:DB examples provided in eXist's
- samples directory . To start an example, use the start.jar jar file and pass the name of the
- example class as the first parameter, for instance:
-
-
- Programming with the XML:DB API is straightforward. You will find some code
- examples in the samples/org/exist/examples/xmldb directory. In
- the following simple example, a document can be retrieved from the eXist server and
- printed to standard output.
-
- Retrieving a Document with XML:DB
-
-
- With this example, the database driver class for eXist
- (org.exist.xmldb.DatabaseImpl) is first registered with
- the DatabaseManager. Next we obtain a
- Collection object from the database manager by calling the
- static method DatabaseManger.getCollection(). The method
- expects a fully qualified URI for its parameter value, which identifies the desired
- collection. The format of this URI should look like the following:
- xmldb:[DATABASE-ID]://[HOST-ADDRESS]/db/collection
- Because more than one database driver can be registered with the database manager,
- the first part of the URI (xmldb:exist) is required to determine
- which driver class to use. The database-id is used by the
- database manager to select the correct driver from its list of available drivers. To
- use eXist, this ID should always be "exist" (unless you have set up multiple
- database instances; additional instances may have other names).
- The final part of the URI identifies the collection path, and optionally the host
- address of the database server on the network. Internally, eXist uses two different
- driver implementations: The first talks to a remote database engine using XML-RPC
- calls, the second has direct access to a local instance of eXist. The root
- collection is always identified by /db. For example, the URI
- xmldb:exist://localhost:8080/exist/xmlrpc/db/shakespeare/plays
- references the Shakespeare collection on a remote server running the XML-RPC
- interface as a servlet at localhost:8080/exist/xmlrpc. If we leave
- out the host address, the XML:DB driver will try to connect to a locally attached
- database instance, e.g.:
- xmldb:exist:///db/shakespeare/plays
- In this case, we have to tell the XML:DB driver that it should create a new
- database instance if none has been started. This is done by setting the
- create-database property of class
- Database to "true" (more information on embedded use of eXist
- can be found in the deployment guide.
- The setProperty calls are used to set database-specific
- parameters. In this case, pretty-printing of XML output is turned on for the
- collection. eXist uses the property keys defined in the standard Java package
- javax.xml.transform. Thus, in Java you can simply use class
- OutputKeys to get the correct keys.
- Calling col.getResource() finally retrieves the document,
- which is returned as an XMLResource. All resources have a
- method getContent(), which returns the resource's content,
- depending on it's type. In this case we retrieve the content as type
- String.
- To query the repository, we may either use the standard
- XPathQueryService or eXist's
- XQueryService class. The XML:DB API defines different kinds
- of services, which may or may not be provided by the database. The
- getService method of class
- Collection calls a service if it is available. The method
- expects the service name as the first parameter, and the version (as a string) as
- the second, which is used to distinguish between different versions of the service
- defined by the XML:DB API.
- The following is an example of using the XML:DB API to execute a database query:
-
- Querying the Database with XPath(XML:DB API)
-
-
- To execute the query, method service.query(xpath) is
- called. This method returns a ResourceSet, containing the
- Resources found by the query. ResourceSet.getIterator()
- gives us an iterator over these resources. Every Resource contains a single document
- fragment or value selected by the XPath expression.
- Internally, eXist does not distinguish between XPath and XQuery expressions.
- XQueryService thus maps to the same implementation class
- as XPathQueryService. However, it provides a few additional
- methods. Most important, when talking to an embedded database,
- XQueryService allows for the XQuery expression to be
- compiled as an internal representation, which can then be reused. With compilation,
- the previous example code would look as follows:
-
- Compiling a Query (XML:DB API)
-
-
- The XML-RPC server automatically caches compiled expressions, and so calling
- compile through the remote driver produces no effect if
- the expression is already cached.
- Next, we would like to store a new document into the repository. This is done by
- creating a new XMLResource, assigning it the content of the
- new document, and calling the storeResource method of class
- Collection. First, a new Resource is created by method
- Collection.createResource(), and expects two
- parameters: the id and type of resource being created. If the id-parameter is null,
- a unique resource-id will be automatically generated .
- In some cases, the collection may not yet exist, and so we must create it. To
- create a new collection, call the createCollection method
- of the CollectionManagementService service. In the following
- example, we simply start at the root-collection object to get the
- CollectionManagementService service.
-
- Adding a File (XML:DB API)
-
-
- Please note that the XMLResource.setContent() method takes
- a Java object as its parameter. The eXist driver checks if the object is a File.
- Otherwise, the object is transformed into a String by calling the object's
- toString() method. Passing a File has one big
- advantage: If the database is running in the embedded mode, the file will be
- directly passed to the indexer. Thus, the file's content does not have to be loaded
- into the main memory. This is handy if your files are very large.
-
-
-
-
-
- Extensions to XML:DB
-
-
-
-
-
-
- Additional Services
-
- eXist provides several services in addition to those defined by the XML:DB
- specification:
- The UserManagementService service contains methods to manage users and
- handle permissions. These methods resemble common Unix commands such as
- chown or chmod. As with
- other services, UserManagementService can be retrieved
- from a collection object, as in:
-
- Another service called DatabaseInstanceManager, provides a single method to shut down the
- database instance accessed by the driver. You have to be a member of the
- dba user group to use this method or an exception will be
- thrown. See the Deployment Guide
- for an example.
- Finally, interface IndexQueryService supports access to the terms and elements
- contained in eXist's internal index. Method getIndexedElements() returns a list
- of element occurrences for the current collection. For each occurring element,
- the element's name and a frequency count is returned.
- Method scanIndexTerms() allows for a retrieval of the list of occurring words
- for the current collection. This might be useful, for example, to provide users
- a list of searchable terms together with their frequency.
-
-
-
-
-
- Multiple Database Instances
-
- As explained above, passing a local XML:DB URI to the
- DatabaseManager means that the driver will try to
- start or access an embedded database instance. You can configure more than one
- database instance by setting the location of the central configuration file. The
- configuration file is set through the configuration property of
- the DatabaseImpl driver class. If you would like to use
- different drivers for different database instances, specify a name for the
- created instance through the database-id property. You may
- later use this name in the URI to refer to a database instance. The following
- fragment sets up two instances:
-
- Multiple Database Instances
-
-
- With the above example, the URI
- xmldb:test:///db
- selects the test database instance. Both instances should have their own data
- and log directory as specified in the configuration files.
-
-
+
+ Writing Java Applications with the XML:DB API
+ 1Q18
+
+ java-development
+
+
+
+
+
+ This article explains how to work with eXist-db from Java code using the XML:DB API. This API provides a common interface to native or
+ XML-enabled databases and supports the development of portable, reusable applications.
+
+
+
+
+ Introduction
+
+ The preferred way to work with eXist-db when developing Java applications is to use the XML:DB API. eXist-db's implementation of the XML:DB
+      standards follows the Xindice implementation, and conforms to the latest working drafts put forth by the XML:DB Initiative. For more information,
+ refer to the Javadocs for this API.
+ The basic components employed by the XML:DB API are drivers, collections,
+ resources and services.
+
+ Drivers are implementations of the database interface that encapsulate the database access logic for specific XML database
+ products. They are provided by the product vendor and must be registered with the database manager.
+ A collection is a hierarchical container for resources and further sub-collections. Currently two
+ different resources are defined by the API: XMLResource and BinaryResource. An
+ XMLResource represents an XML document or a document fragment, selected by a previously executed XPath query.
+ Finally, services are requested for special tasks such as querying a collection with XPath, or managing a
+ collection.
+      There are several XML:DB examples provided in eXist's samples directory. To start an example, use the
+ start.jar jar file and pass the name of the example class as the first parameter, for instance:
+
+
+
+
+
+
+
+ Using the XML:DB API
+
+ Programming with the XML:DB API is straightforward. You will find some code examples in the
+ samples/org/exist/examples/xmldb directory.
+ In the following simple example, a document is retrieved from the eXist server and printed to standard output.
+
+
+
+ With this example, the database driver class for eXist (org.exist.xmldb.DatabaseImpl) is registered first with the
+ DatabaseManager.
+ Next we obtain a Collection object from the database manager by calling the static method
+        DatabaseManager.getCollection(). This method expects a fully qualified URI for its parameter value, which identifies the
+ desired collection. The format of this URI must be:
+ xmldb:[DATABASE-ID]://[HOST-ADDRESS]/db/collection
+ Because more than one database driver can be registered with the database manager, the first part of the URI
+ (xmldb:exist) is required to determine which driver class to use. The database-id is used by the
+ database manager to select the correct driver from its list of available drivers. To use eXist-db, this ID should always be exist
+ (unless you have set up multiple database instances; additional instances may have other names).
+ The final part of the URI identifies the collection path, and optionally the host address of the database server on the network. Internally,
+ eXist uses two different driver implementations: The first talks to a remote database engine using XML-RPC calls, the second has direct access
+ to a local instance of eXist-db. The root collection is always identified by /db. For example:
+ xmldb:exist://localhost:8080/exist/xmlrpc/db/shakespeare/plays
+ This references the Shakespeare collection on a remote server running the XML-RPC interface as a servlet at
+ localhost:8080/exist/xmlrpc.
+
+ If we leave out the host address, the XML:DB driver will try to connect to a locally attached database instance. For instance:
+ xmldb:exist:///db/shakespeare/plays
+ In this case, we have to tell the XML:DB driver that it should create a new database instance if none has been started. This is done by
+ setting the create-database property of class Database to true (more information on embedded
+ use of eXist-db can be found in the deployment guide).
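      The URI format can be illustrated with a small helper that splits an xmldb URI into its parts (this helper is ours, for illustration only, and not part of any eXist API):

```python
def split_xmldb_uri(uri):
    """Split an xmldb URI into (database-id, host, collection path).

    A local URI such as xmldb:exist:///db has an empty host part.
    Illustrative helper, not part of the XML:DB API.
    """
    scheme, rest = uri.split(":", 1)
    assert scheme == "xmldb"
    database_id, rest = rest.split("://", 1)
    host, _, path = rest.partition("/")
    return database_id, host, "/" + path

print(split_xmldb_uri(
    "xmldb:exist://localhost:8080/exist/xmlrpc/db/shakespeare/plays"))
print(split_xmldb_uri("xmldb:exist:///db/shakespeare/plays"))
```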
+ The setProperty calls are used to set database-specific parameters. In this case, pretty-printing of XML output is turned
+ on for the collection. eXist uses the property keys defined in the standard Java package javax.xml.transform. Thus, in Java
+ you can simply use class OutputKeys to get the correct keys.
+ Calling col.getResource() finally retrieves the document, which is returned as an XMLResource. All
+      resources have a method getContent(), which returns the resource's content, depending on its type. In this case we retrieve
+ the content as type String.
+ To query the repository, we may either use the standard XPathQueryService or eXist's XQueryService
+ class. The XML:DB API defines different kinds of services, which may or may not be provided by the database. The getService
+ method of class Collection calls a service if it is available. The method expects the service name as the first parameter,
+ and its version (as a string) as the second.
+
+ The following is an example of using the XML:DB API to execute a database query:
+
+ To execute the query, method service.query(xpath) is called. This method returns a ResourceSet,
+ containing the Resources found by the query. ResourceSet.getIterator() gives us an iterator over these resources. Every
+ Resource contains a single document fragment or value, selected by the XPath expression.
+ Internally, eXist does not distinguish between XPath and XQuery expressions. XQueryService thus maps to the same
+      implementation class as XPathQueryService. However, it provides a few additional methods. Most importantly, when talking to an
+ embedded database, XQueryService allows for the XQuery expression to be compiled to an internal
+ representation, which can then be reused. With compilation, the previous example code would look as follows:
+
+
+ The XML-RPC server automatically caches compiled expressions, and so calling compile through the remote driver produces
+ no effect if the expression is already cached.
+ Next, we would like to store a new document into the repository. This is done by creating a new XMLResource, assigning it
+ the content of the new document, and calling the storeResource method of class Collection.
+      First, a new Resource is created by calling Collection.createResource(), which expects two parameters: the id and type of the
+      resource being created. If the id parameter is null, a unique resource-id will be automatically generated.
+ In some cases, the collection may not yet exist, and so we must create it. To create a new collection, call the
+ createCollection method of the CollectionManagementService service. In the following example, we simply
+ start at the root collection object to get the CollectionManagementService service.
+
+
+
+
+ The XMLResource.setContent() method takes a Java object as its parameter. The eXist driver checks if the object is a
+ File. Otherwise, the object is transformed into a String by calling the object's toString() method. Passing a File has one
+ big advantage: If the database is running in embedded mode, the file will be directly passed to the indexer. Thus, the file's content does not
+ have to be loaded into memory. This is handy if your files are very large.
+
+
+
+
+
+
+
+ Extensions to XML:DB
+
+ eXist provides extensions on top of the XML:DB specification.
+
+
+
+
+ Additional Services
+
+ The UserManagementService service contains methods to manage users and handle permissions. These methods resemble common Unix
+ commands such as chown or chmod. As with other services, UserManagementService can be
+      retrieved from a collection object:
+
+ Another service called DatabaseInstanceManager provides a single method to shut down the database instance accessed by the
+ driver. You have to be a member of the dba user group to use this method or an exception will be thrown.
+ Finally, interface IndexQueryService supports access to the terms and elements contained in eXist's internal index. Method
+ getIndexedElements() returns a list of element occurrences for the current collection. For each occurring element, the
+ element's name and a frequency count is returned.
+ Method scanIndexTerms() allows for a retrieval of the list of occurring words for the current collection. This can be useful
+      to provide users with a list of searchable terms together with their frequency.
+
+
+
+
+
+ Multiple Database Instances
+
+ As explained above, passing a local XML:DB URI to the DatabaseManager means that the driver will try to start or access
+ an embedded database instance. You can configure more than one database instance by setting the location of the central configuration
+ file.
+ The configuration file is set through the configuration property of the DatabaseImpl driver class.
+ If you would like to use different drivers for different database instances, specify a name for the created instance through the
+ database-id property. You may later use this name in the URI to refer to a database instance. The following fragment sets
+ up two instances:
+
+
+ With this example, the URI xmldb:test:///db selects the test database instance. Both instances should have their own data and
+ log directory as specified in the configuration files.
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/devguide_xmldb/listings/listing-1.txt b/src/main/xar-resources/data/devguide_xmldb/listings/listing-1.txt
index 332a062a..1d49ba97 100644
--- a/src/main/xar-resources/data/devguide_xmldb/listings/listing-1.txt
+++ b/src/main/xar-resources/data/devguide_xmldb/listings/listing-1.txt
@@ -1,2 +1 @@
-java -jar start.jar
- org.exist.examples.xmldb.Retrieve [- other options]
\ No newline at end of file
+java -jar start.jar org.exist.examples.xmldb.Retrieve [- other options]
\ No newline at end of file
diff --git a/src/main/xar-resources/data/devguide_xmlrpc/devguide_xmlrpc.xml b/src/main/xar-resources/data/devguide_xmlrpc/devguide_xmlrpc.xml
index 3feebb94..9a8094da 100644
--- a/src/main/xar-resources/data/devguide_xmlrpc/devguide_xmlrpc.xml
+++ b/src/main/xar-resources/data/devguide_xmlrpc/devguide_xmlrpc.xml
@@ -1,986 +1,816 @@
-
- Developer's Guide
- September 2009
-
- TBD
-
-
-
-
-
-
- Using the XML-RPC API
-
- XML-RPC (XML Remote Procedural Call) provides a simple way to call remote
- procedures from a wide variety of programming languages. eXist's XML-RPC API makes
- it easy to access eXist from other applications, CGI scripts, PHP, JSP and more. For
- more information on XML-RPC see www.xmlrpc.org. For the Java server, eXist uses the XML-RPC library created
- by Hannes Wallnoefer which recently has moved to Apache (see: http://xml.apache.org/xmlrpc). Perl
- examples use the RPC::XML package, which should be available at every CPAN mirror
- (see CPAN).
- The following is a small example, which shows how to talk to eXist-db from Java using
- the Apache XML-RPC library. This example can be found in samples/org/exist/examples/xmldb/Retrieve.java.
-
- Retrieving a document from eXist
-
-
- As shown above, the execute method of
- XmlRpcClient expects as its parameters a method (passed
- as a string) to call on the server and a Vector of parameters to pass to this
- executed method. In this example, the method
- getDocumentAsString is called as the first parameter, and a
- Vector params. Various output properties can also be set
- through the hashtable argument (see the method description below). Since all
- parameters are passed in a Vector, they are necessarily Java objects.
- XML-RPC messages (requests and responses sent between the server and client) are
- themselves XML documents. In some cases, these documents may use a character
- encoding which is in conflict with the encoding of the document we would like to
- receive. It is thus important to set the transport encoding to
- UTF-8 as shown in the above example. However, conflicts may
- persist depending on which client library is used. To avoid such conflicts, eXist
- provides alternative declarations for selected methods, which expect string
- parameters as byte arrays. The XML-RPC library will send them as binary data (using
- Base64 encoding for transport). With this approach, document encodings are preserved
- regardless of the character encoding used by the XML-RPC transport layer.
-
- Please note that the XML-RPC API uses int to encode booleans.
- This is because some clients do not correctly pass boolean parameters.
-
- Querying is as easy using XML-RPC. The following example:
-
- Sending a Query to eXist (XML-RPC)
-
-
- You will find the source code of this example in
- samples/xmlrpc/search2.pl. It uses the simple query method,
- which executes the query and returns a document containing the specified number of
- results. However, the result set is not cached on the server.
- The following example calls the executeQuery method,
- which returns a unique session id. In this case, the actual results are cached on
- the server and can be retrieved using the retrieve method.
-
- Another Query Examplet (XML-RPC)
-
-
-
-
-
-
-
- XML-RPC: Available Methods
-
- This section gives you an overview of the methods implemented by the eXist XML-RPC
- server. Only the most common methods are presented here. For a complete list see the
- Java interface RpcAPI.java.
- Note that the method signatures are presented below using Java data types. Also note
- that some methods like getDocument() and
- retrieve() accept a struct to specify optional output
- properties.
- In general, the following optional fields for methods are supported:
-
-
- indent
-
- Returns indented pretty-print XML. [yes | no]
-
-
-
- encoding
-
- Specifies the character encoding used for the output. If the method
- returns a string, only the XML declaration will be modified
- accordingly.
-
-
-
- omit-xml-declaration
-
- Add XML declaration to the head of the document. [yes |
- no]
-
-
-
- expand-xincludes
-
- Expand XInclude elements. [yes | no]
-
-
-
- process-xsl-pi
-
- Specifying "yes": XSL processing instructions in the document will be
- processed and the corresponding stylesheet applied to the output.
- [yes | no]
-
-
-
- highlight-matches
-
- Database adds special tags to highlight the strings in the text that
- have triggered a fulltext match. Set to "elements" to
- highlight matches in element values, "attributes" for
- attribute values or "both" for both elements and
- attributes.
-
-
-
- stylesheet
-
- Use this parameter to specify an XSL stylesheet which should be
- applied to the output. If the parameter contains a relative path, the
- stylesheet will be loaded from the database.
-
-
-
- stylesheet-param.key1 ... stylesheet-param.key2
-
- If a stylesheet has been specified with stylesheet,
- you can also pass it parameters. Stylesheet parameters are recognized if
- they start with the prefix stylesheet-param., followed
- by the name of the parameter. The leading
- "stylesheet-param." string will be removed before the
- parameter is passed to the stylesheet.
-
-
-
-
-
-
-
- Retrieving documents
-
-
-
- byte[] getDocument(String name, Hashtable parameters)
- String getDocumentAsString(String name, Hashtable parameters)
- Retrieves a document from the database.
-
- Parameters:
-
-
- name
-
- Path of the document to be retrieved (e.g.
- /db/shakespeare/plays/r_and_j.xml).
-
-
-
- parameters
-
- A struct containing key=value pairs for
- configuring the output.
-
-
-
-
-
- Hashtable getDocumentData(String name, Hashtable parameters)
- Hashtable getNextChunk(String handle, Int offset)
- Hashtable getNextExtendedChunk(String handle, String offset)
- To retrieve a document from the database, but limit the number of
- bytes transmitted in one chunk to avoid memory shortage on the server,
- use the following:
-
- getDocumentData() returns a struct containing
- the following fields: data,
- handle, offset, supports-long-offset.
- data contains the document's data (as
- byte[]) or the first chunk of data if the document
- size exceeds the predefined internal limit.
- handle and offset can be
- passed to getNextChunk() or getNextExtendedChunk()
- to retrieve the remaining data chunks.
- supports-long-offset, when available, tells whether the server understands
- getNextExtendedChunk() method.
-
- If offset is 0, no more chunks are available
- and all of the data is already contained in the
- data field. Otherwise, further chunks can be
- retrieved by passing the handle and the offset (as returned by the last
- call) to getNextChunk() or getNextExtendedChunk().
- Once the last chunk is read, offset will be 0 and the handle becomes
- invalid.
-
- getNextChunk() and getNextExtendedChunk() do
- more or less the same, but with the difference that getNextExtendedChunk()
- does not have the 2GB limitation in offset. As previous eXist servers could not
- implement it, you must take into account the supports-long-offset parameter from
- getDocumentData() returned structure.
-
-
- Parameters:
-
-
- name
-
- Path of the document to be retrieved (e.g.
- /db/shakespeare/plays/r_and_j.xml).
-
-
-
- parameters
-
- A struct containing key=value pairs to
- configure the output.
-
-
-
- handle
-
- The handle returned by the call to
- getDocumentData(). This
- identifies a temporary file on the server to be read.
-
-
-
- offset
-
- The data offset in the document at which the next chunk in
- the sequence will be read.
-
-
-
-
-
-
-
-
-
-
- Storing Documents
-
-
-
- boolean parse(byte[] xml, String docName, int overwrite)
- boolean parse(byte[] xml, String docName)
- Inserts a new document into the database or replace an existing one:
-
- Parameters:
-
-
- xml
-
- XML content of this document as a UTF-8 encoded byte
- array.
-
-
-
- docName
-
- Path to the database location where the new document is to
- be stored.
-
-
-
- overwrite
-
- Set this value to > 0 to automatically replace an
- existing document at the same location.
-
-
-
-
-
- String upload(byte[] chunk, int length)
- String upload(String file, byte[] chunk, int length)
- boolean parseLocal(String localFile, String docName, boolean replace)
- Uploads an entire document on to the database before parsing it.
- While the parse method receives the document as a large single chunk,
- the upload method allows you to upload the whole document to the server
- before parsing. This way, out-of-memory exceptions
- can be avoided, since the document is not entirely kept in the main
- memory. To identify the file on the server, upload returns an identifier
- string. After uploading all chunks, you can call
- parseLocal and pass it this identifier string
- as the first argument.
-
- Parameters:
-
-
- file
-
- The name of the file to which the uploaded chunk is
- appended. This is the name of a temporary file on the
- server. Use the two-argument version of upload for the first
- chunk. The method creates a temporary file and returns its
- name. On subsequent calls to this chunk, pass this
- name.
-
-
-
- chunk
-
- A byte array containing the data to be appended.
-
-
-
- length
-
- Defines the number of bytes to be read from chunk.
-
-
-
- localFile
-
- The name of the local file on the server that is to be
- stored in the database. This should be the same as the name
- returned by upload.
-
-
-
- docName
-
- The full path specifying the location where the document
- should be stored in the database.
-
-
-
- replace
-
- Set this to true if an existing document
- with the same name should be automatically
- overwritten.
-
-
-
-
-
-
-
-
-
-
- Creating a Collection
-
-
-
- boolean createCollection(String name)
- Creates a new collection
-
- Parameters:
-
-
- name
-
- Path to the new collection.
-
-
-
-
-
-
-
-
-
-
- Removing Documents or Collections
-
-
-
- boolean remove(String docName)
- Removes a document from the database.
-
- Parameters:
-
-
- docName
-
- The full path to the database document.
-
-
-
-
-
- boolean removeCollection( String collection)
- Removes a collection from the database (including all of its documents
- and sub-collections).
-
- Parameters:
-
-
- collection
-
- The full path to the collection.
-
-
-
-
-
-
-
-
-
-
- Querying
-
-
-
- int executeQuery(String xquery, HashMap parameters)
- int executeQuery(byte[] xquery, HashMap parameters)
- int executeQuery(byte[] xquery, String encoding, HashMap parameters)
- Executes an XQuery and returns a reference identifier to the generated
- result set. This reference can be used later to retrieve results.
-
- Parameters:
-
-
- xquery
-
- A valid XQuery expression.
-
-
-
- parameters
-
- The parameters a HashMap values.
- sort-expr :
- namespaces :
- variables :
- base-uri :
- static-documents :
- protected :
-
-
-
- encoding
-
- The character encoding used for the query string.
-
-
-
-
-
- Hashtable querySummary(int result-Id)
- Returns a summary of query results for the result-set referenced by
- result-Id.
- The result-Id value is taken from a previous
- call to executeQuery (See above). The
- querySummary method returns a struct with
- the following fields: queryTime,
- hits, documents,
- doctype.
-
- queryTime and hits are
- integer values that describe the processing time in milliseconds for the
- query execution and the number of hits in the result-set respectively.
- The field documents is an array of an array (i.e.
- Object[][3]) that represents a table in which each
- row identifies one document. The first field in each row contains the
- document-id (integer value). The second has
- the document's name as a string value. The third contains the number of
- hits found in this document (integer value).
- Thedoctype field is also an array of an array
- (Object[][2]) that contains the doctype public
- identifier and the number of hits found for this
- doctype in each row.
-
- Parameters:
-
-
- resultId
-
- Reference to a result-set as returned by a previous call
- to executeQuery.
-
-
-
-
-
- byte[] retrieve(int resultId, int pos, Hashtable parameters)
- Retrieves a single result-fragment from the result-set referenced by
- resultId. The result-fragment is identified
- by its position in the result-set, which is passed in the parameter
- pos.
-
- Parameters:
-
-
- resultId
-
- Reference to a result-set as returned by a previous call
- to executeQuery.
-
-
-
- pos
-
- The position of the item in the result-sequence, starting
- at 0.
-
-
-
- parameters
-
- A struct containing key=value pairs to
- configure the output.
-
-
-
-
-
- Hashtable retrieveFirstChunk(int resultId, int pos, Hashtable parameters)
- Retrieves a single result-fragment from the result-set referenced by
- resultId, but limiting the number of
- bytes transmitted in one chunk to avoid memory shortage on the server.
- The result-fragment is identified by its position in the result-set,
- which is passed in the parameter pos. It returns
- the same structure as getDocumentData(), and its
- fields behaves the same, so next chunks must be fetched using either
- getNextChunk() or getNextExtendedChunk()
- (see getDocumentData() documentation for further details).
-
- Parameters:
-
-
- resultId
-
- Reference to a result-set as returned by a previous call
- to executeQuery.
-
-
-
- pos
-
- The position of the item in the result-sequence, starting
- at 0.
-
-
-
- parameters
-
- A struct containing key=value pairs to
- configure the output.
-
-
-
-
-
- int getHits(int resultId)
- Get the number of hits in the result-set identified by
- resultId.
-
- Parameters:
-
-
- resultId
-
- Reference to a result-set as returned by a previous call
- to executeQuery.
-
-
-
-
-
- String query(byte[] xquery, int howmany, int start, Hashtable parameters)
- Executes an XQuery expression and returns a specified subset of the
- results. This method will directly return a subset of the
- result-sequence, starting at start, as a new XML
- document. The number of results returned is determined by parameter
- howmany. The result-set will be deleted on
- the server, so later calls to this method will again execute the query.
-
- Parameters:
-
-
- xquery
-
- An XQuery expression.
-
-
-
- start
-
- The position of the first item to be retrieved from the
- result-sequence.
-
-
-
- howmany
-
- The maximum number of items to retrieve.
-
-
-
- parameters
-
- A struct containing key=value pairs to
- configure the output.
-
-
-
-
-
- void releaseQueryResult(int resultId)
- Forces the result-set identified by its result id to be released on
- the server.
-
-
-
-
-
-
-
- Retrieving Information on Collections and Documents
-
-
-
- Hashtable describeCollection(String collection)
- Returns a struct describing a specified collection.
- The returned struct has the following fields:
- name (the collection path),
- owner (identifies the collection owner),
- group (identifies the group that owns the
- collection), created (the creation date of the
- collection expressed as a long value),
- permissions (the active permissions that apply to
- the collection as an integer value).
-
- collections is an array listing the names of
- available sub-collections in this collection.
-
- Parameters:
-
-
- collection
-
- The full path to the collection.
-
-
-
-
-
- Hashtable describeResource(String resource)
- Returns a struct describing a specified resource.
- The returned struct has the following fields:
- name (the collection path),
- owner (identifies the collection owner),
- group (identifies the group that owns the
- collection), created (the creation date of the
- collection expressed as a long value),
- permissions (the active permissions that apply to
- the collection as an integer value), type (either
- XMLResource for XML documents or
- BinaryResource for binary files),
- content-length (the estimated size of the
- resource in bytes). The content-length is based
- on the number of pages occupied by the resource in the DOM storage. For
- binary resources, the value will always be 0.
-
-
- Hashtable getCollectionDesc(String collection)
- Returns a struct describing a collection.
- The returned struct has the following fields:
- name (the collection path),
- owner (identifies the collection owner),
- group (identifies the group that owns the
- collection), created (the creation date of the
- collection expressed as a long value),
- permissions (the active permissions that apply to
- the collection as an integer value).
-
- collections is an array listing the names of
- available sub-collections in this collection.
-
- documents is an array listing information on
- all of the documents in this collection. Each item in the array is a
- struct with the following fields: name, owner, group, permissions, type.
- The type field contains a string describing the type of the resource:
- either XMLResourceor BinaryResource.
-
- Parameters:
-
-
- collection
-
- The full path to the collection.
-
-
-
-
-
-
-
-
-
-
- XUpdate
-
-
-
- int xupdate(String collectionName, byte[] xupdate)
- int xupdateResource(String documentName, byte[] xupdate)
- Applies a set of XUpdate modifications to a collection or document.
-
-
- collectionName
-
- The full path to the collection to which the XUpdate
- modifications should be applied.
-
-
-
- documentName
-
- The full path to the document to which the XUpdate
- modifications should be applied.
-
-
-
- xupdate
-
- The XUpdate document containing the modifications. This
- should be send as an UTF-8 encoded binary
- array.
-
-
-
-
-
-
-
-
-
-
- Managing Users and Permissions
-
-
-
- boolean setUser(String name, String passwd, String digestPasswd, Vector groups)
- boolean setUser(String name, String passwd, String digestPasswd, Vector groups, String home)
- Modifies or creates a database user.
-
- Parameters:
-
-
- name
-
- Username value.
-
-
-
- passwd
-
- The plain-text password for the user.
-
-
-
- digestPasswd
-
- The md5 encoded password for the user.
-
-
-
- groups
-
- A vector of groups assigned to the user. The first group
- in the vector will become the user's primary group.
-
-
-
- home
-
- An optional setting for the user's home collection path.
- The collection will be created if it does not exist, and
- provides the user with full access.
-
-
-
-
-
- boolean setPermissions(String resource, String permissions)
- boolean setPermissions(String resource, int permissions)
- boolean setPermissions(String resource, String owner, String ownerGroup, String permissions)
- boolean setPermissions(String resource, String owner, String ownerGroup, int permissions)
- Sets the permissions assigned to a given collection or document.
-
-
- resource
-
- The full path to the collection or document on which the
- specified permissions will be set. The method first checks
- if the specified path points to a collection or
- resource.
-
-
-
- owner
-
- The name of the user owning this resource.
-
-
-
- ownerGroup
-
- The name of the group owning this resource.
-
-
-
- permissions
-
- The permissions assigned to the resource, which can be
- specified either as an integer value constructed using the
- Permission class, or using a modification
- string. The bit encoding of the integer value corresponds to
- Unix conventions. The modification string has the following
- syntax:
- [user|group|other]=[+|-][read|write|update][, ...]
-
-
-
-
-
- Hashtable getPermissions(String resource)
- Returns the active permissions for the specified document or
- collection.
- The returned struct has the following fields:
- name (the collection path),
- owner (identifies the collection owner),
- group (identifies the group that owns the
- collection), created (the creation date of the
- collection expressed as a long value),
- permissions (the active permissions that apply to
- the collection as an integer value).
-
-
- boolean removeUser(String name)
- Removes the identified user.
-
-
- Hashtable getUser(String name)
- Returns a struct describing the user identified by its name.
- The returned struct has the following fields:
- name (the collection path),
- home (identifies the user's home directory),
- groups (an array specifying all groups to
- which the user belongs).
-
-
- Vector getUsers()
- Returns a list of all users currently known to the system.
- Each user in the list is described by the same struct returned by the
- getUser() method.
-
-
- Vector getGroups()
- Returns a list of all group names (as string values) currently
- defined.
-
-
-
-
-
-
-
- Access to the Index Contents
-
- The following methods provide access to eXist's internal index structure.
-
-
- Vector getIndexedElements(String collectionName, boolean inclusive)
- Returns a list (i.e. array[][4]) of all indexed element names for the
- specified collection.
- For each element, an array of four items is returned:
-
-
- name of the element
-
-
- optional namespace URI
-
-
- optional namespace prefix
-
-
- number of occurrences of this element as an integer
- value
-
-
-
-
- collectionName
-
- The full path to the collection.
-
-
-
- inclusive
-
- If set to true, the sub-collections of the
- specified collection will be included into the
- result.
-
-
-
-
-
- Vector scanIndexTerms(String collectionName, String start, String end, boolean inclusive)
- Return a list (array[][2]) of all index terms contained in the
- specified collection.
- For each term, an array with two items is returned:
-
-
- the term itself
-
-
- number occurrences of the term in the specified
- collection
-
-
-
-
- collectionName
-
- The full path to the collection.
-
-
-
- start
-
- The start position for the returned range expressed as a
- string value. Returned index terms are positioned after the
- start position in ascending, alphabetical order.
-
-
-
- end
-
- The end position for the returned range expressed as a
- string value. Returned index terms are positioned before the
- end position in ascending, alphabetical order.
-
-
-
- inclusive
-
- If set totrue, sub-collections of the
- specified collection will be included into the
- result.
-
-
-
-
-
-
-
-
-
-
- Other Methods
-
-
-
- boolean shutdown()
- Shuts down the database engine. All dirty pages are written to
- disk.
-
-
- boolean sync()
- Causes the database to write all dirty pages to disk.
-
-
-
-
+
+ XML-RPC API Developer's Guide
+ 1Q18
+
+ java-development
+ interfaces
+
+
+
+
+
+ This article explains how to interface with eXist-db using the XML-RPC API. This API can be used to access eXist-db from multiple languages
+ and environments.
+
+
+
+
+ Introduction
+
+ XML-RPC (XML Remote Procedural Call) provides a simple way to access eXist-db by calling remote procedures from a wide variety of
+ programming languages and environments, like CGI scripts, PHP, JSP and more.
+ For a Java server, eXist uses the XML-RPC library created by Hannes Wallnoefer, which has since moved to Apache (see: http://xml.apache.org/xmlrpc).
+ Perl examples use the RPC::XML package, which should be available at every CPAN mirror (see CPAN).
+ The following is a small example, which shows how to talk to eXist-db from Java using the Apache XML-RPC library. This example can be found
+ in samples/org/exist/examples/xmldb/Retrieve.java.
+
+
+ As shown above, the execute method of XmlRpcClient expects as its parameters a method (passed as a
+ string) to call on the server and a Vector of parameters to pass to this executed method. In this example, the method
+ getDocumentAsString is called as the first parameter, and a Vector params. Various output properties can also
+ be set through the hashtable argument (see the method description below). Since all parameters are passed in a Vector,
+ they are necessarily Java objects.
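The same call shape can be sketched with Python's standard xmlrpc.client module (a stdlib illustration, not one of the original Java/Perl samples; the document path and option names follow the text above). It shows how the method name travels as a string and how all parameters, here the path plus an options struct, are marshalled into the XML request body:

```python
import xmlrpc.client

# Marshal a getDocumentAsString call the way an XML-RPC client would:
# the parameters are the document path plus a struct of output options.
params = ("/db/shakespeare/plays/r_and_j.xml", {"indent": "yes"})
request = xmlrpc.client.dumps(params, methodname="getDocumentAsString")
# request now holds the XML <methodCall> document sent to the server.
```

Against a live server the same tuple would simply be passed through an xmlrpc.client.ServerProxy instead of dumps.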
+
+ XML-RPC messages (requests and responses sent between the server and client) are themselves XML documents. In some cases, these documents
+ may use a character encoding which is in conflict with the encoding of the document we would like to receive. It is therefore important to set
+ the transport encoding to UTF-8 as shown in the example above. However, conflicts may persist
+ depending on which client library is used. To avoid such conflicts, eXist provides alternative declarations for selected methods, which expect
+ string parameters as byte arrays. The XML-RPC library will send them as binary data (using Base64 encoding for transport). With this approach,
+ document encodings are preserved regardless of the character encoding used by the XML-RPC transport layer.
+
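The byte-array variants can be illustrated the same way (again a stdlib sketch, with a hypothetical document path): wrapping the document bytes in xmlrpc.client.Binary makes the library transport them Base64-encoded, so the document's own encoding survives regardless of the transport encoding.

```python
import xmlrpc.client

# Wrap UTF-8 document bytes as binary data; XML-RPC transports Binary
# values Base64-encoded, which preserves the document's own encoding.
doc_bytes = "<note>Gr\u00fc\u00dfe</note>".encode("utf-8")
payload = xmlrpc.client.Binary(doc_bytes)
request = xmlrpc.client.dumps((payload, "/db/test/note.xml", 1),
                              methodname="parse")
# The marshalled request carries the bytes inside <base64>...</base64>.
```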
+
+ The XML-RPC API uses int to encode booleans. This is because some clients do not correctly pass boolean
+ parameters.
+
+ Querying is easy using XML-RPC:
+
+
+ You will find the source code of this example in samples/xmlrpc/search2.pl. It uses the simple query method, which
+ executes the query and returns a document containing the specified number of results. The result set is not cached on the
+ server.
+ The following example calls the executeQuery method, which returns a unique session id. In this case, the actual results
+ are cached on the server and can be retrieved using the retrieve method.
+
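The cached-result pattern can be sketched end to end. FakeServer below is a hypothetical stand-in for an XML-RPC proxy connected to the server; only the method names and the flow (executeQuery returns a session id, retrieve fetches cached items, releaseQueryResult frees them) mirror the API described below.

```python
# Illustrative sketch only: FakeServer stands in for the real server so
# the control flow of the session-based query pattern can be shown.
class FakeServer:
    def __init__(self):
        self._results = {}
        self._next_id = 0

    def executeQuery(self, xquery, parameters):
        # The server runs the query and caches the result-set.
        self._next_id += 1
        self._results[self._next_id] = ["<hit n='1'/>", "<hit n='2'/>"]
        return self._next_id          # reference id for later calls

    def getHits(self, result_id):
        return len(self._results[result_id])

    def retrieve(self, result_id, pos, parameters):
        return self._results[result_id][pos]

    def releaseQueryResult(self, result_id):
        del self._results[result_id]

server = FakeServer()
rid = server.executeQuery("//SPEECH[SPEAKER='JULIET']", {})
hits = [server.retrieve(rid, i, {"indent": "yes"})
        for i in range(server.getHits(rid))]
server.releaseQueryResult(rid)   # free the cached result-set
```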
+
+
+
+
+
+
+ Available Methods
+
+ This section gives you an overview of the methods implemented by eXist-db's XML-RPC server. Only the most common methods are presented here.
+ For a complete list see the Java interface RpcAPI.java.
+
+ Method signatures are presented using Java data types. Some methods like getDocument() and
+ retrieve() accept a struct to specify optional output properties.
+
+ The following general fields for methods are supported:
+
+
+ indent
+
+ Returns indented pretty-print XML. [yes | no]
+
+
+
+ encoding
+
+ Specifies the character encoding used for the output. If the method returns a string, only the XML declaration will be modified
+ accordingly.
+
+
+
+ omit-xml-declaration
+
+ Add XML declaration to the head of the document. [yes | no]
+
+
+
+ expand-xincludes
+
+ Expand XInclude elements. [yes | no]
+
+
+
+ process-xsl-pi
+
+ When set to yes, XSL processing instructions in the document will be processed and the corresponding stylesheet applied to
+ the output. [yes | no]
+
+
+
+ highlight-matches
+
+ The database will add special tags to highlight the strings in the text that have triggered a fulltext match. Set to
+ elements to highlight matches in element values, attributes for attribute values or
+ both for both elements and attributes.
+
+
+
+ stylesheet
+
+ Use this parameter to specify an XSL stylesheet which should be applied to the output. If the parameter contains a relative path, the
+ stylesheet will be loaded from the database.
+
+
+
+ stylesheet-param.key1 ... stylesheet-param.key2
+
+ If a stylesheet has been specified with stylesheet, you can pass it parameters. Stylesheet parameters are
+ recognized if they start with the prefix stylesheet-param., followed by the name of the parameter. The leading
+ stylesheet-param. string will be removed before the parameter is passed to the stylesheet.
+
+
+
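The prefix-stripping rule for stylesheet parameters can be sketched in a few lines (the option values are made up for illustration; only the stylesheet-param. convention comes from the text above):

```python
# Keys carrying the "stylesheet-param." prefix are collected and the
# prefix is stripped before the values reach the stylesheet.
PREFIX = "stylesheet-param."
options = {
    "stylesheet": "styles/play.xsl",
    "stylesheet-param.speaker": "JULIET",
    "indent": "yes",
}
stylesheet_params = {k[len(PREFIX):]: v
                     for k, v in options.items() if k.startswith(PREFIX)}
# stylesheet_params == {"speaker": "JULIET"}
```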
+
+
+
+
+ Retrieving documents
+
+
+
+ byte[] getDocument(String name, Hashtable parameters)
+ String getDocumentAsString(String name, Hashtable parameters)
+ Retrieves a document from the database.
+
+ Parameters:
+
+
+ name
+
+ Path of the document to be retrieved (e.g. /db/shakespeare/plays/r_and_j.xml).
+
+
+
+ parameters
+
+ A struct containing key=value pairs for configuring the output.
+
+
+
+
+
+ Hashtable getDocumentData(String name, Hashtable parameters)
+ Hashtable getNextChunk(String handle, Int offset)
+ Hashtable getNextExtendedChunk(String handle, String offset)
+ These methods retrieve a document from the database, but limit the number of bytes transmitted in one chunk to avoid memory shortage
+ on the server.
+
+ getDocumentData() returns a struct containing the following fields: data,
+ handle, offset, supports-long-offset. data contains the
+ document's data (as byte[]) or the first chunk of data if the document size exceeds the predefined internal limit.
+ handle and offset can be passed to getNextChunk() or
+ getNextExtendedChunk() to retrieve the remaining data chunks.
+
+ If offset is 0, no more chunks are available and all of the data is already contained in the
+ data field. Otherwise, further chunks can be retrieved by passing the handle and the offset (as returned by the last
+ call) to getNextChunk() or getNextExtendedChunk(). Once the last chunk is read,
+ offset will be 0 and the handle becomes invalid.
+ supports-long-offset, when available, tells whether the server understands the
+ getNextExtendedChunk() method: getNextChunk() and getNextExtendedChunk() do
+ much the same, but getNextExtendedChunk() does not have the 2GB limitation on
+ offset, which older eXist servers could not handle. Check the supports-long-offset field
+ in the structure returned by getDocumentData() to verify this.
+
+
+ name
+
+ Path of the document to be retrieved (e.g. /db/shakespeare/plays/r_and_j.xml).
+
+
+
+ parameters
+
+ A struct containing key=value pairs to configure the output.
+
+
+
+ handle
+
+ The handle returned by the call to getDocumentData() (this identifies a temporary file on the server).
+
+
+
+ offset
+
+ The data offset in the document at which the next chunk in the sequence will be read.
+
+
+
+
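The download loop implied by these methods can be sketched as follows. FakeServer is a hypothetical stand-in for the real XML-RPC proxy; the field names (data, handle, offset, supports-long-offset) and the "offset 0 means done" rule follow the text above.

```python
# Sketch of the chunked-download protocol against a stand-in server.
class FakeServer:
    CHUNKS = [b"<play>", b"<title>R and J</title>", b"</play>"]

    def getDocumentData(self, name, parameters):
        # First chunk plus the handle/offset needed for the rest.
        return {"data": self.CHUNKS[0], "handle": "h1", "offset": 1,
                "supports-long-offset": True}

    def getNextExtendedChunk(self, handle, offset):
        i = int(offset)                     # string offset, no 2GB limit
        next_off = i + 1 if i + 1 < len(self.CHUNKS) else 0
        return {"data": self.CHUNKS[i], "offset": str(next_off),
                "handle": handle}

    def getNextChunk(self, handle, offset):
        r = self.getNextExtendedChunk(handle, str(offset))
        return {"data": r["data"], "offset": int(r["offset"]),
                "handle": handle}

server = FakeServer()
resp = server.getDocumentData("/db/shakespeare/plays/r_and_j.xml", {})
data, handle, offset = resp["data"], resp["handle"], resp["offset"]
long_ok = resp.get("supports-long-offset", False)
while offset:                               # offset 0: all data received
    resp = (server.getNextExtendedChunk(handle, str(offset)) if long_ok
            else server.getNextChunk(handle, offset))
    data += resp["data"]
    offset = int(resp["offset"])
```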
+
+
+
+
+
+
+ Storing Documents
+
+
+
+ boolean parse(byte[] xml, String docName, int overwrite)
+ boolean parse(byte[] xml, String docName)
+ Insert a new document into the database or replace an existing one.
+
+
+ xml
+
+ XML content of this document as a UTF-8 encoded byte array.
+
+
+
+ docName
+
+ Path to the database location where the new document is to be stored.
+
+
+
+ overwrite
+
+ Set this value to > 0 to automatically replace an existing document at the same location.
+
+
+
+
+
+ String upload(byte[] chunk, int length)
+ String upload(String file, byte[] chunk, int length)
+ boolean parseLocal(String localFile, String docName, boolean replace)
+ Uploads an entire document to the database before parsing it.
+ The parse method receives the document as a single large chunk, whereas the upload method allows you to upload
+ the whole document to the server before parsing. This way, out-of-memory exceptions can
+ be avoided, since the document is not kept entirely in memory. To identify the file on the server, upload returns an identifier string.
+ After uploading all chunks, you can call parseLocal and pass it this identifier string as its first argument.
+
+
+ file
+
+ The name of the file to which the uploaded chunk is appended. This is the name of a temporary file on the server. Use the
+ two-argument version of upload for the first chunk. The method creates a temporary file and returns its name.
+
+
+
+ chunk
+
+ A byte array containing the data to be appended.
+
+
+
+ length
+
+ Defines the number of bytes to be read from chunk.
+
+
+
+ localFile
+
+ The name of the local file on the server that is to be stored in the database. This should be the same as the name returned by
+ upload.
+
+
+
+ docName
+
+ The full path specifying the location where the document should be stored in the database.
+
+
+
+ replace
+
+ Set this to true if an existing document with the same name should be automatically overwritten.
+
+
+
+
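The client-side upload loop can be sketched the same way (FakeServer is a hypothetical stand-in; only the calling convention, first upload call creates the temporary file and returns its name, later calls append to it, parseLocal stores the result, follows the text above):

```python
# Sketch of the chunked-upload protocol against a stand-in server.
class FakeServer:
    def __init__(self):
        self.files = {}     # temporary upload files by name
        self.stored = {}    # parsed documents by database path

    def upload(self, *args):
        if len(args) == 2:                  # first chunk: (chunk, length)
            chunk, length = args
            name = "tmp-0001"               # server picks a temp file name
            self.files[name] = chunk[:length]
        else:                               # later chunks: (file, chunk, length)
            name, chunk, length = args
            self.files[name] += chunk[:length]
        return name

    def parseLocal(self, local_file, doc_name, replace):
        # Parse the uploaded temp file and store it at doc_name.
        self.stored[doc_name] = self.files.pop(local_file)
        return True

server = FakeServer()
chunks = [b"<doc>", b"hello", b"</doc>"]
name = server.upload(chunks[0], len(chunks[0]))   # creates the temp file
for c in chunks[1:]:
    server.upload(name, c, len(c))                # append to it
server.parseLocal(name, "/db/test/hello.xml", True)
```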
+
+
+
+
+
+
+ Creating a Collection
+
+
+
+ boolean createCollection(String name)
+ Creates a new collection.
+
+
+ name
+
+ Path to the new collection.
+
+
+
+
+
+
+
+
+
+
+ Removing Documents or Collections
+
+
+
+ boolean remove(String docName)
+ Removes a document from the database.
+
+
+ docName
+
+ The full path to the database document.
+
+
+
+
+
+ boolean removeCollection( String collection)
+ Removes a collection from the database (including all of its documents and sub-collections).
+
+
+ collection
+
+ The full path to the collection.
+
+
+
+
+
+
+
+
+
+
+ Querying
+
+
+
+ int executeQuery(String xquery, HashMap parameters)
+ int executeQuery(byte[] xquery, HashMap parameters)
+ int executeQuery(byte[] xquery, String encoding, HashMap parameters)
+ Executes an XQuery and returns a reference identifier to the generated result set. This reference can be used to retrieve the
+ results.
+
+
+ xquery
+
+ A valid XQuery expression.
+
+
+
+ parameters
+
+ The parameters as HashMap values (sort-expr, namespaces, variables,
+ base-uri, static-documents, protected).
+
+
+
+ encoding
+
+ The character encoding used for the query string.
+
+
+
+
+
+ Hashtable querySummary(int resultId)
+ Returns a summary of query results for the result-set referenced by resultId.
+ The resultId value is taken from a previous call to executeQuery (see above). The
+ querySummary method returns a struct with the following fields: queryTime,
+ hits, documents, doctype.
+
+ queryTime and hits are integer values that describe the processing time in milliseconds for the
+ query execution and the number of hits in the result-set, respectively. The field documents is an array of arrays
+ (i.e. Object[][3]) that represents a table in which each row identifies one document. The first field in each row
+ contains the document-id (integer value). The second has the document's name as a string value. The third contains the
+ number of hits found in this document (integer value).
+ The doctype field is also an array of arrays (Object[][2]); each row contains a doctype public
+ identifier and the number of hits found for that doctype.
+
+
+ resultId
+
+ Reference to a result-set as returned by a previous call to executeQuery.
+
+
+
+
+
+ byte[] retrieve(int resultId, int pos, Hashtable parameters)
+ Retrieves a single result-fragment from the result-set referenced by resultId. The result-fragment is identified by
+ its position in the result-set.
+
+
+ resultId
+
+ Reference to a result-set as returned by a previous call to executeQuery.
+
+
+
+ pos
+
+ The position of the item in the result-sequence, starting at 0.
+
+
+
+ parameters
+
+ A struct containing key=value pairs to configure the output.
+
+
+
+
+
+ Hashtable retrieveFirstChunk(int resultId, int pos, Hashtable parameters)
+ Retrieves a single result-fragment from the result-set referenced by resultId, but limiting the number of bytes
+ transmitted in one chunk to avoid memory shortage on the server. The result-fragment is identified by its position in the result-set,
+ which is passed in the parameter pos. It returns the same structure as getDocumentData(), and its
+ fields behave the same, so subsequent chunks must be fetched using either getNextChunk() or
+ getNextExtendedChunk() (see the getDocumentData() documentation for further details).
+
+
+ resultId
+
+ Reference to a result-set as returned by a previous call to executeQuery.
+
+
+
+ pos
+
+ The position of the item in the result-sequence, starting at 0.
+
+
+
+ parameters
+
+ A struct containing key=value pairs to configure the output.
+
+
+
+
+
+ int getHits(int resultId)
+ Get the number of hits in the result-set identified by resultId.
+
+
+ resultId
+
+ Reference to a result-set as returned by a previous call to executeQuery.
+
+
+
+
+
+ String query(byte[] xquery, int howmany, int start, Hashtable parameters)
+ Executes an XQuery expression and returns a specified subset of the results. This method will directly return a subset of the
+ result-sequence, starting at start, as a new XML document. The number of results returned is determined by parameter
+ howmany. The result-set will be deleted on the server, so later calls to this method will again execute the
+ query.
+
+
+ xquery
+
+ An XQuery expression.
+
+
+
+ start
+
+ The position of the first item to be retrieved from the result-sequence.
+
+
+
+ howmany
+
+ The maximum number of items to retrieve.
+
+
+
+ parameters
+
+ A struct containing key=value pairs to configure the output.
+
+
+
+
+
+ void releaseQueryResult(int resultId)
+ Forces the result-set identified by its result id to be released on the server.
+
+
+
+
+
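The executeQuery / getHits / retrieve / releaseQueryResult cycle described in this section can be sketched as follows (a sketch using Python's stdlib xmlrpc.client; the commented-out server URL is an illustrative assumption):

```python
import xmlrpc.client

def run_query(server, xquery, options=None):
    """Execute an XQuery, pull every result fragment by position, and release
    the server-side result set afterwards."""
    opts = options or {}
    # int executeQuery(byte[] xquery, String encoding, HashMap parameters)
    result_id = server.executeQuery(xquery.encode("utf-8"), "UTF-8", opts)
    try:
        hits = server.getHits(result_id)      # number of items in the result set
        return [server.retrieve(result_id, pos, opts) for pos in range(hits)]
    finally:
        server.releaseQueryResult(result_id)  # free the result set on the server

# server = xmlrpc.client.ServerProxy("http://admin:@localhost:8080/exist/xmlrpc")
# fragments = run_query(server, "//item[@price > 10]")
```

Releasing the result set in a finally block mirrors the releaseQueryResult method above: the result set otherwise stays alive on the server until it times out.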
+
+
+ Retrieving Information on Collections and Documents
+
+
+
+ Hashtable describeCollection(String collection)
+ Returns a struct describing a specified collection.
+ The returned struct has the following fields: name (the collection path), owner
+ (identifies the collection owner), group (identifies the group that owns the collection), created
+ (the creation date of the collection expressed as a long value), permissions (the active permissions that apply to the
+ collection as an integer value).
+
+ collections is an array listing the names of available sub-collections in this collection.
+
+
+ collection
+
+ The full path to the collection.
+
+
+
+
+
+ Hashtable describeResource(String resource)
+ Returns a struct describing a specified resource.
+ The returned struct has the following fields: name (the collection path), owner
+ (identifies the collection owner), group (identifies the group that owns the collection), created
+ (the creation date of the collection expressed as a long value), permissions (the active permissions that apply to the
+ collection as an integer value), type (either XMLResource for XML documents or
+ BinaryResource for binary files), content-length (the estimated size of the resource in bytes).
+ The content-length is based on the number of pages occupied by the resource in the DOM storage. For binary resources,
+ the value will always be 0.
+
+
+ Hashtable getCollectionDesc(String collection)
+ Returns a struct describing a collection.
+ The returned struct has the following fields: name (the collection path), owner (identifies the
+ collection owner), group (identifies the group that owns the collection), created (the creation date
+ of the collection expressed as a long value), permissions (the active permissions that apply to the collection as an
+ integer value).
+
+ collections is an array listing the names of available sub-collections in this collection.
+
+ documents is an array listing information on all of the documents in this collection. Each item in the array is a
+ struct with the following fields: name, owner, group, permissions, type. The type field contains a string describing the type of the
+ resource: either XMLResource or BinaryResource.
+
+
+ collection
+
+ The full path to the collection.
+
+
+
+
+
+
+
+
+
+
+ XUpdate
+
+
+
+ int xupdate(String collectionName, byte[] xupdate)
+ int xupdateResource(String documentName, byte[] xupdate)
+ Applies a set of XUpdate modifications to a collection or document.
+
+
+ collectionName
+
+ The full path to the collection to which the XUpdate modifications should be applied.
+
+
+
+ documentName
+
+ The full path to the document to which the XUpdate modifications should be applied.
+
+
+
+ xupdate
+
+ The XUpdate document containing the modifications. This should be sent as a UTF-8 encoded byte
+ array.
+
+
+
+
+
+
+
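A sketch of calling xupdateResource from Python, passing the XUpdate document as a UTF-8 encoded byte array as required above (the document path and the XUpdate payload are illustrative):

```python
import xmlrpc.client

# A minimal XUpdate document: append one element to /items
XUPDATE = """<xu:modifications version="1.0"
    xmlns:xu="http://www.xmldb.org/xupdate">
  <xu:append select="/items">
    <xu:element name="item">new entry</xu:element>
  </xu:append>
</xu:modifications>"""

def apply_xupdate(server, doc_path, modifications):
    """Apply XUpdate modifications to a single document; the server expects
    the XUpdate document as UTF-8 encoded bytes."""
    # int xupdateResource(String documentName, byte[] xupdate)
    return server.xupdateResource(doc_path, modifications.encode("utf-8"))

# server = xmlrpc.client.ServerProxy("http://admin:@localhost:8080/exist/xmlrpc")
# apply_xupdate(server, "/db/test/items.xml", XUPDATE)
```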
+
+
+
+ Managing Users and Permissions
+
+
+
+ boolean setUser(String name, String passwd, String digestPasswd, Vector groups)
+ boolean setUser(String name, String passwd, String digestPasswd, Vector groups, String home)
+ Modifies or creates a database user.
+
+
+ name
+
+ The username.
+
+
+
+ passwd
+
+ The plain-text (!) password for the user.
+
+
+
+ digestPasswd
+
+ The MD5-encoded password for the user.
+
+
+
+ groups
+
+ A vector of groups assigned to the user. The first group in the vector will become the user's primary group.
+
+
+
+ home
+
+ An optional setting for the user's home collection path. The collection will be created if it does not exist, and the
+ user is granted full access to it.
+
+
+
+
+
+ boolean setPermissions(String resource, String permissions)
+ boolean setPermissions(String resource, int permissions)
+ boolean setPermissions(String resource, String owner, String ownerGroup, String permissions)
+ boolean setPermissions(String resource, String owner, String ownerGroup, int permissions)
+ Sets the permissions assigned to a given collection or document.
+
+
+ resource
+
+ The full path to the collection or document on which the specified permissions will be set. The method first checks if the
+ specified path points to a collection or resource.
+
+
+
+ owner
+
+ The name of the user owning this resource.
+
+
+
+ ownerGroup
+
+ The name of the group owning this resource.
+
+
+
+ permissions
+
+ The permissions assigned to the resource, which can be specified either as an integer value constructed using the
+ Permission class, or using a modification string. The bit encoding of the integer value corresponds to Unix
+ conventions. The modification string has the following syntax:
+ [user|group|other]=[+|-][read|write|update][, ...]
+
+
+
+
+
+ Hashtable getPermissions(String resource)
+ Returns the active permissions for the specified document or collection.
+ The returned struct has the following fields: name (the collection path), owner (identifies the
+ collection owner), group (identifies the group that owns the collection), created (the creation date
+ of the collection expressed as a long value), permissions (the active permissions that apply to the collection as an
+ integer value).
+
+
+ boolean removeUser(String name)
+ Removes the identified user.
+
+
+ Hashtable getUser(String name)
+ Returns a struct describing the user identified by its name.
+ The returned struct has the following fields: name (the collection path), home (identifies the
+ user's home directory), groups (an array specifying all groups to which the user belongs).
+
+
+ Vector getUsers()
+ Returns a list of all users currently known to the system.
+ Each user in the list is described by the same struct returned by the getUser() method.
+
+
+ Vector getGroups()
+ Returns a list of all group names (as string values) currently defined.
+
+
+
+
+
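The setUser and setPermissions signatures above can be exercised like this (a sketch: the empty digest password and the group names are assumptions for illustration; the modification string uses the [user|group|other]=[+|-][read|write|update] syntax documented above):

```python
import xmlrpc.client

def create_user(server, name, password, groups):
    """Create or update a user. The digest password is left empty here (an
    assumption for this sketch); the first entry of `groups` becomes the
    user's primary group."""
    # boolean setUser(String name, String passwd, String digestPasswd, Vector groups)
    return server.setUser(name, password, "", groups)

def grant_group_write(server, resource):
    """Add write access for the owning group, using the modification-string
    form of setPermissions."""
    # boolean setPermissions(String resource, String permissions)
    return server.setPermissions(resource, "group=+write")

# server = xmlrpc.client.ServerProxy("http://admin:@localhost:8080/exist/xmlrpc")
# create_user(server, "alice", "secret", ["staff", "guest"])
# grant_group_write(server, "/db/scratch")
```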
+
+
+ Access to the Index Contents
+
+ The following methods provide access to eXist's internal index structure.
+
+
+ Vector getIndexedElements(String collectionName, boolean inclusive)
+ Returns a list (i.e. array[][4]) of all indexed element names for the specified collection.
+ For each element, an array of four items is returned:
+
+
+ name of the element
+
+
+ optional namespace URI
+
+
+ optional namespace prefix
+
+
+ number of occurrences of this element as an integer value
+
+
+
+
+ collectionName
+
+ The full path to the collection.
+
+
+
+ inclusive
+
+ If set to true, the sub-collections of the specified collection will be included in the result.
+
+
+
+
+
+ Vector scanIndexTerms(String collectionName, String start, String end, boolean inclusive)
+ Return a list (array[][2]) of all index terms contained in the specified collection.
+ For each term, an array with two items is returned:
+
+
+ the term itself
+
+
+ the number of occurrences of the term in the specified collection
+
+
+
+
+ collectionName
+
+ The full path to the collection.
+
+
+
+ start
+
+ The start position for the returned range expressed as a string value. Returned index terms are positioned after the start
+ position in ascending, alphabetical order.
+
+
+
+ end
+
+ The end position for the returned range expressed as a string value. Returned index terms are positioned before the end position
+ in ascending, alphabetical order.
+
+
+
+ inclusive
+
+ If set to true, sub-collections of the specified collection will be included in the result.
+
+
+
+
+
+
+
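The array[][2] shape returned by scanIndexTerms maps naturally onto a dictionary; a sketch (collection path and term range are illustrative):

```python
import xmlrpc.client

def term_frequencies(server, collection, start, end):
    """Collapse scanIndexTerms' array[][2] rows ([term, count] pairs) into a
    {term: count} mapping, including sub-collections (inclusive=True)."""
    rows = server.scanIndexTerms(collection, start, end, True)
    return {term: count for term, count in rows}

# server = xmlrpc.client.ServerProxy("http://admin:@localhost:8080/exist/xmlrpc")
# term_frequencies(server, "/db/docs", "a", "z")
```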
+
+
+
+ Other Methods
+
+
+
+ boolean shutdown()
+ Shuts down the database engine. All dirty pages are written to disk.
+
+
+ boolean sync()
+ Causes the database to write all dirty pages to disk.
+
+
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/documentation/documentation.xml b/src/main/xar-resources/data/documentation/documentation.xml
index ff5694d2..15ff047f 100644
--- a/src/main/xar-resources/data/documentation/documentation.xml
+++ b/src/main/xar-resources/data/documentation/documentation.xml
@@ -1,6 +1,6 @@
-
+
+
+Documentation1Q18
@@ -9,16 +9,16 @@
- Welcome to eXist-db. This article serves as an index to the eXist-db documentation articles, which will help you getting to know, install and
- use eXist-db.
+ Welcome to eXist-db. This article serves as an index to the eXist-db documentation articles, which will help you get to know, install, and use eXist-db.
- eXist-db's documentation is currently being re-factored. Articles are checked, (partly) rewritten and/or reorganized. If everything goes
- according to plan this will take until the end of 1Q18.
- As a consequence, the entry page (which you are looking at now) is currently incomplete. The subject related sections
- below point to articles that have been through at least one re-factoring pass (re-factoring is a multi-pass process). During 1Q18 this page will
- grow to include all articles and the quality of the articles will rise.
- To allow access to all (also not yet re-factored) articles there is an and a , which both include all the available articles.
+ eXist-db's documentation is currently being re-factored. Articles are checked, (partly) rewritten and/or reorganized. If everything goes according to plan this will take until the end of 1Q18.
+ As a consequence, the entry page (which you are looking at now) is currently
+ incomplete. The subject-related sections below point to articles that have been through at least one re-factoring pass (re-factoring is a multi-pass process). During 1Q18 this page will grow to include all articles and the quality of the articles will rise.
+ To allow access to all (also not yet re-factored) articles there is an
+
+ and a
+ , which both include all the available articles.
@@ -30,15 +30,27 @@
- Basic Installation: How to do a basic installation of eXist-db and fire it up for the first
- time.
+ Basic Installation: How to do a basic installation of eXist-db and fire it up for the first time.
- For the first steps with your freshly installed eXist-db, watch the screencasts available on the eXist-db homepage.
+ For the first steps with your freshly installed eXist-db, watch the screencasts available on the
+ eXist-db homepage.
- There is a whole book about eXist, written for both the
- novice and the more experienced user.
+ There is a whole
+ book about eXist, written for both the novice
+ and
+ the more experienced user.
+
+
+
+ Getting Help
+ will tell you where to go for help and advice.
+
+
+
+
+ Dashboard: Learn how to use and populate eXist-db's main user interface, the dashboard.
@@ -48,6 +60,24 @@
Uploading files: Learn how to get files in and populate your database.
+
+
+ Learning XQuery
+ provides tips and resources for newcomers to XQuery and eXist-db.
+
+
+ If you're using
+ oXygen, this article describes
+ how to use oXygen together with eXist-db.
+
+
+ If you think you found a bug, this article will tell you
+ how to report an issue.
+
@@ -55,13 +85,23 @@
XQuery
- eXist-db's main programming language is XQuery. This documentation set does not contain a full introduction to XQuery. For this read the
- excellent book about XQuery by Priscilla Walmsley. The XQuery related
- articles below discuss specific eXist-db related details or shed light on some of the lesser known features of the language.
+ eXist-db's main programming language is XQuery. This documentation set does not contain a full introduction to XQuery. For this read the excellent
+ book about XQuery by Priscilla Walmsley. The XQuery related articles below discuss specific eXist-db related details or shed light on some of the lesser known features of the language.
- Debugger will tell you how to debug XQuery code on the server using Emacs or VIM.
+ Learning XQuery
+ provides tips and resources for newcomers to XQuery and eXist-db.
+
+
+
+ KWIC (Keywords In Context)
+ will teach you how to display search results in context (the parts of the document surrounding the search match).
+
+
+
+ Debugger
+ will tell you how to debug XQuery code on the server using Emacs or VIM.
@@ -71,18 +111,81 @@
Application development
- eXist-db is not only a database but also an excellent application development platform. The following articles will help you find your way
- in this:
+ eXist-db is not only a database but also an excellent application development platform. The following articles will help you find your way in this:
- The beginners guide to XRX will learn you how to create a simple application using XRX
- (XForms, REST, XQuery).
+ Getting Started with Web Application Development
+ will help you build a basic web application using the built-in
+ HTML templating framework.
+
+
+
+ Indexing
+ will provide you with an overview of eXist-db's indexes and how to configure them. More about indexing in:
+
+
+
+ Full Text indexing
+ will provide you with all the information necessary to use eXist-db's Lucene-based full-text indexing.
+
+
+
+ N-Gram Index
+ provides information on how to configure the
+ ngram
+ index.
+
+
+ The
+ Range Index
+ article describes eXist-db's super fast modularized range index based on Apache Lucene.
+ There is also an
+ older version of the range index, which is kept for compatibility reasons. Usage of this range index is discouraged.
+
+
+
+
+
+ The beginners guide to XRX
+ will teach you how to create a simple application using XRX (XForms, REST, XQuery).
- Content extraction shows ho to extract and index non-XML contents, like PDF or Word
- documents.
+ Package Repository: How to work with EXPath packages in eXist-db.
+
+
+
+ Content extraction
+ shows how to extract and index non-XML contents, like PDF or Word documents.
+
+
+
+ REST-Style Web API
+ explains how to use eXist-db's REST interface, a useful tool in building applications.
+
+
+
+ HTTP Request/Session
+ provides information about the functions available working with HTTP requests and sessions.
+
+
+
+ Scheduler Module: How to regularly schedule jobs.
+
+
+
+ Security: When you get serious about writing applications, you need to be aware of the security model of eXist-db.
+
+ provides information about the functions available working with HTTP requests and sessions.
@@ -90,11 +193,41 @@
+
+ Interfaces
+
+ eXist-db provides many ways of interfacing with the database.
+
+
+
+
+ REST-Style Web API
+ explains how to use eXist-db's REST interface.
+
+
+ The
+ SOAP Interface Developer's Guide
+ explains how to add a SOAP interface to eXist-db using Java code.
+
+
+ The
+ XML-RPC API Developer's Guide
+ explains how to interface with eXist-db using the XML-RPC API.
+
+
+
+
+
+
Operations
- Operations is the art of installing eXist-db and keeping it up-and-running professionally. This includes things like more advanced
- installation types, doing backups and restores, automate data transfers, etc.
+ Operations is the art of installing eXist-db and keeping it up-and-running professionally. This includes things like more advanced installation types, doing backups and restores, automating data transfers, etc.
+
+ The
+ Java Admin Client
+ is a utility for performing basic administrative tasks. It has both a GUI and a command line interface.
+
Configuration: How to configure eXist-db using its main configuration file
@@ -106,21 +239,68 @@
- Advanced Installation: How to install eXist-db on a headless (no GUI) system and run it as a
- service.
+ Advanced Installation: How to install eXist-db on a headless (no GUI) system and run it as a service.
+
+
+
+ Ant tasks: How to use the specific eXist-db
+ Ant
+ tasks to automate common system administration and operation tasks.
+
+
+
+ Database Deployment: How to install eXist-db as a stand-alone or embedded server.
+
+
+
+ Building eXist: How to build Java
+ .jar
+ or
+ .war
+ files from an eXist distribution.
+
+
+
+ Performance FAQ
+ contains a short FAQ about eXist-db's performance.
+
+
+ When you use eXist-db on a production system, please read
+ Production Use - Good Practice
+ for advice.
+
+
+ You can proxy eXist-db behind a web server like Nginx or Apache.
+ Production use - Proxying eXist-db behind a Web Server
+ will provide you with some examples.
+
+
+ eXist has a
+ JMX
+ interface for access to internal statistics about memory, caching, etc.
+
+
+ Consult the
+ incompatibilities overview
+ when you upgrade from an older version of eXist-db.
- Ant tasks: How to use the specific eXist-db Ant tasks to automate common system
- administration and operation tasks.
+ Scheduler Module: Scheduling jobs (like backups) is a useful tool in an eXist-db installation.
- Database Deployment: How to install eXist-db as a stand-alone or embedded server.
+
+ Security: The security model of eXist. Also explains how to connect eXist-db to other authentication realms like LDAP or OAuth.
- Building eXist: How to build Java .jar or .war files from an eXist
- distribution.
+ Performance FAQ
+ contains a short FAQ about eXist-db's performance.
+
+
+ Consult the
+ incompatibilities overview
+ when you upgrade from an older version of eXist-db.
@@ -130,16 +310,79 @@
Java developmenteXist-db is based on Java. Besides using eXist-db as a stand-alone application platform, you can also use it from
- within Java code. The following articles will help you with this.
+ within
+ Java code. The following articles will help you with this.
- Database Deployment: How to install eXist-db as a stand-alone or embedded server. An embedded
- server can be accessed directly from Java code.
+
+ Database Deployment: How to install eXist-db as a stand-alone or embedded server. An embedded server can be accessed directly from Java code.
+
+
+
+ Writing Java Applications with the XML:DB API
+ explains how to work with eXist-db from Java code using the XML:DB API. This API provides a common interface to native or XML-enabled databases and supports the development of portable, reusable applications.
- Building eXist: How to build Java .jar or .war files from an eXist
- distribution.
+ Building eXist: How to build Java
+ .jar
+ or
+ .war
+ files from an eXist distribution.
+
+
+
+ Developer's Guide to Modularized Indexes
+ explains how the internal indexing mechanism works and how to add your own indexes to it.
+
+
+ The
+ Log4j Logging Guide
+ explains how to add logging to your Java code using Log4J.
+
+
+ The
+ SOAP Interface Developer's Guide
+ explains how to add a SOAP interface to eXist-db using Java code.
+
+
+ The
+ XML-RPC API Developer's Guide
+ explains how to interface with eXist-db using the XML-RPC API.
+
+
+
+ Extension Modules
+ provides an overview of how to create eXist-db extension modules (in Java) and contains a list of available extension modules.
+
+
+ eXist provides access to various management interfaces using
+ JMX.
+
@@ -149,12 +392,28 @@
Developing eXist-db
- The following articles provide information on how to work on eXist-db itself, either by enhancing its code or providing
- documentation.
+ The following articles provide information on how to work on eXist-db itself, either by enhancing its code or providing documentation.
- The Author Reference explains how to write a documentation article for eXist-db (like the ones
- you are looking at now).
+ The
+ eXist-db Developer Manifesto
+ lays out guidelines for developers that wish to contribute to eXist-db's code base itself.
+
+
+ The
+ Code Review Guide
+ provides instructions on how to review somebody else's (or, of course, your own) code.
+
+
+ The
+ Author Reference
+ explains how to write a documentation article for eXist-db (like the ones you are looking at now).
+
+
+ The
+ Legal Statement
+ provides information about the legal status of eXist as an open source product.
+
@@ -171,8 +430,9 @@
Subject index
- This section lists all available articles by subject. Not yet re-factored articles are listed under Tbd.
+ This section lists all available articles by subject. Not yet re-factored articles are listed under
+ Tbd.
-
\ No newline at end of file
+
diff --git a/src/main/xar-resources/data/extensions/extensions.xml b/src/main/xar-resources/data/extensions/extensions.xml
index 858988ee..a0b2a828 100644
--- a/src/main/xar-resources/data/extensions/extensions.xml
+++ b/src/main/xar-resources/data/extensions/extensions.xml
@@ -1,360 +1,559 @@
-
- XQuery Extension Modules Documentation
- August 2012
-
- TBD
-
-
-
-
-
-
- Introduction
-
- eXist-db provides a pluggable module interface that allows extension modules to be easily
- developed in Java. These extension modules can provide additional XQuery functions through a
- custom namespace. The extension modules have full access to the eXist-db database, its internal API,
- the context of the executing XQuery and the HTTP Session (if appropriate).
- The source code for extension modules should be placed in their own folder inside
- $EXIST_HOME/extensions/modules/src/org/exist/xquery/modules. They may
- then be compiled in place using either $EXIST_HOME/build.sh
- extension-modules or %EXIST_HOME%\build.bat extension-modules
- depending on the platform.
- Modules associated to modularized indexes should be placed in the
- $EXIST_HOME/extensions/indexes/*/xquery/modules/* hierarchy. They will
- be compiled automatically by the standard build targets or as indicated above.
- eXist-db must also be told which modules to load, this is done in
- conf.xml and the Class name and Namespace for each module is listed
- below. Note – eXist-db will require a restart to load any new modules added. Once a Module is configured
- and loaded eXist-db will display the module and its function definitions as part of the function library page or through
+
+ Extension Modules
+ 1Q18
+
+ java-development
+
+
+
+
+
+ This article provides an overview of how to create eXist-db extension modules (in Java) and contains a list of available extension modules.
+
+
+
+
+ Introduction
+
+ eXist-db provides a pluggable module interface that allows extension modules to be easily developed in Java. These extension modules can
+ provide additional XQuery functions through a custom namespace. The extension modules have full access to the eXist-db database, its internal
+ API, the context of the executing XQuery and the HTTP Session (if appropriate).
+ The source code for extension modules should be placed in their own folder inside
+ $EXIST_HOME/extensions/modules/src/org/exist/xquery/modules. They may then be compiled in place using either
+ $EXIST_HOME/build.sh extension-modules or %EXIST_HOME%\build.bat extension-modules depending on the
+ platform.
+ Modules associated with modularized indexes must be placed in the $EXIST_HOME/extensions/indexes/*/xquery/modules/*
+ hierarchy. They will be compiled automatically by the standard build targets or as indicated above.
+ eXist-db must also be told which modules to load. This is done in conf.xml; the Class name and Namespace for each
+ module are listed below.
+
+ eXist-db will require a restart to load any new modules added.
+
+ Once a module is configured and loaded, eXist-db will display the module and its function definitions as part of the function library page or through
util:decribe-function().
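As an illustration, a module entry in conf.xml typically pairs the module's namespace URI with its Java class. This is a sketch using the Example Module's class and namespace from the list below; check the comments in your eXist version's conf.xml for the exact element location and attribute names:

```xml
<!-- inside conf.xml, under <xquery><builtin-modules> (sketch; verify against
     the conf.xml shipped with your eXist version) -->
<module uri="http://exist-db.org/xquery/examples"
        class="org.exist.xquery.modules.example.ExampleModule"/>
```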
-
+
-
+
-
+
+ Extension Modules
+ Example Module
-
- Demonstrates the simplest example of an Extension module with a single function. A good
- place to start if you wish to develop your own Extension Module.
-
- Creator:Wolfgang Meier
- Licence: LGPL Status: production
-
- Class: org.exist.xquery.modules.example.ExampleModule
- Namespace: http://exist-db.org/xquery/examples
-
-
-
-
-
+ Demonstrates the simplest example of an Extension module with a single function. A good place to start if you wish to develop your own
+ Extension Module.
+
+
+ Creator: Wolfgang Meier
+
+
+ Licence: LGPL
+
+
+ Status: production
+
+
+ Class: org.exist.xquery.modules.example.ExampleModule
+
+
+
+ Namespace: http://exist-db.org/xquery/examples
+
+
+
+
+ Cache Module
-
Provides a global key/value cache
-
- Creator:Evgeny Gazdovsky
- Licence: LGPL Status: production
-
- Class: org.exist.xquery.modules.cache.CacheModule
- Namespace: http://exist-db.org/xquery/cache
-
-
-
-
-
+
+
+ Creator: Evgeny Gazdovsky
+
+
+ Licence: LGPL
+
+
+ Status: production
+
+
+ Class: org.exist.xquery.modules.cache.CacheModule
+
+
+
+ Namespace: http://exist-db.org/xquery/cache
+
+
+
+
+ Compression Module
-
Provides additional operations for compression
-
- Creator:Adam Retter
- Licence: LGPL Status: production
-
- Class: org.exist.xquery.modules.compression.CompressionModule
- Namespace: http://exist-db.org/xquery/compression
-
-
-
-
-
+
+
+ Creator: Adam Retter
+
+
+ Licence: LGPL
+
+
+ Status: production
+
+
+ Class: org.exist.xquery.modules.compression.CompressionModule
+
+
+
+ Namespace: http://exist-db.org/xquery/compression
+
+
+
+
+ Context Module
-
- Provides access to XQuery contexts, local attributes and foreign contexts for simple
- inter-XQuery communication. This extension is experimental at this time and has side effects
- (eg. not purely functional in nature). Use at own risk!
-
- Creator:Andrzej Taramina
- Licence: LGPL Status: experimental
-
- Class: org.exist.xquery.modules.context.ContextModule
- Namespace: http://exist-db.org/xquery/context
-
-
-
-
-
+ Provides access to XQuery contexts, local attributes and foreign contexts for simple inter-XQuery communication. This extension is
+ experimental at this time and has side effects (e.g. it is not purely functional in nature). Use at your own risk!
+
+
+ Creator: Andrzej Taramina
+
+
+ Licence: LGPL
+
+
+ Status: experimental
+
+
+ Class: org.exist.xquery.modules.context.ContextModule
+
+
+
+ Namespace: http://exist-db.org/xquery/context
+
+
+
+
+ Date Time Module
-
Provides additional operations on date and time types
-
- Creator:Adam Retter
- Licence: LGPL Status: production
-
- Class: org.exist.xquery.modules.datetime.DateTimeModule
- Namespace: http://exist-db.org/xquery/datetime
-
-
-
-
-
+
+
+ Creator: Adam Retter
+
+
+ Licence: LGPL
+
+
+ Status: production
+
+
+ Class: org.exist.xquery.modules.datetime.DateTimeModule
+
+
+
+ Namespace: http://exist-db.org/xquery/datetime
+
+
+
+
+ EXI Module
-
Provides additional operations to encode and decode Efficient XML Interchange format (EXI)
-
- Creator:Robert Walpole
- Licence: LGPL Status: production
-
- Class: org.exist.xquery.modules.exi.EXIModule
- Namespace: http://exist-db.org/xquery/exi
-
-
-
-
-
+
+
+ Creator: Robert Walpole
+
+
+ Licence: LGPL
+
+
+ Status: production
+
+
+ Class: org.exist.xquery.modules.exi.EXIModule
+
+
+
+ Namespace: http://exist-db.org/xquery/exi
+
+
+
+
+ File Module
-
- Provides additional operations on files and directories. WARNING: Enabling this
- extension module could result in possible security issues, since it allows writing to the
- filesystem by xqueries!
-
- Creator:Andrzej Taramina
- Licence: LGPL Status: production
-
- Class: org.exist.xquery.modules.file.FileModule
- Namespace: http://exist-db.org/xquery/file
-
-
-
-
-
+ Provides additional operations on files and directories. WARNING: Enabling this extension module could result in security issues,
+ since it allows XQueries to write to the filesystem!
+
+
+ Creator: Andrzej Taramina
+
+
+ Licence: LGPL
+
+
+ Status: production
+
+
+ Class: org.exist.xquery.modules.file.FileModule
+
+
+
+ Namespace: http://exist-db.org/xquery/file
+
+
+
+
+ HTTP Client Module
-
Functions for performing HTTP requests
-
- Creator:Adam Retter and Andrzej Taramina
- Licence: LGPL Features Used: NekoHTML
- Status: production
-
- Class: org.exist.xquery.modules.http.HTTPClientModule
- Namespace: http://exist-db.org/xquery/httpclient
-
-
-
-
-
+
+
+ Creator: Adam Retter and Andrzej Taramina
+
+
+ Licence: LGPL
+
+
+ Features Used: NekoHTML
+
+
+ Status: production
+
+
+ Class: org.exist.xquery.modules.http.HTTPClientModule
+
+
+
+ Namespace: http://exist-db.org/xquery/httpclient
+
+
+
+
+ Image Module
-
- This modules provides operations on images stored in the db, including: Retreiving Image
- Dimensions, Creating Thumbnails and Resizing Images.
-
- Creator:Adam Retter
- Contributors:Wolfgang Meier, Rafael Troilo
- Licence: LGPL Status: production
-
- Class: org.exist.xquery.modules.image.ImageModule
- Namespace: http://exist-db.org/xquery/image
-
-
-
-
-
+ This module provides operations on images stored in the db, including: Retrieving Image Dimensions, Creating Thumbnails and Resizing
+ Images.
+
+
+ Creator: Adam Retter
+
+
+ Contributors: Wolfgang Meier, Rafael Troilo
+
+
+ Licence: LGPL
+
+
+ Status: production
+
+
+ Class: org.exist.xquery.modules.image.ImageModule
+
+
+
+ Namespace: http://exist-db.org/xquery/image
+
+
+
+
+ JNDI Directory Module
-
- This extension module allows you to access and manipulate JNDI-based directories, such
- as LDAP, using XQuery functions. It can be very useful if you want to integration and LDAP
- directory into an eXist-db/XQuery based application.
- To compile it, set the parameter include.module.jndi = true in
- $EXIST_HOME/extensions/local.build.properties file (create it if missing).
- Then, to enable it, edit the appropriate module entry in conf.xml
-
- Creator:Andrzej Taramina
- Licence: LGPL Status: production
-
- Class: org.exist.xquery.modules.jndi.JNDIModule
- Namespace: http://exist-db.org/xquery/jndi
-
-
-
-
-
+ This extension module allows you to access and manipulate JNDI-based directories, such as LDAP, using XQuery functions. It can be very
+ useful if you want to integrate an LDAP directory into an eXist-db/XQuery based application.
+ To compile it, set the parameter include.module.jndi = true in $EXIST_HOME/extensions/local.build.properties file (create it if
+ missing). Then, to enable it, edit the appropriate module entry in conf.xml.
+
+
+ Creator: Andrzej Taramina
+
+
+ Licence: LGPL
+
+
+ Status: production
+
+
+ Class: org.exist.xquery.modules.jndi.JNDIModule
+
+
+
+ Namespace: http://exist-db.org/xquery/jndi
+
+
+
+
+ Mail Module
-
- This modules provides facilities for sending text and/or HTML emails from XQuery using
- either SMTP or a local Sendmail binary.
-
- Creator:Adam Retter
- Contributors:Robert Walpole
- Licence: LGPL Status: production
-
- Class: org.exist.xquery.modules.mail.MailModule
- Namespace: http://exist-db.org/xquery/mail
-
-
-
-
-
+ This module provides facilities for sending text and/or HTML emails from XQuery using either SMTP or a local Sendmail binary.
+
+
+ Creator: Adam Retter
+
+
+ Contributors: Robert Walpole
+
+
+ Licence: LGPL
+
+
+ Status: production
+
+
+ Class: org.exist.xquery.modules.mail.MailModule
+
+
+
+ Namespace: http://exist-db.org/xquery/mail
+
+
+
+
+ Math Module
-
This module provides mathematical functions from the java Math class.
-
- Creator:Dannes Wessels
- Licence: LGPL Status: production
-
- Class: org.exist.xquery.modules.math.MathModule
- Namespace: http://exist-db.org/xquery/math
-
-
-
-
-
+
+
+ Creator: Dannes Wessels
+
+
+ Licence: LGPL
+
+
+ Status: production
+
+
+ Class: org.exist.xquery.modules.math.MathModule
+
+
+
+ Namespace: http://exist-db.org/xquery/math
+
+
+
+
+ Oracle Module
-
- This module allows execution of PL/SQL Stored Procedures within an Oracle RDBMS from
- XQuery and returns the results as XML nodes. This module should be used where an Oracle
- database returns results in an Oracle REF_CURSOR and can only be used in conjunction with
- the SQL extension module.
-
- Creator:Rob Walpole
- Licence: LGPL Status: production
-
- Class: org.exist.xquery.modules.oracle.OracleModule
- Namespace: http://exist-db.org/xquery/oracle
-
-
-
-
-
+ This module allows execution of PL/SQL Stored Procedures within an Oracle RDBMS from XQuery and returns the results as XML nodes. This
+ module should be used where an Oracle database returns results in an Oracle REF_CURSOR and can only be used in conjunction with the SQL
+ extension module.
+
+
+ Creator: Rob Walpole
+
+
+ Licence: LGPL
+
+
+ Status: production
+
+
+ Class: org.exist.xquery.modules.oracle.OracleModule
+
+
+
+ Namespace: http://exist-db.org/xquery/oracle
+
+
+
+
+ Scheduler Module
-
- Provides access to eXist-db's Scheduler for the purposes of scheduling job's and
- manipulating existing job's.
-
- Creator:Adam Retter
- Licence: LGPL Status: production
-
- Class: org.exist.xquery.modules.scheduler.SchedulerModule
- Namespace: http://exist-db.org/xquery/scheduler
-
-
-
-
-
+ Provides access to eXist-db's Scheduler for the purposes of scheduling jobs and manipulating existing jobs.
+
+
+ Creator: Adam Retter
+
+
+ Licence: LGPL
+
+
+ Status: production
+
+
+ Class: org.exist.xquery.modules.scheduler.SchedulerModule
+
+
+
+ Namespace: http://exist-db.org/xquery/scheduler
+
+
+
+
+ Simple Query Language Module
-
- This modules implements a Simple custom Query Language which is then converted to XPath
- and executed against the db.
-
- Creator:Wolfgang Meier
- Licence: LGPL Status: production
-
- Class: org.exist.xquery.modules.simpleql.SimpleQLModule
- Namespace: http://exist-db.org/xquery/simple-ql
-
-
-
-
-
+ This module implements a simple custom query language which is then converted to XPath and executed against the db.
+
+
+ Creator: Wolfgang Meier
+
+
+ Licence: LGPL
+
+
+ Status: production
+
+
+ Class: org.exist.xquery.modules.simpleql.SimpleQLModule
+
+
+
+ Namespace: http://exist-db.org/xquery/simple-ql
+
+
+
+
+ Spatial module
-
- Various functions for GML geometries, whether indexed or not. More information about the design is
- available here.
-
- Creator:Pierrick Brihaye
- Licence: LGPL Status: experimental
-
- Class: org.exist.xquery.modules.spatial.SpatialModule
- Namespace: http://exist-db.org/xquery/spatial
-
-
-
-
-
+ Various functions for GML
+ geometries, whether indexed or not. More information about the design is available here.
+
+
+ Creator: Pierrick Brihaye
+
+
+ Licence: LGPL
+
+
+ Status: experimental
+
+
+ Class: org.exist.xquery.modules.spatial.SpatialModule
+
+
+
+ Namespace: http://exist-db.org/xquery/spatial
+
+
+
+
+ SQL Module
-
- This module provides facilities for performing SQL operations against traditional
- databases from XQuery and returning the results as XML nodes.
-
- Creator:Adam Retter
- Licence: LGPL Features Used: JDBC
- Status: production
-
- Class: org.exist.xquery.modules.sql.SQLModule
- Namespace: http://exist-db.org/xquery/sql
-
-
-
-
-
- XML Differencing Module
-
- This module provides facilities for determining the differences between XML
+ This module provides facilities for performing SQL operations against traditional databases from XQuery and returning the results as XML
nodes.
-
- Creator:Dannes Wessels
- Contributors:Pierrick Brihaye
- Licence: LGPL Status: production
-
- Class: org.exist.xquery.modules.xmldiff.XmlDiffModule
- Namespace: http://exist-db.org/xquery/xmldiff
-
-
-
-
-
+
+
+ Creator: Adam Retter
+
+
+ Licence: LGPL
+
+
+ Features Used: JDBC
+
+
+ Status: production
+
+
+ Class: org.exist.xquery.modules.sql.SQLModule
+
+
+
+ Namespace: http://exist-db.org/xquery/sql
+
+
+
+
+
+ XML Differencing Module
+ This module provides facilities for determining the differences between XML nodes.
+
+
+ Creator: Dannes Wessels
+
+
+ Contributors: Pierrick Brihaye
+
+
+ Licence: LGPL
+
+
+ Status: production
+
+
+ Class: org.exist.xquery.modules.xmldiff.XmlDiffModule
+
+
+
+ Namespace: http://exist-db.org/xquery/xmldiff
+
+
+
+
+ XSL-FO Module
-
This module provides XSL-FO rendering facilities.
-
- Creator: University of the West of England
- Licence: LGPL Features Used: Apache FOP
- Status: production
-
- Class: org.exist.xquery.modules.xslfo.XSLFOModule
- Namespace: http://exist-db.org/xquery/xslfo
-
-
-
-
-
+
+
+ Creator: University of the West of England
+
+
+ Licence: LGPL
+
+
+ Features Used: Apache FOP
+
+
+ Status: production
+
+
+ Class: org.exist.xquery.modules.xslfo.XSLFOModule
+
+
+
+ Namespace: http://exist-db.org/xquery/xslfo
+
+
+
+
+ XProcxq Module
-
This module provides XProc functionality to eXist-db.
-
- Creator:
- James Fuller
-
- Licence: MPL v1.1 Features Used: expath http library
- Status: in development for v2.0 release
-
- Class: static xquery module via extensions/xprocxq.jar
- Namespace: http://xproc.net/xproc
-
-
-
-
-
+
+
+ Creator: James Fuller
+
+
+ Licence: MPL v1.1
+
+
+ Features Used: expath http library
+
+
+ Status: in development for v2.0 release
+
+
+ Class: static xquery module via extensions/xprocxq.jar
+
+
+
+ Namespace: http://xproc.net/xproc
+
+
+
+
+
+ XML Calabash Module
-
This module provides simple integration with XML Calabash XProc engine.
-
- Creator:
- James Fuller
-
- Licence: MPL v1.1
-
- Class: org.exist.xquery.modules.xmlcalabash
- Namespace: http://xmlcalabash.com
-
+
+
+ Creator: James Fuller
+
+
+ Licence: MPL v1.1
+
+
+ Class: org.exist.xquery.modules.xmlcalabash
+
+
+
+ Namespace: http://xmlcalabash.com
+
+
+
+
+
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/faq_performance/faq_performance.xml b/src/main/xar-resources/data/faq_performance/faq_performance.xml
index 67a05f3b..391ec185 100644
--- a/src/main/xar-resources/data/faq_performance/faq_performance.xml
+++ b/src/main/xar-resources/data/faq_performance/faq_performance.xml
@@ -1,137 +1,120 @@
-
- Performance FAQ
- October 2012
-
- TBD
-
-
+
+ Performance FAQ
+ 1Q18
+
+ operations
+
+
-
+
+
+ Contains a short FAQ about eXist-db's performance.
-
- Are there limits on the size or amount of data eXist-db can store?
+
- As an advanced, powerful native XML database, eXist-db is capable of storing and
- querying XML documents of arbitrary depth and complexity, and there is no
- theoretical limit to the amount of data or the number of documents and collections
- you store in eXist-db. Currently eXist-db is set to limit the number of documents
- and collections (respectively) to 2^32, but this can be raised. Thus, the raw size
- of your data is not the key factor to consider when evaluating how eXist-db will
- perform for your applications.
-
+
+ FAQ
-
+
-
- How much do external factors like memory, storage, operating system, and
- processor power affect eXist-db's performance?
+
+ Are there limits on the size or amount of data eXist-db can store?
+
+ As an advanced, powerful native XML database, eXist-db is capable of storing and querying XML documents of arbitrary depth and
+ complexity, and there is no theoretical limit to the amount of data or the number of documents and collections you store in eXist-db.
+ Currently eXist-db is set to limit the number of documents and collections (respectively) to 2^32, but this can be raised. Thus, the raw
+ size of your data is not the key factor to consider when evaluating how eXist-db will perform for your applications.
+
+
- eXist-db has modest memory requirements (the default memory footprint is 512 MB),
- but as your data grows, queries grow more complex, and number of concurrent users
- increase, the performance of eXist-db will improve by supplying it with adequate
- memory, storage, and processing power. Certain operating systems impose upper limits
- on the amount of memory that can be allocated to a single application; for example,
- the 32-bit version of Windows limits applications to 1.3 GB of RAM, which while
- adequate for many applications may not last forever. As a multithreaded Java
- application, eXist-db benefits from multicore processors. Solid state storage offers
- performance advantages over much better than hard disk storage. Understanding
- external factors like these will allow you to give eXist-db the environment it needs
- to perform to your requirements. Regardless of the hardware and operating system you
- are using, you will want to explore the core factors that contribute to eXist-db
- performance.
-
+
+ How much do external factors like memory, storage, operating system, and processor power affect eXist-db's performance?
+
+ eXist-db has modest memory requirements (the default memory footprint is 512 MB), but as your data grows, your queries grow more complex,
+ and the number of concurrent users increases, eXist-db's performance will improve if you supply it with adequate memory, storage, and
+ processing power. Certain operating systems impose upper limits on the amount of memory that can be allocated to a single application; for
+ example, the 32-bit version of Windows limits applications to 1.3 GB of RAM, which, while adequate for many applications, may not last
+ forever. As a multithreaded Java application, eXist-db benefits from multicore processors. Solid state storage offers performance
+ advantages over hard disk storage. Understanding external factors like these will allow you to give eXist-db the
+ environment it needs to perform to your requirements. Regardless of the hardware and operating system you are using, you will want to
+ explore the core factors that contribute to eXist-db performance.
+
+
-
+
+ What core factors play into eXist-db's performance?
+
+ The key factor affecting performance in eXist-db is the interrelationship between the structure of your data and the queries you need
+ to run. eXist-db has been designed to execute XQuery efficiently by pre-indexing the structure of your data (and, if you configure it, the
+ contents of elements or attributes). Indexing allows eXist-db to perform operations in memory (which is fast), rather than reading from
+ disk (which is slow). eXist-db generally performs very well when querying XML documents and their collections. When performance suffers,
+ it is typically because indexing has not been employed, because queries have been written inefficiently, or because the data needs to be
+ restructured to allow queries to perform optimally.
+ Among the many ways to optimize eXist-db's performance, eXist-db's indexing abilities can dramatically improve the performance of
+ queries. Range or NGram indexes can improve the performance of queries that rely on string or value comparisons, and full text indexes can
+ dramatically increase the speed and sophistication of full text searches. These indexes, paired with the right cache and memory settings,
+ will allow eXist-db to load just the right amount of data in memory for fast processing and minimize disk I/O operations or the need to
+ access the raw DOM to complete a query.
+ Performance of queries can also depend on actions like storing, replacing, or updating data. Some operations synchronize on the
+ collection cache, which blocks other operations. Overcoming write-related performance problems can require changes to an application's
+ design.
+ Performance can change when an application moves from single to multiple concurrent users. In a concurrent situation, queries which
+ need to traverse the DOM or scan through large index entries can become a bottleneck even though they run quickly when a single user runs
+ the query.
+ The bottom line: Performance depends on many factors, but developers of eXist-db are eager to eliminate all known bottlenecks and
+ factors that lead to poor performance. If you have performance concerns, send a message to the exist-open mailing list. Depending on the
+ nature of your issue, you may be asked to send information about your operating system or memory settings, sample data and queries, a thread
+ dump captured while the query is running, or information about memory consumption (using jconsole or other tools) to see how memory is
+ used during times of low and high load. Very often the cause of a slowdown is a single query which just consumes too many resources, and
+ if such a query coincides with other operations, performance can degrade. Identifying bottlenecks is the first step to overcoming
+ performance problems.
+
+
-
- What core factors play into eXist-db's performance?
+
+ How scalable is eXist-db?
+
+ Scalability is a complex topic, and there are numerous areas in which an application might need to scale. To date, eXist-db has
+ typically run as a single server. In a single server model, the means for scaling involves increasing memory and adding faster storage.
+ However, eXist-db also has data replication abilities, allowing applications to span multiple servers. Built on JMS, eXist-db's
+ replication involves designating one server as the master, and one or more other servers as slaves. Changes on the master are
+ automatically replicated to the slaves. This replication facility should not be confused with a system for sharding data or distributing
+ queries across multiple servers.
+ As the scalability requirements of eXist-db users grow, the eXist-db developers aim to rise to the challenge.
+
+
- The key factor affecting performance in eXist-db is the interrelationship between
- the structure of your data and the queries you need to run. eXist-db has been
- designed to execute XQuery efficiently by pre-indexing the structure of your data
- (and, if you configure it, the contents of elements or attributes). Indexing allows
- eXist-db to perform operations in memory (which is fast), rather than reading from
- disk (which is slow). eXist-db generally performs very well when querying XML
- documents and their collections. When performance suffers, it is typically because
- indexing has not been employed, because queries have been written inefficiently, or
- because the data needs to be restructured to allow queries to perform most
- optimally.
- Among the many ways to optimize eXist-db's performance, eXist-db's indexing
- abilities can dramatically improve the performance of queries. Range or NGram
- indexes can improve the performance of queries that rely on string or value
- comparisons, and full text indexes can dramatically increase the speed and
- sophistication of full text searches. These indexes, paired with the right cache and
- memory settings will allow eXist-db to load just the right amount of data in memory
- for fast processing and minimize disk I/O operations or the need to access the raw
- DOM to complete a query.
- Performance of queries can also depend on actions like storing, replacing, or
- updating data. Some operations synchronize on the collection cache, which blocks
- other operations. Overcoming write-related performance problems can require changes
- to an application's design.
- Performance can change when an application moves from single to multiple
- concurrent users. In a concurrent situation, queries which need to traverse the DOM
- or scan through large index entries can become a bottleneck even though they run
- quickly when a single user runs the query.
- The bottom line: Performance depends on many factors, but developers of eXist-db
- are eager to eliminate all known bottlenecks and factors that lead to poor
- performance. If you have performance concerns, send a message to the exist-open
- mailing list. Depending on the nature your issue, you may be asked to send
- information about your operating system or memory settings, sample data and queries,
- a thread dump captured while the query is running, or information about memory
- consumption (using jconsole or other tools) to see how memory is used during times
- of low and high load. Very often the cause of a slowdown is a single query which
- just consumes too many resources, and if such a query coincides with other
- operations, performance can degrade. Identifying bottlenecks is the first step to
- overcoming performance problems.
-
+
-
+
-
- How scalable is eXist-db?
+
- Scalability is a complex topic, and there are numerous areas that an application
- might want to be able to scale. To date, eXist-db has typically run as a single
- server. In a single server model, the means for scaling involves increasing memory
- and adding faster storage. However, eXist-db also has data replication abilities,
- allowing applications to span multiple servers. Built on JMS, eXist-db's replication
- involves designating one server as the master, and one or more other servers as
- slaves. Changes on the master are automatically replicated to the slaves. This
- replication facility should not be confused with a system for sharding data or
- distributing queries across multiple servers.
- As the scalability requirements of eXist-db users grow, the eXist-db developers
- aim to rise to the challenge.
-
+
+ Sources
-
+
+
+ On size vs. structure: Wolfgang Meier, exist-open mailing list, Apr 29, 2010
+
+
+
+ On querying while writing: Wolfgang Meier, exist-open mailing list, Jan 19, 2012
+
+
+
+ On multiple concurrent users thread dumps, and memory monitoring, Wolfgang Meier, exist-open mailing list, Jan 19, 2012
+
+
+
+ On limits of collection and document ids: Pierrick Brihaye, exist-open mailing list, Oct 4, 2007
+
+
+
+
-
- Sources
-
-
- On size vs. structure: Wolfgang Meier,
- exist-open mailing list, Apr 29, 2010
-
-
-
- On querying while writing: Wolfgang Meier,
- exist-open mailing list, Jan 19, 2012
-
-
-
- On multiple concurrent users thread dumps, and memory monitoring, Wolfgang Meier,
- exist-open mailing list, Jan 19, 2012
-
-
-
- On limits of collection and document ids: Pierrick Brihaye,
- exist-open mailing list, Oct 4, 2007
-
-
-
-
\ No newline at end of file
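The FAQ above notes that range indexes can dramatically speed up queries that rely on string or value comparisons. As an illustrative sketch only (the element name `title` and the target collection are hypothetical, not part of this document), a range index is configured in a `collection.xconf` document stored under `/db/system/config`:

```xml
<!-- Hypothetical example: index string comparisons on <title> elements.
     Store as /db/system/config/db/myapp/collection.xconf and reindex
     the /db/myapp collection for it to take effect. -->
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index>
        <range>
            <create qname="title" type="xs:string"/>
        </range>
    </index>
</collection>
```

With such an index in place, a comparison like `//book[title = "Hamlet"]` can be answered from the index in memory rather than by scanning the DOM on disk, which is the behaviour the FAQ describes.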
diff --git a/src/main/xar-resources/data/getting-help-how-to-report/getting-help-how-to-report.xml b/src/main/xar-resources/data/getting-help-how-to-report/getting-help-how-to-report.xml
index 5d2dbf9a..109bd53e 100644
--- a/src/main/xar-resources/data/getting-help-how-to-report/getting-help-how-to-report.xml
+++ b/src/main/xar-resources/data/getting-help-how-to-report/getting-help-how-to-report.xml
@@ -1,83 +1,83 @@
-
- Getting Help: How to report issues
- October 2016
-
- TBD
-
-
+
+ How to report issues
+ 1Q18
+
+ getting-started
+
+
-
+
-
- Introduction
+ This article will tell you what information to provide when you report an issue.
- When a (potential) bug is reported, please include as much of the information as described below. When
- more information is provided, it is more easy for the developers to understand and reproduce the issue,
- which means that the issue can be picked-up and solved much faster.
-
+
-
+
+ Reporting an issue
-
+ Bugs can be reported on the Bug Tracker. Data and log files can be attached.
+
+ When reporting a (potential) bug, please include as much of the information described below as possible. When more information is provided, it is
+ easier for the developers to understand and reproduce the issue, which means that the issue can be picked up and solved much faster.
+
+
+
+ General information
-
- When reporting a (suspected) bug please make the report as complete as possible:
-
-
- Try to write a clear (and short) description how to reproduce the problem
-
-
- Include the exact version (and revision), e.g. "1.4.3" (rev1234) or "2.1".
-
-
- Always add the operating system (e.g. "Windows7 64bit"), the exact Java version as is outputted by
- the command 'java -version' on the console.
-
-
- Include relevant parts of the logfile (e.g.
- webapp/WEB-INF/logs/exist.log and tools/yajsw/logs/wrapper.log)
-
-
- Mention the changes that have been made in the configuration files, e.g.
- conf.xml, vm.properties,
- tools/yajsw/conf/wrapper.conf and
- tools/jetty/etc/jetty.xml.
-
-
-
-
-
-
-
-
+ When reporting a (suspected) bug please make the report as complete as possible:
+
+
+ Try to write a clear (and short) description of how to reproduce the problem
+
+
+ Include the exact version (and revision), e.g. "1.4.3" (rev1234) or "2.1".
+
+
+ Always add the operating system (e.g. "Windows 7 64-bit") and the exact Java version as output by the command java
+ -version on the console.
+
+
+ Include relevant parts of the logfile (webapp/WEB-INF/logs/exist.log and
+ tools/yajsw/logs/wrapper.log)
+
+
+ Mention the changes made in the configuration files, for instance conf.xml, vm.properties,
+ tools/yajsw/conf/wrapper.conf and tools/jetty/etc/jetty.xml.
+
+
+
+
+
+
+
+ XQuery specific
-
- When reporting a potential XQuery bug please:
-
-
- Make the XQuery, if possible, 'self containing', meaning that the XQuery does not
- require any additional files to run.
-
-
- Describe the actual XQuery result and the expected result
-
-
- Check if the issue has been solved in the latest version of eXist-db; For this the web
- based tool eXide
- can be used.
-
-
- Run the XQuery with Kernow for
- Saxon and check the similarities and differences.
-
-
-
- Bugs can also be reported on the Bug Tracker
- where data and log files can be attached.
-
+ When reporting a potential XQuery bug please:
+
+
+ Make the XQuery, if possible, 'self-contained', meaning that the XQuery does not require any additional files to run.
+
+
+ Describe the actual XQuery result and the expected result
+
+
+ Check if the issue has been solved in the latest version of eXist-db; for this, the web-based tool eXide can be used.
+
+
+ Run the XQuery with Kernow for
+ Saxon and check the similarities and differences.
+
+
+
+
+
+
+
+
+
\ No newline at end of file
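The guidance above asks for a 'self-contained' XQuery, i.e. one that constructs its own input instead of depending on database content. A hypothetical report might therefore look like this (the data and expression are invented for illustration):

```xquery
xquery version "3.1";

(: Hypothetical self-contained reproduction: the input is built inline,
   so the query runs without any additional files or stored documents. :)
let $input :=
    <items>
        <item n="1"/>
        <item n="2"/>
    </items>
return
    (: Describe both outcomes in the report:
       actual result: ...  expected result: 2 :)
    count($input//item)
```

Pasting such a query into eXide, together with the actual and expected results, gives the developers everything they need to reproduce the issue.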
diff --git a/src/main/xar-resources/data/getting-help/getting-help.xml b/src/main/xar-resources/data/getting-help/getting-help.xml
index 4ec3ed8e..093b05e0 100644
--- a/src/main/xar-resources/data/getting-help/getting-help.xml
+++ b/src/main/xar-resources/data/getting-help/getting-help.xml
@@ -1,69 +1,60 @@
-
- Getting Help
- September 2012
-
- TBD
-
-
+ schematypens="http://purl.oclc.org/dsdl/schematron"?>
+
+ Getting Help
+ 1Q18
+
+ getting-started
+
+
-
+
-
- Community Support
+ This article will tell you where to go for additional help working with eXist-db.
- Besides eXist-db's extensive documentation,
- eXist-db has a large and active community of users and developers. The community
- uses mailing lists, chat, a wiki, and social media—to get and give help, follow
- the latest developments, and share tips and discoveries.
- The book, "eXist: a NoSQL Document Database and Application Platform" was published in 2014
- and is available from O'Reilly.com.
-
- eXist: a NoSQL Document Database and Application Platform
-
-
-
-
-
-
- The exist-open mailing list is the primary forum for asking questions and
- getting help. Finding the answer to a question can sometimes be as simple as searching the mailing list's
- archives. Whether you have a question that you can't find the answer to,
- or you just want to join the discussion, subscribe to the
- mailing list—it's free and you can unsubscribe anytime.
-
- Please consult this page for tips how to best report issues.
-
- Besides exist-open, there is a mailing list dedicated solely to the XQuery
- language: the xquery-talk
- mailing list. It's a great place to ask questions about XQuery, from
- basic to advanced.
- Also, there are numerous excellent resources for learning XQuery, including the
- XQuery wikibook and
- Priscilla Walmsley's XQuery book (O'Reilly 2007).
- Follow @existdb on Twitter for
- news and announcements about eXist-db, and use the #existdb hash tag (note
- no dash in #existdb). Find links about eXist-db on Del.icio.us using the existdb tag. Meet the
- eXist-db professionals on LinkedIn. Subscribe to the eXist Developer Blog,
- powered by the eXist-based AtomicWiki project.
-
+
-
+
+ Community Support
-
- Professional Support
+ Besides its extensive documentation set, eXist-db has a large and active community of users
+ and developers. The community uses mailing lists, chat, a wiki, and social media to get and give help, follow the latest developments and share
+ tips and discoveries.
+ The book, "eXist: a NoSQL Document Database and Application Platform" was published in 2014 and is available from O'Reilly.com.
+
+
+
+
+
+
+
+
+ The exist-open mailing list is the
+ primary forum for asking questions and getting help. Finding the answer to a question can sometimes be as simple as searching the mailing list's archives. Whether you have a
+ question that you can't find the answer to, or you just want to join the discussion, subscribe to the mailing list. It's free and
+ you can unsubscribe anytime.
+
+ Please consult this page for tips on how best to report issues.
+
+ Besides exist-open, there is a mailing list dedicated solely to the XQuery language: the xquery-talk mailing list. It's a great place to ask questions about XQuery, from
+ basic to advanced.
+ Also, there are numerous excellent resources for learning XQuery, including the XQuery wikibook and Priscilla Walmsley's XQuery book.
+ Follow @existdb on Twitter for news and announcements about
+ eXist-db. Use the #existdb hash tag (note: no dash in
+ #existdb).
+ Find links about eXist-db on Del.icio.us using the existdb tag. Meet the eXist-db professionals on LinkedIn. Subscribe to the eXist
+ Developer Blog, powered by the eXist-based AtomicWiki project.
+
- eXist-db began as and continues to thrive as an open source community, built on
- the contributions of volunteers. Recognizing the need for professional solutions for
- eXist-db, members of our development team and community have come together
- commercially under the umbrella of eXist
- Solutions. They are available for Consultancy, Training or
- Bespoke
- Development on a flexible basis, as such if you need some extra support
- or development resource, please contact us at info@existsolutions.com. eXist
- Solutions also provides Support subscriptions with service level agreements tailored to
- customer's requirements.
-
+
+
+
+ Professional Support
+
+ eXist-db began as and continues to thrive as an open source community, built on the contributions of volunteers. Recognizing the need for
+ professional solutions for eXist-db, members of our development team and community have come together commercially under the umbrella of eXist Solutions. They are available for Consultancy, Training or Bespoke
+ Development. We also offer support
+ subscriptions with service level agreements tailored to customers' requirements. Please contact us at info@existsolutions.com.
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/http-request-session/http-request-session.xml b/src/main/xar-resources/data/http-request-session/http-request-session.xml
index b9f7aba3..d84dc568 100644
--- a/src/main/xar-resources/data/http-request-session/http-request-session.xml
+++ b/src/main/xar-resources/data/http-request-session/http-request-session.xml
@@ -1,59 +1,61 @@
-
- HTTP-Related Functions in the request and session Modules
- October 2012
-
- TBD
-
-
-
-
-
-
- Introduction
-
- The request module (in the http://exist-db.org/xquery/request function
- namespace) contains functions for handling HTTP request parameters. The session
- module (in the http://exist-db.org/xquery/session function namespace)
- contains functions for handling HTTP session variables. Functions in these
- namespaces are only usable if the query is executed through the XQueryGenerator or
- the XQueryServlet (for more information consult eXist-db's Developer's Guide ).
-
-
- request:get-parameter(name, default
- value)
-
- This HTTP function expects two arguments: the first denotes the name
- of the parameter, the second specifies a default value, which is
- returned if the parameter is not set. This function returns a sequence
- containing the values for the parameter. The above script
- (Adding/Subtracting Two Numbers) offers an example of how
- request:get-parameter can be used to read HTTP
- request parameters.
-
-
-
- request:get-uri()
-
- This function returns the URI of the current request. To encode this
- URI using the current session identifier, use the following
- function:
- session:encode-url(request:get-uri())
-
-
-
- session:create()
-
- This function creates a new HTTP session if none exists.
- Other session functions read and set session attributes, among other
- operations. For example, an XQuery or Java object value can be stored in
- a session attribute, to cache query results. For more example scripts,
- please look at our Examples page,
- under the XQuery Examples section.
-
-
-
-
+
+ HTTP-Related Functions in the Request and Session Modules
+ 1Q18
+
+ application-development
+
+
+
+
+
+ This article describes the functions for handling HTTP request parameters and session variables.
+
+
+
+
+ Introduction
+
+ The request module (in the http://exist-db.org/xquery/request function namespace) contains functions for handling HTTP
+ request parameters. The session module (in the http://exist-db.org/xquery/session function namespace) contains functions for
+ handling HTTP session variables. Functions in these namespaces are only usable if the query is executed through the XQueryGenerator
+ or the XQueryServlet.
+
+
+
+
+
+ Available functions
+
+
+
+ request:get-parameter(name, default value)
+
+ This HTTP function expects two arguments: the first denotes the name of the parameter, the second specifies a default value, which is
+ returned if the parameter is not set. This function returns a sequence containing the values for the parameter.
+
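As an illustration, a minimal sketch of reading a request parameter (the parameter name `name` and the fallback value are assumptions chosen for this example):

```xquery
xquery version "3.1";

(: Read the "name" request parameter from the HTTP request;
   fall back to "guest" if the parameter is not set :)
let $who := request:get-parameter("name", "guest")
return
    <greeting>Hello, {$who}!</greeting>
```

Calling the query as `myquery.xq?name=Anna` would return `<greeting>Hello, Anna!</greeting>`; without the parameter, the default `guest` is used.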
+
+
+ request:get-uri()
+
+ This function returns the URI of the current request. To encode this URI using the current session identifier, use the following
+ function:
+ session:encode-url(request:get-uri())
+
+
+
+ session:create()
+
+ This function creates a new HTTP session if none exists.
+
+
+
+
+
+ Other session functions read and set session attributes, among other operations. For example, an XQuery or Java object value can be stored
+ in a session attribute, to cache query results.
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/incompatibilities/incompatibilities.xml b/src/main/xar-resources/data/incompatibilities/incompatibilities.xml
index b43c6e8a..87d4e2b3 100644
--- a/src/main/xar-resources/data/incompatibilities/incompatibilities.xml
+++ b/src/main/xar-resources/data/incompatibilities/incompatibilities.xml
@@ -1,205 +1,216 @@
-
- Known Issues for Binary Non-compatible Upgrades
- 27-11-2017
-
- TBD
-
-
-
-
-
-
- Upgrading to 3.0 stable
-
- eXist-db v3.0 is not binary compatible with previous versions of eXist-db; the on-disk database file format has been updated, users should perform a full backup and restore to migrate their data.
- eXist.db v3.0 and subsequent versions now require Java 8; Users must update to Java 8!
- 3.0 removes the the legacy Full Text Index and the text (http://exist-db.org/xquery/text) XQuery module. Users should now look toward fn:analyze-string, e.g.
-
-
-
+
+ Known Issues when upgrading
+ 1Q18
+
+ operations
+
+
+
+
+
+ This article lists the known incompatibilities when upgrading from an older version of eXist-db.
+
+
+
+ Upgrading to 3.0 stable
+
+
+
+ eXist-db v3.0 is not binary compatible with previous versions of eXist-db; the on-disk database file format has been updated, users
+ should perform a full backup and restore to migrate their data.
+
+
+ eXist-db v3.0 and subsequent versions require Java 8; users must update to Java 8!
+
+
+ 3.0 removes the legacy Full Text Index and the text (http://exist-db.org/xquery/text) XQuery module. Users should now look toward
+ fn:analyze-string, e.g.
+
+ instead of using text:groups() use analyze-string()//fn:group,
-
-
- instead of text:filter("apparat", "([pr])") use analyze-string("apparat", "([pr])")//fn:match/string()).
-
-
- Furthermore, the SOAP APi, SOAP server, and XACML Security features were removed.
- The versioning extension is now available as a separate EXPATH package
-
- XQueryService has been moved from DBBroker to BrokerPool.
- EXPath packages that incorporate Java libraries may no longer work with eXist-db v3.0 and may need to be recompiled for our API changes; packages should now explicitly specify the eXist-db versions that they are compatible with.
- eXist-db v3.0 is the culmination of almost 1,500 changes. For more information on new features head to the blog.
-
-
-
-
-
- Upgrading to 2.2 final
-
- The 2.2 release is not binary compatible with the 1.4.x series. You need to
- backup/restore. If you experience problems with user logins after the restore, please restart eXist-db.
- 2.2 introduces a new range index module. Old index definitions will still work though as we made sure to keep backwards
- compatible. If you would like to upgrade to the new index, check its documentation.
- The XQuery engine has been updated to support the changed syntax for maps in XQuery 3.1. The query parser will still accept the
- old syntax for map constructors though (map { x:= "y"} instead of map { x: "y" } in XQuery 3.1), so old
- code should run without modifications. All map module functions from XQuery 3.1 are
- available.
- The signatures for some higher-order utility functions like fn:filter, fn:fold-left and fn:fold-right have changed as well. Please review your
- use of those functions. Also, fn:map is now called fn:for-each, though the old name is still accepted.
- The bundled Lucene has been upgraded from 3.6.1
- to 4.4 with this release. Depending on what Lucene
- analyzers you are using you need to change the
- classnames in your
- collection.xconfs. E.g. KeywordAnalyzer
- and WhitespaceAnalyzer has moved into package
- org.apache.lucene.analysis.core. Thus
- change, any occurrence of
- org.apache.lucene.analysis.WhitespaceAnalyzer
- into
- org.apache.lucene.analysis.core.WhitespaceAnalyzer
- and all other moved classes in the collection
- configurations and make sure you reindex your data
- before use. You get an error notice in the
- exist.log if you overlooked any
- occurrences.
-
-
-
-
-
- Upgrading to 2.1
-
- The 2.1 release is not binary compatible with the 1.4.x series. You need to
- backup/restore. 2.1 is binary compatible with 2.0 though.
-
-
-
-
-
- Upgrading to 2.0
-
- The 2.0 release is not binary compatible with the 1.4.x series. You need to
- backup/restore.
-
-
-
-
- Special Notes
-
-
-
- Permissions
-
- eXist-db 2.0 closely follows the Unix security model (plus ACLs). Permissions have thus
- changed between 1.4.x and 2.0. In particular, there's now an execute permission, which is
- required to
-
-
- execute an XQuery via any of eXist-db's interfaces
-
-
- change into a collection to view or modify its contents
-
-
- eXist-db had an update permission instead of the execute permission. Support for the update permission
- has been dropped because it was not used widely.
- When restoring data from 1.4.x, you thus need to make sure that:
-
-
- collections have the appropriate execute permission
-
-
- XQueries are executable
-
-
- You can use an XQuery to automatically apply a default permission to every collection and XQuery, and
- then change them manually for some collections or resources.
-
-
-
-
- Webapp Directory
-
- Contrary to 1.4.x, eXist-db 2.0 stores most web applications into the database. The webapp
- directory is thus nearly empty. It is still possible to put your web application there and it should
- be accessible via the browser in the same way as before.
-
-
-
-
-
-
-
-
-
- Upgrading to 1.4.0
-
- The 1.4 release is not binary compatible with the 1.2.x series. You need to
- backup/restore.
-
-
-
-
- Special Notes
-
-
-
- Indexing
-
- eXist-db 1.2.x used to create a default full text index on all
- elements in the db. This has been disabled. The
- main reasons for this are:
-
-
- maintaining the default index costs performance and
- memory, which could be better used for other indexes. The
- index may grow very fast, which can be a destabilizing
- factor.
-
-
- the index is unspecific. The query engine cannot use it as
- efficiently as a dedicated index on a set of named elements
- or attributes. Carefully creating your indexes by hand will
- result in much better performance.
-
-
- Please consider using the new Lucene-based full text index.
- However, if you need to switch back to the old behaviour to ensure
- backwards compatibility, just edit the system-wide defaults in
- conf.xml:
-
-
-
-
- Document Validation
-
- Validation of XML documents during storage is now turned
- off by default in conf.xml:
- <validation mode="no">
- The previous auto setting was apparently too
- confusing for new users who did not know what to do if eXist-db refused
- to store a document due to failing validation. If you are familiar
- with validation, the use of
- catalog files and the like, feel free to set the default back to
- auto or yes.
-
-
-
- Cocoon
-
- eXist-db does no longer require Cocoon for viewing documentation and
- samples. Cocoon has been largely replaced by eXist-db's own URL rewriting and MVC
- framework.
- Consequently, we now limit Cocoon to one directory of the web
- application (webapp/cocoon) and moved all the
- Cocoon samples in there. For the 1.5 version we completely
- removed Cocoon support.
-
-
-
-
-
+
+
+ instead of text:filter("apparat", "([pr])") use analyze-string("apparat",
+ "([pr])")//fn:match/string()).
+
+
+
+
+ The SOAP API, SOAP server, and XACML Security features were removed.
+
+
+ The versioning extension is now available as a separate EXPATH package
+
+
+
+ XQueryService has been moved from DBBroker to BrokerPool.
+
+
+ EXPath packages that incorporate Java libraries may no longer work with eXist-db v3.0 and may need to be recompiled for our API changes;
+ packages should now explicitly specify the eXist-db versions that they are compatible with.
+
+
+ eXist-db v3.0 is the culmination of almost 1,500 changes. For more information on new features head to the blog.
+
+
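The fn:analyze-string replacements listed above can be sketched as follows (standard XQuery 3.1; the input string mirrors the text:filter example in this article):

```xquery
xquery version "3.1";

(: Former text:filter("apparat", "([pr])") — extract the matched
   substrings with the standard fn:analyze-string instead :)
analyze-string("apparat", "([pr])")//fn:match/string()
(: returns ("p", "p", "r") :)
```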
+
+
+
+
+
+
+ Upgrading to 2.2 final
+
+
+
+ The 2.2 release is not binary compatible with the 1.4.x series. You need to backup/restore. If you experience problems with user logins
+ after the restore, please restart eXist-db.
+
+
+ 2.2 introduces a new range index module. Old index definitions will still work though, as we made sure to maintain
+ backwards compatibility. If you would like to upgrade to the new index, check its documentation.
+
+
+ The XQuery engine has been updated to support the changed syntax for maps in XQuery 3.1. The query parser will
+ still accept the old syntax for map constructors though (map { x:= "y"} instead of map { x: "y" } in XQuery 3.1),
+ so old code should run without modifications. All map module functions from XQuery 3.1 are available.
+
+
+ The signatures for some higher-order utility functions like fn:filter, fn:fold-left and
+ fn:fold-right have changed as well. Please review your use of those functions. Also, fn:map is now called
+ fn:for-each, though the old name is still accepted.
+
+
+ The bundled Lucene has been upgraded from 3.6.1 to 4.4 with this release. Depending on which Lucene analyzers you are using, you may need
+ to change the class names in your collection.xconf files. For example, KeywordAnalyzer and WhitespaceAnalyzer have moved into the package
+ org.apache.lucene.analysis.core. Therefore, change any occurrence of
+ org.apache.lucene.analysis.WhitespaceAnalyzer into
+ org.apache.lucene.analysis.core.WhitespaceAnalyzer (and likewise for all other moved classes) in your collection
+ configurations, and make sure you reindex your data before use. You will get an error notice in exist.log if you
+ overlooked any occurrences.
+
+
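The map constructor change mentioned above can be illustrated as follows (a sketch; the key and value are arbitrary):

```xquery
xquery version "3.1";

(: Old eXist syntax, still accepted: map { "x" := "y" }
   XQuery 3.1 syntax uses a plain colon: :)
let $m := map { "x": "y" }
return map:get($m, "x")
(: returns "y" :)
```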
+
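After the Lucene upgrade, an analyzer declaration in collection.xconf changes along these lines (a sketch; the indexed qname is illustrative):

```xml
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index>
        <lucene>
            <!-- before: class="org.apache.lucene.analysis.WhitespaceAnalyzer" -->
            <analyzer class="org.apache.lucene.analysis.core.WhitespaceAnalyzer"/>
            <text qname="title"/>
        </lucene>
    </index>
</collection>
```

Remember to reindex the collection after changing the analyzer class.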
+
+
+
+
+
+ Upgrading to 2.1
+
+ The 2.1 release is not binary compatible with the 1.4.x series. You need to backup/restore.
+ 2.1 is, however, binary compatible with 2.0.
+
+
+
+
+
+ Upgrading to 2.0
+
+ The 2.0 release is not binary compatible with the 1.4.x series. You need to backup/restore.
+
+
+
+
+ Special Notes
+
+
+
+ Permissions
+
+ eXist-db 2.0 closely follows the Unix security model (plus ACLs). Permissions have thus changed between 1.4.x and 2.0. In
+ particular, there's now an execute permission, which is required to
+
+
+ execute an XQuery via any of eXist-db's interfaces
+
+
+ change into a collection to view or modify its contents
+
+
+ eXist-db had an update permission instead of the execute permission. Support for the update permission has been dropped because it
+ was not used widely.
+ When restoring data from 1.4.x, you thus need to make sure that:
+
+
+ collections have the appropriate execute permission
+
+
+ XQueries are executable
+
+
+ You can use an XQuery to automatically apply a default permission to every collection and XQuery, and then change them manually for
+ some collections or resources.
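A hedged sketch of such an XQuery, using the sm:chmod and xmldb helper functions (the start collection /db/apps/myapp, the .xq suffix, and the chosen modes are assumptions to adapt; run it as an administrative user):

```xquery
xquery version "3.1";

(: Sketch: recursively give collections rwxr-xr-x and make stored
   XQueries (here assumed to end in ".xq") executable after a restore :)
declare function local:fix($collection as xs:string) {
    sm:chmod(xs:anyURI($collection), "rwxr-xr-x"),
    for $resource in xmldb:get-child-resources($collection)
    where ends-with($resource, ".xq")
    return sm:chmod(xs:anyURI($collection || "/" || $resource), "rwxr-xr-x"),
    for $child in xmldb:get-child-collections($collection)
    return local:fix($collection || "/" || $child)
};

local:fix("/db/apps/myapp")
```

You can then tighten the permissions of individual collections or resources by hand.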
+
+
+
+
+ Webapp Directory
+
+ Unlike 1.4.x, eXist-db 2.0 stores most web applications in the database. The webapp directory is thus nearly empty. It is
+ still possible to put your web application there, and it should be accessible via the browser in the same way as before.
+
+
+
+
+
+
+
+
+
+ Upgrading to 1.4.0
+
+ The 1.4 release is not binary compatible with the 1.2.x series. You need to backup/restore.
+
+
+
+
+ Special Notes
+
+
+
+ Indexing
+
+ eXist-db 1.2.x used to create a default full text index on all elements in the db. This has been disabled. The
+ main reasons for this are:
+
+
+ maintaining the default index costs performance and memory, which could be better used for other indexes. The index may grow
+ very fast, which can be a destabilizing factor.
+
+
+ the index is unspecific. The query engine cannot use it as efficiently as a dedicated index on a set of named elements or
+ attributes. Carefully creating your indexes by hand will result in much better performance.
+
+
+ Please consider using the new Lucene-based full text index. However, if you need to switch back to the old behaviour to ensure
+ backwards compatibility, just edit the system-wide defaults in conf.xml:
+
+
+
+
+ Document Validation
+
+ Validation of XML documents during storage is now turned off by default in conf.xml:
+ <validation mode="no">
+ The previous auto setting was apparently too confusing for new users who did not know what to do if eXist-db
+ refused to store a document due to failing validation. If you are familiar with validation, the use of catalog files and the like, feel free to set the default back to
+ auto or yes.
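As a sketch, switching the default back in conf.xml looks like this (a minimal fragment; your conf.xml may carry additional child elements such as an entity resolver):

```xml
<!-- conf.xml: re-enable automatic validation on storage -->
<validation mode="auto"/>
```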
+
+
+
+ Cocoon
+
+ eXist-db no longer requires Cocoon for viewing documentation and samples. Cocoon has been largely replaced by eXist-db's own
+ URL rewriting and MVC framework.
+ Consequently, we now limit Cocoon to one directory of the web application (webapp/cocoon) and moved all the
+ Cocoon samples in there. For the 1.5 version we completely removed Cocoon support.
+
+
+
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/indexing/indexing.xml b/src/main/xar-resources/data/indexing/indexing.xml
index d112533b..027aaeba 100644
--- a/src/main/xar-resources/data/indexing/indexing.xml
+++ b/src/main/xar-resources/data/indexing/indexing.xml
@@ -1,385 +1,305 @@
-
- Configuring Database Indexes
- November 2009
-
- TBD
-
-
-
-
-
-
- Overview
-
- In this section, we discuss the types of database indexes used by eXist-db, as well
- as how they are created, configured and maintained. It assumes readers have a basic
- understanding of XML and XQuery.
- Database indexes are used extensively by eXist-db to facilitate efficient querying of
- the database. This is accomplished both by system-generated and user-configured
- database indexes. The current (3.0) version of eXist-db by default includes the following
- types of indexes:
-
- Properly configured indexes have a huge impact on database performance! Some
- expressions might run a hundred times faster with an index. This particulary
- applies to the range index: without a range index, eXist has to do a full scan
- over the context nodes to look up an element value, which severly limits
- performance and scalability.
-
-
-
-
-
- New Range Indexes
- : A rewritten range index which provides superior performance on
- large data sets.
-
-
-
-
- Full Text Indexes
- : This full text indexing module features faster and
- customizable full text indexing by transparently integrating Lucene into the
- XQuery engine. Prefer this index over the Legacy Full Text Index.
-
-
-
-
-
- Legacy Range Indexes
- : These map specific text nodes and attributes of the documents
- in a collection to typed values.
-
-
-
-
- NGram Indexes
- : These map specific text nodes and attributes of the documents
- in a collection to splitted tokens of n-characters (where n = 3 by default).
- This is very efficient for exact substring searches and for queries on scripts
- (mostly non-European ones) which can not be easily split into whitespace
- separated tokens and are thus a bad match for the Lucene full text index.
-
-
-
-
- Spatial Indexes
- (Experimental): A working proof-of-concept index, which listens for spatial geometries
- described through the Geography Markup Language (GML). A detailed description of
- the implementation can be found in the Developer's Guide to Modularized Indexes.
-
-
-
-
- xml:id Index
- : An index of all xml:id attribute values is automatically created.
- These values can be queried by fn:id().
-
-
-
-
- Structural Indexes
- : This index keeps track of the elements, attributes, and the nodal structure
- of all XML documents in a collection. It is created and maintained automatically. No configuration required.
-
-
- eXist-db features a modularized indexing architecture. Most
- types of indexes have been moved out of the database core and are now maintained as
- pluggable extensions. The full text, the ngram, the spatial and the new range indexes
- fall under this category.
-
-
-
-
-
- Configuring Indexes
-
- eXist-db has no "create index" command. Instead, indexes are configured in
- collection-specific configuration files. These files are stored as standard XML
- documents in the system collection: /db/system/config, which
- can be accessed like any other document (e.g. using the Admin interface or Java
- Client). In addition to defining settings for indexing collections, the
- configuration document specifies other collection-specific settings such as triggers
- or default permissions.
- The contents of the system collection (/db/system/config)
- should mirror the hierarchical structure of the main collection. Configurations are shared
- by descendants in the hierarchy unless they have their own configuration (i.e. the
- configuration settings for the child collection override those set for the parent).
- If no collection-specific configuration file is created for any document, the global
- settings in the main configuration file, conf.xml, will apply
- by default. That being said, the conf.xml file should only
- define the default global index creation policy.
- To configure a given collection - e.g. /db/foo - create a
- file collection.xconf and store it as /db/system/config/db/foo/collection.xconf. Note the replication of
- the /db/foo hierarchy inside /db/system/config/. Sub-collections which do not have a collection.xconf file of their own will be governed by the
- configuration policy specified for the closest ancestor collection which does have
- such a file, so you are not required to specify a configuration for every
- collection. Note, however, that configuration settings do not cascade, that is to say that a sub-collection
- with a collection.xconf, will not inherit any settings from any ancestor collection
- that has a collection.xconf; If you choose
- to deploy a collection.xconf file in a sub-collection, you must
- specify in that file all the configuration options you wish to
- have applied to that sub-collection (and any lower-level sub-collections without
- collection.xconf files of their own).
-
-
-
-
- Maintaining Indexes and Re-indexing
-
-
-
-
-
-
-
-
- The eXist-db index system automatically maintains and updates
- indexes defined by the user. You therefore do not need to
- update an index when you update a database document or
- collection. eXist-db will even update indexes following partial
- document updates via XUpdate or
- XQuery Update expressions.
- The only exception to eXist-db's automatic update occurs when
- you add a new index definition to an existing
- database collection. In this case, the new
- index settings will only apply to new
- data added to this collection, or any of its
- sub-collections, and not to previously
- existing data. To apply the new settings to the entire
- collection, you need to trigger a "manual reindex" of the
- collection being updated. You can re-index collections using
- the Java Admin Client (shown on the right). From the Admin
- menu, select File»Reindex Collection.
-
- You can also index by passing an XQuery to eXist-db:
- xmldb:reindex('/db/foo')
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- General Configuration Structure and Syntax
-
- Index configuration files are standard XML documents that have their elements
- and attributes defined by the eXist-db namespace:
- http://exist-db.org/collection-config/1.0
- The following example shows a configuration example:
-
- Configuration Document
-
-
- All configuration documents have an index
- element directly below the root element, which encloses the index configuration.
- Only one
- index element is permitted in a document. Apart from the
- index configuration, the document may also contain settings not related to indexing,
- e.g. for triggers; these will not be covered here.
- In the index element are elements that define the various
- index types. Each index type can add its own configuration elements, which are
- directly forwarded to the corresponding index implementation. The example above
- configures three different types of indexes: full text, range and ngram.
-
- Namespaces
-
- If the document to be indexed uses namespaces, you should add a
- xmlns declaration for each of the required namespaces to
- the index element:
-
- Using Namespaces
-
-
- The example configuration above creates two indexes on a collection of
- atom documents. The two elements which should be indexed are both in the
- atom namespace and we thus need to declare a mapping
- for this namespace. Please note that the xmlns namespace
- declarations have to be specified on the index element, not
- the create or fulltext
- elements.
-
-
-
-
-
-
-
- Check Index Usage
-
- The quickest way to see if an index was used or not is to go to the
- Profiling menu item in the Monex Monitoring and Profiling application.
-
-
- Click on Enable Tracing to enable usage statistics.
-
-
- Run the query you would like to profile. The profiler will collect statistics about
- any query running on the database instance, no matter how the query is called.
-
-
- Click Refresh and switch to the Index Usage
- tab.
-
-
-
- Query Profiling
-
-
-
-
-
-
- The table provides the following information:
-
-
- Source
-
- The query containing the expression. The line/column of the expression is
- given in brackets. For queries stored in the database, the file name will be shown.
- Dynamically executed queries are displayed with the name "String".
-
-
-
- Index
-
- Type of the index used: "range" for the old range index, "new-range" for the new
- range index, "lucene" for the full text index.
-
-
-
- Optimization
-
- Either "Full", "Basic", or "No index". The meaning of those labels is as follows:
-
-
- "Full": the expression was rewritten by the optimizer to make full use of the index.
- This is the best you can achieve.
-
-
- "Basic": the index was used, but the expression was not rewritten by the optimizer. This is
- better than "No index" but still several times slower than "Full". Most probably the context of
- the expression was too complex to rewrite it.
-
-
- "No index": no index defined. Expression is evaluated in "brute force" mode.
-
-
-
-
-
- Calls
-
- The number of calls to the expression.
-
-
-
- Elapsed time
-
- The time elapsed for all calls together. The time is measured for the index lookup only.
- The absolute numbers are not reliable (due to measurement errors), but they show a tendency: if a lookup takes
- relatively longer than other expressions, it might be worth to optimize it with an index.
-
-
-
-
-
-
-
-
- Enabling Index Modules
-
- While some indexes (n-gram, full text) are already pre-build in the standard
- eXist-db distribution, other modules may need to be enabled first. For example, the
- spatial index depends on a bunch of external libraries, which do not ship with
- eXist-db. However, enabling the spatial index is a simple process:
-
-
- Copy the properties file extensions/indexes/build.properties and store it as
- local.build.properties in the same directory if it does not already exist.
-
-
- Edit extensions/indexes/local.build.properties:
-
- local.build.properties
-
-
- To include an index, change the corresponding property to "true".
-
-
- Call the Ant build system once to regenerate the eXist-db libraries:
- build.sh
- or
- build.bat
-
-
- The build process should create a jar file for every index implementation in
- directory lib/extensions. For example, the spatial index is
- packaged into the jar exist-spatial-module.jar.
- Once the index module has been built, it can be announced to eXist-db. To activate an
- index plugin, it needs to be added to the modules section within
- the global configuration file conf.xml:
-
- Index Plugin Configuration in conf.xml
-
-
- Every module element needs at least an id and
- class attribute. The class attribute contains the name of the
- plugin class, which has to be an implementation of
- org.exist.indexing.Index.
- All other attributes or nested configuration elements below the
- module element are specific to the implementation and will
- differ between indexes. They should be documented by the index implementor.
- If an index implementation cannot be loaded from the specified class, the entry
- will simply be ignored. A warning will be written to the logs which should provide
- more information on the issue which caused the configuration to fail.
-
-
-
-
-
- Automatic Indexes
-
-
-
-
-
-
- Structural index
-
- This index keeps track of the elements (tags), attributes, and nodal structure
- for all XML documents in a collection. It is created and maintained
- automatically in eXist-db, and can neither be reconfigured nor disabled by the
- user. The structural index is required for nearly all XPath and XQuery
- expressions in eXist-db (with the exception of wildcard-only expressions such as
- "//*"). This index is stored in the database file
- structure.dbx.
- Technically, the structural index maps every element and attribute
- qname (or qualified name) in a
- document collection to a list of documentId, nodeId pairs.
- This mapping is used by the query engine to resolve queries for a given XPath
- expression.
- For example, given the following query:
- //book/section
- eXist-db uses two index lookups: the first for the book node,
- and the second for the section node. It then computes the
- structural join between these node sets to determine
- which section elements are in fact children of
- book elements.
-
-
+ schematypens="http://purl.oclc.org/dsdl/schematron"?>
+
+ Configuring Database Indexes
+ 1Q18
+
+ application-development
+
+
+
+
+
+ In this article we discuss the types of database indexes used by eXist-db and how they are created, configured and maintained. It assumes
+ readers have a basic understanding of XML and XQuery.
+
+
+
+
+ Overview
+
+
+ Database indexes are used extensively by eXist-db to facilitate efficient querying of the database. This is accomplished both by
+ system-generated and user-configured database indexes. The current (3.x) version of eXist-db by default includes the following types of
+ indexes:
+
+ Properly configured indexes have a huge impact on database performance! Some expressions might run a hundred times faster with an index.
+ This particularly applies to the range index: without a range index, eXist has to do a full scan over the context nodes to look up an element
+ value, which severely limits performance and scalability.
+
+
+
+
+
+ Structural Indexes
+ : This index keeps track of the elements, attributes, and the nodal structure of all XML documents in a collection. It is created
+ and maintained automatically. No configuration required.
+
+
+
+ xml:id Index : An index of all xml:id attribute values is automatically created. These values can be
+ queried by fn:id().
+
+
+
+
+ New Range Indexes
+ : A (rewritten) range index which provides superior performance on large data sets.
+
+
+
+
+ Full Text Indexes
+ : This full text indexing module features faster and customizable full text indexing by transparently integrating Lucene into the
+ XQuery engine. Prefer this index over the Legacy Full Text Index.
+
+
+
+
+ NGram Indexes
+ : These map specific text nodes and attributes of the documents in a collection to split tokens of n characters (where n = 3
+ by default). This is very efficient for exact substring searches and for queries on scripts (mostly non-European ones) which cannot be
+ easily split into whitespace-separated tokens and are therefore a bad match for the Lucene full text index.
+
+
+
+
+ Legacy Range Indexes
+ : These map specific text nodes and attributes of documents in a collection to typed values.
+
+
+
+
+ Spatial Indexes
+ (Experimental): A working proof-of-concept index, which listens for spatial geometries described through the Geography Markup
+ Language (GML). A detailed description of the implementation can be found in the Developer's Guide to Modularized Indexes.
+
+
+ eXist-db features a modularized indexing architecture. Most types of indexes have been moved out of the database core and are maintained as
+ pluggable extensions. The full text, the ngram, the spatial and the new range indexes fall under this category.
+
+
+
+
+
+ Configuring Indexes
+
+ eXist-db has no "create index" command. Instead, indexes are configured in collection-specific configuration files. These files are stored
+ as standard XML documents in the system collection /db/system/config, which can be accessed like any other document (e.g.
+ using the Admin interface or Java Client). In addition to defining settings for indexing
+ collections, the configuration document specifies other collection-specific settings such as triggers or default permissions.
+ The contents of the system collection (/db/system/config) should mirror the hierarchical structure of the main
+ collection. Configurations are shared by descendants in the hierarchy unless they have their own configuration: the configuration settings for
+ the child collection override those set for the parent. If no collection-specific configuration file is created for any document, the global
+ settings in the main configuration file conf.xml will apply by default. The conf.xml file should only
+ define the default global index creation policy.
+ To configure a given collection, for instance /db/foo, create a file collection.xconf and store it as
+ /db/system/config/db/foo/collection.xconf. Note the replication of the /db/foo hierarchy inside
+ /db/system/config/. Sub-collections which do not have a collection.xconf file of their own will be
+ governed by the configuration policy specified for the closest ancestor collection which does have such a file, so you are not required to
+ specify a configuration for every collection.
+
+ Configuration settings do not cascade: a sub-collection with a collection.xconf will
+ not inherit any settings from any ancestor collection that has a collection.xconf. If you choose to
+ deploy a collection.xconf file in a sub-collection, you must specify in that file all the
+ configuration options you wish to have applied to that sub-collection (and any sub-collections without collection.xconf
+ files of their own).
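A minimal collection.xconf skeleton for /db/foo, to be stored as /db/system/config/db/foo/collection.xconf (a sketch; the actual index definitions are elided):

```xml
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index>
        <!-- index definitions for /db/foo and its sub-collections go here -->
    </index>
</collection>
```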
+
+
+
+
+
+ Maintaining Indexes and Re-indexing
+
+ The eXist-db index system automatically maintains and updates indexes defined by the user. You do not need to update an index when you
+ update a database document or collection. eXist-db will even update indexes following partial document updates via
+ XUpdate or XQuery Update expressions.
+ The only exception to eXist-db's automatic update occurs when you add a new index definition to an existing database
+ collection. In this case, the new index settings will only apply to new data added to this collection (or
+ any of its sub-collections) and not to previously existing data. To apply the new settings to the entire collection, you
+ need to trigger a "manual reindex" of the collection being updated.
+ You can re-index collections using the Java Admin Client. From the Admin menu, select
+ File, Reindex Collection.
+
+
+
+
+
+
+
+
+ You can also reindex by passing the following XQuery to eXist-db:
+ xmldb:reindex('/db/foo')
+
+
+
+
+
+
+ General Configuration Structure and Syntax
+
+ Index configuration collection.xconf files are standard XML documents that have their elements and attributes defined
+ by the eXist-db namespace http://exist-db.org/collection-config/1.0. The following shows an example configuration:
+
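+ A minimal collection.xconf along these lines might look as follows (a sketch: the qnames title and para are
+ illustrative, not taken from an actual schema):
+
+ ```xml
+ <collection xmlns="http://exist-db.org/collection-config/1.0">
+     <index>
+         <!-- full text (Lucene) index on para elements -->
+         <lucene>
+             <text qname="para"/>
+         </lucene>
+         <!-- range index on title elements -->
+         <range>
+             <create qname="title" type="xs:string"/>
+         </range>
+         <!-- ngram index on para elements -->
+         <ngram qname="para"/>
+     </index>
+ </collection>
+ ```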
+
+ All configuration documents have an index element directly below the root element. This encloses the index configuration. Only
+ one
+ index element is permitted in a document. Apart from the index configuration, the document may also contain settings not related to
+ indexing (e.g. for triggers). These will not be covered here.
+ In the index element are elements that define the various index types. Each index type adds its own configuration elements,
+ which are directly forwarded to the corresponding index implementation. The example above configures three different types of indexes: full
+ text, range and ngram.
+
+ If the document to be indexed uses namespaces, you should add an xmlns declaration for each of the required namespaces
+ to the index element:
+
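+ A sketch of such a configuration, assuming the two indexed elements are atom:title and atom:id (illustrative
+ choices):
+
+ ```xml
+ <collection xmlns="http://exist-db.org/collection-config/1.0">
+     <!-- the xmlns:atom declaration must go on the index element -->
+     <index xmlns:atom="http://www.w3.org/2005/Atom">
+         <create qname="atom:title" type="xs:string"/>
+         <create qname="atom:id" type="xs:string"/>
+     </index>
+ </collection>
+ ```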
+
+ The example configuration above creates two indexes on a collection of atom documents. The two elements which should be indexed are both
+ in the atom namespace and we therefore need to declare a mapping for this namespace. Please note that the
+ xmlns namespace declarations have to be specified on the index element, not the create or
+ fulltext elements.
+
+
+
+
+
+
+ Check Index Usage
+
+ The quickest way to see if an index was used or not is to go to the Profiling menu item in the Monex Monitoring and Profiling application.
+
+
+ Click on Enable Tracing to enable usage statistics.
+
+
+ Run the query you would like to profile. The profiler will collect statistics about any query running on the database instance, no
+ matter how the query is called.
+
+
+ Click Refresh and switch to the Index Usage tab.
+
+
+
+
+
+
+
+
+
+ The table provides the following information:
+
+
+ Source
+
+ The query containing the expression. The line/column of the expression is given in brackets. For queries stored in the database, the
+ file name will be shown. Dynamically executed queries are displayed with the name "String".
+
+
+
+ Index
+
+ Type of the index used: "range" for the old range index, "new-range" for the new range index, "lucene" for the full text index.
+
+
+
+ Optimization
+
+
+
+ Full
+
+ The expression was rewritten by the optimizer to make full use of the index. This is the best you can achieve.
+
+
+
+ Basic
+
+ The index was used, but the expression was not rewritten by the optimizer. This is better than "No index" but still several
+ times slower than "Full". Most probably the context of the expression was too complex to rewrite it.
+
+
+
+ No index
+
+ No index defined. Expression is evaluated in "brute force" mode.
+
+
+
+
+
+
+ Calls
+
+ The number of calls to the expression.
+
+
+
+ Elapsed time
+
+ The time elapsed for all calls together. The time is measured for the index lookup only. The absolute numbers are not reliable (due to
+ measurement errors), but they show a tendency: if a lookup takes relatively longer than other expressions, it may be worth optimizing
+ it with an index.
+
+
+
+
+
+
+
+
+ Enabling Index Modules
+
+ While some indexes (n-gram, full text) come pre-built in the standard eXist-db distribution, other modules may need to be enabled
+ first. For example, the spatial index depends on external libraries which do not ship with eXist-db. Even so, enabling the
+ spatial index is a simple process:
+
+
+ Copy the properties file extensions/indexes/build.properties and store it as
+ local.build.properties in the same directory (if it does not already exist).
+
+
+ Edit extensions/indexes/local.build.properties:
+
+
+ To include a spatial index, change the corresponding property to "true".
+
+
+ Call the Ant build system once to regenerate the eXist-db libraries using build.sh or build.bat.
+
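+ Assuming the spatial index switch follows the include.feature.* naming used elsewhere in build.properties, the
+ edit in local.build.properties might look like:
+
+ ```properties
+ # enable building of the spatial index module
+ include.feature.spatial = true
+ ```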
+
+ The build process should create a .jar file for every index implementation in the directory lib/extensions. For
+ example, the spatial index is packaged as
+ exist-spatial-module.jar.
+ Once the index module has been built, it must be registered with eXist-db. To activate an index plug-in, add it to the
+ modules section within the global configuration file conf.xml:
+
+
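+ A module entry might look like the following sketch (the class name org.exist.indexing.spatial.GMLHSQLIndex is
+ illustrative of an org.exist.indexing.Index implementation; consult the index's own documentation for the exact
+ class and attributes):
+
+ ```xml
+ <modules>
+     <!-- id and class are required; further attributes are implementation-specific -->
+     <module id="spatial-index" class="org.exist.indexing.spatial.GMLHSQLIndex"/>
+ </modules>
+ ```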
+
+ Every module element needs at least an id and class attribute. The class attribute contains
+ the name of the plug-in class, which has to be an implementation of org.exist.indexing.Index.
+ All other attributes or nested configuration elements below the module element are specific to the implementation and will differ
+ between indexes. They should be documented by the index implementer.
+ If an index implementation cannot be loaded from the specified class, the entry will simply be ignored. A warning will be written to the
+ logs.
+
+
+
+
+
+ The Structural index
+
+ This index keeps track of the elements (tags), attributes, and nodal structure for all XML documents in a collection. It is created and
+ maintained automatically in eXist-db, and can neither be reconfigured nor disabled. The structural index is required for nearly all XPath and
+ XQuery expressions in eXist-db (with the exception of wildcard-only expressions such as //*). This index is stored in the
+ database file structure.dbx.
+ Technically, the structural index maps every element and attribute qname (or qualified name) in a
+ document collection to a list of documentId, nodeId pairs. This mapping is used by the query engine to resolve queries for a given
+ XPath expression.
+ For example:
+ //book/section
+ eXist-db uses two index lookups: the first for the book node, and the second for the section node. It then computes
+ the structural join between these node sets to determine which section elements are in fact children of
+ book elements.
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/java-admin-client/java-admin-client.xml b/src/main/xar-resources/data/java-admin-client/java-admin-client.xml
index a280513a..e6a3f815 100644
--- a/src/main/xar-resources/data/java-admin-client/java-admin-client.xml
+++ b/src/main/xar-resources/data/java-admin-client/java-admin-client.xml
@@ -1,415 +1,383 @@
-
- Java Admin Client
- October 2012
-
- TBD
-
-
-
-
+
+ Java Admin Client
+ 1Q18
+
+ operations
+
+
+
+
+
+ eXist-db ships with the so-called "Java Admin Client". This application enables users to perform administrative tasks, such as user
+ management, security settings, batch import of whole directories, and backup/restore of the database. It can be used either as a GUI or on the
+ command line.
+
+
+
+
+ Launching the Client
+
+ You can launch the Java Admin Client using one of the following methods:
+
+
+ Windows and Linux users: Click the eXist-db icon in the taskbar and choose the Open Java Admin Client menu
+ entry.
+
+
+ You can download its Java WebStart file exist.jnlp via your web browser. Once the download has completed, double-click it
+ to launch the client:
+
+
+
+
+
+
+
+
+
+ You can also find a link to download the Java WebStart file on the dashboard.
+
+ If you built eXist-db from source rather than using the downloadable installer, the Java WebStart function will not work unless you
+ sign the jars. To do so, enter the following on your command line from the eXist directory:
+ build.bat -f build/scripts/jarsigner.xml (DOS/Windows)
+ build.sh -f build/scripts/jarsigner.xml (Unix)
+
+
+
+ Enter the following on your command line from the eXist installation directory (with the JAVA_HOME environment variable set
+ correctly):
+ bin\client.bat (DOS/Windows)
+ bin/client.sh (Unix)
+
+
+ Or enter the following on the command line (again from the eXist installation directory):
+ java -jar start.jar client
+
+
+
+
+
+
+
+ Using the Graphical Client
+
+ Once the Graphical Client is launched, you will see the "eXist Database Login" window.
+
+
+
+
+
+
+
+
+ The Java Admin Client can connect to a database in two ways:
+
+
+ Most common is to connect to a "remote" server. The client talks with the server using the XML-RPC protocol.
+
+
+ It can also launch an "embedded database", that is, a database embedded in an application which runs in the same process as the client.
+ This embedded option is useful for backup/restore or mass uploads of data; writing to an embedded instance avoids the network
+ overhead.
+
+
+
+
+
+ To connect to a remote server, enter your eXist-db username and password, select Remote from the
+ Type dropdown menu, and in the URL field enter the database's URI. By default this is set to
+ xmldb:exist://localhost:8080/exist/xmlrpc (the URI for a database installed with all the default settings).
+ After clicking OK, the main client window will open.
+
+
+
+
+
+
+
+ This window is split into two panels. The top panel lists the database collections. The bottom panel acts like a shell and has a command
+ prompt. This shell allows you to manually enter database commands.
+ Using the Java Admin Client as a GUI is like using any other GUI application, so it is not explained further here.
+
+
+ If eXist-db is online, you expose it to exploitation if you use an empty admin password. If you did not specify an admin password during
+ installation, you are strongly advised to set an admin password as soon as possible. You can do this in the Java Admin Client:
+
+
+ Open the Edit Users window by selecting the Manage Users icon (image of a pair
+ of keys) in the toolbar
+
+
+ At the top, select the admin user in the table of users
+
+
+ Type in the new password into the password fields
+
+
+ Click the Modify User button to apply the changes
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Using the Command-line Client
+
+ It is sometimes faster or more convenient to use the Java Admin Client on the command line. The following sections provide a quick
+ introduction to the most common command line parameters and their use. The client offers three modes of operation:
+
+
+ If an action is specified on the command-line (when starting the client), it will be processed in non-interactive mode and the client
+ will terminate after completion.
+
+
+ If option -s or --no-gui is specified without an action, the client switches to interactive
+ shell-mode and prompts for user input. No graphical interface is displayed.
+
+
+ Otherwise the client switches to interactive mode and displays the graphical user interface.
+
+
+
+
+
+
+
+ Interactive Shell Mode
+
+ While this tutorial will not describe the interactive shell mode in detail, most commands work like their counterparts specified on the
+ command line. On the shell, just type help to get a list of supported commands.
+ The shell mode may support full command line history and command completion, depending on the OS:
+
+
+ On Unix systems, the client will try to load the GNU readline library, which is part of most Unix installations. This
+ gives you access to all the nice things you probably know from Linux shells. For example, pressing the tab-key will try to complete
+ collection and document names. However, for this to work, the native library lib/core/libJavaReadline.so has to be
+ found by the system's loader. On Linux, just add lib/core to your LD_LIBRARY_PATH (the client.sh
+ script does this automatically).
+
+
+ On Windows OS, you should at least be able to use the cursor-up/cursor-down keys to browse through the command history.
+
+
-
- Introduction
- eXist-db ships with a Java-based Admin Client. This application enables users to
- perform administrative tasks, such as user management, security settings, batch
- import of whole directories, and backup/restore of the database. The Client can be
- used either as a graphical interface or on the command line.
-
-
-
- Launching the Client
+ To explain the shell mode, we provide a short example showing how to store the sample files into the database.
+
+
+ Typing mkcol shakespeare and pressing enter will create a Shakespeare collection into which we will put some of the
+ sample documents provided with eXist-db.
+
+
+ To check if the new collection is present, enter ls to get a listing of the current collection contents. The
+ listing below shows an example session of how to add the sample documents:
+
+
+
+ Adding files to the database is done using put. This expects either a single file, a file-pattern or a directory name as
+ argument. If a directory is specified, all XML and XSL files in that directory will be put into the database.
+ To add the files in directory samples/shakespeare simply enter put samples/shakespeare.
+
+ put also accepts file patterns, i.e. a path with the wildcards ? or *; ** matches any
+ sub-directory. So the command put samples/**/*.xml will parse any XML files found in the samples
+ directory and any of its sub-directories.
+
+
+ To see if the files have actually been stored use ls again.
+
+
+
+ To view a document, use the get command, for instance get hamlet.xml
+
+
+
- You can launch the Java Admin Client using one of the following methods:
+
+ If you ever run into problems while experimenting with eXist-db and your database files get corrupted, just remove the data files created
+ by eXist-db and everything should work again. The data files all end with .dbx. You will find them in the directory
+ webapp/WEB-INF/data or WEB-INF/data, depending on your installation. It is fine to back up those
+ data files so you can restore them in case of a database corruption.
+
+
+
+
+
+
+ Specifying Parameters
+
+ The client uses the CLI library from Apache's Excalibur project to parse command-line parameters. This means that the same conventions
+ apply as for most GNU tools. Most parameters have a short and a long form.
+ For example, the user can be specified in short with -u user or in long with --user=user. You can
+ combine argument-less parameters: for example, -ls is short for -l -s.
+
+
+
+
+
+ General Configuration
+
+ The client reads its default options from the properties file client.properties. Most of the properties can be
+ overwritten by command-line parameters or by the set command in shell-mode.
+ The client relies on the XML:DB API to communicate with the database. It will therefore work with remote as well as embedded database
+ instances. The correct database instance is determined through the XML:DB base URI as specified in the properties file or through command-line
+ options. The deployment article describes how different
+ servers are addressed using the XML:DB URI.
+ The XML:DB base URI used by the client for connections is defined by the uri= property. By default, this is set to
+ uri=xmldb:exist://localhost:8080/exist/xmlrpc. With this, the client will try to connect to a database instance running
+ inside the webserver at port 8080 of the local host.
+ There are several ways to specify an alternate XML:DB base URI:
-
- Windows and Linux users: Double-click on the Java Admin
- Client desktop shortcut icon (if the option to create
- desktop shortcuts was selected during installation) or select the shortcut
- icon from the Start Menu (if the option to select Start Menu entries was
- selected during installation)
-
-
- You can download a Java WebStart file (exist.jnlp) via your web browser;
- once the download has completed, double-click on the exist.jnlp file to
- launch the client:
-
-
-
-
-
-
-
-
+
+ Change the uri= property in client.properties
- You can also find a Java WebStart Launch icon in the Administration menu
- in the left sidebar of all documentation pages.
-
- If you built eXist-db from source rather than using the downloadable
- installer, the Java WebStart function will not work unless you sign the
- jars. To do so, enter the following on your command line from the eXist directory:
-
- build.bat -f build/scripts/jarsigner.xml (DOS/Windows)
- build.sh -f build/scripts/jarsigner.xml (Unix)
-
-
-
- Enter the following on your command line from the eXist directory (with the JAVA_HOME
- environmental variable set correctly):
- bin\client.bat (DOS/Windows)
- bin/client.sh (Unix)
-
-
- Enter the following on the command line:
- java -jar start.jar client
-
+
+
+ Use the -ouri parameter on the command-line
+ For instance, to access a server running inside the Jetty webserver at port 8080 on a remote host, use
+ bin/client.sh -ouri=xmldb:exist://host:8080/exist/xmlrpc
+
+
+ To start the client in local mode, use:
+ bin/client.sh -ouri=xmldb:exist://
+ Local mode means that an embedded database instance will be initialized and started by the client. It will have direct access to the
+ database instance. Use this option if you want to batch-load a large document or a huge collection of documents.
+ Since switching to local mode is required quite often, there's also a shortcut:
+ bin/client.sh -l
+ This is equivalent to the -ouri=xmldb:exist:// option shown above.
+ When launching the client with option -l or -ouri=xmldb:exist:// the configuration for the
+ database instance is read from conf.xml located in EXIST_HOME.
+ Use the -C parameter to specify an alternate database location. For instance:
+ bin/client.sh -C /home/exist/test/conf.xml
+ This will temporarily launch a new database instance whose configuration is read from the provided file. Option -C
+ implies option -l.
+
-
-
+ If you have set a password for the admin user, you must authenticate yourself to the database:
+ bin/client.sh -l -u username -P password
+ If the -P password option is missing, the client will prompt for the password.
+ The graphical user interface will always prompt for username and password unless you specify both on the command-line.
+
-
- Using the Graphical Client
+
- Once the Graphical Client is launched, you will see the "eXist Database Login" window.
-
-
-
-
-
-
-
- Enter your eXist-db username and password, select "Remote" from the "Type" dropdown menu, and in the URL field enter the URI for your database. By
- default, the URI for your database is
- xmldb:exist://localhost:8080/exist/xmlrpc.
-
- The Java Admin Client can either connect to a "remote" server—as demonstrated
- here—or it can launch an "embedded database", that is, a database embedded in an
- application which runs in the same process as the client. This "embedded" option
- is useful for backup/restore or mass uploads of data; writing to an embedded
- instance avoids the network overhead.
-
- After clicking "OK", the main client window will open.
-
-
-
-
-
-
-
- This window is split into two
- panels, and has a top menu and a toolbar. The top panel lists the database
- collections; the bottom panel is the "shell" and has a command prompt. This shell
- allows you to manually enter database commands. Most commands, however, can be
- accessed using the menu. All of the menu and toolbar items have tooltips that explain
- their functions.
-
- If eXist-db is online, you expose it to exploitation if you use an empty admin
- password. If you did not specify an admin password during installation, you are
- strongly advised to set an admin password as soon as possible. You can do this in
- the Java Admin Client by following these steps:
-
-
- Open the "Edit Users" window by selecting the "Manage Users" icon (image
- of a pair of keys) in the toolbar
-
-
- At the top, select the "admin" user in the table of users
-
-
- Type in the new password into the password fields
-
-
- Click the "Modify User" button to apply the changes
-
-
-
-
-
-
-
-
-
-
-
+
+ Storing documents
-
+ To store a set of documents, use the -m and -p parameters. For instance:
+ bin/client.sh -m /db/shakespeare/plays -p /home/exist/xml/shakespeare
+
+
+ The -m tells the client to implicitly create any missing collection.
+
+
+ The -p parameter means that all of the following arguments should be interpreted as a list of paths to XML
+ documents (you may specify more than one document or directory).
+
+
+ If the passed path denotes a directory, the client will try to store all documents in that directory into the database. However, it
+ will not recurse into subdirectories. For this, you have to pass the -d option. For
+ example:
+ bin/client.sh -d -m /db/movies -p /home/exist/xml/movies
+ This will recurse into all directories below /home/exist/xml/movies. For each subdirectory, a collection will be
+ created below the /db/movies collection. Use this to recursively import an entire collection tree.
+
+
-
- Using the Command-line Client
+ eXist-db can also store binary resources in addition to XML files. The client tries to determine if the current file is XML or not. The
+ mime-types.xml lookup table (in the eXist root installation directory) is used for this. It associates:
+
+
+ a MIME type
+
+
+ an eXist-db type ("xml" or "binary")
+
+
+ a file extension
+
+
+ This mechanism is also used by the eXist-db servers. For example, to specify that the .xmap extension is used for XML files:
+
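+ Assuming the entry structure of mime-types.xml (a MIME type name, an eXist-db resource type, and the associated
+ extensions), a sketch of such an entry:
+
+ ```xml
+ <mime-type name="application/xml" type="xml">
+     <description>XMap file, stored as XML</description>
+     <extensions>.xmap</extensions>
+ </mime-type>
+ ```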
- It is sometimes faster or more convenient to use the Java Admin Client on the
- command line. The following sections provide a quick introduction to the most common
- command line parameters and their use. The client offers three modes of operation:
-
-
- If an action is specified on the command-line, it will be processed in
- non-interactive mode and the client will terminate after completion.
-
-
- Without an action, the client switches to interactive mode and displays the
- graphical user interface.
-
-
- If option -s or --no-gui is specified without an
- action, the client switches to shell-mode and prompts for user input. No graphical
- interface is displayed.
-
-
+
-
-
-
- Interactive Shell Mode
-
- While this tutorial will not describe the interactive shell mode in detail, most commands work like
- their counterparts specified on the command line. On the shell, just type help to get a list of supported commands.
- The shell mode may support full command line history and command completion, depending
- on your type of operating system. On Unix systems, the client will try to load the GNU
- readline library, which is part of most Unix installations. This gives you access to all the
- nice things you probably know from Linux shells. For example, pressing the tab-key will try
- to complete collection and document names. However, for this to work, the native library
- lib/core/libJavaReadline.so has to be found by the
- system's loader. On Linux, just add lib/core to your
- LD_LIBRARY_PATH (the client.sh script does this automatically).
- On Windows OS, you should at least be able to use the cursor-up/cursor-down keys to
- browse through the command history.
- To explain the shell-mode, we just provide a short example, showing how to store the
- sample files into the database. Typing mkcol shakespeare
- and pressing enter will create a shakespeare-collection into which we will put some of the
- sample documents provided with eXist-db. To check if the new collection is present, enter
- ls to get a listing of the current collection contents.
- The listing below shows an example session of how to add the sample documents:
-
- Adding the sample documents
-
-
- Adding files to the database is done using put. Put expects either a single file, a
- file-pattern or a directory name as argument. If a directory is specified, all XML and XSL
- files in that directory will be put into the database. To add the files in directory
- samples/shakespeare simply enter put samples/shakespeare. To see if the files have actually
- been stored, you may view the contents of the current collection with ls. To view a
- document, use the get command, e.g.:
- get hamlet.xml
-
-
- put also accepts file-patterns, i.e. a path with
- wildcards ? or *. ** means: any sub-directory. So the command put
- samples/**/*.xml will parse any XML files found in the samples directory and any of its sub-directories.
-
-
- If you ever run into problems while experimenting with eXist-db and your database files
- get corrupt: just remove the data files created by eXist-db and everything should work again.
- The data files all end with .dbx. You will either
- find them in directory webapp/WEB-INF/data or
- WEB-INF/data, depending on your installation. It is
- also ok to backup those data-files to be able to restore them in case of a database
- corruption.
-
-
-
-
-
-
- Specifying Parameters
-
- The client uses the CLI library from Apache's Excalibur project to parse command-line
- parameters. This means that the same conventions apply as for most GNU tools. Most
- parameters have a short and a long form: for example, the user can be specified in short
- form with -u user or in long form --user=user. You can
- also combine argument-less parameters: for example, -ls is short for
- -l -s.
-
-
-
-
-
- General Configuration
-
- The client reads its default options from the properties file client.properties. Most of the properties can be overwritten by command-line
- parameters or by the set command in shell-mode.
- The client relies on the XML:DB API to communicate with the database. It will thus work
- with remote as well as embedded database instances. The correct database instance is
- determined through the XML:DB base URI as specified in the properties file or through
- command-line options. The deployment document describes
- how different servers are addressed by the XML:DB URI.
- The XML:DB base URI used by the client for connections is defined by the
- uri= property. By default, this is set to
- uri=xmldb:exist://localhost:8080/exist/xmlrpc. The client will thus try
- to connect to a database instance running inside the webserver at port 8080 of the local
- host. This doesn't mean that the client is not communicating through the network. In fact,
- any XML:DB URI containing a host part is accessed through the XMLRPC protocol.
- There are several ways to specify an alternate XML:DB base URI: first, you may change
- the uri= property in client.properties. Second, you may use the -ouri parameter on the
- command-line to temporarily select another target for the connection. For example, to start
- the client in local mode, use:
- bin/client.sh -ouri=xmldb:exist://
- To access a server running inside the Jetty webserver at port 8080 on a remote host, use
- bin/client.sh -ouri=xmldb:exist://host:8080/exist/xmlrpc
-
- Local mode means here, that an embedded database instance will be
- initialized and started by the client. It will thus have direct access to the database
- instance. Use this option if you want to batch-load a large document or a huge collection of
- documents.
- Using the -ouri, you can temporarily change any property specified in the
- properties file. Since switching to local mode is required quite often, there's also a
- shortcut: specifying
- bin/client.sh -l
- is equivalent to the
- -ouri=xmldb:exist:// option shown
- above.
- If you have set a password for the admin user (as described in the security doc), you may have to authenticate yourself to the
- database. In this case, specify the -u username on the command line, e.g.
- bin/client.sh -l -u peter -P somepass
- If the -P password option is missing, the client will prompt for the
- password.
-
- The graphical user interface will always prompt for username and password unless you
- specify both on the command-line.
-
-
-
-
-
-
- Storing documents
-
- To store a set of documents, use the -m and -p
- parameters, e.g.
- bin/client.sh -m /db/shakespeare/plays -p /home/exist/xml/shakespeare
- The -m argument differs from the -c option, because it
- tells the client to implicitely create any missing collection. The -p
- parameter means that all of the following arguments should be interpreted as a list of paths
- to XML documents, i.e. you may specify more than one document or directory.
- If the passed path denotes a directory, the client will try to store all documents in
- that directory to the database. However, it will not recurse into subdirectories. For this,
- you have to pass the -d. For example,
- bin/client.sh -d -m /db/movies -p /home/exist/xml/movies
- will recurse into all directories below /home/exist/xml/movies. For each subdirectory, a collection will be created
- below the /db/movies root collection, i.e. you may use
- this option to recursively import an entire collection tree.
-
- eXist-db can also store binary resources in addition to XML files. The client thus tries
- to determine if the current file is XML or not. File mime-types.xml
- allows to associate :
-
-
- a MIME type
-
-
- an eXist-db type ("xml" or "binary")
-
-
- a file extension
-
-
-
- This is also used by the eXist-db servers . For example to specify that
- .xmap extension is used for XML files you can specify it like this in mime-types.xml:
-
-
-
-
-
-
-
- Removing Collections/Documents
-
- The -r and -R parameters are used to remove a document
- or collection. -r removes a single document from the collection specified
- in the -c parameter. For example,
- bin/client.sh -c /db/shakespeare/plays -r hamlet.xml
- removes the document hamlet.xml from the /db/shakespeare/plays collection. To remove the entire
- plays collection, use
- bin/client.sh -c /db/shakespeare -R plays
-
-
-
-
-
- Executing Queries
-
- To execute queries, use the -x parameter. This parameter accepts an
- optional argument, which specifies the query to execute. However, passing XQuery on the
- command-line is a problem on many operating systems, because the command shell may
- interprete whitespace characters as command separators. Thus, if no argument is passed to
- -x, the client will try to read the query from standard input. For
- example, on Unix you may do
- echo "//SPEECH[contains(LINE, 'love')]" | bin/client.sh -x
- NB! remember to type Ctrl-d when working without pipe.
-
- Queries can also be read from a file. For example,
- bin/client.sh -F samples/xquery/fibo.xq
- executes the XQuery contained in fibo.xq.
- There's an additional parameter to be used in conjunction with -x:
- -n specifies the number of hits that should be printed to the standard
- output.
-
-
-
-
-
- XUpdate
-
- You may also update a stored document or multiple documents by passing an XUpdate file
- on the command-line. For
- example:
-
- bin/client.sh -c /db/test -f address.xml -X samples/xupdate/xupdate.xml
- This
- will apply the modifications described in
-
- samples/xupdate/xupdate.xml to the document
- address.xml in collection /db/test. If you skip
- the -f option, the modifications will be applied to all documents in the
- collection.
-
-
-
-
-
- Using an Alternate Database Configuration
-
- If you start the client with option -l or
- -ouri=xmldb:exist://, it will launch its own embedded database instance.
- By default, the configuration for the database instance is read from file
- conf.xml located in EXIST_HOME.
- However, you may want to use another database installation, stored in an alternate
- location. To make this as simple as possible, option -C is provided. The
- parameter expects a configuration file as argument, for example:
- bin/client.sh -C /home/exist/test/conf.xml
- This will temporarily launch a new database instance, whose configuration is read from
- the provided file. As is obvious, option -C implies option
- -l.
-
-
-
-
-
- Backup/Restore on the Command-Line
-
- A simple backup/restore client can be launched through the bin/backup.sh or bin\backup.bat scripts. The client allows to backup any local or remote
- collection available through the XML:DB API. To backup a collection, use for example
- bin/backup.sh -d backup -u admin -p somepass -b /db -ouri=xmldb:exist://
- This will launch a database instance in local mode and backup the /db root collection. A hierarchy of directories will be created
- below the backup directory. The directories correspond
- to the hierarchy of collections found in the database.
- The tool will also backup user permissions for each collection and document. This
- information is written into the special file __contents__.xml placed in each subdirectory. You need these files to restore
- the database contents.
- To restore the backuped data, use the -r option and pass one of the
- __contents__.xml files as an argument. The tool will
- restore all files and collections described in the XML file, for example,
- bin/backup.sh -r backup/db/__contents__.xml -ouri=xmldb:exist://
- will restore the entire database, while
- bin/backup.sh -r backup/db/shakespeare/__contents__.xml -ouri=xmldb:exist://
- restores only the /db/shakespeare collection.
- Please note that users and groups are defined in the database collection /db/system. This collection will thus be restored first.
-
-
+
+
+
+ Removing Collections/Documents
+
+ The -r and -R parameters are used to remove a document or collection.
+
+
+
+ -r removes a single document from the collection specified in the -c parameter. For
+ example:
+ bin/client.sh -c /db/shakespeare/plays -r hamlet.xml
+ This removes the document hamlet.xml from the /db/shakespeare/plays collection.
+
+
+
+ To remove the entire plays collection use the -R parameter:
+ bin/client.sh -c /db/shakespeare -R plays
+
+
+
+
+
+
+
+
+ Executing Queries
+ Executing queries can be done as follows:
+
+
+ Use the -x parameter. This parameter accepts an optional argument, which specifies the query to execute. However,
+ passing XQuery on the command-line is a problem on many operating systems, because the command shell usually interprets whitespace
+ characters as command separators. Therefore if no argument is passed to -x, the client will try to read the query from
+ standard input.
+ For example, on Unix you may do the following:
+ echo "//SPEECH[contains(LINE, 'love')]" | bin/client.sh -x
+ Remember to type Ctrl-d to end the query when working without a pipe.
+
+
+
+ Queries can also be read from a file using the -F parameter. For example,
+ bin/client.sh -F samples/xquery/fibo.xq
+ This executes the XQuery contained in fibo.xq.
+
+
+ In addition, use -n to specify the number of hits that should be printed to the standard output.
+
+
+
+
+
+
+
+
+ XUpdate
+
+ You can update a stored document or multiple documents by passing an XUpdate file on the command-line. For example:
+ bin/client.sh -c /db/test -f address.xml -X samples/xupdate/xupdate.xml
+ This will apply the modifications described in samples/xupdate/xupdate.xml to the document
+ address.xml in collection /db/test.
+ If you don't specify the -f option, the modifications will be applied to all documents in the collection.
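+ The referenced XUpdate file is kept separately and not shown here. As a hedged sketch, a minimal modifications document along the lines of samples/xupdate/xupdate.xml might look like this (the target path and the appended element are purely illustrative):
+
+ ```xml
+ <!-- Hypothetical sketch of an XUpdate file: appends a phone element
+      to the address root element. The select path and the inserted
+      content are illustrative only. -->
+ <xupdate:modifications version="1.0"
+     xmlns:xupdate="http://www.xmldb.org/xupdate">
+     <xupdate:append select="/address">
+         <xupdate:element name="phone">555-1234</xupdate:element>
+     </xupdate:append>
+ </xupdate:modifications>
+ ```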
+
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/java-admin-client/listings/listing-14.xml b/src/main/xar-resources/data/java-admin-client/listings/listing-14.xml
index 1fcbea4f..d48192ca 100644
--- a/src/main/xar-resources/data/java-admin-client/listings/listing-14.xml
+++ b/src/main/xar-resources/data/java-admin-client/listings/listing-14.xml
@@ -1,4 +1,4 @@
- XML document
- .xml,.xsl,.xsd,.mods,.xmi,.xconf,.xslt,.wsdl,.x3d,.rdf,.owl,.xmap
+ XML document
+ .xml,.xsl,.xsd,.mods,.xmi,.xconf,.xslt,.wsdl,.x3d,.rdf,.owl,.xmap
\ No newline at end of file
diff --git a/src/main/xar-resources/data/jmx/jmx.xml b/src/main/xar-resources/data/jmx/jmx.xml
index ad8d2b07..381a199c 100644
--- a/src/main/xar-resources/data/jmx/jmx.xml
+++ b/src/main/xar-resources/data/jmx/jmx.xml
@@ -3,52 +3,36 @@
schematypens="http://purl.oclc.org/dsdl/schematron"?>Java Management Extensions (JMX)
- September 2009
+ 1Q18
- TBD
+ java-development
+ operations
-
- Intro
-
- eXist-db provides access to various management interfaces via Java Management Extensions (JMX). JMX is a standard mechanism available in
- Java 5 and above. An agent in the Java virtual machine exposes agent services as so-called MBeans that belong to different components running
- within the virtual machine. A JMX-compliant management application can then connect to the agent through the MBeans and access the available
- services in a standardized way. The standard Java installation includes a simple client, JConsole, which will also display the eXist-specific
- services. However, eXist also provides a command-line client for quick access to server statistics and other information.
- Right now, eXist only exposes a limited set of read-only services. Most of them are only useful for debugging. This will certainly change in
- the future as we add more services. We also plan to provide write access to configuration properties.
-
+ eXist-db provides access to various management interfaces via Java Management Extensions (JMX). An agent in the Java virtual machine exposes
+ agent services as so-called MBeans that belong to different components running within the virtual machine. A JMX-compliant management application
+ can then connect to the agent through the MBeans and access the available services in a standardized way.
+ The standard Java installation includes a simple client, JConsole, which will also display the eXist-specific services. eXist also provides a
+ command-line client for quick access to server statistics and other information.
+ Right now, eXist only exposes a limited set of read-only services. Most of them are useful for debugging purposes only.
+ Enabling the JMX agent
- To enable the platform server within the host virtual machine, you need to pass a few Java system properties to the java
- executable. The properties are:
+ To enable the platform server within the host virtual machine, pass the following Java system properties:
-
- This option makes the server publicly accessible. Please check the Oracle JMX documentation for details.
-
- The extension can be activated by passing a command-line parameter to the eXist start scripts (client.sh,
- startup.sh etc.)
-
-
- -j <argument>, --jmx <argument>
-
- set port number through which the JMX/RMI connections are enabled.
-
-
-
- Some examples:
+
+ These options make the server publicly accessible. Please check the Oracle JMX documentation for details.
+
+ The extension can now be activated by passing a -j or --jmx command-line parameter to the eXist start scripts
+ (client.sh, startup.sh etc.). This parameter must be followed by the port number through which the
+ JMX/RMI connections are enabled. For instance:
-
- In the Oracle Java SE 6 and 7 platforms, the JMX agent for local monitoring is enabled by default.
-
@@ -63,11 +47,11 @@
Use JConsole
- Once you restart eXist, you can use a JMX-compliant management console to access the management interfaces. For example, you can call
- jconsole, which is included with the JDK:
+ Use a JMX-compliant management console to access the management interfaces. For example, call JConsole, which is included with the
+ JDK:
+ jconsole localhost:1099
- Clicking on the MBeans tab should show some eXist-specific MBeans below the standard Java MBeans in the tree component
- to the left.
+ Clicking on the MBeans tab should show some eXist-specific MBeans below the standard Java MBeans (in the tree
+ component to the left).
@@ -75,67 +59,67 @@
Use JMXClient
- eXist includes a simple command-line JMX client which provides a quick access to some important server statistics. The application accepts
- the following command-line parameters:
+ eXist includes a simple command-line JMX client which provides quick access to some important server statistics.
+ java -jar start.jar org.exist.management.client.JMXClient <params>
+ This accepts the following command-line parameters:
- -a, --address <argument>
+ -a, --address <argument>: RMI address of the server.
- -c, --cache
+ -c, --cache: displays server statistics on cache and memory usage.
- -d, --db
+ -d, --db: display general info about the db instance.
- -h, --help
+ -h, --help: print help on command line options and exit.
- -i, --instance <argument>
+ -i, --instance <argument>: the ID of the database instance to connect to.
- -l, --locks
+ -l, --locks: lock manager: display locking information on all threads currently waiting for a lock on a resource or collection. Useful to debug
deadlocks. During normal operation, the list will usually be empty (means: no blocked threads).
- -m, --memory
+ -m, --memory: display info on free and total memory. Can be combined with other parameters.
- -p, --port <argument>
+ -p, --port <argument>: RMI port of the server.
- -s, --report
+ -s, --report: retrieves the most recent sanity/consistency check report.
- -w, --wait <argument>
+ -w, --wait <argument>: while displaying server statistics, keep retrieving statistics, but wait the specified number of seconds between calls.
@@ -150,83 +134,87 @@
JMXServlet
- eXist also provides a servlet which connects to the JMX interface and returns a status report for the database as XML. By default, the
- servlet listens on
+ eXist also provides a servlet which connects to the JMX interface and returns a status report for the database as XML. By default, this
+ servlet listens on:
+ http://localhost:8080/exist/status
- For simplicity, the different JMX objects in eXist are organized into categories. One or more categories can be passed to the servlet in
- parameter c. The following categories are recognized:
+ For example, to get a report on current memory usage and running instances, use the following URL:
+ http://localhost:8080/exist/status?c=memory&c=instances
+ This returns something like:
+
+ The different JMX objects in eXist are organized into categories. One or more categories can be passed to the servlet in parameter
+ c. The following categories are recognized:
- memory
+ memory: current memory consumption of the Java virtual machine
- instances
+ instances: general information about the db instance, active db broker objects etc.
- disk
+ disk: current hard disk usage of the database files
- system
+ system: system information (eXist version ...)
- caches
+ caches: statistics on eXist's internal caches
- locking
+ locking: information on collection and resource locks currently being held by operations
- sanity
+ sanity: feedback from the latest sanity check or ping request (see below)
- all
+ all: dumps all known JMX objects in eXist's namespace
- For example, to get a report on current memory usage and running instances, use the following URL:
- http://localhost:8080/exist/status?c=memory&c=instances
- This should return an XML document as follows:
-
+
+
Testing responsiveness using "ping"
- The servlet also implements a simple "ping" operation. Ping will first try to obtain an internal database broker object. If the db is
+ This servlet also implements a simple "ping" operation. Ping will first try to obtain an internal database broker object. If the db is
under very high load or deadlocked, it will run out of broker objects and ping will not be able to obtain one within a certain time. This is
- thus a good indication that the database has become unresponsive for requests. If a broker object could be obtained, the servlet will run a
- simple XQuery to test the availability of the XQuery engine.
- To run a "ping", call the servlet with parameter operation=ping. The operation also accepts an optional timeout
- parameter, t=timeout-in-ms, which defines a timeout in milliseconds. For example, the following URL starts a ping with a
- timeout of 2 seconds:
+ an indication that the database has become unresponsive for requests. If a broker object could be obtained, the servlet will run a simple
+ XQuery to test the availability of the XQuery engine.
+ To run a "ping", call the servlet with parameter operation=ping. The operation accepts an optional timeout parameter,
+ t=timeout-in-ms.
+ For example, the following URL starts a ping with a timeout of 2 seconds:
+ http://localhost:8080/exist/status?operation=ping&t=2000
+ If the ping returns within the specified timeout, the servlet returns the attributes of the SanityReport JMX bean, which will include an
- element <jmx:Status>PING_OK</jmx:Status>:
+ element <jmx:Status>PING_OK</jmx:Status>:
- If the ping takes longer than the timeout, you'll instead find an element <jmx:error> in the returned XML. In this case,
- additional information on running queries, memory consumption and database locks will be provided:
+ If the ping takes longer than the timeout, you'll instead find an element <jmx:error> in the returned XML. In this
+ case, additional information on running queries, memory consumption and database locks will be provided:
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/kwic/kwic.xml b/src/main/xar-resources/data/kwic/kwic.xml
index 467c7571..c838d28b 100644
--- a/src/main/xar-resources/data/kwic/kwic.xml
+++ b/src/main/xar-resources/data/kwic/kwic.xml
@@ -1,163 +1,142 @@
-
- Generating KWIC (Keywords in Context) Output
- September 2009
-
- TBD
-
-
-
-
-
-
- Abstract
-
- A KWIC display helps users to quickly scan through search results by
- listing hits surrounded by their context. eXist provides a
- KWIC module that is not bound to a specific index or query
- operation, but can be applied to query results from all indexes that support match
- highlighting. This includes the Lucene-based index, the ngram index, as
- well as the old full text index.
- The documentation search function on eXist's home page is a good example. It
- queries documents written in the DocBook format. However, the KWIC module has also
- been successfully used and deployed with different schemas (e.g. TEI) and languages
- (e.g. Chinese).
-
-
-
-
-
- Preparing your Query
-
- The KWIC module is entirely written in XQuery. To
- use the module, simply import its namespace into your query:
- import module namespace kwic="http://exist-db.org/xquery/kwic";
- You don't need to specify a location since the module is already registered in
- conf.xml. If you would still like to provide one, change
- the import as follows:
-
- The module is part of the main exist.jar, so we can use a
- resource link here.
-
-
-
-
-
- Using the Module
-
- The easiest way to get KWIC output is to call the
- kwic:summarize function on an element node returned from a
- full text or ngram query:
-
- Every call to kwic:summarize will return an HTML paragraph
- containing 3 spans with the text before and after each match as well as the match
- text itself:
-
- The config element, passed to
- kwic:summarize as second parameter, determines the
- appearance of the generated HTML. There are 3 different attributes you can set
- here:
-
-
- width
-
- The maximum number of characters to be printed before and after the
- match
-
-
-
- table
-
- if set to "yes", kwic:summarize will return an
- HTML table row (tr). The text chunks will be enclosed
- in a table column (td).
- The default behaviour, table="no", is to return an
- HTML paragraph with spans.
-
-
-
- link
-
- If present, each match will be enclosed within a link, using the URI
- in the link attribute as target.
-
-
-
-
- If you look at the output of above query, you may notice that a space is missing
- between words if the previous or following chunk extends to a different
- LINE element. Also, it would be nicer to only display text
- from LINE elements and to ignore SPEAKER or
- STAGEDIR tags. This can be achieved with the help of a
- callback function:
+
+ KWIC (Keywords in Context) Output
+ 1Q18
+
+ xquery
+
+
+
+
+
+ Keywords In Context (KWIC) helps users to quickly scan through search results by listing hits surrounded by their context. eXist provides a
+ KWIC module that is not bound to a specific index or query operation, but can be applied to query results from all indexes that support match
+ highlighting. This includes the Lucene-based index and the ngram index.
+ The documentation search function on eXist's home page is a good example. It queries documents written in DocBook format. However, the KWIC
+ module has also been successfully used with different schemas (e.g. TEI) and languages (e.g. Chinese).
+
+
+
+
+ Using the Module
+
+ The KWIC module is entirely written in XQuery. To use the module, import its namespace into your query (you don't need to specify a
+ location):
+ import module namespace kwic="http://exist-db.org/xquery/kwic";
+
+ The easiest way to get KWIC output is to call the kwic:summarize function on an element node returned from a full text or
+ ngram query:
+
+ Every call to kwic:summarize will return an HTML paragraph containing 3 span elements with the text before
+ and after each match, as well as the match text itself:
+
+
+ The config element, passed to kwic:summarize (as second parameter) determines the appearance of the generated
+ HTML. It recognizes 3 attributes:
+
+
+ width
+
+ The maximum number of characters to be printed before and after the match
+
+
+
+ table
+
+ By default kwic:summarize returns an HTML paragraph with spans.
+ If table="yes", it will return an HTML table row
+ tr element. The text chunks will be enclosed in a table column td element.
+
+
+
+ link
+
+ If present, each match will be enclosed within a link, using the URI in the link attribute as target.
+
+
+
+
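+ Putting the import, a query and a config element together, a minimal sketch might look like the following (the collection path, query string and link target are illustrative, and a full-text index on SPEECH is assumed):
+
+ ```xquery
+ (: A minimal sketch, assuming a Lucene full-text index on SPEECH and a
+    Shakespeare collection at /db/shakespeare; both are illustrative. :)
+ import module namespace kwic="http://exist-db.org/xquery/kwic";
+
+ let $config := <config width="40" table="no" link="hit.html"/>
+ for $hit in collection("/db/shakespeare")//SPEECH[ft:query(., "love")]
+ return kwic:summarize($hit, $config)
+ ```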
+
+
+
+ Using a callback function for more fine-grained control
+
+ If you look at the output of the query above, you may notice that a space is missing between words if the previous or following chunk extends
+ to a different LINE element. And it would also be nicer to display text from LINE elements only and to ignore
+ SPEAKER or STAGEDIR elements. This can be achieved with the help of a callback function:
- The third parameter to kwic:summarize should be a reference
- to a function
- accepting 2 arguments: 1) a single text node which should be appended or prepended
- to the current text chunk, 2) a string indicating the current direction in which
- text is appended, i.e. "before" or "after". The function may return the empty
- sequence if the current node should be ignored (e.g. if it belongs to a "footnote"
- which should not be displayed). Otherwise it should return a single string.
- The local:filter function above first checks if the passed
- node has a SPEAKER or STAGEDIR parent and if yes, ignores that
- node by returning the empty sequence. If not, the function adds a single whitespace
- before or after the string, so adjacent lines will be properly separated.
-
-
-
-
-
- Advanced Use
-
- Using kwic:summarize, you will get one KWIC-formatted item
- for every match, even if the matches are in the same paragraph. Also, the context
- from which the text is taken is always the same: the element you queried.
- To get more control over the output, you can directly call
- kwic:get-summary, which is the module's core function. It
- expects 3 or 4 parameters, where the first two parameters are: a) the current
- context root, b) the match object to process. Parameters 3 and 4 are the same as for
- kwic:summarize.
- Before passing nodes to kwic:get-summary you have to
- expand them, which basically means to create an in-memory
- copy in which all matches are properly marked up with exist:match
- tags. The main part of the query should look as follows:
-
- In this example, we select the first exist:match only, thus
- ignoring all other matches within $expanded.
- Sometimes you may also want to change the context to restrict the KWIC display to
- certain elements within the larger query context, e.g. paragraphs within sections.
- The following example still queries SPEECH, but displays a KWIC
- entry for each LINE with a match, grouped by speech:
-
- You may ask why we are not querying LINE directly to get a
- different context, e.g. as in:
- //SPEECH[ft:query(LINE, "nature")]
- Well, we want Lucene to compute the relevance of each match with respect to the
- SPEECH context, not LINE. If we queried LINE, each single line would get a match
- score and the matches would end up in a completely different order.
-
-
-
-
-
- Marking up Matches without using KWIC
-
- Sometimes you don't want to use the KWIC module, but you would still like to have indicated
- where matches were found in the text. eXist's XML serializer can automatically highlight
- matches when it writes out a piece of XML. All the matches will be surrounded by an
- exist:match tag.
- You can achieve the same within an XQuery by calling the extension function
- util:expand:
-
- Using util:expand
-
-
-
- util:expand returns a copy of the XML fragment it received
- in its first parameter, which - unless configured otherwise - has all matches
- wrapped into exist:match tags.
-
+ The third parameter to kwic:summarize here is a reference to a function accepting 2 arguments:
+
+
+ A single text node which should be appended or prepended to the current text chunk
+
+
+ A string indicating the current direction in which text is appended: before or after.
+
+
+
+
+ The function can return the empty sequence if the current node should be ignored (for instance if it belongs to a footnote which should
+ not be displayed). Otherwise it must return a single string.
+ The local:filter function above first checks if the passed node has a SPEAKER or STAGEDIR parent. If so, it
+ ignores that node by returning the empty sequence. If not, the function adds a single whitespace before or after the
+ string, so adjacent lines will be properly separated.
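+ The listing referenced above lives in a separate file; based on the description, a sketch of such a callback might look like this:
+
+ ```xquery
+ (: Sketch of the callback described above. $mode is "before" or "after".
+    Text inside SPEAKER or STAGEDIR is dropped by returning the empty
+    sequence; other text is padded with a space so adjacent LINEs do not
+    run together. :)
+ declare function local:filter($node as text(), $mode as xs:string) as xs:string? {
+     if ($node/parent::SPEAKER or $node/parent::STAGEDIR) then
+         ()
+     else if ($mode eq 'before') then
+         concat($node, ' ')
+     else
+         concat(' ', $node)
+ };
+ ```
+
+ It would then be passed as the third argument, e.g. kwic:summarize($hit, $config, local:filter#2); the exact function-reference syntax depends on the XQuery version in use.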
+
+
+
+
+
+
+ Advanced Use
+
+ Using kwic:summarize, you will get one KWIC-formatted item for every match, even if the matches are in the same
+ paragraph. Also, the context from which the text is taken is always the same: the element you queried. To get more control over the output, you
+ can directly call kwic:get-summary, which is the module's core function.
+ kwic:get-summary expects 3 or 4 parameters.
+
+
+ The current context root
+
+
+ The match object to process
+
+
+ Parameters 3 and 4 are the same as for kwic:summarize
+
+
+
+ Before passing nodes to kwic:get-summary you have to expand them, which basically means to create an
+ in-memory copy in which all matches are properly marked up with exist:match tags. The main part of the query should look as
+ follows:
+
+ In this example, we select the first exist:match only, thus ignoring all other matches within
+ $expanded.
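+ The expand-then-summarize pattern can be sketched as follows (the query string and config are illustrative; util:expand and kwic:get-summary are as described above):
+
+ ```xquery
+ (: Sketch only: expand the hit so matches are wrapped in exist:match
+    elements, then summarize just the first match. In eXist the exist
+    namespace prefix is normally predeclared. :)
+ let $hit := (//SPEECH[ft:query(., "nature")])[1]
+ let $expanded := util:expand($hit)
+ let $config := <config width="40"/>
+ return kwic:get-summary($expanded, ($expanded//exist:match)[1], $config)
+ ```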
+
+ Sometimes you may also want to change the context to restrict the KWIC display to certain elements within the larger query context, for
+ instance paragraphs within sections. The following example still queries SPEECH but displays a KWIC entry for each LINE
+ with a match, grouped by speech:
+
+ You might wonder why we don't query LINE directly to get a different context, as in:
+ //SPEECH[ft:query(LINE, "nature")]
+ This is because Lucene computes the relevance of each match with respect to the SPEECH context, not LINE. If we queried LINE, each single
+ line would get a match score and the matches would end up in a completely different order.
+
+
+
+
+
+ Marking up Matches without using KWIC
+
+ Sometimes you don't want to use the KWIC module, but would still like an indication of where matches were found in the text. eXist's XML
+ serializer can automatically highlight matches when it writes out a piece of XML. All the matches will be surrounded by an
+ exist:match tag.
+ You can achieve the same within an XQuery by calling the extension function util:expand:
+
+
+
+ util:expand returns a copy of the XML fragment it received in its first parameter, which, unless configured otherwise, has
+ all matches wrapped into exist:match tags.
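+ As a minimal sketch (the query itself is illustrative):
+
+ ```xquery
+ (: Return a copy of the first hit with every match wrapped in an
+    exist:match tag, ready for serialization. :)
+ let $hit := (//SPEECH[ft:query(LINE, "love")])[1]
+ return util:expand($hit)
+ ```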
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/learning-xquery/learning-xquery.xml b/src/main/xar-resources/data/learning-xquery/learning-xquery.xml
index 823edff1..5c31aa70 100644
--- a/src/main/xar-resources/data/learning-xquery/learning-xquery.xml
+++ b/src/main/xar-resources/data/learning-xquery/learning-xquery.xml
@@ -1,155 +1,124 @@
-
- Learning XQuery and eXist-db
- September 2012
-
- TBD
-
-
+ schematypens="http://purl.oclc.org/dsdl/schematron"?>
+
+ Learning XQuery and eXist-db
+ 1Q18
+
+ xquery
+ getting-started
+
+
-
+
-
- Introduction
-
- This article provides tips and resources for newcomers to XQuery and eXist-db.
-
+ This article provides tips and resources for newcomers to XQuery and eXist-db.
-
+
-
- Key Points to Learning XQuery
+
+ Key Points to Learning XQuery
- This is a guide to help you learn XQuery. It contains some brief background
- information on XQuery and then lists a number of resources you can use to learn
- XQuery.
- XQuery is unique in the development stack in that it replaces both SQL and the
- traditional software layers that convert SQL into presentation formats such as HTML, PDF
- and ePub. XQuery can both retrieve information from your database and format it for
- presentation.
- Learning how to select basic data from an XML document can be learned in just a few
- hours if you are already familiar with SQL and other functional programming languages.
- However, learning how to create custom XQuery functions, how to design XQuery modules
- and how to execute unit tests on XQuery takes considerably longer.
+ This is a guide to help you learn XQuery. It contains some brief background information on XQuery and then lists a number of resources you
+ can use to learn it.
+ XQuery is unique in the development stack: it replaces both SQL and the traditional software layers that convert SQL into presentation
+ formats such as HTML, PDF and ePub. XQuery can both retrieve information from your database and format it for presentation.
+ Selecting basic data from an XML document can be learned in just a few hours, especially if you are already familiar with SQL
+ and other functional programming languages. Learning how to create custom XQuery functions, how to design XQuery modules and how to execute unit
+ tests on XQuery takes considerably longer.
-
-
-
- Learning by Example
-
- Many people find that they learn a new language best by reading small examples of
- code. One of the ideal locations for this is the XQuery Wikibook
- Beginning Examples
-
- These examples are all designed and tested to work with eXist. Please let us know if
- there are specific examples you would like to see.
-
-
-
-
-
- Learning Functional Programming
-
- XQuery is a functional programming language, so many things that you do in
- procedural programs are not recommended or not possible. In XQuery all variables
- should be immutable, meaning they should be set once but never changed. This aspect
- of XQuery allows it to be stateless and side-effect free.
-
-
-
-
-
- Learning FLOWR statements
-
- Iteration in XQuery uses parallel programming statements called FLOWR statements.
- Each loop of a FLOWR statement is performed in a separate thread of execution. As a
- result you cannot use the output of any computation in a FLOWR loop as input to the
- next loop. This concept can be difficult to learn if you have never used parallel
- programming systems.
-
-
-
-
-
- Learning XPath
-
- XQuery also includes the use of XPath to select various nodes from an XML document. Note
- that with native XML databases the shortest XPath expression is often the fastest since
- short expressions use element indexes. You may want to use a tool such as an XPath "builder" tool
- within an IDE such as oXygen to learn how to build XPath expressions.
-
-
-
-
-
- Using eXide
-
- eXist comes with a web-based tool for doing XQuery development called eXide.
- Although this tool is not as advanced as a full IDE such as oXygen, it is ideal for
- small queries if an IDE is not accessible.
-
-
-
-
-
- Learning how to update XML documents
-
- eXist comes with a set of operations for updating on-disk XML documents.
- eXist XQuery Update Operations
-
-
-
-
-
-
- Learning how to debug XQuery
+
- eXist has some support for step-by-step debugging of XQuery, but the interface is
- not mature yet. Many people choose to debug complex recursive functions directly
- within XML IDEs such as oXygen that support step-by-step debugging using the
- internal Saxon XQuery library. The oXygen IDE allows you to set breakpoints and
- watch the output document get created one element at a time. This process is
- strongly recommended if you are learning topics like recursion. eXist XQuery Debugger
-
-
-
-
-
-
- Learning recursion in XQuery
-
- XML is an inherently recursive data structure: trees contain sub-trees, so many
- XQuery functions for transforming documents are best designed using recursion. One
- good place to start learning recursion is the identity node filter functions in the
- XQuery wikibook.
-
-
-
-
-
- Effective use of your IDE
+
+ Learning by Example
- Most developers who do XQuery more than a few hours a day eventually end up using
- a full commercial XQuery IDE, with oXygen being the best integrated with eXist.
- Setting up oXygen is a bit tricky the first time since you need to load five jar
- files into a "driver" for oXygen. See Using oXygen. Yet once this is done and the default XQuery
- engine is set up to use eXist, there are many high-productivity features that are
- enabled. Central to this is the XQuery auto-complete feature. As you type within
- XQuery, all eXist functions and their parameters are shown in the IDE. For example
- if you type "xmldb:" all the functions of the XMLDB module will automatically appear
- in a drop-down list. As you continue to type or select a function the parameters and
- types are also shown. This becomes a large time saver as you use more XQuery
- functions.
-
-
+ Many people find that they learn a new language best by reading and trying small examples of code. One of the ideal locations for this is
+ the XQuery Wikibook Beginning Examples. These examples are all designed and tested to work with eXist. Please let us know if there are
+ specific examples you would like to see.
+
-
+
-
- Learning XQuery Resources
+
+ Learning Functional Programming
- The following is an annotated list of resources that can help you learn XQuery.
-
+ XQuery is a functional programming language, so many things that you do in procedural programs are not recommended or not possible. In
+ XQuery all variables should be immutable, meaning they should be set once and never changed. This aspect of XQuery allows it to be stateless
+ and side-effect free.
+
+
+
+
+
+ Learning FLWOR statements
+
+ Iteration in XQuery uses FLWOR (For, Let, Where, Order by, Return) expressions. Conceptually, each iteration of a FLWOR expression
+ is evaluated independently, possibly in parallel. As a result you cannot use the output of one iteration as input for the next
+ iteration. This concept can be difficult to learn if you have never used parallel programming systems.
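+ A small example makes the independence of iterations concrete:
+
+ ```xquery
+ (: Each iteration below depends only on the current $i; there is no
+    way to carry a running total from one iteration into the next. :)
+ for $i in 1 to 5
+ let $square := $i * $i
+ where $square gt 4
+ order by $square descending
+ return $square
+ (: yields 25, 16, 9 :)
+ ```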
+
+
+
+
+
+ Learning XPath
+
+ XQuery includes the use of XPath to select various nodes from an XML document. In native XML databases (and eXist is no exception) the
+ shortest XPath expression is often the fastest since short expressions use element indexes. You can use a tool such as an XPath "builder" tool
+ within an IDE such as oXygen to learn how to build XPath expressions.
+
+
+
+
+
+ Using eXide
+
+ eXist comes with a web-based tool for XQuery development called eXide. Although this tool is
+ not as advanced as a full IDE such as oXygen, it is ideal for small queries if an IDE is not accessible.
+
+
+
+
+
+ Learning how to update XML documents
+
+ eXist comes with a set of operations for updating on-disk XML documents. eXist XQuery Update Operations
+
+
+
+
+
+
+ Learning how to debug XQuery
+
+ Many people choose to debug complex recursive functions directly within XML IDEs such as oXygen that support step-by-step debugging,
+ using the internal Saxon XQuery library. The oXygen IDE allows you to set breakpoints and watch the output document get created one element at
+ a time. This process is strongly recommended if you are learning topics like recursion.
+ eXist has some support for step-by-step debugging of
+ XQuery, but the interface is not mature yet.
+
+
+
+
+
+ Learning recursion in XQuery
+
+ XML is an inherently recursive data structure: trees contain sub-trees. Therefore many XQuery functions for transforming documents are
+ best designed using recursion. A good place to start learning recursion is the identity node filter functions in the XQuery
+ Wikibook.
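+ The core of that identity transform can be sketched like this (a minimal version; the Wikibook variants add filtering and renaming):
+
+ ```xquery
+ (: copy a node, recursing into element children and keeping attributes :)
+ declare function local:copy($node as node()) as node() {
+     typeswitch ($node)
+         case element() return
+             element { node-name($node) } {
+                 $node/@*,
+                 for $child in $node/node() return local:copy($child)
+             }
+         default return $node
+ };
+ ```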
+
+
+
+
+
+ Effective use of your IDE
+
+ Most developers who do XQuery more than a few hours a day eventually end up using a full commercial XQuery IDE, with oXygen being the best
+ integrated with eXist. See Using oXygen.
+ An important feature is the XQuery auto-complete. As you type within XQuery, all eXist functions and their parameters are listed. For
+ example if you type xmldb:, all the functions of the XMLDB module will automatically appear in a drop-down list. As you continue
+ to type or select a function the parameters and types are also shown. This becomes an important time-saver!
+
+
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/legal/legal.xml b/src/main/xar-resources/data/legal/legal.xml
index 7c6fd0b7..8734fdd0 100644
--- a/src/main/xar-resources/data/legal/legal.xml
+++ b/src/main/xar-resources/data/legal/legal.xml
@@ -1,102 +1,87 @@
-
- Legal Statement
- January 2012
-
- TBD
-
-
+
+ Legal Statement
+ January 2012
+
+ TBD
+
+
-
+
-
- Introduction
+ This article provides information about eXist-db's legal status.
- eXist-db is an open source software product that is published under the L-GPL Open Source license. A
- trademark has been registered for the name and the logo.
-
+
-
+
+ Open Source
-
- Open Source
+ eXist-db is an open source software product that is published under the L-GPL Open Source license. A trademark has been registered for the
+ name and the logo.
- The source code of eXist-db is 'open', which means that essentially anyone has access to the full
- source code of the software. The source code is stored on a publicly accessible server so anyone can
- download it and learn from it. Because it is open everyone can use the work and can, in the spirit of
- Open Source community, contribute to the code by, for example, reviewing it, providing bug-fixes, or
- adding new features.
- The software is developed with the intension to have it 'open' and keep it 'open' for anyone in the
- community of users. To guarantee this 'open-ness' there are some rules for using the code; these rules
- are defined in the L-GPL Open Source License.
-
+ The source code of eXist-db is 'open', which means anyone has access to the full source code. The source code is stored on a publicly
+ accessible server so anyone can download and learn from it. Because it is open, everyone can use the work and, in the spirit of Open Source
+ community, contribute to the code by, for example, reviewing it, providing bug-fixes, or adding new features.
-
+
-
- L-GPL license
+
- eXist-db is released under the very liberal L-GPL license; in short this means that the software (as
- is) can be used without restrictions in any other software: Open Source software, propriety software,
- Commercial software, and non-Commercial software.
- To guarantee the 'open' nature of eXist-db, the L-GPL license defines a few conditions for the cases
- where the original code is modified and (re-)distributed: The modified code must be released under the
- conditions as defined in the L-GPL license.
- In the spirit of the open source community, it is encouraged to share any modified code the eXist-db
- community. The community and the author of the modified code will benefit from sharing the
- modifications, and the code will be maintained and reviewed by the eXist-db community.
- Note that the L-GPL license only applies for the eXist-db code and distributed binaries, not for
- applications and plugins that are developed by eXist-db users.
- The eXist-db code has dependencies on a number of carefully selected third-party libraries, libraries
- that are not developed as part of eXist-db. All of the libraries are released under an open source
- license, a copy of which is included in the eXist-db distribution. All of these licenses have been
- carefully checked and can be (legally) used together with eXist-db.
- Some parts of the eXist-db code are published under even more liberal licenses. These licenses are
- also legally compatible with the eXist-db license.
-
+
+ L-GPL license
+ The software is developed with the intention to be 'open' and stay 'open' for anyone in the community of users. To guarantee this
+ 'open-ness' there are some rules for using the code.
+ eXist-db is released under the very liberal L-GPL license. In short this means that the software (as is) can be used without restrictions in
+ any other software: Open Source software, proprietary software, commercial or non-commercial.
+ To guarantee the 'open' nature of eXist-db, the L-GPL license defines a few conditions for those cases where the original code is modified
+ and (re-)distributed: The modified code must be released under the conditions as defined in the L-GPL license.
+ In the spirit of the open source community, it is encouraged to share any modified code within the eXist-db community. The community and the
+ author of the modified code will benefit from sharing the modifications. The code will be maintained and reviewed by the eXist-db
+ community.
+ Note that the L-GPL license only applies for the eXist-db code and distributed binaries, not for applications and plugins that are developed
+ by eXist-db users.
+ The eXist-db code has dependencies on a number of carefully selected third-party libraries (not specifically developed as part of eXist-db).
+ All of these libraries are also released under an open source license, a copy of which is included in the eXist-db distribution. These licenses
+ have been carefully checked so the libraries can be used together with eXist-db.
+ Some parts of the eXist-db code are published under even more liberal licenses. These licenses are legally compatible with the eXist-db
+ license.
+
-
+
-
- Trademark
+
+ Trademark
- The eXist-db name and logo have been registered as a trademark in order to keep both available for the
- eXist-db software and community. Please contact info_at_exist-db.org if you want to use the name and/or
- logo in your own works, e.g. as part of a software distribution or in a publication. The logo and trademark
- may be freely used for promotional purposes in the context of open source or non-profit applications and
- services.
- A high resolution version of the logo is available on request.
-
+ The eXist-db name and logo have been registered as a trademark in order to keep both available for the eXist-db software and community. The
+ logo and trademark may be freely used for promotional purposes in the context of open source or non-profit applications and services.
+ Please contact info_at_exist-db.org if you want to use the name and/or logo in your own works, e.g. as part of a software distribution or
+ in a publication.
+ A high resolution version of the logo is available on request.
+
-
+
-
- References
+
+ References
-
-
-
- Source code of eXist-db (a
- GitHub repository)
-
-
- A
- Definition of open source
- software.
-
-
-
- The orginal text of the
- L-GPL license.
-
-
-
- An
- explanation of the L-GPL license.
-
-
-
-
+
+
+
+ Source code of eXist-db
+ (a GitHub repository)
+
+
+ A Definition of open source software.
+
+
+ The original text of the L-GPL license.
+
+
+ An explanation of
+ the L-GPL license.
+
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/lucene/lucene.xml b/src/main/xar-resources/data/lucene/lucene.xml
index 1ade9c98..f1dc3e4b 100644
--- a/src/main/xar-resources/data/lucene/lucene.xml
+++ b/src/main/xar-resources/data/lucene/lucene.xml
@@ -1,673 +1,522 @@
-
- Lucene-based Full Text Index
- July 2015
-
- TBD
-
-
-
-
-
-
- Introduction
-
- The 1.4 version of eXist-db introduced a new full text indexing module which replaced
- eXist-db's former built-in full text index. This new module is faster, more configurable and
- more feature-rich than eXist-db's old index. It will also be the basis for eXist-db's
- implementation of the W3C's full text extensions for XQuery.
- The new full text module is based on Apache
- Lucene. It thus benefits from a stable, well-designed and widely-used
- framework. The module is tightly integrated with eXist-db's modularized
- indexing architecture: the index behaves like a plugin which adds
- itself to the db's index pipelines. Once configured, the index will be notified of
- all relevant events, like adding/removing a document, removing a collection or
- updating single nodes. No manual reindex is required to keep the index up-to-date.
- The module also implements common interfaces which are shared with other indexes,
- e.g. for highlighting matches. It is thus easy to switch between the Lucene index
- and e.g. the ngram index without rewriting too much XQuery code.
-
-
-
-
-
- Enabling the Lucene Module
-
- The Lucene full text index is enabled by default since version 1.4 of eXist-db. However, in
- case it is not enabled in your installation, here's how to get it up and
- running:
-
-
- Before building eXist-db, enable the Lucene full text index by enabling it according to the instructions in the documentation on index
- modules.
-
-
- Then (re-)build eXist-db using the provided build.sh or build.bat. The build
- process downloads the required Lucene jars automatically. If everything
- builds OK, you should find a jar exist-lucene-module.jar in the lib/extensions directory. Next, edit the main configuration
- file, conf.xml and un-comment the Lucene-related
- section:
-
- conf.xml
-
-
-
-
-
-
-
-
- Global configuration options
-
- The index has a single configuration parameter which can be specified on the
- module element within the modules
- section:
-
-
- buffer
-
- Defines the amount of memory (in megabytes) Lucene will use for
- buffering index entries before they are written to disk. See the
- Lucene javadocs.
-
-
-
-
-
-
-
-
-
- Configuring the Index
-
- Like other indexes, you create a Lucene index by configuring it in a
- collection.xconf document. If you have never done that
- before, read the corresponding documentation. An example collection.xconf is
- shown below:
-
- collection.xconf for versions between 1.4 and 2.1
-
-
-
- collection.xconf for version 2.2
-
-
-
- collection.xconf for version 3.0 and above.
-
-
- You can either define a Lucene index on a single element or attribute name
- (qname="...") or a node path with wildcards (match="...", see below). It is
- important make sure to choose the right context for an index,
- which has to be the same as in your query. To better understand this, let's have a
- look at how the index creation is handled by eXist-db and Lucene. The following
- configuration:
- <text qname="SPEECH"/>
- creates an index ONLY on SPEECH. What is passed to Lucene is the string value of
- SPEECH, which includes the text of all its descendant text nodes (*except* those
- filtered out by an optional ignore). For example, consider the
- fragment:
-
- If you have an index on SPEECH, Lucene will create a "document" with the text
- "Second Witch Fillet of a fenny snake, In the cauldron boil and bake;" and index
- it. eXist-db internally links this Lucene document to the SPEECH node, but Lucene has
- no knowledge of that (it doesn't know anything about XML nodes).
- The query:
- //SPEECH[ft:query(., 'cauldron')]
- searches the index and finds the "document" containing the SPEECH text, which
- eXist-db can trace back to the SPEECH node in the XML document. However, it is required
- that you use the same context (SPEECH) for creating and querying the index. The
- query:
- //SPEECH[ft:query(LINE, 'cauldron')]
- will not return anything, even though LINE is a child of SPEECH and 'cauldron' was
- indexed. This particular 'cauldron' is linked to its ancestor SPEECH node, not its
- parent LINE.
- However, you are free to give the user both options, i.e. use SPEECH and LINE as
- context at the same time. How? Simply define a second index on LINE:
-
- Let's use a different example to illustrate that. Assume you have a document with
- encoded place names:
-
- Paragraph with place name
- <p>He loves <placeName>Paris</placeName>.</p>
-
- For a general query you probably want to search through all paragraphs. However,
- you may also want to provide an advanced search option, which allows the user to
- restrict his query to place names. To make this possible, simply define an index on
- placeName as well:
-
- collection.xconf fragment
-
-
- Based on this setup, you'll be able to query for the word 'Paris' anywhere in a
- paragraph:
- //p[ft:query(., 'paris')]
- as well as 'Paris' occurring within a placeName:
- //p[ft:query(placeName, 'paris')]
-
-
-
-
- Using match="..."
-
- In addition to defining an index on a given qname, you may also specify a
- "path" with wildcards. This feature is subject to change,
- so please be careful when using it.
- Assume you want to define an index on all the possible elements below SPEECH.
- You can do this by creating one index for every element:
-
- As a shortcut, you can use a match attribute with a
- wildcard:
- <text match="//SPEECH/*"/>
- which will create a separate index on each child element of SPEECH it
- encounters. Please note that the argument to match is a simple path pattern, not
- an XPath expression. It only allows / and // to denote a child or descendant
- step, plus the wildcard to match an arbitrary element.
- As explained above, you have to figure out which parts of your document will
- likely be interesting as context for a full text query. The full text index will
- work best if the context isn't too narrow. For example, if you have a document
- structure with section divs, headings and paragraphs, you would probably want to
- create an index on the divs and maybe on the headings, so the user can
- differentiate between the two. In some cases, you could decide to put the index on
- the paragraph level, but then you don't need the index on the section since you can
- always get from the paragraph back to the section.
- If you query a larger context, you can use the KWIC module to show the user only a certain chunk of text
- surrounding each match. Or you can ask eXist-db to highlight each match with an
- exist:match tag, which you can later use to locate the
- matches within the text.
-
-
-
-
-
- Whitespace Treatment and Ignored Content
-
-
- Inlined elements
-
- By default, eXist-db's indexer assumes that element boundaries break a word
- or token. For example, if you have an element:
-
- Not a Mixed Content Element
- <size><width>12</width><height>8</height></size>
-
- You want "12" and "8" to be indexed as separate tokens, even though
- there's no whitespace between the elements. By default, eXist-db will indeed
- pass the content of the two elements to Lucene as separate strings and
- Lucene will thus see two tokens instead of just "128".
- However, you usually don't want this behaviour for mixed content nodes.
- For example:
-
- Mixed Content Node
- <p>This is <b>un</b>clear.</p>
-
- In this case, you want "unclear" to be indexed as one word. This can be
- done by telling eXist-db which nodes are "inline" nodes. The example
- configuration above defines:
- <inline qname="b"/>
- The inline option can be specified globally, which means it
- will be applied to all b elements, or per-index:
-
-
-
- Ignored elements
-
- Also, it is sometimes necessary to skip the content of an inlined element,
- which can appear in the middle of a text sequence you want to index. Notes
- are a good example:
-
- Paragraph With Inline Note
-
-
- Use an ignore element in the collection configuration
- to have eXist-db ignore the note:
- <ignore qname="note"/>
- Basically, ignore simply allows you to hide a chunk of
- text before Lucene sees it.
- Like the inline tag, ignore may
- appear globally or within a single index definition.
- The ignore only applies to descendants of an indexed
- element. You can still create another index on the ignored element itself.
- For example, you can have index definitions for p and
- note:
-
- collection.xconf fragment
-
-
- If note appears within p, it will
- not be added to the index on p, but only to the index on
- note. This means that the query
- //p[ft:query(., "note")]
- may not return a hit if "note" occurs within a note,
- while
- //p[ft:query(note, "note")]
- may still find a match.
-
-
-
-
-
-
- Boost
-
- A boost value can be assigned to an index to give it a higher score. The score
- for each match will be multiplied by the boost factor (default is: 1.0). For
- example, you may want to rank matches in titles higher than other matches.
- Here's how we configure the documentation search indexes in eXist-db:
-
- collection.xconf using boost
-
-
- The title index gets a boost of 2.0 to make sure that title matches get a
- higher score. Since the title element does occur within
- section, we add an ignore rule to the index definition on
- the section and create a separate index on title. We also ignore titles occurring inside paragraphs. Without this, title would be
- matched two times.
- Because the title is now indexed separately, we also need to query it
- explicitly. For example, to search the section and the title at the same time,
- one could issue the following query:
-
-
- Attribute Boost
-
- Starting with eXist-db 3.0 a boost value can
- also be assigned to an index by attribute. This could
- be used to weight your search results even if you have
- flat data structures with the same attribute value
- pairs in attributes throughout your documents. Two
- flavours of dynamic weighting are available through
- the new pairs
- match-sibling-attribute,
- has-sibling-attribute and
- match-attribute,
- has-attribute child elements in the
- full-text index configuration. If you have data in
- Lexical metadata framework (LMF) format you will
- recognize these repeated structures of
- feat elements with 'att' attributes
- and 'val' attributes within
- LexicalEntry elements, e g
- feat att='writtenForm' val='LMF feature
- value'. The attribute boosting allows you to
- weight the results based on the value of the 'att'
- attribute so that eg hits in definitions come before
- hits in comments and examples. This behaviour is
- enabled by adding a child
- match-sibling-attr to a Lucene
- configuration text element. An
- example index configuration for it looks like
- this:
-
- This means that the ft:score#1 function will
- boost hits in 'val' attributes with a factor of 25
- times for 'writtenForm' value of the 'att'
- attribute.
- In the same way match-attr would be used for element qnames in the text element.
- If you do not care about any value of the
- sibling attribute then use the
- has-attribute index configuration
- variant. An example index configuration with
- has-attr looks like this:
-
- This means that if your feat elements have an
- attribute xml:lang it will score
- them nil and push them last of the pack, which might
- be useful to demote hits in features in other
- languages than the main entry language.
- In the same way has-sibling-attr would be used for attribute qnames in the text element.
-
-
-
-
-
-
- Analyzers
-
- One of the strengths of Lucene is that it allows the developer to determine
- nearly every aspect of the text analysis. This is mostly done through analyzer
- classes, which combine a tokenizer with a chain of filters to post-process the
- tokenized text. eXist-db's Lucene module already allows different analyzers to
- be used for different indexes.
-
-
-
-
- In the example above, we define that Lucene's StandardAnalyzer should be used by default (the
- analyzer element without id attribute).
- We provide an additional analyzer and assign it the id ws, by
- which the analyzer can be referenced in the actual index definitions.
- The whitespace analyzer is the most basic one. As the name says, it
- tokenizes the text at white space characters, but treats all other characters -
- including punctuation - as part of the token. The tokens are not converted to
- lower case and there's no stopword filter applied.
-
- Configuring the Analyzer
-
- We provide the capability to send configuration parameters to the instantiation of the Analyzer. These parameters must match a Constructor signature on the underlying Java class of the Analyzer, so we would first recommend that you review the Javadoc for the Analyzer that you wish to configure.
- We currently support passing the following types:
-
-
-
-
- "String" (default if no type is specified)
-
-
- "java.io.FileReader" (since Lucene 4) or "file"
-
-
- "java.lang.Boolean" or "boolean"
-
-
- "java.lang.Integer" or "int"
-
-
- "org.apache.lucene.analysis.util.CharArraySet" or "set"
-
-
- "java.lang.reflect.Field"
-
-
+ schematypens="http://purl.oclc.org/dsdl/schematron"?>
+
+ Full Text Index
+ 1Q18
+
+ indexing
+
+
+
+
+
+ This article provides information on configuring and using eXist-db's full text index.
+
+
+
+
+ Introduction
+
+ The full text index module is based on Apache Lucene.
+ The full-text index module is tightly integrated with eXist-db's modularized indexing architecture: the index behaves
+ like a plug-in which adds itself to the database's index pipelines. Once configured, the index will be notified of relevant events, like
+ adding/removing a document, removing a collection or updating single nodes. No manual re-indexing is required to keep the index up-to-date.
+ The full-text index module also implements common interfaces which are shared with other indexes, for instance for highlighting matches (see
+ KWIC). It is easy to switch between the Lucene index and, for instance, the ngram index without rewriting much
+ XQuery code.
+
+
+
+
+
+ Enabling the Lucene Module
+
+ The Lucene full text index is enabled by default (since eXist-db version 1.4). In case it is not enabled in your installation, here's how to
+ get it up and running:
+
+
+ Enable it according to the instructions in the article on index modules.
+
+
+ Then (re-)build eXist-db using the provided build.sh or build.bat script. The
+ build process downloads the required Lucene jars automatically. If everything builds ok, you'll find a jar
+ exist-lucene-module.jar in the lib/extensions directory.
+
+
+ Edit the main configuration file, conf.xml and un-comment the Lucene-related section:
+
+
+
+
+ The index has a single configuration parameter on the modules/module element called buffer.
+ It defines the amount of memory (in megabytes) Lucene will use for buffering index entries before they are written to disk. See the Lucene Javadocs.
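+ A sketch of the corresponding conf.xml fragment (the buffer value is illustrative; check the class attribute against the conf.xml shipped with your installation):
+
+ ```xml
+ <modules>
+     <module id="lucene-index" buffer="32"
+         class="org.exist.indexing.lucene.LuceneIndex"/>
+ </modules>
+ ```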
+
+
+
+
+
+
+ Configuring the Index
+
+ Like other indexes, you create a Lucene index by configuring it in a collection.xconf document, as explained in the documentation. For example:
+
+
+ collection.xconf for version 2.2
+
+
+
+ collection.xconf for version 3.0 and above.
+
+
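+ As a minimal sketch of such a configuration (qnames and boost values are illustrative):
+
+ ```xml
+ <collection xmlns="http://exist-db.org/collection-config/1.0">
+     <index>
+         <lucene>
+             <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
+             <text qname="SPEECH"/>
+             <text qname="TITLE" boost="2.0"/>
+         </lucene>
+     </index>
+ </collection>
+ ```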
+
+ You can define a Lucene index on a single element or attribute (qname="...") or a node path with wildcards
+ (match="...", see below).
+
+ It is important to choose the right context for an index, which has to be the same as in your query. To
+ better understand this, let's have a look at how the index creation is handled by eXist-db and Lucene. For example:
+ <text qname="SPEECH"/>
+ This creates an index on SPEECH only. What is passed to Lucene is the string value of SPEECH, which also includes the
+ text of all its descendant text nodes (except those filtered out by an optional ignore).
+ Consider the fragment:
+
+ If you have an index on SPEECH, Lucene will create a "document" containing the text "Second Witch Fillet of a fenny snake, In the cauldron boil and
+ bake;" and index it. eXist-db internally links this Lucene document to the SPEECH node, but Lucene itself has no knowledge
+ of that (it doesn't know anything about XML nodes).
+
+ Given this, take the following query:
+ //SPEECH[ft:query(., 'cauldron')]
+ This searches the index and finds the text, which eXist-db can trace back to the SPEECH node in the XML document.
+ However, it is required that you use the same context (SPEECH) for creating and querying the index. For
+ instance:
+ //SPEECH[ft:query(LINE, 'cauldron')]
+ This will not return anything, even though LINE is a child of SPEECH and cauldron
+ was indexed. This particular cauldron is linked to its ancestor SPEECH, not its parent LINE.
+
+ However, you are free to give the user both options, i.e. use SPEECH and LINE as context at the same time. For this
+ define a second index on LINE:
+
+
+ Let's use a different example to illustrate this. Assume you have a document with encoded place names:
+
+ <p>He loves <placeName>Paris</placeName>.</p>
+ For a general query you probably want to search through all paragraphs. However, you may also want to provide an advanced search option,
+ which allows the user to restrict his/her queries to place names. To make this possible, simply define an index on placeName
+ as well:
+
+ Based on this setup, you'll be able to query for the word 'Paris' anywhere in a paragraph:
+ //p[ft:query(., 'paris')]
+ And also on 'Paris' occurring within a placeName:
+ //p[ft:query(placeName, 'paris')]
+
+
+
+
+ Using match="..."
+
+ In addition to defining an index on a given qualified name, you can also specify a "path" with wildcards. This feature might be
+ subject to change, so please be careful when using it.
+ Assume you want to define an index on all the possible elements below SPEECH. You can do this by creating one index for every
+ element:
+
+ As a shortcut, you can use a match attribute with a wildcard:
+ <text match="//SPEECH/*"/>
+ This will create a separate index on each child element of SPEECH it encounters. Please note that the argument to match is a simple path
+ pattern, not a full XPath expression. It only allows / and // to denote child or descendant steps, plus the wildcard
+ * to match an arbitrary element.
+ As explained above, you have to figure out which parts of your document will likely be interesting as context for a full text query. The
+ full text index works best if the context isn't too narrow. For example, if you have a document structure with section divs,
+ headings and paragraphs, you would probably want to create an index on the divs and maybe on the headings, so the user can
+ differentiate between the two.
+ In some cases, you could decide to put the index on the paragraph level. Then you don't need the index on the section, since you can
+ always get from the paragraph back to the section.
+ If you query a larger context, you can use the KWIC module to
+ show the user text surrounding each match. Or you can ask eXist-db to highlight each match with an exist:match tag, which you can later use to locate the
+ matches within the text.
+
+
+
+
+
+ Whitespace Treatment and Ignored Content
+
+
+ Inlined elements
+
+ By default, eXist-db's indexer assumes that element boundaries break a word or token. For example, if you have an element:
+
+ <size><width>12</width><height>8</height></size>
+ You want 12 and 8 to be indexed as separate tokens, even though there's no whitespace between the elements.
+ eXist-db will pass the content of the two elements to Lucene as separate strings and Lucene will see two tokens (instead of just
+ 128).
+ However, you usually don't want this behaviour for mixed content nodes. For example:
+
+ <p>This is <b>un</b>clear.</p>
+ In this case, you want unclear to be indexed as a single word. This can be done by telling eXist-db which nodes are inline
+ nodes. The example configuration above uses:
+ <inline qname="b"/>
+ The inline option can be specified either globally or per-index:
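+ Both placements can be sketched as follows (qnames from the example above):
+
+ ```xml
+ <!-- global: applies to all indexes in this lucene block -->
+ <inline qname="b"/>
+ <!-- per-index: applies only to the index on p -->
+ <text qname="p">
+     <inline qname="b"/>
+ </text>
+ ```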
+
+
+
+
+ Ignored elements
+
+ It is sometimes necessary to skip the content of an inline element. Notes are a good example:
+
+
+ Use an ignore element in the collection configuration to have eXist-db ignore the note:
+ <ignore qname="note"/>
+ Basically, ignore simply allows you to hide a chunk of text before Lucene sees it.
+ Like the inline tag, ignore may appear both globally or within a single index definition.
+ The ignore only applies to descendants of an indexed element. You can still create another index on the ignored element
+ itself. For example, you can have index definitions for both p and note:
+
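+ A sketch of such a pair of definitions:
+
+ ```xml
+ <text qname="p">
+     <ignore qname="note"/>
+ </text>
+ <text qname="note"/>
+ ```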
+
+ If note appears within p, it will not be added to the index on p, only to the index on note.
+ For example:
+ //p[ft:query(., "note")]
+ This may not return a hit if "note" occurs within a note, while this finds a match:
+ //p[ft:query(note, "note")]
+
+
+
+
+
+
+ Boost
+
+ A boost value can be assigned to an index to give it a higher score. The score for each match will be multiplied by
+ the boost factor (default is: 1.0). For example, you may want to rank matches in titles higher than other matches.
+ Here's how to configure the documentation search indexes in eXist-db:
+
+
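+ A sketch of such a configuration (qnames and boost values are illustrative):
+
+ ```xml
+ <text qname="section">
+     <ignore qname="title"/>
+ </text>
+ <text qname="title" boost="2.0"/>
+ <text qname="p">
+     <ignore qname="title"/>
+ </text>
+ ```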
+ The title index gets a boost of 2.0 to make sure that its matches get a higher score. Since the title element occurs
+ within section, we add an ignore rule to the index definition on the section and create a separate index on title. We also ignore
+ titles occurring inside paragraphs. Without this, title would be matched two times.
+ Because the title is now indexed separately, we need to query it explicitly. For example, to search the section and the title at the same
+ time, one could issue the following query:
+
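+ A sketch of such a query (the search term is illustrative):
+
+ ```xquery
+ //section[ft:query(., 'indexing') or ft:query(title, 'indexing')]
+ ```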
+
+
+ Attribute boost
+
+ Starting with eXist-db 3.0 a boost value can also be assigned to an index by attribute. This can be used to weight your search results,
+ even if you have flat data structures with the same attribute value pairs in attributes throughout your documents. Two flavours of dynamic
+ weighting are available through the new pairs match-sibling-attribute, has-sibling-attribute and
+ match-attribute, has-attribute child elements in the full-text index configuration.
+ If you have data in Lexical metadata framework (LMF) format you will recognize these repeated structures of feat elements
+ with att and val attributes within LexicalEntry elements. For instance feat att='writtenForm'
+ val='LMF feature value'. The attribute boosting allows you to weight the results based on the value of the att
+ attribute so that hits in definitions come before hits in comments and examples. This behaviour is enabled by adding a child
+ match-sibling-attr to a Lucene configuration text element. An example index configuration for it looks like
+ this:
+
+ This means that the ft:score#1 function will boost hits in val attributes by a factor of 25 when the
+ att attribute has the value writtenForm.
+ In the same way match-attr would be used for element qnames in the text element.
+ If you do not care about any value of the sibling attribute, use the has-attribute index configuration variant. An example
+ index configuration with has-attr looks like this:
+
+ This means that if your feat elements have an attribute xml:lang, it will score them zero and rank them last, which
+ might be useful to demote hits in features in languages other than the main entry language.
+ In the same way has-sibling-attr would be used for attributes in the text element.
+
+
+
+
+
+
+ Analyzers
+
+ One of the strengths of Lucene is that it allows the developer to determine nearly every aspect of text analysis. This is done through
+ analyzer classes, which combine a tokenizer with a chain of filters to post-process the tokenized text. eXist-db's Lucene module already
+ allows different analyzers to be used for different indexes.
+
+
+
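+ A sketch of such a pair of analyzer definitions (class names as of Lucene 4; verify against your Lucene version):
+
+ ```xml
+ <analyzer class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
+ <analyzer id="ws" class="org.apache.lucene.analysis.core.WhitespaceAnalyzer"/>
+ ```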
+ In the example above, we define that Lucene's StandardAnalyzer should be used by default (the analyzer element without id
+ attribute). We provide an additional analyzer and assign it the id ws, by which the analyzer can be referenced in the
+ actual index definitions.
+ The whitespace analyzer is the most basic one. As the name implies, it tokenizes the text at white space characters,
+ but treats all other characters, including punctuation, as part of the token. The tokens are not converted to lower case, and no
+ stopword filter is applied.
+
+
+ Configuring the Analyzer
+
+ You can pass configuration parameters to the instantiation of the Analyzer. These parameters must match a constructor
+ signature on the underlying Java class of the Analyzer. Please review the Javadoc for the Analyzer that you wish to configure.
+ We currently support passing the following types:
+
+
+ String (default if no type is specified)
+
+
+ java.io.FileReader (since Lucene 4) or file
+
+
+ java.lang.Boolean or boolean
+
+
+ java.lang.Integer or int
+
+
+ org.apache.lucene.analysis.util.CharArraySet or set
+
+
+ java.lang.reflect.Field
+
+
+ The value Version#LUCENE_CURRENT is always added as the first parameter of the analyzer constructor (a fallback mechanism is present for older
+ analyzers). The previously valid values java.io.File and java.util.Set cannot be used since Lucene 4.
+ For instance, to add a stopword list, use one of the following constructions:
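+ Two sketches, assuming the StandardAnalyzer's stopwords constructor parameter (the file path and word list
+ are illustrative):
+
+ ```xml
+ <!-- stop words read from a file -->
+ <analyzer id="stops" class="org.apache.lucene.analysis.standard.StandardAnalyzer">
+     <param name="stopwords" type="java.io.FileReader" value="/path/to/stopwords.txt"/>
+ </analyzer>
+
+ <!-- stop words given inline as a set -->
+ <analyzer id="stops" class="org.apache.lucene.analysis.standard.StandardAnalyzer">
+     <param name="stopwords" type="set">
+         <value>the</value>
+         <value>this</value>
+         <value>and</value>
+     </param>
+ </analyzer>
+ ```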
+
+
+
+
+
+ Using the Snowball analyzer requires you to add additional libraries to lib/user.
+
+
+
+
+
+
+
+
+
+ Defining Fields
+
+ Sometimes you want to define different Lucene indexes on the same set of elements, for instance to use a different
+ analyzer. eXist-db allows you to name a certain index using the field attribute:
+ <text field="title" qname="title" analyzer="en"/>
+ Such an index is called a named index. See on how to query these indexes.
+
+
+
+
+
+
+ Querying the Index
+
+ Querying full text from XQuery is straightforward. For example:
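+ A minimal sketch, assuming a Lucene index on the SPEECH elements of the Shakespeare sample data:
+
+ ```xquery
+ for $hit in //SPEECH[ft:query(., "love")]
+ order by ft:score($hit) descending
+ return $hit
+ ```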
+
+
+ The query function takes a query string in Lucene's default query syntax. It returns a set of nodes which are relevant with respect to the query. Lucene assigns a relevance
+ score or rank (a decimal number) to each match. This score is preserved by eXist-db and can be accessed through the score function.
+ The higher the score, the more relevant the text. You can use Lucene's features to "boost" a certain term in the query: give it a higher or
+ lower influence on the final rank.
+ Please note that the score is computed relative to the root context of the index. If you created an index on SPEECH, all scores
+ will be computed based on text in SPEECH nodes, even though your actual query may only return LINE children of
+ SPEECH.
+ The Lucene module is fully supported by eXist-db's query-rewriting optimizer. This means that the query engine can rewrite the XQuery
+ expression to make best use of the available indexes. All the rules and hints given in the tuning guide fully apply to the Lucene index.
+ To present search results in a Keywords in Context format, you may want to have a look at eXist-db's KWIC module.
+
+
+
+
+ Query a Named Index
+
+ To query a named index (see ), use the ft:query-field($fieldName, $query) instead of
+ ft:query:
+ ft:query-field("title", "xml")
+
+ ft:query-field works exactly like ft:query, except that the set of nodes to search is determined by the
+ nodes in the named index. The function returns the nodes selected by the query, which would be title elements in the example
+ above.
+ You can use ft:query-field with an XPath filter expression, just as you would call ft:query:
+ //section[ft:query-field("title", "xml")]
+
+
+
+
+
+ Describing Queries in XML
+
+ Lucene's default query syntax does not provide access to all available features. However, eXist-db's ft:query function
+ also accepts a description of the query in XML, as an alternative to passing a query string. The XML description closely mirrors Lucene's
+ query API. It is transformed into an internal tree of query objects, which is passed directly to Lucene for execution. This has several
+ advantages; for example, you can specify whether the order of terms should be relevant for a phrase query:
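+ For instance, a sketch using a near element with text content (which behaves like a phrase) while making the
+ term order irrelevant (terms illustrative):
+
+ ```xquery
+ //SPEECH[ft:query(., <query><near slop="1" ordered="no">miserable nation</near></query>)]
+ ```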
+
+
+
+ The following elements may occur within a query description:
+
+
+
+ term
+
+
+ Defines a single term to be searched in the index. If the root query element contains a sequence of term elements, wrap them in
+ <bool></bool> and they will be combined as in a boolean "or" query. For example:
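+ A sketch of such a query:
+
+ ```xquery
+ //SPEECH[ft:query(., <query><bool><term>nation</term><term>miserable</term></bool></query>)]
+ ```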
+
+ This finds all SPEECH elements containing either nation or miserable or both.
+
+
+
+
+ wildcard
+
+
+ A string with a * wildcard in it. This will be matched against the terms of a document. Can be used instead of a
+ term element. For example:
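+ A sketch of a query description using a wildcard (term illustrative):
+
+ ```xml
+ <query><wildcard>nat*</wildcard></query>
+ ```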
+
+
+
+
+
+ regex
+
+
+ A regular expression which will be matched against the terms of a document. Can be used instead of a term element. For
+ example:
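+ A sketch of a query description using a regular expression (pattern illustrative):
+
+ ```xml
+ <query><regex>nat.*</regex></query>
+ ```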
+
+
+
+
+
+ bool
+
+
+ Constructs a boolean query from its children. Each child element may have an occurrence indicator, which could be either
+ must, should or not:
+
+
+ must
+
+ this part of the query must be matched
+
+
+
+ should
+
+ this part of the query should be matched, but doesn't need to
+
+
+
+ not
+
+ this part of the query must not be matched
+
+
+
+ For instance:
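+ A sketch (terms illustrative): matches must contain "boil", and matches also containing "bubble" are ranked
+ higher:
+
+ ```xml
+ <query>
+     <bool>
+         <term occur="must">boil</term>
+         <term occur="should">bubble</term>
+     </bool>
+ </query>
+ ```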
+
+
+
+
+
+ phrase
+
+
+ Searches for a group of terms occurring in the correct order. The element may either contain explicit term elements or
+ text content. Text will be automatically tokenized into a sequence of terms. For example:
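+ A sketch with text content (terms illustrative):
+
+ ```xml
+ <query><phrase>cursed spite</phrase></query>
+ ```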
+
+ This has the same effect as:
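+ That is, a sketch with explicit term elements (terms illustrative):
+
+ ```xml
+ <query><phrase><term>cursed</term><term>spite</term></phrase></query>
+ ```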
+
+ The attribute slop can be used for a proximity search: Lucene will try to find terms which are within the
+ specified distance:
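+ A sketch allowing up to 5 other words between the phrase terms (terms illustrative):
+
+ ```xml
+ <query><phrase slop="5">cursed spite</phrase></query>
+ ```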
+
+
+
+
+
+ near
+
+
- The value Version#LUCENE_CURRENT is always added
- as first parameter for the analyzer constructor, but a fall back mechanism is present for older analyzers.
- The previously valid values "java.io.File" and "java.util.Set" can not be used since Lucene 4.
-
- Providing a Stop Words file for the Standard Analyzer
-
-
-
- Providing a list of Stop Words for the Standard Analyzer
-
-
-
- Using the Snowball Analyzer
- Note that using the Snowball analyzer requires you to add additional libraries to lib/user.
-
-
- We will certainly add more features in the future, e.g. a possibility to
- construct a new analyzer from a set of filters. For the time being, you can
- always provide your own analyzer or use one of those supplied by Lucene or
- compatible software.
-
-
-
-
-
-
- Defining Fields
-
- Sometimes you may want to define different Lucene indexes on the same set of elements, e.g.
- to use a different analyzer. eXist-db allows to name a certain index using the field
- attribute:
- <text field="title" qname="title" analyzer="en"/>
- Such an index is called named index. See on how to query the
- named indexes.
-
-
-
-
-
-
- Querying the Index
-
- Querying lucene from XQuery is straightforward. For example:
-
- A Simple Query
-
-
- The query function takes a query string in Lucene's default query
- syntax. It returns a set of nodes which are relevant with respect to the
- query. Lucene assigns a relevance score or rank to each match. This score is
- preserved by eXist-db and can be accessed through the score function, which returns a
- decimal value. The higher the score, the more relevant is the text. You can use
- Lucene's features to "boost" a certain term in the query, i.e. give it a higher or
- lower influence on the final rank.
- Please note that the score is computed relative to the root context of the index.
- If you created an index on SPEECH, all scores will be computed on basis of the text
- in the SPEECH nodes, even though your actual query may only return LINE children of
- SPEECH.
- The Lucene module is fully supported by eXist-db's query-rewriting optimizer, which
- means that the query engine can rewrite the XQuery expression to make best use of
- the available indexes. All the rules and hints given in the tuning guide fully apply to the Lucene index.
- To present search results in a Keywords in Context format,
- you may want to have a look at eXist-db's KWIC
- module.
-
-
-
-
- Query a Named Index
-
- To query a named index (see ), use the ft:query-field($fieldName, $query) instead of
- ft:query:
- ft:query-field("title", "xml")
-
- ft:query-field works exactly like ft:query, except that the set of nodes
- to search is determined by the nodes in the named index. The function returns the nodes selected by
- the query, which would be title elements in the example above.
- You can thus use ft:query-field with an XPath filter expression just as you would call
- ft:query:
- //section[ft:query-field("title", "xml")]
-
-
-
-
-
- Describing Queries in XML
-
- Lucene's default query syntax does not provide access to all available
- features. However, eXist-db's ft:query function also accepts a
- description of the query in XML as an alternative to passing a query string. The
- XML description closely mirrors Lucene's query API. It is transformed into an
- internal tree of query objects, which is directly passed to Lucene for
- execution. This has some advantages. For example, you can specify if the order
- of terms should be relevant for a phrase query:
-
- Using an XML Definition of the Query
-
-
- The following elements may occur within a query description:
-
-
-
- term
-
-
- Defines a single term to be searched in the index. If the root
- query element contains a sequence of term elements, wrap them in <bool></bool> and they will be
- combined as in a boolean "or" query. For example:
-
- finds all SPEECH elements containing either "nation" or
- "miserable" or both.
-
-
-
-
- wildcard
-
-
- A string with a '*' wildcard in it, which will be matched against
- the terms of a document. Can be used instead of a
- term element. For example:
-
-
-
-
-
- regex
-
-
- A regular expression which will be matched against the terms of a
- document. Can be used instead of a term element.
- For example:
-
-
-
-
-
- bool
-
-
- Constructs a boolean query from its children. Each child element
- may have an occurrence indicator, which could be either
- must, should or
- not:
-
-
- must
-
- this part of the query must be
- matched
-
-
-
- should
-
- this part of the query should be
- matched, but doesn't need to
-
-
-
- not
-
- this part of the query must not
- be matched
-
-
-
-
-
-
-
-
- phrase
-
-
- Searches for a group of terms occurring in the correct order. The
- element may either contain explicit term elements
- or text content. Text will be automatically tokenized into a
- sequence of terms. For example:
-
- has the same effect as:
-
- The attribute slop can be used for a proximity
- search: Lucene will try to find terms which are within the specified
- distance:
-
-
-
-
-
- near
-
-
-
- near is a powerful alternative to
- phrase and one of the features not available
- through the standard Lucene query parser.
- If the element has text content only, it will be tokenized into
- terms and the expression behaves like phrase.
- Otherwise it may contain any combination of term,
- first and nested near
- elements. This makes it possible to search for two sequences of
- terms which are within a specific distance. For example:
-
- Element first matches a span against the start
- of the text in the context node. It takes an optional attribute
- end to specify the maximum distance from
- the start of the text. For example:
-
- As shown above, the content of first can again
- be text, a term or
- near.
- Contrary to phrase, near can
- be told to ignore the order of its components. Use parameter
- ordered="yes|no" to change near's
- behaviour. For example:
-
-
-
-
- All elements in a query may have an optional boost
- parameter (a float value). The score of the nodes matching the corresponding
- query part will be multiplied by the boost.
-
-
-
-
-
- Additional parameters
-
- The ft:query function allows a third parameter, which can be used to pass
- some additional settings to the query engine. The parameter should contain an XML
- fragment which lists the configuration properties to be set as child elements:
-
-
-
-
- The meaning of those properties is as follows
-
-
- filter-rewrite
-
- Controls how terms are expanded for wildcard or regular expression
- searches. If set to "yes", Lucene will use a filter to pre-process
- matching terms. If set to "no", all matching terms will be added to a single
- boolean query which is then executed. This may generate a "too many clauses"
- exception when applied to large data sets. Setting filter-rewrite to "yes"
- avoids those issues.
-
-
-
- default-operator
-
- The default operator with which multiple terms will be combined.
- Allowed values: "or", "and".
-
-
-
- phrase-slop
-
- Sets the default slop for phrases. If zero, then exact phrase matches are
- required. Default value is zero.
-
-
-
- leading-wildcard
-
- When set to "yes", * or ? are allowed as the first character of a PrefixQuery and
- WildcardQuery. Note that this can produce very slow queries on big indexes.
-
-
-
-
-
-
-
-
-
- Adding Constructed Fields to a Document
-
- This feature allows to add arbitrary fields to a binary or XML document and have them indexed
- with lucene. It was developed as part of the content extraction
- framework to attach metadata extracted from e.g. a PDF to the binary document. It works
- equally well for XML documents though and is an efficient method, e.g. to attach computed fields to
- a document, containing information which does not exist in the XML as such.
- The field indexes are not configured via collection.xconf. Instead we add
- fields programmatically from an XQuery (which could be run via a trigger):
-
- The store attribute indicates that the fields content should be stored as a string.
- Without this attribute, the content will be indexed for search, but you won't be able to retrieve the contents.
-
- To get the contents of a field, use the ft:get-field function:
- ft:get-field("/db/demo/test.xml", "title")
- To query this index, use the ft:search function:
- ft:search("/db/demo/test.xml", "title:indexing and author:me")
- Custom field indexes are automatically deleted when their parent document is removed. If you want to update
- fields without removing the document, you need to delete the old fields first though. This can be done
- using the ft:remove-index function:
- ft:remove-index("/db/demo/test.xml")
-
+ near is a powerful alternative to phrase and one of the features not available through the standard Lucene query
+ parser.
+ If the element has text content only, it will be tokenized into terms and the expression behaves like phrase. Otherwise
+ it may contain any combination of term, first and nested near elements. This makes it possible to
+ search for two sequences of terms which are within a specific distance. For example:
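+ A sketch (terms illustrative) finding two word sequences within 20 terms of each other:
+
+ ```xml
+ <query>
+     <near slop="20">
+         <near>cursed spite</near>
+         <near>set it right</near>
+     </near>
+ </query>
+ ```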
+
+ Element first matches a span against the start of the text in the context node. It takes an optional attribute
+ end to specify the maximum distance from the start of the text. For example:
+
+ As shown above, the content of first can again be text, a term or near.
+ Contrary to phrase, near can be told to ignore the order of its components. Use parameter
+ ordered="yes|no" to change near's behaviour. For example:
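+ A sketch where the terms may appear in any order within the given distance (terms illustrative):
+
+ ```xml
+ <query><near ordered="no" slop="10">spite cursed</near></query>
+ ```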
+
+
+
+
+ All elements in a query may have an optional boost parameter (float). The score of the nodes matching the corresponding
+ query part will be multiplied by this factor.
+
+
+
+
+
+ Additional parameters
+
+ The ft:query function allows a third parameter for passing additional settings to the query engine. This parameter must be an
+ XML fragment which lists the configuration properties to be set as child elements:
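+ A sketch of such an options fragment (property values illustrative):
+
+ ```xml
+ <options>
+     <default-operator>and</default-operator>
+     <phrase-slop>1</phrase-slop>
+     <leading-wildcard>no</leading-wildcard>
+     <filter-rewrite>yes</filter-rewrite>
+ </options>
+ ```
+
+ It is passed as the third argument, e.g. ft:query($nodes, "some query", $options).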
+
+
+
+ The meaning of these properties is as follows:
+
+
+ filter-rewrite
+
+ Controls how terms are expanded for wildcard or regular expression searches. If set to yes, Lucene will use a filter to
+ pre-process matching terms. If set to no, all matching terms will be added to a single boolean query which is then
+ executed. This may generate a "too many clauses" exception when applied to large data sets. Setting filter-rewrite to yes
+ avoids those issues.
+
+
+
+ default-operator
+
+ The default operator with which multiple terms will be combined. Allowed values: or, and.
+
+
+
+ phrase-slop
+
+ Sets the default slop for phrases. If 0, then exact phrase matches are required. Default value is
+ 0.
+
+
+
+ leading-wildcard
+
+ When set to yes, * or ? are allowed as the first character of a PrefixQuery and
+ WildcardQuery. Note that this can produce very slow queries on big indexes.
+
+
+
+
+
+
+
+
+
+ Adding Constructed Fields to a Document
+
+ This feature allows you to add arbitrary fields to a binary or XML document and have them indexed with Lucene. It was developed as part of the
+ content extraction framework, to attach metadata
+ extracted from, for instance, a PDF to the binary document. It works equally well for XML documents though, and is an efficient method to attach
+ computed fields to a document, containing information which does not exist in the XML as such.
+ The field indexes are not configured via collection.xconf. Instead we add fields programmatically from an XQuery (which
+ could be run via a trigger):
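+ A sketch of such an XQuery (assuming the ft:index function; document path and field names are illustrative
+ and match the ft:search example below):
+
+ ```xquery
+ ft:index("/db/demo/test.xml",
+     <doc>
+         <field name="title" store="yes">Indexing</field>
+         <field name="author" store="yes">Me</field>
+     </doc>
+ )
+ ```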
+
+
+ The store attribute indicates that the field's content should be stored as a string. Without this attribute, the content
+ will be indexed for search, but you won't be able to retrieve the contents.
+ To get the contents of a field, use the ft:get-field function:
+ ft:get-field("/db/demo/test.xml", "title")
+
+ To query this index, use the ft:search function:
+ ft:search("/db/demo/test.xml", "title:indexing and author:me")
+
+ Custom field indexes are automatically deleted when their parent document is removed. If you want to update fields without removing the
+ document, you need to delete the old fields first though. This can be done using the ft:remove-index function:
+ ft:remove-index("/db/demo/test.xml")
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/newrangeindex/newrangeindex.xml b/src/main/xar-resources/data/newrangeindex/newrangeindex.xml
index e3dd1c72..40407838 100644
--- a/src/main/xar-resources/data/newrangeindex/newrangeindex.xml
+++ b/src/main/xar-resources/data/newrangeindex/newrangeindex.xml
@@ -1,307 +1,243 @@
-
- New Range Index (since eXist 2.2)
- November 2009
-
- TBD
-
-
-
-
-
-
- Overview
-
- eXist version 2.2 and above includes a rewritten, modularized range index. Under
- the hood it is based on Apache Lucene for super fast lookups. It also provides new
- optimizations to speed up some types of queries which failed to run efficiently with
- the old index.
- Range indexes are extremely important in eXist-db. Without a proper index,
- evaluating a general comparison in a filter (like //foo[baz = "xyz"])
- requires eXist to do a full scan over the context node set, checking the value of
- every node against the argument. This is not only slow, it also limits concurrency
- due to necessary locking and consumes memory for loading each of the nodes. With a
- well-defined index, queries will usually complete in a few milliseconds instead of
- taking seconds. The index allows the optimizer to rewrite the expression and process
- the index lookup in advance, assuming that the number of baz elements with content
- "xyz" is much smaller than the total number of elements.
- The old range indexing code had three main issues though:
-
-
-
- Index entries were organized by collection, resulting in an unfortunate dependency between collection
- size and update speed. In simple words: updating or removing documents became slower as the collection grew.
- For a long time, the general recommendation was to split large document sets into multiple, smaller sub-collections
- if update speed was an issue.
-
-
-
-
- Queries on very frequent search strings were quite inefficient: for example, a query
-
- //term[@type ="main"][. = "xyz"]
-
- could be quite slow despite an index being defined if @type="main" occurred very
- often. Unfortunately this is a common use of attributes and to make it quick, you had to reformulate
- the query, e.g. by moving the non-selective step to the back:
-
- //term[. = "xyz"][@type = "main"]
-
-
-
- Range indexes were baked into the core of eXist-db, making maintenance and bug fixing difficult.
-
-
-
- The rewritten range index addresses both issues. First, indexes are now organized
- by document/node, so collection size does no longer matter when updating an index
- entry. Concerning storage, the index is entirely based on Apache Lucene instead of
- the B+-tree which was previously used. Most range indexes tend to be strings, so why
- not leave the indexing to a technology like Lucene, which is known to scale well and
- does a highly efficient job on string processing? Since version 4, Lucene has added
- support for storing numeric data types and binary data into the index, so it seemed
- to be a perfect match for our requirements. Lucene is integrated into eXist on a
- rather low level with direct access to the indexes.
- To address the second issue, it is now possible to combine several fields to index
- into one index definition, so above XPath:
- //term[@type = "main"] [. = "xyz"]
- can be evaluated with a single index lookup. We'll see in a minute how to define
- such an index.
- Finally, the new range index is implemented as a pluggable module: a separate
- component which is not required for the core of eXist-db to work properly. For
- eXist, the index is a black box: it does not need to know what the index does. If
- the index is there, it will automatically plug itself into the indexing pipeline as
- well as the query engine. If it is not, eXist will fall back to default (brute
- force) query processing.
-
-
-
-
-
- Index Configuration
-
- We tried to keep the basic index configuration as much backwards compatible as
- possible. The old range index is still supported to allow existing applications to
- run unchanged.
-
- Example: Index Configuration with the Legacy Range Index
-
-
- To use the new range index, wrap the range index definitions into a range element:
-
- Example: Index Configuration with the New Range Index
-
-
- If you store this definition and do a reindex, you should find new index files in
- the webapp/WEB-INF/data/range directory (or wherever you
- configured your data directory to be).
- Just as the old range index, the new indexes will be used automatically for
- general or value comparisons as well as string functions like
- fn:contains, fn:starts-with,
- fn:ends-with.
-
-
- fn:matches is currently not supported due to limitations in
- Lucene's regular expression handling. If you require fn:matches a lot, consider
- using the old range index.
-
- Above configuration applies to documents using MODS, a standard for
- bibliographical metadata. To provide some examples, the following XPath expressions
- should use the created indexes:
-
- Example: XPath expressions which should be optimized by the index
-
-
-
-
-
-
-
- New Configuration Features
-
-
-
-
-
-
- Case sensitive index
-
- Add case="no" to create a case insensitive index on a string.
- This is a feature many users have asked for. With a case insensitive index on
- mods:namePart a match will also be found if you query for
- "dennis ritchie" instead of "Dennis Ritchie".
-
-
-
-
-
- Collations
-
-
- A collation changes how strings are compared. For example, you can change the strength property of the
- collation to ignore diacritics, accents or case. So to compare strings ignoring accents or case, you can
- define an index as follows:
-
-
- Example: Configuring a collation
-
-
-
- Please refer to the ICU documentation (which is used by eXist)
- for more information on collations, strength etc.
-
-
-
-
-
-
- Combined indexes
-
- If you know you will often use a certain
- combination of filters, you can combine the corresponding indexes into one to
- further reduce query times. For example, the mods:name
- element has an attribute type which qualifies the name as being "personal",
- "corporate" or another predefined value. To speed up a query
- like
-
- you
- could create a combined index on mods:name as follows:
-
- Example: Configuring a combined index
-
-
- This index will be used whenever the context of the filter
- expression is a mods:name and it filters on either or both: @type and
- mods:namePart. Advantage: only one index lookup is
- required to evaluate such an expression, resulting in a huge performance boost,
- in particular if the combination of filters does only match a few names out of a
- large set!
- Note that all 3 attributes of the field element are
- required. The name you give to the field can be arbitrary, but it should be
- unique within the index configuration document. The match attribute specifies
- the nodes to include in the field. It should be a simple path relative to the
- context element.
- You can skip the match attribute if you want to
- index the content of the context node itself. In this case, an additional
- attribute: nested="yes|no" can be added to tell the indexer to
- skip the content of nested nodes to only index direct text children of the
- context node.
- The index is also used if you only query one of the
- defined fields, e.g.:
-
- //mods:mods[mods:name[mods:namePart = "Dennis Ritchie"]].
-
- It is important that the filter expression matches the index definition though,
- so the following will not be sped up by the index:
-
- //mods:mods[mods:name/mods:namePart = "Dennis Ritchie"]
-
- because the context of the filter expression here is mods:mods, not mods:name.
-
- You can create as many combined indexes as you like, even if some
- of them refer to elements which are nested inside other elements having a
- different index. For example, to index a complete MODS record, we could create
- one nested index on the root element: mods:mods, and include
- all attributes or simple descendant elements we may want to query at the same
- time. mods:name - even though a child of
- mods:mods - is a complex element, so we want it to have a
- separate index as shown above. We thus define both indexes:
-
- Example: Complex index definition
-
-
- This allows a more complex query to be optimized:
-
- Example: XPath optimized by the index
-
-
- In this case, the mods:dateIssued lookup will be done first, which
- presumably returns more hits than the name lookup. For maximum performance it may
- thus still be faster to split the expression into two parts and do the name check
- first.
-
-
-
-
-
- Conditional combined indexes
-
- For combined indexes, you can specify conditions to restrict the values being indexed to those contained in elements that have an attribute meeting certain criteria:
-
-
- Conditional indexes
-
-
- This will only index the value of the tei:term element if it has an attribute named type with the value "main". Multiple conditions can be specified in an index definition, in which case all conditions need to match in order for the value to be indexed.
- In order to take advantage of query optimization for conditionally indexed fields, queries should be formulated like this:
- //tei:term[@type = "main"][. = "xyz"]
- which then gets rewritten to a call to
- range:field(("mainTerm"), "eq", "xyz")
- By default, condition matching is string-based and case sensitive. The following optional attributes can be specified on a condition:
-
-
- operator="eq|ne|lt|gt|le|ge|starts-with|ends-with|contains|matches"
-
- Specifies the operator for the comparison. matches supports Java regular expressions.
- Default is "eq".
-
-
-
- case="yes|no"
-
- Turns case sensitivity on or off for string comparisons.
- Default is "yes".
-
-
-
- numeric="yes|no"
-
- Turns numeric comparison on or off for equality and ordinal comparisons (eq, ne, lt, gt, le, ge). When enabled, 01.0 will equal 1 and 2 will be less than 110 for example. The rewriter will respect the type of the value (string, numeric) when matching a condition to a predicate.
- Default is "off".
-
-
-
-
-
-
-
-
-
- Using Index Functions
-
- Internally the query optimizer will rewrite range lookup expressions into
- optimized function calls into the range module (namespace
- http://exist-db.org/xquery/range). This happens transparently and
- you'll never see the function calls. However, for debugging and testing it is
- sometimes useful to be able to use the corresponding functions directly. There are
- two sets of functions: one for simple range index lookups, and one for indexes on
- fields.
- Given the following index configuration:
-
- Some Indexes on Shakespeare
-
-
- A query:
- //SPEECH[SPEAKER="HAMLET"]
- translates into:
- //SPEECH[range:eq(SPEAKER, "HAMLET")]
- If the index is defined on an element with fields, the entire sub-expression, i.e. the context path and all its filters,
- is rewritten into a single function call. For example, take:
- collection("/db/apps/demo/data")//SPEECH[.//STAGEDIR = "Aside"]
- is replaced with
- collection("/db/apps/demo/data")/range:field-eq("stagedir", "Aside")
- Because the index root is defined on SPEECH, the function will always return
- SPEECH elements.
- If multiple filters are used and each of them has a corresponding field definition, they are combined into one call:
- collection("/db/apps/demo/data")/range:field-eq(("stagedir", "line"), "Aside", "what do you read, my lord?")
- Note that while the field names are specified in a sequence, we add one parameter for every value to look up. This way it is possible
- to specify more than one value for each parameter by passing in a sequence.
- Because different operators might be used inside the filters, the query engine will actually rewrite the expression to the
- following:
- collection("/db/apps/demo/data")/range:field(("stagedir", "line"), ("eq", "eq"), "Aside", "what do you read, my lord?")
- This is not easy to read, but efficient, and users will normally not see this function call anyway. However, it sometimes helps to know what the
- optimizer is supposed to do and try it out explicitely.
-
+ schematypens="http://purl.oclc.org/dsdl/schematron"?>
+
+ Range Index
+ 1Q18
+
+ indexing
+
+
+
+
+
+ eXist-db (version 2.2 and above) includes a fast, modularized range index based on Apache Lucene. This article describes eXist-db's range
+ index.
+ There is also an older version of the range index, which is kept for compatibility reasons. Usage of
+ this range index is discouraged.
+
+
+
+
+ Overview
+
+ Range indexes are extremely important in eXist-db. Without a proper index, evaluating a general comparison in an XPath filter expression
+ (like //foo[baz = "xyz"]) requires eXist to do a full scan over the context node set, checking the value of every node against the
+ argument. This is not only slow, it also limits concurrency due to necessary locking and consumes memory for loading each of the nodes. With a
+ well-defined index, queries will usually complete in a few milliseconds instead of seconds. The index allows the optimizer to rewrite the
+ expression and process the index lookup in advance, assuming that the number of baz elements with content xyz is much
+ smaller than the total number of elements.
+
+
+
+
+
+
+ Index Configuration
+
+
+ To use the new range index, wrap the range index definitions into a range element:
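+ A sketch for MODS documents (element choices illustrative):
+
+ ```xml
+ <collection xmlns="http://exist-db.org/collection-config/1.0">
+     <index xmlns:mods="http://www.loc.gov/mods/v3">
+         <range>
+             <create qname="mods:namePart" type="xs:string" case="no"/>
+             <create qname="mods:dateIssued" type="xs:string"/>
+         </range>
+     </index>
+ </collection>
+ ```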
+
+
+ Store this definition and do a re-index. Index files are created in the webapp/WEB-INF/data/range directory (or wherever
+ you configured your data directory to be).
+ The indexes will be used automatically for general or value comparisons, as well as string functions like fn:contains,
+ fn:starts-with, fn:ends-with.
+
+
+ fn:matches is currently not supported due to limitations in Lucene's regular expression handling. If you require fn:matches a
+ lot, consider using the old range index.
+
+ The above configuration applies to documents using MODS, a standard for bibliographical metadata. The following XPath expressions use the
+ created indexes:
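+ Sketches, assuming range indexes on mods:namePart and mods:dateIssued (values illustrative):
+
+ ```xquery
+ //mods:mods[mods:name/mods:namePart = "Dennis Ritchie"]
+ //mods:mods[mods:originInfo/mods:dateIssued = "2007"]
+ ```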
+
+
+
+
+
+
+
+ Configuration Features
+
+
+
+
+ Case sensitive index
+
+ Add case="no" to create a case insensitive index on a string.
+ With a case insensitive index on mods:namePart a match will also be found if you query for "dennis ritchie" instead of
+ "Dennis Ritchie".
+
+
+
+
+
+ Collations
+
+ A collation changes how strings are compared. For example, you can change the strength property of the collation to ignore diacritics,
+ accents or case. So to compare strings ignoring accents or case, you can define your index as follows:
+
+
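+ A sketch of such an index definition; the exact collation URI parameters (lang, strength) should be checked against the eXist-db/ICU documentation:
+
+ ```xml
+ <create qname="mods:namePart" type="xs:string" collation="?lang=en-US&amp;strength=primary"/>
+ ```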
+ Please refer to the ICU documentation (which is used by eXist) for more information on collations, strength, etc.
+
+
+
+
+
+ Combined indexes
+
+ If you know you will often use a certain combination of filters, you can combine the corresponding indexes into one to further reduce
+ query times. For example, the mods:name element has an attribute type which qualifies the name as being "personal", "corporate" or
+ another predefined value.
+ Assume you want to speed up a query like this:
+
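+ Such a query might look like this (a sketch; the element names follow the MODS examples in this article):
+
+ ```xquery
+ //mods:mods[mods:name[@type = "personal"][mods:namePart = "Dennis Ritchie"]]
+ ```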
+ To do this you could create a combined index on mods:name as follows:
+
+
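+ A sketch of a combined index on mods:name with two fields; the field names here are arbitrary but must be unique within the configuration document:
+
+ ```xml
+ <range>
+     <create qname="mods:name">
+         <field name="name-type" match="@type" type="xs:string"/>
+         <field name="name-part" match="mods:namePart" type="xs:string"/>
+     </create>
+ </range>
+ ```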
+ This index will be used whenever the context of the filter expression is a mods:name. It filters on either or both of
+ @type and mods:namePart. The advantage: only one index lookup is required to evaluate such an expression, resulting in a
+ huge performance boost, in particular if the combination of filters matches only a few names out of a large set.
+ Note that all three attributes of the field element are required. The name you give to the field can be arbitrary, but it should
+ be unique within the index configuration document. The match attribute specifies the nodes to include in the field. It should be
+ a simple path relative to the context element.
+ You can skip the match attribute if you want to index the content of the context node itself. In this case, an additional
+ attribute, nested="yes|no", can be added to tell the indexer to skip the content of nested nodes and only index direct text
+ children of the context node.
+ The index is also used if you only query one of the defined fields, for instance:
+ //mods:mods[mods:name[mods:namePart = "Dennis Ritchie"]]
+ It is important that the filter expression matches the index definition though, so the following will not be sped up
+ by the index:
+ //mods:mods[mods:name/mods:namePart = "Dennis Ritchie"]
+ This is because the context of the filter expression here is mods:mods, not mods:name.
+ You can create as many combined indexes as you like, even if some of them refer to elements which are nested inside other elements having
+ a different index. For example, to index a complete MODS record, we could create one nested index on the root element: mods:mods,
+ and include all attributes or simple descendant elements we may want to query at the same time. mods:name, even though a child of
+ mods:mods, is a complex element, so we want it to have a separate index as shown above. We therefore define both indexes:
+
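+ A sketch of the two index definitions; the field names and match paths are illustrative:
+
+ ```xml
+ <range>
+     <create qname="mods:mods">
+         <field name="mods-dateIssued" match="mods:originInfo/mods:dateIssued" type="xs:string"/>
+     </create>
+     <create qname="mods:name">
+         <field name="name-part" match="mods:namePart" type="xs:string"/>
+     </create>
+ </range>
+ ```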
+
+ This allows more complex queries to be optimized, for instance:
+
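+ For instance, a query touching both indexes might look like this (a sketch, assuming fields are defined on mods:dateIssued within mods:mods and on mods:namePart within mods:name):
+
+ ```xquery
+ //mods:mods[mods:originInfo/mods:dateIssued = "1972"][mods:name[mods:namePart = "Dennis Ritchie"]]
+ ```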
+
+ In this case, the mods:dateIssued lookup will be done first, which presumably returns more hits than the name lookup. So, for
+ maximum performance it may still be faster to split the expression into two parts and do the name check first.
+
+
+
+
+
+ Conditional combined indexes
+
+ For combined indexes, you can specify conditions to restrict the values being indexed to those contained in elements that have an
+ attribute meeting certain criteria:
+
+
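+ A sketch of such a definition, matching the tei:term example and the mainTerm field name used in this section:
+
+ ```xml
+ <create qname="tei:term">
+     <field name="mainTerm" type="xs:string">
+         <condition attribute="type" value="main"/>
+     </field>
+ </create>
+ ```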
+ This will only index the value of the tei:term element if it has an attribute named type with the value
+ main. Multiple conditions can be specified in an index definition, in which case all conditions need to match in order for the
+ value to be indexed.
+ In order to take advantage of query optimization for conditionally indexed fields, queries should be formulated like this:
+ //tei:term[@type = "main"][. = "xyz"]
+ This gets rewritten to a call to:
+ range:field(("mainTerm"), "eq", "xyz")
+ By default, condition matching is string-based and case-sensitive.
+ The following optional attributes can be specified on a condition:
+
+
+ operator="eq|ne|lt|gt|le|ge|starts-with|ends-with|contains|matches"
+
+ Specifies the operator for the comparison. matches supports Java regular expressions.
+ Default is eq.
+
+
+
+ case="yes|no"
+
+ Turns case sensitivity on or off for string comparisons.
+ Default is yes.
+
+
+
+ numeric="yes|no"
+
+ Turns numeric comparison on or off for equality and ordinal comparisons (eq, ne, lt, gt, le, ge). When enabled, 01.0
+ will equal 1 and 2 will be less than 110 for example. The rewriter will respect the type of the value (string, numeric) when matching a
+ condition to a predicate.
+ Default is no.
+
+
+
+
+
+
+
+
+
+ Using Index Functions
+
+ Internally the query optimizer will rewrite range lookup expressions into optimized function calls into the range module
+ (namespace http://exist-db.org/xquery/range). This happens transparently and you'll never see the function calls. However, for
+ debugging and testing it is sometimes useful to use the corresponding functions directly. There are two sets of functions: one for simple range
+ index lookups, and one for indexes on fields.
+ For example, assume the following index configuration:
+
+
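+ A sketch of the assumed configuration: a simple range index on SPEAKER, plus field definitions on SPEECH inferred from the field names used in the examples that follow:
+
+ ```xml
+ <range>
+     <create qname="SPEAKER" type="xs:string"/>
+     <create qname="SPEECH">
+         <field name="stagedir" match="STAGEDIR" type="xs:string"/>
+         <field name="line" match="LINE" type="xs:string"/>
+     </create>
+ </range>
+ ```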
+ And the following query:
+ //SPEECH[SPEAKER="HAMLET"]
+ This translates into:
+ //SPEECH[range:eq(SPEAKER, "HAMLET")]
+
+ If the index is defined on an element with fields, the entire sub-expression, the context path and all its filters, is rewritten into a
+ single function call. For example:
+ collection("/db/apps/demo/data")//SPEECH[.//STAGEDIR = "Aside"]
+ This is replaced with:
+ collection("/db/apps/demo/data")/range:field-eq("stagedir", "Aside")
+ Because the index root is defined on SPEECH, the function will always return SPEECH elements.
+
+ If multiple filters are used and each of them has a corresponding field definition, they are combined into one call:
+ collection("/db/apps/demo/data")/range:field-eq(("stagedir", "line"), "Aside", "what do you read, my lord?")
+ Note that while the field names are specified in a sequence, we add one parameter for every value to look up. This way it is possible to
+ specify more than one value for each parameter by passing in a sequence.
+ Because different operators might be used inside the filters, the query engine will actually rewrite the expression to the following:
+ collection("/db/apps/demo/data")/range:field(("stagedir", "line"), ("eq", "eq"), "Aside", "what do you read, my lord?")
+ This is not easy to read, but efficient. Users will normally not see this function call. However, it sometimes helps to know what the
+ optimizer is supposed to do and try it out explicitly.
+
+
+
+
+
+ Comparison with the old range index
+
+ eXist also has an older version of the range index on board (see old range index). Compared to the
+ newer version, it has three main issues:
+
+
+ Index entries were organized by collection, resulting in an unfortunate dependency between collection size and update speed. In simple
+ words: updating or removing documents became slower as the collection grew. For a long time, the general recommendation was to split large
+ document sets into multiple, smaller sub-collections if update speed was an issue.
+
+
+ Queries on very frequent search strings were quite inefficient: for example, a query
+ //term[@type ="main"][. = "xyz"]
+ could be quite slow despite an index being defined if @type="main" occurred very often. Unfortunately, this is a common use
+ of attributes, and to speed up such a query you had to reformulate it, e.g. by moving the non-selective step to the end:
+ //term[. = "xyz"][@type = "main"]
+
+
+ Range indexes were baked into the core of eXist-db, making maintenance and bug fixing difficult.
+
+
+ The rewritten range index addresses all three issues. First, indexes are now organized by document/node, so collection size no longer
+ matters when updating an index entry. Concerning storage, the index is entirely based on Apache Lucene instead of the B+-tree which was previously used.
+ Most range indexes tend to be on strings, so why not leave the indexing to a technology like Lucene, which is known to scale well and does a highly
+ efficient job on string processing? Since version 4, Lucene has added support for storing numeric data types and binary data in the index, so
+ it seemed to be a perfect match for our requirements. Lucene is integrated into eXist on a rather low level with direct access to the indexes.
+ To address the second issue, it is now possible to combine several fields into one index definition:
+ //term[@type = "main"][. = "xyz"]
+ This can now be evaluated with a single index lookup.
+ Finally, the new range index is implemented as a pluggable module: a separate component which is not required for the core of eXist-db to
+ work properly. For eXist, the index is a black box: it does not need to know what the index does. If the index is there, it will automatically
+ plug itself into the indexing pipeline as well as the query engine. If it is not, eXist will fall back to default (brute force) query
+ processing.
+ The old range index is still supported to allow existing applications to run unchanged.
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/ngram/ngram.xml b/src/main/xar-resources/data/ngram/ngram.xml
index 6737771a..0482fcc8 100644
--- a/src/main/xar-resources/data/ngram/ngram.xml
+++ b/src/main/xar-resources/data/ngram/ngram.xml
@@ -1,30 +1,30 @@
-
- N-Gram Index
- September 2009
-
- TBD
-
-
-
-
-
-
- Index Configuration
-
- To create an n-gram index, add a ngram element directly
- below the root index node. The n-gram index only supports
- index definitions by qname. The path
- attribute is not supported (we currently don't see many real use cases for
- it). Right now, the n-gram index has no additional parameters to be
- specified; the default settings should just be ok for most cases (we may add
- extra parameters in the future, e.g. for collapsing/normalizing
- whitespace).
-
- collection.xconf
-
-
-
+
+ N-Gram Index
+ 1Q18
+
+ indexing
+
+
+
+
+
+ This article will provide some information on how to configure eXist-db's ngram index.
+
+
+
+
+ Index Configuration
+
+ To create an n-gram index, add an ngram element directly below the root index node of a collection.xconf
+ document:
+
+
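+ A minimal collection.xconf sketch (the indexed qname is illustrative):
+
+ ```xml
+ <collection xmlns="http://exist-db.org/collection-config/1.0">
+     <index>
+         <ngram qname="para"/>
+     </index>
+ </collection>
+ ```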
+ The ngram index only supports index definitions by qname. The path attribute is not
+ supported. There are no additional parameters to be specified.
+
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/oldrangeindex/oldrangeindex.xml b/src/main/xar-resources/data/oldrangeindex/oldrangeindex.xml
index 818a9694..a864da38 100644
--- a/src/main/xar-resources/data/oldrangeindex/oldrangeindex.xml
+++ b/src/main/xar-resources/data/oldrangeindex/oldrangeindex.xml
@@ -1,250 +1,181 @@
-
- Legacy Range index
- November 2009
-
- TBD
-
-
-
-
-
-
- Note
-
-
- This index has been replaced by a redesigned range index module in eXist 2.2. The old
- index is still available and fully functional though.
-
- Range indexes provide a shortcut for the database to directly select nodes
- based on their typed values. They are used when matching or comparing nodes by
- way of standard XPath operators and functions. Without a range index, comparison
- operators like =, > or < will default to a "brute-force" inspection of the
- DOM, which can be extremly slow if eXist-db has to search through maybe millions of
- nodes: each node has to be loaded and cast to the target type.
- To see how range indexes work, consider the following fragment:
-
- Example: List Entry
-
-
- With this short inventory, the text nodes of the price
- elements have dollar values expressed as a floating-point number, (e.g.
- "299.99"), which has an XML
- Schema Definition (XSD) data type of xs:double.
- Using this builtin type to define a range index, we can improve the efficiency
- of searches for price values. (Instructions on how to
- configure range indexes using configuration files are provided under the Configuring Indexes section below.) During indexing,
- eXist-db will apply this data type selection by attempting to cast all
- price values as double floating point numbers, and add
- appropriate values to the index. Values that cannot be cast as double floating
- point numbers are therefore ignored. This range index will then be used by any
- expression that compares price to an xs:double value - for instance:
- //item[price > 100.0]
- For non-string data types, the range index provides the query engine with a more
- efficient method of data conversion. Instead of retrieving the value of each
- selected element and casting it as a xs:double type, the
- engine can evaluate the expression by using the range index as a form of lookup
- index. Without an index, eXist-db has to do a full scan over all price
- price elements, retrieve the string values of their text
- node and cast them to a double number. This is a time-consuming process which
- also scales very badly with growing data sets. With a proper index, eXist-db needs
- just a single index lookup to evaluate price = 100.0. The
- range expression price > 100.0 is processed with an index
- scan starting at 100.
- For string data, the index will also be used by the standard functions
- fn:contains(), fn:starts-with(),
- fn:ends-with() and fn:matches().
- To illustrate this functionality, let's return to the previous example. If you
- define a range index of type xs:string for element
- name, a query on this element to select tall bookcases
- using fn:matches() will be supported by the following
- index:
- //item[fn:matches(name, '[Tt]all\s[Bb]')]
- Note that fn:matches will by default try to match the
- regular expression anywhere in the string. We can thus
- speed up the query dramatically by using "^" to restrict the match to the start
- of the string:
- //item[fn:matches(name, '^[Tt]all\s[Bb]')]
- Also, if you really need to search for an exact substring in a longer text
- sequence, it is often better to use the NGram index instead of the range index,
- i.e. use ngram:contains() instead of fn:contains(). Unfortunately, there's no equivalent NGram
- function for fn:matches() yet, but we may add one in the
- future as it could help to increase performance dramatically.
- In general, three conditions must be met in order to optimize a search using a
- range index:
-
-
-
- The range index must be defined on all
- items in the input sequence.
-
- For example, suppose you have two collections in the database: C1 and
- C2. If you have a range index defined for collection C1, but your query
- happens to operate on both C1 and C2, then the range index would
- not be used. The query optimizer selects an
- optimization strategy based on the entire input sequence of the query.
- Since, in this example, since only nodes in C1 have a range index, no
- range index optimization would be applied.
-
-
-
- The index data type (first argument type) must match the test
- data type (second argument type).
-
- In other words, with range indexes, there is no promotion of data
- types (i.e. no data type precedes or replaces another data type). For
- example, if you defined an index of type xs:double on
- price, a query that compares this element's value
- with a string literal would not use a range index, for instance:
- //item[price = '1000.0']
- In order to apply the range index, you would need to cast the value as
- a type xs:double, i.e.:
- //item[price = xs:double($price)] (where $price is any test value)
- Similarly, when we compare xs:double values with
- xs:integer values, as in, for instance:
- //item[price = 1000]
- the range index would again not be used since the
- price data type differs from the test value type,
- although this conflict might not seem as obvious as it is with string
- values.
-
-
-
- The right-hand argument has no dependencies on the current
- context item.
-
- That is, the test or conditional value must not depend on the value
- against which it is being tested. For example, range indexes will not be
- applied given the following expression:
- //item[price = self]
-
-
- Concerning range indexes on strings there's another restriction to be
- considered: up to version 1.3, range indexes on strings can only be used with
- the default Unicode collation. Also, string indexes will always be case
- sensitive (while n-gram and full text indexes are not). It is not yet possible
- to define a string index on a different collation (e.g. for German or French) or
- to make it case insensitve. This is a limitation we plan to address in the future.
-
-
-
-
-
- Range index configuration
-
-
- Range Index Configuration
-
-
- A range index is configured by adding a create element
- directly below the root index element. As explained above,
- the node to be indexed is either specified through a path
- or a qname attribute.
-
- Unlike the new range index, the create elements of
- the old range index are NOT wrapped inside a range tag.
-
- As range indexes are type specific, the type attribute is
- always required. The type should be one of the atomic XML schema types,
- currently including xs:string, xs:integer and its derived types xs:double and xs:float, xs:boolean and xs:dateTime. Further types
- may be added in the future. If the name of the type is unknown, the index
- configuration will be ignored and you will get a warning written into the
- logs.
- Please note that the index configuration will only apply to the node
- specified via the path or qname attribute,
- not to descendants of that node. Consider a mixed content element
- like:
-
- Mixed Content Element
- <mixed><s>un</s><s>even</s></mixed>
-
- If an index is defined on mixed, the key for the index
- is built from the concatenated text nodes of element
- mixed and all its descendants, i.e. "uneven". The
- created index will only be used to evaluate queries on
- mixed, but not for queries on s.
- However, you can create an additional index on s without
- getting into conflict with the existing index on
- mixed.
-
-
-
-
-
- Configuration by path vs. configuration by qname
-
- It is important to note the difference between the path
- and qname attributes used throughout above example. Both
- attributes are used to define the elements or attributes to which the index
- should be applied. However, the path attribute creates
- context-dependant indexes, while the
- qname attribute does not. The path attribute takes a
- simple path expression:
- <create path="//book/title" type="xs:string"/>
- The path expression looks like XPath, but it's really not. Index path
- syntax uses the following components to construct paths:
-
-
- Elements are specified by their qname
-
-
-
- Attributes are specified by @attributeName, so if
- the attribute is called "attrib1", one uses
- @attrib1 in the index specification.
-
-
- Child nodes are selected using the forward-slash
- (/)
-
-
- All descendant nodes in a tree are selected using the double
- forward-slash (//)
-
-
- The example above creates a range index of type string on all
- title elements which are children of
- book elements, which may occur at an arbitrary
- position in the document tree. All other title elements,
- e.g. those being children of section nodes, are not
- indexed. The path expression thus defines a selective
- index, which is also context-dependant: we always need
- look at the context of each title node before we can
- determine if this particular title is to be indexed or not.
- This kind of context-dependant index definition helps to keep the index
- small, but unfortunately it makes it hard for the query optimizer to
- properly rewrite the expression tree without missing some nodes. The
- optimizer needs to make an optimization decision at compile time, where the
- context of an expression is unknown or at least not exactly known (read the
- blog
- article to get the whole picture). This means that some of the
- highly efficient optimization techniques can not be applied to
- context-dependant indexes!
- We thus had to introduce an alternative configuration method which is not
- context-dependant. To keep things simple, we decided to define the index on
- the qname of the element or attribute alone and to
- ignore the context altogether:
- <create qname="title" type="xs:string"/>
- This results in an index being created on every title
- element found in the document node tree. Section titles will be indexed as
- well as chapter or book titles. Indexes on attributes are defined as above
- by prepending "@" to the attribute's name, e.g.:
- <create qname="@type" type="xs:string"/>
- defines an index on all attributes named "type", but not on elements with
- the same name.
- Defining indexes on qnames may result in a considerably larger index, but
- it also allows the query engine to apply all available optimization
- techniques, which can improve query times by an order of magnitude. As so
- often, there's a trade-off between performance and storage space. In many
- cases, the performance win can be dramatic enough to justify an increase in
- index size.
-
- To be on the safe side and to benefit from current and future
- improvements in the query engine, you should prefer
- qname over path - unless you
- really need to exclude certain nodes from indexing.
-
-
+ schematypens="http://purl.oclc.org/dsdl/schematron"?>
+
+ Legacy Range index
+ 1Q18
+
+ indexing
+
+
+
+
+
+ This article describes eXist-db's old range index. This has been replaced by a redesigned range
+ index (since eXist 2.2). The old index is still available for compatibility reasons. Its use is discouraged.
+
+
+
+ Introduction
+
+ Range indexes provide a shortcut for the database to directly select nodes based on their typed values. They are used when matching or
+ comparing nodes by way of standard XPath operators and functions. Without a range index, comparison operators like =,
+ > or < will default to a "brute-force" inspection of the DOM, which can be extremely slow if eXist-db has to
+ search through maybe millions of nodes: each node has to be loaded and cast to the target type.
+ To see how range indexes work, consider the following fragment:
+
+
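+ A sketch of such an inventory fragment; the element names follow the examples used in this section:
+
+ ```xml
+ <items>
+     <item>
+         <name>Tall Bookcase</name>
+         <price>299.99</price>
+     </item>
+     <item>
+         <name>Low Bookcase</name>
+         <price>199.99</price>
+     </item>
+ </items>
+ ```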
+ With this short inventory, the text nodes of the price elements have dollar values expressed as floating-point numbers (e.g.
+ "299.99"), which correspond to the XML Schema
+ Definition (XSD) data type xs:double. Using this built-in type to define a range index, we can improve the
+ efficiency of searches for price values. During indexing, eXist-db will apply this data type selection by attempting to cast all
+ price values as double floating-point numbers and add the appropriate values to the index. Values that cannot be cast as double are
+ ignored. This range index will then be used by any expression that compares price to an xs:double value. For
+ instance:
+ //item[price > 100.0]
+
+ For non-string data types, the range index provides the query engine with a more efficient method of data conversion. Instead of retrieving
+ the value of each selected element and casting it as an xs:double, the engine can evaluate the expression by using the
+ range index as a form of lookup index. Without an index, eXist-db has to do a full scan over all price elements, retrieve the
+ string values of their text nodes and cast them to double numbers. This is a time-consuming process which also scales very badly with growing
+ data sets. With a proper index, eXist-db needs just a single index lookup to evaluate price = 100.0. The range expression
+ price > 100.0 is processed with an index scan starting at 100.
+ For string data, the index will also be used by the standard functions fn:contains(),
+ fn:starts-with(), fn:ends-with() and fn:matches().
+ To illustrate this functionality, let's return to the previous example. If you define a range index of type xs:string for
+ element name, a query on this element to select tall bookcases using fn:matches() will be supported by the
+ following index:
+ //item[fn:matches(name, '[Tt]all\s[Bb]')]
+ Note that fn:matches will by default try to match the regular expression anywhere in the string. We
+ can thus speed up the query dramatically by using "^" to restrict the match to the start of the string:
+ //item[fn:matches(name, '^[Tt]all\s[Bb]')]
+
+
+ If you really need to search for an exact substring in a longer text sequence, it is often better to use the NGram index instead of the
+ range index, i.e. use ngram:contains() instead of fn:contains(). Unfortunately, there's no equivalent
+ NGram function for fn:matches().
+
+
+ In general, three conditions must be met in order to optimize a search using a range index:
+
+
+
+ The range index must be defined on all items in the input sequence.
+
+ For example, suppose you have two collections in the database: C1 and C2. If you have a range index defined for collection C1, but your
+ query happens to operate on both C1 and C2, then the range index would not be used. The query optimizer selects an
+ optimization strategy based on the entire input sequence of the query. Since, in this example, only nodes in C1 have a range index, no
+ range index optimization would be applied.
+
+
+
+ The index data type (first argument type) must match the test data type (second argument type).
+
+ In other words, with range indexes, there is no promotion of data types (i.e. no data type precedes or replaces another data type). For
+ example, if you defined an index of type xs:double on price, a query that compares this element's value with a
+ string literal would not use a range index, for instance:
+ //item[price = '1000.0']
+ In order to apply the range index, you would need to cast the value as a type xs:double, i.e.:
+ //item[price = xs:double($price)] (where $price is any test value)
+ Similarly, when we compare xs:double values with xs:integer values, as in, for instance:
+ //item[price = 1000]
+ the range index would again not be used since the price data type differs from the test value type, although this conflict
+ might not seem as obvious as it is with string values.
+
+
+
+ The right-hand argument has no dependencies on the current context item.
+
+ That is, the test or conditional value must not depend on the value against which it is being tested. For example, range indexes will
+ not be applied given the following expression:
+ //item[price = self]
+
+
+ There is another restriction to consider for range indexes on strings: up to version 1.3, range indexes on strings can only be
+ used with the default Unicode collation. Also, string indexes will always be case-sensitive (while n-gram and full-text indexes are not). It is
+ not (yet) possible to define a string index on a different collation (e.g. for German or French) or to make it case-insensitive.
+
+
+
+
+
+ Range index configuration
+
+ Range index configuration is done in collection.xconf documents (see indexing):
+
+ A range index is configured by adding a create element directly below the root index element. As explained above, the
+ node to be indexed is either specified through a path or a qname attribute.
+
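+ A sketch of a corresponding collection.xconf, indexing the price and name elements from the inventory example:
+
+ ```xml
+ <collection xmlns="http://exist-db.org/collection-config/1.0">
+     <index>
+         <create qname="price" type="xs:double"/>
+         <create qname="name" type="xs:string"/>
+     </index>
+ </collection>
+ ```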
+
+ Unlike the new range index, the create elements of the old range index are
+ not wrapped inside a range tag.
+
+
+ As range indexes are type-specific, the type attribute is always required. The type should be one of the atomic XML
+ schema types, currently including xs:string, xs:integer (and its derived types), xs:double,
+ xs:float, xs:boolean and xs:dateTime. If the name of the type is unknown, the index
+ configuration will be ignored and a warning will be written to the logs.
+ Please note that the index configuration will only apply to the node specified via the path or qname
+ attribute, not to descendants of that node. Consider a mixed content element like:
+
+ <mixed><s>un</s><s>even</s></mixed>
+ If an index is defined on mixed, the key for the index is built from the concatenated text nodes of element mixed and
+ all its descendants, so uneven in this case. The created index will only be used to evaluate queries on mixed, but not
+ for queries on s. However, you can create an additional index on s without getting into conflict with the existing index
+ on mixed.
+
+
+
+
+
+ Configuration by path vs. configuration by qname
+
+ It is important to note the difference between the path and qname attributes used in the above examples.
+ Both attributes are used to define the elements or attributes to which the index should apply. However, the path attribute
+ creates context-dependent indexes, while the qname attribute does not.
+ The path attribute takes a simple path expression:
+ <create path="//book/title" type="xs:string"/>
+ The path expression looks like XPath, but it's not. Index path syntax uses the following components to construct paths:
+
+
+ Elements are specified by their qname
+
+
+
+ Attributes are specified by @attributeName, so if the attribute is called attrib1, one uses
+ @attrib1 in the index specification.
+
+
+ Child nodes are selected using the forward-slash (/)
+
+
+ All descendant nodes in a tree are selected using the double forward-slash (//)
+
+
+ The example above creates a range index of type string on all title elements which are children of book elements,
+ which may occur at an arbitrary position in the document tree. All other title elements, e.g. those being children of
+ section nodes, are not indexed. The path expression thus defines a selective index, which is also
+ context-dependent: we always need to look at the context of each title node before we can determine if this
+ particular title is to be indexed or not.
+ This kind of context-dependent index definition helps to keep the index small, but unfortunately it makes it hard for the query optimizer to
+ properly rewrite the expression tree without missing some nodes. The optimizer needs to make an optimization decision at compile time, where the
+ context of an expression is unknown or at least not exactly known (read the blog article to get the whole picture). This means that some of the
+ optimization techniques cannot be applied to context-dependent indexes!
+ We therefore had to introduce an alternative configuration method which is not context-dependent. To keep things simple, we decided to
+ define the index on the qname of the element or attribute alone and to ignore the context altogether:
+ <create qname="title" type="xs:string"/>
+ This results in an index being created on every title element found in the document node tree. Section titles will be indexed as
+ well as chapter or book titles.
+ Indexes on attributes are defined as above by prepending "@" to the attribute's name:
+ <create qname="@type" type="xs:string"/>
+ This defines an index on all attributes named "type", but not on elements with the same name.
+ Defining indexes on qnames may result in a considerably larger index, but it allows the query engine to apply all available
+ optimization techniques, which can improve query times by an order of magnitude. As always, there's a trade-off here between performance and
+ storage space. In many cases, the performance win can be dramatic enough to justify the increase in index size.
+
+ To be on the safe side, and to benefit from current and future improvements in the query engine, you should prefer qname
+ over path, unless you really need to exclude certain nodes from indexing.
+
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/oxygen/oxygen.xml b/src/main/xar-resources/data/oxygen/oxygen.xml
index 1e540066..e80156a1 100644
--- a/src/main/xar-resources/data/oxygen/oxygen.xml
+++ b/src/main/xar-resources/data/oxygen/oxygen.xml
@@ -1,222 +1,130 @@
-
- Using oXygen
- November 2014
-
- TBD
-
-
-
-
-
-
- Overview
-
-
- oXygen XML Editor is a powerful tool
- for working with eXist-db. Its eXist-db-specific capabilities include:
-
-
- browsing eXist-db database contents
-
-
- editing database contents (open, save, rename documents; create,
- rename collections)
-
-
- editing XQuery files and continuously validate them against eXist-db's
- XQuery engine
-
-
- executing queries and displaying results
-
-
- This article describes how to configure oXygen to work with
- eXist-db. While the oXygen documentation describes oXygen's eXist-db
- support, we provide up-to-date information here for the convenience of
- eXist-db users.
-
-
-
-
-
- How to tell oXygen about your eXist-db installation
-
- To tap into eXist-db via oXygen, you must tell oXygen a bit about your eXist-db
- installation. The steps to do this are admittedly too tedious, but you only need to
- perform these steps once. First, we need to create an entry for eXist-db in oXygen's
- list of Data Sources; this involves pointing oXygen to 5 key libraries (.jar files)
- in our eXist-db directory so that oXygen knows how to connect to our version of
- eXist-db. Then we need to create an entry in its list of Data Connections; this
- involves providing oXygen with a URL and account information for your eXist-db
- instance.
-
-
- In oXygen, go to Preferences > Data Sources, and you will see a window
- with two areas: Data Sources (on the top) and Connections (on the bottom).
-
-
-
- In the Data Sources pane, select the New button to
- create a new data source.
-
-
- A new dialog will appear, with fields for Name, Type, and Driver
- Files.
-
-
- In the Name field enter a unique name for this eXist-db data source, e.g.,
- "eXist-db Data Source".
-
-
- In the Type dropdown menu select "eXist."
-
-
- Finally, select the Add Files button. Browse to the
- directory where you installed eXist-db, and select each of the following
- files so that they appear in the Driver files area:
-
-
- exist.jar
-
-
- lib/core/ws-commons-1.0.2.jar
-
-
- lib/core/xmldb.jar
-
-
- lib/core/xmlrpc-client-3.1.3.jar
-
-
- lib/core/xmlrpc-common-3.1.3.jar
-
-
-
-
- Select OK to complete the creation of the new Data Source and return to
- the Data Sources screen, where will will create a new Data Connection to
- your eXist-db installation. In the "Connections" area of the screen, select
- the Add button to creat a new data connection.
-
-
- A new dialog will appear, with fields for Name, Type, and Driver
- Files.
-
-
- In the Name field enter a unique name for your eXist-db server, e.g.,
- "eXist-db on localhost 8080".
-
-
- In the Data Source drop down menu, select the Data Connection name that
- you created above.
-
-
- In the XML DB URI field, enter the URL pointing to your eXist-db's XML-RPC
- service (e.g., http://localhost:8080/exist/xmlrpc). oXygen v14 and higher
- allow you to make the connection between oXygen and eXist-db secure and
- SSL-encrypted; to do so select the checkbox, "Use a Secure HTTPS Connection
- (SSL)", and use your eXist-db's secure port for the XML DB URI (e.g.,
- https://localhost:8443/exist/xmlrpc).
-
-
- In the User and Password fields, enter your eXist-db account details
- (e.g., typically, the "admin" user and associated password that you set up
- when you installed eXist-db).
-
-
- In the Collection field, enter "/db".
-
-
- Select OK to complete the creation of the new Data Connection. Select OK
- to exit oXygen's Preferences. Congratulations! You have told oXygen
- everything it needs to know about eXist-db.
-
-
-
-
-
-
-
- How to browse your database contents
-
- Now that you have created an oXygen Data Source and Connection for eXist-db, you
- can browse your database contents from within oXygen in two ways:
-
-
- Use the Data Source Explorer, an oXygen pane that lists your
- Connections including the one you created above. To open the Data Source
- Explorer, select Window > Show view > Data Source Explorer. Using this,
- you can browse collections and their contents; you can right click on
- these items display contextual menus with options to create, rename, or
- move database contents.
-
-
- Use the File > Open URL to browse and pick documents or files database
- to open. The first time you connect to your database, you will need to
- fill in several fields: your eXist-db account credentials and Server URL
- (e.g., http://localhost:8080/exist/webdav/db/)
-
-
-
-
-
-
-
- How to validate XQuery files against eXist-db's XQuery engine
-
- By default oXygen uses Saxon to validate XQuery files that you open in oXygen.
- Saxon is a fine tool for validating XQuery (among its many capabilities), but it
- lacks knowledge of eXist-db built-in functions and other settings. Thus, if you are
- ultimately creating XQuery to use in eXist-db, you will find numerous advantages in
- configuring oXygen to use eXist-db for validation instead of Saxon. The steps to
- complete this configuration are very easy:
-
-
- In oXygen, go to Preferences > XQuery. On the dropdown menu labeled,
- "XQuery Validate with", select the name of the Data Connection that you
- created above.
- Select OK to confirm your new preference.
-
-
- Now when you are editing an XQuery file in oXygen, the validation information you
- receive (i.e., when you click on the Validate toolbar button)
- is supplied from eXist-db.
-
-
-
-
-
- How to execute queries and display results
-
- You can execute queries against eXist-db from within oXygen. To do so:
-
-
- Open an XQuery file that you would like to execute.
-
-
- Select the Configure Transformation Scenario
- toolbar button, select the New button, and select
- "XQuery Transformation" in the dropdown menu.
-
-
- A new dialog will appear with fields to configure your XQuery
- transformation settings. Enter a Name for the transformation (e.g.,
- "Transform with eXist-db"). In the "Transformer" dropdown menu, select the
- name of the Data Connection that you created above.
-
-
- Select OK to confirm these settings, and then select "Save and Close" to
- exit the configuration window, or select the Apply
- Associated button to execute your query.
-
-
- Henceforth, you can execute any query using this transformation scenario
- by simply selecting the Apply Transformation Scenario
- toolbar button
-
-
-
+ schematypens="http://purl.oclc.org/dsdl/schematron"?>
+
+ Using oXygen with eXist-db
+ 1Q18
+
+ getting-started
+
+
+
+
+
+ This article describes how to use eXist-db in combination with the oXygen XML Editor IDE.
+
+
+
+
+ Overview
+
+
+ oXygen XML Editor is a powerful
+          IDE for working with XML in general, but also with eXist-db. Its eXist-db-specific capabilities include:
+
+
+ Browsing eXist-db database contents
+
+
+ Editing database contents (open, save, rename documents; create, rename collections)
+
+
+ Editing XQuery files and continuously validate them against eXist-db's XQuery engine
+
+
+ Executing queries and displaying results
+
+
+ This article describes how to configure oXygen to work with eXist-db. See also oXygen's eXist-db support article.
+
+
+
+
+
+ How to tell oXygen about your eXist-db installation
+
+ To tap into eXist-db via oXygen, you must tell oXygen a bit about your eXist-db installation. First, we need to create an entry for eXist-db
+ in oXygen's list of Data Sources. Then we need to create an entry in its list of Data Connections; this involves providing
+ oXygen with a URL and account information for your eXist-db instance.
+
+
+ In oXygen, go to Preferences, Data Sources, and you will see a window with two areas: Data Sources (at the
+ top) and Connections (at the bottom).
+
+
+ Click the Create eXist-db XML connection link at the top.
+
+
+          Fill in the dialog (for a default installation you'll only need to change the user to admin and fill in its password).
+
+
+          Click OK on all subsequent dialogs (if any). This will create both a Data Source and Connection for
+ you.
+
+
+
+
+
+
+
+ How to browse your database contents
+
+ Now that you have created an oXygen Data Source and Connection for eXist-db, you can browse your database contents from within oXygen in two
+ ways:
+
+
+ Use the Data Source Explorer, an oXygen pane that lists your Connections including the one you created above. To open the Data Source
+ Explorer, select Window, Show view, Data Source Explorer. You can now browse collections and their contents.
+          Right-click on these items to display contextual menus.
+
+
+ Use the File, Open URL to browse and pick documents from the eXist-db database. The first time you connect
+          you need to fill in several fields, including your eXist-db account credentials and server URL (e.g.,
+ http://localhost:8080/exist/webdav/db/)
+
+
+
+
+
+
+
+ How to validate XQuery files against eXist-db's XQuery engine
+
+ By default oXygen uses Saxon to validate XQuery files. Saxon is a fine tool for validating XQuery (among its many capabilities), but it
+ lacks knowledge of eXist-db built-in functions and other settings. Therefore if you are creating XQuery to use in eXist-db, you will find
+ numerous advantages in configuring oXygen to use eXist-db for validation. The steps to complete this configuration are very easy:
+
+
+ In oXygen, go to Preferences, XQuery. On the dropdown menu labeled, XQuery Validate
+ with, select the name of the Data Connection that you created above.
+
+
+ Now when you are editing an XQuery file in oXygen, the validation information you receive (for instance when you click on the
+ Validate toolbar button) is supplied from eXist-db.
+
+
+
+
+
+ How to execute queries and display results
+
+ You can also execute queries against eXist-db from within oXygen:
+
+
+ Open the XQuery file that you would like to execute.
+
+
+ Select the Configure Transformation Scenario toolbar button, select the New
+ button, and select XQuery Transformation in the dropdown menu.
+
+
+ A new dialog will appear with fields to configure your XQuery transformation settings. Enter a Name for the transformation (e.g.,
+ "Transform with eXist-db"). In the Transformer dropdown menu select the Data Connection you created above.
+
+
+ From now on you can execute any query using this transformation scenario by simply selecting the Apply Transformation
+            Scenario toolbar button.
+
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/production_good_practice/listings/listing-5.xml b/src/main/xar-resources/data/production_good_practice/listings/listing-5.xml
index fb5c43c4..a7c5246e 100644
--- a/src/main/xar-resources/data/production_good_practice/listings/listing-5.xml
+++ b/src/main/xar-resources/data/production_good_practice/listings/listing-5.xml
@@ -1,3 +1,4 @@
- xquery-submissionauthenticated
+ xquery-submission
+ authenticated
\ No newline at end of file
diff --git a/src/main/xar-resources/data/production_good_practice/listings/listing-6.xml b/src/main/xar-resources/data/production_good_practice/listings/listing-6.xml
index fe8c8966..665d022e 100644
--- a/src/main/xar-resources/data/production_good_practice/listings/listing-6.xml
+++ b/src/main/xar-resources/data/production_good_practice/listings/listing-6.xml
@@ -1,3 +1,4 @@
- xupdate-submissiondisabled
+ xupdate-submission
+ disabled
\ No newline at end of file
diff --git a/src/main/xar-resources/data/production_good_practice/production_good_practice.xml b/src/main/xar-resources/data/production_good_practice/production_good_practice.xml
index 0f1a23cc..5b214869 100644
--- a/src/main/xar-resources/data/production_good_practice/production_good_practice.xml
+++ b/src/main/xar-resources/data/production_good_practice/production_good_practice.xml
@@ -1,318 +1,423 @@
-
- Production Use - Good Practice
- 2017-12-20
-
- TBD
-
-
-
-
-
-
- Abstract
-
- From our and our clients' experiences of developing and using eXist-db in production environments a number of lessons have been learned. This Good Practice guide is an attempt to cover some of the considerations that should be taken into account when deploying eXist-db for use in a production environment.
- The concepts laid out within this document should not be considered absolute or accepted wholesale - they should rather be used as suggestions to guide users in their eXist-db deployments.
-
-
-
-
-
- The Server
-
- Ensure that your server is up-to-date and patched with any necessary security fixes.
- eXist-db is written in Java - so for performance and security reasons, please ensure that you have the latest and greatest Java JDK release that is compatible with your version of eXist. The latest version can always be found here at: http://java.sun.com and the recommended major version for a given eXist release can be found at: https://bintray.com/existdb/releases/exist#read
-
-
-
-
+ schematypens="http://purl.oclc.org/dsdl/schematron"?>
+
+ Production Use - Good Practice
+ 1Q18
+
+ operations
+
+
+
+
+
+ From our and our clients' experiences of developing and using eXist-db in production
+ environments a number of lessons have been learned. This Good Practice guide is an attempt to
+ cover some of the considerations that should be taken into account when deploying eXist-db for
+ use in a production environment.
+ The concepts laid out within this document should not be considered absolute or accepted
+ wholesale - they should rather be used as suggestions to guide users in their eXist-db
+ deployments.
+
+
+
+
+ The Server
+
+
+
+ Ensure that your server is up-to-date and patched with any necessary security
+ fixes.
+
+
+ eXist-db is written in Java - so for performance and security reasons, please ensure
+ that you have the latest and greatest Java JDK release that is compatible with your
+          version of eXist. The latest version can always be found at: http://java.sun.com and the recommended major version for a
+ given eXist release can be found at: https://bintray.com/existdb/releases/exist#read
+
+
+
+
+
+
+
+
+
+
+
+ Install from Source or Release?
+
+ Most users will install an officially released version of eXist-db on their production
+ systems and usually this is perfectly fine. However, for production systems there can be
+ advantages to installing eXist-db from source code.
+      eXist-db may be installed and built (see Building
+ eXist-db) from source code to a production system in one of two ways:
+
+
+ Via Local Build Machine (preferred)
+
+ You checkout the eXist-db code for a release branch (or trunk) from our GitHub
+ repository to a local machine. From here you build a distribution which you test and
+ then deploy to your live server.
+
+
+
+ Directly from GitHub
+
+ In this case you don't use a local machine for building an eXist-db distribution,
+ but you checkout the code from a release branch (or the develop branch) directly from
+ our GitHub repository on your server and build it in-situ.
+
+
+
+ Some advantages of installing eXist-db from source code are:
+
+
+ Patches
+
+ If patches or fixes are developed that are relevant to your specific needs, you can
+ update your code and re-build eXist.
+
+
+
+ Features
+
+ If you are following trunk and new features are developed which you are interested
+ in, you can update your code and re-build to take advantage of these.
+
+
+
+
+ eXist's code trunk is generally not recommended for production
+ use! Although it should always compile and be relatively stable, it may also contain as yet
+ unrecognised regressions or result in unexpected behaviour.
+
+
+
+
+
+ Upgrading
+
+ If you are upgrading the version of eXist-db that you use in your production system,
+ please always follow these two points:
+
+
+ Backup
+
+ Always make sure you have a full database backup before you upgrade.
+
+
+
+
+ Test
+
+ Always test your application in the new version of eXist-db in a development
+ environment to ensure expected behaviour before you upgrade your production
+ system.
+
+
+
-
- Install from Source or Release?
- Most users will install an officially released version of eXist-db on their production systems, usually this is perfectly fine. However there can be advantages to installing eXist-db from source code on a production system.
- eXist-db may be installed from source code to a production system in one of two ways:
-
-
- via Local Build Machine (preferred)
+
+
+
+
+
+
+ Configuring eXist
+
+ There are four main things to consider here:
+
+
+
+
+ Ensure that eXist-db is installed in a secure manner.
+
+
+
+
+
+ Configure eXist-db so it provides only what you need for your
+ application.
+
+
+
+
+
+ Configure your system and eXist-db so that eXist-db has access to enough resources
+ and the system starts and stops eXist-db in a clean manner.
+
+
+
+
+
+ Configure your system and eXist-db so that you get the maximum performance
+ possible.
+
+
+
+
+
+
+
+ Security - Permissions
+
+
+ eXist-db Permissions
+
+ eXist-db ships with fairly relaxed permissions to facilitate rapid application
+ development. However, for production systems these should be constrained:
+
+
+ admin account
- You checkout the eXist-db code for a release branch (or trunk) from our GitHub repository to a local machine, from here you build a distribution which you test and then deploy to your live server.
+ The password of the admin account is blank by default! Ensure that you set a
+ decent password.
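Setting a password can be done from XQuery using eXist-db's sm:passwd function from the securitymanager module; a minimal sketch, run as a dba user (the password shown is of course a placeholder):

```xquery
(: replace the blank default password of the admin account :)
sm:passwd("admin", "a-strong-password-here")
```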
-
-
- Directly from GitHub
-
- In this case you don't use a local machine for building an eXist-db distribution, but you checkout the code from a release branch (or the develop branch) directly from our GitHub repository on your server and build it in-situ.
-
-
-
- If you install eXist-db from source code, some advantages might be:
-
-
- patches
-
- If patches or fixes are developed that are relevant to your specific needs, you can update your code and re-build eXist.
-
-
-
- features
-
- If you are following trunk and new features are developed which you are interested in, you can update your code and re-build to take advantage of these.
-
-
-
-
-
-
- NOTE - eXist's code trunk is generally not recommended for production use, whilst it should always compile and be relatively stable, it may also contain as yet unrecognised regressions or result in unexpected behaviour.
-
-
-
-
-
-
- Upgrading
-
- If you are upgrading the version of eXist-db that you use in your production system, please always follow these two points:
-
+
+
+ default-permissions
-
- Backup - always make sure you have a full database backup before you upgrade.
+ The default permissions for creating resources and collections in eXist-db are
+              set in conf.xml. The current settings are reasonable, but you may wish to
+ improve on them for your own application security.
+
+
+ /db permissions
-
- Test - always test your application in the new version of eXist-db in a development environment to ensure expected behaviour before you upgrade your production system.
+ The default permissions for /db are 0755, which should
+              be sufficient in most cases. Should you need to change this, you can do so
+ with (here for 0775):
+ sm:chmod(xs:anyURI("/db"), "rwxrwxr-x")
-
-
-
-
-
-
-
- Configuring eXist
-
- There are four main things to consider here:
+
+
+
+
+
+ Operating System Permissions
+
+ eXist-db should be deployed and configured to run whilst following the security best
+ practices of the operating system on which it is deployed.
+ Typically we would recommend creating an exist user account and
+ exist user group with no login privileges (no shell
+ and empty password), changing the permissions of the eXist-db installation to be owned by
+ that user and group. Then run eXist-db using those credentials. An example of this on
+ OpenSolaris might be:
+
+
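As a sketch, on a typical Linux system the equivalent steps might look like this (command names, flags, and the installation path are assumptions and vary per operating system):

```shell
# create a system group and a no-login system user for eXist-db
groupadd exist
useradd -r -g exist -s /usr/sbin/nologin exist

# give that user and group ownership of the eXist-db installation
# (path assumed; adjust to where eXist-db is installed)
chown -R exist:exist /opt/exist
```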
+
+
+
+
+
+ Security - Attack Surface
+
+ For any live application it is best practice to keep the attack surface of the
+ application as small as possible. There are three aspects to this:
-
-
- Security - Permissions - ensure that eXist-db is installed in a secure manner.
-
-
-
- Security - Attack Surface - configure eXist-db so it provides only what you need for your application.
-
-
-
- Resources - configure your system and eXist-db so that eXist-db has access to enough resources and the system starts and stops eXist-db in a clean manner.
-
-
-
- Performance - configure your system and eXist-db so that you get the maximum performance possible.
-
+
+ Limiting means of arbitrary code execution.
+
+
+ Reducing the application itself to the absolute essentials.
+
+
+ Limiting access routes to the application.
+
-
-
-
- Permissions
-
-
- eXist-db Permissions
-
- At present eXist-db ships with fairly relaxed permissions to facilitate rapid application development, but for production systems these should be constrained:
-
-
- admin account
-
- The password of the admin account is blank by default! Ensure that you set a decent password.
-
-
-
- default-permissions
-
- The default permissions for creating resources and collections in eXist-db are set in conf.xml. The current settings are fairly sane, but you may like to improve on them for your own application security.
-
-
-
- /db permissions
-
- The default permissions for /db are 0755, which should be sufficient in most cases. In the case you needed to change this, you could do that with (here for 0775):
- sm:chmod(xs:anyURI("/db"), "rwxrwxr-x")
-
-
+ eXist-db is no exception and should be configured for your production systems so that it
+ provides only what you need and no more. For example, the majority of applications will be
+ unlikely to require the WebDAV or SOAP Admin features for operation in a live environment.
+ These and other services can be disabled easily.
+ Means for anonymous users to execute arbitrary code require special attention. There are
+ two means of code execution in eXist, which make sense during development, but should be
+ reconsidered for production systems:
+
+
+ Java binding
+
+ The ability to execute java code from inside the XQuery processor is disabled by
+ default in conf.xml:
+ <xquery enable-java-binding="no" .../>
+ It is strongly recommended to keep it disabled on production systems.
+
+
+
+ REST server
+
+              We recommend preventing eXist's REST server from directly receiving web requests,
+ and use URL Rewriting only to control code execution. The REST
+ server feature is enabled by default in
+ $EXIST_HOME/webapp/WEB-INF/web.xml. Changing the
+                param-value to true allows you to filter requests via your
+ own XQuery controller.
+
+              The following options allow more fine-grained control over the REST server's
+ functionality:
+
+
+ XQuery submissions
+
+                  We recommend restricting the REST server's ability to execute XQuery code to
+ authenticated users, by modifying:
+ $EXIST_HOME/webapp/WEB-INF/web.xml:
+
+
+
+
+ XUpdate statements
+
+                  In addition, we recommend restricting the REST server's ability to execute
+ XUpdate statements. Simply modify
+                  $EXIST_HOME/webapp/WEB-INF/web.xml by changing the
+ param-value from enabled to
+ disabled:
+
+
+
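Following the listing files referenced above, the relevant web.xml fragments are init-param entries; a sketch (check your own web.xml for the exact servlet these parameters belong to):

```xml
<!-- Sketch: restrict REST XQuery submission to authenticated users
     and disable XUpdate submission entirely -->
<init-param>
    <param-name>xquery-submission</param-name>
    <param-value>authenticated</param-value>
</init-param>
<init-param>
    <param-name>xupdate-submission</param-name>
    <param-value>disabled</param-value>
</init-param>
```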
-
-
- Operating System Permissions
+
+
- eXist-db should be deployed and configured to run whilst following the security best practices of the operating system on which it is deployed.
- Typically we would recommend creating an "exist" user account and "exist" user group with no login privileges (i.e. no shell and empty password), changing the permissions of the eXist-db installation to be owned by that user and group, and then running eXist-db using those credentials. An example of this on OpenSolaris might be:
-
-
-
+
+ Further considerations for a live environment:
+
+
+ Standalone mode
+
+ eXist-db can be operated in a cut-down standalone mode (see
+ server.(sh|bat)). This provides just the core services from the
+ database (no webapp file system access and no documentation). The entire application
+ has to be stored in the database and is served from there. This is an ideal starting
+ place for a production system.
+
+
+
+ Services
+
+ eXist-db provides services for accessing the database. You should reduce these to
+ the absolute minimum you need for your production application. If you are operating in
+ standalone mode, this is done via server.xml, otherwise
+ webapp/WEB-INF/web.xml. You should look at each configured service,
+              servlet or filter and ask yourself: do we use this? Most production environments are
+ unlikely to need WebDAV or SOAP Admin (Axis).
+
+
+
+ Extension Modules
+
+ eXist-db loads several XQuery and Index extension modules by default. You should
+ modify the builtin-modules section of conf.xml and
+ only load what you need for your application.
+
+
+
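As a sketch, a trimmed builtin-modules section keeps only what the application uses (the module URI and class name below are illustrative; consult the entries in your own conf.xml):

```xml
<!-- Sketch: load only the extension modules the application needs;
     modules not listed here are simply not loaded -->
<builtin-modules>
    <module uri="http://exist-db.org/xquery/request"
            class="org.exist.xquery.functions.request.RequestModule"/>
</builtin-modules>
```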
+
-
+
-
- Attack Surface
+
+ Resources
- For any live application it is recognised best practice to keep the attack surface of the application as small as possible. There are three aspects to this:
-
-
- Limiting means of arbitrary code execution.
-
-
- Reducing the application itself to the absolute essentials.
-
-
- Limiting access routes to the application.
-
-
- eXist-db is no exception and should be configured for your production systems so that it provides only what you need and no more. For example, the majority of applications will be unlikely to require the WebDAV or SOAP Admin features for operation in a live environment, and as such these and other services can be disabled easily.
- Means for anonymous users to execute arbitrary code require special attention. There are two means of code execution in eXist, which make sense during development, but should be reconsidered for production systems.
-
-
- Java binding
-
- The ability to execute java code from inside the XQuery processor is disabled by default in the instances' conf.xml.
- <xquery enable-java-binding="no" .../>
- It is strongly recommended to keep it disabled on production systems.
-
-
-
- REST server
-
- We recommend to prevent eXist's REST server from directly recieving web requests, and use URL Rewriting to control code execution via URL instead. This feature is enabled by default in $EXIST_HOME/webapp/WEB-INF/web.xml. Changing the param-value to true, allows you to filter request via your own XQuery controller.
-
- The following options allow a more fine-grained control over aspects of remote code execution:
-
-
-
- XQuery submissions
-
- We recommend to restrict the REST servers ability to execute XQuery code to authenticated users, by modifying:$EXIST_HOME/webapp/WEB-INF/web.xml.
-
-
-
-
- XUpdate statements
-
- In addtion, we recommend to restrict the REST servers ability to execute XUpdate statements, because of the sensitive nature of update operation. Simply modify $EXIST_HOME/webapp/WEB-INF/web.xmlby changing the param-value from enabled to disabled.
-
-
-
-
- Further considerations for a live environment:
-
-
- Standalone mode
-
- eXist-db can be operated in a cut-down standalone mode (see server.(sh|bat)). This provides just the core services from the database, no webapp file system access, and no documentation. The entire application has to be stored in the database and is served from there. This is an ideal starting place for a production system.
-
-
-
- Services
-
- eXist-db provides several services for accessing the database. You should reduce these to the absolute minimum that you need for your production application. If you are operating in standalone mode, this is done via server.xml, else see webapp/WEB-INF/web.xml. You should look at each configured service, servlet or filter and ask yourself - do we use this? Most production environments are unlikely to need WebDAV or SOAP Admin (Axis).
-
-
-
- Extension Modules
-
- eXist-db loads several XQuery and Index extension modules by default. You should modify the builtin-modules section of conf.xml, to ONLY load what you need for your application.
-
-
-
-
-
-
-
-
- Resources
-
- You should ensure that you have enough memory and disk space in your system so that eXist-db can cope with any peak demands by your users.
-
-
- -Xmx
-
- However you decide to deploy and start eXist, please ensure that you allocate enough maximum memory to eXist-db via. the Java -Xmx setting. See backup.sh and startup.sh.
-
-
-
- cacheSize and collectionCache
-
- These two settings in the db-connection of conf.xml should be adjusted appropriately based on your -Xmx setting (above). See the tuning guide for advice on sensible values.
-
-
-
- disk space
-
- Please ensure that you have plenty of space for your database to grow. Unsurprisingly running out of disk space can result in database corruptions or having to rollback the database to a known state.
-
-
-
-
-
-
-
-
- Performance
-
- It has been reported by large scale users that keeping the eXist-db application, database data files and database journal on separate disks connected to different I/O channels can have a positive impact on performance. The location of the database data files and database journal can be changed in conf.xml.
-
-
-
-
-
-
- Backups
-
-
- This is fundamental - Make sure you have them, they are up-to-date and that they work!
- eXist-db provides 3 different mechanisms for performing backups -
-
-
- Full database backup.
-
-
- Differential database backup.
-
-
- Snapshot of the database data files.
-
-
- Each of these backup mechanisms is schedulable either with eXist-db or with your operating system scheduler. See the backup page and conf.xml for further details.
-
-
-
-
-
- Web Deployments
-
- eXist-db like any Web Application Server (Tomcat, WebLogic, GlassFish, etc.) should not be directly exposed to the Web. Instead, we would strongly recommend proxying your eXist-db powered Web Application through a Web Server such as Nginx or Apache HTTPD. See here for further details.
- If you proxy eXist-db through a Web Server, then you may also configure your firewall to only allow external access directly to the Web Server. If done correctly this also means that web users will not be able to access any eXist-db services except your application which is proxyied into the Web Servers namespace.
-
-
-
-
- Enable GZip Compression
-
- eXist-db by default operates inside the Jetty Application Server, Jetty
- (and most other Java Application Servers) provides a mechanism for enabling
- dynamic GZip compression of resources. This is to say that Jetty can be
- configured to dynamically GZip compress any resource received from the server by
- HTTP. Potentially for large resources, or even for frequently used resources.
- Enabling dynamic GZip compression can reduce the size of transfers, and as such
- reduce the transfer time of resources from the server to the client, hopefully
- resulting in a faster experience for the end-user.
- GZip Compression can be enabled in web.xml, which can be found in either
- $EXIST_HOME/webapp/WEB-INF/web.xml for default deployments or
- $EXIST_HOME/tools/jetty/etc/standalone/WEB-INF/web.xml for standalone
- deployments.
-
-
+ You should ensure that you have enough memory and disk space in your system so that
+ eXist-db can cope with peak demands.
+
+
+ -Xmx
+
+ However you decide to deploy and start eXist, please ensure that you allocate
+              enough maximum memory to eXist-db using the Java -Xmx setting. See
+ backup.sh and startup.sh.
+
+
+
+ cacheSize and collectionCache
+
+ These two settings in db-connection of conf.xml
+ should be adjusted appropriately based on your -Xmx setting (see above).
+ See the tuning
+ guide for advice on sensible values.
+
+
+
+ Disk space
+
+ Please ensure that you have plenty of space for your database to grow.
+ Unsurprisingly, running out of disk space can result in database corruptions or having
+ to rollback the database to a known state.
+
+
+
+
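A sketch of how these two settings relate in conf.xml (attribute values are illustrative and should be derived from your -Xmx setting; other db-connection attributes are elided):

```xml
<!-- Sketch: cache sizes in conf.xml's db-connection element;
     both should fit comfortably inside the JVM heap set via -Xmx -->
<db-connection cacheSize="256M" collectionCache="64M" .../>
```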
+
+
+
+
+ Performance
+
+ Keeping the eXist-db application, data and journal on separate disks, connected to
+ different I/O channels, can have a positive impact on performance. The location of the data
+ files and journals can be changed in conf.xml.
+
+
+
+
+
+
+ Backups
+
+
+ This is fundamental: Make sure you have them, that they are up-to-date
+ and that a restore is possible!
+      eXist-db provides three different mechanisms for performing backups:
+
+
+ Full database backup.
+
+
+ Differential database backup.
+
+
+ Snapshot of the database data files.
+
+
+ Each of these backup mechanisms can be scheduled, either with eXist-db or with your
+ operating system scheduler. See the backup article and conf.xml for further
+ details.
+
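As a sketch, a scheduled full backup in conf.xml's scheduler section might look like this (the task class and parameter names should be checked against the comments in your conf.xml; the cron expression runs nightly at 2 a.m. and the paths are assumptions):

```xml
<!-- Sketch: nightly full backup via eXist-db's job scheduler -->
<job type="system" name="nightly-backup"
     class="org.exist.storage.BackupSystemTask"
     cron-trigger="0 0 2 * * ?">
    <parameter name="dir" value="/var/backups/exist"/>
    <parameter name="prefix" value="backup-"/>
    <parameter name="suffix" value=".zip"/>
</job>
```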
+
+
+
+
+ Web Deployments
+
+ eXist-db, like any Web Application Server (Tomcat, WebLogic, GlassFish, etc.), should not
+ be directly exposed to the Web. Instead, we strongly recommend proxying eXist-db through a Web
+ Server such as Nginx or Apache HTTPD. See here for further details.
+ If you proxy eXist-db through a Web Server, you can also configure your firewall to allow
+ external access directly to the Web Server only. If done correctly, this means that web users
+ will not be able to access any eXist-db services directly, except your application, which is
+ proxied into the Web Server's namespace.
+
+
+
+
+ Enable GZip Compression
+
+ eXist-db by default operates inside the Jetty Application Server. Jetty (and most other
+ Java Application Servers) provides a mechanism for enabling dynamic GZip compression of
+ resources. In other words: Jetty can be configured to dynamically GZip compress any resource
+ it serves over HTTP. Enabling dynamic GZip compression can reduce the size of
+ transfers, and as such reduce the transfer time of resources from the server to the client,
+ hopefully resulting in a faster experience for the end-user.
+ GZip Compression can be enabled in web.xml, which can be found in either
+ $EXIST_HOME/webapp/WEB-INF/web.xml for default deployments or
+ $EXIST_HOME/tools/jetty/etc/standalone/WEB-INF/web.xml for standalone
+ deployments.
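A hedged sketch of what such a web.xml entry can look like with the GzipFilter shipped in older Jetty versions (the filter class and its init parameters vary between Jetty releases, so check the documentation of the Jetty version bundled with your eXist-db):

```xml
<!-- Illustrative: compress responses for all paths -->
<filter>
    <filter-name>GzipFilter</filter-name>
    <filter-class>org.eclipse.jetty.servlets.GzipFilter</filter-class>
    <init-param>
        <param-name>mimeTypes</param-name>
        <param-value>text/html,text/xml,application/xml,text/css,application/javascript</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>GzipFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
```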
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/production_web_proxying/production_web_proxying.xml b/src/main/xar-resources/data/production_web_proxying/production_web_proxying.xml
index 159cbc26..6b4bdca0 100644
--- a/src/main/xar-resources/data/production_web_proxying/production_web_proxying.xml
+++ b/src/main/xar-resources/data/production_web_proxying/production_web_proxying.xml
@@ -1,93 +1,115 @@
-
- Production use - Proxying eXist-db behind a Web Server
- September 2009
-
- TBD
-
-
-
-
-
-
- Abstract
-
- From a security perspective, it is recognised best practice to proxy Web Application Servers behind dedicated Web Servers, and eXist-db is no exception.
- Some other nice side-effects of proxying eXist-db behind a Web Server include -
-
-
- Unified web namespace
-
- You can map eXist-db or an application build atop eXist-db into an existing web namespace. If your website is - http://www.mywebsite.com, then your eXist-db application could be mapped into http://www.mywebsite.com/myapplication/. However, if you are tempted to shorten the URL of WebDAV resources with such mapping, you will not succeed due to the specifications of WebDAV that are not designed to handle such cases.
-
-
-
- Virtual Hosting
-
- Providing your Web Server supports Virtual Hosting, then you should be able to proxy many URLs from different domains onto different eXist-db REST URLs which may belong to one or more eXist-db instances. This in effect allows a single eXist-db instance to perform virtual hosting.
-
-
-
- Examples are provided for -
-
-
-
- Nginx
-
-
- A very small but extremely poweful Web Server which is also very simple to configure. It powers some of the biggest sites on the Web.
-
-
-
-
- Apache HTTPD
-
-
- Likely the most prolific Web Server used on the web.
-
-
-
-
-
-
-
-
- Example 1 - Proxying a Web Domain Name to an eXist-db Collection
-
- In this example we look at how to proxy a web domain name onto an eXist-db Collection. We make the following assumptions -
-
-
- http://www.mywebsite.com is our website domain name address
-
-
- eXist-db is running in standalone mode (i.e. http://localhost:8088/) on the same host as the Web Server (i.e. http://localhost:80/)
-
-
- /db/apps/mywebsite.com is the eXist-db collection we want to proxy
-
-
- Web Server access logging will be written to /srv/www/vhosts/mywebsite.com/logs/access.log
-
-
-
-
-
-
- Nginx
-
- This needs to be added to the http section of the nginx.conf file -
-
-
-
-
-
-
- Apache HTTPD
-
- This needs to be added to your httpd.conf -
-
-
-
+
+ Production use - Proxying eXist-db behind a Web Server
+ 1Q18
+
+ operations
+
+
+
+
+
+
+ From a security perspective, it is recognised best practice to proxy Web Application Servers
+ behind dedicated Web Servers. eXist-db is no exception. This article provides some
+ examples of how to do this.
+
+
+
+
+
+
+ Introduction
+
+ Interesting side-effects of proxying eXist-db behind a Web Server:
+
+
+ Unified web namespace
+
+ You can map eXist-db, or an application built on eXist-db, into an existing web
+ namespace. If your website is - http://www.mywebsite.com, then your
+ eXist-db application could be mapped into
+ http://www.mywebsite.com/myapplication/. However, if you are tempted to
+ shorten the URL of WebDAV resources with such a mapping, you will not succeed,
+ because the WebDAV specification is not designed to handle such cases.
+
+
+
+ Virtual Hosting
+
+ Providing your Web Server supports Virtual Hosting, you should be able to proxy many
+ URLs from different domains onto different eXist-db REST URLs, which may belong to one
+ or more eXist-db instances. This allows a single eXist-db instance to perform virtual
+ hosting.
+
+
+
+
+ Examples are provided for:
+
+
+
+ Nginx
+
+
+ A very small but extremely powerful Web Server which is also simple to configure. It
+ powers some of the biggest sites on the Web. See .
+
+
+
+
+ Apache HTTPD
+
+
+ Likely the most prolific Web Server used on the web. See .
+
+
+
+
+
+
+
+
+ Example: Proxying a Web Domain Name to an eXist-db Collection
+
+ In this example we look at how to proxy a web domain name onto an eXist-db Collection. We
+ make the following assumptions:
+
+
+ http://www.mywebsite.com is our website domain name address
+
+
+ eXist-db is running in standalone mode (i.e. http://localhost:8088/) on the same host
+ as the Web Server (i.e. http://localhost:80/)
+
+
+ /db/apps/mywebsite.com is the eXist-db collection we want to proxy
+
+
+ Web Server access logging will be written to
+ /srv/www/vhosts/mywebsite.com/logs/access.log
+
+
+
+
+
+
+ Using Nginx
+
+ This needs to be added to the http section of the nginx.conf file:
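As a minimal sketch using the illustrative values from the assumptions above (the proxied path depends on your eXist-db URL rewriting configuration; /exist/apps/... is assumed here):

```nginx
# Illustrative: proxy www.mywebsite.com onto the /db/apps/mywebsite.com collection
server {
    listen      80;
    server_name www.mywebsite.com;
    access_log  /srv/www/vhosts/mywebsite.com/logs/access.log;

    location / {
        proxy_pass       http://localhost:8088/exist/apps/mywebsite.com/;
        proxy_set_header Host $host;
    }
}
```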
+
+
+
+
+
+
+ Using Apache HTTPD
+
+ This needs to be added to your httpd.conf:
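A comparable sketch for Apache HTTPD, again using the illustrative values from the assumptions above (requires mod_proxy and mod_proxy_http to be enabled; the proxied path depends on your eXist-db URL rewriting configuration):

```apache
# Illustrative: proxy www.mywebsite.com onto the /db/apps/mywebsite.com collection
<VirtualHost *:80>
    ServerName www.mywebsite.com
    CustomLog /srv/www/vhosts/mywebsite.com/logs/access.log common

    ProxyPass        / http://localhost:8088/exist/apps/mywebsite.com/
    ProxyPassReverse / http://localhost:8088/exist/apps/mywebsite.com/
</VirtualHost>
```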
+
+
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/replication/replication.xml b/src/main/xar-resources/data/replication/replication.xml
deleted file mode 100644
index 7d61d9ba..00000000
--- a/src/main/xar-resources/data/replication/replication.xml
+++ /dev/null
@@ -1,24 +0,0 @@
-
-
- Replication & Messaging
- October 2012
-
- TBD
-
-
-
-
-
-
- This eXist-db extension has moved to GitHub
-
-
- Starting with eXist-db v2.2 the document replication extension will be released as a XAR file independant from
- the eXist-db release schedule.
-
- The extension is now available on GitHub.
-
-
-
\ No newline at end of file
diff --git a/src/main/xar-resources/data/repo/repo.xml b/src/main/xar-resources/data/repo/repo.xml
index fb134a74..d4f3eefd 100644
--- a/src/main/xar-resources/data/repo/repo.xml
+++ b/src/main/xar-resources/data/repo/repo.xml
@@ -1,491 +1,501 @@
-
- Package Repository
- October 2012
-
- TBD
-
-
-
-
-
-
- Introduction
-
- The eXist-db package repository is a central component of eXist-db. It makes
- it easy to manage and deploy external packages (.xar archives) which include
- everything they need to run third party XQuery libraries, full applications or
- other XML technology functionality. This document provides technical details on the
- packaging format.
- In previous versions of eXist-db, most applications were split into two parts:
-
-
- the application code (XQuery modules, HTML pages etc.) residing in the webapp
- directory on the file system
-
-
- the data stored inside the database
-
-
- This split made it difficult to redistribute applications. For larger setups,
- maintenance easily became tedious. To solve those problems, eXist-db has the
- concept of self-contained, modular applications which can be deployed into any database
- instance using a standardized packaging format. Later eXist-db distributions are built around this
- concept: documentation, examples and administration utilities have been moved out of the
- webapp directory and into separate application packages, which can be easily installed
- or removed on demand. The Dashboard is now the central hub for managing packages.
- The package repository is based on and extends the EXPath Packaging System. The core of
- the EXPath packaging specification has been designed to work across different XQuery
- implementations and is targeted at managing extension libraries (including XQuery, Java
- or XSLT code modules). eXist-db extends this core by adding a facility for the automatic
- deployment of entire applications into the database.
- eXist-db packages may fall into one of the following categories:
-
-
-
- Applications containing application code,
- HTML views, associated services, resources and data. An application always has a
- web interface which can be displayed, if for example the user clicks on the
- application icon in the Dashboard.
-
-
-
- Resource packages containing only data or
- resources used by other applications, e.g. JavaScript libraries shared by
- several application packages. A resource package has no web view, but needs to
- be deployed into the database.
-
-
-
- Library packages providing a set of XQuery
- library modules to be registered with eXist-db and used by other packages. A
- library package may also contain Java jar archives to be loaded into the
- eXist-db classpath. It has no web view and is not deployed into the
- database.
-
-
- Those categories are not exclusive: an application may also include resources and
- XQuery libraries and is not required to move those into separate packages.
-
-
-
-
-
- Creating Packages
-
- Creating new packages is fairly easy if you use eXide. This is described in the
- web development starter document.
- The following sections will cover some details of the packaging format from a more
- technical perspective.
-
-
-
-
-
- EXPath Packaging Format
-
- An EXPath package is essentially an archive file in ZIP format. By convention, the
- file name extension of the package is .xar. The archive
- must contain two XML descriptor files in the root
- directory: expath-pkg.xml and
- repo.xml:
+
+ Package Repository
+ 1Q18
+
+ application-development
+
+
+
+
+
+ The eXist-db package repository is a central component of eXist-db. It makes it easy to
+ manage and deploy external packages (.xar archives) which include everything they
+ need to run third party XQuery libraries, full applications or other XML technology
+ functionality. This article provides technical details on the packaging format.
+
+
+
+
+ Introduction
+
+ The package repository is based on and extends the EXPath Packaging System. The core of the EXPath packaging
+ specification has been designed to work across different XQuery implementations and is
+ targeted at managing extension libraries (including XQuery, Java or XSLT code modules).
+ eXist-db extends this core by adding a facility for the automatic deployment of entire
+ applications into the database.
+ eXist-db packages may fall into one of the following categories:
+
+
+ Applications
+
+ Contain application code, HTML views, associated services, resources and data. An
+ application always has a web interface which can be displayed, if for example the user
+ clicks on the application icon in the Dashboard.
+
+
+
+ Resource packages
+
+ Contain only data or resources used by other applications, e.g. JavaScript
+ libraries shared by several application packages. A resource package has no web view,
+ but needs to be deployed into the database.
+
+
+
+ Library packages
+
+ Provide a set of XQuery library modules to be registered with eXist-db and used by
+ other packages. A library package may also contain Java .jar archives to be
+ loaded into the eXist-db classpath. It has no web view and is not deployed into the
+ database.
+
+
+
+
+ These categories are not exclusive. An application may include resources and XQuery
+ libraries and is not required to move those into separate packages.
+
+
+
+
+
+ EXPath Packaging Format
+
+ An EXPath package is essentially an archive file in ZIP format. By convention, the file
+ extension of the package is .xar.
+ The archive must contain two XML descriptor files in the root
+ directory: expath-pkg.xml and repo.xml:
+
+
+
+ expath-pkg.xml
+
+
+ This is the standard EXPath descriptor as defined by the EXPath specification. It
+ specifies the unique name of the package, lists dependencies and any library modules to
+ register globally. See .
+
+
+
+
+ repo.xml
+
+
+ The eXist-db specific deployment descriptor: it contains additional metadata about
+ the package and controls how it will be deployed into the database. See .
+
+
+
+ For library packages repo.xml is optional. However, we recommend
+ always providing it for better tool integration.
+
+
+
+
+ expath-pkg.xml
+
+ As an example, the EXPath descriptor for the documentation app is shown below:
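A minimal sketch of such a descriptor (the attribute values are placeholders, not the actual documentation app's metadata; the namespace is the one defined by the EXPath packaging specification):

```xml
<package xmlns="http://expath.org/ns/pkg"
         name="http://exist-db.org/apps/doc"
         abbrev="doc" version="1.0" spec="1.0">
    <title>eXist-db Documentation</title>
    <dependency package="http://exist-db.org/apps/shared"/>
</package>
```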
+
+ The schema of this file is documented in the specification. In short, the attributes are
+ as follows:
-
-
- expath-pkg.xml
-
+
+ name
+
+ URI used as a unique identifier for the package. The URI does not need to point to
+ an existing web site. The package repository will use this URI to identify a package
+ within the system.
+
+
+
+ abbrev
+
+ Short abbreviation for the package. This will be used as part of the file name for
+ the .xar. We recommend choosing a short, simple name, without
+ spaces or punctuation characters.
+
+
+
+ version
+
+ Version of the package. This allows the Package Manager to determine if newer
+ versions of the same package are available.
+
+
+
+ spec
+
+ Version of the packaging specification the package conforms to. Always
+ 1.0 for the current specification.
+
+
+
+ title
+
+ Descriptive title to display to the user (in the Dashboard)
+
+
+
+
+
+ Dependency Management
+
+ A package may depend on one or more other packages. The Package Manager in the
+ Dashboard will resolve these dependencies before deployment. Dependent packages will be
+ installed automatically from the public repository. It is an error if a dependency cannot
+ be resolved.
+ A dependency on another package is defined by using a reference to the unique name of
+ the other package: The name attribute (URI) of the expath-pkg.xml descriptor.
+ For example:
+ <dependency package="http://exist-db.org/apps/shared"/>
+ It is also possible to create a dependency on a specific version, based on Semantic Versioning. This can be done by adding either of the
+ attributes: version, semver, semver-min,
+ semver-max:
+
+
+ version
- This is the standard EXPath descriptor as defined by the EXPath
- specification. It specifies the unique name of the package, lists
- dependencies and any library modules to register globally.
+ A simple version string which must exactly match the version string of the
+ package to install.
-
-
-
- repo.xml
-
+
+
+ semver
- The eXist-db specific deployment descriptor: it contains additional
- metadata about the package and controls how it will be deployed into the
- database.
+ A "semantic" version string: the version number must follow the scheme
+ x.x.x. Selects the highest version in the range of versions starting
+ with semver.
+ For example, if semver is 1.2, a package with version
+ 1.2.3 will be selected because it is in the 1.2 release
+ series. Likewise, if semver is 1, any package with a
+ version starting with 1 will be chosen.
-
-
- Though library packages do not really need repo.xml, we
- recommend to always provide both for better tool integration.
-
-
-
-
- Descriptors: expath-pkg.xml
-
- As an example, the EXPath descriptor for the documentation app is shown
- below:
-
- The schema of this file is documented in the specification. In short, the
- attributes are as follows:
-
-
- name
-
- a URI used as a unique identifier for the package. The URI does
- not need to point to an existing web site. The package repository
- will use this URI to identify a package within the system.
-
-
-
- abbrev
-
- a short abbreviation for the package. This will be used as part of
- the file name for the .xar. We thus recommend
- to choose a short, simple name without spaces or punctuation
- characters.
-
-
-
- version
-
- the version of the package: allows the Package Manager to
- determine if newer versions of the same package are
- available.
-
-
-
- spec
-
- the version of the packaging specification the package conforms
- to. Always "1.0" for the current specification.
-
-
-
- title
-
- a descriptive title to display to the user, e.g. in the
- Dashboard
-
-
-
-
-
-
-
-
- Dependency Management
-
- As shown above, a package may depend on one or more other packages. The
- Package Manager in the Dashboard will resolve dependencies before deployment.
- Dependant packages will be installed automatically from the public repository.
- It is an error if a dependency cannot be resolved.
- A dependency on another package is defined by reference to the unique name of
- the other package (as given in the name attribute (URI) of the expath-pkg.xml
- descriptor):
- <dependency package="http://exist-db.org/apps/shared"/>
- It is also possible to create a dependency on a specific version, based on Semantic Versioning. This can be
- done by adding either of the attributes: version, semver, semver-min,
- semver-max. The attributes are mutually exclusive, except for semver-min and
- semver-max, which may appear together.
-
-
- version
-
- A simple version string which has to exactly match the version
- string of the package to install.
-
-
-
- semver
-
- A "semantic" version string: the version number has to follow the
- scheme "x.x.x". Selects the highest version in the range of
- versions starting with semver. For example, if semver is "1.2", a
- package with version "1.2.3" will be selected because it is in the
- 1.2 release series. Likewise, if semver is "1", any package with a
- version starting with 1 will be chosen.
-
-
-
- semver-min
-
- Defines a minimal required version according to the semver
- scheme.
-
-
-
- semver-max
-
- Maximum version allowed.
-
-
-
- We definitely recommend to prefer semver, semver-min and semver-max where
- possible.
- It is also possible to require a certain eXist-db version for versions greater than 2.2. The Dashboard will prevent installation of packages into unsupported instances and display a warning to the user.
- To require a specific eXist-db version, include a processor dependency in your descriptor:
- <dependency processor="eXist-db" version="trunk > rev 18070"/>
-
+
+
+ semver-min
+
+ Defines a minimal required version according to the semver
+ scheme.
+
+
+
+ semver-max
+
+ Maximum version allowed.
+
+
+
+ These attributes are mutually exclusive, except for semver-min and
+ semver-max, which may appear together. We recommend using
+ semver, semver-min and semver-max where
+ possible.
+ It is also possible to require a certain eXist-db version (for versions greater than
+ 2.2). The Dashboard will prevent installation of packages into unsupported instances and
+ display a warning to the user. To require such a specific eXist-db version, include a
+ processor dependency in your descriptor:
+ <dependency processor="eXist-db" version="trunk > rev 18070"/>
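For illustration, dependencies constrained with the attributes described above could look as follows (the package URI is the same illustrative one used earlier):

```xml
<!-- Any release in the 1.2 series, e.g. 1.2.3 -->
<dependency package="http://exist-db.org/apps/shared" semver="1.2"/>
<!-- A version bounded below by 0.5.0 and above by 1.0.0 -->
<dependency package="http://exist-db.org/apps/shared" semver-min="0.5.0" semver-max="1.0.0"/>
```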
+
-
- Library Modules
-
- A package may list one or more library modules to register with eXist-db. The
- registered modules will become globally available within the eXist-db instance
- and may be used by other packages without knowing where the module code is
- stored. For example, the following descriptor registers the module functx.xql using the given namespace:
-
- The namespace has to correspond to the namespace defined in the module
- declaration of the XQuery module. The file should be placed into a subdirectory
- of the .xar archive, named "content". The structure of the .xar for the functx
- library would thus look as follows:
-
- Only XQuery files which are registered in expath-pkg.xml need to go into the
- special directory. You are free to keep other XQuery files wherever you want.
- Also, XQuery resources which are only used by a single application should
- not be registered (to avoid messing up the global
- context). Registering a module only makes sense for libraries which will
- likely be used by several applications.
- After installing the package, you should be able to use the registered XQuery
- modules from anywhere within the database instance without knowing the exact
- import path. Thus the following import statement will be sufficient to import
- the functx module:
-
-
+
+ Library Modules
+
+ A package may list one or more library modules to register with eXist-db. The
+ registered modules will become globally available within the eXist-db instance and may be
+ used by other packages without knowing where the module code is stored.
+ For example, the following descriptor registers the module
+ functx.xql using the given namespace:
+
+ The namespace has to correspond to the namespace defined in the module declaration of
+ the XQuery module. The file should be placed into a subdirectory of the .xar archive,
+ named "content". The structure of the .xar for the functx
+ library would be as follows:
+
+ Only XQuery files which are registered in expath-pkg.xml need to go into
+ this special directory. You are free to keep other XQuery files wherever you want. Also,
+ XQuery resources which are only used by a single application should
+ not be registered (to avoid messing up the global context).
+ Registering a module only makes sense for libraries which will likely be used by several
+ applications.
+ After installing the package, you should be able to use the registered XQuery modules
+ from anywhere within the database instance without knowing the exact import path. Thus the
+ following import statement will be sufficient to import the functx
+ module:
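As a sketch, the import needs only the module namespace and no at-location hint (the namespace shown is the one published by the FunctX project):

```xquery
import module namespace functx = "http://www.functx.com";
```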
+
+
-
- Java Libraries
-
- eXist-db also supports XQuery extension modules written in Java. They require
- a slightly different mechanism for integration into a .xar
- package. This is an extension to the standard EXPath format and should thus go
- into a separate file, named exist.xml. As an example, the
- exist.xml descriptor of the cryptographic extension module is shown
- below:
-
- The descriptor may contain one or more jar elements, each pointing to a Java
- .jar archive to be installed. Arbitrary jars can be
- listed here: they do not need to be XQuery extension modules. Again, the jar
- files should be placed into the "content" subdirectory of the .xar file.
- All jars will be dynamically added to eXist-db's class loader and become
- immediately available after deploying a package. A restart of eXist-db is not
- required.
- The java element registers an XQuery extension module written in Java. This is
- similar to the xquery element discussed above, except that the namespace is
- mapped to a Java class instead of an XQuery file. The Java class should point to
- the Module class which defines the module.
-
-
-
-
-
-
- The repo.xml Deployment Descriptor
-
- The deployment descriptor contains additional metadata and defines how the package
- will be installed into an eXist-db database instance. An example is given
- below:
+
+ Java Libraries
+
+ eXist-db also supports XQuery extension modules written in Java. These require a
+ slightly different mechanism for integration into a .xar package. This
+ is an extension to the standard EXPath format and should go into a separate file, named
+ exist.xml.
+ As an example, the exist.xml descriptor of the cryptographic extension
+ module is shown below:
+
+ The descriptor may contain one or more jar elements, each pointing to a
+ Java .jar archive to be installed. Arbitrary .jar files
+ can be listed here: they do not need to be XQuery extension modules. Again, the
+ .jar files should be placed into the "content" subdirectory of the
+ .xar file.
+ All .jar files will be dynamically added to eXist-db's class loader and
+ become immediately available after deploying a package. A restart of eXist-db is not
+ required.
+ The java element registers an XQuery extension module written in Java. This
+ is similar to the xquery element for library modules, except that the namespace
+ is mapped to a Java class instead of an XQuery file. The Java class should point to the
+ Module class which defines the module.
+
+
+
+
+
+
+
+ repo.xml
+
+ The deployment descriptor contains additional metadata and defines how the package will
+ be installed into an eXist-db database instance. For example:
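A sketch of a typical deployment descriptor (element names per the eXist-db repo namespace; the values are placeholders):

```xml
<meta xmlns="http://exist-db.org/xquery/repo">
    <description>My example application</description>
    <type>application</type>
    <target>myapp</target>
    <prepare>pre-install.xql</prepare>
    <finish>post-install.xql</finish>
</meta>
```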
- The two settings: type and target determine how a
- package is handled by the installer:
+ The two settings: type and target determine how a package is
+ handled by the installer:
-
-
-
-
-
-
-
- Type of package
-
-
- type
-
-
- target
-
-
-
-
-
-
-
- Application package
-
-
- application
-
-
- specified
-
-
-
-
- Resource package
-
-
- library
-
-
- specified
-
-
-
-
- Library package
-
-
- library
-
-
- not specified
-
-
-
-
+
+
+
+
+
+
+
+ Type of package
+
+
+ type
+
+
+ target
+
+
+
+
+
+
+ Application package
+
+
+ application
+
+
+ specified
+
+
+
+
+ Resource package
+
+
+ library
+
+
+ specified
+
+
+
+
+ Library package
+
+
+ library
+
+
+ not specified
+
+
+
+
- An application package has type set to "application" and specifies a target because it needs to be deployed into the database. Contrary to this, a library package only registers XQuery or other modules, but no resources need to be stored into the db, so target is empty.
- The general metadata fields should not need to be explained. The relevant elements are:
+ An application package has type set to application and specifies
+ a target because it needs to be deployed into the database. Contrary to this, a
+ library package only registers XQuery or other modules, but no resources need to be stored
+ into the database, so target is empty.
+
-
- type
-
- Should be set to either "application" or "library". We assume a library has no GUI (i.e. no HTML view). A library will thus not be shown on the main Dashboard page, which only lists applications.
-
-
-
- target
-
- Specifies the collection where the contents of the package will be stored. Top-level files in the package will end up in this collection, resources in sub-directories will go into sub-collections. Please note that the target collection can be changed by the package manager during install. It is just a recommendation, not a requirement.
- The collection path should always be relative to the repository root collection defined in the configuration.
-
-
-
- permissions
-
- You can define package specific permissions in the repo.xml to use when uploading package contents like this:
-
- <permissions user="app-user" password="123" group="app-group" mode="rw-rw-r--"/>
- All resources and collections will be owned by the specified user and permissions will be changed to those given in mode. If the user does not exist, the deploy function will try to create it, using the password specified in attribute password.
- Concerning permissions, the execute ("x") flag will be set automatically on all XQuery files in addition to the default permissions defined in the descriptor. For more control over permissions, use a post-install XQuery script (see element "finish" below). It is generally recommended to specify users in this manner when a package requires write privileges to the database, and to use a custom user-group (i.e. not "dba"). To avoid conflicts with locally defined user-names and groupnames, packages that do not require write access can ommit permissions in their repo.xml, such packages will be assigned to the guest usergroup by default.
-
-
-
- prepare
-
- Points to an XQuery script inside the root of the package archive, which will be executed before any package data is uploaded to the database. By convention the XQuery script should be called pre-install.xql, though this is not a requirement.
- If you create a package via eXide, it will generate a default pre-install.xql which uploads the default collection configuration to the system collection. This needs to be done before deployment to guarantee that index definitions are applied when data is uploaded to the db.
- The target collection, the file system path to the current package directory and eXist-db's home directory are passed to the script as external variables:
-
- The script may use those variables to read files contained in the package.
-
-
-
- finish
-
- Like prepare, this element should point to an XQuery script, which will be executed after all data has been uploaded to the database. It receives the same external variables as the prepare script. The convention is to name the script post-install.xql.
- Use the finish trigger to run additional tasks or move data into different collections. For example, the XQuery function documentation app runs an indexing task from the finish trigger to extract documentation from all XQuery modules known to the db at the time.
-
-
-
- deployed
-
- This element will be set automatically when the package is deployed into a database instance. It is used by eXide to track changes and does not need to be specified in the original repo.xml descriptor.
-
-
+
+ type
+
+ Should be set to either application or library. We
+ assume a library has no GUI. A library will therefore not be shown on the main
+ Dashboard page, which only lists applications.
+
+
+
+ target
+
+ Specifies the collection where the contents of the package will be stored.
+ Top-level files in the package will end up in this collection, resources in
+ sub-directories will go into sub-collections. Please note that the target collection
+ can be changed by the package manager during install: it is a recommendation, not a
+ requirement.
+ The collection path should be relative to the repository root collection defined
+ in the configuration (usually /db/apps).
+
+
+
+ permissions
+
+ You can define package specific permissions in the repo.xml to
+ use when uploading package contents like this:
+ <permissions user="app-user" password="123" group="app-group" mode="rw-rw-r--"/>
+ All resources and collections will be owned by the specified user and permissions
+ will be changed to those given in mode. If the user does not exist,
+ the deploy function will try to create it, using the password specified in the
+ password attribute.
+ Concerning permissions: the execute x flag will be set automatically
+ on all XQuery files, in addition to the default permissions defined in the descriptor.
+ For more control over permissions, use a post-install XQuery script (see element
+ finish below).
+ It is generally recommended to specify users in this manner when a package
+ requires write privileges to the database, and to use a custom user-group (i.e. not
+ dba). To avoid conflicts with locally defined user-names and group
+ names, packages that do not require write access can omit permissions in their
+ repo.xml. These packages will be assigned to the
+ guest user group by default.
+
+
+
+ prepare
+
+ Points to an XQuery script inside the root of the package archive, which will be
+ executed before any package data is uploaded to the database. By convention the XQuery
+ script is called pre-install.xql, but this is not a
+ requirement.
+ If you create a package via eXide, it will generate a default
+ pre-install.xql which uploads the default collection
+ configuration to the system collection. This needs to be done before deployment to
+ guarantee that index definitions are applied when data is uploaded to the db.
+ The target collection, the file system path to the current package directory, and
+ eXist-db's home directory are passed to the script as external variables:
+
+ The script can use these to read files contained in the package.
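Declared at the top of the script, the external variables follow the convention used by eXide-generated scripts (the names shown here are an assumption; verify against a generated pre-install.xql):

```xquery
declare variable $home external;    (: eXist-db's home directory :)
declare variable $dir external;     (: file system path to the unpacked package :)
declare variable $target external;  (: the target collection, e.g. /db/apps/myapp :)
```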
+
+
+
+ finish
+
+ Like prepare, this element must point to an XQuery script. This will be
+ executed after all data has been uploaded to the database. It
+ receives the same external variables as the prepare script. The convention is to name
+ the script post-install.xql.
+ Use the finish script to run additional tasks or move data into
+ different collections. For example, the XQuery function documentation app runs an
+ indexing task from the finish trigger to extract documentation from all XQuery modules
+ known to the database at the time.
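A finish script receives the same external variables as the prepare script; a minimal sketch (the collection names below are purely illustrative) could be:

```xquery
xquery version "3.1";

(: Sketch of a post-install.xql; runs after all package data has been
   uploaded. The collection names are illustrative. :)
declare variable $home external;
declare variable $dir external;
declare variable $target external;

(: e.g. move uploaded sample data out of the app collection :)
xmldb:move($target || "/sample-data", "/db/data")
```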
+
+
+
+ deployed
+
+ This element is set automatically when the package is deployed into a database
+ instance. It is used by eXide to track changes and does not need to be specified in
+ the original repo.xml descriptor.
+
+
-
-
-
-
-
- Configuring the repository root
-
- The root collection for deployed packages can be configured in
- conf.xml:
- <repository root="/db/apps"/>
- The install location specified in the target element of
- repo.xml will always be relative to this root
- collection.
- eXist-db's URL rewriting is by default configured to map any path starting with
- /apps to the repository root collection. Check
- webapp/WEB-INF/controller-config.xml and the URL rewriting
- documentation.
-
-
-
-
-
- Programmatically installing packages
-
- The repo XQuery module provides a number of functions to
- programmatically install, remove or inspect packages. The Dashboard Package Manager
- relies on the same functions.
- The module distinguishes between installation and
- deployment steps. The reason for this distinction is: while
- the installation process is standardized by the EXPath packaging specification, the
- deployment step is implementation defined and specific to eXist-db.
- installation will register a package with the EXPath
- packaging system, but not copy anything into the database. Deployment
- will deploy the application into the database as specified by the
- repo.xml descriptor.
- The most convenient way to install a package are the
- repo:install-and-deploy and
- repo:install-and-deploy-from-db functions. repo:install-and-deploy
- downloads the specified package from a public repository. For example, one can
- install the eXist-db demo apps using the following call:
- repo:install-and-deploy("http://exist-db.org/apps/demo", "0.2.2", "http://demo.exist-db.org/exist/apps/public-repo/modules/find.xql")
- The first parameter denotes the unique name of the package to install. The second
- may contain a specific version or the empty sequence. The third parameter is the URI
- for the public repository API. The function call will download, install and deploy
- the package as well as any dependencies it defines. If the installation succeeds,
- an element will be returned to indicate the target collection into which the package
- was deployed.
- The repo:install-and-deploy-from-db function works in a similar way,
- but reads the package data to install from a resource stored in the database.
- To uninstall a package, you should first call repo:undeploy, followed
- by repo:remove, e.g.:
- repo:undeploy("http://exist-db.org/apps/demo"), repo:remove("http://exist-db.org/apps/demo")
- To list all installed packages, call repo:list, which will return a
- the unique name of every installed package.
-
-
-
-
-
- Running your own repository
-
- You can run your private repository and install packages from it, e.g. to
- distribute applications to your customers. The eXist-db repository is implemented by
- the application package http://exist-db.org/apps/public-repo. The code
- can be downloaded from the eXist-db
- GitHub repo.
- Once you have built and installed the app, you can upload the package xars you wish to
- distribute into the collection public-repo/public. To make the
- uploaded xars available, run the query
- public-repo/modules/update.xql once as an admin user. This
- will create a document apps.xml in
- public-repo/public.
-
-
-
-
-
- General considerations when writing a package
-
- Packages should be portable and should thus not make any assumptions about the
- collection path in which they will be installed. In general it is best to use
- relative paths throughout XQuery modules, in particular for import statements. Just
- as a reminder: a relative path in an "import module namespace..." expression is
- always relative to the XQuery which contains the import.
- If an XQuery needs to access data provided by another package, it should locate
- the other package by its package name and not by using an absolute collection path
- which may change in the future. For example, if an application requires access to
- data stored in another package called "data-pkg", it could define a variable to
- point to the correct collection as follows:
-
-
+
+
+
+
+
+
+
+ General considerations when writing a package
+
+ Packages should be portable and should not make any assumptions about the collection path
+ in which they will be installed. It is therefore best to use relative paths throughout XQuery modules, in
+ particular for import statements. Just as a reminder: a relative path in an "import module
+ namespace..." expression is always relative to the XQuery which contains the import.
+ If an XQuery needs to access data provided by another package, it should locate the other
+ package by its package name and not by using an absolute collection path which may change in
+ the future. For example, if an application requires access to data stored in another package
+ called "data-pkg", it could define a variable to point to the correct collection as
+ follows:
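For instance, the variable could be defined along these lines; this is a sketch which assumes the data package was deployed with the target collection name "data-pkg", and uses repo:get-root() to resolve the repository root rather than hard-coding it:

```xquery
(: Sketch: locate another package's collection by package name rather
   than a hard-coded path. "data-pkg" is the hypothetical target
   collection of the data package; repo:get-root() returns the
   configured repository root (normally /db/apps). :)
declare variable $data-collection :=
    repo:get-root() || "data-pkg";
```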
+
+
+
+
+
+
+ Configuring the repository root
+
+ The root collection for deployed packages can be configured in
+ conf.xml:
+ <repository root="/db/apps"/>
+ The install location specified in the target element of
+ repo.xml will always be relative to this root collection.
+ eXist-db's URL rewriting is by default configured to map any path starting with
+ /apps to the repository root collection. Check
+ webapp/WEB-INF/controller-config.xml and the URL rewriting
+ documentation.
+
+
+
+
+
+ Programmatically installing packages
+
+ The repo XQuery module provides a number of functions to
+ programmatically install, remove or inspect packages. The Dashboard Package Manager relies on
+ the same functions.
+ The module distinguishes between installation and
+ deployment steps. The reason for this distinction is: while the
+ installation process is standardized by the EXPath packaging specification, the deployment
+ step is implementation defined and specific to eXist-db. Installation
+ will register a package with the EXPath packaging system, but not copy anything into the
+ database. Deployment will deploy the application into the database as
+ specified by the repo.xml descriptor.
+ The most convenient way to install a package is via the repo:install-and-deploy
+ and repo:install-and-deploy-from-db functions.
+ repo:install-and-deploy downloads the specified package from a public
+ repository. For example, one can install the eXist-db demo apps using the following
+ call:
+ repo:install-and-deploy("http://exist-db.org/apps/demo", "0.2.2", "http://demo.exist-db.org/exist/apps/public-repo/modules/find.xql")
+ The first parameter denotes the unique name of the package to install. The second may
+ contain a specific version or the empty sequence. The third parameter is the URI for the
+ public repository. The function call will download, install and deploy the package as well as
+ any dependencies it defines. If the installation succeeds, an element will be returned to
+ indicate the target collection into which the package was deployed.
+ The repo:install-and-deploy-from-db function works in a similar way, but
+ reads the package data to install from a resource stored in the database.
+ To uninstall a package, you should first call repo:undeploy, followed by
+ repo:remove. For instance:
+ repo:undeploy("http://exist-db.org/apps/demo"), repo:remove("http://exist-db.org/apps/demo")
+ To list all installed packages, call repo:list. This will return the unique
+ name of every installed package.
+
+
+
+
+
+ Running your own repository
+
+ You can run your private repository and install packages from it, for instance to
+ distribute applications to your customers. The eXist-db repository is implemented by the
+ application package http://exist-db.org/apps/public-repo. The code can be
+ downloaded from the eXist-db
+ GitHub repo.
+ Once you have built and installed the app, you can upload the package .xar
+ files you wish to distribute into the collection public-repo/public. To
+ make the uploaded .xar files available, run the query
+ public-repo/modules/update.xql once as an admin user. This
+ will create a document apps.xml in
+ public-repo/public.
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/scheduler/scheduler.xml b/src/main/xar-resources/data/scheduler/scheduler.xml
index 9461b204..86a14862 100644
--- a/src/main/xar-resources/data/scheduler/scheduler.xml
+++ b/src/main/xar-resources/data/scheduler/scheduler.xml
@@ -1,586 +1,558 @@
-
- Scheduler Module
- September 2009
-
- TBD
-
-
-
-
-
-
+
+ Scheduler Module
+ 1Q18
+
+ operations
+ application-development
+
+
+
+
+
+ eXist-db has a scheduler based on Quartz, a full-featured, open source job scheduling system.
+ This article will explain how to use the scheduler.
+
+
+
+
+ Activating the Scheduler
+
+ There are two types of uses for the Quartz Scheduler within eXist-db. The first is always
+ active and is invoked at system startup. The second is the XQuery function module for
+ initiating and managing scheduled jobs. This second use is not activated out-of-the-box. It has to be activated through the
+ conf.xml file:
+
+
+
+ System Start-up
+
+ The jobs that are initiated at system start-up are invoked by the settings in the
+ scheduler element in the conf.xml file. There is a full set of
+ instructions with (commented out) example jobs here.
+
+
+
+ XQuery Function Module
+
+ The scheduler XQuery function module is activated by uncommenting the following in
+ conf.xml:
+
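The relevant entry sits in the builtin-modules section of conf.xml and looks roughly like the following; check your own conf.xml for the exact commented-out line, as the class name may differ between versions:

```xml
<!-- Sketch of the scheduler module entry in conf.xml; verify the
     class name against your installation. -->
<module uri="http://exist-db.org/xquery/scheduler"
        class="org.exist.xquery.modules.scheduler.SchedulerModule"/>
```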
+ Once the scheduler XQuery function module is active, XQuery code can be written to
+ invoke and manage user type jobs (see ).
+
+
+
+
+
+
+
+
+
+ Job Types
+
+ There are three types of scheduler jobs:
+
+
+ startup
+
+ Start-up jobs are executed once during the database start-up, but before the
+ database becomes available. These jobs are synchronous.
+
+
+
+ system
+
+ System jobs require the database to be in a consistent state. All database
+ operations will stop until the method returns or throws an exception. Any exception will
+ be caught and a warning is written to the log.
+
+
+
+ user
+
+ User jobs may be scheduled at any time and may be mutually exclusive or
+ non-exclusive.
+
+
+
+
+
+
+
+
+ Java vs. XQuery Jobs
+
+ The system makes a distinction between XQuery and Java jobs:
+
+
+
+ XQuery Jobs
+
+ XQuery jobs can only be user type jobs.
+ If the job is written in XQuery, the job resource should be a path to an XQuery
+ stored in the database, for instance /db/myCollection/myJob.xql.
+ XQuery jobs will be launched under the guest account initially. The running XQuery
+ can switch permissions through calls to xmldb:login().
+
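A scheduled XQuery job that switches away from the guest account could be sketched as follows; the account name and password are hypothetical:

```xquery
xquery version "3.1";

(: Sketch of a scheduled XQuery job. It starts under the guest
   account; the credentials below are hypothetical. :)
if (xmldb:login("/db", "job-runner", "secret"))
then
    xmldb:store("/db/reports", "last-run.xml",
        <last-run at="{current-dateTime()}"/>)
else
    util:log("WARN", "Scheduled job could not authenticate")
```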
+
+
+ Java Jobs
+
+ Java jobs can be startup, system or user type
+ jobs.
+ A Java job invoked from the XQuery function module has to extend the
+ org.exist.scheduler.UserJavaJob class. A start-up or system Java job that
+ is invoked from the conf.xml file implements
+ org.exist.storage.SystemTask.
+
+
+
+
+
+
+
+
+
+
+
+
+ Periodic Runs
+
+ A scheduled job can be run on a periodic basis. When the job is scheduled, you can specify
+ that it run every n milliseconds. There is the additional option to start after
+ x milliseconds and to only be repeated y times after the initial
+ execution.
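Using the scheduler XQuery module, a periodic user job can be set up along these lines; the resource path and job name are illustrative, so check the function signature in your eXist-db version:

```xquery
(: Sketch: run /db/myCollection/myJob.xql every 60 seconds, starting
   after a 10 second delay and repeating 5 times after the first
   execution. Path and job name are illustrative. :)
scheduler:schedule-xquery-periodic-job(
    "/db/myCollection/myJob.xql",
    60000,             (: period in milliseconds :)
    "my-periodic-job", (: job name :)
    (),                (: no job parameters :)
    10000,             (: delay before the first run :)
    5                  (: number of repeats :)
)
```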
+
+
+
+
+
+ Cron Tutorial
+
+
+ This section was copied from the OpenSymphony CronTriggers Tutorial.
+
+
+
+
+ Introduction
- Quartz is a full-featured, open source job scheduling system that has been
- integrated with eXist. Quartz can be used to create simple or complex schedules for
- executing tens, hundreds, or even tens-of-thousands of jobs. The Quartz Scheduler
- includes many enterprise-class features, such as JTA transactions and
- clustering.
-
-
-
-
- Activating
-
- There are two type of uses of the Quartz Scheduler within eXist-db. The first
- is always active and is invoked at system startup. The second is the XQuery
- function module for initiating and managing scheduled jobs. This second use is
- not activated out-of-the-box. It has to be
- activated through the conf.xml file.
-
- System Startup
-
- The jobs that are initiated at system startup are invoked by the settings
- within the <scheduler> tag in the conf.xml file. There is a full set of instructions with
- example jobs (that are commented out) in the file.
-
-
- XQuery Function Module
-
- The scheduler XQuery function module is activated by uncommenting the
- following in conf.xml:
-
- Scheduler XQuery Module in conf.xml
-
-
- Once the scheduler XQuery function module is active, then XQuery code can
- be written to invoke and manage user type
- jobs.
-
-
-
-
-
-
- Type
-
- The type of the job to schedule. Must be either "startup", "system" or
- "user".
-
-
- startup
-
- Startup jobs are executed once during the database startup but
- before the database becomes available, these jobs are
- synchronous.
-
-
-
- system
-
- System jobs require the database to be in a consistent state. All
- database operations will be stopped until the method returns or
- throws an exception. Any exception will be caught and a warning
- written to the log.
-
-
-
- user
-
- User jobs may be scheduled at any time and may be mutually
- exclusive or non-exclusive
-
-
-
-
-
-
-
-
-
- Java vs. XQuery Jobs
-
-
-
-
-
-
- Introduction
-
- There are two types of jobs that can be scheduled. They are Java and XQuery.
- Java jobs can be startup, system or user. XQuery jobs can only be user
- jobs.
-
-
-
-
-
- XQuery Jobs
-
- If the job is written in XQuery (not suitable for startup or system jobs)
- then this should be a path to an XQuery stored in the database. e.g. /db/myCollection/myJob.xql
-
- XQuery job's will be launched under the guest account initially, although the
- running XQuery may switch permissions through calls to xmldb:login().
-
-
-
-
-
- Java Jobs
-
- A Java job that is to be invoked from the XQuery function module has to be
- extended from the org.exist.scheduler.UserJavaJob class. A startup or system Java
- job that is invoked from the conf.xml file implements org.exist.storage.SystemTask.
-
- Sample Java Source File
-
-
-
-
-
-
-
-
- Periodic
-
-
-
-
-
-
- Introduction
-
- A scheduled job can be run on a periodic basis. When the job is scheduled, it
- can be specified to run every n milliseconds.
- There is the additional option to start after x milliseconds and to only be repeated y times after the initial execution.
-
-
-
-
-
-
- Cron
-
-
- This section was copied from OpenSymphony CronTriggers Tutorial.
-
-
-
-
-
- Introduction
-
- cron is a UNIX tool that has been around for a long time, so its scheduling
- capabilities are powerful and proven. The CronTrigger class is based on the
- scheduling capabilities of cron.
- CronTrigger uses "cron expressions", which are able to create firing schedules
- such as: "At 8:00am every Monday through Friday" or "At 1:30am every last Friday
- of the month".
- Cron expressions are powerful, but can be pretty confusing. This tutorial aims
- to take some of the mystery out of creating a cron expression, giving users a
- resource which they can visit before having to ask in a forum or mailing list.
-
-
-
-
-
-
- Format
-
- A cron expression is a string comprised of 6 or 7 fields separated by white
- space. Fields can contain any of the allowed values, along with various
- combinations of the allowed special characters for that field. The fields are as
- follows:
-
-
-
-
-
-
-
-
-
- Field Name
-
-
- Mandatory?
-
-
- Allowed Values
-
-
- Allowed Special Characters
-
-
-
-
- Seconds
-
-
- YES
-
-
- 0-59
-
-
- , - * /
-
-
-
-
- Minutes
-
-
- YES
-
-
- 0-59
-
-
- , - * /
-
-
-
-
- Hours
-
-
- YES
-
-
- 0-23
-
-
- , - * /
-
-
-
-
- Day of month
-
-
- YES
-
-
- 1-31
-
-
- , - * ? / L W
-
-
-
-
- Month
-
-
- YES
-
-
- 1-12 or JAN-DEC
-
-
- , - * /
-
-
-
-
- Day of week
-
-
- YES
-
-
- 1-7 or SUN-SAT
-
-
- , - * ? / L #
-
-
-
-
- Year
-
-
- NO
-
-
- empty, 1970-2099
-
-
- , - * /
-
-
-
-
-
- So cron expressions can be as simple as this: * * * * ? *
- or more complex, like this: 0 0/5 14,18,3-39,52 ? JAN,MAR,SEP MON-FRI
- 2002-2010
-
-
-
-
-
- Special characters
-
-
-
- * ("all values") - used to select all values within a field. For
- example, "*" in the minute field means "every minute".
-
-
- ? ("no specific value") - useful when you need to specify something
- in one of the two fields in which the character is allowed, but not the
- other. For example, if I want my trigger to fire on a particular day of
- the month (say, the 10th), but don't care what day of the week that
- happens to be, I would put "10" in the day-of-month field, and "?" in
- the day-of-week field. See the examples below for clarification.
-
-
- - - used to specify ranges. For example, "10-12" in the hour field
- means "the hours 10, 11 and 12".
-
-
- , - used to specify additional values. For example, "MON,WED,FRI" in
- the day-of-week field means "the days Monday, Wednesday, and Friday".
-
-
-
- / - used to specify increments. For example, "0/15" in the seconds
- field means "the seconds 0, 15, 30, and 45". And "5/15" in the seconds
- field means "the seconds 5, 20, 35, and 50". You can also specify '/'
- after the '' character - in this case '' is equivalent to having '0'
- before the '/'. '1/3' in the day-of-month field means "fire every 3 days
- starting on the first day of the month".
-
-
- L ("last") - has different meaning in each of the two fields in which
- it is allowed. For example, the value "L" in the day-of-month field
- means "the last day of the month" - day 31 for January, day 28 for
- February on non-leap years. If used in the day-of-week field by itself,
- it simply means "7" or "SAT". But if used in the day-of-week field after
- another value, it means "the last xxx day of the month" - for example
- "6L" means "the last friday of the month". When using the 'L' option, it
- is important not to specify lists, or ranges of values, as you'll get
- confusing results.
-
-
- W ("weekday") - used to specify the weekday (Monday-Friday) nearest
- the given day. As an example, if you were to specify "15W" as the value
- for the day-of-month field, the meaning is: "the nearest weekday to the
- 15th of the month". So if the 15th is a Saturday, the trigger will fire
- on Friday the 14th. If the 15th is a Sunday, the trigger will fire on
- Monday the 16th. If the 15th is a Tuesday, then it will fire on Tuesday
- the 15th. However if you specify "1W" as the value for day-of-month, and
- the 1st is a Saturday, the trigger will fire on Monday the 3rd, as it
- will not 'jump' over the boundary of a month's days. The 'W' character
- can only be specified when the day-of-month is a single day, not a range
- or list of days. The 'L' and 'W' characters can also be combined in the
- day-of-month field to yield 'LW', which translates to "last weekday of
- the month".
-
-
- # - used to specify "the nth" XXX day of the month. For example, the
- value of "6#3" in the day-of-week field means "the third Friday of the
- month" (day 6 = Friday and "#3" = the 3rd one in the month). Other
- examples: "2#1" = the first Monday of the month and "4#5" = the fifth
- Wednesday of the month. Note that if you specify "#5" and there is not 5
- of the given day-of-week in the month, then no firing will occur that
- month. The legal characters and the names of months and days of the week
- are not case sensitive. MON is the same as mon.
-
-
-
-
-
-
-
- Examples
-
- Here are some full examples:
-
-
-
-
-
-
-
- Expression
-
-
- Meaning
-
-
-
-
- 0 0 12 * * ?
-
-
- Fire at 12pm (noon) every day
-
-
-
-
- 0 15 10 * * ?
-
-
- Fire at 10:15am every day
-
-
-
-
- 0 15 10 * * ? *
-
-
- Fire at 10:15am every day
-
-
-
-
- 0 15 10 * * ? 2005
-
-
- Fire at 10:15am every day during the year 2005
-
-
-
-
- 0 * 14 * * ?
-
-
- Fire every minute starting at 2pm and ending at 2:59pm, every
- day
-
-
-
-
- 0 0/5 14 * * ?
-
-
- Fire every 5 minutes starting at 2pm and ending at 2:55pm, every
- day
-
-
-
-
- 0 0/5 14,18 * * ?
-
-
- Fire every 5 minutes starting at 2pm and ending at 2:55pm, AND fire
- every 5 minutes starting at 6pm and ending at 6:55pm, every day
-
-
-
-
- 0 0-5 14 * * ?
-
-
- Fire every minute starting at 2pm and ending at 2:05pm, every
- day
-
-
-
-
- 0 10,44 14 ? 3 WED
-
-
- Fire at 2:10pm and at 2:44pm every Wednesday in the month of
- March.
-
-
-
-
- 0 15 10 ? * MON-FRI
-
-
- Fire at 10:15am every Monday, Tuesday, Wednesday, Thursday and
- Friday
-
-
-
-
- 0 15 10 15 * ?
-
-
- Fire at 10:15am on the 15th day of every month
-
-
-
-
- 0 15 10 L * ?
-
-
- Fire at 10:15am on the last day of every month
-
-
-
-
- 0 15 10 ? * 6L
-
-
- Fire at 10:15am on the last Friday of every month
-
-
-
-
- 0 15 10 ? * 6L
-
-
- Fire at 10:15am on the last Friday of every month
-
-
-
-
- 0 15 10 ? * 6L 2002-2005
-
-
- Fire at 10:15am on every last friday of every month during the years
- 2002, 2003, 2004 and 2005
-
-
-
-
- 0 15 10 ? * 6#3
-
-
- Fire at 10:15am on the third Friday of every month
-
-
-
-
- 0 0 12 1/5 * ?
-
-
- Fire at 12pm (noon) every 5 days every month, starting on the first
- day of the month.
-
-
-
-
- 0 11 11 11 11 ?
-
-
- Fire every November 11th at 11:11am.
-
-
-
-
-
- Pay attention to the effects of '?' and '*' in the day-of-week and
- day-of-month fields!
-
-
-
-
-
- Notes
-
-
-
- Support for specifying both a day-of-week and a day-of-month value is
- not complete (you must currently use the '?' character in one of these
- fields).
-
-
- Be careful when setting fire times between mid-night and 1:00 AM -
- "daylight savings" can cause a skip or a repeat depending on whether the
- time moves back or jumps forward.
-
-
-
-
+ cron is a UNIX tool that has been around for a long time, so its scheduling
+ capabilities are powerful and proven. The CronTrigger class is based on the
+ scheduling capabilities of cron.
+ CronTrigger uses "cron expressions", which are able to create
+ firing schedules such as: "At 8:00am every Monday through Friday" or "At 1:30am every last
+ Friday of the month".
+ cron expressions are powerful, but can be confusing. This tutorial aims to
+ take some of the mystery out of creating a cron expression, giving users a
+ resource which they can visit before having to ask in a forum or mailing list.
+
+
+
+
+
+ Format
+
+ A cron expression is a string comprised of 6 or 7 fields separated by white
+ space. Fields can contain any of the allowed values, along with various combinations of the
+ allowed special characters for that field. The fields are as follows:
+
+
+
+
+
+
+
+
+
+ Field Name
+
+
+ Mandatory?
+
+
+ Allowed Values
+
+
+ Allowed Special Characters
+
+
+
+
+ Seconds
+
+
+ YES
+
+
+ 0-59
+
+
+ , - * /
+
+
+
+
+ Minutes
+
+
+ YES
+
+
+ 0-59
+
+
+ , - * /
+
+
+
+
+ Hours
+
+
+ YES
+
+
+ 0-23
+
+
+ , - * /
+
+
+
+
+ Day of month
+
+
+ YES
+
+
+ 1-31
+
+
+ , - * ? / L W
+
+
+
+
+ Month
+
+
+ YES
+
+
+ 1-12 or JAN-DEC
+
+
+ , - * /
+
+
+
+
+ Day of week
+
+
+ YES
+
+
+ 1-7 or SUN-SAT
+
+
+ , - * ? / L #
+
+
+
+
+ Year
+
+
+ NO
+
+
+ empty, 1970-2099
+
+
+ , - * /
+
+
+
+
+
+ So cron expressions can be as simple as this: * * * * ? *
+
+ Or more complex, like this: 0 0/5 14,18,3-39,52 ? JAN,MAR,SEP MON-FRI
+ 2002-2010
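Within eXist-db, such an expression is typically passed to the scheduler module when registering a cron-triggered user job; a sketch, with an illustrative resource path:

```xquery
(: Sketch: fire the stored query at 10:15am every weekday.
   The resource path is illustrative. :)
scheduler:schedule-xquery-cron-job(
    "/db/myCollection/myJob.xql",
    "0 15 10 ? * MON-FRI")
```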
+
+
+
+
+
+ Special characters
+
+
+
+
+ * ("all values") - used to select all values within a field. For example,
+ "*" in the minute field means "every minute".
+
+
+
+ ? ("no specific value") - useful when you need to specify something in one
+ of the two fields in which the character is allowed, but not the other. For example, if
+ I want my trigger to fire on a particular day of the month (say, the 10th), but don't
+ care what day of the week that happens to be, I would put "10" in the day-of-month
+ field, and "?" in the day-of-week field. See the examples below for clarification.
+
+
+
+
+ - - used to specify ranges. For example, "10-12" in the hour field means
+ "the hours 10, 11 and 12".
+
+
+
+ , - used to specify additional values. For example, "MON,WED,FRI" in the
+ day-of-week field means "the days Monday, Wednesday, and Friday".
+
+
+
+ / - used to specify increments. For example, "0/15" in the seconds field
+ means "the seconds 0, 15, 30, and 45". And "5/15" in the seconds field means "the
+ seconds 5, 20, 35, and 50". You can also specify '/' after the '*' character - in this
+ case '*' is equivalent to having '0' before the '/'. '1/3' in the day-of-month field
+ means "fire every 3 days starting on the first day of the month".
+
+
+
+ L ("last") - has different meaning in each of the two fields in which it is
+ allowed. For example, the value "L" in the day-of-month field means "the last day of the
+ month" - day 31 for January, day 28 for February on non-leap years. If used in the
+ day-of-week field by itself, it simply means "7" or "SAT". But if used in the
+ day-of-week field after another value, it means "the last xxx day of the month" - for
+ example "6L" means "the last Friday of the month". When using the 'L' option, it is
+ important not to specify lists or ranges of values, as you'll get confusing results.
+
+
+
+
+ W ("weekday") - used to specify the weekday (Monday-Friday) nearest the
+ given day. As an example, if you were to specify "15W" as the value for the day-of-month
+ field, the meaning is: "the nearest weekday to the 15th of the month". So if the 15th is
+ a Saturday, the trigger will fire on Friday the 14th. If the 15th is a Sunday, the
+ trigger will fire on Monday the 16th. If the 15th is a Tuesday, then it will fire on
+ Tuesday the 15th. However if you specify "1W" as the value for day-of-month, and the 1st
+ is a Saturday, the trigger will fire on Monday the 3rd, as it will not 'jump' over the
+ boundary of a month's days. The 'W' character can only be specified when the
+ day-of-month is a single day, not a range or list of days. The 'L' and 'W' characters
+ can also be combined in the day-of-month field to yield 'LW', which translates to "last
+ weekday of the month".
+
+
+
+ # - used to specify "the nth" XXX day of the month. For example, the value
+ of "6#3" in the day-of-week field means "the third Friday of the month" (day 6 = Friday
+ and "#3" = the 3rd one in the month). Other examples: "2#1" = the first Monday of the
+ month and "4#5" = the fifth Wednesday of the month. Note that if you specify "#5" and
+ there are not 5 occurrences of the given day-of-week in the month, then no firing will occur that
+ month. The legal characters and the names of months and days of the week are not case
+ sensitive. MON is the same as mon.
+
+
+
+
+
+
+
+ Examples
+
+ Here are some full examples:
+
+
+
+
+
+
+
+ Expression
+
+
+ Meaning
+
+
+
+
+ 0 0 12 * * ?
+
+
+ Fire at 12pm (noon) every day
+
+
+
+
+ 0 15 10 * * ?
+
+
+ Fire at 10:15am every day
+
+
+
+
+ 0 15 10 * * ? *
+
+
+ Fire at 10:15am every day
+
+
+
+
+ 0 15 10 * * ? 2005
+
+
+ Fire at 10:15am every day during the year 2005
+
+
+
+
+ 0 * 14 * * ?
+
+
+ Fire every minute starting at 2pm and ending at 2:59pm, every day
+
+
+
+
+ 0 0/5 14 * * ?
+
+
+ Fire every 5 minutes starting at 2pm and ending at 2:55pm, every day
+
+
+
+
+ 0 0/5 14,18 * * ?
+
+
+ Fire every 5 minutes starting at 2pm and ending at 2:55pm, AND fire every 5
+ minutes starting at 6pm and ending at 6:55pm, every day
+
+
+
+
+ 0 0-5 14 * * ?
+
+
+ Fire every minute starting at 2pm and ending at 2:05pm, every day
+
+
+
+
+ 0 10,44 14 ? 3 WED
+
+
+ Fire at 2:10pm and at 2:44pm every Wednesday in the month of March.
+
+
+
+
+ 0 15 10 ? * MON-FRI
+
+
+ Fire at 10:15am every Monday, Tuesday, Wednesday, Thursday and Friday
+
+
+
+
+ 0 15 10 15 * ?
+
+
+ Fire at 10:15am on the 15th day of every month
+
+
+
+
+ 0 15 10 L * ?
+
+
+ Fire at 10:15am on the last day of every month
+
+
+
+
+ 0 15 10 ? * 6L
+
+
+ Fire at 10:15am on the last Friday of every month
+
+
+
+
+ 0 15 10 ? * 6L 2002-2005
+
+
+ Fire at 10:15am on the last Friday of every month during the years 2002,
+ 2003, 2004 and 2005
+
+
+
+
+ 0 15 10 ? * 6#3
+
+
+ Fire at 10:15am on the third Friday of every month
+
+
+
+
+ 0 0 12 1/5 * ?
+
+
+ Fire at 12pm (noon) every 5 days every month, starting on the first day of the
+ month.
+
+
+
+
+ 0 11 11 11 11 ?
+
+
+ Fire every November 11th at 11:11am.
+
+
+
+
+
+ Pay attention to the effects of '?' and '*' in the day-of-week and day-of-month
+ fields!
+
+
+
+
+
+ Notes
+
+
+
+ Support for specifying both a day-of-week and a day-of-month value is not complete
+ (you must currently use the '?' character in one of these fields).
+
+
+ Be careful when setting fire times between midnight and 1:00 AM - "daylight
+ savings" can cause a skip or a repeat depending on whether the time moves back or jumps
+ forward.
+
+
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/security/assets/usermanager.png b/src/main/xar-resources/data/security/assets/usermanager.png
new file mode 100644
index 00000000..bde52d1b
Binary files /dev/null and b/src/main/xar-resources/data/security/assets/usermanager.png differ
diff --git a/src/main/xar-resources/data/security/security.xml b/src/main/xar-resources/data/security/security.xml
index 39937950..0cb24826 100644
--- a/src/main/xar-resources/data/security/security.xml
+++ b/src/main/xar-resources/data/security/security.xml
@@ -1,824 +1,753 @@
-
- Security
- October 2012
-
- TBD
-
-
-
-
-
-
- Overview
-
- This article discusses eXist-db's security features and how to manage
- authentication, users, groups, passwords, permissions, and access controls.
- eXist-db's security infrastructure is built on a Unix permissions model with a
- single internal authentication realm, with additional support for access control
- lists and authentication using multiple realms through a central security
- manager.
-
-
-
-
-
- Security Manager
-
- eXist-db has a central Security Manager which is configured by the file
- /db/system/security/config.xml. This file, which is
- generated during database startup, defines what authentication realms are available
- to the Security Manager.
- This example Security Manager configuration file defines a URL to be used for
- authentication:
-
-
- /db/system/security/config.xml
-
-
-
- The Security Manager also features an authentication event listener that you can
- configure to call a custom XQuery module on each authentication event. For example,
- this configuration file would pass authentication events to a module,
- /db/security-events.xq, which performs actions when an
- authentication event occurs.
-
- Excerpt of a sample /db/system/security/config.xml
- illustrating configuration of the authentication event listener
-
-
- The XQuery module that receives the authentication events must be a library module
- in the http://exist-db.org/security/events namespace and must have a
- function called authentication(). This example sends a log
- message to the console.
-
- A sample module handler for authentication events, /db/security-events.xq
-
-
-
-
-
-
-
-
- Authentication Realms
-
- eXist-db always has an internal authentication realm, but also supports multiple
- authentication realms. This allows you to add one or more external realms which
- provide user and group authentication for eXist-db.
-
-
-
-
- Default Internal Realm
-
- The "eXist-db realm" is the default internal realm. By default this realm
- handles the 'SYSTEM', 'admin' and
- 'guest' users and 'DBA' and
- 'guest' groups. Any additional users or groups created
- in eXist-db will be added to this realm.
- Every eXist-db realm user has an account with a username, password, and other
- metadata that is stored in the database. Each user may belong to zero or more
- groups.
- User and group information for the eXist-db realm is maintained in the
- collection /db/system/security/exist.
-
- The security collections in /db/system/security should not be manually
- manipulated or read, rather they should be accessed via the SecurityManager
- class or the SecurityManager Module. Directly manipulation can lead to
- inconsistent state and security issues.
-
- The following is a sample user account document for "aretter" in
- the eXist-db realm:
-
- Account Document stored in /db/system/security/exist/accounts/aretter.xml
-
-
-
- As this example suggests, eXist-db does not store passwords in the clear, but
- rather stores hashed values of the passwords (in base64 encoding). eXist-db uses
- the RIPEMD-160 cryptographic hashing algorithm. Whenever a
- user supplies account credentials for authentication, the database applies
- RIPEMD-160 hash to the supplied password and compares it to the hash stored in
- the user's account document. Storing hashes of passwords is a best practice in
- security that provides a strong layer of security compared to storing passwords
- in the clear; the notion is that even if the hashed password is exposed to an
- attacker, it is difficult to derive the original password from the hash.
- Note that the /db/system/security collection is (by
- default) only readable and writable by the system or users in the dba group.
- The dba group is specially reserved for database
- administrators, and only dba users are allowed to create,
- remove or modify other users.
-
-
-
-
-
- LDAP Realm
-
- The LDAP Realm is enabled by default (in
- extensions/build.properties,
- include.feature.security.ldap is set to
- true.) To use the LDAP realm, add an LDAP realm element to
- /db/system/security/config.xml, as in this example:
-
- Sample /db/system/security/config.xml for
- LDAP
-
-
- Explanation of these elements:
-
-
- The default-username and default-password elements are used to
- communicate with the LDAP server if a non-LDAP user requests information
- from LDAP server.
-
-
- The search-* elements are mapping for names.
-
-
- The metadata-search-attribute elements are used for mapping LDAP
- account metadata onto eXist-db account metadata.
-
-
- The whitelist element contains allowed groups for authentication. The
- blacklist element contains groups that are not allowed.
-
-
- The transformation element contains actions to be performed after
- first authentication.
-
-
- If the config.xml file is configured correctly, then you
- should be able to authenticate by LDAP.
-
-
-
-
-
- OAuth Realm
-
- Due to the variation in implementations across OAuth providers, eXist-db
- developers have to create provider-specific Java libraries. eXist-db currently
- supports only Facebook and Google for OAuth authentication (see Facebook
- Authentication and Google's OAuth
- documentation.
- The OAuth Realm is not enabled by default in eXist-db. To enable the OAuth
- realm, set the include.feature.security.oauth property to
- true in
- extensions/local.build.properties, and rebuild
- eXist-db. Then edit web.xml and controller-config.xml to enable OAuthServlet.
- After startup eXist-db and add a realm element for OAuth to
- /db/system/security/config.xml, as in this
- example:
-
- Sample /db/system/security/config.xml for
- OAuth
-
-
- Explanation of these elements:
-
-
- Valid values for the service element's @provider attribute are 'facebook' and 'google'.
-
-
- @name unique (in database) name for application.
- @key and @secret should be given by OAuth provider.
-
-
- If configured correctly, you should be able to authenticate by OAuth.
-
-
-
-
-
- OpenID Realm
-
-
- OpenID is an authentication mechanism
- where the identity of the user is maintained by trusted external providers. This
- takes the burden in maintaining and securing passwords for users off of the eXist-db
- database and on to the Identity Provider (IdP).
-
-
- By default, the OpenID service is not built and thus is also not enabled. To
- recompile the source with OpenID enabled, edit local.build.properties in the
- extensions directory and change the include.feature.security.openid property
- from false to true. Then recompile.
-
-
- extensions/local.build.properties
-
-
-
- This extension compiles into
- lib/extensions/exist-security-openid.jar
- .
- Run eXist-db with that jar should enable extension. To disable service remove the jar and restart eXist-db.
-
-
-
-
-
-
- Legacy Internal Realm
-
- Before eXist-db 2.0, the internal security realm was maintained in a different
- manner. The details are included here for the purpose of informing decisions on
- migration.
- Every eXist-db database user has an account with a username, password and other
- information that is stored in the database. Furthermore, every user belongs to
- one or more groups - and respectively, every resource in the database is owned
- by a user and by a group. By default, the owner is set to the user who created
- the resource, and his primary group, but eXist-db allows for different permissions
- for the owner, the owner's group and others. However, only the owner of the
- resource (or dba users) can change these
- permissions.
- User and group information is found in the designated XML file
- /db/system/users.xml located in collection
- /db/system. This file is generated during database
- startup. The following is a simple example of a users.xml
- document:
-
-
- users.xml User Information
-
-
- As we see from this example, passwords are encrypted using an
- MD5 algorithm (e.g. user-1 has the
- MD5-encrypted password
- "7f0261c14d7d1b8e51680a013daa623e"). Therefore, whenever a user enters his or
- her password, the database generates an MD5 encryption and compares it to the
- encryption stored in users.xml. Since it is very difficult
- for users to guess the original password from the MD5 string, passwords in eXist-db
- should be sufficiently secure.
- Note that the /db/system collection is (by default) only
- readable by dba users (although it is possible to make it
- accessible by other users). The dba group is specially reserved
- for database administrators, and only dba users are allowed to
- create or remove users, or change permissions for other users.
-
- By default, access to an eXist-db database is disabled until a password is
- set for the administrator (see Changing the Administrator
- Password below for instructions). Since
- write permissions for files are granted to everyone, it
- is important to be careful about how you configure server access for users
- on a network or the Internet.
-
-
-
-
-
-
-
- Changing the Administrator Password
-
- When the database is started for the first time, two default users are created:
- "admin" and "guest". (The "admin" user is a member of the dba
- group, and therefore has administrative privileges; the "guest" user is a member of
- the group "guest" and is not an administrator). At this initial point, the "admin"
- password is set to null, and so access to the database is
- initially granted to everyone. To set restrictions on
- database access, you must first set a password for the "admin" user. To do this, use
- either the Admin Client or the User Manager
- in the Dashboard. If eXist-db is used for applications
- intended for online web publications, or as an embedded library, it is exposed it to
- potential exploitation. It is therefore strongly advised that you first change the
- admin password.
- The Admin Client graphical user interface has a dialog box for user management. To
- open this dialog box, enter Ctrl-U or select
- Tools » Edit users. A
- dialog box will appear, as shown here.
-
-
+ schematypens="http://purl.oclc.org/dsdl/schematron"?>
+
+ Security
+ October 2012
+
+ operations
+ application-development
+
+
+
+
+
+ This article discusses eXist-db's security features and how to manage authentication, users,
+ groups, passwords, permissions and access controls.
+ eXist-db's security infrastructure is built on the Unix permissions model. It uses a single
+ internal authentication realm. It has additional support for access control lists and
+ authentication using multiple realms through a central security manager.
+
+
+
+
+ The Security Manager
+
+ eXist-db has a central Security Manager which is configured by
+ /db/system/security/config.xml. This document, which is generated during
+ database start-up, defines what authentication realms are available to the Security
+ Manager.
+ For example, this Security Manager configuration file defines a URL for
+ authentication:
+
+
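+ Such a configuration might be sketched as follows; the
+ authentication-entry-point element name and the version attribute are
+ assumptions, not copied from a live config.xml:
+
+ ```xml
+ <!-- Hedged sketch of /db/system/security/config.xml; the element name
+      authentication-entry-point is illustrative. -->
+ <security-manager xmlns="http://exist-db.org/Configuration" version="2.0">
+     <!-- URL used as the authentication entry point -->
+     <authentication-entry-point>/authentication</authentication-entry-point>
+ </security-manager>
+ ```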
+ The Security Manager also features an authentication event listener that you can configure
+ to call a custom XQuery module on each authentication event. For example, this configuration
+ file would pass authentication events to module /db/security-events.xq,
+ which performs actions when an authentication event occurs:
+
+
+ The XQuery module that receives the authentication events must be a library module in the
+ http://exist-db.org/security/events namespace. It must have a function
+ authentication(). This example sends a log message to the console:
+
+
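+ A minimal sketch of such a module: the namespace URI and the
+ authentication() function name are required by the description above, while
+ the util:log call is one plausible way to send a message to the console.
+
+ ```xquery
+ xquery version "3.0";
+
+ (: Hedged sketch of /db/security-events.xq; the module prefix is arbitrary. :)
+ module namespace sec-ev = "http://exist-db.org/security/events";
+
+ (: Called by the Security Manager on each authentication event. :)
+ declare function sec-ev:authentication() {
+     util:log("info", "an authentication event occurred")
+ };
+ ```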
+
+
+
+
+ Authentication Realms
+
+ eXist-db always has an internal authentication realm. It also supports multiple external
+ authentication realms. This allows you to add one or more external realms which provide user
+ and group authentication.
+
+
+
+
+ Default Internal Realm
+
+ The "eXist-db realm" is the default internal realm. By default this realm handles the
+ SYSTEM, admin and guest users and DBA
+ and guest groups. Any additional users or groups created in eXist-db will be
+ added to this realm.
+ Every eXist-db realm user has an account with a username, password and other metadata
+ stored in the database. Each user can belong to zero or more groups. This user and
+ group information for the eXist-db realm is maintained in the collection
+ /db/system/security/exist.
+
+
+ The security collections in /db/system/security should
+ not be manually manipulated or read. They should be accessed via the
+ SecurityManager class or the SecurityManager Module. Direct
+ manipulation can lead to an inconsistent state and security issues.
+
+
+ The following is a sample user account document for "aretter" in the eXist-db
+ realm:
+
+
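+ The account document might look roughly like this sketch; the element and
+ attribute names are hypothetical and the hash value is a placeholder:
+
+ ```xml
+ <!-- Illustrative sketch of an eXist-db realm account document;
+      names and values are not taken from a live instance. -->
+ <account id="11">
+     <name>aretter</name>
+     <!-- base64-encoded RIPEMD-160 hash of the password -->
+     <password>{RIPEMD160}base64-hash-goes-here=</password>
+     <group name="dba"/>
+     <enabled>true</enabled>
+ </account>
+ ```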
+ As you can see, eXist-db does not store passwords in the clear. It stores hashed values
+ of the passwords (in base64 encoding). For this, eXist-db uses the
+ RIPEMD-160 cryptographic hashing algorithm.
+ Whenever a user supplies account credentials for authentication, the database applies
+ the RIPEMD-160 hash to the supplied password and compares it to the hash stored in the
+ user's account document. Storing hashes of passwords is a security best practice that
+ provides a strong layer of protection compared to storing passwords in the clear: even
+ if the hashed password is exposed to an attacker, it is difficult to derive the original
+ password from the hash.
+ Note that the /db/system/security collection is (by default) only
+ readable and writable by the system or users in the dba group. The
+ dba group is reserved for database administrators, and only
+ dba users are allowed to create, remove or modify other users.
+
+
+
+
+
+ LDAP Realm
+
+ The LDAP Realm is enabled by default: in extensions/build.properties,
+ include.feature.security.ldap is set to true. To use
+ the LDAP realm, add an LDAP realm element to
+ /db/system/security/config.xml. For example:
+
+
+ Explanation of the elements used:
+
+
+ The default-username and default-password elements are used to
+ communicate with the LDAP server if a non-LDAP user requests information from the
+ LDAP server.
+
+
+ The search-* elements define mappings for names.
+
+
+ The metadata-search-attribute elements are used for mapping LDAP account
+ metadata onto eXist-db account metadata.
+
+
+ The whitelist element contains the allowed groups for authentication. The
+ blacklist element contains groups that are not allowed.
+
+
+ The transformation element contains actions to be performed after first
+ authentication.
+
+
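+ Putting the elements explained above together, an LDAP realm element might be
+ sketched as follows; the nesting, the context wrapper and all values are
+ illustrative assumptions:
+
+ ```xml
+ <!-- Hedged sketch of an LDAP realm element for config.xml. -->
+ <realm id="LDAP">
+     <context>
+         <url>ldap://directory.example.com:389</url>
+         <!-- credentials used when a non-LDAP user queries the LDAP server -->
+         <default-username>cn=exist,dc=example,dc=com</default-username>
+         <default-password>secret</default-password>
+         <search>
+             <base>dc=example,dc=com</base>
+             <!-- map LDAP account metadata onto eXist-db account metadata -->
+             <metadata-search-attribute key="mail">mail</metadata-search-attribute>
+             <!-- groups allowed / denied authentication -->
+             <whitelist><principal>staff</principal></whitelist>
+             <blacklist><principal>contractors</principal></blacklist>
+         </search>
+         <!-- actions performed after first authentication -->
+         <transformation>
+             <add-group>group.users</add-group>
+         </transformation>
+     </context>
+ </realm>
+ ```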
+
+
+
+
+
+ OAuth Realm
+
+ Due to the variation in implementations across OAuth providers, eXist-db developers have
+ to create provider-specific Java libraries. eXist-db currently supports only Facebook and
+ Google for OAuth authentication (see Facebook Authentication and Google's OAuth documentation).
+ The OAuth Realm is not enabled by default. To enable the OAuth realm, set the
+ include.feature.security.oauth property to true in
+ extensions/local.build.properties and rebuild eXist-db. Then edit web.xml and
+ controller-config.xml to enable OAuthServlet. After restarting
+ eXist-db, add a realm element for OAuth to
+ /db/system/security/config.xml. For example:
+
+
+ Explanation of these elements:
+
+
+ Valid values for the service element's provider attribute are
+ facebook and google.
+
+
+ The name attribute must be a unique name for the application.
+ The key and secret attribute values must be provided by
+ the OAuth provider.
+
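+ Combining the attributes explained above, an OAuth realm element might be
+ sketched like this; the realm id and all values are placeholders, and the key
+ and secret must come from the provider:
+
+ ```xml
+ <!-- Hedged sketch of an OAuth realm element for config.xml. -->
+ <realm id="OAuth">
+     <service provider="facebook" name="my-app"
+              key="APP-KEY-FROM-PROVIDER"
+              secret="APP-SECRET-FROM-PROVIDER"/>
+ </realm>
+ ```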
+
+
+
+
+
+
+ OpenID Realm
+
+
+ OpenID is an authentication mechanism where the identity of the
+ user is maintained by trusted external providers. This takes the burden of maintaining and
+ securing passwords for users off the eXist-db database and moves it to the Identity
+ Provider (IdP).
+ By default, the OpenID service is not built and therefore not enabled. To recompile the
+ source with OpenID enabled, edit local.build.properties in the extensions
+ directory and change the include.feature.security.openid property from false to
+ true. Then rebuild eXist-db.
+
+
+ This extension compiles into lib/extensions/exist-security-openid.jar.
+ Running eXist-db with that .jar should enable the extension. To disable the
+ service, remove the .jar and restart eXist-db.
+
+
+
+
+
+
+
+ Changing the Administrator Password
+
+ When the database is started for the first time, two default users are created:
+ admin and guest. The admin user is a member of the
+ dba group, and therefore has administrative privileges; the
+ guest user is a member of the group guest and is not an
+ administrator.
+ At this point, the admin password is set to null, and so
+ access to the database is initially granted to everyone.
+ To set restrictions on database access, first set a password for the admin
+ user in one of the following ways:
+
+
+ The Java Admin Client GUI has a dialog
+ box for user management. To open this dialog box, enter Ctrl-U
+ or select Tools, Edit Users:
+
+
-
+
-
-
-
-
- At the top, select the "admin" user in the table of users
-
-
+
+
+
+
+ At the top, select the admin user in the table of users
+
+ Type your password into the corresponding fields
-
-
- Click the "Modify User" button to apply the changes
-
-
- You can also set a new administrator password on the command line in a console or
- shell. Enter the following at the command prompt:
-
- Setting an Administrator Password
-
-
- Now that the password is set, access control is enabled. To start the shell-mode client as an
- administrator, you must specify the -u option, following these
- steps:
-
-
- For Windows and Mac users, double-click on the desktop shortcut icon (if
- created) or by selecting the shortcut icon from the start menu
-
-
- OR enter the following in a Unix shell or DOS/Windows command
- prompt:
- bin\client.bat -u admin (DOS/Windows)
- bin/client.sh -u admin (Unix)
-
-
- The other default user, "guest", also has the password "guest". The guest identity
- is internally assigned to all clients that have not authenticated themselves. For
- example, the Xincon WebDAV interface does not support authentication, so "guest" is
- assumed for its users by default. Note that this aspect of WebDAV is a potential
- source of confusion, and you have to be careful about setting read/write permissions
- for this API.
-
-
-
-
-
- Creating Users
-
- It is easy to create new users using the Admin Client. In the Edit
- users dialog box, fill in the Username,
- Password, Password (repeat), and
- Home-collection fields, and assign a group (or groups) for the
- new user. Finally, select Create User. The new user will
- appear in the list of users in the top panel.
- The adduser command also allows you to create additional users.
- The command asks for a password and a list of groups to which the user should
- belong. An example is shown below:
-
- Creating a New User
-
-
- To check that the user has been added, use the command users to
- display a list of all known database users.
-
-
-
-
-
- Resource Permissions
-
- eXist-db has supports both a Unix like permissions model and simple Access Control
- Lists. It is important to understand the Unix permission model first, and then
- consider Access Control Lists, should the Unix Model not prove sufficient for your
- application.
-
-
-
-
- Unix Model
-
- The default that is based on the UNIX read,
- write and execute flags for owner,
- group and world. In versions prior to eXist-db 2.0, there was no execute flag, rather an update flag was present.
-
-
-
-
-
-
-
-
-
-
-
-
- Category
-
-
- Description
-
-
-
-
-
-
- Owner
-
-
- These permissions work for the owner of the resource
-
-
-
-
- Group
-
-
- These permissions work for the members of the group of the
- resource
-
-
-
-
- World
-
-
- These permissions work for any user
-
-
-
-
-
-
- Please be aware that permissions for collections are
- NOT inherited by their sub-collections, i.e., write
- permissions can be set for some sub-collections, but you must also have
- write permissions for the parent collection for these to be
- effective.
-
- Using the Java Admin Client or the command line, you can list the permissions
- assigned to each resource (this assumes the permissions
- property in client.properties is set to
- true). An example listing is shown below:
-
- Resource Permission Settings
-
-
- As shown on the left-hand side, the Java Admin Client displays resource permissions
- in a format similar to the output of the Unix ls -l command:
- a ten-character code. The first character represents the type of resource:
- - (hyphen) for documents (files) and d for
- collections. The next three characters are the permissions for the user: a
- - (hyphen) is used for denied permissions,
- r for read, w for write, and
- x for execute. The next three characters (five through
- seven) set the permissions for groups, and the last three for others (i.e.
- anyone else who can access the database). Given the previous example, we can see
- that all files except r_and_j.xml are owned by user "admin"
- and group "dba".
- As mentioned in the previous section, the database root collection
- /db initially has permissions set to
- drwxr-xr-x, i.e. full access is granted to everyone. Also
- note that -rw-r--r-- is the default setting for all newly
- created resources, i.e. the owner has read/write permissions but not execute,
- and the group and others (world) has read permissions.
-
- Changing Resource Permissions
-
- Permissions can be changed using either the Edit
- Properties dialog box (shown below) in the Admin
- Client or the chmod command in the shell window. The
- Edit Properties dialog box is opened by selecting
- Files »Resource
- Properties from the menu, OR by clicking on the
- Properties Icon (image with checkboxes) in the toolbar.
- This dialog box shows permission settings for all database users and
- groups.
-
-
-
-
-
-
-
-
- Please note that only the owner of a resource or members of the
- dba group are allowed to change permissions. All other
- users who attempt to change these settings will receive an error
- message.
-
- On the command line, you can use the chmod command to
- change permissions. This command expects two parameters:
-
-
- chmod
-
- Parameters:
-
-
- Name of a resource or collection
-
-
- Read, write and execute permissions to set or remove
- (+ or -
- respectively, for the specified user, group, or other
- according to the following syntax:
-
-
- chmod [resource] [user|group|other]=[+|-][read|write|execute][, ...]
-
-
-
- For example, to grant the write permission to the group and deny all to
- others, you may use:
- chmod hamlet.xml group=+write,other=-read,-execute,-write
- If you do not specify a resource in the first argument of the
- chmod command, the permission string will be applied
- to the current collection. This is an important feature if you want to
- change permissions for the /db root collection, which
- would otherwise not be possible. For example, to deny write permissions to
- others for the entire database, change directory to the root collection
- (i.e. enter cd ..) and enter:
- chmod other=-write
-
-
- Changing Resource Ownership
-
- Only the owner has full control over a resource, and it is sometimes
- important to change this ownership. The Admin Client provides the
- chown command to do this. The command expects three
- arguments:
-
-
- chown
-
- Arguments:
-
-
- Name of the user.
-
-
- Name of the group.
-
-
- Name of the resource.
-
-
- chown [user] [group] [resource]
-
-
-
- For example, to change the owner of the file
- r_and_j.xml, you would do the following:
-
- Changing Ownership
-
-
-
-
-
-
-
-
- Access Control Lists (ACL)
-
- To be written. More information about ACLs is available as slides (PDF) and a presentation on YouTube.
-
-
-
-
-
-
- Permission Checks
-
- Each operation in eXist-db enforces permission checks. The details of the
- permissions required for an operation are laid out below. These permissions should
- align with a strict Unix model, but if they are found to be incorrect or lacking, please
- let the project know immediately.
-
- When copying a collection, permissions are checked for each sub-collection
- and resource.
- Copying a sub-collection requires r-x on the sub-collection,
- and rwx on the destination collection, and if the
- sub-collection already exists in the destination then r-x is required on
- that.
- Copying resources from a collection requires r-- on the
- resource, and -w- on the destination resource if it exists,
- otherwise -w- on the destination collection.
-
+
+
+ Click the Modify User button to apply the
+ changes
+
+
+
+
+ In the Dashboard's User Manager, click on the
+ user account that you wish to edit and type its password twice. Confirm by clicking
+ save.
+
+
+
+
+
+
+
+
+
+ You can also set a new administrator password on the command line in the console mode
+ of the Java Admin Client. Enter the following
+ at the command prompt:
+
+
+
+
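+ Such a console session might look like this; the passwd command
+ name and the prompt format are assumptions about the Admin Client's console
+ mode:
+
+ ```
+ exist:/db> passwd admin
+ password: *****
+ re-enter password: *****
+ ```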
+
+ The other default user, guest, has the default password guest.
+ The guest identity is internally assigned to all clients that have not authenticated
+ themselves.
+
+ The Xincon WebDAV interface does not support authentication, so guest is
+ assumed for its users by default. Note that this aspect of WebDAV is a potential source of
+ confusion. You have to be careful about setting read/write permissions for this API.
+
+
+
+
+
+
+ Creating Users
+
+ It is easy to create new users using the Java Admin
+ Client. In the Edit users dialog box, fill in the
+ username, password, and home collection fields. Assign a group (or groups) for the new user.
+ Finally, select Create User. The new user will appear in the list
+ of users in the top panel.
+ The console adduser command also allows you to create additional users. The
+ command asks for a password and a list of groups to which the user should belong. An example
+ is shown below:
+
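+ A plausible adduser session; the username, group and prompt
+ wording are illustrative:
+
+ ```
+ exist:/db> adduser wolfgang
+ password: *****
+ re-enter password: *****
+ enter groups: users
+ user 'wolfgang' created
+ ```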
+ You can list all the users using the users command.
+
+
+
+
+
+ Resource Permissions
+
+ eXist-db supports both a Unix-like permissions model and simple Access Control Lists. It
+ is important to understand the Unix permission model first. Access Control Lists can be useful
+ when the Unix Model is insufficient for your application.
+
+
+
+
+ Unix Model
+
+ The default model is based on the UNIX read, write and execute flags for
+ owner, group and
+ world. (In versions prior to eXist-db 2.0, there was no
+ execute flag; instead, an update flag was present.)
+
+
+
+
+
+
+
+
+
+
+
+
+ Category
+
+
+ Description
+
+
+
+
+
+
+ Owner
+
+
+ These permissions work for the owner of the resource
+
+
+
+
+ Group
+
+
+ These permissions work for the members of the group of the resource
+
+
+
+
+ World
+
+
+ These permissions work for any user
+
+
+
+
+
+
+ Be aware that permissions for collections are not inherited by
+ their sub-collections: write permissions can be set for sub-collections, but you must also
+ have write permissions for the parent collection for these to be effective.
+
+ Using the Java Admin Client or its command
+ line, you can list the permissions assigned to each resource (this assumes the
+ permissions property in client.properties is set to
+ true). For example:
+
+
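+ A sketch of such a listing, consistent with the description that follows
+ (all files except r_and_j.xml owned by admin/dba); the
+ file and owner names for r_and_j.xml are illustrative:
+
+ ```
+ exist:/db/shakespeare/plays> ls
+ -rw-r--r--  admin     dba    hamlet.xml
+ -rw-r--r--  admin     dba    macbeth.xml
+ -rw-r--r--  wolfgang  users  r_and_j.xml
+ ```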
+ The Java Admin Client displays resource permissions in a format similar to the output of
+ the Unix ls -l command: a ten-character code.
+
+
+ The first character represents the type of resource: - (hyphen)
+ for documents (files) and d for collections.
+
+
+ The next three characters are the permissions for the user: a -
+ (hyphen) is used for denied permissions, r for read,
+ w for write, and x for execute.
+
+
+ The next three characters (five through seven) set the permissions for groups.
+
+
+ The last three are for others (i.e. anyone else who can access the database).
+
+
+
+ Given the previous example, we can see that all files except
+ r_and_j.xml are owned by user admin and group
+ dba.
+ As mentioned in the previous section, the database root collection
+ /db initially has its permissions set to drwxr-xr-x,
+ so full access is granted to everyone.
+ Also note that the default setting for all newly created resources is
+ -rw-r--r--: the owner has read/write permissions, not execute, and the
+ group and others (world) have read permissions only.
+
+
+ Changing Resource Permissions
+
+ Permissions can be changed using either the Edit Properties
+ dialog box (shown below) in the Java Admin
+ Client or the chmod command in its shell window.
+ The Edit Properties dialog box is opened by selecting
+ Files, Resource Properties from the menu, or by clicking on
+ the Properties Icon (image with checkboxes) in the toolbar.
+ This dialog box shows the permission settings for all database users and groups.
+
+
+
+
+
+
+
+
+ Please note that only the owner of a resource or members of the
+ dba group are allowed to change permissions. All other users who
+ attempt to change these settings will receive an error message.
+
+ On the command line, you can use the chmod command. This command
+ expects two parameters:
+ chmod [resource] [user|group|other]=[+|-][read|write|execute][, ...]
+
+
+ Name of a resource or collection
+
+
+ Read, write and execute permissions to set or remove (+ or
+ - respectively) for the specified user, group, or other, according
+ to the syntax shown above.
+
+
+
+ For example, to grant the write permission to the group and deny all to others:
+ chmod hamlet.xml group=+write,other=-read,-execute,-write
+ If you do not specify a resource in the first argument of the chmod
+ command, the permission string will be applied to the current
+ collection. This is an important feature if you want to change permissions
+ for the /db root collection.
+ For example, to deny write permissions to others for the entire database, change
+ directory to the root collection (enter cd /db) and enter:
+ chmod other=-write
+
+
+
+ Changing Resource Ownership
+
+ Only the owner has full control over a resource. The Java Admin Client provides the
+ chown command to change ownership. The command expects three
+ arguments:
+ chown [user] [group] [resource]
+
+
+ Name of the user.
+
+
+ Name of the group.
+
+
+ Name of the resource.
+
+
+
+ For example, to change the owner of the file r_and_j.xml:
+
+
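+ Following the chown [user] [group] [resource] syntax above, the
+ command might be (the user and group names are illustrative):
+
+ ```
+ exist:/db/shakespeare/plays> chown wolfgang users r_and_j.xml
+ ```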
+
+
+
+
+
+ Access Control Lists (ACL)
+
+ To be written. More information about ACLs is available as slides (PDF) and a presentation on YouTube.
+
+
+
+
+
+
+ Permission Checks
+
+ Each operation in eXist-db enforces permission checks. The details of the permissions
+ required for an operation are laid out below.
+
+ When copying a collection, permissions are checked for each sub-collection and
+ resource.
+ Copying a sub-collection requires r-x on the sub-collection and
+ rwx on the destination collection. If the sub-collection already exists
+ in the destination then r-x is required on that.
+ Copying resources from a collection requires r-- on the resource, and
+ -w- on the destination resource if it exists, otherwise
+ -w- on the destination collection.
+
+
+
+
+
+ Legacy Internal Realm
+
+ Before eXist-db 2.0, the internal security realm was maintained in a different manner. The
+ details are included here for the purpose of informing decisions on migration.
+ Every eXist-db database user has an account with a username, password and other
+ information that is stored in the database. Furthermore, every user belongs to one or more
+ groups; correspondingly, every resource in the database is owned by a user and by a group.
+ By default, the owner is set to the user who created the resource and that user's primary
+ group, but eXist-db allows for different permissions for the owner, the owner's group and
+ others. However, only the owner of the resource (or dba users) can change these
+ permissions.
+ User and group information is found in the designated XML file
+ /db/system/users.xml located in collection /db/system.
+ This file is generated during database start-up. The following is a simple example of a
+ users.xml document:
+
+
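+ A sketch of such a document; the element and attribute names are hypothetical,
+ and only the user-1 name and its MD5 hash are taken from the
+ description that follows:
+
+ ```xml
+ <!-- Illustrative sketch of a pre-2.0 /db/system/users.xml document. -->
+ <auth>
+     <users last-id="3">
+         <user uid="3" name="user-1" home="/db/home/user-1"
+               password="7f0261c14d7d1b8e51680a013daa623e">
+             <group>users</group>
+         </user>
+     </users>
+     <groups last-id="2">
+         <group gid="2" name="users"/>
+     </groups>
+ </auth>
+ ```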
+ As we see from this example, passwords are hashed using the MD5
+ algorithm (e.g. user-1 has the MD5-hashed password
+ "7f0261c14d7d1b8e51680a013daa623e"). Whenever a user enters his or her password,
+ the database generates an MD5 hash and compares it to the hash stored in
+ users.xml. Since it is very difficult to derive the original
+ password from the MD5 string, passwords in eXist-db should be sufficiently secure.
+ Note that the /db/system collection is (by default) only readable by
+ dba users (although it is possible to make it accessible by other users).
+ The dba group is specially reserved for database administrators, and only
+ dba users are allowed to create or remove users, or change permissions
+ for other users.
+
+ By default, access to an eXist-db database is disabled until a password is set for the
+ administrator (see Changing the Administrator Password below for
+ instructions). Since write permissions for files are granted to everyone,
+ it is important to be careful about how you configure server access for users on a network
+ or the Internet.
+
+
\ No newline at end of file
diff --git a/src/main/xar-resources/data/validation/validation.xml b/src/main/xar-resources/data/validation/validation.xml
index ff888184..86554f39 100644
--- a/src/main/xar-resources/data/validation/validation.xml
+++ b/src/main/xar-resources/data/validation/validation.xml
@@ -134,7 +134,7 @@
Jing
- Each of these options are discussed in the following sections. Consult the XQuery Function Documentation for detailed functions
+ Each of these options is discussed in the following sections. Consult the XQuery Function Documentation for detailed function
descriptions.
diff --git a/src/main/xar-resources/data/xquery/xquery.xml b/src/main/xar-resources/data/xquery/xquery.xml
index 400d27b6..4acb6d4f 100644
--- a/src/main/xar-resources/data/xquery/xquery.xml
+++ b/src/main/xar-resources/data/xquery/xquery.xml
@@ -125,7 +125,7 @@
Schema-related Features (validate and import schema). eXist-db's XQuery processor does
- currently not support the schema import and schema validation features defined as optional in the XQuery specification. eXist-db provides extension functions to perform XML validation. The database does not store type information along with the nodes. It
+ not currently support the schema import and schema validation features defined as optional in the XQuery specification. eXist-db provides extension functions to perform XML validation. The database does not store type information along with the nodes. It
therefore cannot know the typed value of a node and has to assume xs:untypedAtomic. This is the behaviour defined by
the XQuery specification.