regions.tex

\chapter{Regions}
\label{chap:regions}

Regions are the primary abstraction for managing data in Legion.  Futures,
which the examples in Chapter~\ref{chap:tasks} emphasize, are for passing small amounts of data
between tasks. Regions are for holding and processing bulk data.

Because data placement and movement are crucial to performance in modern machines,
Legion provides extensive facilities for managing regions.  These features are a
distinctive aspect of Legion and also probably the most novel and unfamiliar 
to new Legion programmers.  Most programming systems hide the placement,
movement and organization of data; in Legion, these operations are exposed to
the application.

Figure~\ref{fig:lr1} shows a very simple program that
creates a {\em logical region}.  A logical region is a table (or,
equivalently, a relation), with an {\em index space} defining the rows
and a {\em field space} defining the columns. The example
in Figure~\ref{fig:lr1} illustrates a number of points:

\begin{itemize}

\item An {\tt IndexSpace} defines a set of indices for a region.  The {\tt create\_index\_space}
call in this program creates a index space with 100 elements.  Multidimensional index spaces can be
created from multidimensional {\tt Rect}s.

\item Field spaces are created in a manner analogous to index spaces.
  Unlike indices, whose size must be declared, there is a global upper
  bound on the number of fields in a field space (and exceeding this bound will cause
  the Legion runtime to report an error).  This particular
  field space has only a single field {\tt FIELD\_A}.  Note that each field has an associated type, the
  size of which is the first argument to {\tt allocate\_field}.

\item Once the index space and field space are created, they are used to create
a logical region {\tt lr1}.  A second call to {\tt create\_logical\_region}
creates a separate logical region {\tt lr2}.  It is very common to build
multiple logical regions with either the same index space, field space or both.
By providing separate steps for creating the field and index spaces prior to creating
a logical region, application programmers can reuse them in the creation of multiple
regions, thereby making it easier to keep all the regions in sync as the program 
evolves.
\end{itemize}

Logical regions never hold any data.  In
fact, logical regions consume no space except for their metadata
(number of entries, names of the fields, etc.).  A {\em physical
  instance} of a logical region holds a copy of the actual data for
that region.  The reason for having both concepts, logical region and
physical instance, is that there is not a one-to-one relationship
between logical regions and instances.  It is common, for example, to
have multiple physical instances of the same logical region (i.e.,
multiple copies) distributed around the system in some fashion to
improve read performance.  Because this program does not create any
physical instances, no real computation takes place, either; the
example simply shows how to create, and then destroy, a logical
region.

\begin{figure}
{\small
  \lstinputlisting[linerange={19-38}]{Examples/Regions/logicalregions/logicalregions.cc}}
\caption{\legionbook{Regions/logicalregions/logicalregions.cc}}
\label{fig:lr1}
\end{figure}

\section{Physical Instances, Region Requirements, Privileges and Accessors}
\label{sec:privileges}

Actually doing something with a logical region requires a {\em
  physical instance}.  The simplest way to create a physical instance
is to pass a logical region to a subtask, as Legion automatically
provides a physical instance to the subtask.  This instance is
guaranteed to be up-to-date, meaning it reflects any changes made to
the region by previous tasks that the subtask depends on.  In the
common case, this means that the results of all previously launched
tasks that updated the region will be reflected in the instance, but
the programmer can specify other semantics if desired; see
Chapter~\ref{chap:coherence}.

\begin{figure}
{\small
  \lstinputlisting[linerange={31-39}]{Examples/Regions/physicalregions/physicalregions.cc}}
\caption{Task launches from \legionbook{Regions/physicalregions/physicalregions.cc.}}
\label{fig:privileges}
\end{figure}


Figure~\ref{fig:privileges} shows an excerpt from the top level task in \\
\legionbook{Regions/physicalregions/physicalregions.cc}.  This program is an extension of the
program in Figure~\ref{fig:lr1}---the creation of the (single) logical region is exactly the same as in the 
previous example.  Here we call two tasks that operate on the logical region {\tt lr}. The first
task intializes the elements of the region and the second sums the elements and prints out the results.
As in previous examples, a {\tt TaskLauncher} object describes the task to be called and its non-region arguments,
of which there are none.  When tasks also have region arguments, additional information must be added
to the {\tt TaskLauncher}.
For each region the task will access, a {\em region requirement} must be added to the launcher using the
method {\tt add\_region\_requirement}.  A {\tt RegionRequirement} has four components: 

\begin{itemize}

\item The logical region that will be accessed.

\item A {\em privilege}, which indicates how the subtask will
  use the logical region.  In this program, the two tasks have
  different privileges: the initialization task accesses the region
  with privilege {\tt WRITE\_DISCARD} (which means it will overwrite
  everything that was previously in the region) and the sum task
  accesses the region with privilege {\tt READ\_ONLY}.  Privileges are
  used by the Legion runtime to determine which tasks can run in
  parallel.  For example, if two tasks only read from a region, they
  can execute simultaneously.  Other interesting privileges that we
  will see in future examples are {\tt READ\_WRITE} (the task both
  reads and writes the region), {\tt WRITE} (the task only writes the
  region, but may not update every element as in {\tt
    WRITE\_DISCARD}), and {\tt REDUCE} (the task performs reductions
  to the region).  It is an error to attempt to access a region in a
  manner inconsistent with the privileges, and most such errors can be
  checked by the Legion runtime with appropriate debugging settings.
  The runtime cannot check
  that every region element is updated when using privilege {\tt
    WRITE\_DISCARD} and failure to do so may result in incorrect
  behavior.

\item A {\em coherence mode}, which indicates what the subtask expects to see from {\em other} tasks that may access the
region simultaneously.  The mode {\tt EXCLUSIVE} means that this subtask must appear to have exclusive access to the region---if
any other tasks do access the region, any changes they make cannot be visible to this subtask. Furthermore, the subtask
must see all updates from previously launched tasks. Other coherence modes that we will discuss are {\tt ATOMIC} and
{\tt SIMULTANEOUS} (see Chapter~\ref{chap:coherence}).

\item Finally, the region requirement names its {\em parent region}.
  We have not yet discussed subregions (see
  Chapter~\ref{chap:partitioning}), so we defer a full explanation of
  this argument.  Suffice it to say that it should either be the
  parent region or, if the region in question has no parent, the
  region itself, as in this example.

\end{itemize}

Finally, each region requirement applies to one or more fields of the region, and the method {\tt add\_field} is
used to record which field(s) each region requirement applies to.
In this example, there is only one region requirement with index 0 (region requirements
are numbered from 0 in the order they are added to the launcher) and a single field {\tt FIELD\_A} that will be
accessed by the subtask.

\begin{figure}
  {\small
  \lstinputlisting[linerange={63-75}]{Examples/Regions/physicalregions/physicalregions.cc}}
\caption{Region accessors from \legionbook{Regions/physicalregions/physicalregions.cc}.}
\label{fig:accessors}
\end{figure}
We now turn our attention to the two subtasks.  The initialization task and the sum task have very similar
structures, differing only in that the intialization task writes a ``1'' in {\tt FIELD\_A} of every element of the region and
the sum task adds these numbers up and reports the sum.  The sum task is shown in Figure~\ref{fig:accessors}.

When {\tt sum\_task} is called, the Legion runtime guarantees that it
will have access to an up-to-date physical instance of the region {\tt
  lr} reflecting all the changes made by previously launched tasks
that modify the {\tt FIELD\_A} of the region (which in this case is
just the initialization task {\tt init\_task}).  The only new feature
that we need to discuss, then, is how the task accesses the data in {\tt FIELD\_A}.

Access to the fields of a region is done through a {\tt FieldAccessor}.  Accessors in Legion provide a level of indirection
that shields application code from the details of how physical instances  are represented in memory.  Under the hood, 
the Legion runtime chooses among many different representations depending on the circumstances, so this extra level
of abstraction avoids having those details exposed and fixed in application code.

In Figure~\ref{fig:accessors}, the field
{\tt FIELD\_A} is named in the creation of a {\tt RegionAccessor} for the first (and only) physical region argument.
Note that the type of the field is also included as part of the construction of the accessor.
The other requirement to access the region is knowledge of the region's index space.  Figure~\ref{fig:accessors}
illustrates how to recover a region's index space from a physical instance of the region using the {\tt get\_index\_space} method.
Since this region has a dense index space, we convert the domain to a rectangle (using the {\tt get\_rect} method).
All that is left, then, is to iterate over all the points of the index space (the rectangle {\tt rect}) and read the
field {\tt FIELD\_A} for each such point in the region using the field accessor {\tt fa\_a}.

The example in Figure~\ref{fig:accessors} uses an iterator, which is convenient when the index space is a dense rectangle and one
wants to operate on all of the points in a region.  Accessors can also take a {\tt Point} argument of the correct dimension for their
region to directly access a single point in the index space.

There are many different types of
region accessors provided by Legion.  We mention a few of the more common ones here; the comments in {\tt legion/runtime/legion.h} provides
a good overview of the complete set of accessors.
\begin{itemize}

\item  There are many accessor constructors pre-defined for different combinations of privileges and field types.  For example,
  a {\tt AccessorROfloat} is the type of an accessor with read-only privileges on a field of type {\tt float}.  The accessor in Figure~\ref{fig:accessors}
  could have been constructed using {\tt AccessorROint(regns[0],FIELD\_A)} instead of directly invoking the {\tt FieldAccessor} template.

\item  There is a different template, {\tt ReductionAccessor}, to use with reduction privileges.  For instances with reduction-only privileges, only {\tt ReductionAccessor}s should be used.

\item The {\tt Generic} accessor has 
extensive debugging support and will, for example, detect out of bounds accesses, which is a common programming error.  The {\tt Generic} accessor is also very slow and should never be used in production code.  
The {\tt FieldAccessor} used in Figure~\ref{fig:accessors} does no checking and is much more performant.
\end{itemize}


\section{Fill Fields}
\label{sec:fill}

It is common to initialize all instances of a particular field in a region to the same value, and so Legion
provides direct support for this idiom.  Figure~\ref{fig:fill} gives an excerpt from an example identical
to the one in Figure~\ref{fig:accessors}, except that the initialization task has been replaced by a call to
the runtime that fills every occurrence of {\tt FIELD\_A} with a default value.

\begin{figure}
  {\small
  \lstinputlisting[linerange={28-31}]{Examples/Regions/fillfields/fillfields.cc}}
\caption{\legionbook{Regions/fillfields/fillfields.cc}}
\label{fig:fill}
\end{figure}
The code in Figure~\ref{fig:fill} uses the Legion runtime method {\tt fill\_field} to initialize every 
occurrence of {\tt FIELD\_A} to 1.  The {\tt fill\_field} method takes six arguments:

\begin{itemize}

\item Like almost all runtime calls, the first argument is the current task's context.

\item The second argument is the region to be initialized.

\item The third argument is the parent region, or the region itself if it has no parent.  The parent region is needed
to ensure that there are sufficient privileges to perform the initialization ({\tt READ\_WRITE} privilege
is required).

\item The fourth argument is the ID of the field to be initialized.

\item The fifth argument is a buffer holding the initial value.

\item The sixth argument is the size of the buffer.  
The {\tt fill\_field} call makes a copy of the buffer.

\end{itemize}

The advantage of using {\tt fill\_field} is that the Legion runtime performs the initializaion lazily the next time that
the field is used, which makes the operation less expensive than a normal task call.  Thus, {\tt fill\_field} is preferred
whenever all instances of a field are initialized to the same value.


\section{Inline Launchers}
\label{sec:inlinelaunch}

The most common way to gain access to a region $r$ is by launching a
task $t$ with a region requirement on $r$, in which case the Legion
runtime will automatically map a physical instance of $r$ that will be
accessible in $t$.  There are situations where a task may need to map
a physical instance of a region explicitly, such as when a task needs
to access a newly created region or a new region is returned from a
child task.  Figure~\ref{fig:inlinelaunch} shows the use of an {\em
  inline mapping} to explicitly map a physical region.  After creating
a new logical region an {\tt InlineLauncher} object (so named because
it has similar functionality to a {\tt TaskLauncher} object) is
created with a region requirement and any associated fields.  The
runtime methods $\tt map\_region$ and $\tt unmap\_region$ assign and
unassign a physical instance to {\tt pr}.  Note that $\tt map\_region$
is an asynchronous call and it is necessary to wait for the physical
instance to become valid before it can be used.

\begin{figure}
{\small
  \lstinputlisting[linerange={26-29}]{Examples/Regions/inlinemapping/inlinemapping.cc}
  ... lines omitted ...\\
  \lstinputlisting[linerange={43-44}]{Examples/Regions/inlinemapping/inlinemapping.cc}}
\caption{\legionbook{Regions/inlinemapping/inlinemapping.cc}}
\label{fig:inlinelaunch}
\end{figure}

A important invariant maintained by the Legion runtime system is that
tasks have exclusive access to regions to which they have write access
(we will see how to relax this requirement in
Chapter~\ref{chap:coherence}).  This invariant implies that when a
parent task calls a child task, any regions passed to the child that
may be written must be unmapped before the call and remapped after the
call.  (To see that unmapping a region before passing it to a child
task is necessary, keep in mind that task calls are asynchronous, and
so the parent and child tasks may execute in parallel.  If a
parent task does not unmap a region required by a child task with
write privileges, the application will most likely deadlock.)  For
regions passed as region requirements to task launches and where the
programmer has not explicitly mapped (or unmapped) the region, the
runtime system automatically wraps the task call in the necessary
calls to $\tt unmap\_region$ and $\tt map\_region$.  In cases where
the parent task does not touch the region across many child task
calls, performance can be improved if the application explicitly
unmaps the region at the earliest possible point and maps the region
again at the latest possible point, thereby avoiding any
mapping/unmapping of the region during intermediate task calls.
Whenever a program explicitly maps or unmaps a region $r$ within a
task, the Legion runtime will no longer silently wrap child task
invocations with calls to unmap/map $r$.

\section{Layout Constraints}
\label{sec:layout}
In Chapter~\ref{chap:tasks} we introduced the idea of a {\em constraint}, a restriction specified by the program on how the Legion runtime
may use certain application objects, such as specifying what kind of processor a task can execute on.  The most commonly used constraints
on regions are {\em layout constraints}.


\begin{figure}
\begin{lstlisting}
int main(int argc, char **argv)
{
  Runtime::set_top_level_task_id(TOP_LEVEL_TASK_ID);

  LayoutConstraintID column_major;
  {
    OrderingConstraint order(true/*contiguous*/);
    order.ordering.push_back(DIM_X);
    order.ordering.push_back(DIM_Y);
    order.ordering.push_back(DIM_F);
    LayoutConstraintRegistrar registrar;
    registrar.add_constraint(order);
    column_major = Runtime::preregister_layout(registrar);
  }

  LayoutConstraintID row_major;
  {
    OrderingConstraint order(true/*contiguous*/);
    order.ordering.push_back(DIM_Y);
    order.ordering.push_back(DIM_X);
    order.ordering.push_back(DIM_F);
    LayoutConstraintRegistrar registrar;
    registrar.add_constraint(order);
    row_major = Runtime::preregister_layout(registrar);
  }

  {
    TaskVariantRegistrar registrar(TOP_LEVEL_TASK_ID, "top_level");
    registrar.add_constraint(ProcessorConstraint(Processor::LOC_PROC));
    Runtime::preregister_task_variant<top_level_task>(registrar, "top_level");
  }

  {
    TaskVariantRegistrar registrar(INIT_MATRIX_TASK_ID, "init_matrix");
    registrar.add_constraint(ProcessorConstraint(Processor::LOC_PROC));
    registrar.add_layout_constraint_set(0, column_major); 
    registrar.set_leaf();
    Runtime::preregister_task_variant<init_matrix_task>(registrar, "init_matrix");
  }

  {
    TaskVariantRegistrar registrar(TRANSPOSE_MATRIX_TASK_ID, "transpose");
    registrar.add_constraint(ProcessorConstraint(Processor::LOC_PROC));
    registrar.add_layout_constraint_set(0, column_major);
    registrar.add_layout_constraint_set(1, row_major);
    registrar.set_leaf();
    Runtime::preregister_task_variant<transpose_matrix_task>(registrar, "transpose");
  }

  {
    TaskVariantRegistrar registrar(CHECK_MATRIX_TASK_ID, "check_matrix");
    registrar.add_constraint(ProcessorConstraint(Processor::LOC_PROC));
    registrar.add_layout_constraint_set(0, column_major);
    registrar.set_leaf();
    Runtime::preregister_task_variant<check_matrix_task>(registrar, "check_matrix");
  }

  return Runtime::start(argc, argv);
}
\end{lstlisting}
\caption{The {\tt main} function from {\tt legion/examples/layout\_constraints/transpose.cc}.}
\label{fig:layout}
\end{figure}

Figure~\ref{fig:layout} shows the {\tt main} function from one of the examples in the Legion repository.  The function first defines two layout constraints, named {\tt column\_major} (lines 5-14) and {\tt row\_major} (liines 16-25).
The constants {\tt DIM\_X}, {\tt DIM\_Y} and {\tt DIM\_Z} are distinguished names given to the first three dimensions of an index space; {\tt DIM\_F} stands for all the fields of a region. An {\tt OrderingConstraint}
specifies which dimension varies the fastest in a region's layout: In {\tt column\_major} it is the {\tt x} dimension, in {\tt row\_major} it is the {\tt y} dimension.  The {\tt z} and higher dimensions are ignored here
because the regions involved for this application are two dimensional.   Note that these are struct-of-array layouts; by putting the field dimension first we could specify array-of-struct layouts as well.

To use a layout constraint we associate it with a region argument of a task.  When the {\tt transpose\_matrix\_task} is registered (lines 42-47), the first region argument (region 0) is constrained to be in {\tt column\_major} layout while the
second region argument (region 1) is constrained to be in {\tt row\_major} layout.  When the task is run the runtime system ensures that the physical instances used by the task adhere to any layout constraints.

It is also possible to specify blocked layouts. See the declarations and comments in {\tt legion/runtime/legion/legion\_constraint.h}.

Layout constrains are most commonly used in two situations. First, the code for a task may require a certain layout.  For example, vectorized code will necessarily require that the vectorized dimension be the fastest varying.  Second, when interoperating with
external systems, by using layout constraints the Legion system can describe what the layout of the pre-existing data is, allowing the runtime to correctly interpret it and work with it without making unnecesssary copies.