Add component development guide to readme.

JGCRI · Dec 6, 2018 · aeff9ba
1 parent 936f14e
commit aeff9ba
Showing 1 changed file with 112 additions and 7 deletions.
diff --git a/README.md b/README.md
@@ -21,7 +21,7 @@ assumes that
    could use data from previous time steps of _A_, so long as it did
    not attempt to access data from _A_ in the _current_ time step.
 
-2. Models can be run on a single node.  It _is_ possible to distribute
+2. Each model can be run on a single node.  It _is_ possible to distribute
    models across nodes, and models can run on multiple processors
    within a node.
 
@@ -95,16 +95,35 @@ will also need the `add_new_component` function from
 
 Once you've imported the necessary modules, continue by adding your
 custom components, if any.  Then, you will need to create the
-structure used as the argument to the `main()` function.  In the
+dictionary used as the argument to the `main()` function.  In the
 standalone version, this structure is created by the `argparse` module
-from the command line arguments.  
+from the command line arguments; however, it can also be created by
+hand as a dictionary with the following keys:
 
+* **ctlfile** : Name of the configuration ("control") file.  
+* **mp**      : Flag indicating whether we are running in MP mode.  
+* **logdir**  : Directory for log files. In SP mode this can be `None`, in which case log output goes to stdout.  
+* **verbose** : Flag indicating whether to produce debugging output.  
+* **quiet**   : Flag indicating whether to suppress output except for warnings and error messages.  
+
+Here is a simple example of a Python script that runs a setup
+including a custom component:
 ```python
 from cassandra.cassandra_main import main
-## If including a component from an external module, add these lines
-## too.  (Make sure mycomp.py is in your python path!)
+## The next two lines are only necessary if you are including a component from an external module.
+## (Make sure mycomp.py is in your python path!)
 from cassandra.compfactory import add_new_component
-import mycomp.py
+from mycomp import MyNewComponent
+
+## 1. Add the new component
+add_new_component('MyComponentName', MyNewComponent)
+
+## 2. Set up the arguments structure
+args = {'ctlfile': 'mymodels.ini', 'mp': False, 'logdir': None,
+        'verbose': True, 'quiet': False}
+
+## 3. Run the Cassandra main function
+main(args)
 
 ```
 
@@ -114,4 +133,90 @@ import mycomp.py
 
 Cassandra's interface to models is provided by objects called
 _components_.  To add a model to the system, you have to create a
-component to run the model.  Components 
+component to run the model and provide its data to the other models
+running in the system.  Components export their data to the rest of
+the system by declaring _capabilities_ that other components will use
+to request data.
+
+### Capabilities
+
+Communication between components is organized around labels called
+capabilities.  A capability is a string that identifies a type of
+data that a component plans to export to the other components in the
+system.  A component declares its capabilities as it starts up, and
+other components can fetch by name the data associated with those
+capabilities.  The software imposes no restrictions or requirements on
+the format or type of the data provided for a capability.  Instead,
+these details are considered to be a matter of convention.  Component
+developers document the details of the data their components will
+export as capabilities, and it is the responsibility of components
+using that data to perform any necessary conversions.
+
+Capability names should generally be organized semantically,
+describing _what_ the data is, rather than how it's produced.  For
+that reason, it's best not to include the name of the model in the
+capability names.  Prefer something like
+`gridded-frobnitz-coefficient` over something like
+`fred-model-output`.  This makes it easy for users to swap out one model for
+another that provides the same capability, without having to change
+anything in the rest of the system.  Similarly, if a component
+provides multiple capabilities, consider adding parameter options to
+turn each of them off individually.  This allows users to reimplement
+one capability from your model while retaining all of the others.
+
+### Writing a Component
+
+Making a new component starts with creating a Python class for the
+component.  This class must extend the `ComponentBase` class found in
+`cassandra.components`.  The `ComponentBase` class provides all of the
+infrastructure needed to start up, shut down, and communicate with
+other components in the configuration.  There are two methods that
+a component may extend (_i.e._, the first thing the method must do is
+to call the base class method), and one that it must override (_i.e._,
+it must _not_ call the base class method).  These methods are:
+
+* `__init__(self, ct)` (extend) The second argument, called the
+  _capability table_, should be passed to the base class method.  After
+  that, you _may_ call `addcapability` to declare capabilities
+  (_i.e._, data that your model intends to provide to the rest of the
+  system).  Your model's parameter settings will not have been parsed
+  from the configuration file yet, so at this stage the only
+  capabilities that can be declared are those that don't depend on the
+  input parameters (such as output that the model always provides).
+
+* `finalize_parsing(self)` (extend) When this method is called, the
+  parameters parsed for the component from the configuration file will
+  be stored in `self.params`.  The component can use this information
+  to do any set-up it needs to do, and it can call `addcapability` to
+  declare capabilities for which it needs its parameters (_e.g._,
+  capabilities that can be turned on or off by parameter settings).
+  This will be the last opportunity to (safely) call `addcapability`,
+  so all remaining capabilities should be declared here.  If a
+  component has no additional parameter processing to do, then it can
+  skip extending this method.
+
+* `run_component(self)` (override) This method does the actual work of
+  running the model.  It should perform any remaining initialization
+  left to be done, launch the model, and run to completion.  While the
+  model is running (or, if necessary, before it starts), it can call
+  the component's `self.fetch(capability)` method to retrieve the data
+  associated with a capability.  It is not necessary to know which other
+  component provides the capability; the machinery in `ComponentBase`
+  figures that out.  If the component providing the data has not
+  finished yet, then `fetch` will block until the data is ready.
+  Trying to fetch a capability that has not been configured into the
+  system will raise a `CapabilityNotFound` exception.  If using that
+  type of data is optional, you can catch this exception and implement
+  whatever contingency plan exists for dealing with the missing data.  
+
+  When the model finishes, you should call
+  `self.addresults(capability, data)` to add `data` as the result for
+  the named capability.  You should do this for each capability you
+  declared in `__init__` and/or `finalize_parsing`.  Finally, have
+  your component return a value of `0` if the model run was
+  successful.  If your model produced some sort of error, you can
+  either raise an exception or return any value other than `0` to
+  signal an error.  
+
+
+
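The three methods described in the added "Writing a Component" section can be sketched as follows. This is a minimal illustration and not code from the commit. So that the example runs on its own, it defines a simplified, single-threaded stand-in for `ComponentBase` and `CapabilityNotFound` that only mimics the method names the README mentions (`addcapability`, `addresults`, `fetch`, `finalize_parsing`, `params`); the real class in `cassandra.components` also handles startup, shutdown, and blocking fetches. The component name `FrobnitzComponent`, the `write-summary` parameter, and the capability names `frobnitz-scale-factor` and `frobnitz-summary` are hypothetical.

```python
class CapabilityNotFound(Exception):
    """Stand-in for the exception the README says fetch() raises."""


class ComponentBase:
    """Hypothetical, simplified stand-in for cassandra.components.ComponentBase."""

    def __init__(self, cap_tbl):
        self.cap_tbl = cap_tbl   # capability name -> component providing it
        self.params = {}         # the real system fills this from the config file
        self.results = {}        # capability name -> exported data

    def finalize_parsing(self):
        pass                     # nothing to do in this stand-in

    def addcapability(self, capability):
        self.cap_tbl[capability] = self

    def addresults(self, capability, data):
        self.results[capability] = data

    def fetch(self, capability):
        # The real fetch() blocks until the provider has finished; this one does not.
        if capability not in self.cap_tbl:
            raise CapabilityNotFound(capability)
        return self.cap_tbl[capability].results[capability]


class FrobnitzComponent(ComponentBase):
    """Illustrative component following the extend/extend/override pattern."""

    def __init__(self, cap_tbl):
        super().__init__(cap_tbl)            # extend: call the base class first
        self.write_summary = False
        # Always provided, so it can be declared before parameters are parsed.
        self.addcapability('gridded-frobnitz-coefficient')

    def finalize_parsing(self):
        super().finalize_parsing()           # extend: call the base class first
        # Optional capability, switched on by a (hypothetical) parameter.
        self.write_summary = self.params.get('write-summary') == 'True'
        if self.write_summary:
            self.addcapability('frobnitz-summary')

    def run_component(self):                 # override: no base class call
        try:
            scale = self.fetch('frobnitz-scale-factor')
        except CapabilityNotFound:
            scale = 1.0                      # contingency plan for optional input
        grid = [scale * x for x in (1.0, 2.0, 3.0)]
        self.addresults('gridded-frobnitz-coefficient', grid)
        if self.write_summary:
            self.addresults('frobnitz-summary', sum(grid))
        return 0                             # 0 signals success
```

With the real package, the class would instead import `ComponentBase` from `cassandra.components` and be registered through `add_new_component`, as in the script shown earlier in the diff; the framework then calls `__init__`, `finalize_parsing`, and `run_component` in that order.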