Vivek Raghuram edited this page Mar 13, 2018 · 35 revisions

Contact: vivek.raghuram@berkeley.edu
sttrott@ucsd.edu


On March 16, 2018, changes will be introduced into the core grammar that will likely break compatibility with older applications. Please use a release of the grammar dated prior to this change for older applications.

About

This is the first public release of the full ECG2 system from the NTL/ECG project. We had a previous GitHub release consisting only of an earlier version of the ECG Workbench that is now deprecated. The project has been active for 30 years at ICSI and UC Berkeley and builds on findings from several disciplines and myriad research efforts [1]. The core goal of NTL/ECG is a better understanding of language and thought, their realization in our brains, and how these insights can be applied clinically and in intelligent systems.

This GitHub release includes interoperable grammars and code for a range of “products” and documentation on how to use and expand these capabilities. All of this depends on a number of linguistic, neural, and computational principles that appear to be necessary for our ambitious goals. We see no way to approach these goals with tabula rasa machine learning techniques.

The term ECG stands for Embodied Construction Grammar. In general, construction grammars (CxG) are characterized by describing language as [form, meaning] pairs at all levels, ranging from phonology and morphology, through syntax and semantics to discourse and beyond. A crucial criterion is that all levels should be coherently combined to yield conceptual compositionality, in contrast to the traditional lexical compositionality [2].
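As a loose illustration (in Python, not ECG notation), a construction as a [form, meaning] pair might be modeled like this; the names and fields are purely hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Construction:
    """A [form, meaning] pair; illustrative only, not ECG syntax."""
    name: str
    form: list[str]   # constituent forms, e.g. orthographic patterns
    meaning: dict     # a schema-like feature structure

# A toy lexical construction pairing a word form with a motion meaning.
move_cn = Construction(
    name="Move",
    form=["move"],
    meaning={"schema": "MotionPath", "roles": {"mover": None, "path": None}},
)

print(move_cn.meaning["schema"])  # MotionPath
```

The point of the sketch is only that form and meaning are paired at every level; in ECG itself the meaning pole is expressed with Schemas, as described below.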

ECG [3] extends the scope of general CxG to include a well-defined semantic formalism for expressing the meaning “pole” of [form, meaning] pairs. The ECG formalism is based on Schemas, largely motivated by Cognitive Linguistics, and a related computational realization of CxG Constructions. Importantly, this semantic formalism focuses on action and simulation as opposed to the traditional identification of meaning with “truth”.

Following results from the Neural Theory of Language (NTL) [1], many of the schemas in an ECG grammar are symbolic descriptions of putative conceptual universals of perception, action, and thought. These are postulated to be encoded neurally and are largely shared because of our common physiology and requirements for functioning in the physical and social world. Other universal schemas capture regularities of language processing and are also shared across grammars. In this release, the grammar Core.grm contains the schemas (and constructions) suggested as a starting point for new grammars.

Of course, there are also schemas and constructions that are specialized to a particular language, culture, or domain. For historical reasons, domain specific content schemas are also called (Fillmore) Frames. Again, our task criteria require conceptual compositionality across universal and specialized schemas and constructions. The CompRobots.grm grammar is an extension of Core.grm to the domain of (simulated) robotics.

One central function of this release is the ability to view, test, and modify ECG grammars. This was the main purpose of the previous (deprecated) release, but this version has been refactored and significantly improved. Information on how to download and use this capability is available here. The key components are the ECG Workbench and the Analyzer [3]. The workbench (WB) is an Eclipse Rich-Client-Platform (RCP) based project editor with a number of grammar-specific capabilities, including the ability to analyze input sentences against a grammar, such as CompRobots. A tutorial on WB use is available here.

The result of a successful analysis is a complex linked conceptual structure called the Semantic Specification (SemSpec) [3]. This was the extent of earlier ECG systems and, for many linguistic purposes, the grammar and the resulting SemSpecs after analysis suffice. ECG2 significantly extends the scope of NLU, but all further processing depends on these SemSpecs. Figure 1 is an overview of the full ECG2 framework; the left side is called the Language Side and shows the Analyzer and SemSpec. The workbench is a separate tool, but is tightly integrated for debugging.


Figure 1

As is well known, natural language almost always underspecifies the meaning of an utterance in context. From the outset, an ECG SemSpec was designed to capture all the meaning relations expressed in an utterance, but to leave open those that are not expressed. Of course, some meanings can only be determined by background, goals, and context. Crucially, the ECG2 framework of Figure 1 assumes that an NLU product is focused on some background, goals, and context. For concreteness, think of an NLU system for controlling a (simulated) robot (video, wiki). The right side of Figure 1 is called the App side and, in the robot case, would include a robot API and a robotics-oriented Problem Solver, perhaps with a path planner.

So, the central problem of ECG2 is how to couple a general deep analysis Natural Language capability with specific context and goals. One key step is the introduction of an Action Specification (ActSpec) formalism. This is encoded as (JSON) general feature structures and was called “N-tuples” in some publications. Another basic ability of the framework is to view the ActSpecs that result from the successful analysis of an input. A description of how to do this is available here.
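Since ActSpecs are encoded as JSON feature structures, a minimal sketch of one can be given directly in Python; the field names below are illustrative, not the actual templates defined by any ECG2 application:

```python
import json

# A hypothetical ActSpec (n-tuple) for a command such as "Robot1, move to the box".
# Field names and values are assumptions for illustration; real ActSpec templates
# are defined per application domain.
act_spec = {
    "predicate_type": "command",
    "action": "move",
    "protagonist": "Robot1",
    "goal": {"object": "box"},
}

# ActSpecs travel between the language side and the app side as JSON.
encoded = json.dumps(act_spec)
decoded = json.loads(encoded)
print(decoded["action"])  # move
```

The JSON encoding is what makes the language side and the app side loosely coupled: either side can be developed against the shared templates alone.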

Since the analysis is task-independent and the ActSpec is task-dependent, there needs to be a program (the Specializer) that traverses the SemSpec and outputs appropriate ActSpecs (N-tuples), as depicted in the bottom middle of Figure 1. The framework makes it easy to work separately on the language and app side of a project, using ActSpecs as the link. Looking ahead, building an NLU product for a new domain requires defining the shared vocabulary (ontology) and ActSpec templates for the new task. There are workbench tools to help.
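The Specializer idea can be sketched as a function that walks a SemSpec-like structure and fills a task-specific template; here a nested dict stands in for the real linked feature structure, and all role names are assumptions for illustration:

```python
# Minimal sketch of the Specializer: traverse a task-independent SemSpec
# (represented here as a nested dict) and fill the slots of a task-dependent
# ActSpec template. All field names are hypothetical.

def specialize(semspec: dict, template: dict) -> dict:
    """Copy values from the SemSpec into the slots the app cares about."""
    actspec = dict(template)  # start from the task-specific template
    event = semspec.get("eventDescriptor", {})
    actspec["action"] = event.get("eventType")
    actspec["protagonist"] = event.get("profiledParticipant")
    return actspec

semspec = {"eventDescriptor": {"eventType": "move",
                               "profiledParticipant": "Robot1"}}
template = {"predicate_type": "command", "action": None, "protagonist": None}

print(specialize(semspec, template))
```

A real Specializer does far more (it resolves shared structure across the linked SemSpec), but the division of labor is the same: the Analyzer stays task-independent, and only the template and traversal change for a new domain.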

A good next step would be to run the Text-based Robot Demo, as described here. This extends the product to produce a text version of the basic robot simulation. Its main advantage is that it does not require installing a robot API or simulator, which can be tricky. Another requirement of an ECG2 NLU system is that it be able to operate without human intervention, for example in an automated car. The text-based system also illustrates the ability of the Problem Solver to detect errors and carry out clarification dialogs as discussed here. Instructions for installing a ROS or MORSE robot module can be found here and here.

In addition to these various components, the release includes code and documentation for the ECG2 framework that ties it all together. There is a good description of the system design in this paper. As a starting point for new products, the release includes a Core framework that has simple versions of all the components along the bottom of Figure 1.

References

  1. Feldman, J. From Molecule to Metaphor: A Neural Theory of Language. MIT Press, 2005.
  2. Feldman, J. Embodied language, best-fit analysis, and formal compositionality. Phys Life Rev 7(4):385–410, 2010. doi:10.1016/j.plrev.2010.06.006
  3. Feldman, J., Dodge, E., and Bryant, J. Embodied Construction Grammar. In B. Heine and H. Narrog (eds.), The Oxford Handbook of Linguistic Analysis. Oxford University Press, 2009.
  4. Bryant, John Edward. Best-Fit Constructional Analysis. PhD dissertation, University of California, Berkeley, 2008.

Paper Links

NOTE: Some of the papers use slightly different terminology and system diagrams, but the underlying structure and system flow is the same. For example, we are moving towards using the word "ActSpec" (Action Specification) instead of "n-tuple", since it is more informative, but the function is the same.

Policies

If you make a change to the core API, and think it should be integrated into the framework, please submit a pull request, and the repository owner can review it.

Navigating the Project

These are some of the important repositories that make up the ECG2 system and related demos. You'll need to download some or all of these. Please view the getting started page for general installation information or follow one of the tutorials. Each repository's wiki has more information about that repository.

Disclaimer: Our system relies on several other software packages. All system requirements are listed in the respective repositories. Although we've added installation instructions and tips on keeping versions compatible, we cannot control for the version skew and software rot that occur after the writing of this tutorial.

ECG Framework

The ecg_framework_code repository contains code for the core modules of the general NLU system. These modules are all implemented to facilitate easy retargeting of the system to new application domains. Even without retargeting, the core modules function as an integrated starter application. The wiki gives an overview of the various components of the system as well as tutorials regarding how to run it.

ECG Grammars

The ecg_grammars repository contains both a core grammar that can be used to develop new grammars as well as the grammars used for many of the projects and demos based on ECG. The wiki gives an overview of embodied construction grammars.

ECG Workbench

The ECG Workbench is a tool for viewing and editing ECG grammars, analyzing and parsing sentences using the ECG Analyzer, computing evaluation metrics on a grammar using the Sentence Test Runner View, and linking vocabulary and ontologies between the language and action sides using the Token Tool. The wiki contains a tutorial on how to use the ECG Workbench.

ECG Robot Demo

The ecg_robot_code repository contains code for controlling MORSE and ROS robots through natural language. The wiki contains tutorials for running both the ROS and MORSE demos as well as a text-based robot demo.

Tutorials

Videos

FrameNet

We have also collaborated with the folks at FrameNet (https://framenet.icsi.berkeley.edu/fndrupal/) to build a system that can automatically hypothesize ECG constructions and schemas from FrameNet data.