Skip to content

Pseudo Column in Templating Enviornments

Aref Shafaei edited this page Mar 3, 2020 · 17 revisions

We are going to describe the set of changes that is required for allowing pseudo columns in templating environments. With these changes we can access any pseudo column data in pattern annotations (markdown_pattern, url_pattern, etc.). For an easier explanation we are going to focus on an example. We are going to show the structure of object that is accessible through pattern annotations.

Table of Contents

ERD

Assume the following is the ERD and we're writing these annotations for the table main.

Old Syntax

Based on the given ERD, the following will be the object available in templating environments.

{
  "id": 1234,
  "_id": "1234",
  "text_col": "abc",
  "_text_col": "abc",
  "array_col": "1,234, 1,235, 1,236",
  "_array_col": [1234, 1235, 1236],
  "jsonb_col": "{\n\"test\": \"1\"}\n",
  "_jsonb_col": {
    "tes": "1"
  },
  "fk_to_f1": 1,
  "_fk_to_f1": "1",
  "$fkeys": {
    "schema": {
      "fk1_cons": {
        "values": {
          "id": 1,
          "_id": 1,
          "f1_text": "t",
          "_f1_text": "t"
        },
        "rowName": "row name of 1",
        "uri": {
          "detailed": "link to record"
        }
      }
    }
  }
}

To give you an example of what that means, the following are some ways of accessing this data:

{{{text_col}}} {{{_array_col}}}
{{{$fkeys.schema.fk1_cons.values.f1_int}}} {{{$fkeys.schema.fk1_cons.values._f1_int}}}

New Syntax

With the new syntax, our goal are to allow data-modelers to

  • access pseudo-columns in templating environment, and
  • explicitly define the list of foreign-keys to address performance issue (e.g. use this feature to avoid adding unnecessary out-bound foreign-keys to the main request).

As we discussed in the issue, we are planning on introducing a new table-level annotation tag:isrd.isi.edu,2019:source_definitions to define the data that you want to be available in any pattern environment within that table.

"tag:isrd.isi.edu,2019:source_definitions": {
  "columns": true, // list of columns. True means all the columns. False/undefined means no column.
  "fkeys": true, // list of fkeys. true means all the fkeys. False/undefined means no column.
   "sources": {} // an object, defining the available pseudo-columns. the key will be used for referring to them in template
}

For backward-compatibility purpose you can also define the list of fkeys that will be accessible using the $fkey_schema_constraint syntax. To give you an example, assume that the following are annotations for main and f2 tables:

"tables": {
  "main": {
    "annotations": {
      "tag:isrd.isi.edu,2019:source_definitions":{
        "columns": ["text_col", "jsonb_col", "array_col"], // or true for all columns; default is true
        "fkeys": [["schema", "fk1_cons"]], // or true for all outbound fks; default is true
        "sources": {
          "all-outbound-custom-name": {
            "source": [{"outbound": ["schema", "fk1_cons"]}, "f1_id"], 
            "entity": true
          },
          "entity_set-custom-name": {
            "source": [{"inbound": ["schema", "fk2_cons"]}, "id"], 
            "entity": true,
          },
          "entity_array_aggregate-custom-name": {
            "source": [{"inbound": ["schema", "fk2_cons"]}, "id"], 
            "entity": true, 
            "aggregate": "array"
          },
          "scalar_array_aggregate-custom-name": {
            "source": [{"inbound": ["schema", "fk2_cons"]}, "f2_text"], 
            "entity": false,
            "aggregate": "array"
          },
          "min_aggregate-custom-name": {
            "source": [{"inbound": ["schema", "fk2_cons"]}, "f2_text"], 
            "entity": false,
            "aggregate": "min"
          }
        }
      }
    }
  },
  "f2": {
    "annotations": {
      "tag:isrd.isi.edu,2019:source_definitions":{
        "columns": true,
        "fkeys": [["schema", "fk3_cons"]],
        "sources": {
          "all_outbound_defined_on_f2": {
            "source": [{"outbound": ["schema", "fk3_cons"]}, "f3_id"]}
          }
        }
      }
    }
  }
}

Therefore this will be the object that you have access to in templates written for table main (visible-columns, visible-foreign-keys, etc.):

{
  "text_col": "test",
  "_text_col": "test",
  "jsonb_col": "\n{\"a\": 1\n}",
  "_jsonb_col": {
    "a": 1
  },
  "array_col": "1,234, 1,235, 1,236",
  "_array_col": [1234, 1235, 1236],
  "$fkey_schema_fk1_cons": {  // requires refactoring annotation to flatten the new default key names (see issues below)
     "values": {
      "f1_id": 1234,
      "_f1_id": 1234,
      "f1_text": "t",
      "_f1_text": "t"
    },
    "uri": {
      "detailed": ""
    },
    "rowName": ""
  },
  "all_outbound-custom-name": {
    "values": {
      "f1_id": 1234,
      "_f1_id": 1234,
      "f1_text": "t",
      "_f1_text": "t"
    },
    "uri": {
      "detailed": ""
    },
    "rowName": ""
  },
  "entity_set-custom-name": [
    {
      "values": {
        "id": 1,
        "_id": 1,
        "fk_to_main": 1,
        "_fk_to_main": "1",
        "f2_text": "t1",
        "_f2_text": "t1",
        "$fkeys_schema_fk3_cons": { // through fkey definition; not available in arr_agg
           "values": {
             "f3_id": 1234,
             "_f3_id": 1234
           },
           "uri": {"detailed": ""},
           "rowName": ""
        },
        "all_outbound_defined_on_f2": { // through source definition; not available in arr_agg
          "values": {
            "f3_id": 1234,
            "_f_id": 1234
          },
          "uri": {"detailed": ""},
          "rowName": ""
        }
      },
      "uri": {
        "detailed": ""
      },
      "rowName": "row_1",
    }
  ],
  "entity_array_aggregate-custom-name": [
    {
      "values": {
        "id": 1,
        "_id": 1,
        "fk_to_main": 1,
        "_fk_to_main": "1",
        "f2_text": "t1",
        "_f2_text": "t1"
      },
      "uri": {
        "detailed": ""
      },
      "rowName": "row_1"
    }
  ],
  "scalar_array_aggregate-custom-name": "t1, t2",
  "_scalar_array_aggregate-custom-name": ["t1", "t2"],
  "min_aggregate-custom-name": "t1",
  "_min_aggregate-custom-name": "t1",
}

Integration with visible columns and visible fks

"tag:isrd.isi.edu,2016:visible-columns": {
  "detailed": [
      {
        "source": null,
        "waitfor": ["key1"], // inform data dependency. List all keys (except columns) used in template 
        "display": { 
          // All columns defined in the main table active list, 
          // and required_sourcekeys are allowed in templating 
          "markdown_pattern": "<prefix>{{{key1.0.rowName}}} <suffix> {{{key1.0.values.RID}}}",
          "page_size" : 10,
          "row_order" : [{"column": col_name, "descending": true}]
        }
      },
      {
        "sourcekey": "key1"
      }
   ]
}

Rules

  • sourcekey can be used in visible-columns and visible-foreign-keys. This will allow data-modelers to refer to one of the existing environment definitions.

    • If both of source and sourcekey are defined, we are going to ignore the source and use the sourcekey.
    • We are going to continue detecting duplicates. Duplicate detection should continue using the internal hash which is solely based on source definition (even if you use different sourcekeys, only one is allowed in visible-columns).
  • Should source and sourcekey be optional?

    • visible-column: 2 options
      • optional (preferred). If there is no markdown_name then column header is empty. If there is no markdown_pattern, the content is empty. z
      • markdown_name and markdown_pattern have to be defined if source/sourcekey is missing.
    • visible-fk: either source or sourcekey needs to be defined (for simplicity and consistent experience of tabular mode).
  • We should allow display and other visible-columns/fks attribute (markdown_name, etc.) in the definition list. If display object is defined on the visible-column/fk, it will overwrite that attribute (even if it doesn't have all the attributes defined on the definition list.

    • Column display heuristics:
      • Use the display defined on the visible_column/visible_fks definition.
      • Use the display defined on the environment definition.
      • Use the old display settings (scalar -> column-display, entity -> table-display, etc.).
    • Note: Record page supports both nested table and array_agg (not recommended). In the array_agg case, the array display is performed by the client side on the returned result. In the nested table case, this is used to formulate the server requests for row order and page size. Therefore, if the visible columns or visible fk definition has display defined, then it is overwriting the table-display definition for tabular mode.
  • In the case of array aggregate and inline table in record, we should allow row_order and page_size in display settings.

    • When these attributes are defined for the nested table in detailed page, they will override the table-display too. They will affect both tabular and markdown displays.
    "compact": [
      { 
        // source definition attributes
        "source": [],
        "entity": boolean,
        "aggregate": "ag_fn", 
    
        // available optional display attributes
        "markdown_name": "",
        "comment": "",
        "array_display": "csv" //olist, ulist, csv, raw (should be moved to display.array_ux_mode)
    
        // proposed attributes:
        "waitfor": []
        "display": { // support for all columns and pseudo-columns (e.g. non-facet)
          "markdown_pattern": "",
          "array_ux_mode": "csv" // currently these are defined under array_display, same level as source
        },
    
        // it is only applicable for array_agg
        // its impact is not limited to the current column and it will affect the data that will be injected everywhere
        "array_options": {
          "order": [],
          "max_length": 12
        }
      }
    ]
  • The query generation in ermrestjs should be based on the unique hash-name that is generated based on the source definition. This means that source definitions that look the same but data-modeler gave them different names, should not generate multiple request. We should do one request but the same data should be available in those given key names.

  • ermerstjs/chaise must dynamically determine which environment variables are used in order to get their data. Apart from the used environment variables, we have to generate requests for all the sources defined on visible-columns and visible-foreignkeys (depending on the app).

  • For pseudo-columns that define an inline table in record page, we are going to get the data for all of its outbound foreignkeys too. We are going to use its environment list for getting this set of foreignkeys. We are going to get all the all-outbound fks that are visible in the nested table. We are going to get this based on the waitfor list.

  • There should be a way to inform ermrestjs that the data is available and render the pattern. We need this because we want to have an indicator to show that the data is finalized and it won't be updated. We could render all the templates after getting any of the individual requests but that would be confusing since users won't know when the page has finished loading. For that, we have added the waitfor field.

    • It's an array of dependencies. We are going to render the pattern as soon as we got the request for all the dependencies.
    • if it's null, then the templating env only have access to all columns defined in the main table.
    • You need to list all the pseudo-columns that have a path.
    • In record page, you can list entity set pseudo-columns (the source definitions that have at least one inbound in their path and don't have any aggregate column) in your wait_for list. But if you have the same entity set as part of your visible columns (inline table) or visible foreign keys (related table), we're going to send multiple requests even though they refer to the same set of data. This is because when the entity set is visible, users can manipulate the visible set of data and we don't want to keep these two (entity sets in wait_for and visible entity sets) in sync to reduce the confusion.
  • In recordset, the requests for getting the pseudo-columns will be prioritized from left to right in the visible-columns list, and their wait_for definition.

  • In the record page, this is the order of sending requests:

    1. The main entity request is the first request that we have to send.
    2. Traverse the list of visible-columns from top to bottom and if the column is inline without any wait_for, or has only entity sets as part of its wait_for; we're going to send the requests for them (and their wait_for list definition).
    3. Traverse the list of visible-columns and if the column is aggregate, or has aggregate in its wait_for; we're going to send the requests for them (and their wait_for list definition).
    4. Since we're showing the related tables from top to bottom, we should send the requests from top to bottom as well (regardless of their types or whether they have aggregate in their wait_for).
    5. Aggregate requests that are displayed in the inline tables tabular mode.
    6. Aggregate requests that are displayed in the related tables tabular mode.

    To demonstrate this, assume that the following is the visible-columns definition (columns named es are entity sets and agg are aggregates):

    i1 (wait_for: agg1, agg2)
    i2 (wait_for: es1, es3, agg1, agg3)
    i3 (wait_for: es1, es2)
    agg4
    i4
    

    Then the following will be the order of sending requests:

     (i3, es1, es2), (i4), (i1, agg1, agg2), (i2, es3, agg3), (agg4) 
    

    And if this was the list of visible related entities, the following would be the order of requests:

    (i1, agg1, agg2), (i2, es1, es3, agg3), (i3, ess), i4
    

default fkey syntax

  • TBD For now, we decided to flatten the $fkeys object. So that if it used to be accessible with $fkeys.schema.constarint syntax, we need to change it to $fkey_schema_constraint.
    • the current format is hierarchical and is not consistent with the self-defined key (which is flat). For consistency, can we change the syntax to be the flatten name?
{
  "text_col": "test",
  "_text_col": "test",
  ...
  "$fkey_schema_fk1_cons": {  // requires refactoring annotation to flatten the new default key names (see issues below)
     "values": {
      "f1_id": 1234,
      "_f1_id": 1234,
      "f1_text": "t",
      "_f1_text": "t"
    },
    "uri": {
      "detailed": ""
    },
    "rowName": ""
  }
  ...
}
vs
{
  "text_col": "test",
  "_text_col": "test",
  ...
  "$fkeys": {
    "schema": {
      "fk1_cons": {
        "values": {
          "id": 1,
          "_id": 1,
          "f1_text": "t",
          "_f1_text": "t"
        },
        "rowName": "row name of 1",
        "uri": {
          "detailed": "link to record"
        }
      }
    }
  }
  ...
}

Examples (GUDMAP)

  • displaying min TS and Max TS on the Gene Page