Commit 8b7b7c7

Added generalization using panderas (#84)
See linkml/linkml#671
Added ability to parse a table on the web
Adding more documentation
1 parent 803d7d1 commit 8b7b7c7

12 files changed: +702 -15 lines changed

docs/packages/generalizers.rst

Lines changed: 271 additions & 0 deletions
@@ -10,6 +10,277 @@ Generalizers take example data and *generalizes* to a schema
Generalization is inherently a heuristic process; this should be viewed as a bootstrapping process
that *semi*-automates the creation of a new schema for you.

Generalizing from a single TSV
------------------------------

.. code-block::

    schemauto generalize-csv tests/resources/NWT_wildfires_biophysical_2016.tsv -o wildfire.yaml

The schema will have a slot for every column, e.g.:

.. code-block:: yaml

    classes:
      Observation:
        slots:
        - site
        - plot
        - plot_size
        - date
        - observer

Ranges will be auto-inferred, e.g.:

.. code-block:: yaml

    slots:
      site:
        examples:
        - value: ZF20-105
        range: string
      plot:
        examples:
        - value: '6'
        range: integer
      plot_size:
        examples:
        - value: 10X10
        range: plot_size_enum
      date:
        examples:
        - value: '2016-07-09'
        range: datetime

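The kind of range inference shown above can be sketched in a few lines of Python. This is only an illustration of the general heuristic, not schema-automator's actual implementation: the function name ``infer_range`` and the order in which types are tried are assumptions made for this example.

```python
from datetime import datetime


def infer_range(values):
    """Guess a range name for a column of raw string values.

    Hypothetical helper for illustration; the real generalizer may
    use different rules and thresholds.
    """
    non_empty = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not non_empty:
        return "string"

    def all_parse(parser):
        # True only if every non-empty value is accepted by the parser
        for v in non_empty:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    # Try the most specific type first, then fall back
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    if all_parse(datetime.fromisoformat):
        return "datetime"
    return "string"


print(infer_range(["6", "12", "3"]))
print(infer_range(["2016-07-09", "2016-07-10"]))
print(infer_range(["ZF20-105", "ZF20-106"]))
```

A single value that fails to parse is enough to demote the whole column to ``string``, which is why inferred ranges should be reviewed by hand.
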
Enums will be automatically inferred:

.. code-block:: yaml

    enums:
      plot_size_enum:
        permissible_values:
          10X10:
            description: 10X10
          5x5:
            description: 5x5
          2.5X2.5:
            description: 2.5X2.5
          5X5:
            description: 5X5
          3x3:
            description: 3x3
      ecosystem_enum:
        permissible_values:
          Open Fen:
            description: Open Fen
          Treed Fen:
            description: Treed Fen
          Black Spruce:
            description: Black Spruce
          Poor Fen:
            description: Poor Fen
          Fen:
            description: Fen
          Lowland:
            description: Lowland
          Upland:
            description: Upland
          Bog:
            description: Bog
          Lowland Black Spruce:
            description: Lowland Black Spruce

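One plausible way to decide that a column is an enum is to check that it has few distinct values and that those values repeat. The sketch below assumes that heuristic; the function name ``infer_enum`` and the thresholds ``max_distinct`` and ``max_ratio`` are hypothetical and are not taken from schema-automator. Apart from ``ZF20-105``, the site values are made up for the example.

```python
def infer_enum(values, max_distinct=20, max_ratio=0.5):
    """Return an enum definition if the column looks categorical, else None.

    Hypothetical heuristic: few distinct values, each repeated often enough.
    """
    non_empty = [v for v in values if v]
    # dict.fromkeys keeps first-seen order while dropping duplicates
    distinct = list(dict.fromkeys(non_empty))
    if not distinct or len(distinct) > max_distinct:
        return None
    if len(distinct) / len(non_empty) > max_ratio:
        return None
    return {"permissible_values": {v: {"description": v} for v in distinct}}


plot_sizes = ["10X10", "5x5", "10X10", "5X5", "10X10", "5x5", "10X10", "5X5"]
sites = ["ZF20-105", "ZF20-106", "ZF20-107"]  # illustrative values only

print(infer_enum(plot_sizes))
print(infer_enum(sites))
```

Values are compared as exact strings, so ``5x5`` and ``5X5`` remain separate permissible values, as in the ``plot_size_enum`` output above. Normalizing such variants is left to manual curation.
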
Chaining an annotator
---------------------

If you provide an ``--annotator`` option you can auto-annotate enums:

.. code-block::

    schemauto generalize-csv \
        --annotator bioportal:envo \
        tests/resources/NWT_wildfires_biophysical_2016.tsv \
        -o wildfire.yaml

.. code-block:: yaml

    ecosystem_enum:
      from_schema: https://w3id.org/MySchema
      permissible_values:
        Open Fen:
          description: Open Fen
          meaning: ENVO:00000232
          exact_mappings:
          - ENVO:00000232
        Treed Fen:
          description: Treed Fen
          meaning: ENVO:00000232
          exact_mappings:
          - ENVO:00000232
        Black Spruce:
          description: Black Spruce
        Poor Fen:
          description: Poor Fen
          meaning: ENVO:00000232
          exact_mappings:
          - ENVO:00000232
        Fen:
          description: Fen
          meaning: ENVO:00000232
        Lowland:
          description: Lowland
        Upland:
          description: Upland
          meaning: ENVO:00000182
        Bog:
          description: Bog
          meaning: ENVO:01000534
          exact_mappings:
          - ENVO:01000535
          - ENVO:00000044
          - ENVO:01001209
          - ENVO:01000527
        Lowland Black Spruce:
          description: Lowland Black Spruce

The annotation can also be run as a separate step.

See :ref:`annotators`

Generalizing from multiple TSVs
-------------------------------

You can use the ``generalize-tsvs`` command to generalize from *multiple* TSVs, with
foreign key linkages auto-inferred.

For example, given a file ``envo.tsv``:

.. csv-table:: environments
    :header: envo term id, envo term label

    ENVO_01000752,area of barren land
    ENVO_01001570,terrestrial ecoregion
    ENVO_01001581,sea surface layer
    ENVO_01001582,forest floor

And a file ``sample.tsv``:

.. csv-table:: samples
    :header: BIOSAMPLE_ID,BIOSAMPLE_NAME,ENVO_BIOME_ID,ENVO_FEATURE_ID,ENVO_MATERIAL_ID

    156554,"Enriched cells from forest soil in Barre Woods Harvard Forest LTER site, Petersham, Massachusetts, United States - Alteio_BWOrgControl_Nextera2",ENVO_01000174,ENVO_01000159,ENVO_00002261
    156649,"Enriched cells from forest soil in Barre Woods Harvard Forest LTER site, Petersham, Massachusetts, United States - Alteio_BWOrgHeat_Nextera5",ENVO_01000174,ENVO_01000159,ENVO_00005781
    156728,"Enriched cells from forest soil in Barre Woods Harvard Forest LTER site, Petersham, Massachusetts, United States - Alteio_BWOrgHeat_Nextera84",ENVO_01000174,ENVO_01000159,ENVO_00005781
    156738,"Enriched cells from forest soil in Barre Woods Harvard Forest LTER site, Petersham, Massachusetts, United States - Alteio_BWMinControl_Nextera2",ENVO_01000174,ENVO_01001275,ENVO_00002261

We can create a multi-class schema, with foreign keys inferred:

.. code-block::

    schemauto generalize-tsvs --infer-foreign-keys sample.tsv envo.tsv

This will generate a schema with two classes, where the join between the sample table and the term table
is inferred:

.. code-block:: yaml

    classes:
      sample:
        slots:
        - BIOSAMPLE_ID
        - BIOSAMPLE_NAME
        - ENVO_BIOME_ID
        - ENVO_FEATURE_ID
        - ENVO_MATERIAL_ID
      envo:
        slots:
        - ENVO_ID
        - ENVO_LABEL

    slots:
      BIOSAMPLE_ID:
        range: integer
      BIOSAMPLE_NAME:
        range: string
      ENVO_BIOME_ID:
        examples:
        - value: ENVO_01000022
        range: envo
      ENVO_FEATURE_ID:
        range: envo
      ENVO_MATERIAL_ID:
        range: envo
      ENVO_ID:
        identifier: true
        range: string
      ENVO_LABEL:
        range: string

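A minimal sketch of how foreign keys could be inferred across tables: pick an identifier column per table, then link any column in another table whose values all occur among those identifiers. This is an assumed heuristic for illustration, not necessarily what ``--infer-foreign-keys`` does internally. The function name ``infer_foreign_keys`` and the sample rows are hypothetical; the ENVO identifiers are reused from the ``envo.tsv`` example.

```python
def infer_foreign_keys(tables):
    """Map (table, column) to the table it appears to reference.

    tables: {table_name: [row_dict, ...]}. Hypothetical helper.
    """
    # Step 1: the first column with unique, non-empty values is the identifier
    identifiers = {}
    for name, rows in tables.items():
        if not rows:
            continue
        for col in rows[0]:
            vals = [row[col] for row in rows]
            if all(vals) and len(set(vals)) == len(vals):
                identifiers[name] = set(vals)
                break

    # Step 2: a column is a foreign key if all its values are known identifiers
    links = {}
    for name, rows in tables.items():
        if not rows:
            continue
        for col in rows[0]:
            vals = {row[col] for row in rows if row[col]}
            if not vals:
                continue
            for target, id_vals in identifiers.items():
                if target != name and vals <= id_vals:
                    links[(name, col)] = target
    return links


tables = {
    "envo": [
        {"envo term id": "ENVO_01000752", "envo term label": "area of barren land"},
        {"envo term id": "ENVO_01001570", "envo term label": "terrestrial ecoregion"},
    ],
    "sample": [  # made-up rows, for illustration only
        {"BIOSAMPLE_ID": "1", "ENVO_BIOME_ID": "ENVO_01000752"},
        {"BIOSAMPLE_ID": "2", "ENVO_BIOME_ID": "ENVO_01001570"},
        {"BIOSAMPLE_ID": "3", "ENVO_BIOME_ID": "ENVO_01000752"},
    ],
}
print(infer_foreign_keys(tables))
```

In the generated schema, each linked column gets the referenced class as its ``range`` (here, ``envo``) instead of ``string``.
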
Generalizing from tables on the web
-----------------------------------

You can use the ``generalize-htmltable`` command to generalize from an HTML table on a web page:

.. code-block::

    schemauto generalize-htmltable https://www.nature.com/articles/s41467-022-31626-4/tables/1

This will generate:

.. code-block:: yaml

    name: example
    description: example
    id: https://w3id.org/example
    imports:
    - linkml:types
    prefixes:
      linkml: https://w3id.org/linkml/
      example: https://w3id.org/example
    default_prefix: example
    slots:
      GWAS trait:
        examples:
        - value: "\xC2"
        range: string
      Peak GWAS SNP:
        examples:
        - value: rs2974298
        range: string
      Gene:
        examples:
        - value: SMIM19
        range: string
      NK cell cis eSNP:
        examples:
        - value: rs2974348
        range: string
      TWAS Z score:
        examples:
        - value: '3.809'
        range: string
      TWAS P value:
        examples:
        - value: '0.0001'
        range: string
    classes:
      example:
        slots:
        - GWAS trait
        - Peak GWAS SNP
        - Gene
        - NK cell cis eSNP
        - TWAS Z score
        - TWAS P value

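To illustrate how column headers in an HTML table end up as slots, here is a small standard-library sketch. It is not how ``generalize-htmltable`` is implemented: the names ``TableParser`` and ``table_to_slots`` are hypothetical, and the inline HTML is a cut-down stand-in for a real page.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_slots(html):
    """Turn the header row into slots, using the first data row as examples."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    slots = {}
    for i, name in enumerate(header):
        slot = {"range": "string"}
        if body and i < len(body[0]):
            slot = {"examples": [{"value": body[0][i]}], "range": "string"}
        slots[name] = slot
    return slots


html = """
<table>
  <tr><th>Gene</th><th>TWAS Z score</th></tr>
  <tr><td>SMIM19</td><td>3.809</td></tr>
</table>
"""
print(table_to_slots(html))
```

Every slot is given the range ``string`` here, matching the generated output above, where numeric-looking columns such as ``TWAS Z score`` are also typed as ``string``.
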
Generalizing from JSON
----------------------



Packages
--------

.. currentmodule:: schema_automator.generalizers

.. autoclass:: CsvDataGeneralizer

docs/packages/importers.rst

Lines changed: 25 additions & 0 deletions
@@ -14,6 +14,31 @@ Importers are the opposite of `Generalizers <https://linkml.io/linkml/generators
representation that lacks `inheritance <https://linkml.io/linkml/schemas/inheritance.html>`_, no ``is_a`` slots
will be created.

Importing from JSON-Schema
--------------------------

The ``import-json-schema`` command can be used:

.. code-block::

    schemauto import-json-schema tests/resources/model_card.schema.json

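As a rough picture of what such an import involves, the sketch below maps the ``properties`` of a flat JSON-Schema object onto LinkML-style slots. It is a simplified illustration under stated assumptions, not the behavior of ``JsonSchemaImportEngine``: the function name ``json_schema_to_linkml``, the type mapping, and the example schema are all hypothetical, and nested objects, arrays, and ``$ref`` are ignored.

```python
# Assumed mapping from JSON-Schema primitive types to LinkML type names
JSON_TO_LINKML = {
    "string": "string",
    "integer": "integer",
    "number": "float",
    "boolean": "boolean",
}


def json_schema_to_linkml(json_schema, class_name):
    """Convert a flat JSON-Schema object definition into classes and slots."""
    required = set(json_schema.get("required", []))
    slots = {}
    for prop, spec in json_schema.get("properties", {}).items():
        json_type = spec.get("type")
        # Fall back to string for unknown, missing, or list-valued types
        if isinstance(json_type, str):
            slot = {"range": JSON_TO_LINKML.get(json_type, "string")}
        else:
            slot = {"range": "string"}
        if "description" in spec:
            slot["description"] = spec["description"]
        if prop in required:
            slot["required"] = True
        slots[prop] = slot
    return {"classes": {class_name: {"slots": list(slots)}}, "slots": slots}


example = {  # made-up schema, for illustration only
    "type": "object",
    "required": ["name"],
    "properties": {
        "name": {"type": "string", "description": "Model name"},
        "version": {"type": "number"},
    },
}
print(json_schema_to_linkml(example, "ModelCard"))
```
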
Importing from OWL
------------------

You can import from a schema-style OWL ontology. This must be in functional syntax.

Use robot to convert ahead of time:

.. code-block::

    robot convert -i schemaorg.ttl -o schemaorg.ofn
    schemauto import-owl schemaorg.ofn


Packages
--------

.. currentmodule:: schema_automator.importers

.. autoclass:: JsonSchemaImportEngine
