Filtering out embargoed data #18

Open
dosumis opened this issue Jun 23, 2020 · 8 comments

@dosumis
Member

dosumis commented Jun 23, 2020

  • We need to add a filtering step so that datasets can be embargoed. In pipeline 1, this step is at the OWL export stage. In pipeline 2 it should also sit between the KB and the integration layer.

  • The step is essential to having a complete, functioning pipeline.

  • Filtering works by matching a graph pattern, starting from DataSet. Note that the graph patterns specifying filters are likely to be close or identical to the ones we need for schema validation. We can base the specification on the filtering code used in OWL generation.
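
To make the idea of filtering from DataSet outward concrete, here is a minimal sketch in Python. It assumes an in-memory edge list rather than the real KB, and the names `Edge` and `releasable_nodes` are hypothetical. Only the `has_source` and `depicts` relationships from this thread are followed; a real filter would cover the full pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A directed relationship: (source)-[rel]->(target). Hypothetical helper type."""
    source: str
    rel: str
    target: str


def releasable_nodes(production, edges):
    """Return ids of production datasets plus the individuals attached to them.

    production: dict mapping dataset id -> bool (the ds.production flag).
    edges: iterable of Edge objects.
    """
    # Start from datasets whose production flag is true.
    datasets = {ds for ds, flag in production.items() if flag}
    # Individuals whose has_source edge points at a kept dataset.
    images = {e.source for e in edges
              if e.rel == "has_source" and e.target in datasets}
    # Channel individuals that depict a kept individual.
    channels = {e.source for e in edges
                if e.rel == "depicts" and e.target in images}
    return datasets | images | channels


if __name__ == "__main__":
    flags = {"ds1": True, "ds2": False}
    graph = [
        Edge("i1", "has_source", "ds1"),
        Edge("ch1", "depicts", "i1"),
        Edge("i2", "has_source", "ds2"),
        Edge("ch2", "depicts", "i2"),
    ]
    print(sorted(releasable_nodes(flags, graph)))
```

Anything not returned by such a function would be held back from the export.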

@dosumis
Member Author

dosumis commented Jun 23, 2020

The following Cypher query documents the standard pattern of relationships around a single individual:

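// Start from datasets flagged for production
MATCH (ds:DataSet) WHERE ds.production
WITH ds
// Every dataset must have a license; a reference is optional
MATCH lp=(ds)-[:has_license]-(l:License)
WITH ds, lp
OPTIONAL MATCH pp=(ds)-[:has_reference]->(p:pub)
WITH ds, lp, pp
// Individual sourced from the dataset, the channel depicting it, and what that channel is registered with
MATCH ip=(ds)<-[:has_source]-(i:Individual)<-[:depicts]-(ch:Individual)-[:in_register_with]->(tc:Individual)-[:depicts]-(t)
WITH ds, lp, pp, ip, i
// Outgoing typing, related and cross-reference edges of the individual
MATCH icp=(i)-[:INSTANCEOF|Related|hasDbXref]->(c)
RETURN lp, pp, ip, icp LIMIT 2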

TBA - (i)-[]-(i) where both are in production
(cc)-[]-(Class

In addition - all classes and relationships between them should be loaded.

@dosumis
Member Author

dosumis commented Jun 23, 2020

Alternative approach - delete entities under embargo after full release

  1. Delete all ds:DataSet where ds.production is False
  2. Delete all i:Individual matching (ds)-[:has_source]-(i:Individual)<-[:depicts]-(ch:Individual) where ds.production is False
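
A minimal sketch of this delete-after-release approach, in Python over a toy in-memory graph rather than the real database. The function name `delete_embargoed` and the node/edge representation are hypothetical. Two assumptions are made beyond the steps above: a dataset with no production flag is treated as embargoed, and the channel individuals that depict an embargoed individual are removed along with it.

```python
def delete_embargoed(nodes, edges):
    """Return (nodes, edges) with embargoed datasets and their individuals removed.

    nodes: dict mapping node id -> property dict (with a "label" key).
    edges: list of (source, rel, target) tuples, read as (source)-[rel]->(target).
    """
    # Assumption: anything not explicitly production=True counts as embargoed.
    embargoed = {
        node_id for node_id, props in nodes.items()
        if props.get("label") == "DataSet" and props.get("production") is not True
    }
    # Find the affected individuals BEFORE dropping the datasets; once the
    # dataset nodes are gone there is nothing left to match has_source against.
    individuals = {s for s, rel, t in edges
                   if rel == "has_source" and t in embargoed}
    # Assumption: channels depicting an embargoed individual go too.
    channels = {s for s, rel, t in edges
                if rel == "depicts" and t in individuals}
    doomed = embargoed | individuals | channels

    kept_nodes = {n: p for n, p in nodes.items() if n not in doomed}
    # Drop any edge that touches a deleted node so none is left dangling.
    kept_edges = [(s, rel, t) for s, rel, t in edges
                  if s not in doomed and t not in doomed]
    return kept_nodes, kept_edges
```

The sketch collects the individuals first and deletes everything in one pass, because running step 1 to completion before step 2 would leave no dataset to match against.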

@matentzn matentzn self-assigned this Jun 23, 2020
@matentzn matentzn changed the title from "Filtering:" to "Filtering out embargoed data" Jun 23, 2020
@matentzn
Collaborator

If an entity does not have the production flag set to true, delete it.

@matentzn
Collaborator

We also need a flag for broken images that should be filtered out.

@matentzn
Collaborator

3 flags:

  • block: true (if block is true, remove the edge or the node)
  • production: true
  • staging: true (if production is true, staging is true as well)

We want to be able to run a branch of the pipeline in staging mode.
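
The flag semantics above can be sketched as a small Python function. This is only an illustration under stated assumptions: the name `is_visible`, the `mode` parameter and the dict representation are hypothetical, missing flags are assumed to default to false, and `block` is assumed to win over the other two flags.

```python
def is_visible(flags, mode):
    """Decide whether a node or edge survives filtering in the given mode.

    flags: dict with optional boolean keys "block", "production", "staging".
    mode: "production" or "staging" (which branch of the pipeline is running).
    """
    # A blocked node or edge is removed regardless of the other flags.
    if flags.get("block", False):
        return False

    production = flags.get("production", False)
    # production implies staging, so staging is true if either flag is set.
    staging = flags.get("staging", False) or production

    if mode == "production":
        return production
    if mode == "staging":
        return staging
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    print(is_visible({"production": True}, "staging"))   # True
    print(is_visible({"staging": True}, "production"))   # False
    print(is_visible({"production": True, "block": True}, "production"))  # False
```

Running the pipeline in staging mode would then release everything the production run releases, plus the entities marked staging only.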

@matentzn
Collaborator

blocking on plugin level.

matentzn added a commit to VirtualFlyBrain/vfb-pipeline-collectdata that referenced this issue Jul 4, 2020
- Added SHACL pipeline #3
- Found first solution for SPARQL based embargo (VirtualFlyBrain/neo4j2owl#18)
@matentzn
Collaborator

matentzn commented Jul 5, 2020

  • Double check this is the list of currently embargoed data:
http://virtualflybrain.org/data/CostaJefferis_v2
http://virtualflybrain.org/data/TrumanWood2018
http://virtualflybrain.org/data/FlyLight2019LateralHorn2019
http://virtualflybrain.org/data/FlyLight2019Strother2017
http://virtualflybrain.org/data/Shih2020
http://virtualflybrain.org/data/FlyLight2019Wolff2018
http://virtualflybrain.org/data/Tsubouchi2017
http://virtualflybrain.org/data/FlyLight2019Namiki2018
http://virtualflybrain.org/data/FlyLight2019Hampel2015
