Tractus-X Knowledge Agents EDC Extensions (KA-EDC)

KA-EDC is a product of the Catena-X Knowledge Agents Kit (about to move to: Tractus-X Knowledge Agents Kit) implementing the core "dataspace" modules of the CX-0084 standard (Federated Queries in Dataspaces).

About the Project

This repository hosts reference extensions to the Eclipse Dataspace Components (EDC). It provides container images and deployments for a ready-made KA-enabled Tractus-X EDC.

In particular, KA-EDC consists of

  • Common extensions that allow secure and personalized application access to the EDC infrastructure.
  • Agent (Data) Plane extensions to ingest, validate, process and delegate federated procedure calls (so-called Skills) on top of data and functional assets. In particular, they implement the Semantic Web SPARQL protocol.

Included in this repository are ready-made Helm charts.

They can be installed from the Tractus-X Helm Repository (Stable Versions) or Tractus-X Helm Repository (Dev Versions).

Source Code Layout & Runtime Collaboration

[Figure: Source Code collaboration map]

Above is a collaboration map of the main implementation classes found in this repository.

It starts with an application performing a SPARQL call against the Consumer's AgentController of the Agent Protocol Data Plane Extension. This call may be handled by an AuthenticationService. Using the configuration facilities of the JWT Auth Extension, which sets up a single JwtAuthenticationService or a composed CompositeAuthenticationService, the handler stack may analyse diverse authorisation features of the incoming request, for example by having a CompositeJwsVerifier check a JWT-based bearer token for validity against multiple OpenID servers.
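The composition described above can be pictured with a minimal Java sketch. Everything in it is hypothetical: the `Authenticator` interface, the `Composite` class and the `fixedBearer` helper are illustration-only stand-ins, not the classes of this repository or of EDC. It also assumes that the composite accepts a request as soon as one delegate accepts it; the actual CompositeAuthenticationService may combine its delegates differently.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: all names here are stand-ins, not the KA-EDC or EDC classes.
final class AuthSketch {

    // Stand-in for an authentication handler that inspects the request headers.
    interface Authenticator {
        boolean isAuthenticated(Map<String, List<String>> headers);
    }

    // Assumption: the composite accepts if ANY delegate accepts the request.
    static final class Composite implements Authenticator {
        private final List<Authenticator> delegates;

        Composite(List<Authenticator> delegates) {
            this.delegates = List.copyOf(delegates);
        }

        @Override
        public boolean isAuthenticated(Map<String, List<String>> headers) {
            return delegates.stream().anyMatch(d -> d.isAuthenticated(headers));
        }
    }

    // Toy check that compares the bearer token with a fixed value.
    // A real JWT check would verify the signature and claims instead.
    static Authenticator fixedBearer(String token) {
        return headers -> headers.getOrDefault("Authorization", List.of())
                .contains("Bearer " + token);
    }

    public static void main(String[] args) {
        Authenticator auth = new Composite(List.of(fixedBearer("a"), fixedBearer("b")));
        System.out.println(auth.isAuthenticated(Map.of("Authorization", List.of("Bearer b"))));
        System.out.println(auth.isAuthenticated(Map.of("Authorization", List.of("Bearer c"))));
    }
}
```

Modelling each check as a function over the request headers keeps single and composed handlers interchangeable, which is the point of the composite arrangement.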

After preprocessing (e.g. resolving local Skill Asset references using the EdcSkillStore), the AgentController delegates the call to the actual SparqlQueryProcessor (an instance of an Apache Jena SPARQL query processor). The SparqlQueryProcessor is backed by an RdfStore which hosts the Federated Data Catalogue and is regularly synchronized by the DataspaceSynchronizer.

Whenever external SERVICE references in a SPARQL query are to be executed, the SparqlQueryProcessor asks the DataspaceServiceExecutor to execute the actual sub-operation. Depending on the actual query binding context, this operation may point to multiple tenant-internal or public endpoints. It may also need to be batched if there are too many bindings to transfer in one go (see the maxBatchSize parameter in the Agent Protocol Data Plane Extension). Finally, it may point to dataspace addresses (indicated by URLs starting with the edc:// or edcs:// schemes). In this latter case, the DataspaceServiceExecutor asks the AgreementController for help.
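Two of the decisions described above, recognising a dataspace address and splitting bindings into batches, can be sketched in a few lines of Java. This is a hypothetical illustration rather than the DataspaceServiceExecutor code: the class and method names are invented, the example URLs are placeholders, and the assumption is that batching simply cuts the binding list into consecutive chunks of at most maxBatchSize entries.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names are invented for illustration, not taken from KA-EDC.
final class ServiceCallSketch {

    // True if the SERVICE target uses one of the dataspace schemes (edc:// or edcs://).
    static boolean isDataspaceAddress(String serviceUrl) {
        String scheme = URI.create(serviceUrl).getScheme();
        return "edc".equalsIgnoreCase(scheme) || "edcs".equalsIgnoreCase(scheme);
    }

    // Assumption: bindings are split into consecutive chunks of at most maxBatchSize entries.
    static <T> List<List<T>> batches(List<T> bindings, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < bindings.size(); i += maxBatchSize) {
            int end = Math.min(i + maxBatchSize, bindings.size());
            result.add(new ArrayList<>(bindings.subList(i, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(isDataspaceAddress("edcs://provider.example/sparql"));
        System.out.println(isDataspaceAddress("https://public.example/sparql"));
        System.out.println(batches(List.of(1, 2, 3, 4, 5), 2));
    }
}
```

Each batch would then be sent as its own sub-operation, so a large binding set results in several smaller remote calls.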

The AgreementController keeps track of already negotiated Dataspace Assets and the corresponding EndpointDataReferences (EDRs). If such an EDR does not yet exist, it negotiates one using the EDC control plane with the help of the DataManagement facade. The resulting EDR is handed to the AgreementController asynchronously and finally returned to the DataspaceServiceExecutor, which performs the Dataspace Call (effectively tunneling the SPARQL protocol through EDC's HttpProxy transfer).
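The bookkeeping described above amounts to a negotiate-once cache with asynchronous results. The following Java sketch shows that pattern under loud assumptions: the class `AgreementCacheSketch`, the `negotiator` function and the use of plain strings for asset ids and EDRs are all hypothetical, and expiry or renewal of an EDR is deliberately left out.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of a negotiate-once cache; not the AgreementController code.
final class AgreementCacheSketch {

    // One pending or completed EDR per asset id (EDRs are modelled as plain strings here).
    private final Map<String, CompletableFuture<String>> edrByAsset = new ConcurrentHashMap<>();

    // Stand-in for the negotiation via the control plane; assumed to return quickly with a future.
    private final Function<String, CompletableFuture<String>> negotiator;

    AgreementCacheSketch(Function<String, CompletableFuture<String>> negotiator) {
        this.negotiator = negotiator;
    }

    // Returns the known EDR future, or starts exactly one negotiation for an unknown asset.
    CompletableFuture<String> edrFor(String assetId) {
        return edrByAsset.computeIfAbsent(assetId, negotiator);
    }

    public static void main(String[] args) {
        AgreementCacheSketch cache = new AgreementCacheSketch(
                id -> CompletableFuture.supplyAsync(() -> "edr-for-" + id));
        // The second lookup reuses the future created by the first one.
        System.out.println(cache.edrFor("asset-1").join());
        System.out.println(cache.edrFor("asset-1").join());
    }
}
```

Storing the future rather than the finished EDR means that concurrent callers asking for the same asset share a single negotiation instead of starting one each.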

When the call arrives at the Provider's Data Plane, it will hit the AgentSource. Mirroring the Consumer's AgentController, AgentSource performs some preprocessing and validity checking before finally delegating to the Provider's SparqlQueryProcessor (from where the recursion may go further ...)

Getting Started

Build

To compile, package and containerize the binary artifacts (this also runs the unit tests):

mvn package -Pwith-docker-image

To publish the binary artifacts (the environment variables GITHUB_ACTOR and GITHUB_TOKEN must be set):

mvn -s settings.xml publish

To update the DEPENDENCIES declarations:

./mvnw org.eclipse.dash:license-tool-plugin:license-check 

Deployment

Deployment can be done

See the user documentation for more detailed deployment information.

Setup using Helm/Kind

To run KA-EDC applications via Helm on your local machine, please make sure the following precondition is met.

  • Have a local Kubernetes runtime ready. We have tested this setup with KinD. Other runtimes such as Minikube may work as well, but we have not tested them. All following instructions assume KinD.

For the most bare-bones installation of the dataspace, execute the following commands in a shell:

kind create cluster -n ka --config kind.config.yaml
# the next step is specific to KinD and will be different for other Kubernetes runtimes!
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml
# wait until the ingress controller is ready
kubectl wait --namespace ingress-nginx \
  --for=condition=ready pod \
  --selector=app.kubernetes.io/component=controller \
  --timeout=90s
# transfer images
kind load docker-image docker.io/tractusx/agentplane-hashicorp:1.14.24-SNAPSHOT --name ka
kind load docker-image docker.io/tractusx/agentplane-azure-vault:1.14.24-SNAPSHOT --name ka
# run chart testing
ct install --charts charts/agent-plane
ct install --charts charts/agent-plane-azure-vault   

Notice for Docker Images
