Striped XML to Turtle

Are your XML documents striped? If so, they may be convertible to triples (graph form).
Note that the conversion follows my own made-up set of rules, since as far as I'm aware there is no official way to convert "just any XML document" to triples.
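
To illustrate what "striped" means here: node elements and property elements alternate as you descend the document, so a node's children are its properties, and a property's child is either text or another node. The sketch below is not this project's implementation; it is a minimal standard-library illustration under assumed rules (every node becomes a blank node, and a childless element directly under a property is collapsed to a literal). The function name striped_to_triples and the sample document are hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET

RDF_TYPE = "rdf:type"


def striped_to_triples(xml_text):
    """Return (subject, predicate, object) tuples for a striped XML string.

    Assumed rules (illustrative only, not the project's actual rules):
      * a node element becomes a blank node typed by its tag
      * each child of a node element is a property
      * a property holding only text yields a literal
      * a childless element under a property is collapsed to a literal
    """
    counter = itertools.count()
    triples = []

    def visit_node(node):
        subject = f"_:b{next(counter)}"
        triples.append((subject, RDF_TYPE, node.tag))
        for prop in node:
            values = list(prop)
            if not values:
                # Property with text only: emit a literal.
                triples.append((subject, prop.tag, (prop.text or "").strip()))
            for value in values:
                if len(value) == 0:
                    # Leaf wrapper element: keep only its text.
                    triples.append((subject, prop.tag, (value.text or "").strip()))
                else:
                    # Nested node: recurse and link to its blank node.
                    triples.append((subject, prop.tag, visit_node(value)))
        return subject

    visit_node(ET.fromstring(xml_text))
    return triples


SAMPLE = """
<Person>
  <name>John Doe</name>
  <address>
    <Address>
      <city>Sydney</city>
    </Address>
  </address>
</Person>
"""

triples = striped_to_triples(SAMPLE)
for triple in triples:
    print(triple)
```

Plain tuples are used instead of an rdflib graph so the sketch runs without any dependencies; a real conversion would also need to handle namespaces.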

Developed on Windows 10, Python 3.8.3.


  1. python3 -m venv env
  2. . env/bin/activate (or env\Scripts\activate.bat if Windows)
  3. cd to_where_you_cloned_the_repository
  4. pip install -r requirements.txt (pip install -r requirements-dev.txt if you want to develop, but all that really adds is flake8)
  5. python --help


  • xmlFile
    • Path to your input XML file
    • The only required parameter, everything else is optional
  • serializePath
    • Path to where your serialized data should be saved
    • If not provided, the graph will be printed
    • If provided (and assuming the conversion is successful), the result of the serialization call is printed instead, which looks something like [a rdfg:Graph;rdflib:storage [a rdflib:Store;rdfs:label 'Memory']].
  • outputFormat
    • The output format of the graph. Uses the built-in rdflib formats where relevant.
    • "turtle" (default), "xml", "n3", "nt", "pretty-xml", "trig", "json-ld", or "hext"
  • collectAttributes
    • If this is set, attributes in the document are picked up for conversion. Since attributes have no namespace of their own, each one takes the namespace of the tag it was found on; if that tag has no namespace, the default namespace is used.
  • noIgnoreWhitespace
    • Do not ignore whitespace. Prettified XML documents have whitespace, for example. Not sure why you would want to set this.
  • defaultNamespace
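
For orientation, the parameters above could be declared with argparse roughly as follows. This is a sketch, not the script's actual source: the `--` prefixes on outputFormat, noIgnoreWhitespace and defaultNamespace are assumed by analogy with the documented --collectAttributes and --serializePath flags, and build_parser is a hypothetical name.

```python
import argparse

# Output formats listed in the parameter description above.
FORMATS = ["turtle", "xml", "n3", "nt", "pretty-xml", "trig", "json-ld", "hext"]


def build_parser():
    """Build a parser mirroring the documented parameters (illustrative)."""
    parser = argparse.ArgumentParser(
        description="Convert a striped XML document to triples."
    )
    # The only required parameter.
    parser.add_argument("xmlFile", help="Path to your input XML file")
    parser.add_argument(
        "--serializePath",
        default=None,
        help="Where to save the serialized graph; printed if omitted",
    )
    parser.add_argument(
        "--outputFormat",
        choices=FORMATS,
        default="turtle",
        help="Output format of the graph",
    )
    parser.add_argument(
        "--collectAttributes",
        action="store_true",
        help="Pick up XML attributes for conversion",
    )
    parser.add_argument(
        "--noIgnoreWhitespace",
        action="store_true",
        help="Do not ignore whitespace",
    )
    # The README does not describe this option; the default is an assumption.
    parser.add_argument("--defaultNamespace", default=None)
    return parser


args = build_parser().parse_args(
    ["input.xml", "--collectAttributes", "--serializePath", "output.ttl"]
)
print(args)
```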

Sample Data

python input.xml --collectAttributes --serializePath output.ttl

XML Input

<gmd:MD_Metadata xmlns:gss="" xmlns:gts="" xmlns:gml="" xmlns:xlink="" xmlns:gco="" xmlns:gmd="" xmlns:functx="" xmlns:gmi="" xmlns:gmx="" xmlns:gsr="" xmlns:srv="" xmlns="" xmlns:xsi="" xsi:schemaLocation="">
		<gco:CharacterString>eng; CAN</gco:CharacterString>
		<gmd:MD_CharacterSetCode codeList="" codeListValue="utf8" codeSpace="ISOTC211/19115">utf8</gmd:MD_CharacterSetCode>
				<gco:CharacterString>John Doe</gco:CharacterString>
				<gco:CharacterString>ACME Corporation</gco:CharacterString>
								<gco:CharacterString>(000) 123-456-7890</gco:CharacterString>
								<gco:CharacterString>42 Wallaby Way</gco:CharacterString>
								<gco:CharacterString>New South Wales</gco:CharacterString>
				<gmd:CI_RoleCode codeList="" codeListValue="pointOfContact" codeSpace="ISOTC211/19115">pointOfContact</gmd:CI_RoleCode>
				<gco:CharacterString>eng; CAN</gco:CharacterString>
				<gmd:EX_Extent id="boundingExtent">
						<gmd:EX_GeographicBoundingBox id="boundingGeographicBoundingBox">
								<gml:TimePeriod gml:id="boundingTemporalExtent">

Turtle Output

@prefix gmd: <> .
@prefix gml: <> .

[] a gmd:MD_Metadata ;
    gmd:characterSet "utf8" ;
    gmd:codeList "" ;
    gmd:codeListValue "utf8" ;
    gmd:codeSpace "ISOTC211/19115" ;
    gmd:contact [ a gmd:CI_ResponsibleParty ;
            gmd:codeList "" ;
            gmd:codeListValue "pointOfContact" ;
            gmd:codeSpace "ISOTC211/19115" ;
            gmd:contactInfo [ a gmd:CI_Contact ;
                    gmd:address [ a gmd:CI_Address ;
                            gmd:administrativeArea "New South Wales" ;
                            gmd:city "Sydney" ;
                            gmd:country "Sealand" ;
                            gmd:deliveryPoint "42 Wallaby Way" ;
                            gmd:electronicMailAddress "" ;
                            gmd:postalCode "123000" ] ;
                    gmd:phone [ a gmd:CI_Telephone ;
                            gmd:voice "(000) 123-456-7890" ] ] ;
            gmd:individualName "John Doe" ;
            gmd:organisationName "ACME Corporation" ;
            gmd:role "pointOfContact" ] ;
    gmd:dateStamp "1970-01-01" ;
    gmd:fileIdentifier "myXMLDocument.xml" ;
    gmd:identificationInfo [ a gmd:MD_DataIdentification ;
            gmd:extent [ a gmd:EX_Extent ;
                    gmd:geographicElement [ a gmd:EX_GeographicBoundingBox ;
                            gmd:eastBoundLongitude "-90" ;
                            gmd:northBoundLatitude "90" ;
                            gmd:southBoundLatitude "180" ;
                            gmd:westBoundLongitude "-180" ] ;
                    gmd:id "boundingGeographicBoundingBox" ;
                    gmd:temporalElement [ a gmd:EX_TemporalExtent ;
                            gmd:extent [ a gml:TimePeriod ;
                                    gml:beginPosition "1970-01-01" ;
                                    gml:endPosition "2020-12-31" ] ] ] ;
            gmd:id "boundingExtent" ;
            gmd:language "eng; CAN" ;
            gmd:topicCategory "life" ] ;
    gmd:language "eng; CAN" .
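
The gmd:codeList, gmd:codeListValue and gmd:id triples in the output above come from attributes that were collected with --collectAttributes. As a rough illustration of the "attributes take on the namespace of their tag" rule, here is a small standard-library sketch. It is an assumption about how such a rule could work, not the project's code; qualify_attributes and the example.org namespace are hypothetical.

```python
import xml.etree.ElementTree as ET


def qualify_attributes(elem, default_namespace=""):
    """Return {qualified_name: value} for the attributes of elem.

    Attributes without a namespace inherit the namespace of the element
    they sit on, falling back to default_namespace when the element has
    none. Attributes that already carry a namespace are left unchanged.
    """
    # ElementTree stores a namespaced tag as "{uri}local".
    if elem.tag.startswith("{"):
        tag_ns = elem.tag[1:].split("}", 1)[0]
    else:
        tag_ns = default_namespace

    qualified = {}
    for name, value in elem.attrib.items():
        if name.startswith("{"):
            qualified[name] = value
        else:
            qualified[f"{{{tag_ns}}}{name}"] = value
    return qualified


elem = ET.fromstring(
    '<g:Code xmlns:g="http://example.org/g#" codeListValue="utf8">utf8</g:Code>'
)
result = qualify_attributes(elem)
print(result)
```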


