6: XML Extensions
The NAACCR XML standard allows blocks of non-standard XML (called extensions) in specific locations of the data files. It defines three extensions:
- NaaccrData extension (one per file)
- Patient extension (one per patient)
- Tumor extension (one per tumor)
For more information about extensions, see the NAACCR XML implementation guide.
Since extensions are an advanced feature of the standard, it might not be obvious how to use them with this library.
This page demonstrates a fully working example of extensions. The entire code is available in a demo class in the demo package.
Let's assume that a given XML data file is sent to a third-party organization that needs to match the provided patients and tumors. That organization then needs to send back the same data file, with extensions providing the match results.
The first thing to do to support reading/writing extensions is to define the Java classes that represent the data. For this demo, we are going to use the following:
- An OverallSummary class representing a summary of the match results for all patients and tumors in the file.
- A PatientOverallSummary class representing a summary of the match results for all patients.
- A TumorOverallSummary class representing a summary of the match results for all tumors.
- A PatientSummary class representing a summary of the match results for a single patient.
- A TumorSummary class representing a summary of the match results for a single tumor.
The first three classes are used in the NaaccrData extension (appearing once in the data file); here is the XML they represent:
<ext:OverallSummary>
    <ext:Patient>
        <ext:NumberRejected>1</ext:NumberRejected>
        <ext:NumberProcessed>25</ext:NumberProcessed>
        <ext:NumberMatched>24</ext:NumberMatched>
    </ext:Patient>
    <ext:Tumor>
        <ext:NumberRejected>1</ext:NumberRejected>
        <ext:NumberProcessed>25</ext:NumberProcessed>
        <ext:NumberMatched>24</ext:NumberMatched>
    </ext:Tumor>
</ext:OverallSummary>
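As a rough illustration, the three root-level classes could be plain Java beans like the ones sketched below. The demo's actual sources are not reproduced on this page, so treat this as an assumption: the underscore-prefixed field names and the accessor names are inferred from the registration and creation code shown elsewhere on this page.

```java
// Sketch only: field and accessor names are inferred from the code on this page,
// not copied from the demo package.
final class PatientOverallSummary {

    private Integer _numberRejected;
    private Integer _numberProcessed;
    private Integer _numberMatched;

    public Integer getNumberRejected() { return _numberRejected; }
    public void setNumberRejected(Integer numberRejected) { _numberRejected = numberRejected; }

    public Integer getNumberProcessed() { return _numberProcessed; }
    public void setNumberProcessed(Integer numberProcessed) { _numberProcessed = numberProcessed; }

    public Integer getNumberMatched() { return _numberMatched; }
    public void setNumberMatched(Integer numberMatched) { _numberMatched = numberMatched; }
}

// Same three counters, but for tumors.
final class TumorOverallSummary {

    private Integer _numberRejected;
    private Integer _numberProcessed;
    private Integer _numberMatched;

    public Integer getNumberRejected() { return _numberRejected; }
    public void setNumberRejected(Integer numberRejected) { _numberRejected = numberRejected; }

    public Integer getNumberProcessed() { return _numberProcessed; }
    public void setNumberProcessed(Integer numberProcessed) { _numberProcessed = numberProcessed; }

    public Integer getNumberMatched() { return _numberMatched; }
    public void setNumberMatched(Integer numberMatched) { _numberMatched = numberMatched; }
}

// Root object: one nested summary per entity type, matching the <ext:Patient>
// and <ext:Tumor> children of <ext:OverallSummary>.
final class OverallSummary {

    private PatientOverallSummary _patientSummary;
    private TumorOverallSummary _tumorSummary;

    public PatientOverallSummary getPatientSummary() { return _patientSummary; }
    public void setPatientSummary(PatientOverallSummary patientSummary) { _patientSummary = patientSummary; }

    public TumorOverallSummary getTumorSummary() { return _tumorSummary; }
    public void setTumorSummary(TumorOverallSummary tumorSummary) { _tumorSummary = tumorSummary; }
}
```

The nesting of the Java objects follows the nesting of the XML tags, which keeps the tag registration straightforward.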
The next class is used in the Patient extension (once per patient); here is the XML it represents:
<ext:PatientSummary>
    <ext:MatchingMethod>Patient Matching Algorithm #1</ext:MatchingMethod>
    <ext:MatchingScore>0</ext:MatchingScore>
    <ext:MatchingIdentifier>PAT-XXX</ext:MatchingIdentifier>
</ext:PatientSummary>
The final class carries the same type of information, but for a given tumor (once per tumor); here is the XML it represents:
<ext:TumorSummary>
    <ext:MatchingMethod>Tumor Matching Algorithm #1</ext:MatchingMethod>
    <ext:MatchingScore>0</ext:MatchingScore>
    <ext:MatchingIdentifier>TUM-YYY</ext:MatchingIdentifier>
</ext:TumorSummary>
The library implements extensions with an "extension" field available on the NaaccrData, Patient and Tumor classes. When reading extensions, the library populates those fields. When writing extensions, the library expects to find them populated on those main entities. Since the library makes no assumption about the type of the extension objects, the fields are declared as Object.
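Because the field is typed as Object, calling code has to cast the value back to its own class. The sketch below shows a defensive way to do that with an instanceof check. EntitySketch and MatchInfo are hypothetical stand-ins written for this example (they are not library classes); only the getExtension/setExtension shape is taken from this page.

```java
// Hypothetical stand-in for a library entity: it only mimics the
// Object-typed extension field described above.
final class EntitySketch {

    private Object extension;

    public Object getExtension() { return extension; }
    public void setExtension(Object extension) { this.extension = extension; }
}

// Hypothetical extension payload used for the illustration.
final class MatchInfo {

    private final String method;

    MatchInfo(String method) { this.method = method; }

    public String getMethod() { return method; }
}

final class ExtensionCastSketch {

    // Returns the matching method, or null when the entity carries no
    // extension or an extension of an unexpected type.
    static String describe(EntitySketch entity) {
        Object ext = entity.getExtension();
        if (ext instanceof MatchInfo) {
            return ((MatchInfo) ext).getMethod();
        }
        return null;
    }
}
```

A direct cast is enough when the file is known to contain the extension; the check only matters when files without extensions (or with different ones) may also be processed.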
Here is an example of creating 25 patients, along with the extensions corresponding to the XML snippets provided in the previous step:
NaaccrData data = new NaaccrData(NaaccrFormat.NAACCR_FORMAT_16_ABSTRACT);
// *** start of root extension
OverallSummary summary = new OverallSummary();
PatientOverallSummary patientSummary = new PatientOverallSummary();
patientSummary.setNumberRejected(1);
patientSummary.setNumberProcessed(25);
patientSummary.setNumberMatched(24);
summary.setPatientSummary(patientSummary);
TumorOverallSummary tumorSummary = new TumorOverallSummary();
tumorSummary.setNumberRejected(1);
tumorSummary.setNumberProcessed(25);
tumorSummary.setNumberMatched(24);
summary.setTumorSummary(tumorSummary);
// *** end of root extension
data.setExtension(summary);
for (int i = 1; i <= 25; i++) {
    Patient patient = new Patient();
    patient.addItem(new Item("patientIdNumber", StringUtils.leftPad(String.valueOf(i), 8, '0')));
    // *** start of patient extension
    PatientSummary patSummary = new PatientSummary();
    patSummary.setMatchingMethod("Patient Matching Algorithm #1");
    patSummary.setMatchingScore(i - 1);
    patSummary.setMatchingIdentifier("PAT-XXX");
    patient.setExtension(patSummary);
    // *** end of patient extension
    data.addPatient(patient);
    Tumor tumor = new Tumor();
    tumor.addItem(new Item("sequenceNumberCentral", "01"));
    // *** start of tumor extension
    TumorSummary tumSummary = new TumorSummary();
    tumSummary.setMatchingMethod("Tumor Matching Algorithm #1");
    tumSummary.setMatchingScore(i - 1);
    tumSummary.setMatchingIdentifier("TUM-YYY");
    tumor.setExtension(tumSummary);
    // *** end of tumor extension
    patient.addTumor(tumor);
}
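The patient ID in the loop above is zero-padded with StringUtils.leftPad, which presumably comes from Apache Commons Lang. If that dependency is not available, the same padding can be sketched with the JDK alone for non-negative counters; the class and method names below are hypothetical.

```java
import java.util.Locale;

final class PatientIdSketch {

    // Left-pads a non-negative counter with zeros to a width of 8 characters.
    // Locale.ROOT keeps the digits ASCII regardless of the default locale.
    static String formatPatientId(int counter) {
        return String.format(Locale.ROOT, "%08d", counter);
    }
}
```

For example, PatientIdSketch.formatPatientId(1) returns "00000001", the value that appears for the first patient in the XML file shown at the end of this page.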
The library also needs to be told how to map the extension XML tags to those Java classes. This is done via the configuration object (NaaccrStreamConfiguration) that can be provided to the PatientXmlReader and PatientXmlWriter classes. The configuration contains a set of "register" methods that can be used to register namespaces, tags and attributes for the extensions. The first method that needs to be called is "registerNamespace".
Here is the full definition corresponding to the data model:
NaaccrStreamConfiguration configuration = new NaaccrStreamConfiguration();
configuration.registerNamespace("ext", "http://demo.org");
configuration.registerTag("ext", "OverallSummary", OverallSummary.class);
configuration.registerTag("ext", "Patient", OverallSummary.class, "_patientSummary", PatientOverallSummary.class);
configuration.registerTag("ext", "NumberRejected", PatientOverallSummary.class, "_numberRejected", Integer.class);
configuration.registerTag("ext", "NumberProcessed", PatientOverallSummary.class, "_numberProcessed", Integer.class);
configuration.registerTag("ext", "NumberMatched", PatientOverallSummary.class, "_numberMatched", Integer.class);
configuration.registerTag("ext", "Tumor", OverallSummary.class, "_tumorSummary", TumorOverallSummary.class);
configuration.registerTag("ext", "NumberRejected", TumorOverallSummary.class, "_numberRejected", Integer.class);
configuration.registerTag("ext", "NumberProcessed", TumorOverallSummary.class, "_numberProcessed", Integer.class);
configuration.registerTag("ext", "NumberMatched", TumorOverallSummary.class, "_numberMatched", Integer.class);
configuration.registerTag("ext", "PatientSummary", PatientSummary.class);
configuration.registerTag("ext", "MatchingMethod", PatientSummary.class, "_matchingMethod", String.class);
configuration.registerTag("ext", "MatchingScore", PatientSummary.class, "_matchingScore", Integer.class);
configuration.registerTag("ext", "MatchingIdentifier", PatientSummary.class, "_matchingIdentifier", String.class);
configuration.registerTag("ext", "TumorSummary", TumorSummary.class);
configuration.registerTag("ext", "MatchingMethod", TumorSummary.class, "_matchingMethod", String.class);
configuration.registerTag("ext", "MatchingScore", TumorSummary.class, "_matchingScore", Integer.class);
configuration.registerTag("ext", "MatchingIdentifier", TumorSummary.class, "_matchingIdentifier", String.class);
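The registerTag calls above pair a tag name with a class and a field name given as a string, so a typo in that string cannot be caught by the compiler. The sketch below is not the library's implementation; it is only an assumed, reflection-based illustration of how a name-based mapping can populate a private field, and of why the string has to match the field exactly. MatchSummarySketch and FieldMappingSketch are hypothetical names.

```java
import java.lang.reflect.Field;

// Hypothetical bean using the same underscore-prefixed field naming.
final class MatchSummarySketch {

    private String _matchingMethod;
    private Integer _matchingScore;

    public String getMatchingMethod() { return _matchingMethod; }
    public Integer getMatchingScore() { return _matchingScore; }
}

final class FieldMappingSketch {

    // Sets the named private field on the target, converting the raw XML text
    // to an Integer when the field is declared as one. An unknown field name is
    // reported as an IllegalArgumentException.
    static void setField(Object target, String fieldName, String rawValue) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            if (field.getType() == Integer.class) {
                field.set(target, Integer.valueOf(rawValue));
            } else {
                field.set(target, rawValue);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot map value to field " + fieldName, e);
        }
    }
}
```

With this sketch, passing "_matchingScore" works, while "matchingScore" (without the underscore) fails at runtime rather than at compile time.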
We are now set up to write the data we created:
File file = TestingUtils.createFile("test-root-extension.xml", false);
try (PatientXmlWriter writer = new PatientXmlWriter(new FileWriter(file), data, null, null, configuration)) {
    for (Patient p : data.getPatients())
        writer.writePatient(p);
}
Finally, we can read the data file we just created and output some basic information from the extensions:
try (PatientXmlReader reader = new PatientXmlReader(new FileReader(file), null, null, configuration)) {
    OverallSummary os = (OverallSummary)reader.getRootData().getExtension();
    System.out.println("total number of processed patients: " + os.getPatientSummary().getNumberProcessed());
    System.out.println("total number of processed tumors:" + os.getTumorSummary().getNumberProcessed());
    // read the first patient once and reuse it for both the patient and the tumor extension
    Patient pat = reader.readPatient();
    PatientSummary patSum = (PatientSummary)pat.getExtension();
    System.out.println(" > first patient matching method: " + patSum.getMatchingMethod());
    TumorSummary tumSum = (TumorSummary)pat.getTumors().get(0).getExtension();
    System.out.println(" > first tumor matching method: " + tumSum.getMatchingMethod());
}
Here is the output of that code:
total number of processed patients: 25
total number of processed tumors:25
> first patient matching method: Patient Matching Algorithm #1
> first tumor matching method: Tumor Matching Algorithm #1
For reference, here is the XML file created by the writing step (for readability, only one patient has been retained instead of the 25 defined in the code):
<?xml version="1.0"?>
<NaaccrData
    baseDictionaryUri="http://naaccr.org/naaccrxml/naaccr-dictionary-160.xml"
    recordType="A"
    timeGenerated="2017-04-05T10:07:57.444-04:00"
    specificationVersion="1.1"
    xmlns="http://naaccr.org/naaccrxml"
    xmlns:ext="http://demo.org">
    <ext:OverallSummary>
        <ext:Patient>
            <ext:NumberRejected>1</ext:NumberRejected>
            <ext:NumberProcessed>25</ext:NumberProcessed>
            <ext:NumberMatched>24</ext:NumberMatched>
        </ext:Patient>
        <ext:Tumor>
            <ext:NumberRejected>1</ext:NumberRejected>
            <ext:NumberProcessed>25</ext:NumberProcessed>
            <ext:NumberMatched>24</ext:NumberMatched>
        </ext:Tumor>
    </ext:OverallSummary>
    <Patient>
        <Item naaccrId="patientIdNumber">00000001</Item>
        <ext:PatientSummary>
            <ext:MatchingMethod>Patient Matching Algorithm #1</ext:MatchingMethod>
            <ext:MatchingScore>0</ext:MatchingScore>
            <ext:MatchingIdentifier>PAT-XXX</ext:MatchingIdentifier>
        </ext:PatientSummary>
        <Tumor>
            <Item naaccrId="sequenceNumberCentral">01</Item>
            <ext:TumorSummary>
                <ext:MatchingMethod>Tumor Matching Algorithm #1</ext:MatchingMethod>
                <ext:MatchingScore>0</ext:MatchingScore>
                <ext:MatchingIdentifier>TUM-YYY</ext:MatchingIdentifier>
            </ext:TumorSummary>
        </Tumor>
    </Patient>
</NaaccrData>
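In the file above, the extension elements live in the "http://demo.org" namespace while the standard elements stay in the NAACCR one. As a quick sanity check that does not involve the library at all, a file like this can be inspected with the JDK's namespace-aware DOM parser. The sketch below is an assumption-laden illustration (the class and method names are hypothetical) and is not how the library reads extensions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

final class ExtensionNamespaceSketch {

    // Namespace URI registered for the "ext" prefix in this demo.
    static final String EXT_NS = "http://demo.org";

    // Returns the text of the first element with the given local name in the
    // extension namespace, or null when no such element exists.
    static String firstExtensionValue(String xml, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace-based lookups
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS(EXT_NS, localName);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Unable to parse XML", e);
        }
    }
}
```

Looking up "MatchingMethod" in the file above would return the patient value, since it is the first match in document order. Looking up "Item" would return null, because Item elements belong to the NAACCR namespace rather than the extension one.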