CDA in the Wild: Validation – XML Schema (Installment #3)

 

During our last episode, we stumbled upon a wild CDA lying dead in the grass. Not wanting to pass up such an opportunity, we decided to dissect the beast to uncover the cause of death. The immediate cause turned out to be a malformed end tag that resulted in catastrophic system failure.

 

Further analysis found something far more sinister, however, and made it clear that this specimen had been suffering for quite some time. If you are at all squeamish please look away as I peel back the header and probe into the body of this creature.

 

See that? There, in the exposed narrative block? Internally, this is not a wild CDA at all. In fact, it seems to be an unnatural hybrid of a wild CDA and the North American XHTML, likely the result of some illegal genetic tampering. Undoubtedly it suffered chronic autoimmune responses throughout its short, painful life. The CDA’s native validation systems were literally rejecting the beast’s own internal structure. The horror it must have endured is unimaginable…

 

First, I’d like to address several comments in the Twittersphere related to the examples from the previous installment of this series, which focused on basic XML issues. Specifically, commenters deemed these examples silly and trivial. I agree wholeheartedly. Sadly, my examples are also very, very real.

 

Every year, certified EHRs submit thousands of CDA documents to the US government that fail basic XML well-formedness rules. This indicates either a widespread lack of understanding or wanton carelessness on the part of implementers. Malformed CDA XML documents are simply inexcusable, and should never come from a certified EHR.

 

My second biggest frustration is files that fail to pass CDA XML Schema validation, and that’s the focus of this installment.

 

The CDA specification comes with a set of XML Schema files (http://www.w3.org/XML/Schema) for validating the basic syntax and structure of a CDA document. CDA implementation guides, like C-CDA and QRDA, usually contain updated versions of this schema that have been modified to allow for CDA extensions, which have been approved by the HL7 Structured Documents Working Group (SDWG). The latest copy of the schema files can always be found on the HL7 gForge site (gforge.hl7.org). I recommend getting a gForge account and using a Subversion (SVN) client to download a full copy of all schema files from the repository at http://gforge.hl7.org/svn/strucdoc/trunk/CDA_SDTC.

 

*I’m not going to cover using SVN clients in this blog, but feel free to google for more info.

 

Even though the CDA XML Schema is widely available, it’s often not used at runtime to ensure generated CDA documents are, in fact, valid. Instead, the schema is either ignored completely by implementers who follow the structure of sample documents, or used only during initial development and certification testing and then discarded, resulting in production errors. Generating valid test documents during certification does not mean that ALL CDA documents generated with live data, in production, will also be valid.

 

Here are a few common errors I find in production CDA documents, which could have been easily caught by validating against the XML Schema:

 

Elements missing namespaces or in the wrong namespace:

                  <ClinicalDocument> … </ClinicalDocument>

Elements with parsable character data (aka PCDATA) where it is not allowed:

                  <code>1111-1</code>

Elements that are out of order (yes, order matters in CDA XML):

                  <templateId root="2.16.840.1.113883.10.20.22.1.1"/>
                  <typeId root="2.16.840.1.113883.1.3" extension="POCD_HD000040"/>

Misnamed elements and attributes:

                  <author>
                      <time date="2000040714"/>
                      <assignedAuthor>
                          <id code="KP00017" codeSystem="2.16.840.1.113883.19.5"/>
                          <person>
                              …
                          </person>
                      </assignedAuthor>
                  </author>

Attributes that are empty or contain invalid content per the CDA XML schema:

                  <id root="TODO: replace with OID" extension=""/>

 

I won’t explain how to fix what’s wrong with all the examples above. Rather, I’ll point out that such issues are easily identified if you validate your CDA documents.

A simple Java example of how to validate an XML file against an XML Schema file:

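                  import java.io.File;
                  import java.io.IOException;
                  import javax.xml.XMLConstants;
                  import javax.xml.transform.stream.StreamSource;
                  import javax.xml.validation.Schema;
                  import javax.xml.validation.SchemaFactory;
                  import javax.xml.validation.Validator;
                  import org.xml.sax.SAXException;

                  // Returns true if xmlFile is valid against xsdFile.
                  // Throws SAXException only if the schema itself cannot be loaded.
                  public boolean validate(File xmlFile, File xsdFile) throws SAXException {
                      boolean valid = false;
                      SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
                      Schema xsd = sf.newSchema(xsdFile);
                      Validator v = xsd.newValidator();
                      try {
                          v.validate(new StreamSource(xmlFile));
                          valid = true;
                      } catch (SAXException | IOException e) {
                          // Invalid or unreadable document: report the first problem found
                          System.out.println(e.getMessage());
                      }
                      return valid;
                  }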

 

This would catch the errors outlined above in a heartbeat. It’s not much code, and it adds very little processing overhead to applications. It is not always done at runtime in production systems, however, based on what we see coming over the wire. If it is done, then those systems are ignoring errors and exporting problems downstream.
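The method above stops at the first problem and recompiles the schema on every call. As a rough sketch of one way around both, the example below compiles the schema once and registers an ErrorHandler that records every schema error with its line number. The class name SchemaErrorCollector and the tiny inline schema are hypothetical stand-ins for illustration only; a real system would load CDA.xsd from the HL7 schema files instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class SchemaErrorCollector {

    // Toy stand-in schema (NOT the real CDA.xsd): typeId must precede templateId.
    static final String TOY_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='urn:hl7-org:v3' elementFormDefault='qualified'>"
      + "<xs:element name='ClinicalDocument'><xs:complexType><xs:sequence>"
      + "<xs:element name='typeId'/><xs:element name='templateId'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Compile once and share: a Schema is thread-safe, a Validator is not.
    static Schema compile(String xsd) throws SAXException {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return sf.newSchema(new StreamSource(new StringReader(xsd)));
    }

    // Returns one message per schema error instead of stopping at the first.
    static List<String> collectErrors(Schema schema, String xml) throws IOException {
        List<String> errors = new ArrayList<>();
        Validator v = schema.newValidator(); // one Validator per document
        v.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignored in this sketch */ }
            public void error(SAXParseException e) {
                errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            v.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            errors.add("fatal: " + e.getMessage()); // e.g. document is not well-formed
        }
        return errors;
    }
}
```

An empty list means the document passed. Anything else can be logged in full, so an implementer sees every problem in one pass rather than fixing them one at a time.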

 

Customers expect valid CDAs from certified EHRs. Imagine their disappointment when they discover that the CDAs are often invalid, and could not even be called CDAs in the first place. In my opinion, certified EHRs should validate every exported CDA at runtime, report errors, and halt the export if the result is not valid. In fact, I think this should be a certification requirement.
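To make the "validate, report, halt" idea concrete, here is a minimal sketch of what such a gate might look like. The names CdaExportGate, InvalidCdaException, and export are hypothetical and not part of any EHR or HL7 API. The sketch assumes the caller supplies a Schema that was already compiled from the CDA schema files.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import org.xml.sax.SAXException;

public class CdaExportGate {

    /** Hypothetical exception signalling that an export was halted. */
    public static class InvalidCdaException extends Exception {
        public InvalidCdaException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Assumed to be compiled once by the caller (e.g. from CDA.xsd) and shared.
    private final Schema schema;

    public CdaExportGate(Schema schema) {
        this.schema = schema;
    }

    /** Writes the document to out only if it is schema valid; otherwise halts. */
    public void export(byte[] cdaXml, OutputStream out)
            throws IOException, InvalidCdaException {
        try {
            // With no ErrorHandler set, validate() throws on the first error.
            schema.newValidator()
                  .validate(new StreamSource(new ByteArrayInputStream(cdaXml)));
        } catch (SAXException e) {
            // Report and halt: nothing has been written to out at this point.
            throw new InvalidCdaException("CDA export halted: " + e.getMessage(), e);
        }
        out.write(cdaXml);
    }
}
```

Because validation happens before the first byte is written, an invalid document never reaches the recipient, and the exception message gives the sending system something specific to log and fix.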

 

Hey, it’s about interoperability – can we agree that “schema valid” is a given?

 

Read Installment #4: Rick takes on Schematron Validation
