Concept Constraint Definition Generator (CCD-Gen)

The CCD-Gen is not part of the specifications. However, since it is a primary tool for authoring CCDs we include this chapter.

Registration & Access

The CCD-Gen does require free registration to gain access to be able to view and download current CCDs. Note that CCDs are also available on GitHub.

Access to authoring your own PCMs & CCDs requires posting a request in the MLHIM Google+ Community. See the Support page for links.

Creating and Managing CCDs

The CCD-Gen is an online tool used to create CCDs via a web driven, declarative environment. The CCD-Gen allows the building of complete CCDs by selecting the desired Pluggable complexType (PCM) definitions from existing PCMs or ones that you design yourself. You assemble the pieces into the concept definition you need. By re-using PCMs you improve the data exchange and analysis capability.

Many of the PCMs have been designed based on existing data models. Resources such as the Common Data Element definitions from the US National Cancer Institute, the United States Health Information Knowledgebase (USHIK), from the Agency for Healthcare Research and Quality (AHRQ) and many others. Some HL7 FHIR® models have been translated to PCMs. More are planned and you can contribute to the open content effort. The Siscolo application has been modeled and example of translating a controlled vocabulary using ICD-10 in an XML-DB is on GitHub.

The CCD is an XML Schema that is compliant with the W3C XML Schema 1.1 standards. It is valid against one MLHIM Reference Model release. However, many of the PCMs maybe used in CCDs across various reference model releases. A CCD provides an approach to improve the process of allowing semantic interoperability between various applications. This process is enabled due the ability to exchange the syntax and semantics for data models designed by domain experts.

The CCD-Gen has a library of some of the CCDs created using this tool. Because of the governance approach to CCDs there is no need for a centralized repository. It is helpful though to have some place for potential users to review published examples. The CCD-Gen Library page displays the CCDs in a table and an included XSLT renders the details directly from the CCD (click on the Title link) in a familiar data dictionary format. This is an example of how applications can extract this information from the CCD for user help pages and tooltips and even decision support.

CCDs created by the CCD-Gen are also commited to a GitHub repository Each reference model version has a separate branch in the repository. See the README for instructions on searching the CCD Library on GitHub.

XML Data Instances

The CCD-Gen creates one sample XML instance along with every CCD. These samples are valid instances of the CCD schema. The one exception is when an xs:assert is used in a PCM definition. The instance generator cannot always create guaranteed valid data content for any arbitrary assert statement. If your sample is not valid, this is likely the case and it can be edited using any XML editor. For regular expressions (regexes) used to describe a string pattern in a dvstring-value element, the CCD-Gen can handle most patterns. Strings like postal codes, phone numbers, etc. should be randomly created okay. Other PCMs have no facility for xs:assert matching.

JSON Data Instances

The CCD-Gen executes an XML to JSON conversion to create the example JSON data instance. The namespace entries are treated as data so that they are maintained for conversion back to XML.

HTML Forms

There are hundreds of possibilities for uses of data when defining a model. Any given application component may write to multiple CCD instances at any one time. There is no a priori solution for this.

The CCD-Gen takes the most generic approach and creates a HTML form that depicts the nested nature of data defined by this CCD. This automatically generated form is more appropriately useful as a communications tool between the knowledge modeler (domain expert) and the software developer. It allows the developer to visually see the structure of the data. This form also includes helpful information for the developer to relate the visual content to the CCD and data instance.

There is a tooltip on every structure name (Cluster, DvString, etc.) that provides the UUID for that structure. A HTTP link is also provided here that connects the developer to the specific documentation on the MLHIM website concerning the parent complexType of the Reference Model.

The textual semantics for each component is also presented. A tooltip on these names/titles provided by the modeler displays the documentation as well as all links/semantic references that the modeler included in the PCM. This provides the developer with the same context that the modeler used when defining the PCM.

R Code

In conjunction with each CCD, the CCD-Gen creates source code for use with the R statistical programming environment [1]. Specifically the code is created in a project suitable for use in R Studio[#f2]_, a package development and management tool for R.

The R code is package complete and compliant for distribution via the The Comprehensive R Archive Network[#f3]_ . You can use R Studio or the R tools for package creation to generate the desired package format, either source or compiled binary for a specific platform (Linux, Mac, Windows, etc.). The code written by the CCD-Gen also contains package documentation (R help files) that can be created using the tools mentioned above.

The R code is created on a PCM level and will return a dataframe from all matching instances of a PCM. For example you may search across a complete repository of XML instances and receive a dataframe containing all occurrences of a specific PCM in all CCDs. Users should note however that the generated R code will only return the first instance in a given file.

For more practical use, the code modules can be edited and combined into an application specific library and used to build any desired data structure based on the content of your repository. For example you may want to provide an additional loop or apply function to gather cases where multiple instances appear within a given DvAdapter node. Customizing these basic routines allows the analyst to apply any of the thousands of algorithms available as R packages to their MLHIM based data repository.

Users and developers should note that the metadata.R file is not included in the NAMESPACES that are exported. However, it is required for every other R file since there are functions for XML namespace resolution located there. This file must be specifically ‘sourced’. You may optionally include it in your NAMESPACE file.