Development of an ontology architecture to ensure semantic interoperability between communication standards in healthcare Appendix with additional explanations


2. Detailed Insights

2.1. HL7 Version 2.x - Detailed Insight

The first version of HL7 (version 1.0) was established in 1987 by agreement of several companies in order to exchange information in a "standardized", ASTM-derived format. This approach quite quickly led to the second version, which was adopted in 1990. The latter was and is the basis for the "HL7 version 2.x" family, which is being developed in parallel to version 3. Currently (August 2010) version 2.7 is adopted and v2.8 is under preparation with new proposals. Further versions can be expected during the next few years.

This pragmatic approach uses a delimited exchange format for messages in order to save space. At the beginning of the development work, the length of a message was still a real concern for the transfer.

The default encoding (ER7) with the use of six hierarchically oriented separators allows the construction of structured messages. Basically, a two-dimensional approach for sharing the information content is used:

Figure 1: Two-dimensional parsing HL7 v2.x messages

The most significant delimiter (CR = carriage return) separates the individual segments and thus provides the segment-oriented message structure, which makes it possible to express coherent message units. At this "top level" one can therefore decide whether an information unit (segment or segment group) can be processed or not. The message structure must be known for this, however. Conversely, a segment or segment group can be located within the hierarchy in this way, so that it can be decided directly whether it can be processed.

Within a segment, which is identified by a three-letter acronym (for example, PID = Patient IDentification), the fields carry the actual information. Once a segment is to be processed, the decoding takes place in the second dimension. Here, the individual pieces of information are extracted using the five other delimiters.
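The two-dimensional decoding described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete ER7 parser: it assumes the commonly used delimiter characters, does not handle escape sequences or the special treatment of the header segment, and all function names are hypothetical. The sample segments are abbreviated from Figure 2.

```python
# Minimal ER7 parsing sketch; escape sequences and the MSH special case are not handled.
SEGMENT_SEP = "\r"       # carriage return terminates a segment (first dimension)
FIELD_SEP = "|"          # separates the fields of a segment (second dimension)
REPETITION_SEP = "~"     # separates repetitions of a field
COMPONENT_SEP = "^"      # separates the components of a data type
SUBCOMPONENT_SEP = "&"   # separates the subcomponents of a component


def parse_field(raw):
    """Split one field into repetitions -> components -> subcomponents."""
    return [
        [component.split(SUBCOMPONENT_SEP) for component in repetition.split(COMPONENT_SEP)]
        for repetition in raw.split(REPETITION_SEP)
    ]


def parse_message(message):
    """Return a list of (segment name, parsed fields) tuples."""
    segments = []
    for line in message.split(SEGMENT_SEP):
        if not line:
            continue  # skip the empty string after a trailing CR
        name, *fields = line.split(FIELD_SEP)
        segments.append((name, [parse_field(f) for f in fields]))
    return segments


# Abbreviated segments borrowed from Figure 2.
message = "DG1|3\rIN1|1||0463752^^NII~32453^^NIIP|Techniker Krankenkasse\r"
parsed = parse_message(message)
segment_names = [name for name, _ in parsed]
insurer_ids = parsed[1][1][2]   # third field of the IN1 segment
```

A receiver can inspect `segment_names` first to decide whether a segment is processable at all, and only then descend into the fields.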

The following is an example of an admission message:

       Vogelweg 8&Vogelweg&8^^Hamburg-Harburg^^10260^DEU^H~
       ^WPN^PH^^49^40^5557865||M|CAT||||||Maria-Hilf Krh.|||DEU|Pyrotechniker|DEU
DG1|3||S42.41 R^...^I10-20|||BD|||||||||2.1
IN1|1||0463752^^NII~32453^^NIIP|Techniker Krankenkasse||||||||
       Vogelweg 8^^Hamburg-Harburg^^10260^DEU^H|||||

Figure 2: Example HL7 v2.x message

Further structuring of the message content within the message structure is defined by the data types:

2.1.1. Data Types

The data types define the division of the fields into their components. The first version of the v2.x family had only one simple and generic compound data type ("CM"). In the course of development the model for the data types has been refined further. In version 2.6 there are already over 90 different data types.

Unlike in programming languages, data types in HL7 are defined by their content. Examples:

  • Addresses
  • Person names
  • Time range
  • etc.

Because the number of delimiters is limited, data types cannot be nested arbitrarily but are limited to two levels. (The length of a field is the sum of the lengths of its components plus the delimiters.) The following example shows the data type for person names, with the two nested data types "FN" and "DR" resolved into their components. For further details, please refer to [HL7]:

Table 1: HL7 data type specification (nesting resolved)

Comp  Sub  Len  DT   Opt  Table  Name
1          194  FN   C           Family Name
      1    50   ST   R           Surname
      2    20   ST   O           Own Surname Prefix
      3    50   ST   O           Own Surname
      4    20   ST   O           Surname Prefix From Partner/Spouse
      5    50   ST   O           Surname From Partner/Spouse
2          30   ST   O           Given Name
3          30   ST   O           Second and Further Given Names or Initials Thereof
4          20   ST   O           Suffix (e.g., JR or III)
5          20   ST   O           Prefix (e.g., DR)
6          6    IS   B    0360   Degree (e.g., MD)
7          1    ID   O    0200   Name Type Code
8          1    ID   O    0465   Name Representation Code
9          705  CWE  O    0448   Name Context
10         49   DR   B           Name Validity Range
      1    24   DTM  O           Range Start Date/Time
      2    24   DTM  O           Range End Date/Time
11         1    ID   O    0444   Name Assembly Order
12         24   DTM  O           Effective Date
13         24   DTM  O           Expiration Date
14         199  ST   O           Professional Suffix
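The length rule stated before Table 1 (sum of the parts plus the delimiters) can be checked against the two nested data types in the table. The following is a small sketch; the function name is made up for illustration, and the lengths are transcribed from Table 1.

```python
# Lengths of the nested parts, transcribed from Table 1.
FN_PART_LENGTHS = [50, 20, 50, 20, 50]   # Surname ... Surname From Partner/Spouse
DR_PART_LENGTHS = [24, 24]               # Range Start Date/Time, Range End Date/Time


def composite_length(part_lengths):
    """Sum of the parts plus one delimiter between each pair of neighbours."""
    return sum(part_lengths) + (len(part_lengths) - 1)


fn_length = composite_length(FN_PART_LENGTHS)   # 190 + 4 delimiters
dr_length = composite_length(DR_PART_LENGTHS)   # 48 + 1 delimiter
```

Both results match the lengths listed for "FN" (194) and "DR" (49) in the table.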

Through the so-called component model, which was officially introduced with version 2.5, the assignments that could previously only be described verbally (especially the tables, the lengths and the optionalities) could be made explicit.

Since its inception, the HL7 database has included a component model and introduced data types in order to describe every structure and component uniquely. To ensure compatibility with the original standard, a mapping to the imprecise specifications was introduced.

2.1.2. Events

Events in the real world - for example, the admission of a patient or the creation of a new order - are the trigger to submit the relevant information to a communication partner. Depending on the scenario, different data is necessary, which is represented in the form of specialised compositions of different segments (i.e. segment groups).

Originally (HL7 v2.1) it was assumed that the message type alone would provide sufficient information to determine the content of the message. Details of the event were then placed in a particular field in a subsequent segment: EVN-1 for ADT messages and ORC-1 for ORM messages. In the course of the evolution, the messages became more comprehensive and could no longer be built identically for all events of the same message type. For this reason, the event field was moved into the message header. Unfortunately, the various working groups (TCs/SIGs) have handled the situation differently. The ADT (patient administration) domain has developed many similar messages (actually three basic structures with many events), whereas OO (Orders & Observations) has stayed with ORM and a very small set of message structures, but has developed many "subevents" in the form of the "Order Control Code" (ORC-1).

The introduction of an identifier (ID) for the message structure ("Message Structure Identifier") should resolve this shortcoming. Initially, all messages were therefore compared with the help of the database to determine such an ID. Since not all messages could be simply and clearly parsed by a parser due to their internal structure (several different messages with the same event), this ID should also be consulted to support the parsing processes.

Since the standard itself is not developed based on a database [SchoOem2001], it is unfortunately often the case that not all messages with the same structure (and ID) are changed similarly, so that different structures with the same ID are presented in the ballot process.

In addition, a cleanup of the documentation cannot be performed because members object and insist on backward compatibility. This leads to duplication in the definitions, which in turn complicates a consistent definition, if it does not prevent it altogether. The only remaining remedy is the systematic collection and review of objections during the ballot phase (see "The HL7 database" on the website).

2.1.3. Dynamic Behaviour

In addition to the static message structure described above, there is also a dynamic behavior. The message header provides some of the relevant information for it:

Figure 3: HL7 v2.x messages - dynamic behavior

The acknowledgment conditions for transport and processing determine whether an acknowledgment should be sent and under what conditions. (If nothing is specified, the default behavior is to send an application acknowledgement.) Consequently, an application may expect 0, 1 or 2 responses to each message sent. In case of "broadcast" messages such as ADT, this must be multiplied by the number of recipients. This raises a problem that has to be solved individually: how to deal with different responses from the individual applications. For example, what should happen if 2 of 8 applications refuse to process a transfer of a patient?

The currently implemented default behavior is the transmission of a transport acknowledgment, which is, however, not really evaluated. It only serves to fulfill the requirement to send an ack. This behaviour in turn stems from a synchronous processing logic, which is inappropriate here and is often implemented incorrectly. HL7 does not make any statements about such behavior.

Strictly speaking, several messages can be sent before a single response may occur - if a response is requested at all.

A response message references the original message by its message ID. This ID can be of any kind, but it must be a unique string. To correctly assign and identify errors and to correct them, an application must keep a log of all messages. Most do not, so consistent error monitoring that would allow errors to be prevented proactively is not possible.
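Such a message log can be very small. The following sketch is an assumption about how it might look, not something prescribed by HL7: the class and method names are hypothetical, the log is kept in memory only, and the acknowledgment codes used in the comments ("AA" for accept, "AE" for error) are the usual v2.x acknowledgment codes rather than values taken from this text.

```python
from dataclasses import dataclass, field


@dataclass
class MessageLog:
    """Hypothetical in-memory log keyed by the unique message ID."""
    pending: dict = field(default_factory=dict)     # message ID -> message text
    responses: dict = field(default_factory=dict)   # message ID -> list of ack codes

    def record_sent(self, message_id, message):
        """Register an outgoing message; the ID must not have been used before."""
        if message_id in self.pending or message_id in self.responses:
            raise ValueError(f"message ID {message_id!r} is not unique")
        self.pending[message_id] = message

    def record_response(self, referenced_id, ack_code):
        """Assign a response (e.g. "AA" or "AE") to its original message.

        Returns False if the response references a message that was never logged.
        """
        if referenced_id not in self.pending and referenced_id not in self.responses:
            return False
        self.pending.pop(referenced_id, None)
        self.responses.setdefault(referenced_id, []).append(ack_code)
        return True

    def unanswered(self):
        """IDs of messages for which no response has arrived yet."""
        return sorted(self.pending)
```

Because several messages may be sent before a single response arrives, the log keeps unanswered messages separate, and a list of codes per message allows for more than one response to the same message.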

Another problem associated with the dynamic behavior is forwarding the messages to the recipient or recipients. The HL7 standard itself does not state how the messages are to be distributed, that is, whether the application itself is responsible or not. A communication server can be entrusted with the distribution. In return, it guarantees the transfer and directly provides a transport ack. Hence, the question arises how different application acks (see above) should be handled if, e.g., one of four target systems does not process the message.

Ultimately, only manual intervention will solve this problem, provided the administrator is informed about it.

2.1.4. Delete Requests

The 2.x version has only three "types of information" (see [IHE Vol.0]):

  • existing information
  • non-existing information
  • information to be deleted

The first two are relatively easy to understand: values for a field are usually sent when they are available. Otherwise, the field stays empty/blank.

If information is deleted in an application, this is meant to be indicated by a special sequence. Two double quotes ("") are provided for this purpose.

The implementation of this behavior requires that an application recognizes and remembers that information was deleted [IHE Vol.0]. In most applications, the information will not be historicized, so that the interface can not recognize that information is no longer available. Therefore simply "no information" is transferred, leaving the target system with the old information until a new value is transmitted.
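From the receiver's point of view, the three cases can be told apart with a simple rule. The sketch below is illustrative only; the function name is hypothetical and a real interface would apply this per field or component.

```python
DELETE_MARKER = '""'   # two double quotes: the sender asks to delete the stored value


def apply_field(stored_value, received_value):
    """Return the value the receiver holds after processing one field."""
    if received_value == DELETE_MARKER:
        return None               # explicit delete request
    if received_value == "":
        return stored_value       # nothing sent: the old value stays in place
    return received_value         # a new value replaces the old one
```

The second branch is exactly the situation described above: if the sender cannot tell that a value was deleted and sends an empty field, the target system keeps its old information.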

Difficulties arise when a receiver processes information from different channels, because then the sequence plays an important role. Unless the information is forwarded to all parties involved, consistency cannot be guaranteed.

2.1.5. Null-Flavors

In contrast to deletion requests, HL7 version 2.x does not currently distinguish why a piece of information is not available. Since the first version, this has been handled through table values, so the problem was addressed only in certain cases (tables) and in an inconsistent way. Considerations for a generic approach are being discussed with proposal 608 [HL7 V2 Prop.DB] for v2.8. However, backward compatibility with previous versions raises issues that need to be analyzed in more detail.

2.1.6. Transmission Protocols

As already mentioned, ASTM [ASTM] was the basis for the work. On the one hand, improvements were introduced, such as segment names consisting of just three letters. On the other hand, they "forgot" to define a message end ("message trailer") segment, which leads to problems with certain communication protocols. A batch protocol to combine multiple messages into a transmission unit exists, but for pragmatic reasons (additional development) it is not supported by any company. For file-based transmission the messages are simply written one after another into the file. When transferring a file, this requires additional safeguards (semaphore files, locking, or renaming) to ensure completeness of the messages.

The easiest and most reliable mechanism here is renaming, because this process is supported at the file level by all operating systems as an atomic action. For unexplained reasons, however, semaphore files are preferred to solve the problem, although one has to handle two files.
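The renaming approach can be sketched as follows. This is a minimal example under stated assumptions: the function name and the ".tmp" suffix are made up, the receiver is assumed to ignore files with that suffix, and the temporary file is created in the target directory because a rename is only a single atomic step within one file system.

```python
import os
import tempfile


def write_messages_atomically(target_path, messages):
    """Write all messages to a temporary file, then rename it to the final name."""
    directory = os.path.dirname(os.path.abspath(target_path))
    # Create the temporary file next to the target so that the rename
    # stays within one file system.
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # newline="" keeps the carriage returns of the messages untouched.
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
            for message in messages:
                handle.write(message)
            handle.flush()
            os.fsync(handle.fileno())
        # The file appears under its final name only once it is complete.
        os.replace(temp_path, target_path)
    except BaseException:
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```

Compared with a semaphore file, only one file has to be handled, and the receiver never sees a partially written file under the final name.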

In addition, other alternatives exist with VPN and D2D [D2D], which are only in use in Germany.

2.1.7. Profiles

The standard itself is the union of all the requirements of all manufacturers/vendors involved in the development of standards. Because such requirements can not be declared as mandatory, many elements are optional, i.e. they need not be provided. This approach allows a broad acceptance of standards, but this in turn also leads to the fact that different manufacturers provide different data so that data exchange is limited despite adherence to the same standard.

To resolve this dilemma, message profiles are introduced. There are three different levels:

  • standard
  • constrainable (=to be limited)
  • implementable (=implemented)

The top level represents the standard itself, i.e. it has the biggest amount of optionalities. For a "constrainable profile" optionalities are somewhat limited already. However, some choices still exist which are eliminated with "implementable profiles". For each information item a statement must be provided whether it is supported or not.

The standard clearly defines what can be done with the various optionalities to arrive at an "implementable profile":

Table 2: "HL7 Optionality and Conformance Usage"

HL7 Optionality                     Allowed Conformance Usage   Comment
R - Required                        R
RE - Required, but may be Empty     R
O - Optional                        R, RE, O, C, CE, X          can be constrained to any other, but O is only permitted for constrainable profiles
C - Conditional                     C, R
CE - Conditional, but may be empty  C, CE, R
X - Not Supported                   X
B - Backward Compatibility          R, RE, O, C, CE, X          can be constrained to any other, but O is only permitted for constrainable definitions; X is the preferred one
W - Withdrawn                       R, RE, O, C, CE, X          X is the preferred one

In an implementable message profile, ultimately only two possibilities are allowed: either a specific element is supported (= "R / RE") or not (= "X"). This results in the following conversion chart. The only special case is formed by the elements whose usage depends on conditions. But again, an implementable profile requires an exact statement [OemBlo2007b]:

Figure 4: Hierarchy for restricting profiles

As of version 2.7 "RE" will also be allowed / used in the standard.
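The constraint rules of Table 2 lend themselves to an automatic check of a profile against the standard. The following sketch transcribes the table as given; the function name and the `implementable` flag are assumptions made for illustration.

```python
# Allowed conformance usages per HL7 optionality, transcribed from Table 2.
ALLOWED_USAGE = {
    "R": {"R"},
    "RE": {"R"},
    "O": {"R", "RE", "O", "C", "CE", "X"},
    "C": {"C", "R"},
    "CE": {"C", "CE", "R"},
    "X": {"X"},
    "B": {"R", "RE", "O", "C", "CE", "X"},
    "W": {"R", "RE", "O", "C", "CE", "X"},
}


def is_valid_constraint(optionality, usage, implementable=False):
    """Check whether a profile may assign `usage` to an element with `optionality`."""
    if usage not in ALLOWED_USAGE[optionality]:
        return False
    # "O" is only permitted in constrainable profiles (see the table comments).
    if implementable and usage == "O":
        return False
    return True
```

For example, `is_valid_constraint("O", "X")` accepts dropping an optional element, while `is_valid_constraint("R", "X")` rejects dropping a required one.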

2.2. HL7 Version 3 - Detailed Insight

An entirely different approach was taken for the "new" HL7 Version 3: in 1995 the first attempt was made to create a globally valid model for healthcare. After three years of work it had to be admitted that such a model is hard to define, if it can be defined at all, because the requirements of the different domains and different countries do not permit harmonization.

Instead, they arrived at a metamodel called the RIM (Reference Information Model) [HL7 RIM]. One can regard it as a toolbox whose elements can be used for the construction of domain models. The RIM itself consists of only four base classes and two more classes for relationships. (A good explanation of how to read it can be found, in addition to the Guide V2, in [Hinch]):

Figure 5: HL7 V3 RIM base classes

The entire reference model (hereinafter RIM 0208) includes the base classes and their specializations, in a print-optimized version:

Figure 6: HL7 V3 RIM (Reference Information Model)

In addition there are special classes that are necessary for the generation of messages, management of queries, languages and overall control. They are not relevant for further consideration in this work:

Figure 7: HL7 V3 classes for the control of the message exchange

These six basic classes (Figure 5) consist of four base classes and two relationship classes:

An entity is the representation of physical, persistent objects. Besides persons, materials and equipment, organizations and places also belong here. An entity expresses a static thing.

A role expresses the ability that an entity has. For example, a person may play the role of patient.

A participation describes the incorporation of an entity, in a certain role, into an act. For example, a person in the role of physician participates in a surgery activity as the executive surgeon.

An act is the carrier for all information and expresses changes. For example, a finding is an observation and hence an activity.

In addition to these four base classes, two classes are provided in order to realize relationships:

A role link is a direct relationship between various roles, such as employer / employee. However, this is used very rarely.

An act relationship combines various activities. For example, the activity of observing is the fulfillment of the activity order.

These six base classes can be put together in an instantiation in myriad ways. For example the following construct represents that "person A" participates as a physician in an examination of the patient "person B".

Figure 8: HL7 V3 Example 1 (simple activity)

The two link classes link roles and activities directly. For example, here Mr. Meier is employed by the company Kunze.

Figure 9: HL7 V3 Example 2 ("Role Relationship")

Similarly, two different activities - for example, the request and the corresponding result - are linked by an act relationship.

Figure 10: HL7 V3 Example 3 ("Act Relationship")

Another special feature is the different relationships between roles and entities. The following graph represents these by the solid and the dashed line. It is assumed that a role requires a certain context. Thus, a person becomes a patient only in the context of an organization (in this case a hospital).

Figure 11: HL7 V3 Example 4 ("Playing and scoping Entity")
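The combinations shown in the examples above can be mimicked with a few plain data classes. This is only a sketch of the idea: the attribute names and the string values ("performer", "fulfills", and so on) are illustrative and are not the RIM attribute names or vocabulary codes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    name: str


@dataclass
class Role:
    kind: str
    player: Entity                    # the entity that plays the role
    scoper: Optional[Entity] = None   # the entity that provides the context


@dataclass
class Participation:
    kind: str
    role: Role


@dataclass
class Act:
    kind: str
    participations: List[Participation] = field(default_factory=list)


@dataclass
class ActRelationship:
    kind: str
    source: Act
    target: Act


@dataclass
class RoleLink:                       # rarely used direct link between two roles
    kind: str
    source: Role
    target: Role


hospital = Entity("Hospital")
person_a = Entity("Person A")
person_b = Entity("Person B")

# A person becomes a patient only in the context of an organization.
physician = Role("physician", player=person_a, scoper=hospital)
patient = Role("patient", player=person_b, scoper=hospital)

# "Person A" participates as a physician in an examination of "person B".
examination = Act("examination", [
    Participation("performer", physician),
    Participation("subject", patient),
])

# The examination is the fulfillment of an order.
order = Act("order")
fulfills = ActRelationship("fulfills", source=examination, target=order)
```

The same six building blocks are reused for every construct; only the way the instances are wired together changes.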

The different options for creating relationships as shown in the four examples above can be (arbitrarily) combined into more complex structures. Out of these classes (components of the toolbox) so-called domain models are constructed. For this purpose, a module for Microsoft Visio (TM) is provided, which takes the RIM into account (the necessary meta-information is stored in an Access database) and can directly validate the classes used:

Figure 12: HL7 V3 sample domain model

A domain model is an (abstract) structure of classes that are required for modeling an application domain.

This domain model is taken and parts of it are constrained for their use. This leads to a so-called Refined Message Information Model (R-MIM), which serves to implement a particular scenario (use case).

A further restriction with regard to a particular message in this scenario, including the associated sequencing for transmission of this information in the form of a message, leads to a Hierarchical Message Definition (HMD). Such a definition can be represented either in tabular form or as an XML schema. Both are the basis for an implementation. The ITS - Implementable Technology Specification - then enables and ensures the implementation of the abstract models in a specific technology such as XML.

Apart from these abstract models and their details for an implementation two other areas are important.

Figure 13: HL7 V3 Implementation Issues

The foundations for this are explained in more detail in subsequent sections.

2.2.1. State Transitions

One reason to send a message is the change of an internal state of an entity or an activity. For this purpose, generic state transitions have been defined, which are implemented in the form of state machines. In the various scenarios, subsets of these generic event definitions are specified and communicated by messages.

Figure 14: State Machine for HL7 V3 Acts

The state transitions for activities (see above) are different from those of entities (see below).

Figure 15: State Machine for HL7 V3 entities

2.2.2. Mood-Code

A totally different dimension is realized by the so-called mood code. (The technical term is best translated by "mode".) This refers to the use of abstract classes (entities and activities) in different functions. As such, an observation in the "Request" Mood is something different than an observation in the "Event" Mood. The former is the order, the latter the result.

Ontologically speaking, this should be represented in different subtrees of the hierarchy. For the automatic generation of the structures this cannot be realized, however. Instead, these are generally information objects whose semantics are accessible only through the combination of different attributes. For a mapping this must be taken into account by conditions.
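Such a condition can be as simple as a lookup on the combination of class code and mood code. The sketch below is illustrative: the function name is hypothetical, and the codes "OBS" (observation), "RQO" (request) and "EVN" (event) are taken from the HL7 V3 vocabulary rather than from this text.

```python
def describe_observation(class_code, mood_code):
    """Derive the meaning of an act from the combination of two attributes."""
    if class_code != "OBS":
        return "not an observation"
    if mood_code == "RQO":
        return "order"       # observation in the request mood
    if mood_code == "EVN":
        return "result"      # observation in the event mood
    return "other mood"
```

The class alone is not enough: the same observation class yields an order or a result depending on the mood, so a mapping rule has to test both attributes.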

2.2.3. Application Roles

As already explained in the introduction, the messages are exchanged as the interaction between different actors. The trigger for the exchange of messages is either an event or a state change. Sometimes it is not done with the exchange of a single message: several messages together form a specific scenario which can only adequately be represented in a system by considering all associated messages.

This is an enhancement over version 2.x, which does not have such a construct and therefore specifies all messages without any relationship to each other.

However, an "application role" is a non-normative construct, which in this case means that there is no uniform understanding of the responsibilities in this regard. It is not clear on what basis and under what conditions application roles are specified, and what should be considered when implementing the static and dynamic behavior.

2.2.4. Data Types

An essential prerequisite for an exchange of messages is data. These are often presented as a single unit of several individual values that have a common connection, such as an address, which consists of street, zip code and city. Such a conglomeration of individual pieces of information is referred to as a data type in programming languages. Quite often there are operations that can be performed on these data types. The data type "integer", for example, has the operations "predecessor" and "successor".

The semantic properties of these data types are expressed by "invariant statements". These are statements that are valid at any time and for all possible values. (In other areas of information science (computability and logic), such statements are regarded as fix points.)

2.2.5. ITS

This data type specification is defined on an abstract level, independent of a specific implementation. The same applies to domain models. For usage in a practical scenario a specific technology must be selected and mapping rules defined.

For HL7 V3 currently only XML and UML are approved as technologies and provided in the form of so-called ITSs (Implementable Technology Specifications).

There are also other possible technologies. These include binary XML [XML], ASN.1 [ASN.1] and ER7. The latter requires a revision, because it currently does not allow for arbitrarily deeply nested structures.

In March 2010 a new proposal was submitted to define a further XML ITS, which uses the names of the RIM classes directly and not the ones from the domain model, so as to address the problem of backward compatibility. This picks up, more or less unconsciously, the proposal that the author presented at IHIC 2009. The same applies to greenCDA.

2.2.6. "Structural Attributes"

The classes in the abstract models contain information which is absolutely necessary for a correct interpretation of a specific message instance. The RIM provides for the modeling of domains abstract classes that can then be instantiated accordingly. Since these classes can be used multiple times in different forms, this shall be indicated for the correct understanding. The necessary attributes such as "mood code" and "class code" are therefore called "structural attributes".

2.2.7. "Mandatory Values"

Closely associated with these structural attributes is the presence of valid values. This means that replacing this information with a reason for the absence of such data ("asked, but no response") is not permitted. This includes the possibility of default values as well. However, as some vendors manipulate the XML schemas, default values are very problematic, since they must be known for evaluation. Therefore, the associated schema should be communicated in parallel with each transmission, so that the data volume ("traffic") increases significantly.

2.2.8. "Null-Flavors"

A notable difference between version 3 and version 2 is the fact that information about the absence of data is not mixed with the data. So there is no valid "data code" for "unknown". The table below shows an example:

Table 3: Coding Example "Administrative Gender"

HL7 v2.x             HL7 V3                               Comment
M - Male             M - Male
F - Female           F - Female
A - Ambiguous        UN - undifferentiated
N - Not applicable                                        not used with persons in V3
O - Other                                                 not used with persons in V3
U - Unknown          nullFlavor="NI" - no information

On the one hand, applications can thus select from a dozen different reasons for the absence of information. On the other hand, this type of information must be stored in the application in order to be communicated at all.
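The mapping in Table 3 shows how a converter has to keep data and the reason for missing data apart. The following sketch transcribes the table; the function name and the tuple layout (code, nullFlavor) are assumptions for illustration.

```python
# Transcribed from Table 3: v2.x code -> (V3 code, V3 nullFlavor).
V2_TO_V3_GENDER = {
    "M": ("M", None),
    "F": ("F", None),
    "A": ("UN", None),
    "U": (None, "NI"),   # "unknown" is not a data value in V3 but a null flavor
}


def map_gender(v2_code):
    """Map a v2.x administrative gender code to a V3 code or null flavor."""
    try:
        return V2_TO_V3_GENDER[v2_code]
    except KeyError:
        # "N" and "O" are not used with persons in V3 (see Table 3).
        raise ValueError(f"no V3 equivalent for persons: {v2_code!r}") from None
```

A receiving application needs a separate place to store the second element of the tuple, which is the storage requirement mentioned above.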

2.2.9. Publishing

Another point that has been learned from the publication process of version 2 is the way in which to edit the information. Nowadays, there are no longer proprietary formats such as MS-Word in use. The information is stored in databases and XML documents.

A distinction must be made between text documents and model-specific information. The former are edited as XML documents conforming to given structures (XML schema). The model-specific information is stored in a set of database tables with reference to the respective artefact code. The models are referenced by the same artefact code (as a file name).

Based on this information, the documents to be published are generated in an HTML representation and hyperlinked. To ensure the correct linking of information from the various databases, which are maintained independently by the individual working groups, the databases are merged in advance.

In the future, a shift towards the MIF files can be expected.


Last Update: September 22, 2010