T3.1: Identifies Troubleshooting Tools

Knowledge Review - HL7 Interface Specialist

1. Production Configuration Page as Primary Interface

Key Points

  • One-to-one relationship: Production to namespace
  • Start/Stop/Recover: Control production lifecycle
  • Category filter: View All or specific component types (BS, BP, BO)
  • Color coding: Instant component health visualization
  • Component tabs: Filtered views (Settings, Queue, Log, Messages, Jobs, Actions)

Detailed Notes

Overview

The Production Configuration Page is the primary troubleshooting interface in InterSystems HealthConnect. It shows the health of every component in the production and serves as the navigation hub to all other troubleshooting tools, so it is usually the first place to look when investigating a production issue.

Production and Namespace Relationship

Because production and namespace have a one-to-one relationship, the page displays the components of the production in the current namespace. The Start/Stop buttons control the state of the entire production and affect all enabled components. The Recover button is for the special case where the production stopped improperly: it handles messages that were still pending during the unclean shutdown.

Category Filter

The Category filter is critical for managing complex productions with many components. Setting it to "All" displays Business Services, Business Processes, and Business Operations together. For focused troubleshooting, you can filter to view only specific component types, reducing visual clutter in large productions.

Context-Aware Tabs

When you select a component or Production Settings, all tabs on the right side (Settings, Queue, Log, Messages, Jobs, Actions) become context-aware, showing information filtered for that specific component. Navigating to detailed pages from these tabs maintains this filtering, providing pre-filtered views that accelerate troubleshooting.

2. Component Color Coding System

Key Points

  • Dark green: Running normally - all systems go
  • Light green: Running but inactive (typically no adapter)
  • Gray: Disabled component
  • Red: Component error (validation, connectivity, processing)
  • Purple: Retrying message delivery
  • Yellow: Inactivity timeout exceeded - no messages received

Detailed Notes

Visual Feedback System

The color coding system provides instant visual feedback on component health without requiring detailed investigation. This visual language is standardized across all HealthConnect productions and should be memorized for rapid troubleshooting.

Green States

Dark green indicates normal operation with active message processing. This is the expected state for most active components. Light green appears for components that are running but not currently processing messages, often because they lack an adapter (some Business Processes operate this way).

Error and Retry States

Red signifies errors that require immediate attention. These can include message validation failures, inability to connect to remote systems, or processing errors within component logic. When you see red, check the component's Log tab first for error details.

Purple indicates a retry state where the component is attempting to deliver a message but has failed at least once. The component is following its retry configuration, automatically attempting redelivery. This may resolve itself or may indicate a persistent connectivity or configuration issue.

Warning and Disabled States

Yellow warns that a component has exceeded its inactivity timeout without processing messages. For Business Services expecting regular input, this can indicate that the remote system has stopped sending data, representing a potential integration failure.

Gray simply shows disabled components that are not participating in message processing. These are intentionally excluded from the production flow.
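
As a memorization aid, the color meanings above can be written as a small lookup table. This is only a study sketch in Python, not product code: the names `COLOR_STATUS`, `NEEDS_ATTENTION`, and `triage` are made up for illustration, and treating red, purple, and yellow as the "needs attention" set is a reading of this section rather than a HealthConnect API.

```python
from typing import Tuple

# Color meanings exactly as listed in this section.
COLOR_STATUS = {
    "dark green": "running normally",
    "light green": "running but inactive",
    "gray": "disabled",
    "red": "error",
    "purple": "retrying message delivery",
    "yellow": "inactivity timeout exceeded",
}

# Assumption for this sketch: these colors call for a closer look.
NEEDS_ATTENTION = {"red", "purple", "yellow"}


def triage(color: str) -> Tuple[str, bool]:
    """Return (meaning, needs_attention) for a component status color."""
    key = color.strip().lower()
    if key not in COLOR_STATUS:
        raise ValueError(f"unknown status color: {color!r}")
    return COLOR_STATUS[key], key in NEEDS_ATTENTION
```

For example, `triage("Red")` returns the meaning "error" with the attention flag set, which matches the advice to open the component's Log tab first.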

3. Production State Troubleshooting

Key Points

  • Running: Normal operation state
  • Stopped: Clean shutdown, no pending messages
  • Suspended: Synchronous messages awaiting responses after shutdown
  • Troubled: Improper shutdown requiring investigation
  • Recovery strategy: Restart for Suspended, investigate then Recover for Troubled

Detailed Notes

State Overview

Productions operate in four distinct states: two are normal, and two indicate problems that require intervention. Understanding these states is critical for maintaining production reliability.

Normal States

Running is the normal operational state where the production is processing messages. Stopped indicates a clean shutdown where all messages completed processing and queues are empty. These two states represent normal production lifecycle.

Suspended State

Suspended appears when an operator manually stops the production, but synchronous messages are still awaiting responses. InterSystems IRIS has a shutdown timeout period during which it waits for pending operations to complete. Messages that don't complete within this window are marked as suspended. When you restart the production, IRIS automatically processes these suspended messages. For planned maintenance, it's best practice to restart and cleanly stop the production again to reach the Stopped state rather than leaving it Suspended.

Troubled State

Troubled state is more serious and indicates the production stopped improperly, typically due to a component crash or system failure. This requires investigation of system logs to identify the root cause. The Recover button can restart failed components and handle pending messages, but you should understand why the crash occurred to prevent recurrence. Troubled state is relatively rare in stable productions and always warrants thorough investigation.
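
The recovery strategy for the four states can be summarized as a simple decision function. The sketch below is a study aid in Python that restates the guidance from this section; `ProductionState` and `recommended_action` are hypothetical names and do not correspond to any IRIS class or method.

```python
from enum import Enum


class ProductionState(Enum):
    RUNNING = "Running"
    STOPPED = "Stopped"
    SUSPENDED = "Suspended"
    TROUBLED = "Troubled"


def recommended_action(state: ProductionState) -> str:
    """Map a production state to the response described in this section."""
    if state in (ProductionState.RUNNING, ProductionState.STOPPED):
        # Both are part of the normal production lifecycle.
        return "none"
    if state is ProductionState.SUSPENDED:
        # Restarting lets the suspended messages be processed.
        return "restart"
    # Troubled: find the root cause first, then use Recover.
    return "investigate, then recover"
```

The ordering in the Troubled branch is the point to remember: Recover comes after the investigation, not instead of it.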

4. Archive I/O Setting for Detailed Tracing

Key Points

  • Available in Business Services and Business Operations
  • Enables I/O column logging in Visual Trace
  • Critical for HL7 ACK debugging: View sent messages and received acknowledgments
  • Shows actual network input/output including protocol details
  • Performance impact: Use selectively in production environments

Detailed Notes

Purpose and Functionality

The Archive I/O setting provides granular visibility into the actual data transmitted and received by Business Services and Business Operations. When enabled, this setting adds I/O columns to the Visual Trace, allowing you to inspect the raw input and output for each message exchange.

HL7 ACK Debugging

For HL7 integrations, Archive I/O is particularly valuable when debugging acknowledgment issues. Without this setting, Visual Trace shows the main HL7 message flow but may not display ACK messages. With Archive I/O enabled in the Business Operation, you can see both the sent HL7 message and the received ACK in separate I/O columns, making it easy to verify that acknowledgments are being received and processed correctly.

Business Service Usage

In Business Services, Archive I/O logs the exact input received from external systems before parsing. This is invaluable when investigating parsing failures or when you suspect the incoming data format doesn't match expectations. You can compare the raw input against the DocType definition to identify discrepancies.

Performance Considerations

The setting applies at the component level, so you can enable it selectively for components experiencing issues rather than production-wide. This targeted approach minimizes performance impact while providing necessary debugging information. In high-volume production environments, consider the storage implications of logging all I/O and enable it only during active troubleshooting.

5. Central Alert Component Architecture

Key Points

  • Fixed name: Ens.Alert (component must use this exact name)
  • Receives all `Ens.AlertRequest` messages from the production
  • Implementation flexibility: Message Router OR Business Operation
  • Critical rule: Never enable "Alert on Error" in alert handling components (prevents infinite alert loops)

Detailed Notes

Reserved Component Name

The Ens.Alert component name is reserved and has special meaning in InterSystems HealthConnect. When any component in the production generates an alert, IRIS automatically routes the resulting Ens.AlertRequest message to a component named "Ens.Alert". This name is fixed by the framework and cannot be changed.

Implementation Flexibility

While the name is fixed, the implementation is entirely flexible. You can implement Ens.Alert as a Message Router that uses routing rules to distribute different alert types to different destinations based on error severity, source component, or error message content. Alternatively, you can implement it as a Business Operation that directly handles alerts by sending emails, writing to log files, or integrating with external monitoring systems.

Complex Alert Architectures

For complex alert handling, you might implement Ens.Alert as a Message Router that sends alerts to multiple Business Operations: one for email notifications, one for logging to a monitoring system, and one for archiving critical errors to a database. This architecture provides centralized alert routing with distributed handling.

Avoiding Infinite Loops

The most critical configuration rule for alert handling is to never enable "Alert on Error" in Ens.Alert or any downstream components that process alerts. If an error occurs during alert processing and "Alert on Error" is enabled, the system generates another alert, which routes back to Ens.Alert, potentially creating an infinite loop. Always disable "Alert on Error" throughout the entire alert handling chain.
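
The loop rule can be checked mechanically if you picture the alert chain as a graph. The following Python sketch assumes a made-up, simplified description of a production (a dict of component names with an `alert_on_error` flag and a list of `targets`); it is not how IRIS stores its configuration, and the component names are illustrative. It walks every component reachable from Ens.Alert and reports those that still have "Alert on Error" enabled.

```python
from collections import deque


def alert_loop_risks(components, root="Ens.Alert"):
    """Return components in the alert handling chain with alert_on_error set."""
    if root not in components:
        return []
    seen = {root}
    queue = deque([root])
    risky = []
    while queue:
        name = queue.popleft()
        config = components[name]
        if config.get("alert_on_error", False):
            risky.append(name)
        for target in config.get("targets", []):
            if target in components and target not in seen:
                seen.add(target)
                queue.append(target)
    return risky


# Hypothetical production: Ens.Alert is a router feeding two operations.
production = {
    "Ens.Alert": {"alert_on_error": False, "targets": ["Email.Out", "AlertLog.Out"]},
    "Email.Out": {"alert_on_error": True, "targets": []},
    "AlertLog.Out": {"alert_on_error": False, "targets": []},
    "HL7.Out.Lab": {"alert_on_error": True, "targets": []},
}

risks = alert_loop_risks(production)
```

Here only `Email.Out` is flagged. `HL7.Out.Lab` also has the setting enabled, but it is an ordinary component outside the alert chain, where "Alert on Error" is appropriate.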

6. Bad Message Handler Configuration

Key Points

  • Available only in Message Routers: EnsLib.HL7.MsgRouter.RoutingEngine
  • Requires validation enabled (Validation ≠ 0)
  • Routes failed messages to designated component
  • Messages without DocType fail validation automatically
  • Use "Alert on Bad Message" for notification

Detailed Notes

Purpose and Availability

The Bad Message Handler provides a mechanism for gracefully handling HL7 messages that fail validation rather than generating errors that halt processing. This setting exists only in HL7 Message Router components (EnsLib.HL7.MsgRouter.RoutingEngine) and requires validation to be enabled.

Implementation Steps

To implement Bad Message Handler, you must first enable validation in the Message Router by setting the Validation parameter to "1", "dm", "z", or another non-zero value. Then create a destination component (typically a Business Operation) that receives invalid messages. In the Message Router settings, set the Bad Message Handler parameter to the exact name of this destination component.

Handling Invalid Messages

The destination component receives the complete EnsLib.HL7.Message even though it failed validation. Common handling strategies include writing invalid messages to disk for manual review, logging them to a database with error details, or forwarding them to a manual processing queue. This prevents data loss while ensuring invalid messages don't disrupt normal processing.

DocType Requirement

A critical point for exam preparation: messages that arrive at the Message Router without a DocType specified cannot be validated because IRIS doesn't know which schema to validate against. These messages automatically route to the Bad Message Handler. The DocType should be assigned in the Business Service before sending the message to the Message Router.

Alert Integration

The "Alert on Bad Message" setting complements Bad Message Handler by generating an Ens.AlertRequest to the Ens.Alert component whenever validation fails. This provides real-time notification of validation issues while the Bad Message Handler ensures the message is preserved for investigation.
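
The interaction between Validation, DocType, Bad Message Handler, and "Alert on Bad Message" can be modeled as a short decision function. This Python sketch only restates the rules in this section; `route_message` and its parameters are hypothetical, the real schema check is reduced to a boolean, and the "error" outcome for a router with no handler configured is an assumption drawn from the statement that the handler exists to avoid such errors.

```python
def route_message(validation, doc_type, passes_schema,
                  bad_message_handler="", alert_on_bad_message=False):
    """Model where a message goes after the router's validation step."""
    # Validation of "" or "0" means validation is off: no bad-message handling.
    if validation in ("", "0"):
        return {"target": "routing rules", "alert": False}

    # A message without a DocType cannot be validated, so it counts as failed.
    failed = (not doc_type) or (not passes_schema)
    if not failed:
        return {"target": "routing rules", "alert": False}

    # Assumption: with no handler configured, the failure surfaces as an error.
    target = bad_message_handler or "error"
    return {"target": target, "alert": alert_on_bad_message}
```

For instance, with `validation="dm"`, an empty `doc_type`, and a handler named `BadMsg.Out` (an illustrative name), the message goes to `BadMsg.Out` even if its content is otherwise well formed.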

7. Production Monitor, Queues, Jobs, and Messages Pages

Key Points

  • Production Monitor: Real-time production health dashboard
  • Queues Page: View pending messages by component
  • Jobs Page: Monitor component process execution
  • Messages Page: Component message history
  • Context-aware filtering when navigating from component tabs

Detailed Notes

Overview

HealthConnect provides multiple specialized pages for monitoring different aspects of production operation. Each serves a distinct purpose in the troubleshooting workflow.

Production Monitor

The Production Monitor page provides a dashboard view of production health, showing active components, message rates, queue depths, and error counts. This is typically the first page administrators check to assess overall production health and identify components requiring attention.

Queues Page

The Queues page displays pending messages organized by component, showing how many messages are waiting for processing in each queue. High queue depths can indicate performance bottlenecks, slow external systems, or components that have fallen behind in processing. The page allows you to drill down into specific queues to see individual messages and their ages.

Jobs Page

The Jobs page shows the IRIS processes executing for each component. This is valuable for understanding component parallelism, identifying hung processes, and diagnosing performance issues. For components configured with pool sizes greater than one, you'll see multiple job processes. Comparing active jobs against configured pool size helps identify whether components are fully utilizing available capacity.
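
Reading the Queues and Jobs pages together amounts to a simple comparison, sketched below in Python. The data layout, the component names, and the depth threshold of 100 are all invented for illustration; in practice you read these numbers from the pages themselves, and a sensible threshold depends on the interface.

```python
def assess_backlogs(stats, depth_threshold=100):
    """Classify components whose queue depth exceeds the threshold."""
    findings = {}
    for name, s in stats.items():
        if s["queued"] <= depth_threshold:
            continue
        if s["active_jobs"] >= s["pool_size"]:
            findings[name] = "backlog with all pool jobs busy"
        else:
            findings[name] = "backlog with idle pool capacity"
    return findings


# Hypothetical snapshot of queue depth and job counts per component.
snapshot = {
    "HL7.Out.Lab": {"queued": 450, "active_jobs": 2, "pool_size": 2},
    "ADT.Router": {"queued": 3, "active_jobs": 1, "pool_size": 1},
    "HL7.Out.Rad": {"queued": 120, "active_jobs": 0, "pool_size": 1},
}

findings = assess_backlogs(snapshot)
```

A backlog with every job busy points toward a slow external system or insufficient capacity, while a backlog with idle capacity suggests a hung or stopped job worth inspecting on the Jobs page.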

Messages Page

The Messages page provides comprehensive message history for components, allowing you to filter, search, and view message details. This is essential for investigating specific message processing issues or understanding component behavior over time.

Context-Aware Filtering

A key feature of all these pages is context-aware filtering. When you navigate to any of these pages from the Production Configuration page with a specific component selected, the destination page automatically filters to show only information for that component. This pre-filtering significantly accelerates troubleshooting by eliminating the need to manually configure filters.

8. Testing Service for Routing Rules

Key Points

  • Disabled by default in production environments (security)
  • Enable in Production Settings to use testing tool
  • Send test messages directly to Business Processes or Operations
  • Must specify: DocType AND Source for routing rule matching
  • Creates test sessions visible in Visual Trace

Detailed Notes

Purpose

The testing service provides a powerful mechanism for testing routing rules and message processing without requiring connectivity to external systems. This tool is particularly valuable during development, testing, and troubleshooting phases.

Enabling the Service

By default, the testing service is disabled in production environments to prevent operators from inadvertently sending test messages that might trigger actions in connected external systems. Before using the testing tool, you must enable it in the Production Settings. This deliberate enabling step ensures you're aware you're working in a test mode.

Testing Process

To test a routing rule using the testing tool, navigate to the testing page and select the target Business Process or Business Operation. In the "document content" field, paste the complete HL7 message you want to test. However, pasting the message content alone is insufficient for proper testing.

Required Fields: DocType

You must also specify the DocType in the designated field. The DocType enables proper parsing and validation of the HL7 message against the schema. Without the correct DocType, the message may fail validation or be parsed incorrectly, leading to misleading test results.

Required Fields: Source

Additionally, if your routing rules include constraints based on the source component (using the Source property), you must specify the Source field with the name of the Business Service that would normally send the message. Without this, routing rules with Source constraints won't match, and your test message won't route as expected.
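
The three inputs described above can be treated as a pre-flight checklist before submitting a test message. The Python sketch below is a study aid only: `missing_test_fields` is a hypothetical helper, and the sample message, DocType value, and service name are illustrative rather than taken from any real production.

```python
def missing_test_fields(content, doc_type, source, rules_use_source):
    """List the testing-tool fields that still need a value."""
    missing = []
    if not content.strip():
        missing.append("document content")
    if not doc_type:
        missing.append("DocType")
    # Source only matters when routing rules constrain on the source component.
    if rules_use_source and not source:
        missing.append("Source")
    return missing


# Illustrative values; the HL7 header is shortened for readability.
sample = r"MSH|^~\&|LAB|HOSP|EHR|HOSP|202401011200||ADT^A01|MSG0001|P|2.5"

incomplete = missing_test_fields(sample, "", "", rules_use_source=True)
complete = missing_test_fields(sample, "2.5:ADT_A01", "HL7.In.TCP", rules_use_source=True)
```

In this example `incomplete` lists both DocType and Source, which is the situation where a pasted message appears to "not route" even though the rule itself is correct.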

Test Session Visibility

The testing tool creates real test sessions that appear in Visual Trace and the Message Viewer. You can use all standard troubleshooting tools to analyze test message processing, making it easy to validate routing rules and transformation logic.

Exam Preparation Summary

Critical Concepts to Master:

  1. Production Configuration Page: Central hub for monitoring and component management
  2. Color Coding: Dark green=running, Light green=running but inactive, Yellow=inactivity timeout, Red=error, Purple=retrying, Gray=disabled
  3. Queue Tab: View pending messages for any component
  4. Event Log: System events, errors, warnings filtered by component
  5. Testing Service: Must set DocType AND Source for complete testing
  6. Message Viewer: Search and analyze processed messages
  7. Visual Trace: Session-based view of complete message flow

Common Exam Scenarios:

  • Identifying component status from color coding
  • Using Queue tab to diagnose message backlogs
  • Navigating from Event Log to related messages
  • Configuring Testing Service for rule validation
  • Understanding when to use Message Viewer vs Visual Trace
  • Troubleshooting using component-filtered views
  • Identifying connection failures from component status

Hands-On Practice Recommendations:

  • Navigate Production Configuration Page for various components
  • Monitor component queues during message processing
  • Use Event Log to investigate errors and warnings
  • Test routing rules with Testing Service (set DocType and Source)
  • Search messages using Message Viewer filters
  • Trace complete message sessions in Visual Trace
  • Correlate Event Log entries with specific message sessions