1. Production Event Log
Key Points
- Access path: Management Portal > Interoperability > View > Event Log
- Event types: Error, Warning, Info, Alert, Assert, Trace — in order of severity
- Quick search: Search by text string, component name, or time range
- Advanced filtering: Complex queries combining multiple criteria
- Event details: Expand entries to see full error text, stack traces, and context
Detailed Notes
Overview
The Production Event Log is the primary tool for identifying application-level issues in a UCR production. Every business host (service, process, operation) can write entries to the Event Log when noteworthy events occur during processing. Errors, warnings, alerts, and informational messages are all captured here, making it the first place to look when troubleshooting any production issue.
The Event Log is distinct from the system-level log (cconsole.log); it captures events generated by the production's business hosts rather than system-level events from the IRIS for Health platform itself.
Accessing the Event Log
Navigate to the Event Log from the Management Portal:
- Select the appropriate namespace for the UCR component you are investigating
- Go to Interoperability > View > Event Log
- The page displays a filterable list of events with the most recent first
Event Types
Events are categorized by type, from most to least severe (a code sketch after this list shows how a business host writes them):
- Error: Processing failures that require investigation. The business host encountered a condition it could not handle
- Warning: Potential issues that did not cause processing failure but may indicate a developing problem
- Info: Informational messages about normal processing milestones (component started, configuration loaded)
- Alert: Events generated by the AlertOnError mechanism, indicating a component has raised an alert condition
- Assert: Debug-level assertions typically used during development
- Trace: Detailed diagnostic output for low-level debugging (only generated when tracing is enabled)
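Where custom business host code needs to write its own entries, the standard interoperability logging macros map directly to these event types. Below is a minimal sketch, assuming a hypothetical custom operation class; the class name and message text are illustrative only.

```objectscript
/// Hypothetical business operation showing how a host writes Event Log entries.
Class Demo.UCR.Sample.Operation Extends Ens.BusinessOperation
{

Method OnMessage(pRequest As Ens.Request, Output pResponse As Ens.Response) As %Status
{
    // Info: normal processing milestone
    $$$LOGINFO("Received request "_pRequest.%Id())

    // Trace: low-level diagnostics, only recorded when tracing is enabled
    $$$TRACE("Preparing delivery to downstream system")

    // Warning: suspicious but non-fatal condition
    If pRequest.%Id()="" { $$$LOGWARNING("Request was not persisted before delivery") }

    // Alert: routed to the production's alert handling (see the Alerts section)
    // $$$LOGALERT("Manual attention required")

    // Error: a failure that needs investigation; returning an error status from
    // OnMessage is also recorded in the Event Log and can trigger AlertOnError, e.g.
    // Quit $$$ERROR($$$GeneralError,"Downstream system unavailable")

    Set pResponse = ##class(Ens.Response).%New()
    Quit $$$OK
}

}
```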
Quick Search
The Event Log provides a quick search bar for rapid filtering:
- Enter a text string to search across all event text
- Filter by specific component name to see only events from one business host
- Set a time range to focus on events during a specific period
- Combine text search with component and time filters for targeted results
Advanced Filtering
For more complex queries, the Advanced Filter provides the following (a direct SQL alternative is sketched after this list):
- Multiple criteria combined with AND/OR logic
- Filter by event type (show only Errors, for example)
- Filter by specific error codes or message patterns
- Save frequently used filters for reuse
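The data behind these filters can also be queried directly with SQL, which is useful for scripted checks. A minimal sketch, assuming the standard Event Log table Ens_Util.Log with ConfigName, Type, Text, and TimeLogged columns (verify the schema on your version); the component name is hypothetical.

```objectscript
// Run in the production's namespace (e.g., from a terminal session or a utility method).
Set sql = "SELECT TOP 50 TimeLogged, Type, Text FROM Ens_Util.Log "_
          "WHERE ConfigName = ? AND TimeLogged >= ? ORDER BY TimeLogged DESC"
Set stmt = ##class(%SQL.Statement).%New()
Set sc = stmt.%Prepare(sql)
If $System.Status.IsError(sc) { Do $System.Status.DisplayError(sc) }
// "MyEdge.HL7.FileService" is a hypothetical business host name
Set rs = stmt.%Execute("MyEdge.HL7.FileService", "2024-06-01 00:00:00")
While rs.%Next() {
    Write rs.TimeLogged, "  ", rs.Type, "  ", rs.Text, !
}
```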
Viewing Event Details
Each Event Log entry can be expanded to show:
- Full error text: The complete error message, which may be truncated in the list view
- Stack trace: For errors, the call stack showing where in the code the error occurred
- Component context: The business host name, configuration name, and production
- Timestamps: Exact time the event occurred
- Related message: Link to the message being processed when the event occurred (if applicable)
Best Practices
- Check the Event Log regularly, not just when issues are reported
- Set up alert operations to proactively notify administrators of errors
- Review Warning events periodically, as they often indicate issues that will become Errors
- Use the Event Log in conjunction with Visual Trace for complete diagnosis
---
2. Production Alerts
Key Points
- AlertOnError: Business host setting that triggers an alert when errors occur
- AlertGracePeriod: Minimum seconds between repeated alerts from the same component
- AlertRetryGracePeriod: Minimum seconds between alerts for retry-related failures
- Alert operations: Business operations that route alerts to email, pager, or monitoring systems
- AuditAlertOperations: Specialized alerts for audit-related events
Detailed Notes
Overview
Production alerts provide proactive notification of issues in UCR productions. Rather than relying on administrators to periodically check the Event Log, alerts push notifications when problems occur. Proper alert configuration is essential for maintaining the health of a UCR federation, as issues in one component can cascade and affect the entire data flow.
Alerts are generated by business hosts when configured to do so, and are routed through alert operations to their final destination (email, monitoring system, etc.).
AlertOnError Setting
The AlertOnError setting is configured on individual business hosts:
- When enabled, the business host generates an alert message whenever it logs an error to the Event Log
- The alert message contains the error details, component name, and timestamp
- Alert messages are sent to configured alert operations for routing and delivery
- This setting should be enabled on critical components in the production
AlertGracePeriod
The AlertGracePeriod prevents alert flooding:
- Specifies the minimum number of seconds that must pass between consecutive alerts from the same component
- If a component generates repeated errors within the grace period, only the first alert is sent
- Subsequent errors are still logged in the Event Log but do not generate additional alerts
- Setting an appropriate grace period prevents overwhelming administrators with duplicate alerts
- Default value should be tuned based on the component's expected error frequency
AlertRetryGracePeriod
The AlertRetryGracePeriod specifically controls alerts during retry scenarios (a production-definition sketch follows this list):
- When a business operation is retrying a failed request (e.g., network timeout to the Hub), each retry failure would normally generate an alert
- AlertRetryGracePeriod suppresses repeated alerts during the retry cycle
- This prevents alert storms during transient connectivity issues
- The period should be set longer than the expected retry cycle duration
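In the production class definition these values appear as Setting elements on each item, alongside attributes such as PoolSize and Enabled. A sketch of how that typically looks; the production, item, and class names are hypothetical.

```objectscript
Class Demo.UCR.Production Extends Ens.Production
{

XData ProductionDefinition
{
<Production Name="Demo.UCR.Production">
  <Item Name="ToHub.PatientRegistry" ClassName="Demo.UCR.HubOperation" PoolSize="1" Enabled="true">
    <!-- Raise an alert whenever this host logs an error -->
    <Setting Target="Host" Name="AlertOnError">1</Setting>
    <!-- Wait at least 120 seconds before repeating an alert from this host -->
    <Setting Target="Host" Name="AlertGracePeriod">120</Setting>
    <!-- Suppress repeat alerts for 600 seconds while a retry cycle is in progress -->
    <Setting Target="Host" Name="AlertRetryGracePeriod">600</Setting>
  </Item>
</Production>
}

}
```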
Alert Operations
Alert operations are business operations that deliver alert notifications (an example item definition follows this list):
- Email alerts: Send alert details via email to administrators or distribution lists
- Monitoring system integration: Forward alerts to enterprise monitoring platforms
- Custom alert routing: Route alerts based on severity, component, or error type
- Multiple alert operations can be configured for different notification channels
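Alert messages raised by AlertOnError are delivered to the production item conventionally named Ens.Alert, so adding an alert operation is a configuration step. A hypothetical fragment of the same ProductionDefinition XData, assuming the email alert operation class from the standard library (verify the class name on your version); SMTP and recipient details are omitted.

```xml
<!-- The item name Ens.Alert is the conventional target for alert messages -->
<Item Name="Ens.Alert" ClassName="EnsLib.EMail.AlertOperation" PoolSize="1" Enabled="true">
  <!-- SMTP server, sender, and recipient values are configured here as further Setting elements -->
</Item>
```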
AuditAlertOperations
AuditAlertOperations are specialized alert operations for audit-related events:
- Monitor audit event processing failures
- Alert when the Audit Edge is unreachable or audit events are not being recorded
- Critical for regulatory compliance, as missed audit events can be a compliance violation
Configuration Best Practices
- Enable AlertOnError on all critical business hosts in the production
- Set AlertGracePeriod to avoid alert fatigue (60-300 seconds is typical)
- Configure at least one email-based alert operation for immediate notification
- Test alert configuration regularly to ensure alerts are being delivered
- Document the alert escalation path for different types of alerts
---
3. Production Monitor
Key Points
- Component status: Running (green), Stopped (red), Error (red with indicator), Disabled (gray)
- Queue depths: Number of messages waiting in each component's queue
- Error counts: Number of errors logged by each component since last restart
- Last activity: Timestamp of the most recent message processed by each component
- Bottleneck detection: High queue counts indicate a component is falling behind
Detailed Notes
Overview
The Production Monitor provides a real-time dashboard view of the health and status of all business hosts in a UCR production. It displays each component's operational status, queue depth, error count, and last activity timestamp at a glance. This makes it the ideal tool for quickly assessing the overall health of a production and identifying components that need attention.
The Production Monitor is accessed from the Management Portal under Interoperability > Monitor.
Component Status Indicators
Each business host displays a status indicator:
- Running (green): The component is active and processing messages normally
- Stopped (red): The component has been stopped and is not processing messages
- Error (red with error indicator): The component has encountered an error condition
- Disabled (gray): The component is disabled in the production configuration and will not start
Queue Depths
Queue depth is one of the most important metrics in the Production Monitor (a programmatic check is sketched after this list):
- Each business process and operation has an input queue where messages wait to be processed
- The queue depth shows how many messages are currently waiting
- Normal: Queue depth stays low and fluctuates as messages are processed
- Warning sign: Queue depth is consistently growing, indicating the component cannot keep up with incoming messages
- Stuck queue: Queue depth grows but the component shows no recent activity, indicating the component may be hung
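Queue depth can also be checked programmatically for scripted health checks. A sketch that assumes Ens.Queue exposes a GetCount() class method for a named queue (treat this as an assumption and verify against your version's class reference); the item name is hypothetical.

```objectscript
// A business host's queue is normally named after its config item name
Set tItem = "ToEdge.SDA.Process"
Set tDepth = ##class(Ens.Queue).GetCount(tItem)   // assumed API; confirm in the class reference
Write "Queue depth for ", tItem, ": ", tDepth, !
If tDepth > 200 { Write "Queue is unusually deep - check the Production Monitor", ! }
```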
Error Counts
The error count shows the number of errors logged by each component:
- Counts are typically reset when the production or component is restarted
- A high error count indicates a persistent or recurring problem
- Compare error counts across components to identify which components are most affected
- Click on the error count to navigate to the Event Log filtered for that component
Last Activity Timestamp
The last activity timestamp shows when the component last processed a message:
- A recent timestamp indicates the component is actively processing
- A stale timestamp (hours or days old) may indicate the component is idle or stuck
- Compare with expected activity levels; a normally busy component with a stale timestamp warrants investigation
Identifying Bottlenecks
The Production Monitor helps identify processing bottlenecks:
- Growing queue + recent activity: Component is processing but not fast enough for the workload. Consider increasing PoolSize
- Growing queue + stale activity: Component may be stuck or waiting for an external resource. Check for hung connections or deadlocks
- Multiple components with growing queues: May indicate a downstream component failure causing backpressure
- Zero queue + frequent errors: Component is processing quickly but failing repeatedly
Practical Troubleshooting with Production Monitor
1. Open the Production Monitor for the affected UCR component's namespace
2. Scan for red status indicators (stopped or errored components)
3. Check queue depths for any components with unusually high counts
4. Note error counts and click through to the Event Log for details
5. Verify last activity timestamps match expected activity levels
6. For growing queues, check the downstream components for issues
---
4. Business Rule Log
Key Points
- Access path: Management Portal > Interoperability > View > Business Rule Log
- Rule execution: Shows which business rules were evaluated and their results
- Condition matching: Displays which conditions in each rule matched or did not match
- Action taken: Records the action performed (route, transform, discard, etc.)
- Troubleshooting routing: Explains why a message was or was not sent to a specific target
Detailed Notes
Overview
The Business Rule Log records the execution of business rules within UCR productions. Business rules control routing, transformation selection, and processing decisions. When a message is not routed as expected or a transformation is not applied, the Business Rule Log shows exactly which rules were evaluated, which conditions matched, and what actions were taken.
This log is particularly useful for diagnosing issues with custom routing logic and understanding why messages take specific paths through the production.
Accessing the Business Rule Log
Navigate to the Business Rule Log from the Management Portal:
- Select the appropriate namespace
- Go to Interoperability > View > Business Rule Log
- The log shows a chronological list of rule executions
What the Business Rule Log Shows
Each entry in the log records:
- Rule name: The business rule that was evaluated
- Session and message: The session and message that triggered the rule evaluation
- Conditions evaluated: Each condition in the rule and whether it matched (true/false)
- Action taken: The action executed as a result of the rule evaluation (send to target, transform, discard, etc.)
- Return value: The overall result of the rule execution
Understanding Rule Evaluation
Business rules in UCR productions follow an if-then-else pattern (a minimal rule definition is sketched after this list):
- Rules contain one or more conditions based on message content, source, or other criteria
- When a condition matches, the associated action is executed
- If no conditions match, a default action may be executed
- The rule log shows the complete evaluation path, making it clear why a particular action was taken
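That if-then-else structure lives in the rule class itself, which is what Business Rule Log entries refer back to. A minimal sketch using the standard rule-definition export format, with hypothetical rule, transform, and target names (the context class shown is the stock HL7 routing engine; adjust for your message types).

```objectscript
Class Demo.UCR.RoutingRule Extends Ens.Rule.Definition
{

XData RuleDefinition [ XMLNamespace = "http://www.intersystems.com/rule" ]
{
<ruleDefinition alias="" context="EnsLib.HL7.MsgRouter.RoutingEngine">
<ruleSet name="" effectiveBegin="" effectiveEnd="">
  <rule name="RouteADT" disabled="false">
    <!-- Condition: the Rule Log records whether this evaluated true or false -->
    <when condition='Document.Name="ADT_A01"'>
      <!-- Action: send to a target, optionally through a DTL transform -->
      <send transform="Demo.UCR.ADTToSDA" target="ToEdge.SDA.Process"/>
      <return/>
    </when>
    <!-- If no condition matches, the rule returns without sending and the log shows no action -->
  </rule>
</ruleSet>
</ruleDefinition>
}

}
```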
Common Troubleshooting Scenarios
- Message not routed to expected target: Check the Business Rule Log to see which conditions were evaluated and why the expected routing condition did not match
- Message routed to wrong target: The log shows which condition matched and triggered the incorrect routing
- Transformation not applied: If a rule should select a specific DTL transformation but did not, the log shows which condition failed
- Message unexpectedly discarded: A rule may have a discard action for certain conditions; the log confirms this
Relationship to Other Troubleshooting Tools
- Use the Message Viewer to find the specific message, then check the Business Rule Log for routing decisions about that message
- Use the Visual Trace to see the overall message flow, then the Business Rule Log for detailed routing explanations
- Use the Event Log for processing errors, and the Business Rule Log for logic/routing questions
---
5. Message Viewer with Filtering
Key Points
- Basic filters: Source, Target, Status, Time Range
- Header filters: SessionId, MessageId, CorrelationId for precise searches
- Advanced criteria: Combine multiple filters with AND logic for complex searches
- Content viewing: Examine message headers and body content from search results
- Trace access: Jump directly to Visual Trace from any message in the results
Detailed Notes
Overview
The Message Viewer is the central tool for finding and examining messages in a UCR production. It provides flexible filtering capabilities to narrow down potentially millions of messages to the specific ones relevant to a troubleshooting investigation. Effective use of the Message Viewer's filtering capabilities is essential for efficient troubleshooting.
The Message Viewer is accessed from the Management Portal under Interoperability > View > Messages.
Basic Filtering
The Message Viewer provides these basic filters on the main page:
- Source: Filter by the business host that sent the message
- Target: Filter by the business host that received the message
- Status: Filter by message status (Completed, Error, Suspended, Queued, Discarded)
- Time Range: Specify start and end times to limit results to a specific period
- Sort order: Results can be sorted by time (newest first or oldest first)
Header-Based Filtering
For precise searches, filter by message header fields:
- SessionId: Find all messages belonging to a specific session (all messages from one inbound event)
- MessageId: Find one specific message by its unique identifier
- CorrelationId: Find the response to a specific request message
- Combining header filters: Use SessionId with Status to find all errors in a session, for example
Advanced Filtering
The Extended Criteria section provides additional filtering power (an SQL sketch follows this list):
- Combine multiple criteria for targeted searches
- Filter by message body content (search within the payload)
- Filter by specific business host class names
- Use date ranges with precision to narrow results
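Each portal filter corresponds to a column on the message header table, so the same searches can be scripted. A sketch assuming the standard Ens.MessageHeader columns (SessionId, SourceConfigName, TargetConfigName, Status, TimeCreated); the SessionId value is hypothetical.

```objectscript
// Find every message in one session, oldest first - the programmatic
// equivalent of filtering the Message Viewer by SessionId.
Set sql = "SELECT ID, TimeCreated, SourceConfigName, TargetConfigName, Status "_
          "FROM Ens.MessageHeader WHERE SessionId = ? ORDER BY TimeCreated"
Set stmt = ##class(%SQL.Statement).%New()
Set sc = stmt.%Prepare(sql)
If $System.Status.IsError(sc) { Do $System.Status.DisplayError(sc) }
Set rs = stmt.%Execute(123456)   // hypothetical SessionId
While rs.%Next() {
    Write rs.ID, "  ", rs.SourceConfigName, " -> ", rs.TargetConfigName, "  ", rs.Status, !
}
```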
Working with Search Results
Once messages are found:
- View headers: Click on a message to see all header fields (Source, Target, SessionId, MessageId, Status, timestamps)
- View body: Examine the message body content in the appropriate format (SDA, SOAP, HL7, etc.)
- Visual Trace: Click the trace icon to see the message flow diagram
- Session view: Click the SessionId to see all messages in the same session
- Resend: For suspended messages, access the Resend Editor directly from the results
Filtering Strategy for Common Scenarios
| Scenario | Recommended Filter |
|---|---|
| Find errors in the last hour | Status = Error, Time Range = last 1 hour |
| Find all messages for a patient | Source or body content containing patient MRN |
| Trace one inbound event | SessionId of the inbound message |
| Find failed Hub communications | Target = Hub operation, Status = Error |
| Check queue backlog | Status = Queued, Target = specific component |
---
6. Suspended Message Management
Key Points
- Finding suspended messages: Filter Message Viewer by Status = Suspended
- Common causes: Retry count exceeded, manual suspension, business rule suspension
- Resend Editor: Tool for reviewing and resubmitting suspended messages
- Resend options: Send to original target, redirect to different target, edit before resending
- Fix first: Always resolve the underlying issue before resubmitting
Detailed Notes
Overview
Suspended messages are messages whose processing has been halted and which await manual intervention. They represent a critical operational concern because they indicate data that has not been fully processed. In a UCR context, suspended messages may mean patient data that was not stored, not forwarded to the Hub, or not delivered to a requesting Access Gateway.
Understanding how to find, evaluate, and manage suspended messages is essential for maintaining data completeness in a UCR federation.
Why Messages Get Suspended
Messages can be suspended for several reasons:
- Retry exhaustion: A business operation attempted to deliver a message but failed repeatedly, exceeding its configured retry count. After the last retry fails, the message is suspended rather than discarded
- Manual suspension: An administrator manually suspended a message for review
- Business rule action: A business rule explicitly suspended the message based on its content or routing conditions
- Queue issues: In rare cases, system issues may cause messages to be suspended
Finding Suspended Messages
To find suspended messages:
- Open the Message Viewer for the relevant namespace
- Filter by Status = Suspended
- Optionally narrow by time range, source, or target component
- Review the results to understand the scope of the issue (one message or many)
Using the Resend Editor
The Resend Editor provides options for handling suspended messages:
- View message content: Examine the full message headers and body before deciding on action
- Send to original target: Resubmit the message to the component that originally failed to process it. Use this when the underlying issue has been fixed
- Send to new target: Redirect the message to a different component. Use this when the original target is permanently unavailable or when rerouting is needed
- Edit before resending: Modify the message content before resubmitting. Use this when the message itself contains errors that caused the suspension
- Discard: Mark the message as discarded if it should not be reprocessed
Best Practices for Suspended Message Management
1. Investigate first: Before resubmitting, determine why the message was suspended
2. Fix the root cause: Resolve the underlying issue (connectivity, configuration, data format) before resubmitting
3. Test with one message: If multiple messages are suspended, resubmit one first and verify it processes successfully before resubmitting the rest
4. Monitor after resubmission: Watch the Event Log and Production Monitor after resubmitting to confirm successful processing
5. Document the resolution: Record what caused the suspension and how it was resolved for future reference
Impact of Unresolved Suspended Messages
Leaving suspended messages unresolved can cause:
- Missing patient data in the Clinical Viewer
- Incomplete patient records in the ECR
- Failed MPI registrations at the Hub
- Audit gaps if audit messages are suspended
- Growing database size as suspended messages accumulate
---
7. Queue Manager
Key Points
- Queue inspection: View messages currently in any component's queue
- Queue diagnostics: Identify stuck messages or growing queues
- PoolSize: Number of concurrent processing threads for a component; increase to improve throughput
- QueueCountAlert: Threshold that triggers an alert when queue depth exceeds the configured value
- Stuck queue indicators: Growing depth with no activity, or single message that never processes
Detailed Notes
Overview
The Queue Manager provides visibility into the internal message queues of business processes and operations. Every business process and operation has an input queue where messages wait to be processed. The Queue Manager allows administrators to inspect queue contents, diagnose processing delays, and identify stuck or problematic messages.
Queue-related issues are among the most common operational problems in UCR productions, as they directly affect data processing throughput and timeliness.
Accessing the Queue Manager
The Queue Manager is accessed from the Management Portal:
- Navigate to Interoperability > Monitor > Queue Manager (or view queues from the Production Monitor)
- Select the component whose queue you want to inspect
- View the messages currently in the queue with their details
Diagnosing Queue Issues
Common queue-related issues and their indicators:
- Growing queue depth: The component is receiving messages faster than it can process them. The queue grows steadily over time
- Stuck queue: The queue depth remains constant and the component shows no processing activity. A single message at the head of the queue may be causing the blockage
- Queue spikes: Sudden increases in queue depth, typically caused by a burst of inbound messages or a temporary processing slowdown
- Empty queue with errors: Messages are being processed immediately but failing, indicating a processing issue rather than a queue issue
PoolSize Configuration
PoolSize controls how many messages a component can process concurrently:
- Each business process and operation has a configurable PoolSize (default is typically 1)
- Increasing PoolSize allows the component to process multiple messages in parallel
- Higher PoolSize can resolve growing queue issues caused by high message volume
- However, some components must maintain message order (PoolSize = 1) to avoid data consistency issues
- Increasing PoolSize too high can consume excessive system resources
QueueCountAlert Setting
QueueCountAlert provides proactive notification of queue buildup (a configuration fragment follows this list):
- Configured on individual business hosts
- When the queue depth exceeds the configured threshold, an alert is generated
- Helps identify processing delays before they become critical
- Set the threshold based on expected queue depth during normal operations
- Combine with AlertOnError for comprehensive monitoring
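Both PoolSize and QueueCountAlert live in the production definition, next to the alert settings shown earlier. A hypothetical item fragment:

```xml
<!-- Fragment of a ProductionDefinition XData block; names and values are hypothetical -->
<Item Name="ToEdge.SDA.Process" ClassName="Demo.UCR.SDAProcess" PoolSize="4" Enabled="true">
  <!-- Raise an alert if more than 500 messages are waiting in this item's queue -->
  <Setting Target="Host" Name="QueueCountAlert">500</Setting>
</Item>
```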
Queue Manager Actions
From the Queue Manager, administrators can:
- View all messages in a specific queue
- Examine individual message headers and content
- Identify the message at the head of the queue (the next to be processed)
- Correlate queue issues with Event Log errors for the same component
---
8. Message and Event Log Purging
Key Points
- Purpose: Remove old messages and events to manage database size and performance
- Purge tasks: Scheduled tasks that automatically delete messages older than a specified retention period
- Message purging: Removes message headers, bodies, and search table entries
- Event Log purging: Removes old Event Log entries
- Retention balance: Longer retention enables better troubleshooting; shorter retention saves storage
Detailed Notes
Overview
UCR productions generate large volumes of messages and Event Log entries over time. Without regular purging, the database grows continuously, potentially impacting performance and consuming excessive storage. Purge tasks are scheduled operations that delete old messages and events based on configurable retention policies.
However, purging must be carefully configured because once messages and events are purged, they are no longer available for troubleshooting or auditing. Finding the right balance between retention and storage management is an important operational decision.
Message Purging
Message purging removes old messages from the production database:
- What is purged: Message headers, message bodies, search table entries, and associated metadata
- Retention period: Configured as a number of days; messages older than this period are eligible for purging
- Purge scheduling: Typically scheduled to run daily during off-peak hours
- Selective purging: Can be configured to purge only certain message types or statuses
Event Log Purging
Event Log purging removes old Event Log entries:
- What is purged: Event Log entries including error text, stack traces, and component information
- Retention period: Separately configurable from message purging; may use a different retention period
- Considerations: Event Log entries are typically smaller than messages but can accumulate rapidly in high-volume productions
Configuring Purge Tasks
Purge tasks are configured through the Management Portal (a programmatic sketch follows this list):
- Navigate to the Task Manager to schedule purge tasks
- Configure retention period (number of days to keep)
- Set the schedule (daily, weekly) and time of execution
- Configure which message types and statuses to include in purging
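The same purge the Task Manager schedules can be run programmatically, which is handy for testing retention settings in a non-production namespace. A sketch assuming the standard purge task class Ens.Util.Tasks.Purge and its usual properties (verify the names against your version).

```objectscript
// Run in the namespace whose messages should be purged.
Set task = ##class(Ens.Util.Tasks.Purge).%New()
Set task.TypesToPurge = "messages"     // e.g. "events" for the Event Log, "all" for everything
Set task.NumberOfDaysToKeep = 30       // retention window in days
Set task.BodiesToo = 1                 // delete message bodies as well as headers
Set task.KeepIntegrity = 1             // skip messages whose sessions are still active
Set sc = task.OnTask()                 // the method the Task Manager would invoke
If $System.Status.IsError(sc) { Do $System.Status.DisplayError(sc) }
```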
Retention Period Considerations
Choosing the right retention period involves trade-offs:
- Longer retention (30-90+ days):
  - Enables investigation of historical issues
  - Supports compliance and audit requirements
  - Requires more database storage and may impact performance
- Shorter retention (7-14 days):
  - Reduces storage requirements and improves database performance
  - Limits historical troubleshooting capability
  - May not meet regulatory retention requirements
Impact on Troubleshooting
Purging directly affects troubleshooting capability:
- Purged messages cannot be viewed, traced, or resubmitted
- Purged Event Log entries cannot be reviewed for historical error patterns
- If an issue is reported after the relevant data has been purged, diagnosis becomes much harder
- Consider archiving purged data to external storage for long-term retention if needed
Best Practices
- Set retention periods based on your organization's operational and regulatory requirements
- Schedule purge tasks during off-peak hours to minimize performance impact
- Monitor database size trends to verify purging is keeping pace with growth
- Do not purge suspended messages until they have been reviewed and resolved
- Coordinate purge schedules across all UCR components for consistent behavior
---
9. I/O Log
Key Points
- ArchiveIO setting: Enable on business hosts to capture raw I/O data
- Raw data capture: Records the exact bytes sent and received by a component
- External communication: Most useful for diagnosing issues with connections to external systems
- Performance impact: Enabling I/O logging adds overhead; use only for targeted debugging
- Visual Trace comparison: I/O Log shows raw data; Visual Trace shows message-level flow
Detailed Notes
Overview
The I/O Log captures the raw data exchanged between UCR business hosts and external systems at the lowest communication level. While Visual Trace shows message-level flow through the production, the I/O Log shows the actual bytes sent and received over network connections. This is invaluable when diagnosing communication problems where the issue is at the transport or protocol level rather than the message content level.
The I/O Log is enabled per-component using the ArchiveIO setting and should be used selectively due to its performance and storage impact.
Enabling the I/O Log
To enable I/O logging (a configuration fragment follows this list):
- Open the Production Configuration page
- Select the business host you want to monitor
- Set the ArchiveIO setting to true (or 1)
- The component will begin capturing raw I/O data for all communications
- Remember to disable ArchiveIO after the debugging session to avoid performance impact
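In the production definition, ArchiveIO is just another host setting, so it can also be toggled by editing the item directly. A hypothetical fragment:

```xml
<!-- Fragment of a ProductionDefinition XData block; names are hypothetical -->
<Item Name="ToHub.XCAGateway" ClassName="Demo.UCR.XCAOperation" PoolSize="1" Enabled="true">
  <!-- Capture the raw bytes sent and received by this host; disable again after debugging -->
  <Setting Target="Host" Name="ArchiveIO">1</Setting>
</Item>
```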
What the I/O Log Captures
The I/O Log records:
- The raw data sent by the business host to an external system (outbound request)
- The raw data received from the external system (inbound response)
- Timestamps for each I/O operation
- The connection endpoint (host, port)
When to Use the I/O Log
The I/O Log is most useful in these scenarios:
- Connection failures: When a business operation cannot connect to an external system, the I/O Log shows what was attempted
- Protocol issues: When the raw data exchanged does not conform to the expected protocol (e.g., malformed SOAP, incorrect HL7 framing)
- Encoding problems: When character encoding issues corrupt data in transit
- External system debugging: When the external system claims it did not receive the expected data
- SSL/TLS handshake failures: The I/O Log can capture the point where the TLS handshake fails
I/O Log vs Visual Trace
| Aspect | I/O Log | Visual Trace |
|---|---|---|
| Level of detail | Raw bytes | Message-level |
| Scope | Single component's external I/O | Entire message flow through production |
| When to use | Transport/protocol issues | Message routing/processing issues |
| Performance impact | Significant | Minimal |
| Storage impact | High (raw data volume) | Low (message metadata) |
Best Practices
- Only enable ArchiveIO on the specific component being debugged
- Disable ArchiveIO as soon as the debugging session is complete
- Use I/O Log as a last resort after Visual Trace and Event Log have been checked
- Be aware of storage implications; raw I/O data can be voluminous
- Coordinate with the external system team when debugging bilateral communication issues
---
10. System Index Dashboard
Key Points
- Federation view: Monitors production status across all UCR components in the federation
- Component health: Shows status of Hub, Edge Gateways, Access Gateways at a glance
- Event Log integration: View Event Log entries across the federation from a single dashboard
- Reconciliation Service: Checks data consistency between components
- Centralized monitoring: Single point for assessing overall federation health
Detailed Notes
Overview
The System Index Dashboard provides a federation-wide view of the health and status of all UCR components. While the Production Monitor shows the status of a single production, the System Index aggregates status information from the Hub, all Edge Gateways, all Access Gateways, and any other components in the federation. This makes it the starting point for assessing the overall health of a UCR deployment.
The System Index Dashboard is typically accessed from the Hub's Management Portal, as the Hub has visibility into all registered components.
Accessing the System Index
The System Index is accessed from the Management Portal:
- Navigate to the Hub namespace
- Go to HealthShare > System Index (or similar navigation path depending on version)
- The dashboard displays all registered UCR components and their current status
What the Dashboard Shows
The System Index Dashboard displays:
- Component list: All registered UCR components (Hub, Edge Gateways, Access Gateways, Bus, ODS)
- Production status: Whether each component's production is running, stopped, or in error
- Connectivity status: Whether the Hub can communicate with each component
- Last check-in time: When each component last reported its status
- Error indicators: Visual indicators for components experiencing issues
Event Log Integration
The System Index provides access to Event Log data across the federation:
- View recent events from all components in a single consolidated view
- Filter events by component, severity, or time range
- Quickly identify which components are generating errors
- Drill down into specific components for detailed Event Log viewing
Reconciliation Service
The Reconciliation Service is a tool within the System Index for checking data consistency:
- Compares patient identities across the MPI and Edge Gateways
- Identifies discrepancies in document registrations
- Checks that data registered at the Hub matches data stored at Edge Gateways
- Helps identify synchronization issues between components
Using System Index for Troubleshooting
The System Index is valuable for:
- Initial assessment: When an issue is reported, the System Index shows which components are healthy and which are not
- Connectivity verification: Confirm that all components can communicate with each other
- Federation-wide issues: Identify problems affecting multiple components simultaneously
- Proactive monitoring: Regular review of the System Index can catch issues before they impact users
---
11. Event Log vs System Log
Key Points
- Production Event Log: Application-level events from business hosts (errors, warnings, info)
- cconsole.log: System-level events from the IRIS for Health platform (license, database, startup)
- When to check Event Log: Production processing errors, message failures, component issues
- When to check cconsole.log: License errors, database issues, system crashes, startup failures
- Location: cconsole.log is a file on the server; Event Log is in the Management Portal
Detailed Notes
Overview
UCR troubleshooting requires understanding the distinction between two different logging systems: the Production Event Log (application-level) and the system console log, cconsole.log (system-level). Each captures different types of events, and knowing which log to check for a given type of issue is essential for efficient diagnosis.
A common mistake is to search the Production Event Log for system-level issues (license errors, database problems) that would only appear in cconsole.log, or vice versa.
Production Event Log
The Production Event Log captures application-level events:
- Source: Generated by business hosts (services, processes, operations) during message processing
- Access: Management Portal > Interoperability > View > Event Log
- Types of events:
  - Message processing errors (transformation failures, connection timeouts, data validation errors)
  - Component lifecycle events (started, stopped, configuration changes)
  - Alert events (triggered by AlertOnError settings)
  - Business rule evaluation results
  - Custom logging from custom code
- Scope: Limited to the current namespace's production
cconsole.log (System Console Log)
The cconsole.log captures system-level events:
- Source: Generated by the IRIS for Health platform itself, not by production business hosts
- Access: A text file on the server's file system (not in the Management Portal)
- Location: Typically in the installation's mgr directory (e.g., `<install-dir>/mgr/cconsole.log`)
- Types of events:
  - License errors: License key expired, license limit reached, license key invalid
  - Database errors: Database corruption, journal issues, storage failures
  - System startup/shutdown: Instance start, stop, and restart events
  - Memory issues: Out-of-memory conditions, large memory allocations
  - Network issues: TCP listener failures, port conflicts
  - Security events: Failed login attempts, access control violations
  - Process errors: System process crashes, abnormal terminations
  - Mirror/failover events: Mirror connection status, failover events
When to Check Which Log
| Symptom | Check First | Why |
|---|---|---|
| Message processing error | Event Log | Business host logged the error |
| Component not starting | Both | Could be application or system issue |
| Production won't start | cconsole.log | May be a license or system issue |
| Intermittent connection failures | Event Log then cconsole.log | Could be application timeout or network issue |
| System unresponsive | cconsole.log | Likely a system-level resource issue |
| Login failures | cconsole.log | Security events logged at system level |
| Database errors | cconsole.log | Database issues are system-level |
| Missing patient data | Event Log | Processing issue in production |
| License warnings | cconsole.log | License events are system-level |
Practical Approach
For most UCR troubleshooting scenarios, start with the Production Event Log:
1. If the Event Log shows clear application-level errors, investigate and resolve
2. If the Event Log is silent but the system is misbehaving, check cconsole.log
3. If the system is unresponsive or the Management Portal is inaccessible, go directly to cconsole.log on the server
4. For license-related issues, always check cconsole.log
5. After system restarts, check cconsole.log to understand why the restart occurred
Accessing cconsole.log
Since cconsole.log is a file on the server:
- Access requires server-level login (SSH, console, or terminal)
- Use text tools (less, tail, grep) to search the file
- `tail -f cconsole.log` can be used to monitor the log in real time
- The file can grow large; use date-based searching to find relevant entries
- Log rotation may archive older entries into separate files
---
Exam Preparation Summary
Critical Concepts to Master:
- Event Log: First troubleshooting stop; shows application-level errors, warnings, and info from business hosts
- Alerts: AlertOnError enables proactive notification; AlertGracePeriod prevents alert flooding
- Production Monitor: Quick assessment of component status, queue depths, error counts, and activity
- Business Rule Log: Explains routing decisions — which rules fired, which conditions matched, which actions taken
- Message Viewer filtering: Filter by Source, Target, Status, Time Range, SessionId, MessageId for targeted searches
- Suspended messages: Use Resend Editor to review and resubmit; always fix root cause first
- Queue Manager: Diagnose growing queues by checking PoolSize, component status, and downstream components
- Purging: Balance retention period against storage; purged data cannot be recovered for troubleshooting
- I/O Log: Enable ArchiveIO for raw communication data; use for transport-level issues, not message-level
- System Index: Federation-wide monitoring dashboard; Reconciliation Service checks data consistency
- Event Log vs cconsole.log: Event Log for production processing issues; cconsole.log for system-level issues (license, database, startup)
Common Exam Scenarios:
- Choosing the correct troubleshooting tool for a given symptom (Event Log for errors, Visual Trace for message flow, Production Monitor for status)
- Identifying why messages are accumulating in a queue (component stopped, PoolSize too low, downstream failure)
- Determining the appropriate action for suspended messages (fix root cause, then use Resend Editor)
- Configuring alerts to avoid alert fatigue while ensuring critical errors still trigger notifications
- Deciding whether to check the Event Log or cconsole.log for a given type of issue
- Understanding the impact of purging on troubleshooting capability
- Using the System Index to assess federation-wide health before drilling into individual components
- Deciding when to enable I/O logging versus using Visual Trace
- Interpreting Business Rule Log entries to understand routing decisions
- Setting appropriate QueueCountAlert thresholds based on normal queue behavior
Hands-On Practice Recommendations:
- Navigate to the Event Log and practice searching with different filter combinations (time range, severity, component)
- Open the Production Monitor and identify the status, queue depth, and error count for each component
- Find a suspended message in the Message Viewer and practice using the Resend Editor
- Review the Business Rule Log for a production with business rules and trace a routing decision
- Enable ArchiveIO on a test component and examine the I/O Log output
- Access the System Index Dashboard and review the federation-wide status
- Locate cconsole.log on the server and practice searching for specific event types
- Configure AlertOnError and AlertGracePeriod on a test component and verify alert generation
- Practice configuring and running a message purge task with a short retention period in a test environment
- Use the Queue Manager to inspect messages in a component's queue during normal processing