1. Production Event Log
Key Points
- Access path: Management Portal > Interoperability > View > Event Log
- Event types: Error, Warning, Info, Alert, Assert, Trace — in order of severity
- Quick search: Search by text string, component name, or time range
- Advanced filtering: Complex queries combining multiple criteria
- Event details: Expand entries to see full error text, stack traces, and context
Detailed Notes
Overview
The Production Event Log is the primary tool for identifying application-level issues in a UCR production. Every business host (service, process, operation) can write entries to the Event Log when noteworthy events occur during processing. Errors, warnings, alerts, and informational messages are all captured here, making it the first place to look when troubleshooting any production issue.
The Event Log is distinct from the system-level log (cconsole.log); it captures events generated by the production's business hosts rather than system-level events from the IRIS for Health platform itself.
Accessing the Event Log
Navigate to the Event Log from the Management Portal:
- Select the appropriate namespace for the UCR component you are investigating
- Go to Interoperability > View > Event Log
- The page displays a filterable list of events with the most recent first
Event Types
Events are categorized by type, from most to least severe (a code sketch after this list shows how a business host writes them):
- Error: Processing failures that require investigation. The business host encountered a condition it could not handle
- Warning: Potential issues that did not cause processing failure but may indicate a developing problem
- Info: Informational messages about normal processing milestones (component started, configuration loaded)
- Alert: Events generated by the AlertOnError mechanism, indicating a component has raised an alert condition
- Assert: Debug-level assertions typically used during development
- Trace: Detailed diagnostic output for low-level debugging (only generated when tracing is enabled)
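Where custom business host code needs to write its own entries, the standard interoperability logging macros map directly to these event types. Below is a minimal sketch, assuming a hypothetical custom operation class; the class name and message text are illustrative only.

```objectscript
/// Hypothetical business operation showing how a host writes Event Log entries.
Class Demo.UCR.Sample.Operation Extends Ens.BusinessOperation
{

Method OnMessage(pRequest As Ens.Request, Output pResponse As Ens.Response) As %Status
{
    // Info: normal processing milestone
    $$$LOGINFO("Received request "_pRequest.%Id())

    // Trace: low-level diagnostics, only recorded when tracing is enabled
    $$$TRACE("Preparing delivery to downstream system")

    // Warning: suspicious but non-fatal condition
    If pRequest.%Id()="" { $$$LOGWARNING("Request was not persisted before delivery") }

    // Alert: routed to the production's alert handling (see the Alerts section)
    // $$$LOGALERT("Manual attention required")

    // Error: a failure that needs investigation; returning an error status from
    // OnMessage is also recorded in the Event Log and can trigger AlertOnError, e.g.
    // Quit $$$ERROR($$$GeneralError,"Downstream system unavailable")

    Set pResponse = ##class(Ens.Response).%New()
    Quit $$$OK
}

}
```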
Quick Search
The Event Log provides a quick search bar for rapid filtering:
- Enter a text string to search across all event text
- Filter by specific component name to see only events from one business host
- Set a time range to focus on events during a specific period
- Combine text search with component and time filters for targeted results
Advanced Filtering
For more complex queries, the Advanced Filter provides the following (a direct SQL alternative is sketched after this list):
- Multiple criteria combined with AND/OR logic
- Filter by event type (show only Errors, for example)
- Filter by specific error codes or message patterns
- Save frequently used filters for reuse
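The data behind these filters can also be queried directly with SQL, which is useful for scripted checks. A minimal sketch, assuming the standard Event Log table Ens_Util.Log with ConfigName, Type, Text, and TimeLogged columns (verify the schema on your version); the component name is hypothetical.

```objectscript
// Run in the production's namespace (e.g., from a terminal session or a utility method).
Set sql = "SELECT TOP 50 TimeLogged, Type, Text FROM Ens_Util.Log "_
          "WHERE ConfigName = ? AND TimeLogged >= ? ORDER BY TimeLogged DESC"
Set stmt = ##class(%SQL.Statement).%New()
Set sc = stmt.%Prepare(sql)
If $System.Status.IsError(sc) { Do $System.Status.DisplayError(sc) }
// "MyEdge.HL7.FileService" is a hypothetical business host name
Set rs = stmt.%Execute("MyEdge.HL7.FileService", "2024-06-01 00:00:00")
While rs.%Next() {
    Write rs.TimeLogged, "  ", rs.Type, "  ", rs.Text, !
}
```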
Viewing Event Details
Each Event Log entry can be expanded to show:
- Full error text: The complete error message, which may be truncated in the list view
- Stack trace: For errors, the call stack showing where in the code the error occurred
- Component context: The business host name, configuration name, and production
- Timestamps: Exact time the event occurred
- Related message: Link to the message being processed when the event occurred (if applicable)
Best Practices
- Check the Event Log regularly, not just when issues are reported
- Set up alert operations to proactively notify administrators of errors
- Review Warning events periodically, as they often indicate issues that will become Errors
- Use the Event Log in conjunction with Visual Trace for complete diagnosis
---
2. Production Alerts
Key Points
- AlertOnError: Business host setting that triggers an alert when errors occur
- AlertGracePeriod: Minimum seconds between repeated alerts from the same component
- AlertRetryGracePeriod: Minimum seconds between alerts for retry-related failures
- Alert operations: Business operations that route alerts to email, pager, or monitoring systems
- AuditAlertOperations: Specialized alerts for audit-related events
Detailed Notes
Overview
Production alerts provide proactive notification of issues in UCR productions. Rather than relying on administrators to periodically check the Event Log, alerts push notifications when problems occur. Proper alert configuration is essential for maintaining the health of a UCR federation, as issues in one component can cascade and affect the entire data flow.
Alerts are generated by business hosts when configured to do so, and are routed through alert operations to their final destination (email, monitoring system, etc.).
AlertOnError Setting
The AlertOnError setting is configured on individual business hosts:
- When enabled, the business host generates an alert message whenever it logs an error to the Event Log
- The alert message contains the error details, component name, and timestamp
- Alert messages are sent to configured alert operations for routing and delivery
- This setting should be enabled on critical components in the production
AlertGracePeriod
The AlertGracePeriod prevents alert flooding:
- Specifies the minimum number of seconds that must pass between consecutive alerts from the same component
- If a component generates repeated errors within the grace period, only the first alert is sent
- Subsequent errors are still logged in the Event Log but do not generate additional alerts
- Setting an appropriate grace period prevents overwhelming administrators with duplicate alerts
- Default value should be tuned based on the component's expected error frequency
AlertRetryGracePeriod
The AlertRetryGracePeriod specifically controls alerts during retry scenarios (a production-definition sketch follows this list):
- When a business operation is retrying a failed request (e.g., network timeout to the Hub), each retry failure would normally generate an alert
- AlertRetryGracePeriod suppresses repeated alerts during the retry cycle
- This prevents alert storms during transient connectivity issues
- The period should be set longer than the expected retry cycle duration
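In the production class definition these values appear as Setting elements on each item, alongside attributes such as PoolSize and Enabled. A sketch of how that typically looks; the production, item, and class names are hypothetical.

```objectscript
Class Demo.UCR.Production Extends Ens.Production
{

XData ProductionDefinition
{
<Production Name="Demo.UCR.Production">
  <Item Name="ToHub.PatientRegistry" ClassName="Demo.UCR.HubOperation" PoolSize="1" Enabled="true">
    <!-- Raise an alert whenever this host logs an error -->
    <Setting Target="Host" Name="AlertOnError">1</Setting>
    <!-- Wait at least 120 seconds before repeating an alert from this host -->
    <Setting Target="Host" Name="AlertGracePeriod">120</Setting>
    <!-- Suppress repeat alerts for 600 seconds while a retry cycle is in progress -->
    <Setting Target="Host" Name="AlertRetryGracePeriod">600</Setting>
  </Item>
</Production>
}

}
```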
Alert Operations
Alert operations are business operations that deliver alert notifications (an example item definition follows this list):
- Email alerts: Send alert details via email to administrators or distribution lists
- Monitoring system integration: Forward alerts to enterprise monitoring platforms
- Custom alert routing: Route alerts based on severity, component, or error type
- Multiple alert operations can be configured for different notification channels
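Alert messages raised by AlertOnError are delivered to the production item conventionally named Ens.Alert, so adding an alert operation is a configuration step. A hypothetical fragment of the same ProductionDefinition XData, assuming the email alert operation class from the standard library (verify the class name on your version); SMTP and recipient details are omitted.

```xml
<!-- The item name Ens.Alert is the conventional target for alert messages -->
<Item Name="Ens.Alert" ClassName="EnsLib.EMail.AlertOperation" PoolSize="1" Enabled="true">
  <!-- SMTP server, sender, and recipient values are configured here as further Setting elements -->
</Item>
```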
AuditAlertOperations
AuditAlertOperations are specialized alert operations for audit-related events:
- Monitor audit event processing failures
- Alert when the Audit Edge is unreachable or audit events are not being recorded
- Critical for regulatory compliance, as missed audit events can be a compliance violation
Configuration Best Practices
- Enable AlertOnError on all critical business hosts in the production
- Set AlertGracePeriod to avoid alert fatigue (60-300 seconds is typical)
- Configure at least one email-based alert operation for immediate notification
- Test alert configuration regularly to ensure alerts are being delivered
- Document the alert escalation path for different types of alerts
---
3. Production Monitor
Key Points
- Component status: Running (green), Stopped (red), Error (red with indicator), Disabled (gray)
- Queue depths: Number of messages waiting in each component's queue
- Error counts: Number of errors logged by each component since last restart
- Last activity: Timestamp of the most recent message processed by each component
- Bottleneck detection: High queue counts indicate a component is falling behind
Detailed Notes
Overview
The Production Monitor provides a real-time dashboard view of the health and status of all business hosts in a UCR production. It displays each component's operational status, queue depth, error count, and last activity timestamp at a glance. This makes it the ideal tool for quickly assessing the overall health of a production and identifying components that need attention.
The Production Monitor is accessed from the Management Portal under Interoperability > Monitor.
Component Status Indicators
Each business host displays a status indicator:
- Running (green): The component is active and processing messages normally
- Stopped (red): The component has been stopped and is not processing messages
- Error (red with error indicator): The component has encountered an error condition
- Disabled (gray): The component is disabled in the production configuration and will not start
Queue Depths
Queue depth is one of the most important metrics in the Production Monitor (a programmatic check is sketched after this list):
- Each business process and operation has an input queue where messages wait to be processed
- The queue depth shows how many messages are currently waiting
- Normal: Queue depth stays low and fluctuates as messages are processed
- Warning sign: Queue depth is consistently growing, indicating the component cannot keep up with incoming messages
- Stuck queue: Queue depth grows but the component shows no recent activity, indicating the component may be hung
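Queue depth can also be checked programmatically for scripted health checks. A sketch that assumes Ens.Queue exposes a GetCount() class method for a named queue (treat this as an assumption and verify against your version's class reference); the item name is hypothetical.

```objectscript
// A business host's queue is normally named after its config item name
Set tItem = "ToEdge.SDA.Process"
Set tDepth = ##class(Ens.Queue).GetCount(tItem)   // assumed API; confirm in the class reference
Write "Queue depth for ", tItem, ": ", tDepth, !
If tDepth > 200 { Write "Queue is unusually deep - check the Production Monitor", ! }
```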
Error Counts
The error count shows the number of errors logged by each component:
- Counts are typically reset when the production or component is restarted
- A high error count indicates a persistent or recurring problem
- Compare error counts across components to identify which components are most affected
- Click on the error count to navigate to the Event Log filtered for that component
Last Activity Timestamp
The last activity timestamp shows when the component last processed a message:
- A recent timestamp indicates the component is actively processing
- A stale timestamp (hours or days old) may indicate the component is idle or stuck
- Compare with expected activity levels; a normally busy component with a stale timestamp warrants investigation
Identifying Bottlenecks
The Production Monitor helps identify processing bottlenecks:
- Growing queue + recent activity: Component is processing but not fast enough for the workload. Consider increasing PoolSize
- Growing queue + stale activity: Component may be stuck or waiting for an external resource. Check for hung connections or deadlocks
- Multiple components with growing queues: May indicate a downstream component failure causing backpressure
- Zero queue + frequent errors: Component is processing quickly but failing repeatedly
Practical Troubleshooting with Production Monitor
1. Open the Production Monitor for the affected UCR component's namespace
2. Scan for red status indicators (stopped or errored components)
3. Check queue depths for any components with unusually high counts
4. Note error counts and click through to the Event Log for details
5. Verify last activity timestamps match expected activity levels
6. For growing queues, check the downstream components for issues
---
4. Business Rule Log
Key Points
- Access path: Management Portal > Interoperability > View > Business Rule Log
- Rule execution: Shows which business rules were evaluated and their results
- Condition matching: Displays which conditions in each rule matched or did not match
- Action taken: Records the action performed (route, transform, discard, etc.)
- Troubleshooting routing: Explains why a message was or was not sent to a specific target
Detailed Notes
Overview
The Business Rule Log records the execution of business rules within UCR productions. Business rules control routing, transformation selection, and processing decisions. When a message is not routed as expected or a transformation is not applied, the Business Rule Log shows exactly which rules were evaluated, which conditions matched, and what actions were taken.
This log is particularly useful for diagnosing issues with custom routing logic and understanding why messages take specific paths through the production.
Accessing the Business Rule Log
Navigate to the Business Rule Log from the Management Portal:
- Select the appropriate namespace
- Go to Interoperability > View > Business Rule Log
- The log shows a chronological list of rule executions
What the Business Rule Log Shows
Each entry in the log records:
- Rule name: The business rule that was evaluated
- Session and message: The session and message that triggered the rule evaluation
- Conditions evaluated: Each condition in the rule and whether it matched (true/false)
- Action taken: The action executed as a result of the rule evaluation (send to target, transform, discard, etc.)
- Return value: The overall result of the rule execution
Understanding Rule Evaluation
Business rules in UCR productions follow an if-then-else pattern (a minimal rule definition is sketched after this list):
- Rules contain one or more conditions based on message content, source, or other criteria
- When a condition matches, the associated action is executed
- If no conditions match, a default action may be executed
- The rule log shows the complete evaluation path, making it clear why a particular action was taken
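That if-then-else structure lives in the rule class itself, which is what Business Rule Log entries refer back to. A minimal sketch using the standard rule-definition export format, with hypothetical rule, transform, and target names (the context class shown is the stock HL7 routing engine; adjust for your message types).

```objectscript
Class Demo.UCR.RoutingRule Extends Ens.Rule.Definition
{

XData RuleDefinition [ XMLNamespace = "http://www.intersystems.com/rule" ]
{
<ruleDefinition alias="" context="EnsLib.HL7.MsgRouter.RoutingEngine">
<ruleSet name="" effectiveBegin="" effectiveEnd="">
  <rule name="RouteADT" disabled="false">
    <!-- Condition: the Rule Log records whether this evaluated true or false -->
    <when condition='Document.Name="ADT_A01"'>
      <!-- Action: send to a target, optionally through a DTL transform -->
      <send transform="Demo.UCR.ADTToSDA" target="ToEdge.SDA.Process"/>
      <return/>
    </when>
    <!-- If no condition matches, the rule returns without sending and the log shows no action -->
  </rule>
</ruleSet>
</ruleDefinition>
}

}
```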
Common Troubleshooting Scenarios
- Message not routed to expected target: Check the Business Rule Log to see which conditions were evaluated and why the expected routing condition did not match
- Message routed to wrong target: The log shows which condition matched and triggered the incorrect routing
- Transformation not applied: If a rule should select a specific DTL transformation but did not, the log shows which condition failed
- Message unexpectedly discarded: A rule may have a discard action for certain conditions; the log confirms this
Relationship to Other Troubleshooting Tools
- Use the Message Viewer to find the specific message, then check the Business Rule Log for routing decisions about that message
- Use the Visual Trace to see the overall message flow, then the Business Rule Log for detailed routing explanations
- Use the Event Log for processing errors, and the Business Rule Log for logic/routing questions
---
5. Message Viewer with Filtering
Key Points
- Basic filters: Source, Target, Status, Time Range
- Header filters: SessionId, MessageId, CorrelationId for precise searches
- Advanced criteria: Combine multiple filters with AND logic for complex searches
- Content viewing: Examine message headers and body content from search results
- Trace access: Jump directly to Visual Trace from any message in the results
Detailed Notes
Overview
The Message Viewer is the central tool for finding and examining messages in a UCR production. It provides flexible filtering capabilities to narrow down potentially millions of messages to the specific ones relevant to a troubleshooting investigation. Effective use of the Message Viewer's filtering capabilities is essential for efficient troubleshooting.
The Message Viewer is accessed from the Management Portal under Interoperability > View > Messages.
Basic Filtering
The Message Viewer provides these basic filters on the main page:
- Source: Filter by the business host that sent the message
- Target: Filter by the business host that received the message
- Status: Filter by message status (Completed, Error, Suspended, Queued, Discarded)
- Time Range: Specify start and end times to limit results to a specific period
- Sort order: Results can be sorted by time (newest first or oldest first)
Header-Based Filtering
For precise searches, filter by message header fields:
- SessionId: Find all messages belonging to a specific session (all messages from one inbound event)
- MessageId: Find one specific message by its unique identifier
- CorrelationId: Find the response to a specific request message
- Combining header filters: Use SessionId with Status to find all errors in a session, for example
Advanced Filtering
The Extended Criteria section provides additional filtering power (an SQL sketch follows this list):
- Combine multiple criteria for targeted searches
- Filter by message body content (search within the payload)
- Filter by specific business host class names
- Use date ranges with precision to narrow results
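Each portal filter corresponds to a column on the message header table, so the same searches can be scripted. A sketch assuming the standard Ens.MessageHeader columns (SessionId, SourceConfigName, TargetConfigName, Status, TimeCreated); the SessionId value is hypothetical.

```objectscript
// Find every message in one session, oldest first - the programmatic
// equivalent of filtering the Message Viewer by SessionId.
Set sql = "SELECT ID, TimeCreated, SourceConfigName, TargetConfigName, Status "_
          "FROM Ens.MessageHeader WHERE SessionId = ? ORDER BY TimeCreated"
Set stmt = ##class(%SQL.Statement).%New()
Set sc = stmt.%Prepare(sql)
If $System.Status.IsError(sc) { Do $System.Status.DisplayError(sc) }
Set rs = stmt.%Execute(123456)   // hypothetical SessionId
While rs.%Next() {
    Write rs.ID, "  ", rs.SourceConfigName, " -> ", rs.TargetConfigName, "  ", rs.Status, !
}
```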
Working with Search Results
Once messages are found:
- View headers: Click on a message to see all header fields (Source, Target, SessionId, MessageId, Status, timestamps)
- View body: Examine the message body content in the appropriate format (SDA, SOAP, HL7, etc.)
- Visual Trace: Click the trace icon to see the message flow diagram
- Session view: Click the SessionId to see all messages in the same session
- Resend: For suspended messages, access the Resend Editor directly from the results
Filtering Strategy for Common Scenarios
| Scenario | Recommended Filter |
|---|---|
| Find errors in the last hour | Status = Error, Time Range = last 1 hour |
| Find all messages for a patient | Source or body content containing patient MRN |
| Trace one inbound event | SessionId of the inbound message |
| Find failed Hub communications | Target = Hub operation, Status = Error |
| Check queue backlog | Status = Queued, Target = specific component |
---
6. Suspended Message Management
Key Points
- Finding suspended messages: Filter Message Viewer by Status = Suspended
- Common causes: Retry count exceeded, manual suspension, business rule suspension
- Resend Editor: Tool for reviewing and resubmitting suspended messages
- Resend options: Send to original target, redirect to different target, edit before resending
- Fix first: Always resolve the underlying issue before resubmitting
Detailed Notes
Overview
Suspended messages are messages whose processing has been halted and which await manual intervention. They represent a critical operational concern because they indicate data that has not been fully processed. In a UCR context, suspended messages may mean patient data that was not stored, not forwarded to the Hub, or not delivered to a requesting Access Gateway.
Understanding how to find, evaluate, and manage suspended messages is essential for maintaining data completeness in a UCR federation.
Why Messages Get Suspended
Messages can be suspended for several reasons:
- Retry exhaustion: A business operation attempted to deliver a message but failed repeatedly, exceeding its configured retry count. After the last retry fails, the message is suspended rather than discarded
- Manual suspension: An administrator manually suspended a message for review
- Business rule action: A business rule explicitly suspended the message based on its content or routing conditions
- Queue issues: In rare cases, system issues may cause messages to be suspended
Finding Suspended Messages
To find suspended messages:
- Open the Message Viewer for the relevant namespace
- Filter by Status = Suspended
- Optionally narrow by time range, source, or target component
- Review the results to understand the scope of the issue (one message or many)
Using the Resend Editor
The Resend Editor provides options for handling suspended messages:
- View message content: Examine the full message headers and body before deciding on action
- Send to original target: Resubmit the message to the component that originally failed to process it. Use this when the underlying issue has been fixed
- Send to new target: Redirect the message to a different component. Use this when the original target is permanently unavailable or when rerouting is needed
- Edit before resending: Modify the message content before resubmitting. Use this when the message itself contains errors that caused the suspension
- Discard: Mark the message as discarded if it should not be reprocessed
Best Practices for Suspended Message Management
1. Investigate first: Before resubmitting, determine why the message was suspended
2. Fix the root cause: Resolve the underlying issue (connectivity, configuration, data format) before resubmitting
3. Test with one message: If multiple messages are suspended, resubmit one first and verify it processes successfully before resubmitting the rest
4. Monitor after resubmission: Watch the Event Log and Production Monitor after resubmitting to confirm successful processing
5. Document the resolution: Record what caused the suspension and how it was resolved for future reference
Impact of Unresolved Suspended Messages
Leaving suspended messages unresolved can cause:
- Missing patient data in the Clinical Viewer
- Incomplete patient records in the ECR
- Failed MPI registrations at the Hub
- Audit gaps if audit messages are suspended
- Growing database size as suspended messages accumulate
---
7. Queue Manager
Key Points
- Queue inspection: View messages currently in any component's queue
- Queue diagnostics: Identify stuck messages or growing queues
- PoolSize: Number of concurrent processing threads for a component; increase to improve throughput
- QueueCountAlert: Threshold that triggers an alert when queue depth exceeds the configured value
- Stuck queue indicators: Growing depth with no activity, or single message that never processes
Detailed Notes
Overview
The Queue Manager provides visibility into the internal message queues of business processes and operations. Every business process and operation has an input queue where messages wait to be processed. The Queue Manager allows administrators to inspect queue contents, diagnose processing delays, and identify stuck or problematic messages.
Queue-related issues are among the most common operational problems in UCR productions, as they directly affect data processing throughput and timeliness.
Accessing the Queue Manager
The Queue Manager is accessed from the Management Portal:
- Navigate to Interoperability > Monitor > Queue Manager (or view queues from the Production Monitor)
- Select the component whose queue you want to inspect
- View the messages currently in the queue with their details
Diagnosing Queue Issues
Common queue-related issues and their indicators:
- Growing queue depth: The component is receiving messages faster than it can process them. The queue grows steadily over time
- Stuck queue: The queue depth remains constant and the component shows no processing activity. A single message at the head of the queue may be causing the blockage
- Queue spikes: Sudden increases in queue depth, typically caused by a burst of inbound messages or a temporary processing slowdown
- Empty queue with errors: Messages are being processed immediately but failing, indicating a processing issue rather than a queue issue
PoolSize Configuration
PoolSize controls how many messages a component can process concurrently:
- Each business process and operation has a configurable PoolSize (default is typically 1)
- Increasing PoolSize allows the component to process multiple messages in parallel
- Higher PoolSize can resolve growing queue issues caused by high message volume
- However, some components must maintain message order (PoolSize = 1) to avoid data consistency issues
- Increasing PoolSize too high can consume excessive system resources
QueueCountAlert Setting
QueueCountAlert provides proactive notification of queue buildup (a configuration fragment follows this list):
- Configured on individual business hosts
- When the queue depth exceeds the configured threshold, an alert is generated
- Helps identify processing delays before they become critical
- Set the threshold based on expected queue depth during normal operations
- Combine with AlertOnError for comprehensive monitoring
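Both PoolSize and QueueCountAlert live in the production definition, next to the alert settings shown earlier. A hypothetical item fragment:

```xml
<!-- Fragment of a ProductionDefinition XData block; names and values are hypothetical -->
<Item Name="ToEdge.SDA.Process" ClassName="Demo.UCR.SDAProcess" PoolSize="4" Enabled="true">
  <!-- Raise an alert if more than 500 messages are waiting in this item's queue -->
  <Setting Target="Host" Name="QueueCountAlert">500</Setting>
</Item>
```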
Queue Manager Actions
From the Queue Manager, administrators can:
- View all messages in a specific queue
- Examine individual message headers and content
- Identify the message at the head of the queue (the next to be processed)
- Correlate queue issues with Event Log errors for the same component
---
8. Message and Event Log Purging
Key Points
- Purpose: Remove old messages and events to manage database size and performance
- Purge tasks: Scheduled tasks that automatically delete messages older than a specified retention period
- Message purging: Removes message headers, bodies, and search table entries
- Event Log purging: Removes old Event Log entries
- Retention balance: Longer retention enables better troubleshooting; shorter retention saves storage
Detailed Notes
Overview
UCR productions generate large volumes of messages and Event Log entries over time. Without regular purging, the database grows continuously, potentially impacting performance and consuming excessive storage. Purge tasks are scheduled operations that delete old messages and events based on configurable retention policies.
However, purging must be carefully configured because once messages and events are purged, they are no longer available for troubleshooting or auditing. Finding the right balance between retention and storage management is an important operational decision.
Message Purging
Message purging removes old messages from the production database:
- What is purged: Message headers, message bodies, search table entries, and associated metadata
- Retention period: Configured as a number of days; messages older than this period are eligible for purging
- Purge scheduling: Typically scheduled to run daily during off-peak hours
- Selective purging: Can be configured to purge only certain message types or statuses
Event Log Purging
Event Log purging removes old Event Log entries:
- What is purged: Event Log entries including error text, stack traces, and component information
- Retention period: Separately configurable from message purging; may use a different retention period
- Considerations: Event Log entries are typically smaller than messages but can accumulate rapidly in high-volume productions
Configuring Purge Tasks
Purge tasks are configured through the Management Portal (a programmatic sketch follows this list):
- Navigate to the Task Manager to schedule purge tasks
- Configure retention period (number of days to keep)
- Set the schedule (daily, weekly) and time of execution
- Configure which message types and statuses to include in purging
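The same purge the Task Manager schedules can be run programmatically, which is handy for testing retention settings in a non-production namespace. A sketch assuming the standard purge task class Ens.Util.Tasks.Purge and its usual properties (verify the names against your version).

```objectscript
// Run in the namespace whose messages should be purged.
Set task = ##class(Ens.Util.Tasks.Purge).%New()
Set task.TypesToPurge = "messages"     // e.g. "events" for the Event Log, "all" for everything
Set task.NumberOfDaysToKeep = 30       // retention window in days
Set task.BodiesToo = 1                 // delete message bodies as well as headers
Set task.KeepIntegrity = 1             // skip messages whose sessions are still active
Set sc = task.OnTask()                 // the method the Task Manager would invoke
If $System.Status.IsError(sc) { Do $System.Status.DisplayError(sc) }
```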
Retention Period Considerations
Choosing the right retention period involves trade-offs:
- Longer retention (30-90+ days):
  - Enables investigation of historical issues
  - Supports compliance and audit requirements
  - Requires more database storage and may impact performance
- Shorter retention (7-14 days):
  - Reduces storage requirements and improves database performance
  - Limits historical troubleshooting capability
  - May not meet regulatory retention requirements
Impact on Troubleshooting
Purging directly affects troubleshooting capability:
- Purged messages cannot be viewed, traced, or resubmitted
- Purged Event Log entries cannot be reviewed for historical error patterns
- If an issue is reported after the relevant data has been purged, diagnosis becomes much harder
- Consider archiving purged data to external storage for long-term retention if needed
Best Practices
- Set retention periods based on your organization's operational and regulatory requirements
- Schedule purge tasks during off-peak hours to minimize performance impact
- Monitor database size trends to verify purging is keeping pace with growth
- Do not purge suspended messages until they have been reviewed and resolved
- Coordinate purge schedules across all UCR components for consistent behavior
---
9. I/O Log
Key Points
- ArchiveIO setting: Enable on business hosts to capture raw I/O data
- Raw data capture: Records the exact bytes sent and received by a component
- External communication: Most useful for diagnosing issues with connections to external systems
- Performance impact: Enabling I/O logging adds overhead; use only for targeted debugging
- Visual Trace comparison: I/O Log shows raw data; Visual Trace shows message-level flow
Detailed Notes
Overview
The I/O Log captures the raw data exchanged between UCR business hosts and external systems at the lowest communication level. While Visual Trace shows message-level flow through the production, the I/O Log shows the actual bytes sent and received over network connections. This is invaluable when diagnosing communication problems where the issue is at the transport or protocol level rather than the message content level.
The I/O Log is enabled per-component using the ArchiveIO setting and should be used selectively due to its performance and storage impact.
Enabling the I/O Log
To enable I/O logging (a configuration fragment follows this list):
- Open the Production Configuration page
- Select the business host you want to monitor
- Set the ArchiveIO setting to true (or 1)
- The component will begin capturing raw I/O data for all communications
- Remember to disable ArchiveIO after the debugging session to avoid performance impact
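In the production definition, ArchiveIO is just another host setting, so it can also be toggled by editing the item directly. A hypothetical fragment:

```xml
<!-- Fragment of a ProductionDefinition XData block; names are hypothetical -->
<Item Name="ToHub.XCAGateway" ClassName="Demo.UCR.XCAOperation" PoolSize="1" Enabled="true">
  <!-- Capture the raw bytes sent and received by this host; disable again after debugging -->
  <Setting Target="Host" Name="ArchiveIO">1</Setting>
</Item>
```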
What the I/O Log Captures
The I/O Log records:
- The raw data sent by the business host to an external system (outbound request)
- The raw data received from the external system (inbound response)
- Timestamps for each I/O operation
- The connection endpoint (host, port)
When to Use the I/O Log
The I/O Log is most useful in these scenarios:
- Connection failures: When a business operation cannot connect to an external system, the I/O Log shows what was attempted
- Protocol issues: When the raw data exchanged does not conform to the expected protocol (e.g., malformed SOAP, incorrect HL7 framing)
- Encoding problems: When character encoding issues corrupt data in transit
- External system debugging: When the external system claims it did not receive the expected data
- SSL/TLS handshake failures: The I/O Log can capture the point where the TLS handshake fails
I/O Log vs Visual Trace
| Aspect | I/O Log | Visual Trace |
|---|---|---|
| Level of detail | Raw bytes | Message-level |
| Scope | Single component's external I/O | Entire message flow through production |
| When to use | Transport/protocol issues | Message routing/processing issues |
| Performance impact | Significant | Minimal |
| Storage impact | High (raw data volume) | Low (message metadata) |
Best Practices
- Only enable ArchiveIO on the specific component being debugged
- Disable ArchiveIO as soon as the debugging session is complete
- Use I/O Log as a last resort after Visual Trace and Event Log have been checked
- Be aware of storage implications; raw I/O data can be voluminous
- Coordinate with the external system team when debugging bilateral communication issues
---
10. System Index Dashboard
Key Points
- Federation view: Monitors production status across all UCR components in the federation
- Component health: Shows status of Hub, Edge Gateways, Access Gateways at a glance
- Event Log integration: View Event Log entries across the federation from a single dashboard
- Reconciliation Service: Checks data consistency between components
- Centralized monitoring: Single point for assessing overall federation health
Detailed Notes
Overview
The System Index Dashboard provides a federation-wide view of the health and status of all UCR components. While the Production Monitor shows the status of a single production, the System Index aggregates status information from the Hub, all Edge Gateways, all Access Gateways, and any other components in the federation. This makes it the starting point for assessing the overall health of a UCR deployment.
The System Index Dashboard is typically accessed from the Hub's Management Portal, as the Hub has visibility into all registered components.
Accessing the System Index
The System Index is accessed from the Management Portal:
- Navigate to the Hub namespace
- Go to HealthShare > System Index (or similar navigation path depending on version)
- The dashboard displays all registered UCR components and their current status
What the Dashboard Shows
The System Index Dashboard displays:
- Component list: All registered UCR components (Hub, Edge Gateways, Access Gateways, Bus, ODS)
- Production status: Whether each component's production is running, stopped, or in error
- Connectivity status: Whether the Hub can communicate with each component
- Last check-in time: When each component last reported its status
- Error indicators: Visual indicators for components experiencing issues
Event Log Integration
The System Index provides access to Event Log data across the federation:
- View recent events from all components in a single consolidated view
- Filter events by component, severity, or time range
- Quickly identify which components are generating errors
- Drill down into specific components for detailed Event Log viewing
Reconciliation Service
The Reconciliation Service is a tool within the System Index for checking data consistency:
- Compares patient identities across the MPI and Edge Gateways
- Identifies discrepancies in document registrations
- Checks that data registered at the Hub matches data stored at Edge Gateways
- Helps identify synchronization issues between components
Using System Index for Troubleshooting
The System Index is valuable for:
- Initial assessment: When an issue is reported, the System Index shows which components are healthy and which are not
- Connectivity verification: Confirm that all components can communicate with each other
- Federation-wide issues: Identify problems affecting multiple components simultaneously
- Proactive monitoring: Regular review of the System Index can catch issues before they impact users
---
11. Event Log vs System Log
Key Points
- Production Event Log: Application-level events from business hosts (errors, warnings, info)
- cconsole.log: System-level events from the IRIS for Health platform (license, database, startup)
- When to check Event Log: Production processing errors, message failures, component issues
- When to check cconsole.log: License errors, database issues, system crashes, startup failures
- Location: cconsole.log is a file on the server; Event Log is in the Management Portal
Detailed Notes
Overview
UCR troubleshooting requires understanding the distinction between two different logging systems: the Production Event Log (application-level) and the system console log, cconsole.log (system-level). Each captures different types of events, and knowing which log to check for a given type of issue is essential for efficient diagnosis.
A common mistake is to search the Production Event Log for system-level issues (license errors, database problems) that would only appear in cconsole.log, or vice versa.
Production Event Log
The Production Event Log captures application-level events:
- Source: Generated by business hosts (services, processes, operations) during message processing
- Access: Management Portal > Interoperability > View > Event Log
- Types of events:
  - Message processing errors (transformation failures, connection timeouts, data validation errors)
  - Component lifecycle events (started, stopped, configuration changes)
  - Alert events (triggered by AlertOnError settings)
  - Business rule evaluation results
  - Custom logging from custom code
- Scope: Limited to the current namespace's production
cconsole.log (System Console Log)
The cconsole.log captures system-level events:
- Source: Generated by the IRIS for Health platform itself, not by production business hosts
- Access: A text file on the server's file system (not in the Management Portal)
- Location: Typically in the installation's mgr directory (e.g., `<install-dir>/mgr/cconsole.log`)
- Types of events:
  - License errors: License key expired, license limit reached, license key invalid
  - Database errors: Database corruption, journal issues, storage failures
  - System startup/shutdown: Instance start, stop, and restart events
  - Memory issues: Out-of-memory conditions, large memory allocations
  - Network issues: TCP listener failures, port conflicts
  - Security events: Failed login attempts, access control violations
  - Process errors: System process crashes, abnormal terminations
  - Mirror/failover events: Mirror connection status, failover events
When to Check Which Log
| Symptom | Check First | Why |
|---|---|---|
| Message processing error | Event Log | Business host logged the error |
| Component not starting | Both | Could be application or system issue |
| Production won't start | cconsole.log | May be a license or system issue |
| Intermittent connection failures | Event Log then cconsole.log | Could be application timeout or network issue |
| System unresponsive | cconsole.log | Likely a system-level resource issue |
| Login failures | cconsole.log | Security events logged at system level |
| Database errors | cconsole.log | Database issues are system-level |
| Missing patient data | Event Log | Processing issue in production |
| License warnings | cconsole.log | License events are system-level |
Practical Approach
For most UCR troubleshooting scenarios, start with the Production Event Log:
1. If the Event Log shows clear application-level errors, investigate and resolve
2. If the Event Log is silent but the system is misbehaving, check cconsole.log
3. If the system is unresponsive or the Management Portal is inaccessible, go directly to cconsole.log on the server
4. For license-related issues, always check cconsole.log
5. After system restarts, check cconsole.log to understand why the restart occurred
Accessing cconsole.log
Since cconsole.log is a file on the server:
- Access requires server-level login (SSH, console, or terminal)
- Use text tools (less, tail, grep) to search the file
- `tail -f cconsole.log` can be used to monitor the log in real time
- The file can grow large; use date-based searching to find relevant entries
- Log rotation may archive older entries into separate files
---
Exam Preparation Summary
Critical Concepts to Master:
- Event Log: First troubleshooting stop; shows application-level errors, warnings, and info from business hosts
- Alerts: AlertOnError enables proactive notification; AlertGracePeriod prevents alert flooding
- Production Monitor: Quick assessment of component status, queue depths, error counts, and activity
- Business Rule Log: Explains routing decisions — which rules fired, which conditions matched, which actions taken
- Message Viewer filtering: Filter by Source, Target, Status, Time Range, SessionId, MessageId for targeted searches
- Suspended messages: Use Resend Editor to review and resubmit; always fix root cause first
- Queue Manager: Diagnose growing queues by checking PoolSize, component status, and downstream components
- Purging: Balance retention period against storage; purged data cannot be recovered for troubleshooting
- I/O Log: Enable ArchiveIO for raw communication data; use for transport-level issues, not message-level
- System Index: Federation-wide monitoring dashboard; Reconciliation Service checks data consistency
- Event Log vs cconsole.log: Event Log for production processing issues; cconsole.log for system-level issues (license, database, startup)
Common Exam Scenarios:
- Choosing the correct troubleshooting tool for a given symptom (Event Log for errors, Visual Trace for message flow, Production Monitor for status)
- Identifying why messages are accumulating in a queue (component stopped, PoolSize too low, downstream failure)
- Determining the appropriate action for suspended messages (fix root cause, then use Resend Editor)
- Configuring alerts to avoid alert fatigue while ensuring critical errors still trigger notifications
- Deciding whether to check the Event Log or cconsole.log for a given type of issue
- Understanding the impact of purging on troubleshooting capability
- Using the System Index to assess federation-wide health before drilling into individual components
- Deciding when to enable I/O logging versus using Visual Trace
- Interpreting Business Rule Log entries to understand routing decisions
- Setting appropriate QueueCountAlert thresholds based on normal queue behavior
Hands-On Practice Recommendations:
- Navigate to the Event Log and practice searching with different filter combinations (time range, severity, component)
- Open the Production Monitor and identify the status, queue depth, and error count for each component
- Find a suspended message in the Message Viewer and practice using the Resend Editor
- Review the Business Rule Log for a production with business rules and trace a routing decision
- Enable ArchiveIO on a test component and examine the I/O Log output
- Access the System Index Dashboard and review the federation-wide status
- Locate cconsole.log on the server and practice searching for specific event types
- Configure AlertOnError and AlertGracePeriod on a test component and verify alert generation
- Practice configuring and running a message purge task with a short retention period in a test environment
- Use the Queue Manager to inspect messages in a component's queue during normal processing