T4.4: Monitors and Manages System Size

Knowledge Review - InterSystems Enterprise Master Patient Index Technical Specialist

1. Database Tools for System Growth Monitoring

Key Points

  • Database Size Monitoring: Track database growth over time to project capacity needs
  • Table Growth Analysis: Identify tables experiencing rapid growth (audit log, linkage pairs)
  • Growth Rate Calculation: Calculate monthly growth rate to forecast future capacity requirements
  • Capacity Planning: Use growth trends to plan hardware upgrades and storage expansion

Detailed Notes

Overview

InterSystems EMPI databases grow continuously as patient records are added, linkages are created, and audit logs accumulate activity. Unmanaged growth can lead to performance degradation, storage exhaustion, and system instability. Effective database size monitoring tracks growth trends, identifies rapidly growing tables, and informs capacity planning decisions.

Database tools provide the metrics needed to understand current size, growth rate, and projected future needs. Proactive monitoring ensures that capacity is expanded before critical thresholds are reached.

Database Size Monitoring Tools

InterSystems IRIS provides several tools for monitoring database size:

System Management Portal:

  • Navigate to System Explorer > Databases to view size of each database
  • Displays current size, maximum size, and percentage used
  • Shows expansion history and current growth rate

SQL Queries: Use SQL to query database and table sizes:

```sql -- Database size SELECT Name, Size, MaxSize, ExpansionSize FROM %SYS_Database.GetDatabases()

-- Table sizes SELECT TABLE_NAME, DATA_PAGES, INDEX_PAGES, (DATA_PAGES + INDEX_PAGES) AS TOTAL_PAGES FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'HSPI' ORDER BY TOTAL_PAGES DESC ```

pButtons Output: pButtons diagnostic tool captures comprehensive system metrics including database sizes, table sizes, and growth history. Use pButtons for detailed analysis and troubleshooting.

Key Tables to Monitor

Rapidly Growing Tables:

  • HSPI.Audit.Log: Audit log grows with every user action and system event
  • HSPI.Data.LinkagePairs: Linkage pairs table grows with patient volume and linkage complexity
  • HSPI.Data.Patient: Patient table grows with patient volume
  • HS.Registry.Patient: Registry patient table (if collocated with EMPI)
  • Local.Linkage.Definition.Normalized: Normalized demographic table

Audit Log as Primary Growth Driver: The Audit Log typically accounts for 30-50% of total database growth in mature EMPI systems. Every manual link/unlink action, Worklist assignment, status change, and comment generates an audit log entry. Without periodic purging, the audit log can consume excessive storage.

Growth Rate Calculation

Calculate monthly growth rate to project future capacity needs:

1. Record database size at beginning of month: `Size1` 2. Record database size at end of month: `Size2` 3. Calculate absolute growth: `Growth = Size2 - Size1` 4. Calculate growth rate: `Rate = (Growth / Size1) * 100` 5. Project future size: `FutureSize = CurrentSize * (1 + Rate/100)^Months`

Example:

  • Database size on Feb 1: 100 GB
  • Database size on Mar 1: 105 GB
  • Monthly growth: 5 GB
  • Monthly growth rate: 5%
  • Projected size in 12 months: 100 * (1.05)^12 = 179.6 GB

Capacity Planning

Use growth trends to plan capacity expansion:

Storage Capacity:

  • Monitor available disk space on database storage volumes
  • Plan expansion when database utilization exceeds 70% of available capacity
  • Account for database expansion overhead (InterSystems databases grow in chunks)

Performance Impact:

  • Larger databases require more time for backups and integrity checks
  • Query performance may degrade as tables grow
  • Index maintenance becomes more resource-intensive

Mitigation Strategies:

  • Implement audit log purging to control log growth (covered in section 3)
  • Archive historical patient records no longer needed for active operations
  • Consider database partitioning for very large systems (>1 TB)
  • Plan periodic data retention reviews to identify purgeable data

---

Documentation References

2. Usage Dashboards for System Activity Monitoring

Key Points

  • InterSystems EMPI Linkages Dashboard: Tracks linkage pairs by facility, status, and conflict type
  • InterSystems EMPI Summary Dashboard: Shows record counts, action counts, and MPIID/MRN trends
  • Activity Metrics: Monitor logins, patient imports, and manual actions over time
  • Trend Analysis: Identify usage patterns and forecast resource needs

Detailed Notes

Overview

Usage Dashboards provide visibility into system activity, including patient record imports, linkage creation, user actions, and Worklist processing. These dashboards help administrators understand system utilization, identify peak usage periods, and detect anomalies that may indicate problems or opportunities for optimization.

InterSystems EMPI adds specialized dashboards to the standard HealthShare Usage Dashboards, focusing on linkage-specific metrics and EMPI operations.

Accessing Usage Dashboards

Navigate to HealthShare Usage Dashboards in the Management Portal. From the dashboard selector, choose:

  • InterSystems EMPI Linkages: Detailed linkage metrics
  • InterSystems EMPI Summary: High-level system activity summary

Each dashboard displays a set of charts that can be filtered by time period, facility, or other dimensions.

InterSystems EMPI Linkages Dashboard

This dashboard displays linkage-specific metrics:

Charts:

  • Facilities: Count of records by facility, showing which facilities contribute the most data
  • Linkage Conflicts: Breakdown of Worklist items by conflict category (Overlay, Overlap, Open-Chaining, Deterministic, etc.)
  • Linkage Rules: Distribution of linkages by linkage rule (Deterministic, Probabilistic, Manual, etc.)
  • Link Status: Count of record pairs by link status (Linked, Non-Link, Potential Link, etc.)
  • Link Reason: Breakdown of linkages by link reason (Same MRN, Name Match, DOB Match, etc.)

Use Cases:

  • Facility Contribution Analysis: Identify facilities with high record volumes vs. low contribution
  • Worklist Prioritization: See which conflict categories have the most entries requiring review
  • Linkage Rule Performance: Assess which linkage rules are creating the most links
  • Link Quality: Monitor link status distribution to ensure appropriate linking behavior

InterSystems EMPI Summary Dashboard

This dashboard provides high-level system activity metrics:

Charts:

  • Record Counts: Total patient records in the system over time
  • Action Counts: Manual actions performed by users (link, unlink, status changes)
  • Unique MRNs and MPIIDs, Cumulative: Growth of unique patient identifiers over time
  • Unique MRN and MPIID Change, by Time Period: Rate of new MRN and MPIID creation
  • Actions Performed on Records, Cumulative: Total actions over time, showing user activity trends

Use Cases:

  • Growth Monitoring: Track patient record accumulation rate
  • User Activity: Monitor manual action volume to assess Worklist workload
  • Unique Patient Count: Understand true patient count (unique MPIIDs) vs. record count
  • System Adoption: Measure system usage growth over time

Activity Metrics and Trend Analysis

Use Usage Dashboards to identify trends:

Peak Usage Periods:

  • Identify time periods with highest activity (end of month, fiscal year-end)
  • Plan maintenance windows during low-activity periods
  • Allocate staff resources based on expected workload patterns

Anomaly Detection:

  • Sudden spikes in record imports may indicate batch load or data migration
  • Drops in activity may indicate facility outages or interface failures
  • Unusual Worklist growth may signal linkage definition issues

Capacity Forecasting:

  • Project future system load based on historical activity trends
  • Use record import rate to forecast database growth
  • Plan user training and staffing based on manual action volume trends

Dashboard Export and Reporting

Export dashboard data for external analysis:

  • Use dashboard export features to save data to Excel or CSV
  • Create custom reports combining Usage Dashboard data with other metrics
  • Distribute dashboard snapshots to stakeholders during governance meetings
  • Archive dashboard data for historical trend analysis

---

Documentation References

3. Managing Audit Log Size

Key Points

  • Audit Log Growth: Major driver of database growth; every action generates audit entry
  • %HSPI_Manager Role: Required role for audit log purge operations (per sample Q18)
  • Purge Methods: Manual purge via Management Portal or automated purge task
  • Retention Policy: Define retention period based on regulatory and operational requirements

Detailed Notes

Overview

The Audit Log in InterSystems EMPI records every action performed on patient records and linkages, including manual links/unlinks, Worklist assignments, status changes, and comments. While essential for compliance and troubleshooting, the Audit Log can grow to consume significant storage if not managed. According to sample exam question Q18, purging the Audit Log requires the %HSPI_Manager role.

Managing Audit Log size requires establishing a retention policy, implementing periodic purging, and monitoring log growth to ensure compliance with storage budgets.

Audit Log Contents

The Audit Log captures:

  • Manual Actions: Link, unlink, merge MPIID, reset actions performed by users
  • Worklist Actions: Assignments, status changes, comments
  • System Actions: Automatic linkage events, batch operations
  • User Identity: Who performed each action
  • Timestamps: When each action occurred
  • Record Details: Which records or record pairs were affected
  • Comments: User-entered comments explaining actions

Every entry in the Audit Log is immutable and retained indefinitely unless purged.

Audit Log Growth Patterns

Growth Rate:

  • Audit log growth correlates with user activity volume
  • Systems with high Worklist volumes generate more audit entries
  • Manual linking/unlinking generates the most log entries
  • Typical growth: 100-500 MB per month for medium-volume systems

Impact on Database Size:

  • Audit log can represent 30-50% of total database size in mature systems
  • Unchecked growth leads to storage exhaustion
  • Large audit logs slow backup and restore operations

Retention Policy

Define an audit log retention policy based on:

Regulatory Requirements:

  • HIPAA and other regulations may require retention of audit logs for 6-7 years
  • Consult legal and compliance teams to determine minimum retention period

Operational Needs:

  • Recent audit data (30-90 days) is frequently accessed for troubleshooting
  • Older audit data is rarely accessed except for investigations

Storage Constraints:

  • Available storage capacity limits how much audit history can be retained
  • Balance retention requirements with storage budget

Recommended Policy:

  • Keep Last 90 Days: Retain 90 days of audit log for active troubleshooting
  • Archive 91 Days to 7 Years: Export older audit data to external storage for compliance
  • Purge >7 Years: Delete audit entries older than 7 years

Manual Audit Log Purge

To manually purge audit log entries:

1. Verify Role: Ensure you have the %HSPI_Manager role (per sample Q18) 2. Navigate to Audit Log: Person Index > Audit Log 3. Specify Date Range: Select start and end dates for purge (e.g., purge entries older than 90 days) 4. Execute Purge: Confirm purge operation 5. Verify Results: Check database size reduction after purge completes

Important: Purge operations are irreversible. Ensure audit data is archived before purging if retention is required.

Automated Audit Log Purge Task

For ongoing audit log management, implement an automated purge task (covered in section 4).

Audit Log Archival

Before purging, archive audit log data for compliance:

Archival Process: 1. Export audit log entries to CSV or database backup 2. Store exported data on external archive storage (tape, cloud storage) 3. Document archive location and retention period 4. Test archive restoration process to ensure recoverability 5. After successful archive, purge entries from production database

Archival Tools:

  • SQL queries to export audit log data to CSV
  • Database backup tools to backup audit tables separately
  • Custom archival scripts for automated export and purge

---

Documentation References

4. Implementing Automated Audit Log Purge Task

Key Points

  • Task Manager: Create scheduled task for automated audit log purging
  • %HSPI_Manager Role: Task must run as user with %HSPI_Manager role (per sample Q18)
  • Purge Frequency: Schedule monthly or quarterly based on growth rate
  • Retention Period: Configure task to purge entries older than defined retention period (e.g., 90 days)

Detailed Notes

Overview

Automated audit log purge tasks ensure that audit log size is controlled without requiring manual intervention. According to sample exam question Q18, purging the audit log requires the %HSPI_Manager role. The automated task must be configured to run as a user with this role.

Implementing an automated purge task is a best practice for production EMPI systems, ensuring consistent audit log management and preventing storage exhaustion.

Creating the Automated Purge Task

To create an automated audit log purge task:

1. Navigate to System Operation > Task Manager > New Task 2. In the Task Scheduler Wizard, configure:

  • Task Name: "Audit Log Purge Task"
  • Description: "Automatically purge audit log entries older than 90 days"
  • Namespace: Your EMPI namespace (e.g., HSPIDATA)
  • Task Type: Select the appropriate audit log purge task class (e.g., HSPI.Util.AuditPurge.Task or custom class)
  • Run task as this user: Select a user with %HSPI_Manager role
  • Task Priority: Select Priority Normal

3. Configure Task Parameters:

  • Retention Days: Number of days of audit log to retain (e.g., 90)
  • Batch Size: Number of records to delete per batch (e.g., 10000)

4. Create Schedule:

  • Frequency: Monthly (e.g., first Sunday of each month)
  • Time: Off-peak hours (e.g., 2:00 AM)
  • Time Zone: Local server time zone

5. Select Finish to create the task

Task Execution and Monitoring

Execution:

  • The task executes automatically according to the configured schedule
  • Purge operations may take several hours for large audit logs
  • Task runs during off-peak hours to minimize production impact

Monitoring:

  • Task Manager: View task execution history and status
  • Event Log: Check for task completion messages or errors
  • Database Size: Monitor database size after purge to verify reduction

Alerts:

  • Configure email alerts for task failures
  • Set up notifications for unexpectedly long purge operations

Purge Task Parameters

Retention Days:

  • Number of days of audit log to retain (e.g., 90, 180, 365)
  • Only entries older than this number of days are purged
  • Adjust based on retention policy requirements

Batch Size:

  • Number of audit log entries deleted per transaction
  • Larger batch sizes speed up purge but consume more resources
  • Typical values: 5000-10000 records per batch

Archive Before Purge:

  • Optional setting to export data before deletion
  • Enables compliance with retention requirements
  • Exports to specified file location or database

Task Scheduling Considerations

Frequency:

  • High Activity Systems: Monthly purge to control growth
  • Medium Activity Systems: Quarterly purge
  • Low Activity Systems: Annual purge

Timing:

  • Schedule during off-peak hours (nights, weekends)
  • Avoid scheduling during backup windows
  • Allow sufficient time for completion before business hours

Resource Impact:

  • Purge operations consume CPU and disk I/O
  • Large purges may impact concurrent operations
  • Monitor system performance during first few executions to assess impact

Verification and Validation

After implementing the automated purge task:

1. Monitor First Execution: Observe first scheduled execution to verify correct operation 2. Verify Database Reduction: Confirm that database size decreases after purge 3. Check Retention: Query audit log to verify that only entries within retention period remain 4. Review Logs: Check Event Log and Task Manager for completion messages or errors 5. Archive Verification: If archiving is enabled, verify that archived data is complete and accessible

Troubleshooting Purge Task Issues

Task Fails with Permission Error:

  • Verify that the task is running as a user with %HSPI_Manager role
  • Check that the user has database write permissions

Purge Completes but Database Size Unchanged:

  • Purge removes records but may not immediately free disk space
  • Run database compaction to reclaim freed space
  • Check that retention period is not too long (e.g., if retention is 365 days but all audit data is <365 days old, nothing is purged)

Task Runs Too Long:

  • Reduce batch size to minimize transaction size
  • Increase retention period to reduce number of records purged
  • Schedule more frequent purges with shorter retention periods

---

Documentation References

5. Mirror Monitor for Mirrored Systems

Key Points

  • Mirror Monitor Page: System Operation > Mirror Monitor (per sample Q19)
  • Mirror Status: Primary, backup, DR async member status and health
  • Synchronization Lag: Monitor replication lag between mirror members
  • Failover Readiness: Verify that backup is ready to assume primary role

Detailed Notes

Overview

According to sample exam question Q19, to check the status of a mirrored system, navigate to System Operation > Mirror Monitor in the Management Portal. Mirroring is a high-availability configuration where two or more IRIS instances maintain synchronized copies of the database, enabling automatic failover if the primary system fails.

Monitoring mirrored EMPI systems ensures that failover readiness is maintained and that replication lag is within acceptable limits.

Accessing Mirror Monitor

Navigate to System Operation > Mirror Monitor in the Management Portal (per sample Q19).

Important: The Mirror Monitor page is found under System Operation, NOT System Administration.

Mirror Monitor Page Contents

The Mirror Monitor page displays:

Mirror Members:

  • Primary: The active member currently processing transactions
  • Backup: The standby member ready to assume primary role
  • DR Async (if configured): Disaster recovery member with asynchronous replication

Status Indicators:

  • Member Status: Primary, Backup, Async, Trouble
  • Connection Status: Connected, Disconnected
  • Database Status: Mounted, Dismounted
  • Role: Which member is currently primary

Replication Metrics:

  • Journal Files: Number of journal files awaiting replication
  • Dejournaling Lag: Time lag between primary and backup/async members
  • Catch-Up Status: Whether backup is fully synchronized or catching up

Monitoring Mirror Health

Key metrics to monitor:

Connection Status:

  • All mirror members should show "Connected" status
  • "Disconnected" indicates network or system failure

Replication Lag:

  • Backup should be <5 seconds behind primary in normal operations
  • Lag >30 seconds may indicate network bandwidth issues or backup system performance problems
  • DR async lag depends on network distance and bandwidth; typically minutes to hours

Failover Readiness:

  • Backup must be fully synchronized to assume primary role without data loss
  • "Catch-Up" status indicates backup is not yet ready for failover

Journal File Accumulation:

  • Monitor journal file count on primary
  • High journal count may indicate replication backlog or disk space issues

Failover Testing

Periodically test failover to verify mirror configuration:

1. Schedule failover test during maintenance window 2. Initiate manual failover from Mirror Monitor page 3. Verify that backup assumes primary role successfully 4. Verify that applications reconnect to new primary 5. Monitor for errors or performance issues 6. Fail back to original primary when test is complete

Troubleshooting Mirror Issues

Backup Disconnected:

  • Check network connectivity between mirror members
  • Verify firewall rules allow mirror communication (default port: 2188)
  • Review Event Log for mirror errors

High Replication Lag:

  • Check network bandwidth and latency
  • Assess backup system performance (CPU, disk I/O)
  • Consider increasing mirror Queue Size or Parallel Dejournaling settings

Failover Fails:

  • Verify that backup has %HSPI_Manager and other required roles
  • Check that production configuration is synchronized
  • Review mirror configuration for asymmetries

---

Documentation References

Report an Issue