T4.2: Evaluates Data Integrity Check Utility

Knowledge Review - InterSystems Enterprise Master Patient Index Technical Specialist

1. Operating the Data Integrity Check Utility

Key Points

  • Purpose: Identify MPIID inconsistencies between EMPI tables and Registry (Hub)
  • Three-Table Check: Validates MPIID consistency across Registry, EMPI, and normalized tables
  • Weekly Schedule: InterSystems recommends weekly execution for the first month, then monthly if no new discrepancies are found
  • Terminal Execution: Run from Terminal in the EMPI namespace by calling the `MPIIDCheck` class method of `HSPI.Util.Checkup`

Detailed Notes

Overview

The data integrity check utility is a critical maintenance tool for InterSystems EMPI systems. It identifies MPIID (Master Patient Index ID) inconsistencies between the three tables that store patient identity information: the Registry table (HS.Registry.Patient), the EMPI table (HSPI.Data.Patient), and the normalized linkage table (Local.Linkage.Definition.Normalized or custom location). Inconsistencies typically arise when IDUpdateNotification messages from EMPI fail to be received or processed by the Registry, although other causes are possible.

For a patient record to pass the integrity check, it must appear in all three tables with the same, non-null MPIID value. Any deviation from this condition constitutes a discrepancy that requires investigation and repair.
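The pass condition can be sketched in Python as a predicate over the MPIID that each table holds for one patient. This is an illustrative model only, not the utility's implementation; the function name and the use of `None` to stand for "record absent or MPIID null" are assumptions of the sketch.

```python
from typing import Optional


def passes_integrity_check(
    registry_mpiid: Optional[str],
    empi_mpiid: Optional[str],
    normalized_mpiid: Optional[str],
) -> bool:
    """Return True only if all three tables hold the same non-null MPIID.

    None stands for "record absent or MPIID null" (an assumption of this sketch).
    """
    values = (registry_mpiid, empi_mpiid, normalized_mpiid)
    if any(v is None for v in values):
        return False
    return len(set(values)) == 1


print(passes_integrity_check("1001", "1001", "1001"))  # consistent record
print(passes_integrity_check("1001", "1002", "1001"))  # mismatched MPIID
print(passes_integrity_check("1001", None, "1001"))    # missing or null value
```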

The Three Tables

In a Unified Care Record deployment using InterSystems EMPI as the MPI, patient records are stored in three critical tables:

1. HS.Registry.Patient: The Registry (Hub) table that stores the authoritative patient index for the entire UCR federation
2. HSPI.Data.Patient: The EMPI table that stores patient demographic data and linkage information
3. Local.Linkage.Definition.Normalized: The normalized linkage table (or custom table location) that stores normalized demographic values used for matching

All three tables must have consistent MPIID values for each patient. Discrepancies between these tables can result in incorrect patient matching, broken Composite View display, or data retrieval failures.

Running the MPIID Check Method

To execute the MPIID data integrity check:

1. Open a Terminal window
2. Switch to your InterSystems EMPI namespace (e.g., `HSPIDATA`)
3. Call the MPIID check method:

```
set status=##class(HSPI.Util.Checkup).MPIIDCheck(.outfile,.total)
```

The method outputs its progress to the terminal and generates a CSV file containing any discrepancies found. The `outfile` variable receives the path to the output file, and the `total` variable receives the count of discrepancies.

Understanding Output

The MPIID check method produces:

  • Terminal Output: Progress messages indicating which tables are being checked and how many records have been processed
  • CSV Output File: A comma-separated values file listing all discrepancies, with columns for record identifiers, table names, MPIID values, and discrepancy types
  • Discrepancy Count: The total number of discrepancies found

The output file location is returned in the `outfile` parameter. The file is typically stored in the instance's default directory (often the manager directory).

Recommended Execution Schedule

InterSystems recommends the following schedule for running the data integrity check:

  • First Month: Weekly execution during off-peak hours (e.g., overnight on weekends)
  • After First Month: If no new discrepancies are found, reduce frequency to monthly execution
  • After Major Events: Run the check after system upgrades, production incidents, or Registry failures

The check may take several hours to complete on large systems and consumes system resources, so schedule execution during periods of low system utilization.
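The schedule above can be expressed as a small date calculation. The sketch below is a planning aid, not part of the product: `next_check_date` is a hypothetical helper, and treating "first month" and "monthly" as 30 days, measured from a go-live date, is an assumption.

```python
from datetime import date, timedelta


def next_check_date(go_live: date, last_run: date, new_discrepancies: bool) -> date:
    """Weekly during the first month or while new discrepancies appear, else monthly.

    "Month" is approximated as 30 days (an assumption of this sketch).
    """
    in_first_month = (last_run - go_live) < timedelta(days=30)
    if in_first_month or new_discrepancies:
        return last_run + timedelta(weeks=1)
    return last_run + timedelta(days=30)


# One week after go-live: still in the weekly phase.
print(next_check_date(date(2024, 1, 1), date(2024, 1, 8), new_discrepancies=False))
# Two months in with a clean result: move to the monthly cadence.
print(next_check_date(date(2024, 1, 1), date(2024, 3, 1), new_discrepancies=False))
```

Runs triggered by major events (upgrades, incidents, Registry failures) fall outside this calculation and would be scheduled manually.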

Automated Task Scheduling

To implement ongoing data integrity monitoring, create a scheduled task using `HSPI.Util.Checkup.Task`:

1. Navigate to System Operation > Task Manager > New Task
2. Set Task Type to `HSPI.Util.Checkup.Task`
3. Configure the schedule (weekly or monthly)
4. Set the namespace to your EMPI namespace
5. Select a superuser to run the task
6. Optionally configure email alerts for discrepancies

The automated task issues an alert to the production Event Log if discrepancies are found, and can be configured to send email notifications.

---

2. Assessing Errors Identified by Data Integrity Check

Key Points

  • Existence Discrepancies: Patient missing from one or two of the three tables (7 variations)
  • MPIID Mismatch: MPIID values differ across tables or are null
  • Output File Analysis: CSV file lists discrepancy type, affected tables, and MPIID values
  • Database Pointer Validation: Ensures referential integrity between tables

Detailed Notes

Overview

The data integrity check utility identifies two primary types of discrepancies: existence discrepancies (where a patient record is missing from one or more tables) and MPIID mismatch discrepancies (where MPIID values differ across tables or are null). Assessing these errors requires understanding the expected data flow, the relationship between tables, and the severity of each discrepancy type.

The output file produced by the MPIID check method provides detailed information about each discrepancy, enabling systematic analysis and repair.

Discrepancy Types

Existence Discrepancies

An existence discrepancy occurs when a patient record is missing from one or two of the three tables, which gives six combinations. These notes also list a seventh edge case, in which orphaned references point to a record that exists in none of the tables:

1. Record exists only in Registry table
2. Record exists only in EMPI table
3. Record exists only in normalized table
4. Record exists in Registry and EMPI, but not normalized table
5. Record exists in Registry and normalized, but not EMPI table
6. Record exists in EMPI and normalized, but not Registry table
7. Record does not exist in any table (edge case, detected through orphaned references)

Each variation indicates a different failure mode in the data synchronization process.
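The count of seven can be checked by enumerating the cases. The following Python sketch is only a counting aid; the table labels are shorthand for the three tables, and the function name is hypothetical.

```python
from itertools import combinations

TABLES = ("Registry", "EMPI", "Normalized")


def existence_variations():
    """List every way a record can be present in fewer than all three tables."""
    variations = []
    # Present in exactly one table, then in exactly two tables.
    for size in (1, 2):
        for present in combinations(TABLES, size):
            missing = tuple(t for t in TABLES if t not in present)
            variations.append((present, missing))
    # Edge case from the notes: referenced somewhere but present in no table.
    variations.append(((), TABLES))
    return variations


for present, missing in existence_variations():
    print("present:", present or "none", "| missing:", missing)
```

Three single-table cases, three two-table cases, and the edge case account for all seven variations.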

MPIID Mismatch Discrepancies

An MPIID mismatch discrepancy occurs when the MPIID values among the three tables are not all the same. This includes:

  • Different MPIID values in different tables
  • Null MPIID in one or more tables
  • MPIID present in some tables but missing in others

MPIID mismatches prevent correct patient identity resolution and can cause failures in downstream applications.

Output File Format

The CSV output file contains the following information for each discrepancy:

  • Discrepancy Type: Description of the discrepancy (existence or mismatch)
  • Affected Tables: Which of the three tables contain the record
  • Registry MPIID: The MPIID value in the Registry table (or null)
  • EMPI MPIID: The MPIID value in the EMPI table (or null)
  • Normalized MPIID: The MPIID value in the normalized table (or null)
  • Record Identifiers: Facility, MRN, or other identifiers to locate the patient record
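For larger output files, a short script can tally discrepancies by type before manual review. The Python sketch below assumes a header row with hypothetical column names (`DiscrepancyType`, `RegistryMPIID`, and so on) and uses made-up sample rows; the real file's headers and layout may differ and should be checked first.

```python
import csv
import io
from collections import Counter

# Hypothetical sample; the real file's header names and layout may differ.
SAMPLE = """DiscrepancyType,RegistryMPIID,EMPIMPIID,NormalizedMPIID,Facility,MRN
existence,,1001,1001,FAC_A,12345
mismatch,1002,1003,1003,FAC_B,67890
mismatch,1004,,1004,FAC_A,24680
"""


def summarize(csv_file):
    """Count discrepancies by type from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row["DiscrepancyType"] for row in reader)


counts = summarize(io.StringIO(SAMPLE))
for kind, count in counts.items():
    print(kind, count)
```

To run this against an actual output file, pass `open(path, newline="")` instead of the in-memory sample, where `path` is the value returned in `outfile`.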

Severity Assessment

Not all discrepancies have equal impact:

High Severity:

  • MPIID mismatch where different non-null MPIIDs exist across tables (indicates serious synchronization failure)
  • Missing Registry entry for active patient (prevents data retrieval in UCR applications)

Medium Severity:

  • Null MPIID in one table (may resolve automatically when next update occurs)
  • Record exists in EMPI and normalized but not Registry (indicates failed IDUpdateNotification)

Low Severity:

  • Orphaned normalized table entries with no corresponding EMPI or Registry record (stale data)
  • Historical records with expected inconsistencies (patients who moved or merged)
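The tiers above can be turned into a rough triage function. This Python sketch is one possible reading of the list, not an official rule set: the `Entry` type, the `active` flag, and the order in which the rules are applied are all assumptions, and the "historical records" case is left out because it needs site-specific knowledge.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    present: bool                 # does the record exist in this table?
    mpiid: Optional[str] = None   # None models a null MPIID


def severity(registry: Entry, empi: Entry, normalized: Entry, active: bool = True) -> str:
    """Map one discrepancy to a tier; rule precedence is an assumption."""
    entries = (registry, empi, normalized)
    ids = {e.mpiid for e in entries if e.present and e.mpiid is not None}
    if len(ids) > 1:
        return "high"    # conflicting non-null MPIIDs
    if normalized.present and not registry.present and not empi.present:
        return "low"     # orphaned normalized entry
    if not registry.present:
        # Missing Registry entry: high for an active patient, otherwise medium.
        return "high" if active else "medium"
    if any(e.present and e.mpiid is None for e in entries):
        return "medium"  # null MPIID in at least one table
    return "unclassified"


print(severity(Entry(True, "1"), Entry(True, "2"), Entry(True, "1")))
print(severity(Entry(False), Entry(False), Entry(True, "5")))
print(severity(Entry(True, "9"), Entry(True, None), Entry(True, "9")))
```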

Database Pointer Validation

The data integrity check also validates database pointers between tables:

  • Foreign Key Integrity: Ensures that references between tables point to valid records
  • Orphaned Pointers: Identifies pointers to records that no longer exist
  • Circular References: Detects invalid circular reference patterns

Pointer validation errors are typically addressed by the repair method (covered in the next section).

Common Causes

Understanding common causes helps prioritize remediation:

  • Failed IDUpdateNotification Messages: Most common cause; EMPI sends notification to Registry, but Registry fails to process it
  • Network or Service Interruptions: Temporary outages during critical update operations
  • Production Errors: Business operation failures during message processing
  • Data Migration Issues: Inconsistencies introduced during system upgrades or data migrations
  • Manual Database Modifications: Direct database changes bypassing normal message flow

---

3. Reporting Unresolved Data Integrity Issues

Key Points

  • Repair Method: After check, run repair method to fix identified inconsistencies (per sample Q16)
  • Rerun MPIID Check: After repair, rerun the MPIID check to verify corrections (per sample Q16)
  • Forward Output File: If issues persist, forward output file to InterSystems WRC (per sample Q16)
  • Contact WRC: Escalate ongoing discrepancies to InterSystems support for investigation

Detailed Notes

Overview

According to sample exam question Q16, after running the data integrity check utility and identifying problems in the output file, the recommended workflow is: (1) run the repair method, (2) rerun the MPIID check to verify that repairs resolved the issues, and (3) if problems persist, forward the output file to the InterSystems Worldwide Response Center (WRC) for further investigation.

This systematic approach ensures that routine discrepancies are automatically corrected while complex or persistent issues receive expert attention.
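The check, repair, recheck, escalate sequence can be summarized as a small decision function. In this Python sketch, `run_check` and `run_repair` are hypothetical stand-ins for however a site invokes the two methods; the sketch models only the decision logic, not the utility itself.

```python
from typing import Callable


def integrity_workflow(
    run_check: Callable[[], int],
    run_repair: Callable[[], None],
) -> str:
    """Return the next action after the check / repair / recheck sequence.

    run_check returns the discrepancy count; both callables are hypothetical
    stand-ins for the real check and repair invocations.
    """
    if run_check() == 0:
        return "no action needed"
    run_repair()
    remaining = run_check()  # rerun the check to verify the repair
    if remaining == 0:
        return "repair successful"
    return f"forward output file to WRC ({remaining} unresolved)"


# Simulated run: 12 discrepancies before the repair, 3 after it.
counts = iter([12, 3])
print(integrity_workflow(lambda: next(counts), lambda: None))
```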

Running the Repair Method

The MPIID repair method attempts to automatically fix inconsistencies identified by the check method:

1. Open a Terminal window
2. Switch to your InterSystems EMPI namespace
3. Call the repair method:

```
set status=##class(HSPI.Util.Checkup).MPIIDRepair(.repairfile)
```

The repair method:

  • Reads the output file from the most recent MPIID check
  • Attempts to synchronize MPIID values across the three tables
  • Corrects database pointer errors where possible
  • Generates a repair log file documenting all actions taken

The repair method operates conservatively, only making changes where the correct MPIID value can be determined with confidence.

Rerunning the MPIID Check

After running the repair method, you must rerun the MPIID check to verify that discrepancies have been resolved (per sample Q16):

1. Execute the MPIID check method again:

```
set status=##class(HSPI.Util.Checkup).MPIIDCheck(.outfile,.total)
```

2. Review the new output file to determine if discrepancies remain
3. Compare the total count before and after repair to assess effectiveness

If the rerun check shows zero discrepancies, the repair was successful. If discrepancies remain, further investigation is required.

Forwarding Output to InterSystems WRC

If unresolved discrepancies remain after running the repair method and rechecking (per sample Q16):

1. Locate the output file from the most recent MPIID check (path stored in `outfile` variable)
2. Open a case with the InterSystems Worldwide Response Center (WRC)
3. Attach the following to the case:

  • MPIID check output file (CSV)
  • Repair method output file
  • Production Event Log excerpts showing any related errors
  • System configuration details (production configuration, namespace settings)

4. Provide context:

  • When the discrepancies were first detected
  • Any recent system events (upgrades, migrations, outages)
  • Whether discrepancies are increasing or stable

What to Include in WRC Report

When reporting unresolved data integrity issues to InterSystems WRC:

Required Information:

  • Output Files: MPIID check output and repair method output
  • System Version: InterSystems IRIS/HealthShare version and patch level
  • Discrepancy Count: Total number of unresolved discrepancies
  • Discrepancy Types: Breakdown of existence vs. mismatch discrepancies
  • Timeline: When discrepancies first appeared and frequency of occurrence

Helpful Contextual Information:

  • Recent Changes: System upgrades, configuration changes, production modifications
  • Production Errors: Event Log entries related to IDUpdateNotification failures
  • Message Flow: Volume and throughput of patient update messages
  • Database Size: Size of the three patient tables
  • Network Issues: Any known network or connectivity problems between EMPI and Registry

Ongoing Discrepancy Monitoring

If the data integrity check task is configured (as described in section 1), ongoing discrepancies will trigger alerts:

  • Event Log Alerts: Automated task logs alerts when discrepancies are detected
  • Email Notifications: Optional email alerts can be configured using EnsLib.EMail.AlertOperation
  • Alert Content: "MPIID Check Utility has found x discrepancies. Output file is y."

Contact InterSystems support if the automated task consistently reports discrepancies over multiple weeks, indicating a systemic issue requiring root cause analysis.
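To track whether the count is rising or stable over several weeks, the alert text can be parsed into a number and a file path. The Python sketch below assumes the alert follows the wording quoted above exactly; the sample path is made up, and the real message format should be confirmed against actual Event Log entries.

```python
import re

# Pattern built from the alert wording quoted in these notes (an assumption).
ALERT_PATTERN = re.compile(
    r"MPIID Check Utility has found (?P<count>\d+) discrepancies\. "
    r"Output file is (?P<path>.+?)\.?$"
)


def parse_alert(text: str):
    """Return (count, path) from an alert message, or None if it does not match."""
    match = ALERT_PATTERN.search(text)
    if match is None:
        return None
    return int(match.group("count")), match.group("path")


# Hypothetical alert text with a made-up output path.
sample = "MPIID Check Utility has found 4 discrepancies. Output file is /tmp/mpiid_check.csv."
print(parse_alert(sample))
```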

Prevention of Future Discrepancies

After resolving existing discrepancies with WRC assistance, implement preventive measures:

  • Monitor IDUpdateNotification Messages: Review production message flow to ensure notifications are successfully delivered to Registry
  • Event Log Monitoring: Set up alerts for IDUpdateNotification failures
  • Network Stability: Ensure reliable network connectivity between EMPI and Registry
  • Production Configuration: Verify Pool Size settings and retry logic in the production
  • Regular Integrity Checks: Maintain weekly or monthly automated integrity check schedule

When NOT to Contact WRC

Some scenarios do not require WRC escalation:

  • First-time discrepancies resolved by repair: If the repair method successfully corrects all discrepancies, no escalation is needed
  • Expected historical inconsistencies: Old records from data migrations or system conversions may have explainable inconsistencies
  • Isolated incidents: A single occurrence with few discrepancies that are corrected by repair

Use judgment to determine whether discrepancies represent systemic issues requiring WRC assistance or routine maintenance items.
