T1.4: Manages Linkage Builds - InterSystems Enterprise Master Patient Index Technical Specialist Study Guide

1. Understanding Linkage and the Need for Builds

Key Points

Linkage identifies records representing the same person
Links connect records from different source systems
Linkage algorithm uses demographic matching rules
Thresholds determine automatic vs. manual review
MPIID (Master Person Index ID) unifies linked records

Detailed Notes

Linkage is the core function of InterSystems EMPI (Enterprise Master Patient Index). As patient records arrive from multiple source systems and facilities, the linkage process determines which records represent the same individual and should be connected. Understanding linkage is essential for managing linkage builds effectively.

The Linkage Concept

In a healthcare organization with multiple facilities or systems, the same patient may have different medical record numbers (MRNs) at each location. For example:

Patient John Smith has MRN 123456 at Community General Hospital
The same John Smith has MRN 789012 at Metro Medical Center
He also has MRN 555555 at Regional Clinic

Without a master patient index, these three records appear to be three different patients. The linkage process in InterSystems EMPI analyzes demographic data (name, date of birth, gender, address, etc.) to determine that these three records represent the same person. Once linked, all three records share a common MPIID (Master Person Index ID), enabling clinicians to see a complete picture of John Smith's care across all facilities.

How Linkage Works

The linkage algorithm compares pairs of patient records using configurable matching rules defined in the Linkage Definition. The matching process:

1. Normalization: Demographic data is normalized to standard formats (e.g., removing punctuation from names, standardizing addresses)
2. Link Key Matching: Records must match on required "link keys" to be considered for linkage (e.g., same last name phonetic code)
3. Weight Calculation: Matching and non-matching attributes contribute positive or negative weight
4. Threshold Comparison: The calculated weight is compared to three thresholds: Review, Autolink, and Validate
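The four steps can be sketched in Python to show how a pair weight is built up. This is a simplified illustration under stated assumptions, not the EMPI implementation: the field names, the agree/disagree weights in `FIELD_WEIGHTS`, the normalization rule, and the surname-prefix link key (a placeholder for a real phonetic code) are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical field weights: (weight added on agreement, weight added on disagreement).
# Real values come from the Linkage Definition and are not given in this guide.
FIELD_WEIGHTS = {
    "last_name": (4.0, -2.0),
    "first_name": (3.0, -1.5),
    "dob": (5.0, -4.0),
    "gender": (1.0, -3.0),
}


def normalize(value: str) -> str:
    """Step 1 (simplified): uppercase and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", value.upper())


def link_key(record: dict) -> str:
    """Step 2 stand-in: first three letters of the normalized surname.

    A real link key would typically be a phonetic code; this prefix is only a placeholder.
    """
    return normalize(record["last_name"])[:3]


def pair_weight(a: dict, b: dict) -> Optional[float]:
    """Return the summed weight for a record pair, or None if the pair is never compared."""
    if link_key(a) != link_key(b):
        return None  # fails the link key requirement, so no weight is calculated
    total = 0.0
    for field, (agree, disagree) in FIELD_WEIGHTS.items():
        # Step 3: agreement adds weight, disagreement subtracts it
        total += agree if normalize(a[field]) == normalize(b[field]) else disagree
    return total  # Step 4 compares this value with the thresholds


cgh = {"last_name": "Smith", "first_name": "John", "dob": "1970-01-02", "gender": "M"}
mmc = {"last_name": "SMITH", "first_name": "Jon", "dob": "1970-01-02", "gender": "M"}
print(pair_weight(cgh, mmc))
```

In this example the surname, date of birth, and gender agree after normalization while the first name does not, so the positive and negative contributions partly cancel.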

Linkage Thresholds

Three thresholds control linkage decisions:

Review Threshold: The minimum weight for a record pair to be considered a potential match. Pairs below this threshold are classified as non-links (different people) and are not linked. Pairs between the Review and Autolink thresholds are not linked but appear on the Worklist for manual review.

Autolink Threshold: Pairs exceeding this weight are linked automatically because they are very likely to represent the same person.

Validate Threshold: Pairs between the Autolink and Validate thresholds are linked automatically but flagged for expert review to confirm that the link is correct. Pairs above the Validate threshold are linked without review.
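The three thresholds divide the weight scale into four bands. The sketch below assumes the ordering Review ≤ Autolink ≤ Validate described above; the threshold values in the usage loop are hypothetical, and treating a weight exactly equal to a threshold as reaching it is an assumption, since the exact boundary behavior is not stated here.

```python
from enum import Enum


class Decision(Enum):
    NON_LINK = "non-link (different people)"
    WORKLIST = "not linked; placed on the Worklist"
    LINK_AND_VALIDATE = "linked; flagged for expert validation"
    LINK = "linked"


def classify(weight: float, review: float, autolink: float, validate: float) -> Decision:
    """Place a pair weight into one of the four bands defined by the three thresholds.

    Assumption: a weight exactly equal to a threshold is treated as reaching it.
    """
    if not review <= autolink <= validate:
        raise ValueError("expected review <= autolink <= validate")
    if weight < review:
        return Decision.NON_LINK
    if weight < autolink:
        return Decision.WORKLIST
    if weight < validate:
        return Decision.LINK_AND_VALIDATE
    return Decision.LINK


# Hypothetical thresholds: Review 6, Autolink 10, Validate 14.
for w in (4.0, 8.5, 12.0, 18.0):
    print(w, classify(w, review=6, autolink=10, validate=14).value)
```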

When Linkage Builds Are Necessary

Linkage builds are required in several scenarios:

Initial Data Load: After loading patient records via batch import, a linkage build must be run to identify potential duplicates and establish links between records.

Linkage Definition Changes: Any modification to the linkage definition (changing thresholds, adding/removing matching parameters, modifying weights) requires rebuilding linkage data to apply the changes to existing records.

Data Quality Corrections: After correcting data quality issues (e.g., fixing invalid dates, standardizing name formats), a linkage rebuild may improve matching accuracy.

Periodic Maintenance: Even without definition changes, periodic linkage rebuilds can be valuable in large, complex deployments to ensure linkage data remains consistent and up-to-date.

---

Documentation References

InterSystems Documentation

2. Types of Linkage Builds

Key Points

Full rebuild reprocesses all linkage data from scratch
Incremental/partial builds process only changed records
Build modes: batch vs. multi-process vs. normalized-only
Development/test environments may use different approaches than production
Build selection depends on what changed and system size

Detailed Notes

InterSystems EMPI provides different build modes to accommodate various scenarios and performance requirements. Understanding when to use each build type is critical for efficient system management.

Full Linkage Rebuild

A full linkage rebuild reprocesses all patient records in the system, recalculating linkage weights and reestablishing links based on the current linkage definition. This is the most comprehensive build type.

When to Use Full Rebuild:

After modifying the linkage definition (parameters, weights, thresholds)
After changing normalization functions that affect how data is compared
After correcting widespread data quality issues
When migrating to a new version of InterSystems EMPI
When linkage data integrity is questionable

Full Rebuild Process:

The full rebuild process includes multiple stages:

1. Purging: Existing linkage data tables are cleared (normalized data, link key indices, classified data, transitive links)
2. Normalized Database Build: Patient demographic data is normalized according to current rules
3. Warning Database Build: Data quality warnings are generated for problematic records
4. Classified Database Build: Records are classified according to link key requirements
5. Link Key Index Build: Link key indices are constructed for efficient pair comparison
6. Transitive Links Build: Link relationships are calculated based on matching weights
7. EID Synchronization: MPIIDs are synchronized so linked records share the same MPIID
8. Consistency Check: The system identifies linkage conflicts (overlaps and overlays) for the Worklist
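Because the stages always run in this fixed order, the sequence can be treated as data, for example to see which stages never finished after a build was stopped. This is a hypothetical helper, not an EMPI utility; the stage labels paraphrase the list above and are not necessarily the exact text EMPI writes to its log.

```python
# Stage labels paraphrase the eight-step sequence above; the exact text EMPI
# writes to its log may differ.
FULL_REBUILD_STAGES = [
    "Purging",
    "Normalized Database Build",
    "Warning Database Build",
    "Classified Database Build",
    "Link Key Index Build",
    "Transitive Links Build",
    "EID Synchronization",
    "Consistency Check",
]


def unfinished_stages(finished: list) -> list:
    """Stages from the sequence that never reported finishing, in build order."""
    done = set(finished)
    return [stage for stage in FULL_REBUILD_STAGES if stage not in done]


def in_expected_order(finished: list) -> bool:
    """True if the finished stages appear in the same relative order as the sequence."""
    positions = [FULL_REBUILD_STAGES.index(stage) for stage in finished]
    return positions == sorted(positions)


# Hypothetical: a build stopped after the Classified Database stage completed.
print(unfinished_stages(FULL_REBUILD_STAGES[:4]))
```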

Full Rebuild Duration:

Full rebuilds are resource-intensive and time-consuming. Approximate durations by population size:

100,000 records: 15-30 minutes
1,000,000 records: 2-6 hours
10,000,000 records: 12-24+ hours

Actual duration depends on server capacity, record complexity, and linkage definition complexity.

Partial/Incremental Build

Partial builds process only records that have changed since the last build, making them much faster than full rebuilds. However, partial builds are only available in specific scenarios.

When Partial Build Is Appropriate:

No changes to linkage definition
Only new records added since last build
No normalization function changes
No widespread data corrections

Limitations of Partial Builds:

Cannot be used after linkage definition modifications
May not detect all linkage conflicts
Not suitable for data quality remediation scenarios
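The rules for choosing between a full and a partial build can be condensed into a small decision function. This is a sketch of the guidance in this section only: the flag names are hypothetical, the normalized-only build is deliberately left out because its applicability depends on whether a normalization change affects matching, and whether a partial build is offered at all depends on your EMPI version.

```python
from dataclasses import dataclass


@dataclass
class ChangesSinceLastBuild:
    # Each flag mirrors one trigger listed in this section; the names are hypothetical.
    definition_changed: bool = False      # parameters, weights, or thresholds
    normalization_changed: bool = False
    widespread_data_fixes: bool = False
    version_upgrade: bool = False
    integrity_in_doubt: bool = False
    new_records_added: bool = False


def recommended_build(c: ChangesSinceLastBuild) -> str:
    """Return 'full', 'partial', or 'none' following the rules in this section."""
    if (c.definition_changed or c.normalization_changed or c.widespread_data_fixes
            or c.version_upgrade or c.integrity_in_doubt):
        return "full"
    if c.new_records_added:
        return "partial"  # only if your EMPI version offers a partial build
    return "none"


print(recommended_build(ChangesSinceLastBuild(new_records_added=True)))
```

Any full-rebuild trigger takes precedence, so new records combined with a definition change still call for a full rebuild.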

Batch Mode Build

Batch mode is the standard build approach that processes all records or all changed records in a single continuous operation. The build runs in the background and logs progress to the Build Log.

Batch Mode Characteristics:

Processes all linkage data in predefined sequence
Provides detailed logging of each stage
Can be monitored via the Build Log or the Show Progress dialog
Can be stopped (though this leaves linkage data incomplete)

Multi-Process Mode:

Some versions of InterSystems EMPI support multi-process builds that can leverage multiple CPU cores for faster processing. This is particularly valuable for very large datasets.

Normalized-Only Build

In some cases, you may only need to rebuild the normalized database without recalculating all linkage relationships. This is much faster than a full rebuild but only appropriate when normalization functions changed without affecting matching logic.

---

Documentation References

InterSystems Documentation

3. Executing a Linkage Build from the Settings Menu

Key Points

Build is initiated from Settings menu in Person Index
Definition Designer also provides build access
Select build mode (batch, multi-process, etc.)
Process runs in background
Progress banner appears in Management Portal

Detailed Notes

Linkage builds are executed through the Management Portal interface. The process is straightforward but must be performed by users with appropriate privileges (typically the %HSPI_Master role).

Navigating to Build Options

To start a linkage build:

1. Open the Management Portal and navigate to the Person Index menu
2. Select Settings from the menu options
3. The Settings page contains the linkage build controls

Alternative Access Point:

Builds can also be initiated from the Definition Designer:

1. Navigate to Person Index > Definition Designer
2. The Definition Designer page includes build buttons
3. This approach is common when you've just modified the linkage definition and want to build immediately

Sample question Q4 from the EMPI exam indicates that candidates are expected to know that builds are started from the Settings menu option.

Selecting Build Mode

On the Settings page, you'll see options to start different types of builds:

Build All Linkage Data: Initiates a full rebuild of all linkage data, including all stages (purge, normalized, warning, classified, link key index, transitive links, synchronization, consistency check).

Build Options Menu: Depending on the version, there may be additional options for:

Build mode selection (batch vs. multi-process)
Partial build (if conditions allow)
Specific database builds (normalized only, link key index only, etc.)

Starting the Build

After selecting the appropriate build type and options:

1. Click the Start Build or Build Linkage Data button
2. A confirmation dialog may appear warning that the build can take significant time
3. Confirm to start the build process
4. The build begins running in the background

Important Note: Users must have the %HSPI_Master role to initiate linkage builds. Additionally, to compile the linkage definition (which is often required before building), users need "USE" permissions on the %Ens_ProductionRun resource.

Build Progress Banner

Once the build starts, a banner appears in the Management Portal indicating that a linkage data build is currently running. The banner displays:

Build mode (e.g., "batch - normalized/eid")
Processing mode (e.g., "multi-process")
Linkage definition name
Log index number

The banner includes a Show Progress button that opens a detailed progress window.

Viewing Build Progress

Click the Show Progress button in the banner to open the Linkage Data Builder Output window. This window displays real-time progress information:

Current processing stage
Number of records processed in current stage
Estimated time remaining (in some versions)
Any errors or warnings encountered

The output window updates continuously as the build progresses. You can close this window without affecting the build—it continues running in the background.

Stopping a Build

The progress banner includes a Stop Build button. However, stopping a build is strongly discouraged:

Warning: Stopping the build linkage data process while it is running can cause your linkage data to be out-of-date. Use discretion when stopping a build.

If you must stop a build (e.g., due to an emergency or critical error), be aware that:

Linkage data will be incomplete and inconsistent
The system may not function correctly until a complete build finishes
You'll need to restart and complete a full build as soon as possible

Only stop builds in genuine emergencies or when you've discovered a critical error that must be corrected before proceeding.

---

Documentation References

InterSystems Documentation

4. Monitoring Build Progress and Status

Key Points

Real-time progress shown in output window
Build stages clearly identified in log
Processing time varies by system size and complexity
Build runs in background - other work can continue
Progress stored in global for persistent access

Detailed Notes

While linkage builds run in the background, several mechanisms provide visibility into build progress and status. Understanding how to monitor builds is essential for managing the process effectively.

Real-Time Progress Monitoring

The primary mechanism for monitoring build progress is the Linkage Data Builder Output window, accessed via the Show Progress button. This window displays detailed information about each stage of the build:

Build Stages Displayed:

1. Building started... - Initial message confirming build has begun
2. Purging [Database] Database started... - For each database being purged (Normalized, Warning, Classified, Link Key Index, Transitive Links)
3. Purging [Database] Database finished... - With record counts and duration
4. Building [Database] Database started... - For each database being built
5. Building [Database] Database finished... - With record counts saved and duration
6. Synchronizing EID started... - Beginning of MPIID synchronization
7. Synchronizing EID finished... - With count of records changed
8. Checking for EID consistency started... - Beginning of consistency validation
9. Checking for EID consistency finished... - With count of issues discovered
10. Building finished. Total duration is [X] seconds - Final completion message

Example Build Log Output:

```
Building All Linkage Data for Linkage Definition 'Local.Linkage.Definition'
Building mode is 'batch - normalized/eid', processing mode is 'multi-process', log index is 2
Building all linkage data (preserving all manual linkages)
2018-08-24 16:58:43.842: Building started ...
2018-08-24 16:58:43.920: Purging Normalized Database started ...
2018-08-24 16:58:44.017: Purging Normalized Database finished, 5,012 records deleted, in 0.097 seconds
2018-08-24 16:58:44.017: Building Normalized Database started ...
2018-08-24 16:58:49.389: Building Normalized Database finished, 5,012 records saved, in 5.373 seconds
...
2018-08-24 16:58:56.335: Building finished. Total duration is 12.493 seconds
```
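If you copy build output out of the progress window or Build Log, its per-stage metrics can be extracted for comparison across builds. The sketch below is a hypothetical helper that recognizes only the "Database finished" and "Building finished" line formats visible in the sample log above; other messages (such as EID synchronization and consistency results) may be worded differently and are ignored.

```python
import re

# Matches only the "finished" line format visible in the sample log above.
FINISHED = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"(?P<action>Purging|Building) (?P<db>.+?) Database finished, "
    r"(?P<count>[\d,]+) records (?P<verb>deleted|saved), in (?P<secs>[\d.]+) seconds$"
)
TOTAL = re.compile(r"Building finished\. Total duration is (?P<secs>[\d.]+) seconds")


def parse_build_log(text: str) -> dict:
    """Collect per-database record counts and durations plus the total build time."""
    stages = []
    total = None
    for line in text.splitlines():
        line = line.strip()
        m = FINISHED.match(line)
        if m:
            stages.append({
                "action": m["action"],                       # Purging or Building
                "database": m["db"],                         # e.g. Normalized
                "records": int(m["count"].replace(",", "")),
                "seconds": float(m["secs"]),
            })
            continue
        t = TOTAL.search(line)
        if t:
            total = float(t["secs"])
    return {"stages": stages, "total_seconds": total}


sample = """2018-08-24 16:58:44.017: Purging Normalized Database finished, 5,012 records deleted, in 0.097 seconds
2018-08-24 16:58:49.389: Building Normalized Database finished, 5,012 records saved, in 5.373 seconds
2018-08-24 16:58:56.335: Building finished. Total duration is 12.493 seconds"""
print(parse_build_log(sample))
```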

Understanding Processing Metrics

The build log provides important metrics for each stage:

Records Deleted: During purge stages, shows how many existing records were removed from each database.

Records Saved: During build stages, shows how many records were processed and saved.

Duration: Each stage reports completion time, helping identify bottlenecks in the build process.

Records Changed: During EID synchronization, shows how many records had their MPIID updated to match linked records.

Issues Discovered: During consistency checking, shows how many linkage conflicts (overlaps/overlays) were identified.
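Record counts and durations together give a per-stage throughput, which is useful for the capacity planning mentioned later in this guide. The sketch below uses the Normalized Database figures from the sample log above; the linear extrapolation is only a rough indication, since stages such as link key indexing and transitive links depend on linkage complexity rather than record count alone.

```python
def records_per_second(records: int, seconds: float) -> float:
    """Throughput of one stage, from the counts and durations reported in the Build Log."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return records / seconds


def linear_estimate_seconds(target_records: int, records: int, seconds: float) -> float:
    """Naive linear extrapolation of one stage's duration to a larger population."""
    return target_records / records_per_second(records, seconds)


# Figures from the sample log: Normalized Database built 5,012 records in 5.373 seconds.
rate = records_per_second(5012, 5.373)
est = linear_estimate_seconds(1_000_000, 5012, 5.373)
print(f"{rate:.0f} records/s; about {est / 60:.0f} minutes for 1,000,000 records (this stage only)")
```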

Background Processing

The linkage build process runs in the background, meaning:

You can close the progress window without stopping the build
You can navigate to other pages in the Management Portal
Other users can continue working (though system performance may be slower)
The production continues processing incoming messages

However, for large builds, system resources are heavily consumed, so it's best to schedule builds during off-peak hours when possible.

Persistent Progress Storage

Build progress is stored in a global variable:

```
^CacheTemp.Output($J,"output",lineNumber)
```

If you close the progress dialog and the process is still running, you can review the current status by examining this global. This is particularly useful if:

Your browser session is disconnected during a build
You need to check build status from Terminal
You want to script build monitoring

---

Documentation References

InterSystems Documentation

5. Using the Build Log to Review Build History

Key Points

Build Log accessible from Person Index menu
Records all past builds with timestamps and details
Search and filter by build type, date range, status
Expand entries to see full build output
Purge old entries to manage log size

Detailed Notes

The Build Log provides a permanent record of all linkage builds executed on the system. This historical data is valuable for troubleshooting, auditing, and understanding system behavior over time.

Accessing the Build Log

To access the Build Log:

1. Navigate to the Person Index menu in the Management Portal 2. Select Build Log from the menu options 3. The Build Log page displays a list of all recorded builds

The Build Log page shows a summary of recent builds by default, with the most recent builds listed first.

Build Log Entry Information

Each entry in the Build Log displays:

Timestamp: Date and time when the build started (format: YYYY-MM-DD HH:MM:SS.mmm)

Build Status: Visual indicator and text showing:

Last Completed (green) - Build finished successfully
In Progress (yellow) - Build currently running
Error (red) - Build encountered errors
Stopped (orange) - Build was manually stopped before completion

Build Description: Brief summary including:

Build type (e.g., "Building All Linkage Data")
Linkage definition name
Build mode and processing mode
Preservation of manual linkages (yes/no)

Duration: Total time the build took to complete (shown only for completed builds)

Expanding Build Log Entries

By default, build entries are shown in collapsed form. Click on a build entry to expand it and see the full build output, including:

All stages (purging, building, synchronizing, checking)
Record counts for each stage
Timing information for each stage
Any errors or warnings generated
Final completion status and total duration

This detailed view is identical to what was shown in the real-time progress window during build execution.

Searching and Filtering Build Log

The Build Log page provides a search panel to filter builds:

Build Log Types: Filter by build status:

In Progress (currently running builds)
Error (builds that failed)
Stopped (builds manually interrupted)
Last Completed (successful completions)
All Completed (all finished builds, successful or not)
All Types (show everything)

Build Start Time Range: Filter by when builds started:

From: Start date/time
To: End date/time

Build End Time Range: Filter by when builds completed:

From: Completion date/time
To: Completion date/time

Display Options:

Logs per Page: Choose how many builds to display (e.g., 20, 50, 100)
Newest First / Oldest First: Sort order
Expand Logs / Contract Logs: Show all builds expanded or collapsed

After setting filters, click Refresh Table to reload the Build Log with the specified criteria.

Purging Old Build Logs

Over time, the Build Log accumulates many entries. To manage log size and improve performance:

1. The top of the Build Log page shows the total number of entries
2. Click the Purge button to delete old entries
3. In the "Do not purge most recent n days" field, specify how many days of history to retain (default: 7)
4. Confirm the purge operation
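The "Do not purge most recent n days" setting acts as a cutoff date. This sketch shows which entries such a cutoff would remove; it assumes the window is measured back from the current time using each build's start time, which may not be exactly how the Build Log page computes it, and the dates are hypothetical.

```python
from datetime import datetime, timedelta


def purge_candidates(build_start_times, keep_days=7, now=None):
    """Return build start times older than the 'Do not purge most recent n days' window.

    Assumption: the window is measured back from the current time using each
    build's start time; the Build Log page may measure it differently.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=keep_days)
    return [t for t in build_start_times if t < cutoff]


now = datetime(2024, 3, 31, 12, 0)
starts = [datetime(2024, 2, 20, 2, 0), datetime(2024, 3, 20, 2, 0), datetime(2024, 3, 30, 2, 0)]
print(purge_candidates(starts, keep_days=7, now=now))   # default retention
print(purge_candidates(starts, keep_days=30, now=now))  # longer production-style retention
```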

Purge Recommendations:

For development/test systems: Purge frequently (retain 7-14 days)
For production systems: Retain longer history for audit purposes (30-90 days)
Before major changes: Export or document recent build logs if needed for reference

---

Documentation References

InterSystems Documentation

6. Interpreting Build Results and Messages

Key Points

Synchronizing EID messages indicate MPIID assignments
Consistency check identifies overlaps and overlays
Record counts should match expected population
Errors indicate configuration or data problems
Build duration helps capacity planning

Detailed Notes

The output from a linkage build contains important information about the state of the patient index. Learning to interpret build messages helps identify issues and validate that the build completed correctly.

Normal Build Messages

A successful build will show consistent patterns in the output:

Purge and Build Balance: The number of records deleted during purging should match (approximately) the number of records saved during building for each database. If the normalized database purge deletes 5,012 records, the normalized database build should save approximately 5,012 records (the exact count may differ slightly if records were added during the build).

Processing Time Patterns: Different stages have characteristic durations:

Purging: Very fast (seconds)
Building Normalized: Moderate (minutes for large datasets)
Building Classified: Fast (seconds to minutes)
Building Link Key Index: Moderate to slow (depends on linkage complexity)
Building Transitive Links: Can be slow for large, highly linked populations
Synchronizing EID: Fast to moderate
Checking Consistency: Fast to moderate

Record Counts: Should align with known patient population size. If you expect 500,000 patients and the normalized database only builds 250,000 records, investigate why half the population is missing.
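The two count checks above (purge/build balance and agreement with the expected population) can be applied mechanically to each database's figures. This is a hypothetical helper; the 2% tolerance is an assumed allowance for records added during the build, not an EMPI setting.

```python
def count_warnings(deleted: int, saved: int, expected: int, tolerance: float = 0.02) -> list:
    """Flag purge/build imbalance and gaps against the expected population.

    tolerance is a hypothetical 2% allowance for records added during the build.
    A deleted count of 0 (e.g. the first build after an initial load) skips the balance check.
    """
    warnings = []
    if deleted and abs(saved - deleted) / deleted > tolerance:
        warnings.append(f"built {saved:,} but purged {deleted:,}: purge/build imbalance")
    if expected and abs(saved - expected) / expected > tolerance:
        warnings.append(f"built {saved:,} but expected about {expected:,}: check for missing records")
    return warnings


print(count_warnings(deleted=5012, saved=5012, expected=5012))          # consistent
print(count_warnings(deleted=500_000, saved=250_000, expected=500_000))  # half the population missing
```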

EID Synchronization Messages

The EID (Enterprise ID, also called MPIID) synchronization stage is particularly important:

Purpose: InterSystems EMPI tries to assign the same MPIID to linked records and different MPIIDs to unlinked records.

"Synchronizing EID finished, X records changed":

Large number of changed records after definition changes: Normal, indicates thresholds were adjusted and links changed
Large number of changed records with no definition changes: May indicate data quality issues or incorrect previous builds
Zero records changed: Normal if no new linkages were established

Understanding Synchronization:

When two records are determined to be linked, they should share the same MPIID. During synchronization, one record's MPIID is chosen as the "master" and the other record(s) in the link group are updated to match. The "records changed" count shows how many records had their MPIID updated.
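Synchronization can be modeled as grouping records into link groups (following links transitively) and giving each group one MPIID. The sketch below uses a union-find structure for the grouping; the rule for choosing the "master" MPIID (the smallest existing value in the group) is purely hypothetical, and the sketch does not split unlinked records that already share an MPIID, which is the overlay condition the consistency check reports.

```python
def synchronize_eids(mpiid: dict, links: list) -> int:
    """Give every record in a link group the same MPIID; return the number of records changed.

    mpiid maps record ID -> current MPIID and is updated in place.
    links lists (record_a, record_b) pairs that the linkage process decided to link.
    """
    parent = {r: r for r in mpiid}

    def find(r):
        # Follow parent pointers to the group's root, halving the path as we go
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two link groups

    groups = {}
    for r in mpiid:
        groups.setdefault(find(r), []).append(r)

    changed = 0
    for members in groups.values():
        master = min(mpiid[r] for r in members)  # hypothetical "master" choice
        for r in members:
            if mpiid[r] != master:
                mpiid[r] = master
                changed += 1
    return changed


# John Smith's three MRNs from section 1, plus an unrelated hypothetical patient.
ids = {"CGH-123456": "M1", "MMC-789012": "M2", "RC-555555": "M3", "CGH-424242": "M4"}
links = [("CGH-123456", "MMC-789012"), ("MMC-789012", "RC-555555")]
print(synchronize_eids(ids, links), ids)
```

Here "records changed" is 2: the Metro Medical Center and Regional Clinic records adopt the Community General Hospital record's MPIID, and the unrelated patient keeps its own.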

Consistency Check Messages

The consistency check stage identifies linkage conflicts:

Overlaps: Linked records that have different MPIIDs. This shouldn't happen after synchronization, so overlaps found during consistency checking indicate a problem that needs investigation.

Overlays: Two records with the same MPIID that are not linked. This represents a serious data integrity issue where unrelated patients share an identifier.

"Checking for EID consistency finished, X issues discovered":

Zero issues: Ideal result, indicates clean linkage data
Small number of issues: Normal in complex environments; resolve them through the Worklist
Large number of issues: Investigate whether linkage definition is configured correctly
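The overlap and overlay definitions above can be checked directly once records are grouped by their (transitive) links. This is an illustrative sketch, not the EMPI consistency checker: record IDs and MPIIDs are hypothetical, and comparing every pair of records is O(n²), which is acceptable only for a small example.

```python
from itertools import combinations


def link_groups(records, links) -> dict:
    """Map each record to a representative of its linked group (transitive closure of links)."""
    parent = {r: r for r in records}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in links:
        parent[find(b)] = find(a)
    return {r: find(r) for r in records}


def consistency_issues(mpiid: dict, links: list):
    """Return (overlaps, overlays) as lists of record-ID pairs.

    overlap: records in the same link group but with different MPIIDs
    overlay: records in different link groups that share an MPIID
    """
    group = link_groups(mpiid, links)
    overlaps, overlays = [], []
    for a, b in combinations(sorted(mpiid), 2):
        same_group = group[a] == group[b]
        same_id = mpiid[a] == mpiid[b]
        if same_group and not same_id:
            overlaps.append((a, b))
        elif same_id and not same_group:
            overlays.append((a, b))
    return overlaps, overlays


ids = {"A": "M1", "B": "M2", "C": "M1"}
print(consistency_issues(ids, [("A", "B")]))
```

A and B are linked but carry different MPIIDs (an overlap), while A and C share M1 without being linked (an overlay).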

Worklist Impact:

Issues discovered during consistency checking are automatically added to the Worklist for manual review by data stewards with %HSPI_Operator or %HSPI_Master roles.

Error Messages in Build Log

If a build encounters errors, the Build Log will contain error messages:

Common Error Types:

Database errors (disk full, permissions issues)
Memory allocation errors (insufficient system memory)
Linkage definition errors (invalid configuration)
Data errors (corrupt records, invalid data types)

Responding to Errors:

When errors occur:
1. Review the full error message in the expanded Build Log entry
2. Address the root cause (free disk space, fix permissions, correct definition, repair data)
3. Restart the build after resolving the issue

Most build errors require a full rebuild after correction—partial builds may not be possible after an error.

---

Documentation References

InterSystems Documentation

7. Development/Test vs. Production Build Strategies

Key Points

Test environments: Frequent builds during definition tuning
Production: Scheduled builds during maintenance windows
Test with representative data subset
Document build duration for capacity planning
Coordinate builds with data loading activities

Detailed Notes

Linkage build strategies differ between development/test environments and production systems. Understanding these differences helps manage builds effectively across the system lifecycle.

Development and Test Environment Builds

In development and test environments, linkage builds are frequent and iterative:

Frequent Building: During linkage definition development, you may run dozens of builds as you:

Adjust matching parameters
Modify thresholds
Add or remove link keys
Test normalization functions
Evaluate data quality rules

Small Datasets: Test environments often use a subset of production data (10,000-100,000 records rather than millions). This allows:

Faster build cycles (minutes instead of hours)
Rapid iteration on linkage definition changes
Quick validation of configuration changes

Representative Data: Even though test datasets are smaller, they should be representative:

Include records from all facilities
Contain various data quality levels (clean and problematic records)
Represent diverse patient demographics (name variations, multiple addresses, etc.)
Include known duplicate pairs for validation

Build Validation: After each test build:

Review known duplicate pairs to verify they linked correctly
Check Build Log for unexpected record counts or errors
Inspect sample records to confirm normalization working as expected
Use Threshold Adjuster to visualize weight distribution
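Checking known duplicate pairs after a test build amounts to comparing the MPIIDs the build assigned against hand-labelled pairs. The sketch below is a hypothetical helper: it assumes you can export a record-to-MPIID mapping from the test system and that you also keep a list of pairs known to be different people, so that false links show up alongside missed links.

```python
def validate_test_build(mpiid, known_duplicates, known_distinct=()):
    """Compare a test build's MPIID assignments against hand-labelled record pairs.

    Returns (missed_links, false_links):
      missed_links - known duplicates that did NOT end up sharing an MPIID
      false_links  - known different people who DID end up sharing an MPIID
    """
    def same_person(a, b):
        return a in mpiid and b in mpiid and mpiid[a] == mpiid[b]

    missed_links = [p for p in known_duplicates if not same_person(*p)]
    false_links = [p for p in known_distinct if same_person(*p)]
    return missed_links, false_links


assigned = {"A": "M1", "B": "M1", "C": "M2", "D": "M2", "E": "M3"}
missed, wrong = validate_test_build(assigned, [("A", "B"), ("C", "E")], [("A", "C"), ("C", "D")])
print("missed:", missed, "false links:", wrong)
```

Missed links suggest thresholds or weights that are too strict; false links suggest the opposite, so tracking both across test builds shows whether a definition change moved things in the right direction.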

Production Environment Builds

Production systems require more careful build management:

Infrequent Scheduled Builds: Production builds should be infrequent and carefully planned:

Schedule during maintenance windows (overnight, weekends)
Coordinate with source system administrators
Notify users of potential performance impact
Plan for extended duration (hours to complete)

Build Triggers in Production:

Production builds are only needed when:

Linkage definition is updated (after thorough testing)
Large batch data loads are completed
Data quality remediation is performed
System upgrades require rebuild
Periodic maintenance (quarterly or semi-annually)

Change Control: Production linkage definition changes should go through formal change control:

Test thoroughly in development environment
Document expected impact on linkage
Get approval from data governance committee
Schedule implementation during maintenance window
Have rollback plan if issues occur

Performance Impact: Production builds consume significant resources:

CPU utilization may reach 80-100%
Memory usage increases substantially
Disk I/O is intensive
Concurrent user activity may slow
Message processing throughput may decrease

Therefore, schedule builds when system usage is minimal.

Coordinating Builds with Data Loading

Linkage builds and data loading activities must be coordinated:

After Batch Loads: When loading large numbers of new records via batch import:
1. Complete the batch load
2. Verify data loaded correctly
3. Schedule linkage build to process new records
4. Review Build Log to confirm appropriate processing
5. Check Worklist for potential duplicates requiring review

Ongoing HL7 Feeds: For systems with continuous HL7 data feeds:

Real-time linkage occurs for each incoming message
Periodic builds are less frequent (quarterly/semi-annually)
Builds primarily needed for definition changes or data cleanup

Temporary Feed Suspension: For very large batch loads or major linkage definition changes, consider:

Temporarily suspending HL7 feeds during build
Queuing incoming messages for processing after build completes
Coordinating with source systems to minimize traffic during build window

---

Documentation References

InterSystems Documentation

8. Exam Preparation Tips

Key Points

Review section content

Detailed Notes

Review documentation for detailed information.

Documentation References

InterSystems Documentation
