1. Understanding Demographic Data Collection and Assessment
Key Points
- Collect comprehensive demographic data inventory from all participating sites
- Identify available patient identifiers: Names, SSN, Gender, Birth Date, Addresses, Telecoms
- Document data formats, completeness, and quality by facility/assigning authority
- Assess consistency across data domains (facilities)
- Identify dummy values, default values, and missing data patterns
- Use Data Quality tool to analyze valid, invalid, and blank values by source
Detailed Notes
When implementing or updating an EMPI system, the first critical step is collecting and appraising demographic data from each participating site. This assessment drives all subsequent configuration decisions in the linkage definition.
Collecting Site Information
Technical specialists must gather detailed information about the demographic data available from each facility or data domain. The default EMPI parameters include:
- Names (First, Last, Middle)
- Social Security Number (SSN)
- Gender
- Birth Date
- Identifiers (MRN, facility-specific IDs)
- Addresses (Street, City, State, ZIP)
- Telecoms (Phone numbers, emails)
Additionally, Facility and MRN parameters are used in preliminary matching. Records matching on the facility/assigning authority/MRN triplet are automatically updated without requiring full linkage comparison.
Appraising Data Quality
The InterSystems EMPI Data Quality tool provides analytics cubes to evaluate data completeness and validity:
- PatientIndexDQCube: Built from HSPI.Data.Patient table, shows original data quality
- PatientIndexNormalizedDQCube: Built from normalized table, shows data after normalization
- DQTrend: Tracks long-term trends in data quality over time
The Data Quality tool uses business rules to classify values as valid, invalid, or blank. Dashboards display counts by property, facility, and assigning authority. This analysis reveals:
- Which facilities provide complete SSN data versus those that don't collect it
- Sources using dummy telephone numbers (e.g., 000-000-0000)
- Default values that should be treated as null
- Properties with high percentages of blank or invalid values
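For a quick ad-hoc check of one of these patterns outside the dashboards, a direct query against the source table can confirm a suspicion. A minimal ObjectScript sketch, assuming the HSPI_Data.Patient SQL projection exposes Facility and SSN columns (the column names are assumptions about the schema; the Data Quality dashboards report this out of the box):
```objectscript
// Hypothetical spot check: count blank SSNs by facility.
// Column names (Facility, SSN) are assumptions; IRIS SQL stores '' as NULL.
set tSQL = "SELECT Facility, COUNT(*) AS Total, SUM(CASE WHEN SSN IS NULL THEN 1 ELSE 0 END) AS BlankSSN FROM HSPI_Data.Patient GROUP BY Facility"
set tRS = ##class(%SQL.Statement).%ExecDirect(, tSQL)
while tRS.%Next() {
    write tRS.%Get("Facility"), ": ", tRS.%Get("BlankSSN"), " blank of ", tRS.%Get("Total"), !
}
```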
Sites should create custom Data Quality dashboards and trending pivots to monitor specific concerns relevant to their implementation. Finding and remediating data quality issues significantly improves and accelerates the tuning process.
2. Data Normalization Strategies
Key Points
- Normalization standardizes data for consistent comparison
- Default normalization functions provided for each linkage type
- Custom normalization functions address site-specific data problems
- Common normalizations: case conversion, punctuation removal, date standardization
- Null value lists identify dummy values to exclude from matching
- Normalization function overrides null value list unless coded to check it
- Use %object, %property, and %parameters variables in custom functions
Detailed Notes
Normalization is the process of standardizing demographic data into a consistent format before comparison. Each linkage type has an associated default normalization function that describes how to normalize that particular type of data.
Default Normalization Functions
Default normalization includes:
- Converting all characters to lowercase
- Removing punctuation and special characters
- Standardizing presentation of addresses, phone numbers, and dates
- Stripping whitespace
For example, "SMITH, JOHN" and "Smith,John" both normalize to "smithjohn" for comparison purposes.
Selecting Normalization Approaches for Problem Data
When data quality issues are identified, technical specialists must select appropriate normalization approaches:
List of Values to be Treated as Null: For known dummy values (e.g., 000-00-0000 for SSN, 999-999-9999 for phone), add these to the null value list for the parameter. These values will be excluded from linkage weight calculations.
Custom Normalization Functions: Override default normalization by entering custom ObjectScript code in the Normalization Function field. Custom functions should return the normalized value.
Custom Normalization Function Structure
Custom functions have access to:
- %object: Access other normalized values for the record (e.g., %object.getDataSource())
- %property: The original value being normalized
- %parameters: List of linkage type parameter name/value pairs
Typical approaches:
1. Call default function then manipulate result
2. Manipulate original value then pass to default function
Example: Gender Normalization
```objectscript
set gender=##class(%MPRL.LinkageType.Gender).getNormalized(%object,%property,.%parameters)
if gender'="M",gender'="F" set gender="U"
quit gender
```
This normalizes any gender values other than M or F to U (Unknown).
Example: Date Normalization for Invalid Dates
```objectscript
if +$zdt(%property,3)<1900 quit ""
quit ##class(%MPRL.LinkageType.Date).getNormalized(%object,%property,.%parameters)
```
This treats dates before January 1, 1900, as missing values.
Important Consideration
If both a null value list and normalization function are defined, the normalization function overrides the null value list by default. To preserve null value functionality, add this code at the beginning of custom functions:
```objectscript
if (%property="")||$data(%parameters("_MissingValueArray",$zcvt(%property,"l"))) quit ""
```
3. Modifying Parameters in Definition Designer
Key Points
- Access Definition Designer from EMPI menu
- Parameters tab displays default parameters (Names, SSN, Gender, Birth Date, etc.)
- Modify parameter values: linkage type, agreement/disagreement weights, normalization
- Settings tab: Locale, Enable Domain Conflict, threshold values
- Data Classes tab: Must use default HSPI.Data.Patient class (not customizable)
- Weighted parameters contribute to agreement pattern displayed in Worklist
- Save changes and rebuild linkage data to apply modifications
Detailed Notes
The Definition Designer is the primary interface for configuring the EMPI linkage definition. It provides a comprehensive UI for managing all aspects of how records are compared and linked.
Accessing Definition Designer
Navigate to Definition Designer from the InterSystems EMPI menu. The interface includes multiple tabs: Settings, Data Classes, Parameters, Link Keys, and Calibration.
Settings Tab
Key settings include:
Locale: Choose the geographical location that best represents your data. As of version 2020.1, EMPI supports: France, Germany, Italy, South Africa, Spain, and USA. The locale affects name matching algorithms, alias dictionaries, and address normalization.
Enable Domain Conflict: Controls how EMPI handles similar records from the same data domain (facility):
- Enabled: Records from same domain with different MRNs are NOT linked (duplicates should be resolved by source system)
  - Above autolink threshold: Category = Duplicate, Status = non-link, Reason = domain conflict
  - Below autolink threshold: Category = Review, Status = potential link, Reason = threshold
- Disabled: Records from same domain are linked and assigned same MPIID
Data Classes Tab
The Data Classes tab identifies locations and kinds of data to be analyzed. Settings are NOT customizable.
Critical: You must use the default HSPI.Data.Patient class for patient data. Do not modify default data classes or create additional data classes.
Parameters Tab
The Parameters tab is where most linkage definition work occurs. Default parameters are listed on the left:
- Names
- SSN
- Gender
- Birth Date
- Identifiers
- Addresses
- Telecoms
For each parameter, you can modify:
Parameter Name: Descriptive name for the parameter
Normalized Property Name: Field name in normalized class (e.g., stdSSN for SSN parameter)
Weighted Checkbox: If checked, parameter contributes to linkage weight calculation. Unweighted parameters can still be used in link keys.
Linkage Type: Class describing field contents (e.g., %MPRL.LinkageType.GivenName). Determines:
- How data is normalized
- Default agreement function
- Default agreement/disagreement weights
Linkage Type Parameters: Each linkage type has associated parameters affecting weight calculation. For example, HSPI.LinkageType.Name includes:
- FrequencyAdjusted
- FrequencyAdjustmentMaxFactor
- CheckTranspositions
- ScrubAffix
- AgreementWeightPercentage
Original Property Name: The field name as it appears in data class (case sensitive). Multiple properties can be comma-separated.
List of Values to be Treated as Null: Dummy values to exclude from comparison (e.g., 000-00-0000)
Agreement Weight: Amount added to link weight when values match
Disagreement Weight: Amount subtracted from link weight when values don't match (typically negative)
Normalization Function: Custom ObjectScript code to override default normalization
Agreement Function: Custom ObjectScript code to override default comparison logic
Saving and Applying Changes
After modifying parameters:
1. Click "Save All" to persist changes
2. Navigate to the Linkage Data tab
3. Build or rebuild linkage data to apply changes to existing records
4. Threshold Values and Adjustment Process
Key Points
- Three thresholds control linkage decisions: Review, Autolink, Validate
- Review threshold: Minimum link weight for Worklist inclusion
- Autolink threshold: Cutoff between automatic links and potential links
- Validate threshold: Maximum link weight for Worklist appearance
- Adjust thresholds to balance automation vs. manual review
- Early implementation: Lower thresholds (more manual review)
- Mature implementation: Higher thresholds (more automation)
- MLE calibration does NOT adjust thresholds (manual adjustment required)
Detailed Notes
Threshold values are critical configuration parameters that determine how record pairs are classified and whether they require manual review.
The Three Thresholds
Review Threshold: Defines the lowest link weight at which a record pair is included in the Worklist. Any record pair with a link weight below the review threshold never undergoes further consideration as a match. These pairs are classified as "strong non-links."
Autolink Threshold: Defines the cutoff between record pairs that are automatically linked and those requiring review. Record pairs with link weight between review and autolink thresholds are "potential links" that appear in the Review category of the Worklist.
Validate Threshold: Defines the highest link weight at which pairs appear on the Worklist. All records above the autolink threshold are automatically linked. Those between autolink and validate thresholds are links that appear in the Validate category for quality assurance review.
Example Threshold Scenario
Consider these threshold values:
- Review: 10
- Autolink: 20
- Validate: 25
Record pair classifications:
- Link weight < 10: Strong non-link (not on Worklist)
- Link weight 10-19: Potential link (Review category)
- Link weight 20-24: Automatic link (Validate category)
- Link weight ≥ 25: Automatic link (not on Worklist)
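The decision logic reduces to a comparison chain against the three thresholds. A hedged sketch in ObjectScript (a hypothetical helper using the example values above, not product code):
```objectscript
ClassMethod ClassifyPair(pWeight As %Numeric) As %String
{
    // Example thresholds: Review=10, Autolink=20, Validate=25
    if pWeight<10 quit "Strong non-link (not on Worklist)"
    if pWeight<20 quit "Potential link (Review category)"
    if pWeight<25 quit "Automatic link (Validate category)"
    quit "Automatic link (not on Worklist)"
}
```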
Tuning Philosophy
Threshold adjustment is an iterative process aligned with implementation maturity:
Early Stage: Set thresholds to create larger Worklists:
- Lower autolink threshold: More pairs require manual review
- Higher validate threshold: More automatic links appear for validation
- Purpose: Allow manual review of borderline cases to understand data patterns
Mature Stage: Adjust thresholds for greater automation:
- Higher autolink threshold: Fewer pairs require review
- Lower validate threshold: High-confidence links skip validation
- Purpose: Only uncertain cases require manual intervention
Threshold Adjustment Process
1. Review Worklist after building linkage data
2. Analyze Review category pairs: Are they true matches?
3. Analyze Validate category pairs: Are they true links?
4. Identify patterns in link weights for true matches vs. non-matches
5. Adjust thresholds to optimize classification
6. Rebuild linkage data (Batch mode, Weights option)
7. Review results and iterate
Critical Note: Running MLE calibration adjusts parameter weights but does NOT adjust thresholds. Threshold tuning must be performed manually based on Worklist review and organizational capacity for manual review.
Tools for Threshold Analysis
Use the Worklist filtering and analysis features:
- Filter by link weight ranges
- Review agreement patterns
- Examine secondary reasons and comments
- Identify systematic misclassifications
The goal is to position thresholds so automatic decisions are highly accurate while manageable volumes appear for manual review.
5. Maximum Likelihood Estimation (MLE) Process
Key Points
- MLE uses probabilistic algorithms to calculate optimal parameter weights
- Analyzes actual data to determine agreement/disagreement weights
- Requires sizable dataset (minimum 100,000 records recommended)
- Accessed via "MLE Calibration" button in Definition Designer
- Monitor page shows real-time weight adjustments (aWeight, dWeight columns)
- Run until weights stabilize, then click "Stop Calibration"
- Apply Results to update parameter weights in linkage definition
- Multiple MLE iterations improve accuracy as data and thresholds evolve
Detailed Notes
Maximum Likelihood Estimation (MLE) is a powerful tool for determining appropriate parameter weights based on statistical analysis of actual data rather than generic defaults.
Understanding MLE
EMPI comes with default parameter weights based on generic patient data. These defaults typically produce large volumes of potential links on the Worklist. MLE analyzes your specific database to calculate what agreement and disagreement weights should be for each comparison field based on statistical calculations drawn from actual data patterns.
MLE uses probabilistic record linkage algorithms to determine:
- How discriminating each parameter is in your dataset
- Optimal weights that maximize linkage accuracy
- Statistical confidence in parameter comparisons
Running MLE Calibration
Step 1: Initiate Calibration
1. Navigate to Definition Designer
2. Click "MLE Calibration" in the banner
3. Click OK to confirm (or Cancel to return)
4. If calibration was run recently, you may be prompted to view the previous results
Step 2: Monitor Calibration Process
The calibration runs in the background. Click OK when prompted to open the Calibration Monitor page.
The monitor displays:
- Calibration Index: Current iteration number
- Linkage Definition: Which definition is being calibrated
- Sample Size: Dataset size being analyzed
- Status: Running status and timing
- Parameter table: Shows weight adjustments in real-time
  - Name: Parameter name
  - aWeight: Agreement weight (positive value)
  - dWeight: Disagreement weight (negative value)
  - maProb, uaProb, mdProb, udProb: Probability calculations
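The column names suggest classic Fellegi-Sunter probabilistic linkage, in which each parameter's weights derive from its match (m) and non-match (u) agreement probabilities. That interpretation is an assumption, since the product's exact formulas are not documented here, but the standard calculation gives useful intuition:
```objectscript
// Fellegi-Sunter style weights (assumed interpretation of maProb/uaProb):
// m = P(field agrees | records match), u = P(field agrees | records do not match)
set m = 0.95, u = 0.01
set aWeight = $zln(m/u)/$zln(2)            // log2(m/u) ≈ +6.6
set dWeight = $zln((1-m)/(1-u))/$zln(2)    // log2((1-m)/(1-u)) ≈ -4.3
write "aWeight=", aWeight, " dWeight=", dWeight, !
```
A field that agrees often among true matches but rarely among non-matches earns a large positive aWeight and a strongly negative dWeight.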
Step 3: Evaluate and Stop
Watch the aWeight and dWeight columns. Allow calibration to run until:
- Weights stop changing significantly
- Weighting scores are fairly stable
- Statistical convergence is achieved
Click "Stop Calibration" when satisfied with convergence.
Step 4: Apply Results
Two options when calibration completes:
Apply Results: Updates parameter weights in the linkage definition based on the calibration
OK without applying: Closes the monitor without updating weights (you can note the values for manual entry later)
Once applied, new weights appear in Parameters tab of Definition Designer.
Dataset Requirements
For useful MLE results:
- Minimum 100,000 records recommended
- Representative sample of data diversity
- Include all facilities/data domains
- Sufficient true matches and non-matches for statistical validity
Iterative MLE Process
InterSystems recommends running MLE in multiple iterations:
1. Initial run with default weights
2. Review results, adjust thresholds
3. Second MLE run with adjusted thresholds
4. Further threshold refinement
5. Continue iterations until satisfied with accuracy and Worklist volume
Each iteration provides new estimates based on current threshold values and accumulated data patterns.
After MLE Calibration
Critical: After applying MLE results, you must rebuild linkage data:
1. Navigate to the Linkage Data tab
2. Select Batch mode
3. Select the "Weights" option (recalculates linkages with new weights)
4. Run the build process
5. Review Worklist pairs with new weights
Note that data is unavailable during the batch rebuild process.
6. Agreement and Disagreement Weights Revision
Key Points
- Agreement weight: Added to link weight when parameters match
- Disagreement weight: Subtracted when parameters don't match (negative value)
- Overall link weight = sum of all parameter agreement/disagreement weights
- Higher agreement weight = parameter contributes more to match decisions
- Adjust weights based on parameter reliability in your dataset
- Unreliable data (e.g., phone numbers): Lower agreement, raise disagreement
- Critical identifiers (e.g., SSN): Higher agreement weight
- Manual adjustment complements MLE calibration results
Detailed Notes
Agreement and disagreement weights are fundamental to how EMPI calculates the overall link weight for each record pair. Understanding and properly tuning these weights is essential for accurate patient matching.
Weight Fundamentals
For each weighted parameter, EMPI compares values between two records and applies either:
- Agreement weight: If values match or are similar (positive value added to link weight)
- Disagreement weight: If values differ (negative value subtracted from link weight)
The sum of all parameter weights produces the overall link weight, which is compared against thresholds to determine link status.
How Weights Affect Linkage
Example Scenario:
Parameters and weights:
- SSN: Agreement +12, Disagreement -8
- Birth Date: Agreement +8, Disagreement -6
- Last Name: Agreement +6, Disagreement -4
- First Name: Agreement +5, Disagreement -3
- Gender: Agreement +2, Disagreement -1
Record Pair A vs. B:
- SSN: Match (+12)
- Birth Date: Match (+8)
- Last Name: Match (+6)
- First Name: Different (-3)
- Gender: Match (+2)
Link weight = 12 + 8 + 6 - 3 + 2 = 25
If autolink threshold is 20, this pair would be automatically linked.
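The link weight is literally this sum. A minimal sketch (a hypothetical helper; the product computes this internally during linkage builds):
```objectscript
ClassMethod LinkWeight(ByRef pAgree, ByRef pWeights) As %Numeric
{
    // pAgree(param)=1 if the values match, 0 if they differ
    // pWeights(param,"a") / pWeights(param,"d") hold the agreement/disagreement weights
    set tTotal = 0, tParam = ""
    for {
        set tParam = $order(pAgree(tParam)) quit:tParam=""
        set tTotal = tTotal + $select(pAgree(tParam):pWeights(tParam,"a"), 1:pWeights(tParam,"d"))
    }
    quit tTotal
}
```
With the values above (SSN, birth date, last name, and gender agreeing; first name disagreeing), the helper returns 25.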
Strategic Weight Adjustment
Increasing Parameter Importance: Raise agreement weight and/or make disagreement weight more negative (e.g., -4 to -6). This makes the parameter contribute more to both positive and negative match decisions.
Decreasing Parameter Importance: Lower agreement weight and/or make disagreement weight less negative (e.g., -4 to -2). Use this for unreliable data elements.
Data Quality Considerations:
If Data Quality tool reveals that telephone numbers are frequently missing or contain dummy values:
- Original: Agreement +5, Disagreement -4
- Adjusted: Agreement +2, Disagreement -1
- Result: Telecoms have less influence on link decisions
If SSN data is highly reliable and complete:
- Original: Agreement +10, Disagreement -6
- Adjusted: Agreement +15, Disagreement -10
- Result: SSN becomes stronger discriminator
Linkage Type Parameters
Beyond basic weights, linkage type parameters affect weight calculations:
FrequencyAdjusted: For name parameters, common surnames (Smith, Jones) contribute less weight than rare surnames, reflecting their lower discriminating value.
AgreementWeightPercentage / DisagreementWeightPercentage: Modify base weights by percentage for fine-tuning.
These parameters are set in the Linkage Type Parameter Name/Value fields in Definition Designer.
Weight Revision Workflow
1. Build linkage data with current weights
2. Review Worklist for misclassifications
3. Identify parameters causing false positives or false negatives
4. Adjust weights strategically
5. Rebuild linkage data (Batch mode, Weights option)
6. Evaluate results
7. Iterate until optimal accuracy achieved
Combine manual weight adjustment with MLE calibration for best results. MLE provides data-driven baseline; manual tuning addresses specific organizational priorities and data quality realities.
7. Evaluating Tuning Process Effectiveness
Key Points
- Use Worklist to review link and non-link pair results
- Analyze agreement patterns to understand match decisions
- Track Worklist volume trends over tuning iterations
- Measure false positive and false negative rates
- Review pairs by category: Review, Validate, Duplicate
- Filter by secondary reason, comment, facility, date range
- Monitor impact of each tuning change on classification accuracy
- Goal: High accuracy with manageable Worklist volume
Detailed Notes
Evaluating the effectiveness of tuning changes is critical to achieving optimal EMPI performance. The tuning process is iterative, requiring systematic measurement and validation after each modification.
Worklist Analysis
The Worklist is the primary tool for evaluating tuning effectiveness. It displays record pairs requiring review or validation, organized into categories:
Review Category: Pairs with link weight between review and autolink thresholds (potential links requiring decision)
Validate Category: Pairs with link weight between autolink and validate thresholds (automatic links recommended for quality assurance)
Duplicate Category: Same-domain pairs with different MRNs (when Domain Conflict enabled)
Key Metrics to Track
Worklist Volume: Total number of pairs requiring manual review
- Initial implementation: Expect high volumes
- After tuning: Should decrease to manageable levels
- Trend: Declining volume indicates improving configuration
Classification Accuracy:
- Review true matches in Review category: What percentage are actual matches?
- Review true non-matches: What percentage correctly identified?
- Validate links: What percentage of automatic links are accurate?
False Positives: Pairs incorrectly classified as links
- Check Validate category for non-matches
- Indicates thresholds too low or weights too generous
False Negatives: Pairs incorrectly classified as non-links
- Manually search for known matches not in Worklist
- Indicates thresholds too high or weights too conservative
Agreement Pattern Analysis
The agreement pattern is a string of characters showing parameter agreement for each pair. Example: XLXHN
Pattern characters:
- H: High agreement (exact match)
- X: Approximate agreement (similar but not exact)
- L: Low agreement (some similarity)
- N: No agreement (different)
- M: Missing data (one or both values null)
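A pattern is easier to read when each position is mapped back to its parameter. A hedged sketch, assuming each character corresponds to a weighted parameter in definition order (an assumption, and the helper is hypothetical):
```objectscript
ClassMethod DecodePattern(pPattern As %String, pParams As %String) As %String
{
    // e.g. DecodePattern("XLXHN", "Names,SSN,Gender,BirthDate,Address")
    set tOut = ""
    for i=1:1:$length(pPattern) {
        set tDesc = $case($extract(pPattern,i), "H":"high", "X":"approximate", "L":"low", "N":"none", "M":"missing", :"?")
        set tOut = tOut_$listbuild($piece(pParams,",",i)_"="_tDesc)
    }
    quit $listtostring(tOut, "; ")
}
```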
Analyze patterns to understand:
- Which parameter combinations produce accurate links
- Which combinations produce false positives
- Whether certain parameters are over/under-weighted
Filtering and Reporting
Use Worklist filters to focus analysis:
By Link Weight Range: Examine pairs near threshold boundaries to validate cutoff decisions
By Secondary Reason: Review pairs affected by specific rules (e.g., "Gender" rule, "Roommates-A" rule)
By Comment: Filter for specific automated or manual annotations
By Facility/Assigning Authority: Identify data quality issues by source
By Date Range: Track performance over time or for specific data loads
Effectiveness Evaluation Process
After each tuning change (weights, thresholds, normalization, rules):
1. Rebuild linkage data with new configuration
2. Snapshot Worklist volume before and after
3. Sample Review category pairs (e.g., 100 pairs): Count true matches vs. non-matches
4. Sample Validate category pairs: Verify automatic links are accurate
5. Calculate accuracy metrics: % correct in each category
6. Identify patterns in misclassifications
7. Document findings and determine next tuning actions
Success Criteria
Effective tuning achieves:
- High accuracy: >95% of automatic links are true matches
- Manageable volume: Worklist size matches organizational review capacity
- Clear decisions: Pairs near thresholds have ambiguous data justifying manual review
- Stable performance: Metrics remain consistent as new data arrives
When these criteria are met, the linkage definition is well-tuned for production use.
8. Recommending Data Quality Corrective Actions
Key Points
- Use Data Quality tool dashboards to identify systematic issues
- Recommend source system corrections for upstream data problems
- Implement null value lists for known dummy values
- Apply custom normalization for consistent formatting issues
- Configure custom data quality rules for organization-specific validation
- Engage data stewards at source facilities for long-term improvements
- Document data quality issues and remediation plans
- Balance EMPI configuration vs. source system fixes
Detailed Notes
Data quality issues significantly impact EMPI linkage accuracy. Technical specialists must identify these issues and recommend appropriate corrective actions at both source and EMPI levels.
Data Quality Tool Analysis
The Data Quality Manager provides three dashboards:
InterSystems EMPI Data Quality Summary: Shows valid, invalid, and blank value counts for each property in original patient data
InterSystems EMPI Data Quality Summary - Normalized: Shows data quality after normalization processing
Trend Dashboard: Displays long-term patterns in data quality metrics
Use these dashboards to identify:
- Properties with high percentages of blank values by facility
- Facilities using dummy/default values (e.g., all patients with same phone number)
- Invalid data formats (e.g., dates outside reasonable ranges)
- Inconsistent data entry practices across sources
Categorizing Data Quality Issues
Missing Data: Properties consistently blank from specific facilities
- Recommendation: Engage facility to improve data collection
- EMPI Action: Reduce weight of unreliable parameters for that source
Dummy Values: Placeholder values used when data unavailable (000-00-0000, 999-999-9999)
- Recommendation: Source system should use NULL instead of dummy values
- EMPI Action: Add dummy values to "List of values to be treated as null"
Format Inconsistencies: Same data represented differently (Jr., Jr, Junior)
- Recommendation: Source system standardization
- EMPI Action: Custom normalization function to standardize formats
Invalid Values: Data outside valid ranges (birth dates in future, impossible SSNs)
- Recommendation: Source system validation rules
- EMPI Action: Custom normalization to treat invalid values as null
Transposition Errors: Fields frequently swapped (first/last name, city/state)
- Recommendation: Source system UI/validation improvements
- EMPI Action: Configure "List of Possible Transpositions" parameter
Corrective Action Strategies
Immediate EMPI Configuration:
1. Add null value lists for identified dummy values
2. Implement custom normalization for format standardization
3. Adjust parameter weights for unreliable data sources
4. Create custom data quality rules for validation
Source System Engagement:
1. Document specific data quality issues with evidence
2. Quantify impact on EMPI linkage accuracy
3. Provide recommendations for source system improvements
4. Establish ongoing data quality monitoring
Long-term Improvements:
1. Establish data governance policies across organization
2. Implement validation at point of data entry
3. Provide training to data entry staff
4. Schedule regular data quality audits
Custom Data Quality Rules
Create custom rules in Data Quality Manager to validate organization-specific requirements:
1. Navigate to Data Quality Manager
2. Create custom rule in the Validation Rule field
3. Define validation logic (e.g., SSN must be 9 digits, no repeating patterns)
4. Save changes
5. Select "Rebuild Cube Data" to apply the custom rule
6. View results in dashboards
Custom rules appear in data quality reports alongside built-in validations.
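As a hedged sketch of the validation logic such a rule might express (a hypothetical helper; the actual rule is entered in the Validation Rule field):
```objectscript
ClassMethod IsValidSSN(pSSN As %String) As %Boolean
{
    // Strip separators, then require exactly nine digits
    set tSSN = $translate(pSSN, "- ")
    if tSSN'?9N quit 0
    // Reject a single repeated digit such as 000000000 or 111111111
    if $translate(tSSN, $extract(tSSN))="" quit 0
    quit 1
}
```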
Balancing Configuration vs. Correction
Determine whether issues should be addressed through:
EMPI Configuration: When source system changes are impractical or delayed
Source System Fixes: For sustainable, long-term improvements
Both: Immediate EMPI workaround while pursuing source corrections
The goal is clean, standardized data at the source with EMPI configuration providing resilience against remaining variability.
9. Composite Record Trust Tiers and Aging Factors
Key Points
- Composite Record aggregates demographic data from all linked records
- Trust tiers rank data sources by reliability (e.g., registration > billing)
- Manual overrides allow selecting specific property group as default
- Aging factors determine how long manual overrides remain effective
- Five aging options: change in record, days, change OR days, change AND days, no expiration
- Composite override expires automatically when linkage group changes
- Configure aging in Settings > Composites tab
- Changes apply to all subsequent overrides (not retroactive)
Detailed Notes
The Composite Record is an EMPI feature that creates a single, best-available view of patient demographics by aggregating data from all linked records. Configuring trust tiers and aging factors ensures the Composite Record reflects the most reliable and current information.
Understanding Composite Records
When multiple records are linked to the same patient (same MPIID), EMPI creates a Composite Record that selects the best value for each property group (address, phone, name, etc.). This selection is based on:
1. Data source trust rankings: Which facilities provide most reliable data
2. Data completeness: Records with more complete information rank higher
3. Data recency: More recent updates may be preferred
4. Manual overrides: User-specified selections that override automatic ranking
Trust Tier Configuration
Trust tiers establish a hierarchy of data source reliability. For example:
Tier 1 (Highest Trust): Registration/ADT systems
Tier 2: Billing systems
Tier 3: Laboratory systems
Tier 4: External data sources
When selecting which address to use in the Composite Record, an address from a Tier 1 source outranks an address from a Tier 2 source, all else being equal.
Technical specialists should work with organizational stakeholders to:
1. Identify all data sources feeding EMPI
2. Assess reliability and completeness of each source
3. Establish consensus on trust rankings
4. Document trust tier rationale
5. Configure rankings in EMPI
Manual Overrides
Despite automated ranking, sometimes users identify the correct demographic value through manual review. For example, if a patient has multiple addresses but staff confirms the correct one through phone contact, they may manually select that property group to be used as default in the Composite Record.
This manual selection is called a "manual override." It supersedes automated ranking for that property group.
Aging Factor Configuration
Manual overrides should not persist indefinitely, as patient information changes over time. The Composites tab on the Settings page controls how long manual overrides last.
Access: Settings > Composites tab
Five aging options:
When there is a change in a record (default):
- Override expires when Composite Record is edited
- Ensures overrides are reconsidered when new information arrives
- Most conservative approach
After a number of days:
- Override expires after specified number of days (enter positive integer)
- Example: 90 days for addresses, assuming patients move
- Time-based expiration regardless of data changes
After a change in a record or a number of days:
- Override expires when record changes OR days elapse, whichever comes first
- Combines both triggers
- More aggressive expiration policy
After a change in a record and a number of days:
- Override expires only after record changes AND days elapse
- Both conditions must be met
- More conservative, allows longer override persistence
No additional expiration:
- Override persists until linkage group changes
- No time-based or edit-based expiration
- Use cautiously, as overrides may become stale
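The five options boil down to a small decision table. A hedged sketch of the expiration logic (the helper and mode names are hypothetical; the real behavior is configured in the UI, not coded):
```objectscript
ClassMethod OverrideExpired(pChanged As %Boolean, pDays As %Integer, pMode As %String, pLimit As %Integer) As %Boolean
{
    // pChanged: the Composite Record was edited; pDays: age of the override in days
    if pMode="change" quit pChanged
    if pMode="days" quit (pDays>=pLimit)
    if pMode="changeOrDays" quit (pChanged)||(pDays>=pLimit)
    if pMode="changeAndDays" quit (pChanged)&&(pDays>=pLimit)
    quit 0  // "no additional expiration": only a linkage group change clears it
}
```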
Important Considerations
Linkage Group Changes: All manual overrides expire automatically whenever the linkage group changes (records are merged, unlinked, or re-linked). This ensures Composite Record is rebuilt from scratch when fundamental linkage structure changes.
Retroactive Application: Changes to aging settings apply only to subsequent overrides created after the change. Existing overrides follow the aging policy in effect when they were created.
Property Group Granularity: Aging applies at the property group level (entire address, not individual city field). Override of one address doesn't affect overrides of other property groups.
Recommendations by Property Type
Addresses: 90-180 day expiration - patients move, addresses become outdated
Phone Numbers: 90-120 day expiration - phone numbers change, patients switch carriers
Names: Change-based expiration only - names rarely change except for marriage/divorce events
Clinical Identifiers: No expiration - SSN and other identifiers are permanent
Technical specialists should recommend aging policies based on:
- How frequently each data type changes in real-world patient populations
- Organizational capacity to review and update overrides
- Regulatory requirements for data currency
- Downstream system dependencies on Composite Record accuracy
10. Recommending Changes to Rules
Key Points
- Built-in rules handle common scenarios: Twins, Siblings, Roommates
- Custom rules created via onCreateClassifiedPair() method in linkage definition class
- Rules modify linkStatus, secondaryReason, and comment properties
- Link status values: 0=Strong Non-Link, 1=Non-Link to Review, 2=Link to Validate, 3=Link
- Common rule patterns: gender mismatch, DOB discrepancies, suspicious address sharing
- Rules filter Worklist by Secondary Reason for targeted review
- Do NOT modify other classified pair properties (causes unexpected behavior)
- Test rules thoroughly before production deployment
Detailed Notes
Rules provide fine-grained control over linkage decisions by overriding weight-based classifications in specific scenarios. Technical specialists must understand when to recommend rules and how to implement them safely.
Understanding Rules
After EMPI calculates link weight and applies thresholds, rules provide a final opportunity to adjust classification. Rules examine the specific pattern of agreement and disagreement across parameters and modify link status based on organizational business logic.
Built-in Rules
EMPI includes several built-in rules for common scenarios:
Twins Rule: Two records from same facility, same birth date, same last name, different first names, link weight above threshold → Downgrade to potential link (manual review)
Siblings Rule: Similar to Twins but with different birth dates → Non-link
Roommates Rules (A, B, C variants): Records that match on address but disagree on names/identifiers → Downgrade to non-link or potential link for review
These built-in rules prevent common false positive scenarios.
When to Recommend Custom Rules
Consider custom rules when:
1. Specific data quality patterns cause systematic misclassifications 2. Organizational policies require manual review of certain scenarios 3. Regulatory requirements mandate human verification for specific cases 4. Built-in rules don't cover observed false positive/negative patterns
Custom Rule Implementation
Custom rules are implemented by adding an onCreateClassifiedPair() method to the linkage definition class. This method is called immediately before the record pair is saved to the database.
Method Signature:
```objectscript
ClassMethod onCreateClassifiedPair(pClassifiedPair As %MPRL.Linkage.Classified, isModified As %Boolean) As %Status
```
Editable Properties (ONLY these should be modified):
linkStatus - Numerical value:
- 0 = Strong Non-Link (below Review threshold)
- 1 = Non-Link to Review (Potential Link)
- 2 = Link to Validate
- 3 = Link
secondaryReason - String identifying which rule was applied (used for Worklist filtering)
comment - Comment associated with action (displayed in Worklist)
Critical Warning: Do NOT edit any other properties of the classified pair. Modifying other properties may cause unexpected behavior and database corruption.
Example Custom Rules
Gender Mismatch Rule:
```objectscript
ClassMethod onCreateClassifiedPair(pClassifiedPair As %MPRL.Linkage.Classified, isModified As %Boolean) As %Status
{
    // Set appropriate variables (tRecNormalizedA, tRecNormalizedB)
    if ((tRecNormalizedA.stdGender'="")&&(tRecNormalizedB.stdGender'=""))&&($zcvt(tRecNormalizedA.stdGender,"u")'=$zcvt(tRecNormalizedB.stdGender,"u")) {
        set pClassifiedPair.linkStatus = 1
        set pClassifiedPair.secondaryReason = "Gender"
        set pClassifiedPair.comment = "AutoNonLink rule: Genders are different"
    }
    quit $$$OK
}
```
This rule downgrades to potential link any pair where genders are both present and different.
Birth Date Component Mismatch Rule: If at least one portion of birth date (day, month, year) doesn't match → Downgrade to potential link
SSN/Name/Address Triple Disagreement Rule: If all three critical parameters disagree → Force to non-link regardless of link weight from other parameters
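As a hedged sketch of that last rule (tRecNormalizedA/B follow the gender example; property names other than stdSSN are hypothetical, and production code should also skip blank values):
```objectscript
// Force to strong non-link when SSN, last name, and address all disagree
if (tRecNormalizedA.stdSSN'=tRecNormalizedB.stdSSN)&&(tRecNormalizedA.stdLastName'=tRecNormalizedB.stdLastName)&&(tRecNormalizedA.stdAddress'=tRecNormalizedB.stdAddress) {
    set pClassifiedPair.linkStatus = 0
    set pClassifiedPair.secondaryReason = "TripleDisagree"
    set pClassifiedPair.comment = "AutoNonLink rule: SSN, name, and address all differ"
}
```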
Rule Development Best Practices
1. Analyze Worklist First: Identify specific patterns causing misclassifications before writing rules
2. Start Conservative: Begin with rules that downgrade to "Non-Link to Review" rather than forcing final decisions
3. Use Descriptive secondaryReason: Make it easy to filter and analyze rule impact in Worklist
4. Document Thoroughly: Comment code with business justification for each rule
5. Test on Sample Data: Evaluate rule impact before applying to full dataset
6. Monitor Ongoing: Track Worklist volume and accuracy after rule deployment
Rule Grouping
Individual rules can be grouped together in a single onCreateClassifiedPair() method:
```objectscript
ClassMethod onCreateClassifiedPair(pClassifiedPair As %MPRL.Linkage.Classified, isModified As %Boolean) As %Status
{
    // Gender rule
    if (gender mismatch logic) {
        set pClassifiedPair.linkStatus = 1
        set pClassifiedPair.secondaryReason = "Gender"
    }
    // DOB rule
    if (birthdate discrepancy logic) {
        set pClassifiedPair.linkStatus = 1
        set pClassifiedPair.secondaryReason = "BirthDate"
    }
    // Address sharing rule
    if (suspicious address pattern) {
        set pClassifiedPair.linkStatus = 1
        set pClassifiedPair.secondaryReason = "Roommates"
    }
    quit $$$OK
}
```
Recommended Rules by Scenario
High-Stakes Environments (e.g., oncology, transplant):
- Force manual review for any gender mismatch
- Force manual review for SSN disagreement when names match
- Require validation for all automatic links above certain weight
High-Volume Environments (e.g., large hospital systems):
- Aggressive automatic linking for high-weight pairs
- Rules to prevent only most obvious false positives
- Minimize manual review requirements
Multi-Facility Environments:
- Domain conflict handling rules
- Facility-specific trust rankings
- Rules accounting for data quality variations by source
Technical specialists should recommend rules aligned with organizational risk tolerance, data quality reality, and operational capacity for manual review.
11. Link-Key Indices and Preliminary Matching
Key Points
- Link-key index identifies feasible linkage candidates before detailed comparison
- Preliminary comparison based on hash of several fields
- Eliminates obviously unrelated records from intensive parameter comparison
- Reduces time required to build linkage data
- Default link keys provided (accept during initial setup)
- Link Keys tab in Definition Designer for customization
- Non-weighted parameters can still be used in link keys
- Advanced tuning may adjust link-key composition for performance optimization
Detailed Notes
Link-key indices are performance optimization features that reduce the computational cost of linkage by pre-filtering record pairs before detailed parameter comparison.
Purpose of Link-Key Index
Without link-key indices, EMPI would compare every record against every other record using all weighted parameters—an O(n²) operation that becomes computationally prohibitive as datasets grow.
Link-key indices use a preliminary comparison based on a hash of several fields to identify all record pairs that are similar in some way. Only pairs identified by the link-key index undergo detailed parameter comparison and weight calculation.
How Link-Key Indices Work
1. Link Key Creation: Combine several parameters (e.g., first 3 letters of last name + birth year) into a hash value
2. Index Construction: Build index of hash values for all records
3. Candidate Identification: Records sharing any link-key hash value become candidates for detailed comparison
4. Parameter Comparison: Only candidate pairs undergo full weighted parameter evaluation
This approach eliminates obviously unrelated records (different birth years, completely different names) from intensive comparison, dramatically reducing build time.
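A hedged sketch of the kind of key described in step 1 (a hypothetical helper; real link keys are configured in the Link Keys tab, not hand-coded):
```objectscript
ClassMethod BuildLinkKey(pLastName As %String, pBirthDate As %String) As %String
{
    // First 3 letters of the normalized surname + birth year (assumes YYYY-MM-DD input)
    set tName = $zcvt($translate(pLastName, " ,.-'"), "l")
    quit $extract(tName, 1, 3)_"|"_$extract(pBirthDate, 1, 4)
}
```
Records sharing the same key value ("smi|1980", say) become candidates for full comparison; all other pairs are skipped.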
Default Link Keys
EMPI provides default link keys optimized for patient matching:
- Name-based keys (phonetic and substring variations)
- Birth date keys (exact and partial matches)
- Identifier keys (SSN, MRN)
- Address keys (geographic proximity)
During initial setup, accept these defaults. They are designed to cast a wide net while maintaining performance.
Link Keys Tab
Access link-key configuration in Definition Designer > Link Keys tab.
For each link key, you can specify:
- Which parameters contribute to the hash
- Transformation functions applied before hashing (e.g., soundex, substring)
- Combinations of parameters to create multiple keys
Parameters in Link Keys vs. Weighted Parameters
A parameter can be:
- Weighted: Contributes to link weight calculation
- In link key: Used to identify candidate pairs
- Both: Most common scenario
- Neither: Displayed but not used in linkage logic
Non-weighted parameters can still be useful in link keys if they help identify candidates efficiently.
Advanced Link-Key Tuning
For performance optimization in large datasets:
Too Few Candidates: If valid matches are being missed, link keys may be too restrictive
- Solution: Add more link keys or loosen transformation functions
Too Many Candidates: If build time is excessive, link keys may be too permissive
- Solution: Make link keys more selective (requires careful validation that true matches aren't excluded)
This is advanced tuning typically performed after initial implementation stabilizes.
12. Exam Preparation Tips
Key Points
- Understand complete workflow: data collection → assessment → normalization → tuning
- Master Definition Designer navigation and all tabs
- Know three thresholds and their ranges (Review < Autolink < Validate)
- Understand MLE process: when to run, how to interpret, how to apply
- Differentiate agreement weights (positive) vs. disagreement weights (negative)
- Know when to use normalization functions vs. null value lists
- Understand Composite Record aging options and appropriate use cases
- Know onCreateClassifiedPair() method structure and editable properties
- Practice scenario-based problem solving for data quality issues
Detailed Notes
Key Concepts to Master
1. Data Collection and Assessment (KSAs 1-2)
- What demographic data to collect from sites
- How to use Data Quality tool for assessment
- Identifying patterns in valid/invalid/blank values
- Facility-specific data quality issues
2. Normalization (KSA 3)
- Default normalization functions by linkage type
- Custom normalization function structure
- Variables available: %object, %property, %parameters
- When normalization overrides null value list
- Common normalization patterns (gender, dates, names)
3. Definition Designer (KSA 4)
- Navigate Settings, Parameters, Link Keys tabs
- Modify parameter values in UI
- Understand Locale setting impact
- Enable Domain Conflict behavior
4. Thresholds (KSA 5)
- Review threshold: Lowest weight for Worklist inclusion
- Autolink threshold: Cutoff for automatic linking
- Validate threshold: Highest weight for Worklist
- How threshold positioning affects Worklist categories
- Tuning philosophy: early vs. mature implementation
5. MLE Process (KSA 6)
- Purpose of Maximum Likelihood Estimation
- Dataset size requirements (100,000+ records)
- How to run MLE Calibration
- Interpreting monitor page (aWeight, dWeight columns)
- When to stop calibration (weights stabilize)
- Applying results to update weights
- Iterative MLE approach
6. Evaluating Effectiveness (KSA 7)
- Worklist analysis techniques
- Metrics to track (volume, accuracy, false positive/negative rates)
- Agreement pattern interpretation
- Filtering by secondary reason, link weight, facility
- Success criteria for well-tuned definition
7. Data Quality Corrective Actions (KSA 8)
- Categorizing issues: missing, dummy, format, invalid, transposition
- EMPI configuration vs. source system fixes
- Null value lists for dummy data
- Custom normalization for format issues
- Engaging data stewards for long-term improvements
8. Composite Records (KSA 9)
- Purpose of Composite Record
- Trust tier ranking concept
- Manual override scenarios
- Five aging options and appropriate use cases
- When overrides expire (linkage group changes)
- Retroactive vs. prospective application
9. Rules (KSA 10)
- Built-in rules: Twins, Siblings, Roommates
- onCreateClassifiedPair() method structure
- Editable properties: linkStatus, secondaryReason, comment
- Link status values (0-3)
- Gender mismatch example
- When to recommend custom rules
- Testing and monitoring rule impact
Common Exam Scenarios
Scenario 1: Given data quality issues (high percentage of dummy phone numbers from Facility A), what corrective actions would you recommend?
- Answer: Add dummy values to null value list for Telecoms parameter, reduce Telecoms agreement weight for that facility, engage Facility A to improve data collection
Scenario 2: After running MLE calibration, you see SSN aWeight is 15 and dWeight is -12. What does this mean and what should you do next?
- Answer: SSN is highly discriminating in your dataset. Agreement strongly indicates match, disagreement strongly indicates non-match. Apply results, rebuild linkage data with Weights option, review Worklist to validate improved accuracy.
Scenario 3: Worklist shows many gender mismatch pairs being automatically linked. What actions would you take?
- Answer: Implement custom onCreateClassifiedPair() rule to downgrade pairs with gender disagreement to potential link (linkStatus = 1, secondaryReason = "Gender"), rebuild linkage data, review filtered Worklist by secondaryReason.
Scenario 4: Organization wants manual overrides for addresses to persist for 6 months. How would you configure this?
- Answer: Navigate to Settings > Composites tab, select "After a number of days" option, enter 180 in days field, apply changes. Note this applies only to future overrides.
Scenario 5: Which threshold values would produce the largest Worklist volume: Review=5/Autolink=15/Validate=25 or Review=15/Autolink=25/Validate=30?
- Answer: First option (5/15/25) produces larger Worklist because lower thresholds capture more pairs in Review and Validate categories.
Study Approach
1. Hands-on Practice: Use Definition Designer in test environment
2. Understand Relationships: How weights, thresholds, and rules interact
3. Scenario-Based Thinking: Practice diagnosing issues and recommending solutions
4. Terminology Mastery: Know precise definitions of technical terms
5. Process Flows: Understand sequences (MLE → Rebuild → Review → Adjust → Iterate)
6. Tool Navigation: Be able to quickly locate specific settings in UI
7. Code Examples: Study normalization and rule examples, understand variables
8. Data Quality: Connect quality issues to appropriate remediation strategies
Critical Distinctions
- Normalization (standardizing format) vs. Agreement Function (comparing normalized values)
- Agreement Weight (positive, added) vs. Disagreement Weight (negative, subtracted)
- Review Threshold (minimum to consider) vs. Autolink Threshold (minimum to link automatically)
- MLE Calibration (calculates weights) vs. Threshold Adjustment (manual tuning)
- Trust Tiers (source reliability) vs. Aging Factors (override duration)
- Built-in Rules (predefined) vs. Custom Rules (onCreateClassifiedPair method)
- Link Keys (identify candidates) vs. Parameters (detailed comparison)
Final Preparation
Review all 10 KSAs systematically:
1. Collect information on demographic data
2. Appraise demographic data quality
3. Select normalization approaches
4. Modify parameters in Definition Designer
5. Review and revise thresholds
6. Review and revise weights using MLE
7. Evaluate tuning effectiveness
8. Recommend data quality corrective actions
9. Recommend Composite Record trust tiers and aging
10. Recommend changes to rules
Master these skills through practical application, scenario analysis, and thorough understanding of underlying concepts. The exam tests not just knowledge but ability to apply these concepts to real-world EMPI implementation challenges.