1. Decision Framework for Data Quality Issue Resolution
Key Points
- Evaluate correction location: source system vs EMPI vs intermediate system
- Consider permanence and scalability of solutions
- Assess impact on existing linked records
- Balance immediate fixes with long-term sustainability
- Prioritize patient safety and data integrity
Detailed Notes
Resolving a data quality issue starts with deciding where the correction belongs. The framework below weighs root cause, scope, urgency, available resources, and long-term maintenance to choose between a source system, EMPI, or intermediate system fix.
Correction Location Options
There are three primary locations where data quality corrections can be implemented:
Source System Corrections (Preferred for sustainable solutions):
- Modify the source system to prevent issues at the point of data entry
- Implement validation rules in registration systems
- Update source system workflows and user interfaces
- Provide training to data entry personnel
*Advantages*:
- Prevents problems at the source
- Improves data quality for all downstream systems (not just EMPI)
- Most sustainable long-term solution
- Reduces ongoing correction burden
*Disadvantages*:
- Requires coordination with facility IT teams
- May require source system vendor involvement
- Longer implementation timeline
- May not be possible for legacy or vendor-controlled systems
EMPI-Level Corrections (Most common for handling variation):
- Apply normalization functions to standardize data
- Use null value lists to ignore dummy values
- Implement exclusion conditions to filter problematic records
- Create custom preprocessing logic
*Advantages*:
- Centralized control within EMPI
- Can be implemented quickly by EMPI administrator
- Handles variations from multiple source systems consistently
- Does not require source system changes
*Disadvantages*:
- Does not improve source data quality
- Requires ongoing maintenance as new patterns emerge
- May mask underlying source system issues
- Other systems receiving source data still see poor quality
Intermediate System Corrections (For integration layer transformation):
- Modify HL7 transformations in interfaces
- Apply DTL (Data Transformation Language) corrections in HealthShare productions
- Use Edge Gateway or Hub transformations
- Implement custom business processes
*Advantages*:
- Can standardize data before reaching EMPI
- Useful when source system cannot be changed
- Can apply site-specific transformations
- Maintains audit trail of transformations
*Disadvantages*:
- Adds complexity to integration layer
- Must maintain transformation logic separately from EMPI configuration
- May require coordination with interface team
- Can make troubleshooting more difficult
Decision Criteria
When evaluating correction approaches, consider these factors:
Root Cause of Issue:
- Source System Limitation: If the source system doesn't validate input or allows problematic values, consider source system correction or EMPI normalization
- Data Entry Process: If users are entering dummy values or skipping fields, address through training and source system validation
- Interface Transformation Error: If data is correct in source but corrupted during transmission, fix the interface
- Legacy Data: If old data has issues but new data is clean, may need one-time correction rather than ongoing normalization
Scope and Impact:
- Affects All Facilities: Implement EMPI-level correction that applies universally
- Facility-Specific Issue: Consider source system correction or facility-specific normalization
- Single Field Problem: Normalization function or null value list may suffice
- Systemic Data Quality: Requires comprehensive approach involving source system improvements
Urgency and Risk:
- Patient Safety Risk (Overlays, MRN reuse): Implement immediate EMPI fix, then pursue source system correction
- Matching Accuracy Impact: EMPI normalization may be sufficient
- Reporting/Analytics Issue: May tolerate delay for source system fix
Resource Availability:
- EMPI Administrator Control: Can implement EMPI-level fixes quickly
- Requires Facility Cooperation: Source system fixes may take months
- Interface Team Involvement: HL7 transformation changes require coordination
- Vendor Dependency: May be limited to EMPI-level corrections if source is vendor system
Sustainability and Maintenance:
- One-Time Problem: Manual correction or temporary fix acceptable
- Ongoing Issue: Invest in permanent source system or EMPI configuration solution
- Expected to Worsen: Proactive source system correction essential
- Expected to Improve: Temporary EMPI fix may suffice
Common Issue Resolution Patterns
Issue: Dummy phone numbers (e.g., "999-999-9999")
- Best Approach: Add to EMPI null value list
- Rationale: Quick implementation, handles variation from multiple facilities, normalization is appropriate role for EMPI

Issue: MRN reuse at a facility
- Best Approach: Source system correction to prevent reuse, manual EMPI cleanup for existing cases
- Rationale: MRN reuse is serious patient safety issue that must be prevented at source; EMPI cannot prevent source from sending duplicate MRNs

Issue: Name suffix inconsistency (Jr., JR, Junior)
- Best Approach: EMPI normalization function to standardize suffixes
- Rationale: Name variation is expected, normalization is designed for this purpose

Issue: Interface sending wrong field for middle name
- Best Approach: Fix HL7 transformation in interface
- Rationale: Data is correct in source, interface is corrupting it; fix at integration layer

Issue: Records with "Baby Boy" or "Baby Girl" as given name
- Best Approach: Add to EMPI null value list, request source system validation to prompt for update
- Rationale: Temporary names are unavoidable for newborns, but should be treated as null for matching; source should prompt for update at follow-up visits

Issue: Veterinary patients in human patient data (identified by MRN prefix "VET")
- Best Approach: EMPI exclusion condition to filter these records
- Rationale: Source cannot change feed, EMPI should exclude non-human records from matching

Issue: SSN field contains medical record number
- Best Approach: Fix interface transformation, add validation to source system
- Rationale: Field mapping error in interface; also implement source validation to prevent future errors
---
2. EMPI Normalization Functions
Key Points
- List of values treated as null - ignore dummy/placeholder values
- MRN normalization - standardize format variations
- Exclusion conditions - filter unwanted records entirely
- Custom normalization functions - site-specific transformations
- Parameter-level vs record-level normalization
Detailed Notes
Normalization is the process of transforming data into a standardized, consistent format to improve matching accuracy. InterSystems EMPI provides several normalization mechanisms that can be configured in the Linkage Definition.
List of Values Treated as Null
The null value list specifies values that should be ignored for matching purposes. When a field contains a value from the null value list, it is treated as if the field were empty and does not contribute to the link weight.
Configuration Location: Linkage Definition Designer > Parameters tab > Select a parameter > "List of values to be treated as null" field
Format: Comma-separated list (no spaces unless spaces are significant characters)
- Values are NOT case-sensitive: "UNKNOWN" matches "unknown" matches "Unknown"
- Exact match required: "999-999-9999" will not match "9999999999"
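The two comparison rules above (case-insensitive, but otherwise exact) can be modeled with a short sketch. This is not the EMPI's internal implementation: the class and method names are hypothetical, and only the comparison semantics described in this section are reproduced.

```objectscript
/// Hypothetical helper that models null value list semantics
Class Custom.DQ.NullListSketch Extends %RegisteredObject
{

/// Returns 1 if value is empty or appears in the comma-separated nullList.
/// Comparison ignores case but is otherwise an exact match.
ClassMethod IsNullValue(value As %String, nullList As %String) As %Boolean
{
    if value="" { quit 1 }
    // Lowercase both sides so "UNKNOWN" and "Unknown" compare equal
    set list = $listfromstring($zconvert(nullList,"L"),",")
    quit ($listfind(list,$zconvert(value,"L"))>0)
}

}
```

With this model, `IsNullValue("unknown","Unknown,Test")` returns 1, while `IsNullValue("9999999999","999-999-9999")` returns 0 because the unformatted number is not an exact match. Every formatting variant that should be ignored therefore needs its own entry in the list, unless a normalization function strips the punctuation first.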
Common Null Value Lists:
*Social Security Number*:
```
000-00-0000,111-11-1111,123-45-6789,999-99-9999
```

*Phone Number*:
```
000-000-0000,999-999-9999,123-456-7890,555-555-5555
```

*Given Name / Family Name* (matching is not case-sensitive, so each value is listed once):
```
Unknown,Test,Baby,Baby Boy,Baby Girl
```

*Gender*:
```
U,Unknown,O
```

*Street Address*:
```
Unknown,123 Main St,PO Box 0,N/A
```
Important Considerations:
Interaction with Normalization Functions: If both a null value list and a normalization function are defined for a parameter, the normalization function runs first and can override the null value list behavior. To ensure the null value list is checked, add this code to the beginning of custom normalization functions:
```objectscript
if (%property="")||$data(%parameters("_MissingValueArray",$zcvt(%property,"l"))) quit ""
```
Impact on Matching: Null values are ignored in both directions:
- If Record A has null SSN and Record B has SSN "123-45-6789", SSN contributes 0 to link weight (neutral)
- If both Record A and Record B have null SSN, SSN still contributes 0 (not an agreement)
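The neutrality rule can be expressed as a small function. The following is a minimal sketch with a hypothetical class and method name, assuming that null-listed values have already been blanked before comparison; the weights passed in are illustrative, not product defaults.

```objectscript
/// Hypothetical model of how one parameter contributes to the link weight
Class Custom.DQ.WeightSketch Extends %RegisteredObject
{

/// A null value on either side is neutral: neither agreement nor disagreement.
ClassMethod Contribution(value1 As %String, value2 As %String, agrWeight As %Numeric, disWeight As %Numeric) As %Numeric
{
    if (value1="")||(value2="") { quit 0 }
    quit $select(value1=value2:agrWeight,1:disWeight)
}

}
```

Here `Contribution("","123-45-6789",15,-10)` and `Contribution("","",15,-10)` both return 0, which matches the two cases listed above: two empty values are not counted as an agreement.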
Use Cases:
- Eliminate dummy values entered when real value is unavailable
- Handle placeholder values that will be updated later
- Ignore test data that leaked into production
- Treat clearly invalid values as missing
MRN Normalization Function
The MRN Normalization Function standardizes medical record number formats before comparison and storage in the normalized table.
Default Behavior: The default MRN normalization:
- Converts to uppercase
- Removes hyphens, spaces, and other non-alphanumeric characters
- Preserves leading zeros
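As a rough model of the default behavior listed above, the sketch below uses only core ObjectScript functions. It is not the shipped implementation, and the class and method names are hypothetical.

```objectscript
/// Hypothetical model of the default MRN normalization described above
Class Custom.DQ.MRNSketch Extends %RegisteredObject
{

ClassMethod NormalizeDefault(mrn As %String) As %String
{
    // Uppercase first, then remove punctuation (P), whitespace (W), and
    // control characters (C) wherever they occur (*).
    // Digits are never removed, so leading zeros are preserved.
    quit $zstrip($zconvert(mrn,"U"),"*PWC")
}

}
```

For example, `NormalizeDefault("mc-00 123a")` returns `MC00123A`.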
Custom MRN Normalization: Create custom normalization for facility-specific requirements:
*Scenario 1 - Remove Prefix*: Facility adds facility code prefix "MC-" to all MRNs, but matching should ignore the prefix.
```objectscript
/// Custom MRN normalization to remove "MC-" prefix
ClassMethod NormalizeMRN(property As %String) As %String
{
    // Remove "MC-" prefix if present
    if $extract(property,1,3)="MC-" {
        set property = $extract(property,4,*)
    }

    // Apply default normalization (uppercase, remove non-alphanumeric)
    quit ##class(HSPI.LinkageType.Default).Normalize(property)
}
```
*Scenario 2 - Standardize Length*: Facility migrated from 6-digit to 8-digit MRNs by adding leading zeros, need to normalize old format to new.
```objectscript
/// Pad MRN to 8 digits with leading zeros
ClassMethod NormalizeMRN(property As %String) As %String
{
    // Remove non-numeric characters
    set numeric = ""
    for i=1:1:$length(property) {
        set char = $extract(property,i)
        if char?1N set numeric = numeric _ char
    }

    // An MRN with no digits stays empty; padding it would turn every
    // missing MRN into the same value "00000000"
    if numeric="" quit ""

    // Pad to 8 digits
    while $length(numeric)<8 {
        set numeric = "0" _ numeric
    }

    quit numeric
}
```
*Scenario 3 - Handle Multiple Formats*: Facility uses different MRN formats for different departments (numeric for inpatient, alphanumeric for outpatient).
```objectscript
/// Normalize various MRN formats
ClassMethod NormalizeMRN(property As %String) As %String
{
    // Convert to uppercase
    set property = $zcvt(property,"U")

    // Remove spaces and hyphens
    set property = $replace(property," ","")
    set property = $replace(property,"-","")

    // If purely numeric, pad to standard length
    if property?1.N {
        while $length(property)<8 {
            set property = "0" _ property
        }
    }

    quit property
}
```
Configuration: Linkage Definition Designer > Parameters tab > Select MRN parameter > Normalization Function field > Enter class method name (e.g., `##class(Custom.Package.Utils).NormalizeMRN`)
Exclusion Conditions
Exclusion conditions filter records entirely from the EMPI matching process. Records matching exclusion criteria are not stored in the normalized table and do not participate in linkage.
Configuration Location: Linkage Definition Designer > General tab > "Exclusion Condition" field
Syntax: ObjectScript boolean expression that evaluates to true (exclude record) or false (include record)
- Can reference patient record fields using %patient object
- Must return 1 (exclude) or 0 (include)
Common Exclusion Patterns:
*Exclude Veterinary Patients by MRN Prefix*:
```objectscript
// Exclude if MRN starts with "VET"
$extract(%patient.MRN,1,3)="VET"
```

*Exclude Test Patients*:
```objectscript
// Exclude if last name is "TEST" or "TESTPATIENT" (the comparison is case-sensitive)
(%patient.FamilyName="TEST")||(%patient.FamilyName="TESTPATIENT")
```

*Exclude Records Without Minimum Data*:
```objectscript
// Exclude if either name component is missing and DOB is also missing
((%patient.FamilyName="")||(%patient.GivenName=""))&&(%patient.BirthDate="")
```

*Exclude Specific Facility's Old Data*:
```objectscript
// Exclude facility ABC records before 2020
(%patient.AssigningAuthority="ABC")&&(%patient.CreatedOn<$zdatetimeh("2020-01-01",3))
```

*Exclude Deceased Patients (if source sends deceased indicator)*:
```objectscript
// Exclude if deceased indicator is "Y"
%patient.DeceasedIndicator="Y"
```

*Complex Exclusion - Multiple Conditions*:
```objectscript
// Exclude if:
//   1. Veterinary patient (MRN starts with VET), OR
//   2. Test patient (last name contains TEST), OR
//   3. Missing critical demographics (a name component is missing AND no DOB)
($extract(%patient.MRN,1,3)="VET")||(%patient.FamilyName["TEST")||(((%patient.FamilyName="")||(%patient.GivenName=""))&&(%patient.BirthDate=""))
```
Important Considerations:
Permanent Exclusion: Excluded records are not stored in normalized table and cannot be retrieved for matching. Changing exclusion condition later requires full linkage rebuild.
Search Impact: Excluded records do not appear in Patient Search or Worklist. They effectively do not exist from EMPI perspective.
Source Data Preservation: Original records remain in HSPI.Data.Patient table (source table) even when excluded. Exclusion only affects normalized table and matching.
Use Cases:
- Filter non-human patients from human patient matching
- Exclude test/training data from production matching
- Filter out records with insufficient data for meaningful matching
- Temporarily exclude problematic facility data during remediation
- Exclude deceased patients if business rules require
---
3. Custom Preprocessing and Advanced Normalization
Key Points
- Custom normalization functions for complex transformations
- Preprocessing hook for record-level modifications
- Name parsing and standardization logic
- Address parsing and geocoding integration
- Phone number format standardization
Detailed Notes
Beyond basic null value lists and default normalization, EMPI supports custom preprocessing and normalization functions for complex data quality scenarios.
Custom Normalization Functions
Normalization functions are ObjectScript class methods that transform individual field values. Each linkage parameter can have its own normalization function.
Function Signature:
```objectscript
ClassMethod NormalizeField(property As %String) As %String
{
    // Transformation logic
    quit transformedValue
}
```
Available Variables:
- `%property`: The field value to normalize
- `%patient`: The complete patient record object (provides access to other fields if needed)
- `%parameters`: Array of linkage definition parameters
Name Normalization Example - Handle Suffixes:
```objectscript
/// Standardize name suffixes
ClassMethod NormalizeName(property As %String) As %String
{
    // Convert to uppercase
    set property = $zcvt(property,"U")

    // Remove leading and trailing whitespace
    set property = $zstrip(property,"<>W")

    // Look only at the last space-delimited token and ignore periods,
    // so "JR." is not turned into "JR.." and a name such as
    // "SRINIVASAN" is not mistaken for a suffix
    set suffix = $translate($piece(property," ",*),".")
    set standard = $case(suffix,"JR":"JR.","JUNIOR":"JR.","SR":"SR.","SENIOR":"SR.","II":"2ND","III":"3RD",:"")

    // Replace the trailing token only when it is a recognized suffix
    if (standard'="")&&($length(property," ")>1) {
        set property = $piece(property," ",1,*-1)_" "_standard
    }

    quit property
}
```
Address Normalization Example - Standardize Street Types:
```objectscript
/// Standardize street address components
ClassMethod NormalizeAddress(property As %String) As %String
{
    // Convert to uppercase
    set property = $zcvt(property,"U")

    // Standardize street types (use official USPS abbreviations)
    set property = $replace(property," STREET"," ST")
    set property = $replace(property," AVENUE"," AVE")
    set property = $replace(property," ROAD"," RD")
    set property = $replace(property," BOULEVARD"," BLVD")
    set property = $replace(property," DRIVE"," DR")
    set property = $replace(property," LANE"," LN")
    set property = $replace(property," COURT"," CT")

    // Standardize directionals
    set property = $replace(property," NORTH "," N ")
    set property = $replace(property," SOUTH "," S ")
    set property = $replace(property," EAST "," E ")
    set property = $replace(property," WEST "," W ")

    // Remove apartment designators for matching (keeps base address)
    set property = $piece(property," APT ",1)
    set property = $piece(property," UNIT ",1)
    set property = $piece(property," #",1)

    quit property
}
```
Phone Number Normalization Example - Extract Digits Only:
```objectscript
/// Normalize phone to digits only
ClassMethod NormalizePhone(property As %String) As %String
{
    set digits = ""

    // Extract only numeric characters
    for i=1:1:$length(property) {
        set char = $extract(property,i)
        if char?1N set digits = digits _ char
    }

    // If starts with "1" (country code) and is 11 digits, remove leading 1
    if ($length(digits)=11)&&($extract(digits,1)="1") {
        set digits = $extract(digits,2,11)
    }

    // Must be exactly 10 digits to be valid
    if $length(digits)'=10 quit ""

    quit digits
}
```
Date Normalization Example - Handle Various Formats:
```objectscript
/// Normalize date from various input formats to ODBC format
ClassMethod NormalizeDate(property As %String) As %String
{
    // Try ODBC format (YYYY-MM-DD, $zdateh format 3). The final ""
    // argument makes $zdateh return "" instead of raising an error
    // when the input does not parse.
    set dateH = $zdateh(property,3,,,,,,,"")
    if dateH'="" quit $zdate(dateH,3)

    // Try MM/DD/YYYY format ($zdateh format 1)
    set dateH = $zdateh(property,1,,,,,,,"")
    if dateH'="" quit $zdate(dateH,3)

    // Could not parse, return empty
    quit ""
}
```
Preprocessing Hook for Record-Level Transformation
For transformations that require access to multiple fields or complex logic, implement the preprocessing callback method.
Method: Override `PreProcess()` method in custom linkage definition class
Use Cases:
- Derive fields from other fields
- Complex validation logic
- Conditional normalization based on multiple fields
- Data enrichment from external sources
Example - Swap Reversed Names:
```objectscript
/// Detect and fix a combined "LastName, FirstName" value in the family name field
Method PreProcess(pPatient As HSPI.Data.Patient) As %Status
{
    // If family name contains a comma, assume it holds "LastName, FirstName"
    if pPatient.FamilyName["," {
        // Parse "LastName, FirstName"
        set lastName = $piece(pPatient.FamilyName,",",1)
        set firstName = $piece(pPatient.FamilyName,",",2)

        // Remove leading and trailing whitespace
        set lastName = $zstrip(lastName,"<>W")
        set firstName = $zstrip(firstName,"<>W")

        // Put each part in its own field
        set pPatient.FamilyName = lastName
        set pPatient.GivenName = firstName
    }

    quit $$$OK
}
```
Example - Conditional MRN Handling:
```objectscript
/// Apply different MRN normalization based on facility
Method PreProcess(pPatient As HSPI.Data.Patient) As %Status
{
    // Memorial Hospital uses format "MH-123456"
    if pPatient.AssigningAuthority="MEMORIAL" {
        if $extract(pPatient.MRN,1,3)="MH-" {
            set pPatient.MRN = $extract(pPatient.MRN,4,*)
        }
    }

    // City General uses 8-digit format with leading zeros
    if pPatient.AssigningAuthority="CITYGEN" {
        while ($length(pPatient.MRN)<8)&&(pPatient.MRN?1.N) {
            set pPatient.MRN = "0" _ pPatient.MRN
        }
    }

    quit $$$OK
}
```
---
4. Source System Data Quality Improvements
Key Points
- Implement validation rules in source registration systems
- Provide staff training on data quality importance
- Design workflows that require complete demographics
- Monitor source data quality metrics
- Establish feedback loop from EMPI to source systems
Detailed Notes
The most sustainable data quality improvements address issues at their source. While EMPI normalization can handle variation, preventing problems from entering the data stream is preferable.
Source System Validation Rules
Implement validation logic in registration systems to prevent problematic data entry:
Required Field Validation:
- Make critical fields mandatory (Name, DOB, Gender minimum)
- Require SSN for inpatient registration (if policy allows)
- Require phone number or alternate contact method
- Prevent saving incomplete records
Format Validation:
- SSN must match format XXX-XX-XXXX and pass basic validation (not 000-00-0000, not 123-45-6789)
- Phone must be 10 digits with valid area code
- ZIP code must be 5 digits or 5+4 format
- DOB must be valid date and patient must be between 0 and 120 years old
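The format rules above translate directly into pattern checks. Registration systems are usually not written in ObjectScript, so the following is only a sketch of the logic in the notation used elsewhere in this document. The class and method names are hypothetical, the DOB check assumes ODBC (YYYY-MM-DD) input, and the age calculation is approximate.

```objectscript
/// Hypothetical format validators mirroring the rules listed above
Class Custom.DQ.FormatSketch Extends %RegisteredObject
{

/// SSN in XXX-XX-XXXX form and not one of the obvious dummy values
ClassMethod IsValidSSN(ssn As %String) As %Boolean
{
    if ssn'?3N1"-"2N1"-"4N { quit 0 }
    quit '$listfind($listbuild("000-00-0000","123-45-6789"),ssn)
}

/// ZIP code as 5 digits or 5+4
ClassMethod IsValidZip(zip As %String) As %Boolean
{
    quit (zip?5N)||(zip?5N1"-"4N)
}

/// DOB must parse and imply an age between 0 and 120 years
ClassMethod IsValidDOB(dob As %String) As %Boolean
{
    // The final "" argument makes $zdateh return "" on invalid input
    set dobH = $zdateh(dob,3,,,,,,,"")
    if dobH="" { quit 0 }
    set today = +$horolog
    // A date in the future is not a valid birth date
    if dobH>today { quit 0 }
    // Approximate age in whole years
    quit (((today-dobH)\365.25)<=120)
}

}
```

A production rule set would extend the SSN dummy list and add an area code check for phone numbers; those details depend on local policy.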
Value Validation:
- Names cannot be "Unknown", "Test", "Baby" (require temporary ID process for newborns)
- Phone cannot be "999-999-9999" or other known dummy values
- Gender must be from defined value set
- Address cannot be PO Box for certain registration types
Uniqueness Validation:
- Check for existing patient with same demographics before creating new record
- Warn when SSN already exists in system for different patient name
- MRN uniqueness validation - Critical: Prevent MRN reuse by checking that MRN is not already assigned to a different patient
User Interface Design
Design registration screens to encourage complete, accurate data entry:
Smart Defaults:
- Use appropriate defaults that are not misleading (e.g., leave blank rather than default to "Unknown")
- Remember previous values where appropriate (same city/state for local facility)
Helpful Prompts:
- If phone number is missing, prompt "Patient declined to provide" checkbox
- If SSN is missing, prompt reason (Patient doesn't know, Patient declined, etc.)
- For newborns, provide specific "Temporary - Update Required" workflow
Verification Steps:
- Display complete patient summary before saving
- Require confirmation for suspicious patterns (e.g., "Patient is 150 years old - verify DOB")
- Highlight incomplete or questionable fields
Duplicate Detection:
- Search for existing patients during registration
- Display possible matches based on name, DOB
- Require user to confirm "create new patient" vs "use existing patient"
Staff Training and Process Improvement
Data quality requires organizational commitment:
Training Programs:
- Explain why data quality matters (patient safety, correct medical records)
- Show examples of how poor data quality causes problems
- Demonstrate correct data entry techniques
- Provide reference guide for handling special cases (newborns, patients without SSN, etc.)
Data Quality Metrics:
- Share facility-specific data quality scorecards with registration staff
- Recognize high-performing staff/departments
- Provide feedback when patterns are detected (e.g., "ER registration has 40% missing phone numbers - please emphasize collection")
Process Changes:
- Implement "update patient demographics" step at every visit
- Create workflow to update temporary newborn names at first follow-up visit
- Establish process to verify and update address when patient moves
- Define escalation process for complex cases (patient refuses to provide name, homeless patient, etc.)
Feedback Loop from EMPI
Establish communication channel to report source data quality issues back to facilities:
Regular Reports:
- Monthly data quality scorecard to each facility
- Highlight specific issues (dummy phone numbers, missing SSN, etc.)
- Track trends (improving, stable, degrading)
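A scorecard entry such as "percentage of unusable phone numbers" reduces to a simple count. The sketch below is hypothetical: the class name, method name, and the idea of passing values as a `$list` are assumptions for illustration. A real report would more likely query the patient table directly.

```objectscript
/// Hypothetical scorecard metric: share of values that are empty or dummies
Class Custom.DQ.ScorecardSketch Extends %RegisteredObject
{

/// values and dummies are $list structures; returns a percentage (0-100)
ClassMethod PercentUnusable(values As %List, dummies As %List) As %Numeric
{
    set total = $listlength(values)
    if total=0 { quit 0 }
    set bad = 0
    for i=1:1:total {
        set value = $list(values,i)
        if (value="")||($listfind(dummies,value)>0) { set bad = bad + 1 }
    }
    quit (bad/total)*100
}

}
```

Given five phone values of which two are empty and one is "999-999-9999", with "999-999-9999" as the only dummy, the method returns 60. Comparing this figure month over month shows whether a facility is improving, stable, or degrading.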
Incident Reporting:
- Alert facility when serious issue is detected (MRN reuse, overlay, etc.)
- Provide specific record examples (de-identified if necessary)
- Request investigation and correction at source
Collaboration:
- Regular meetings with facility data quality coordinators
- Discuss patterns and root causes
- Share best practices across facilities
- Coordinate on system changes and upgrades
---
5. Interface and Transformation Layer Corrections
Key Points
- HL7 transformations standardize data during integration
- DTL (Data Transformation Language) enables field-level corrections
- Fix field mapping errors in interface layer
- Handle facility-specific data formats
- Maintain transformation logic separately from EMPI config
Detailed Notes
When data is correct in the source system but arrives incorrectly in EMPI, the problem likely exists in the interface or transformation layer. HL7 transformations and DTL provide mechanisms to correct data during integration.
HL7 Transformation Corrections
HL7 interfaces transmit patient demographic data from source systems to EMPI. Transformation errors can corrupt data during this process.
Common HL7 Transformation Issues:
*Field Mapping Errors*:
- Middle name mapped to suffix field
- Home phone mapped to business phone
- Address line 2 (apartment) mapped to address line 1 (street)
- SSN field contains driver's license number
*Data Type Conversion Errors*:
- Date formats not converted correctly (MM/DD/YYYY vs YYYY-MM-DD)
- Numeric fields contain non-numeric characters
- String fields truncated due to length limits
*Character Encoding Issues*:
- Special characters corrupted (O'Brien becomes OBrien)
- Accented characters lost (José becomes Jose)
- Non-ASCII characters replaced with question marks
Correcting HL7 Transformations:
1. Identify the specific HL7 segment and field causing the issue
2. Locate the transformation in the HealthShare production (typically a DTL or custom transformation class)
3. Modify the transformation to correct the mapping or conversion
4. Test with sample HL7 messages
5. Deploy and monitor
Example - Fix Field Mapping:
```
Problem: PID-5 (Patient Name) middle name is being mapped to suffix

HL7 Segment:
PID|1||MRN12345||SMITH^JOHN^ALLEN^JR^||

Incorrect Transformation:
  FamilyName = PID-5.1 (SMITH) ✓
  GivenName  = PID-5.2 (JOHN)  ✓
  MiddleName = PID-5.4 (JR)    ✗ WRONG - this is suffix
  Suffix     = PID-5.3 (ALLEN) ✗ WRONG - this is middle name

Corrected Transformation:
  FamilyName = PID-5.1 (SMITH)
  GivenName  = PID-5.2 (JOHN)
  MiddleName = PID-5.3 (ALLEN) ✓
  Suffix     = PID-5.4 (JR)    ✓
```
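When testing a corrected mapping, it can help to check the expected components against a raw sample segment. The sketch below does this with `$piece` on the segment text. The class and method names are hypothetical, and a production transformation would normally use the HL7 virtual document paths in a DTL rather than string parsing.

```objectscript
/// Hypothetical helper for checking PID-5 components in a raw segment
Class Custom.DQ.HL7Sketch Extends %RegisteredObject
{

/// Fills the name array (passed by reference) using the corrected mapping
ClassMethod ParsePatientName(pidSegment As %String, Output name) As %Status
{
    kill name
    // Piece 1 is the segment ID "PID", so PID-5 is the sixth "|"-delimited piece
    set pid5 = $piece(pidSegment,"|",6)
    set name("Family") = $piece(pid5,"^",1)
    set name("Given") = $piece(pid5,"^",2)
    set name("Middle") = $piece(pid5,"^",3)
    set name("Suffix") = $piece(pid5,"^",4)
    quit $$$OK
}

}
```

Calling `do ##class(Custom.DQ.HL7Sketch).ParsePatientName("PID|1||MRN12345||SMITH^JOHN^ALLEN^JR^||",.name)` leaves `name("Middle")` set to `ALLEN` and `name("Suffix")` set to `JR`.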
DTL (Data Transformation Language) Corrections
DTL is InterSystems' graphical transformation language used in HealthShare productions to transform messages.
DTL Correction Scenarios:
*Conditional Field Mapping*:
```
If source system sends "M" for Male, "F" for Female, but also "MALE", "FEMALE" (inconsistent):

DTL Logic:
  If source.Gender = "MALE" Then target.Gender = "M"
  If source.Gender = "FEMALE" Then target.Gender = "F"
  Otherwise target.Gender = source.Gender
```

*Data Cleansing*:
```
Remove leading/trailing whitespace from name fields:

DTL Action:
  target.GivenName = $ZSTRIP(source.GivenName,"<>W")
  target.FamilyName = $ZSTRIP(source.FamilyName,"<>W")
```

*Default Value Replacement*:
```
Replace placeholder SSN with null:

DTL Logic:
  If source.SSN = "000-00-0000" Then target.SSN = ""
  Otherwise target.SSN = source.SSN
```

*Complex Transformation*:
```
Parse combined address field into separate components:

DTL Action (calls custom ObjectScript method):
  do ##class(Custom.Utils).ParseAddress(source.Address, .street, .city, .state, .zip)
  target.StreetLine = street
  target.City = city
  target.State = state
  target.PostalCode = zip
```
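The `ParseAddress` method called above is not defined in these notes. One possible shape is sketched below, assuming the combined field always looks like "street, city, ST zip". That input format is an assumption, and real address data usually needs more defensive parsing.

```objectscript
/// Hypothetical implementation of the custom method referenced above
Class Custom.Utils Extends %RegisteredObject
{

/// Splits "street, city, ST zip" into its four components
ClassMethod ParseAddress(address As %String, Output street As %String, Output city As %String, Output state As %String, Output zip As %String)
{
    set street = $zstrip($piece(address,",",1),"<>W")
    set city = $zstrip($piece(address,",",2),"<>W")

    // The third comma-delimited piece holds the state and ZIP code
    set stateZip = $zstrip($piece(address,",",3),"<>W")
    set state = $piece(stateZip," ",1)
    // Everything after the first space, trimmed, is treated as the ZIP code
    set zip = $zstrip($piece(stateZip," ",2,*),"<>W")
}

}
```

For "123 Oak Ave, Springfield, IL 62704" this yields street `123 Oak Ave`, city `Springfield`, state `IL`, and zip `62704`. Inputs that do not follow the assumed format produce empty components rather than an error.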
Custom Business Process Corrections
For complex scenarios, create custom business process classes in the HealthShare production:
Use Cases:
- Lookup/enrichment from external data sources
- Complex validation requiring multiple steps
- Conditional routing based on data quality
- Quarantine of problematic records for manual review
Example - Data Quality Quarantine Process:
```objectscript
/// Route messages with poor data quality to quarantine
Method OnRequest(pRequest As HS.Message.Patient, Output pResponse As Ens.Response) As %Status
{
    // Check data quality
    set qualityScore = ..CalculateQualityScore(pRequest)

    if qualityScore < 50 {
        // Poor quality - send to quarantine queue (no response required)
        quit ..SendRequestAsync("Data Quality Review Queue", pRequest, 0)
    }

    // Good quality - proceed to EMPI
    quit ..SendRequestSync("EMPI Input Service", pRequest, .pResponse)
}

Method CalculateQualityScore(pPatient As HS.Message.Patient) As %Integer
{
    set score = 100

    // Deduct points for missing critical fields
    if pPatient.Name.Family="" set score = score - 20
    if pPatient.Name.Given="" set score = score - 20
    if pPatient.SSN="" set score = score - 10
    if pPatient.BirthDate="" set score = score - 15
    if pPatient.Address.Street="" set score = score - 10
    if pPatient.Telecom.Phone="" set score = score - 10

    // Deduct points for known dummy values
    if pPatient.SSN="000-00-0000" set score = score - 15
    if pPatient.Telecom.Phone="999-999-9999" set score = score - 10

    quit score
}
```
---
6. System Component Modifications for Recurring Issues
Key Points
- Modify linkage definition parameters for data characteristics
- Adjust agreement/disagreement weights based on observed patterns
- Create custom agreement functions for special cases
- Implement data domains to handle facility-specific rules
- Design custom rules for automated exception handling
Detailed Notes
When data quality issues recur despite normalization and transformation efforts, consider modifying EMPI system components to handle the patterns systematically.
Linkage Definition Parameter Modifications
Adjust linkage parameters to better accommodate data quality characteristics:
Agreement Weight Adjustments:
If a parameter has poor data quality, reduce its weight so it contributes less to matching:
```
SSN has 30% missing data and 10% dummy values:
- Reduce agreement weight from 15 to 8
- Rationale: SSN is less reliable in this dataset, should not dominate matching decision
```
If a parameter has excellent data quality, increase its weight:
```
MRN is highly reliable (unique, no reuse, always populated):
- Increase agreement weight from 5 to 12
- Rationale: Can trust MRN matches more strongly
```
Disagreement Weight Adjustments:
When a field frequently has innocent variations, reduce disagreement penalty:
```
Address has high variation due to formatting differences:
- Reduce disagreement weight from -8 to -3
- Rationale: Address disagreement may be due to formatting, not actual different patients
```
When a field should be deterministic, increase disagreement penalty:
```
SSN should be unique per patient - disagreement is very suspicious:
- Increase disagreement weight from -10 to -25
- Rationale: Different SSNs strongly indicates different patients
```
Custom Agreement Functions
Create custom agreement functions for parameters with special matching requirements:
Partial Match Credit:
Standard agreement functions return full agreement weight for exact match, 0 for mismatch. Custom functions can return partial credit:
```objectscript
/// Phone agreement with partial credit for same area code
ClassMethod PhoneAgreement(value1 As %String, value2 As %String, agrWeight As %Numeric, disWeight As %Numeric) As %Numeric
{
    // Exact match - full agreement weight
    if value1=value2 quit agrWeight

    // If both have same area code (first 3 digits), partial credit
    if $extract(value1,1,3)=$extract(value2,1,3) {
        quit agrWeight * 0.3  // 30% of agreement weight
    }

    // Different area codes - disagreement weight
    quit disWeight
}
```
Nickname Matching:
```objectscript
/// Name agreement with nickname recognition
ClassMethod NameAgreement(value1 As %String, value2 As %String, agrWeight As %Numeric, disWeight As %Numeric) As %Numeric
{
    // Compare case-insensitively throughout
    set name1 = $zcvt(value1,"U")
    set name2 = $zcvt(value2,"U")

    // Exact match
    if name1=name2 quit agrWeight

    // Check nickname table
    if ##class(Custom.NicknameTable).AreNicknames(name1, name2) {
        quit agrWeight * 0.8  // 80% credit for nickname match
    }

    // Check first initial match. This is a separate check because a
    // nickname can start with a different letter (Bob vs Robert).
    if $extract(name1,1)=$extract(name2,1) {
        quit agrWeight * 0.2  // 20% credit for same first initial
    }

    // Different names, but don't penalize (could be middle name vs first
    // name scenario), so disWeight is intentionally not applied
    quit 0
}
```
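The `Custom.NicknameTable` class used above is not defined in these notes. A minimal sketch of one possible implementation follows. The canonical-name approach and the handful of entries are assumptions for illustration; a production table would be much larger and would normally live in a persistent class or lookup table rather than in code.

```objectscript
/// Hypothetical nickname lookup used by the agreement function above
Class Custom.NicknameTable Extends %RegisteredObject
{

/// Returns 1 if both names resolve to the same canonical given name
ClassMethod AreNicknames(name1 As %String, name2 As %String) As %Boolean
{
    set canon1 = ..Canonical(name1)
    set canon2 = ..Canonical(name2)
    quit (canon1'="")&&(canon1=canon2)
}

/// Maps a nickname to its canonical form; unknown names map to themselves
ClassMethod Canonical(name As %String) As %String
{
    set name = $zconvert($zstrip(name,"<>W"),"U")

    // Illustrative entries only
    set map("BOB") = "ROBERT"
    set map("ROB") = "ROBERT"
    set map("BILL") = "WILLIAM"
    set map("WILL") = "WILLIAM"
    set map("LIZ") = "ELIZABETH"
    set map("BETH") = "ELIZABETH"

    quit $get(map(name),name)
}

}
```

With these entries, `AreNicknames("Bob","Robert")` returns 1 and `AreNicknames("Bob","William")` returns 0.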
Data Domain Configuration
Data domains allow facility-specific rules and constraints:
Purpose: Prevent within-facility duplicates while allowing cross-facility matches
Configuration: Linkage Definition Designer > General tab > Domain field
Implementation:
```objectscript
/// Define data domain by facility
Method getDataDomain(pPatient As HSPI.Data.Patient) As %String
{
    // Use facility as domain
    quit pPatient.AssigningAuthority
}
```
Impact:
- Two records from same facility (same domain) with high similarity → Duplicate category in Worklist
- Two records from different facilities with high similarity → Linked normally
- Helps identify MRN reuse and within-facility registration errors
Custom Rules for Automated Handling
Create custom linkage rules to automatically handle specific data quality patterns:
Rule to Auto-Link Based on Unique Identifier:
```
Rule: If SSN matches AND SSN is not in dummy value list, force link

Implementation:
- Check that SSN field is populated and not null
- Check that SSN is not in known dummy list
- If criteria met, create link with LinkReason="Rule" and SecondaryReason="SSN Match"
```
Rule to Auto-Unlink Based on Conflicting Data:
```
Rule: If Gender codes are different, force unlink

Implementation:
- Check that both records have populated Gender
- Check that Gender codes are different (not just one null)
- If criteria met, create non-link with LinkReason="Rule" and SecondaryReason="Gender Conflict"
```
Rule to Flag for Review:
```
Rule: If link weight is high but Address strongly disagrees, flag for manual review

Implementation:
- Check that link weight > 40 (high similarity)
- Check that Address disagreement weight is substantial
- Add record pair to Worklist with CustomStatus="Address Conflict Review"
```
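The three rules can be read as one ordered decision. The sketch below is hypothetical and does not use the EMPI rule API: the class, the method name, the return codes, the order in which the rules are checked, and the -5 address threshold are all assumptions for illustration. Only the link weight threshold of 40 comes from the rule description above.

```objectscript
/// Hypothetical model of the three custom rules as a single decision
Class Custom.DQ.RuleSketch Extends %RegisteredObject
{

/// Returns "NONLINK", "LINK", "REVIEW", or "" when no rule applies.
/// Inputs are assumed to be normalized, with null-listed values already blanked.
ClassMethod Evaluate(ssn1 As %String, ssn2 As %String, gender1 As %String, gender2 As %String, linkWeight As %Numeric, addressWeight As %Numeric) As %String
{
    // Gender conflict: both populated and different (assumed to take priority)
    if (gender1'="")&&(gender2'="")&&(gender1'=gender2) { quit "NONLINK" }

    // SSN match: both populated and equal
    if (ssn1'="")&&(ssn1=ssn2) { quit "LINK" }

    // High overall similarity but substantial address disagreement
    if (linkWeight>40)&&(addressWeight<=-5) { quit "REVIEW" }

    quit ""
}

}
```

Whatever ordering is chosen, it should be explicit: a pair with matching SSNs and conflicting gender codes satisfies two rules at once, and the outcome should not depend on evaluation order by accident.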
---