T1.3: Plans and Executes Data Load - InterSystems Enterprise Master Patient Index Technical Specialist Study Guide

1. Understanding Data Loading Options in InterSystems EMPI

Key Points

InterSystems EMPI supports multiple data ingestion methods
Batch Import utility for initial CSV/flat file loads
HL7 TCP/IP services for ongoing real-time data feeds
SDA (Summary Document Architecture) format support
Choice depends on source system capabilities and timing requirements

Detailed Notes

InterSystems EMPI (Enterprise Master Patient Index), formerly known as HealthShare Patient Index (HSPI), provides flexible data loading mechanisms to accommodate different deployment scenarios and data source requirements. The system supports both batch loading for initial data migration and real-time streaming for ongoing operations.

Data Loading Approaches

The primary data loading methods available in InterSystems EMPI are:

Batch Import Utility: Designed for loading large volumes of patient records from CSV or delimited flat files. This method is particularly useful when migrating data from legacy systems or other master patient index solutions that can export patient demographics in tabular format. The batch import utility is the recommended approach for initial data loads.

HL7 TCP/IP Services: For ongoing data feeds, InterSystems EMPI can receive HL7 messages over TCP/IP connections. This real-time approach is standard for healthcare interoperability and is typically used after the initial data load is complete. HL7 ADT (Admission, Discharge, Transfer) messages are the most common format for patient demographic updates.

SDA Format: The Summary Document Architecture (SDA) format is another option for loading patient data. While SDA is more commonly associated with clinical documents, it can also carry patient demographic information. Organizations can drop SDA files into the SDAIn directory for processing.

Selecting the Initial Data Loading Approach

When planning the initial data load for a new InterSystems EMPI deployment, technical specialists must consider several factors:

Data Source Format: If the source system can export data as CSV or delimited text files, the Batch Import utility is the most efficient choice. If the source system can only send HL7 messages, then HL7 TCP/IP services should be configured from the beginning.

Volume and Timing: For large initial loads (hundreds of thousands or millions of patient records), batch import is significantly faster than processing individual HL7 messages. The batch utility can process records in bulk, making it ideal for one-time migration scenarios.

Source System Availability: Some organizations prefer batch loads because they can be performed during maintenance windows without requiring continuous connectivity to the source system. HL7 feeds require the source system to be online and capable of sending messages.

Configuration Readiness: Before loading large volumes of data, run the Configuration Check utility to confirm that the system is properly configured for production use. InterSystems recommends running this check before any batch load operation, because configuration errors are much easier to correct before thousands or millions of records have been loaded.
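
The selection logic above can be summarized as a small decision helper. This is only an illustrative sketch in Python: the function name, its parameters, and the returned labels are hypothetical and are not part of InterSystems EMPI.

```python
def choose_initial_load_method(can_export_delimited: bool, can_send_hl7: bool) -> str:
    """Suggest an initial-load method from the source system's capabilities."""
    if can_export_delimited:
        # A CSV/delimited export is available, so batch import is the
        # recommended path for the initial load.
        return "batch_import"
    if can_send_hl7:
        # The source can only send HL7, so the TCP/IP service is
        # configured from the beginning.
        return "hl7_tcp"
    # Neither applies; SDA files or another export need to be evaluated.
    return "review_other_options"


print(choose_initial_load_method(can_export_delimited=True, can_send_hl7=True))
# prints batch_import
```

Volume, timing, and source system availability still need to be weighed by the project team; the helper only captures the format-driven part of the decision.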

---

Documentation References

InterSystems Documentation

2. Configuring the Batch Import Utility

Key Points

Batch Import utility uses delimited file format
Configuration through Production settings in Edge Gateway
Requires Facility Registry setup (except standalone deployments)
File Service component processes dropped files automatically
Log files track processing success and errors

Detailed Notes

The Batch Import utility in InterSystems EMPI provides a straightforward mechanism for loading patient records from delimited files. This section covers the configuration steps required to enable batch data loading.

Prerequisites for Batch Import

Before configuring the Batch Import utility, ensure that all facilities providing data are properly registered in the system. For deployments that include the HealthShare Unified Care Record Registry:

1. Navigate to Registry Management > Facility Registry in the Management Portal
2. For each facility, specify:

Facility Code: A unique identifier for the facility
Name: A descriptive name for the facility
Edge Gateway: The gateway that will accept patient records from this facility

3. Save the facility configuration

Note: For standalone InterSystems EMPI deployments (deployment model 4), the Facility Registry step can be skipped as there is no Registry component.

Configuring the File Service Component

The batch import functionality is provided by a file service component that monitors a designated directory for incoming files. To configure this component:

1. In the Management Portal, switch to the namespace of the Edge Gateway
2. Navigate to Interoperability > Configure > Production
3. Select the plus sign (+) next to Services to add a new service
4. From the Service Class dropdown, select HS.MPI.Dataload.Delimited.FileService
5. Click OK to create the component

Once the component is created, configure its settings:

Enabled: Select this checkbox to activate the file service.

File Path: Enter the full directory path where data files will be dropped. This directory will be monitored continuously for new files. You can use the magnifying glass icon to browse for the path. Example: `/data/empi/batch-import/`

Target Config Names: For deployments with Registry, select HUB from the dropdown. This routes the patient records through the Hub component for processing. For standalone InterSystems EMPI, keep the default value of HS.Hub.MPI.Manager.

LogFileSpec: Specify the full path and filename for the processing log. This log captures details about each file processed, including record counts, errors, and processing time. Example: `/data/empi/logs/batch-import.log`

After configuring these settings, click Apply to save the changes. The production must be running for the file service to monitor the directory and process files.

Batch Load File Specification

The Batch Import utility expects files in a specific delimited format. Consult the Batch Load File Specification appendix in the InterSystems EMPI Configuration Guide for details on:

Required and optional fields
Field delimiters and escape characters
Date and time formats
Facility and assigning authority codes
Medical record number (MRN) formatting

The file specification defines how demographic data elements (name, date of birth, gender, address, identifiers, etc.) must be structured in the input file.

---


3. Testing the Initial Data Load

Key Points

Start with small test file before full load
Monitor log file for errors and processing status
Verify records appear in Person Index
Check facility and assigning authority assignments
Use Data Quality tools to assess loaded data

Detailed Notes

After configuring the Batch Import utility, it is essential to test the configuration with a small sample file before attempting to load the full dataset. This testing phase helps identify and resolve configuration issues, data format problems, and facility mapping errors.

Creating a Test File

Prepare a test file containing 10-50 patient records that represent the variety of data in your full dataset. Include records with:

Different name formats (including suffixes, prefixes, multiple middle names)
Various date formats to confirm correct parsing
Multiple facilities if loading multi-facility data
Different identifier types (MRN, SSN, driver's license, etc.)
International addresses if applicable
Records with minimal data to test required field validation
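
One way to assemble such a file is to pull a few records per facility from the full export. The sketch below is a minimal Python example under stated assumptions: the pipe delimiter and the column names (`facility`, `mrn`, `last_name`) are illustrative only, and the real layout must follow the Batch Load File Specification.

```python
import csv
import io
from collections import defaultdict


def pick_test_sample(rows, facility_field="facility", per_facility=5):
    """Keep up to per_facility rows for every facility code, in file order."""
    kept = []
    counts = defaultdict(int)
    for row in rows:
        code = row[facility_field]
        if counts[code] < per_facility:
            counts[code] += 1
            kept.append(row)
    return kept


# Tiny in-memory export; the column names are hypothetical.
EXPORT = """facility|mrn|last_name
CGH|1001|Rivera
CGH|1002|Chen
CGH|1003|Okafor
METRO|2001|Smith
"""

rows = list(csv.DictReader(io.StringIO(EXPORT), delimiter="|"))
sample = pick_test_sample(rows, per_facility=2)
print([r["mrn"] for r in sample])  # prints ['1001', '1002', '2001']
```

A sample chosen this way covers every facility, but edge cases such as unusual name formats or minimal records usually still have to be added by hand.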

Executing the Test Load

To execute a test load:

1. Ensure the production is running and the File Service component is enabled
2. Copy the test file to the configured File Path directory
3. The file should be picked up automatically within seconds
4. Monitor the LogFileSpec file for processing messages

The log file will show:

File detection and processing start time
Number of records processed
Any validation errors or warnings
Processing completion time and status
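
The pickup step can be checked from a script. The following Python sketch assumes that the file service removes or moves the file out of the drop directory once it has been picked up (consistent with the read/delete permission requirement noted under Common Issues During Testing); the function name and defaults are hypothetical.

```python
import os
import time


def wait_for_pickup(path, timeout_s=60.0, poll_s=1.0):
    """Return True once path no longer exists, False if timeout_s elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if not os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

For example, after copying `test.txt` into the File Path directory, `wait_for_pickup("/data/empi/batch-import/test.txt")` returning False after a minute suggests working through the "File Not Processed" checks. A True result only shows that the file left the directory; the log file remains the place to confirm record counts and errors.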

Verifying Loaded Data

After the test file is processed, verify that the data was loaded correctly:

Search for Test Patients: Use the Person Index search functionality to locate patients from your test file. Search by name, date of birth, or medical record number to confirm records are accessible.

Review Facility Assignments: Verify that each record is associated with the correct facility code. The Facility Registry must contain entries that match the facility codes in your batch file.

Check Assigning Authorities: Confirm that medical record numbers and other identifiers are assigned to the correct assigning authority. Assigning authorities must be registered in the system before loading data.

Inspect Data Quality: Navigate to the Data Quality Manager to assess the quality of the loaded demographic data. Look for:

Missing required fields
Invalid date formats
Suspicious patterns (all records from same date, sequential MRNs, etc.)

Common Issues During Testing

File Not Processed: If the file remains in the directory without being processed, check:

Production is running
File Service component is enabled
File path is correct and accessible
File permissions allow read/delete access
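
The path and permission checks can be scripted. This is a minimal Python sketch, not an InterSystems tool; it is only meaningful when run as the operating system account that the InterSystems instance uses, since permissions are evaluated for the current user.

```python
import os


def check_drop_directory(path):
    """Return a list of problems found; an empty list means the basic checks passed."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    problems = []
    if not os.access(path, os.R_OK | os.X_OK):
        problems.append(f"{path} cannot be listed or entered by this user")
    if not os.access(path, os.W_OK):
        # Removing or moving a file needs write permission on its directory.
        problems.append(f"{path} is not writable, so processed files cannot be removed")
    return problems
```

Calling `check_drop_directory("/data/empi/batch-import/")` on the server returns an empty list when the directory exists and is readable and writable; production state and the Enabled setting still have to be checked in the Management Portal.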

Records Rejected: If records are rejected during processing, common causes include:

Missing required fields (name, date of birth, facility code)
Invalid facility code not in Facility Registry
Invalid assigning authority code
Date format mismatches
Delimiter or escape character errors in file format
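
Several of these causes can be caught before the file is dropped. The Python sketch below pre-checks a single record; the column names, the required-field list, and the `YYYY-MM-DD` date format are assumptions made for illustration, and the authoritative rules are those in the Batch Load File Specification.

```python
from datetime import datetime

# Hypothetical column names and date format; align them with the file specification.
REQUIRED_FIELDS = ("last_name", "date_of_birth", "facility")
DATE_FORMAT = "%Y-%m-%d"


def find_record_problems(record, known_facilities, known_authorities):
    """Return a list of reasons this record would likely be rejected."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not (record.get(field) or "").strip():
            problems.append(f"missing {field}")
    facility = record.get("facility") or ""
    if facility and facility not in known_facilities:
        # Exact, case-sensitive comparison, matching the registry behavior.
        problems.append(f"unknown facility code {facility!r}")
    authority = record.get("assigning_authority") or ""
    if authority and authority not in known_authorities:
        problems.append(f"unknown assigning authority {authority!r}")
    dob = record.get("date_of_birth") or ""
    if dob:
        try:
            datetime.strptime(dob, DATE_FORMAT)
        except ValueError:
            problems.append(f"date_of_birth {dob!r} does not match {DATE_FORMAT}")
    return problems


record = {
    "last_name": "Rivera",
    "date_of_birth": "03/14/1980",
    "facility": "cgh",
    "assigning_authority": "CGH",
}
for problem in find_record_problems(record, {"CGH"}, {"CGH"}):
    print(problem)
# prints:
# unknown facility code 'cgh'
# date_of_birth '03/14/1980' does not match %Y-%m-%d
```

Running a check like this over the whole export before the load gives a list of records to correct at the source, rather than discovering them one at a time in the processing log.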

Incorrect Facility or Authority: If records load but have wrong facility or authority assignments:

Verify Facility Registry contains correct mappings
Check that facility codes in file exactly match registry (case-sensitive)
Confirm assigning authority codes are registered in system

---


4. Configuring Ongoing HL7 Data Feeds

Key Points

HL7 ADT messages for patient demographic updates
TCP/IP service configured in Production
Standard HL7 v2.x ADT trigger events (A01, A04, A08, A31, etc.)
Inbound service listens on configured port
Message acknowledgment (ACK) confirms processing

Detailed Notes

Once the initial data load is complete, most InterSystems EMPI deployments transition to ongoing data feeds using HL7 messages. HL7 (Health Level 7) is the healthcare industry standard for exchanging patient demographic and clinical information.

Understanding HL7 ADT Messages

ADT (Admission, Discharge, Transfer) messages are the HL7 message type used for patient demographic data. Each ADT message carries a trigger event code that identifies what happened. Common ADT trigger events processed by InterSystems EMPI include:

A01 (Admit/Visit Notification): Sent when a patient is admitted or registers for a visit. Contains demographic information and may create a new patient record if one doesn't exist.

A04 (Register a Patient): Similar to A01 but specifically for patient registration without admission.

A08 (Update Patient Information): Sent when patient demographic data changes (name change, address update, etc.). Updates existing patient records.

A31 (Update Person Information): Another update message type that may contain demographic changes.

A40 (Merge Patient): Indicates that two patient records should be merged because they represent the same person.
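
In an HL7 v2 message the trigger event travels in field MSH-9 (for example `ADT^A08`). The Python sketch below shows how that field can be read from a raw message; the sample message, including its application and facility names, is invented for illustration.

```python
def trigger_event(message):
    """Return (message_code, trigger_event) taken from MSH-9 of an HL7 v2 message."""
    msh = message.split("\r", 1)[0]  # segments are separated by carriage returns
    if not msh.startswith("MSH") or len(msh) < 8:
        raise ValueError("message does not start with a usable MSH segment")
    field_sep = msh[3]                # MSH-1 is the field separator itself
    fields = msh.split(field_sep)
    component_sep = fields[1][0]      # first encoding character in MSH-2
    # Because MSH-1 is the separator, MSH-9 lands at index 8 after splitting.
    parts = fields[8].split(component_sep)
    code = parts[0]
    event = parts[1] if len(parts) > 1 else ""
    return code, event


# Invented sample message; names and identifiers are placeholders.
SAMPLE = "\r".join([
    "MSH|^~\\&|SENDAPP|CGH|EMPI|HUB|20240101120000||ADT^A08|MSG0001|P|2.5",
    "EVN|A08|20240101120000",
    "PID|1||1001^^^CGH^MR||RIVERA^ANA||19800314|F",
])

print(trigger_event(SAMPLE))  # prints ('ADT', 'A08')
```

A helper like this is handy when reviewing captured test messages to confirm that the source system sends the expected mix of A01, A04, A08, A31, and A40 events.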

Configuring HL7 TCP/IP Service

To enable HL7 data feeds, configure an HL7 TCP/IP service in the InterSystems EMPI production:

1. Navigate to Interoperability > Configure > Production
2. Add a new Business Service
3. Select an HL7 service class (e.g., EnsLib.HL7.Service.TCPService)
4. Configure the following settings:

Port Number: The TCP port on which the service will listen for incoming HL7 messages. Coordinate this port number with the sending system. The choice of port is site-specific; select an unused port that the firewall rules between the two systems permit.

Target Config Names: Similar to batch import, specify HUB for Registry deployments or HS.Hub.MPI.Manager for standalone EMPI.

ACK Mode: Configure acknowledgment behavior. Options include:

Immediate ACK (acknowledge receipt immediately)
Application ACK (acknowledge after processing)
Never (no acknowledgment - not recommended)

Message Schema Category: Specify the HL7 schema category (2.3, 2.4, 2.5, etc.) that matches the version your source system sends.

Testing HL7 Connectivity

Before going live with HL7 feeds, perform connectivity testing:

Port Accessibility: Confirm that the configured port is accessible from the sending system. Network firewalls must allow traffic on the specified port.

Message Format: Send test messages and verify they are parsed correctly. Check the Message Viewer in the Management Portal to inspect received messages.

Acknowledgment: Verify that the sending system receives ACK messages. This confirms bidirectional communication is working.

Error Handling: Send intentionally malformed messages to test error handling and ensure problematic messages are logged appropriately.
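
A simple test sender is often enough for these checks. The Python sketch below assumes the service expects MLLP framing (a start byte 0x0B before the message and the bytes 0x1C 0x0D after it), which is the usual convention for HL7 v2 over TCP, and that UTF-8 is an acceptable encoding for the test data. The host, port, and function names are placeholders.

```python
import socket

START_BLOCK = b"\x0b"      # MLLP start-of-block
END_BLOCK = b"\x1c\x0d"    # MLLP end-of-block followed by carriage return


def mllp_wrap(message):
    """Frame an HL7 message string for sending over MLLP."""
    return START_BLOCK + message.encode("utf-8") + END_BLOCK


def mllp_unwrap(frame):
    """Strip MLLP framing bytes and return the message text."""
    if not (frame.startswith(START_BLOCK) and frame.endswith(END_BLOCK)):
        raise ValueError("not a complete MLLP frame")
    return frame[len(START_BLOCK):-len(END_BLOCK)].decode("utf-8")


def send_hl7(host, port, message, timeout_s=10.0):
    """Send one message and return the acknowledgment text received."""
    with socket.create_connection((host, port), timeout=timeout_s) as conn:
        conn.sendall(mllp_wrap(message))
        buffer = b""
        while not buffer.endswith(END_BLOCK):
            chunk = conn.recv(4096)
            if not chunk:
                raise ConnectionError("connection closed before a full ACK arrived")
            buffer += chunk
    return mllp_unwrap(buffer)
```

A call such as `send_hl7("empi-test.example.org", 5001, message)` (hypothetical host and port) covers three checks at once: a connection error points to port accessibility, a timeout points to a missing acknowledgment, and the returned text can be inspected for the ACK code. The received message should still be verified in the Message Viewer.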

HL7In Directory Alternative

In addition to TCP/IP services, InterSystems EMPI supports file-based HL7 message processing. The HL7In directory can be configured to accept HL7 message files that are dropped into the directory. This approach is useful for:

Testing message processing without network connectivity
Batch processing of accumulated HL7 messages
Integration with systems that generate HL7 files rather than sending over TCP/IP

To use the HL7In directory approach, configure a file service similar to the batch import service, but specify the HL7 message parser instead of the delimited file parser.

---


5. Ongoing Data Loading Process Management

Key Points

Production monitoring ensures continuous operation
Message queues show processing backlog
Alerts notify of service interruptions
Regular data quality checks on incoming data
Linkage builds may be needed after large data loads

Detailed Notes

After the initial data load and HL7 service configuration, ongoing management ensures reliable continuous data loading. This section covers operational best practices for maintaining data feeds.

Production Monitoring

The InterSystems EMPI production should be monitored regularly to ensure data feeds are functioning properly:

Production Status: Navigate to Interoperability > Configure > Production to view the production status. All components should show a green status indicator. Yellow or red indicators signal problems requiring attention.

Message Queue Depth: Monitor the queue depth for business services and operations. Large or growing queues indicate processing bottlenecks. Normal operations should have minimal queued messages.

Message Throughput: Review message processing rates to ensure they meet expected volumes. Sudden drops in throughput may indicate connectivity issues or source system problems.
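
The distinction between a busy queue and a growing queue can be made concrete with a small check over periodic depth readings. This Python sketch is illustrative only: how the readings are collected is outside its scope, and the rule used (every reading higher than the previous one) is a deliberately simple assumption rather than an InterSystems-defined threshold.

```python
def is_queue_growing(depth_samples, min_samples=4):
    """True when each reading is strictly larger than the one before it."""
    if len(depth_samples) < min_samples:
        return False  # too few readings to call it a trend
    pairs = zip(depth_samples, depth_samples[1:])
    return all(later > earlier for earlier, later in pairs)


print(is_queue_growing([2, 5, 9, 14]))  # prints True
print(is_queue_growing([3, 0, 4, 1]))   # prints False
```

A queue that spikes and drains, as in the second example, is normal; a queue that rises at every reading indicates that messages arrive faster than they are processed.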

Error Monitoring and Alerting

Configure alerts to notify administrators of data loading issues:

Suspended Messages: When messages cannot be processed (due to validation errors, system errors, etc.), they are suspended. Configure alerts to notify staff when messages are suspended so they can be investigated and resolved.

Service Interruptions: Set up monitoring to detect when HL7 services stop receiving messages. This could indicate network issues, source system downtime, or service configuration problems.

Log File Monitoring: For batch import operations, implement automated monitoring of log files to detect processing failures or data quality issues.

Data Quality Monitoring

As data is loaded continuously, implement ongoing data quality monitoring:

Data Quality Manager: Use the Data Quality Manager tool in InterSystems EMPI to track data quality metrics over time. This tool identifies trends such as:

Increasing rates of missing demographic fields
Changes in data format or patterns
New facilities or assigning authorities appearing in data

Regular Reporting: Establish regular reporting (weekly or monthly) on data quality metrics. Compare quality scores over time to identify degradation requiring investigation.

Source System Coordination: When data quality issues are detected, coordinate with source system administrators to identify and resolve root causes. Issues may include:

Changes to source system configuration
Interface mapping errors
Data entry training needs
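
A metric such as "rate of missing demographic fields" is straightforward to compute outside the tool when a sample of incoming records is available. The Python sketch below is not the Data Quality Manager's own calculation; the record structure and field names are hypothetical.

```python
def missing_field_rates(records, fields):
    """Return {field: fraction of records where the field is empty or absent}."""
    total = len(records)
    rates = {}
    for field in fields:
        missing = sum(1 for r in records if not (r.get(field) or "").strip())
        rates[field] = missing / total if total else 0.0
    return rates


# Hypothetical sample of four incoming records.
week = [
    {"last_name": "Rivera", "date_of_birth": "1980-03-14", "ssn": ""},
    {"last_name": "Chen", "date_of_birth": "", "ssn": ""},
    {"last_name": "Okafor", "date_of_birth": "1962-11-02", "ssn": "123"},
    {"last_name": "Smith", "date_of_birth": "1999-07-30", "ssn": ""},
]

print(missing_field_rates(week, ["last_name", "date_of_birth", "ssn"]))
# prints {'last_name': 0.0, 'date_of_birth': 0.25, 'ssn': 0.75}
```

Comparing these rates per facility from one reporting period to the next makes a degradation visible early, which is the point at which source system coordination is most effective.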

Linkage Builds After Data Loads

When large volumes of new patient records are loaded (whether through batch import or accumulated HL7 messages), a linkage build may be necessary to identify potential duplicate records and establish links between records representing the same patient.

Scheduling Builds: Plan linkage builds during off-peak hours as they are resource-intensive operations. For very large datasets, builds can take hours to complete.

Build Modes: Choose the appropriate build mode:

Full Rebuild: Reprocesses all linkage data from scratch. Required after linkage definition changes.
Incremental Build: Processes only new or modified records. Faster but only available in certain scenarios.

Build Monitoring: Monitor build progress through the Build Log. The log shows processing stages and identifies any errors encountered during the build.

---


6. Data Source Configuration and Management

Key Points

Facility Registry defines all data sources
Each facility must have unique code
Assigning Authority Registry manages identifier domains
Edge Gateway routing for multi-source deployments
Data source metadata critical for linkage accuracy

Detailed Notes

Proper configuration of data sources is fundamental to InterSystems EMPI operations. Each system or facility that sends patient data must be registered and configured correctly.

Facility Registry Configuration

The Facility Registry is the central repository of all data sources in the HealthShare federation. Each entry in the registry represents a distinct healthcare facility or system that contributes patient data.

Facility Code: A unique identifier for the facility, typically 2-5 characters. This code appears in all patient records from this facility and is used in linkage processing. Examples: "CGH" for Community General Hospital, "METRO" for Metro Medical Center.

Facility Name: A human-readable name for the facility. This name appears in user interfaces when displaying patient records.

Edge Gateway Assignment: For deployments using the Registry model, each facility must be associated with an Edge Gateway. The Edge Gateway is the entry point for data from that facility into the HealthShare federation.

Home Facility Setting: The "Home Facility" designation has special meaning in HealthShare deployments. No facility associated with an Edge Gateway should have this setting enabled. The Configuration Check utility validates this setting.

Uniqueness Requirement: All facility codes must be unique, and the system is case-sensitive. CGH and cgh would be treated as different facilities, which could cause data fragmentation. The Configuration Check utility verifies facility code uniqueness.

Assigning Authority Registry

Assigning authorities define the domains for patient identifiers like medical record numbers (MRNs). Each facility typically has its own assigning authority for MRNs, though some organizations use shared assigning authorities across multiple facilities.

Assigning Authority Code: Similar to facility codes, assigning authority codes must be unique and are case-sensitive. Example: "CGH" (if the facility assigns its own MRNs) or "SHAREDMRN" for a system-wide identifier.

Identifier Type: Assigning authorities can be defined for different identifier types:

MRN (Medical Record Number)
SSN (Social Security Number)
Driver's License
Passport Number
Enterprise identifiers

MPIID Management: The Master Person Index ID (MPIID) is a special identifier managed by InterSystems EMPI itself. The system automatically creates and maintains MPIIDs to uniquely identify each person across all source systems.

Multi-Facility Considerations

When loading data from multiple facilities, special considerations apply:

Consistent Coding: Ensure facility and assigning authority codes are consistent across all data sources. If one system sends "Community General Hospital" as the facility code and another sends "CGH", they will be treated as different facilities.

Network Name Coordination: The Network Name setting must align with Registry configuration. Mismatched network names prevent proper routing of patient data through the Edge Gateway to the Registry.

Gateway Capacity Planning: The Pool Size setting in the HS.MPI.HSPI.Operations component should be set equal to the sum of:

Number of Edge Gateways in the federation
Expected number of simultaneous patient searches

This ensures adequate capacity to handle concurrent requests from all data sources.
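
As a worked example of the sizing rule, the following Python sketch simply adds the two quantities. The function name and the sample figures (3 Edge Gateways, 5 simultaneous searches) are illustrative.

```python
def recommended_pool_size(edge_gateways, simultaneous_searches):
    """Pool Size = number of Edge Gateways + expected simultaneous searches."""
    if edge_gateways < 0 or simultaneous_searches < 0:
        raise ValueError("counts cannot be negative")
    return edge_gateways + simultaneous_searches


print(recommended_pool_size(edge_gateways=3, simultaneous_searches=5))  # prints 8
```

The estimate of simultaneous searches is the uncertain input, so it is worth revisiting the setting once real usage figures are available.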

---


7. Pre-Load Configuration Validation

Key Points

Configuration Check utility validates production readiness
Run from Settings menu in Management Portal
Checks production components, registry settings, facility/authority uniqueness
Identifies issues before they affect data quality
InterSystems recommends running before batch loads

Detailed Notes

Before loading large volumes of patient data, InterSystems strongly recommends running the Configuration Check utility. This utility validates that the system is properly configured for production use and identifies common configuration errors that could impact data quality or system performance.

Purpose of Configuration Check

The Configuration Check utility serves as a pre-flight checklist for InterSystems EMPI deployments. It examines numerous configuration settings and identifies problems that should be corrected before data loading begins. Issues caught by the Configuration Check are much easier to fix before thousands or millions of patient records are loaded.

Running the Configuration Check

To run the Configuration Check utility:

1. Navigate to the Person Index menu in the Management Portal
2. Select Settings from the menu
3. Click the Run Configuration Check button (exact location varies by version)
4. Review the results displayed

The utility produces a report showing:

Configuration deployment model detected
Status of each checked setting (pass/fail/warning)
Specific issues found
Recommended corrective actions

Configuration Items Validated

The Configuration Check utility examines multiple aspects of the system configuration:

Hub MPI Manager Configuration: In Registry deployments, validates that the HS.Hub.MPI.Manager component is correctly configured:

Component name is exactly "HS.Hub.MPI.Manager"
Classname is HS.Hub.MPI.Manager (not a subclass)
MPIOperations component is set to HS.MPI.HSPI.Operations
DynamicAssignAuthorityRegistration is disabled (checkbox cleared)
DynamicFacilityRegistration is disabled (checkbox cleared)

Hub Web Service Configuration: Validates the MPITarget property in the Hub web service points to HS.Hub.MPI.Manager.

Production Settings: Checks that LogEnsSystemError is disabled in the Registry production. This setting can cause excessive logging and performance issues in production.

HSPI Operations Component: Validates the HS.MPI.HSPI.Operations business operation:

Enabled checkbox is selected
Pool Size is appropriate (equal to number of Edge Gateways plus expected simultaneous searches)

Facility Registry Validation: Checks all facilities in the registry:

All facility codes are unique (case-sensitive comparison)
No two facilities differ only by case (e.g., CGH and cgh)
No facilities associated with Edge Gateways have Home Facility setting enabled

Assigning Authority Validation: Checks all assigning authorities:

All codes are unique (case-sensitive)
No two authorities differ only by case

Addressing Configuration Check Failures

When the Configuration Check identifies issues, they should be addressed before proceeding with data loading:

Critical Failures: Issues that will prevent proper system operation or cause data corruption. These must be fixed before loading any data.

Warnings: Issues that may impact performance or data quality but do not prevent basic operation. These should be reviewed and fixed if applicable to your deployment.

Informational Messages: Status information that doesn't require action but provides context about your configuration.

After addressing identified issues, rerun the Configuration Check to confirm all problems are resolved.

---


8. Exam Preparation Tips

Key Points

Review section content

Detailed Notes

Review documentation for detailed information.

