1. Understanding Data Loading Options in InterSystems EMPI
Key Points
- InterSystems EMPI supports multiple data ingestion methods
- Batch Import utility for initial CSV/flat file loads
- HL7 TCP/IP services for ongoing real-time data feeds
- SDA (Summary Document Architecture) format support
- Choice depends on source system capabilities and timing requirements
Detailed Notes
InterSystems EMPI (Enterprise Master Patient Index), formerly known as HealthShare Patient Index (HSPI), provides flexible data loading mechanisms to accommodate different deployment scenarios and data source requirements. The system supports both batch loading for initial data migration and real-time streaming for ongoing operations.
Data Loading Approaches
The primary data loading methods available in InterSystems EMPI are:
Batch Import Utility: Designed for loading large volumes of patient records from CSV or delimited flat files. This method is particularly useful when migrating data from legacy systems or other master patient index solutions that can export patient demographics in tabular format. The batch import utility is the recommended approach for initial data loads.
HL7 TCP/IP Services: For ongoing data feeds, InterSystems EMPI can receive HL7 messages over TCP/IP connections. This real-time approach is standard for healthcare interoperability and is typically used after the initial data load is complete. HL7 ADT (Admission, Discharge, Transfer) messages are the most common format for patient demographic updates.
SDA Format: The Summary Document Architecture (SDA) format is another option for loading patient data. While SDA is more commonly associated with clinical documents, it can also carry patient demographic information. Organizations can drop SDA files into the SDAIn directory for processing.
Selecting the Initial Data Loading Approach
When planning the initial data load for a new InterSystems EMPI deployment, technical specialists must consider several factors:
Data Source Format: If the source system can export data as CSV or delimited text files, the Batch Import utility is the most efficient choice. If the source system can only send HL7 messages, then HL7 TCP/IP services should be configured from the beginning.
Volume and Timing: For large initial loads (hundreds of thousands or millions of patient records), batch import is significantly faster than processing individual HL7 messages. The batch utility can process records in bulk, making it ideal for one-time migration scenarios.
Source System Availability: Some organizations prefer batch loads because they can be performed during maintenance windows without requiring continuous connectivity to the source system. HL7 feeds require the source system to be online and capable of sending messages.
Data Quality Requirements: Configuration errors are far harder to correct once records have been loaded, so run the Configuration Check utility before loading large volumes of data to confirm that the system is properly configured for production use. InterSystems recommends running this check before any batch load operation.
---
2. Configuring the Batch Import Utility
Key Points
- Batch Import utility uses delimited file format
- Configuration through Production settings in Edge Gateway
- Requires Facility Registry setup (except standalone deployments)
- File Service component processes dropped files automatically
- Log files track processing success and errors
Detailed Notes
The Batch Import utility in InterSystems EMPI provides a straightforward mechanism for loading patient records from delimited files. This section covers the configuration steps required to enable batch data loading.
Prerequisites for Batch Import
Before configuring the Batch Import utility, ensure that all facilities providing data are properly registered in the system. For deployments that include the HealthShare Unified Care Record Registry:
1. Navigate to Registry Management > Facility Registry in the Management Portal
2. For each facility, specify:
- Facility Code: A unique identifier for the facility
- Name: A descriptive name for the facility
- Edge Gateway: The gateway that will accept patient records from this facility
3. Save the facility configuration
Note: For standalone InterSystems EMPI deployments (deployment model 4), the Facility Registry step can be skipped as there is no Registry component.
Configuring the File Service Component
The batch import functionality is provided by a file service component that monitors a designated directory for incoming files. To configure this component:
1. In the Management Portal, switch to the namespace of the Edge Gateway
2. Navigate to Interoperability > Configure > Production
3. Select the plus sign (+) next to Services to add a new service
4. From the Service Class dropdown, select HS.MPI.Dataload.Delimited.FileService
5. Click OK to create the component
Once the component is created, configure its settings:
Enabled: Select this checkbox to activate the file service.
File Path: Enter the full directory path where data files will be dropped. This directory will be monitored continuously for new files. You can use the magnifying glass icon to browse for the path. Example: `/data/empi/batch-import/`
Target Config Names: For deployments with Registry, select HUB from the dropdown. This routes the patient records through the Hub component for processing. For standalone InterSystems EMPI, keep the default value of HS.Hub.MPI.Manager.
LogFileSpec: Specify the full path and filename for the processing log. This log captures details about each file processed, including record counts, errors, and processing time. Example: `/data/empi/logs/batch-import.log`
After configuring these settings, click Apply to save the changes. The production must be running for the file service to monitor the directory and process files.
Batch Load File Specification
The Batch Import utility expects files in a specific delimited format. Consult the Batch Load File Specification appendix in the InterSystems EMPI Configuration Guide for details on:
- Required and optional fields
- Field delimiters and escape characters
- Date and time formats
- Facility and assigning authority codes
- Medical record number (MRN) formatting
The file specification defines how demographic data elements (name, date of birth, gender, address, identifiers, etc.) must be structured in the input file.
---
3. Testing the Initial Data Load
Key Points
- Start with small test file before full load
- Monitor log file for errors and processing status
- Verify records appear in Person Index
- Check facility and assigning authority assignments
- Use Data Quality tools to assess loaded data
Detailed Notes
After configuring the Batch Import utility, it is essential to test the configuration with a small sample file before attempting to load the full dataset. This testing phase helps identify and resolve configuration issues, data format problems, and facility mapping errors.
Creating a Test File
Prepare a test file containing 10-50 patient records that represent the variety of data in your full dataset. Include records with:
- Different name formats (including suffixes, prefixes, multiple middle names)
- Various date formats to confirm correct parsing
- Multiple facilities if loading multi-facility data
- Different identifier types (MRN, SSN, driver's license, etc.)
- International addresses if applicable
- Records with minimal data to test required field validation
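One way to assemble such a test file is to sample a few records per facility from the full export. The sketch below is only an illustration: it assumes the export has a header row, uses `|` as the delimiter, and has a column named `Facility`. All three are hypothetical; the real layout is defined by the Batch Load File Specification.

```python
import csv
from collections import defaultdict


def sample_by_facility(rows, facility_field, per_facility=5):
    """Keep the first `per_facility` rows seen for each facility code."""
    counts = defaultdict(int)
    kept = []
    for row in rows:
        code = row[facility_field]
        if counts[code] < per_facility:
            counts[code] += 1
            kept.append(row)
    return kept


def write_sample(src, dst, facility_field="Facility", delimiter="|",
                 per_facility=5):
    """Copy a per-facility sample from open file `src` to open file `dst`.

    Returns the number of data rows written. The header row is assumed
    to exist (an assumption, not part of the EMPI specification).
    """
    reader = csv.DictReader(src, delimiter=delimiter)
    fieldnames = reader.fieldnames  # reads the header row
    rows = sample_by_facility(reader, facility_field, per_facility)
    writer = csv.DictWriter(dst, fieldnames=fieldnames, delimiter=delimiter,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)
```

Open both files with `newline=""` when calling `write_sample`, as the `csv` module expects. Taking the first rows per facility is a simple choice; hand-pick additional records to cover unusual names, dates, and sparse data.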
Executing the Test Load
To execute a test load:
1. Ensure the production is running and the File Service component is enabled
2. Copy the test file to the configured File Path directory
3. The file should be picked up automatically within seconds
4. Monitor the LogFileSpec file for processing messages
The log file will show:
- File detection and processing start time
- Number of records processed
- Any validation errors or warnings
- Processing completion time and status
Verifying Loaded Data
After the test file is processed, verify that the data was loaded correctly:
Search for Test Patients: Use the Person Index search functionality to locate patients from your test file. Search by name, date of birth, or medical record number to confirm records are accessible.
Review Facility Assignments: Verify that each record is associated with the correct facility code. The Facility Registry must contain entries that match the facility codes in your batch file.
Check Assigning Authorities: Confirm that medical record numbers and other identifiers are assigned to the correct assigning authority. Assigning authorities must be registered in the system before loading data.
Inspect Data Quality: Navigate to the Data Quality Manager to assess the quality of the loaded demographic data. Look for:
- Missing required fields
- Invalid date formats
- Suspicious patterns (all records from same date, sequential MRNs, etc.)
Common Issues During Testing
File Not Processed: If the file remains in the directory without being processed, check:
- Production is running
- File Service component is enabled
- File path is correct and accessible
- File permissions allow read/delete access
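The path and permission checks above can be scripted. The following is a minimal sketch using only the Python standard library. It is not part of InterSystems EMPI, and it is only meaningful when run as the operating system account that runs the InterSystems instance (an assumption about your environment). On Windows, `os.access` reports less detail than on Unix-like systems.

```python
import os


def check_drop_directory(path):
    """Return a list of problems that would stop files in `path` from
    being read and removed by the current OS user."""
    if not os.path.isdir(path):
        return [f"{path}: not an existing directory"]
    problems = []
    # Read + execute on a directory are needed to list and open its files.
    if not os.access(path, os.R_OK | os.X_OK):
        problems.append(f"{path}: cannot list or read files")
    # Write permission on the directory is needed to delete or move files.
    if not os.access(path, os.W_OK):
        problems.append(f"{path}: cannot remove files after processing")
    return problems
```

For example, `check_drop_directory("/data/empi/batch-import/")` returns an empty list when the directory is usable by the current account.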
Records Rejected: If records are rejected during processing, common causes include:
- Missing required fields (name, date of birth, facility code)
- Invalid facility code not in Facility Registry
- Invalid assigning authority code
- Date format mismatches
- Delimiter or escape character errors in file format
Incorrect Facility or Authority: If records load but have wrong facility or authority assignments:
- Verify Facility Registry contains correct mappings
- Check that facility codes in file exactly match registry (case-sensitive)
- Confirm assigning authority codes are registered in system
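Many of these rejections can be caught before the file is dropped. The sketch below pre-validates a delimited file against lists of registered codes. The column names (`LastName`, `DOB`, `Facility`, `AssigningAuthority`), the header row, the `|` delimiter, and the date format are all hypothetical placeholders; substitute the layout from the Batch Load File Specification and codes exported from your own registries.

```python
import csv
from datetime import datetime

REQUIRED_FIELDS = ("LastName", "DOB", "Facility")  # hypothetical column names


def validate_rows(lines, known_facilities, known_authorities,
                  delimiter="|", date_format="%Y-%m-%d"):
    """Return (line_number, problem) tuples for records likely to be rejected."""
    problems = []
    reader = csv.DictReader(lines, delimiter=delimiter)
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for field in REQUIRED_FIELDS:
            if not (row.get(field) or "").strip():
                problems.append((line_no, f"missing {field}"))

        facility = (row.get("Facility") or "").strip()
        # Exact, case-sensitive comparison: "cgh" does not match "CGH".
        if facility and facility not in known_facilities:
            problems.append((line_no, f"unknown facility code {facility!r}"))

        authority = (row.get("AssigningAuthority") or "").strip()
        if authority and authority not in known_authorities:
            problems.append(
                (line_no, f"unknown assigning authority {authority!r}"))

        dob = (row.get("DOB") or "").strip()
        if dob:
            try:
                datetime.strptime(dob, date_format)
            except ValueError:
                problems.append((line_no, f"unparseable DOB {dob!r}"))
    return problems
```

`lines` can be an open file or any iterable of strings. An empty result does not guarantee that EMPI will accept the file; it only rules out the common causes listed above.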
---
4. Configuring Ongoing HL7 Data Feeds
Key Points
- HL7 ADT messages for patient demographic updates
- TCP/IP service configured in Production
- Standard HL7 v2.x message formats (A01, A04, A08, A31, etc.)
- Inbound service listens on configured port
- Message acknowledgment (ACK) confirms processing
Detailed Notes
Once the initial data load is complete, most InterSystems EMPI deployments transition to ongoing data feeds using HL7 messages. HL7 (Health Level 7) is the healthcare industry standard for exchanging patient demographic and clinical information.
Understanding HL7 ADT Messages
ADT (Admission, Discharge, Transfer) messages are the HL7 message type used for patient demographic data. Common ADT message types processed by InterSystems EMPI include:
A01 (Admit/Visit Notification): Sent when a patient is admitted or registers for a visit. Contains demographic information and may create a new patient record if one doesn't exist.
A04 (Register a Patient): Similar to A01 but specifically for patient registration without admission.
A08 (Update Patient Information): Sent when patient demographic data changes (name change, address update, etc.). Updates existing patient records.
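A31 (Update Person Information): Carries demographic changes at the person level, independent of a specific visit or encounter. It is commonly exchanged between registration systems and master patient indexes.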
A40 (Merge Patient): Indicates that two patient records should be merged because they represent the same person.
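To make the message structure concrete, the sketch below pulls the trigger event (MSH-9), the patient identifier list (PID-3), and the patient name (PID-5) out of a raw HL7 v2 message. It assumes the default delimiters (`|`, `^`, `~`) and is only an illustration of the wire format, not how InterSystems EMPI parses messages internally.

```python
def parse_adt(message):
    """Extract the trigger event and key PID fields from one HL7 v2 message."""
    # HL7 v2 separates segments with carriage returns; tolerate newlines too.
    segments = [s for s in message.replace("\n", "\r").split("\r") if s]
    if not segments or not segments[0].startswith("MSH|"):
        raise ValueError("message does not start with an MSH segment")

    msh = segments[0].split("|")
    if len(msh) <= 8:
        raise ValueError("MSH segment has no message type (MSH-9)")
    # MSH-1 is the field separator itself, so MSH-9 lands at index 8.
    message_type = msh[8].split("^")
    result = {
        "type": message_type[0],
        "event": message_type[1] if len(message_type) > 1 else "",
        "identifiers": [],
        "family_name": "",
        "given_name": "",
    }

    for segment in segments[1:]:
        fields = segment.split("|")
        if fields[0] != "PID":
            continue
        # PID-3: repeating identifier list, each ID^^^authority^type.
        if len(fields) > 3:
            for repetition in fields[3].split("~"):
                parts = repetition.split("^")
                if parts[0]:
                    result["identifiers"].append({
                        "id": parts[0],
                        "authority": parts[3] if len(parts) > 3 else "",
                    })
        # PID-5: patient name (family^given^...); first repetition only.
        if len(fields) > 5:
            name = fields[5].split("~")[0].split("^")
            result["family_name"] = name[0]
            result["given_name"] = name[1] if len(name) > 1 else ""
        break  # only the first PID segment is examined
    return result
```

For a message whose MSH-9 is `ADT^A08`, `parse_adt(...)["event"]` is `"A08"`, which is how a receiver distinguishes an update from an admit or a merge.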
Configuring HL7 TCP/IP Service
To enable HL7 data feeds, configure an HL7 TCP/IP service in the InterSystems EMPI production:
1. Navigate to Interoperability > Configure > Production
2. Add a new Business Service
3. Select an HL7 service class (e.g., EnsLib.HL7.Service.TCPService)
4. Configure the following settings:
Port Number: The TCP port on which the service listens for incoming HL7 messages. Coordinate this port number with the sending system. Sites often pick a port in the 5000-5999 range, but any unused port agreed with the sender works.
Target Config Names: Similar to batch import, specify HUB for Registry deployments or HS.Hub.MPI.Manager for standalone EMPI.
ACK Mode: Configure acknowledgment behavior. Options include:
- Immediate ACK (acknowledge receipt immediately)
- Application ACK (acknowledge after processing)
- Never (no acknowledgment - not recommended)
Message Schema: Specify the HL7 version (2.3, 2.4, 2.5, etc.) that matches your source system.
Testing HL7 Connectivity
Before going live with HL7 feeds, perform connectivity testing:
Port Accessibility: Confirm that the configured port is accessible from the sending system. Network firewalls must allow traffic on the specified port.
Message Format: Send test messages and verify they are parsed correctly. Check the Message Viewer in the Management Portal to inspect received messages.
Acknowledgment: Verify that the sending system receives ACK messages. This confirms bidirectional communication is working.
Error Handling: Send intentionally malformed messages to test error handling and ensure problematic messages are logged appropriately.
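A small test client can exercise these checks without involving the real source system. The sketch below assumes the service uses standard MLLP framing (a start byte 0x0B before the message, and 0x1C 0x0D after it) and UTF-8 text. The host, port, and message content are placeholders you supply.

```python
import socket

START_BLOCK = b"\x0b"      # MLLP start-of-block
END_BLOCK = b"\x1c\x0d"    # MLLP end-of-block followed by carriage return


def mllp_frame(message):
    """Wrap an HL7 message string in MLLP framing bytes."""
    return START_BLOCK + message.encode("utf-8") + END_BLOCK


def mllp_unframe(data):
    """Strip MLLP framing from a complete frame and return the message text."""
    if not (data.startswith(START_BLOCK) and data.endswith(END_BLOCK)):
        raise ValueError("not a complete MLLP frame")
    return data[len(START_BLOCK):-len(END_BLOCK)].decode("utf-8")


def ack_code(ack_message):
    """Return MSA-1 (for example AA, AE, or AR) from an ACK, or None."""
    for segment in ack_message.split("\r"):
        if segment.startswith("MSA|"):
            return segment.split("|")[1]
    return None


def send_hl7(host, port, message, timeout=10.0):
    """Send one message and return the acknowledgment text."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(mllp_frame(message))
        received = b""
        # Read until the end-of-block marker arrives or the peer closes.
        while not received.endswith(END_BLOCK):
            chunk = conn.recv(4096)
            if not chunk:
                break
            received += chunk
    return mllp_unframe(received)
```

A call such as `ack_code(send_hl7(host, port, test_message))` returning `"AA"` indicates the message was accepted. With ACK Mode set to Never, `send_hl7` will wait until the timeout because no acknowledgment is sent.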
HL7In Directory Alternative
In addition to TCP/IP services, InterSystems EMPI supports file-based HL7 message processing. The HL7In directory can be configured to accept HL7 message files that are dropped into the directory. This approach is useful for:
- Testing message processing without network connectivity
- Batch processing of accumulated HL7 messages
- Integration with systems that generate HL7 files rather than sending over TCP/IP
To use the HL7In directory approach, configure a file service similar to the batch import service, but based on an HL7 file service class (for example, EnsLib.HL7.Service.FileService) rather than the delimited file service.
---
5. Ongoing Data Loading Process Management
Key Points
- Production monitoring ensures continuous operation
- Message queues show processing backlog
- Alerts notify of service interruptions
- Regular data quality checks on incoming data
- Linkage builds may be needed after large data loads
Detailed Notes
After the initial data load and HL7 service configuration, ongoing management ensures reliable continuous data loading. This section covers operational best practices for maintaining data feeds.
Production Monitoring
The InterSystems EMPI production should be monitored regularly to ensure data feeds are functioning properly:
Production Status: Navigate to Interoperability > Configure > Production to view the production status. All components should show a green status indicator. Yellow or red indicators signal problems requiring attention.
Message Queue Depth: Monitor the queue depth for business services and operations. Large or growing queues indicate processing bottlenecks. Normal operations should have minimal queued messages.
Message Throughput: Review message processing rates to ensure they meet expected volumes. Sudden drops in throughput may indicate connectivity issues or source system problems.
Error Monitoring and Alerting
Configure alerts to notify administrators of data loading issues:
Suspended Messages: When messages cannot be processed (due to validation errors, system errors, etc.), they are suspended. Configure alerts to notify staff when messages are suspended so they can be investigated and resolved.
Service Interruptions: Set up monitoring to detect when HL7 services stop receiving messages. This could indicate network issues, source system downtime, or service configuration problems.
Log File Monitoring: For batch import operations, implement automated monitoring of log files to detect processing failures or data quality issues.
Data Quality Monitoring
As data is loaded continuously, implement ongoing data quality monitoring:
Data Quality Manager: Use the Data Quality Manager tool in InterSystems EMPI to track data quality metrics over time. This tool identifies trends such as:
- Increasing rates of missing demographic fields
- Changes in data format or patterns
- New facilities or assigning authorities appearing in data
Regular Reporting: Establish regular reporting (weekly or monthly) on data quality metrics. Compare quality scores over time to identify degradation requiring investigation.
Source System Coordination: When data quality issues are detected, coordinate with source system administrators to identify and resolve root causes. Issues may include:
- Changes to source system configuration
- Interface mapping errors
- Data entry training needs
Linkage Builds After Data Loads
When large volumes of new patient records are loaded (whether through batch import or accumulated HL7 messages), a linkage build may be necessary to identify potential duplicate records and establish links between records representing the same patient.
Scheduling Builds: Plan linkage builds during off-peak hours as they are resource-intensive operations. For very large datasets, builds can take hours to complete.
Build Modes: Choose the appropriate build mode:
- Full Rebuild: Reprocesses all linkage data from scratch. Required after linkage definition changes.
- Incremental Build: Processes only new or modified records. Faster but only available in certain scenarios.
Build Monitoring: Monitor build progress through the Build Log. The log shows processing stages and identifies any errors encountered during the build.
---
6. Data Source Configuration and Management
Key Points
- Facility Registry defines all data sources
- Each facility must have unique code
- Assigning Authority Registry manages identifier domains
- Edge Gateway routing for multi-source deployments
- Data source metadata critical for linkage accuracy
Detailed Notes
Proper configuration of data sources is fundamental to InterSystems EMPI operations. Each system or facility that sends patient data must be registered and configured correctly.
Facility Registry Configuration
The Facility Registry is the central repository of all data sources in the HealthShare federation. Each entry in the registry represents a distinct healthcare facility or system that contributes patient data.
Facility Code: A unique identifier for the facility, typically 2-5 characters. This code appears in all patient records from this facility and is used in linkage processing. Examples: "CGH" for Community General Hospital, "METRO" for Metro Medical Center.
Facility Name: A human-readable name for the facility. This name appears in user interfaces when displaying patient records.
Edge Gateway Assignment: For deployments using the Registry model, each facility must be associated with an Edge Gateway. The Edge Gateway is the entry point for data from that facility into the HealthShare federation.
Home Facility Setting: The "Home Facility" designation has special meaning in HealthShare deployments. No facility associated with an Edge Gateway should have this setting enabled. The Configuration Check utility validates this setting.
Uniqueness Requirement: All facility codes must be unique, and the system is case-sensitive. CGH and cgh would be treated as different facilities, which could cause data fragmentation. The Configuration Check utility verifies facility code uniqueness.
Assigning Authority Registry
Assigning authorities define the domains for patient identifiers like medical record numbers (MRNs). Each facility typically has its own assigning authority for MRNs, though some organizations use shared assigning authorities across multiple facilities.
Assigning Authority Code: Similar to facility codes, assigning authority codes must be unique and are case-sensitive. Example: "CGH" (if the facility assigns its own MRNs) or "SHAREDMRN" for a system-wide identifier.
Identifier Type: Assigning authorities can be defined for different identifier types:
- MRN (Medical Record Number)
- SSN (Social Security Number)
- Driver's License
- Passport Number
- Enterprise identifiers
MPIID Management: The Master Person Index ID (MPIID) is a special identifier managed by InterSystems EMPI itself. The system automatically creates and maintains MPIIDs to uniquely identify each person across all source systems.
Multi-Facility Considerations
When loading data from multiple facilities, special considerations apply:
Consistent Coding: Ensure facility and assigning authority codes are consistent across all data sources. If one system sends "Community General Hospital" as the facility code and another sends "CGH", they will be treated as different facilities.
Network Name Coordination: The Network Name setting must align with Registry configuration. Mismatched network names prevent proper routing of patient data through the Edge Gateway to the Registry.
Gateway Capacity Planning: The Pool Size setting in the HS.MPI.HSPI.Operations component should be set equal to the sum of:
- Number of Edge Gateways in the federation
- Expected number of simultaneous patient searches
This ensures adequate capacity to handle concurrent requests from all data sources.
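The sizing rule is simple addition. A minimal worked example, with made-up numbers for the federation size:

```python
def recommended_pool_size(edge_gateways, simultaneous_searches):
    """Pool Size = number of Edge Gateways + expected simultaneous searches."""
    if edge_gateways < 0 or simultaneous_searches < 0:
        raise ValueError("counts cannot be negative")
    return edge_gateways + simultaneous_searches


# Hypothetical federation: 4 Edge Gateways, up to 6 concurrent searches.
pool_size = recommended_pool_size(4, 6)  # 10
```

Revisit the value whenever an Edge Gateway is added or search usage grows.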
---
7. Pre-Load Configuration Validation
Key Points
- Configuration Check utility validates production readiness
- Run from Settings menu in Management Portal
- Checks production components, registry settings, facility/authority uniqueness
- Identifies issues before they affect data quality
- InterSystems recommends running before batch loads
Detailed Notes
Before loading large volumes of patient data, InterSystems strongly recommends running the Configuration Check utility. This utility validates that the system is properly configured for production use and identifies common configuration errors that could impact data quality or system performance.
Purpose of Configuration Check
The Configuration Check utility serves as a pre-flight checklist for InterSystems EMPI deployments. It examines numerous configuration settings and identifies problems that should be corrected before data loading begins. Issues caught by the Configuration Check are much easier to fix before thousands or millions of patient records are loaded.
Running the Configuration Check
To run the Configuration Check utility:
1. Navigate to the Person Index menu in the Management Portal
2. Select Settings from the menu
3. Click the Run Configuration Check button (exact location varies by version)
4. Review the results displayed
The utility produces a report showing:
- Configuration deployment model detected
- Status of each checked setting (pass/fail/warning)
- Specific issues found
- Recommended corrective actions
Configuration Items Validated
The Configuration Check utility examines multiple aspects of the system configuration:
Hub MPI Manager Configuration: In Registry deployments, validates that the HS.Hub.MPI.Manager component is correctly configured:
- Component name is exactly "HS.Hub.MPI.Manager"
- Classname is HS.Hub.MPI.Manager (not a subclass)
- MPIOperations component is set to HS.MPI.HSPI.Operations
- DynamicAssignAuthorityRegistration is disabled (checkbox cleared)
- DynamicFacilityRegistration is disabled (checkbox cleared)
Hub Web Service Configuration: Validates the MPITarget property in the Hub web service points to HS.Hub.MPI.Manager.
Production Settings: Checks that LogEnsSystemError is disabled in the Registry production. This setting can cause excessive logging and performance issues in production.
HSPI Operations Component: Validates the HS.MPI.HSPI.Operations business operation:
- Enabled checkbox is selected
- Pool Size is appropriate (equal to number of Edge Gateways plus expected simultaneous searches)
Facility Registry Validation: Checks all facilities in the registry:
- All facility codes are unique (case-sensitive comparison)
- No two facilities differ only by case (e.g., CGH and cgh)
- No facilities associated with Edge Gateways have Home Facility setting enabled
Assigning Authority Validation: Checks all assigning authorities:
- All codes are unique (case-sensitive)
- No two authorities differ only by case
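The uniqueness rules for facility and assigning authority codes can also be checked by hand on an exported list of codes before running the utility. The sketch below illustrates the rule itself; it is not the Configuration Check implementation, and it assumes you have already collected the codes into a Python list.

```python
from collections import Counter, defaultdict


def exact_duplicates(codes):
    """Return codes that appear more than once with identical spelling."""
    return sorted(code for code, n in Counter(codes).items() if n > 1)


def case_collisions(codes):
    """Return groups of distinct codes that differ only by letter case."""
    groups = defaultdict(set)
    for code in codes:
        groups[code.casefold()].add(code)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)
```

For example, `case_collisions(["CGH", "cgh", "METRO"])` returns `[["CGH", "cgh"]]`, the kind of pair that would fragment one facility's records across two codes.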
Addressing Configuration Check Failures
When the Configuration Check identifies issues, they should be addressed before proceeding with data loading:
Critical Failures: Issues that will prevent proper system operation or cause data corruption. These must be fixed before loading any data.
Warnings: Issues that may impact performance or data quality but do not prevent basic operation. These should be reviewed and fixed if applicable to your deployment.
Informational Messages: Status information that doesn't require action but provides context about your configuration.
After addressing identified issues, rerun the Configuration Check to confirm all problems are resolved.
---
8. Exam Preparation Tips
Key Points
- Review section content
Detailed Notes
Review documentation for detailed information.