T43.1: Implements Mirroring

Knowledge Review - InterSystems IRIS System Administration Specialist

1. Describe Mirror Architecture (Failover, Async, Reporting)

Key Points

  • Failover members: Primary and backup for automatic failover with no data loss
  • Async members: DR async (disaster recovery) and reporting async (data warehousing)
  • Mirror supports up to 16 total members with 2 failover + 14 async members
  • ISCAgent and Arbiter components support automatic failover decisions
  • Logical data replication avoids risks of physical replication corruption

Detailed Notes

InterSystems IRIS mirroring provides high availability through logical data replication between physically independent systems. A mirror consists of failover members and optional async members.

Failover Members: A mirror requires exactly two failover members for automatic failover. At any time, one acts as primary (providing database access) and the other as backup (maintaining synchronized copies). When the primary becomes unavailable, the backup takes over automatically with no data loss. The failover members are coequal - neither is preferred as primary. Network latency between failover members significantly impacts performance, so they should be located to minimize latency.

Async Members - DR Type: Disaster recovery async members maintain asynchronous copies and can be manually promoted to failover member, for example if both failover members fail. A DR async can belong to only one mirror. Because a mirror supports up to 14 async members, several DR asyncs can be deployed to provide geographically dispersed disaster recovery capability.

Async Members - Reporting Type: Reporting async members maintain read-only or read-write copies for data mining and business intelligence. They cannot be promoted to failover members. A reporting async can belong to up to 10 mirrors simultaneously, functioning as an enterprise-wide data warehouse bringing together databases from separate locations.
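
The membership limits described above can be checked mechanically when planning a topology. The following is a minimal sketch in Python, not an InterSystems API: the MirrorPlan class, the validate function, and the member names are all hypothetical and exist only to illustrate the rules (2 failover members, up to 14 async members, one mirror per DR async, up to 10 mirrors per reporting async).

```python
from dataclasses import dataclass, field

MAX_FAILOVER = 2                 # failover members per mirror
MAX_ASYNC = 14                   # async members per mirror (16 members total)
MAX_MIRRORS_PER_REPORTING = 10   # mirrors a single reporting async may join


@dataclass
class MirrorPlan:
    """Hypothetical planning record for one mirror (illustrative only)."""
    name: str
    failover: list = field(default_factory=list)
    dr_async: list = field(default_factory=list)
    reporting_async: list = field(default_factory=list)


def validate(plans):
    """Return a list of rule violations found across all planned mirrors."""
    problems = []
    dr_owner = {}          # DR async name -> first mirror that lists it
    reporting_count = {}   # reporting async name -> number of mirrors joined
    for plan in plans:
        if len(plan.failover) != MAX_FAILOVER:
            problems.append(
                f"{plan.name}: needs exactly 2 failover members for automatic failover")
        if len(plan.dr_async) + len(plan.reporting_async) > MAX_ASYNC:
            problems.append(f"{plan.name}: more than {MAX_ASYNC} async members")
        for member in plan.dr_async:
            if member in dr_owner:
                problems.append(
                    f"{member}: DR async already belongs to {dr_owner[member]}")
            dr_owner.setdefault(member, plan.name)
        for member in plan.reporting_async:
            reporting_count[member] = reporting_count.get(member, 0) + 1
    for member, count in reporting_count.items():
        if count > MAX_MIRRORS_PER_REPORTING:
            problems.append(
                f"{member}: reporting async in {count} mirrors "
                f"(max {MAX_MIRRORS_PER_REPORTING})")
    return problems


# Example: the same DR async listed in two mirrors is flagged,
# while a reporting async shared by both mirrors is allowed.
plans = [
    MirrorPlan("MIRRORA", ["a1", "a2"], ["dr1"], ["rpt1"]),
    MirrorPlan("MIRRORB", ["b1", "b2"], ["dr1"], ["rpt1"]),
]
for problem in validate(plans):
    print(problem)  # prints "dr1: DR async already belongs to MIRRORA"
```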

Supporting Components: ISCAgent runs on each mirror member's host system, providing communication when normal channels are interrupted. The Arbiter provides an additional decision point for failover, preventing split-brain scenarios where both members think they should be primary.

Mirroring avoids single points of failure by maintaining independent resources on primary and backup members, and uses logical data replication to avoid risks associated with physical replication like out-of-order updates and carry-forward corruption.

2. Configure Mirror Members

Key Points

  • Configure two failover members with matching mirror name
  • Primary and backup roles assigned during initial configuration
  • Configure network addresses for mirror communication channels
  • Virtual IP (VIP) typically used for external client connections
  • ECP application servers automatically redirect after failover (no VIP needed)

Detailed Notes

Configuring a mirror involves setting up the two failover members to communicate and replicate data synchronously. The configuration process establishes the mirror relationship and sets up communication channels.

Initial Setup: Create the mirror on the first failover member, specifying a mirror name. Then configure the second failover member to join the same mirror name. During initial configuration, one member is designated as primary and the other as backup, but these are temporary designations only.

Network Configuration: Failover members communicate through several channels using multiple network addresses:

  • Mirror connection addresses for member-to-member communication
  • Agent connection addresses for ISCAgent communication
  • Optional arbiter connection address
  • Virtual IP address for external client redirection (optional)
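
As a planning aid, the address types listed above can be recorded per member and cross-checked before configuration. This is a minimal sketch, not product code: the MemberAddresses class and check_pair function are hypothetical, the addresses come from the reserved documentation range, and the assumption that both members should name the same arbiter and the same VIP is a simplification for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemberAddresses:
    """Hypothetical record of one failover member's addresses (illustrative only)."""
    mirror_address: str                     # member-to-member mirror traffic
    agent_address: str                      # ISCAgent on the member's host
    arbiter_address: Optional[str] = None   # optional arbiter
    virtual_ip: Optional[str] = None        # optional VIP for external clients


def check_pair(a: MemberAddresses, b: MemberAddresses) -> list:
    """Flag settings that look inconsistent between the two failover members."""
    issues = []
    if a.mirror_address == b.mirror_address:
        issues.append("members share a mirror address")
    if a.arbiter_address != b.arbiter_address:
        issues.append("members point at different arbiters")
    if a.virtual_ip != b.virtual_ip:
        issues.append("members disagree on the VIP")
    return issues


member_a = MemberAddresses("192.0.2.10", "192.0.2.10", "192.0.2.30", "192.0.2.100")
member_b = MemberAddresses("192.0.2.20", "192.0.2.20", "192.0.2.30", "192.0.2.100")
print(check_pair(member_a, member_b))  # an empty list means no issues were found
```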

Virtual IP Address: External clients typically connect through a VIP that is always bound to the current primary. When failover occurs, the VIP moves to the new primary, automatically redirecting client connections. However, application servers in a distributed cache cluster (ECP) automatically redirect to the new primary without requiring a VIP.

Important Considerations:

  • The two failover members are coequal; neither is preferred as primary
  • Primary and backup are temporary designations that change during failover
  • Network latency between failover members directly impacts application performance
  • Members should be physically located to minimize network latency
  • Failover members can be configured in separate data centers for additional redundancy

The backup failover member's purpose is to be ready to take over as primary. It is NOT supported to use the backup member directly to run queries or application code - the LOCK command will fail with a PROTECT error if attempted.

3. Add Databases to a Mirror

Key Points

  • Databases must be journaled before adding to mirror
  • Add databases to mirror from primary member
  • Databases automatically synchronized to backup member
  • Mirrored databases maintain exact copies on both members
  • Only databases explicitly added are mirrored

Detailed Notes

Once a mirror is created, you can add databases to be mirrored between the failover members. Only databases explicitly added to the mirror configuration will be replicated.

Prerequisites: Before adding a database to a mirror, journaling must be enabled for that database. Journaling provides the complete record of all database modifications that mirroring uses for replication. Write image journaling (WIJ) is automatically enabled and protects against physical corruption during updates.

Adding Process: Databases are added to the mirror from the primary member using the Management Portal or programmatic methods. When a database is added:

  1. The database is marked as mirrored on the primary
  2. Journal records for that database begin replicating to the backup
  3. The backup creates or updates its copy of the database
  4. Ongoing changes are synchronously replicated
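
The two preconditions for adding a database (journaling enabled, action taken on the primary) can be expressed as a small guard. This is a toy model in Python, assuming a hypothetical Database class and add_to_mirror function; it does not call any InterSystems API.

```python
class Database:
    """Hypothetical stand-in for a database definition (illustrative only)."""

    def __init__(self, name, journaled=False):
        self.name = name
        self.journaled = journaled
        self.mirrored = False


def add_to_mirror(db, on_primary):
    """Mark a database as mirrored, enforcing the prerequisites from the notes."""
    if not on_primary:
        raise PermissionError("databases are added to the mirror from the primary")
    if not db.journaled:
        raise ValueError(f"{db.name}: enable journaling before adding to the mirror")
    db.mirrored = True
    return db


app_data = add_to_mirror(Database("APPDATA", journaled=True), on_primary=True)
print(app_data.name, app_data.mirrored)
```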

Synchronization: The mirroring process maintains exact copies of mirrored databases on both failover members. When data is modified on the primary:

  1. The change is written to the primary's journal
  2. Journal records are sent to the backup
  3. Backup acknowledges receipt
  4. Backup applies the changes to its database copy
  5. Transaction commits on primary after backup acknowledgment

This synchronous replication ensures zero data loss during failover. The backup always has the most recent committed transactions.
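
The key property, that a commit completes only after the backup has acknowledged the journal record, can be illustrated with a toy model. The Primary and Backup classes below are hypothetical Python stand-ins, not the real mirroring implementation, and they ignore networking, durability, and error recovery.

```python
class Backup:
    """Toy backup member: receives journal records, applies them later."""

    def __init__(self):
        self.journal = []    # records received from the primary
        self.applied = 0     # number of records already dejournaled
        self.database = {}

    def receive(self, record):
        """Store the record and acknowledge receipt."""
        self.journal.append(record)
        return True

    def dejournal(self):
        """Apply received-but-unapplied records to the local database copy."""
        for key, value in self.journal[self.applied:]:
            self.database[key] = value
        self.applied = len(self.journal)


class Primary:
    """Toy primary member: commits only after the backup acknowledges."""

    def __init__(self, backup):
        self.journal = []
        self.database = {}
        self.backup = backup

    def commit(self, key, value):
        record = (key, value)
        self.journal.append(record)           # write to the primary's journal
        acked = self.backup.receive(record)   # send the record, wait for the ack
        if not acked:
            raise RuntimeError("backup did not acknowledge; commit not confirmed")
        self.database[key] = value            # commit completes after the ack
        return True


backup = Backup()
primary = Primary(backup)
primary.commit("balance", 100)
# The backup already holds the journal record even before it dejournals it,
# which is why a failover at this point would not lose the committed change.
print(backup.journal, backup.database)
backup.dejournal()
print(backup.database)
```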

Database Access: On the primary, mirrored databases are read-write. On the backup, they are read-only - any attempt to modify data will generate a PROTECT error. After failover, the new primary's databases become read-write and the new backup's become read-only.

Namespace Mapping: Applications access data through namespaces, which map to databases. The namespace configuration should be consistent across both mirror members to ensure applications work correctly after failover.

4. Monitor Mirror Status

Key Points

  • Mirror Monitor provides real-time status of all mirror members
  • Shows current primary/backup roles and member health
  • Displays dejournaling lag and database synchronization status
  • Monitor agent connectivity and communication channels
  • Track failover history and mirror events

Detailed Notes

Monitoring mirror status is essential for ensuring high availability and identifying potential issues before they cause outages. InterSystems IRIS provides comprehensive monitoring through the Mirror Monitor and related tools.

Mirror Monitor: Accessed through the Management Portal (System Operation > Mirror Monitor), the Mirror Monitor displays:

  • Current primary and backup member identification
  • Member status (Primary, Backup, Connected, Disconnected, etc.)
  • Database synchronization status
  • Dejournaling lag (how far behind backup is in applying journal records)
  • Agent connectivity status
  • Recent failover events

Health Indicators:

  • Green/Normal: Member is functioning correctly, databases synchronized
  • Yellow/Warning: Minor issues like dejournaling lag or temporary disconnection
  • Red/Error: Serious issues like member down, agent disconnected, or major sync lag
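
When feeding mirror status into an external monitoring system, the three health levels above can be derived from a few observed values. The sketch below is an assumption-laden illustration in Python: the classify function, its parameters, and the lag thresholds (5 and 60 seconds) are hypothetical and are not values taken from the product.

```python
def classify(member_up, agent_connected, mirror_connected, lag_seconds,
             warn_lag=5.0, error_lag=60.0):
    """Map observed member state to "green", "yellow", or "red".

    The thresholds are placeholders; choose values that suit your own
    recovery-time requirements.
    """
    if not member_up or not agent_connected or lag_seconds >= error_lag:
        return "red"       # member down, agent disconnected, or major lag
    if not mirror_connected or lag_seconds >= warn_lag:
        return "yellow"    # temporary disconnection or moderate lag
    return "green"         # functioning correctly and synchronized


print(classify(True, True, True, 0.2))
print(classify(True, True, True, 12.0))
print(classify(True, False, True, 0.0))
```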

Dejournaling Status: The backup continuously applies (dejournals) journal records from the primary. The Mirror Monitor displays:

  • Current dejournaling position
  • Lag time or journal file count behind primary
  • Rate of dejournaling progress
  • Any dejournaling errors or issues

Connection Monitoring: Track the status of communication channels:

  • Mirror connection between failover members
  • ISCAgent connections
  • Arbiter connection (if configured)
  • Application server (ECP) connections to mirror

Event History: The monitor maintains a history of mirror events including:

  • Failover occurrences (automatic and manual)
  • Member state changes
  • Connection/disconnection events
  • Configuration changes
  • Error conditions

Proactive Monitoring: Regular monitoring allows you to:

  • Detect network issues affecting mirror communication
  • Identify dejournaling lag that could impact failover recovery time
  • Verify successful database synchronization
  • Confirm both members are healthy and ready for failover
  • Track failover frequency and causes

You can also monitor mirror status programmatically, for example through the methods of the %SYSTEM.Mirror class (called as $SYSTEM.Mirror) and the methods and class queries of the SYS.Mirror class in the %SYS namespace, for integration with enterprise monitoring systems.

5. Perform Manual and Automatic Failover

Key Points

  • Automatic failover: triggered by primary failure, no data loss
  • Manual/planned failover: for maintenance, testing, or upgrades
  • Failover typically completes in seconds
  • Application servers (ECP) automatically reconnect to new primary
  • VIP address moves to new primary for client redirection

Detailed Notes

Failover is the process where the backup member becomes the new primary when the current primary becomes unavailable. Failover can be automatic (triggered by system detection) or manual (administrator-initiated).

Automatic Failover Process:

  1. Primary becomes unavailable (crash, network failure, shutdown)
  2. Backup detects loss of communication with primary
  3. Backup contacts ISCAgent on primary's host to verify primary is truly down
  4. If arbiter configured, backup contacts arbiter for confirmation
  5. Backup promotes itself to primary role
  6. Virtual IP (if used) moves to new primary
  7. Databases on new primary become read-write
  8. Application servers (ECP) automatically reconnect to new primary
  9. External clients reconnect via VIP or connection redirect
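
The state changes in steps 5 through 7 can be summarized in a small model: the role, the VIP, and database write access all move together. This is a hypothetical Python sketch (the Member class and fail_over function are invented for illustration) that assumes the backup has already confirmed the primary is down.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """Toy failover member (illustrative only)."""
    name: str
    role: str                 # "primary", "backup", or "down"
    holds_vip: bool = False

    @property
    def databases_writable(self):
        # Mirrored databases are read-write only on the current primary.
        return self.role == "primary"


def fail_over(primary, backup):
    """Apply the role changes once the primary is confirmed unavailable."""
    if backup.role != "backup":
        raise ValueError("only an active backup can take over")
    primary.role = "down"
    primary.holds_vip = False
    backup.role = "primary"   # step 5: backup promotes itself
    backup.holds_vip = True   # step 6: VIP moves to the new primary
    # step 7 follows from the role: databases_writable is now True on backup


node_a = Member("A", "primary", holds_vip=True)
node_b = Member("B", "backup")
fail_over(node_a, node_b)
print(node_b.role, node_b.holds_vip, node_b.databases_writable)
```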

Automatic Failover Triggers:

  • Primary InterSystems IRIS instance crashes or stops
  • Primary host system failure
  • Network interruption between primary and backup
  • Primary becomes unresponsive to health checks

Manual Failover (Planned): Manual failover is initiated by an administrator for:

  • Planned maintenance on primary member
  • Operating system or hardware upgrades
  • InterSystems IRIS version upgrades
  • Testing failover procedures
  • Rebalancing load after recovering from unplanned failover

To perform manual failover:

  1. Access Mirror Monitor on intended new primary
  2. Select "Become Primary" action
  3. Confirm the failover operation
  4. System performs controlled role swap
  5. Former primary becomes new backup
  6. Databases and connections transition smoothly

Failover Duration: Typical automatic failover completes in seconds to tens of seconds, depending on:

  • Network latency between members
  • Dejournaling lag on backup
  • Number and size of mirrored databases
  • Application server reconnection time

Failover and Applications:

  • Applications using ECP: automatic reconnection to new primary, minimal disruption
  • Applications using VIP: automatic redirection when VIP moves
  • Applications with hardcoded connections: may require manual reconnection
  • Transaction in progress during failover: rolled back, application must retry
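
Because a transaction in progress during failover is rolled back, application code typically wraps transactional work in a retry loop. The following is a minimal sketch in Python; ConnectionLost and run_with_retry are hypothetical names, and a real application would catch whatever exception its database driver actually raises on a dropped connection.

```python
import time


class ConnectionLost(Exception):
    """Hypothetical error raised when the connection drops mid-transaction."""


def run_with_retry(transaction, attempts=3, delay=0.0):
    """Call transaction() and retry it if the connection is lost.

    The callable must be safe to rerun from the start, since the
    interrupted attempt is assumed to have been rolled back.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return transaction()
        except ConnectionLost as err:
            last_error = err
            time.sleep(delay)   # give the new primary time to take over
    raise last_error


# Example: the first attempt fails as if a failover interrupted it.
state = {"calls": 0}


def transfer():
    state["calls"] += 1
    if state["calls"] == 1:
        raise ConnectionLost("failover in progress")
    return "committed"


print(run_with_retry(transfer))  # prints "committed" after one retry
```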

Zero Data Loss: Because mirroring uses synchronous replication, the backup acknowledges receipt of journal records before the primary commits transactions. This guarantees that all committed transactions are on the backup at failover time, ensuring zero data loss.

Post-Failover: After failover:

  • Monitor the new primary for normal operation
  • Investigate cause of failover if it was unplanned
  • When failed member recovers, it automatically rejoins as backup
  • Consider manual failover back to original primary if desired

Important: If a problem is detected on the primary and the backup is available, failover occurs immediately even if the primary problem might resolve on its own. This ensures maximum availability but means transient issues can trigger failover.

6. Understand Arbiter Role and Configuration

Key Points

  • Arbiter provides tie-breaking vote to prevent split-brain scenarios
  • Lightweight component, minimal system requirements
  • Should be on separate system from both failover members
  • Contacted by backup during failover decision process
  • Prevents both members from simultaneously acting as primary

Detailed Notes

The arbiter is an optional but recommended component that provides an additional decision point during failover. It prevents "split-brain" scenarios where both failover members might simultaneously believe they should be primary due to network issues.

Arbiter Purpose: When the backup loses contact with the primary, it must determine whether:

  • The primary has actually failed (should become primary)
  • Network issues prevent communication but primary is still running (should NOT become primary)

The arbiter provides a third perspective. If the backup can contact the arbiter but not the primary, this indicates the primary likely failed rather than just a network partition.

Split-Brain Prevention: Without an arbiter, network issues between failover members could cause both to act as primary simultaneously, leading to:

  • Divergent database copies
  • Data conflicts
  • Complex recovery procedures
  • Potential data loss

The arbiter prevents this by serving as a witness - the member that can contact the arbiter is allowed to be primary.

Arbiter Requirements:

  • Lightweight process with minimal CPU/memory requirements
  • Should run on a different host from both failover members
  • Ideally in a different physical location/network segment from members
  • Requires reliable network connectivity to both failover members
  • Available as Docker container or standard installation

Arbiter Installation: The arbiter can be:

  • Installed as part of ISCAgent on any system
  • Run as a Docker container (intersystems/arbiter image)
  • Deployed on virtual or physical hardware
  • Hosted on a third system in the mirror infrastructure

Arbiter Configuration:

  1. Install arbiter on chosen host system
  2. Configure arbiter IP address and port
  3. Add arbiter configuration to both failover members
  4. Failover members automatically contact arbiter during failover decisions
  5. No ongoing management needed - arbiter operates autonomously

Arbiter Communication: During failover:

  1. Backup loses contact with primary
  2. Backup attempts to contact ISCAgent on primary host
  3. Backup contacts arbiter
  4. If backup can reach arbiter but not primary/agent, backup becomes primary
  5. If backup cannot reach arbiter, it remains backup (primary likely still active)
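
The decision sequence above reduces to a small function. This Python sketch is a simplification for study purposes only: backup_should_take_over and its parameters are hypothetical, and the product's actual failover logic covers more cases than the three inputs modeled here.

```python
def backup_should_take_over(primary_reachable, agent_report, arbiter_reachable):
    """Decide whether the backup may become primary.

    agent_report is what the ISCAgent on the primary's host says:
    "down", "running", or None if the agent cannot be reached.
    """
    if primary_reachable or agent_report == "running":
        return False            # primary is still active: stay backup
    if agent_report == "down":
        return True             # agent confirms the primary is down
    # Neither primary nor agent is reachable: the arbiter is the tie-breaker.
    return arbiter_reachable


# Primary host lost entirely, arbiter still reachable: take over.
print(backup_should_take_over(False, None, True))
# Backup is isolated from both primary and arbiter: remain backup.
print(backup_should_take_over(False, None, False))
```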

Arbiter Recommendations:

  • Always use an arbiter in production mirror configurations
  • Place arbiter on independent network segment if possible
  • Monitor arbiter connectivity as part of mirror monitoring
  • Consider arbiter when planning network architecture
  • Test failover scenarios with arbiter disconnected to understand behavior

Without Arbiter: If no arbiter is configured, failover decisions rely on ISCAgent alone. This is acceptable for testing but not recommended for production due to increased risk of split-brain scenarios.

Exam Preparation Summary

Critical Concepts to Master:

  1. Mirror Member Types: Failover (2 max, auto-failover), DR Async (manual promotion), Reporting Async (read-only or read-write, multi-mirror, cannot be promoted)
  2. Failover Members: Exactly 2 required, coequal (no preferred primary), zero data loss
  3. Mirror Limits: Up to 16 total members (2 failover + 14 async)
  4. ISCAgent: Runs on each member's host, enables communication when normal channels fail
  5. Arbiter: Third-party decision point for failover, prevents split-brain scenarios
  6. Virtual IP (VIP): External client connection point, moves with primary role
  7. ECP + Mirroring: App servers auto-redirect after failover (no VIP needed)

Common Exam Scenarios:

  • Identifying when to use failover vs DR async vs reporting async members
  • Understanding automatic failover requirements (ISCAgent, Arbiter)
  • Configuring mirror member network addresses and VIP
  • Troubleshooting mirror synchronization and failover issues
  • Planning mirror topology for disaster recovery
  • Understanding split-brain prevention with Arbiter
  • Promoting DR async to failover member

Hands-On Practice Recommendations:

  • Configure a two-node failover mirror
  • Add DR async and reporting async members
  • Install and configure ISCAgent and Arbiter
  • Test automatic failover scenarios
  • Monitor mirror status and synchronization
  • Practice manual failover and DR async promotion
  • Configure VIP for client connections
