T43.2: Implements ECP

Knowledge Review - InterSystems IRIS System Administration Specialist

1. Describe ECP Architecture (Data Server, Application Server)

Key Points

  • ECP enables horizontal scaling for user volume
  • Data server: stores data, manages locks, synchronizes caches
  • Application servers: handle user requests, maintain independent caches
  • Distributed caching transparent to applications and users
  • Easy to scale by adding/removing application servers

Detailed Notes

Enterprise Cache Protocol (ECP) is a core component of InterSystems IRIS that enables distributed caching, an architecturally straightforward, application-transparent, low-cost approach to horizontal scaling for user volume.

The Scaling Problem: When vertical scaling (adding memory/CPU to a single server) proves insufficient, horizontal scaling distributes the workload across multiple systems. For user-intensive workloads, the challenge is that all users share a single database cache on one server. As user volume grows, cache efficiency decreases and performance suffers.

ECP Solution: The distributed caching architecture scales horizontally by distributing both application logic and caching across a tier of application servers in front of a data server. Each application server handles user requests and maintains its own database cache, automatically kept in sync with the data server. The data server handles all data storage and management.

Architecture Components:

Data Server:

  • Stores data in local databases
  • Synchronizes application server caches to prevent stale data
  • Manages lock distribution across the entire cluster
  • Monitors application server connection status
  • Takes action if connections are interrupted

Application Servers:

  • Establish connections to data server when applications request data
  • Maintain their own database cache of retrieved data
  • Automatically connect to the server storing requested data
  • Monitor data server connections and attempt recovery if interrupted
  • Scale independently - add or remove servers without cluster reconfiguration

How It Works:

  1. An application server becomes part of the cluster by adding the data server as a remote server
  2. Databases on the data server are configured as remote databases on the application server
  3. Local namespaces on the application server map to these remote databases
  4. When users query a namespace, the application server fetches data from the data server
  5. Retrieved data is cached locally on the application server
  6. Subsequent requests for the same data are served from local cache
  7. Data server ensures all application server caches remain coherent
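The flow above can be illustrated with a toy in-memory model. This is a sketch only and not how ECP is implemented: the `DataServer` and `AppServer` classes, their methods, and the invalidate-on-write strategy are hypothetical stand-ins for the real protocol.

```python
class DataServer:
    """Toy data server: owns the data and keeps app server caches coherent."""

    def __init__(self):
        self.store = {}
        self.app_servers = []

    def read(self, key):
        return self.store.get(key)

    def write(self, key, value, writer=None):
        self.store[key] = value
        # Drop the stale copy held by every app server except the writer.
        for app in self.app_servers:
            if app is not writer:
                app.cache.pop(key, None)


class AppServer:
    """Toy application server: serves reads from its own cache when it can."""

    def __init__(self, data_server):
        self.data_server = data_server
        self.cache = {}
        self.remote_fetches = 0
        data_server.app_servers.append(self)  # join the cluster

    def get(self, key):
        if key not in self.cache:  # cache miss: fetch from the data server
            self.cache[key] = self.data_server.read(key)
            self.remote_fetches += 1
        return self.cache[key]     # cache hit: served locally

    def set(self, key, value):
        self.cache[key] = value
        self.data_server.write(key, value, writer=self)


data = DataServer()
app1, app2 = AppServer(data), AppServer(data)
app1.set("patient:1", "Ann")
first = app2.get("patient:1")    # fetched from the data server
second = app2.get("patient:1")   # served from app2's local cache
app1.set("patient:1", "Anna")    # invalidates app2's cached copy
third = app2.get("patient:1")    # fetched again, so app2 does not see stale data
```

The point of the model is that `app2` contacts the data server only on a miss or after its copy has been invalidated; every other read stays local.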

Key Benefits:

  • Avoids expensive necessity of fitting entire working set in single server's memory
  • Each application server maintains independent working set of data
  • Add inexpensive commodity application servers to handle more users
  • Helps when application is limited by CPU capacity
  • Entirely transparent to users and application code
  • Interrupted connections automatically recovered or reset based on outage length

Cache Distribution Strategy: User requests can be load-balanced round-robin, but the most effective approach directs users with similar requests to the same application server, increasing cache efficiency. For example:

  • Healthcare: clinical users on one server, front-desk staff on another
  • Multiple applications: each application's users on separate servers
  • Department-based: group users by department or function

This targeted distribution maximizes cache hits because users with similar query patterns share cached data.
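One way to picture the targeted distribution described above is a small routing function. In a real deployment this decision would normally be made in a load balancer or web gateway configuration; the server names and the group-to-server map below are hypothetical and serve only as illustration.

```python
from itertools import cycle

APP_SERVERS = ["app1", "app2"]  # hypothetical application server names
GROUP_TO_SERVER = {             # hypothetical affinity map
    "clinical": "app1",
    "front-desk": "app2",
}

_round_robin = cycle(APP_SERVERS)


def route(user_group):
    """Send a known group to its dedicated server; otherwise round-robin."""
    server = GROUP_TO_SERVER.get(user_group)
    return server if server is not None else next(_round_robin)


groups = ["clinical", "front-desk", "clinical", "billing", "billing"]
assignments = [route(g) for g in groups]
```

Users in a mapped group always land on the same server and so share its cache, while unmapped groups fall back to round-robin.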

Scalability: The number of application servers can be increased or reduced without other reconfiguration of the cluster or operational changes. This makes it easy to scale as user volume grows.

2. Configure ECP Data Servers

Key Points

  • Enable ECP service on data server
  • Configure ECP server port and IP address
  • Set up security for application server connections
  • Define which databases are available for remote access
  • Monitor data server connections and performance

Detailed Notes

The data server configuration establishes which databases can be accessed remotely by application servers and how application servers authenticate and connect.

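Enabling ECP Service: The data server must have the ECP service (%Service_ECP) enabled to accept connections from application servers. The service is enabled in the Management Portal under System Administration > Security > Services; the ECP Settings page (System Administration > Configuration > Connectivity > ECP Settings) shows its current state and holds the remaining data server settings.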

Configuration includes:

  • Enable/disable ECP service
  • ECP server port number (the instance's superserver port; default: 1972)
  • IP address(es) to listen on
  • Connection security settings
  • Timeout and buffer settings

Security Configuration: Application servers must authenticate to the data server. Security options include:

  • Access control on the ECP service (%Service_ECP)
  • Kerberos authentication
  • TLS/SSL encrypted connections
  • IP address restrictions
  • User/role-based access control

Best practice is to create a dedicated user account for ECP connections with appropriate privileges.

Database Availability: Not all databases on the data server need to be available for remote access. Configure which databases application servers can access through:

  • Database resource names
  • Namespace mappings
  • Access permissions (read-only vs. read-write)

Resource Name Configuration: Each database has a resource name used by application servers to reference it. The resource name:

  • Should be descriptive and consistent
  • Can differ from the database directory name
  • Must be unique on the data server
  • Is used in application server remote database configuration

Connection Management: The data server manages connections from application servers:

  • Tracks active application server connections
  • Monitors connection health
  • Detects disconnected application servers
  • Automatically cleans up resources after disconnection
  • Configurable timeout for considering connection lost

Performance Tuning: Data server ECP settings that affect performance:

  • Network buffers: Larger buffers improve throughput for high-latency connections
  • Connection timeout: Balance between detecting failures quickly and avoiding false disconnections
  • Cache coherency settings: How aggressively to invalidate application server caches
  • Lock timeout: How long to wait for distributed locks

Monitoring Data Server: Monitor data server ECP activity through:

  • Management Portal ECP data server monitor
  • System Management Portal > System Operation > ECP Data Servers
  • View connected application servers
  • Monitor data transfer rates
  • Track lock activity across cluster
  • Identify performance bottlenecks

Configuration Steps:

  1. Open Management Portal on data server
  2. Navigate to System Administration > Configuration > Connectivity > ECP Settings
  3. Enable ECP service and set port
  4. Configure security settings
  5. Set up allowed application server connections
  6. Configure database resources
  7. Save and activate configuration
  8. Verify ECP service is running

High Availability Considerations: When the data server is part of a mirror:

  • Application servers can be configured to automatically reconnect to new primary after failover
  • Use mirror connection rather than direct data server connection
  • Application servers treat mirror failover as data server restart
  • Processing continues on new primary with minimal disruption

3. Configure ECP Application Servers

Key Points

  • Add data server as remote server on application server
  • Configure remote databases from data server
  • Map namespaces to remote databases
  • Difference between local and remote databases is transparent
  • Application server maintains its own cache automatically

Detailed Notes

An application server is configured by adding remote servers and remote databases, then mapping namespaces to those databases. This process is transparent to applications.

Adding Remote Server: The first step is to configure the data server as a remote server on the application server:

  1. Open Management Portal on application server
  2. Navigate to System Administration > Configuration > Connectivity > ECP Settings
  3. Add new ECP server connection
  4. Specify data server host/IP and port
  5. Configure authentication credentials
  6. Set connection parameters (timeout, buffers, etc.)
  7. Test connection and save

Remote Server Configuration Fields:

  • Remote Server Name: Descriptive name for this connection
  • IP Address/Hostname: How to reach the data server
  • Port: ECP port on data server (default 1972)
  • User/Password: Authentication credentials
  • Connection Type: Standard, SSL/TLS, or Kerberos
  • Timeout: How long to wait for responses
  • Auto-reconnect: Whether to automatically reconnect after interruption

Adding Remote Databases: Once the remote server is configured, add its databases as remote databases:

  1. Navigate to System Administration > Configuration > System Configuration > Remote Databases and create a remote database
  2. Select the remote server
  3. Specify the database resource name (from data server)
  4. Configure local mount point for the remote database
  5. Set access mode (read-only or read-write)
  6. Save configuration

Remote Database Configuration:

  • Remote Resource Name: The database name as defined on data server
  • Local Mount Point: Where application server "mounts" the remote database
  • Access Mode: Read-only or read-write (must match data server permissions)
  • Collation: Should match data server database collation
  • Cache Size: How much local cache to allocate for this remote database

Namespace Mapping: Map namespaces to remote databases exactly as you would local databases:

  1. Navigate to System Administration > Configuration > System Configuration > Namespaces
  2. Create or edit namespace
  3. Map global/routine/class data to remote databases
  4. Configuration is identical whether database is local or remote
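The relationship between remote servers, remote databases, and namespaces can be sketched as three lookup tables. This is a conceptual model only: the dictionaries below are not IRIS configuration syntax, and every name, host, and path in them is hypothetical.

```python
# Hypothetical names throughout; these dicts are not IRIS configuration syntax.
REMOTE_SERVERS = {"DATASRV": ("datasrv.example.com", 1972)}

DATABASES = {
    "APPDATA": {"server": "DATASRV", "directory": "/data/appdata/"},  # remote
    "APPCODE": {"server": None, "directory": "/iris/mgr/appcode/"},   # local
}

NAMESPACES = {"APP": {"globals": "APPDATA", "routines": "APPCODE"}}


def resolve(namespace, kind):
    """Return (host, port, directory) for a namespace's globals or routines.

    host and port are None when the mapped database is local.
    """
    db = DATABASES[NAMESPACES[namespace][kind]]
    host, port = REMOTE_SERVERS[db["server"]] if db["server"] else (None, None)
    return host, port, db["directory"]


globals_location = resolve("APP", "globals")
routines_location = resolve("APP", "routines")
```

An application only ever names the namespace (`"APP"` here); whether the lookup ends at a local or a remote database is decided entirely by configuration, which is what makes the mapping transparent.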

Transparency to Applications: Once configured:

  • Applications query namespaces normally
  • Application server automatically fetches data from remote databases
  • Fetched data is cached locally
  • Subsequent queries use local cache
  • Application server keeps cache synchronized with data server
  • Applications cannot tell whether data is local or remote

Multiple Data Servers: An application server can connect to multiple data servers:

  • Configure each as a separate remote server
  • Add remote databases from each server
  • Application server automatically connects to correct server for each query
  • Useful for accessing data distributed across multiple systems

Connection Monitoring: The application server monitors its data server connections:

  • Automatically detects disconnections
  • Attempts to reconnect periodically
  • Can be configured to queue operations during disconnection
  • Logs connection status changes

Recovery Behavior: When connection is interrupted:

  • Short outage (configurable threshold): Application server queues operations and waits
  • Long outage: Application server resets connection and drops cached data
  • On reconnection: Application server re-establishes connection and refreshes needed data
  • During reconnection: Applications may experience delays but typically don't fail
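The short-versus-long outage rule above amounts to a comparison against a configured threshold. The sketch below assumes a hypothetical `recovery_threshold_seconds` parameter; the real setting name and its default are not given in these notes, so treat the values as placeholders.

```python
def recovery_action(outage_seconds, recovery_threshold_seconds):
    """Classify an outage the way the notes above describe it."""
    if outage_seconds < 0 or recovery_threshold_seconds <= 0:
        raise ValueError("outage must be >= 0 and threshold must be > 0")
    if outage_seconds <= recovery_threshold_seconds:
        return "queue operations and wait for recovery"
    return "reset connection and drop cached data"


# Placeholder threshold of 60 seconds, chosen only for the example.
short_outage = recovery_action(15, 60)
long_outage = recovery_action(600, 60)
```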

Configuration Best Practices:

  • Use consistent namespace configurations across all application servers
  • Document which databases are remote vs. local
  • Configure appropriate cache sizes based on workload
  • Set reasonable timeout values balancing detection speed and false positives
  • Test failover scenarios if data server is mirrored
  • Monitor application server performance and adjust cache allocation

Testing Configuration: After configuring application server:

  1. Verify remote server connection is active
  2. Query data through mapped namespaces
  3. Verify data is being cached (check cache statistics)
  4. Test application functionality
  5. Monitor for errors in console log or alerts
  6. Verify behavior during data server disconnection

4. Monitor ECP Connections and Performance

Key Points

  • Monitor data server connections to all application servers
  • Track application server connections to data servers
  • Measure data transfer rates and cache hit ratios
  • Monitor distributed lock activity
  • Identify performance bottlenecks and connection issues

Detailed Notes

Monitoring ECP connections and performance is essential for maintaining optimal distributed cache cluster operation and identifying issues before they impact users.

Data Server Monitoring: The data server provides comprehensive monitoring of application server connections through the Management Portal:

Navigate to: System Operation > ECP Data Servers

View metrics including:

  • Connected Application Servers: List of all currently connected app servers
  • Connection Status: Active, inactive, or error states
  • Data Transfer: Bytes sent/received to each app server
  • Transfer Rate: Current network throughput
  • Blocks Sent/Received: Database blocks transferred
  • Globals Referenced: Number of global references served
  • Lock Requests: Distributed lock operations
  • Connection Duration: How long each app server has been connected

Application Server Monitoring: Each application server monitors its connections to data servers:

Navigate to: System Operation > ECP Application Servers

View metrics including:

  • Connected Data Servers: List of configured remote servers
  • Connection Status: Connected, disconnected, or error
  • Cache Statistics: Hit ratio, size, efficiency
  • Network I/O: Data transferred to/from data server
  • Block Requests: Database blocks requested from data server
  • Response Time: Latency for data server requests
  • Reconnection Attempts: Number of reconnect tries after disconnection

Cache Performance Metrics: Monitor cache efficiency on application servers:

  • Cache Hit Ratio: Percentage of requests served from local cache
  • Cache Size: Current cache memory usage
  • Global References: Total global accesses
  • Remote References: Accesses requiring data server communication
  • Cache Efficiency: Ratio of local to remote references

Target Metrics:

  • Cache hit ratio should typically be >80-90%
  • Low hit ratio indicates:
      • Insufficient cache allocated
      • Users with different query patterns on same app server
      • Working set larger than cache capacity
      • Poor cache distribution strategy
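As a worked example of the cache metrics above, the hit ratio can be derived from two counters. This sketch assumes the ratio is defined as the share of global references that did not require a trip to the data server, which follows from the definitions of Global References and Remote References given earlier; the counter values are made up.

```python
def cache_hit_ratio(global_refs, remote_refs):
    """Fraction of global references served from the local cache."""
    if global_refs <= 0:
        raise ValueError("global_refs must be positive")
    if not 0 <= remote_refs <= global_refs:
        raise ValueError("remote_refs must be between 0 and global_refs")
    return (global_refs - remote_refs) / global_refs


# Made-up sample: 10,000 references, 1,500 of which went to the data server.
ratio = cache_hit_ratio(10_000, 1_500)
below_target = ratio < 0.80  # lower bound of the 80-90% guideline
```

With these sample counters the ratio is 0.85, which sits inside the 80-90% guideline, so `below_target` is false.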

Network Performance: Monitor network aspects of ECP:

  • Latency: Round-trip time between app and data server
  • Bandwidth Utilization: How much network capacity is being used
  • Packet Loss: Network reliability indicator
  • Connection Interruptions: Frequency of disconnections

Lock Monitoring: Distributed locks are managed by the data server:

  • Monitor lock request volume
  • Track lock wait times
  • Identify lock contention issues
  • Detect deadlock situations

Performance Bottlenecks: Common issues identified through monitoring:

  • High Network Latency: Increases response time for cache misses
  • Low Cache Hit Ratio: Forces frequent data server access
  • Lock Contention: Multiple app servers competing for same locks
  • Connection Instability: Frequent disconnections disrupt operations
  • Insufficient Cache: Working set doesn't fit in allocated cache

Diagnostic Tools:

  • %Monitor.System.HistoryDump: Collect system metrics over time
  • mgstat utility: Real-time system statistics
  • messages.log (the console log): Connection events and errors
  • ECP status queries: SQL queries against ECP system tables
  • ^%FREECNT: Report free space in databases

Monitoring Best Practices:

  1. Establish baseline metrics for normal operation
  2. Set up alerts for connection failures or performance degradation
  3. Monitor cache hit ratios and adjust cache sizes as needed
  4. Track trends over time to identify gradual degradation
  5. Correlate ECP metrics with application performance
  6. Monitor during peak usage periods
  7. Test monitoring during failover scenarios
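The first two practices (establish a baseline, then alert on deviation) can be sketched with simple statistics. The three-standard-deviation rule and the sample values below are assumptions made for illustration, not a recommendation from the product documentation.

```python
from statistics import mean, stdev


def is_anomalous(baseline_samples, current, k=3.0):
    """Flag a reading more than k standard deviations from the baseline mean.

    Needs at least two baseline samples, otherwise stdev() raises an error.
    """
    mu = mean(baseline_samples)
    sigma = stdev(baseline_samples)
    return abs(current - mu) > k * sigma


# Made-up hit-ratio samples collected during normal operation.
baseline = [0.90, 0.91, 0.89, 0.92, 0.90]
alert_on_drop = is_anomalous(baseline, 0.70)
alert_on_normal = is_anomalous(baseline, 0.91)
```

The same check works for any metric with a stable baseline, such as response time or remote references per second.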

Programmatic Monitoring: Use class methods for integration with monitoring systems:

  • %Monitor.System.LineByLine: Line-by-line routine performance metrics
  • ECP.Server status methods: Connection status programmatically
  • %SYS.ECP: ECP configuration and statistics

Troubleshooting Connection Issues: When monitoring reveals connection problems:

  1. Check network connectivity between servers
  2. Verify ECP service is running on data server
  3. Confirm authentication credentials are correct
  4. Review firewall rules for ECP port
  5. Check console logs for specific error messages
  6. Test connection using Management Portal tools
  7. Verify data server has capacity for additional connections
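The network and firewall checks in the list above can be approximated from the application server with a plain TCP connection attempt. This sketch uses only the Python standard library; it shows whether the port accepts connections, not whether the ECP service is enabled or the credentials are valid.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A call such as `port_is_reachable("datasrv.example.com", 1972)` (the hostname is a placeholder) returning False points to a network, firewall, or listener problem rather than an ECP configuration problem.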

Performance Tuning Based on Monitoring:

  • Increase cache size if hit ratio is low and memory is available
  • Adjust connection timeouts if network is unstable
  • Redistribute users if some app servers are overloaded
  • Add app servers if all are near capacity
  • Optimize network path if latency is high
  • Review application queries if many are cache-inefficient

Exam Preparation Summary

Critical Concepts to Master:

  1. Data Server Role: Stores data, manages locks, synchronizes caches across cluster
  2. Application Server Role: Handles user requests, maintains local cache, connects to data server
  3. Distributed Caching: Application-transparent horizontal scaling for user volume
  4. Remote Database Configuration: App server maps local namespaces to remote databases on data server
  5. Cache Coherency: Data server ensures all app server caches remain synchronized
  6. Connection Recovery: Both data and app servers monitor connections and attempt recovery
  7. ECP + Mirroring: App servers auto-redirect after mirror failover

Common Exam Scenarios:

  • Distinguishing data server vs application server roles
  • Configuring remote database connections on application servers
  • Understanding cache synchronization and coherency
  • Monitoring ECP connection status and performance
  • Troubleshooting ECP connection failures
  • Planning ECP topology for horizontal scaling
  • Understanding network latency impact on ECP performance

Hands-On Practice Recommendations:

  • Configure data server and application server instances
  • Set up remote database mappings on application servers
  • Monitor ECP connections from both data and app server perspectives
  • Test application server failover and recovery
  • Review ECP performance metrics and cache statistics
  • Simulate network interruption and observe recovery
  • Scale by adding/removing application servers
