1. Determines which databases to include in namespaces
Key Points
- Namespace structure: References default databases for globals, routines, and temporary data
- Database separation: Isolate application data, code, and system components logically
- Local vs. Remote: Choose between local databases or remote databases via ECP connections
- Copy from existing: Duplicate namespace configurations to maintain consistency across environments
- Mapping override: Use global, routine, and package mappings to access data across databases
Detailed Notes
Overview
A namespace in InterSystems IRIS serves as a logical workspace that references multiple databases for different purposes. When designing namespace architecture, developers must determine which databases to include based on data access patterns, security requirements, and application boundaries.
Namespace Components
- Default Database for Globals: Required for data storage
- Default Database for Routines: Optionally separate for code storage
- Temporary Database: Can reference IRISTEMP or a custom temporary database
Best Practices for Database Separation
- Functional Domain Isolation: Application data should reside in dedicated databases separate from code databases, allowing independent backup schedules and security policies
- System Database Protection: IRISSYS and IRISLIB provide core functionality and should not store application data
- Shared Reference Data: When multiple applications share common reference data, store it in a shared database and map into multiple namespaces
- Remote Databases: Use Enterprise Cache Protocol (ECP) for distributed architectures where application servers access data on remote data servers
- Environment Consistency: The "Copy from" feature during namespace creation allows rapid deployment of consistent namespace configurations across development, test, and production environments
Documentation References
2. Recommends architecture based on data growth expectations
Key Points
- Initial sizing: Configure initial database size, expansion increment, and maximum size limits
- Vertical scaling: Add CPU, memory, and disk resources to existing servers for growth
- Horizontal scaling: Distribute data across multiple servers using ECP or sharding
- Multivolume databases: Automatically span multiple disk volumes when size thresholds are reached
- Growth monitoring: Track database size, free space, and expansion patterns over time
Detailed Notes
Overview
Database architecture must anticipate data growth to avoid performance degradation and system outages. When creating databases, administrators configure three critical size parameters.
Critical Size Parameters
- Initial Size: Minimum of 1 MB; sized for expected initial data volume
- Expansion Increment: Default 12% of current size or 10 MB, whichever is larger
- Maximum Size: Default unlimited unless explicitly set
For predictable growth, setting a non-zero expansion value prevents frequent small expansions that can fragment the database.
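The default expansion rule reduces to simple arithmetic. The sketch below is illustrative only — the engine applies its own rounding and free-space checks — but it captures the rule stated above:

```python
def next_expansion_mb(current_size_mb, expansion_mb=0):
    """Size of the next expansion, in MB.

    With the default setting of 0, the database grows by 12% of its
    current size or 10 MB, whichever is larger; a non-zero setting is
    used as a fixed increment instead.
    """
    if expansion_mb > 0:
        return expansion_mb
    return max(0.12 * current_size_mb, 10)

# e.g. a 1 GB database expands by ~123 MB by default,
# while a 50 MB database expands by the 10 MB floor
```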
Vertical Scaling Approaches
- Memory Increases: Improves database cache efficiency
- Faster Disk Subsystems: Improves I/O throughput
- Additional CPU Cores: Supports more concurrent users
However, vertical scaling has practical limits based on hardware maximums and cost constraints.
Horizontal Scaling Options
- Enterprise Cache Protocol (ECP): Separates application processing from data storage, allowing multiple application servers to access centralized data servers
- Sharding: Partitions data across multiple data nodes, enabling linear scalability for massive datasets
- Multivolume Databases: Automatically create new volume files when the current volume reaches a configured threshold size, allowing databases to span multiple file systems or storage arrays without application disruption
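The multivolume threshold behavior can be modeled with a small sketch (hypothetical helper; in practice the engine creates and names the volume files itself, on whichever directories are configured):

```python
def volume_layout(total_size_mb, threshold_mb):
    """Split a database of total_size_mb across volume files, starting a
    new volume each time the current one reaches the threshold size."""
    volumes = []
    remaining = total_size_mb
    while remaining > 0:
        size = min(remaining, threshold_mb)
        volumes.append(size)
        remaining -= size
    return volumes

# A 25 GB database with a 10 GB threshold spans three volumes:
# volume_layout(25_000, 10_000) -> [10000, 10000, 5000]
```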
Cloud and Monitoring Considerations
For applications with unpredictable growth patterns, cloud deployments with elastic storage and compute resources provide flexibility to scale up or down based on actual demand. Monitoring database free space, expansion frequency, and I/O wait times provides early warning of capacity constraints requiring architectural changes.
Documentation References
3. Structures data appropriately for global mappings
Key Points
- Subscript-level mapping: Map specific global subscript ranges to different databases
- Wildcard mappings: Use asterisk (*) to map multiple globals with common naming patterns
- Range specifications: Define subscript ranges using (BEGIN):(END) or literal values
- Mapping priority: Mapped content overrides local content with identical identifiers
- Cross-database access: Enable transparent data access across physical database boundaries
Detailed Notes
Overview
Global mappings provide a powerful mechanism for structuring data across multiple databases while maintaining transparent application access. When designing global structure for mapping, developers should organize data hierarchically with logical separation at the top-level subscript, enabling efficient subscript-level mappings.
Example: Facility-Based Partitioning
Storing patient data by facility in ^Patient(facilityID, patientID) allows:
- Mapping ^Patient(1) through ^Patient(100) to one database
- Mapping ^Patient(101) through ^Patient(200) to another database
- Effectively partitioning data by facility across databases
Subscript Mapping Syntax
- Numeric Ranges: (1):(100) maps numeric subscript ranges
- Alphabetic Ranges: ("A"):("M") maps alphabetic ranges
- Complex Multi-Level: ("B",23,"m"):("E",5) maps complex multi-level subscript hierarchies
- Special Tokens: (BEGIN) and (END) represent the first and last possible subscript values in collation order
- Wildcard Mappings: ^ABC* maps all globals beginning with ABC to a specified database
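Resolution against subscript-level mappings can be sketched in a few lines. This is a hypothetical resolver for the ^Patient facility example above — real mappings follow the database's collation order, simplified here to numeric comparison, and the database names are invented:

```python
# Hypothetical mapping table for ^Patient(facilityID, patientID)
MAPPINGS = [
    (1, 100, "FACILITY-A-DB"),    # ^Patient(1) through ^Patient(100)
    (101, 200, "FACILITY-B-DB"),  # ^Patient(101) through ^Patient(200)
]
DEFAULT_DB = "APPDATA"  # the namespace's default globals database

def resolve_database(facility_id):
    """Return the database that stores ^Patient(facility_id)."""
    for begin, end, db in MAPPINGS:
        if begin <= facility_id <= end:
            return db
    return DEFAULT_DB  # unmapped subscripts fall back to the default

# resolve_database(42) -> "FACILITY-A-DB"
# resolve_database(150) -> "FACILITY-B-DB"
```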
Mapping Precedence
Mapping precedence is critical to understand: mapped globals take priority over local globals of the same name, potentially hiding local data. Mappings should therefore be as specific as possible to avoid unintended conflicts.
Architectural Patterns Enabled by Mappings
- Data Archival: Map historical subscript ranges to archive databases
- Multi-Tenancy: Map tenant-specific subscript ranges to isolated databases
- Data Partitioning: Distribute large globals across multiple databases for I/O parallelism
- Distributed Architectures: Combine with ECP remote database access to run application logic on application servers while data resides on dedicated data servers
Documentation References
4. Assesses mirroring implications for database design
Key Points
- Mirror membership: Only databases added to the mirror configuration are replicated
- Journaling required: Mirrored databases must have journaling enabled for synchronization
- Property synchronization: Database size and configuration sync from primary to backup/async members
- Block size immutability: Cannot change block size after database creation in mirrored environments
- Failover considerations: Design for automatic failover with mount-at-startup and resource settings
Detailed Notes
Overview
Database mirroring in InterSystems IRIS provides high availability through automatic failover between primary and backup instances, with optional disaster recovery async members. When designing databases for mirrored environments, several architectural constraints and considerations apply.
Key Constraints
- Explicit Configuration: Databases must be explicitly added to the mirror configuration; merely existing on a mirror member does not enable mirroring
- Journaling Requirement: Journaling is mandatory for mirrored databases because synchronization relies on journal file transmission from primary to backup members
- Block Size Immutability: Block size cannot be changed after database creation, so initial block size selection (default 8KB) must consider long-term requirements
- Encryption Status: The encryption status of a mirrored database cannot be changed after creation, requiring careful planning during initial database configuration
Property Synchronization
Database properties on backup and async members synchronize automatically from the primary, including current size, expansion settings, and maximum size. This means manual size adjustments on non-primary members may be overwritten.
Failover Considerations
- Mount Required at Startup: Should be enabled to ensure the instance will not become primary unless all required databases mount successfully and complete journal recovery
- Resource-Based Security: Settings must be consistent across all mirror members to maintain access control during failover
- Stream Data: File streams are not automatically mirrored, requiring separate replication mechanisms if critical
- Multivolume Support: Mirrored databases support multivolume configuration, but volume expansion occurs independently on each mirror member based on its local disk availability
Database architecture should minimize dependencies on non-mirrored components to ensure complete application availability after failover.
Documentation References
5. Evaluates configuration settings for scaling applications
Key Points
- Database cache sizing: Allocate memory to minimize disk I/O for working set
- Lock table configuration: Size lock structures for concurrent transaction volume
- Journal settings: Balance data protection with journal write performance impact
- ECP configuration: Tune network buffers and connection parameters for distributed caching
- Sharding parameters: Configure data nodes, compute nodes, and shard key design
Detailed Notes
Overview
Scaling InterSystems IRIS applications requires careful configuration of multiple system parameters that interact to determine overall performance and capacity.
Database Cache Configuration
- Purpose: Database cache (global buffer pool) should be sized to hold the application's working set in memory, minimizing physical disk reads
- Undersized Impact: Causes excessive disk I/O and poor performance
- Oversized Impact: Wastes memory that could support more application processes
- Monitoring: Cache efficiency can be monitored through database cache hit ratios, with targets typically above 95% for OLTP workloads
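The hit-ratio check itself is simple arithmetic. A sketch (the actual read counters come from the instance's performance metrics; the figures below are hypothetical):

```python
def cache_hit_ratio(logical_reads, physical_reads):
    """Fraction of global references satisfied from the buffer pool.

    logical_reads counts all global references; physical_reads counts
    those that required a disk read.
    """
    if logical_reads == 0:
        return 1.0
    return 1 - physical_reads / logical_reads

# Flag an undersized cache against the ~95% OLTP target noted above.
ratio = cache_hit_ratio(logical_reads=1_000_000, physical_reads=80_000)
print(f"{ratio:.1%}", "OK" if ratio >= 0.95 else "cache likely undersized")
# prints: 92.0% cache likely undersized
```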
Key Memory Settings
- Lock Table Size: Determines how many concurrent locks the system can manage; applications with high transaction concurrency or long-held locks require larger lock tables
- Routine Buffer Pool: Stores compiled routine and class code in memory; adequate sizing reduces compilation overhead and code loading time
- Generic Memory Heap (gmheap): Provides shared memory for system structures; should be sized based on the number of concurrent processes and ECP connections
Distributed Architecture Tuning
For distributed architectures using ECP, network buffer sizes and connection pools must be tuned based on network latency and bandwidth. Application servers cache data from data servers; sizing application server caches based on locality of reference patterns improves performance.
Sharded Cluster Decisions
- Data Nodes: Number of data nodes for partitioning
- Compute Nodes: Deployment for workload separation
- Shard Key Selection: Critical for data distribution
- Rebalancing Strategies: Required as data volumes grow
Journaling Considerations
- Journaling Requirement: Journaling is required for transactions and cannot be disabled in most environments
- Journal Buffers: Journal buffers are written to disk every 2 seconds automatically
- Non-Journaled Databases: Use non-journaled databases (like IRISTEMP) only for data that can be recreated, such as print queues, temporary data, or session caches
- Production Systems: Always enable journaling for databases containing critical data to ensure data protection and recovery capabilities
Capacity Planning
Configuration parameters should be established through capacity planning based on expected workload characteristics (transactions per second, concurrent users, data volume) and validated through performance testing under realistic load conditions.
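A back-of-envelope working-set estimate can anchor such planning. All figures and the overhead factor below are hypothetical assumptions for illustration, not vendor sizing guidance:

```python
def working_set_gb(rows, avg_row_bytes, hot_fraction=0.2, overhead=1.3):
    """Estimate database cache (GB) needed to hold the 'hot' fraction
    of the data, with an overhead factor for indexes and block headers."""
    return rows * avg_row_bytes * hot_fraction * overhead / 1024**3

# 500M rows at 300 bytes each, 20% hot -> ~36 GB of database cache
print(round(working_set_gb(500_000_000, 300), 1))
```

Estimates like this set a starting point; the cache hit ratio under realistic load testing then confirms or corrects it.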
Documentation References
6. Considers upgrade impacts on database architecture
Key Points
- In-place upgrades: Database files and configuration preserved during version upgrades
- System database conversion: IRISSYS and IRISLIB upgraded automatically to new version format
- Application compatibility: User databases remain compatible across InterSystems IRIS versions
- Feature deprecation: Review deprecated features that may affect database access patterns
- Backup before upgrade: Always backup databases before performing version upgrades
Detailed Notes
Overview
InterSystems IRIS upgrades preserve existing database architecture and data while updating system software and system databases.
What Gets Preserved
During an in-place upgrade using the REINSTALL=ALL command-line property, the installer preserves:
- The iris.cpf configuration file
- All user databases (their IRIS.DAT files)
- Application code
- Namespace configurations
System databases IRISSYS and IRISLIB are upgraded to the new version's format, which may involve internal structure changes to support new features or improved performance.
Database Architecture Considerations
- Block Size Compatibility: Databases created with non-standard block sizes (larger than 8KB) must be validated for continued support in the new version
- Encryption Settings: Keys must be preserved during upgrades, requiring coordination with key management systems
- Mirrored Database Procedures: Typically, the backup member is upgraded first, allowed to rejoin the mirror, then the primary is failed over and upgraded (rolling upgrade approach minimizes downtime)
Application Compatibility
- Deprecated API Usage: Review applications for deprecated ObjectScript functions or SQL syntax that may still work but could impact performance or future compatibility
- Class Recompilation: Class definitions may need recompilation after upgrade to take advantage of new storage optimizations or index implementations
- Custom Collation: Globals using custom collation must be validated for compatibility
Testing and Validation
- Pre-Production Testing: Upgrade impact testing should be performed in non-production environments first
- Validation Checks: Verify database access performance meets expectations, application functionality remains intact, and backup/restore procedures work correctly
- Integrity Checks: Run ^INTEGRIT or DBCHECK utilities post-upgrade to verify database structure consistency
- Rollback Planning: Include rollback procedures in case upgrade issues arise, requiring restoration from pre-upgrade backups
Documentation References
7. Addresses security requirements in storage design
Key Points
- Database resources: Assign resource names to control read/write/admin access via roles
- Database encryption: Enable encryption at database creation for data-at-rest protection
- Stream location security: Configure stream directories with appropriate file system permissions
- Mount read-only: Prevent modifications by mounting databases in read-only mode
- Audit database isolation: IRISAUDIT database must be separate and protected
Detailed Notes
Overview
Security must be integrated into database architecture from initial design through operational deployment.
Resource-Based Access Control
- Database Resources: Each database is associated with a resource (named %DB_databasename by default) that controls access permissions
- Permission Types: Users and roles are granted permissions to database resources; for databases, the relevant permissions are Read and Write
- Least Privilege Principle: Assign minimal necessary permissions to each role, and assign roles to users based on their job functions
Database Encryption
- Purpose: Protects data at rest from unauthorized access if physical storage media is compromised
- Configuration Timing: Encryption must be enabled during database creation; it cannot be added or removed afterward
- Key Management: Encryption keys must be available during database mount, requiring secure key storage and distribution mechanisms
- Mirroring Requirement: Mirrored databases require consistent encryption settings across all mirror members, with coordinated key management
- Performance Impact: Typically less than 5% for most workloads due to hardware acceleration (AES-NI instructions)
Stream Data Security
- Default Storage: Streams are stored in subdirectories under the database directory
- Custom Locations: Custom stream locations can be specified
- File System Permissions: Stream directories must have appropriate permissions to prevent unauthorized access
- Sensitive Data: For highly sensitive data, streams should be stored on encrypted file systems or within the database itself
Audit and Special Considerations
- IRISAUDIT Database: Must be protected from modification by application code and users; should be on separate physical storage from application databases
- Read-Only Mounting: Prevents all modifications, useful for reference databases, archived data, or reporting databases
- Network Security: Address network-level protection for remote databases accessed via ECP, including TLS encryption and network access controls
Documentation References
8. Evaluates interoperability functionality costs and benefits
Key Points
- Message database overhead: Interoperability adds message storage and routing requirements to the production namespace's databases
- Message persistence: All messages stored persistently, increasing database size and I/O
- Production configuration: Separate namespaces for interoperability productions recommended
- Purge management: Message purge schedules critical to control database growth
- ECP for productions: Distribute production processing across application servers for scalability
Detailed Notes
Overview
InterSystems IRIS interoperability functionality (productions, business services, operations, and processes) introduces significant database architecture considerations due to message persistence requirements.
Message Storage Implications
- Persistent Storage: All messages flowing through productions are stored persistently in the production namespace's default globals database (ENSLIB holds the read-only interoperability code, not message data), enabling message resend, replay, and auditing
- Database Growth: High-throughput productions processing millions of messages daily can generate gigabytes of database growth
- Capacity Planning: Requires separate capacity planning for message storage distinct from application data storage
Cost-Benefit Analysis
Costs:
- Performance overhead of message persistence (every message requires database writes for headers, bodies, and search tables)
Benefits:
- Guaranteed message delivery
- Automatic retry on failure
- Complete message audit trails
Purge Management
Message purge management becomes a critical operational task:
- Purge schedules must balance retention requirements for compliance and troubleshooting against database size constraints
- Automated purge tasks should run regularly to remove messages beyond retention periods
- Consider archiving critical messages to separate storage before purge
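The retention decision in such a purge pass can be sketched as follows. The message records and the archive flag are hypothetical; real productions use the built-in purge task, and this only illustrates the keep/archive/drop split:

```python
from datetime import datetime, timedelta

def purge(messages, retention_days, now=None):
    """Partition messages into those kept (inside the retention window),
    those archived before deletion, and those dropped outright."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=retention_days)
    keep, archive, drop = [], [], []
    for msg in messages:
        if msg["created"] >= cutoff:
            keep.append(msg)
        elif msg.get("archive"):
            archive.append(msg)  # copy to separate storage before deletion
        else:
            drop.append(msg)
    return keep, archive, drop
```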
Architectural Best Practices
- Namespace Isolation: Deploy interoperability productions in dedicated namespaces separate from application namespaces
- Physical Storage Separation: For high-volume productions, place message database and application database on separate physical storage for I/O parallelism
- ECP Scaling: Production processing on application servers with message storage on data servers provides horizontal scalability
Security and Mirroring Considerations
- Message Security: Message data may contain sensitive information requiring encryption and access controls
- Configuration Protection: Production configuration (routing rules, transformations, credentials) must be protected from unauthorized modification
- Mirroring Trade-offs: Consider whether message databases should be mirrored (for high availability) or excluded (for performance, accepting message loss during failover)
Documentation References
9. Assesses BI (Business Intelligence) integration tradeoffs
Key Points
- Cube data storage: OLAP cubes store aggregated data in separate globals and databases
- Source data impact: Cube builds query source data, creating I/O load on operational databases
- Build scheduling: Balance cube freshness against impact to production workloads
- Async member for BI: Use disaster recovery async as read-only BI query target
- Fact table design: Denormalized fact tables improve cube build and query performance
Detailed Notes
Overview
Business Intelligence integration in InterSystems IRIS using IntegratedML, SQL queries, or InterSystems IRIS Business Intelligence introduces architectural tradeoffs between analytical capabilities and operational performance. BI queries typically involve large data scans, aggregations, and complex joins that consume significant CPU and I/O resources.
OLAP Cube Storage
- Pre-Aggregated Data: OLAP cubes store pre-aggregated data in cube-specific globals, creating additional database storage requirements
- Cube Build Impact: Cube builds query source data to calculate aggregations, creating substantial I/O load and CPU consumption
- Build Frequency Trade-offs:
- Real-time synchronization: Current data but impacts operational performance
- Scheduled builds (nightly, hourly): Reduced operational impact but stale data
- Incremental builds: Update only changed data, reducing build time and resource consumption
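The incremental-build idea reduces to folding only changed rows into existing aggregates rather than rescanning every source row. A minimal sketch with hypothetical structures, not the BI engine's internals:

```python
def full_build(rows):
    """Rebuild aggregates from scratch by scanning all source rows."""
    totals = {}
    for r in rows:
        totals[r["facility"]] = totals.get(r["facility"], 0) + r["amount"]
    return totals

def incremental_build(totals, changed_rows):
    """Fold only rows changed since the last build into the aggregates,
    avoiding a full scan of the source data."""
    for r in changed_rows:
        totals[r["facility"]] = totals.get(r["facility"], 0) + r["amount"]
    return totals
```

An incremental pass over yesterday's changed rows yields the same totals as a full rebuild, at a fraction of the I/O cost — which is exactly the trade captured in the list above.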
Architectural Patterns for BI Integration
- Async Mirror for Reporting: Use disaster recovery async mirror members as read-only reporting targets; allows intensive BI queries without impacting production users (requires acceptance of data latency)
- Dedicated Reporting Databases: Populate via ETL processes for full isolation of BI from operational systems
Database Design Considerations
Denormalization Trade-offs:
- Normalized Schemas: Reduce operational database size and maintain consistency but require complex joins for BI queries
- Denormalized Fact Tables: Optimize BI query performance at the cost of increased storage and update complexity
Additional Considerations:
- Columnar Storage: Can accelerate analytical queries by storing column data contiguously but requires additional storage overhead
- Index Balancing: Must balance operational query needs against BI query patterns, potentially requiring BI-specific indexes that impact operational insert/update performance
Sharded Architectures
Sharded architectures complicate BI: analytical queries may need to aggregate data across all shard nodes, requiring compute nodes to coordinate distributed queries. The namespace-level sharding architecture provides a middle ground, supporting both transactional and analytical workloads on the same data without full sharding complexity.
Documentation References
Exam Preparation Summary
Critical Concepts to Master
- Namespace-Database Relationships: Understand how namespaces reference multiple databases, the difference between default database assignments and mappings, and when to use local vs. remote databases.
- Database Sizing and Growth: Know initial size, expansion increment, and maximum size settings; understand vertical vs. horizontal scaling approaches; recognize when multivolume databases are appropriate.
- Global Mapping Mechanics: Master subscript-level mapping syntax including ranges, wildcards, BEGIN/END tokens; understand mapping precedence over local content; recognize architectural patterns enabled by mappings.
- Mirroring Constraints: Remember that mirrored databases require journaling, block size is immutable, properties sync from primary to backup, and file streams are not mirrored.
- Configuration for Scale: Know key memory settings (database cache, routine buffer, gmheap), understand ECP configuration for distributed caching, recognize sharding use cases.
- Upgrade Preservation: Understand what is preserved (user databases, configuration) vs. what is upgraded (system databases) during version upgrades.
- Security Integration: Know resource-based database access control, encryption requirements and immutability, read-only mounting, and audit database protection.
- Interoperability Overhead: Recognize message persistence storage requirements, understand purge management importance, know production namespace isolation best practices.
- BI Workload Separation: Understand cube storage requirements, build performance impact, async member usage for reporting, and denormalization tradeoffs.
Common Exam Scenarios
- Scenario: Application with predictable 20% annual data growth over 5 years
- Scenario: Multi-tenant SaaS application with 100 tenants
- Scenario: High-availability requirement with RPO < 1 second
- Scenario: Database experiencing performance degradation as data grows
- Scenario: Regulatory requirement for data-at-rest encryption
- Scenario: Version upgrade from IRIS 2023.x to 2025.x
- Scenario: Interoperability production processing 10 million messages/day
- Scenario: BI cube builds impacting production transaction performance
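The first scenario above is simple compounding. A quick projection sketch (the 100 GB starting size is assumed for illustration):

```python
def projected_size_gb(current_gb, annual_growth, years):
    """Compound annual growth projection for capacity planning."""
    return current_gb * (1 + annual_growth) ** years

# 100 GB today at 20%/year for 5 years -> ~249 GB, so plan maximum
# size, expansion settings, and storage for roughly 2.5x current capacity
print(round(projected_size_gb(100, 0.20, 5), 1))
```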
Hands-On Practice Recommendations
- Create and configure namespaces with multiple database mappings; test global access across mappings
- Configure database sizing with expansion and maximum limits; monitor expansion behavior under load
- Implement global mappings at subscript level; verify mapping precedence and wildcard behavior
- Set up a mirror configuration; add databases to mirror; test failover scenarios
- Configure ECP with application server and data server; monitor network performance
- Enable database encryption; manage encryption keys; measure performance impact
- Deploy an interoperability production; monitor message database growth; configure purge tasks
- Build OLAP cubes from operational data; measure build time and resource consumption; schedule incremental builds
Key Documentation Sections to Review
- GSA.pdf: Sections 3 (Namespaces), 4 (Mappings), 5 (Database Configuration), 18 (Database Maintenance)
- GHA.pdf: Section 2 (Mirroring Overview), Section 4 (Mirror Configuration), Section 5 (Managing Mirrors)
- GSCALE.pdf: Section 1 (Scaling Overview), Section 2 (ECP/Distributed Caching), Section 8 (Sharding)
- GGBL.pdf: Section 2 (Global Structure), Section 4 (Global Mappings)
- GOBJ.pdf: Section 11 (Persistent Classes), Section 12 (Storage Definitions), Section 13 (Global Names)
Memory Aids
- Namespace = Logical Container: Databases are physical; namespaces provide logical organization
- Mirroring = Journaling Required: If it mirrors, it must journal
- Mappings Override Locals: Mapped content "wins" over local content with same name
- Encryption = Immutable: Set at creation; cannot add or remove later
- Scale Up Then Out: Try vertical scaling before horizontal scaling complexity
- BI ≠ OLTP: Separate analytical workloads from operational transactions