1. Determines which databases to include in namespaces
Key Points
- Namespace structure: References default databases for globals, routines, and temporary data
- Database separation: Isolate application data, code, and system components logically
- Local vs. Remote: Choose between local databases or remote databases via ECP connections
- Copy from existing: Duplicate namespace configurations to maintain consistency across environments
- Mapping override: Use global, routine, and package mappings to access data across databases
Detailed Notes
Overview
A namespace in InterSystems IRIS serves as a logical workspace that references multiple databases for different purposes. When designing namespace architecture, developers must determine which databases to include based on data access patterns, security requirements, and application boundaries.
Namespace Components
- Default Database for Globals: Required for data storage
- Default Database for Routines: Optionally separate for code storage
- Temporary Database: Can reference IRISTEMP or a custom temporary database
Best Practices for Database Separation
- Functional Domain Isolation: Application data should reside in dedicated databases separate from code databases, allowing independent backup schedules and security policies
- System Database Protection: IRISSYS and IRISLIB provide core functionality and should not store application data
- Shared Reference Data: When multiple applications share common reference data, store it in a shared database and map into multiple namespaces
- Remote Databases: Use Enterprise Cache Protocol (ECP) for distributed architectures where application servers access data on remote data servers
- Environment Consistency: The "Copy from" feature during namespace creation allows rapid deployment of consistent namespace configurations across development, test, and production environments
Documentation References
2. Recommends architecture based on data growth expectations
Key Points
- Initial sizing: Configure initial database size, expansion increment, and maximum size limits
- Vertical scaling: Add CPU, memory, and disk resources to existing servers for growth
- Horizontal scaling: Distribute data across multiple servers using ECP or sharding
- Multivolume databases: Automatically span multiple disk volumes when size thresholds are reached
- Growth monitoring: Track database size, free space, and expansion patterns over time
Detailed Notes
Overview
Database architecture must anticipate data growth to avoid performance degradation and system outages. When creating databases, administrators configure three critical size parameters.
Critical Size Parameters
- Initial Size: Minimum of 1 MB; sized for expected initial data volume
- Expansion Increment: Default 12% of current size or 10 MB, whichever is larger
- Maximum Size: Default unlimited unless explicitly set
For predictable growth, setting a non-zero expansion value prevents frequent small expansions that can fragment the database.
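The default expansion rule reduces to simple arithmetic. The sketch below is illustrative only — the engine applies its own rounding and free-space checks — but it captures the rule stated above:

```python
def next_expansion_mb(current_size_mb, expansion_mb=0):
    """Size of the next expansion, in MB.

    With the default setting of 0, the database grows by 12% of its
    current size or 10 MB, whichever is larger; a non-zero setting is
    used as a fixed increment instead.
    """
    if expansion_mb > 0:
        return expansion_mb
    return max(0.12 * current_size_mb, 10)

# e.g. a 1 GB database expands by ~123 MB by default,
# while a 50 MB database expands by the 10 MB floor
```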
Vertical Scaling Approaches
- Memory Increases: Improves database cache efficiency
- Faster Disk Subsystems: Improves I/O throughput
- Additional CPU Cores: Supports more concurrent users
However, vertical scaling has practical limits based on hardware maximums and cost constraints.
Horizontal Scaling Options
- Enterprise Cache Protocol (ECP): Separates application processing from data storage, allowing multiple application servers to access centralized data servers
- Sharding: Partitions data across multiple data nodes, enabling linear scalability for massive datasets
- Multivolume Databases: Automatically create new volume files when the current volume reaches a configured threshold size, allowing databases to span multiple file systems or storage arrays without application disruption
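The multivolume threshold behavior can be modeled with a small sketch (hypothetical helper; in practice the engine creates and names the volume files itself, on whichever directories are configured):

```python
def volume_layout(total_size_mb, threshold_mb):
    """Split a database of total_size_mb across volume files, starting a
    new volume each time the current one reaches the threshold size."""
    volumes = []
    remaining = total_size_mb
    while remaining > 0:
        size = min(remaining, threshold_mb)
        volumes.append(size)
        remaining -= size
    return volumes

# A 25 GB database with a 10 GB threshold spans three volumes:
# volume_layout(25_000, 10_000) -> [10000, 10000, 5000]
```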
Cloud and Monitoring Considerations
For applications with unpredictable growth patterns, cloud deployments with elastic storage and compute resources provide flexibility to scale up or down based on actual demand. Monitoring database free space, expansion frequency, and I/O wait times provides early warning of capacity constraints requiring architectural changes.
Documentation References
3. Structures data appropriately for global mappings
Key Points
- Subscript-level mapping: Map specific global subscript ranges to different databases
- Wildcard mappings: Use asterisk (*) to map multiple globals with common naming patterns
- Range specifications: Define subscript ranges using (BEGIN):(END) or literal values
- Mapping priority: Mapped content overrides local content with identical identifiers
- Cross-database access: Enable transparent data access across physical database boundaries
Detailed Notes
Overview
Global mappings provide a powerful mechanism for structuring data across multiple databases while maintaining transparent application access. When designing global structure for mapping, developers should organize data hierarchically with logical separation at the top-level subscript, enabling efficient subscript-level mappings.
Example: Facility-Based Partitioning
Storing patient data by facility in ^Patient(facilityID, patientID) allows:
- Mapping ^Patient(1) through ^Patient(100) to one database
- Mapping ^Patient(101) through ^Patient(200) to another database
- Effectively partitioning data by facility across databases
Subscript Mapping Syntax
- Numeric Ranges: (1):(100) maps numeric subscript ranges
- Alphabetic Ranges: ("A"):("M") maps alphabetic ranges
- Complex Multi-Level: ("B",23,"m"):("E",5) maps complex multi-level subscript hierarchies
- Special Tokens: (BEGIN) and (END) represent the first and last possible subscript values in collation order
- Wildcard Mappings: ^ABC* maps all globals beginning with ABC to a specified database
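Resolution against subscript-level mappings can be sketched in a few lines. This is a hypothetical resolver for the ^Patient facility example above — real mappings follow the database's collation order, simplified here to numeric comparison, and the database names are invented:

```python
# Hypothetical mapping table for ^Patient(facilityID, patientID)
MAPPINGS = [
    (1, 100, "FACILITY-A-DB"),    # ^Patient(1) through ^Patient(100)
    (101, 200, "FACILITY-B-DB"),  # ^Patient(101) through ^Patient(200)
]
DEFAULT_DB = "APPDATA"  # the namespace's default globals database

def resolve_database(facility_id):
    """Return the database that stores ^Patient(facility_id)."""
    for begin, end, db in MAPPINGS:
        if begin <= facility_id <= end:
            return db
    return DEFAULT_DB  # unmapped subscripts fall back to the default

# resolve_database(42) -> "FACILITY-A-DB"
# resolve_database(150) -> "FACILITY-B-DB"
```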
Mapping Precedence
Mapping precedence is critical to understand: mapped globals take priority over local globals of the same name, potentially hiding local data. Mappings should therefore be as specific as possible to avoid unintended conflicts.
Architectural Patterns Enabled by Mappings
- Data Archival: Map historical subscript ranges to archive databases
- Multi-Tenancy: Map tenant-specific subscript ranges to isolated databases
- Data Partitioning: Distribute large globals across multiple databases for I/O parallelism
- Distributed Architectures: Combine with ECP remote database access to run application logic on application servers while data resides on dedicated data servers
Documentation References
4. Assesses mirroring implications for database design
Key Points
- Mirror membership: Only databases added to the mirror configuration are replicated
- Journaling required: Mirrored databases must have journaling enabled for synchronization
- Property synchronization: Database size and configuration sync from primary to backup/async members
- Block size immutability: Cannot change block size after database creation in mirrored environments
- Failover considerations: Design for automatic failover with mount-at-startup and resource settings
Detailed Notes
Overview
Database mirroring in InterSystems IRIS provides high availability through automatic failover between primary and backup instances, with optional disaster recovery async members. When designing databases for mirrored environments, several architectural constraints and considerations apply.
Key Constraints
- Explicit Configuration: Databases must be explicitly added to the mirror configuration; merely existing on a mirror member does not enable mirroring
- Journaling Requirement: Journaling is mandatory for mirrored databases because synchronization relies on journal file transmission from primary to backup members
- Block Size Immutability: Block size cannot be changed after database creation, so initial block size selection (default 8KB) must consider long-term requirements
- Encryption Status: The encryption status of a mirrored database cannot be changed after creation, requiring careful planning during initial database configuration
Property Synchronization
Database properties on backup and async members synchronize automatically from the primary, including current size, expansion settings, and maximum size. This means manual size adjustments on non-primary members may be overwritten.
Failover Considerations
- Mount Required at Startup: Should be enabled to ensure the instance will not become primary unless all required databases mount successfully and complete journal recovery
- Resource-Based Security: Settings must be consistent across all mirror members to maintain access control during failover
- Stream Data: File streams are not automatically mirrored, requiring separate replication mechanisms if critical
- Multivolume Support: Mirrored databases support multivolume configuration, but volume expansion occurs independently on each mirror member based on its local disk availability
Database architecture should minimize dependencies on non-mirrored components to ensure complete application availability after failover.
Documentation References
5. Evaluates configuration settings for scaling applications
Key Points
- Database cache sizing: Allocate memory to minimize disk I/O for working set
- Lock table configuration: Size lock structures for concurrent transaction volume
- Journal settings: Balance data protection with journal write performance impact
- ECP configuration: Tune network buffers and connection parameters for distributed caching
- Sharding parameters: Configure data nodes, compute nodes, and shard key design
Detailed Notes
Overview
Scaling InterSystems IRIS applications requires careful configuration of multiple system parameters that interact to determine overall performance and capacity.
Database Cache Configuration
- Purpose: Database cache (global buffer pool) should be sized to hold the application's working set in memory, minimizing physical disk reads
- Undersized Impact: Causes excessive disk I/O and poor performance
- Oversized Impact: Wastes memory that could support more application processes
- Monitoring: Cache efficiency can be monitored through database cache hit ratios, with targets typically above 95% for OLTP workloads
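The hit-ratio check itself is simple arithmetic. A sketch (the actual read counters come from the instance's performance metrics; the figures below are hypothetical):

```python
def cache_hit_ratio(logical_reads, physical_reads):
    """Fraction of global references satisfied from the buffer pool.

    logical_reads counts all global references; physical_reads counts
    those that required a disk read.
    """
    if logical_reads == 0:
        return 1.0
    return 1 - physical_reads / logical_reads

# Flag an undersized cache against the ~95% OLTP target noted above.
ratio = cache_hit_ratio(logical_reads=1_000_000, physical_reads=80_000)
print(f"{ratio:.1%}", "OK" if ratio >= 0.95 else "cache likely undersized")
# prints: 92.0% cache likely undersized
```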
Key Memory Settings
- Lock Table Size: Determines how many concurrent locks the system can manage; applications with high transaction concurrency or long-held locks require larger lock tables
- Routine Buffer Pool: Stores compiled routine and class code in memory; adequate sizing reduces compilation overhead and code loading time
- Generic Memory Heap (gmheap): Provides shared memory for system structures; should be sized based on the number of concurrent processes and ECP connections
Distributed Architecture Tuning
For distributed architectures using ECP, network buffer sizes and connection pools must be tuned based on network latency and bandwidth. Application servers cache data from data servers; sizing application server caches based on locality of reference patterns improves performance.
Sharded Cluster Decisions
- Data Nodes: Number of data nodes for partitioning
- Compute Nodes: Deployment for workload separation
- Shard Key Selection: Critical for data distribution
- Rebalancing Strategies: Required as data volumes grow
Journaling Considerations
- Journaling Requirement: Journaling is required for transactions and cannot be disabled in most environments
- Journal Buffers: Journal buffers are written to disk every 2 seconds automatically
- Non-Journaled Databases: Use non-journaled databases (like IRISTEMP) only for data that can be recreated, such as print queues, temporary data, or session caches
- Production Systems: Always enable journaling for databases containing critical data to ensure data protection and recovery capabilities
Capacity Planning
Configuration parameters should be established through capacity planning based on expected workload characteristics (transactions per second, concurrent users, data volume) and validated through performance testing under realistic load conditions.
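A back-of-envelope working-set estimate can anchor such planning. All figures and the overhead factor below are hypothetical assumptions for illustration, not vendor sizing guidance:

```python
def working_set_gb(rows, avg_row_bytes, hot_fraction=0.2, overhead=1.3):
    """Estimate database cache (GB) needed to hold the 'hot' fraction
    of the data, with an overhead factor for indexes and block headers."""
    return rows * avg_row_bytes * hot_fraction * overhead / 1024**3

# 500M rows at 300 bytes each, 20% hot -> ~36 GB of database cache
print(round(working_set_gb(500_000_000, 300), 1))
```

Estimates like this set a starting point; the cache hit ratio under realistic load testing then confirms or corrects it.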
Documentation References
6. Considers upgrade impacts on database architecture
Key Points
- In-place upgrades: Database files and configuration preserved during version upgrades
- System database conversion: IRISSYS and IRISLIB upgraded automatically to new version format
- Application compatibility: User databases remain compatible across InterSystems IRIS versions
- Feature deprecation: Review deprecated features that may affect database access patterns
- Backup before upgrade: Always backup databases before performing version upgrades
Detailed Notes
Overview
InterSystems IRIS upgrades preserve existing database architecture and data while updating system software and system databases.
What Gets Preserved
During an in-place upgrade using the REINSTALL=ALL command-line property, the installer preserves:
- The iris.cpf configuration file
- All user databases (their IRIS.DAT files)
- Application code
- Namespace configurations
System databases IRISSYS and IRISLIB are upgraded to the new version's format, which may involve internal structure changes to support new features or improved performance.
Database Architecture Considerations
- Block Size Compatibility: Databases created with non-standard block sizes (larger than 8KB) must be validated for continued support in the new version
- Encryption Settings: Keys must be preserved during upgrades, requiring coordination with key management systems
- Mirrored Database Procedures: Typically, the backup member is upgraded first, allowed to rejoin the mirror, then the primary is failed over and upgraded (rolling upgrade approach minimizes downtime)
Application Compatibility
- Deprecated API Usage: Review applications for deprecated ObjectScript functions or SQL syntax that may still work but could impact performance or future compatibility
- Class Recompilation: Class definitions may need recompilation after upgrade to take advantage of new storage optimizations or index implementations
- Custom Collation: Globals using custom collation must be validated for compatibility
Testing and Validation
- Pre-Production Testing: Upgrade impact testing should be performed in non-production environments first
- Validation Checks: Verify database access performance meets expectations, application functionality remains intact, and backup/restore procedures work correctly
- Integrity Checks: Run ^INTEGRIT or DBCHECK utilities post-upgrade to verify database structure consistency
- Rollback Planning: Include rollback procedures in case upgrade issues arise, requiring restoration from pre-upgrade backups
Documentation References
7. Addresses security requirements in storage design
Key Points
- Database resources: Assign resource names to control read/write/admin access via roles
- Database encryption: Enable encryption at database creation for data-at-rest protection
- Stream location security: Configure stream directories with appropriate file system permissions
- Mount read-only: Prevent modifications by mounting databases in read-only mode
- Audit database isolation: IRISAUDIT database must be separate and protected
Detailed Notes
Overview
Security must be integrated into database architecture from initial design through operational deployment.
Resource-Based Access Control
- Database Resources: Each database is associated with a resource (named %DB_databasename by default) that controls access permissions
- Permission Types: Users and roles are granted permissions to database resources; for databases, the relevant permissions are Read and Write
- Least Privilege Principle: Assign minimal necessary permissions to each role, and assign roles to users based on their job functions
Database Encryption
- Purpose: Protects data at rest from unauthorized access if physical storage media is compromised
- Configuration Timing: Encryption must be enabled during database creation; it cannot be added or removed afterward
- Key Management: Encryption keys must be available during database mount, requiring secure key storage and distribution mechanisms
- Mirroring Requirement: Mirrored databases require consistent encryption settings across all mirror members, with coordinated key management
- Performance Impact: Typically less than 5% for most workloads due to hardware acceleration (AES-NI instructions)
Stream Data Security
- Default Storage: Streams are stored in subdirectories under the database directory
- Custom Locations: Custom stream locations can be specified
- File System Permissions: Stream directories must have appropriate permissions to prevent unauthorized access
- Sensitive Data: For highly sensitive data, streams should be stored on encrypted file systems or within the database itself
Audit and Special Considerations
- IRISAUDIT Database: Must be protected from modification by application code and users; should be on separate physical storage from application databases
- Read-Only Mounting: Prevents all modifications, useful for reference databases, archived data, or reporting databases
- Network Security: Address network-level protection for remote databases accessed via ECP, including TLS encryption and network access controls
Documentation References
8. Evaluates interoperability functionality costs and benefits
Key Points
- Message database overhead: Interoperability adds message storage and routing requirements to the production namespace's databases
- Message persistence: All messages stored persistently, increasing database size and I/O
- Production configuration: Separate namespaces for interoperability productions recommended
- Purge management: Message purge schedules critical to control database growth
- ECP for productions: Distribute production processing across application servers for scalability
Detailed Notes
Overview
InterSystems IRIS interoperability functionality (productions, business services, operations, and processes) introduces significant database architecture considerations due to message persistence requirements.
Message Storage Implications
- Persistent Storage: All messages flowing through productions are stored persistently in the production namespace's default globals database (ENSLIB holds the read-only interoperability code, not message data), enabling message resend, replay, and auditing
- Database Growth: High-throughput productions processing millions of messages daily can generate gigabytes of database growth
- Capacity Planning: Requires separate capacity planning for message storage distinct from application data storage
Cost-Benefit Analysis
Costs:
- Performance overhead of message persistence (every message requires database writes for headers, bodies, and search tables)
Benefits:
- Guaranteed message delivery
- Automatic retry on failure
- Complete message audit trails
Purge Management
Message purge management becomes a critical operational task:
- Purge schedules must balance retention requirements for compliance and troubleshooting against database size constraints
- Automated purge tasks should run regularly to remove messages beyond retention periods
- Consider archiving critical messages to separate storage before purge
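The retention decision in such a purge pass can be sketched as follows. The message records and the archive flag are hypothetical; real productions use the built-in purge task, and this only illustrates the keep/archive/drop split:

```python
from datetime import datetime, timedelta

def purge(messages, retention_days, now=None):
    """Partition messages into those kept (inside the retention window),
    those archived before deletion, and those dropped outright."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=retention_days)
    keep, archive, drop = [], [], []
    for msg in messages:
        if msg["created"] >= cutoff:
            keep.append(msg)
        elif msg.get("archive"):
            archive.append(msg)  # copy to separate storage before deletion
        else:
            drop.append(msg)
    return keep, archive, drop
```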
Architectural Best Practices
- Namespace Isolation: Deploy interoperability productions in dedicated namespaces separate from application namespaces
- Physical Storage Separation: For high-volume productions, place message database and application database on separate physical storage for I/O parallelism
- ECP Scaling: Production processing on application servers with message storage on data servers provides horizontal scalability
Security and Mirroring Considerations
- Message Security: Message data may contain sensitive information requiring encryption and access controls
- Configuration Protection: Production configuration (routing rules, transformations, credentials) must be protected from unauthorized modification
- Mirroring Trade-offs: Consider whether message databases should be mirrored (for high availability) or excluded (for performance, accepting message loss during failover)
Documentation References
9. Assesses BI (Business Intelligence) integration tradeoffs
Key Points
- Cube data storage: OLAP cubes store aggregated data in separate globals and databases
- Source data impact: Cube builds query source data, creating I/O load on operational databases
- Build scheduling: Balance cube freshness against impact to production workloads
- Async member for BI: Use disaster recovery async as read-only BI query target
- Fact table design: Denormalized fact tables improve cube build and query performance
Detailed Notes
Overview
Business Intelligence integration in InterSystems IRIS using IntegratedML, SQL queries, or InterSystems IRIS Business Intelligence introduces architectural tradeoffs between analytical capabilities and operational performance. BI queries typically involve large data scans, aggregations, and complex joins that consume significant CPU and I/O resources.
OLAP Cube Storage
- Pre-Aggregated Data: OLAP cubes store pre-aggregated data in cube-specific globals, creating additional database storage requirements
- Cube Build Impact: Cube builds query source data to calculate aggregations, creating substantial I/O load and CPU consumption
- Build Frequency Trade-offs:
- Real-time synchronization: Current data but impacts operational performance
- Scheduled builds (nightly, hourly): Reduced operational impact but stale data
- Incremental builds: Update only changed data, reducing build time and resource consumption
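The incremental-build idea reduces to folding only changed rows into existing aggregates rather than rescanning every source row. A minimal sketch with hypothetical structures, not the BI engine's internals:

```python
def full_build(rows):
    """Rebuild aggregates from scratch by scanning all source rows."""
    totals = {}
    for r in rows:
        totals[r["facility"]] = totals.get(r["facility"], 0) + r["amount"]
    return totals

def incremental_build(totals, changed_rows):
    """Fold only rows changed since the last build into the aggregates,
    avoiding a full scan of the source data."""
    for r in changed_rows:
        totals[r["facility"]] = totals.get(r["facility"], 0) + r["amount"]
    return totals
```

An incremental pass over yesterday's changed rows yields the same totals as a full rebuild, at a fraction of the I/O cost — which is exactly the trade captured in the list above.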
Architectural Patterns for BI Integration
- Async Mirror for Reporting: Use disaster recovery async mirror members as read-only reporting targets; allows intensive BI queries without impacting production users (requires acceptance of data latency)
- Dedicated Reporting Databases: Populate via ETL processes for full isolation of BI from operational systems
Database Design Considerations
Denormalization Trade-offs:
- Normalized Schemas: Reduce operational database size and maintain consistency but require complex joins for BI queries
- Denormalized Fact Tables: Optimize BI query performance at the cost of increased storage and update complexity
Additional Considerations:
- Columnar Storage: Can accelerate analytical queries by storing column data contiguously but requires additional storage overhead
- Index Balancing: Must balance operational query needs against BI query patterns, potentially requiring BI-specific indexes that impact operational insert/update performance
Sharded Architectures
Sharded architectures complicate BI: analytical queries may need to aggregate data across all shard nodes, requiring compute nodes to coordinate distributed queries. The namespace-level sharding architecture provides a middle ground, supporting both transactional and analytical workloads on the same data without full sharding complexity.
Documentation References
Exam Preparation Summary
Critical Concepts to Master
- Namespace-Database Relationships: Understand how namespaces reference multiple databases, the difference between default database assignments and mappings, and when to use local vs. remote databases.
- Database Sizing and Growth: Know initial size, expansion increment, and maximum size settings; understand vertical vs. horizontal scaling approaches; recognize when multivolume databases are appropriate.
- Global Mapping Mechanics: Master subscript-level mapping syntax including ranges, wildcards, BEGIN/END tokens; understand mapping precedence over local content; recognize architectural patterns enabled by mappings.
- Mirroring Constraints: Remember that mirrored databases require journaling, block size is immutable, properties sync from primary to backup, and file streams are not mirrored.
- Configuration for Scale: Know key memory settings (database cache, routine buffer, gmheap), understand ECP configuration for distributed caching, recognize sharding use cases.
- Upgrade Preservation: Understand what is preserved (user databases, configuration) vs. what is upgraded (system databases) during version upgrades.
- Security Integration: Know resource-based database access control, encryption requirements and immutability, read-only mounting, and audit database protection.
- Interoperability Overhead: Recognize message persistence storage requirements, understand purge management importance, know production namespace isolation best practices.
- BI Workload Separation: Understand cube storage requirements, build performance impact, async member usage for reporting, and denormalization tradeoffs.
Common Exam Scenarios
- Scenario: Application with predictable 20% annual data growth over 5 years
- Scenario: Multi-tenant SaaS application with 100 tenants
- Scenario: High-availability requirement with RPO < 1 second
- Scenario: Database experiencing performance degradation as data grows
- Scenario: Regulatory requirement for data-at-rest encryption
- Scenario: Version upgrade from IRIS 2023.x to 2025.x
- Scenario: Interoperability production processing 10 million messages/day
- Scenario: BI cube builds impacting production transaction performance
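The first scenario above is simple compounding. A quick projection sketch (the 100 GB starting size is assumed for illustration):

```python
def projected_size_gb(current_gb, annual_growth, years):
    """Compound annual growth projection for capacity planning."""
    return current_gb * (1 + annual_growth) ** years

# 100 GB today at 20%/year for 5 years -> ~249 GB, so plan maximum
# size, expansion settings, and storage for roughly 2.5x current capacity
print(round(projected_size_gb(100, 0.20, 5), 1))
```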
Hands-On Practice Recommendations
- Create and configure namespaces with multiple database mappings; test global access across mappings
- Configure database sizing with expansion and maximum limits; monitor expansion behavior under load
- Implement global mappings at subscript level; verify mapping precedence and wildcard behavior
- Set up a mirror configuration; add databases to mirror; test failover scenarios
- Configure ECP with application server and data server; monitor network performance
- Enable database encryption; manage encryption keys; measure performance impact
- Deploy an interoperability production; monitor message database growth; configure purge tasks
- Build OLAP cubes from operational data; measure build time and resource consumption; schedule incremental builds
Key Documentation Sections to Review
- GSA.pdf: Sections 3 (Namespaces), 4 (Mappings), 5 (Database Configuration), 18 (Database Maintenance)
- GHA.pdf: Section 2 (Mirroring Overview), Section 4 (Mirror Configuration), Section 5 (Managing Mirrors)
- GSCALE.pdf: Section 1 (Scaling Overview), Section 2 (ECP/Distributed Caching), Section 8 (Sharding)
- GGBL.pdf: Section 2 (Global Structure), Section 4 (Global Mappings)
- GOBJ.pdf: Section 11 (Persistent Classes), Section 12 (Storage Definitions), Section 13 (Global Names)
Memory Aids
- Namespace = Logical Container: Databases are physical; namespaces provide logical organization
- Mirroring = Journaling Required: If it mirrors, it must journal
- Mappings Override Locals: Mapped content "wins" over local content with same name
- Encryption = Immutable: Set at creation; cannot add or remove later
- Scale Up Then Out: Try vertical scaling before horizontal scaling complexity
- BI ≠ OLTP: Separate analytical workloads from operational transactions