T1.3: Loads Data

Knowledge Review - InterSystems IRIS SQL Specialist

1. Uses the LOAD DATA command effectively

Key Points

  • Bulk Data Loading: LOAD DATA command populates tables from CSV files or JDBC sources with optimized performance
  • Atomic Operation: Entire load is rolled back on failure unless %NOJOURN option disables transactions
  • JVM Requirement: Uses Java-based engine requiring JVM installation and %Gateway_Object:USE privilege
  • Column Mapping: Supports positional or named column mapping with COLUMNS clause and VALUES specification
  • Error Tolerance: Skips invalid rows by default; maxerrors option controls failure threshold

Detailed Notes

Overview

The LOAD DATA command is InterSystems IRIS's high-performance utility for bulk loading data from external sources into SQL tables.

  • Performance: Java-based loading engine imports bulk data dramatically faster than row-by-row INSERT statements, reducing loads that would take hours to minutes or less
  • Prerequisite: Must first define the target table structure using CREATE TABLE (command only populates existing tables)
  • JVM Requirement: Requires Java Virtual Machine installation

Basic Syntax

The basic syntax is: `LOAD DATA FROM FILE 'filepath' INTO tablename`

  • Column matching: By default, matches source columns to target table columns by position using SQL column order (SELECT * order)
  • Header handling: If the source file contains a header row, skip it by setting the from.file.header option to true in the USING clause

USING Clause Configuration

LOAD DATA provides extensive configuration options through a JSON-formatted USING clause:

  • from.file.header: Skip first line as header (true/false)
  • from.file.skip: Number of lines to skip
  • from.file.columnseparator: Field delimiter (default is comma)
  • from.file.charset: Character encoding
  • from.file.escapechar: Escape character for literals
  • maxerrors: Maximum tolerable row errors before rollback

Example: `LOAD DATA FROM FILE 'data.csv' INTO MyTable USING {"from":{"file":{"header":true,"columnseparator":"\t"}}}`
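Because the USING clause is a raw JSON document, building it programmatically avoids quoting and escaping mistakes (for instance, a tab separator must appear as \t inside the JSON). A sketch in Python, using the option names listed above:

```python
import json

def build_using_clause(header=False, columnseparator=",", skip=0, charset=None):
    """Assemble the JSON document for a LOAD DATA ... USING clause."""
    file_opts = {"header": header, "columnseparator": columnseparator}
    if skip:
        file_opts["skip"] = skip
    if charset:
        file_opts["charset"] = charset
    return json.dumps({"from": {"file": file_opts}})

using = build_using_clause(header=True, columnseparator="\t")
# json.dumps escapes the tab correctly as \t inside the JSON string
sql = f"LOAD DATA FROM FILE 'data.csv' INTO MyTable USING {using}"
print(sql)
```

The same approach extends to any of the options above; anything not in the dictionary simply falls back to the loader's defaults.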

Advanced Column Mapping

For advanced column mapping:

  • COLUMNS clause: Specifies source column headers and data types
  • INTO clause: Can specify a subset of target columns
  • VALUES clause: Maps source headers to target columns
  • Capabilities: Load selected columns, reorder columns, or load same source column to multiple target columns

Example: `LOAD DATA FROM FILE 'people.csv' COLUMNS (name VARCHAR, age INT) INTO Sample.Person (Name, Age) VALUES (name, age)`
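The mapping behavior (load a subset, reorder columns, feed one source column into several targets) can be pictured with a small simulation. This is plain Python illustrating the semantics, not the loader itself, and the column names are hypothetical:

```python
import csv, io

def map_rows(source_csv, value_map):
    """Simulate LOAD DATA's VALUES mapping: value_map pairs each target
    column with the source header that fills it. The same source header
    may feed multiple target columns."""
    reader = csv.DictReader(io.StringIO(source_csv))
    return [{target: row[src] for target, src in value_map.items()}
            for row in reader]

source = "name,age\nAlice,30\nBob,25\n"
# Reorder columns and load 'name' into both Name and DisplayName.
rows = map_rows(source, {"Age": "age", "Name": "name", "DisplayName": "name"})
print(rows[0])
```

Any source header not named in the mapping is simply ignored, mirroring how the INTO clause can target a subset of columns.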

Performance Options

LOAD DATA supports two performance-enhancing options, which can be combined:

  • LOAD BULK DATA: Disables unique constraints, foreign keys, NOT NULL validation, and index building (requires the %NOCHECK, %NOINDEX, and %NOLOCK privileges)
  • LOAD %NOJOURN DATA: Suppresses journaling and transactions (requires the %NOJOURN privilege)
  • Combined: LOAD BULK %NOJOURN DATA for maximum speed
  • Warning: These options sacrifice data validation and recoverability; use them only for well-validated data

JDBC Data Sources

Loading from JDBC sources: `LOAD DATA FROM JDBC CONNECTION connectionName TABLE sourceTable INTO targetTable`

  • Requirement: Pre-configured JDBC connection
  • Use case: Bulk loading from external relational databases

Error Handling and Atomicity

  • Atomic operation: Entire load is rolled back on failure unless %NOJOURN option disables transactions
  • Row count: %ROWCOUNT variable contains count of successfully loaded rows
  • Invalid rows: Skipped during loading
  • Error logging: Detailed error information written to %SQL_Diag.Result and %SQL_Diag.Message diagnostic tables
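The skip-and-count behavior above can be sketched as a simplified model: invalid rows are skipped and recorded until the error count exceeds the maxerrors threshold, at which point the whole load fails (in IRIS the failed load is rolled back as a unit; this sketch only signals that with an exception). The validation rule here is illustrative:

```python
def load_rows(rows, validate, maxerrors=0):
    """Load rows, skipping invalid ones; abort once the error count
    exceeds maxerrors (modeling LOAD DATA's threshold and rollback)."""
    loaded, errors = [], []
    for i, row in enumerate(rows):
        try:
            validate(row)
        except ValueError as e:
            errors.append((i, str(e)))  # in IRIS: logged to %SQL_Diag.Message
            if len(errors) > maxerrors:
                raise RuntimeError("maxerrors exceeded; load rolled back")
            continue
        loaded.append(row)
    return loaded, errors  # len(loaded) plays the role of %ROWCOUNT

def validate(row):
    if not str(row.get("age", "")).isdigit():
        raise ValueError(f"bad age: {row.get('age')}")

good, errs = load_rows(
    [{"age": "30"}, {"age": "x"}, {"age": "25"}], validate, maxerrors=1)
print(len(good), len(errs))  # 2 rows loaded, 1 skipped
```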

2. Works with SQL diagnostics tables (%SQL_Diag schema)

Key Points

  • %SQL_Diag.Result: Stores diagnostic results for SQL DDL import and LOAD DATA operations
  • %SQL_Diag.Message: Contains detailed error messages, severity levels, and SQLCODE values
  • Automatic Logging: LOAD DATA errors are automatically written to diagnostic tables
  • Management Portal Access: View logs via System Operation > System Logs > SQL Diagnostics Logs
  • Relationship Structure: %SQL_Diag.Message has diagResult relationship to %SQL_Diag.Result

Detailed Notes

Overview

The %SQL_Diag schema provides comprehensive diagnostic logging for SQL operations in InterSystems IRIS, particularly for LOAD DATA commands and DDL import operations.

  • Purpose: Captures detailed information about SQL operation execution, errors, and performance characteristics
  • Structure: Two primary tables working together

%SQL_Diag.Result Table

The %SQL_Diag.Result table serves as the master diagnostic log:

  • Content: High-level information about SQL operations
  • Per-operation logging: Each LOAD DATA execution creates an entry
  • Recorded data: Operation status, timing, and overall results
  • Role: Acts as container for related diagnostic messages

%SQL_Diag.Message Table

The %SQL_Diag.Message table contains granular diagnostic details:

  • actor: User or process executing the operation
  • message: Descriptive text explaining the diagnostic event
  • messageTime: Timestamp when message was generated
  • severity: Indication of message importance or error level
  • sqlcode: SQLCODE value for errors or status codes
  • diagResult: Relationship property linking to parent %SQL_Diag.Result record
  • Storage: Underlying data stored in global ^IRIS.sql.diag.message

Error Logging During LOAD DATA

When LOAD DATA encounters errors during bulk loading:

  • Automatic logging: Detailed diagnostic information written automatically
  • Error types captured: Row-level errors, data type mismatches, constraint violations, format problems
  • Diagnostic detail: Captures which rows failed and why
  • Use with maxerrors: Particularly valuable when allowing LOAD DATA to skip problematic rows

Accessing Diagnostic Logs

Multiple interfaces available for accessing SQL diagnostic logs:

  • Management Portal: System Operation > System Logs > SQL Diagnostics Logs
  • Displays chronological log of LOAD DATA commands
  • Shows performance metrics and operation details for current namespace
  • Programmatic access: Query %SQL_Diag tables using SQL
  • Example: `SELECT * FROM %SQL_Diag.Message WHERE severity > 3 ORDER BY messageTime DESC`

Best Practices

  • Regularly monitor diagnostic logs after bulk load operations
  • Filter messages by severity to prioritize critical errors
  • Correlate %SQL_Diag.Message entries with parent %SQL_Diag.Result operations
  • Archive or purge old diagnostic data to prevent unbounded growth
  • Incorporate diagnostic queries into automated load workflows

3. Understands data export options (physical vs logical mode)

Key Points

  • SelectMode Options: Controls data format for queries and exports (Logical=0, ODBC=1, Display=2)
  • Logical Mode (default): Returns data in internal storage format with system characters
  • Display Mode: Transforms data to user-friendly display format using LogicalToDisplay() methods
  • ODBC Mode: Converts data to ODBC-compatible format for external tool compatibility
  • Export Implications: SelectMode affects how dates, times, lists, and null values are represented

Detailed Notes

Overview

InterSystems IRIS provides three distinct data representation modes that control how query results and exported data are formatted.

  • Purpose: Determines whether exported data appears in internal storage format or transformed display format
  • SelectMode values: Logical (0), ODBC (1), Display (2)

Logical Mode (SelectMode=0)

Logical mode is the default setting and returns data exactly as stored in the database's internal format.

  • Dates/Times: Internal integer values (days since December 31, 1840 for dates; seconds since midnight for times)
  • Lists: Two non-printing characters (one before first item, one as separator)
  • Empty strings/BLOBs: Represented by non-display character $CHAR(0)
  • Boolean values: Stored as 1/0 or other internal representations
  • Use cases: Database-to-database transfers, data backups, preserving exact internal representations
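The internal date and time representations can be reproduced with standard date arithmetic. A sketch: day 0 of the internal date counter is December 31, 1840, and logical times count seconds since midnight:

```python
from datetime import date, time, timedelta

IRIS_DAY_ZERO = date(1840, 12, 31)  # logical date 0

def logical_to_display_date(days):
    """Convert a logical (internal) date integer to a display string."""
    return (IRIS_DAY_ZERO + timedelta(days=days)).isoformat()

def logical_to_display_time(seconds):
    """Convert a logical time (seconds since midnight) to HH:MM:SS."""
    return time(seconds // 3600, seconds % 3600 // 60, seconds % 60).isoformat()

print(logical_to_display_date(1))      # 1841-01-01
print(logical_to_display_time(52200))  # 14:30:00
```

This is what the per-type LogicalToDisplay() methods do for you when Display mode is selected.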

Display Mode (SelectMode=2)

Display mode transforms data into human-readable format using each data type's LogicalToDisplay() method.

  • Dates: Formatted strings like "2025-12-20"
  • Times: Display as "14:30:00"
  • Lists: Items separated by blank spaces
  • Empty strings: Appear as true empty strings ("") rather than $CHAR(0)
  • Boolean values: Display as "true"/"false" or "1"/"0" depending on configuration
  • Use cases: Reports, data for human review, files for manual editing before re-import

ODBC Mode (SelectMode=1)

ODBC mode converts data to ODBC-compatible formats using LogicalToOdbc() methods.

  • Dates: YYYY-MM-DD format
  • Times: HH:MM:SS format
  • Lists: Comma separators between items
  • NULL values: Properly represented as SQL NULL
  • Use cases: External relational databases, BI tools, spreadsheet applications, ODBC-connected clients

Configuring SelectMode

SelectMode can be configured at multiple levels:

  • Current process: `$SYSTEM.SQL.Util.SetOption("SelectMode",n)` where n is 0, 1, or 2
  • Embedded SQL: `#sqlcompile select` directive (options: Logical, Display, ODBC, Runtime, Text, FDBMS)
  • Dynamic SQL: %SelectMode property on the statement object
  • DDL import: SelectMode parameter determines mode for imported statements

Round-Trip Data Considerations

Understanding mode transformations is critical for round-trip data scenarios:

  • Export transformation: Data exported in Display or ODBC mode has been transformed from Logical format
  • Re-import requirement: Must convert back to Logical format using DisplayToLogical() or OdbcToLogical() methods
  • Runtime mode: SQL compiler's Runtime select mode allows dynamic conversion based on %SelectMode at execution time
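The round-trip requirement can be demonstrated with date arithmetic run in both directions; a Python analog of the export (LogicalToOdbc) and re-import (OdbcToLogical) steps, hedged as an illustration of the principle rather than the actual methods:

```python
from datetime import date, timedelta

DAY_ZERO = date(1840, 12, 31)  # logical date 0

def date_to_odbc(days):
    """Export step: logical integer -> ODBC YYYY-MM-DD string."""
    return (DAY_ZERO + timedelta(days=days)).isoformat()

def odbc_to_logical(text):
    """Re-import step: ODBC string -> logical integer."""
    return (date.fromisoformat(text) - DAY_ZERO).days

logical = 66000
assert odbc_to_logical(date_to_odbc(logical)) == logical  # lossless round trip
print(date_to_odbc(logical))
```

Skipping the re-import conversion and storing the ODBC string as-is is the classic round-trip mistake: the column would then hold a string where an integer is expected.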

Practical Export Guidelines

  • Logical mode: Database migrations, backups, InterSystems-to-InterSystems transfers
  • ODBC mode: External databases, BI tools, ODBC clients expecting standard SQL formats
  • Display mode: Human-readable reports, CSV files for manual review, custom applications expecting formatted strings
  • Documentation: Always document which SelectMode was used for exports

Exam Preparation Summary

Critical Concepts to Master:

  1. LOAD DATA Syntax: Know basic FROM FILE syntax, COLUMNS clause for column mapping, and VALUES clause for mapping source to target columns
  2. LOAD DATA Options: Understand BULK keyword (disables constraints/indexes), %NOJOURN option (disables journaling/transactions), and their privilege requirements
  3. USING Clause Configuration: Master JSON options for header row handling, delimiter specification, charset encoding, and maxerrors threshold
  4. JVM Requirements: Remember LOAD DATA requires Java Virtual Machine installation and %Gateway_Object:USE privilege
  5. %SQL_Diag Schema: Know that %SQL_Diag.Result logs operations and %SQL_Diag.Message stores detailed error messages with SQLCODE values
  6. Diagnostic Access: Understand Management Portal path (System Operation > System Logs > SQL Diagnostics Logs) for viewing load errors
  7. SelectMode Values: Memorize Logical=0 (internal storage), ODBC=1 (ODBC compatibility), Display=2 (user-friendly format)
  8. Mode Differences: Distinguish how dates, times, lists, empty strings, and NULL values appear in each SelectMode

Common Exam Scenarios:

  • Writing LOAD DATA command with custom delimiter and header row handling
  • Determining appropriate BULK and %NOJOURN usage based on data validation requirements
  • Querying %SQL_Diag tables to identify why a LOAD DATA operation failed
  • Selecting correct SelectMode for exporting data to external ODBC tools
  • Mapping source file columns to target table columns using COLUMNS and VALUES clauses
  • Calculating privileges required for LOAD BULK %NOJOURN DATA operations
  • Interpreting diagnostic message severity levels and SQLCODE values
  • Choosing between Logical, Display, and ODBC modes for specific export requirements

Hands-On Practice Recommendations:

  • Create sample CSV files with headers and use LOAD DATA with various USING configurations
  • Execute LOAD DATA with intentional data errors and review %SQL_Diag.Message table
  • Compare LOAD performance with and without BULK and %NOJOURN options
  • Practice writing COLUMNS and VALUES clauses to map columns in different orders
  • Query SQL Diagnostics Logs from Management Portal after load operations
  • Export table data in Logical, ODBC, and Display modes and compare results
  • Configure maxerrors option and observe rollback behavior when threshold is exceeded
  • Grant %Gateway_Object:USE, %NOCHECK, %NOINDEX, %NOLOCK, and %NOJOURN privileges to test users
