EK9 Security: Input Sanitization and Sensitive Data

EK9 provides a comprehensive security framework with two complementary layers:

Input Sanitization — runtime detection and blocking of common web attacks (SQL injection, XSS, command injection, path traversal, XXE, SSTI) using the sanitized modifier and InputSanitizer class
Sensitive Data Protection — compile-time detection of hardcoded secrets (100+ credential patterns) plus the Sensitive built-in type for safe runtime credential handling with automatic redaction

Both layers follow a "Rejection at the Source" philosophy. They are secure by default and need no configuration, yet remain extensible for organizations that need custom security policies or specialized logging formats.

The sanitized Parameter Modifier
InputSanitizer Class
Log Format Configuration
SanitizationContext
Integration Examples
Best Practices
Sensitive Type and Secret Detection
Related Compiler Errors

The `sanitized` Parameter Modifier

The simplest way to add input sanitization is the sanitized modifier on incoming String parameters. When a function or method is called with a sanitized parameter, EK9 automatically creates a defensive copy of the String at the call site and sanitizes it before passing it to the function.

#!ek9
defines module introduction

  defines function

    executeQuery()
      -> sql as sanitized String
      <- result as String?
      //The 'sql' parameter is automatically sanitized at the call site
      //before this function receives it - preventing SQL injection
      ...

    processUserInput()
      ->
        name as sanitized String
        comment as sanitized String
      <- success as Boolean?
      //Both parameters are sanitized before use
      ...
//EOF

How it works: at every call site, the sanitized modifier triggers the following steps:

The original caller's String is never modified
A defensive copy is created at the call site
The copy is passed through the InputSanitizer threat detection
If a threat is detected, it is logged and the parameter becomes unset
The function receives either the clean String or an unset value
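The steps above can be modelled in a few lines of plain Java. This is a sketch under stated assumptions, not the code the EK9 compiler generates. A `StringBuilder` stands in for EK9's mutable `String`, and `Optional.empty()` stands in for an unset value. The real threat rules are replaced by a hypothetical `looksDangerous` placeholder, and `callWithSanitized` is an illustrative name, not an EK9 API.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical model of calling a function whose String parameter is 'sanitized'.
// StringBuilder stands in for EK9's mutable String; Optional.empty() stands in for "unset".
final class SanitizedCallSite {

    // Placeholder check only; the real InputSanitizer applies far broader rules.
    static boolean looksDangerous(CharSequence value) {
        String v = value.toString();
        return v.contains("' OR '") || v.contains("<script") || v.contains("../");
    }

    // Models the rewrite applied at every call site.
    static <R> R callWithSanitized(StringBuilder callerValue,
                                   Function<Optional<StringBuilder>, R> callee) {
        // The callee never receives the caller's object, only a defensive copy.
        StringBuilder copy = new StringBuilder(callerValue);
        // A detected threat turns the argument into "unset" (logging omitted in this sketch).
        Optional<StringBuilder> argument =
                looksDangerous(copy) ? Optional.empty() : Optional.of(copy);
        return callee.apply(argument);
    }

    public static void main(String[] args) {
        StringBuilder userInput = new StringBuilder("alice");
        String result = callWithSanitized(userInput, name -> {
            name.ifPresent(n -> n.append("-modified")); // changes the copy only
            return name.map(StringBuilder::toString).orElse("<unset>");
        });
        System.out.println(result);    // alice-modified
        System.out.println(userInput); // alice

        String rejected = callWithSanitized(new StringBuilder("' OR '1'='1"),
                name -> name.isPresent() ? "set" : "unset");
        System.out.println(rejected);  // unset
    }
}
```

The callee only ever holds the copy. Appending to it therefore leaves the caller's `alice` untouched, and hostile input reaches the callee as an empty `Optional`.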

Important Restrictions

The sanitized modifier can only be used on incoming String parameters (E07910, E07920)
It cannot be used on fields/properties in classes, records, or other constructs (E07920)
It cannot be used on return parameters or local variables (E07920)
Direct assignment from a sanitized parameter to a local variable is blocked (E07930)
When overriding methods, the sanitized modifier must match between the super method and the override, as the Liskov Substitution Principle requires (E07940)

Target Sites vs Call Sites: The "Rejection at the Source" Philosophy

The sanitized modifier is a target site annotation — it belongs on function/method parameter definitions where untrusted data first enters the system. It cannot be used at:

Call sites: myFunc(sanitized arg) — Wrong! The callee defines sanitization, not the caller.
Captures: (sanitized capturedVar) extends Handler — Wrong! Captured variables are already in the trusted boundary.
Variable declarations: name <- sanitized getValue() — Wrong! The function returning the value should sanitize its parameters.

Why this design? Security reasoning becomes simpler when sanitization happens in one place:

Entry points define where untrusted data enters
Everything inside the trust boundary is already safe
The compiler enforces sanitization automatically at call sites

This eliminates ambiguity about who is responsible for sanitization. The function/method author specifies requirements; the compiler enforces them at all call sites.

#!ek9
//WRONG: Trying to sanitize at call site
processInput(sanitized userInput)  //E07942: Caller can't specify sanitization

//WRONG: Trying to sanitize at capture
handler <- (sanitized var) extends BaseHandler  //E07941: Capture already trusted

//WRONG: Trying to sanitize at variable declaration
name <- sanitized getValue()  //E07943: Wrong location for sanitization

//CORRECT: Sanitize at the definition (entry point)
processInput()
  -> data as sanitized String  //OK: Target site annotation
  ...

Why Direct Assignment is Blocked

When you write local <- sanitizedParam, both variables point to the same sanitized copy in memory. This creates "hidden aliasing" where mutations to one variable affect the other — defeating the purpose of defensive copying.

#!ek9
//BAD: Creates hidden alias
processInput()
  -> input as sanitized String
  local <- input      // E07930: 'local' would alias 'input'
  local += " modified" // ...and this would also modify 'input'

//GOOD: Explicit copy
processInput()
  -> input as sanitized String
  local <- String(input)  // OK: Explicit copy constructor
  local += " modified"     // Only affects 'local'

InputSanitizer Class

The InputSanitizer class is the core threat detection engine in EK9. It provides programmatic access to the same threat detection used by the sanitized modifier, allowing explicit control over when and how sanitization occurs.

This class is particularly useful when you need to:

Log threats to a specific output destination
Check safety without modifying the input
Get detailed threat type information
Handle environment variables (skip path traversal checks)

Constructors

#!ek9
//Default - always logs threats to stderr
defaultSanitizer <- InputSanitizer()

//With a custom reporter (alternative destination)
stderrSanitizer <- InputSanitizer(Stderr())  //explicit stderr
logFile <- TextFile("/var/log/security.log").output()
fileSanitizer <- InputSanitizer(logFile)     //log to file
//EOF

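The InputSanitizer always logs detected threats, and code cannot turn this off. By default, threats are logged to stderr, but you can supply any StringOutput destination instead (a file, stdout, and so on). The only way to suppress the output is the SILENT log format, which is intended for testing.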

The log format is controlled by the EK9_SANITIZER_LOG_FORMAT environment variable (see Log Format Configuration).

Methods

#!ek9
defines module introduction

  defines program
    SanitizerExample()
      //processData, handleThreatDetection and logThreat stand for application functions (not shown)
      sanitizer <- InputSanitizer(Stderr())

      //Sanitize input - returns unset if threat detected
      userInput <- "some user input"
      result <- sanitizer.sanitize(userInput)
      if result?
        //Safe to use
        processData(result)
      else
        //Threat detected, handle appropriately
        handleThreatDetection()

      //Check safety without modification
      if sanitizer.isSafe(userInput)
        processData(userInput)

      //Get threat type(s) as comma-separated string
      threat <- sanitizer.detectThreat(userInput)
      if threat?
        logThreat(threat)  //e.g., "SQL_INJECTION,XSS"

      //For environment variables (skip path traversal checks)
      envValue <- EnvVars().get("CONFIG_PATH")
      if envValue?
        cleanEnvValue <- sanitizer.sanitizeWithoutPathChecks(envValue.get())
//EOF

Threat Types Detected

The InputSanitizer detects the following threat categories, aligned with the OWASP Top 10:

Threat Type	Description	Example Pattern
SQL_INJECTION	SQL injection patterns	`' OR '1'='1`, `; DROP TABLE`
COMMAND_INJECTION	Shell/OS command injection	`; rm -rf /`, `| cat /etc/passwd`
PATH_TRAVERSAL	Directory traversal attacks	`../../../etc/passwd`
XSS	Cross-Site Scripting	`<script>alert('xss')</script>`
XXE	XML External Entity attacks	`<!ENTITY xxe SYSTEM "file://">`
SSTI	Server-Side Template Injection	`{{7*7}}`, `${T(java.lang.Runtime)}`

When multiple threats are detected in a single input, the threat types are returned as a comma-separated string (e.g., "SQL_INJECTION,XSS").
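Detection can be pictured as a set of independent category checks whose hits are joined with commas. Below is a toy Java sketch of that idea. Its regular expressions are hypothetical and far weaker than a production rule set. The boolean `skipPathChecks` flag is an assumption, modelled on what `sanitizeWithoutPathChecks()` is documented to do.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Toy detector with one deliberately simple regex per category in the table above.
// The patterns are illustrative assumptions, not EK9's actual rule set.
final class ToyThreatDetector {

    // LinkedHashMap keeps the table's order, so hits are reported in that order.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();

    static {
        RULES.put("SQL_INJECTION",
                Pattern.compile("'\\s*OR\\s*'|;\\s*DROP\\s+TABLE", Pattern.CASE_INSENSITIVE));
        RULES.put("COMMAND_INJECTION", Pattern.compile(";\\s*rm\\s|\\|\\s*cat\\s"));
        RULES.put("PATH_TRAVERSAL", Pattern.compile("\\.\\./"));
        RULES.put("XSS", Pattern.compile("<script", Pattern.CASE_INSENSITIVE));
        RULES.put("XXE", Pattern.compile("<!ENTITY", Pattern.CASE_INSENSITIVE));
        RULES.put("SSTI", Pattern.compile("\\{\\{.*\\}\\}|\\$\\{"));
    }

    // Returns every matching category joined with commas, or null (modelling "unset")
    // when nothing matches. skipPathChecks mimics the idea of sanitizeWithoutPathChecks().
    static String detectThreat(String input, boolean skipPathChecks) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            if (skipPathChecks && rule.getKey().equals("PATH_TRAVERSAL")) {
                continue;
            }
            if (rule.getValue().matcher(input).find()) {
                found.add(rule.getKey());
            }
        }
        return found.isEmpty() ? null : String.join(",", found);
    }

    public static void main(String[] args) {
        System.out.println(detectThreat("' OR '1'='1 <script>x</script>", false)); // SQL_INJECTION,XSS
        System.out.println(detectThreat("../../../etc/passwd", false));            // PATH_TRAVERSAL
        System.out.println(detectThreat("../config/app.yaml", true));              // null
    }
}
```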

Log Format Configuration

The InputSanitizer log format is controlled by the EK9_SANITIZER_LOG_FORMAT environment variable. This allows you to integrate with your existing SIEM infrastructure without any code changes.

# Set log format (default: JSON)
export EK9_SANITIZER_LOG_FORMAT=JSON    # Simple JSON (universal, default)
export EK9_SANITIZER_LOG_FORMAT=ECS     # Elastic Common Schema
export EK9_SANITIZER_LOG_FORMAT=CEF     # Common Event Format
export EK9_SANITIZER_LOG_FORMAT=SIMPLE  # Simple bracket format [THREAT_TYPE]
export EK9_SANITIZER_LOG_FORMAT=SILENT  # Suppress all output (testing only)

JSON Format (Default)

Simple JSON format that works with any log shipper or monitoring tool:

{
  "timestamp": "2026-01-21T10:30:45.123Z",
  "level": "warn",
  "threat": "SQL_INJECTION",
  "message": "Dangerous input detected: ' OR '1'='1"
}
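The `message` field carries attacker-controlled text, so any producer of this format has to escape it. Otherwise an input containing `"` could forge extra fields in the record. The following is a minimal Java sketch that emits one such record on a single line. The class and method names are hypothetical; only the field names are taken from the example above.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Sketch of writing one record in the default JSON shape shown above.
final class JsonThreatLog {

    // Always print milliseconds, as in the example timestamp.
    private static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSS'Z'").withZone(ZoneOffset.UTC);

    // JSON string escaping (RFC 8259): quote, backslash and control characters.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"') {
                out.append("\\\"");
            } else if (c == '\\') {
                out.append("\\\\");
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    static String format(Instant when, String threat, String input) {
        return "{\"timestamp\":\"" + TIMESTAMP.format(when) + "\","
                + "\"level\":\"warn\","
                + "\"threat\":\"" + escape(threat) + "\","
                + "\"message\":\"" + escape("Dangerous input detected: " + input) + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(format(Instant.parse("2026-01-21T10:30:45.123Z"),
                "SQL_INJECTION", "' OR '1'='1"));
    }
}
```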

ECS Format (Elastic Common Schema)

ECS-aligned JSON for Elasticsearch, Splunk, Datadog, CloudWatch, and Google Cloud Logging:

{
  "@timestamp": "2026-01-21T10:30:45.123Z",
  "log.level": "warn",
  "event.category": "intrusion_detection",
  "event.type": "denied",
  "event.action": "input_rejected",
  "threat.indicator.type": "SQL_INJECTION",
  "message": "Dangerous input detected",
  "source.ip": "192.168.1.100",
  "service.name": "UserService",
  "ek9.field.name": "userId",
  "ek9.input.value": "' OR '1'='1",
  "ecs.version": "8.11"
}

CEF Format (Common Event Format)

CEF for ArcSight, Azure Sentinel, QRadar, and LogRhythm:

CEF:0|EK9|InputSanitizer|1.0|SQL_INJECTION|Dangerous input detected|8|src=192.168.1.100 svc=UserService cs1=userId cs1Label=FieldName msg=' OR '1'='1

CEF Severity Mapping:

SQL_INJECTION: Severity 8 (High)
COMMAND_INJECTION: Severity 9 (Very High)
PATH_TRAVERSAL: Severity 6 (Medium)
XSS: Severity 7 (High)
XXE: Severity 8 (High)
SSTI: Severity 7 (High)
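The CEF header and the severity mapping translate directly into code. The following minimal Java sketch builds the line from the table above; `CefThreatLog` and its methods are hypothetical names. It also escapes `=` inside extension values, as the CEF specification requires, so its `msg` value carries a backslash that the example line above does not show.

```java
import java.util.Map;

// Sketch of assembling the CEF line shown above, with the severity table as data.
final class CefThreatLog {

    static final Map<String, Integer> SEVERITY = Map.of(
            "SQL_INJECTION", 8,
            "COMMAND_INJECTION", 9,
            "PATH_TRAVERSAL", 6,
            "XSS", 7,
            "XXE", 8,
            "SSTI", 7);

    // CEF header fields must escape backslash and pipe.
    static String escapeHeader(String s) {
        return s.replace("\\", "\\\\").replace("|", "\\|");
    }

    // CEF extension values must escape backslash and '=', and encode line breaks.
    static String escapeExtension(String s) {
        return s.replace("\\", "\\\\").replace("=", "\\=")
                .replace("\r", "\\r").replace("\n", "\\n");
    }

    static String format(String threat, String sourceIp, String service,
                         String fieldName, String input) {
        Integer severity = SEVERITY.get(threat);
        if (severity == null) {
            throw new IllegalArgumentException("Unknown threat type: " + threat);
        }
        return "CEF:0|EK9|InputSanitizer|1.0|" + escapeHeader(threat)
                + "|Dangerous input detected|" + severity
                + "|src=" + escapeExtension(sourceIp)
                + " svc=" + escapeExtension(service)
                + " cs1=" + escapeExtension(fieldName)
                + " cs1Label=FieldName"
                + " msg=" + escapeExtension(input);
    }

    public static void main(String[] args) {
        System.out.println(format("SQL_INJECTION", "192.168.1.100", "UserService",
                "userId", "' OR '1'='1"));
    }
}
```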

SIMPLE Format

Minimal bracket format for simple scripts, debugging, or when JSON parsing is overkill:

[SQL_INJECTION] Dangerous input detected: ' OR '1'='1

No timestamp, no context fields — just the threat type and input value.

SILENT Format

Suppresses all sanitizer output. This is useful for testing where the sanitizer behavior is not the focus of the test. For example, when testing bytecode generation for code that uses the sanitized keyword, you may not want sanitizer logs polluting test output.

Warning: Do not use SILENT in production — you will lose visibility into attack attempts. This format is intended only for testing and development scenarios.

SanitizationContext

A record that captures metadata for security event logging. When you call InputSanitizer methods with a SanitizationContext, the context fields are included in the log output (for ECS and CEF formats).

#!ek9
defines module org.ek9.lang

  defines record
    SanitizationContext
      timestamp as DateTime     // When sanitization occurred
      service as String         // "UserService"
      operation as String       // "getUser"
      fieldName as String       // "userId"
      fieldSource as String     // "PATH", "QUERY", "HEADER", "CONTENT"
      sourceIp as String        // Client IP address
      traceId as String         // Request correlation ID
//EOF

Using SanitizationContext with InputSanitizer:

#!ek9
defines module introduction

  defines program
    ContextExample()
      sanitizer <- InputSanitizer()

      //Create context with rich metadata
      context <- SanitizationContext(
        DateTime(),
        "UserService",
        "getUser",
        "userId",
        "PATH",
        "192.168.1.100",
        "trace-abc-123"
      )

      userInput <- "some user input"

      //Sanitize with context - logs include all context fields
      result <- sanitizer.sanitize(userInput, context)
      if result?
        processData(result)

      //Check safety with context
      if sanitizer.isSafe(userInput, context)
        processData(userInput)
//EOF

When context is provided, the log output includes all set fields. Fields that are not set are simply omitted from the log output. This provides rich metadata for security teams to investigate incidents, correlate attacks, and identify patterns.
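The "omit what is not set" rule can be shown with a small Java sketch that builds an ECS-style record. Here `null` models an unset EK9 field. The mapping from context fields to ECS keys (sourceIp to `source.ip`, service to `service.name`, fieldName to `ek9.field.name`) is an assumption read off the ECS example earlier on this page. `EcsContextLog` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: unset context fields (null here) are skipped rather than written as empty values.
final class EcsContextLog {

    // Minimal escaping for the sketch: backslash and double quote only.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    static String format(String threat, String input,
                         String sourceIp, String service, String fieldName) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("log.level", "warn");
        fields.put("event.category", "intrusion_detection");
        fields.put("threat.indicator.type", threat);
        fields.put("message", "Dangerous input detected");
        fields.put("source.ip", sourceIp);       // assumed to come from SanitizationContext.sourceIp
        fields.put("service.name", service);     // assumed to come from SanitizationContext.service
        fields.put("ek9.field.name", fieldName); // assumed to come from SanitizationContext.fieldName
        fields.put("ek9.input.value", input);

        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (e.getValue() != null) { // omit unset fields entirely
                json.add("\"" + e.getKey() + "\":\"" + escape(e.getValue()) + "\"");
            }
        }
        return json.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("SQL_INJECTION", "' OR '1'='1", "192.168.1.100", "UserService", "userId"));
        System.out.println(format("XSS", "<script>", null, "UserService", null));
    }
}
```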

Integration Examples

TextFile Automatic Sanitization

When reading files specified by user input, use sanitized paths to prevent path traversal:

#!ek9
defines module introduction

  defines function
    readUserFile()
      -> filename as sanitized String
      <- content as String?

      if filename?
        file <- TextFile(filename)
        if file.exists() and file.isReadable()
          content: file.readAll()
//EOF

Web Service Input Handling

In web services, use the sanitized modifier on path parameters, query parameters, and request body fields:

#!ek9
defines module introduction

  defines service

    UserService :/users

      getUser() :/{userId} as GET
        -> userId as sanitized String
        <- response as HTTPResponse?
        //userId is automatically sanitized before reaching this method
        ...

      searchUsers() :/search as GET
        -> query as sanitized String
        <- response as HTTPResponse?
        //query parameter is sanitized
        ...
//EOF

Command-Line Argument Processing

When processing command-line arguments that will be used in file operations or external commands:

#!ek9
defines module introduction

  defines program
    ProcessFiles()
      -> argv as List of String

      sanitizer <- InputSanitizer(Stderr())

      for arg in argv
        cleanArg <- sanitizer.sanitize(arg)
        if cleanArg?
          processFile(cleanArg)
        else
          Stderr().println("Rejected potentially dangerous argument")
//EOF

Best Practices

Use the `sanitized` Modifier by Default

For any function or method that accepts user-provided String input, add the sanitized modifier. This is the simplest and most effective defense:

#!ek9
//GOOD: Default to sanitized for user input
processUserData()
  -> input as sanitized String
  ...

//Only omit sanitized for trusted internal data
processInternalData()
  -> trustedInput as String
  ...

Handle Unset Results Gracefully

When sanitization detects a threat, the parameter becomes unset. Design your business logic to handle this case with appropriate error messages:

#!ek9
createUser()
  -> username as sanitized String
  <- result as Boolean?

  if username?
    //Safe to process
    result: doCreateUser(username)
  else
    //Threat detected - apply normal business validation message
    //This provides defense in depth without information leakage
    result: false
    logValidationError("Invalid username format")

Use InputSanitizer for Environment Variables

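Environment variables may legitimately contain relative paths (for example `../config`) that the path traversal check would reject. Use sanitizeWithoutPathChecks() for these cases; it skips only the path traversal checks: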

#!ek9
sanitizer <- InputSanitizer(Stderr())
envVars <- EnvVars()

//Relative path segments such as ../ can be legitimate in env vars
configPath <- envVars.get("CONFIG_PATH")
if configPath?
  cleanPath <- sanitizer.sanitizeWithoutPathChecks(configPath.get())
  ...

Enterprise Logging Integration

For production systems, configure the log format via the EK9_SANITIZER_LOG_FORMAT environment variable to integrate with your SIEM:

#Production deployment - route stderr to SIEM
export EK9_SANITIZER_LOG_FORMAT=ECS   #For Elasticsearch/Splunk/Datadog
#or
export EK9_SANITIZER_LOG_FORMAT=CEF   #For ArcSight/Azure Sentinel/QRadar

In production, route stderr to your log aggregator (Fluentd, Filebeat, etc.) and the security events will automatically flow to your SIEM.

Sensitive Type and Secret Detection

Input sanitization protects against malicious input at runtime. EK9 also provides a complementary layer that protects against leaked credentials — both at compile time and at runtime. Together, these two systems form a comprehensive security framework.

Compile-Time Secret Detection

The EK9 compiler automatically scans all string literals (including text segments within interpolated strings) for patterns matching known credential formats. If a hardcoded secret is detected, compilation fails immediately — the secret never reaches version control, build artifacts, or production.

Detected credential categories include:

Error Code	Category	Example Patterns
E11080	Cloud Provider	AWS keys (`AKIA...`), GCP API keys (`AIza...`), Azure connection strings
E11081	Platform Token	GitHub (`ghp_`), GitLab (`glpat-`), Slack (`xoxb-`), npm, Shopify, Heroku
E11082	Private Key	PEM headers: RSA, EC, DSA, PKCS8, OPENSSH private keys
E11083	Database URL	`postgres://user:pass@host`, MySQL, MongoDB, Redis, JDBC
E11084	JWT Token	`eyJhbGci...` three-part header.payload.signature structure
E11086	API Key	Stripe (`sk_test_`), Anthropic (`sk-ant-`), SendGrid (`SG.`), 50+ services

This detection covers 100+ distinct patterns across cloud providers, platform tokens, private keys, database URLs, JWT tokens, and API keys. Its scope is comparable to dedicated secret scanners such as GitGuardian and TruffleHog, but it is enforced at compile time rather than after a secret has already been committed.
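Conceptually, each literal is matched against a catalogue of credential shapes, and the first hit determines the error code. The following small Java sketch illustrates that idea with five representative patterns. The regular expressions are assumptions, not EK9's actual rules, and the compiler's real catalogue is far larger. `LiteralSecretScanner` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative literal scanner: a few well-known credential shapes mapped to the
// error codes in the table above. EK9's real rule set is much larger.
final class LiteralSecretScanner {

    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();

    static {
        // AWS access key ID: "AKIA" followed by 16 upper-case letters or digits
        RULES.put("E11080", Pattern.compile("\\bAKIA[0-9A-Z]{16}\\b"));
        // GitHub personal access token: "ghp_" followed by 36 alphanumerics
        RULES.put("E11081", Pattern.compile("\\bghp_[A-Za-z0-9]{36}\\b"));
        // PEM private key headers (PKCS8 has no algorithm prefix)
        RULES.put("E11082", Pattern.compile("-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"));
        // Connection URL with an embedded user:password@ section
        RULES.put("E11083", Pattern.compile(
                "\\b(postgres(ql)?|mysql|mongodb(\\+srv)?|redis)://[^:/\\s]+:[^@\\s]+@"));
        // JWT: base64url header starting with eyJ, then two more dot-separated parts
        RULES.put("E11084", Pattern.compile("\\beyJ[A-Za-z0-9_-]+\\.[A-Za-z0-9_-]+\\.[A-Za-z0-9_-]+"));
    }

    // Returns the first matching error code, or null if the literal looks clean.
    static String scan(String literal) {
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            if (rule.getValue().matcher(literal).find()) {
                return rule.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(scan("AKIAIOSFODNN7EXAMPLE"));           // E11080
        System.out.println(scan("postgres://app:hunter2@db/prod")); // E11083
        System.out.println(scan("postgres://db/prod"));             // null
    }
}
```

`AKIAIOSFODNN7EXAMPLE` is the placeholder access key ID used throughout AWS's own documentation, which is why it is safe to show here. A URL without a `user:password@` section passes, because it contains nothing secret.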

The Sensitive Type

Once secrets are removed from source code, they need to be loaded securely at runtime. The Sensitive built-in type wraps secret values with automatic protection:

Auto-redaction: $ and $$ operators always return "***REDACTED***" — secrets cannot leak through logging, error messages, or string interpolation
Constant-time equality: comparison uses MessageDigest.isEqual() to prevent timing attacks
Controlled construction: the only way to create a set Sensitive value is through EnvVars.sensitiveGet() — there is no String constructor visible to EK9 code
Gated access: reveal() returns the raw secret, but it may only be called from a class with the Privileged marker trait (otherwise E11090)

#!ek9
defines module introduction

  defines class

    HttpClient with trait of Privileged
      apiKey as Sensitive?

      HttpClient()
        -> key as Sensitive
        apiKey: key

      sendRequest()
        <- response as String?
        //reveal() only works because HttpClient has the Privileged trait
        if apiKey?
          header <- apiKey.reveal()
          response: doHttpCall(header)

  defines function

    demo()
      env <- EnvVars()
      //sensitiveGet() is the ONLY way to create a set Sensitive value
      key <- env.sensitiveGet("API_KEY")
      if key?
        client <- HttpClient(key)
        result <- client.sendRequest()

        stdout <- Stdout()
        //Safe: printing key shows "***REDACTED***", not the actual secret
        stdout.println(`Key: ${key}`)
        if result?
          stdout.println(result)
//EOF

The Privileged trait creates an auditable access boundary: searching a codebase for `with trait of Privileged` gives a complete list of every class that can access raw secret values.
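Imitating these guarantees in plain Java makes it easier to see why each one matters. The documented use of `MessageDigest.isEqual()` hints that the real type sits on the JVM, but the sketch below is not its source. `SensitiveValue`, `HttpClientSketch` and `ofForTesting` are hypothetical names. The `Privileged` marker interface only approximates the trait: EK9 rejects an unprivileged `reveal()` at compile time (E11090), whereas in Java the caller must pass itself as proof.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Marker interface standing in for EK9's Privileged trait. EK9 checks this at
// compile time (E11090); Java can only approximate it with a required argument.
interface Privileged {
}

// Hypothetical Java analogue of the Sensitive type's documented behaviour.
final class SensitiveValue {

    private static final String REDACTED = "***REDACTED***";
    private final byte[] secret;

    private SensitiveValue(byte[] secret) {
        this.secret = secret;
    }

    // Plays the role of EnvVars.sensitiveGet(): null (unset) if the variable is absent.
    static SensitiveValue fromEnvironment(String name) {
        String value = System.getenv(name);
        return value == null ? null : new SensitiveValue(value.getBytes(StandardCharsets.UTF_8));
    }

    // Hypothetical helper for demos and tests only.
    static SensitiveValue ofForTesting(String value) {
        return new SensitiveValue(value.getBytes(StandardCharsets.UTF_8));
    }

    // Redaction: string conversion, concatenation and logging never expose the secret.
    @Override
    public String toString() {
        return REDACTED;
    }

    // MessageDigest.isEqual's running time does not depend on the array contents.
    @Override
    public boolean equals(Object other) {
        return other instanceof SensitiveValue
                && MessageDigest.isEqual(secret, ((SensitiveValue) other).secret);
    }

    // Constant hash: consistent with equals and derived from nothing secret.
    @Override
    public int hashCode() {
        return 0;
    }

    // Raw access demands a Privileged caller, so every use is easy to find.
    String reveal(Privileged caller) {
        if (caller == null) {
            throw new SecurityException("Privileged access required");
        }
        return new String(secret, StandardCharsets.UTF_8);
    }
}

// A class that opts into raw access, mirroring "with trait of Privileged".
final class HttpClientSketch implements Privileged {
    private final SensitiveValue apiKey;

    HttpClientSketch(SensitiveValue apiKey) {
        this.apiKey = apiKey;
    }

    String authorizationHeader() {
        return "Bearer " + apiKey.reveal(this);
    }
}
```

Because `toString()` returns the redacted form, an accidental concatenation such as `"Key: " + key` yields `Key: ***REDACTED***`. This matches what `$` does in the EK9 example.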

Two Layers Working Together

The sanitization and sensitive data systems complement each other:

Aspect	Input Sanitization	Sensitive Type
Protects against	Malicious input (SQL injection, XSS, etc.)	Credential leakage (API keys, passwords, tokens)
When	Runtime (at function entry points)	Compile time (literals) + Runtime (redaction)
Mechanism	`sanitized` modifier, `InputSanitizer`	`Sensitive` type, `Privileged` trait
On failure	Parameter becomes unset + threat logged	Compilation fails (literals) or redacted output (runtime)

Related Compiler Errors

The following compiler errors relate to the sanitized modifier:

E07910 — Sanitized parameter must be String type. The sanitized modifier can only be applied to String parameters.
E07920 — Sanitized only valid on incoming parameters. Cannot use sanitized on return parameters, local variables, or fields.
E07930 — Cannot assign from sanitized parameter. Direct assignment creates hidden aliasing. Use explicit copy constructor instead.
E07940 — Sanitized modifier must match in overrides. When overriding a method, the sanitized modifier must match the super method (Liskov Substitution Principle).
E07941 — Sanitized not allowed in capture. Captured variables are already in the trusted boundary. Sanitize at the original entry point instead.
E07942 — Sanitized not allowed at call site. The function/method definition specifies sanitization, not the caller.
E07943 — Sanitized not allowed in variable declaration. Sanitize at the function/method entry point where data first enters the system.

The following compiler errors relate to secret detection and the Sensitive type:

E11080 — Hardcoded cloud provider credential. AWS, GCP, or Azure credential pattern detected in a string literal.
E11081 — Hardcoded platform token. GitHub, GitLab, Slack, npm, Shopify, Heroku, or other platform token detected.
E11082 — Hardcoded private key material. PEM private key header (RSA, EC, DSA, PKCS8, OPENSSH) detected.
E11083 — Hardcoded database credential. Database URL with embedded password (PostgreSQL, MySQL, MongoDB, Redis, JDBC) detected.
E11084 — Hardcoded JWT token. JWT three-part structure (eyJ...) detected.
E11086 — Hardcoded API key. Known API key pattern (Stripe, Anthropic, OpenAI, SendGrid, etc.) detected.
E11090 — Privileged access required. reveal() called on Sensitive without Privileged trait.

Next Steps

For more details on related topics:

Built-in Types — Sensitive, EnvVars, and Privileged type documentation
Security Types — Overview of security components in the standard library
Code Quality — How EK9 enforces quality at compile time
Web Services — Building secure web services with EK9
Components — Understanding the component model for enterprise extension
