EK9 Input Sanitization

EK9 implements a comprehensive "Rejection at the Source" security model that automatically detects and blocks common web attacks before malicious input can propagate through your application. This includes SQL injection, XSS, command injection, path traversal, XXE, and SSTI attacks.

The security framework is secure by default and requires no configuration. It can also be extended by organizations that need custom security policies or specialized logging formats.

The sanitized Parameter Modifier

The simplest way to add input sanitization is the sanitized modifier on incoming String parameters. When a function or method is called with a sanitized parameter, EK9 automatically creates a defensive copy of the String at the call site and sanitizes it before passing it to the function.

#!ek9
defines module introduction

  defines function

    executeQuery()
      -> sql as sanitized String
      <- result as String?
      //The 'sql' parameter is automatically sanitized at the call site
      //before this function receives it - preventing SQL injection
      ...

    processUserInput()
      ->
        name as sanitized String
        comment as sanitized String
      <- success as Boolean?
      //Both parameters are sanitized before use
      ...
//EOF

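How it works: at each call site the compiler makes a defensive copy of the String argument, sanitizes that copy, and passes it to the function. If a threat is detected, the parameter arrives unset.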

Important Restrictions

Target Sites vs Call Sites: The "Rejection at the Source" Philosophy

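The sanitized modifier is a target site annotation: it belongs on function/method parameter definitions where untrusted data first enters the system. It cannot be used at call sites, on captured variables, or on variable declarations. Each of these is a compile-time error, as the example below shows.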

Why this design? Security reasoning becomes simpler when sanitization happens in one place:

  1. Entry points define where untrusted data enters
  2. Everything inside the trust boundary is already safe
  3. The compiler enforces sanitization automatically at call sites

This eliminates ambiguity about who is responsible for sanitization. The function/method author specifies requirements; the compiler enforces them at all call sites.

#!ek9
//WRONG: Trying to sanitize at call site
processInput(sanitized userInput)  //E07942: Caller can't specify sanitization

//WRONG: Trying to sanitize at capture
handler <- (sanitized var) extends BaseHandler  //E07941: Capture already trusted

//WRONG: Trying to sanitize at variable declaration
name <- sanitized getValue()  //E07943: Wrong location for sanitization

//CORRECT: Sanitize at the definition (entry point)
processInput()
  -> data as sanitized String  //OK: Target site annotation
  ...

Why Direct Assignment is Blocked

When you write local <- sanitizedParam, both variables point to the same sanitized copy in memory. This creates "hidden aliasing" where mutations to one variable affect the other — defeating the purpose of defensive copying.

#!ek9
//BAD: Creates hidden alias
processInput()
  -> input as sanitized String
  local <- input      // ERROR: 'local' is alias to 'input'
  local += " modified" // Also modifies 'input'!

//GOOD: Explicit copy
processInput()
  -> input as sanitized String
  local <- String(input)  // OK: Explicit copy constructor
  local += " modified"     // Only affects 'local'


InputSanitizer Class

The InputSanitizer class is the core threat detection engine in EK9. It provides programmatic access to the same threat detection used by the sanitized modifier, allowing explicit control over when and how sanitization occurs.

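This class is particularly useful when you need to sanitize values that do not arrive as sanitized parameters (such as environment variables or command-line arguments), check input without modifying it, find out which threat types were detected, or send threat logs to a custom destination.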

Constructors

#!ek9
//Default - always logs threats to stderr
sanitizer <- InputSanitizer()

//With custom reporter (alternative destination)
sanitizer <- InputSanitizer(Stderr())  //explicit stderr
logFile <- TextFile("/var/log/security.log").output()
sanitizer <- InputSanitizer(logFile)   //log to file
//EOF

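The InputSanitizer always logs detected threats; logging cannot be disabled from code. The only way to suppress output is the SILENT log format, which is intended for testing (see Log Format Configuration). By default, threats are logged to stderr. You can optionally provide a custom StringOutput destination (file, stdout, etc.).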

The log format is controlled by the EK9_SANITIZER_LOG_FORMAT environment variable (see Log Format Configuration).

Methods

#!ek9
defines module introduction

  defines program
    SanitizerExample()
      sanitizer <- InputSanitizer(Stderr())

      //Sanitize input - returns unset if threat detected
      userInput <- "some user input"
      result <- sanitizer.sanitize(userInput)
      if result?
        //Safe to use
        processData(result)
      else
        //Threat detected, handle appropriately
        handleThreatDetection()

      //Check safety without modification
      if sanitizer.isSafe(userInput)
        processData(userInput)

      //Get threat type(s) as comma-separated string
      threat <- sanitizer.detectThreat(userInput)
      if threat?
        logThreat(threat)  //e.g., "SQL_INJECTION,XSS"

      //For environment variables (skip path traversal checks)
      envValue <- EnvVars().get("CONFIG_PATH")
      if envValue?
        cleanEnvValue <- sanitizer.sanitizeWithoutPathChecks(envValue.get())
//EOF

Threat Types Detected

The InputSanitizer detects the following threat categories, aligned with the OWASP Top 10:

Threat Type        Description                     Example Pattern
SQL_INJECTION      SQL injection patterns          ' OR '1'='1, ; DROP TABLE
COMMAND_INJECTION  Shell/OS command injection      ; rm -rf /, | cat /etc/passwd
PATH_TRAVERSAL     Directory traversal attacks     ../../../etc/passwd
XSS                Cross-Site Scripting            <script>alert('xss')</script>
XXE                XML External Entity attacks     <!ENTITY xxe SYSTEM "file://">
SSTI               Server-Side Template Injection  {{7*7}}, ${T(java.lang.Runtime)}

When multiple threats are detected in a single input, the threat types are returned as a comma-separated string (e.g., "SQL_INJECTION,XSS").
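A tool that consumes this value (for example, a script reading sanitizer logs) will usually want the individual threat types. The following is a minimal Python sketch of that step, not part of the EK9 API: the function name `split_threats` is hypothetical, and the set of known types is copied from the table above.

```python
# Hypothetical consumer-side helper; not part of EK9 itself.
KNOWN_THREATS = {
    "SQL_INJECTION", "COMMAND_INJECTION", "PATH_TRAVERSAL",
    "XSS", "XXE", "SSTI",
}


def split_threats(value: str) -> list[str]:
    """Split a comma-separated threat string into individual threat types."""
    threats = [part.strip() for part in value.split(",") if part.strip()]
    unknown = [t for t in threats if t not in KNOWN_THREATS]
    if unknown:
        # Fail loudly rather than silently dropping something unexpected.
        raise ValueError(f"unrecognised threat type(s): {unknown}")
    return threats


print(split_threats("SQL_INJECTION,XSS"))  # ['SQL_INJECTION', 'XSS']
```

Rejecting unrecognised names is a design choice for this sketch; a more tolerant consumer could pass them through instead.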


Log Format Configuration

The InputSanitizer log format is controlled by the EK9_SANITIZER_LOG_FORMAT environment variable. This allows you to integrate with your existing SIEM infrastructure without any code changes.

# Set the log format by exporting one of the following (default: JSON)
export EK9_SANITIZER_LOG_FORMAT=JSON    # Simple JSON (universal, default)
export EK9_SANITIZER_LOG_FORMAT=ECS     # Elastic Common Schema
export EK9_SANITIZER_LOG_FORMAT=CEF     # Common Event Format
export EK9_SANITIZER_LOG_FORMAT=SIMPLE  # Simple bracket format [THREAT_TYPE]
export EK9_SANITIZER_LOG_FORMAT=SILENT  # Suppress all output (testing only)

JSON Format (Default)

Simple JSON format that works with any log shipper or monitoring tool:

{
  "timestamp": "2026-01-21T10:30:45.123Z",
  "level": "warn",
  "threat": "SQL_INJECTION",
  "message": "Dangerous input detected: ' OR '1'='1"
}
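Because sanitizer events share stderr with any other output the program writes there, a log consumer typically has to pick them out first. The Python sketch below shows one way to do that for the JSON format. It assumes each event is written as a single-line JSON object (the example above is shown pretty-printed, so this is an assumption) and that the field names match the example; `extract_events` is a hypothetical name.

```python
import json
from datetime import datetime, timezone


def extract_events(lines):
    """Yield (timestamp, threat, message) for each sanitizer event in lines."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not JSON: some other stderr output
        if not isinstance(record, dict) or "threat" not in record:
            continue  # JSON, but not a sanitizer event
        ts = datetime.strptime(record["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
        yield ts.replace(tzinfo=timezone.utc), record["threat"], record.get("message", "")


# Sample stderr content: one unrelated line and one event shaped like the example above.
sample = [
    "Starting service",
    json.dumps({
        "timestamp": "2026-01-21T10:30:45.123Z",
        "level": "warn",
        "threat": "SQL_INJECTION",
        "message": "Dangerous input detected: ' OR '1'='1",
    }),
]

for when, threat, message in extract_events(sample):
    print(when.isoformat(), threat, message)
```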

ECS Format (Elastic Common Schema)

ECS-aligned JSON for Elasticsearch, Splunk, Datadog, CloudWatch, and Google Cloud Logging:

{
  "@timestamp": "2026-01-21T10:30:45.123Z",
  "log.level": "warn",
  "event.category": "intrusion_detection",
  "event.type": "denied",
  "event.action": "input_rejected",
  "threat.indicator.type": "SQL_INJECTION",
  "message": "Dangerous input detected",
  "source.ip": "192.168.1.100",
  "service.name": "UserService",
  "ek9.field.name": "userId",
  "ek9.input.value": "' OR '1'='1",
  "ecs.version": "8.11"
}
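As an illustration of what the extra ECS fields make possible, the Python sketch below counts rejected inputs per source IP and threat type. It is a consumer-side example only. It assumes one flat JSON object per line with the dotted key names shown above, and it treats context fields such as `source.ip` as optional; `count_rejections` is a hypothetical name.

```python
import json
from collections import Counter


def count_rejections(lines):
    """Count rejected-input events per (source IP, threat type)."""
    counts = Counter()
    for line in lines:
        event = json.loads(line)
        if event.get("event.action") != "input_rejected":
            continue
        source = event.get("source.ip", "unknown")  # context fields may be absent
        threat = event.get("threat.indicator.type", "unknown")
        counts[(source, threat)] += 1
    return counts


# Sample events using the field names from the ECS example above.
sample = [
    json.dumps({"event.action": "input_rejected",
                "threat.indicator.type": "SQL_INJECTION",
                "source.ip": "192.168.1.100"}),
    json.dumps({"event.action": "input_rejected",
                "threat.indicator.type": "SQL_INJECTION",
                "source.ip": "192.168.1.100"}),
    json.dumps({"event.action": "input_rejected",
                "threat.indicator.type": "XSS"}),  # no context supplied
]

counts = count_rejections(sample)
print(counts[("192.168.1.100", "SQL_INJECTION")])  # 2
```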

CEF Format (Common Event Format)

CEF for ArcSight, Azure Sentinel, QRadar, and LogRhythm:

CEF:0|EK9|InputSanitizer|1.0|SQL_INJECTION|Dangerous input detected|8|src=192.168.1.100 svc=UserService cs1=userId cs1Label=FieldName msg=' OR '1'='1
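For readers unfamiliar with CEF, the line above consists of seven pipe-separated header fields (version, vendor, product, device version, signature ID, name, severity) followed by a space-separated key=value extension. The Python sketch below splits the example line into those parts. It is a simplified illustration, not a full CEF parser: it ignores CEF escaping rules and assumes that no extension value contains a space followed by text that looks like `key=`. The name `parse_cef` is hypothetical.

```python
import re

HEADER_FIELDS = ("version", "vendor", "product", "device_version",
                 "signature_id", "name", "severity")


def parse_cef(line: str) -> dict:
    """Parse one CEF line into its header fields plus an 'extension' dict."""
    if not line.startswith("CEF:"):
        raise ValueError("not a CEF line")
    parts = line[len("CEF:"):].split("|", 7)
    if len(parts) != 8:
        raise ValueError("expected 7 header fields and an extension")
    event = dict(zip(HEADER_FIELDS, parts[:7]))
    event["severity"] = int(event["severity"])
    # Each value runs until the next ' key=' token or the end of the line.
    pairs = re.findall(r"(\w+)=(.*?)(?=\s+\w+=|$)", parts[7])
    event["extension"] = dict(pairs)
    return event


sample = ("CEF:0|EK9|InputSanitizer|1.0|SQL_INJECTION|Dangerous input detected|8|"
          "src=192.168.1.100 svc=UserService cs1=userId cs1Label=FieldName "
          "msg=' OR '1'='1")

event = parse_cef(sample)
print(event["signature_id"], event["severity"], event["extension"]["src"])
```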

CEF Severity Mapping:

SIMPLE Format

Minimal bracket format for simple scripts, debugging, or when JSON parsing is overkill:

[SQL_INJECTION] Dangerous input detected: ' OR '1'='1

No timestamp, no context fields — just the threat type and input value.

SILENT Format

Suppresses all sanitizer output. This is useful in tests where sanitizer behavior is not the focus. For example, when testing bytecode generation for code that uses the sanitized modifier, you may not want sanitizer logs polluting test output.

Warning: Do not use SILENT in production — you will lose visibility into attack attempts. This format is intended only for testing and development scenarios.
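One way to guard against SILENT reaching production is a small pre-deployment check on the environment. The Python sketch below is an assumption-laden example, not an EK9 feature: the function name `check_log_format` is hypothetical, the format names are compared as exact upper-case strings, and unknown values are treated as errors (how EK9 itself handles unknown or lower-case values is not stated here).

```python
ALLOWED_FORMATS = {"JSON", "ECS", "CEF", "SIMPLE", "SILENT"}


def check_log_format(env, production: bool) -> str:
    """Return the effective log format, or raise if it is unsuitable."""
    # JSON is the documented default when the variable is not set.
    fmt = env.get("EK9_SANITIZER_LOG_FORMAT", "JSON")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unknown EK9_SANITIZER_LOG_FORMAT: {fmt!r}")
    if production and fmt == "SILENT":
        raise ValueError("SILENT must not be used in production")
    return fmt


print(check_log_format({}, production=True))  # JSON
print(check_log_format({"EK9_SANITIZER_LOG_FORMAT": "ECS"}, production=True))  # ECS
```

In a real deployment script the first argument would be `os.environ` rather than a literal dict.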


SanitizationContext

A record that captures metadata for security event logging. When you call InputSanitizer methods with a SanitizationContext, the context fields are included in the log output (for ECS and CEF formats).

#!ek9
defines module org.ek9.lang

  defines record
    SanitizationContext
      timestamp as DateTime     // When sanitization occurred
      service as String         // "UserService"
      operation as String       // "getUser"
      fieldName as String       // "userId"
      fieldSource as String     // "PATH", "QUERY", "HEADER", "CONTENT"
      sourceIp as String        // Client IP address
      traceId as String         // Request correlation ID
//EOF

Using SanitizationContext with InputSanitizer:

#!ek9
defines module introduction

  defines program
    ContextExample()
      sanitizer <- InputSanitizer()

      //Create context with rich metadata
      context <- SanitizationContext(
        DateTime(),
        "UserService",
        "getUser",
        "userId",
        "PATH",
        "192.168.1.100",
        "trace-abc-123"
      )

      userInput <- "some user input"

      //Sanitize with context - logs include all context fields
      result <- sanitizer.sanitize(userInput, context)
      if result?
        processData(result)

      //Check safety with context
      if sanitizer.isSafe(userInput, context)
        processData(userInput)
//EOF

When context is provided, the log output includes all set fields. Fields that are not set are simply omitted from the log output. This provides rich metadata for security teams to investigate incidents, correlate attacks, and identify patterns.


Integration Examples

TextFile Automatic Sanitization

When reading files specified by user input, use sanitized paths to prevent path traversal:

#!ek9
defines module introduction

  defines function
    readUserFile()
      -> filename as sanitized String
      <- content as String?

      if filename?
        file <- TextFile(filename)
        if file.exists() and file.isReadable()
          content: file.readAll()
//EOF

Web Service Input Handling

In web services, use the sanitized modifier on path parameters, query parameters, and request body fields:

#!ek9
defines module introduction

  defines service

    UserService :/users

      getUser() :/{userId} as GET
        -> userId as sanitized String
        <- response as HTTPResponse?
        //userId is automatically sanitized before reaching this method
        ...

      searchUsers() :/search as GET
        -> query as sanitized String
        <- response as HTTPResponse?
        //query parameter is sanitized
        ...
//EOF

Command-Line Argument Processing

When command-line arguments will be used in file operations or external commands, sanitize each one explicitly with InputSanitizer:

#!ek9
defines module introduction

  defines program
    ProcessFiles()
      -> argv as List of String

      sanitizer <- InputSanitizer(Stderr())

      for arg in argv
        cleanArg <- sanitizer.sanitize(arg)
        if cleanArg?
          processFile(cleanArg)
        else
          Stderr().println("Rejected potentially dangerous argument")
//EOF


Best Practices

Use the sanitized Modifier by Default

For any function or method that accepts user-provided String input, add the sanitized modifier. This is the simplest and most effective defense:

#!ek9
//GOOD: Default to sanitized for user input
processUserData()
  -> input as sanitized String
  ...

//Only omit sanitized for trusted internal data
processInternalData()
  -> trustedInput as String
  ...

Handle Unset Results Gracefully

When sanitization detects a threat, the parameter becomes unset. Design your business logic to handle this case with appropriate error messages:

#!ek9
createUser()
  -> username as sanitized String
  <- result as Boolean?

  if username?
    //Safe to process
    result: doCreateUser(username)
  else
    //Threat detected - apply normal business validation message
    //This provides defense in depth without information leakage
    result: false
    logValidationError("Invalid username format")

Use InputSanitizer for Environment Variables

Environment variables may legitimately contain paths. Use sanitizeWithoutPathChecks() for these cases:

#!ek9
sanitizer <- InputSanitizer(Stderr())
envVars <- EnvVars()

//Values that look like path traversal (e.g. ../) can be legitimate in env vars
configPath <- envVars.get("CONFIG_PATH")
if configPath?
  cleanPath <- sanitizer.sanitizeWithoutPathChecks(configPath.get())
  ...

Enterprise Logging Integration

For production systems, configure the log format via the EK9_SANITIZER_LOG_FORMAT environment variable to integrate with your SIEM:

#Production deployment - route stderr to SIEM
export EK9_SANITIZER_LOG_FORMAT=ECS   #For Elasticsearch/Splunk/Datadog
#or
export EK9_SANITIZER_LOG_FORMAT=CEF   #For ArcSight/Azure Sentinel/QRadar

In production, route stderr to your log aggregator (Fluentd, Filebeat, etc.) and the security events will automatically flow to your SIEM.


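The following compiler errors relate to the sanitized modifier. The codes and causes listed here are the ones shown in the examples above:

  E07941: sanitized used on a captured variable (captures are already trusted)
  E07942: sanitized used at a call site (the caller cannot specify sanitization)
  E07943: sanitized used on a variable declaration (wrong location for sanitization)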

