Compiler Fuzzing in EK9

EK9 includes a built-in grammar-based fuzzer that generates random EK9 programs, compiles them, and tracks statistics including error code coverage, phase penetration, and compiler crashes. It is used for compiler quality assurance: checking that the compiler handles unexpected input gracefully rather than crashing.

This feature is built into the shipped compiler, so anyone can test the compiler.
If you find any crashes or issues, let us know and supply the 'offending' EK9 source that caused them, so that we can address the problem. We build new versions of the compiler weekly, so a fix won't take long.

Looking for test generation? If you want to generate tests for your code, see Test Generation instead. This page documents the compiler fuzzer that tests the compiler itself.

Fuzzer dashboard overview showing KPI cards and status banner

What is Compiler Fuzzing?

The EK9 compiler has a 22-phase compilation pipeline, over 300 distinct error codes, and supports 29 construct types with complex interactions. Grammar-based fuzzing systematically exercises this complexity by generating random-but-structurally-plausible EK9 programs and compiling them.

The fuzzer answers three questions:

Running the Fuzzer

The fuzzer runs for a specified number of minutes, generating and compiling programs continuously:

  • $ ek9 -fuzz 30 // 30 minutes, human-readable output
  • $ ek9 -fuzz0 30 // Terse CI pass/fail (one line)
  • $ ek9 -fuzz2 1440 // 24 hours, JSON for pipelines
  • $ ek9 -fuzz6 60 // 1 hour, HTML dashboard

The format suffix convention (-fuzz0, -fuzz2, -fuzz6) mirrors the test format convention (-t0, -t2, -t6) — each action owns its format suffixes.

Crash-triggering source files are always saved to ./fuzz-crashes/ regardless of output format. The HTML dashboard is written to ./fuzz-report/.
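
A CI job could wrap these commands along the following lines. This is a minimal sketch, not part of EK9: the function names and the format labels ("human", "terse", "json", "html") are illustrative assumptions. Only the four flags, the ./fuzz-crashes/ directory, and the exit code 0 behaviour of -fuzz0 come from this page.

```python
import subprocess
from pathlib import Path

# Flags as listed on this page; the dictionary keys are illustrative labels only.
FUZZ_FLAGS = {"human": "-fuzz", "terse": "-fuzz0", "json": "-fuzz2", "html": "-fuzz6"}


def build_fuzz_command(minutes: int, fmt: str = "human") -> list[str]:
    """Build the argument list for one fuzzing run of the given length."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return ["ek9", FUZZ_FLAGS[fmt], str(minutes)]


def saved_crash_files(crash_dir: str = "./fuzz-crashes") -> list[Path]:
    """Return the files currently saved in the crash directory, if it exists."""
    root = Path(crash_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file())


def run_ci_gate(minutes: int) -> bool:
    """Run the terse mode and report success when the exit code is 0."""
    result = subprocess.run(build_fuzz_command(minutes, "terse"), check=False)
    return result.returncode == 0
```

Calling run_ci_gate(30) would start a 30-minute run and requires the ek9 command on the PATH. If it returns False, saved_crash_files() lists the sources to attach to a bug report.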

Three-Strand Generation Strategy

The fuzzer uses three complementary generation strategies to maximise the diversity of generated programs. Each strand produces different error distributions, and together they exercise the compiler more thoroughly than any single strategy could.

Strand 1: Template-Based ATN Generation

The primary strand uses 25+ built-in templates for common EK9 patterns (classes, functions, traits, records, etc.) and fills them using ANTLR4 Augmented Transition Network (ATN) walks of the EK9 grammar. At each grammar decision point, the generator makes a random choice, producing structurally plausible code. This strand produces the highest density of syntactically correct programs.
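
The idea of making a random choice at each grammar decision point can be illustrated with a toy generator. This sketch does not use ANTLR4 or the EK9 grammar. It walks a small hand-written expression grammar instead, and the grammar, the depth limit, and the function name are all assumptions made for illustration.

```python
import random

# Toy grammar, NOT the EK9 grammar: each rule maps to a list of alternatives,
# and each alternative is a sequence of rule names or literal tokens.
GRAMMAR = {
    "expr": [["term"], ["term", "+", "expr"], ["term", "*", "expr"]],
    "term": [["NUMBER"], ["(", "expr", ")"]],
    "NUMBER": [["0"], ["1"], ["42"]],
}


def generate(rule: str, rng: random.Random, depth: int = 0, max_depth: int = 6) -> list[str]:
    """Expand `rule` by picking a random alternative at every decision point."""
    if rule not in GRAMMAR:
        return [rule]  # literal token, nothing to expand
    alternatives = GRAMMAR[rule]
    if depth >= max_depth:
        # Past the depth limit, always take the first alternative, which is
        # non-recursive for every rule, so expansion is guaranteed to stop.
        chosen = alternatives[0]
    else:
        chosen = rng.choice(alternatives)
    tokens: list[str] = []
    for symbol in chosen:
        tokens.extend(generate(symbol, rng, depth + 1, max_depth))
    return tokens


rng = random.Random(1709312456789)  # a fixed seed makes this walk reproducible
sample = " ".join(generate("expr", rng))
```

Every output is well-formed with respect to the toy grammar, which mirrors why a grammar-driven strand yields a high share of syntactically plausible programs.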

Strand 2: Compiler-Aware Injection

This strand harvests real symbols from previously compiled Q&A example files (426 templates) and injects them into generated programs. By using real type names, method signatures, and module structures, these programs exercise deeper semantic analysis phases that pure random generation rarely reaches.

Strand 3: Template Mutation

This strand takes working Q&A example files and applies single-point mutations: dropping modifiers, swapping types, changing operators, altering indentation, duplicating lines, injecting boolean literals, stripping guards, and swapping adjacent statements. These targeted mutations exercise specific error detection paths (E08010, E08030, E08081, E11050, etc.) that random generation is unlikely to trigger.
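
Three of the listed mutations (duplicating lines, swapping adjacent statements, altering indentation) can be sketched as pure functions over a list of source lines. This is an illustrative sketch under stated assumptions, not the fuzzer's implementation: the function names are hypothetical, and "statement" is simplified to "line".

```python
import random


def duplicate_line(lines: list[str], index: int) -> list[str]:
    """Return a copy with the line at `index` repeated once."""
    return lines[: index + 1] + [lines[index]] + lines[index + 1 :]


def swap_adjacent(lines: list[str], index: int) -> list[str]:
    """Return a copy with lines `index` and `index + 1` exchanged."""
    mutated = list(lines)
    mutated[index], mutated[index + 1] = mutated[index + 1], mutated[index]
    return mutated


def alter_indentation(lines: list[str], index: int) -> list[str]:
    """Return a copy with one extra leading space on the line at `index`."""
    mutated = list(lines)
    mutated[index] = " " + mutated[index]
    return mutated


def mutate_once(source: str, rng: random.Random) -> str:
    """Apply exactly one randomly chosen mutation at one random position."""
    lines = source.splitlines()
    if len(lines) < 2:
        raise ValueError("need at least two lines to mutate")
    operator = rng.choice([duplicate_line, swap_adjacent, alter_indentation])
    # swap_adjacent needs a following line, so stop one short of the end.
    index = rng.randrange(len(lines) - 1)
    return "\n".join(operator(lines, index))
```

Keeping each run to a single mutation means that any new error or crash can be attributed to one known change in an otherwise working file.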

Strand | Share | Strength
Template-Based ATN | ~50% | High volume, broad grammar coverage, many parse errors for parser robustness
Compiler-Aware | ~25% | Deeper phase penetration, exercises type resolution and semantic checks
Template Mutation | ~25% | Targeted error code coverage, exercises specific detection logic
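
One simple way to realise such a split is weighted random selection per generated program. The sketch below assumes that approach purely for illustration; how the fuzzer actually schedules strands is not stated here, and the strand identifiers are hypothetical.

```python
import random
from collections import Counter

# Approximate shares from the table above; the identifiers are illustrative.
STRAND_WEIGHTS = {"template_atn": 50, "compiler_aware": 25, "template_mutation": 25}


def pick_strand(rng: random.Random) -> str:
    """Choose the strand for the next program, weighted by its share."""
    names = list(STRAND_WEIGHTS)
    weights = list(STRAND_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)
# Over many draws the observed split approaches 50/25/25.
counts = Counter(pick_strand(rng) for _ in range(10_000))
```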

Output Formats

Human-Readable (-fuzz)

The default format prints terminal histograms, phase penetration charts, and a summary to stdout. Suitable for interactive monitoring during development:

EK9 Fuzzer: 30 minutes, seed 1709312456789
Programs: 14,832 | Parse: 72.4% | Crashes: 631 | Errors: 156/307 (50.8%)

Phase Distribution:
  READING                        ████░░░░░░░░░░░░░░░░  5.8%
  SYMBOL_DEFINITION              █████████░░░░░░░░░░░░ 23.9%
  FULL_RESOLUTION                ██████████░░░░░░░░░░░ 26.0%
  CODE_GENERATION_AGGREGATES     ████████████░░░░░░░░░ 32.3%

Terse CI (-fuzz0)

One-line pass/fail for CI gates. Returns exit code 0 if no new crashes were found:

FUZZ OK: 14832 programs, 0 new crashes, 156/307 errors (50.8%) in 30m
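
If a pipeline needs the individual numbers as well as the exit code, the line can be parsed. The pattern below is inferred from the single sample line above and is an assumption, not a documented format. In particular, the status word used on failure is not shown on this page, so the sketch accepts any single word.

```python
import re

# Pattern derived only from the one sample line on this page (an assumption).
TERSE_PATTERN = re.compile(
    r"^FUZZ (?P<status>\w+): (?P<programs>\d+) programs, "
    r"(?P<new_crashes>\d+) new crashes, "
    r"(?P<triggered>\d+)/(?P<total>\d+) errors \((?P<percent>[\d.]+)%\) "
    r"in (?P<minutes>\d+)m$"
)


def parse_terse(line: str) -> dict:
    """Split one terse summary line into typed fields."""
    match = TERSE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised terse line: {line!r}")
    fields = match.groupdict()
    return {
        "status": fields["status"],
        "programs": int(fields["programs"]),
        "new_crashes": int(fields["new_crashes"]),
        "errors_triggered": int(fields["triggered"]),
        "errors_total": int(fields["total"]),
        "error_percent": float(fields["percent"]),
        "minutes": int(fields["minutes"]),
    }


result = parse_terse("FUZZ OK: 14832 programs, 0 new crashes, 156/307 errors (50.8%) in 30m")
```

For gating, the exit code remains the more robust signal, since it does not depend on the exact wording of the line.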

JSON (-fuzz2)

Produces two files for programmatic analysis:

{
  "duration": "PT30M",
  "programs": 14832,
  "parseRate": 0.724,
  "crashes": 631,
  "errorCoverage": { "triggered": 156, "total": 307 },
  "phases": { "READING": 860, "SYMBOL_DEFINITION": 3546, ... },
  "constructs": { "class": 4231, "function": 3892, ... }
}
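
A summary of this shape can be consumed with any JSON library. The sketch below parses a trimmed copy of the sample from a string, because the names of the files that -fuzz2 writes are not given here. The helper names are hypothetical, and the elided "..." entries are left out rather than guessed.

```python
import json

# Trimmed stand-in for the sample above: elided entries are omitted, so the
# phase shares computed below cover only the two phases that are listed.
SAMPLE = """
{
  "duration": "PT30M",
  "programs": 14832,
  "parseRate": 0.724,
  "crashes": 631,
  "errorCoverage": { "triggered": 156, "total": 307 },
  "phases": { "READING": 860, "SYMBOL_DEFINITION": 3546 }
}
"""


def error_coverage_percent(summary: dict) -> float:
    """Percentage of known error codes triggered at least once."""
    coverage = summary["errorCoverage"]
    return 100.0 * coverage["triggered"] / coverage["total"]


def phase_shares(summary: dict) -> dict:
    """Per-phase program count as a percentage of all generated programs."""
    programs = summary["programs"]
    return {name: 100.0 * count / programs for name, count in summary["phases"].items()}


summary = json.loads(SAMPLE)
coverage = round(error_coverage_percent(summary), 1)
shares = {name: round(value, 1) for name, value in phase_shares(summary).items()}
```

The computed values agree with the human-readable sample: 156/307 gives 50.8%, and the READING and SYMBOL_DEFINITION counts give 5.8% and 23.9% of 14,832 programs.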

HTML Dashboard (-fuzz6)

Generates an interactive dashboard at ./fuzz-report/index.html with charts, heatmaps, and drill-down details. This is the richest output format and the recommended way to review fuzzing results.

Reading the Dashboard

The HTML dashboard (-fuzz6) provides eight visualisation sections. Each is described below with the key metrics to watch.

Status Banner and KPI Cards

Fuzzer dashboard status banner with duration, throughput, and four KPI donut charts

The status banner shows duration, programs generated, throughput, crash count, and corrections. The border colour indicates overall health: green (no crashes), amber (few crashes), or red (significant crashes).

Four KPI donut charts provide at-a-glance metrics:

Timing Breakdown and Source Statistics

Timing breakdown donuts showing generation, parse check, and compilation time split

Source statistics grid showing min/avg/max lines, total bytes, files generated, and compile rate

Three mini-donuts show where time is spent: generation, parse checking, and compilation. Source statistics show min/avg/max lines per program, total bytes generated, file counts, and compile rate. If compilation dominates, programs are reaching deep phases (good). If parse checking dominates, most programs fail early (consider adjusting generation strategy).

Phase Distribution

Stacked bar chart showing program counts per compilation phase

Horizontal bars show how far programs penetrate the compilation pipeline. Each bar represents a phase at which programs were rejected; programs that pass a phase are counted in a later bar. A healthy distribution shows programs spread across all phases, not clustered at the front.

Error Code Coverage Heatmap

Interactive heatmap grid of 307 error codes grouped by category

The largest dashboard section shows all 307 compiler error codes as a searchable, filterable grid. Error codes are grouped by category (E01xxx Lexer/Parser, E05xxx Hierarchy, E06xxx Resolution, etc.).

Use the search box to find specific error codes, or the filter buttons (All / Triggered / Untriggered) to focus on gaps.

Construct Coverage

Heatmap of 29 EK9 construct types with crash indicators

A heatmap of all 29 EK9 construct types: class, function, record, trait, service, component, program, enumeration, generic-type, dynamic-class, dynamic-function, and more. Colour intensity indicates frequency of generation. Red-bordered cells with a pulse animation indicate constructs that have caused compiler crashes.

The goal is uniform coverage across all construct types. If some constructs are underrepresented, the generation templates may need adjustment.

Control Flow Coverage

Grouped bar charts for control flow types with statement, guard, and expression sub-bars

Grouped horizontal bars for each control flow type (for-in, do-while, for-range, switch, if, while, throw, try-catch, stream, etc.). Each type has three sub-bars: statement, guard, and expression.

Crash badges on specific control flow types highlight where the compiler is most vulnerable. Stream operations and deeply nested constructs often reveal the most interesting bugs.

Argument Count Distribution

Heatmap grid showing frequency of 0 to 25+ argument counts

Shows the frequency distribution of argument counts (0-25+) in generated functions and methods. A realistic distribution has most functions with 0-3 parameters, with decreasing frequency for higher counts. Edge cases at 15+ parameters stress the compiler's parameter handling.
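
The bucketing behind such a grid can be sketched as follows, with every count of 25 or more collapsed into a final "25+" bucket. The function names are hypothetical and the input list is made-up illustration data, not fuzzer output.

```python
from collections import Counter

MAX_BUCKET = 25  # counts of 25 or more share the final "25+" bucket


def bucket_label(arg_count: int) -> str:
    """Map an argument count to its grid label ("0" .. "24", or "25+")."""
    if arg_count < 0:
        raise ValueError("argument count cannot be negative")
    return f"{MAX_BUCKET}+" if arg_count >= MAX_BUCKET else str(arg_count)


def argument_histogram(arg_counts: list[int]) -> dict[str, int]:
    """Count occurrences per bucket, including empty buckets, in grid order."""
    histogram = Counter(bucket_label(n) for n in arg_counts)
    labels = [str(n) for n in range(MAX_BUCKET)] + [f"{MAX_BUCKET}+"]
    return {label: histogram.get(label, 0) for label in labels}


# Illustration data only: mostly small counts plus two extreme cases.
grid = argument_histogram([0, 0, 1, 3, 25, 40])
```

Keeping empty buckets in the result makes gaps visible, which matters when checking whether the 15+ edge cases are being generated at all.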

Template Usage

Template utilisation heatmap showing usage across 426 Q&A example files

Shows utilisation of the 426 Q&A example templates used by Strand 2 (compiler-aware injection) and Strand 3 (template mutation). Identifies underused templates that may need attention to ensure comprehensive coverage.

Best Practices

See Also