Your code passed review.

It still has a heap buffer overflow.

DELTA CRS catches it at the merge request.

Diff-Evaluated Localization, Triage, and Attack Cyber Reasoning System

Isaac Hung · Sin Liang Lee

The Problem

Bugs slip through every check

Traditional development workflows catch logic bugs, but memory safety vulnerabilities (buffer overflows, integer overflows, use-after-free) routinely slip past code review, unit tests, and CI pipelines.

Code Written: PASS
Code Review: PASS
Unit Tests: PASS
CI Pipeline: PASS
Production: CRASH

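Roughly 70% of serious security vulnerabilities in large C/C++ codebases are memory safety bugs (as reported by Microsoft and the Chromium project).
The average data breach costs $4.88M (IBM, 2024).
Memory safety bugs are hard to catch by code review alone.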

The Technique

What is Fuzz Testing?

Fuzzing throws millions of random inputs at your code to find the ones that cause crashes. Combined with sanitizers (runtime checks that turn silent memory errors into immediate, reportable crashes), it catches buffer overflows, memory corruption, and other bugs that human reviewers miss.

[Interactive demo (fuzz_demo): random input is fed to http_parse_request() and the result is displayed.]

The fuzzer found a heap buffer overflow — a write past the end of allocated memory. An attacker could exploit this for remote code execution.
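The idea can be illustrated with a toy sketch in Python. Everything here is hypothetical: toy_parse is a stand-in for a C parser, BUF_SIZE is an invented buffer size, and Python's IndexError plays the role that a sanitizer plays for C code. Real fuzzers such as libFuzzer are coverage-guided rather than purely random, so this only shows the basic loop.

```python
import random

BUF_SIZE = 16  # hypothetical fixed-size destination buffer


def toy_parse(data: bytes) -> None:
    """Toy stand-in for a C parser: trusts a length byte, copies with no bounds check."""
    if not data:
        return
    claimed_len = data[0]
    payload = data[1:1 + claimed_len]
    buf = bytearray(BUF_SIZE)
    for i, byte in enumerate(payload):
        # Python raises IndexError on an out-of-range write. In C the same
        # write would silently corrupt memory unless a sanitizer is enabled.
        buf[i] = byte


def fuzz(target, iterations: int = 1000, seed: int = 0):
    """Feed random byte strings to target; return the first crashing input, or None."""
    rng = random.Random(seed)
    for _ in range(iterations):
        length = rng.randrange(65)  # 0 to 64 bytes
        data = bytes(rng.randrange(256) for _ in range(length))
        try:
            target(data)
        except IndexError:
            return data
    return None


crash = fuzz(toy_parse)
```

The returned input is a reproducer: passing it to toy_parse again triggers the same out-of-bounds write.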

Developer Experience

One click. Zero config.

Assign Delta CRS as a reviewer on your merge request. It analyzes your code, fuzzes it, finds vulnerabilities, commits the fixes, and posts a report. You just review and merge.

1. Assign Reviewer: Developer assigns Delta CRS
2. Analyze: Read diff, find vulnerabilities
3. Fuzz: Generate harness, run campaign
4. Commit Fix: Push patches to MR branch
5. Report: Post findings as MR comment

Architecture

3-Step Duo Flow

Three specialized AI agents chained via GitLab Duo routers. Each step has scoped tools — analyze reads, fuzz writes and executes, report commits and posts.

1. analyze (10 tools · 180s timeout)

Reads the MR diff, loads full source files for context, identifies vulnerability classes (buffer overflows, integer issues, input handling), plans fuzz targets.

Tools include: get_merge_request, list_merge_request_diffs, read_file, find_files, grep

2. fuzz (5 tools · 300s timeout)

Generates a libFuzzer harness targeting vulnerable functions. Compiles with clang + ASan. Runs the compile-fix loop. Executes fuzzing campaign. Reproduces crashes.

Tools include: create_file_with_contents, run_command, read_file

3. report (6 tools · 120s timeout)

Reads vulnerable files, generates fixes, commits patches directly to the MR branch via create_commit. Posts structured report with severity badges, stack traces, and patch diffs.

Tools include: create_commit, create_merge_request_note, get_merge_request, get_repository_file

Data flow: analyze.final_answer → fuzz.final_answer → MR comment + commits
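A minimal sketch of how scoped tools and chained answers might be modeled, in Python. The step names, timeouts, and tool names come from the lists above (only the tools shown on this page are included, so the sets are smaller than the quoted counts). The Step class, call_tool, and run_pipeline are hypothetical helpers for illustration and are not the GitLab Duo API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    tools: frozenset
    timeout_s: int


# Tool names and timeouts as listed on this page; the sets are partial.
PIPELINE = [
    Step("analyze", frozenset({"get_merge_request", "list_merge_request_diffs",
                               "read_file", "find_files", "grep"}), 180),
    Step("fuzz", frozenset({"create_file_with_contents", "run_command",
                            "read_file"}), 300),
    Step("report", frozenset({"create_commit", "create_merge_request_note",
                              "get_merge_request", "get_repository_file"}), 120),
]


def call_tool(step: Step, tool: str) -> str:
    """Hypothetical dispatcher: refuse any tool outside the step's scope."""
    if tool not in step.tools:
        raise PermissionError(f"{step.name} may not call {tool}")
    return f"{step.name}:{tool}"


def run_pipeline(handlers: dict, initial: str) -> str:
    """Chain the steps so each step's final answer becomes the next step's input."""
    answer = initial
    for step in PIPELINE:
        answer = handlers[step.name](step, answer)
    return answer
```

Scoping tools per step means, for example, that the analyze step cannot commit to the branch even if the model asks to.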

Key Innovation

Compile-Fix Loop

LLMs hallucinate struct members and invent function signatures. Instead of hoping for perfect code, we compile immediately and feed the compiler errors back to Claude until the harness builds.

1. Claude generates harness
   LLVMFuzzerTestOneInput() targeting http_parse_request...

2. clang: 2 errors
   error: cannot initialize 'char *' with rvalue of type 'void *'
   error: no member named 'uri' in 'http_request_t'

3. Claude fixes errors
   Added (char*) cast, changed req.uri to req.path...

4. Compilation successful
   Built fuzz_target with -fsanitize=address,fuzzer

CRITICAL: heap-buffer-overflow
Found in 5 seconds · 754,071 executions · CWE-122
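The loop above could be sketched roughly as follows in Python. This is an illustration under assumptions, not the project's actual code: compile_fix_loop, its generate callback (which would wrap a Claude call), the attempt limit, and the file suffix are all hypothetical. Only the clang invocation with -fsanitize=address,fuzzer is taken from the text.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Optional, Sequence, Tuple


def clang_compile(source: str, extra_sources: Sequence[str] = (),
                  compiler: str = "clang", suffix: str = ".c") -> Tuple[bool, str]:
    """Compile harness source with ASan + libFuzzer; return (ok, compiler stderr).

    The compiler name and file suffix are assumptions; adjust for C++ harnesses.
    """
    with tempfile.TemporaryDirectory() as tmp:
        harness = Path(tmp) / f"harness{suffix}"
        harness.write_text(source)
        result = subprocess.run(
            [compiler, "-g", "-fsanitize=address,fuzzer", str(harness),
             *extra_sources, "-o", str(Path(tmp) / "fuzz_target")],
            capture_output=True, text=True,
        )
        return result.returncode == 0, result.stderr


def compile_fix_loop(
    generate: Callable[[Optional[str], str], str],
    compile_fn: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 5,
) -> Tuple[Optional[str], int]:
    """Regenerate the harness until it compiles or attempts run out.

    generate(previous_source, errors) returns new source; on the first call
    previous_source is None and errors is empty.
    """
    source: Optional[str] = None
    errors = ""
    for attempt in range(1, max_attempts + 1):
        source = generate(source, errors)
        ok, errors = compile_fn(source)
        if ok:
            return source, attempt
    return None, max_attempts
```

Bounding the number of attempts keeps a stubborn harness from consuming the step's whole time budget; the limit of 5 is an arbitrary choice for this sketch.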

Powered by Anthropic

6 Claude AI Tasks

Each task uses a different Claude interaction pattern — structured JSON, code generation, or multi-turn tool use — with purpose-built system prompts.

Diff Analysis (generate_json): Identifies security-relevant changes, classifies risk, suggests fuzz targets

Harness Generation (generate_code): Writes complete libFuzzer harnesses targeting vulnerable functions

Harness Evaluation (generate_json): Scores existing harnesses 1-10, identifies coverage gaps

Corpus Synthesis (generate_code): Generates Python seed scripts producing protocol-aware inputs

Dictionary Generation (generate_json ×3): Three parallel strategies: operand extraction, bug triggers, format strings

Patch Generation (generate_with_tools): Multi-turn ReAct agent with view_file, search_symbol tools for fault localization
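To make the Corpus Synthesis task concrete, here is a sketch of what a generated seed script might look like for an HTTP parser. The seed names, request contents, and the build_seeds and write_corpus helpers are invented for illustration; the scripts the tool actually produces may differ.

```python
from pathlib import Path


def build_seeds() -> dict:
    """Return a mapping of seed name to raw HTTP/1.1 request bytes."""
    return {
        "get_simple": b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n",
        "post_body": (b"POST /submit HTTP/1.1\r\nHost: example.com\r\n"
                      b"Content-Length: 5\r\n\r\nhello"),
        "chunked": (b"POST /upload HTTP/1.1\r\nHost: example.com\r\n"
                    b"Transfer-Encoding: chunked\r\n\r\n5\r\nhello\r\n0\r\n\r\n"),
        "encoded_url": b"GET /a%20b?q=%41 HTTP/1.1\r\nHost: example.com\r\n\r\n",
    }


def write_corpus(corpus_dir: str) -> int:
    """Write one file per seed into corpus_dir; return the number of seeds written."""
    out = Path(corpus_dir)
    out.mkdir(parents=True, exist_ok=True)
    seeds = build_seeds()
    for name, data in seeds.items():
        (out / name).write_bytes(data)
    return len(seeds)
```

Starting from well-formed requests lets the fuzzer mutate its way past the parser's early format checks, which purely random bytes rarely do.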

Real Output

What the MR Gets

A structured security report posted as an MR comment, plus auto-committed patches.

Delta CRS

posted 2 minutes ago

Delta CRS Security Fuzzing Report

Metric          Value
Duration        1.3 min
Total crashes   1
Severity        1 CRITICAL

CRITICAL — heap-buffer-overflow (WRITE)

Location: http_parse_request at minihttp.c:201

CWE: CWE-122

In MR diff: Yes

Auto-committed fix

--- a/minihttp.c
+++ b/minihttp.c
- memcpy(req->body, body_start, content_length);
+ if (content_length > MAX_BODY_SIZE) return -1;
+ memcpy(req->body, body_start, content_length);

Demo Target

4 Real Vulnerabilities

Our demo uses minihttp — a minimal HTTP/1.1 parser with 4 intentional security bugs. Delta CRS finds and patches them automatically.

CWE-122 · CRITICAL · Header Buffer Overflow · http_parse_headers()
Unbounded memcpy into 256-byte buffer from attacker-controlled header value

CWE-190 · CRITICAL · Integer Overflow · http_parse_headers()
atoi() on Content-Length wraps negative, causing undersized malloc then heap overflow

CWE-122 · HIGH · Chunked Overflow · http_decode_chunked()
Per-chunk size check but no cumulative length check, so the total can exceed the output buffer

CWE-193 · MEDIUM · URL Decode Off-by-One · http_url_decode()
Null terminator written one byte past destination buffer end
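The chunked overflow is easiest to see as a missing check. The sketch below models the logic in Python rather than C, so it is not minihttp's code: decode_chunked here is a hypothetical model, and the chunk limit and buffer capacity are invented values. It shows the per-chunk check that the description says exists alongside the cumulative check that is missing.

```python
def decode_chunked(chunks, out_capacity: int, max_chunk: int = 64) -> bytes:
    """Model of chunked decoding with both per-chunk and cumulative bounds checks."""
    out = bytearray()
    for chunk in chunks:
        # Per-chunk check: present in the vulnerable version as described.
        if len(chunk) > max_chunk:
            raise ValueError("chunk too large")
        # Cumulative check: the one the vulnerable version lacks. Without it,
        # many small chunks can together exceed the output buffer.
        if len(out) + len(chunk) > out_capacity:
            raise ValueError("decoded body exceeds output buffer")
        out += chunk
    return bytes(out)
```

With a capacity of 128 bytes, two 64-byte chunks are accepted, while a third 64-byte chunk passes the per-chunk check but is rejected by the cumulative one.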

Security fuzzing on every merge request.

Built with Claude on the GitLab Duo Agent Platform.
Adapted from DARPA AIxCC 1st place winner.

12 Python modules
3,300+ lines of code
15 GitLab Duo tools
6 Claude AI tasks