AD2042: NoUnicodeSymbolsPE

Summary

| Property | Value |
|----------|-------|
| ID | AD2042 |
| Name | NoUnicodeSymbolsPE |
| Category | Correctness |
| Severity | Warning |
| Applies to | PE (Windows) |

Description

PE binaries should not contain symbols with suspicious Unicode characters that could be used for Trojan Source attacks or visual obfuscation.

How It Works

The rule scans exported symbols and debug information for:

  1. Bidirectional control characters (RLO, LRO, etc.)
  2. Homoglyph characters that resemble ASCII
  3. Zero-width characters
  4. Other potentially deceptive Unicode
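
The four categories above can be sketched as a small classifier. This is only an illustration, not the rule's actual implementation: the function name `classify_symbol`, the category labels, and the tiny `HOMOGLYPHS` sample are all assumptions made for the example, and a real check would need a much larger confusables table.

```python
# Explicit bidirectional formatting characters: embeddings and overrides
# (U+202A..U+202E) and isolates (U+2066..U+2069).
BIDI_CONTROLS = set(range(0x202A, 0x202F)) | set(range(0x2066, 0x206A))

# Characters that occupy no visible width when rendered.
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}

# Illustrative sample only: a few Cyrillic and Greek letters that resemble ASCII.
HOMOGLYPHS = {0x0430: "a", 0x0435: "e", 0x043E: "o",
              0x0440: "p", 0x0441: "c", 0x03BF: "o"}


def classify_symbol(name):
    """Return (index, code point label, category) for each non-ASCII character."""
    findings = []
    for index, ch in enumerate(name):
        cp = ord(ch)
        if cp < 0x80:
            continue  # plain ASCII is never reported
        if cp in BIDI_CONTROLS:
            category = "bidi-control"
        elif cp in ZERO_WIDTH:
            category = "zero-width"
        elif cp in HOMOGLYPHS:
            category = "homoglyph"
        else:
            category = "other-non-ascii"
        findings.append((index, f"U+{cp:04X}", category))
    return findings
```

A symbol such as `CreateFileW` yields an empty list, while a name containing a Cyrillic 'а' is reported with its position and code point.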

Why This Matters

Unicode-based attacks make malicious code look legitimate to human reviewers, while the compiler still processes the real, malicious logic.

Trojan Source Attack

// What a reviewer sees in an editor or diff view:
if (access_granted) {
    safe_action();
}

// What the compiler actually parses:
if (access_granted) {
    malicious_action();  // Hidden from view by Unicode control characters
}

Dangerous Characters

| Character | Code Point | Risk |
|-----------|------------|------|
| RLO | U+202E | Reverses text display |
| LRO | U+202D | Overrides direction |
| ZWNJ | U+200C | Invisible separator |
| Cyrillic 'а' | U+0430 | Looks like ASCII 'a' |
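
Because these characters are invisible or look identical to ASCII, it helps to print them in escaped form. The helper below is a minimal sketch (the name `reveal` is hypothetical) that uses Python's standard `unicodedata` module to replace each non-ASCII character with a visible tag containing its code point and Unicode name.

```python
import unicodedata


def reveal(text):
    """Replace every non-ASCII character with a visible <U+XXXX NAME> tag."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            # unicodedata.name() raises for unnamed characters unless a default is given.
            name = unicodedata.name(ch, "UNNAMED")
            out.append(f"<U+{ord(ch):04X} {name}>")
    return "".join(out)


# The Cyrillic letter renders like ASCII 'a', but the strings are not equal.
lookalike = "p\u0430ss"
print(lookalike == "pass")
print(reveal(lookalike))
```

The comparison is false even though both strings look the same on screen, and the revealed form makes the substituted letter obvious.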

Supply Chain Impact

| Stage | Risk |
|-------|------|
| Code review | Malicious code invisible |
| Compilation | Compiler sees real code |
| Binary | Contains misleading symbols |
| Debugging | Confusing symbol names |

Resolution

  1. Audit source code for suspicious Unicode
  2. Configure editors to reveal hidden characters
  3. Use compiler warnings for Unicode issues
  4. Rebuild with clean source files
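
The first step, auditing source code, can be partly automated. The sketch below is one possible approach, not a tool shipped with this rule: the names `audit_text` and `audit_file` are hypothetical, and the character set covers only bidirectional controls and zero-width characters, so homoglyphs would need a separate check.

```python
from pathlib import Path

# Bidirectional controls (U+202A..U+202E, U+2066..U+2069) plus zero-width characters.
SUSPICIOUS = (set(range(0x202A, 0x202F)) | set(range(0x2066, 0x206A))
              | {0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF})


def audit_text(text):
    """Return (line, column, code point label) for each suspicious character, 1-based."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) in SUSPICIOUS:
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits


def audit_file(path):
    """Audit one source file, assumed to be UTF-8 encoded."""
    # "utf-8-sig" drops a leading byte order mark so it is not reported as a finding.
    text = Path(path).read_text(encoding="utf-8-sig", errors="replace")
    return audit_text(text)
```

Calling `audit_file` on each source file before a rebuild gives a list of exact positions to inspect in an editor that reveals hidden characters.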

Compiler Flags

# GCC (12 or later): warn on any bidirectional control character
gcc -Wbidi-chars=any program.c

# Clang
clang -Wunicode program.c