AD3021: NoUnicodeSymbols
Summary
| Property | Value |
|---|---|
| ID | AD3021 |
| Name | NoUnicodeSymbols |
| Category | Correctness |
| Severity | Warning |
| Applies to | ELF (Linux/Unix) |
Description
This rule checks that ELF binaries do not contain symbol names with non-ASCII Unicode characters, which could be used for "Trojan Source" attacks or other obfuscation techniques. Malicious Unicode characters can make code appear to do something different from what it actually does.
Why This Matters
Unicode-based attacks such as "Trojan Source" can hide malicious logic from human reviewers while compilers still build it exactly as written. Because these attacks target the supply chain and the code review process, they are particularly dangerous.
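The Trojan Source Attack (CVE-2021-42574)
Bidirectional (bidi) control characters change the order in which text is displayed without changing the order in which a compiler or interpreter reads it. In this Python example, a single invisible Right-to-Left Isolate (RLI, U+2067) makes an early return look like part of a docstring:
```python
# What a reviewer sees in a bidi-aware editor:
def subtract_funds(account: str, amount: int):
    ''' Subtract funds from bank account then return; '''
    bank[account] -= amount
    return

# What the interpreter actually reads (<RLI> marks the invisible U+2067):
def subtract_funds(account: str, amount: int):
    ''' Subtract funds from bank account then <RLI>''' ;return
    bank[account] -= amount   # never runs: the function has already returned
    return
```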
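To confirm that an invisible bidi character really changes behavior, the sketch below builds a small early-return case as Python source containing a real U+2067 (RLI) character and executes it. The bank dictionary, its balance and the function name are illustrative assumptions; the "\u2067" escape is used only so the character is visible in this listing.

```python
# Build source text that contains a real RLI (U+2067); the escape inserts the
# invisible character itself into the string.
malicious_source = (
    "bank = {'alice': 100}\n"
    "def subtract_funds(account: str, amount: int):\n"
    "    ''' Subtract funds from bank account then \u2067''' ;return\n"
    "    bank[account] -= amount\n"
    "    return\n"
)

namespace: dict = {}
exec(malicious_source, namespace)         # defines bank and subtract_funds
namespace["subtract_funds"]("alice", 50)  # looks like it deducts 50...

# ...but the docstring is followed by ";return", so the balance never changes.
print(namespace["bank"])  # {'alice': 100}
```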
Attack Categories
| Attack Type | Technique | Risk |
|---|---|---|
| Trojan Source | Bidirectional text | Hidden malicious code |
| Homoglyph | Look-alike chars | Typosquatting, confusion |
| Zero-width | Invisible chars | Hide code differences |
Bidirectional Control Characters
| Character | Code Point | Effect |
|---|---|---|
| RLO | U+202E | Right-to-Left Override |
| LRO | U+202D | Left-to-Right Override |
| RLE | U+202B | Right-to-Left Embedding |
| LRE | U+202A | Left-to-Right Embedding |
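| PDF | U+202C | Pop Directional Formatting (ends an embedding or override) |
| RLI | U+2067 | Right-to-Left Isolate |
| LRI | U+2066 | Left-to-Right Isolate |
| FSI | U+2068 | First Strong Isolate |
| PDI | U+2069 | Pop Directional Isolate (ends an isolate) |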
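Whether these characters are paired matters: an embedding or override (LRE, RLE, LRO, RLO) stays in effect until a PDF, and an isolate (LRI, RLI, FSI) until a PDI, or in either case until the end of the paragraph. An opener left unterminated inside a comment or string therefore keeps reordering the rest of the line. The Python sketch below is a simplified, line-based check in the spirit of GCC's -Wbidi-chars=unpaired, not its actual algorithm; the function name has_unpaired_bidi is hypothetical.

```python
# Bidi openers and terminators (UAX #9 explicit formatting characters).
EMBEDDING_OPENERS = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
ISOLATE_OPENERS = {"\u2066", "\u2067", "\u2068"}              # LRI, RLI, FSI
PDF = "\u202c"  # Pop Directional Formatting: closes an embedding or override
PDI = "\u2069"  # Pop Directional Isolate: closes an isolate


def has_unpaired_bidi(line: str) -> bool:
    """Return True if the line leaves an embedding, override or isolate open.

    Simplified model: a PDF closes the innermost embedding/override only if no
    isolate was opened after it; a PDI closes the innermost isolate and anything
    opened inside it. Stray terminators are ignored.
    """
    stack: list[str] = []  # "emb" or "iso", innermost last
    for ch in line:
        if ch in EMBEDDING_OPENERS:
            stack.append("emb")
        elif ch in ISOLATE_OPENERS:
            stack.append("iso")
        elif ch == PDF:
            if stack and stack[-1] == "emb":
                stack.pop()
        elif ch == PDI:
            if "iso" in stack:
                while stack.pop() != "iso":
                    pass  # discard embeddings opened inside the isolate
    return bool(stack)


attack_line = "''' Subtract funds from bank account then \u2067''' ;return"
print(has_unpaired_bidi(attack_line))          # True: the RLI is never closed
print(has_unpaired_bidi("a \u2067b\u2069 c"))  # False
```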
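Homoglyph Attack Example
```c
// Legitimate function
void authenticate(const char *password) { /* ... */ }

// Malicious function whose name starts with Cyrillic 'а' (U+0430), not ASCII 'a'
void аuthenticate(const char *password) {
    log_password_to_attacker(password);
    authenticate(password);  // forward to the real function so nothing looks broken
}

// A call site spelled with the Cyrillic 'а' silently invokes the malicious version,
// and a reviewer cannot see the difference.
```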
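Two identifiers that render identically are still different strings. This Python sketch lists each non-ASCII character in a name with its code point and Unicode name, using only the standard unicodedata module. It also shows that NFKC normalization leaves Cyrillic а unchanged, so normalization alone does not unmask a homoglyph. The helper name non_ascii_chars is hypothetical.

```python
import unicodedata


def non_ascii_chars(name: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for every non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(name)
        if not ch.isascii()
    ]


legit = "authenticate"
spoof = "\u0430uthenticate"  # starts with CYRILLIC SMALL LETTER A

print(legit == spoof)          # False: different code points
print(non_ascii_chars(spoof))  # [(0, 'U+0430', 'CYRILLIC SMALL LETTER A')]
# NFKC does not fold Cyrillic 'а' to Latin 'a', so the names still differ.
print(unicodedata.normalize("NFKC", spoof) == legit)  # False
```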
Supply Chain Implications
1. Submission: Attacker opens an innocent-looking pull request
2. Code review: Looks harmless to humans
3. Merge: Malicious code enters codebase
4. Build: Compiler sees real (malicious) code
5. Distribution: Malware in legitimate package
Detection Challenges
| Environment | Visibility |
|---|---|
| Most text editors | Characters invisible or confusing |
| GitHub | Shows a warning on files containing bidirectional Unicode text |
| Diff tools | Often render the characters invisibly |
| Command line | Depends on the terminal and tool; cat -v exposes non-ASCII bytes |
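When a viewer hides these characters, escaping everything outside ASCII makes them visible. A minimal Python sketch using the built-in backslashreplace error handler; the function name make_visible is hypothetical:

```python
def make_visible(text: str) -> str:
    """Escape every non-ASCII character as \\xNN, \\uNNNN or \\UNNNNNNNN."""
    return text.encode("ascii", errors="backslashreplace").decode("ascii")


line = "is_admin = False\u202e"  # ends with an invisible RLO
print(make_visible(line))       # is_admin = False\u202e
```

Because the escapes are plain ASCII, the result can be pasted into a review comment or log without being reinterpreted by the viewer.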
Defense Layers
| Layer | Protection |
|---|---|
| Compiler warnings | -Wbidi-chars (GCC 12 and later) |
| Binary analysis | This rule |
| Code review tools | Unicode highlighting |
| CI/CD gates | Reject suspicious commits |
In summary, this rule addresses:
- Trojan Source attacks: Bidirectional Unicode characters can visually reorder code
- Homoglyph attacks: Look-alike characters can disguise malicious functions
- Code review bypass: Unicode tricks can hide malicious code from reviewers
- Supply chain security: Flagging such symbols in binaries helps detect potentially compromised dependencies
Dangerous Unicode Categories
- Bidirectional control characters: RLO, LRO, RLE, LRE, PDF, RLI, LRI, FSI, PDI
- Homoglyphs: Characters that look like ASCII but aren't (Cyrillic е vs Latin e, Cyrillic а vs Latin a)
- Zero-width characters: ZWSP (U+200B), ZWNJ (U+200C), ZWJ (U+200D)
- Confusable characters: Mathematical alphanumeric symbols that resemble letters (e.g. U+1D41A, MATHEMATICAL BOLD SMALL A)
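The categories above can be checked one character at a time. The Python sketch below is one possible classification, not aldur's; the function name classify, the category labels, and the inclusion of U+2060 WORD JOINER and U+FEFF in the zero-width set are assumptions.

```python
import unicodedata
from typing import Optional

# Assumed category sets for illustration; a real rule may use different ones.
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}  # ZWSP, ZWNJ, ZWJ, WJ, BOM


def classify(ch: str) -> Optional[str]:
    """Return a risk category for one character, or None for plain ASCII."""
    if ch.isascii():
        return None
    if ch in BIDI_CONTROLS:
        return "bidi-control"
    if ch in ZERO_WIDTH:
        return "zero-width"
    if 0x1D400 <= ord(ch) <= 0x1D7FF:  # Mathematical Alphanumeric Symbols block
        return "math-confusable"
    if unicodedata.category(ch).startswith("L"):
        return "non-ascii-letter"  # possible homoglyph, e.g. Cyrillic U+0430
    return "other-non-ascii"


print([classify(c) for c in "a\u0430\u200b\u202e\U0001D41A"])
# [None, 'non-ascii-letter', 'zero-width', 'bidi-control', 'math-confusable']
```

Every non-ASCII letter lands in "non-ascii-letter", including legitimate accented letters; that matches an ASCII-only policy but would be too broad for a pure homoglyph detector.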
How to Fix
Use ASCII-only identifiers
```c
// Bad: Contains Cyrillic 'а' (U+0430) instead of ASCII 'a'
void dаnger() { }  // Looks like "danger" but isn't

// Good: ASCII only
void danger() { }
```
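A build or review script can enforce this at the source level by accepting only identifiers that match the ASCII subset of C's identifier syntax. A minimal Python sketch; the helper is_ascii_c_identifier is hypothetical:

```python
import re

# ASCII-only C identifier: a letter or underscore, then letters, digits or underscores.
ASCII_C_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_ascii_c_identifier(name: str) -> bool:
    """Return True only if the whole name is an ASCII-only C identifier."""
    return ASCII_C_IDENTIFIER.fullmatch(name) is not None


print(is_ascii_c_identifier("danger"))       # True
print(is_ascii_c_identifier("d\u0430nger"))  # False: Cyrillic 'а' (U+0430)
```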
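Enable compiler warnings
GCC 12 and later can warn about bidirectional control characters in comments, string literals, character constants and identifiers with -Wbidi-chars. The default, -Wbidi-chars=unpaired, reports only sequences that are not properly terminated; -Wbidi-chars=any reports every occurrence.
Scan for suspicious characters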
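The same kind of check can run over source files before they are compiled, for example as a CI gate. The Python sketch below reports the line and column of every bidirectional control or zero-width character and returns a non-zero status when it finds any. The function names and the exact set of flagged code points are assumptions, not part of aldur.

```python
import sys
from pathlib import Path

# Assumed set of code points to flag: bidi controls and zero-width characters.
SUSPICIOUS = {
    **{cp: "bidi control" for cp in (0x202A, 0x202B, 0x202C, 0x202D, 0x202E,
                                     0x2066, 0x2067, 0x2068, 0x2069)},
    **{cp: "zero-width" for cp in (0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF)},
}


def scan_text(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, description) for each suspicious character (1-based)."""
    findings = []
    # Split on "\n" only, so line numbers match what most editors show.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            kind = SUSPICIOUS.get(ord(ch))
            if kind is not None:
                findings.append((lineno, col, f"U+{ord(ch):04X} ({kind})"))
    return findings


def scan_paths(paths: list[str]) -> int:
    """Report findings on stderr; return 1 if any file is suspicious, else 0."""
    status = 0
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        for lineno, col, desc in scan_text(text):
            print(f"{path}:{lineno}:{col}: {desc}", file=sys.stderr)
            status = 1
    return status
```

A CI wrapper could pass the returned status to sys.exit so the job fails on any finding. Because U+FEFF is in the assumed set, a UTF-8 byte order mark at the start of a file is reported too.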
Detection Method
aldur scans all symbol names in:
- .symtab (symbol table)
- .dynsym (dynamic symbol table)
- .strtab and .dynstr (the string tables that store symbol names)
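A similar check can be approximated with a small standard-library ELF reader. This Python sketch is an independent illustration, not aldur's implementation: it parses the section header table, extracts the .strtab and .dynstr string tables (where the names referenced by .symtab and .dynsym are stored), and reports every name containing a non-ASCII byte. It assumes a well-formed file with a regular section header table and ignores extended section numbering (SHN_XINDEX); the function names are hypothetical.

```python
import struct


def string_tables(data: bytes) -> dict[str, bytes]:
    """Return the raw .strtab and .dynstr sections of an in-memory ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = data[4] == 2                   # EI_CLASS: 1 = ELF32, 2 = ELF64
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little, 2 = big endian
    if is_64:
        ehdr_fmt, shdr_fmt = "16sHHIQQQIHHHHHH", "IIQQQQIIQQ"
    else:
        ehdr_fmt, shdr_fmt = "16sHHIIIIIHHHHHH", "IIIIIIIIII"
    ehdr = struct.unpack_from(endian + ehdr_fmt, data, 0)
    e_shoff, e_shentsize, e_shnum, e_shstrndx = ehdr[6], ehdr[11], ehdr[12], ehdr[13]
    if e_shnum == 0 or e_shstrndx >= e_shnum:
        return {}

    # In both layouts: field 0 = sh_name, field 4 = sh_offset, field 5 = sh_size.
    sections = [
        struct.unpack_from(endian + shdr_fmt, data, e_shoff + i * e_shentsize)
        for i in range(e_shnum)
    ]
    shstr = sections[e_shstrndx]
    section_names = data[shstr[4]:shstr[4] + shstr[5]]

    tables = {}
    for sh in sections:
        raw_name = section_names[sh[0]:].split(b"\0", 1)[0]
        name = raw_name.decode("ascii", errors="replace")
        if name in (".strtab", ".dynstr"):
            tables[name] = data[sh[4]:sh[4] + sh[5]]
    return tables


def suspicious_names(strtab: bytes) -> list[str]:
    """Return every NUL-terminated name in a string table that is not pure ASCII."""
    return [
        raw.decode("utf-8", errors="backslashreplace")
        for raw in strtab.split(b"\0")
        if raw and not raw.isascii()
    ]


def scan_elf(path: str) -> list[str]:
    """Return 'section: name' entries for non-ASCII names in an ELF file on disk."""
    with open(path, "rb") as f:
        data = f.read()
    return [
        f"{section}: {name}"
        for section, blob in string_tables(data).items()
        for name in suspicious_names(blob)
    ]
```

Scanning the string tables directly is simpler than walking each symbol entry, at the cost of also reporting any unreferenced strings those tables may contain.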
Example
Warning: Binary contains suspicious Unicode symbols
Pass: All symbols use ASCII characters only
References
- Trojan Source: Invisible Vulnerabilities (Nicholas Boucher and Ross Anderson)
- CVE-2021-42574: Bidirectional text vulnerability
- Unicode Security Considerations (Unicode Technical Report #36)