angr’s decompiler turns machine code into human-readable pseudocode using a multi-stage pipeline that includes CFG recovery, variable analysis, type inference, and control-flow structuring.
Overview
The Decompiler analysis converts functions from machine code to high-level pseudocode by:
- Converting to AIL (angr Intermediate Language)
- Recovering variables and types
- Performing optimizations
- Structuring control flow
- Generating C-like code
Basic Usage
```python
import angr

# Load the binary without pulling in shared libraries
p = angr.Project('/bin/ls', load_options={'auto_load_libs': False})

# Generate a normalized CFG first (or reuse an existing one)
cfg = p.analyses.CFGFast(normalize=True)

# Get a function to decompile
func = cfg.kb.functions['main']

# Decompile it
dec = p.analyses.Decompiler(func, cfg=cfg.model)

# Print the pseudocode (codegen is None if decompilation failed)
if dec.codegen is not None:
    print(dec.codegen.text)
```
Output Example
```c
void main(int argc, char **argv, char **envp) {
    int v0;
    char *v1;

    if (argc > 1) {
        v1 = argv[1];
        v0 = strlen(v1);
        printf("Length: %d\n", v0);
    }
    return;
}
```
Decompilation Pipeline
CFG Recovery
The decompiler requires a normalized CFG. If none is provided, one is generated:

```python
dec = p.analyses.Decompiler(func)  # CFG is generated automatically
```
Clinic: AIL Conversion
Binary code is lifted to AIL (angr Intermediate Language), a higher-level IR:
- Converts VEX IR to AIL
- Simplifies expressions
- Removes redundant operations
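The sketch below shows roughly what "simplifies expressions" and "removes redundant operations" mean in practice. It folds constants and applies a few algebraic identities over a tiny tuple-based IR. The IR shape and the `simplify` function are hypothetical and much simpler than AIL. For example, it ignores operand widths and overflow. It only illustrates the kind of rewrite this stage performs.

```python
# Toy IR (hypothetical, not AIL): ("const", int), ("reg", name), or (op, lhs, rhs)
# where op is one of "add", "sub", "xor". Operand widths are ignored.


def simplify(expr: tuple) -> tuple:
    """Bottom-up constant folding plus a few algebraic identities."""
    if expr[0] in ("const", "reg"):
        return expr
    op, lhs, rhs = expr[0], simplify(expr[1]), simplify(expr[2])

    # Constant folding: both operands are known constants
    if lhs[0] == "const" and rhs[0] == "const":
        a, b = lhs[1], rhs[1]
        if op == "add":
            return ("const", a + b)
        if op == "sub":
            return ("const", a - b)
        if op == "xor":
            return ("const", a ^ b)

    # Identities: x + 0 -> x, x - 0 -> x, 0 + x -> x
    if op in ("add", "sub") and rhs == ("const", 0):
        return lhs
    if op == "add" and lhs == ("const", 0):
        return rhs

    # x ^ x -> 0 (the common register-zeroing idiom) and x - x -> 0
    if op in ("xor", "sub") and lhs == rhs:
        return ("const", 0)

    return (op, lhs, rhs)


# rax + (4 - 4) collapses to just rax
print(simplify(("add", ("reg", "rax"), ("sub", ("const", 4), ("const", 4)))))
# prints ('reg', 'rax')
```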
Variable Recovery
Identifies stack variables, register variables, and global variables:

```python
# Access the variables recovered for this function
var_manager = dec.clinic.variable_kb.variables[func.addr]
for var in var_manager.get_variables():
    print(f"Variable: {var}")
```
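The simplified sketch below shows how stack variables can be carved out of raw memory accesses. It assumes a hypothetical input: a list of frame-relative accesses such as `[rbp-0x8]` together with their widths. Each distinct offset becomes one variable, sized by the widest access seen at that offset. angr's variable recovery is considerably more involved, since it also tracks registers, globals, and overlapping accesses. Treat this as an illustration of the idea, not angr's algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackAccess:
    """One memory access relative to the frame base, e.g. [rbp-0x8] (hypothetical input)."""
    offset: int  # signed offset from the frame base
    size: int    # access width in bytes


@dataclass
class StackVar:
    name: str
    offset: int
    size: int


def recover_stack_vars(accesses: list[StackAccess]) -> list[StackVar]:
    """Group accesses by offset; each distinct offset becomes one variable
    whose size is the widest access observed at that offset."""
    widest: dict[int, int] = {}
    for acc in accesses:
        widest[acc.offset] = max(widest.get(acc.offset, 0), acc.size)
    # Name variables in frame order (lowest offset first)
    return [StackVar(f"v{i}", off, widest[off]) for i, off in enumerate(sorted(widest))]


accesses = [StackAccess(-8, 4), StackAccess(-16, 8), StackAccess(-8, 4)]
for var in recover_stack_vars(accesses):
    print(var)
```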
Optimization Passes
Multiple optimization stages:
- Constant propagation
- Dead code elimination
- Expression simplification
- Call expression folding
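The first two passes in this list can be sketched on straight-line code. The program format below is hypothetical: a list of `(dst, expr)` assignments. `propagate_constants` moves forward through the statements, substituting and folding known constants. `eliminate_dead` then walks backward and drops any assignment whose result is never read. The sketch assumes the statements have no side effects (no calls or memory stores), which a real pass has to check.

```python
# Hypothetical program format: list of (dst, expr), where expr is an int
# constant, a variable name (str), or a tuple (op, lhs, rhs).
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def substitute(expr, consts):
    """Replace known-constant variables in expr and fold constant operations."""
    if isinstance(expr, str):
        return consts.get(expr, expr)
    if isinstance(expr, tuple):
        op, lhs, rhs = expr
        lhs, rhs = substitute(lhs, consts), substitute(rhs, consts)
        if isinstance(lhs, int) and isinstance(rhs, int):
            return OPS[op](lhs, rhs)
        return (op, lhs, rhs)
    return expr  # already an int constant


def propagate_constants(stmts):
    """Forward pass: track which variables currently hold a known constant."""
    consts, out = {}, []
    for dst, expr in stmts:
        expr = substitute(expr, consts)
        if isinstance(expr, int):
            consts[dst] = expr
        else:
            consts.pop(dst, None)  # dst no longer holds a known constant
        out.append((dst, expr))
    return out


def uses(expr):
    """Set of variable names read by expr."""
    if isinstance(expr, str):
        return {expr}
    if isinstance(expr, tuple):
        return uses(expr[1]) | uses(expr[2])
    return set()


def eliminate_dead(stmts, live_out):
    """Backward pass: keep an assignment only if its target is read later
    or is live at block exit. Assumes statements have no side effects."""
    live, kept = set(live_out), []
    for dst, expr in reversed(stmts):
        if dst in live:
            live.discard(dst)
            live |= uses(expr)
            kept.append((dst, expr))
    kept.reverse()
    return kept


prog = [("a", 2), ("b", ("*", "a", 3)), ("c", ("+", "b", "x")), ("t", ("+", "x", 1)), ("r", "c")]
print(eliminate_dead(propagate_constants(prog), live_out={"r"}))
# prints [('c', ('+', 6, 'x')), ('r', 'c')]
```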
Type Inference
Infers variable types through constraint solving:
- Analyzes operations on variables
- Propagates type constraints
- Resolves to concrete types
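The sketch below shows constraint-based type inference in a minimal form, assuming equality constraints only. A copy such as `v2 = v1` merges two variables into one type class (union-find). A use such as passing a value to `strlen` binds the class to a concrete type. The `TypeSolver` class is hypothetical. Practical solvers usually also reason about subtyping, so treat this as a simplification of the idea.

```python
class TypeSolver:
    """Equality-only type unification over a union-find of type variables."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}
        self.concrete: dict[str, str] = {}  # class root -> concrete type

    def find(self, v: str) -> str:
        self.parent.setdefault(v, v)
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def same_type(self, a: str, b: str) -> None:
        """Constraint from a copy such as `a = b`: both share one type."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        ta, tb = self.concrete.get(ra), self.concrete.get(rb)
        if ta is not None and tb is not None and ta != tb:
            raise TypeError(f"type conflict: {ta} vs {tb}")
        self.parent[rb] = ra
        if tb is not None:
            self.concrete[ra] = tb
            del self.concrete[rb]

    def has_type(self, v: str, ty: str) -> None:
        """Constraint from a use such as passing `v` to strlen(char *)."""
        root = self.find(v)
        current = self.concrete.get(root)
        if current is not None and current != ty:
            raise TypeError(f"type conflict: {current} vs {ty}")
        self.concrete[root] = ty

    def resolve(self, v: str) -> str:
        return self.concrete.get(self.find(v), "unknown")


s = TypeSolver()
s.has_type("v1", "char *")  # strlen(v1) expects a char *
s.same_type("v2", "v1")     # v2 = v1
s.has_type("v0", "int")     # printf("%d", v0)
print(s.resolve("v2"), s.resolve("v0"), s.resolve("v3"))
```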
Structuring
Recovers high-level control structures:
- Identifies loops (while, do-while, for)
- Recovers if-then-else
- Handles switch statements
- Removes goto statements where possible
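At its simplest, recovering if-then-else means pattern-matching on CFG shapes. The helper below is hypothetical and is not angr's structurer. From a successor map, it recognizes a diamond (two arms meeting at a join block) and a triangle (an `if` with no `else`). A real structurer does more: it checks that each arm has no other predecessors, handles loops and switches, and falls back to `goto` when no pattern matches.

```python
def match_conditional(succs: dict[str, list[str]], head: str):
    """Match an if-then-else diamond or an if-then triangle rooted at `head`.

    `succs` maps each block to its successors; a conditional block lists
    [true_target, false_target]. Returns a tuple describing the region, or None.
    """
    targets = succs.get(head, [])
    if len(targets) != 2:
        return None
    then_blk, else_blk = targets
    then_out = succs.get(then_blk, [])
    else_out = succs.get(else_blk, [])

    # Diamond: both arms flow into the same single join block
    if len(then_out) == 1 and then_out == else_out:
        return ("if-else", head, then_blk, else_blk, then_out[0])

    # Triangle: the true arm falls through to the false target (no else branch)
    if then_out == [else_blk]:
        return ("if-then", head, then_blk, else_blk)

    return None


diamond = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(match_conditional(diamond, "A"))
```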
Code Generation
Generates C-like pseudocode:

```python
print(dec.codegen.text)  # C pseudocode
```
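In essence, code generation walks the structured tree and prints it with indentation. The sketch below shows the idea using a hypothetical tuple-based tree with `"seq"`, `"if"`, and `"stmt"` nodes, rather than angr's `CFunction` classes. It reproduces the `if` block from the output example above.

```python
def emit(node: tuple, indent: int = 0) -> list[str]:
    """Render a hypothetical structured tree as indented C-like lines."""
    pad = "    " * indent
    kind = node[0]
    if kind == "stmt":
        return [f"{pad}{node[1]};"]
    if kind == "seq":
        lines: list[str] = []
        for child in node[1]:
            lines.extend(emit(child, indent))
        return lines
    if kind == "if":
        _, cond, then, orelse = node
        lines = [f"{pad}if ({cond}) {{"] + emit(then, indent + 1)
        if orelse is not None:
            lines += [f"{pad}}} else {{"] + emit(orelse, indent + 1)
        return lines + [f"{pad}}}"]
    raise ValueError(f"unknown node kind: {kind}")


tree = ("if", "argc > 1", ("seq", [("stmt", "v1 = argv[1]"), ("stmt", "v0 = strlen(v1)")]), None)
print("\n".join(emit(tree)))
```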
Configuration Options
Basic Options
```python
dec = p.analyses.Decompiler(
    func,
    # Provide an existing CFG
    cfg=cfg.model,
    # Optimization preset: 'default', 'basic', 'minimal'
    preset='default',
    # Generate code (set to False to only run the analyses)
    generate_code=True,
    # Flavor: 'pseudocode' or 'source'
    flavor='pseudocode',
)
```
Advanced Options
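```python
from angr.analyses.decompiler import DecompilationOption

# my_passes: a list of OptimizationPass subclasses (see Custom Optimization Passes)
# kb: a knowledge base whose variable information you want to reuse
dec = p.analyses.Decompiler(
    func,
    # Custom optimization passes
    optimization_passes=my_passes,
    # Variable knowledge base (for reuse)
    variable_kb=kb,
    # Control-flow structuring algorithm
    options=[
        (DecompilationOption.STRUCTURER, 'phoenix'),
    ],
    # Use/update the decompilation cache
    use_cache=True,
    update_cache=True,
)
```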
Decompilation Presets
Default Preset: Best balance of speed and quality

```python
dec = p.analyses.Decompiler(func, preset='default')
```
Includes:
- All standard optimizations
- Type inference
- Advanced structuring
Basic Preset: Faster but less optimized

```python
dec = p.analyses.Decompiler(func, preset='basic')
```
Includes:
- Essential optimizations only
- Basic type inference
- Simple structuring
Minimal Preset: Fastest, minimal optimization

```python
dec = p.analyses.Decompiler(func, preset='minimal')
```
Includes:
- Constant folding only
- No type inference
- Minimal structuring
Accessing Decompilation Results
Generated Code
```python
if dec.codegen is not None:
    # Get the pseudocode text
    pseudocode = dec.codegen.text
    print(pseudocode)

    # Get the structured representation (a CFunction object)
    cfunc = dec.codegen.cfunc
    print(cfunc)

    # Map text positions to the code nodes that produced them
    for pos, elem in dec.codegen.map_pos_to_node.items():
        print(pos, elem)
```
AIL Graph
```python
import networkx as nx

# Access the optimized AIL graph (a networkx graph of AIL blocks)
for block in dec.clinic.graph.nodes():
    print(f"Block at {block.addr:#x}:")
    for stmt in block.statements:
        print(f"  {stmt}")

# Export to GEXF for viewing in a graph tool such as Gephi
nx.write_gexf(dec.clinic.graph, "ail_graph.gexf")
```

Recovered Variables

```python
from angr.sim_variable import SimStackVariable

# Access the variable manager for this function
var_manager = dec.clinic.variable_kb.variables[func.addr]

# Iterate over the recovered variables
for var in var_manager.get_variables():
    print(f"Variable: {var}")
    print(f"  Type: {var_manager.get_variable_type(var) or 'unknown'}")
    print(f"  Size: {var.size}")
    # Stack variables also carry their offset in the frame
    if isinstance(var, SimStackVariable):
        print(f"  Stack offset: {var.offset}")
```
Recovered Structures
```python
from angr.analyses.decompiler.structuring.structurer_nodes import SequenceNode


def print_structure(node, indent=0):
    """Recursively print the node types of the structured region tree."""
    prefix = "  " * indent
    if isinstance(node, SequenceNode):
        print(f"{prefix}Sequence:")
        for n in node.nodes:
            print_structure(n, indent + 1)
    else:
        print(f"{prefix}{type(node).__name__}")


# Access the structured sequence
if dec.seq_node is not None:
    print_structure(dec.seq_node)
```
Custom Optimization Passes
Create custom optimizations:
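```python
from angr.analyses.decompiler.optimization_passes.optimization_pass import (
    OptimizationPass,
    OptimizationPassStage,
)


class MyOptimization(OptimizationPass):
    ARCHES = ["X86", "AMD64"]
    PLATFORMS = ["linux", "windows"]
    # Where in the pipeline this pass runs
    STAGE = OptimizationPassStage.AFTER_GLOBAL_SIMPLIFICATION
    NAME = "My optimization"
    DESCRIPTION = "Example custom optimization pass"

    def __init__(self, func, **kwargs):
        super().__init__(func, **kwargs)
        # Run the pass on construction: analyze() calls _check() then _analyze()
        self.analyze()

    def _check(self):
        # Return (applies, cache); cache is forwarded to _analyze()
        return True, None

    def _analyze(self, cache=None):
        # Perform the optimization and store the rewritten graph in self.out_graph
        pass


# Use custom passes
my_passes = [
    MyOptimization,
    # ... other passes
]
dec = p.analyses.Decompiler(func, optimization_passes=my_passes)
```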
Error Handling
```python
try:
    dec = p.analyses.Decompiler(func)
    if dec.codegen is None:
        print("Decompilation failed")
    # Errors collected during the analysis, if any
    if dec.errors:
        for error in dec.errors:
            print(f"Error: {error}")
except Exception as e:
    print(f"Decompilation error: {e}")
```
Example: Batch Decompilation
```python
import os

import angr


def decompile_all_functions(binary_path, output_dir):
    """Decompile all functions in a binary, writing each to its own .c file."""
    # Load and analyze
    p = angr.Project(binary_path, load_options={'auto_load_libs': False})
    cfg = p.analyses.CFGFast(normalize=True)

    os.makedirs(output_dir, exist_ok=True)

    # Decompile each function, skipping SimProcedures and PLT stubs
    for func_addr, func in cfg.kb.functions.items():
        if func.is_simprocedure or func.is_plt:
            continue

        print(f"Decompiling {func.name} at {func_addr:#x}")
        try:
            dec = p.analyses.Decompiler(func, cfg=cfg.model)
            if dec.codegen is not None:
                # Save to file
                filename = f"{func.name}_{func_addr:x}.c"
                filepath = os.path.join(output_dir, filename)
                with open(filepath, 'w') as f:
                    f.write(f"// Function: {func.name}\n")
                    f.write(f"// Address: {func_addr:#x}\n\n")
                    f.write(dec.codegen.text)
                print(f"  -> Saved to {filename}")
            else:
                print("  -> Failed to decompile")
        except Exception as e:
            print(f"  -> Error: {e}")


# Usage
decompile_all_functions('/bin/ls', './decompiled')
```
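The batch example builds filenames directly from `func.name`. Depending on the binary, symbol names are not guaranteed to be filename-safe. Demangled C++ names contain `:`, `<`, and `>`, and Go symbols can contain `*` and parentheses. The small helper below is hypothetical and not part of angr. It sanitizes the name first, and you would call it in place of the f-string that builds `filename`. Appending the address keeps names unique even when truncation or replacement makes two stems collide.

```python
import re


def safe_filename(func_name: str, func_addr: int, max_len: int = 100) -> str:
    """Build a filesystem-safe .c filename from a function name and address."""
    # Replace anything outside [A-Za-z0-9_.-] with '_' (covers '/', ':', '<', '>', '*')
    stem = re.sub(r"[^A-Za-z0-9_.-]", "_", func_name)[:max_len] or "func"
    return f"{stem}_{func_addr:x}.c"


print(safe_filename("std::vector<int>::push_back", 0x401000))
# prints std__vector_int___push_back_401000.c
```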
Example: Comparing Different Presets
```python
import time

import angr

p = angr.Project('/bin/ls', load_options={'auto_load_libs': False})
cfg = p.analyses.CFGFast(normalize=True)
func = cfg.kb.functions['main']

presets = ['minimal', 'basic', 'default']

for preset in presets:
    print(f"\n=== Preset: {preset} ===")
    start = time.perf_counter()
    dec = p.analyses.Decompiler(func, cfg=cfg.model, preset=preset)
    elapsed = time.perf_counter() - start
    print(f"Time: {elapsed:.2f}s")

    if dec.codegen:
        lines = dec.codegen.text.split('\n')
        print(f"Lines of code: {len(lines)}")
        print("Preview:", *lines[:2], sep="\n")
```
Advanced Features
Inline Functions
```python
# Inline specific functions during decompilation
# (helper_func_addr is a placeholder for the callee you want inlined)
dec = p.analyses.Decompiler(
    func,
    inline_functions=[helper_func_addr],
)
```
Type Hints
```python
from angr.sim_type import SimTypeChar, SimTypeFunction, SimTypeInt, SimTypePointer

# Set the function prototype: int f(char *, int)
func.prototype = SimTypeFunction(
    [SimTypePointer(SimTypeChar()), SimTypeInt()],
    SimTypeInt(),
).with_arch(p.arch)

# Decompile with type information
dec = p.analyses.Decompiler(func)
```
Decompilation Cache
```python
# First decompilation populates the cache
dec1 = p.analyses.Decompiler(func, use_cache=True, update_cache=True)

# Second decompilation uses the cache
dec2 = p.analyses.Decompiler(func, use_cache=True)
print("Used cache:", dec2.cache is not None)

# Access cached results, keyed by (function address, flavor)
if (func.addr, 'pseudocode') in p.kb.decompilations:
    cached = p.kb.decompilations[(func.addr, 'pseudocode')]
    print(cached.codegen.text)
```
Common Issues
Performance: Decompilation can be slow for large functions. Consider:
- Using preset='basic' for faster results
- Decompiling specific functions instead of whole binaries
- Enabling the cache for repeated decompilations
Quality: Decompilation quality depends on:
- Binary complexity (obfuscation, optimizations)
- Architecture support
- Available type information
- CFG accuracy
Next Steps
- Variable Recovery: deep dive into variable analysis
- CFG Analysis: understanding the CFG input