disassembler linear sweep Shingled recursive traversal anti-disassembly

relative disassembler performance

capstone - disassembler. converse of key zydis xed distorm iced bddisasm yaxpeax

McSema - older trail of bits lifter. Uses llvm as IR remill lift to llvm bitcode anvill processing remill rellic makes C like code

BAP ANGR - VEX which is valgrind’s ir

gtirb ddisasm grammatech “GTIRB explicitly does NOT represent instructions or instruction semantics but does provide symbolic operand information and access to the bytes. “

Speculative disassembly: decode every offset. Refine blocks. Spedi, open source speculative disassembler Nucleus paper Compiler-Agnostic Function Detection in Binaries

superset disassembler kenneth cifuentes thesis

probabilistic disassembly using probabilistic datalog? bap mc + datalog?

Formally Verified Lifting of C-Compiled x86-64 Binaries Scalable validation of binary lifters

  • Linear
  • Recursive
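A toy sketch of the difference between the two strategies. The instruction set here is made up for illustration (1-byte `nop`/`ret`, 2-byte `jmp`/`movi` with an absolute jump target), not any real architecture. A junk byte placed after the jump desynchronizes the linear sweep, while recursive traversal follows the control flow and skips it.

```python
# Hypothetical toy ISA: opcode -> (mnemonic, size in bytes)
TOY_ISA = {0x01: ("nop", 1), 0x02: ("jmp", 2), 0x03: ("ret", 1), 0x04: ("movi", 2)}

# 0: jmp 3 | 2: junk byte that looks like the start of a movi | 3: nop | 4: ret
CODE = bytes([0x02, 0x03, 0x04, 0x01, 0x03])

def decode(code, pc):
    """Decode one instruction at pc, returning (mnemonic, size, operand)."""
    name, size = TOY_ISA[code[pc]]
    operand = code[pc + 1] if size == 2 else None
    return name, size, operand

def linear_sweep(code):
    """Decode instructions back to back from offset 0, ignoring control flow."""
    pc, out = 0, []
    while pc < len(code):
        name, size, operand = decode(code, pc)
        out.append((pc, name, operand))
        pc += size
    return out

def recursive_traversal(code, entry=0):
    """Decode only what is reachable by following jumps from the entry point."""
    seen, worklist, out = set(), [entry], []
    while worklist:
        pc = worklist.pop()
        while pc < len(code) and pc not in seen:
            seen.add(pc)
            name, size, operand = decode(code, pc)
            out.append((pc, name, operand))
            if name == "jmp":          # unconditional: continue at the target only
                worklist.append(operand)
                break
            if name == "ret":          # end of this path
                break
            pc += size
    return sorted(out)

print(linear_sweep(CODE))         # swallows the real nop into a bogus movi
print(recursive_traversal(CODE))  # finds the nop at offset 3
```

Recursive traversal pays for this precision with indirect jumps, whose targets it cannot see without extra analysis.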



Delay slots are an annoyance. Some architectures allow instructions to sit in the shadow of a jump instruction, placed after it but logically executing before the jump takes effect. This makes sense from a microarchitectural perspective, but it is bizarre to disassemble.

paper osr - offset shift range. I've seen this called value set analysis? thesis


Analyzing Memory Accesses in x86 Executables Reps and Balakrishnan

byteweight - bap neural network thing identifies function starts

Ghidra repackaging: lifting bits sleigh pypcode

An Empirical Study on ARM Disassembly Tools

ben’s ll2l

Decompiler Ahoy SAILR! There is No Need to DREAM of C: A Compiler-Aware Structuring Algorithm for Binary Decompilation Graph schema matching? Smart methodology. Take codebase, decompile, compare number of gotos in original vs decompiled functions. Find hotspots. Binary search for passes responsible

DREAM Phoenix

FoxDec - Formally verified x86-64 decompilation

rellic - no more gotos

a comb for decompiled C

Cifuentes thesis kind of a ludicrous amount of info “Decompiler compiler” references BB91 BB92 bbl91 bbl93 bow91 bow93 A Compendium of Formal Techniques for Software Maintenance the redo project final report

Decompilation: The Enumeration of Types and Grammars From programs to object code and back again using logic programming: Compilation and decompilation The art of computer un-programming: Reverse engineering in Prolog Generating Decompilers

Polymorphic Type Inference for Machine Code

Static Single Assignment for Decompilation - Emmerik

Decompilers and beyond Ilfak Guilfanov, Hex-Rays SA, 2008

Type-Based Decompilation? (or Program Reconstruction via Type Reconstruction)- Alan Mycroft

Loop recovery

Getting structured control flow from CFG Stackify

havlak and tarjan testing reducibility with union find

A New Algorithm for Identifying Loops in Decompilation

dfs. label nodes. Intervals of labels are subsets of nodes. timestamp of first visit, timestamp of last visit. back edge, forward edge, cross edge
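A minimal sketch of the timestamp idea, assuming the graph is given as a plain dict of adjacency lists (the function names and the example graph are made up for illustration). Each node gets a discovery and a finish time, and the nesting of those intervals decides both the edge kind and the ancestor relation.

```python
def classify_edges(graph, root):
    """DFS from root; return (disc, fin, kinds) where kinds maps edge -> kind."""
    disc, fin, kinds = {}, {}, {}
    clock = 0

    def dfs(u):
        nonlocal clock
        disc[u] = clock
        clock += 1
        for v in graph.get(u, []):
            if v not in disc:
                kinds[(u, v)] = "tree"      # first time we see v
                dfs(v)
            elif v not in fin:
                kinds[(u, v)] = "back"      # v is still open: an ancestor, so a loop
            elif disc[u] < disc[v]:
                kinds[(u, v)] = "forward"   # v is an already finished descendant
            else:
                kinds[(u, v)] = "cross"     # v finished before u was discovered
        fin[u] = clock
        clock += 1

    dfs(root)
    return disc, fin, kinds

def is_descendant(disc, fin, u, v):
    """v is in the DFS subtree of u iff v's interval nests inside u's."""
    return disc[u] <= disc[v] and fin[v] <= fin[u]

graph = {"a": ["b", "c", "d"], "b": ["d"], "c": ["d"], "d": ["b"]}
disc, fin, kinds = classify_edges(graph, "a")
for edge, kind in kinds.items():
    print(edge, kind)
```

Back edges are the ones that matter for loop recovery: the target of a back edge is a candidate loop header.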

Also… egraphs

webassembly is at the core of both of these Allllll things come around I wonder if webassembly would be a good universal disassembler ir someone should do that


  • IDA
  • Ghidra
  • Binary Ninja
  • Cutter
  • angr management

decompiler explorer Hmm. too bad it’s not a web service

Binary Ninja

SCC - shellcode compiler. Why is this a top level thing?

Binary View Making a plugin - has remote debug to vs code

current_function bv.functions

f.callers f.callees

Types ` bv.parse_type_string("uint64_t") ` slow but convenient

Type.bool Type.char

#!/usr/bin/env python3
import binaryninja
with binaryninja.open_view("/bin/ls") as bv:
    print(f"Opening {bv.file.filename} which has {len(list(bv.functions))} functions")

IR Tower

Lifted IL




See ghidra notes


import angr #, monkeyhex
proj = angr.Project('/bin/true')
state = proj.factory.entry_state()

code = '''
int fact(int x){
  int acc = 1;
  while(x > 0){
    acc *= x;
    x--;
  }
  return acc;
}
'''
import tempfile
import subprocess
import angr #, monkeyhex
import os
with tempfile.NamedTemporaryFile(suffix=".c") as fp:
  fp.write(code.encode())  # put the C source on disk so gcc can read it
  fp.flush()
  with tempfile.TemporaryDirectory() as mydir:
    outfile = mydir + "/fact"
    print(subprocess.run(["gcc",  "-g",  "-c","-O1", "-o",  outfile, fp.name], check=True))
    print(subprocess.run(["objdump", "-d", outfile], check=True))

    proj = angr.Project(outfile)
    block = proj.factory.block(proj.entry)
    print(block.instructions) # number of instructions
    state = proj.factory.entry_state()
    #state = proj.factory.entry_state()

echo "
int fact(int x){
   int acc = 1;
   for (int i = 1; i <= x; i++){
      acc *= i;
   }
   return acc;
}
" > /tmp/fact.c
gcc /tmp/fact.c -c -o /tmp/fact 
echo "
int max(int x){
  return x > 0 ? x : -x;
}
" > /tmp/max.c
gcc /tmp/max.c -c -o /tmp/max 
import angr
p = angr.Project('/tmp/fact')
cfg = p.analyses.CFGFast()
print("This is the graph:", cfg.graph)
entry_func = cfg.kb.functions[p.entry]
blocks = list(entry_func.blocks)
irsb = blocks[0].vex
import angr
p = angr.Project('/tmp/max')
cfg = p.analyses.CFGFast()
print("This is the graph:", cfg.graph)
entry_func = cfg.kb.functions[p.entry]
blocks = list(entry_func.blocks)
irsb = blocks[0].vex
from pyvex.stmt import *
from pyvex.expr import *

jmp_table = []  # one "case ...: goto ...;" line per known address

#for addr in addrs:
#  jmp_table.append(f"case 0x{addr:x}: goto {label(addr)};")
# assert(false); // unexpected indirect jump
def proc_binop(expr):
    # TODO: perhaps I need casts here to signed unsigned?
    if expr.op == "Iop_Sub64":
      return f"({proc_expr(expr.args[0])} - {proc_expr(expr.args[1])})"
    elif expr.op == "Iop_Sub32":
      return f"({proc_expr(expr.args[0])} - {proc_expr(expr.args[1])})"
    elif expr.op == "Iop_Add64":
      return f"({proc_expr(expr.args[0])} + {proc_expr(expr.args[1])})"
    elif expr.op == "Iop_Add32":
      return f"({proc_expr(expr.args[0])} + {proc_expr(expr.args[1])})"
    elif expr.op == "Iop_Shr64":
      return f"({proc_expr(expr.args[0])} >> {proc_expr(expr.args[1])})"
    elif expr.op == "Iop_And64":
      return f"({proc_expr(expr.args[0])} & {proc_expr(expr.args[1])})"
    elif expr.op == "Iop_Xor64":
      return f"({proc_expr(expr.args[0])} ^ {proc_expr(expr.args[1])})"
    elif expr.op == "Iop_CmpLT32S":
      return f"({proc_expr(expr.args[0])} < {proc_expr(expr.args[1])})"
    elif expr.op == "Iop_Mul32":
      return f"({proc_expr(expr.args[0])} * {proc_expr(expr.args[1])})"
    else:
      assert False, f"Unimplemented binop {expr.op}"
def proc_unop(expr):
  if expr.op == "Iop_64to32":
    return f"((int32_t) {proc_expr(expr.args[0])})"
  elif expr.op == "Iop_32Uto64":
    return f"((int64_t) {proc_expr(expr.args[0])})"
  elif expr.op == "Iop_1Uto64":
    return f"((int64_t) {proc_expr(expr.args[0])})"
  elif expr.op == "Iop_64to1":
    return f"((bool) {proc_expr(expr.args[0])})"
  else:
    assert False, f"Unimplemented unop {expr.op}"

def proc_expr(expr : IRExpr):
    if isinstance(expr, Binder):
        assert False
    elif isinstance(expr, VECRET):
        assert False
    elif isinstance(expr, GSPTR):
        assert False
    elif isinstance(expr, RdTmp):
        return str(expr)
    elif isinstance(expr, Get):
        assert expr.ty == "Ity_I64"
        return f"state->gr[{expr.offset}]"
    elif isinstance(expr, Qop):
        assert False
    elif isinstance(expr, Triop):
        assert False
    elif isinstance(expr, Binop):
        return proc_binop(expr)
    elif isinstance(expr, Unop):
        return proc_unop(expr)
    elif isinstance(expr, Load):
        #assert expr.ty == "Ity_I32"  # TODO. We need to deal with these types and endianess. Yuck
        return f"state->mem[{expr.addr}]"
    elif isinstance(expr, Const):
        return str(expr)
    elif isinstance(expr, ITE):
        return f"{proc_expr(expr.cond)} ? {proc_expr(expr.iftrue)}: {proc_expr(expr.iffalse)}"
    elif isinstance(expr, CCall):
        assert False
    else:
        assert False

def label(addr):
  return f"L_0x{addr:x}" 
def proc_type(ty):
  if ty == "Ity_I32":
    return "int32_t"
  elif ty == "Ity_I64":
    return "int64_t"
  else:
    assert False

def proc_stmt(stmt : IRStmt):
    if isinstance(stmt, NoOp):
         return "// NoOp"
    elif isinstance(stmt, IMark):
        # assert PC = expected pc here?
        # f"assert(state->gr[{}] == stmt.addr);"
        #return f"break; case 0x{stmt.addr:x}: //{label(stmt.addr)}:" # also print original assembly in comment
        return f"//{label(stmt.addr)}:"
    elif isinstance(stmt, AbiHint):
        return f"return; // {stmt}" # TODO: add return parameter? Or state is good. Is abihint always at returns?
    elif isinstance(stmt, Put):
        return f"state->gr[{stmt.offset}] = {proc_expr(stmt.data)};"
    elif isinstance(stmt, PutI):
        assert False
    elif isinstance(stmt, WrTmp):
        return f"t{stmt.tmp} = {proc_expr(stmt.data)};"
    elif isinstance(stmt, Store):
        return f"state->mem[{stmt.addr}] = {proc_expr(stmt.data)};"
    elif isinstance(stmt, CAS):
        assert False
    elif isinstance(stmt, LLSC):
        assert False
    elif isinstance(stmt, MBE):
        assert False
    elif isinstance(stmt, Dirty):
        assert False
    elif isinstance(stmt, Exit):
        assert stmt.jk == "Ijk_Boring"
        return f"if({proc_expr(stmt.guard)}){{ state->gr[{stmt.offsIP}] = 0x{stmt.dst.value:x}; break; }}"
        # TODO deal with other jumpkind
        # What represents an indirect jump? goto jump_table;
        # elif stmt.jk == "Ijk_Ret" f"return;"
        # elif stmt.jk == Ijk_Call   hmmm. f"{funname}()" maybe?
        # elif stmt.jk == "Ijk_Exit"
    elif isinstance(stmt, LoadG):
        assert False
    else:
        print("unrecognized IRStmt")
        assert False

output = []
tmps = set()
for block in blocks:
  output.append(f"case 0x{block.addr:x}:")
  for stmt in block.vex.statements:
    if isinstance(stmt,WrTmp):
      tmps.add(stmt.tmp)  # remember temporaries so they can be declared up front
    output.append(proc_stmt(stmt))
  # output.append("goto jump_table;")
output.append("default: assert(0); // Unexpected PC value. Something has gone awry.")


header = """
#include <stdint.h>
#include <assert.h>
#include <stdbool.h>
#define PUT(reg) reg
#define PC 184

typedef struct state_t {
int64_t *mem;
int64_t *gr;
} state_t;
"""

with open("/tmp/decomp.c", "w") as file:
    file.write(header)
    file.write(f"void {entry_func.name}_decomp(state_t *state){{\n") # named after the lifted function
    file.write("int " + ",".join([f"t{tmp}" for tmp in tmps]) + ";\n") # declare temps
    file.write(f"state->gr[PC] = 0x{entry_func.addr:x};\n") # initialize PC to entry point
    file.write("while(1){\n") # interpreter loop invariant on PC? Could make gas?
    file.write("switch(state->gr[PC]){\n") # dispatch on the current PC
    file.writelines([x + "\n" for x in output])
    file.write("}\n}\n}\n") # close the switch, the while loop, and the function
clang-format -i  --style=google /tmp/decomp.c
cat /tmp/decomp.c
gcc /tmp/decomp.c -O2 -c -o /tmp/decomp.o -Wall -Wextra -Wcast-align -Wcast-qual -Wmissing-declarations
esbmc /tmp/decomp.c --function max_decomp #--goto-functions-only

Control flow encoding. Do I use a separate jump table and go to the jump table every time? while(true){ switch(pc){ case: case: case: case default: } } This is fairly conservative. Big block encoding trusts fall-through behavior. Byte address and then cast mem pointers.

import angr
p = angr.Project('/tmp/fact')
cfg = p.analyses.CFGFast()
print("This is the graph:", cfg.graph)
entry_func = cfg.kb.functions[p.entry]
blocks = list(entry_func.blocks)
irsb = blocks[0].vex
import pyvex

stmts = []
counter = 0
def fresh():
  global counter
  counter += 1
  return f"%v{counter}"

def print_expr(dst, expr):
  # Flatten a VEX expression into instructions that write their result to dst.
  if isinstance(expr, pyvex.expr.Const):
    stmts.append( { "op": "const", "type": "int", "dest": dst, "value": expr.con })
  elif isinstance(expr, pyvex.expr.Binop):
    args = [fresh() for _ in range(2)]
    for a, e in zip(args, expr.child_expressions):
      print_expr(a, e)
    stmts.append({ "op": expr.op, "type": "int", "dest": dst, "args": args })
  elif isinstance(expr, pyvex.expr.RdTmp):
    stmts.append({ "op": "id", "type": "int", "dest": dst, "args": [f"%t{expr.tmp}"] })
  elif isinstance(expr, pyvex.expr.Get):
    stmts.append({ "op": "id", "type": "int", "dest": dst, "args": [f"%{expr.offset}"] })
  else:
    assert False

for stmt in irsb.statements:
    if isinstance(stmt, pyvex.IRStmt.Store):
      assert stmt.end == "Iend_LE"
      val = fresh()
      print_expr(val, stmt.data)
      stmts.append({ "op": "store", "args": [str(stmt.addr), val] })
    elif isinstance(stmt, pyvex.IRStmt.Put):
      print_expr(f"%{stmt.offset}", stmt.data)
    elif isinstance(stmt, pyvex.IRStmt.WrTmp):
      print_expr(f"%t{stmt.tmp}", stmt.data)
    elif isinstance(stmt, pyvex.IRStmt.IMark):
      pass
    else:
      assert False, "unrecognized stmt"


See notes on patching

Binary reversing hilbert curves for binary visualization benford’s law binwalk entropy visualization Decomperson: How Humans Decompile and What We Can Learn From It


Debuggers windbg

Code Search Joern codeql



Icicle: A Re-designed Emulator for Grey-Box Firmware Fuzzing semantics powered by sleigh.


Qemu plugins Qemu has user land and system

echo '
#include <stdio.h>
int main(){
  printf("hello world\n");
  return 0;
}
' > /tmp/hello.c
gcc /tmp/hello.c -o /tmp/hello
qemu-x86_64  -strace -singlestep -d in_asm,cpu /tmp/hello 


  • strace
  • trace
  • plugin
  • d logs
  • singlestep (now one-insn-per-tb)
qemu-x86_64 --help
usage: qemu-x86_64 [options] program [arguments...]
Linux CPU emulator (compiled for x86_64 emulation)

Options and associated environment variables:

Argument             Env-variable      Description
-h                                     print this help
-g port              QEMU_GDB          wait gdb connection to 'port'
-L path              QEMU_LD_PREFIX    set the elf interpreter prefix to 'path'
-s size              QEMU_STACK_SIZE   set the stack size to 'size' bytes
-cpu model           QEMU_CPU          select CPU (-cpu help for list)
-E var=value         QEMU_SET_ENV      sets targets environment variable (see below)
-U var               QEMU_UNSET_ENV    unsets targets environment variable (see below)
-0 argv0             QEMU_ARGV0        forces target process argv[0] to be 'argv0'
-r uname             QEMU_UNAME        set qemu uname release string to 'uname'
-B address           QEMU_GUEST_BASE   set guest_base address to 'address'
-R size              QEMU_RESERVED_VA  reserve 'size' bytes for guest virtual address space
-d item[,...]        QEMU_LOG          enable logging of specified items (use '-d help' for a list of items)
-dfilter range[,...] QEMU_DFILTER      filter logging based on address range
-D logfile           QEMU_LOG_FILENAME write logs to 'logfile' (default stderr)
-p pagesize          QEMU_PAGESIZE     set the host page size to 'pagesize'
-singlestep          QEMU_SINGLESTEP   run in singlestep mode
-strace              QEMU_STRACE       log system calls
-seed                QEMU_RAND_SEED    Seed for pseudo-random number generator
-trace               QEMU_TRACE        [[enable=]<pattern>][,events=<file>][,file=<file>]
-plugin              QEMU_PLUGIN       [file=]<file>[,<argname>=<argvalue>]
-version             QEMU_VERSION      display version information and exit

QEMU_LD_PREFIX  = /etc/qemu-binfmt/x86_64
QEMU_STACK_SIZE = 8388608 byte

You can use -E and -U options or the QEMU_SET_ENV and
QEMU_UNSET_ENV environment variables to set and unset
environment variables for the target process.
It is possible to provide several variables by separating them
by commas in getsubopt(3) style. Additionally it is possible to
provide the -E and -U options multiple times.
The following lines are equivalent:
    -E var1=val2 -E var2=val2 -U LD_PRELOAD -U LD_DEBUG
    -E var1=val2,var2=val2 -U LD_PRELOAD,LD_DEBUG
Note that if you provide several changes to a single variable
the last change will stay in effect.

See <> for how to report bugs.
More information on the QEMU project at <>.
qemu-x86_64 -d help
Log items (comma separated):
out_asm         show generated host assembly code for each compiled TB
in_asm          show target assembly code for each compiled TB
op              show micro ops for each compiled TB
op_opt          show micro ops after optimization
op_ind          show micro ops before indirect lowering
int             show interrupts/exceptions in short format
exec            show trace before each executed TB (lots of logs)
cpu             show CPU registers before entering a TB (lots of logs)
fpu             include FPU registers in the 'cpu' logging
mmu             log MMU-related activities
pcall           x86 only: show protected mode far calls/returns/exceptions
cpu_reset       show CPU state before CPU resets
unimp           log unimplemented functionality
guest_errors    log when the guest OS does something invalid (eg accessing a
non-existent register)
page            dump pages at beginning of user mode emulation
nochain         do not chain compiled TBs so that "exec" and "cpu" show
complete traces
plugin          output from TCG plugins

strace          log every user-mode syscall, its input, and its result
trace:PATTERN   enable trace events

Use "-d trace:help" to get a list of trace events.


Mayhem Fuzzy-sat running smt queries through a fuzzer Angora SLF eclipser

fuzzing challenges and reflection

fuzzing 22

google fuzzbench


rode0day rolling fuzzing competition



  • klee
  • sage

Qsym hybrid fuzzing. concolic execution.

syzkaller kernel fuzzer go-fuzz fuzzili winafl

Fuzzers compile in extra information to give coverage guidance

Fuzzers use a corpus of input

Using fuzzer to solve csp. Write checker. Fuzz it. It’s randomized search

Fuzzgym makes a lot of sense to put neural heuristics in there afl++ qemu libfuzzer vs afl vs honggfuzz corpus grammar based fuzzing, differential fuzzing ghidra for fuzzing


AFL. AFL++ fork of afl


compile using afl-clang-fast++ or use qemu mode. afl fuzzing training

echo "
#include <stdio.h>
int main(){
  int x = getchar();
  if(x > 0){
  }
  return 42;
}" > /tmp/bug.c
afl-gcc /tmp/bug.c -o /tmp/bug
afl-fuzz -i /tmp/corpus -o /tmp/out /tmp/bug

AFL qemu deferred forkserver

Symbolic Execution

MAAT Trail of Bits using ghidra symqemu

unicorn - ripped out the heart of qemu and made it programmatically accessible. Based on an old version of qemu though


primus - bap’s emulator framework

panda - built on qemu. record and replay executions


CWE - common weakness enumeration

integer overflow

null pointer dereference

sophos - comprehensive exploit prevention 2018

Current State of Exploit Dev 2020


microsoft exploits and exploit kits


Control Flow integrity is a broad term for many of these CONFIRM: Evaluating Compatibility and Relevance of Control-flow Integrity Protections for Modern Software

DEP - data execution prevention executable space protection This says DEP is Windows terminology? NX bit

shadow stack

control flow guard - windows reverse flow guard extreme flow guard kernel data protection

stack canary -fstack-protector. Guard variable put on stack SSP stack smashing protection. Stackguard, Propolice. Buffer overflow protection

ASLR ASLP Address Space Layout Randomization. Libraries are linked in at a different location on each run. This makes code reuse in an exploit more difficult.

Fat pointers

endbr intel control flow enforcement technology (CET). Valid locations for indirect jumps.

ASLR - Addresses are randomized. cat /proc/self/maps to look at what actually loaded. Also ldd shows where libraries get loaded in memory. Stack canaries - set once per binary run, so with forking you can brute force them or maybe leak them?

checksec tells you about which things are enabled. which also has a rundown of the different things and how you could check them manually. Can output into xml, json, csv

gcc options -no-pie -pie -fpie -fno-stack-protector -fstack-protector-all -z execstack makes stack executable

RELRO - relocation read only. GOT table becomes read only. Prevents relocation attacks

binary diversification - compile differently every time. code reuse becomes way harder. diversification: make many versions of a binary to make code reuse attacks harder. disunison


Buffer Overflows

buffer overflow When a buffer overflow occurs you are writing to memory that possibly had a different purpose. Maybe other stack variables, maybe return address pointers, maybe over heap metadata.

Sanitization of user input Off by one errors String termination


AWP Arbitrary write primitive CWE-123: Write-what-where Condition ARP arbitrary read primitive

Return Oriented Programming (ROP)

rp ropper [pwntool]

ropgadget ropium supports semantics queries. hmm in cpp. V impressive.

return to libc libc is very common and you can weave together libc calls. “Solar Designer” solar designer 1997

ropc-llvm ropc

smashing the stack for fun and profit - stacks are no longer executable geometry of innocent flesh on the bone. ROP

rop emporium

rop ftw

pop_gadget ; value ; nextgadget loads from stack into register
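A sketch of what that stack layout looks like as bytes, assuming x86-64 and the classic `pop rdi; ret` pattern. Every address and the offset below are made-up placeholders, not taken from any real binary. In practice they would come from a gadget finder and from measuring the distance to the saved return address.

```python
import struct

def p64(value):
    """Pack an integer as a little-endian 64-bit word, as it sits on the stack."""
    return struct.pack("<Q", value)

# All of these are hypothetical values for illustration only.
OFFSET_TO_RET = 264        # bytes from buffer start to the saved return address
POP_RDI_RET   = 0x401233   # address of a "pop rdi; ret" gadget
BIN_SH        = 0x404050   # address of a "/bin/sh" string
SYSTEM        = 0x401050   # address of system()

payload  = b"A" * OFFSET_TO_RET   # filler up to the return address
payload += p64(POP_RDI_RET)       # ret jumps here; the gadget pops the next word
payload += p64(BIN_SH)            # value popped into rdi (first argument)
payload += p64(SYSTEM)            # the gadget's ret lands here

print(payload[OFFSET_TO_RET:].hex())
```

Real chains often need an extra `ret` gadget for stack alignment before calling into libc. That detail is left out here.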

pure buffer overflow from command line:

#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[])
{
    char buf[256];
    memcpy(buf, argv[1],strlen(argv[1]));
}



rop prevention by binary rewriting ropguard

stack pivoting moving over to a different stack.

ret2libc ret2dlresolve ret2csu ret2plt


jop rocket. blackhat talk

Data Oriented Programming (DOP)

Block Oriented Programming

Heap house of rust

If you can overwrite a struct that contains a pointer, you can use this to obtain reads or writes when that pointer is read or written. If the struct contains a code pointer, you get control flow execution.

Heap layout problem Heap layout manipulation


Double free Use after free advanced doug lea malloc hacking valgrind massif perf-mem, valgrind massif, and heaptrack

advanced doug lea malloc - phreak post

glibc. ldd /bin/ls - symbolic link probably glibc 2.27 actually pie You can run it? /lib/x86_64-linux-gnu/

malloc chunks of memory new/delete make_unique

heap history viewer

pwndbg vis lets you see allocated chunks. How does it do it? vmmap also shows memory regions classified. top chunk. I think the top chunk is resized to hand out new memory. Metadata - size field. prev_in_use flag. Allocator hands out in discrete sizes, not arbitrarily flexible
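A small model of the "discrete sizes" and the size field flags, assuming 64-bit glibc defaults (8-byte size field, 16-byte alignment, 32-byte minimum chunk). The function names are mine. The rounding follows glibc's `request2size` macro as I understand it.

```python
SIZE_SZ = 8            # bytes used by the size field on 64-bit
MALLOC_ALIGNMENT = 16  # chunk sizes are multiples of this
MINSIZE = 32           # smallest chunk the allocator will hand out

# The low three bits of the size field are flags, not part of the size.
PREV_INUSE, IS_MMAPPED, NON_MAIN_ARENA = 0x1, 0x2, 0x4

def request2size(req):
    """Chunk size used to satisfy malloc(req): add the header, round up, clamp."""
    size = (req + SIZE_SZ + MALLOC_ALIGNMENT - 1) & ~(MALLOC_ALIGNMENT - 1)
    return max(size, MINSIZE)

def split_size_field(field):
    """Split a raw size field into (chunk size, prev_in_use flag)."""
    return field & ~0x7, bool(field & PREV_INUSE)

for req in (0, 24, 25, 0x80):
    print(f"malloc({req:#x}) -> chunk of {request2size(req):#x} bytes")
print(split_size_field(0x91))
```

So malloc(24) and malloc(1) land in the same 0x20 chunk size, which is why a size field like 0x21 shows up so often in `vis` output.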

Playing for K(H)eaps: Understanding and Improving Linux Kernel Exploit Reliability defragmentation


libheap examine glibc heap in gdb. seems like there is a python model of heap in here.


chunks - has prev size, size, info bits. Free chunks have pointers in content top chunk - large piece of memory new chunks are carved out of last remainder chunk tcache fastbin - last in first out. a stacklike structure

large bins - varying size. chunks put in order

printf(“%p”) is your friend

free lists when we call free memory fastbin. hold free chunks of a specific size set context-sections code b main vis to visualize heap fastbins command 0x20 - 0x80 arenas main arena - glibc data section pwndbg arena

fd forward pointer

House of Force

fastbin dup

arbitrary write double free fastbin will return twice. frame command to context code

immediate double free is protected


find_fake_fast one_gadget - constraints


n vis

unsorted bin doubly linked circular free unsorted chunks have forward and backward pointers. new free added to head. malloced from tail

adjacent chunks get consolidated. prev in use flags. unlinking: chunks being removed from doubly linked list

2000 solar designer voodoo malloc tricks

Type Confusion


double free ps4 Joy of exploitation the kernel

Browser Attacking JavaScript Engines: A case study of JavaScriptCore and CVE-2016-4622

How I started chasing speculative type confusion bugs in the kernel and ended up with ‘real’ ones

Return to sender Detecting kernel exploits with eBPF

Automated Exploit Generation (AEG)

sean heelan thesis

usenix security heaphopper angr symbolic analysis for heap exploits? archeap maze toward heap feng shui backward search from heaphopper teerex discovery of memory corruption vulnerabilities [symcc]


Elf stuff

See linker elfmaster elf reverse

example interesting elf files overlaying headers. smallest that doesn’t violate spec

binary golf workshop size coding dead bytes libgolf UNIX ELF Parasites and virus - Silvio Cesare Elf Binary Mangling Pt. 4: Limit Break


SUID GTFOBins is a curated list of Unix binaries that can be used to bypass local security restrictions in misconfigured systems.


commando vm

yolo space hacker. steam game of ctf stuff exploit_me very vulnerable example applications pwntools



What is Binary Analysis

Dynamic - It feels like you’re running the binary in some sense. Maybe on an emulated environment Static - It feels like you’re not running the binary

Fuzzing is definitely dynamic. Dataflow analysis on a CFG is static. There are grey areas. Symbolic execution starts to feel like a grey area. I would consider it to be largely dynamic, but you are executing in a rather odd way.

Trying to understand a binary Why?

  • Finding vulnerabilities for defense or offense
    • buffer overflows
    • double frees
    • use after frees
    • memory leaks - just bad performance
    • info leaks - bad security
  • Verification - Did your compiler produce a thing that does what your code said?
  • Reversing/Cracking closed source software.
  • Patching and Code injection. Finding Bugs for use in speed runs. Game Genie.
  • Auditing
  • Aids for manual RE work. RE is useful because things may be undocumented intentionally or otherwise. You want to reuse a chip, or turn on secret functionality, or reverse a protocol.
  • Discovery of patent violation or GPL violations
  • Comparing programs. Discovering what has been patched.

I don’t want my information stolen or held ransom. I don’t want people peeping in on my conversations. I don’t want my computer wrecked. These are all malicious actors. We also don’t want our planes and rockets crashing. This does not require maliciousness on anyone’s part per se.

  • Symbol recovery
  • Disassembly
  • CFG recovery
  • Taint tracking
  • symbolic execution A list of tools CSE597: Binary-level program analysis Spring 19 Gang Tan

Program Analysis

What’s the difference? Binaries are less structured than what you’ll typically find talked about in program analysis literature.

Binaries are tough because the coupling between high level intent and constructs and what is actually there has been tossed away or has become very loose.

How are binaries made

C preprocessor -> Maintains file number information isn’t that interesting

C compiler -> assembly. You can ask for this assembly with -S. You can also Or more cut up C -> IR IR -> MIR (what does gcc do? RTL right? ) MIR -> Asm

Misc kernel exploitation browser exploitation blog posts

  • Arbitrary code guard
  • Code integrity guard
  • hypervisor protected code integrity (HVCI) “the acg of kernel mode”
  • virtualization based security VBS. credential guard
  • local privilege escalation (LPE)


  • elfmaster ryan oneill


  • pwndbg
  • heap commands. For examining heap structure

  • gef can track malloc and free. That makes sense

cfi directives - call frame information automatic bug insertion using joern phaser slither

Hiding instructions in instructions

Thomas stars

SGX enclaves

obfuscation snapchat ollvm vmprotect opaque predicates - one branch always taken
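A tiny illustration of an opaque predicate. The product of two consecutive integers is always even, so the condition below is always true, but a disassembler or decompiler that does not know this has to treat both branches as live. The function names are made up.

```python
def opaque_true(x):
    """Always True: x * (x + 1) is a product of consecutive integers, hence even."""
    return (x * (x + 1)) % 2 == 0

def obfuscated_abs(x):
    if opaque_true(x):
        return x if x >= 0 else -x   # the real code
    else:
        return x * 31337             # junk branch, never executed

print([obfuscated_abs(v) for v in (-3, 0, 7)])
```

In machine code the integers wrap around, but the parity argument still holds modulo a power of two, which is why this family of predicates is popular.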

chris domas tom 7

firmadyne emulating and analyzing firmware


burp suite idor - autorize


shellcode encoding and decoding - sometimes you need to avoid things like \0 termination. Shellcode generators. What do they do? shellcode database
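A minimal sketch of one common encoding trick: XOR every byte with a single-byte key that does not occur in the payload, so no encoded byte is zero. The input bytes are an arbitrary stand-in, not working shellcode, and the decoder stub that would undo the XOR at runtime is not shown.

```python
def pick_key(raw):
    """Pick a nonzero key byte absent from raw; then b ^ key is never 0."""
    used = set(raw)
    for key in range(1, 256):
        if key not in used:
            return key
    raise ValueError("every byte value is used; a single-byte XOR key cannot work")

def xor_bytes(data, key):
    """XOR each byte with key. Applying it twice gives back the original."""
    return bytes(b ^ key for b in data)

raw = bytes.fromhex("4831c00000003b")   # stand-in bytes containing \x00
key = pick_key(raw)
encoded = xor_bytes(raw, key)

print(f"key={key:#04x} encoded={encoded.hex()}")
```

The same idea extends to avoiding other bad bytes such as newlines: exclude any key that maps a payload byte onto a forbidden value.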

google dorking Like using google with special commands? Why “dork”? shodan


-A -T4. OS detection nmap nse - nmap scripting engine. There is a folder of scripts

p0f - passive sniffing. fingerprinting

malware reversing class live overflow youtube exploit education rop emporium linux exploitation course yara - patterns to recognize malware. Byte level patterns? Sigma snort

SIEM IDS - intrusion detection systems

shellcode encoder/decoder/generator synesthesia

FLIRT exploit examples

Gray Hat Hacking The Shellcoder’s handbook Attacking network Protocols Implementing Effective Code Review

Hacking: sergey weird machine paper

blackhat defcon bluehat ccc bsides ctf project zero kaspersky blog spectre/meltdown

return oriented programming sounds like my backwards pass. Huh.

Digital forensics

radare2, a binary analysis thingo. rax is useful for conversion of hex

binary ninja






Maybe we should get a docker of all sorts of tools. Kali Linux?

klee, afl, other fuzzers? valgrind



ROP - solve substitution cyphers john the ripper. Brute force password cracker


Best CTFs. I probably don’t want the most prestigious ones? They’ll be too high level? I want the simple stuff - check out the heap exploitation github thing


metasploit, pacu - aws, cobalt strike

and the pwn category of ctf

ROP JOP SROP BOP - block oriented

return 2 libc - a subset of rop?

ryan chapman syscall

privilege escalation - getuid effective id.. Inherit user and group from parent process. switching to user resets the setuid bit. sticky bits id command

shellcode - binary that launches the shell. system call execve("/bin/sh", NULL, NULL) - args and env params

intel vs at&t syntax. Load up addresses and constants in the binary with .string. gcc -static -nostdlib. objcopy --dump-section .text=outfile. Exiting cleanly is smart. Helps know what is screwing up. ldd

Trying out shellcode mmap. mprotect? read() deref function pointer

gdb x for eXamine $rsi x/5i $rip gives assembly? x/gx break *0xx040404 n next s step ni si

strace is useful first debugging


system calls set rax to syscall number. call syscall instruction man yada strace

  • fork
  • execve
  • read
  • write
  • wait
  • brk - program brk. change size of data segment. sbrk by increments. sbrk(0) returns current address of break

stack. rbp, rsp. stack grows down decreasing. Rsp + 0x8 is on stack, rbp - 8 is on stack most systems are little endian calling conventions. rdi rsi rdx rcx r8 r9, return in rax rbx rbp r12 r13 r14 r15 are callee saved. guaranteed not smashed opcode listing - assembly repl

binary files

file - tells info about file
elf - interpreter, 
 - sections - text, plt/got resolve and dispatch library calls, data preinitialized data, rodata global read only, bss for uninitialized data. sections are not required to run a binary
 - symbols - 
- segments - where to load

readelf, objdump, nm - reads symbols, patchelf, objcopy, strip, kaitai struct

process loading: what to load. look for #! or elf magic. /proc/sys/fs/binfmt_misc can match a string there. hand off to elf defined interpreter if dynamically linked.

Then it’s onto ld probably. LD_PRELOAD, LD_LIBRARY_PATH, DT_RUNPATH in binary file, system wide /etc/, /lib and /usr/lib. relocations updated. /proc/self/maps. libc is almost always linked. printf, scanf, socket, atoi, malloc, free





attaching to gdb and/or a process is really useful. cyclic bytes can let you localize what ends up where in a buffer overflow for example cyclic_find
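A pure-Python stand-in for the cyclic pattern idea, to show why it works. This is a sketch, not pwntools' implementation. The pattern is a de Bruijn sequence, so every 4-byte window occurs exactly once, and the bytes that end up in a crashed register map back to a unique offset. The names `cyclic` and `cyclic_find` mirror the ones in the note, and the lowercase alphabet and window length of 4 are assumptions.

```python
import string

def de_bruijn(alphabet, n):
    """Return a de Bruijn sequence over alphabet (bytes) with window length n."""
    k = len(alphabet)
    a = [0] * (k * n)
    out = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return bytes(alphabet[i] for i in out)

PATTERN = de_bruijn(string.ascii_lowercase.encode(), 4)

def cyclic(length):
    """First `length` bytes of the pattern, to use as overflow input."""
    return PATTERN[:length]

def cyclic_find(chunk):
    """Offset of a 4-byte chunk (e.g. bytes seen in a crashed register)."""
    return PATTERN.find(chunk)

buf = cyclic(300)
leaked = buf[120:124]          # pretend these bytes showed up in the return address
print(buf[:16], cyclic_find(leaked))
```

When reading the value out of a register, remember it is little endian, so the bytes may need reversing before the lookup.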

Examples from