disassembler linear sweep Shingled recursive traversal anti-disassembly

relative disassembler performance

capstone - disassembler. converse of keystone. zydis xed distorm iced bddisasm yaxpeax

McSema - older Trail of Bits lifter. Uses LLVM as IR. remill - lift to LLVM bitcode. anvill - processing remill. rellic - makes C-like code


gtirb ddisasm grammatech

Speculative disassembly - decode every offset. Refine blocks. Spedi, open source speculative disassembler. Nucleus paper: Compiler-Agnostic Function Detection in Binaries

superset disassembler kenneth civuentes thesis

probabilistic disassembly using probabilistic datalog? bap mc + datalog?

Formally Verified Lifting of C-Compiled x86-64 Binaries Scalable validation of binary lifters

  • Linear
  • Recursive
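A minimal sketch of the difference between the two strategies. The four-opcode ISA here is hypothetical and invented for illustration (a real decoder such as capstone would replace `insn_len`). Linear sweep decodes bytes back to back, so data sitting after an unconditional jump gets decoded as code; recursive traversal only follows control flow from the entry point.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy ISA, invented for this sketch (not a real architecture). */
enum { OP_NOP = 0x01, OP_JMP = 0x02, OP_RET = 0x03, OP_JZ = 0x04 };

/* Length in bytes of the instruction with this opcode, 0 if undecodable. */
static size_t insn_len(uint8_t op) {
    switch (op) {
    case OP_NOP: case OP_RET: return 1;
    case OP_JMP: case OP_JZ:  return 2;   /* opcode + signed 8-bit offset */
    default:                  return 0;
    }
}

/* Linear sweep: decode back to back from offset 0, skipping bad bytes. */
static void linear_sweep(const uint8_t *code, size_t n, uint8_t *is_start) {
    memset(is_start, 0, n);
    size_t pc = 0;
    while (pc < n) {
        size_t len = insn_len(code[pc]);
        if (len == 0 || pc + len > n) { pc++; continue; }
        is_start[pc] = 1;
        pc += len;
    }
}

/* Recursive traversal: follow control flow from pc only.
   The caller must zero is_start before the first call. */
static void recursive_traversal(const uint8_t *code, size_t n, size_t pc,
                                uint8_t *is_start) {
    while (pc < n && !is_start[pc]) {
        size_t len = insn_len(code[pc]);
        if (len == 0 || pc + len > n) return;
        is_start[pc] = 1;
        uint8_t op = code[pc];
        if (op == OP_RET) return;
        if (op == OP_JMP || op == OP_JZ) {
            /* Branch target is relative to the end of the instruction. */
            long target = (long)(pc + len) + (int8_t)code[pc + 1];
            if (target >= 0 && (size_t)target < n)
                recursive_traversal(code, n, (size_t)target, is_start);
            if (op == OP_JMP) return;   /* unconditional: no fall-through */
        }
        pc += len;
    }
}
```

On the byte string `JMP +2, <2 data bytes>, NOP, RET`, linear sweep marks an instruction start inside the data whenever the data byte happens to be a valid opcode, while recursive traversal skips it. The flip side is that recursive traversal misses anything reachable only through indirect jumps.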


Delay slots are an annoyance. Some architectures (MIPS, SPARC) put an instruction in the shadow of a jump instruction: it sits after the jump in memory but logically executes before the jump takes effect. This makes sense from a microarchitectural perspective, but it is bizarre to disassemble.

paper osr - offset shift range. I've seen this called value set analysis? thesis


Analyzing Memory Accesses in x86 Executables - Reps and Balakrishnan

byteweight - bap neural network thing identifies function starts

Ghidra repackaging: lifting bits sleigh pypcode

An Empirical Study on ARM Disassembly Tools


rellic - no more gotos

a comb for decompiled C

Loop recovery

Getting structured control flow from CFG Stackify

havlak and tarjan testing reducibility with union find

A New Algorithm for Identifying Loops in Decompilation

dfs. label nodes. Intervals of labels are subsets of nodes. timestamp of first visit, timestamp of last visit. back edge, forward edge, cross edge
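The timestamp idea can be sketched as follows. This is a textbook DFS edge classification, not any particular tool's loop recovery; the adjacency-matrix `struct dfs` and all function names are made up for the example. A back edge points at a node that is still open (an ancestor), which is what marks a loop header candidate.

```c
#include <string.h>

#define MAX_NODES 16

enum edge_kind { TREE_EDGE, BACK_EDGE, FORWARD_EDGE, CROSS_EDGE };

/* Hypothetical container for this sketch. */
struct dfs {
    int n;                              /* number of nodes in use */
    int adj[MAX_NODES][MAX_NODES];      /* adj[u][v] != 0 means edge u -> v */
    int first[MAX_NODES];               /* timestamp of first visit, 0 = unvisited */
    int last[MAX_NODES];                /* timestamp of last visit, 0 = still open */
    int kind[MAX_NODES][MAX_NODES];     /* classification of each existing edge */
    int clock;
};

static void dfs_visit(struct dfs *g, int u) {
    g->first[u] = ++g->clock;
    for (int v = 0; v < g->n; v++) {
        if (!g->adj[u][v]) continue;
        if (g->first[v] == 0) {                 /* unvisited: tree edge */
            g->kind[u][v] = TREE_EDGE;
            dfs_visit(g, v);
        } else if (g->last[v] == 0) {           /* v still open: ancestor of u */
            g->kind[u][v] = BACK_EDGE;
        } else if (g->first[u] < g->first[v]) { /* v is a finished descendant */
            g->kind[u][v] = FORWARD_EDGE;
        } else {                                /* v finished in another subtree */
            g->kind[u][v] = CROSS_EDGE;
        }
    }
    g->last[u] = ++g->clock;
}

static void dfs_run(struct dfs *g, int root) {
    memset(g->first, 0, sizeof g->first);
    memset(g->last, 0, sizeof g->last);
    g->clock = 0;
    dfs_visit(g, root);
}

/* v is in u's subtree iff v's [first, last] interval nests inside u's. */
static int is_descendant(const struct dfs *g, int u, int v) {
    return g->first[u] <= g->first[v] && g->last[v] <= g->last[u];
}
```

The interval nesting test is the "intervals of labels are subsets of nodes" observation: subtree membership becomes two integer comparisons.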

Also… egraphs

webassembly is at the core of both of these Allllll things come around I wonder if webassembly would be a good universal disassembler ir someone should do that


  • IDA
  • Ghidra
  • Binary Ninja
  • Cutter

decompiler explorer Hmm. too bad it’s not a web service


Binary reversing: hilbert curves for binary visualization, benford’s law, binwalk entropy visualization
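For the Hilbert curve visualization, the core piece is mapping a file offset to a pixel so that nearby bytes stay nearby on screen. Below is a sketch of the standard index-to-coordinate conversion; the function names are my own, and how bytes are then coloured (by value, by entropy, ...) is left out.

```c
/* Rotate/flip a quadrant so the sub-curve gets the right orientation. */
static void hilbert_rot(unsigned n, unsigned *x, unsigned *y,
                        unsigned rx, unsigned ry) {
    if (ry == 0) {
        if (rx == 1) {
            *x = n - 1 - *x;
            *y = n - 1 - *y;
        }
        unsigned t = *x;    /* swap x and y */
        *x = *y;
        *y = t;
    }
}

/* Map index d along the curve to (x, y) in an n-by-n grid.
   n must be a power of two and d must be below n*n. */
static void hilbert_d2xy(unsigned n, unsigned d, unsigned *x, unsigned *y) {
    unsigned t = d;
    *x = 0;
    *y = 0;
    for (unsigned s = 1; s < n; s *= 2) {
        unsigned rx = 1 & (t / 2);
        unsigned ry = 1 & (t ^ rx);
        hilbert_rot(s, x, y, rx, ry);
        *x += s * rx;
        *y += s * ry;
        t /= 4;
    }
}
```

To draw a file, byte `i` would go to the pixel returned by `hilbert_d2xy(n, i, &x, &y)` for a grid with `n*n` at least the file size.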


Mayhem Fuzzy-sat running smt queries through a fuzzer Angora SLF eclipser

fuzzing challenges and reflections

fuzzing 22

google fuzzbench


rode0day rolling fuzzing competition


  • AFL. AFL++ fork of afl tutorials. compile using afl-clang-fast++ or use qemu mode.
  • libfuzzer clang++ -fsanitize=address,fuzzer tutorial
  • honggfuzz


  • klee
  • sage

Qsym hybrid fuzzing. concolic execution.

syzkaller kernel fuzzer go-fuzz fuzzilli winafl

Fuzzers compile in extra information to give coverage guidance

Fuzzers use a corpus of input

Using fuzzer to solve csp. Write checker. Fuzz it. It’s randomized search
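A toy version of the "write checker, fuzz it" idea, with no real fuzzer involved: the checker encodes the constraints, and a seeded PRNG throws candidates at it. The constraint (`a + b == 100`, `a * b == 2419`) and the function names are invented for illustration; a real fuzzer like AFL would add coverage feedback on top of this blind search.

```c
#include <stdint.h>

/* The "checker": returns 1 iff (a, b) satisfies the toy constraints. */
static int check(uint8_t a, uint8_t b) {
    return a + b == 100 && a * b == 2419;
}

/* xorshift32 PRNG so the search is reproducible for a given seed. */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Randomized search: try up to budget random candidates.
   Returns 1 and stores the solution on success, 0 if the budget runs out. */
static int fuzz_solve(uint32_t seed, unsigned long budget,
                      uint8_t *a, uint8_t *b) {
    uint32_t state = seed ? seed : 1;   /* xorshift state must be nonzero */
    for (unsigned long i = 0; i < budget; i++) {
        uint32_t r = xorshift32(&state);
        uint8_t ca = (uint8_t)(r >> 8);
        uint8_t cb = (uint8_t)(r >> 16);
        if (check(ca, cb)) {
            *a = ca;
            *b = cb;
            return 1;
        }
    }
    return 0;
}
```

With 2 solutions among 65536 candidates, a budget of a few million tries should be far more than enough, but nothing is guaranteed: it is randomized search, so the caller has to handle the 0 return.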

Fuzzgym makes a lot of sense to put neural heuristics in there

Symbolic Execution symqemu

unicorn - ripped out the heart of qemu and made it programmatically accessible. Based on an old version of qemu though


primus - bap’s emulator framework

panda - built on qemu. record and replay executions


CWE - common weakness enumeration

integer overflow

null pointer dereference

sophos - comprehensive exploit prevention 2018

Current State of Exploit Dev 2020


microsoft exploits and exploit kits


Control Flow integrity is a broad term for many of these CONFIRM: Evaluating Compatibility and Relevance of Control-flow Integrity Protections for Modern Software

DEP - data execution prevention executable space protection This says DEP is Windows terminology? NX bit

shadow stack

control flow guard - windows reverse flow guard extreme flow guard kernel data protection

stack canary -fstack-protector. Guard variable put on stack SSP stack smashing protection. Stackguard, Propolice. Buffer overflow protection

ASLR, ASLP. Address Space Layout Randomization. Libraries are loaded at a different location on each run. This makes code reuse in an exploit more difficult.

Fat pointers

endbr intel control flow enforcement technology (CET). Valid locations for indirect jumps.

ASLR - Addresses are randomized. cat /proc/self/maps to look at what actually got loaded. Also ldd shows where libraries get loaded in memory. Stack canaries - set once per binary run, so with forking you can brute force them or maybe leak them?

checksec tells you about which things are enabled. which also has a rundown of the different things and how you could check them manually. Can output into xml, json, csv

gcc options -no-pie -pie -fpie -fno-stack-protector -fstack-protector-all. -z execstack makes stack executable

RELRO - relocation read only. GOT table becomes read only. Prevents relocation attacks

binary diversification - compile differently every time, making many versions of a binary so that code reuse attacks become way harder. disunison


Buffer Overflows

buffer overflow When a buffer overflow occurs you are writing to memory that possibly had a different purpose. Maybe other stack variables, maybe return address pointers, maybe over heap metadata.

Sanitization of user input Off by one errors String termination


AWP Arbitrary write primitive CWE-123: Write-what-where Condition ARP arbitrary read primitive

Return Oriented Programming (ROP)

rp ropper [pwntool]

ropgadget ropium supports semantics queries. hmm in cpp. V impressive.

return to libc libc is very common and you can weave together libc calls. “Solar Designer” solar designer 1997

ropc-llvm ropc

smashing the stack for fun and profit - stacks are no longer executable geometry of innocent flesh on the bone. ROP

rop emporium

rop ftw

pop_gadget ; value ; nextgadget loads from stack into register
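The `pop_gadget ; value ; nextgadget` layout, written out as bytes. This is only a sketch of how the payload is laid out in memory on a little-endian 64-bit target; the padding length and all three addresses are placeholders the caller has to find for a real binary (with ropper, ROPgadget, etc.).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 64-bit value in little-endian byte order at buf[off..off+7].
   Returns the offset just past the written value. */
static size_t put_u64(uint8_t *buf, size_t off, uint64_t v) {
    for (int i = 0; i < 8; i++)
        buf[off + i] = (uint8_t)(v >> (8 * i));
    return off + 8;
}

/* Layout: [pad bytes up to the saved return address]
           [address of a "pop reg; ret" gadget]
           [value that the pop loads into the register]
           [address the gadget's ret jumps to next]
   buf must have room for pad + 24 bytes. Returns the payload length. */
static size_t build_chain(uint8_t *buf, size_t pad, uint64_t pop_gadget,
                          uint64_t value, uint64_t next_gadget) {
    memset(buf, 'A', pad);
    size_t off = pad;
    off = put_u64(buf, off, pop_gadget);
    off = put_u64(buf, off, value);
    off = put_u64(buf, off, next_gadget);
    return off;
}
```

When the vulnerable function returns, `ret` pops `pop_gadget` into rip, the gadget's `pop` consumes `value` from the stack, and its own `ret` consumes `next_gadget`. Chaining is just repeating that pattern.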

pure buffer overflow from command line:

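#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[]) {
    char buf[256];
    /* Deliberately unchecked: an argv[1] longer than 256 bytes overflows buf. */
    memcpy(buf, argv[1], strlen(argv[1]));
    return 0;
}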



rop prevention by binary rewriting ropguard

stack pivoting moving over to a different stack.

ret2libc ret2dlresolve ret2csu ret2plt


jop rocket. blackhat talk

Data Oriented Programming (DOP)

Block Oriented Programming


If you can overwrite a struct that contains a pointer, you can use this to obtain reads or writes when that pointer is read or written. If the struct contains a code pointer, you get control flow execution.
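A safe model of the code pointer case. The struct, the handlers, and the unchecked copy are all hypothetical; the copy stays inside the struct object so the demo is well defined, but it shows the mechanism: bytes written past `name` land on `handler`, and the next call through it goes wherever the attacker chose.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical victim object: a buffer followed by a code pointer. */
struct session {
    char name[16];
    int (*handler)(void);
};

static int normal_handler(void)  { return 0; }
static int attacker_target(void) { return 1337; }

/* Models the bug: len is attacker controlled and never checked
   against sizeof s->name. */
static void set_name(struct session *s, const unsigned char *data, size_t len) {
    memcpy(s, data, len);
}

/* Builds a payload that fills name and replaces handler, then calls it. */
static int demo_overwrite(void) {
    struct session s;
    memset(&s, 0, sizeof s);
    s.handler = normal_handler;

    unsigned char payload[sizeof(struct session)];
    int (*target)(void) = attacker_target;
    memset(payload, 'A', sizeof payload);
    memcpy(payload + offsetof(struct session, handler), &target, sizeof target);

    set_name(&s, payload, sizeof payload);   /* runs past name into handler */
    return s.handler();                      /* control flow now hijacked */
}
```

If `handler` were a data pointer instead, the same overwrite would turn later reads or writes through it into arbitrary read or write primitives.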

Heap layout problem Heap layout manipulation


Double free Use after free advanced doug lea malloc hacking valgrind massif perf-mem, valgrind massif, and heaptrack

advanced doug lea malloc - phreak post

glibc. ldd /bin/ls - symbolic link probably glibc 2.27 actually pie You can run it? /lib/x86_64-linux-gnu/

malloc chunks of memory new/delete make_unique

heap history viewer

pwndbg vis lets you see allocated chunks. How does it do it? vmmap also shows memory regions classified. top chunk - I think the top chunk is resized to hand out new memory. Metadata - size field. prev_in_use flag. Allocator hands out in discrete sizes, not arbitrarily flexible
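The "discrete sizes" and "size field plus flag" points can be made concrete. This sketch is modelled on 64-bit glibc malloc (8-byte size field, 16-byte alignment, 32-byte minimum chunk); those constants are assumptions about that platform, and the helper names are mine, not glibc's.

```c
#include <stdint.h>

/* Assumed 64-bit glibc layout constants. */
#define HDR_SIZE        8u      /* the size field itself */
#define ALIGN_MASK      15u     /* chunks are 16-byte aligned */
#define MIN_CHUNK       32u     /* smallest chunk the allocator hands out */

/* Low bits of the size field are flags, not part of the size. */
#define PREV_INUSE      0x1u
#define IS_MMAPPED      0x2u
#define NON_MAIN_ARENA  0x4u
#define FLAG_MASK       (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)

/* Round a malloc request up to the chunk size that would be carved out. */
static uint64_t request_to_chunk_size(uint64_t req) {
    uint64_t sz = (req + HDR_SIZE + ALIGN_MASK) & ~(uint64_t)ALIGN_MASK;
    return sz < MIN_CHUNK ? MIN_CHUNK : sz;
}

/* Strip the flag bits from a raw size field. */
static uint64_t chunk_size(uint64_t size_field) {
    return size_field & ~(uint64_t)FLAG_MASK;
}

/* 1 if the chunk before this one in memory is allocated. */
static int prev_in_use(uint64_t size_field) {
    return (size_field & PREV_INUSE) != 0;
}
```

So `malloc(24)` and `malloc(1)` both come from 0x20-sized chunks, and a size field shown by `vis` as 0x91 means a 0x90-byte chunk whose predecessor is in use.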

Playing for K(H)eaps: Understanding and Improving Linux Kernel Exploit Reliability defragmentation


libheap examine glibc heap in gdb. seems like there is a python model of heap in here.


chunks - have prev size, size, info bits. Free chunks have pointers in content
top chunk - large piece of memory new chunks are carved out of
last remainder chunk
tcache
fastbin - last in first out. a stacklike structure

large bins - varying size. chunks put in order

printf(“%p”) is your friend

free lists - where memory goes when we call free
fastbins hold free chunks of a specific size
set context-sections code
b main
vis to visualize heap
fastbins command. Sizes 0x20 - 0x80
arenas. main arena - glibc data section. pwndbg arena

fd forward pointer

House of Force

fastbin dup

arbitrary write double free fastbin will return twice. frame command to context code

immediate double free is protected
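A toy model of why fastbin dup works even though the immediate double free is caught. This is not glibc code: `toy_chunk` and `toy_bin` are invented, and the only check modelled is "is the chunk being freed already at the head of the bin". Freeing `a`, then `b`, then `a` again slips past that check and leaves `a` in the list twice.

```c
#include <stddef.h>

/* Invented stand-ins for a chunk and a single fastbin. */
struct toy_chunk {
    struct toy_chunk *fd;   /* forward pointer, only meaningful while free */
    int id;
};

struct toy_bin {
    struct toy_chunk *head; /* LIFO: most recently freed chunk */
};

/* Push onto the bin. Refuses (returns 0) only when the chunk is already
   the head, which models the immediate double free check. */
static int toy_free(struct toy_bin *bin, struct toy_chunk *c) {
    if (bin->head == c)
        return 0;
    c->fd = bin->head;
    bin->head = c;
    return 1;
}

/* Pop the most recently freed chunk, or NULL if the bin is empty. */
static struct toy_chunk *toy_malloc(struct toy_bin *bin) {
    struct toy_chunk *c = bin->head;
    if (c != NULL)
        bin->head = c->fd;
    return c;
}
```

After `free(a); free(b); free(a)` the next three allocations return `a`, `b`, `a`. Two live pointers to the same chunk is the starting point for overwriting `fd` and getting an arbitrary write.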


find_fake_fast one_gadget - constraints


n vis

unsorted bin doubly linked circular free unsorted chunks have forward and backward pointers. new free added to head. malloced from tail

adjacent chunks get consolidated. prev in use flags. unlinking - chunks being removed from doubly linked list
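A sketch of the circular doubly linked bin and the unlink step, with invented names (`bin_chunk`, `bin_unlink`). New frees are pushed at the head and allocations are taken from the tail, as described above. The pointer consistency check in `bin_unlink` is in the spirit of the hardening modern glibc applies before unlinking; the classic unlink attack relied on it being absent.

```c
#include <stddef.h>

/* Invented node type: just the two list pointers of a free chunk. */
struct bin_chunk {
    struct bin_chunk *fd;   /* forward pointer */
    struct bin_chunk *bk;   /* backward pointer */
};

/* An empty circular list is a head that points at itself. */
static void bin_init(struct bin_chunk *head) {
    head->fd = head;
    head->bk = head;
}

/* Newly freed chunks are added at the head. */
static void bin_push_head(struct bin_chunk *head, struct bin_chunk *c) {
    c->fd = head->fd;
    c->bk = head;
    head->fd->bk = c;
    head->fd = c;
}

/* Remove c from its list. Returns 0 without writing anything if the
   neighbours do not point back at c (corrupted fd/bk). */
static int bin_unlink(struct bin_chunk *c) {
    struct bin_chunk *fd = c->fd;
    struct bin_chunk *bk = c->bk;
    if (fd->bk != c || bk->fd != c)
        return 0;
    fd->bk = bk;    /* these two writes are what the attack abused */
    bk->fd = fd;
    return 1;
}

/* Allocations are served from the tail. NULL if empty or corrupted. */
static struct bin_chunk *bin_pop_tail(struct bin_chunk *head) {
    struct bin_chunk *c = head->bk;
    if (c == head)
        return NULL;
    return bin_unlink(c) ? c : NULL;
}
```

Without the check, controlling `c->fd` and `c->bk` turns the two pointer writes in `bin_unlink` into a write-what-where.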

2000 solar designer voodoo malloc tricks

Type Confusion


double free ps4

Browser Attacking JavaScript Engines: A case study of JavaScriptCore and CVE-2016-4622

Automated Exploit Generation (AEG)

sean heelan thesis

usenix security heaphopper angr symbolic analysis for heap exploits? archeap maze toward heap feng shui backward search from heaphopper teerex discovery of memory corruption vulns [symcc]


Elf stuff

See linker elfmaster elf reverse


commando vm

yolo space hacker. steam game of ctf stuff exploit_me very vulnerable example applications pwntools



What is Binary Analysis

Dynamic - It feels like you’re running the binary in some sense. Maybe on an emulated environment
Static - It feels like you’re not running the binary

Fuzzing is definitely dynamic. Dataflow analysis on a CFG is static. There are grey areas. Symbolic execution starts to feel like a grey area. I would consider it to be largely dynamic, but you are executing in a rather odd way.

Trying to understand a binary Why?

  • Finding vulnerabilities for defense or offense
    • buffer overflows
    • double frees
    • use after frees
    • memory leaks - just bad performance
    • info leaks - bad security
  • Verification - Did your compiler produce a thing that does what your code said?
  • Reversing/Cracking closed source software.
  • Patching and Code injection. Finding Bugs for use in speed runs. Game Genie.
  • Auditing
  • Aids for manual RE work. RE is useful because things may be undocumented intentionally or otherwise. You want to reuse a chip, or turn on secret functionality, or reverse a protocol.
  • Discovery of patent violation or GPL violations
  • Comparing programs. Discovering what has been patched.

I don’t want my information stolen or held ransom. I don’t want people peeping in on my conversations. I don’t want my computer wrecked. These are all malicious actors. We also don’t want our planes and rockets crashing. This does not require maliciousness on anyone’s part per se.

  • Symbol recovery
  • Disassembly
  • CFG recovery
  • Taint tracking
  • symbolic execution A list of tools CSE597: Binary-level program analysis Spring 19 Gang Tan

Program Analysis

What’s the difference? Binaries are less structured than what you’ll typically find talked about in program analysis literature.

Binaries are tough because the high level intent and constructs have been tossed away, or are only very loosely coupled to what is actually there.

How are binaries made

C preprocessor -> Maintains file and line number information. Isn’t that interesting

C compiler -> assembly. You can ask for this assembly with -S. Or, more cut up: C -> IR, IR -> MIR (what does gcc do? RTL right?), MIR -> Asm

Misc kernel exploitation browser exploitation blog posts

  • Arbitrary code guard
  • Code integrity guard
  • hypervisor protected code integrity (HVCI) “the acg of kernel mode”
  • virtualization based security VBS. credential guard
  • local privilege escalation (LPE)


  • elfmaster ryan oneill


  • pwndbg
  • heap commands. For examining heap structure

  • gef can track malloc and free. That makes sense goodman on binary rewriting binrec - lift program merge lifted bytecode into debloated egalito BOLT lifting bits/grr

cfi directives - call frame information automatic bug insertion using joern phaser slither

Hiding instructions in instructions

Thomas stars

SGX enclaves

obfuscation snapchat ollvm vmprotect opaque predicates - one branch always taken

chris domas tom 7

firmadyne emulating and analyzing firmware


burp suite idor - autorize


shellcode encoding and decoding - sometimes you need to avoid things like \0 termination. Shellcode generators. What do they do? shellcode database

google dorking Like using google with special commands? Why “dork”? shodan


-A -T4. OS detection nmap nse - nmap scripting engine. There is a folder of scripts

p0f - passive sniffing. fingerprinting

malware reversing class live overflow youtube exploit education rop emporium linux exploitation course yara - patterns to recognize malware. Byte level patterns? Sigma snort

SIEM IDS - intrusion detection systems

shellcode encoder/decoder/generator synesthesia

FLIRT exploit examples

Gray Hat Hacking The Shellcoder’s handbook Attacking network Protocols Implementing Effective Code Review

Hacking: sergey weird machine paper

blackhat defcon bluehat ccc bsides ctf project zero kaspersky blog spectre/meltdown

return oriented programming sounds like my backwards pass. Huh.

Digital forensics

radare2, a binary analysis thingo. rax is useful for conversion of hex

binary ninja






Maybe we should get a docker of all sorts of tools. Kali Linux?

klee, afl, other fuzzers? valgrind



ROP - solve substitution cyphers john the ripper. Brute force password cracker


Best CTFs. I probably don’t want the most prestigious ones? They’ll be too high level? I want the simple stuff - check out the heap exploitation github thing


metasploit, pacu - aws, cobalt strike

and the pwn category of ctf

ROP JOP SROP BOP - block oriented

return 2 libc - a subset of rop?

ryan chapman syscall

privilege escalation - getuid effective id.. Inherit user and group from parent process. switching to user resets the setuid bit. sticky bits id command

shellcode - binary that launches the shell. system call execve(“/bin/sh”, NULL, NULL) - args and env params

intel vs at&t syntax. Load up addresses, constants in binary with .string. gcc -static -nostdlib. objcopy --dump-section .text=outfile. exiting cleanly is smart. Helps know what is screwing up. ldd

Trying out shellcode mmap. mprotect? read() deref function pointer

gdb: x for eXamine. x $rsi. x/5i $rip gives assembly? x/gx. break *0x040404. n next, s step, ni, si

strace is a useful first debugging step


system calls set rax to syscall number. call syscall instruction man yada strace
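From C the same thing can be poked at without writing assembly. A small sketch assuming Linux with glibc, whose `syscall()` wrapper loads the number into rax and the arguments into the syscall registers before executing the `syscall` instruction. The `raw_` function names are made up.

```c
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* getpid via the raw syscall number instead of the libc wrapper. */
static long raw_getpid(void) {
    return syscall(SYS_getpid);
}

/* write(fd, msg, strlen(msg)) via the raw syscall number.
   Returns the number of bytes written, or -1 with errno set. */
static long raw_write(int fd, const char *msg) {
    return syscall(SYS_write, fd, msg, strlen(msg));
}
```

Running a program that calls these under strace shows the `getpid()` and `write(...)` lines directly, which is a quick way to match syscall numbers to names.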

  • fork
  • execve
  • read
  • write
  • wait
  • brk - program brk. change size of data segment. sbrk by increments. sbrk(0) returns current address of break

stack. rbp, rsp. The stack grows down (decreasing addresses). rsp + 0x8 is on the stack, rbp - 8 is on the stack
most systems are little endian
calling conventions. Arguments in rdi rsi rdx rcx r8 r9, return in rax. rbx rbp r12 r13 r14 r15 are callee saved, guaranteed not smashed
opcode listing - assembly repl

binary files

file - tells info about file
elf - interpreter, 
 - sections - text, plt/got to resolve and dispatch library calls, data for preinitialized data, rodata for global read only data, bss for uninitialized data. sections are not required to run a binary
 - symbols - 
- segments - where to load

readelf, objdump, nm - reads symbols, patchelf, objcopy, strip, kaitai struct

process loading - what to load. look for #! or elf magic. /proc/sys/fs/binfmt_misc can match a string there. hand off to elf defined interpreter if dynamically linked.
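The "look for #! or elf magic" step can be sketched as a check on the first few bytes of the file. This only mirrors the idea, not the kernel's actual binfmt code; the enum and function name are made up. The magic bytes and the EI_CLASS values (1 for 32-bit, 2 for 64-bit) are from the ELF format.

```c
#include <stddef.h>
#include <string.h>

enum exec_kind { EXEC_UNKNOWN, EXEC_SCRIPT, EXEC_ELF32, EXEC_ELF64 };

/* Classify a file from its leading bytes. */
static enum exec_kind classify_exec(const unsigned char *buf, size_t len) {
    if (len >= 2 && buf[0] == '#' && buf[1] == '!')
        return EXEC_SCRIPT;                   /* hand off to the named interpreter */
    if (len >= 5 && memcmp(buf, "\x7f" "ELF", 4) == 0) {
        if (buf[4] == 1) return EXEC_ELF32;   /* EI_CLASS == ELFCLASS32 */
        if (buf[4] == 2) return EXEC_ELF64;   /* EI_CLASS == ELFCLASS64 */
    }
    return EXEC_UNKNOWN;
}
```

The string literal is split as `"\x7f" "ELF"` because `\x7fE` would otherwise be read as one longer hex escape.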

Then it’s onto ld probably. LD_PRELOAD, LD_LIBRARY_PATH, DT_RUNPATH in binary file, system wide /etc/, /lib and /usr/lib. relocations updated. /proc/self/maps. libc is almost always linked. printf, scanf, socket, atoi, malloc, free





attaching to gdb and/or a process is really useful. cyclic bytes can let you localize what ends up where in a buffer overflow for example cyclic_find
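What a cyclic pattern buys you, sketched in plain C. Note this is not pwntools' algorithm (that uses a de Bruijn sequence); it is the simpler `Aa0Aa1Aa2...` style of pattern, where every 4-byte window is unique for the first 20280 bytes. The function names are invented. You feed the pattern as input, read the 4 bytes that ended up in the crashed register from gdb, and look up their offset.

```c
#include <stddef.h>
#include <string.h>

/* Fill buf with the repeating upper/lower/digit pattern Aa0Aa1Aa2...
   No NUL terminator is written. */
static void pattern_create(char *buf, size_t len) {
    for (size_t i = 0; i < len; i++) {
        size_t triple = i / 3;      /* index of the 3-character group */
        switch (i % 3) {
        case 0:  buf[i] = (char)('A' + (triple / 260) % 26); break;
        case 1:  buf[i] = (char)('a' + (triple / 10) % 26);  break;
        default: buf[i] = (char)('0' + triple % 10);         break;
        }
    }
}

/* Offset of the first occurrence of the 4-byte needle in the pattern,
   or -1 if it does not occur. */
static long pattern_find(const char *buf, size_t len, const char *needle4) {
    for (size_t i = 0; i + 4 <= len; i++)
        if (memcmp(buf + i, needle4, 4) == 0)
            return (long)i;
    return -1;
}
```

If the bytes found in the saved return address sit at offset 264 of the pattern, then 264 is the amount of padding needed before the first gadget address.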

Examples from