Binary Analysis & CTF stuff
- Reversing
- Debuggers
- Code Search
- Emulation
- Exploits
- Elf stuff
- CTF
- What is Binary Analysis
- Misc
- nmap - Digital forensics - pwn.college
- shellcode
Reversing
Disassembly
disassembler strategies: linear sweep, shingled, recursive traversal. anti-disassembly https://github.com/AppleReer/Anti-Disassembly-On-Arm64
relative disassembler performance
capstone http://www.capstone-engine.org/ - disassembler. converse of keystone (the assembler). zydis xed distorm iced bddisasm yaxpeax
McSema - older trail of bits lifter. Uses llvm as IR remill lift to llvm bitcode anvill processing remill rellic makes C like code
BAP ANGR - VEX which is valgrind’s ir
gtirb ddisasm grammatech https://grammatech.github.io/gtirb/ https://github.com/GrammaTech/gtirb “GTIRB explicitly does NOT represent instructions or instruction semantics but does provide symbolic operand information and access to the bytes. “
Speculative disassembly https://ieeexplore.ieee.org/document/7745279 decode every offset. Refine blocks. Spedi, open source speculative disassembler https://github.com/abenkhadra/spedi Nucleus paper https://mistakenot.net/papers/eurosp-2017.pdf Compiler-Agnostic Function Detection in Binaries
superset disassembler kenneth https://personal.utdallas.edu/~hamlen/bauman18ndss.pdf Cifuentes thesis
probabilistic disassembly using probabilistic datalog? bap mc + datalog?
Formally Verified Lifting of C-Compiled x86-64 Binaries Scalable validation of binary lifters
- Linear
- Recursive
-
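The difference between linear sweep and recursive traversal can be sketched on a toy instruction set. Everything below is made up for illustration (the five-opcode ISA, the byte values, the function names); it is not how any real disassembler is implemented, only a minimal sketch of the two strategies.

```python
# Toy ISA (hypothetical): opcode -> (mnemonic, size in bytes).
# Two-byte instructions carry a one-byte operand (an absolute offset or an immediate).
ISA = {0x01: ("nop", 1), 0x02: ("jmp", 2), 0x03: ("ret", 1),
       0x04: ("loadi", 2), 0x05: ("jz", 2)}


def decode(code, off):
    """Return (name, size, operand), or None if the bytes at off do not decode."""
    entry = ISA.get(code[off])
    if entry is None or off + entry[1] > len(code):
        return None
    name, size = entry
    operand = code[off + 1] if size == 2 else None
    return name, size, operand


def linear_sweep(code):
    """Decode from offset 0 to the end, one instruction after another."""
    out, off = {}, 0
    while off < len(code):
        insn = decode(code, off)
        if insn is None:
            out[off] = ("(bad)", None)
            off += 1
            continue
        name, size, operand = insn
        out[off] = (name, operand)
        off += size
    return out


def recursive_traversal(code, entry=0):
    """Decode only what is reachable by following control flow from entry."""
    out, worklist = {}, [entry]
    while worklist:
        off = worklist.pop()
        while off < len(code) and off not in out:
            insn = decode(code, off)
            if insn is None:
                break
            name, size, operand = insn
            out[off] = (name, operand)
            if name == "jmp":          # unconditional: only the target is reachable
                worklist.append(operand)
                break
            if name == "ret":          # no successor known statically
                break
            if name == "jz":           # conditional: target and fall-through
                worklist.append(operand)
            off += size
    return dict(sorted(out.items()))


# jmp 3 ; <data byte 0x04> ; nop ; ret
CODE = bytes([0x02, 0x03, 0x04, 0x01, 0x03])

if __name__ == "__main__":
    print("linear   :", linear_sweep(CODE))
    print("recursive:", recursive_traversal(CODE))
```

The data byte at offset 2 happens to look like a `loadi` opcode, so the linear sweep swallows the real `nop` at offset 3 as an operand. The recursive traversal follows the jump and never decodes the data byte. The price is that it cannot see code reached only through indirect jumps.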
Delay slots are an annoyance. Some architectures allow an instruction to sit in the shadow of a jump instruction (placed right after it) and still execute before the jump takes effect. This makes sense from a microarchitectural perspective, but it is bizarre to disassemble.
https://rev.ng/ rev.ng paper osr - offset shift range. I've seen this called value set analysis? rev.ng thesis
Analyzing Memory Accesses in x86 Executables Reps and Balakrishnan
byteweight - bap neural network thing identifies function starts
Ghidra repackaging: lifting bits sleigh pypcode https://github.com/StarCrossPortal/sleighcraft https://github.com/black-binary/sleigh
An Empirical Study on ARM Disassembly Tools
Decompiler
https://x.com/mahal0z/status/1717600833037377613?s=20 https://www.zionbasque.com/files/publications/sailr_usenix24.pdf Ahoy SAILR! There is No Need to DREAM of C: A Compiler-Aware Structuring Algorithm for Binary Decompilation Graph schema matching? Smart methodology. Take codebase, decompile, compare number of gotos in original vs decompiled functions. Find hotspots. Binary search for passes responsible
DREAM Phoenix
FoxDec - Formally verified x86-64 decompilation
rellic - no more gotos
a comb for decompiled C https://rev.ng/downloads/asiaccs-2020-presentation.pdf
Cifuentes thesis kind of a ludicrous amount of info “Decompiler compiler” references BB91 BB92 bbl91 bbl93 bow91 bow93 A Compendium of Formal Techniques for Software Maintenance the redo project final report
Decompilation: The Enumeration of Types and Grammars From programs to object code and back again using logic programming: Compilation and decompilation The art of computer un-programming: Reverse engineering in Prolog Generating Decompilers
https://www2.cs.arizona.edu/~collberg/Teaching/620/1999/Handouts/rmeadows1.pdf
Polymorphic Type Inference for Machine Code
Static Single Assignment for Decompilation - Emmerik https://twitter.com/brightprogramer/status/1626841275801702400?s=20
Decompilers and beyond Ilfak Guilfanov, Hex-Rays SA, 2008
Type-Based Decompilation? (or Program Reconstruction via Type Reconstruction)- Alan Mycroft
Loop recovery
Getting structured control flow from CFG Stackify https://gitlab.inria.fr/why3/why3/-/merge_requests/453
havlak and tarjan testing reducibility with union find
A New Algorithm for Identifying Loops in Decompilation
dfs. label nodes. Intervals of labels are subsets of nodes. timestamp of first visit, timestamp of last visit. back edge, forward edge, cross edge
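The timestamp idea can be sketched as a textbook DFS edge classification. This is the standard algorithm, not the method of any of the papers above; the graph encoding (a dict of successor lists) and the function name are assumptions for the example.

```python
def classify_edges(graph, root):
    """Classify CFG edges using DFS discovery/finish timestamps.

    graph: dict mapping node -> list of successor nodes.
    Returns (discover, finish, kinds) where kinds maps (u, v) to
    "tree", "back", "forward" or "cross".
    """
    discover, finish, kinds = {}, {}, {}
    clock = 0

    def visit(u):
        nonlocal clock
        clock += 1
        discover[u] = clock                  # timestamp of first visit
        for v in graph.get(u, []):
            if v not in discover:
                kinds[(u, v)] = "tree"
                visit(v)
            elif v not in finish:
                kinds[(u, v)] = "back"       # v is an ancestor still on the DFS stack
            elif discover[u] < discover[v]:
                kinds[(u, v)] = "forward"    # v is an already finished descendant
            else:
                kinds[(u, v)] = "cross"      # v finished in an unrelated subtree
        clock += 1
        finish[u] = clock                    # timestamp of last visit

    visit(root)
    return discover, finish, kinds


# A while loop: entry -> head -> body -> head, head -> exit
LOOP = {"entry": ["head"], "head": ["body", "exit"], "body": ["head"], "exit": []}

if __name__ == "__main__":
    d, f, kinds = classify_edges(LOOP, "entry")
    for edge, kind in kinds.items():
        print(edge, kind)
```

A node v is a descendant of u exactly when the interval [discover[v], finish[v]] is nested inside [discover[u], finish[u]]. Back edges are the ones that close loops, and in a reducible CFG the target of each back edge is a loop header.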
https://zenodo.org/record/6727752#.YsxRw9rMI2w
Also… egraphs
webassembly is at the core of both of these Allllll things come around I wonder if webassembly would be a good universal disassembler ir someone should do that https://github.com/nforest/awesome-decompilation
Interactive
- IDA
- Ghidra
- Binary Ninja
- Cutter
- angr management
https://github.com/GrammaTech/gtirb-vscode
decompiler explorer Hmm. too bad it’s not a web service
Binary Ninja
SCC - shellcode compiler. Why is this a top level thing?
Binary View. Making a plugin - has remote debug to vs code https://docs.binary.ninja/dev/plugins.html
current_function
bv.functions
f.callers f.callees
Types
https://docs.binary.ninja/guide/type.html#boolean `bv.parse_type_string("uint64_t")` slow but convenient
Type.bool Type.char
#!/usr/bin/env python3
import binaryninja
with binaryninja.open_view("/bin/ls") as bv:
    print(f"Opening {bv.file.filename} which has {len(list(bv.functions))} functions")
IR Tower
Lifted IL
LLIL
HLIL
Ghidra
See ghidra notes
Angr
import angr #, monkeyhex
proj = angr.Project('/bin/true')
state = proj.factory.entry_state()
code = '''
int fact(int x){
int acc = 1;
while(x > 0){
acc *= x;
x--;
}
return acc;
}
'''
import tempfile
import subprocess
import angr #, monkeyhex
import os
with tempfile.NamedTemporaryFile(suffix=".c") as fp:
    with tempfile.TemporaryDirectory() as mydir:
        fp.write(code.encode())
        fp.flush()
        fp.seek(0)
        print(fp.readlines())
        print(fp.name)
        print(mydir)
        outfile = mydir + "/fact"
        print(outfile)
        # -c stops after assembling, so outfile is an unlinked object file
        print(subprocess.run(["gcc", "-g", "-c","-O1", "-o", outfile, fp.name], check=True))
        print(os.listdir(mydir))
        print(subprocess.run(["objdump", "-d", outfile], check=True))
        proj = angr.Project(outfile)
        print(dir(proj))
        print(proj.arch)
        print(proj.entry)
        print(proj.filename)
        print(proj.loader)
        block = proj.factory.block(proj.entry)
        block.pp()  # pp() prints the disassembly itself and returns None
        print(block)
        print(block.vex)
        print(block.instructions) # number of instructions
        print(dir(proj.analyses))
        state = proj.factory.entry_state()
        print(dir(state))
        print(proj.loader.find_symbol("fact"))
        #state = proj.factory.entry_state()
        print(state.step())
        print(dir(state.solver))
        print(state.solver.all_variables)
echo "
int fact(int x){
int acc = 1;
for (int i = 1; i <= x; i++){
acc *= i;
}
return acc;
}
" > /tmp/fact.c
gcc /tmp/fact.c -c -o /tmp/fact
echo "
int max(int x){
return x > 0 ? x : -x;
}
" > /tmp/max.c
gcc /tmp/max.c -c -o /tmp/max
import angr
p = angr.Project('/tmp/fact')
cfg = p.analyses.CFGFast()
print("This is the graph:", cfg.graph)
entry_func = cfg.kb.functions[p.entry]
print(entry_func.name)
blocks = list(entry_func.blocks)
print(blocks)
irsb = blocks[0].vex
print(dir(entry_func))
print(entry_func.block_addrs)
print(dir(blocks[0]))
print(entry_func.addr)
import angr
p = angr.Project('/tmp/max')
cfg = p.analyses.CFGFast()
print("This is the graph:", cfg.graph)
entry_func = cfg.kb.functions[p.entry]
print(entry_func.name)
blocks = list(entry_func.blocks)
print(blocks)
irsb = blocks[0].vex
from pyvex.stmt import *
from pyvex.expr import *
jmp_table = ["""
jump_table:
switch(state.gr[184]){
"""]
#for addr in addrs:
# jmp_table.append(f"case 0x{addr:x}: goto {label(addr)};")
#jmp_table.append("""
#default:
# assert(false); // unexpected indirect jump
#}
#""")
# VEX binop -> C operator
BINOPS = {
    "Iop_Sub64": "-",
    "Iop_Sub32": "-",
    "Iop_Add64": "+",
    "Iop_Add32": "+",
    "Iop_Shr64": ">>",
    "Iop_And64": "&",
    "Iop_Xor64": "^",  # was emitted as & by mistake
    "Iop_CmpLT32S": "<",
    "Iop_Mul32": "*",
}

def proc_binop(expr):
    #print(expr.op)
    # TODO: perhaps I need casts here to signed unsigned?
    assert expr.op in BINOPS, f"Unimplemented binop {expr.op}"
    return f"({proc_expr(expr.args[0])} {BINOPS[expr.op]} {proc_expr(expr.args[1])})"

def proc_unop(expr):
    #print(expr.op)
    if expr.op == "Iop_64to32":
        return f"((int32_t) {proc_expr(expr.args[0])})"
    elif expr.op == "Iop_32Uto64":
        # zero extension: go through uint32_t so the cast does not sign-extend
        return f"((int64_t)(uint32_t) {proc_expr(expr.args[0])})"
    elif expr.op == "Iop_1Uto64":
        return f"((int64_t) {proc_expr(expr.args[0])})"
    elif expr.op == "Iop_64to1":
        return f"((bool) {proc_expr(expr.args[0])})"
    else:
        assert False, f"Unimplemented unop {expr.op}"
def proc_expr(expr : IRExpr):
    #print(expr)
    if isinstance(expr, Binder):
        assert False
    elif isinstance(expr, VECRET):
        assert False
    elif isinstance(expr, GSPTR):
        assert False
    elif isinstance(expr, RdTmp):
        return str(expr)
    elif isinstance(expr, Get):
        assert expr.ty == "Ity_I64"
        return f"state->gr[{expr.offset}]"
    elif isinstance(expr, Qop):
        assert False
    elif isinstance(expr, Triop):
        assert False
    elif isinstance(expr, Binop):
        return proc_binop(expr)
    elif isinstance(expr, Unop):
        return proc_unop(expr)
    elif isinstance(expr, Load):
        #assert expr.ty == "Ity_I32" # TODO. We need to deal with these types and endianness. Yuck
        return f"state->mem[{expr.addr}]"
    elif isinstance(expr, Const):
        return str(expr)
    elif isinstance(expr, ITE):
        # parenthesized so the ternary survives being nested in a larger expression
        return f"({proc_expr(expr.cond)} ? {proc_expr(expr.iftrue)} : {proc_expr(expr.iffalse)})"
    elif isinstance(expr, CCall):
        assert False
    else:
        assert False

def label(addr):
    return f"L_0x{addr:x}"

def proc_type(ty):
    if ty == "Ity_I32":
        return "int32_t"
    elif ty == "Ity_I64":
        return "int64_t"
    else:
        assert False
def proc_stmt(stmt : IRStmt):
    #print(stmt.tag)
    if isinstance(stmt, NoOp):
        return "// NoOp"
    elif isinstance(stmt, IMark):
        # assert PC = expected pc here?
        # f"assert(state->gr[{}] == stmt.addr);"
        #return f"break; case 0x{stmt.addr:x}: //{label(stmt.addr)}:" # also print original assembly in comment
        return f"//{label(stmt.addr)}:"
    elif isinstance(stmt, AbiHint):
        return f"return; // {stmt}" # TODO: add return parameter? Or state is good. Is abihint always at returns?
    elif isinstance(stmt, Put):
        return f"state->gr[{stmt.offset}] = {proc_expr(stmt.data)};"
    elif isinstance(stmt, PutI):
        assert False
    elif isinstance(stmt, WrTmp):
        return f"t{stmt.tmp} = {proc_expr(stmt.data)};"
    elif isinstance(stmt, Store):
        return f"state->mem[{stmt.addr}] = {proc_expr(stmt.data)};"
    elif isinstance(stmt, CAS):
        assert False
    elif isinstance(stmt, LLSC):
        assert False
    elif isinstance(stmt, MBE):
        assert False
    elif isinstance(stmt, Dirty):
        assert False
    elif isinstance(stmt, Exit):
        print(stmt.jk)
        assert stmt.jk == "Ijk_Boring"
        return f"if({proc_expr(stmt.guard)}){{ state->gr[{stmt.offsIP}] = 0x{stmt.dst.value:x}; break; }}"
        # TODO deal with other jumpkind
        # What represents an indirect jump? goto jump_table;
        # elif stmt.jk == "Ijk_Ret" f"return;"
        # elif stmt.jk == Ijk_Call hmmm. f"{funname}()" maybe?
        # elif stmt.jk == "Ijk_Exit"
    elif isinstance(stmt, LoadG):
        assert False
    else:
        print("unrecognized IRStmt")
        assert False
output = []
tmps = set()
for block in blocks:
    #print(dir(block))
    #print(block.instructions)
    output.append("/*")
    output.append(str(block.disassembly))
    output.append("*/")
    output.append(f"case 0x{block.addr:x}:")
    for stmt in block.vex.statements:
        #stmt.pp()
        output.append(proc_stmt(stmt))
        if isinstance(stmt,WrTmp):
            tmps.add(stmt.tmp)
    # output.append("goto jump_table;")
    output.append("break;")
output.append("default: assert(0); // Unexpected PC value. Something has gone awry.")
#print("\n".join(output))
header = """
#include <stdint.h>
#include <assert.h>
#include <stdbool.h>
#define PUT(reg) reg
#define PC 184
typedef struct state_t
{
int64_t *mem;
int64_t *gr;
} state_t;
"""
with open("/tmp/decomp.c", "w") as file:
    file.write(header)
    file.write(f"void {entry_func.name}_decomp(state_t *state){{\n") # use entry_func.name?
    file.write("int " + ",".join([f"t{tmp}" for tmp in tmps]) + ";\n") # declare temps
    file.write(f"state->gr[PC] = 0x{entry_func.addr:x};\n") # initialize PC to entry point
    file.write("while(1){\n") # interpreter loop invariant on PC? Could make gas?
    file.write("switch(state->gr[PC]){")
    file.writelines([x + "\n" for x in output])
    file.write("}}}\n")
clang-format -i --style=google /tmp/decomp.c
cat /tmp/decomp.c
gcc /tmp/decomp.c -O2 -c -o /tmp/decomp.o -Wall -Wextra -Wcast-align -Wcast-qual -Wmissing-declarations
esbmc /tmp/decomp.c --function max_decomp #--goto-functions-only
Control flow encoding. Do I do separate jump table, go to jump table every time?
while(true){switch state.gr[pc]{ case: case: case: case default: } }
This is fairly conservative.
Big block encoding trusts fall through behavior.
Byte address and then cast mem pointers.
import angr
p = angr.Project('/tmp/fact')
cfg = p.analyses.CFGFast()
print("This is the graph:", cfg.graph)
entry_func = cfg.kb.functions[p.entry]
print(entry_func.name)
blocks = list(entry_func.blocks)
print(blocks)
irsb = blocks[0].vex
import pyvex

stmts = []
counter = 0

def fresh():
    # fresh virtual variable name
    global counter
    counter += 1
    return f"%v{counter}"

def print_expr(dst, expr):
    # flatten a VEX expression into three-address statements that write dst
    if isinstance(expr, pyvex.expr.Const):
        stmts.append({ "op": "const", "type": "int", "dest": dst, "value": expr.con.value })
    elif isinstance(expr, pyvex.expr.Binop):
        args = [fresh() for _ in range(2)]
        for a, e in zip(args, expr.args):
            print_expr(a, e)
        stmts.append({ "op": expr.op, "type": "int", "dest": dst, "args": args })
    elif isinstance(expr, pyvex.expr.RdTmp):
        stmts.append({ "op": "id", "type": "int", "dest": dst, "args": [f"%t{expr.tmp}"] })
    elif isinstance(expr, pyvex.expr.Get):
        # registers are named by their offset in the guest state
        stmts.append({ "op": "id", "type": "int", "dest": dst, "args": [f"%r{expr.offset}"] })
    else:
        print(expr)
        print(type(expr))
        assert False, "unrecognized expr"

for stmt in irsb.statements:
    stmt.pp()
    if isinstance(stmt, pyvex.stmt.Store):
        print(stmt.end)
        assert stmt.end == "Iend_LE"
        addr, data = fresh(), fresh()
        print_expr(addr, stmt.addr)
        print_expr(data, stmt.data)
        stmts.append({ "op": "store", "args": [addr, data] })
    elif isinstance(stmt, pyvex.stmt.Put):
        print_expr(f"%r{stmt.offset}", stmt.data)
    elif isinstance(stmt, pyvex.stmt.WrTmp):
        print_expr(f"%t{stmt.tmp}", stmt.data)
    elif isinstance(stmt, pyvex.stmt.IMark):
        pass
    else:
        print(stmt)
        print(type(stmt))
        assert False, "unrecognized stmt"
print(stmts)
Patching
See notes on patching
Binary reversing
https://corte.si/posts/visualisation/binvis/index.html hilbert curves for binary visualization. benford’s law. binwalk entropy visualization
https://www.usenix.org/system/files/sec22-burk.pdf Decomperson: How Humans Decompile and What We Can Learn From It
Debuggers
https://github.com/x64dbg/x64dbg https://rr-project.org/ windbg
Code Search
https://github.com/weggli-rs/weggli Joern codeql
Emulation
Icicle: A Re-designed Emulator for Grey-Box Firmware Fuzzing https://github.com/icicle-emu/icicle-emu semantics powered by sleigh.
Qemu
Qemu plugins Qemu has user land and system
echo '
#include <stdio.h>
int main(){
printf("hello world\n");
return 0;
}
' > /tmp/hello.c
gcc /tmp/hello.c -o /tmp/hello
qemu-x86_64 -strace -singlestep -d in_asm,cpu /tmp/hello
options https://www.qemu.org/docs/master/user/main.html
- strace
- trace
- plugin
- d logs
- singlestep (now one-insn-per-tb)
qemu-x86_64 --help
usage: qemu-x86_64 [options] program [arguments...]
Linux CPU emulator (compiled for x86_64 emulation)
Options and associated environment variables:
Argument Env-variable Description
-h print this help
-help
-g port QEMU_GDB wait gdb connection to 'port'
-L path QEMU_LD_PREFIX set the elf interpreter prefix to 'path'
-s size QEMU_STACK_SIZE set the stack size to 'size' bytes
-cpu model QEMU_CPU select CPU (-cpu help for list)
-E var=value QEMU_SET_ENV sets targets environment variable (see below)
-U var QEMU_UNSET_ENV unsets targets environment variable (see below)
-0 argv0 QEMU_ARGV0 forces target process argv[0] to be 'argv0'
-r uname QEMU_UNAME set qemu uname release string to 'uname'
-B address QEMU_GUEST_BASE set guest_base address to 'address'
-R size QEMU_RESERVED_VA reserve 'size' bytes for guest virtual address space
-d item[,...] QEMU_LOG enable logging of specified items (use '-d help' for a list of items)
-dfilter range[,...] QEMU_DFILTER filter logging based on address range
-D logfile QEMU_LOG_FILENAME write logs to 'logfile' (default stderr)
-p pagesize QEMU_PAGESIZE set the host page size to 'pagesize'
-singlestep QEMU_SINGLESTEP run in singlestep mode
-strace QEMU_STRACE log system calls
-seed QEMU_RAND_SEED Seed for pseudo-random number generator
-trace QEMU_TRACE [[enable=]<pattern>][,events=<file>][,file=<file>]
-plugin QEMU_PLUGIN [file=]<file>[,<argname>=<argvalue>]
-version QEMU_VERSION display version information and exit
Defaults:
QEMU_LD_PREFIX = /etc/qemu-binfmt/x86_64
QEMU_STACK_SIZE = 8388608 byte
You can use -E and -U options or the QEMU_SET_ENV and
QEMU_UNSET_ENV environment variables to set and unset
environment variables for the target process.
It is possible to provide several variables by separating them
by commas in getsubopt(3) style. Additionally it is possible to
provide the -E and -U options multiple times.
The following lines are equivalent:
-E var1=val2 -E var2=val2 -U LD_PRELOAD -U LD_DEBUG
-E var1=val2,var2=val2 -U LD_PRELOAD,LD_DEBUG
QEMU_SET_ENV=var1=val2,var2=val2 QEMU_UNSET_ENV=LD_PRELOAD,LD_DEBUG
Note that if you provide several changes to a single variable
the last change will stay in effect.
See <https://qemu.org/contribute/report-a-bug> for how to report bugs.
More information on the QEMU project at <https://qemu.org>.
qemu-x86_64 -d help
Log items (comma separated):
out_asm show generated host assembly code for each compiled TB
in_asm show target assembly code for each compiled TB
op show micro ops for each compiled TB
op_opt show micro ops after optimization
op_ind show micro ops before indirect lowering
int show interrupts/exceptions in short format
exec show trace before each executed TB (lots of logs)
cpu show CPU registers before entering a TB (lots of logs)
fpu include FPU registers in the 'cpu' logging
mmu log MMU-related activities
pcall x86 only: show protected mode far calls/returns/exceptions
cpu_reset show CPU state before CPU resets
unimp log unimplemented functionality
guest_errors log when the guest OS does something invalid (eg accessing a
non-existent register)
page dump pages at beginning of user mode emulation
nochain do not chain compiled TBs so that "exec" and "cpu" show
complete traces
plugin output from TCG plugins
strace log every user-mode syscall, its input, and its result
trace:PATTERN enable trace events
Use "-d trace:help" to get a list of trace events.
Fuzzing
Mayhem Fuzzy-sat running smt queries through a fuzzer Angora SLF eclipser
fuzzing challenges and reflection
oss-fuzz
rode0day rolling fuzzing competition
Greybox
whitebox
- klee
- sage
Qsym hybrid fuzzing. concolic execution.
syzkaller kernel fuzzer go-fuzz fuzzilli winafl
Fuzzers compile in extra information to give coverage guidance
Fuzzers use a corpus of input
Using fuzzer to solve csp. Write checker. Fuzz it. It’s randomized search
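The corpus and coverage-guidance ideas can be sketched with a toy fuzzer. This is only an illustration under stated assumptions: the `target` function reports its own branch coverage by hand (a real fuzzer such as AFL gets this from compiled-in instrumentation), and the mutation strategy and all names are made up for the example.

```python
import random


def target(data: bytes) -> set:
    """Toy program under test. Returns the set of branch ids it executed."""
    cov = {"entry"}
    if len(data) > 0 and data[0] == ord("F"):
        cov.add("F")
        if len(data) > 1 and data[1] == ord("U"):
            cov.add("FU")
            if len(data) > 2 and data[2] == ord("Z"):
                cov.add("FUZ")   # imagine the bug lives here
    return cov


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte and sometimes append a random byte."""
    buf = bytearray(data) or bytearray(b"\x00")
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    if rng.random() < 0.3:
        buf.append(rng.randrange(256))
    return bytes(buf)


def fuzz(seeds, iterations, rng):
    """Keep a mutated input in the corpus only if it reaches new coverage."""
    corpus = list(seeds)
    seen = set()
    for seed in corpus:
        seen |= target(seed)
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        cov = target(candidate)
        if not cov <= seen:      # new branch reached: worth keeping
            seen |= cov
            corpus.append(candidate)
    return corpus, seen


if __name__ == "__main__":
    corpus, seen = fuzz([b"AAA"], 20000, random.Random(0))
    print(len(corpus), "inputs in corpus, coverage:", sorted(seen))
```

Guessing "FUZ" blind costs about 256^3 tries. With coverage feedback each correct byte is kept once found, so the search is roughly three independent 1-in-256 guesses. The same loop works as a randomized search for any checker you can write.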
Fuzzgym makes a lot of sense to put neural heuristics in there
https://www.youtube.com/watch?v=sjLFf9q2NRc&ab_channel=FuzzingLabs-PatrickVentuzelo afl++ qemu libfuzzer vs afl vs honggfuzz corpus grammar based fuzzing, differential fuzzing
https://github.com/airbus-cyber/ghidralligator ghidra for fuzzing
AFL
AFL. AFL++ fork of afl
compile using afl-clang-fast++ or use qemu mode.
https://github.com/mykter/afl-training afl fuzzing training
https://afl-1.readthedocs.io/en/latest/user_guide.html
echo "
#include <assert.h>
#include <stdio.h>
int main(){
    int x = getchar(); // afl-fuzz feeds the test case on stdin
    if(x > 0){
        assert(0);
    }
    return 42;
}" > /tmp/bug.c
afl-gcc /tmp/bug.c -o /tmp/bug
afl-fuzz -i /tmp/corpus -o /tmp/out /tmp/bug
AFL qemu deferred forkserver https://github.com/AFLplusplus/AFLplusplus/blob/stable/instrumentation/README.persistent_mode.md
Symbolic Execution
Maat - Trail of Bits, using ghidra
https://github.com/eurecom-s3/symcc symqemu
unicorn - ripped out the heart of qemu and made it programmatically accessible. Based on an old version of qemu though
KLEE
primus - bap’s emulator framework
panda https://github.com/panda-re/panda - built on qemu. record and replay executions
Vulnerabilities
CWE - Common Weakness Enumeration https://cwe.mitre.org/ MITRE ATT&CK https://attack.mitre.org/
integer overflow https://cwe.mitre.org/data/definitions/190.html
sophos - comprehensive exploit prevention 2018
Current State of Exploit Dev 2020
Windows
microsoft exploits and exploit kits
Mitigations
Control Flow integrity is a broad term for many of these CONFIRM: Evaluating Compatibility and Relevance of Control-flow Integrity Protections for Modern Software
DEP - data execution prevention executable space protection This says DEP is Windows terminology? NX bit
shadow stack
control flow guard - windows reverse flow guard extreme flow guard kernel data protection
stack canary https://www.keil.com/support/man/docs/armclang_ref/armclang_ref_cjh1548250046139.htm -fstack-protector. Guard variable put on stack SSP stack smashing protection. Stackguard, Propolice. https://embeddedartistry.com/blog/2020/05/18/implementing-stack-smashing-protection-for-microcontrollers-and-embedded-artistrys-libc/ Buffer overflow protection
ASLR, ASLP. Address Space Layout Randomization: libraries are loaded at a different location on each run. This makes code reuse in an exploit more difficult.
Fat pointers
endbr intel control flow enforcement technology (CET). Valid locations for indirect jumps.
ASLR - Addresses are randomized. cat /proc/self/maps to look at what actually got loaded. Also ldd shows where libraries get loaded in memory. Stack canaries - set once per binary run, so with forking you can brute force them or maybe leak them?
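To see the randomization yourself, you can parse the text format of /proc/self/maps and compare library base addresses between runs. A minimal sketch: the helper names and the sample mapping text are made up for illustration, and the field layout assumed is the usual "range perms offset dev inode path".

```python
from collections import namedtuple

Region = namedtuple("Region", "start end perms path")

# Hypothetical sample in the /proc/<pid>/maps text format.
SAMPLE = """\
55d0c8a00000-55d0c8a01000 r-xp 00000000 08:01 131 /tmp/hello
7f3a1c000000-7f3a1c1e7000 r-xp 00000000 08:01 2345 /lib/x86_64-linux-gnu/libc-2.31.so
7ffd5e3a0000-7ffd5e3c1000 rw-p 00000000 00:00 0 [stack]
7f3a1c400000-7f3a1c401000 rw-p 00000000 00:00 0
"""


def parse_maps(text):
    """Parse maps-style text into a list of Region tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        regions.append(Region(start, end, fields[1], path))
    return regions


def base_of(regions, name):
    """Lowest start address of any mapping whose path contains name, or None."""
    return min((r.start for r in regions if name in r.path), default=None)


if __name__ == "__main__":
    try:
        with open("/proc/self/maps") as f:   # Linux only
            text = f.read()
    except OSError:
        text = SAMPLE
    base = base_of(parse_maps(text), "libc")
    print("libc base:", hex(base) if base is not None else "not mapped")
```

Running it twice with ASLR enabled should print two different bases on most Linux systems. The low 12 bits stay zero because mappings are page aligned.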
checksec tells you about which things are enabled. https://opensource.com/article/21/6/linux-checksec which also has a rundown of the different things and how you could check them manually. Can output into xml, json, csv
gcc options: -no-pie, -pie, -fpie, -fno-stack-protector, -fstack-protector-all. -z execstack makes the stack executable
RELRO - relocation read only. GOT table becomes read only. Prevents relocation attacks
binary diversification - compile differently every time, so code reuse becomes way harder. Diversification makes many versions of a binary to make code reuse attacks harder. disunison
Exploits
Buffer Overflows
buffer overflow When a buffer overflow occurs you are writing to memory that possibly had a different purpose. Maybe other stack variables, maybe return address pointers, maybe over heap metadata.
Sanitization of user input Off by one errors String termination
Primitives
AWP Arbitrary write primitive CWE-123: Write-what-where Condition ARP arbitrary read primitive
Return Oriented Programming (ROP)
ropgadget ropium supports semantics queries. hmm in cpp. V impressive.
return to libc libc is very common and you can weave together libc calls. “Solar Designer” solar designer 1997
smashing the stack for fun and profit - stacks are no longer executable
https://acmccs.github.io/papers/geometry-ccs07.pdf geometry of innocent flesh on the bone. ROP
https://github.com/sashs/Ropper
pop_gadget ; value ; nextgadget loads from stack into register
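The `pop_gadget ; value ; nextgadget` layout can be sketched as plain byte packing. All addresses and the padding length below are hypothetical placeholders for a 64-bit little-endian target; in practice they come from a tool such as ropper or ROPgadget and from measuring the offset to the saved return address.

```python
import struct

# Hypothetical addresses, for illustration only.
POP_RDI_RET = 0x401234   # gadget: pop rdi ; ret
BIN_SH_STR = 0x404040    # address of a "/bin/sh" string
SYSTEM_FN = 0x401050     # address of system()
OFFSET_TO_RET = 72       # bytes from buffer start to the saved return address


def p64(value):
    """Pack an integer as a little-endian 64-bit word."""
    return struct.pack("<Q", value)


def build_chain(offset, words):
    """Padding up to the return address, then one 8-byte stack slot per word."""
    return b"A" * offset + b"".join(p64(w) for w in words)


# ret jumps to the gadget, the gadget pops BIN_SH_STR into rdi,
# and the gadget's own ret jumps to SYSTEM_FN.
PAYLOAD = build_chain(OFFSET_TO_RET, [POP_RDI_RET, BIN_SH_STR, SYSTEM_FN])

if __name__ == "__main__":
    print(PAYLOAD.hex())
```

Each gadget ends in `ret`, which pops the next stack slot into the instruction pointer, so the stack acts as the program and the stack pointer as the program counter.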
pure buffer overflow from command line:
#include <stdio.h>
#include <string.h>
/* deliberately vulnerable: no bounds check on argv[1] */
int main(int argc, char *argv[])
{
    char buf[256];
    memcpy(buf, argv[1], strlen(argv[1]));
    printf(buf);
}
rop prevention by binary rewriting ropguard
stack pivoting moving over to a different stack.
ret2libc ret2dlresolve ret2csu ret2plt
https://medium.com/cyber-unbound/buffer-overflows-ret2libc-ret2plt-and-rop-e2695c103c4c
JOP COP
jop rocket. blackhat talk
https://i.blackhat.com/asia-21/Thursday-Handouts/as-21-Brizendine-Babcock-Prebuilt-Jop-Chains-With-The-Jop-Rocket-wp.pdf
https://github.com/Bw3ll/JOP_ROCKET
Data Oriented Programming (DOP)
Heap
https://c4ebt.github.io/2021/01/22/House-of-Rust.html house of rust
If you can overwrite a struct that contains a pointer, you can use this to obtain reads or writes when that pointer is read or written. If the struct contains a code pointer, you get control flow execution.
Heap layout problem Heap layout manipulation
Metadata
Double free Use after free
http://phrack.org/issues/61/6.html advanced doug lea malloc hacking
https://milianw.de/blog/heaptrack-a-heap-memory-profiler-for-linux.html valgrind massif perf-mem, valgrind massif, and heaptrack
https://heap-exploitation.dhavalkapil.com/ https://github.com/DhavalKapil/heap-exploitation
advanced doug lea malloc - phrack post
glibc.
ldd /bin/ls
libc.so.6 - symbolic link probably
glibc 2.27
libc-2.31.so actually
pie
You can run it? /lib/x86_64-linux-gnu/libc.so.6
malloc chunks of memory new/delete make_unique
pwndbg vis
lets you see allocated chunks. How does it do it?
vmmap also shows memory regions classified
top chunk. I think the top chunk is resized to hand out new memory.
Metadata - size field. prev_in_use flag.
Allocator hands out in discrete sizes, not arbitrarily flexible
Playing for K(H)eaps: Understanding and Improving Linux Kernel Exploit Reliability defragmentation
slake
libheap examine glibc heap in gdb. seems like there is a python model of heap in here.
https://ir0nstone.gitbook.io/notes/types/heap/bins
chunks - has prev size, size, info bits. Free chunks have pointers in content top chunk - large piece of memory new chunks are carved out of last remainder chunk tcache fastbin - last in first out. a stacklike structure
large bins - varying size. chunks put in order
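The size field and its info bits can be decoded by hand. A small sketch assuming 64-bit glibc (8-byte size field, 16-byte alignment, 32-byte minimum chunk); the function names are made up, and the rounding mirrors glibc's request2size macro as I understand it.

```python
# Low three bits of a glibc malloc chunk's size field.
PREV_INUSE = 0x1       # previous adjacent chunk is in use
IS_MMAPPED = 0x2       # chunk was obtained directly via mmap
NON_MAIN_ARENA = 0x4   # chunk belongs to a non-main arena

# Assumptions: 64-bit glibc.
SIZE_SZ = 8
MALLOC_ALIGN_MASK = 15
MINSIZE = 32


def decode_size_field(raw):
    """Split a raw size field into the chunk size and its flag bits."""
    return {
        "size": raw & ~0x7,
        "prev_inuse": bool(raw & PREV_INUSE),
        "is_mmapped": bool(raw & IS_MMAPPED),
        "non_main_arena": bool(raw & NON_MAIN_ARENA),
    }


def request2size(req):
    """Chunk size that a malloc(req) call ends up using."""
    padded = req + SIZE_SZ + MALLOC_ALIGN_MASK
    if padded < MINSIZE:
        return MINSIZE
    return padded & ~MALLOC_ALIGN_MASK


if __name__ == "__main__":
    for req in (0, 24, 25, 40, 100):
        print(f"malloc({req}) -> chunk size {request2size(req):#x}")
    print(decode_size_field(0x21))
```

This is why a size field such as 0x21 shows up so often in `vis` output: a 0x20-byte chunk whose predecessor is in use. The rounding is also why the allocator hands out sizes in discrete steps.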
printf(“%p”) is your friend
free lists: when we call free, memory goes to a fastbin. Fastbins hold free chunks of a specific size. set context-sections code. b main. vis to visualize heap. fastbins command. sizes 0x20 - 0x80. arenas: main arena lives in the glibc data section. pwndbg arena
fd forward pointer
House of Force
fastbin dup
arbitrary write double free fastbin will return twice. frame command to context code
immediate double free is protected
free_hook
find_fake_fast one_gadget - constraints https://github.com/david942j/one_gadget
malloc_alloc
unsafe unlink
n vis
unsorted bin doubly linked circular free unsorted chunks have forward and backward pointers. new free added to head. malloced from tail
adjacent chunks get consolidated. prev in use flags. unlinking: chunks being removed from doubly linked list
2000 solar designer voodoo malloc tricks
Type Confusion
Kernel
double free ps4 https://github.com/Cryptogenic/Exploit-Writeups/blob/master/FreeBSD/PS4%205.05%20BPF%20Double%20Free%20Kernel%20Exploit%20Writeup.md
https://twitter.com/sirdarckcat/status/1584846038866989056?s=20&t=udFq9u7zLY-5-Ae6VrdqeQ Joy of exploiting the kernel http://slides.kernel.kitchen
Browser
https://twitter.com/5aelo?lang=en Attacking JavaScript Engines: A case study of JavaScriptCore and CVE-2016-4622
How I started chasing speculative type confusion bugs in the kernel and ended up with ‘real’ ones
Return to sender Detecting kernel exploits with eBPF https://github.com/Gui774ume/krie
Automated Exploit Generation (AEG)
usenix security. heaphopper: angr symbolic analysis for heap exploits? archeap. maze: toward heap feng shui. backward search from heaphopper. teerex: discovery of memory corruption vulnerabilities [symcc]
Elf stuff
See linker elfmaster elf reverse
example interesting elf files. overlaying headers. smallest that doesn’t violate spec
binary golf workshop https://codegolf.stackexchange.com/ size coding dead bytes libgolf UNIX ELF Parasites and virus - Silvio Cesare Elf Binary Mangling Pt. 4: Limit Break
CTF
SUID https://gtfobins.github.io/ GTFOBins is a curated list of Unix binaries that can be used to bypass local security restrictions in misconfigured systems.
Misconfiguration https://osquery.readthedocs.io/en/stable/
commando vm
yolo space hacker. steam game of ctf stuff exploit_me very vulnerable example applications pwntools
pwndbg
cyclic_gen
What is Binary Analysis
Dynamic - It feels like you’re running the binary in some sense. Maybe on an emulated environment Static - It feels like you’re not running the binary
Fuzzing is definitely dynamic. Dataflow analysis on a CFG is static. There are grey areas. Symbolic execution starts to feel like a grey area. I would consider it to be largely dynamic, but you are executing in a rather odd way.
Trying to understand a binary Why?
- Finding vulnerabilities for defense or offense
- buffer overflows
- double frees
- use after frees
- memory leaks - just bad performance
- info leaks - bad security
- Verification - Did your compiler produce a thing that does what your code said?
- Reversing/Cracking closed source software.
- Patching and Code injection. Finding Bugs for use in speed runs. Game Genie.
- Auditing
- Aids for manual RE work. RE is useful because things may be undocumented intentionally or otherwise. You want to reuse a chip, or turn on secret functionality, or reverse a protocol.
- Discovery of patent violation or GPL violations
- Comparing programs. Discovering what has been patched.
I don’t want my information stolen or held ransom. I don’t want people peeping in on my conversations. I don’t want my computer wrecked. These all involve malicious actors. We also don’t want our planes and rockets crashing. This does not require maliciousness on anyone’s part per se.
- Symbol recovery
- Disassembly
- CFG recovery
- Taint tracking
- symbolic execution
https://github.com/analysis-tools-dev/dynamic-analysis A list of tools https://analysis-tools.dev/ CSE597: Binary-level program analysis Spring 19 Gang Tan
Program Analysis
What’s the difference? Binaries are less structured than what you’ll typically find talked about in program analysis literature.
Binaries are tough because we have tossed away or the coupling has become very loose between high level intent and constructs and what is there.
How are binaries made
C preprocessor -> Maintains file number information isn’t that interesting
C compiler -> assembly. You can ask for this assembly with -S
. You can also
Or more cut up
C -> IR
IR -> MIR (what does gcc do? RTL right? )
MIR -> Asm
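The stages can be run one at a time with gcc's -E, -S and -c flags. A small sketch that builds the command lines; the function names and the `.i`/`.s`/`.o` file naming are assumptions for the example, and the runner only executes if a `gcc` binary is actually on the PATH.

```python
import shutil
import subprocess


def pipeline_commands(src, cc="gcc"):
    """Command lines for preprocess, compile, assemble, link of one C file."""
    stem = src[:-2] if src.endswith(".c") else src
    return [
        [cc, "-E", src, "-o", stem + ".i"],          # preprocess only
        [cc, "-S", stem + ".i", "-o", stem + ".s"],  # compile to assembly
        [cc, "-c", stem + ".s", "-o", stem + ".o"],  # assemble to an object file
        [cc, stem + ".o", "-o", stem],               # link into an executable
    ]


def run_pipeline(src, cc="gcc"):
    """Run each stage in order. Returns False if the compiler is missing."""
    if shutil.which(cc) is None:
        return False
    for cmd in pipeline_commands(src, cc):
        subprocess.run(cmd, check=True)
    return True


if __name__ == "__main__":
    for cmd in pipeline_commands("/tmp/hello.c"):
        print(" ".join(cmd))
```

Looking at the `.i` and `.s` files after each stage is a cheap way to see how much high level structure has already been thrown away before the assembler ever runs.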
Misc
https://twitter.com/33y0re/status/1528719776142475264?s=20&t=vcTgXMu6ZeZRjj7LiSMcFg kernel exploitation, browser exploitation blog posts
- Arbitrary code guard
- Code integrity guard
- hypervisor protected code integrity (HVCI) “the acg of kernel mode”
- virtualization based security (VBS). credential guard
- local privilege escalation (LPE)
People
- elfmaster ryan oneill
GDB
https://github.com/HyperDbg/HyperDbg
- pwndbg
- heap commands, for examining heap structure
- gef can track malloc and free. That makes sense
cfi directives - call frame information
joern.io https://github.com/RUB-SysSec/EvilCoder automatic bug insertion using joern phaser slither
Hiding instructions in instructions https://lucris.lub.lu.se/ws/portalfiles/portal/78489284/nop_obfs.pdf
Thomas stars https://github.com/bsoddreams?tab=stars
SGX enclaves
obfuscation snapchat ollvm vmprotect https://github.com/void-stack/VMUnprotect opaque predicates - one branch always taken
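A tiny opaque predicate, as a sketch of the idea rather than of any particular obfuscator: the product of two consecutive integers is always even, so the condition below is always true, yet a naive static analysis has to consider both branches. The function names are made up.

```python
def opaque_true(x: int) -> bool:
    """Always True: x and x + 1 are consecutive, so one of them is even."""
    return (x * (x + 1)) % 2 == 0


def obfuscated_abs(x: int) -> int:
    """abs(x) hidden behind an opaque predicate plus a junk branch."""
    if opaque_true(x):
        return x if x >= 0 else -x
    # Dead code: never executed, present only to confuse analysis.
    return (x ^ 0x5A5A) + 1337


if __name__ == "__main__":
    print([obfuscated_abs(v) for v in (-3, 0, 7)])
```

Deobfuscation then amounts to proving the predicate constant, which is a natural job for an SMT solver or symbolic execution.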
chris domas https://github.com/xoreaxeaxeax/movfuscator tom 7
firmadyne emulating and analyzing firmware
burp suite idor - autorize
bloodhound
shellcode encoding and decoding - sometimes you need to avoid things like \0 termination. https://www.ired.team/offensive-security/code-injection-process-injection/writing-custom-shellcode-encoders-and-decoders Shellcode generators. What do they do? shellcode database
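The simplest encoder of this kind is a single-byte XOR: pick a key such that neither the key nor any encoded byte is a forbidden value, and prepend a small decoder stub that XORs the bytes back at run time. The sketch below covers only the key search and the round trip, not the stub; the function names and the sample bytes are made up.

```python
def find_xor_key(payload: bytes, bad: bytes = b"\x00"):
    """Return (key, encoded) where neither the key nor the encoded bytes
    contain any byte from bad. Raises ValueError if no single-byte key works."""
    for key in range(1, 256):
        if key in bad:
            continue
        encoded = bytes(b ^ key for b in payload)
        if not any(b in bad for b in encoded):
            return key, encoded
    raise ValueError("no single-byte XOR key avoids the bad bytes")


def xor_decode(encoded: bytes, key: int) -> bytes:
    """What the decoder stub would do at run time."""
    return bytes(b ^ key for b in encoded)


if __name__ == "__main__":
    # Arbitrary sample bytes containing a NUL, standing in for real shellcode.
    sample = bytes([0x48, 0x31, 0x00, 0x01, 0xFF])
    key, encoded = find_xor_key(sample)
    print(f"key={key:#x} encoded={encoded.hex()}")
```

A byte equal to the key encodes to zero, which is why the search has to try keys until none of the payload bytes collide. If the payload uses every value from 1 to 255, a single-byte key cannot avoid NUL and a multi-byte scheme is needed.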
google dorking Like using google with special commands? Why “dork”? shodan
nmap
-A -T4. OS detection. nmap nse - nmap scripting engine. There is a folder of scripts
p0f - passive sniffing. fingerprinting
malware reversing class live overflow youtube exploit education rop emporium linux exploitation course yara - patterns to recognize malware. Byte level patterns? Sigma snort
SIEM. IDS - intrusion detection systems https://en.wikipedia.org/wiki/Intrusion_detection_system
shellcode encoder/decoder/generator https://www.msreverseengineering.com/blog/2017/7/15/the-synesthesia-shellcode-generator-code-release-and-future-directions synesthesia
FLIRT https://github.com/avast/retdec
https://github.com/grimm-co/NotQuite0DayFriday exploit examples
Gray Hat Hacking The Shellcoder’s handbook Attacking network Protocols Implementing Effective Code Review
https://objective-see.com/blog/blog_0x64.html
Hacking: http://langsec.org/papers/Bratus.pdf sergey weird machine paper
https://github.com/sashs/filebytes
blackhat defcon bluehat ccc https://en.wikipedia.org/wiki/Security_BSides bsides ctf project zero kaspersky blog https://usa.kaspersky.com/blog/ spectre/meltdown https://www.youtube.com/watch?v=b7urNgLPJiQ&ab_channel=PinkDraconian
return oriented programming sounds like my backwards pass. Huh.
Digital forensics
- Volatility https://www.volatilityfoundation.org/
- wireshark
- sleuth kit?
radare2, a binary analysis thingo. rax2 is useful for conversion of hex
binary ninja
ghidra
IDA
RSACTFTool
factordb
manticore
Maybe we should get a docker of all sorts of tools. Kali Linux? https://github.com/zardus/ctf-tools
klee, afl, other fuzzers? valgrind
cwe-checker
shellcode
ROP
https://quipqiup.com/ - solve substitution ciphers
https://github.com/openwall/john john the ripper. Brute force password cracker
ropper
Best CTFs. I probably don’t want the most prestigious ones? They’ll be too high level? I want the simple stuff
https://ctf101.org/ - check out the heap exploitation github thing
pwntools
metasploit, pacu - aws, cobalt strike
and the pwn category of ctf
ROP JOP SROP BOP - block oriented
return 2 libc - a subset of rop?
pwn.college
ryan chapman syscall
https://github.com/revng/revng
privilege escalation - getuid effective id.. Inherit user and group from parent process. switching to user resets the setuid bit. sticky bits id command
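A small sketch for poking at these ids and permission bits from Python, using only `os` and `stat`. The helper names are my own. On a setuid-root binary the effective uid would differ from the real uid, while in a plain interpreter session they normally match.

```python
import os
import stat


def describe_ids() -> dict:
    """Real vs effective ids of the current process (inherited from the parent by default)."""
    return {
        "ruid": os.getuid(),
        "euid": os.geteuid(),
        "rgid": os.getgid(),
        "egid": os.getegid(),
    }


def is_setuid(mode: int) -> bool:
    """True if the setuid bit is set in a st_mode value."""
    return bool(mode & stat.S_ISUID)


def has_sticky_bit(mode: int) -> bool:
    """True if the sticky bit is set (as on /tmp)."""
    return bool(mode & stat.S_ISVTX)


if __name__ == "__main__":
    print(describe_ids())
    print(is_setuid(0o4755), has_sticky_bit(0o1777))
```

For a real file, pass `os.stat(path).st_mode` to the two bit checks.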
shellcode - binary code that launches a shell via the system call execve(“/bin/sh”, NULL, NULL) - the other two are args and env params
intel vs at&t syntax. Load up addresses and constants in the binary with .string. gcc -static -nostdlib, then objcopy --dump-section .text=outfile to extract the raw bytes. Exiting cleanly is smart. Helps know what is screwing up. ldd
Trying out shellcode mmap. mprotect? read() deref function pointer
gdb: x for eXamine, e.g. x/gx $rsi. x/5i $rip disassembles the next 5 instructions. break *0x040404 breaks at an address. n next, s step, ni and si are the single-instruction versions
strace is useful first debugging
Intro
system calls set rax to syscall number. call syscall instruction https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/ man yada strace
- fork
- execve
- read
- write
- wait
- brk - program brk. change size of data segment. sbrk by increments. sbrk(0) returns current address of break
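A sketch of how several of these calls fit together, assuming a Linux or other Unix system with `/bin/sh` present. Python's `os` module wraps `fork`, `execve`, `read` and `wait` fairly directly, and `run_child` is a hypothetical helper name.

```python
import os


def run_child(argv):
    """fork, execve argv in the child, read its stdout through a pipe, then wait for it."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: point stdout at the pipe, then replace this process image.
        try:
            os.close(r)
            os.dup2(w, 1)
            os.execve(argv[0], argv, {})
        finally:
            os._exit(127)  # only reached if execve failed
    # Parent: close the write end so read() sees EOF once the child exits.
    os.close(w)
    chunks = []
    while True:
        data = os.read(r, 4096)
        if not data:
            break
        chunks.append(data)
    os.close(r)
    _, status = os.waitpid(pid, 0)
    return b"".join(chunks), os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(run_child(["/bin/sh", "-c", "echo hi"]))
```

Running the script under `strace -f` shows the underlying syscalls in order, which is a decent way to connect the list above to what a process actually does.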
stack. rbp, rsp. The stack grows down, toward decreasing addresses. rsp + 0x8 is on the stack, rbp - 8 is on the stack. Most systems are little endian. Calling conventions: args in rdi rsi rdx rcx r8 r9, return in rax. rbx rbp r12 r13 r14 r15 are callee saved, so guaranteed not smashed
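A tiny sketch of the two facts above: little-endian byte order for 64-bit values and the System V register order for integer arguments. The helpers `p64`/`u64` here are local stand-ins written with `struct` (pwntools ships functions with the same names), and `assign_args` is a made-up illustration that ignores floating point arguments.

```python
import struct

SYSV_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")


def p64(value: int) -> bytes:
    """Pack an integer as 8 little-endian bytes, as it would sit in memory."""
    return struct.pack("<Q", value)


def u64(data: bytes) -> int:
    """Unpack 8 little-endian bytes back into an integer."""
    return struct.unpack("<Q", data)[0]


def assign_args(args):
    """Map integer arguments to registers; anything past the sixth goes on the stack."""
    regs = dict(zip(SYSV_ARG_REGS, args))
    stack = list(args[len(SYSV_ARG_REGS):])
    return regs, stack


if __name__ == "__main__":
    print(p64(0x401136).hex())
    print(assign_args([1, 2, 3, 4, 5, 6, 7, 8]))
```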
http://ref.x86asm.net/coder64.html opcode listing https://github.com/yrp604/rappel - assembly repl https://github.com/zardus/ctf-tools
binary files
file - tells info about file
elf - interpreter,
- sections - text, plt/got to resolve and dispatch library calls, data for preinitialized data, rodata for global read-only data, bss for uninitialized data. sections are not required to run a binary
- symbols -
- segments - where to load
readelf, objdump, nm - reads symbols, patchelf, objcopy, strip, kaitai struct https://www.intezer.com/blog/malware-analysis/executable-linkable-format-101-part-2-symbols/
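A sketch of what `readelf -h` reads, assuming a 64-bit little-endian ELF and using only `struct`. The function name and the returned dict keys are my own choices; the field layout follows the standard ELF64 header.

```python
import struct

# e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff, e_shoff,
# e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
ELF_TYPES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}


def parse_elf64_header(data: bytes) -> dict:
    """Decode the first 64 bytes of a 64-bit little-endian ELF file."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("only ELFCLASS64 + little endian handled in this sketch")
    (_ident, e_type, e_machine, _version, e_entry, e_phoff, e_shoff,
     _flags, _ehsize, _phentsize, e_phnum, _shentsize, e_shnum,
     _shstrndx) = ELF64_HEADER.unpack_from(data)
    return {
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": e_machine,  # 62 is EM_X86_64
        "entry": e_entry,
        "phoff": e_phoff,      # program headers describe segments (what to load)
        "phnum": e_phnum,
        "shoff": e_shoff,      # section headers; may be 0 in a stripped-down binary
        "shnum": e_shnum,
    }


if __name__ == "__main__":
    with open("/bin/sh", "rb") as f:  # assumes /bin/sh is a 64-bit ELF
        print(parse_elf64_header(f.read(64)))
```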
process loading
https://0xax.gitbooks.io/linux-insides/content/SysCall/linux-syscall-4.html what to load. The kernel looks for #! or the elf magic. /proc/sys/fs/binfmt_misc can register other magic strings to match. Hands off to the elf-defined interpreter if dynamically linked.
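A rough model of that first decision, assuming only the two built-in cases (shebang and ELF magic) and ignoring binfmt_misc. `classify_executable` is a hypothetical name; the kernel does this in its binfmt handlers, not in userspace.

```python
def classify_executable(head: bytes):
    """Guess how the kernel would treat a file from its first bytes."""
    if head.startswith(b"\x7fELF"):
        return ("elf", None)
    if head.startswith(b"#!"):
        # Everything up to the newline names the interpreter (plus an optional argument).
        line = head[2:].split(b"\n", 1)[0].strip()
        return ("script", line.decode(errors="replace"))
    return ("unknown", None)


if __name__ == "__main__":
    print(classify_executable(b"#!/bin/sh\necho hi\n"))
    print(classify_executable(b"\x7fELF\x02\x01\x01"))
```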
Then it’s onto ld probably. Library search order: LD_PRELOAD, LD_LIBRARY_PATH, DT_RUNPATH in the binary file, system wide /etc/ld.so.conf, then /lib and /usr/lib. Relocations get updated. /proc/self/maps https://gist.github.com/CMCDragonkai/10ab53654b2aa6ce55c11cfc5b2432a4 libc is almost always linked. printf, scanf, socket, atoi, malloc, free
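A sketch for reading the result of loading out of `/proc/self/maps`, assuming the usual `start-end perms offset dev inode path` line format. Both function names are my own; the first mapping listed for a file is taken as its base address.

```python
def parse_maps_line(line: str) -> dict:
    """Split one /proc/<pid>/maps line into its fields."""
    fields = line.split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    return {
        "start": start,
        "end": end,
        "perms": fields[1],
        "offset": int(fields[2], 16),
        "path": fields[5].strip() if len(fields) > 5 else "",
    }


def library_bases(maps_text: str) -> dict:
    """Lowest mapped address for each file-backed mapping."""
    bases = {}
    for line in maps_text.splitlines():
        if not line.strip():
            continue
        m = parse_maps_line(line)
        if m["path"].startswith("/") and m["path"] not in bases:
            bases[m["path"]] = m["start"]
    return bases


if __name__ == "__main__":
    with open("/proc/self/maps") as f:  # Linux only
        for path, base in library_bases(f.read()).items():
            print(hex(base), path)
```

Running it twice with ASLR enabled should print different bases for libc and friends.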
shellcode
Protection
ASLR - Addresses are randomized. cat /proc/self/maps to look at what actually loaded. Also ldd shows where libraries get loaded in memory. Stack canaries - set once per binary run, so with forking you can brute force them or maybe leak them?
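A simulation of the forking brute force, under the assumption that each forked child keeps the parent's canary and that an overflow can overwrite it one byte at a time. The oracle below is a stand-in for "send the overflow, see whether the child crashed"; nothing here touches a real process.

```python
def make_oracle(canary: bytes):
    """Pretend forking server: the child survives only if the overwritten bytes match."""
    def survives(guess_prefix: bytes) -> bool:
        return canary.startswith(guess_prefix)
    return survives


def brute_force_canary(survives, length: int = 8):
    """Recover the canary byte by byte; at most 256 tries per byte."""
    known = b""
    attempts = 0
    for _ in range(length):
        for candidate in range(256):
            attempts += 1
            if survives(known + bytes([candidate])):
                known += bytes([candidate])
                break
        else:
            raise RuntimeError("no byte matched; the canary probably changed between tries")
    return known, attempts


if __name__ == "__main__":
    secret = b"\x00\x13\x37\xab\xcd\xef\x42\x99"  # made-up value
    found, tries = brute_force_canary(make_oracle(secret))
    print(found.hex(), tries)
```

The point is the cost: at most 8 * 256 = 2048 guesses instead of 2^64, which only works because the value does not change across forks.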
checksec tells you about which things are enabled.
gcc options -no-pie -fno-stack-protector
pwntools
attaching to gdb and/or a process is really useful. cyclic generates a pattern of bytes that lets you localize what ends up where in a buffer overflow, for example. cyclic_find maps a chunk of the pattern back to its offset
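A standard-library stand-in for the idea behind pwntools' `cyclic` and `cyclic_find`, assuming a de Bruijn sequence over lowercase letters with 4-byte windows. This is not the pwntools implementation, just a sketch of why the trick works: every 4-character window appears once, so a window seen in a crashed register pins down an offset.

```python
import itertools
import string


def de_bruijn(alphabet: str, n: int):
    """Lazily yield a de Bruijn sequence: every length-n window occurs exactly once."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for i in a[1:p + 1]:
                    yield alphabet[i]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def cyclic(length: int, alphabet: str = string.ascii_lowercase, n: int = 4) -> str:
    """First `length` characters of the pattern."""
    return "".join(itertools.islice(de_bruijn(alphabet, n), length))


def cyclic_find(pattern: str, window: str) -> int:
    """Offset of a window within the pattern, or -1 if absent."""
    return pattern.find(window)


if __name__ == "__main__":
    pat = cyclic(200)
    print(pat[:16])
    print(cyclic_find(pat, pat[120:124]))
```

In practice the window comes out of a register as a little-endian integer, so it has to be converted back to bytes in the right order before the lookup.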