Reversing

Disassembly

Disassembler approaches: linear sweep, recursive traversal. Anti-disassembly: https://github.com/AppleReer/Anti-Disassembly-On-Arm64

relative disassembler performance

capstone http://www.capstone-engine.org/ - disassembler. Converse of keystone (the assembler). Other disassemblers: zydis, xed, distorm, iced, bddisasm, yaxpeax

McSema - older Trail of Bits lifter. Uses LLVM as IR. remill lifts machine code to LLVM bitcode, anvill does further processing on top of remill, and rellic turns LLVM bitcode into C-like code.

BAP, angr

Speculative disassembly https://ieeexplore.ieee.org/document/7745279 decode every offset. Refine blocks. Spedi, open source speculative disassembler https://github.com/abenkhadra/spedi. Nucleus paper https://mistakenot.net/papers/eurosp-2017.pdf Compiler-Agnostic Function Detection in Binaries

superset disassembler (kenneth). Superset disassembly paper https://personal.utdallas.edu/~hamlen/bauman18ndss.pdf. Cifuentes thesis

probabilistic disassembly. Using probabilistic datalog? bap mc + datalog?

Formally Verified Lifting of C-Compiled x86-64 Binaries

Interactive

  • IDA
  • Ghidra
  • Binary Ninja
  • Cutter

https://github.com/GrammaTech/gtirb-vscode

decompiler explorer Hmm. too bad it’s not a web service

Fuzzing

Mayhem. Fuzzy-sat: running SMT queries through a fuzzer. Angora, SLF, Eclipser

fuzzing challenges and reflections

fuzzing 22

google fuzzbench

oss-fuzz

rode0day rolling fuzzing competition

Greybox

  • AFL. AFL++ fork of afl tutorials. compile using afl-clang-fast++ or use qemu mode.
  • libfuzzer clang++ -fsanitize=address,fuzzer myfile.cc tutorial
  • honggfuzz

whitebox

  • klee
  • sage

Qsym hybrid fuzzing. concolic execution.

syzkaller kernel fuzzer, go-fuzz, fuzzilli, winafl

Fuzzers compile in extra information to give coverage guidance

Fuzzers use a corpus of input
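The two points above (coverage guidance plus a corpus) can be sketched as a toy greybox loop. This is only an illustration, not how AFL or libFuzzer are implemented: `target`, `mutate` and `fuzz` are made-up names, and the `coverage` set stands in for the instrumentation a real fuzzer compiles into the binary.

```python
import random

def target(data: bytes, coverage: set) -> None:
    """Hypothetical program under test; records which branches ran."""
    coverage.add("entry")
    if len(data) > 0 and data[0] == ord("F"):
        coverage.add("F")
        if len(data) > 1 and data[1] == ord("U"):
            coverage.add("FU")
            if len(data) > 2 and data[2] == ord("Z"):
                coverage.add("FUZ")
                raise RuntimeError("crash")

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte, or append one random byte."""
    buf = bytearray(data)
    if buf and rng.random() < 0.5:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        buf.append(rng.randrange(256))
    return bytes(buf)

def fuzz(seed_corpus, iterations=200_000, seed=0):
    """Return a crashing input, or None if the budget runs out."""
    rng = random.Random(seed)
    corpus = list(seed_corpus)
    seen = set()
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        cov = set()
        try:
            target(candidate, cov)
        except RuntimeError:
            return candidate          # crashing input found
        if not cov <= seen:           # new coverage: keep the input
            seen |= cov
            corpus.append(candidate)
    return None
```

Without the coverage feedback the loop would have to guess all three magic bytes at once. With it, each byte that opens a new branch is saved in the corpus and built upon.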

Using fuzzer to solve csp. Write checker. Fuzz it. It’s randomized search
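A minimal sketch of that idea, with a made-up two-byte constraint problem: the checker "crashes" exactly when its input is a solution, and the search loop just throws random inputs at it. The constraints and the names `checker` and `random_search` are assumptions for illustration only.

```python
import random

def checker(data: bytes) -> None:
    """Hypothetical constraint checker: 'crashes' exactly on a solution."""
    if len(data) != 2:
        return
    a, b = data
    if a ^ b == 0x0F and a + b == 111:
        raise AssertionError("solution found")

def random_search(max_tries=1_000_000, seed=0):
    """Feed random two-byte inputs to the checker until it crashes."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        candidate = bytes([rng.randrange(256), rng.randrange(256)])
        try:
            checker(candidate)
        except AssertionError:
            return candidate
    return None
```

A real fuzzer would add coverage feedback on top, which helps when the checker tests its constraints one branch at a time.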

Fuzzgym makes a lot of sense to put neural heuristics in there

Symbolic Execution

https://github.com/eurecom-s3/symcc symqemu

unicorn - ripped out the heart of qemu and made it programmatically accessible. Based on an old version of qemu though

KLEE

primus - bap’s emulator framework

panda https://github.com/panda-re/panda - built on qemu. record and replay executions

Vulnerabilities

CWE - common weakness enumeration

integer overflow https://cwe.mitre.org/data/definitions/190.html

null pointer dereference

Mitigations

Control-flow integrity (CFI) is a broad term for many of these. CONFIRM: Evaluating Compatibility and Relevance of Control-flow Integrity Protections for Modern Software

DEP - data execution prevention executable space protection This says DEP is Windows terminology? NX bit

shadow stack

stack canary https://www.keil.com/support/man/docs/armclang_ref/armclang_ref_cjh1548250046139.htm -fstack-protector. Guard variable put on stack SSP stack smashing protection. Stackguard, Propolice. https://embeddedartistry.com/blog/2020/05/18/implementing-stack-smashing-protection-for-microcontrollers-and-embedded-artistrys-libc/ Buffer overflow protection

ASLR - Address Space Layout Randomization (ASLP, address space layout permutation, is a related scheme). Libraries are loaded at a different location on each run. This makes code reuse in an exploit more difficult.

Fat pointers

endbr intel control flow enforcement technology (CET). Valid locations for indirect jumps.

ASLR - Addresses are randomized. cat /proc/self/maps to look at what actually got loaded. Also ldd shows where libraries get loaded in memory. Stack canaries - set once per binary run and inherited across fork, so with a forking server you can brute force them byte by byte, or maybe leak them?
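A rough sketch of checking where things got loaded by parsing the maps file. The field layout (address range, perms, offset, dev, inode, path) is the standard /proc/<pid>/maps format. The helper names are made up, and the final loop only runs on Linux.

```python
import os

def parse_maps_line(line: str) -> dict:
    """Parse one line of /proc/<pid>/maps into its fields."""
    fields = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    return {
        "start": start,
        "end": end,
        "perms": fields[1],
        "path": fields[5].strip() if len(fields) > 5 else "",
    }

def library_bases(maps_text: str) -> dict:
    """Lowest mapped address for each file-backed path."""
    bases = {}
    for line in maps_text.splitlines():
        if not line.strip():
            continue
        entry = parse_maps_line(line)
        path = entry["path"]
        if path.startswith("/"):
            bases[path] = min(entry["start"], bases.get(path, entry["start"]))
    return bases

if os.path.exists("/proc/self/maps"):   # Linux only
    with open("/proc/self/maps") as f:
        for path, base in sorted(library_bases(f.read()).items()):
            print(hex(base), path)
```

Running it twice with ASLR enabled should print different base addresses for the shared libraries.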

checksec tells you about which things are enabled. https://opensource.com/article/21/6/linux-checksec which also has a rundown of the different things and how you could check them manually. Can output into xml, json, csv

gcc options: -no-pie, -pie, -fpie, -fno-stack-protector, -fstack-protector-all. -z execstack makes the stack executable

RELRO - relocation read only. GOT table becomes read only. Prevents relocation attacks

binary diversification - compile differently every time. Making many versions of a binary makes code reuse attacks way harder. disunison

Exploits

Buffer Overflows

buffer overflow When a buffer overflow occurs you are writing to memory that possibly had a different purpose. Maybe other stack variables, maybe return address pointers, maybe over heap metadata.

Sanitization of user input. Off-by-one errors. String termination.

Primitives

AWP Arbitrary write primitive CWE-123: Write-what-where Condition ARP arbitrary read primitive

Return Oriented Programming (ROP)

return to libc libc is very common and you can weave together libc calls. “Solar Designer” solar designer 1997

ropc-llvm ropc

smashing the stack for fun and profit - stacks are no longer executable

https://acmccs.github.io/papers/geometry-ccs07.pdf geometry of innocent flesh on the bone. ROP

https://github.com/sashs/Ropper

rop emporium

rop ftw

Stack layout: pop_gadget ; value ; nextgadget. The pop gadget loads value from the stack into a register, then its ret jumps to nextgadget.
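The pop_gadget ; value ; nextgadget layout can be written out as bytes. A sketch only: every address below is invented for illustration, and the padding length depends on the actual target (in practice pwntools' p64 would replace the local helper).

```python
import struct

def p64(value: int) -> bytes:
    """Pack a 64-bit little-endian word, as it would sit on the x86-64 stack."""
    return struct.pack("<Q", value)

# All addresses below are made up for illustration.
POP_RDI_RET = 0x401234   # hypothetical gadget: pop rdi ; ret
BIN_SH = 0x404050        # hypothetical address of a "/bin/sh" string
SYSTEM = 0x401040        # hypothetical address of system()

def build_chain(padding_len: int) -> bytes:
    chain = b"A" * padding_len    # filler up to the saved return address
    chain += p64(POP_RDI_RET)     # overwritten return address: pop_gadget
    chain += p64(BIN_SH)          # value popped into rdi
    chain += p64(SYSTEM)          # nextgadget: where the gadget's ret goes
    return chain
```

Each ret pops the next 8-byte word off the stack, so the chain is just a list of little-endian words with data interleaved wherever a gadget pops.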

pure buffer overflow from command line:

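#include <stdio.h>
#include <string.h>

/* Deliberately vulnerable: for exploitation practice only. */
int main(int argc, char *argv[])
{
    char buf[256];
    if (argc < 2) return 1;
    /* No bounds check: arguments longer than 256 bytes overflow buf. */
    memcpy(buf, argv[1], strlen(argv[1]));
    /* buf is user controlled and unterminated: also a format string bug. */
    printf(buf);
    return 0;
}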

ropeme

angrop

Data Oriented Programming (DOP)

Block Oriented Programming

Heap

Heap layout problem Heap layout manipulation

Metadata

Double free Use after free

http://phrack.org/issues/61/6.html advanced doug lea malloc hacking

https://milianw.de/blog/heaptrack-a-heap-memory-profiler-for-linux.html heaptrack. Heap profilers: perf-mem, valgrind massif, and heaptrack

https://heap-exploitation.dhavalkapil.com/

advanced doug lea malloc - phrack post

glibc. ldd /bin/ls lists libc.so.6, a symbolic link to the versioned library (probably glibc 2.27; libc-2.31.so here). It is actually PIE, so you can run it: /lib/x86_64-linux-gnu/libc.so.6

malloc chunks of memory new/delete make_unique

heap history viewer

Automated Exploit Generation (AEG)

sean heelan thesis

USENIX Security. HeapHopper: angr symbolic analysis for heap exploits? ArcHeap. MAZE: towards heap feng shui. Backward search from HeapHopper. TeeRex: discovery of memory corruption vulnerabilities. [symcc]

CTF

What is Binary Analysis

Dynamic - It feels like you’re running the binary in some sense. Maybe on an emulated environment Static - It feels like you’re not running the binary

Fuzzing is definitely dynamic. Dataflow analysis on a CFG is static. There are grey areas. Symbolic execution starts to feel like a grey area. I would consider it to be largely dynamic, but you are executing in a rather odd way.

Trying to understand a binary Why?

  • Finding vulnerabilities for defense or offense
    • buffer overflows
    • double frees
    • use after frees
    • memory leaks - just bad performance
    • info leaks - bad security
  • Verification - Did your compiler produce a thing that does what your code said?
  • Reversing/Cracking closed source software.
  • Patching and Code injection. Finding Bugs for use in speed runs. Game Genie.
  • Auditing
  • Aids for manual RE work. RE is useful because things may be undocumented intentionally or otherwise. You want to reuse a chip, or turn on secret functionality, or reverse a protocol.
  • Discovery of patent violation or GPL violations
  • Comparing programs. Discovering what has been patched.

I don’t want my information stolen or held ransom. I don’t want people peeping in on my conversations. I don’t want my computer wrecked. These are all malicious actors. We also don’t want our planes and rockets crashing. This does not require maliciousness on anyone’s part per se.

  • Symbol recovery
  • Disassembly
  • CFG recovery
  • Taint tracking
  • symbolic execution

https://github.com/analysis-tools-dev/dynamic-analysis A list of tools https://analysis-tools.dev/ CSE597: Binary-level program analysis Spring 19 Gang Tan

Program Analysis

What’s the difference? Binaries are less structured than what you’ll typically find talked about in program analysis literature.

Binaries are tough because the coupling between high-level intent and constructs and what is actually there has been tossed away or has become very loose.

How are binaries made

C preprocessor -> Maintains file and line number information. Isn’t that interesting.

C compiler -> assembly. You can ask for this assembly with -S. Or, more cut up: C -> IR, IR -> MIR (what does gcc do? RTL, right?), MIR -> Asm

Disassembly

  • Linear
  • Recursive
  • Shingled
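The three strategies can be compared on a toy example. The instruction set here is entirely made up (1-byte nop and ret, 2-byte jmp and jz with an unsigned forward offset), and reading "shingled" as decode-at-every-offset is an assumption based on the superset and speculative disassembly notes. The sample bytes hide one data byte after a jump, which is enough to desynchronize linear sweep.

```python
SIZES = {0x01: 1, 0x02: 2, 0x03: 1, 0x04: 2}
NAMES = {0x01: "nop", 0x02: "jmp", 0x03: "ret", 0x04: "jz"}

def decode(code, addr):
    """Return (mnemonic, size, branch_target or None), or None if undecodable."""
    if addr >= len(code) or code[addr] not in SIZES:
        return None
    op = code[addr]
    size = SIZES[op]
    if addr + size > len(code):
        return None
    target = addr + size + code[addr + 1] if size == 2 else None
    return NAMES[op], size, target

def linear_sweep(code):
    out, addr = {}, 0
    while addr < len(code):
        insn = decode(code, addr)
        if insn is None:
            addr += 1            # skip an undecodable byte and resync
            continue
        out[addr] = insn[0]
        addr += insn[1]
    return out

def recursive_traversal(code, entry=0):
    out, worklist = {}, [entry]
    while worklist:
        addr = worklist.pop()
        while addr not in out:
            insn = decode(code, addr)
            if insn is None:
                break
            name, size, target = insn
            out[addr] = name
            if target is not None:
                worklist.append(target)
            if name in ("jmp", "ret"):
                break            # no fall-through
            addr += size
    return out

def every_offset(code):
    """Decode at every byte offset, keeping whatever decodes."""
    out = {}
    for addr in range(len(code)):
        insn = decode(code, addr)
        if insn is not None:
            out[addr] = insn[0]
    return out

# jmp over one data byte (0x02, which looks like a jmp opcode), then nop, ret
CODE = bytes([0x02, 0x01, 0x02, 0x01, 0x03])
```

On CODE, linear sweep decodes the data byte as a jmp and swallows the real nop at offset 3. Recursive traversal follows the jump and finds nop and ret. Decoding every offset keeps both readings and leaves the choice to a later refinement pass.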

Disasm.Driver

Delay slots are an annoyance. Some architectures (MIPS, SPARC) allow instructions to sit in the shadow of a jump instruction: they come after the jump in memory but execute before the jump takes effect. This makes sense from a microarchitectural perspective, but it is bizarre to disassemble.

Misc

https://twitter.com/33y0re/status/1528719776142475264?s=20&t=vcTgXMu6ZeZRjj7LiSMcFg kernel exploitation, browser exploitation blog posts

  • Arbitrary code guard
  • Code integrity guard
  • hypervisor protected code integrity (HVCI) “the acg of kernel mode”
  • virtualization based security (VBS). credential guard
  • local privilege escalation (LPE)

GDB

  • pwndbg
  • heap commands. For examining heap structure

  • gef can track malloc and free. That makes sense

https://twitter.com/peter_a_goodman/status/1503016499824537600?s=20&t=1Z4ew6rnGnFiMTSrQJSmKw goodman on binary rewriting binrec - lift program merge lifted bytecode into debloated egalito BOLT lifting bits/grr

cfi directives - call frame information

joern.io https://github.com/RUB-SysSec/EvilCoder automatic bug insertion using joern phaser slither

Hiding instructions in instructions https://lucris.lub.lu.se/ws/portalfiles/portal/78489284/nop_obfs.pdf

Thomas stars https://github.com/bsoddreams?tab=stars

SGX enclaves

obfuscation snapchat ollvm vmprotect https://github.com/void-stack/VMUnprotect opaque predicates - one branch always taken
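A tiny illustration of an opaque predicate, using the textbook fact that the product of two consecutive integers is even. The function names and the decoy branch are made up; real obfuscators emit this kind of thing at the IR or machine-code level.

```python
def opaque_true(x: int) -> bool:
    """Always True: the product of two consecutive integers is even."""
    return (x * (x + 1)) % 2 == 0

def obfuscated_abs(x: int) -> int:
    if opaque_true(x):
        return x if x >= 0 else -x   # real code
    return x ^ 0x5A5A                # dead decoy path, never executed
```

A static analysis that cannot prove the predicate has to treat the decoy branch as reachable, which is the whole point.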

chris domas https://github.com/xoreaxeaxeax/movfuscator tom 7

firmadyne emulating and analyzing firmware

dronesploit

burp suite idor - autorize

bloodhound

shellcode encoding and decoding - sometimes you need to avoid things like \0 termination. https://www.ired.team/offensive-security/code-injection-process-injection/writing-custom-shellcode-encoders-and-decoders Shellcode generators. What do they do? shellcode database
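A sketch of the simplest encoder of this kind: a single-byte XOR key chosen so the encoded bytes contain no bad characters such as \0. The payload in the usage note is placeholder bytes, not real shellcode, and the function names are assumptions. A real encoder also prepends a small machine-code stub that undoes the XOR at run time.

```python
def find_xor_key(payload: bytes, bad: bytes = b"\x00"):
    """Pick a one-byte key so neither the key nor any encoded byte is in `bad`."""
    for key in range(1, 256):
        if key in bad:
            continue
        if all((b ^ key) not in bad for b in payload):
            return key
    return None

def xor_encode(payload: bytes, key: int) -> bytes:
    return bytes(b ^ key for b in payload)

xor_decode = xor_encode   # XOR is its own inverse
```

For example, with the placeholder payload bytes([0x90, 0x00, 0x41, 0x00, 0xCC]), find_xor_key returns a key whose encoding has no zero bytes, and xor_decode with the same key gives the payload back.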

google dorking Like using google with special commands? Why “dork”? shodan

nmap

-A -T4. OS detection. nmap nse - nmap scripting engine. There is a folder of scripts

p0f - passive sniffing. fingerprinting

malware reversing class live overflow youtube exploit education rop emporium linux exploitation course yara - patterns to recognize malware. Byte level patterns? Sigma snort

SIEM. IDS - intrusion detection systems https://en.wikipedia.org/wiki/Intrusion_detection_system

shellcode encoder/decoder/generator https://www.msreverseengineering.com/blog/2017/7/15/the-synesthesia-shellcode-generator-code-release-and-future-directions synesthesia

FLIRT https://github.com/avast/retdec

https://github.com/grimm-co/NotQuite0DayFriday exploit examples

Gray Hat Hacking The Shellcoder’s handbook Attacking network Protocols Implementing Effective Code Review

https://objective-see.com/blog/blog_0x64.html

Hacking: http://langsec.org/papers/Bratus.pdf sergey weird machine paper

https://github.com/sashs/filebytes

blackhat defcon bluehat ccc https://en.wikipedia.org/wiki/Security_BSides bsides ctf project zero kaspersky blog https://usa.kaspersky.com/blog/ spectre/meltdown https://www.youtube.com/watch?v=b7urNgLPJiQ&ab_channel=PinkDraconian

return oriented programming sounds like my backwards pass. Huh.

Digital forensics

radare2, a binary analysis thingo. rax2 is useful for conversion of hex

binary ninja

ghidra

IDA

RSACTFTool

factordb

manticore

Maybe we should get a docker of all sorts of tools. Kali Linux? https://github.com/zardus/ctf-tools

klee, afl, other fuzzers? valgrind

cwe-checker

shellcode

ROP

https://quipqiup.com/ - solve substitution cyphers

https://github.com/openwall/john john the ripper. Brute force password cracker

ropper

Best CTFs. I probably don’t want the most prestigious ones? They’ll be too high level? I want the simple stuff

https://ctf101.org/ - check out the heap exploitation github thing

pwntools

metasploit, pacu - aws, cobalt strike

and the pwn category of ctf

ROP JOP SROP BOP - block oriented

return 2 libc - a subset of rop?

pwn.college

ryan chapman syscall

http://ref.x86asm.net/

https://github.com/revng/revng

privilege escalation - getuid effective id.. Inherit user and group from parent process. switching to user resets the setuid bit. sticky bits id command

shellcode - binary that launches a shell via the system call execve(“/bin/sh”, NULL, NULL) - args and env params

intel vs at&t syntax. Load up addresses of constants placed in the binary with .string. gcc -static -nostdlib, then objcopy --dump-section .text=outfile. Exiting cleanly is smart: it helps you know what is screwing up. ldd

Trying out shellcode mmap. mprotect? read() deref function pointer

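gdb: x for eXamine, e.g. x $rsi. x/5i $rip gives the next 5 instructions as assembly. x/gx prints a giant (8-byte) word in hex. break *0x040404 breaks at an address. n next, s step, ni and si are the instruction-level versions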

strace is useful first debugging

Intro

system calls set rax to syscall number. call syscall instruction https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/ man yada strace

  • fork
  • execve
  • read
  • write
  • wait
  • brk - program brk. change size of data segment. sbrk by increments. sbrk(0) returns current address of break

stack. rbp, rsp. Stack grows down (decreasing addresses). rsp + 0x8 is on the stack, rbp - 8 is on the stack. Most systems are little endian. Calling conventions: arguments in rdi, rsi, rdx, rcx, r8, r9; return in rax. rbx, rbp, r12, r13, r14, r15 are callee saved, guaranteed not smashed

http://ref.x86asm.net/coder64.html opcode listing https://github.com/yrp604/rappel - assembly repl https://github.com/zardus/ctf-tools

binary files

file - tells info about file
elf - interpreter, 
 - sections - text, plt/got to resolve and dispatch library calls, data for preinitialized data, rodata for global read-only data, bss for uninitialized data. sections are not required to run a binary
 - symbols - 
- segments - where to load
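A small sketch of reading the start of an ELF header by hand, roughly what file and readelf -h report first. Offsets follow the ELF spec (e_ident is 16 bytes, then e_type and e_machine as 16-bit fields); the function name and the lookup tables are just for illustration and cover only the common values.

```python
import struct

ELF_CLASS = {1: "ELF32", 2: "ELF64"}
ELF_DATA = {1: "little-endian", 2: "big-endian"}
ELF_TYPE = {1: "REL", 2: "EXEC", 3: "DYN (PIE or shared object)", 4: "CORE"}

def parse_elf_header(header: bytes) -> dict:
    """Decode the fixed start of an ELF header (e_ident, e_type, e_machine)."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    endian = "<" if ei_data == 1 else ">"
    e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    return {
        "class": ELF_CLASS.get(ei_class, "unknown"),
        "data": ELF_DATA.get(ei_data, "unknown"),
        "type": ELF_TYPE.get(e_type, hex(e_type)),
        "machine": e_machine,      # 62 is EM_X86_64
    }
```

Feeding it the first 20 bytes of a binary such as /bin/ls (the path is an assumption about the system) should show DYN for a PIE build and EXEC for a -no-pie build.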

readelf, objdump, nm - reads symbols, patchelf, objcopy, strip, kaitai struct https://www.intezer.com/blog/malware-analysis/executable-linkable-format-101-part-2-symbols/

process loading

https://0xax.gitbooks.io/linux-insides/content/SysCall/linux-syscall-4.html what to load. look for #! or elf magic. /proc/sys/fs/binfmt_misc can match a string there. Hand off to the ELF-defined interpreter if dynamically linked.

Then it’s onto ld probably. Where libraries are found: LD_PRELOAD, LD_LIBRARY_PATH, DT_RUNPATH in binary file, system wide /etc/ld.so.conf, /lib and /usr/lib. Relocations updated. /proc/self/maps https://gist.github.com/CMCDragonkai/10ab53654b2aa6ce55c11cfc5b2432a4 libc is almost always linked. printf, scanf, socket, atoi, malloc, free
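The first step of loading (look for #! or ELF magic) can be mimicked in a few lines. This only mirrors the decision described above; it is not what the kernel code looks like, and the function name is made up.

```python
def classify_executable(head: bytes) -> str:
    """Guess how the kernel would start a file, from its first bytes."""
    if head.startswith(b"#!"):
        interp = head[2:].split(b"\n", 1)[0].strip().decode(errors="replace")
        return "script, interpreter: " + interp
    if head.startswith(b"\x7fELF"):
        return "ELF binary"
    return "unknown (binfmt_misc rules might still match)"
```

Reading the first line or so of a file in binary mode and passing it in is enough to tell a shell script from a compiled binary.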

shellcode

Protection

checksec tells you about which things are enabled.

gcc options -no-pie -fno-stack-protector

pwntools

Attaching to gdb and/or a process is really useful. cyclic generates a byte pattern that lets you localize what ends up where in a buffer overflow, for example; cyclic_find maps the bytes you see back to an offset.
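To see why the cyclic trick works, here is a pure-Python sketch of the idea: a de Bruijn sequence in which every 4-byte window is unique, so the bytes that land in a register identify their offset in the input. This is a reimplementation for illustration with the same names as the pwntools helpers, not pwntools itself. The `limit` parameter is an invented detail.

```python
import itertools
import string

ALPHABET = string.ascii_lowercase.encode()

def de_bruijn(alphabet: bytes, n: int):
    """Yield a de Bruijn sequence: each length-n window over `alphabet` occurs once."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    yield from db(1, 1)

def cyclic(length: int, n: int = 4) -> bytes:
    """First `length` bytes of the pattern."""
    return bytes(itertools.islice(de_bruijn(ALPHABET, n), length))

def cyclic_find(window: bytes, n: int = 4, limit: int = 20000) -> int:
    """Offset of a length-n window inside the pattern, or -1 if not found."""
    return cyclic(limit, n).find(window[:n])
```

Typical use: send cyclic(300) as the input, read the 4 bytes that ended up in the crashing register, and pass them to cyclic_find to get the padding length. A value read out of a register as an integer has to be packed little-endian first.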

Examples from pwn.college