Ghidra is a disassembler and decompiler built by the NSA (started sometime in the early 2000s) and open-sourced relatively recently, in 2019. It has pretty thorough architecture support, and it is free.

Ghidra is scriptable in Java and Python (Jython) from the GUI.

The Ghidra decompiler is actually largely written in C++ and spoken to by the Java GUI frontend over some kind of protocol. You can find it in Ghidra/Features/Decompiler/src/decompile/. One can actually call functions of the decompiler directly. There are a couple of projects that do this, and more seem to appear every day.

The help documentation inside Ghidra itself is useful, and I’m not sure it is reflected online anywhere.

Shared projects and Ghidra server: interesting. I’ve never needed this.

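Useful analyzeHeadless flags: -overwrite replaces a program that already exists in the project when importing. -process exe analyzes a binary that is already in the project. -scriptPath path/to/script adds a script directory. -postScript scriptname actually runs the script, after analysis. There is also -preScript, which runs before analysis.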
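The flags above can be assembled programmatically. This is a minimal sketch, not part of Ghidra: `headless_cmd` is a hypothetical helper name, the paths are placeholders, and treating -import and -process as mutually exclusive is an assumption based on the usage text. It only builds the argument list and does not run anything.

```python
def headless_cmd(project_dir, project_name, binary=None, process=None,
                 script_path=None, pre_script=None, post_script=None,
                 overwrite=False, launcher="analyzeHeadless"):
    """Build an analyzeHeadless argument list (hypothetical helper)."""
    if binary is not None and process is not None:
        # Assumption: -import and -process are alternatives, not combined.
        raise ValueError("give either binary (-import) or process (-process)")
    cmd = [launcher, project_dir, project_name]
    if binary is not None:
        cmd += ["-import", binary]
        if overwrite:
            cmd.append("-overwrite")
    elif process is not None:
        cmd += ["-process", process]
    if script_path is not None:
        cmd += ["-scriptPath", script_path]
    if pre_script is not None:
        cmd += ["-preScript", pre_script]
    if post_script is not None:
        cmd += ["-postScript", post_script]
    return cmd


# Placeholder project, binary and script names.
cmd = headless_cmd("/tmp/ghidraproject", "Project1", binary="/tmp/foo.o",
                   script_path="/tmp/scripts", post_script="myscript.py",
                   overwrite=True)
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd) once the launcher path points at a real Ghidra install.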

general Workflow

Figure out what the functions are and name them. Correct the types. Name the variables.

The Bytes view allows hex editing.

Ghidra Scripting

A lot of the juiciest stuff is in the FlatProgramAPI

The ghidra api docs are pretty decent. I also like to open up the python console (Window > Python) and tab autocomplete just to see what is available.

Scripts have access to a variable called state?

import ensurepip
import pip

hmm. This is not so successful

So. just import a list of tuples

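import ghidra
from ghidra.app.decompiler import DecompInterface

func = getFunction("myfunname")

decomp = DecompInterface()
decomp.openProgram(currentProgram)  # the decompiler needs a program before it can decompile
monitor = None  # no task monitor
res = decomp.decompileFunction(func, 0, monitor)  # arguments: function, timeout in seconds, monitor
if res and res.decompileCompleted():
    hf = res.getHighFunction()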

[c.getHighVariable().getInstances() for c in ccode if 'getHighVariable' in dir(c) and c.getHighVariable() != None]
highvar.getInstances() # a list of varnode

vnode.getDef() # definition
vnode.getDescendants() # users

defpcode = vnode.getDef() #
defpcode.getParent() # parent basic block
seq = defpcode.getSeqnum() # sequence number? It looks like it has a code address in it
seq.getTarget() # returns assembly address. 

The seqnum is unique and preserves the original address.

HighSymbol: can I go from a varnode to its high variable? (vnode.getHigh() does this.) There is getLoneDescend; a getLoneAscend would also be useful.
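To make the def/use navigation concrete outside of Ghidra, here is a toy model. `Var` and `Op` are hypothetical stand-ins, not Ghidra classes; they only mimic the shape of getDef, getDescendants and getLoneDescend on a varnode in SSA form, where each varnode has at most one defining op.

```python
class Var(object):
    """Toy stand-in for a varnode in SSA form: at most one defining op."""
    def __init__(self, name):
        self.name = name
        self.definition = None     # plays the role of vnode.getDef()
        self.descendants = []      # plays the role of vnode.getDescendants()

    def lone_descend(self):
        # The single reader if there is exactly one, otherwise None.
        return self.descendants[0] if len(self.descendants) == 1 else None


class Op(object):
    """Toy stand-in for a p-code op: one optional output, a list of inputs."""
    def __init__(self, mnemonic, output, inputs):
        self.mnemonic = mnemonic
        self.output = output
        self.inputs = list(inputs)
        if output is not None:
            output.definition = self
        for v in self.inputs:
            v.descendants.append(self)


a, b, t, r = Var("a"), Var("b"), Var("t"), Var("r")
add = Op("INT_ADD", t, [a, b])     # t = a + b
mul = Op("INT_MULT", r, [t, a])    # r = t * a

print(t.definition.mnemonic)       # INT_ADD
print(t.lone_descend().mnemonic)   # INT_MULT
print(a.lone_descend())            # None, because a has two readers
```

In this model the definition link already points upward to a single op, so walking up one step needs no extra helper; walking down is ambiguous whenever a value has more than one reader.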

from java.io import File

proj = ghidra.base.project.GhidraProject.createProject("/tmp/", "testproject", False)
f = File("/bin/true")
prog = proj.importProgramFast(f)

api = ghidra.program.flatapi.FlatProgramAPI(prog)

Widgets are available for quick GUI elements

import docking.widgets 

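BasicBlockModel gives basic blocks at the listing level. On a high function, getBasicBlocks() returns the p-code basic blocks. Each block has getStart() and getStop() addresses, getIterator() to iterate over its pcode, and getInSize() / getIn(i) (likewise getOutSize() / getOut(i)) for its edges.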

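blocks = hf.getBasicBlocks()
livein = {b.getStart(): set() for b in blocks}
liveout = {b.getStart(): set() for b in blocks}

for i in range(len(blocks)): # is this enough iterations? worst case is straight line graph?
    for blk in blocks:
        lives = set()
        for outind in range(blk.getOutSize()):
            succblk = blk.getOut(outind)
            lives |= livein[succblk.getStart()]  # live-out is the union of the successors' live-in
        liveout[blk.getStart()] = set(lives)
        for pcode in reversed(list(blk.getIterator())):  # walk the block backwards
            out = pcode.getOutput()
            if out is not None:
                lives.discard(out)  # a write kills the varnode
            lives.update(pcode.getInputs())  # a read makes it live
        livein[blk.getStart()] = lives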
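The same backward liveness computation can be tried on a toy control flow graph without Ghidra. This is a sketch under stated assumptions: blocks are plain lists of (written variable, read variables) pairs, the block and variable names are made up, and the loop runs until nothing changes rather than a fixed number of passes, which sidesteps the question of how many iterations are enough.

```python
def liveness(blocks, succs):
    """blocks: {name: [(written_var_or_None, [read_vars]), ...]}
    succs:  {name: [successor block names]}
    Returns (livein, liveout) as dicts of sets."""
    livein = {b: set() for b in blocks}
    liveout = {b: set() for b in blocks}
    changed = True
    while changed:                        # iterate to a fixed point
        changed = False
        for b in blocks:
            out = set()
            for s in succs.get(b, []):
                out |= livein[s]          # union of the successors' live-in
            live = set(out)
            for dst, srcs in reversed(blocks[b]):   # walk the ops backwards
                live.discard(dst)         # a write kills the variable
                live.update(srcs)         # a read makes it live
            if out != liveout[b] or live != livein[b]:
                liveout[b], livein[b] = out, live
                changed = True
    return livein, liveout


# entry: x = 1; i = 0     loop: i = i + x; branch on i     exit: return i
blocks = {
    "entry": [("x", []), ("i", [])],
    "loop":  [("i", ["i", "x"]), (None, ["i"])],
    "exit":  [(None, ["i"])],
}
succs = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
livein, liveout = liveness(blocks, succs)
print(sorted(livein["loop"]))     # ['i', 'x']
print(sorted(liveout["entry"]))   # ['i', 'x']
```

Nothing is live into the entry block here because both variables are written before any read.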

on write

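availout = {}

for pcode in blk.getIterator():
    out = pcode.getOutput()
    if out is not None:
        highvar = out.getHigh()
        if highvar is not None:
            availout[highvar.getName()] = set([out])  # a write makes this varnode the available one
    for v in pcode.getInputs(): # if in input, infers it is available here
        high = v.getHigh()
        if high is not None:
            availout.setdefault(high.getName(), set()).add(v)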

# but also, being read from after the patch kind of implies it could be read from.
# polarity flips inside the patch code being replaced
# maybe if getTarget <= currentAddress <= nextInstruction

confidence scores? do monte carlo allocation inline uniques

getFalseOut, getTrueOut


Ghidra has a headless mode that is still using the Java stuff, but doesn’t bring up the GUI window.

analyzeHeadless PATH_TO_PROJECTS PROJECT_NAME -import /path/to/binary

simple headless scripts

echo '
import sys
from ghidra.app.decompiler import DecompInterface
from ghidra.util.task import ConsoleTaskMonitor

decompinterface = DecompInterface()
# The decompiler must be attached to a program before decompiling
decompinterface.openProgram(currentProgram)
functions = currentProgram.getFunctionManager().getFunctions(True)

with open("decompiled_output.c", "w") as output_file:
    for function in list(functions):
        # Add a comment with the name of the function, one per line
        print >>output_file, "// Function: " + str(function)

        # Decompile each function (timeout 0)
        decompiled_function = decompinterface.decompileFunction(function, 0, ConsoleTaskMonitor())
        # Print the decompiled code to the console
        print decompiled_function.getDecompiledFunction().getC()
' > /tmp/

echo "
int foo(int x){
    return x*x + 1;
}
" > /tmp/foo.c
# huh. ghidra doesn't support dwarf 5? That seems nuts.
gcc -gdwarf-4 -c /tmp/foo.c -o /tmp/foo.o

cd ~/Downloads/ghidra_10.4_PUBLIC/support
rm -rf /tmp/ghidraproject
mkdir /tmp/ghidraproject
./analyzeHeadless /tmp/ghidraproject Project1 -import /tmp/foo.o -postScript /tmp/ #2>/dev/null
#./analyzeHeadless /tmp/ghidraproject Project1 -log /tmp/ghidralog -process foo.o -postscript /tmp/ /tmp/decompile.c
#cat /tmp/decompile.c

decomp_opt / decomp_dbg are command line tools hidden inside the ghidra directory structure.

To get it working you need to set an environment variable SLEIGHHOME=myghidradirectory. It needs this to find the architecture files. These are .sla files, compiled from sleigh specs (.slaspec).

make decomp_dbg
make doc
SLEIGHHOME=myghidradirectory ./decomp_dbg
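For driving the console from a script, the environment and the command text can be prepared in Python. This is a sketch under assumptions: `decomp_env` and `console_script` are hypothetical helper names, `myghidradirectory` and `mytest` are placeholders, and feeding commands on standard input is an assumption about how the console is used, so nothing is launched here.

```python
import os


def decomp_env(ghidra_dir, base_env=None):
    """Copy an environment and point SLEIGHHOME at the Ghidra directory."""
    env = dict(os.environ if base_env is None else base_env)
    env["SLEIGHHOME"] = ghidra_dir
    return env


def console_script(commands):
    """Join console commands into newline-terminated text."""
    return "".join(c + "\n" for c in commands)


env = decomp_env("myghidradirectory", base_env={"PATH": "/usr/bin"})
script = console_script(["load file mytest", "read symbols",
                         "load function main", "decompile", "print C"])
print(env["SLEIGHHOME"])
print(script)
```

If the console does read from standard input, something like subprocess.run(["./decomp_dbg"], input=script, env=env, universal_newlines=True) would be the natural next step.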

Console commands: load file /bin/true, save mysavefile.xml, restore, addpath, codedata init, codedata target, codedata run, codedata dump hits, codedata dump crossrefs

This is where most of the commands are registered

this is the main of decomp

This seems to be the Java side of the communication with the decompiler. I think the ghidra binary is the one with this interface, not decomp? This is the main of ghidra_opt

slgh_compile actually compiles sla files - this is where most of them are: decompile, dump, force hex, map hash, load function, source, option, load file, print language, print xml, print C types, print C, produce C, list action, print param

entry point of function can also be given manually

parse line - to give C prototypes? parse file foo.h

load file mytest
read symbols
load function main
decomp
print C
disassemble
rename param_1 argc
print high iStack12

pcode stuff: op.hh funcdata.hh


Pcode is the intermediate representation that instructions from different architectures are lifted to. Doing so is the first step of decompiling.

There is a difference between raw pcode and high pcode. Raw pcode comes right off an instruction; high pcode has a couple more constructs to encode higher level notions like phi nodes, function calls, etc. So pcode kind of represents at least 2 IRs in a sense, which share datatypes.

Varnodes are the inputs and outputs of pcode ops. Each one is an address space, an offset into that space, and a size.


There is a uniform notion of address space. Registers are modelled as a separate RAM. There is also a temporary (unique) address space and a constant address space.
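The varnode and address space picture can be sketched as a tiny interpreter. This is an illustration only, not Ghidra's or pypcode's API: the `Varnode` and `PcodeOp` tuples, the register offset and the three supported ops are assumptions chosen to evaluate x*x + 1, and memory is keyed by whole varnodes, which ignores overlapping ranges.

```python
from collections import namedtuple

# A varnode is just (address space, offset, size in bytes).
Varnode = namedtuple("Varnode", ["space", "offset", "size"])
# A p-code op: mnemonic, output varnode, list of input varnodes.
PcodeOp = namedtuple("PcodeOp", ["mnemonic", "output", "inputs"])


def read(mem, vn):
    if vn.space == "const":               # constant space: the offset is the value
        return vn.offset & ((1 << (8 * vn.size)) - 1)
    return mem.get(vn, 0)                 # unwritten locations read as zero


def step(mem, op):
    vals = [read(mem, v) for v in op.inputs]
    mask = (1 << (8 * op.output.size)) - 1
    if op.mnemonic == "COPY":
        result = vals[0]
    elif op.mnemonic == "INT_ADD":
        result = vals[0] + vals[1]
    elif op.mnemonic == "INT_MULT":
        result = vals[0] * vals[1]
    else:
        raise NotImplementedError(op.mnemonic)
    mem[op.output] = result & mask        # truncate to the output size


# Hypothetical locations, chosen only for illustration.
x = Varnode("register", 0x0, 4)
tmp = Varnode("unique", 0x100, 4)         # temporary space
one = Varnode("const", 1, 4)

mem = {x: 7}
for op in [PcodeOp("INT_MULT", tmp, [x, x]),      # tmp = x * x
           PcodeOp("INT_ADD", x, [tmp, one])]:    # x = tmp + 1
    step(mem, op)
print(mem[x])   # 50
```

Registers, temporaries and constants all go through the same read path; only the space name differs, which is the point of the uniform address space idea.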

Basically its


Implementing a new architecture in Ghidra (slides)


ANTLR grammar of sleigh

XML schema

Specifying Representations of Machine Instructions SLED paper, source of sleigh

The University of Queensland Binary Translator (UQBT) Framework

Huh, they called jit’s “dynamic compilers”


Resources

angr in ghidra

formal semantics for ghidra high pcode. mentions interpreter for low pcode. haskell pcode interpreter

Yara search get basic blocks

kaiju ghihorn

Binary code coverage visualizer plugin for Ghidra

ghidra chatgpt

Ghidra script to export C pseudo-code on multiple files, including defined types

ghidra golf

pypcode How to bind to the lifter. a haskell pcode interpreter. niiice. sleigh parser