The loader loads all the objects and exports an abstraction of the memory of the process. What you see here is an address space with loaded and rebased binaries.
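Note that the results returned are of type BV, not Python's int. BV is short for bitvector, which is simply a sequence of bits; angr uses bitvectors to represent CPU data.
The following shows how to convert between a bitvector and an int: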
>>> bv = state.solver.BVV(0x1234, 32) # create a 32-bit-wide bitvector with value 0x1234
>>> bv # BVV stands for bitvector value
<BV32 0x1234>
>>> state.solver.eval(bv) # convert to python int
0x1234
>>> stub_func = angr.SIM_PROCEDURES['stubs']['ReturnUnconstrained'] # this is a CLASS
>>> proj.hook(0x10000, stub_func()) # hook with an instance of the class
>>> proj.is_hooked(0x10000) # these functions should be pretty self-explanatory
True
>>> proj.hooked_by(0x10000)
<ReturnUnconstrained>
>>> proj.unhook(0x10000)
>>> @proj.hook(0x20000, length=5)
... def my_hook(state):
...     state.regs.rax = 1
>>> proj.is_hooked(0x20000)
True
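States
A state represents a snapshot of a program instance, that is, the state of the simulated execution at a particular moment. It holds the context of the running program, such as memory and registers.
Before execution begins, we initialize the registers, memory, stack frames, and so on by configuring the state object. When execution ends, a state object is returned, from which the values needed for solving can be extracted.

Basic execution
The state.step() interface performs a simple execution step. It returns a SimSuccessors object, whose .successors attribute is the list of successor states.
Execution is also covered by the simulation manager (SM), which is normally used to manage the execution of states.

State presets
Besides .entry_state(), other constructors can be used to create a state object as needed: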
Name – Description
.entry_state() – constructs a state ready to execute at the main binary's entry point.
.blank_state() – constructs a "blank slate" blank state, with most of its data left uninitialized. When accessing uninitialized data, an unconstrained symbolic value will be returned.
.call_state() – constructs a state ready to execute a given function.
.full_init_state() – constructs a state that is ready to execute through any initializers that need to be run before the main binary's entry point.

Accessing registers
Register data is read and modified through attributes of the state.regs object.
def load(self, addr, size=None, condition=None, fallback=None, add_constraints=None, action=None, endness=None,
         inspect=True, disable_actions=False, ret_on_segv=False):
    """
    Loads size bytes from dst.
    :param addr: The address to load from.
    :param size: The size (in bytes) of the load.
    :param condition: A claripy expression representing a condition for a conditional load.
    :param fallback: A fallback value if the condition ends up being False.
    :param add_constraints: Add constraints resulting from the merge (default: True).
    :param action: A SimActionData to fill out with the constraints.
    :param endness: The endness to load with. # byte order
    ....
def store(self, addr, data, size=None, condition=None, add_constraints=None, endness=None, action=None,
          inspect=True, priv=None, disable_actions=False):
    """
    Stores content into memory.
    :param addr: A claripy expression representing the address to store at.
    :param data: The data to store (claripy expression or something convertible to a claripy expression).
    :param size: A claripy expression representing the size of the data to store.
    ...
>>> s = proj.factory.blank_state()
>>> s.memory.store(0x4000, s.solver.BVV(0x0123456789abcdef0123456789abcdef, 128))
>>> s.memory.load(0x4004, 6) # load-size is in bytes
<BV48 0x89abcdef0123>
The endness parameter sets the byte order.
The available values are:
LE – little endian (least significant byte is stored at lowest address)
BE – big endian (most significant byte is stored at lowest address)
ME – middle endian. Yep.
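The byte-level effect of the store and load shown earlier can be reproduced in plain Python, without angr. This is only a sketch: it assumes memory is a flat map of byte addresses and that the 128-bit value is written big-endian, which is what the load result above implies. The `memory` dict and the `store`/`load` helpers are illustrative names, not angr's memory plugin.

```python
# Toy model of byte-addressed memory; NOT angr's memory plugin.
memory = {}

def store(addr, value, size, byteorder="big"):
    """Write `value` as `size` bytes starting at `addr`."""
    for offset, byte in enumerate(value.to_bytes(size, byteorder)):
        memory[addr + offset] = byte

def load(addr, size, byteorder="big"):
    """Read `size` bytes starting at `addr` and decode them as an integer."""
    raw = bytes(memory[addr + offset] for offset in range(size))
    return int.from_bytes(raw, byteorder)

# Same values as the angr session above: a 128-bit (16-byte) store at 0x4000.
store(0x4000, 0x0123456789abcdef0123456789abcdef, 16)

big = load(0x4004, 6)               # bytes 89 ab cd ef 01 23, decoded as BE
little = load(0x4004, 6, "little")  # the same six bytes, decoded as LE

print(hex(big))     # 0x89abcdef0123
print(hex(little))  # 0x2301efcdab89
```

The big-endian result matches the `<BV48 0x89abcdef0123>` shown above; decoding the same six bytes as little-endian reverses their order.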
>>> s.options.add(angr.options.LAZY_SOLVES)
# Create a new state with lazy solves enabled
>>> s = proj.factory.entry_state(add_options={angr.options.LAZY_SOLVES})
# Create a new state without simplification options enabled
>>> s = proj.factory.entry_state(remove_options=angr.options.simplification)
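State plugins
Apart from the options mentioned earlier, everything in a SimState is stored as a plugin. This design is modular, which makes it easy to maintain and extend.
These plugins are called state plugins, and angr implements many of them internally, such as memory / history / globals / callstack.
The memory plugin was covered earlier (in the memory access section); the history and callstack plugins are introduced briefly below.

The history plugin
This plugin records the execution path of a state. It is in fact a linked list of history nodes, which can be traversed through .parent.
Some of the values stored in history are named history.recent_NAME, and the corresponding iterator is history.NAME.
For example, the following code prints the basic block addresses in order.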
for addr in state.history.bbl_addrs:
    print(hex(addr))
To quickly view all the nodes of the list at once, use .hardcopy, for example state.history.bbl_addrs.hardcopy.
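The linked-list layout of history can be illustrated with a small stand-alone model. The sketch below is not angr's implementation: `HistoryNode` and its fields are hypothetical names chosen only to mirror the `.parent`, `recent_NAME` and `NAME` pattern.

```python
class HistoryNode:
    """Toy stand-in for one history node; hypothetical, not angr's class."""

    def __init__(self, parent=None, recent_bbl_addrs=()):
        self.parent = parent                            # previous node, None at the root
        self.recent_bbl_addrs = list(recent_bbl_addrs)  # blocks run in this step only

    def lineage(self):
        """Return the nodes from the root down to this node."""
        nodes = []
        node = self
        while node is not None:                         # follow the .parent links
            nodes.append(node)
            node = node.parent
        return reversed(nodes)

    @property
    def bbl_addrs(self):
        """Iterate over every block address in execution order."""
        for node in self.lineage():
            yield from node.recent_bbl_addrs


root = HistoryNode()
step1 = HistoryNode(parent=root, recent_bbl_addrs=[0x400000])
step2 = HistoryNode(parent=step1, recent_bbl_addrs=[0x400010, 0x400020])

flat = list(step2.bbl_addrs)    # plays the role of a flattened copy of the list
print([hex(a) for a in flat])
```

Each node stores only what happened in its own step, so forked states can share their common ancestors instead of copying the whole path.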
Below are some of the values stored in history:
Name – Description
history.descriptions – a listing of string descriptions of each of the rounds of execution performed on the state.
history.bbl_addrs – a listing of the basic block addresses executed by the state.
history.jumpkinds – a listing of the disposition of each of the control flow transitions in the state's history, as VEX enum strings.
history.events – a semantic listing of "interesting events" which happened during execution, such as the presence of a symbolic jump condition, the program popping up a message box, or execution terminating with an exit code.
history.actions – usually empty, but if you add the angr.options.refs options to the state, it will be populated with a log of all the memory, register, and temporary value accesses performed by the program.

The callstack plugin
This plugin records stack frame information during execution and is also a linked list. Iterating over state.callstack yields the stack frame information for each call, and accessing state.callstack directly gives the call stack of the current state.
Below is some of the information recorded by callstack:
callstack.func_addr : the address of the function currently being executed
callstack.call_site_addr: the address of the basic block which called the current function
callstack.stack_ptr : the value of the stack pointer from the beginning of the current function
callstack.ret_addr : the location that the current function will return to if it returns
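In addition, angr ships many other built-in state plugins, such as heap, gdb, libc, and filesystem, located in the angr/state_plugins directory.
Besides using the built-in state plugins, we can also write our own; see the documentation for details.

Simulation Managers
The SM was introduced earlier. It lets us control the symbolic execution of a group of states at once. Through stashes, a group of states can be stepped, filtered, merged, and moved.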
>>> simgr = proj.factory.simulation_manager(state)
>>> simgr
<SimulationManager with 1 active>
>>> simgr.move(from_stash='deadended', to_stash='authenticated', filter_func=lambda s: b'Welcome' in s.posix.dumps(1))
>>> simgr
<SimulationManager with 2 authenticated, 1 deadended>
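What the move call above does to the stashes can be sketched in plain Python. This is a toy model under stated assumptions: stashes are a dict of lists, each "state" is reduced to the bytes it wrote to stdout, and the `move` helper is a hypothetical stand-in rather than angr's method.

```python
def move(stashes, from_stash, to_stash, filter_func=None):
    """Move the states that satisfy filter_func from one named list to another."""
    keep, moved = [], []
    for state in stashes.get(from_stash, []):
        if filter_func is None or filter_func(state):
            moved.append(state)
        else:
            keep.append(state)
    stashes[from_stash] = keep
    stashes.setdefault(to_stash, []).extend(moved)
    return stashes

# Each toy "state" is just its stdout contents (made-up sample data).
stashes = {"deadended": [b"Welcome, admin", b"Go away!", b"Welcome, guest"]}
move(stashes, "deadended", "authenticated", lambda out: b"Welcome" in out)

summary = {name: len(states) for name, states in stashes.items()}
print(summary)
```

With this sample data two states end up in `authenticated` and one stays in `deadended`, the same shape as the result shown above.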
angr sorts states into different stashes. Some of the special stashes are listed below:
Name – Description
active – This stash contains the states that will be stepped by default, unless an alternate stash is specified.
deadended – A state goes to the deadended stash when it cannot continue the execution for some reason, including no more valid instructions, unsat state of all of its successors, or an invalid instruction pointer.
pruned – When using LAZY_SOLVES, states are not checked for satisfiability unless absolutely necessary. When a state is found to be unsat in the presence of LAZY_SOLVES, the state hierarchy is traversed to identify when, in its history, it initially became unsat. All states that are descendants of that point (which will also be unsat, since a state cannot become un-unsat) are pruned and put in this stash.
unconstrained – If the save_unconstrained option is provided to the SimulationManager constructor, states that are determined to be unconstrained (i.e., with the instruction pointer controlled by user data or some other source of symbolic data) are placed here.
unsat – If the save_unsat option is provided to the SimulationManager constructor, states that are determined to be unsatisfiable (i.e., they have constraints that are contradictory, like the input having to be both "AAAA" and "BBBB" at the same time) are placed here.

explore
Calling the explore method explores execution paths. The find and avoid parameters can be set during explore to locate a path that matches what we expect.
It is used as follows:
>>> proj = angr.Project('examples/CSCI-4968-MBE/challenges/crackme0x00a/crackme0x00a')
>>> simgr = proj.factory.simgr()
>>> simgr.explore(find=lambda s: b"Congrats" in s.posix.dumps(1))
<SimulationManager with 1 active, 1 found>
>>> s = simgr.found[0] # get the state that explore found to match the condition
>>> flag = s.posix.dumps(0)
>>> print(flag)
g00dJ0B!
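Exploration techniques
angr provides a variety of exploration techniques, that is, the strategies used during path exploration. They can be found under angr.exploration_techniques.
Each strategy is an ExplorationTechnique object; depending on the strategy, methods of ExplorationTechnique such as setup and step are overridden.
A strategy is selected with simgr.use_technique(tech).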
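How a strategy can steer exploration by overriding setup and step is sketched below for a depth-first search. Everything here is a simplified assumption: `ToyManager` and `ToyDFS` are hypothetical classes, states are plain strings, and the real ExplorationTechnique hooks have different signatures.

```python
class ToyManager:
    """Hypothetical stand-in for a simulation manager; not angr's class."""

    def __init__(self, start, successors):
        self.stashes = {"active": [start], "deadended": []}
        self.successors = successors      # maps a state to its child states
        self.technique = None

    def use_technique(self, technique):
        self.technique = technique
        technique.setup(self)             # let the technique prepare its stashes

    def step(self):
        """Advance every active state once, then let the technique adjust stashes."""
        new_active = []
        for state in self.stashes["active"]:
            children = self.successors.get(state, [])
            if children:
                new_active.extend(children)
            else:
                self.stashes["deadended"].append(state)
        self.stashes["active"] = new_active
        if self.technique is not None:
            self.technique.step(self)


class ToyDFS:
    """Keep one active state; park the rest in a 'deferred' stash."""

    def setup(self, manager):
        manager.stashes.setdefault("deferred", [])

    def step(self, manager):
        active = manager.stashes["active"]
        deferred = manager.stashes["deferred"]
        if len(active) > 1:               # defer all but the first state
            deferred.extend(active[1:])
            del active[1:]
        elif not active and deferred:     # current path ended: resume the newest
            active.append(deferred.pop())


tree = {"A": ["B", "C"], "B": ["D"], "C": []}   # made-up state tree
manager = ToyManager("A", tree)
manager.use_technique(ToyDFS())
while manager.stashes["active"]:
    manager.step()
print(manager.stashes["deadended"])
```

The path through B is followed to its end before the deferred state C is resumed, which is the depth-first order.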
Some of the strategies are listed below:
Name – Description
DFS – Depth first search. Keeps only one state active at once, putting the rest in the deferred stash until it deadends or errors.
LengthLimiter – Puts a cap on the maximum length of the path a state goes through.
Tracer – An exploration technique that causes execution to follow a dynamic trace recorded from some other source.
Oppologist – If this technique is enabled and angr encounters an unsupported instruction, it will concretize all the inputs to that instruction and emulate the single instruction using the unicorn engine, allowing execution to continue.
Threading – Adds thread-level parallelism to the stepping process.
Spiller – When there are too many states active, this technique can dump some of them to disk in order to keep memory consumption low.
For usage information, see the API documentation.

Solver engine
The solver engine is accessed through state.solver. angr's solver engine is claripy, which is used to solve constraints.

Bitvectors
A bitvector is a sequence of bits. It can represent either a concrete value or a symbolic variable.
Bitvectors are created with the BVV(value, size) and BVS(name, size) interfaces; FPV and FPS likewise create floating-point values and symbols.
# get a fresh state without constraints
>>> state = proj.factory.entry_state()
>>> input = state.solver.BVS('input', 64)
>>> operation = (((input + 4) * 3) >> 1) + input
>>> output = 200
>>> state.solver.add(operation == output)
>>> state.solver.eval(input)
0x3333333333333381
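The reported solution can be checked by hand with fixed-width arithmetic in plain Python. The sketch assumes 64-bit wraparound and that `>>` on a bitvector is an arithmetic (sign-preserving) shift; under a logical shift this particular value would not satisfy the constraint. The helper names are illustrative only.

```python
MASK = (1 << 64) - 1            # keep every intermediate result to 64 bits

def to_signed(value):
    """Interpret a 64-bit pattern as a two's-complement integer."""
    return value - (1 << 64) if value & (1 << 63) else value

def operation(value):
    """(((value + 4) * 3) >> 1) + value on 64-bit vectors, arithmetic shift."""
    product = ((value + 4) * 3) & MASK
    shifted = (to_signed(product) >> 1) & MASK   # sign bit is copied in
    return (shifted + value) & MASK

candidate = 0x3333333333333381
result = operation(candidate)
print(result)   # 200
```

The final addition overflows 64 bits and wraps around to 200, which is why such a large input is a valid answer.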
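If the constraints conflict and cannot be solved, the state is unsatisfiable. state.satisfiable() checks whether the constraints can be solved.

More solving methods
Besides plain eval, angr provides several other ways to evaluate an expression.
Interface – Description
solver.eval(expression) – returns one possible solution.
solver.eval_one(expression) – returns the solution to the expression; raises an error if more than one solution is possible.
solver.eval_upto(expression, n) – returns up to n solutions, or all of them if there are fewer than n.
solver.eval_exact(expression, n) – returns n solutions; raises an error if the number of solutions is not exactly n.
solver.min(expression) – returns the smallest possible solution.
solver.max(expression) – returns the largest possible solution.
In addition, the extra_constraints and cast_to parameters can be set to restrict or convert the result.

Execution engines
angr uses a series of engines (subclasses of SimEngine) to emulate the effect that the executed code has on an input state. The source code is in the angr/engines directory.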
The default engines are listed below:
Name – Description
failure engine – kicks in when the previous step took us to some uncontinuable state
syscall engine – kicks in when the previous step ended in a syscall
hook engine – kicks in when the current address is hooked
unicorn engine – kicks in when the UNICORN state option is enabled and there is no symbolic data in the state
VEX engine – kicks in as the final fallback.

Analyses
angr has many built-in program analyses, which can be found under angr.analyses.
They are invoked as project.analyses.name, for example project.analyses.CFGFast(). We can also write our own analyses; see the documentation for details.
The table below lists some commonly used analyses.
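Name – Description
CFGFast – quickly recovers the program's control flow graph (static)
CFGEmulated – recovers the program's control flow graph through dynamic emulation
VFG – performs value-set analysis and generates a Value Flow Graph
DDG – data dependence graph
DFG – builds a data flow graph for every basic block that appears in the CFG
BackwardSlice – backward slicing
Identifier – library function identification
The angr documentation covers only three of these techniques: CFG, BackwardSlice, and the function Identifier. To use the others, consult the API reference or the source code, or file an issue with the developers.

CFG
CFGFast recovers the CFG by static analysis; it is fast but less accurate. CFGEmulated recovers the CFG by symbolic execution; it takes longer but is relatively more accurate.
If you are not sure which one to choose, try CFGFast first.
In addition, angr's CFG interface is shorthand for CFGFast; if you need CFGEmulated, call CFGEmulated directly.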
Usage example
Building a BackwardSlice takes the following inputs:
CFG (required): A control flow graph (CFG) of the program. This CFG must be an accurate CFG (CFGEmulated).
Target (required): Target, which is the final destination that your backward slice terminates at.
CDG (optional): A control dependence graph (CDG) derived from the CFG. angr has a built-in analysis CDG for that purpose.
DDG (optional): A data dependence graph (DDG) built on top of the CFG. angr has a built-in analysis DDG for that purpose.
Below is the usage example from the documentation:
>>> import angr
# Load the project
>>> b = angr.Project("examples/fauxware/fauxware", load_options={"auto_load_libs": False})
# Generate a CFG first. In order to generate data dependence graph afterwards, you’ll have to:
# - keep all input states by specifying keep_state=True.
# - store memory, register and temporary values accesses by adding the angr.options.refs option set.
# Feel free to provide more parameters (for example, context_sensitivity_level) for CFG
# recovery based on your needs.
>>> cfg = b.analyses.CFGEmulated(keep_state=True,
...                              state_add_options=angr.sim_options.refs,
...                              context_sensitivity_level=2)
# Build the control dependence graph
>>> cdg = b.analyses.CDG(cfg)
# Build the data dependence graph
>>> ddg = b.analyses.DDG(cfg)
# See where we wanna go... let’s go to the exit() call, which is modeled as a
# SimProcedure.
>>> target_func = cfg.kb.functions.function(name="exit")
# We need the CFGNode instance
>>> target_node = cfg.get_any_node(target_func.addr)
# Let’s get a BackwardSlice out of them!
# `targets` is a list of objects, where each one is either a CodeLocation
# object, or a tuple of CFGNode instance and a statement ID. Setting statement
# ID to -1 means the very beginning of that CFGNode. A SimProcedure does not
# have any statement, so you should always specify -1 for it.
>>> bs = b.analyses.BackwardSlice(cfg, cdg=cdg, ddg=ddg, targets=[ (target_node, -1) ])
# Here is our awesome program slice!
>>> print(bs)
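The idea behind a backward slice, collecting everything the target transitively depends on through control and data dependences, can be shown on a toy graph. This is a sketch of the reachability idea only: the real analysis works on CFG nodes and statements, and the node names and the `backward_slice` helper below are made up for illustration.

```python
from collections import deque

def backward_slice(target, *dependence_graphs):
    """Collect every node the target transitively depends on.

    Each graph maps a node to the nodes it depends on directly.
    """
    seen = {target}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for graph in dependence_graphs:
            for dep in graph.get(node, ()):
                if dep not in seen:
                    seen.add(dep)
                    queue.append(dep)
    return seen

# Hypothetical program; edges point from a node to what it depends on.
cdg = {"call_exit": ["check_pw"], "print_ok": ["check_pw"]}    # control dependences
ddg = {"check_pw": ["read_pw"], "print_ok": ["load_banner"]}   # data dependences

slice_nodes = backward_slice("call_exit", cdg, ddg)
print(sorted(slice_nodes))
```

Nodes that the target never depends on, such as `print_ok` and `load_banner` here, are left out of the slice.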
Function identifier
The function identifier recognizes library functions; at present it only works on CGC binaries.
>>> import angr
# get all the matches
>>> p = angr.Project("../binaries/tests/i386/identifiable")
>>> idfer = p.analyses.Identifier()
# note that .run() yields results so make sure to iterate through them or call list() etc
>>> for addr, symbol in idfer.run():
...     print(hex(addr), symbol)
0x8048e60 memcmp
0x8048ef0 memcpy
0x8048f60 memmove
0x8049030 memset
0x8049320 fdprintf
0x8049a70 sprintf
0x8049f40 strcasecmp
....