Simulating Slices of iOS Apps
In 2019, I built a work-for-hobby iOS simulator on a strict regimen of weekends and coffee. While the full details of this project will stay in-house, there’s enough I can share to hopefully be interesting!
First up, here’s what this looks like running against a simple demo app. On the right is the bona-fide iOS simulator from Xcode, and on the left is the simulator I built.
Simulating Opcodes
The heart of this simulator is a classic interpreter runloop. Each tick, the next opcode is parsed from the binary and interpreted to update the virtual machine state. Here’s a simple opcode handler:
exec_context.py
@mnemonic_handler(["add"])
def add(exec_context: ExecContext, instr: CsInsn) -> None:
    dest = AArch64Interpreter._op0_storage(exec_context, instr, Register)
    offset_op = instr.operands[2]
    if offset_op.type == ARM64_OP_IMM:
        offset_imm = offset_op.value.imm
    elif offset_op.type == ARM64_OP_REG:
        offset_reg = AArch64Interpreter._op2_storage(
            exec_context,
            instr,
            expected_type=Register,
        )
        offset_imm = offset_reg.read(ConstantValue).value()
    else:
        raise NotImplementedError(f'Unknown operand type: {offset_op.type}')
    source = AArch64Interpreter._op1_storage(
        exec_context,
        instr,
        expected_type=Register,
    )
    source_val = source.read(ConstantValue).value()
    exec_context.set_imm(dest, source_val + offset_imm)
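To make the runloop idea concrete, here is a minimal sketch of how a decorator-based handler registry and a dispatch loop might fit together. It is not the project's code: ToyContext, HANDLERS, and the tuple-based instruction format are hypothetical stand-ins, and the real simulator decodes instructions with Capstone rather than reading them from a list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in: an instruction is just (mnemonic, operands).
Instruction = Tuple[str, Tuple]


@dataclass
class ToyContext:
    registers: Dict[str, int] = field(default_factory=dict)
    pc: int = 0


Handler = Callable[[ToyContext, Tuple], None]
HANDLERS: Dict[str, Handler] = {}


def mnemonic_handler(mnemonics: List[str]) -> Callable[[Handler], Handler]:
    """Register a function as the handler for one or more mnemonics."""
    def decorator(func: Handler) -> Handler:
        for mnemonic in mnemonics:
            HANDLERS[mnemonic] = func
        return func
    return decorator


@mnemonic_handler(["mov"])
def mov(ctx: ToyContext, operands: Tuple) -> None:
    dest, imm = operands
    ctx.registers[dest] = imm


@mnemonic_handler(["add"])
def add(ctx: ToyContext, operands: Tuple) -> None:
    dest, source, imm = operands
    ctx.registers[dest] = ctx.registers[source] + imm


def run(program: List[Instruction]) -> ToyContext:
    """One tick per instruction: fetch, dispatch, advance the program counter."""
    ctx = ToyContext()
    while ctx.pc < len(program):
        mnemonic, operands = program[ctx.pc]
        handler = HANDLERS.get(mnemonic)
        if handler is None:
            raise NotImplementedError(f"Unknown mnemonic: {mnemonic}")
        handler(ctx, operands)
        ctx.pc += 1
    return ctx


final = run([("mov", ("x1", 40)), ("add", ("x0", "x1", 2))])
print(final.registers["x0"])  # 42
```

Keeping the registry separate from the loop means that supporting a new opcode is just a matter of adding another decorated function.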
The ldr and str instruction families were by far the most gruesome to pin down, as they both come in a variety of flavors and modes. The simulator needs to handle a slew of load and store variants: immediate, pre-indexed, pre-indexed writeback, and post-indexed writeback, just to name a handful. These were the largest of the instruction handlers, and small mistakes in them led to subtle bugs.
test_aarch64_load_instructions.py
# Load/store addressing rules (found through experimentation)
# * You can only use writeback if there's an integer offset
# * Pre-index addressing mode must use an integer offset
# * You cannot use pre- and post- indexed addressing in the same instruction
# * You cannot write-back in pre-indexed addressing
# * You cannot write-back without an int offset
# * The destination register may not be the stack pointer
# * ldp may not have a register offset in either addressing mode
#
# The tests below cover these `ldr` variants:
# ldr x0, #0x10000
# ldr x0, [x1]
# ldr x0, [x1], #0x10
# ldr x0, [x1, #0x10]
# ldr x0, [x1, #0x10]!
# ldr x0, [x1, x2]
# ldr x0, [sp]
# ldr x0, [sp], #0x10
# ldr x0, [sp, #0x10]
# ldr x0, [sp, #0x10]!
# ldr x0, [sp, x1]
# ldr x0, [x1, x2, lsl #3]
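As a rough illustration of why these variants are easy to get subtly wrong, here is a sketch of the three immediate-offset addressing modes. The helper names (AddressingMode, resolve_address, ldr) and the dict-based registers and memory are assumptions made for this example, not the simulator's API. Register offsets, shifts, and literal loads are left out.

```python
from enum import Enum
from typing import Dict, Tuple

MASK64 = (1 << 64) - 1


class AddressingMode(Enum):
    OFFSET = "offset"          # ldr x0, [x1, #0x10]
    PRE_INDEX = "pre-index"    # ldr x0, [x1, #0x10]!
    POST_INDEX = "post-index"  # ldr x0, [x1], #0x10


def resolve_address(base: int, offset: int, mode: AddressingMode) -> Tuple[int, int]:
    """Return (address to load from, new value of the base register)."""
    if mode is AddressingMode.OFFSET:
        # Base register is left untouched
        return (base + offset) & MASK64, base
    if mode is AddressingMode.PRE_INDEX:
        # Base is updated first, and the updated value is the load address
        updated = (base + offset) & MASK64
        return updated, updated
    if mode is AddressingMode.POST_INDEX:
        # Load from the original base, then update it
        return base, (base + offset) & MASK64
    raise NotImplementedError(f"Unknown addressing mode: {mode}")


def ldr(
    regs: Dict[str, int],
    memory: Dict[int, int],
    dest: str,
    base_reg: str,
    offset: int,
    mode: AddressingMode,
) -> None:
    address, new_base = resolve_address(regs[base_reg], offset, mode)
    # Write the base first so that `ldr x0, [x0, #8]` keeps the loaded value
    regs[base_reg] = new_base
    regs[dest] = memory[address]


memory = {0x1000: 0xAA, 0x1010: 0xBB}
regs = {"x0": 0, "x1": 0x1000}
ldr(regs, memory, "x0", "x1", 0x10, AddressingMode.POST_INDEX)
print(hex(regs["x0"]), hex(regs["x1"]))  # 0xaa 0x1010
```

The only difference between the modes is when the offset is applied and whether the base register keeps it, which is exactly the kind of detail that is easy to swap by accident.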
To get real-world code to work, I needed to implement all sorts of wacky opcodes interacting with floating point and SIMD registers.
exec_context.py
@mnemonic_handler(["scvtf", "ucvtf"])
def scvtf(exec_context: ExecContext, instr: CsInsn) -> None:
    dest = AArch64Interpreter._op0_storage(exec_context, instr, expected_type=SIMDRegister)
    source = AArch64Interpreter._op1_storage(exec_context, instr, expected_type=Register)
    source_val = source.read(ConstantValue).value()
    converted_val = int.from_bytes(struct.pack("d", source_val), "little")
    dest.write(ConstantValue(converted_val))
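The core of that handler is reinterpreting a number as the raw bits of a double. Here is a standalone sketch of that conversion using only the standard library. The helper names are made up for this example. The as_signed64 helper shows the one place where scvtf (signed) and ucvtf (unsigned) would differ if they were handled separately.

```python
import struct


def double_bits(value: int) -> int:
    """Convert an integer to a double and return its 64-bit IEEE 754 encoding."""
    # "<d" pins the byte order so the result does not depend on the host
    return int.from_bytes(struct.pack("<d", float(value)), "little")


def bits_to_double(bits: int) -> float:
    """Inverse of double_bits: decode a 64-bit pattern back into a float."""
    return struct.unpack("<d", bits.to_bytes(8, "little"))[0]


def as_signed64(raw: int) -> int:
    """Reinterpret a raw 64-bit register value as a signed integer."""
    return raw - (1 << 64) if raw & (1 << 63) else raw


print(hex(double_bits(1)))                            # 0x3ff0000000000000
print(bits_to_double(double_bits(-3)))                # -3.0
print(double_bits(as_signed64(0xFFFFFFFFFFFFFFFF)))   # bits of -1.0
```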
VM Architecture
The simulator was built on a fundamental approximation of the von Neumann architecture: every piece of data was a Variable, and a Variable is always held in a VariableStorage. A CPU register is one kind of VariableStorage, and a memory cell is another.
This design was a misstep that I’d rework if I came back to the project: modelling each memory word as an object’s entire storage cell precludes any sensible possibility of operating on a buffer of bytes and modifying data across word boundaries. The strategy I chose, however, does work pretty well for application code that just stores pointers to heap-allocated objects in memory words and sends messages to them.
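A stripped-down sketch of that data model might look like the following. The class names Variable, VariableStorage, Register, and ConstantValue come from the text and snippets above. MemoryCell, the constructors, and the simplified read() signature are assumptions for illustration.

```python
from typing import Dict, Optional


class Variable:
    """Anything that can live in a storage cell."""


class ConstantValue(Variable):
    def __init__(self, raw: int) -> None:
        self._raw = raw

    def value(self) -> int:
        return self._raw


class VariableStorage:
    """A cell holding at most one Variable at a time."""

    def __init__(self) -> None:
        self._contents: Optional[Variable] = None

    def write(self, variable: Variable) -> None:
        self._contents = variable

    def read(self) -> Optional[Variable]:
        return self._contents


class Register(VariableStorage):
    def __init__(self, name: str) -> None:
        super().__init__()
        self.name = name


class MemoryCell(VariableStorage):
    def __init__(self, address: int) -> None:
        super().__init__()
        self.address = address


x0 = Register("x0")
# Memory is a map from word-aligned addresses to whole cells
memory: Dict[int, MemoryCell] = {0x1000: MemoryCell(0x1000)}

x0.write(ConstantValue(0xCAFE))
memory[0x1000].write(x0.read())
print(hex(memory[0x1000].read().value()))  # 0xcafe
```

The limitation is visible in the memory map: there is no cell at 0x1001, so a single-byte store into the middle of a word has nothing to write to.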
Symbol Modeling
Eventually, the simulated code is going to branch and, worse yet, branch to something imported from another binary. Instead of building a full dynamic linker, I added special support to the bl mnemonic handler to perform bespoke operations when certain branches were performed.
For example, if the simulator saw that a bl was being performed to the _random imported symbol, it could trap into its own in-house random() implementation.
modelled_functions.py
@modelled_function(["_random"])
def _random(simulator: "Simulator", instr: ObjcUnconditionalBranchInstruction) -> None:
    r = random.randint(0, (2 ** 31) - 1)
    simulator.current_exec_context.register("x0").write(ConstantValue(r))
Much more interesting, though, is _objc_msgSend.
modelled_functions.py
@modelled_function(["_objc_msgSend"])
def _objc_msgSend(simulator: "Simulator", instr: ObjcUnconditionalBranchInstruction) -> None:
    selname_ptr = simulator.current_exec_context.register("x1").read(ConstantValue).value()
    selector_name = simulator.binary.read_string_at_address(selname_ptr)
    if not selector_name:
        raise ValueError(f"Failed to find messaged selector {selname_ptr}")
    receiver = simulator.current_exec_context.deref_reg("x0")
    is_classmethod = isinstance(receiver, ObjcMetaclass)
    # More below...
This implementation would produce fake objects on a virtual heap, instead of simulating the real Objective-C runtime. I ended up with my own itty bitty standard library.
objc_class.py
class NSNumber(NSObject):
    CLASS_NAME = "_OBJC_CLASS_$_NSNumber"

    def __init__(self, binary: MachoBinary, machine: ExecContext, selector_name: str) -> None:
        super().__init__(
            binary,
            machine,
            class_name=self.CLASS_NAME,
            selector_name=selector_name,
        )
        self.number: Optional[int] = None

    def __eq__(self, other: Any) -> bool:
        if not issubclass(type(other), NSNumber):
            return False
        return self.number == cast(NSNumber, other).number

    def __repr__(self) -> str:
        return f"[@{self.number}]"

    @classmethod
    @implements_objc_class_methods([
        "numberWithInt:",
        "numberWithBool:",
        "numberWithUnsignedInt:",
        "numberWithUnsignedInteger:",
    ])
    def number_with_int(
        cls,
        binary: MachoBinary,
        selector_name: str,
        machine: ExecContext,
    ) -> "NSNumber":
        num = cls(binary, machine, selector_name)
        val = machine.register("x2").read(ConstantValue).value()
        # Validate that we've received a number literal and not something unexpected
        if machine.is_val_pointer(val):
            raise WrongVariableClassError(
                f"+[NSNumber numberWithInt:{hex(val)}] "
                f"({machine.deref_mem(VirtualMemoryPointer(val))})"
            )
        num.number = machine.register("x2").read(ConstantValue).value()
        return num
I also wrote implementations of funny constructors✱ like +[NSDictionary dictionaryWithObjectsAndKeys:]. It was interesting to see how these kinds of call sites work under the hood!
✱ Note
__objc_dictobj and __objc_arraydata.
In this case, the compiler will place the first argument of the variadic list in x2. The compiler will arrange the rest of the arguments on the stack, starting with x2’s corresponding key. It’s the implementation’s responsibility to iterate the list on the stack, alternately popping off values and keys, until a NULL is reached.
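The walk over the variadic list can be sketched as follows. This is an illustrative stand-in rather than the simulator's implementation: the function name is hypothetical, pointers are plain integers, and the stack is modelled as a list of words that have already been read.

```python
from typing import Dict, List

NULL = 0


def collect_objects_and_keys(first_object: int, stack_words: List[int]) -> Dict[int, int]:
    """Pair up object and key pointers in the order the call site lists them.

    `first_object` is the value passed in x2; `stack_words` holds the rest of
    the variadic list: key, object, key, ..., NULL.
    """
    result: Dict[int, int] = {}
    words = iter(stack_words)
    obj = first_object
    while obj != NULL:
        key = next(words, NULL)
        if key == NULL:
            raise ValueError("Found an object without a matching key")
        result[key] = obj
        obj = next(words, NULL)
    return result


# [NSDictionary dictionaryWithObjectsAndKeys:obj1, key1, obj2, key2, nil]
pairs = collect_objects_and_keys(0x10, [0x20, 0x30, 0x40, NULL])
print({hex(k): hex(v) for k, v in pairs.items()})  # {'0x20': '0x10', '0x40': '0x30'}
```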
The Mach-O VM loader never got quite up to the standards of the real thing. Instead of faithfully following whatever was described in the Mach-O, I implemented specific support for mapping various bits of the binary as the need arose:
exec_context.py
# Map Objective-C selector strings
logging.debug("VM mapping __objc_selrefs...")
selref_sect = macho_binary.section_with_name("__objc_selrefs", "__DATA")
if selref_sect:
    for raw_addr in range(selref_sect.address, selref_sect.end_address, sizeof(c_uint64)):
        addr = VirtualMemoryPointer(raw_addr)
        # Just copy the value directly from the real binary into VM memory
        self.map_static_binary_data(addr, ConstantValue(macho_binary.read_word(addr)))
        self.memory(addr).set_readonly()
Debugger
Simulators and emulators are notoriously difficult to debug, as often the errors only become visible in the higher-level logic of whatever you’re simulating.
I built a debugger that allowed me to run lldb on a real iOS device on one side, and the simulator on another, and run each forwards until the register or memory states of the simulator diverged from the real thing.
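The comparison at the heart of that tool can be sketched in a few lines. This assumes the register states from both sides have already been captured as plain dictionaries, one per executed instruction. How they are pulled out of lldb is not shown, and first_divergence is a hypothetical name.

```python
from typing import Dict, Iterable, Optional, Tuple

RegisterState = Dict[str, int]


def first_divergence(
    reference: Iterable[RegisterState],
    simulated: Iterable[RegisterState],
) -> Optional[Tuple[int, str, int, Optional[int]]]:
    """Return (step, register, expected, actual) for the first mismatch, if any."""
    for step, (ref, sim) in enumerate(zip(reference, simulated)):
        for name in sorted(ref):
            if sim.get(name) != ref[name]:
                return step, name, ref[name], sim.get(name)
    return None


device_trace = [{"x0": 1, "x1": 2}, {"x0": 3, "x1": 2}]
simulator_trace = [{"x0": 1, "x1": 2}, {"x0": 3, "x1": 5}]
print(first_divergence(device_trace, simulator_trace))  # (1, 'x1', 2, 5)
```

Stopping at the first mismatch matters: once one register is wrong, everything computed from it is wrong too, so only the earliest divergence points at the buggy handler.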
Since this is a virtual environment, it was also straightforward for me to snapshot the machine state at every instruction, which facilitated reverse debugging (‘time travelling backwards’) to any previous execution point. I called this ‘visit mode’, since the REPL allowed you to run all the normal inspection commands (read, examine, print, etc.) as if execution was paused at a previous instruction pointer value.
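Snapshotting at every instruction can be as simple as deep-copying the machine state into a list. The sketch below assumes the state is a nested dictionary, and SnapshotHistory is a made-up name. The real simulator's state objects are certainly richer than this.

```python
import copy
from typing import Any, Dict, List

MachineState = Dict[str, Any]


class SnapshotHistory:
    """Keeps an independent copy of the machine state for every executed step."""

    def __init__(self) -> None:
        self._snapshots: List[MachineState] = []

    def record(self, state: MachineState) -> int:
        """Store a copy of the current state and return its step index."""
        self._snapshots.append(copy.deepcopy(state))
        return len(self._snapshots) - 1

    def visit(self, step: int) -> MachineState:
        """Return a copy of the state as it was at an earlier step."""
        return copy.deepcopy(self._snapshots[step])


history = SnapshotHistory()
state = {"pc": 0x100, "registers": {"x0": 0}}
first = history.record(state)

# The live state keeps changing after the snapshot is taken
state["pc"] = 0x104
state["registers"]["x0"] = 0xCAFE
second = history.record(state)

print(history.visit(first))  # {'pc': 256, 'registers': {'x0': 0}}
```

Copying on both record and visit keeps inspection commands from accidentally mutating history, at the cost of memory that grows with every instruction.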
Dynamic Linker
Eventually, I did branch out to truly mapping and invoking other binaries! In this tiny demo, a binary loads a framework and successfully branches to one of its exported symbols:
I then moved on to CoreFoundation, and wrote a pile of hacks to get it running. In this demo, I’m dynamically loading CoreFoundation, and the runtime is creating real ObjC strings and arrays!
My lldb comparison tool was essential here. I found that I needed to execute the routines specified by LC_ROUTINES_64 so that CoreFoundation had an opportunity to create its Objective-C classes and populate them in __CFRuntimeObjCClassTable. CoreFoundation queried this table when trying to use _CFAllocatorAllocate.
It was incredible to watch the simulated CoreFoundation bootstrap the runtime by calling things like object_setClass() on __NSCFArray! I also found that I could force CoreFoundation to do everything in-house, instead of shelling out to Foundation, by patching the memory in CoreFoundation’s ___FoundationPresent.present flag to 0✱.
✱ Note
I may have some of the CoreFoundation details wrong here, as it’s been a few years. Please feel welcome to send a correction if anything doesn’t sound right!
Testing the Simulator
I wrote a unit testing harness that allowed me to thoroughly test the implementations of these mnemonics, particularly trickier ones like ldr and str.
test_aarch64_branch_instructions.py
# Given an unconditional branch-with-link to a label
source = """
; Move a sentinel value into x30 so we can verify it's restored after returning from a subroutine call
mov x30, 0xbeef
; Branch to a subroutine
bl SubroutineLabel
; Raise a breakpoint after the above call returns
brk #0x1
SubroutineLabel:
; Move some sentinel data so we can verify that this subroutine ran
mov x0, 0xcafe
; Return to caller
ret
; This should not run
brk #0x2
"""
# When I simulate the code and encounter a breakpoint
breakpoint_exc = simulate_until_breakpoint(source)
# Then an exception with code 1 has been raised
assert breakpoint_exc.exception_code == 1
# And the subroutine has run
assert breakpoint_exc.exec_context.register("x0").read(ConstantValue).value() == 0xcafe
# And the link register has been modified
assert breakpoint_exc.exec_context.register("x30").read(ConstantValue).value() != 0xbeef
I locked down all sorts of behavior, such as how comparisons set the condition flags:
test_aarch64_compare_instructions.py
# Given I evaluate a negative comparison of a number and itself in another register
source = """
mov x2, #0x800
mov x3, #-0x800
cmn x2, x3
"""
# When I simulate the code
with simulate_assembly(source) as ctxs:
    # The correct status flags are set
    assert ctxs[0].condition_flags == {
        ConditionFlag.EQUAL,
        ConditionFlag.LESS_EQUAL,
        ConditionFlag.GREATER_EQUAL,
    }
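To show where those three conditions come from, here is a small sketch that computes the NZCV flags for cmn (a 64-bit addition that discards its result) and derives the conditions that hold afterwards. The function names and the string-keyed flag dictionary are assumptions for this example. The simulator evidently stores a set of ConditionFlag values instead.

```python
from typing import Dict, Set

MASK64 = (1 << 64) - 1
SIGN_BIT = 1 << 63


def cmn_flags(lhs: int, rhs: int) -> Dict[str, bool]:
    """Compute NZCV for `cmn lhs, rhs`, which sets flags from lhs + rhs."""
    a, b = lhs & MASK64, rhs & MASK64
    total = a + b
    result = total & MASK64
    return {
        "N": bool(result & SIGN_BIT),  # result is negative
        "Z": result == 0,              # result is zero
        "C": total > MASK64,           # unsigned carry out of bit 63
        # signed overflow: operands share a sign that the result lacks
        "V": bool(~(a ^ b) & (a ^ result) & SIGN_BIT),
    }


def satisfied_conditions(flags: Dict[str, bool]) -> Set[str]:
    """Derive a few of the condition codes that hold for the given flags."""
    conditions = set()
    if flags["Z"]:
        conditions.add("eq")
    if flags["N"] == flags["V"]:
        conditions.add("ge")
    if flags["Z"] or flags["N"] != flags["V"]:
        conditions.add("le")
    return conditions


flags = cmn_flags(0x800, -0x800)
print(flags)
print(sorted(satisfied_conditions(flags)))  # ['eq', 'ge', 'le']
```

Adding 0x800 to -0x800 wraps around to zero, so Z is set and N equals V, which is why equal, less-or-equal, and greater-or-equal all hold at once.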
Propagating Unknown Data
The goal of this project wasn’t to simulate an application from start to finish, but rather to simulate a specific tree of execution to make decisions about the code’s behavior.
This means that the simulator natively works with lots of unknown (‘unscoped’) data. As an example, any arguments to the simulator’s chosen entry point will definitely be unscoped by the simulation.
This unscoped data is represented by one of a few special types, such as FunctionArgumentObject and NonLocalVariable. These objects proliferate themselves when the simulated code tries to use them. For example, sending a message or accessing a field of a NonLocalVariable will spawn a NonLocalDataLoad as an output. It’s all a pile of hacks, but it works well enough.
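The way unknown values spread can be sketched like this. The class names NonLocalVariable and NonLocalDataLoad come from the text, but the subclass relationship, the method names, and the description strings are assumptions made purely for illustration.

```python
class NonLocalVariable:
    """A value the simulation cannot know, such as an argument to the entry point."""

    def __init__(self, description: str) -> None:
        self.description = description

    def send_message(self, selector: str) -> "NonLocalDataLoad":
        # The result of messaging an unknown object is itself unknown
        return NonLocalDataLoad(f"[{self.description} {selector}]")

    def load_field(self, field_name: str) -> "NonLocalDataLoad":
        # Likewise for reading a field out of an unknown object
        return NonLocalDataLoad(f"{self.description}->{field_name}")

    def __repr__(self) -> str:
        return f"<{type(self).__name__} {self.description}>"


class NonLocalDataLoad(NonLocalVariable):
    """Unknown data derived from other unknown data."""


arg = NonLocalVariable("arg0")
derived = arg.send_message("delegate").load_field("_name")
print(derived)  # <NonLocalDataLoad [arg0 delegate]->_name>
```

Recording how each unknown was derived keeps the output readable: a value that cannot be computed can still be described by the chain of operations that produced it.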
Sometimes, the simulated code will, fairly, try to access an ivar from an object that was created through the simulator’s fake Objective-C runtime. This would yield an UninitializedVariable instance if we didn’t do anything else, which is no fun and typically causes the simulated code to complain. So, the simulator will walk the ivar table and instantiate dummy NSObjects to pop into these fields.
It’s difficult to say what to do when the simulated code performs a conditional branch that relies on unscoped data. So, I made the simulator split into two trees of execution: one where the condition was true, and one where it was false. Both paths would then be followed.
This caused lots of wasted work, because every path where if ((self = [super init])) {} fails was simulated. I eventually improved things such that execution was only split in two when necessary. This allowed me to correctly follow conditional branches when all the implicated data was available.
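The "only split when necessary" rule boils down to a three-way decision. In this sketch, the condition is True or False when every input was known, and None when it depended on unscoped data. The function name and the use of None as the unknown marker are assumptions for this example.

```python
from typing import List, Optional


def branch_successors(condition: Optional[bool], taken: int, fallthrough: int) -> List[int]:
    """Return the addresses that execution should continue at after a conditional branch."""
    if condition is None:
        # The outcome depends on unknown data, so both paths must be explored
        return [taken, fallthrough]
    # The outcome is known, so only one path is worth simulating
    return [taken if condition else fallthrough]


print(branch_successors(True, 0x200, 0x104))   # [512]
print(branch_successors(None, 0x200, 0x104))   # [512, 260]
```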
Observing Results
A simulator isn’t much good without some way to observe what the simulated code is doing. The simulated code’s output is obvious in the GUI-centric demo up top, but it’s less clear how this works when we’re simulating code without any UI.
For my use case, I wanted to observe the system state at various instruction pointer values. This allowed me to construct human-consumable stack traces with all the dynamic arguments to each function filled in.
The simulator API allowed the programmer to specify all the instruction pointer values that they were interested in observing. The simulator would then follow all the execution trees, splitting off into subtrees with different sets of constraints when the simulator wasn’t sure which direction of a conditional was correct. At the end, all the machine snapshots across all the possible execution trees were returned to the programmer, who could then inspect the register and memory state at each snapshot.
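Here is a toy version of that API, a minimal sketch under heavy assumptions: the "program" is a list of made-up tuple instructions, the program counter is a list index, unknown registers are None, and only forward branches are supported. It follows every execution path and returns a snapshot each time a watched program counter is reached.

```python
from typing import Dict, List, Optional, Set, Tuple

Registers = Dict[str, Optional[int]]  # None marks an unknown value
Snapshot = Tuple[int, Registers]


def explore(program: List[Tuple], entry_regs: Registers, watched: Set[int]) -> List[Snapshot]:
    """Follow every execution path, recording the registers at watched PCs."""
    snapshots: List[Snapshot] = []
    worklist: List[Snapshot] = [(0, dict(entry_regs))]
    while worklist:
        pc, regs = worklist.pop()
        while pc < len(program):
            if pc in watched:
                snapshots.append((pc, dict(regs)))
            op = program[pc]
            if op[0] == "set":          # ("set", register, value)
                regs[op[1]] = op[2]
                pc += 1
            elif op[0] == "cbz":        # ("cbz", register, target)
                value = regs.get(op[1])
                if value is None:
                    # Unknown: queue the taken path with the constraint applied,
                    # and keep following the fall-through path here
                    taken_regs = dict(regs)
                    taken_regs[op[1]] = 0
                    worklist.append((op[2], taken_regs))
                    pc += 1
                elif value == 0:
                    pc = op[2]
                else:
                    pc += 1
            else:
                raise NotImplementedError(f"Unknown op: {op[0]}")
    return snapshots


program = [
    ("cbz", "x0", 2),   # 0: skip the next instruction if x0 is zero
    ("set", "x1", 1),   # 1
    ("set", "x2", 7),   # 2: the point we want to observe
]
for snapshot in explore(program, {"x0": None}, watched={2}):
    print(snapshot)
```

The watched address is reached twice, once per path, and each snapshot carries the constraints that were assumed on the way there.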
Fun Bugs
Since the simulator runs untrusted code, I wanted to make sure that the simulator could gracefully handle infinite loops in the simulated code. I added a basic loop detector to ensure that the simulator always terminated.
One day, though, the simulator got stuck despite my initial efforts.
Like I mentioned above, the simulator had special support for certain functions that I had specifically modeled. For functions undefined by the simulator, though, the simulator would just pop a NonLocalVariable into x0 and carry on.
This works fine, unless the function being called is abort()! In this case, a code path tried to abort(), then the code that happened to immediately follow ended up doing a backwards jump. The result was the world’s most polite infinite loop, in which the client code asked the simulator to stop on every iteration, but the request fell on deaf ears.
The fix here was simple: I modeled abort() to terminate the current execution path, and improved my loop detector.
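Both halves of that fix can be sketched on a toy program. None of this is the simulator's code: the tuple instructions, PathTerminated, and the visit-count heuristic are assumptions, and the post does not describe how the real loop detector works.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


class PathTerminated(Exception):
    """Raised by a modelled function that never returns, such as abort()."""


def model_abort() -> None:
    raise PathTerminated("abort() called")


# Hypothetical registry of specially modelled functions
MODELLED_FUNCTIONS: Dict[str, Callable[[], None]] = {"_abort": model_abort}


def simulate_path(program: List[Tuple], max_visits_per_pc: int = 100) -> str:
    """Run one execution path and report how it ended."""
    visits: Counter = Counter()
    pc = 0
    while pc < len(program):
        visits[pc] += 1
        if visits[pc] > max_visits_per_pc:
            return "loop detected"
        op = program[pc]
        if op[0] == "call":             # ("call", symbol_name)
            model = MODELLED_FUNCTIONS.get(op[1])
            if model is not None:
                try:
                    model()
                except PathTerminated:
                    return "terminated"
            # Unmodelled functions are treated as returning something unknown
            pc += 1
        elif op[0] == "jump":           # ("jump", target)
            pc = op[1]
        else:
            raise NotImplementedError(f"Unknown op: {op[0]}")
    return "finished"


print(simulate_path([("call", "_abort"), ("jump", 0)]))    # terminated
print(simulate_path([("call", "_mystery"), ("jump", 0)]))  # loop detected
```

With abort() modelled, the path ends at the call. Without a model, the same shape of code only stops because the visit counter gives up.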
This project offered many satisfying problems along the way. It’s always fun to paint yourself into a big system with its own quirks and constraints, then find ways out of them. Thanks for following along!