07 Nov

Knowing if and when you can fit a JMP5 binary hook.

First, an interesting read on API hooking methods: http://help.madshi.net/ApiHookingMethods.htm

Traditionally, and perhaps most logically, a function hook is made by overwriting the code entry point with a 5-byte JMP instruction that takes a 32-bit relative offset.
IMHO this is the “bread and butter” of binary hooking.
madCodeHook actually uses a 6-byte JMP instruction with a 32-bit absolute address.
(Incidentally, some people have been known to resort to rather unusual instruction combinations of various lengths in an attempt to hide from anti-hack detection.)
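For reference, here is a minimal Python sketch of how the two patch sizes are commonly encoded on 32-bit x86. The JMP5 form is opcode E9 followed by a signed 32-bit displacement. That displacement is measured from the end of the 5-byte instruction. For the 6-byte absolute form, this sketch assumes the indirect FF 25 encoding (JMP through a pointer). That is one common 6-byte choice, but it is not necessarily the exact sequence madCodeHook emits; PUSH imm32 / RET is another 6-byte option. All function names here are hypothetical.

```python
import struct

def encode_jmp5(src: int, dst: int) -> bytes:
    """Encode E9 rel32: a 5-byte JMP placed at address src that lands on dst."""
    # The displacement is relative to the next instruction (src + 5)
    # and wraps modulo 2**32 on 32-bit x86.
    rel = (dst - (src + 5)) & 0xFFFFFFFF
    return b"\xE9" + struct.pack("<I", rel)

def decode_rel32_target(src: int, insn: bytes) -> int:
    """Recover the destination of a 5-byte E8/E9 rel32 instruction at src."""
    rel, = struct.unpack_from("<i", insn, 1)   # signed displacement
    return (src + 5 + rel) & 0xFFFFFFFF

def encode_jmp6_indirect(ptr_addr: int) -> bytes:
    """Encode FF 25 m32: JMP [ptr_addr], an absolute jump through a pointer.

    Assumption: the 4 bytes stored at ptr_addr hold the real hook address.
    """
    return b"\xFF\x25" + struct.pack("<I", ptr_addr & 0xFFFFFFFF)

# Entry CALL of CloseProfileUserMapping from the kernel32 log (listed at offset 0):
print(hex(decode_rel32_target(0, bytes.fromhex("E80EFDFEFF"))))  # 0xfffefd13
```

The decoded value matches the function-relative "CALL 0xFFFEFD13" shown in the kernel32.dll sample log.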

Our number one problem is when the function to be hooked is less than 5 bytes in size.
One solution is to use a one-byte exception hook instead, i.e. a one-byte opcode such as INT3 (0xCC).
This works well (with the addition of a custom exception handler), although the exception overhead is costly compared to the few cycles of a JMP5.
madCodeHook uses its “mixture mode” for some of these cases. That applies only to API hooks, of course, since regular code functions have no import/export table entries.
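The fallback decision above comes down to a size check. The following Python sketch is only illustrative. The names choose_hook and install_int3 are hypothetical. The `room` value is assumed to come from a code analyzer. The exception handler that would actually service the INT3 is out of scope.

```python
INT3 = 0xCC

def choose_hook(room: int, jmp_size: int = 5) -> str:
    """Pick a patch type given how many entry bytes can safely be overwritten."""
    if room >= jmp_size:
        return "JMP"     # fast path: only a few cycles per call
    if room >= 1:
        return "INT3"    # 1-byte patch; needs a custom exception handler
    return "NONE"

def install_int3(code: bytearray, offset: int) -> int:
    """Overwrite one byte with INT3 and return the original byte.

    The exception handler (not shown) would need the saved byte to
    emulate or restore the original instruction before resuming.
    """
    original = code[offset]
    code[offset] = INT3
    return original

stub = bytearray.fromhex("33C0C3")   # XOR EAX, EAX / RET: only 3 bytes
print(choose_hook(len(stub)))        # INT3
```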

The second problem is relative branch (JMP or CALL) instructions.
If the entry instruction is a branch with a 32-bit relative offset, it's not a problem, since that is trivial to recreate in our proxy code. But if the relative branch is a later instruction within the bytes being overwritten, or if it uses a smaller (8- or 16-bit) offset, it becomes a nontrivial problem.
Handling it would take some serious analysis and proxy code generation, if it is possible at all.
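To make the distinction concrete, here is a Python sketch of relocating a single relative branch into proxy (trampoline) code. A rel32 JMP or CALL only needs its displacement recomputed for the new address. An 8-bit JMP or conditional jump cannot be copied byte for byte. In this sketch it is widened to its rel32 form, which changes its length, and therefore the layout of everything after it in the proxy. `relocate_branch` is a hypothetical helper. It covers only opcodes E8, E9, EB and 70-7F. It does not handle 16-bit operand-size forms or branches whose targets land inside the overwritten bytes.

```python
import struct

def relocate_branch(insn: bytes, old_addr: int, new_addr: int) -> bytes:
    """Re-encode a relative branch copied from old_addr so it runs at new_addr."""
    op = insn[0]
    if op in (0xE8, 0xE9):                      # CALL/JMP rel32 (5 bytes)
        rel, = struct.unpack_from("<i", insn, 1)
        target = old_addr + 5 + rel
        new_rel = (target - (new_addr + 5)) & 0xFFFFFFFF
        return bytes([op]) + struct.pack("<I", new_rel)
    if op == 0xEB:                              # JMP rel8 -> widened to E9 rel32
        rel, = struct.unpack_from("<b", insn, 1)
        target = old_addr + 2 + rel
        new_rel = (target - (new_addr + 5)) & 0xFFFFFFFF
        return b"\xE9" + struct.pack("<I", new_rel)
    if 0x70 <= op <= 0x7F:                      # Jcc rel8 -> 0F 8x rel32 (6 bytes)
        rel, = struct.unpack_from("<b", insn, 1)
        target = old_addr + 2 + rel
        new_rel = (target - (new_addr + 6)) & 0xFFFFFFFF
        return bytes([0x0F, 0x80 + (op - 0x70)]) + struct.pack("<I", new_rel)
    raise NotImplementedError(f"opcode {op:#04x} not handled in this sketch")

# The CALL at offset 2 of CreateSocketHandle (0x7C86C7D4 + 2), moved to a proxy:
print(relocate_branch(bytes.fromhex("E883CBF9FF"), 0x7C86C7D6, 0x10000000).hex())
```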

I got a bit obsessed with this, wondering how much these and possibly other unknown issues come into play.
The answers might tell me how much time and consideration to spend on them, and hopefully reveal things that would be easier to handle now rather than later.

I decided to do a quick and dirty study, examining a large number of DLL exports to look for problem cases and to gather overall statistics on their occurrence. At the same time I could build the code analyzer part too.

How it works:

1) Loads a DLL raw into memory.
2) Parses the PE header, rejecting the DLL if it's compressed (hopefully) or if it has no exports to look at.
3) Walks through every exported function, looking at its entry code.
Here the first X instructions are disassembled (as readable text) and logged to file. Then the code analyzer looks at the instruction bytes needed for the hook (normally five), detects issues, and logs them for later review.
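Steps 1 to 3 can be sketched roughly in Python. This is not the author's DLLExportTest code. It is a minimal reading of the PE/COFF export directory from a raw file image. It assumes a 32-bit (PE32) DLL and skips forwarded exports, whose RVA points back into the export directory at a string rather than at code. It leaves out the compressed-DLL check. `list_exports`, `rva_to_offset` and `read_cstr` are hypothetical names.

```python
import struct

def rva_to_offset(sections, rva):
    """Map an RVA to a raw file offset via (va, vsize, raw_ptr, raw_size) tuples."""
    for va, vsize, raw_ptr, raw_size in sections:
        if va <= rva < va + max(vsize, raw_size):
            return raw_ptr + (rva - va)
    return None

def read_cstr(data, off):
    """Read a NUL-terminated ASCII string starting at file offset off."""
    return data[off:data.index(b"\0", off)].decode("ascii", "replace")

def list_exports(data: bytes):
    """Yield (ordinal, name_or_None, rva, file_offset) for each export of a raw PE32 file."""
    if data[:2] != b"MZ":
        raise ValueError("no MZ header")
    pe, = struct.unpack_from("<I", data, 0x3C)            # e_lfanew
    if data[pe:pe + 4] != b"PE\0\0":
        raise ValueError("no PE signature")
    num_sections, = struct.unpack_from("<H", data, pe + 6)
    opt_size, = struct.unpack_from("<H", data, pe + 20)   # SizeOfOptionalHeader
    opt = pe + 24                                          # optional header start
    if struct.unpack_from("<H", data, opt)[0] != 0x10B:
        raise ValueError("not PE32; this sketch skips PE32+")
    exp_rva, exp_size = struct.unpack_from("<II", data, opt + 96)  # data directory 0
    sections = []
    for i in range(num_sections):
        # VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData
        vsize, va, raw_size, raw_ptr = struct.unpack_from(
            "<IIII", data, opt + opt_size + i * 40 + 8)
        sections.append((va, vsize, raw_ptr, raw_size))
    if exp_rva == 0:
        return                                             # no exports: reject the DLL
    exp = rva_to_offset(sections, exp_rva)
    base, n_funcs, n_names, aof, aon, aoo = struct.unpack_from("<6I", data, exp + 16)
    names = {}
    for i in range(n_names):
        name_rva, = struct.unpack_from("<I", data, rva_to_offset(sections, aon) + 4 * i)
        index, = struct.unpack_from("<H", data, rva_to_offset(sections, aoo) + 2 * i)
        names[index] = read_cstr(data, rva_to_offset(sections, name_rva))
    for i in range(n_funcs):
        rva, = struct.unpack_from("<I", data, rva_to_offset(sections, aof) + 4 * i)
        if rva == 0 or exp_rva <= rva < exp_rva + exp_size:
            continue                                       # unused slot or forwarder
        yield base + i, names.get(i), rva, rva_to_offset(sections, rva)
```

The bytes at each yielded file offset are what would be handed to the disassembler and code analyzer in step 3.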

This whole process can run recursively over any number of DLLs in a batch.
To examine the log files, one searches for every instance of the character ‘*’ to see everything, or for key strings to find particular issues.

The issue markers, with their corresponding stat labels and descriptions:

Marker | Stat label | Description
** CODE::HAS_INST_ERROR | Decode errors | Instruction decode error.
*** CODE::HAS_REL_INST | Relative branches | Problem relative branch.
** Relative 32bit entry branch | Branch entries | Relative branch on entry.
** CODE::WILL_OVERFLOW | Over follows | Overflow into alignment bytes.
*** CODE::NOT_ENOUGH_ROOM | Out of room | Not enough room for the hook.
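The analyzer's decision logic can be approximated with a toy classifier. To stay self-contained, this Python sketch replaces a real disassembler with a tiny, hypothetical length table. The table only knows the handful of opcodes that appear in the kernel32 samples, plus a few common ones; a real tool needs a full x86 length decoder. The returned strings mirror the markers above. The exact rules used by the original tool are an assumption. In particular, an entry branch counts as trivial here only when the rel32 branch alone covers the whole patch.

```python
PADDING = {0x90, 0xCC}          # NOP / INT3 alignment filler

def insn_info(code: bytes, i: int):
    """Return (length, is_relative, ends_flow) for a toy opcode subset, or None."""
    op = code[i]
    if op in PADDING or 0x50 <= op <= 0x57:
        return 1, False, False                 # NOP, INT3, PUSH r32
    if op == 0xC3:
        return 1, False, True                  # RET
    if op == 0xC2:
        return 3, False, True                  # RET imm16
    if op == 0x6A:
        return 2, False, False                 # PUSH imm8
    if op == 0x68 or 0xB8 <= op <= 0xBF:
        return 5, False, False                 # PUSH imm32 / MOV r32, imm32
    if op in (0x8B, 0x33) and i + 1 < len(code) and code[i + 1] >> 6 == 3:
        return 2, False, False                 # MOV/XOR r32, r32 (register form only)
    if 0x70 <= op <= 0x7F:
        return 2, True, False                  # Jcc rel8
    if op == 0xEB:
        return 2, True, True                   # JMP rel8
    if op == 0xE8:
        return 5, True, False                  # CALL rel32
    if op == 0xE9:
        return 5, True, True                   # JMP rel32
    return None

def classify(code: bytes, hook_size: int = 5) -> str:
    """Classify a function entry for a hook that overwrites hook_size bytes."""
    pos = 0
    while pos < hook_size:
        if pos >= len(code):
            return "CODE::NOT_ENOUGH_ROOM"
        info = insn_info(code, pos)
        if info is None:
            return "CODE::HAS_INST_ERROR"
        length, relative, ends_flow = info
        if relative:
            if pos == 0 and code[0] in (0xE8, 0xE9) and length >= hook_size:
                return "Relative 32bit entry branch"   # trivial to rebuild in the proxy
            return "CODE::HAS_REL_INST"
        pos += length
        if ends_flow and pos < hook_size:
            tail = code[pos:hook_size]
            if len(tail) == hook_size - pos and all(b in PADDING for b in tail):
                return "CODE::WILL_OVERFLOW"           # patch spills into filler
            return "CODE::NOT_ENOUGH_ROOM"
    return "OK"

samples = {
    "CloseProfileUserMapping": "E80EFDFEFF833DD450887C00",
    "CreateProcessInternalWSecure": "8BC0C3909090909033C0",
    "CreateSocketHandle": "6A78E883CBF9FF33C0C390909090",
}
for name, hexcode in samples.items():
    print(name, classify(bytes.fromhex(hexcode)))
# CloseProfileUserMapping Relative 32bit entry branch
# CreateProcessInternalWSecure CODE::WILL_OVERFLOW
# CreateSocketHandle CODE::HAS_REL_INST
```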

The results:

First, a log of “kernel32.dll” stats with some samples of the various issues.


-------------------------------------------------------------------------------
XP32 SP3: "kernel32.dll"

Samples:
[50] “CloseProfileUserMapping” 51 7C82C87D
[00] E8 0EFDFEFF CALL 0xFFFEFD13 <--
[05] 833D D450887C 00 CMP DWORD [0x7C8850D4], 0x0
[0C] 74 17 JZ 0x25
[0E] 56 PUSH ESI
[0F] BE D050887C MOV ESI, 0x7C8850D0
[14] 56 PUSH ESI
[15] FF15 7010807C CALL [0x7C801070]
[1B] 6A 00 PUSH 0x0
** Relative 32bit entry branch.

=======================================================================
[101] “CreateProcessInternalWSecure” 102 7C880311
[00] 8BC0 MOV EAX, EAX
[02] C3 RET
[03] 90 NOP <--
[04] 90 NOP <--
[05] 90 NOP
[06] 90 NOP
[07] 90 NOP
[08] 33C0 XOR EAX, EAX
** CODE::WILL_OVERFLOW

=======================================================================
[106] “CreateSocketHandle” 107 7C86C7D4
[00] 6A 78 PUSH 0x78
[02] E8 83CBF9FF CALL 0xFFF9CB8A <--
[07] 33C0 XOR EAX, EAX
[09] C3 RET
[0A] 90 NOP
[0B] 90 NOP
[0C] 90 NOP
[0D] 90 NOP
*** CODE::HAS_REL_INST

========== Stats =========
DLLs: 1
Total exports: 937
Redundant exports: 17 1.8%
Decode errors: 0 0.0%
Relative branches: 5 0.5%
Branch entries: 4 0.4%
Over follows: 5 0.5%
Out of room: 0 0.0%

Of the 937 exports in “kernel32.dll”, the nontrivial problem cases amount to only 5 relative branches and zero “out of room” cases.
And now some large batch runs:

-------------------------------------------------------------------------------
Windows XP 32bit SP3 all DLLs in "Windows\system32"

========== Stats =========
DLLs: 1482
Total exports: 89640 -- Notes:
Redundant exports: 8717 9.7%
Decode errors: 23 0.0% -- Most if not all from irrelevant data exports
Relative branches: 3547 4.0% -- Many are irrelevant data exports
Branch entries: 1507 1.7% -- Mostly CALL, some JMP
Over follows: 696 0.8% -- Most are return-NULL stubs or other simple functions of only a few instructions
Out of room: 309 0.3% -- Most are incorrect hits from irrelevant data exports

At most, the nontrivial problem percentage is just:
(“Relative branches” 4.0 + “Out of room” 0.3) = 4.3%
That's 3856 (3547 + 309) of the 89640 exports.
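The percentage arithmetic in these summaries is simple enough to state as code. In this Python sketch, percentages are taken over total exports, with redundant ones included, which matches the figures above. The "nontrivial" figure is taken to be relative branches plus out-of-room cases, as in the sum just given. `pct` and `nontrivial` are hypothetical helpers.

```python
def pct(count: int, total: int) -> float:
    """Percentage of total, rounded to one decimal place like the stat dumps."""
    return round(100.0 * count / total, 1)

def nontrivial(stats: dict) -> tuple:
    """Return (count, percent) of nontrivial cases: relative branches + out of room."""
    count = stats["Relative branches"] + stats["Out of room"]
    return count, pct(count, stats["Total exports"])

xp32_jmp5 = {"Total exports": 89640, "Relative branches": 3547, "Out of room": 309}
print(nontrivial(xp32_jmp5))   # (3856, 4.3)
```

Computing the percentage from raw counts can differ in the last digit from adding two already-rounded percentages. For the XP32 JMP6 run, 9479 / 89640 is about 10.57%, while 10.1 + 0.4 gives 10.5%.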

The same run, but with a six-byte requirement for JMP6 absolute-address hooks:
========== Stats =========
DLLs: 1482
Total exports: 89640
Redundant exports: 8717 9.7%
Decode errors: 23 0.0%
Relative branches: 9082 10.1%
Branch entries: 0 0.0% <-- Not relevant here; JMP6 entry branches were not considered
Over follows: 898 1.0%
Out of room: 397 0.4%
Nontrivial problem percentage: ~10.5%

-------------------------------------------------------------------------------
Windows 7 32bit all DLLs in "Windows\system32"

========== Stats =========
DLLs: 1812
Total exports: 60929
Redundant exports: 3670 6.0%
Decode errors: 51 0.1%
Relative branches: 2420 4.0%
Branch entries: 661 1.1%
Over follows: 753 1.2%
Out of room: 114 0.2%

The situation is similar to XP32.
At most, the nontrivial problem percentage is just ~4.2%.

JMP6 size for comparison:
========== Stats =========
DLLs: 1812
Total exports: 60929
Redundant exports: 3670 6.0%
Decode errors: 60 0.1%
Relative branches: 3963 6.5%
Branch entries: 0 0.0%
Over follows: 969 1.6%
Out of room: 136 0.2%
Nontrivial problem percentage: ~6.7%

-------------------------------------------------------------------------------
All DLLs in the LOTRO MMORPG game folder:
========== Stats =========
DLLs: 37
Total exports: 14886
Redundant exports: 1604 10.8%
Decode errors: 0 0.0%
Relative branches: 1027 6.9%
Branch entries: 1413 9.5%
Over follows: 14 0.1%
Out of room: 993 6.7%
Nontrivial problem percentage: ~13.6%

-------------------------------------------------------------------------------
All the DLLs from the “Full Tilt Poker” client:
========== Stats =========
DLLs: 14
Total exports: 24638
Redundant exports: 3853 15.6%
Decode errors: 0 0.0%
Relative branches: 933 3.8%
Branch entries: 425 1.7%
Over follows: 128 0.5%
Out of room: 42 0.2%
Nontrivial problem percentage: ~4%


Note that there is a fair amount of error in the results, because a certain percentage of all exports are actually data exports, not code exports. My tool rejects exports in known data sections, but a fair number still sit in “.text” sections and end up logged as decode errors and problem issues.
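One way to implement the "known data sections" rejection is to find the section containing each export's RVA and test its characteristics flags. The Python sketch below assumes the section headers are already parsed into tuples. The flag values are the standard PE/COFF constants IMAGE_SCN_CNT_CODE (0x20) and IMAGE_SCN_MEM_EXECUTE (0x20000000). `is_code_export` is a hypothetical helper. As the paragraph above says, this check cannot catch data that sits inside an executable ".text" section.

```python
IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_MEM_EXECUTE = 0x20000000

def is_code_export(sections, rva: int) -> bool:
    """True if rva falls inside a section flagged as code or executable.

    sections: iterable of (name, virtual_address, virtual_size, characteristics).
    """
    for name, va, vsize, flags in sections:
        if va <= rva < va + vsize:
            return bool(flags & (IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE))
    return False    # RVA outside every section: treat as not code

sections = [(".text", 0x1000, 0x8000, 0x60000020),   # code | execute | read
            (".data", 0x9000, 0x1000, 0xC0000040)]   # initialized data | read | write
print(is_code_export(sections, 0x1234), is_code_export(sections, 0x9010))  # True False
```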

I also ran tests with the one-byte-larger size needed for the absolute-offset JMP6 hooks that madCodeHook uses. For WinXP32 SP3, the chance of hitting a relative branch issue more than doubles (9082 vs. 3547), and there are about 29% more overflow issues (898 vs. 696).
On Win7 the difference is smaller (3963 vs. 2420 relative branches). Either way, since the overall problem percentage is so low, a JMP6 should fit about as well as a JMP5.

An interesting and revealing finding is that the “out of code space” issue is not that common.
It turns out that most of the small stub-like return functions we run into have plenty of alignment bytes (0xCC/INT3 or 0x90/NOP) following them! (See the “** CODE::WILL_OVERFLOW” sample in the kernel32.dll dump above.)
This holds for the majority of cases across thousands of DLLs and almost 100k functions examined.
Furthermore, there is a sort of pattern to it (from tedious manual review of hundreds of cases): a lot of the time, the real out-of-room cases are COM DLL exports like “DllUnregisterServer()” and “DllCanUnloadNow()”, which are unlikely hook targets anyhow.
Also note that the stats are probably a bit skewed, since many of the functions examined are probably not even desirable hook targets.

Conclusion:

The problems (“lack of space” and “non-entry relative branches”) showed up as the two main issues to contend with.
Even so, as the stats show, they are pretty rare and probably not worth spending too much time and effort on.

[ DLLExportTest hook test tool thing w/source code download here ]
