Friday, February 19, 2010

Semester goals

This semester will focus on adding compatibility support for the FLAIR toolkit from IDA. FLIRT (Fast Library Recognition Technology) is the core of FLAIR, and its authors describe the technique in their paper at http://www.hex-rays.com/idapro/flirt.htm. The basic idea is to create signatures from library files so that library functions can be recognized in static executables. This is vital for embedded systems, as they are always static images (except for some systems running Linux and such, but for now I consider those more like a traditional, if lightweight, workstation). The .sig format is closed (not publicly documented) and supposedly stores the functions in a binary tree structure. Because many functions share their opening bytes (such as the classic Intel frame pointer setup: push %ebp; mov %esp, %ebp), matching walks the tree from those shared prefixes toward the individual functions. The format does not contain the original function data, which may be useful for copyright reasons.
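To make the shared-prefix idea concrete, here is a minimal sketch of a byte-prefix tree in Python. It is not the actual .sig layout, which is undocumented; it is a generic prefix tree with one child per next byte, and the names PrefixNode, insert, lookup, and count_nodes are made up for illustration. The byte values after the prologue are arbitrary sample data.

```python
class PrefixNode:
    """One node of a byte-prefix tree; children are keyed by the next byte."""

    def __init__(self):
        self.children = {}  # byte value (int) -> PrefixNode
        self.names = []     # functions whose pattern ends at this node


def insert(root, pattern, name):
    """Walk (or create) one node per pattern byte, then record the name."""
    node = root
    for byte in pattern:
        node = node.children.setdefault(byte, PrefixNode())
    node.names.append(name)


def lookup(root, code):
    """Return the names of all stored patterns that are a prefix of `code`."""
    matches = []
    node = root
    for byte in code:
        node = node.children.get(byte)
        if node is None:
            break
        matches.extend(node.names)
    return matches


def count_nodes(node):
    """Count nodes in the tree, including the root."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


# push %ebp; mov %esp, %ebp as emitted by GCC on x86
PROLOGUE = bytes([0x55, 0x89, 0xE5])

root = PrefixNode()
insert(root, PROLOGUE + bytes([0x83, 0xEC, 0x18]), "func_a")
insert(root, PROLOGUE + bytes([0x53, 0x8B, 0x5D]), "func_b")
```

Both patterns share the three prologue nodes and only branch afterwards, so the tree holds 10 nodes (including the root) rather than the 13 that two separate chains would need.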
There are several simple rules for generating signatures. First, it is considered much better to generate no signature than to generate a false positive. Thus, by default, signatures are only generated for functions above a certain byte length, since shorter functions are likely to produce false matches.
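The length rule can be sketched as a simple filter. The threshold of 8 bytes below is a placeholder I picked for the example, not the value FLAIR actually uses, and split_by_length is a hypothetical helper name.

```python
MIN_SIGNATURE_BYTES = 8  # placeholder threshold, not FLAIR's real default


def split_by_length(functions, min_bytes=MIN_SIGNATURE_BYTES):
    """Split {name: code_bytes} into (accepted, rejected) by code length.

    Functions shorter than `min_bytes` are rejected, since a very short
    byte sequence is likely to match unrelated code.
    """
    accepted, rejected = {}, {}
    for name, code in functions.items():
        if len(code) >= min_bytes:
            accepted[name] = code
        else:
            rejected[name] = code
    return accepted, rejected


sample = {
    "tiny_stub": bytes.fromhex("c3"),                   # a lone ret
    "real_func": bytes.fromhex("5589e583ec18c9c3"),     # 8 bytes
}
accepted, rejected = split_by_length(sample)
```

Keeping the rejected set around (instead of silently dropping it) matches the review step described next, where it is useful to see which functions were skipped and why.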
To generate a signature, start with a compiled library file. Running uvelf2pat (or a similar tool, depending on the input format) generates an intermediate, human-readable pattern file. The user can then review it to verify that the signatures look accurate and to see whether any functions were rejected from signature creation for, say, being too small. Once this looks correct, a tool tries to convert the pattern file to a .sig file and checks whether the signatures conflict with other signatures in the program's database. If successful, it generates a .sig file that can be used to recognize statically compiled functions.
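As a rough illustration of the conflict check, the sketch below flags patterns that are byte-for-byte identical but belong to different function names. This is only the simplest kind of conflict, and it is my assumption about what such a check might look like; how the real tool detects and resolves collisions is not covered here. find_collisions is a hypothetical name.

```python
from collections import defaultdict


def find_collisions(patterns):
    """Group (name, pattern_bytes) pairs and report shared patterns.

    Returns {pattern: sorted list of names} for every pattern that is
    claimed by more than one function name.
    """
    by_pattern = defaultdict(set)
    for name, pattern in patterns:
        by_pattern[pattern].add(name)
    return {
        pattern: sorted(names)
        for pattern, names in by_pattern.items()
        if len(names) > 1
    }


candidates = [
    ("func_a", bytes.fromhex("5589e583ec18")),
    ("func_b", bytes.fromhex("5589e5538b5d")),
    ("func_a_alias", bytes.fromhex("5589e583ec18")),  # same bytes as func_a
]
collisions = find_collisions(candidates)
```

In this sample, func_a and func_a_alias collide, so a converter following this approach would refuse to emit a signature for either until the conflict is resolved by hand.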
I have several quality goals. The first is to implement several unit tests. A fundamental, generic unit test will take generated signatures and make sure they recognize functions at the correct addresses in the original library. Next, the list of that library's functions linked into a static binary will serve as a reference, and the signatures should detect all of them. Another useful test would be to see whether generated signatures load into IDA or other tools that can use them, such as certain OllyDbg plugins.
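The first test could look roughly like the sketch below, using Python's unittest module. Everything here is a stand-in: the "library" is a toy byte string with two fake functions at known offsets, the signatures are just their first six bytes, and scan is a naive substring search rather than a real signature matcher.

```python
import unittest


def scan(image, signatures):
    """Return {name: [offsets]} of every place each pattern occurs in image."""
    hits = {}
    for name, pattern in signatures.items():
        offsets = []
        start = image.find(pattern)
        while start != -1:
            offsets.append(start)
            start = image.find(pattern, start + 1)
        hits[name] = offsets
    return hits


# Toy "library": two fake functions laid out back to back.
FUNC_A = bytes.fromhex("5589e583ec18c9c3")
FUNC_B = bytes.fromhex("5589e5538b5d085bc9c3")
LIBRARY = FUNC_A + FUNC_B
EXPECTED = {"func_a": 0, "func_b": len(FUNC_A)}
SIGNATURES = {"func_a": FUNC_A[:6], "func_b": FUNC_B[:6]}


class RoundTripTest(unittest.TestCase):
    def test_signatures_hit_original_addresses(self):
        hits = scan(LIBRARY, SIGNATURES)
        for name, address in EXPECTED.items():
            # Exactly one hit, and at the address the function came from.
            self.assertEqual(hits[name], [address])
```

Saved as a module, this can be run with python -m unittest. Asserting on the full list of offsets, rather than just membership, also catches a signature that matches somewhere it should not.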
A nice feature that FLIRT doesn't offer right now: I might see whether I can automatically extract source code for the compiled files (if available) and include it in an annotated analysis file. Alternatively, I may eventually create a custom analysis format to support my own features and to avoid the uncertainty of the .sig format.
For future work building on this, I'd like to implement several things. First, once x86 support is added, I'd like to index a large collection of static binaries, say from old video game images, to see if I can cluster them by the development libraries used. Once the tool is more developed, this could also be used to automatically scan software for licensing violations by providing signatures for GPL or similar code. Another feature that shouldn't be too difficult once signature support is in place is generating map files, so that the function recognition results can be imported into programs such as OllyDbg and be useful even while the full decompiler is not yet ready.