Sunday, July 25, 2010

New utility program: uvstructoffset

Under util/uvstructoffset/
Here and there I find the need to verify that the APIs I create have the correct structure alignment. I would do this manually with something like:
printf("some_member: 0x%.4X\n", (unsigned int)offsetof(struct my_struct_t, some_member));
But this can be automated if there were a program to parse the C code. pycparser seemed like the most convenient C parser to use. After some coding, I could transform structures like this:

struct sig_header_t {
uint8_t dell1800FP;
short atomicForceMicroscope;
uint16_t fearAndLoathingInLasVegas;
uint8_t oatmeal;
char paperTowels;
uint16_t jointedGlasswares;
uint8_t micrometer;
int orangina;
char accidentWaitingToHappen[16];
uint8_t SATADrive;
uint16_t fastSteeringMirror;
uint32_t hamburgers;
} __attribute__((__packed__));
Into this:
struct sig_header_t
dell1800FP @ 0x0000
atomicForceMicroscope @ 0x0001
fearAndLoathingInLasVegas @ 0x0003
oatmeal @ 0x0005
paperTowels @ 0x0006
jointedGlasswares @ 0x0007
micrometer @ 0x0009
orangina @ 0x000A
accidentWaitingToHappen @ 0x000E
SATADrive @ 0x001E
fastSteeringMirror @ 0x001F
hamburgers @ 0x0021

Formatting doesn't line up on the web, but you get the idea. The C parser only parses the structure and doesn't try to guess the actual offsets. Those are computed with gcc, since compiling the structure is really the only reliable way to account for all of the different data types, packing attributes and such. For decompiler use, config files will have to be referenced for data sizes and alignment.
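A minimal sketch of the gcc half of this, assuming the parser pass has already produced the member names in declaration order: generate a small C program that prints offsetof() for each member, compile it with gcc, and run it. The function names and the member-list input are hypothetical, not the actual uvstructoffset interface.

```python
import os
import subprocess
import tempfile


def generate_offset_program(struct_decl, struct_name, members):
    """Build C source that prints each member's offset via offsetof().

    struct_decl: C text of the structure (assumed to compile once
    <stddef.h> and <stdint.h> are included).
    members: member names in declaration order (hypothetical input, as a
    C parser pass might extract them).
    """
    lines = [
        "#include <stddef.h>",
        "#include <stdint.h>",
        "#include <stdio.h>",
        "",
        struct_decl,
        "",
        "int main(void)",
        "{",
        '    printf("struct %s\\n");' % struct_name,
    ]
    for m in members:
        # Cast because offsetof() yields size_t and %X expects unsigned int.
        lines.append(
            '    printf("%s @ 0x%%.4X\\n", (unsigned int)offsetof(struct %s, %s));'
            % (m, struct_name, m)
        )
    lines += ["    return 0;", "}", ""]
    return "\n".join(lines)


def compute_offsets(struct_decl, struct_name, members, cc="gcc"):
    """Compile and run the generated program; return what it prints."""
    src = generate_offset_program(struct_decl, struct_name, members)
    with tempfile.TemporaryDirectory() as tmp:
        c_path = os.path.join(tmp, "offsets.c")
        exe_path = os.path.join(tmp, "offsets")
        with open(c_path, "w") as f:
            f.write(src)
        subprocess.run([cc, "-o", exe_path, c_path], check=True)
        result = subprocess.run(
            [exe_path], check=True, capture_output=True, text=True
        )
        return result.stdout
```

Letting the compiler do the layout means packing attributes, type sizes and ABI padding rules are handled by the same tool that builds the real code. With gcc on the PATH, feeding the full sig_header_t declaration and its member list to compute_offsets should reproduce the listing above.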
After working with JSON, I've decided it is the sort of data exchange format I've been looking for. I've never been a fan of XML because I find it overcomplicated to parse and work with for general use. Whenever I need convenient data exchange between programs, where convenience matters more than performance, I'll probably use JSON as the format. As required, these can be migrated to higher performance formats. This will include configuration files and structure definitions.
Eventually I will need to parse out structures in the decompiled/disassembled files. Since these files don't need to be parsed often, and for the other data exchange reasons above, I will be using the aforementioned Python parser to output a JSON structure definition.
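As a rough illustration of that JSON output, here is a sketch that converts a "struct NAME" / "member @ 0xNNNN" listing like the one above into a JSON structure definition. The schema (a name plus a list of member name/offset pairs) is hypothetical and handles only one structure per listing.

```python
import json


def offsets_to_json(listing):
    """Convert a "struct NAME" / "member @ 0xNNNN" listing into JSON.

    The schema ({"name": ..., "members": [{"name": ..., "offset": ...}]})
    is hypothetical, chosen only for illustration.
    """
    definition = None
    for raw in listing.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("struct "):
            definition = {"name": line.split()[1], "members": []}
            continue
        if definition is None:
            raise ValueError("member line before struct header: %r" % line)
        name, sep, offset = line.partition(" @ ")
        if not sep:
            raise ValueError("unrecognized line: %r" % line)
        # int(..., 16) accepts the "0x" prefix.
        definition["members"].append({"name": name, "offset": int(offset, 16)})
    return json.dumps(definition, indent=2)
```

Any program with a JSON library can then load the definition without having to understand C syntax.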

Saturday, July 10, 2010

ELF object dump repaired

As commented in the previous post, the old ELF code was very ugly and a thorn in my side. It is now fixed, running better than ever and much easier to maintain. Example object file:
[mcmaster@gespenst bin]$ objdump --syms --reloc analysis/sub_0EC3.elf

analysis/sub_0EC3.elf: file format elf32-i386

00000000 l df *ABS* 00000000 candela_pltl1_rev_3.bin
00000000 l d .text 00000000 .text
00000000 *UND* 00000000 sub_0FCB
00000000 *UND* 00000000 sub_0FE5
00000000 *UND* 00000000 sub_101C
00000000 *UND* 00000000 sub_75E3
00000000 g F .text 00000107 sub_0EC3

0000009e R_386_16 sub_0FCB
000000c0 R_386_16 sub_0FCB
00000103 R_386_16 sub_0FE5
000000a1 R_386_16 sub_101C
000000c3 R_386_16 sub_75E3
I'm still using the 386 object file format and am still unsure how I want to deal with that in the future. For now it seems like the logical choice, since it lets tools like objdump work with the files. In theory I could generate a patch for binutils, but that would be annoying for decompiler users and a pain to maintain. Maybe I can make a plugin patch for binutils and try to get it into mainline? binutils has some very limited (read: not useful enough) plugin capability. I didn't look into it much, but from a quick grep only nm and ar support binutils plugins. At first it didn't look like there was any notion of full architecture plugins, but now I'm thinking that, for some odd reason, only certain tools allow the use of uninstalled plugins. Maybe I'll send an e-mail to their mailing list for advice.

clang/llvm is a more modern project and might work better with this stuff. Unfortunately, I haven't yet spent any time learning it. From a quick look, they expose a lot of APIs that might be useful if I wanted to let my arch files compile executables, but they still seem to depend on binutils for day to day object inspection. clang can do some linking, but I don't know where this functionality comes from. For the short term, I might consider making a very basic binutils skeleton to work with these files, probably in Python. I'm afraid this would feature creep and never get replaced, though, and it is ultimately not the right way to do things.
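For reference, each R_386_16 line in the relocation listing above corresponds to an Elf32_Rel entry: a 32-bit offset into .text plus an r_info word that packs the symbol table index (high 24 bits) and the relocation type (low 8 bits). A minimal sketch of that encoding, assuming little-endian ELF32 with REL (not RELA) relocations; the symbol index used in the usage note is hypothetical.

```python
import struct

# R_386_16 as defined for i386 in glibc's <elf.h>.
R_386_16 = 20


def elf32_r_info(sym, rtype):
    """ELF32_R_INFO: symbol index in the high 24 bits, type in the low 8."""
    return (sym << 8) | (rtype & 0xFF)


def pack_rel(offset, sym, rtype):
    """Encode one little-endian Elf32_Rel entry (r_offset, r_info)."""
    return struct.pack("<II", offset, elf32_r_info(sym, rtype))


def unpack_rel(data):
    """Decode an Elf32_Rel entry into (r_offset, symbol index, type)."""
    r_offset, r_info = struct.unpack("<II", data)
    return r_offset, r_info >> 8, r_info & 0xFF
```

For example, pack_rel(0x9e, 3, R_386_16) encodes a 16-bit relocation at .text offset 0x9e against symbol table entry 3; objdump prints the symbol's name, such as sub_0FCB, in place of the index.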

After some more thinking and research, I've decided that by far the easiest and cleanest way to do this is to simply provide wrapper programs over binutils. The basic idea is to parse out the files passed to the binutils tools and redirect them to temporary copies fixed up to be compatible with binutils. Basically, this will involve temporarily changing the object's machine type to EM_386, since that's what all of the constants are based on. While this will obviously run into a number of issues, it unfortunately seems by far the cleanest solution.
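A minimal sketch of such a wrapper, assuming the only fix-up needed is rewriting the e_machine field of the ELF header (bytes 18-19, right after the 16-byte e_ident and 2-byte e_type) to EM_386. Command line parsing is reduced to a single file path, and the function names are hypothetical.

```python
import os
import struct
import subprocess
import tempfile

EM_386 = 3             # e_machine value for Intel 80386 in <elf.h>
E_MACHINE_OFFSET = 18  # 16-byte e_ident + 2-byte e_type


def patch_machine(path, out_path, machine=EM_386):
    """Copy an ELF file to out_path with its e_machine field rewritten."""
    with open(path, "rb") as f:
        data = bytearray(f.read())
    if len(data) < 20 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file: %s" % path)
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    endian = "<" if data[5] == 1 else ">"
    struct.pack_into(endian + "H", data, E_MACHINE_OFFSET, machine)
    with open(out_path, "wb") as f:
        f.write(data)


def run_objdump(path, *args):
    """Run objdump on a temporarily patched copy of path."""
    with tempfile.TemporaryDirectory() as tmp:
        fixed = os.path.join(tmp, os.path.basename(path))
        patch_machine(path, fixed)
        out = subprocess.run(
            ["objdump", *args, fixed], check=True, capture_output=True, text=True
        ).stdout
        # Show the caller's file name rather than the temporary copy's.
        return out.replace(fixed, path)
```

Something like run_objdump("analysis/sub_0EC3.elf", "--syms", "--reloc") would then behave like the objdump invocation above. The original file is never modified, which keeps the real machine type intact for the decompiler's own tools.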

Sunday, July 4, 2010

Summer progress: installer, licensing, fixing bugs

This summer I'm doing malware analysis. It's the first time I've held a full time reverse engineering job, and I'm getting time to try out a lot more tools and develop skills. These will undoubtedly be valuable to this project.
The following is written with a huge patch queue sitting on my laptop (probably over 4k lines of git diff, maybe more once I'm done). I was hoping to stabilize the ELF code before committing; hopefully my laptop won't blow up in between.
I played around briefly with Installjammer (website, github) and at first glance it seems pretty nice. It does seem more targeted towards Windows, but I was able to make an InstallShield type installer for Linux within minutes and was impressed. The biggest gap I saw at first glance was the lack of support for .rpm or .deb files, so I might ask what it would take to get those supported if I am still interested. Screenshot of the quick test installer running on Fedora 13:
I will probably be dropping support for SpiderApe in the near future. My previous system had unstable Python code, which I was working on fixing towards the end of its life. I didn't know this when I first tried using the Python APIs, and I was quite disappointed with them. I'm still disappointed by their lack of good error handling (Py_Initialize() returns void), but when they work they seem to work decently. Fedora 13 doesn't seem to ship a static lib, so I might try a prefixed installation to get one.
On that note, I made the build process a lot cleaner with regard to using PREFIX variables for dependencies. The original reason I started using packages uninstalled was mostly that binutils doesn't export everything needed for the binutils .sig generation when using rpat as a reference.
For a number of legal reasons, I'm going to dual license the project under BSD/GPL. I still like the freedom that BSD provides, but for a couple of reasons, including otherwise being unable to distribute binaries, I'm going to tack on a GPL licensing option.
The ELF/object generation system is being rewritten. It used a fixup based linking approach that just didn't work well, and the code is much larger than it would have been with a several pass approach. This is likely responsible for the errors that have been occurring. The rewrite seems nearly done and is resulting in much cleaner code.
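The post doesn't describe the new design in detail, so the following is only a hypothetical illustration of why multiple passes help: if the first pass lays out .text and gives every symbol (defined or undefined) its final index, the second pass can emit relocations directly, with nothing left to patch up afterwards. The input format and function name are made up for this sketch.

```python
def build_object(functions):
    """Two-pass layout: assign every symbol an index before emitting relocs.

    functions: list of (name, size, calls), where calls is a list of
    (offset_within_function, callee_name). Hypothetical input format.
    Returns (symbols, relocs); relocs are (offset_in_text, symbol_index).
    """
    symbols = {}

    # Pass 1a: lay the functions out back to back in .text.
    text_offset = 0
    for name, size, _calls in functions:
        symbols[name] = {"defined": True, "value": text_offset}
        text_offset += size

    # Pass 1b: any callee not defined here becomes an undefined symbol.
    for _name, _size, calls in functions:
        for _off, callee in calls:
            symbols.setdefault(callee, {"defined": False, "value": 0})

    # Symbol index 0 is reserved for the null symbol in ELF.
    for index, sym in enumerate(symbols.values(), start=1):
        sym["index"] = index

    # Pass 2: every target already has a final index, so no fixups needed.
    relocs = []
    for name, _size, calls in functions:
        base = symbols[name]["value"]
        for off, callee in calls:
            relocs.append((base + off, symbols[callee]["index"]))
    return symbols, relocs
```

With a fixup based approach, the relocation for a call to a not yet seen symbol has to be recorded and revisited later, which is where bookkeeping errors tend to creep in.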
Tonight I will hopefully finish the ELF rewrite or be very close. But tonight is the night of fire, and I like fire, so I might be busy. Happy 4th!