Wednesday, March 31, 2010

FLIRTing with binaries

Worked out the bugs from porting the rpat utility into my engine as a BFD-based .pat generator. Basically, during the "can engine parse format" check I created, there was a faulty if block along the lines of:
abfd = bfd_openr("in.obj");
if( abfd != NULL )
error
Which obviously isn't correct: it should error if we *didn't* get a bfd pointer. The second error was from mixing up a constant I defined for the leading signature length, which is fixed at 0x20, with the total signature length, which is somewhat fixed at 0x8000. I don't know the exact reason for this limitation at this time, but I've seen it mentioned in several different sources.
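For reference, a sketch of the two lengths I keep confusing; the constant names here are my own invention, not IDA's:

```cpp
// Hypothetical constant names; the values are the .pat limits described above
static const unsigned int kLeadingSignatureLength = 0x20;   // leading pattern bytes
static const unsigned int kMaxSignatureLength     = 0x8000; // rough cap on total signature length
```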
Submitted an abstract to the URP symposium. We'll see how that goes. I'm a bit annoyed that they wanted .doc instead of, say, .pdf or raw text, but at least it wasn't .docx, in which case I probably would have skipped the whole thing out of convenience and principle. I have a Signals and Systems exam that day, but the symposium isn't until the afternoon, so it should be fine.
I've also made some progress on better EPROM rippers. Most of the images I have were ripped by breadboarding an 8051-based MCU circuit using the LITEC 8051 dev kit as a base. I have a layout for a board, but need time to etch it. Alternatively, I could (and should) just solder up a perf-board ZIF-socket reader using the expansion module on the LITEC-compatible boards. Also, I got an EPROM programmer/reader in the mail today. It's a Prompro 7:

It seems a bit unstable on first use. It may need some old filter caps replaced, as the unit is fairly old. I'll try to give it a brief service this weekend if the two midterms on Monday don't keep me busy. And, as further reason to make the 8051 expansion module, I obviously can't rip the Prompro 7's own EPROM using the Prompro 7 itself. Finally, I have a lot of EPROMs that came with it if anyone needs any.

Sunday, March 21, 2010

Break + some after results

First, the Valgrinding went all right. No invalid memory usage errors were detected, only leaks. I fixed most of the leaks, but there are a few (possibly just one that is cascading) that were difficult to trace even with Valgrind magic. So, I'll have to look into those later to see if I can solve them. The Valgrind results indicate that some objects are not freed, even though their parent's destructor seems to free them. My best guess is they are somehow being overwritten and the original object is never freed.
I evaluated several unit testing frameworks in an attempt to keep the code healthy. We do unit testing at work, but use a large-scale framework we wrote ourselves since we needed strong interaction with our army of VMs. After glossing over several lists of unit test frameworks, I first ruled out any commercial frameworks for monetary reasons and on principle. The two remaining that looked promising were CppUnit (a JUnit C++ port) and the Google test framework. Knowing JUnit is pretty standard, I went with CppUnit. Some simple engine init/deinit tests have been written, and hopefully more tests will follow now that a testing framework is in place.
The .pat format is well underway. Much of the FLIRT framework has been written, and the .pat core is written and compiles. However, a relocation source needs to be written to feed the pattern generation. Obviously one such source will be the configuration-file-driven uvudec framework. However, I've also been asked before about my support / consideration for libbfd (the GNU binutils core), so I did some research on how to use it. I spent some time looking through the binutils sources and reading a guide doc I found that highlighted important features (thanks Cygnus!). Unfortunately, I haven't seen any small examples, and the one or two nearly-pseudocode examples provided in the Cygnus manual skip details like calling bfd_init(). I can't get libbfd to recognize my file formats. Hmm. So I might strip down objdump and see if I can figure out what it does differently.
On the same note, I tried to play around with the rpat utility (http://www.woodmann.net/fravia/rpat-en.html), as it does what I'm trying to do, just with libbfd as its only function/relocation parsing engine. It's written very hackishly, and I rewrote a lot of it to get it to compile and play nice. However, after this it had the same issues as my hello world, probably because I'm making the same mistake twice.
So, more updates after I fix the libbfd issue. Also, why doesn't libbfd expose the demangling function? It's put into bucomm.c and linked against every binutils program instead of being in libbfd.a. Annoying. I might e-mail them about that.

Monday, March 8, 2010

Critical bugs fixed, ready to grind

Two bugs were introduced by the argument merge. The larger was related to this function:
uv_err_t UVD::changeConfig(UVDConfig *config)
{
	if( m_config )
	{
		delete m_config;
	}
	m_config = config;
	return UV_ERR_OK;
}
This would be called after argument parsing to set the config we had parsed in main. However, config parsing had been moved into libuvudec, so this effectively passed in the engine's own config (config == m_config), resulting in the object being deleted and the freed pointer then assigned right back. This typically crashed in the cleanup code as it tried to access various elements of m_config, which were by then presumably invalid memory addresses.
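One way to guard against this (a sketch, not necessarily how it's fixed in uvudec; the UVDConfig stub here is just to keep the example self-contained):

```cpp
// Stand-ins for the real uvudec types, to keep the sketch self-contained
typedef int uv_err_t;
#define UV_ERR_OK 0
struct UVDConfig { int placeholder; };

class UVD {
public:
	UVD() : m_config(0) {}
	uv_err_t changeConfig(UVDConfig *config)
	{
		// Guard: if we're handed the config we already own, deleting it
		// would leave m_config pointing at freed memory
		if (config == m_config)
			return UV_ERR_OK;
		delete m_config;
		m_config = config;
		return UV_ERR_OK;
	}
	UVDConfig *m_config;
};
```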
The second bug had to do with inadvertently freeing UVD's m_data, the data we are decompiling in the engine. I added this free because several items were missing from UVD's deinit(), but this data was handed in by the main program and was not to be deleted by the engine. This resulted in a double free.
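The underlying issue is ownership; one common pattern is an explicit owned flag. A hypothetical sketch, not the actual uvudec code:

```cpp
// Hypothetical engine stand-in illustrating the ownership rule:
// only delete m_data if the engine allocated it itself
class Engine {
public:
	Engine() : m_data(0), m_dataOwned(false) {}
	~Engine()
	{
		if (m_dataOwned)
			delete[] m_data;  // only free what we allocated
	}
	// takeOwnership == false for caller-owned buffers (the crash case)
	void setData(char *data, bool takeOwnership)
	{
		m_data = data;
		m_dataOwned = takeOwnership;
	}
	bool ownsData() const { return m_dataOwned; }

private:
	char *m_data;      // stand-in for the decompile target data
	bool m_dataOwned;
};
```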

Here are some initial stats from Valgrind.
Doing only a basic engine initialization/deinitialization:
==14553== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 25 from 1)
==14553== malloc/free: in use at exit: 260,666 bytes in 6,750 blocks.
==14553== malloc/free: 22,177 allocs, 15,427 frees, 613,701 bytes allocated.
==14553== For counts of detected errors, rerun with: -v
==14553== searching for pointers to 6,750 not-freed blocks.
==14553== checked 223,760 bytes.
==14553==
==14553== LEAK SUMMARY:
==14553== definitely lost: 111,708 bytes in 5,815 blocks.
==14553== possibly lost: 2,095 bytes in 17 blocks.
==14553== still reachable: 146,863 bytes in 918 blocks.
==14553== suppressed: 0 bytes in 0 blocks.
==14553== Use --leak-check=full to see details of leaked memory.

Doing decompile/disassemble:
==14805== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 25 from 1)
==14805== malloc/free: in use at exit: 6,635,642 bytes in 303,200 blocks.
==14805== malloc/free: 1,305,883 allocs, 1,002,683 frees, 113,107,916 bytes allocated.
==14805== For counts of detected errors, rerun with: -v
==14805== searching for pointers to 303,200 not-freed blocks.
==14805== checked 3,190,136 bytes.
==14805==
==14805== LEAK SUMMARY:
==14805== definitely lost: 3,416,500 bytes in 244,937 blocks.
==14805== possibly lost: 3,564 bytes in 53 blocks.
==14805== still reachable: 3,215,578 bytes in 58,210 blocks.
==14805== suppressed: 0 bytes in 0 blocks.
==14805== Use --leak-check=full to see details of leaked memory.

So, it needs some work, but at least I'm not losing hundreds of MB. Still, I'm going to see if I can solve these.

Unit testing and such

Mixed progress so far. The crash from the help screen was trivial to track down, but the program still crashes during a full run, or during cleanup if I abort early. I'm not sure if they are the same error, but I'm working through Valgrind to fix it. Speaking of which, apparently Valgrind is allergic to static exes because it can't hook (or maybe just identify?) functions, so don't do that. The dynamic build was apparently broken, but now it's fixed (/usr/local/bin/ld: ../bin/uvudec: hidden symbol `__dso_handle' in /usr/lib/gcc/i386-redhat-linux/4.1.2/crtbegin.o is referenced by DSO; fixed by changing the link step from ld to gcc -dynamic).
One thing I've been meaning to do for a bit is implement unit testing. I'm considering CppUnit (http://sourceforge.net/apps/mediawiki/cppunit/index.php?title=Main_Page) since it's based on JUnit, which is pretty standard; the Google framework (http://code.google.com/p/googletest/downloads/list) also looked pretty appealing.

Sunday, March 7, 2010

Bust a grind with Valgrind

Made a bunch of changes. In particular, it's moved to a property-based configuration at the core. Other small features include a version embedded in libuvudec for verifying that a good version was linked against, and removal of a large number of global variables left over from its C days. The global situation still mostly has a band-aid on it, since most of them just use a global configuration object now instead, but it's probably still better than it was before.
Here's the old --help screen:
[mcmaster@gespenst uvudec]$ ./uvudec --help
uvudec version 0.2.0.0
Copyright 2009 John McMaster
Portions copyright GNU (MD5 implementation)
JohnDMcMaster@gmail.com

Usage: ./uvudec <args>
Args:
--verbose: verbose output. Equivilent to --verbose=3
--verbose=<level>: set verbose level. 0 (none) - 3 (most)
--verbose-init: for selectivly debugging configuration file reading
--verbose-analysis: for selectivly debugging code analysis
--verbose-processing: for selectivly debugging code post-analysis
--verbose-printing: for selectivly debugging print routine
--config-language=<language>: default config interpreter language (plugins may require specific)
python: use Python
javascript: use javascript
--addr-min=<min>: minimum analysis address
--addr-max=<max>: maximum analysis address
--addr-exclude-min=<min>: minimum exclusion address
--addr-exclude-max=<max>: maximum exclusion address
--addr-comment: put comments on addresses
--addr-label: label addresses for jumping
--analysis-only[=<bool>]: only do analysis, don't print data
--analysis-address=<address>: only output analysis data for specified address
--opcode-usage: opcode usage count table
--analysis-dir=<dir>: create skeleton data suitible for stored analysis
--input=<file>: source for data
--output=<file>: output program (default: stdout)
--debug=<file>: debug output (default: stdout)
--print-jumped-addresses=<bool>: whether to print information about jumped to addresses (*1)
--print-called-addresses=<bool>: whether to print information about called to addresses (*1)
--useless-ascii-art: append nifty ascii art headers to output files
--help: print this message and exit
--version: print version and exit

Special files: -: stdin
<bool>:
true includes case insensitive "true", non-zero numbers (ie 1)
false includes case insensitve "false", 0

*1: WARNING: currently slow, may be fixed in future releases


And here's the new:
[mcmaster@gespenst bin]$ ./uvudec --help
***main
uvudec version 0.3.0
libuvudec version 0.3.0
Copyright 2009 John McMaster <johndmcmaster@gmail.com>
Portions copyright GNU (MD5 implementation)

Usage: ./uvudec <args>
Args:
--help (action.help): print this message and exit
--version (action.version): print version and exit
--verbose (debug.level): debug verbosity level
--verbose-init (debug.init): selectivly debug initialization
--verbose-analysis (debug.processing): selectivly debugging code analysis
--verbose-processing (debug.analysis): selectivly debugging code post-analysis
--verbose-printing (debug.printing): selectivly debugging print routine
--debug-file (debug.file): debug output (default: stdout)
--config-language (config.language): default config interpreter language (plugins may require specific)
python: use Python
javascript: use javascript
--addr-include-min (target.address_include.min): minimum analysis address
--addr-include-max (target.address_include.max): maximum analysis address
--addr-exclude-min (target.address_exclude.min): minimum exclusion address
--addr-exclude-max (target.address_exclude.max): maximum exclusion address
--analysis-address (target.address): only output analysis data for specified address
--analysis-only (analysis.only): only do analysis, don't print data
--analysis-dir (analysis.dir): create data suitible for stored analysis
--flow-analysis (analysis.flow_technique): how to trace jump, calls
linear: start at beginning, read all instructions linearly, then find jump/calls (default)
trace: start at all vectors, analyze all segments called/branched recursivly
--opcode-usage (output.opcode_usage): opcode usage count table
--print-jumped-addresses (output.jumped_addresses): whether to print information about jumped to addresses (*1)
--print-called-addresses (output.called_addresses): whether to print information about called to addresses (*1)
--useless-ascii-art (output.useless_ascii_art): append nifty ascii art headers to output files
--addr-comment (output.address_comment): put comments on addresses
--addr-label (output.address_label): label addresses for jumping
--input (target.file): source file for data
--output (output.file): output program (default: stdout)


SEVERE ERROR
Received signal: SIGSEGV

Which seems to have gone okay, including the parsing. Except for one little thing you may notice at the bottom. Oops. I found roughly where the error occurs, but I'm having issues pinning it down. In fact, the program will run until about when the disassembly to the intermediate representation is done and then crash on the first instruction.
So, enter Valgrind. I've played with it on some trivial programs before, and it's probably time I learned how to do some more serious automated bug analysis anyway, as it might give me some good ideas for this project. However, I'm getting a large number of false positives, and the suppression file doesn't seem to be helping. Grr. Oh, and Blogspot didn't like my angle brackets, so I had to escape them.

Saturday, March 6, 2010

Break goals

It's semester break! And what better to do than hammer at this project?
In preparation for the FLIRT tools, one of the things I've been meaning to do for a while is rework the configuration into something more proper. My plan is to move the opcode configuration to libconfig (http://www.hyperrealm.com/libconfig/). However, of higher priority is reworking the argument parsing. I'm moving to a property-based approach like what Java, Mozilla products, and others do, e.g. assembly.style=AT&T. There will still be long and short forms for some arguments that will map to property values. This allows a more organized hierarchy and a better transition to config files. Finally, this new structure allows the argument parsing to scale better across multiple executables.
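A sketch of the core idea, with names of my own choosing rather than the actual uvudec API:

```cpp
#include <map>
#include <string>

// Hypothetical property store: hierarchical dotted keys map to values,
// e.g. "assembly.style" -> "AT&T"
static std::map<std::string, std::string> g_properties;

// Parse one "key=value" token; returns false if there is no '='
bool setProperty(const std::string &arg)
{
	std::string::size_type eq = arg.find('=');
	if (eq == std::string::npos)
		return false;
	g_properties[arg.substr(0, eq)] = arg.substr(eq + 1);
	return true;
}
```

Long and short command line forms then just become aliases that write to these same keys.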
The biggest work done thus far involved separating code into the decompiler library and the main executable, as well as creating skeleton projects for each of the tools I want to develop this semester. Let the coding madness begin! I've already coded up the argument parsing core.
The argument parsing style is somewhat modeled after a product I reverse engineered, which I probably shouldn't disclose for legal reasons. They used a structure like this:

struct/class argument {
	error handler function pointer
	prev pointer
	next pointer
	argument name
	number of expected arguments
	argument description
};

And they would iterate over the linked list for each argument found. I'm using a similar structure, but I have a richer need for argument flexibility (i.e., they only supported /argkey [argval]), so I'm doing an individual argument pre-parse followed by some decoder logic to make things scale a little better.
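A rough sketch of that pre-parse pass; descriptor names and types here are mine, not the product's or uvudec's:

```cpp
#include <string>
#include <vector>

// Hypothetical descriptor, mirroring the reverse-engineered struct above
struct ArgDescriptor {
	std::string name;        // e.g. "--input"
	unsigned int numArgs;    // expected value count (0 or 1 here)
	std::string description;
};

// Result of the pre-parse: raw argv split into (key, value) pairs,
// ready for the decoder logic
struct ParsedArg {
	std::string key;
	std::string value;
};

std::vector<ParsedArg> preparse(int argc, const char **argv,
                                const std::vector<ArgDescriptor> &table)
{
	std::vector<ParsedArg> out;
	for (int i = 1; i < argc; ++i) {
		std::string cur = argv[i];
		for (size_t j = 0; j < table.size(); ++j) {
			if (cur == table[j].name) {
				ParsedArg p;
				p.key = cur;
				if (table[j].numArgs == 1 && i + 1 < argc)
					p.value = argv[++i];  // consume the following token
				out.push_back(p);
				break;
			}
		}
	}
	return out;
}
```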