Universal Decompiler

Saturday, January 8, 2011

GUI mk2 starting to form

From my "hello world" style GUI from before, a potentially usable first revision is starting to form. Its focus is mostly going to be on learning to make custom widgets and concurrency/multi-threading issues. It currently looks something like this:

I've gotten some feedback from various people on what they'd want in a disassembler GUI as well as general UI concepts and am mulling them over. It seems like I'm going to have a number of dynamically generated scrolling text widgets, so I'm going to keep working on how to structure that nicely as part of my progress. Disassembly, hex, and strings are currently of this type. I'm not sure if strings really needs to be dynamic, but I figure strings could potentially get quite large for larger programs.

Saturday, December 18, 2010

Finally some GUI progress

As mentioned in the last post, I think I'm finally getting the hang of Qt. Although my code has many flaws, I have something resembling a scrolling hex view:

Now that I'm starting to get the hang of things, hopefully I'll start to figure out the various issues and get a basic usable GUI rolled out.

EDIT:
Made a more generic dynamically generated text scrolling widget:

QtDesigner plugin for both a generic implementation and a more defined hexdump version. Handles window resizing events. Horizontal scrolling and other issues are not handled well yet though. It seems this may be hard to implement as the width will be dynamically generated. Is it annoying to the user for the horizontal scroll to be changing width? I'll need to solve this problem in more detail as I try rendering the disassembly output, which is variable width. Maybe I'll have it fixed scroll to some largish value and allow it to occasionally scroll further if a really long line is present?
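
The "fixed largish value that occasionally grows" idea could be sketched roughly like this. It is only an illustration in plain Python, not the widget's code: the class name, the default of 120 columns, and the choice to never shrink are all assumptions.

```python
class HorizontalRange:
    """Tracks how many columns the horizontal scrollbar should cover."""

    def __init__(self, base_columns=120):
        # Assumed "largish" starting width, in characters.
        self.columns = base_columns

    def update(self, visible_line_lengths):
        """Grow the range if a visible line is longer; never shrink it."""
        longest = max(visible_line_lengths, default=0)
        if longest > self.columns:
            self.columns = longest
        return self.columns


scroll_range = HorizontalRange(base_columns=80)
widths = [scroll_range.update([10, 40]),
          scroll_range.update([200]),
          scroll_range.update([5])]
```

Never shrinking keeps the scrollbar from jumping around while the user scrolls vertically, at the cost of leaving some empty space to the right after a long line has passed.
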
One of the main issues still present is that there will be no way for a user to select text, along with other interaction issues. I should be able to implement detecting which line was clicked on with little effort though, which should allow for basic navigation. I personally like to copy and paste from IDA/Olly, so I'd sorely miss this feature. I can do a full export or screen copy with little effort though, so I can do that for now.
Now that I'm more comfortable with Qt, I looked more into what it would take to implement this by replacing the document instance in QTextEdit/QPlainTextEdit. Unfortunately, the key functions needed are not virtual, so without some binjitsu magic (which I'd really prefer not to do...), it is not possible to replace these with dynamic text generation.
The widgets are also starting to be implemented in the main GUI and it is going to go through a major revision. Maybe not usable yet, but getting closer.

Thursday, December 16, 2010

Conquering my Qt fears and semester results

The three main goals I was hoping to accomplish this semester:
-Architecture improvements, especially regarding a plugin system
-License scanner
-Get a basic fully functional GUI

Most of the semester focused on objective 1 and I think I did fairly well in that regard. I feel there is at least a solid foundation for a plugin system and additional interfaces will be added as needed. Regarding objective 2, I didn't get the full application I was hoping to due to various issues, but I did improve the FLIRT support and wrote a research paper on the limitations of FLIRT demonstrated using my toolkit and verifying against IDA. I'll hopefully be releasing it soon, ask me for a draft if you'd like to see it. Right now it's titled "Issues with FLIRT Aware Malware." Back to the objective, I honestly just didn't put the effort into accomplishing objective 2. After seeing a lot of the issues with FLIRT, I also wasn't sure if it really was a good function recognition algorithm to spend time creating signatures for. Presumably though, I can automatically create signatures once I gather up the libraries, so it might not be such a big deal.

I did get a basic GUI going, but not to the level I was hoping to. Several things got in the way of this. First, I knew it was a risk that I didn't know Qt very well. I wouldn't say that I know it well yet, but I'm beginning to become competent. Second, someone offered some help, but didn't follow through. This made me focus on other things hoping they were going to help me get a code example of the widget I needed. I did get some help here and there and one of the main things that became clear was that I needed to subclass QAbstractScrollArea for proper support.

So what was the widget I needed? Basically, rendering all of the disassembly area ahead of time had numerous issues. It took too much time to render ahead of time, it was a pain to keep track of position, and more. The solution: a custom widget that intelligently renders on demand as the window is scrolled. The problem: the limited Qt work I had done was through Qt designer using stock widgets.
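
To make the render-on-demand idea concrete, here is a minimal sketch in plain Python with no Qt involved. It assumes the vertical scrollbar value is a line index and that every line has the same pixel height; the function names and the hex line layout are made up for illustration.

```python
def visible_lines(scroll_value, viewport_height_px, line_height_px, total_lines):
    """Return the half-open range [first, last) of line indices to draw."""
    if line_height_px <= 0:
        raise ValueError("line height must be positive")
    # One extra line covers a partially visible row at the bottom.
    count = viewport_height_px // line_height_px + 1
    first = max(0, min(scroll_value, total_lines))
    last = min(total_lines, first + count)
    return first, last


def render_hex_line(data, line_index, bytes_per_line=16):
    """Format one hexdump row: offset followed by the bytes of that row."""
    offset = line_index * bytes_per_line
    chunk = data[offset:offset + bytes_per_line]
    return "%08X  %s" % (offset, " ".join("%02X" % b for b in chunk))


def render_viewport(data, scroll_value, viewport_height_px, line_height_px,
                    bytes_per_line=16):
    """Generate text only for the rows that are currently on screen."""
    total = (len(data) + bytes_per_line - 1) // bytes_per_line
    first, last = visible_lines(scroll_value, viewport_height_px,
                                line_height_px, total)
    return [render_hex_line(data, i, bytes_per_line) for i in range(first, last)]


lines = render_viewport(bytes(range(64)), scroll_value=1,
                        viewport_height_px=30, line_height_px=15)
```

The cost of a repaint then depends on the viewport size rather than on the size of the file, which is the property that rendering everything ahead of time lacks.
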

Maybe this wouldn't have been so bad if I at least had done work with other GUI frameworks and knew what phrases like model view controller (MVC) meant. As an example, I was referred to the Okteta KDE project, which has a widget very similar to what I needed. However, while the code seems to be designed well, there were several issues. First, it was designed to work as a library, but I wasn't really sure how to build it as it was in some KDE/CMake build system hybrid or something. I don't know CMake and get confused easily when "cmake ." or w/e doesn't work due to some error message. This isn't a huge deal because, all things considered, there weren't that many files and I could just use my own build script. It is somewhat annoying though that I have this library installed on my computer, but there seems to be no -dev package for it. Second, it used a very flexible enterprise style model view controller design. Normally this would be a solution and not a problem, but I don't really know MVC, so it didn't work out well. This may have been solved as I realized there was a Qt designer plugin for their widgets. Unfortunately, the dependencies seemed to explode for it and I didn't get a chance to try to finish it. Before, I was importing source files as needed; I may instead just import the entire project (a couple hundred source files I think).

One of the things I realized as I was doing more and more of this was that there were actually two distinct problems in implementing my widget. The first was that I needed to understand QAbstractScrollArea itself. That is, how the viewport object worked and such. Second, I needed to learn how to make a widget to display hex to set to the viewport. Part of my confusion on this was what I'd consider a poor example of QAbstractScrollArea: a widget that scrolls another widget. This functionality was too linear and didn't really help someone not familiar with Qt understand what was really going on.

To fix some of these problems, I've been trying to read up on more Qt stuff such as MVC architecture. As I was reading through one of the examples, codeeditor, I quickly realized this strangely may show what I need to solve my problems despite looking quite different than what I was looking for. The key has to do with the line numbering on the side. I think I ran into this example before, but maybe didn't realize the significance. This widget demonstrates two things: how to render a text based widget and how to use the viewport. This is essentially exactly what I need to get my application rolling. I may consider using something like the Okteta library at some point in the future, but for now I think I finally have the starting point I need to develop my widgets and at least get something working. I'd like to write a small tutorial on harnessing QAbstractScrollArea for beginners as I really think there could have been a better example for it.

Sunday, November 14, 2010

Python API alpha

I've fixed some of the issues I was having with Python. A simple example:

[mcmaster@gespenst bin]$ ipython

In [1]: import uvudec

In [2]: uvd = uvudec.uvd.getUVDFromFileName('candela.bin')

In [3]: dissassembly = uvd.disassemble()
In [4]: print dissassembly[0:200]
LJMP #0x0026
MOV R7, A
MOV R7, A
MOV R7, A
MOV R7, A
MOV R7, A
MOV R7, A
MOV R7, A
MOV R7, A
LJMP #0x0DA9
MOV R7, A
MOV R7, A
MOV R7, A
MOV R7, A
MOV R7, A
MOV R7, A
MOV R7, A
MOV R7, A
MOV R7, A
MOV

Basically, this is the sort of construct I needed:

%typemap(in, numinputs=0) UVD ** (UVD *temp)
{
    $1 = &temp;
}

%typemap(argout) (UVD **)
{
    PyObject *to_add = SWIG_NewPointerObj(*$1, $descriptor(UVD *), SWIG_POINTER_OWN);
    $result = SWIG_AppendOutput($result, to_add);
}

I initially had some issues with appending objects to the none type generated in the default exception handler (I actually need to look more into why this was required in the first place), but they seem to have gone away now. The issue was that if you appended an object to a none type, it would return a list with the object being the only member of that list...w/e.
Things seem to work now at least at a basic level, but there's a bunch of things in both C++ and Python/SWIG that will need to be cleaned up for this to be convenient to use. I guess the next big thing will be to figure out how to make my iterators translate cleanly. In particular, it doesn't look like they are being compared correctly. Maybe need to add some sort of generator translation functionality as well?
Example of current iterator code:

itr = uvd.begin()
while itr is not uvd.end() and itr.getPosition() < 0x10:
    print '0x%04X: %s' % (itr.getPosition(), itr.getCurrent())
    itr.next()

0x0000: LJMP #0x0026
0x0003: MOV R7, A
0x0004: MOV R7, A
0x0005: MOV R7, A
0x0006: MOV R7, A
0x0007: MOV R7, A
0x0008: MOV R7, A
0x0009: MOV R7, A
0x000A: MOV R7, A
0x000B: LJMP #0x0DA9
0x000E: MOV R7, A
0x000F: MOV R7, A
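
One possible shape for the "generator translation" is a small Python wrapper that walks the wrapped iterator and compares positions by value rather than comparing the iterator objects themselves. This is only a sketch: FakeIterator is a stand-in so the example runs without uvudec, and it assumes that calling getPosition() on the end iterator returns a meaningful end position, which may not be true of the real bindings.

```python
class FakeIterator:
    """Stand-in for the SWIG-wrapped iterator (hypothetical behavior)."""

    def __init__(self, listing, index, end_position):
        self.listing = listing
        self.index = index
        self.end_position = end_position

    def getPosition(self):
        if self.index < len(self.listing):
            return self.listing[self.index][0]
        return self.end_position

    def getCurrent(self):
        return self.listing[self.index][1]

    def next(self):
        self.index += 1


def iter_instructions(begin, end, limit=None):
    """Yield (position, text) pairs until the end position or limit is hit."""
    end_position = end.getPosition()
    itr = begin
    # Compare positions by value; "is not" only compares object identity.
    while itr.getPosition() != end_position:
        position = itr.getPosition()
        if limit is not None and position >= limit:
            break
        yield position, itr.getCurrent()
        itr.next()


listing = [(0x0000, "LJMP #0x0026"), (0x0003, "MOV R7, A"), (0x0004, "MOV R7, A")]
begin = FakeIterator(listing, 0, 0x0005)
end = FakeIterator(listing, len(listing), 0x0005)
first_two = list(iter_instructions(begin, end, limit=0x0004))
```

With something like this in the module, the loop above could become a plain for loop over iter_instructions(uvd.begin(), uvd.end(), 0x10).
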

Wednesday, November 3, 2010

Python API

One of the things I've been playing around with recently is using SWIG to generate a Python API. The following issues have/had to be solved:
-Translate my error code return types to exceptions (DONE)
-Call UVDInit() on module load, UVDDeinit() on module unload (DONE)
-Fix some argument parsing related issues (DONE...sorta, my argument parsing code needs some redesign)
-Translate Object ** in stuff to returned instances (in progress)
The first item was done with this code (still some technicalities, but the general idea anyway):

%include "typemaps.i"
%typemap(out) uv_err_t
{
    if( UV_FAILED($1) )
    {
        SWIG_exception(SWIG_RuntimeError, uv_err_str($1));
    }
    else
    {
        Py_RETURN_NONE;
    }
}

The next issue was solved with this:

%pythoncode %{
# Seems to work
class InitDeinit:
    def __init__(self):
        # print 'Calling UVDInit()'
        init()
        get_config().parseArgs()

    def __del__(self):
        # print 'Calling UVDDeinit()'
        deinit()
# Dummy instance to get created and destroyed
# We could get init to be executed globally...but I don't know about deinit
obj = InitDeinit()
%}

The last issue is only partially solved:

%typemap(in, numinputs=0) UVD **out (UVD *temp)
{
    $1 = &temp;
}

This removed UVD **out from the function input arguments and generated a temporary variable, UVD *temp, to pass into the C++ function. And, once needed, the types that need to be translated can be found with:

find -mindepth 3 -name '*.h' -exec fgrep '**' {} ';' |sed 's/^.*[(]//g' |sed 's/[)].*$//g' |awk -F ',' '{ for(i=1;i<=NF;++i) print $i }' |fgrep '**' |fgrep -v '***' |tr -d '[:blank:]' |grep -v '^$' |fgrep UVD |awk -F '**' '{ print $1 }' |sort -u
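
For readability, the text-processing part of that pipeline (everything after the find) could be written in Python roughly as follows. This is a sketch that mirrors the same crude line-based heuristics, not a real C++ parser, and the two prototypes in SAMPLE_HEADER are invented for the example rather than copied from the uvudec headers.

```python
def double_pointer_types(header_text, marker="UVD"):
    """Collect type names that appear as 'Type **' function arguments."""
    found = set()
    for line in header_text.splitlines():
        if "**" not in line:
            continue
        # Keep what sits between the last '(' and the first ')' after it.
        args = line.rsplit("(", 1)[-1].split(")", 1)[0]
        for arg in args.split(","):
            arg = "".join(arg.split())  # drop all whitespace
            if "**" in arg and "***" not in arg and marker in arg:
                # The text before '**' is the type name.
                found.add(arg.split("**", 1)[0])
    return sorted(found)


# Invented prototypes, for illustration only.
SAMPLE_HEADER = """
uv_err_t getUVDFromFileName(UVD **out, const std::string &file);
uv_err_t getData(UVDData **dataOut, char ***unused);
uv_err_t init();
"""

types = double_pointer_types(SAMPLE_HEADER)
```

Each name in the result is a candidate for its own pair of in/argout typemaps.
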

The problem is I need something to return the value in the temp variable. It's something related to "%typemap(argout) UVD **out", but I can't figure out the exact syntax for the correct result. On a last note, IDAPython manually translates all of their functions. It results in about 8,000 lines of C/C++ code. Although this will have some automatically generated code that will be much longer, it seems I can get this done in under 500 lines. The current code is about 300 lines including the SWIG .i file, a Makefile, and a utility .h and .cpp file. With the functions I currently included for wrapping, it's generating a 19125 line C++ interface file. On that note, the code is also much more verbose than if written by hand, but I guess all things considered if it works, I don't care if the automatically generated source file is a bit long. In any case, the effort to support interfaces will be (in theory) as simple as %include "uvd/core/uvd.h" as I've done for the first few test files. Granted, there will likely have to be some special cases, but overall SWIG seems to be pretty powerful at automating this. Some of this will be to simply name the input arguments appropriately as SWIG can match rules based on argument names.

Also, I started talking to Silvio Cesare about library recognition since he seems to be doing some related research. I mostly focused on implementing the existing FLIRT algorithm, while he's working on trying to improve on some of its failures. For example, someone posted on his blog about malware using FLIRT's simplistic library recognition algorithm to hide itself. Basically, all a virus has to do is match the prefix and write some relocation free code padded with some bytes to create a CRC16 collision, which is relatively easy. These are good reminders about the issues with FLIRT, but it's still a good starting place. A lot of my current interest is with API reverse engineering, which does not typically see such attacks.
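
To show why a CRC16 collision is cheap, here is a small brute-force sketch. It uses binascii.crc_hqx (CRC-CCITT) from the Python standard library as a stand-in; the exact CRC16 variant FLIRT uses and the byte ranges it covers are not reproduced here, and the byte strings are arbitrary placeholders rather than real code.

```python
import binascii


def crc16(data, start=0):
    """CRC-CCITT as provided by the standard library (stand-in for FLIRT's CRC)."""
    return binascii.crc_hqx(data, start)


def find_padding(body, target_crc):
    """Search all two-byte suffixes for one that makes crc16(body + pad) match."""
    state = crc16(body)  # CRC of the fixed part, computed once
    for candidate in range(0x10000):
        pad = candidate.to_bytes(2, "big")
        if crc16(pad, state) == target_crc:
            return pad
    return None


library_tail = b"bytes of a real library function"      # placeholder data
impostor_tail = b"completely different code goes here"  # placeholder data

target = crc16(library_tail)
pad = find_padding(impostor_tail, target)
```

At most 65536 candidates need to be tried, and for this CRC two free bytes are always enough to hit any 16 bit target, so the search never comes up empty.
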

On another note, I was reading some details about Stuxnet, which is quite an impressive virus. My friend was shocked to know I had missed this given my interests and being employed as a malware analyst when it came out. Basically, I don't read/watch the news or anything. Anyway, if someone had told me about it, I would have said "yes it's possible, but the sheer amount of effort makes it highly unlikely to happen in the near future." Well, someone cared enough and lo and behold, we have a worm using multiple 0-day Windows vulnerabilities, multiple stolen certificates, and to top it off its payload installs rootkits onto PLC controllers to attack critical industrial processes. Yummy. Maybe I heard about it, thought it was "just another virus," and dismissed it.

I've also decided I'm going to write console object loader plugins. Video games provide an interesting scenario for library recognition. Many of the ROMs and the toolchain are kept very proprietary. That is, you can't easily get even the development toolchain, let alone any source code for it. So, what I was thinking to try was to run some clustering algorithms on the ROMs to see if I could identify the stock libraries / assembly routines used within a vendor or given by the manufacturer. Since this would be much easier to do in Python than C++, this was the excuse to write the Python bindings. Alternatively, I would have had to write the data to intermediate files and would not be able to directly interface to the engine.

Finally, I wrote some basic autoconf support. I don't think I'm using the correct macros for everything. I might migrate to CMake at some point, but for now I'd rather have it work for dev than spend a lot of time reworking the build system.

Friday, October 29, 2010

Revenge of the unit test

I wrote some unit tests a while ago, but I was too lazy to run them. I looked into FOSS Continuous Integration (CI) testing solutions, but couldn't really find anything that really caught my eye.
I played around some more with CDash (http://www.cdash.org/), which was at the top of my list. Unfortunately, it currently still seems pretty SVN/CVS oriented and without previous experience with CDash, the entry barrier to use it seems relatively high. Its natural zone of comfort seems to be SVN/CMake/Doxygen. There is a test server on the Kitware website at http://my.cdash.org/index.php?project=uvudec . I also tried to set up a local server, which I may have had more luck with, but I couldn't get one of the dependencies installed. I'm told that Kitware is moving to git for one of their projects, so support for git might be cleaner in the near future.
I was also recommended to look into CIJoe (http://github.com/defunkt/cijoe). Unfortunately, it seems to crash for me. This may be because Fedora runs an olllld version of Ruby. Since this will be a dedicated virtual server anyway, I'll try to set up a VM to give it a better shot. They have a cool logo though and even sell merchandise with it:

In the end, I decided my current needs are very modest and it would be better to get some crude hacked together server running than nothing at all. So, enter UVBuilder. It can be found at util/uvbuilder . Basically it uses a JSON config file to check out, update, build, and run the code. It then e-mails results if there seems to be some change in status. It's very simplistic and has some dumb features, like needing to check out two copies of the code, that can be solved with minor effort if I care.
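
The general idea of a config-driven builder that only reports on status changes can be sketched as below. None of this is UVBuilder's actual code or config schema: the "steps", "name" and "command" keys are hypothetical, the commands are trivial Python one-liners so the example runs anywhere, and sending the e-mail is left out.

```python
import json
import subprocess
import sys

# Hypothetical config layout; the real UVBuilder JSON format is not shown here.
CONFIG_TEXT = json.dumps({
    "steps": [
        {"name": "build", "command": [sys.executable, "-c", "print('building')"]},
        {"name": "test", "command": [sys.executable, "-c", "raise SystemExit(1)"]},
    ]
})


def run_steps(config):
    """Run each step in order and stop at the first one that fails."""
    for step in config["steps"]:
        result = subprocess.run(step["command"], capture_output=True, text=True)
        if result.returncode != 0:
            return "FAIL:" + step["name"]
    return "PASS"


def should_notify(previous_status, current_status):
    """Only report when the status differs from the last run."""
    return previous_status != current_status


status = run_steps(json.loads(CONFIG_TEXT))
notify = should_notify("PASS", status)
```

Keeping the previous status around between runs (for example in a small file next to the config) is what makes the "only e-mail on change" behavior possible.
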
I also played some more with using Valgrind to extensively test the code. I had noticed that sometimes I would hexdump a data object and Valgrind would trigger on that object if there was an error. So, I added a function that I called "poke" that iterates over a block of memory and effectively does nothing. However, it makes the control flow appear to depend on the values by executing a statement like if( *ptr ) doNothing(); on each value. I found that std::map seems to leave some values uninitialized though, so I'll have to make more custom Valgrind ignore files if I want these tests to be truly effective.
So, next steps are to get all of the existing unit tests to pass (3 regressions, they all seem related to the same issue), and then beef up the unit tests now that I don't have to keep manually running them.

EDIT: all original unit tests pass after various fixes. Now to write a bunch of FLIRT tests.
EDIT: obj2pat unit tests created. And I'll try to get back to the comment below in the near future

Wednesday, October 27, 2010

FLIRT nearing completion

bfd based .pat generation is probably at an acceptable level. The behavior for handling short length names is kinda ill defined, so I'm not sure if there's much I can do about that. Additionally, FLAIR implements some x86 specific linker relocation fixup that I currently don't support since it's currently all architecture independent. Also, while the basic architecture is there, much of the code should be moved out of the uvdbfd plugin and into the main engine. If I write a .pat generator for the uvdasm plugin (configuration file based architecture), this should be accomplished at the same time. I also figured out what one of the bits meant in the .sig format that has been annoying me for a bit. The reference .sig dumper had showed some of the function offsets being negative, which didn't make sense to me. However, I finally figured out that this refers to local symbols (i.e. a static global function in C/C++).
Generally, I'd consider .pat generation much harder than .sig since .pat is very platform specific and I'm guessing .sig stuff more or less isn't. I'm hoping that by Monday I should have uvpat2sig working smoothly. For starters, the old signature file dumping code was not integrated into the engine. Now, the signature file is actually loaded and then printed by dumping the loaded database. This is critical since in order to actually do signature matching, I'll need to load these up.
There are several issues with the current FLIRT engine. First, I haven't nailed down the overall file checksum computation. I'm guessing it's just a CRC16 on the tree section (i.e. excluding the header), but haven't confirmed this. Second, compression/decompression isn't implemented. This isn't a high priority item and can be done later with presumably little impact on the loading mechanism. Next, the .sig file seems to leave out a lot of items from the .pat file. I need to figure out more accurately what items it leaves out and why. In particular, it looks like it only keeps one (the first?) external reference in a function. Finally, there is some attribute in the .sig file I don't understand. It seems to be some sort of referenced data with an offset and value, but I haven't yet devoted time to figure out what it refers to.