Sunday, November 14, 2010

Python API alpha

I've fixed some of the issues I was having with Python. A simple example:
[mcmaster@gespenst bin]$ ipython

In [1]: import uvudec

In [2]: uvd = uvudec.uvd.getUVDFromFileName('candela.bin')

In [3]: dissassembly = uvd.disassemble()
In [4]: print dissassembly[0:200]
LJMP #0x0026
MOV R7, A
MOV R7, A
MOV R7, A
MOV R7, A
MOV R7, A
MOV R7, A
MOV R7, A
MOV R7, A
LJMP #0x0DA9
MOV R7, A
MOV R7, A
MOV R7, A
MOV R7, A
MOV R7, A
MOV R7, A
MOV R7, A
MOV R7, A
MOV R7, A
MOV
Basically, this is the sort of construct I needed:
%typemap(in, numinputs=0) UVD ** (UVD *temp)
{
    $1 = &temp;
}

%typemap(argout) (UVD **)
{
    PyObject *to_add = SWIG_NewPointerObj(*$1, $descriptor(UVD *), SWIG_POINTER_OWN);
    $result = SWIG_AppendOutput($result, to_add);
}
I initially had some issues with appending objects to the None generated in the default exception handler (I still need to look into why this was required in the first place), but they seem to have gone away now. The issue was that if you appended an object to None, it would return a list with the object as its only member...whatever.
Things seem to work now, at least at a basic level, but there are a bunch of things in both C++ and Python/SWIG that will need to be cleaned up for this to be convenient to use. I guess the next big thing will be to figure out how to make my iterators translate cleanly. In particular, it doesn't look like they are being compared correctly. Maybe I need to add some sort of generator translation functionality as well?
Example of current iterator code:
itr = uvd.begin()
while itr is not uvd.end() and itr.getPosition() < 0x10:
    print '0x%04X: %s' % (itr.getPosition(), itr.getCurrent())
    itr.next()

0x0000: LJMP #0x0026
0x0003: MOV R7, A
0x0004: MOV R7, A
0x0005: MOV R7, A
0x0006: MOV R7, A
0x0007: MOV R7, A
0x0008: MOV R7, A
0x0009: MOV R7, A
0x000A: MOV R7, A
0x000B: LJMP #0x0DA9
0x000E: MOV R7, A
0x000F: MOV R7, A
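
One possible shape for the generator idea, as a rough sketch only. Note that `is not` in the loop above tests object identity, and each call to `uvd.end()` probably hands back a fresh wrapper object, so that test may never be false; a value comparison (`!=`) is what the loop really wants. `FakeIterator` and `FakeUVD` below are hypothetical stand-ins so the example runs without the real bindings, and `instructions()` is an invented helper name. Only `begin()`, `end()`, `getPosition()`, `getCurrent()` and `next()` come from the code above.

```python
class FakeIterator(object):
    """Hypothetical stand-in for the SWIG-wrapped iterator."""

    def __init__(self, items, index=0):
        self._items = items    # list of (position, text) pairs
        self._index = index

    def getPosition(self):
        return self._items[self._index][0]

    def getCurrent(self):
        return self._items[self._index][1]

    def next(self):
        self._index += 1

    def __eq__(self, other):
        # Compare by value, so two distinct wrapper objects can be equal
        return self._items is other._items and self._index == other._index

    def __ne__(self, other):
        return not self.__eq__(other)


class FakeUVD(object):
    """Hypothetical stand-in for the wrapped UVD object."""

    def __init__(self, items):
        self._items = items

    def begin(self):
        return FakeIterator(self._items, 0)

    def end(self):
        return FakeIterator(self._items, len(self._items))


def instructions(uvd):
    """Yield (position, text) pairs by walking begin() up to end()."""
    itr = uvd.begin()
    end = uvd.end()
    while itr != end:    # value comparison, not identity
        yield itr.getPosition(), itr.getCurrent()
        itr.next()


demo = FakeUVD([(0x0000, 'LJMP #0x0026'), (0x0003, 'MOV R7, A')])
for position, text in instructions(demo):
    print('0x%04X: %s' % (position, text))
```

With the real bindings this would only work once the wrapped iterator class exposes a working equality operator to Python, which is exactly the part that does not look right yet.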

Wednesday, November 3, 2010

Python API

One of the things I've been playing around with recently is using SWIG to generate a Python API. The following issues have/had to be solved:
-Translate my error code return types to exceptions (DONE)
-Call UVDInit() on module load, UVDDeinit() on module unload (DONE)
-Fix some argument parsing related issues (DONE...sorta, my argument parsing code needs some redesign)
-Translate Object ** output arguments to returned instances (in progress)
The first item was done with this code (still some technicalities, but the general idea anyway):
%include "typemaps.i"
%typemap(out) uv_err_t
{
    if( UV_FAILED($1) )
    {
        SWIG_exception(SWIG_RuntimeError, uv_err_str($1));
    }
    else
    {
        Py_RETURN_NONE;
    }
}
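Seen from the Python side, the effect of that typemap is roughly what the decorator below does: a failing error code becomes a RuntimeError and a successful one becomes None. This is only an illustration of the behavior, not how SWIG implements it. The error code values, `uv_failed()` and `raises_on_error()` are all hypothetical names, and the sketch assumes negative codes mean failure.

```python
import functools

UV_ERR_OK = 0         # hypothetical success code
UV_ERR_GENERAL = -1   # hypothetical failure code


def uv_failed(rc):
    """Stand-in for UV_FAILED(), assuming negative codes mean failure."""
    return rc < 0


def raises_on_error(func):
    """Turn an error-code return value into None or a RuntimeError."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        rc = func(*args, **kwargs)
        if uv_failed(rc):
            raise RuntimeError('uv_err_t failure: %d' % rc)
        return None
    return wrapper


@raises_on_error
def always_ok():
    return UV_ERR_OK


@raises_on_error
def always_fails():
    return UV_ERR_GENERAL
```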
The next issue was solved with this:
%pythoncode %{
# Seems to work
class InitDeinit:
    def __init__(self):
        # print 'Calling UVDInit()'
        init()
        get_config().parseArgs()

    def __del__(self):
        # print 'Calling UVDDeinit()'
        deinit()

# Dummy instance to get created and destroyed
# We could get init to be executed globally...but I don't know about deinit
obj = InitDeinit()
%}
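A possible alternative for the deinit half, sketched under assumptions: Python does not guarantee that `__del__` runs for objects still alive when the interpreter exits, whereas the standard `atexit` module runs registered functions at normal interpreter termination (though not if the process is killed by a signal or exits via `os._exit()`). The `init()` and `deinit()` below are hypothetical stand-ins for the wrapped functions, and `calls` exists only to make the behavior visible.

```python
import atexit

calls = []  # records the order of calls so the behavior can be inspected


def init():
    """Hypothetical stand-in for the wrapped UVDInit()."""
    calls.append('init')


def deinit():
    """Hypothetical stand-in for the wrapped UVDDeinit()."""
    calls.append('deinit')


def _setup():
    # Initialize immediately on import, and defer cleanup to interpreter exit
    init()
    atexit.register(deinit)


_setup()
```

Placed inside the `%pythoncode` block, something like this would run init at module import and deinit at interpreter shutdown rather than at module unload, which may or may not be close enough.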
The last issue is only partially solved:
%typemap(in, numinputs=0) UVD **out (UVD *temp)
{
    $1 = &temp;
}
This removes UVD **out from the function's input arguments and generates a temporary variable, UVD *temp, to pass into the C++ function. When the time comes, the types that need to be translated can be found with:
find -mindepth 3 -name '*.h' -exec fgrep '**' {} ';' |sed 's/^.*[(]//g' |sed 's/[)].*$//g' |awk -F ',' '{ for(i=1;i<=NF;++i) print $i }' |fgrep '**' |fgrep -v '***' |tr -d '[:blank:]' |grep -v '^$' |fgrep UVD |awk -F '**' '{ print $1 }' |sort -u
The problem is I need something to return the value in the temp variable. It's something related to "%typemap(argout) UVD **out", but I can't figure out the exact syntax for the correct result.

On a last note, IDAPython manually translates all of their functions. It results in about 8,000 lines of C/C++ code. Although the SWIG approach produces automatically generated code that is much longer, it seems I can get this done in under 500 hand-written lines. The current code is about 300 lines including the SWIG .i file, a Makefile, and a utility .h and .cpp file. With the functions I currently included for wrapping, it's generating a 19,125 line C++ interface file. The generated code is much more verbose than if written by hand, but all things considered, if it works, I don't care if the automatically generated source file is a bit long.

In any case, the effort to support interfaces will be (in theory) as simple as %include "uvd/core/uvd.h", as I've done for the first few test files. Granted, there will likely have to be some special cases, but overall SWIG seems to be pretty powerful at automating this. Some of the work will simply be naming the input arguments appropriately, as SWIG can match rules based on argument names.

Also, I started talking to Silvio Cesare about library recognition since he seems to be doing some related research. I mostly focused on implementing the existing FLIRT algorithm, while he's working on improving on some of its failures. For example, someone posted on his blog about malware using FLIRT's simplistic library recognition algorithm to hide itself. Basically, all a virus has to do is match the prefix and write some relocation-free code padded with some bytes to create a CRC16 collision, which is relatively easy. These are good reminders about the issues with FLIRT, but it's still a good starting place. Most of my current interest is in API reverse engineering, which doesn't typically see such attacks.
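
To make the collision point concrete, here is a toy model of the attack. It is a sketch under loud assumptions, not FLIRT's actual format: the "signature" is just a short leading byte prefix plus a CRC16 over a fixed number of following bytes, `binascii.crc_hqx` (CRC-CCITT) stands in for whatever CRC16 variant FLIRT really uses, and `PREFIX_LEN`, `make_signature()`, `matches()` and `forge()` are all hypothetical names. Because two free bytes can steer a 16-bit CRC to any value, brute-forcing a two-byte pad takes at most 65536 tries.

```python
import binascii
import struct

PREFIX_LEN = 4  # toy value; real signatures use a longer leading pattern


def crc16(data):
    # CRC-CCITT from the standard library, standing in for FLIRT's CRC16
    return binascii.crc_hqx(data, 0)


def make_signature(func_bytes, crc_len):
    """Toy signature: (leading bytes, CRC length, CRC16 of the bytes that follow)."""
    body = func_bytes[PREFIX_LEN:PREFIX_LEN + crc_len]
    return func_bytes[:PREFIX_LEN], crc_len, crc16(body)


def matches(sig, func_bytes):
    """Return True if func_bytes has the same prefix and body CRC as sig."""
    prefix, crc_len, crc = sig
    body = func_bytes[PREFIX_LEN:PREFIX_LEN + crc_len]
    return (func_bytes[:PREFIX_LEN] == prefix
            and len(body) == crc_len
            and crc16(body) == crc)


def forge(sig, payload):
    """Pad payload with two bytes so that it matches sig despite different code."""
    prefix, crc_len, crc = sig
    if len(payload) != crc_len - 2:
        raise ValueError('payload must leave exactly two bytes for padding')
    for pad in range(0x10000):
        candidate = payload + struct.pack('>H', pad)
        if crc16(candidate) == crc:
            return prefix + candidate
    raise ValueError('no colliding padding found')


library_func = b'\x02\x00\x26\xff' + bytes(bytearray(range(16)))
sig = make_signature(library_func, 16)
impostor = forge(sig, b'\xAA' * 14)   # 14 arbitrary bytes plus 2 pad bytes
print(matches(sig, library_func), matches(sig, impostor), impostor != library_func)
```

Both functions match the same toy signature even though their bodies differ, which is the essence of the weakness.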

On another note, I was reading some details about Stuxnet, which is quite an impressive virus. My friend was shocked to learn I had missed this given my interests and that I was employed as a malware analyst when it came out. Basically, I don't read/watch the news or anything. Anyway, if someone had told me about it, I would have said "yes, it's possible, but the sheer amount of effort makes it highly unlikely to happen in the near future." Well, someone cared enough and lo and behold, we have a worm using multiple zero-day Windows vulnerabilities and multiple stolen certificates, and to top it off, its payload installs rootkits onto PLCs to attack critical industrial processes. Yummy. Maybe I heard about it, thought it was "just another virus," and dismissed it.

I've also decided I'm going to write console object loader plugins. Video games provide an interesting scenario for library recognition. Many of the ROMs and the toolchains are kept very proprietary. That is, you can't easily get even the development toolchain, let alone any source code for it. So, what I was thinking of trying was to run some clustering algorithms on the ROMs to see if I could identify the stock libraries / assembly routines shared within a vendor or supplied by the manufacturer. Since this would be much easier to do in Python than C++, this was the excuse to write the Python bindings. Otherwise, I would have had to write the data to intermediate files and would not have been able to interface directly with the engine.

Finally, I wrote some basic autoconf support. I don't think I'm using the correct macros for everything. I might migrate to CMake at some point, but for now I'd rather have it work for dev than spend a lot of time reworking the build system.