One of the things I've been playing around with recently is using SWIG to generate a Python API. The following issues have/had to be solved
-Translate my error code return types to exceptions (DONE)
-Call UVDInit() on module load, UVDDeinit() on module unload (DONE)
-Fix some argument parsing related issues (DONE...sorta, my argument parsing code needs some redesign)
-Translate Object ** in stuff to returned instances (in progress)
The first item was done with this code (still some technicalities, but the general idea anyway):
%include "typemaps.i"
%typemap(out) uv_err_t
{
if( UV_FAILED($1) )
{
SWIG_exception(SWIG_RuntimeError, uv_err_str($1));
}
else
{
Py_RETURN_NONE;
}
}
The next issue was solved with this:
%pythoncode %{
# Seems to work
class InitDeinit:
def __init__(self):
# print 'Calling UVDInit()'
init()
get_config().parseArgs()
def __del__(self):
# print 'Calling UVDDeinit()'
deinit()
# Dummy instance to get created and destroyed
# We could get init to be executed globally...but I don't know about deinit
obj = InitDeinit()
%}
The last issue is only partially solved:
%typemap(in, numinputs=0) UVD **out (UVD *temp)
{
$1 = &temp;
}
Which removed UVD **out from the function input arguments and generated a temporary variable, UVD *temp, to pass into the C++ function. And, once needed, the types needed to be translate can be found with:
find -mindepth 3 -name '*.h' -exec fgrep '**' {} ';' |sed 's/^.*[(]//g' |sed 's/[)].*$//g' |awk -F ',' '{ for(i=1;i<=NF;++i) print $i }' |fgrep '**' |fgrep -v '***' |tr -d '[:blank:]' |grep -v '^$' |fgrep UVD |awk -F '**' '{ print $1 }' |sort -u
The problem is I need something to return the value in the temp variable. Its something related to "%typemap(argout) UVD **out", but I can't figure out the exact syntax for the correct result. On a last note, IDAPython manually translates all of their functions. It results in about 8,000 lines of C/C++ code. Although this will have some automatically generated code that will be much longer, it seems I can get this done in under 500 lines. The current code is about 300 lines including the SWIG .i file, a Makefile, and a utility .h and .cpp file. With the functions I currently included for wrapping, its generating a 19125 C++ interface file. On that note, the code is also much more verbose that if written by hand, but I guess all things considered if it works, I don't care if the automatically generated source file is a bit long. In any case, the effort to support interfaces will be (in theory) as simple as %include "uvd/core/uvd.h" as I've done for the first few test files. Granted, there will likely have to be some special cases, but overall SWIG seems to be pretty powerful at automating this. Some of this will be to simply name the input arguments appropriately as SWIG can match rules based on argument names.
Also, I started talking to
Silvio Cesare about library recognition since he seems to be doing some related research. I mostly focused on implementing the existing FLIRT algorithm, while he's working on trying to improve on some of its failures. For example, someone posted on his blog
this about malware using FLIRT's simplistic library recognition algorithm to hide themselves. Basically, all a virus has to do is match the prefix and write some relocation free code padded with some bytes to create a CRC16 collision, which is relatively easy. These are good reminders about the issues with FLIRT, but its still a good starting place. A lot of my current interest is with API reverse engineering and thus does not typically see such attacks.
On another note, I was reading some details about Stuxnet, which is quite an impressive virus. My friend was shocked to know I had missed this given my interests and being employed as a malware analyst when it came out. Basically, I don't read/watch the news or anything. Anyway, if someone had told me about it, I would have said "yes its possible, but the sheer amount of effort makes it highly unlikely to happen in the near future." Well, someone cared enough and lo and behold, we have a worm using multiple 0 day windows vulnerabilities, multiple stolen certificates, and to top it off its payload installs rootkits onto PLC controllers to attack critical industrial processes. Yummy. Maybe I heard about it, thought it was "just another virus," and dismissed it.
I've also decided I'm going to write console object loader plugins. Video games provide an interesting scenario for library recognition. Many of the ROMs and the toolchain are kept very proprietary. That is, you can't easily get even the development toolchain, let alone any sourcecode for it. So, what I was thinking to try was to run some clustering algorithms on the ROMs to see if I could identify the stock libraries / assembly routines used within a vendor or given by the manufacturer. Since this would be much easier to do in Python than C++, this was the excuse to write the Python bindings. Alternatively, I would have had to write the data to intermediate files and would not be able to directly interface to the engine.
Finally, I wrote some basic autoconf support. I don't think I'm using the correct macros for everything. I might migrate to CMake at some point, but for now I'd rather have it work for dev than spend a lot of time reworking the build system.