Sunday, October 3, 2010

Plugin away at architecture enhancements

Okay, so I wrote this at 1:30 AM or something and I'm really tired, but it's hopefully an at least somewhat coherent brain dump.
With the new semester come many new goals. One of the major goals was to get a plugin system up and running.
The first part of this was to modularize the architecture instead of assuming the UVD Disassembler config-file-based architecture, which is obviously not going to work in every case. I spent most of a day or two on this a few weekends ago and it is now done. First, I separated the code out so that it works through a dynamically assigned UVDArchitecture object. The second part was to make it a plugin. Although I don't like the way load paths and such are treated (security issues for starters), I have at least a basic system running. Example output:
[mcmaster@gespenst bin]$ ./uvudec.dynamic --help
uvudec version 0.3.0
libuvudec version 0.3.0
Copyright 2009-2010 John McMaster
Portions copyright GNU (MD5 implementation)

Usage: ./uvudec.dynamic
Args:
--debug-general (debug.flag.general): set given debug flag
--debug-flirt (debug.flag.flirt): set given debug flag
--help (action.help): print this message and exit
--version (action.version): print version and exit
...
--plugin (plugin.name): load given library name as plugin
--plugin-path (plugin.path.append): append dir to plugin search path
--plugin-path-prepend (plugin.path.prepend): prepend dir to plugin search path
--input (target.file): source file for data
--output (output.file): output program (default: stdout)

Loaded plugins (1 / 1):

Plugin uvdasm:
--config-language (config.language): default config interpreter language (plugins may require specific)
python: use Python (default)
--config-language-interface (config.language_interface): how to access a specifc interpreter (options as availible)
exec: execute interpreter, parse results
API: use binary API to interpreter (default)
--arch-file (arch.file): architecture/CPU module file
Which reminds me, the GNU hash thing doesn't need to be in there and should be removed since it's GPL contamination. Plugins can also be added easily by third parties, as the recursive make simply tries to execute all makefiles one level down from the $ROOT/plugin dir. So, all you have to do is copy an example plugin and you can try it out without editing any of the core makefiles. Maybe I'll do something similar to Linux-style kernel modules, where you latch onto an installed build system. There is an issue with getting the current code to install nicely while having modules compile the same way, because installed headers will all be prefixed with uvd/, but for dev they currently aren't. I could put all headers in a uvd dir. I was thinking of simply symlinking a uvd dir to the main code dir. This wouldn't work on native Windows builds, but I don't know if I will ever support that anyway. Probably Cygwin or MinGW at best. I should look at some projects like libxml2 that I know do this and see how their build systems handle it.
The other major area that needs to be improved, even more so than architecture, is object file abstraction. Assuming input files were only raw binaries led to lazy addressing throughout the code, where it's assumed all addresses are absolute. This isn't quite true, as RAM is in a different address space than code (on some architectures), but since I'm not doing heavy analysis yet, these haven't been well separated. These improvements will be primarily driven by the need to abstract the object file formats and provide the ability to write plugins.
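To make the address space point concrete, here is a minimal sketch in Python (not the uvudec code itself, which is C++). The `AddressSpace` and `Address` types and the code/RAM layout are hypothetical names made up for illustration; the only idea taken from above is that an offset means nothing without the space it lives in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressSpace:
    """A named, bounded region such as code ROM or data RAM (hypothetical type)."""
    name: str
    min_addr: int
    max_addr: int

    def contains(self, offset: int) -> bool:
        return self.min_addr <= offset <= self.max_addr


@dataclass(frozen=True)
class Address:
    """An offset that is only meaningful together with its address space."""
    space: AddressSpace
    offset: int

    def __post_init__(self):
        # Reject offsets that fall outside the owning space.
        if not self.space.contains(self.offset):
            raise ValueError(
                f"0x{self.offset:04X} is outside address space {self.space.name}"
            )


# Assumed layout for illustration: code and RAM both start at offset 0.
CODE = AddressSpace("code", 0x0000, 0xFFFF)
RAM = AddressSpace("ram", 0x00, 0xFF)

code_addr = Address(CODE, 0x0030)
ram_addr = Address(RAM, 0x0030)

# Same numeric offset, but the two addresses compare as different.
same = code_addr == ram_addr
```

With something like this, code that treats every address as one absolute number stops type-checking in your head, which is roughly the pressure the object file abstraction is supposed to apply.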
I've never written a plugin architecture before, so I've played around with several design patterns to try to make it work smoothly. Ultimately, it seems different components need different interfaces to work well. What it seems to be boiling down to, though, is that analysis results are best broadcast as events, and if people need to listen to them they register a single callback and filter out the ones they don't want. However, for heavier duty objects such as architecture instantiation, I haven't quite decided on the exact mechanism. There are several constraints influencing this decision. First, the ability to register loaders should be accessible to non-plugins, as I haven't yet strictly required binary loaders and such to be dynamic libraries. This requirement may change in the future, as dropping it would make the code more regular. Second, it is probably a good idea for factory methods to be automatically unregistered if a plugin is unloaded. Third, for interactive purposes, it is desirable to be able to probe how good a match a loader is for an input format without actually instantiating the engine, if possible.
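The broadcast idea for analysis events could look roughly like the following Python sketch. `EventBus`, `EventType`, and the payload layout are all hypothetical names chosen for the example, not uvudec's actual API; the point is only that each listener registers one callback and does its own filtering.

```python
from enum import Enum, auto


class EventType(Enum):
    # Hypothetical event kinds; a real engine would define its own set.
    FUNCTION_FOUND = auto()
    LABEL_FOUND = auto()


class EventBus:
    """Broadcasts every event to every registered callback."""

    def __init__(self):
        self._listeners = []

    def register(self, callback):
        self._listeners.append(callback)

    def unregister(self, callback):
        self._listeners.remove(callback)

    def emit(self, event_type, payload):
        # Iterate over a copy so a callback may unregister itself safely.
        for callback in list(self._listeners):
            callback(event_type, payload)


found_functions = []


def on_event(event_type, payload):
    # The listener, not the bus, filters out events it does not care about.
    if event_type is not EventType.FUNCTION_FOUND:
        return
    found_functions.append(payload["address"])


bus = EventBus()
bus.register(on_event)
bus.emit(EventType.LABEL_FOUND, {"address": 0x10})
bus.emit(EventType.FUNCTION_FOUND, {"address": 0x40})
```

The trade-off is that every listener sees every event, which is simple but means the filtering cost is paid in each callback.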
To deal with the first issue, there needs to be a non-plugin way to register creation callbacks. So, simply iterating over the loaded plugins and calling a loadObjectFile or similar function is not enough. At the very least, there would also need to be an additional list of registered callbacks. This seems to favor using registered callbacks exclusively, as it simplifies the code. The main downside is that plugins may have to be more aware of architectural changes. I might make it so that the plugin engine registers loader handlers on a plugin's behalf to give an additional option, if it's not too much effort.
For the second, using the hybrid plugin and registration method would solve this, and I might do that. The last issue creates the need to maintain correlated function sets. It's certainly easier to do this with plugins, since they provide coordinated data structures. Otherwise, I would have to create some hash maps during registration to correlate them. And, since no identifier is returned to say what engine (plugin) it was, it would be difficult to track which engine provided the best load. These factors seem to indicate that the best long-term solution would be to make every architecture and binary format loader its own plugin.
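Here is a rough Python sketch of how the three constraints could fit together in one registry. Everything in it (`LoaderRegistry`, the owner strings, the scoring scale) is an assumption made for illustration rather than the actual uvudec design; the only real-world fact used is the ELF magic number, `0x7F 'E' 'L' 'F'`.

```python
class LoaderRegistry:
    """Keeps probe and create callbacks together, tagged with their owner."""

    def __init__(self):
        self._entries = []  # list of (owner, probe, create) tuples

    def register(self, owner, probe, create):
        # Core code and plugins register the same way; only the owner differs.
        self._entries.append((owner, probe, create))

    def unregister_owner(self, owner):
        # Called when a plugin is unloaded so its factories disappear with it.
        self._entries = [e for e in self._entries if e[0] != owner]

    def best_match(self, data):
        # Probe every loader without instantiating anything.
        best = None
        for owner, probe, create in self._entries:
            score = probe(data)
            if score > 0 and (best is None or score > best[0]):
                best = (score, owner, create)
        return best

    def load(self, data):
        best = self.best_match(data)
        if best is None:
            raise ValueError("no registered loader recognizes this input")
        _score, owner, create = best
        return owner, create(data)


def probe_elf(data):
    # High confidence only when the ELF magic number is present.
    return 100 if data[:4] == b"\x7fELF" else 0


def probe_raw(data):
    # A raw binary loader accepts anything, but with the lowest confidence.
    return 1


registry = LoaderRegistry()
registry.register("core", probe_raw, lambda d: {"format": "raw", "size": len(d)})
registry.register("elf_plugin", probe_elf, lambda d: {"format": "elf", "size": len(d)})

owner, obj = registry.load(b"\x7fELF\x01\x01\x01")
```

Because each entry carries its owner, the registry can report which engine won the probe and can drop all of a plugin's loaders in one call, without separate hash maps to correlate the functions.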
As far as the actual design of the object abstraction itself, I've been looking into several existing object abstraction frameworks. Mostly, though, I'm going off of binutils since it's what I'm most familiar with. It seems to have a bias towards ELF, but does in fact support a lot of architectures and object formats.
On the GUI side, I tried adding clickable address links, but this led to performance issues with large documents. I'll have to look into this some more. I'm thinking of making a custom "scroll" bar that shows the address locations and generates the screen only as needed. I know from time to time I like to copy and paste stuff though, and this might make doing that difficult, but still possible, over areas larger than one screen. Also, IDA 6 has been released with a spiffy new Qt-driven GUI. I probably won't get to try it out until at least February, if I go work for MIT Lincoln Labs full time. But I did get to look at some screenshots and it looks nice.
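The "generate the screen only as needed" idea might be sketched like this in Python. This is not GUI code and not what uvudec does today; `format_line`, `visible_lines`, and `copy_range` are hypothetical helpers, and the listing format is made up. The scroll position is assumed to be a fraction between 0.0 and 1.0.

```python
def format_line(memory, address):
    # Hypothetical listing format: address, then the byte stored there.
    return f"{address:08X}: {memory[address]:02X}"


def visible_lines(memory, scroll_fraction, rows):
    """Build only the lines that fit on screen for a given scroll position."""
    last_start = max(len(memory) - rows, 0)
    start = min(int(scroll_fraction * last_start), last_start)
    end = min(start + rows, len(memory))
    return [format_line(memory, a) for a in range(start, end)]


def copy_range(memory, start, end):
    """Generate text for an arbitrary address range, e.g. for the clipboard."""
    return "\n".join(format_line(memory, a) for a in range(start, end))


memory = bytes(range(256)) * 4  # 1024 bytes of sample data
top = visible_lines(memory, 0.0, 3)
bottom = visible_lines(memory, 1.0, 3)
```

Copying more than one screen would then go through something like `copy_range` rather than the widget's own selection, which is the awkward part mentioned above.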
