Saturday, October 9, 2010

Object architecture pluggable

The object engine is now pluggable. This was the last major planned core architecture enhancement for the semester. Object plugins make it possible, for example, to write a plugin for Microchip's C18 compiler if it uses its own object format (I think it did, but I don't remember for sure). However, this did result in one workaround that I haven't decided is actually a good idea or really a hack: there is now a function getPrimaryExecutableAddressSpace() that returns the area where the main program is expected to reside. This should work in most cases for the general "just disassemble the binary" case, so maybe it isn't an issue. Or maybe I should iterate over all executable segments? The problem with the latter is that those segments often aren't desirable to print; for example, Linux x86 ELF files have executable sections beyond the main code that usually aren't worth displaying. However, one of the engine enhancements was grouping the object file format, architecture, operating system (future placeholder), and debugger (future placeholder) into a UVDRuntime class. This means that if enough information is given, attributes can be combined to resolve these special cases.
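To illustrate the idea, here is a minimal sketch of how a runtime class could combine attributes to pick the primary executable address space. The struct layout and field names are hypothetical, not the actual uvudec API; only the class name UVDRuntime-style grouping and getPrimaryExecutableAddressSpace() come from the discussion above. The heuristic sketched here prefers the executable space containing the entry point, which is one way the combined attributes could resolve the ELF multiple-executable-sections case:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch: an object plugin reports the address spaces it
// parsed, and the runtime picks the "primary" executable one.
struct AddressSpace {
    std::string name;
    uint64_t start;
    uint64_t end;        // one past the last valid address
    bool executable;
};

struct Runtime {
    std::string objectFormat;   // e.g. "ELF"
    std::string architecture;   // e.g. "x86"
    uint64_t entryPoint;
    std::vector<AddressSpace> spaces;

    // Prefer the executable space containing the entry point; this skips
    // executable-but-uninteresting sections. Falls back to the first
    // executable space if the entry point matches none.
    const AddressSpace *getPrimaryExecutableAddressSpace() const {
        const AddressSpace *fallback = nullptr;
        for (const AddressSpace &s : spaces) {
            if (!s.executable)
                continue;
            if (entryPoint >= s.start && entryPoint < s.end)
                return &s;
            if (!fallback)
                fallback = &s;
        }
        return fallback;
    }
};
```

The point of the sketch is that the selection logic lives in the runtime, where format and architecture knowledge meet, rather than in any single object plugin.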
There are still several parts of the code that need improvement and will be hammered out as I move to support a more advanced architecture like x86. In particular, the semantics of reading data from within a memory section need to be refined. Should the encapsulated data object perform the mapping for the virtual address space itself? Say the address space actually starts at 0x08400000: should the first valid address on that data object be 0 or 0x08400000? One issue with the latter is that if an architecture does not provide byte-level addressing, a lot of functions may not make sense. It seems most if not all architectures base their addressing on bytes, not words, so this may not be an issue. Since the number of memory spaces should be fairly small, perhaps up to a thousand on a large Windows program with many DLL dependencies, it may not hurt to provide objects that map addresses both ways as a convenience. Each mapper object only consumes about 24 bytes (virtual function table pointer, target data pointer, start and end target addresses, start and end read addresses), which is relatively small.
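A minimal sketch of the "map both ways" convenience idea might look like the following. All names here are illustrative, not the real uvudec classes; the member list deliberately mirrors the ~24-byte estimate above (vtable pointer via the virtual method, backing data pointer, target start/end, read start/end). Two mappers over the same segment give a zero-based view and a load-address view:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical mapper: translates addresses in one "view" (read addresses)
// to offsets in the backing segment data (target addresses).
class DataMapper {
public:
    DataMapper(const std::vector<uint8_t> *data,
               uint64_t targetStart, uint64_t targetEnd,
               uint64_t readStart)
        : m_data(data),
          m_targetStart(targetStart), m_targetEnd(targetEnd),
          m_readStart(readStart),
          m_readEnd(readStart + (targetEnd - targetStart)) {}
    virtual ~DataMapper() {}

    // Read one byte at an address in this view
    virtual uint8_t readByte(uint64_t address) const {
        return (*m_data)[m_targetStart + (address - m_readStart)];
    }

    bool valid(uint64_t address) const {
        return address >= m_readStart && address <= m_readEnd;
    }

private:
    const std::vector<uint8_t> *m_data; // backing segment bytes
    uint64_t m_targetStart;             // segment offset within data
    uint64_t m_targetEnd;
    uint64_t m_readStart;               // base address of this view
    uint64_t m_readEnd;
};
```

Usage would be something like constructing one mapper with readStart = 0 and a second with readStart = 0x08400000 over the same data, so callers can pick whichever addressing convention suits them.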
Now that the core engine enhancements are done, I will work on the library recognition enhancements. There are several large action items. First, make the BFD-based output match FLAIR's output exactly; some advanced reference features, and my incomplete understanding of symbol references, are getting in the way of that. Second, decouple the FLIRT code from the BFD code. That way, anyone who can write an object module that finds relocations can generate .pat files. Finally, implement library recognition in the analysis engine. This is a lower priority since the generated .pat and .sig files can already be used in other programs, so they are useful on their own.
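The decoupling idea can be sketched as a generic pattern writer fed by whatever module reports relocations. This is only an illustration with hypothetical names, and it shows just the core step of a FLAIR-style pattern: masking relocated bytes with ".." wildcards, since their values are link-time dependent. A real .pat line carries additional fields (CRC, lengths, symbol names) omitted here:

```cpp
#include <cstdint>
#include <cstdio>
#include <set>
#include <string>
#include <vector>

// Build the leading hex pattern of a signature: each non-relocated byte
// becomes two uppercase hex digits, each relocated byte becomes "..".
// relocOffsets lists every byte offset covered by a relocation (a 4-byte
// x86 relocation would contribute 4 consecutive offsets).
std::string makePattern(const std::vector<uint8_t> &bytes,
                        const std::set<size_t> &relocOffsets,
                        size_t patternLen) {
    std::string out;
    for (size_t i = 0; i < patternLen && i < bytes.size(); i++) {
        if (relocOffsets.count(i)) {
            out += "..";  // value unknown until link time: wildcard it
        } else {
            char buf[3];
            std::snprintf(buf, sizeof(buf), "%02X", (unsigned)bytes[i]);
            out += buf;
        }
    }
    return out;
}
```

With this split, the BFD module's only job is to hand over bytes plus relocation offsets; the pattern writer never needs to know where they came from.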
I'm also still trying to figure out GUI issues. The short-term solution seems to be to base the scroll bar on address only and render just the area currently shown. The Metasm GUI is an example of software that does this. Consequences of this approach:
  • Much faster render time.
  • It is easier to compute the scroll bar position: changing the data analysis does not force us to recalculate our position. For a large program, and most are probably large enough, the user probably wouldn't notice the difference anyway given the huge number of elements at each position.
  • We must perform our own logic when an address is clicked instead of relying on tags. This was probably going to happen eventually anyway, in case we jump between address spaces.
  • To make scrolling line-based (which seems more intuitive to me), we will have to buffer some of the nearby rendering and feed it in line by line.
  • Some advanced text editor/viewer features may be lost (copying text across multiple pages, etc.).
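The address-only scroll bar boils down to a linear mapping between scrollbar units and the address range, sketched below with hypothetical names. Because the mapping depends only on the minimum and maximum addresses, analysis changes never invalidate the scroll position; only the visible window then needs to be rendered:

```cpp
#include <cstdint>

// Map a scrollbar position onto the address range (and back). Pure
// integer linear interpolation; no per-line bookkeeping required.
uint64_t scrollToAddress(int scrollPos, int scrollMax,
                         uint64_t minAddr, uint64_t maxAddr) {
    return minAddr +
           (maxAddr - minAddr) * (uint64_t)scrollPos / (uint64_t)scrollMax;
}

int addressToScroll(uint64_t addr, int scrollMax,
                    uint64_t minAddr, uint64_t maxAddr) {
    return (int)((addr - minAddr) * (uint64_t)scrollMax /
                 (maxAddr - minAddr));
}
```

A render loop would call scrollToAddress() to find the first visible address, disassemble forward until the viewport is filled, and buffer a few lines above and below to support smooth line-based scrolling.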
