For those not familiar with relocations, they are basically values that need to be fixed up later, once the real value is known. In an object file, for example, there are relocations for the addresses of global variables and functions not yet linked in. An ELF file also has a number of relocation-like dependencies in its core structure while you are building it, such as the positions of headers within the file. Since you can't fill in this information until the entire file is laid out, you must essentially do relocations on the file itself to patch up the addresses once they are known.
There are two common models for solving this issue: the fixup approach and the two-pass approach. In the fixup approach, you record all of the locations needing fixups and calculate them after placing all of the objects. In the two-pass approach, you iterate a second time over all of the data and have each piece fill in its missing data now that everything has been placed.
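As a rough illustration of the fixup approach, here is a minimal Python sketch. The class and method names (FixupBuffer, emit_ref, define, apply_fixups) and the byte layout are made up for this example and are not taken from the actual code; the placeholder is assumed to be a 32-bit little-endian offset.

```python
import struct


class FixupBuffer:
    """Hypothetical byte buffer that remembers which 32-bit fields need patching."""

    def __init__(self):
        self.data = bytearray()
        self.fixups = []   # list of (offset of placeholder, symbol name)
        self.symbols = {}  # symbol name -> final offset in the buffer

    def emit(self, payload):
        self.data += payload

    def emit_ref(self, symbol):
        # Record where the placeholder lives, then write four zero bytes.
        self.fixups.append((len(self.data), symbol))
        self.data += b"\x00" * 4

    def define(self, symbol):
        # The symbol's value is the current end of the buffer.
        self.symbols[symbol] = len(self.data)

    def apply_fixups(self):
        # Everything is placed now, so every recorded location can be patched.
        for offset, symbol in self.fixups:
            struct.pack_into("<I", self.data, offset, self.symbols[symbol])
        self.fixups.clear()


buf = FixupBuffer()
buf.emit(b"HDR!")
buf.emit_ref("section_table")  # position not known yet
buf.emit(b"payload bytes")
buf.define("section_table")    # now it is known
buf.emit(b"TABLE")
buf.apply_fixups()
```

Note that a fixup can only be resolved once its symbol has been defined, so apply_fixups() has to wait until the whole buffer is laid out.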
I decided somewhat arbitrarily to try the first approach in constructing the ELF object output binaries. For each file metadata location needing fixups, there is a powerful (i.e. wordy and slow) interface to fix up various locations in the file. I made some bad naming decisions and the like that caused some issues. The main killer was that construction order became an issue: relocations couldn't be issued until certain information was filled in (namely data needed to compute the value, such as a section that had not yet been assigned), which complicated the workings.
On the code cleanup list is replacing this old style with the alternative I was considering, the two-pass style. This would create all of the objects and link the data structures together to make coherent objects. Then, an updateValues() or similar would be called to have each object update all of the offsets and such that it needs before being written to a file. Finally, one would iterate over all of the parts of the ELF file and write it out section by section.
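A minimal Python sketch of what the two-pass style could look like follows. Everything here is hypothetical (the Blob and Header classes, build_image, the 8-byte header layout); update_values() just stands in for the updateValues() idea above and is not the real interface.

```python
import struct


class Blob:
    """Hypothetical file part holding fixed bytes."""

    def __init__(self, payload):
        self.payload = payload
        self.offset = None  # assigned during layout

    def size(self):
        return len(self.payload)

    def update_values(self):
        pass  # nothing depends on other parts

    def write(self):
        return self.payload


class Header:
    """Hypothetical header that stores the file offset of another part."""

    def __init__(self, target):
        self.target = target  # linked at construction, value unknown for now
        self.offset = None
        self.target_offset = None

    def size(self):
        return 8  # 4-byte tag + 4-byte little-endian offset

    def update_values(self):
        # Safe to read now: layout has already placed every part.
        self.target_offset = self.target.offset

    def write(self):
        return b"HDR!" + struct.pack("<I", self.target_offset)


def build_image(parts):
    position = 0
    for part in parts:  # lay everything out
        part.offset = position
        position += part.size()
    for part in parts:  # each part fills in its missing data
        part.update_values()
    return b"".join(part.write() for part in parts)  # write part by part


table = Blob(b"TABLE")
image = build_image([Header(table), Blob(b"payload bytes"), table])
```

The objects can be constructed in any order here, since nothing is computed until every part has an offset.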
All in all, I'm sure this approach will bring up some issues as well, and a hybrid approach may end up being used. Still, it will hopefully solve some of the headaches of the current implementation, reduce code, and speed up further improvements.
Thursday, December 17, 2009
Wednesday, November 11, 2009
Static database coming soon
The second milestone should be reached soon. Functions will be output to individual relocatable ELF files. Relocation 0'd code (that is, code with the relocated bytes zeroed out) will also be savable to aid static function analysis by rapidly finding previously known functions. Currently, the tool saves a raw binary of all the functions along with broken ELF files. I am working on fixing the broken ELF files; after that, outputting raw relocation 0'd binaries should be trivial. Not all forms of relocations will be detected at this point, and probably never will be. However, "obvious" global variables and such are the first goal and should be done in the near future.
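To show why relocation 0'd code helps with finding known functions, here is a small Python sketch. It assumes relocations are given as (offset, size) pairs and uses a SHA-256 hash as the lookup key; both are assumptions for illustration, as are the function names and the x86 example bytes.

```python
import hashlib


def zero_relocations(code, relocations):
    """Return a copy of code with each (offset, size) span overwritten by zeros."""
    masked = bytearray(code)
    for offset, size in relocations:
        masked[offset:offset + size] = bytes(size)
    return bytes(masked)


def signature(code, relocations):
    # Hash the masked bytes so link-time addresses do not affect the key.
    return hashlib.sha256(zero_relocations(code, relocations)).hexdigest()


# The same tiny function (x86 call rel32, then ret) linked at two
# different addresses, so only the 4-byte call displacement differs.
first = bytes.fromhex("e810000000c3")
second = bytes.fromhex("e8a0ffffffc3")
relocs = [(1, 4)]  # the displacement starts at byte 1 and is 4 bytes long

known = {signature(first, relocs): "hypothetical_function"}
match = known.get(signature(second, relocs))
```

Without the masking step the two copies would hash differently, even though they are the same function.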
Milestone three, targeted for February, will include basic static analysis of function flow in C form. Basic flow analysis is in fact already performed, but the result is discarded except for mining out function calls.
Sunday, November 1, 2009
Updates
The code repo was moved to rpisec, but I forgot to update it on the main RCOS page, so now it looks like there's no code checked in. The code is actually at git.rpisec.net/uvudec.git.
Progress is underway on better static analysis of binaries to produce object files. Recent progress has been on code cleanup and related work. With midterms out of the way, I have some time again and have been working towards making a big check-in for the changes. I should have branched the code and done some smaller commits with the other fixups, but I'm still working on getting comfortable with Git.
Alex committed Python API code, which should make the Python interpreter code much faster than using system() (although the system()-based version is still available). I haven't tested it yet, but I do know that at least the old Python code still works.
Sunday, September 27, 2009