When reversing embedded code, you will often find that completely different devices are built around a common code base, either because the vendor re-used code or because both rely on the same third-party software; this is especially true of devices running the same real-time operating system (RTOS).
For example, I have two different routers, manufactured by two different vendors, and released about four years apart. Both devices run VxWorks, but the firmware for the older device included a symbol table, making it trivial to identify most of the original function names:
The older device with the symbol table is running VxWorks 5.5, while the newer device (with no symbol table) runs VxWorks 5.5.1, so they are pretty close in terms of their OS version. However, even simple functions contain a very different sequence of instructions when compared between the two firmwares:
Of course, binary variations can be the result of any number of things, including differences in the compiler version and changes to the build options.
Despite this, it would still be quite useful to take the known symbol names from the older device, particularly those of standard and common subroutines, and apply them to the newer device in order to facilitate the reversing of higher level functionality.
With the FLIRT signatures, IDA was able to identify 164 functions, some of which, like os_memcpy and udp_cksum, are quite useful.
Of course, FLIRT signatures will only identify functions that start with the same sequence of instructions, and many of the standard POSIX functions, such as printf and strcmp, were not found.
Because FLIRT signatures are built mainly from the first 32 bytes of a function (supplemented by a CRC16 over a limited run of the bytes that follow), there are also many signature collisions between similar functions, which can be problematic:
;--------- (delete these lines to allow sigmake to read this file)
; add '+' at the start of a line to select a module
; add '-' if you are not sure about the selection
; do nothing if you want to exclude all modules

div_r 54 B8C8 00000000000000000085001A0000081214A00002002010210007000D2401FFFF
ldiv_r 54 B8C8 00000000000000000085001A0000081214A00002002010210007000D2401FFFF

proc_sname 00 0000 0000102127BDFEF803E0000827BD0108................................
proc_file 00 0000 0000102127BDFEF803E0000827BD0108................................

atoi 00 0000 000028250809F52A2406000A........................................
atol 00 0000 000028250809F52A2406000A........................................

PinChecksum FF 5EB5 00044080010440213C046B5F000840403484CA6B010400193C0ECCCC35CECCCD
wps_checksum1 FF 5EB5 00044080010440213C046B5F000840403484CA6B010400193C0ECCCC35CECCCD
wps_checksum2 FF 5EB5 00044080010440213C046B5F000840403484CA6B010400193C0ECCCC35CECCCD

_d_cmp FC 1FAF 0004CD02333907FF240F07FF172F000A0006CD023C18000F3718FFFF2419FFFF
_d_cmpe FC 1FAF 0004CD02333907FF240F07FF172F000A0006CD023C18000F3718FFFF2419FFFF

_f_cmp A0 C947 0004CDC2333900FF241800FF173800070005CDC23C19007F3739FFFF0099C824
_f_cmpe A0 C947 0004CDC2333900FF241800FF173800070005CDC23C19007F3739FFFF0099C824

m_get 00 0000 00803021000610423C04803D8C8494F0................................
m_gethdr 00 0000 00803021000610423C04803D8C8494F0................................
m_getclr 00 0000 00803021000610423C04803D8C8494F0................................

...
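To see why prefix-based patterns collide, here is a minimal Python sketch that groups functions by their leading 32 bytes. It is a simplification that ignores the CRC and length fields a real FLIRT pattern also carries, and it is not how sigmake itself is implemented. The atoi/atol bytes are the leading bytes from the listing above; `hypothetical_fn` and its bytes are made up for illustration.

```python
from collections import defaultdict

PREFIX_LEN = 32  # number of leading bytes used as the pattern in this sketch


def find_prefix_collisions(functions):
    """Return sorted groups of function names that share the same leading bytes."""
    groups = defaultdict(list)
    for name, code in functions.items():
        groups[code[:PREFIX_LEN]].append(name)
    # Only groups with more than one member are collisions.
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


functions = {
    # Leading bytes as shown in the collision listing.
    "atoi": bytes.fromhex("000028250809F52A2406000A"),
    "atol": bytes.fromhex("000028250809F52A2406000A"),
    # Hypothetical function, not taken from either firmware image.
    "hypothetical_fn": bytes.fromhex("27BDFFE8AFBF0010"),
}

collisions = find_prefix_collisions(functions)
print(collisions)
```

Any group this returns would need to be resolved by hand, just as the exclusion file asks you to do.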
Alternative Signature Approaches
Comparing the functions in the two VxWorks firmware images shows that a small fraction (about 3%) of unique subroutines are identical in both:
Signatures can be created over the entirety of these functions in order to generate more accurate fingerprints, without the possibility of collisions due to similar or identical function prologues in unrelated subroutines.
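A rough sketch of that idea in Python, assuming the function bytes have already been extracted from each image into dictionaries. The hash choice (SHA-256), the helper names, the sample bytes, and the addresses are all my own illustrative assumptions, not part of any IDA API.

```python
import hashlib


def exact_signature(code: bytes) -> str:
    """Fingerprint the entire function body, not just its prologue."""
    return hashlib.sha256(code).hexdigest()


def build_signatures(named_functions):
    """Map digest -> name, keeping only digests owned by a single name."""
    by_digest = {}
    for name, code in named_functions.items():
        by_digest.setdefault(exact_signature(code), []).append(name)
    return {d: names[0] for d, names in by_digest.items() if len(names) == 1}


def apply_signatures(signatures, unnamed_functions):
    """Return {address: name} where exactly one unnamed function matches."""
    hits = {}
    for addr, code in unnamed_functions.items():
        name = signatures.get(exact_signature(code))
        if name is not None:
            hits.setdefault(name, []).append(addr)
    return {addrs[0]: name for name, addrs in hits.items() if len(addrs) == 1}


# Hypothetical data: names, bytes, and addresses are illustrative only.
known = {
    "alpha": bytes.fromhex("27BDFFE803E0000827BD0018"),
    "beta": bytes.fromhex("0000102103E0000800000000"),
    "beta_alias": bytes.fromhex("0000102103E0000800000000"),  # duplicate body
}
unnamed = {
    0x80001000: bytes.fromhex("27BDFFE803E0000827BD0018"),
    0x80002000: bytes.fromhex("0000102103E0000800000000"),
    0x80003000: bytes.fromhex("2402000103E0000800000000"),
}

signatures = build_signatures(known)
renames = apply_signatures(signatures, unnamed)
print(renames)
```

Dropping any digest that belongs to more than one function, on either side, is what keeps identical stubs from being mislabeled.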
Still other functions are very nearly identical, as exemplified by the following functions which only differ by a couple of instructions:
A simple way to identify these similar, but not identical, functions in an architecture independent manner is to generate “fuzzy” signatures based only on easily identifiable actions, such as memory accesses, references to constant values, and function calls.
In the above function for example, we can see that there are six code blocks, one which references the immediate value 0xFFFFFFFF, one which has a single function call, and one which contains two function calls. As long as no other functions match this “fuzzy” signature, we can use these unique metrics to identify this same function in other IDBs. Although this type of matching can catch functions that would otherwise go unidentified, it also has a higher propensity for false positives.
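The kind of "fuzzy" signature described here might look something like the following Python sketch. It assumes each function has already been reduced to a list of basic blocks, each block being a list of `(kind, value)` events; that representation, the helper names, and the sample data are hypothetical. Only calls and immediate values are counted here; memory accesses could be added to the per-block summary in the same way.

```python
def fuzzy_signature(blocks):
    """Summarize a function as a sorted tuple of per-block (call count, immediates)."""
    per_block = []
    for block in blocks:
        calls = sum(1 for kind, _ in block if kind == "call")
        imms = tuple(sorted(v for kind, v in block if kind == "imm"))
        per_block.append((calls, imms))
    # Sorting makes the signature independent of block layout order.
    return tuple(sorted(per_block))


def unique_index(funcs):
    """Map signature -> function key, keeping only signatures seen exactly once."""
    seen = {}
    for key, blocks in funcs.items():
        seen.setdefault(fuzzy_signature(blocks), []).append(key)
    return {sig: keys[0] for sig, keys in seen.items() if len(keys) == 1}


def match_fuzzy(known, unknown):
    """Return {unknown_key: known_name} for signatures unique on both sides."""
    known_idx = unique_index(known)
    unknown_idx = unique_index(unknown)
    return {unknown_idx[s]: n for s, n in known_idx.items() if s in unknown_idx}


# Six blocks: one with 0xFFFFFFFF, one with one call, one with two calls.
old_blocks = [[], [("imm", 0xFFFFFFFF)], [("call", None)],
              [("call", None), ("call", None)], [], []]
# Same shape, different block order, plus events this sketch ignores.
new_blocks = [[("mem", None)], [("call", None), ("mem", None), ("call", None)],
              [], [("call", None)], [("imm", 0xFFFFFFFF)], []]

known = {"target_fn": old_blocks, "other_fn": [[("call", None)]]}
unknown = {0x80010000: new_blocks, 0x80020000: [[], []]}

matches = match_fuzzy(known, unknown)
print(matches)
```

Because so much detail is thrown away, insisting that a signature be unique in both databases is the main guard against the false positives mentioned above.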
A somewhat more reliable metric is a unique string reference, such as this one in gethostbyname:
Likewise, unique constants can also be used for function identification, particularly subroutines related to crypto or hashing:
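As a rough illustration of matching on unique strings and constants, the Python sketch below pairs functions that are the sole users of the same reference in each image. The input dictionaries, the string value, the `md5_init` name, and the addresses are hypothetical; the two integer constants are the first two MD5 initialization values, which other hash functions such as SHA-1 also use, so real data would need more than these to tell them apart.

```python
def unique_refs(func_refs):
    """Return {ref: function} for references used by exactly one function."""
    owners = {}
    for func, refs in func_refs.items():
        for ref in set(refs):
            owners.setdefault(ref, set()).add(func)
    return {ref: next(iter(f)) for ref, f in owners.items() if len(f) == 1}


def match_by_refs(known_refs, unknown_refs):
    """Return {unknown_addr: known_name} based on shared unique references."""
    known_unique = unique_refs(known_refs)
    unknown_unique = unique_refs(unknown_refs)
    votes = {}
    for ref, name in known_unique.items():
        if ref in unknown_unique:
            votes.setdefault(unknown_unique[ref], set()).add(name)
    # Keep only addresses where every shared reference agrees on one name.
    return {a: next(iter(n)) for a, n in votes.items() if len(n) == 1}


# Hypothetical reference tables (strings and integer constants per function).
known_refs = {
    "gethostbyname": ["hypothetical unique resolver string"],
    "md5_init": [0x67452301, 0xEFCDAB89],
    "printf": ["%s"],
    "sprintf": ["%s"],  # "%s" is shared, so it identifies nothing
}
unknown_refs = {
    0x80100000: ["hypothetical unique resolver string"],
    0x80100400: [0x67452301, 0xEFCDAB89],
    0x80100800: ["%s"],
    0x80100C00: ["%s"],
}

matches = match_by_refs(known_refs, unknown_refs)
print(matches)
```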
Even identifying functions whose names we don’t know can be useful. Consider the following code snippet in sub_801A50E0, from the VxWorks 5.5 firmware:
This unidentified function calls memset, strcpy, atoi, and sprintf; hence, if we can find this same function in other VxWorks firmware, we can identify these standard functions by association.
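One way to sketch this "identification by association" in Python is to line up the ordered callee lists of a function that has already been matched in both images. The call order shown for sub_801A50E0, the target addresses, and the helper name are assumptions made for illustration; a real plugin would need stronger checks than a simple positional comparison.

```python
def propagate_by_calls(known_callees, unknown_callees, matches):
    """Infer callee names from already-matched callers.

    known_callees:   {known_func: [callee names, in call order]}
    unknown_callees: {unknown_addr: [callee addresses, in call order]}
    matches:         {unknown_addr: known_func} for callers matched so far
    """
    inferred = {}
    for u_func, k_func in matches.items():
        k_calls = known_callees.get(k_func, [])
        u_calls = unknown_callees.get(u_func, [])
        if len(k_calls) != len(u_calls):
            continue  # call sequences differ; do not guess
        for k_callee, u_callee in zip(k_calls, u_calls):
            if inferred.setdefault(u_callee, k_callee) != k_callee:
                inferred[u_callee] = None  # conflicting votes, discard
    return {addr: name for addr, name in inferred.items() if name is not None}


# Callees named in the text; the order is assumed for this example.
known_callees = {"sub_801A50E0": ["memset", "strcpy", "atoi", "sprintf"]}
# Hypothetical addresses in the firmware without symbols.
unknown_callees = {0x80234000: [0x80300000, 0x80300100, 0x80300200, 0x80300300]}
matches = {0x80234000: "sub_801A50E0"}

names = propagate_by_calls(known_callees, unknown_callees, matches)
print(names)
```

Discarding conflicting votes rather than picking a winner trades coverage for fewer wrong names.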
Alternative Signatures in Practice
I wrote an IDA plugin to automate these signature techniques and apply them to the VxWorks 5.5.1 firmware:
This identified nearly 1,300 functions, and although some of those are probably incorrect, it was quite successful in locating many standard POSIX functions:
Like any such automated process, this is sure to produce some false positives/negatives, but having used it successfully against several RTOS firmwares now, I’m quite happy with it (read: “it works for me”!).