info:c_memory_structure
Differences
This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
info:c_memory_structure [2012/10/15 02:52] – created moritz | info:c_memory_structure [2012/10/16 15:23] (current) – [Using the debug info dump] moritz | ||
---|---|---|---|
Line 1: | Line 1: | ||
====== Determining C memory layout ====== | ====== Determining C memory layout ====== | ||
- | A while ago I had to write a Python program that generated C structures as C expects to find them in memory. The challenge was that the target system had a different alignment, endianness and word size than the computer running the Python scripts. In this article I am referring to Python, but the result can be transferred to any other programming language. | + | A while ago I had to write a Python program that generated C structures as C expects to find them in memory. The challenge was that the target system had a different alignment, endianness and word size than the computer running the Python scripts. This is especially important when developing for multiple targets, like i386 and AMD64, different ARM platforms and MIPS. In this article I am referring to Python, but the result can be transferred to any other programming language. |
===== Goals ===== | ===== Goals ===== | ||
- | The primary goal of this effort was to represent C structures as Python objects in order to set and get fields. To access the data structures remotely I implemented a special program that I am not going to cover here. The mapping should be automatic and correct. | + | The primary goal of this effort was to represent C structures as Python objects in order to set and get fields. To access the data structures remotely I implemented a special program that I am not going to cover here. The mapping should be automatic and correct. The C header files already exist, and changing them is not an option. Also, some structures are marked to be packed using ''# |
+ | |||
+ | |||
+ | For example, I have the following C structure: | ||
+ | <code c> | ||
+ | typedef struct atype_t { | ||
+ | int i; | ||
+ | char c; | ||
+ | short s; | ||
+ | } atype_t; | ||
+ | </ | ||
+ | |||
+ | On 32-bit machines, it is assumed that an '' | ||
+ | |||
+ | On the Python side I want to be able to use the following code: | ||
+ | <code Python> | ||
+ | atype = atype_t() | ||
+ | atype.i = 1 | ||
+ | atype.c = 4 | ||
+ | atype.s = 8 | ||
+ | byteArray = atype.bytes() | ||
+ | </ | ||
+ | |||
+ | For completeness, | ||
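The Python interface above could be backed by the standard `struct` module. The following is only a minimal sketch under stated assumptions, not the implementation described in this article: the class name `AType` and the `LAYOUT` string are hypothetical, and the layout assumes a little-endian target with a 4-byte `int`, a 2-byte `short` and one padding byte between `c` and `s`.

```python
import struct


class AType:
    """Hypothetical Python mirror of atype_t for ONE assumed target layout."""

    # "<" selects little-endian byte order and disables implicit padding,
    # so the padding byte after the char is spelled out with "x".
    # i = 4-byte int, b = 1-byte signed char, h = 2-byte short.
    LAYOUT = "<ibxh"

    def __init__(self):
        self.i = 0
        self.c = 0
        self.s = 0

    def bytes(self):
        # Serialize the fields exactly as the assumed target expects them.
        return struct.pack(self.LAYOUT, self.i, self.c, self.s)


atype = AType()
atype.i = 1
atype.c = 4
atype.s = 8
byte_array = atype.bytes()
print(byte_array.hex())  # prints 0100000004000800
```

The layout string is exactly the piece of information that differs per target, which is why the rest of the article looks for a reliable source for it.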
===== Approaches ===== | ===== Approaches ===== | ||
Line 15: | Line 38: | ||
The first one was to parse the C header files for typedef/ | The first one was to parse the C header files for typedef/ | ||
- | * Information about # | + | * Information about '' |
* Alignment on the target machine can only be guessed. It would be possible to implement an algorithm that generates an alignment for structures but there is no guarantee that the compiler produces the same alignment. | * Alignment on the target machine can only be guessed. It would be possible to implement an algorithm that generates an alignment for structures but there is no guarantee that the compiler produces the same alignment. | ||
* Overhead: I am only interested in the structures, not in the function AST. | * Overhead: I am only interested in the structures, not in the function AST. | ||
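To make the alignment point concrete, here is a sketch of the kind of guessing algorithm mentioned in the list above. It applies the common "natural alignment" rule (each field starts at a multiple of its alignment, the total size is rounded up to the largest alignment). The function name `guess_layout` and the `pack` parameter, which imitates a packing limit, are assumptions for illustration. As stated above, nothing guarantees that a given compiler follows this rule.

```python
def guess_layout(fields, pack=None):
    """Guess member offsets using the natural-alignment rule.

    fields: list of (name, size, alignment) tuples in declaration order.
    pack:   optional upper bound on the alignment (imitates packing).
    Returns (list of (name, offset, size), total_size).
    """
    offset = 0
    max_align = 1
    layout = []
    for name, size, align in fields:
        if pack is not None:
            align = min(align, pack)
        # Round the current offset up to the next multiple of align.
        offset = (offset + align - 1) // align * align
        layout.append((name, offset, size))
        offset += size
        max_align = max(max_align, align)
    # Tail padding: the struct size is a multiple of its largest alignment.
    total = (offset + max_align - 1) // max_align * max_align
    return layout, total


# Sizes and alignments assumed for a typical 32-bit target.
fields = [("i", 4, 4), ("c", 1, 1), ("s", 2, 2)]
print(guess_layout(fields))          # ([('i', 0, 4), ('c', 4, 1), ('s', 6, 2)], 8)
print(guess_layout(fields, pack=1))  # ([('i', 0, 4), ('c', 4, 1), ('s', 5, 2)], 7)
```

The sizes and alignments fed into the function are themselves guesses about the target, which is the weak spot of this approach.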
Line 22: | Line 45: | ||
There exist several tools to generate bindings for C. However, all of them target the machine type they run on; there is no option to set endianness, word size and alignment. An example is [[http:// | There exist several tools to generate bindings for C. However, all of them target the machine type they run on; there is no option to set endianness, word size and alignment. An example is [[http:// | ||
+ | |||
+ | ==== Interfacing to the compiler ==== | ||
+ | |||
+ | The only program that really knows the structure layout in memory is the compiler, because it decides that layout. Hence, I started investigating whether there was a way to extract this information from the compiler. | ||
+ | |||
+ | ===== The missing piece: .debug_info section ===== | ||
+ | |||
+ | C compilers can store additional information in separate sections of the compiled output. For ELF files, there is an optional '' | ||
+ | |||
+ | Different compilers come with different tools to access the debug section content. For GCC, there is objdump; for IAR, there is ielfdump. Both can print the debug section in a human-readable form. However, the structure is not documented and requires the programmer to reverse engineer it. (For GCC it is open-source, | ||
+ | |||
+ | |||
+ | ==== Example program ==== | ||
+ | |||
+ | Throughout the article, we will be using the following sample program saved as '' | ||
+ | |||
+ | <code C> | ||
+ | #include < | ||
+ | |||
+ | typedef struct atype_t { | ||
+ | unsigned int i; | ||
+ | unsigned short s; | ||
+ | char c; | ||
+ | } atype_t; | ||
+ | |||
+ | int main(int argc, char **argv) { | ||
+ | atype_t atype; | ||
+ | int i; | ||
+ | atype.i = 0x12345678; | ||
+ | atype.s = 0x90AB; | ||
+ | atype.c = 0xCD; | ||
+ | | ||
+ | const unsigned char *c = (const unsigned char*) &atype; | ||
+ | for (i = 0; i < sizeof(atype); | ||
+ | printf(" | ||
+ | } | ||
+ | printf(" | ||
+ | return 0; | ||
+ | } | ||
+ | </ | ||
+ | |||
+ | Running it on my machine produces the following output: | ||
+ | < | ||
+ | My machine is obviously a [[http:// | ||
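For comparison, the bytes one would expect from the sample structure can be computed in Python for both byte orders. This is a sketch under assumptions: a 4-byte `unsigned int`, a 2-byte `unsigned short`, and one trailing padding byte. In the C program that padding byte is uninitialized and may hold any value; here it is simply zero. The helper name `expected_bytes` is hypothetical.

```python
import struct


def expected_bytes(byte_order):
    """Pack the sample values for the given struct byte-order prefix.

    I = 4-byte unsigned int, H = 2-byte unsigned short,
    B = 1-byte char, x = one trailing padding byte (zero here).
    """
    return struct.pack(byte_order + "IHBx", 0x12345678, 0x90AB, 0xCD)


little = expected_bytes("<")
big = expected_bytes(">")
print(little.hex())  # prints 78563412ab90cd00
print(big.hex())     # prints 1234567890abcd00
```

Comparing the first four bytes of the program output with these two lines is enough to tell the byte order of the machine that ran it.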
+ | |||
+ | ==== Obtaining the debug section data ==== | ||
+ | |||
+ | In order to examine the debug section, the program first needs to be compiled with debug symbols. When using GCC, this can be done by adding the '' | ||
+ | |||
+ | Next, we use the '' | ||
+ | < | ||
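Since the goal is to consume the dump from Python, the call can also be scripted. The sketch below assumes GNU `objdump` from binutils is on the `PATH` and uses its `--dwarf=info` option; the helper names and the `objdump` parameter are assumptions for illustration. For a cross-compiled binary, the objdump that ships with the cross toolchain would be passed instead.

```python
import shutil
import subprocess


def build_objdump_command(binary_path, objdump="objdump"):
    """Return the command line that prints the .debug_info section."""
    return [objdump, "--dwarf=info", binary_path]


def dump_debug_info(binary_path, objdump="objdump"):
    """Run objdump and return its textual output.

    Raises RuntimeError if the tool is missing and CalledProcessError
    if objdump itself reports an error.
    """
    if shutil.which(objdump) is None:
        raise RuntimeError(objdump + " not found on PATH")
    result = subprocess.run(
        build_objdump_command(binary_path, objdump),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Usage (the binary name is a placeholder):
# text = dump_debug_info("sample")
```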
+ | |||
+ | Here is the output on my machine: | ||
+ | |||
+ | < | ||
+ | |||
+ | Contents of the .debug_info section: | ||
+ | |||
+ | Compilation Unit @ offset 0x0: | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | (entity and attribute lines of the dump, truncated in the source) | ||
+ | </ | ||
+ | |||
+ | This is a lot of output. Let's explain the different parts. | ||
+ | |||
+ | Each line represents either the start of a new entity or adds an attribute to one. All entities and attributes have a unique address, encoded in hex. Each entity has a type, which determines the attributes it has. The entity line consists of a nesting level, the unique address followed by the entity' | ||
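The line structure described above maps well onto two regular expressions. The following sketch assumes the line shapes that GNU objdump typically prints for `--dwarf=info`; the sample text is illustrative and is not the output shown in this article. The function name `parse_dump` and the dictionary keys are hypothetical.

```python
import re

# Entity line, e.g. " <1><2d>: Abbrev Number: 2 (DW_TAG_base_type)"
ENTITY_RE = re.compile(
    r"^\s*<(\d+)><([0-9a-f]+)>: Abbrev Number: \d+ \((DW_TAG_\w+)\)"
)
# Attribute line, e.g. "    <2e>   DW_AT_byte_size   : 4"
ATTR_RE = re.compile(r"^\s*<([0-9a-f]+)>\s+(DW_AT_\w+)\s*:\s*(.*)$")


def parse_dump(text):
    """Turn dump text into a flat list of entities with their attributes."""
    entities = []
    for line in text.splitlines():
        match = ENTITY_RE.match(line)
        if match:
            entities.append({
                "level": int(match.group(1)),        # nesting level
                "address": int(match.group(2), 16),  # hex address
                "tag": match.group(3),               # entity type
                "attrs": {},
            })
            continue
        match = ATTR_RE.match(line)
        if match and entities:
            # Attributes belong to the most recently started entity.
            entities[-1]["attrs"][match.group(2)] = match.group(3).strip()
    return entities


# Illustrative input only.
SAMPLE = """\
 <1><2d>: Abbrev Number: 2 (DW_TAG_base_type)
    <2e>   DW_AT_byte_size   : 4
    <2f>   DW_AT_encoding    : 7 (unsigned)
    <30>   DW_AT_name        : unsigned int
"""

for entity in parse_dump(SAMPLE):
    print(entity["level"], hex(entity["address"]), entity["tag"], entity["attrs"])
```

Lines that match neither pattern, such as section headers, are skipped silently, which keeps the sketch tolerant of small format differences.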
+ | |||
+ | ==== Finding the size information ==== | ||
+ | |||
+ | |||
+ | First, it tells us what format the file has. In this case it is a '' | ||
+ | |||
+ | Next, it shows the contents of the debug info section. There can be several compilation units; here there is just one. It tells us that pointers indeed have a length of 8 bytes. The total length of the debug info section is 0x12c. | ||
+ | |||
+ | The first compilation unit is marked with ''< | ||
+ | |||
+ | The first children of the compile unit are '' | ||
+ | |||
+ | Skipping to address 0x79 it gets more interesting: | ||
+ | < | ||
+ | It shows that our structure occupies 8 bytes in memory, which we can verify using the output of our sample program. Nested under the structure type, we find several '' | ||
+ | |||
+ | < | ||
+ | Here, '' | ||
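Collecting the structure size and the member offsets can be sketched as follows. The code assumes GNU objdump-style lines and accepts two spellings of the member location that objdump versions are known to print: a plain number, or a block containing `DW_OP_plus_uconst`. The sample text is illustrative, with made-up addresses and abbreviation numbers, and all function names are hypothetical.

```python
import re

TAG_RE = re.compile(r"<(\d+)><[0-9a-f]+>: Abbrev Number: \d+ \((DW_TAG_\w+)\)")
ATTR_RE = re.compile(r"<[0-9a-f]+>\s+(DW_AT_\w+)\s*:\s*(.*)")


def member_offset(value):
    """Extract the byte offset from a DW_AT_data_member_location value."""
    match = re.search(r"DW_OP_plus_uconst:\s*(\d+)", value)
    if match:
        return int(match.group(1))
    return int(value.split()[0], 0)


def struct_layouts(text):
    structs = []
    current = None  # structure currently being read
    target = None   # entity that receives the next attribute lines
    for line in text.splitlines():
        tag = TAG_RE.search(line)
        if tag:
            level, name = int(tag.group(1)), tag.group(2)
            if name == "DW_TAG_structure_type":
                current = {"name": None, "size": None, "level": level, "members": []}
                structs.append(current)
                target = current
            elif (name == "DW_TAG_member" and current is not None
                  and level == current["level"] + 1):
                target = {"name": None, "offset": None}
                current["members"].append(target)
            else:
                # Any other entity ends attribute collection; a sibling or
                # parent level also ends the current structure.
                if current is not None and level <= current["level"]:
                    current = None
                target = None
            continue
        attr = ATTR_RE.search(line)
        if attr is None or target is None:
            continue
        key, value = attr.group(1), attr.group(2).strip()
        if key == "DW_AT_name":
            # Drop an "(indirect string, offset: ...): " prefix if present.
            target["name"] = value.split(": ")[-1]
        elif key == "DW_AT_byte_size" and target is current:
            target["size"] = int(value.split()[0], 0)
        elif key == "DW_AT_data_member_location" and target is not current:
            target["offset"] = member_offset(value)
    return structs


# Illustrative input only.
SAMPLE = """\
 <1><79>: Abbrev Number: 7 (DW_TAG_structure_type)
    <7a>   DW_AT_name        : (indirect string, offset: 0x5b): atype_t
    <7e>   DW_AT_byte_size   : 8
 <2><86>: Abbrev Number: 8 (DW_TAG_member)
    <87>   DW_AT_name        : i
    <8b>   DW_AT_data_member_location: 2 byte block: 23 0 (DW_OP_plus_uconst: 0)
 <2><91>: Abbrev Number: 8 (DW_TAG_member)
    <92>   DW_AT_name        : s
    <96>   DW_AT_data_member_location: 4
 <2><9c>: Abbrev Number: 8 (DW_TAG_member)
    <9d>   DW_AT_name        : c
    <a1>   DW_AT_data_member_location: 6
"""

for s in struct_layouts(SAMPLE):
    print(s["name"], s["size"], [(m["name"], m["offset"]) for m in s["members"]])
```

Together with the sizes of the referenced base types, the offsets and the total size are what a Python-side packer needs in order to place each field.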
+ | |||
+ | |||
+ | ==== Problems ==== | ||
+ | |||
+ | This approach has the downside that it requires a certain format of the debug dump. Also, it does not specify the endianness of the types. | ||
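One possible way around the missing endianness, offered here as a suggestion rather than as part of the approach above, is to read it from the ELF header itself: byte 4 of the file (`EI_CLASS`) encodes the word size and byte 5 (`EI_DATA`) the byte order. The function names are hypothetical.

```python
def elf_word_size_and_byte_order(ident):
    """Decode EI_CLASS and EI_DATA from the first bytes of an ELF file."""
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    word_size = {1: 32, 2: 64}[ident[4]]             # EI_CLASS
    byte_order = {1: "little", 2: "big"}[ident[5]]   # EI_DATA
    return word_size, byte_order


def read_elf_ident(path):
    """Read the 16-byte identification block at the start of the file."""
    with open(path, "rb") as handle:
        return handle.read(16)


# Hand-built identification block: 64-bit, little-endian.
fake_ident = b"\x7fELF\x02\x01\x01" + bytes(9)
print(elf_word_size_and_byte_order(fake_ident))  # prints (64, 'little')
```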
+ | |||
+ | ===== IAR ===== | ||
+ | |||
+ | The same can be done with the tools that come with the IAR C/C++ compiler. The ELF dump utility is called '' | ||
+ | |||
+ | |||
info/c_memory_structure.1350269551.txt.gz · Last modified: 2012/10/15 02:52 by moritz