Determining C memory layout

A while ago I had to write a Python program that generated C structures exactly as C expects to find them in memory. The challenge was that the target system had a different alignment, endianness and word size than the computer running the Python scripts. In this article I refer to Python, but the approach can be transferred to any other programming language.


The primary goal of this effort was to represent C structures as Python objects in order to set and get fields. To access the data structures remotely I implemented a special program that I am not going to cover here. The mapping should be automatic and correct. The C header files already exist, and changing them is not an option. Also, some structures are marked as packed using #pragma directives.

For example, I have the following C structure:

typedef struct atype_t {
    int i;
    char c;
    short s;
} atype_t;

On 32-bit machines, it is commonly assumed that an int has a length of 32 bits, a char 8 bits and a short 16 bits. However, these are just assumptions and will not hold in general: the actual sizes depend on the compiler and the target. As most computers today have a word size of 64 bits, such assumptions cannot be relied upon.

On the Python side I want to be able to use the following code:

atype = atype_t()
atype.i = 1
atype.c = 4
atype.s = 8
byteArray = atype.bytes()

For completeness, I also want to convert a given byte array into a Python structure object.
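
A minimal sketch of what such a class could look like for one fixed, hand-written layout. The class name AType, the method from_bytes and the format string are assumptions made for illustration: the target is assumed to be little-endian with a 32-bit int, an 8-bit char, a 16-bit short and one padding byte between c and s. The point of the rest of the article is to obtain this layout automatically instead of writing it by hand.

```python
import struct


class AType:
    """Hypothetical Python mirror of atype_t for one assumed target layout."""

    # Assumed target: little-endian, 32-bit int, 8-bit char, 16-bit short,
    # and one padding byte ("x") between c and s.
    _FORMAT = "<ibxh"

    def __init__(self, i=0, c=0, s=0):
        self.i = i
        self.c = c
        self.s = s

    def bytes(self):
        """Serialize the fields into the assumed in-memory representation."""
        return struct.pack(self._FORMAT, self.i, self.c, self.s)

    @classmethod
    def from_bytes(cls, data):
        """Rebuild an object from a byte string in the same layout."""
        return cls(*struct.unpack(cls._FORMAT, data))


atype = AType()
atype.i = 1
atype.c = 4
atype.s = 8
byte_array = atype.bytes()
print(byte_array.hex(" "))  # 01 00 00 00 04 00 08 00
```

The reverse direction is then AType.from_bytes(byte_array), which yields an object with the same field values.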


As this seems to be a common problem, I evaluated different options to solve it.

Parsing C

The first one was to parse the C header files for typedef/struct elements in order to extract the relevant information. However, parsing C is non-trivial because it requires running the C preprocessor first. Only afterwards is the result correct C code that can be handled by a C parser. For Python there exists pycparser, which aims to provide a complete AST for C. While this approach works in theory, it has several drawbacks.

  • Information about #defines is lost.
  • Alignment on the target machine can only be guessed. It would be possible to implement an algorithm that generates an alignment for structures but there is no guarantee that the compiler produces the same alignment.
  • Overhead: I am only interested in the structures, not in the function AST.
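
To illustrate the second point, here is a sketch of such a guessing algorithm. It applies the common "natural alignment" rule (each field is aligned to its own alignment, the total size is rounded up to the largest alignment) and an optional cap that mimics #pragma pack. The function name guess_layout and the sizes passed in are assumptions; nothing guarantees that a given compiler follows exactly these rules.

```python
def guess_layout(fields, pack=None):
    """Guess field offsets using natural alignment.

    fields: list of (name, size, alignment) tuples in declaration order.
    pack:   optional upper bound on alignment, mimicking #pragma pack(n).
    Returns a tuple (offsets, total_size).
    """
    offsets = {}
    offset = 0
    max_align = 1
    for name, size, align in fields:
        if pack is not None:
            align = min(align, pack)
        # Round the offset up to the next multiple of the alignment.
        offset = (offset + align - 1) // align * align
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    # Pad the tail so that every element of an array of structs stays aligned.
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total


# Assumed target: 4-byte int, 1-byte char, 2-byte short, each aligned to its size.
fields = [("i", 4, 4), ("c", 1, 1), ("s", 2, 2)]
print(guess_layout(fields))          # ({'i': 0, 'c': 4, 's': 6}, 8)
print(guess_layout(fields, pack=1))  # ({'i': 0, 'c': 4, 's': 5}, 7)
```

The sizes and alignments still have to come from somewhere, which is exactly the information that is not available from the header files alone.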

Generate bindings

There exist several tools to generate bindings for C. However, all of them are targeted to be run on the same machine type; there is no option to set endianness, word size and alignment. An example is ctypes; many others are listed on the Python wiki.
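
A short sketch of the host dependence, using ctypes: the basic types c_int, c_short, c_long and so on take their sizes from the machine running the script, so the numbers printed here describe the host and not a separate target. The class name AType is chosen for illustration.

```python
import ctypes


class AType(ctypes.Structure):
    # The field types are the host's int, char and short, not the target's.
    _fields_ = [
        ("i", ctypes.c_int),
        ("c", ctypes.c_char),
        ("s", ctypes.c_short),
    ]


# Offsets and sizes below describe the machine running this script.
layout = {name: getattr(AType, name).offset for name, _ in AType._fields_}
print("offsets:", layout)
print("sizeof(atype_t):", ctypes.sizeof(AType))
print("sizeof(long):", ctypes.sizeof(ctypes.c_long))
print("sizeof(void *):", ctypes.sizeof(ctypes.c_void_p))
```

Running the same script on a 32-bit and on a 64-bit host will typically print different values for the last two lines.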

Interfacing to the compiler

The only program that really knows the structure layout in memory is the compiler, because it decides on the in-memory layout. Hence, I started investigating whether there was a way to extract this information from the compiler.

The missing piece: .debug_section

C compilers can store additional information to the compiled output in program sections. For ELF files, there is an optional .debug_section that contains compiler-specific data. It has no standardized structure. Internally it is used to help debugging a program, for example to determine what variable is at what memory location etc. It also stores debug information about the C structures, which is what I am looking for.

Different compilers come with different tools to access the debug section content. For GCC there is objdump, for IAR there is ielfdump. Both can print the debug section in a human-readable form. However, the structure is not documented and requires the programmer to reverse engineer it. (GCC is open source, which is not the case for the IAR C compiler.)

Example program

Throughout the article, we will be using the following sample program:

#include <stdio.h>
typedef struct atype_t {
    unsigned int i;
    unsigned short s;
    char c;
} atype_t;
int main(void) {
    atype_t atype;
    size_t i;
    atype.i = 0x12345678;
    atype.s = 0x90AB;
    atype.c = (char) 0xCD;
    /* View the structure as raw bytes, padding included. */
    const unsigned char *c = (const unsigned char*) &atype;
    for (i = 0; i < sizeof(atype); i++) {
        printf("%02x ", c[i]);
    }
    printf("\n");
    return 0;
}
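
As a rough cross-check on the Python side, the bytes this sample program prints can be predicted for an assumed layout. The sketch below assumes a 4-byte unsigned int, a 2-byte unsigned short, a 1-byte char and one trailing padding byte, so that sizeof(atype_t) is 8. The padding byte is written as 00 here, whereas the C program may print any value for it because the structure is not initialized.

```python
import struct

# Assumed layout: 4-byte unsigned int, 2-byte unsigned short, 1-byte char,
# followed by one padding byte so that the total size is 8.
VALUES = (0x12345678, 0x90AB, 0xCD)

little = struct.pack("<IHBx", *VALUES)
big = struct.pack(">IHBx", *VALUES)

print("little-endian:", little.hex(" "))  # 78 56 34 12 ab 90 cd 00
print("big-endian:   ", big.hex(" "))     # 12 34 56 78 90 ab cd 00
```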
info/c_memory_structure.1350271552.txt.gz · Last modified: 2012/10/15 05:25 by moritz