

Determining C memory layout

A while ago I had to write a Python program that generated C structures exactly as C expects to find them in memory. The challenge was that the target system had a different alignment, endianness and word size than the computer running the Python scripts. In this article I refer to Python, but the approach carries over to any other programming language.

Goals

The primary goal of this effort was to represent C structures as Python objects in order to set and get their fields. To access the data structures remotely I implemented a separate program that I am not going to cover here. The mapping should be automatic and correct.

For example, I have the following C structure:

typedef struct atype_t {
    int i;
    char c;
    short s;
} atype_t;

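On 32-bit machines, it is commonly assumed that an int is 32 bits wide, a char 8 bits and a short 16 bits. However, these are just assumptions and do not hold in general: the C standard only guarantees minimum ranges, and the actual sizes depend on the compiler and the target. Moving to a 64-bit word size changes some of them; on most 64-bit systems int stays at 32 bits while pointers, and on many platforms long, grow to 64 bits.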
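These assumptions can at least be made visible. The following sketch lists the widths of three common C data models (ILP32, LP64, LLP64; widely used conventions, not something the C standard mandates) and uses ctypes to guess which model the host running Python follows. DATA_MODELS and host_data_model are hypothetical names chosen for illustration. The target may well follow a different model than the host, which is exactly the problem described here.

```python
import ctypes
from typing import Dict, Optional

# Widths in bits under three common C data models. These are widely used
# conventions; the C standard itself only fixes minimum ranges.
DATA_MODELS: Dict[str, Dict[str, int]] = {
    "ILP32": {"short": 16, "int": 32, "long": 32, "pointer": 32},  # typical 32-bit systems
    "LP64": {"short": 16, "int": 32, "long": 64, "pointer": 64},   # 64-bit Linux, macOS
    "LLP64": {"short": 16, "int": 32, "long": 32, "pointer": 64},  # 64-bit Windows
}


def host_data_model() -> Optional[str]:
    """Guess the data model of the machine running this Python interpreter."""
    host = {
        "short": ctypes.sizeof(ctypes.c_short) * 8,
        "int": ctypes.sizeof(ctypes.c_int) * 8,
        "long": ctypes.sizeof(ctypes.c_long) * 8,
        "pointer": ctypes.sizeof(ctypes.c_void_p) * 8,
    }
    for name, widths in DATA_MODELS.items():
        if widths == host:
            return name
    return None  # some less common model


print(host_data_model())  # e.g. LP64 on 64-bit Linux
```

Note that int is 32 bits in all three models; what changes with the word size is mainly long and the pointer width.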

On the Python side I want to be able to use the following code:

atype = atype_t()
atype.i = 1
atype.c = 4
atype.s = 8
byteArray = atype.bytes()
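A minimal sketch of that desired interface, hand-written with the standard struct module, might look as follows. It assumes a big-endian target with a 32-bit int, 8-bit char, 16-bit short and natural alignment, so that the target compiler places i at offset 0, c at offset 4, one pad byte, and s at offset 6. The format string encodes that assumed layout; it would have to match the real target, and producing such classes automatically is the actual problem.

```python
import struct


class atype_t:
    """Hypothetical Python mirror of the C struct atype_t.

    Assumed target: big-endian, 32-bit int, 8-bit char, 16-bit short,
    natural alignment (i at 0, c at 4, one pad byte, s at 6; 8 bytes total).
    """

    # '>' selects big-endian with standard sizes and no automatic padding,
    # so the pad byte the target compiler inserts is written out as 'x'.
    _FORMAT = ">ibxh"

    def __init__(self, i=0, c=0, s=0):
        self.i = i
        self.c = c
        self.s = s

    def bytes(self):
        # Pack the fields in declaration order into the assumed target layout.
        return struct.pack(self._FORMAT, self.i, self.c, self.s)


atype = atype_t()
atype.i = 1
atype.c = 4
atype.s = 8
byteArray = atype.bytes()
print(byteArray.hex())  # 0000000104000008
```

The explicit '>' prefix matters: with '@' (the default), struct would use the host's native sizes and alignment, which is precisely what must be avoided when the target differs from the host.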

For completeness, I also want to convert a given byte array into a Python structure object.
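The reverse direction can be sketched with struct.unpack under the same assumed layout (big-endian, 32-bit int, one pad byte after c). ATypeFields and atype_from_bytes are hypothetical names.

```python
import struct
from typing import NamedTuple


class ATypeFields(NamedTuple):
    """Hypothetical decoded view of atype_t (names follow the C struct)."""
    i: int
    c: int
    s: int


# Same assumed target layout: big-endian, pad byte between c and s.
ATYPE_FORMAT = ">ibxh"


def atype_from_bytes(data: bytes) -> ATypeFields:
    """Decode a raw atype_t image as laid out on the assumed target."""
    expected = struct.calcsize(ATYPE_FORMAT)
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    # 'x' pad bytes are skipped by unpack, so exactly three values come back.
    return ATypeFields(*struct.unpack(ATYPE_FORMAT, data))


fields = atype_from_bytes(b"\x00\x00\x00\x01\x04\x00\x00\x08")
print(fields)  # ATypeFields(i=1, c=4, s=8)
```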

Approaches

As this seems to be a common problem, I evaluated several existing approaches.

Parsing C

The first option was to parse the C header files for typedef/struct elements in order to extract the relevant information. However, parsing C is non-trivial because the C preprocessor has to run first; only its output is plain C code that a C parser can handle. For Python there is pycparser, which aims to provide a complete AST for C. While this approach works in theory, it has several drawbacks.

  • Information about #defines is lost, because the preprocessor has already expanded them.
  • Alignment on the target machine can only be guessed. It would be possible to implement an algorithm that computes the layout of structures, but there is no guarantee that the compiler produces the same alignment.
  • Overhead: I am only interested in the structures, not in the AST of the functions.
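To make the alignment drawback concrete, here is a sketch of such a layout algorithm. It applies the usual natural-alignment rule (each member starts at a multiple of its alignment, and the total size is rounded up to the strictest member alignment) with an optional cap on alignment, modelled loosely on #pragma pack(n). The ABI table and the layout helper are hypothetical. The same three fields yield two different layouts depending on the rule, and the header alone does not reveal which rule the target compiler applies, since packing can also come from compiler flags or attributes.

```python
from typing import Dict, List, Tuple

# Hypothetical target ABI: C type -> (size, natural alignment) in bytes.
ABI: Dict[str, Tuple[int, int]] = {"char": (1, 1), "short": (2, 2), "int": (4, 4)}
ATYPE_T: List[Tuple[str, str]] = [("i", "int"), ("c", "char"), ("s", "short")]


def round_up(value: int, align: int) -> int:
    """Round value up to the next multiple of align."""
    return (value + align - 1) // align * align


def layout(fields: List[Tuple[str, str]], max_align: int) -> Tuple[Dict[str, int], int]:
    """Return (field offsets, struct size), capping each alignment at max_align."""
    offsets: Dict[str, int] = {}
    offset = 0
    struct_align = 1
    for name, ctype in fields:
        size, natural = ABI[ctype]
        align = min(natural, max_align)
        offset = round_up(offset, align)  # insert padding before the member
        offsets[name] = offset
        offset += size
        struct_align = max(struct_align, align)
    # Trailing padding keeps every element of an array of structs aligned.
    return offsets, round_up(offset, struct_align)


print(layout(ATYPE_T, max_align=8))  # ({'i': 0, 'c': 4, 's': 6}, 8)
print(layout(ATYPE_T, max_align=1))  # ({'i': 0, 'c': 4, 's': 5}, 7)
```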

Generate bindings

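There are several tools that generate C bindings for Python. However, they all assume that Python runs on the same machine type as the C code, so they cannot be told the target's word size and alignment. ctypes is an example: it can switch the byte order (BigEndianStructure, LittleEndianStructure) and tighten packing (_pack_), but type sizes and default alignment always follow the host. Many other tools are listed on the Python wiki.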
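The ctypes limitation shows up in a short sketch: the structure is declared big-endian, but c_int and c_short still take their size and alignment from the machine running Python. The printed sizes and offsets therefore depend on the host; the "typically" values in the comments assume a common host with a 4-byte int and a 2-byte short.

```python
import ctypes


class atype_t(ctypes.BigEndianStructure):
    # The byte order can be chosen, but c_int/c_short take their sizes and
    # default alignment from the host platform, not from the target.
    _fields_ = [
        ("i", ctypes.c_int),
        ("c", ctypes.c_byte),  # signed 8-bit, so it accepts plain integers
        ("s", ctypes.c_short),
    ]


atype = atype_t(i=1, c=4, s=8)
raw = bytes(atype)  # raw memory image of the structure
print(ctypes.sizeof(atype_t), raw.hex())  # typically: 8 0000000104000008
print(atype_t.s.offset)  # chosen by the host's rules, typically 6
```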

Interfacing to the compiler

The only program that really knows how a structure is laid out in memory is the compiler: it is the compiler that decides the in-memory layout.
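One common way to query the compiler, sketched here under stated assumptions (a generic technique, not necessarily the author's), is to generate a small probe program that prints sizeof and offsetof for every field. make_layout_probe, parse_probe_output and the header name atype.h are hypothetical. The probe must be built with the target's compiler and run on the target (or an emulator) so that the numbers describe the target's layout; the %zu conversion requires a C99 printf.

```python
from typing import Dict, List


def make_layout_probe(header: str, struct_name: str, fields: List[str]) -> str:
    """Return C source that prints sizeof(struct_name) and offsetof() of each field."""
    lines = [
        "#include <stddef.h>",
        "#include <stdio.h>",
        f'#include "{header}"',
        "",
        "int main(void) {",
        f'    printf("sizeof {struct_name} %zu\\n", sizeof({struct_name}));',
    ]
    for field in fields:
        lines.append(
            f'    printf("offsetof {field} %zu\\n", offsetof({struct_name}, {field}));'
        )
    lines += ["    return 0;", "}"]
    return "\n".join(lines) + "\n"


def parse_probe_output(text: str) -> Dict[str, int]:
    """Map probe output lines such as 'offsetof c 4' to {'offsetof c': 4}."""
    result: Dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, name, value = line.split()
        result[f"{kind} {name}"] = int(value)
    return result


print(make_layout_probe("atype.h", "atype_t", ["i", "c", "s"]))
```

The numbers parsed from the probe's output, rather than a guessed alignment rule, would then drive the Python side.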

Last modified: 2012/10/15 05:01 by moritz
