info:c_memory_structure

Differences

This shows you the differences between two versions of the page.

info:c_memory_structure [2012/10/15 05:28] – [Example program] moritzinfo:c_memory_structure [2012/10/16 17:23] (current) – [Using the debug info dump] moritz
Line 1: Line 1:
 ====== Determining C memory layout ====== ====== Determining C memory layout ======
  
-A while ago I had to write a Python program that generated C structures as C expects to find them in memory. The challenge was that the target system had a different alignment, endianness and word size than the computer running the Python scripts on. In this article I am referring to Python but the result can be transferred to any other programming language.+A while ago I had to write a Python program that generated C structures as C expects to find them in memory. The challenge was that the target system had a different alignment, endianness and word size than the computer running the Python scripts on. This is especially important when developing for multiple targets, like i386 and AMD64, different ARM platforms and MIPS. In this article I am referring to Python but the result can be transferred to any other programming language.
  
 ===== Goals ===== ===== Goals =====
Line 17: Line 17:
 </code> </code>
  
-On 32-bit machines, it is assumed that an <code>int</code> has a length of 32 bits, a ''char'' 8 and a ''short'' 16 bits. However, these are just assumptions and will not hold in general. As most computers today have a word size of 64 bits these assumptions are false.+On 32-bit machines, it is assumed that an ''int'' has a length of 32 bits, a ''char'' 8 and a ''short'' 16 bits. However, these are just assumptions and will not hold in general. As most computers today have a word size of 64 bits these assumptions are false.
  
 On the Python side I want to be able to use the following code: On the Python side I want to be able to use the following code:
Line 50: Line 50:
 The only program that really knows the structure layout in memory is the compiler. It decides about the in-memory structure. Hence, I started investigating if there was a way to extract this information from the compiler. The only program that really knows the structure layout in memory is the compiler. It decides about the in-memory structure. Hence, I started investigating if there was a way to extract this information from the compiler.
  
-===== The missing piece: .debug_section =====+===== The missing piece: .debug_info section =====
  
-C compilers can store additional information to the compiled output in program sections. For ELF files, there is an optional ''.debug_section'' that contains compiler-specific data. It has no standardized structure. Internally it is used to help debugging a program, for example to determine what variable is at what memory location etc. It also stores debug information about the C structures, which is what I am looking for.+C compilers can store additional information to the compiled output in program sections. For ELF files, there is an optional ''.debug_info'' (may have a different name depending on the compiler) that contains compiler-specific data. It has no standardized structure. Internally it is used to help debugging a program, for example to determine what variable is at what memory location etc. It also stores debug information about the C structures, which is what I am looking for.
  
 For different compilers exist different tools to access the debug section content. For GCC, there is objdump, for IAR there exists ielfdump. Both allow to print the debug section in a human-readable form. However, the structure is not documented and requires the programmer to reverse engineer it. (For GCC it is open-source, which is not the case for the IAR C compiler). For different compilers exist different tools to access the debug section content. For GCC, there is objdump, for IAR there exists ielfdump. Both allow to print the debug section in a human-readable form. However, the structure is not documented and requires the programmer to reverse engineer it. (For GCC it is open-source, which is not the case for the IAR C compiler).
Line 59: Line 59:
 ==== Example program ==== ==== Example program ====
  
-Throughout the article, we will be using the following sample program:+Throughout the article, we will be using the following sample program saved as ''struct.c'':
  
 <code C> <code C>
Line 89: Line 89:
 <code>78 56 34 12 ab 90 cd 00</code> <code>78 56 34 12 ab 90 cd 00</code>
 My machine is obviously a [[http://en.wikipedia.org/wiki/Endianness|little-endian]] machine. My machine is obviously a [[http://en.wikipedia.org/wiki/Endianness|little-endian]] machine.
 +
 +==== Obtaining the debug section data ====
 +
 +In order to examine the debug section, the program first needs to be compiled with debug symbols. When using GCC, this can be done by adding the ''-g'' option. This will cause GCC to record all kinds of information in different ELF sections, all prefixed by ''.debug_''.
 +
 +Next, we use the ''objdump'' utility to extract the debug section and print its content. Assuming that the program has been compiled to the binary ''struct'', the following command reveals the debug section contents:
 +<code>objdump --dwarf=info struct</code>
 +
 +Here is the output on my machine:
 +
 +<code>struct:     file format elf64-x86-64
 +
 +Contents of the .debug_info section:
 +
 +  Compilation Unit @ offset 0x0:
 +   Length:        0x12c (32-bit)
 +   Version:       2
 +   Abbrev Offset: 0
 +   Pointer Size:  8
 + <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
 +    <c>   DW_AT_producer    : (indirect string, offset: 0x1e): GNU C 4.7.1
 +    <10>   DW_AT_language    : 1 (ANSI C)
 +    <11>   DW_AT_name        : (indirect string, offset: 0x7d): struct.c
 +    <15>   DW_AT_comp_dir    : (indirect string, offset: 0x60): /home/moritz/dev/struct_test
 +    <19>   DW_AT_low_pc      : 0x40055c
 +    <21>   DW_AT_high_pc     : 0x4005ce
 +    <29>   DW_AT_stmt_list   : 0x0
 + <1><2d>: Abbrev Number: 2 (DW_TAG_base_type)
 +    <2e>   DW_AT_byte_size   : 8
 +    <2f>   DW_AT_encoding    : 7 (unsigned)
 +    <30>   DW_AT_name        : (indirect string, offset: 0x2a): long unsigned int
 + <1><34>: Abbrev Number: 2 (DW_TAG_base_type)
 +    <35>   DW_AT_byte_size   : 1
 +    <36>   DW_AT_encoding    : 8 (unsigned char)
 +    <37>   DW_AT_name        : (indirect string, offset: 0x44): unsigned char
 + <1><3b>: Abbrev Number: 2 (DW_TAG_base_type)
 +    <3c>   DW_AT_byte_size   : 2
 +    <3d>   DW_AT_encoding    : 7 (unsigned)
 +    <3e>   DW_AT_name        : (indirect string, offset: 0x0): short unsigned int
 + <1><42>: Abbrev Number: 2 (DW_TAG_base_type)
 +    <43>   DW_AT_byte_size   : 4
 +    <44>   DW_AT_encoding    : 7 (unsigned)
 +    <45>   DW_AT_name        : (indirect string, offset: 0x2f): unsigned int
 + <1><49>: Abbrev Number: 2 (DW_TAG_base_type)
 +    <4a>   DW_AT_byte_size   : 1
 +    <4b>   DW_AT_encoding    : 6 (signed char)
 +    <4c>   DW_AT_name        : (indirect string, offset: 0x46): signed char
 + <1><50>: Abbrev Number: 2 (DW_TAG_base_type)
 +    <51>   DW_AT_byte_size   : 2
 +    <52>   DW_AT_encoding    : 5 (signed)
 +    <53>   DW_AT_name        : (indirect string, offset: 0x86): short int
 + <1><57>: Abbrev Number: 3 (DW_TAG_base_type)
 +    <58>   DW_AT_byte_size   : 4
 +    <59>   DW_AT_encoding    : 5 (signed)
 +    <5a>   DW_AT_name        : int
 + <1><5e>: Abbrev Number: 2 (DW_TAG_base_type)
 +    <5f>   DW_AT_byte_size   : 8
 +    <60>   DW_AT_encoding    : 5 (signed)
 +    <61>   DW_AT_name        : (indirect string, offset: 0x57): long int
 + <1><65>: Abbrev Number: 2 (DW_TAG_base_type)
 +    <66>   DW_AT_byte_size   : 8
 +    <67>   DW_AT_encoding    : 7 (unsigned)
 +    <68>   DW_AT_name        : (indirect string, offset: 0x90): sizetype
 + <1><6c>: Abbrev Number: 4 (DW_TAG_pointer_type)
 +    <6d>   DW_AT_byte_size   : 8
 +    <6e>   DW_AT_type        : <0x72>
 + <1><72>: Abbrev Number: 2 (DW_TAG_base_type)
 +    <73>   DW_AT_byte_size   : 1
 +    <74>   DW_AT_encoding    : 6 (signed char)
 +    <75>   DW_AT_name        : (indirect string, offset: 0x4d): char
 + <1><79>: Abbrev Number: 5 (DW_TAG_structure_type)
 +    <7a>   DW_AT_name        : (indirect string, offset: 0x3c): atype_t
 +    <7e>   DW_AT_byte_size   : 8
 +    <7f>   DW_AT_decl_file   : 1
 +    <80>   DW_AT_decl_line   : 3
 +    <81>   DW_AT_sibling     : <0xaa>
 + <2><85>: Abbrev Number: 6 (DW_TAG_member)
 +    <86>   DW_AT_name        : i
 +    <88>   DW_AT_decl_file   : 1
 +    <89>   DW_AT_decl_line   : 4
 +    <8a>   DW_AT_type        : <0x42>
 +    <8e>   DW_AT_data_member_location: 2 byte block: 23 0 (DW_OP_plus_uconst: 0)
 + <2><91>: Abbrev Number: 6 (DW_TAG_member)
 +    <92>   DW_AT_name        : s
 +    <94>   DW_AT_decl_file   : 1
 +    <95>   DW_AT_decl_line   : 5
 +    <96>   DW_AT_type        : <0x3b>
 +    <9a>   DW_AT_data_member_location: 2 byte block: 23 4 (DW_OP_plus_uconst: 4)
 + <2><9d>: Abbrev Number: 6 (DW_TAG_member)
 +    <9e>   DW_AT_name        : c
 +    <a0>   DW_AT_decl_file   : 1
 +    <a1>   DW_AT_decl_line   : 6
 +    <a2>   DW_AT_type        : <0x72>
 +    <a6>   DW_AT_data_member_location: 2 byte block: 23 6 (DW_OP_plus_uconst: 6)
 + <1><aa>: Abbrev Number: 7 (DW_TAG_typedef)
 +    <ab>   DW_AT_name        : (indirect string, offset: 0x3c): atype_t
 +    <af>   DW_AT_decl_file   : 1
 +    <b0>   DW_AT_decl_line   : 7
 +    <b1>   DW_AT_type        : <0x79>
 + <1><b5>: Abbrev Number: 8 (DW_TAG_subprogram)
 +    <b6>   DW_AT_external    : 1
 +    <b7>   DW_AT_name        : (indirect string, offset: 0x52): main
 +    <bb>   DW_AT_decl_file   : 1
 +    <bc>   DW_AT_decl_line   : 9
 +    <bd>   DW_AT_prototyped  : 1
 +    <be>   DW_AT_type        : <0x57>
 +    <c2>   DW_AT_low_pc      : 0x40055c
 +    <ca>   DW_AT_high_pc     : 0x4005ce
 +    <d2>   DW_AT_frame_base  : 0x0 (location list)
 +    <d6>   DW_AT_GNU_all_tail_call_sites: 1
 +    <d7>   DW_AT_sibling     : <0x11e>
 + <2><db>: Abbrev Number: 9 (DW_TAG_formal_parameter)
 +    <dc>   DW_AT_name        : (indirect string, offset: 0x99): argv
 +    <e0>   DW_AT_decl_file   : 1
 +    <e1>   DW_AT_decl_line   : 9
 +    <e2>   DW_AT_type        : <0x11e>
 +    <e6>   DW_AT_location    : 2 byte block: 91 48 (DW_OP_fbreg: -56)
 + <2><e9>: Abbrev Number: 9 (DW_TAG_formal_parameter)
 +    <ea>   DW_AT_name        : (indirect string, offset: 0x19): argl
 +    <ee>   DW_AT_decl_file   : 1
 +    <ef>   DW_AT_decl_line   : 9
 +    <f0>   DW_AT_type        : <0x57>
 +    <f4>   DW_AT_location    : 2 byte block: 91 44 (DW_OP_fbreg: -60)
 + <2><f7>: Abbrev Number: 10 (DW_TAG_variable)
 +    <f8>   DW_AT_name        : (indirect string, offset: 0x13): atype
 +    <fc>   DW_AT_decl_file   : 1
 +    <fd>   DW_AT_decl_line   : 10
 +    <fe>   DW_AT_type        : <0xaa>
 +    <102>   DW_AT_location    : 2 byte block: 91 50 (DW_OP_fbreg: -48)
 + <2><105>: Abbrev Number: 11 (DW_TAG_variable)
 +    <106>   DW_AT_name        : i
 +    <108>   DW_AT_decl_file   : 1
 +    <109>   DW_AT_decl_line   : 11
 +    <10a>   DW_AT_type        : <0x57>
 +    <10e>   DW_AT_location    : 2 byte block: 91 6c (DW_OP_fbreg: -20)
 + <2><111>: Abbrev Number: 11 (DW_TAG_variable)
 +    <112>   DW_AT_name        : c
 +    <114>   DW_AT_decl_file   : 1
 +    <115>   DW_AT_decl_line   : 16
 +    <116>   DW_AT_type        : <0x124>
 +    <11a>   DW_AT_location    : 2 byte block: 91 60 (DW_OP_fbreg: -32)
 + <1><11e>: Abbrev Number: 4 (DW_TAG_pointer_type)
 +    <11f>   DW_AT_byte_size   : 8
 +    <120>   DW_AT_type        : <0x6c>
 + <1><124>: Abbrev Number: 4 (DW_TAG_pointer_type)
 +    <125>   DW_AT_byte_size   : 8
 +    <126>   DW_AT_type        : <0x12a>
 + <1><12a>: Abbrev Number: 12 (DW_TAG_const_type)
 +    <12b>   DW_AT_type        : <0x34>
 +</code>
 +
 +This is a lot of output. Let's explain the different parts.
 +
 +Each line either starts a new entity or adds an attribute to the current one. Every entity and attribute has a unique address (its offset into the section), written in hex. Each entity has a type, which determines the attributes it has. An entity line consists of the nesting level and the unique address, each in angle brackets, followed by the entity's type ID and a human-readable name of the type. An attribute line starts with the attribute's address, followed by the attribute name and a colon. After the colon comes the actual value of the attribute. It can be a number, a string or a pointer to another entity. A pointer is written as ''<0x34>'', which references the entity at address ''34''. This way, the debug info tree can be traversed.
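
The line format described above can be turned into a small parser. The following is only a sketch under assumptions: the regular expressions are derived from the dump shown in this article, not from any documented format, and the names ''parse_debug_info'', ''ENTITY_RE'', ''ATTR_RE'' and ''INDIRECT_RE'' are made up for illustration. Other objdump versions may print the lines differently.

```python
import re

# Entity line, e.g. " <1><79>: Abbrev Number: 5 (DW_TAG_structure_type)"
ENTITY_RE = re.compile(
    r"^\s*<(\d+)><([0-9a-f]+)>: Abbrev Number: (\d+) \((DW_TAG_\w+)\)")
# Attribute line, e.g. "    <7e>   DW_AT_byte_size   : 8"
ATTR_RE = re.compile(r"^\s*<([0-9a-f]+)>\s+(DW_AT_\w+)\s*:\s*(.*)$")
# Prefix that objdump puts in front of strings stored in the string table
INDIRECT_RE = re.compile(r"^\(indirect string, offset: 0x[0-9a-f]+\): ")


def parse_debug_info(text):
    """Return a list of entities; attributes are kept as raw strings."""
    entities = []
    for line in text.splitlines():
        m = ENTITY_RE.match(line)
        if m:
            entities.append({
                "level": int(m.group(1)),        # nesting level
                "offset": int(m.group(2), 16),   # unique address
                "tag": m.group(4),               # e.g. DW_TAG_member
                "attrs": {},
            })
            continue
        m = ATTR_RE.match(line)
        if m and entities:
            # Attach the attribute to the most recently started entity
            value = INDIRECT_RE.sub("", m.group(3).strip())
            entities[-1]["attrs"][m.group(2)] = value
    return entities


# A few lines copied from the dump above
SAMPLE = """\
 <1><79>: Abbrev Number: 5 (DW_TAG_structure_type)
    <7a>   DW_AT_name        : (indirect string, offset: 0x3c): atype_t
    <7e>   DW_AT_byte_size   : 8
 <2><85>: Abbrev Number: 6 (DW_TAG_member)
    <86>   DW_AT_name        : i
    <8a>   DW_AT_type        : <0x42>
    <8e>   DW_AT_data_member_location: 2 byte block: 23 0 (DW_OP_plus_uconst: 0)
"""

for entity in parse_debug_info(SAMPLE):
    print(entity["level"], hex(entity["offset"]), entity["tag"], entity["attrs"])
```

The nesting level is stored but not used to build a tree here; a parent/child structure could be reconstructed from it if needed.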
 +
 +==== Finding the size information ====
 +
 +
 +First, the dump tells us what format the file has. In this case it is ''elf64-x86-64''. We know that pointers are 64 bits long on this platform.
 +
 +Next, it shows the contents of the debug info section. There can be several compilation units; here there is just one. Its header confirms that pointers have a size of 8 bytes, i.e. 64 bits. The ''Length'' field gives the size of this compilation unit as 0x12c bytes.
 +
 +The first compilation unit is marked with ''<0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)''. It means that at nesting level 0, address 0xb in the section, there is an entity of type 1, which is a ''DW_TAG_compile_unit''. It has several attributes, which are shown on the next few lines. None of them is interesting for us.
 +
 +The first children of the compile unit are ''DW_TAG_base_type'' entities. Their attributes tell us how many bytes a base type occupies (''DW_AT_byte_size''), what encoding it has (''DW_AT_encoding'') and what it is called (''DW_AT_name'').
 +
 +Skipping to address 0x79, it gets more interesting: the definition of our structure.
 +<code> <1><79>: Abbrev Number: 5 (DW_TAG_structure_type)</code>
 +Its ''DW_AT_byte_size'' attribute shows that our structure occupies 8 bytes in memory, which we can verify using the output of our sample program. Nested under the structure type, we find several ''DW_TAG_member'' children which represent the structure members. Most interesting is the ''DW_AT_data_member_location'' attribute, which tells us at what offset inside the structure a member is located. The member ''i'' is at offset 0, ''s'' is at 4 and ''c'' at 6. This leads us to the following in-memory structure:
 +
 +<code>i3 i2 i1 i0 s1 s0 c0 00</code>
 +Here, ''iN'' represents the individual bytes of the integer ''i''; the same holds for ''s'' and ''c''. There is one byte of padding at the end. The size of each member type can be determined by following the ''DW_AT_type'' pointer inside the debug info section.
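
On the Python side, the offsets and sizes found above are enough to build the byte image of the structure. The sketch below copies the layout by hand from the dump instead of parsing it. The names ''BASE_TYPES'', ''MEMBERS'' and ''pack_struct'' are made up for illustration, and the member values are an assumption inferred from the sample program's output on a little-endian machine.

```python
# Entities referenced by the members' DW_AT_type pointers:
# address -> (DW_AT_name, DW_AT_byte_size), copied from the dump above
BASE_TYPES = {
    0x42: ("unsigned int", 4),
    0x3B: ("short unsigned int", 2),
    0x72: ("char", 1),
}

# (member name, DW_AT_data_member_location, DW_AT_type), copied from the dump
MEMBERS = [("i", 0, 0x42), ("s", 4, 0x3B), ("c", 6, 0x72)]

STRUCT_SIZE = 8  # DW_AT_byte_size of the structure


def pack_struct(values, byteorder="little"):
    """Place each member at its offset; bytes not covered stay zero (padding)."""
    buf = bytearray(STRUCT_SIZE)
    for name, offset, type_ref in MEMBERS:
        size = BASE_TYPES[type_ref][1]  # follow the DW_AT_type pointer
        # Values are treated as raw unsigned bit patterns here
        buf[offset:offset + size] = values[name].to_bytes(size, byteorder)
    return bytes(buf)


# Assumed member values, inferred from the sample program's output
data = pack_struct({"i": 0x12345678, "s": 0x90AB, "c": 0xCD})
print(" ".join(f"{b:02x}" for b in data))  # 78 56 34 12 ab 90 cd 00
```

The byte order is passed in as a parameter because the debug info dump does not provide it; with ''byteorder="big"'' the same call would produce ''12 34 56 78 90 ab cd 00''.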
 +
 +
 +==== Problems ====
 +
 +This approach has the downside that it depends on the exact text format of the debug dump. Also, the debug info does not specify the endianness of the types.
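
One possible way around the missing endianness, which is not part of the approach described above, is to read it from the ELF file header: byte 4 of the identification bytes (''EI_CLASS'') encodes 32-bit or 64-bit, and byte 5 (''EI_DATA'') encodes the byte order. The following is a minimal sketch; the function names ''elf_class_and_byteorder'' and ''read_elf_ident'' are made up for illustration.

```python
ELF_MAGIC = b"\x7fELF"


def elf_class_and_byteorder(header):
    """Decode word size and byte order from the first bytes of an ELF file."""
    if len(header) < 6 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    word_bits = {1: 32, 2: 64}.get(header[4])            # EI_CLASS
    byteorder = {1: "little", 2: "big"}.get(header[5])   # EI_DATA
    if word_bits is None or byteorder is None:
        raise ValueError("unknown ELF class or data encoding")
    return word_bits, byteorder


def read_elf_ident(path):
    """Convenience wrapper that reads the identification bytes from a file."""
    with open(path, "rb") as f:
        return elf_class_and_byteorder(f.read(16))


# Hand-built identification bytes of a 64-bit little-endian ELF file
sample = ELF_MAGIC + bytes([2, 1, 1]) + bytes(9)
print(elf_class_and_byteorder(sample))  # (64, 'little')
```

For the binary from this article, ''read_elf_ident("struct")'' would be the call to try; the resulting byte order string can be handed directly to ''int.to_bytes''.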
 +
 +===== IAR =====
 +
 +The same can be done with the tools that come with the IAR C/C++ compiler. The ELF dump utility is called ''ielfdump''.
 +
 +
 +
info/c_memory_structure.txt · Last modified: 2012/10/16 17:23 by moritz
