Memory Map In C

A typical memory representation of a C program consists of the following sections:
1. Text segment
2. Initialized data segment
3. Uninitialized data segment
4. Stack
5. Heap

[Figure: A typical memory layout of a running process]
1. Text Segment:

A text segment, also known as a code segment or simply as text, is one of the sections of a program in an object file or in memory that contains executable instructions.

As a memory region, a text segment may be placed below the heap or stack to prevent heap and stack overflows from overwriting it.
Usually, the text segment is sharable, so that only a single copy needs to be in memory for frequently executed programs such as text editors, the C compiler, and the shells. The text segment is also often read-only, to prevent a program from accidentally modifying its instructions.
2. Initialized Data Segment:

The initialized data segment is usually called simply the data segment. A data segment is a portion of the virtual address space of a program that contains the global and static variables initialized by the programmer.

Note that the data segment as a whole is not read-only, since the values of its variables can be altered at run time.
This segment can be further classified into an initialized read-only area and an initialized read-write area.
For instance, the global string defined by char s[] = "hello world" and a statement like int debug = 1 outside main() (i.e., a global) would be stored in the initialized read-write area. A global declaration like const char *string = "hello world" places the string literal "hello world" in the initialized read-only area and the character pointer variable string in the initialized read-write area.
Example: static int i = 10; and a global int i = 10; are both stored in the data segment.
3. Uninitialized Data Segment:

The uninitialized data segment is often called the “bss” segment, named after an ancient assembler operator that stood for “block started by symbol.” Data in this segment is initialized by the kernel to arithmetic 0 before the program starts executing.

The uninitialized data segment starts at the end of the data segment and contains all global and static variables that are initialized to zero or have no explicit initialization in the source code.
For instance, a variable declared static int i; would be contained in the BSS segment.

Likewise, a global variable declared int j; would be contained in the BSS segment.

4. Stack:

The stack area traditionally adjoined the heap area and grew in the opposite direction; when the stack pointer met the heap pointer, free memory was exhausted. (With modern large address spaces and virtual memory techniques they may be placed almost anywhere, but they still typically grow in opposite directions.)

The stack area contains the program stack, a LIFO structure, typically located in the higher parts of memory. On the standard PC x86 computer architecture it grows toward address zero; on some other architectures it grows in the opposite direction. A “stack pointer” register tracks the top of the stack; it is adjusted each time a value is “pushed” onto the stack. The set of values pushed for one function call is termed a “stack frame”; a stack frame consists, at minimum, of a return address.
The stack is where automatic variables are stored, along with information that is saved each time a function is called. Each time a function is called, the address to return to and certain information about the caller’s environment, such as some of the machine registers, are saved on the stack. The newly called function then allocates room on the stack for its automatic and temporary variables. This is how recursive functions in C can work: each time a recursive function calls itself, a new stack frame is used, so one set of variables doesn’t interfere with the variables from another instance of the function.
5. Heap:

Heap is the segment where dynamic memory allocation usually takes place.

The heap area begins at the end of the BSS segment and grows toward larger addresses from there. The heap area is managed by malloc, realloc, and free, which may use the brk and sbrk system calls to adjust its size (note that the use of brk/sbrk and a single “heap area” is not required to fulfill the contract of malloc/realloc/free; they may also be implemented using mmap to reserve potentially non-contiguous regions of virtual memory in the process’s virtual address space). The heap area is shared by all shared libraries and dynamically loaded modules in a process.
Examples.
The size(1) command reports the sizes (in bytes) of the text, data, and bss segments. (For more details, refer to the size(1) man page.)
1. Check the following simple C program
#include <stdio.h>

int main(void)
{
    return 0;
}
[narendra@CentOS]$ gcc memory-layout.c -o memory-layout
[narendra@CentOS]$ size memory-layout
text       data        bss        dec        hex    filename
960        248          8       1216        4c0    memory-layout
2. Let us add one global variable to the program and check the size of bss again.
#include <stdio.h>

int global; /* Uninitialized variable stored in bss */

int main(void)
{
    return 0;
}
[narendra@CentOS]$ gcc memory-layout.c -o memory-layout
[narendra@CentOS]$ size memory-layout
text       data        bss        dec        hex    filename
960        248         12       1220        4c4    memory-layout
3. Let us add one static variable which is also stored in bss.
#include <stdio.h>

int global; /* Uninitialized variable stored in bss */

int main(void)
{
    static int i; /* Uninitialized static variable stored in bss */
    return 0;
}
[narendra@CentOS]$ gcc memory-layout.c -o memory-layout
[narendra@CentOS]$ size memory-layout
text       data        bss        dec        hex    filename
960        248         16       1224        4c8    memory-layout
4. Let us initialize the static variable, which will then be stored in the Data Segment (DS).
#include <stdio.h>

int global; /* Uninitialized variable stored in bss */

int main(void)
{
    static int i = 100; /* Initialized static variable stored in DS */
    return 0;
}
[narendra@CentOS]$ gcc memory-layout.c -o memory-layout
[narendra@CentOS]$ size memory-layout
text       data        bss        dec        hex    filename
960        252         12       1224        4c8    memory-layout
5. Let us initialize the global variable, which will then be stored in the Data Segment (DS).
#include <stdio.h>

int global = 10; /* Initialized global variable stored in DS */

int main(void)
{
    static int i = 100; /* Initialized static variable stored in DS */
    return 0;
}
[narendra@CentOS]$ gcc memory-layout.c -o memory-layout
[narendra@CentOS]$ size memory-layout
text       data        bss        dec        hex    filename
960        256          8       1224        4c8    memory-layout
1. Where are global, local, static, and extern variables stored?
  • Local (automatic) variables are stored on the stack. Register variables may be kept in CPU registers, although the register keyword is only a hint to the compiler. Global and static variables are stored in the data segment if they are explicitly initialized, or in the BSS segment if they are not. Dynamically allocated memory comes from the heap, and the C program’s instructions are stored in the code (text) segment. An extern declaration does not allocate storage by itself; the variable it refers to is stored in the data or BSS segment according to its definition.
2. What does the BSS segment store?
  • The BSS segment stores uninitialized global and static variables, which are initialized to zero. A common question is: if the BSS segment doesn’t consume memory, where are these variables stored? That claim refers to the executable file on disk, where the BSS segment takes up no space. When the executable is loaded, the BSS segment certainly does consume space in memory: the OS loader allocates it and initializes it to zero.
3. Global variables vs. local variables
  • Once declared, a global variable can be used anywhere in the program, i.e., in many functions. To share a global variable across source files, declare it with extern in a user-defined header file and define it in exactly one source file (loosely similar to sharing names through packages in Java). The value of a global variable can be changed programmatically. Local variables, on the other hand, are local to a function and cannot be used outside that function.
4. Static variables vs. global variables
  • A static variable exists for the entire run of the program and keeps its value between function calls. It is not a constant: its value can be changed programmatically like any other non-const variable. A static local variable is visible only inside its function, and a static global variable only inside its source file.
    Global variables: see the description above.
ASSEMBLER, LINKER AND LOADER:
Normally, building a C program involves four stages and uses different ‘tools’: a preprocessor, a compiler, an assembler, and a linker.
  • At the end there should be a single executable file. The stages below happen in order regardless of the operating system or compiler, and are illustrated graphically in Figure w.1.
1.   Preprocessing is the first pass of any C compilation. It processes include files, conditional compilation instructions, and macros.
2.   Compilation is the second pass. It takes the output of the preprocessor and generates assembly source code.
3.   Assembly is the third stage of compilation. It takes the assembly source code and produces an assembly listing with offsets. The assembler output is stored in an object file.
4.   Linking is the final stage of compilation. It takes one or more object files or libraries as input and combines them to produce a single (usually executable) file. In doing so, it resolves references to external symbols, assigns final addresses to procedures/functions and variables, and revises code and data to reflect the new addresses (a process called relocation).

  • Bear in mind that if you use an IDE, these processes are quite transparent.
  • We now examine in more detail the processes that happen before and after the linking stage. For any given input file, the file name suffix (file extension) determines what kind of compilation is done; the suffixes used by GCC are listed in Table w.1.
  • In UNIX/Linux, an executable or binary file usually has no extension, whereas in Windows executables may have extensions such as .exe and .com, and dynamic libraries use .dll.
File extension                                           Description
file_name.c                                              C source code which must be preprocessed.
file_name.i                                              C source code which should not be preprocessed.
file_name.ii                                             C++ source code which should not be preprocessed.
file_name.h                                              C header file (not to be compiled or linked).
file_name.cc, file_name.cp, file_name.cxx, file_name.C   C++ source code which must be preprocessed. For file_name.cxx, the xx must both be literally the character x, and in file_name.C the C is a capital c.
file_name.s                                              Assembler code.
file_name.S                                              Assembler code which must be preprocessed.
file_name.o                                              Object file. By default, the object file name for a source file is made by replacing the extension .c, .i, .s, etc. with .o.
Table w.1


The following figure shows the steps involved in building a C program, from compilation until the executable image is loaded into memory for running.

Figure w.1:  Compile, link & execute stages for running program

W.2  OBJECT FILES and EXECUTABLE
  • Assembling the source code produces object files (e.g. .o, .obj), which are then linked to produce an executable file.
  • Object files and executables come in several formats, such as ELF (Executable and Linking Format) and COFF (Common Object File Format). For example, ELF is used on Linux systems, while COFF (in its PE/COFF form) is used on Windows systems.
  • Other object file formats are listed in the following table.
Object File Format    Description
a.out                 The a.out format is the original file format for Unix. It consists of three sections: text, data, and bss, which hold program code, initialized data, and uninitialized data, respectively. This format is so simple that it has no reserved place for debugging information. The only debugging format for a.out is stabs, which is encoded as a set of normal symbols with distinctive attributes.
COFF                  The COFF (Common Object File Format) format was introduced with System V Release 3 (SVR3) Unix. COFF files may have multiple sections, each prefixed by a header, but the number of sections is limited. The COFF specification includes support for debugging, but the debugging information was limited. There is no file extension for this format.
ECOFF                 A variant of COFF. ECOFF is an Extended COFF originally introduced for MIPS and Alpha workstations.
XCOFF                 The IBM RS/6000 running AIX uses an object file format called XCOFF (eXtended COFF). The COFF sections, symbols, and line numbers are used, but debugging symbols are dbx-style stabs whose strings are located in the .debug section (rather than the string table). The default name for an XCOFF executable file is a.out.
PE                    Windows 9x and NT use the PE (Portable Executable) format for their executables. PE is basically COFF with additional headers. The extension is normally .exe.
ELF                   The ELF (Executable and Linking Format) format came with System V Release 4 (SVR4) Unix. ELF is similar to COFF in being organized into a number of sections, but it removes many of COFF’s limitations. ELF is used on most modern Unix systems, including GNU/Linux, Solaris, and IRIX, and also on many embedded systems.
SOM/ESOM              SOM (System Object Module) and ESOM (Extended SOM) are HP’s object file and debug formats (not to be confused with IBM’s SOM, which is a cross-language Application Binary Interface, or ABI).
Table w.2
  • When we examine the content of these object files there are areas called sections.  Sections can hold executable code, data, dynamic linking information, debugging data, symbol tables, relocation information, comments, string tables, and notes.
  • Some sections are loaded into the process image and some provide information needed in the building of a process image while still others are used only in linking object files.
  • There are several sections that are common to all executable formats (they may be named differently, depending on the compiler/linker), as listed below:
Section               Description
.text                 Contains the executable instruction codes and is shared among every process running the same binary. This section usually has READ and EXECUTE permissions only. It is the section most affected by optimization.
.bss                  BSS stands for ‘Block Started by Symbol’. It holds uninitialized global and static variables. Since the BSS only holds variables that don’t have any values yet, it doesn’t need to store an image of them. The size that BSS will require at runtime is recorded in the object file, but the BSS (unlike the data section) doesn’t take up any actual space in the object file.
.data                 Contains the initialized global and static variables and their values. Unlike .bss, it occupies space in the file for the initialized values it holds. It usually has READ/WRITE permissions.
.rdata                Also known as the .rodata (read-only data) section. It contains constants and string literals.
.reloc                Stores the information required for relocating the image while loading.
Symbol table          A symbol is basically a name and an address. The symbol table holds the information needed to locate and relocate a program’s symbolic definitions and references. It contains an array of symbol entries, and a symbol table index is a subscript into this array. Index 0 both designates the first entry in the table and serves as the undefined symbol index.
Relocation records    Relocation is the process of connecting symbolic references with symbolic definitions. For example, when a program calls a function, the associated call instruction must transfer control to the proper destination address at execution time. Relocatable files must have relocation entries, which describe how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process’s program image. Simply put, relocation records are information used by the linker to adjust section contents.
Table w.3:  Segments in executable file
  • The following is an example of dumping an object file’s contents using the readelf program. Another utility that can be used is objdump.
/* testprog1.c */
#include <stdio.h>

static void display(int i, int *ptr);

int main(void)
{
      int x = 5;
      int *xptr = &x;
      printf("In main() program:\n");
      printf("x value is %d and is stored at address %p.\n", x, (void *)&x);
      printf("xptr pointer points to address %p which holds a value of %d.\n", (void *)xptr, *xptr);
      display(x, xptr);
      return 0;
}

/* Inherits internal linkage from the static declaration above,
   which is why readelf lists display as a LOCAL symbol. */
void display(int y, int *yptr)
{
      char var[7] = "ABCDEF";   /* unused local array */
      printf("In display() function:\n");
      printf("y value is %d and is stored at address %p.\n", y, (void *)&y);
      printf("yptr pointer points to address %p which holds a value of %d.\n", (void *)yptr, *yptr);
}
[bodo@bakawali test]$ gcc -c testprog1.c
[bodo@bakawali test]$ readelf -a testprog1.o
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          672 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           40 (bytes)
  Number of section headers:         11
  Section header string table index: 8

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00000000 000034 0000de 00  AX  0   0  4
  [ 2] .rel.text         REL             00000000 00052c 000068 08      9   1  4
  [ 3] .data             PROGBITS        00000000 000114 000000 00  WA  0   0  4
  [ 4] .bss              NOBITS          00000000 000114 000000 00  WA  0   0  4
  [ 5] .rodata           PROGBITS        00000000 000114 00010a 00   A  0   0  4
  [ 6] .note.GNU-stack   PROGBITS        00000000 00021e 000000 00      0   0  1
  [ 7] .comment          PROGBITS        00000000 00021e 000031 00      0   0  1
  [ 8] .shstrtab         STRTAB          00000000 00024f 000051 00      0   0  1
  [ 9] .symtab           SYMTAB          00000000 000458 0000b0 10     10   9  4
  [10] .strtab           STRTAB          00000000 000508 000021 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

There are no program headers in this file.

Relocation section '.rel.text' at offset 0x52c contains 13 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0000002d  00000501 R_386_32          00000000   .rodata
00000032  00000a02 R_386_PC32        00000000   printf
00000044  00000501 R_386_32          00000000   .rodata
00000049  00000a02 R_386_PC32        00000000   printf
0000005c  00000501 R_386_32          00000000   .rodata
00000061  00000a02 R_386_PC32        00000000   printf
0000008c  00000501 R_386_32          00000000   .rodata
0000009c  00000501 R_386_32          00000000   .rodata
000000a1  00000a02 R_386_PC32        00000000   printf
000000b3  00000501 R_386_32          00000000   .rodata
000000b8  00000a02 R_386_PC32        00000000   printf
000000cb  00000501 R_386_32          00000000   .rodata
000000d0  00000a02 R_386_PC32        00000000   printf

There are no unwind sections in this file.

Symbol table '.symtab' contains 11 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FILE    LOCAL  DEFAULT  ABS testprog1.c
     2: 00000000     0 SECTION LOCAL  DEFAULT    1
     3: 00000000     0 SECTION LOCAL  DEFAULT    3
     4: 00000000     0 SECTION LOCAL  DEFAULT    4
     5: 00000000     0 SECTION LOCAL  DEFAULT    5
     6: 00000080    94 FUNC    LOCAL  DEFAULT    1 display
     7: 00000000     0 SECTION LOCAL  DEFAULT    6
     8: 00000000     0 SECTION LOCAL  DEFAULT    7
     9: 00000000   128 FUNC    GLOBAL DEFAULT    1 main
    10: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND printf
