A typical memory representation of a C program consists of the
following sections.
1. Text segment
2. Initialized data segment
3. Uninitialized data segment
4. Stack
5. Heap
A typical memory layout of a running process
1. Text Segment:
A text segment, also known as a code segment or simply as text, is one of the
sections of a program in an object file or in memory, which contains executable
instructions.
As a memory region, a text segment may be placed below
the heap or stack in order to prevent heap and stack overflows from
overwriting it.
Usually, the text segment is sharable so that only a
single copy needs to be in memory for frequently executed programs, such as
text editors, the C compiler, the shells, and so on. Also, the text segment is
often read-only, to prevent a program from accidentally modifying its
instructions.
2. Initialized Data Segment:
The initialized data segment is usually called simply the data segment. A data
segment is a portion of the virtual address space of a program, which contains the
global variables and static variables that are initialized by the programmer.
Note that the data segment as a whole is not read-only, since the
values of the variables can be altered at run time.
This segment can be further classified into an initialized
read-only area and an initialized read-write area.
For instance, the global string defined by char s[] =
"hello world" in C and a C statement like int debug = 1 outside main (i.e.
global) would be stored in the initialized read-write area. A global C
statement like const char *string = "hello world" causes the string literal
"hello world" to be stored in the initialized read-only area and the character
pointer variable string in the initialized read-write area.
Ex: static int i = 10 will be stored in the data segment, and
a global int i = 10 (defined outside any function) will also be stored in the data segment.
3. Uninitialized Data Segment:
The uninitialized data segment is often called the “bss” segment, named after an
ancient assembler operator that stood for “block started by symbol.” Data in this
segment is initialized by the kernel to arithmetic 0 before the program starts
executing.
Uninitialized data starts at the end of the data segment
and contains all global variables and static variables that are initialized to
zero or do not have explicit initialization in source code.
For instance, a variable declared static int i; would be
contained in the BSS segment.
Likewise, a global variable declared int j; would be contained in the BSS
segment.
4. Stack:
The stack area traditionally adjoined the heap area and grew in the opposite
direction; when the stack pointer met the heap pointer, free memory was
exhausted. (With modern large address spaces and virtual memory techniques they
may be placed almost anywhere, but they still typically grow in opposite
directions.)
The stack area contains the program stack, a LIFO
structure, typically located in the higher parts of memory. On the standard PC
x86 computer architecture it grows toward address zero; on some other
architectures it grows in the opposite direction. A “stack pointer” register
tracks the top of the stack; it is adjusted each time a value is “pushed” onto
the stack. The set of values pushed for one function call is termed a “stack
frame”; a stack frame consists at minimum of a return address.
The stack is where automatic variables are stored, along with information
that is saved each time a function is called. Each time a function is called,
the address of where to return to and certain information about the caller’s
environment, such as some of the machine registers, are saved on the stack. The
newly called function then allocates room on the stack for its automatic and
temporary variables. This is how recursive functions in C can work. Each time a
recursive function calls itself, a new stack frame is used, so one set of
variables doesn’t interfere with the variables from another instance of the
function.
5. Heap:
The heap is the segment where dynamic memory allocation usually takes place.
The heap area begins at the end of the BSS segment and
grows to larger addresses from there. The heap area is managed by malloc,
realloc, and free, which may use the brk and sbrk system calls to adjust its
size (note that the use of brk/sbrk and a single “heap area” is not required to
fulfill the contract of malloc/realloc/free; they may also be implemented using
mmap to reserve potentially non-contiguous regions of virtual memory in the
process’ virtual address space). The heap area is shared by all shared
libraries and dynamically loaded modules in a process.
Examples.
The size(1) command reports the sizes (in bytes) of the text,
data, and bss segments. (For more details, please refer to the man page of size(1).)
1. Check the following simple C program
#include <stdio.h>
int main(void)
{
    return 0;
}
[narendra@CentOS]$ gcc memory-layout.c -o memory-layout
[narendra@CentOS]$ size memory-layout
   text    data     bss     dec     hex filename
    960     248       8    1216     4c0 memory-layout
2. Let us add one global variable to the program, then check
the size of bss.
#include <stdio.h>
int global; /* Uninitialized variable stored in bss */
int main(void)
{
    return 0;
}
[narendra@CentOS]$ gcc memory-layout.c -o memory-layout
[narendra@CentOS]$ size memory-layout
   text    data     bss     dec     hex filename
    960     248      12    1220     4c4 memory-layout
3. Let us add one static variable which is also stored in
bss.
#include <stdio.h>
int global; /* Uninitialized variable stored in bss */
int main(void)
{
    static int i; /* Uninitialized static variable stored in bss */
    return 0;
}
[narendra@CentOS]$ gcc memory-layout.c -o memory-layout
[narendra@CentOS]$ size memory-layout
   text    data     bss     dec     hex filename
    960     248      16    1224     4c8 memory-layout
4. Let us initialize the static variable which will then
be stored in Data Segment (DS)
#include <stdio.h>
int global; /* Uninitialized variable stored in bss */
int main(void)
{
    static int i = 100; /* Initialized static variable stored in DS */
    return 0;
}
[narendra@CentOS]$ gcc memory-layout.c -o memory-layout
[narendra@CentOS]$ size memory-layout
   text    data     bss     dec     hex filename
    960     252      12    1224     4c8 memory-layout
5. Let us initialize the global variable which will then
be stored in Data Segment (DS)
#include <stdio.h>
int global = 10; /* Initialized global variable stored in DS */
int main(void)
{
    static int i = 100; /* Initialized static variable stored in DS */
    return 0;
}
[narendra@CentOS]$ gcc memory-layout.c -o memory-layout
[narendra@CentOS]$ size memory-layout
   text    data     bss     dec     hex filename
    960     256       8    1224     4c8 memory-layout
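1. Where are global, local, static, and extern variables stored?
- Local variables are stored on the stack. Register variables may be kept in
a CPU register, although the register keyword is only a hint to the compiler.
Global and static variables are stored in the data segment when they are
initialized and in the BSS segment when they are not.
Memory created dynamically is stored on the heap, and the C program
instructions are stored in the code segment. An extern declaration allocates
no storage itself; the variable lives in the data or BSS segment of the file that defines it.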
2. What does the BSS segment store?
- The BSS segment holds the uninitialized global and static variables, which
are initialized to zero. I read that the BSS segment doesn’t consume memory,
so where does it store these variables? You probably read that the BSS
segment doesn’t consume space in the executable file on disk. When
the executable is loaded, the BSS segment certainly does consume space
in memory. Space is allocated and initialized to zero by the OS loader.
3. Global variable and local variable
- Once declared, global variables can be used anywhere in the
program, i.e. in many different functions. You can also declare global
variables in user-defined header files (using extern) and share them across
source files, somewhat like packages in Java. Global variable values can be changed
programmatically from any function. Local variables are local to a function and can’t be
used beyond that function.
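4. Static variable and global variable?
- Static variables are initialized once and keep their storage, and their last value, for the entire run of the program. Their values can still be changed programmatically; what static restricts is visibility: a static local is visible only inside its function, and a static global only inside its source file. Global variables: see the description above.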
ASSEMBLER, LINKER AND LOADER:
Normally, the C program building process involves four
stages and utilizes different ‘tools’ such as a preprocessor, compiler,
assembler, and linker.
- At the end there should be a single executable file. Below are
the stages, which happen in this order regardless of the operating
system/compiler and are graphically illustrated in Figure w.1.
1. Preprocessing is the first pass
of any C compilation. It processes include files, conditional compilation
instructions and macros.
2. Compilation is the second
pass. It takes the output of the preprocessor (the preprocessed source code) and
generates assembler source code.
3. Assembly is the third
stage of compilation. It takes the assembly source code and translates it into
machine code; it can also produce an
assembly listing with offsets. The assembler output is stored in an object
file.
4. Linking is the final
stage of compilation. It takes one or more object files or libraries as input
and combines them to produce a single (usually executable) file. In doing so,
it resolves references to external symbols, assigns final addresses to
procedures/functions and variables, and revises code and data to reflect new
addresses (a process called relocation).
- Bear in mind that if you use IDE-type compilers, these processes are
quite transparent.
- Now we are going to examine in more detail the processes that happen
before and after the linking stage. For any given input file, the
file name suffix (file extension) determines what kind of compilation is
done; examples for GCC are listed in Table w.1.
- In UNIX/Linux, the executable or binary file doesn’t need an extension,
whereas in Windows the executables may have extensions such as .exe, .com
and .dll.
File extension | Description
file_name.c | C source code which must be preprocessed.
file_name.i | C source code which should not be preprocessed.
file_name.ii | C++ source code which should not be preprocessed.
file_name.h | C header file (not to be compiled or linked).
file_name.cc, file_name.cp, file_name.cxx, file_name.C | C++ source code which must be preprocessed. For file_name.cxx, the xx must both be literally character x, and in file_name.C the C is a capital C.
file_name.s | Assembler code.
file_name.S | Assembler code which must be preprocessed.
file_name.o | Object file. By default, the object file name for a source file is made by replacing the extension .c, .i, .s etc. with .o
Table w.1
The following Figure shows the steps involved in the process of
building the C program starting from the compilation until the loading of
the executable image into the memory for program running.
Figure w.1: Compile, link & execute stages for running program
W.2 OBJECT FILES and EXECUTABLE
- After the source code has been assembled, it produces an object
file (e.g. .o, .obj), which is then linked to produce an
executable file.
- Object and executable files come in several formats such as ELF
(Executable and Linking Format) and COFF (Common Object-File
Format). For example, ELF is used on Linux systems, while COFF (in its PE variant) is
used on Windows systems.
- Other object file formats are listed in the following table.
Object File Format | Description
a.out | The a.out format is the original file format for Unix. It consists of three sections: text, data, and bss, which are for program code, initialized data, and uninitialized data, respectively. This format is so simple that it doesn’t have any reserved place for debugging information. The only debugging format for a.out is stabs, which is encoded as a set of normal symbols with distinctive attributes.
COFF | The COFF (Common Object File Format) format was introduced with System V Release 3 (SVR3) Unix. COFF files may have multiple sections, each prefixed by a header. The number of sections is limited. The COFF specification includes support for debugging, but the debugging information was limited. There is no file extension for this format.
ECOFF | A variant of COFF. ECOFF is an Extended COFF originally introduced for Mips and Alpha workstations.
XCOFF | The IBM RS/6000 running AIX uses an object file format called XCOFF (eXtended COFF). The COFF sections, symbols, and line numbers are used, but debugging symbols are dbx-style stabs whose strings are located in the .debug section (rather than the string table). The default name for an XCOFF executable file is a.out.
PE | Windows 9x and NT use the PE (Portable Executable) format for their executables. PE is basically COFF with additional headers. The extension is normally .exe.
ELF | The ELF (Executable and Linking Format) format came with System V Release 4 (SVR4) Unix. ELF is similar to COFF in being organized into a number of sections, but it removes many of COFF’s limitations. ELF is used on most modern Unix systems, including GNU/Linux, Solaris and Irix. It is also used on many embedded systems.
SOM/ESOM | SOM (System Object Module) and ESOM (Extended SOM) are HP’s object file and debug formats (not to be confused with IBM’s SOM, which is a cross-language Application Binary Interface, or ABI).
Table w.2
- When we examine the content of these object files there are areas
called sections. Sections can hold executable code, data, dynamic
linking information, debugging data, symbol tables, relocation
information, comments, string tables, and notes.
- Some sections are loaded into the process image and some provide
information needed in the building of a process image while still others
are used only in linking object files.
- There are several sections that are common to all executable formats
(may be named differently, depending on the compiler/linker) as listed
below:
Section | Description
.text | This section contains the executable instruction codes and is shared among every process running the same binary. This section usually has READ and EXECUTE permissions only. This section is the one most affected by optimization.
.bss | BSS stands for ‘Block Started by Symbol’. It holds uninitialized global and static variables. Since the BSS only holds variables that don’t have any values yet, it doesn’t actually need to store the image of these variables. The size that the BSS will require at runtime is recorded in the object file, but the BSS (unlike the data section) doesn’t take up any actual space in the object file.
.data | Contains the initialized global and static variables and their values. It is usually the largest part of the executable. It usually has READ/WRITE permissions.
.rdata | Also known as the .rodata (read-only data) section. This contains constants and string literals.
.reloc | Stores the information required for relocating the image while loading.
Symbol table | A symbol is basically a name and an address. The symbol table holds information needed to locate and relocate a program’s symbolic definitions and references. It contains an array of symbol entries, and a symbol table index is a subscript into this array. Index 0 both designates the first entry in the table and serves as the undefined symbol index.
Relocation records | Relocation is the process of connecting symbolic references with symbolic definitions. For example, when a program calls a function, the associated call instruction must transfer control to the proper destination address at execution. Relocatable files must have relocation entries, which are necessary because they contain information that describes how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process’s program image. Simply said, relocation records are information used by the linker to adjust section contents.
Table w.3: Segments in executable file
- The following is an example of dumping the content of an object file using
the readelf program. Another utility that can be used is objdump.
/* testprog1.c */
#include <stdio.h>
static void display(int i, int *ptr);
int main(void)
{
    int x = 5;
    int *xptr = &x;
    printf("In main() program:\n");
    /* %p expects a void *, so the pointers are cast explicitly */
    printf("x value is %d and is stored at address %p.\n", x, (void *)&x);
    printf("xptr pointer points to address %p which holds a value of %d.\n", (void *)xptr, *xptr);
    display(x, xptr);
    return 0;
}
/* static, matching the prototype above, so the symbol stays local to this file */
static void display(int y, int *yptr)
{
    char var[7] = "ABCDEF";
    printf("In display() function:\n");
    printf("y value is %d and is stored at address %p.\n", y, (void *)&y);
    printf("yptr pointer points to address %p which holds a value of %d.\n", (void *)yptr, *yptr);
}
[bodo@bakawali test]$ gcc -c testprog1.c
[bodo@bakawali test]$ readelf -a testprog1.o
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          672 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           40 (bytes)
  Number of section headers:         11
  Section header string table index: 8
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00000000 000034 0000de 00  AX  0   0  4
  [ 2] .rel.text         REL             00000000 00052c 000068 08      9   1  4
  [ 3] .data             PROGBITS        00000000 000114 000000 00  WA  0   0  4
  [ 4] .bss              NOBITS          00000000 000114 000000 00  WA  0   0  4
  [ 5] .rodata           PROGBITS        00000000 000114 00010a 00   A  0   0  4
  [ 6] .note.GNU-stack   PROGBITS        00000000 00021e 000000 00      0   0  1
  [ 7] .comment          PROGBITS        00000000 00021e 000031 00      0   0  1
  [ 8] .shstrtab         STRTAB          00000000 00024f 000051 00      0   0  1
  [ 9] .symtab           SYMTAB          00000000 000458 0000b0 10     10   9  4
  [10] .strtab           STRTAB          00000000 000508 000021 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
There are no program headers in this file.
Relocation section '.rel.text' at offset 0x52c contains 13 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0000002d  00000501 R_386_32          00000000   .rodata
00000032  00000a02 R_386_PC32        00000000   printf
00000044  00000501 R_386_32          00000000   .rodata
00000049  00000a02 R_386_PC32        00000000   printf
0000005c  00000501 R_386_32          00000000   .rodata
00000061  00000a02 R_386_PC32        00000000   printf
0000008c  00000501 R_386_32          00000000   .rodata
0000009c  00000501 R_386_32          00000000   .rodata
000000a1  00000a02 R_386_PC32        00000000   printf
000000b3  00000501 R_386_32          00000000   .rodata
000000b8  00000a02 R_386_PC32        00000000   printf
000000cb  00000501 R_386_32          00000000   .rodata
000000d0  00000a02 R_386_PC32        00000000   printf
There are no unwind sections in this file.
Symbol table '.symtab' contains 11 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FILE    LOCAL  DEFAULT  ABS testprog1.c
     2: 00000000     0 SECTION LOCAL  DEFAULT    1
     3: 00000000     0 SECTION LOCAL  DEFAULT    3
     4: 00000000     0 SECTION LOCAL  DEFAULT    4
     5: 00000000     0 SECTION LOCAL  DEFAULT    5
     6: 00000080    94 FUNC    LOCAL  DEFAULT    1 display
     7: 00000000     0 SECTION LOCAL  DEFAULT    6
     8: 00000000     0 SECTION LOCAL  DEFAULT    7
     9: 00000000   128 FUNC    GLOBAL DEFAULT    1 main
    10: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND printf