CSC 334-Systems Programming -2020-2021 Session-Lecture-3- Code, Data and Stack Segments.pptx
Code, Data and Stack Segments
CSC 334 Systems Programming, 2020/2021 Session, Lecture 3
Code, Data and Stack Segments
The memory layout of a C program is divided into several segments that store its data and its code instructions.
Code, Data and Stack Segments …
A compiler driver invokes the language preprocessor, compiler, assembler, and linker, as needed, on behalf of the user.
- A preprocessor (or precompiler) is a program that processes its input to produce output that is used as input to another program. In some computer languages (e.g., C and PL/I) there is a phase of translation known as preprocessing.
- A compiler is a program that translates source code into machine code or a lower-level form so that it can be read and executed by a computer.
- An assembler is a program that takes basic computer instructions and converts them into a pattern of bits that the computer's processor can use to perform its basic operations.
- A linker (or link editor) is a system program that takes one or more object files (generated by a compiler or an assembler) and combines them into a single executable file, library file, or another object file.
Depending on the options supplied to the compiler driver, three types of object files can be generated. Technically, an object file is a sequence of bytes stored on disk in a file.
Code, Data and Stack Segments … Object Files
Relocatable object file: the output of the compiler or assembler for a single source file; static libraries are archives of such files.
Relocatable object files contain binary code and data in a form that can be combined with other relocatable object files when the program is built (at link time) to create an executable object file.
Static linkers, such as the Unix ld program, take a collection of relocatable object files and command-line arguments as input and generate a fully linked executable object file as output that can be loaded into memory and run.
Code & Data Segments: Object Files …
Executable object file: an executable file. It contains binary code and data in a form that can be copied directly into memory and executed.
Shared object file: a special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run time.
Object files have a specific format; however, this format may vary from system to system.
Code and Data Segments: Object Files
Some of the most prevalent object file formats are:
- COFF (Common Object File Format)
- PE (Portable Executable)
- ELF (Executable and Linkable Format)
The actual layout of a program's in-memory image is left up to the operating system, and often to the program itself as well.
For a running program, both the machine instructions (program code) and the data are stored in the same memory space. The memory is logically divided into text (code) and data segments.
Modern systems use a single text segment to store program instructions, but more than one segment for data, depending on the storage class of the data being stored there.
Code and Data Segments
These segments can be described as follows:
1. Text or Code Segment
2. Initialized Data Segment
3. Uninitialized Data Segment
4. Stack Segment
5. Heap Segment
Text or Code Segment
The code segment, also known as the text segment, contains the machine code of the compiled program.
The text segment of an executable object file is often read-only, which prevents the program's instructions from being accidentally modified.
Code and Data Segments … Data Segments
The data segment stores program data. This data can be in the form of initialized or uninitialized variables, and it can be local or global.
The data segment is further divided into four sub-segments:
- the initialized data segment
- the uninitialized data (.bss) segment
- the stack
- the heap
Which sub-segment holds a variable depends on whether the variable is local or global, and whether it is initialized or uninitialized.
Code & Data Segments … Initialized Data or Data Segment
The initialized data segment, or simply the data segment, stores all global, static, constant, and external variables (declared with the extern keyword) that are given an initial value in the source code.
Uninitialized Data or .bss Segment
In contrast to the initialized data segment, the uninitialized data or .bss segment stores all uninitialized global, static, and external variables (declared with the extern keyword).
Global, external, and static variables are initialized to zero by default.
This section occupies no actual space in the object file; it is merely a placeholder. Object file formats distinguish between initialized and uninitialized variables for space efficiency: uninitialized variables do not have to occupy any actual disk space in the object file.
The use of the term .bss to denote uninitialized data is widespread. It originated as the name of an assembler instruction for the IBM 704 (circa 1957), given here as "Block Storage Start" and more commonly expanded as "Block Started by Symbol", and the name has stuck.
A simple way to remember the difference between the .data and .bss sections is to think of "bss" as an abbreviation for "Better Save Space".
Code & Data Segments … Stack Segment
The stack segment contains the system stack, which is used as temporary storage.
The stack segment stores all local variables. It is also used to pass arguments to functions, together with the return address of the instruction to be executed after the function call is over.
The stack is a simple data structure with a LIFO (last-in, first-out) access policy: data is added and removed at the same end.
Implementing a stack requires only a block of memory (e.g., an array in a high-level language) and a stack pointer that tells us where the top of the stack is.
Local variables are scoped to the block in which they are defined; they are created when control enters the block. Local variables do not appear in the data or bss segment.
Each recursive function call also adds a new frame to the stack.
Code and Data Segments …
When a new stack frame needs to be added (as a result of a newly called function), the stack grows downward.
In the MIPS architecture, the $sp register is designated as the stack pointer. MIPS (Microprocessor without Interlocked Pipelined Stages) is a Reduced Instruction Set Computer (RISC) Instruction Set Architecture (ISA) developed by MIPS Computer Systems, now MIPS Technologies, based in the United States.
Adding an element to the top of the stack is known as a push, and retrieving an item from the top is known as a pop.
If we were to push the values 1, 2, 3, 4, and 5, and then do a single pop, the stack would appear as follows:
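Code and Data Segments …
[Figure: the stack after pushing 1, 2, 3, 4, 5 and then doing a single pop. The value 5 has been removed, leaving 4 at the top, above 3, 2, and 1.]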
Code and Data Segments … Heap Segment
The heap segment is the part of memory where dynamically allocated variables are stored.
In C, dynamic memory allocation is done with the malloc() and calloc() functions.
When more memory needs to be allocated using malloc() or calloc(), the heap grows upward.
The stack and heap are traditionally located at opposite ends of the process's virtual address space.
Code and Data Segments … Size of Code, Data, and .bss Segments
The sizes of the text and data segments are known as soon as compilation or assembly is completed.
The stack and heap segments, on the other hand, grow and shrink during program execution. For this reason, they tend to be configured so that they grow toward each other.
This way, each has a fixed starting point (one adjacent to the text and data segments, and the other at one end of the entire memory space), and the boundary between them is flexible. Both can grow until all available memory is used.
Assembly Language Programming - I
Each personal computer has a microprocessor that manages the computer's arithmetical, logical, and control activities.
Each family of processors has its own set of instructions for handling operations such as getting input from the keyboard, displaying information on the screen, and performing various other jobs. This set of instructions is called the machine language instructions.
A processor understands only machine language instructions, which are strings of 1s and 0s. However, machine language is too obscure and complex for use in software development.
Low-level assembly language is therefore designed for a specific family of processors; it represents the various instructions in symbolic code and in a more understandable form.
Assembly Language … Advantages of Assembly Language
An understanding of assembly language makes one aware of:
- how programs interface with the OS, processor, and BIOS
- how data is represented in memory and other external devices
- how the processor accesses and executes instructions
- how instructions access and process data
- how a program accesses external devices
Other advantages of using assembly language are:
- It requires less memory and execution time.
- It allows hardware-specific complex jobs to be done more easily.
- It is suitable for time-critical jobs.
- It is most suitable for writing interrupt service routines and other memory-resident programs.
Assembly Language … Basic Features of PC Hardware
The main internal hardware of a PC consists of the processor, memory, and registers. Registers are processor components that hold data and addresses.
To execute a program, the system copies it from the external device into internal memory. The processor then executes the program instructions.
The fundamental unit of computer storage is the bit; it can be ON (1) or OFF (0). A group of 8 related bits makes a byte on most modern computers.
A parity bit is used to make the number of 1 bits in a byte odd. If the parity is found to be even, the system assumes that there has been a parity error (though rare), which might have been caused by a hardware fault or an electrical disturbance.
Assembly Language …
The processor supports the following data sizes:
- Word: a 2-byte (16-bit) data item
- Doubleword: a 4-byte (32-bit) data item
- Quadword: an 8-byte (64-bit) data item
- Paragraph: a 16-byte (128-bit) area
- Kilobyte: 1024 bytes
- Megabyte: 1,048,576 bytes
Binary Number System
Every number system uses positional notation, i.e., each position in which a digit is written has a different positional value. Each position is a power of the base, which is 2 for the binary number system, and these powers begin at 0 and increase by 1.
The following table shows the positional values for an 8-bit binary number, where all bits are set ON.
Assembly Language … The positional values for an 8-bit binary number
Bit value                             1    1    1    1    1    1    1    1
Position value as a power of base 2   128  64   32   16   8    4    2    1
Bit number                            7    6    5    4    3    2    1    0
Assembly Language … Hexadecimal Number System
The hexadecimal number system uses base 16. The digits in this system range from 0 to 15. By convention, the letters A through F are used to represent the hexadecimal digits corresponding to decimal values 10 through 15.
Hexadecimal numbers are used in computing to abbreviate lengthy binary representations. Essentially, the hexadecimal system represents binary data by dividing each byte in half and expressing the value of each half-byte as one hexadecimal digit.
The following table provides the decimal, binary, and hexadecimal equivalents.
Assembly Language …
Decimal number   Binary representation   Hexadecimal representation
1                1                       1
2                10                      2
3                11                      3
4                100                     4
5                101                     5
6                110                     6
7                111                     7
8                1000                    8
9                1001                    9
10               1010                    A
11               1011                    B
12               1100                    C
13               1101                    D
14               1110                    E
15               1111                    F
Assembly Language …
To convert a binary number to its hexadecimal equivalent, break it into groups of 4 consecutive bits each, starting from the right, and write each group as the corresponding hexadecimal digit.
Example: the binary number 1000 1100 1101 0001 is equivalent to hexadecimal 8CD1.
Addressing Data in Memory
The process through which the processor controls the execution of instructions is referred to as the fetch-decode-execute cycle, or the execution cycle. It consists of three continuous steps:
- Fetching the instruction from memory
- Decoding or identifying the instruction
- Executing the instruction
The processor may access one or more bytes of memory at a time. Consider the hexadecimal number 0725H. This number requires two bytes of memory: the high-order (most significant) byte is 07 and the low-order byte is 25.
Addressing Data in Memory …
The processor stores data in reverse-byte sequence, i.e., the low-order byte is stored at the lower memory address and the high-order byte at the higher memory address.
So, if the processor moves the value 0725H from a register to memory, it transfers 25 first to the lower memory address and 07 to the next memory address.
Addressing Data in Memory …
When the processor reads numeric data from memory into a register, it reverses the bytes again.
There are two kinds of memory addresses:
- Absolute address: a direct reference to a specific location.
- Segment address (or offset): the starting address of a memory segment together with an offset value.
Assembly Language: Environment Setup
Local Environment Setup
Assembly language depends on the instruction set and the architecture of the processor.
There are many good assembler programs, such as:
- Microsoft Assembler (MASM)
- Borland Turbo Assembler (TASM)
- The GNU assembler (GAS)
We will use the NASM assembler, because:
- It is free. You can download it from various web sources.
- It is well documented, and you will find plenty of information about it online.
- It can be used on both Linux and Windows.
Assembly Language Environment Setup … Installing NASM
If you select "Development Tools" while installing Linux, NASM may be installed along with the operating system, and you do not need to download and install it separately.
To check whether NASM is already installed, take the following steps:
- Open a Linux terminal.
- Type whereis nasm and press ENTER.
- If NASM is already installed, a line like nasm: /usr/bin/nasm appears. Otherwise, you will see just nasm: and you need to install NASM.
Assembly Language Environment Setup - Installing NASM …
To install NASM, take the following steps:
- Check the Netwide Assembler (NASM) website for the latest version.
- Download the Linux source archive nasm-X.XX.tar.gz, where X.XX is the NASM version number in the archive.
- Unpack the archive into a directory, which creates a subdirectory nasm-X.XX.
- cd to nasm-X.XX and type ./configure. This shell script will find the best C compiler to use and set up the Makefiles accordingly.
- Type make to build the nasm and ndisasm binaries.
- Type make install to install nasm and ndisasm in /usr/local/bin and to install the man pages.
This should install NASM on your system.
Alternatively, you can use an RPM distribution for Fedora Linux. This version is simpler to install: just double-click the RPM file.