Module 5 File System: Details regarding how files are handled by the operating system (Chapter 1)
Module 5 Chapter 1 File System
File Concept: File Attributes. Different operating systems keep track of different file attributes, including: Name; Identifier (e.g. the inode number); Type (text, executable, other binary, etc.); Location on the disk; Size; Protection; Time & date; User ID.
File Concept: File Operations. The file ADT supports many common operations: creating a file, writing a file, reading a file, repositioning within a file, deleting a file, and truncating a file. Most OSes require that files be opened before access and closed after all access is complete. Information about currently open files is stored in an open-file table, containing for example:
File pointer - records the current position in the file, for the next read or write access.
File-open count - how many times the file is currently open (simultaneously by different processes) and not yet closed; when this counter reaches zero the file's entry can be removed from the table.
Disk location of the file.
Access rights.
Some systems provide support for file locking. A shared lock is for reading only; an exclusive lock is for writing as well as reading. An advisory lock is informational only and not enforced (a "Keep Out" sign, which may be ignored), whereas a mandatory lock is enforced (a truly locked door). UNIX uses advisory locks, and Windows uses mandatory locks.
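To make the open-file-table idea concrete, here is a minimal Python sketch of a single table entry; the class, field, and method names are invented for illustration and are not taken from any real OS.

```python
from dataclasses import dataclass

@dataclass
class OpenFileEntry:
    """Hypothetical entry in a system-wide open-file table."""
    disk_location: int        # e.g. starting block or inode number
    access_rights: str = "r"  # rights granted when the file was opened
    file_pointer: int = 0     # current offset for the next read/write
    open_count: int = 1       # processes that currently have the file open

    def close(self) -> bool:
        # Decrement the open count; at zero the entry can be removed.
        self.open_count -= 1
        return self.open_count == 0

entry = OpenFileEntry(disk_location=234)
print(entry.close())   # True -> last close, entry can leave the table
```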
File Concept
File Concept: File Structure. Some files contain an internal structure, which may or may not be known to the OS. Having the OS support particular file formats increases its size and complexity. UNIX treats all files as sequences of bytes, with no further consideration of the internal structure (with the exception of executable binary programs, which it must know how to load, how to find the first executable statement, etc.). Macintosh files have two forks: a resource fork and a data fork. The resource fork contains information relating to the UI, such as icons and button images, and can be modified independently of the data fork, which contains the code or data as appropriate.
File Concept: Internal File Structure. Disk files are accessed in units of physical blocks, typically 512 bytes or some power-of-two multiple thereof. Larger physical disks use larger block sizes, to keep the range of block numbers within the range of a 32-bit integer. Internally, files are organized in logical units, which may be as small as a single byte or as large as some data record or structure. The number of logical units that fit into one physical block determines the packing, and has an impact on the amount of internal fragmentation (wasted space) that occurs.
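As a quick worked example of packing, the sketch below computes how many logical records fit into a physical block and how much internal fragmentation results; the 512-byte block and 100-byte record sizes are assumed values.

```python
def packing(block_size: int, record_size: int):
    """Records per physical block and wasted bytes (internal fragmentation)."""
    records_per_block = block_size // record_size
    wasted = block_size - records_per_block * record_size
    return records_per_block, wasted

# Assumed sizes: 512-byte physical blocks, 100-byte logical records.
print(packing(512, 100))   # (5, 12): 5 records fit, 12 bytes wasted per block
```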
Access Methods. Access methods determine the way that files are accessed and read into memory. 1. Sequential Access ✦ The most common method, used by editors and compilers. ✦ Information is processed in order. Typical operations: read next; write next; reset; no read after the last write (rewrite).
Access Methods. 2. Direct Access. The file is made up of fixed-length logical records that allow programs to read and write records in no particular order. ■ The file is viewed as a numbered sequence of blocks or records. ■ Very useful in databases. ■ Direct-access operations (n = relative block number): read n; write n; position to n, then read next or write next; rewrite n.
Access Methods
3. Indexed Sequential Access Method (ISAM) – uses indexes arranged in a hierarchy to point to records in a file.
Directory Structures Storage Structure A disk can be used in its entirety for a file system. Alternatively a physical disk can be broken up into multiple partitions, slices, or mini-disks , each of which becomes a virtual disk and can have its own file system. ( or be used for raw storage, swap space, etc. ) Or, multiple physical disks can be combined into one volume , i.e. a larger virtual disk, with its own file system spanning the physical disks.
Directory operations
Directory Design Goals
Types of Directory Structures: 1. Single level 2. Two level 3. Tree structured 4. Acyclic graph 5. General graph
Single level Directory
Two Level Directory
Two Level Directory Advantages ✦ Solves the name-collision problem. ✦ Isolates users from one another, a form of protection. ✦ Efficient searching. Disadvantages ✦ Restricts user cooperation. ✦ No logical grouping capability (other than by user).
Tree Structured Directory
Tree Structured Directory
Tree Structured Directory Advantages: Very general, since a full path name can be given. Very scalable; the probability of name collision is low. Searching is easy, since we can use both absolute and relative paths. Disadvantages: Not every file fits the hierarchical model; a file may need to appear in multiple directories. We cannot share files. It can be inefficient, because accessing a file may require traversing multiple directories.
Acyclic Graph Directory. Acyclic graphs allow directories to have shared subdirectories and files.
Acyclic Graph Directory. It is used in situations such as two programmers working on a joint project who need to access each other's files. The associated files are stored in a subdirectory, separating them from other projects and from other programmers' files; since the programmers are working on a joint project, they want the shared subdirectory to appear in each of their own directories. The common subdirectory should be shared. Note that a shared file is not the same as a copy of the file: if either programmer makes a change in the shared subdirectory, the change is visible through both directories.
Acyclic Graph Directory Advantages: We can share files. Searching is easy because there are multiple paths to a file. Disadvantages: We share files via links, which creates problems on deletion. If the link is a soft link, then after deleting the file we are left with a dangling pointer. In the case of a hard link, to delete a file we have to delete all the references associated with it.
General Graph Directory. When links are added to an existing tree-structured directory, a general graph structure can be created. In a general graph directory structure, cycles are allowed, and a directory can be reached from more than one parent directory.
General Graph Directory. The main problem with this kind of directory structure is calculating the total size or space taken up by the files and directories. If cycles are allowed in the graph, then several problems can arise: a. Search algorithms can go into infinite loops. One solution is not to follow links in search algorithms (or not to follow symbolic links, and to allow symbolic links to refer only to directories). b. Sub-trees can become disconnected from the rest of the tree and still not have their reference counts reduced to zero. Periodic garbage collection is required to detect and resolve this problem.
General Graph Directory Advantages: It allows cycles. It is more flexible than other directory structures. Disadvantages: It is more costly than the others. It needs garbage collection.
File Mounting The basic idea behind mounting file systems is to combine multiple file systems into one large tree structure. The mount command is given a file system to mount and a mount point ( directory ) on which to attach it. Once a file system is mounted onto a mount point, any further references to that directory actually refer to the root of the mounted file system. Any files ( or sub-directories ) that had been stored in the mount point directory prior to mounting the new file system are now hidden by the mounted file system, and are no longer available. For this reason some systems only allow mounting onto empty directories. File systems can only be mounted by root, unless root has previously configured certain file systems to be mountable onto certain pre-determined mount points. ( E.g. root may allow users to mount floppy file systems to /mnt or something like it. ) Anyone can run the mount command to see what file systems are currently mounted. File systems may be mounted read-only, or have other restrictions imposed.
File Sharing: Multiple Users. On a multi-user system, more information needs to be stored for each file: The owner (user) who owns the file and can control its access. The group of other user IDs that may have some special access to the file. What access rights are afforded to the owner (User), the Group, and to the rest (Others). Some systems have more complicated access control, allowing or denying specific accesses to specifically named users or groups.
Protection Files must be kept safe for reliability ( against accidental damage ), and protection ( against deliberate malicious access. ) The former is usually managed with backup copies. This section discusses the latter. One simple protection scheme is to remove all access to a file. However this makes the file unusable, so some sort of controlled access must be arranged.
Access Control One approach is to have complicated Access Control Lists, ACL, which specify exactly what access is allowed or denied for specific users or groups. The AFS uses this system for distributed access. Control is very finely adjustable, but may be complicated, particularly when the specific users involved are unknown. ( AFS allows some wild cards, so for example all users on a certain remote system may be trusted, or a given username may be trusted when accessing from any remote system. )
File-System Structure Hard disks have two important properties that make them suitable for secondary storage of files in file systems: (1) Blocks of data can be rewritten in place, and (2) they are direct access, allowing any block of data to be accessed with only ( relatively ) minor movements of the disk heads and rotational latency. Disks are usually accessed in physical blocks, rather than a byte at a time. Block sizes may range from 512 bytes to 4K or larger.
File system Structure
File-System Structure. File systems organize storage on disk drives, and can be viewed as a layered design: At the lowest layer are the physical devices, consisting of the magnetic media, motors & controls, and the electronics connected to and controlling them. Modern disks put more and more of the electronic controls directly on the disk drive itself, leaving relatively little work for the disk controller card to perform. I/O control consists of device drivers, special software programs (often written in assembly) which communicate with the devices by reading and writing special codes directly to and from memory addresses corresponding to the controller card's registers. The basic file system level works directly with the device drivers in terms of retrieving and storing raw blocks of data, without any consideration for what is in each block. Depending on the system, blocks may be referred to with a single block number (e.g. block #234234) or with head-sector-cylinder combinations.
File-System Structure The file organization module knows about files and their logical blocks, and how they map to physical blocks on the disk. In addition to translating from logical to physical blocks, the file organization module also maintains the list of free blocks, and allocates free blocks to files as needed. The logical file system deals with all of the meta data associated with a file ( UID, GID, mode, dates, etc ), i.e. everything about the file except the data itself. This level manages the directory structure and the mapping of file names to file control blocks, FCBs , which contain all of the meta data as well as block number information for finding the data on the disk.
Directory Implementation. Directories need to be fast to search, insert, and delete, with a minimum of wasted disk space. 1. Linear List. A linear list is the simplest and easiest directory structure to set up, but it does have some drawbacks. Finding a file (or verifying that one does not already exist upon creation) requires a linear search. Deletions can be done by moving all entries, flagging an entry as deleted, or by moving the last entry into the newly vacant position. Sorting the list makes searches faster, at the expense of more complex insertions and deletions. A linked list makes insertions and deletions into a sorted list easier, with some overhead for the links. More complex data structures, such as B-trees, could also be considered.
Directory Implementation. The real disadvantage of a linear list of directory entries is that finding a file requires a linear search. Directory information is used frequently, and users will notice if access to it is slow.
Directory Implementation. 2. Hash Table. A hash table can also be used to speed up searches. Hash tables are generally implemented in addition to a linear or other structure. The hash table takes a value computed from the file name and returns a pointer to the file name in the linear list. Therefore, it can greatly decrease the directory search time. Insertion and deletion are also fairly straightforward, although some provision must be made for collisions: situations in which two file names hash to the same location.
Directory Implementation The major difficulties with a hash table are its generally fixed size and the dependence of the hash function on that size. For example, assume that we make a linear-probing hash table that holds 64 entries. The hash function converts file names into integers from 0 to 63, for instance, by using the remainder of a division by 64. If we later try to create a 65th file, we must enlarge the directory hash table—say, to 128 entries. As a result, we need a new hash function that must map file names to the range 0 to 127, and we must reorganize the existing directory entries to reflect their new hash-function values.
Directory Implementation Alternatively, a chained-overflow hash table can be used. Each hash entry can be a linked list instead of an individual value, and we can resolve collisions by adding the new entry to the linked list. Lookups may be somewhat slowed, because searching for a name might require stepping through a linked list of colliding table entries. Still, this method is likely to be much faster than a linear search through the entire directory
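A minimal Python sketch of the chained-overflow idea above, assuming a fixed table size and Python's built-in hash; a real directory would store full entries (FCBs or inode numbers), and the names used here are purely illustrative.

```python
class HashedDirectory:
    """Toy chained-overflow hash table mapping file names to block numbers."""

    def __init__(self, size: int = 64):
        self.size = size
        self.buckets = [[] for _ in range(size)]   # one chain per slot

    def _slot(self, name: str) -> int:
        return hash(name) % self.size

    def add(self, name: str, entry: int) -> None:
        self.buckets[self._slot(name)].append((name, entry))

    def lookup(self, name: str):
        # Collisions are resolved by walking the (usually short) chain.
        for n, entry in self.buckets[self._slot(name)]:
            if n == name:
                return entry
        return None

d = HashedDirectory()
d.add("report.txt", 234)
print(d.lookup("report.txt"))    # 234
print(d.lookup("missing.txt"))   # None
```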
Contiguous Allocation. In this scheme, each file occupies a contiguous set of blocks on the disk. For example, if a file requires n blocks and is given block b as its starting location, then the blocks assigned to the file are b, b+1, b+2, ..., b+n-1. This means that given the starting block address and the length of the file (in blocks), we can determine the blocks occupied by the file. The directory entry for a file with contiguous allocation contains: the address of the starting block, and the length of the allocated portion.
Contiguous Allocation
Contiguous Allocation Advantages: Both sequential and direct access are supported. For direct access, the address of the kth block of a file that starts at block b can easily be obtained as (b+k). This is extremely fast, since the number of seeks is minimal because of the contiguous allocation of file blocks. Disadvantages: This method suffers from both internal and external fragmentation, which makes it inefficient in terms of space utilization. Increasing the file size is difficult because it depends on the availability of contiguous space at that particular instant.
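A small sketch of the address calculation above: with contiguous allocation, only the starting block and length need to be stored, and block k of the file is simply start + k. The values used are illustrative.

```python
def block_address(start: int, length: int, k: int) -> int:
    """Disk block holding block k (0-based) of a contiguously allocated file."""
    if not 0 <= k < length:
        raise IndexError("block index outside the file")
    return start + k

# A file starting at block 14 and occupying 3 blocks uses blocks 14, 15, 16.
print([block_address(14, 3, k) for k in range(3)])   # [14, 15, 16]
```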
Linked Allocation. In this scheme, each file is a linked list of disk blocks, which need not be contiguous; the disk blocks can be scattered anywhere on the disk. The directory entry contains pointers to the starting and ending file blocks. Each block contains a pointer to the next block occupied by the file. The file 'abc' in the following image shows how the blocks are randomly distributed. The last block (25) contains -1, indicating a null pointer; it does not point to any other block.
Linked Allocation
Linked Allocation Advantages: This is very flexible in terms of file size. File size can be increased easily since the system does not have to look for a contiguous chunk of memory. This method does not suffer from external fragmentation. This makes it relatively better in terms of memory utilization.
Linked Allocation Disadvantages: Because the file blocks are distributed randomly on the disk, a large number of seeks is needed to access every block individually; this makes linked allocation slower. It does not support random or direct access: we cannot directly access an arbitrary block of a file. Block k of a file can be reached only by traversing the first k blocks sequentially (sequential access) from the starting block of the file via block pointers. The pointers required in linked allocation also incur some extra space overhead.
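The sketch below models linked allocation: each block's "next" pointer is followed until the -1 terminator, so reaching block k means following k pointers. The chain of block numbers is an assumed example, not taken from the figure.

```python
# Assumed on-disk "next block" pointers; -1 marks the end of the file.
next_block = {9: 16, 16: 1, 1: 10, 10: 25, 25: -1}

def blocks_of_file(start: int):
    """Walk the chain from the starting block; only sequential access works."""
    blocks, b = [], start
    while b != -1:
        blocks.append(b)
        b = next_block[b]
    return blocks

print(blocks_of_file(9))   # [9, 16, 1, 10, 25]
```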
Indexed Allocation. In this scheme, a special block known as the index block contains the pointers to all the blocks occupied by a file. Each file has its own index block. The ith entry in the index block contains the disk address of the ith file block. The directory entry contains the address of the index block, as shown in the image:
Indexed Allocation
Indexed Allocation Advantages: This supports direct access to the blocks occupied by the file and therefore provides fast access to the file blocks. It overcomes the problem of external fragmentation. Disadvantages: The pointer overhead for indexed allocation is greater than for linked allocation. For very small files, say files that span only 2-3 blocks, indexed allocation would keep one entire block (the index block) just for the pointers, which is inefficient in terms of space utilization; in linked allocation we lose the space of only one pointer per block.
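For contrast with the linked scheme, here is indexed allocation in miniature: one lookup in the index block gives any file block directly. The index-block contents are assumed values.

```python
index_block = [9, 16, 1, 10, 25]   # assumed disk addresses of file blocks 0..4

def file_block_address(i: int) -> int:
    # Direct access: a single lookup, no traversal of earlier blocks.
    return index_block[i]

print(file_block_address(3))   # 10
```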
Indexed Allocation. For very large files, a single index block may not be able to hold all the pointers. The following mechanisms can be used to resolve this: 1. Linked scheme: This scheme links two or more index blocks together to hold the pointers. Every index block then contains a pointer to (the address of) the next index block. 2. Multilevel index: In this policy, a first-level index block points to second-level index blocks, which in turn point to the disk blocks occupied by the file. This can be extended to three or more levels depending on the maximum file size.
Indexed Allocation. 3. Combined Scheme: In this scheme, a special block called the inode (information node) contains all the information about the file, such as its name, size, permissions, etc., and the remaining space of the inode is used to store the addresses of the disk blocks that contain the actual file data. The first few of these pointers in the inode point to direct blocks, i.e. the pointers contain the addresses of disk blocks that hold the file's data. The next few pointers point to indirect blocks, which may be single, double, or triple indirect. A single indirect block does not contain file data but rather the disk addresses of the blocks that do. Similarly, double indirect blocks do not contain file data but the disk addresses of blocks that contain the addresses of the blocks containing the file data.
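As a rough back-of-the-envelope calculation for the combined scheme, the sketch below estimates the maximum file size reachable through direct, single, double, and triple indirect pointers. The parameters (12 direct pointers, 4 KB blocks, 4-byte block addresses) are assumptions, not values from the slides.

```python
def max_file_size(block_size: int = 4096, ptr_size: int = 4, n_direct: int = 12) -> int:
    """Bytes addressable via direct + single/double/triple indirect pointers."""
    ptrs_per_block = block_size // ptr_size        # 1024 with these numbers
    blocks = (n_direct                              # direct blocks
              + ptrs_per_block                      # single indirect
              + ptrs_per_block ** 2                 # double indirect
              + ptrs_per_block ** 3)                # triple indirect
    return blocks * block_size

print(max_file_size() / 2**40)   # roughly 4 TiB under the assumed parameters
```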
Indexed Allocation Combined Scheme
Free-Space Management. Another important aspect of disk management is keeping track of and allocating free space. 1. Bit Vector 2. Linked List 3. Grouping 4. Counting 5. Space Maps
Free Space Management. 1. Bit Vector: One simple approach is to use a bit vector, in which each bit represents a disk block, set to 1 if free or 0 if allocated. Fast algorithms exist for quickly finding contiguous blocks of a given size. The downside is that, for example, a 40 GB disk requires over 5 MB just to store the bitmap.
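A toy bit-vector allocator is sketched below, using the slide's convention (1 = free, 0 = allocated). As an aside, the "40 GB disk needs over 5 MB of bitmap" figure works out if one assumes roughly 1 KB blocks (about 40 M blocks, one bit each).

```python
bitmap = [0, 0, 1, 1, 0, 1, 0, 1]   # toy 8-block disk: 1 = free, 0 = allocated

def first_free(bits):
    for i, bit in enumerate(bits):
        if bit == 1:
            return i
    return None   # no free block

def allocate(bits):
    """Find a free block, mark it allocated, and return its number."""
    i = first_free(bits)
    if i is not None:
        bits[i] = 0
    return i

print(allocate(bitmap))   # 2, and block 2 is now marked allocated
print(bitmap)             # [0, 0, 0, 1, 0, 1, 0, 1]
```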
Free Space Management 2.Linked List A linked list can also be used to keep track of all free blocks. Traversing the list and/or finding a contiguous block of a given size are not easy, but fortunately are not frequently needed operations. Generally the system just adds and removes single blocks from the beginning of the list. The FAT table keeps track of the free list as just one more linked list on the table.
Free Space Management
Free Space Management 3.Grouping A variation on linked list free lists is to use links of blocks of indices of free blocks. If a block holds up to N addresses, then the first block in the linked-list contains up to N-1 addresses of free blocks and a pointer to the next block of free addresses. 4. Counting When there are multiple contiguous blocks of free space then the system can keep track of the starting address of the group and the number of contiguous free blocks. As long as the average length of a contiguous group of free blocks is greater than two this offers a savings in space needed for the free list. 5.Space Maps A space map is a fairly straightforward log of the free / allocated space for a region. Records are appended to the space map as space becomes freed / allocated from the region
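The counting idea in particular is easy to illustrate: the free list holds (starting block, run length) pairs instead of one entry per free block. The free extents below are assumed values.

```python
free_runs = [(2, 4), (10, 3), (25, 8)]   # assumed free extents: (start, count)

def total_free(runs):
    return sum(count for _, count in runs)

def free_blocks(runs):
    """Expand the compact (start, count) pairs into individual block numbers."""
    return [start + i for start, count in runs for i in range(count)]

print(total_free(free_runs))       # 15 free blocks tracked by only 3 entries
print(free_blocks(free_runs)[:6])  # [2, 3, 4, 5, 10, 11]
```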
Efficiency and Performance: Efficiency. UNIX pre-allocates inodes, which occupy space even before any files are created. UNIX also distributes inodes across the disk, and tries to store data files near their inode, to reduce the distance of disk seeks between the inodes and the data. Some systems use variable-size clusters depending on the file size. The more data that is stored in a directory (e.g. last access time), the more often the directory blocks have to be re-written. Kernel table sizes used to be fixed, and could only be changed by rebuilding the kernel. Modern tables are dynamically allocated, but that requires more complicated algorithms for accessing them.
Efficiency and Performance Performance a. Disk controllers generally include on-board caching. b. When a seek is requested, the heads are moved into place, and then an entire track is read, starting from whatever sector is currently under the heads ( reducing latency. ) c. The requested sector is returned and the unrequested portion of the track is cached in the disk's electronics. d. Some OSes cache disk blocks they expect to need again in a buffer cache. e. A page cache connected to the virtual memory system is actually more efficient as memory addresses do not need to be converted to disk block addresses and back again .
Efficiency and Performance
DISK SCHEDULING
Disk scheduling is a technique used by the operating system to schedule multiple requests for accessing the disk. The algorithms used for disk scheduling are called disk scheduling algorithms. The purpose of disk scheduling algorithms is to reduce the total seek time.
Important Terms. Seek Time: Seek time is the time taken to move the disk arm to the specified track where the data is to be read or written. A disk scheduling algorithm that gives a smaller average seek time is better. Rotational Latency: Rotational latency is the time taken for the desired sector of the disk to rotate into position under the read/write head. A disk scheduling algorithm that gives a smaller rotational latency is better. Transfer Time: Transfer time is the time to transfer the data. It depends on the rotational speed of the disk and the number of bytes to be transferred.
Disk Response Time: Response time is the time a request spends waiting to perform its I/O operation. Average response time is the average over all requests. Variance in response time is a measure of how individual requests are serviced relative to the average response time; a disk scheduling algorithm that gives a smaller variance in response time is better.
FCFS DISK SCHEDULING As the name suggests, this algorithm entertains requests in the order they arrive in the disk queue. It is the simplest disk scheduling algorithm.
Advantages: 1. First Come First Serve has very simple logic: it services the requests one by one in the sequence they arrive. 2. Thus, FCFS is very easy to understand and implement. 3. In FCFS every request eventually gets serviced, so no starvation occurs. Disadvantages: This scheduling algorithm is non-preemptive, which means a request can't be stopped in the middle of execution and will run its full course. Because FCFS is non-preemptive, short requests at the back of the queue have to wait for long requests at the front to finish. The throughput of FCFS is not very high. FCFS is used only on small systems where I/O efficiency is not of utmost importance.
Problem 1 Consider a disk queue with requests for I/O to blocks on cylinders 98, 183, 41, 122, 14, 124, 65, 67. The FCFS scheduling algorithm is used. The head is initially at cylinder number 53. The cylinders are numbered from 0 to 199. The total head movement (in number of cylinders) incurred while servicing these requests is _______.
Problem 2: Input: Request sequence = {176, 79, 34, 60, 92, 11, 41, 114} Initial head position = 50 Apply FCFS Disk Scheduling algorithm. Write the seek sequence and calculate the total number of seek operations
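A minimal Python sketch of FCFS disk scheduling, which simply services requests in arrival order and totals the head movement; it can be applied to Problems 1 and 2 above.

```python
def fcfs(requests, head):
    """Return the seek sequence and total head movement under FCFS."""
    order, total = [], 0
    for r in requests:          # service strictly in arrival order
        total += abs(r - head)
        head = r
        order.append(r)
    return order, total

# Problem 1 data: head initially at cylinder 53.
print(fcfs([98, 183, 41, 122, 14, 124, 65, 67], 53))
```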
SSTF Disk Scheduling Algorithm. SSTF stands for Shortest Seek Time First. This algorithm services next the request that requires the smallest head movement from the current position, regardless of direction. Ties are broken in the direction of current head movement.
Advantages: The total seek time is reduced compared to First Come First Serve. SSTF improves throughput. Average waiting time and response time are lower with SSTF. Disadvantages: SSTF has the overhead of finding the closest request. Starvation may occur for requests far from the head. SSTF shows high variance in response time and waiting time. Frequent switching of the head's direction slows the algorithm.
Problem-01: Consider a disk queue with requests for I/O to blocks on cylinders 98, 183, 41, 122, 14, 124, 65, 67. The SSTF scheduling algorithm is used. The head is initially at cylinder number 53 moving towards larger cylinder numbers on its servicing pass. The cylinders are numbered from 0 to 199. The total head movement (in number of cylinders) incurred while servicing these requests is _______.
Problem 02: Request sequence = {176, 79, 34, 60, 92, 11, 41, 114} Initial head position = 50 Apply SSTF policy to calculate the total seek count.
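A corresponding SSTF sketch: at each step the pending request nearest the current head position is serviced next. For simplicity this version breaks ties by request order rather than by head direction, so it is a sketch rather than a full implementation.

```python
def sstf(requests, head):
    """Return the seek sequence and total head movement under SSTF."""
    pending, order, total = list(requests), [], 0
    while pending:
        nearest = min(pending, key=lambda r: abs(r - head))  # closest request
        pending.remove(nearest)
        total += abs(nearest - head)
        head = nearest
        order.append(nearest)
    return order, total

# Problem 01 data: head initially at cylinder 53.
print(sstf([98, 183, 41, 122, 14, 124, 65, 67], 53))
```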
SCAN Disk Scheduling Algorithm. As the name suggests, this algorithm scans all the cylinders of the disk back and forth. The head starts from one end of the disk and moves towards the other end, servicing all the requests in between. After reaching the other end, the head reverses its direction and moves back towards the starting end, again servicing all the requests in between. The same process repeats. NOTE: the SCAN algorithm is also called the Elevator Algorithm, because its behaviour resembles that of an elevator.
Advantages: The SCAN scheduling algorithm is simple and easy to understand and implement. Starvation is avoided in the SCAN algorithm. Low variance occurs in waiting time and response time. Disadvantages: Long waiting times occur for cylinders that the head has just visited. In SCAN the head moves to the end of the disk even when there are no requests to be serviced there.
Problem 1 Consider a disk queue with requests for I/O to blocks on cylinders 98, 183, 41, 122, 14, 124, 65, 67. The SCAN scheduling algorithm is used. The head is initially at cylinder number 53 moving towards larger cylinder numbers on its servicing pass. The cylinders are numbered from 0 to 199. The total head movement (in number of cylinders) incurred while servicing these requests is _______.
Problem 2: Input: Request sequence = {176, 79, 34, 60, 92, 11, 41, 114} Initial head position = 50 Direction = left Apply SCAN algorithm to find the seek sequence and total seek count
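A SCAN (elevator) sketch that sweeps in one direction, runs to the physical end of the disk, and then reverses. The 200-cylinder disk size matches the problems above; the initial direction is a parameter.

```python
def scan(requests, head, disk_size=200, direction="left"):
    """Return the seek sequence and total head movement under SCAN."""
    left = sorted(r for r in requests if r < head)
    right = sorted(r for r in requests if r >= head)
    order, total = [], 0

    def visit(cylinders):
        nonlocal head, total
        for c in cylinders:
            total += abs(c - head)
            head = c
            order.append(c)

    if direction == "left":
        visit(reversed(left))        # service requests while moving left
        visit([0])                   # SCAN always runs to the end of the disk
        visit(right)                 # then sweeps back toward the other end
    else:
        visit(right)
        visit([disk_size - 1])
        visit(reversed(left))
    return order, total

# Problem 2 data: head at 50, initially moving left.
print(scan([176, 79, 34, 60, 92, 11, 41, 114], 50, direction="left"))
```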
Circular SCAN (C-SCAN). The Circular SCAN (C-SCAN) scheduling algorithm is a modified version of the SCAN disk scheduling algorithm that addresses an inefficiency of SCAN by servicing requests more uniformly. Like SCAN (the Elevator Algorithm), C-SCAN moves the head from one end to the other, servicing all the requests along the way. However, as soon as the head reaches the other end, it immediately returns to the beginning of the disk without servicing any requests on the return trip, and starts servicing again once it reaches the beginning. C-SCAN is also known as the Circular Elevator Algorithm, as it essentially treats the cylinders as a circular list that wraps around from the final cylinder to the first one.
Problem 1 Apply C-Scan scheduling algorithm .
C-SCAN Advantages: C-SCAN is the successor and improved version of the SCAN scheduling algorithm. The head moves from one end of the disk to the other, serving all the requests in between. The waiting time for cylinders just visited by the head is reduced in C-SCAN compared to SCAN. Uniform waiting time is provided. Better response time is provided. Disadvantages: More seek movement is caused in C-SCAN compared to SCAN. In C-SCAN, even if there are no requests left to be serviced, the head still travels to the end of the disk (as in SCAN; the LOOK variants avoid this).
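A C-SCAN sketch assuming the head first moves toward larger cylinder numbers: it sweeps to the outer end, jumps back to cylinder 0 without servicing anything, and continues in the same direction. Whether the return jump is counted toward head movement varies by convention; this sketch counts it.

```python
def c_scan(requests, head, disk_size=200):
    """Return the seek sequence and total head movement under C-SCAN."""
    right = sorted(r for r in requests if r >= head)
    left = sorted(r for r in requests if r < head)
    order, total, pos = [], 0, head

    for c in right + [disk_size - 1]:   # sweep up to the outer end of the disk
        total += abs(c - pos)
        pos = c
        order.append(c)
    total += disk_size - 1              # jump back to cylinder 0 (counted here)
    pos = 0
    for c in left:                      # continue servicing in the same direction
        total += abs(c - pos)
        pos = c
        order.append(c)
    return order, total

print(c_scan([98, 183, 41, 122, 14, 124, 65, 67], 53))
```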
LOOK Algorithm LOOK Algorithm is an improved version of the SCAN Algorithm. Head starts from the first request at one end of the disk and moves towards the last request at the other end servicing all the requests in between. After reaching the last request at the other end, head reverses its direction. It then returns to the first request at the starting end servicing all the requests in between. The same process repeats.
The main difference between SCAN Algorithm and LOOK Algorithm is- SCAN Algorithm scans all the cylinders of the disk starting from one end to the other end even if there are no requests at the ends. LOOK Algorithm scans all the cylinders of the disk starting from the first request at one end to the last request at the other end.
Problem 2
Advantages: If there are no requests left to be serviced, the head does not move to the end of the disk, unlike in the SCAN algorithm. Better performance is provided compared to the SCAN algorithm. Starvation is avoided in the LOOK scheduling algorithm. Low variance is provided in waiting time and response time. Disadvantages: There is the overhead of finding the last request in each direction. Cylinders that the head has just visited have to wait a long time.
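Finally, a LOOK sketch: identical to SCAN except that each sweep stops at the last pending request instead of the physical end of the disk. The initial direction is assumed to be toward larger cylinder numbers.

```python
def look(requests, head, direction="right"):
    """Return the seek sequence and total head movement under LOOK."""
    left = sorted(r for r in requests if r < head)
    right = sorted(r for r in requests if r >= head)

    if direction == "right":
        sweep = right + list(reversed(left))   # out to the last request, then back
    else:
        sweep = list(reversed(left)) + right

    order, total, pos = [], 0, head
    for c in sweep:
        total += abs(c - pos)
        pos = c
        order.append(c)
    return order, total

print(look([98, 183, 41, 122, 14, 124, 65, 67], 53, direction="right"))
```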
C-LOOK Algorithm Circular-LOOK Algorithm is an improved version of the LOOK Algorithm . Head starts from the first request at one end of the disk and moves towards the last request at the other end servicing all the requests in between. After reaching the last request at the other end, head reverses its direction. It then returns to the first request at the starting end without servicing any request in between. The same process repeats.
Problem 2:
C-LOOK : Advantages – In C-LOOK the head does not have to move till the end of the disk if there are no requests to be serviced. There is less waiting time for the cylinders which are just visited by the head in C-LOOK. C-LOOK provides better performance when compared to LOOK Algorithm. Starvation is avoided in C-LOOK. Low variance is provided in waiting time and response time. Disadvantages – In C-LOOK an overhead of finding the end requests is present.