Introduction
A file is a named collection of related information recorded on secondary storage such as magnetic disks, magnetic tapes and optical disks. A file is a sequence of bits, bytes, lines or records whose meaning is defined by the file's creator and user. Data files may be numeric, alphabetic, alphanumeric or binary.
File Structure
A file must follow a format that the operating system can understand, and its structure depends on its type.
A text file is a sequence of characters organized into lines.
A source file is a sequence of procedures and functions.
An object file is a sequence of bytes organized into blocks that are understandable by the machine.
When an operating system defines different file structures, it must also contain the code to support them. UNIX and MS-DOS support only a minimal number of file structures.
File Type
File type refers to the ability of the operating system to distinguish different types of files, such as text files, source files and binary files. Many operating systems support many types of files. Operating systems such as MS-DOS and UNIX have the following types of files:
Ordinary files: These contain user information such as text, databases or executable programs. The user can add to, modify or delete their contents, or remove the entire file.
Directory files: These contain a list of file names and other information related to those files.
Special files: Also known as device files, these represent physical devices such as disks, terminals, printers, networks and tape drives. They are of two types:
Character special files: data is handled character by character, as with terminals or printers.
Block special files: data is handled in blocks, as with disks and tapes.
File Attributes
A file has a name and data. It also stores meta-information such as creation date and time, current size and last-modified date. This information is called the attributes of the file. File attributes used in an OS are:
Name: The only information stored in human-readable form. It is usually followed by an extension that indicates the type of file. E.g. in OS.doc, OS is the file name, .doc is the extension and '.' is the separator.
Identifier: A unique tag number that identifies the file within the file system. It is not human-readable.
Location: A pointer to the device and to the address of the file on that device.
Type: Required for systems that support various types of files. The type is usually indicated by the file extension.
Size: The current size of the file, i.e. the number of bytes its contents occupy on the storage device, e.g. 10 MB.
Protection: Assigns and controls the access rights for reading, writing and executing the file.
Time, date and security: Records the date and time of creation, last modification and last use of the file. This is useful for protection, security and usage monitoring.
Operations on File
Create: find space on disk and make an entry in the directory.
Write: requires positioning within the file.
Read: involves positioning within the file.
Delete: remove the directory entry and reclaim the disk space.
Reposition: move the read/write position.
Create a file
The create operation creates a file by reserving space on the storage device. It involves two steps:
1. Find free space in the file system.
2. Make an entry for the file in its directory.
Creating a file requires giving it a name that is unique within its directory.
Write into a file
Writing to a file requires a system call with two parameters: the first specifies the name of the file and the second specifies the data to be written. Using the file name, the system searches the directory to find the file's location. A write pointer marks the position in the file where the next write takes place. After every write operation, the pointer must be updated for the next write.
Reading a file
Reading a file requires a system call with two parameters: the name of the file and a second parameter specifying where the data that is read should be placed. Using the file name, the system searches the directory for the file, and a read pointer marks the position from which data is read. After every read operation, the read pointer is updated for the next read.
Repositioning within the file
The directory is searched for the file's entry, and the current-position pointer is set to a given value. Repositioning need not involve any actual I/O. This operation is also called a file seek.
Deleting a file
To delete a file, the OS needs the file's location, so it first searches the directory for the file. It then releases the storage space allocated to the file and removes the file's entry from the directory.
Other common operations include appending new information to the end of a file and renaming an existing file. The primitive operations can be combined to perform other operations, such as creating a copy of a file, moving a file from one location to another, or copying a file to an I/O device such as a printer or display.
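The primitive operations above can be sketched with the low-level file calls in Python's os module. This is only an illustration: the file name notes.txt and the helper demo_file_operations are hypothetical, and the real work of finding space and updating the directory happens inside the operating system.

```python
import os
import tempfile

def demo_file_operations(directory):
    """Create, write, reposition, read and delete a file; return what was read."""
    path = os.path.join(directory, "notes.txt")  # hypothetical file name

    # Create: the OS finds space and adds an entry to the directory.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # Write: the file position advances past the bytes written.
        os.write(fd, b"hello, file system")
        # Reposition (seek): move the position to byte 7; no data is transferred.
        os.lseek(fd, 7, os.SEEK_SET)
        # Read: returns up to 4 bytes starting at the current position.
        data = os.read(fd, 4)
    finally:
        os.close(fd)

    # Delete: remove the directory entry so the space can be reclaimed.
    os.remove(path)
    return data, os.path.exists(path)

with tempfile.TemporaryDirectory() as tmp:
    result = demo_file_operations(tmp)
```

Here result is a pair: the four bytes read after the seek, and a flag showing that the file no longer exists after deletion.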
File Types
The operating system recognises and supports various file types. Once it has recognised the type of a file, the OS can perform the appropriate operations on it. The file type can be given as part of the file name, which then consists of two parts: the name of the file and the file extension, separated by a '.' character. From the extension the OS recognises the type of file, e.g. .doc for a document file and .exe for an executable file.
In MS-DOS, a name consists of up to 8 characters, followed by a '.' character and an extension of up to 3 characters.
UNIX uses a magic number stored at the beginning of some files to indicate the type of the file, such as an executable program.
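Both ways of recognising a file type can be sketched in a few lines. The helper names split_name and looks_like_elf are hypothetical. The ELF executable format is used as one example of a magic number; it is not mentioned in the text above and other executable formats use different values.

```python
import os

ELF_MAGIC = b"\x7fELF"  # leading bytes of an ELF executable (one example of a magic number)

def split_name(filename):
    """Split 'OS.doc' into ('OS', 'doc') at the last '.' separator."""
    base, ext = os.path.splitext(filename)
    return base, ext.lstrip(".")

def looks_like_elf(first_bytes):
    """Return True if the leading bytes of a file carry the ELF magic number."""
    return first_bytes.startswith(ELF_MAGIC)
```

A name without an extension, such as README, yields an empty extension, which is why a magic number stored inside the file is the more robust of the two approaches.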
Common file types
File type | Extension | Function
Executable | exe, com, bin or none | ready-to-run machine-language program
Object | obj, o | compiled, machine language, not linked
Source code | c, p, pas, f77, asm, a | source code in various languages
Batch | bat, sh | series of commands to the command interpreter
Text | txt, doc | textual data, documents
Word processor | doc, docx, tex, rtf, etc. | various word-processor formats
Library | lib, h | libraries of routines
Archive | arc, zip, tar | related files grouped into one file, sometimes compressed
Multimedia | mpeg, mp3, mov, avi | binary files containing audio/video information
File Access Methods
The file access mechanism is the manner in which the records of a file may be accessed. There are several ways to access files:
Sequential access
Direct/Random access
Indexed sequential access
Sequential access
In sequential access the records are accessed in sequence, i.e. the information in the file is processed in order, one record after the other. This is the most primitive access method. Example: compilers usually access files in this fashion.
Direct/Random access
Random access file organization allows records to be accessed directly. Each record has its own address in the file, by which it can be read or written directly. The records need not be in any sequence within the file, and they need not be in adjacent locations on the storage medium.
Indexed sequential access
This mechanism is built on top of sequential access. An index is created for each file, containing pointers to its various blocks. The index is searched sequentially, and the pointer found is used to access the file directly.
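A minimal sketch of the idea, assuming each index entry holds the smallest key in a block together with that block's number (the text does not specify the index layout, so this is an assumption). The names find_record, index and blocks are hypothetical.

```python
def find_record(index, blocks, key):
    """Scan the index in order to pick a block, then scan that block for the key."""
    target = None
    for first_key, block_no in index:       # sequential search of the index
        if first_key > key:
            break
        target = block_no
    if target is None:
        return None
    for record_key, value in blocks[target]:  # direct jump to one block, then scan it
        if record_key == key:
            return value
    return None

# Three blocks of two records each, and one index entry per block.
blocks = {
    0: [(10, "a"), (20, "b")],
    1: [(30, "c"), (40, "d")],
    2: [(50, "e"), (60, "f")],
}
index = [(10, 0), (30, 1), (50, 2)]
```

Looking up key 40 touches the three index entries and then only block 1, rather than every record in the file.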
File Allocation Methods
The operating system allocates disk space to files. Operating systems use three main methods to do so:
Contiguous allocation
Linked allocation
Indexed allocation
Contiguous allocation
Each file occupies a contiguous address space on disk.
The assigned disk addresses are in linear order.
Easy to implement.
External fragmentation is a major issue with this allocation technique.
A single contiguous set of blocks is allocated to a file at the time of file creation. This is therefore a pre-allocation strategy, using variable-size portions. The file allocation table needs just a single entry for each file, showing the starting block and the length of the file. This method is best from the point of view of the individual sequential file: multiple blocks can be read in at a time to improve I/O performance for sequential processing. It is also easy to retrieve a single block. For example, if a file starts at block b and the i-th block of the file is wanted, its location on secondary storage is simply b + i - 1.
Disadvantages: External fragmentation will occur, making it difficult to find contiguous runs of blocks of sufficient length. A compaction algorithm will be necessary to free up additional space on disk. Also, with pre-allocation, the size of the file must be declared at the time of creation.
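The block formula and the fragmentation problem can both be shown in a short sketch. The function names and the free-block map are hypothetical, and first fit is just one possible placement policy, chosen here for simplicity.

```python
def contiguous_block(start, length, i):
    """Disk block holding the i-th block (1-based) of a file that starts at `start`."""
    if not 1 <= i <= length:
        raise IndexError("block number outside the file")
    return start + i - 1

def first_fit(free, n):
    """Return the start of the first run of n consecutive free blocks, or None."""
    run_start, run_len = None, 0
    for block, is_free in enumerate(free):
        if is_free:
            if run_len == 0:
                run_start = block
            run_len += 1
            if run_len == n:
                return run_start
        else:
            run_len = 0
    return None

# Four blocks are free in total, but no run is longer than two blocks.
free_map = [True, False, True, True, False, True]
```

With this map a two-block file fits at block 2, but a three-block file cannot be placed even though four blocks are free. That is external fragmentation.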
Linked allocation
Each file carries a list of links to disk blocks.
The directory contains a link (pointer) to the first block of a file.
No external fragmentation.
Works well for sequential-access files.
Inefficient for direct-access files.
Allocation is on an individual block basis. Each block contains a pointer to the next block in the chain. Again, the file allocation table needs just a single entry for each file, showing the starting block and the length of the file. Although pre-allocation is possible, it is more common simply to allocate blocks as needed. Any free block can be added to the chain, and the blocks need not be contiguous. A file can always grow as long as a free disk block is available. There is no external fragmentation because only one block at a time is needed.
Disadvantages: Internal fragmentation can occur, but only in the last disk block of a file. There is an overhead of maintaining a pointer in every disk block. If the pointer in any disk block is lost, the rest of the file becomes unreachable and the file is truncated. Only sequential access to the file is supported efficiently.
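A small sketch shows why direct access is slow with a chain: reaching the i-th block means following i - 1 pointers. The dictionary next_ptr stands in for the pointer stored inside each disk block, and the block numbers are made up for illustration.

```python
def linked_block(next_ptr, start, i):
    """Follow the chain from `start` to reach the i-th block (1-based) of a file."""
    if i < 1:
        raise IndexError("block numbers start at 1")
    block = start
    for _ in range(i - 1):                # one pointer hop per preceding block
        block = next_ptr[block]
        if block is None:
            raise IndexError("file has fewer blocks than requested")
    return block

# A five-block file scattered over the disk: 9 -> 16 -> 1 -> 10 -> 25.
next_ptr = {9: 16, 16: 1, 1: 10, 10: 25, 25: None}
```

Compare this with contiguous allocation, where the same lookup is a single addition.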
Indexed Allocation
Provides solutions to the problems of contiguous and linked allocation.
An index block holds all the pointers to a file's blocks.
Each file has its own index block, which stores the addresses of the disk space occupied by the file.
The directory contains the addresses of the index blocks of the files.
Indexed allocation addresses many of the problems of contiguous and chained allocation. In this case, the file allocation table contains a separate one-level index for each file, and the index has one entry for each portion allocated to the file. Allocation may be on the basis of fixed-size blocks or variable-size portions. Allocation by fixed-size blocks eliminates external fragmentation, whereas allocation by variable-size portions improves locality. This technique supports both sequential and direct access to the file and is thus the most popular form of file allocation.
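As a rough sketch, the directory maps a file name to its index block, and the index block lists the file's data blocks in order. The file name, block numbers and helper name below are hypothetical.

```python
def indexed_block(directory, index_blocks, name, i):
    """Find the i-th block (1-based) of a file with one lookup in its index block."""
    index = index_blocks[directory[name]]   # directory entry -> index block
    if not 1 <= i <= len(index):
        raise IndexError("block number outside the file")
    return index[i - 1]                      # index entry -> data block

directory = {"report": 19}                      # file name -> index block number
index_blocks = {19: [9, 16, 1, 10, 25]}         # index block -> data block numbers
```

The data blocks can be anywhere on disk, as in linked allocation, yet any block is reached directly, as in contiguous allocation.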
Directory Structure
A file directory is a collection of files. The directory contains information about the files, including attributes, location and ownership. Much of this information, especially that concerned with storage, is managed by the operating system. The directory is itself a file, accessible by various file management routines.
Information contained in a device directory:
Name
Type
Address
Current length
Maximum length
Date last accessed
Date last updated
Owner ID
Protection information
Operations performed on a directory:
Search for a file
Create a file
Delete a file
List a directory
Rename a file
Traverse the file system
Advantages of maintaining directories:
Efficiency: A file can be located more quickly.
Naming: It is convenient for users, as two users can use the same name for different files or different names for the same file.
Grouping: Files can be grouped logically by their properties, e.g. all Java programs, all games, etc.
SINGLE-LEVEL DIRECTORY
A single directory is maintained for all users.
Naming problem: Two files cannot have the same name, even if they belong to different users.
Grouping problem: Users cannot group files according to their needs.
Single level Directory system
The single directory is also called the root directory.
In the example, the single-level directory has 5 files owned by 3 different users P, Q and R: user P has 2 files, user Q has 2 files and user R has 1 file.
[Figure: root directory holding five files, owned by P, P, Q, Q and R]
Advantages:
Simple to implement.
Locating files is very fast.
Limitations:
If a single user has a large number of files, it becomes difficult to remember the name of each file.
If more than one user keeps files in the same directory, different users may give the same name to their files, violating the rule that names must be unique.
TWO-LEVEL DIRECTORY
A separate directory is maintained for each user.
Path name: Because there are two levels, every file has a path name that is used to locate it.
Different users can now have files with the same name.
Searching is efficient in this method.
Two Level Directory Systems
Each user is given a private directory. Files with the same name belonging to different users do not interfere with each other. When a user attempts to open a file, the system knows which user it is and therefore which directory to search.
Advantages:
Solves the name collision problem.
Users are isolated from each other.
Limitations:
When users want to cooperate, some systems do not allow one user to access another user's files.
It is not convenient for users with a large number of files.
TREE-STRUCTURED DIRECTORY
The directory is maintained in the form of a tree. Searching is efficient and files can also be grouped. A file can be named by an absolute or a relative path name.
The tree is the most common directory structure.
Each user can have as many directories as needed, so that files can be grouped in whatever way is required.
Every file has a unique path name.
All modern file systems use this mechanism.
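The difference between absolute and relative path names can be illustrated with Python's posixpath module, which follows UNIX path conventions. The helper name resolve and the example directories are hypothetical.

```python
import posixpath

def resolve(current_dir, path):
    """Turn an absolute or relative path name into a normalized absolute one."""
    if not posixpath.isabs(path):
        # A relative path name is interpreted starting from the current directory.
        path = posixpath.join(current_dir, path)
    # Collapse '.' and '..' components so each file ends up with one canonical name.
    return posixpath.normpath(path)
```

An absolute path name always starts at the root, so the current directory is ignored for it; a relative one names the same file only as long as the current directory stays the same.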
Disk Organization and Disk Structure
The magnetic disk is used as the main secondary storage device.
A magnetic disk drive contains several physical disks, each of which is called a platter.
The platters are coated with a special magnetic material.
Platter
Platters are the round, flat disks (one or more) that actually hold the data in the drive. Each platter has two surfaces (top and bottom) capable of holding data, and each surface has its own read/write head, so a hard disk with three platters has six surfaces and six heads. Normally both surfaces of each platter are used, although in some drives the outer surfaces of the top and bottom platters are not.
Platter size determines the form factor, and disks are sometimes referred to by this size, for example "3.5-inch hard disk". The first PCs used hard disks with a nominal size of 5.25". Today, by far the most common hard disk platter size is 3.5". Laptop drives are usually smaller: 2.5" is the standard form factor, but drives with 1.8" and even 1.0" platters are becoming more common.
PCs usually have 1 to 5 platters.
Tracks and Sectors
Each platter has its information recorded in concentric circles called tracks. Each track is further divided into smaller pieces called sectors, each of which typically holds 512 bytes of information.
Storage of Data in Platters
A sector contains a fixed number of bytes, for example 256 or 512. Each track typically holds between 100 and 300 sectors, and the larger outer tracks hold more sectors than the smaller inner ones. All information stored on a hard disk is recorded in tracks. The tracks are numbered from zero, starting at the outside of the platter. A hard disk has several thousand tracks on each platter. At either the drive or the operating system level, sectors are often grouped together into clusters.
The tracks at the same position on the different platters form an imaginary cylinder. Data is stored cylinder by cylinder: all tracks of a cylinder are written before the R/W heads move to the next cylinder. This reduces head movement and increases the speed of read and write operations.
Construction of HDD
The components of the hard disk are:
Disk platter
Read/write head
Head arm / head slider
Head actuator mechanism
Spindle motor
Bezel
Cable and connectors
Logic board
Air filter
The read-write (R-W) head moves over the rotating disk. This head performs all read and write operations on the disk, so its position is a major concern: to read or write a location on the disk, the R-W head must first be placed over that location. Some important terms:
Seek time: The time taken by the R-W head to reach the desired track from its current position.
Rotational latency: The time taken by the desired sector to come under the R-W head.
Data transfer time: The time taken to transfer the required amount of data. It depends on the rotational speed and on the amount of data.
Controller time: The processing time taken by the controller.
Average access time = seek time + average rotational latency + data transfer time + controller time.
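The access-time formula can be evaluated with a short sketch. It assumes the common approximation that the average rotational latency is the time for half a revolution; this approximation, the function name and the sample figures are not taken from the text above.

```python
def average_access_time_ms(seek_ms, rpm, transfer_ms, controller_ms):
    """Average access time = seek + average rotational latency + transfer + controller."""
    rotation_ms = 60_000 / rpm                    # time for one full revolution
    avg_rotational_latency_ms = rotation_ms / 2   # assumption: half a revolution on average
    return seek_ms + avg_rotational_latency_ms + transfer_ms + controller_ms

# Hypothetical drive: 4 ms seek, 6000 RPM, 0.5 ms transfer, 0.5 ms controller time.
example_ms = average_access_time_ms(4, 6000, 0.5, 0.5)
```

For these sample figures one revolution takes 10 ms, so the average rotational latency is 5 ms and is the largest of the four terms.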
Logical structure
File systems are stored on disks. The above figure depicts a possible file-system layout.
MBR: The Master Boot Record is used to boot the computer.
Partition table: The partition table is located at the end of the MBR. It gives the starting and ending addresses of each partition.
Boot block: When the computer is booted, the BIOS reads in and executes the MBR. The first thing the MBR program does is locate the active partition, read in its first block, called the boot block, and execute it. The program in the boot block loads the operating system contained in that partition. Every partition starts with a boot block, even if it does not contain a bootable operating system.
Super block: It contains all the key parameters of the file system and is read into memory when the computer is booted or the file system is first touched.
Free space management: To keep track of free disk space, the system maintains a free-space list that records all free blocks.
I-node: The information about each file in the file system is kept in a data structure called an i-node. There is one i-node for each file.
Root directory: The top of the file system tree.
Files and directories: All the other files and directories on the disk.
RAID (Redundant Array of Independent Disks) structure of disk
RAID, or "Redundant Array of Independent Disks", is a technique that uses a combination of multiple disks instead of a single disk for increased performance, data redundancy or both.
Data redundancy takes up extra space but adds to disk reliability: if a disk fails and the same data is also stored on another disk, the data can be retrieved and operation can continue. On the other hand, if data is simply spread across multiple disks without redundancy, the loss of a single disk can affect all of the data.
Key evaluation points for a RAID system
Reliability: How many disk faults can the system tolerate?
Availability: What fraction of the total session time is the system up, i.e. how available is it for actual use?
Performance: How good is the response time? How high is the throughput (rate of processing work)?
Capacity: Given a set of N disks each with B blocks, how much useful capacity is available to the user?
RAID is transparent to the host system: to the host it appears as a single big disk presenting itself as a linear array of blocks. This allows older technologies to be replaced by RAID without many changes to existing code.
RAID-0 (Striping)
Blocks are "striped" across the disks. In the figure, blocks 0, 1, 2, 3 form a stripe. Instead of placing just one block on a disk at a time, two (or more) blocks can be placed on a disk before moving on to the next one.
Evaluation:
Reliability: 0. There is no duplication of data, so a block once lost cannot be recovered.
Capacity: N*B. The entire space is used to store data. Since there is no duplication, N disks each having B blocks are fully utilized.
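One way to express striping is as a mapping from a logical block number to a disk and a position on that disk. The sketch below assumes simple round-robin placement; the function name and the chunk parameter (the number of blocks placed on a disk before moving to the next one) are hypothetical.

```python
def raid0_locate(block, n_disks, chunk=1):
    """Map a logical block to (disk, block offset on that disk) under striping."""
    stripe_unit = block // chunk            # which chunk the block falls in
    disk = stripe_unit % n_disks            # chunks go round-robin over the disks
    offset = (stripe_unit // n_disks) * chunk + block % chunk
    return disk, offset

def raid0_capacity(n_disks, blocks_per_disk):
    """Usable capacity in blocks: every block stores data, so N * B."""
    return n_disks * blocks_per_disk
```

With four disks and a chunk of one block, logical blocks 0 to 3 land on disks 0 to 3 and block 4 wraps around to disk 0. With a chunk of two, blocks 0 and 1 both go to disk 0 before block 2 moves on to disk 1.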
RAID-1 (Mirroring)
More than one copy of each block is stored, each on a separate disk. Thus every block has two (or more) copies lying on different disks. RAID 0 cannot tolerate any disk failure, but RAID 1 can.
Evaluation (assume a RAID system with mirroring level 2):
Reliability: 1 to N/2. One disk failure can be handled for certain, because the blocks of that disk have duplicates on some other disk. If we are lucky and disks 0 and 2 fail, this can again be handled, as the blocks of these disks have duplicates on disks 1 and 3. So, in the best case, N/2 disk failures can be handled.
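The "1 to N/2" range can be checked with a small sketch. It assumes mirroring level 2 with disks paired as (0, 1), (2, 3) and so on, which matches the example above where disks 1 and 3 hold the copies of disks 0 and 2. The function name is hypothetical.

```python
def raid1_survives(failed, n_disks):
    """True if every mirrored pair (0,1), (2,3), ... still has a working disk."""
    for first in range(0, n_disks, 2):
        if first in failed and first + 1 in failed:
            return False    # both copies of this pair's blocks are gone
    return True
```

Losing disks 0 and 2 is survivable because each pair keeps one copy, while losing disks 0 and 1 is not, even though the number of failed disks is the same.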
RAID 2
This uses bit-level striping, i.e. instead of striping blocks across the disks, it stripes bits across the disks. In the above diagram b1, b2, b3 are bits and E1, E2, E3 are error correction codes.
Two groups of disks are needed: one group stores the data and the other stores the error correction codes (ECC).
When data is read from the disks, the corresponding ECC is also read from the redundancy disks and used to check whether the data is consistent. If required, appropriate corrections are made.
RAID 2 is no longer used: it is expensive, and implementing it in a RAID controller is complex.
RAID 3
This uses byte-level striping, i.e. instead of striping blocks across the disks, it stripes bytes across the disks. In the above diagram B1, B2, B3 are bytes and p1, p2, p3 are parities.
Uses multiple data disks and a dedicated disk to store parity.
Sequential reads and writes perform well.
Random reads and writes perform poorly.
RAID 4
This uses block-level striping. In the above diagram A, B, C are blocks and p1, p2, p3 are parities.
Uses multiple data disks and a dedicated disk to store parity.
Needs a minimum of 3 disks (2 for data and 1 for parity).
Good random reads, as the data blocks are striped.
Poor random writes, as every write must also update the single parity disk.
RAID 4 sits between RAID 3 and RAID 5: like RAID 3 it has a dedicated parity disk, but it stripes blocks rather than bytes; like RAID 5 it stripes blocks across the data disks, but it keeps all parity on one disk.
RAID 5
This is a slight modification of RAID 4: the only difference is that the parity rotates among the drives.
Reliability: 1. RAID 5 can recover from at most 1 disk failure (because of the way parity works). If more than one disk fails, there is no way to recover the data. This is identical to RAID 4.
Capacity: (N-1)*B. Overall, space equivalent to one disk is used to store the parity, so the equivalent of (N-1) disks, each having B blocks, is available for data storage.
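Why parity allows recovery from exactly one failure can be sketched with XOR, the usual way parity is computed in RAID 4 and RAID 5 (the text above does not name the operation, so treat that as an assumption). The helper name and the sample block contents are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks of one stripe, each on a different disk.
data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = xor_blocks(data)                      # stored on a fourth disk

# Suppose the disk holding data[1] fails: XOR the survivors with the parity.
rebuilt = xor_blocks([data[0], data[2], parity])

def raid5_capacity(n_disks, blocks_per_disk):
    """Usable capacity in blocks: one disk's worth of space holds parity."""
    return (n_disks - 1) * blocks_per_disk
```

XOR-ing the surviving blocks with the parity cancels everything except the missing block. If two blocks of the same stripe are lost, the single parity block is not enough to separate them, which is why the reliability is 1.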
RAID 6
Just like RAID 5, this uses block-level striping, but with dual parity. In the above diagram A, B, C are blocks and p1, p2, p3 are parities.
Two parity blocks are created for each stripe of data blocks.
Can handle two disk failures.
This configuration is complex to implement in a RAID controller, as two parity values have to be calculated for each stripe.