type | backlinks | |
---|---|---|
theoretical |
|
A file system consists of two parts:
- Collection of files
- A directory structure -> provides information about all files in the system
File
- Logical view -> the unit of storing data
- Files are mapped by the OS onto physical nonvolatile devices
Types:
- Data
  - Numeric
  - Character
  - Binary
- Program
Attributes:
- Name
- Identifier (unique number)
- Type [1]
- Location -> pointer
- Size
- Protection (permissions)
- Datetime and user id
On Unix-like systems, all of these except the name are stored in i-nodes; the name lives in the directory entry.
INodes
- Size in bytes
- Access permissions
- Type
- Creation and last access datetime
- Owner ID
- Group ID
- Hard link count
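The inode fields listed above can be read from user space. A minimal sketch, assuming a POSIX-like system and Python's standard `os.stat`; the helper name `inode_summary` and the throwaway demo file are made up for illustration:

```python
import os
import stat
import tempfile

def inode_summary(path):
    """Collect the inode fields from the list above for `path`."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,                           # identifier (unique number)
        "size_bytes": st.st_size,                     # size in bytes
        "permissions": stat.filemode(st.st_mode),     # access permissions, e.g. -rw-------
        "is_regular_file": stat.S_ISREG(st.st_mode),  # type
        "last_access": st.st_atime,                   # last access time (seconds since epoch)
        "owner_id": st.st_uid,
        "group_id": st.st_gid,
        "hard_links": st.st_nlink,                    # hard link count
    }

# Demo on a throwaway 5-byte file (hypothetical content).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

summary = inode_summary(demo_path)
print(summary)
os.remove(demo_path)
```

Note that the file name does not appear anywhere in the result: `os.stat` is given a path, but the inode itself carries no name.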
Logical Definition
- Named collection of related information
- Files may be free-form (text files) or rigidly formatted [2]
Operations
- Create
- Write
- Read
- Seek (reposition within file)
- Delete
- Truncate - shorten or cut off by removing data from the end
- Open (load to memory)
- Close (unload)
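The operations above map fairly directly onto system calls. A rough sketch using Python's low-level `os` wrappers; the file name `demo.txt` and its contents are made up for the demo:

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.txt")  # hypothetical file name

fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)  # create + open
os.write(fd, b"hello world")                       # write 11 bytes
os.lseek(fd, 6, os.SEEK_SET)                       # seek to offset 6
tail = os.read(fd, 5)                              # read 5 bytes from there
os.ftruncate(fd, 5)                                # truncate: keep only the first 5 bytes
size_after_truncate = os.fstat(fd).st_size
os.close(fd)                                       # close
os.remove(path)                                    # delete
still_exists = os.path.exists(path)
os.rmdir(workdir)

print(tail, size_after_truncate, still_exists)
```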
Open files
Open files are tracked in an open-file table. A file-open count records how many processes have each file open, so the entry can be removed when the last one closes it.
To avoid race conditions, files need to be locked.
- Shared lock -> several processes can acquire it concurrently, used for reads
- Exclusive lock -> only one process can hold it at a time, used for writes
- Mandatory lock -> the OS denies access depending on the locks held and requested
- Advisory lock -> processes can find out the status of locks and decide what to do
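Shared and exclusive locks can be tried out with advisory locking. A small sketch, assuming Linux (or another system with BSD-style `flock`) and Python's `fcntl` module; the helper `try_lock` is made up for the demo. Each `open()` call yields an independent open file description, so the handles compete for the lock even inside one process:

```python
import fcntl
import os
import tempfile

def try_lock(fileobj, mode):
    """Try to take an advisory lock without blocking; True on success."""
    try:
        fcntl.flock(fileobj, mode | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False

fd, path = tempfile.mkstemp()
os.close(fd)

reader_a = open(path, "rb")
reader_b = open(path, "rb")
writer = open(path, "ab")

# Two shared (reader) locks can coexist.
shared_a = try_lock(reader_a, fcntl.LOCK_SH)
shared_b = try_lock(reader_b, fcntl.LOCK_SH)

# An exclusive (writer) lock is refused while readers hold shared locks.
exclusive_while_shared = try_lock(writer, fcntl.LOCK_EX)

# Once the readers release, the writer can take the exclusive lock.
fcntl.flock(reader_a, fcntl.LOCK_UN)
fcntl.flock(reader_b, fcntl.LOCK_UN)
exclusive_after_release = try_lock(writer, fcntl.LOCK_EX)

for handle in (reader_a, reader_b, writer):
    handle.close()
os.remove(path)

print(shared_a, shared_b, exclusive_while_shared, exclusive_after_release)
```

Because the lock is advisory, nothing stops a process that never calls `flock` from reading or writing the file anyway.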
Structure
Could be many:
- None
- Simple record
  - Lines
  - Fixed length
  - Variable length
- Complex
  - Formatted document
  - Relocatable load file [3]
Directories
Collection of nodes containing information about all files. Also resides on disk.
Operations:
- Search for a file
- Create a file
- Delete a file
- List a directory
- Rename a file
- Traverse file system
Single level directory
A single directory for all users.
Clearly, every file needs a unique name, which becomes a problem real fast, and the single directory grows very large.
Two-level directory
Users have different directories, allowing for the same file names. In Linux, /home/user is the separate directory of each user. Linux, however, uses a multi-level structure:
Tree-Structured Directories
- Efficient searching
- Grouping
- Absolute v. relative path
Acyclic-Graph
Have shared subdirectories and files. Symlinks achieve this.
Structure
In Linux, a directory is a table (itself a file) which stores:
- File name
- Inode number
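The name-to-inode table can be inspected directly. A minimal sketch using `os.scandir`; the helper `directory_table` and the two demo files are made up for illustration:

```python
import os
import shutil
import tempfile

def directory_table(path):
    """Return {file name: inode number} for the entries of directory `path`."""
    with os.scandir(path) as entries:
        return {entry.name: entry.inode() for entry in entries}

# Demo directory with two hypothetical files.
demo_dir = tempfile.mkdtemp()
for name in ("a.txt", "b.txt"):
    with open(os.path.join(demo_dir, name), "w") as f:
        f.write(name)

table = directory_table(demo_dir)
inode_from_stat = os.stat(os.path.join(demo_dir, "a.txt")).st_ino
print(table)

shutil.rmtree(demo_dir)
```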
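Links
Hard vs Soft. A hard link is not a copy: it is another directory entry that points to the same inode, so both names share the same data and inode info. A soft link (symlink) is just a pointer: a separate file that stores the path of its target.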
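The difference shows up when comparing inode numbers. A short sketch, assuming a POSIX system where `os.link` and `os.symlink` are available; all file names are made up:

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "original.txt")
hard = os.path.join(workdir, "hard.txt")
soft = os.path.join(workdir, "soft.txt")

with open(original, "w") as f:
    f.write("data")

os.link(original, hard)     # hard link: second directory entry, same inode
os.symlink(original, soft)  # soft link: new inode that stores the target path

same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
link_count = os.stat(original).st_nlink
soft_has_own_inode = os.lstat(soft).st_ino != os.stat(original).st_ino
soft_target = os.readlink(soft)

# Removing the original name leaves the data reachable through the hard link,
# while the soft link now points at a path that no longer exists.
os.remove(original)
with open(hard) as f:
    hard_content = f.read()
soft_dangling = not os.path.exists(soft)

shutil.rmtree(workdir)
print(same_inode, link_count, soft_has_own_inode, hard_content, soft_dangling)
```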
Important
To avoid cycles, links are only allowed to files, not to directories. Alternatively, a cycle detection algorithm can be run every time a new link is added to determine whether it is OK.
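Cycle detection on a new link boils down to a reachability check. A toy sketch, assuming the directory structure is modelled as a dictionary from directory name to child names; the function `would_create_cycle` and the sample tree are hypothetical, not how any particular kernel does it:

```python
def would_create_cycle(children, parent, target):
    """True if adding the link parent -> target lets `parent` reach itself.

    That happens exactly when `parent` is already reachable from `target`.
    """
    stack, seen = [target], set()
    while stack:
        node = stack.pop()
        if node == parent:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, []))
    return False

# Hypothetical tree: / -> home, etc; home -> alice, bob; alice -> docs
tree = {"/": ["home", "etc"], "home": ["alice", "bob"], "alice": ["docs"]}

share_is_cyclic = would_create_cycle(tree, "bob", "docs")    # bob shares alice's docs
loop_is_cyclic = would_create_cycle(tree, "docs", "home")    # docs links back up to home
print(share_is_cyclic, loop_is_cyclic)
```

Sharing `docs` between two users keeps the graph acyclic, while a link from `docs` back up to `home` would let a traversal loop forever.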
Disk
Can be subdivided into partitions.
Disk/partition can be used raw (no file system) or can be formatted. The entity containing the file system is known as a volume.
Layout
- Boot block
  - Contains initial bootstrap program to load the OS
  - Typically the first sector reads another program from the next few sectors
- Super block - state of the file system
  - Type -> ext3, ext4, FAT, etc.
  - Size -> Number of blocks
  - Block size
  - Block group information -> number of block groups in file system
  - Free block count
  - Free inode count
  - Inode size
  - FS mount info
  - Journal info
Free space management
Unix uses a bitmap to show free disk blocks. Zero=free, one=in use
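A free-space bitmap is easy to model with a `bytearray`. A minimal sketch, assuming bit `k` of byte `i` describes block `8 * i + k` (least significant bit first); that ordering and the function names are assumptions made for the demo:

```python
def first_free_block(bitmap):
    """Index of the first 0 bit (free block), or None if every block is in use."""
    for byte_index, byte in enumerate(bitmap):
        if byte != 0xFF:  # at least one free block in this byte
            for bit in range(8):
                if not (byte >> bit) & 1:
                    return byte_index * 8 + bit
    return None

def mark_used(bitmap, block):
    bitmap[block // 8] |= 1 << (block % 8)

def mark_free(bitmap, block):
    bitmap[block // 8] &= ~(1 << (block % 8)) & 0xFF

bitmap = bytearray(2)          # 16 blocks, all free (all bits zero)
for block in range(10):        # allocate blocks 0..9
    mark_used(bitmap, block)
next_free = first_free_block(bitmap)

mark_free(bitmap, 3)           # release block 3
next_free_after_release = first_free_block(bitmap)
print(next_free, next_free_after_release)
```

Skipping bytes equal to `0xFF` is the usual reason bitmaps are fast to scan: eight blocks are ruled out per comparison.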
Access lists and groups
Permissions are read (4), write (2) and execute (1). Linux has three classes of users, each given one octal digit, for example:
- Owner -> 7 (Read Write Execute)
- Group -> 6 (RW)
- Public -> 1 (X)
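The example digits above combine into the mode `761`. A quick sketch with Python's `stat` module that decodes it into the familiar `ls -l` string; the helper `describe` is made up, and the regular-file type bit is added only so the string starts with `-`:

```python
import stat

def describe(mode):
    """Render permission bits the way `ls -l` does, assuming a regular file."""
    return stat.filemode(stat.S_IFREG | mode)

mode = 0o761                   # owner 7 (rwx), group 6 (rw-), public 1 (--x)
symbolic = describe(mode)

owner_can_write = bool(mode & stat.S_IWUSR)
group_can_execute = bool(mode & stat.S_IXGRP)
public_can_execute = bool(mode & stat.S_IXOTH)
print(symbolic, owner_can_write, group_can_execute, public_can_execute)
```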
Blocks
The IDs of data blocks are stored in inodes. The IDs of the first 12 blocks are stored in direct reference fields; in the classic ext2/ext3 layout, later blocks are reached through indirect blocks.
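To get a feel for the 12 direct fields, the sketch below classifies a file's logical block number by the kind of pointer that reaches it. Only the 12 direct references come from the notes; the single, double and triple indirect levels, the 4 KiB block size and the 4-byte pointer size are assumptions based on the classic ext2/ext3 layout:

```python
DIRECT_POINTERS = 12  # from the notes: first 12 block IDs live in the inode

def pointer_level(block_index, block_size=4096, pointer_size=4):
    """Name the pointer level that reaches logical block `block_index`."""
    per_block = block_size // pointer_size  # pointers that fit in one block
    if block_index < DIRECT_POINTERS:
        return "direct"
    block_index -= DIRECT_POINTERS
    if block_index < per_block:
        return "single indirect"
    block_index -= per_block
    if block_index < per_block ** 2:
        return "double indirect"
    block_index -= per_block ** 2
    if block_index < per_block ** 3:
        return "triple indirect"
    raise ValueError("block index beyond what the inode can address")

# With the assumed sizes, one indirect block holds 1024 pointers.
levels = [pointer_level(i) for i in (0, 11, 12, 1035, 1036)]
print(levels)
```

Under these assumptions a file of up to 48 KiB (12 blocks of 4 KiB) needs no indirect block at all.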
Allocation
- Contiguous -> each file occupies a set of contiguous blocks
- Linked Allocation -> blocks contain a pointer to the next one (slower access)
- Indexed -> Each file has an index block that stores pointers to all its data blocks
Groups
Subdivision of the entire disk or partition. Each group has:
- A block bitmap
- An inode bitmap
- An inode table holding the actual inodes
[!INFO] Default block group size in ext4 is 128 MiB (with 4 KiB blocks, one bitmap block of 4096 bytes tracks 32768 blocks)
Journaling
Ensures the integrity of the file system by keeping track of changes before they are actually applied to the main file system.
Phases:
- Write-ahead logging -> changes are logged before they are made to the file system
- Commit -> the logged changes are actually applied
- Crash recovery -> the journal is replayed to apply any changes that were logged but not yet applied
Types:
- Write-Ahead Logging (WAL) -> logs changes before they are applied to the file system
- Metadata journaling -> only metadata is logged. After a crash, metadata is restored to a consistent state.
- Full journaling -> both metadata and file data are logged
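The three phases can be imitated with a toy in-memory store. This is only a sketch of the idea, not how ext4's journal is implemented; the class `JournaledStore`, its keys and the `crash_after` parameter are all made up for the demo:

```python
class JournaledStore:
    """Toy model: `disk` is the main file system, `journal` is the log."""

    def __init__(self):
        self.disk = {}
        self.journal = []  # (key, value) entries logged but not yet fully applied

    def log(self, key, value):
        # Phase 1, write-ahead logging: record the change before touching `disk`.
        self.journal.append((key, value))

    def commit(self, crash_after=None):
        # Phase 2, commit: apply logged changes to `disk`.
        # `crash_after=n` simulates a crash after n entries were applied.
        for applied, (key, value) in enumerate(self.journal):
            if crash_after is not None and applied == crash_after:
                return False  # crash: the journal survives untouched
            self.disk[key] = value
        self.journal.clear()  # only cleared once everything is applied
        return True

    def recover(self):
        # Phase 3, crash recovery: replay the whole journal (replay is idempotent).
        return self.commit()

store = JournaledStore()
store.log("inode 7", "size=10")
store.log("block 42", "payload")

store.commit(crash_after=1)           # crash after the first entry
disk_after_crash = dict(store.disk)   # only "inode 7" made it

store.recover()
disk_after_recovery = dict(store.disk)
print(disk_after_crash, disk_after_recovery)
```

Replaying an entry that was already applied simply writes the same value again, which is why the journal can be replayed from the start after a crash.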
Example: EXT4
- Journaling
- Larger file and volume sizes
- Extents -> range of contiguous blocks, reduces fragmentation
- Multiblock allocator -> multiple blocks at once
- fsck, optimized file system check
- Pre-allocation
- Checksums -> ensure integrity
Example: Windows FS
FAT(32)
File allocation table.
No hard links :C. Directory contains:
- File name -> up to 8 characters, plus an extension of up to 3
- File size -> four-byte field holding the size in bytes. Max. 4 GB
- ID of first block (4 bytes)
This does not scale to disks of very large capacity, since a 4-byte ID can only address so many blocks. Windows therefore introduced clustering: 4, 8 or 16 blocks are grouped together.
The table itself has one entry per cluster, and the entries link the clusters of each file into a chain. Each entry is 4 bytes. Free clusters are also recorded in the table.
Free blocks list
Stores a value for each cluster which can indicate (FAT32 uses only the low 28 bits of each entry):
- 0x00000000 -> Free cluster
- Next cluster number -> Cluster is allocated and points to the next one
- 0x0FFFFFF8-0x0FFFFFFF -> EOF
- 0x0FFFFFF7 -> bad cluster
To find a free block we just need to search for the first available cluster. We keep the last allocated cluster, optimizing search time.
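Following a file's cluster chain and searching for a free cluster can be sketched with the table held as a plain Python list. FAT32 uses only the low 28 bits of each 4-byte entry, so the sketch masks entries before comparing them with the marker values. The function names and the toy table are made up; skipping entries 0 and 1 reflects that FAT data clusters are numbered from 2:

```python
MASK = 0x0FFFFFFF      # FAT32 uses the low 28 bits of each entry
FREE = 0x00000000
BAD = 0x0FFFFFF7
EOF_MIN = 0x0FFFFFF8   # any value from here up marks the end of a chain

def cluster_chain(fat, first_cluster):
    """Follow the linked list of clusters that starts at `first_cluster`."""
    chain, cluster = [], first_cluster
    while True:
        chain.append(cluster)
        if len(chain) > len(fat):
            raise ValueError("corrupt chain: loop detected")
        entry = fat[cluster] & MASK
        if entry >= EOF_MIN:
            return chain
        if entry in (FREE, BAD):
            raise ValueError("corrupt chain: hit a free or bad cluster")
        cluster = entry

def find_free_cluster(fat, last_allocated):
    """Search for a free cluster, starting just after the last allocated one."""
    n = len(fat)
    for offset in range(1, n + 1):
        candidate = (last_allocated + offset) % n
        if candidate >= 2 and (fat[candidate] & MASK) == FREE:
            return candidate
    return None

# Toy table: entries 0 and 1 are reserved; one file occupies clusters 2 -> 5 -> 7.
fat = [0x0FFFFFF8, 0xFFFFFFFF, 5, 0, 0, 7, 0, 0xFFFFFFFF, 0]

chain = cluster_chain(fat, 2)
free_after_7 = find_free_cluster(fat, 7)
free_after_8 = find_free_cluster(fat, 8)   # wraps around to the start of the table
print(chain, free_after_7, free_after_8)
```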
NTFS
New Technology File System
Footnotes
1. Extension (.pdf, .txt), as opposed to format, which specifies the grammar of the file.
2. Columnar, fixed-format ASCII files have fixed field lengths, as opposed to delimited files, whose fields can be as long as we want.
3. Contains information about where to place different parts of the program in memory.