Notes/Operating Systems/File Systems Management.md

---
type: theoretical
backlinks:
  - "[[Memory Management]]"
---


A file system consists of two parts:
- Collection of files
- A directory structure -> provides information about all files in the system

## File
- Logical view -> the unit of storing data
Files are mapped by the OS onto physical nonvolatile devices

**Types:**
- Data
	- Numeric
	- Character
	- Binary
- Program


**Attributes**:
- Name
- Identifier (unique number)
- Type[^2]
- Location -> pointer
- Size
- Protection (permissions)
- Datetime and user id
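On Unix-like systems, all of these except the name are stored in **i-nodes**. The name lives in the directory entry that points to the i-node.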


### INodes
- Size in bytes
- Access permissions
- Type
- Creation and last access datetime
- Owner ID
- Group ID
- Hard link count
### Logical Definition
- Named collection of related information
- Files may have free form (text files) or can be rigidly formatted[^1]

### Operations
- Create
- Write
- Read
- Seek (reposition within file)
- Delete
- Truncate - shorten or cut off by removing data from the end
- Open (load the file's metadata into memory and add it to the open-file table)
- Close (remove it from the open-file table)

### Open files
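Tracked by an **open-file table**. A **file-open count** per entry records how many times the file is currently open, so the entry can be removed when the last process closes it.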

In order to [avoid race conditions](Inter-Process%20Communication.md#Avoiding%20race%20conditions), we need to lock the files somehow.
- **Shared lock** -> several processes can acquire concurrently, used for reads
- **Exclusive lock** -> writer lock
- **Mandatory** vs. **advisory** -> with mandatory locks the OS denies access depending on locks held and requested; with advisory locks processes can find the status of locks and decide what to do
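
A minimal sketch of shared vs. exclusive locks, assuming a Unix system where Python's `fcntl.flock` is available. Two independent `open()` calls stand in for separate processes, and the helper `try_lock` is made up for this note. These locks are advisory: a process that never asks for the lock can still write to the file.

```python
import fcntl
import os
import tempfile

def try_lock(f, mode):
    """Try to take an flock() lock without blocking; True on success."""
    try:
        fcntl.flock(f, mode | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False

# flock() locks belong to the open file description, so separate open()
# calls in one process conflict the same way separate processes would.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("hello")

reader_a = open(path)
reader_b = open(path)
writer = open(path, "a")

# Several readers can hold a shared lock at the same time.
both_readers = try_lock(reader_a, fcntl.LOCK_SH) and try_lock(reader_b, fcntl.LOCK_SH)
# The writer cannot get an exclusive lock while readers hold shared locks.
writer_while_reading = try_lock(writer, fcntl.LOCK_EX)

fcntl.flock(reader_a, fcntl.LOCK_UN)
fcntl.flock(reader_b, fcntl.LOCK_UN)

# With the shared locks released, the exclusive lock succeeds
# and in turn keeps new readers out.
writer_after_release = try_lock(writer, fcntl.LOCK_EX)
reader_while_writing = try_lock(reader_a, fcntl.LOCK_SH)

for handle in (reader_a, reader_b, writer):
    handle.close()
```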


### Structure
Could be many:
- None
- Simple record
	- Lines
	- Fixed length
	- Variable length
- Complex
	- Formatted document
	- Relocatable load file [^3]


## Directories
Collection of nodes containing information about all files. Also resides on disk.

**Operations**:
- Search for a file
- Create a file
- Delete a file
- List a directory
- Rename a file
- Traverse file system

### Single level directory
A single directory for all users.

Clearly, every file needs a unique name, which becomes a problem fast: the directory grows huge and users' names start to clash.

### Two-level directory
Users have different directories. In Linux -> `/home/user` is separate for each user, allowing for the same file names. Linux, however, uses a multi-level tree:

### Tree-Structured Directories
- Efficient searching
- Grouping
- Absolute v. relative path

### Acyclic-Graph
Allows shared subdirectories and files. Links achieve this.

### Structure
In Linux, it is a table (a file) which stores:
- File name
- Inode number
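
A small sketch of reading that table from Python, assuming a Unix-like system. `os.scandir` yields one entry per name in the directory and `DirEntry.inode()` returns the inode number stored with it; `list_directory` and the file names are made up for illustration.

```python
import os
import tempfile

def list_directory(path):
    """Return the directory's (file name, inode number) pairs, sorted by name."""
    with os.scandir(path) as entries:
        return sorted((entry.name, entry.inode()) for entry in entries)

# Build a throwaway directory with two files in it.
demo_dir = tempfile.mkdtemp()
for name in ("a.txt", "b.txt"):
    with open(os.path.join(demo_dir, name), "w") as f:
        f.write(name)

table = list_directory(demo_dir)
for name, inode in table:
    print(f"{inode:>12}  {name}")
```

On the command line, `ls -i` prints the same two columns.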


## Symlinks
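**Hard** vs **Soft**. A hard link is not a copy: it is another directory entry (name) for the same inode, so both names share the same data and metadata. A soft (symbolic) link is a separate small file that just stores a path pointing to the target.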
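
A short sketch of the difference, assuming a Unix-like file system that supports both link types. The file names are made up for illustration.

```python
import os
import tempfile

d = tempfile.mkdtemp()
original = os.path.join(d, "original.txt")
hard = os.path.join(d, "hard.txt")
soft = os.path.join(d, "soft.txt")

with open(original, "w") as f:
    f.write("data")

os.link(original, hard)     # second directory entry for the same inode
os.symlink(original, soft)  # new file whose content is the target path

same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
link_count = os.stat(original).st_nlink   # hard link count in the inode
soft_is_link = os.path.islink(soft)

# Removing one name only decrements the link count;
# the data survives as long as another hard link exists.
os.remove(original)
with open(hard) as f:
    hard_content = f.read()
count_after = os.stat(hard).st_nlink

# The symlink still stores the old path, which no longer resolves.
soft_dangling = os.path.islink(soft) and not os.path.exists(soft)
```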

>[!IMPORTANT]
>To avoid cycles, we either allow links only to files (not to directories), or we run a cycle detection algorithm every time a new link is added to determine whether it is OK.
## Disk
Can be subdivided into **partitions**.

Disk/partition can be used **raw** (no file system) or can be **formatted**. The entity containing the file system is known as a volume.

> [!NOTE]- Typical fs organization
> ![](Pasted%20image%2020250505144352.png)


### Layout
![](Pasted%20image%2020250505155546.png)


- **Boot block**
	- Contains initial bootstrap program to load the OS
	- Typically the code in the first sector loads another program from the next few sectors
- **Super block** - state of the file system
	- Type -> ext3, ext4, FAT, etc.
	- Size -> Number of blocks
	- Block size
	- Block group information -> number of block groups in file system
	- Free block count
	- Free inode count
	- Inode size
	- FS mount info
	- Journal info

### Free space management
Unix uses a bitmap to show free disk blocks, one bit per block. Zero = free, one = in use.
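
A minimal sketch of searching and updating such a bitmap. The bit order inside each byte (lowest bit = lowest block number) and the function names are assumptions made for this example.

```python
def first_free_block(bitmap):
    """Return the number of the first block whose bit is 0, or None if all are in use."""
    for byte_index, byte in enumerate(bitmap):
        if byte != 0xFF:                      # skip bytes with no free block
            for bit in range(8):
                if not (byte >> bit) & 1:
                    return byte_index * 8 + bit
    return None

def mark_used(bitmap, block):
    """Set the bit of `block` to 1 (in use)."""
    bitmap[block // 8] |= 1 << (block % 8)

# Blocks 0-7 in use, then 8, 9, 10 and 12 in use, everything else free.
bitmap = bytearray([0xFF, 0b00010111, 0x00])

first = first_free_block(bitmap)    # block 11
mark_used(bitmap, first)
second = first_free_block(bitmap)   # block 13
```

Skipping whole `0xFF` bytes is why a bitmap scan stays cheap even when most of the disk is full.
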
## Access lists and groups
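Permissions are read, write and execute, set separately for three classes of users on Linux. Each class gets one octal digit (read = 4, write = 2, execute = 1), e.g. for mode `761`:
1. Owner -> 7 (Read Write Execute)
2. Group -> 6 (RW)
3. Public -> 1 (X)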
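
The example digits can be decoded with a few bit operations. This is only a sketch: `describe` is a made-up helper, while `stat.filemode` is the standard-library function that produces the `ls -l` style string.

```python
import stat

def describe(mode):
    """Split a permission mode into rwx strings for owner, group and public."""
    def rwx(bits):
        return "".join(ch if bits & mask else "-"
                       for ch, mask in (("r", 4), ("w", 2), ("x", 1)))
    return {
        "owner": rwx((mode >> 6) & 7),
        "group": rwx((mode >> 3) & 7),
        "public": rwx(mode & 7),
    }

mode = 0o761
classes = describe(mode)
# S_IFREG marks a regular file, which gives the leading "-".
listing = stat.filemode(stat.S_IFREG | mode)

print(classes)   # {'owner': 'rwx', 'group': 'rw-', 'public': '--x'}
print(listing)   # -rwxrw---x
```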


## Blocks
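The IDs of a file's data blocks are stored in its [INode](File%20Systems%20Management.md#INodes). The IDs of the first 12 blocks are stored in direct reference fields; larger files reach the remaining blocks through indirect reference fields, which point to blocks that hold further block IDs.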

![](Pasted%20image%2020250505154746.png)
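
A rough sketch of how a byte offset maps to a reference field. Only the 12 direct fields come from these notes; the 4 KiB block size, the 4-byte block IDs and the single/double/triple indirect levels are assumptions based on the classic ext2/ext3 layout, and `locate` is a made-up name.

```python
DIRECT = 12                  # direct reference fields in the inode
BLOCK_SIZE = 4096            # assumed block size in bytes
POINTER_SIZE = 4             # assumed size of one block ID in bytes
PER_BLOCK = BLOCK_SIZE // POINTER_SIZE   # block IDs that fit in one block

def locate(byte_offset):
    """Return (kind of reference, block index within that kind) for an offset."""
    block = byte_offset // BLOCK_SIZE
    if block < DIRECT:
        return ("direct", block)
    block -= DIRECT
    levels = ("single indirect", "double indirect", "triple indirect")
    for depth, name in enumerate(levels, start=1):
        capacity = PER_BLOCK ** depth    # data blocks reachable at this depth
        if block < capacity:
            return (name, block)
        block -= capacity
    raise ValueError("offset beyond the maximum file size")

print(locate(0))                        # ('direct', 0)
print(locate(DIRECT * BLOCK_SIZE))      # ('single indirect', 0)
```

Under these assumptions, only files up to 48 KiB fit entirely in the direct fields, which keeps access to small files cheap.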


### Allocation
- Contiguous -> each file occupies a run of consecutive blocks (fast access, but suffers from fragmentation and files are hard to grow)
- Linked Allocation -> blocks contain a pointer to the next one (slower access)
- Indexed -> Each file has an index block that stores pointers to all its data blocks


### Groups
Subdivision of the entire disk or partition
Has:
- A block bitmap
- An inode bitmap
- An inode table holding the actual inodes

> [!INFO]
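> Default block group size in ext4 is 128 MiB with 4 KiB blocks: the block bitmap fills one block, i.e. 32768 bits, and each bit covers one 4 KiB block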

## Journaling
Ensure the integrity of the file system by keeping track of changes before they are actually applied to the main file system

Phases:
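- Write-ahead logging -> changes are written to the journal before any changes are made to the file system
- Commit -> the transaction is marked complete in the journal and the changes are actually applied to the file system
- Crash recovery -> we replay the journal to apply committed changes that did not reach the file system; incomplete (uncommitted) transactions are discarded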

Types:
- Write-Ahead Logging (WAL) -> logs changes before they are applied to the file system
- Metadata journaling -> only metadata is logged. After a crash, metadata is restored to a consistent state, but file contents may be stale.
- Full journaling -> both metadata and file data are logged
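
A toy model of the phases, not how ext4 actually implements its journal. `JournaledStore`, its dict-based "disk" and the transaction IDs are all made up for this sketch; recovery here replays only the transactions that reached their commit record and drops the rest.

```python
class JournaledStore:
    """Toy file system: a dict as the disk plus an append-only journal."""

    def __init__(self):
        self.disk = {}       # main file system state
        self.journal = []    # assumed to survive a crash

    def log(self, txid, changes):
        # Write-ahead: record the changes before touching the disk.
        self.journal.append(("changes", txid, dict(changes)))

    def commit(self, txid):
        # The commit record marks the transaction as complete.
        self.journal.append(("commit", txid, None))

    def recover(self):
        # Replay committed transactions, discard incomplete ones.
        committed = {txid for kind, txid, _ in self.journal if kind == "commit"}
        for kind, txid, changes in self.journal:
            if kind == "changes" and txid in committed:
                self.disk.update(changes)
        self.journal.clear()


fs = JournaledStore()

fs.log(1, {"a.txt": "v1"})
fs.commit(1)                  # crash before the changes reach the disk

fs.log(2, {"b.txt": "v1"})    # crash before the commit record is written

fs.recover()
print(fs.disk)                # {'a.txt': 'v1'}
```

Replaying is safe to repeat because applying the same logged change twice leaves the same state.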


## Example: EXT4
- Journaling
- Larger file and volume sizes
- Extents -> range of contiguous blocks, reduces fragmentation
- Multiblock allocator -> multiple blocks at once
- Faster `fsck` (file system check)
- Pre-allocation
- Checksums -> ensure integrity

## Example: Windows FS
### FAT(32)
File allocation table.

No hard links :C. Directory contains:
- File name -> can be up to 8 characters and extension up to 3
- Attributes (one byte)
![](Pasted%20image%2020250505160518.png)

- File size -> four-byte field for the file size in bytes, so a file can be at most 4 GiB (minus one byte)
- ID of first block (4 bytes)

Obviously this scales badly, since it cannot be used with disks of very large capacities. To stretch it, Windows introduced clustering: 4, 8 or 16 blocks are grouped together and allocated as one unit.

The table itself has one 4-byte entry per cluster. Each entry stores the number of the next cluster of the same file, so every file forms a linked list (chain) through the table. Free clusters are marked in the same table.

![](Pasted%20image%2020250505161031.png)


#### Free blocks list
Stores a value for each cluster which can indicate:
- `0x00000000` -> Free cluster
- Next cluster number -> Cluster is allocated and points to the next one
- `0x0FFFFFF8` - `0x0FFFFFFF` -> EOF (FAT32 uses only the low 28 bits of each entry)
- `0x0FFFFFF7` -> bad cluster

To find a free block we just need to search the table for the first entry marked free. We remember the last allocated cluster and start the next search from there, which shortens search time.
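
A small sketch of following a file's chain and finding a free cluster. The table is a plain Python list and the function names are made up; the code masks entries to their low 28 bits (as FAT32 does) and skips clusters 0 and 1, which FAT reserves.

```python
MASK = 0x0FFFFFFF       # FAT32 uses only the low 28 bits of an entry
FREE = 0x00000000
BAD = 0x0FFFFFF7
EOF_MIN = 0x0FFFFFF8    # any value from here up marks end of file

def cluster_chain(fat, first):
    """Follow the linked list of clusters starting at `first`."""
    chain = [first]
    while True:
        nxt = fat[chain[-1]] & MASK
        if nxt >= EOF_MIN:
            return chain
        if nxt in (FREE, BAD) or len(chain) > len(fat):
            raise ValueError("corrupt cluster chain")
        chain.append(nxt)

def find_free(fat, last_allocated):
    """Search for a free cluster, starting just after the last allocated one."""
    n = len(fat)
    for step in range(1, n + 1):
        candidate = (last_allocated + step) % n
        if candidate >= 2 and (fat[candidate] & MASK) == FREE:
            return candidate
    return None

#      0           1           2  3     4    5  6           7
fat = [0x0FFFFFF8, 0x0FFFFFFF, 5, FREE, BAD, 6, 0x0FFFFFFF, FREE]

print(cluster_chain(fat, 2))   # [2, 5, 6]
print(find_free(fat, 6))       # 7
print(find_free(fat, 7))       # 3 (wraps around, skipping reserved clusters)
```
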
### NTFS
New Technology File System


---

[^1]: **Columnar** (fixed-format) ASCII files have fixed field lengths, as opposed to **delimited** files, whose fields can be as long as we want them to be

[^2]: Extension (.pdf, .txt) as opposed to format, which specifies the [grammar](Regular%20languages.md) of the file

[^3]: contains information about where to place different parts of the program in memory.