Notes/Operating Systems/File Systems Management.md


type: theoretical
backlinks: Memory Management

A file system consists of two parts

  • Collection of files
  • A directory structure -> provides information about all files in the system

File

  • Logical view -> the unit of storing data

Files are mapped by the OS onto physical nonvolatile devices.

Types:

  • Data
    • Numeric
    • Character
    • Binary
  • Program

Attributes:

  • Name
  • Identifier (unique number)
  • Type [1]
  • Location -> pointer
  • Size
  • Protection (permissions)
  • Datetime and user id

All of these except the name are stored in the file's inode; the name itself is kept in the directory entry that points to the inode.

INodes

  • Size in bytes
  • Access permissions
  • Type
  • Creation and last access datetime
  • Owner ID
  • Group ID
  • Hard link count

Logical Definition

  • Named collection of related information
  • Files may have free form (text files) or can be rigidly formatted [2]

Operations

  • Create
  • Write
  • Read
  • Seek (reposition within file)
  • Delete
  • Truncate - shorten or cut off by removing data from the end
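  • Open (load the file's metadata into memory and add an entry to the open-file table)
  • Close (remove that entry once the file is no longer in use)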

Open files

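Open files are tracked in an open-file table. A file-open count records how many processes have each file open, so the entry can be removed when the last process closes it.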

In order to avoid race conditions, we need to lock the files somehow.

  • Shared lock -> several processes can acquire concurrently, used for reads
  • Exclusive lock -> writer lock
  • Mandatory vs. advisory -> with mandatory locking the OS denies access depending on the locks held and requested; with advisory locking processes can check the status of locks and decide for themselves what to do
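
The shared/exclusive distinction above can be tried out with advisory locks. This is a minimal sketch, assuming a Unix system where Python's `fcntl.flock` is available; `try_lock` and `demo_locks` are hypothetical helper names, and the temporary file only exists for the demonstration.

```python
import fcntl
import os
import tempfile

def try_lock(f, mode):
    """Try to take an advisory lock without blocking; return True on success."""
    try:
        fcntl.flock(f, mode | fcntl.LOCK_NB)
        return True
    except OSError:
        # The lock is held in a conflicting mode by another open file.
        return False

def demo_locks():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Each open() gets its own lock owner, like separate processes would.
        with open(path) as r1, open(path) as r2, open(path, "a") as w:
            results = {
                "reader1_shared": try_lock(r1, fcntl.LOCK_SH),
                "reader2_shared": try_lock(r2, fcntl.LOCK_SH),
                "writer_while_readers": try_lock(w, fcntl.LOCK_EX),
            }
            fcntl.flock(r1, fcntl.LOCK_UN)
            fcntl.flock(r2, fcntl.LOCK_UN)
            results["writer_after_release"] = try_lock(w, fcntl.LOCK_EX)
            return results
    finally:
        os.remove(path)
```

Both readers get the shared lock at once, the writer is refused while any reader holds it, and it succeeds after they release. Because these locks are advisory, a process that never calls `flock` can still read or write the file.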

Structure

Could be many:

  • None
  • Simple record
    • Lines
    • Fixed length
    • Variable length
  • Complex
    • Formatted document
    • Relocatable load file [3]

Directories

Collection of nodes containing information about all files. Also resides on disk.

Operations:

  • Search for a file
  • Create a file
  • Delete a file
  • List a directory
  • Rename a file
  • Traverse file system

Single level directory

A single directory for all users.

Clearly, every file name must be unique across all users, which becomes a problem real fast: the single directory grows very large and name collisions are hard to avoid.

Two-level directory

Each user has their own directory, so different users can use the same file names. In Linux, /home/user plays this role. Linux, however, actually uses a multi-level structure:

Tree-Structured Directories

  • Efficient searching
  • Grouping
  • Absolute v. relative path

Acyclic-Graph

Directories can share subdirectories and files. Links (e.g. symlinks) achieve this.

Structure

In Linux, it is a table (a file) which stores:

  • File name
  • Inode
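
The name-to-inode table can be inspected from user space. A small sketch, assuming Python's `os.scandir`; `list_directory` and `demo_directory` are hypothetical names, and the files are created in a throwaway temporary directory.

```python
import os
import tempfile

def list_directory(path):
    """Return {file name: inode number} for every entry in a directory."""
    with os.scandir(path) as entries:
        return {entry.name: entry.inode() for entry in entries}

def demo_directory():
    with tempfile.TemporaryDirectory() as d:
        for name in ("a.txt", "b.txt"):
            with open(os.path.join(d, name), "w") as f:
                f.write(name)
        # Each directory entry is just a (name, inode number) pair.
        return list_directory(d)
```

Everything else about a file (size, owner, permissions) has to be looked up through the inode number, which is why listing a directory with details costs one extra lookup per file.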

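Hard vs. soft links: a hard link is another directory entry that points to the same inode (not a copy of the file), so the data stays alive until the last hard link is removed. A soft (symbolic) link is a separate file that just stores the path of its target.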
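
The difference shows up as soon as the original name is deleted. A minimal sketch, assuming a Unix-like file system that supports both `os.link` and `os.symlink`; `demo_links` and the file names are made up for illustration.

```python
import os
import tempfile

def demo_links():
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original.txt")
        hard = os.path.join(d, "hard.txt")
        soft = os.path.join(d, "soft.txt")
        with open(original, "w") as f:
            f.write("hello")

        os.link(original, hard)      # second directory entry, same inode
        os.symlink(original, soft)   # new file whose content is a path

        info = {
            "same_inode": os.stat(original).st_ino == os.stat(hard).st_ino,
            "link_count": os.stat(original).st_nlink,
            "soft_is_symlink": os.path.islink(soft),
        }

        os.remove(original)          # drops one name; the inode still has `hard`
        with open(hard) as f:
            info["hard_after_delete"] = f.read()
        # exists() follows the symlink, which now points at nothing.
        info["soft_after_delete"] = os.path.exists(soft)
        return info
```

The hard link keeps the data reachable because the inode's link count only drops from 2 to 1, while the symlink is left dangling.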

Important

To avoid cycles, we either allow links only to files (not to subdirectories), or, every time a new link is added, run a cycle detection algorithm to determine whether it is OK.
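
One way to picture the cycle check is a reachability search: a new link from `parent` to `target` closes a cycle exactly when `parent` can already be reached from `target`. This is only a sketch over an in-memory dictionary; the `graph` representation and the name `creates_cycle` are assumptions, not how any particular file system stores its directories.

```python
def creates_cycle(graph, parent, target):
    """Return True if adding the edge parent -> target would close a cycle.

    `graph` maps each directory to the directories it contains or links to.
    """
    stack, seen = [target], set()
    while stack:
        node = stack.pop()
        if node == parent:
            return True              # parent is reachable from target
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return False

tree = {"/": ["home", "usr"], "home": ["alice"], "alice": [], "usr": []}
```

With this tree, linking `usr` into `alice` is safe, while linking `home` into `alice` would let a traversal loop forever. The search visits each directory at most once, but it still has to run on every new link, which is why restricting links to files is the cheaper rule.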

Disk

Can be subdivided into partitions.

Disk/partition can be used raw (no file system) or can be formatted. The entity containing the file system is known as a volume.

[!NOTE]- Typical fs organization

Layout

  • Boot block
    • Contains initial bootstrap program to load the OS
    • Typically the code in the first sector loads a larger program from the next few sectors
  • Super block - state of the file system
    • Type -> ext3, ext4, FAT, etc.
    • Size -> Number of blocks
    • Block size
    • Block group information -> number of block groups in file system
    • Free block count
    • Free inode count
    • Inode size
    • FS mount info
    • Journal info

Free space management

Unix uses a bitmap to show free disk blocks. Zero=free, one=in use
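
A free-space bitmap is easy to sketch: one bit per block, packed eight to a byte. The class below is illustrative only (`BlockBitmap` and its method names are invented here), and it uses a plain first-fit scan rather than any real allocator's policy.

```python
class BlockBitmap:
    """One bit per disk block: 0 = free, 1 = in use."""

    def __init__(self, n_blocks):
        self.n_blocks = n_blocks
        self.bits = bytearray((n_blocks + 7) // 8)  # all zero: every block free

    def is_used(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        """Mark the first free block as used and return its number (None if full)."""
        for block in range(self.n_blocks):
            if not self.is_used(block):
                self.bits[block // 8] |= 1 << (block % 8)
                return block
        return None

    def free(self, block):
        """Clear the block's bit so it can be handed out again."""
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF
```

The appeal of the bitmap is its size: tracking a million blocks takes about 122 KiB, small enough to keep in memory.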

Access lists and groups

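Each file has read, write and execute permissions for three classes of users on Linux, each written as an octal digit (r=4, w=2, x=1). For example, with mode 761: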

  1. Owner -> 7 (Read Write Execute)
  2. Group -> 6 (RW)
  3. Public -> 1 (X)
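
The octal digits can be decoded mechanically. A short sketch, where `describe_mode` is a hypothetical helper written for these notes; `stat.filemode` is the standard-library function that produces the familiar `ls -l` string.

```python
import stat

def describe_mode(mode):
    """Split a three-digit octal mode into rwx strings for owner, group, others."""
    classes = {}
    for name, shift in (("owner", 6), ("group", 3), ("others", 0)):
        bits = (mode >> shift) & 0o7          # the three bits for this class
        classes[name] = "".join(
            flag if bits & mask else "-"
            for flag, mask in (("r", 4), ("w", 2), ("x", 1))
        )
    return classes

# describe_mode(0o761) -> {'owner': 'rwx', 'group': 'rw-', 'others': '--x'}
# stat.filemode(stat.S_IFREG | 0o761) -> '-rwxrw---x'
```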

Blocks

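The IDs of data blocks are stored in inodes. The IDs of the first 12 blocks are stored in direct reference fields; in the classic ext2/ext3 layout, larger files reach their remaining blocks through indirect blocks.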
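
To see how far 12 direct references go, the sketch below works out which pointer level holds a given logical block. It assumes the classic ext2/ext3-style scheme of single, double and triple indirect blocks, 4 KiB blocks and 4-byte block IDs; those parameters and the name `locate_block` are assumptions for illustration.

```python
N_DIRECT = 12

def locate_block(index, block_size=4096, pointer_size=4):
    """Say which inode pointer level holds logical block `index` of a file."""
    per_block = block_size // pointer_size   # block IDs that fit in one block
    if index < N_DIRECT:
        return ("direct", index)
    index -= N_DIRECT
    if index < per_block:
        return ("single indirect", index)
    index -= per_block
    if index < per_block ** 2:
        return ("double indirect", index)
    index -= per_block ** 2
    if index < per_block ** 3:
        return ("triple indirect", index)
    raise ValueError("block index beyond what the inode can address")
```

Under these assumptions the direct fields cover the first 48 KiB of a file, so small files never pay for an extra disk read to find their blocks.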

Allocation

  • Contiguous -> Each file occupies a set of contiguous blocks (fast access, but prone to external fragmentation)
  • Linked Allocation -> blocks contain a pointer to the next one (slower access)
  • Indexed -> Each file has an index block that stores pointers to all its data blocks

Groups

A subdivision of the entire disk or partition. Each group has:

  • A block bitmap
  • An inode bitmap
  • An inode table holding the actual inodes

[!INFO] Default block group size in ext4 is 128 MiB (32,768 blocks of 4 KiB, the number of blocks a single bitmap block can track)

Journaling

Journaling ensures the integrity of the file system by recording changes in a journal before they are actually applied to the main file system.

Phases:

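  • Write-ahead logging -> the intended changes are written to the journal before any changes are made to the file system
  • Commit -> the transaction is marked complete in the journal and the changes are actually applied to the file system
  • Crash recovery -> we replay the journal to re-apply committed changes that had not yet reached the main file system; incomplete (uncommitted) transactions are discarded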

Types:

  • Write-Ahead Logging (WAL) -> logs changes before they are applied to the file system
  • Metadata journaling -> only metadata is logged. After a crash, metadata is restored to a consistent state, but file contents may not be.
  • Full journaling -> both metadata and file data are logged
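
The phases can be mimicked with a toy in-memory journal. This is a sketch of the general idea only, not how ext4's journal is laid out; `JournaledStore`, its methods and the record format are all invented for illustration.

```python
class JournaledStore:
    """Toy write-ahead journal in front of a dict that stands in for the disk."""

    def __init__(self):
        self.journal = []   # records written before the disk is touched
        self.disk = {}      # the "main file system": block number -> data

    def log(self, txid, block, data):
        """Write-ahead phase: record the intended change only."""
        self.journal.append(("write", txid, block, data))

    def commit(self, txid):
        """Mark the transaction as complete in the journal."""
        self.journal.append(("commit", txid))

    def recover(self):
        """Replay committed transactions in order; drop incomplete ones."""
        committed = {rec[1] for rec in self.journal if rec[0] == "commit"}
        for rec in self.journal:
            if rec[0] == "write" and rec[1] in committed:
                _, _, block, data = rec
                self.disk[block] = data
        self.journal.clear()

store = JournaledStore()
store.log(1, block=7, data="inode update")
store.commit(1)
store.log(2, block=9, data="half-finished")   # crash before commit
store.recover()
```

After recovery only block 7 has been written: transaction 2 never committed, so its change is thrown away and the file system stays consistent.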

Example: EXT4

  • Journaling
  • Larger file and volume sizes
  • Extents -> range of contiguous blocks, reduces fragmentation
  • Multiblock allocator -> multiple blocks at once
  • Faster fsck (optimized file system check)
  • Pre-allocation
  • Checksums -> ensure integrity

Example: Windows FS

FAT(32)

File allocation table.

No hard links :C. Each directory entry contains:

  • File name -> up to 8 characters plus an extension of up to 3

  • Attributes (one byte)

  • File size -> four-byte field holding the size in bytes, so the maximum file size is just under 4 GiB

  • ID of the first block (4 bytes)

Obviously this scales badly, since one table entry per block cannot cover disks of very large capacity. To cope, blocks are grouped into clusters of 4, 8, 16, ... blocks, and the table addresses clusters instead of individual blocks.

The table itself has one entry per cluster. Each entry holds the number of the next cluster of the same file, so every file is a linked list threaded through the table. Each entry is 4 bytes. Free clusters are marked in the same table.

Free blocks list

Stores a value for each cluster which can indicate:

  • 0x00000000 -> Free cluster
  • Next cluster number -> Cluster is allocated and points to the next one
  • 0x0FFFFFF8 - 0x0FFFFFFF -> EOF (FAT32 uses only the low 28 bits of each entry; the top 4 bits are reserved)
  • 0x0FFFFFF7 -> bad cluster

To find a free block we just need to search the table for the first entry marked free. We keep the number of the last allocated cluster and start the search from there, which shortens the search.
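
Both operations, following a file's chain and finding a free cluster, are short loops over the table. The sketch below models the table as a Python list and assumes FAT32's conventions that entries are masked to 28 bits and that data clusters start at number 2; `file_clusters` and `find_free` are hypothetical names, and a corrupted (cyclic) chain is not handled.

```python
FREE = 0x00000000
BAD = 0x0FFFFFF7        # bad-cluster marker; larger values mean end of chain
MASK = 0x0FFFFFFF       # only the low 28 bits of an entry are significant

def file_clusters(fat, first_cluster):
    """Follow the linked list in the table starting at the file's first cluster."""
    chain, cluster = [], first_cluster
    while 2 <= cluster < BAD:
        chain.append(cluster)
        cluster = fat[cluster] & MASK   # entry holds the next cluster number
    return chain

def find_free(fat, last_allocated=2):
    """Return the first free cluster, searching onward from the last allocated one."""
    n = len(fat)
    for offset in range(n):
        cluster = (last_allocated + offset) % n   # wrap around at the end
        if cluster >= 2 and fat[cluster] == FREE:
            return cluster
    return None

# Entries 0 and 1 are reserved; one file occupies clusters 2 -> 5 -> 3.
fat = [0x0FFFFFF8, 0x0FFFFFFF, 5, 0x0FFFFFFF, 0, 3, 0, 0]
```

Reading the file's n-th cluster means walking n links, which is the slow access that linked allocation is known for; keeping the table in memory is what makes it tolerable.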

NTFS

New Technology File System



  1. The type here is the extension (.pdf, .txt), as opposed to the format, which specifies the grammar of the file.

  2. Columnar, fixed-format ASCII files have fixed field lengths, as opposed to delimited files, where fields can be as long as needed.

  3. Contains information about where to place different parts of the program in memory.