Notes/Operating Systems/Processes and Threads.md

type: theoretical
backlinks: Overview#Multiprogramming, Overview#Multitasking/Timesharing

Process

A program in execution.

Consists of:

  • The program code - text section
  • Current activity - PC, registers
  • Stack -> Function parameters, return addresses, local variables
  • Data section
  • Heap -> dynamically allocated (at run time) memory

The difference between a process and a program is that the program is the passive executable file stored on disk, while the process is an active instance of it, loaded into memory and running (shocker).

Creation

Four events can cause a process to be created:

  1. System init - daemons
  2. A user runs a program (e.g. from a shell)
  3. A running process requests a new process via a system call
  4. Initiation of a batch job [1]

fork()

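A Linux (POSIX) system call that creates a child process as a copy of the caller. It returns 0 in the child and the child's PID in the parent.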

graph LR;

	A["`fork()`"] --> |parent| B["wait"]
	A --> |child|C["`exec()`"]
	C --> D["`exit()`"]
	D --> B
	B --> E["Resumes"]
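The fork/exec/wait flow in the graph can be sketched in a few lines. This is a minimal sketch, assuming a POSIX system (`os.fork` does not exist on Windows); `run_child` is a made-up helper name, and the child just execs a tiny Python one-liner so there is something to wait for.

```python
import os
import sys


def run_child(exit_code: int) -> int:
    """Fork, exec a tiny program in the child, wait for it in the parent."""
    pid = os.fork()  # returns 0 in the child, the child's PID in the parent
    if pid == 0:
        # Child: replace this process image with a new program.
        try:
            os.execv(
                sys.executable,
                [sys.executable, "-c", f"import sys; sys.exit({exit_code})"],
            )
        finally:
            os._exit(127)  # only reached if exec itself failed
    # Parent: block in wait until the child exits, then resume.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print("child exit code:", run_child(7))
```

The parent only gets past `waitpid` once the child has called exit, which is the `exit() --> wait --> Resumes` edge in the graph.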

Hierarchy

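Linux keeps a parent-child relationship between processes, so they form a tree rooted at init. Windows has no such hierarchy: all processes are equal, and the creator only gets a handle to the new process.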

Linux:

graph TD;
	init["init
	pid = 1"]

	login["login
	pid = 8415"]
	
	kthreadd["kthreadd
	pid = 2"]

	sshd["sshd
	pid=3028"]


	bash["bash
	pid=8416"]


	ps["ps
	pid=9298"]

	emacs["emacs
	pid=9204"]

    khelper["khelper
    pid=6"]

    pdflush["pdflush
    pid=200"]

init --> login
init --> kthreadd
init --> sshd

login --> bash

bash --> ps
bash --> emacs

kthreadd --> khelper
kthreadd --> pdflush


Termination

  1. Normal exit (voluntary) - the process returns an exit code to its parent. After exiting, the child stays in the process table until the parent has collected the code (via wait()); in the meantime it is a zombie process. If the parent dies before the child, the child is called an orphan and gets re-parented (traditionally to init). Absolutely fucking crazy naming. Every Linux process should have a parent process (source: unicef).
  2. Error exit (voluntary) - just a special (non-zero) return code
  3. Fatal error (involuntary) - division by zero, invalid opcode; the process is immediately terminated by the system
  4. Killed by another process (involuntary)
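How the parent tells these cases apart can be sketched like this. A rough sketch, assuming a POSIX system; `spawn` and `reap` are made-up helper names. Between the child exiting and the parent calling `waitpid`, the child is a zombie.

```python
import os
import signal
import time


def spawn(child_body):
    """Fork and run child_body() in the child; return the child's PID."""
    pid = os.fork()
    if pid == 0:
        code = 1
        try:
            child_body()
            code = 0
        finally:
            os._exit(code)  # never fall back into the parent's code
    return pid


def reap(pid):
    """Collect the child's status; this is what ends its zombie phase."""
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    return ("other", status)


if __name__ == "__main__":
    print(reap(spawn(lambda: None)))         # normal exit
    print(reap(spawn(lambda: os._exit(3))))  # error exit, special return code
    victim = spawn(lambda: time.sleep(60))
    os.kill(victim, signal.SIGKILL)          # killed by another process
    print(reap(victim))
```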

States

As a state machine

  1. Running
  2. Ready
  3. Blocked (blocking == waiting)
graph TD;
	A["Running"]
	B["Ready"]
	C["Blocked"]

	A --> |1| C
	A --> |2| B
	B --> |3| A
	C --> |4| B
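The edge labels 1-4 are not spelled out in these notes. The sketch below assumes the usual textbook reading (1: the process blocks waiting for something, 2: the scheduler preempts it, 3: the scheduler dispatches it, 4: the thing it waited for becomes available). The event names are made up for illustration.

```python
from enum import Enum


class State(Enum):
    RUNNING = "Running"
    READY = "Ready"
    BLOCKED = "Blocked"


# Allowed transitions, numbered like the edges in the graph.
# Event names ("block", "preempt", ...) are assumptions, not from the notes.
TRANSITIONS = {
    (State.RUNNING, "block"): State.BLOCKED,   # 1
    (State.RUNNING, "preempt"): State.READY,   # 2
    (State.READY, "dispatch"): State.RUNNING,  # 3
    (State.BLOCKED, "wake"): State.READY,      # 4
}


def step(state: State, event: str) -> State:
    """Return the next state, or raise if the graph has no such edge."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} from {state.value}") from None


if __name__ == "__main__":
    s = State.RUNNING
    for event in ("block", "wake", "dispatch"):
        s = step(s, event)
        print(event, "->", s.value)
```

Note there is no Blocked -> Running edge: a process that wakes up has to go through the ready queue first.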

Ready State

  • In this state the process is not waiting for a resource
  • Can be executed
  • Put in a queue (ready queue)

I/O queue

  • Each I/O device has its own queue of processes waiting on it
  • So the OS maintains multiple queues (the ready queue plus one per device)
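A toy model of the ready queue and the per-device queues, just to show how a process moves between them. Everything here (the `Queues` class, its method names, FIFO ordering) is an assumption for illustration; a real scheduler is far more involved.

```python
from collections import deque


class Queues:
    """One ready queue plus one waiting queue per I/O device."""

    def __init__(self, devices):
        self.ready = deque()
        self.io = {name: deque() for name in devices}

    def admit(self, pid):
        self.ready.append(pid)           # process becomes Ready

    def dispatch(self):
        return self.ready.popleft()      # Ready -> Running (FIFO here)

    def block_on(self, pid, device):
        self.io[device].append(pid)      # Running -> Blocked on that device

    def io_done(self, device):
        pid = self.io[device].popleft()  # Blocked -> Ready
        self.ready.append(pid)
        return pid


if __name__ == "__main__":
    q = Queues(["disk", "keyboard"])
    q.admit(1)
    q.admit(2)
    running = q.dispatch()
    q.block_on(running, "disk")
    print("ready:", list(q.ready), "disk:", list(q.io["disk"]))
```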

Timesharing: In-depth

From Multitasking/Timesharing: the output of a running program should not change when we stop it and switch back to it later on.

Context switching

Switching implies that we have to store the values of the registers, flags, PC, etc. of the current process and load the saved values of the next one. Then we continue.

The process control block - PCB

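The OS needs a place to store the status of each process. The PCB is that data structure: typically the process state, PC, saved registers, plus scheduling, memory and open-file information.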

Process table

A list of PCBs (one per process)

Figure: PS

  • Timer (ISR [2]) generates multiple interrupts per second
  • On each switch, the status of the interrupted process is stored in its PCB
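A simulated context switch, to make the save/load step concrete. This is only a sketch: the `PCB` and `CPU` fields are a small assumed subset, and a real switch happens in kernel code on actual hardware registers.

```python
from dataclasses import dataclass, field


@dataclass
class PCB:
    pid: int
    state: str = "ready"
    pc: int = 0
    registers: dict = field(default_factory=dict)


@dataclass
class CPU:
    pc: int = 0
    registers: dict = field(default_factory=dict)


def context_switch(cpu: CPU, current: PCB, nxt: PCB) -> None:
    """Save the CPU state into current's PCB, then load nxt's saved state."""
    current.pc = cpu.pc
    current.registers = dict(cpu.registers)
    current.state = "ready"
    cpu.pc = nxt.pc
    cpu.registers = dict(nxt.registers)
    nxt.state = "running"


if __name__ == "__main__":
    # Process table: one PCB per process, keyed by PID.
    table = {
        1: PCB(pid=1, state="running"),
        2: PCB(pid=2, pc=100, registers={"r0": 7}),
    }
    cpu = CPU(pc=42, registers={"r0": 1})
    context_switch(cpu, table[1], table[2])  # e.g. triggered by the timer
    print(cpu, table[1])
```

Because process 1's PC and registers sit untouched in its PCB, switching back to it later resumes exactly where it left off, which is why the program's output does not change.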

Threads

A thread is a basic unit of CPU utilization, consisting of a thread ID, a program counter, a stack, and a set of registers.

A light-weight process.

Processes vs threads

| Processes | Threads |
| --- | --- |
| Heavyweight | Lighter |
| Each process has its own memory | Threads use the memory of the process they belong to |
| Inter-Process Communication (IPC) is slow | Way faster inter-thread communication |
| Context switching is more expensive | Less expensive |
| Do not share memory | Do share memory |
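The memory difference is easy to see in a small experiment. A minimal sketch, assuming a POSIX system for `os.fork`; both function names are made up.

```python
import os
import threading


def thread_shares_memory():
    data = []
    t = threading.Thread(target=data.append, args=("from thread",))
    t.start()
    t.join()
    return data  # the thread wrote into the same list object


def process_has_own_memory():
    data = []
    pid = os.fork()
    if pid == 0:
        data.append("from child")  # only changes the child's copy
        os._exit(0)
    os.waitpid(pid, 0)
    return data  # the parent's list is untouched


if __name__ == "__main__":
    print("thread: ", thread_shares_memory())
    print("process:", process_has_own_memory())
```

To get the child's result back to the parent you would need some form of IPC (pipe, socket, shared memory), which is the slow part in the comparison.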

Multithreading

  • Traditional processes have a single thread of control [3]
  • If a process has multiple threads of control, it can perform more than one task

Ways to Implement Threads

  • Kernel-Level Threads (KLT)
    • Managed by the OS kernel
    • Each thread is a separate scheduling entity
    • pthread, thread
  • User-Level Threads (ULT)
    • Managed by user-space libraries, OS is unaware
    • Faster context switching
    • Green threads

User Threads and Kernel Threads

  • User threads
    • Implemented by a thread library at the user level
    • Thread creation and scheduling are done in user space
  • Kernel Threads
    • Managed by OS

Relationship models

Many-to-one

  • Many user-level threads map to one kernel thread
  • Management is done by the thread library in user space
  • The entire process blocks whenever a thread makes a blocking syscall
  • Only one thread can access the kernel at a time, so threads can't run in parallel on multiprocessors

One-to-one

Each user thread is mapped to a kernel thread

  • Provides more concurrency

Unfortunately:

  • Creating a user thread requires creating the corresponding kernel thread
  • The overhead of creating kernel threads restricts the number of threads

Many-to-many

Multiplexes many user threads onto a smaller or equal number of kernel threads.

  • Allows creation of however many threads the user wants
  • The kernel can schedule another thread for execution whenever a thread performs a blocking system call

Fork-join

The parent thread forks (creates) child threads, then waits for them to terminate and joins with them, at which point it can retrieve and combine their results.

This is also called synchronous threading: the parent cannot continue until the children have completed their work.
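A minimal fork-join sketch with Python's `threading` module. The function name, the summing task and the default worker count are just assumptions for illustration.

```python
import threading


def fork_join_sum(numbers, workers=4):
    """Split the work, fork one thread per chunk, join, combine results."""
    chunks = [numbers[i::workers] for i in range(workers)]
    partial = [0] * workers

    def work(i):
        partial[i] = sum(chunks[i])  # each child writes only its own slot

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()  # fork
    for t in threads:
        t.join()   # parent blocks here until this child is done
    return sum(partial)  # combine


if __name__ == "__main__":
    print(fork_join_sum(list(range(101))))
```

Each child writes to a separate slot of `partial`, so no locking is needed; the joins guarantee all slots are filled before the parent combines them.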

Parallelism

Thread pool

Issues with threads:

  • Overhead when creating
  • Exhausting system resources

Solution: thread pools - create a number of threads at startup and place them into a pool, where they sit and wait for work.

This helps because threads are reused and shared between tasks:

  • If a task is blocked (e.g., waiting for I/O), the rest of the pool doesn't remain idle; free threads keep picking up other tasks
  • Each thread has its own task queue
  • Whenever a thread finishes its tasks it looks through the other threads' queues and "steals" tasks.
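A small thread pool sketch using the standard library's `ThreadPoolExecutor`. Two caveats: it feeds all workers from a single shared queue (no per-thread queues or stealing as in the bullets above), and it starts its threads on demand up to `max_workers` rather than all at startup. `run_in_pool` and the squaring task are made up for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_in_pool(tasks, pool_size=3):
    """Run many small tasks on a fixed-size pool; report threads used."""
    used = set()

    def work(n):
        used.add(threading.get_ident())  # record which worker ran this task
        return n * n

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(work, tasks))  # results keep input order
    return results, len(used)


if __name__ == "__main__":
    results, n_threads = run_in_pool(range(20))
    print(results)
    print("threads used:", n_threads)
```

Twenty tasks run on at most three threads, so the creation overhead is paid a handful of times instead of once per task.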

  1. A batch job is a scheduled task or a set of commands that are executed without manual intervention - cron

  2. Interrupt service routine - like in LC3

  3. Sequence of programmed instructions that can be managed independently by a scheduler within a computer program