---
type: theoretical
backlinks:
- "[[Overview#Multiprogramming]]"
- "[[Overview#Multitasking/Timesharing]]"
---
## Process
A program in execution.
Consists of:
* The program code - **text section**
* Current activity - **PC**, registers
* **Stack** -> Function parameters, return addresses, local variables
* Data section
* **Heap** -> dynamically allocated (at run time) memory
The difference between a process and a program is that the program is the executable file stored on disk, while the process **is running** (shocker).
### Creation
Four events can cause a process to be created:
1. System init - daemons
2. A user executes a program ("running a program")
3. A running process issues a system call to create a new process
4. Initiation of a batch[^1] job
### `fork()`
A Linux [system call](Overview.md#System%20calls).
```mermaid
graph LR;
A["`fork()`"] --> |parent| B["wait"]
A --> |child|C["`exec()`"]
C --> D["`exit()`"]
D --> B
B --> E["Resumes"]
```
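A minimal C sketch of the diagram above (the `ls -l` program is just a placeholder to exec): the parent forks, the child execs and exits, and the parent waits for the exit code.
```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();              /* create a child process */
    if (pid < 0) {
        perror("fork");
        return EXIT_FAILURE;
    }
    if (pid == 0) {                  /* child: replace its image with a new program */
        execlp("ls", "ls", "-l", (char *)NULL);
        perror("execlp");            /* only reached if exec fails */
        _exit(127);
    }
    int status;
    waitpid(pid, &status, 0);        /* parent: block until the child terminates */
    if (WIFEXITED(status))
        printf("child exited with code %d\n", WEXITSTATUS(status));
    return 0;                        /* parent resumes */
}
```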
### Hierarchy
Linux creates a parent-child relationship between processes; Windows doesn't.
Linux:
```mermaid
graph TD;
init["init
pid = 1"]
login["login
pid = 8415"]
kthreadd["kthreadd
pid = 2"]
sshd["sshd
pid=3028"]
bash["bash
pid=8416"]
ps["ps
pid=9298"]
emacs["emacs
pid=9204"]
khelper["khelper
pid=6"]
pdflush["pdflush
pid=200"]
init --> login
init --> kthreadd
init --> sshd
login --> bash
bash --> ps
bash --> emacs
kthreadd --> khelper
kthreadd --> pdflush
```
### Termination
1. Normal
   Process returns an exit code to its parent. A terminated child stays in the process table as a **zombie process** until the parent reads that code (e.g., via `wait()`). If the parent dies before the child, the child is called an **orphan** and gets adopted by `init`. Absolutely fucking crazy naming. Every Linux process should have a parent process [source: unicef](https://unicef.org).
2. Error - just a special return code
3. Fatal error, involuntary - division by zero, invalid opcode; process is immediately terminated by the system
4. Killed
### States
As a state machine
1. Running
2. Ready
3. Blocked (blocking == waiting)
```mermaid
graph TD;
A["Running"]
B["Ready"]
C["Blocked"]
A --> |1| C
A --> |2| B
B --> |3| A
C --> |4| B
```
Transitions:
1. The process blocks waiting for input/a resource
2. The scheduler picks another process (e.g., the time slice expires)
3. The scheduler picks this process
4. The input/resource becomes available
#### Ready State
- In this state the process is not waiting for a resource
- Can be executed
- Put in a queue (ready queue)
#### I/O queue
- Each I/O device has its own queue
- Multiple such queues are created by the OS
## Timesharing: In-depth
from [Multitasking/Timesharing](Overview.md#Multitasking/Timesharing)
The output of a running program should not change when we pause it and switch back to it later.
### Context switching
Switching implies that we have to store the values of registers, flags, PC, etc. of the current process and load them into the next one. Then we continue.
### The process control block - PCB
The OS needs a place to store the status of each process. This is that data structure.
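A rough sketch of what a PCB could contain (field names are illustrative, not the real Linux `task_struct`):
```c
/* Illustrative PCB layout - real kernels (e.g. Linux's task_struct) store much more. */
typedef enum { RUNNING, READY, BLOCKED } proc_state_t;

typedef struct pcb {
    int           pid;              /* process identifier */
    proc_state_t  state;            /* current scheduling state */
    void         *program_counter;  /* saved PC at the last context switch */
    unsigned long registers[16];    /* saved general-purpose registers */
    void         *stack_pointer;    /* saved stack pointer */
    void         *page_table;       /* memory-management information */
    int           open_files[64];   /* I/O and accounting information */
    struct pcb   *parent;           /* link to the parent process */
} pcb_t;
```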
### Process table
A list of PCBs (one per process)
![Figure: PS](Pasted%20image%2020250419141856.png)
* A timer generates multiple interrupts per second (handled by an `ISR`[^2])
* On each interrupt, the status of the running process is stored in its PCB
## Threads
A thread is a basic unit of CPU utilization, consisting of a thread ID, a program counter, a register set, and a stack.
A light-weight process.
![](Pasted%20image%2020250419143713.png)
### Processes vs threads
| Processes | Threads |
| ----------------------------------------- | ------------------------------------------------ |
| Heavyweight | Lighter |
| Each process has its own memory | Threads use the memory of the process they belong to |
| Inter-Process Communication (IPC) is slow | Way faster inter-thread communication |
| Context switching is more expensive | Less expensive |
| Do not share memory | Do share memory |
### Multithreading
- Traditional processes have a single thread of control[^3]
- If a process has multiple threads of control, it can perform more than one task
### Ways to Implement Threads
* Kernel-Level Threads (KLT)
* Managed by the OS kernel
* Each thread is a separate scheduling entity
* `pthread`, `thread`
* User-Level Threads (ULT)
* Managed by user-space libraries, OS is unaware
* Faster context switching
* Green threads
### User Threads and Kernel Threads
- **User threads**
- Implemented by a thread library at the user level
	- Thread creation and scheduling are done in user space
- **Kernel Threads**
- Managed by OS
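A minimal sketch using the POSIX `pthread` library mentioned above (the `worker` function is made up for illustration); compile with `-pthread`:
```c
#include <pthread.h>
#include <stdio.h>

/* Illustrative worker: each thread just prints the id it was given. */
static void *worker(void *arg) {
    int id = *(int *)arg;
    printf("thread %d running\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[4];
    int ids[4];

    for (int i = 0; i < 4; i++) {      /* create: one kernel-scheduled thread each */
        ids[i] = i;
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < 4; i++)        /* wait for all threads to finish */
        pthread_join(threads[i], NULL);
    return 0;
}
```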
### Relationship models
#### Many-to-one
* Many user-level threads map to one kernel thread
* Management is done by the thread library in user space
* The entire process blocks whenever a thread makes a blocking syscall
* Only **one** thread can access the kernel at a time (so threads can't run in parallel on multiprocessors)
#### One-to-one
Each user thread is mapped to a kernel thread
- Provides more concurrency
Unfortunately:
- Creating a user thread requires creating the corresponding kernel thread
- Overhead of creating kernel threads restricts the number of threads
#### Many-to-many
Multiplexes many user threads onto a smaller or equal ($\leq$) number of kernel threads.
- Allows creation of however many threads the user wants
- The kernel can schedule another thread for execution whenever a thread performs a blocking system call
#### Fork-join
The parent forks (creates) child threads and then waits for them to terminate, joining with them, at which point it can retrieve and combine their results.
This is also called **synchronous threading**. Parent **cannot** continue until the work has been completed.
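A minimal fork-join sketch with `pthread` (the `data` array and `partial_sum` helper are made up): the parent forks two children, joins them, then combines the partial results.
```c
#include <pthread.h>
#include <stdio.h>

static int data[8] = {1, 2, 3, 4, 5, 6, 7, 8};

struct range { int lo, hi, sum; };   /* made-up task descriptor */

/* Each child sums its half of the array and stores the partial result. */
static void *partial_sum(void *arg) {
    struct range *r = arg;
    r->sum = 0;
    for (int i = r->lo; i < r->hi; i++)
        r->sum += data[i];
    return NULL;
}

int main(void) {
    struct range halves[2] = { {0, 4, 0}, {4, 8, 0} };
    pthread_t t[2];

    for (int i = 0; i < 2; i++)      /* fork: create the children */
        pthread_create(&t[i], NULL, partial_sum, &halves[i]);
    for (int i = 0; i < 2; i++)      /* join: parent cannot continue until children finish */
        pthread_join(t[i], NULL);

    printf("total = %d\n", halves[0].sum + halves[1].sum);  /* combine results */
    return 0;
}
```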
[^1]: A batch job is a scheduled task or a set of commands that are executed without manual intervention - **cron**
[^2]: interrupt service routine - like in LC3
[^3]: sequence of programmed instructions that can be managed independently by a scheduler within a computer program
##### Parallelism
![](Pasted%20image%2020250421222538.png)
## Thread pool
Issues with threads:
- Overhead when creating them
- Exhausting system resources
Solution: thread pools - create a number of threads at startup and place them into a pool, where they sit and wait for work.
This helps because:
Sharing threads:
- If one thread is blocked (e.g., waiting for I/O), pending work doesn't sit idle; another pool thread can pick it up
- Each thread has its own task queue
- Whenever a thread finishes its own tasks, it looks through the other threads' queues and "steals" work
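A minimal single-queue pool sketch with `pthread` (simpler than the per-thread-queue / work-stealing variant described above; all names are illustrative): workers are created once at startup and sleep on a condition variable until tasks arrive.
```c
#include <pthread.h>
#include <stdio.h>

#define NUM_WORKERS 4
#define MAX_TASKS   64   /* sketch assumes fewer than MAX_TASKS outstanding tasks */

/* A task is just a function pointer plus an argument. */
typedef struct { void (*fn)(int); int arg; } task_t;

static task_t queue[MAX_TASKS];
static int head = 0, tail = 0, shutting_down = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

/* Submit a task to the shared queue and wake one sleeping worker. */
static void submit(void (*fn)(int), int arg) {
    pthread_mutex_lock(&lock);
    queue[tail++ % MAX_TASKS] = (task_t){ fn, arg };
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
}

/* Workers sit in the pool waiting for work, then run tasks as they arrive. */
static void *worker(void *unused) {
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (head == tail && !shutting_down)
            pthread_cond_wait(&not_empty, &lock);
        if (head == tail && shutting_down) {   /* queue drained, time to exit */
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        task_t t = queue[head++ % MAX_TASKS];
        pthread_mutex_unlock(&lock);
        t.fn(t.arg);                           /* run the task outside the lock */
    }
}

static void print_task(int n) { printf("task %d done\n", n); }

int main(void) {
    pthread_t pool[NUM_WORKERS];
    for (int i = 0; i < NUM_WORKERS; i++)      /* create threads once, at startup */
        pthread_create(&pool[i], NULL, worker, NULL);

    for (int i = 0; i < 10; i++)               /* reuse the same threads for many tasks */
        submit(print_task, i);

    pthread_mutex_lock(&lock);                 /* tell workers to drain the queue and exit */
    shutting_down = 1;
    pthread_cond_broadcast(&not_empty);
    pthread_mutex_unlock(&lock);

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(pool[i], NULL);
    return 0;
}
```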