---
type: theoretical
backlinks:
- "[[Overview#Multiprogramming]]"
- "[[Overview#Multitasking/Timesharing]]"
---
## Process
A program in execution.
Consists of:
* The program code - **text section**
* Current activity - **PC**, registers
* **Stack** -> Function parameters, return addresses, local variables
* Data section
* **Heap** -> dynamically allocated (at run time) memory
The difference between a process and a program is that the program is the executable file stored on disk, while the process **is running** (shocker).
### Creation
Four events can cause a process to be created:
1. System init - daemons
2. A user executes a program ("running a program")
3. A running process issues a system call to create a new process
4. Initiation of a batch[^1] job
### `fork()`
A Linux [system call](Overview.md#System%20calls).
```mermaid
graph LR;
A["`fork()`"] --> |parent| B["wait"]
A --> |child|C["`exec()`"]
C --> D["`exit()`"]
D --> B
B --> E["Resumes"]
```
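A minimal C sketch of the diagram above (the `ls -l` program is just a placeholder to exec): the parent forks, the child execs and exits, and the parent waits for the exit code.
```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();              /* create a child process */
    if (pid < 0) {
        perror("fork");
        return EXIT_FAILURE;
    }
    if (pid == 0) {                  /* child: replace its image with a new program */
        execlp("ls", "ls", "-l", (char *)NULL);
        perror("execlp");            /* only reached if exec fails */
        _exit(127);
    }
    int status;
    waitpid(pid, &status, 0);        /* parent: block until the child terminates */
    if (WIFEXITED(status))
        printf("child exited with code %d\n", WEXITSTATUS(status));
    return 0;                        /* parent resumes */
}
```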
### Hierarchy
Linux creates a parent-child relationship between processes; Windows doesn't.
Linux:
```mermaid
graph TD;
init["init
pid = 1"]
login["login
pid = 8415"]
kthreadd["kthreadd
pid = 2"]
sshd["sshd
pid=3028"]
bash["bash
pid=8416"]
ps["ps
pid=9298"]
emacs["emacs
pid=9204"]
khelper["khelper
pid=6"]
pdflush["pdflush
pid=200"]
init --> login
init --> kthreadd
init --> sshd
login --> bash
bash --> ps
bash --> emacs
kthreadd --> khelper
kthreadd --> pdflush
```
### Termination
1. Normal
   Process returns an exit code to its parent. A terminated child stays in the process table as a **zombie process** until the parent reads that code (e.g., via `wait()`). If the parent dies before the child, the child is called an **orphan** and gets adopted by `init`. Absolutely fucking crazy naming. Every Linux process should have a parent process [source: unicef](https://unicef.org).
2. Error - just a special return code
3. Fatal error, involuntary - division by zero, invalid opcode; process is immediately terminated by the system
4. Killed
### States
As a state machine
1. Running
2. Ready
3. Blocked (blocking == waiting)
```mermaid
graph TD;
A["Running"]
B["Ready"]
C["Blocked"]
A --> |1| C
A --> |2| B
B --> |3| A
C --> |4| B
```
Transitions:
1. The process blocks waiting for input/a resource
2. The scheduler picks another process (e.g., the time slice expires)
3. The scheduler picks this process
4. The input/resource becomes available
#### Ready State
- In this state the process is not waiting for a resource
- Can be executed
- Put in a queue (ready queue)
#### I/O queue
- Each I/O device has its own queue
- Multiple such queues are created by the OS
## Timesharing: In-depth
from [Multitasking/Timesharing](Overview.md#Multitasking/Timesharing)
The output of a running program should not change when we pause it and switch back to it later.
### Context switching
Switching implies that we have to store the values of registers, flags, PC, etc. of the current process and load them into the next one. Then we continue.
### The process control block - PCB
The OS needs a place to store the status of each process. This is that data structure.
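A rough sketch of what a PCB could contain (field names are illustrative, not the real Linux `task_struct`):
```c
/* Illustrative PCB layout - real kernels (e.g. Linux's task_struct) store much more. */
typedef enum { RUNNING, READY, BLOCKED } proc_state_t;

typedef struct pcb {
    int           pid;              /* process identifier */
    proc_state_t  state;            /* current scheduling state */
    void         *program_counter;  /* saved PC at the last context switch */
    unsigned long registers[16];    /* saved general-purpose registers */
    void         *stack_pointer;    /* saved stack pointer */
    void         *page_table;       /* memory-management information */
    int           open_files[64];   /* I/O and accounting information */
    struct pcb   *parent;           /* link to the parent process */
} pcb_t;
```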
### Process table
A list of PCBs (one per process)
![Figure: PS](Pasted%20image%2020250419141856.png)
* A timer generates multiple interrupts per second (handled by an `ISR`[^2])
* On each interrupt, the status of the running process is stored in its PCB
## Threads
A thread is a basic unit of CPU utilization, consisting of a thread ID, a program counter, a register set, and a stack.
A light-weight process.
![](Pasted%20image%2020250419143713.png)
### Processes vs threads
| Processes | Threads |
| ----------------------------------------- | ------------------------------------------------ |
| Heavyweight | Lighter |
| Each process has its own memory | Threads use the memory of the process they belong to |
| Inter-Process Communication (IPC) is slow | Way faster inter-thread communication |
| Context switching is more expensive | Less expensive |
| Do not share memory | Do share memory |
### Multithreading
- Traditional processes have a single thread of control[^3]
- If a process has multiple threads of control, it can perform more than one task
### Ways to Implement Threads
* Kernel-Level Threads (KLT)
* Managed by the OS kernel
* Each thread is a separate scheduling entity
* `pthread`, `thread`
* User-Level Threads (ULT)
* Managed by user-space libraries, OS is unaware
* Faster context switching
* Green threads
### User Threads and Kernel Threads
- **User threads**
- Implemented by a thread library at the user level
	- Thread creation and scheduling are done in user space
- **Kernel Threads**
- Managed by OS
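A minimal sketch using the POSIX `pthread` library mentioned above (the `worker` function is made up for illustration); compile with `-pthread`:
```c
#include <pthread.h>
#include <stdio.h>

/* Illustrative worker: each thread just prints the id it was given. */
static void *worker(void *arg) {
    int id = *(int *)arg;
    printf("thread %d running\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[4];
    int ids[4];

    for (int i = 0; i < 4; i++) {      /* create: one kernel-scheduled thread each */
        ids[i] = i;
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < 4; i++)        /* wait for all threads to finish */
        pthread_join(threads[i], NULL);
    return 0;
}
```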
### Relationship models
#### Many-to-one
* Many user-level threads map to one kernel thread
* Management is done by the thread library in user space
* The entire process blocks whenever a thread makes a blocking syscall
* Only **one** thread can access the kernel at a time (so threads can't run in parallel on multiprocessors)
#### One-to-one
Each user thread is mapped to a kernel thread
- Provides more concurrency
Unfortunately:
- Creating a user thread requires creating the corresponding kernel thread
- Overhead of creating kernel threads restricts the number of threads
#### Many-to-many
Multiplexes many user threads onto a smaller or equal ($\leq$) number of kernel threads.
- Allows creation of however many threads the user wants
- The kernel can schedule another thread for execution whenever a thread performs a blocking system call
#### Fork-join
The parent forks (creates) child threads and then waits for them to terminate, joining with them, at which point it can retrieve and combine their results.
This is also called **synchronous threading**. Parent **cannot** continue until the work has been completed.
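A minimal fork-join sketch with `pthread` (the `data` array and `partial_sum` helper are made up): the parent forks two children, joins them, then combines the partial results.
```c
#include <pthread.h>
#include <stdio.h>

static int data[8] = {1, 2, 3, 4, 5, 6, 7, 8};

struct range { int lo, hi, sum; };   /* made-up task descriptor */

/* Each child sums its half of the array and stores the partial result. */
static void *partial_sum(void *arg) {
    struct range *r = arg;
    r->sum = 0;
    for (int i = r->lo; i < r->hi; i++)
        r->sum += data[i];
    return NULL;
}

int main(void) {
    struct range halves[2] = { {0, 4, 0}, {4, 8, 0} };
    pthread_t t[2];

    for (int i = 0; i < 2; i++)      /* fork: create the children */
        pthread_create(&t[i], NULL, partial_sum, &halves[i]);
    for (int i = 0; i < 2; i++)      /* join: parent cannot continue until children finish */
        pthread_join(t[i], NULL);

    printf("total = %d\n", halves[0].sum + halves[1].sum);  /* combine results */
    return 0;
}
```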
[^1]: A batch job is a scheduled task or a set of commands that are executed without manual intervention - **cron**
[^2]: interrupt service routine - like in LC3
[^3]: sequence of programmed instructions that can be managed independently by a scheduler within a computer program
##### Parallelism
![](Pasted%20image%2020250421222538.png)
## Thread pool
Issues with threads:
- Overhead when creating them
- Exhausting system resources
Solution: thread pools - create a number of threads at startup and place them into a pool, where they sit and wait for work.
This helps because:
Sharing threads:
- If one thread is blocked (e.g., waiting for I/O), pending work doesn't sit idle; another pool thread can pick it up
- Each thread has its own task queue
- Whenever a thread finishes its own tasks, it looks through the other threads' queues and "steals" work
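A minimal single-queue pool sketch with `pthread` (simpler than the per-thread-queue / work-stealing variant described above; all names are illustrative): workers are created once at startup and sleep on a condition variable until tasks arrive.
```c
#include <pthread.h>
#include <stdio.h>

#define NUM_WORKERS 4
#define MAX_TASKS   64   /* sketch assumes fewer than MAX_TASKS outstanding tasks */

/* A task is just a function pointer plus an argument. */
typedef struct { void (*fn)(int); int arg; } task_t;

static task_t queue[MAX_TASKS];
static int head = 0, tail = 0, shutting_down = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

/* Submit a task to the shared queue and wake one sleeping worker. */
static void submit(void (*fn)(int), int arg) {
    pthread_mutex_lock(&lock);
    queue[tail++ % MAX_TASKS] = (task_t){ fn, arg };
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
}

/* Workers sit in the pool waiting for work, then run tasks as they arrive. */
static void *worker(void *unused) {
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (head == tail && !shutting_down)
            pthread_cond_wait(&not_empty, &lock);
        if (head == tail && shutting_down) {   /* queue drained, time to exit */
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        task_t t = queue[head++ % MAX_TASKS];
        pthread_mutex_unlock(&lock);
        t.fn(t.arg);                           /* run the task outside the lock */
    }
}

static void print_task(int n) { printf("task %d done\n", n); }

int main(void) {
    pthread_t pool[NUM_WORKERS];
    for (int i = 0; i < NUM_WORKERS; i++)      /* create threads once, at startup */
        pthread_create(&pool[i], NULL, worker, NULL);

    for (int i = 0; i < 10; i++)               /* reuse the same threads for many tasks */
        submit(print_task, i);

    pthread_mutex_lock(&lock);                 /* tell workers to drain the queue and exit */
    shutting_down = 1;
    pthread_cond_broadcast(&not_empty);
    pthread_mutex_unlock(&lock);

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(pool[i], NULL);
    return 0;
}
```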