Interaction with the environment, e.g. computer vision, speech recognition
- Make decisions
process incoming information, analyze it, and make decisions based on it.
e.g. self-driving cars, game playing
- Learn
improve performance over time, i.e. data-driven adaptation based on observations *only* (unsupervised learning) or on observations and feedback (supervised learning)
## Relevant Mathematical Notation
Models are written as $m = \gamma (D)$, where $D$ is the data and $\gamma$ is the function that turns the data into the model $m$.
Example - a model that predicts the price of a house based on its size and location:
$m = \gamma ( \beta_0 + \beta_1 x_1 + \beta_2 x_2)$
where $x_1$ is the size of the house and $x_2$ is the location of the house.
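To make the notation concrete, here is a minimal sketch of fitting such a linear model with ordinary least squares in NumPy. The data, the numeric "location score", and the choice of $\gamma$ as the identity are all made up for illustration, not taken from the lecture:
```python
import numpy as np

# Hypothetical training data: size in m^2 (x1) and a numeric location score (x2).
X = np.array([[50, 3], [80, 5], [120, 8], [200, 9]], dtype=float)
prices = np.array([150_000, 240_000, 380_000, 650_000], dtype=float)

# Prepend a column of ones so that beta_0 (the intercept) is estimated as well.
X_design = np.hstack([np.ones((len(X), 1)), X])

# Least-squares estimate of (beta_0, beta_1, beta_2).
beta, *_ = np.linalg.lstsq(X_design, prices, rcond=None)

# Predict the price of a new house: m = gamma(beta_0 + beta_1*x1 + beta_2*x2),
# with gamma taken to be the identity function here.
new_house = np.array([1.0, 100.0, 7.0])
print(beta, new_house @ beta)
```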
## Unsupervised learning
- Compression
Represent all the data in a more compact form (few features)
- Clustering
Identify groups of similar data points
- Reduction
Reduce the dimensionality of the data, i.e. represent a large amount of data by a few prototypes[^1]
The above aims define a **cost function** or optimization strategy, which is used to teach the machine to learn, but there is no feedback from the environment (hence **un**supervised learning).
Example:
Consider a dataset of images of cats and dogs. We can use unsupervised learning to identify the features that are common to all cats and all dogs. This can be used to classify new images of cats and dogs.
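As a rough sketch of what clustering could look like here, assuming the images have already been turned into two numeric features (the feature names and values below are invented):
```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical feature vectors extracted from cat/dog images,
# e.g. [relative ear size, relative snout length].
features = np.array([
    [0.20, 0.90], [0.25, 0.85], [0.30, 0.95],   # cat-like
    [0.80, 0.20], [0.75, 0.30], [0.90, 0.25],   # dog-like
])

# No labels are used: k-means only minimizes a cost function
# (within-cluster distance) to split the data into 2 groups.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels)  # e.g. [0 0 0 1 1 1]; the cluster ids carry no "cat"/"dog" meaning by themselves
```
New images can then be assigned to whichever cluster centre they are closest to.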
## Supervised learning
*Classification/Regression*
Data: observations (e.g. images, text) together with labels (e.g. cat/dog, spam/not spam).
Regression problems:
- Predict quantitative values, e.g. house prices, stock prices, etc.
e.g. predict the weight of a cow based on its size:
$m = \gamma ( \beta_0 + \beta_1 x_1)$[^2]
where $x_1$ is the size of the cow.
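For instance, with made-up fitted parameters $\beta_0 = 50$ and $\beta_1 = 0.3$, and taking $\gamma$ to be the identity, a cow of size $x_1 = 500$ gets the prediction
$$
m = 50 + 0.3 \cdot 500 = 200
$$
(the numbers are only there to show how the formula is evaluated, not realistic values).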
Classification problems:
- Predict qualitative values, e.g. cat/dog, spam/not spam, etc.
- Binary classification: two classes
- Multi-class classification: more than two classes
> [!IMPORTANT]
> It is crucial to find the right features to represent the data. The model is only as good as the features used to represent the data.
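A minimal classification sketch in the same spirit, with invented features and labels (scikit-learn's logistic regression is just one possible choice of model):
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical hand-picked features per email: [number of links, fraction of capital letters].
X = np.array([[1, 0.05], [0, 0.02], [12, 0.40], [8, 0.35], [2, 0.10], [15, 0.55]])
y = np.array([0, 0, 1, 1, 0, 1])  # 0 = not spam, 1 = spam (binary classification)

clf = LogisticRegression().fit(X, y)

# Predict the class of a new email; the prediction is only as good
# as the two features chosen to represent the data.
print(clf.predict([[10, 0.30]]))
```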
- Semi-supervised learning
Partially labeled data, e.g. some images are labeled and some are not. Extend supervised learning by making predictions on the unlabeled data and using those predictions to improve the model.
- Reinforcement learning
Delayed reward (feedback) from the environment, e.g. game playing, robotics, etc.
- Transfer learning, few-shot learning, one-shot learning
Use knowledge from one task to improve performance on another task. e.g. use knowledge from a large dataset to improve performance on a smaller dataset.
## Deeper look at reinforcement learning
There's a reward signal evaluating the outcome of past actions.
Problems involving an agent[^3], an environment, and a reward signal.
The goal is to learn a policy[^4] that maximizes the cumulative reward.
[Markov Decision Process](https://en.wikipedia.org/wiki/Markov_decision_process)[^5] (MDP) is a mathematical framework for modeling decision-making in situations where outcomes are partly random and partly under the control of a decision maker.
An MDP consists of:
- A set of states $S$
- A set of actions $A$
- A reward function $R$
- A transition function $P$
- A discount factor $\gamma$
It can be represented as a tuple $(S, A, R, P, \gamma)$.
Or a graph:
```mermaid
graph TD
    M[MDP] --> S[States]
    M --> A[Actions]
    M --> R[Reward function]
    M --> P[Transition function]
    M --> D[Discount factor]
```
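As a concrete toy instance of the tuple $(S, A, R, P, \gamma)$ (the states, actions, and numbers below are invented purely for illustration):
```python
# A tiny made-up MDP: a robot that can recharge or explore.
S = ["low_battery", "high_battery"]          # states
A = ["recharge", "explore"]                  # actions
gamma = 0.9                                  # discount factor

# Reward function R(s, a): immediate reward for taking action a in state s.
R = {
    ("low_battery", "recharge"): 0.0,
    ("low_battery", "explore"): -1.0,        # risky: the robot may shut down
    ("high_battery", "recharge"): 0.0,
    ("high_battery", "explore"): 1.0,
}

# Transition function P(s' | s, a): probability of ending up in s' after action a in state s.
P = {
    ("low_battery", "recharge"): {"high_battery": 1.0},
    ("low_battery", "explore"): {"low_battery": 0.7, "high_battery": 0.3},
    ("high_battery", "recharge"): {"high_battery": 1.0},
    ("high_battery", "explore"): {"high_battery": 0.6, "low_battery": 0.4},
}
```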
The process itself can be represented as a sequence of states, actions, and rewards:
$$
s_0, a_0, r_1, s_1, a_1, r_2, s_2, a_2, r_3, \dots
$$
Something makes me feel like this will be in the exam.
The goal of Q-learning is to find the optimal policy by learning the optimal Q-values for each state-action pair.
What's a Q-value? It's the expected return starting from state $s$, taking action $a$, and following policy $\pi$.
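Written out with the discount factor $\gamma$ from the MDP (one standard convention; the exact notation in the lecture may differ):
$$
Q_\pi(s, a) = \mathbb{E}_\pi \left[ \sum_{t=0}^{\infty} \gamma^t \, r_{t+1} \;\middle|\; s_0 = s,\, a_0 = a \right]
$$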
$$
Q^*(s, a) = \max_\pi Q_\pi(s, a)
$$
The optimal Q-value $Q^*(s, a)$ is the maximum Q-value for state $s$ and action $a$ over all policies. The algorithm iteratively updates the Q-values based on the Bellman equation; this is called **value iteration**.
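One common way to write that update, assuming the reward depends only on the state-action pair (the notation in the lecture may differ):
$$
Q_{k+1}(s, a) = R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, \max_{a'} Q_k(s', a')
$$
Q-learning performs a sampled, incremental version of this update from observed transitions, so it does not need $P$ and $R$ to be known explicitly.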
[^1]: Prototypes in this context are representative samples of the data. For example, if we have a dataset of images of cats and dogs, we can represent the dataset by a few images of cats and dogs that are representative of the whole dataset.
[^2]: Parametrization is the process of defining a model in terms of its parameters. For example, in the model $m = \gamma ( \beta_0 + \beta_1 x_1)$, $\beta_0$ and $\beta_1$ are the parameters of the model.
[^3]: An agent is an entity that interacts with the environment. For example, a self-driving car is an agent that interacts with the environment (the road, other cars, etc.) to achieve a goal (e.g. reach a destination).
[^4]: A deterministic policy maps each state to a single action, while a stochastic policy maps each state to a probability distribution over actions. For example, a deterministic policy might map state $s$ to action $a$, while a stochastic policy might map state $s$ to a probability distribution over actions.