---
type: mixed
---

## Prefix Function ($\pi$)

The prefix function is a tool used in pattern matching algorithms, particularly in the **Knuth-Morris-Pratt (KMP) algorithm**. It preprocesses the pattern once so that the search does not have to re-examine text characters that have already been matched.

### Definition

For a string $P$ of length $m$, the prefix function $\pi[i]$ for $i = 1, 2, \ldots, m$ is the length of the longest proper prefix of the substring $P[1 \ldots i]$ that is also a suffix of this substring.
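
The definition can be turned into code almost word for word. The following is a minimal sketch, not an efficient implementation: the function name `prefix_function_naive` is made up for illustration, and because Python strings are 0-indexed, the value for the 1-based index $i$ is stored at list position `i - 1`.

```python
def prefix_function_naive(pattern: str) -> list[int]:
    """Compute pi directly from the definition (roughly cubic time).

    pi[i - 1] holds the value for the 1-based index i used in the text.
    """
    m = len(pattern)
    pi = [0] * m
    for i in range(1, m + 1):       # i is the 1-based length of P[1..i]
        sub = pattern[:i]           # the substring P[1..i]
        # Try proper prefix lengths from longest to shortest.
        for k in range(i - 1, 0, -1):
            if sub[:k] == sub[-k:]:  # prefix of length k equals suffix of length k
                pi[i - 1] = k
                break
    return pi


print(prefix_function_naive("aabaaab"))  # [0, 1, 0, 1, 2, 2, 3]
```

This version is only meant to mirror the definition; KMP relies on a linear-time computation instead.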

### Key Points

1. A proper prefix of a string is a prefix that is not equal to the entire string. [^1]
2. $\pi[i]$ helps skip unnecessary comparisons in pattern matching: after a mismatch, it gives the length of the partial match that can be kept, and therefore the next position in $P$ to compare.
3. $\pi[1] = 0$ always, since the only proper prefix of a single character is the empty string.

### Example

For the pattern $P = "ababcab"$:

- $P[1] = "a"$: $\pi[1] = 0$.
- $P[1 \ldots 2] = "ab"$: No non-empty proper prefix is also a suffix, so $\pi[2] = 0$.
- $P[1 \ldots 3] = "aba"$: Prefix "a" matches suffix "a", so $\pi[3] = 1$.
- $P[1 \ldots 4] = "abab"$: Prefix "ab" matches suffix "ab", so $\pi[4] = 2$.
- Continuing in the same way gives $\pi[5] = 0$, $\pi[6] = 1$, and $\pi[7] = 2$, so $\pi = [0, 0, 1, 2, 0, 1, 2]$ for the entire pattern.
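
The example can be checked with the usual linear-time computation, which reuses earlier values of $\pi$ instead of re-comparing prefixes. This is a sketch under the assumption of 0-based Python indexing (so `pi[i]` corresponds to $\pi[i + 1]$ in the notation above); the name `prefix_function` is chosen here for illustration.

```python
def prefix_function(pattern: str) -> list[int]:
    """Linear-time prefix function; pi[i] corresponds to pi[i + 1] in 1-based notation."""
    m = len(pattern)
    pi = [0] * m
    k = 0  # length of the longest proper prefix that is also a suffix so far
    for i in range(1, m):
        # Fall back to shorter candidates until the next character fits.
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi


print(prefix_function("ababcab"))  # [0, 0, 1, 2, 0, 1, 2]
```

The inner `while` loop only ever decreases `k`, and `k` grows by at most one per character, which is why the total work stays linear in the pattern length.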

---

## Knuth-Morris-Pratt (KMP) Algorithm

The KMP algorithm is a pattern matching algorithm that uses the prefix function $\pi$ to efficiently search for occurrences of a pattern $P$ in a text $T$.

### Key Idea

When a mismatch occurs during the comparison of $P$ with $T$, use the prefix function $\pi$ to determine the next position in $P$ to continue matching, rather than restarting from the beginning.

### Steps

1. Compute the prefix function $\pi$ for the pattern $P$.
2. Search:
   - Compare $P$ with substrings of $T$, character by character.
   - If there is a mismatch between $P[j]$ and $T[i]$ with $j > 1$, the first $j - 1$ characters of $P$ have matched, so resume by comparing $T[i]$ with $P[\pi[j-1] + 1]$ rather than restarting at $P[1]$. If $j = 1$, move on to $T[i+1]$.
3. The algorithm runs in $O(n + m)$ time [complexity](Complexity.md), where $n$ is the length of $T$ and $m$ is the length of $P$.
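
The steps above can be sketched in Python as follows. This is one possible rendering rather than a reference implementation: it uses 0-based indices, reports the 0-based start of every occurrence (including overlapping ones), and the name `kmp_search` is an assumption made for this example.

```python
def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the 0-based start index of every occurrence of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []

    # Step 1: prefix function of the pattern (0-based).
    pi = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k

    # Step 2: scan the text once; j counts pattern characters currently matched.
    matches = []
    j = 0
    for i, ch in enumerate(text):
        while j > 0 and ch != pattern[j]:
            j = pi[j - 1]           # keep the longest reusable partial match
        if ch == pattern[j]:
            j += 1
        if j == m:                  # full match ending at position i
            matches.append(i - m + 1)
            j = pi[j - 1]           # continue, allowing overlapping matches
    return matches


print(kmp_search("abababcabab", "abab"))  # [0, 2, 7]
```

The text index `i` never moves backwards, which is where the $O(n + m)$ bound comes from.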

---

## Rabin-Karp Algorithm

The Rabin-Karp algorithm is another pattern matching algorithm, notable for using hashing to identify potential matches.

### Key Idea

Instead of comparing substrings character by character, the algorithm compares hash values of the pattern and substrings of the text.

### Steps

1. Compute the hash value of the pattern $P$ and the first substring of the text $T$ of length $m$.
2. Slide the window over $T$ and compute hash values for the next substrings in constant time using a rolling hash. [^2]
3. If the hash value of a substring matches the hash value of $P$, compare the actual strings to confirm the match.
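
A minimal sketch of these three steps, assuming a polynomial hash over character codes. The function name `rabin_karp` and the default `base` and `modulus` values are illustrative choices, not something prescribed by the algorithm.

```python
def rabin_karp(text: str, pattern: str,
               base: int = 256, modulus: int = 1_000_000_007) -> list[int]:
    """Return the 0-based start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    high = pow(base, m - 1, modulus)  # weight of the window's leading character

    # Step 1: hash the pattern and the first window of the text.
    p_hash = w_hash = 0
    for k in range(m):
        p_hash = (p_hash * base + ord(pattern[k])) % modulus
        w_hash = (w_hash * base + ord(text[k])) % modulus

    matches = []
    for s in range(n - m + 1):
        # Step 3: on a hash hit, confirm with a real string comparison.
        if w_hash == p_hash and text[s:s + m] == pattern:
            matches.append(s)
        # Step 2: roll the hash forward by one character in constant time.
        if s < n - m:
            w_hash = ((w_hash - ord(text[s]) * high) * base
                      + ord(text[s + m])) % modulus
    return matches


print(rabin_karp("abracadabra", "abra"))  # [0, 7]
```

Python's `%` always returns a non-negative result for a positive modulus, so the subtraction in the rolling step needs no extra correction here; in languages with fixed-width integers that step usually needs more care.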

### Hash Function

The hash function is typically chosen such that it is fast to compute and minimizes collisions:

$$
h(s) = (s[1] \cdot p^{m-1} + s[2] \cdot p^{m-2} + \ldots + s[m] \cdot p^0) \bmod q,
$$

where:

- $p$ is a base (e.g., a small prime number),
- $q$ is a large prime modulus, which keeps hash values bounded (avoiding overflow) and makes collisions unlikely.
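
With this polynomial form, the rolling update follows directly from the definition: remove the contribution of the leading character, multiply by $p$, and add the new trailing character. Writing the window as $T[i \ldots i+m-1]$ (notation assumed here for illustration):

$$
h(T[i+1 \ldots i+m]) = \big( (h(T[i \ldots i+m-1]) - T[i] \cdot p^{m-1}) \cdot p + T[i+m] \big) \bmod q
$$

As a small check, treat decimal digits as characters and take the illustrative values $p = 10$ and $q = 13$. Then $h("314") = 314 \bmod 13 = 2$, and sliding the window to "141" gives $((2 - 3 \cdot 100) \cdot 10 + 1) \bmod 13 = -2979 \bmod 13 = 11$, which agrees with the direct computation $141 \bmod 13 = 11$.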
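
### Complexity

- Best and typical case: $O(n + m)$, where $n$ is the length of the text and $m$ is the length of the pattern. This holds when few windows share a hash value with $P$, so few full comparisons are needed.
- Worst Case: $O(nm)$, when many windows have the same hash value as $P$ (through collisions or genuine matches) and each one triggers an $O(m)$ comparison.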
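
The effect of collisions can be made visible by counting how often a hash hit forces a full comparison. The sketch below is purely illustrative: `count_hash_hits` is a hypothetical helper, and `modulus=1` is a deliberately degenerate choice that makes every window collide with the pattern.

```python
def count_hash_hits(text: str, pattern: str,
                    base: int, modulus: int) -> tuple[int, int]:
    """Return (hash_hits, true_matches) for one Rabin-Karp pass."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return 0, 0

    high = pow(base, m - 1, modulus)
    p_hash = w_hash = 0
    for k in range(m):
        p_hash = (p_hash * base + ord(pattern[k])) % modulus
        w_hash = (w_hash * base + ord(text[k])) % modulus

    hits = matches = 0
    for s in range(n - m + 1):
        if w_hash == p_hash:
            hits += 1                        # each hit costs an O(m) comparison
            if text[s:s + m] == pattern:
                matches += 1
        if s < n - m:
            w_hash = ((w_hash - ord(text[s]) * high) * base
                      + ord(text[s + m])) % modulus
    return hits, matches


text = "abcdefgh" * 4                         # 32 characters, 30 windows of length 3
print(count_hash_hits(text, "xyz", base=256, modulus=1))              # (30, 0)
print(count_hash_hits(text, "xyz", base=256, modulus=1_000_000_007))  # (0, 0)
```

With the degenerate modulus every one of the 30 windows is compared in full even though nothing matches, which is the $O(nm)$ behaviour; with the large modulus no window is compared at all.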

---

## KMP vs. Rabin-Karp

| Feature         | Knuth-Morris-Pratt (KMP)       | Rabin-Karp                         |
| --------------- | ------------------------------ | ---------------------------------- |
| Technique       | Prefix function                | Hashing                            |
| Preprocessing   | Compute $\pi$ array            | Compute hash of $P$                |
| Time complexity | $O(n + m)$                     | $O(n + m)$ (best), $O(nm)$ (worst) |
| Use case        | Single-pattern exact matching  | Multiple-pattern exact matching    |

_This graphic is too AI generated for me_ -> Use KMP when looking for a single pattern; use Rabin-Karp when searching for multiple patterns.
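
The multiple-pattern case can be sketched by hashing all patterns up front and looking up each window's hash in a dictionary. This is a simplified illustration that assumes all patterns have the same length; `rabin_karp_multi` and its default parameters are names chosen for this example.

```python
def rabin_karp_multi(text: str, patterns: list[str],
                     base: int = 256,
                     modulus: int = 1_000_000_007) -> dict[str, list[int]]:
    """Map each pattern to the 0-based start indices of its occurrences in text."""
    result: dict[str, list[int]] = {p: [] for p in patterns}
    if not patterns:
        return result
    m = len(patterns[0])
    if any(len(p) != m for p in patterns):
        raise ValueError("this sketch assumes all patterns have the same length")
    n = len(text)
    if m == 0 or m > n:
        return result

    def poly_hash(s: str) -> int:
        h = 0
        for ch in s:
            h = (h * base + ord(ch)) % modulus
        return h

    # Group patterns by hash so one lookup per window covers all of them.
    by_hash: dict[int, list[str]] = {}
    for p in result:
        by_hash.setdefault(poly_hash(p), []).append(p)

    high = pow(base, m - 1, modulus)
    w_hash = poly_hash(text[:m])
    for s in range(n - m + 1):
        for p in by_hash.get(w_hash, ()):
            if text[s:s + m] == p:           # confirm to rule out collisions
                result[p].append(s)
        if s < n - m:
            w_hash = ((w_hash - ord(text[s]) * high) * base
                      + ord(text[s + m])) % modulus
    return result


print(rabin_karp_multi("abracadabra", ["abr", "cad", "xyz"]))
# {'abr': [0, 7], 'cad': [4], 'xyz': []}
```

The text is still scanned only once regardless of how many patterns are searched, which is the reason Rabin-Karp suits this case.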

---

## Footnotes

[^1]: A proper prefix of a string $s$ is any prefix of $s$ that is not equal to $s$ itself. For example, proper prefixes of "abc" are "", "a", and "ab".
[^2]: A rolling hash computes the hash of a new substring by updating the hash of the previous substring, avoiding the need to recompute from scratch.