Migrated

2024-12-07 21:07:38 +01:00
parent 2fded76a5c
commit a9676272f2
120 changed files with 15925 additions and 1 deletions
--- a/Algorithms/Pattern
+++ b/Algorithms/Pattern
@@ -0,0 +1,86 @@
+---
+type: mixed
+---
+
+## Prefix Function ($\pi$)
+
+The prefix function is a tool used in pattern matching algorithms, particularly in the **Knuth-Morris-Pratt (KMP) algorithm**. It is designed to preprocess a pattern to facilitate efficient searching.
+
+### Definition
+For a string $P$ of length $m$, the prefix function $\pi[i]$ for $i = 1, 2, \ldots, m$ is the length of the longest proper prefix of the substring $P[1 \ldots i]$ that is also a suffix of this substring.
+
+### Key Points
+1. A proper prefix of a string is a prefix that is not equal to the entire string. [^1]
+2. $\pi[i]$ helps skip unnecessary comparisons in pattern matching by indicating the next position to check after a mismatch.
+3. $\pi[1] = 0$ always, since the only proper prefix of a single character is the empty string.
+
+### Example
+For the pattern $P = "ababcab"$:
+- $P[1 \ldots 1] = "a"$: The only proper prefix is the empty string, so $\pi[1] = 0$.
+- $P[1 \ldots 2] = "ab"$: No nonempty proper prefix is also a suffix, so $\pi[2] = 0$.
+- $P[1 \ldots 3] = "aba"$: Prefix "a" matches suffix "a", so $\pi[3] = 1$.
+- $P[1 \ldots 4] = "abab"$: Prefix "ab" matches suffix "ab", so $\pi[4] = 2$.
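+- $P[1 \ldots 5] = "ababc"$: No nonempty proper prefix ends in "c", so $\pi[5] = 0$.
+- $P[1 \ldots 6] = "ababca"$: Prefix "a" matches suffix "a", so $\pi[6] = 1$.
+- $P[1 \ldots 7] = "ababcab"$: Prefix "ab" matches suffix "ab", so $\pi[7] = 2$.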
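
As a minimal sketch of how these values can be computed in linear time, the Python function below (the name `prefix_function` is chosen here for illustration) extends the previous match where possible and otherwise falls back through shorter candidates. It uses 0-based indexing, so `pi[i]` in the code corresponds to $\pi[i + 1]$ in the notation above.

```python
def prefix_function(pattern: str) -> list[int]:
    """Return pi, where pi[i] is the length of the longest proper prefix
    of pattern[:i + 1] that is also a suffix of it (0-based indexing)."""
    pi = [0] * len(pattern)
    k = 0  # length of the prefix-suffix match carried over from position i - 1
    for i in range(1, len(pattern)):
        # Fall back to shorter prefix-suffix matches until one can be extended.
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi


print(prefix_function("ababcab"))  # [0, 0, 1, 2, 0, 1, 2]
```

The printed list agrees with the values worked out by hand above.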
+
+---
+
+## Knuth-Morris-Pratt (KMP) Algorithm
+
+The KMP algorithm is a pattern matching algorithm that uses the prefix function $\pi$ to efficiently search for occurrences of a pattern $P$ in a text $T$.
+
+### Key Idea
+When a mismatch occurs during the comparison of $P$ with $T$, use the prefix function $\pi$ to determine the next position in $P$ to continue matching, rather than restarting from the beginning.
+
+### Steps
+1. Compute the prefix function $\pi$ for the pattern $P$.
+2. Search:
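+   - Scan $T$ from left to right, comparing $T[i]$ with the current pattern character $P[j]$.
+   - If $T[i] = P[j]$, advance both $i$ and $j$; once all $m$ characters have matched, report an occurrence and continue with $j = \pi[m] + 1$.
+   - If $T[i] \neq P[j]$ and $j > 1$, the first $j - 1$ characters of $P$ have matched, so keep $i$ and continue with $j = \pi[j - 1] + 1$ rather than restarting at $P[1]$. If $j = 1$, advance $i$.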
+3. The algorithm has $O(n + m)$ time [complexity](Complexity.md), where $n$ is the length of $T$ and $m$ is the length of $P$.
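
The steps above can be sketched in Python as follows. This is an illustrative version rather than a reference implementation: the function name `kmp_search` is an assumption, indices are 0-based, and `j` counts the pattern characters matched so far, so the fallback reads `pi[j - 1]`.

```python
def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the 0-based start index of every occurrence of pattern in text."""
    if not pattern:
        return []  # assumption: an empty pattern yields no matches

    # Step 1: compute the prefix function of the pattern.
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k

    # Step 2: scan the text once, never moving backwards in it.
    matches = []
    j = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        # On a mismatch, reuse the longest prefix that is still a valid match.
        while j > 0 and ch != pattern[j]:
            j = pi[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = pi[j - 1]  # keep going to find overlapping occurrences
    return matches


print(kmp_search("abababcabab", "abab"))  # [0, 2, 7]
```

Each text character is visited once, and `j` can only fall back as often as it has advanced, which is where the $O(n + m)$ bound comes from.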
+
+---
+
+## Rabin-Karp Algorithm
+
+The Rabin-Karp algorithm is another pattern matching algorithm, notable for using hashing to identify potential matches.
+
+### Key Idea
+Instead of comparing substrings character by character, the algorithm compares hash values of the pattern and substrings of the text.
+
+### Steps
+1. Compute the hash value of the pattern $P$ and the first substring of the text $T$ of length $m$.
+2. Slide the window over $T$ and compute hash values for the next substrings in constant time using a rolling hash. [^2]
+3. If the hash value of a substring matches the hash value of $P$, compare the actual strings to confirm the match.
+
+### Hash Function
+The hash function is typically chosen such that it is fast to compute and minimizes collisions:
+$$
+h(s) = (s[1] \cdot p^{m-1} + s[2] \cdot p^{m-2} + \ldots + s[m] \cdot p^0) \bmod q,
+$$
+where:
+- $p$ is a base (e.g., a small prime number),
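+- $q$ is a large prime modulus: reducing modulo $q$ keeps the values small enough to avoid overflow, and a large prime makes collisions unlikely.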
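
Because $h$ is a polynomial in $p$, the hash of the next window can be derived from the current one instead of being recomputed. Under the definition above, and writing $h_s$ for the hash of $T[s \ldots s + m - 1]$ (a shorthand introduced here for illustration), the update would be:

$$
h_{s+1} = \bigl((h_s - T[s] \cdot p^{m-1}) \cdot p + T[s + m]\bigr) \bmod q.
$$

Subtracting $T[s] \cdot p^{m-1}$ removes the outgoing character, multiplying by $p$ shifts the remaining terms up by one power, and adding $T[s + m]$ brings in the new character. With $p^{m-1} \bmod q$ precomputed, each step takes constant time.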
+
+### Complexity
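+- Best and average case: $O(n + m)$, where $n$ is the length of the text and $m$ is the length of the pattern, provided hash collisions are rare.
+- Worst case: $O(nm)$, when many windows share the hash of $P$ (through collisions or genuine matches) and each one triggers an $O(m)$ character comparison.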
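
Putting the steps and the hash function together, a minimal Python sketch might look like the following. The function name `rabin_karp` and the default values `base=256` and `mod=1_000_000_007` (standing in for $p$ and $q$) are assumptions made for illustration; characters are mapped to integers with `ord`, and indices are 0-based.

```python
def rabin_karp(text: str, pattern: str,
               base: int = 256, mod: int = 1_000_000_007) -> list[int]:
    """Return the 0-based start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    high = pow(base, m - 1, mod)  # weight of the window's leading character

    # Hash the pattern and the first window of the text.
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod

    matches = []
    for s in range(n - m + 1):
        # Equal hashes only suggest a match; compare the strings to confirm.
        if w_hash == p_hash and text[s:s + m] == pattern:
            matches.append(s)
        if s < n - m:
            # Rolling update: drop text[s], shift, then append text[s + m].
            w_hash = ((w_hash - ord(text[s]) * high) * base
                      + ord(text[s + m])) % mod
    return matches


print(rabin_karp("abracadabra", "abra"))  # [0, 7]
```

Passing `mod=1` makes every window collide with the pattern, so the string comparison runs at every position; the results stay correct, but the running time degrades toward the $O(nm)$ worst case.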
+
+---
+
+## KMP vs. Rabin-Karp
+
+| Feature       | Knuth-Morris-Pratt (KMP)       | Rabin-Karp                                          |
+| ------------- | ------------------------------ | --------------------------------------------------- |
+| Technique     | Prefix function                | Hashing                                             |
+| Preprocessing | Compute $\pi$ array            | Compute hash of $P$                                 |
+| Efficiency    | $O(n + m)$                     | $O(n + m)$ (best), $O(nm)$ (worst)                  |
+| Use Case      | Searching for a single pattern | Searching for multiple patterns at once             |
+
+_This table reads as too AI-generated to me._ In short: use KMP when searching for a single pattern, and use Rabin-Karp when searching for multiple patterns.
+
+---
+
+## Footnotes
+
+[^1]: A proper prefix of a string $s$ is any prefix of $s$ that is not equal to $s$ itself. For example, proper prefixes of "abc" are "", "a", and "ab".
+[^2]: A rolling hash computes the hash of a new substring by updating the hash of the previous substring, avoiding the need to recompute from scratch.