Prefix Function ($\pi$)
The prefix function is a tool used in pattern matching algorithms, particularly in the Knuth-Morris-Pratt (KMP) algorithm. It is designed to preprocess a pattern to facilitate efficient searching.
Definition
For a string $P$ of length $m$, the prefix function $\pi[i]$ for $i = 1, 2, \ldots, m$ is the length of the longest proper prefix of the substring $P[1 \ldots i]$ that is also a suffix of this substring.
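The definition can be turned into code almost word for word. The following is a minimal sketch, not an efficient implementation: the function name `prefix_function_naive` is an illustrative choice, and because Python lists are 0-based, entry `pi[i]` of the result corresponds to $\pi[i + 1]$ in the notation above.

```python
def prefix_function_naive(p: str) -> list[int]:
    """Compute the prefix function directly from the definition.

    Returns a 0-based list: pi[i] here is the value the text calls pi[i + 1].
    """
    m = len(p)
    pi = [0] * m
    for i in range(m):
        sub = p[: i + 1]
        # Try border lengths from longest to shortest.
        # k stops at i, which is len(sub) - 1, so the prefix is always proper.
        for k in range(i, 0, -1):
            if sub[:k] == sub[-k:]:
                pi[i] = k
                break
    return pi


print(prefix_function_naive("abab"))  # [0, 0, 1, 2]
```

This version does cubic work in the worst case; it is only meant to make the definition concrete.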
Key Points
- A proper prefix of a string is a prefix that is not equal to the entire string.
- $\pi[i]$ helps skip unnecessary comparisons in pattern matching by indicating the next position to check after a mismatch.
- $\pi[1] = 0$ always, since the only proper prefix of a single-character string is the empty string.
Example
For the pattern $P$ = "ababcab":
- $P[1]$ = "a": $\pi[1] = 0$.
- $P[1 \ldots 2]$ = "ab": No prefix matches the suffix, so $\pi[2] = 0$.
- $P[1 \ldots 3]$ = "aba": Prefix "a" matches suffix "a", so $\pi[3] = 1$.
- $P[1 \ldots 4]$ = "abab": Prefix "ab" matches suffix "ab", so $\pi[4] = 2$.
- Continue similarly to compute $\pi[i]$ for the entire pattern.
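To finish the example mechanically, here is a sketch of the standard linear-time computation, which reuses earlier values of $\pi$ instead of re-checking every prefix. The function name is an illustrative choice, and the returned list is 0-based, so `pi[i]` corresponds to $\pi[i + 1]$ above.

```python
def prefix_function(p: str) -> list[int]:
    """Linear-time prefix function (0-based: pi[i] is the text's pi[i + 1])."""
    m = len(p)
    pi = [0] * m
    k = 0  # length of the longest border of p[:i]
    for i in range(1, m):
        # Fall back to shorter borders until one can be extended by p[i].
        while k > 0 and p[i] != p[k]:
            k = pi[k - 1]
        if p[i] == p[k]:
            k += 1
        pi[i] = k
    return pi


print(prefix_function("ababcab"))  # [0, 0, 1, 2, 0, 1, 2]
```

The first four values match the ones worked out by hand; the remaining three are 0, 1, 2 for "ababc", "ababca", and "ababcab".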
Knuth-Morris-Pratt (KMP) Algorithm
The KMP algorithm is a pattern matching algorithm that uses the prefix function $\pi$ to efficiently search for occurrences of a pattern $P$ in a text $T$.
Key Idea
When a mismatch occurs during the comparison of $P$ with $T$, use the prefix function $\pi$ to determine the next position in $P$ to continue matching, rather than restarting from the beginning.
Steps
- Compute the prefix function $\pi$ for the pattern $P$.
- Search:
  - Compare $P$ with substrings of $T$.
  - If there's a mismatch at $P[j]$ and $T[i]$, use $\pi[j-1]$ (the longest border of the already matched part $P[1 \ldots j-1]$) to shift $P$ rather than restarting at $P[1]$. If $j = 1$, simply move on to the next character of $T$.
- The algorithm runs in $O(n + m)$ time, where $n$ is the length of $T$ and $m$ is the length of $P$.
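The steps above can be sketched as follows. This is one common way to write the search, not the only one; the names `kmp_search` and `prefix_function` are illustrative, and all indices are 0-based, so `j` counts how many pattern characters are currently matched.

```python
def prefix_function(p: str) -> list[int]:
    """Linear-time prefix function, 0-based."""
    pi = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k > 0 and p[i] != p[k]:
            k = pi[k - 1]
        if p[i] == p[k]:
            k += 1
        pi[i] = k
    return pi


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the 0-based start indices of all occurrences of pattern in text."""
    if not pattern:
        return []
    pi = prefix_function(pattern)
    matches = []
    j = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        # On a mismatch, fall back to the longest border of the matched part.
        while j > 0 and ch != pattern[j]:
            j = pi[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = pi[j - 1]  # keep going to find overlapping occurrences
    return matches


print(kmp_search("abababcabab", "abab"))  # [0, 2, 7]
```

The text index `i` never moves backwards, which is where the $O(n + m)$ bound comes from.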
Rabin-Karp Algorithm
The Rabin-Karp algorithm is another pattern matching algorithm, notable for using hashing to identify potential matches.
Key Idea
Instead of comparing substrings character by character, the algorithm compares hash values of the pattern and substrings of the text.
Steps
- Compute the hash value of the pattern $P$ and the first substring of the text $T$ of length $m$.
- Slide the window over $T$ and compute hash values for the next substrings in constant time using a rolling hash.
- If the hash value of a substring matches the hash value of $P$, compare the actual strings to confirm the match.
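A minimal sketch of these three steps, using a polynomial rolling hash. The function name `rabin_karp` and the default values of `base` (257) and `mod` (1,000,000,007) are assumptions chosen for illustration; any base and large prime modulus would do.

```python
def rabin_karp(text: str, pattern: str, base: int = 257,
               mod: int = 1_000_000_007) -> list[int]:
    """Return the 0-based start indices of all occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the window's leading character
    hp = ht = 0
    for k in range(m):
        # Step 1: hash the pattern and the first window of the text.
        hp = (hp * base + ord(pattern[k])) % mod
        ht = (ht * base + ord(text[k])) % mod
    matches = []
    for s in range(n - m + 1):
        # Step 3: equal hashes are only a candidate; confirm with a real comparison.
        if hp == ht and text[s:s + m] == pattern:
            matches.append(s)
        if s < n - m:
            # Step 2: drop text[s], shift, and append text[s + m] in O(1).
            ht = ((ht - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return matches


print(rabin_karp("abracadabra", "abra"))  # [0, 7]
```

Python's `%` always returns a non-negative result for a positive modulus, so the subtraction in the rolling update needs no extra correction here; in languages where `%` can be negative, add `mod` before reducing.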
Hash Function
The hash function is typically chosen such that it is fast to compute and minimizes collisions:
$$h(s) = (s[1] \cdot p^{m-1} + s[2] \cdot p^{m-2} + \ldots + s[m] \cdot p^0) \bmod q,$$
where:
- $p$ is a base (e.g., a small prime number),
- $q$ is a large prime modulus that keeps hash values bounded (avoiding overflow) and makes collisions unlikely.
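The constant-time rolling update used when sliding the window follows directly from this formula. As a sketch, write $h_i$ for the hash of the window $T[i \ldots i+m-1]$ (the subscript notation is introduced here for illustration only). Removing the leading character, shifting by one power of $p$, and appending the next character gives:

$$h_{i+1} = \big( (h_i - T[i] \cdot p^{m-1}) \cdot p + T[i+m] \big) \bmod q.$$

For a small worked case, assume the characters are decimal digits with their numeric values, $p = 10$, $q = 13$, and $m = 2$. For the text "314", the first window "31" has $h_1 = 31 \bmod 13 = 5$. Sliding to "14":

$$h_2 = \big( (5 - 3 \cdot 10) \cdot 10 + 4 \big) \bmod 13 = (-246) \bmod 13 = 1,$$

which agrees with the direct computation $14 \bmod 13 = 1$.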
Complexity
- Best Case: $O(n + m)$, where $n$ is the length of the text and $m$ is the length of the pattern.
- Worst Case: $O(nm)$ due to hash collisions, since every window whose hash matches triggers an $O(m)$ character comparison.
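To see where the worst case comes from, the sketch below counts how many windows pass the hash check versus how many are true matches. The function `count_hash_hits` is hypothetical, and the modulus of 1 in the usage example is a deliberately terrible choice that forces every window to collide.

```python
def count_hash_hits(text: str, pattern: str, base: int = 257,
                    mod: int = 1_000_000_007) -> tuple[int, int]:
    """Return (windows whose hash equals the pattern's hash, true matches)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return (0, 0)
    high = pow(base, m - 1, mod)
    hp = ht = 0
    for k in range(m):
        hp = (hp * base + ord(pattern[k])) % mod
        ht = (ht * base + ord(text[k])) % mod
    hash_hits = true_matches = 0
    for s in range(n - m + 1):
        if hp == ht:
            hash_hits += 1  # each hit costs an O(m) comparison
            if text[s:s + m] == pattern:
                true_matches += 1
        if s < n - m:
            ht = ((ht - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return hash_hits, true_matches


print(count_hash_hits("abcdefgh", "xyz"))         # (0, 0)
print(count_hash_hits("abcdefgh", "xyz", mod=1))  # (6, 0)
```

With the degenerate modulus, all six windows are compared character by character even though none of them matches, which is the $O(nm)$ behaviour in miniature.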
KMP vs. Rabin-Karp
Feature | Knuth-Morris-Pratt (KMP) | Rabin-Karp |
---|---|---|
Technique | Prefix function | Hashing |
Preprocessing | Compute $\pi$ array | Compute hash of $P$ |
Efficiency | $O(n + m)$ | $O(n + m)$ (best), $O(nm)$ (worst) |
Use Case | Best for exact matches | Useful for multiple patterns or approximate matches |

This comparison is too AI-generated for me. In short: use KMP when looking for a single pattern, and use Rabin-Karp when looking for multiple patterns.