In those situation we use pattern search algorithms.
Famous Pattern Search Algorithms
Algo-1: Naive Algorithm
# Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(pat, txt) that prints all occurrences of pat[] in txt[].
# You may assume that n > m.
def naive_pattern_search(pat, text):
m = len(pat)
n = len(text)
for i in range(n-m+1):
flag = True
for j in range(m):
if(pat[j] != text[i+j]):
flag = False
if(flag):
print("Pattern found at index {}".format(i))
print("Example-1:")
txt = "AABAACAADAABAABA"
pat = "AABA"
naive_pattern_search(pat, txt)
print("Example-2:")
txt = "abracadabra"
pat = "ab"
naive_pattern_search(pat, txt)
# Time Complexity:
# Best Case: O(n) - First character of pattern is not present in text
# Worst Case: O(n-m+1)(m) - All characters of the text and pattern are same OR only the last character is different.
Output:
Algo-2: Rabin-Karp Algorithm***
But unlike the Naive algorithm, Rabin Karp algorithm matches the hash value of the pattern with the hash value of current substring of text,and if the hash values match then only it starts matching individual characters.
Rabin-Karp algorithm needs to calculate hash values for following strings:
Calculating hash function as suggested by Rabin-Karp:
We treat string character with its ASCII value and hence we take base as 256.
The numeric values cannot be practically stored as an integer.
Hence numeric value is calculated using modular arithmetic to make sure that the hash values can be stored in an integer variable.
To do rehashing, we need to take off the most significant digit and add the new least significant digit for hash value.
Rehashing Formula:
hash( txt[s+1 .. s+m] ) = ( d ( hash(txt[s .. s+m-1]) – txt[s]*h ) + txt[s + m] ) mod q
def rabin_karp_search(pat, text):
base = 256 # Total number of characters, if you consider less numbers of characters
# base can be set accordingly
q = 101 # Prime to take modulo
p_hash = 0 # Hash value for pattern
t_hash = 0 # Hash value for text
m = len(pat)
n = len(text)
# Calculating h that will be used for rehashing: h = pow(base, m-1)%q
h = 1
for i in range(m-1):
h = (h*base)%q
# Calculating hash for pat and initial text
for i in range(m):
p_hash = (p_hash*base + ord(pat[i]))%q # ord(char) : gives ascii value of char
t_hash = (t_hash*base + ord(text[i]))%q
# Slide the pattern over text one by one
for i in range(n-m+1):
# Check the hash values of current window of text and pattern
if t_hash == p_hash: # if the hash values match then only check for characters one by one
for j in range(m):
if(text[i+j] != pat[j]):
break
if(j==m-1):
print("Found at index {}".format(i))
# Calculate hash value for next window of text: Remove leading digit, add trailing digit
if i+m < n:
t_hash = ((t_hash - h*ord(text[i]))*base + ord(text[i+m]))%q
# We might get negative values of t_hash, converting it to positive
if t_hash < 0:
t_hash = t_hash+q
print("Rabin-Karp Example-1:")
txt = "bacbabababacaca"
pat = "ababaca"
rabin_karp_search(pat, txt)
print("\nRabin-Karp Example-2:")
txt = "abracadabra"
pat = "ab"
rabin_karp_search(pat, txt)
print("\nRabin-Karp Example-3:")
txt = "AABAACAADAABAABA"
pat = "AABA"
rabin_karp_search(pat, txt)
# Complexity:
# Time: O(n)
# Auxilliary Space: O(1)
Output:
Algo-3: KMP (Knuth Morris Pratt) Algorithm***
KMP algorithm preprocesses pat[] and constructs an auxiliary lps[] of size m (same as size of pattern) which is used to skip characters while matching.
Name lps indicates longest proper prefix which is also suffix.
A proper prefix is prefix with whole string notallowed.
Example:
Prefixes of “ABC” are “ “, “A”, “AB” and “ABC”.
Proper prefixes are “ “, “A” and “AB”.
Suffixes of the string are “ “, “C”, “BC” and “ABC”.
We search for lps in sub-patterns. More clearly we focus on sub-strings of patterns that are either prefix and suffix.
For each sub-pattern pat[0..i] where i = 0 to m-1, lps[i] stores length of the maximum matching proper prefix which is also a suffix of the sub-pattern pat[0..i].
For each sub-pattern pat[0..i] where i = 0 to m-1, lps[i] stores length of the maximum matching proper prefix which is also a suffix of the sub-pattern pat[0..i].
lps[i] = the longest proper prefix of pat[0..i] which is also a suffix of pat[0..i]
Longest Proper Prefix also a Suffix: LPS[]
def calculate_lps(pat):
m = len(pat)
lps = [0]*m
i = 1; j = 0
while(i < m):
if(pat[i] == pat[j]):
lps[i] = j+1
i+=1
j+=1
elif(j>0):
j = lps[j-1]
else:
lps[i] = 0
i+=1
return lps
print("LPS Example-1:")
pat = "ababaca"
print(str(calculate_lps(pat)))
print("\nLPS Example-2:")
pat = "abcdabeabf"
print(str(calculate_lps(pat)))
# Complexity:
# Time: O(m)
# Auxilliary Space: O(m)
Output:
def KMP_search(pat, text):
lps = calculate_lps(pat)
i=0; j=0
m = len(pat)
n = len(text)
while(i<n):
if(text[i] == pat[j]):
i+=1
j+=1
if(j == m):
print("Found at index {}".format(i-j))
j = lps[j-1]
elif(j>0):
j = lps[j-1]
else:
i+=1
print("KMP Example-1:")
txt = "bacbabababacaca"
pat = "ababaca"
KMP_search(pat, txt)
print("\nKMP Example-2:")
txt = "abracadabra"
pat = "ab"
KMP_search(pat, txt)
print("\nKMP Example-3:")
txt = "AABAACAADAABAABA"
pat = "AABA"
KMP_search(pat, txt)
# Complexity:
# Time: O(m+n)
# Auxilliary Space: O(m)
Output:
Given a text txt[0..n-1] and a pattern pat[0..m-1].
Write a function search(pat, txt) that prints all occurrences of pat[] and its permutations (or anagrams) in txt[].
You may assume that n > m.
def anagram_search(pat, text):
m = len(pat)
n = len(text)
count_pat = [0]*256 # Total 256 characters
count_text_window = [0]*256
# fill the count_pat and count_text_window array
for i in range(m):
count_pat[ord(pat[i])] += 1
count_text_window[ord(pat[i])] += 1
# Starts searching anagrams or permutations
for i in range(m, n):
if(count_pat == count_text_window):
print("Found at index {}".format(i-m))
# Increase the frequency of next character and decrease the frequncy of first character in window
count_text_window[ord(text[i])] += 1
count_text_window[ord(text[i-m])] -= 1
# Check for last window as it will be left in this loop
if(count_pat == count_text_window):
print("Found at index {}".format(n-m))
print("Anagram Search Example-1:")
txt = "BACDGABCDA"
pat = "ABCD"
anagram_search(pat, txt)
Output:
Given a string, find the longest substring which is palindrome.
Examples:
- string = “abaabc” output = “baab”
- string = “babcbabcbaccba” output = “abcbabcba”
- string = “abaaba” output = “abaaba”
- string = “abababa” output = “abababa”
- string = “forgeeksskeegfor” output = “geeksskeeg”
def manacher_longest_palindromic_substring(text):
# Preprocess the given text
new_text = "#"
for ch in text:
new_text += ch
new_text += "#"
n = len(new_text)
P = [0]*n # Array to store palindromic values
# Center(C) : Index of mirror line
# Right(R) : Where we are going to end our bounds on right hand side
# Left(L) : No need to store Left(L) as it can be calculated using something like i_mirror
C = 0
R = 0
# Iterate through entire new_text
for i in range(n):
# i is pointing to current element
mirror = C - (i-C) # Mirror index of i
# If i < right bound(R) then simply copy min of mirror value or (R-i)
if(i < R):
P[i] = min(R-i, P[mirror])
# Expand around both sides of i to find new center(C)
while(((i + P[i] + 1) < n) and ((i - P[i] - 1) >= 0) and \
(new_s[i + P[i] + 1] == new_s[i - P[i] - 1])):
P[i] += 1
# If i + P[i] > right bound(R) then update Center and Right bound
if(i + P[i] > R):
C = i
R = i + P[i]
# Find the length of largest palindrome and center_index
max_pal_len = max(P)
center_index = P.index(max_pal_len)
# Print the longest palindrome
start = int((center_index-max_pal_len)/2)
end = int((center_index+max_pal_len)/2)
print("Largest Palindromic Susbtring: {}".format(text[start:end]))
print("Manacher Example-1:")
txt = "abaabc"
manacher_longest_palindromic_substring(txt)
print("Manacher Example-2:")
txt = "babcbabcbaccba"
manacher_longest_palindromic_substring(txt)
print("Manacher Example-3:")
txt = "abaaba"
manacher_longest_palindromic_substring(txt)
print("Manacher Example-4:")
txt = "abababa"
manacher_longest_palindromic_substring(txt)
print("Manacher Example-5:")
txt = "forgeeksskeegfor"
manacher_longest_palindromic_substring(txt)
Output:
Given a string, print all possible strings that can be made by placing spaces (zero or one) in between them.
Example:
input = ABC
output:
ABC
AB C
A BC
A B C
def to_string(str_list):
s = ""
for x in str_list:
if x == "$":
break
s += x
return s
def print_pattern_util(string, buffer, i, j, n):
if(i==n):
buffer[j] = "$"
print(to_string(buffer))
return
# Either put the character
buffer[j] = string[i]
print_pattern_util(string, buffer, i+1, j+1, n)
# Or put a space followed by next character
buffer[j] = " "
buffer[j+1] = string[i]
print_pattern_util(string, buffer, i+1, j+2, n)
def print_pattern(string):
n = len(string)
buffer = [0]*(2*n)
buffer[0] = string[0]
print_pattern_util(string, buffer, 1, 1, n)
print("Example-1:")
string = "ABCD"
print_pattern(string)
print("Example-2:")
string = "12345"
print_pattern(string)
Output:
← Previous: Divide and Conquer Approach