Pattern Search Algorithms

Why Pattern Search Algorithms ?

pattern_search_introduction

Famous Pattern Search Algorithms



Algo-1: Naive Algorithm

Implementation:
# Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(pat, txt) that prints all occurrences of pat[] in txt[]. 
# You may assume that n > m.


def naive_pattern_search(pat, text):
    m = len(pat)
    n = len(text)

    for i in range(n-m+1):
        flag = True
        for j in range(m):
            if(pat[j] != text[i+j]):
                flag = False
        
        if(flag):
            print("Pattern found at index {}".format(i))


print("Example-1:")
txt = "AABAACAADAABAABA"
pat = "AABA"
naive_pattern_search(pat, txt)

print("Example-2:")
txt = "abracadabra"
pat = "ab"
naive_pattern_search(pat, txt)


# Time Complexity:
# Best Case: O(n) - First character of pattern is not present in text
# Worst Case: O(n-m+1)(m) - All characters of the text and pattern are same OR only the last character is different.

Output:

pattern_search_naive_algo_output

Complexity:



Algo-2: Rabin-Karp Algorithm***

Implementation:
def rabin_karp_search(pat, text):
    base = 256         # Total number of characters, if you consider less numbers of characters 
                       # base can be set accordingly
    q = 101            # Prime to take modulo
    p_hash = 0         # Hash value for pattern
    t_hash = 0         # Hash value for text
    m = len(pat)
    n = len(text)

    # Calculating h that will be used for rehashing:  h = pow(base, m-1)%q
    h = 1
    for i in range(m-1):
        h = (h*base)%q
    
    # Calculating hash for pat and initial text
    for i in range(m):
        p_hash = (p_hash*base + ord(pat[i]))%q      # ord(char) : gives ascii value of char
        t_hash = (t_hash*base + ord(text[i]))%q

    # Slide the pattern over text one by one 
    for i in range(n-m+1):
        # Check the hash values of current window of text and pattern
        if t_hash == p_hash:   # if the hash values match then only check for characters one by one 
            for j in range(m):
                if(text[i+j] != pat[j]):
                    break
            if(j==m-1):
                print("Found at index {}".format(i))
        
        # Calculate hash value for next window of text: Remove leading digit, add trailing digit 
        if i+m < n:
            t_hash = ((t_hash - h*ord(text[i]))*base + ord(text[i+m]))%q

            # We might get negative values of t_hash, converting it to positive 
            if t_hash < 0: 
                t_hash = t_hash+q 


print("Rabin-Karp Example-1:")
txt = "bacbabababacaca"
pat = "ababaca"
rabin_karp_search(pat, txt)

print("\nRabin-Karp Example-2:")
txt = "abracadabra"
pat = "ab"
rabin_karp_search(pat, txt)

print("\nRabin-Karp Example-3:")
txt = "AABAACAADAABAABA"
pat = "AABA"
rabin_karp_search(pat, txt)

# Complexity:
# Time: O(n)
# Auxilliary Space: O(1)

Output:

rabin_karp_output

Complexity:



Algo-3: KMP (Knuth Morris Pratt) Algorithm***

Step-1: Preprocessing

Longest Proper Prefix also a Suffix: LPS[]

Algorithm to calculate π

Implementation to calculate π
def calculate_lps(pat):
    m = len(pat)
    lps = [0]*m
    i = 1; j = 0

    while(i < m):
        if(pat[i] == pat[j]):
            lps[i] = j+1
            i+=1
            j+=1
        elif(j>0):
            j = lps[j-1]
        else:
            lps[i] = 0
            i+=1
    
    return lps


print("LPS Example-1:")
pat = "ababaca"
print(str(calculate_lps(pat)))

print("\nLPS Example-2:")
pat = "abcdabeabf"
print(str(calculate_lps(pat)))

# Complexity:
# Time: O(m)
# Auxilliary Space: O(m)

Output:

calculate_lps

Complexity:
Step-2: Matching Algorithm
Implementation
def KMP_search(pat, text):
    lps = calculate_lps(pat)
    i=0; j=0
    m = len(pat)
    n = len(text)

    while(i<n):
        if(text[i] == pat[j]):
            i+=1
            j+=1
            if(j == m):
                print("Found at index {}".format(i-j))
                j = lps[j-1]
        elif(j>0):
            j = lps[j-1]
        else:
            i+=1

print("KMP Example-1:")
txt = "bacbabababacaca"
pat = "ababaca"
KMP_search(pat, txt)

print("\nKMP Example-2:")
txt = "abracadabra"
pat = "ab"
KMP_search(pat, txt)

print("\nKMP Example-3:")
txt = "AABAACAADAABAABA"
pat = "AABA"
KMP_search(pat, txt)

# Complexity:
# Time: O(m+n)
# Auxilliary Space: O(m)

Output:

kmp_algo_output

Complexity:

Standard Pattern Search Algorithms Problmes

Problem:

Given a text txt[0..n-1] and a pattern pat[0..m-1].

Write a function search(pat, txt) that prints all occurrences of pat[] and its permutations (or anagrams) in txt[].

You may assume that n > m.

Algorithm:
Implementation
def anagram_search(pat, text):
    m = len(pat)
    n = len(text)
    count_pat = [0]*256     # Total 256 characters
    count_text_window = [0]*256

    # fill the count_pat and count_text_window array
    for i in range(m):
        count_pat[ord(pat[i])] += 1
        count_text_window[ord(pat[i])] += 1
    
    # Starts searching anagrams or permutations
    for i in range(m, n):
        if(count_pat == count_text_window):
            print("Found at index {}".format(i-m))
        
        # Increase the frequency of next character and decrease the frequncy of first character in window
        count_text_window[ord(text[i])] += 1
        count_text_window[ord(text[i-m])] -= 1
    
    # Check for last window as it will be left in this loop
    if(count_pat == count_text_window):
            print("Found at index {}".format(n-m))


print("Anagram Search Example-1:")
txt = "BACDGABCDA"
pat = "ABCD"       
anagram_search(pat, txt)

Output:

anagram_search_output

Complexity:



2. Manacher’s Algorithm - O(N) Time Longest Palindromic Substring***

Problem:

Given a string, find the longest substring which is palindrome.

Examples:

  • string = “abaabc” output = “baab”
  • string = “babcbabcbaccba” output = “abcbabcba”
  • string = “abaaba” output = “abaaba”
  • string = “abababa” output = “abababa”
  • string = “forgeeksskeegfor” output = “geeksskeeg”
Approaches to solve:
Manacher Approach:

manacher_approach

Manacher Algorithm:
Implementation
def manacher_longest_palindromic_substring(text):
    # Preprocess the given text
    new_text = "#"
    for ch in text:
        new_text += ch
        new_text += "#"
    
    n = len(new_text)
    P = [0]*n    # Array to store palindromic values

    # Center(C) : Index of mirror line
    # Right(R)  : Where we are going to end our bounds on right hand side
    # Left(L)   : No need to store Left(L) as it can be calculated using something like i_mirror
    C = 0
    R = 0

    # Iterate through entire new_text
    for i in range(n):
        # i is pointing to current element
        mirror = C - (i-C)  # Mirror index of i

        # If i < right bound(R) then simply copy min of mirror value or (R-i)
        if(i < R):
            P[i] = min(R-i, P[mirror])

        # Expand around both sides of i to find new center(C)
        while(((i + P[i] + 1) < n) and ((i - P[i] - 1) >= 0) and \
              (new_s[i + P[i] + 1] == new_s[i - P[i] - 1])):
            P[i] += 1

        # If i + P[i] > right bound(R) then update Center and Right bound
        if(i + P[i] > R):
            C = i
            R = i + P[i]

    # Find the length of largest palindrome and center_index
    max_pal_len = max(P)
    center_index = P.index(max_pal_len)

    # Print the longest palindrome
    start = int((center_index-max_pal_len)/2)
    end = int((center_index+max_pal_len)/2)

    print("Largest Palindromic Susbtring: {}".format(text[start:end]))




print("Manacher Example-1:")
txt = "abaabc"
manacher_longest_palindromic_substring(txt)

print("Manacher Example-2:")
txt = "babcbabcbaccba"
manacher_longest_palindromic_substring(txt)

print("Manacher Example-3:")
txt = "abaaba"
manacher_longest_palindromic_substring(txt)

print("Manacher Example-4:")
txt = "abababa"
manacher_longest_palindromic_substring(txt)

print("Manacher Example-5:")
txt = "forgeeksskeegfor"
manacher_longest_palindromic_substring(txt)

Output:

manacher_algo_output

Complexity:



3. All Possible Strings - Made by Placing Spaces***

Problem:

Given a string, print all possible strings that can be made by placing spaces (zero or one) in between them.

Example:

input = ABC

output:

​ ABC

​ AB C

​ A  BC

​ A  B  C

Algorithm:
Implementation:
def to_string(str_list): 
    s = "" 
    for x in str_list: 
        if x == "$": 
            break
        s += x 
    return s 


def print_pattern_util(string, buffer, i, j, n):
    if(i==n):
        buffer[j] = "$"
        print(to_string(buffer))
        return
    
    # Either put the character 
    buffer[j] = string[i] 
    print_pattern_util(string, buffer, i+1, j+1, n) 
  
    # Or put a space followed by next character 
    buffer[j] = " "
    buffer[j+1] = string[i] 
  
    print_pattern_util(string, buffer, i+1, j+2, n)
    


def print_pattern(string):
    n = len(string)
    buffer = [0]*(2*n)
    buffer[0] = string[0]

    print_pattern_util(string, buffer, 1, 1, n)



print("Example-1:")
string = "ABCD"
print_pattern(string)

print("Example-2:")
string = "12345"
print_pattern(string)

Output:

all_possible_strings_output

Complexity:



Problems To Do:




← Previous: Divide and Conquer Approach

Next: Bit Algorithms →