Post

Algorithm Notes

1 Introduction

This is my journey into Algorithm and Data Structures. I implemented quite a bunch of classical algorithms based on the holy bible Introduction to Algorithms. End up getting offers from Amazon Alexa (Applied Scientist), Uber (Software Engineer), Google (Software Engineer), Microsoft (Software Engineer) with a Mechanical Engineering background.

2 Algorithms

2.1 Loop invariant

  • Initialization
  • Maintenance
  • Termination

2.2 Asymptotic notation

  • Θ(g(n)) asymptotic tight bound: there exists c1, c2, such that c1*g(n) ≤ f(n) ≤ c2*g(n).
  • O(g(n)) asymptotic upper bound: there exists c such that 0 ≤ f(n) ≤ c*g(n).
  • Ω(g(n)) asymptotic lower bound: there exists c such that c*g(n) ≤ f(n).

2.3 Master Theorem

Given the recurrence equation: T(n) = aT(n/b) + f(n)

  • If f(n) = O(nlogb a - ε) for some constant ε, then T(n) = Θ(nlogb a). That is if f(n) is polynomially faster than nlogb a then the time will be dominated by nlogb a.
  • If f(n) = Θ(nlogb a * logk n) for any k ≥ 0, then T(n) = Θ(nlogb a logk + 1 n).
  • If f(n) = Ω(nlogb a + ε) for some constant ε, and if af(n/b) ≤ cf(n) for some constant c, then T(n) = Θ(f(n)). That is f(n) is polynomially slower than nlogb a, then the time will be dominated by f(n). Can be translated as
    • Too many leaves. Number of leaf nodes outweighs the sum of the internal evaluation cost. The total running time is O(nlogb a).
    • Equal work per level. Each level cost nlogb a time, and there are logb n levels. Total time O(nlogb a log n).
    • Too expensive a root. The internal evaluation costs dominates. O(f(n)).

2.4 Sort

2.4.1 Insertion Sort

def insertion_sort(arr):
    if len(arr) == 1:
        print(arr)
        return arr

    for i in range(1, len(arr)):
            # Get the latest unsorted value
        key = arr[i]
        j = i - 1
        while arr[j] > key and j >= 0:
            # When the current value is larger than the unsorted value, shift
            # the current value to the right
            arr[j + 1] = arr[j]
            print(arr)
            j -= 1

        # The current value arr[j] is either smaller than the unsorted value,
        # or the current position is -1.
        arr[j + 1] = key
        print(arr)

    return arr



a = [2, 5, 3, 8, 7, 1]
insertion_sort(a)

insertion_sort([1])

insertion_sort([2, 2, 2, 1])
class="example"> [2, 5, 3, 8, 7, 1] [2, 5, 5, 8, 7, 1] [2, 3, 5, 8, 7, 1] [2, 3, 5, 8, 7, 1] [2, 3, 5, 8, 8, 1] [2, 3, 5, 7, 8, 1] [2, 3, 5, 7, 8, 8] [2, 3, 5, 7, 7, 8] [2, 3, 5, 5, 7, 8] [2, 3, 3, 5, 7, 8] [2, 2, 3, 5, 7, 8] [1, 2, 3, 5, 7, 8] [1] [2, 2, 2, 1] [2, 2, 2, 1] [2, 2, 2, 2] [2, 2, 2, 2] [2, 2, 2, 2] [1, 2, 2, 2]

2.4.2 Merge Sort

def mergeSort(A):
    return mergeSortHelper(A, 0, len(A) - 1)


def mergeSortHelper(A, left, right):
    if left == right:
        return A[left:right + 1]
    mid = left + (right - left) // 2
    A_left = mergeSortHelper(A, left, mid)
    A_right = mergeSortHelper(A, mid + 1, right)
    return mergeArray(A_left, A_right)


def mergeArray(A, B):
    i = 0
    j = 0
    merged = []
    while i < len(A) and j < len(B):
        if A[i] < B[j]:
            merged.append(A[i])
            i += 1
        else:
            merged.append(B[j])
            j += 1

    if i == len(A):
        merged.extend(B[j:])
    else:
        merged.extend(A[i:])
    return merged

A = [6, 7, 3, 2, 9, 6, 7]
print(mergeSort(A))
class="example"> [2, 3, 6, 6, 7, 7, 9]

2.4.3 Quick Sort

  • Simple Quick Sort
def quick_sort(A, left, right):
    if left < right:
        mid = partition(A, left, right)
        print('Pivot Index: {}'.format(mid))
        print(A)
        quick_sort(A, left, mid)
        quick_sort(A, mid + 1, right)


def partition(A, left, right):
    print('Partition {} to {}'.format(left, right))
    pivot = A[right]
    print('Pivot value: {}'.format(pivot))
    i = left
    for j in range(left, right):
        if A[j] < pivot:
            A[i], A[j] = A[j], A[i]
            i += 1
    A[i], A[right] = A[right], A[i]
    return i - 1


A = [2, 8, 7, 1, 3, 5, 6, 4]
quick_sort(A, 0, len(A) - 1)
print(A)

A = [1]
quick_sort(A, 0, len(A) - 1)
print(A)
class="example"> Partition 0 to 7 Pivot value: 4 Pivot Index: 2 [2, 1, 3, 4, 7, 5, 6, 8] Partition 0 to 2 Pivot value: 3 Pivot Index: 1 [2, 1, 3, 4, 7, 5, 6, 8] Partition 0 to 1 Pivot value: 1 Pivot Index: -1 [1, 2, 3, 4, 7, 5, 6, 8] Partition 0 to 1 Pivot value: 2 Pivot Index: 0 [1, 2, 3, 4, 7, 5, 6, 8] Partition 3 to 7 Pivot value: 8 Pivot Index: 6 [1, 2, 3, 4, 7, 5, 6, 8] Partition 3 to 6 Pivot value: 6 Pivot Index: 4 [1, 2, 3, 4, 5, 6, 7, 8] Partition 3 to 4 Pivot value: 5 Pivot Index: 3 [1, 2, 3, 4, 5, 6, 7, 8] Partition 5 to 6 Pivot value: 7 Pivot Index: 5 [1, 2, 3, 4, 5, 6, 7, 8] [1, 2, 3, 4, 5, 6, 7, 8] [1]
  • Randomized quick sort
import random


def quick_sort(A, left, right):
    if left < right:
        mid = random_partition(A, left, right)
        print('Pivot Index: {}'.format(mid))
        print(A)
        quick_sort(A, left, mid)
        quick_sort(A, mid + 1, right)


def random_partition(A, left, right):
    pivot_index = random.randint(left, right)
    A[pivot_index], A[right] = A[right], A[pivot_index]
    return partition(A, left, right)


def partition(A, left, right):
    print('Partition {} to {}'.format(left, right))
    pivot = A[right]
    print('Pivot value: {}'.format(pivot))
    i = left
    for j in range(left, right):
        if A[j] < pivot:
            A[i], A[j] = A[j], A[i]
            i += 1
            A[i], A[right] = A[right], A[i]
    return i - 1


A = [2, 8, 7, 1, 3, 5, 6, 4]
quick_sort(A, 0, len(A) - 1)
print(A)

A = [1]
quick_sort(A, 0, len(A) - 1)
print(A)
class="example"> Partition 0 to 7 Pivot value: 3 Pivot Index: 1 [2, 1, 8, 3, 4, 5, 6, 7] Partition 0 to 1 Pivot value: 1 Pivot Index: -1 [2, 1, 8, 3, 4, 5, 6, 7] Partition 0 to 1 Pivot value: 2 Pivot Index: 0 [1, 2, 8, 3, 4, 5, 6, 7] Partition 2 to 7 Pivot value: 8 Pivot Index: 5 [1, 2, 7, 4, 5, 6, 3, 8] Partition 2 to 5 Pivot value: 6 Pivot Index: 3 [1, 2, 4, 5, 7, 6, 3, 8] Partition 2 to 3 Pivot value: 5 Pivot Index: 2 [1, 2, 4, 5, 7, 6, 3, 8] Partition 4 to 5 Pivot value: 7 Pivot Index: 4 [1, 2, 4, 5, 6, 7, 3, 8] Partition 6 to 7 Pivot value: 8 Pivot Index: 6 [1, 2, 4, 5, 6, 7, 3, 8] [1, 2, 4, 5, 6, 7, 3, 8] [1]

2.4.4 Heap Sort

2.4.5 Counting Sort

Count the times of each element, and put the element to the place of the counting number.

  • NOTE: the lower bound of comparison sort is Θ(n log n), counting sort is not a comparison sort. It can sort the array in linear time.
  • The time complexity of counting sort is O(n + k), given that the elements in the array are from 0 to k.
  • It is stable, but not in place. It takes O(n + k) memory.
def counting_sort(A, B, k):
    """Counting sort
    Sort A and put the sorted results into B. The elements in A are assumed
    from 0 to k
    """
    # Array to save counts
    C = [0 for i in range(k + 1)]
    for a in A:
        C[a] += 1
        print('Counted array: {}'.format(C))

    # Accumulate the counts
    for i in range(1, len(C)):
        C[i] += C[i - 1]
        print('Accumulated counts: {}'.format(C))

    # The reason to start from the end is to maintain the stability of the
    # sort, because the count number starts from the largest position
    for i in range(len(A) - 1, -1, -1):
        B[C[A[i]] - 1] = A[i]
        C[A[i]] -= 1
        print('Current sorted array B: {}'.format(B))
        print('Current counting array C: {}'.format(C))


A = [2, 5, 3, 0, 2, 3, 0, 3]
B = [-1 for i in range(8)]
counting_sort(A, B, 5)
print('Sorted array B: {}'.format(B))
class="example"> Counted array: [0, 0, 1, 0, 0, 0] Counted array: [0, 0, 1, 0, 0, 1] Counted array: [0, 0, 1, 1, 0, 1] Counted array: [1, 0, 1, 1, 0, 1] Counted array: [1, 0, 2, 1, 0, 1] Counted array: [1, 0, 2, 2, 0, 1] Counted array: [2, 0, 2, 2, 0, 1] Counted array: [2, 0, 2, 3, 0, 1] Accumulated counts: [2, 2, 2, 3, 0, 1] Accumulated counts: [2, 2, 4, 3, 0, 1] Accumulated counts: [2, 2, 4, 7, 0, 1] Accumulated counts: [2, 2, 4, 7, 7, 1] Accumulated counts: [2, 2, 4, 7, 7, 8] Current sorted array B: [-1, -1, -1, -1, -1, -1, 3, -1] Current counting array C: [2, 2, 4, 6, 7, 8] Current sorted array B: [-1, 0, -1, -1, -1, -1, 3, -1] Current counting array C: [1, 2, 4, 6, 7, 8] Current sorted array B: [-1, 0, -1, -1, -1, 3, 3, -1] Current counting array C: [1, 2, 4, 5, 7, 8] Current sorted array B: [-1, 0, -1, 2, -1, 3, 3, -1] Current counting array C: [1, 2, 3, 5, 7, 8] Current sorted array B: [0, 0, -1, 2, -1, 3, 3, -1] Current counting array C: [0, 2, 3, 5, 7, 8] Current sorted array B: [0, 0, -1, 2, 3, 3, 3, -1] Current counting array C: [0, 2, 3, 4, 7, 8] Current sorted array B: [0, 0, -1, 2, 3, 3, 3, 5] Current counting array C: [0, 2, 3, 4, 7, 7] Current sorted array B: [0, 0, 2, 2, 3, 3, 3, 5] Current counting array C: [0, 2, 2, 4, 7, 7] Sorted array B: [0, 0, 2, 2, 3, 3, 3, 5]

2.4.6 TODO Radix sort

  • Split the number into multiple digit, and perform STABLE sort for each each digit from smaller digit to larger digit.
  • Pseudo code: for i = 1 to d: perform stable sort to sort array A on digit i
  • If using counting sort as the underlying stable sorting algorithm, the time complexity is O(d(n + k))
def radix_sort(A, d):
    # TODO
    pass

2.4.7 Bucket Sort

2.5 Maximum-subarray problem

Find the subarray that has the maximum sum.

def find_max_subarray(A):
    low, high, max_sum = helper_find_max_subarray(A, 0, len(A) - 1)
    return low, high, max_sum


def helper_find_max_subarray(A, low, high):
    if low == high:
        return low, high, A[low]

    mid = (low + high) // 2

    left_low, left_high, left_sum = helper_find_max_subarray(A, low, mid)
    right_low, right_high, right_sum = helper_find_max_subarray(A, mid + 1, high)
    cross_low, cross_high, cross_sum = helper_find_cross_max_subarray(A, low, mid, high)
    if left_sum >= right_sum and left_sum >= cross_sum:
        return left_low, left_high, left_sum
    elif right_sum >= left_sum and right_sum >= cross_sum:
        return right_low, right_high, right_sum
    else:
        return cross_low, cross_high, cross_sum


def helper_find_cross_max_subarray(A, low, mid, high):
    left_sum = A[mid]
    left_low = mid
    tmp_sum = A[mid]
    i = mid - 1
    while i >= low:
        tmp_sum += A[i]
        if tmp_sum > left_sum:
            left_sum = tmp_sum
            left_low = i
        i -= 1

    right_sum = A[mid + 1]
    right_high = mid + 1
    tmp_sum = A[mid + 1]
    i = mid + 1
    while i <= high:
        tmp_sum += A[i]
        if tmp_sum > right_sum:
            right_sum = tmp_sum
            right_high = i
        i += 1

    return left_low, right_high, left_sum + right_sum

A = [13, -3, -25, 20, -3, -16, -23, 18, 20, -7, 12, -5, -22, 15, -4, 7]
low, high, max_sum = find_max_subarray(A)
print(low)
print(high)
print(max_sum)


A = [13, -3]
low, high, max_sum = find_max_subarray(A)
print(low)
print(high)
print(max_sum)
class="example"> 7 10 63 0 0 13

2.6 Stack

class Stack:
    def __init__(self, size):
        self.size = size
        self._array = [0 for i in range(self.size)]
        self.top = -1

    def isempty(self):
        if self.top < 0:
            return True
        else:
            return False

    def push(self, x):
        if self.top == self.size - 1:
            raise Exception('Stack overflow!')
        self.top += 1
        self._array[self.top] = x

    def pop(self):
        if self.top < 0:
            raise Exception('Stack underflow!')
        self.top -= 1
        return self._array[self.top + 1]


s = Stack(3)
s.push(3)
s.push(9)
s.push(4)
print(s.pop())
print(s.pop())
s.push(5)
print(s.pop())
print(s.pop())
class="example"> 4 9 5 3

2.7 Queue

  • NOTE: use a array of n + 1 length to represent a queue of size n.
  • Tail is the pointer pointing to the position to enqueue new element.

    class Queue:
        def __init__(self, size):
            """
            Use a (n + 1) length array to implement a queue of size n
            """
            self.size = size
            self.head = 0
            self.tail = 0
            self._array = [-1 for i in range(self.size + 1)]
    
        def enqueue(self, x):
            if self.head == self.tail + 1 or (self.head == 0
                                              and self.tail == self.size):
                raise Exception("Queue overflow!")
    
            self._array[self.tail] = x
            if self.tail == self.size:
                self.tail = 0
            else:
                self.tail += 1
                print("Head {}".format(self.head))
                print("Tail {}".format(self.tail))
                print(self._array)
    
        def dequeue(self):
            if self.head == self.tail:
                raise Exception("Queue underflow!")
    
            val = self._array[self.head]
            if self.head == self.size:
                self.head = 0
            else:
                self.head += 1
                print(self._array)
                print("Head {}".format(self.head))
                print("Tail {}".format(self.tail))
            return val
    
    
    q = Queue(5)
    q.enqueue(1)
    q.enqueue(2)
    q.enqueue(3)
    q.enqueue(4)
    q.enqueue(5)
    print(q.dequeue())
    print(q.dequeue())
    print(q.dequeue())
    print(q.dequeue())
    print(q.dequeue())
    q.enqueue(9)
    print(q.dequeue())
    # print(q.dequeue())
    
    class="example"> Head 0 Tail 1 [1, -1, -1, -1, -1, -1] Head 0 Tail 2 [1, 2, -1, -1, -1, -1] Head 0 Tail 3 [1, 2, 3, -1, -1, -1] Head 0 Tail 4 [1, 2, 3, 4, -1, -1] Head 0 Tail 5 [1, 2, 3, 4, 5, -1] [1, 2, 3, 4, 5, -1] Head 1 Tail 5 1 [1, 2, 3, 4, 5, -1] Head 2 Tail 5 2 [1, 2, 3, 4, 5, -1] Head 3 Tail 5 3 [1, 2, 3, 4, 5, -1] Head 4 Tail 5 4 [1, 2, 3, 4, 5, -1] Head 5 Tail 5 5 9

2.8 Binary Search

py file

This is a less cleaner version, but its advantage is that left pointer and right pointer finally converges to the same position. Sometimes this is handy when you trying to locate certain position in the end of the loop.

def binarySearch(nums, target):
    if not nums:
        return -1

    left = 0
    right = len(nums) - 1

    while left < right:
        mid = left + (right - left) // 2
        if nums[mid] == target:
            return mid
        elif nums[mid] < target:
            # mid could be equal to left, under this situation, left should be
            # mid + 1 to force the search on the right hand side. To test this,
            # use a test case of [0, 1], search 1
            left = mid + 1
        else:
            right = mid

    return left if nums[left] == target else -1

nums = [1, 4, 6, 9]
target = -1
print(binarySearch(nums, target))

nums = [1, 4, 6, 9]
target = 4
print(binarySearch(nums, target))

nums = [0, 1]
target = 1
print(binarySearch(nums, target))
class="example"> -1 1 1

NOTE: This version of binary search terminates when r == l - 1, which might not be suitable for certain situation when we are trying to locate certain value in the end of the loop.

def binarySearch1(nums, target):
    l = 0
    r = len(nums) - 1
    while l <= r:
        mid = l + (r - l) // 2
        if nums[mid] == target:
            return mid
        elif nums[mid] < target:
            l = mid + 1
        else:
            r = mid - 1
    return -1

nums = [1, 4, 6, 9]
target = 6
print(binarySearch1(nums, target))

nums = [0]
target = 6
print(binarySearch1(nums, target))

nums = [0]
target = 6
print(binarySearch1(nums, target))
class="example"> None

2.9 Linked List

  • Doubly linked list: O(n) time for search O(1) time for insert and delete
    • Singly linked list

2.9.1 Singly linked list

  • Search takes O(n) time
  • Insert take O(1) time
  • Delete takes O(n) time

    class Node:
        def __init__(self, key):
            self.key = key
            self.next = None
    
        def __str__(self):
            return 'Node({})'.format(self.key)
    
    
    class SinglyLinkedList:
        def __init__(self, head):
            self.head = head
    
        @classmethod
        def fromList(cls, arr):
            head = Node(arr[0])
            curr = head
            for k in arr[1:]:
                curr.next = Node(k)
                curr = curr.next
            return cls(head)
    
        def search(self, k):
            curr = self.head
            while curr and curr.key != k:
                curr = curr.next
            return curr
    
        def insert(self, node):
            node.next = self.head
            self.head = node
    
        def delete(self, k):
            prev = None
            curr = self.head
            while curr and curr.key != k:
                prev = curr
                curr = curr.next
    
            if not curr:
                raise Exception('The key {} does not exist in the list.'.format(k))
    
            if prev:
                prev.next = curr.next
            else:
                self.head = curr.next
                curr.next = None
    
        def __str__(self):
            s = ""
            curr = self.head
            while curr:
                if s != "":
                    s += " -> "
                s += str(curr.key)
                curr = curr.next
            return s
    
    
    lst = SinglyLinkedList.fromList([1, 2, 3, 4, 5])
    print(lst)
    print(lst.search(2))
    lst.delete(3)
    print(lst)
    lst.insert(Node(9))
    print(lst)
    print(lst.search(3))
    lst.delete(1)
    print(lst)
    
    class="example"> 1 -> 2 -> 3 -> 4 -> 5 Node(2) 1 -> 2 -> 4 -> 5 9 -> 1 -> 2 -> 4 -> 5 None 9 -> 2 -> 4 -> 5

2.9.2 Doubly linked list

class Node:
    def __init__(self, key):
        self.key = key
        self.next = None
        self.prev = None

    def __str__(self):
        return 'Node({})'.format(self.key)


class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def insert(self, node):
        node.next = self.head
        node.prev = None
        if self.head:
            self.head.prev = node

        if not self.tail:
            self.tail = node

        self.head = node

    def search(self, key):
        curr = self.head
        while curr and curr.key != key:
            curr = curr.next
        return curr

    def delete(self, node):
        if node.prev:
            node.prev.next = node.next
        else:
            self.head = node.next

        if node.next:
            node.next.prev = node.prev
        else:
            self.tail = node.prev

    def __str__(self):
        s = ""
        curr = self.head
        while curr:
            if s != "":
                s += " <-> "
            s += str(curr.key)
            curr = curr.next
        return s


lst = DoublyLinkedList()
lst.insert(Node(1))
lst.insert(Node(2))
lst.insert(Node(3))
lst.insert(Node(4))
print(lst)

lst.delete(lst.search(1))
print(lst)

lst.delete(lst.search(4))
print(lst.tail)
print(lst.head)
print(lst)
class="example"> 4 <-> 3 <-> 2 <-> 1 4 <-> 3 <-> 2 Node(2) Node(3) 3 <-> 2

2.9.3 Reverse linked list

class Node:
    def __init__(self, val):
        self.val = val
        self.next = None


def print_list(head):
    cur = head
    while cur:
        print(cur.val)
        cur = cur.next


def reverse_linked_list(head):
    cur = head
    prev = None
    while cur:
        next = cur.next
        cur.next = prev
        prev = cur
        cur = next
    return prev


head = Node(1)
head.next = Node(2)
head.next.next = Node(3)
head.next.next.next = Node(4)

print("Before revesing:")
print_list(head)

rhead = reverse_linked_list(head)
print("After revesing:")
print_list(rhead)

print("Edge Case when head is None")
print_list(reverse_linked_list(None))

print("Edge Case when head is orphan")
print_list(reverse_linked_list(Node(1)))
class="example"> Before revesing: 1 2 3 4 After revesing: 4 3 2 1 Edge Case when head is None Edge Case when head is orphan 1

2.10 Hash table

Universe of keys is large. The set of keys actually stored may be small relative to the universe of keys. Consider we maintain a table with m slots, and we have a hash function h: U -> {0, 1, …, m - 1}. Given a key k, we locate T[h[k]]. If there is key collision, which means h[k1] == h[k2]. Then we chain the corresponding values as a linked list.

If a hash table uses m slots to store n values, we define the load factor α = n/m. Then the average search time is O(1 + α) assuming a uniform hashing function.

It is often desired to have keys that are relatively close to map to hash values that are far apart (especially for linear probing).

class Node:
    def __init__(self, key):
        self.key = key
        self.next = None
        self.prev = None

    def __str__(self):
        return 'Node({})'.format(self.key)


class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def insert(self, node):
        node.next = self.head
        node.prev = None
        if self.head:
            self.head.prev = node

        if not self.tail:
            self.tail = node

        self.head = node

    def search(self, key):
        curr = self.head
        while curr and curr.key != key:
            curr = curr.next
        return curr

    def delete(self, node):
        if node.prev:
            node.prev.next = node.next
        else:
            self.head = node.next

        if node.next:
            node.next.prev = node.prev
        else:
            self.tail = node.prev

    def __str__(self):
        s = ""
        curr = self.head
        while curr:
            if s != "":
                s += " <-> "
            s += str(curr.key)
            curr = curr.next
        return s


def divide_hash_fun(val, m):
    return val % m


class HashMap:
    def __init__(self, size, hash_fun=divide_hash_fun):
        self.size = size
        self.hash_fun = hash_fun
        self._slots = [DoublyLinkedList() for i in range(self.size)]

    def insert(self, node):
        self._slots[self.hash_fun(node.key, self.size)].insert(node)

    def search(self, key):
        return self._slots[self.hash_fun(key, self.size)].search(key)

    def delete(self, node):
        self._slots[self.hash_fun(node.key, self.size)].delete(node)


map = HashMap(7)
map.insert(Node(199))
map.insert(Node(10))
map.insert(Node(30))
map.insert(Node(50))
print(map.search(10))
map.delete(map.search(10))
print(map.search(10))
class="example"> Node(10) None
  • Universal hashing: select hash function randomly in a way that is independent of the keys.
  • Open addressing: all elements occupy the hash table itself. No linked list. When searching for an element, we systematically examine table slots. The load factor α = n/m is always less than 1.
    • Linear probing
    • Quadratic probing
    • Double Hashing
  • Perfect hashing: O(1) search time, 2 levels of hash table. Similar to chained hash table, instead of using linked list, we use a small secondary hash table.
    • We can guarantee there is no collision at the secondary level.

2.11 TODO Segment Tree

  • Range Sum Query Problem
    • Given an array [-2, 0, 3, -5, 2, -1], support two operations, update and querySumInRange.
      • Naive solution: directly update the i th value. Sum up values one by one when doing the query. Update takes O(1) time, query takes O(n) time.
        • This might be good enough when update happens a lot, but query happens less frequently.
      • Precompute the cumulated sum, then query just need to do cumSum[j] - cumSum[i - 1]. Update takes O(n) time, query takes O(1) time.
        • This is good, if update happens much less frequently than query.
      • Segment Tree solution. Update and query both take O(logn) time.

2.12 Types of binary tree

  • Full binary tree, every node has 0 or 2 children
  • Complete binary tree, the tree is completely filled except the last right most node. Think about binary heap.
  • Perfect binary tree. All the interior nodes have two children, and all leaves have the same depth.

2.13 BFS

py file BFS guarantees the SHORTEST path from the source node to target node. NOTE: use the color to differentiate nodes!!!! Same as DFS. Time complexity O(V + E), given a graph G(V, E), with V vertices, and E edges.

from collections import deque


class Vertex:
    def __init__(self, val, color='white', distance=-1, parent=None):
        self.val = val
        self.color = color
        self.dist = distance
        self.parent = parent

    def __str__(self):
        return "Val: {}, Color: {}, Distance: {}".format(
            self.val, self.color, self.dist)


class Graph:
    def __init__(self, adj_list):
        self.adj_list = adj_list
        self._reset_vertices()

    def _reset_vertices(self):
        self.vertices = {n: Vertex(n) for n in self.adj_list}

    def bfs(self, s):
        self._reset_vertices()
        v_s = self.vertices[s]
        v_s.color = 'grey'
        v_s.dist = 0

        queue = deque()
        queue.append(v_s)
        while queue:
            cur = queue.popleft()
            cur.color = 'black'
            print(cur)
            for n in self.adj_list[cur.val]:
                v = self.vertices[n]
                if v.color == 'white':
                    queue.append(v)
                    v.color = 'grey'
                    v.dist = cur.dist + 1
                    v.parent = cur


# Undirected Graph
adj_list = {1: [2, 5],
            2: [1, 3, 4],
            3: [2, 4],
            4: [2, 3, 5],
            5: [4, 1]}
g = Graph(adj_list)
g.bfs(1)

print('-'*50)
# Directed cyclic Graph
adj_list = {1: [2],
            2: [4],
            3: [2],
            4: [3, 5],
            5: [1]}
g = Graph(adj_list)
g.bfs(1)

print('-'*50)
# Directed acyclic Graph
adj_list = {1: [2, 5],
            2: [3, 4],
            3: [],
            4: [3, 5],
            5: []}
g = Graph(adj_list)
g.bfs(1)

print('-'*50)
# Directed acyclic Graph
adj_list = {1: [2, 5],
            2: [3, 4],
            3: [],
            4: [3, 5],
            5: []}
g = Graph(adj_list)
g.bfs(4)
class="example"> Val: 1, Color: black, Distance: 0 Val: 2, Color: black, Distance: 1 Val: 5, Color: black, Distance: 1 Val: 3, Color: black, Distance: 2 Val: 4, Color: black, Distance: 2 -------------------------------------------------- Val: 1, Color: black, Distance: 0 Val: 2, Color: black, Distance: 1 Val: 4, Color: black, Distance: 2 Val: 3, Color: black, Distance: 3 Val: 5, Color: black, Distance: 3 -------------------------------------------------- Val: 1, Color: black, Distance: 0 Val: 2, Color: black, Distance: 1 Val: 5, Color: black, Distance: 1 Val: 3, Color: black, Distance: 2 Val: 4, Color: black, Distance: 2 -------------------------------------------------- Val: 4, Color: black, Distance: 0 Val: 3, Color: black, Distance: 1 Val: 5, Color: black, Distance: 1

2.14 DFS

py file Parenthesis theorem: Given two vertices in the graph u and v, u.d denotes the discover time of u, u.f denotes the finish time of u.

  • If the interval [u.d, u.f] and [v.d, v.f] are entirely disjoint, and neither u nor v is a descendant of the other in the depth-first forest.
  • If the interval [u.d, u.f] is contained in [v.d, v.f], the u is a descendant of v.
  • If the interval [v.d, v.f] is contained in [u.d, u.f], the v is a descendant of v.

    Classification of edges:

    • Tree edges, edges in the depth first forest. Edge (u, v) is a tree edge if v was first discovered by exploring (u, v).
    • Back edges, edges (u, v) connecting a vertex u with its ancestor v in a depth-first tree. Including self loops in the directed graph, where a vertex is connecting with itself.
    • Forward edges, nontree edges (u, v) connecting a vertex u to a descendant v in a depth-first tree.
    • Cross edges, all the other edges.

      DFS has enough information to classify some of the edges it encounters. When we first explore an edge (u, v), the color of v tells us something about the edge

      • WHITE indicates a tree edge
      • GRAY indicates a back edge. NOTE: this is critical to determine whether the graph is cyclic or not.
      • BLACK indicates a forward or cross edge.

      For undirected graph, every edge is either a tree edge or a back edge.

      from datetime import datetime
      
      
      time = 0  # global timer
      
      
      class Vertex:
          def __init__(self, val, color='white', parent=None):
              self.val = val
              self.color = color
              self.parent = parent
      
          def __str__(self):
              return "(Val: {}, Color: {})".format(self.val, self.color)
      
      
      class Graph:
          def __init__(self):
              self.adj = {}
              self.nodes = {}
      
          def addNode(self, val):
              node = Vertex(val)
              self.nodes[val] = node
              self.adj[node] = []
      
          def addMultiNodes(self, val_list):
              for v in val_list:
                  self.addNode(v)
      
          def addEdge(self, u, v):
              u_node = self.nodes[u]
              v_node = self.nodes[v]
              self.adj[u_node].append(v_node)
      
          def addMultiEdges(self, edge_list):
              for e in edge_list:
                  self.addEdge(e[0], e[1])
      
          def dfs(self):
              global time
              time = 0
              for u in self.nodes.values():
                  if u.color == 'white':
                      self.dfs_visit(u)
      
          def dfs_visit(self, u):
              global time
              time += 1
              u.color = 'gray'
              print('At time {}, Discovered node: {}'.format(time, u))
              for v in self.adj[u]:
                  if v.color == 'white':
                      self.dfs_visit(v)
                      u.color = 'black'
                      time += 1
                      print('At time {}, Finish visiting node: {}'.format(time, u))
      
      
      g = Graph()
      g.addMultiNodes(['s', 'z', 'y', 'x', 'w', 't', 'v', 'u'])
      g.addMultiEdges([['s', 'z'],
                       ['z', 'y'],
                       ['y', 'x'],
                       ['z', 'w'],
                       ['w', 'x'],
                       ['x', 'z'],
                       ['s', 'w'],
                       ['v', 's'],
                       ['v', 'w'],
                       ['t', 'v'],
                       ['t', 'u'],
                       ['u', 'v']])
      g.dfs()
      
      class="example"> At time 1, Discovered node: (Val: s, Color: gray) At time 2, Discovered node: (Val: z, Color: gray) At time 3, Discovered node: (Val: y, Color: gray) At time 4, Discovered node: (Val: x, Color: gray) At time 5, Finish visiting node: (Val: y, Color: black) At time 6, Finish visiting node: (Val: z, Color: black) At time 7, Discovered node: (Val: w, Color: gray) At time 8, Finish visiting node: (Val: z, Color: black) At time 9, Finish visiting node: (Val: s, Color: black) At time 10, Discovered node: (Val: u, Color: gray) At time 11, Discovered node: (Val: v, Color: gray) At time 12, Finish visiting node: (Val: u, Color: black) At time 13, Discovered node: (Val: t, Color: gray)

      The results shows the parenthesis theorem clearly: (s (z (y (x x) y) (w w) z) s) (t (v v) (u u) t).

2.15 Topological Sort

Topological sort can only be performed in a directed acyclic graph (DAG). A topological sort of a dag G = (V, E) is a linear ordering of all its vertices such that if G contains an edge (u, v), the u appears before v in the ordering.

Topological sort pseudo code:

  • call DFS(G)
  • as each vertex is finished, insert it in front of a linked list
  • return the linked list

    import logging
    from collections import deque
    
    logger = logging.getLogger(__name__)
    
    
    class Vertex:
        def __init__(self, val, color='white', parent=None):
            self.val = val
            self.color = color
            self.parent = parent
    
        def __str__(self):
            return "Val: {}, Color: {}".format(self.val, self.color)
    
    
    class Graph:
        def __init__(self):
            self.adj = {}
            self.nodes = {}
    
        def addNode(self, val):
            node = Vertex(val)
            self.nodes[val] = node
            self.adj[node] = []
    
        def addMultiNodes(self, val_list):
            for v in val_list:
                self.addNode(v)
    
        def addEdge(self, u, v):
            u_node = self.nodes[u]
            v_node = self.nodes[v]
            self.adj[u_node].append(v_node)
    
        def addMultiEdges(self, edge_list):
            for e in edge_list:
                self.addEdge(e[0], e[1])
    
        def dfs(self):
            self._reset_nodes()
            for u in self.nodes.values():
                if u.color == 'white':
                    self.dfs_visit(u)
    
        def dfs_visit(self, u):
            u.color = 'gray'
            logger.info('Discovered node: {}'.format(u))
            for v in self.adj[u]:
                if v.color == 'white':
                    self.parent = u
                    self.dfs_visit(v)
                    u.color = 'black'
                    logger.info('Finish visiting node: {}'.format(u))
    
        def _reset_nodes(self):
            for u in self.nodes.values():
                u.color = 'white'
                u.parent = None
    
        def isDAG(self):
            self._reset_nodes()
            for u in self.nodes.values():
                if u.color == 'white' and not self.isDAGTree(u):
                    return False
            return True
    
        def isDAGTree(self, u):
            u.color = 'gray'
            for v in self.adj[u]:
                logger.info('Discovered node: {}'.format(v))
                if v.color == 'gray':
                    return False
                elif v.color == 'white' and not self.isDAGTree(v):
                    return False
                u.color = 'black'
            return True
    
        def topo_sort(self):
            self._reset_nodes()
            sorted_list = deque()
            for u in self.nodes.values():
                if u.color == 'white':
                    self.topo_sort_helper(u, sorted_list)
            return sorted_list
    
        def topo_sort_helper(self, u, sorted_list):
            u.color = 'gray'
            for v in self.adj[u]:
                if v.color == 'white':
                    v.parent = u
                    self.topo_sort_helper(v, sorted_list)
                elif v.color == 'gray':
                    ValueError(
                        'The graph has cycle. Cannot perform topological sort!')
                    u.color = 'black'
                    sorted_list.appendleft(u.val)
    
    
    g = Graph()
    g.addMultiNodes([
        'undershorts', 'pants', 'belt', 'shirt', 'tie', 'tie', 'jacket', 'socks',
        'shoes', 'watch'
    ])
    
    g.addMultiEdges(
        [['undershorts', 'pants'], ['pants', 'belt'], ['belt', 'jacket'],
         ['shirt', 'belt'], ['shirt', 'tie'], ['tie', 'jacket'],
         ['undershorts', 'shoes'], ['pants', 'shoes'], ['socks', 'shoes']])
    print(g.topo_sort())
    
    class="example"> deque(['undershorts', 'pants', 'pants', 'socks', 'tie'])

    Here is a simpler version. Good as an interview template.

    from collections import deque
    
    
    def buildDirectedGraph(numNodes, edges):
        graph = {i: set() for i in range(numNodes)}
        for u, v in edges:
            graph[u].add(v)
    
        return graph
    
    
    def topo_sort(graph):
        def dfs(graph, node, visited, discovered, res):
            if node in visited:
                return
    
            if node in discovered:
                raise Exception(
                    'The graph has a cycle. It is not topologically sortable.')
    
            discovered.add(node)
            for nei in graph[node]:
                dfs(graph, nei, visited, discovered, res)
    
            discovered.remove(node)
            visited.add(node)
            res.appendleft(node)
    
        res = deque()
        visited = set()  # nodes that are finished visiting
        discovered = set()  # nodes that are discovered by dfs but not yet finished
        for node in graph:
            dfs(graph, node, visited, discovered, res)
        return res
    
    
    # graph = buildDirectedGraph(2, [[1, 0], [0, 1]])
    # res = topo_sort(graph)
    
    graph = buildDirectedGraph(4, [[1, 0], [2, 0], [3, 1], [3, 2]])
    res = topo_sort(graph)
    print(res)
    
    class="example"> deque([3, 2, 1, 0])

    Kahn's algorithm, something like BFS, queue is used.

    • Create the indegree map of the graph
    • Append the nodes with 0 indegree to the queue
    • while the queue is not empty, dequeue and get one node, count += 1, stack append the node. for all its neighbors, decrease the indegree by 1, and append the neighbors whose indegree is 0.
    • if count != the total number of nodes, then the graph is not DAG. Otherwise return the stack. The stack is topologically sorted.

      from collections import deque
      
      
      def buildDirectedGraph(numNodes, edges):
          graph = {i: set() for i in range(numNodes)}
          for u, v in edges:
              graph[u].add(v)
          return graph
      
      
      def topo_sort(graph):
          indegree = [0] * len(graph)
          for neighbors in graph.values():
              for nei in neighbors:
                  indegree[nei] += 1
      
          q = deque([n for n in graph if indegree[n] == 0])
          res = []
          while q:
              node = q.popleft()
              res.append(node)
              for nei in graph[node]:
                  indegree[nei] -= 1
                  if indegree[nei] == 0:
                      q.append(nei)
      
          if len(res) < len(graph):
              raise Exception('The graph has a cycle. It is not topologically sortable.')
          return res
      
      
      # graph = buildDirectedGraph(2, [[1, 0], [0, 1]])
      # res = topo_sort(graph)
      
      graph = buildDirectedGraph(4, [[1, 0], [2, 0], [3, 1], [3, 2]])
      res = topo_sort(graph)
      print(res)
      
      class="example"> [3, 1, 2, 0]

2.16 Strongly connected components

In a strongly connected component, every pair of vertices are reachable from each other.

  • call DFS(G) to compute the finishing times for each vertex
  • compute GT, G transpose (reverse every edge)
  • call DFS(GT), but in the main loop of DFS, consider vertices in order of decreasing finishing time u.f obtained in the first step.
  • output the vertices of each tree in the depth-first forest obtained in step 3

2.17 Binary Search Tree

  • If node y is in the left subtree of x, then y.key <= x.key.
  • If node y is in the right subtree of x, then y.key >= x.key.
  • Inorder tree walk: print left tree -> print the node itself -> print the right tree
  • Preorder tree walk: print the root -> print the subtrees
  • Post order tree walk: print the subtrees -> print the root

    from collections import deque
    
    
    class Node:
        def __init__(self, x):
            self.key = x
            self.left = None
            self.right = None
            self.parent = None
    
        def __str__(self):
            return "Node({})".format(self.key)
    
    
    class BinarySearchTree:
        def __init__(self):
            self.root = None
    
        def inorder_tree_walk(self):
            self._inorder_tree_walk_helper(self.root)
    
        def _inorder_tree_walk_helper(self, node):
            if node:
                self._inorder_tree_walk_helper(node.left)
                print(node.key)
                self._inorder_tree_walk_helper(node.right)
    
        def search(self, key):
            return self._search_helper(self.root, key)
    
        def _search_helper(self, node, key):
            if not node or node.key == key:
                return node
    
            if key < node.key:
                return self._search_helper(node.left, key)
            else:
                return self._search_helper(node.right, key)
    
        def iterative_search(self, key):
            curr = self.root
            while curr and curr.key != key:
                if key < curr.key:
                    curr = curr.left
                else:
                    curr = curr.right
            return curr
    
        def maximum(self):
            return self._maximum(self.root)
    
        def _maximum(self, node):
            curr = node
            while curr.right:
                curr = curr.right
            return curr
    
        def minimum(self):
            return self._minimum(self.root)
    
        def _minimum(self, node):
            curr = node
            while curr.left:
                curr = curr.left
            return curr
    
        def insert(self, node):
            p = None
            curr = self.root
            while curr:
                p = curr
                if node.key < curr.key:
                    curr = curr.left
                else:
                    curr = curr.right
                    node.parent = p
            if p:
                if node.key < p.key:
                    p.left = node
                else:
                    p.right = node
            else:
                self.root = node
    
        def successor(self, node):
            """
            Find the successor of the sorted array in the inorder traversal
            """
            if node.right:
                return self._minimum(node.right)
    
            parent_node = node.parent
            while parent_node and parent_node.left != node:
                node = parent_node
                parent_node = node.parent
            return parent_node
    
        def predecessor(self, node):
            if node.left:
                return self._maximum(node.left)
    
            parent_node = node.parent
            while parent_node and parent_node.right != node:
                node = parent_node
                parent_node = node.parent
            return parent_node
    
        def transplant(self, u, v):
            """Replace the subtree rooted at u with the subtree rooted at v
            NOTE: u must not be None, but v can be None
            """
            if not u.parent:
                self.root = v
            elif u.parent.left == u:
                u.parent.left = v
            else:
                u.parent.right = v
    
            if v:
                v.parent = u.parent
    
        def delete(self, node):
            """Delete the node, assuming node cannot be None
            """
            if not node.left:
                self.transplant(node, node.right)
            elif not node.right:
                self.transplant(node, node.left)
            else:
                suc_node = self.successor(node)
                if suc_node.parent != node:
                    self.transplant(suc_node, suc_node.right)
                    suc_node.right = node.right
                    suc_node.right.parent = suc_node
                self.transplant(node, suc_node)
                suc_node.left = node.left
                suc_node.left.parent = suc_node
    
        def _print(self, node, indent):
            s = ""
            if node:
                s = indent + str(node.key) + "\n"
                if node.left:
                    s += self._print(node.left, indent + "|---")
                if node.right:
                    s += self._print(node.right, indent + "|---")
            return s
    
        def __str__(self):
            if not self.root:
                return ""
            s = ""
            curr_level = []
            curr_level.append(self.root)
            while curr_level:
                next_level = []
                for n in curr_level:
                    s += str(n.key) + " "
                    if n.left:
                        next_level.append(n.left)
                    if n.right:
                        next_level.append(n.right)
                s += "\n"
                curr_level = next_level
            return s
    
    
    t = BinarySearchTree()
    t.insert(Node(12))
    t.insert(Node(5))
    t.insert(Node(2))
    t.insert(Node(9))
    t.insert(Node(18))
    t.insert(Node(15))
    t.insert(Node(19))
    
    t.inorder_tree_walk()
    print(t.search(9))
    print(t.iterative_search(9))
    print(t.minimum())
    print(t.maximum())
    print(t.successor(t.search(9)))
    print(t.predecessor(t.search(9)))
    print(t)
    t.delete(t.search(12))
    print(t)
    
    class="example"> 2 5 9 12 15 18 19 Node(9) Node(9) Node(2) Node(19) None Node(5) 12 5 18 2 9 15 19 15 5 2 9
    • Randomly built binary search tree: each of the n! permutations of the input keys is equally likely (it is different from insert the n keys equally likely each time).

2.18 Red-Black Tree

Red-Black Properties:

  • Every node is either red or black.
  • The root is black.
  • Every leaf (NIL) is black.
  • If a node is red, then both its children are black.
  • For each node, all simple paths from the node to descendant leaves contain the same number of black nodes.

    Black-height of node x: the number of black nodes from the node x (not including x) to a leaf.

    Lemma: A red-black tree with n internal nodes has height at most 2lg(n+1).

    Rotation:

    beforeRotation.png

    After Right-Rotate is performed,

    afterRotation.png

    class Node:
        BLACK = "BLACK"
        RED = "RED"
    
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None
            self.color = self.BLACK
            self.parent = None
    
        def __str__(self):
            return "Node(key={}, color={})".format(self.key, self.color)
    
    
    class RedBlackTree:
        def __init__(self):
            self.nil = Node(None)
            self.root = self.nil
    
        def left_rotate(self, x):
            # Assuming x.right is not self.nil
            y = x.right             # set y
            x.right = y.left        # turn y's left subtree into x's right subtree
            if y.left != self.nil:  # no need to link y.left's parent if it is a sentinel
                y.left.parent = x
            y.parent = x.parent     # link x's parent to y
            if x.parent == self.nil:
                self.root = y
            elif x == x.parent.left:
                x.parent.left = y
            else:
                x.parent.right = y
            y.left = x              # put x into y's left
            x.parent = y
    
        def right_rotate(self, x):
            # Assuming x.left is not self.nil, symmetric to left_rotate
            y = x.left
    
            x.left = y.right
            if y.right != self.nil:
                y.right.parent = x
    
            y.parent = x.parent
            if x.parent == self.nil:
                self.root = y
            elif x == x.parent.left:
                x.parent.left = y
            else:
                x.parent.right = y
    
            y.right = x
            x.parent = y
    
        def insert(self, z):
            prev = self.nil
            curr = self.root
            while curr != self.nil:
                prev = curr
                if curr.key > z.key:
                    curr = curr.left
                else:
                    curr = curr.right
    
            if prev == self.nil:
                self.root = z
                z.parent = self.nil
            elif prev.key > z.key:
                prev.left = z
            else:
                prev.right = z
            z.parent = prev
            z.left = self.nil
            z.right = self.nil
            z.color = Node.RED
            self.insert_fixup(z)
    
        def insert_fixup(self, z):
            while z.parent.color == Node.RED:
                # z.parent.parent is guaranteed to exist, because the root is
                # black. If a z.parent is red then it is not the root, so it must
                # have parent.
                if z.parent == z.parent.parent.left:
                    y = z.parent.parent.right  # uncle node
                    if y.color == Node.RED:
                        z.parent.color = Node.BLACK
                        y.color = Node.BLACK
                        z.parent.parent.color = Node.RED
                        z = z.parent.parent
                    else:
                        if z == z.parent.right:
                            z = z.parent
                            self.left_rotate(z)
                        z.parent.color = Node.BLACK
                        z.parent.parent.color = Node.RED
                        self.right_rotate(z.parent.parent)
                else:
                    y = z.parent.parent.left
                    if y.color == Node.RED:
                        z.parent.color = Node.BLACK
                        y.color = Node.BLACK
                        z.parent.parent.color = Node.RED
                        z = z.parent.parent
                    else:
                        if z == z.parent.left:
                            z = z.parent
                            self.right_rotate(z)
                        z.parent.color = Node.BLACK
                        z.parent.parent.color = Node.RED
                        self.left_rotate(z.parent.parent)
            self.root.color = Node.BLACK
    
        def transplant(self, u, v):
            if u.parent == self.nil:
                self.root = v
            elif u == u.parent.left:
                u.parent.left = v
            else:
                u.parent.right = v
            v.parent = u.parent
    
        def minimum(self, x):
            curr = x
            while curr.left != self.nil:
                curr = curr.left
            return curr
    
        def delete(self, z):
            y = z  # y is either the node to be removed, or the node that replaces z
            y_original_color = y.color
            if z.left == self.nil:
                x = z.right          # x moves into y's original position
                self.transplant(z, z.right)
            elif z.right == self.nil:
                x = z.left
                self.transplant(z, z.left)
            else:
                y = self.minimum(z.right)
                y_original_color = y.color
                x = y.right
                if y.parent == z:
                    x.parent = y     # TODO: why do we need this?????
                else:
                    self.transplant(y, y.right)
                    y.right = z.right
                self.transplant(z, y)
                y.left = z.left
                y.left.parent = y
                y.color = z.color
    
            if y_original_color == Node.BLACK:
                self.delete_fixup(x)
    
        def delete_fixup(self, x):
            while x != self.root and x.color == Node.BLACK:
                # x must be a "doubly-black" node in this loop
                if x == x.parent.left:
                    w = x.parent.right   # the sibling of x
                    if w.color == Node.RED:
                        # convert to case 2 or 3 or 4 by rotation
                        w.color = Node.BLACK         # case 1
                        x.parent.color = Node.RED    # case 1
                        self.left_rotate(x.parent)   # case 1
                        w = x.parent.right           # case 1
                    if w.left.color == Node.BLACK and w.right.color == Node.BLACK:
                        w.color = Node.RED           # case 2
                        x = x.parent                 # case 2
                    else:
                        if w.right.color == Node.BLACK:
                            w.color = Node.RED         # case 3
                            w.left.color = Node.BLACK  # case 3
                            self.right_rotate(w)       # case 3
                            w = x.parent.right         # case 3
                        w.color = x.parent.color       # case 4
                        x.parent.color = Node.BLACK    # case 4
                        w.right.color = Node.BLACK     # case 4
                        self.left_rotate(x.parent)     # case 4
                        x = self.root                  # case 4, terminate the loop
                else:
                    # Symmetric to left node case
                    w = x.parent.left
                    if w.color == Node.RED:
                        w.color = Node.BLACK
                        x.parent.color = Node.RED
                        self.right_rotate(x.parent)
                        w = x.parent.left
                    if w.left.color == Node.BLACK and w.right.color == Node.BLACK:
                        w.color = Node.RED
                        x = x.parent
                    else:
                        if w.left.color == Node.BLACK:
                            w.color = Node.RED
                            w.right.color = Node.BLACK
                            self.left_rotate(w)
                        w.color = x.parent.color
                        x.parent.color = Node.BLACK
                        w.left.color = Node.BLACK
                        self.right_rotate(x.parent)
                        x = self.root
    
            # Outside the loop, x is either the root, or a "red-and-black" node
            # If x is the root, then root must be BLACK.
            # If x is a "red-and-black" node, then color it as black, because it
            # will have a RED child in case 2, and it might have RED parent.
            x.color = Node.BLACK
    
        def search(self, key):
            curr = self.root
            while curr:
                if curr.key == key:
                    return curr
                elif curr.key < key:
                    curr = curr.right
                else:
                    curr = curr.left
            return None
    
        def pprint(self, node, indent):
            if node.parent == self.nil:
                s = "{}{} ({})".format(indent, node.key, node.color)
            else:
                side = "l" if node == node.parent.left else "r"
                s = "{}{}: {} ({})".format(indent, side, node.key, node.color)
    
            if node.left != self.nil:
                s += "\n" + self.pprint(node.left, "|---" + indent)
    
            if node.right != self.nil:
                s += "\n" + self.pprint(node.right, "|---" + indent)
            return s
    
        def __str__(self):
            return self.pprint(self.root, "") + "\n"
    
    
    t = RedBlackTree()
    t.insert(Node(11))
    print(t)
    t.insert(Node(3))
    print(t)
    t.insert(Node(5))
    print(t)
    t.insert(Node(20))
    print(t)
    t.insert(Node(9))
    print(t)
    t.insert(Node(1))
    print(t)
    t.insert(Node(2))
    print(t)
    t.insert(Node(4))
    print(t)
    t.insert(Node(6))
    print(t)
    t.insert(Node(-1))
    print(t)
    t.insert(Node(3))
    print(t)
    # print(t.search(5))
    t.delete(t.search(5))
    print(t)
    
    class="example"> 11 (BLACK) 11 (BLACK) |---l: 3 (RED) 5 (BLACK) |---l: 3 (RED) |---r: 11 (RED) 5 (BLACK) |---l: 3 (BLACK) |---r: 11 (BLACK) |---|---r: 20 (RED) 5 (BLACK) |---l: 3 (BLACK) |---r: 11 (BLACK) |---|---l: 9 (RED) |---|---r: 20 (RED) 5 (BLACK) |---l: 3 (BLACK) |---|---l: 1 (RED) |---r: 11 (BLACK) |---|---l: 9 (RED) |---|---r: 20 (RED) 5 (BLACK) |---l: 2 (BLACK) |---|---l: 1 (RED) |---|---r: 3 (RED) |---r: 11 (BLACK) |---|---l: 9 (RED) |---|---r: 20 (RED) 5 (BLACK) |---l: 2 (RED) |---|---l: 1 (BLACK) |---|---r: 3 (BLACK) |---|---|---r: 4 (RED) |---r: 11 (BLACK) |---|---l: 9 (RED) |---|---r: 20 (RED) 5 (BLACK) |---l: 2 (RED) |---|---l: 1 (BLACK) |---|---r: 3 (BLACK) |---|---|---r: 4 (RED) |---r: 11 (RED) |---|---l: 9 (BLACK) |---|---|---l: 6 (RED) |---|---r: 20 (BLACK) 5 (BLACK) |---l: 2 (RED) |---|---l: 1 (BLACK) |---|---|---l: -1 (RED) |---|---r: 3 (BLACK) |---|---|---r: 4 (RED) |---r: 11 (RED) |---|---l: 9 (BLACK) |---|---|---l: 6 (RED) |---|---r: 20 (BLACK) 5 (BLACK) |---l: 2 (RED) |---|---l: 1 (BLACK) |---|---|---l: -1 (RED) |---|---r: 3 (BLACK) |---|---|---l: 3 (RED) |---|---|---r: 4 (RED) |---r: 11 (RED) |---|---l: 9 (BLACK) |---|---|---l: 6 (RED) |---|---r: 20 (BLACK) 6 (BLACK) |---l: 2 (RED) |---|---l: 1 (BLACK) |---|---|---l: -1 (RED) |---|---r: 3 (BLACK) |---|---|---l: 3 (RED) |---|---|---r: 4 (RED) |---r: 11 (RED) |---|---l: 9 (BLACK) |---|---r: 20 (BLACK)

2.19 Trie Tree

  • Radix Tree

    class Node:
        def __init__(self):
            self.children = [None] * 2
            self.isEnd = False
    
    
    class RadixTree:
        """
        This radix tree only contains binary path at each node, i.e. it only stores
        sting like "01100".
        """
        def __init__(self):
            self.root = Node()
    
        def getIdx(self, s):
            return ord(s) - ord("0")
    
        def getStr(self, idx):
            return str(idx)
    
        def sort(self):
            sorted_arr = []
            stack = [(self.root, "")]
            while stack:
                node, curr_str = stack.pop()
                # NOTE: need to iterate the children from right to left, so left
                # node will be traversed before right node
                for idx, n in reversed(list(enumerate(node.children))):
                    if n:
                        next_str = curr_str + self.getStr(idx)
                        stack.append((n, next_str))
                        if n.isEnd:
                            sorted_arr.append(next_str)
            return sorted_arr
    
        def sort_recursive(self):
            return self._preorder_travel(self.root, "")
    
        def _preorder_travel(self, node, s):
            sorted_arr = []
            if node.isEnd:
                sorted_arr.append(s)
            for idx, n in enumerate(node.children):
                if n:
                    sorted_arr.extend(self._preorder_travel(n, s + self.getStr(idx)))
            return sorted_arr
    
        def insert(self, seq):
            curr = self.root
            for s in seq:
                idx = self.getIdx(s)
                if not curr.children[idx]:
                    curr.children[idx] = Node()
                    curr = curr.children[idx]
                    curr.isEnd = True
    
        def search(self, seq):
            curr = self.root
            for s in seq:
                idx = self.getIdx(s)
                if not curr.children[idx]:
                    return False
                curr = curr.children[idx]
            return curr.isEnd
    
    
    t = RadixTree()
    t.insert("1011")
    t.insert("0")
    t.insert("100")
    t.insert("10")
    t.insert("011")
    
    print(t.search("00"))
    print(t.search("01"))
    print(t.search("011"))
    print(t.search("1011"))
    print(t.sort())
    print(t.sort_recursive())
    
    class="example"> False False False True ['1', '0', '10', '101', '1011'] ['0', '1', '10', '101', '1011']

2.20 (Binary) Heap

  • An array with two attributes: 1) length, the number of elements in the array. 2) heapsize, the number of elements in the heap. The elements in the heap are from A[1] to A[A.heapsize].
  • The heap has two properties:
    • Complete Binary Tree: the tree is completely filled in all levels except the lowest level. For the lowest level, the tree fills from left to right.
    • Heap property: all the parent values are larger (max-heap) / smaller (min-heap) or equal to the child values.
  • Due to complete binary tree property, given a node index i, the parent index is floor(i/2), the left child index is 2*i, and the right child index is 2*i+1. NOTE: the root index is 1 (not 0).
  • NOTE: heapify only takes O(n) time (it looks like O(nlogn), however, a tighter upper bound can be found, which is O(n)). https://www.geeksforgeeks.org/time-complexity-of-building-a-heap/

    from collections import deque
    
    
    class Heap:
        def __init__(self):
            self.heap_size = 0
            self._array = [0]
    
        def __getitem__(self, k):
            if isinstance(k, slice):
                return self._array[slice(k.start + 1, k.stop + 1, k.step)]
            else:
                return self._array[k + 1]
    
        def __len__(self):
            return self.heap_size
    
        def parent(self, i):
            return i // 2
    
        def left(self, i):
            return 2 * i
    
        def right(self, i):
            return 2 * i + 1
    
        def build_heap(self, A):
            # It can be proved that A[len(A) // 2], A[len(A) - 1] are leaf nodes.
            self.heap_size = len(A)
            self._array = [0] + A
            # The time complexity is upper bounded by O(nlogn), but actually the
            # tighter upper bound is O(n)
            for i in range(self.heap_size // 2, 0, -1):
                # Build the heap bottom up
                self.max_heapify(i)
    
        def max_heapify(self, i):
            # Assumes the binary trees rooted at self.left(i) and self.right(i) are
            # max-heap
    
            # The Time complexity of this operation is O(log n), or O(h), where h
            # is the height of the tree
            lft = self.left(i)
            rht = self.right(i)
    
            if lft <= self.heap_size and self._array[i] < self._array[lft]:
                largest = lft
            else:
                largest = i
    
            if rht <= self.heap_size and self._array[largest] < self._array[rht]:
                largest = rht
    
            if largest != i:
                self._array[i], self._array[largest] = self._array[
                    largest], self._array[i]
                self.max_heapify(largest)
    
        def maximum(self):
            return self._array[1]
    
        def extract_max(self):
            # Time complexity O(logn)
            if self.heap_size < 1:
                raise Exception("Heap underflow!")
            max = self._array[1]
            self._array[1] = self._array[self.heap_size]
            self.heap_size -= 1
            self.max_heapify(1)
            return max
    
        def insert(self, x):
            # Time complexity O(logn)
            self.heap_size += 1
            self._array.append(-1)
            self.increase_key(self.heap_size, x)
    
        def increase_key(self, i, k):
            # Time complexity O(logn)
            if k < self._array[i]:
                raise Exception("New key must be larger than the current key.")
    
            self._array[i] = k
            while i > 1 and self._array[i] > self._array[self.parent(i)]:
                self._array[self.parent(i)], self._array[i] = self._array[i], self._array[self.parent(i)]
                i = self.parent(i)
    
        def __str__(self):
            return str(self[0:len(self)])
    
    
    def heap_sort(A):
        # Time complexity O(nlogn)
        h = Heap()
        h.build_heap(A)
        q = deque()
        while h.heap_size > 0:
            q.appendleft(h.extract_max())
        return list(q)
    
    
    # A = [1, 2, 3, 4, 5, 6]
    # h = Heap(A, 100)
    # print(h)
    
    A = [7, 100, 3, 6, 90, 45, 1000]
    h = Heap()
    h.build_heap(A)
    print(h)
    
    sorted_A = heap_sort(A)
    print(sorted_A)
    
    h = Heap()
    h.insert(7)
    h.insert(100)
    h.insert(3)
    h.insert(6)
    h.insert(90)
    h.insert(45)
    h.insert(32)
    print(h.maximum())
    print(h)
    h.increase_key(4, 1000)
    print(h)
    

2.21 Dynamic Programming

  • Divide-and-conquer algorithms partition the problem into disjoint subproblems, solve the subproblems recursively, and then combine their solutions to solve the original problem.
  • Dynamic programming applies when the subproblems overlap, that is when subproblems share subsubproblems. In this context, divide-and-conquer does more work than necessary, repeatedly solve the common subsubproblems.
  • Dynamic programming solve each subsubproblem just once and saves its answer in a table to avoid solving the same problem again.
  • To develop a dynamic programming algorithm, we follow 4 steps:
    • Characterize the structure of an optimal solution
    • Recursively define the value of an optimal solution
    • Compute the value of an optimal solution, typically in a bottom-up fashion
    • Construct an optimal solution from computed information

2.21.1 Rod cutting

  • Top-down method utilizing memoization

    def cut_rod(price, n):
        memo = [-1 for i in range(n)]
        return helper_cut_rod(price, n, memo)
    
    
    def helper_cut_rod(price, n, memo):
        if memo[n - 1] != -1:
            return memo[n - 1]
        if n == 0:
            return 0
        optimal_price = -1
        for i in range(1, n + 1):
            optimal_price = max(optimal_price, price[i - 1] + helper_cut_rod(price, n - i, memo))
            memo[n - 1] = optimal_price
            print(memo)
        return optimal_price
    
    
    price = [1, 5, 8, 9, 10, 17, 17, 20, 24, 30]
    print(cut_rod(price, 10))
    
    class="example"> [1, -1, -1, -1, -1, -1, -1, -1, -1, -1] [1, 2, -1, -1, -1, -1, -1, -1, -1, -1] [1, 5, -1, -1, -1, -1, -1, -1, -1, -1] [1, 5, 6, -1, -1, -1, -1, -1, -1, -1] [1, 5, 6, -1, -1, -1, -1, -1, -1, -1] [1, 5, 8, -1, -1, -1, -1, -1, -1, -1] [1, 5, 8, 9, -1, -1, -1, -1, -1, -1] [1, 5, 8, 10, -1, -1, -1, -1, -1, -1] [1, 5, 8, 10, -1, -1, -1, -1, -1, -1] [1, 5, 8, 10, -1, -1, -1, -1, -1, -1] [1, 5, 8, 10, 11, -1, -1, -1, -1, -1] [1, 5, 8, 10, 13, -1, -1, -1, -1, -1] [1, 5, 8, 10, 13, -1, -1, -1, -1, -1] [1, 5, 8, 10, 13, -1, -1, -1, -1, -1] [1, 5, 8, 10, 13, -1, -1, -1, -1, -1] [1, 5, 8, 10, 13, 14, -1, -1, -1, -1] [1, 5, 8, 10, 13, 15, -1, -1, -1, -1] [1, 5, 8, 10, 13, 16, -1, -1, -1, -1] [1, 5, 8, 10, 13, 16, -1, -1, -1, -1] [1, 5, 8, 10, 13, 16, -1, -1, -1, -1] [1, 5, 8, 10, 13, 17, -1, -1, -1, -1] [1, 5, 8, 10, 13, 17, 18, -1, -1, -1] [1, 5, 8, 10, 13, 17, 18, -1, -1, -1] [1, 5, 8, 10, 13, 17, 18, -1, -1, -1] [1, 5, 8, 10, 13, 17, 18, -1, -1, -1] [1, 5, 8, 10, 13, 17, 18, -1, -1, -1] [1, 5, 8, 10, 13, 17, 18, -1, -1, -1] [1, 5, 8, 10, 13, 17, 18, -1, -1, -1] [1, 5, 8, 10, 13, 17, 18, 19, -1, -1] [1, 5, 8, 10, 13, 17, 18, 22, -1, -1] [1, 5, 8, 10, 13, 17, 18, 22, -1, -1] [1, 5, 8, 10, 13, 17, 18, 22, -1, -1] [1, 5, 8, 10, 13, 17, 18, 22, -1, -1] [1, 5, 8, 10, 13, 17, 18, 22, -1, -1] [1, 5, 8, 10, 13, 17, 18, 22, -1, -1] [1, 5, 8, 10, 13, 17, 18, 22, -1, -1] [1, 5, 8, 10, 13, 17, 18, 22, 23, -1] [1, 5, 8, 10, 13, 17, 18, 22, 23, -1] [1, 5, 8, 10, 13, 17, 18, 22, 25, -1] [1, 5, 8, 10, 13, 17, 18, 22, 25, -1] [1, 5, 8, 10, 13, 17, 18, 22, 25, -1] [1, 5, 8, 10, 13, 17, 18, 22, 25, -1] [1, 5, 8, 10, 13, 17, 18, 22, 25, -1] [1, 5, 8, 10, 13, 17, 18, 22, 25, -1] [1, 5, 8, 10, 13, 17, 18, 22, 25, -1] [1, 5, 8, 10, 13, 17, 18, 22, 25, 26] [1, 5, 8, 10, 13, 17, 18, 22, 25, 27] [1, 5, 8, 10, 13, 17, 18, 22, 25, 27] [1, 5, 8, 10, 13, 17, 18, 22, 25, 27] [1, 5, 8, 10, 13, 17, 18, 22, 25, 27] [1, 5, 8, 10, 13, 17, 18, 22, 25, 27] [1, 5, 8, 10, 13, 17, 18, 22, 25, 27] [1, 5, 8, 10, 13, 17, 18, 22, 25, 27] [1, 5, 8, 10, 13, 17, 18, 22, 25, 27] [1, 5, 8, 10, 13, 17, 18, 22, 25, 57] 57
    • Bottom-up method

      def cut_rod(price, n):
          optimal_price, cut_points = cut_rod_helper(price, n)
          optimal_cut_points = []
          curr_len = n
          while curr_len > 0:
              optimal_cut_points.append(cut_points[curr_len])
              curr_len = curr_len - cut_points[curr_len]
          return optimal_price[n], optimal_cut_points
      
      
      def cut_rod_helper(price, n):
          memo = [-1 for i in range(n + 1)]
          cut_point = [-1 for i in range(n + 1)]
          memo[0] = 0
          cut_point[0] = 0
          for i in range(1, n + 1):
              opt = -1
              cp = -1
              for j in range(1, i + 1):
                  tmp = price[j - 1] + memo[i - j]
                  if tmp > opt:
                      opt = tmp
                      cp = j
                      memo[i] = opt
                      cut_point[i] = cp
                      print(memo)
                      print(cut_point)
          return memo, cut_point
      
      
      price = [1, 5, 8, 9, 10, 17, 17, 20, 24, 30]
      opt_price, opt_cut = cut_rod(price, 7)
      print(opt_price)
      print(opt_cut)
      
      class="example"> [0, 1, -1, -1, -1, -1, -1, -1] [0, 1, -1, -1, -1, -1, -1, -1] [0, 1, 2, -1, -1, -1, -1, -1] [0, 1, 1, -1, -1, -1, -1, -1] [0, 1, 5, -1, -1, -1, -1, -1] [0, 1, 2, -1, -1, -1, -1, -1] [0, 1, 5, 6, -1, -1, -1, -1] [0, 1, 2, 1, -1, -1, -1, -1] [0, 1, 5, 8, -1, -1, -1, -1] [0, 1, 2, 3, -1, -1, -1, -1] [0, 1, 5, 8, 9, -1, -1, -1] [0, 1, 2, 3, 1, -1, -1, -1] [0, 1, 5, 8, 10, -1, -1, -1] [0, 1, 2, 3, 2, -1, -1, -1] [0, 1, 5, 8, 10, 11, -1, -1] [0, 1, 2, 3, 2, 1, -1, -1] [0, 1, 5, 8, 10, 13, -1, -1] [0, 1, 2, 3, 2, 2, -1, -1] [0, 1, 5, 8, 10, 13, 14, -1] [0, 1, 2, 3, 2, 2, 1, -1] [0, 1, 5, 8, 10, 13, 15, -1] [0, 1, 2, 3, 2, 2, 2, -1] [0, 1, 5, 8, 10, 13, 16, -1] [0, 1, 2, 3, 2, 2, 3, -1] [0, 1, 5, 8, 10, 13, 17, -1] [0, 1, 2, 3, 2, 2, 6, -1] [0, 1, 5, 8, 10, 13, 17, 18] [0, 1, 2, 3, 2, 2, 6, 1] 18 [1, 6]
      • Subproblem Graph

      subproblem_graph.png

      • Top-down approach performs a "Depth-First-Search" of the subproblem graph.
      • Bottom-up approach solves the subproblem in a "reverse topological sort" order.
      • The time complexity of the dynamic ptrogramming is O(V + E), where V are the nodes, E are the edges of the graph.

2.21.2 Matrix-chain multiplication

  • 2D DP problem

    def matrix_chain_order(p):
        n = len(p) - 1
        m = [[-1 for i in range(n)] for i in range(n)]
        s = [[-1 for i in range(n)] for i in range(n)]
    
        for i in range(n):
            m[i][i] = 0
            s[i][i] = 0
    
        for d in range(1, n):
            # l is the difference between start and end indices
            for i in range(n - d):
                j = i + d
                # i is the start index
                for k in range(i, j):
                    # k is the relative split position
                    tmp = m[i][k] + m[k + 1][j] + p[i] * p[k + 1] * p[j + 1]
                    if m[i][j] == -1:
                        m[i][j] = tmp
                        s[i][j] = k
                    elif m[i][j] > tmp:
                        m[i][j] = tmp
                        s[i][j] = k
        return m, s
    
    
    def print_solution(s, i, j):
        sol = ""
        if i == j:
            sol += "A_{}".format(i + 1)
        else:
            sol += "("
            sol += print_solution(s, i, s[i][j])
            sol += print_solution(s, s[i][j] + 1, j)
            sol += ")"
        return sol
    
    
    
    p = [30, 35, 15, 5, 10, 20, 25]
    m, s = matrix_chain_order(p)
    sol = print_solution(s, 0, len(m) - 1)
    print(m)
    print(s)
    print(sol)
    
    class="example"> [[0, 15750, 7875, 9375, 11875, 15125], [-1, 0, 2625, 4375, 7125, 10500], [-1, -1, 0, 750, 2500, 5375], [-1, -1, -1, 0, 1000, 3500], [-1, -1, -1, -1, 0, 5000], [-1, -1, -1, -1, -1, 0]] [[0, 0, 0, 2, 2, 2], [-1, 0, 1, 2, 2, 2], [-1, -1, 0, 2, 2, 2], [-1, -1, -1, 0, 3, 4], [-1, -1, -1, -1, 0, 4], [-1, -1, -1, -1, -1, 0]] ((A_1(A_2A_3))((A_4A_5)A_6))
    • Elements of dynamic programming
      • Optimal substructure
        • We build an optimal solution to the problem from optimal solutions of the subproblems.
        • Common pattern in discovering optimal substructure:
          • A solution to the problem consists of making a choice. Making to the choice leads to solving one or more subproblems.
          • You suppose for a given problem, you are given the choice that leads to an optimal solution. You do not concern yourself yet with how the choice came up. You just assume it has been given to you.
          • Given the choice, you determine which subproblems follow, and how to best characterize the resulting space of subproblems (e.g. memo[i], memo[i][j]).
          • You show that the solutions to the subproblems used within an optimal solution to the problem must be optimal themselves to the subproblems. You can prove it by contradiction.
        • Optimal substructure varies across problem domains in two ways:
          • how many subproblems an optimal solution to the original problem uses
          • how many choices we have in determining which subproblems to use in an optimal solution.
        • Informally, the running time of a dynamic-programming algorithm depends on the product of two factors:
          • the number of subproblems overall, denoted as n (i.e. the vertices of the subproblem graph).
          • how many choices we look at for each subproblem, denoted as m (i.e. the edges of each vertex, so the total edge number will be n * m).
        • DP versus greedy:
          • DP solves the subproblem first, and then use the solutions of the subproblems to solve the original problem
          • Greedy first make a "greedy" choice (the choice that looks best at the time), and then solve a resulting subproblem. It doesn't bothering to solve every possible smaller subproblems first. In some cases, this strategy works.
        • Independence between subproblems
          • If solving subproblems will need to share information between subproblems in order to find the optimal solution to the original problem, then the problem cannot be solved using DP. Example: unweighted longest simple path in a directed graph.
      • Overlapping subproblems
        • When a recursive algorithm revisits the same problem repeatedly, then the optimization problem has overlapping subproblems.
    • Longest Common Subsequence

      def longest_common_subsequence(x, y):
          m = len(x)
          n = len(y)
          c = [[0 for j in range(n + 1)] for i in range(m + 1)]
          d = [[-1 for j in range(n)] for i in range(m)]
          for i in range(1, m + 1):
              for j in range(1, n + 1):
                  if x[i - 1] == y[j - 1]:
                      c[i][j] = c[i - 1][j - 1] + 1
                      d[i - 1][j - 1] = 2  # 2 means go to left up corner for parent
                  elif c[i - 1][j] > c[i][j - 1]:
                      c[i][j] = c[i - 1][j]
                      d[i - 1][j - 1] = 0  # 0 means go left for parent
                  else:
                      c[i][j] = c[i][j - 1]
                      d[i - 1][j - 1] = 1  # 1 means go right for parent
          return c[m][n], d
      
      
      def print_lcs(x, d, i, j):
          if i < 0 or j < 0:
              return ""
      
          s = ""
          if d[i][j] == 2:
              s += print_lcs(x, d, i - 1, j - 1)
              s += x[i]
          elif d[i][j] == 0:
              s += print_lcs(x, d, i - 1, j)
          else:
              s += print_lcs(x, d, i, j - 1)
          return s
      
      
      x = "abcbdab"
      y = "bdcaba"
      
      ct, d = longest_common_subsequence(x, y)
      print(ct)
      print(print_lcs(x, d, len(x) - 1, len(y) - 1))
      
      class="example"> 4 bdab

      NOTE: if only the length is needed, then there is no need to keep the whole lookup table. Only the latest two rows of the table is needed.

    • Optimal Binary Search Tree

      def optimal_bst(p, q, n):
          e = [[0 for j in range(n + 1)] for i in range(n + 2)]  # e[1 to n + 1][0 to n]
          w = [[0 for j in range(n + 1)] for i in range(n + 2)]
          root = [[-1 for j in range(n + 1)] for i in range(n + 1)]
      
          for i in range(1, n + 2):
              e[i][i - 1] = q[i - 1]
              w[i][i - 1] = q[i - 1]
      
          for d in range(n):
              for i in range(1, n - d + 1):
                  j = i + d
                  e[i][j] = float("inf")
                  w[i][j] = w[i][j - 1] + p[j - 1] + q[j]
                  for k in range(i, j + 1):
                      tmp = e[i][k - 1] + e[k + 1][j] + w[i][j]
                      if tmp < e[i][j]:
                          e[i][j] = tmp
                          root[i][j] = k
                          print(e)
                          print(w)
          return e[1][n], root
      
      
      def print_sol(root, i, j):
          if i > j:
              print('d{}'.format(j))
              return
          k = root[i][j]
          print("k{}".format(k))
          print("Left:")
          print_sol(root, i, k - 1)
          print("Right:")
          print_sol(root, k + 1, j)
      
      
      p = [0.15, 0.1, 0.05, 0.1, 0.2]
      q = [0.05, 0.1, 0.05, 0.05, 0.05, 0.1]
      n = len(p)
      e_opt, root = optimal_bst(p, q, n)
      print(e_opt)
      print_sol(root, 1, n)
      
      class="example"> [[0, 0, 0, 0, 0, 0], [0.05, 0.45000000000000007, 0.9, 1.25, 1.75, 2.75], [0, 0.1, 0.4, 0.7, 1.2, 2.0], [0, 0, 0.05, 0.25, 0.6, 1.2999999999999998], [0, 0, 0, 0.05, 0.30000000000000004, 0.9], [0, 0, 0, 0, 0.05, 0.5], [0, 0, 0, 0, 0, 0.1]] [[0, 0, 0, 0, 0, 0], [0.05, 0.30000000000000004, 0.45, 0.55, 0.7000000000000001, 1.0000000000000002], [0, 0.1, 0.25, 0.35, 0.49999999999999994, 0.7999999999999999], [0, 0, 0.05, 0.15000000000000002, 0.3, 0.6], [0, 0, 0, 0.05, 0.2, 0.5], [0, 0, 0, 0, 0.05, 0.35], [0, 0, 0, 0, 0, 0.1]] 2.75 k2 Left: k1 Left: d0 Right: d1 Right: k5 Left: k4 Left: k3 Left: d2 Right: d3 Right: d4 Right: d5

2.21.3 0/1 Knapsack

2.21.4 Edit Distance

2.22 Greedy Algorithms

2.22.1 Interval-graph coloring problem

2.23 Union Find

Union find with rank and path compression. The time complexity of union find with both rank and path compression is O(m α(n)), where m is the number of make-set, find, union operations, n is the number of make-set operations (only happens at the beginning, usually means n nodes). α(n) is a very slowly growing function, α(n) < 4 in most of the practical cases. Hence, the time complexity is almost linear to the number of operations. Each operation nearly takes constant time.

class UnionFind:
    def __init__(self, size):
        self.parent = list(range(size))
        self.rank = [0 for i in range(size)]

    def find(self, x):
        while x != self.parent[x]:
            # path compression, connect the grandchild node with its grandparent every time executing find
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        print('Before Union')
        print('Parent: {}'.format(self.parent))
        print('Rank: {}'.format(self.rank))
        xroot = self.find(x)
        yroot = self.find(y)

        if self.rank[xroot] > self.rank[yroot]:
            self.parent[y] = xroot
        elif self.rank[x] < self.rank[y]:
            self.parent[x] = yroot
        else:
            # rank only increases when the two sets to join have the same rank
            self.parent[y] = xroot
            self.rank[xroot] += 1

        print('After Union')
        print('Parent: {}'.format(self.parent))
        print('Rank: {}'.format(self.rank))


uf = UnionFind(10)
uf.union(3, 4)
uf.union(9, 4)
uf.union(8, 0)
uf.union(2, 3)
print(uf.find(3))
print(uf.find(2))
print(uf.find(9))
uf.union(5, 6)
uf.union(5, 9)
uf.union(7, 3)
uf.union(4, 8)
uf.union(6, 1)
class="example"> Before Union Parent: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] Rank: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] After Union Parent: [0, 1, 2, 3, 3, 5, 6, 7, 8, 9] Rank: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0] Before Union Parent: [0, 1, 2, 3, 3, 5, 6, 7, 8, 9] Rank: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0] After Union Parent: [0, 1, 2, 3, 9, 5, 6, 7, 8, 9] Rank: [0, 0, 0, 1, 0, 0, 0, 0, 0, 1] Before Union Parent: [0, 1, 2, 3, 9, 5, 6, 7, 8, 9] Rank: [0, 0, 0, 1, 0, 0, 0, 0, 0, 1] After Union Parent: [8, 1, 2, 3, 9, 5, 6, 7, 8, 9] Rank: [0, 0, 0, 1, 0, 0, 0, 0, 1, 1] Before Union Parent: [8, 1, 2, 3, 9, 5, 6, 7, 8, 9] Rank: [0, 0, 0, 1, 0, 0, 0, 0, 1, 1] After Union Parent: [8, 1, 3, 3, 9, 5, 6, 7, 8, 9] Rank: [0, 0, 0, 1, 0, 0, 0, 0, 1, 1] 3 3 9 Before Union Parent: [8, 1, 3, 3, 9, 5, 6, 7, 8, 9] Rank: [0, 0, 0, 1, 0, 0, 0, 0, 1, 1] After Union Parent: [8, 1, 3, 3, 9, 5, 5, 7, 8, 9] Rank: [0, 0, 0, 1, 0, 1, 0, 0, 1, 1] Before Union Parent: [8, 1, 3, 3, 9, 5, 5, 7, 8, 9] Rank: [0, 0, 0, 1, 0, 1, 0, 0, 1, 1] After Union Parent: [8, 1, 3, 3, 9, 5, 5, 7, 8, 5] Rank: [0, 0, 0, 1, 0, 2, 0, 0, 1, 1] Before Union Parent: [8, 1, 3, 3, 9, 5, 5, 7, 8, 5] Rank: [0, 0, 0, 1, 0, 2, 0, 0, 1, 1] After Union Parent: [8, 1, 3, 3, 9, 5, 5, 3, 8, 5] Rank: [0, 0, 0, 1, 0, 2, 0, 0, 1, 1] Before Union Parent: [8, 1, 3, 3, 9, 5, 5, 3, 8, 5] Rank: [0, 0, 0, 1, 0, 2, 0, 0, 1, 1] After Union Parent: [8, 1, 3, 3, 5, 5, 5, 3, 5, 5] Rank: [0, 0, 0, 1, 0, 2, 0, 0, 1, 1] Before Union Parent: [8, 1, 3, 3, 5, 5, 5, 3, 5, 5] Rank: [0, 0, 0, 1, 0, 2, 0, 0, 1, 1] After Union Parent: [8, 5, 3, 3, 5, 5, 5, 3, 5, 5] Rank: [0, 0, 0, 1, 0, 2, 0, 0, 1, 1]

2.24 Single-source shortest path

2.24.1 Bellman-Ford

2.25 Quick Select

Time complexity: worst case O(n2) if the pivot is always the biggest or smallest. best case O(n), if we can always decrease the array to search by half, then 1 + 1/2 + 1/4 + … = 2. average O(n).

def quickSelect(nums, k):
    left = 0
    right = len(nums) - 1
    while left < right:
        pivot_idx = partition(nums, left, right)
        if pivot_idx == k - 1:
            break
        elif pivot_idx < k - 1:
            left = pivot_idx + 1
        else:
            right = pivot_idx - 1
    return nums[:k]


def partition(nums, left, right):
    pivot = nums[right]
    i = left
    for j in range(left, right):
        if nums[j] < pivot:
            nums[i], nums[j] = nums[j], nums[i]
            i += 1

    nums[i], nums[right] = nums[right], nums[i]
    return i


nums = [5, 3, 9, 2, 7, 2]
print(quickSelect(nums, 3))

2.26 Graph Coloring Problem

The general graph coloring problem (find the minimum number of colors that the adjacent nodes do not share the same color) is NP-Complete. Greedy algorithm can only find an upper bound of the color numbers, but it is not guaranteed to be the optimal.

2.27 Tree in the context of graph

  • A tree is an undirected graph in which any two vertices are connected by exactly one path. Any connected graph without simple cycles is a tree.
  • Any connected graph who has n nodes with n - 1 edges is a tree.
  • Diameter of a graph: longest of the shortest paths of any two pairs in a graph.
  • Remoteness: the length of the longest path from the node to its leaves.
  • Center: the node that has the smallest remoteness. Or if treating the node as the root of the tree, the tree has the minimal height.
  • Theorem: There can be at most two centers for a tree.
  • Theorem: the center of the tree must be on the diameter of the tree.

2.28 Bellman-Ford Algorithm

  • Repeat |V| - 1 times, for each edge, relax the edge
  • The reason for |V| - 1 times: because any simple path from u to v has at most |V| - 1 edges. Otherwise, the path has cycle. Positive cycles will not contribute to shortest path.
  • After |V| - times, if we do the relaxation for all the edges one more time, we can detect negative cycles if certain nodes got relaxed again.

2.29 Dijkstra's Algorithm

  • Works for directed and undirected graphs.
  • Greedy, always find the node with the shortest distance so far and relax its edges.
  • Does not work with negative weights.

    from heapq import heappush, heappop
    from collections import deque
    
    
    def dijkstra(n, edges, start):
        graph = buildGraph(n, edges)
    
        dist = [float('inf') for _ in range(n)]
        dist[start] = 0
    
        parent = [-1 for _ in range(n)]
        parent[start] = start
    
        minHeap = []
        heappush(minHeap, (0, start))
    
        visited = set()
        while minHeap:
            d, node = heappop(minHeap)
            if node in visited:
                continue
    
            dist[node] = d
            visited.add(node)
    
            for nei, w in graph[node]:
                if nei not in visited:
                    if dist[nei] > dist[node] + w:
                        dist[nei] = dist[node] + w
                        parent[nei] = node
                        heappush(minHeap, (dist[nei], nei))
    
        print('Shortest distance from {}:'.format(start))
        print(dist)
    
        print('All the shortest paths starting from {}:'.format(start))
        for i in range(n):
            print("->".join(map(str, reconstruct_path(parent, i))))
    
    
    def reconstruct_path(parent, node):
        path = deque()
        while node != parent[node]:
            path.appendleft(node)
            node = parent[node]
            path.appendleft(node)
        return path
    
    
    def buildGraph(n, edges):
        graph = {i: set() for i in range(n)}
        for u, v, w in edges:
            graph[u].add((v, w))
        return graph
    
    
    dijkstra(5, [[0, 1, 10], [0, 2, 3], [2, 1, 2], [2, 3, 5], [1, 3, 1], [0, 4, 2], [3, 4, 1]], 0)
    
    class="example"> Shortest distance from 0: [0, 5, 3, 6, 2] All the shortest paths starting from 0: 0->2->2->1 0->2 0->2->2->1->1->3 0->4

2.31 Greatest Common Divisor

def gcd(a, b):
    if a < b:
        a, b = b, a

    # now a is always larger than b
    while b != 0:
        tmp = b
        b = a % b
        a = tmp

    return a


print(gcd(100, 30))
print(gcd(100, 11))
class="example"> 10 1

2.32 Find prime numbers less than n

2.32.1 Method 1 - brute force

def printPrime(n):
    res = []
    for i in range(2, n):
        if isPrime(i):
            res.append(i)
    return res

def isPrime(x):
    for i in range(2, x):
        if x % i == 0:
            return False
    return True


print(printPrime(20))
class="example"> [2, 3, 5, 7, 11, 13, 17, 19]

2.32.2 Method 2 - better brute force

def printPrime(n):
    res = []
    for i in range(2, n):
        if isPrime(i):
            res.append(i)
    return res


def isPrime(x):
    for i in range(2, x):
        if i * i > x:
            break
        if x % i == 0:
            return False
    return True


print(printPrime(20))
class="example"> [2, 3, 5, 7, 11, 13, 17, 19]

2.32.3 Method 3 - Sieve of Eratothenes

Key idea: mark multiples as not a prime.

def printPrime(n):
    if n <= 2:
        return []

    res = [2]

    isPrime = [True for i in range(n)]
    for i in range(4, n):
        if i % 2 == 0:
            isPrime[i] = False

    for i in range(3, n, 2):
        if isPrime[i]:
            x = i * i
            while x < n:
                isPrime[x] = False
                x += i

    for i in range(3, n):
        if isPrime[i]:
            res.append(i)
    return res


print(printPrime(100))
print(printPrime(2))
print(printPrime(3))
print(printPrime(10))
class="example"> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97] [] [2] [2, 3, 5, 7]

2.33 Bit Manipulation

2.33.1 XOR

  • Commutative law: a ^ b = b ^ a
  • Associative law: (a ^ b) ^ c = a ^ (b ^ c)
  • a ^ b ^ b = a
  • a ^ 0 = a
  • a ^ a = 0
  • Flip the n-th bit: mask = 1 << i, flippedx = x ^ mask
  • Swap two values without using extra space
a = a ^ b
b = a ^ b
a = a ^ b

2.33.2 int

a int type (signed int) is 32 bit (4 bytes), representing integers from [-2**31, 2**31 - 1].

2.34 Knuth Shuffling (Fisher-Yates shuffle)

import random

def knuthShuffle(arr):
    for i in range(len(arr) - 1, 0, -1):
        j = random.randint(0, i)
        arr[i], arr[j] = arr[j], arr[i]

arr = list(range(10))
knuthShuffle(arr)
print(arr)
class="example"> [1, 8, 9, 2, 7, 0, 3, 5, 4, 6]

2.35 Reservoir Sampling

Sample k numbers from n numbers. Consider the number array is very large that it cannot fit in memory.

  • Can we sample the the number in a streaming fashion?
  • The answer is reservoir sampling!
    • If current selected number is less than k, then keep appending the new number into the selected number array.
    • If current selected number is equal to k, when the i th number arrives, keep the new number with probablity of k / i and discard one of the old numbers equally randomly (otherwise, just keep the old ones).
    • In the end, each number is kept in the selected set with probablity of k / n.

      import random
      
      def reservoirSampling(nums, k):
          """
          Randomly draw k samples from a very large array nums.
      
          Assuming nums is a very large array that cannot fit in memory, we can only
          sample nums in a streaming fashion.
          """
          reservoir = []
          for i, x in enumerate(nums):
              if len(reservoir) < k:
                  reservoir.append(x)
              else:
                  rn = random.randint(0, i - 1)
                  if rn < k:
                      reservoir[rn] = x
          return reservoir
      
      
      nums = list(range(1, 10001))
      print(reservoirSampling(nums, 5))
      nums = list(range(1, 7))
      print(reservoirSampling(nums, 5))
      
      class="example"> [5974, 4744, 4354, 2221, 6505] [1, 2, 6, 4, 5]

2.36 Random pick number with weight

  • Compute the accumulated sum.
  • Generate a random number in range of [0, accumsum[-1]).
  • Perform a binary search to see the random number sits in which interval (defined by the accumulated sum array).

2.37 Monotonic Queue/Stack

https://leetcode.com/problems/shortest-subarray-with-sum-at-least-k/discuss/204290/Monotonic-Queue-Summary https://leetcode.com/problems/daily-temperatures/ https://leetcode.com/problems/sliding-window-maximum/

A queue where the elements from front to end are always strictly increasing (or nondecreasing) or decreasing (or nonincreasing).

When a new value arrives, it kicks the values that are smaller than it in front of it for an increasing queue.

It enables finding the maximum in a queue in O(1) time.

import collections


class MaxQueue(object):
    def __init__(self):
        self.queue = collections.deque()
        self.max_q = collections.deque()

    def append(self, val):
        self.queue.append(val)
        while self.max_q and self.max_q[-1] < val:
            self.max_q.pop()
            self.max_q.append(val)

    def popleft(self):
        val = self.queue.popleft()
        if self.max_q and self.max_q[0] == val:
            self.max_q.popleft()
        return val

    def getMax(self):
        return self.max_q[0] if self.max_q else None

2.38 Producer Consumer Pattern

2.39 Dutch national flag problem (three-way-partition problem)

Like quick sort partition, but has one more pointer to partition the array into three chunks.

  • This algorithm can be used to improve quick sort: partition the array into < pivot, = pivot and > pivot 3 parts. This can be useful if there is lots of duplicates in the array.

2.40 Hamiltonian path and hamiltonian cycle

  • A Hamiltonian path is a path that visits each vertex of the graph exactly once.
  • A Hamiltonian cycle is a cycle that visits each vertext exactly once.

Backtrack

2.41 Least Recently Used (LRU) cache

class LRUCache(object):
    def __init__(self, capacity):
        self.capacity = capacity
        self.key_map = {}
        self.dlist = DoublyLinkedList()

    def put(self, key, value):
        if key not in self.key_map and len(self.key_map) == self.capacity:
            tail = self.dlist.pop()
            self.key_map.pop(tail.key)

        if key in self.key_map:
            self.dlist.remove(self.key_map[key])
        node = Node(key, value)
        self.key_map[key] = node
        self.dlist.appendleft(node)

    def get(self, key):
        if key not in self.key_map:
            return None
        node = self.key_map[key]
        self.dlist.remove(node)
        self.dlist.appendleft(node)
        # print(node)
        return node.val


class DoublyLinkedList(object):
    def __init__(self):
        self.sentinel = Node(0, 0)
        self.sentinel.next = self.sentinel
        self.sentinel.prev = self.sentinel

    def appendleft(self, node):
        head = self.sentinel.next
        self.sentinel.next = node
        node.prev = self.sentinel
        head.prev = node
        node.next = head

    def pop(self):
        tail = self.getTail()
        if not tail:
            raise IndexError('The list is empty!')
        self.remove(tail)
        return tail

    def remove(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def getHead(self):
        return self.sentinel.next if self.sentinel.next is not self.sentinel else None

    def getTail(self):
        return self.sentinel.prev if self.sentinel.prev is not self.sentinel else None


class Node(object):
    def __init__(self, key, val):
        self.key = key
        self.val = val
        self.next = None
        self.prev = None


cache = LRUCache(3)
cache.put('a', 1)
cache.put('b', 2)
cache.put('c', 3)
cache.put('c', 4)
cache.put('c', 5)
cache.put('d', 6)
print(cache.get('a'))
print(cache.get('b'))
print(cache.get('c'))
cache.put('e', 7)
cache.put('e', 8)
print(cache.get('b'))
class="example"> None 2 5 2

2.42 Interval problems

2.42.1 Count number of events happening concurrently at certain time point

from heapq import heappush, heappop


class EventCounter(object):
    def __init__(self, intervals):
        pq = []
        for start, end in intervals:
            heappush(pq, (start, 1))
            heappush(pq, (end, -1))

        self.count = [[-float('inf'), 0]]
        while pq:
            time, diff = heappop(pq)
            if time == self.count[-1][0]:
                self.count[-1][1] += diff
            else:
                self.count.append([time, self.count[-1][1] + diff])

    def query(self, time):
        print(self.count)
        left = 0
        right = len(self.count) - 1
        res = 0
        while left <= right:
            mid = left + (right - left) // 2
            if self.count[mid][0] <= time:
                res = self.count[mid][1]

            if self.count[mid][0] <= time:
                left = mid + 1
            else:
                right = mid - 1
                print(res)
        return res


ec = EventCounter([[2, 5], [3, 8], [4, 8], [5, 10]])
ec.query(3)
ec.query(5)
class="example"> [[-inf, 0], [2, 1], [3, 2], [4, 3], [5, 3], [8, 1], [10, 0]] 0 [[-inf, 0], [2, 1], [3, 2], [4, 3], [5, 3], [8, 1], [10, 0]] 3

2.42.2 Meeting room problem

2.42.3 Merge intervals

2.42.4 add/delete intervals

2.43 API Limiter

2.44 Minimax

Two players play a game. Player A wants to maximize the score, while player B wants to minimize the score.

  • Do a DFS (post order), player A and B take turns to maximize/minimize the score at each level.
  • Note: we might be able to the prune the tree if there is no need to explore deeper.

4 P, NP, NP-complete, NP-hard

  • P: the set of problems that can be solved in polynomial time.
  • NP (Nondeterministic polynomial time): the set of problems that can be verified (whether the provided solution is correct or not) in polynomial time given a potential solution.
  • NP-complete: a problem p in NP is an NP-complete problem if every other problems in NP can be reduced to p in polynomial time. That is any problems belong to both NP-hard and NP. A solution to an NP-complete problem can be verified "quickly", but there is no known way to find a solution quickly.
  • NP-hard: a decision problem p is NP-hard if every problem in NP can be reduced to p in polynomial time.

5 Catalan Number

\[C_0 = 1\] \[C_{n+1} = \sum_{i=0}{n} C_i C_{n - i}\] The formula indicates a recursion scheme.

Related to generate parentheses problem, generate full binary tree problem, monotonic path problem (number of "move right" must be larger than number of "move up" in the grid, so always traveling the grid under its diagonal) https://en.wikipedia.org/wiki/Catalan_number

6 ASCII

Pronunciation: ass-key :) Extended ASCII encodes 256 characters (0 - 255). Therefore, a char usually takes 1 byte (8 bits).

7 Convert int to byte string

def int_to_byte(num):
    # assuming int is 32 bit
    byte = [chr(num >> i * 8 & 0xff) for i in range(4)]
    byte.reverse()
    print(byte)
    byte_str = ''.join(byte)
    return ''.join(byte)

def byte_to_int(byte):
    res = 0
    for i, c in enumerate(byte):
        res += ord(c) << (3 - i) * 8
    return res

b = int_to_byte(12)
print(byte_to_int(b))
class="example"> ['\x00', '\x00', '\x00', '\x0c'] 12

8 Rotation Matrix

\[R = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}\]

Therefore, turn left 90 degree (anti-clockwise): \[R = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\]

\[R \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -y \\ x \end{bmatrix}\]

Turn right 90 degree (clockwise): \[R = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\]

\[R \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} y \\ -x \end{bmatrix}\]

Author: Jinyu Xie

Created: 2020-05-16 Sat 23:20

Validate

This post is licensed under CC BY 4.0 by the author.

Comments powered by Disqus.