Algorithm Notes

Posted May 16, 2020

By Jinyu Xie 72 min read

‎

1 Introduction

This is my journey into Algorithm and Data Structures. I implemented quite a bunch of classical algorithms based on the holy bible Introduction to Algorithms. End up getting offers from Amazon Alexa (Applied Scientist), Uber (Software Engineer), Google (Software Engineer), Microsoft (Software Engineer) with a Mechanical Engineering background.

2 Algorithms

2.1 Loop invariant

Initialization
Maintenance
Termination

2.2 Asymptotic notation

Θ(g(n)) asymptotic tight bound: there exists c₁, c₂, such that c₁*g(n) ≤ f(n) ≤ c₂*g(n).
O(g(n)) asymptotic upper bound: there exists c such that 0 ≤ f(n) ≤ c*g(n).
Ω(g(n)) asymptotic lower bound: there exists c such that c*g(n) ≤ f(n).

2.3 Master Theorem

Given the recurrence equation: T(n) = aT(n/b) + f(n)

If f(n) = O(n^{log_b a - ε}) for some constant ε, then T(n) = Θ(n^{log_b a}). That is if f(n) is polynomially faster than n^{log_b a} then the time will be dominated by n^{log_b a}.
If f(n) = Θ(n^{log_b a} * log^k n) for any k ≥ 0, then T(n) = Θ(n^{log_b a} log^{k + 1} n).
If f(n) = Ω(n^{log_b a + ε}) for some constant ε, and if af(n/b) ≤ cf(n) for some constant c, then T(n) = Θ(f(n)). That is f(n) is polynomially slower than n^{log_b a}, then the time will be dominated by f(n). Can be translated as
- Too many leaves. Number of leaf nodes outweighs the sum of the internal evaluation cost. The total running time is O(n^{log_b a}).
- Equal work per level. Each level cost n^{log_b a} time, and there are log_b n levels. Total time O(n^{log_b a} log n).
- Too expensive a root. The internal evaluation costs dominates. O(f(n)).

2.4 Sort

2.4.1 Insertion Sort

def insertion_sort(arr):
    if len(arr) == 1:
        print(arr)
        return arr

    for i in range(1, len(arr)):
            # Get the latest unsorted value
        key = arr[i]
        j = i - 1
        while arr[j] > key and j >= 0:
            # When the current value is larger than the unsorted value, shift
            # the current value to the right
            arr[j + 1] = arr[j]
            print(arr)
            j -= 1

        # The current value arr[j] is either smaller than the unsorted value,
        # or the current position is -1.
        arr[j + 1] = key
        print(arr)

    return arr



a = [2, 5, 3, 8, 7, 1]
insertion_sort(a)

insertion_sort([1])

insertion_sort([2, 2, 2, 1])

class="example"> [2, 5, 3, 8, 7, 1] [2, 5, 5, 8, 7, 1] [2, 3, 5, 8, 7, 1] [2, 3, 5, 8, 7, 1] [2, 3, 5, 8, 8, 1] [2, 3, 5, 7, 8, 1] [2, 3, 5, 7, 8, 8] [2, 3, 5, 7, 7, 8] [2, 3, 5, 5, 7, 8] [2, 3, 3, 5, 7, 8] [2, 2, 3, 5, 7, 8] [1, 2, 3, 5, 7, 8] [1] [2, 2, 2, 1] [2, 2, 2, 1] [2, 2, 2, 2] [2, 2, 2, 2] [2, 2, 2, 2] [1, 2, 2, 2]

2.4.2 Merge Sort

def mergeSort(A):
    return mergeSortHelper(A, 0, len(A) - 1)


def mergeSortHelper(A, left, right):
    if left == right:
        return A[left:right + 1]
    mid = left + (right - left) // 2
    A_left = mergeSortHelper(A, left, mid)
    A_right = mergeSortHelper(A, mid + 1, right)
    return mergeArray(A_left, A_right)


def mergeArray(A, B):
    i = 0
    j = 0
    merged = []
    while i < len(A) and j < len(B):
        if A[i] < B[j]:
            merged.append(A[i])
            i += 1
        else:
            merged.append(B[j])
            j += 1

    if i == len(A):
        merged.extend(B[j:])
    else:
        merged.extend(A[i:])
    return merged

A = [6, 7, 3, 2, 9, 6, 7]
print(mergeSort(A))

class="example"> [2, 3, 6, 6, 7, 7, 9]

2.4.3 Quick Sort

Simple Quick Sort

def quick_sort(A, left, right):
    if left < right:
        mid = partition(A, left, right)
        print('Pivot Index: {}'.format(mid))
        print(A)
        quick_sort(A, left, mid)
        quick_sort(A, mid + 1, right)


def partition(A, left, right):
    print('Partition {} to {}'.format(left, right))
    pivot = A[right]
    print('Pivot value: {}'.format(pivot))
    i = left
    for j in range(left, right):
        if A[j] < pivot:
            A[i], A[j] = A[j], A[i]
            i += 1
    A[i], A[right] = A[right], A[i]
    return i - 1


A = [2, 8, 7, 1, 3, 5, 6, 4]
quick_sort(A, 0, len(A) - 1)
print(A)

A = [1]
quick_sort(A, 0, len(A) - 1)
print(A)

class="example"> Partition 0 to 7 Pivot value: 4 Pivot Index: 2 [2, 1, 3, 4, 7, 5, 6, 8] Partition 0 to 2 Pivot value: 3 Pivot Index: 1 [2, 1, 3, 4, 7, 5, 6, 8] Partition 0 to 1 Pivot value: 1 Pivot Index: -1 [1, 2, 3, 4, 7, 5, 6, 8] Partition 0 to 1 Pivot value: 2 Pivot Index: 0 [1, 2, 3, 4, 7, 5, 6, 8] Partition 3 to 7 Pivot value: 8 Pivot Index: 6 [1, 2, 3, 4, 7, 5, 6, 8] Partition 3 to 6 Pivot value: 6 Pivot Index: 4 [1, 2, 3, 4, 5, 6, 7, 8] Partition 3 to 4 Pivot value: 5 Pivot Index: 3 [1, 2, 3, 4, 5, 6, 7, 8] Partition 5 to 6 Pivot value: 7 Pivot Index: 5 [1, 2, 3, 4, 5, 6, 7, 8] [1, 2, 3, 4, 5, 6, 7, 8] [1]

Randomized quick sort

import random


def quick_sort(A, left, right):
    if left < right:
        mid = random_partition(A, left, right)
        print('Pivot Index: {}'.format(mid))
        print(A)
        quick_sort(A, left, mid)
        quick_sort(A, mid + 1, right)


def random_partition(A, left, right):
    pivot_index = random.randint(left, right)
    A[pivot_index], A[right] = A[right], A[pivot_index]
    return partition(A, left, right)


def partition(A, left, right):
    print('Partition {} to {}'.format(left, right))
    pivot = A[right]
    print('Pivot value: {}'.format(pivot))
    i = left
    for j in range(left, right):
        if A[j] < pivot:
            A[i], A[j] = A[j], A[i]
            i += 1
            A[i], A[right] = A[right], A[i]
    return i - 1


A = [2, 8, 7, 1, 3, 5, 6, 4]
quick_sort(A, 0, len(A) - 1)
print(A)

A = [1]
quick_sort(A, 0, len(A) - 1)
print(A)

class="example"> Partition 0 to 7 Pivot value: 3 Pivot Index: 1 [2, 1, 8, 3, 4, 5, 6, 7] Partition 0 to 1 Pivot value: 1 Pivot Index: -1 [2, 1, 8, 3, 4, 5, 6, 7] Partition 0 to 1 Pivot value: 2 Pivot Index: 0 [1, 2, 8, 3, 4, 5, 6, 7] Partition 2 to 7 Pivot value: 8 Pivot Index: 5 [1, 2, 7, 4, 5, 6, 3, 8] Partition 2 to 5 Pivot value: 6 Pivot Index: 3 [1, 2, 4, 5, 7, 6, 3, 8] Partition 2 to 3 Pivot value: 5 Pivot Index: 2 [1, 2, 4, 5, 7, 6, 3, 8] Partition 4 to 5 Pivot value: 7 Pivot Index: 4 [1, 2, 4, 5, 6, 7, 3, 8] Partition 6 to 7 Pivot value: 8 Pivot Index: 6 [1, 2, 4, 5, 6, 7, 3, 8] [1, 2, 4, 5, 6, 7, 3, 8] [1]

2.4.4 Heap Sort

Heap

2.4.5 Counting Sort

Count the times of each element, and put the element to the place of the counting number.

NOTE: the lower bound of comparison sort is Θ(n log n), counting sort is not a comparison sort. It can sort the array in linear time.
The time complexity of counting sort is O(n + k), given that the elements in the array are from 0 to k.
It is stable, but not in place. It takes O(n + k) memory.

def counting_sort(A, B, k):
    """Counting sort
    Sort A and put the sorted results into B. The elements in A are assumed
    from 0 to k
    """
    # Array to save counts
    C = [0 for i in range(k + 1)]
    for a in A:
        C[a] += 1
        print('Counted array: {}'.format(C))

    # Accumulate the counts
    for i in range(1, len(C)):
        C[i] += C[i - 1]
        print('Accumulated counts: {}'.format(C))

    # The reason to start from the end is to maintain the stability of the
    # sort, because the count number starts from the largest position
    for i in range(len(A) - 1, -1, -1):
        B[C[A[i]] - 1] = A[i]
        C[A[i]] -= 1
        print('Current sorted array B: {}'.format(B))
        print('Current counting array C: {}'.format(C))


A = [2, 5, 3, 0, 2, 3, 0, 3]
B = [-1 for i in range(8)]
counting_sort(A, B, 5)
print('Sorted array B: {}'.format(B))

class="example"> Counted array: [0, 0, 1, 0, 0, 0] Counted array: [0, 0, 1, 0, 0, 1] Counted array: [0, 0, 1, 1, 0, 1] Counted array: [1, 0, 1, 1, 0, 1] Counted array: [1, 0, 2, 1, 0, 1] Counted array: [1, 0, 2, 2, 0, 1] Counted array: [2, 0, 2, 2, 0, 1] Counted array: [2, 0, 2, 3, 0, 1] Accumulated counts: [2, 2, 2, 3, 0, 1] Accumulated counts: [2, 2, 4, 3, 0, 1] Accumulated counts: [2, 2, 4, 7, 0, 1] Accumulated counts: [2, 2, 4, 7, 7, 1] Accumulated counts: [2, 2, 4, 7, 7, 8] Current sorted array B: [-1, -1, -1, -1, -1, -1, 3, -1] Current counting array C: [2, 2, 4, 6, 7, 8] Current sorted array B: [-1, 0, -1, -1, -1, -1, 3, -1] Current counting array C: [1, 2, 4, 6, 7, 8] Current sorted array B: [-1, 0, -1, -1, -1, 3, 3, -1] Current counting array C: [1, 2, 4, 5, 7, 8] Current sorted array B: [-1, 0, -1, 2, -1, 3, 3, -1] Current counting array C: [1, 2, 3, 5, 7, 8] Current sorted array B: [0, 0, -1, 2, -1, 3, 3, -1] Current counting array C: [0, 2, 3, 5, 7, 8] Current sorted array B: [0, 0, -1, 2, 3, 3, 3, -1] Current counting array C: [0, 2, 3, 4, 7, 8] Current sorted array B: [0, 0, -1, 2, 3, 3, 3, 5] Current counting array C: [0, 2, 3, 4, 7, 7] Current sorted array B: [0, 0, 2, 2, 3, 3, 3, 5] Current counting array C: [0, 2, 2, 4, 7, 7] Sorted array B: [0, 0, 2, 2, 3, 3, 3, 5]

2.4.6 TODO Radix sort

Split the number into multiple digit, and perform STABLE sort for each each digit from smaller digit to larger digit.
Pseudo code: for i = 1 to d: perform stable sort to sort array A on digit i
If using counting sort as the underlying stable sorting algorithm, the time complexity is O(d(n + k))

def radix_sort(A, d):
    # TODO
    pass

2.4.7 Bucket Sort

2.4.8 Patience Sort

Very efficient algorithm to solve Longest Increasing Subsequence Problem. https://leetcode.com/problems/longest-increasing-subsequence/ https://www.cs.princeton.edu/courses/archive/spring13/cos423/lectures/LongestIncreasingSubsequence.pdf https://en.wikipedia.org/wiki/Patience_sorting

2.5 Maximum-subarray problem

Find the subarray that has the maximum sum.

def find_max_subarray(A):
    low, high, max_sum = helper_find_max_subarray(A, 0, len(A) - 1)
    return low, high, max_sum


def helper_find_max_subarray(A, low, high):
    if low == high:
        return low, high, A[low]

    mid = (low + high) // 2

    left_low, left_high, left_sum = helper_find_max_subarray(A, low, mid)
    right_low, right_high, right_sum = helper_find_max_subarray(A, mid + 1, high)
    cross_low, cross_high, cross_sum = helper_find_cross_max_subarray(A, low, mid, high)
    if left_sum >= right_sum and left_sum >= cross_sum:
        return left_low, left_high, left_sum
    elif right_sum >= left_sum and right_sum >= cross_sum:
        return right_low, right_high, right_sum
    else:
        return cross_low, cross_high, cross_sum


def helper_find_cross_max_subarray(A, low, mid, high):
    left_sum = A[mid]
    left_low = mid
    tmp_sum = A[mid]
    i = mid - 1
    while i >= low:
        tmp_sum += A[i]
        if tmp_sum > left_sum:
            left_sum = tmp_sum
            left_low = i
        i -= 1

    right_sum = A[mid + 1]
    right_high = mid + 1
    tmp_sum = A[mid + 1]
    i = mid + 1
    while i <= high:
        tmp_sum += A[i]
        if tmp_sum > right_sum:
            right_sum = tmp_sum
            right_high = i
        i += 1

    return left_low, right_high, left_sum + right_sum

A = [13, -3, -25, 20, -3, -16, -23, 18, 20, -7, 12, -5, -22, 15, -4, 7]
low, high, max_sum = find_max_subarray(A)
print(low)
print(high)
print(max_sum)


A = [13, -3]
low, high, max_sum = find_max_subarray(A)
print(low)
print(high)
print(max_sum)

class="example"> 7 10 63 0 0 13

2.6 Stack

class Stack:
    def __init__(self, size):
        self.size = size
        self._array = [0 for i in range(self.size)]
        self.top = -1

    def isempty(self):
        if self.top < 0:
            return True
        else:
            return False

    def push(self, x):
        if self.top == self.size - 1:
            raise Exception('Stack overflow!')
        self.top += 1
        self._array[self.top] = x

    def pop(self):
        if self.top < 0:
            raise Exception('Stack underflow!')
        self.top -= 1
        return self._array[self.top + 1]


s = Stack(3)
s.push(3)
s.push(9)
s.push(4)
print(s.pop())
print(s.pop())
s.push(5)
print(s.pop())
print(s.pop())

class="example"> 4 9 5 3

2.7 Queue

NOTE: use a array of n + 1 length to represent a queue of size n.

Tail is the pointer pointing to the position to enqueue new element.

class Queue:
    def __init__(self, size):
        """
        Use a (n + 1) length array to implement a queue of size n
        """
        self.size = size
        self.head = 0
        self.tail = 0
        self._array = [-1 for i in range(self.size + 1)]

    def enqueue(self, x):
        if self.head == self.tail + 1 or (self.head == 0
                                          and self.tail == self.size):
            raise Exception("Queue overflow!")

        self._array[self.tail] = x
        if self.tail == self.size:
            self.tail = 0
        else:
            self.tail += 1
            print("Head {}".format(self.head))
            print("Tail {}".format(self.tail))
            print(self._array)

    def dequeue(self):
        if self.head == self.tail:
            raise Exception("Queue underflow!")

        val = self._array[self.head]
        if self.head == self.size:
            self.head = 0
        else:
            self.head += 1
            print(self._array)
            print("Head {}".format(self.head))
            print("Tail {}".format(self.tail))
        return val


q = Queue(5)
q.enqueue(1)
q.enqueue(2)
q.enqueue(3)
q.enqueue(4)
q.enqueue(5)
print(q.dequeue())
print(q.dequeue())
print(q.dequeue())
print(q.dequeue())
print(q.dequeue())
q.enqueue(9)
print(q.dequeue())
# print(q.dequeue())

class="example"> Head 0 Tail 1 [1, -1, -1, -1, -1, -1] Head 0 Tail 2 [1, 2, -1, -1, -1, -1] Head 0 Tail 3 [1, 2, 3, -1, -1, -1] Head 0 Tail 4 [1, 2, 3, 4, -1, -1] Head 0 Tail 5 [1, 2, 3, 4, 5, -1] [1, 2, 3, 4, 5, -1] Head 1 Tail 5 1 [1, 2, 3, 4, 5, -1] Head 2 Tail 5 2 [1, 2, 3, 4, 5, -1] Head 3 Tail 5 3 [1, 2, 3, 4, 5, -1] Head 4 Tail 5 4 [1, 2, 3, 4, 5, -1] Head 5 Tail 5 5 9

2.8 Binary Search

py file

This is a less cleaner version, but its advantage is that left pointer and right pointer finally converges to the same position. Sometimes this is handy when you trying to locate certain position in the end of the loop.

def binarySearch(nums, target):
    if not nums:
        return -1

    left = 0
    right = len(nums) - 1

    while left < right:
        mid = left + (right - left) // 2
        if nums[mid] == target:
            return mid
        elif nums[mid] < target:
            # mid could be equal to left, under this situation, left should be
            # mid + 1 to force the search on the right hand side. To test this,
            # use a test case of [0, 1], search 1
            left = mid + 1
        else:
            right = mid

    return left if nums[left] == target else -1

nums = [1, 4, 6, 9]
target = -1
print(binarySearch(nums, target))

nums = [1, 4, 6, 9]
target = 4
print(binarySearch(nums, target))

nums = [0, 1]
target = 1
print(binarySearch(nums, target))

class="example"> -1 1 1

NOTE: This version of binary search terminates when r == l - 1, which might not be suitable for certain situation when we are trying to locate certain value in the end of the loop.

def binarySearch1(nums, target):
    l = 0
    r = len(nums) - 1
    while l <= r:
        mid = l + (r - l) // 2
        if nums[mid] == target:
            return mid
        elif nums[mid] < target:
            l = mid + 1
        else:
            r = mid - 1
    return -1

nums = [1, 4, 6, 9]
target = 6
print(binarySearch1(nums, target))

nums = [0]
target = 6
print(binarySearch1(nums, target))

nums = [0]
target = 6
print(binarySearch1(nums, target))

class="example"> None

2.9 Linked List

Doubly linked list: O(n) time for search O(1) time for insert and delete
- Singly linked list

2.9.1 Singly linked list

Search takes O(n) time
Insert take O(1) time

Delete takes O(n) time

class Node:
    def __init__(self, key):
        self.key = key
        self.next = None

    def __str__(self):
        return 'Node({})'.format(self.key)


class SinglyLinkedList:
    def __init__(self, head):
        self.head = head

    @classmethod
    def fromList(cls, arr):
        head = Node(arr[0])
        curr = head
        for k in arr[1:]:
            curr.next = Node(k)
            curr = curr.next
        return cls(head)

    def search(self, k):
        curr = self.head
        while curr and curr.key != k:
            curr = curr.next
        return curr

    def insert(self, node):
        node.next = self.head
        self.head = node

    def delete(self, k):
        prev = None
        curr = self.head
        while curr and curr.key != k:
            prev = curr
            curr = curr.next

        if not curr:
            raise Exception('The key {} does not exist in the list.'.format(k))

        if prev:
            prev.next = curr.next
        else:
            self.head = curr.next
            curr.next = None

    def __str__(self):
        s = ""
        curr = self.head
        while curr:
            if s != "":
                s += " -> "
            s += str(curr.key)
            curr = curr.next
        return s


lst = SinglyLinkedList.fromList([1, 2, 3, 4, 5])
print(lst)
print(lst.search(2))
lst.delete(3)
print(lst)
lst.insert(Node(9))
print(lst)
print(lst.search(3))
lst.delete(1)
print(lst)

class="example"> 1 -> 2 -> 3 -> 4 -> 5 Node(2) 1 -> 2 -> 4 -> 5 9 -> 1 -> 2 -> 4 -> 5 None 9 -> 2 -> 4 -> 5

2.9.2 Doubly linked list

class Node:
    def __init__(self, key):
        self.key = key
        self.next = None
        self.prev = None

    def __str__(self):
        return 'Node({})'.format(self.key)


class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def insert(self, node):
        node.next = self.head
        node.prev = None
        if self.head:
            self.head.prev = node

        if not self.tail:
            self.tail = node

        self.head = node

    def search(self, key):
        curr = self.head
        while curr and curr.key != key:
            curr = curr.next
        return curr

    def delete(self, node):
        if node.prev:
            node.prev.next = node.next
        else:
            self.head = node.next

        if node.next:
            node.next.prev = node.prev
        else:
            self.tail = node.prev

    def __str__(self):
        s = ""
        curr = self.head
        while curr:
            if s != "":
                s += " <-> "
            s += str(curr.key)
            curr = curr.next
        return s


lst = DoublyLinkedList()
lst.insert(Node(1))
lst.insert(Node(2))
lst.insert(Node(3))
lst.insert(Node(4))
print(lst)

lst.delete(lst.search(1))
print(lst)

lst.delete(lst.search(4))
print(lst.tail)
print(lst.head)
print(lst)

class="example"> 4 <-> 3 <-> 2 <-> 1 4 <-> 3 <-> 2 Node(2) Node(3) 3 <-> 2

2.9.3 Reverse linked list

class Node:
    def __init__(self, val):
        self.val = val
        self.next = None


def print_list(head):
    cur = head
    while cur:
        print(cur.val)
        cur = cur.next


def reverse_linked_list(head):
    cur = head
    prev = None
    while cur:
        next = cur.next
        cur.next = prev
        prev = cur
        cur = next
    return prev


head = Node(1)
head.next = Node(2)
head.next.next = Node(3)
head.next.next.next = Node(4)

print("Before revesing:")
print_list(head)

rhead = reverse_linked_list(head)
print("After revesing:")
print_list(rhead)

print("Edge Case when head is None")
print_list(reverse_linked_list(None))

print("Edge Case when head is orphan")
print_list(reverse_linked_list(Node(1)))

class="example"> Before revesing: 1 2 3 4 After revesing: 4 3 2 1 Edge Case when head is None Edge Case when head is orphan 1

2.10 Hash table

Universe of keys is large. The set of keys actually stored may be small relative to the universe of keys. Consider we maintain a table with m slots, and we have a hash function h: U -> {0, 1, …, m - 1}. Given a key k, we locate T[h[k]]. If there is key collision, which means h[k1] == h[k2]. Then we chain the corresponding values as a linked list.

If a hash table uses m slots to store n values, we define the load factor α = n/m. Then the average search time is O(1 + α) assuming a uniform hashing function.

It is often desired to have keys that are relatively close to map to hash values that are far apart (especially for linear probing).

class Node:
    def __init__(self, key):
        self.key = key
        self.next = None
        self.prev = None

    def __str__(self):
        return 'Node({})'.format(self.key)


class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def insert(self, node):
        node.next = self.head
        node.prev = None
        if self.head:
            self.head.prev = node

        if not self.tail:
            self.tail = node

        self.head = node

    def search(self, key):
        curr = self.head
        while curr and curr.key != key:
            curr = curr.next
        return curr

    def delete(self, node):
        if node.prev:
            node.prev.next = node.next
        else:
            self.head = node.next

        if node.next:
            node.next.prev = node.prev
        else:
            self.tail = node.prev

    def __str__(self):
        s = ""
        curr = self.head
        while curr:
            if s != "":
                s += " <-> "
            s += str(curr.key)
            curr = curr.next
        return s


def divide_hash_fun(val, m):
    return val % m


class HashMap:
    def __init__(self, size, hash_fun=divide_hash_fun):
        self.size = size
        self.hash_fun = hash_fun
        self._slots = [DoublyLinkedList() for i in range(self.size)]

    def insert(self, node):
        self._slots[self.hash_fun(node.key, self.size)].insert(node)

    def search(self, key):
        return self._slots[self.hash_fun(key, self.size)].search(key)

    def delete(self, node):
        self._slots[self.hash_fun(node.key, self.size)].delete(node)


map = HashMap(7)
map.insert(Node(199))
map.insert(Node(10))
map.insert(Node(30))
map.insert(Node(50))
print(map.search(10))
map.delete(map.search(10))
print(map.search(10))

class="example"> Node(10) None

Universal hashing: select hash function randomly in a way that is independent of the keys.
Open addressing: all elements occupy the hash table itself. No linked list. When searching for an element, we systematically examine table slots. The load factor α = n/m is always less than 1.
- Linear probing
- Quadratic probing
- Double Hashing
Perfect hashing: O(1) search time, 2 levels of hash table. Similar to chained hash table, instead of using linked list, we use a small secondary hash table.
- We can guarantee there is no collision at the secondary level.

2.11 TODO Segment Tree

Range Sum Query Problem
- Given an array [-2, 0, 3, -5, 2, -1], support two operations, update and querySumInRange.
  - Naive solution: directly update the i th value. Sum up values one by one when doing the query. Update takes O(1) time, query takes O(n) time.
    - This might be good enough when update happens a lot, but query happens less frequently.
  - Precompute the cumulated sum, then query just need to do cumSum[j] - cumSum[i - 1]. Update takes O(n) time, query takes O(1) time.
    - This is good, if update happens much less frequently than query.
  - Segment Tree solution. Update and query both take O(logn) time.

2.12 Types of binary tree

Full binary tree, every node has 0 or 2 children
Complete binary tree, the tree is completely filled except the last right most node. Think about binary heap.
Perfect binary tree. All the interior nodes have two children, and all leaves have the same depth.

2.13 BFS

py file BFS guarantees the SHORTEST path from the source node to target node. NOTE: use the color to differentiate nodes!!!! Same as DFS. Time complexity O(V + E), given a graph G(V, E), with V vertices, and E edges.

from collections import deque


class Vertex:
    def __init__(self, val, color='white', distance=-1, parent=None):
        self.val = val
        self.color = color
        self.dist = distance
        self.parent = parent

    def __str__(self):
        return "Val: {}, Color: {}, Distance: {}".format(
            self.val, self.color, self.dist)


class Graph:
    def __init__(self, adj_list):
        self.adj_list = adj_list
        self._reset_vertices()

    def _reset_vertices(self):
        self.vertices = {n: Vertex(n) for n in self.adj_list}

    def bfs(self, s):
        self._reset_vertices()
        v_s = self.vertices[s]
        v_s.color = 'grey'
        v_s.dist = 0

        queue = deque()
        queue.append(v_s)
        while queue:
            cur = queue.popleft()
            cur.color = 'black'
            print(cur)
            for n in self.adj_list[cur.val]:
                v = self.vertices[n]
                if v.color == 'white':
                    queue.append(v)
                    v.color = 'grey'
                    v.dist = cur.dist + 1
                    v.parent = cur


# Undirected Graph
adj_list = {1: [2, 5],
            2: [1, 3, 4],
            3: [2, 4],
            4: [2, 3, 5],
            5: [4, 1]}
g = Graph(adj_list)
g.bfs(1)

print('-'*50)
# Directed cyclic Graph
adj_list = {1: [2],
            2: [4],
            3: [2],
            4: [3, 5],
            5: [1]}
g = Graph(adj_list)
g.bfs(1)

print('-'*50)
# Directed acyclic Graph
adj_list = {1: [2, 5],
            2: [3, 4],
            3: [],
            4: [3, 5],
            5: []}
g = Graph(adj_list)
g.bfs(1)

print('-'*50)
# Directed acyclic Graph
adj_list = {1: [2, 5],
            2: [3, 4],
            3: [],
            4: [3, 5],
            5: []}
g = Graph(adj_list)
g.bfs(4)

class="example"> Val: 1, Color: black, Distance: 0 Val: 2, Color: black, Distance: 1 Val: 5, Color: black, Distance: 1 Val: 3, Color: black, Distance: 2 Val: 4, Color: black, Distance: 2 -------------------------------------------------- Val: 1, Color: black, Distance: 0 Val: 2, Color: black, Distance: 1 Val: 4, Color: black, Distance: 2 Val: 3, Color: black, Distance: 3 Val: 5, Color: black, Distance: 3 -------------------------------------------------- Val: 1, Color: black, Distance: 0 Val: 2, Color: black, Distance: 1 Val: 5, Color: black, Distance: 1 Val: 3, Color: black, Distance: 2 Val: 4, Color: black, Distance: 2 -------------------------------------------------- Val: 4, Color: black, Distance: 0 Val: 3, Color: black, Distance: 1 Val: 5, Color: black, Distance: 1

2.14 DFS

py file Parenthesis theorem: Given two vertices in the graph u and v, u.d denotes the discover time of u, u.f denotes the finish time of u.

If the interval [u.d, u.f] and [v.d, v.f] are entirely disjoint, and neither u nor v is a descendant of the other in the depth-first forest.
If the interval [u.d, u.f] is contained in [v.d, v.f], the u is a descendant of v.

If the interval [v.d, v.f] is contained in [u.d, u.f], the v is a descendant of v.

Classification of edges:

Tree edges, edges in the depth first forest. Edge (u, v) is a tree edge if v was first discovered by exploring (u, v).
Back edges, edges (u, v) connecting a vertex u with its ancestor v in a depth-first tree. Including self loops in the directed graph, where a vertex is connecting with itself.
Forward edges, nontree edges (u, v) connecting a vertex u to a descendant v in a depth-first tree.

Cross edges, all the other edges.

DFS has enough information to classify some of the edges it encounters. When we first explore an edge (u, v), the color of v tells us something about the edge

WHITE indicates a tree edge
GRAY indicates a back edge. NOTE: this is critical to determine whether the graph is cyclic or not.
BLACK indicates a forward or cross edge.

For undirected graph, every edge is either a tree edge or a back edge.

from datetime import datetime


time = 0  # global timer


class Vertex:
    def __init__(self, val, color='white', parent=None):
        self.val = val
        self.color = color
        self.parent = parent

    def __str__(self):
        return "(Val: {}, Color: {})".format(self.val, self.color)


class Graph:
    def __init__(self):
        self.adj = {}
        self.nodes = {}

    def addNode(self, val):
        node = Vertex(val)
        self.nodes[val] = node
        self.adj[node] = []

    def addMultiNodes(self, val_list):
        for v in val_list:
            self.addNode(v)

    def addEdge(self, u, v):
        u_node = self.nodes[u]
        v_node = self.nodes[v]
        self.adj[u_node].append(v_node)

    def addMultiEdges(self, edge_list):
        for e in edge_list:
            self.addEdge(e[0], e[1])

    def dfs(self):
        global time
        time = 0
        for u in self.nodes.values():
            if u.color == 'white':
                self.dfs_visit(u)

    def dfs_visit(self, u):
        global time
        time += 1
        u.color = 'gray'
        print('At time {}, Discovered node: {}'.format(time, u))
        for v in self.adj[u]:
            if v.color == 'white':
                self.dfs_visit(v)
                u.color = 'black'
                time += 1
                print('At time {}, Finish visiting node: {}'.format(time, u))


g = Graph()
g.addMultiNodes(['s', 'z', 'y', 'x', 'w', 't', 'v', 'u'])
g.addMultiEdges([['s', 'z'],
                 ['z', 'y'],
                 ['y', 'x'],
                 ['z', 'w'],
                 ['w', 'x'],
                 ['x', 'z'],
                 ['s', 'w'],
                 ['v', 's'],
                 ['v', 'w'],
                 ['t', 'v'],
                 ['t', 'u'],
                 ['u', 'v']])
g.dfs()

class="example"> At time 1, Discovered node: (Val: s, Color: gray) At time 2, Discovered node: (Val: z, Color: gray) At time 3, Discovered node: (Val: y, Color: gray) At time 4, Discovered node: (Val: x, Color: gray) At time 5, Finish visiting node: (Val: y, Color: black) At time 6, Finish visiting node: (Val: z, Color: black) At time 7, Discovered node: (Val: w, Color: gray) At time 8, Finish visiting node: (Val: z, Color: black) At time 9, Finish visiting node: (Val: s, Color: black) At time 10, Discovered node: (Val: u, Color: gray) At time 11, Discovered node: (Val: v, Color: gray) At time 12, Finish visiting node: (Val: u, Color: black) At time 13, Discovered node: (Val: t, Color: gray)

The results shows the parenthesis theorem clearly: (s (z (y (x x) y) (w w) z) s) (t (v v) (u u) t).

2.15 Topological Sort

Topological sort can only be performed in a directed acyclic graph (DAG). A topological sort of a dag G = (V, E) is a linear ordering of all its vertices such that if G contains an edge (u, v), the u appears before v in the ordering.

Topological sort pseudo code:

call DFS(G)
as each vertex is finished, insert it in front of a linked list

return the linked list

import logging
from collections import deque

logger = logging.getLogger(__name__)


class Vertex:
    def __init__(self, val, color='white', parent=None):
        self.val = val
        self.color = color
        self.parent = parent

    def __str__(self):
        return "Val: {}, Color: {}".format(self.val, self.color)


class Graph:
    def __init__(self):
        self.adj = {}
        self.nodes = {}

    def addNode(self, val):
        node = Vertex(val)
        self.nodes[val] = node
        self.adj[node] = []

    def addMultiNodes(self, val_list):
        for v in val_list:
            self.addNode(v)

    def addEdge(self, u, v):
        u_node = self.nodes[u]
        v_node = self.nodes[v]
        self.adj[u_node].append(v_node)

    def addMultiEdges(self, edge_list):
        for e in edge_list:
            self.addEdge(e[0], e[1])

    def dfs(self):
        self._reset_nodes()
        for u in self.nodes.values():
            if u.color == 'white':
                self.dfs_visit(u)

    def dfs_visit(self, u):
        u.color = 'gray'
        logger.info('Discovered node: {}'.format(u))
        for v in self.adj[u]:
            if v.color == 'white':
                self.parent = u
                self.dfs_visit(v)
                u.color = 'black'
                logger.info('Finish visiting node: {}'.format(u))

    def _reset_nodes(self):
        for u in self.nodes.values():
            u.color = 'white'
            u.parent = None

    def isDAG(self):
        self._reset_nodes()
        for u in self.nodes.values():
            if u.color == 'white' and not self.isDAGTree(u):
                return False
        return True

    def isDAGTree(self, u):
        u.color = 'gray'
        for v in self.adj[u]:
            logger.info('Discovered node: {}'.format(v))
            if v.color == 'gray':
                return False
            elif v.color == 'white' and not self.isDAGTree(v):
                return False
            u.color = 'black'
        return True

    def topo_sort(self):
        self._reset_nodes()
        sorted_list = deque()
        for u in self.nodes.values():
            if u.color == 'white':
                self.topo_sort_helper(u, sorted_list)
        return sorted_list

    def topo_sort_helper(self, u, sorted_list):
        u.color = 'gray'
        for v in self.adj[u]:
            if v.color == 'white':
                v.parent = u
                self.topo_sort_helper(v, sorted_list)
            elif v.color == 'gray':
                ValueError(
                    'The graph has cycle. Cannot perform topological sort!')
                u.color = 'black'
                sorted_list.appendleft(u.val)


g = Graph()
g.addMultiNodes([
    'undershorts', 'pants', 'belt', 'shirt', 'tie', 'tie', 'jacket', 'socks',
    'shoes', 'watch'
])

g.addMultiEdges(
    [['undershorts', 'pants'], ['pants', 'belt'], ['belt', 'jacket'],
     ['shirt', 'belt'], ['shirt', 'tie'], ['tie', 'jacket'],
     ['undershorts', 'shoes'], ['pants', 'shoes'], ['socks', 'shoes']])
print(g.topo_sort())

class="example"> deque(['undershorts', 'pants', 'pants', 'socks', 'tie'])

Here is a simpler version. Good as an interview template.

from collections import deque


def buildDirectedGraph(numNodes, edges):
    graph = {i: set() for i in range(numNodes)}
    for u, v in edges:
        graph[u].add(v)

    return graph


def topo_sort(graph):
    def dfs(graph, node, visited, discovered, res):
        if node in visited:
            return

        if node in discovered:
            raise Exception(
                'The graph has a cycle. It is not topologically sortable.')

        discovered.add(node)
        for nei in graph[node]:
            dfs(graph, nei, visited, discovered, res)

        discovered.remove(node)
        visited.add(node)
        res.appendleft(node)

    res = deque()
    visited = set()  # nodes that are finished visiting
    discovered = set()  # nodes that are discovered by dfs but not yet finished
    for node in graph:
        dfs(graph, node, visited, discovered, res)
    return res


# graph = buildDirectedGraph(2, [[1, 0], [0, 1]])
# res = topo_sort(graph)

graph = buildDirectedGraph(4, [[1, 0], [2, 0], [3, 1], [3, 2]])
res = topo_sort(graph)
print(res)

class="example"> deque([3, 2, 1, 0])

Kahn's algorithm, something like BFS, queue is used.

Create the indegree map of the graph
Append the nodes with 0 indegree to the queue
while the queue is not empty, dequeue and get one node, count += 1, stack append the node. for all its neighbors, decrease the indegree by 1, and append the neighbors whose indegree is 0.

if count != the total number of nodes, then the graph is not DAG. Otherwise return the stack. The stack is topologically sorted.

from collections import deque


def buildDirectedGraph(numNodes, edges):
    graph = {i: set() for i in range(numNodes)}
    for u, v in edges:
        graph[u].add(v)
    return graph


def topo_sort(graph):
    indegree = [0] * len(graph)
    for neighbors in graph.values():
        for nei in neighbors:
            indegree[nei] += 1

    q = deque([n for n in graph if indegree[n] == 0])
    res = []
    while q:
        node = q.popleft()
        res.append(node)
        for nei in graph[node]:
            indegree[nei] -= 1
            if indegree[nei] == 0:
                q.append(nei)

    if len(res) < len(graph):
        raise Exception('The graph has a cycle. It is not topologically sortable.')
    return res


# graph = buildDirectedGraph(2, [[1, 0], [0, 1]])
# res = topo_sort(graph)

graph = buildDirectedGraph(4, [[1, 0], [2, 0], [3, 1], [3, 2]])
res = topo_sort(graph)
print(res)

class="example"> [3, 1, 2, 0]

2.16 Strongly connected components

In a strongly connected component, every pair of vertices are reachable from each other.

call DFS(G) to compute the finishing times for each vertex
compute G^T, G transpose (reverse every edge)
call DFS(G^T), but in the main loop of DFS, consider vertices in order of decreasing finishing time u.f obtained in the first step.
output the vertices of each tree in the depth-first forest obtained in step 3

2.17 Binary Search Tree

If node y is in the left subtree of x, then y.key <= x.key.
If node y is in the right subtree of x, then y.key >= x.key.
Inorder tree walk: print left tree -> print the node itself -> print the right tree
Preorder tree walk: print the root -> print the subtrees

Post order tree walk: print the subtrees -> print the root

from collections import deque


class Node:
    def __init__(self, x):
        self.key = x
        self.left = None
        self.right = None
        self.parent = None

    def __str__(self):
        return "Node({})".format(self.key)


class BinarySearchTree:
    def __init__(self):
        self.root = None

    def inorder_tree_walk(self):
        self._inorder_tree_walk_helper(self.root)

    def _inorder_tree_walk_helper(self, node):
        if node:
            self._inorder_tree_walk_helper(node.left)
            print(node.key)
            self._inorder_tree_walk_helper(node.right)

    def search(self, key):
        return self._search_helper(self.root, key)

    def _search_helper(self, node, key):
        if not node or node.key == key:
            return node

        if key < node.key:
            return self._search_helper(node.left, key)
        else:
            return self._search_helper(node.right, key)

    def iterative_search(self, key):
        curr = self.root
        while curr and curr.key != key:
            if key < curr.key:
                curr = curr.left
            else:
                curr = curr.right
        return curr

    def maximum(self):
        return self._maximum(self.root)

    def _maximum(self, node):
        curr = node
        while curr.right:
            curr = curr.right
        return curr

    def minimum(self):
        return self._minimum(self.root)

    def _minimum(self, node):
        curr = node
        while curr.left:
            curr = curr.left
        return curr

    def insert(self, node):
        p = None
        curr = self.root
        while curr:
            p = curr
            if node.key < curr.key:
                curr = curr.left
            else:
                curr = curr.right
                node.parent = p
        if p:
            if node.key < p.key:
                p.left = node
            else:
                p.right = node
        else:
            self.root = node

    def successor(self, node):
        """
        Find the successor of the sorted array in the inorder traversal
        """
        if node.right:
            return self._minimum(node.right)

        parent_node = node.parent
        while parent_node and parent_node.left != node:
            node = parent_node
            parent_node = node.parent
        return parent_node

    def predecessor(self, node):
        if node.left:
            return self._maximum(node.left)

        parent_node = node.parent
        while parent_node and parent_node.right != node:
            node = parent_node
            parent_node = node.parent
        return parent_node

    def transplant(self, u, v):
        """Replace the subtree rooted at u with the subtree rooted at v
        NOTE: u must not be None, but v can be None
        """
        if not u.parent:
            self.root = v
        elif u.parent.left == u:
            u.parent.left = v
        else:
            u.parent.right = v

        if v:
            v.parent = u.parent

    def delete(self, node):
        """Delete the node, assuming node cannot be None
        """
        if not node.left:
            self.transplant(node, node.right)
        elif not node.right:
            self.transplant(node, node.left)
        else:
            suc_node = self.successor(node)
            if suc_node.parent != node:
                self.transplant(suc_node, suc_node.right)
                suc_node.right = node.right
                suc_node.right.parent = suc_node
            self.transplant(node, suc_node)
            suc_node.left = node.left
            suc_node.left.parent = suc_node

    def _print(self, node, indent):
        s = ""
        if node:
            s = indent + str(node.key) + "\n"
            if node.left:
                s += self._print(node.left, indent + "|---")
            if node.right:
                s += self._print(node.right, indent + "|---")
        return s

    def __str__(self):
        if not self.root:
            return ""
        s = ""
        curr_level = []
        curr_level.append(self.root)
        while curr_level:
            next_level = []
            for n in curr_level:
                s += str(n.key) + " "
                if n.left:
                    next_level.append(n.left)
                if n.right:
                    next_level.append(n.right)
            s += "\n"
            curr_level = next_level
        return s


t = BinarySearchTree()
t.insert(Node(12))
t.insert(Node(5))
t.insert(Node(2))
t.insert(Node(9))
t.insert(Node(18))
t.insert(Node(15))
t.insert(Node(19))

t.inorder_tree_walk()
print(t.search(9))
print(t.iterative_search(9))
print(t.minimum())
print(t.maximum())
print(t.successor(t.search(9)))
print(t.predecessor(t.search(9)))
print(t)
t.delete(t.search(12))
print(t)

class="example"> 2 5 9 12 15 18 19 Node(9) Node(9) Node(2) Node(19) None Node(5) 12 5 18 2 9 15 19 15 5 2 9

Randomly built binary search tree: each of the n! permutations of the input keys is equally likely (it is different from insert the n keys equally likely each time).

2.18 Red-Black Tree

Red-Black Properties:

Every node is either red or black.
The root is black.
Every leaf (NIL) is black.
If a node is red, then both its children are black.

For each node, all simple paths from the node to descendant leaves contain the same number of black nodes.

Black-height of node x: the number of black nodes from the node x (not including x) to a leaf.

Lemma: A red-black tree with n internal nodes has height at most 2lg(n+1).

Rotation:

After Right-Rotate is performed,

class Node:
    BLACK = "BLACK"
    RED = "RED"

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.color = self.BLACK
        self.parent = None

    def __str__(self):
        return "Node(key={}, color={})".format(self.key, self.color)


class RedBlackTree:
    def __init__(self):
        self.nil = Node(None)
        self.root = self.nil

    def left_rotate(self, x):
        # Assuming x.right is not self.nil
        y = x.right             # set y
        x.right = y.left        # turn y's left subtree into x's right subtree
        if y.left != self.nil:  # no need to link y.left's parent if it is a sentinel
            y.left.parent = x
        y.parent = x.parent     # link x's parent to y
        if x.parent == self.nil:
            self.root = y
        elif x == x.parent.left:
            x.parent.left = y
        else:
            x.parent.right = y
        y.left = x              # put x into y's left
        x.parent = y

    def right_rotate(self, x):
        # Assuming x.left is not self.nil, symmetric to left_rotate
        y = x.left

        x.left = y.right
        if y.right != self.nil:
            y.right.parent = x

        y.parent = x.parent
        if x.parent == self.nil:
            self.root = y
        elif x == x.parent.left:
            x.parent.left = y
        else:
            x.parent.right = y

        y.right = x
        x.parent = y

    def insert(self, z):
        prev = self.nil
        curr = self.root
        while curr != self.nil:
            prev = curr
            if curr.key > z.key:
                curr = curr.left
            else:
                curr = curr.right

        if prev == self.nil:
            self.root = z
            z.parent = self.nil
        elif prev.key > z.key:
            prev.left = z
        else:
            prev.right = z
        z.parent = prev
        z.left = self.nil
        z.right = self.nil
        z.color = Node.RED
        self.insert_fixup(z)

    def insert_fixup(self, z):
        while z.parent.color == Node.RED:
            # z.parent.parent is guaranteed to exist, because the root is
            # black. If a z.parent is red then it is not the root, so it must
            # have parent.
            if z.parent == z.parent.parent.left:
                y = z.parent.parent.right  # uncle node
                if y.color == Node.RED:
                    z.parent.color = Node.BLACK
                    y.color = Node.BLACK
                    z.parent.parent.color = Node.RED
                    z = z.parent.parent
                else:
                    if z == z.parent.right:
                        z = z.parent
                        self.left_rotate(z)
                    z.parent.color = Node.BLACK
                    z.parent.parent.color = Node.RED
                    self.right_rotate(z.parent.parent)
            else:
                y = z.parent.parent.left
                if y.color == Node.RED:
                    z.parent.color = Node.BLACK
                    y.color = Node.BLACK
                    z.parent.parent.color = Node.RED
                    z = z.parent.parent
                else:
                    if z == z.parent.left:
                        z = z.parent
                        self.right_rotate(z)
                    z.parent.color = Node.BLACK
                    z.parent.parent.color = Node.RED
                    self.left_rotate(z.parent.parent)
        self.root.color = Node.BLACK

    def transplant(self, u, v):
        if u.parent == self.nil:
            self.root = v
        elif u == u.parent.left:
            u.parent.left = v
        else:
            u.parent.right = v
        v.parent = u.parent

    def minimum(self, x):
        curr = x
        while curr.left != self.nil:
            curr = curr.left
        return curr

    def delete(self, z):
        y = z  # y is either the node to be removed, or the node that replaces z
        y_original_color = y.color
        if z.left == self.nil:
            x = z.right          # x moves into y's original position
            self.transplant(z, z.right)
        elif z.right == self.nil:
            x = z.left
            self.transplant(z, z.left)
        else:
            y = self.minimum(z.right)
            y_original_color = y.color
            x = y.right
            if y.parent == z:
                x.parent = y     # TODO: why do we need this?????
            else:
                self.transplant(y, y.right)
                y.right = z.right
            self.transplant(z, y)
            y.left = z.left
            y.left.parent = y
            y.color = z.color

        if y_original_color == Node.BLACK:
            self.delete_fixup(x)

    def delete_fixup(self, x):
        while x != self.root and x.color == Node.BLACK:
            # x must be a "doubly-black" node in this loop
            if x == x.parent.left:
                w = x.parent.right   # the sibling of x
                if w.color == Node.RED:
                    # convert to case 2 or 3 or 4 by rotation
                    w.color = Node.BLACK         # case 1
                    x.parent.color = Node.RED    # case 1
                    self.left_rotate(x.parent)   # case 1
                    w = x.parent.right           # case 1
                if w.left.color == Node.BLACK and w.right.color == Node.BLACK:
                    w.color = Node.RED           # case 2
                    x = x.parent                 # case 2
                else:
                    if w.right.color == Node.BLACK:
                        w.color = Node.RED         # case 3
                        w.left.color = Node.BLACK  # case 3
                        self.right_rotate(w)       # case 3
                        w = x.parent.right         # case 3
                    w.color = x.parent.color       # case 4
                    x.parent.color = Node.BLACK    # case 4
                    w.right.color = Node.BLACK     # case 4
                    self.left_rotate(x.parent)     # case 4
                    x = self.root                  # case 4, terminate the loop
            else:
                # Symmetric to left node case
                w = x.parent.left
                if w.color == Node.RED:
                    w.color = Node.BLACK
                    x.parent.color = Node.RED
                    self.right_rotate(x.parent)
                    w = x.parent.left
                if w.left.color == Node.BLACK and w.right.color == Node.BLACK:
                    w.color = Node.RED
                    x = x.parent
                else:
                    if w.left.color == Node.BLACK:
                        w.color = Node.RED
                        w.right.color = Node.BLACK
                        self.left_rotate(w)
                    w.color = x.parent.color
                    x.parent.color = Node.BLACK
                    w.left.color = Node.BLACK
                    self.right_rotate(x.parent)
                    x = self.root

        # Outside the loop, x is either the root, or a "red-and-black" node
        # If x is the root, then root must be BLACK.
        # If x is a "red-and-black" node, then color it as black, because it
        # will have a RED child in case 2, and it might have RED parent.
        x.color = Node.BLACK

    def search(self, key):
        curr = self.root
        while curr:
            if curr.key == key:
                return curr
            elif curr.key < key:
                curr = curr.right
            else:
                curr = curr.left
        return None

    def pprint(self, node, indent):
        if node.parent == self.nil:
            s = "{}{} ({})".format(indent, node.key, node.color)
        else:
            side = "l" if node == node.parent.left else "r"
            s = "{}{}: {} ({})".format(indent, side, node.key, node.color)

        if node.left != self.nil:
            s += "\n" + self.pprint(node.left, "|---" + indent)

        if node.right != self.nil:
            s += "\n" + self.pprint(node.right, "|---" + indent)
        return s

    def __str__(self):
        return self.pprint(self.root, "") + "\n"


t = RedBlackTree()
t.insert(Node(11))
print(t)
t.insert(Node(3))
print(t)
t.insert(Node(5))
print(t)
t.insert(Node(20))
print(t)
t.insert(Node(9))
print(t)
t.insert(Node(1))
print(t)
t.insert(Node(2))
print(t)
t.insert(Node(4))
print(t)
t.insert(Node(6))
print(t)
t.insert(Node(-1))
print(t)
t.insert(Node(3))
print(t)
# print(t.search(5))
t.delete(t.search(5))
print(t)

class="example"> 11 (BLACK) 11 (BLACK) |---l: 3 (RED) 5 (BLACK) |---l: 3 (RED) |---r: 11 (RED) 5 (BLACK) |---l: 3 (BLACK) |---r: 11 (BLACK) |---|---r: 20 (RED) 5 (BLACK) |---l: 3 (BLACK) |---r: 11 (BLACK) |---|---l: 9 (RED) |---|---r: 20 (RED) 5 (BLACK) |---l: 3 (BLACK) |---|---l: 1 (RED) |---r: 11 (BLACK) |---|---l: 9 (RED) |---|---r: 20 (RED) 5 (BLACK) |---l: 2 (BLACK) |---|---l: 1 (RED) |---|---r: 3 (RED) |---r: 11 (BLACK) |---|---l: 9 (RED) |---|---r: 20 (RED) 5 (BLACK) |---l: 2 (RED) |---|---l: 1 (BLACK) |---|---r: 3 (BLACK) |---|---|---r: 4 (RED) |---r: 11 (BLACK) |---|---l: 9 (RED) |---|---r: 20 (RED) 5 (BLACK) |---l: 2 (RED) |---|---l: 1 (BLACK) |---|---r: 3 (BLACK) |---|---|---r: 4 (RED) |---r: 11 (RED) |---|---l: 9 (BLACK) |---|---|---l: 6 (RED) |---|---r: 20 (BLACK) 5 (BLACK) |---l: 2 (RED) |---|---l: 1 (BLACK) |---|---|---l: -1 (RED) |---|---r: 3 (BLACK) |---|---|---r: 4 (RED) |---r: 11 (RED) |---|---l: 9 (BLACK) |---|---|---l: 6 (RED) |---|---r: 20 (BLACK) 5 (BLACK) |---l: 2 (RED) |---|---l: 1 (BLACK) |---|---|---l: -1 (RED) |---|---r: 3 (BLACK) |---|---|---l: 3 (RED) |---|---|---r: 4 (RED) |---r: 11 (RED) |---|---l: 9 (BLACK) |---|---|---l: 6 (RED) |---|---r: 20 (BLACK) 6 (BLACK) |---l: 2 (RED) |---|---l: 1 (BLACK) |---|---|---l: -1 (RED) |---|---r: 3 (BLACK) |---|---|---l: 3 (RED) |---|---|---r: 4 (RED) |---r: 11 (RED) |---|---l: 9 (BLACK) |---|---r: 20 (BLACK)

2.19 Trie Tree

Radix Tree

class Node:
    def __init__(self):
        self.children = [None] * 2
        self.isEnd = False


class RadixTree:
    """
    This radix tree only contains binary path at each node, i.e. it only stores
    sting like "01100".
    """
    def __init__(self):
        self.root = Node()

    def getIdx(self, s):
        return ord(s) - ord("0")

    def getStr(self, idx):
        return str(idx)

    def sort(self):
        sorted_arr = []
        stack = [(self.root, "")]
        while stack:
            node, curr_str = stack.pop()
            # NOTE: need to iterate the children from right to left, so left
            # node will be traversed before right node
            for idx, n in reversed(list(enumerate(node.children))):
                if n:
                    next_str = curr_str + self.getStr(idx)
                    stack.append((n, next_str))
                    if n.isEnd:
                        sorted_arr.append(next_str)
        return sorted_arr

    def sort_recursive(self):
        return self._preorder_travel(self.root, "")

    def _preorder_travel(self, node, s):
        sorted_arr = []
        if node.isEnd:
            sorted_arr.append(s)
        for idx, n in enumerate(node.children):
            if n:
                sorted_arr.extend(self._preorder_travel(n, s + self.getStr(idx)))
        return sorted_arr

    def insert(self, seq):
        curr = self.root
        for s in seq:
            idx = self.getIdx(s)
            if not curr.children[idx]:
                curr.children[idx] = Node()
                curr = curr.children[idx]
                curr.isEnd = True

    def search(self, seq):
        curr = self.root
        for s in seq:
            idx = self.getIdx(s)
            if not curr.children[idx]:
                return False
            curr = curr.children[idx]
        return curr.isEnd


t = RadixTree()
t.insert("1011")
t.insert("0")
t.insert("100")
t.insert("10")
t.insert("011")

print(t.search("00"))
print(t.search("01"))
print(t.search("011"))
print(t.search("1011"))
print(t.sort())
print(t.sort_recursive())

class="example"> False False False True ['1', '0', '10', '101', '1011'] ['0', '1', '10', '101', '1011']

2.20 (Binary) Heap

An array with two attributes: 1) length, the number of elements in the array. 2) heap_size, the number of elements in the heap. The elements in the heap are from A[1] to A[A.heap_size].
The heap has two properties:
- Complete Binary Tree: the tree is completely filled in all levels except the lowest level. For the lowest level, the tree fills from left to right.
- Heap property: all the parent values are larger (max-heap) / smaller (min-heap) or equal to the child values.
Due to complete binary tree property, given a node index i, the parent index is floor(i/2), the left child index is 2*i, and the right child index is 2*i+1. NOTE: the root index is 1 (not 0).

NOTE: heapify only takes O(n) time (it looks like O(nlogn), however, a tighter upper bound can be found, which is O(n)). https://www.geeksforgeeks.org/time-complexity-of-building-a-heap/

from collections import deque


class Heap:
    def __init__(self):
        self.heap_size = 0
        self._array = [0]

    def __getitem__(self, k):
        if isinstance(k, slice):
            return self._array[slice(k.start + 1, k.stop + 1, k.step)]
        else:
            return self._array[k + 1]

    def __len__(self):
        return self.heap_size

    def parent(self, i):
        return i // 2

    def left(self, i):
        return 2 * i

    def right(self, i):
        return 2 * i + 1

    def build_heap(self, A):
        # It can be proved that A[len(A) // 2], A[len(A) - 1] are leaf nodes.
        self.heap_size = len(A)
        self._array = [0] + A
        # The time complexity is upper bounded by O(nlogn), but actually the
        # tighter upper bound is O(n)
        for i in range(self.heap_size // 2, 0, -1):
            # Build the heap bottom up
            self.max_heapify(i)

    def max_heapify(self, i):
        # Assumes the binary trees rooted at self.left(i) and self.right(i) are
        # max-heap

        # The Time complexity of this operation is O(log n), or O(h), where h
        # is the height of the tree
        lft = self.left(i)
        rht = self.right(i)

        if lft <= self.heap_size and self._array[i] < self._array[lft]:
            largest = lft
        else:
            largest = i

        if rht <= self.heap_size and self._array[largest] < self._array[rht]:
            largest = rht

        if largest != i:
            self._array[i], self._array[largest] = self._array[
                largest], self._array[i]
            self.max_heapify(largest)

    def maximum(self):
        return self._array[1]

    def extract_max(self):
        # Time complexity O(logn)
        if self.heap_size < 1:
            raise Exception("Heap underflow!")
        max = self._array[1]
        self._array[1] = self._array[self.heap_size]
        self.heap_size -= 1
        self.max_heapify(1)
        return max

    def insert(self, x):
        # Time complexity O(logn)
        self.heap_size += 1
        self._array.append(-1)
        self.increase_key(self.heap_size, x)

    def increase_key(self, i, k):
        # Time complexity O(logn)
        if k < self._array[i]:
            raise Exception("New key must be larger than the current key.")

        self._array[i] = k
        while i > 1 and self._array[i] > self._array[self.parent(i)]:
            self._array[self.parent(i)], self._array[i] = self._array[i], self._array[self.parent(i)]
            i = self.parent(i)

    def __str__(self):
        return str(self[0:len(self)])


def heap_sort(A):
    # Time complexity O(nlogn)
    h = Heap()
    h.build_heap(A)
    q = deque()
    while h.heap_size > 0:
        q.appendleft(h.extract_max())
    return list(q)


# A = [1, 2, 3, 4, 5, 6]
# h = Heap(A, 100)
# print(h)

A = [7, 100, 3, 6, 90, 45, 1000]
h = Heap()
h.build_heap(A)
print(h)

sorted_A = heap_sort(A)
print(sorted_A)

h = Heap()
h.insert(7)
h.insert(100)
h.insert(3)
h.insert(6)
h.insert(90)
h.insert(45)
h.insert(32)
print(h.maximum())
print(h)
h.increase_key(4, 1000)
print(h)

2.21 Dynamic Programming

Divide-and-conquer algorithms partition the problem into disjoint subproblems, solve the subproblems recursively, and then combine their solutions to solve the original problem.
Dynamic programming applies when the subproblems overlap, that is when subproblems share subsubproblems. In this context, divide-and-conquer does more work than necessary, repeatedly solve the common subsubproblems.
Dynamic programming solve each subsubproblem just once and saves its answer in a table to avoid solving the same problem again.
To develop a dynamic programming algorithm, we follow 4 steps:
- Characterize the structure of an optimal solution
- Recursively define the value of an optimal solution
- Compute the value of an optimal solution, typically in a bottom-up fashion
- Construct an optimal solution from computed information

2.21.1 Rod cutting

Top-down method utilizing memoization
```
def cut_rod(price, n):
    memo = [-1 for i in range(n)]
    return helper_cut_rod(price, n, memo)


def helper_cut_rod(price, n, memo):
    if memo[n - 1] != -1:
        return memo[n - 1]
    if n == 0:
        return 0
    optimal_price = -1
    for i in range(1, n + 1):
        optimal_price = max(optimal_price, price[i - 1] + helper_cut_rod(price, n - i, memo))
        memo[n - 1] = optimal_price
        print(memo)
    return optimal_price


price = [1, 5, 8, 9, 10, 17, 17, 20, 24, 30]
print(cut_rod(price, 10))
```
class="example"> [1, -1, -1, -1, -1, -1, -1, -1, -1, -1] [1, 2, -1, -1, -1, -1, -1, -1, -1, -1] [1, 5, -1, -1, -1, -1, -1, -1, -1, -1] [1, 5, 6, -1, -1, -1, -1, -1, -1, -1] [1, 5, 6, -1, -1, -1, -1, -1, -1, -1] [1, 5, 8, -1, -1, -1, -1, -1, -1, -1] [1, 5, 8, 9, -1, -1, -1, -1, -1, -1] [1, 5, 8, 10, -1, -1, -1, -1, -1, -1] [1, 5, 8, 10, -1, -1, -1, -1, -1, -1] [1, 5, 8, 10, -1, -1, -1, -1, -1, -1] [1, 5, 8, 10, 11, -1, -1, -1, -1, -1] [1, 5, 8, 10, 13, -1, -1, -1, -1, -1] [1, 5, 8, 10, 13, -1, -1, -1, -1, -1] [1, 5, 8, 10, 13, -1, -1, -1, -1, -1] [1, 5, 8, 10, 13, -1, -1, -1, -1, -1] [1, 5, 8, 10, 13, 14, -1, -1, -1, -1] [1, 5, 8, 10, 13, 15, -1, -1, -1, -1] [1, 5, 8, 10, 13, 16, -1, -1, -1, -1] [1, 5, 8, 10, 13, 16, -1, -1, -1, -1] [1, 5, 8, 10, 13, 16, -1, -1, -1, -1] [1, 5, 8, 10, 13, 17, -1, -1, -1, -1] [1, 5, 8, 10, 13, 17, 18, -1, -1, -1] [1, 5, 8, 10, 13, 17, 18, -1, -1, -1] [1, 5, 8, 10, 13, 17, 18, -1, -1, -1] [1, 5, 8, 10, 13, 17, 18, -1, -1, -1] [1, 5, 8, 10, 13, 17, 18, -1, -1, -1] [1, 5, 8, 10, 13, 17, 18, -1, -1, -1] [1, 5, 8, 10, 13, 17, 18, -1, -1, -1] [1, 5, 8, 10, 13, 17, 18, 19, -1, -1] [1, 5, 8, 10, 13, 17, 18, 22, -1, -1] [1, 5, 8, 10, 13, 17, 18, 22, -1, -1] [1, 5, 8, 10, 13, 17, 18, 22, -1, -1] [1, 5, 8, 10, 13, 17, 18, 22, -1, -1] [1, 5, 8, 10, 13, 17, 18, 22, -1, -1] [1, 5, 8, 10, 13, 17, 18, 22, -1, -1] [1, 5, 8, 10, 13, 17, 18, 22, -1, -1] [1, 5, 8, 10, 13, 17, 18, 22, 23, -1] [1, 5, 8, 10, 13, 17, 18, 22, 23, -1] [1, 5, 8, 10, 13, 17, 18, 22, 25, -1] [1, 5, 8, 10, 13, 17, 18, 22, 25, -1] [1, 5, 8, 10, 13, 17, 18, 22, 25, -1] [1, 5, 8, 10, 13, 17, 18, 22, 25, -1] [1, 5, 8, 10, 13, 17, 18, 22, 25, -1] [1, 5, 8, 10, 13, 17, 18, 22, 25, -1] [1, 5, 8, 10, 13, 17, 18, 22, 25, -1] [1, 5, 8, 10, 13, 17, 18, 22, 25, 26] [1, 5, 8, 10, 13, 17, 18, 22, 25, 27] [1, 5, 8, 10, 13, 17, 18, 22, 25, 27] [1, 5, 8, 10, 13, 17, 18, 22, 25, 27] [1, 5, 8, 10, 13, 17, 18, 22, 25, 27] [1, 5, 8, 10, 13, 17, 18, 22, 25, 27] [1, 5, 8, 10, 13, 17, 18, 22, 25, 27] [1, 5, 8, 10, 13, 17, 18, 22, 25, 27] [1, 5, 8, 10, 13, 17, 18, 22, 25, 27] [1, 5, 8, 10, 13, 17, 18, 22, 25, 57] 57
- Bottom-up method
```
def cut_rod(price, n):
    optimal_price, cut_points = cut_rod_helper(price, n)
    optimal_cut_points = []
    curr_len = n
    while curr_len > 0:
        optimal_cut_points.append(cut_points[curr_len])
        curr_len = curr_len - cut_points[curr_len]
    return optimal_price[n], optimal_cut_points


def cut_rod_helper(price, n):
    memo = [-1 for i in range(n + 1)]
    cut_point = [-1 for i in range(n + 1)]
    memo[0] = 0
    cut_point[0] = 0
    for i in range(1, n + 1):
        opt = -1
        cp = -1
        for j in range(1, i + 1):
            tmp = price[j - 1] + memo[i - j]
            if tmp > opt:
                opt = tmp
                cp = j
                memo[i] = opt
                cut_point[i] = cp
                print(memo)
                print(cut_point)
    return memo, cut_point


price = [1, 5, 8, 9, 10, 17, 17, 20, 24, 30]
opt_price, opt_cut = cut_rod(price, 7)
print(opt_price)
print(opt_cut)
```
  class="example"> [0, 1, -1, -1, -1, -1, -1, -1] [0, 1, -1, -1, -1, -1, -1, -1] [0, 1, 2, -1, -1, -1, -1, -1] [0, 1, 1, -1, -1, -1, -1, -1] [0, 1, 5, -1, -1, -1, -1, -1] [0, 1, 2, -1, -1, -1, -1, -1] [0, 1, 5, 6, -1, -1, -1, -1] [0, 1, 2, 1, -1, -1, -1, -1] [0, 1, 5, 8, -1, -1, -1, -1] [0, 1, 2, 3, -1, -1, -1, -1] [0, 1, 5, 8, 9, -1, -1, -1] [0, 1, 2, 3, 1, -1, -1, -1] [0, 1, 5, 8, 10, -1, -1, -1] [0, 1, 2, 3, 2, -1, -1, -1] [0, 1, 5, 8, 10, 11, -1, -1] [0, 1, 2, 3, 2, 1, -1, -1] [0, 1, 5, 8, 10, 13, -1, -1] [0, 1, 2, 3, 2, 2, -1, -1] [0, 1, 5, 8, 10, 13, 14, -1] [0, 1, 2, 3, 2, 2, 1, -1] [0, 1, 5, 8, 10, 13, 15, -1] [0, 1, 2, 3, 2, 2, 2, -1] [0, 1, 5, 8, 10, 13, 16, -1] [0, 1, 2, 3, 2, 2, 3, -1] [0, 1, 5, 8, 10, 13, 17, -1] [0, 1, 2, 3, 2, 2, 6, -1] [0, 1, 5, 8, 10, 13, 17, 18] [0, 1, 2, 3, 2, 2, 6, 1] 18 [1, 6]
  - Subproblem Graph
  - Top-down approach performs a "Depth-First-Search" of the subproblem graph.
  - Bottom-up approach solves the subproblem in a "reverse topological sort" order.
  - The time complexity of the dynamic ptrogramming is O(V + E), where V are the nodes, E are the edges of the graph.

2.21.2 Matrix-chain multiplication

2D DP problem

def matrix_chain_order(p):
    n = len(p) - 1
    m = [[-1 for i in range(n)] for i in range(n)]
    s = [[-1 for i in range(n)] for i in range(n)]

    for i in range(n):
        m[i][i] = 0
        s[i][i] = 0

    for d in range(1, n):
        # l is the difference between start and end indices
        for i in range(n - d):
            j = i + d
            # i is the start index
            for k in range(i, j):
                # k is the relative split position
                tmp = m[i][k] + m[k + 1][j] + p[i] * p[k + 1] * p[j + 1]
                if m[i][j] == -1:
                    m[i][j] = tmp
                    s[i][j] = k
                elif m[i][j] > tmp:
                    m[i][j] = tmp
                    s[i][j] = k
    return m, s


def print_solution(s, i, j):
    sol = ""
    if i == j:
        sol += "A_{}".format(i + 1)
    else:
        sol += "("
        sol += print_solution(s, i, s[i][j])
        sol += print_solution(s, s[i][j] + 1, j)
        sol += ")"
    return sol



p = [30, 35, 15, 5, 10, 20, 25]
m, s = matrix_chain_order(p)
sol = print_solution(s, 0, len(m) - 1)
print(m)
print(s)
print(sol)

class="example"> [[0, 15750, 7875, 9375, 11875, 15125], [-1, 0, 2625, 4375, 7125, 10500], [-1, -1, 0, 750, 2500, 5375], [-1, -1, -1, 0, 1000, 3500], [-1, -1, -1, -1, 0, 5000], [-1, -1, -1, -1, -1, 0]] [[0, 0, 0, 2, 2, 2], [-1, 0, 1, 2, 2, 2], [-1, -1, 0, 2, 2, 2], [-1, -1, -1, 0, 3, 4], [-1, -1, -1, -1, 0, 4], [-1, -1, -1, -1, -1, 0]] ((A_1(A_2A_3))((A_4A_5)A_6))

Elements of dynamic programming
- Optimal substructure
  - We build an optimal solution to the problem from optimal solutions of the subproblems.
  - Common pattern in discovering optimal substructure:
    - A solution to the problem consists of making a choice. Making to the choice leads to solving one or more subproblems.
    - You suppose for a given problem, you are given the choice that leads to an optimal solution. You do not concern yourself yet with how the choice came up. You just assume it has been given to you.
    - Given the choice, you determine which subproblems follow, and how to best characterize the resulting space of subproblems (e.g. memo[i], memo[i][j]).
    - You show that the solutions to the subproblems used within an optimal solution to the problem must be optimal themselves to the subproblems. You can prove it by contradiction.
  - Optimal substructure varies across problem domains in two ways:
    - how many subproblems an optimal solution to the original problem uses
    - how many choices we have in determining which subproblems to use in an optimal solution.
  - Informally, the running time of a dynamic-programming algorithm depends on the product of two factors:
    - the number of subproblems overall, denoted as n (i.e. the vertices of the subproblem graph).
    - how many choices we look at for each subproblem, denoted as m (i.e. the edges of each vertex, so the total edge number will be n * m).
  - DP versus greedy:
    - DP solves the subproblem first, and then use the solutions of the subproblems to solve the original problem
    - Greedy first make a "greedy" choice (the choice that looks best at the time), and then solve a resulting subproblem. It doesn't bothering to solve every possible smaller subproblems first. In some cases, this strategy works.
  - Independence between subproblems
    - If solving subproblems will need to share information between subproblems in order to find the optimal solution to the original problem, then the problem cannot be solved using DP. Example: unweighted longest simple path in a directed graph.
- Overlapping subproblems
  - When a recursive algorithm revisits the same problem repeatedly, then the optimization problem has overlapping subproblems.

Longest Common Subsequence

def longest_common_subsequence(x, y):
    m = len(x)
    n = len(y)
    c = [[0 for j in range(n + 1)] for i in range(m + 1)]
    d = [[-1 for j in range(n)] for i in range(m)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                d[i - 1][j - 1] = 2  # 2 means go to left up corner for parent
            elif c[i - 1][j] > c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                d[i - 1][j - 1] = 0  # 0 means go left for parent
            else:
                c[i][j] = c[i][j - 1]
                d[i - 1][j - 1] = 1  # 1 means go right for parent
    return c[m][n], d


def print_lcs(x, d, i, j):
    if i < 0 or j < 0:
        return ""

    s = ""
    if d[i][j] == 2:
        s += print_lcs(x, d, i - 1, j - 1)
        s += x[i]
    elif d[i][j] == 0:
        s += print_lcs(x, d, i - 1, j)
    else:
        s += print_lcs(x, d, i, j - 1)
    return s


x = "abcbdab"
y = "bdcaba"

ct, d = longest_common_subsequence(x, y)
print(ct)
print(print_lcs(x, d, len(x) - 1, len(y) - 1))

class="example"> 4 bdab

NOTE: if only the length is needed, then there is no need to keep the whole lookup table. Only the latest two rows of the table is needed.

Optimal Binary Search Tree

def optimal_bst(p, q, n):
    e = [[0 for j in range(n + 1)] for i in range(n + 2)]  # e[1 to n + 1][0 to n]
    w = [[0 for j in range(n + 1)] for i in range(n + 2)]
    root = [[-1 for j in range(n + 1)] for i in range(n + 1)]

    for i in range(1, n + 2):
        e[i][i - 1] = q[i - 1]
        w[i][i - 1] = q[i - 1]

    for d in range(n):
        for i in range(1, n - d + 1):
            j = i + d
            e[i][j] = float("inf")
            w[i][j] = w[i][j - 1] + p[j - 1] + q[j]
            for k in range(i, j + 1):
                tmp = e[i][k - 1] + e[k + 1][j] + w[i][j]
                if tmp < e[i][j]:
                    e[i][j] = tmp
                    root[i][j] = k
                    print(e)
                    print(w)
    return e[1][n], root


def print_sol(root, i, j):
    if i > j:
        print('d{}'.format(j))
        return
    k = root[i][j]
    print("k{}".format(k))
    print("Left:")
    print_sol(root, i, k - 1)
    print("Right:")
    print_sol(root, k + 1, j)


p = [0.15, 0.1, 0.05, 0.1, 0.2]
q = [0.05, 0.1, 0.05, 0.05, 0.05, 0.1]
n = len(p)
e_opt, root = optimal_bst(p, q, n)
print(e_opt)
print_sol(root, 1, n)

class="example"> [[0, 0, 0, 0, 0, 0], [0.05, 0.45000000000000007, 0.9, 1.25, 1.75, 2.75], [0, 0.1, 0.4, 0.7, 1.2, 2.0], [0, 0, 0.05, 0.25, 0.6, 1.2999999999999998], [0, 0, 0, 0.05, 0.30000000000000004, 0.9], [0, 0, 0, 0, 0.05, 0.5], [0, 0, 0, 0, 0, 0.1]] [[0, 0, 0, 0, 0, 0], [0.05, 0.30000000000000004, 0.45, 0.55, 0.7000000000000001, 1.0000000000000002], [0, 0.1, 0.25, 0.35, 0.49999999999999994, 0.7999999999999999], [0, 0, 0.05, 0.15000000000000002, 0.3, 0.6], [0, 0, 0, 0.05, 0.2, 0.5], [0, 0, 0, 0, 0.05, 0.35], [0, 0, 0, 0, 0, 0.1]] 2.75 k2 Left: k1 Left: d0 Right: d1 Right: k5 Left: k4 Left: k3 Left: d2 Right: d3 Right: d4 Right: d5

2.21.3 0/1 Knapsack

2.21.4 Edit Distance

2.22 Greedy Algorithms

2.22.1 Interval-graph coloring problem

Meeting rooms II: https://leetcode.com/problems/meeting-rooms-ii/

2.23 Union Find

Union find with rank and path compression. The time complexity of union find with both rank and path compression is O(m α(n)), where m is the number of make-set, find, union operations, n is the number of make-set operations (only happens at the beginning, usually means n nodes). α(n) is a very slowly growing function, α(n) < 4 in most of the practical cases. Hence, the time complexity is almost linear to the number of operations. Each operation nearly takes constant time.

class UnionFind:
    def __init__(self, size):
        self.parent = list(range(size))
        self.rank = [0 for i in range(size)]

    def find(self, x):
        while x != self.parent[x]:
            # path compression, connect the grandchild node with its grandparent every time executing find
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        print('Before Union')
        print('Parent: {}'.format(self.parent))
        print('Rank: {}'.format(self.rank))
        xroot = self.find(x)
        yroot = self.find(y)

        if self.rank[xroot] > self.rank[yroot]:
            self.parent[y] = xroot
        elif self.rank[x] < self.rank[y]:
            self.parent[x] = yroot
        else:
            # rank only increases when the two sets to join have the same rank
            self.parent[y] = xroot
            self.rank[xroot] += 1

        print('After Union')
        print('Parent: {}'.format(self.parent))
        print('Rank: {}'.format(self.rank))


uf = UnionFind(10)
uf.union(3, 4)
uf.union(9, 4)
uf.union(8, 0)
uf.union(2, 3)
print(uf.find(3))
print(uf.find(2))
print(uf.find(9))
uf.union(5, 6)
uf.union(5, 9)
uf.union(7, 3)
uf.union(4, 8)
uf.union(6, 1)

class="example"> Before Union Parent: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] Rank: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] After Union Parent: [0, 1, 2, 3, 3, 5, 6, 7, 8, 9] Rank: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0] Before Union Parent: [0, 1, 2, 3, 3, 5, 6, 7, 8, 9] Rank: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0] After Union Parent: [0, 1, 2, 3, 9, 5, 6, 7, 8, 9] Rank: [0, 0, 0, 1, 0, 0, 0, 0, 0, 1] Before Union Parent: [0, 1, 2, 3, 9, 5, 6, 7, 8, 9] Rank: [0, 0, 0, 1, 0, 0, 0, 0, 0, 1] After Union Parent: [8, 1, 2, 3, 9, 5, 6, 7, 8, 9] Rank: [0, 0, 0, 1, 0, 0, 0, 0, 1, 1] Before Union Parent: [8, 1, 2, 3, 9, 5, 6, 7, 8, 9] Rank: [0, 0, 0, 1, 0, 0, 0, 0, 1, 1] After Union Parent: [8, 1, 3, 3, 9, 5, 6, 7, 8, 9] Rank: [0, 0, 0, 1, 0, 0, 0, 0, 1, 1] 3 3 9 Before Union Parent: [8, 1, 3, 3, 9, 5, 6, 7, 8, 9] Rank: [0, 0, 0, 1, 0, 0, 0, 0, 1, 1] After Union Parent: [8, 1, 3, 3, 9, 5, 5, 7, 8, 9] Rank: [0, 0, 0, 1, 0, 1, 0, 0, 1, 1] Before Union Parent: [8, 1, 3, 3, 9, 5, 5, 7, 8, 9] Rank: [0, 0, 0, 1, 0, 1, 0, 0, 1, 1] After Union Parent: [8, 1, 3, 3, 9, 5, 5, 7, 8, 5] Rank: [0, 0, 0, 1, 0, 2, 0, 0, 1, 1] Before Union Parent: [8, 1, 3, 3, 9, 5, 5, 7, 8, 5] Rank: [0, 0, 0, 1, 0, 2, 0, 0, 1, 1] After Union Parent: [8, 1, 3, 3, 9, 5, 5, 3, 8, 5] Rank: [0, 0, 0, 1, 0, 2, 0, 0, 1, 1] Before Union Parent: [8, 1, 3, 3, 9, 5, 5, 3, 8, 5] Rank: [0, 0, 0, 1, 0, 2, 0, 0, 1, 1] After Union Parent: [8, 1, 3, 3, 5, 5, 5, 3, 5, 5] Rank: [0, 0, 0, 1, 0, 2, 0, 0, 1, 1] Before Union Parent: [8, 1, 3, 3, 5, 5, 5, 3, 5, 5] Rank: [0, 0, 0, 1, 0, 2, 0, 0, 1, 1] After Union Parent: [8, 5, 3, 3, 5, 5, 5, 3, 5, 5] Rank: [0, 0, 0, 1, 0, 2, 0, 0, 1, 1]

2.24 Single-source shortest path

2.24.1 Bellman-Ford

2.25 Quick Select

Time complexity: worst case O(n²) if the pivot is always the biggest or smallest. best case O(n), if we can always decrease the array to search by half, then 1 + 1/2 + 1/4 + … = 2. average O(n).

def quickSelect(nums, k):
    left = 0
    right = len(nums) - 1
    while left < right:
        pivot_idx = partition(nums, left, right)
        if pivot_idx == k - 1:
            break
        elif pivot_idx < k - 1:
            left = pivot_idx + 1
        else:
            right = pivot_idx - 1
    return nums[:k]


def partition(nums, left, right):
    pivot = nums[right]
    i = left
    for j in range(left, right):
        if nums[j] < pivot:
            nums[i], nums[j] = nums[j], nums[i]
            i += 1

    nums[i], nums[right] = nums[right], nums[i]
    return i


nums = [5, 3, 9, 2, 7, 2]
print(quickSelect(nums, 3))

2.26 Graph Coloring Problem

The general graph coloring problem (find the minimum number of colors that the adjacent nodes do not share the same color) is NP-Complete. Greedy algorithm can only find an upper bound of the color numbers, but it is not guaranteed to be the optimal.

2.27 Tree in the context of graph

A tree is an undirected graph in which any two vertices are connected by exactly one path. Any connected graph without simple cycles is a tree.
Any connected graph who has n nodes with n - 1 edges is a tree.
Diameter of a graph: longest of the shortest paths of any two pairs in a graph.
Remoteness: the length of the longest path from the node to its leaves.
Center: the node that has the smallest remoteness. Or if treating the node as the root of the tree, the tree has the minimal height.
Theorem: There can be at most two centers for a tree.
Theorem: the center of the tree must be on the diameter of the tree.

2.28 Bellman-Ford Algorithm

Repeat |V| - 1 times, for each edge, relax the edge
The reason for |V| - 1 times: because any simple path from u to v has at most |V| - 1 edges. Otherwise, the path has cycle. Positive cycles will not contribute to shortest path.
After |V| - times, if we do the relaxation for all the edges one more time, we can detect negative cycles if certain nodes got relaxed again.

2.29 Dijkstra's Algorithm

Works for directed and undirected graphs.
Greedy, always find the node with the shortest distance so far and relax its edges.

Does not work with negative weights.

from heapq import heappush, heappop
from collections import deque


def dijkstra(n, edges, start):
    graph = buildGraph(n, edges)

    dist = [float('inf') for _ in range(n)]
    dist[start] = 0

    parent = [-1 for _ in range(n)]
    parent[start] = start

    minHeap = []
    heappush(minHeap, (0, start))

    visited = set()
    while minHeap:
        d, node = heappop(minHeap)
        if node in visited:
            continue

        dist[node] = d
        visited.add(node)

        for nei, w in graph[node]:
            if nei not in visited:
                if dist[nei] > dist[node] + w:
                    dist[nei] = dist[node] + w
                    parent[nei] = node
                    heappush(minHeap, (dist[nei], nei))

    print('Shortest distance from {}:'.format(start))
    print(dist)

    print('All the shortest paths starting from {}:'.format(start))
    for i in range(n):
        print("->".join(map(str, reconstruct_path(parent, i))))


def reconstruct_path(parent, node):
    path = deque()
    while node != parent[node]:
        path.appendleft(node)
        node = parent[node]
        path.appendleft(node)
    return path


def buildGraph(n, edges):
    graph = {i: set() for i in range(n)}
    for u, v, w in edges:
        graph[u].add((v, w))
    return graph


dijkstra(5, [[0, 1, 10], [0, 2, 3], [2, 1, 2], [2, 3, 5], [1, 3, 1], [0, 4, 2], [3, 4, 1]], 0)

class="example"> Shortest distance from 0: [0, 5, 3, 6, 2] All the shortest paths starting from 0: 0->2->2->1 0->2 0->2->2->1->1->3 0->4

2.30 Sliding window technique

https://leetcode.com/problems/minimum-window-substring/

2.31 Greatest Common Divisor

def gcd(a, b):
    if a < b:
        a, b = b, a

    # now a is always larger than b
    while b != 0:
        tmp = b
        b = a % b
        a = tmp

    return a


print(gcd(100, 30))
print(gcd(100, 11))

class="example"> 10 1

2.32 Find prime numbers less than n

2.32.1 Method 1 - brute force

def printPrime(n):
    res = []
    for i in range(2, n):
        if isPrime(i):
            res.append(i)
    return res

def isPrime(x):
    for i in range(2, x):
        if x % i == 0:
            return False
    return True


print(printPrime(20))

class="example"> [2, 3, 5, 7, 11, 13, 17, 19]

2.32.2 Method 2 - better brute force

def printPrime(n):
    res = []
    for i in range(2, n):
        if isPrime(i):
            res.append(i)
    return res


def isPrime(x):
    for i in range(2, x):
        if i * i > x:
            break
        if x % i == 0:
            return False
    return True


print(printPrime(20))

class="example"> [2, 3, 5, 7, 11, 13, 17, 19]

2.32.3 Method 3 - Sieve of Eratothenes

Key idea: mark multiples as not a prime.

def printPrime(n):
    if n <= 2:
        return []

    res = [2]

    isPrime = [True for i in range(n)]
    for i in range(4, n):
        if i % 2 == 0:
            isPrime[i] = False

    for i in range(3, n, 2):
        if isPrime[i]:
            x = i * i
            while x < n:
                isPrime[x] = False
                x += i

    for i in range(3, n):
        if isPrime[i]:
            res.append(i)
    return res


print(printPrime(100))
print(printPrime(2))
print(printPrime(3))
print(printPrime(10))

class="example"> [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97] [] [2] [2, 3, 5, 7]

2.33 Bit Manipulation

2.33.1 XOR

Commutative law: a ^ b = b ^ a
Associative law: (a ^ b) ^ c = a ^ (b ^ c)
a ^ b ^ b = a
a ^ 0 = a
a ^ a = 0
Flip the n-th bit: mask = 1 << i, flipped_x = x ^ mask
Swap two values without using extra space

a = a ^ b
b = a ^ b
a = a ^ b

2.33.2 int

a int type (signed int) is 32 bit (4 bytes), representing integers from [-2**31, 2**31 - 1].

2.34 Knuth Shuffling (Fisher-Yates shuffle)

import random

def knuthShuffle(arr):
    for i in range(len(arr) - 1, 0, -1):
        j = random.randint(0, i)
        arr[i], arr[j] = arr[j], arr[i]

arr = list(range(10))
knuthShuffle(arr)
print(arr)

class="example"> [1, 8, 9, 2, 7, 0, 3, 5, 4, 6]

2.35 Reservoir Sampling

Sample k numbers from n numbers. Consider the number array is very large that it cannot fit in memory.

Can we sample the the number in a streaming fashion?

The answer is reservoir sampling!

If current selected number is less than k, then keep appending the new number into the selected number array.
If current selected number is equal to k, when the i th number arrives, keep the new number with probablity of k / i and discard one of the old numbers equally randomly (otherwise, just keep the old ones).

In the end, each number is kept in the selected set with probablity of k / n.

import random

def reservoirSampling(nums, k):
    """
    Randomly draw k samples from a very large array nums.

    Assuming nums is a very large array that cannot fit in memory, we can only
    sample nums in a streaming fashion.
    """
    reservoir = []
    for i, x in enumerate(nums):
        if len(reservoir) < k:
            reservoir.append(x)
        else:
            rn = random.randint(0, i - 1)
            if rn < k:
                reservoir[rn] = x
    return reservoir


nums = list(range(1, 10001))
print(reservoirSampling(nums, 5))
nums = list(range(1, 7))
print(reservoirSampling(nums, 5))

class="example"> [5974, 4744, 4354, 2221, 6505] [1, 2, 6, 4, 5]

2.36 Random pick number with weight

Compute the accumulated sum.
Generate a random number in range of [0, accum_sum[-1]).
Perform a binary search to see the random number sits in which interval (defined by the accumulated sum array).

2.37 Monotonic Queue/Stack

https://leetcode.com/problems/shortest-subarray-with-sum-at-least-k/discuss/204290/Monotonic-Queue-Summary https://leetcode.com/problems/daily-temperatures/ https://leetcode.com/problems/sliding-window-maximum/

A queue where the elements from front to end are always strictly increasing (or nondecreasing) or decreasing (or nonincreasing).

When a new value arrives, it kicks the values that are smaller than it in front of it for an increasing queue.

It enables finding the maximum in a queue in O(1) time.

import collections


class MaxQueue(object):
    def __init__(self):
        self.queue = collections.deque()
        self.max_q = collections.deque()

    def append(self, val):
        self.queue.append(val)
        while self.max_q and self.max_q[-1] < val:
            self.max_q.pop()
            self.max_q.append(val)

    def popleft(self):
        val = self.queue.popleft()
        if self.max_q and self.max_q[0] == val:
            self.max_q.popleft()
        return val

    def getMax(self):
        return self.max_q[0] if self.max_q else None

Monotonic (increasing) stack can be used to find the Previous Less Element and Next Less Element in O(n) time.
This is very helpful for the histogram problem https://leetcode.com/problems/largest-rectangle-in-histogram/
Similar problems:

2.38 Producer Consumer Pattern

2.39 Dutch national flag problem (three-way-partition problem)

Like quick sort partition, but has one more pointer to partition the array into three chunks.

This algorithm can be used to improve quick sort: partition the array into < pivot, = pivot and > pivot 3 parts. This can be useful if there is lots of duplicates in the array.

2.40 Hamiltonian path and hamiltonian cycle

A Hamiltonian path is a path that visits each vertex of the graph exactly once.
A Hamiltonian cycle is a cycle that visits each vertext exactly once.

Backtrack

2.41 Least Recently Used (LRU) cache

class LRUCache(object):
    def __init__(self, capacity):
        self.capacity = capacity
        self.key_map = {}
        self.dlist = DoublyLinkedList()

    def put(self, key, value):
        if key not in self.key_map and len(self.key_map) == self.capacity:
            tail = self.dlist.pop()
            self.key_map.pop(tail.key)

        if key in self.key_map:
            self.dlist.remove(self.key_map[key])
        node = Node(key, value)
        self.key_map[key] = node
        self.dlist.appendleft(node)

    def get(self, key):
        if key not in self.key_map:
            return None
        node = self.key_map[key]
        self.dlist.remove(node)
        self.dlist.appendleft(node)
        # print(node)
        return node.val


class DoublyLinkedList(object):
    def __init__(self):
        self.sentinel = Node(0, 0)
        self.sentinel.next = self.sentinel
        self.sentinel.prev = self.sentinel

    def appendleft(self, node):
        head = self.sentinel.next
        self.sentinel.next = node
        node.prev = self.sentinel
        head.prev = node
        node.next = head

    def pop(self):
        tail = self.getTail()
        if not tail:
            raise IndexError('The list is empty!')
        self.remove(tail)
        return tail

    def remove(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def getHead(self):
        return self.sentinel.next if self.sentinel.next is not self.sentinel else None

    def getTail(self):
        return self.sentinel.prev if self.sentinel.prev is not self.sentinel else None


class Node(object):
    def __init__(self, key, val):
        self.key = key
        self.val = val
        self.next = None
        self.prev = None


cache = LRUCache(3)
cache.put('a', 1)
cache.put('b', 2)
cache.put('c', 3)
cache.put('c', 4)
cache.put('c', 5)
cache.put('d', 6)
print(cache.get('a'))
print(cache.get('b'))
print(cache.get('c'))
cache.put('e', 7)
cache.put('e', 8)
print(cache.get('b'))

class="example"> None 2 5 2

2.42 Interval problems

2.42.1 Count number of events happening concurrently at certain time point

from heapq import heappush, heappop


class EventCounter(object):
    def __init__(self, intervals):
        pq = []
        for start, end in intervals:
            heappush(pq, (start, 1))
            heappush(pq, (end, -1))

        self.count = [[-float('inf'), 0]]
        while pq:
            time, diff = heappop(pq)
            if time == self.count[-1][0]:
                self.count[-1][1] += diff
            else:
                self.count.append([time, self.count[-1][1] + diff])

    def query(self, time):
        print(self.count)
        left = 0
        right = len(self.count) - 1
        res = 0
        while left <= right:
            mid = left + (right - left) // 2
            if self.count[mid][0] <= time:
                res = self.count[mid][1]

            if self.count[mid][0] <= time:
                left = mid + 1
            else:
                right = mid - 1
                print(res)
        return res


ec = EventCounter([[2, 5], [3, 8], [4, 8], [5, 10]])
ec.query(3)
ec.query(5)

class="example"> [[-inf, 0], [2, 1], [3, 2], [4, 3], [5, 3], [8, 1], [10, 0]] 0 [[-inf, 0], [2, 1], [3, 2], [4, 3], [5, 3], [8, 1], [10, 0]] 3

2.42.2 Meeting room problem

2.42.3 Merge intervals

2.42.4 add/delete intervals

2.43 API Limiter

2.44 Minimax

Two players play a game. Player A wants to maximize the score, while player B wants to minimize the score.

Do a DFS (post order), player A and B take turns to maximize/minimize the score at each level.
Note: we might be able to the prune the tree if there is no need to explore deeper.

3 Learning Resources

3.1 Structure and Interpretation of Computer Programs (SICP)

4 P, NP, NP-complete, NP-hard

P: the set of problems that can be solved in polynomial time.
NP (Nondeterministic polynomial time): the set of problems that can be verified (whether the provided solution is correct or not) in polynomial time given a potential solution.
NP-complete: a problem p in NP is an NP-complete problem if every other problems in NP can be reduced to p in polynomial time. That is any problems belong to both NP-hard and NP. A solution to an NP-complete problem can be verified "quickly", but there is no known way to find a solution quickly.
NP-hard: a decision problem p is NP-hard if every problem in NP can be reduced to p in polynomial time.

5 Catalan Number

\[C_0 = 1\] \[C_{n+1} = \sum_{i=0}{n} C_i C_{n - i}\] The formula indicates a recursion scheme.

Related to generate parentheses problem, generate full binary tree problem, monotonic path problem (number of "move right" must be larger than number of "move up" in the grid, so always traveling the grid under its diagonal) https://en.wikipedia.org/wiki/Catalan_number

6 ASCII

Pronunciation: ass-key :) Extended ASCII encodes 256 characters (0 - 255). Therefore, a char usually takes 1 byte (8 bits).

7 Convert int to byte string

def int_to_byte(num):
    # assuming int is 32 bit
    byte = [chr(num >> i * 8 & 0xff) for i in range(4)]
    byte.reverse()
    print(byte)
    byte_str = ''.join(byte)
    return ''.join(byte)

def byte_to_int(byte):
    res = 0
    for i, c in enumerate(byte):
        res += ord(c) << (3 - i) * 8
    return res

b = int_to_byte(12)
print(byte_to_int(b))

class="example"> ['\x00', '\x00', '\x00', '\x0c'] 12

8 Rotation Matrix

\[R = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}\]

Therefore, turn left 90 degree (anti-clockwise): \[R = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\]

\[R \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -y \\ x \end{bmatrix}\]

Turn right 90 degree (clockwise): \[R = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\]

\[R \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} y \\ -x \end{bmatrix}\]

Author: Jinyu Xie

Created: 2020-05-16 Sat 23:20

Validate

Notes, Algorithm&DataStructure

machine_learning

This post is licensed under CC BY 4.0 by the author.