GitHub Repository: ethen8181/machine-learning
Path: blob/master/python/algorithms/tree.ipynb
¹⁴⁸⁰ views

Kernel: Python 3 (ipykernel)

1 Tree Data Structure

In [1]:

from jupyterthemes import get_themes
from jupyterthemes.stylefx import set_nb_theme
themes = get_themes()
set_nb_theme(themes[1])

Out[1]:

In [2]:

%load_ext watermark
%watermark -a 'Ethen' -d -t -v -p jupyterthemes

Out[2]:

Ethen 2017-11-10 05:33:32 

CPython 3.5.2
IPython 6.2.1

jupyterthemes 0.15.0

Tree Data Structure

Tree Traversal

Resources:

Summary

There are two main ways to traverse through a tree. By traversing, we essentially mean we try to walk through the tree without repeating ourselves.

For depth first search (DFS), we start down a path and we won't stop until we get to the end. In other words, we traverse through one branch of the tree until we get to a leaf and only then will we work back up to the trunk of the tree.
In breadth first search (BFS), we would search through the nodes from one level to the next, and traverse through all the children of a node before moving on to visit the grandchildren nodes

Implementation

Resources:

From an implementation standpoint, the difference between the two is that the breadth first search uses a queue, while depth first search uses a stack.

Breadth First Search (BFS)

We will use a graph that has 7 nodes and 7 edges as our example. The edges are undirected and unweighted. Distance between two nodes will be measured based on the number of edges separating two vertices.

When using BFS to traverse through all the nodes in the graph, the algorithm follows the following steps:

Check the starting node and add its neighbours to the queue.
Mark the starting node as explored.
Get the first node from the queue / remove it from the queue.
Check if node has already been visited.
If not, go through the neighbours of the node.
Add the neighbour nodes to the queue.
Mark the node as explored.
Loop through steps 3 to 7 until the queue is empty.

In [3]:

# adjacency list can be efficiently represented as
# a python dictionary, where the nodes are the keys
# and the nodes that they are connected to are stored
# as a list of values
graph = {'A': ['B', 'C', 'E'],
         'B': ['A','D', 'E'],
         'C': ['A', 'F', 'G'],
         'D': ['B'],
         'E': ['A', 'B','D'],
         'F': ['C'],
         'G': ['C']}

In [4]:

from collections import deque


def bfs(graph, start):
    """Graph traversal using Breadth First Searh"""
    # keep track of all visited nodes
    visited = set()

    # keep track nodes to be checked
    queue = deque(start)
    while queue:
        node = queue.popleft()
        if node not in visited:
            visited.add(node)
            neighbors = graph[node]
            for neighbor in neighbors:
                queue.append(neighbor)
    
    return visited

In [5]:

start = 'A'
bfs(graph, start)

Out[5]:

{'A', 'B', 'C', 'D', 'E', 'F', 'G'}

For depth first search, the list of actions to perform upon each visit to a node:

Mark the current vertex as being visited.
Explore each adjacent vertex that is not included in the visited set.

In [6]:

def dfs(graph, start):
    """Graph traversal using Depth First Searh"""
    visited = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node not in visited:
            visited.add(node)
            neighbors = graph[node]
            for neighbor in neighbors:
                stack.append(neighbor)

    return visited

In [7]:

start = 'A'
dfs(graph, start)

Out[7]:

{'A', 'B', 'C', 'D', 'E', 'F', 'G'}

Binary Seach Tree

Resources:

Summary

In order for a binary tree to be searchable, all of the nodes to the left of root node must be smaller than the value of the root node. On the other hand, all the value to the right of the root node must be bigger than the root node's value. This property also applies to every subtree.

One of the main use case for binary search tree is to search for the presence of a number in O(log(n)) time. The rationale behind this is that if the value is below root, we can say for sure that the value is not in the right subtree then we only need to search in the left subtree and if the value is above root, we can say for sure that the value is not in the left subtree and we need to only search in the right subtree.

Heap

Blog: Learning to Love Heaps

Summary

The heap is a binary tree with two additional rules

The shape: An entire level of nodes must all have children added to them before any of the child node can have grandchildren nodes
The ordering: the parent nodes must be either greater or equal to the value of its child nodes or less or equal to the value of its child node. Both formats are both acceptable.

We can classify heap into two categories based on the ordering, a minheap or a maxheap.

From an implementation standpoint, heaps are often times represented as arrays. We start off by representing the root node of the heap as the first element of the array. Then if a parent node’s index is represented by index i in an array, then it’s left child will always be located at 2i + 1. Similarly, if a parent node’s index is represented by index i in an array, then it’s right child will always be located at 2i + 2.

Table of Contents

Tree Data Structure

Tree Traversal

Summary

Implementation

Binary Seach Tree

Summary

Heap

Summary

Product

Resources

Company