GitHub Repository: ethen8181/machine-learning
Path: blob/master/python/algorithms/basic_data_structure.ipynb
¹⁴⁸⁰ views

Kernel: Python 3 (ipykernel)

1 Basic Data Structure

In [1]:

from jupyterthemes import get_themes
from jupyterthemes.stylefx import set_nb_theme
themes = get_themes()
set_nb_theme(themes[1])

Out[1]:

In [2]:

%load_ext watermark
%watermark -a 'Ethen' -d -t -v -p jupyterthemes

Out[2]:

Ethen 2016-12-26 22:22:46 

CPython 3.5.2
IPython 4.2.0

jupyterthemes 0.13.9

Basic Data Structure

Following the online book, Problem Solving with Algorithms and Data Structures. Chapter 4 works through basic data structures and some of its use cases.

Stack

In [3]:

class Stack:
    """
    Last In First Out (LIFO)
    creates a new stack that is empty,
    assumes that the end of the list will hold 
    the top element of the stack
    
    References
    ----------
    https://docs.python.org/3/tutorial/datastructures.html#using-lists-as-stacks
    """

    def __init__(self):
        self.items = []

    def is_empty(self):
        """
        For sequences, (strings, lists, tuples), 
        use the fact that empty sequences are false
        http://stackoverflow.com/questions/53513/best-way-to-check-if-a-list-is-empty
        """
        return not self.items

    def push(self, item):
        """adds a new item to the top of the stack"""
        self.items.append(item)
    
    def pop(self):
        """
        removes the top item from the stack,
        popping an empty stack (list) will result in an error
        """
        return self.items.pop()

    def peek(self):
        """returns the top item from the stack but does not remove it"""
        return self.items[len(self.items) - 1]

    def size(self):
        return len(self.items)

In [4]:

s = Stack()
print(s.is_empty())
s.push(4)
s.push(5)
print(s.is_empty())
print(s.pop())

Out[4]:

True
False
5

Write a function rev_string(mystr) that uses a stack to reverse the characters in a string.

In [5]:

def rev_string(string):
    s = Stack()
    for char in string:
        s.push(char)

    rev_str = ''
    while not s.is_empty():
        rev_str += s.pop()

    return rev_str

test = 'apple'    
rev_string(test)

Out[5]:

'elppa'

Check for balance parenthesis.

In [6]:

def match(top, char):
    if top == '[' and char == ']':
        return True
    
    if top == '{' and char == '}':
        return True
    
    if top == '(' and char == ')':
        return True
    
    return False

def check_paren(text):
    """confirm balance parenthesis in operations"""
    # define the set of opening
    # and closing brackets
    opens = '([{'
    close = ')]}'
    
    s = Stack()
    balanced = True
    for char in text:
        if char in opens:
            s.push(char)

        if char in close:
            # if a closing bracket appeared
            # without a opening bracket
            if s.is_empty():
                balanced = False
                break
            else:
                # if there is a mismatch between the
                # closing and opening bracket
                top = s.pop()
                if not match(top, char):
                    balanced = False
                    break

    if balanced and s.is_empty():
        return True
    else:
        return False

In [7]:

test1 = '{{([][])}()}'
balanced = check_paren(test1)
print(balanced)

test2 = '{test}'
balanced = check_paren(test2)
print(balanced)

test3 = ']'
balanced = check_paren(test3)
print(balanced)

Out[7]:

True
True
False

Convert numbers into binary representation.

In [8]:

def convert_binary(num):
    """assumes positive number is given"""
    s = Stack()
    while num > 0:
        remainder = num % 2
        s.push(remainder)
        num = num // 2

    binary_str = ''
    while not s.is_empty():
        binary_str += str(s.pop())
        
    return binary_str

In [9]:

num = 42
binary_str = convert_binary(num)
binary_str

Out[9]:

'101010'

Convert operators from infix to postfix.

In [10]:

import string

def infix_to_postfix(formula):
    """assume input formula is space-delimited"""
    s = Stack()
    output = [] 
    prec = {'*': 3, '/': 3, '+': 2, '-': 2, '(': 1}
    operand = string.digits + string.ascii_uppercase
    
    for token in formula.split():
        if token in operand:
            output.append(token)
        elif token == '(':
            s.push(token)
        elif token == ')':
            top = s.pop()
            while top != '(':
                output.append(top)
                top = s.pop()
        else:
            while (not s.is_empty()) and prec[s.peek()] > prec[token]:
                top = s.pop()
                output.append(top)
            s.push(token)

    while not s.is_empty():
        top = s.pop()
        output.append(top)
    
    postfix = ' '.join(output)
    return postfix

In [11]:

formula = 'A + B * C'
output = infix_to_postfix(formula)
output

Out[11]:

'A B C * +'

In [12]:

formula = '( A + B ) * C'
output = infix_to_postfix(formula)
output

Out[12]:

'A B + C *'

Queue

In [13]:

class Queue:
    """
    First In First Out (FIFO)
    assume rear is at position 0 in the list,
    so the last element of the list is the front
    """

    def __init__(self):
        self.items = []
    
    def is_empty(self):
        return not self.items
    
    def enqueue(self, item):
        self.items.insert(0, item)

    def dequeue(self):
        return self.items.pop()
        
    def size(self):
        return len(self.items)

In [14]:

q = Queue()
q.enqueue(4)
q.enqueue(3)
q.enqueue(10)

# 4 is the first one added, hence it's the first
# one that gets popped
print(q.dequeue())
print(q.size())

Out[14]:

4
2

From Python Documentation: Using Lists as Queues
Although the queue functionality can be implemented using a list, it is not efficient for this purpose. Because doing inserts or pops from the beginning of a list is slow (all of the other elements have to be shifted by one).

We instead use deque. It is designed to have fast appends and pops from both ends.

Implement the palindrome checker to check if a given word is a palindrom. A palindrome is a string that reads the same forward and backward, e.g. radar.

In [15]:

from collections import deque

def check_palindrome(word):
    equal = True
    queue = deque([token for token in word])
    while len(queue) > 1 and equal:
        first = queue.popleft()
        last = queue.pop()
        if first != last:
            equal = False

    return equal

In [16]:

test = 'radar'
check_palindrome(test)

Out[16]:

True

Unordered (Linked)List

Blog: Implementing a Singly Linked List in Python

In [17]:

class Node:
    """
    node must contain the list item itself (data) 
    node must hold a reference that points to the next node
    """

    def __init__(self, initdata):
        # None will denote the fact that there is no next node
        self._data = initdata
        self._next = None

    def get_data(self):
        return self._data

    def get_next(self):
        return self._next

    def set_data(self, newdata):
        self._data = newdata

    def set_next(self, newnext):
        self._next = newnext

In [18]:

class UnorderedList:

    def __init__(self):
        """
        we need identify the position for the first node,
        the head, When the list is first initialized 
        it has no nodes
        """
        self.head = None
        
    def is_empty(self):
        return self.head is None
    
    def add(self, item):
        """
        takes data, initializes a new node with the given data
        and add it to the list, the easiest way is to
        place it at the head of the list and 
        point the new node at the old head
        """
        node = Node(item)
        node.set_next(self.head)
        self.head = node
        return self

    def size(self):
        """
        traverse the linked list and keep a count 
        of the number of nodes that occurred
        """
        count = 0
        current = self.head
        while current is not None:
            count += 1
            current = current.get_next()

        return count

    def search(self, item):
        """
        goes through the entire list to check
        and see there's a matched value
        """
        found = False
        current = self.head
        while current is not None and not found:
            if current.get_data() == item:
                found = True
            else:
                current = current.get_next()

        return found
    
    def delete(self, item):
        """
        traverses the list in the same way that search does,
        but this time we keep track of the current node and the previous node.
        When delete finally arrives at the node it wants to delete, 
        it looks at the previous node and resets that previous node’s pointer 
        so that, rather than pointing to the soon-to-be-deleted node, 
        it will point to the next node in line;
        this assumes item is in the list
        """
        found = False
        previous = None
        current = self.head
        while current is not None and not found:
            if current.get_data() == item:
                found = True
            else:
                previous = current
                current = current.get_next()
        
        # note this assumes the item is in the list,
        # if not we'll have to write another if else statement
        # here to handle it
        if previous is None:
            # handle cases where the first item is deleted,
            # i.e. where we are modifying the head
            self.head = current.get_next()
        else:
            previous.set_next(current.get_next())

        return self

    def __repr__(self):
        value = []
        current = self.head
        while current is not None:
            data = str(current.get_data())
            value.append(data)
            current = current.get_next()

        return '[' + ', '.join(value) + ']'

In [19]:

mylist = UnorderedList()
mylist.add(31)
mylist.add(17)
mylist.add(91)
mylist.delete(17)
print(mylist.search(35))
mylist

Out[19]:

False

[91, 31]

Ordered (Linked)List

As the name suggests, compared to unordered linkedlist, the elements will always be ordered for a ordered linked list. The is_empty and size method are exactly the same as the unordered linkedlist as they don't have anything to do with the actual item in the list.

In [20]:

class OrderedList:
    def __init__(self):
        self.head = None
        
    def add(self, item):
        """
        takes data, initializes a new node with the given data
        and add it to the list while maintaining relative order
        """
        
        # stop when the current node is larger than the item
        stop = False
        previous = None
        current = self.head
        while current is not None and not stop:
            if current.get_data() > item:
                stop = True
            else:
                previous = current
                current = current.get_next()
        
        # check whether it will be added to the head
        # or somewhether in the middle and handle
        # both cases
        node = Node(item)
        if previous is None:
            node.set_next(self.head)
            self.head = node
        else:
            previous.set_next(node)
            node.set_next(current)

        return self
    
    def search(self, item):
        """
        goes through the entire list to check
        and see there's a matched value, but
        we can stop if the current node's value
        is greater than item since the list is sorted
        """
        stop = False
        found = False
        current = self.head
        while current is not None and not found and not stop:
            if current.get_data() == item:
                found = True
            else:
                if current.get_data() > item:
                    stop = True
                else:
                    current = current.get_next()

        return found
    
    def __repr__(self):
        value = []
        current = self.head
        while current is not None:
            data = str(current.get_data())
            value.append(data)
            current = current.get_next()

        return '[' + ', '.join(value) + ']'

In [21]:

mylist = OrderedList()
mylist.add(17)
mylist.add(77)
mylist.add(31)
print(mylist.search(31))
mylist

Out[21]:

True

[17, 31, 77]

Python's list is actually based on the notion of array list instead of the linked list in this section.

Table of Contents

Basic Data Structure

Stack

Queue

Unordered (Linked)List

Ordered (Linked)List

Product

Resources

Company