Binary Search Trees

· Read 03/05/2021

Binary Trees

A binary tree is a tree, which is a connected graph with no cycles, of binary nodes, which are linked node containers with these fields:

Pointer to item stored at node
Pointer to parent node (possibly None)
Pointer to left child node (possibly None)
Pointer to right child node (possibly None)

Technically, a binary node can be connected to three other nodes (parent, left child, right child), but we call it "binary" because it can have two children.

Why binary trees?

Linked-list nodes can be $O(n)$ pointer hops away from the list head, taking $O(n)$ time to reach them. It's possible to construct binary trees on $n$ nodes where no node is more than $O(\log n)$ hops from the root, i.e. there exist binary trees with logarithmic height.

We leverage the power of a binary tree structure when we keep the height $h$ of the tree low, i.e. $O(\log n)$ , and only perform operations run in time on the order of the height of the tree, i.e. $O(h) = O(\log n) < O(n)$ .

Terminology

Root: Only node lacking a parent; all other nodes can reach root by traversing parent pointers

Ancestors for node <X>: Set of nodes passed when traversing parent pointers from <X> back to the root

Leaf: Node with no children

Depth of a node <X>: Minimum number of edges from <X> to the root. Depth of root is 0

Height of a node <X>: Number of edges on the longest path from <X> to a leaf in <X>'s subtrees. Like depth but with respect to a leaf instead of root. Height of a leaf is 0

Height of a tree <X>: Height of the tree's root node

Traversal Order

Implicit characterization: every node in the left subtree of <X> comes before <X> in the traversal order. Right, after.

# O(n)
def subtree_iter(A): # given a binary node <A>
  # recursively list nodes in <A>'s left subtree
  if A.left: yield from A.left.subtree_iter()
  # then <A> itself
  yield A
  # then recursively list nodes in <A>'s right subtree
  if A.right: yield from A.right.subtree_iter()

Currently, we have no semantic connection between items stored in the tree and the tree's traversal order. Next time, we'll provide two different semantic meanings, one for an efficient implementation of the Sequence interface, the other for the Set interface. For now, we just care about preserving the traversal order as we manipulate the tree.

Tree Navigation

Find first/last node: Walk left if left child exists, $O(h)$ because each step of recursion moves down the tree.

Find successor/predecessor: Successor (next in traversal order) of <A>: if <A> has right child, then successor is smallest node in right child's subtree. Otherwise, successor can't exist in <A>'s subtree, so find lowest ancestor of <A> such that <A> is in the ancestor's left subtree. Algorithm only walks up or down tree, so $O(h)$ .

Dynamic Operations

Insert: To insert <B> before <A>: if <A> doesn't have left child, add <B> as the left child of <A>; otherwise, add <B> as right child of largest node in <A>'s left subtree (which cannot have a right child, because largest). Algorithm walks down tree at each step, so $O(h)$ .

Delete: If node is leaf, clear the child pointer from the node's parent and return the node. Else, swap the node's item with the item in the node's successor/predecessor down the tree until the item is in a leaf, which can be removed.

def subtree_delete(A):  # O(h)
  if A.left or A.right:
    if A.left: B = A.predecessor()
    else: B = A.successor()
    A.item, B.item = B.item, A.item
    return B.subtree_delete()
  if A.parent: # A is leaf
    if A.parent.left is A: A.parent.left = None
    else: A.parent.right = None
    # A.parent.maintain() # R07
  return A

Binary Node Implementation

Top-Level Data Structure

SUCH a concise answer here