I watched this on Saturday night, didn't go to this lecture or the last 2 recitations live. Here's basically an incomplete copy of the slides' content:
 Hashing: lets us implement sets more efficiently and search in constant time
 Application of sorting: search for things more efficiently

Direct access array
 (Unsigned) integers to index into an array
 Implement a set whose possible elements are 0...N-1 with an array of size N
 Ideally size of our set should scale like number of elements in set, not size of universe the elements come from
 Hash function maps elements in universe U to numbers from 0...m-1
 Initialize: allocate an array T (hash table) of size m: O(m) time
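The direct-access idea can be sketched in a few lines of Python (illustrative names; assumes keys are unsigned integers below N):

```python
class DirectAccessSet:
    """Set over integer keys 0..N-1, backed by an array of size N."""
    def __init__(self, N):
        self.T = [False] * N   # O(N) space, even if the set stays small

    def insert(self, x):
        self.T[x] = True       # O(1): index directly by the key

    def find(self, x):
        return self.T[x]       # O(1)

    def delete(self, x):
        self.T[x] = False      # O(1)
```

The cost is space: the array scales with the size of the universe N, not with the number of stored elements, which is the problem hashing fixes.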

Assuming h(x) is computable in O(1) time
 Insert(x): put x in T[h(x)]: O(1) time
 Find(x): check if x is in T[h(x)]: O(1) time
 Delete(x): set T[h(x)] to None: O(1) time
 Problem: hash collision; what if two x's have the same h(x)?
 By pigeonhole, any hash function will have collisions
 Simple uniform hashing assumption (SUHA): Each key x maps to a uniformly random h(x) in 0...m-1, independent of any other keys

Load factor: $\alpha = n/m$, n = number of elements already in hash table, m = table size
 If $\alpha = 1$, then every cell is full and if we try to add something, we'll have a collision

Even if $\alpha$ is small and SUHA holds, we should expect collisions
 Birthday paradox: $m=365, n=23$, 50% chance of collision
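The birthday-paradox number can be checked directly by multiplying out the probability that all $n$ keys land in distinct slots (a quick sketch, not from the lecture):

```python
def collision_probability(n, m):
    """P(at least one collision) when n keys hash uniformly into m slots."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (m - i) / m   # i-th key must avoid i occupied slots
    return 1 - p_all_distinct

print(collision_probability(23, 365))  # ≈ 0.507
```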

Dealing with collisions: chaining
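A minimal chaining sketch in Python (illustrative; reuses Python's built-in hash): each of the m slots holds a list, and colliding keys just extend the chain at their slot.

```python
class ChainedHashSet:
    def __init__(self, m=8):
        self.m = m
        self.table = [[] for _ in range(m)]  # one chain (list) per slot

    def _h(self, x):
        return hash(x) % self.m

    def insert(self, x):
        chain = self.table[self._h(x)]
        if x not in chain:                   # avoid duplicates
            chain.append(x)

    def find(self, x):
        return x in self.table[self._h(x)]

    def delete(self, x):
        chain = self.table[self._h(x)]
        if x in chain:
            chain.remove(x)
```

Under SUHA the expected chain length is $\alpha = n/m$, which is where the $O(1 + \alpha)$ average-case bound below comes from.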

Choosing a hash function
 Very simple division hash (don't do this!): $h(x) = x \mod 2^k$
 Returns last $k$ bits in binary representation
 All $x$'s with the same last $k$ bits will collide
 Prime division hash (often good enough in practice): Pick prime $p$ larger than table size: $h(x) = (x \mod p) \mod m$ ($\mod m$ so we always get something between $0$ and $m-1$)
 Heuristic: In real data, patterns are unlikely to be correlated with $p$
 Any fixed simple hash function can be attacked
 6.046 for provably good hash function using randomness
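A sketch of the prime division hash ($p = 2^{31} - 1$ is one convenient Mersenne prime used here for illustration; any prime larger than the table size works):

```python
def prime_division_hash(x, m, p=2**31 - 1):
    """h(x) = (x mod p) mod m, with p prime and larger than m."""
    return (x % p) % m

# Unlike h(x) = x mod 2^k, keys that share their low-order bits
# don't all collide, because p shares no factors with powers of 2.
```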

Dealing with collisions: open addressing
 If T[h(x)] is full, probe again at a different location until you find an empty spot
 Probe sequence: $h(x,i)$ gives the $i$-th slot to try for key $x$, with $h(x,0) = h(x)$
 Deletions
 Can't just make the cell empty, will mess up the probe sequence for values further along in probe sequence
 Put a "deleted" flag
 Can reuse a deleted cell to insert new item
 Linear probing
 Probing sequence: $h(x,i) = (h(x) + i) \mod m$
 Keep hopping by 1, loop around because of $\mod m$
 Pros: simple and efficient (looking at nearby elements is faster than far)
 Cons: very nonrandom! occupied slots build up into clusters, which grow as the load factor increases and slow the table down
 Double hashing
 $h(x,i) = (h_1(x) + i \cdot h_2(x)) \mod m$
 Hop by multiples of $h_2(x)$ (which must never return 0, or $x$ would never move)
 Pros: more random than linear probing
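The open-addressing ideas above (linear probing plus "deleted" flags, often called tombstones) can be sketched as follows; names and the fixed table size are illustrative:

```python
EMPTY, DELETED = object(), object()   # sentinel markers

class LinearProbingSet:
    def __init__(self, m=16):
        self.m = m
        self.T = [EMPTY] * m

    def _probe(self, x):
        h = hash(x) % self.m
        for i in range(self.m):
            yield (h + i) % self.m    # h(x, i) = (h(x) + i) mod m

    def insert(self, x):
        tombstone = None
        for j in self._probe(x):
            slot = self.T[j]
            if slot is EMPTY:
                # End of x's probe sequence; reuse an earlier tombstone if any
                self.T[tombstone if tombstone is not None else j] = x
                return
            if slot is DELETED:
                if tombstone is None:
                    tombstone = j     # remember first reusable slot
            elif slot == x:
                return                # already present
        if tombstone is not None:
            self.T[tombstone] = x
        else:
            raise RuntimeError("table full")

    def find(self, x):
        for j in self._probe(x):
            slot = self.T[j]
            if slot is EMPTY:
                return False          # a truly empty slot ends the search
            if slot is not DELETED and slot == x:
                return True
        return False

    def delete(self, x):
        for j in self._probe(x):
            slot = self.T[j]
            if slot is EMPTY:
                return
            if slot is not DELETED and slot == x:
                self.T[j] = DELETED   # tombstone keeps probe sequences intact
                return
```

Note that find stops at the first truly empty slot but walks past tombstones, which is exactly why delete cannot simply clear a cell.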

Table doubling and amortized complexity
 What do you do when table fills up?
 Chaining: chains get longer, but table still works
 Open addressing: once you fill up, you're stuck
 Solution: when $\alpha$ too big, double the table
 Just like dynamic arrays (R03)
 Allocate new table of size $2m$, reinsert all elements
 In the long run, operations take constant time on average (amortized), but individual operations might be slow
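Table doubling on a chaining table, as a sketch (the resize threshold $\alpha > 1$ is illustrative; any constant threshold gives the same amortized bound):

```python
class ResizingChainedSet:
    """Chaining hash set that doubles its table when the load factor exceeds 1."""
    def __init__(self):
        self.m = 8
        self.n = 0
        self.table = [[] for _ in range(self.m)]

    def insert(self, x):
        if x in self.table[hash(x) % self.m]:
            return
        if self.n + 1 > self.m:          # load factor alpha would exceed 1
            self._resize(2 * self.m)     # O(n) rebuild, but amortized O(1)
        self.table[hash(x) % self.m].append(x)
        self.n += 1

    def find(self, x):
        return x in self.table[hash(x) % self.m]

    def _resize(self, new_m):
        old_table = self.table
        self.m = new_m
        self.table = [[] for _ in range(new_m)]
        for chain in old_table:          # reinsert every element under new m
            for x in chain:
                self.table[hash(x) % new_m].append(x)
```

Each element is reinserted at most once per doubling, and the doublings sum to O(n) total work over n inserts, which is the amortized argument below.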

Worst case vs average case vs amortized
 Worst: for a given hash function and input data, an insert/find/delete can take $O(n)$ time in the worst case
 Average: for random data, an insert/find/delete takes $O(1 + \alpha)$ time on average
 Amortized: for a sequence of many hash table operations, the average time per operation is $O(1)$, but individual operations may take $O(n)$ time

Applications of hashing
 Efficient data structures (R04)
 Cryptography: message authentication, checksums
 Algorithms for searching (pset 2)