Search Trees.

1. Trees are a useful structure for hierarchical information, searching, etc.
   With large data, repeated linear search in a linked list is too slow. With
   appropriate binary search trees, we can do search, insert, delete,
   find-successor, find-predecessor, etc. in O(log n) time. Hashing is
   convenient if you just need inserts and lookups, but suppose you wanted to
   do:

   Lookup-by-prefix: type in the first few letters of someone's name and have
   it output all names in the database that begin with those letters.

   Find-in-range[key1, key2]: output all keys between key1 and key2.

   A version of lookup(x) that returns the closest key to x when x is not in
   the database.

   Also consider operations like:

   Output the kth smallest element; assume all keys are distinct.

   A version of lookup(x) that tells us the rank of x (how many keys are
   smaller than x).

2. A recursive definition of a tree: a tree is either empty, or it has a root
   node r with zero or more non-empty subtrees Subtree1, ..., Subtreel, whose
   roots k1, ..., kl are the children of r. Drawn with r at the root, and
   edges from r to the roots of the subtrees:

              root
            /   |   \
          k1    k2 ... kl
          |     |       |
    Subtree1 Subtree2  Subtreel

3. No other structure is imposed: the distribution of depths, keys, etc. is
   arbitrary.

4. Example. (figure: a sample tree on the nodes A, B, M, L, T, C, F, X, Y, G.)

5. A path in a tree is a sequence of nodes, or edges; the path length is the
   number of edges. Paths are unique in a tree. The depth of a node is
   measured as the path length from the root. Parent-child relationship.
   Ancestor-descendant relationship.

6. The directory-structure example: the tree organizes the directory
   structure, making it intuitive to traverse. (figure: /usr with
   subdirectories alex, bill, charles; below them research, teaching, junk,
   paper, books, c1, c2, c2.)

7. Tree Traversals.
   1. Listing of subdirectories: pre-order traversal. Print the node, then
      recursively traverse each subtree.
   2. Computing directory sizes: post-order traversal. Compute the sizes of
      all subtrees, then add them to get the size of the directory.

8. Binary Trees. Each node has 0, 1, or 2 children.
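The two traversals in point 7 can be sketched in C. The directory node type, the field names, and the sample sizes below are our own assumptions, not the notes':

```c
#include <stdio.h>

#define MAX_CHILDREN 4

struct Dir {
    const char *name;
    long size;                        /* bytes stored directly in this dir */
    struct Dir *child[MAX_CHILDREN];  /* NULL-terminated list of subdirs */
};

/* Pre-order: print the node, then recursively traverse each subtree. */
void list_dirs(const struct Dir *d, int depth) {
    if (d == NULL) return;
    printf("%*s%s\n", 2 * depth, "", d->name);
    for (int i = 0; i < MAX_CHILDREN && d->child[i] != NULL; i++)
        list_dirs(d->child[i], depth + 1);
}

/* Post-order: compute the sizes of all subtrees, then add them. */
long total_size(const struct Dir *d) {
    if (d == NULL) return 0;
    long s = d->size;
    for (int i = 0; i < MAX_CHILDREN && d->child[i] != NULL; i++)
        s += total_size(d->child[i]);
    return s;
}

/* A tiny sample hierarchy, loosely modeled on the /usr example. */
struct Dir paper    = {"paper",    10, {0}};
struct Dir teaching = {"teaching",  5, {0}};
struct Dir alex     = {"alex",      1, {&paper, &teaching}};
struct Dir usr      = {"/usr",      0, {&alex}};
```

For the sample hierarchy, total_size(&usr) adds 0 + 1 + 10 + 5 = 16.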
   The most common form of tree: heaps, expression trees, branch-and-bound
   algorithms.

9. Binary SEARCH Trees. An important application of binary trees is searching
   and implementing a search ADT. For simplicity, we assume that objects are
   identified by numeric keys (any totally ordered universe suffices). Assume
   also that all keys are distinct. What makes a binary tree a SEARCH tree is
   the "ordering" property: for every node X,

      the keys in the left subtree  < key(X)
      the keys in the right subtree > key(X).

   Example. (figure: two trees, each rooted at 10; the first is a search
   tree, the second is not.)

10. Find operation.

      if (t == null)       return null;
      else if (x < t->key) return find(x, t->left);
      else if (x > t->key) return find(x, t->right);
      else                 return t;   // match

11. FindMax (FindMin is symmetric: follow left links).

      if (t != null)
          while (t->right != null)
              t = t->right;
      return t;

12. Insert. To insert X, mimic find(X). If X is found, there is nothing to do
    (maybe update some record). Otherwise, insert X at the last spot on the
    path:

      if (t == null)       t = new node(x);   // insert x here
      else if (x < t->key) insert(x, t->left);
      else if (x > t->key) insert(x, t->right);
      else ;                                  // already exists; do nothing

13. Delete/Remove. As always, this is the most complex operation.
    a. If node X is a leaf, it is easily removed.
    b. If node X has only one child, just bypass it: link the child directly
       to X's parent.
    c. If node X has two children, the general strategy is to replace X with
       the smallest node in the right subtree of X, and recursively delete
       that node. The second deletion turns out to be easy, because the
       second node cannot have two children (it has no left child).

14. Examples.

    I. Delete 5 (a leaf):

           6                6
          / \              / \
         2   8     ->     2   8
        / \              / \
       1   4            1   4
          / \              /
         3   5            3

    II. Delete 4 (one child):

           6                6
          / \              / \
         2   8     ->     2   8
        / \              / \
       1   4            1   3
          /
         3

    III. Delete 2 (two children); swap it with the min of its right
         subtree (3), then delete 3:

           6                6
          / \              / \
         2   8     ->     3   8
        / \              / \
       1   5            1   5
          /                /
         3                4
          \
           4

15. Average-case analysis. In the worst case, a binary search tree on n nodes
    can have height n-1. Thus every insert, delete, and find can cost O(n)
    time, the same as a linked list.
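The pseudocode in points 10-13 can be turned into a self-contained C sketch (the names `bst_delete`, `find_min`, etc. are ours). Here insert and delete return the possibly new subtree root, so the parent's pointer can be reassigned:

```c
#include <stdlib.h>

struct Node { int key; struct Node *left, *right; };

/* Point 10: find. */
struct Node *find(int x, struct Node *t) {
    if (t == NULL) return NULL;
    if (x < t->key) return find(x, t->left);
    if (x > t->key) return find(x, t->right);
    return t;                             /* match */
}

/* Point 11, mirrored: FindMin follows left links. */
struct Node *find_min(struct Node *t) {
    if (t != NULL)
        while (t->left != NULL) t = t->left;
    return t;
}

/* Point 12: insert x at the last spot on the search path. */
struct Node *insert(int x, struct Node *t) {
    if (t == NULL) {
        t = malloc(sizeof *t);
        t->key = x; t->left = t->right = NULL;
    } else if (x < t->key) t->left = insert(x, t->left);
    else if (x > t->key)   t->right = insert(x, t->right);
    /* else: already exists; do nothing */
    return t;
}

/* Point 13: delete. Two children: copy the min of the right subtree
   into this node, then delete that min (it has no left child). */
struct Node *bst_delete(int x, struct Node *t) {
    if (t == NULL) return NULL;
    if (x < t->key)      t->left = bst_delete(x, t->left);
    else if (x > t->key) t->right = bst_delete(x, t->right);
    else if (t->left != NULL && t->right != NULL) {
        t->key = find_min(t->right)->key;
        t->right = bst_delete(t->key, t->right);
    } else {                              /* leaf or one child: bypass */
        struct Node *child = (t->left != NULL) ? t->left : t->right;
        free(t);
        t = child;
    }
    return t;
}
```

Example III above corresponds to inserting 6, 2, 8, 1, 5, 3, 4 into an empty tree and calling bst_delete(2, root).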
    Fortunately, there are ways to keep the tree roughly balanced so that the
    height remains O(log n). Given a fixed set of keys, it is easy to build a
    perfectly balanced search tree: use the median division rule (put the
    median at the root and recurse on the two halves). Later we will see how
    to achieve this dynamically.

    If we start from an empty tree and do a series of insertions, what does
    the tree look like? Of course, the worst case can be a single path:
    linear depth. A simple analysis shows that if insertions are done in
    random order (assume keys drawn from [1, n]), then the average depth is
    O(log n). More precisely, the expected sum of the depths of all the
    nodes is O(n log n).

    Let D(n) be the internal path length. If i nodes fall in the left subtree
    and (n-i-1) in the right, then

       D(n) = D(i) + D(n-i-1) + (n-1).

    Under random insertions, i is uniform over 0, 1, ..., n-1, and so

       D(n) = (2/n) \sum_{j=0}^{n-1} D(j) + (n-1).

    This is a famous recurrence (the same one that arises in quicksort),
    which solves to D(n) = O(n log n).

16. Picture of a random tree.

========================================================================

AVL Trees.

1. AVL trees are named after their inventors, Adelson-Velskii and Landis. An
   AVL tree is a BST with a balance condition, which ensures that over every
   sequence of inserts and deletes, an AVL tree on n nodes has height
   O(log n).

2. Simply requiring that the left and right children of the root have the
   same height does not work: each side can be a singly branching path of
   height n/2.

            O
           / \
          O   O
         /     \
        O       O
       /         \
      O           O

   Requiring that every node's children have exactly the same height is too
   restrictive; it is achieved only if the tree is perfectly balanced, with
   2^k - 1 nodes. The AVL tree tries the next best thing: for any node, the
   heights of the left and right children can differ by at most 1.
   [[ That is, at each node ensure that the max path lengths along the left
   and right branches differ by at most 1. If you want to use "subtree
   heights", it helps to define the height of the empty tree as -1. ]]

   Examples.

      an AVL tree        not an AVL tree

           5                   7
          / \                 / \
         2   8               2   8
        / \  /              / \
       1  4 7              1   4
         /                    / \
        3                    3   5

3. What can be the most lop-sided AVL tree?
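The median division rule above can be sketched as a small C routine (names ours) that turns a sorted key array into a perfectly balanced BST; with the height of the empty tree defined as -1, the resulting height is floor(log2 n):

```c
#include <stdlib.h>

struct Node { int key; struct Node *left, *right; };

/* Median division rule: the median becomes the root, and we recurse on
   the two halves of the sorted key array sorted[lo..hi]. */
struct Node *build_balanced(const int *sorted, int lo, int hi) {
    if (lo > hi) return NULL;
    int mid = lo + (hi - lo) / 2;
    struct Node *t = malloc(sizeof *t);
    t->key = sorted[mid];
    t->left  = build_balanced(sorted, lo, mid - 1);
    t->right = build_balanced(sorted, mid + 1, hi);
    return t;
}

/* Height, with the height of the empty tree defined as -1. */
int height(const struct Node *t) {
    if (t == NULL) return -1;
    int hl = height(t->left), hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}
```

For the keys 1..15 this produces the perfect tree on 2^4 - 1 nodes: root 8, height 3.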
   Let S(h) be the minimum number of nodes in an AVL tree of height h. Then

      S(h) = S(h-1) + S(h-2) + 1,

   like the Fibonacci numbers. Solving it shows h <= 1.44 log (n+2), roughly.
   Example of such a tree.

4. Of course, given a fixed set of keys, one can always build a balanced tree
   of O(log n) height. The problem is that insertions and deletions get the
   tree out of balance. AVL trees do simple operations on the tree to regain
   balance. These operations, called ROTATIONS, involve changing only a
   couple of pointers.

5. We will only discuss insertions; deletions are similar, though a bit
   messier. How can an insertion violate the AVL condition? At some node x,
   the height difference between the children was 1 before, but the insertion
   may make that difference 2. We will fix this violation from the bottom
   (leaf) up.

6. Suppose A is the first node from the bottom that needs rebalancing. Then A
   must have two subtrees whose heights differ by 2. This imbalance must have
   been caused by one of two cases:

   (1) the new insertion occurred in the left subtree of the left child of A,
       or in the right subtree of the right child of A;
   (2) the new insertion occurred in the right subtree of the left child of
       A, or in the left subtree of the right child of A.

   Cases (1) and (2) differ in that the former is an outside (left-left or
   right-right) violation, while the latter is an inside (left-right or
   right-left) violation. We fix (1) with a SINGLE ROTATION, and (2) with a
   DOUBLE ROTATION.

7. SINGLE ROTATION. Consider the before and after pictures. Node k2 is in
   violation because its left child is 2 levels deeper than its right child;
   node k1 is *okay*. This is the only possible type (1) situation that can
   cause k2 to be in violation *without* k1 being in violation.

      before:                      after:

             k2                        k1
            /  \                      /  \
          k1    TreeZ            TreeX    k2
         /  \                            /  \
     TreeX   TreeY                  TreeY    TreeZ

   (TreeX is one level deeper than TreeY and TreeZ.)

   Rotation at k1 = 7, k2 = 8 (after inserting 6):

           5                      5
          / \                    / \
         2   8        ->        2   7
        / \  /                 / \  / \
       1  4 7                 1  4 6   8
         / /                    /
        3 6                    3

   The new TreeX must be at the same level as TreeY was *before*, and the new
   insertion must increase its level by 1. The ``single rotation'' simply
   tilts the balance to the right, making k1 the new root.
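The single rotation of point 7 changes just two pointers. A sketch, with struct and function names of our own choosing:

```c
#include <stddef.h>

struct Node { int key; struct Node *left, *right; };

/* Single rotation for a left-left (outside) violation at k2: k1 becomes
   the new subtree root. Key order TreeX < k1 < TreeY < k2 < TreeZ is
   preserved: TreeY, formerly k1's right subtree, moves under k2. */
struct Node *rotate_right(struct Node *k2) {
    struct Node *k1 = k2->left;
    k2->left  = k1->right;   /* TreeY moves under k2 */
    k1->right = k2;          /* k2 becomes the right child of k1 */
    return k1;               /* caller must store the new subtree root */
}
```

In the example above, rotate_right applied at node 8 returns node 7 with children 6 and 8.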
   Note that key order is preserved. Now X, Y, Z are all at the same level,
   so both k1 and k2 are balanced. Since the new height of this subtree is
   exactly the same as the old height, we are finished: *no further rotations
   up the tree are needed*.

8. Example. Insert 3, 2, 1, 4, 5, 6, 7.

   insert 3, 2:   insert 1, fix:   insert 4:    insert 5, fix at 3:

       3               2              2                2
      /               / \            / \              / \
     2               1   3          1   3            1   4
                                         \              / \
                                          4            3   5

   insert 6, fix at 2:      insert 7, fix at 5:

         4                        4
        / \                      / \
       2   5                    2   6
      / \   \                  / \  / \
     1   3   6                1  3 5   7

9. DOUBLE ROTATION. If, however, it is the Y tree that is too deep, a single
   rotation does not fix it:

      before:                      after:

             k2                        k1
            /  \                      /  \
          k1    TreeZ            TreeX    k2
         /  \                            /  \
     TreeX   TreeY                  TreeY    TreeZ

   (TreeY is the deep one, and it is just as deep after the rotation.)

10. We need to look one level deeper inside Y; say the root of this subtree
    is k2, and its two subtrees are B and C. We perform a double rotation.

       before:                     after:

             k3                         k2
            /  \                      /    \
          k1    D                   k1      k3
         /  \                      /  \    /  \
        A    k2                   A    B  C    D
            /  \
           B    C

    [[ One of B or C is one level deeper than A, but we don't know which
    one. To make sure the shallower one doesn't cause problems, the rotation
    places both B and C at the same level as A. ]]

11. Neither k1 nor k3 can be the new root: k1 as root is what a single
    rotation gives, which doesn't work; and k3 is already the root, and we
    have an imbalance. So the solution is to make k2 the root and
    redistribute the subtrees, preserving key order.

    Example. Continuing the old example, insert 16, 15, ..., 10, 8, 9.

    Insert 16, then 15; the violation at 7 is fixed by a double rotation:

         4                          4
        / \                        / \
       2   6                      2   6
      / \  / \          ->       / \  / \
     1  3 5   7                 1  3 5   15
               \                         / \
                16                      7   16
               /
              15

    Next insert 14, and look at the part of the tree rooted at 6, which has
    an AVL violation (k1 = 6, k3 = 15, k2 = 7); a double rotation makes 7
    the root of this subtree:

         6                        7
        / \                      / \
       5   15          ->       6   15
           / \                 /    / \
          7   16              5   14   16
           \
            14

    Insert 13; fix by a single rotation at 7:

             7
           /   \
          4     15
         / \    / \
        2   6  14  16
       / \ /   /
      1  3 5  13

    And so on...
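The left-right double rotation of point 10 can be written directly as one pointer shuffle (names ours); the mirror-image right-left case, which the insert-14 example above uses, is symmetric:

```c
#include <stddef.h>

struct Node { int key; struct Node *left, *right; };

/* Double rotation for a left-right (inside) violation at k3: k2, the
   right child of k3's left child k1, becomes the new subtree root, and
   its subtrees B and C are redistributed to k1 and k3. */
struct Node *double_rotate_left_right(struct Node *k3) {
    struct Node *k1 = k3->left;
    struct Node *k2 = k1->right;
    k1->right = k2->left;    /* TreeB moves under k1 */
    k3->left  = k2->right;   /* TreeC moves under k3 */
    k2->left  = k1;
    k2->right = k3;
    return k2;               /* caller must store the new subtree root */
}
```

For instance, on the subtree 10 with left child 4 and 4's right child 7, the rotation returns 7 with children 4 and 10; key order is preserved.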