B-Trees. [Cormen-Leiserson-Rivest] 1. Search trees designed to minimize IO operations to secondary memory. When database is too large to fit in main memory, some parts will be stored in disk. A single access to disk can be 10^3 to 10^5 times slower than access to memory. (Disks rotate at about 7200 RPM; typical range 5K-15K RPM. One rotation takes 8.33 ms, which is about 5 orders slower than a 100 nano sec access for current silicon memory.) 2. In order to amortize the disk access cost, store and fetch in large chunk, instead of single items. Information is divided into large, equal-sized "pages" that are laid out consecutively within each cylinder. Typical page size: 2^11 to 2^14 bytes (2K-16K). Often, it takes longer to read one page of information than to examine it (compute). Thus, when dealing with disk-bound data structures, we look at two factors separately: a. number of disk accesses, b. the CPU time. 3. B-Tree algorithms operate at the granularity of pages. I.e., the unit operations are to READ or WRITE a page. The main memory can only accomodate only so many pages, so older pages will be flushed out as new ones are fetched. 4. Since we want to optimize the number of page accesses, we will choose the size of the B-Tree node to match the page size. That is, each node will store keys for about 50-2000 items, and will have the similar branching factor. As an example, with a branching factor of 1001 (each node with 1000 keys), 1 billion keys can be accessed by a tree of height 2. Just 2 disk accesses! Figure: 5. Definition of a B-Tree. A B-Tree is a rooted tree with the following properties: 1. Every node x has: a. n(x): the number of keys stored at x b. the n(x) keys themselves sorted, key_1[x] <= key_2[x] <=... c. leaf(x) boolean, which is true if x is a leaf. 2. Each non-leaf node contains n(x)+1 pointers, c_1(x), c_2(x), ... to the children of x. 3. The keys k_i(x) separate the ranges of keys stored in each subtree; suppose k_i is any key stored in the subtree rooted at c_i, then k_1 <= key_1(x) <= k_2 <= key_2(x) <= k_3 .... <= k_n(x)+1. 4. All leaves have the same depth, equal to B-tree's height. 5. There are lower and upper bounds on how many keys a node contains. These bounds are expressed by a parameter t >= 2. a. Every node except root must have at least t-1 keys. Every internal node, therefore, has at least t children; the root, if non-empty, has at least one key. b. Every node has at most 2t-1 keys. Thus, an internal node has at most 2t children. A node is called FULL if it has 2t-1 keys. * The simplest form of B-Tree has t=2, which is a 2-3-4 tree. EXAMPLE: 6. Thm. A B-Tree with n-keys and min deg t >=2 has height H <= log_t (n+1)/2. depth number of nodes 0 1 1 2 2 2t 3 2t^2.... (Number of nodes times keys per node). We get n >= 1 + (t-1) \sum_{i=1}^h (2 t^{i-1}) 7. Searching a B-Tree. The root of B-Tree always in main memory, so no disk-access required there. But if root node is changed, then disk-write must be done. B-TREE-SEARCH (x, k) //search for key k i <- 1 while i <= n(x) and k > key_i(x) do i++ if i <= n(x) and k = key_i(x) then return (x, i) if leaf(x) then return null else DISK-READ (c_i(x)) B-TREE-SEARCH (c_i(x), k) FIG. 1. search for R. 8. INSERTING into a B-Tree. Inserting into a B-Tree is more complex. As in binary search trees, we search for the leaf position where to insert, but we simply can't just add a new node---B-Tree requires that all leaves be at the same level, and each node have between t-1 and 2t-1 keys. We insert k into an existing node. The only problem arises when that node is already full. In this case, we SPLIT the node (with 2t-1 keys) around its t-th key (the median); the median key moves up to parent, and two new children, each with t-1 keys are formed. If the parent was also full, then this splitting step recursively continues up the tree. In order to perform insert in a single pass, on our way down the search path, we split each full node in preparation for the insert. B-Tree-Split-Child (x, i, y) z <-= allocateNode(); leaf(z) <- leaf(y); n(z) <- t-1 for j = 1 to t-1 key_j(z) <- key_j+t(y) if not leaf(y), then for j = 1 to t-1 c_j(z) <- c_j+t(y) n(y) <- t-1; for j = n(x)+1 down to i+1 c_j+1(x) <- c_j(x) c_i+1(x) <- z; for j <- n(x) downto i key_j+1(x) <- key_j (x) key_i(x) <- key_t (y) n(x) <- n(x) + 1 Disk-Write(y); Disk-Write(z); Disk-Write(x); Example. Figure 18.5 9. Description: y is the ith child of x, and is the node being split. Node y originally has 2t children (and 2t-1 keys), but is reduced to t children (and t-1 keys) by this operation. Node z "adopts" the t largest children of y, and z becomes a new child of x, positioned just after y in x's table. The median key of y moves up to become the key in x that separates y and z. 10. B-TREE-INSERT (T, k) r <- root(T) if n(r) = 2t-1 then s <- allocateNode(); root(T) <- s; leaf(s) <- false; n(s) <- 0; c_1(s) <- r; B-TREE-SPLIT-CHILD (s, 1, r) B-TREE-INSERT-NonFull (s, k) else B-TREE-INSERT-NonFull (r, k) Fig. 18.6 11. B-TREE-INSERT-NONFULL (x, k) i <- n(x); if leaf(x) then while i >= 1 and k < key_i (x) do key_i+1 (x) <= key_i (x) i-- key_i+1 (x) <- k n(x) <- n(x) + 1 diskWrite(x) else while i >= 1 and k < key_i (x) do i-- i++; diskRead(c_i (x)) if n(c_i(x)) = 2t-1 then B-TREE-SPLIT-CHILD (x, i, c_i (x)) if k > key_i (x) then i++ B-TREE-INSERT-NONFULL (c_i(x), k) end. 12. Description. First while loop handles the case when x is a leaf. When x is not a leaf, then insert k into appropriate leaf node in the subtree rooted at x. Second while loop determines the child of x to which recursion descends. The if condition checks if that child is a fullNode or not. If full, the B_TREE_SPLIT splits that into two non-full nodes, and the next if determines which of the children to descend to. Example: Figure 18.7 Analysis. The number of disk accesses by B-TREE-INSERT is O(h), since only O(1) disk read or writes between calls to B-TREE-NONFULL. The total CPU time is O(th) = O(t log_t n). 13. DELETING a key from B-Tree. The key may be deleted from any node, not just a leaf. We need to make sure a node doesn't get too small after a deletion. So, if a node has t-1 keys and one of them is deleted, we need to fix it. Suppose we need to delete k from subtree rooted at x. The proc is structured to ensure that when B-TREE-DELETE is called on a node x, the number of keys in x is at least t. This allows us to perform deletion in one pass, without backing up. 1. If k is in leaf-node x, delete k from x. 2. If k is in non-leaf node x, do: a. if the child y that precedes k in node x has t or more keys, then find predecessor k' of k in subtree rooted at y. Recursively delete k', and replace k with k' in x. b. Symmetrically, if child z that follows k in node x has >= t keys, find the successor k' of k in subtree rooted at z. Recursively delete k', and replace k with k' in x. c. Otherwise, if both y and z have only t-1 keys, merge k and all of z into y so that x loses both k and the pointer to z, and y now has 2t-1 keys. Free z and recursively delete k from y. 3. If the key k is not present in node x, determine the root c_i(x) of the appropriate subtree tht contains k. If c_i(x) has only t-1 keys, execute steps 3a or 3b to guarantee we descend to a node with >= t keys. Then finish by recursing on appropriate child of x. a. If c_i(x) has only t-1 keys but has an immediate sibling with t or more keys, give c_i(x) an extra key by moving a key from x down to c_i(x), moving a key from c_i(x)'s immediate left or right sibling up into x, and moving the appropriate child pointer from the sibling into c_i(x). b. If c_i(x) and both of c_i(x)'s immediate siblings have t-1 keys, merge c_i(x) with one sibling, which involves moving a key from x down into the new merged node to become the median key for that node. Figure: 18.8 Since more of the keys in Btree are in the leaves, we expect most delete operations occur at leaves. Like insertion, deletion also has cost O(t log_t n).