A 2-3 Tree provides the near logarithmic search efficiency of a complete Binary Search Tree (BST) by guaranteeing that the tree will always remain balanced. Unlike a BST, a 2-3 Tree remains balanced even if clients insert data in pre-sorted order.

Before reading the code for the 2-3 Insertion Algorithm, review some key concepts about 2-3 Trees to better understand the algorithm:

- Data is always inserted at a leaf (node with no children)
- Every internal node (a node with children) must be either a ‘2-node’ or a ‘3-node’ by design. This means an internal node can have either 1 data item and 2 children, or 2 data items and 3 children
- Inserting into a full node (node with 2 data items) splits the node and pushes the middle data item up, splitting and pushing recursively until a ‘2-node’ is encountered
- The tree grows in height when a recursive call splits the root

Each 2-3 Tree node contains two Data object references and three child references. A 2-node uses only the first Data object reference and the left and right child references, with the second Data reference and middle child reference set to null. Also, every node has a parent node reference that must be updated after a split.

This algorithm is written for a generic “Data” class with a string ID field and assumes a compare function exists that returns the difference between two string values using Java’s compareTo() function. If the two Data objects match, the compare function returns a value of ‘zero’. This algorithm passes data items of equal value to the right. Since the compare function subtracts the value of the second (argument object) from the first (this object), a negative return value indicates that the first value is smaller than the second.

To insert into a 2-3 Tree, a recursive function must first locate the correct position at a leaf to insert the new item. Don’t forget to use a wrapper function to check if root is null!

/**Recursive function to insert a new Data item at the correct leaf*/ protected void insert(Data item) { int diff = dataOne.compare(item); //Node is a leaf if ((left == null) && (mid == null) && (right == null)) { if (this.dataTwo == null) { //Leaf has only one data item if (diff <= 0) { //New item is larger than dataOne this.dataTwo = item; } else { //New item is smaller than dataOne this.dataTwo = dataOne; this.dataOne = item; } } else { //Leaf has two items and must be split splitLeaf(item); if (parent != null) { this.parent.pushUp(this); } } return; } //Not a leaf, continue traversal if (diff > 0) { //New item is smaller than dataOne and the smallest left.insert(item); } //New item is larger or equal to dataOne and this node else if (dataTwo == null) { //is a 2-node right.insert(item); } else { //This node is a 3-node diff = dataTwo.compare(item); if (diff > 0) { //New item is smaller than dataTwo mid.insert(item); } else { //New item is larger than dataTwo and the largest right.insert(item); } } }

This insert function compares the new data item (as an argument) and recursively traverses the tree until a leaf is encountered. If the leaf has one item, the new item is added in the appropriate position, with the smallest data item always in dataOne.

The real action happens when the function attempts to insert at a leaf containing two data items already. The algorithm must split the node by passing the new data item as an argument to the splitLeaf() function.

/**Function splits a full leaf creating a 2-node */ private void splitLeaf(Data item) { int diff = dataOne.compare(item); if (diff > 0) { //New item is smaller than dataOne and smallest left = new Node23(item, this); right = new Node23(dataTwo, this); dataTwo = null; } else { //New item is larger than dataOne diff = dataTwo.compare(item); if (diff > 0) { //New item is smaller than dataTwo left = new Node23(dataOne, this); right = new Node23(dataTwo, this); dataTwo = null; dataOne = item; } else { //New item is larger than dataTwo and largest left = new Node23(dataOne, this); right = new Node23(item, this); dataOne = dataTwo; dataTwo = null; } } }

A simple constructor builds new leaf nodes:

/** Constructor builds a leaf node with one data item */ Node23(Data toAdd, Node23 prev) { parent = prev; dataOne = toAdd; }

During a leaf split, the node must manage three data items, but only has memory for two. A comparison determines the middle data item to inherit the node. The function then creates two new nodes by passing the appropriate data items (smaller to the left, larger to the right) and a ‘this’ reference via a simple constructor. The function transforms the node into a 2-node, with two children and the middle item as the data field.

Next, program control returns to the insert function. If the node has a parent, the insert function calls the parent’s recursive pushUp function, passing itself as an argument via the ‘this’ reference. pushUp then attempts to absorb its child 2-node.

/**Function to push up the middle item after splitting a 3-node or leaf * @param item is a '2-node' * @post If this node is a 2-node, the Node23 object is inserted and connected * creating a 3-node * If this node is a 3-node, it must be split and pushed up again */ protected void pushUp(Node23 item) { if (this.dataTwo == null) { //This node is a 2-node int diff = dataOne.compare(item.dataOne); if (diff > 0) { //Node item.dataOne to add is smaller than this.dataOne this.dataTwo = dataOne; this.dataOne = item.dataOne; left = item.left; left.parent = this; mid = item.right; mid.parent = this; } else { //Node item.dataOne to add is larger or equal than this.dataOne this.dataTwo = item.dataOne; mid = item.left; mid.parent = this; right = item.right; right.parent = this; } } else { //This node is a 3-node and must be split splitThreeNode(item); if (parent != null) { parent.pushUp(this); } } }

If the parent is a 2-node, the function inserts the new data item and reconnects the children appropriately. However, if the parent is a 3-node, it too must be split and pushed up.

/**Recursive function to split an internal 3-node creating a 2-node to be pushed * up to the node's parent * @param item is a 2-node to be inserted, reconnected and pushed up the tree * @post Node is a 2-node with all data connected to left and right children */ private void splitThreeNode(Node23 item) { int diff = dataOne.compare(item.dataOne); Node23 temp = null; if (diff > 0) { //Item toAdd is smaller than dataOne left = item; left.parent = this; temp = new Node23(dataTwo, this); dataTwo = null; temp.left = mid; temp.left.parent = temp; temp.right = right; temp.right.parent = temp; right = temp; mid = null; } else { diff = dataTwo.compare(item.dataOne); if (diff > 0) { //Item to add is between dataOne and dataTwo temp = new Node23(dataOne, this); temp.left = left; temp.left.parent = temp; temp.right = item.left; temp.right.parent = temp; left = temp; temp = new Node23(dataTwo, this); temp.left = item.right; temp.left.parent = temp; temp.right = right; temp.right.parent = temp; right = temp; dataOne = item.dataOne; dataTwo = null; mid = null; } else { //Item to add is larger than dataTwo right = item; right.parent = this; temp = new Node23(dataOne, this); temp.left = left; temp.left.parent = temp; temp.right = mid; temp.right.parent = temp; left = temp; dataOne = dataTwo; dataTwo = null; mid = null; } } }

The splitThreeNode function creates a 2-node, being careful not to lose or misplace the four children. Program control returns to the pushUp function, which executes recursively until it encounters either a 2-node or the root.

The graphics below provide a visual demonstration of this algorithm:

**Algorithm Review**

How can this algorithm be improved upon?

By changing the compare() function to compare an int rather than a string, the cost of comparisons can be reduced.

What are the disadvantages of a 2-3 Tree design?

Maintaining balance of a search tree improves performance during search operations. However, maintaining this balance comes with an occasionally large performance cost during a recursive split. If the recursive calls split all the way to the root (increasing the tree height) on a large data set, clients will notice a delay after certain insertion operations. Therefore, if a client performs frequent insertion operations, a Red-black Tree would provide better performance.