index.html

<!DOCTYPE html>
<html>
<head>
<title>Tufts COMP 150-08 Final Project</title>

<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="https://code.jquery.com/mobile/1.4.5/jquery.mobile-1.4.5.min.css">
<script src="https://code.jquery.com/jquery-1.11.3.min.js"></script>
<script src="https://code.jquery.com/mobile/1.4.5/jquery.mobile-1.4.5.min.js"></script>
<!-- Latest compiled and minified CSS -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">

<!-- Latest compiled and minified JavaScript -->
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script>
<style>
#case-table { border: none; border-collapse: collapse; }
#case-table td { border-left: 1px solid #000; border-bottom: none; border-top: none}
#case-table td:first-child { border-left: none; }
#case-table th { border-left: 1px solid #000;  border-bottom: none; border-top: none}
#case-table th:first-child { border-left: none;}
.borderless td, .borderless th {
    border: none;
}
body{
	font-size: 16px;
}
</style>
</head>

<body>
<div class="container" style="text-align: center">
	<div class="row">
		<h1>Parallel Algorithm For Red-Black Trees</h1>
		<h3>Zhaokun Xue</h3>
		<h4><a href="mailto:Zhaokun.Xue@tufts.edu">Zhaokun.Xue@tufts.edu</a></h4>
	</div>
</div>
<div class="container">
	<div class="row">
		<h2>Introduction</h2>
		<p>
			In our previous study, we have done a lot of detailed analysis of basic operations for the red-black trees based on the traditional sequential algorithm. In this project, I am going to use parallel algorithm to implement red-black tree's <i>search</i> and <i>insertion</i> operations. I will implement these two operations by both lock-based and lock-free synchronization in parallel algorithm, and then compare the performance of these two methods. 
		</p>
		<p>
			The red-black tree is a balanced binary search tree with height O(log <i>n</i>), and efficient search, insertion, and deletion operations, which makes it a better choice than regular binary search in search-intensive applications. And it only requires few rotations to rebalance the tree and keep it red-black properties.
		</p>
		<p>
			As we known, it takes O(log <i>n</i>) for red-black tree's search and insertion operations by sequential algorithms. As for using parallel algorithms, the search and insertion time would be O(log <i>n</i> + log<i>k</i>), where <i>k</i> is the number threads/concurrents. Although they both have the upper bound O(log <i>n</i>), and parallel algorithm even have another add-on term <i>log k</i>, parallel algorithms would provide a smaller coefficient for the big-O notation. And in my project, I will do experiments to compare the performance on lock-based and lock-free parallel algorithms, and I will claim that lock-free synchronization has a better performance than lock-based. 
		</p>
	</div>
</div>
<div class="container">
	<div class="row">
		<h2>Parallel Algorithm</h2>
		<p>
			<i>A <a href="https://en.wikipedia.org/wiki/Parallel_algorithm">parallel algorithm</a>, as opposed to a traditional sequential algorithm, is an algorithm which can be executed a piece at a time on many different processing devices, and then combined together again at the end to get the correct result.</i>
		</p>
		<p>
			People usually build cost-effective parallel algorithms by defining multiple operations on each step to improve the performance of traditional sequential algorithm, which specifies a sequence of operations and each step does one operation.
		</p>
		<h3>Lock-Based Algorithm</h3>
		<p>
			<i>A lock or mutex (from mutual exclusion) is a synchronization mechanism for enforcing limits on access to a resource in an environment where there are many threads of execution.</i>
		</p>
		<p>
			Lock-based parallel algorithms are usually implemented on the parallel random-access machine (PRAM), which is a shared-memory model of parallel computation that consists of a collection of identical processors and a shared memory. There are four types of PRAM.
			<ul>
  				<li><b>EREW (exclusive read exclusive write)</b></li>
  				Concurrents are not allowed to read and write to the same memory location at the same time.
  				<li><b>CREW (concurrent read exclusive write)</b></li>
  				Concurrents can only read a memory location.
  				<li><b>CRCW (concurrent read concurrent write)</b></li>
  				Concurrents allows both concurrent reads and concurrent writes to a memory location
  				<li><b>ERCW (Exclusive read concurrent write)</b></li>
  				This case is usually not considered.
			</ul>
		</p>
	</div>
	<div class="row">
		<table class="table borderless">
			<thead>
				<tr>
					<th>Advantages</th>
					<th>Disadvantages</th>
				</tr>
			</thead>
			</tbody>
				<tr>
					<td>
						<ul>
							<li>
								Simple to use
							</li>
							<li>
								Ubiquitous
							</li>
							<li>
								Easily to amenable to performance analysis
							</li>
						</ul>
					</td>
					<td>
						<ul>
							<li>
								Potential for deadlock
							</li>
							<li>
								Unavoidable overhead at each synchronization point, which usually can be a system call
							</li>
							<li>
								Possible scheduling anomalies (such as priority inversion), where high-priority processes are delayed waiting for a lock held by a low-priority process
							</li>
							<li>
								Convoying, where a delayed process holding a lock effectively blocks all other processes waiting for the lock
							</li>
						</ul>
					</td>
				</tr>
			</tbody>
		</table>
 	</div>
 	<div class="row">
		<h3>Lock-Free Algorithm</h3>
		<p>
			Opposite to lock-based synchronization, lock-free synchronization is algorithm that implements concurrents' operations without using "lock" or "mutex". 
		</p>
		<table class="table borderless">
			<thead>
				<tr>
					<th>Advantage</th>
					<th>Disadvantage</th>
				</tr>
			</thead>
			<tbody>
				<tr>
					<td>
						<ul>
							<li>
								Lock-free algorithms avoid problems in lock-based algorithms.
							</li>
						</ul>
					</td>
					<td>
						<ul>
							<li>
								Lock-free algorithms are usually hard to implement, especially for complicate data structure.
							</li>
							<li> 
								It is usually harder to analyze the time complexity for Lock-free algorithms than Lock-based algorithms.
							</li>
						</ul>
					</td>
				</tr>
			</tbody>
		</table>
		<h4>CAS (Compare and Swap)</h4>
		<p>
			In this project, I use <b><i>CAS (Compare And Swap)</i></b> synchronization primitive to implement my lock-free algorithm, which is supported by my most modern computer architectures. <br />
			<i>Rather than acquiring a lock, a process makes a copy of the data it wishes to modify, changes the copy and then replaces the original data with the copy only if the original data is unchanged. If there is no contention, the update is made with no appreciable overhead. If there is contention, all but one concurrent process accessing the shared data must “roll-back” and redo its computation using the updated data.</i> Most modern programming languages support <i>CAS</i> as an atomic operation. For example, in this project, I am going to use JAVA's <i>AtomicBoolean</i> class to realize the CAS strategy. We can also use other <i>Atomic</i> classes to implement <i>CAS</i> operation.
			The following piece of pesudocode just depicts how CAS function works.
			<pre>
CAS(shVble,savedValue,newValue)
Begin Atomic
if (shVble==savedValue) {
    shVble=newValue;
    return True;
} else {
    return False; 
}
End Atomic

do  {
    savedValue=shVble;
    newValue=computeNewValue(savedValue);
}while (!CAS(shVble,savedValue,newValue));
			</pre>

		</p>
	</div>
</div>
<div class="container">
	<div class="row">
		<h2>Red-Black Trees</h2>
		<p>
			<img src="https://upload.wikimedia.org/wikipedia/commons/6/66/Red-black_tree_example.svg" alt="red-black tree" width="40%" height="200" align="right">
			<ul>
				<li>
					Nodes are colored red or black
				</li>
				<li>
					Root is always black
				</li>
				<li>
					Add black "dummy" leaves so every "real" node has 2 children
				</li>
				<li>
					Every red node has a black parent
				</li>
				<li>
					For any node x: all paths down to leaves contain equal number of black nodes
				</li>
			</ul>
		</p>
	</div>
	<div class="row">
		<h3>Operations</h3>
		<p>
			For this project, I just focus on <i>search</i> and <i>insertion</i> operations on red-black tree.
		</p>
		<h4>Search</h4>
		<p>
			Just like any regular binary search tree, we use recursion on search operation. 
		</p>
		<div class"row">
			<h4>Insertion</h4>
			<p>
				Before working on the details of the <i>insertion</i> operation, we need to introduce an important helper operation, <i><b>rotation</i></b>, for red-black tree.
			</p>
			<table class="table">
				<tbody>
					<tr>
						<td>
							<pre>
Left_Rotate(T,x) {
    y = right[x];
    right[x] = left[y];
    p[left[y]] = x;
    p[y] = p[x];
    if (p[x] == p[root]) 
        root[T] = y;
    else if (x == left[p[x]])
        left[p[x]] = y;
    else 
        right[p[x]] = y;
    left[y] = x;
    p[x] = y;
}
							</pre>
						</td>
						<td>
							<img src="https://upload.wikimedia.org/wikipedia/commons/3/31/Tree_rotation_animation_250x250.gif" alt="rotation" width="300" height="300" align="middle">
						</td>
						<td> 
							<pre>
Right_Rotate(T,y) {
    x = left[y];
    left[y] = right[x];
    p[right[x]] = y;
    p[x] = p[y];
    if (p[y] == p[root]) 
        root[T] = x;
    else if (y == left[p[y]])
        left[p[y]] = x;
    else 
        right[p[y]] = x;
    right[x] = y;
    p[y] = x;
}
							</pre>
						</td>
					</tr>
				</tbody>
			</table>
			<p>
				When we insert a new node to the red-black tree, the first step is to physically insert the node to the given red-black tree, just as we do for regular binary search tree, and color the node to red. Then we need to fix upward from the inserted node to the root to maintain the red-black tree's properties. In general, we have three cases for insertion. Suppose we insert node <i><b>N</b></i> to the red-black tree, the following table explains each of the three cases.
				<table class="table" id="case-table">
					<thead>
						<tr>
							<th>
								Case 1: The uncle of inserted node is <font color="red">red</font>.
							</th>
							<th>
								Case 2: The inserted node is right child of its parent and its uncle is black.
							</th>
							<th>
								Case 3: The inserted node is left child of its parent and its uncle is black.
							</th>
						</tr>
					</thead>
					<tbody>
						<tr>
							<td>
								<img src="https://upload.wikimedia.org/wikipedia/commons/d/d6/Red-black_tree_insert_case_3.svg" alt="case 1" width="400" height="200" align="middle">
							</td>
							<td>
								<img src="https://upload.wikimedia.org/wikipedia/commons/8/89/Red-black_tree_insert_case_4.svg" alt="case 2" width="400" height="200" align="middle">
								
							</td>
							<td>
								<img src="https://upload.wikimedia.org/wikipedia/commons/d/dc/Red-black_tree_insert_case_5.svg" alt="case 3" width="400" height="200" align="middle">
							</td>
						</tr>
						<tr>
							<td>
								For case 1, we just need to recolor the inserted node, its parent, uncle and its grandparent. Then we continue to fix upward and reuse the grandparent node as the "inserted" node <i>N</i>.
							</td>
							<td>
								We can to a left rotation on the inserted node's parent to turn it to case 3. And switch label for the parent and inserted node.
							</td>
							<td>
								In case 3, we first do a right rotation on the inserted node's parent, then recolor parent and grandparent. Then we move up and fix upward. In this case, we do not reuse the grandparent node.
							</td>
						</tr>
					</tbody>
				</table>
			</p>
		</div>
	</div>
</div>
<div class="container">
	<div class="row">
		<h2>Implementation</h2>
		<p>
			I programmed in JAVA for this project. I used JAVA's "<i>Thread</i> to simulate concurrents' operations. And I used "<i>ReentrantLock</i>" lock class to implement EREW lock-based synchronization. For lock-free synchronization, I implemented the CAS process by using "<i>AtomicBoolean</i>" class, which provides the convenience for doing the CAS atomic operation. My lock-free implementation is based on the idea of Ma's Algorithm and pseudocode provided in Kim's paper. Finally, I modified the code of treeGUI from GitHub to realize a small JFrame visualization for displaying the red-black tree built by my algorithm. As the code limits, the visualization only works and layouts well for small data set, like trees with less than 20 nodes.
		</p>
	</div>
	<div class="row">
		<h3>Ma's Algorithm</h3>
		<p>
			Ma's algorithm is built upon the basic sequential insertion algorithm with the addition of a "local area" concept and the use of lock-free primitives to control concurrency. And Ma also adds an extra <i>flag</i> field to each node to indicate whether a process gains control of the node, which is used for CAS operation check.
		</P>
		<h4>Local Area</h4>
		<p>
			<img src="img/local_area.png" alt="local area" width="45%" height="200" align="right">
			The local area consists of the set of nodes that a process must have full control of
			to ensure successful completion of an operation. As we see, the local area for insertion is just made of nodes involved in the rotation fix-up operation. With the local area concept, if several concurrents in a localized region have overlapping areas, only the one that obtains all the flags in its local area will be able to proceed. Other concurrents will have to re-attempt to gain control of their local areas. As one insertion concurrent completes its processing in one part of the tree, it may either finish entirely (and release all its flags) or move up the tree. In the later case, the concurrent first obtains the flags of nodes in its new local area, moves up, and then releases the flags that it set for the nodes in its old local area. Releasing flags allows other processes that may have been waiting to advance. The figure on the right shows the "local area" defined by Ma. The following slides demonstrate how local area works for insertion case 1.
			<div id="local-arae-demo" class="carousel slide" data-ride="carousel" style="width:50%; height:300">
				<!-- Wrapper for slides -->
				<div class="carousel-inner">
					<div class="item active">
						<img src="img/local_area_demo/Slide1.gif" alt="Step 1" style="width:100%;">
					</div>

					<div class="item">
						<img src="img/local_area_demo/Slide2.gif" alt="Step 2" style="width:100%;">
					</div>

					<div class="item">
						<img src="img/local_area_demo/Slide3.gif" alt="Step 3" style="width:100%;">
					</div>
					<div class="item">
						<img src="img/local_area_demo/Slide4.gif" alt="Step 4" style="width:100%;">
					</div>
				</div>

				<!-- Left and right controls -->
				<a class="left carousel-control" href="#local-arae-demo" data-slide="prev">
					<span class="glyphicon glyphicon-chevron-left"></span>
					<span class="sr-only">Previous</span>
				</a>
				<a class="right carousel-control" href="#local-arae-demo" data-slide="next">
					<span class="glyphicon glyphicon-chevron-right"></span>
					<span class="sr-only">Next</span>
				</a>
			</div>
			<!--<img src="img/local_area_demo.gif" alt="local area animation" width="60%" height="450" align="middle">-->
		</p>
		<br />
		<p>
			<img src="img/insertion_pseudo.png" alt="insertion" width="45%" height="550" align="right">
			With the idea of Ma's Algorithm, I add a <i>flag</i> field to my "LockFreeRBNode" object, which indicates whether the current node is occupied by any concurrent. And then I added <i>CAS</i> and <i>SetupLocalAreaForInsert</i> functions to the regular sequential insertion operation as Kim's pseudocode did. The pseudocode on the right shows the shows the necessary changes for <i>insertion</i> function, and the red text indicates the differences from sequential algorithm.
			<img src="img/object.png" alt="insertion" width="55%" height="300" align="left">
		</p>
	</div>
</div>
<div class="container">
	<div class="row">
		<h2>Experiment Results</h2>
		<h3>Sample Output for visualization</h3>
		I modified the TreeGUI.java code from GitHub to visualize red-black trees I got from my experiments. Due to the visualization code limits, it can only layout and display well for small trees. In my project, I applied the code to display trees with less than 20 nodes. I generated 20 random nodes from 0 to 1000. Then I ran lock-based insertion and lock-free insertion on the same dataset. As we known, different insertion order could result in different red-black trees. Here are one sample output from my experiment. Each figure demonstrates the red-black tree built by their own algorithm and concurrents flow. You can click to enlarge the figures. <br />
		<table class="table borderless">
			<tbody>
				<tr>
					<td>
						<b>Lock-Based Algorithm Output</b>
					</td>
					<td>
						<b>Lock-Free Algorithm Output</b>
					</td>
				</tr>
				<tr>
					<td>
						<a href="#locked-tree" data-rel="popup" data-position-to="window">
    					<img src="img/lock_based_tree_drawer.png" alt="lock based tree demo" style="width:50%"></a>

					    <div data-role="popup" id="locked-tree">
					      	<a href="#pageone" data-rel="back" class="ui-btn ui-corner-all ui-shadow ui-btn-a ui-icon-delete ui-btn-icon-notext ui-btn-right">Close</a><img src="img/locked_tree.jpg" style="width:100%;height:100%;" alt="lock based tree demo">
					    </div>
					</td>
					<td>
						<a href="#lock-free-tree" data-rel="popup" data-position-to="window">
    					<img src="img/lock_based_tree_drawer.png" alt="lock free tree demo" style="width:50%"></a>

					    <div data-role="popup" id="lock-free-tree">
					      	<a href="#pageone" data-rel="back" class="ui-btn ui-corner-all ui-shadow ui-btn-a ui-icon-delete ui-btn-icon-notext ui-btn-right">Close</a><img src="img/lockfree_tree.jpg" style="width:100%;height:100%;" alt="lock free tree demo">
					    </div>
					</td>
				</tr>
			</tbody>
		</table>
		
		<br />
		Then I did experiments for comparing the performance of these two kinds of synchronizations. For every experiment, I ran my program 10 times. I collected results from eclipse console output, and imported the data to excel. Finally I calculated the averages and plotted the line charts in excel. 
		<h3>Insert 1000 nodes to an empty red-black tree</h3>
		In this experiment I totally inserted 1000 random nodes to an empty red-black tree by using different number of threads. For example, when the number of threads is 10, it means each thread inserts 100 nodes to the tree.<br />
		<h4>Results for Lock-Based Synchronization</h4>
		<table class="table table-sm">
			<thead>
				<tr>
					<th>
						Threads #
					</th>
					<th colspan="10">
						Time Cost (ms)
					</th>
					<th>
						Average
					</th>
				</tr>
			</thead>
			<tbody>
				<tr>
					<td>1</td> 
					<td>21</td><td>18</td><td>21</td><td>21</td><td>22</td><td>22</td><td>21</td><td>23</td>	<td>19</td><td>21</td><td>20.9</td>
				</tr>
				<tr>
					<td>2</td>
					<td>26</td><td>25</td><td>26</td><td>26</td><td>27</td><td>26</td><td>26</td><td>28</td><td>27</td><td>29</td><td>26.6</td>
				</tr>
				<tr>
					<td>4</td>
					<td>28</td><td>28</td><td>34</td><td>26</td><td>28</td><td>30</td><td>30</td><td>33</td><td>33</td><td>26</td><td>29.6</td>
				</tr>
				<tr>
					<td>5</td>
					<td>32</td><td>30</td><td>28</td><td>27</td><td>24</td><td>25</td><td>30</td><td>30</td><td>27</td><td>28</td><td>28.1</td>
				</tr>
				<tr>
					<td>8</td>
					<td>28</td><td>29</td><td>29</td><td>26</td><td>27</td><td>25</td><td>25</td><td>28</td><td>28</td><td>27</td><td>27.2</td>
				</tr>
				<tr>
					<td>10</td>
					<td>25</td><td>29</td><td>27</td><td>26</td><td>28</td><td>30</td><td>30</td><td>26</td><td>29</td><td>28</td><td>27.8</td>
				</tr>
				<tr>
					<td>20</td>
					<td>28</td><td>30</td><td>34</td><td>31</td><td>36</td><td>29</td><td>27</td><td>31</td><td>30</td><td>30</td><td>30.6</td>
				</td>
			</tbody>
		</table>
		<h4>Results for Lock-Free Synchronization</h4>
		<table class="table table-sm">
			<thead>
				<tr>
					<th>
						Threads #
					</th>
					<th colspan="10">
						Time Cost (ms)
					</th>
					<th>
						Average
					</th>
				</tr>
			</thead>
			<tbody>
				<tr>
					<td>1</td>
					<td>15</td><td>15</td><td>16</td><td>14</td><td>16</td><td>19</td><td>17</td><td>12</td><td>18</td><td>12</td><td>15.4</td>
				</tr>
				<tr>
					<td>2</td>
					<td>13</td><td>9</td><td>12</td><td>12</td><td>14</td><td>14</td><td>12</td><td>17</td><td>12</td><td>19</td><td>13.4</td>
				</tr>
				<tr>
					<td>4</td>
					<td>15</td><td>18</td><td>30</td><td>14</td><td>10</td><td>12</td><td>16</td><td>19</td><td>11</td><td>26</td><td>17.1</td>
				</tr>
				<tr>
					<td>5</td>
					<td>22</td><td>31</td><td>16</td><td>20</td><td>17</td><td>15</td><td>11</td><td>26</td><td>28</td><td>10</td><td>19.6</td>
				</tr>
				<tr>
					<td>8</td>
					<td>13</td><td>17</td><td>13</td><td>15</td><td>21</td><td>32</td><td>15</td><td>31</td><td>14</td><td>39</td><td>21</td>
				</tr>
				<tr>
					<td>10</td>
					<td>20</td><td>30</td><td>13</td><td>13</td><td>55</td><td>27</td><td>10</td><td>21</td><td>27</td><td>17</td><td>23.3</td>
				</tr>
				<tr>
					<td>20</td>
					<td>15</td><td>14</td><td>48</td><td>16</td><td>29</td><td>13</td><td>12</td><td>19</td><td>15</td><td>20</td><td>20.1</td>
				</tr>
			</tbody>
		</table>
		<h4>Line Chart for Comparing the Average Time Costs</h4>
		<img src="img/insertion_nodes_pic.png" alt="insertion node" width="80%" height="400" align="middle">
		
		<h3>Search 10000 times from the tree</h3>
		I used different number of threads to do 10000 times search on the 1000-node red-black tree I built from the above experiment. <br />
		<h4>Results for Lock-Based Synchronization</h4>
		<table class="table table-sm">
			<thead>
				<tr>
					<th>
						Threads #
					</th>
					<th colspan="10">
						Time Cost (ms)
					</th>
					<th>
						Average
					</th>
				</tr>
			</thead>
			<tbody>
				<tr>
					<td>1</td>
					<td>8</td><td>10</td><td>10</td><td>9</td><td>10</td><td>8</td><td>10</td><td>10</td><td>11</td><td>11</td><td>9.7</td>
				</tr>
				<tr>
					<td>2</td>
					<td>18</td><td>25</td><td>22</td><td>20</td><td>21</td><td>18</td><td>18</td><td>24</td><td>23</td><td>25</td><td>21.4</td>
				</tr>
				<tr>
					<td>4</td>
					<td>43</td><td>46</td><td>34</td><td>53</td><td>43</td><td>46</td><td>69</td><td>52</td><td>78</td><td>47</td><td>51.1</td>
				</tr>
				<tr>
					<td>5</td>
					<td>43</td><td>54</td><td>47</td><td>42</td><td>40</td><td>54</td><td>65</td><td>46</td><td>39</td><td>49</td><td>47.9</td>
				</tr>
				<tr>
					<td>8</td>
					<td>65</td><td>53</td><td>54</td><td>87</td><td>60</td><td>59</td><td>60</td><td>56</td><td>54</td><td>72</td><td>62</td>
				</tr>
				<tr>
					<td>10</td>
					<td>59</td><td>74</td><td>79</td><td>64</td><td>93</td><td>68</td><td>62</td><td>59</td><td>69</td><td>63</td><td>69</td>
				</tr>
				<tr>
					<td>20</td>
					<td>93</td><td>83</td><td>128</td><td>94</td><td>94</td><td>87</td><td>86</td><td>102</td><td>99</td><td>112</td><td>97.8</td>
				</tr>
			</tbody>
		</table>
		<h4>Results for Lock-Free Synchronization</h4>
		<table class="table table-sm">
			<thead>
				<tr>
					<th>
						Threads #
					</th>
					<th colspan="10">
						Time Cost (ms)
					</th>
					<th>
						Average
					</th>
				</tr>
			</thead>
			<tbody>
				<tr>
					<td>1</td>
					<td>6</td><td>7</td><td>6</td><td>6</td><td>7</td><td>4</td><td>8</td><td>5</td><td>6</td><td>8</td><td>6.3</td>
				</tr>
				<tr>
					<td>2</td>
					<td>7</td><td>7</td><td>8</td><td>7</td><td>9</td><td>8</td><td>7</td><td>6</td><td>8</td><td>9</td><td>7.6</td>
				</tr>
				<tr>
					<td>4</td>
					<td>16</td><td>39</td><td>11</td><td>31</td><td>23</td><td>16</td><td>18</td><td>26</td><td>16</td><td>34</td><td>23</td>
				</tr>
				<tr>
					<td>5</td>
					<td>14</td><td>44</td><td>16</td><td>21</td><td>43</td><td>47</td><td>39</td><td>13</td><td>20</td><td>20</td><td>27.7</td>
				</tr>
				<tr>
					<td>8</td>
					<td>23</td><td>20</td><td>34</td><td>75</td><td>20</td><td>31</td><td>18</td><td>30</td><td>22</td><td>29</td><td>30.2</td>
				</tr>
				<tr>
					<td>10</td>
					<td>47</td><td>98</td><td>31</td><td>28</td><td>95</td><td>20</td><td>23</td><td>43</td><td>31</td><td>22</td><td>43.8</td>
				</tr>
				<tr>
					<td>20</td>
					<td>45</td><td>37</td><td>63</td><td>40</td><td>48</td><td>38</td><td>38</td><td>47</td><td>43</td><td>50</td><td>44.9</td>
				</tr>
			</tbody>
		</table>
		<h4>Line Chart for Comparing the Average Time Costs</h4>
		<img src="img/search_nodes_pic.png" alt="search node" width="80%" height="400" align="middle">

		<h3>Use 4 threads to insert different number of nodes</h3>
		Then I fixed the number of threads and changed the number of nodes that each thread needs to insert. Combining  results from the above two experiments, I noticed that when using 4 threads we have the largest differences for insertion and search. Therefore, I fixed the number of threads to 4 for this experiment. And I will use the set of data to roughly calculate the constants for Big-O nations of these two algorithms.<br />
		<h4>Results for Lock-Based Synchronization</h4>
		<table class="table table-sm">
			<thead>
				<tr>
					<th>
						Nodes #
					</th>
					<th colspan="10">
						Time Cost (ms)
					</th>
					<th>
						Average
					</th>
				</tr>
			</thead>
			<tbody>
				<tr>
					<td>10</td>
					<td>2</td><td>2</td><td>2</td><td>3</td><td>2</td><td>2</td><td>3</td><td>2</td><td>3</td><td>4</td><td>2.5</td>
				</tr>
				<tr>
					<td>20</td>
					<td>3</td><td>3</td><td>4</td><td>3</td><td>3</td><td>4</td><td>2</td><td>3</td><td>4</td><td>2</td><td>3.1</td>
				</tr>
				<tr>
					<td>50</td>
					<td>5</td><td>4</td><td>5</td><td>5</td><td>5</td><td>6</td><td>5</td><td>5</td><td>6</td><td>6</td><td>5.2</td>
				</tr>
				<tr>
					<td>100</td>
					<td>10</td><td>11</td><td>10</td><td>10</td><td>9</td><td>9</td><td>11</td><td>9</td><td>10</td><td>11</td><td>10</td>
				</tr>
				<tr>
					<td>200</td>
					<td>23</td><td>20</td><td>22</td><td>21</td><td>22</td><td>21</td><td>20</td><td>24</td><td>27</td><td>20</td><td>22</td>
				</tr>
				<tr>
					<td>500</td>
					<td>68</td><td>67</td><td>69</td><td>81</td><td>66</td><td>71</td><td>68</td><td>69</td><td>80</td><td>71</td><td>71</td>
				</tr>
				<tr>
					<td>1000</td>
					<td>168</td><td>167</td><td>177</td><td>161</td><td>164</td><td>162</td><td>198</td><td>185</td><td>174</td><td>178</td><td>173.4</td>
				</tr>
			</tbdoy>
		</table>
		<h4>Results for Lock-Free Synchronization</h4>
		<table class="table table-sm">
			<thead>
				<tr>
					<th>
						Nodes #
					</th>
					<th colspan="10">
						Time Cost (ms)
					</th>
					<th>
						Average
					</th>
				</tr>
			</thead>
			<tbody>
				<tr>
					<td>10</td>
					<td>2</td><td>2</td><td>2</td><td>2</td><td>1</td><td>2</td><td>2</td><td>1</td><td>1</td><td>2</td><td>1.7</td>
				</tr>
				<tr>
					<td>20</td>
					<td>2</td><td>3</td><td>4</td><td>2</td><td>2</td><td>3</td><td>2</td><td>2</td><td>2</td><td>2</td><td>2.4</td>
				</tr>
				<tr>
					<td>50</td>
					<td>4</td><td>3</td><td>5</td><td>3</td><td>4</td><td>4</td><td>4</td><td>4</td><td>6</td><td>3</td><td>4</td>
				</tr>
				<tr>
					<td>100</td>
					<td>6</td><td>7</td><td>6</td><td>6</td><td>5</td><td>8</td><td>5</td><td>10</td><td>5</td><td>6</td><td>6.4</td>
				</tr>
				<tr>
					<td>200</td>
					<td>9</td><td>12</td><td>13</td><td>13</td><td>11</td><td>13</td><td>7</td><td>18</td><td>8</td><td>16</td><td>12</td>
				</tr>
				<tr>
					<td>500</td>
					<td>17</td><td>19</td><td>21</td><td>23</td><td>17</td><td>17</td><td>25</td><td>23</td><td>18</td><td>37</td><td>21.7</td>
				</tr>
				<tr>
					<td>1000</td>
					<td>23</td><td>28</td><td>34</td><td>38</td><td>33</td><td>29</td><td>40</td><td>36</td><td>33</td><td>32</td><td>32.6</td>
				</tr>
			</tbody>
		</table>
		<h4>Line Chart for Comparing the Average Time Costs</h4>
		<img src="img/4_threads.png" alt="4 threads" width="80%" height="400" align="middle">
		<h4>Roughly Estimate the Constants for Big-O Notation</h4>
		<p>
			I roughly calculate the constants for Big-O Notations of these two algorithms by O(log<i>n</i> + log<i>k</i>), where k is the number of threads. I used Excel to find the logarithmic trendlines for these these two datasets. As I found, at inserting 4000 nodes, it causes outliers of the trendlines, so I just removed that data point from my dataset, and plotted the following figures. Based on my calculation, the constant for lock-based synchronization is around 30.72, and the constant for lock-free synchronization is about 6.43. The lock-free synchronization has a much smaller constant which means lock-free synchronization performs better than lock-based synchronization.
		</p>
		<table>
			<thead>
				<tr>
					<th>
						Lock-Based Synchronization
					</th>
					<th>
						Lock-Free Synchronization
					</th>
				</tr>
			</thead>
			<tbody>
				<tr>
					<td>
						<img src="img/lock_based_trend.png" alt="lock-based trend" width="80%" height="400" align="middle">
					</td>
					<td>
						<img src="img/lock_free_trend.png" alt="lock-free trend" width="80%" height="400" align="middle">
					</td>
				</tr>
			</tbody>
		</table>
		<div class="row">
			<h3>How to run my code</h3>
			<p>
				<img src="img/code_layout.png" alt="code layout" width="30%" height="300" align="right">
				You can checkout the code from my GitHub <a href="https://github.com/xuezhaokun/150-algorithm">repository</a>. Then you can just import the project to Eclipse to test it. The figure on the right shows the content of my program. The main function is in "TestTrees.java" file. You can change "<i>num_threads</i>" and "<i>insert_nodes_per_thread</i>" parameters to test the performance for different number of threads and different number of nodes. You can also change "visulize_locked_tree" and "visulize_lock_free_tree" parameters to enable or disable the visualization for these two trees. Instead, you can just check the video below. It demonstrates how to run some sample experiments with my program. <br />
				<iframe width="65%" height="450" src="https://www.youtube.com/embed/PoNhaIdfvOE" frameborder="0" allowfullscreen></iframe>
			</p>
	</div>
</div>
<div class="container">
	<div class="row">
		<h2>Conclusion and Future Work</h2>
		<p>
			Based on my implementation and experiment results, we can reach the conclusion that lock-free algorithms have a better performance on <i>search</i> and <i>insertion</i> operations on relatively large red-black tree. From the line charts, we can conclude that when the tree gets larger, the performance difference would be more obvious. For future work, I think we can work on finding the hidden constant in the Big-O notation for these two algorithms, so that we can compare the performance of these two algorithms theoretically. The other thing we can work on is to implement lock-free algorithm for <i>deletion</i> operation. Since red-black trees store information not only on leaf nodes but also on internal nodes, <i>deletion</i> operation for red-black trees would result in a more complex structural change, which makes the implementation much harder. Another thing we can do in the future is to find a way to visualize the process for lock-free insertion, which could give users a more straightforward demonstration on how these concurrents work. In addition, in Kim's paper, he proposed an improved version on Ma's algorithm, which needs fewer CAS operations by defining a larger "local area". We might try out his method, and compare its performance with the original Ma's algorithm. We can also study the better concurrent control method "wait-free" to improve the performance for lock-free synchronization.
		</p>
	</div>
</div>

<div class="container">
	<div class="row">
		<h2>References</h2>
		<ol>
			<li>
				Park, Heejin, and Kunsoo Park. "Parallel algorithms for red–black trees." Theoretical Computer Science 262.1-2 (2001): 415-435.
			</li>
			<li>
				Kim, Jong Ho, Helen Cameron, and Peter Graham. "Lock-free red-black trees using cas." Concurrency and Computation: Practice and Experience(2006): 1-40.
			</li>
			<li>
				Jianwen Ma. Lock-Free Insertions on Red-Black Trees. MSc thesis, University of Manitoba, October,2003.
			</li>
			<li>
				Natarajan, Aravind, Lee H. Savoie, and Neeraj Mittal. "Concurrent wait-free red black trees." Symposium on Self-Stabilizing Systems. Springer International Publishing, 2013.
			</li>
			<li>
				GitHub TreeGui: <a href="https://github.com/EslaMx7/AI-Tasks-JADE-Tests/blob/master/src/trees/tasks/treeGUI.java">https://github.com/EslaMx7/AI-Tasks-JADE-Tests/blob/master/src/trees/tasks/treeGUI.java</a>
			</li>
			<li>
				Red-Black Tree figure: <a href="https://upload.wikimedia.org/wikipedia/commons/6/66/Red-black_tree_example.svg">https://upload.wikimedia.org/wikipedia/commons/6/66/Red-black_tree_example.svg</a>
			</li>
			<li>
				Rotation GIF figure: <a href="https://upload.wikimedia.org/wikipedia/commons/3/31/Tree_rotation_animation_250x250.gif">https://upload.wikimedia.org/wikipedia/commons/3/31/Tree_rotation_animation_250x250.gif</a>
			</li>
		</ol>
	</div>
</div>
</body>
</html>