-
Notifications
You must be signed in to change notification settings - Fork 1
/
index.html
767 lines (752 loc) · 33.2 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
<!DOCTYPE html>
<html>
<head>
<title>Tufts COMP 150-08 Final Project</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="https://code.jquery.com/mobile/1.4.5/jquery.mobile-1.4.5.min.css">
<script src="https://code.jquery.com/jquery-1.11.3.min.js"></script>
<script src="https://code.jquery.com/mobile/1.4.5/jquery.mobile-1.4.5.min.js"></script>
<!-- Latest compiled and minified CSS -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
<!-- Latest compiled and minified JavaScript -->
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script>
<style>
#case-table { border: none; border-collapse: collapse; }
#case-table td { border-left: 1px solid #000; border-bottom: none; border-top: none}
#case-table td:first-child { border-left: none; }
#case-table th { border-left: 1px solid #000; border-bottom: none; border-top: none}
#case-table th:first-child { border-left: none;}
.borderless td, .borderless th {
border: none;
}
body{
font-size: 16px;
}
</style>
</head>
<body>
<div class="container" style="text-align: center">
<div class="row">
<h1>Parallel Algorithm For Red-Black Trees</h1>
<h3>Zhaokun Xue</h3>
<h4><a href="mailto:Zhaokun.Xue@tufts.edu">Zhaokun.Xue@tufts.edu</a></h4>
</div>
</div>
<div class="container">
<div class="row">
<h2>Introduction</h2>
<p>
In our previous study, we have done a lot of detailed analysis of basic operations for the red-black trees based on the traditional sequential algorithm. In this project, I am going to use parallel algorithm to implement red-black tree's <i>search</i> and <i>insertion</i> operations. I will implement these two operations by both lock-based and lock-free synchronization in parallel algorithm, and then compare the performance of these two methods.
</p>
<p>
The red-black tree is a balanced binary search tree with height O(log <i>n</i>), and efficient search, insertion, and deletion operations, which makes it a better choice than regular binary search in search-intensive applications. And it only requires few rotations to rebalance the tree and keep it red-black properties.
</p>
<p>
As we known, it takes O(log <i>n</i>) for red-black tree's search and insertion operations by sequential algorithms. As for using parallel algorithms, the search and insertion time would be O(log <i>n</i> + log<i>k</i>), where <i>k</i> is the number threads/concurrents. Although they both have the upper bound O(log <i>n</i>), and parallel algorithm even have another add-on term <i>log k</i>, parallel algorithms would provide a smaller coefficient for the big-O notation. And in my project, I will do experiments to compare the performance on lock-based and lock-free parallel algorithms, and I will claim that lock-free synchronization has a better performance than lock-based.
</p>
</div>
</div>
<div class="container">
<div class="row">
<h2>Parallel Algorithm</h2>
<p>
<i>A <a href="https://en.wikipedia.org/wiki/Parallel_algorithm">parallel algorithm</a>, as opposed to a traditional sequential algorithm, is an algorithm which can be executed a piece at a time on many different processing devices, and then combined together again at the end to get the correct result.</i>
</p>
<p>
People usually build cost-effective parallel algorithms by defining multiple operations on each step to improve the performance of traditional sequential algorithm, which specifies a sequence of operations and each step does one operation.
</p>
<h3>Lock-Based Algorithm</h3>
<p>
<i>A lock or mutex (from mutual exclusion) is a synchronization mechanism for enforcing limits on access to a resource in an environment where there are many threads of execution.</i>
</p>
<p>
Lock-based parallel algorithms are usually implemented on the parallel random-access machine (PRAM), which is a shared-memory model of parallel computation that consists of a collection of identical processors and a shared memory. There are four types of PRAM.
<ul>
<li><b>EREW (exclusive read exclusive write)</b></li>
Concurrents are not allowed to read and write to the same memory location at the same time.
<li><b>CREW (concurrent read exclusive write)</b></li>
Concurrents can only read a memory location.
<li><b>CRCW (concurrent read concurrent write)</b></li>
Concurrents allows both concurrent reads and concurrent writes to a memory location
<li><b>ERCW (Exclusive read concurrent write)</b></li>
This case is usually not considered.
</ul>
</p>
</div>
<div class="row">
<table class="table borderless">
<thead>
<tr>
<th>Advantages</th>
<th>Disadvantages</th>
</tr>
</thead>
</tbody>
<tr>
<td>
<ul>
<li>
Simple to use
</li>
<li>
Ubiquitous
</li>
<li>
Easily to amenable to performance analysis
</li>
</ul>
</td>
<td>
<ul>
<li>
Potential for deadlock
</li>
<li>
Unavoidable overhead at each synchronization point, which usually can be a system call
</li>
<li>
Possible scheduling anomalies (such as priority inversion), where high-priority processes are delayed waiting for a lock held by a low-priority process
</li>
<li>
Convoying, where a delayed process holding a lock effectively blocks all other processes waiting for the lock
</li>
</ul>
</td>
</tr>
</tbody>
</table>
</div>
<div class="row">
<h3>Lock-Free Algorithm</h3>
<p>
Opposite to lock-based synchronization, lock-free synchronization is algorithm that implements concurrents' operations without using "lock" or "mutex".
</p>
<table class="table borderless">
<thead>
<tr>
<th>Advantage</th>
<th>Disadvantage</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<ul>
<li>
Lock-free algorithms avoid problems in lock-based algorithms.
</li>
</ul>
</td>
<td>
<ul>
<li>
Lock-free algorithms are usually hard to implement, especially for complicate data structure.
</li>
<li>
It is usually harder to analyze the time complexity for Lock-free algorithms than Lock-based algorithms.
</li>
</ul>
</td>
</tr>
</tbody>
</table>
<h4>CAS (Compare and Swap)</h4>
<p>
In this project, I use <b><i>CAS (Compare And Swap)</i></b> synchronization primitive to implement my lock-free algorithm, which is supported by my most modern computer architectures. <br />
<i>Rather than acquiring a lock, a process makes a copy of the data it wishes to modify, changes the copy and then replaces the original data with the copy only if the original data is unchanged. If there is no contention, the update is made with no appreciable overhead. If there is contention, all but one concurrent process accessing the shared data must “roll-back” and redo its computation using the updated data.</i> Most modern programming languages support <i>CAS</i> as an atomic operation. For example, in this project, I am going to use JAVA's <i>AtomicBoolean</i> class to realize the CAS strategy. We can also use other <i>Atomic</i> classes to implement <i>CAS</i> operation.
The following piece of pesudocode just depicts how CAS function works.
<pre>
CAS(shVble,savedValue,newValue)
Begin Atomic
if (shVble==savedValue) {
shVble=newValue;
return True;
} else {
return False;
}
End Atomic
do {
savedValue=shVble;
newValue=computeNewValue(savedValue);
}while (!CAS(shVble,savedValue,newValue));
</pre>
</p>
</div>
</div>
<div class="container">
<div class="row">
<h2>Red-Black Trees</h2>
<p>
<img src="https://upload.wikimedia.org/wikipedia/commons/6/66/Red-black_tree_example.svg" alt="red-black tree" width="40%" height="200" align="right">
<ul>
<li>
Nodes are colored red or black
</li>
<li>
Root is always black
</li>
<li>
Add black "dummy" leaves so every "real" node has 2 children
</li>
<li>
Every red node has a black parent
</li>
<li>
For any node x: all paths down to leaves contain equal number of black nodes
</li>
</ul>
</p>
</div>
<div class="row">
<h3>Operations</h3>
<p>
For this project, I just focus on <i>search</i> and <i>insertion</i> operations on red-black tree.
</p>
<h4>Search</h4>
<p>
Just like any regular binary search tree, we use recursion on search operation.
</p>
<div class"row">
<h4>Insertion</h4>
<p>
Before working on the details of the <i>insertion</i> operation, we need to introduce an important helper operation, <i><b>rotation</i></b>, for red-black tree.
</p>
<table class="table">
<tbody>
<tr>
<td>
<pre>
Left_Rotate(T,x) {
y = right[x];
right[x] = left[y];
p[left[y]] = x;
p[y] = p[x];
if (p[x] == p[root])
root[T] = y;
else if (x == left[p[x]])
left[p[x]] = y;
else
right[p[x]] = y;
left[y] = x;
p[x] = y;
}
</pre>
</td>
<td>
<img src="https://upload.wikimedia.org/wikipedia/commons/3/31/Tree_rotation_animation_250x250.gif" alt="rotation" width="300" height="300" align="middle">
</td>
<td>
<pre>
Right_Rotate(T,y) {
x = left[y];
left[y] = right[x];
p[right[x]] = y;
p[x] = p[y];
if (p[y] == p[root])
root[T] = x;
else if (y == left[p[y]])
left[p[y]] = x;
else
right[p[y]] = x;
right[x] = y;
p[y] = x;
}
</pre>
</td>
</tr>
</tbody>
</table>
<p>
When we insert a new node to the red-black tree, the first step is to physically insert the node to the given red-black tree, just as we do for regular binary search tree, and color the node to red. Then we need to fix upward from the inserted node to the root to maintain the red-black tree's properties. In general, we have three cases for insertion. Suppose we insert node <i><b>N</b></i> to the red-black tree, the following table explains each of the three cases.
<table class="table" id="case-table">
<thead>
<tr>
<th>
Case 1: The uncle of inserted node is <font color="red">red</font>.
</th>
<th>
Case 2: The inserted node is right child of its parent and its uncle is black.
</th>
<th>
Case 3: The inserted node is left child of its parent and its uncle is black.
</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<img src="https://upload.wikimedia.org/wikipedia/commons/d/d6/Red-black_tree_insert_case_3.svg" alt="case 1" width="400" height="200" align="middle">
</td>
<td>
<img src="https://upload.wikimedia.org/wikipedia/commons/8/89/Red-black_tree_insert_case_4.svg" alt="case 2" width="400" height="200" align="middle">
</td>
<td>
<img src="https://upload.wikimedia.org/wikipedia/commons/d/dc/Red-black_tree_insert_case_5.svg" alt="case 3" width="400" height="200" align="middle">
</td>
</tr>
<tr>
<td>
For case 1, we just need to recolor the inserted node, its parent, uncle and its grandparent. Then we continue to fix upward and reuse the grandparent node as the "inserted" node <i>N</i>.
</td>
<td>
We can to a left rotation on the inserted node's parent to turn it to case 3. And switch label for the parent and inserted node.
</td>
<td>
In case 3, we first do a right rotation on the inserted node's parent, then recolor parent and grandparent. Then we move up and fix upward. In this case, we do not reuse the grandparent node.
</td>
</tr>
</tbody>
</table>
</p>
</div>
</div>
</div>
<div class="container">
<div class="row">
<h2>Implementation</h2>
<p>
I programmed in JAVA for this project. I used JAVA's "<i>Thread</i> to simulate concurrents' operations. And I used "<i>ReentrantLock</i>" lock class to implement EREW lock-based synchronization. For lock-free synchronization, I implemented the CAS process by using "<i>AtomicBoolean</i>" class, which provides the convenience for doing the CAS atomic operation. My lock-free implementation is based on the idea of Ma's Algorithm and pseudocode provided in Kim's paper. Finally, I modified the code of treeGUI from GitHub to realize a small JFrame visualization for displaying the red-black tree built by my algorithm. As the code limits, the visualization only works and layouts well for small data set, like trees with less than 20 nodes.
</p>
</div>
<div class="row">
<h3>Ma's Algorithm</h3>
<p>
Ma's algorithm is built upon the basic sequential insertion algorithm with the addition of a "local area" concept and the use of lock-free primitives to control concurrency. And Ma also adds an extra <i>flag</i> field to each node to indicate whether a process gains control of the node, which is used for CAS operation check.
</P>
<h4>Local Area</h4>
<p>
<img src="img/local_area.png" alt="local area" width="45%" height="200" align="right">
The local area consists of the set of nodes that a process must have full control of
to ensure successful completion of an operation. As we see, the local area for insertion is just made of nodes involved in the rotation fix-up operation. With the local area concept, if several concurrents in a localized region have overlapping areas, only the one that obtains all the flags in its local area will be able to proceed. Other concurrents will have to re-attempt to gain control of their local areas. As one insertion concurrent completes its processing in one part of the tree, it may either finish entirely (and release all its flags) or move up the tree. In the later case, the concurrent first obtains the flags of nodes in its new local area, moves up, and then releases the flags that it set for the nodes in its old local area. Releasing flags allows other processes that may have been waiting to advance. The figure on the right shows the "local area" defined by Ma. The following slides demonstrate how local area works for insertion case 1.
<div id="local-arae-demo" class="carousel slide" data-ride="carousel" style="width:50%; height:300">
<!-- Wrapper for slides -->
<div class="carousel-inner">
<div class="item active">
<img src="img/local_area_demo/Slide1.gif" alt="Step 1" style="width:100%;">
</div>
<div class="item">
<img src="img/local_area_demo/Slide2.gif" alt="Step 2" style="width:100%;">
</div>
<div class="item">
<img src="img/local_area_demo/Slide3.gif" alt="Step 3" style="width:100%;">
</div>
<div class="item">
<img src="img/local_area_demo/Slide4.gif" alt="Step 4" style="width:100%;">
</div>
</div>
<!-- Left and right controls -->
<a class="left carousel-control" href="#local-arae-demo" data-slide="prev">
<span class="glyphicon glyphicon-chevron-left"></span>
<span class="sr-only">Previous</span>
</a>
<a class="right carousel-control" href="#local-arae-demo" data-slide="next">
<span class="glyphicon glyphicon-chevron-right"></span>
<span class="sr-only">Next</span>
</a>
</div>
<!--<img src="img/local_area_demo.gif" alt="local area animation" width="60%" height="450" align="middle">-->
</p>
<br />
<p>
<img src="img/insertion_pseudo.png" alt="insertion" width="45%" height="550" align="right">
With the idea of Ma's Algorithm, I add a <i>flag</i> field to my "LockFreeRBNode" object, which indicates whether the current node is occupied by any concurrent. And then I added <i>CAS</i> and <i>SetupLocalAreaForInsert</i> functions to the regular sequential insertion operation as Kim's pseudocode did. The pseudocode on the right shows the shows the necessary changes for <i>insertion</i> function, and the red text indicates the differences from sequential algorithm.
<img src="img/object.png" alt="insertion" width="55%" height="300" align="left">
</p>
</div>
</div>
<div class="container">
<div class="row">
<h2>Experiment Results</h2>
<h3>Sample Output for visualization</h3>
I modified the TreeGUI.java code from GitHub to visualize red-black trees I got from my experiments. Due to the visualization code limits, it can only layout and display well for small trees. In my project, I applied the code to display trees with less than 20 nodes. I generated 20 random nodes from 0 to 1000. Then I ran lock-based insertion and lock-free insertion on the same dataset. As we known, different insertion order could result in different red-black trees. Here are one sample output from my experiment. Each figure demonstrates the red-black tree built by their own algorithm and concurrents flow. You can click to enlarge the figures. <br />
<table class="table borderless">
<tbody>
<tr>
<td>
<b>Lock-Based Algorithm Output</b>
</td>
<td>
<b>Lock-Free Algorithm Output</b>
</td>
</tr>
<tr>
<td>
<a href="#locked-tree" data-rel="popup" data-position-to="window">
<img src="img/lock_based_tree_drawer.png" alt="lock based tree demo" style="width:50%"></a>
<div data-role="popup" id="locked-tree">
<a href="#pageone" data-rel="back" class="ui-btn ui-corner-all ui-shadow ui-btn-a ui-icon-delete ui-btn-icon-notext ui-btn-right">Close</a><img src="img/locked_tree.jpg" style="width:100%;height:100%;" alt="lock based tree demo">
</div>
</td>
<td>
<a href="#lock-free-tree" data-rel="popup" data-position-to="window">
<img src="img/lock_based_tree_drawer.png" alt="lock free tree demo" style="width:50%"></a>
<div data-role="popup" id="lock-free-tree">
<a href="#pageone" data-rel="back" class="ui-btn ui-corner-all ui-shadow ui-btn-a ui-icon-delete ui-btn-icon-notext ui-btn-right">Close</a><img src="img/lockfree_tree.jpg" style="width:100%;height:100%;" alt="lock free tree demo">
</div>
</td>
</tr>
</tbody>
</table>
<br />
Then I did experiments for comparing the performance of these two kinds of synchronizations. For every experiment, I ran my program 10 times. I collected results from eclipse console output, and imported the data to excel. Finally I calculated the averages and plotted the line charts in excel.
<h3>Insert 1000 nodes to an empty red-black tree</h3>
In this experiment I totally inserted 1000 random nodes to an empty red-black tree by using different number of threads. For example, when the number of threads is 10, it means each thread inserts 100 nodes to the tree.<br />
<h4>Results for Lock-Based Synchronization</h4>
<table class="table table-sm">
<thead>
<tr>
<th>
Threads #
</th>
<th colspan="10">
Time Cost (ms)
</th>
<th>
Average
</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>21</td><td>18</td><td>21</td><td>21</td><td>22</td><td>22</td><td>21</td><td>23</td> <td>19</td><td>21</td><td>20.9</td>
</tr>
<tr>
<td>2</td>
<td>26</td><td>25</td><td>26</td><td>26</td><td>27</td><td>26</td><td>26</td><td>28</td><td>27</td><td>29</td><td>26.6</td>
</tr>
<tr>
<td>4</td>
<td>28</td><td>28</td><td>34</td><td>26</td><td>28</td><td>30</td><td>30</td><td>33</td><td>33</td><td>26</td><td>29.6</td>
</tr>
<tr>
<td>5</td>
<td>32</td><td>30</td><td>28</td><td>27</td><td>24</td><td>25</td><td>30</td><td>30</td><td>27</td><td>28</td><td>28.1</td>
</tr>
<tr>
<td>8</td>
<td>28</td><td>29</td><td>29</td><td>26</td><td>27</td><td>25</td><td>25</td><td>28</td><td>28</td><td>27</td><td>27.2</td>
</tr>
<tr>
<td>10</td>
<td>25</td><td>29</td><td>27</td><td>26</td><td>28</td><td>30</td><td>30</td><td>26</td><td>29</td><td>28</td><td>27.8</td>
</tr>
<tr>
<td>20</td>
<td>28</td><td>30</td><td>34</td><td>31</td><td>36</td><td>29</td><td>27</td><td>31</td><td>30</td><td>30</td><td>30.6</td>
</td>
</tbody>
</table>
<h4>Results for Lock-Free Synchronization</h4>
<table class="table table-sm">
<thead>
<tr>
<th>
Threads #
</th>
<th colspan="10">
Time Cost (ms)
</th>
<th>
Average
</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>15</td><td>15</td><td>16</td><td>14</td><td>16</td><td>19</td><td>17</td><td>12</td><td>18</td><td>12</td><td>15.4</td>
</tr>
<tr>
<td>2</td>
<td>13</td><td>9</td><td>12</td><td>12</td><td>14</td><td>14</td><td>12</td><td>17</td><td>12</td><td>19</td><td>13.4</td>
</tr>
<tr>
<td>4</td>
<td>15</td><td>18</td><td>30</td><td>14</td><td>10</td><td>12</td><td>16</td><td>19</td><td>11</td><td>26</td><td>17.1</td>
</tr>
<tr>
<td>5</td>
<td>22</td><td>31</td><td>16</td><td>20</td><td>17</td><td>15</td><td>11</td><td>26</td><td>28</td><td>10</td><td>19.6</td>
</tr>
<tr>
<td>8</td>
<td>13</td><td>17</td><td>13</td><td>15</td><td>21</td><td>32</td><td>15</td><td>31</td><td>14</td><td>39</td><td>21</td>
</tr>
<tr>
<td>10</td>
<td>20</td><td>30</td><td>13</td><td>13</td><td>55</td><td>27</td><td>10</td><td>21</td><td>27</td><td>17</td><td>23.3</td>
</tr>
<tr>
<td>20</td>
<td>15</td><td>14</td><td>48</td><td>16</td><td>29</td><td>13</td><td>12</td><td>19</td><td>15</td><td>20</td><td>20.1</td>
</tr>
</tbody>
</table>
<h4>Line Chart for Comparing the Average Time Costs</h4>
<img src="img/insertion_nodes_pic.png" alt="insertion node" width="80%" height="400" align="middle">
<h3>Search 10000 times from the tree</h3>
I used different number of threads to do 10000 times search on the 1000-node red-black tree I built from the above experiment. <br />
<h4>Results for Lock-Based Synchronization</h4>
<table class="table table-sm">
<thead>
<tr>
<th>
Threads #
</th>
<th colspan="10">
Time Cost (ms)
</th>
<th>
Average
</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>8</td><td>10</td><td>10</td><td>9</td><td>10</td><td>8</td><td>10</td><td>10</td><td>11</td><td>11</td><td>9.7</td>
</tr>
<tr>
<td>2</td>
<td>18</td><td>25</td><td>22</td><td>20</td><td>21</td><td>18</td><td>18</td><td>24</td><td>23</td><td>25</td><td>21.4</td>
</tr>
<tr>
<td>4</td>
<td>43</td><td>46</td><td>34</td><td>53</td><td>43</td><td>46</td><td>69</td><td>52</td><td>78</td><td>47</td><td>51.1</td>
</tr>
<tr>
<td>5</td>
<td>43</td><td>54</td><td>47</td><td>42</td><td>40</td><td>54</td><td>65</td><td>46</td><td>39</td><td>49</td><td>47.9</td>
</tr>
<tr>
<td>8</td>
<td>65</td><td>53</td><td>54</td><td>87</td><td>60</td><td>59</td><td>60</td><td>56</td><td>54</td><td>72</td><td>62</td>
</tr>
<tr>
<td>10</td>
<td>59</td><td>74</td><td>79</td><td>64</td><td>93</td><td>68</td><td>62</td><td>59</td><td>69</td><td>63</td><td>69</td>
</tr>
<tr>
<td>20</td>
<td>93</td><td>83</td><td>128</td><td>94</td><td>94</td><td>87</td><td>86</td><td>102</td><td>99</td><td>112</td><td>97.8</td>
</tr>
</tbody>
</table>
<h4>Results for Lock-Free Synchronization</h4>
<table class="table table-sm">
<thead>
<tr>
<th>
Threads #
</th>
<th colspan="10">
Time Cost (ms)
</th>
<th>
Average
</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>6</td><td>7</td><td>6</td><td>6</td><td>7</td><td>4</td><td>8</td><td>5</td><td>6</td><td>8</td><td>6.3</td>
</tr>
<tr>
<td>2</td>
<td>7</td><td>7</td><td>8</td><td>7</td><td>9</td><td>8</td><td>7</td><td>6</td><td>8</td><td>9</td><td>7.6</td>
</tr>
<tr>
<td>4</td>
<td>16</td><td>39</td><td>11</td><td>31</td><td>23</td><td>16</td><td>18</td><td>26</td><td>16</td><td>34</td><td>23</td>
</tr>
<tr>
<td>5</td>
<td>14</td><td>44</td><td>16</td><td>21</td><td>43</td><td>47</td><td>39</td><td>13</td><td>20</td><td>20</td><td>27.7</td>
</tr>
<tr>
<td>8</td>
<td>23</td><td>20</td><td>34</td><td>75</td><td>20</td><td>31</td><td>18</td><td>30</td><td>22</td><td>29</td><td>30.2</td>
</tr>
<tr>
<td>10</td>
<td>47</td><td>98</td><td>31</td><td>28</td><td>95</td><td>20</td><td>23</td><td>43</td><td>31</td><td>22</td><td>43.8</td>
</tr>
<tr>
<td>20</td>
<td>45</td><td>37</td><td>63</td><td>40</td><td>48</td><td>38</td><td>38</td><td>47</td><td>43</td><td>50</td><td>44.9</td>
</tr>
</tbody>
</table>
<h4>Line Chart for Comparing the Average Time Costs</h4>
<img src="img/search_nodes_pic.png" alt="search node" width="80%" height="400" align="middle">
<h3>Use 4 threads to insert different number of nodes</h3>
Then I fixed the number of threads and changed the number of nodes that each thread needs to insert. Combining results from the above two experiments, I noticed that when using 4 threads we have the largest differences for insertion and search. Therefore, I fixed the number of threads to 4 for this experiment. And I will use the set of data to roughly calculate the constants for Big-O nations of these two algorithms.<br />
<h4>Results for Lock-Based Synchronization</h4>
<table class="table table-sm">
<thead>
<tr>
<th>
Nodes #
</th>
<th colspan="10">
Time Cost (ms)
</th>
<th>
Average
</th>
</tr>
</thead>
<tbody>
<tr>
<td>10</td>
<td>2</td><td>2</td><td>2</td><td>3</td><td>2</td><td>2</td><td>3</td><td>2</td><td>3</td><td>4</td><td>2.5</td>
</tr>
<tr>
<td>20</td>
<td>3</td><td>3</td><td>4</td><td>3</td><td>3</td><td>4</td><td>2</td><td>3</td><td>4</td><td>2</td><td>3.1</td>
</tr>
<tr>
<td>50</td>
<td>5</td><td>4</td><td>5</td><td>5</td><td>5</td><td>6</td><td>5</td><td>5</td><td>6</td><td>6</td><td>5.2</td>
</tr>
<tr>
<td>100</td>
<td>10</td><td>11</td><td>10</td><td>10</td><td>9</td><td>9</td><td>11</td><td>9</td><td>10</td><td>11</td><td>10</td>
</tr>
<tr>
<td>200</td>
<td>23</td><td>20</td><td>22</td><td>21</td><td>22</td><td>21</td><td>20</td><td>24</td><td>27</td><td>20</td><td>22</td>
</tr>
<tr>
<td>500</td>
<td>68</td><td>67</td><td>69</td><td>81</td><td>66</td><td>71</td><td>68</td><td>69</td><td>80</td><td>71</td><td>71</td>
</tr>
<tr>
<td>1000</td>
<td>168</td><td>167</td><td>177</td><td>161</td><td>164</td><td>162</td><td>198</td><td>185</td><td>174</td><td>178</td><td>173.4</td>
</tr>
</tbdoy>
</table>
<h4>Results for Lock-Free Synchronization</h4>
<table class="table table-sm">
<thead>
<tr>
<th>
Nodes #
</th>
<th colspan="10">
Time Cost (ms)
</th>
<th>
Average
</th>
</tr>
</thead>
<tbody>
<tr>
<td>10</td>
<td>2</td><td>2</td><td>2</td><td>2</td><td>1</td><td>2</td><td>2</td><td>1</td><td>1</td><td>2</td><td>1.7</td>
</tr>
<tr>
<td>20</td>
<td>2</td><td>3</td><td>4</td><td>2</td><td>2</td><td>3</td><td>2</td><td>2</td><td>2</td><td>2</td><td>2.4</td>
</tr>
<tr>
<td>50</td>
<td>4</td><td>3</td><td>5</td><td>3</td><td>4</td><td>4</td><td>4</td><td>4</td><td>6</td><td>3</td><td>4</td>
</tr>
<tr>
<td>100</td>
<td>6</td><td>7</td><td>6</td><td>6</td><td>5</td><td>8</td><td>5</td><td>10</td><td>5</td><td>6</td><td>6.4</td>
</tr>
<tr>
<td>200</td>
<td>9</td><td>12</td><td>13</td><td>13</td><td>11</td><td>13</td><td>7</td><td>18</td><td>8</td><td>16</td><td>12</td>
</tr>
<tr>
<td>500</td>
<td>17</td><td>19</td><td>21</td><td>23</td><td>17</td><td>17</td><td>25</td><td>23</td><td>18</td><td>37</td><td>21.7</td>
</tr>
<tr>
<td>1000</td>
<td>23</td><td>28</td><td>34</td><td>38</td><td>33</td><td>29</td><td>40</td><td>36</td><td>33</td><td>32</td><td>32.6</td>
</tr>
</tbody>
</table>
<h4>Line Chart for Comparing the Average Time Costs</h4>
<img src="img/4_threads.png" alt="4 threads" width="80%" height="400" align="middle">
<h4>Roughly Estimate the Constants for Big-O Notation</h4>
<p>
I roughly calculate the constants for Big-O Notations of these two algorithms by O(log<i>n</i> + log<i>k</i>), where k is the number of threads. I used Excel to find the logarithmic trendlines for these these two datasets. As I found, at inserting 4000 nodes, it causes outliers of the trendlines, so I just removed that data point from my dataset, and plotted the following figures. Based on my calculation, the constant for lock-based synchronization is around 30.72, and the constant for lock-free synchronization is about 6.43. The lock-free synchronization has a much smaller constant which means lock-free synchronization performs better than lock-based synchronization.
</p>
<table>
<thead>
<tr>
<th>
Lock-Based Synchronization
</th>
<th>
Lock-Free Synchronization
</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<img src="img/lock_based_trend.png" alt="lock-based trend" width="80%" height="400" align="middle">
</td>
<td>
<img src="img/lock_free_trend.png" alt="lock-free trend" width="80%" height="400" align="middle">
</td>
</tr>
</tbody>
</table>
<div class="row">
<h3>How to run my code</h3>
<p>
<img src="img/code_layout.png" alt="code layout" width="30%" height="300" align="right">
You can checkout the code from my GitHub <a href="https://github.com/xuezhaokun/150-algorithm">repository</a>. Then you can just import the project to Eclipse to test it. The figure on the right shows the content of my program. The main function is in "TestTrees.java" file. You can change "<i>num_threads</i>" and "<i>insert_nodes_per_thread</i>" parameters to test the performance for different number of threads and different number of nodes. You can also change "visulize_locked_tree" and "visulize_lock_free_tree" parameters to enable or disable the visualization for these two trees. Instead, you can just check the video below. It demonstrates how to run some sample experiments with my program. <br />
<iframe width="65%" height="450" src="https://www.youtube.com/embed/PoNhaIdfvOE" frameborder="0" allowfullscreen></iframe>
</p>
</div>
</div>
<div class="container">
<div class="row">
<h2>Conclusion and Future Work</h2>
<p>
Based on my implementation and experiment results, we can reach the conclusion that lock-free algorithms have a better performance on <i>search</i> and <i>insertion</i> operations on relatively large red-black tree. From the line charts, we can conclude that when the tree gets larger, the performance difference would be more obvious. For future work, I think we can work on finding the hidden constant in the Big-O notation for these two algorithms, so that we can compare the performance of these two algorithms theoretically. The other thing we can work on is to implement lock-free algorithm for <i>deletion</i> operation. Since red-black trees store information not only on leaf nodes but also on internal nodes, <i>deletion</i> operation for red-black trees would result in a more complex structural change, which makes the implementation much harder. Another thing we can do in the future is to find a way to visualize the process for lock-free insertion, which could give users a more straightforward demonstration on how these concurrents work. In addition, in Kim's paper, he proposed an improved version on Ma's algorithm, which needs fewer CAS operations by defining a larger "local area". We might try out his method, and compare its performance with the original Ma's algorithm. We can also study the better concurrent control method "wait-free" to improve the performance for lock-free synchronization.
</p>
</div>
</div>
<div class="container">
<div class="row">
<h2>References</h2>
<ol>
<li>
Park, Heejin, and Kunsoo Park. "Parallel algorithms for red–black trees." Theoretical Computer Science 262.1-2 (2001): 415-435.
</li>
<li>
Kim, Jong Ho, Helen Cameron, and Peter Graham. "Lock-free red-black trees using cas." Concurrency and Computation: Practice and Experience(2006): 1-40.
</li>
<li>
Jianwen Ma. Lock-Free Insertions on Red-Black Trees. MSc thesis, University of Manitoba, October,2003.
</li>
<li>
Natarajan, Aravind, Lee H. Savoie, and Neeraj Mittal. "Concurrent wait-free red black trees." Symposium on Self-Stabilizing Systems. Springer International Publishing, 2013.
</li>
<li>
GitHub TreeGui: <a href="https://github.com/EslaMx7/AI-Tasks-JADE-Tests/blob/master/src/trees/tasks/treeGUI.java">https://github.com/EslaMx7/AI-Tasks-JADE-Tests/blob/master/src/trees/tasks/treeGUI.java</a>
</li>
<li>
Red-Black Tree figure: <a href="https://upload.wikimedia.org/wikipedia/commons/6/66/Red-black_tree_example.svg">https://upload.wikimedia.org/wikipedia/commons/6/66/Red-black_tree_example.svg</a>
</li>
<li>
Rotation GIF figure: <a href="https://upload.wikimedia.org/wikipedia/commons/3/31/Tree_rotation_animation_250x250.gif">https://upload.wikimedia.org/wikipedia/commons/3/31/Tree_rotation_animation_250x250.gif</a>
</li>
</ol>
</div>
</div>
</body>
</html>