-
Notifications
You must be signed in to change notification settings - Fork 0
/
index.html
1603 lines (1314 loc) · 96.1 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width">
<meta name="theme-color" content="#222"><meta name="generator" content="Hexo 7.3.0">
<link rel="apple-touch-icon" sizes="180x180" href="/images/apple-touch-icon-next.png">
<link rel="icon" type="image/png" sizes="32x32" href="/images/favicon-32x32-next.png">
<link rel="icon" type="image/png" sizes="16x16" href="/images/favicon-16x16-next.png">
<link rel="mask-icon" href="/images/logo.svg" color="#222">
<link rel="stylesheet" href="/css/main.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.5.2/css/all.min.css" integrity="sha256-XOqroi11tY4EFQMR9ZYwZWKj5ZXiftSx36RRuC3anlA=" crossorigin="anonymous">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/animate.css/3.1.1/animate.min.css" integrity="sha256-PR7ttpcvz8qrF57fur/yAx1qXMFJeJFiA6pSzWi0OIE=" crossorigin="anonymous">
<script class="next-config" data-name="main" type="application/json">{"hostname":"haorongx.github.io","root":"/","images":"/images","scheme":"Mist","darkmode":false,"version":"8.20.0","exturl":false,"sidebar":{"position":"left","width_expanded":320,"width_dual_column":240,"display":"post","padding":18,"offset":12},"copycode":{"enable":false,"style":null},"fold":{"enable":false,"height":500},"bookmark":{"enable":false,"color":"#222","save":"auto"},"mediumzoom":false,"lazyload":false,"pangu":false,"comments":{"style":"tabs","active":null,"storage":true,"lazyload":false,"nav":null},"stickytabs":false,"motion":{"enable":true,"async":false,"transition":{"menu_item":"fadeInDown","post_block":"fadeIn","post_header":"fadeInDown","post_body":"fadeInDown","coll_header":"fadeInLeft","sidebar":"fadeInUp"}},"i18n":{"placeholder":"Searching...","empty":"We didn't find any results for the search: ${query}","hits_time":"${hits} results found in ${time} ms","hits":"${hits} results found"},"path":"/search.xml","localsearch":{"enable":true,"trigger":"auto","top_n_per_article":1,"unescape":false,"preload":false}}</script><script src="/js/config.js"></script>
<meta property="og:type" content="website">
<meta property="og:title" content="Haorong's Blog">
<meta property="og:url" content="https://haorongx.github.io/index.html">
<meta property="og:site_name" content="Haorong's Blog">
<meta property="og:locale" content="en_US">
<meta property="article:author" content="Haorong Xu">
<meta property="article:tag" content="Computer Science, Database, DBMS">
<meta name="twitter:card" content="summary">
<link rel="canonical" href="https://haorongx.github.io/">
<script class="next-config" data-name="page" type="application/json">{"sidebar":"","isHome":true,"isPost":false,"lang":"en","comments":"","permalink":"","path":"index.html","title":""}</script>
<script class="next-config" data-name="calendar" type="application/json">""</script>
<title>Haorong's Blog</title>
<noscript>
<link rel="stylesheet" href="/css/noscript.css">
</noscript>
<!-- hexo injector head_end start -->
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.12.0/dist/katex.min.css">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/hexo-math@4.0.0/dist/style.css">
<!-- hexo injector head_end end --></head>
<body itemscope itemtype="http://schema.org/WebPage" class="use-motion">
<div class="headband"></div>
<main class="main">
<div class="column">
<header class="header" itemscope itemtype="http://schema.org/WPHeader"><div class="site-brand-container">
<div class="site-nav-toggle">
<div class="toggle" aria-label="Toggle navigation bar" role="button">
<span class="toggle-line"></span>
<span class="toggle-line"></span>
<span class="toggle-line"></span>
</div>
</div>
<div class="site-meta">
<a href="/" class="brand" rel="start">
<i class="logo-line"></i>
<h1 class="site-title">Haorong's Blog</h1>
<i class="logo-line"></i>
</a>
<p class="site-subtitle" itemprop="description">To remove all barrieres in the way of computer science.</p>
</div>
<div class="site-nav-right">
<div class="toggle popup-trigger" aria-label="Search" role="button">
<i class="fa fa-search fa-fw fa-lg"></i>
</div>
</div>
</div>
<nav class="site-nav">
<ul class="main-menu menu"><li class="menu-item menu-item-home"><a href="/" rel="section"><i class="fa fa-home fa-fw"></i>Home</a></li><li class="menu-item menu-item-about"><a href="/about/" rel="section"><i class="fa fa-user fa-fw"></i>About</a></li><li class="menu-item menu-item-tags"><a href="/tags/" rel="section"><i class="fa fa-tags fa-fw"></i>Tags</a></li><li class="menu-item menu-item-categories"><a href="/categories/" rel="section"><i class="fa fa-th fa-fw"></i>Categories</a></li><li class="menu-item menu-item-archives"><a href="/archives/" rel="section"><i class="fa fa-archive fa-fw"></i>Archives</a></li>
<li class="menu-item menu-item-search">
<a role="button" class="popup-trigger"><i class="fa fa-search fa-fw"></i>Search
</a>
</li>
</ul>
</nav>
<div class="search-pop-overlay">
<div class="popup search-popup"><div class="search-header">
<span class="search-icon">
<i class="fa fa-search"></i>
</span>
<div class="search-input-container">
<input autocomplete="off" autocapitalize="off" maxlength="80"
placeholder="Searching..." spellcheck="false"
type="search" class="search-input">
</div>
<span class="popup-btn-close" role="button">
<i class="fa fa-times-circle"></i>
</span>
</div>
<div class="search-result-container no-result">
<div class="search-result-icon">
<i class="fa fa-spinner fa-pulse fa-5x"></i>
</div>
</div>
</div>
</div>
</header>
<aside class="sidebar">
<div class="sidebar-inner sidebar-overview-active">
<ul class="sidebar-nav">
<li class="sidebar-nav-toc">
Table of Contents
</li>
<li class="sidebar-nav-overview">
Overview
</li>
</ul>
<div class="sidebar-panel-container">
<!--noindex-->
<div class="post-toc-wrap sidebar-panel">
</div>
<!--/noindex-->
<div class="site-overview-wrap sidebar-panel">
<div class="site-author animated" itemprop="author" itemscope itemtype="http://schema.org/Person">
<img class="site-author-image" itemprop="image" alt="Haorong Xu"
src="https://avatars.githubusercontent.com/u/103825064?v=4">
<p class="site-author-name" itemprop="name">Haorong Xu</p>
<div class="site-description" itemprop="description"></div>
</div>
<div class="site-state-wrap animated">
<nav class="site-state">
<div class="site-state-item site-state-posts">
<a href="/archives/">
<span class="site-state-item-count">7</span>
<span class="site-state-item-name">posts</span>
</a>
</div>
<div class="site-state-item site-state-categories">
<a href="/categories/">
<span class="site-state-item-count">5</span>
<span class="site-state-item-name">categories</span></a>
</div>
<div class="site-state-item site-state-tags">
<a href="/tags/">
<span class="site-state-item-count">8</span>
<span class="site-state-item-name">tags</span></a>
</div>
</nav>
</div>
<div class="links-of-author animated">
<span class="links-of-author-item">
<a href="mailto:db_haorong@outlook.com" title="E-Mail → mailto:db_haorong@outlook.com" rel="noopener me" target="_blank"><i class="fa fa-envelope fa-fw"></i>E-Mail</a>
</span>
</div>
</div>
</div>
</div>
</aside>
</div>
<div class="main-inner index posts-expand">
<div class="post-block">
<article itemscope itemtype="http://schema.org/Article" class="post-content" lang="">
<link itemprop="mainEntityOfPage" href="https://haorongx.github.io/2024/08/17/Intro-to-Relational-Model-Relational-Algebra-Where-journey-begins/">
<span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
<meta itemprop="image" content="https://avatars.githubusercontent.com/u/103825064?v=4">
<meta itemprop="name" content="Haorong Xu">
</span>
<span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
<meta itemprop="name" content="Haorong's Blog">
<meta itemprop="description" content="">
</span>
<span hidden itemprop="post" itemscope itemtype="http://schema.org/CreativeWork">
<meta itemprop="name" content="undefined | Haorong's Blog">
<meta itemprop="description" content="">
</span>
<header class="post-header">
<h2 class="post-title" itemprop="name headline">
<a href="/2024/08/17/Intro-to-Relational-Model-Relational-Algebra-Where-journey-begins/" class="post-title-link" itemprop="url">Intro to Relational Model & Relational Algebra: Where journey begins</a>
</h2>
<div class="post-meta-container">
<div class="post-meta">
<span class="post-meta-item">
<span class="post-meta-item-icon">
<i class="far fa-calendar"></i>
</span>
<span class="post-meta-item-text">Posted on</span>
<time title="Created: 2024-08-17 10:30:50" itemprop="dateCreated datePublished" datetime="2024-08-17T10:30:50+08:00">2024-08-17</time>
</span>
<span class="post-meta-item">
<span class="post-meta-item-icon">
<i class="far fa-calendar-check"></i>
</span>
<span class="post-meta-item-text">Edited on</span>
<time title="Modified: 2024-08-19 20:49:42" itemprop="dateModified" datetime="2024-08-19T20:49:42+08:00">2024-08-19</time>
</span>
<span class="post-meta-item">
<span class="post-meta-item-icon">
<i class="far fa-folder"></i>
</span>
<span class="post-meta-item-text">In</span>
<span itemprop="about" itemscope itemtype="http://schema.org/Thing">
<a href="/categories/DBMS/" itemprop="url" rel="index"><span itemprop="name">DBMS</span></a>
</span>
</span>
</div>
</div>
</header>
<div class="post-body" itemprop="articleBody">
<p>In 1970, a ground-breaking paper by Edgar F. Codd <span
class="math inline">\(^{[1]}\)</span> was published. This paper proposed
the relational model and non-procedural way to query data in relational
model, which led to the birth of RDBMS. In this article, we review some
basic concepts in relational model, and briefly introduce relational
algebra.</p>
<h2 id="relational-model">Relational Model</h2>
<p>Data models describe how data is organized in a database. A
relational database, as you are probably aware of, mainly consists of a
set of tables, which have mutiple columns(attributes) and rows(tuples).
As highlited in <em>Database System Concepts</em><span
class="math inline">\(^{[2]}\)</span> that</p>
<blockquote>
<p>A row in a table represents a relationship among a set of values.
Since a table is a collection of such relationships, there is a close
correspondence between the concept of table and the mathematical concept
of relation...</p>
</blockquote>
<p>However, the definition of <strong>relation</strong> is slightly
different. Considering a <em>students</em> table which consists of three
columns: <em>ID, dept., and grade</em>. For every single attribute, its
domain is the set of all acceptable values. <em>e.g.</em>, for attribute
<em>grade</em>, its domain <span class="math inline">\(D\)</span> might
consists of <span class="math inline">\(\{A*, A, B, C, D, U\}\)</span>.
And a relation <span class="math inline">\(R\)</span> is a subset of the
cartesian product of its columns. <em>Cartesian product</em> basically
means all possible combinations. Using the example of <em>students</em>
table, domains of its attributes might be:</p>
<ul>
<li><span class="math inline">\(D_{ID} = {x: x = n , n \in N, n \geq
0}\)</span></li>
<li><span class="math inline">\(D_{dept} = {Engineering, Business,
Computer \space Science, Social \space Science}\)</span></li>
<li><span class="math inline">\(D_{grade} = \{A*, A, B, C, D,
U\}\)</span></li>
</ul>
<p>So the cartesian product includes: <span class="math inline">\(<0,
Computer \space Science, A*>, <2, Social \space Science, U>,
<3, Bussiness, B>\)</span> etc.</p>
<h3 id="superkey-cadidate-key-primary-key">Superkey, Cadidate Key,
Primary Key</h3>
<p>A superkey refers to a set of attributes which uniquely identifies
each tuple in a relation. For example, in our previous <em>students</em>
relation, ID is a superkey since each student is assigned with a unique
ID.</p>
<p>If a superkey has no proper subset such that it's also a superkey of
this relation, then this superkey is also called a <strong>cadidate
key</strong>. (i.e. A cadidate key is a superkey which contains of the
minimum number of attributes)</p>
<p>A primary key simply means a cadidate key which is chosen by the user
to identify tuples within a relation. Clearly, there cannot be two
completely same primary keys at the same time.(This is called
primary-key constraint)</p>
<h3 id="foreign-key">Foreign Key</h3>
<p>A foreign-key constraint from <span
class="math inline">\(R_1.A\)</span> (attribute <span
class="math inline">\(A\)</span> of relation <span
class="math inline">\(R_1\)</span>) to <span
class="math inline">\(R_2.B\)</span>(B is the primary key of <span
class="math inline">\(R_2\)</span>) states that every single value of
<span class="math inline">\(R_1.A\)</span> must exist in <span
class="math inline">\(R_2.B\)</span>. Attribute <span
class="math inline">\(R_1.A\)</span> is named the <strong>foreign
key</strong> from <span class="math inline">\(R_1.A\)</span> referencing
<span class="math inline">\(R_2.B \space ^{[1]}\)</span>.</p>
<p>e.g. Considering our <em>students</em> table, let's say we have
another <em>departments</em> table, which is defined as
<code>departments(name, staff, students, avg_salary)</code>, and
<code>name</code> attribute is the primary key of this relation. Then
<em>dept</em> column in <em>students</em> table is a foreign key
referencing to <em>name</em> column in <em>departments</em> table.</p>
<h2 id="relational-algebra">Relational Algebra</h2>
<p>The relational algebra mainly contains a set of operations which take
one or two relation(s) as input and produce a new relation (Operations
that only take one relation as input is called <strong>unary</strong>
operations. Similarly, operations that takes two relations are named
<strong>binary</strong> operations). These operators were first defined
in Edgar F. Codd's paper published in 1970([1]), and summarized in his
another paper([3]). In the following sections, a brief description of
each operation from his paper will present, and further explaination
will be added.</p>
<h3 id="select-operation">Select Operation</h3>
<blockquote>
<p>The <strong>select</strong> operator of the relational algebra takes
one relation as operand and produces a new relation consisting of
selected tuples of the first.<span
class="math inline">\(^{[3]}\)</span></p>
</blockquote>
<p>The select operator is denoted as <em>sigma</em> <span
class="math inline">\(\sigma\)</span>. For example, if we want to find
tuples in the <em>students</em> table which ID equals to 1, we
write:</p>
<p><span class="math display">\[ \sigma_{ID=1}(students) \]</span></p>
<p>Moreover, we can use logic connectives to combine mutiple predicates,
for example: (Common logic connectives include AND <span
class="math inline">\(\land\)</span>, OR <span
class="math inline">\(\lor\)</span>, and NOT <span
class="math inline">\(\lnot\)</span>)</p>
<p><span class="math display">\[ \sigma_{ID \leq 500 \space \land \space
Dept = "Engineering"}(students) \]</span></p>
<h3 id="project-operation">Project Operation</h3>
<blockquote>
<p>The PROJECT operator(<span class="math inline">\(\Pi\)</span>) also
transforms one relation (table) into a new one, this time however
consisting of selected attributes (columns) of the first. <span
class="math inline">\(^{[3]}\)</span></p>
</blockquote>
<p>e.g. <span class="math display">\[\Pi_{ID,
Dept}(students)\]</span></p>
<p>This operation gives us the <em>students</em> table without
<em>grade</em> column.</p>
<p>This operator is pretty straightforward, but it's worth addressed
that this operator also takes expressions involving attributes as input,
for example:</p>
<p><span class="math display">\[\Pi_{ID*2, Dept}(students)\]</span></p>
<p>By performing this operation, every student's ID is timed by 2.</p>
<h3 id="join-operation">Join Operation</h3>
<p>Sometimes we have to combine information in two different relations.
For instance, if we want to find the information of the department of
each student, then we have to combine relation <em>students</em> and
<em>departments</em>. To extract information in both relations, we can
apply <strong>cartesian product</strong>(denoted as <span
class="math inline">\(\times\)</span>), which combines each tuple in a
relation with each tuple in another. Considering the following two
relations, <em>students</em> and <em>departments</em>:</p>
<p><img src="/images/intro_to_relational_model_students.png" /> <img
src="/images/intro_to_relational_model_dept.png" /></p>
<p>Then their cartesian product will be:</p>
<p><img src="/images/intro_to_relational_model_cartproduct.png" /></p>
<p>However, the highlighted tuples shown in the figure is obviously
worthless since these tuples connected irrelevent rows together.(We only
want the information of a student and their corresponding departments)
To get rid of the mismatching data, we shall add a predicate:</p>
<p><span class="math display">\[\sigma_{students.dept =
departments.name}(students\times departments)\]</span></p>
<p>The join operation allows us to combine a selection and a Cartesian
product into a single operation.<span
class="math inline">\(^{[2]}\)</span> The join operation(<span
class="math inline">\(\bowtie\)</span>) is defined as:</p>
<p><span class="math display">\[R_1 \bowtie_\theta R_2 =
\sigma_\theta(R_1 \times R_2)\]</span></p>
<p>(<span class="math inline">\(R_1, R_2\)</span> are two relations
taken as operands. <span class="math inline">\(\theta\)</span>
represents a predicate on attributes in <span
class="math inline">\(R_1\)</span> and <span
class="math inline">\(R_2\)</span>.)</p>
<p>Thus our previous operation can be rewritten as</p>
<p><span class="math display">\[students\bowtie_{students.dept =
departments.name} departments\]</span></p>
<h3 id="set-operations">Set Operations</h3>
<p>Set operations mainly include <strong>union</strong>,
<strong>intersection</strong>, and <strong>set-difference</strong>.</p>
<p>The union operation allows us to combine two sets of tuples, which is
denoted as <span class="math inline">\(\cup\)</span>. e.g.</p>
<p><span class="math display">\[\sigma_{ID=1}(students) \cup
\sigma_{dept=Fine\space Arts}(students)\]</span></p>
<p>This expression finds students that have ID 1 or study Fine Arts.</p>
<p>Similarly, the intersection operation(<span
class="math inline">\(\cap\)</span>) allows us to find tuples that exist
in both relations, e.g.</p>
<p><span class="math display">\[\sigma_{ID=1}(students) \cap
\sigma_{dept=Fine\space Arts}(students)\]</span></p>
<p>We also have the set-difference operator(<span
class="math inline">\(-\)</span>) which finds tuples that exist in first
relation but not in the second.</p>
<p><span class="math display">\[\sigma_{dept=Fine\space Arts}(students)
- \sigma_{ID=1}(students)\]</span></p>
<p>This expression finds students who study Fine Arts but have ID not
equal to 1.</p>
<h3 id="assignment-operation">Assignment Operation</h3>
<p>The assignment operation(<span
class="math inline">\(\leftarrow\)</span>) allows you to give a
relational expression a different name for your convenience, for
instance:</p>
<p><span class="math display">\[ StudentsWhoLearnCS \leftarrow
\sigma_{dept=CS} (students)\]</span></p>
<p>And then you can refer to students who study at CS department as
<span class="math inline">\(StudentsWhoLearnCS\)</span>.</p>
<h2 id="references">References</h2>
<ul>
<li><strong>[1]</strong> Edgar F. Codd, 1970. A Relational Model of Data
for Large Shared Data Banks. Communications of the ACM 13, 377–387. <a
target="_blank" rel="noopener" href="https://doi.org/10.1145/362384.362685"
class="uri">https://doi.org/10.1145/362384.362685</a></li>
<li><strong>[2]</strong> Abraham Silberschatz, Henry F. Korth, S.
Sudarshan, n.d. Database System Concepts, 7th Edition. ed.</li>
<li><strong>[3]</strong> Codd, E.F., 1982. Relational database: a
practical foundation for productivity. Commun. ACM 25, 109–117. <a
target="_blank" rel="noopener" href="https://doi.org/10.1145/358396.358400"
class="uri">https://doi.org/10.1145/358396.358400</a></li>
</ul>
</div>
<footer class="post-footer">
<div class="post-eof"></div>
</footer>
</article>
</div>
<div class="post-block">
<article itemscope itemtype="http://schema.org/Article" class="post-content" lang="">
<link itemprop="mainEntityOfPage" href="https://haorongx.github.io/2024/08/10/Column-stores-vs-row-stores-how-different-are-they-really-A-Brief-Summary/">
<span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
<meta itemprop="image" content="https://avatars.githubusercontent.com/u/103825064?v=4">
<meta itemprop="name" content="Haorong Xu">
</span>
<span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
<meta itemprop="name" content="Haorong's Blog">
<meta itemprop="description" content="">
</span>
<span hidden itemprop="post" itemscope itemtype="http://schema.org/CreativeWork">
<meta itemprop="name" content="undefined | Haorong's Blog">
<meta itemprop="description" content="">
</span>
<header class="post-header">
<h2 class="post-title" itemprop="name headline">
<a href="/2024/08/10/Column-stores-vs-row-stores-how-different-are-they-really-A-Brief-Summary/" class="post-title-link" itemprop="url">Column-stores vs. row-stores: how different are they really? : A Brief Summary</a>
</h2>
<div class="post-meta-container">
<div class="post-meta">
<span class="post-meta-item">
<span class="post-meta-item-icon">
<i class="far fa-calendar"></i>
</span>
<span class="post-meta-item-text">Posted on</span>
<time title="Created: 2024-08-10 18:53:28 / Modified: 19:01:07" itemprop="dateCreated datePublished" datetime="2024-08-10T18:53:28+08:00">2024-08-10</time>
</span>
<span class="post-meta-item">
<span class="post-meta-item-icon">
<i class="far fa-folder"></i>
</span>
<span class="post-meta-item-text">In</span>
<span itemprop="about" itemscope itemtype="http://schema.org/Thing">
<a href="/categories/DBMS/" itemprop="url" rel="index"><span itemprop="name">DBMS</span></a>
</span>
</span>
</div>
</div>
</header>
<div class="post-body" itemprop="articleBody">
<h2 id="background-introduction">Background & Introduction</h2>
<p>Column-oriented database systems (column-store or c-store) have been
widely applied in data warehousing and business intelligence
application. Briefly, a column-store database stores data in the same
column contigously and compressed.<span
class="math inline">\(^{[2]}\)</span> Unlike traditional row-store
database systems, which store data belonging to the same record
together. Column-store database is able to execute faster in certain
situations because it avoids fetch unneccessary data (i.e. attributs not
required).</p>
<h2 id="row-store-simulating-column-store">Row-store simulating
column-store</h2>
<p>Although the storage layer and query executor of c-store have
competely different archeitecture, it was thought to be possible to
simulate columnar storage in traditional row-store DBMS without changing
its kernel. There are mainly three ways to achieve this:</p>
<ul>
<li>Vertically partitioning</li>
<li>All - Index plan</li>
<li>Materialized views</li>
</ul>
<p>Now we explore these three techniques and thier disadvantages.</p>
<h3 id="vertically-partitioning">Vertically Partitioning</h3>
<p>Vertically partitioning a table creates a seperate table for each
attribute, which contains a (position key, attribute value) pair. By
doing so database can fetch one attribute every time, thus reduces I/O
redundent. Take an <code>employee</code> table for example.</p>
<p><img src="/images/cstore_verticalpartitioning.png" /></p>
<p>This optimization, however, does not work very effective. The main
problem is that the DBMS needs to join each sub-table upon querying,
which leads to significant decrease in performance. The position keys
stored in every column also wastes disk bandwidth.</p>
<h3 id="all---index">All - Index</h3>
<p>This plan adds a B <span class="math inline">\(^{+}\)</span> -Tree
index on every column of the table. Upon querying, a list of (record-id,
value) pairs generated for each column, which contains records that
satisfy predicates(i.e. WHERE clause) on this column(If no predicate is
given, then it will generates a list of all tuples). Then the DBMS joins
these lists of tuples.</p>
<p>The obvious drawbacks is that the hash join(which is used in this
case) is quite slow. The author claims that theortically the performance
of all-index plan should be close to the virtically partioning plan, but
they cannot set their DBMS properly so it performs way worse than
VP.</p>
<h3 id="materialized-view">Materialized View</h3>
<p>Materialized views are a kind of views which precompute the result of
a given query. Unlike most common views which only store a virtual
table, materialized views allows precomputed data to be stored
physically. Hence, row-store can avoid reading unneccessary attributes
by including only required columns in materialized views. However, using
materialized view requires to know all quries in advance, and
materialized views must be properly maintained when new data is inserted
/ modified.</p>
<h3 id="performances">Performances</h3>
<p>This image from <span class="math inline">\([1]\)</span> illustrates
the performance of basic c-store and r-store. (MV stands for
materialized views enabled)</p>
<p><img src="/images/cstore_performance1.png" /></p>
<p>This image shows the performance differences between all varients of
r-store. ((a) Performance numbers for different variants of the
row-store by query flight. Here, T is traditional, T(B) is traditional
(bitmap), MV is materialized views, VP is vertical partitioning, and AI
is all indexes. (b) Average performance across all queries.)</p>
<p><img src="/images/cstore_performance2.png" /></p>
<p>Clearly, none of the optimizations above can make the performance of
r-store competitive to c-store. Hence, the author concludes that:</p>
<blockquote>
<p>Rather, it is that this simulation performs poorly on today’s
row-store systems (our experiments were performed on a very recent
product release of System X). A successful column-oriented simulation
will require some important system improvements, such as virtual
record-ids, reduced tuple overhead, fast merge joins of sorted data,
run-length encoding across multiple tuples, and some column-oriented
query execution techniques... <span
class="math inline">\(^{[1]}\)</span></p>
</blockquote>
<h2 id="c-store-optimizations">C-store Optimizations</h2>
<p>This article explores several practical ways to improve the
performance of c-store, which include:</p>
<ul>
<li>Compression</li>
<li>Late Materialization</li>
<li>Block Iteration</li>
<li>Invisible Join</li>
</ul>
<h3 id="compression">Compression</h3>
<p>Compression not only saves storage space but also reduces I/O
redundent, thus speeds up query processing. The author argues that
compression is especially effective in c-store because data of the same
type(which is stored in the same column) has lower <em>information
entropy</em><span class="math inline">\(^{[3]}\)</span>. It's also
suggested that compression algorithms which can directly operate on
compressed data is even more effective because decompression can be
avoided.</p>
<h3 id="late-materialization">Late Materialization</h3>
<p>Materialization refers to the process which tuples on the disks are
fetched into RAM, and <strong>late materialization</strong> means to
perform materialization as late as possible. Without late
materialization(i.e. using early materialization strategy instead), a
c-store DBMS fetches columns from the disks and reconstructs tuples(i.e.
combine columns into rows) at early stage in query processing. Late
materialization defers to reconstruct tuples early, instead, it first
applies all predicates and generates a list of positions to represent
required tuples.</p>
<p>Late materialization avoids to fetch unneccessary tuples(since
predicates are applied early). It also ensures that compressed data will
not be decompressed when there's no need to (but if we reconstruct
tuples, then data must be decompressed to combine with data from other
columns).</p>
<h3 id="block-iteration">Block Iteration</h3>
<p>C-store allows all data from a column to be processed together, and
fixed-length tuples allows DBMS to treat data as an array, thus exploits
potention for parallelism on CPU. Futhermore, no data extraction is
required in this case, unlike in r-store which attributes must be
extracted from tuples.</p>
<h3 id="invisible-join">Invisible Join</h3>
<p>In this article, the author proposed a new join technique called
<strong>invisible join</strong>. The process of invisible join is as
follow. (Before we get started, let's now define what <strong>fact
table</strong>, and <strong>dimension table</strong>. Fact tables
contain numerical data, while dimension tables provide context and
background information <span class="math inline">\(^{[4]}\)</span>)</p>
<p>Invisible join first extracts a list of dimension table keys which
satisfy the predicate, and uses these keys to create a hash table to
filter out keys in the fact table.(the following image from the article
illustrates this process)</p>
<p><img src="/images/cstore_invisiblejoin1.png" /></p>
<p>Then, keys in the fact table are fed to the hash table. A list of
positions of accepted tuples is created for each column of the fact
table. Then all the position lists are joined(which only requires a bit
AND).</p>
<p><img src="/images/cstore_invisiblejoin2.png" /></p>
<p>Finally, required data is extracted from fact table and dimension
tables.</p>
<p><img src="/images/cstore_invisiblejoin3.png" /></p>
<h3 id="performances-1">Performances</h3>
<p><img src="/images/cstore_performance3.png" /></p>
<p>((a) Performance numbers for C-Store by query flight with various
optimizations removed. The four letter code indicates the C-Store
configuration: T=tuple-at-a-time processing, t=block processing;
I=invisible join enabled, i=disabled; C=compression enabled, c=disabled;
L=late materialization enabled, l=disabled. (b) Average performance
numbers for C-Store across all queries.)</p>
<p>The image above illustrates the performances of different
optimizations techniques for c-store. Obviously the most siginificant
optimizations are compression and late materialization.</p>
<p>The author also explores the performance of c-store on
prejoined(a.k.a. denormalized) tables(in this case, the fact table is
prejoined with dimension tables). The intention of this experiment was
to explore the cost of predicates application(since the author noticed
that "performance is dominated in the lower parts of the query plan
(predicate application) and that the invisible join technique made join
performance relatively cheap."<span
class="math inline">\(^{[1]}\)</span>) It was assumed that the
performance would be improved since the join process can be avoided.
However, the experiment shows that the assumption is wrong.</p>
<p><img src="/images/cstore_performance4.png" /></p>
<p>Surprisingly, the performance wasn't improved at all (PJ, No C), but
declined significantly. The reason is that invisible join converts
predicates on dimension tables to predicates on fact table foreign keys,
but with prejoined table, predicates are applied directly on strings,
which takes more time to process than integers. Therefore, after strings
are mapped to integers (PJ, Int C), the performance improved
significantly.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Finally, the author concludes that:</p>
<blockquote>
<p>We showed that attempts to emulate the physical layout of a
column-store in a row-store via techniques like vertical partitioning
and index-only plans do not yield good performance. We attribute this
slowness to high tuple reconstruction costs, as well as the high
per-tuple overheads in narrow, vertically partitioned tables. ... The
conclusion of this work is not that simulating a columnstore in a
row-store is impossible. Rather, it is that this simulation performs
poorly on today’s row-store systems (our experiments were performed on a
very recent product release of System X). A successful column-oriented
simulation will require some important system improvements, such as
virtual record-ids, reduced tuple overhead, fast merge joins of sorted
data, run-length encoding across multiple tuples, and some
column-oriented query execution techniques like operating directly on
compressed data, block processing, invisible joins, and late
materialization.</p>
</blockquote>
<h2 id="references">References</h2>
<ul>
<li><p><strong>[1]</strong> Abadi, D.J., Madden, S.R., Hachem, N., 2008.
Column-stores vs. row-stores: how different are they really?, in:
Proceedings of the 2008 ACM SIGMOD International Conference on
Management of Data. Presented at the SIGMOD/PODS ’08: SIGMOD/PODS ’08 -
International Conference on Management of Data, ACM, Vancouver Canada,
pp. 967–980. <a target="_blank" rel="noopener" href="https://doi.org/10/bbx62s"
class="uri">https://doi.org/10/bbx62s</a></p></li>
<li><p><strong>[2]</strong> Abadi, D.J., Boncz, P.A., Harizopoulos, S.,
2009. Column-oriented database systems. Proc. VLDB Endow. 2, 1664–1665.
<a target="_blank" rel="noopener" href="https://doi.org/10/ggmz6g"
class="uri">https://doi.org/10/ggmz6g</a></p></li>
<li><p><strong>[3]</strong> Abadi, D., Madden, S., Ferreira, M., 2006.
Integrating compression and execution in column-oriented database
systems. Proceedings of the 2006 ACM SIGMOD international conference on
Management of data 671–682. <a target="_blank" rel="noopener" href="https://doi.org/10/b3q7nx"
class="uri">https://doi.org/10/b3q7nx</a></p></li>
<li><p><strong>[4]</strong> Simplilearn, 2024. Fact Table vs. Dimension
Table - Differences Between The Two. Simplilearn. <a
target="_blank" rel="noopener" href="https://www.simplilearn.com/fact-table-vs-dimension-table-article"
class="uri">https://www.simplilearn.com/fact-table-vs-dimension-table-article</a></p></li>
</ul>
</div>
<footer class="post-footer">
<div class="post-eof"></div>
</footer>
</article>
</div>
<div class="post-block">
<article itemscope itemtype="http://schema.org/Article" class="post-content" lang="">
<link itemprop="mainEntityOfPage" href="https://haorongx.github.io/2024/08/10/The-Physical-Storage-of-PostgreSQL-Page-Layout/">
<span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
<meta itemprop="image" content="https://avatars.githubusercontent.com/u/103825064?v=4">
<meta itemprop="name" content="Haorong Xu">
</span>
<span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
<meta itemprop="name" content="Haorong's Blog">
<meta itemprop="description" content="">
</span>
<span hidden itemprop="post" itemscope itemtype="http://schema.org/CreativeWork">
<meta itemprop="name" content="undefined | Haorong's Blog">
<meta itemprop="description" content="">
</span>
<header class="post-header">
<h2 class="post-title" itemprop="name headline">
<a href="/2024/08/10/The-Physical-Storage-of-PostgreSQL-Page-Layout/" class="post-title-link" itemprop="url">The Physical Storage of PostgreSQL - Page Layout</a>
</h2>
<div class="post-meta-container">
<div class="post-meta">
<span class="post-meta-item">
<span class="post-meta-item-icon">
<i class="far fa-calendar"></i>
</span>
<span class="post-meta-item-text">Posted on</span>
<time title="Created: 2024-08-10 18:47:28 / Modified: 19:44:49" itemprop="dateCreated datePublished" datetime="2024-08-10T18:47:28+08:00">2024-08-10</time>
</span>
<span class="post-meta-item">
<span class="post-meta-item-icon">
<i class="far fa-folder"></i>
</span>
<span class="post-meta-item-text">In</span>
<span itemprop="about" itemscope itemtype="http://schema.org/Thing">
<a href="/categories/DBMS/" itemprop="url" rel="index"><span itemprop="name">DBMS</span></a>
</span>
,
<span itemprop="about" itemscope itemtype="http://schema.org/Thing">
<a href="/categories/DBMS/PostgreSQL/" itemprop="url" rel="index"><span itemprop="name">PostgreSQL</span></a>
</span>
</span>
</div>
</div>
</header>
<div class="post-body" itemprop="articleBody">
<h2 id="database-cluster">Database Cluster</h2>
<p>A database cluster is a collection of databases that is managed by a
single instance of a running database server. In file system terms, a
database cluster is a single directory under which all data will be
stored.<span class="math inline">\(^{[1]}\)</span></p>
<p>Generally, after you excute <code>initdb</code>, the database cluster
directory is created, typically it would be
<code>/usr/local/pgsql/data</code>. This directory contains several
subdirectories, you can find a full list <a
target="_blank" rel="noopener" href="https://www.postgresql.org/docs/current/storage-file-layout.html">here</a>.
Now we shall just focus on the <code>base</code> folder, which contains
per-database subdirectories.</p>
<h2 id="heap-table">Heap Table</h2>
<p>(<a
target="_blank" rel="noopener" href="https://www.postgresql.org/docs/current/storage-page-layout.html">Here</a>
is the official documentation that describes the layout of a database
page)</p>
<p>In PostgreSQL, each table is stored in one or several files that are
called <strong>HEAP TABLE FILE</strong>.</p>
<p>Now let’s look into what is inside this file. Basically, a heap table
file contains of five seperate areas, which are:</p>
<ul>
<li>Header data: Consists of some meta data and three pointers, which
are <code>pd_lower</code>, <code>pd_upper</code>, and
<code>pd_special</code>. They point to the</li>
<li>Item Ids: Each Item Id is a pointer to a tuple(A tuple is a row in
the table)</li>
<li>Unallocated space</li>
<li>Tuples(Items)</li>
<li>Special space</li>
</ul>
<p>Whenever a new tuple is added, Postgres allocates a new ItemID at the
beginning of the free space, and then stores the tuple at the end of the
free space, as shown in the following figure:</p>
<p><img src="/images/pagelayout1.svg" /> <span
class="math inline">\(^{[2]}\)</span></p>
<p>The core components of a page are ItemIDs and header data, and they
are internally represented as a PageHeaderData structure, which is
defined below:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">typedef</span> <span class="class"><span class="keyword">struct</span> <span class="title">PageHeaderData</span></span></span><br><span class="line"><span class="class">{</span></span><br><span class="line"> <span class="comment">/* XXX LSN is member of *any* block, not only page-organized ones */</span></span><br><span class="line"> PageXLogRecPtr pd_lsn; <span class="comment">/* LSN: next byte after last byte of xlog</span></span><br><span class="line"><span class="comment"> * record for last change to this page */</span></span><br><span class="line"> uint16 pd_checksum; <span class="comment">/* checksum */</span></span><br><span class="line"> uint16 pd_flags; <span class="comment">/* flag bits, see below */</span></span><br><span class="line"> LocationIndex pd_lower; <span class="comment">/* offset to start of free space */</span></span><br><span class="line"> LocationIndex pd_upper; <span class="comment">/* offset to end of free space */</span></span><br><span class="line"> LocationIndex pd_special; <span class="comment">/* offset to start of special space */</span></span><br><span class="line"> uint16 pd_pagesize_version;</span><br><span class="line"> TransactionId pd_prune_xid; <span class="comment">/* oldest prunable XID, or zero if none */</span></span><br><span class="line"> ItemIdData pd_linp[FLEXIBLE_ARRAY_MEMBER]; <span class="comment">/* line pointer array */</span></span><br><span class="line">} PageHeaderData;</span><br></pre></td></tr></table></figure>
<p>While we can simply ingnore most variables since they are irrelevent
to what we are studying now, we really need to focus on
<code>pd_lower</code>, <code>pd_upper</code>, <code>pd_special</code>,
and <code>pd_linp</code>. We can get started by examining the
initialization process.</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">void</span></span><br><span class="line"><span class="title function_">PageInit</span><span class="params">(Page page, Size pageSize, Size specialSize)</span></span><br><span class="line">{</span><br><span class="line"> PageHeader p = (PageHeader) page;</span><br><span class="line"></span><br><span class="line"> specialSize = MAXALIGN(specialSize);</span><br><span class="line"></span><br><span class="line"> Assert(pageSize == BLCKSZ);</span><br><span class="line"> Assert(pageSize > specialSize + SizeOfPageHeaderData);</span><br><span class="line"></span><br><span class="line"> <span class="comment">/* Make sure all fields of page are zero, as well as unused space */</span></span><br><span class="line"> MemSet(p, <span class="number">0</span>, pageSize);</span><br><span class="line"></span><br><span class="line"> p->pd_flags = <span class="number">0</span>;</span><br><span class="line"> p->pd_lower = SizeOfPageHeaderData;</span><br><span class="line"> p->pd_upper = pageSize - specialSize;</span><br><span class="line"> p->pd_special = pageSize - specialSize;</span><br><span class="line"> PageSetPageSizeAndVersion(page, pageSize, PG_PAGE_LAYOUT_VERSION);</span><br><span class="line"> <span class="comment">/* p->pd_prune_xid = InvalidTransactionId; done by above MemSet */</span></span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>As you can see, <code>pd_lower</code> is set to the end of header
data, which is exactly where the free space begins. Meanwhile,
<code>pd_upper</code> is set to the beginning of special space, which is
the next byte of the terminal of free space, while
<code>pd_special</code> is set to the same location. As for
<code>pd_linp</code>, they are actually the line pointers, or Item IDs
we mentioned above. They are a (offset,length) pair, which is defined
below:</p>
<figure class="highlight c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">typedef</span> <span class="class"><span class="keyword">struct</span> <span class="title">ItemIdData</span></span></span><br><span class="line"><span class="class">{</span></span><br><span class="line"> <span class="type">unsigned</span> lp_off:<span class="number">15</span>, <span class="comment">/* offset to tuple (from start of page) */</span></span><br><span class="line"> lp_flags:<span class="number">2</span>, <span class="comment">/* state of line pointer, see below */</span></span><br><span class="line"> lp_len:<span class="number">15</span>; <span class="comment">/* byte length of tuple */</span></span><br><span class="line">} ItemIdData;</span><br><span class="line"></span><br><span class="line"><span class="comment">/*</span></span><br><span class="line"><span class="comment">* lp_flags has these possible states. An UNUSED line pointer is available</span></span><br><span class="line"><span class="comment">* for immediate re-use, the other states are not.</span></span><br><span class="line"><span class="comment">*/</span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> LP_UNUSED 0 <span class="comment">/* unused (should always have lp_len=0) */</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> LP_NORMAL 1 <span class="comment">/* used (should always have lp_len>0) */</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> LP_REDIRECT 2 <span class="comment">/* HOT redirect (should have lp_len=0) */</span></span></span><br><span class="line"><span class="meta">#<span class="keyword">define</span> LP_DEAD 3 <span class="comment">/* dead, may or may not have storage */</span></span></span><br></pre></td></tr></table></figure>
<p>By specifying offest and length, we can find a tuple easily. A tuple
begins at <code>page + lp_off</code> and ends at
<code>page + lp_off + lp_len - 1</code>. Insert Tuples</p>
<p>Inserting tuples is implemented with function PageAddItemExtended in
bufpage.c. I briefly summarized the entire process, which is</p>
<ol type="1">
<li>If a offsetNumber is specified(which means we want to insert the
tuple at a specific location), check the validity of given location</li>
<li>Check weather the given page has sufficient space or not</li>
<li>If the inserting position is in the middle of ItemIDs array, then
shuffle them to make room for new tuple</li>
<li>Copy the new tuple to the page</li>
<li>Update pointers(pd_lower & pd_upper)</li>
</ol>
<p>References</p>
<ul>
<li>[1]: PostgreSQL Documentation, <a
target="_blank" rel="noopener" href="https://www.postgresql.org/docs/current/creating-cluster.html"
class="uri">https://www.postgresql.org/docs/current/creating-cluster.html</a></li>
<li>[2]: This figure is from PostgreSQL documentation, <a
target="_blank" rel="noopener" href="https://www.postgresql.org/docs/current/storage-page-layout.html"
class="uri">https://www.postgresql.org/docs/current/storage-page-layout.html</a></li>
</ul>
</div>
<footer class="post-footer">
<div class="post-eof"></div>
</footer>
</article>
</div>
<div class="post-block">
<article itemscope itemtype="http://schema.org/Article" class="post-content" lang="">
<link itemprop="mainEntityOfPage" href="https://haorongx.github.io/2024/06/25/Some-Tips-for-PostgreSQL-Beginners/">
<span hidden itemprop="author" itemscope itemtype="http://schema.org/Person">
<meta itemprop="image" content="https://avatars.githubusercontent.com/u/103825064?v=4">
<meta itemprop="name" content="Haorong Xu">
</span>
<span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
<meta itemprop="name" content="Haorong's Blog">
<meta itemprop="description" content="">
</span>
<span hidden itemprop="post" itemscope itemtype="http://schema.org/CreativeWork">
<meta itemprop="name" content="undefined | Haorong's Blog">
<meta itemprop="description" content="">
</span>
<header class="post-header">
<h2 class="post-title" itemprop="name headline">
<a href="/2024/06/25/Some-Tips-for-PostgreSQL-Beginners/" class="post-title-link" itemprop="url">Some Tips for PostgreSQL Beginners</a>
</h2>
<div class="post-meta-container">
<div class="post-meta">
<span class="post-meta-item">
<span class="post-meta-item-icon">
<i class="far fa-calendar"></i>
</span>
<span class="post-meta-item-text">Posted on</span>
<time title="Created: 2024-06-25 18:44:50" itemprop="dateCreated datePublished" datetime="2024-06-25T18:44:50+08:00">2024-06-25</time>
</span>
<span class="post-meta-item">
<span class="post-meta-item-icon">
<i class="far fa-calendar-check"></i>
</span>
<span class="post-meta-item-text">Edited on</span>
<time title="Modified: 2024-08-10 18:47:11" itemprop="dateModified" datetime="2024-08-10T18:47:11+08:00">2024-08-10</time>
</span>
<span class="post-meta-item">
<span class="post-meta-item-icon">
<i class="far fa-folder"></i>
</span>
<span class="post-meta-item-text">In</span>
<span itemprop="about" itemscope itemtype="http://schema.org/Thing">
<a href="/categories/DBMS/" itemprop="url" rel="index"><span itemprop="name">DBMS</span></a>
</span>
,
<span itemprop="about" itemscope itemtype="http://schema.org/Thing">
<a href="/categories/DBMS/PostgreSQL/" itemprop="url" rel="index"><span itemprop="name">PostgreSQL</span></a>
</span>
</span>
</div>
</div>
</header>
<div class="post-body" itemprop="articleBody">
<p>Lately I’ve been doing research on PostgreSQL for a while. As a
beginner, it’s really hard to decide what to focus on since PostgreSQL
is such a complex system. Nevertheless, there are bunches of stuff we
have to keep in mind during our learning process. Here are some tips
that might help you.</p>
<ol type="1">
<li>Participate in the community. <a
target="_blank" rel="noopener" href="https://www.postgresql.org/community/">PostgreSQL Community</a> is
mainly made up by several mailing lists. You can find those lists at
their website. Whenever you encounter a problem while using or learning
Postgres, you can send a e-mail to one of these lists. But there are
some thing to keep in mind before starting writing e-mails.</li>
</ol>
<p>The very first point is to choose the right list. If you just have
some general problems while using PostgreSQL you’d better to ask in
pg-general first. This list is for general user support. But if your
question is related to the internals or seem to be complicated, then
post it in pg-hackers, where most developers are. It’s also important to
be aware when writing e-mails. For example, you need to be polite and
respectful, and don’t forget to cc your e-mails to the mailing list.</p>
<p>By the way, although you can subscribe to any mailing lists, I don’t