@@ -22,15 +22,15 @@ Frequently, research problems that use computing can outgrow the capabilities
2222of the desktop or laptop computer where they started:
2323
2424- A statistics student wants to cross-validate a model. This involves running
25- the model 1000 times -- but each run takes an hour. Running the model on
25+ the model 1000 times — but each run takes an hour. Running the model on
2626 a laptop will take over a month! In this research problem, final results are
2727 calculated after all 1000 models have run, but typically only one model is
2828 run at a time (in **serial**) on the laptop. Since each of the 1000 runs is
2929 independent of all others, and given enough computers, it's theoretically
3030 possible to run them all at once (in **parallel**).
3131- A genomics researcher has been using small datasets of sequence data, but
3232 soon will be receiving a new type of sequencing data that is 10 times as
33- large. It's already challenging to open the datasets on a computer --
33+ large. It's already challenging to open the datasets on a computer —
3434 analyzing these larger datasets will probably crash it. In this research
3535 problem, the calculations required might be impossible to parallelize, but a
3636 computer with **more memory** would be required to analyze the much larger
@@ -54,7 +54,7 @@ problems in parallel**.
5454
5555## Jargon Busting Presentation
5656
57- Open the [HPC Jargon Buster](../files/jargon#p1)
57+ Open the [HPC Jargon Buster](files/jargon.html#p1)
5858in a new tab. To present the content, press `C` to open a **c**lone in a
5959separate window, then press `P` to toggle **p**resentation mode.
6060
@@ -71,48 +71,44 @@ results.
7171## Some Ideas
7272
7373- Checking email: your computer (possibly in your pocket) contacts a remote
74- machine, authenticates, and downloads a list of new messages; it also
75- uploads changes to message status, such as whether you read, marked as
76- junk, or deleted the message. Since yours is not the only account, the
77- mail server is probably one of many in a data center.
78- - Searching for a phrase online involves comparing your search term against
79- a massive database of all known sites, looking for matches. This "query"
74+ machine, authenticates, and downloads a list of new messages; it also uploads
75+ changes to message status, such as whether you read, marked as junk, or
76+ deleted the message. Since yours is not the only account, the mail server is
77+ probably one of many in a data center.
78+ - Searching for a phrase online involves comparing your search term against a
79+ massive database of all known sites, looking for matches. This "query"
8080 operation can be straightforward, but building that database is a
8181 [monumental task][mapreduce]! Servers are involved at every step.
82- - Searching for directions on a mapping website involves connecting your
83- (A) starting and (B) end points by [traversing a graph][dijkstra] in
84- search of the "shortest" path by distance, time, expense, or another
85- metric. Converting a map into the right form is relatively simple, but
86- calculating all the possible routes between A and B is expensive.
82+ - Searching for directions on a mapping website involves connecting your (A)
83+ starting and (B) end points by [traversing a graph][dijkstra] in search of
84+ the "shortest" path by distance, time, expense, or another metric. Converting
85+ a map into the right form is relatively simple, but calculating all the
86+ possible routes between A and B is expensive.
8787
8888Checking email could be serial: your machine connects to one server and
8989exchanges data. Searching by querying the database for your search term (or
90- endpoints) could also be serial, in that one machine receives your query
91- and returns the result. However, assembling and storing the full database
92- is far beyond the capability of any one machine. Therefore, these functions
93- are served in parallel by a large, ["hyperscale"][hyperscale] collection of
94- servers working together.
95-
96-
90+ endpoints) could also be serial, in that one machine receives your query and
91+ returns the result. However, assembling and storing the full database is far
92+ beyond the capability of any one machine. Therefore, these functions are served
93+ in parallel by a large, ["hyperscale"][hyperscale] collection of servers
94+ working together.
9795
9896:::::::::::::::::::::::::
9997
10098::::::::::::::::::::::::::::::::::::::::::::::::::
10199
102-
103-
104100[mapreduce]: https://en.wikipedia.org/wiki/MapReduce
105101[dijkstra]: https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm
106102[hyperscale]: https://en.wikipedia.org/wiki/Hyperscale_computing
107103
108-
109104:::::::::::::::::::::::::::::::::::::::: keypoints
110105
111- - High Performance Computing (HPC) typically involves connecting to very large computing systems elsewhere in the world.
112- - These other systems can be used to do work that would either be impossible or much slower on smaller systems.
106+ - High Performance Computing (HPC) typically involves connecting to very large
107+ computing systems elsewhere in the world.
108+ - These other systems can be used to do work that would either be impossible or
109+ much slower on smaller systems.
113110- HPC resources are shared by multiple users.
114- - The standard method of interacting with such systems is via a command line interface.
111+ - The standard method of interacting with such systems is via a command line
112+ interface.
115113
116114::::::::::::::::::::::::::::::::::::::::::::::::::
117-
118-