Unit 1 ‐ Lesson 1

Robert Millikin edited this page Jun 3, 2020 · 1 revision

Why code?

Computers were invented to perform computations so that humans don't have to. Computers are thus meant to save time and effort by performing menial computational work that would be time-consuming and/or tedious for humans to do.

However, computers must be told what the computation is; for example, you could have the computer print out all numbers for 2x where x ranges from 0 to 100. You could do this calculation manually but it would take a while; a computer could do this in milliseconds. However, it must be told to calculate 2x from x=0 to x=100.

This is what computer programming is: writing a set of instructions for a computer to perform. It is the task of the computer programmer to give the computer accurate instructions. Computers are very literal and need to be given tasks in a very specific manner; they have their own language, with its own syntax and vocabulary, called machine language.

Machine language is very difficult for humans to understand and learn, so we have created intermediate languages that are intelligible to us. The programming language we will use in these lessons is C# (pronounced "see sharp"). Luckily for us, the C# language was designed to be easy for humans to understand and learn. Code written in this intermediate language is transformed ("compiled") into machine language automatically by a C# compiler.

The example used above (regarding the 2x calculation) may not spark your imagination, but consider this: in proteomics, we collect data files that have tens of thousands of MS2 (fragmentation) spectra. Each of those spectra could contain the fragments of a peptide. But we don't know which peptide. So we generate a massive list of millions of theoretical peptides and their fragments, match those peptide fragments to the tens of thousands of MS2 spectra, and pick the peptide that best corresponds to the experimental spectrum for each MS2 scan.

A human could do this, but not on a feasible timescale; it would take weeks or months for a single data file and would not be an enjoyable experience. Computers can perform this menial work in a matter of minutes, leaving humans to perform more interesting tasks like interpreting what the search results mean.

Computational Efficiency

An important concept in computer programming is computational efficiency. There are different ways to perform a calculation, and some methods are more efficient than others. In a computer programming context, efficiency refers to how quickly the computer can perform the task. For example, if method A and method B produce the same answer to the same question, but B produces the answer more quickly, then B is said to be more computationally efficient.

This concept may not be particularly important at first; if a result is produced in 50 milliseconds versus 500 milliseconds, who cares? A difference of 450ms is negligible to most humans. However, differences in computational efficiency can be dramatic and surprising; one method may take 30 seconds and another may take 30 minutes. Some may take days, months, or years! Thus it is important to understand how to write code that the computer can execute quickly; such code is said to be efficient code.

Storing and Retrieving Data

Much like the human brain, computers store and interact with data. That is how they perform their calculations: the data being manipulated is stored somewhere, and the processor performs operations on that stored data. Again, much like humans, computers have short-term and long-term memory. Short-term memory is called "Random Access Memory" (RAM); often it is just called "memory". It is emptied when the computer is turned off. Long-term memory is called the "hard drive", "hard disk", or "disk", and persists after the computer is turned off. The processor that manipulates the stored data is called the "Central Processing Unit" (CPU), or often just the "processor".

Retrieving data from RAM is much faster than retrieving it from disk. The exact speedup depends on how the data is stored and on other factors, but generally RAM is between 6 and 100,000 times faster. RAM, however, is more expensive (in dollars) than disk space. Thus they are used for different purposes: the disk stores installed programs and data not currently in use, and when a program or piece of data is in use, it is loaded into RAM.

Input and Output (I/O)

Until now we have written no code. Let's change that.

First you will need Visual Studio Community Edition. Visual Studio contains a C# compiler and other niceties that make coding in C# easier, such as syntax highlighting and autocomplete.

Also install Git. Git is a version control tool; among other things, it lets you download and upload code over the Internet.

Now:

  1. Open Visual Studio and create a new project. Select the Console App (.NET Core) template. Name your project HelloWorld.
  2. On the top area of Visual Studio, click the button with the green arrow that says HelloWorld. This runs the program.

A window should pop up (probably a black box) that prints out some text. This window is called the command prompt, the command-line interface (CLI), or the "console". In the old days the console was used to give commands to the computer, and in some cases it is still used for that purpose. It is possible to provide C# commands to the console, but we won't cover that here; generally, this functionality is not useful. However, we can have our C# programs write text to the console. This is called output: the result of your program.

Generally speaking, the workflow of a program is:

Input -> Manipulation -> Output

"Input" is data stored in RAM or on the disk. "Manipulation" retrieves this data and performs the operations that we have told the computer to perform on it. "Output" is the manipulated data; often we want to store it back in memory or on the disk.

Consider the similarity to a math function:

f(x) = x^2

x is the input, squaring is the manipulation, and the output is whatever number x^2 comes out to be.

For the program we just ran, Hello World! written to the console is the output. You can close the command prompt window and go back to Visual Studio. If you look at the code, you will see the command Console.WriteLine("Hello World!");. This is C# code. It tells the computer to write Hello World! to the console, and that's exactly what happened. Hello World! was the input to the function, Console.WriteLine was the manipulation, and the output was Hello World! on your console.
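As a rough sketch of the Input -> Manipulation -> Output idea in C#, the program below mirrors f(x) = x^2. The Square function and the starting value 4 are made up for this illustration and are not part of the HelloWorld template; the int type used here is described later in this lesson.

```csharp
using System;

class Program
{
    // Manipulation: multiplies x by itself, like f(x) = x^2.
    // "Square" is a hypothetical helper written for this example.
    public static int Square(int x)
    {
        return x * x;
    }

    static void Main(string[] args)
    {
        int x = 4;                   // input
        int result = Square(x);      // manipulation
        Console.WriteLine(result);   // output: prints 16
    }
}
```

Changing the value of x and running the program again gives a different output without touching the Square function itself.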

Storing data via code

So now we are able to call functions in C# with some input in the code and write some output. Often, though, we want to be able to take input from the user of the program rather than editing the code to provide input. This makes the program more flexible; different inputs can be run without having to re-compile the code for each input.

Let's store some data in memory. Type these lines below the existing Console.WriteLine in Visual Studio:

string input = Console.ReadLine();

Console.WriteLine(input);

Start the program. The program will print out Hello World! and then wait for input. You can type in whatever you want - for example, test. Then hit enter. The program will output what you just typed in.
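For reference, after adding the two lines the whole file should look roughly like the sketch below. The surrounding using, namespace, class, and Main lines are what the Console App (.NET Core) template typically generates; the exact layout may differ between Visual Studio versions, so treat this as a guide rather than an exact copy of your file.

```csharp
using System;

namespace HelloWorld
{
    class Program
    {
        static void Main(string[] args)
        {
            // Generated by the template: writes a fixed message.
            Console.WriteLine("Hello World!");

            // New: wait for the user to type a line and press Enter,
            // then keep that text in a variable named input.
            string input = Console.ReadLine();

            // New: write the stored text back to the console.
            Console.WriteLine(input);
        }
    }
}
```

If you type test and hit enter, the console should show Hello World!, then the test you typed, then test again as printed by the program.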

Let's examine what those two new lines of code do.

string input tells the computer that you want to store some data in RAM; this is called declaring a variable, where input is the name of the variable and string is the variable's type. In C#, data is "typed", meaning that each piece of data is labeled as a certain type. This helps the computer figure out which operations can be performed on that data. For example, 2^2 is a valid mathematical operation but hi^2 is not; "hi" is not a number, so the square operation is not valid for that type.
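Here is a small sketch of how types decide which operations are allowed. The variable names are made up for this example. The commented-out line is there on purpose: if you remove the // in front of it, the program should no longer compile, because multiplication is not defined for two strings.

```csharp
using System;

class Program
{
    static void Main(string[] args)
    {
        int number = 2;
        Console.WriteLine(number * number);  // prints 4

        string word = "hi";
        // Console.WriteLine(word * word);   // compile error: strings cannot be multiplied

        // Some operations do exist for strings; + joins two strings together.
        Console.WriteLine(word + word);      // prints hihi
    }
}
```

The compiler catches the invalid operation before the program ever runs, which is one of the main benefits of typed data.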

Here are some types in C# that you should be familiar with:

string - any text. For example, "hi" or "1345".

char - a single character. For example, 'c' or '1'.

int - an integer number. For example, 1 or -20. Does not include decimal numbers such as 1.5.

double - a number that can have a fractional part. For example, 1 or -1.5.

bool - a boolean. Can be only either true or false.
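The sketch below declares one variable of each type listed above. The variable names and values are made up for illustration. Note the quoting: in C# code, string values go in double quotes and char values go in single quotes.

```csharp
using System;

class Program
{
    static void Main(string[] args)
    {
        string greeting = "hi";   // text goes in double quotes
        string digits = "1345";   // still text, not a number
        char letter = 'c';        // a single character goes in single quotes
        int count = -20;          // whole numbers only
        double ratio = -1.5;      // numbers with a fractional part
        bool isReady = true;      // either true or false

        Console.WriteLine(greeting);
        Console.WriteLine(digits);
        Console.WriteLine(letter);
        Console.WriteLine(count);
        Console.WriteLine(ratio);
        Console.WriteLine(isReady);  // C# prints this as True
    }
}
```

Although "1345" looks like a number, it is stored as a string here, so number operations such as squaring would not be valid for it.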

One thing to take note of is that we named this variable input. We can give our variables almost any name we like, but some words (called keywords, like string) are reserved by the C# compiler because using them as variable names would confuse the compiler. It is very important to give your variables meaningful, descriptive names. In a team setting, other programmers will be looking at your code, trying to understand it, and meaningful names will help them. Get in the habit of giving your variables useful names now.
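A brief sketch of the naming rules described above. The names s and userName and the value "Alice" are invented for this example. The commented-out line shows a keyword used as a variable name; removing the // should make the program fail to compile.

```csharp
using System;

class Program
{
    static void Main(string[] args)
    {
        // string string = "Alice";   // does not compile: string is a keyword

        string s = "Alice";           // compiles, but the name says nothing
        string userName = "Alice";    // compiles, and says what the data is

        Console.WriteLine(s);
        Console.WriteLine(userName);
    }
}
```

Both variables hold the same text, but only userName tells a reader what that text is for.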

= is the next part of the code line; it is the assignment operator. It assigns the data on the right to the variable on the left.

Console.ReadLine() is a function that reads a line from the console. It returns a string type as output.

; tells the compiler that the command is done.

Thus this line of code is telling the computer to:

  • use a function to read in a line of text from the console,
  • store the output of this function in a variable of type string, which is named input

The next line of code writes the input variable to the console. Since we stored whatever we typed into the console in the input variable, and then printed the input variable, the console shows whatever we typed.
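To make the separate parts easier to see, the sketch below splits the declaration and the assignment into two lines. This is only an alternative way of writing the same thing for illustration; it should behave the same as the single line string input = Console.ReadLine(); used above.

```csharp
using System;

class Program
{
    static void Main(string[] args)
    {
        string input;                 // declare: create a variable of type string named input
        input = Console.ReadLine();   // assign: store the line read from the console
        Console.WriteLine(input);     // write the stored text to the console
    }
}
```

Writing the declaration and assignment on one line is the more common style; the two-line form is mainly useful for seeing that they are distinct steps.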

Homework 1

Calculator - Part 1

We will be building a simple calculator in this lesson and the next. This is part 1.

  • Open Visual Studio and create a new project. Select the Console App (.NET Core) template. Name your project "ConsoleCalculator".
  • Delete the Console.WriteLine("Hello World!"); line.
  • Write a line of code that will print out 42 to the console. Do not copy+paste or edit an existing line. Seriously, type out the command yourself.
  • Run the program.
  • Did your program print out 42? If so, continue.
  • Replace the line of code that prints out 42 with a line of code that asks the user for a number. Store that number in a variable. Then write a line of code that prints that number to the console. Run the program; does it work?
  • Run your program and, when it waits for input, enter 42.5. Does your program handle a decimal value or does it print an error message? Why/why not?
  • Try running it again but put in a value that isn't a number, like number. Does it report an error message? Should it report an error message? (Think about the purpose of this program.) Why/why not?
  • Try to store the console input into a double type. Does the program run? Why/why not?