In the MIDS program we teach concepts and tools around data science and machine learning. While we offer our students introductory classes for Python and Statistics, we cannot deliver the basic computing skills that are usually taught in a full Bachelor's program in CS. To close this gap, this course provides resources for learning the most important concepts and tools required to successfully complete the MIDS program.
The objective of this course is to provide resources for self-study and to test the student's skill level at the end. Students who already know the content can skip ahead to the tests. Objectives of this course:
- Acquire basic knowledge/terminology about:
  - Operating systems
  - Computer Networks
  - LINUX
- Learn basics of important tools:
  - Linux command line
  - Git and GitHub
  - Shell scripting
This course guides you through a series of FREE online classes. For example, on EDX you can select the audit-only option; the paid certificate is NOT necessary. The courses are listed in the order we think will be most efficient for learners, but they are self-contained and can be taken in any order. Each topic ends with a short quiz, a series of multiple-choice questions. These quizzes are not graded, and the answer directly follows each question. They function as a self-assessment to give you an idea of where you stand in the topic, i.e., whether you should definitely take the course or can maybe skip the segment. At the end you will have to take an exam (find the link at the end of this doc). You will have to solve the given problem and write two reviews of solutions others have submitted to show your thorough understanding of the material.
Alternatively, you can prove your knowledge by having completed a CS undergraduate degree or through corresponding work experience. Please speak to your admissions counsellor or your student success advisor.
- Access to a Linux machine. This can be a cloud machine, a virtual machine on your local computer, or a Linux subsystem on Windows. The test material was tested on Ubuntu 20.04.1 LTS.
- Root access, i.e., admin rights on the system. You need to be able to install software and execute software as root.
- An Internet connection.
- An account on github.com. You can set it up with your personal, work, or UC Berkeley email address.
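The first two prerequisites can be checked quickly from a terminal. The following is a minimal sketch, not an official setup script: the helper names `have_cmd` and `check_prereqs` are hypothetical, and it assumes a POSIX shell with `uname` and `id` available. It does not test your Internet connection or your GitHub account.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given command is found on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: print one line per prerequisite.
check_prereqs() {
    # Kernel name; a Linux machine (including a Linux subsystem on Windows) reports "Linux".
    echo "kernel: $(uname -s)"

    # Root access: either we already are root (user id 0) or sudo is installed.
    if [ "$(id -u)" -eq 0 ]; then
        echo "root: running as root"
    elif have_cmd sudo; then
        echo "root: sudo is installed (run 'sudo -v' to confirm you may use it)"
    else
        echo "root: no sudo found, ask your administrator"
    fi

    # Git client, needed for the Git and GitHub part of the course.
    if have_cmd git; then
        echo "git: $(git --version)"
    else
        echo "git: not installed"
    fi
}

check_prereqs
```

If the last line reports that git is not installed, installing it is a good first exercise for your root access.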
The big topics are:
- Operating systems: have an understanding of how a computer works and the role of the operating system
- Networking: the network stack and an overview of protocols
- LINUX: have a working knowledge of the LINUX operating system
- Shell scripting (optional): this is an extension of and a deeper dive into topics covered in the LINUX class. You will need HEREDOCS in later classes, though.
- Git and GitHub: you need to be able to use git from the web UI and the command line, know branching and reversing changes, and have good workflow habits.
Understanding how a computer works and the role of the operating system is the basis for any work with cloud technology and distributed computing, which are standard for many big data and machine learning approaches.
The following course is offered through EDX: https://www.edx.org/course/introduction-to-hardware-and-operating-systems The relevant modules are 1 and 4. If you want to use different courses, make sure they cover basic operating system components such as the filesystem, the concept of a process and how it interacts with CPU and memory, I/O, schedulers, etc.
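To make the concept of a process a little more concrete, here is a minimal sketch that can be run in any POSIX shell. It is an illustration only, not part of the course material; the variable names `child` and `status` are made up for this example.

```shell
#!/bin/sh
# Every running program is a process with its own process ID (PID).
# $$ expands to the PID of the current shell.
echo "this shell runs as process $$"

# Start a child process in the background; $! holds the PID of that child.
sleep 30 &
child=$!
echo "started child process $child"

# kill -0 sends no signal; it only asks the kernel whether the process exists.
if kill -0 "$child" 2>/dev/null; then
    echo "child is running"
fi

# Ask the kernel to terminate the child, then collect its exit status.
# A process ended by a signal reports a status greater than 128.
kill -TERM "$child"
status=0
wait "$child" || status=$?
echo "child ended with status $status"
```

While the child is still running, tools such as ps and top (covered in the LINUX class) would list it with the same PID.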
Quiz - click to see
- What is an operating system?
(a) a collection of programs that manages hardware resources
(b) a system service provider to the application programs
(c) a link to interface the hardware and application programs
(d) all of the above
Answer
d - all of the above
- What is interprocess communication?
(a) communication within the process
(b) communication between two processes
(c) communication between two threads of the same process
(d) none of the mentioned
Answer
b - communication between two processes
- The CPU fetches the instruction from memory according to the value of the
(a) program counter
(b) status register
(c) instruction register
(d) program status word
Answer
a - The CPU fetches instructions from memory according to the value of the program counter. These instructions may cause additional loading from and storing to specific memory addresses.
- Which one of the following is not shared by threads?
(a) program counter
(b) stack
(c) both program counter and stack
(d) none of the mentioned
Answer
c
- If one thread opens a file with read privileges then
(a) other threads in another process can also read from that file
(b) other threads in the same process can also read from that file
(c) any other thread can not read from that file
(d) all of the mentioned
Answer
b
Last check: Think through what happens in the OS when you write a Python script that writes "Hello world" into a file.
Understanding how computers communicate with each other is a basic concept necessary to work with cloud tech, multi-container setups, and more. Networking comes in two parts. The following EDX course covers an overview of the network stack. You can use any course that covers the basics of the ISO/OSI network stack, IP, TCP, and firewalls. For example, on EDX find the course "Boston University EC441 Introduction to Computer Networking".
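As a small illustration of the IP layer mentioned above, the sketch below checks whether an IPv4 address falls into one of the private ranges reserved by RFC 1918 (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). The function name `is_private_ipv4` and the sample addresses are made up for this example, and the function assumes it is given a well-formed dotted-quad address.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the IPv4 address lies in an RFC 1918 private range.
is_private_ipv4() {
    # Extract the first two octets of the dotted quad.
    first=${1%%.*}
    rest=${1#*.}
    second=${rest%%.*}
    case $first in
        10)  return 0 ;;                                          # 10.0.0.0/8
        172) [ "$second" -ge 16 ] && [ "$second" -le 31 ] ;;      # 172.16.0.0/12
        192) [ "$second" -eq 168 ] ;;                             # 192.168.0.0/16
        *)   return 1 ;;
    esac
}

# Try a few sample addresses.
for addr in 10.1.2.3 172.16.0.1 172.32.0.1 8.8.8.8; do
    if is_private_ipv4 "$addr"; then
        echo "$addr is in a private range"
    else
        echo "$addr is not in a private range"
    fi
done
```

Private addresses are only routable inside a local network, which is why cloud machines usually have both a private and a public address.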
Quiz - click to see
- Which of the following is a private IP address?
(a) 12.0.0.1
(b) 168.172.19.39
(c) 172.15.14.36
(d) 192.168.24.43
Answer
d - The Class A private address range is 10.0.0.0 through 10.255.255.255. The Class B private address range is 172.16.0.0 through 172.31.255.255, and the Class C private address range is 192.168.0.0 through 192.168.255.255.
- What protocol is used to find the hardware address of a local device?
(a) RARP
(b) ARP
(c) IP
(d) ICMP
Answer
b - Address Resolution Protocol (ARP) is used to find the hardware address from a known IP address.
- Which of the following addresses is used to deliver a message to the correct application program running on a host?
(a) Port
(b) IP
(c) Logical
(d) Physical
Answer
a
- The values GET, POST, HEAD, etc. are specified in which line of the HTTP message?
(a) Request line
(b) Header line
(c) Status line
(d) Entity body
Answer
a - It is specified in the method field of the request line in the HTTP request message.
- Connection establishment in TCP is done by which mechanism?
(a) Flow control
(b) Three-Way Handshaking
(c) Forwarding
(d) Synchronization
Answer
b - A three-way handshake allows both the server and the client to choose their Initial Sequence Number and inform the other party about it.
Most server infrastructure in use today is based on LINUX systems. macOS, too, has a Unix-like system at its core. Understanding the principles of this specific family of operating systems and finding your way around in it is fundamental to this program. Many tools today abstract away the complexity of the underlying system, but debugging a problem or creating a new way of using modern tools will bring you back to digging deeper into LINUX. The following EDX course is made and maintained by the Linux Foundation: https://www.edx.org/course/introduction-to-linux
Please go through at minimum chapters 3, 7-16, and 18. In chapter 14 you can skip directly to Networking configuration tools if you have finished the networking class before. You may also skip chapter 9 if you are familiar with ps and top and have finished the OS lecture above.
To get extra practice, the Linux Survival tutorial allows you to test your skills in an interactive, gamified way. This is optional: https://linuxsurvival.com/linux-tutorial-introduction/
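If you want to warm up before the course, the following minimal sketch exercises a few everyday commands in a throwaway directory. The directory and file names (`proj`, `notes.txt`) are made up for this example; nothing outside the temporary directory is touched.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing on your system is changed.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# mkdir -p creates nested directories in one step.
mkdir -p proj/src/generic

# touch creates an empty file if it does not exist yet.
touch proj/notes.txt

# chmod changes permissions: owner may read and write, group may read, others nothing.
chmod 640 proj/notes.txt
ls -l proj/notes.txt

# cd moves between directories: "." is the current directory, ".." is its parent.
cd proj/src/generic
cd .
pwd
cd ..
pwd
```

Compare the two paths printed by pwd to see the effect of `cd .` and `cd ..`.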
Quiz - click to see
- What approach does an application use to communicate with the kernel?
(a) System Calls
(b) C Programs
(c) Shell Script
(d) Shell
Answer
a
- Which command creates an empty file if it does not exist?
(a) cat
(b) touch
(c) ed
(d) read
Answer
b
- Which command is used to change permissions of files and directories?
(a) mv
(b) chgrp
(c) chmod
(d) set
Answer
c
- What would be the current working directory at the end of the following command sequence?
Code:
$ pwd
/home/user1/proj
$ cd src
$ cd generic
$ cd .
$ pwd
(a) /home/user1/proj
(b) /home/user1/proj/src
(c) /home/user1
(d) /home/user1/proj/src/generic
Answer
d
- What is a shell in UNIX?
(a) a program through which users can issue commands to UNIX
(b) a window management system
(c) the login screen
(d) the thing that rides on the back of a turtle in UNIX
Answer
a
Shell scripting is part of each LINUX class, and you probably got your first practice while going through the LINUX course. Using shell scripting efficiently involves more than can be covered in a general LINUX class. A great, interactive way to learn it is https://www.learnshell.org/
A concept usually not covered by basic scripting classes is HEREDOCS. You will need them in the advanced classes in the MIDS program and also to solve the challenge that follows this course: https://linuxhint.com/bash-heredoc-tutorial/
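For a first impression, here is a minimal sketch of a heredoc. It shows the two variants you will meet most often: an unquoted delimiter, where variables are expanded, and a quoted delimiter, where the text is taken literally. The variable names `course` and `outfile` are made up for this example.

```shell
#!/bin/sh
# Hypothetical values used only for this illustration.
course="MIDS"
outfile=$(mktemp)

# Unquoted delimiter (EOF): variables and $(...) are expanded inside the body.
cat > "$outfile" <<EOF
Welcome to $course
Generated by user id $(id -u)
EOF

# Quoted delimiter ('EOF'): the body is written literally, nothing is expanded.
cat >> "$outfile" <<'EOF'
The variable $course is not expanded here
EOF

# Show the resulting three-line file.
cat "$outfile"
```

Heredocs are handy whenever a script has to create a multi-line file, such as a configuration file, without shipping that file separately.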
"Git is a distributed version-control system for tracking changes in source code during software development. It is designed for coordinating work among programmers, but it can be used to track changes in any set of files. Its goals include speed, data integrity, and support for distributed, non-linear workflows." - Wikipedia
Git is an essential tool used across most modern tech companies. In the MIDS program you will need to be able to use git together with GitHub (a platform hosting Git repositories). You will be required to work with git from the command line, both locally and in remote locations such as VMs in the cloud.
- To get started on basic git: https://www.codecademy.com/learn/learn-git
- Using GitHub: https://guides.github.com/activities/hello-world/
- Optional: a deeper dive into git and GitHub: https://try.github.io/
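To give an idea of the command-line workflow these tutorials teach, here is a minimal sketch that runs entirely in a throwaway local repository. The branch name, file name, and identity are placeholders chosen for this example, and the GitHub remote is only shown in comments because its URL depends on your account.

```shell
#!/bin/sh
# Create a throwaway repository so the commands are safe to try.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Identity for this repository only; the name and address are placeholders.
git config user.name "MIDS Student"
git config user.email "student@example.com"

# Stage and commit a first file.
echo "hello" > README.md
git add README.md
git commit -q -m "Add README"

# Create a new branch named "mids" and switch to it.
git checkout -q -b mids

# Commit a change on the branch.
echo "more" >> README.md
git commit -q -am "Extend README"

# Undo that change with a new commit that reverses it.
git revert --no-edit HEAD >/dev/null

# Show the history of the current branch, one line per commit.
git log --oneline

# With a GitHub remote configured, publishing would look like this
# (the URL is a placeholder, so the lines stay commented out):
# git remote add origin https://github.com/<user>/<repo>.git
# git push -u origin mids
```

Note that `git revert` keeps the history intact: the log shows three commits, and the file is back to its first version.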
Quiz - click to see
- Git
(a) is a distributed version control system
(b) is an operating system
(c) can have branches
(d) saves everything automatically
Answer
a and c - Always commit and push your progress!
- Which of the following statements would create a branch named "mids"?
(a) git checkout -b mids
(b) git checkout -c mids
(c) git check -b mids
(d) none of the mentioned
Answer
a
- To sync a commit to a remote repository, e.g., on GitHub, you need the command
(a) git pull
(b) git sync
(c) git push
(d) git commit
Answer
c - git commit only affects your local repository.
- To download a copy of this repository you should execute
(a) git clone on your local computer
(b) git download on your local computer
(c) git get
(d) none of the above
Answer
a
- To clone this repository you need the URL found at the top of this page. The correct URL to clone is
(a) https://github.com/UC-Berkeley-I-School/MIDS-1D-Computing-Basic.wav
(b) https://github.com/UC-Berkeley-I-School/MIDS-1D-Computing-Basic.html
(c) https://github.com/UC-Berkeley-I-School/MIDS-1D-Computing-Basic.mp3
(d) https://github.com/UC-Berkeley-I-School/MIDS-1D-Computing-Basic.git
Answer
d
- Kurose, Ross: Computer Networking - A Top Down Approach: http://www.bau.edu.jo/UserPortal/UserProfile/PostsAttach/10617_1870_1.pdf
- Linux For Beginners by Jason Cannon
- The Linux Command Line : A Complete Introduction by William Shotts
- Modern Operating Systems by Andrew S. Tanenbaum
To prove your skills, please take the challenge. In the meantime, please use this link to see it: https://docs.google.com/document/d/1ENcEp9UpTUxbU5GDlXn5stMt1hlHqDupNAwcBrcdmuQ/edit?usp=sharing Afterwards, please invite me to your repo with the solution (GitHub ID dschioberg). For questions, reach out to dschib@berkeley.edu and subscribe to the ISchool Slack channel #1d-computing-basics.