Chapter-3/review.txt

Review

Know the Concepts

• What does it mean if a line in the program starts with the ’#’ character?
    The octothorpe, or hash symbol, is used to mark the start of a code comment.

• What is the difference between an assembly language file and an object code
file?
    Assembly language files are a type of source code, it is human-readable and mnemonic with operation names like "add" or "mov" (for move), but assembly files alone can not be executed.
    In order to convert the source code into an executable file it must be translated into machine code by the assembler.
    Machine code is made up of 1's and 0's (binary), and is not easily readable by humans because it is used to speak directly to the CPU.
    Object code is a type of machine code that is created after assembling a programs source code, it usually contains metadata that will be used by the linker.
    This metadata tells the linker where programs start and how they relate to each other.
    Object code gets turned into an executable after the linking process.
    Assembly Source Code -> Assembler -> Object Code -> Linker -> Executable.

• What does the linker do?
    The linker converts object files into a single executable.
    Typically, an object file will contain references to other object files or libraries,
    the linker resolves all these references into one executable program.

• How do you check the result status code of the last program you ran?
    echo $?

• What is the difference between movl $1, %eax and movl 1, %eax?
    movl $1, %eax - This intruction uses immediate addressing ($) to move the value 1 into the accumulator (%eax register).
    movl 1, %eax - This instruction is using direct addressing to move the value stored at the 1 address in memory into the accumulator.

• Which register holds the system call number?
    %eax is the register responsible for storing the system call number.

• What are indexes used for?
    Indexes are used to keep track of where we are or where we want to be in an array.

• Why do indexes usually start at 0?
    Zero-based indexing makes pointer arithmetic simpler to use.
    When grabbing bytes, we multiply the index by the amount of bytes we need.
    For instance, if we wanted to grab the second byte from memory address 2700 (where our pointer is) we
    would simply do 2700 + 2 * 1 (2702).
    If we wanted the first element, we would do 2700 + 0 * 1 (2700).
    The formula used:
        address + index * bytes = effective address
    If we started counting at 1 instead we would need to account for this change by subtracting or adding 1 everytime we wanted to point to anything.
    Zero-based indexing allows for cleaner and simpler arithmetic.

• If I issued the command movl data_items(,%edi,4), %eax and
data_items was address 3634 and %edi held the value 13, what address would
you be using to move into %eax?
    We can simplify the instruction by plugging in the values above, so it'll be easier to understand. 
    movl data_items(,%edi,4), %eax --> movl 3634(,13,4), %eax.
    Now we can use the following formula to get the value that will end up in the accumulator (%eax register).
    address + index * bytes = effective address
    3634 + (13 * 4) = 3686.
    The contents of the %eax register will be the value currently stored at location 3686 in memory.

• List the general-purpose registers.
    EAX - (Extended Accumulator)
    EBX - (Extended Base)
    ECX - (Extended Counter)
    EDX - (Extended Data)
    ESI - (Source Index)
    EDI - (Destination Index)
    EBP - (Extended Base Pointer)
    ESP - (Extended Stack Pointer)

• What is the difference between movl and movb?
    movl - (Move Long) moves 32 bits (4 bytes).
    movb - (Move Byte) moves 8 bits (1 byte).
    The difference between movl and movb is how much data they move.

• What is flow control?
    Flow control refers to the direction of a program.
    Control of a programs flow kicks in during conditionals, loops, etc.
    In a simple program where every instruction is executed in order, no matter what,
    there is no control over its flow.
    Pseudo-Code for program with no flow control:
        a = 1
        b = 2
        c = 3
        print a
        print b
        print c
    Pseudo-Code for program with flow control:
        a = 1
        b = 2
        c = 3
        if a equals b
            print a
        else
            print c

• What does a conditional jump do?
    A conditional jump moves to another part in a program (jumps) based on
    whether the value of the instruction right before was true or false.
    It is used as a control flow mechanism.
    For example, if you wanted to return to the beginning of a loop if a number is not found, or
    exit the loop if the number is found in the current iteration of the loop.
    Pseudo-Code for conditional jump:
        NUMBERS = 1, 2, 3, 4, 5
        NUMBER = 0
        START_OF_LOOP
        if NUMBERS[NUMBER] = 4
            exit
        else
            NUMBER = NUMBER + 1
            Jump to START_OF_LOOP
    In assembly, common operation codes for conditional jumps include:
        JE: Jump if Equal
        JL : Jump if Less
        JGE : Jump if Greater or Equal
        JLE : Jump if Less or Equal

• What things do you have to plan for when writing a program?
    The purpose of the program.
    A general idea of how we may solve whatever problem the program is designed to solve.
    How much memory will be needed.

• Go through every instruction and list what addressing mode is being used for
each operand.
        movl $0, %edi --> IMMEDIATE ADDRESSING MODE, REGISTER MODE
        movl data_items(,%edi,4), %eax --> INDEX ADDRESSING MODE, REGISTER MODE
        movl %eax, %ebx --> REGISTER MODE, REGISTER MODE
    start_loop:
        cmpl $0, %eax --> IMMEDIATE ADDRESSING MODE, REGISTER MODE
        je loop_exit --> RELATIVE ADDRESSING
        incl %edi --> REGISTER MODE
        movl data_items(,%edi,4), %eax --> INDEX ADDRESSING MODE, REGISTER MODE
        cmpl %ebx, %eax --> REGISTER MODE, REGISTER MODE
        jle start_loop --> RELATIVE ADDRESSING
        movl %eax, %ebx --> REGISTER MODE, REGISTER MODE
        jmp start_loop --> RELATIVE ADDRESSING
    loop_exit:
        movl $1, %eax --> IMMEDIATE ADDRESSING MODE, REGISTER MODE
        int $0x80

Use the Concepts

• Modify the first program to return the value 3.
    Refer to exit-3.s for solution.

• Modify the maximum program to find the minimum instead.
    Refer to minimum.s for solution.

• Modify the maximum program to use the number 255 to end the list rather than
the number 0.
    Refer to maximum-3.s for solution.

• Modify the maximum program to use an ending address rather than the number
0 to know when to stop.
    Refer to maximum-4.s for solution.

• Modify the maximum program to use a length count rather than the number 0 to
know when to stop.
    Refer to maximum-5.s for solution.

• What would the instruction movl _start, %eax do? Be specific, based on
your knowledge of both addressing modes and the meaning of _start. How
would this differ from the instruction movl $_start, %eax?
    movl _start, %eax: This instruction moves the value of the _start label into the %eax register.
    The _start label usually points to the memory address of a programs starting point.
    movl $_start, %eax: This instruction moves the IMMEDIATE value of the _start label into the %eax register.
    The immediate value of the _start label will be the memory address of a programs starting point.


Going Further

• Modify the first program to leave off the int instruction line. Assemble, link,
and execute the new program. What error message do you get. Why do you
think this might be?
    Error Message: Segmentation fault (core dumped)
    Without a system call to exit, the program never terminated properly and resources were not returned.
    This resulted in a segmentation fault.

• So far, we have discussed three approaches to finding the end of the list - using
a special number, using the ending address, and using the length count. Which
approach do you think is best? Why? Which approach would you use if you
knew that the list was sorted? Why?
    There are pros and cons to the three approaches mentioned above.
    Using a special number to designate the end of an array is a quick and dirty method.
    It works and it's simple, but problems with this method arise when the same character is needed as part of the actual data.
    In this scenario, the program will interpret the data number as the end of the array.
    For example, if the special number is 0 but the list of data items is [4, 8, 3, 233, 13, 172, 0, 52, 37, 2, 0],
    the program will stop evaluating the numbers 52, 37, and 2.
    Using a memory address, as a stopping point, will ensure that we always stop at a precise location rather than stopping at any
    location that contains the value we are looking for.
    The downside to this method is that we need to calculate the address at the end of an array,
    with simple arrays this may be trivial, but as programs become more complex this process becomes tedious.
    Using the length count approach is simple, it's also elegant.
    Once you've calculated the length of an array, all you need is a counter that you can check against the length.
    This method also comes in handy when trying to implement a binary search algorithm.
    With a sorted list we can check which half of an array contains the target element and slice off the other half.
    This process repeats until the target element is found.
    Therefore, the best method for searching sorted arrays is the length count approach.