Using the GEN-Assembler

The Burroughs Algebraic Compiler for the 220 (BAC-220 or BALGOL) was composed of four main parts:

The Generator program, used to customize the compiler for specific environments.
The Compiler Main module, which did a one-pass compilation of the BALGOL source program.
The Compiler Overlay module, which linked to the compiled BALGOL code any library routines and any machine-language routines that were included with the input to the compiler. This module also generated the necessary coding to support overlays and the symbolic dump feature, if used.
The library routines (SIN, COS, SQRT, READ, WRITE, etc.)

Two assemblers were used to prepare object code for the compiler, one for the Generator program, and one for everything else. We have no documentation or software in any form for either of these assemblers. Thus, in order to have a means to create object code from the compiler source listings donated to the Computer History Museum by Professor Donald Knuth, both assemblers had to be reverse-engineered from the listings.

This wiki page describes what we term the GEN-Assembler, used for the BALGOL Generator program.

Introducing the GEN-Assembler

GEN-Assembler is a cross-assembler. It is written in JavaScript and runs in a standard web browser. You can load the assembler from this project's hosting site:

http://www.phkimpel.us/Burroughs-220/software/tools/GEN-Assembler.html

You can also run the assembler from another web server where you have set up the emulator files. The assembler consists of one HTML file, with the JavaScript code and CSS style sheet embedded within it. It also depends upon the webUI/B220FramePaper.html and webUI/resources/ajax-spinner.gif files for the retro-220 emulator.

Background

The original assembler for the Generator program was substantially different from, and quite a bit more sophisticated than, the assembler used for the other components of the BALGOL compiler. Our guess is that the Generator program was written sometime after the compiler was originally released, and may not have been written at Burroughs at all, although Burroughs distributed the Generator program as part of the compiler.

The primary purpose of the GEN-Assembler has been to generate object code for the transcribed Generator program listing. Since we have no information on the original assembler that ran on the 220, the syntax and semantics for GEN-Assembler were determined by inspecting and reverse-engineering the listing of the Generator program. While it is a fully-functional assembler and can be used for general 220 programming, it is only as functional as has been needed to assemble the Generator successfully. It does not attempt to be a faithful recreation of whatever assembler was used in the early 1960s. That original assembler likely had other features that are not apparent from the Generator listing, and thus have not been implemented in the GEN-Assembler.

User Interface

When you open the assembler in a browser, you will see a window similar to this:

GEN-Assembler User Interface

The interface has two file-picker controls plus additional controls to specify output from the assembler. The large area with scrollbars will display any listing generated during an assembly. The page has a light-green background to distinguish it from the BAC-Assembler used for the other components of the BALGOL compiler, as the user interface for both assemblers is almost identical.

The controls at the top of the page are:

List Pass 1 -- If this checkbox is ticked, the assembler will produce a listing during its initial pass. The listing will show address assignments but not generated object code. It is not normally useful for regular programming, and the checkbox is not ticked by default.
List Pass 2 -- If this checkbox is ticked, the assembler will produce a listing during its second pass. This listing includes address assignments and generated object code. This checkbox is ticked by default.
Write Checksum -- If this checkbox is ticked, the assembler will generate a checksum word for the generated code and output it as an additional word of object code on the medium selected by the next control. It is simply a sum of all the words output by the assembler, ignoring the three high-order bits in sign digits, and discarding any arithmetic overflow.
Output Mode -- This pull-down list selects the medium to which the generated object code will be written. These options have been copied from the BAC-Assembler, although most of them are not relevant to assembling the Generator program. The choices are:
- No Object -- No object code will be produced by the assembler. This option may be useful for syntax-only assemblies or simply to generate a listing of the code.
- Loadable Deck -- Produces a standard 220 "format-6" image of a self-loadable card deck in a separate temporary window. This is the default selection. Each card image holds up to six words of 220 code. Each card bootstraps the next card during the load process. The deck is prepared to be loaded from Cardatron input unit 1. Bootstrap with a 6 1000 60 0000 instruction in the C register.
- Paper Tape -- Produces a self-loading paper tape image in a separate temporary window. The image is configured for paper tape reader 1. Bootstrap with a 6 1000 04 0000 instruction in the C register.
- BALGOL ML Deck -- Produces a BALGOL "machine language" card deck image in a separate temporary window. This format also has up to six words of object code per card, but is formatted for use by the BALGOL Object Loader and Generator programs. This is the format in which library routines must be output. See Appendix F of the BALGOL Reference Manual under the heading "PREPARATION OF EXTERNAL PROGRAMS."
- Gen MEDIA Deck -- Produces a Generator Program INPUTMEDIA/OUTPUTMEDIA card deck. This format is used to supply custom input/output routines for use by the compiler and library. It has one word of object code per card, as described in Appendix F of the BALGOL Reference Manual under the heading "INPUT-OUTPUT PROCEDURES."
- Object Tape -- Produces a retro-220 emulator magnetic tape image in a separate temporary window. This image can be saved and used as described in the Using Magnetic Tape wiki page.
Extract Listing -- Clicking this button will select all of the text in the listing area of the assembler page. Once this is done, you can copy/paste the listing text into some other program for printing or saving to disk.
Pre-load Pool -- This control allows you to specify a JSON file that pre-loads the assembler's literal pool. Its use is optional, and it is not normally needed for regular 220 programming. See the discussion on Pre-Loading the Literal Pool below for more information.
Load Source & Go -- This control selects the source file to be assembled. Selecting a file with this control automatically starts the assembly process, so you should establish any other control settings before using this one. See the discussion below on source files for the assembler.

Note that object code output in the form of card decks or magnetic tape is generated in a separate temporary window. From this window you can save the object code text to a local file or copy/paste the text into another program. Once you have captured the object code, simply close the temporary window.

Assembler Notation

This section describes the syntax and semantics of the notation used by the GEN-Assembler.

GEN-Assembler Source Files

The assembler reads source from ordinary text files. The lines in these files may be delimited by a CR-LF pair, LF only, or CR only. The assembler does not recognize horizontal tab (HT) characters and does not do tab expansion.

Each line in a file generally contains one machine instruction or assembler pseudo-instruction, but there are exceptions. Lines are laid out as follows:

Columns	Description
1-4	Ignored by the assembler. In the original assembler, they would have been used for Cardatron format band selection.
5-72	Blank, or a symbolic label, or a point label. If the label extends beyond column 14, then the override sign and mnemonic op code must start on the next line in column 16.
16	Override sign digit (0-9, +, -). The digit in this column sets the sign digit in the word. Note that this column may contain a "`(`", which indicates definition of a constant numeric value for the word at that location.
17-24	Symbolic operation code (standard 220 mnemonics). See the discussion on "Constant Lists" in the Pseudo-Instructions section below.
25-72	Operands and comments. Operand fields are delimited by commas. The first space terminates the operands. Any text after the space is ignored by the assembler and may be used for comments.
73-80	Card sequence number and identification. These columns are ignored by the assembler.
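The column layout above can be illustrated with a short Python sketch. This is not the assembler's own code (which is written in JavaScript); the function name `parse_source_line` and the returned dictionary keys are hypothetical, and the sketch ignores special cases such as quoted SPO strings that contain spaces.

```python
def parse_source_line(line):
    """Split one source line into fields using the column layout above.

    Hypothetical helper for illustration only. Columns are 1-based in the
    table, so column N is index N-1 here.
    """
    text = line.rstrip("\r\n").ljust(80)
    # A label, if present, must begin in column 5.
    label = text[4:72].split(" ", 1)[0] if text[4] != " " else ""
    if len(label) > 10:
        # The label runs past column 14, so the sign, op code, and operands
        # are expected on the next line instead.
        return {"label": label, "sign": "", "op": "", "operands": ""}
    return {
        "label": label,
        "sign": text[15].strip(),                  # column 16
        "op": text[16:24].strip(),                 # columns 17-24
        "operands": text[24:72].split(" ", 1)[0],  # columns 25-72, up to the first space
    }
```

Anything after the first space in the operand field is treated as a comment and dropped, as are columns 1-4 and 73-80.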

Labels and Symbols

Symbolic addresses are represented in the assembler by labels. Labels are identifiers that consist of from one to 68 characters.

Columns 5-14 of a line must either be blank or the label for that line must begin in column 5. If the label extends beyond column 14, then the rest of the instruction must be continued starting on the next line. By default, the presence of a non-blank symbol in the label field of a source line causes the value of the assembler's current address counter to be assigned as the value of that symbol. Certain pseudo-instructions may cause the symbol to have a different value, however.

Labels have two forms, global labels and point labels.

A global label is a symbol that contains at least one alphabetic letter, but it may also contain decimal digits and the period (".") character. Such a symbol has global scope throughout the assembly and may appear in the label field only once. The assembler will issue an error if the same global symbol is redefined by appearing more than once in a label field. Global labels used as operands are referred to simply by their symbol.

A point label is a symbol in the label field that is a numeric literal. Point labels have local scope and may appear multiple times in label fields throughout the assembly. They are frequently used to implement branches and data references to nearby locations. The scope of a point label is:

from the line where it is declared backward to (but not including) the prior declaration of the same point label, or to the beginning of the program if the label was not previously declared; and
from the line where it is declared forward to (but not including) the next declaration of the same point label, or to the end of the program if the label is not declared further on.

A point label used in the operand field of an instruction is coded as the literal numeric symbol followed by an F or B. An F ("forward") refers to the location of the next declaration of the point label later in the source; a B ("backward") refers to the location of the prior declaration of the point label earlier in the source. As an example, here is a code snippet from the Generator program:

0396  0 0004 45 0000                     CLB
0397  1 0000 41 2448         2          -LDR     TBL
0398  0 0811 18 2454                     CFR     TBL+6/08
0399  0 0099 37 0407                     BFR     1F/00,99
0400  0 0001 35 0406                     BCU     2F
0401  1 0000 10 2449                    -CAD     TBL+1
0402  0 1200 37 0405                     BFR     $+3/12,00
0403  0 0000 12 0239                     ADD     RELOCATION
0404  0 0000 13 2421                     SUB     =4900=
0405  0 0000 40 0224                     STA     MAMAXP
0406  0 0002 20 0397         2           IBB     2B,2

0407  0 0000 10 0224         1           CAD     MAMAXP
0408  0 0000 13 0620                     SUB     IK
0409  0 0000 40 0224                     STA     MAMAXP

There are two declarations of the point label 2 in this code, but the operands referencing it can only reach the immediately next or prior declaration of the label with respect to the location of the operand. To see the way that the point labels resolve to addresses, examine the address fields (last four digits of the instruction words) in the example above.
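The resolution rule for point labels can be sketched in Python as follows. This is an illustration rather than the assembler's actual implementation; the function name and the `declarations` structure are hypothetical. The sketch assumes that a declaration on the same line as the reference is skipped in both directions, which matches the `IBB 2B,2` line at 0406 above for the backward case but is only an assumption for the forward case.

```python
from bisect import bisect_left, bisect_right

def resolve_point_label(declarations, operand, here):
    """Resolve an operand such as '2F' or '2B' referenced at address `here`.

    `declarations` maps each point label (as a string) to a sorted list of
    the addresses at which that label is declared.
    """
    label, direction = operand[:-1], operand[-1]
    addresses = declarations[label]
    if direction == "F":
        i = bisect_right(addresses, here)   # first declaration after `here`
        if i == len(addresses):
            raise ValueError(f"no later declaration of point label {label}")
        return addresses[i]
    if direction == "B":
        i = bisect_left(addresses, here)    # number of declarations before `here`
        if i == 0:
            raise ValueError(f"no prior declaration of point label {label}")
        return addresses[i - 1]
    raise ValueError(f"point label reference must end in F or B: {operand}")
```

With the declarations from the snippet above (`{"1": [407], "2": [397, 406]}`), the reference `2F` at address 400 resolves to 406 and the reference `2B` at address 406 resolves to 397, in agreement with the address fields in the listing.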

Both global label and point label symbols in the label field of a line may be followed by a minus sign (-) and a numeric literal. This is termed an "offset label." The negative offset specifies that the label is being declared that many words before the location the label represents. Consider this example from the Generator program:

1525  0 0000 00 0000         CDR-6       FILL    0,22

Without the offset value, the label CDR would have the value 1525. The offset indicates, however, that the label is being declared six words early, so the value of CDR is actually 1525+6 = 1531.

This feature is especially useful with Cardatron format bands (the FORMAT pseudo-instruction). Format bands occupy a block of 29 words and are loaded into the Cardatron using the CRF and CWF instructions. Both instructions must address the last word of the block, however, so it is common to see a declaration like this, which assigns the label ALFORMAT the value 1493+28 = 1521:

                             ALFORMAT-28
1493  3 3333 33 3333                     FORMAT  INPUT,16(T5A)
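The arithmetic for offset labels is simple enough to state as a small Python sketch. The function name `declare_label` is hypothetical, and the sketch relies on the fact that label symbols themselves never contain a minus sign.

```python
def declare_label(label_field, location):
    """Return (symbol, value) for a label field declared at `location`.

    A trailing '-n' means the label is declared n words before the
    location it represents, so n is added to the location counter.
    """
    name, sep, offset = label_field.partition("-")
    return name, location + (int(offset) if sep else 0)
```

For the two declarations shown above, `declare_label("CDR-6", 1525)` gives `("CDR", 1531)` and `declare_label("ALFORMAT-28", 1493)` gives `("ALFORMAT", 1521)`.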

Operands

The assembler generates machine instructions by "assembling" words using the mnemonic operation code from columns 17-24 of a source record, the sign digit from column 16, and zero or more comma-delimited symbolic expressions from the operand field in columns 25-72. The symbolic expressions are in turn composed of operand primaries and operators.

The number of expressions the assembler expects to encounter in the operand field of an instruction varies by the mnemonic operation code. A given operation code may require certain expressions to be specified; some expressions may be optional and will generally have the value of zero if omitted.

Operand Primaries

The GEN-Assembler recognizes the following as primaries (fundamental operands) in operand expressions:

Primary	Description
`$`	Current value of the assembler's location counter, i.e., the address of the instruction being assembled.
integer	An unsigned decimal literal. The value of the primary is the value of the integer. Such primaries are used for several purposes, depending on the instruction being assembled and the relative position in the list of operands. Uses include absolute memory addresses, offsets to symbolic addresses, unit numbers in I/O instructions, and other variant field values.
symbol	An alphanumeric symbol referring to a global label. The value of the primary is the address at which the symbol was defined in a label field.
integerF	A forward reference to a numeric point label. The value of the primary is the address of the next declaration of that point label later in the source.
integerB	A backward reference to a numeric point label. The value of the primary is the address of the prior declaration of that point label earlier in the source.
`(`expression`)`	An operand expression (see below). The value of the primary is the value of the expression.
`=`expression`=`	A literal. The value of the primary is the address of the word in the literal pool where the value of the expression will be stored.
expression`(`sL`)`	The value of the primary is a word of zeroes with the value of the expression inserted into the field designated by "sL." See the definition of "sL" in the section below on Machine Instructions.
`='`string`'=`	A string literal. The value of the primary is the address of the first word of the string data in the literal pool. When used in the operand list for a machine instruction, the string data is limited to five characters (one word). Operands for the SPO instruction are an exception to this, and are discussed in the notes for that instruction. Operands for "constant list" declarations may also be longer strings. Words in the pool will have a sign of `2`. The "`'`" characters are string quotes and are not stored in the literal pool. If the length of the quoted string is not a multiple of five characters, the final word will be padded on the right with spaces (220 code 00).

Operand Expressions

Each comma-delimited operand for an instruction is a symbolic expression. An expression consists of an operand primary by itself or two or more operand primaries separated by operators. The valid operators are:

+ Addition
- Subtraction
** Multiplication
/ Integer division
// Integer remainder.

The value of the expression is computed from the values of the primaries operated upon by the operators. Evaluation of the expression is strictly left-to-right except where parenthesized expressions dictate a different order. When an expression represents a memory address, the resulting value is truncated on the left to a four-digit memory address. If the address is negative, it is converted to its tens-complement value before being truncated (e.g., -123456 would be converted to 6544).

Examples:

Expression	Value
`$`	The current value of the location counter
`ABC`	The address associated with the symbol ABC
`3456`	The address 3456.
`ABC-1`	The address associated with the symbol ABC, minus 1
`ABC-DEF+2`	The difference between the addresses for ABC and DEF, plus 2 words
`=1234=`	The address of the literal value +0000001234 in the literal pool
`=123(64)=`	The address of the literal value +0001230000 in the literal pool.
`=1234=-=5678=`	The difference between the addresses of the two literals in the literal pool
`21F`	The address of the next declaration of the point label 21
`21B+3`	The address of the prior declaration of the point label 21, plus 3 words
`21B-=-1234=-1234`	The address of the prior declaration of the point label 21 minus the address of the pool literal -1234, minus 1234 words
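The evaluation rules described in this section can be sketched in Python. This is an illustration under stated assumptions, not the assembler's code: the names `OPERATORS`, `evaluate`, and `to_address` are hypothetical, the primaries are assumed to have been resolved to integers already, and the behavior of division and remainder on negative values is not specified by this page (the sketch simply uses Python's semantics).

```python
# Assembler operator symbols mapped to Python operations. Note that the
# assembler's "//" is a remainder, unlike Python's "//".
OPERATORS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "**": lambda a, b: a * b,
    "/": lambda a, b: a // b,   # integer division
    "//": lambda a, b: a % b,   # integer remainder
}

def evaluate(first, rest):
    """Evaluate strictly left to right: `first`, then each (operator, value)."""
    value = first
    for op, operand in rest:
        value = OPERATORS[op](value, operand)
    return value

def to_address(value):
    """Reduce a value to a four-digit address.

    Negative values are first converted to their ten-digit tens-complement,
    then the result is truncated on the left to four digits.
    """
    return (value % 10**10) % 10**4
```

Because there is no operator precedence, `evaluate(2, [("+", 3), ("**", 4)])` yields 20 rather than 14, and `to_address(-123456)` yields 6544 as in the example in the text.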

Machine Instructions

The bulk of the source records fed to the assembler will typically be those representing machine instructions. The following table shows how the list of operand expressions in columns 25-72 of the record are assembled into a machine instruction word. See the Operational Characteristics of the Burroughs 220 reference manual for details on the format of instruction words and the meaning of the sub-fields for each instruction.

The table below shows the number and type of operands the assembler normally expects for each mnemonic op code. Operands in square brackets are optional and may be omitted. If an operand in the middle of the list is omitted, its comma must be retained, although commas for omitted operands at the end of the list may also be omitted. Unless noted otherwise below, the value of omitted operands is zero.

Additional operands may be appended to the list of standard operands defined for each op code below. These must also be delimited by commas. In addition, each of these additional operands must be followed by a partial-word designator (sL) enclosed in parentheses. This causes the value of the operand to be placed into the assembled word in the field specified by sL, overwriting any digits that may have been placed there by earlier operands in the list. This syntax is typically used to insert addresses or constants into unused digit positions of the instruction. For example, the following instruction generates the word 0 1370 00 7310:

        HLT     7310,1370(44)

The following conventions are used for operands in the op code table below:

aaaa -- the operand address. In some instructions, such as shifts, this field is a count and not an address, and not all of the high-order digits are used.
b -- Cardatron format band number.
c -- a control digit used in the Cardatron CWR instruction to specify how the "T relays" are to be set for card machine control.
d -- a control or variant digit with a meaning specific to the instruction.
f -- a digit indicating the instruction is to execute in a special mode, e.g., whether it targets a partial-word (sL) operand field or a whole word.
hhu -- a two-digit head (lane) and one-digit unit number -- a combined field used in some magnetic tape instructions.
k -- a digit indexing the category code word for magnetic tape scan (MTC, MFC) instructions.
kk -- the size of a block to be written by magnetic tape write instructions.
nnnn, nn, or n -- a count or other parametric value used by the instruction.
r -- digit used to specify reload-lockout in some Cardatron instructions.
sL -- a partial-word designator, used in instructions that operate on a field of digits within a word. s is the starting digit number in the word and L is the length of the field, starting with the s digit and extending to the left. A value of 0 for either digit is interpreted as 10. Digits in a word are numbered ±1234567890, where ± is the sign digit (which cannot be indexed by s, although a field may extend leftward into it). In order for the partial-word designator to be valid, the relation L <= (s+1) must hold.
u -- unit number for an input/output instruction
v -- a control or variant digit with a meaning specific to the instruction.
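The way a partial-word designator positions a value within a word can be sketched in Python. This is only an illustration: the function names are hypothetical, a word is modeled as an 11-digit non-negative integer (sign digit followed by ten digits), and the validity check `L <= s+1` is inferred from examples on this page such as `1370(44)` and `30(12)`.

```python
def field_bounds(sl):
    """Decode a two-character sL designator into (s, L), treating 0 as 10."""
    s = int(sl[0]) or 10
    length = int(sl[1]) or 10
    if length > s + 1:
        raise ValueError(f"invalid partial-word designator {sl}")
    return s, length

def overlay(word, value, sl):
    """Replace the sL field of `word` with `value`, leaving other digits alone."""
    s, length = field_bounds(sl)
    scale = 10 ** (10 - s)        # weight of digit s within the word
    width = 10 ** length
    old = (word // scale) % width
    return word + ((value % width) - old) * scale

def format_word(word):
    """Format an 11-digit word in the listing style, e.g. '0 1370 00 7310'."""
    digits = f"{word:011d}"
    return f"{digits[0]} {digits[1:5]} {digits[5:7]} {digits[7:]}"
```

Starting from the word for `HLT 7310`, `format_word(overlay(7310, 1370, "44"))` gives `0 1370 00 7310`, matching the example above. A field of `12` starts at digit 1 and extends into the sign digit, so `overlay(0, 30, "12")` produces a word with a sign of 3.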

Word Format	Mnem	Operands	Notes
`± 0000 00 aaaa`	`HLT`	`[aaaa]`
`± 0000 01 aaaa`	`NOP`	`[aaaa]`
`± unn0 03 aaaa`	`PRD`	`aaaa,u,nn`
`± unn1 03 aaaa`	`PNC`	`aaaa,u,[nn]`	Read paper tape ignoring control words
`± u00v 04 aaaa`	`PRB`	`aaaa,u,[v]`
`± unnv 05 aaaa`	`PRI`	`aaaa,u,[nn],[v]`
`± unn0 06 aaaa`	`PWR`	`aaaa,u,nn`
`± u000 07 aaaa`	`PWI`	`aaaa,u`
`± 0000 08 aaaa`	`KAD`	`[aaaa]`
`± 0nn0 09 aaaa`	`SPO`	Special SPO operand	See the discussion on the SPO operand at the end of this table
`± 0000 10 aaaa`	`CAD`	`aaaa`
`± 0001 10 aaaa`	`CAA`	`aaaa`
`± 0000 11 aaaa`	`CSU`	`aaaa`
`± 0001 11 aaaa`	`CSA`	`aaaa`
`± 0000 12 aaaa`	`ADD`	`aaaa`
`± 0001 12 aaaa`	`ADA`	`aaaa`
`± 0000 13 aaaa`	`SUB`	`aaaa`
`± 0001 13 aaaa`	`SUA`	`aaaa`
`± 0000 14 aaaa`	`MUL`	`aaaa`
`± 0000 15 aaaa`	`DIV`	`aaaa`
`± 0000 16 aaaa`	`RND`	`[aaaa]`
`± 0000 17 aaaa`	`EXT`	`aaaa`
`± sLf0 18 aaaa`	`CFA`	`aaaa[/sL]`
`± sLf1 18 aaaa`	`CFR`	`aaaa[/sL]`
`± 0000 19 aaaa`	`ADL`	`aaaa`
`± nnnn 20 aaaa`	`IBB`	`aaaa,nnnn`
`± nnnn 21 aaaa`	`DBB`	`aaaa,nnnn`
`± n000 22 aaaa`	`FAD`	`aaaa,[n]`
`± n001 22 aaaa`	`FAA`	`aaaa,[n]`
`± n000 23 aaaa`	`FSU`	`aaaa,[n]`
`± n001 23 aaaa`	`FSA`	`aaaa,[n]`
`± 0000 24 aaaa`	`FMU`	`aaaa`
`± 0000 25 aaaa`	`FDV`	`aaaa`
`± sLnn 26 aaaa`	`IFL`	`aaaa/sL,nn`
`± sLnn 27 aaaa`	`DFL`	`aaaa/sL,nn`
`± sLnn 28 aaaa`	`DLB`	`aaaa/sL,nn`
`± 0nn0 29 aaaa`	`RTF`	`aaaa,nn`
`± 0000 30 aaaa`	`BUN`	`aaaa`
`± 0000 31 aaaa`	`BOF`	`aaaa`
`± 0000 32 aaaa`	`BRP`	`aaaa`
`± 000d 33 aaaa`	`BSA`	`aaaa,d`
`± 0000 33 aaaa`	`BPA`	`aaaa`
`± 0001 33 aaaa`	`BMA`	`aaaa`
`± 0000 34 aaaa`	`BCH`	`aaaa`
`± 0001 34 aaaa`	`BCL`	`aaaa`
`± 0000 35 aaaa`	`BCE`	`aaaa`
`± 0001 35 aaaa`	`BCU`	`aaaa`
`± sLnn 36 aaaa`	`BFA`	`aaaa/sL,nn`
`± sL00 36 aaaa`	`BZA`	`aaaa[/sL]`
`± sLnn 37 aaaa`	`BFR`	`aaaa/sL,nn`
`± sL00 37 aaaa`	`BZR`	`aaaa[/sL]`
`± u000 38 aaaa`	`BCS`	`aaaa,u`
`± 0000 39 aaaa`	`SOR`	`[aaaa]`
`± 0001 39 aaaa`	`SOH`	`[aaaa]`
`± 0002 39 aaaa`	`IOM`	`aaaa`
`± sLf0 40 aaaa`	`STA`	`aaaa[/sL]`
`± sLf1 40 aaaa`	`STR`	`aaaa[/sL]`
`± sL02 40 aaaa`	`STB`	`aaaa[/sL]`
`± 0000 41 aaaa`	`LDR`	`aaaa`
`± 0000 42 aaaa`	`LDB`	`aaaa`
`± 0001 42 aaaa`	`LBC`	`aaaa`
`± 000d 43 0000`	`LSA`	`d`
`± 0000 44 aaaa`	`STP`	`aaaa`
`± 0001 45 aaaa`	`CLA`	`[aaaa]`
`± 0002 45 aaaa`	`CLR`	`[aaaa]`
`± 0003 45 aaaa`	`CAR`	`[aaaa]`
`± 0004 45 aaaa`	`CLB`	`[aaaa]`
`± 0005 45 aaaa`	`CAB`	`[aaaa]`
`± 0006 45 aaaa`	`CRB`	`[aaaa]`
`± 0007 45 aaaa`	`CLT`	`[aaaa]`
`± 0000 46 aaaa`	`CLL`	`aaaa`
`± 0000 48 aaaa`	`SRA`	`aaaa`
`± 0001 48 aaaa`	`SRT`	`aaaa`
`± 0002 48 aaaa`	`SRS`	`aaaa`
`± 0000 49 aaaa`	`SLA`	`aaaa`
`± 0001 49 aaaa`	`SLT`	`aaaa`
`± 0002 49 aaaa`	`SLS`	`aaaa`
`± uhh0 50 aaaa`	`MTS`	`aaaa,hhu`
`4 uhh0 50 aaaa`	`MFS`	`aaaa,hhu`	Note that the sign digit is initialized to 4.
`± uhh4 50 0000`	`MLS`	`hhu`
`± uhh8 50 0000`	`MRW`	`hhu`
`± uhh9 50 0000`	`MDA`	`hhu`
`± uhhk 51 aaaa`	`MTC`	`aaaa,hhu,k`
`4 uhhk 51 aaaa`	`MFC`	`aaaa,hhu,k`	Note that the sign digit is initialized to 4.
`± un00 52 aaaa`	`MRD`	`aaaa,u,n,[v]`	The `v` operand is added to digit 4 in the instruction word. This is usually the value 8, which indicates B-Register modification of words read
`± un01 52 aaaa`	`MNC`	`aaaa,u,n,[v]`	The `v` operand is added to digit 4 in the instruction word. This is usually the value 8, which indicates B-Register modification of words read
`± un00 53 aaaa`	`MRR`	`aaaa,u,n,[v]`	The `v` operand is added to digit 4 in the instruction word. This is usually the value 8, which indicates B-Register modification of words read
`± unkk 54 aaaa`	`MIW`	`aaaa,u,n,kk`
`± un00 55 aaaa`	`MIR`	`aaaa,u,n`
`± unkk 56 aaaa`	`MOW`	`aaaa,u,n,kk`
`± un00 57 aaaa`	`MOR`	`aaaa,u,n`
`± un00 58 0000`	`MPF`	`u,n`
`± un01 58 0000`	`MPB`	`u,n`
`± u002 58 0000`	`MPE`	`u`
`± u000 59 aaaa`	`MIB`	`aaaa,u`
`± u001 59 aaaa`	`MIE`	`aaaa,u`
`± unnv 60 aaaa`	`CRD`	`aaaa,u,[v],[nn]`	Use v=1 to impose reload-lockout
`± u01v 60 aaaa`	`CNC`	`aaaa,u,[v]`	Read cards ignoring control words; use v=1 to impose reload-lockout
`± u011 60 aaaa`	`CNCL`	`aaaa,u,[v]`	Read cards ignoring control words and imposing reload-lockout
`± u0c1 61 aaaa`	`CWR`	`aaaa,u,b,[c]`	The value `(b-1)*2` is added to digit 4 in the instruction word
`± u00r 62 aaaa`	`CRF`	`aaaa,u,b,[r]`	The value `(b-1)*2` is added to digit 4 in the instruction word
`± u001 62 aaaa`	`CRFL`	`aaaa,u,b`	Impose reload-lockout. The value `(b-1)*2` is added to digit 4 in the instruction word
`± u000 63 aaaa`	`CWF`	`aaaa,u,b`	The value `(b-1)*2` is added to digit 4 in the instruction word
`± u000 64 aaaa`	`CRI`	`aaaa,u`
`± u000 65 aaaa`	`CWI`	`aaaa,u`
`± 0nn0 66 aaaa`	`HPW`	`aaaa,nn`
`± 0000 67 aaaa`	`HPI`	`[aaaa]`

The operand for the SPO instruction (op code 09) can take two forms:

A standard address and count pair, i.e., aaaa,nn. The address expression specifies the start of a message to be typed on the SPO. The count is the number of words in the message.
A literal "SPO string."

A SPO string is composed of one or more string segments. The segments are written one after the other, without intervening spaces or punctuation (except in the case of multi-line strings as discussed below). The assembler concatenates these segments into one contiguous string of words in the literal pool. The segments may cross word boundaries and start and end in the middle of a word. The assembler automatically inserts the address of the first word and the total number of words into the instruction.

String segments consist of literal strings and the following letter codes:

'*string*' -- a literal string bounded by single quotes. Unlike most other uses of string literals, this string may be more than five characters in length.
R -- a carriage-return character (220 code 16). This performs a new-line function.
L -- a form-feed character (220 code 15).
T -- a horizontal tabulation character (220 code 26).
I -- a non-printing space (220 code 02). The teletype machine discards this character and does not move the print head. It is often used to pad the length of a message out to a multiple of five characters (one word) so that the teletype does not type spaces at the end of the message.

SPO strings may continue across multiple source lines by dividing them at segment boundaries. After the end of a segment, write a space and three periods (" ..."). Continue on the next line with additional string segments starting in column 25. Here is an example of a multi-line SPO string from the Generator program:

        SPO     R'MEMORY SIZE MUST BE GIVEN AS A MULTIPLE' ...
                ' OF ONE HUNDRED'RRRI

Pseudo-Instructions

The GEN-Assembler supports a number of pseudo-instructions. These do not represent machine instructions, but instead specify address and location information to the assembler or provide convenient ways to specify constant values and other blocks of data.

Simple Pseudo-Instructions

REM: A remark or comment. This line in the source will appear on any listings with the REM blanked out, but otherwise it is ignored by the assembler.

IS: This pseudo defines the value of a symbol. The symbol in the label field of the record is assigned the value of the single address expression in the operand field. The address expression must be resolvable by the assembler during its first pass, i.e., all symbols in the expression must have been defined prior to this point in the source.

ORIGIN: Normally the assembler's location counter is incremented by one for each word of object code generated. This pseudo sets the location counter to the value of the single address expression in the operand field. As with IS, this expression must be resolvable during the first pass of the assembly.

PLACE: While ORIGIN specifies the value of the assembler's location counter used to assign addresses to symbols and compute address expressions, PLACE specifies the zero-relative location in the assembled memory image where the instructions will be stored. It takes a single address expression operand. As with IS and ORIGIN, this expression must be resolvable during the first pass of the assembly.

Internally, PLACE defines an offset to the current ORIGIN value, which the assembler uses to position words in the assembled memory image. Initially, this offset is zero. Upon encountering a PLACE pseudo, the assembler computes the offset as the value of the PLACE operand expression minus the current location counter.

PLACED: This pseudo cancels the effect of a prior PLACE pseudo and resets the memory location offset to zero.
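The relationship between ORIGIN, PLACE, and PLACED can be sketched in Python as follows. The class and method names are hypothetical and the memory image is modeled as a plain dictionary. The sketch assumes that ORIGIN leaves any existing PLACE offset unchanged, which this page does not state either way.

```python
class Placement:
    """Tracks the location counter and the PLACE offset (illustrative only)."""

    def __init__(self):
        self.location = 0   # value used for symbols and address expressions
        self.offset = 0     # added to the location to find the storage address

    def origin(self, address):
        self.location = address

    def place(self, address):
        # Offset is the PLACE operand minus the current location counter.
        self.offset = address - self.location

    def placed(self):
        self.offset = 0

    def emit(self, memory, word):
        """Store one word and advance the location counter by one."""
        memory[self.location + self.offset] = word
        self.location += 1
```

After `origin(1000)` and `place(200)`, the next two words are assembled as if at addresses 1000 and 1001 but stored at 200 and 201 in the memory image. After `placed()`, storage resumes at the location counter itself.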

FORGET: If the operand to this pseudo is the symbol NAMES, the pseudo clears the assembler's symbol table. This is typically used when one assembled memory image must be created from multiple assemblies, as is the case with the Generator program. Specifying any other operand for this pseudo is an error.

FILL: This pseudo stores one or more copies of a word value starting at the current location. The first operand is a constant numeric expression for the word value. The second operand must be a constant numeric expression for the number of words to be stored.

POOL: This pseudo may be used at most once in an assembly. It causes the assembler's literal pool to be output at the current location. If this pseudo is not present in an assembly, the literal pool is output when the assembler encounters the next END or FORGET NAMES pseudo.

DO: This pseudo is a macro that generates the STP/BUN instruction pair for a subroutine call. It takes a single operand that may have one of two forms:

DO XXXXX -- generates STP XXXXX, BUN XXXXX.1
DO XXXXX.# -- generates STP XXXXX, BUN XXXXX.#

where XXXXX, XXXXX.1, and XXXXX.# are global labels and # is a literal integer.
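The two forms of DO can be illustrated with a short Python sketch. The function name `expand_do` is hypothetical, and the sketch assumes that any operand ending in a period followed by digits is the second form, which may not hold for every label the assembler accepts.

```python
import re

def expand_do(operand):
    """Expand a DO operand into its (mnemonic, operand) instruction pair."""
    match = re.fullmatch(r"(.+)\.(\d+)", operand)
    if match:
        # DO XXXXX.# -- store the return address at XXXXX, branch to XXXXX.#
        return [("STP", match.group(1)), ("BUN", operand)]
    # DO XXXXX -- store the return address at XXXXX, branch to XXXXX.1
    return [("STP", operand), ("BUN", operand + ".1")]
```

For example, `expand_do("SETSCAN")` yields `STP SETSCAN` followed by `BUN SETSCAN.1`, while `expand_do("SETSCAN.2")` yields `STP SETSCAN` followed by `BUN SETSCAN.2`.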

DJ: This pseudo defines a template for assembling words from multiple subfields. The template is used by the J pseudo discussed below. This pseudo takes as an operand a string of decimal digits that represents a list of partial-word sL pairs. Each sL pair defines a field to be used by the J pseudo.

J: This pseudo assembles a word from multiple subfields. The subfields are defined by the most recent DJ pseudo encountered. Oddly, the operands of J are applied to the fields in the reverse of the order in which DJ defines them. For example, the following DJ defines fields 22, 86, and 02. However, J uses them in the reverse order, 02, 86, and 22. Thus the following will generate the word value 0 1234 56 7890:

        DJ      228602
        J       90,345678,12
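The DJ/J example above can be checked with a small Python sketch. The function names are hypothetical, and the sketch assumes the fields in a template do not overlap, so the field values can simply be summed into the word.

```python
def parse_dj(template):
    """Split a DJ operand such as '228602' into sL pairs: ['22', '86', '02']."""
    return [template[i:i + 2] for i in range(0, len(template), 2)]

def assemble_j(fields, values):
    """Build a word from J operand values, applying the DJ fields in reverse."""
    word = 0
    for sl, value in zip(reversed(fields), values):
        s = int(sl[0]) or 10          # 0 means digit 10
        length = int(sl[1]) or 10     # 0 means length 10
        word += (value % 10**length) * 10 ** (10 - s)
    return word
```

Here `assemble_j(parse_dj("228602"), [90, 345678, 12])` returns 1234567890, that is, the word 0 1234 56 7890.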

END: This pseudo must be the last command in an assembly. Any source records following this one are part of the next assembly. A single address expression operand may be specified, but this value is not presently used by the assembler.

Constant Lists

Constant values and lists of constant values may be defined at any point in the assembly source. There is no explicit pseudo-instruction to do this. Instead, the constant value is written starting either in the sign position (column 16) or the first position of the op code field (column 17). The constant value may be a numeric literal or an expression involving literals and symbols. The following rules apply:

If the sign position is blank or +, the sign will be zero unless the first expression in the list has 11 or more significant digits.
If the sign position is "-", a 1 will be XOR-ed with the low-order bit of the sign digit for the value of the expression.
If the sign position is numeric, the first value of the list is parsed starting in the sign position. Unless the parsed value has 11 or more significant digits, the sign of the resulting word will be zero.
If the sign position is "(", the first value of the list is parsed as an expression enclosed in parentheses.
If the sign position is "'", the first value of the list is parsed as a string; the words of the string will have signs of 2.
Any sign determined from the sign position of the source line applies only to the first value parsed in the list. For any subsequent values in the list, the sign defaults to zero.
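
The rules for a blank, "+", or "-" sign position can be sketched as follows. This Python model is illustrative only and covers just those three cases; the function name constant_word is hypothetical, and rejecting values of more than 11 digits is an assumption, since the behavior in that case is not described here.

```python
def constant_word(sign_char, value):
    """Return the 11-digit word (sign digit first) for the first value
    in a constant list, given the character in the sign position.

    Only blank, '+', and '-' sign positions are modeled.
    """
    if not 0 <= value < 10**11:
        raise ValueError("value must fit in 11 decimal digits")
    # An 11-digit value supplies its own sign digit; otherwise the sign is 0
    sign, magnitude = divmod(value, 10**10)
    if sign_char == "-":
        sign ^= 1                    # flip the low-order bit of the sign
    elif sign_char not in (" ", "+"):
        raise ValueError("sign position not handled in this sketch")
    return f"{sign}{magnitude:010d}"


print(constant_word(" ", 10000000000))   # an 11-digit constant
print(constant_word("-", 500000))        # a negated expression value
```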

Here are some examples from the Generator program and the word values they generate:

0376  0 0000 00 1649         2          F1,F2,F3,F4
0377  0 0000 00 1662
0378  0 0000 00 1671
0379  0 0000 00 1676

1452  0 0000 30 1452         W           BUN     $
1453  0 0000 00 0000         THECOUNT    (0)
1454  1 0000 00 0000                    10000000000
1455  0 0000 00 0000         2           FILL    0,10
1465  0 0000 00 0000                     (0)
1466  2 0000 00 0000         3           FILL    20000000000,24
1490  0 3000 00 0000                    03000000000
1491  0 4000 00 9000         7          4000009000
1492  0 0000 00 0000         ADDRESS     (0)

2448  3 0000 88 0015         TBL        30(12)+((CCCNT+1)/100)(04)+((CCCNT+1)//100)(64)
2449  0 0144 30 0158                     BUN     SETSCAN.1,144(44)
2450  3 0000 19 0016                    30(12)+((CCBEG+1)/100)(04)+((CCBEG+1)//100)(64)
2451  1 4200 26 1681                    -IFL     IA/42,0
2452  1 0000 50 0000                    -((0050/100)(04)+(0050//100)(64))
2453  1 0000 12 4800                    -ADD     4800
2454  1 0000 31 0042                    -(((MAMAX+4)/100)(04)+((MAMAX+4)//100)(64))

`FORMAT` Pseudo-Instruction

The FORMAT pseudo generates Cardatron format bands from a somewhat COBOL PICTURE-like representation of the layout for a card image or print line. Here are some examples from the Compiler Main module:

FR1   FORMAT  INPUT,T2Z1B4A,15(T5A)
FR2   FORMAT  INPUT,16(P5A),P10Z
FR3   FORMAT  PRINT,49B,TZZZZZZNNNN,BBB,SBNNNNBNNBNNNN,BT5A,44B
FR6   FORMAT  PRINT,49B,TZZZZZZNNNN,BBB,SBNNNNBNNBZZZZ,5BT5A,44B
FR7   FORMAT  PRINT,49B,TZZZZZZNNNN,BBB,T6Z10BNNNN,50B
FR4   FORMAT  PRINT,7(T5A),85B
FR8   FORMAT  PRINT,TZZNNNNZZZZ,4B,16(T5A),32B

There are two classes of format bands -- those used for input from card readers and those used for output to card punches and line printers. Each FORMAT pseudo generates a data block of 29 words. These data blocks are referenced by Card Read Format Load (CRF, 62) instructions for input bands and Card Write Format Load (CWF, 63) instructions for output bands. Note that the address in CRF and CWF instructions must reference the last word in the block of 29 words.

The first operand in a band definition is a mnemonic that indicates the input/output class:

INPUT specifies a format band for input from a card reader. The format string must define a card image of exactly 80 characters.
PUNCH specifies a format band for output to a card punch. The format string must define a card image of exactly 80 characters.
PRINT specifies a format band for output to a line printer. The format string must define a line image of exactly 120 characters.

Following the class mnemonic is a list of comma-delimited format phrases. Most phrases describe how a contiguous range of columns on the card machine will be converted to or from one word in 220 memory. The format phrases are terminated by the first space or when column 72 is reached.

The Cardatron split each card column or print position into two parts -- a numeric code (the bottom nine rows on a card: 1-9) and a zone code (the top three rows on a card: +, -, 0). Each code could be transferred to and from the 220 or suppressed individually. The numeric code for a column is transferred before the zone code. Since FORMAT considers a format band from the perspective of columns on the card machine, you do not normally need to consider the separate numeric and zone codes, but there are exceptions, especially when dealing with signs. For more information, see Chapter 6 in the Operational Characteristics of the Burroughs 220 reference manual.

The phrases are composed from letter codes that determine how the Cardatron numeric and zone codes are to be treated. A letter code may be prefixed with an integer repeat count. Thus the phrase "TZZZZZZNNNN" is the same as "T6Z4N". A single phrase may also be enclosed in parentheses and prefixed by a repeat count, e.g., "16(T5A)". No commas are permitted within parentheses, and parentheses may not be nested.
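
The repeat-count notation can be illustrated by expanding a phrase into one letter per code. The following Python sketch is not the assembler's parser; expand_phrase is a hypothetical name, and the sketch assumes a well-formed phrase with no nested parentheses.

```python
import re


def expand_phrase(phrase):
    """Expand repeat counts so that each letter code appears explicitly."""
    # First expand a parenthesized group with a repeat count, e.g. 16(T5A)
    phrase = re.sub(r"(\d+)\(([^()]*)\)",
                    lambda m: m.group(2) * int(m.group(1)),
                    phrase)
    # Then expand repeat counts on individual letter codes, e.g. 6Z
    return re.sub(r"(\d+)([A-Z])",
                  lambda m: m.group(2) * int(m.group(1)),
                  phrase)


print(expand_phrase("T6Z4N"))
print(expand_phrase("T2Z1B4A"))
```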

The following table lists the phrase letter codes and their use in input and output format bands.

Code	Input Band Use	Output Band Use
A	copy two zone/numeric digits to memory from card machine	copy two digits from memory to card machine
B	ignore two digits from card machine, store nothing in memory	supply two zero digits to card machine, do not transfer digits from memory
N	copy a numeric digit to memory from a card column, ignoring the zone digit	copy one digit from memory to a card column, normally supplying a zero for the zone (see `X`)
P	store a zero for the sign digit, do not transfer a digit from the card machine	ignore sign digit in memory, do not transfer a digit to the card machine
S	ignore the numeric portion of a card column and copy the zone digit to the sign digit of the memory word	copy the sign digit from the memory word as the zone of a separate card column
T	like P, but store a 2 for the sign instead of zero	same as P
X	store zone digit from card machine to memory	copy a digit from memory to card machine as an over-punch for next code
Z	store zero digit in memory, do not transfer a digit from the card machine	skip/ignore a digit in memory, transfer nothing to the card machine

These codes are typically used as follows:

A: transfer a zone/numeric digit pair to or from one column on the card machine as an alphanumeric character.
B: ignore one column on the card machine on input; supply a blank column (code 00) on output.
N: transfer the numeric digit from a card column to or from a digit in memory.
P: store a positive sign digit on input; ignore the sign digit in memory on output.
S: transfer the sign as a separate card column to/from the sign digit in the memory word.
T: store signs of 2 for alphanumeric words on input for use with the console devices; not normally used for output.
X: transfer an over-punched sign (zone digit only) as a separate digit to/from memory.
Z: fill zero digits in memory on input; skip digits in memory on output.

Note that the P, S, and T codes must reference the sign digit of a 220 word. The assembler will issue an error if this is not the case. This assures that the band phrases will be aligned with the 220 memory words.
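
The alignment rule can be modeled by counting memory digits as the codes are scanned. The Python sketch below is an illustration, not the assembler's logic. It assumes the phrases have already been written out with one letter per code and no repeat counts, and it takes the per-code digit counts from the table above (A uses two memory digits, B uses none, every other code uses one); the count of one digit for X is an assumption. All names are hypothetical.

```python
# Memory digits consumed by each code (assumed from the table above)
MEMORY_DIGITS = {"A": 2, "B": 0, "N": 1, "P": 1, "S": 1, "T": 1, "X": 1, "Z": 1}
SIGN_CODES = {"P", "S", "T"}
WORD_DIGITS = 11                     # one sign digit plus ten digits


def check_sign_alignment(codes):
    """Verify that every P, S, or T code falls on a sign digit.

    Returns the total number of memory digits covered by the codes.
    """
    position = 0
    for index, code in enumerate(codes):
        if code in SIGN_CODES and position % WORD_DIGITS != 0:
            raise ValueError(
                f"code {code} at index {index} is not on a sign digit")
        position += MEMORY_DIGITS[code]
    return position


# FR1 written out in full: T2Z1B4A followed by 15(T5A)
fr1 = "TZZBAAAA" + "TAAAAA" * 15
print(check_sign_alignment(fr1) // WORD_DIGITS, "words")
```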

Here are two of the examples above annotated with what the band phrases do:

FR1   FORMAT  INPUT,T2Z1B4A,15(T5A)

Ignore the first column [1B] on the card. In the first memory word of the buffer, store a sign digit of 2 [T], followed by two zero digits [2Z]; then transfer the next four card columns [4A] alphanumerically as two-digit character codes. The two zero digits cause the four alphanumeric characters to be right-justified in the memory word. Transfer the next 75 characters alphanumerically from the card [15(T5A)] to the next 15 memory words, storing the words with signs of 2. Notice that the total number of card columns input is 80, as is required for an INPUT format band.

FR3   FORMAT  PRINT,49B,TZZZZZZNNNN,BBB,SBNNNNBNNBNNNN,BT5A,44B

Output 49 columns of spaces [49B] to the line printer. From the first word of the memory buffer, ignore the sign and first six digits [TZZZZZZ] and transfer the low-order four digits of the word numerically [NNNN] to the next four columns on the line. Output three more columns of spaces [BBB] on the line. From the second word in the memory buffer, output the sign digit as a separate column [S], then a space [B], then four columns of numeric digits from the word [NNNN], another space [B], two columns of numeric digits from the word [NN], another space [B], and four columns of numeric digits from the last four digits of the word [NNNN]. Output a space followed by the next memory word as five alphanumeric columns [BT5A]. Finish the line with 44 spaces [44B]. Notice that the total number of print columns output is 120, as is required for a PRINT format band.
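
The column totals in these walkthroughs can be checked mechanically. The Python sketch below counts card or print columns for a FORMAT operand. It assumes, based on the examples above, that the A, B, N, and S codes each account for one column and that P, T, and Z account for none. The X code is rejected because its column usage is not clear from the examples. The names are hypothetical and this is not the assembler's own validation code.

```python
import re

LINE_WIDTH = {"INPUT": 80, "PUNCH": 80, "PRINT": 120}
COLUMN_CODES = {"A", "B", "N", "S"}      # assumed: one column each
NO_COLUMN_CODES = {"P", "T", "Z"}        # assumed: no column


def count_columns(phrases):
    """Count the columns described by a comma-delimited phrase list."""
    total = 0
    for phrase in phrases.split(","):
        # A whole phrase may be parenthesized with a repeat count: 15(T5A)
        group = re.fullmatch(r"(\d+)\((.*)\)", phrase)
        repeat, body = (int(group.group(1)), group.group(2)) if group else (1, phrase)
        columns = 0
        for count, code in re.findall(r"(\d*)([A-Z])", body):
            if code in COLUMN_CODES:
                columns += int(count or 1)
            elif code not in NO_COLUMN_CODES:
                raise ValueError(f"code {code} is not handled in this sketch")
        total += repeat * columns
    return total


def check_format(operand):
    """Return True if the operand defines exactly the width its class requires."""
    band_class, _, phrases = operand.partition(",")
    return count_columns(phrases) == LINE_WIDTH[band_class]


print(check_format("INPUT,T2Z1B4A,15(T5A)"))
print(check_format("PRINT,49B,TZZZZZZNNNN,BBB,SBNNNNBNNBNNNN,BT5A,44B"))
```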

Pre-Loading the Literal Pool

This is a feature implemented solely to aid in assembly of the BALGOL Generator program and is not something that would be normally useful for other 220 programming. Unless you are recovering an old program such as the Generator, you can safely skip this section.

Operand fields on an assembler source record may contain literals. Examples of literals are =1234567=, =-7654321=, ='ALPHA'= (an alphanumeric literal), and address expressions such as =START+2=. The assembler allocates storage for these values and treats each literal value as a symbol that has an associated address. The assembler attempts to assign multiple references to a given literal value at the same address.

Literals are assigned contiguous addresses in an area of memory termed the "literal pool." This pool is allocated at the point a POOL pseudo-instruction is encountered, or at the end of the program, after an END or FORGET NAMES pseudo-instruction is encountered. The order of entries within the pool is something determined by the assembler. We do not know the ordering scheme used by the original assembler -- it was probably determined by the way its symbol table was organized -- so it is not surprising that the GEN-Assembler usually generates words in a literal pool with a different sequence than the original assembler did.
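
The sharing of one address among multiple references to the same literal value can be sketched as a small table keyed by word value. This Python model is illustrative only: the class and method names are hypothetical, and the first-reference ordering it uses is an arbitrary choice for the sketch, not the ordering of either the original assembler or the GEN-Assembler.

```python
class LiteralPool:
    """A minimal model of a literal pool that shares duplicate values."""

    def __init__(self):
        self._offsets = {}           # word value -> offset within the pool

    def reference(self, word):
        """Return the pool offset for a literal, allocating it if new."""
        if word not in self._offsets:
            self._offsets[word] = len(self._offsets)
        return self._offsets[word]

    def emit(self, base_address):
        """Return (address, word) pairs once the pool location is known."""
        return [(base_address + offset, word)
                for word, offset in self._offsets.items()]


pool = LiteralPool()
first = pool.reference(1234567)
second = pool.reference(10)
again = pool.reference(1234567)      # same value, same offset as before
print(first, second, again)
print(pool.emit(2265))
```

Because offsets are resolved to addresses only when the pool is emitted, the same table works whether the pool is placed by a POOL pseudo or at the end of the assembly.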

This sequencing difference is not a problem in terms of generating instructions that will execute properly, but it is a problem in terms of matching the listing generated by the GEN-Assembler to the transcription of the original Generator program listing in order to verify that transcription. That verification is much simpler if the GEN-Assembler can arrange the literal pool words at the same locations as the original assembler did.

Thus, to support verification of the Generator transcription, GEN-Assembler allows an initial literal pool, including addresses for the literal values, to be specified in advance and pre-loaded prior to the assembly process. The initial literal pool is specified in a text file using JSON notation. This file is termed a "poolSet." As an example, here is the first part of the literal pool as transcribed from the listing for the Generator program:

1465      2265  0 0000 00 0000                     POOL
          2266  0 0000 00 0010
          2267  0 0000 00 0004
          2268  0 0000 00 9997
          2269  0 0000 00 0008
          2270  0 0000 00 0011
          2271  0 0000 00 1011
          2272  0 0020 00 0000
          2273  0 0000 00 0023
          2274  0 0000 00 0022
          2275  0 0000 00 0024
          2276  0 0000 00 0028
          2277  9 9999 99 9999
          2278  0 0000 00 0033
          2279  0 0000 00 0257
          2280  0 0000 00 0040
          2281  0 0000 00 0043
          2282  0 0000 00 0281
          2283  2 1654 49 6257    $MISP
          2284  2 5341 43 4544    LACED
          2285  2 0055 41 5445     NAME
          2286  2 0043 41 5944     CARD
          2287  0 0000 00 0303
          2288  0 0000 00 0312
          2289  0 0000 00 0052

Here is the text of the corresponding portion of the poolSet that can be used to pre-load the literal pool into the GEN-Assembler:

{"poolSet": [
    {"poolLoc": 2265,
     "poolData": [
                   0,          10,           4,        9997,           8,
                  11,        1011,    20000000,          23,          22,
                  24,          28, 99999999999,          33,         257,
                  40,          43,         281, 21654496257, 25341434544,
         20055415445, 20043415944,         303,         312,          52,
         ... ]

A poolSet consists of an object with a single member named poolSet. This member must be an array containing at least one object. Each object in this array defines the literal pool for one assembly unit. If the source contains multiple assembly units, the array should have a corresponding number of objects.

Each object within the array must have two members:

An integer value named poolLoc. This is the absolute memory address of the start of the pool.
An array of integers named poolData. The integers represent the values of words in the pool. If the words have non-zero signs, the integers must be 11 digits in length.
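
As an aid to preparing such a file, the following Python sketch converts word values written in the listing style (for example, 2 1654 49 6257) into a poolSet structure and checks it against the rules above. It is a hypothetical helper, not part of the GEN-Assembler, and the function names are illustrative.

```python
import json


def word_to_int(listing_word):
    """Convert a listing word such as '2 1654 49 6257' to an integer."""
    return int(listing_word.replace(" ", ""))


def build_pool_set(pool_loc, listing_words):
    """Build a poolSet with one assembly unit from listing-style words."""
    return {"poolSet": [{"poolLoc": pool_loc,
                         "poolData": [word_to_int(w) for w in listing_words]}]}


def validate_pool_set(document):
    """Check the structural rules described for a poolSet."""
    units = document["poolSet"]
    if not isinstance(units, list) or not units:
        raise ValueError("poolSet must be a non-empty array")
    for unit in units:
        if not isinstance(unit["poolLoc"], int):
            raise ValueError("poolLoc must be an integer")
        for value in unit["poolData"]:
            # A word is a sign digit plus ten digits, so at most 11 digits
            if not isinstance(value, int) or not 0 <= value < 10**11:
                raise ValueError(f"invalid pool word: {value!r}")
    return True


pool_set = build_pool_set(2265, ["0 0000 00 0000", "0 0000 00 0010",
                                 "9 9999 99 9999", "2 1654 49 6257"])
validate_pool_set(pool_set)
print(json.dumps(pool_set))
```

With this representation, the address of the word at index i in poolData is poolLoc + i.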
