This project is a manual conversion of TeX into Rust.
The original code was written in WEB, where Pascal is the base language.
However, features very specific to Pascal were deliberately not much used, so adaptation to other languages is feasible without changing too much of the base code.
The following describes how a few things were adapted.
The section numbers from TeX: The Program are kept in the source code, so the original code can still be referred to for comparison.
For example, the round_decimals function is defined in section 102:
impl Global {
    // Section 102
    pub(crate) fn round_decimals(&self, mut k: usize) -> Scaled {
        let mut a = 0;
        while k > 0 {
            k -= 1;
            a = (a + (self.dig[k] as Integer)*TWO) / 10;
        }
        (a + 1) / 2
    }
}
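To see what this function computes, here is a standalone sketch that takes the digits as a slice instead of reading them from `Global`. The type aliases and the value `TWO = 2^17` are assumptions taken from the conventions of TeX: The Program (a `Scaled` value counts units of 2^-16); they are not copied from this repository.

```rust
// Assumed aliases and constant, for illustration only.
type Integer = i32;
type Scaled = i32;
const TWO: Integer = 1 << 17; // 2^17, i.e. twice the scaled unit 2^16

/// Converts the decimal fraction 0.d0 d1 d2 ... to the nearest `Scaled` value.
fn round_decimals(dig: &[u8]) -> Scaled {
    let mut a: Integer = 0;
    // Process the digits from the least significant one, as the loop on `k` does.
    for &d in dig.iter().rev() {
        a = (a + Integer::from(d) * TWO) / 10;
    }
    (a + 1) / 2
}

fn main() {
    println!("{}", round_decimals(&[5]));    // 0.5  -> 32768
    println!("{}", round_decimals(&[2, 5])); // 0.25 -> 16384
}
```

The intermediate value is kept at twice the final precision so that the last step, `(a + 1) / 2`, rounds to the nearest unit.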
When a section is part of another, it is enclosed with comments.
For example, section 854 is part of section 851 (itself part of section 829 that defines the try_break procedure):
let node_r_stays_active = if b > INF_BAD || pi == EJECT_PENALTY {
    // Section 854
    if self.final_pass
        && self.minimum_demerits == AWFUL_BAD
        && link(r) == LAST_ACTIVE
        && prev_r == ACTIVE
    {
        artificial_demerits = true;
    }
    else if b > self.threshold {
        break 'block; // Goto deactivate
    }
    false
    // End section 854
}
else {
    prev_r = r;
    if b > self.threshold {
        continue 'sec829; // Goto continue
    }
    true
};
Mutable global variables in Rust require the unsafe keyword, and alternatives would require too much verbosity, so global variables are (almost) all defined in a struct named Global (declared in global.rs).
As a consequence, most of the functions are defined in impl Global, and the keyword self appears a lot.
There are a few exceptions such as the memory array MEM declared as static mut in memory.rs.
Two macros, mem! and mem_mut!, are used to access members with unsafe.
Other tables are declared as static mut: EQTB and XEQ_LEVEL (the tables of equivalents), HASH (the hash table), and POOL (the string pool).
Backward gotos can be handled using loop and continue, which is the case most of the time.
Forward gotos are sometimes handled with break from labeled blocks.
It works like a break from a loop, but with a block of code delimited by braces.
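Both patterns can be sketched in a small standalone function. The function `classify` and its labels are hypothetical and only illustrate the control flow; they do not come from this code base.

```rust
/// Labels each non-negative value, skipping negative ones.
fn classify(values: &[i32]) -> Vec<&'static str> {
    let mut out = Vec::new();
    let mut i = 0;
    'next: loop {
        if i >= values.len() {
            break;
        }
        let v = values[i];
        i += 1;
        if v < 0 {
            continue 'next; // backward goto: jump back to the top of the loop
        }
        // A labeled block: `break 'done` leaves the block early, optionally
        // with a value, like a forward goto to the end of the block.
        let label = 'done: {
            if v == 0 {
                break 'done "zero";
            }
            if v % 2 == 0 {
                break 'done "even";
            }
            "odd"
        };
        out.push(label);
    }
    out
}

fn main() {
    println!("{:?}", classify(&[3, -1, 0, 4]));
}
```

Labeled blocks with `break` require Rust 1.65 or later.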
Another form of goto is managed with an enum, in particular for some parts that have many gotos (see the main_control procedure).
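A minimal sketch of the enum approach, under the assumption that each label becomes a variant and a loop dispatches on the current one. The variant names and the toy task (counting spaces and other characters) are illustrative only; the real main_control is far more involved.

```rust
// Each variant stands for a label that the Pascal code would jump to.
enum Goto {
    BigSwitch,
    MainLoop,
    AppendSpace,
    Done,
}

/// Returns (number of non-space characters, number of spaces).
fn run(input: &str) -> (usize, usize) {
    let mut chars = input.chars();
    let (mut others, mut spaces) = (0, 0);
    let mut state = Goto::BigSwitch;
    loop {
        // Every arm ends by naming the next "label" to jump to.
        state = match state {
            Goto::BigSwitch => match chars.next() {
                None => Goto::Done,
                Some(' ') => Goto::AppendSpace,
                Some(_) => Goto::MainLoop,
            },
            Goto::MainLoop => {
                others += 1;
                Goto::BigSwitch
            }
            Goto::AppendSpace => {
                spaces += 1;
                Goto::BigSwitch
            }
            Goto::Done => return (others, spaces),
        };
    }
}

fn main() {
    println!("{:?}", run("a b c"));
}
```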
The original TeX reports Missing $ inserted when something that should be in math mode has been read outside of math mode (or vice versa).
For this specific example, it means a $ token has been added, and the user can decide to keep it, insert their own choice of tokens, delete tokens, or ask for help (which prints more details about the error).
In this implementation, when there is an error, the program prints the error and the help message, then it stops.
So the error messages were rewritten.
For example, the original Missing $ inserted message is:
! Missing $ inserted.
<inserted text>
$
<to be read again>
^
l.1 Hello x^
2$.
? h
I've inserted a begin-math/end-math symbol since I think
you left one out. Proceed, with fingers crossed.
?
The output shows the inserted text, then the next token to be read again (after the insertion), then the context line where we can see precisely where the problem was detected.
The help message is given only if the user types h.
In this Rust implementation, it becomes:
! Missing $.
l.1 Hello x^
2$.
Either you forgot opening or closing math mode with $,
or a math character/control sequence is used outside
of math mode (or vice versa).
The user is invited to fix it, and run the program again.
All the errors are listed as an enum named TeXError in error.rs.
They are handled in the error procedure, where all the help messages are written.
Any function where an error can occur returns TeXResult<T> (which is defined as Result<T, TeXError>): either Ok (with the return value if there is one), or Err with a TeXError (the error goes up to the main function where error is called).
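The propagation scheme can be sketched as follows. Only the names TeXError, TeXResult and error come from the text above; the single variant, the helper functions and the message layout are hypothetical simplifications.

```rust
#[derive(Debug, PartialEq)]
enum TeXError {
    MissingDollar, // hypothetical variant name
}

type TeXResult<T> = Result<T, TeXError>;

// A low-level routine that may fail.
fn check_math_mode(in_math_mode: bool) -> TeXResult<()> {
    if in_math_mode {
        Ok(())
    } else {
        Err(TeXError::MissingDollar)
    }
}

// An intermediate routine: `?` forwards the error to its caller unchanged.
fn process_superscript(in_math_mode: bool) -> TeXResult<u32> {
    check_math_mode(in_math_mode)?;
    Ok(1)
}

// The single place where errors are turned into messages.
fn error(err: &TeXError) -> &'static str {
    match err {
        TeXError::MissingDollar => "! Missing $.",
    }
}

fn main() {
    match process_superscript(false) {
        Ok(n) => println!("processed {n} item(s)"),
        Err(e) => println!("{}", error(&e)),
    }
}
```

With this design no function needs to know how an error is reported; it only needs to return early.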
This implementation does not treat the command line arguments as the input buffer.
The user must type at least the input filename (with or without extension), and two arguments are optional:
- -ini: the INITEX mode, to dump a format;
- -fmt: followed by the filename of the input format (such as plain.fmt; again, the extension is optional).
So there is no prompt **, but the prompt * is still available.
For example, plain.tex does not have the \dump command at the end, so it has to be typed at the prompt that appears when running tex-rust -ini plain.
Except for this, the usage stays the same.
An external pool file is not used to store the strings of the source code.
Instead, almost all of them are static strings, except a few that are added to the string pool with put_string:
// Add a string in the pool
pub(crate) fn put_string(s: &[u8]) -> TeXResult<StrNum> {
    unsafe {
        str_room(s.len())?;
        POOL.pool[POOL.pool_ptr..(POOL.pool_ptr + s.len())].copy_from_slice(s);
        POOL.pool_ptr += s.len();
        make_string()
    }
}
The behavior of the string pool has not been changed.
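For readers unfamiliar with that behavior, here is a safe, simplified sketch of a string pool: all bytes live in one buffer, and string number s occupies the bytes between str_start[s] and str_start[s + 1]. The struct, its fields and the error type are assumptions made for illustration; the actual POOL is a static mut with fixed-size arrays.

```rust
type StrNum = usize;

struct StringPool {
    pool: Vec<u8>,         // all the bytes, back to back
    str_start: Vec<usize>, // str_start[s] is where string s begins
    capacity: usize,       // stands in for the fixed pool size
}

impl StringPool {
    fn new(capacity: usize) -> Self {
        StringPool { pool: Vec::new(), str_start: vec![0], capacity }
    }

    /// Appends `s` and returns its number, or an error if the pool is full.
    fn put_string(&mut self, s: &[u8]) -> Result<StrNum, String> {
        if self.pool.len() + s.len() > self.capacity {
            return Err("pool size exceeded".to_string());
        }
        self.pool.extend_from_slice(s);
        self.str_start.push(self.pool.len());
        Ok(self.str_start.len() - 2)
    }

    fn get(&self, s: StrNum) -> &[u8] {
        &self.pool[self.str_start[s]..self.str_start[s + 1]]
    }
}

fn main() {
    let mut pool = StringPool::new(8);
    let a = pool.put_string(b"plain").unwrap();
    let b = pool.put_string(b"tex").unwrap();
    println!("{a}: {:?}, {b}: {:?}", pool.get(a), pool.get(b));
}
```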
A memory word is expanded to 64 bits, defined as a union in memory.rs:
#[derive(Clone, Copy)]
pub(crate) union MemoryWord {
    pub(crate) int: Integer,
    pub(crate) sc: Scaled,
    pub(crate) gr: GlueRatio,
    pub(crate) hh: [HalfWord; 2],
    pub(crate) qqqq: [QuarterWord; 4],
    pub(crate) word: u64
}
- Integer, Scaled and HalfWord are all i32, so they fit nicely into each other (it avoids many casts);
- GlueRatio is an f64;
- QuarterWord is a u16 with the full range;
- HalfWord is an integer between -0x3fff_ffff and 0x3fff_ffff (same as LuaTeX).
Since MemoryWord is defined as a union, methods to access the value depending on the type it represents have been defined (a direct access needs unsafe): .int(), .sc(), .hh_b0(), .hh_b1(), etc., and their mutable versions .int_mut(), .sc_mut(), etc.
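A reduced sketch of how such accessors can wrap the unsafe field access. It keeps only three fields, and the method bodies are assumptions about the shape of the real ones; `#[repr(C)]` is added here only so that the sketch can rely on all fields starting at offset 0.

```rust
type Integer = i32;
type Scaled = i32;

#[derive(Clone, Copy)]
#[repr(C)] // added for this sketch: guarantees every field starts at offset 0
union MemoryWord {
    int: Integer,
    sc: Scaled,
    word: u64,
}

impl MemoryWord {
    // Reading a union field is unsafe; the accessor confines it to one place.
    fn int(&self) -> Integer {
        unsafe { self.int }
    }

    fn sc(&self) -> Scaled {
        unsafe { self.sc }
    }

    // The mutable version hands out a reference so callers can assign to it.
    fn int_mut(&mut self) -> &mut Integer {
        unsafe { &mut self.int }
    }
}

fn main() {
    let mut w = MemoryWord { word: 0 }; // all 64 bits initialized
    *w.int_mut() = 42;
    println!("int = {}, sc = {}", w.int(), w.sc());
}
```

Because `int` and `sc` share the same storage and the same underlying type, writing through one and reading through the other yields the same value.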
Some parts of the original code are optional and delimited by keyword pairs:
- Code between init and tini: for INITEX;
- Code between stat and tats: for statistics;
- Code between debug and gubed: for debugging.
The first one has been integrated as an argument to the command line.
The other two are features, both disabled by default.
You can enable them either on the command line with cargo build --features debug,stat or by editing the Cargo.toml file.
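As an illustration of how a Cargo feature can gate such code, here is a minimal sketch using conditional compilation. The function `print_statistics` and its message are hypothetical; only the feature name stat comes from the text above.

```rust
// Compiled only when the crate is built with `--features stat`.
#[cfg(feature = "stat")]
fn print_statistics(words_used: usize) -> Option<String> {
    Some(format!("{words_used} words of memory used"))
}

// Fallback when the feature is disabled: the statistics code disappears.
#[cfg(not(feature = "stat"))]
fn print_statistics(_words_used: usize) -> Option<String> {
    None
}

fn main() {
    match print_statistics(1234) {
        Some(line) => println!("{line}"),
        None => println!("statistics are disabled in this build"),
    }
}
```

For this to work under Cargo, the feature must also be declared in the `[features]` table of Cargo.toml.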
Section 1331 of
To my understanding, such a trick is not used anymore for TeXlive binaries.
Instead, the name of the program you run is used to determine the format, then the file is found and loaded.
For example, optex is a symlink to the luatex binary, and the file optex.fmt (which is somewhere in the installation directory) is loaded.
Instead of trying to reproduce this, this Rust implementation allows the user to embed the format file directly in the binary.
More details are given below.
As with any Rust project, compilation is easy (with or without --release):
cargo build --release
The binary in target/release/ (or target/debug/) is named tex-rust.
Two features are available and can be added with -F or --features: debug and stat.
A Makefile is provided to compile a version of the program with the plain format preloaded.
See below for an explanation.
The file build.rs is used to customize the compilation.
By default, it will look for a file plain.fmt in the main directory.
If found, it will be embedded in the binary, and the program can be used without having to give a format as input.
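One way a build script can implement this kind of detection is sketched below. This is not the project's actual build.rs: the function `build_directives` and the cfg name `preloaded_format` are hypothetical. The two `cargo:` directives are standard Cargo build-script instructions.

```rust
use std::path::Path;

/// Returns the lines a build script would print for Cargo to interpret.
fn build_directives(format_file: &Path) -> Vec<String> {
    // Ask Cargo to rerun the script whenever the format file changes.
    let mut lines = vec![format!("cargo:rerun-if-changed={}", format_file.display())];
    if format_file.is_file() {
        // Hypothetical cfg flag: the crate could test it with
        // #[cfg(preloaded_format)] and embed the file with include_bytes!.
        lines.push("cargo:rustc-cfg=preloaded_format".to_string());
    }
    lines
}

fn main() {
    for line in build_directives(Path::new("plain.fmt")) {
        println!("{line}");
    }
}
```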
However, to create a format you need the compiled program, so a virgin version must be produced first.
The principle is as follows:
- Compile the program a first time (in debug to be faster);
- Run the program on the source file to dump the format;
- Compile the program again (make sure that the format file in build.rs matches yours).
For example, to produce a plain format (if the file plain.tex is in the current folder or in TeXinputs/):
cargo run -- -ini plain
The file plain.fmt has now been generated.
Compile again in release:
cargo build --release
The binary in the folder target/release is ready to be used with the plain format embedded.
There are two main usages:
- Dumping a format with the -ini option;
- Generating a DVI file.
The command line is:
tex-rust [-ini] TEXNAME[.tex] [-fmt=FMTNAME[.fmt]]
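The following sketch shows one way to parse this command line. The struct `Options`, the function `parse_args` and the error messages are hypothetical; the program's real argument handling may differ.

```rust
#[derive(Debug, Default, PartialEq)]
struct Options {
    ini: bool,
    tex_name: Option<String>,
    fmt_name: Option<String>,
}

fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Options, String> {
    let mut opts = Options::default();
    for arg in args {
        if arg == "-ini" {
            opts.ini = true;
        } else if let Some(name) = arg.strip_prefix("-fmt=") {
            opts.fmt_name = Some(name.to_string());
        } else if arg.starts_with('-') {
            return Err(format!("unknown option: {arg}"));
        } else if opts.tex_name.is_none() {
            opts.tex_name = Some(arg);
        } else {
            return Err(format!("unexpected argument: {arg}"));
        }
    }
    // The input filename is the only mandatory argument.
    if opts.tex_name.is_none() {
        return Err("missing input filename".to_string());
    }
    Ok(opts)
}

fn main() {
    // Skip the program name, then parse what remains.
    println!("{:?}", parse_args(std::env::args().skip(1)));
}
```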
To dump a format, use the option -ini and specify your input file (extension .tex is optional):
./tex-rust -ini TEXNAME
Your input file must include the command \dump, otherwise a prompt will appear where you can add lines, until you type \dump.
Example for the plain format:
./tex-rust -ini plain
The font metric files used by the format must be available while dumping. They must be in the folder TeXfonts/. The fonts needed for the plain format (mainly Computer Modern in several sizes and styles) are already present. Any missing font will produce an error. Once a format is produced, all necessary data for TeX are included, so the font metric files are no longer necessary.
If the format needs auxiliary input files (for instance, the plain format needs the file hyphen.tex), make sure those are available: put them in the folder TeXinputs/ and the program will know to look there.
To create a DVI document, you must specify your input file (extension .tex optional).
The format file can be specified too, but it is not mandatory.
If no format is supplied, there are a few possibilities:
- a format is preloaded in the binary and will be used (see Embedding a format file);
- no format is preloaded, then the plain.fmt file will be searched for in the current folder or in TeXformats/: an error will be returned if not found.
If a format is supplied at the command line, it overrides the preloaded format (if there is one).
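These precedence rules can be summarized in a small sketch. The enum `FormatSource` and the function `choose_format` are hypothetical names used only to restate the rules as code.

```rust
#[derive(Debug, PartialEq)]
enum FormatSource {
    CommandLine(String), // a format given with -fmt
    Preloaded,           // the format embedded in the binary
    DefaultFile,         // plain.fmt, looked up on disk
}

fn choose_format(cli_fmt: Option<&str>, preloaded: bool) -> FormatSource {
    match cli_fmt {
        // A format on the command line always wins.
        Some(name) => FormatSource::CommandLine(name.to_string()),
        None if preloaded => FormatSource::Preloaded,
        // Last resort: the caller reports an error if the file is missing.
        None => FormatSource::DefaultFile,
    }
}

fn main() {
    println!("{:?}", choose_format(Some("latex"), true));
    println!("{:?}", choose_format(None, true));
    println!("{:?}", choose_format(None, false));
}
```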
The command line is:
./tex-rust TEXNAME [-fmt=FMTNAME]
For example, suppose your input is paper.tex and the binary has plain preloaded (or plain.fmt is available):
./tex-rust paper
If your input uses a font that is not included in the format (for example with \font\libertine=LinLibertineT-tosf-t1), make sure that it is present in TeXfonts/.
A DVI file can be converted to PDF with the program dvipdf, available with TeXlive.
Since the first error stops the program, the TRIP test cannot be applied.
However, the resulting DVI files from the examples have been compared to the ones obtained with the C version that passes the TRIP test, with the same results (using dvitype for comparison), except for the date, which might differ depending on the time you run the tests.
It has also been tested on tex.tex (i.e., TeX: The Program), with identical results too. To test it yourself:
- Get tex.web, webmac.tex and logo10.tfm from CTAN;
- Use weave to get tex.tex from tex.web;
- Put webmac.tex in TeXinputs/ and logo10.tfm in TeXfonts/;
- Run ./tex-rust tex.
This work is released under the MIT license.
The original
Font metric files, plain.tex and hyphen.tex were copied from CTAN and have not been modified: Computer Modern, manfnt, knuth-lib.
These files are also under the Knuth license.