Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

defaultX64Disassembler should run mkX64Disassembler at compile time #14

Open
langston-barrett opened this issue Feb 28, 2020 · 2 comments

Comments

@langston-barrett
Copy link
Contributor

Constructing the NextOpcodeTable at runtime is rather expensive according to my recent profiling. This seems like something that could run at compile-time.

refurbish-hm.pdf

@RyanGlScott
Copy link
Contributor

I'm interested in looking at this, as mkX64Disassembler is proving to be a space bottleneck in large SAW proofs. I'm not sure if running mkX64Disassembler at compile time is necessarily the way to go, however. I experimented with this on the rgs/compile-time-disassembly branch, but that will cause GHC to eat up all of your memory before it can ever finish compiling Flexdis86.DefaultParser.

An alternative approach that @travitch proposed is to build the disassembler table incrementally, one instruction at a time. Similar tricks have been used in pate, but for initializing memory. It looks like the key function in flexdis86 which is responsible for table creation is mkOpcodeTable:

-- We calculate all allowed prefixes for the instruction in the first
-- argument. This simplifies parsing at the cost of extra space.
mkOpcodeTable :: [Def] -> ParserGen OpcodeTable
mkOpcodeTable defs = go [] (concatMap allPrefixedOpcodes defs)
where -- Recursive function that generates opcode table by parsing
-- opcodes in first element of list.
go :: -- Opcode bytes parsed so far.
[Word8]
-- Potential opcode definitions with the remaining opcode
-- bytes each potential definition expects.
-> [([Word8], (Prefixes, Def))]
-> ParserGen OpcodeTable
go seen l
-- If we have parsed all the opcodes expected by the remaining
-- definitions.
| all opcodeDone l = do
case l of
_ | all (expectsModRM.snd.snd) l -> do
tbl <- checkRequiredReg (snd <$> l)
case tbl of
RegTable v -> pure $! ReadModRMTable v
RegUnchecked m -> pure $! ReadModRMUnchecked m
[([],(pfx, d))] -> assert (not (expectsModRM d)) $
return $! SkipModRM pfx d
_ -> error $ "mkOpcodeTable: ambiguous operators " ++ show l
-- If we still have opcodes to parse, check that all definitions
-- expect at least one more opcode, and generate table for next
-- opcode match.
| otherwise = assert (all (not.opcodeDone) l) $ do
let v = partitionBy l
g i = go (fromIntegral i:seen) (v V.! i)
tbl <- V.generateM 256 g
pure $! OpcodeTable tbl
-- Return whether opcode parsing is done.
opcodeDone :: ([Word8], a) -> Bool
opcodeDone (remaining,_) = null remaining

It's not entirely obvious at a first glance what the best approach is for making this incremental. For starters, the OpcodeTable data type isn't just a flat Vector, so it's unclear how to map opcodes to instructions cleanly:

data OpcodeTable
= OpcodeTable !NextOpcodeTable
| SkipModRM !Prefixes !Def
| ReadModRMTable !(V.Vector ModTable)
| ReadModRMUnchecked !ModTable
deriving (Generic)
instance DS.NFData OpcodeTable
-- | A NextOpcodeTable describes a table of parsers to read based on the bytes.
type NextOpcodeTable = V.Vector OpcodeTable

@travitch, do you have any thoughts on a possible design here?

@travitch
Copy link
Contributor

Moving the discussion of parse table sizes to #40 because solving that is orthogonal to whether or not we compute the tables at compile time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants