defaultX64Disassembler should run mkX64Disassembler at compile time #14

langston-barrett · 2020-02-28T21:59:45Z

Constructing the NextOpcodeTable at runtime is rather expensive according to my recent profiling. This seems like something that could run at compile-time.



RyanGlScott · 2022-05-19T14:24:10Z

I'm interested in looking at this, as mkX64Disassembler is proving to be a space bottleneck in large SAW proofs. I'm not sure if running mkX64Disassembler at compile time is necessarily the way to go, however. I experimented with this on the rgs/compile-time-disassembly branch, but that will cause GHC to eat up all of your memory before it can ever finish compiling Flexdis86.DefaultParser.

An alternative approach that @travitch proposed is to build the disassembler table incrementally, one instruction at a time. Similar tricks have been used in pate, but for initializing memory. It looks like the key function in flexdis86 which is responsible for table creation is mkOpcodeTable:

flexdis86/src/Flexdis86/Disassembler.hs

Lines 415 to 450 in c19b55e

    
    -- We calculate all allowed prefixes for the instruction in the first
    -- argument.  This simplifies parsing at the cost of extra space.
    mkOpcodeTable ::  [Def] -> ParserGen OpcodeTable
    mkOpcodeTable defs = go [] (concatMap allPrefixedOpcodes defs)
      where -- Recursive function that generates opcode table by parsing
            -- opcodes in first element of list.
            go :: -- Opcode bytes parsed so far.
                  [Word8]
                  -- Potential opcode definitions with the remaining opcode
                  -- bytes each potential definition expects.
               -> [([Word8], (Prefixes, Def))]
               -> ParserGen OpcodeTable
            go seen l
               -- If we have parsed all the opcodes expected by the remaining
               -- definitions.
              | all opcodeDone l = do
                  case l of
                    _ | all (expectsModRM.snd.snd) l -> do
                         tbl <- checkRequiredReg (snd <$> l)
                         case tbl of
                           RegTable v -> pure $! ReadModRMTable v
                           RegUnchecked m -> pure $! ReadModRMUnchecked m
                    [([],(pfx, d))] -> assert (not (expectsModRM d)) $
                        return $! SkipModRM pfx d
                    _ -> error $ "mkOpcodeTable: ambiguous operators " ++ show l
                -- If we still have opcodes to parse, check that all definitions
                -- expect at least one more opcode, and generate table for next
                -- opcode match.
              | otherwise = assert (all (not.opcodeDone) l) $ do
                let v = partitionBy l
                    g i = go (fromIntegral i:seen) (v V.! i)
                tbl <- V.generateM 256 g
                pure $! OpcodeTable tbl
            -- Return whether opcode parsing is done.
            opcodeDone :: ([Word8], a) -> Bool
            opcodeDone (remaining,_) = null remaining
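Purely as an illustration of what "one instruction at a time" could mean, here is a minimal sketch on a simplified byte trie. It is not flexdis86's `OpcodeTable`: the `Trie` type and the `insertDef`, `lookupDefs`, and `buildTable` names are hypothetical, and the sketch ignores prefixes, ModRM handling, and the `ParserGen` monad entirely.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M
import Data.Word (Word8)

-- | Hypothetical stand-in for the nested opcode table: each node holds the
-- definitions whose opcode bytes end here, plus children keyed by next byte.
data Trie a = Trie
  { here     :: [a]
  , children :: M.Map Word8 (Trie a)
  }

emptyTrie :: Trie a
emptyTrie = Trie [] M.empty

-- | Add one opcode byte sequence, touching only the nodes along its path.
insertDef :: [Word8] -> a -> Trie a -> Trie a
insertDef []     d t = t { here = d : here t }
insertDef (b:bs) d t =
  let child = M.findWithDefault emptyTrie b (children t)
  in  t { children = M.insert b (insertDef bs d child) (children t) }

-- | Follow the opcode bytes and return the definitions at the node reached.
lookupDefs :: [Word8] -> Trie a -> [a]
lookupDefs []     t = here t
lookupDefs (b:bs) t = maybe [] (lookupDefs bs) (M.lookup b (children t))

-- | Build the whole table by folding over definitions one at a time.
buildTable :: [([Word8], a)] -> Trie a
buildTable = foldl' (\t (bs, d) -> insertDef bs d t) emptyTrie
```

A sparse map per node avoids allocating 256 slots for every level, at the cost of slower dispatch than a dense vector. Whether that trade-off suits flexdis86 is an open question here.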

It's not entirely obvious at first glance what the best approach is for making this incremental. For starters, the OpcodeTable data type isn't just a flat Vector, so it's unclear how to map opcodes to instructions cleanly:

flexdis86/src/Flexdis86/Disassembler.hs

Lines 171 to 181 in c19b55e

    
    data OpcodeTable
       = OpcodeTable !NextOpcodeTable
       | SkipModRM !Prefixes !Def
       | ReadModRMTable !(V.Vector ModTable)
       | ReadModRMUnchecked !ModTable
      deriving (Generic)

    instance DS.NFData OpcodeTable

    -- | A NextOpcodeTable describes a table of parsers to read based on the bytes.
    type NextOpcodeTable = V.Vector OpcodeTable
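To make the nesting concrete, here is a rough sketch of how a decoder walks tables of this shape, one byte per level. It is a simplified model, not the real type: `Table`, `emptyNext`, `decode`, and `example` are hypothetical names, leaves carry only a mnemonic string instead of `Prefixes`, `Def`, or ModRM tables, and `Data.Array` stands in for `V.Vector`.

```haskell
import Data.Array (Array, listArray, (!), (//))
import Data.Word (Word8)

-- | Simplified, hypothetical model of the nested table shape.
data Table
  = Next (Array Word8 Table)  -- dispatch on the next opcode byte
  | Leaf String               -- opcode fully parsed
  | Invalid                   -- no definition matches this byte

-- | A dense 256-entry level with every slot initially invalid.
emptyNext :: Array Word8 Table
emptyNext = listArray (0, 255) (replicate 256 Invalid)

-- | Walk the nested tables one byte at a time, returning the mnemonic and
-- the unconsumed bytes, or Nothing if the input is invalid or truncated.
decode :: Table -> [Word8] -> Maybe (String, [Word8])
decode (Leaf m)   rest   = Just (m, rest)
decode Invalid    _      = Nothing
decode (Next _)   []     = Nothing
decode (Next tbl) (b:bs) = decode (tbl ! b) bs

-- | One-byte and two-byte opcodes live at different depths, which is why a
-- single flat vector indexed by one byte cannot hold them all.
example :: Table
example = Next (emptyNext //
  [ (0x90, Leaf "nop")
  , (0x0f, Next (emptyNext // [(0x05, Leaf "syscall")]))
  ])
```

In this model every `Next` level allocates all 256 slots even when only a few are populated, which gives a feel for how dense nested tables can become a space cost.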

@travitch, do you have any thoughts on a possible design here?

travitch · 2022-05-19T16:00:09Z

Moving the discussion of parse table sizes to #40, because solving that is orthogonal to whether or not we compute the tables at compile time.
