Matching Strategy

Joxean edited this page Dec 20, 2023 · 6 revisions

General

Unlike other binary diffing tools, Diaphora bases its matching strategy on running SQL queries. The process goes as follows:

  • A set of binaries is exported and calculations for their functions are made.
  • SQL queries for matching functions in the 2 databases are executed.
  • The SQL queries are executed in order, from the most reliable to the least reliable.
  • Then, the matches with the highest scoring ratios are selected.
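
The steps above can be sketched in a few lines of Python. This is only an illustration, not Diaphora's actual code: the table layout, the column names (bytes_hash, mnemonics, nodes), the two heuristics and their ratios are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical, simplified schema; NOT Diaphora's real one.
SCHEMA = "CREATE TABLE {}functions (name TEXT, bytes_hash TEXT, mnemonics TEXT, nodes INTEGER)"

# Heuristics ordered from most to least reliable (names and ratios are assumed).
HEURISTICS = [
    ("Same bytes hash", 1.0,
     "SELECT f.name, df.name FROM main.functions f "
     "JOIN diff.functions df ON f.bytes_hash = df.bytes_hash"),
    ("Same mnemonics and node count", 0.8,
     "SELECT f.name, df.name FROM main.functions f "
     "JOIN diff.functions df ON f.mnemonics = df.mnemonics AND f.nodes = df.nodes"),
]

def match_functions(conn):
    """Run every heuristic in order, then keep the best-scoring match per function."""
    candidates = []
    for description, ratio, sql in HEURISTICS:
        for primary, secondary in conn.execute(sql):
            candidates.append((ratio, primary, secondary, description))
    # Stable sort: higher ratios first, heuristic order preserved on ties.
    candidates.sort(key=lambda c: -c[0])
    matches, used_secondary = {}, set()
    for ratio, primary, secondary, description in candidates:
        if primary in matches or secondary in used_secondary:
            continue  # a better match was already selected
        matches[primary] = (secondary, ratio, description)
        used_secondary.add(secondary)
    return matches

def build_demo():
    """Create two tiny in-memory databases: 'main' and the attached 'diff'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS diff")
    conn.execute(SCHEMA.format("main."))
    conn.execute(SCHEMA.format("diff."))
    conn.executemany("INSERT INTO main.functions VALUES (?, ?, ?, ?)", [
        ("sub_401000", "aaa", "push,mov,ret", 1),
        ("sub_401100", "bbb", "push,call,ret", 3),
        ("sub_401200", "ccc", "xor,ret", 1),
    ])
    conn.executemany("INSERT INTO diff.functions VALUES (?, ?, ?, ?)", [
        ("sub_501000", "aaa", "push,mov,ret", 1),
        ("sub_501100", "zzz", "push,call,ret", 3),
        ("sub_501200", "yyy", "jmp", 2),
    ])
    return conn

if __name__ == "__main__":
    for name, match in match_functions(build_demo()).items():
        print(name, "->", match)
```

In this toy data set, the first function matches by hash, the second one only by the weaker heuristic, and the third one stays unmatched and would be left to the non-SQL heuristics.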

There are some heuristics that aren't based on SQL queries:

  • Callgraph matching, finding callers and callees of functions already matched.
  • Brute-forcing, finding good matches for functions that were not matched.
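
As an illustration of the callgraph idea, here is a minimal sketch in Python. It is a deliberate simplification and not Diaphora's implementation: it assumes the call graphs are plain dictionaries mapping a caller to the set of its callees, and it only pairs two functions when each is the single unmatched neighbour of an already matched pair. All function names in it are hypothetical.

```python
def invert(graph):
    """Turn a caller -> callees map into a callee -> callers map."""
    inverted = {}
    for caller, callees in graph.items():
        for callee in callees:
            inverted.setdefault(callee, set()).add(caller)
    return inverted

def neighbour_pairs(matches, edges_a, edges_b):
    """Pair the only unmatched neighbour of each already matched pair."""
    found = {}
    for func_a, func_b in matches.items():
        taken_a = set(matches) | set(found)
        taken_b = set(matches.values()) | set(found.values())
        free_a = edges_a.get(func_a, set()) - taken_a
        free_b = edges_b.get(func_b, set()) - taken_b
        if len(free_a) == 1 and len(free_b) == 1:
            found[free_a.pop()] = free_b.pop()
    return found

def callgraph_match(matches, calls_a, calls_b):
    """Propagate matches through callees and callers until nothing new is found."""
    matches = dict(matches)
    while True:
        new = neighbour_pairs(matches, calls_a, calls_b)  # callees
        new.update(neighbour_pairs({**matches, **new},
                                   invert(calls_a), invert(calls_b)))  # callers
        if not new:
            return matches
        matches.update(new)

if __name__ == "__main__":
    calls_a = {"main": {"parse", "log"}, "parse": {"read_byte"}}
    calls_b = {"main_": {"parse_v2", "log_"}, "parse_v2": {"read_u8"}}
    print(callgraph_match({"main": "main_", "log": "log_"}, calls_a, calls_b))
```

A real implementation would also score each candidate pair instead of trusting uniqueness alone; the sketch only shows how known matches can seed new ones.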

Calculations

Diaphora is slow at exporting, for several reasons:

  • Most calculations are made at export time.
  • It tries to use the decompiler.
  • It's written in Python.

In most cases, however, the slow export is a minor problem: we usually export once and diff many times. This is why most of the calculations are made at export time instead of each time we diff 2 databases.
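
The "export once, diff many times" idea can be sketched as follows. This is a toy example under stated assumptions, not Diaphora's exporter: the functions table, its columns and the choice of SHA-256 over raw function bytes are all hypothetical.

```python
import hashlib
import sqlite3

def export_functions(conn, functions):
    """Compute per-function features once and store them in the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS functions "
        "(name TEXT PRIMARY KEY, bytes_hash TEXT, size INTEGER)"
    )
    rows = [
        (name, hashlib.sha256(code).hexdigest(), len(code))
        for name, code in functions.items()
    ]
    conn.executemany("INSERT OR REPLACE INTO functions VALUES (?, ?, ?)", rows)
    conn.commit()

def load_hashes(conn):
    """At diff time, only read the stored values; nothing is recomputed."""
    return dict(conn.execute("SELECT name, bytes_hash FROM functions"))

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    export_functions(conn, {"f": b"\x90\xc3", "g": b"\x90\xc3", "h": b"\xc3"})
    print(load_hashes(conn))
```

The expensive work happens in export_functions; every later diff only pays for cheap reads and comparisons.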

(To be continued)