OLLVM_reliability is a project that aims to support the developers of OLLVM, an open-source obfuscator.
Whenever a modification to OLLVM is pushed, this project is triggered automatically and assesses the latest OLLVM version. At the end of the evaluation, developers receive a score that serves as a reliability indicator and is tracked over time across OLLVM versions.
A given OLLVM version is evaluated according to the following three-phase protocol:
- Compiling: a sample of various programs is compiled with and without obfuscation, using Clang, OLLVM's compiler.
- Analysis: the compiled programs are then analysed, and the resulting output is filtered before a score is computed.
- Scoring: this last phase provides the final reliability indicator for the given OLLVM version.
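As an illustration of the compiling phase, here is a minimal Python sketch that builds the two clang command lines for one sample, one plain and one obfuscated. The `-mllvm -fla`, `-mllvm -sub` and `-mllvm -bcf` options are OLLVM's pass switches; the helper name `build_commands`, the output suffixes, and the choice to enable all three passes are assumptions made for this example and may differ from what allthejob.bash actually does.

```python
from pathlib import Path

# OLLVM pass switches: control-flow flattening, instruction substitution,
# bogus control flow. Enabling all three at once is an assumption.
OBFUSCATION_FLAGS = ["-mllvm", "-fla", "-mllvm", "-sub", "-mllvm", "-bcf"]


def build_commands(source, out_dir, clang="clang"):
    """Return (plain_cmd, obfuscated_cmd) for one C source file.

    Hypothetical helper: the '_plain' and '_obf' suffixes are illustrative.
    """
    source = Path(source)
    out_dir = Path(out_dir)
    plain_out = out_dir / (source.stem + "_plain")
    obf_out = out_dir / (source.stem + "_obf")
    plain_cmd = [clang, str(source), "-o", str(plain_out)]
    obf_cmd = [clang, *OBFUSCATION_FLAGS, str(source), "-o", str(obf_out)]
    return plain_cmd, obf_cmd
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`. C++ samples would need `clang++` and their own include and library options.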

allthejob.bash: runs the compiling phase and calls the Python scripts needed for the analysis and scoring phases. It invokes the following scripts:
- radare2Analysis_Filtering_and_Scoring.py
- scoring.py
- finalScoresCollector.py
- averageScores.py
radare2Analysis_Filtering_and_Scoring.py: takes a compiled program and analyses it with radare2, which provides a large amount of data about each function called in the program. The script filters this data according to a scoring formula, which may evolve and can easily be changed. The only line to modify in this script is the following one:
SCORING_FORMULA = (line['realsz']/line['nbbs']) * J_count
The filtered data set is then sent to the scoring.py script.
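To make the formula concrete, the sketch below applies it to a single function record of the kind radare2 returns from its `aflj` command. It assumes that `J_count` is the number of jumps counted in the function, which the text above does not spell out; the zero-block guard and both helper names are also additions for this example, not the script's actual code.

```python
def score_function(line, j_count):
    """Apply (realsz / nbbs) * J_count to one radare2 function record."""
    nbbs = line["nbbs"]
    if nbbs == 0:
        # Guard added for this sketch: avoid dividing by zero.
        return 0.0
    return (line["realsz"] / nbbs) * j_count


def analyse_binary(path):
    """List the function records of a binary via r2pipe (needs radare2)."""
    import r2pipe  # imported lazily so score_function works without it

    r2 = r2pipe.open(path)
    try:
        r2.cmd("aaa")                  # run radare2's full analysis
        return r2.cmdj("aflj") or []   # function list as parsed JSON
    finally:
        r2.quit()
```

For a function of 120 bytes, 4 basic blocks and 3 jumps, `score_function({"realsz": 120, "nbbs": 4}, 3)` gives 90.0.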
scoring.py: calculates an average score for each program, obfuscated or not, and returns two overall averages: one for the obfuscated programs and one for the non-obfuscated ones. The output file contains a detailed overview of the scores per program, along with the two average scores.
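The averaging described for scoring.py could look like the following sketch. The input layout (a dictionary keyed by program name and an obfuscation flag) is hypothetical, and computing each overall average as the mean of the per-program averages is an assumption; the real script may weight things differently.

```python
from statistics import mean


def summarise(scores):
    """Average function scores per program, then per group.

    `scores` maps (program_name, is_obfuscated) to a list of function
    scores. Returns (per_program, obfuscated_avg, plain_avg).
    """
    per_program = {key: mean(values) for key, values in scores.items() if values}
    obf = [avg for (_, is_obf), avg in per_program.items() if is_obf]
    plain = [avg for (_, is_obf), avg in per_program.items() if not is_obf]
    # mean() raises StatisticsError if one of the two groups is empty.
    return per_program, mean(obf), mean(plain)
```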
finalScoresCollector.py: gathers all the obfuscated and non-obfuscated average scores obtained after running the assessment protocol i times. In the same way as scoring.py, this script calculates two averages (obfuscated and non-obfuscated) from the i scores.
/benchmark_samples: contains the source code of the programs that will be compiled with the assessed version of OLLVM.
The other programs in the /OLLVM_reliability folder keep track of the work done before reaching the current version:
radare2OutputFormatting.py: converts the output data provided by radare2 from JSON to CSV. In addition, the script digs through the data and extracts other pieces of information judged relevant (number of jumps, callrefs, etc.). Viewing the data as a spreadsheet in Excel allowed us to decide which features (function size, cost, number of basic blocks, etc.) the scoring formula would be based on.
radare2Analysis_and_OutputFiltering-v1/2.py: the first two implementations of the radare2 analysis and output filtering, which have not been retained.
performanceAnalysis.py: a script ready to host an implementation of a performance indicator that will help the OLLVM developers understand the effects of obfuscation on the performance of a program.
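The JSON-to-CSV conversion done by radare2OutputFormatting.py can be sketched with the standard library alone. The column selection in `FIELDS` and the function name are assumptions for this example; the actual script extracts additional information such as jump counts and callrefs.

```python
import csv
import io
import json

# Columns kept in the CSV; an assumed selection for this sketch.
FIELDS = ("name", "realsz", "nbbs", "cost")


def functions_to_csv(json_text, fields=FIELDS):
    """Convert a JSON list of function records to CSV text."""
    records = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(fields),
        extrasaction="ignore",   # silently drop keys not listed in fields
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        # Keys missing from a record are written as empty cells.
        writer.writerow(record)
    return buffer.getvalue()
```

The resulting text can be written to a `.csv` file and opened in a spreadsheet to compare candidate features side by side.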
A few dependencies are needed in order to compile the samples:
sudo apt -y install python3 python3-pip libre2-dev libomp-dev libgmp-dev libgmp3-dev radare2
The r2pipe module is also needed for the scripts:
pip3 install r2pipe
Finally, Boost needs to be installed in the /opt/boost directory:
cd /opt && \
wget http://downloads.sourceforge.net/project/boost/boost/1.66.0/boost_1_66_0.tar.bz2 &&\
tar xf boost_1_66_0.tar.bz2 && \
rm -rf boost_1_66_0.tar.bz2 && \
mv boost_1_66_0 boost && \
cd /opt/boost && \
./bootstrap.sh && \
./b2 --with-system --with-thread --with-date_time --with-regex --with-serialization stage
The main prerequisite for using this project is having a version of OLLVM installed. You also have to make sure that the clang used during the compiling phase is not the system default but the one you previously installed. You can run the following command to find out which clang is currently in use:
which clang
After cloning the OLLVM_reliability project, the only command you need to run is:
bash allthejob.bash
The expected output should look like:
The following aspects of the project could still be improved:
- The number and variety of programs in /benchmark_samples: the more programs the compiler is tested on, the better. In addition, adding Objective-C programs may be a good way to make the sample more complete.
- The scoring formula: many scoring formulas have been tested, but none of them really stood out. As the scoring formula is what provides the final reliability indicator to the OLLVM developers, this part must be considered seriously.
- The influence of obfuscation on performance: to take this assessment protocol further, we may consider the effect of obfuscation on the performance of a program.
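For the last point, one possible starting place is to compare wall-clock run times of the plain and obfuscated builds of the same sample. The sketch below is only a suggestion under that assumption: both helper names are hypothetical, and using the median of a few runs is a design choice made here to dampen outliers, not something the project prescribes.

```python
import subprocess
import time
from statistics import median


def time_command(cmd, runs=5):
    """Return the wall-clock durations (in seconds) of `runs` executions."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def slowdown_ratio(plain_times, obf_times):
    """Median obfuscated time divided by median plain time."""
    return median(obf_times) / median(plain_times)
```

With two hypothetical binaries, `slowdown_ratio(time_command(["./hello_plain"]), time_command(["./hello_obf"]))` returns a value above 1 when the obfuscated build is slower.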
Anaïs NALEM, 4th-year student at INSA Centre Val de Loire, contributed from April to July 2019.