masterThesis #1

Open
wants to merge 107 commits into
base: master
Conversation

vmihalko
Owner

No description provided.

vmihalko and others added 30 commits April 17, 2023 11:29
void foo(int a) {
	int a; // remove this line
}
1. generate random test.c file [csmith]
2. compile test.c to binary [clang]
3. modify generated test.c file [fix-csmi.sh]
	- Here we replace the csmith.h header with
	  the necessary declarations to avoid header expansion
	  while compiling to LLVM IR in the next step
4. compile to LLVMIR test.ll [clang]
5. run llvm2c and generate decompiled.c [llvm2c]
6. modify decompiled.c file [decom-fix-csmi.sh]
	- Here we add csmith.h include and fix
	  types for functions from csmith.h
7. compile decompiled.c to binary [clang]
8. run compiled binaries and compare their outputs

If something goes wrong at any point, all generated files
are copied to tmp (for debugging).

If an exception is caught after running the compiled test.c,
we continue to the next step.
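Step 8 boils down to a byte-for-byte comparison of the two programs' outputs. A minimal sketch in C, assuming the outputs have already been captured into streams; `same_output` is a hypothetical helper and is not taken from the scripts in this PR:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper for step 8: returns true when both streams
 * contain exactly the same bytes, read from their current positions. */
bool same_output(FILE *original, FILE *decompiled) {
    assert(original != NULL && decompiled != NULL);
    int a, b;
    do {
        a = fgetc(original);
        b = fgetc(decompiled);
        if (a != b)
            return false; /* differing byte, or one stream ended early */
    } while (a != EOF);
    return true; /* both streams reached EOF together */
}
```

Any mismatch would then be treated like a failure in an earlier step, i.e. the generated files are kept for inspection.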
Parsed basic types are:
- int
- char
- short
- long

- float
- double
- long double

and correctly recognize whether a type is signed or unsigned.
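To illustrate the idea, here is a small sketch in C that classifies a type spelling into a base kind plus a signedness flag. It is not the llvm2c implementation: `BasicType`, `BaseKind` and `parse_basic_type` are hypothetical names, and the accepted spellings (e.g. `unsigned long`, but not `long unsigned int`) are an assumption made for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    KIND_UNKNOWN, KIND_INT, KIND_CHAR, KIND_SHORT, KIND_LONG,
    KIND_FLOAT, KIND_DOUBLE, KIND_LONG_DOUBLE
} BaseKind;

typedef struct {
    BaseKind kind;
    bool is_unsigned;
} BasicType;

/* Hypothetical parser: splits an optional "signed"/"unsigned" prefix
 * from the base type name and looks the rest up in a table. */
BasicType parse_basic_type(const char *name) {
    assert(name != NULL);
    BasicType result = { KIND_UNKNOWN, false };
    bool has_sign_keyword = false;

    if (strncmp(name, "unsigned ", 9) == 0) {
        result.is_unsigned = true;
        has_sign_keyword = true;
        name += 9;
    } else if (strncmp(name, "signed ", 7) == 0) {
        has_sign_keyword = true;
        name += 7;
    }

    static const struct { const char *spelling; BaseKind kind; } table[] = {
        { "int", KIND_INT },       { "char", KIND_CHAR },
        { "short", KIND_SHORT },   { "long", KIND_LONG },
        { "float", KIND_FLOAT },   { "double", KIND_DOUBLE },
        { "long double", KIND_LONG_DOUBLE },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; ++i) {
        if (strcmp(name, table[i].spelling) == 0) {
            result.kind = table[i].kind;
            break;
        }
    }

    /* Floating-point types take no sign keyword; reject such spellings. */
    bool is_floating = result.kind == KIND_FLOAT ||
                       result.kind == KIND_DOUBLE ||
                       result.kind == KIND_LONG_DOUBLE;
    if (result.kind == KIND_UNKNOWN || (is_floating && has_sign_keyword)) {
        result.kind = KIND_UNKNOWN;
        result.is_unsigned = false;
    }
    return result;
}
```

Note that the signedness of plain `char` is implementation-defined in C; this sketch simply reports it as not unsigned unless the spelling says otherwise.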
Which prints strings to llvm::errs().
From https://lists.llvm.org/pipermail/cfe-dev/2013-January/027302.html:
When a function has a struct parameter or return type,
Clang may lower a struct parameter into...
	- a "byval" pointer (for a struct with several different members)
	- a vector (for a struct with a few float members)
	- two doubles (for a struct with two double members)
	- an i64 (for a struct with two i32 members)
... and possibly more variations.

But there is no information in the metadata about types created in this way.
Therefore, we detect the use of the struct type as an argument or return value
of a function and do not reconstruct these types from the metadata.
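As an illustration of the i64 case, the sketch below mimics in plain C what such a lowering amounts to: the bytes of a struct with two 32-bit members travel as a single 64-bit integer, and the callee recovers the members from it. This is only an analogy written for this note; `struct Pair` and the helper names are hypothetical, and the actual lowering is decided by clang for the target ABI.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct Pair {
    int32_t first;
    int32_t second;
};

/* Reinterpret the struct's bytes as one 64-bit integer, analogous to
 * passing { i32, i32 } as an i64. */
uint64_t pair_to_i64(struct Pair p) {
    assert(sizeof p == sizeof(uint64_t));
    uint64_t raw;
    memcpy(&raw, &p, sizeof raw);
    return raw;
}

/* Recover the struct from its coerced form. */
struct Pair i64_to_pair(uint64_t raw) {
    struct Pair p;
    memcpy(&p, &raw, sizeof p);
    return p;
}

/* What the source-level function looks like. */
int32_t sum_pair(struct Pair p) {
    return p.first + p.second;
}

/* What a lowered signature resembles: the struct type is gone from the
 * parameter list, so it cannot be matched against struct metadata. */
int32_t sum_pair_lowered(uint64_t coerced) {
    struct Pair p = i64_to_pair(coerced);
    return p.first + p.second;
}
```

The lowered signature mentions only a 64-bit integer, which is why the original struct type cannot be reconstructed from it.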
Enable all (loop) passes from https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/IPO/PassManagerBuilder.cpp#L353-#L375:
```cpp
  if (EnableSimpleLoopUnswitch) {
    // The simple loop unswitch pass relies on separate cleanup passes. Schedule
    // them first so when we re-process a loop they run before other loop
    // passes.
    MPM.add(createLoopInstSimplifyPass());
    MPM.add(createLoopSimplifyCFGPass());
  }
  // Rotate Loop - disable header duplication at -Oz
  MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1));
  MPM.add(createLICMPass(LicmMssaOptCap, LicmMssaNoAccForPromotionCap));
  if (EnableSimpleLoopUnswitch)
    MPM.add(createSimpleLoopUnswitchLegacyPass());
  else
    MPM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3, DivergentTarget));
  // FIXME: We break the loop pass pipeline here in order to do full
  // simplify-cfg. Eventually loop-simplifycfg should be enhanced to replace the
  // need for this.
  MPM.add(createCFGSimplificationPass());
  addInstructionCombiningPass(MPM);
  // We resume loop passes creating a second loop pipeline here.
  MPM.add(createIndVarSimplifyPass());        // Canonicalize indvars
  MPM.add(createLoopIdiomPass());             // Recognize idioms like memset.
```
Test:
```bash
clang -S -emit-llvm -Xclang -disable-O0-optnone simple-for-loop-second-latch.c -o simple-for-loop-second-latch-noopt.ll
optpassPasses simple-for-loop-second-latch-noopt --loop-simplify --simplifycfg --loop-rotate --lcssa --licm --loop-unswitch --simplifycfg --instcombine --indvars
old_llvm2c simple-for-loop-second-latch-noopt-opt == new_llvm2c simple-for-loop-second-latch-noopt
```
1. Map each LOOP to its BRANCH instruction (condition)
2. Transform BRANCH instructions into if or do-while constructs
First:
```c
goto head;
head:
...
do
    goto head;
while( C );
```

transform into
```c
do
    head
while( C );
```

Second:
Cache the result of loopInfoAnalysis for each function
e.g., a function type occurs as an argument type or return type
I wrongly assumed that LLVM always generates positive loop conditions (if the condition is true, iterate). After -O3 optimisations, however, the loop condition may be negated: if the condition is false, continue to the next iteration.

This new function reverses the loop condition when the situation described above occurs.
This might need a proper solution. It is a hack, because we do not
replace the whole expression, only the printed operator, e.g. "<" -> ">=".
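The operator swap can be pictured with a lookup table. The following is a sketch only, not the function added in this commit; `negate_comparison` is a hypothetical name and the set of operators is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: maps a printed comparison operator to the
 * operator of the negated condition. Returns NULL for anything else. */
const char *negate_comparison(const char *op) {
    assert(op != NULL);
    static const char *const pairs[][2] = {
        { "<", ">=" }, { ">=", "<" },
        { ">", "<=" }, { "<=", ">" },
        { "==", "!=" }, { "!=", "==" },
    };
    for (size_t i = 0; i < sizeof pairs / sizeof pairs[0]; ++i) {
        if (strcmp(op, pairs[i][0]) == 0)
            return pairs[i][1];
    }
    return NULL; /* not a comparison operator this sketch knows about */
}
```

One reason a purely textual swap is fragile: for floating-point operands `!(a < b)` is not the same as `a >= b` when either value is NaN, so negating the whole expression would be the safer route.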
x = phi i32 [0, %beforeLoop]  [%y, %fromLoop] ; coming from %fromLoop means we
are in the next iteration

before this commit:
x = 0;
do {
        x = y;
        loopBody(y, ...);
} while ( cond );

after this commit:
x = 0;
do {
        loopBody(y, ...);
        x = y;
} while ( cond );
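The difference between the two placements can be reproduced in plain C. The sketch below assumes a loop body that reads x and defines y = x + 1; both function names are hypothetical, and `stale_y` stands in for whatever value y holds before the loop has defined it.

```c
#include <assert.h>

/* Placement after this commit: the phi assignment runs at the end of the
 * body, so the first iteration still sees the initial value 0. */
int sum_phi_at_end(int n) {
    assert(n > 0);
    int x = 0, y = 0, sum = 0;
    do {
        sum += x;  /* loop body reads the phi value */
        y = x + 1; /* loop body defines y */
        x = y;     /* phi update for the back edge */
    } while (y < n);
    return sum;
}

/* Placement before this commit: the phi assignment runs first, so the
 * initial value 0 is overwritten by a y that was never computed. */
int sum_phi_at_start(int n, int stale_y) {
    assert(n > 0);
    int x = 0, y = stale_y, sum = 0;
    do {
        x = y;     /* clobbers the initial value before its first use */
        sum += x;
        y = x + 1;
    } while (y < n);
    return sum;
}
```

With n = 3 the first variant visits x = 0, 1, 2, while the second starts from the stale value instead of 0.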
Signed-off-by: Andrew V. Teylu <andrew.teylu@vector.com>
2 participants