This repository has been archived by the owner on Mar 2, 2023. It is now read-only.

Can qsym run the instrumented program by afl-clang-fast? #56

Closed
92wyunchao opened this issue Dec 13, 2019 · 9 comments

Comments

@92wyunchao

Hi,
I notice that the usage instructions say to run QSYM on the non-instrumented binary. Can I run it on the instrumented binary instead? If so, will it cause any problems? Thank you.

@insuyun
Contributor

insuyun commented Dec 13, 2019

Hi! In theory, yes. But the current Pin, the dynamic binary translator (DBT) that QSYM relies on, does not seem to support shared memory, which is commonly used in an instrumented binary.

@92wyunchao
Author

Thanks for your explanation. I have another question: how does QSYM handle strcmp or memcmp? Take the following code as an example.
src:

    if (strcmp(buf, "abcdefgh") == 0)
        printf("you find me\n");

asm:
strcmp
I find that QSYM cannot produce the correct input ('abcdefg') by negating the branch in the asm code, but it does produce it when calling the makeAddrConcrete function, which produces many inputs. I am curious how this works.

@insuyun
Contributor

insuyun commented Dec 15, 2019

Hi. QSYM is not good at finding such inputs. It is designed for binary-format files, not string-like ones.

First of all, QSYM uses a search mechanism called generational search, which was introduced in Microsoft SAGE. The basic idea is to flip each branch along the concrete execution path to maximize the impact of one concolic execution. Because of this design, it is not good at finding an input that needs to satisfy multiple branches at once (e.g., strcmp()).

Second, QSYM handles the native strcmp() as it is, which is not as simple as one might think. Because this function is so common, glibc uses a heavily optimized implementation, which is hard for QSYM to analyze (see this). Thus, QSYM does not work as well for strcmp() as you might expect. One way to handle this is to introduce models for such functions (as suggested in #23), but this is not implemented in the current QSYM. Hope this helps. Thank you!

P.S. If you need to handle that using concolic execution, I suggest using angr, which already implements this mechanism.
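
To illustrate why flipping one branch per execution is slow for string comparisons, here is a minimal Python sketch. It is a toy model, not QSYM or SAGE code: `byte_compare_path`, `flip_branches`, and `rounds_to_match` are hypothetical names, the comparison is a plain byte-by-byte loop rather than glibc's strcmp(), and "solving" a negated branch is reduced to patching one byte.

```python
def byte_compare_path(data: bytes, target: bytes) -> list[tuple[int, bool]]:
    """Concretely run a byte-by-byte compare and record (index, equal) per branch."""
    path = []
    for i, expected in enumerate(target):
        equal = i < len(data) and data[i] == expected
        path.append((i, equal))
        if not equal:
            break  # the compare loop exits at the first mismatch
    return path


def flip_branches(data: bytes, target: bytes) -> list[bytes]:
    """Produce one child input per branch on the concrete path by negating it."""
    children = []
    for i, equal in byte_compare_path(data, target):
        child = bytearray(data.ljust(len(target), b"\x00"))
        if equal:
            child[i] = (target[i] + 1) % 256  # negate "equal": any other byte
        else:
            child[i] = target[i]              # negate "not equal": the expected byte
        children.append(bytes(child))
    return children


def rounds_to_match(seed: bytes, target: bytes) -> int:
    """Count how many executions are needed when each one fixes a single branch."""
    current, rounds = seed, 0
    while current != target:
        children = flip_branches(current, target)
        # Keep the child that survives the most comparisons.
        current = max(children, key=lambda c: len(byte_compare_path(c, target)))
        rounds += 1
    return rounds


print(rounds_to_match(bytes(8), b"abcdefgh"))
```

In this model each execution only learns one more correct byte, so the number of executions grows with the length of the string, which matches the remark above that inputs needing many branches satisfied at once are a poor fit for this search strategy.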

@92wyunchao
Author

Please allow me to confirm. Do you mean QSYM cannot deal with strcmp()? In my testing, it does produce a satisfying input by calling the function named makeAddrConcrete. Is this intentional or coincidental? As far as I know, this function is used for handling symbolic pointers.

@insuyun
Contributor

insuyun commented Dec 16, 2019

Oh, sorry. I misunderstood. What I mean is that QSYM can handle strcmp() only partially. So, if you have strcmp(input, "this_is_very_long_string_you_cannot_find"), QSYM will struggle to find that input. I think it produces the input through makeAddrConcrete() because of the complicated implementation of strcmp(). But to be sure, I think we need a more careful analysis. Do you have information about the instruction that causes makeAddrConcrete()?

@92wyunchao
Author

I checked and found that, as you mentioned, it is hard for QSYM to find the correct input when the string is very long.
Here is the instruction that causes makeAddrConcrete():

    movzx ecx, byte ptr [rsi+rdx]

This instruction is likely part of the implementation of strcmp in libc.

@insuyun
Contributor

insuyun commented Dec 16, 2019

Hi! I took a look at the code.

    v4.m128i_i64[0] = *(__int64 *)a1;
    v5.m128i_i64[0] = *(__int64 *)a2;
    v4.m128i_i64[1] = *(__int64 *)(a1 + 1);
    v5.m128i_i64[1] = *(__int64 *)(a2 + 1);
    v6 = (unsigned int)(_mm_movemask_epi8(_mm_sub_epi8(_mm_cmpeq_epi8(v4, v5), _mm_cmpeq_epi8(0LL, v4))) - 0xFFFF);
    if ( (_DWORD)v6 )
    {
LABEL_16:
      _BitScanForward64(&v6, v6);
      return *((unsigned __int8 *)a1 + v6) - (unsigned int)*((unsigned __int8 *)a2 + v6); // <- here

This is decompiled code of strcmp(). The line that I marked with "here" contains the assembly that you showed (movzx ecx, byte ptr [rsi+rdx]). Basically, strcmp() uses vectorized instructions to scan for a non-matching byte. It uses _BitScanForward64(), which is actually the bsf instruction, to find the first non-matching byte and compute the proper return value for strcmp(). As you know, this strcmp() returns the difference between the non-matching bytes, e.g., strcmp("a", "b") returns -1 and strcmp("a", "c") returns -2.

In QSYM, bsf is encoded as a symbolic nested if statement, e.g., ITE(x[0] == y[0], 0, ITE(x[1] == y[1], 1, ...)), where ITE means If-Then-Else. Thus, the final instruction movzx ecx, byte ptr [rsi+rdx] accesses a symbolic memory address, whose value depends on the result of the comparison inside strcmp(). By making this address concrete, QSYM can find the matching input for strcmp() (in the example, "abcdefg"). Hope this helps. If you have any other questions, let me know. Thank you!
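
The effect described above can be sketched in Python. This is a simplified model under stated assumptions, not QSYM's implementation of makeAddrConcrete() and not glibc's code: strings are shorter than one 16-byte chunk, `stop_mask` models the outcome of the SSE compare rather than its exact arithmetic, and concretizing the address is assumed to mean asking for one input per feasible index value. All function names are hypothetical.

```python
CHUNK = 16  # bytes compared per vectorized step in the decompiled code


def pad(s: bytes) -> bytes:
    """NUL-pad a short C string to one 16-byte chunk."""
    if len(s) >= CHUNK:
        raise ValueError("this sketch only handles strings shorter than one chunk")
    return s.ljust(CHUNK, b"\x00")


def stop_mask(a: bytes, b: bytes) -> int:
    """Bit i is set when byte i differs or a[i] is the NUL terminator."""
    mask = 0
    for i in range(CHUNK):
        if a[i] != b[i] or a[i] == 0:
            mask |= 1 << i
    return mask


def bsf(mask: int) -> int:
    """Index of the lowest set bit, like x86 bsf (undefined for 0)."""
    if mask == 0:
        raise ValueError("bsf is undefined for 0")
    return (mask & -mask).bit_length() - 1


def strcmp_chunk(a: bytes, b: bytes) -> int:
    """Return the byte difference at the first stop index, as in the marked line."""
    idx = bsf(stop_mask(a, b))
    return a[idx] - b[idx]


def input_for_index(const: bytes, k: int) -> bytes:
    """Build an input for which the bsf result (the symbolic index) equals k."""
    if not 0 <= k <= len(const):
        raise ValueError("index is not feasible for this constant")
    out = bytearray(pad(const))
    if k < len(const):
        out[k] ^= 0x01  # first k bytes match, byte k differs
    return bytes(out)   # k == len(const): every byte matches, stop at the NUL


const = b"abcdefgh"
candidates = [input_for_index(const, k) for k in range(len(const) + 1)]
print(candidates[-1].rstrip(b"\x00"))
```

Each concrete index k forces the first k input bytes to equal the constant, so enumerating the feasible indices yields many inputs, and the one for the largest index is the full match. That would be consistent with the observation earlier in the thread that makeAddrConcrete produces many inputs, one of which satisfies the strcmp() check.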

Best,
Insu Yun.

@92wyunchao
Author

OK, I got it. Thanks for your patience.

@insuyun
Contributor

insuyun commented Dec 18, 2019

You're welcome :)
