Add tests (and refactor) API extraction and symbol generation #1899
Comments
Hello, @mr-tz . I would like to work on this issue. Can you please assign it to me? |
@williballenthin can you please give more insights about what the issue is about? I know that we need to do something about the api feature extracting but want more details. |
Please read the PR #1897. See that it updates the routine here: Take a look at that routine and see if you agree with the prompt:
If so, then you can take on this issue. If it doesn't make sense, we can address at another time. |
I feel the best we can do is add some comments to make it a little more verbose, but I cannot currently think of how it could be simplified further. |
i think the most important immediate goal is to add some unit tests to demonstrate the current behavior. then we can attempt any refactor knowing the behavior is the same. so, let's start with tests. if that doesn't interest you, no worries! |
@williballenthin Should I make an explicit test for checking the function or modify the existing tests that can use it internally while testing rules? |
let's create new dedicated unit test functions (one is probably sufficient) to exercise this functionality |
def test_trim_dll_part():
    from capa.rules import trim_dll_part

    assert trim_dll_part("kernel32.CreateFileA") == "CreateFileA"
    assert trim_dll_part("System.Convert::FromBase64String") == "System.Convert::FromBase64String"
    assert trim_dll_part("ws2_32.#1") == "ws2_32.#1"
@williballenthin I am thinking of implementing something along these lines. Does this suffice, or do we need more rigorous tests? If more rigorous, then please give some idea about that. |
yup that's the right idea. ideally we have a case that covers each possible branch, and then maybe some weird made up cases to show what's expected (like both :: and .dll and more). |
You mean something like |
So for the above case, we need to ensure that it gives |
@williballenthin, I can see it can be a little bit confusing, as here we are overlapping cases between .NET functions and Windows PE functions. So, I think it would be better to change this so that names for all different types are handled separately. Lines 584 to 587 in 736ad1c
Here's my idea for it.
    # ordinal imports, like ws2_32.#1, or .NET namespace, like System.Diagnostics.Debugger::IsLogging, keep dll/namespace part
    if ".#" in api or "::" in api:
        return api
    # kernel32.CreateFileA
    if api.count(".") == 1:
        api = api.split(".")[1]
    return api
Thoughts? |
looking good! only suggestion might be to break the first 'if' into two so they can be documented separately. |
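For reference, the proposal with the first check split in two might look like the sketch below. It is a standalone copy for illustration, not the code in `capa.rules`; the function body follows the snippet posted above, and the last two assertions are made-up edge cases that only show what this sketch does, not confirmed capa behavior.

```python
def trim_dll_part(api: str) -> str:
    """Standalone sketch of the proposed routine (not the capa implementation)."""
    # ordinal imports, like ws2_32.#1: keep the dll part
    if ".#" in api:
        return api

    # .NET names, like System.Diagnostics.Debugger::IsLogging: keep the namespace part
    if "::" in api:
        return api

    # native imports with exactly one dot, like kernel32.CreateFileA: drop the dll part
    if api.count(".") == 1:
        api = api.split(".")[1]
    return api


def test_trim_dll_part():
    # one case per branch
    assert trim_dll_part("ws2_32.#1") == "ws2_32.#1"
    assert trim_dll_part("System.Convert::FromBase64String") == "System.Convert::FromBase64String"
    assert trim_dll_part("kernel32.CreateFileA") == "CreateFileA"
    # made-up edge cases: no dot at all, and more than one dot
    assert trim_dll_part("CreateFileA") == "CreateFileA"
    assert trim_dll_part("kernel32.dll.CreateFileA") == "kernel32.dll.CreateFileA"


test_trim_dll_part()
```

Splitting the checks gives each name format its own comment, and each branch maps to one assertion in the test.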
Fair point |
if you're up for it, would you please extend this review to cover the following functions: capa/capa/features/extractors/helpers.py Lines 27 to 107 in 736ad1c
this is what we originally intended, but i totally understand it's not obvious from the issue title. |
Would be glad to do so. 😊 |
def generate_symbols(dll: str, symbol: str, include_dll=False) -> Iterator[str]:
    """
    for a given dll and symbol name, generate variants.
    we over-generate features to make matching easier.
    these include:
      - CreateFileA
      - CreateFile
      - ws2_32.#1

    note that since capa v7 only `import` features and APIs called via ordinal include DLL names:
      - kernel32.CreateFileA
      - kernel32.CreateFile
      - ws2_32.#1

    for `api` features dll names are good for documentation but not used during matching
    """
    # normalize dll name
    dll = dll.lower()

    # trim extensions observed in dynamic traces
    dll = dll[0:-4] if dll.endswith(".dll") else dll
    dll = dll[0:-4] if dll.endswith(".drv") else dll
    dll = dll[0:-3] if dll.endswith(".so") else dll

    if is_ordinal(symbol):
        # ws2_32.#1
        # kernel32.CreateFileA
        yield f"{dll}.{symbol}"
        return

    # For non-ordinal symbols
    if include_dll:
        yield f"{dll}.{symbol}"

    # CreateFileA
    yield symbol

    if is_aw_function(symbol):
        if include_dll:
            # kernel32.CreateFile
            yield f"{dll}.{symbol[:-1]}"

        # CreateFile
        yield symbol[:-1]
@williballenthin, this is my idea of separating paths for ordinals and other symbols. Thoughts? |
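A dedicated test for this proposal could pin down the order and content of the generated variants, roughly as sketched here. The block is self-contained for illustration: `is_ordinal` and `is_aw_function` are simplified stand-ins written for this example (assumed to check for a leading `#` and a trailing `A`/`W`), not the helpers from `capa/features/extractors/helpers.py`, and `generate_symbols` is a copy of the proposal above.

```python
from typing import Iterator


def is_ordinal(symbol: str) -> bool:
    # stand-in (assumption): ordinals are written like "#1"
    return symbol.startswith("#")


def is_aw_function(symbol: str) -> bool:
    # stand-in (assumption): ANSI/wide variants end in "A" or "W"
    return len(symbol) >= 2 and symbol[-1] in ("A", "W")


def generate_symbols(dll: str, symbol: str, include_dll=False) -> Iterator[str]:
    # copy of the proposed routine, kept here so the test runs on its own
    dll = dll.lower()
    dll = dll[0:-4] if dll.endswith(".dll") else dll
    dll = dll[0:-4] if dll.endswith(".drv") else dll
    dll = dll[0:-3] if dll.endswith(".so") else dll

    if is_ordinal(symbol):
        # ordinals always keep the dll name
        yield f"{dll}.{symbol}"
        return

    if include_dll:
        yield f"{dll}.{symbol}"
    yield symbol

    if is_aw_function(symbol):
        if include_dll:
            yield f"{dll}.{symbol[:-1]}"
        yield symbol[:-1]


def test_generate_symbols():
    # ordinal: only the dll-qualified form, regardless of include_dll
    assert list(generate_symbols("WS2_32.dll", "#1")) == ["ws2_32.#1"]
    assert list(generate_symbols("ws2_32", "#1", include_dll=True)) == ["ws2_32.#1"]
    # A/W function without dll names
    assert list(generate_symbols("kernel32.dll", "CreateFileA")) == ["CreateFileA", "CreateFile"]
    # A/W function with dll names
    assert list(generate_symbols("kernel32.dll", "CreateFileA", include_dll=True)) == [
        "kernel32.CreateFileA",
        "CreateFileA",
        "kernel32.CreateFile",
        "CreateFile",
    ]
    # plain symbol, ".so" extension trimmed
    assert list(generate_symbols("libc.so", "open", include_dll=True)) == ["libc.open", "open"]


test_generate_symbols()
```

Comparing against full lists, rather than checking membership, also catches accidental duplicates or reordering if the routine is refactored later.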
Also, I have added some comments in PR. Please review that too. |
@williballenthin, can you please share your thoughts for my suggestions? |
@williballenthin , are you available today for this? |
no, my child is sick, so i'm not at my computer today. i'll review when i am able. |
Oh. Sorry to hear that. May he/she get well soon. |
Originally posted by @williballenthin in #1897 (comment)
We should have more tests to encode all the various API patterns we support.
Refactoring of the existing routines and maybe introducing separate functions for .NET vs. native vs. others may make sense as part of that.