fcg cfg #1

lqc09 · 2022-04-10T09:20:43Z

Hello author, how do I process the FCGs and CFGs into JSONL?

ryderling · 2022-04-11T02:18:21Z

Thanks for your attention.
Based on the implementation of Genius (see https://github.com/qian-feng/Gencoding for details), we disassemble all PE samples in the dataset with IDA Pro 6.4, then generate their FCGs and CFGs accordingly, and finally store them in the JSONL file format.

lqc09 · 2022-04-11T02:38:40Z

Thanks a lot, can both FCGs and CFGs be handled by Genius (https://github.com/qian-feng/Gencoding)?

ryderling · 2022-04-11T02:51:55Z

Not yet. As I recall, Genius is used to extract CFGs, and it is easy to generate FCGs based on the Genius framework.

lqc09 · 2022-04-11T03:05:05Z

Thanks, got it.

lizhangtan · 2022-05-06T12:41:18Z

Hello author, I have read your paper and also tried to use Genius (https://github.com/qian-feng/Gencoding) to get the CFGs from PE samples. I ran the code in the preprocessing_ida.py of Genius and got the output (XXX.ida) like this:
(i__main__
raw_graphs
p1
(dp2
S'raw_graph_list'
p3
(lp4
(iraw_graphs
raw_graph
p5
(dp6
S'entry'
p7
I0
sS'fun_features'
...

So how can I get the CFG in JSONL file format? I would appreciate it if you could give more details about how to use Genius to generate CFGs in JSONL file format.

KennenH · 2023-06-25T14:21:36Z

Is there any solution? I have run into the same problem.
Also, a file called train_external_function_name_vocab.jsonl is needed before model training, and I have no idea how to generate it either.

Divine-sh · 2023-06-29T09:17:42Z

I don't know how to generate the file train_external_function_name_vocab.jsonl either. Do you have a solution?

ryderling · 2023-06-29T09:40:15Z

Reply to @KennenH and @Divine-sh:

As we described in Section IV.A.2):
For each node representing an external function in the FCG, it is one-hot encoded based on its function name, and we limit the vocabulary of external functions to the 10,000 that are most frequently used in the training dataset.
The file train_external_function_name_vocab.jsonl stores these top 10,000 external function names from the training dataset.
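
To make the encoding concrete, here is a minimal sketch of the lookup, not the code used for the paper. The helper names `build_index` and `one_hot_external` are hypothetical, and mapping names outside the vocabulary to an all-zero vector is an assumption; this thread does not say how out-of-vocabulary names are handled.

```python
from typing import Dict, List


def build_index(vocab: List[str]) -> Dict[str, int]:
    """Map each vocabulary name to its position (hypothetical helper)."""
    return {name: i for i, name in enumerate(vocab)}


def one_hot_external(name: str, index: Dict[str, int]) -> List[int]:
    """Return a one-hot vector for an external function name.

    Assumption: a name outside the vocabulary becomes an all-zero vector.
    """
    vec = [0] * len(index)
    pos = index.get(name)
    if pos is not None:
        vec[pos] = 1
    return vec
```

With the full vocabulary, each vector would have 10,000 entries, one per stored function name.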

20521862 · 2023-07-01T07:09:54Z

Can you tell me how to obtain the top 10,000 you mentioned?

KennenH · 2023-07-03T03:03:44Z

@20521862
I think it is quite clear from what the paper says:
"it is one-hot encoded based on its function name and we limit the vocabulary size of external functions to 10,000 that are most frequently used in the training dataset"
Count the number of calls to every external function, then sort by that count.

20521862 · 2023-07-04T13:41:00Z

So does that mean I have to parse all the PE files, collect the function names that have been called 10,000 times in the training data, and save them in train_external_function_name_vocab.jsonl?

KennenH · 2023-07-04T13:49:55Z

@20521862
"10,000 (external functions) that are most frequently used"
It does not mean saving the functions that were called 10,000 times. It means taking the 10,000 functions that were called most often.
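
A rough sketch of that count-and-sort step, assuming you have already extracted, for every training sample, the list of external function names it calls (one entry per call). The function names `build_vocab` and `write_vocab_jsonl` and the `{"name": ..., "count": ...}` line layout are hypothetical; the actual layout of train_external_function_name_vocab.jsonl is not described in this thread.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def build_vocab(samples: Iterable[Iterable[str]],
                top_k: int = 10000) -> List[Tuple[str, int]]:
    """Count external function names over all samples and keep the top_k."""
    counts = Counter()
    for names in samples:
        counts.update(names)  # one increment per occurrence of a name
    return counts.most_common(top_k)  # sorted by count, highest first


def write_vocab_jsonl(vocab: List[Tuple[str, int]], path: str) -> None:
    """Write one JSON object per line (assumed layout, see above)."""
    with open(path, "w", encoding="utf-8") as f:
        for name, count in vocab:
            f.write(json.dumps({"name": name, "count": count}) + "\n")
```

If you would rather count in how many samples a function appears instead of how many times it is called, pass `set(names)` to `counts.update`.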

KennenH · 2023-07-19T03:45:19Z

@ryderling Thank you very much for your reply. I have another question, the one raised earlier by @lizhangtan (the sixth post of this issue).

I also used Genius (https://github.com/qian-feng/Gencoding) to process the assembly of a PE file and obtained its output, a .ida file. How can I obtain the CFG in JSONL file format from this .ida file? Could you please provide more details? Any help would be greatly appreciated!

KennenH · 2023-07-26T03:20:34Z

@lizhangtan I've figured it out: the .ida file is actually data saved with pickle. Reload it with pickle and you get a readable object.
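
A minimal sketch of that reload step in Python 3, under several assumptions. The dump posted above references a class `raw_graphs` in `__main__` and a class `raw_graph` in a module `raw_graphs`; if those are not importable, plain `pickle.load` fails, so this sketch substitutes empty placeholder classes for anything it cannot import. The names `StubUnpickler`, `load_ida`, `to_jsonable`, and `ida_to_jsonl_lines` are hypothetical, the JSON keys are simply whatever attributes the pickle stored (not necessarily the schema of the authors' JSONL files), and it has not been checked against a full Genius output.

```python
import io
import json
import pickle

_STUBS = {}  # cache of placeholder classes, keyed by (module, name)


class StubUnpickler(pickle.Unpickler):
    """Unpickler that falls back to empty placeholder classes."""

    def find_class(self, module, name):
        try:
            return super().find_class(module, name)
        except (ImportError, AttributeError):
            key = (module, name)
            if key not in _STUBS:
                _STUBS[key] = type(name, (object,), {"__module__": module})
            return _STUBS[key]


def load_ida(data: bytes):
    # latin1 lets Python 3 decode Python 2 byte strings without errors.
    return StubUnpickler(io.BytesIO(data), encoding="latin1").load()


def to_jsonable(obj):
    """Recursively turn unpickled objects into JSON-friendly values."""
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    if isinstance(obj, dict):
        return {str(k): to_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set, frozenset)):
        return [to_jsonable(v) for v in obj]
    if hasattr(obj, "__dict__"):
        return to_jsonable(vars(obj))  # placeholder or ordinary instance
    return repr(obj)  # last resort for anything else


def ida_to_jsonl_lines(data: bytes):
    """Return one JSON string per entry of the top-level raw_graph_list."""
    graphs = load_ida(data)
    return [json.dumps(to_jsonable(g)) for g in graphs.raw_graph_list]
```

To use it, read XXX.ida in binary mode, pass the bytes to `ida_to_jsonl_lines`, and write each returned string as one line of the output file. Loading the file in the same Python environment that produced it, with the Genius classes importable, avoids the placeholder classes altogether.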
