There are currently more than 7,260,000,000 mobile devices in the world, a figure equivalent to 91.54% of the world's population. Approximately 2,500,000,000 of these devices run Android as their operating system.
It is no secret that these devices are becoming more and more important to us. They are with us practically all day long and contain a lot of personal information, which makes them an interesting target for malicious actors.
Malware analysis can be categorised into three main types: static analysis, dynamic analysis and hybrid analysis.
Static analysis is any analysis that does not need to execute the code in order to analyse it. It is based on searching for patterns through rules or heuristics, which makes it extremely safe because there is no possibility of activating the malware unintentionally. This type of analysis is faster than dynamic analysis and, because it matches against known patterns, has a high detection rate for known malware.
Dynamic analysis, on the other hand, is any analysis that needs to run the malware in order to analyse it, which means that a larger infrastructure must be in place to isolate it so that its execution does not affect real systems. Because it observes actual behaviour rather than matching known patterns, this type of analysis is more reliable than static analysis and can detect unknown malware.
Finally, an analysis that uses both static and dynamic analysis techniques is known as hybrid analysis. Currently, well-known anti-malware solutions such as Kaspersky, Avira or Avast, among others, use this type of analysis, dividing it into distinct stages.
Yara rules fall within the category of static analysis. They are a type of malware signature that allows analysts to identify and classify known malware.
A rule has three sections: a meta section, where information about the rule itself is usually placed; a strings section, where the patterns to be searched for in the file are defined; and a condition section, which defines the condition those patterns must satisfy for the rule to match, that is, for the file to be considered malware. Yara rules can be extremely complex, so I recommend reading the Yara documentation if you want to understand in more detail how they work.
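To make the three sections more concrete, here is a minimal Python sketch that mimics the shape of a rule. It is a toy model and not the real Yara engine or its syntax: the `ToyRule` class, the rule name and the two patterns are all hypothetical, and matching is reduced to a plain substring test.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass
class ToyRule:
    """Toy stand-in for a Yara rule (illustration only, not the Yara engine)."""
    name: str
    meta: Dict[str, str]                   # information about the rule itself
    strings: Dict[str, bytes]              # named patterns to search for
    condition: Callable[[Set[str]], bool]  # decides based on which patterns were found

    def matches(self, data: bytes) -> bool:
        # Work out which named patterns occur in the data ...
        found = {name for name, pattern in self.strings.items() if pattern in data}
        # ... and let the condition decide whether the rule fires.
        return self.condition(found)


# Hypothetical rule: fires only when BOTH patterns are present.
rule = ToyRule(
    name="example_sms_stealer",
    meta={"author": "example", "description": "illustrative only"},
    strings={
        "$perm": b"android.permission.READ_SMS",
        "$url": b"http://example.invalid/gate",
    },
    condition=lambda found: {"$perm", "$url"} <= found,
)

sample = b"...android.permission.READ_SMS...http://example.invalid/gate..."
benign = b"...android.permission.READ_SMS..."

print(rule.matches(sample))  # True: both patterns present
print(rule.matches(benign))  # False: only one pattern present
```

Real Yara conditions are far more expressive than this (counts, offsets, file size and so on), which is why the official documentation is still the place to go for details.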
Another advantage of Yara rules is that they are a current technique that is becoming widely used by analysts, which means that a large number of community-contributed rules are available.
Yaralyze is a malware detection tool for Android devices that employs two static analysis techniques: one using Yara rules and the other based on hash analysis. It also allows reports to be stored and viewed. It is designed using a client-server architecture in which the server can be hosted in the cloud, so that it is always available from any mobile device that has the client installed. It makes use of more than 130,000 Yara rules and more than 500,000 hashes of malware apps obtained from VirusShare and GitHub (the rules and hashes are not published in the repository).
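The hash-based technique can be sketched roughly as follows. This is not Yaralyze's actual code: the function names are hypothetical, the hash algorithm is an assumption (this post does not say which one the tool uses), and the server is represented by a plain callable instead of a real network request. The lookup order (client database first, then the server) follows the timing discussion later in this post.

```python
import hashlib
from typing import Callable, Iterable, Set


def digest_of(chunks: Iterable[bytes], algorithm: str = "sha256") -> str:
    """Hash the content chunk by chunk so large APKs need not fit in memory."""
    h = hashlib.new(algorithm)  # algorithm is an assumption, not taken from the tool
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def locate_hash(digest: str, client_db: Set[str],
                server_lookup: Callable[[str], bool]) -> str:
    """Check the local (client) hashes first, then fall back to the server."""
    if digest in client_db:
        return "client"
    if server_lookup(digest):  # in a real client this would be a network request
        return "server"
    return "no match"


# Hypothetical data standing in for APK contents and the two databases.
local_sample = digest_of([b"known-malware-", b"stored-on-client"])
remote_sample = digest_of([b"known-malware-stored-on-server"])
clean_sample = digest_of([b"harmless-app"])

client_db = {local_sample}
server_db = {remote_sample}


def server_lookup(d: str) -> bool:
    return d in server_db


print(locate_hash(local_sample, client_db, server_lookup))   # client
print(locate_hash(remote_sample, client_db, server_lookup))  # server
print(locate_hash(clean_sample, client_db, server_lookup))   # no match
```

A hash lookup only recognises files that are byte-for-byte identical to a known sample, which is why it complements the pattern-based Yara rules instead of replacing them.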
Two types of tests were carried out. The first tested the effectiveness of the tool in detecting known malware, using samples of the Brata, Sharkbot, Cerberus and Flubot malware families. The second tested the speed of analysis.
As can be seen in the images, the tool detects the malware files and does not produce a false positive with the legitimate WinRAR APK.

| APK | T1 | T2 | T3 | T4 | Average |
|---|---|---|---|---|---|
| Flubot (malware) | 2.27s | 2.23s | 2.24s | 2.29s | 2.257s |
| Sharkbot (malware) | 2.54s | 2.51s | 2.53s | 2.56s | 2.535s |
| WinRAR | 2.18s | 2.20s | 2.16s | 2.16s | 2.175s |

| Location of the application hash | T1 | T2 | T3 | T4 | Average |
|---|---|---|---|---|---|
| Client DB | 0.079s | 0.081s | 0.078s | 0.077s | 0.0787s |
| Server DB | 0.088s | 0.085s | 0.087s | 0.091s | 0.0877s |
| No match | 0.087s | 0.088s | 0.084s | 0.088s | 0.0867s |

The first table shows that the average analysis times are very similar. This is because every APK analysed goes through all the Yara rules, even if it has already been marked as malware, since later rules may narrow down the type of malware we are dealing with. The analysis time also depends, as is logical, on the size of the APK being analysed, and these APKs did not differ much in size.
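The "no early exit" behaviour described above can be illustrated with a small sketch. It is a simplification under stated assumptions, not the tool's implementation: each rule is reduced to a single byte pattern, and the rule names and patterns are made up for the example.

```python
from typing import Dict, List


def scan(data: bytes, rules: Dict[str, bytes]) -> List[str]:
    """Return the name of every rule whose pattern occurs in the data.

    There is deliberately no early exit: a later rule may identify the
    malware family more precisely than the first one that matched.
    """
    matched = []
    for name, pattern in rules.items():
        if pattern in data:
            matched.append(name)  # keep going instead of breaking out of the loop
    return matched


# Hypothetical rules: one generic, one family-specific, one that never matches.
rules = {
    "generic_banker": b"overlay_inject",
    "family_flubot_like": b"flu_cfg",
    "unrelated": b"never_present",
}

sample = b"header overlay_inject body flu_cfg trailer"
clean = b"clean application bytes"

print(scan(sample, rules))  # ['generic_banker', 'family_flubot_like']
print(scan(clean, rules))   # []
```

Because both the malicious and the clean input visit every rule, the work done depends mainly on the number of rules and the size of the input, which is consistent with the similar timings in the table.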
In the second table the times are also very similar. This may seem strange, because when the hash is in the server's database, or when there is no match, the client has to make a request to the server, which should slow the analysis down. The similar times can be explained by the fact that, at the time of testing, the server was receiving only a single request and so was not under a heavy workload, and by the fact that the database does not contain enough hashes to noticeably slow down the searches.