Detecting cryptography in the source code of open-source packages or libraries turns out to be a common problem for many of the software companies that include these packages in their products.
At Wind River, we face a similar challenge. Thousands of packages are bundled as part of the Wind River operating system. We are developing this project to be an automated and efficient code parser that can determine, with some degree of confidence, that a piece of code contains restricted encryption algorithms. The output could then be verified by a human expert.
More often than not, source code that contains an encryption algorithm includes words closely related to that algorithm. For example, a block of open source code that uses the DES algorithm is very likely to contain strings such as "DES_" or "cipher". So as a start, we can scan the source code for these keywords. This simple first pass gives us an initial idea of whether the source code might contain encryption. We call this the keyword method.
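The keyword idea can be illustrated with a minimal sketch. It is not the project's implementation: the keyword list, the function name find_keywords, and the shape of the returned tuples are assumptions made for illustration only.

```python
import re

# Hypothetical keyword list; the project keeps its real list in a separate file.
KEYWORDS = ["DES_", "cipher", "AES_", "RSA"]


def find_keywords(content, keywords=KEYWORDS, ignore_case=False):
    """Return (line_number, keyword, line_text) for every keyword hit."""
    flags = re.IGNORECASE if ignore_case else 0
    # Escape each keyword so it is matched literally, then join into one pattern.
    pattern = re.compile("|".join(re.escape(k) for k in keywords), flags)
    matches = []
    for line_number, line in enumerate(content.splitlines(), start=1):
        for hit in pattern.finditer(line):
            matches.append((line_number, hit.group(0), line.strip()))
    return matches


sample = (
    "DES_key_schedule ks;\n"
    "int x = 0;\n"
    "DES_ecb_encrypt(&in, &out, &ks, DES_ENCRYPT);\n"
)
hits = find_keywords(sample)
for line_number, keyword, text in hits:
    print(line_number, keyword, text)
```

A plain substring scan like this is cheap but noisy, which is why its output is best treated as a hint for a human reviewer.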
To go one step further, we search the content of each file for API calls to encryption libraries, include files, encryption data types, and other evidence that might confirm the use of encryption libraries or services. This is our API finder method.
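As a rough sketch of what looking for API evidence could mean, the snippet below checks C-like source for include files, function calls, and data types. The three regular expressions and the names API_PATTERNS and find_api_evidence are hypothetical; the project's actual API definitions are not reproduced here.

```python
import re

# Hypothetical patterns covering the three kinds of evidence named above.
API_PATTERNS = {
    "include": re.compile(r'#\s*include\s*[<"](openssl/[\w./]+)[>"]'),
    "api_call": re.compile(r"\b(EVP_\w+|SSL_\w+)\s*\("),
    "data_type": re.compile(r"\b(EVP_CIPHER_CTX|SSL_CTX)\b"),
}


def find_api_evidence(content):
    """Return one dict per piece of evidence found, in line order."""
    evidence = []
    for line_number, line in enumerate(content.splitlines(), start=1):
        for kind, pattern in API_PATTERNS.items():
            for hit in pattern.finditer(line):
                evidence.append(
                    {"line": line_number, "type": kind, "text": hit.group(1)}
                )
    return evidence


sample = (
    "#include <openssl/evp.h>\n"
    "EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();\n"
)
evidence = find_api_evidence(sample)
for item in evidence:
    print(item)
```

Unlike a bare keyword hit, each result here records what kind of evidence was seen, which makes it easier to judge how strong the signal is.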
This script crudely detects the following cryptography schemes:
- Asymmetric cryptography
RSA, DSA, Diffie-Hellman, ECC, ElGamal, Rabin, XTR
- Block ciphers
AES, DES, RC2, RC5, RC6, CAST, Blowfish, Twofish, Threefish, Rijndael, Camellia, IDEA, SEED, ARIA, SM4, Serpent, SHACAL, GOST, TEA, XTEA, BTEA, SAFER, Feistel, IntelCascade, KASUMI, MISTY1, NOEKEON, SHARK, Skipjack, BEAR-LION, RFC2268, MARS, DFC, CSCipher
- Stream ciphers
RC4, Salsa20, XSalsa20, ChaCha20, PANAMA, SEAL, SOSEMANUK, WAKE
- Substitution ciphers
ROT13
- Hybrid encryption
PGP, GPG
- Hashing algorithms
MD2, MD4, MD5, SHA-1, SHA-2, SHA-3, MDC-2, BLAKE, HMAC, RIPEMD, HAVAL, Tiger, Whirlpool, GOST, Adler32, Streebog
- Protocols and standards
SSL, TLS, SSH, PKI, PKCS, MQV, kerberos, ASN1, MSCHAP
- Encryption libraries
OpenSSL, OpenSSH, libgcrypt, Crypto++, cryptlib, libXCrypt, libMD, glibC, BeeCrypt, Botan, BouncyCastle, SpongyCastle, QT, JAVA SE 7, WinCrypt
- Message Authentication Codes
HMAC, Poly1305
- Cryptographic random number generators
- And other generic encryption evidence
The script is written in Python and requires Python version 3.4 or later to be installed.
On Debian-based Linux systems, install Python by running:
sudo apt-get install python3
On Windows, download the latest Python 3 release from https://www.python.org/downloads/windows/ and install it using the installation wizard.
On OS X, if you have Homebrew installed, simply run
brew install python3
Otherwise, you can download the latest Python 3 release from https://www.python.org/downloads/mac-osx/ and install it that way.
To run the script, execute
python3 scan-for-crypto.py <options> <packages>
Note: on Windows, the executable is called "python.exe", so use that in place of "python3". Alternatively, you can create an alias with the command
doskey python3=C:\path\to\python.exe $*
so that instead of typing the full path to python.exe, you can use the alias python3.
Space-separated list of packages to scan. A package can be given as a path to a local directory, a local file, a local compressed archive, a wildcard pattern, a remote archive, a URL to a single source file, or a GitHub link. See the cryptodetector.conf file for a list of examples.
The path to the configuration file. If a configuration file is present, all options are read from this file first, and any additional command line arguments override them. The next section covers the specification of a configuration file.
Specifies the methods of scanning. Accepts a comma-separated list of methods. A method can be one of keyword or api, pertaining to the keyword search and API finder respectively.
Specifies the path in which the output files should be written. If this option is not provided, the default value is the current working directory.
With this option, this script will create the output file for a package in the directory in which that package resides. Note this will only work for local packages that have a directory.
Specifies what to do when an output crypto file already exists. Can be one of three options: rename (default) renames the new crypto file .1.crypto, .2.crypto and so on; overwrite overwrites the old file; and skip skips scanning the package.
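The three policies can be sketched as a small path-resolution helper. This is an illustration only: the function name resolve_output_path is hypothetical, and it assumes the numbered suffix is placed between the package name and the .crypto extension (for example pkg.1.crypto).

```python
import os
import tempfile


def resolve_output_path(directory, package_name, policy="rename"):
    """Return the path to write to, or None if the package should be skipped."""
    if policy not in ("rename", "overwrite", "skip"):
        raise ValueError("unknown policy: " + policy)
    path = os.path.join(directory, package_name + ".crypto")
    if not os.path.exists(path) or policy == "overwrite":
        return path
    if policy == "skip":
        return None
    # policy == "rename": try <package>.1.crypto, <package>.2.crypto, ...
    counter = 1
    while True:
        candidate = os.path.join(
            directory, "{}.{}.crypto".format(package_name, counter)
        )
        if not os.path.exists(candidate):
            return candidate
        counter += 1


# Demo in a throwaway directory that already holds pkg.crypto.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "pkg.crypto"), "w").close()
    renamed = os.path.basename(resolve_output_path(tmp, "pkg", "rename"))
    skipped = resolve_output_path(tmp, "pkg", "skip")
    overwritten = os.path.basename(resolve_output_path(tmp, "pkg", "overwrite"))

print(renamed, skipped, overwritten)
```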
Performs a quick/express search on a set of given packages, returning a list of only the packages that had at least one match found in any of their files. In the end, it writes the output as a list to the terminal and creates a file in the output directory called quick-scan-result.txt.
Stops the search during the execution of each method after finding k files with matches in them. k must be a positive integer.
Makes the keyword search case-insensitive.
Path to the keyword list file for the keyword method. By default, this is /cryptodetector/methods/keyword/keyword_list.txt.
Path to the API definitions file for the API finder method. By default, this is /cryptodetector/methods/api/api_definitions.txt.
Comma-separated list of match types to ignore while searching files for matches.
Specifies whether or not to scan only the files that are source code files (for example .cpp files, .py files, etc.). The type of a file is guessed based on its extension (mime type).
Places indentation and additional spaces in the output crypto files to make them more readable (pretty) at the cost of producing larger files.
Specifies whether to process files verbosely and print out information.
Specifies not to write warning messages.
Shows the version of this program.
Shows a help guide.
A configuration file contains all the options to the program, so one doesn't have to type them out on the command line every time. The path to the configuration file can be specified with the option --config-file or -c, as described above.
If this option is not present, this program will look for the file cryptodetector.conf in the current working directory. This is the directory in which the command line interpreter works, not necessarily the directory where this script is saved.
If the configuration file is not found there, it looks for it in the home folder. In Unix and Unix-like systems, this is the directory referred to as ~/. In Windows, this is the %UserProfile% directory. Depending on the version of Windows, that could be one of <drive>:\Documents and Settings\<user> or <drive>:\Users\<user>.
If no config file is found, the program expects all the parameters from the command line.
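The lookup order described above (explicit option, then current working directory, then home folder) can be sketched as follows. The function name find_config_file and its parameters are hypothetical; os.path.expanduser("~") is used here because it resolves the home folder on both Unix-like systems and Windows.

```python
import os
import tempfile

CONFIG_NAME = "cryptodetector.conf"


def find_config_file(explicit_path=None, cwd=None, home=None):
    """Return the config file path to use, or None if none is found."""
    if explicit_path is not None:
        return explicit_path  # a path given on the command line wins
    cwd = cwd if cwd is not None else os.getcwd()
    home = home if home is not None else os.path.expanduser("~")
    for directory in (cwd, home):  # working directory first, then home
        candidate = os.path.join(directory, CONFIG_NAME)
        if os.path.isfile(candidate):
            return candidate
    return None


# Demo with throwaway directories standing in for the working and home folders.
with tempfile.TemporaryDirectory() as cwd, tempfile.TemporaryDirectory() as home:
    nothing_found = find_config_file(cwd=cwd, home=home)
    open(os.path.join(home, CONFIG_NAME), "w").close()
    found_in_home = os.path.dirname(find_config_file(cwd=cwd, home=home)) == home
    open(os.path.join(cwd, CONFIG_NAME), "w").close()
    cwd_wins = os.path.dirname(find_config_file(cwd=cwd, home=home)) == cwd

print(nothing_found, found_in_home, cwd_wins)
```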
Look at the example file cryptodetector.conf for the syntax. It is the same syntax as most other configuration files, where sections are enclosed in brackets and items are listed under sections, one per line. If an item has a value, the value follows an equal sign. To comment out a line, put a # sign in front of it.
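To make the syntax concrete, here is a sketch that parses a made-up file of this general shape with Python's standard configparser module. The section names ([general], [packages]) and the item names are invented for illustration, and whether the project itself parses its file with configparser is an assumption; the shipped cryptodetector.conf remains the authoritative example.

```python
import configparser

# Made-up configuration text; section and item names are hypothetical.
EXAMPLE = """\
# Lines starting with '#' are comments.
[general]
methods = keyword,api
output = /output/path
# verbose

[packages]
/path/to/package1
/path/to/package2
"""

parser = configparser.ConfigParser(
    allow_no_value=True,   # items such as package paths carry no value
    delimiters=("=",),     # only '=' separates an item from its value
)
parser.optionxform = str   # keep item names case-sensitive (they may be paths)
parser.read_string(EXAMPLE)

methods = parser["general"]["methods"].split(",")
output_dir = parser["general"]["output"]
packages = list(parser["packages"])
verbose = "verbose" in parser["general"]  # commented out above, so False

print(methods, output_dir, packages, verbose)
```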
To scan a single package (with default keyword search)
python3 scan-for-crypto.py /path/to/root/of/package
To scan two packages with keyword search and API finder
python3 scan-for-crypto.py --methods=keyword,api /path/to/package1 /path/to/package2
To download, extract, and scan the content of an archive using the API method, stopping the search after finding matches in at most 4 files
python3 scan-for-crypto.py --methods=api --stop-after=4 http://url/of/archive.tar.gz
To scan the master branch of a public GitHub repository:
python3 scan-for-crypto.py https://github.com/godbus/dbus.git
To write output files to a different directory:
python3 scan-for-crypto.py --output=/output/path /path/to/a/compressed/archive.tar.gz
To scan a folder containing tar archives:
python3 scan-for-crypto.py /folder/*.tar.gz
This script creates a <package>.crypto file for each package that it scans and writes a JSON object to it. For the specification of the output format, see the output specification.
If the error is related to invalid inputs, the program prints the error message and terminates. Run-time errors are printed to standard error and added to the "errors" section of the crypto output, but they do not stop the execution of the program unless they are unhandled exceptions. A good way to check that everything went well is to count the crypto files: their number should equal the number of packages you scanned. You can also check the log files to see whether there were any run-time errors.
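The check suggested above can be automated with a short script. This sketch assumes that each .crypto file holds a JSON object with "errors" as a top-level key; the exact layout is defined by the output specification, so treat the key lookup and the function name summarize_crypto_files as illustrative.

```python
import glob
import json
import os
import tempfile


def summarize_crypto_files(output_dir):
    """Map each .crypto file name to the content of its "errors" section."""
    summary = {}
    for path in sorted(glob.glob(os.path.join(output_dir, "*.crypto"))):
        with open(path, encoding="utf-8") as handle:
            report = json.load(handle)
        # Assumption: "errors" sits at the top level of the JSON object.
        summary[os.path.basename(path)] = report.get("errors", [])
    return summary


# Demo with two fabricated output files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name, errors in (("a", []), ("b", ["could not read file"])):
        with open(os.path.join(tmp, name + ".crypto"), "w", encoding="utf-8") as handle:
            json.dump({"errors": errors}, handle)
    summary = summarize_crypto_files(tmp)

print(len(summary), "crypto files")
for name, errors in summary.items():
    if errors:
        print(name, "reported errors:", errors)
```

Comparing len(summary) with the number of packages you passed in gives the count check, and the loop surfaces any package that recorded run-time errors.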
In Unix and Unix-like systems such as Ubuntu or OS X, scanning packages in parallel can be easily accomplished by installing GNU parallel. In Debian-based systems, it is installed via
sudo apt-get install parallel
and similarly, on OS X,
brew install parallel
Unfortunately, there is no direct port of GNU parallel for Windows, but it is available in Cygwin.
Once this tool is installed, simply create a command_list.txt containing
python3 scan-for-crypto.py package1
python3 scan-for-crypto.py package2
...
and execute
parallel -j [number of cores] < command_list.txt
This way, packages are processed in parallel, one package per core.
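For a long list of packages, the command list can be generated rather than typed by hand. The helper below is a sketch: build_command_list is a hypothetical name, and shlex.quote is used so that package paths containing spaces survive the shell that GNU parallel hands each line to.

```python
import shlex


def build_command_list(packages, options=()):
    """Return the text of a command list with one scan command per package."""
    lines = []
    for package in packages:
        parts = ["python3", "scan-for-crypto.py"] + list(options) + [package]
        # Quote every part so spaces and shell metacharacters are preserved.
        lines.append(" ".join(shlex.quote(part) for part in parts))
    return "\n".join(lines) + "\n"


commands = build_command_list(
    ["package1", "/path/with space/package2"],
    options=["--methods=keyword,api"],
)
print(commands, end="")

# To use it, write the text out and feed the file to GNU parallel:
# with open("command_list.txt", "w") as handle:
#     handle.write(commands)
```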
Simply open the help guide by running:
python3 scan-for-crypto.py -h
The list of available methods is shown in the help message of the --methods option. As of now, there are only two usable methods, keyword and api.
Keyword search and API finder are great, but I want to create my own way of searching code for encryption. How can I do that?
Start by creating a folder under /methods to put your files in. Have a look at our hello_world method under /cryptodetector/methods/hello_world. This provides a skeleton for writing a new method. As you can see, a method inherits from the Method class and must have a method_id attribute. Note that the name you give to your method folder, file names, and the class name for your method are arbitrary. The only thing that uniquely identifies the method is its method_id.
Every method class should implement the following three functions:
- search(content, language)
Searches the string content for encryption. language specifies the language of the content, defined in the file languages.py. It returns a list of matches, where each match is a dict object containing all the output fields. The example hello_world class shows basic usage.
- quick_search(content, language)
Returns True or False depending on whether it found one or more matches in the content in the given language.
- supports_scanning_file(language)
Returns True or False indicating whether this method supports scanning files in the given language.
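The three functions fit together roughly as in the sketch below. To keep it self-contained, it defines a stand-in Method base class; in the real project you would inherit from the project's own Method class instead. The class name, the method_id value, and the keys of the match dict are all hypothetical, since the required output fields are defined elsewhere.

```python
class Method:
    """Stand-in for the project's Method base class (an assumption)."""

    method_id = None


class EncryptWordMethod(Method):
    """Toy method that flags every line mentioning the word 'encrypt'."""

    method_id = "encrypt_word"  # the only thing that identifies the method

    def supports_scanning_file(self, language):
        # This toy method does not depend on the language at all.
        return True

    def search(self, content, language):
        matches = []
        for line_number, line in enumerate(content.splitlines(), start=1):
            if "encrypt" in line.lower():
                # Hypothetical field names; the real output fields may differ.
                matches.append(
                    {"line_number": line_number, "line_text": line.strip()}
                )
        return matches

    def quick_search(self, content, language):
        # Cheaper than search(): only answers whether anything matches.
        return "encrypt" in content.lower()


method = EncryptWordMethod()
found = method.search("int a;\n  encrypt(buf);", "c")
print(found, method.quick_search("int a;", "c"))
```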
In addition, a method class can have options and options_help attributes, but these are not required. The hello_world example shows how to create your own options in the method. To set them from the command line, simply reference them by --[your_method_id]-[your_option], replacing all underscores with dashes. For example, the example_value option can be set from the command line by --hello-world-example-value. Indeed, you will see it if you run the help guide python3 scan-for-crypto.py -h. Have a look at the example config file cryptodetector.conf for the syntax of specifying method options from the config file.
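The naming rule can be written as a one-line helper. The function name option_flag is hypothetical, and the example assumes that the method_id of the hello_world example is hello_world.

```python
def option_flag(method_id, option_name):
    """Build the command line flag for a method option."""
    # Join the two names with a dash, then turn every underscore into a dash.
    return "--" + (method_id + "-" + option_name).replace("_", "-")


flag = option_flag("hello_world", "example_value")
print(flag)
```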
We welcome your ideas and suggestions. Please create a fork of this project, make any edits to the code that you like, push them to a branch, and create a pull request from that branch by going here: https://github.com/Wind-River/crypto-detector/pulls. If you'd like to contact the maintainers, please email mark.gisi@windriver.com.
Crypto Detector is free and distributed under the Apache License, Version 2.0. For further details, visit http://www.apache.org/licenses/LICENSE-2.0. Text for the crypto-detector and other applicable license notices can be found in the LICENSE file in the project top level directory. Different files may be under different licenses. Each source file should include a license notice that designates the licensing terms for the respective file.
Version 1.0 (kamyar)
Disclaimer of Warranty / No Support: Wind River does not provide support and maintenance services for this software, under Wind River’s standard Software Support and Maintenance Agreement or otherwise. Unless required by applicable law, Wind River provides the software (and each contributor provides its contribution) on an “AS IS” BASIS, WITHOUT WARRANTIES OF ANY KIND, either express or implied, including, without limitation, any warranties of TITLE, NONINFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the software and assume any risks associated with your exercise of permissions under the license.