fix(analyzer): improvement to dockerfile scanning#7995
Open
cx-andre-pereira wants to merge 32 commits intoCheckmarx:masterfrom
Open
fix(analyzer): improvement to dockerfile scanning#7995cx-andre-pereira wants to merge 32 commits intoCheckmarx:masterfrom
cx-andre-pereira wants to merge 32 commits intoCheckmarx:masterfrom
Conversation
…s named 'dockerfile'
…es and all files with prefix 'dockerfile.' as well as all files with the '.dockerfile' extension type in a case insensitive matter (improvement on first commit)
…sion, added support for all ubi8/debian files in case of valid dockerfile structure, added support for lower case dockerfile commands - most queries will have issues with this but relevant text files are properly detected as a 'dockerfile' as intended
…ormats for consistency
…ility to lower cyclomatic complexity
…rors, fixed 'gitignore' files exclusion, docker parser will handle said case like before but with explicit 'gitignore' extension rather than 'possibleDockerfile' like before
…sion so that it 1- gets detected regardless of syntax inside 2- gets detected withouth checking syntax inside through the code optimizing detection speed for said files
…wice and minor simplificaton of query arguments
…test 105, improved uni tests to include new case insensitive samples
…ailing whitespace
… unnecessary 'gitignore' case in analyzer's workers
…have to be explicitly set as unwanted to allign with '.gitignore' behaviour
…s://github.com/cx-andre-pereira/kics into AST-140477--Improvement-to-dockerfile-scanning
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Reason for Proposed Changes
By checking that a file is:
possibleDockerfile". As a final check for the file to be included in a given scan, the analyzer's worker calls the "isDockerfile" function meant to ensure that only files containing the "FROM" and "RUN"(both case-sensitive) commands are scanned, files that fail this check are promptly set as "unwanted".The way dockerfile identification is meant to work is as follows:
Additionally ".ubi8" and ".debian" files should be included in the scan if they are valid text files with docker configurations.
The current implementation has a plethora of issues:
Proposed Changes
Reworked the "GetExtension" function so that it first :
The "readPossibleDockerFile") function was changed to support both "ARG" commands as well as empty lines before the "FROM" command. Additionally the order of these checks was changed to enhance performance so that it is first checked if a line is not a "FROM" command; previously, since every line was compared to the "FROM" statement, a file with a large number of comments would take significantly longer to skim through. (tests on a dockerfile with 260 thousand comments were two times faster with this change)
In the analyzer the worker was simplified, given the changes on the utils/get_extension's "GetExtension" function, we only need to worry about 2 results "
gitignore" and ".dockerfile" , additionally the "isDockerfile" function is now gone; it was overly restrictive and served the same purpose as the "readPossibleDockerFile" function does now.Unified all "dockerfile" type resource references to ".dockerfile", previously there was use of "
dockerfile", "Dockerfile" and "possibleDockerfile", I deemed these unnecessary since they all represent the same thing, except for "possibleDockerfile" but files labeled as such could and are now determined as valid dockerfiles before being attributed a value for their "extension".After initial reviews the case handling for "
gitignore" values was simplified so that any ".gitignore" or any extensionless file with "gitignore" as a suffix will have identical flows, previously unit-tests for the analyzer forced the latter to be added to the "unwanted" path list, the test failed if we simple discarded the file in the analyze function before calling the worker (the way any invalid extension outside target folders is threated).Tests
test/fixtures/dockerfile/
The files themselves include configurations with empty lines, "ARG" commands and comments before the target "FROM" command. Additionally all naming conventions are tested. Note that files inside the target folders would not be identified if they were not inside said folders (because "isDockerfileExtension" is called before checking literal extension). The "case_insensitive_tests" folder includes identical tests where all docker commands have been written in lowercase, note that most queries logic / the engine itself will not properly allow final results pointing to target lines when a query is triggered, but they still trigger similarly to comparable conventional dockerfiles.
The ".debian" and ".ubi8" files here should not be flagged as valid dockerfiles since they do not correspond to valid docker configurations; "not_dockerfile.txt" should not be flagged since, although it contains a valid docker configuration, it is set as a ".txt" extension and is not named "dockerfile"(case-insensitive) explicitly, it only contains the word "dockerfile".
These files are used for the "util/get_extension" unit tests and the new E2E test 105.
Finally a new function, "TestParser_Parse_CaseInsensitive", was added to the parser/docker/parser_test.go unit tests; as the name implies it is meant to ensure docker samples are parsed identically regardless of casing in the dockerfile commands syntax.
I submit this contribution under the Apache-2.0 license.