
fix(analyzer): improvement to dockerfile scanning #7995

Open
cx-andre-pereira wants to merge 32 commits into Checkmarx:master from cx-andre-pereira:AST-140477--Improvement-to-dockerfile-scanning

Conversation

@cx-andre-pereira (Contributor) commented Mar 11, 2026

Reason for Proposed Changes

  • The current implementation of dockerfile scanning is overly restrictive. The "GetExtension" function currently identifies "dockerfile" type files in two ways:
    • By calling the Contains function on files that have no extension, which detects any file whose name contains "Dockerfile" (case-sensitive), for example:
      - "Dockerfile"
      - "123Dockerfile"
      - "Dockerfile123"
    • By checking that the file is a valid text file and then, through the "readPossibleDockerFile" function:
      • returning true immediately if the file has the suffix "gitignore";
      • returning true if the file contains a "FROM" (case-sensitive) command after skipping any commented lines ("# ...").
    • Given these conditions, the result of "GetExtension" will be "possibleDockerfile". As a final check before the file is included in a scan, the analyzer's worker calls the "isDockerfile" function, which ensures that only files containing both the "FROM" and "RUN" commands (both case-sensitive) are scanned; files that fail this check are set as "unwanted".
  • Dockerfile identification is meant to work as follows:

    • Case-insensitive support for all "dockerfile", "dockerfile.any_extension" and "any_name.dockerfile" files.
    • All files inside folders named "docker", "dockerfile" or "dockerfiles" are checked, also case-insensitively.
    • Additionally, ".ubi8" and ".debian" files should be included in the scan if they are valid text files with docker configurations.

  • The current implementation has several issues:

    • It is case-sensitive both for "Dockerfile" name identification and for command scanning, even though Dockerfile commands are uppercase by convention only.
    • It restricts dockerfile scanning to files that contain both the "FROM" and "RUN" commands, when "FROM" alone would suffice.
    • It does not properly skip empty lines and "ARG" commands when searching for the "FROM" statement in the "readPossibleDockerFile" function.
    • It is redundant in using both the "readPossibleDockerFile" and the "isDockerfile" functions. Both serve the same purpose, and since the latter is only called when "readPossibleDockerFile" returns a positive result, the check should be simplified into a single function.
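
For reviewers, the intended name and folder rules listed above can be sketched roughly as follows. This is only an illustration: "hasDockerfileName" and "inDockerFolder" are hypothetical helpers, not functions from this PR, and matching only the immediate parent folder is an assumption.

```go
package main

import (
	"path/filepath"
	"strings"
)

// dockerFolders holds the target folder names from the description.
var dockerFolders = map[string]bool{
	"docker":      true,
	"dockerfile":  true,
	"dockerfiles": true,
}

// hasDockerfileName (hypothetical) reports whether the base name is
// "dockerfile", "dockerfile.<anything>" or "<anything>.dockerfile",
// ignoring case.
func hasDockerfileName(path string) bool {
	name := strings.ToLower(filepath.Base(path))
	return name == "dockerfile" ||
		strings.HasPrefix(name, "dockerfile.") ||
		strings.HasSuffix(name, ".dockerfile")
}

// inDockerFolder (hypothetical) reports whether the file's immediate
// parent folder is one of the target folders, ignoring case.
// Assumption: only the direct parent is considered.
func inDockerFolder(path string) bool {
	parent := strings.ToLower(filepath.Base(filepath.Dir(path)))
	return dockerFolders[parent]
}
```

Under these rules "DOCKERfile.txt" and "file_2.DOCKERfile" match by name, while "123Dockerfile" and "not_dockerfile.txt" do not, since they merely contain the word. A file matched only by its folder would still need its content validated.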

Proposed Changes

  • Reworked the "GetExtension" function so that it first calls the new "isDockerfileExtension" function. Here, all files named "dockerfile.something" or "something.dockerfile" (case-insensitive), and all files inside the target folders mentioned above that are deemed valid dockerfiles (through "readPossibleDockerFile"), are immediately classified as ".dockerfile" files.

  • The "readPossibleDockerFile" function was changed to support both "ARG" commands and empty lines before the "FROM" command. Additionally, the order of the checks was changed to improve performance, so that each line is first checked for not being a "FROM" command; previously every line was compared against the "FROM" statement, so a file with a large number of comments took significantly longer to skim through (in tests on a dockerfile with 260 thousand comments, scanning was two times faster with this change).

  • The worker in the analyzer was simplified. Given the changes to the "GetExtension" function in utils/get_extension, it only needs to handle two results, "gitignore" and ".dockerfile". The "isDockerfile" function is now gone; it was overly restrictive and served the same purpose that "readPossibleDockerFile" serves now.

  • Unified all "dockerfile" type resource references to ".dockerfile". Previously "dockerfile", "Dockerfile" and "possibleDockerfile" were all in use. I deemed these unnecessary since they all represent the same thing; the exception is "possibleDockerfile", but files labeled as such can be, and now are, validated as dockerfiles before being assigned a value for their "extension".

  • After initial reviews, the handling of "gitignore" values was simplified so that any ".gitignore" file and any extensionless file with "gitignore" as a suffix follow identical flows. Previously the analyzer's unit tests forced the latter to be added to the "unwanted" path list; the tests failed if we simply discarded the file in the analyze function before calling the worker (the way any invalid extension outside the target folders is treated).

  • Tests

    • Many new files were added to "test/fixtures/dockerfile" as well as a new folder "test/fixtures/negative_dockerfile".
  • test/fixtures/dockerfile/

  ├── Dockerfile-example                (pre-existing test)
  ├── corrupted_dockerfile              (pre-existing test)
  ├── any_name/
  │   ├── DOCKERfile.txt
  │   ├── Dockerfile.something
  │   ├── any_name.debian
  │   ├── any_name.ubi8
  │   ├── dockerFILE
  │   ├── file.Dockerfile
  │   ├── file_2.DOCKERfile
  │   └── random_name
  ├── case_insensitive_tests/
  │   ├── DOCKERfile.txt
  │   ├── Dockerfile.something
  │   ├── any_name.debian
  │   ├── any_name.ubi8
  │   ├── dockerFILE
  │   ├── file.Dockerfile
  │   ├── file_2.DOCKERfile
  │   └── random_name
  ├── test_folder_names/
  │   ├── docker/
  │   │   └── any_file.txt
  │   ├── dockerfile/
  │   │   └── any_file.txt
  │   └── dockerfiles/
  │       └── any_file.txt
  └── test_folder_names_case/
      ├── Docker/
      │   └── any_file.txt
      ├── Dockerfile/
      │   └── any_file.txt
      └── Dockerfiles/
          └── any_file.txt

The files themselves include configurations with empty lines, "ARG" commands and comments before the target "FROM" command. All naming conventions are also tested. Note that the files inside the target folders would not be identified as dockerfiles outside those folders; they are only picked up because "isDockerfileExtension" is called before the literal extension is checked. The "case_insensitive_tests" folder includes identical tests in which all docker commands are written in lowercase. Note that for these files most query logic, and the engine itself, will not produce final results that point to the correct target lines when a query is triggered, but the queries still trigger as they do for comparable conventional dockerfiles.

  • test/fixtures/negative_dockerfile/
  ├── CW671X02_EBM_EVENT_RULE               (pre-existing test)
  ├── not_dockerfile.debian
  ├── not_dockerfile.txt
  └── not_dockerfile.ubi8

The ".debian" and ".ubi8" files here should not be flagged as valid dockerfiles since they do not contain valid docker configurations. "not_dockerfile.txt" should not be flagged either: although it contains a valid docker configuration, it has a ".txt" extension and is not explicitly named "dockerfile" (case-insensitive); its name only contains the word "dockerfile".

These files are used for the "utils/get_extension" unit tests and the new E2E test 105.

Finally a new function, "TestParser_Parse_CaseInsensitive", was added to the parser/docker/parser_test.go unit tests; as the name implies it is meant to ensure docker samples are parsed identically regardless of casing in the dockerfile commands syntax.

I submit this contribution under the Apache-2.0 license.

…es and all files with prefix 'dockerfile.' as well as all files with the '.dockerfile' extension type in a case insensitive matter (improvement on first commit)
@cx-andre-pereira cx-andre-pereira requested a review from a team as a code owner March 11, 2026 22:12
@github-actions github-actions bot added community Community contribution dockerfile labels Mar 11, 2026
…sion, added support for all ubi8/debian files in case of valid dockerfile structure, added support for lower case dockerfile commands - most queries will have issues with this but relevant text files are properly detected as a 'dockerfile' as intended
@cx-andre-pereira cx-andre-pereira changed the title Ast 140477 improvement to dockerfile scanning (TEMPORARY PR) fix(analyzer): improvement to dockerfile scanning (TEMPORARY PR) Mar 12, 2026
…rors, fixed 'gitignore' files exclusion, docker parser will handle said case like before but with explicit 'gitignore' extension rather than 'possibleDockerfile' like before
…sion so that it 1- gets detected regardless of syntax inside 2- gets detected withouth checking syntax inside through the code optimizing detection speed for said files
…wice and minor simplificaton of query arguments
…test 105, improved uni tests to include new case insensitive samples
@cx-andre-pereira cx-andre-pereira changed the title fix(analyzer): improvement to dockerfile scanning (TEMPORARY PR) fix(analyzer): improvement to dockerfile scanning Mar 15, 2026
@github-actions github-actions bot added the docker Docker query label Mar 16, 2026
@github-actions github-actions bot added the query New query feature label Mar 16, 2026

Labels

community Community contribution docker Docker query dockerfile query New query feature


1 participant