Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scan files in the ISO tree as well as the Joliet file tree #938

Merged

Conversation

ragusaa
Copy link
Contributor

@ragusaa ragusaa commented Jun 1, 2023

No description provided.

Copy link
Contributor

@micahsnyder micahsnyder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears when testing the poc_eicar.iso and normal_eicar.iso that the first file that is extracted doesn't use utf16 for the filename encoding, so it's printing:

LibClamAV Warning: utf16 length is not multiple of two: 11

And then the filename is like this: 䕉䍁刮�

The next file entry IS utf16 encoded though. Do you know if there is any way to determine if it is actually UTF16 encoded?

To test this, and to test my concern about not aborting the scan if the return code is not success (my other code comment) I tried modifying the script to generate the POC and added 2 extra copies of EICAR to the file. Here's what the JSON looks like when scanning the "normal" one. The --gen-json --debug output reads:

{
  "Magic":"CLAMJSONv0",
  "RootFileType":"CL_TYPE_BINARY_DATA",
  "FileName":"normal_eicar.3.iso",
  "FileType":"CL_TYPE_BINARY_DATA",
  "FileSize":67584,
  "FileMD5":"0224a60c2c8e3a24f83d34099412eed1",
  "EmbeddedObjects":[
    {
      "FileType":"CL_TYPE_ISO9660",
      "Offset":32768,
      "ContainedObjects":[
        {
          "FileName":"䕉䍁刮�",
          "FilePath":"/home/micah/workspace/clamav-micah/build/tmp/20230620_210354-normal_eicar.3.iso.1d20366123/normal_eicar.3.iso.6a1bb2008d/clamav-ba85b2302dd09a88ccfa36b5f8b00e5e.tmp",
          "FileType":"CL_TYPE_TEXT_ASCII",
          "FileSize":68,
          "FileMD5":"44d88612fea8a8f36de82e1278abb02f",
          "Viruses":[
            "Win.Test.EICAR_HDB-1"
          ]
        },
        {
          "FileName":"eicar.com",
          "FilePath":"/home/micah/workspace/clamav-micah/build/tmp/20230620_210354-normal_eicar.3.iso.1d20366123/normal_eicar.3.iso.6a1bb2008d/clamav-92e42e0d75a5f0374ffc92d4cdd1462a.tmp",
          "FileType":"CL_TYPE_TEXT_ASCII",
          "FileSize":68,
          "FileMD5":"44d88612fea8a8f36de82e1278abb02f"
        },
        {
          "FileName":"eicar.com2",
          "FilePath":"/home/micah/workspace/clamav-micah/build/tmp/20230620_210354-normal_eicar.3.iso.1d20366123/normal_eicar.3.iso.6a1bb2008d/clamav-4fe9ba0299e7bb2df20b38642e78fb33.tmp",
          "FileType":"CL_TYPE_TEXT_ASCII",
          "FileSize":68,
          "FileMD5":"44d88612fea8a8f36de82e1278abb02f"
        },
        {
          "FileName":"eicar.com3",
          "FilePath":"/home/micah/workspace/clamav-micah/build/tmp/20230620_210354-normal_eicar.3.iso.1d20366123/normal_eicar.3.iso.6a1bb2008d/clamav-2c67876793e968637519fb6b64eb69cc.tmp",
          "FileType":"CL_TYPE_TEXT_ASCII",
          "FileSize":68,
          "FileMD5":"44d88612fea8a8f36de82e1278abb02f"
        }
      ]
    }
  ]
}

One concern I think people may have is that with this approach for "normal" ISO's we may scan each file 2x. I'm not sure what the solution is. I mean,... the clean cache will prevent some amount of re-scanning. But not completely. So I expect scan performance for "normal" ISO's to worsen with this change.

libclamav/iso9660.c Outdated Show resolved Hide resolved
@ragusaa
Copy link
Contributor Author

ragusaa commented Jul 6, 2023

Regarding the warning about filenames, it is just a warning about the length not being a multiple of two. Testing for an odd length and adding 1 to it gets rid of the warning, and gives the same characters that we were getting previously. @micahsnyder what do you think of this?

@micahsnyder
Copy link
Contributor

Yeah bumping the length sounds good. 👍

@ragusaa ragusaa force-pushed the CLAM-2255-ScanPrimaryAndJoliet branch from b8e99e3 to 209b494 Compare July 7, 2023 22:26
@micahsnyder
Copy link
Contributor

Regarding the warning about filenames, it is just a warning about the length not being a multiple of two. Testing for an odd length and adding 1 to it gets rid of the warning, and gives the same characters that we were getting previously. @micahsnyder what do you think of this?

Yeah bumping the length sounds good. 👍

I'm not actually sure what I was thinking. Per my previous comments, when that warning occurs it seems the input is not actually UTF-16, which means converting UTF8 to UTF8 results in gibberish. I was hoping we could determine if it is actually UTF16 before trying to convert it to UTF8.

@ragusaa
Copy link
Contributor Author

ragusaa commented Jul 18, 2023

I think the correct solution may be to take the filename as is, and let the file system handle the type, so remove the utf16->utf8 conversion all together. @micahsnyder do you have any thoughts on this?

@micahsnyder
Copy link
Contributor

I think the correct solution may be to take the filename as is, and let the file system handle the type, so remove the utf16->utf8 conversion all together. @micahsnyder do you have any thoughts on this?

That doesn't really solve the issue. Now the filename strings that were UTF16 encoded don't get transcoded and show up as empty strings. E.g.

❯ ./install/bin/clamscan -d ~/clamav.hdb ~/*l_eicar.3.iso --gen-json --debug --leave-temps --force-to-disk --tempdir=(pwd)/tmp -z
LibClamAV debug: {
  "Magic":"CLAMJSONv0",
  "RootFileType":"CL_TYPE_BINARY_DATA",
  "FileName":"normal_eicar.3.iso",
  "FileType":"CL_TYPE_BINARY_DATA",
  "FileSize":67584,
  "FileMD5":"0224a60c2c8e3a24f83d34099412eed1",
  "EmbeddedObjects":[
    {
      "FileType":"CL_TYPE_ISO9660",
      "Offset":32768,
      "ContainedObjects":[
        {
          "FileName":"EICAR.COM",
          "FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125824-normal_eicar.3.iso.159fee5c6f/normal_eicar.3.iso.6db0e5b1a3/clamav-8eb68b97d63b5a1f61ecadb27e9ca865.tmp",
          "FileType":"CL_TYPE_TEXT_ASCII",
          "FileSize":68,
          "FileMD5":"44d88612fea8a8f36de82e1278abb02f"
        },
        {
          "FileName":"EICAR.COM",
          "FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125824-normal_eicar.3.iso.159fee5c6f/normal_eicar.3.iso.6db0e5b1a3/clamav-7b8413cb96a3ec0b21131639f76383ab.tmp",
          "FileType":"CL_TYPE_TEXT_ASCII",
          "FileSize":68,
          "FileMD5":"44d88612fea8a8f36de82e1278abb02f"
        },
        {
          "FileName":"EICAR.COM",
          "FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125824-normal_eicar.3.iso.159fee5c6f/normal_eicar.3.iso.6db0e5b1a3/clamav-ffd6c25313cbcf8644b5a5a97dae12a9.tmp",
          "FileType":"CL_TYPE_TEXT_ASCII",
          "FileSize":68,
          "FileMD5":"44d88612fea8a8f36de82e1278abb02f"
        },
        {
          "FileName":"",
          "FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125824-normal_eicar.3.iso.159fee5c6f/normal_eicar.3.iso.6db0e5b1a3/clamav-5e6b883ec08936161fc8b2ab0aac6bc1.tmp",
          "FileType":"CL_TYPE_TEXT_ASCII",
          "FileSize":68,
          "FileMD5":"44d88612fea8a8f36de82e1278abb02f"
        },
        {
          "FileName":"",
          "FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125824-normal_eicar.3.iso.159fee5c6f/normal_eicar.3.iso.6db0e5b1a3/clamav-12beb4289323f286460e1d8767fe1c3b.tmp",
          "FileType":"CL_TYPE_TEXT_ASCII",
          "FileSize":68,
          "FileMD5":"44d88612fea8a8f36de82e1278abb02f"
        },
        {
          "FileName":"",
          "FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125824-normal_eicar.3.iso.159fee5c6f/normal_eicar.3.iso.6db0e5b1a3/clamav-3e94dc725f8224f4200fa50e57211e94.tmp",
          "FileType":"CL_TYPE_TEXT_ASCII",
          "FileSize":68,
          "FileMD5":"44d88612fea8a8f36de82e1278abb02f"
        }
      ]
    }
  ]
}

I also tried scanning using the full signature database and just noticed some weird behavior where it is re-scanning the embedded ISO9660 object at the same offset (32768) multiple times. My command on my machine is like this:

❯ mkdir tmp
❯ ./install/bin/clamscan -d ~/clams/1.1.0/share/clamav/ ~/*l_eicar.3.iso --gen-json --debug --leave-temps --force-to-disk --tempdir=(pwd)/tmp -z

The metadata from the scan looks kind of like this (i've added ...'s to trim out the overly verbose stuff, but will attach the log file:

Then looking at metadata.json, i'm seeing weirdness. E.g.

{
  "Magic":"CLAMJSONv0",
  "RootFileType":"CL_TYPE_BINARY_DATA",
  "FileName":"normal_eicar.3.iso",
  "FileType":"CL_TYPE_BINARY_DATA",
  "FileSize":67584,
  "FileMD5":"0224a60c2c8e3a24f83d34099412eed1",
  "EmbeddedObjects":[
    {
      "FileType":"CL_TYPE_ISO9660",
      "Offset":32768,
      "ContainedObjects":[
        {
          "FileName":"EICAR.COM",
          "FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125058-normal_eicar.3.iso.7752c48b86/normal_eicar.3.iso.f0320bbd82/clamav-49615fabc0342bcd076145faf5803d38.tmp",
          "FileType":"CL_TYPE_TEXT_ASCII",
          "FileSize":68,
          "FileMD5":"44d88612fea8a8f36de82e1278abb02f",
          "Viruses":[
            "Eicar-Test-Signature",
            ...
          ]
        },
        ...
      ]
    },
    {
      "FileType":"CL_TYPE_ISO9660",
      "Offset":32768,
      "ContainedObjects":[
        {
          "FileName":"EICAR.COM",
          "FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125058-normal_eicar.3.iso.7752c48b86/normal_eicar.3.iso.f0320bbd82/clamav-0158d255d136dab1853da3b5afd140c4.tmp",
          "FileType":"CL_TYPE_TEXT_ASCII",
          "FileSize":68,
          "FileMD5":"44d88612fea8a8f36de82e1278abb02f",
          "Viruses":[
            "Eicar-Test-Signature",
            ...
          ]
        },
        ...
      ]
    },
    {
      "FileType":"CL_TYPE_ISO9660",
      "Offset":32768,
      "ContainedObjects":[
        {
          "FileName":"EICAR.COM",
          "FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125058-normal_eicar.3.iso.7752c48b86/normal_eicar.3.iso.f0320bbd82/clamav-ce922cd591b9ce0931ec8734ff47d286.tmp",
          "FileType":"CL_TYPE_TEXT_ASCII",
          "FileSize":68,
          "FileMD5":"44d88612fea8a8f36de82e1278abb02f",
          "Viruses":[
            "Eicar-Test-Signature",
            ...
          ]
        },
        ...
      ]
    },
    {
      "FileType":"CL_TYPE_ISO9660",
      "Offset":32768,
      "ContainedObjects":[
        {
          "FileName":"EICAR.COM",
          "FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125058-normal_eicar.3.iso.7752c48b86/normal_eicar.3.iso.f0320bbd82/clamav-315903e475f2e1ca61b191b84a024a2d.tmp",
          "FileType":"CL_TYPE_TEXT_ASCII",
          "FileSize":68,
          "FileMD5":"44d88612fea8a8f36de82e1278abb02f",
          "Viruses":[
            "Eicar-Test-Signature",
            ...
          ]
        },
        ...
      ]
    },
    {
      "FileType":"CL_TYPE_ISO9660",
      "Offset":32768,
      "ContainedObjects":[
        {
          "FileName":"EICAR.COM",
          "FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125058-normal_eicar.3.iso.7752c48b86/normal_eicar.3.iso.f0320bbd82/clamav-47db60978699a4db93ad9416466df3e4.tmp",
          "FileType":"CL_TYPE_TEXT_ASCII",
          "FileSize":68,
          "FileMD5":"44d88612fea8a8f36de82e1278abb02f",
          "Viruses":[
            "Eicar-Test-Signature",
            ...
          ]
        },
        ...
      ]
    },
    {
      "FileType":"CL_TYPE_ISO9660",
      "Offset":32768,
      "ContainedObjects":[
        {
          "FileName":"EICAR.COM",
          "FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125058-normal_eicar.3.iso.7752c48b86/normal_eicar.3.iso.f0320bbd82/clamav-955ea8beb2a9e1e2e6931dd34cca3b44.tmp",
          "FileType":"CL_TYPE_TEXT_ASCII",
          "FileSize":68,
          "FileMD5":"44d88612fea8a8f36de82e1278abb02f",
          "Viruses":[
            "Eicar-Test-Signature",
            ...
          ]
        },
        ...
      ]
    },
    {
      "FileType":"CL_TYPE_ISO9660",
      "Offset":32768,
      "ContainedObjects":[
        {
          "FileName":"EICAR.COM",
          "FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125058-normal_eicar.3.iso.7752c48b86/normal_eicar.3.iso.f0320bbd82/clamav-c16ce8ba824d50825d06c915c2c00a3d.tmp",
          "FileType":"CL_TYPE_TEXT_ASCII",
          "FileSize":68,
          "FileMD5":"44d88612fea8a8f36de82e1278abb02f",
          "Viruses":[
            "Eicar-Test-Signature",
            ...
          ]
        },
        ...
    }
  ]
}

20230728_125058-normal_eicar.3.iso.metadata.txt

@ragusaa ragusaa force-pushed the CLAM-2255-ScanPrimaryAndJoliet branch 2 times, most recently from 7ff6b94 to 58526c2 Compare July 28, 2023 20:44
@micahsnyder
Copy link
Contributor

Your latest update resolves the utf16/8 conversion problems.

Regarding the weirdness I observed with it extracting ISO's multiple times... I am just a complete idiot. I was testing with daily/main/bytecode/hifistatic/preclass/windows/osx/linux cvds. yeah... all of em. So it had duplicates of all the file type magic sigs. After removing the extra dbs from my db directory, extraction is back to normal.

Will give it another once over on the code now that the behavior is fixed with the filenames.

@micahsnyder
Copy link
Contributor

PR needs to be rebased to pass the build on macOS since the switch to openssl 3.x changes. I can do that.

@micahsnyder micahsnyder force-pushed the CLAM-2255-ScanPrimaryAndJoliet branch from fa2c3c5 to 702c9d1 Compare July 29, 2023 00:40
Copy link
Contributor

@micahsnyder micahsnyder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Internal test pipelines look good.

ragusaa and others added 2 commits August 2, 2023 18:40
Previously were only scanning the joliet tree when we found it, but
files in the ISO tree need to be scanned as well.
Test files created with pycdlib and contain the ClamAV logo.png file.
@micahsnyder micahsnyder force-pushed the CLAM-2255-ScanPrimaryAndJoliet branch from 88f0d17 to 980f893 Compare August 3, 2023 01:40
@micahsnyder micahsnyder merged commit e4ac4b6 into Cisco-Talos:main Aug 3, 2023
22 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants