Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CMake support; MSPACK_ERROR() to mspack API #32

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

micahsnyder
Copy link
Contributor

@micahsnyder micahsnyder commented Feb 27, 2020

Update: This PR is complete.

Best,
Micah


Adds CMake tooling for libmspack to the project.
This attempts to replicate every feature available in the autotools
tooling and includes support for building on Windows.

Adds a mspack-version.h header file, generated from mspack-version.h.in.
mspack-version.h is to be installed alongside mspack.h so that application
developers can use the version numbers and strings in the event that the API
is modified and the application must maintain backwards compatibility.

Adds error handling to autogen.sh.

Reformats libmspack's README as Markdown.

Adds mspack_error_msg() and MSPACK_ERROR() macro to mspack.h API.
The new API is a cross between the cab_error() function previously found
in cabextract and error_msg() found in test/error.h. This removes the
need the examples to depend on test/error.h, a header not exported by
any build target, and removes the code duplication with cabextract.

@micahsnyder micahsnyder force-pushed the cmake-tooling branch 2 times, most recently from 6e878e4 to 8f2ca36 Compare March 8, 2020 20:02
@micahsnyder
Copy link
Contributor Author

Fixed:

Updated the PR to get the unit test working. I thought they'd worked before, but perhaps I lost a code fix switching between Mac & Windows.

I've also:

  • added a check in to define inline as needed
  • added a check to define FILE_OFFSET_BITS to 64 as needed
  • added STDC_HEADERS and WORDS_BIGENDIAN to the config.h file (accidentally missing before)

Remaining issues:

The CMake tooling currently does not define _LARGEFILE_SOURCE / _LARGEFILE64_SOURCE or _LARGE_FILES. I'm uncertain if these need to be defined. I found that the openjpeg project has feature to define them if needed but I haven't determined if we should borrow it from them: https://github.com/uclouvain/openjpeg/blob/master/cmake/TestLargeFiles.cmake

While iconv detection will be easy, I don't yet know of a way to determine the value of ICONV_CONST which will be required for cabextract.

I have the unit tests passing on Mac, but the cabd & chmd tests currently fail on Windows. Not sure why. Any insights would be helpful.

C:\Users\micah\workspace\libmspack\libmspack\build [cmake-tooling ≡ +2 ~0 -0 !]> ctest -C Release -VV  --output-on-failure
UpdateCTestConfiguration  from :C:/Users/micah/workspace/libmspack/libmspack/build/DartConfiguration.tcl
UpdateCTestConfiguration  from :C:/Users/micah/workspace/libmspack/libmspack/build/DartConfiguration.tcl
Test project C:/Users/micah/workspace/libmspack/libmspack/build
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 1
    Start 1: cabd_test

1: Test command: C:\Users\micah\workspace\libmspack\libmspack\build\test\Release\cabd_test.exe
1: Test timeout computed to be: 10000000
1: <unknown>:44 FAILED cab = cabd->open(cabd, TESTFILE("normal_2files_1folder.cab"))
1/3 Test #1: cabd_test ........................***Failed    0.02 sec
<unknown>:44 FAILED cab = cabd->open(cabd, TESTFILE("normal_2files_1folder.cab"))

test 2
    Start 2: chmd_test

2: Test command: C:\Users\micah\workspace\libmspack\libmspack\build\test\Release\chmd_test.exe
2: Test timeout computed to be: 10000000
2: C:/Users/micah/workspace/libmspack/libmspack/test/test_files/chmd/blank-filenames.chm: invalid section number '58'.
2: <unknown>:80 FAILED chm1 = chmd->open(chmd, files[i])
2/3 Test #2: chmd_test ........................***Failed    0.02 sec
C:/Users/micah/workspace/libmspack/libmspack/test/test_files/chmd/blank-filenames.chm: invalid section number '58'.
<unknown>:80 FAILED chm1 = chmd->open(chmd, files[i])

test 3
    Start 3: kwajd_test

3: Test command: C:\Users\micah\workspace\libmspack\libmspack\build\test\Release\kwajd_test.exe
3: Test timeout computed to be: 10000000
3: ALL 102 TESTS PASSED.
3/3 Test #3: kwajd_test .......................   Passed    0.02 sec

33% tests passed, 2 tests failed out of 3

Total Test time (real) =   0.09 sec

The following tests FAILED:
          1 - cabd_test (Failed)
          2 - chmd_test (Failed)
Errors while running CTest

Other notes:

I'd like to add a Github workflow to build & test on a few different systems and architectures. Maybe after the above concerns are resolved and the cabextract tooling has been added.

@micahsnyder
Copy link
Contributor Author

Update:

The initial test cabd_test failure on Windows was a silly copypaste issue. I've resolved this and the <unknown> line number issue, but now see a different failure for the cabd_test.

C:\Users\micah\workspace\libmspack\libmspack\build [cmake-tooling ≡ +2 ~0 -0 !]> ctest -C Debug -VV  --output-on-failure
UpdateCTestConfiguration  from :C:/Users/micah/workspace/libmspack/libmspack/build/DartConfiguration.tcl
UpdateCTestConfiguration  from :C:/Users/micah/workspace/libmspack/libmspack/build/DartConfiguration.tcl
Test project C:/Users/micah/workspace/libmspack/libmspack/build
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 1
    Start 1: cabd_test

1: Test command: C:\Users\micah\workspace\libmspack\libmspack\build\test\Debug\cabd_test.exe
1: Test timeout computed to be: 10000000
1: C:/Users/micah/workspace/libmspack/libmspack/test/test_files/cabd/bad_nofolders.cab: no folders in cabinet.
1: C:/Users/micah/workspace/libmspack/libmspack/test/test_files/cabd/bad_nofiles.cab: no files in cabinet.
1: cabd_extract_test_01:347 FAILED err == MSPACK_ERR_DATAFORMAT || err == MSPACK_ERR_DECRUNCH
1/3 Test #1: cabd_test ........................***Failed    0.01 sec
C:/Users/micah/workspace/libmspack/libmspack/test/test_files/cabd/bad_nofolders.cab: no folders in cabinet.
C:/Users/micah/workspace/libmspack/libmspack/test/test_files/cabd/bad_nofiles.cab: no files in cabinet.
cabd_extract_test_01:347 FAILED err == MSPACK_ERR_DATAFORMAT || err == MSPACK_ERR_DECRUNCH

test 2
    Start 2: chmd_test

2: Test command: C:\Users\micah\workspace\libmspack\libmspack\build\test\Debug\chmd_test.exe
2: Test timeout computed to be: 10000000
2: C:/Users/micah/workspace/libmspack/libmspack/test/test_files/chmd/blank-filenames.chm: invalid section number '58'.
2: C:/Users/micah/workspace/libmspack/libmspack/test/test_files/chmd/cve-2015-4468-namelen-bounds.chm: library not compiled to support large files.
2: chmd_search_test_01:80 FAILED chm1 = chmd->open(chmd, files[i])
2/3 Test #2: chmd_test ........................***Failed    0.01 sec
C:/Users/micah/workspace/libmspack/libmspack/test/test_files/chmd/blank-filenames.chm: invalid section number '58'.
C:/Users/micah/workspace/libmspack/libmspack/test/test_files/chmd/cve-2015-4468-namelen-bounds.chm: library not compiled to support large files.
chmd_search_test_01:80 FAILED chm1 = chmd->open(chmd, files[i])

test 3
    Start 3: kwajd_test

3: Test command: C:\Users\micah\workspace\libmspack\libmspack\build\test\Debug\kwajd_test.exe
3: Test timeout computed to be: 10000000
3: ALL 102 TESTS PASSED.
3/3 Test #3: kwajd_test .......................   Passed    0.01 sec

33% tests passed, 2 tests failed out of 3

Total Test time (real) =   0.05 sec

The following tests FAILED:
          1 - cabd_test (Failed)
          2 - chmd_test (Failed)
Errors while running CTest

@micahsnyder
Copy link
Contributor Author

I've figured out why the Windows tests fail.

Two reasons:

  1. You can't open "/dev/null" in Windows. Instead, open "NUL" (with 1 "L").

  2. Defining _LARGEFILE_SOURCE=1 and _FILE_OFFSET_BITS=64 does not force off_t to be an int64_t.

This second issue manifests as failures for cabd where unsigned ints are cast to off_t and a large unsigned int becomes a negative value. The following patch resolves it easily enough:

diff --git a/libmspack/mspack/cabd.c b/libmspack/mspack/cabd.c
index 56549b8..75956da 100644
--- a/libmspack/mspack/cabd.c
+++ b/libmspack/mspack/cabd.c
@@ -1012,7 +1012,7 @@ static int cabd_extract(struct mscab_decompressor *base,
   struct mscabd_folder_p *fol;
   struct mspack_system *sys;
   struct mspack_file *fh;
-  off_t filelen;
+  unsigned int filelen;

   if (!self) return MSPACK_ERR_ARGS;
   if (!file) return self->error = MSPACK_ERR_ARGS;
@@ -1049,7 +1049,7 @@ static int cabd_extract(struct mscab_decompressor *base,
    * In salvage mode, don't assume block sizes, just try decoding
    */
   if (!self->salvage) {
-    off_t maxlen = fol->base.num_blocks * CAB_BLOCKMAX;
+    unsigned int maxlen = fol->base.num_blocks * CAB_BLOCKMAX;
     if ((file->offset + filelen) > maxlen) {
       sys->message(NULL, "ERROR; file \"%s\" cannot be extracted, "
                    "cabinet set is incomplete", file->filename);

A larger problem is test failures with chmd caused by the same underlying issue, which I observed debugging a test where read_off64() attempts to store a 64-bit value in a 32-bit off_t:

image

For reference, this is in chmd_search_test_01>chmd_open>chmd_real_open>chmd_read_headers>read_off64. On macOS, the same call sets *var = 281474975015748 instead of -1694908.

I can get test passes on Windows by converting all off_t's to ptrdiff_t's, which is certainly a more reliable type but no doubt breaks large-file support on 32-bit platforms where off_t is forced to be an int64_t.

A better solution I believe would be to swap out all off_t usage for int64_t, since mspack requires off_t's to be 64-bits anyhow. The downside here is that int64_t isn't defined on every system, and even the header where it would be found isn't the same on every system. The clamav project gets around this issue by generating and installing a types header. I'm not sure if this appeals to you... or if you have a better idea.

I'm going to push what I have now, and then take a break and wait for comments before I go any further.

@kyz
Copy link
Owner

kyz commented Apr 10, 2020

Thanks very much for working on this! The support for building on Windows has been in need of modernisation for a while, and it's great that this works in harmony with the autotools rather than try to replace it. I'll take a look a the changes and review them.

@kyz
Copy link
Owner

kyz commented Apr 10, 2020

  1. You can't open "/dev/null" in Windows. Instead, open "NUL" (with 1 "L").

It seems reasonable to put in a Windows-specific filename here.

  1. Defining _LARGEFILE_SOURCE=1 and _FILE_OFFSET_BITS=64 does not force off_t to be an int64_t.

Yes. This is an issue, and it's why you can see in commits 49d8a2a and bb5e2b6 that I had to put the autotools SIZEOF_OFF_T test back in place. I thought I could get away from it, but I can't. Several platforms have 64-bit I/O without needing any preprocessor flags.

libmspack has two modes of operation:

  1. system.h defines LARGEFILE_SUPPORT because the underlying environment has 64-bit off_t and 64-bit file I/O (fseek/fseeko and ftell/ftello support 64-bit offsets). libmspack uses 64-bit offsets everywhere.
  2. system.h does not define LARGEFILE_SUPPORT. It's assumed that only 32-bit file IO is supported. There is no expectation of any 64-bit datatypes. Explicit tests are added to catch any 32-bit or 64-bit values over 2GB, because in POSIX with non-64-bit file IO these offsets would not be supported.

The current tests in system.h are not ideal, only SIZEOF_OFF_T is really testing for 64-bit I/O, LARGEFILE*_SOURCE and _FILE_OFFSET_BITS are just meant for transition, FILESIZEBITS is not whether 64-bit I/O is supported but what is the maximum size of a file (this is really filesystem and path dependent, but in practical terms the limits.h value reveals if the system generally supports 64-bit I/O or not), and I could also have used SSIZE_MAX as another proxy (it's not whether fseek/fseeko supports 64-bit offsets, it's whether read/write support 64-bit sizes).

There is an argument for making the code explicitly use 64-bit datatypes, regardless of the size of off_t, but that has the tradeoff of ending support for 32-bit-only platforms (that have no 64-bit integers or in-compiler workarounds) and would require at least C99 to be absolutely certain of a 64-bit integer datatype.

From a build-system perspective, what should happen is:

  1. Set any defines necessary for the compiler/runtime to turn on 64-bit I/O, if not turned on by default.
  2. define SIZEOF_OFF_T as the number of bytes an off_t uses (ideally with a runtime test)

Copy link
Owner

@kyz kyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks very good.

I must confess I haven't fully learned up how CMake works, so I haven't added any commentary on that, but I can now see how some of the Windows build process hangs together, and this is a great improvement over winbuild.sh, which could definitely be removed.

I'll look more into how CMake works, and also try building this with MSVC to see how it works, but I thought I'd leave some feedback on the changes for now.

Comment on lines 4 to 20
autoreconf -i -W all
echo you can now run ./configure
rc=$?; if [[ $rc != 0 ]]; then
echo "Error: Failed to generate autojunk!"; exit $rc
else
echo "You can now run ./configure"
fi
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be:

if autoreconf -i -W all; then
    echo "You can now ..."
else
   echo "Error: ..."
fi

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't have any luck with this. This does work though:
if [[ $(autoreconf -i -W all) ]]; then

I just thought about this... is [[ ... ]] bash-only? Will it upset people without bash?

Unrelated, a coworker recommend this improvement for clamav's autogen.sh which lets you run autogen.sh from a different directory (like if you're using a subdirectory for your build).

BASEDIR="$( cd "$(dirname "$0")" ; pwd -P )"
echo "Generating autotools files in: $BASEDIR ..."
cd $BASEDIR

# If this is a source checkout then call autoreconf with error as well
if test -d .git; then
  WARNINGS="all,error"
else
  WARNINGS="all"
fi

autoreconf -i -f
rc=$?; if [[ $rc != 0 ]]; then
  echo "Error: Failed to generate autojunk!"; exit $rc
else
  echo "You can now run ./configure"
fi

Thoughts?

cabextract/getopt.c Show resolved Hide resolved
Comment on lines 5 to 22
autoreconf -i -W all
echo you can now run ./configure
rc=$?; if [[ $rc != 0 ]]; then
echo "Error: Failed to generate autojunk!"; exit $rc
else
echo "You can now run ./configure"
fi
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As before, this could be if autoreconf ...

* @macro
* Version string of the mspack project release.
*/
#define MSPACK_VERSION "@PROJECT_VERSION@"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what value mspack-version.h adds. Could you explain why it would be useful?

libmspack currently has these version numbers and interfaces for the benefit of clients:

  1. At build time: the project version number is available via pkg-config. libmspack.pc is installed for this purpose. If your code needs libmspack 0.10 or higher, this is the right time to check with a pkg-config macro in your build system, to check that the library is both installed and has a high enough version number.
  2. At runtime: libmspack has always been and always intends to be backwards-compatibile, but if it were not, the libtool version number embedded in libmspack.so will ensure that your program links to the binary-compatible version of the library you need (hence why this value is independent of the project version). Is this binary-compatibility versioning also supported in the Windows DLL? From what I can see, it embeds the project version number rather than the libtool-like binary compatibility version numbers.
  3. Runtime feature detection: if you depend on a specific feature, rather that keep a chart of which libmspack releases introduced which features, there is an mspack_version() function to test that features exist, and what version of support is available.

For example, if your code cannot work without MSCABD_PARAM_SALVAGE, you can check at compile time using pkg-config that libmspack is even installed, and if it is that the version installed is at least 0.9. I'm not sure if CMake supports pkg-config on Windows, but from what I see it has find_package, would this be suitable for checking version numbers at build time?

If your code can optionally work with MSCABD_PARAM_SALVAGE, it should check if (mspack_version(MSPACK_VER_MSCABD) >= 2) at runtime rather than #if LIBMSPACK_VERSION >0x90000 at compile time, because the include header installed is no guarantee that your code will be linked to the same version of libmspack library.

Just as an example, imagine that libmspack 0.9, 0.10 and 0.11 exist and they all are compatible with each other. Would cabextract built against libmspack 0.9 link with libmspack.dll from 0.11, and vice versa?

Now imagine that libmspack 2.0 was deliberately binary incompatible, and its LT_ parameters were changed appropriately. On unix we could be sure that cabextract linked against libmspack.so.0 would intentionally not load libmspack.so.1, would the same mechanism stop cabextract.exe (built with libmspack 0.9) from linking to libmspack.dll built with 2.0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disclaimer: I don't pretend to be an expert at library package management stuff. I mostly just grumble about it (especially when it has to do with freebsd 😀 ).

As far as I know there is no pkg-config on Windows (native - not counting MingW/Cygwin), on macOS, or on a few other operating systems. Granted on Windows applications are generally deployed with their dependencies. Afaik, no-one installs DLLs to a common location to be shared between different applications. I have to wonder if Microsoft is planning to change that with the new "winget" package manager 😕.

In any case, I don't think pkg-config can be relied on to exist everywhere. I think version macros are most reliable way for an app developer to support multiple versions of a library.

Copy link
Owner

@kyz kyz Jun 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a packaging expert either, but I see a lot of value in it, which is why I use Cygwin, macports and so on. Hopefully Microsoft will too with AppGetWinGet.

To me, it looks like the CMake way of doing things should involve find_package. The value in find_package is:

  • it will import the config from the package you want to use, so both library and caller have the same build settings
  • if you don't have the required version, cmake will say this unambigiously

I feel that trying to do package management in header files is too late (header files can't make the build system import necessary config/defines), and using headers to fail a build by checking version numbers is doing the packaging/build system's job for it.

From the other direction, I also don't see the value of version numbers in headers to control conditional code execution, because libmspack is always backwards compatible, and provides runtime feature checking.

Whatever version of <mspack.h> you have, you can compile code that uses all fields of all structures it defines. To be sure your code doesn't crash or access OOB memory, you should check the appropriate mspack_version() before relying on using something that only exists in V2, V3 of a specific API. A version number in a header can't give you this assurance, because header files can't control what version of library is linked or what .DLL is put in the same directory as the executable.

The only reason I can think of for including the packaging version number in libmspack is to provide it to end users as assurance, e.g. sysadmins wanting to know their libmspack-consuming-program isn't using a vulnerable version of libmspack, so they could run command --help and hope to see command v1.0, using libmspack v0.9.1

Even then, I don't think it should be in a header, because it could bear false witness. It would have to be a function in the public API, to be assured it came from the library/DLL whose code will really run.

Are there other uses of version numbers in library headers that I'm not considering?

* @param int An MSPACK_ERR code.
* @return a constant string with an appropriate error message.
*/
static inline const char *mspack_error_msg(int error) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I do think this looks very convenient, there are a few problems with it, which is why I haven't added something like this in the past:

Firstly, libmspack should be able to compile without any standard C library functions when MSPACK_NO_DEFAULT_SYSTEM is defined. I understand that static inline means that technically this code is not in the library, but I would prefer to see no <stdio.h> in mspack.h at all.

Secondly, I am secretly trying to avoid any user-readable text available from the library API or header, because if it exists, it should really be internationalisable / localisable, therefore that would mean the library and/or header need to support every environment's localisation processes. I'd like to keep localisation exclusively to clients, so that e.g. clients depend on libmspack and gettext, not that libmspack depends on gettext, and definitely not that you need gettext installed in order to #include <mspack.h>

Thirdly, it reduces the quality of cabextract. cabextract.c intentionally uses its own mspack_system implementation, and knows it uses C stdio functions, so its error_msg() function is not the same as error.h, it intentionally uses errno and strerror() to print a more useful error message that includes the reason why there was a file opening/read/write error.

Copy link
Owner

@kyz kyz Apr 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewing the code in light of these comments:

Firstly, libmspack should be able to compile without any standard C library functions when MSPACK_NO_DEFAULT_SYSTEM is defined. I understand that static inline means that technically this code is not in the library, but I would prefer to see no <stdio.h> in mspack.h at all.

... I've found that memcmp, memset and strlen calls have crept into libmspack, so I've introduced replacement functions in system.h and avoid including <string.h> at all, if MSPACK_NO_DEFAULT_SYSTEM is set, so libmspack still meets its std-C-library independence goals.

However, this makes it hard to include system.h with code outside libmspack when MSPACK_NO_DEFAULT_SYSTEM is set, so I moved the convenient macros that outside code was using system.h for into a new file, macros.h.

On reflection, I'd probably be happy with the error text function in mspack.h, if it simply returned "unknown error" rather than return a snprintf() buffer, and thus mspack.h did not need stdio.h

Secondly, I am secretly trying to avoid any user-readable text available from the library API or header, because if it exists, it should really be internationalisable / localisable, therefore that would mean the library and/or header need to support every environment's localisation processes. I'd like to keep localisation exclusively to clients, so that e.g. clients depend on libmspack and gettext, not that libmspack depends on gettext, and definitely not that you need gettext installed in order to #include <mspack.h>

While I am trying to avoid it, it's clearly not working because there are several english-only warning messages emitted by libmspack. So this is still an issue to consider, but this new macro doesn't introduce a new problem.

Thirdly, it reduces the quality of cabextract. cabextract.c intentionally uses its own mspack_system implementation, and knows it uses C stdio functions, so its error_msg() function is not the same as error.h, it intentionally uses errno and strerror() to print a more useful error message that includes the reason why there was a file opening/read/write error.

cabextract.c should still revert to its original implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roger I'd be happy to revert cabextract to it's original error message implementation. I'll do this as I rebase the PR and fix merge conflicts.

Regarding mspack/macros.h, I see that it's been used in one of the example programs. Would you mind me removing the #include from cabrip.c and copypasting the code to define LD? This way, the example program only depends on header(s) provided by the mspack library build target.

In the "modern" CMake method, a build is a collection of build targets to include objects, static or shared libraries, and executables. Each may have "private", "public", and "interface" sources.

  • Private sources are required to build the given target.
  • Public sources are required to build the given target and are provided as a part of the API for downstream targets.
  • Interface sources aren't used in the current target and provide an API for downstream targets (particularly useful when providing a header for pre-built sources, like a proprietary library.

I've set up the examples as executable targets that link with the mspack shared library target. As the mspack shared library doesn't include mspack/macros.h in it's public sources, the example programs don't have access to it, just like a real user application. We could definitely cheat with the target_include_directories(), but it seems an appropriate test of the library to build the example executables the same way your library users would.

As a side note, this also affects the tests targets, but for unit tests I think it makes sense to "cheat" and give unit tests access to library internals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue of only using exported headers is also the reason providing MSPACK_ERROR() and mspack_error_msg() in mspack.h made sense to me. It makes these available for use in your examples.

The alternatives are to:
(A) duplicate error.h in the examples directory.
(B) merely print error codes instead of error messages in the examples.

Which would you prefer? I lean towards duplicating error.h so users are guided towards the approach cabextract uses. Since it's late and I want to push up updates before the work week, I'll make that change. Let me know if you'd prefer something different.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would rather have error.h in the examples directory.

But in general, what I'd like to have with macros.h doesn't fit cleanly with CMake's way of thinking; a header file that is used both internally by a project, and internally by its examples, but simply because it helps define the coding conventions of that project, it's not part of that project's external interface, nor is it recommended for other project users.

Perhaps instead of copy-paste, would creating a symlink work? This is how cabextract consumes its internal copy of libmspack, and it also uses files "private" to libmspack in cabinfo, simply because it does not make sense (to me) to define the structure offsets of cabinet files more than once, even if this breaks the theoretical encapsulation.

@kyz
Copy link
Owner

kyz commented Apr 14, 2020

libmspack has two modes of operation:

1. system.h defines LARGEFILE_SUPPORT **because** the underlying environment has 64-bit off_t and 64-bit file I/O (fseek/fseeko and ftell/ftello support 64-bit offsets). libmspack uses 64-bit offsets everywhere.

2. system.h does not define LARGEFILE_SUPPORT. It's assumed that only 32-bit file IO is supported. There is no expectation of any 64-bit datatypes. Explicit tests are added to catch any 32-bit or 64-bit values over 2GB, because in POSIX with non-64-bit file IO these offsets would not be supported.

So, to cover this better, I've removed the various largefile define checks that just guess if off_t is 64-bit or not, and now rely only on SIZEOF_OFF_T, so this should be simpler. If SIZEOF_OFF_T < 8, libmspack will assume off_t is at least 32-bit, and will apply overflow error checking if cab search offsets or CHM offsets go beyond a 2GB barrier. Build systems should apply any flags needed to enable 64-bit off_t (and fseeko/ftello) if they want 64-bit I/O support.

@micahsnyder
Copy link
Contributor Author

Sorry about the delay on this! I never saw a notification from github, and thought you hadn't looked at it yet. I'll try to review your comments asap and update accordingly.

@micahsnyder
Copy link
Contributor Author

After rebasing to include your changes and fix up a the mspack error message stuff, and add additional improvements to the autogen.sh script, I have found that 2/3 tests pass now. The chmd test still fails, because off_t is 32bits on Windows. Output below:

test 1
    Start 1: cabd_test

1: Test command: C:\Users\micah\workspace\libmspack\libmspack\build\test\Debug\cabd_test.exe
1: C:/Users/micah/workspace/libmspack/libmspack/test/test_files/cabd/bad_nofiles.cab: no files in cabinet.
1: ERROR; file " ¶hello.jl♠�♣" cannot be extracted, cabinet set is incomplete
1: ERROR; file "�" cannot be extracted, cabinet set is incomplete
1: ERROR; file "���" cannot be extracted, cabinet set is incomplete
1: ERROR; file "2" cannot be extracted, cabinet set is incomplete
1: ALL 409 TESTS PASSED.
1/3 Test #1: cabd_test ........................   Passed    0.04 sec
test 2
    Start 2: chmd_test

2: Test command: C:\Users\micah\workspace\libmspack\libmspack\build\test\Debug\chmd_test.exe
2: C:/Users/micah/workspace/libmspack/libmspack/test/test_files/chmd/cve-2015-4468-namelen-bounds.chm: library not compiled to support large files.
2: chmd_search_test_01:79 FAILED chm1 = chmd->open(chmd, files[i])
2/3 Test #2: chmd_test ........................***Failed    0.03 sec
test 3
    Start 3: kwajd_test

3: Test command: C:\Users\micah\workspace\libmspack\libmspack\build\test\Debug\kwajd_test.exe
3: Test timeout computed to be: 10000000
3: ALL 102 TESTS PASSED.
3/3 Test #3: kwajd_test .......................   Passed    0.02 sec


Total Test time (real) =   0.11 sec

The following tests FAILED:
          2 - chmd_test (Failed)
Errors while running CTest

@kyz
Copy link
Owner

kyz commented Jun 2, 2020

After rebasing to include your changes and fix up a the mspack error message stuff, and add additional improvements to the autogen.sh script, I have found that 2/3 tests pass now. The chmd test still fails, because off_t is 32bits on Windows.

You should want 64-bit I/O on Windows, given it is capable of it. config.h needs to have something like this on Windows / Visual C:

#define off_t __int64
#define fseeko _fseeki64
#define ftello _ftelli64
#define SIZEOF_OFF_T 8
#define HAVE_FSEEKO
#define HAVE_FTELLO

However, this is just papering over a bug in the test suite which is visible when any system, not just Windows, runs with 32-bit I/O.

2: C:/Users/micah/workspace/libmspack/libmspack/test/test_files/chmd/cve-2015-4468-namelen-bounds.chm: library not compiled to support large files.
2: chmd_search_test_01:79 FAILED chm1 = chmd->open(chmd, files[i])

The purpose of chmd_search_test_01() is to check that searching various corrupt files does not crash libmspack. It also asserts they can be opened, i.e. they're not so corrupt they're rejected early.

cve-2015-4468-namelen-bounds.chm and cve-2015-4469-namelen-bounds.chm are rejected by the extra checks in 32-bit I/O mode because they have any offsets over 2GB:

  • cve-2015-4468-namelen-bounds.chm HS0 file length is 0xFFFFFFE62344
  • cve-2015-4469-namelen-bounds.chmoffset to CS0 is 0x7000010CC
  • cve-2015-4469-namelen-bounds.chm HS0 file length is 0xDEFF00020000

I've changed these offsets in the test files so both 32-bit I/O mode and 64-bit I/O mode get to the same code path and test the files properly. See commit 7e63519

@micahsnyder
Copy link
Contributor Author

I've changed these offsets in the test files so both 32-bit I/O mode and 64-bit I/O mode get to the same code path and test the files properly. See commit 7e63519

I rebased the PR and fixed a minor issue w/ defining aliases for the static and shared libraries while I was at it. I'm now seeing a test passes on Windows!

image

I just tried playing around with your suggestion to #define off_t to __int64 and to override fseeko & ftello with _fseeki64 and _ftelli64. Unfortunately that results in many errors like this:

  c:\program files (x86)\windows kits\10\include\10.0.18362.0\ucrt\sys\types.h(42): error C2628: '_off_t' followed by '
__int64' is illegal (did you forget a ';'?) [C:\Users\micasnyd\workspace\libmspack\libmspack\build\mspack\mspack_ObjLib
.vcxproj]

I'm not sure what to do about that. Any ideas?

Anyhow, With libmspack building and passing all unit tests (at least with 32bit I/O), I reckon it's time for me to implement the CMake tooling for cabextract. I'll hop on it.

Best,
Micah

@micahsnyder
Copy link
Contributor Author

micahsnyder commented Jun 8, 2020

Okay! I have cabextract building on mac & Windows. The tests will run on posix systems, but not on Windows because they're all shell scripts.

That said, I haven't really tested out if cabextract.exe works properly on Windows. I did have to use a Win32 API since Windows doesn't have fnmatch.h. It does built if you link it with a static mspack library (mspack_static.lib) built with the libmspack CMake tooling first. Instructions added to the cabextract README. It won't build with the "bundled" mspack because the soft links don't work on Windows.

The tests all pass except the encoding test. I haven't spent the time to figure out why it fails:


 1/10 Test  #1: bugs .............................   Passed    0.04 sec
test 2
      Start  2: case-ascii

2: Test command: /bin/bash "/Users/micasnyd/workspace/libmspack-micah/cabextract/test/case-ascii.test"
2: Test timeout computed to be: 10000000
 2/10 Test  #2: case-ascii .......................   Passed    0.09 sec
test 3
      Start  3: dirwalk-vulns

3: Test command: /bin/bash "/Users/micasnyd/workspace/libmspack-micah/cabextract/test/dirwalk-vulns.test"
3: Test timeout computed to be: 10000000
 3/10 Test  #3: dirwalk-vulns ....................   Passed    0.06 sec
test 4
      Start  4: encoding

4: Test command: /bin/bash "/Users/micasnyd/workspace/libmspack-micah/cabextract/test/encoding.test"
4: Test timeout computed to be: 10000000
 4/10 Test  #4: encoding .........................***Failed    0.03 sec
test 5
      Start  5: large-files

5: Test command: /bin/bash "/Users/micasnyd/workspace/libmspack-micah/cabextract/test/large-files.test"
5: Test timeout computed to be: 10000000

 5/10 Test  #5: large-files ......................   Passed   27.21 sec
test 6
      Start  6: mixed

6: Test command: /bin/bash "/Users/micasnyd/workspace/libmspack-micah/cabextract/test/mixed.test"
6: Test timeout computed to be: 10000000
 6/10 Test  #6: mixed ............................   Passed    0.07 sec
test 7
      Start  7: search

7: Test command: /bin/bash "/Users/micasnyd/workspace/libmspack-micah/cabextract/test/search.test"
7: Test timeout computed to be: 10000000
 7/10 Test  #7: search ...........................   Passed    0.04 sec
test 8
      Start  8: simple

8: Test command: /bin/bash "/Users/micasnyd/workspace/libmspack-micah/cabextract/test/simple.test"
8: Test timeout computed to be: 10000000
 8/10 Test  #8: simple ...........................   Passed    0.05 sec
test 9
      Start  9: split

9: Test command: /bin/bash "/Users/micasnyd/workspace/libmspack-micah/cabextract/test/split.test"
9: Test timeout computed to be: 10000000
 9/10 Test  #9: split ............................   Passed    0.11 sec
test 10
      Start 10: utf8-stresstest

10: Test command: /bin/bash "/Users/micasnyd/workspace/libmspack-micah/cabextract/test/utf8-stresstest.test"
10: Test timeout computed to be: 10000000
10/10 Test #10: utf8-stresstest ..................   Passed    0.04 sec

90% tests passed, 1 tests failed out of 10

Total Test time (real) =  27.77 sec

The following tests FAILED:
	  4 - encoding (Failed)
Errors while running CTest

Please try it out and let me know what you think.

@micahsnyder
Copy link
Contributor Author

micahsnyder commented Feb 1, 2021

@kyz Have you had a chance to give this a try?

I've been meaning to figure out the reason for the encoding test failure. Will try to figure that out soon.

Adds CMake tooling for libmspack to the project.
This attempts to replicate every feature available in the autotools
tooling and includes support for building on Windows.

Adds a mspack-version.h header file, generated from mspack-version.h.in.
mspack-version.h is to be installed alongside mspack.h so that application
developers can use the version numbers and strings in the event that the API
is modified and the application must maintain backwards compatibility.

Adds error handling to autogen.sh.

Reformats libmspack's README as Markdown.

Adds mspack_error_msg() and MSPACK_ERROR() macro to mspack.h API.
The new API is a cross between the cab_error() function previously found
in cabextract and error_msg() found in test/error.h.  This removes the
need the examples to depend on test/error.h, a header not exported by
any build target, and removes the code duplication with cabextract.

Fixes __func__ macro for Windows.
On Windows, the tests should write to "NUL" instead of "/dev/null".

Resolved an issue where a large unsigned int cast to an off_t resulted
in a negative value, because off_t is 32-bits instead of 64.
Adds CMake tooling for cabextract to the project.

Converts cabextract README to Markdown (.md) so it will look pretty on
Github, and adds build instructions to cabextract README.md.
The current implementations check for iconv.h and set HAVE_ICONV_H but
don't set HAVE_ICONV, which is the macro we actually depend on.
In the autotools build system, HAVE_ICONV is set by the `AM_ICONV`
macro, if you have gettext-devel installed (note, this will cause an
autoreconf warning if you don't have gettext-devel installed).

Anyways... to fix this in CMake, I've added FindICONV.cmake module which
is a slightly modified variant of the official CMake iconv detection
module. This one has some additional features including setting the
ICONV_CONST macro as needed.

This commit fixes the cabextract "encoding" test, which was the final
outstanding issue with CMake support, excluding the fact that there
are no cabextract feature tests for Windows. I think this is acceptable
for now, since it's not like there were any before switching to CMake.
@micahsnyder micahsnyder force-pushed the cmake-tooling branch 3 times, most recently from 37a5a38 to e4bb594 Compare May 2, 2021 02:26
Add CMake build for cabextract and libmspack on GitHub Actions.
Will test on Ubuntu, macOS, Windows.

Added build badges to the readme for each project.
@micahsnyder
Copy link
Contributor Author

@kyz Ok I fixed the encoding test. It was an iconv detection issue. Speaking of which, I noticed that iconv detection will fail for autotools if gettext-devel isn't installed, but I didn't know if I should try to fix it (or how).

I also rebased this with master and since then the Windows tests for libmspack are passing. 🎉

Anyways, I also just added github actions build tests for Mac, Linux, and Windows for cabextract and libmspack. For them to run on this PR, I think you'll have to approve them. I tested them on my fork and they seem to run nicely: https://github.com/micahsnyder/libmspack/actions/workflows/cmake-cabextract.yml

I was also able to use vcpkg to bring in libiconv support with cabextract on Windows! You can see libiconv detection working on Windows here: https://github.com/micahsnyder/libmspack/runs/2484761769?check_suite_focus=true

Anyways, I think this work is basically done? Personally I'd advocate for removing the autotools build system at this point. If you like that, I can help.

Please give this a try and let me know what you think.

@wdlkmpx
Copy link

wdlkmpx commented May 22, 2021

I strongly advise against removing the autotools stuff. I had to create or restore autotools projects to be able to properly cross compile a few apps.

With autotools I can easily cross compile cabextract for 20 linux processor architectures and win32/win64, it needs a couple of tweaks for libiconv and only include the local getopt when the system getopt is missing, for better compatibility with MinGW.

Setting CMake mininum version to 3.20 and remove all the extra cmake files is a good idea.

CMake for modern systems, autotools for older systems and for cross compiling like there's no tomorrow.

@micahsnyder
Copy link
Contributor Author

Cross-compiling should be possible with CMake as well, though I don't have any experience with it: https://cmake.org/cmake/help/latest/manual/cmake-toolchains.7.html

Setting CMake mininum version to 3.20 and remove all the extra cmake files is a good idea.

Can you elaborate?

@wdlkmpx
Copy link

wdlkmpx commented May 22, 2021

I think it got easier in recent versions, but for many years only the old autotools could handle cross-compiling properly. My scripts use autotools, this is crucial for mingw at least, Windows binaries magically appear.

And autotools works with barebones toolchains, with basically gcc and nothing else, simple C apps like this one compile in a lot of architectures without issues. If anybody is interested I can provide static binaries for releases.

I mean all those cmake modules or something that got added to cmake in recent years, I see a lot of them in this PR. But it's just an idea.

Anyway if autotools gets removed I'll restore it for my cross-builds. CMake is good for testing stuff in native installations in Travis CI and the like.

Anyway I think this PR is good and it was a lot of work, I hope it gets merged.

@wdlkmpx
Copy link

wdlkmpx commented May 23, 2021

Porting autotools configs to cmake is a bit problematic, it requires knowledge of both build systems.

Testing unshield in several OS's until Travis CI cancelled my ability to trigger builds, I have some ideas. A better md5.c that is endian-independent, a header-only getopt.h and fnmatch.h only used if system headers are not found, but this should be merged first.

autotools: something like this, configure.ac should check for getopt.h

cabextract.c should have something like this:

#ifdef HAVE_GETOPT_H
# include <getopt.h>
#else
# include "../compat/getopt.h"
#endif

MSVC and ancient Solaris versions lack getopt.h, only MSVC is worth considering.

Now with CMake:
CMakeLists.txt: check_include_files(getopt.h HAVE_GETOPT_H)
cmake.config.in: #cmakedefine HAVE_GETOPT_H @HAVE_GETOPT_H@

and so on

@wdlkmpx
Copy link

wdlkmpx commented May 23, 2021

But here's the thing, one may have many ideas and the ability to perform big changes, but that's not enough for stuff to be merged. Once the momentum is lost, everything is back to normal and the code lays dormant. I'm seeing the same history in basically all repos.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants