Fix segmentation faults and items with ID 0 #686
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Segmentation faults
Segmentation faults were observed within Mac OS X and in a development environment where the CLI was compiled by developers and not distributed via package managers such as Homebrew.
Upon inspecting the CMakeLists configuration and some tests with the compilation process, we identified that when compiling under Mac OS X, the linker was linking the CLI against the system libraries for OpenSSL and CURL.
This is not what we want at all, as the system libraries for OpenSSL and CURL are part of the Mac OS X ecosystem, and they are heavily dependent upon Mac OS X. Rather, what we want is to make sure the final CLI executable is linking against the "normal" OpenSSL and CURL versions. To achieve this, we changed the CMakeLists configuration so that it detects if the compilation process is running under Mac OS X, if Homebrew is installed, and if OpenSSL/CURL are installed via Homebrew. In this case, we will make sure that the final CLI executable is linking against the "normal" OpenSSL and CURL versions by specifying in pkg-config the Homebrew paths to search for the dependencies.
Items with ID 0
The issue of items with ID 0 was reproduced on Mac OS X but not on Cygwin or Linux. On Mac OS X, the ID 0 was causing unexpected errors that would lead to the CLI terminating abruptly, therefore, we used the Console Guide application to obtain the core dump and investigate it. The following is the core dump of the CLI tool when the ID 0 occurred:
Looking at the core dump and stack trace information we can derive that the issue happens on a child process of the CLI tool and occurs within a call to a specific Mac OS X function in CURL, which is the curl_macos_init function. The code of that function can be seen in GitHub.
Within the codebase we can see that CLI uses a multi-process approach to carry out its functionalities, specifically, the CLI uses its main process to handle the command line arguments and call the necessary set of functions that will process the command line arguments, but the CLI also spins up a child process that handles the actual HTTP/HTTPs communication to create new items.
The relevant point of the aforementioned approach is that after the CLI spins up a new child process, it proceeds to call the global cleanup/initialization functions of CURL within the scope of the child process, and this is where the issue occurs. When the child process tries to call the global cleanup/initialization functions of CURL, those functions will access CoreFoundation libraries of the Mac OS X and the child process terminates unexpectedly and the new item never gets sent to the backend, therefore, it never gets an ID, hence the item having an ID 0. Relevant code snippet:
The global cleanup/initialization CURL functions are already called at the start of the CLI execution by the main process, and according to CURL documentation:
”This function must be called at least once within a program (a program is all the code that shares a memory space) before the program calls any other function in libcurl. The environment it sets up is constant for the life of the program and is the same for every program, so multiple calls have the same effect as one call.”
Parent and the child processes have their own memory space, but they initially share the same memory contents, which means that the child process inherits the global CURL state so it should not be necessary for the child process to also perform a global cleanup/initialization. There are some articles that also share this view and that advise against calling such functions on child processes:
curl_global_init is to be called once. Remember that both fork creates a copy of the process at the time it was called, so it will replicate curl's state as well. Thus, there will be no need to call curl_global_init again in the child if the initialization was already done prior to the fork.
lib/hostip.c: SCDynamicStoreCopyProxies is not safe to run in a fork · Issue #11252 · curl/curl
curl_global_init() is, unfortunately, not thread safe, so you must ensure that you only do it once and never simultaneously with another call. It initializes global state so you should only call it once, and once your program is completely done using libcurl you can call curl_global_cleanup() to free and clean up the associated global resources the init call allocated.
Better solution than hacking around the issue with ENV flags, is to call Ethon::Curl.init before the application forks (such as in a Rails initializer), since as of Curl v8.2.0, macOS specific calls were moved into the global init.
Based on the aforementioned information, we decided to remove the global cleanup/initialization function calls from the child process and only let the parent process call such functions, which caused the issue with items having ID 0 to stop.
One alternative implementation would be to only let the child process perform the global cleanup/initialization function calls, but this would lead us into a state where if we need to introduce other child process with CURL access, they would also need to call the global cleanup/initialization function calls. Furthermore, its safer to have the global cleanup/initialization within the parent process than in child processes, because executing such actions in child processes means that we are now on multi-process territory, which is inherently less safe.