ARROW-17635: [Python][CI] Sync conda recipe with the arrow-cpp feedstock #14102
Conversation
@github-actions crossbow submit -g conda |
Hmm, you might want to rebase and fix conflicts now that the C++17 PR was merged :-) |
I think the config on our |
Force-pushed from 4d94128 to fe737d3
That was very much correct, thanks! :)
Yes, of course. This comes more or less 1:1 from the so-called feedstock on the conda-forge side (though in this case it's still the state of a PR), where the recipe is maintained and where all these … come from. Conda-forge is constantly updating builds against new versions (e.g. abseil/grpc/libprotobuf), and once arrow has been successfully built against those packages, the builds are published (by the CI on the feedstock) and the recipe is updated. In this case, it's been a while since the last update, so our infrastructure has changed a bit. Also, regarding the relatively scary-looking changes in …: I synced the recipe separately from the ci-files, taking care to undo some things that are intentionally different in arrow-CI:
diff --git a/dev/tasks/conda-recipes/arrow-cpp/bld-pyarrow.bat b/dev/tasks/conda-recipes/arrow-cpp/bld-pyarrow.bat
index e1073b563..bd20c79ef 100644
--- a/dev/tasks/conda-recipes/arrow-cpp/bld-pyarrow.bat
+++ b/dev/tasks/conda-recipes/arrow-cpp/bld-pyarrow.bat
@@ -1,16 +1,18 @@
+@echo on
pushd "%SRC_DIR%"\python
@rem the symlinks for cmake modules don't work here
-del cmake_modules\BuildUtils.cmake
-del cmake_modules\SetupCxxFlags.cmake
-del cmake_modules\CompilerInfo.cmake
-del cmake_modules\FindNumPy.cmake
-del cmake_modules\FindPythonLibsNew.cmake
-copy /Y "%SRC_DIR%\cpp\cmake_modules\BuildUtils.cmake" cmake_modules\
-copy /Y "%SRC_DIR%\cpp\cmake_modules\SetupCxxFlags.cmake" cmake_modules\
-copy /Y "%SRC_DIR%\cpp\cmake_modules\CompilerInfo.cmake" cmake_modules\
-copy /Y "%SRC_DIR%\cpp\cmake_modules\FindNumPy.cmake" cmake_modules\
-copy /Y "%SRC_DIR%\cpp\cmake_modules\FindPythonLibsNew.cmake" cmake_modules\
+@rem NOTE: In contrast to conda-forge, they work here as we clone from git.
+@rem del cmake_modules\BuildUtils.cmake
+@rem del cmake_modules\SetupCxxFlags.cmake
+@rem del cmake_modules\CompilerInfo.cmake
+@rem del cmake_modules\FindNumPy.cmake
+@rem del cmake_modules\FindPythonLibsNew.cmake
+@rem copy /Y "%SRC_DIR%\cpp\cmake_modules\BuildUtils.cmake" cmake_modules\
+@rem copy /Y "%SRC_DIR%\cpp\cmake_modules\SetupCxxFlags.cmake" cmake_modules\
+@rem copy /Y "%SRC_DIR%\cpp\cmake_modules\CompilerInfo.cmake" cmake_modules\
+@rem copy /Y "%SRC_DIR%\cpp\cmake_modules\FindNumPy.cmake" cmake_modules\
+@rem copy /Y "%SRC_DIR%\cpp\cmake_modules\FindPythonLibsNew.cmake" cmake_modules\
SET ARROW_HOME=%LIBRARY_PREFIX%
SET SETUPTOOLS_SCM_PRETEND_VERSION=%PKG_VERSION%
diff --git a/dev/tasks/conda-recipes/arrow-cpp/meta.yaml b/dev/tasks/conda-recipes/arrow-cpp/meta.yaml
index 107b8433a..fd0625296 100644
--- a/dev/tasks/conda-recipes/arrow-cpp/meta.yaml
+++ b/dev/tasks/conda-recipes/arrow-cpp/meta.yaml
@@ -1,6 +1,7 @@
-{% set version = "9.0.0" %}
+# NOTE: In constrast to the conda-forge recipe, ARROW_VERSION is a templated variable here.
+{% set version = ARROW_VERSION %}
{% set cuda_enabled = cuda_compiler_version != "None" %}
-{% set build_ext_version = "3.0.0" %}
+{% set build_ext_version = ARROW_VERSION %}
{% set build_ext = "cuda" if cuda_enabled else "cpu" %}
{% set proc_build_number = "0" %}
{% set llvm_version = "14" %}
@@ -10,15 +11,10 @@ package:
version: {{ version }}
source:
- url: https://dist.apache.org/repos/dist/release/arrow/arrow-{{ version }}/apache-arrow-{{ version }}.tar.gz
- sha256: a9a033f0a3490289998f458680d19579cf07911717ba65afde6cb80070f7a9b5
- patches:
- # backport of the following commit for compatibility with VS2019
- # https://github.com/apache/arrow/commit/897c186f475f3dd82c1ab47e5cfb87cb0fed8440
- - patches/0001-ARROW-17433-CI-C-Use-Visual-Studio-2019-on-AppVeyor-.patch
+ path: ../../../../
build:
- number: 6
+ number: 0
# for cuda support, building with one version is enough to be compatible with
# all later versions, since arrow is only using libcuda, and not libcudart.
skip: true # [(win or linux64) and cuda_compiler_version not in ("None", "10.2")]
@@ -105,9 +101,7 @@ outputs:
- re2
- snappy
- thrift-cpp
- # https://github.com/apache/arrow/blob/apache-arrow-9.0.0/cpp/cmake_modules/ThirdpartyToolchain.cmake#L2245
- # currently pins to exact xsimd version
- - xsimd 8.1.0
+ - xsimd
- zlib
- zstd
run:

After fixing the paths to the CI-support files in … Hope that makes things clearer. 🙃 Though, to be frank, I really only know the conda-forge side of this, so for arrow-specific things, we should definitely ask @xhochy to double-check the work here.
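For context, a minimal sketch of how such a sync can be done by hand; the feedstock clone location and the plain copy step are illustrative, not part of this PR:

# Clone the conda-forge feedstock next to the arrow checkout (location is arbitrary).
git clone https://github.com/conda-forge/arrow-cpp-feedstock ../arrow-cpp-feedstock
# git diff --no-index compares two unrelated directories, so it can show what differs
# between the feedstock recipe and the in-repo copy without any shared git history.
git diff --no-index ../arrow-cpp-feedstock/recipe dev/tasks/conda-recipes/arrow-cpp
# Copy the feedstock recipe over, then re-apply the intentional arrow-CI divergences
# (templated ARROW_VERSION, local source path, build number 0, and the commented-out
# del/copy lines in bld-pyarrow.bat).
cp -r ../arrow-cpp-feedstock/recipe/. dev/tasks/conda-recipes/arrow-cpp/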
@github-actions crossbow submit -g conda |
Thanks for the detailed response!
This |
It's possible to run this locally in principle, but it would be incomplete, because … Another option might be to use the conda-forge feedstock as a submodule (in a fork controlled by arrow), where it would be possible to:
This would have slightly higher impedance for development work that needs to touch the conda recipe (i.e. part of the work would be a commit to that forked feedstock and updating the PR to point to that commit), but with the benefit that we could sync the changes that accumulated throughout the development of a new arrow version faithfully to the feedstock, and easily sync back infra updates from conda-forge the other way. |
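For reference, a rough sketch of what "running this locally" could look like; the variant config path and file name under a .ci_support/ directory are an assumption here and differ per platform:

# Build the conda packages from the in-repo recipe with conda-build, using a
# conda-forge-style variant config file to pin compilers and dependency versions.
conda install -y conda-build
conda build dev/tasks/conda-recipes/arrow-cpp \
  -m dev/tasks/conda-recipes/.ci_support/linux_64_cuda_compiler_versionNone.yaml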
It seems
Interestingly, this only fails during the |
@h-vetinari Can we determine what LLVM is used for during the PyArrow build? |
Not sure I understand your question, but there's no LLVM but the one we specify to be there. ;-) The recipe as posted in this PR works for 9.0 (otherwise the feedstock would currently be broken), where we currently use LLVM 14 (though note that the llvm artefacts are a bit split up in conda-forge, i.e. we distinguish between llvm the library, clang, libcxx, openmp, etc.). I don't know what …
Sorry. I guess my question was: what does PyArrow try to do with LLVM? |
That's a good question. I'd have to go hunting in what gets triggered by Perhaps @kou (as the author of the changes in #13892) knows. |
We need LLVM headers to use Gandiva. (Some public Gandiva headers use LLVM headers.) |
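A quick, illustrative check of that claim in an environment where the Gandiva headers are installed (the include prefix is just whatever the environment uses):

# List the installed public Gandiva headers that directly include LLVM headers.
grep -rl '#include <llvm' "$CONDA_PREFIX/include/gandiva/"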
Yes, and LLVM + its headers are during build time from the standard location in the |
Ouch. That's rather bad. I hadn't noticed that. |
Not sure it's bad (certainly not a problem from the POV of conda-forge), it's been running for a long time like that. What changed recently is how those headers are detected. |
Yeah, that was more of a general statement, not specifically about conda-forge. |
Could you share the URL for the error?
Let's discuss this on |
It's in the crossbow runs from this comment. The updates in this PR correspond to the state of the feedstock, with (essentially) the only change being that here the recipe builds from the master branch rather than from a tagged version, i.e. after importing the …, the diff is:
diff --git a/dev/tasks/conda-recipes/arrow-cpp/meta.yaml b/dev/tasks/conda-recipes/arrow-cpp/meta.yaml
index 107b8433a..fd0625296 100644
--- a/dev/tasks/conda-recipes/arrow-cpp/meta.yaml
+++ b/dev/tasks/conda-recipes/arrow-cpp/meta.yaml
@@ -1,6 +1,7 @@
-{% set version = "9.0.0" %}
+# NOTE: In constrast to the conda-forge recipe, ARROW_VERSION is a templated variable here.
+{% set version = ARROW_VERSION %}
{% set cuda_enabled = cuda_compiler_version != "None" %}
-{% set build_ext_version = "3.0.0" %}
+{% set build_ext_version = ARROW_VERSION %}
{% set build_ext = "cuda" if cuda_enabled else "cpu" %}
{% set proc_build_number = "0" %}
{% set llvm_version = "14" %}

(which restored the previous state)
Thanks. It seems that LLVM related packages are removed from pyarrow's host requirements: https://github.com/apache/arrow/pull/14102/files#diff-e1dc9dc44f82817c013424829588fc3fe48e63c78b42ce47e012016a243de09bL187-L189 Could you restore them? |
Yes! I did add the llvmdev headers & tools now, though I'm not sure about clangdev. Do you need |
@github-actions crossbow submit -g conda |
I think that we don't need |
LLVM headers needed by gandiva; clang needed because FindLLVMAlt.cmake depends on it to find LLVM correctly. Co-Authored-By: Sutou Kouhei <kou@clear-code.com>
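As an illustrative sanity check of that commit (nothing here is part of the recipe itself), one can verify that the build environment now actually provides LLVM and clang:

# llvm-config ships with llvmdev, clang with the clang packages; per the commit above,
# FindLLVMAlt.cmake relies on clang being present to locate LLVM correctly.
llvm-config --version
clang --version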
@github-actions crossbow submit conda-win-vs2019-py37-r41 |
🤷 I am not sure; I don't really know if this is important for the user base. We could remove them for now if the issue is out of scope to fix here.
Ok, let's disable PPC for now. The goal should be to improve the state of conda recipes, not necessarily get all of them working. |
Revision: 57e3dc9 Submitted crossbow builds: ursacomputing/crossbow @ actions-ff8a79230b
Just to note that in conda-forge/arrow-cpp-feedstock#866, the PPC builds aren't failing anymore. Though there's another topic where I'd be happy for input: now that arrow-cpp does not depend on python anymore, conda smithy by default folds all the pyarrow outputs into the same job. This is great for all arches except those where we are dead slow due to emulation. To straddle that divide there are 4 options; feedback would be most welcome. :) Spoiler: option 4 would be ideal but is unrealistic; option 3 is second-best and should be compatible with the recipe sync here. PS. I'd deal with this separately from this PR though.
@github-actions crossbow submit conda-win-vs2019-py37-r41 |
Revision: 57e3dc9 Submitted crossbow builds: ursacomputing/crossbow @ actions-2b2b87a9df
@h-vetinari @pitrou With Windows fixed, is there anything more to do here? Should we rerun the full conda crossbow suite? |
Whatever you prefer :) Based on testing in conda-forge/arrow-cpp-feedstock#866, PPC should be fine now. Of course it doesn't hurt to test, but in any case, we'll have to sync the recipe again once conda-forge starts leveraging the fact that arrow-cpp does not depend on python anymore.
@github-actions crossbow submit conda* |
Revision: 57e3dc9 Submitted crossbow builds: ursacomputing/crossbow @ actions-4a206b32ca |
I think the |
Thanks @raulcd! This is looking much better now. :)
+1
BTW, how can we maintain dev/tasks/conda-recipes/? Can we discuss this on dev@?
https://lists.apache.org/thread/5pfzs9q5l55x7n9vdpclyr7qvbxtq5h3
It seems that there is no objection. |
Benchmark runs are scheduled for baseline = 9707630 and contender = 9d060aa. 9d060aa is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
@h-vetinari thanks a lot for the effort here! |
…ock (#14102) Corresponds to status of feedstock as of conda-forge/arrow-cpp-feedstock#848, minus obvious & intentional divergences in the setup here (with the exception of unpinning xsimd, which was [pinned](https://github.com/apache/arrow/blob/apache-arrow-9.0.0/cpp/cmake_modules/ThirdpartyToolchain.cmake#L2245) as of 9.0.0, but isn't anymore). Lead-authored-by: H. Vetinari <h.vetinari@gmx.com> Co-authored-by: Sutou Kouhei <kou@clear-code.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Syncing after conda-forge/arrow-cpp-feedstock#875, which does quite a lot of things, see this [summary](conda-forge/arrow-cpp-feedstock#875 (review)). I'm not keeping the commit history here, but it might be instructive to check the commits there to see why certain changes came about. It also fixes the CI that was broken by a3ef64b (undoing the changes of #14102 in `tasks.yml`). Finally, it adapts to conda making a long-planned [switch](conda-forge/conda-forge.github.io#1586) w.r.t. the format / extension of the artefacts it produces. I'm very likely going to need some help (or at least pointers) for the R-stuff. CC @ xhochy (for context, I never got a response to conda-forge/r-arrow-feedstock#55, but I'll open a PR to build against libarrow 10). Once this is done, I can open issues to tackle the tests that shouldn't be failing, as well as the segfaults on PPC and in conjunction with `sparse`. * Closes: #14828 Authored-by: H. Vetinari <h.vetinari@gmx.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>