Add tensorflow/swift-apis as a SwiftPM dependency. #260

Draft
wants to merge 1 commit into
base: main

dan-zheng
Collaborator

@dan-zheng dan-zheng commented Jan 8, 2021

Motivation

This enables building SwiftFusion using stock toolchains from swift.org/download.

swift build will clone and build tensorflow/swift-apis as a regular SwiftPM dependency. Eventually, we would like to stop releasing custom toolchains bundled with pre-installed tensorflow/swift-apis.

Build instructions

It is possible to build tensorflow/swift-apis and packages that depend on it, like SwiftFusion, using stock toolchains by installing pre-built X10 libraries (currently available only for macOS and Windows).

After installing (e.g. to $HOME/Library on macOS), build with SwiftPM via the following:

$ swift build -Xcc -I$HOME/Library/tensorflow-2.4.0/usr/include -Xlinker -L$HOME/Library/tensorflow-2.4.0/usr/lib -Xswiftc -DTENSORFLOW_USE_STANDARD_TOOLCHAIN
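Since the same three flag pairs recur in every swift build, swift run, and swift test invocation in this PR, a small wrapper can keep them in one place. This is only a sketch: the function name swift_x10 and the TF_PREFIX and SWIFT variables are hypothetical, and the default prefix assumes the X10 libraries were unpacked to $HOME/Library/tensorflow-2.4.0 as in the command above.

```shell
# Assumption: prebuilt X10 libraries were unpacked under this prefix.
TF_PREFIX="${TF_PREFIX:-$HOME/Library/tensorflow-2.4.0}"

# Hypothetical helper: run a SwiftPM subcommand with the flags from this PR
# appended. Set SWIFT=echo to print the arguments instead of running swift.
swift_x10() {
  "${SWIFT:-swift}" "$@" \
    -Xcc "-I${TF_PREFIX}/usr/include" \
    -Xlinker "-L${TF_PREFIX}/usr/lib" \
    -Xswiftc -DTENSORFLOW_USE_STANDARD_TOOLCHAIN
}
```

With this in place, swift_x10 build should be equivalent to the command above, and SWIFT=echo gives a dry run for checking the expanded paths.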

swift test is known not to work on macOS for tensorflow/swift-apis and its dependents due to SR-14008: Library not loaded: /usr/lib/swift/libswift_Differentiation.dylib.

Testing

Before merging, let's verify that swift build, swift run, and swift test work for swift.org/download toolchains across platforms, and update GitHub Actions CI so that it passes:

$ swift run Pose3SLAMG2O -Xcc -I$HOME/Library/tensorflow-2.4.0/usr/include -Xlinker -L$HOME/Library/tensorflow-2.4.0/usr/lib -Xswiftc -DTENSORFLOW_USE_STANDARD_TOOLCHAIN
...
Everything is already up-to-date
dyld: Library not loaded: /usr/lib/swift/libswift_Differentiation.dylib
  Referenced from: /Users/danielzheng/SwiftFusion/.build/x86_64-apple-macosx/debug/Pose3SLAMG2O
  Reason: image not found
[1]    79788 abort      swift run Pose3SLAMG2O -Xcc -I$HOME/Library/tensorflow-2.4.0/usr/include
$ swift test -Xcc -I$HOME/Library/tensorflow-2.4.0/usr/include -Xlinker -L$HOME/Library/tensorflow-2.4.0/usr/lib -Xswiftc -DTENSORFLOW_USE_STANDARD_TOOLCHAIN
...
Everything is already up-to-date
2021-01-08 07:14:48.425 xctest[79757:2116295] The bundle “SwiftFusionPackageTests.xctest” couldn’t be loaded because it is damaged or missing necessary resources. Try reinstalling the bundle.
2021-01-08 07:14:48.425 xctest[79757:2116295] (dlopen_preflight(/Users/danielzheng/SwiftFusion/.build/x86_64-apple-macosx/debug/SwiftFusionPackageTests.xctest/Contents/MacOS/SwiftFusionPackageTests): Library not loaded: /usr/lib/swift/libswift_Differentiation.dylib
  Referenced from: /Users/danielzheng/SwiftFusion/.build/x86_64-apple-macosx/debug/SwiftFusionPackageTests.xctest/Contents/MacOS/SwiftFusionPackageTests
  Reason: image not found)
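
Both failures above name the same missing file, so a quick check of that path can confirm whether a given machine is affected before running the tests. The helper below is a hypothetical sketch, not part of this PR; its default path is copied from the dyld message.

```shell
# Path taken from the dyld error in this thread; pass another path to override.
DEFAULT_DYLIB=/usr/lib/swift/libswift_Differentiation.dylib

# Hypothetical helper: report whether a runtime library exists at a path.
# Returns 0 when the file is present and 1 when it is missing.
check_dylib() {
  lib="${1:-$DEFAULT_DYLIB}"
  if [ -e "$lib" ]; then
    echo "found: $lib"
  else
    echo "missing: $lib"
    return 1
  fi
}
```

On macOS, otool -L run on the built binary lists the library paths it expects to load, which is one way to see where this reference comes from.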

@ProfFan
Collaborator

ProfFan commented Feb 14, 2021

Thank you very much, Dan! I just tried compiling this with the latest Swift nightly, and this (https://gist.github.com/ProfFan/638f61aff223bfcbea94b2ddb026497a) is what I got. There is one compiler crash and many errors about ElementaryFunctions not existing.

@ProfFan
Collaborator

ProfFan commented Feb 14, 2021

I got past the ElementaryFunctions issue with swift build -Xswiftc -DTENSORFLOW_USE_STANDARD_TOOLCHAIN -Xcc -I/usr/include/tensorflow. Now the problem is that libx10 cannot be found:

swift build -Xswiftc -DTENSORFLOW_USE_STANDARD_TOOLCHAIN -Xcc -I/usr/include/tensorflow
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
/usr/bin/ld.gold: error: cannot find -lx10
clang-10: error: linker command failed with exit code 1 (use -v to see invocation)
<unknown>:0: error: link command failed with exit code 1 (use -v to see invocation)
[0/16] Linking libTensorFlow.so
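
The linker error means that -lx10 did not resolve to a library file in any search directory. A quick way to check a candidate directory is sketched below; the function name has_libx10 is hypothetical, and the list of file names is an assumption covering the usual Linux and macOS shared and static library naming.

```shell
# Hypothetical helper: succeed if the given directory contains a libx10
# library under one of the common file names (.so, .dylib, or .a).
has_libx10() {
  dir="$1"
  for f in "$dir/libx10.so" "$dir/libx10.dylib" "$dir/libx10.a"; do
    if [ -e "$f" ]; then
      return 0
    fi
  done
  return 1
}
```

For example, has_libx10 /usr/lib || echo "libx10 not found" checks one directory; a directory that does contain the library still has to be passed to the build with -Xlinker -L.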

@ProfFan
Collaborator

ProfFan commented Feb 14, 2021

Actually, it's more than this: it appears that _NumericShims is somehow built but not linked because SwiftPM ended the compilation prematurely. I have no idea what is happening.

@ProfFan
Collaborator

ProfFan commented Feb 15, 2021

OK, I see the problem: X10 needs to be built separately. Is it possible to build X10 against an existing TensorFlow install, or is the TF source required? Will these two coexist? @BradLarson Could you help me debug this? Thanks a lot!

@dan-zheng
Collaborator Author

Hi Fan,

Did you follow "build instructions" above and install pre-built X10 libraries? I believe they're currently available only for macOS and Windows – not Linux unfortunately.

The instructions for "building libraries depending on tensorflow/swift-apis" come from this documentation. An alternative to using pre-built X10 libraries is to build them yourself, which should work just fine on Linux using swift.org/download toolchains.

Let me know if you need any help! I'm happy to video call if you'd like.

@ProfFan
Collaborator

ProfFan commented Feb 15, 2021

Hi Dan,

Thanks for the instructions! I have read the build instructions and wonder whether X10 can be built against a system-packaged TensorFlow with headers. I think this is a very important question: if X10 can be built separately, there is a much higher chance that it will survive TF updates.

@dan-zheng
Collaborator Author

Thanks for the instructions! I have read the build instructions and wonder whether X10 can be built against a system-packaged TensorFlow with headers. I think this is a very important question: if X10 can be built separately, there is a much higher chance that it will survive TF updates.

Sure thing! I believe @compnerd can provide a more accurate answer to your question about system-packaged TensorFlow and X10. I recall discussing such things before: using a system package manager seems more heavyweight and platform-specific, but maybe it's more robust against breakages, as you suggest.

@BradLarson

BradLarson commented Feb 16, 2021

@ProfFan - When building a Swift for TensorFlow toolchain from scratch, X10 and TensorFlow are built from a specified TensorFlow version, and you have to manually move that version up to build against a new version of TensorFlow. In the worst case, you can still build these libraries as part of building a stock toolchain + swift-apis from scratch.

I don't know if these steps are documented anywhere, so I'll write down the sequence of commands needed to create a new toolchain based on the stock Swift compiler from scratch:

export TF_NEED_CUDA=1

mkdir swift-source
cd swift-source
git clone https://github.com/apple/swift.git
./swift/utils/update-checkout --clone --skip-repo swift
./swift/utils/build-toolchain buildbot_linux
git clone https://github.com/tensorflow/swift-apis.git

cmake -B BinaryCache -D BUILD_SHARED_LIBS=YES -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/media/nvidia/Data/Development/swift-source/swift-nightly-install/usr -D CMAKE_Swift_COMPILER=/media/nvidia/Data/Development/swift-source/swift-nightly-install/usr/bin/swiftc -D TENSORFLOW_USE_STANDARD_TOOLCHAIN=YES -G Ninja -S ./swift-apis

cmake --build BinaryCache --target install

tar -czf swift-tensorflow-stock-Jetson.tar.gz -C swift-nightly-install/ usr

(You may need to alter a few of the hardcoded paths above; this was a quick copy-paste.)

For a Jetson build, you also need to add the following at the beginning to specify CUDA architectures:

export TF_CUDA_COMPUTE_CAPABILITIES=compute_53,compute_62,compute_72
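
One way to avoid editing the hardcoded paths by hand is to derive both of them from a single variable. The following is a sketch rather than a tested build script: the function name configure_swift_apis and the INSTALL_PREFIX and CMAKE variables are hypothetical, and the default prefix assumes the toolchain was installed under swift-nightly-install/usr in the current directory. The cmake options themselves are the ones from the command above.

```shell
# Assumption: the stock toolchain was installed under this prefix.
INSTALL_PREFIX="${INSTALL_PREFIX:-$PWD/swift-nightly-install/usr}"

# Hypothetical helper: configure swift-apis against that toolchain.
# Set CMAKE=echo to print the arguments instead of running cmake.
configure_swift_apis() {
  "${CMAKE:-cmake}" -B BinaryCache \
    -D BUILD_SHARED_LIBS=YES \
    -D CMAKE_BUILD_TYPE=Release \
    -D CMAKE_INSTALL_PREFIX="${INSTALL_PREFIX}" \
    -D CMAKE_Swift_COMPILER="${INSTALL_PREFIX}/bin/swiftc" \
    -D TENSORFLOW_USE_STANDARD_TOOLCHAIN=YES \
    -G Ninja -S ./swift-apis
}
```

Running it with CMAKE=echo first makes it easy to confirm that the install prefix and compiler path point at the same toolchain before starting a long build.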

In the process of building this, all headers and binaries are generated for X10 and TensorFlow. I can extract and package these for Ubuntu, based on our 0.13 toolchains. That should contain everything you'd need to build swift-apis as a package, and would serve as long as you didn't need to advance beyond TensorFlow 2.4.0. Would that be useful to have? If so, which Ubuntu configurations would be most useful to focus on?

@BradLarson

OK, I tried it out and I think my idea of extracting the binary libraries from the completed toolchains will work. This is a version of the X10 standalone libraries (with TensorFlow headers) that builds on Ubuntu 18.04, CPU-only, with Dan's setup here. You might need to find the right Swift toolchain to use, however, because the zeroTangentVector changes upstream look like they might cause problems here.

If you want me to, I can create X10 snapshots from all of our Ubuntu variants and add them to the Windows and macOS snapshots linked on our development page.

@ProfFan
Collaborator

ProfFan commented Feb 17, 2021

@BradLarson Thanks a lot, Brad! One last question: is it possible to build X10 with only the TF headers from a vendor install of TF? For example, Arch Linux ships TF with full headers as a prebuilt package. In my experiments, the X10 CMake build always seems to clone the full source tree from GitHub.

@ProfFan
Collaborator

ProfFan commented Feb 17, 2021

But you are right, since Swift lives in a prefix we can definitely ship the TF libraries with the toolchain (separate from system TF) as well.

@BradLarson

BradLarson commented Feb 17, 2021

@ProfFan - I don't believe that libx10 can be built without access to the TensorFlow source, due to its need to compile in elements of XLA. Not entirely sure if the same is true for our eager-mode access, but I believe we build that in, too. Our toolchains exist independently of the system-installed TensorFlow, as does a binary library package like the one I linked above, and don't make use of it if it is available. Our TensorFlow support is pretty much standalone.

@ProfFan
Collaborator

ProfFan commented Feb 17, 2021

@BradLarson Thanks for the explanation! That is totally good :)

@BradLarson

I've created both CUDA 11 and CPU-only Ubuntu 18.04 X10 packages and linked them here: tensorflow/swift-apis#1182. I figured those would be the two most popular platforms for people carrying this on in the near term, but I can add others if needed.
