
[Improvement-18035][dist] Exclude macOS junk from bin tar.gz for clean Linux extraction#18036

Open
macdoor wants to merge 7 commits into apache:dev from macdoor:fix/dist-macos-tar-linux-extraction

Conversation

@macdoor
Contributor

@macdoor macdoor commented Mar 7, 2026

Purpose

When the release tar.gz is built on macOS, the archive can contain ._* files, .DS_Store, and extended attributes, causing warnings and junk files when extracted on Linux. This change updates assembly-plugins.sh so the repack step strips macOS-specific data and produces a Linux-friendly archive.

fixes #18035

Brief change log

  • dolphinscheduler-dist/src/main/assembly/assembly-plugins.sh
    • Before repacking: remove ._* and .DS_Store under the bin directory.
    • On macOS: run xattr -cr on the bin directory; repack with COPYFILE_DISABLE=1 and tar --no-xattrs so the final tar.gz has no extended attributes.
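The steps in this change log can be sketched roughly as follows. This is a minimal illustration, not the actual contents of assembly-plugins.sh: the function name `repack_clean` and its three parameters are hypothetical, and `--no-xattrs` is passed only on macOS because that is the case the change log describes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): repack a bin directory without macOS metadata.
#   $1 = directory that contains the unpacked bin directory
#   $2 = name of the bin directory inside $1
#   $3 = name of the tar.gz to create inside $1
repack_clean() {
    local dist_dir="$1" bin_dir="$2" tar_name="$3"
    local -a tar_opts=()

    # Delete AppleDouble (._*) and .DS_Store files that already exist on disk.
    find "$dist_dir/$bin_dir" \( -name '._*' -o -name '.DS_Store' \) -type f -delete

    if [ "$(uname -s)" = "Darwin" ]; then
        # Clear extended attributes recursively, then ask tar not to store any.
        xattr -cr "$dist_dir/$bin_dir"
        tar_opts+=(--no-xattrs)
    fi

    # COPYFILE_DISABLE=1 asks macOS tar not to emit ._* entries; other tars ignore it.
    (cd "$dist_dir" && COPYFILE_DISABLE=1 tar "${tar_opts[@]}" -zcf "$tar_name" "$bin_dir")
}
```

A call such as `repack_clean "$DIST_DIR" some-bin-dir some-bin-dir.tar.gz` would write the archive next to the bin directory; the directory and archive names here are placeholders.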

No changes to dolphinscheduler-bin.xml or other assembly descriptors.

Made with Cursor

@macdoor macdoor requested a review from SbloodyS as a code owner March 7, 2026 13:52
@SbloodyS SbloodyS changed the title [Bug-18035][dist] Exclude macOS junk from bin tar.gz for clean Linux extraction [Improvement-18035][dist] Exclude macOS junk from bin tar.gz for clean Linux extraction Mar 7, 2026
@SbloodyS SbloodyS added the improvement make more easy to user or prompt friendly label Mar 7, 2026
@SbloodyS SbloodyS added this to the 3.4.2 milestone Mar 7, 2026
- Add excludes for **/._* and **/.DS_Store in dolphinscheduler-bin.xml fileSets
- Add ._* to .gitignore (AppleDouble files)

Made-with: Cursor
@macdoor macdoor requested a review from SbloodyS March 7, 2026 14:58
Replace find delete + xattr + conditional tar with single tar --exclude='._*' --exclude='.DS_Store' to avoid extra logic in packaging.

Made-with: Cursor
@macdoor macdoor requested a review from SbloodyS March 8, 2026 04:20
@sonarqubecloud

sonarqubecloud bot commented Mar 8, 2026

@macdoor
Contributor Author

macdoor commented Mar 8, 2026

Hi @SbloodyS,

I built the package on macOS with the current change (tar --exclude='._*' --exclude='.DS_Store') and extracted the resulting apache-dolphinscheduler-dev-SNAPSHOT-bin.tar.gz on Linux. There is still a lot of junk:

tar xvfz apache-dolphinscheduler-dev-SNAPSHOT-bin.tar.gz
._apache-dolphinscheduler-dev-SNAPSHOT-bin
tar: Ignoring unknown extended header keyword 'LIBARCHIVE.xattr.com.apple.provenance'
apache-dolphinscheduler-dev-SNAPSHOT-bin/
apache-dolphinscheduler-dev-SNAPSHOT-bin/._master-server
tar: Ignoring unknown extended header keyword 'LIBARCHIVE.xattr.com.apple.provenance'
apache-dolphinscheduler-dev-SNAPSHOT-bin/master-server/
apache-dolphinscheduler-dev-SNAPSHOT-bin/._ui
...

So --exclude alone does not stop macOS tar from adding ._* (AppleDouble) entries and extended-attribute headers when creating the archive; it only excludes paths when reading the directory. On macOS, the system tar still injects those when packing. To get a clean archive for Linux when building on macOS, we likely need either (1) COPYFILE_DISABLE=1 and tar --no-xattrs (and possibly cleaning ._* before repack), or (2) to document that the release tar.gz must be built on Linux only.
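For anyone comparing environments, a check like the one below could be run on the Linux side to confirm whether a given archive carries macOS metadata. It is only a sketch: the function name `archive_has_macos_junk` is hypothetical, and the patterns are taken from the symptoms quoted above (`._*` entries, `.DS_Store`, and the `LIBARCHIVE.xattr` warning printed by GNU tar).

```shell
#!/usr/bin/env bash
# Hypothetical checker (not part of the PR).
# Exit status: 0 = macOS junk detected, 1 = archive looks clean, 2 = archive unreadable.
archive_has_macos_junk() {
    local archive="$1" out

    # List entries; merge stderr so GNU tar's extended-header warnings are captured too.
    out=$(tar -tzf "$archive" 2>&1) || return 2

    # Match AppleDouble entries, .DS_Store files, or xattr header warnings.
    printf '%s\n' "$out" | grep -Eq '(^|/)\._|(^|/)\.DS_Store$|LIBARCHIVE\.xattr'
}
```

Running `archive_has_macos_junk apache-dolphinscheduler-dev-SNAPSHOT-bin.tar.gz && echo "junk found"` after each build would make the two environments easier to compare.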

How would you like to proceed?

@SbloodyS
Member

SbloodyS commented Mar 9, 2026

I built the current PR in my environment and found neither the files nor the warnings you mentioned. I also found no such files in the published binary package. Please recheck your environment. @macdoor

@macdoor
Contributor Author

macdoor commented Mar 9, 2026

Hi @SbloodyS, I ran the test again with the same result (junk and xattr warnings on Linux). Here is my environment for comparison:

Packaging (build) environment (macOS):

macdoor@macdoorMacBook dolphinscheduler % uname -a
Darwin localhost 25.4.0 Darwin Kernel Version 25.4.0: Wed Feb 25 21:03:15 PST 2026; root:xnu-12377.100.630.501.3~2/RELEASE_ARM64_T6000 arm64
macdoor@macdoorMacBook dolphinscheduler % tar --version
bsdtar 3.5.3 - libarchive 3.7.4 zlib/1.2.12 liblzma/5.4.3 bz2lib/1.0.8

Extraction environment (Linux):

macdoor@ubuntu2510:~$ uname -a
Linux ubuntu2510 6.17.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Fri Jan  9 17:01:16 UTC 2026 x86_64 GNU/Linux
macdoor@ubuntu2510:~$ tar --version
tar (GNU tar) 1.35
Copyright (C) 2023 Free Software Foundation, Inc.
...
Written by John Gilmore and Jay Fenlason.

So the package is created with bsdtar on macOS (ARM64), then extracted with GNU tar on Ubuntu. The ._* entries and LIBARCHIVE.xattr.com.apple.provenance warnings still appear.

@SbloodyS
Member

SbloodyS commented Mar 9, 2026

I packaged it on a machine with an M1 Pro chip and extracted it on Ubuntu 24.04. I still can't reproduce your problem, so it can be ignored for the time being.

@macdoor
Contributor Author

macdoor commented Mar 9, 2026

Hi @SbloodyS, I'm fine with leaving this as-is for now. We can revisit the macOS build case later if required.

- Add maven-antrun-plugin in root pom: package phase deletes ._* and .DS_Store from each module's target/
- Remove <excludes> from dolphinscheduler-bin.xml fileSets (handled at module level now)

Made-with: Cursor
@macdoor macdoor requested a review from SbloodyS March 9, 2026 06:52
@SbloodyS SbloodyS requested a review from ruanwenjun March 12, 2026 01:36
Comment on lines +800 to +821
<!-- Remove macOS junk from each module's target so assembly picks up clean output -->
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-antrun-plugin</artifactId>
    <version>3.1.0</version>
    <executions>
        <execution>
            <id>clean-macos-junk</id>
            <phase>package</phase>
            <goals>
                <goal>run</goal>
            </goals>
            <configuration>
                <target>
                    <delete failonerror="false">
                        <fileset dir="${project.build.directory}" includes="**/._*, **/.DS_Store, ._*, .DS_Store" />
                    </delete>
                </target>
            </configuration>
        </execution>
    </executions>
</plugin>
Member

We don't need to add this. We use fileSets to collect the files, and they already exclude these:

 <exclude>**/.DS_Store</exclude>
 <exclude>**/Thumbs.db</exclude>

Comment on lines +80 to +82
# repack bin tar (exclude macOS junk so extraction on Linux is clean)
BIN_TAR_FILE_NAME=$(basename $BIN_TAR_FILE)
-cd $DIST_DIR && tar -zcf $BIN_TAR_FILE_NAME apache-dolphinscheduler-*-bin
+cd $DIST_DIR && tar -zcf $BIN_TAR_FILE_NAME --exclude='._*' --exclude='.DS_Store' apache-dolphinscheduler-*-bin
Member

Is this caused by opening the directory in Finder?

Contributor Author

"Is this caused by opening the directory in Finder?"

Hi @ruanwenjun,

The root cause of this issue is macOS tar (bsdtar) behavior — it automatically embeds ._* (AppleDouble) entries and extended attribute headers into the archive when creating a tarball, even if no ._* files exist on disk. This is a well-known cross-platform issue:

https://superuser.com/questions/61185/why-do-i-get-files-like-foo-in-my-tarball-on-os-x

OS X's tar uses the AppleDouble format to store extended attributes and ACLs. [...] You can tell tar to not include the metadata by setting COPYFILE_DISABLE to some value.

So it's not caused by opening the directory in Finder — it happens whenever macOS bsdtar packs any file that has extended attributes (which macOS adds silently, e.g. com.apple.provenance).
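As a rough illustration of the `COPYFILE_DISABLE` suggestion from the quoted answer, the repack line under review could be wrapped like this. The function name `repack_bin_tar` and its parameters are hypothetical; the variable roles mirror `DIST_DIR` and `BIN_TAR_FILE` from the script. Whether this variable alone also suppresses the `LIBARCHIVE.xattr.*` headers is not confirmed in this thread, which is why the original PR description pairs it with `tar --no-xattrs`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the repack line (not the merged script).
#   $1 = dist directory containing apache-dolphinscheduler-*-bin
#   $2 = path of the bin tar.gz; only its basename is used
repack_bin_tar() {
    local dist_dir="$1" bin_tar_file="$2"
    local bin_tar_file_name
    bin_tar_file_name=$(basename "$bin_tar_file")

    # COPYFILE_DISABLE=1 tells macOS tar not to add AppleDouble (._*) entries.
    # --exclude still filters any ._* or .DS_Store files that exist on disk.
    (cd "$dist_dir" && COPYFILE_DISABLE=1 tar -zcf "$bin_tar_file_name" \
        --exclude='._*' --exclude='.DS_Store' apache-dolphinscheduler-*-bin)
}
```

Setting the variable only for the single `tar` invocation keeps it out of the rest of the build environment, and on Linux it should have no effect because GNU tar does not read it.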


Labels

backend improvement make more easy to user or prompt friendly

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Improvement][dist] macOS-built bin tar.gz leaves junk files and xattr when extracted on Linux

3 participants