[Improve][CI] Avoid repeated environment downloads in loader-ci #723

Observed while rerunning `loader-ci` during PR #716 review.

## Problem

The `Prepare env and service` step in `loader-ci` appears to spend a large share of its time re-downloading or rebuilding external dependencies on every run, even when their versions have not changed.

From the current workflow:

- `.github/workflows/loader-ci.yml` only caches `~/.m2`
- `hugegraph-loader/assembly/travis/install-hadoop.sh` always downloads `hadoop-2.8.5.tar.gz` from `archive.apache.org`
- `hugegraph-loader/assembly/travis/install-mysql.sh` always runs `docker pull mysql:5.7`
- `hugegraph-loader/assembly/travis/install-hugegraph-from-source.sh` always clones `apache/hugegraph` and rebuilds the server package from source

The screenshot from the failing/re-run workflow shows `Prepare env and service` taking about 19 minutes, with a large Hadoop tarball download dominating the step.

```text
loader-ci
└─ Prepare env and service
   ├─ install-hadoop.sh
   │  └─ wget hadoop-2.8.5.tar.gz  (large tarball, repeated)
   ├─ install-mysql.sh
   │  └─ docker pull mysql:5.7     (repeated image pull)
   └─ install-hugegraph-from-source.sh
      └─ git clone + mvn package   (repeated source build)
```

## Why this matters

- CI duration is much longer than necessary
- CI becomes more fragile because it depends on multiple external downloads during the test phase
- Re-runs are expensive even when the code change is unrelated to loader integration environments
- The current cache covers only `~/.m2`, which likely does not match the real bottlenecks (the Hadoop tarball, the MySQL image, and the HugeGraph server build)

## Suggested directions

### Prefer official artifacts / containers over ad-hoc install scripts

- Replace the MySQL setup script with a GitHub Actions `services` container or another pinned official image
- Replace the Hadoop local install script with a pinned container/image or other official prebuilt artifact if possible
- For HugeGraph server, prefer a reusable prebuilt tarball/artifact for the pinned commit/version instead of cloning and packaging from source on every CI run
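Until a published server artifact is available, a similar effect could be approximated by keying a locally packaged tarball on the pinned commit. The following is a minimal sketch under that assumption: `ARTIFACT_DIR`, `ensure_server_artifact`, and `build_server_tarball` are hypothetical names, and `build_server_tarball` stands in for whatever clone + package logic `install-hugegraph-from-source.sh` already contains.

```shell
#!/usr/bin/env bash
# Sketch only: ARTIFACT_DIR, ensure_server_artifact and build_server_tarball
# are hypothetical names, not part of the existing scripts.

ARTIFACT_DIR="${ARTIFACT_DIR:-$HOME/.cache/loader-ci/hugegraph}"

# Placeholder for the existing clone + package logic. A real version is
# expected to write the packaged server for commit $1 to the path in $2.
build_server_tarball() {
    echo "build_server_tarball is a placeholder: build $1 into $2" >&2
    return 1
}

# Print the path of the server tarball for a pinned commit, building it
# only when no artifact for that exact commit exists yet.
ensure_server_artifact() {
    local commit="$1"
    local artifact="$ARTIFACT_DIR/hugegraph-server-$commit.tar.gz"
    mkdir -p "$ARTIFACT_DIR" || return 1
    if [ ! -s "$artifact" ]; then
        # Keep build output off stdout so callers only capture the path.
        build_server_tarball "$commit" "$artifact" >&2 || return 1
    fi
    printf '%s\n' "$artifact"
}
```

Because the file name includes the commit, bumping the pinned commit naturally misses the cache and triggers one rebuild, while unrelated re-runs reuse the existing tarball. This only helps if `ARTIFACT_DIR` is restored between runs, for example by a cache step in the workflow.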

### If scripts must remain, make them cache-aware and idempotent

- Add cache coverage for downloaded tarballs or extracted runtime directories if we still use script-based setup
- Make the scripts check for existing files, directories, or images first, and skip `wget` / `docker pull` / clone+build when the required artifact is already available
- Verify whether the GitHub Actions cache is currently missing the relevant paths, or whether the restore keys are ineffective for this use case
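The "check before downloading" idea above can be sketched as two small guards. This is an illustration rather than the current script contents: `CACHE_DIR`, `fetch_once`, `pull_once`, and `HADOOP_TARBALL_URL` are hypothetical names, and the cache location is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: CACHE_DIR, fetch_once and pull_once are hypothetical names,
# not part of the existing install scripts.

CACHE_DIR="${CACHE_DIR:-$HOME/.cache/loader-ci}"

# Download a URL into CACHE_DIR unless a non-empty copy is already there.
# Prints the local path on success.
fetch_once() {
    local url="$1"
    local dest="$CACHE_DIR/$(basename "$url")"
    mkdir -p "$CACHE_DIR" || return 1
    if [ ! -s "$dest" ]; then
        # Download to a temporary name so an interrupted run never leaves
        # a truncated file that later looks like a cache hit.
        wget -q -O "$dest.part" "$url" || { rm -f "$dest.part"; return 1; }
        mv "$dest.part" "$dest" || return 1
    fi
    printf '%s\n' "$dest"
}

# Pull a Docker image only when it is not already present locally.
pull_once() {
    local image="$1"
    docker image inspect "$image" >/dev/null 2>&1 || docker pull "$image"
}

# Example calls (HADOOP_TARBALL_URL is a hypothetical variable holding the
# URL that install-hadoop.sh already uses):
#   tarball="$(fetch_once "$HADOOP_TARBALL_URL")"
#   pull_once mysql:5.7
```

On GitHub-hosted runners each job starts on a fresh machine, so `fetch_once` only saves time when `CACHE_DIR` is restored by a cache step, and `pull_once` mainly helps on self-hosted runners or local runs where the Docker image store persists.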

## Possible scope

- `.github/workflows/loader-ci.yml`
- `hugegraph-loader/assembly/travis/install-hadoop.sh`
- `hugegraph-loader/assembly/travis/install-mysql.sh`
- `hugegraph-loader/assembly/travis/install-hugegraph-from-source.sh`

## Expected outcome

- Repeated `loader-ci` runs should not re-download the same Hadoop tarball every time
- MySQL setup should use a pinned, reusable container (for example a `services` container) rather than running `docker pull` inside the script on every run
- HugeGraph server setup should reuse a stable artifact or cacheable output where possible
- `Prepare env and service` time should drop significantly and become more stable

