How to speed up image generation? #2320
Replies: 9 comments 8 replies
-
One option is to install the modules in parallel: parallel --jobs 0 --halt soon,fail=1 'pwsh -Command "Save-Module -Name Az -LiteralPath /usr/share/az_{} -RequiredVersion {} -Force -Verbose"' ::: "${versions[@]}" I'm pretty sure more steps in this repo could be parallelised. Unfortunately, the scripts are quite hard to read: everything is mixed together, and they use several different downloading tools (downloading is what takes most of the time). Another option, if you care about savings of anywhere from seconds to minutes (I have not measured the performance gain), would be to not run PowerShell, since, let's be honest here, its start-up time and performance are just not great in comparison to bash. I've never used
-
Speed up .NET Core installation: #2367
-
What about using Docker? Images are much smaller than VMs, and you can reuse the existing layers/base images. Have you thought about it?
-
Has anyone done a performance audit?
-
A bit of a crazy idea, but since we spend a decent amount of time downloading, I had a thought: check the binary hashes of installs against an Ultra/Premium storage disk (a cache) that gets attached to the image Packer machine at build time. I haven't really put too much engineering thought into it, but ideally the scripts would run in parallel (where applicable) and, before downloading, look for a hash signature hit. If one is found, the install runs from the mounted disk.
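To make the cache idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the name fetch_with_cache, the layout of one file per SHA-256 under cache_dir, and the download callback are hypothetical and not something the repo's scripts provide today.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_with_cache(expected_sha256: str, cache_dir: Path, dest: Path,
                     download: Callable[[Path], None]) -> bool:
    """Place the installer at dest, preferring the mounted cache disk.

    Hypothetical layout: cache_dir holds one file per installer, named by
    its SHA-256. Returns True on a cache hit, False if download() was used.
    """
    cached = cache_dir / expected_sha256
    # Re-hash the cached file so a corrupted entry is never installed.
    if cached.is_file() and sha256_of(cached) == expected_sha256:
        shutil.copyfile(cached, dest)
        return True

    download(dest)  # caller-supplied: curl, wget, Invoke-WebRequest, ...
    actual = sha256_of(dest)
    if actual != expected_sha256:
        raise ValueError(f"hash mismatch: expected {expected_sha256}, got {actual}")

    # Populate the cache so the next build gets a hit.
    cache_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(dest, cached)
    return False
```

Note that this only works for installers whose expected hash is known up front; anything fetched as "latest" would still need a download or some other lookup key.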
-
My guess is that applications that use a simple copy deploy scheme (i.e. download the files and copy them to their install location; no additional install work needed) could easily be installed in parallel. But applications that are installed using package managers (NPM / choco) and/or Windows MSI packages probably can't be installed in parallel, because those package managers need to resolve dependencies. Unless a package manager itself offers a parallel install feature (I think Yarn does this), you're pretty much stuck. I do wonder if it would be feasible to separate some of the downloads from their accompanying install phase. Visual Studio seems like a large download. Why not start downloading those files immediately, while the system is busy installing other apps?
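A rough sketch of that "download first, install later" split, in Python. The function and parameter names (install_with_prefetch, download_large, install_large, other_installs) are hypothetical stand-ins for whatever the real scripts do; this only shows the overlap, not an actual Visual Studio install.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def install_with_prefetch(download_large: Callable[[], str],
                          install_large: Callable[[str], None],
                          other_installs: List[Callable[[], None]]) -> None:
    """Overlap one large download with unrelated, serial installs.

    download_large returns the local path of the downloaded installer.
    All three arguments are hypothetical callbacks supplied by the caller.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Kick off the big download right away on a background thread.
        pending = pool.submit(download_large)

        # Meanwhile, run the other installs one at a time as before.
        for install in other_installs:
            install()

        # Blocks only if the download has not finished yet; also re-raises
        # any exception the download hit.
        install_large(pending.result())
```

The installs themselves stay sequential, so package managers still resolve dependencies one at a time; only the network transfer is moved off the critical path.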
-
Hi, we regularly see build breaks, but nothing consistent: one time it might be an error with a Chocolatey install, the next 5 times it works fine, and then there's a Haskell download error.
-
For me, the vast majority of the build time is in the one large shell provisioner that takes ~60 scripts and runs them one at a time. I built a small (~100 line) Python script that takes that list of scripts and runs them in a semi-parallel manner. First, it takes as arguments any script-to-script dependencies (for example, install-azure-devops-cli.sh must run after install-azure-cli.sh finishes). Then it creates a topological sort of the scripts and runs them via a thread pool limited to half the number of cores. I give the Packer VM 8 cores and 16GB of RAM, so it's running 4 scripts in parallel, which mainly means running apt-get in parallel with other operations (like testing). When I run it single-threaded, it takes ~48 minutes, but when I run 4 at a time it takes ~22 minutes on my machine. Not 4x faster, but pretty good! More threads don't speed it up any further, since it seems to spend most of its critical path on apt operations. If there's interest from the maintainers, I'd be happy to create a PR for it.
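For anyone curious what such a runner could look like, here is a minimal sketch. It is not the script described above, just one way to get the same shape using the standard library's graphlib (Python 3.9+). The names run_script and run_in_dependency_order are made up for illustration, and running each script with bash is an assumption.

```python
import os
import subprocess
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, List, Set


def run_script(path: str) -> None:
    """Run one provisioning script; raises CalledProcessError on failure."""
    subprocess.run(["bash", path], check=True)


def run_in_dependency_order(
    scripts: Iterable[str],
    depends_on: Dict[str, Set[str]],
    runner: Callable[[str], None] = run_script,
    workers: int = max(1, (os.cpu_count() or 2) // 2),  # half the cores
) -> List[str]:
    """Run scripts in parallel, starting each only after its dependencies.

    depends_on maps a script to the scripts that must finish before it.
    Returns the scripts in the order they completed.
    """
    graph = {name: set(depends_on.get(name, ())) for name in scripts}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies

    finished: List[str] = []
    running = {}  # future -> script name
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            # Submit every script whose dependencies are all done.
            for name in sorter.get_ready():
                running[pool.submit(runner, name)] = name
            # Wait until at least one running script completes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()    # re-raise a failed script's exception
                sorter.done(name)  # unblocks scripts that depend on it
                finished.append(name)
    return finished
```

Usage would be something like run_in_dependency_order(all_scripts, {"install-azure-devops-cli.sh": {"install-azure-cli.sh"}}). One caveat with this sketch: scripts that contend for the same lock (for example several apt-get calls) still need to be serialized, either through extra dependency edges or a lock inside the runner.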
-
Hi folks,
We are constantly trying to improve image generation speed. We provide a lot of pre-installed software, and whenever we add a new tool, we should try not to significantly increase image generation time. We have already implemented some ideas, such as parallel tool installation and changes in image specs, and are actively looking into others.
Image generation time matters both for testing changes and for those customers who build images based on our scripts.
This is an open discussion; any ideas are warmly welcome 🍕