Commit 46464c4

Merge pull request #3728 from vladmandic/dev
merge dev to master
2 parents: 586ef9a + b2df5e4

99 files changed: +5035 -2105 lines changed


.github/workflows/build_readme.yaml

-2
@@ -2,8 +2,6 @@ name: update-readme
 
 on:
   workflow_dispatch:
-  schedule:
-    - cron: '0 */4 * * *'
 
 jobs:
   deploy:

CHANGELOG.md

+100 -4

@@ -1,15 +1,111 @@
 # Change Log for SD.Next
 
-## Update for 2025-01-16
+## Highlights for 2025-01-29
 
-- **Gallery**:
-  - add http fallback for slow/unreliable links
-- **Fixes**:
+Two weeks since the last release, time for an update!
+
+*What's New?*
+- New **Detailer** functionality, including the ability to use several new
+  face-restore models: *RestoreFormer, CodeFormer, GFPGan, GPEN-BFR*
+- Support for new models/pipelines:
+  face-swapper with **Photomaker-v2** and video with **Fast-Hunyuan**
+- Support for several new optimizations and accelerations:
+  many **IPEX** improvements, native *torch fp8* support,
+  support for **PAB:Pyramid-attention-broadcast**, **ParaAttention** and **PerFlow**
+- Fully built-in support for both model **merge weights** and model **merge component**
+  Finally, replace that pesky VAE in your favorite model with a fixed one!
+- Improved remote access control and reliability, as well as running inside containers
+- And of course, hotfixes for all reported issues...
+
+## Details for 2025-01-28
+
+- **Contributing**:
+  - if you'd like to contribute, please see the updated [contributing](https://github.com/vladmandic/automatic/blob/dev/CONTRIBUTING) guidelines
+- **Model Merge**:
+  - in addition to the existing model-weights merge support,
+    there is now also the ability to replace model components and merge LoRAs
+  - you can test merges in-memory without needing to save to disk at all,
+    and you can also use it to convert diffusers to safetensors if you want
+  - *example*: replace the vae in your favorite model with a fixed one, or replace the text encoder
+  - *note*: limited to sdxl for now; additional models can be added depending on popularity
+- **Detailer**:
+  - in addition to the standard detect & run-generate behavior, it can now also run face-restore models
+  - included models are: *CodeFormer, RestoreFormer, GFPGan, GPEN-BFR*
+- **Face**:
+  - new [PhotoMaker v2](https://huggingface.co/TencentARC/PhotoMaker-V2) and reimplemented [PhotoMaker v1](https://huggingface.co/TencentARC/PhotoMaker)
+    compatible with sdxl models; generates pretty good results and it's faster than most other methods
+    select under *scripts -> face -> photomaker*
+  - new [ReSwapper](https://github.com/somanchiu/ReSwapper)
+    todo: experimental-only and unfinished, only noted in the changelog for future reference
+- **Video**:
+  - **hunyuan video** support for [FastHunyuan](https://huggingface.co/FastVideo/FastHunyuan)
+    simply select the model variant and set appropriate parameters
+    recommended: sampler-shift=17, steps=6, resolution=720x1280, frames=125, guidance>6.0
+- [PAB: Pyramid Attention Broadcast](https://oahzxl.github.io/PAB/)
+  - speeds up generation by caching attention results between steps
+  - enable in *settings -> pipeline modifiers -> pab*
+  - adjust settings as needed: a wider timestep range means more acceleration but a higher accuracy drop
+  - compatible with most `transformer`-based models: e.g. flux.1, hunyuan-video, ltx-video, mochi, etc.
+- [ParaAttention](https://github.com/chengzeyi/ParaAttention)
+  - first-block caching that can significantly speed up generation by dynamically reusing partial outputs between steps
+    (see the sketch after this diff)
+  - available for: flux, hunyuan-video, ltx-video, mochi
+  - enable in *settings -> pipeline modifiers -> para-attention*
+  - adjust the residual diff threshold to balance speedup against accuracy:
+    higher values lead to more cache hits and speedups, but may also lead to a higher accuracy drop
+- **IPEX**:
+  - enable force attention slicing, fp64 emulation, jit cache
+  - use the us server by default on linux
+  - use the pytorch test branch on windows
+  - extend the supported python versions
+  - improve sdpa dynamic attention
+- **Torch FP8**:
+  - uses torch `float8_e4m3fn` or `float8_e5m2` as data storage and performs dynamic upcasting to the compute `dtype` as needed
+  - compatible with most `unet`- and `transformer`-based models: e.g. *sd15, sdxl, sd35, flux.1, hunyuan-video, ltx-video, etc.*
+    this is an alternative to `bnb`/`quanto`/`torchao` quantization on models/platforms/gpus where those libraries are not available
+  - enable in *settings -> quantization -> layerwise casting* (see the sketch after this diff)
+- [PerFlow](https://github.com/magic-research/piecewise-rectified-flow)
+  - piecewise rectified flow as model acceleration
+  - use the `perflow` scheduler combined with one of the available pre-trained [models](https://huggingface.co/hansyan)
+- **Other**:
+  - **upscale**: new [asymmetric vae](Heasterian/AsymmetricAutoencoderKLUpscaler) upscaling method
+  - **gallery**: add http fallback for slow/unreliable links
+  - **splash**: add legacy mode indicator on splash screen
+  - **network**: extract thumbnail from model metadata if present
+  - **network**: setting value to disable use of reference models
+- **Refactor**:
+  - **upscale**: code refactor to unify latent, resize and model-based upscalers
+  - **loader**: ability to run in-memory models
+  - **schedulers**: ability to create model-less schedulers
+  - **quantization**: code refactor into dedicated module
+  - **dynamic attention sdpa**: more correct implementation and new trigger rate control
+- **Remote access**:
+  - perform auth check on ui startup
+  - unified standard and modern-ui authentication method & cleaned-up auth logging
+  - detect & report local/external/public ip addresses if using `listen` mode
+  - detect *docker*-enforced limits instead of system limits if running in a container
+  - warn if using a public interface without authentication
+- **Fixes**:
   - non-full vae decode
   - send-to image transfer
   - sana vae tiling
   - increase gallery timeouts
   - update ui element ids
+  - modernui use local font
+  - unique font family registration
+  - mochi video number of frames
+  - mark large models that should offload
+  - avoid repeated optimum-quanto installation
+  - avoid reinstalling bnb if not cuda
+  - image metadata civitai compatibility
+  - xyz grid handle invalid values
+  - omnigen pipeline handle float seeds
+  - correct logging of docker status, thanks @kmscode
+  - fix omnigen
+  - fix docker status reporting
+  - vlm/vqa with moondream2
+  - rocm: do not override triton installation
+  - port streaming model load to diffusers
 
 ## Update for 2025-01-15
 
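Two of the changelog items above lend themselves to short illustrations.

First-block caching (the ParaAttention entry): the sketch below is a conceptual toy, not ParaAttention's actual API; `first_block` and `remaining_blocks` stand in for the split transformer, and `threshold` plays the role of the residual diff threshold setting.

```python
import torch

class FirstBlockCache:
    """Toy first-block cache: if the first transformer block's output barely
    changed since the previous denoising step, reuse the previous step's
    output of the remaining blocks instead of recomputing them."""

    def __init__(self, threshold: float = 0.1):
        self.threshold = threshold  # stand-in for the residual diff threshold setting
        self.prev_first = None      # first-block output from the previous step
        self.prev_out = None        # cached output of the remaining blocks

    @torch.no_grad()
    def __call__(self, hidden, first_block, remaining_blocks):
        first = first_block(hidden)
        if self.prev_first is not None:
            # relative change of the first block's output between steps
            diff = (first - self.prev_first).abs().mean() / (self.prev_first.abs().mean() + 1e-8)
            if diff.item() < self.threshold:
                return self.prev_out      # cache hit: skip the remaining blocks
        out = remaining_blocks(first)     # cache miss: full forward pass
        self.prev_first, self.prev_out = first, out
        return out
```

A higher `threshold` accepts larger step-to-step drift as "unchanged", which is why the changelog notes that higher values yield more cache hits but a larger accuracy drop.

Torch FP8 (the layerwise casting entry): a minimal sketch of storage-only fp8 with dynamic upcasting in plain PyTorch; `FP8Linear` and `cast_layerwise` are illustrative names, not SD.Next's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FP8Linear(nn.Module):
    """Linear layer that stores weights in float8_e4m3fn and upcasts per call."""
    def __init__(self, linear: nn.Linear, compute_dtype=torch.bfloat16):
        super().__init__()
        self.compute_dtype = compute_dtype
        # fp8 is storage-only: half the memory of fp16/bf16 weights
        self.register_buffer('weight8', linear.weight.detach().to(torch.float8_e4m3fn))
        self.register_buffer('bias', None if linear.bias is None else linear.bias.detach().to(compute_dtype))

    def forward(self, x):
        # dynamic upcast: the matmul itself runs in the compute dtype
        return F.linear(x.to(self.compute_dtype), self.weight8.to(self.compute_dtype), self.bias)

def cast_layerwise(model: nn.Module, compute_dtype=torch.bfloat16):
    """Recursively swap nn.Linear layers for fp8-storage equivalents."""
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            setattr(model, name, FP8Linear(child, compute_dtype))
        else:
            cast_layerwise(child, compute_dtype)
    return model
```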

CONTRIBUTING

+16 -9

@@ -4,17 +4,24 @@ Pull requests from everyone are welcome
 
 Procedure for contributing:
 
+- Select SD.Next `dev` branch:
+  <https://github.com/vladmandic/automatic/tree/dev>
 - Create a fork of the repository on github
-  In a top right corner of a GitHub, select "Fork"
-  Its recommended to fork latest version from main branch to avoid any possible conflicting code updates
+  In the top right corner of GitHub, select "Fork"
+  It's recommended to fork the latest version from the main branch to avoid any possible conflicting code updates
 - Clone your forked repository to your local system
-  `git clone https://github.com/<your-username>/<your-fork>
+  `git clone https://github.com/<your-username>/<your-fork>`
 - Make your changes
-- Test your changes
-- Test your changes against code guidelines
-  - `ruff check`
-  - `pylint <folder>/<filename>.py`
+- Test your changes
+- Lint your changes against code guidelines
+  - `ruff check`
+  - `pylint <folder>/<filename>.py`
 - Push changes to your fork
-- Submit a PR (pull request)
+- Submit a PR (pull request)
+  - Make sure that the PR is against the `dev` branch
+  - Update your fork before creating the PR so that it is based on the latest code
+  - Make sure that the PR does NOT include any unrelated edits
+  - Make sure that the PR does not include changes to submodules
 
-Your pull request will be reviewed and pending review results, merged into main branch
+Your pull request will be reviewed and, pending review results, merged into the `dev` branch
+Dev merges to main are performed regularly, and any PRs merged to `dev` will be included in the next main release

README.md

+7 -8

@@ -17,7 +17,7 @@
 
 - [Documentation](https://vladmandic.github.io/sdnext-docs/)
 - [SD.Next Features](#sdnext-features)
-- [Model support](#model-support) and [Specifications]()
+- [Model support](#model-support)
 - [Platform support](#platform-support)
 - [Getting started](#getting-started)
 

@@ -32,7 +32,7 @@ All individual features are not listed here, instead check [ChangeLog](CHANGELOG
 **Windows | Linux | MacOS | nVidia | AMD | IntelArc/IPEX | DirectML | OpenVINO | ONNX+Olive | ZLUDA**
 - Platform specific autodetection and tuning performed on install
 - Optimized processing with latest `torch` developments with built-in support for model compile, quantize and compress
-  Compile backends: *Triton | StableFast | DeepCache | OneDiff*
+  Compile backends: *Triton | StableFast | DeepCache | OneDiff | TeaCache | etc.*
   Quantization and compression methods: *BitsAndBytes | TorchAO | Optimum-Quanto | NNCF*
 - Built-in queue management
 - Built in installer with automatic updates and dependency management

@@ -82,6 +82,11 @@ SD.Next supports broad range of models: [supported models](https://vladmandic.gi
 > [!WARNING]
 > If you run into issues, check out [troubleshooting](https://vladmandic.github.io/sdnext-docs/Troubleshooting/) and [debugging](https://vladmandic.github.io/sdnext-docs/Debug/) guides
 
+### Contributing
+
+Please see [Contributing](CONTRIBUTING) for details on how to contribute to this project
+For any questions, reach out on [Discord](https://discord.gg/VjvR2tabEX) or open an [issue](https://github.com/vladmandic/automatic/issues) or [discussion](https://github.com/vladmandic/automatic/discussions)
+
 ### Credits
 
 - Main credit goes to [Automatic1111 WebUI](https://github.com/AUTOMATIC1111/stable-diffusion-webui) for the original codebase

@@ -104,10 +109,4 @@ SD.Next supports broad range of models: [supported models](https://vladmandic.gi
 If you're unsure how to use a feature, best place to start is [Docs](https://vladmandic.github.io/sdnext-docs/) and if its not there,
 check [ChangeLog](https://vladmandic.github.io/sdnext-docs/CHANGELOG/) for when feature was first introduced as it will always have a short note on how to use it
 
-### Sponsors
-
-<div align="center">
-<!-- sponsors --><a href="https://github.com/allangrant"><img src="https://github.com/allangrant.png" width="60px" alt="Allan Grant" /></a><a href="https://github.com/mantzaris"><img src="https://github.com/mantzaris.png" width="60px" alt="a.v.mantzaris" /></a><a href="https://github.com/CurseWave"><img src="https://github.com/CurseWave.png" width="60px" alt="" /></a><a href="https://github.com/smlbiobot"><img src="https://github.com/smlbiobot.png" width="60px" alt="SML (See-ming Lee)" /></a><!-- sponsors -->
-</div>
-
 <br>

installer.py

+45 -11

@@ -447,7 +447,7 @@ def get_platform():
         'system': platform.system(),
         'release': release,
         'python': platform.python_version(),
-        'docker': os.environ.get('SD_INSTALL_DEBUG', None) is not None,
+        'docker': os.environ.get('SD_DOCKER', None) is not None,
         # 'host': platform.node(),
         # 'version': platform.version(),
     }

@@ -492,7 +492,7 @@ def check_diffusers():
     t_start = time.time()
     if args.skip_all or args.skip_git:
         return
-    sha = 'b785ddb654e4be3ae0066e231734754bdb2a191c' # diffusers commit hash
+    sha = '7b100ce589b917d4c116c9e61a6ec46d4f2ab062' # diffusers commit hash
     pkg = pkg_resources.working_set.by_key.get('diffusers', None)
     minor = int(pkg.version.split('.')[1] if pkg is not None else 0)
     cur = opts.get('diffusers_version', '') if minor > 0 else ''

@@ -625,6 +625,9 @@ def install_rocm_zluda():
     else:
         torch_command = os.environ.get('TORCH_COMMAND', f'torch torchvision --index-url https://download.pytorch.org/whl/rocm{rocm.version}')
 
+    if os.environ.get('TRITON_COMMAND', None) is None:
+        os.environ.setdefault('TRITON_COMMAND', 'skip') # pytorch auto installs pytorch-triton-rocm as a dependency instead
+
     if sys.version_info < (3, 11):
         ort_version = os.environ.get('ONNXRUNTIME_VERSION', None)
         if rocm.version is None or float(rocm.version) > 6.0:

@@ -659,22 +662,39 @@
 
 def install_ipex(torch_command):
     t_start = time.time()
-    check_python(supported_minors=[10,11], reason='IPEX backend requires Python 3.10 or 3.11')
+    # Python 3.12 will cause compatibility issues with other dependencies
+    # IPEX supports Python 3.12 so don't block it but don't advertise it in the error message
+    check_python(supported_minors=[9, 10, 11, 12], reason='IPEX backend requires Python 3.9, 3.10 or 3.11')
     args.use_ipex = True # pylint: disable=attribute-defined-outside-init
     log.info('IPEX: Intel OneAPI toolkit detected')
+
     if os.environ.get("NEOReadDebugKeys", None) is None:
         os.environ.setdefault('NEOReadDebugKeys', '1')
     if os.environ.get("ClDeviceGlobalMemSizeAvailablePercent", None) is None:
         os.environ.setdefault('ClDeviceGlobalMemSizeAvailablePercent', '100')
+    if os.environ.get("SYCL_CACHE_PERSISTENT", None) is None:
+        os.environ.setdefault('SYCL_CACHE_PERSISTENT', '1') # JIT cache
+
     if os.environ.get("PYTORCH_ENABLE_XPU_FALLBACK", None) is None:
-        os.environ.setdefault('PYTORCH_ENABLE_XPU_FALLBACK', '1')
+        os.environ.setdefault('PYTORCH_ENABLE_XPU_FALLBACK', '1') # CPU fallback for unsupported ops
+    if os.environ.get("OverrideDefaultFP64Settings", None) is None:
+        os.environ.setdefault('OverrideDefaultFP64Settings', '1')
+    if os.environ.get("IGC_EnableDPEmulation", None) is None:
+        os.environ.setdefault('IGC_EnableDPEmulation', '1') # FP64 emulation
+    if os.environ.get('IPEX_FORCE_ATTENTION_SLICE', None) is None:
+        # XPU PyTorch doesn't support Flash Attention or Memory Attention yet, so Battlemage goes OOM without this
+        os.environ.setdefault('IPEX_FORCE_ATTENTION_SLICE', '1')
+
     if "linux" in sys.platform:
-        torch_command = os.environ.get('TORCH_COMMAND', 'torch==2.5.1+cxx11.abi torchvision==0.20.1+cxx11.abi intel-extension-for-pytorch==2.5.10+xpu oneccl_bind_pt==2.5.0+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/')
-        # torch_command = os.environ.get('TORCH_COMMAND', 'torch torchvision --index-url https://download.pytorch.org/whl/test/xpu') # test wheels are stable previews, significantly slower than IPEX
-        # os.environ.setdefault('TENSORFLOW_PACKAGE', 'tensorflow==2.15.1 intel-extension-for-tensorflow[xpu]==2.15.0.1')
+        # default to the US server; if the China server is needed, change .../release-whl/stable/xpu/us/ to .../release-whl/stable/xpu/cn/
+        torch_command = os.environ.get('TORCH_COMMAND', 'torch==2.5.1+cxx11.abi torchvision==0.20.1+cxx11.abi intel-extension-for-pytorch==2.5.10+xpu oneccl_bind_pt==2.5.0+xpu --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/')
+        if os.environ.get('TRITON_COMMAND', None) is None:
+            os.environ.setdefault('TRITON_COMMAND', '--pre pytorch-triton-xpu==3.1.0+91b14bf559 --index-url https://download.pytorch.org/whl/nightly/xpu')
+        # os.environ.setdefault('TENSORFLOW_PACKAGE', 'tensorflow==2.15.1 intel-extension-for-tensorflow[xpu]==2.15.0.2')
     else:
-        torch_command = os.environ.get('TORCH_COMMAND', '--pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/xpu') # torchvision doesn't exist on test/stable branch for windows
-    install(os.environ.get('OPENVINO_PACKAGE', 'openvino==2024.5.0'), 'openvino', ignore=True)
+        torch_command = os.environ.get('TORCH_COMMAND', 'torch==2.6.0+xpu torchvision==0.21.0+xpu --index-url https://download.pytorch.org/whl/test/xpu')
+
+    install(os.environ.get('OPENVINO_PACKAGE', 'openvino==2024.6.0'), 'openvino', ignore=True)
     install('nncf==2.7.0', ignore=True, no_deps=True) # requires older pandas
     install(os.environ.get('ONNXRUNTIME_PACKAGE', 'onnxruntime-openvino'), 'onnxruntime-openvino', ignore=True)
     ts('ipex', t_start)

@@ -683,6 +703,8 @@
 
 def install_openvino(torch_command):
     t_start = time.time()
+    # Python 3.12 will cause compatibility issues with other dependencies.
+    # OpenVINO supports Python 3.12 so don't block it but don't advertise it in the error message
     check_python(supported_minors=[9, 10, 11, 12], reason='OpenVINO backend requires Python 3.9, 3.10 or 3.11')
     log.info('OpenVINO: selected')
     if sys.platform == 'darwin':

@@ -726,11 +748,22 @@ def install_torch_addons():
         install('optimum-quanto==0.2.6', 'optimum-quanto')
     if not args.experimental:
         uninstall('wandb', quiet=True)
-    if triton_command is not None:
+    if triton_command is not None and triton_command != 'skip':
         install(triton_command, 'triton', quiet=True)
     ts('addons', t_start)
 
 
+# check cudnn
+def check_cudnn():
+    import site
+    site_packages = site.getsitepackages()
+    cuda_path = os.environ.get('CUDA_PATH', '')
+    for site_package in site_packages:
+        folder = os.path.join(site_package, 'nvidia', 'cudnn', 'lib')
+        if os.path.exists(folder) and folder not in cuda_path:
+            os.environ['CUDA_PATH'] = f"{cuda_path}:{folder}"
+
+
 # check torch version
 def check_torch():
     t_start = time.time()

@@ -842,6 +875,7 @@ def check_torch():
         return
     if not args.skip_all:
         install_torch_addons()
+        check_cudnn()
     if args.profile:
         pr.disable()
         print_profile(pr, 'Torch')

@@ -1056,7 +1090,7 @@ def install_optional():
     install('gfpgan')
     install('clean-fid')
     install('pillow-jxl-plugin==1.3.1', ignore=True)
-    install('optimum-quanto=0.2.6', ignore=True)
+    install('optimum-quanto==0.2.6', ignore=True)
     install('bitsandbytes==0.45.0', ignore=True)
     install('pynvml', ignore=True)
     install('ultralytics==8.3.40', ignore=True)
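A note on the knobs in this diff: installer.py resolves most package choices from environment variables before falling back to its platform defaults, and the new `triton_command != 'skip'` guard makes `TRITON_COMMAND=skip` an explicit opt-out. A minimal sketch of driving these from a wrapper, assuming SD.Next's usual `launch.py` entry point; the torch spec below is an illustrative value, not a recommendation:

```python
# Hypothetical wrapper: pre-set installer knobs, then launch SD.Next.
# TORCH_COMMAND / TRITON_COMMAND are the variable names read by installer.py above.
import os
import subprocess

env = os.environ.copy()
env['TRITON_COMMAND'] = 'skip'  # bypass the separate triton install entirely
env['TORCH_COMMAND'] = 'torch==2.5.1 torchvision --index-url https://download.pytorch.org/whl/cu124'
subprocess.run(['python', 'launch.py'], check=True, env=env)
```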

javascript/base.css

+2 -2

@@ -1,4 +1,4 @@
-@font-face { font-family: 'NotoSans'; font-display: swap; font-style: normal; font-weight: 100; src: local('NotoSans'), url('notosans-nerdfont-regular.ttf') }
+@font-face { font-family: 'NotoSans'; font-display: swap; font-style: normal; font-weight: 100; src: local('NotoSansNerd'), url('notosans-nerdfont-regular.ttf') }
 
 /* toolbutton */
 .gradio-button.tool { max-width: min-content; min-width: min-content !important; align-self: end; font-size: 1.4em; color: var(--body-text-color) !important; }

@@ -77,7 +77,7 @@ table.settings-value-table td { padding: 0.4em; border: 1px solid #ccc; max-widt
 #extensions .info { margin: 0; }
 #extensions .date { opacity: 0.85; font-size: 90%; }
 
-/* extra networks */
+/* networks */
 .extra-networks > div { margin: 0; border-bottom: none !important; }
 .extra-networks .second-line { display: flex; width: -moz-available; width: -webkit-fill-available; gap: 0.3em; box-shadow: var(--input-shadow); margin-bottom: 2px; }
 .extra-networks .search { flex: 1; }

javascript/black-gray.css

+1 -1

@@ -1,5 +1,5 @@
 /* generic html tags */
-@font-face { font-family: 'NotoSans'; font-display: swap; font-style: normal; font-weight: 100; src: local('NotoSans'), url('notosans-nerdfont-regular.ttf') }
+@font-face { font-family: 'NotoSans'; font-display: swap; font-style: normal; font-weight: 100; src: local('NotoSansNerd'), url('notosans-nerdfont-regular.ttf') }
 :root, .light, .dark {
   --font: 'NotoSans';
   --font-mono: 'ui-monospace', 'Consolas', monospace;
