PyTorch implementation of MMNet: A Multi-Scale Multimodal Model for End-to-End Grouping of Fragmented UI Elements.
Graphical User Interface (GUI) design prototypes often contain fragmented elements, which lead to inefficient and redundant code when they are automatically converted. This paper presents MMNet, a novel end-to-end model that groups these fragmented elements by leveraging multimodal feature representations and a multi-scale retention mechanism to improve grouping accuracy. MMNet casts grouping as UI sequence prediction and builds a UI encoder on features enhanced by large multimodal models, capturing temporal dependencies and multi-scale features to improve multimodal representation learning. To address the scarcity of fragmented UI element datasets, we collected and constructed our own dataset and enriched its visual information with large multimodal models. Because the complex context of UI design prototypes makes it challenging for models to learn the connections between different modalities, we adopt a multi-scale retention mechanism to further refine the relationship modeling between UI elements. Evaluated on our dataset of 71,851 UI elements, MMNet outperforms three state-of-the-art deep learning methods, demonstrating its effectiveness.
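The multi-scale retention mechanism is presumably in the spirit of Retentive Networks: attention-like token mixing with a per-head exponential decay, so that different heads retain context over different ranges. The sketch below is a minimal, illustrative parallel-form implementation of that idea, not the code from this repository; the class and parameter names (`MultiScaleRetention`, `dim`, `num_heads`) are hypothetical.

```python
import torch
import torch.nn as nn

class MultiScaleRetention(nn.Module):
    """Illustrative RetNet-style multi-scale retention (parallel form)."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.out_proj = nn.Linear(dim, dim, bias=False)
        # One decay rate per head: gammas close to 1 retain long-range
        # context, smaller gammas focus on local context ("multi-scale").
        gammas = 1.0 - 2.0 ** (-5.0 - torch.arange(num_heads, dtype=torch.float))
        self.register_buffer("gammas", gammas)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        shape = (b, t, self.num_heads, self.head_dim)
        q = self.q_proj(x).view(shape).transpose(1, 2) * self.head_dim ** -0.5
        k = self.k_proj(x).view(shape).transpose(1, 2)   # (b, h, t, d)
        v = self.v_proj(x).view(shape).transpose(1, 2)
        # Causal decay mask: D[n, m] = gamma ** (n - m) for n >= m, else 0.
        idx = torch.arange(t, device=x.device)
        rel = (idx.view(-1, 1) - idx.view(1, -1)).clamp(min=0).float()
        decay = self.gammas.view(-1, 1, 1) ** rel        # (h, t, t)
        decay = decay * (idx.view(-1, 1) >= idx.view(1, -1))
        ret = (q @ k.transpose(-1, -2)) * decay          # (b, h, t, t)
        out = (ret @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(out)
```

In an encoder this block would be stacked with feed-forward layers; the original retention formulation additionally applies rotary-style position scaling and group normalization, which are omitted here for brevity.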
Results on our dataset of 71,851 UI elements:

| Method | Accuracy | F1 | Precision | Recall |
| --- | --- | --- | --- | --- |
| EfficientNet | 0.799 | 0.636 | 0.637 | 0.636 |
| Swin Transformer | 0.769 | 0.575 | 0.550 | 0.612 |
| EGFE | 0.853 | 0.738 | 0.735 | 0.748 |
| MMNet (Ours) | 0.890 | 0.760 | 0.773 | 0.757 |
MMNet has been trained and tested on Linux (Ubuntu 20.04 + CUDA 11.6 + Python 3.9 + PyTorch 1.13 + an NVIDIA GeForce RTX 3090 GPU), and it can also work on Windows.

Clone the repository and install the dependencies:

```
git clone https://github.com/ssea-lab/MMNet
cd MMNet
pip install -r requirements.txt
```
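After installation, a quick sanity check (standard PyTorch calls, nothing repo-specific) confirms that the installed build matches the tested environment:

```python
import torch

# The tested environment is Python 3.9 + PyTorch 1.13 + CUDA 11.6.
print(torch.__version__)          # expect something like 1.13.x
print(torch.version.cuda)         # expect 11.6
print(torch.cuda.is_available())  # True if the GPU is visible
```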
Train and test MMNet:

- Start to train with

  ```
  torchrun --nnodes 1 --nproc_per_node 1 main.py --batch_size 10 --lr 5e-4
  ```

- Start to test with

  ```
  torchrun --nnodes 1 --nproc_per_node 1 main.py --evaluate --resume ./work_dir/set-wei-05-0849/checkpoints/latest.pth --batch_size 40
  ```
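Here `--resume` points at a checkpoint saved during training. The snippet below is a generic sketch of how such a `.pth` checkpoint is typically restored in PyTorch; the key name `"model"` is an assumption about the checkpoint layout, not necessarily what main.py writes.

```python
import torch
from torch import nn

def load_checkpoint(model: nn.Module, path: str) -> nn.Module:
    """Restore model weights from a .pth checkpoint (generic sketch)."""
    ckpt = torch.load(path, map_location="cpu")
    # Fall back to treating the file as a raw state_dict if there is no
    # "model" key (assumed layout, may differ from this repo's scripts).
    state_dict = ckpt.get("model", ckpt)
    model.load_state_dict(state_dict)
    return model
```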
The two baselines are trained and tested the same way, with their own entry points.

EfficientNet:

- Start to train with

  ```
  torchrun --nnodes 1 --nproc_per_node 1 efficient_main.py --batch_size 4 --lr 5e-4
  ```

- Start to test with

  ```
  torchrun --nnodes 1 --nproc_per_node 1 efficient_main.py --evaluate --resume ./work_dir/efficient_net/latest.pth --batch_size 8
  ```

Swin Transformer:

- Start to train with

  ```
  torchrun --nnodes 1 --nproc_per_node 1 sw_vit_main.py --batch_size 4 --lr 5e-4
  ```

- Start to test with

  ```
  torchrun --nnodes 1 --nproc_per_node 1 sw_vit_main.py --evaluate --resume ./work_dir/swin/latest.pth --batch_size 8
  ```
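All of the commands above use torchrun, which spawns one process per GPU and passes rank information through environment variables such as `LOCAL_RANK`. The snippet below shows the standard PyTorch distributed setup that torchrun-launched scripts typically perform at startup; it is a generic sketch, not code from this repository.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_distributed(model: torch.nn.Module) -> DDP:
    """Standard setup for a torchrun-launched training process (sketch)."""
    dist.init_process_group(backend="nccl")     # torchrun supplies rank/world size via env vars
    local_rank = int(os.environ["LOCAL_RANK"])  # index of this process's GPU on the node
    torch.cuda.set_device(local_rank)
    return DDP(model.cuda(local_rank), device_ids=[local_rank])
```

With `--nnodes 1 --nproc_per_node 1` as in the commands above, this reduces to single-GPU training; raising `--nproc_per_node` scales to multiple GPUs without changing the script.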
The implementations of EfficientNet, Vision Transformer, and Swin Transformer are based on the following GitHub repositories. Thanks for their work.
- EfficientNet: https://github.com/lukemelas/EfficientNet-PyTorch
- Swin Transformer: https://github.com/microsoft/Swin-Transformer