MMNet

PyTorch implementation of MMNet: A Multi-Scale Multimodal Model for End-to-End Grouping of Fragmented UI Elements.

Abstract

Graphical User Interface (GUI) designs often contain fragmented elements, which lead to inefficient and redundant code when the designs are converted automatically. This paper presents MMNet, a novel end-to-end model for grouping these fragmented elements, leveraging multimodal feature representations and a multi-scale retention mechanism to improve grouping accuracy. MMNet frames grouping as UI sequence prediction, enhanced by large multimodal models, and builds its UI encoder on a multi-scale retention mechanism. This approach captures temporal dependencies and multi-scale features, improving multimodal representation learning. To address the scarcity of fragmented UI element datasets, we collected and constructed our own dataset and enhanced its visual information using large multimodal models. Because UI design prototypes have complex contexts, it is challenging for models to learn the connections between different modalities; we adopt a multi-scale retention mechanism to further refine the relationship modeling between UI elements. Evaluated on our dataset of 71,851 UI elements, MMNet outperformed three state-of-the-art deep learning methods, demonstrating its effectiveness.
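
MMNet's actual encoder is defined in this repository's source. Purely to illustrate the mechanism named in the abstract, below is a minimal PyTorch sketch of a RetNet-style multi-scale retention block in its parallel form, Retention(X) = (QK^T ⊙ D)V, where D[n, m] = gamma^(n-m) for n >= m and each head uses its own decay rate gamma (hence "multi-scale"). All names, shapes, and hyperparameters here are illustrative assumptions, not taken from MMNet itself.

    import torch
    import torch.nn as nn

    class MultiScaleRetention(nn.Module):
        """Illustrative RetNet-style multi-scale retention (parallel form)."""

        def __init__(self, dim, num_heads=4):
            super().__init__()
            assert dim % num_heads == 0
            self.num_heads = num_heads
            self.head_dim = dim // num_heads
            self.q_proj = nn.Linear(dim, dim, bias=False)
            self.k_proj = nn.Linear(dim, dim, bias=False)
            self.v_proj = nn.Linear(dim, dim, bias=False)
            self.out_proj = nn.Linear(dim, dim, bias=False)
            # One decay rate per head, as in the RetNet paper: gamma = 1 - 2^(-5 - h).
            gammas = 1.0 - 2.0 ** (-5.0 - torch.arange(num_heads, dtype=torch.float32))
            self.register_buffer("gammas", gammas)

        def forward(self, x):
            b, n, _ = x.shape
            h, d = self.num_heads, self.head_dim
            q = self.q_proj(x).view(b, n, h, d).transpose(1, 2)   # (b, h, n, d)
            k = self.k_proj(x).view(b, n, h, d).transpose(1, 2)
            v = self.v_proj(x).view(b, n, h, d).transpose(1, 2)
            # Causal decay matrix D per head: gamma^(n-m) on/below the diagonal.
            idx = torch.arange(n, device=x.device)
            diff = (idx[:, None] - idx[None, :]).float()           # n - m
            decay = self.gammas.view(h, 1, 1) ** diff.clamp(min=0.0)
            decay = decay * (diff >= 0)                            # zero out future positions
            scores = q @ k.transpose(-1, -2) / d ** 0.5            # (b, h, n, n)
            out = (scores * decay) @ v                             # (b, h, n, d)
            return self.out_proj(out.transpose(1, 2).reshape(b, n, h * d))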

Results

Method            ACC    F1     Precision  Recall
EfficientNet      0.799  0.636  0.637      0.636
Swin Transformer  0.769  0.575  0.550      0.612
EGFE              0.853  0.738  0.735      0.748
MMNet (Ours)      0.890  0.760  0.773      0.757
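
For reference, the four metrics above can be computed from model predictions with scikit-learn. The sketch below assumes macro averaging over the element classes; the averaging scheme is an assumption, not something stated in this table.

    # Minimal metric sketch; macro averaging is an assumption.
    from sklearn.metrics import (accuracy_score, f1_score,
                                 precision_score, recall_score)

    def report(y_true, y_pred):
        return {
            "ACC": accuracy_score(y_true, y_pred),
            "F1": f1_score(y_true, y_pred, average="macro"),
            "Precision": precision_score(y_true, y_pred, average="macro"),
            "Recall": recall_score(y_true, y_pred, average="macro"),
        }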

Requirements

pip install -r requirements.txt

Usage

This is the PyTorch implementation of MMNet. It has been trained and tested on Linux (Ubuntu 20 + CUDA 11.6 + Python 3.9 + PyTorch 1.13 + an NVIDIA GeForce RTX 3090 GPU), and it can also run on Windows.

Getting Started

git clone https://github.com/ssea-lab/MMNet
cd MMNet

Train Our Model

  • Start training with

    torchrun --nnodes 1 --nproc_per_node 1  main.py --batch_size 10 --lr 5e-4
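
To train on several GPUs on a single node, raise --nproc_per_node to the GPU count. This is a sketch, not a configuration tested in this repository; the global batch size grows with the process count, so --batch_size may need retuning:

    torchrun --nnodes 1 --nproc_per_node 4 main.py --batch_size 10 --lr 5e-4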
    

Test Our Model

  • Start testing with

    torchrun --nnodes 1 --nproc_per_node 1  main.py --evaluate --resume ./work_dir/set-wei-05-0849/checkpoints/latest.pth --batch_size 40
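
To confirm a checkpoint loads before running evaluation, a quick sanity check is possible; this assumes the file is an ordinary torch.save artifact, as its key layout is not documented here:

    import torch

    # Load on CPU just to inspect the saved object (layout is an assumption).
    ckpt = torch.load("./work_dir/set-wei-05-0849/checkpoints/latest.pth",
                      map_location="cpu")
    print(type(ckpt))
    if isinstance(ckpt, dict):
        print(list(ckpt.keys()))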
    

Baselines for Fragmented UI Element Classification

EfficientNet

  • Start training with

    torchrun --nnodes 1 --nproc_per_node 1  efficient_main.py --batch_size 4 --lr 5e-4
    
  • Start testing with

    torchrun --nnodes 1 --nproc_per_node 1  efficient_main.py --evaluate --resume ./work_dir/efficient_net/latest.pth --batch_size 8
    

Swin Transformer

  • Start training with

    torchrun --nnodes 1 --nproc_per_node 1  sw_vit_main.py --batch_size 4 --lr 5e-4

  • Start testing with

    torchrun --nnodes 1 --nproc_per_node 1  sw_vit_main.py --evaluate --resume ./work_dir/swin/latest.pth --batch_size 8

Acknowledgements

The implementations of EfficientNet, Vision Transformer, and Swin Transformer are based on the following GitHub repositories. Thanks for their work.
