Great work! May I ask whether you have considered adding to the dataset the bounding box (bbox) of the element that each action targets in the screenshot? That would be quite useful for CV-based methods. #15

XuRui314 · 2024-03-26T15:16:31Z


Like the red bbox here:


xhluca · 2024-03-26T21:34:03Z

Yes, the bboxes/bbox-*.json files are part of the dataset (see weblinx-full on Hugging Face); they map each element ID to its coordinates. The target element ID can be found in metadata.json. If you want a tutorial, check out modeling/llama/eval.py or the new Colab notebook: https://colab.research.google.com/github/McGill-NLP/weblinx/blob/main/examples/WebLINX_Colab_Notebook.ipynb
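The lookup described above (target element ID, then its coordinates in a bbox file) can be sketched roughly as follows. This is a minimal sketch, not the WebLINX API: the function names are hypothetical, and it assumes each bbox-*.json is a JSON object mapping an element ID to a dict with "x", "y", "width" and "height" in screenshot pixels. Check the actual schema in weblinx-full, and read the element ID from metadata.json yourself, before relying on it.

```python
import json

# ASSUMPTION: a bbox-*.json file is a JSON object of the form
#   {"<element_id>": {"x": ..., "y": ..., "width": ..., "height": ...}, ...}
# The key names are a guess; verify them against weblinx-full.


def load_bboxes(path):
    """Load one bbox-*.json file into a dict {element_id: bbox_dict}."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def target_box(bboxes, element_id):
    """Return (left, top, right, bottom) for element_id, or None if absent.

    element_id is the target element ID taken from metadata.json
    (how to extract it is not shown here).
    """
    bbox = bboxes.get(element_id)
    if bbox is None:
        return None
    left, top = bbox["x"], bbox["y"]
    return (left, top, left + bbox["width"], top + bbox["height"])
```

With Pillow installed, a box returned this way can be drawn on the matching screenshot with ImageDraw.Draw(img).rectangle(box, outline="red"), which gives the kind of red bbox shown in the question.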


Btw, I've converted this from an issue to a discussion.

Thx for sharing :)