Create a Vimium fork to support multimodal models #5

ishan0102 · 2023-11-09T05:28:56Z

It might make sense to create a fork of Vimium designed specifically for making it easier for multimodal LLMs to choose relevant elements on a page. This might involve messing around with annotation colors, sizes, fonts, etc.

philc · 2023-11-09T22:51:26Z

Vimium author here. I have no opinion about whether to fork. I just heard about this project today and wanted to say, this is cool! Good luck!

ishan0102 · 2023-11-10T02:48:51Z

@philc Wow thank you so much, means a lot coming from you! I love your work!

asim-shrestha · 2023-11-11T20:54:15Z

We just open sourced a utility library that can tagify web pages for you: https://github.com/reworkd/tarsier

It could be a drop-in replacement for Vimium. We plan to support customizing tag appearance and positioning, if that's of interest.

aincube · 2023-11-13T09:45:43Z

Just an idea, but it may be possible to use qutebrowser:

written in Python (using QtWebEngine)
has vim-mode by default
has userscripts, a really neat feature for automating things

ishan0102 added the "enhancement" (New feature or request) label on Nov 9, 2023
