`webllama.experimental` is the new experimental API for working with webllama models. It will eventually be moved into `webllama` directly, once the API is deemed stable.
# Pin a specific version (e.g. 0.1.0) so that future breaking
# changes do not break your code.
pip install webllama=="<version>"
If you want to run the subsequent scripts, which rely on existing weblinx demonstrations, you will first need to download the test demonstrations:
mkdir -p tests/demonstrations
curl -L -o tests/demonstrations/aaabtsd.zip https://github.com/McGill-NLP/weblinx/releases/download/tests-assets/aaabtsd.zip
unzip -u tests/demonstrations/aaabtsd.zip -d tests/demonstrations
curl -L -o tests/demonstrations/aajfwoq.zip https://github.com/McGill-NLP/weblinx/releases/download/tests-assets/aajfwoq.zip
unzip -u tests/demonstrations/aajfwoq.zip -d tests/demonstrations
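If you prefer to download the demonstrations from Python instead of the shell, here is a minimal sketch using only the standard library (same URLs as above):
import pathlib
import urllib.request
import zipfile

base = "https://github.com/McGill-NLP/weblinx/releases/download/tests-assets"
dest = pathlib.Path("tests/demonstrations")
dest.mkdir(parents=True, exist_ok=True)
for name in ["aaabtsd", "aajfwoq"]:
    zip_path = dest / f"{name}.zip"
    urllib.request.urlretrieve(f"{base}/{name}.zip", zip_path)
    # Extract in place, mirroring `unzip -u ... -d tests/demonstrations`
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)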
To install:
pip install webllama
# if you want to install transformers, pytorch and sentence-transformers, run:
pip install webllama[modeling]
First, you will need to construct your own `action_history` and `state` using `webllama.experimental.classes` (a complete example of constructing an `Action` manually appears near the end of this document):
import webllama.experimental as wa
# Create your action history and state!
action_history = [
wa.classes.Action(...), # ...
]
state = wa.classes.State(...)
You will also need to load your `dmr` and `act_model` models. For example, you can use `transformers` and `sentence-transformers` to load them:
from sentence_transformers import SentenceTransformer
from transformers import pipeline
# You can choose your own DMR model and action model
dmr_name = "McGill-NLP/MiniLM-L6-dmr"
action_model_name = "McGill-NLP/Sheared-LLaMA-2.7B-weblinx"
act_model = pipeline(model=action_model_name, device=0, torch_dtype="auto")
dmr = SentenceTransformer(dmr_name, device="cuda")
Now, inside a Python script, you can use the `webllama.experimental.processing` module to seamlessly use `Action` and `State` with the action model and DMR, and to process the output:
import webllama.experimental as wa
# We will initialize our processor, which helps us prepare the input for the action model
proc = wa.processing.WebTurnProcessor(tokenizer=act_model.tokenizer)
# Step 1: prepare query, run DMR and prepare retrieved candidates
query_dmr = proc.prepare_dmr_query(action_history, state)
elems = proc.prepare_dmr_elements(state=state)
scores = wa.functions.compute_dmr_scores(dmr, query_dmr, elems)
top_cands, cands_uids = wa.functions.get_top_dmr_candidates(elems, scores, proc.top_k)
cands_str = proc.prepare_candidates(top_cands)
# Step 2: format candidates, utterances, state, and previous actions
html = proc.prepare_state_html(state.html, cands_uids=cands_uids)
utterances = proc.prepare_instructor_chat(action_history, state)
prev_actions = proc.prepare_prev_actions(action_history, state)
# Let's use the default system prompt template, but you can also use your own
sys_prompt_template: str = proc.default_system_prompt_template
sys_prompt = sys_prompt_template.format(
html=html,
utterances=utterances,
candidates=cands_str,
# ...
)
input_chat = proc.convert_to_chat_list(sys_prompt, prev_actions)
# Use your tokenizer to convert the input to a string and pass it to the action model
input_str = act_model.tokenizer.apply_chat_template(input_chat, tokenize=False)
output = act_model(input_str, ...)
pred_action = proc.process_action_model_output(output, state.index, elems)
a = wa.classes.Action.from_dict(pred_action)
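Finally, you can append the predicted action to your history so that the next turn takes it into account (the full example below does the same):
action_history.append(a)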
Here's a full, self-contained example of how to use `webllama.experimental` to interact with a web page using a DMR model and an action model:
from functools import partial
import time
import logging
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer, pipeline
import weblinx as wl
import webllama.experimental as wa
logging.getLogger("urllib3").setLevel(logging.WARNING)
# Prerequisite: We need a `wa.classes.State` object and a list of `wa.classes.Action` objects
# To get that, we will use an example from weblinx, but it's easy to do manually (see below).
demos = wl.list_demonstrations("tests/demonstrations")
replay = wl.Replay.from_demonstration(demos[0])
turn = replay[26]
format_intent_am = partial(
wa.formatting.build_formatters_action_model(), return_as=dict
)
format_intent_input_dmr, format_intent_out_dmr = wa.formatting.build_formatters_dmr()
action_history = wa.functions.create_action_history_from_replay(
replay, format_intent_input_dmr, format_intent_am, stop_at_idx=turn.index
)
state = wa.classes.State(
index=turn.index,
html=turn.html,
bboxes=turn.bboxes,
viewport_height=turn.viewport_height,
viewport_width=turn.viewport_width,
type=turn.type,
)
# Now, we can start!
# First, load the DMR model we will use to select candidate elements
dmr_name = "McGill-NLP/MiniLM-L6-dmr"
action_model_name = "McGill-NLP/Sheared-LLaMA-2.7B-weblinx"
tokenizer_chat_name = "McGill-NLP/Llama-2-7b-chat-weblinx"
tokenizer_chat = AutoTokenizer.from_pretrained(tokenizer_chat_name)
act_model = pipeline(model=action_model_name, device=0, torch_dtype="auto")
dmr = SentenceTransformer(dmr_name, device="cuda")
# We will initialize our processor, which helps us prepare the input for the action model
proc = wa.processing.WebTurnProcessor(tokenizer=act_model.tokenizer, start_time=time.time())
# Step 1: prepare query, run DMR and prepare retrieved candidates
query_dmr = proc.prepare_dmr_query(action_history, state)
elems = proc.prepare_dmr_elements(state=state)
scores = wa.functions.compute_dmr_scores(dmr, query_dmr, elems)
top_cands, cands_uids = wa.functions.get_top_dmr_candidates(elems, scores, proc.top_k)
cands_str = proc.prepare_candidates(top_cands)
# Step 2: format candidates, utterances, state, and previous actions
html = proc.prepare_state_html(state.html, cands_uids=cands_uids)
utterances = proc.prepare_instructor_chat(action_history, state)
prev_actions = proc.prepare_prev_actions(action_history, state)
# Let's use the default system prompt template, but you can also use your own
sys_prompt_template: str = proc.default_system_prompt_template
sys_prompt = sys_prompt_template.format(
html=html,
num_utterances=proc.num_utterances - 1,
utterances=utterances,
height=state.viewport_height,
width=state.viewport_width,
num_prev_actions=proc.num_prev_actions,
candidates=cands_str,
)
input_chat = proc.convert_to_chat_list(sys_prompt, prev_actions)
# We can now use the tokenizer's apply_chat_template method to convert it to a format
# that can be used by the action model
input_str = tokenizer_chat.apply_chat_template(input_chat, tokenize=False)
# Let's now pass our input to the action model
output = act_model(
input_str,
max_new_tokens=256,
return_full_text=False,
batch_size=1,
pad_token_id=tokenizer_chat.eos_token_id,
)
pred_action = proc.process_action_model_output(
output=output, index=state.index, elems=elems
)
# Optional: for certain platforms, you may need to postprocess the action
pred_action = wa.integrations.browsergym.postprocess_for_browsergym(pred_action)
print(pred_action)
# You can now convert this to an Action object and add it to the action history
a = wa.classes.Action.from_dict(pred_action)
action_history.append(a)
To run the tests:
python -m unittest discover -s tests
To launch the default server:
# If you do not want to save logs, omit `--save_logs`
python -m webllama.experimental.web.server --save_logs
To create your own server, simply inherit from `Server` and override `initialize` and `run`:
import json
import logging

from webllama.experimental.web.server import Server
from webllama.experimental.classes import Action, State

# Initialize logging
logging.basicConfig(level=logging.INFO)

class CustomServer(Server):
    # Override initialize and run
    def initialize(self, dmr_name, action_model_name, device, dmr_device, am_device, torch_dtype):
        # Initialize your models (DMR and action model) here
        ...

    def run(self, action_history_json, state_json):
        # ...
        pred_action = {
            # ...
        }
        return json.dumps(pred_action)
To connect to a server running on a remote machine, you can forward its port over SSH:
ssh -N -L 8450:localhost:8450 user@server
# Example:
ssh -N -L 8450:localhost:8450 nlp-gpu-2
You can send HTTP requests directly to the web server, or use the client.
Example of an HTTP request in Python:
import http.client
import json
from functools import partial
import webllama.experimental as wa
import weblinx as wl
# Prerequisite: We need a `wa.classes.State` object and a list of `wa.classes.Action` objects
demos = wl.list_demonstrations("tests/demonstrations")
replay = wl.Replay.from_demonstration(demos[0])
turn = replay[26]
format_intent_am = partial(
wa.formatting.build_formatters_action_model(), return_as=dict
)
format_intent_input_dmr, format_intent_out_dmr = wa.formatting.build_formatters_dmr()
action_history = wa.functions.create_action_history_from_replay(
replay, format_intent_input_dmr, format_intent_am, stop_at_idx=turn.index
)
state = wa.classes.State(
index=turn.index,
html=turn.html,
bboxes=turn.bboxes,
viewport_height=turn.viewport_height,
viewport_width=turn.viewport_width,
type=turn.type,
)
# Serialize the action history and state to plain dicts
# (this assumes the classes expose a `to_dict` method mirroring `from_dict`)
action_history_dict = [act.to_dict() for act in action_history]
state_dict = state.to_dict()
# Create a connection to localhost on the port where your server is running
conn = http.client.HTTPConnection('localhost', 8450)
# Prepare the POST request data
post_data = json.dumps({
    'action_history': action_history_dict,
    'state': state_dict
})
headers = {'Content-Type': 'application/json'}
# Send a POST request with JSON data
conn.request("POST", "/", body=post_data, headers=headers)
response = conn.getresponse()
print(f"Status: {response.status}")
print(f"Reason: {response.reason}")
print(f"Body: {response.read().decode()}")
response.close()
# Close the connection
conn.close()
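The response body is the predicted action serialized as JSON (see the `run` method of the server above). If you want to reuse it rather than just print it, read the body once and convert it back into an `Action`, replacing the three print statements with something like:
# The body can only be read once, so keep a reference to it
body = response.read().decode()
pred_action = json.loads(body)
# Convert the dict back into an Action, as in the local pipeline
a = wa.classes.Action.from_dict(pred_action)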
A high-level client is provided in `webllama.experimental.web.client`. You can use it as follows:
from functools import partial
import webllama.experimental as wa
import weblinx as wl
# Prerequisite: We need a `wa.classes.State` object and a list of `wa.classes.Action` objects
demos = wl.list_demonstrations("tests/demonstrations")
replay = wl.Replay.from_demonstration(demos[0])
turn = replay[26]
format_intent_am = partial(
wa.formatting.build_formatters_action_model(), return_as=dict
)
format_intent_input_dmr, format_intent_out_dmr = wa.formatting.build_formatters_dmr()
action_history = wa.functions.create_action_history_from_replay(
replay, format_intent_input_dmr, format_intent_am, stop_at_idx=turn.index
)
state = wa.classes.State(
index=turn.index,
html=turn.html,
bboxes=turn.bboxes,
viewport_height=turn.viewport_height,
viewport_width=turn.viewport_width,
type=turn.type,
)
# Now, we can start!
pred_action = wa.web.client.get_prediction(
action_history, state, address="localhost", port=8450, max_new_tokens=128
)
pred_action = wa.integrations.browsergym.postprocess_for_browsergym(pred_action)
print(pred_action)
a = wa.classes.Action.from_dict(pred_action)
print(a)
Note: This section is a work in progress. As an example, here is how to manually construct an instructor utterance as a `wa.classes.Action`:
utterance_instructor = wa.classes.Action(
type="chat",
intent="say",
index=2,
args=dict(
speaker="instructor", utterance="Open independent ie Website.", x=None, y=None
),
timestamp=13.234,
tag=None,
attrs=None,
)
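Similarly, you can construct a `State` manually. This is a minimal sketch using the same fields as the weblinx-based examples above; the index, HTML, bounding boxes, viewport size, and `type` value are illustrative placeholders:
state = wa.classes.State(
    index=3,
    html="<html>...</html>",  # raw HTML of the current page
    bboxes={},  # bounding boxes of page elements (left empty here for brevity)
    viewport_height=720,
    viewport_width=1280,
    type="browser",
)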
To be added.
For more information on contributing, please check out the contributing docs.