Commit
Select among photo or screenshot caption
Showing 6 changed files with 138 additions and 22 deletions.
@@ -0,0 +1,36 @@
import ast
import json
from typing import AnyStr

from pydantic import BaseModel, Field


class ScreenshotResult(BaseModel):
    """Information about an image. This class provides a schema for storing and validating image-related information
    using Pydantic's data validation features.
    """

    open_applications: list[str] = Field(description="List of open applications")
    docs_description: list[str] = Field(description="List of document descriptions")
    web_pages: str = Field(description="Description of visible web pages")
    user_response: str = Field(description="A response to the user question")

    @staticmethod
    def of(image_caption: AnyStr) -> "ScreenshotResult":
        """Parses a string into a ScreenshotResult instance with enhanced handling for mixed quotes.
        :param image_caption: The string to parse.
        :return: An instance of ScreenshotResult populated with the parsed data.
        :raises ValueError: If the string cannot be parsed as a Python object or JSON.
        """

        try:
            parsed_data = ast.literal_eval(image_caption)
        except (ValueError, SyntaxError):
            try:
                parsed_data = json.loads(image_caption)
            except json.JSONDecodeError as e_json:
                raise ValueError("String could not be parsed as Python object or JSON.") from e_json
        try:
            return ScreenshotResult(**parsed_data)
        except Exception as e_pydantic:
            raise ValueError("Parsed data does not conform to ScreenshotResult schema.") from e_pydantic
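
The two-stage parse in `ScreenshotResult.of` can be illustrated in isolation. The sketch below is not part of the commit: `parse_caption` is a hypothetical, stdlib-only helper that mirrors only the fallback order (Python literal first, then JSON) and leaves out the Pydantic validation step. It shows why both parsers are tried: a caption written as a Python dict (single quotes) is accepted by `ast.literal_eval`, while strict JSON containing `true`, `false`, or `null` is rejected by it and falls through to `json.loads`.

```python
import ast
import json
from typing import Any


def parse_caption(text: str) -> Any:
    """Hypothetical helper mirroring the fallback order used in ScreenshotResult.of."""
    try:
        # Accepts Python-literal syntax: single quotes, True/False/None.
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        try:
            # Accepts strict JSON: double quotes, true/false/null.
            return json.loads(text)
        except json.JSONDecodeError as exc:
            raise ValueError("String could not be parsed as Python object or JSON.") from exc


# A caption in Python dict style (single quotes), handled by ast.literal_eval.
python_style = "{'open_applications': ['Firefox'], 'web_pages': 'A search page'}"
# A caption in JSON style; the bare `true` makes literal_eval fail, so json.loads is used.
json_style = '{"open_applications": ["Firefox"], "visible": true}'

parsed_python = parse_caption(python_style)
parsed_json = parse_caption(json_style)
```

Input that neither parser accepts ends in a `ValueError`, which matches the error contract documented in the method's docstring.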
@@ -0,0 +1,27 @@
You are an Image Captioner specialized in describing Screenshots.

**Instructions:**

Given the provided screenshot, please perform the following tasks:

1. **Identify Open Applications:**
   - **List all open applications** visible in the screenshot.

2. **Detailed Descriptions of Documents:**
   - For each open document identified, provide a comprehensive description including:
     - **Page Number**: Indicate the current page number.
     - **Header/Footer**: Describe any headers or footers present.
     - **Headlines**: Summarize the main headlines or titles.
     - **Content Overview**: Provide an overview of the document's content.

3. **Detailed Descriptions of Web Pages:**
   - **List all open websites** visible in the screenshot.
   - For each website, include:
     - **Website Description**: Offer a detailed description of the website's purpose and content.
     - **Identified URLs**: Mention any URLs or web addresses visible.

4. **Respond to Human Questions (If Provided):**
   - If a **Human Question** is provided at the end of the screenshot, **provide a clear and concise response** to it.


Human Question: "{question}"
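
The template ends with a `{question}` placeholder, but the hunks shown here do not include the code that fills it. A minimal sketch of one way to do the substitution follows, assuming the template is loaded as a plain string. `PROMPT_TEMPLATE` is an abbreviated stand-in for the file above and `build_prompt` is a hypothetical helper, neither taken from the commit.

```python
# Abbreviated stand-in for the prompt file; only the first and last lines are reproduced.
PROMPT_TEMPLATE = (
    "You are an Image Captioner specialized in describing Screenshots.\n"
    "\n"
    'Human Question: "{question}"'
)


def build_prompt(question: str, template: str = PROMPT_TEMPLATE) -> str:
    """Hypothetical helper: substitute the user's question into the template."""
    # str.replace is used instead of str.format so that braces elsewhere in the
    # template or in the question itself are left untouched.
    return template.replace("{question}", question)


prompt = build_prompt("Which browser tabs are open?")
```

The model's reply to such a prompt would then be the string passed to `ScreenshotResult.of`.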