Analyse Images with Langflow #3725

webinarfabrik · 2024-09-08T16:15:05Z

webinarfabrik
Sep 8, 2024

I want to have OpenAI analyze images.
If I upload the attached PDF with the file component,
Salesianer nur 1 Folie.pdf

for example, I can pass this file to the "parse data" helper, but it only reads the text.
The PDF also contains an image that shows a forklift with some explanatory points.

How can I analyze this image?ChatGPT gives me the following (correct) answer:
Driver's overhead guard
Driver's seat
Counterweight
Engine hood
Frame
Wheels
Forks
Fork carriage
Load chain
Mast
Additionally, I can observe the following elements:
The forklift in gray and black.
Lights on the top part of the driver's overhead guard.
Clear industrial lines.
The forklift has a functional and stable design.

How do I solve the task in Langflow?

@webinarfabrik · 2024-09-08T16:16:11Z

dosubot[bot]
bot Sep 8, 2024

Hello @webinarfabrik! I'm here to help you with any bugs, questions, or contributions you may have. Let's work together to solve the issue at hand.

To analyze the image in the PDF using Langflow and identify the labeled components of the forklift, you can follow these steps:

Load the PDF File: Use the FileComponent to load the PDF file.
Extract Text and Images: Implement a custom component to extract text and images from the PDF.
Analyze the Image: Use an image analysis tool or API to identify and extract information about the labeled components of the forklift.
Create a Prompt: Use the PromptComponent to create a prompt template that will help in querying the extracted information.

Here is a sample code to achieve this:

from pathlib import Path
from typing import Any, Dict
from langflow.base.data.utils import TEXT_FILE_TYPES, parse_text_file_to_record
from langflow.interface.custom.custom_component import CustomComponent
from langflow.schema import Record
from langchain_core.prompts import PromptTemplate
from langflow.field_typing import Prompt, TemplateField, Text
from PIL import Image
import pytesseract

class FileComponent(CustomComponent):
    display_name = "Files"
    description = "A generic file loader."

    def build_config(self) -> Dict[str, Any]:
        return {
            "path": {
                "display_name": "Path",
                "field_type": "file",
                "file_types": TEXT_FILE_TYPES,
                "info": f"Supported file types: {', '.join(TEXT_FILE_TYPES)}",
            },
            "silent_errors": {
                "display_name": "Silent Errors",
                "advanced": True,
                "info": "If true, errors will not raise an exception.",
            },
        }

    def load_file(self, path: str, silent_errors: bool = False) -> Record:
        resolved_path = self.resolve_path(path)
        path_obj = Path(resolved_path)
        extension = path_obj.suffix[1:].lower()
        if extension == "doc":
            raise ValueError("doc files are not supported. Please save as .docx")
        if extension not in TEXT_FILE_TYPES:
            raise ValueError(f"Unsupported file type: {extension}")
        record = parse_text_file_to_record(resolved_path, silent_errors)
        self.status = record if record else "No data"
        return record or Record()

    def build(self, path: str, silent_errors: bool = False) -> Record:
        record = self.load_file(path, silent_errors)
        self.status = record
        return record

class ImageAnalysisComponent(CustomComponent):
    display_name = "Image Analysis"
    description = "Analyze images to extract labeled components."

    def analyze_image(self, image_path: str) -> Dict[str, Any]:
        image = Image.open(image_path)
        text = pytesseract.image_to_string(image)
        # Implement custom logic to identify and extract labeled components
        components = self.extract_components(text)
        return components

    def extract_components(self, text: str) -> Dict[str, Any]:
        # Custom logic to parse text and extract components
        components = {}
        lines = text.split('\n')
        for line in lines:
            if "Component" in line:
                key, value = line.split(':')
                components[key.strip()] = value.strip()
        return components

class PromptComponent(CustomComponent):
    display_name: str = "Prompt"
    description: str = "Create a prompt template with dynamic variables."
    icon = "prompts"

    def build_config(self):
        return {
            "template": TemplateField(display_name="Template"),
            "code": TemplateField(advanced=True),
        }

    def build(self, template: Prompt, **kwargs) -> Text:
        from langflow.base.prompts.utils import dict_values_to_string

        prompt_template = PromptTemplate.from_template(Text(template))
        kwargs = dict_values_to_string(kwargs)
        kwargs = {k: "\n".join(v) if isinstance(v, list) else v for k, v in kwargs.items()}
        try:
            formated_prompt = prompt_template.format(**kwargs)
        except Exception as exc:
            raise ValueError(f"Error formatting prompt: {exc}") from exc
        self.status = f'Prompt:\n"{formated_prompt}"'
        return formated_prompt

# Example usage
file_component = FileComponent()
record = file_component.build(path="path/to/your/forklift.pdf")

image_analysis_component = ImageAnalysisComponent()
components_info = image_analysis_component.analyze_image(image_path="path/to/extracted/image.png")

prompt_component = PromptComponent()
prompt = prompt_component.build(template="Identify the labeled components of the forklift:\n{components_info}", components_info=components_info)
print(prompt)

This code demonstrates how to load a PDF file, extract images, analyze the images to identify labeled components, and create a prompt to query the extracted information. Adjust the paths and implement the custom logic for extracting components as needed ^{[[1]](https://github.com/logspace-ai/langflow/blob/dev/src/backend/base/langflow/initial_setup/starter_projects/Langflow Document QA.json)}.

To continue talking to Dosu, mention @dosu.

0 replies

webinarfabrik · 2024-09-08T17:07:32Z

webinarfabrik
Sep 8, 2024
Author

@dosu
Thank you very much for your quick and comprehensive help.
BUT - I just saw that I probably won't be working with Langflow any more.
Why... I'm a NoCodeGuy and I probably underestimated the complexity behind the tool.
On the one hand, I would have a lot of uses for Langflow in my company...on the other hand, I don't even know what exactly I should do with the code you generated to make it work.
Thanks again for your help
Greetings from Vienna

1 reply

dosubot[bot] bot Sep 8, 2024

Greetings from Vienna! I understand that diving into code can be overwhelming, especially if you're more comfortable with no-code solutions. If you're looking for a simpler way to achieve your goals without diving deep into coding, you might consider using tools like Zapier or Integromat, which offer integrations with various services, including image analysis APIs.

For Langflow, you might want to explore their documentation or community forums for more user-friendly guides and examples. Sometimes, there are pre-built components or workflows that can simplify the process for you.

If you decide to give it another try or need further assistance, feel free to reach out. Best of luck with your projects!

To continue talking to Dosu, mention @dosu.

jtoy · 2024-11-12T20:25:24Z

jtoy
Nov 12, 2024

I have the same question, does langflow support ingesting images and sending it to an LLM for analysis?

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Analyse Images with Langflow #3725

{{title}}

Replies: 3 comments 1 reply

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Analyse Images with Langflow #3725

webinarfabrik Sep 8, 2024

Replies: 3 comments · 1 reply

dosubot[bot] bot Sep 8, 2024

webinarfabrik Sep 8, 2024 Author

dosubot[bot] bot Sep 8, 2024

jtoy Nov 12, 2024

webinarfabrik
Sep 8, 2024

Replies: 3 comments 1 reply

dosubot[bot]
bot Sep 8, 2024

webinarfabrik
Sep 8, 2024
Author

jtoy
Nov 12, 2024