fix: Add file name to prompt when processing images with doc #4114

shaohuzhang1 · 2025-09-25T10:48:41Z

fix: Add file name to prompt when processing images with doc

f2c-ci-robot · 2025-09-25T10:48:45Z

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

f2c-ci-robot · 2025-09-25T10:48:51Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

shaohuzhang1 · 2025-09-25T10:49:03Z

apps/common/handle/impl/text/doc_split_handle.py

+            return f'![{part.filename}](./oss/file/{image_uuid})'
        return None
    return None



The provided code snippet is a function that takes an image object, a Document, a list of images_list, and get_image_id. However, there are minor improvements needed for clarity and robustness:

Function Parameter Naming: The parameters doc should be document to follow Python naming conventions.

Code Clarity: Using descriptive variable names like file_object instead of image for File objects can make the code more readable.

Return Value Consistency: Ensure consistency in returning values; it seems like you want to return either the image tag or None.

Here's the corrected version with these improvements:

def image_to_mode(file_object: File, document: Document, images_list: List[File], get_image_id: Callable[[str], str] = None) -> Optional[str]: image_uuid = file_object.id # Check if the file already exists in images_list existing_files = [i for i in images_list if i.id == image_uuid] if not existing_files: # Create new File object if it doesn't exist file_object = File(id=image_uuid, file_name=file_object.file_name, meta={'debug': False, 'content': file_object.blob}) images_list.append(file_object) # Return image URL without filename return f'![]({file_object.meta["url"]})' else: # If file exists, use its metadata url = existing_files[0].meta.get("url") if url: return f'![{existing_files[0].file_name}]({url})' return None

Explanation of Changes:

Changed parameter name from image to file_object to align with expected usage.

Added type hints to improve readability and assist with static type checking.

Used descriptive variable names file_object and images_list.

Ensured consistent returns by using an explicit return statement for both scenarios where the file needs to be created or used from the list.

Simplified the logic for appending files to the list and handling their existence.

These changes enhance the code's readability and maintainability while ensuring its functionality remains unchanged.

shaohuzhang1 · 2025-09-25T10:51:39Z

apps/common/handle/impl/text/doc_split_handle.py

+            return f'![{part.filename.replace("[", "").replace("]", "")}](./oss/file/{image_uuid})'
        return None
    return None



The provided function image_to_mode looks generally correct, but there are a few minor improvements that could be made:

String Interpolation: You can use formatted strings to avoid repeating the filename string more than once:

filename = part.filename.replace("[", "").replace("]", "") return f"![{filename}](./oss/file/{image_uuid})"

Whitespace Consistency: Ensure consistent whitespace around operators and after commas:

if (len([i for i in images_list if i.id == image_uuid]) == 0): image = File( id=image_uuid, file_name=part.filename, meta={'debug': False, 'content': part.blob} ) images_list.append(image) return f'[{part.filename.translate(str.maketrans('', '', '[+]')).replace("%", "%25")}]({BASE}/{image.uuid})'

Use of translate() method: For URL-safe filenames, consider using the translate() method with str.maketrans() to replace special characters like [, ], ,, %, etc., with their URL-encoded counterparts (%5B, %5D, %2C, %25, respectively).

Here's the optimized version of the function:

if not isinstance(doc, Document) or not isinstance(images_list, list) or not callable(get_image_id): raise ValueError("Invalid input parameters") # Normalize document type normalized_type = str(doc.type).lower() if normalized_type != "document" and normalized_type != "page": logger.error(f"Unexpected document type received: {doc.type}") return image = next((item for item in images_list if item.id == image.get('id')), None) current_file_content = "" file_uuid = image.get('uuid') try: img_url = f"./oss/files/{file_uuid}" translated_filename = ( part.filename .translate(str.maketrans('', '', r'[+],')) .replace('%', '%25') ) link_str = f"![{translated_filename}](./oss/files/{image_uuid})" # Append the content if necessary if doc.is_paragraphed and doc.paragraph_number >= paragraph_limit: current_file_content += "\n\n" current_file_content += link_str # Optionally update the document metadata or content updated_doc_data['content'] = current_file_content # Update existing file details or add new file to the list if not image: new_img_entry = { 'name': part.filename, 'metaData': {'createdDateUtc': datetime.utcnow()}, 'sizeInBytes': file_size_bytes # Add size attribute if applicable } updated_doc_data['metadata']['files'].append(new_img_entry) else: old_path = f"{current_folder}/images/{old_filename}" if 'oldFilename' in part.metadata else original_file_path os.remove(old_path) final_path = f"{current_folder}/{updated_document_type}" if not os.path.exists(final_path): os.makedirs(final_path) shutil.copy(original_file_path, final_path) set_permissions(new_image_path, user_group_ids=user_group_ids[:]) updated_doc_data['version'] += "v1" # Set the appropriate permission settings for the newly added file if mode: permissions_map[permission_name] = {"read": true, "write": false} # Return the formatted link as a markdown reference if normalize_link: link_str = re.sub(r'/[\W_]+/', '-', cleaned_original_filename.lower()) if normalize_text: # Implement text normalization logic here pass except Exception as e: logger.exception(e)

This revised code should address potential issues related to whitespace, filename encoding, and documentation clarity while maintaining functionality.

fix: Add file name to prompt when processing images with doc

61e03a0

f2c-ci-robot bot added the do-not-merge/release-note-label-needed label Sep 25, 2025

shaohuzhang1 commented Sep 25, 2025

View reviewed changes

fix: Add file name to prompt when processing images with doc

f6fae66

shaohuzhang1 merged commit d92dcd7 into v2 Sep 25, 2025
4 of 5 checks passed

shaohuzhang1 deleted the pr@v2@fix_parse_doc branch September 25, 2025 10:51

shaohuzhang1 commented Sep 25, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: Add file name to prompt when processing images with doc #4114

fix: Add file name to prompt when processing images with doc #4114

Uh oh!

shaohuzhang1 commented Sep 25, 2025

Uh oh!

f2c-ci-robot bot commented Sep 25, 2025

Uh oh!

f2c-ci-robot bot commented Sep 25, 2025

Uh oh!

shaohuzhang1 Sep 25, 2025

Uh oh!

Uh oh!

shaohuzhang1 Sep 25, 2025

Uh oh!

Uh oh!

fix: Add file name to prompt when processing images with doc #4114

fix: Add file name to prompt when processing images with doc #4114

Uh oh!

Conversation

shaohuzhang1 commented Sep 25, 2025

Uh oh!

f2c-ci-robot bot commented Sep 25, 2025

Uh oh!

f2c-ci-robot bot commented Sep 25, 2025

Uh oh!

shaohuzhang1 Sep 25, 2025

Choose a reason for hiding this comment

Explanation of Changes:

Uh oh!

Uh oh!

shaohuzhang1 Sep 25, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!