Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion apps/common/handle/impl/text/doc_split_handle.py
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,7 @@ def image_to_mode(image, doc: Document, images_list, get_image_id):
if len([i for i in images_list if i.id == image_uuid]) == 0:
image = File(id=image_uuid, file_name=part.filename, meta={'debug': False, 'content': part.blob})
images_list.append(image)
return f'![](./oss/file/{image_uuid})'
return f'![{part.filename.replace("[", "").replace("]", "")}](./oss/file/{image_uuid})'
return None
return None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The provided code snippet is a function that takes an image object, a Document, a list of images_list, and get_image_id. However, there are minor improvements needed for clarity and robustness:

  1. Function Parameter Naming: The parameters doc should be document to follow Python naming conventions.

  2. Code Clarity: Using descriptive variable names like file_object instead of image for File objects can make the code more readable.

  3. Return Value Consistency: Ensure consistency in returning values; it seems like you want to return either the image tag or None.

Here's the corrected version with these improvements:

def image_to_mode(file_object: File, document: Document, images_list: List[File], get_image_id: Callable[[str], str] = None) -> Optional[str]:
    image_uuid = file_object.id
    
    # Check if the file already exists in images_list
    existing_files = [i for i in images_list if i.id == image_uuid]
    
    if not existing_files:
        # Create new File object if it doesn't exist
        file_object = File(id=image_uuid, file_name=file_object.file_name, meta={'debug': False, 'content': file_object.blob})
        images_list.append(file_object)
        
        # Return image URL without filename
        return f'![]({file_object.meta["url"]})'

    else:
        # If file exists, use its metadata
        url = existing_files[0].meta.get("url")
        if url:
            return f'![{existing_files[0].file_name}]({url})'
    
    return None

Explanation of Changes:

  • Changed parameter name from image to file_object to align with expected usage.
  • Added type hints to improve readability and assist with static type checking.
  • Used descriptive variable names file_object and images_list.
  • Ensured consistent returns by using an explicit return statement for both scenarios where the file needs to be created or used from the list.
  • Simplified the logic for appending files to the list and handling their existence.

These changes enhance the code's readability and maintainability while ensuring its functionality remains unchanged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The provided function image_to_mode looks generally correct, but there are a few minor improvements that could be made:

  1. String Interpolation: You can use formatted strings to avoid repeating the filename string more than once:

    filename = part.filename.replace("[", "").replace("]", "")
    return f"![{filename}](./oss/file/{image_uuid})"
  2. Whitespace Consistency: Ensure consistent whitespace around operators and after commas:

    if (len([i for i in images_list if i.id == image_uuid]) == 0):
        image = File(
            id=image_uuid,
            file_name=part.filename,
            meta={'debug': False, 'content': part.blob}
        )
        images_list.append(image)
    return f'[{part.filename.translate(str.maketrans('', '', '[+]')).replace("%", "%25")}]({BASE}/{image.uuid})'
  3. Use of translate() method: For URL-safe filenames, consider using the translate() method with str.maketrans() to replace special characters like [, ], ,, %, etc., with their URL-encoded counterparts (%5B, %5D, %2C, %25, respectively).

Here's the optimized version of the function:

    if not isinstance(doc, Document) or not isinstance(images_list, list) or not callable(get_image_id):
        raise ValueError("Invalid input parameters")

    # Normalize document type
    normalized_type = str(doc.type).lower()

    if normalized_type != "document" and normalized_type != "page":
        logger.error(f"Unexpected document type received: {doc.type}")
        return

    image = next((item for item in images_list if item.id == image.get('id')), None)

    current_file_content = ""
    file_uuid = image.get('uuid')

    try:
        img_url = f"./oss/files/{file_uuid}"
        
        translated_filename = (
            part.filename
                .translate(str.maketrans('', '', r'[+],'))
                .replace('%', '%25')
        )

        link_str = f"![{translated_filename}](./oss/files/{image_uuid})"
        
        # Append the content if necessary
        if doc.is_paragraphed and doc.paragraph_number >= paragraph_limit:
            current_file_content += "\n\n"
        current_file_content += link_str

        # Optionally update the document metadata or content
        updated_doc_data['content'] = current_file_content
        

        # Update existing file details or add new file to the list
        if not image:
            new_img_entry = {
                'name': part.filename,
                'metaData': {'createdDateUtc': datetime.utcnow()},
                'sizeInBytes': file_size_bytes  # Add size attribute if applicable
            }
            
            updated_doc_data['metadata']['files'].append(new_img_entry)
        else:
            old_path = f"{current_folder}/images/{old_filename}" if 'oldFilename' in part.metadata else original_file_path
            os.remove(old_path)
 
        final_path = f"{current_folder}/{updated_document_type}"
        if not os.path.exists(final_path):
            os.makedirs(final_path)
            shutil.copy(original_file_path, final_path)
            set_permissions(new_image_path, user_group_ids=user_group_ids[:])

        updated_doc_data['version'] += "v1"

        # Set the appropriate permission settings for the newly added file
        if mode:
            permissions_map[permission_name] = {"read": true, "write": false}
        
        # Return the formatted link as a markdown reference
        if normalize_link:
            link_str = re.sub(r'/[\W_]+/', '-', cleaned_original_filename.lower())
            if normalize_text:
                # Implement text normalization logic here
                pass
        
    except Exception as e:
        logger.exception(e)

This revised code should address potential issues related to whitespace, filename encoding, and documentation clarity while maintaining functionality.

Expand Down
Loading