Added handling of filename_as_id and file_extractor to SharePointReader #934

ferdinandosimonetti · 2024-02-08T11:52:04Z

Description

I've taken MinioReader's handling of the file_extractor parameter for SimpleDirectoryReader and applied it to SharePointReader.
This allows choosing a custom mapping between each file extension and its Reader/Decoder, and shouldn't wreak havoc on SharePointReader's existing functionality.
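
Conceptually, a file_extractor maps a file extension to the reader that should decode files of that type, with a default reader for anything unmapped, while filename_as_id derives each document's id from the file it came from. The sketch below illustrates that dispatch in a self-contained way. `BaseReader`, `TextReader`, `UpperReader`, `Document`, `load_file` and the `<name>_part_<i>` id format are all hypothetical stand-ins written for this example; they are not llama_index's actual classes or SimpleDirectoryReader's internals.

```python
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class Document:
    # Hypothetical minimal document: extracted text plus an optional id.
    text: str
    id_: Optional[str] = None


class BaseReader(ABC):
    # Hypothetical stand-in for a reader interface.
    @abstractmethod
    def load_data(self, file: Path) -> List[Document]: ...


class TextReader(BaseReader):
    # Default reader: returns the raw file text as a single document.
    def load_data(self, file: Path) -> List[Document]:
        return [Document(text=file.read_text())]


class UpperReader(BaseReader):
    # Example custom reader: upper-cases the file text.
    def load_data(self, file: Path) -> List[Document]:
        return [Document(text=file.read_text().upper())]


def load_file(
    file: Path,
    file_extractor: Optional[Dict[str, BaseReader]] = None,
    filename_as_id: bool = False,
) -> List[Document]:
    # Use the reader registered for this extension, else fall back to the default.
    extractor = file_extractor or {}
    reader = extractor.get(file.suffix.lower(), TextReader())
    docs = reader.load_data(file)
    if filename_as_id:
        # Derive ids from the file name (hypothetical id format).
        for i, doc in enumerate(docs):
            doc.id_ = f"{file.name}_part_{i}"
    return docs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "notes.md"
        path.write_text("hello")
        docs = load_file(path, file_extractor={".md": UpperReader()}, filename_as_id=True)
        print(docs[0].text, docs[0].id_)  # HELLO notes.md_part_0
```

Keying on the lower-cased suffix keeps the lookup case-insensitive; whether the real readers do the same is not stated in this PR.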

Type of Change

Bug fix / Smaller change

How Has This Been Tested?

I stared at the code and made sure it makes sense

Suggested Checklist:

I have added a library.json file if a new loader/tool was added
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I ran make format; make lint to appease the lint gods

anoopshrma

Hey @ferdinandosimonetti!
I added one comment; everything else looks great!

anoopshrma · 2024-02-09T09:00:21Z

llama_hub/microsoft_sharepoint/base.py

@@ -28,6 +28,8 @@ def __init__(
        client_id: str,
        client_secret: str,
        tenant_id: str,
+        filename_as_id: bool = False,
+        file_extractor: Optional[Dict[str, Union[str, BaseReader]]] = None


I think there's a small mistake here:
file_extractor: Optional[Dict[str, BaseReader]] = None,

For reference: https://github.com/run-llama/llama_index/blob/0393b081f3aed854e0a628f49b8e51f8da7906ef/llama_index/readers/file/base.py#L118

Hopefully my second commit solves the issue that made the first run fail: I had simply forgotten to import Optional and Union from typing.

You don't have to add Union for BaseReader. It will always be a BaseReader.

Replace the file_extractor line with the one below:

file_extractor: Optional[Dict[str, BaseReader]] = None,
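
Applied to the constructor, this suggestion means only Optional and Dict are needed from typing, which is also why the later lint pass flagged Union as unused. The sketch below shows the shape of such a signature and how the two new options could be collected for forwarding to SimpleDirectoryReader, as the PR description says MinioReader does. `SharePointReaderSketch`, the stand-in `BaseReader`, and the `_loader_kwargs` helper are hypothetical; they only mirror the parameter names discussed here and are not the merged code in llama_hub/microsoft_sharepoint/base.py.

```python
from typing import Any, Dict, Optional  # Union is no longer needed


class BaseReader:
    """Hypothetical stand-in for llama_index's BaseReader."""


class SharePointReaderSketch:
    """Hypothetical reader showing only the constructor parameters from this PR."""

    def __init__(
        self,
        client_id: str,
        client_secret: str,
        tenant_id: str,
        filename_as_id: bool = False,
        # Values are always BaseReader objects, so Union[str, BaseReader] is unnecessary.
        file_extractor: Optional[Dict[str, BaseReader]] = None,
    ) -> None:
        self.client_id = client_id
        self.client_secret = client_secret
        self.tenant_id = tenant_id
        self.filename_as_id = filename_as_id
        self.file_extractor = file_extractor

    def _loader_kwargs(self) -> Dict[str, Any]:
        # Hypothetical helper: options to forward to a directory loader
        # once the SharePoint files have been downloaded locally.
        return {
            "filename_as_id": self.filename_as_id,
            "file_extractor": self.file_extractor,
        }
```

Both new parameters default to values that leave the previous behavior unchanged, which matches the PR's aim of not disturbing existing SharePointReader users.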

I wrote it that way because I was shamelessly copying line 25 of llama_hub/minio/minio-client/base.py:

file_extractor: Optional[Dict[str, Union[str, BaseReader]]] = None,

However, I'll rewrite it your way.

anoopshrma · 2024-02-09T09:01:59Z

You'll also need to look at the linting and test cases for this.

ferdinandosimonetti · 2024-02-12T14:16:48Z

Solved the last complaint: Union was imported but unused.

anoopshrma

Thanks for the prompt action on the lint issues, @ferdinandosimonetti!
Highly appreciated!

Added handling of filename_as_id and file_extractor to SharePointReader (mimicking what is done, for example, for MinioReader

4ec10c9

anoopshrma approved these changes Feb 9, 2024

Ferdinando Simonetti added 2 commits February 9, 2024 10:01

added Optional and Union to imports from typing

e2353ee

after make lint

01f048a

ferdinandosimonetti mentioned this pull request Feb 9, 2024

[Feature Request]: I'd like to specify the appropriate Reader for each file found while using SharePointReader #933

Open

Ferdinando Simonetti added 2 commits February 9, 2024 19:19

No need for Union in line 34 of Sharepoint base.py

597783b

removing import of unused Union from typing, in Sharepoint Reader

8817373

anoopshrma approved these changes Feb 12, 2024

anoopshrma merged commit 642dc8a into run-llama:main Feb 12, 2024
3 checks passed

ferdinandosimonetti deleted the feat/file_extractor-for-SharePointReader branch February 12, 2024 14:23
