Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.x]: Excessive client calls #14

Open
wimulkeman opened this issue Mar 24, 2023 · 2 comments
Open

[1.x]: Excessive client calls #14

wimulkeman opened this issue Mar 24, 2023 · 2 comments

Comments

@wimulkeman
Copy link

wimulkeman commented Mar 24, 2023

Problem
In version 1.x of this package does a separate metadata call for each file and directory found in a directory content listing.

Additional information
Since the update to version 1.x I started to receive rate limit warnings from Azure. This turned out to be from the listContents method call used on Flysystem using the Azure Adapter. This methods uses the getContents call https://github.com/consilience/flysystem-azure-file-storage/blob/1.0.0/src/AzureFileAdapter.php#L407 in the AzureFileAdapter. In the getContents a request is done to retrieve the listDirectoriesAndFiles method from the client to retrieve a list from the Azure File Storage. This returns a list of all the directories and files and their sizes. On version 0.1.6 and before of the package this was the only information used. https://github.com/consilience/flysystem-azure-file-storage/blob/0.1.6/src/AzureFileAdapter.php#L456

In 1.0 this information is expanded with additional meta information retrieved by calling getFileProperties and getDirectoryProperties for each file or directory found https://github.com/consilience/flysystem-azure-file-storage/blob/1.0.0/src/AzureFileAdapter.php#L483. This generates a lot of calls quickly. Their is the first call (listDirectoriesAndFiles). If their are for example 3 directories and 60 files, then the total would be 1 + 3 + 60 = 64 calls just to retrieve the list of contents of a directory.

Possible solution
Not retrieving the additional information for files (and directories). This forces the user to retrieve the additional information if necessary, but I would estimate that the additional information besides the filename is not always used and therefore removing the need to have the information available by default.

In a project it was for now mitigated by patching the file with the following code:

diff --git a/vendor/consilience/flysystem-azure-file-storage/src/AzureFileAdapter.php b/vendor/consilience/flysystem-azure-file-storage/src/AzureFileAdapter.php
--- a/vendor/consilience/flysystem-azure-file-storage/src/AzureFileAdapter.php
+++ b/vendor/consilience/flysystem-azure-file-storage/src/AzureFileAdapter.php
@@ -477,14 +481,11 @@

         foreach ($listResults->getFiles() as $fileObject) {
             $pathName = trim(sprintf('%s/%s', $path, $fileObject->getName()), '/');
-
-            $contents[] = $this->normalizeFileProperties(
-                $pathName,
-                $this->azureClient->getFileProperties(
-                    $this->container,
-                    $this->prefixer->prefixPath($pathName),
-                ),
-            );
+            $contents[] = FileAttributes::fromArray([
+                'type' => 'file',
+                'path' => $pathName,
+                'dirname' => dirname($pathName),
+            ]);
         }

         return $contents;
@judgej
Copy link
Member

judgej commented Mar 25, 2023

Thank you for the report, Wim. I'll need to schedule a little time to take a look at this. I wonder is there is some caching that we are missing out on? From what I remember, Flysystem 1.x did a lot of caching automtaically, but maybe for 2.x that needs to be brought into the adapters?

@wimulkeman
Copy link
Author

It was mentioned as a little sidenote, but the cached adapter that could be used as a decorator was removed by flysystem during the transition from v1 to v2. https://flysystem.thephpleague.com/docs/upgrade-from-1.x/#miscellaneous-changes

That could cause problems when requesting the same resource multiple times. In this case I think caching won't resolve the problem for the implementation requests more information than it would have available from the first call.

I dug into the Azure API options and it seems possible to retrieve additional information on the list call by using the include option. This option triggers the x-ms-file-extended-info header. https://learn.microsoft.com/en-us/rest/api/storageservices/list-directories-and-files#uri-parameters
Unfortunate I wasn't able to test is for my Azure version seems to be pinned to version [2016-05-31](https://learn.microsoft.com/en-us/rest/api/storageservices/version-2016-05-31) which errors out on the include option.

Maybe that option could reduce the need to collect metadata by additional client calls. To implement it being able to overwrite the header x-ms-version to a version at or after 2020-04-08 is a requirement for it would break calls if their are on a older version. I am not sure if that is possible for I was unable to overwrite it in my project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants