Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download_Mode for File, S3_File and Enso_File #12017

Merged
merged 28 commits into from
Jan 23, 2025
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
2c44654
wip
GregoryTravis Nov 13, 2024
3340832
Merge branch 'develop' into wip/gmt/11411-download-cache
GregoryTravis Nov 14, 2024
f49e29c
wip
GregoryTravis Nov 14, 2024
04ebbda
wip
GregoryTravis Nov 14, 2024
4d1db59
Merge branch 'develop' into wip/gmt/11411-download-cache
GregoryTravis Nov 19, 2024
2bfe697
Merge branch 'develop' into wip/gmt/11411-download-cache
GregoryTravis Dec 17, 2024
3daae54
merge
GregoryTravis Jan 7, 2025
c8c77b2
Always mode, changed helper to return bool
GregoryTravis Jan 7, 2025
b4f5a9c
remove last-modified-time setter
GregoryTravis Jan 7, 2025
b0fb99d
wip
GregoryTravis Jan 7, 2025
b2ce858
wip
GregoryTravis Jan 7, 2025
85902ad
changelog
GregoryTravis Jan 7, 2025
269b7c4
merge
GregoryTravis Jan 8, 2025
9bb0dc8
wip
GregoryTravis Jan 8, 2025
6e5fca2
wip
GregoryTravis Jan 8, 2025
9c04163
Merge branch 'develop' into wip/gmt/11411-download-cache
GregoryTravis Jan 9, 2025
00e40af
Merge branch 'develop' into wip/gmt/11411-download-cache
GregoryTravis Jan 9, 2025
6bf36ce
Merge branch 'develop' into wip/gmt/11411-download-cache
GregoryTravis Jan 10, 2025
af5d144
green for all 3
GregoryTravis Jan 10, 2025
a6d66c6
Merge branch 'develop' into wip/gmt/11411-download-cache
GregoryTravis Jan 13, 2025
e9bece1
Merge branch 'develop' into wip/gmt/11411-download-cache
GregoryTravis Jan 14, 2025
4cffe39
review
GregoryTravis Jan 14, 2025
a19064c
Merge branch 'develop' into wip/gmt/11411-download-cache
GregoryTravis Jan 15, 2025
c451f82
Merge branch 'develop' into wip/gmt/11411-download-cache
GregoryTravis Jan 16, 2025
fdc1281
Merge branch 'develop' into wip/gmt/11411-download-cache
GregoryTravis Jan 21, 2025
acfca95
debug logging
GregoryTravis Jan 21, 2025
4ed8d35
Merge branch 'develop' into wip/gmt/11411-download-cache
GregoryTravis Jan 22, 2025
d5c9cb2
Merge branch 'develop' into wip/gmt/11411-download-cache
GregoryTravis Jan 23, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -145,6 +145,7 @@
- [Added `add_group_number` to the in-memory database.[11818]
- [The reload button clears the HTTP cache.][11673]
- [SQL Server Support for Aggregate][11811]
- [Added `Download_Mode` parameter to `Data.download`.][12017]

[11235]: https://github.com/enso-org/enso/pull/11235
[11255]: https://github.com/enso-org/enso/pull/11255
Expand All @@ -156,6 +157,7 @@
[11818]: https://github.com/enso-org/enso/pull/11818
[11673]: https://github.com/enso-org/enso/pull/11673
[11811]: https://github.com/enso-org/enso/pull/11811
[12017]: https://github.com/enso-org/enso/pull/12017

#### Enso Language & Runtime

Expand Down
22 changes: 12 additions & 10 deletions distribution/lib/Standard/Base/0.0.0-dev/src/Data.enso
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
import project.Any.Any
import project.Data.Download.Download_Mode.Download_Mode
import project.Data.Pair.Pair
import project.Data.Read.Many_Files_List.Many_Files_List
import project.Data.Read.Return_As.Return_As
Expand Down Expand Up @@ -443,17 +444,18 @@ post (uri:(URI | Text)=(Missing_Argument.throw "uri")) (body:Request_Body=..Empt
- headers: The headers to send with the request. Defaults to an empty vector.
@uri (Text_Input display=..Always)
@headers Header.default_widget
download : (URI | Text) -> Writable_File -> HTTP_Method -> Vector (Header | Pair Text Text) -> File ! Request_Error | HTTP_Error
download (uri:(URI | Text)=(Missing_Argument.throw "uri")) file:Writable_File (method:HTTP_Method=..Get) (headers:(Vector (Header | Pair Text Text))=[]) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API suggestion: what if we rename mode into replace_existing?

IMO it will be much clearer for the user.

If I do:

Data.download url1 file mode=..If_Not_Exists
Data.download url2 file mode=..If_Not_Exists

tbh I'd still expect the second expression to download. That is because I am now downloading a different file to that destination. So while the destination exists, as a user I would expect it to be overwritten, because I've changed the URL - e.g. I was working with a report from June relying on the cache to only download it once, but now I want to start working with reports from July. I change the URL and expect the file to get redownloaded even if it exists - because I expect the data is new. I even reset caches and am confused why I'm still seeing June data in the file.

I understand that this is not what this was designed for, but I think the above is a likely user scenario.

Now, if we rename the parameter to replace_existing, the code reads as:

Data.download url1 file replace_existing=..If_Not_Exists
Data.download url2 file replace_existing=..If_Not_Exists

Now it is obvious to me that the second statement will do nothing if the first one succeeded - because the file is redownloaded only if it didn't exist in the first place (regardless of the URL). And now that semantics (what is currently implemented) is completely clear when reading the calls.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and regardless of parameter name - we need to update the method documentation to include it and ideally describe what the expected semantics is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we rely on the 'refresh' button to clear caches in many cases, we probably should add a note that this download method works only based on file existance/age and so refresh button does not affect it.

As a user I might expect the refresh button to ensure the file is redownloaded (whether that should work this way or not is up to discussion, I think current semantics are ok) - but with current semantics the refresh button just does nothing for download. I think it would be good if the documentation mentioned that, so that the user can know what to expect / can see that this is expected and not a bug if they get confused seeing that refresh button does nothing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to replace_existing, and added documentation.

I am not sure about how the refresh button interacts with the Always option, and I cannot run the front end to find out, so I did not mention that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With Always we always redownload the file, right? How could the refresh button interfere with that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is, I assume that refresh will also cause a download.

download : (URI | Text) -> Writable_File -> Download_Mode -> HTTP_Method -> Vector (Header | Pair Text Text) -> File ! Request_Error | HTTP_Error
download (uri:(URI | Text)=(Missing_Argument.throw "uri")) file:Writable_File (mode:Download_Mode=..If_Not_Exists) (method:HTTP_Method=..Get) (headers:(Vector (Header | Pair Text Text))=[]) =
Context.Output.if_enabled disabled_message="As writing is disabled, cannot download to a file. Press the Write button ▶ to perform the operation." panic=False <|
response = HTTP.fetch uri method headers cache_policy=Cache_Policy.No_Cache
case Data_Link.is_data_link response.body.metadata of
True ->
# If the resource was a data link, we follow it, download the target data and try to write it to a file.
data_link = Data_Link_Helpers.interpret_json_as_data_link response.decode_as_json
Data_Link_Helpers.save_data_link_to_file data_link file
False ->
response.write file
if mode.should_download file then
response = HTTP.fetch uri method headers cache_policy=Cache_Policy.No_Cache
case Data_Link.is_data_link response.body.metadata of
True ->
# If the resource was a data link, we follow it, download the target data and try to write it to a file.
data_link = Data_Link_Helpers.interpret_json_as_data_link response.decode_as_json
Data_Link_Helpers.save_data_link_to_file data_link file
False ->
response.write file

## If the `format` is set to `Raw_Response`, a raw HTTP `Response` is returned
that can be then processed further manually.
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
import project.Data.Time.Date_Time.Date_Time
import project.Data.Time.Duration.Duration
import project.Nothing.Nothing
import project.System.File.Generic.Writable_File.Writable_File
from project.Data.Boolean import Boolean, False, True

type Download_Mode
## Download the file if it does not already exist on disk.
If_Not_Exists

## Download the file if it is older than the specified age.
If_Older_Than age:Duration

## Always download.
Always

## PRIVATE
Determine if a file should be downloaded, based on the file type,
download mode, and file age.
should_download self (file:Writable_File) -> Boolean =
case self of
Download_Mode.If_Not_Exists ->
file.file.exists.not
Download_Mode.If_Older_Than age ->
file.file.exists.not || file.file.creation_time < (Date_Time.now - age)
Download_Mode.Always ->
True
1 change: 1 addition & 0 deletions distribution/lib/Standard/Base/0.0.0-dev/src/Main.enso
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ export project.Data.Boolean.Boolean.True
export project.Data.Decimal.dec
export project.Data.Decimal.Decimal
export project.Data.Dictionary.Dictionary
export project.Data.Download.Download_Mode.Download_Mode
export project.Data.Filter_Condition.Filter_Action
export project.Data.Filter_Condition.Filter_Condition
export project.Data.Hashset.Hashset
Expand Down
58 changes: 58 additions & 0 deletions test/Base_Tests/src/Data/Data_Spec.enso
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
from Standard.Base import all

from Standard.Test import all
import Standard.Test.Test_Environment

from enso_dev.Base_Tests.Network.Http.Http_Test_Setup import base_url_with_slash, pending_has_url

polyglot java import java.lang.Thread

with_test_file f ~action =
f.delete_if_exists
Panic.with_finalizer (f.delete_if_exists) <|
action f

## file_maker should take a path component and return a full file path, for example:
"if_not_exist"
=>
(enso_project.data / "transient" / "if_not_exist.txt")
add_specs prefix suite_builder file_maker =
suite_builder.group prefix+"Download Mode" pending=pending_has_url group_builder->
url_n_bytes n = base_url_with_slash+'test_download?length='+n.to_text

group_builder.specify prefix+"Will always download a file for mode Always" <|
with_test_file (file_maker "always") file->
file.exists . should_be_false
Data.download (url_n_bytes 10) mode=..Always file
first_contents = file.read
Data.download (url_n_bytes 11) mode=..Always file
second_contents = file.read
first_contents . should_not_equal second_contents

group_builder.specify prefix+"Will download a file if it does not exist for default mode If_Not_Exists" <|
with_test_file (file_maker "if_not_exist") file->
file.exists . should_be_false
Data.download (url_n_bytes 10) file
first_contents = file.read
Data.download (url_n_bytes 11) file
second_contents = file.read
first_contents . should_equal second_contents

group_builder.specify prefix+"Will download a file if it is older than a specified duration for mode If_Older_Than" <|
with_test_file (file_maker "if_older_than") file->
sleep_duration_secs = 3

file.exists . should_be_false

Data.download (url_n_bytes 10) file
first_contents = file.read

Data.download (url_n_bytes 11) (mode=..If_Older_Than (Duration.new seconds=sleep_duration_secs)) file
second_contents = file.read
first_contents . should_equal second_contents

Thread.sleep (sleep_duration_secs * 1000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this changes anything, but my intuition suggests to make the sleep duration a little bit larger than the duration in If_Older_Than.

e.g. do If Older Than 3 seconds but sleep for 3.5 seconds. If you have the exact same amount I imagine some small fluctuation in how the times are computed could make the test fail, however if you add a little bit of offset it feels like it should decrease the chances of this randomly failing.

Although I do admit this is purely based on intuition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a 0.5 -- and also, this test has retries since it will inevitably fail.


Data.download (url_n_bytes 12) (mode=..If_Older_Than (Duration.new seconds=sleep_duration_secs)) file
third_contents = file.read
first_contents . should_not_equal third_contents
7 changes: 7 additions & 0 deletions test/Base_Tests/src/System/File_Spec.enso
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ polyglot java import org.enso.base_test_helpers.FileSystemHelper
from Standard.Test import all
import Standard.Test.Test_Environment

import project.Data.Data_Spec
import project.Network.Enso_Cloud.Cloud_Tests_Setup.Cloud_Tests_Setup

## We rely on a less strict equality for `File` as it is fine if a relative file gets resolved to absolute.
Expand All @@ -39,6 +40,8 @@ add_specs suite_builder =
Platform.OS.Windows -> Nothing
_ -> "This test runs only on Windows."

add_data_specs suite_builder

suite_builder.group "File Operations" group_builder->
group_builder.specify "should get name of the root" <|
root = File.new "/"
Expand Down Expand Up @@ -1049,6 +1052,10 @@ add_create_and_delete_directory_specs group_builder ~parent_dir =

dir.delete_if_exists . should_equal Nothing

add_data_specs suite_builder =
file_maker path_element:Text = enso_project.data / "transient" / (path_element+".txt")
Data_Spec.add_specs "(File) " suite_builder file_maker

add_copy_edge_cases_specs group_builder ~parent_dir file_from_path=File.new =
group_builder.specify "should allow copy/move edge case with source=target" <|
file = parent_dir / "src-target-in-one.txt"
Expand Down
Loading