Skip to content
This repository has been archived by the owner on Oct 24, 2024. It is now read-only.

🎁 Only run thumbnail derivatives on thumbnails #684

Merged
merged 1 commit into from
Nov 29, 2023

Conversation

kirkkwang
Copy link
Contributor

We were seeing that our thumbnail files were getting word coordinates derivatives, which is a waste of processing time. The jobs would eventually fail but ideally we don't want them to run at all for thumbnails.

This commit will override the DerivativeRodeoService and remove all other derivatives except for thumbnails if our file is a thumbnail file.

Also in this commit, we were seeing a nilClass error for a logger in development, so that has also been fixed.

Ref:

@kirkkwang kirkkwang requested a review from jeremyf November 29, 2023 02:23
Copy link
Contributor

@jeremyf jeremyf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a concern about values being deleted and having an impact.

@kirkkwang
Copy link
Contributor Author

kirkkwang commented Nov 29, 2023

Ah understood, I thought each run was pretty well scoped to the file at play so it seemed like it shouldn't have side effects for the next file that comes through. I'd like to hear your thoughts.

in a different iteration i had

    def create_derivatives(filename)
      named_derivatives_and_generators.flat_map do |named_derivative, generator_name|
        return if named_derivative == :json && filename.downcase.ends_with?(Hyku::THUMBNAIL_FILE_SUFFIX)

        lasso_up_some_derivatives(
          named_derivative: named_derivative,
          generator_name: generator_name,
          filename: filename
        )
      end
    end

jeremyf added a commit to notch8/iiif_print that referenced this pull request Nov 29, 2023
Prior to this commit, IIIF Print assumed that every file of a given
mime-type would use all of the same generators.  However, that is not
necessarily the case.

With this commit:

- Updated documentation based on a read of the generated Yardoc
- Added `DerivativeRodeoService.named_derivatives_and_generators_filter`
- Added a `clone` of attributes

The clone is in place to help ensure that as we apply the filter we
don't accidentally delete the application's configuration for mime
category and expected derivatives.

For example, let's say I have the following nested hash:

```ruby
nested_hash = {
  pdf: {
    thumbnail: "DerivativeRodeo::Generators::ThumbnailGenerator"
  },
  image: {
    thumbnail: "DerivativeRodeo::Generators::ThumbnailGenerator",
    json: "DerivativeRodeo::Generators::WordCoordinatesGenerator",
    xml: "DerivativeRodeo::Generators::AltoGenerator",
    txt: "DerivativeRodeo::Generators::PlainTextGenerator"
  }
}
```

If I then call the following:

```ruby
nested_hash.fetch(:pdf).delete_if { |key, value| key == :thumbnail }
```

Then look at `nested_hash`, I will see the following:

```ruby
pp nested_hash

{:pdf=>{},
 :image=>
  {:thumbnail=>"DerivativeRodeo::Generators::ThumbnailGenerator",
   :json=>"DerivativeRodeo::Generators::WordCoordinatesGenerator",
   :xml=>"DerivativeRodeo::Generators::AltoGenerator",
   :txt=>"DerivativeRodeo::Generators::PlainTextGenerator"}}
```

Why? Because we haven't changed objects.  It's possible that Rails's
class_attribute will do deep clones of hashes, but with this clone
behavior we remove that possibility of a problem.

Related to:

- notch8/adventist-dl#684
- https://github.com/scientist-softserv/adventist-dl/issues/676
@jeremyf
Copy link
Contributor

jeremyf commented Nov 29, 2023

In that updated iteration, instead of return you should probably use next; return might leave the create_derivative function entirely.

Note, I've added notch8/iiif_print#306 which provides a configurable point instead of a decorator point.

jeremyf added a commit to notch8/iiif_print that referenced this pull request Nov 29, 2023
Prior to this commit, IIIF Print assumed that every file of a given
mime-type would use all of the same generators.  However, that is not
necessarily the case.

With this commit:

- Updated documentation based on a read of the generated Yardoc
- Added `DerivativeRodeoService.named_derivatives_and_generators_filter`
- Added a `clone` of attributes

The clone is in place to help ensure that as we apply the filter we
don't accidentally delete the application's configuration for mime
category and expected derivatives.

For example, let's say I have the following nested hash:

```ruby
nested_hash = {
  pdf: {
    thumbnail: "DerivativeRodeo::Generators::ThumbnailGenerator"
  },
  image: {
    thumbnail: "DerivativeRodeo::Generators::ThumbnailGenerator",
    json: "DerivativeRodeo::Generators::WordCoordinatesGenerator",
    xml: "DerivativeRodeo::Generators::AltoGenerator",
    txt: "DerivativeRodeo::Generators::PlainTextGenerator"
  }
}
```

If I then call the following:

```ruby
nested_hash.fetch(:pdf).delete_if { |key, value| key == :thumbnail }
```

Then look at `nested_hash`, I will see the following:

```ruby
pp nested_hash

{:pdf=>{},
 :image=>
  {:thumbnail=>"DerivativeRodeo::Generators::ThumbnailGenerator",
   :json=>"DerivativeRodeo::Generators::WordCoordinatesGenerator",
   :xml=>"DerivativeRodeo::Generators::AltoGenerator",
   :txt=>"DerivativeRodeo::Generators::PlainTextGenerator"}}
```

Why? Because we haven't changed objects.  It's possible that Rails's
class_attribute will do deep clones of hashes, but with this clone
behavior we remove that possibility of a problem.

Related to:

- notch8/adventist-dl#684
- https://github.com/scientist-softserv/adventist-dl/issues/676
jeremyf added a commit to notch8/iiif_print that referenced this pull request Nov 29, 2023
Prior to this commit, IIIF Print assumed that every file of a given
mime-type would use all of the same generators.  However, that is not
necessarily the case.

With this commit:

- Updated documentation based on a read of the generated Yardoc
- Added `DerivativeRodeoService.named_derivatives_and_generators_filter`
- Added a `clone` of attributes

The clone is in place to help ensure that as we apply the filter we
don't accidentally delete the application's configuration for mime
category and expected derivatives.

For example, let's say I have the following nested hash:

```ruby
nested_hash = {
  pdf: {
    thumbnail: "DerivativeRodeo::Generators::ThumbnailGenerator"
  },
  image: {
    thumbnail: "DerivativeRodeo::Generators::ThumbnailGenerator",
    json: "DerivativeRodeo::Generators::WordCoordinatesGenerator",
    xml: "DerivativeRodeo::Generators::AltoGenerator",
    txt: "DerivativeRodeo::Generators::PlainTextGenerator"
  }
}
```

If I then call the following:

```ruby
nested_hash.fetch(:pdf).delete_if { |key, value| key == :thumbnail }
```

Then look at `nested_hash`, I will see the following:

```ruby
pp nested_hash

{:pdf=>{},
 :image=>
  {:thumbnail=>"DerivativeRodeo::Generators::ThumbnailGenerator",
   :json=>"DerivativeRodeo::Generators::WordCoordinatesGenerator",
   :xml=>"DerivativeRodeo::Generators::AltoGenerator",
   :txt=>"DerivativeRodeo::Generators::PlainTextGenerator"}}
```

Why? Because we haven't changed objects.  It's possible that Rails's
class_attribute will do deep clones of hashes, but with this clone
behavior we remove that possibility of a problem.

Related to:

- notch8/adventist-dl#684
- https://github.com/scientist-softserv/adventist-dl/issues/676
@kirkkwang kirkkwang force-pushed the i676-only-run-thumbnail-on-thumbnails branch from 94914f7 to 358d694 Compare November 29, 2023 18:02
We were seeing that our thumbnail files were getting word coordinates
derivatives, which is a waste of processing time.  The jobs would
eventually fail but ideally we don't want them to run at all for
thumbnails.

This commit will upgrade to the latest version of IIIF Print where a new
configurable class attribute has been added to enable a more granular
control over which derivatives are run on what files.

Also in this commit, we were seeing a nilClass error for a logger in
development, so that has also been fixed.

Ref:
  - https://github.com/scientist-softserv/adventist-dl/issues/676
  - notch8/iiif_print#306
@kirkkwang kirkkwang force-pushed the i676-only-run-thumbnail-on-thumbnails branch from 358d694 to aab7d39 Compare November 29, 2023 18:09
@kirkkwang kirkkwang requested a review from jeremyf November 29, 2023 18:28
@jeremyf jeremyf merged commit de5f951 into main Nov 29, 2023
7 checks passed
@jeremyf jeremyf deleted the i676-only-run-thumbnail-on-thumbnails branch November 29, 2023 19:12
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants