🎁 Add derivative_rodeo_splitter (#250)
* 🎁 Add derivative_rodeo_splitter

Add a new PDF splitter option that wraps the DerivativeRodeo's PdfSplitGenerator. It handles, in theory, PDF splitting and the derivatives generated in the DerivativeRodeo.

Related to:
- #220

Co-authored-by: Shana Moore <shana@scientist.com>
Co-authored-by: Jeremy Friesen <jeremy.n.friesen@gmail.com>
1 parent ab20f55, commit 15c035c
Showing 10 changed files with 208 additions and 55 deletions.
@@ -0,0 +1,73 @@
module IiifPrint
  module SplitPdfs
    ##
    # This class wraps the DerivativeRodeo::Generators::PdfSplitGenerator to find preprocessed
    # images, or split a PDF if there are no preprocessed images.
    #
    # We have already attached the original file to the file_set. We want to express that
    # attached original file as an input_uri (e.g. "file://path/to/original-file", that is, the
    # PDF we have written to Fedora).
    #
    # @see .call
    class DerivativeRodeoSplitter
      ##
      # @param filename [String] the local path to the PDF file
      # @param file_set [FileSet] file set containing the PDF file to split
      #
      # @return [Array] paths to images split from each page of PDF file
      def self.call(filename, file_set:)
        new(filename, file_set: file_set).split_files
      end

      def initialize(filename, file_set:, output_tmp_dir: Dir.tmpdir)
        @input_uri = "file://#{filename}"

        # We are writing the images to a local location that CarrierWave can upload. This is a
        # local file, internal to IiifPrint; it looks like SpaceStone/DerivativeRodeo lingo, but
        # that's just a convenience.
        output_template_path = File.join(output_tmp_dir, '{{ dir_parts[-1..-1] }}', '{{ filename }}')

        @output_location_template = "file://#{output_template_path}"
        @preprocessed_location_template = IiifPrint::DerivativeRodeoService.derivative_rodeo_uri(file_set: file_set, filename: filename)
      end

      ##
      # This is where, in "Fedora", we have the original file. This is not the original file in the
      # pre-processing location but instead the long-term location of the file in the application
      # that mounts IIIF Print.
      #
      # @return [String]
      attr_reader :input_uri

      ##
      # This is the location where we're going to write the derivatives that will "go into Fedora";
      # it is a local location, one that the application mounting IIIF Print can read directly
      # with "File.read".
      #
      # @return [String]
      attr_reader :output_location_template

      ##
      # Where, in the DerivativeRodeo's storage, we can find what has already been done regarding
      # derivative generation.
      #
      # For example, SpaceStone::Serverless will pre-process derivatives and write them into an S3
      # bucket that we then use for IIIF Print.
      #
      # @return [String]
      #
      # @see https://github.com/scientist-softserv/space_stone-serverless/blob/7f46dd5b218381739cd1c771183f95408a4e0752/awslambda/handler.rb#L58-L63
      attr_reader :preprocessed_location_template

      ##
      # @return [Array<String>] the paths to each of the images split off from the PDF.
      def split_files
        DerivativeRodeo::Generators::PdfSplitGenerator.new(
          input_uris: [@input_uri],
          output_location_template: output_location_template,
          preprocessed_location_template: preprocessed_location_template
        ).generated_files.map(&:file_path)
      end
    end
  end
end
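The location strings that the initializer builds can be sketched with the standard library alone. The helper name `splitter_locations` and the sample paths below are hypothetical, not part of IiifPrint; the sketch only mirrors the string building shown in the class and assumes the doubled-brace segments are literal placeholders that the DerivativeRodeo fills in later.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of IiifPrint) mirroring the string building
# done in DerivativeRodeoSplitter#initialize.
def splitter_locations(filename, output_tmp_dir: Dir.tmpdir)
  # The single-quoted '{{ ... }}' segments are kept literally; Ruby does not
  # interpolate them. They are assumed to be template placeholders.
  output_template_path = File.join(output_tmp_dir, '{{ dir_parts[-1..-1] }}', '{{ filename }}')

  {
    input_uri: "file://#{filename}",
    output_location_template: "file://#{output_template_path}"
  }
end

# Sample paths are made up for illustration.
locations = splitter_locations('/tmp/uploads/sample.pdf', output_tmp_dir: '/tmp/split')
puts locations[:input_uri]
puts locations[:output_location_template]
```

Because `filename` is an absolute path, the resulting `input_uri` carries three slashes after `file:`, which is the usual form for a local file URI.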
spec/iiif_print/split_pdfs/derivative_rodeo_splitter_spec.rb
34 changes: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
# frozen_string_literal: true

require 'spec_helper'

RSpec.describe IiifPrint::SplitPdfs::DerivativeRodeoSplitter do
  let(:path) { nil }
  let(:work) { double(MyWork, aark_id: '12345') }
  let(:file_set) { FileSet.new.tap { |fs| fs.save!(validate: false) } }

  describe 'class' do
    subject { described_class }

    it { is_expected.to respond_to(:call) }
  end

  describe "instance" do
    subject { described_class.new(path, file_set: file_set) }
    let(:generator) { double(DerivativeRodeo::Generators::PdfSplitGenerator, generated_files: []) }

    before do
      allow(file_set).to receive(:parent).and_return(work)
      # TODO: This is a hack that leverages the internals of Hydra::Works; not excited about it but
      # this part is only one piece of the overall integration.
      allow(file_set).to receive(:original_file).and_return(double(original_filename: __FILE__))
    end

    it { is_expected.to respond_to :split_files }

    it 'uses the rodeo to split' do
      expect(DerivativeRodeo::Generators::PdfSplitGenerator).to receive(:new).and_return(generator)
      described_class.call(path, file_set: file_set)
    end
  end
end
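The spec stubs the generator so that `generated_files` is empty, which leaves the mapping to file paths unexercised. The following is a minimal sketch of that delegation in plain Ruby, without RSpec or the DerivativeRodeo gem. `FakeLocation`, `FakeGenerator`, the free-standing `split_files` method, and the page file names are all hypothetical stand-ins; only the keyword arguments and the `generated_files.map(&:file_path)` chain come from the class under test.

```ruby
# Hypothetical stand-in for whatever object the real generator returns; the
# only assumption is that it responds to #file_path.
FakeLocation = Struct.new(:file_path)

# Hypothetical stand-in for DerivativeRodeo::Generators::PdfSplitGenerator.
class FakeGenerator
  def initialize(input_uris:, output_location_template:, preprocessed_location_template:)
    @input_uris = input_uris
  end

  # Pretend every input PDF produced two page images (names are made up).
  def generated_files
    @input_uris.flat_map do |uri|
      base = uri.sub('file://', '')
      [FakeLocation.new("#{base}--page-0.tiff"), FakeLocation.new("#{base}--page-1.tiff")]
    end
  end
end

# Same shape as DerivativeRodeoSplitter#split_files, with the generator class
# passed in so the stand-in can be swapped for the real one.
def split_files(generator_class, input_uri:, output_location_template:, preprocessed_location_template:)
  generator_class.new(
    input_uris: [input_uri],
    output_location_template: output_location_template,
    preprocessed_location_template: preprocessed_location_template
  ).generated_files.map(&:file_path)
end

paths = split_files(
  FakeGenerator,
  input_uri: 'file:///tmp/sample.pdf',
  output_location_template: 'file:///tmp/out/{{ filename }}',
  preprocessed_location_template: nil
)
puts paths
```

Passing the generator class as an argument is a choice made for this sketch only; the committed class references the generator constant directly, which is why the spec has to stub `.new`.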