Archival exports (#511, #512, #513, #514) #521

Merged
merged 29 commits into from
Oct 28, 2024
Merged
104eed8
Upgrade storyblok-richtext-renderer and add custom marks/nodes (#512)
blms Aug 30, 2024
eddbb48
Basic html export (no images)
blms Sep 3, 2024
6a17d63
Use template for index html (#511)
blms Sep 4, 2024
226b30c
Improve highlight handling (#512)
blms Sep 4, 2024
4e459fb
Use templates to render rich text pages (#512)
blms Sep 5, 2024
403591a
Add table node types to html serialization
blms Sep 13, 2024
9d4c629
Improve file naming conventions for export
blms Sep 13, 2024
9d990e2
Fix "ghost highlight" bug in exports
blms Sep 13, 2024
793ec92
Move reused download logic to helper
blms Sep 17, 2024
9e33d7e
Add images to export
blms Sep 17, 2024
6a66e93
Move export code to sidekiq worker
blms Sep 17, 2024
0de0237
Error handling for failed file download
blms Sep 19, 2024
d8caa82
Prevent retry and keep track of progress
blms Sep 19, 2024
4858177
Fix zipfile storage bug
blms Sep 19, 2024
1dee161
Fix template rendering cache bug
blms Sep 19, 2024
a12fa87
Use activestorage attachments for exports
blms Sep 19, 2024
648c91e
Add route for export monitoring
blms Sep 19, 2024
0470974
Allow export creation/monitoring/download via frontend
blms Sep 19, 2024
0415518
Minor reorganization
blms Sep 19, 2024
48f93e0
Bump listen gem to prevent memory error on MacOS
blms Sep 24, 2024
e976c0b
Add image annotations to export (#514)
blms Sep 25, 2024
ab04e1b
Add nil check to highlight.links_to
blms Sep 25, 2024
e2ffcd9
Improve export UX slightly
blms Sep 26, 2024
df6a034
Retain link title formatting from existing UI in export (#512)
blms Sep 26, 2024
af6d7a9
Order highlights correctly according to reading order (#512, #514)
blms Sep 26, 2024
6756307
rm unnecessary logging
blms Oct 1, 2024
bc48c2a
Merge branch 'develop' into feature/511-basic-export
blms Oct 3, 2024
9a96ee4
Add support for multicolumn layout and margins in export
blms Oct 10, 2024
2bd0e20
Merge branch 'develop' into feature/511-basic-export
blms Oct 28, 2024
5 changes: 3 additions & 2 deletions Gemfile
@@ -29,7 +29,7 @@ gem 'rack-cors', :require => 'rack/cors'
 gem 'pg_search'
 gem 'figaro'
 gem 'open-uri'
-gem 'storyblok-richtext-renderer', github: 'performant-software/storyblok-ruby-richtext-renderer', ref: '0a6c2e8e81560311569d49d06c0e32abd0effcd5'
+gem 'storyblok-richtext-renderer', github: 'performant-software/storyblok-ruby-richtext-renderer', ref: 'bef6903146426e01175887eb92a75bf9bac4c3cb'
 gem 'sidekiq', '~>6.5.1'
 gem 'sidekiq-status', '~>2.1.1'
 
@@ -39,7 +39,7 @@ group :development, :test do
 end
 
 group :development do
-  gem 'listen', '>= 3.0.5', '< 3.2'
+  gem 'listen', '>= 3.3.0', '< 4.0'
   # Spring speeds up development by keeping your application running in the background. Read more: https://github.com/rails/spring
   gem 'spring'
   gem 'spring-watcher-listen', '~> 2.0.0'
@@ -56,3 +56,4 @@ gem 'psych', '< 4'
 gem 'dotenv-rails', '~> 2.8'
 # pinned to < 9.3 until https://github.com/ilyakatz/data-migrate/issues/302 resolved
 gem 'data_migrate', '~> 9.2', '< 9.3'
+gem 'rubyzip', '~> 2.3'
16 changes: 9 additions & 7 deletions Gemfile.lock
@@ -1,9 +1,9 @@
 GIT
   remote: https://github.com/performant-software/storyblok-ruby-richtext-renderer.git
-  revision: 0a6c2e8e81560311569d49d06c0e32abd0effcd5
-  ref: 0a6c2e8e81560311569d49d06c0e32abd0effcd5
+  revision: bef6903146426e01175887eb92a75bf9bac4c3cb
+  ref: bef6903146426e01175887eb92a75bf9bac4c3cb
   specs:
-    storyblok-richtext-renderer (0.0.6)
+    storyblok-richtext-renderer (0.0.10)
 
 GEM
   remote: https://rubygems.org/
@@ -143,9 +143,9 @@ GEM
       ruby-vips (>= 2.0.17, < 3)
     jmespath (1.6.2)
     jsonapi-renderer (0.2.2)
-    listen (3.0.8)
-      rb-fsevent (~> 0.9, >= 0.9.4)
-      rb-inotify (~> 0.9, >= 0.9.7)
+    listen (3.9.0)
+      rb-fsevent (~> 0.10, >= 0.10.3)
+      rb-inotify (~> 0.9, >= 0.9.10)
     logger (1.6.0)
     loofah (2.22.0)
       crass (~> 1.0.2)
@@ -243,6 +243,7 @@ GEM
     ruby-vips (2.2.2)
       ffi (~> 1.12)
       logger
+    rubyzip (2.3.2)
     sidekiq (6.5.12)
       connection_pool (>= 2.2.5, < 3)
       rack (~> 2.0)
@@ -307,7 +308,7 @@ DEPENDENCIES
   figaro
   foreman
   image_processing (~> 1.12)
-  listen (>= 3.0.5, < 3.2)
+  listen (>= 3.3.0, < 4.0)
   mutex_m (~> 0.2.0)
   open-uri
   pg (>= 0.18, < 2.0)
@@ -317,6 +318,7 @@ DEPENDENCIES
   puma (~> 4.3)
   rack-cors
   rails (~> 6.1)
+  rubyzip (~> 2.3)
   sidekiq (~> 6.5.1)
   sidekiq-status (~> 2.1.1)
   spring
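The Gemfile constraints changed in this PR can be checked against the versions Bundler resolved in Gemfile.lock using RubyGems' own requirement matcher. This is a minimal sketch, not part of the PR: it uses `Gem::Requirement` and `Gem::Version` from RubyGems, and the constraint/version pairs are copied from the diffs above (`RESOLVED` and `satisfied?` are hypothetical names).

```ruby
require "rubygems"

# Gemfile constraint(s) from this PR paired with the version resolved in Gemfile.lock.
RESOLVED = {
  "listen"  => [[">= 3.3.0", "< 4.0"], "3.9.0"],
  "rubyzip" => [["~> 2.3"], "2.3.2"],
  "sidekiq" => [["~> 6.5.1"], "6.5.12"],
}.freeze

# True when `version` satisfies every requirement string. RubyGems semantics apply,
# so the pessimistic "~> 6.5.1" means ">= 6.5.1, < 6.6".
def satisfied?(requirements, version)
  Gem::Requirement.new(*requirements).satisfied_by?(Gem::Version.new(version))
end

RESOLVED.each do |name, (requirements, locked)|
  puts "#{name} #{locked}: #{satisfied?(requirements, locked)}"
end

# The previously locked listen 3.0.8 falls outside the new range.
puts "listen 3.0.8: #{satisfied?([">= 3.3.0", "< 4.0"], "3.0.8")}"
```

Checking the lockfile this way is only a sanity check; `bundle install` performs the real resolution.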
72 changes: 70 additions & 2 deletions app/controllers/projects_controller.rb
@@ -1,8 +1,8 @@
 class ProjectsController < ApplicationController
-  before_action :set_project, only: [:show, :update, :destroy, :search, :check_in, :move_many]
+  before_action :set_project, only: [:show, :update, :destroy, :search, :check_in, :move_many, :create_export, :exports]
   before_action :validate_user_approved, only: [:create]
 
-  before_action only: [:update, :destroy] do
+  before_action only: [:update, :destroy, :create_export] do
     validate_user_admin(@project)
   end
 
@@ -71,6 +71,74 @@ def check_in
     render json: { checked_in_docs: checked_in_doc_ids }
   end
 
+  # GET /projects/1/exports
+  def exports
+    # completed exports
+    @exports = @project.exports.collect do |exp|
+      {
+        :id => exp.id,
+        :updated_at => exp.created_at,
+        :status => "Complete",
+        :url => Rails.application.routes.url_helpers.rails_blob_url(exp),
+      }
+    end
+    # queued exports
+    queues = Sidekiq::Queue.all
+    queues.each do |queue|
+      queue.each do |job|
+        if job.klass == "ExportProjectWorker" and job.args[0].to_i == @project.id
+          @exports.push({
+            :id => job.jid,
+            :status => "Queued",
+            :updated_at => job.created_at,
+          })
+        end
+      end
+    end
+    # in progress exports
+    worker_set = Sidekiq::WorkSet.new
+    worker_set.each do |_, _, worker|
+      if worker.is_a? Hash and worker["payload"]["class"] == "ExportProjectWorker" and worker["payload"]["args"][0].to_i == @project.id
+        exp = {
+          :id => worker["payload"]["jid"],
+          :updated_at => Time.at(worker["payload"]["created_at"]),
+          :status => "In progress",
+        }
+        status = Sidekiq::Status.get_all worker["payload"]["jid"]
+        if status
+          exp[:status] = "In progress (#{status['pct_complete']}%)"
+          exp[:updated_at] = Time.at(status["update_time"].to_f)
+        end
+        @exports.push(exp)
+      end
+    end
+    # errored exports
+    dead_set = Sidekiq::DeadSet.new
+    dead_set.each do |job|
+      if job.klass == "ExportProjectWorker" and job.args[0].to_i == @project.id
+        @exports.push({
+          :id => job.jid,
+          :status => "Failed",
+          :updated_at => job.created_at,
+          :error_message => job.item["error_message"],
+          :error_class => job.item["error_class"],
+        })
+      end
+    end
+    render json: @exports.sort_by { |hsh| hsh[:updated_at] }.reverse, status: 200
+  end
+
+  # POST /projects/1/create_export
+  def create_export
+    job_id = ExportProjectWorker.perform_async(@project.id)
+    @job = { id: job_id }
+    if @job
+      render json: @job, status: 202
+    else
+      render status: 500
+    end
+  end
+
   private
     # Use callbacks to share common setup or constraints between actions.
     def set_project
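The `exports` action gathers entries from four sources (ActiveStorage attachments, Sidekiq queues, the work set and the dead set) and returns them newest first. The sketch below isolates the pure parts of that logic on plain hashes so they run without Rails or Redis; all method names are hypothetical. It also shows a variant of the `create_export` status choice that branches on the job id itself: as we understand Sidekiq's client API, `perform_async` returns `nil` when client middleware stops the push, whereas a hash such as `{ id: job_id }` is always truthy, so a check on the hash cannot reach the 500 branch.

```ruby
# Hypothetical helpers mirroring ProjectsController#exports and #create_export.

# Label for a running job; falls back to a plain label when no progress is reported.
def progress_label(pct_complete)
  pct_complete ? "In progress (#{pct_complete}%)" : "In progress"
end

# Same ordering as `sort_by { |hsh| hsh[:updated_at] }.reverse`: newest first.
def newest_first(entries)
  entries.sort_by { |entry| entry[:updated_at] }.reverse
end

# Branch on the returned job id (nil when the push was stopped), not on a wrapper hash.
def create_export_status(job_id)
  job_id ? 202 : 500
end

entries = [
  { id: 1, status: "Complete", updated_at: Time.at(100) },
  { id: "abc123", status: progress_label(40), updated_at: Time.at(300) },
  { id: "def456", status: "Queued", updated_at: Time.at(200) },
]
newest_first(entries).each { |e| puts "#{e[:id]} #{e[:status]}" }
```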
18 changes: 18 additions & 0 deletions app/helpers/download_helper.rb
@@ -0,0 +1,18 @@
+module DownloadHelper
+  def self.download_to_file(uri)
+    begin
+      stream = URI.open(uri, :read_timeout => 10)
+      return stream if stream.respond_to?(:path) # Already file-like
+
+      # Workaround when open(uri) doesn't return File
+      Tempfile.new.tap do |file|
+        file.binmode
+        IO.copy_stream(stream, file)
+        stream.close
+        file.rewind
+      end
+    rescue Net::ReadTimeout
+      return 'failed'
+    end
+  end
+end
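`DownloadHelper.download_to_file` returns the stream unchanged when it already has a filesystem path and otherwise copies it into a binary `Tempfile`. As far as we know, OpenURI hands back a `StringIO` (which has no `#path`) for small responses and a `Tempfile` for larger ones, which is why the copy is needed before passing the result to tools that open files by path. A minimal stdlib-only sketch of that fallback branch with the network call left out (`ensure_file_like` is a hypothetical name):

```ruby
require "stringio"
require "tempfile"

# Return `stream` if it is already file-backed; otherwise copy it into a binary
# Tempfile, rewound so the caller can read it from the start.
def ensure_file_like(stream)
  return stream if stream.respond_to?(:path)

  Tempfile.new("download").tap do |file|
    file.binmode
    IO.copy_stream(stream, file)
    stream.close
    file.rewind
  end
end

copy = ensure_file_like(StringIO.new("\x89PNG".b))
puts copy.path            # somewhere under Dir.tmpdir
puts copy.read.bytesize   # 4
```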
132 changes: 132 additions & 0 deletions app/helpers/export_helper.rb
@@ -0,0 +1,132 @@
+module ExportHelper
+  def self.sanitize_filename(filename)
+    filename.gsub(/[\x00\/\\:\*\?\"<>\|]/, "_").strip
+  end
+
+  def self.get_path(link, current_depth)
+    document_id = link[:document_id]
+    highlight_uid = link[:highlight_uid]
+    # get a relative URL to a document by id, taking into account the current location
+    begin
+      document = Document.find(document_id)
+      filename = self.sanitize_filename(document.title).parameterize
+      path_segments = ["#{filename}.html"]
+      while document[:parent_type] != "Project"
+        # back out from the target document until we hit the project root
+        document = document.parent
+        path_segments.unshift(self.sanitize_filename(document[:title]).parameterize)
+      end
+      to_project_root = current_depth > 0 ? Array.new(current_depth, "..").join("/") + "/" : ""
+      path = to_project_root + path_segments.join("/")
+      if highlight_uid.present?
+        # append #highlight_uid to url if present in order to target the highlight
+        path = "#{path}#highlight-#{highlight_uid}"
+      end
+      return path
+    rescue ActiveRecord::RecordNotFound
+      return "#"
+    end
+  end
+
+  def self.get_svg_styles(obj)
+    # convert fabric object style properties to css
+    styles = [
+      "stroke: #{obj['stroke']};",
+      "fill: #{obj['fill'] || 'transparent'};",
+      "stroke-width: 3px;",
+      "stroke-linecap: #{obj['strokeLineCap'] || 'butt'};",
+      "stroke-dashoffset: #{obj['strokeDashOffset'] || '0'};",
+      "stroke-linejoin: #{obj['strokeLineJoin'] || 'miter'};",
+      "stroke-miterlimit: #{obj['strokeMiterLimit'] || '4'};",
+      "opacity: #{obj['opacity'] || '1'};",
+      "visibility: #{obj['visible'] ? 'visible' : 'hidden'};",
+    ]
+    styles.push("stroke-dasharray: #{obj['strokeDashArray']};") if obj['strokeDashArray']
+    styles.join(" ")
+  end
+
+  def self.get_svg_path(paths)
+    # convert fabric object path property to svg path
+    path = ''
+    paths.each_with_index do |ls, i|
+      path += " " unless i == 0
+      if ls[0] == "C"
+        # C x1 y1, x2 y2, x y
+        path += "#{ls[0]} #{ls[1]} #{ls[2]}, #{ls[3]} #{ls[4]}, #{ls[5]} #{ls[6]}"
+      elsif ls[0] == "S" or ls[0] == "Q"
+        # S x2 y2, x y || Q x1 y1, x y
+        path += "#{ls[0]} #{ls[1]} #{ls[2]}, #{ls[3]} #{ls[4]}"
+      else
+        # M x y || L x y || T x y, etc
+        path += ls.join(" ")
+      end
+    end
+    path
+  end
+
+  def self.fabric_to_svg(highlights)
+    # convert image annotation highlights (fabric objects) to svgs
+    svgs = []
+    self.order_highlights(highlights, "canvas", nil).each do |uid, hl|
+      svg_hash = JSON.parse(hl[:target])
+      elm = "#{svg_hash['type']}"
+      if svg_hash["path"]
+        # path
+        elm += " d=\"#{self.get_svg_path(svg_hash['path'])}\""
+      elsif svg_hash["points"]
+        # polyline
+        elm += ' points="'
+        elm += svg_hash["points"].map { |pt| "#{pt['x']},#{pt['y']}" }.join(" ")
+        elm += '"'
+      elsif svg_hash["type"] == "circle"
+        # circle
+        elm += " r=\"#{svg_hash['radius']}\""
+        cx = svg_hash["left"]
+        cy = svg_hash["top"]
+        if svg_hash["originX"] == "left"
+          cx += svg_hash["radius"]
+          cy += svg_hash["radius"]
+        end
+        elm += " cx=\"#{cx}\" cy=\"#{cy}\""
+      elsif svg_hash["type"] == "rect"
+        # rect
+        elm += " x=\"#{svg_hash['left']}\" y=\"#{svg_hash['top']}\""
+        elm += " width=\"#{svg_hash['width']}\" height=\"#{svg_hash['height']}\""
+      end
+      # common styles
+      elm += " style=\"#{self.get_svg_styles(svg_hash)}\""
+      svg_elm = "<#{elm} vector-effect=\"non-scaling-stroke\" />"
+      # add link to highlight in footer
+      svgs.push("<a id=\"highlight-#{uid}\" href=\"##{uid}\">#{svg_elm}</a>")
+    end
+    return svgs
+  end
+
+  def self.order_highlights(highlights, document_kind, content_html)
+    if !content_html.present? and document_kind == "text"
+      highlights
+    elsif content_html.present?
+      # text type document: order by position in text
+      highlights.sort_by { |uid, hl| content_html.index(uid) || Float::INFINITY }
+    elsif document_kind == "canvas"
+      # canvas type document: order highlights by position on page (top to bottom, LTR)
+      highlights.sort_by { |uid, hl|
+        drawing = JSON.parse(hl[:target])
+        # divide page vertically into 50px blocks; consider all within 50px range to have equal y
+        y = (drawing["top"] / 50).floor()
+        # then sort by x, unmodified
+        x = drawing["left"]
+        [y, x]
+      }
+    end
+  end
+
+  def self.get_link_label(link)
+    label = link[:document_title]
+    if link[:excerpt] and link[:excerpt].length > 0
+      label = "<span>#{link[:title] ? link[:title] : link[:excerpt]}</span>"
+      label += " in <em>#{link[:document_title]}</em>"
+    end
+    label
+  end
+end
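Several `ExportHelper` methods are pure string and ordering logic. The sketch below re-implements three of them standalone so their behavior can be checked without Rails: the fabric-to-SVG path command formatting of `get_svg_path`, the 50px reading-order bands that `order_highlights` uses for canvas documents, and the `../` prefix computed in `get_path`. It adds one variant that is not in the PR: assuming fabric stores `strokeDashArray` as a numeric array such as `[5, 10]`, interpolating it directly (as `get_svg_styles` does) produces the text `[5, 10]`, whereas joining with spaces gives a valid CSS `stroke-dasharray` value. All method names here are hypothetical.

```ruby
# Fabric path commands -> SVG path data, mirroring get_svg_path:
# "C" carries three coordinate pairs, "S"/"Q" carry two, anything else is joined as-is.
def svg_path(commands)
  commands.map do |cmd|
    case cmd[0]
    when "C" then "C #{cmd[1]} #{cmd[2]}, #{cmd[3]} #{cmd[4]}, #{cmd[5]} #{cmd[6]}"
    when "S", "Q" then "#{cmd[0]} #{cmd[1]} #{cmd[2]}, #{cmd[3]} #{cmd[4]}"
    else cmd.join(" ")
    end
  end.join(" ")
end

# Canvas reading order, mirroring order_highlights: drawings whose "top" falls in the
# same 50px band count as one row, and rows are read left to right by "left".
def reading_order(drawings, band: 50)
  drawings.sort_by { |d| [(d["top"] / band).floor, d["left"]] }
end

# Relative prefix from a page `depth` folders deep back to the export root.
def relative_prefix(depth)
  depth > 0 ? Array.new(depth, "..").join("/") + "/" : ""
end

# Variant (assumption: strokeDashArray is an array of numbers).
def dasharray_css(dash)
  "stroke-dasharray: #{Array(dash).join(' ')};"
end

puts svg_path([["M", 0, 0], ["C", 1, 2, 3, 4, 5, 6], ["Q", 7, 8, 9, 10], ["L", 11, 12]])
puts relative_prefix(2) + "chapter-1/page-3.html"
```

The band approach trades exactness for a more natural reading order: two shapes whose tops differ by a few pixels are treated as the same row instead of being ordered by a tiny vertical offset.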
21 changes: 2 additions & 19 deletions app/models/document.rb
@@ -148,32 +148,15 @@ def color
     nil
   end
 
-  def download_to_file(uri)
-    begin
-      stream = URI.open(uri, :read_timeout => 10)
-      return stream if stream.respond_to?(:path) # Already file-like
-
-      # Workaround when open(uri) doesn't return File
-      Tempfile.new.tap do |file|
-        file.binmode
-        IO.copy_stream(stream, file)
-        stream.close
-        file.rewind
-      end
-    rescue Net::ReadTimeout
-      return 'failed'
-    end
-  end
-
   def add_thumbnail( image_url )
     begin
       # Try with PNG
-      opened = download_to_file(image_url)
+      opened = DownloadHelper.download_to_file(image_url)
     rescue OpenURI::HTTPError
       # Only JPG is required for IIIF level 1 compliance,
       # so if we get back a 400 error, use JPG for thumbnail
       with_jpg = image_url.sub('.png', '.jpg')
-      opened = download_to_file(with_jpg)
+      opened = DownloadHelper.download_to_file(with_jpg)
     end
     if opened != 'failed'
       processed = ImageProcessing::MiniMagick.source(opened)
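`add_thumbnail` (and `Highlight#set_thumbnail` below) first requests a PNG and, on `OpenURI::HTTPError`, retries the same IIIF URL with a `.jpg` extension, since JPG is the only format IIIF level 1 requires. A minimal sketch of that retry, with the download injected as a callable so it runs offline; `fetch_with_jpg_fallback`, `fake_fetch` and the example URL are hypothetical. One deliberate difference: it anchors the substitution to the end of the URL (`/\.png\z/`), whereas `sub('.png', '.jpg')` replaces the first `.png` found anywhere in the string.

```ruby
require "open-uri"
require "stringio"

# Try the PNG URL first; if the server answers with an HTTP error, retry once with
# the JPG variant. An error on the retry propagates to the caller.
def fetch_with_jpg_fallback(image_url, fetch)
  fetch.call(image_url)
rescue OpenURI::HTTPError
  fetch.call(image_url.sub(/\.png\z/, ".jpg"))
end

# Stand-in for DownloadHelper.download_to_file: rejects PNG, returns the URL it got.
fake_fetch = lambda do |url|
  raise OpenURI::HTTPError.new("400 Bad Request", StringIO.new) if url.end_with?(".png")
  url
end

puts fetch_with_jpg_fallback("https://example.org/iiif/page1/full/80,/0/default.png", fake_fetch)
```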
19 changes: 3 additions & 16 deletions app/models/highlight.rb
@@ -7,7 +7,7 @@ class Highlight < Linkable
 
   def links_to
     all_links = self.highlights_links.sort_by{ |hll| hll.position }.map{ |hll| Link.where(:id => hll.link_id).first }
-    result = all_links.map { |link| self.to_link_obj(link) }.compact
+    result = all_links.map { |link| self.to_link_obj(link) unless link.nil? }.compact
     result.each {|r|
       if r[:highlight_id]
         hl = Highlight.where(:id => r[:highlight_id]).first
@@ -55,19 +55,6 @@ def add_link_from_duplication(linked, original_id, position)
     end
   end
 
-  def download_to_file(uri)
-    stream = URI.open(uri)
-    return stream if stream.respond_to?(:path) # Already file-like
-
-    # Workaround when open(uri) doesn't return File
-    Tempfile.new.tap do |file|
-      file.binmode
-      IO.copy_stream(stream, file)
-      stream.close
-      file.rewind
-    end
-  end
-
   def set_thumbnail( image_url, thumb_rect )
     if !thumb_rect.nil?
       pad_factor = 0.06
@@ -98,12 +85,12 @@ def set_thumbnail( image_url, thumb_rect )
     else
       begin
         # Try with PNG
-        opened = download_to_file(image_url)
+        opened = DownloadHelper.download_to_file(image_url)
       rescue OpenURI::HTTPError
         # Only JPG is required for IIIF level 1 compliance,
         # so if we get back a 400 error, use JPG for thumbnail
         with_jpg = image_url.sub('.png', '.jpg')
-        opened = download_to_file(with_jpg)
+        opened = DownloadHelper.download_to_file(with_jpg)
       end
       io = ImageProcessing::MiniMagick.source(opened)
         .resize_to_fill(80, 80)
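The `links_to` change guards against `Link.where(:id => ...).first` returning `nil` for a link id that no longer exists. The trailing `.compact` alone does not help in that case, because `to_link_obj` runs on the `nil` before `compact` can drop it; presumably `to_link_obj` cannot handle `nil`, as the commit "Add nil check to highlight.links_to" suggests. A minimal sketch with a stand-in converter (`TO_LINK_OBJ` and `link_objects` are hypothetical, not the model's real methods):

```ruby
# Stand-in for Highlight#to_link_obj: like any method that reads attributes,
# it raises NoMethodError when handed nil.
TO_LINK_OBJ = ->(link) { { id: link.fetch(:id), title: link.fetch(:title) } }

# Shape of the fixed line: skip nil links inside the block, then drop the nils.
def link_objects(links, converter)
  links.map { |link| converter.call(link) unless link.nil? }.compact
end

links = [{ id: 1, title: "A" }, nil, { id: 2, title: "B" }]
p link_objects(links, TO_LINK_OBJ)

# Without the guard, the nil reaches the converter before .compact runs.
begin
  links.map { |link| TO_LINK_OBJ.call(link) }.compact
rescue NoMethodError => e
  puts "unguarded: #{e.class}"
end
```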
1 change: 1 addition & 0 deletions app/models/project.rb
@@ -2,6 +2,7 @@ class Project < ApplicationRecord
   belongs_to :owner, class_name: 'User', optional: true
   has_many :documents, as: :parent
   has_many :document_folders, as: :parent
+  has_many_attached :exports
   has_many :user_project_permissions, dependent: :destroy
   has_many :users, through: :user_project_permissions
 