Class: Pdf::Toolkit

Inherits:
Object
Defined in:
app/services/pdf/toolkit.rb

Overview

Bounded, programmatic PDF operations backed by HexaPDF 1.9 — the engine
behind the Sunny pdf_tools service (Assistant::PdfToolBuilder).

Every HexaPDF-backed method loads the library lazily (Loader.load!); compress
delegates to Compressor instead. Methods take local file paths (merge takes
several; generate takes a layout rather than a path). Transform methods return a
Result carrying the output PDF +bytes+ plus a small descriptive +meta+ hash;
Toolkit.inspect_pdf returns a meta-only Result (+bytes+ nil); Toolkit.split
returns an Array of Results.

This is deliberately a curated operation set — not raw HexaPDF — so it can
be driven safely by the LLM tool loop (no arbitrary code execution):

  • inspect_pdf: structure, AcroForm fields, text preview, image count
  • stamp: overlay text / cover (redact) / image / watermark on pages
  • fill_form: set AcroForm field values, optionally flatten
  • merge: concatenate several PDFs
  • split: explode into per-page (or per-range) PDFs
  • rotate: rotate selected pages by a multiple of 90°
  • select_pages: keep/reorder a subset (covers extract, delete, reorder)
  • compress: Ghostscript size reduction (delegates to Compressor)
  • generate: build a new branded PDF from a declarative layout

Coordinates use the PDF user-space convention: points (1/72") measured from
the page's lower-left corner, +x right, +y up.
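
As an illustration of the coordinate convention, the sketch below converts a position measured in inches from the top-left corner (the way many people describe a spot on a page) into user-space points. The helper name and its parameters are hypothetical and not part of Pdf::Toolkit.

```ruby
# Hypothetical helper, not part of Pdf::Toolkit.
# Converts a top-left-origin position in inches to PDF user-space points.
def top_left_inches_to_pdf_points(x_in, y_in, page_height_pt, points_per_inch: 72.0)
  x_pt = x_in * points_per_inch
  # PDF y grows upward from the bottom edge, so flip against the page height.
  y_pt = page_height_pt - (y_in * points_per_inch)
  [x_pt, y_pt]
end

# US Letter is 612 x 792 pt (8.5" x 11").
point = top_left_inches_to_pdf_points(1.0, 1.0, 792.0)
puts point.inspect # prints [72.0, 720.0]
```

A point one inch in from the top-left of a Letter page therefore sits at x = 72, y = 720 in the coordinates the toolkit expects.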

Defined Under Namespace

Classes: Error, Result

Constant Summary

NIMBUS =

Brand fonts (mirrors Base). Relative paths resolve from Rails root.

'data/fonts/NimbusSans.ttf'
NIMBUS_BOLD =
'data/fonts/NimbusSansBold.ttf'
SOFIA =

WarmlyYours website design-system fonts, converted (lossless woff2→sfnt) from
the self-hosted webfonts in public/fonts/ so generated PDFs match warmlyyours.com:
Sofia Pro is the site's primary sans (body), Orpheus Pro its serif display face.

'data/fonts/sofiapro/SofiaPro-Regular.ttf'
SOFIA_LIGHT =
'data/fonts/sofiapro/SofiaPro-Light.ttf'
SOFIA_BOLD =
'data/fonts/sofiapro/SofiaPro-Semibold.ttf'
ORPHEUS =
'data/fonts/orpheuspro/OrpheusPro-Regular.ttf'
ORPHEUS_BOLD =
'data/fonts/orpheuspro/OrpheusPro-Bold.ttf'
FONT_SPECS =

Logical font name → [hexapdf font spec, kwargs].

{
  'helvetica'      => ['Helvetica', {}],
  'helvetica_bold' => ['Helvetica', { variant: :bold }],
  'sofia'          => [SOFIA, {}],
  'sofia_bold'     => [SOFIA_BOLD, {}],
  'sofia_light'    => [SOFIA_LIGHT, {}],
  'orpheus'        => [ORPHEUS, {}],
  'orpheus_bold'   => [ORPHEUS_BOLD, {}],
  'nimbus'         => [NIMBUS, {}],
  'nimbus_bold'    => [NIMBUS_BOLD, {}]
}.freeze
LH_BURGUNDY =

WarmlyYours letterhead chrome (matches the approved cover-letter sample).
Contact strings default to the sample's presentation; CompanyConstants holds
the underlying data (PHONE[:usa], ADDRESS[:usa]). Override via the layout.
Burgundy is used for the logo wordmark, tagline, H1, and footer separators.

'922328'
LH_INK =

Body text.

'262626'
LH_RULE =

Header/footer hairline.

'd9d2d0'
LH_TAGLINE =
'Modern Radiant Heating Solutions'
LH_PHONE =
'1 (800) 875-5285'
LH_ADDRESS =
'590 Telser Rd Suite B, Lake Zurich, IL, 60047'
LH_WEBSITE =
'www.WarmlyYours.com'
LH_SIDE =

Page geometry in PostScript points (US Letter). Margins clear the header/footer
bands so flowing content never collides with the chrome drawn in the post-pass.

72
LH_TOP_MARGIN =
128
LH_BOTTOM_MARGIN =
100
MAX_TEXT_PREVIEW_CHARS =

Per-page text preview cap (chars) returned by inspect_pdf.

1_500
MAX_TEXT_PREVIEW_PAGES =

Max pages scanned for text preview.

10
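
To make the letterhead geometry concrete, the sketch below works out the area left for flowing content on a US Letter page. It assumes LH_SIDE applies to both the left and right edges; the helper itself is hypothetical and only copies the constant values listed above as defaults.

```ruby
# Hypothetical helper: the rectangle left for flowing content on a page.
# Defaults copy LH_SIDE, LH_TOP_MARGIN and LH_BOTTOM_MARGIN from the summary.
# Assumption: the side margin is applied to both left and right edges.
def letterhead_content_box(page_width, page_height, side: 72, top: 128, bottom: 100)
  {
    left:   side,
    bottom: bottom,
    width:  page_width - (2 * side),
    height: page_height - top - bottom
  }
end

# US Letter is 612 x 792 pt.
box = letterhead_content_box(612, 792)
puts "#{box[:width]} x #{box[:height]} pt" # prints 468 x 564 pt
```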

Class Method Details

.compress(path, level: '/printer') ⇒ Result

Reduce file size via Ghostscript. No-op (returns input) when gs is absent.

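Parameters:

  • path (String)

    local file path

  • level (String) (defaults to: '/printer')

    setting passed to Compressor as pdf_setting

Returns:

  • (Result)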

# File 'app/services/pdf/toolkit.rb', line 300

def compress(path, level: '/printer')
  blob = File.binread(path)
  res  = Pdf::Compressor.new(input_blob: blob, pdf_setting: level).compress
  out  = res.output_blob || blob

  Result.new(bytes: out,
             meta: { status: res.result.to_s, original_bytes: blob.bytesize, new_bytes: out.bytesize,
                     saved_bytes: [blob.bytesize - out.bytesize, 0].max })
end
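
The meta arithmetic above can be tried in isolation. The following is a minimal sketch with a hypothetical helper; plain strings stand in for the PDF blobs, and a nil second argument stands in for a compressor run that produced no output.

```ruby
# Hypothetical helper mirroring the meta arithmetic in compress.
# `compressed` is nil when no compressed output was produced.
def compression_meta(original, compressed)
  out = compressed || original
  {
    original_bytes: original.bytesize,
    new_bytes:      out.bytesize,
    # Clamp at zero so a larger output never reports negative savings.
    saved_bytes:    [original.bytesize - out.bytesize, 0].max
  }
end

meta = compression_meta('x' * 1000, 'x' * 400)
puts meta[:saved_bytes] # prints 600

fallback = compression_meta('x' * 1000, nil)
puts fallback[:saved_bytes] # prints 0
```

Note that only saved_bytes is clamped: if the compressor returns a blob larger than the input, the method as shown still returns that larger blob and reports its size in new_bytes.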

.fill_form(path, values:, flatten: false) ⇒ Result

Set AcroForm field values by full field name.

Parameters:

  • path (String)
  • values (Hash{String=>Object})

    field name → value

  • flatten (Boolean) (defaults to: false)

    bake values into page content (no longer editable)

Returns:

  • (Result)

# File 'app/services/pdf/toolkit.rb', line 163

def fill_form(path, values:, flatten: false)
  Pdf::Loader.load!
  raise Error, 'values must be a non-empty map of field_name => value' unless values.is_a?(Hash) && values.any?

  doc  = HexaPDF::Document.open(path)
  form = doc.acro_form
  raise Error, 'This PDF has no fillable form fields (no AcroForm). Use stamp to overlay text instead.' unless form

  applied = []
  unknown = []
  failed  = []
  values.each do |name, val|
    field = form.field_by_name(name.to_s)
    if field.nil?
      unknown << name.to_s
      next
    end
    begin
      field.field_value = coerce_field_value(field, val)
      applied << name.to_s
    rescue StandardError => e
      failed << { field: name.to_s, error: e.message }
    end
  end

  form.create_appearances
  form.flatten if flatten

  Result.new(bytes: write(doc),
             meta: { applied: applied, unknown_fields: unknown, failed: failed,
                     flattened: !!flatten, pages: doc.pages.count })
rescue HexaPDF::Error => e
  raise Error, "Fill form failed: #{e.message}"
end
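
The applied / unknown_fields / failed bookkeeping can be illustrated without HexaPDF. In the sketch below the form is a plain Hash of field name to a lambda that accepts or rejects a value; these stubs and both helper names are hypothetical stand-ins, not the library's objects.

```ruby
# Stand-in for the form: field name => lambda that accepts or rejects a value.
# These are illustrative stubs, not HexaPDF objects.
def sample_form
  {
    'customer.name' => ->(v) { v.to_s },
    'agree'         => ->(v) { [true, false].include?(v) ? v : raise(ArgumentError, 'expected true or false') }
  }
end

# Sorts each requested value into applied / unknown / failed, as fill_form does.
def fill_report(form, values)
  report = { applied: [], unknown_fields: [], failed: [] }
  values.each do |name, val|
    setter = form[name.to_s]
    if setter.nil?
      report[:unknown_fields] << name.to_s
      next
    end
    begin
      setter.call(val)
      report[:applied] << name.to_s
    rescue StandardError => e
      report[:failed] << { field: name.to_s, error: e.message }
    end
  end
  report
end

report = fill_report(sample_form, { 'customer.name' => 'Ada', 'agree' => 'maybe', 'missing' => 1 })
puts report[:applied].inspect        # prints ["customer.name"]
puts report[:unknown_fields].inspect # prints ["missing"]
puts report[:failed].first[:error]   # prints expected true or false
```

The design point is that one bad value does not abort the whole call: the caller receives a PDF with the valid values applied plus a report of what was skipped and why.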

.generate(layout:) ⇒ Result

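Build a new branded PDF from a declarative layout. Expected shape:

  { title:, subtitle:, logo: true, page_size: "Letter", orientation: "portrait",
    blocks: [ { type: "heading"|"paragraph"|"bullets"|"spacer", ... } ] }

A layout with template: "letterhead" is routed to the letterhead generator instead.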

Parameters:

  • layout (Hash)

Returns:

  • (Result)

# File 'app/services/pdf/toolkit.rb', line 317

def generate(layout:)
  Pdf::Loader.load!
  layout = symbolize(layout)
  return generate_letterhead(layout) if layout[:template].to_s == 'letterhead'

  composer = HexaPDF::Composer.new(
    page_size:        (layout[:page_size] || 'Letter').to_sym,
    page_orientation: (layout[:orientation] || 'portrait').to_sym,
    margin:           [54, 54, 54, 54]
  )
  configure_brand(composer.document)
  apply_generate_styles(composer)

  if layout.fetch(:logo, true) && File.exist?(Pdf::Config::LOGO_PATH)
    composer.image(Pdf::Config::LOGO_PATH, width: 180)
    composer.text(' ', font_size: 8)
  end
  composer.text(layout[:title].to_s, style: :gen_title)       if present?(layout[:title])
  composer.text(layout[:subtitle].to_s, style: :gen_subtitle) if present?(layout[:subtitle])
  composer.text(' ', font_size: 6)

  Array(layout[:blocks]).each { |b| render_block(composer, symbolize(b)) }

  Result.new(bytes: write_composer(composer), meta: { pages: composer.document.pages.count })
rescue HexaPDF::Error => e
  raise Error, "Generate failed: #{e.message}"
end
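
For orientation, here is a sketch of a layout as it might arrive from a JSON tool call, run through the same defaults and template check that appear in the method body. The symbolize_keys helper is a hypothetical stand-in for the unshown symbolize, and the per-block keys ("text", "items", "height") are assumptions: only "type" is documented here.

```ruby
require 'json'

# Hypothetical stand-in for the unshown symbolize helper: top-level keys only.
def symbolize_keys(hash)
  hash.each_with_object({}) { |(k, v), out| out[k.to_sym] = v }
end

# Assumed payload; block keys other than "type" are illustrative guesses.
payload = <<~JSON
  {
    "title": "Installation Summary",
    "subtitle": "Prepared for review",
    "logo": false,
    "page_size": "Letter",
    "orientation": "portrait",
    "blocks": [
      { "type": "heading",   "text": "Scope" },
      { "type": "paragraph", "text": "Two rooms, one thermostat." },
      { "type": "bullets",   "items": ["Kitchen", "Bath"] },
      { "type": "spacer",    "height": 12 }
    ]
  }
JSON

layout = symbolize_keys(JSON.parse(payload))

# Mirror the defaults and the template check visible in the method body.
page_size   = (layout[:page_size] || 'Letter').to_sym
letterhead  = layout[:template].to_s == 'letterhead'
block_types = Array(layout[:blocks]).map { |b| symbolize_keys(b)[:type] }

puts page_size.inspect   # prints :Letter
puts letterhead          # prints false
puts block_types.inspect # prints ["heading", "paragraph", "bullets", "spacer"]
```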

.inspect_pdf(path) ⇒ Result

Read structure and content of a PDF without modifying it.

Parameters:

  • path (String)

    local file path

Returns:

  • (Result)

    meta-only (bytes nil)



# File 'app/services/pdf/toolkit.rb', line 91

def inspect_pdf(path)
  Pdf::Loader.load!
  doc  = HexaPDF::Document.open(path)
  info = doc.trailer.info

  sizes = doc.pages.map do |pg|
    mb = pg.box(:media)
    { width: mb.width.round(1), height: mb.height.round(1), rotation: (pg[:Rotate] || 0).to_i }
  end

  form   = doc.acro_form
  fields = []
  form&.each_field do |f|
    fields << {
      name:  f.full_field_name,
      type:  (f.concrete_field_type || f.field_type).to_s,
      value: stringify(f.field_value)
    }
  end

  Result.new(meta: {
    pages:        doc.pages.count,
    encrypted:    doc.encrypted?,
    page_sizes:   sizes,
    has_acroform: !form.nil?,
    field_count:  fields.size,
    fields:       fields,
    image_count:  image_count(doc),
    title:        present_str(info[:Title]),
    author:       present_str(info[:Author]),
    producer:     present_str(info[:Producer]),
    text_preview: text_preview(path)
  })
rescue HexaPDF::Error => e
  raise Error, "Could not read PDF: #{e.message}"
end
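
One way a caller might use the returned meta is to decide between fill_form and stamp, since fill_form raises when the PDF has no AcroForm and its error message suggests stamp instead. The helper and the sample hashes below are hypothetical and only reuse the key names from the meta hash above.

```ruby
# Hypothetical helper: choose the next operation from inspect_pdf meta.
def next_step(meta)
  # fill_form raises when there is no AcroForm and suggests stamp instead.
  meta[:has_acroform] && meta[:field_count].to_i.positive? ? :fill_form : :stamp
end

# Sample meta using the key names built above (values invented for illustration).
with_form = { pages: 2, has_acroform: true, field_count: 3 }
flat_scan = { pages: 1, has_acroform: false, field_count: 0 }

puts next_step(with_form) # prints fill_form
puts next_step(flat_scan) # prints stamp
```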

.merge(paths:) ⇒ Result

Concatenate several PDFs into one, in the order given.

Parameters:

  • paths (Array<String>)

Returns:

  • (Result)

# File 'app/services/pdf/toolkit.rb', line 203

def merge(paths:)
  Pdf::Loader.load!
  raise Error, 'merge needs at least one source PDF' if Array(paths).empty?

  target = HexaPDF::Document.new
  total  = 0
  Array(paths).each do |p|
    src = HexaPDF::Document.open(p)
    src.pages.each do |pg|
      target.pages << target.import(pg)
      total += 1
    end
  end
  raise Error, 'no pages found across sources' if total.zero?

  Result.new(bytes: write(target), meta: { pages: total, sources: Array(paths).size })
rescue HexaPDF::Error => e
  raise Error, "Merge failed: #{e.message}"
end

.rotate(path, degrees:, pages: :all) ⇒ Result

Rotate selected pages by a multiple of 90° (clockwise).

Parameters:

  • path (String)
  • degrees (Integer)
  • pages (:all, Integer, Array<Integer>) (defaults to: :all)

Returns:

  • (Result)

# File 'app/services/pdf/toolkit.rb', line 256

def rotate(path, degrees:, pages: :all)
  Pdf::Loader.load!
  norm = Integer(degrees) % 360
  raise Error, 'degrees must be a multiple of 90' unless (norm % 90).zero?

  doc     = HexaPDF::Document.open(path)
  targets = resolve_pages(doc, pages)
  targets.each { |pg| pg[:Rotate] = ((pg[:Rotate] || 0).to_i + norm) % 360 }

  Result.new(bytes: write(doc), meta: { pages: doc.pages.count, rotated: targets.size, degrees: norm })
rescue HexaPDF::Error => e
  raise Error, "Rotate failed: #{e.message}"
end
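
The normalisation step is easy to check by hand. The sketch below reproduces the same modular arithmetic in a hypothetical standalone helper, raising ArgumentError where the method above raises its own Error class.

```ruby
# Hypothetical helper: normalise a requested rotation and add it to a page's
# current /Rotate value, using the same arithmetic as rotate.
def next_rotation(current, degrees)
  norm = Integer(degrees) % 360
  raise ArgumentError, 'degrees must be a multiple of 90' unless (norm % 90).zero?

  (current.to_i + norm) % 360
end

puts next_rotation(0, 90)    # prints 90
puts next_rotation(270, 180) # prints 90
puts next_rotation(0, -90)   # prints 270
puts next_rotation(nil, 450) # prints 90
```

Because Ruby's % returns a non-negative result for a positive divisor, a request of -90 is stored as 270, so counter-clockwise turns can be expressed with negative values.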

.select_pages(path, pages:) ⇒ Result

Build a new PDF from an ordered list of page numbers to keep. Covers
extraction (a subset), deletion (omit pages), and reordering (permute).

Parameters:

  • path (String)
  • pages (Array<Integer>)

    ordered, 1-based

Returns:

  • (Result)

# File 'app/services/pdf/toolkit.rb', line 277

def select_pages(path, pages:)
  Pdf::Loader.load!
  src   = HexaPDF::Document.open(path)
  count = src.pages.count
  order = Array(pages).map { |i| Integer(i) }
  raise Error, 'pages must be a non-empty list of page numbers' if order.empty?

  order.each { |i| raise Error, "page #{i} out of range (1-#{count})" if i < 1 || i > count }

  target = HexaPDF::Document.new
  order.each { |i| target.pages << target.import(src.pages[i - 1]) }

  Result.new(bytes: write(target), meta: { pages: order.size, kept: order })
rescue HexaPDF::Error => e
  raise Error, "Select pages failed: #{e.message}"
end
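
Since deletion and reordering are both expressed as a list of pages to keep, callers need to build that list themselves. The helpers below are a hypothetical sketch of how that might be done; they are not part of the toolkit.

```ruby
# Hypothetical helpers that build the `pages:` list for common edits.
# Page numbers are 1-based, matching select_pages.

# Keep everything except the given pages (deletion).
def pages_without(count, removed)
  (1..count).to_a - Array(removed)
end

# Move one page to a new 1-based position (reordering).
def pages_with_move(count, from, to)
  order = (1..count).to_a
  order.insert(to - 1, order.delete_at(from - 1))
end

puts pages_without(5, [2, 4]).inspect # prints [1, 3, 5]
puts pages_with_move(4, 4, 1).inspect # prints [4, 1, 2, 3]
```

Either result can then be passed as the pages: argument.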

.split(path, ranges: nil) ⇒ Array<Result>

Explode a PDF into multiple documents. With no +ranges+, returns one
document per page. +ranges+ is an array of [start, end] pairs (1-based,
inclusive); a pair with no end is treated as a single page.

Parameters:

  • path (String)
  • ranges (Array<Array(Integer,Integer)>, nil) (defaults to: nil)

Returns:

  • (Array<Result>)

# File 'app/services/pdf/toolkit.rb', line 230

def split(path, ranges: nil)
  Pdf::Loader.load!
  src   = HexaPDF::Document.open(path)
  count = src.pages.count
  specs = ranges.presence || (1..count).map { |i| [i, i] }

  specs.map do |(a, b)|
    a = Integer(a)
    b = Integer(b || a)
    raise Error, "range #{a}-#{b} out of bounds (document has #{count} pages)" if a < 1 || b > count || a > b

    target = HexaPDF::Document.new
    (a..b).each { |i| target.pages << target.import(src.pages[i - 1]) }
    Result.new(bytes: write(target), meta: { pages: b - a + 1, range: [a, b] })
  end
rescue HexaPDF::Error => e
  raise Error, "Split failed: #{e.message}"
end
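
The range handling can be exercised on its own. This sketch copies the defaulting and bounds check from the method body into a hypothetical helper, using a plain nil/empty test in place of ActiveSupport's presence and ArgumentError in place of the toolkit's Error.

```ruby
# Hypothetical helper mirroring the range handling in split.
# No ranges means one [i, i] pair per page.
def normalise_ranges(count, ranges = nil)
  specs = (ranges.nil? || ranges.empty?) ? (1..count).map { |i| [i, i] } : ranges
  specs.map do |(a, b)|
    a = Integer(a)
    b = Integer(b || a)
    raise ArgumentError, "range #{a}-#{b} out of bounds (document has #{count} pages)" if a < 1 || b > count || a > b

    [a, b]
  end
end

puts normalise_ranges(3).inspect                     # prints [[1, 1], [2, 2], [3, 3]]
puts normalise_ranges(10, [[1, 3], [4, 10]]).inspect # prints [[1, 3], [4, 10]]
puts normalise_ranges(5, [[2]]).inspect              # prints [[2, 2]]
```

Nothing in the check prevents ranges from overlapping, so the same page can appear in more than one output document.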

.stamp(path, operations:, pages: :all) ⇒ Result

Draw overlay operations on top of existing page content. Each operation
is a Hash with a +:type+ of "text", "cover", "image", or "watermark".
Operations are applied in order to every target page.

Parameters:

  • path (String)
  • operations (Array<Hash>)

    ordered overlay ops

  • pages (:all, Integer, Array<Integer>) (defaults to: :all)

    1-based target pages

Returns:

  • (Result)

# File 'app/services/pdf/toolkit.rb', line 138

def stamp(path, operations:, pages: :all)
  Pdf::Loader.load!
  raise Error, 'operations must be a non-empty array' unless operations.is_a?(Array) && operations.any?

  doc     = HexaPDF::Document.open(path)
  targets = resolve_pages(doc, pages)
  targets.each do |page|
    canvas = page.canvas(type: :overlay)
    mb     = page.box(:media)
    operations.each { |op| apply_op(doc, canvas, mb, symbolize(op)) }
  end

  Result.new(bytes: write(doc),
             meta: { pages: doc.pages.count, operations: operations.size, target_pages: targets.size })
rescue HexaPDF::Error => e
  raise Error, "Stamp failed: #{e.message}"
end
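
The resolve_pages helper used here (and in rotate) is not shown on this page. The following is only a guess at how a selector accepting :all, an Integer, or an Array of 1-based page numbers could be resolved; the name, the 0-based return value, and the de-duplication are all assumptions.

```ruby
# Hypothetical sketch of a page selector; NOT the toolkit's resolve_pages.
# Returns 0-based indices for :all, a single page number, or a list of them.
def resolve_page_indices(count, pages = :all)
  return (0...count).to_a if pages.nil? || pages.to_s == 'all'

  indices = Array(pages).map do |n|
    n = Integer(n)
    raise ArgumentError, "page #{n} out of range (1-#{count})" if n < 1 || n > count

    n - 1
  end
  # Assumption: drop repeats so overlay operations are drawn once per page.
  indices.uniq
end

puts resolve_page_indices(4).inspect            # prints [0, 1, 2, 3]
puts resolve_page_indices(4, 2).inspect         # prints [1]
puts resolve_page_indices(4, [3, 1, 3]).inspect # prints [2, 0]
```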