Class: Pdf::Utility::ImageExtractor
- Inherits: Object
  - Object
    - Pdf::Utility::ImageExtractor
- Defined in: app/services/pdf/utility/image_extractor.rb
Overview
Extracts images from PDF publications for Vision AI analysis.
Uses HexaPDF to extract embedded images from PDF files.
Filters out small images (icons, bullets) and keeps only significant images
like diagrams, photos, and illustrations.
Defined Under Namespace
Classes: Result
Constant Summary
- MIN_WIDTH = 100
  Minimum dimensions for an image to be considered significant
- MIN_HEIGHT = 100
- MIN_FILE_SIZE = 5_000
  Minimum file size in bytes (skip tiny images like spacers)
- MAX_IMAGES = 20
  Maximum images to extract per PDF (avoid processing massive documents)
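The thresholds above could be applied with a predicate along the following lines. This is a minimal sketch, not the class's actual implementation: the `ImageCandidate` struct and the `significant?` and `select_significant` helpers are hypothetical names. The sketch also assumes that an image must meet all three minimums and that the cap is applied after filtering.

```ruby
# Thresholds copied from the Constant Summary.
MIN_WIDTH = 100
MIN_HEIGHT = 100
MIN_FILE_SIZE = 5_000
MAX_IMAGES = 20

# Hypothetical value object; the extractor's internal representation
# of an image is not documented here.
ImageCandidate = Struct.new(:width, :height, :byte_size)

# Assumption: an image is significant only if it meets all three minimums.
def significant?(image)
  image.width >= MIN_WIDTH &&
    image.height >= MIN_HEIGHT &&
    image.byte_size >= MIN_FILE_SIZE
end

# Keep significant images, capped at MAX_IMAGES (assumed order: filter, then cap).
def select_significant(images)
  images.select { |image| significant?(image) }.first(MAX_IMAGES)
end

candidates = [
  ImageCandidate.new(16, 16, 300),        # icon: below every threshold
  ImageCandidate.new(800, 600, 120_000),  # diagram: meets every threshold
  ImageCandidate.new(400, 300, 1_200)     # large enough, but file is too small
]

kept = select_significant(candidates)
puts kept.length
```

Filtering before capping means the limit of 20 counts only images that pass the size checks, so a document full of icons does not crowd out its diagrams.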
Instance Method Summary
- #extract(item) ⇒ Result
  Result with extracted image paths.
Instance Method Details
#extract(item) ⇒ Result
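Returns a Result with the extracted image paths. If a precondition fails (the item is not a publication, has no literature attached, the file is missing, or the file is not a PDF) or a StandardError is raised, it returns a failed Result with an empty image list and an error message instead.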
    # File 'app/services/pdf/utility/image_extractor.rb', line 31
    def extract(item)
      return Result.new(success?: false, images: [], error: 'Not a publication') unless item.is_publication?
      return Result.new(success?: false, images: [], error: 'No literature attached') unless item.literature&.
      return Result.new(success?: false, images: [], error: 'File not found') unless File.exist?(pdf_path(item))
      return Result.new(success?: false, images: [], error: 'Not a PDF') unless pdf_file?(item)

      images = extract_images_from_pdf(pdf_path(item), item)
      Result.new(success?: true, images: images, error: nil)
    rescue StandardError => e
      Rails.logger.error "[Pdf::Utility::ImageExtractor] Error extracting images: #{e.}"
      ErrorReporting.error(e)
      Result.new(success?: false, images: [], error: e.)
    end
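Because `extract` rescues StandardError and always returns a Result, a caller can branch on `success?` instead of wrapping the call in its own rescue. The sketch below illustrates that pattern in isolation. The definition of the real Result class is not shown on this page, so the struct here is a stand-in that assumes a keyword-initialized object with `success?`, `images`, and `error` members. The `describe` helper and the file path are hypothetical.

```ruby
# Assumption: Result behaves like a keyword-initialized struct with these
# three members. The real class is nested under the extractor and may differ.
Result = Struct.new(:success?, :images, :error, keyword_init: true)

# Hypothetical caller-side helper that summarizes an extraction result.
def describe(result)
  if result.success?
    "Extracted #{result.images.length} image(s)"
  else
    "Extraction failed: #{result.error}"
  end
end

# Stand-ins for the two kinds of value that extract returns.
ok = Result.new(success?: true, images: ['/tmp/example-image-0.png'], error: nil)
failed = Result.new(success?: false, images: [], error: 'Not a PDF')

puts describe(ok)
puts describe(failed)
```

Since `images` is an empty array in every failure case, code that only iterates over `result.images` can skip the `success?` check without hitting a nil.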