Module: Heatwave::TypographicQuotes
- Defined in: app/lib/heatwave/typographic_quotes.rb
Overview
Convert straight ASCII quotes (", ') to curly typographic quotes
(“ ” ‘ ’) per https://typographyforlawyers.com/straight-and-curly-quotes.html.
Used by save-time hooks (Heatwave::Normalizers#html_scrubber,
Seo::HtmlContentSanitizer) to keep storage in line with the project's
curly-quote convention for user-editable prose, and by render-time
safety nets (Oembed::ProductProvider.curl_typographic_quotes) for
legacy data that hasn't been re-saved yet.
The straight → curly pairing is positional, not semantic: a quote is
curled as opening when it follows whitespace, an opening bracket, or the
start of the string, and as closing otherwise. Because each text node is
curled on its own, a " at the very start of a text node after an inline
tag (e.g. the closing quote in <p>"<em>good</em>"</p>) is mis-curled as
an opening quote. This is an acceptable edge case, since a quote curled
either way is still JSON-safe, and render-time sanitize_schema_text runs
the curl on the full stripped string, so the cases that trigger the bug
pair correctly there.
Inside <code>, <pre>, <kbd>, <samp>, <tt>, <var>,
<script>, <style>, <title>, text is left untouched — those tags
carry literal content where curly substitution would corrupt meaning.
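The mis-curl described above can be reproduced without an HTML parser. The following is a minimal sketch, not the module's code: `curl` is a hypothetical standalone helper that mirrors the positional rule using the patterns listed under OPEN_DOUBLE and OPEN_SINGLE, and the text nodes of <p>"<em>good</em>"</p> are written out by hand on the assumption that a parser would split the paragraph this way.

```ruby
# Hypothetical standalone helper mirroring the positional rule.
# The two regexes are copied from OPEN_DOUBLE and OPEN_SINGLE.
def curl(text)
  text
    .gsub(/(\A|[\s(\[{])"/, '\1“') # quote after start/whitespace/bracket opens
    .tr('"', '”')                   # every remaining double quote closes
    .gsub(/(\A|[\s(\[{])'/, '\1‘')
    .tr("'", '’')
end

# Text nodes of <p>"<em>good</em>"</p>, listed by hand (assumption).
text_nodes = ['"', 'good', '"']

# Curling each node separately: both quotes sit at the start of their
# own string, so both are treated as opening quotes.
per_node = text_nodes.map { |t| curl(t) }.join

# Curling the joined, tag-stripped string: the second quote now follows
# a letter, so it is treated as a closing quote.
whole = curl(text_nodes.join)

puts per_node
puts whole
```

The second call corresponds to the render-time path, where the curl sees the whole stripped string rather than one text node at a time.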
Constant Summary
- SKIP_TAG_NAMES =
  Tags whose text content must never be curled.
  %w[code pre kbd samp tt var script style title].freeze
- OPEN_DOUBLE =
  /(\A|[\s(\[{])"/
- OPEN_SINGLE =
  /(\A|[\s(\[{])'/
Class Method Summary
- .curl_doc_text_nodes!(doc) ⇒ void
  Curl text nodes inside a Nokogiri document in place.
- .curl_html(html) ⇒ String?
  Walk text nodes in an HTML string and curl straight quotes inside, skipping <code>/<pre>/etc.
- .curl_plain(text) ⇒ String?
  Curl straight quotes in a plain string (no HTML awareness).
- .skip_ancestor?(node) ⇒ Boolean
  Walk up from node checking whether any ancestor is in SKIP_TAG_NAMES.
Class Method Details
.curl_doc_text_nodes!(doc) ⇒ void
This method returns an undefined value.
Curl text nodes inside a Nokogiri document in place.
    # File 'app/lib/heatwave/typographic_quotes.rb', line 65
    def self.curl_doc_text_nodes!(doc)
      doc.traverse do |node|
        next unless node.text?
        next if skip_ancestor?(node)

        curled = curl_plain(node.content)
        node.content = curled if curled != node.content
      end
    end
.curl_html(html) ⇒ String?
Walk text nodes in an HTML string and curl straight quotes inside,
skipping <code>/<pre>/etc. blocks and any attribute values
(attributes aren't text nodes, so they're naturally excluded).
    # File 'app/lib/heatwave/typographic_quotes.rb', line 52
    def self.curl_html(html)
      return html if html.blank?
      return html unless html.include?('"') || html.include?("'")

      doc = Nokogiri::HTML5.fragment(html)
      curl_doc_text_nodes!(doc)
      doc.to_html
    end
.curl_plain(text) ⇒ String?
Curl straight quotes in a plain string (no HTML awareness).
    # File 'app/lib/heatwave/typographic_quotes.rb', line 35
    def self.curl_plain(text)
      return text if text.blank?

      text
        .gsub(OPEN_DOUBLE, '\1“')
        .tr('"', '”')
        .gsub(OPEN_SINGLE, '\1‘')
        .tr("'", '’')
    end
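The order of the four passes matters: each gsub marks the opening quotes first, and the following tr turns whatever straight quotes remain into closing quotes. The sketch below walks through the passes one at a time in plain Ruby. It is an illustration only: the sample sentence is made up, the regexes are copied from the constant summary, and the blank? guard is left out because it comes from ActiveSupport.

```ruby
# Made-up sample text containing nested and bracketed quotes.
text = %q{She said "it's 'fine'" (and "left").}

# Pass 1: a double quote after start/whitespace/opening bracket opens.
step1 = text.gsub(/(\A|[\s(\[{])"/, '\1“')

# Pass 2: every double quote still straight is a closing quote.
step2 = step1.tr('"', '”')

# Pass 3: same opening rule for single quotes.
step3 = step2.gsub(/(\A|[\s(\[{])'/, '\1‘')

# Pass 4: remaining single quotes close; this also covers apostrophes.
step4 = step3.tr("'", '’')

[step1, step2, step3, step4].each { |s| puts s }
```

The apostrophe in "it's" comes out as ’ only because it follows a letter, which is what "positional, not semantic" means in practice.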
.skip_ancestor?(node) ⇒ Boolean
Walk up from node checking whether any ancestor is in
SKIP_TAG_NAMES. Elements named in that list opt their entire
subtree out of curl substitution. The walk stops at the document root:
the loop continues only while the ancestor is an element, so #parent is
never called on Nokogiri::HTML4::Document, which does not respond to it.
    # File 'app/lib/heatwave/typographic_quotes.rb', line 83
    def self.skip_ancestor?(node)
      ancestor = node.parent
      while ancestor.respond_to?(:element?) && ancestor.element?
        return true if SKIP_TAG_NAMES.include?(ancestor.name)

        ancestor = ancestor.parent
      end
      false
    end
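The ancestor walk can be illustrated on a toy tree, which shows why a text node nested several levels below a <code> element is still skipped. This is a sketch under stated assumptions: `node_class` is a hypothetical Struct standing in for a parsed node (it is not Nokogiri's API), the root is marked by a nil parent rather than by a document object, and the tag list is copied from SKIP_TAG_NAMES.

```ruby
# Tag list copied from SKIP_TAG_NAMES.
skip_tags = %w[code pre kbd samp tt var script style title].freeze

# Hypothetical stand-in for a parsed node: a tag name and a parent link.
node_class = Struct.new(:name, :parent)

# Follow parent links upward; in this toy tree the root has a nil parent.
skip_ancestor = lambda do |node|
  ancestor = node.parent
  while ancestor
    return true if skip_tags.include?(ancestor.name)

    ancestor = ancestor.parent
  end
  false
end

# Toy tree for: <p>prose <code><span>literal</span></code></p>
para         = node_class.new('p', nil)
code         = node_class.new('code', para)
span         = node_class.new('span', code)
prose_text   = node_class.new('text', para)
literal_text = node_class.new('text', span)

in_prose = skip_ancestor.call(prose_text)   # no skip tag above this node
in_code  = skip_ancestor.call(literal_text) # span -> code is a skip tag

puts "prose skipped: #{in_prose}"
puts "literal skipped: #{in_code}"
```

Checking every ancestor rather than only the direct parent is what lets a skip tag opt out its whole subtree.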