Module: Heatwave::TypographicQuotes

Defined in:
app/lib/heatwave/typographic_quotes.rb

Overview

Convert straight ASCII quotes (", ') to curly typographic quotes
(“ ” ‘ ’) per https://typographyforlawyers.com/straight-and-curly-quotes.html.

Used by save-time hooks (Heatwave::Normalizers#html_scrubber,
Seo::HtmlContentSanitizer) to keep storage in line with the project's
curly-quote convention for user-editable prose, and by render-time
safety nets (Oembed::ProductProvider.curl_typographic_quotes) for
legacy data that hasn't been re-saved yet.

The straight → curly pairing is positional, not semantic: a quote that
follows whitespace, an opening bracket, or the start of the string becomes
an opening quote; every other quote becomes a closing quote. As a result,
a " at the very start of a text node after an inline tag (e.g. the
closing quote in <p>"<em>good</em>"</p>) is mis-curled as an opening
quote. This is an acceptable edge case, since a curly quote in either
direction is still JSON-safe. Render-time sanitize_schema_text runs the
curl on the full stripped string, so those cases pair correctly there.
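
The positional rule and its edge case can be sketched in plain Ruby. This
is a minimal stand-alone illustration, not the module itself: QuoteSketch
and its curl method are hypothetical names, the two patterns are copied
from the constants documented on this page, and the blank check is a
plain-Ruby approximation of ActiveSupport's blank?.

```ruby
# Hypothetical stand-alone sketch of the positional curling rule.
module QuoteSketch
  # A quote is "opening" when it follows start of string, whitespace,
  # or an opening bracket.
  OPEN_DOUBLE = /(\A|[\s(\[{])"/
  OPEN_SINGLE = /(\A|[\s(\[{])'/

  def self.curl(text)
    # Plain-Ruby stand-in for ActiveSupport's blank? (an assumption).
    return text if text.nil? || text.strip.empty?

    text
      .gsub(OPEN_DOUBLE, '\1“') # opening doubles first
      .tr('"', '”')             # every remaining double is closing
      .gsub(OPEN_SINGLE, '\1‘') # opening singles
      .tr("'", '’')             # remaining singles, incl. apostrophes
  end
end

# Whole string: the quotes pair up correctly.
puts QuoteSketch.curl('"good"')           # “good”

# Per text node, as in <p>"<em>good</em>"</p>: the closing quote sits
# at the start of its own node, so it is curled as an opening quote.
nodes = ['"', 'good', '"']
puts nodes.map { |t| QuoteSketch.curl(t) }.join # “good“
```

Curling the joined string rather than each node is what lets the
render-time path pair these quotes correctly.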

Inside <code>, <pre>, <kbd>, <samp>, <tt>, <var>,
<script>, <style>, <title>, text is left untouched — those tags
carry literal content where curly substitution would corrupt meaning.

Constant Summary

SKIP_TAG_NAMES =

Tags whose text content must never be curled.

%w[code pre kbd samp tt var script style title].freeze
OPEN_DOUBLE =
/(\A|[\s(\[{])"/
OPEN_SINGLE =
/(\A|[\s(\[{])'/

Class Method Details

.curl_doc_text_nodes!(doc) ⇒ void

This method returns an undefined value.

Curl text nodes inside a Nokogiri document in place.

Parameters:

  • doc (Nokogiri::XML::Node)

    document fragment or element



# File 'app/lib/heatwave/typographic_quotes.rb', line 65

def self.curl_doc_text_nodes!(doc)
  doc.traverse do |node|
    next unless node.text?
    next if skip_ancestor?(node)

    curled = curl_plain(node.content)
    node.content = curled if curled != node.content
  end
end

.curl_html(html) ⇒ String?

Walk text nodes in an HTML string and curl straight quotes inside,
skipping <code>/<pre>/etc. blocks and any attribute values
(attributes aren't text nodes, so they're naturally excluded).

Parameters:

  • html (String, nil)

Returns:

  • (String, nil)

    curled HTML, or input unchanged when blank or
    when no straight quotes are present



# File 'app/lib/heatwave/typographic_quotes.rb', line 52

def self.curl_html(html)
  return html if html.blank?
  return html unless html.include?('"') || html.include?("'")

  doc = Nokogiri::HTML5.fragment(html)
  curl_doc_text_nodes!(doc)
  doc.to_html
end

.curl_plain(text) ⇒ String?

Curl straight quotes in a plain string (no HTML awareness).

Parameters:

  • text (String, nil)

Returns:

  • (String, nil)

    curled text, or input unchanged when blank



# File 'app/lib/heatwave/typographic_quotes.rb', line 35

def self.curl_plain(text)
  return text if text.blank?

  # Curl opening quotes first; any straight quote left over is a closing quote.
  text
    .gsub(OPEN_DOUBLE, '\1“')
    .tr('"', '”')
    .gsub(OPEN_SINGLE, '\1‘')
    .tr("'", '’')
end

.skip_ancestor?(node) ⇒ Boolean

Walk up from node, checking whether any ancestor is in
SKIP_TAG_NAMES. Elements named in that list opt their entire
subtree out of curl substitution. The walk stops at the first ancestor
that is not an element (the document or fragment root), so #parent is
never called on the root: Nokogiri::HTML4::Document does not respond to
#parent.

Parameters:

  • node (Nokogiri::XML::Node)

    Text node whose ancestors to check.

Returns:

  • (Boolean)

    true when any ancestor element name matches
    SKIP_TAG_NAMES; false otherwise.



# File 'app/lib/heatwave/typographic_quotes.rb', line 83

def self.skip_ancestor?(node)
  ancestor = node.parent
  while ancestor.respond_to?(:element?) && ancestor.element?
    return true if SKIP_TAG_NAMES.include?(ancestor.name)

    ancestor = ancestor.parent
  end
  false
end
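
The ancestor walk can be tried without Nokogiri. The sketch below is an
illustration only: SkipSketch and its Node struct are hypothetical
stand-ins that model just the three methods the loop relies on (name,
parent, element?), and the root node plays the role of a document that is
not an element.

```ruby
# Hypothetical stand-in tree; not Nokogiri's API.
module SkipSketch
  SKIP_TAG_NAMES = %w[code pre kbd samp tt var script style title].freeze

  # Minimal node: a name, a parent, and an element flag.
  Node = Struct.new(:name, :parent, :element) do
    def element?
      element
    end
  end

  def self.skip_ancestor?(node)
    ancestor = node.parent
    # Climb while the ancestor is an element; the root ends the loop.
    while ancestor.respond_to?(:element?) && ancestor.element?
      return true if SKIP_TAG_NAMES.include?(ancestor.name)

      ancestor = ancestor.parent
    end
    false
  end
end

root = SkipSketch::Node.new("document", nil, false)

# <pre><span>text</span></pre>: skipped through a non-listed <span>.
pre  = SkipSketch::Node.new("pre", root, true)
span = SkipSketch::Node.new("span", pre, true)
text_in_pre = SkipSketch::Node.new("text", span, false)

# <p>text</p>: no listed ancestor, so the text is eligible for curling.
para = SkipSketch::Node.new("p", root, true)
text_in_p = SkipSketch::Node.new("text", para, false)

puts SkipSketch.skip_ancestor?(text_in_pre) # true
puts SkipSketch.skip_ancestor?(text_in_p)   # false
```

Because the check climbs all the way up, a quote nested several levels
below a <pre> or <code> element is still left untouched.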