Module: Heatwave::TypographicQuotes

Defined in: app/lib/heatwave/typographic_quotes.rb

Overview

Convert straight ASCII quotes (", ') to curly typographic quotes
(“ ” ‘ ’) per https://typographyforlawyers.com/straight-and-curly-quotes.html.

Used by save-time hooks (Heatwave::Normalizers#html_scrubber,
Seo::HtmlContentSanitizer) to keep storage in line with the project's
curly-quote convention for user-editable prose, and by render-time
safety nets (Oembed::ProductProvider.curl_typographic_quotes) for
legacy data that hasn't been re-saved yet.

The straight → curly pairing is positional, not semantic: a quote is
curled as opening when it follows whitespace, an opening bracket, or the
start of the string, and as closing otherwise. Because each text node is
curled separately, a " at the very start of a text node that follows an
inline tag (e.g. the closing quote in <p>"<em>good</em>"</p>) is
mis-curled as an opening quote. This is an acceptable edge case, since a
curly quote in either direction is still JSON-safe. The render-time
sanitize_schema_text runs the curl on the full stripped string, so these
cases pair correctly there.
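
The edge case can be reproduced without Nokogiri. The following is a minimal sketch, not the module itself: the name EdgeCaseSketch and its curl helper are hypothetical, the two patterns are copied from the Constant Summary, and the blank-input guard is omitted.

```ruby
# Hypothetical standalone sketch; not part of Heatwave::TypographicQuotes.
module EdgeCaseSketch
  # Patterns copied from the Constant Summary.
  OPEN_DOUBLE = /(\A|[\s(\[{])"/
  OPEN_SINGLE = /(\A|[\s(\[{])'/

  module_function

  # Same substitution order as curl_plain, minus the blank-input guard.
  def curl(text)
    text
      .gsub(OPEN_DOUBLE, "\\1\u201C")
      .tr('"', "\u201D")
      .gsub(OPEN_SINGLE, "\\1\u2018")
      .tr("'", "\u2019")
  end
end

# Text nodes of <p>"<em>good</em>"</p>, curled one node at a time.
# Each quote sits at the start of its own node, so both become opening quotes.
per_node = ['"', 'good', '"'].map { |t| EdgeCaseSketch.curl(t) }.join

# The same text curled as one stripped string pairs correctly.
whole = EdgeCaseSketch.curl('"good"')

puts per_node
puts whole
```

The first line printed has two opening quotes around good; the second has an opening and a closing quote.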

Inside <code>, <pre>, <kbd>, <samp>, <tt>, <var>,
<script>, <style>, <title>, text is left untouched — those tags
carry literal content where curly substitution would corrupt meaning.

Constant Summary

SKIP_TAG_NAMES =
Tags whose text content must never be curled.

%w[code pre kbd samp tt var script style title].freeze

OPEN_DOUBLE =

/(\A|[\s(\[{])"/

OPEN_SINGLE =

/(\A|[\s(\[{])'/

Class Method Summary

.curl_doc_text_nodes!(doc) ⇒ void
Curl text nodes inside a Nokogiri document in place.
.curl_html(html) ⇒ String?
Walk text nodes in an HTML string and curl straight quotes inside, skipping <code>/<pre>/etc.
.curl_plain(text) ⇒ String?
Curl straight quotes in a plain string (no HTML awareness).
.skip_ancestor?(node) ⇒ Boolean
Walk up from node checking whether any ancestor is in SKIP_TAG_NAMES.

Class Method Details

.curl_doc_text_nodes!(doc) ⇒ `void`

This method returns an undefined value.

Curl text nodes inside a Nokogiri document in place.

Parameters:

doc (Nokogiri::XML::Node) —
document fragment or element

# File 'app/lib/heatwave/typographic_quotes.rb', line 65

def self.curl_doc_text_nodes!(doc)
  doc.traverse do |node|
    next unless node.text?
    next if skip_ancestor?(node)

    curled = curl_plain(node.content)
    node.content = curled if curled != node.content
  end
end

.curl_html(html) ⇒ `String`?

Walk text nodes in an HTML string and curl straight quotes inside,
skipping <code>/<pre>/etc. blocks and any attribute values
(attributes aren't text nodes, so they're naturally excluded).

Parameters:

html (String, nil)

Returns:

(String, nil) —
curled HTML, or input unchanged when blank or
when no straight quotes are present

# File 'app/lib/heatwave/typographic_quotes.rb', line 52

def self.curl_html(html)
  return html if html.blank?
  return html unless html.include?('"') || html.include?("'")

  doc = Nokogiri::HTML5.fragment(html)
  curl_doc_text_nodes!(doc)
  doc.to_html
end

.curl_plain(text) ⇒ `String`?

Curl straight quotes in a plain string (no HTML awareness).

Parameters:

text (String, nil)

Returns:

(String, nil) —
curled text, or input unchanged when blank

# File 'app/lib/heatwave/typographic_quotes.rb', line 35

def self.curl_plain(text)
  return text if text.blank?

  text
    .gsub(OPEN_DOUBLE, '\1“')
    .tr('"', '”')
    .gsub(OPEN_SINGLE, '\1‘')
    .tr("'", '’')
end
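
The substitution order matters: opening quotes are matched by position first, and every straight quote still left is then treated as closing. A hedged sketch of that behaviour follows. CurlPlainSketch is a hypothetical name, and the nil/whitespace check is an assumption standing in for ActiveSupport's blank?.

```ruby
# Hypothetical standalone sketch; mirrors the listing above without ActiveSupport.
module CurlPlainSketch
  OPEN_DOUBLE = /(\A|[\s(\[{])"/
  OPEN_SINGLE = /(\A|[\s(\[{])'/

  module_function

  def curl_plain(text)
    # Assumption: nil or whitespace-only approximates String#blank?.
    return text if text.nil? || text.strip.empty?

    text
      .gsub(OPEN_DOUBLE, "\\1\u201C") # opening double quotes first
      .tr('"', "\u201D")              # every remaining " is closing
      .gsub(OPEN_SINGLE, "\\1\u2018") # then opening single quotes
      .tr("'", "\u2019")              # remaining ' covers closers and apostrophes
  end
end

puts CurlPlainSketch.curl_plain(%q{She said "don't" (and 'why not').})
puts CurlPlainSketch.curl_plain("'tis")
p CurlPlainSketch.curl_plain(nil)
```

The second call shows the positional rule's limit: a leading apostrophe is at the start of the string, so it is curled as an opening single quote.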

.skip_ancestor?(node) ⇒ `Boolean`

Walk up from node checking whether any ancestor is in
SKIP_TAG_NAMES. Elements named in that list opt their entire
subtree out of curl substitution. Stops at the document root —
Nokogiri::HTML4::Document does not respond to #parent.

Parameters:

node (Nokogiri::XML::Node) —
Text node whose ancestors to check.

Returns:

(Boolean) —
true when any ancestor element name matches
SKIP_TAG_NAMES; false otherwise.

# File 'app/lib/heatwave/typographic_quotes.rb', line 83

def self.skip_ancestor?(node)
  ancestor = node.parent
  while ancestor.respond_to?(:element?) && ancestor.element?
    return true if SKIP_TAG_NAMES.include?(ancestor.name)

    ancestor = ancestor.parent
  end
  false
end
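
To illustrate the walk without loading Nokogiri, the sketch below uses a hypothetical FakeNode struct that exposes only name, parent, and element?. It is an assumption-laden stand-in for real nodes, and AncestorSketch is not part of the module.

```ruby
# Hypothetical standalone sketch; FakeNode stands in for Nokogiri nodes.
module AncestorSketch
  SKIP_TAG_NAMES = %w[code pre kbd samp tt var script style title].freeze

  # Minimal node: a name, a parent (or nil), and an element flag.
  FakeNode = Struct.new(:name, :parent, :is_element) do
    def element?
      is_element
    end
  end

  module_function

  # Same loop as the listing above: climb until a non-element is reached.
  def skip_ancestor?(node)
    ancestor = node.parent
    while ancestor.respond_to?(:element?) && ancestor.element?
      return true if SKIP_TAG_NAMES.include?(ancestor.name)

      ancestor = ancestor.parent
    end
    false
  end
end

# Models <p>plain <code><b>literal</b></code></p> inside a fragment root.
root  = AncestorSketch::FakeNode.new('#document-fragment', nil, false)
para  = AncestorSketch::FakeNode.new('p', root, true)
code  = AncestorSketch::FakeNode.new('code', para, true)
bold  = AncestorSketch::FakeNode.new('b', code, true)
plain = AncestorSketch::FakeNode.new('text', para, false)
lit   = AncestorSketch::FakeNode.new('text', bold, false)

p AncestorSketch.skip_ancestor?(plain)
p AncestorSketch.skip_ancestor?(lit)
```

The text directly under p is eligible for curling, while the text nested two levels below code is skipped because the whole subtree opts out.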
