Module: Seo::HrefAnomalyNormalizer

Defined in:
app/services/seo/href_anomaly_normalizer.rb

Overview

Shared string-level href fixes, applied before URI parsing, +URI.join+, or +Addressable+. Used by the blog sanitizer and by RSS/feed generation.

Constant Summary

WY_AUTHORITY =

Host pattern covering the +www+ and apex hosts on the production and staging TLDs (+.com+, +.ws+).

'(?:www\.)?warmlyyours\.(?:com|ws)'
WY_HOST_DOUBLE_SLASH =
Regexp.new("\\A(https?://#{WY_AUTHORITY})//+", Regexp::IGNORECASE)
HTTP_WY_UPGRADE =
Regexp.new("\\Ahttp://(#{WY_AUTHORITY})(?=/|\\z|\\?|#)", Regexp::IGNORECASE)
LEGACY_EN_PREFIX =
%r{\A((?:https?://(?:www\.)?warmlyyours\.(?:com|ws))?)/en(?=/|$)}i
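To make the patterns concrete, here is a minimal sketch that copies the four constants into a standalone module and probes them with sample hrefs. The module name +HrefPatternsDemo+ and the sample URLs are assumptions made for illustration; only the constant definitions come from the listing above.

```ruby
# Hypothetical stand-in module; the constants are copied verbatim from the listing above.
module HrefPatternsDemo
  WY_AUTHORITY = '(?:www\.)?warmlyyours\.(?:com|ws)'
  WY_HOST_DOUBLE_SLASH = Regexp.new("\\A(https?://#{WY_AUTHORITY})//+", Regexp::IGNORECASE)
  HTTP_WY_UPGRADE = Regexp.new("\\Ahttp://(#{WY_AUTHORITY})(?=/|\\z|\\?|#)", Regexp::IGNORECASE)
  LEGACY_EN_PREFIX = %r{\A((?:https?://(?:www\.)?warmlyyours\.(?:com|ws))?)/en(?=/|$)}i
end

# Two or more slashes directly after the host are required for a match.
HrefPatternsDemo::WY_HOST_DOUBLE_SLASH.match?('https://www.warmlyyours.com//floor-heating') # => true
HrefPatternsDemo::WY_HOST_DOUBLE_SLASH.match?('https://www.warmlyyours.com/floor-heating')  # => false

# The lookahead stops look-alike hosts from matching.
HrefPatternsDemo::HTTP_WY_UPGRADE.match?('http://warmlyyours.com')               # => true
HrefPatternsDemo::HTTP_WY_UPGRADE.match?('http://warmlyyours.com.evil.example/') # => false

# "/en" must be a whole path segment; the host part is optional.
HrefPatternsDemo::LEGACY_EN_PREFIX.match?('/en/blog') # => true
HrefPatternsDemo::LEGACY_EN_PREFIX.match?('/english') # => false
```

All four patterns are anchored with +\A+, so they only ever inspect the start of the href.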

Class Method Details

.collapse_double_slash_after_wy_host(href) ⇒ Object

Collapses a run of two or more slashes immediately after the WarmlyYours host into a single slash. Applies to both +http+ and +https+ URLs.



# File 'app/services/seo/href_anomaly_normalizer.rb', line 31

def self.collapse_double_slash_after_wy_host(href)
  href.to_s.gsub(WY_HOST_DOUBLE_SLASH, '\1/')
end
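The following sketch shows the method's behavior on a few inputs. It assumes plain Ruby outside the application; +HrefSlashDemo+ is a hypothetical stand-in module, and the sample hrefs are invented for illustration.

```ruby
# Hypothetical stand-in module reproducing the constant and method shown above.
module HrefSlashDemo
  WY_AUTHORITY = '(?:www\.)?warmlyyours\.(?:com|ws)'
  WY_HOST_DOUBLE_SLASH = Regexp.new("\\A(https?://#{WY_AUTHORITY})//+", Regexp::IGNORECASE)

  def self.collapse_double_slash_after_wy_host(href)
    href.to_s.gsub(WY_HOST_DOUBLE_SLASH, '\1/')
  end
end

HrefSlashDemo.collapse_double_slash_after_wy_host('https://www.warmlyyours.com//floor-heating')
# => "https://www.warmlyyours.com/floor-heating"

# Only the run directly after the host is collapsed; "a//b" later in the path is kept.
HrefSlashDemo.collapse_double_slash_after_wy_host('http://warmlyyours.ws///a//b')
# => "http://warmlyyours.ws/a//b"

# Non-WarmlyYours hosts are returned unchanged, and nil becomes an empty string.
HrefSlashDemo.collapse_double_slash_after_wy_host('https://example.com//a') # => "https://example.com//a"
HrefSlashDemo.collapse_double_slash_after_wy_host(nil)                      # => ""
```

Because the pattern is anchored with +\A+, +gsub+ can match at most once here, so it behaves like +sub+.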

.prewash_wy_href_string(href) ⇒ Object

Shared first pass for the blog sanitizer (+Seo::LinkSanitizer+) and feed absolutization (+Article+).
In order: qualifies schemeless WY hosts, collapses +//+ after the host, and rewrites mistaken +//segment+ paths.



# File 'app/services/seo/href_anomaly_normalizer.rb', line 24

def self.prewash_wy_href_string(href)
  h = qualify_schemeless_wy_url(href)
  h = collapse_double_slash_after_wy_host(h)
  rewrite_mistaken_protocol_relative_path(h)
end
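As a rough end-to-end illustration, the sketch below chains the three steps in a standalone module. It assumes plain Ruby without ActiveSupport, so +present?+ and +exclude?+ are approximated with +empty?+ and +include?+. +PrewashDemo+ and the sample hrefs are hypothetical.

```ruby
# Hypothetical stand-in module; method bodies follow the listings on this page,
# with ActiveSupport helpers swapped for plain-Ruby approximations.
module PrewashDemo
  WY_AUTHORITY = '(?:www\.)?warmlyyours\.(?:com|ws)'
  WY_HOST_DOUBLE_SLASH = Regexp.new("\\A(https?://#{WY_AUTHORITY})//+", Regexp::IGNORECASE)

  def self.qualify_schemeless_wy_url(href)
    href.to_s
        .sub(%r{\A//(?:www\.)?warmlyyours\.(com|ws)(?=/|\z)}i, 'https://www.warmlyyours.\1')
        .sub(%r{\A(www\.warmlyyours\.(?:com|ws))(?=/|\z)}i, 'https://\1')
        .sub(%r{\Awarmlyyours\.(com|ws)(?=/|\z)}i, 'https://www.warmlyyours.\1')
  end

  def self.collapse_double_slash_after_wy_host(href)
    href.to_s.gsub(WY_HOST_DOUBLE_SLASH, '\1/')
  end

  def self.rewrite_mistaken_protocol_relative_path(href)
    href = href.to_s
    return href unless href.start_with?('//') && !href.start_with?('///')

    authority = href.delete_prefix('//').split('/', 2).first.to_s
    # Approximation of `authority.present? && authority.exclude?('.')`.
    return href if authority.empty? || authority.include?('.')

    "/#{authority}#{href.delete_prefix("//#{authority}")}"
  end

  def self.prewash_wy_href_string(href)
    h = qualify_schemeless_wy_url(href)
    h = collapse_double_slash_after_wy_host(h)
    rewrite_mistaken_protocol_relative_path(h)
  end
end

PrewashDemo.prewash_wy_href_string('//warmlyyours.com//floor-heating') # => "https://www.warmlyyours.com/floor-heating"
PrewashDemo.prewash_wy_href_string('//floor-heating/foo')              # => "/floor-heating/foo"
PrewashDemo.prewash_wy_href_string('//cdn.example/app.js')             # => "//cdn.example/app.js"
```

The order matters in the first example: the host is qualified first, which lets the double-slash pattern (which requires a scheme) match in the second step.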

.qualify_schemeless_wy_url(href) ⇒ Object

Prefixes +https://+ to schemeless WarmlyYours URLs: +//www.warmlyyours.com/foo+, +//warmlyyours.com/foo+, +www…+, +warmlyyours…+ → +https://www.warmlyyours…/foo+.
Protocol-relative and bare apex hosts are rewritten to the +www+ host, so both forms produce the same result.



# File 'app/services/seo/href_anomaly_normalizer.rb', line 15

def self.qualify_schemeless_wy_url(href)
  href.to_s
      .sub(%r{\A//(?:www\.)?warmlyyours\.(com|ws)(?=/|\z)}i, 'https://www.warmlyyours.\1')
      .sub(%r{\A(www\.warmlyyours\.(?:com|ws))(?=/|\z)}i, 'https://\1')
      .sub(%r{\Awarmlyyours\.(com|ws)(?=/|\z)}i, 'https://www.warmlyyours.\1')
end
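A short sketch of the three substitutions on sample inputs, assuming plain Ruby. +QualifyDemo+ is a hypothetical stand-in module and the hrefs are invented for illustration.

```ruby
# Hypothetical stand-in module reproducing the method shown above.
module QualifyDemo
  def self.qualify_schemeless_wy_url(href)
    href.to_s
        .sub(%r{\A//(?:www\.)?warmlyyours\.(com|ws)(?=/|\z)}i, 'https://www.warmlyyours.\1')
        .sub(%r{\A(www\.warmlyyours\.(?:com|ws))(?=/|\z)}i, 'https://\1')
        .sub(%r{\Awarmlyyours\.(com|ws)(?=/|\z)}i, 'https://www.warmlyyours.\1')
  end
end

QualifyDemo.qualify_schemeless_wy_url('//www.warmlyyours.com/foo') # => "https://www.warmlyyours.com/foo"
QualifyDemo.qualify_schemeless_wy_url('//warmlyyours.ws')          # => "https://www.warmlyyours.ws"
QualifyDemo.qualify_schemeless_wy_url('warmlyyours.com/foo')       # => "https://www.warmlyyours.com/foo"

# The lookahead only accepts "/" or end of string after the host, so these are left alone.
QualifyDemo.qualify_schemeless_wy_url('warmlyyours.community/x') # => "warmlyyours.community/x"
QualifyDemo.qualify_schemeless_wy_url('warmlyyours.com?x=1')     # => "warmlyyours.com?x=1"
```

Already-qualified URLs pass through unchanged because none of the three patterns can match a string that starts with a scheme.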

.rewrite_legacy_en_for_feed(href, locale_path) ⇒ Object

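For feeds and other absolute output: rewrites a legacy +/en+ path prefix to +/<locale_path>+ (e.g. +/en-US+). Leading slashes on +locale_path+ are stripped, and a blank +locale_path+ leaves the href unchanged.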



# File 'app/services/seo/href_anomaly_normalizer.rb', line 56

def self.rewrite_legacy_en_for_feed(href, locale_path)
  normalized_locale = locale_path.to_s.strip.sub(%r{\A/+}, '')
  return href.to_s if normalized_locale.blank?

  href.to_s.sub(LEGACY_EN_PREFIX) { "#{Regexp.last_match(1)}/#{normalized_locale}" }
end
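The sketch below exercises the locale rewrite on a few sample hrefs. It assumes plain Ruby, so ActiveSupport's +blank?+ is replaced with +empty?+, which should behave the same here because the value is stripped first. +LegacyEnFeedDemo+ is a hypothetical stand-in module.

```ruby
# Hypothetical stand-in module; `empty?` stands in for ActiveSupport's `blank?`.
module LegacyEnFeedDemo
  LEGACY_EN_PREFIX = %r{\A((?:https?://(?:www\.)?warmlyyours\.(?:com|ws))?)/en(?=/|$)}i

  def self.rewrite_legacy_en_for_feed(href, locale_path)
    normalized_locale = locale_path.to_s.strip.sub(%r{\A/+}, '')
    return href.to_s if normalized_locale.empty?

    # Capture group 1 holds the optional scheme and host, which is preserved.
    href.to_s.sub(LEGACY_EN_PREFIX) { "#{Regexp.last_match(1)}/#{normalized_locale}" }
  end
end

LegacyEnFeedDemo.rewrite_legacy_en_for_feed('/en/blog/post', 'en-US')
# => "/en-US/blog/post"
LegacyEnFeedDemo.rewrite_legacy_en_for_feed('https://www.warmlyyours.com/en', '/en-CA')
# => "https://www.warmlyyours.com/en-CA"

# Already-localized paths and look-alike segments are left untouched.
LegacyEnFeedDemo.rewrite_legacy_en_for_feed('/en-US/blog', 'en-US') # => "/en-US/blog"
LegacyEnFeedDemo.rewrite_legacy_en_for_feed('/english', 'en-US')    # => "/english"
```

Since +/en-US+ no longer matches the pattern, running the rewrite twice gives the same result as running it once.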

.rewrite_legacy_en_to_locale_template(href) ⇒ Object

For stored blog and admin templates: rewrites a legacy +/en+ path prefix to the Liquid placeholder +/{{locale}}+.



# File 'app/services/seo/href_anomaly_normalizer.rb', line 51

def self.rewrite_legacy_en_to_locale_template(href)
  href.to_s.sub(LEGACY_EN_PREFIX, '\1/{{locale}}')
end
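A minimal sketch of the template rewrite, assuming plain Ruby. +LegacyEnTemplateDemo+ is a hypothetical stand-in module and the hrefs are sample values.

```ruby
# Hypothetical stand-in module reproducing the constant and method shown above.
module LegacyEnTemplateDemo
  LEGACY_EN_PREFIX = %r{\A((?:https?://(?:www\.)?warmlyyours\.(?:com|ws))?)/en(?=/|$)}i

  def self.rewrite_legacy_en_to_locale_template(href)
    href.to_s.sub(LEGACY_EN_PREFIX, '\1/{{locale}}')
  end
end

LegacyEnTemplateDemo.rewrite_legacy_en_to_locale_template('/en/blog')
# => "/{{locale}}/blog"
LegacyEnTemplateDemo.rewrite_legacy_en_to_locale_template('https://www.warmlyyours.com/en/x')
# => "https://www.warmlyyours.com/{{locale}}/x"
LegacyEnTemplateDemo.rewrite_legacy_en_to_locale_template('/enterprise')
# => "/enterprise"
```

The output is a template string rather than a usable URL; it presumably needs a later Liquid render to substitute +locale+.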

.rewrite_mistaken_protocol_relative_path(href) ⇒ Object

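Rewrites a mistaken protocol-relative path to a root-relative one: +//floor-heating/foo+ would otherwise parse as host "floor-heating", so it becomes +/floor-heating/foo+.
Real third-party protocol-relative URLs have a dot in the authority (+//cdn.example/...+) and are left unchanged.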



# File 'app/services/seo/href_anomaly_normalizer.rb', line 37

def self.rewrite_mistaken_protocol_relative_path(href)
  href = href.to_s
  return href unless href.start_with?('//') && !href.start_with?('///')

  authority = href.delete_prefix('//').split('/', 2).first.to_s
  return href unless authority.present? && authority.exclude?('.')

  tail = href.delete_prefix("//#{authority}")
  "/#{authority}#{tail}"
end
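The dot heuristic can be illustrated with the sketch below. It assumes plain Ruby without ActiveSupport, so +present?+ and +exclude?+ are approximated with +empty?+ and +include?+. +MistakenPathDemo+ is a hypothetical stand-in module.

```ruby
# Hypothetical stand-in module; ActiveSupport helpers are swapped for plain-Ruby approximations.
module MistakenPathDemo
  def self.rewrite_mistaken_protocol_relative_path(href)
    href = href.to_s
    # Only exactly two leading slashes qualify; "///..." is left alone.
    return href unless href.start_with?('//') && !href.start_with?('///')

    # Everything between "//" and the next "/" is what a URI parser would treat as the host.
    authority = href.delete_prefix('//').split('/', 2).first.to_s
    return href if authority.empty? || authority.include?('.')

    tail = href.delete_prefix("//#{authority}")
    "/#{authority}#{tail}"
  end
end

MistakenPathDemo.rewrite_mistaken_protocol_relative_path('//floor-heating/foo')  # => "/floor-heating/foo"
MistakenPathDemo.rewrite_mistaken_protocol_relative_path('//cdn.example/lib.js') # => "//cdn.example/lib.js"
MistakenPathDemo.rewrite_mistaken_protocol_relative_path('///foo')               # => "///foo"
MistakenPathDemo.rewrite_mistaken_protocol_relative_path('//')                   # => "//"
```

One consequence of the heuristic is that a dotless host such as +//localhost:3000/x+ would also be rewritten to a path.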

.upgrade_wy_http_to_https(href) ⇒ Object

Upgrades +http://+ to +https://+ for WarmlyYours hosts only. Other URLs are returned unchanged.


# File 'app/services/seo/href_anomaly_normalizer.rb', line 63

def self.upgrade_wy_http_to_https(href)
  href.to_s.sub(HTTP_WY_UPGRADE, 'https://\1')
end
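A small sketch of the scheme upgrade, assuming plain Ruby. +HttpsUpgradeDemo+ is a hypothetical stand-in module and the hrefs are sample values.

```ruby
# Hypothetical stand-in module reproducing the constant and method shown above.
module HttpsUpgradeDemo
  WY_AUTHORITY = '(?:www\.)?warmlyyours\.(?:com|ws)'
  HTTP_WY_UPGRADE = Regexp.new("\\Ahttp://(#{WY_AUTHORITY})(?=/|\\z|\\?|#)", Regexp::IGNORECASE)

  def self.upgrade_wy_http_to_https(href)
    href.to_s.sub(HTTP_WY_UPGRADE, 'https://\1')
  end
end

HttpsUpgradeDemo.upgrade_wy_http_to_https('http://www.warmlyyours.com/foo') # => "https://www.warmlyyours.com/foo"
HttpsUpgradeDemo.upgrade_wy_http_to_https('http://warmlyyours.ws?x=1')      # => "https://warmlyyours.ws?x=1"

# Other hosts, and WarmlyYours hosts with an explicit port, are not upgraded.
HttpsUpgradeDemo.upgrade_wy_http_to_https('http://example.com/')          # => "http://example.com/"
HttpsUpgradeDemo.upgrade_wy_http_to_https('http://warmlyyours.com:8080/') # => "http://warmlyyours.com:8080/"
```

The lookahead accepts only +/+, +?+, +#+, or end of string after the host, which is why the port example is left as-is.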