Module: Seo::HrefAnomalyNormalizer

Defined in:
app/services/seo/href_anomaly_normalizer.rb

Overview

Shared href fixes applied to the raw string before URI parsing, +URI.join+, or +Addressable+. Used by the blog sanitizer and by RSS/feed generation.

Constant Summary

WY_AUTHORITY =

www + apex, production + staging TLDs

'(?:www\.)?warmlyyours\.(?:com|ws)'
WY_HOST_DOUBLE_SLASH =

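Matches a WarmlyYours origin (scheme plus host) at the start of the string followed by two or more slashes. The origin is captured in group 1.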

Regexp.new("\\A(https?://#{WY_AUTHORITY})//+", Regexp::IGNORECASE)
HTTP_WY_UPGRADE =

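Matches a leading +http://+ followed by a WarmlyYours host, which is captured in group 1. The lookahead requires the host to end at +/+, +?+, +#+, or end of string.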

Regexp.new("\\Ahttp://(#{WY_AUTHORITY})(?=/|\\z|\\?|#)", Regexp::IGNORECASE)
LEGACY_EN_PREFIX =

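Matches a legacy +/en+ path prefix, either site-relative or after a WarmlyYours origin (captured in group 1, possibly empty). The lookahead requires +/en+ to be followed by +/+ or end of line, so +/en-US+ and +/english+ do not match.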

%r{\A((?:https?://(?:www\.)?warmlyyours\.(?:com|ws))?)/en(?=/|$)}i
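As a rough illustration of which hosts the shared +WY_AUTHORITY+ fragment covers, the sketch below anchors it on both sides and tests a few host names. +WY_HOST_ONLY+ and the module name are hypothetical and exist only for this example; the pattern string is copied from this page.

```ruby
module WyAuthoritySketch
  # Pattern fragment copied from this page.
  WY_AUTHORITY = '(?:www\.)?warmlyyours\.(?:com|ws)'

  # Hypothetical helper (not part of the module): anchor the fragment so it
  # can be tested against a bare host name.
  WY_HOST_ONLY = Regexp.new("\\A#{WY_AUTHORITY}\\z", Regexp::IGNORECASE)
end

%w[www.warmlyyours.com warmlyyours.ws WWW.WARMLYYOURS.COM
   blog.warmlyyours.com warmlyyours.org].each do |host|
  puts "#{host}: #{WyAuthoritySketch::WY_HOST_ONLY.match?(host)}"
end
```

Only the +www+ and apex hosts on +.com+ and +.ws+ match; other subdomains and TLDs do not.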

Class Method Details

.collapse_double_slash_after_wy_host(href) ⇒ Object

Collapse duplicate slashes immediately after the WarmlyYours host (http(s) supported).



# File 'app/services/seo/href_anomaly_normalizer.rb', line 33

def self.collapse_double_slash_after_wy_host(href)
  href.to_s.gsub(WY_HOST_DOUBLE_SLASH, '\1/')
end
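A minimal standalone sketch of the behavior, with the constants copied from this page into a hypothetical +CollapseSketch+ module so it runs without Rails. The sample URLs are illustrative only.

```ruby
module CollapseSketch
  WY_AUTHORITY = '(?:www\.)?warmlyyours\.(?:com|ws)'
  WY_HOST_DOUBLE_SLASH =
    Regexp.new("\\A(https?://#{WY_AUTHORITY})//+", Regexp::IGNORECASE)

  def self.collapse_double_slash_after_wy_host(href)
    href.to_s.gsub(WY_HOST_DOUBLE_SLASH, '\1/')
  end
end

# Duplicate slashes right after the host collapse to one.
puts CollapseSketch.collapse_double_slash_after_wy_host('https://www.warmlyyours.com//floor-heating')
# => https://www.warmlyyours.com/floor-heating

# The pattern is anchored at the host, so a later "//" in the path stays.
puts CollapseSketch.collapse_double_slash_after_wy_host('http://warmlyyours.ws///a//b')
# => http://warmlyyours.ws/a//b

# Other hosts are not touched.
puts CollapseSketch.collapse_double_slash_after_wy_host('https://example.com//a')
# => https://example.com//a
```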

.prewash_wy_href_string(href) ⇒ Object

Shared first pass for blog sanitizer (+Seo::LinkSanitizer+) and feed absolutization (+Article+):
qualify schemeless WY hosts, collapse +//+ after host, fix mistaken +//segment+ paths.



# File 'app/services/seo/href_anomaly_normalizer.rb', line 26

def self.prewash_wy_href_string(href)
  h = qualify_schemeless_wy_url(href)
  h = collapse_double_slash_after_wy_host(h)
  rewrite_mistaken_protocol_relative_path(h)
end
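The three passes can be traced end to end with a standalone sketch. The helpers below are copied from this page into a hypothetical +PrewashSketch+ module; the one deliberate change is that ActiveSupport's +present?+ and +exclude?+ are swapped for plain-Ruby +empty?+ and +include?+ so the snippet runs outside Rails (an assumption that ignores whitespace-only authorities).

```ruby
module PrewashSketch
  WY_AUTHORITY = '(?:www\.)?warmlyyours\.(?:com|ws)'
  WY_HOST_DOUBLE_SLASH =
    Regexp.new("\\A(https?://#{WY_AUTHORITY})//+", Regexp::IGNORECASE)

  def self.qualify_schemeless_wy_url(href)
    href.to_s
        .sub(%r{\A//(?:www\.)?warmlyyours\.(com|ws)(?=/|\z)}i, 'https://www.warmlyyours.\1')
        .sub(%r{\A(www\.warmlyyours\.(?:com|ws))(?=/|\z)}i, 'https://\1')
        .sub(%r{\Awarmlyyours\.(com|ws)(?=/|\z)}i, 'https://www.warmlyyours.\1')
  end

  def self.collapse_double_slash_after_wy_host(href)
    href.to_s.gsub(WY_HOST_DOUBLE_SLASH, '\1/')
  end

  def self.rewrite_mistaken_protocol_relative_path(href)
    href = href.to_s
    return href unless href.start_with?('//') && !href.start_with?('///')

    authority = href.delete_prefix('//').split('/', 2).first.to_s
    # Plain-Ruby stand-in for `authority.present? && authority.exclude?('.')`.
    return href if authority.empty? || authority.include?('.')

    "/#{authority}#{href.delete_prefix("//#{authority}")}"
  end

  def self.prewash_wy_href_string(href)
    h = qualify_schemeless_wy_url(href)
    h = collapse_double_slash_after_wy_host(h)
    rewrite_mistaken_protocol_relative_path(h)
  end
end

# Schemeless WY host plus a doubled slash: qualified, then collapsed.
puts PrewashSketch.prewash_wy_href_string('//warmlyyours.com//floor-heating')
# => https://www.warmlyyours.com/floor-heating

# Dotless "authority": treated as a mistaken site-relative path.
puts PrewashSketch.prewash_wy_href_string('//floor-heating/foo')
# => /floor-heating/foo

# Genuine third-party protocol-relative URL: left alone.
puts PrewashSketch.prewash_wy_href_string('//cdn.example.com/lib.js')
# => //cdn.example.com/lib.js
```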

.qualify_schemeless_wy_url(href) ⇒ Object

+//www.warmlyyours.com/foo+, +//warmlyyours.com/foo+, +www…+, +warmlyyours…+ → +https://www.warmlyyours…/foo+
Protocol-relative apex uses +www+ (consistent with bare-host rules below).



# File 'app/services/seo/href_anomaly_normalizer.rb', line 17

def self.qualify_schemeless_wy_url(href)
  href.to_s
      .sub(%r{\A//(?:www\.)?warmlyyours\.(com|ws)(?=/|\z)}i, 'https://www.warmlyyours.\1')
      .sub(%r{\A(www\.warmlyyours\.(?:com|ws))(?=/|\z)}i, 'https://\1')
      .sub(%r{\Awarmlyyours\.(com|ws)(?=/|\z)}i, 'https://www.warmlyyours.\1')
end
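The sketch below runs each schemeless form through a copy of the method (placed in a hypothetical +QualifySketch+ module so it is self-contained). It also shows two inputs the +(?=/|\z)+ lookahead deliberately rejects; the sample hrefs are illustrative.

```ruby
module QualifySketch
  def self.qualify_schemeless_wy_url(href)
    href.to_s
        .sub(%r{\A//(?:www\.)?warmlyyours\.(com|ws)(?=/|\z)}i, 'https://www.warmlyyours.\1')
        .sub(%r{\A(www\.warmlyyours\.(?:com|ws))(?=/|\z)}i, 'https://\1')
        .sub(%r{\Awarmlyyours\.(com|ws)(?=/|\z)}i, 'https://www.warmlyyours.\1')
  end
end

[
  '//www.warmlyyours.com/foo',   # protocol-relative, www
  '//warmlyyours.com/foo',       # protocol-relative, apex (gains www)
  'www.warmlyyours.ws/foo',      # bare www host
  'warmlyyours.com',             # bare apex host, no path
  'warmlyyours.community/foo',   # different host: lookahead fails, unchanged
  'warmlyyours.com?x=1'          # host followed by "?": unchanged
].each do |href|
  puts "#{href} -> #{QualifySketch.qualify_schemeless_wy_url(href)}"
end
```

Note that the lookahead only accepts +/+ or end of string after the host, so a schemeless host followed directly by a query string is left as-is.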

.rewrite_legacy_en_for_feed(href, locale_path) ⇒ Object

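Feeds / absolute output: rewrite the legacy +/en+ prefix to +/#{locale_path}+ (e.g. +/en-US+). Leading slashes on +locale_path+ are stripped, and a blank +locale_path+ leaves the href unchanged.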



# File 'app/services/seo/href_anomaly_normalizer.rb', line 59

def self.rewrite_legacy_en_for_feed(href, locale_path)
  normalized_locale = locale_path.to_s.strip.sub(%r{\A/+}, '')
  return href.to_s if normalized_locale.blank?

  href.to_s.sub(LEGACY_EN_PREFIX) { "#{Regexp.last_match(1)}/#{normalized_locale}" }
end
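A self-contained sketch of the feed rewrite, using a hypothetical +FeedLocaleSketch+ module. ActiveSupport's +blank?+ is replaced with +empty?+ here, which is equivalent for this value because it has already been stripped; the locale values are examples only.

```ruby
module FeedLocaleSketch
  LEGACY_EN_PREFIX =
    %r{\A((?:https?://(?:www\.)?warmlyyours\.(?:com|ws))?)/en(?=/|$)}i

  def self.rewrite_legacy_en_for_feed(href, locale_path)
    normalized_locale = locale_path.to_s.strip.sub(%r{\A/+}, '')
    # Plain-Ruby stand-in for `normalized_locale.blank?`.
    return href.to_s if normalized_locale.empty?

    href.to_s.sub(LEGACY_EN_PREFIX) { "#{Regexp.last_match(1)}/#{normalized_locale}" }
  end
end

puts FeedLocaleSketch.rewrite_legacy_en_for_feed('/en/floor-heating', 'en-US')
# => /en-US/floor-heating

# A leading slash on the locale is tolerated; the origin is preserved.
puts FeedLocaleSketch.rewrite_legacy_en_for_feed('https://www.warmlyyours.com/en/blog', '/en-CA')
# => https://www.warmlyyours.com/en-CA/blog

# Already-localized paths do not match the legacy prefix.
puts FeedLocaleSketch.rewrite_legacy_en_for_feed('/en-US/blog', 'en-CA')
# => /en-US/blog

# A blank locale is a no-op.
puts FeedLocaleSketch.rewrite_legacy_en_for_feed('/en/blog', '')
# => /en/blog
```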

.rewrite_legacy_en_to_locale_template(href) ⇒ Object

Stored blog / admin template: rewrite the legacy +/en+ prefix to the Liquid placeholder +/{{locale}}+, which is resolved when the template is rendered.



# File 'app/services/seo/href_anomaly_normalizer.rb', line 54

def self.rewrite_legacy_en_to_locale_template(href)
  href.to_s.sub(LEGACY_EN_PREFIX, '\1/{{locale}}')
end
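The following sketch shows the stored form and, as a rough stand-in for Liquid rendering, substitutes the placeholder with a plain +gsub+. The +LocaleTemplateSketch+ module and the +gsub+ step are assumptions for illustration; real rendering would go through Liquid.

```ruby
module LocaleTemplateSketch
  LEGACY_EN_PREFIX =
    %r{\A((?:https?://(?:www\.)?warmlyyours\.(?:com|ws))?)/en(?=/|$)}i

  def self.rewrite_legacy_en_to_locale_template(href)
    href.to_s.sub(LEGACY_EN_PREFIX, '\1/{{locale}}')
  end
end

stored = LocaleTemplateSketch.rewrite_legacy_en_to_locale_template('/en/floor-heating')
puts stored
# => /{{locale}}/floor-heating

# Stand-in for Liquid rendering with locale = "en-US" (not the real renderer).
puts stored.gsub('{{locale}}', 'en-US')
# => /en-US/floor-heating

# An absolute WY URL ending in /en keeps its origin.
puts LocaleTemplateSketch.rewrite_legacy_en_to_locale_template('https://warmlyyours.com/en')
# => https://warmlyyours.com/{{locale}}
```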

.rewrite_mistaken_protocol_relative_path(href) ⇒ Object

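Protocol-relative mistaken path: +//floor-heating/foo+ would parse as host "floor-heating", so it is rewritten to the site-relative +/floor-heating/foo+.
Real third-party protocol-relative URLs have a dot in the authority (+//cdn.example/...+) and are returned unchanged, as are hrefs starting with +///+.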



# File 'app/services/seo/href_anomaly_normalizer.rb', line 39

def self.rewrite_mistaken_protocol_relative_path(href)
  href = href.to_s
  return href unless href.start_with?('//') && !href.start_with?('///')

  authority = href.delete_prefix('//').split('/', 2).first.to_s
  return href unless authority.present? && authority.exclude?('.')

  tail = href.delete_prefix("//#{authority}")
  "/#{authority}#{tail}"
end
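A standalone sketch of the dot-in-authority heuristic, in a hypothetical +MistakenPathSketch+ module. ActiveSupport's +present?+ and +exclude?+ are swapped for +empty?+ and +include?+ so it runs in plain Ruby (an assumption that ignores whitespace-only authorities); the sample hrefs are illustrative.

```ruby
module MistakenPathSketch
  def self.rewrite_mistaken_protocol_relative_path(href)
    href = href.to_s
    # Only exactly two leading slashes qualify.
    return href unless href.start_with?('//') && !href.start_with?('///')

    # Everything between the "//" and the next "/" is the would-be authority.
    authority = href.delete_prefix('//').split('/', 2).first.to_s
    # Plain-Ruby stand-in for `authority.present? && authority.exclude?('.')`.
    return href if authority.empty? || authority.include?('.')

    tail = href.delete_prefix("//#{authority}")
    "/#{authority}#{tail}"
  end
end

[
  '//floor-heating/foo',   # dotless authority: becomes a site-relative path
  '//foo/bar.html',        # a dot later in the path does not matter
  '//cdn.example/x.js',    # dotted authority: real protocol-relative URL
  '///foo',                # three slashes: skipped
  '//'                     # empty authority: skipped
].each do |href|
  puts "#{href} -> #{MistakenPathSketch.rewrite_mistaken_protocol_relative_path(href)}"
end
```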

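.upgrade_wy_http_to_https(href) ⇒ Object

Upgrade a leading +http://+ to +https://+ on WarmlyYours hosts (www or apex). The host itself is kept as written, and any other href is returned unchanged.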



# File 'app/services/seo/href_anomaly_normalizer.rb', line 66

def self.upgrade_wy_http_to_https(href)
  href.to_s.sub(HTTP_WY_UPGRADE, 'https://\1')
end
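A minimal sketch of the upgrade, with the constants copied from this page into a hypothetical +HttpsUpgradeSketch+ module. The lookalike host in the last call is an invented example showing why the lookahead after the host matters.

```ruby
module HttpsUpgradeSketch
  WY_AUTHORITY = '(?:www\.)?warmlyyours\.(?:com|ws)'
  HTTP_WY_UPGRADE =
    Regexp.new("\\Ahttp://(#{WY_AUTHORITY})(?=/|\\z|\\?|#)", Regexp::IGNORECASE)

  def self.upgrade_wy_http_to_https(href)
    href.to_s.sub(HTTP_WY_UPGRADE, 'https://\1')
  end
end

puts HttpsUpgradeSketch.upgrade_wy_http_to_https('http://www.warmlyyours.com/foo')
# => https://www.warmlyyours.com/foo

# The apex host is upgraded but not rewritten to www.
puts HttpsUpgradeSketch.upgrade_wy_http_to_https('http://warmlyyours.com?x=1')
# => https://warmlyyours.com?x=1

# Third-party and lookalike hosts are left alone.
puts HttpsUpgradeSketch.upgrade_wy_http_to_https('http://example.com/')
# => http://example.com/
puts HttpsUpgradeSketch.upgrade_wy_http_to_https('http://warmlyyours.com.evil.example/')
# => http://warmlyyours.com.evil.example/
```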