Module: Seo::HrefAnomalyNormalizer
- Defined in:
- app/services/seo/href_anomaly_normalizer.rb
Overview
Shared href fixes applied before URI parsing, +URI.join+, or +Addressable+. Used by the blog sanitizer and RSS/feed output.
Constant Summary
- WY_AUTHORITY =
www + apex, production + staging TLDs
'(?:www\.)?warmlyyours\.(?:com|ws)'
- WY_HOST_DOUBLE_SLASH =
Regexp.new("\\A(https?://#{WY_AUTHORITY})//+", Regexp::IGNORECASE)
- HTTP_WY_UPGRADE =
Regexp.new("\\Ahttp://(#{WY_AUTHORITY})(?=/|\\z|\\?|#)", Regexp::IGNORECASE)
- LEGACY_EN_PREFIX =
%r{\A((?:https?://(?:www\.)?warmlyyours\.(?:com|ws))?)/en(?=/|$)}i
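To see what each pattern accepts, the constants can be copied into a standalone script. This is only a minimal sketch: the regular expressions come from the summary above, while the sample hrefs are hypothetical and chosen purely for illustration.

```ruby
# Standalone copies of the documented constants, for experimentation only.
WY_AUTHORITY = '(?:www\.)?warmlyyours\.(?:com|ws)'
WY_HOST_DOUBLE_SLASH = Regexp.new("\\A(https?://#{WY_AUTHORITY})//+", Regexp::IGNORECASE)
HTTP_WY_UPGRADE = Regexp.new("\\Ahttp://(#{WY_AUTHORITY})(?=/|\\z|\\?|#)", Regexp::IGNORECASE)
LEGACY_EN_PREFIX = %r{\A((?:https?://(?:www\.)?warmlyyours\.(?:com|ws))?)/en(?=/|$)}i

# Hypothetical sample hrefs (not taken from real content).
checks = [
  [WY_HOST_DOUBLE_SLASH, 'https://www.warmlyyours.com//floor-heating'],
  [WY_HOST_DOUBLE_SLASH, 'https://www.warmlyyours.com/floor-heating'],
  [HTTP_WY_UPGRADE, 'http://warmlyyours.ws/blog'],
  [HTTP_WY_UPGRADE, 'http://warmlyyours.community/blog'],
  [LEGACY_EN_PREFIX, '/en/radiant-heating'],
  [LEGACY_EN_PREFIX, '/english/radiant-heating']
]

checks.each do |pattern, href|
  puts "#{href} matches? #{pattern.match?(href)}"
end
```

The lookaheads are what keep the patterns from firing on look-alike hosts and paths: +warmlyyours.community+ and +/english+ both fail because the character after the matched prefix is not a separator.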
Class Method Summary
- .collapse_double_slash_after_wy_host(href) ⇒ Object
Collapse duplicate slashes immediately after the WarmlyYours host (http(s) supported).
- .prewash_wy_href_string(href) ⇒ Object
Shared first pass for blog sanitizer (+Seo::LinkSanitizer+) and feed absolutization (+Article+): qualify schemeless WY hosts, collapse +//+ after host, fix mistaken +//segment+ paths.
- .qualify_schemeless_wy_url(href) ⇒ Object
+//www.warmlyyours.com/foo+, +//warmlyyours.com/foo+, +www…+, +warmlyyours…+ → +https://www.warmlyyours…/foo+. Protocol-relative apex uses +www+ (consistent with bare-host rules below).
- .rewrite_legacy_en_for_feed(href, locale_path) ⇒ Object
Feeds / absolute output: legacy +/en/+ → +/#{locale_path}/+ (e.g. en-US).
- .rewrite_legacy_en_to_locale_template(href) ⇒ Object
Stored blog / admin template: legacy +/en/+ → +/{{locale}}/+ (Liquid).
- .rewrite_mistaken_protocol_relative_path(href) ⇒ Object
Protocol-relative mistaken path: +//floor-heating/foo+ would parse as host "floor-heating".
- .upgrade_wy_http_to_https(href) ⇒ Object
Upgrade +http://+ to +https://+ for WarmlyYours hosts.
Class Method Details
.collapse_double_slash_after_wy_host(href) ⇒ Object
Collapse duplicate slashes immediately after the WarmlyYours host (http(s) supported).
# File 'app/services/seo/href_anomaly_normalizer.rb', line 31
def self.collapse_double_slash_after_wy_host(href)
  href.to_s.gsub(WY_HOST_DOUBLE_SLASH, '\1/')
end
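The effect of the collapse can be sketched with a standalone copy of the method. This assumes the constants are copied as shown; the sample hrefs are hypothetical.

```ruby
# Standalone copy of the documented pattern and method, for illustration only.
WY_AUTHORITY = '(?:www\.)?warmlyyours\.(?:com|ws)'
WY_HOST_DOUBLE_SLASH = Regexp.new("\\A(https?://#{WY_AUTHORITY})//+", Regexp::IGNORECASE)

def collapse_double_slash_after_wy_host(href)
  # \1 is the scheme plus host; the run of slashes after it becomes a single "/".
  href.to_s.gsub(WY_HOST_DOUBLE_SLASH, '\1/')
end

# Hypothetical inputs.
puts collapse_double_slash_after_wy_host('https://www.warmlyyours.com///floor-heating//x')
puts collapse_double_slash_after_wy_host('https://cdn.example.com//asset.js')
puts collapse_double_slash_after_wy_host(nil).inspect
```

Because the pattern is anchored with +\A+, only the slashes directly after the host are collapsed. A later +//+ in the path and any non-WarmlyYours URL pass through unchanged, and +nil+ becomes an empty string via +to_s+.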
.prewash_wy_href_string(href) ⇒ Object
Shared first pass for blog sanitizer (+Seo::LinkSanitizer+) and feed absolutization (+Article+):
qualify schemeless WY hosts, collapse +//+ after host, fix mistaken +//segment+ paths.
# File 'app/services/seo/href_anomaly_normalizer.rb', line 24
def self.prewash_wy_href_string(href)
  h = qualify_schemeless_wy_url(href)
  h = collapse_double_slash_after_wy_host(h)
  rewrite_mistaken_protocol_relative_path(h)
end
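The three-step order can be sketched outside Rails as follows. This is an approximation, not the module itself: +strip.empty?+ and +include?+ stand in for ActiveSupport's +present?+ and +exclude?+, the local name +segment+ is hypothetical, and the sample hrefs are made up.

```ruby
# Plain-Ruby approximation of the prewash pipeline (no ActiveSupport).
WY_AUTHORITY = '(?:www\.)?warmlyyours\.(?:com|ws)'
WY_HOST_DOUBLE_SLASH = Regexp.new("\\A(https?://#{WY_AUTHORITY})//+", Regexp::IGNORECASE)

def qualify_schemeless_wy_url(href)
  href.to_s
      .sub(%r{\A//(?:www\.)?warmlyyours\.(com|ws)(?=/|\z)}i, 'https://www.warmlyyours.\1')
      .sub(%r{\A(www\.warmlyyours\.(?:com|ws))(?=/|\z)}i, 'https://\1')
      .sub(%r{\Awarmlyyours\.(com|ws)(?=/|\z)}i, 'https://www.warmlyyours.\1')
end

def collapse_double_slash_after_wy_host(href)
  href.to_s.gsub(WY_HOST_DOUBLE_SLASH, '\1/')
end

def rewrite_mistaken_protocol_relative_path(href)
  href = href.to_s
  return href unless href.start_with?('//') && !href.start_with?('///')

  segment = href.delete_prefix('//').split('/', 2).first.to_s
  # Stand-in for `segment.present? && segment.exclude?('.')`.
  return href if segment.strip.empty? || segment.include?('.')

  "/#{segment}#{href.delete_prefix("//#{segment}")}"
end

def prewash_wy_href_string(href)
  h = qualify_schemeless_wy_url(href)
  h = collapse_double_slash_after_wy_host(h)
  rewrite_mistaken_protocol_relative_path(h)
end

# Hypothetical inputs exercising each step.
['//warmlyyours.com//foo', '//floor-heating/foo', '//cdn.example.com/app.js'].each do |href|
  puts "#{href} -> #{prewash_wy_href_string(href)}"
end
```

The order matters: +WY_HOST_DOUBLE_SLASH+ requires an explicit +http(s)://+ scheme, so a schemeless WarmlyYours href has to be qualified before its doubled slashes can be collapsed.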
.qualify_schemeless_wy_url(href) ⇒ Object
+//www.warmlyyours.com/foo+, +//warmlyyours.com/foo+, +www…+, +warmlyyours…+ → +https://www.warmlyyours…/foo+
Protocol-relative apex uses +www+ (consistent with bare-host rules below).
# File 'app/services/seo/href_anomaly_normalizer.rb', line 15
def self.qualify_schemeless_wy_url(href)
  href.to_s
      .sub(%r{\A//(?:www\.)?warmlyyours\.(com|ws)(?=/|\z)}i, 'https://www.warmlyyours.\1')
      .sub(%r{\A(www\.warmlyyours\.(?:com|ws))(?=/|\z)}i, 'https://\1')
      .sub(%r{\Awarmlyyours\.(com|ws)(?=/|\z)}i, 'https://www.warmlyyours.\1')
end
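A standalone copy of the method makes the three substitution rules easier to follow. The inputs below are hypothetical examples, one per rule plus a look-alike host that should be left alone.

```ruby
# Standalone copy of the documented method, for illustration only.
def qualify_schemeless_wy_url(href)
  href.to_s
      # Rule 1: protocol-relative (with or without www) -> https + www.
      .sub(%r{\A//(?:www\.)?warmlyyours\.(com|ws)(?=/|\z)}i, 'https://www.warmlyyours.\1')
      # Rule 2: bare www host -> prepend https://.
      .sub(%r{\A(www\.warmlyyours\.(?:com|ws))(?=/|\z)}i, 'https://\1')
      # Rule 3: bare apex host -> https + www.
      .sub(%r{\Awarmlyyours\.(com|ws)(?=/|\z)}i, 'https://www.warmlyyours.\1')
end

# Hypothetical inputs.
[
  '//warmlyyours.com/foo',
  'www.warmlyyours.ws/foo',
  'warmlyyours.com',
  'warmlyyours.company/foo'
].each do |href|
  puts "#{href} -> #{qualify_schemeless_wy_url(href)}"
end
```

Each rule is anchored at +\A+ and ends with a lookahead for +/+ or end of string, so once one rule has added +https://+ the later rules no longer match, and +warmlyyours.company+ is not mistaken for the apex host.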
.rewrite_legacy_en_for_feed(href, locale_path) ⇒ Object
Feeds / absolute output: legacy +/en/+ → +/#{locale_path}/+ (e.g. en-US).
# File 'app/services/seo/href_anomaly_normalizer.rb', line 56
def self.rewrite_legacy_en_for_feed(href, locale_path)
  normalized_locale = locale_path.to_s.strip.sub(%r{\A/+}, '')
  return href.to_s if normalized_locale.blank?

  href.to_s.sub(LEGACY_EN_PREFIX) { "#{Regexp.last_match(1)}/#{normalized_locale}" }
end
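The feed rewrite can be tried in plain Ruby with the sketch below. It assumes +empty?+ as a stand-in for ActiveSupport's +blank?+ (the value is already stripped), and the hrefs and locale paths are hypothetical.

```ruby
# Plain-Ruby approximation of the documented method (no ActiveSupport).
LEGACY_EN_PREFIX = %r{\A((?:https?://(?:www\.)?warmlyyours\.(?:com|ws))?)/en(?=/|$)}i

def rewrite_legacy_en_for_feed(href, locale_path)
  # Drop surrounding whitespace and any leading slashes from the locale path.
  normalized_locale = locale_path.to_s.strip.sub(%r{\A/+}, '')
  return href.to_s if normalized_locale.empty? # stand-in for `blank?`

  # Group 1 is the optional scheme + host, preserved in front of the new locale.
  href.to_s.sub(LEGACY_EN_PREFIX) { "#{Regexp.last_match(1)}/#{normalized_locale}" }
end

# Hypothetical inputs.
puts rewrite_legacy_en_for_feed('/en/floor-heating', 'en-US')
puts rewrite_legacy_en_for_feed('https://www.warmlyyours.com/en', '/en-CA')
puts rewrite_legacy_en_for_feed('/energy/savings', 'en-US')
puts rewrite_legacy_en_for_feed('/en/floor-heating', '  ')
```

The lookahead in +LEGACY_EN_PREFIX+ means only a whole +/en+ segment is rewritten, so a path such as +/energy/savings+ is untouched, and a blank locale leaves the href as it was.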
.rewrite_legacy_en_to_locale_template(href) ⇒ Object
Stored blog / admin template: legacy +/en/+ → +/{{locale}}/+ (Liquid).
# File 'app/services/seo/href_anomaly_normalizer.rb', line 51
def self.rewrite_legacy_en_to_locale_template(href)
  href.to_s.sub(LEGACY_EN_PREFIX, '\1/{{locale}}')
end
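As a rough illustration, the template rewrite produces a Liquid placeholder in place of the legacy segment. The sketch below copies the method into a standalone script; the sample hrefs are hypothetical.

```ruby
# Standalone copy of the documented pattern and method, for illustration only.
LEGACY_EN_PREFIX = %r{\A((?:https?://(?:www\.)?warmlyyours\.(?:com|ws))?)/en(?=/|$)}i

def rewrite_legacy_en_to_locale_template(href)
  # \1 keeps any scheme + host; "/en" becomes the literal text "/{{locale}}".
  href.to_s.sub(LEGACY_EN_PREFIX, '\1/{{locale}}')
end

# Hypothetical inputs.
puts rewrite_legacy_en_to_locale_template('/en/blog/posts')
puts rewrite_legacy_en_to_locale_template('https://www.warmlyyours.com/en/blog')
puts rewrite_legacy_en_to_locale_template('/blog/en/posts')
```

The placeholder is emitted as literal text and is presumably resolved later when the stored template is rendered. Since the pattern is anchored at the start, an +/en/+ segment deeper in the path is not rewritten.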
.rewrite_mistaken_protocol_relative_path(href) ⇒ Object
Protocol-relative mistaken path: +//floor-heating/foo+ would parse as host "floor-heating".
Real third-party protocol-relative URLs have a dot in the authority (+//cdn.example/...+).
# File 'app/services/seo/href_anomaly_normalizer.rb', line 37
def self.rewrite_mistaken_protocol_relative_path(href)
  href = href.to_s
  return href unless href.start_with?('//') && !href.start_with?('///')

  # First segment after the leading "//"; a real authority would contain a dot.
  segment = href.delete_prefix('//').split('/', 2).first.to_s
  return href unless segment.present? && segment.exclude?('.')

  tail = href.delete_prefix("//#{segment}")
  "/#{segment}#{tail}"
end
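The dot heuristic can be exercised with a plain-Ruby sketch. It is an approximation under stated assumptions: +strip.empty?+ and +include?+ replace ActiveSupport's +present?+ and +exclude?+, the local name +segment+ is hypothetical, and the inputs are made up.

```ruby
# Plain-Ruby approximation of the documented method (no ActiveSupport).
def rewrite_mistaken_protocol_relative_path(href)
  href = href.to_s
  # Only exactly two leading slashes qualify; "///..." is left alone.
  return href unless href.start_with?('//') && !href.start_with?('///')

  # Text between the leading "//" and the next "/".
  segment = href.delete_prefix('//').split('/', 2).first.to_s
  # A dot suggests a real host, so the href is kept as protocol-relative.
  return href if segment.strip.empty? || segment.include?('.')

  tail = href.delete_prefix("//#{segment}")
  "/#{segment}#{tail}"
end

# Hypothetical inputs.
['//floor-heating/foo', '//floor-heating', '//cdn.example/lib.js', '///foo', '/foo'].each do |href|
  puts "#{href} -> #{rewrite_mistaken_protocol_relative_path(href)}"
end
```

One consequence of the heuristic is that a dotless but genuine host such as +//localhost/foo+ would also be rewritten to a root-relative path.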
.upgrade_wy_http_to_https(href) ⇒ Object
Upgrade +http://+ to +https://+ for WarmlyYours hosts.
# File 'app/services/seo/href_anomaly_normalizer.rb', line 63
def self.upgrade_wy_http_to_https(href)
  href.to_s.sub(HTTP_WY_UPGRADE, 'https://\1')
end
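A short standalone sketch shows which hrefs the upgrade touches. The constants are copied from the summary; the sample hrefs are hypothetical.

```ruby
# Standalone copy of the documented pattern and method, for illustration only.
WY_AUTHORITY = '(?:www\.)?warmlyyours\.(?:com|ws)'
HTTP_WY_UPGRADE = Regexp.new("\\Ahttp://(#{WY_AUTHORITY})(?=/|\\z|\\?|#)", Regexp::IGNORECASE)

def upgrade_wy_http_to_https(href)
  # \1 is the matched host, kept exactly as written (www is not added here).
  href.to_s.sub(HTTP_WY_UPGRADE, 'https://\1')
end

# Hypothetical inputs.
[
  'http://www.warmlyyours.com/foo',
  'http://warmlyyours.ws',
  'http://warmlyyours.com?utm=feed',
  'http://example.com/foo'
].each do |href|
  puts "#{href} -> #{upgrade_wy_http_to_https(href)}"
end
```

Unlike +qualify_schemeless_wy_url+, this method only swaps the scheme: an apex host stays apex, and third-party +http://+ links are left unchanged.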