How to exclude web pages from search indexes

Sometimes you need to prevent a page from being indexed by Google or by our on-site search engine.

There are a few places where this update must be made:

  1. Robots.txt

This is the basic step. Edit /public/robots.txt and, at the bottom, add the page or directory to exclude, once for each of the US and CA locales:

Disallow: /en-US/sales/special-offers/*
Disallow: /en-CA/sales/special-offers/*

or, to exclude a single page:

Disallow: /en-US/sales/special-offers/specific-page
Disallow: /en-CA/sales/special-offers/specific-page
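
To sanity-check that a path is covered by the new rules, the matching can be approximated with a rough Ruby sketch like the one below. The helper names `disallow_regexp` and `disallowed?` are hypothetical and not part of the app. The sketch ignores User-agent groups and Allow rules, and only models prefix matching with `*` wildcards and an optional trailing `$`.

```ruby
# Hypothetical helper (not part of the app): turns a Disallow pattern into a Regexp.
# Patterns match from the start of the path, "*" matches any run of characters,
# and a trailing "$" anchors the end of the path.
def disallow_regexp(pattern)
  anchored = pattern.end_with?('$')
  body = anchored ? pattern[0...-1] : pattern
  source = body.split('*', -1).map { |part| Regexp.escape(part) }.join('.*')
  Regexp.new("\\A#{source}#{anchored ? '\\z' : ''}")
end

# Hypothetical helper: true if any Disallow line in the given robots.txt text
# matches the path. Simplification: every Disallow line is treated as applying
# to all crawlers.
def disallowed?(path, robots_txt)
  robots_txt.each_line.any? do |line|
    pattern = line.strip[/\ADisallow:\s*(\S+)/i, 1]
    !pattern.nil? && disallow_regexp(pattern).match?(path)
  end
end

rules = <<~ROBOTS
  Disallow: /en-US/sales/special-offers/*
  Disallow: /en-CA/sales/special-offers/specific-page
ROBOTS

puts disallowed?('/en-US/sales/special-offers/carpetone', rules) # => true
puts disallowed?('/en-US/sales/veterans', rules)                 # => false
```

Because Disallow rules are prefix matches, the trailing `*` in the directory form is optional; it is kept here to mirror the entries above.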

  2. Sitemap

The sitemap is generated dynamically by app/controllers/pages_controller.rb.

At the bottom of the pages controller you will see a large hash called LAYOUT_MAP.

For the page you want to exclude, instead of specifying the layout as a plain string, use a hash and set the :skip_site_map attribute.

Here is an example with one page that specifies a regular layout and one that specifies a layout plus the skip_site_map parameter:

LAYOUT_MAP =
...
'sales/veterans' => '/pages/simple_page_summary',
'sales/special-offers/carpetone' => { layout: '/pages/simple_page_summary', skip_site_map: true },
...
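
As an illustration of how the two entry styles can be read, here is a minimal sketch. It assumes a map shaped like the LAYOUT_MAP excerpt above; the helper names `layout_for` and `sitemap_pages` are hypothetical, and the real controller may handle this differently.

```ruby
# Hypothetical helper: returns the layout for a page, whether the entry is a
# plain String or a Hash with a :layout key.
def layout_for(layout_map, page)
  entry = layout_map[page]
  entry.is_a?(Hash) ? entry[:layout] : entry
end

# Hypothetical helper: lists the pages that should appear in the sitemap,
# leaving out any entry that sets skip_site_map: true.
def sitemap_pages(layout_map)
  layout_map.reject { |_page, entry| entry.is_a?(Hash) && entry[:skip_site_map] }.keys
end

# Stand-in for the controller's LAYOUT_MAP, using the two entries shown above.
example_map = {
  'sales/veterans' => '/pages/simple_page_summary',
  'sales/special-offers/carpetone' => { layout: '/pages/simple_page_summary', skip_site_map: true }
}

puts layout_for(example_map, 'sales/special-offers/carpetone') # => /pages/simple_page_summary
puts sitemap_pages(example_map).inspect                        # => ["sales/veterans"]
```

Keeping string entries valid means existing pages need no changes; only the pages being excluded switch to the hash form.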

  3. NOINDEX, NOFOLLOW

For good measure, also add the robots meta tag to the page's head, in the page's html.erb file:

<% content_for :head do %>
<meta name="robots" content="noindex,nofollow,noarchive,nosnippet,noodp">
<% end %>
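
To confirm that the tag actually reaches the rendered page, a small check along these lines can be run against the page's HTML. The helper name `robots_directives` is hypothetical, and the regular expressions are a simplification that only handles a quoted `name="robots"` meta tag like the one above, not arbitrary HTML.

```ruby
# Hypothetical helper: returns the directives from the robots meta tag in the
# given HTML as lowercase strings, or an empty array if no such tag is found.
def robots_directives(html)
  tag = html[/<meta\s+[^>]*name=["']robots["'][^>]*>/i]
  return [] unless tag

  content = tag[/content=["']([^"']*)["']/i, 1].to_s
  content.split(',').map { |directive| directive.strip.downcase }
end

html = '<head><meta name="robots" content="noindex,nofollow,noarchive,nosnippet,noodp"></head>'

directives = robots_directives(html)
puts directives.include?('noindex')  # => true
puts directives.include?('nofollow') # => true
```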