How to exclude web pages from search indexes
Sometimes you need to prevent a page from being indexed by Google or by our on-site search engine.
There are a few places where this change must be made:
- Robots.txt
This is the basic step: edit /public/robots.txt and add the page or directory to exclude at the bottom, once for each of the US and CA locales:
Disallow: /en-US/sales/special-offers/*
Disallow: /en-CA/sales/special-offers/*
or, to exclude a single page:
Disallow: /en-US/sales/special-offers/specific-page
Disallow: /en-CA/sales/special-offers/specific-page
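To sanity-check which URLs a set of Disallow rules would cover, the matching can be sketched in Ruby. This is a simplified illustration, not part of the app: the helper names disallow_regexp and disallowed? are hypothetical, and the sketch assumes the common robots.txt conventions (prefix matching, '*' as a wildcard, a trailing '$' as an end anchor) while ignoring Allow rules, user-agent groups, and percent-encoding.

```ruby
# Hypothetical helper: converts a Disallow pattern into an anchored Regexp.
# '*' matches any run of characters, a trailing '$' anchors the end of the
# path, and everything else is treated as a literal prefix.
def disallow_regexp(pattern)
  anchored = pattern.end_with?('$')
  body = anchored ? pattern[0...-1] : pattern
  source = body.split('*', -1).map { |part| Regexp.escape(part) }.join('.*')
  Regexp.new("\\A#{source}#{anchored ? '\\z' : ''}")
end

# Hypothetical helper: true when any Disallow pattern matches the path.
def disallowed?(path, patterns)
  patterns.any? { |pattern| disallow_regexp(pattern).match?(path) }
end

# Example rules, copied from the robots.txt entries above.
EXAMPLE_RULES = [
  '/en-US/sales/special-offers/*',
  '/en-CA/sales/special-offers/specific-page'
].freeze
```

With these rules, disallowed?('/en-US/sales/special-offers/carpetone', EXAMPLE_RULES) is true, while disallowed?('/en-US/sales/veterans', EXAMPLE_RULES) is false. Because matching is prefix-based, the trailing * in the directory form does not change which paths match.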
- Sitemap
The sitemap is dynamically generated by app/controllers/pages_controller.rb.
In the pages controller you will see a large hash called LAYOUT_MAP defined at the bottom.
For the page you want to exclude, instead of specifying the layout as a plain string, use a hash and set the :skip_site_map attribute to true.
Here is an example with one page using a regular layout and another specifying a layout plus the skip_site_map parameter:
LAYOUT_MAP = {
  ...
  'sales/veterans' => '/pages/simple_page_summary',
  'sales/special-offers/carpetone' => { layout: '/pages/simple_page_summary', skip_site_map: true },
  ...
}
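Since LAYOUT_MAP values can be either a string or a hash, any code reading the map has to handle both shapes. The following is a minimal sketch of how that could look. It is not the controller's actual code: LAYOUT_MAP_EXAMPLE and the helpers layout_for, skip_site_map?, and sitemap_paths are hypothetical names used only for illustration.

```ruby
# Hypothetical stand-in for the real LAYOUT_MAP in pages_controller.rb.
LAYOUT_MAP_EXAMPLE = {
  'sales/veterans' => '/pages/simple_page_summary',
  'sales/special-offers/carpetone' => { layout: '/pages/simple_page_summary', skip_site_map: true }
}.freeze

# Returns the layout whether the entry is a plain string or a hash.
def layout_for(entry)
  entry.is_a?(Hash) ? entry[:layout] : entry
end

# True only for hash entries that explicitly set skip_site_map: true.
def skip_site_map?(entry)
  entry.is_a?(Hash) && entry[:skip_site_map] == true
end

# Page paths that would still be listed in the sitemap.
def sitemap_paths(map)
  map.reject { |_path, entry| skip_site_map?(entry) }.keys
end
```

Under these assumptions, sitemap_paths(LAYOUT_MAP_EXAMPLE) returns only 'sales/veterans', and both entries still resolve to the same layout through layout_for.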
- NOINDEX, NOFOLLOW
For good measure, you should also add the robots instructions as a meta tag in the head of the page, from the page's html.erb template:
<% content_for :head do %>
<meta name="robots" content="noindex,nofollow,noarchive,nosnippet,noodp">
<% end %>