
I am creating two pages on my site that are very similar but serve different purposes. One is to thank users for leaving a comment and the other is to encourage users to subscribe.

I don't want the duplicate content, but I do want both pages to be available. Can I set the sitemap to hide one? Would I do this in the robots.txt file?

The disallow looks like this:

Disallow: /wp-admin

How would I customize it for a specific page like:

http://sweatingthebigstuff.com/thank-you-for-commenting

Daniel

3 Answers

Disallow: /thank-you-for-commenting$

in robots.txt, in the same User-agent group as your existing Disallow: /wp-admin line.

Take a look at last.fm's robots.txt file for inspiration.

The dollar sign marks the end of the URL. We need it so that other pages starting with the same URL, for example /thank-you-for-commenting-another-page, can still be crawled.
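To see what the $ changes, here is a minimal Python sketch of the matching behaviour described above: a Disallow path normally matches every URL path that starts with it, and a trailing $ requires the path to end exactly there. It deliberately handles only those two cases (no * wildcards, Allow lines, or User-agent groups). It is written by hand because Python's standard urllib.robotparser does plain prefix matching and does not interpret $. The function names rule_matches and is_blocked are hypothetical.

```python
from typing import Iterable


def rule_matches(rule: str, path: str) -> bool:
    """Return True if a robots.txt Disallow value matches a URL path.

    Simplified sketch: plain prefix matching, plus a trailing '$' that
    anchors the match to the end of the path. '*' wildcards are not handled.
    """
    if rule.endswith("$"):
        # '$' means the path must end exactly where the rule ends.
        return path == rule[:-1]
    # Without '$', the rule matches every path that starts with it.
    return path.startswith(rule)


def is_blocked(path: str, disallow_rules: Iterable[str]) -> bool:
    """Return True if any of the given Disallow rules matches the path."""
    return any(rule_matches(rule, path) for rule in disallow_rules)


if __name__ == "__main__":
    anchored = ["/thank-you-for-commenting$"]
    unanchored = ["/thank-you-for-commenting"]
    for path in ["/thank-you-for-commenting",
                 "/thank-you-for-commenting-another-page"]:
        print(path,
              "anchored:", is_blocked(path, anchored),
              "unanchored:", is_blocked(path, unanchored))
```

Running it shows that the anchored rule blocks only /thank-you-for-commenting, while the unanchored rule also blocks /thank-you-for-commenting-another-page.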

Wim Feijen
Alex

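robots.txt rules are not regular expressions: a Disallow path matches every URL that begins with it, and Google and other major crawlers treat a trailing $ as the end of the URL. So to avoid targeting more pages than you intend, you may need to add a $ to the end of the page name: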

Disallow: /thank-you-for-commenting$

If you don't, you'll also disallow the page /thank-you-for-commenting-on-this-too.

Highly Irregular

You can also disallow a specific page, including its file extension, in the robots.txt file. For example, when testing, you can list the path of a test page to stop robots from crawling it.

For example:

Disallow: /index_test.php
Disallow: /products/test_product.html
Disallow: /products/

The first rule, Disallow: /index_test.php, stops bots from crawling the test page in the root folder.

The second, Disallow: /products/test_product.html, disallows test_product.html inside the 'products' folder.

Finally, Disallow: /products/ blocks crawling of the whole folder, which also covers the second rule when both are present.
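Because these three rules are plain path prefixes, their effect can be checked with Python's standard urllib.robotparser module, which implements simple prefix matching (it does not understand $, so it only suits rules like these). A minimal sketch, assuming the rules sit in a User-agent: * group and using example.com as a placeholder host:

```python
from urllib.robotparser import RobotFileParser

# The three rules from this answer, placed in a catch-all User-agent group
# (robotparser ignores Disallow lines that are not preceded by User-agent).
ROBOTS_TXT = """\
User-agent: *
Disallow: /index_test.php
Disallow: /products/test_product.html
Disallow: /products/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com is a placeholder host; only the URL path is compared.
for url in [
    "https://example.com/index_test.php",               # matched by rule 1
    "https://example.com/products/test_product.html",   # rules 2 and 3
    "https://example.com/products/widget.html",         # rule 3 (whole folder)
    "https://example.com/index.php",                    # no rule matches
]:
    print(url, "allowed" if parser.can_fetch("*", url) else "blocked")
```

This prints blocked for the first three URLs and allowed for /index.php. Note that /products without the trailing slash stays allowed, because Disallow: /products/ only matches paths that begin with that exact prefix.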

Nikz