
When you have a single-page application (SPA) that consists mostly of JavaScript content loaded via Ajax, Google proposes a guide on how to make such applications crawlable.

You have to use #! in your URL fragments to make the fragments visible to the search engine:

www.example.com/ajax.html#!mainpage
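
Under Google's AJAX crawling scheme, the crawler rewrites everything after the #! into an _escaped_fragment_ query parameter before requesting the page. Here is a minimal sketch of that mapping (the helper name is mine, for illustration):

```typescript
// Sketch of how a crawler maps a hashbang URL to its "ugly" equivalent,
// per Google's AJAX crawling scheme. toCrawlerUrl is a hypothetical helper;
// it ignores the case where the base URL already has a query string.
function toCrawlerUrl(url: string): string {
  const [base, fragment = ""] = url.split("#!");
  return `${base}?_escaped_fragment_=${encodeURIComponent(fragment)}`;
}

// toCrawlerUrl("http://www.example.com/ajax.html#!mainpage")
// => "http://www.example.com/ajax.html?_escaped_fragment_=mainpage"
```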

Now, if you use the HTML5 pushState History API, you can change the URL to

www.example.com/ajax.html/mainpage
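
In the browser, updating the address bar without a reload looks roughly like this (a minimal sketch; the state object and route string are just placeholders based on the example above):

```typescript
// Update the address bar without reloading the page.
// pushState(state, unusedTitle, url) is the standard History API call.
history.pushState({ page: "mainpage" }, "", "/ajax.html/mainpage");
```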

This URL looks much nicer than the first one, and search engines can access the page without stumbling over a hashbang in the URL. The problem is that it is still a JavaScript page that must be interpreted, which search engines do not do.

  1. How can this Ajax page be made accessible to search engines?
  2. How does my server know whether a search engine or a user's browser is trying to access the page?

I have the following ideas, but no idea how to implement them or whether there are existing solutions:

  • you can make an HTML snapshot of each Ajax page that can be served to the search engine
  • you can use some kind of UI-less (headless) browser: when the page is accessed by the search engine, the headless browser interprets the page and then returns the rendered HTML content to the search engine (see the sketch below)
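
To illustrate the second idea, here is a minimal sketch of rendering a snapshot in a headless browser. It assumes Puppeteer as the headless browser; the function name and the waiting strategy are my choices, not something from the question:

```typescript
import puppeteer from "puppeteer";

// Render a JavaScript page in a headless browser and return the
// resulting HTML, so a crawler receives fully interpreted markup.
async function renderSnapshot(url: string): Promise<string> {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity settles, so Ajax content has loaded.
    await page.goto(url, { waitUntil: "networkidle0" });
    return await page.content(); // serialized HTML after JS execution
  } finally {
    await browser.close();
  }
}
```
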
Michael
1 Answer


I think both of your ideas are on track.

Either way, on the server you would need to catch the search engine's "?_escaped_fragment_=" parameter, which it sends as a proxy for "#!". For this, you may look at this SO question and look up the official GWT reference.
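
A minimal sketch of that server-side check, assuming a Node/Express server and a renderSnapshot helper like the one sketched earlier (neither of which is prescribed by the answer):

```typescript
import express from "express";

const app = express();

// If the request carries _escaped_fragment_, it comes from a crawler
// following Google's AJAX crawling scheme; serve a pre-rendered snapshot.
app.use(async (req, res, next) => {
  const fragment = req.query._escaped_fragment_;
  if (typeof fragment !== "string") return next(); // normal browser request
  const url = `http://www.example.com/ajax.html#!${fragment}`;
  res.send(await renderSnapshot(url)); // headless-browser render
});

app.listen(3000);
```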

For performance reasons (the headless browser is slow, and it may keep running past the point where the JavaScript has finished), you can also cache the resulting static HTML pages (those that don't depend on dynamic parameters, as in your example) and serve those. But then you need to be careful to keep the snapshots in sync when you upgrade your code, to avoid being considered a Doorway Page.
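
A minimal caching sketch on top of the middleware above (the in-memory Map and the cache key are my own assumptions; a real setup might cache to disk and invalidate on deploy):

```typescript
// Cache rendered snapshots per fragment so the headless browser
// runs once per page, not on every crawler visit.
const snapshotCache = new Map<string, string>();

async function cachedSnapshot(fragment: string): Promise<string> {
  const cached = snapshotCache.get(fragment);
  if (cached !== undefined) return cached;
  const html = await renderSnapshot(
    `http://www.example.com/ajax.html#!${fragment}`
  );
  snapshotCache.set(fragment, html);
  return html;
}
// Clear snapshotCache on each deploy so snapshots stay in sync with the code.
```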

Patrick
  • So even if you don't have hashbanged URLs, you're saying the search engine will still use ?_escaped_fragment_, just with an empty value? That way we can simply check whether it exists. If the crawler only used that param when it detected a hashbang, then how would we know it's a crawler? – tommybananas Apr 03 '14 at 20:08
  • @snowman4415 In order to make pages without hash fragments crawlable, add the following meta tag to your page: <meta name="fragment" content="!">. Then yes, the _escaped_fragment_ that Google sends to your server will be empty; your server logic needs to catch this case, as in the code example I referred to in my answer. If I understood your second question correctly, the Google crawler uses _escaped_fragment_, but other crawlers may as well; I don't know who does and who doesn't. – Patrick Apr 04 '14 at 01:00