1

I am trying to scrape data from this website:

www.deutsches-krankenhaus-verzeichnis.de/suche/Bundesland/Nordrhein-Westfalen.jsf

The page loads its content via Ajax, and I cannot figure out how to grab that data. I have tried curl and other methods without success.

Please suggest an approach.

Thank you

Adas
  • They could be using some kind of protection to prevent this (session checking, referer checking). – Perrykipkerrie Dec 25 '15 at 07:19
  • Yes, that might be it; that is what I cannot work out. They load the whole page through an Ajax call. The request I see goes to "http://www.deutsches-krankenhaus-verzeichnis.de/suche/_files/main-search/Suchergebnis.jsf", but that page in turn makes the same kind of call. – Adas Dec 25 '15 at 07:20
  • And the tricky part is that when you view the source, you won't find any listing data there. – Adas Dec 25 '15 at 07:23
  • Ideally, what result do you want? A list of hospitals? – Perrykipkerrie Dec 25 '15 at 07:25
  • Yes, I need the list of hospitals and their links so that I can automate the process and fetch further information from each linked page. From this page I only need the hospital names and their links. – Adas Dec 25 '15 at 07:27
  • It's strange: I only see GET Ajax calls returning 404 in my console, even though the hospitals are loaded. – Perrykipkerrie Dec 25 '15 at 07:43
  • Yes, very strange. I wonder what they are using to prevent it. If you find anything, please share. – Adas Dec 25 '15 at 07:49
  • I only know that the results are listed in the table with the id 'searchResults', and that this table is hidden at first. However, I cannot find the script that changes it from hidden to visible. You could try contacting the company and explaining why you need these names and links. – Perrykipkerrie Dec 25 '15 at 07:51
  • Yes, I found that ID too, but I could not get anything out of it. I need the data for my client. Anyway, thank you for the effort. – Adas Dec 25 '15 at 07:57
  • What code have you tried? – Clay Dec 25 '15 at 08:19
  • Duplicate of http://stackoverflow.com/questions/260540/how-do-you-scrape-ajax-pages ? Merry Christmas :) – MattAllegro Dec 25 '15 at 08:21
  • $html = new DOMDocument(); @$html->loadHtmlFile('http://www.deutsches-krankenhaus-verzeichnis.de/suche/Bundesland/Nordrhein-Westfalen.jsf'); $xpath = new DOMXPath( $html ); $nodelist = $xpath->query( '//*[@id="searchResults"]' ); foreach ($nodelist as $n){ echo $n->nodeValue."\n"; } – Adas Dec 25 '15 at 09:51

2 Answers

0

You can use CasperJS (http://casperjs.org/). A small example of a scraping script is in the quickstart: http://docs.casperjs.org/en/latest/quickstart.html#a-minimal-scraping-script
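
For this particular page, a minimal CasperJS sketch might look like the one below. It assumes the hospital links appear as `<a>` elements inside the table with the id `searchResults` mentioned in the comments; the `#searchResults a` selector, the 10 second timeout, and the `formatLinks` helper are my own assumptions, not something the site documents. If the site checks sessions or referers, this may need further work.

```javascript
// Turn [{name, url}, ...] into tab-separated lines for printing.
// (Hypothetical helper, only used for output formatting.)
function formatLinks(links) {
  return links.map(function (l) {
    return l.name + '\t' + l.url;
  }).join('\n');
}

// CasperJS runs on top of PhantomJS, which defines the global `phantom`.
// The guard keeps the scraping part from running anywhere else.
if (typeof phantom !== 'undefined') {
  var casper = require('casper').create();

  casper.start('http://www.deutsches-krankenhaus-verzeichnis.de/suche/Bundesland/Nordrhein-Westfalen.jsf');

  // Wait until the Ajax call has put at least one link into the table.
  casper.waitForSelector('#searchResults a', function () {
    // The function passed to evaluate() runs inside the page itself,
    // so it can only use what exists there (e.g. `document`).
    var links = this.evaluate(function () {
      var anchors = document.querySelectorAll('#searchResults a');
      return Array.prototype.map.call(anchors, function (a) {
        return { name: a.textContent.trim(), url: a.href };
      });
    }) || [];
    this.echo(formatLinks(links));
  }, function () {
    // Called if the selector never shows up within the timeout.
    this.echo('Timed out waiting for #searchResults');
    this.exit(1);
  }, 10000);

  casper.run();
}
```

Save it as, say, `hospitals.js` and run it with `casperjs hospitals.js`.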

Kshitij Soni
0

With curl you only get the source of the original page; none of its JavaScript is executed, so the Ajax-loaded results never appear. Try a headless browser such as PhantomJS, which loads the page and runs its JavaScript. You can then query the page with CSS selectors after the Ajax data has been loaded.

http://phantomjs.org
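
As a rough sketch of that idea with plain PhantomJS: open the page, then poll until the results table has links in it, and print them. The `#searchResults a` selector is an assumption based on the table id mentioned in the comments, and `collectLinks`, the 250 ms interval, and the 40-attempt limit are illustrative choices of mine.

```javascript
// Collect the name and URL of every link inside the results table.
// When passed to page.evaluate() it is called without arguments inside the
// page, so it falls back to the page's global `document`.
function collectLinks(doc) {
  doc = doc || document;
  var anchors = doc.querySelectorAll('#searchResults a');
  return Array.prototype.map.call(anchors, function (a) {
    return { name: (a.textContent || '').trim(), url: a.href };
  });
}

// Only run the scraping part under PhantomJS (`phantom` is its global).
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  var url = 'http://www.deutsches-krankenhaus-verzeichnis.de/suche/Bundesland/Nordrhein-Westfalen.jsf';

  page.open(url, function (status) {
    if (status !== 'success') {
      console.log('Failed to load page');
      phantom.exit(1);
      return;
    }
    var attempts = 0;
    // The results arrive via Ajax after the initial load, so poll for them.
    var timer = setInterval(function () {
      var links = page.evaluate(collectLinks) || [];
      attempts += 1;
      if (links.length > 0 || attempts >= 40) { // give up after about 10 s
        clearInterval(timer);
        console.log(JSON.stringify(links, null, 2));
        phantom.exit(links.length > 0 ? 0 : 1);
      }
    }, 250);
  });
}
```

Run it with `phantomjs scrape.js` (the file name is arbitrary). The output is a JSON array of `{name, url}` objects that a PHP script could read with `json_decode`.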

stys
  • Can you please give me an example? I couldn't figure out how to work with PhantomJS. – Adas Dec 25 '15 at 07:44