How to get links from Google HTML search results in C#?

Question

I got this code that brings me the search results from Google as an HTML string:

 WebClient webClient = new WebClient();
 string htmlString = webClient.DownloadString("http://www.google.com/search?q=" + searchQuery);
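
One thing worth noting about the request above: `searchQuery` is concatenated into the URL as-is, so spaces or characters such as `#` and `&` would change the meaning of the query string. A minimal sketch of escaping it first, assuming a hypothetical helper class `SearchUrl` (not part of the original code):

```csharp
using System;

public static class SearchUrl
{
    // Hypothetical helper: builds the search URL used in the question,
    // percent-encoding the query so reserved characters survive intact.
    public static string Build(string searchQuery)
    {
        return "http://www.google.com/search?q=" + Uri.EscapeDataString(searchQuery);
    }
}
```

The result of `SearchUrl.Build(searchQuery)` can then be passed to `webClient.DownloadString(...)` in place of the raw concatenation.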

Any idea how to extract only the links from it? I guess I could do a string search, but that doesn't seem very elegant.

I found this code:

HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(htmlString);
var selectNodes = htmlDoc.DocumentNode.SelectNodes("//li[@class='g']");
foreach (var node in selectNodes)
{
     //node.InnerText will give you the text content of the li tags ...
}

But I'm getting an exception because `selectNodes` is null: the call `htmlDoc.DocumentNode.SelectNodes("//li[@class='g']")` returns null.
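
For context on the null result: in HtmlAgilityPack, `SelectNodes` returns null rather than an empty collection when the XPath matches nothing, so the `foreach` throws if the downloaded markup has no `li` elements with class `g`. A minimal sketch that guards against this and collects every `href`, assuming a hypothetical helper class `LinkExtractor` and a broad `//a[@href]` query (the markup Google actually returns may need a narrower expression):

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LinkExtractor
{
    // Hypothetical helper: returns the href value of every <a> element in the HTML.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
            return links;

        foreach (HtmlNode anchor in anchors)
        {
            // Fall back to an empty string if the attribute is somehow missing.
            string href = anchor.GetAttributeValue("href", string.Empty);
            if (href.Length > 0)
                links.Add(href);
        }
        return links;
    }
}
```

Calling `LinkExtractor.ExtractLinks(htmlString)` on the downloaded page would give a list of raw `href` values, which may still need filtering to keep only the result links.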

You can use HtmlAgilityPack from here: http://htmlagilitypack.codeplex.com/ — Dgan, Mar 16 '15 at 05:14

score 0 · Answer 1 · answered Mar 17 '15 at 06:53

using HtmlAgilityPack;

// _newPath is not defined in this snippet; it is assumed to be a string prefix prepended to each URL.
HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// Select every element that carries a URL-bearing attribute.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//*[@background or @lowsrc or @src or @href]");
if (links != null) // SelectNodes returns null when nothing matches
{
    foreach (HtmlNode link in links)
    {
        if (link.Attributes["background"] != null)
            link.Attributes["background"].Value = _newPath + link.Attributes["background"].Value;
        if (link.Attributes["href"] != null)
            link.Attributes["href"].Value = _newPath + link.Attributes["href"].Value;
        if (link.Attributes["lowsrc"] != null)
            link.Attributes["lowsrc"].Value = _newPath + link.Attributes["lowsrc"].Value;
        if (link.Attributes["src"] != null)
            link.Attributes["src"].Value = _newPath + link.Attributes["src"].Value;
    }
}

What is the meaning of `_newPath`? — Liran Friedman, Mar 21 '15 at 20:15
