I am attempting to use HtmlAgilityPack to extract all the content from a webpage.
foreach (HtmlTextNode node in doc.DocumentNode.SelectNodes("//text()"))
{
    sb.AppendLine(node.Text);
}
When I try to parse google.com using the above code, I get lots of JavaScript. All I want is to extract the content of the webpage, like the text in h or p tags. For example, taking the question, answers, and comments on this page and removing everything else.
I am really new to XPath and don't know exactly how to move forward, so any help would be appreciated.
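One direction that might work, as a rough sketch only: narrow the XPath so that text nodes inside script and style elements are skipped. The sample HTML string below is made up for illustration, and filtering on `ancestor::script` / `ancestor::style` is an assumption about what counts as "content", not a confirmed solution.

```csharp
using System;
using System.Text;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup standing in for a downloaded page.
        string html =
            "<html><head><style>p { color: red; }</style>" +
            "<script>var x = 1;</script></head>" +
            "<body><h1>Title</h1><p>Some paragraph text.</p></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select only text nodes that are not nested inside <script> or <style>.
        var nodes = doc.DocumentNode.SelectNodes(
            "//text()[not(ancestor::script) and not(ancestor::style)]");

        var sb = new StringBuilder();
        if (nodes != null) // SelectNodes returns null when nothing matches
        {
            foreach (HtmlNode node in nodes)
            {
                // Decode entities such as &amp; and skip whitespace-only nodes.
                string text = HtmlEntity.DeEntitize(node.InnerText).Trim();
                if (text.Length > 0)
                {
                    sb.AppendLine(text);
                }
            }
        }

        Console.Write(sb.ToString());
    }
}
```

If only headings and paragraphs are wanted, an expression such as `//h1 | //h2 | //p` could be tried in place of the text-node query, reading `InnerText` from each matched element.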