I'm working on a C# application that needs to scrape some data from a phpBB forum. The forum scraping requires logging in. The application will prompt the user for their login credentials to connect.

I've scraped websites before with C#, but I'm not sure how to log in to phpBB and keep a session open for the duration of the scraping. I've done some searching and haven't had much luck. Is there a good way to do this programmatically?

Tyler Treat

4 Answers

You don't say what you've tried, but if you're using HttpWebRequest objects to retrieve pages and/or log in, you need to assign a CookieContainer to each request's CookieContainer property so that any cookies the website returns are stored. Share the same container among all your HttpWebRequest objects to remain logged in.
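As a rough sketch of that idea (not tested against a real board): the class below keeps one CookieContainer and attaches it to every request. The `ForumSession` class, the `forum.example.com` URLs, and the form fields `username`, `password` and `login` posted to `ucp.php?mode=login` are assumptions for illustration. Check the actual login form of your forum, since some phpBB versions also expect extra hidden fields.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper: every request shares one CookieContainer,
// so the session cookie set at login is sent back on later requests.
class ForumSession
{
    private readonly CookieContainer cookies = new CookieContainer();

    // Builds a URL-encoded login body. The field names are assumptions;
    // confirm them by inspecting the forum's login form.
    public static string BuildLoginForm(string user, string password)
    {
        return "username=" + Uri.EscapeDataString(user)
             + "&password=" + Uri.EscapeDataString(password)
             + "&login=Login";
    }

    public string Post(string url, string formData)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.CookieContainer = cookies; // without this, cookies are discarded

        byte[] body = Encoding.UTF8.GetBytes(formData);
        request.ContentLength = body.Length;
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }
        return ReadResponse(request);
    }

    public string Get(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies; // same container as the login request
        return ReadResponse(request);
    }

    private static string ReadResponse(HttpWebRequest request)
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}

class Program
{
    static void Main()
    {
        // Placeholder base URL; replace with the real forum address.
        const string baseUrl = "http://forum.example.com/";

        Console.Write("Username: ");
        string user = Console.ReadLine();
        Console.Write("Password: ");
        string password = Console.ReadLine();

        var session = new ForumSession();
        session.Post(baseUrl + "ucp.php?mode=login",
                     ForumSession.BuildLoginForm(user, password));

        // Later requests reuse the stored cookies and stay logged in.
        string page = session.Get(baseUrl + "index.php");
        Console.WriteLine("Fetched " + page.Length + " characters.");
    }
}
```

If the login response still shows the login form, the field names or required hidden fields are probably different on your board.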

Martin Booth

Look for the names of the username and password fields using Firebug or Chrome's developer tools (or even View Source), then use my answer here: "Programmatically logging into a site", replacing 'session_key' and 'session_password' as appropriate. That should work.

Then translate it to C#!

Community
jcomeau_ictx

I recommend using HTML Agility Pack.
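For example, once you have a page's HTML as a string, something like the sketch below could pull out topic titles. It needs the HtmlAgilityPack NuGet package. The `TopicScraper` class is made up for illustration, and the `topictitle` CSS class is an assumption about the forum's theme, so check the actual markup first.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper that extracts link text from forum HTML.
static class TopicScraper
{
    public static List<string> ExtractTopicTitles(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var titles = new List<string>();
        // Assumes topic links carry class="topictitle"; adjust the XPath to your theme.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@class='topictitle']");
        if (links == null)
        {
            return titles; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode link in links)
        {
            titles.Add(HtmlEntity.DeEntitize(link.InnerText).Trim());
        }
        return titles;
    }
}

class Demo
{
    static void Main()
    {
        // Made-up sample markup standing in for a downloaded page.
        string html = "<ul><li><a class='topictitle' href='viewtopic.php?t=1'>Hello world</a></li></ul>";
        foreach (string title in TopicScraper.ExtractTopicTitles(html))
        {
            Console.WriteLine(title);
        }
    }
}
```

This only handles parsing. Fetching the page while logged in is a separate step.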

wp78de
Richard Schneider

I would recommend using the WatiN API for screen scraping. I have done screen scraping with it, and it works well. Check it out!

Widor
Bibhu
  • WatiN is a testing framework that opens a browser and executes commands. Automating browser functionality is not what I'm looking for. – Tyler Treat Jun 14 '11 at 03:12