I wrote a simple test function using Selenium WebDriver in Python:
    from selenium import webdriver

    def test_webdriver():
        web = webdriver.PhantomJS()
        web.get('http://example.com')
        web.find_element_by_tag_name('html')
        web.find_element_by_tag_name('head')
        web.find_element_by_tag_name('meta')
        web.find_element_by_tag_name('body')
        web.find_element_by_tag_name('title')
        web.find_element_by_tag_name('p')
        web.find_element_by_tag_name('div')
This function took much longer than expected to run, so I profiled it with cProfile and saw some lines like this:
ncalls tottime percall cumtime percall filename:lineno(function)
...
9 0.000 0.000 0.157 0.017 .../python2.7/urllib2.py:386(open)
...
This clearly indicates that webdriver is accessing the network on every find call in my test function.
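For completeness, the profiling was done along these lines (a minimal sketch: a cheap stand-in workload replaces the selenium test so it runs without a browser, but the same pattern produced the output above):

```python
import cProfile
import io
import pstats

def work():
    # Stand-in workload; in the real run this was the selenium test function.
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Sort by cumulative time and show the top entries, the same columns
# (ncalls, tottime, percall, cumtime, ...) shown in the output above.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats('cumulative').print_stats(5)
report = stream.getvalue()
print(report)
```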
I thought that webdriver grabbed the DOM once and ONLY once with get(), then searched and manipulated it locally, much like BeautifulSoup. Clearly it's not working like that, so I'm left with some questions:
- Is this the normal, expected behavior of webdriver, or just a misconfiguration on my part?
- If this is normal behavior, then is there a way to force webdriver to not access the network on every function call?
- What is it accessing the network for? It can't be refreshing the page on every find call, that just doesn't make any sense.
NOTE: I understand that JavaScript on the test page could fire off unintended network requests, which is why I'm using http://example.com as my test page, to eliminate that possibility.
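For reference, this is the fetch-once-then-search-locally pattern I expected, sketched with the stdlib html.parser instead of BeautifulSoup, and with a hard-coded string standing in for driver.page_source so it runs without a browser or network access:

```python
from html.parser import HTMLParser

# Hard-coded stand-in for driver.page_source; with a real driver this
# string would be fetched exactly once, right after get(url).
PAGE_SOURCE = """
<html><head><title>Example Domain</title></head>
<body><div><p>This domain is for use in illustrative examples.</p></div></body>
</html>
"""

class TagCollector(HTMLParser):
    """Records every start tag so later lookups are purely local."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

parser = TagCollector()
parser.feed(PAGE_SOURCE)  # parse once, entirely in memory

# Every subsequent "find" is a local membership check: no HTTP traffic.
for tag in ('html', 'head', 'body', 'title', 'p', 'div'):
    assert tag in parser.tags
```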