I need to modify the request URL before the response is downloaded, but the change doesn't stick. Even after calling request.replace(url=new_url) in process_request, process_response still prints the unmodified URL. Here's the middleware code:

def process_request(self, request, spider):
    original_url = request.url
    new_url= original_url + "hello%20world"
    print request.url            # This prints the original request url
    request=request.replace(url=new_url)
    print request.url            # This prints the modified url

def process_response(self, request, response, spider):
    print request.url            # This prints the original request url
    print response.url           # This prints the original request url
    return response

Can anyone please tell me what I'm missing here?

Rahul

1 Answer

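request.replace() returns a new Request object rather than changing the existing one in place, so rebinding the local name inside process_request() has no effect unless you return the new request: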

def process_request(self, request, spider):
    # Avoid an infinite loop: leave the request alone if it already has the desired part
    if "hello%20world" in request.url:
        return None

    new_url = request.url + "hello%20world"
    # A returned Request is rescheduled, so it passes through process_request() again
    return request.replace(url=new_url)
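
To make the flow easier to see without running a crawl, here is a minimal sketch in plain Python. It does not use Scrapy: FakeRequest and run() are hypothetical stand-ins that only imitate two behaviours, namely that replace() returns a copy, and that a request returned from the hook is submitted to the hook again while None lets it proceed.

```python
SUFFIX = "hello%20world"


class FakeRequest:
    """Hypothetical stand-in for a Scrapy Request; replace() returns a copy."""

    def __init__(self, url):
        self.url = url

    def replace(self, url):
        return FakeRequest(url)


def rebind_only(request):
    # Mirrors the question: the local name is rebound, nothing is returned.
    request = request.replace(url=request.url + SUFFIX)


def unguarded(request):
    # Always returns a new request, so it is resubmitted on every pass.
    return request.replace(url=request.url + SUFFIX)


def guarded(request):
    # Returns None once the URL already carries the suffix.
    if SUFFIX in request.url:
        return None
    return request.replace(url=request.url + SUFFIX)


def run(process_request, request, max_passes=5):
    """Toy driver (an assumption, not Scrapy's code): resubmit returned requests."""
    passes = 0
    while passes < max_passes:
        passes += 1
        result = process_request(request)
        if result is None:
            return request, passes  # this request would go on to be downloaded
        request = result
    return request, passes  # gave up: the hook kept returning new requests


if __name__ == "__main__":
    for hook in (rebind_only, unguarded, guarded):
        final, passes = run(hook, FakeRequest("http://example.com/?q="))
        print(hook.__name__, passes, final.url)
```

In this sketch, rebind_only leaves the URL untouched, unguarded keeps appending until the pass limit is hit, and guarded stops on the second pass with the suffix added once.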
alecxe
  • It went into an infinite loop. But I got the flow now. I need to return the new request so that it can be processed. I used an `if` condition to check if the url is modified and then returned the request. Thanks for the help. – Rahul Dec 23 '15 at 14:27
  • Can you please post how you accomplished this? I've posted a related question at https://stackoverflow.com/questions/50497430/keep-scrapy-spider-on-google-cache-of-site – NFB May 23 '18 at 21:09
  • @Rahul could you please update the answer or put into the comments what you ended up with? I do see that the current code in the answer causes an infinite loop - wanna fix/improve the answer. – alecxe Apr 17 '20 at 21:13
  • I don't remember it exactly, but it was something like this: `def process_request(self, request, spider): if "hello%20world" in request.url: pass else: new_url= request.url + "hello%20world" request=request.replace(url=new_url) return request` – Rahul Apr 18 '20 at 11:20