I am trying to get a list of news articles from https://www.businesstimes.com.sg/keywords/singapore-parliament using XMLHTTP, but the response appears to contain only scripts and no article content.

I have tried a basic request:

Private Sub Test()
    Dim xmlhttp As MSXML2.XMLHTTP60
    Set xmlhttp = New MSXML2.XMLHTTP60
    
    xmlhttp.Open "GET", "https://www.businesstimes.com.sg/keywords/singapore-parliament", False
    xmlhttp.Send
    Debug.Print xmlhttp.responseText ' inspect the returned document
    Set xmlhttp = Nothing
End Sub

It returns a document made up of minified scripts, apart from this readable one:

<script>
    (function() {
      'use strict';
      var afterReadyCbCalled = false;
      var originalHeaders = ["X-Host", "www.businesstimes.com.sg","X-EC-Hot-Hash", "7790000207959645976","x-ec-pop", "sgb","X-Forwarded-For", "103.252.200.88, 165.225.112.130, 152.195.199.174, 35.201.102.132","X-EC-Session-ID", "12399570086198404337867903748881746029","Accept", "*/*","Accept-Language", "en-US,en-GB;q=0.8,en;q=0.5,ja;q=0.3","True-Client-IP", "165.225.112.130","X-Cloud-Trace-Context", "ff9cf2795015e71e68e65cd11ad81a87/6844774677298190324","X-EC-Uuid", "12399570086198404337867903748881746029","X-Forwarded-Proto", "https","UA-CPU", "AMD64",];
      var originalBody = "";
      function afterReadyCb() {
        if (afterReadyCbCalled) return;
        afterReadyCbCalled = true;
        var xhr = new XMLHttpRequest();
        xhr.onload = function() {
          var isValid = xhr.getResponseHeader("ISTL-INFINITE-LOOP");
          if (isValid != null && isValid != '') return;
          var a = xhr.getResponseHeader("ISTL-REDIRECT-TO");
          if (a != null && a != '') {
            location.replace(a);
          } else {
            if (window.history != null && typeof history.replaceState === 'function') {
              var responseURL = xhr.responseURL != null ? xhr.responseURL : xhr.getResponseHeader("ISTL-RESPONSE-URL");
              if (responseURL != null && responseURL != '') {
                history.replaceState(null, '', responseURL);
              }
            }
            // DO NOT INLINE. There is a bug specific to IE/Edge.
            var responseText = xhr.responseText;
            document.open();
            document.write(responseText);
            document.close();
          }
        };
        xhr.open("get", location.href, true);
        for (var i = 0; i < originalHeaders.length; i += 2) {
          var headerName = originalHeaders[i];
          try {
            xhr.setRequestHeader(headerName, originalHeaders[i + 1]);
          } catch (e) {}
        }
        xhr.setRequestHeader("ISTL-INFINITE-LOOP", '1');
        xhr.send(originalBody);
        var evt = document.createEvent('Event');
        evt.initEvent('BJNyvohAx', true, true);
        dispatchEvent(evt);
      }
      addEventListener('afterReady', afterReadyCb, false);
      setTimeout(afterReadyCb, 200);
    }());
  </script>

I have also tried calling .setRequestHeader with all the values in originalHeaders, plus .setRequestHeader "ISTL-INFINITE-LOOP", "1" as the script does, but I got a 403 Forbidden error instead.

Can anyone advise what I am missing to get the document content (if it is even possible)?

Thanks in advance!

Raymond Wu
  • You need one specific cookie, `juLD4H3B` – QHarr Jun 22 '21 at 04:44
  • I'm not familiar with setting a cookie in the request (or how it actually works). Can I simply get the value from Chrome DevTools and assign it with setRequestHeader? `.setRequestHeader "Cookie", "juLD4H3B=ABZHajF6AQAAH0KEfNV9kI1EEZg8m3BcrjBrBRN1ddwumUMKZVGciT2p_7ji"` – Raymond Wu Jun 22 '21 at 04:52
  • @QHarr Alright, after some googling I managed to get it using ServerXMLHTTP instead of XMLHTTP. Are you able to explain how you determined that the cookie was the deciding factor? – Raymond Wu Jun 22 '21 at 04:56
  • I mimicked the full request with all headers and then used a process of elimination. There are some headers that, from experience, I know I can remove immediately. After narrowing it down to cookies, I removed cookies (it is possibly quicker to use the divide-by-two rule, i.e. remove half, see whether the request fails, and so on) and determined which cookie is currently needed. I then checked the Application tab in the browser dev tools to learn more about that cookie. – QHarr Jun 22 '21 at 05:04
  • @QHarr Thank you for the explanation! I was replicating a full request and tried a process of elimination as you described, but using XMLHTTP led me nowhere. I'll be happy to accept this if you want to post it as an answer. – Raymond Wu Jun 22 '21 at 05:34
  • No. But feel free to post your final code as a solution. You can accept your own answer after two days. – QHarr Jun 22 '21 at 05:38
  • I'd be curious to see the final solution, if it can be provided. – TechFanDan May 11 '22 at 17:17
  • @TechFanDan I had a [follow-up question](https://stackoverflow.com/questions/68159619/unable-to-get-the-content-of-the-document-using-xmlhttp-request-part-2), take a look at that. – Raymond Wu May 12 '22 at 01:39
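
For anyone landing here, the approach described in the comments (send the `juLD4H3B` cookie, and use ServerXMLHTTP rather than XMLHTTP) might look roughly like the sketch below. This is not the asker's final code. The constant names and the procedure name are made up for illustration, the cookie value is a placeholder that has to be copied fresh from the browser's dev tools, and the site may since have changed which cookie it checks.

```vba
Option Explicit

Private Const TARGET_URL As String = "https://www.businesstimes.com.sg/keywords/singapore-parliament"
' Placeholder (assumption): paste the current juLD4H3B value from the browser's dev tools.
Private Const COOKIE_VALUE As String = "PASTE_CURRENT_VALUE_HERE"

Private Sub TestWithCookie()
    Dim http As Object

    ' Late-bound ServerXMLHTTP, which the comments report worked where XMLHTTP did not.
    Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")

    http.Open "GET", TARGET_URL, False
    ' Send the one cookie the comments identify as required.
    http.setRequestHeader "Cookie", "juLD4H3B=" & COOKIE_VALUE
    http.send

    ' Print the status code and response length as a quick sanity check.
    Debug.Print http.Status, Len(http.responseText)

    Set http = Nothing
End Sub
```

If the status is 200 but the response is still only the script shown in the question, the cookie value has most likely expired or the required cookie has changed.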

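The process of elimination QHarr describes can also be scripted, at least in its simple one-header-at-a-time form. The sketch below is only an illustration under several assumptions: the header values are placeholders to be replaced with those from a working browser request, `CONTENT_MARKER` is a hypothetical string that should appear only in the real page, and `ResponseHasContent` and `FindRequiredHeaders` are names invented here. QHarr did not say which tool was used for this.

```vba
Option Explicit

Private Const PROBE_URL As String = "https://www.businesstimes.com.sg/keywords/singapore-parliament"
' Hypothetical marker: some text that only the real article listing contains.
Private Const CONTENT_MARKER As String = "PASTE_TEXT_FROM_REAL_PAGE"

' Sends a GET with every header except skipName and reports whether
' the marker text appears in the response.
Private Function ResponseHasContent(ByVal headers As Object, ByVal skipName As String) As Boolean
    Dim http As Object
    Dim key As Variant

    Set http = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    http.Open "GET", PROBE_URL, False
    For Each key In headers.Keys
        If StrComp(CStr(key), skipName, vbTextCompare) <> 0 Then
            http.setRequestHeader CStr(key), CStr(headers(key))
        End If
    Next key
    http.send

    ResponseHasContent = (InStr(1, http.responseText, CONTENT_MARKER, vbTextCompare) > 0)
End Function

Private Sub FindRequiredHeaders()
    Dim headers As Object
    Dim key As Variant

    ' Placeholder values (assumptions): copy the real ones from a working browser request.
    Set headers = CreateObject("Scripting.Dictionary")
    headers("User-Agent") = "PASTE_USER_AGENT"
    headers("Accept") = "*/*"
    headers("Cookie") = "PASTE_FULL_COOKIE_STRING"

    ' Drop one header at a time; a False result means that header is needed.
    For Each key In headers.Keys
        Debug.Print key, ResponseHasContent(headers, CStr(key))
    Next key
End Sub
```

Once the Cookie header turns out to be the one that matters, the same loop can be repeated over the individual cookies inside it, or over halves of the cookie string as QHarr suggests, to find the single cookie that is required.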