
Recently, I came across a Python script for downloading files directly from Kaggle: https://ramhiser.com/2012/11/23/how-to-download-kaggle-data-with-python-and-requests-dot-py/

I'm trying to do something similar using WebClient in C#. I came across the following answer on Stack Overflow: C# download file from the web with login

I tried using it, but I seem to be downloading only the login page instead of the actual file. Here's my main code:

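```csharp
using System;
using System.Net;

// One shared cookie container, so cookies set by the login POST
// are sent again with the later download request.
CookieContainer cookieJar = new CookieContainer();
CookieAwareWebClient http = new CookieAwareWebClient(cookieJar);

// Log in by posting the form fields (<username>/<password> are placeholders).
string postData = "name=<username>&password=<password>&submit=submit";
string response = http.UploadString("https://www.kaggle.com/account/login", postData);
Console.Write(response);

// Download the file, reusing the cookies from the login request.
http.DownloadFile("https://www.kaggle.com/c/titanic/download/train.csv", "train.CSV");
```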

I used the WebClient subclass from the link above, slightly modified:

```csharp
using System;
using System.Net;

public class CookieAwareWebClient : WebClient
{
    public CookieContainer CookieContainer { get; set; }
    public Uri Uri { get; set; }

    public CookieAwareWebClient()
        : this(new CookieContainer())
    {
    }

    public CookieAwareWebClient(CookieContainer cookies)
    {
        this.CookieContainer = cookies;
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        this.Uri = address;
        WebRequest request = base.GetWebRequest(address);
        // Only HTTP(S) requests carry cookies; other schemes pass through unchanged
        // instead of failing on an unconditional cast.
        if (request is HttpWebRequest httpRequest)
        {
            httpRequest.CookieContainer = this.CookieContainer;
            httpRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }
        return request;
    }

    protected override WebResponse GetWebResponse(WebRequest request)
    {
        WebResponse response = base.GetWebResponse(request);
        // Copy any cookies set by the server into the shared container.
        if (response is HttpWebResponse httpResponse)
        {
            CookieContainer.Add(httpResponse.Cookies);
        }
        // Return the original response, never null, even if it is not HTTP.
        return response;
    }
}
```
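
One way to narrow down where this goes wrong is to check whether the login request actually left a session cookie in the shared container before DownloadFile runs. The sketch below is a hypothetical debugging helper (CookieDebug.Describe is not part of the original code); it only lists what CookieContainer.GetCookies returns for a given URI. Which cookie name Kaggle uses for its session is not known here, so the point is simply whether anything shows up at all.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical debugging helper: lists the cookies a CookieContainer would
// send to the given URI, to see whether the login POST stored anything.
public static class CookieDebug
{
    public static List<string> Describe(CookieContainer container, Uri uri)
    {
        var lines = new List<string>();
        // GetCookies applies the same domain/path matching a real request would.
        foreach (Cookie cookie in container.GetCookies(uri))
        {
            lines.Add($"{cookie.Name}={cookie.Value} (domain {cookie.Domain}, path {cookie.Path})");
        }
        return lines;
    }
}
```

Calling `CookieDebug.Describe(cookieJar, new Uri("https://www.kaggle.com/"))` right after `UploadString` and printing the result would show whether any cookie was stored; an empty list suggests the login never succeeded.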

Can anyone point out where I went wrong?

Thanks.

Jeremy Loh

3 Answers


We have created a forum post, "Accessing Kaggle API through C#", to help you do what you wanted. Feel free to post here or on the forum if you have additional questions.

Peijen
  • Hi Peijen, thanks for the update. When I try the sample code provided, every downloaded file contains only the text "System.Net.Http.WinHttpResponseStream". This happens for all files, both with the project in the sample code and with the infamous Titanic competition. Do you know why this is happening? PS: I've tried both .NET Framework and .NET Core, and the results are the same. – Jeremy Loh Mar 16 '18 at 03:19
  • Jeremy, I have updated the code to use FileStream instead of StreamWriter; try that and see if it works. – Peijen Mar 16 '18 at 05:27
  • Unfortunately, it's not working. I'm getting a "stream does not support writing" error, and I can't set the position either. – Jeremy Loh Mar 16 '18 at 06:20
  • Did you use "stream.CopyTo(output);" instead of "output.Write(stream);"? Sorry I didn't call that out in the previous comment. If it still doesn't work, do you mind posting your code somewhere so I can take a look? – Peijen Mar 16 '18 at 07:33
  • Yep, I noticed that. I pretty much just copied and pasted the code from your sample; here's a link to the file itself: https://drive.google.com/file/d/1E3ryHnBDSGL-HoSBMQsOeGOBI8sd37aj/view?usp=sharing – Jeremy Loh Mar 16 '18 at 08:04
  • Ah, you want **stream**.CopyTo(output), not **output**.CopyTo(stream). – Peijen Mar 16 '18 at 16:17
  • Gaah, I can't believe I missed that. It's working now; thanks a lot for your help! – Jeremy Loh Mar 17 '18 at 08:36
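
The "System.Net.Http.WinHttpResponseStream" content is most likely what you get when the Stream object itself is written through a StreamWriter: TextWriter.Write(object) writes the object's ToString(), which for a stream is just its type name. A minimal sketch of the fix worked out in the comments, assuming the download already yields a readable Stream (for example from HttpClient), copies from the response stream into a file stream. DownloadHelper.SaveStreamAsync is a hypothetical name, not part of the sample code.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class DownloadHelper
{
    // Copies everything from the (read-only) response stream into a file.
    // The direction matters: source.CopyToAsync(destination), never the reverse.
    public static async Task SaveStreamAsync(Stream source, string path)
    {
        // FileMode.Create overwrites an existing file at the same path.
        using (FileStream output = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            await source.CopyToAsync(output);
        }
    }
}
```

With HttpClient, the source would typically be the stream returned by `response.Content.ReadAsStreamAsync()`, after checking `response.EnsureSuccessStatusCode()`.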

Try opening https://www.kaggle.com/c/titanic/download/train.csv in your browser without being logged in: the browser opens a page instead of downloading the file. You need a direct link to the file rather than to a web page.

Your code itself works; you just need to use a direct link to the file, or make sure you are logged in before downloading it.
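
Because this failure is silent (DownloadFile happily saves the login page under the name train.CSV), it can help to check what actually came back. A rough heuristic, assuming a real CSV never starts with an HTML tag, is to look at the first non-whitespace characters of the saved file. ResponseCheck is a hypothetical helper, not part of WebClient.

```csharp
using System;
using System.IO;

public static class ResponseCheck
{
    // Heuristic only: true if the text starts like an HTML document, which
    // usually means a login or error page was saved instead of the data.
    public static bool LooksLikeHtml(string content)
    {
        string start = content.TrimStart();
        return start.StartsWith("<!DOCTYPE html", StringComparison.OrdinalIgnoreCase)
            || start.StartsWith("<html", StringComparison.OrdinalIgnoreCase);
    }

    // Reads only the beginning of a downloaded file and applies the check.
    public static bool FileLooksLikeHtml(string path)
    {
        using (var reader = new StreamReader(path))
        {
            char[] buffer = new char[4096];
            int read = reader.Read(buffer, 0, buffer.Length);
            return LooksLikeHtml(new string(buffer, 0, read));
        }
    }
}
```

Alternatively, right after `DownloadFile`, `http.ResponseHeaders[HttpResponseHeader.ContentType]` shows the Content-Type the server sent; `text/html` points to the same problem.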

Red Wei
  • Hey Red, yep, you're correct that a login is needed before downloading. My understanding was that calling http.UploadString(..) makes the WebClient extension log in and save the cookie, so that the next time I access a page or download the file via http.DownloadFile(...), the cookie is sent along with the request for the direct link. But it doesn't seem to work for me. – Jeremy Loh Mar 14 '18 at 01:03
  • This: `string postData = "name=&password=&submit=submit";`. The fields aren't always called "name" and "password"; you need to view the web cookies and follow their pattern. Try `string postData = "UserName=&Password=";` – Red Wei Mar 14 '18 at 04:01
  • Yep, I played around with it but didn't manage to get through. I suppose I'll try the API and see if I get a better result. Thanks anyway. – Jeremy Loh Mar 16 '18 at 03:21
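
On the field-name point: the keys in postData have to match the `name` attributes of the inputs in the login form (including any hidden inputs it contains), and WebClient.UploadString does not add a form Content-Type header by itself. A minimal sketch, assuming you have read the real field names from the login page's HTML (the "UserName"/"Password" keys from the comment are not verified against Kaggle), builds a URL-encoded body and sets the header. FormPost is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

public static class FormPost
{
    // Builds an application/x-www-form-urlencoded body. Keys must match the
    // name="..." attributes of the login form's <input> elements.
    public static string BuildBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }

    // Posts the form with the Content-Type header a browser form submission
    // carries; UploadString does not set this header on its own.
    public static string Post(WebClient client, string url, IEnumerable<KeyValuePair<string, string>> fields)
    {
        client.Headers[HttpRequestHeader.ContentType] = "application/x-www-form-urlencoded";
        return client.UploadString(url, BuildBody(fields));
    }
}
```

WebClient.UploadValues with a NameValueCollection is a built-in alternative: it URL-encodes the fields and sets the same Content-Type when none is present.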

I know it's not exactly what you were asking, but Kaggle now has an official API that you can use to download data. It should be a bit easier to use. :)
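
A rough sketch of what a call to that API might look like from C#, assuming the v1 REST endpoint used by the official client (`https://www.kaggle.com/api/v1/competitions/data/download/{competition}/{fileName}`) and HTTP Basic authentication with the username and key from kaggle.json. Both the URL pattern and the auth scheme are assumptions here and should be checked against Kaggle's current API documentation. KaggleApi.BuildDownloadRequest is a hypothetical name; it only builds the request and does not send it.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;

public static class KaggleApi
{
    // Assumed base URL of Kaggle's public REST API.
    private const string BaseUrl = "https://www.kaggle.com/api/v1";

    // Builds (but does not send) a GET request for one competition data file.
    // Assumption: Basic auth with "username:key" from kaggle.json.
    public static HttpRequestMessage BuildDownloadRequest(
        string username, string apiKey, string competition, string fileName)
    {
        string url = $"{BaseUrl}/competitions/data/download/" +
                     $"{Uri.EscapeDataString(competition)}/{Uri.EscapeDataString(fileName)}";
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        string token = Convert.ToBase64String(Encoding.UTF8.GetBytes($"{username}:{apiKey}"));
        request.Headers.Authorization = new AuthenticationHeaderValue("Basic", token);
        return request;
    }
}
```

The request would then be sent with `HttpClient.SendAsync(request, HttpCompletionOption.ResponseHeadersRead)`, and the body saved by copying the response stream into a FileStream (response stream as the source).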

Rachael Tatman