2

I am verifying image URLs by making HTTP GET requests asynchronously. Everything works fine with the code below, but when I have a large number of images, our firewall blocks my internet access because of so many threads requesting concurrently. Therefore I was looking for a way to restrict the number of concurrently running threads. I ended up with this thread telling me to use SemaphoreSlim, but I don't quite get the idea or how to implement it.

  • Should I use SemaphoreSlim's Wait or WaitAsync (what is the difference anyway?), and should the call be inside a foreach while adding tasks? Can I just create the task list with LINQ as I do in my code?
  • Why is Task.Run used there?
  • After which line does the thread start: after Task.Run or after Task.WhenAll?

If that's not the best approach, please suggest a better one. I'm also not sure whether using MaxDegreeOfParallelism with Parallel.ForEach makes sense here.

Dim tasks = myImages.Select(Function(x) testUrl_async(x))
Dim results = Await Task.WhenAll(tasks)

Async Function testUrl_async(ByVal myImage As image) As Task(Of image)
    Dim myImageurl As String = myImage.imageurl
    ' Declared locally so that concurrent calls do not share one response variable.
    Dim myHttpResponse = Await myHttpClient.GetAsync(myImageurl)
    If myHttpResponse.IsSuccessStatusCode Then
        Return myImage
    Else
        Return Nothing
    End If
End Function
Emil
  • I would sequentially await them with continuations or a plain old foreach. Pretty sure there's a more elegant solution though. – Machinarius Jun 24 '15 at 13:28
  • Why don't you use Parallel.ForEach setting MaxDegreeOfParallelism? – mehrandvd Jun 24 '15 at 13:32
  • Actually, in the post below I was advised to use this approach instead. I tried both, and I got better performance with this code than with Parallel.ForEach. http://stackoverflow.com/questions/31007055/using-parallel-foreach-with-or-async-await – Emil Jun 24 '15 at 13:43
  • I checked that post. It's clever not to use Parallel.ForEach. I posted an answer based on your idea. Check it out. – mehrandvd Jun 24 '15 at 14:27
  • Restricting the number of threads is a job for a custom TaskScheduler, like the QueuedTaskScheduler in [Parallel Extensions Extras](http://blogs.msdn.com/b/pfxteam/archive/2010/04/09/9990424.aspx). Anything else simply wastes ThreadPool threads. A better option though is to use an ActionBlock with a specific MaxDegreeOfParallelism. It's far simpler *and* allows you to simply post all URLs in a queue for processing. – Panagiotis Kanavos Jun 24 '15 at 14:28
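
A minimal sketch of the ActionBlock idea from the last comment, not code from the thread. It assumes the TPL Dataflow library (NuGet package System.Threading.Tasks.Dataflow) is referenced. `Task.Delay` stands in for the real HTTP check, and the module name, the `handler` and `processed` variables, and the example.invalid URLs are hypothetical.

```vbnet
Imports System
Imports System.Collections.Concurrent
Imports System.Threading.Tasks
Imports System.Threading.Tasks.Dataflow ' assumption: TPL Dataflow package is installed

Module ActionBlockSketch
    Sub Main()
        Dim processed As New ConcurrentBag(Of String)()

        ' At most 20 handler invocations run at the same time.
        Dim options As New ExecutionDataflowBlockOptions With {.MaxDegreeOfParallelism = 20}

        ' Task.Delay simulates the HTTP round trip (placeholder for GetAsync).
        Dim handler As Func(Of String, Task) =
            Async Function(url As String) As Task
                Await Task.Delay(50)
                processed.Add(url)
            End Function

        Dim block As New ActionBlock(Of String)(handler, options)

        ' Queue all URLs up front; the block works through them under the limit.
        For i = 1 To 100
            block.Post("http://example.invalid/image" & i & ".png")
        Next

        ' Signal that no more items are coming, then wait for the queue to drain.
        block.Complete()
        block.Completion.GetAwaiter().GetResult()

        Console.WriteLine("processed " & processed.Count & " urls")
    End Sub
End Module
```

The limit lives in the block's options, so the code that posts URLs does not need to know about it.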

3 Answers

5

> our firewall will block my internet access because of so many threads concurrently requesting. Therefore I was looking for a solution how to restrict the count of concurrently running threads.

Pretty sure that the firewall is restricting your number of connections, and thus you want to restrict your number of connections (not threads).

> is that SemaphoreSlim wait or waitAsnyc (what is the difference anyway?)

Wait is a synchronous wait: it blocks the calling thread. WaitAsync is an asynchronous wait: it frees the calling thread and resumes executing the current method when the semaphore is available.

> should be inside a foreach while adding tasks? Can I just create the task list with linq as I do in my code?

You can do it either way: build up a list explicitly, or use LINQ.

> why is there used task.Run?

That's an error in that answer. Task.Run is certainly not needed or desired here.

> after which line is executed does the thread start? after task.run or task.whenall?

When you call Task.Run, that delegate is queued to the thread pool immediately. But as I said above, you don't want to use Task.Run (it shouldn't be used in the original answer either).


So, something like this should suffice:

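' Allow at most 20 requests in flight at any one time.
Private _mutex As New SemaphoreSlim(20)

Async Function testUrl_async(myImage As image) As Task(Of image)
    ' Asynchronously wait for a free slot without blocking the calling thread.
    Await _mutex.WaitAsync()
    Try
        Dim myImageurl = myImage.imageurl
        Dim myHttpResponse = Await myHttpClient.GetAsync(myImageurl)
        Return If(myHttpResponse.IsSuccessStatusCode, myImage, Nothing)
    Finally
        ' Always give the slot back, even if the request throws.
        _mutex.Release()
    End Try
End Function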
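
To see the throttling in isolation, here is a minimal self-contained sketch of the same WaitAsync/Try/Finally/Release pattern. It is an illustration under stated assumptions, not the code from the question: `Task.Delay` stands in for `myHttpClient.GetAsync`, the limit is 3 instead of 20 so the effect is easy to observe, and the names `ThrottleDemo`, `FakeRequestAsync`, `_gate`, `_inFlight` and `_peak` are hypothetical.

```vbnet
Imports System
Imports System.Linq
Imports System.Threading
Imports System.Threading.Tasks

Module ThrottleDemo
    ' At most 3 simulated requests may be in flight at once (demo value).
    Private ReadOnly _gate As New SemaphoreSlim(3)
    Private ReadOnly _lock As New Object()
    Private _inFlight As Integer = 0
    Private _peak As Integer = 0

    ' Stand-in for testUrl_async: Task.Delay simulates the HTTP round trip.
    Async Function FakeRequestAsync(id As Integer) As Task(Of Integer)
        Await _gate.WaitAsync()
        Try
            ' Track how many simulated requests are running right now.
            SyncLock _lock
                _inFlight += 1
                _peak = Math.Max(_peak, _inFlight)
            End SyncLock

            Await Task.Delay(100)

            SyncLock _lock
                _inFlight -= 1
            End SyncLock
            Return id
        Finally
            ' Release the slot even if the simulated request fails.
            _gate.Release()
        End Try
    End Function

    Sub Main()
        ' ToList forces the lazy LINQ query to run, so every call is made here.
        Dim tasks = Enumerable.Range(1, 10).Select(Function(i) FakeRequestAsync(i)).ToList()
        Dim results = Task.WhenAll(tasks).GetAwaiter().GetResult()
        Console.WriteLine("completed={0}, peak concurrency={1}", results.Length, _peak)
    End Sub
End Module
```

All ten calls are made immediately, but the reported peak should never exceed the semaphore's count, because each call has to obtain a slot before it starts its simulated request.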
Stephen Cleary
  • Isn't it better to separate the logic of getting an image asynchronously from controlling the connections? I think it's better to let the calling loop use the semaphore and implement the connection control logic. – mehrandvd Jun 24 '15 at 20:44
  • @mehrandvd: Sure, you could do that. Just move the `WaitAsync`/`Try`/`Finally`/`Release` into the calling code. – Stephen Cleary Jun 24 '15 at 21:35
  • @StephenCleary When you don't use Task.Run as in the other post, the program will create a list of tasks and execute them only when the line with Task.WhenAll is executed, right? But then what is the role of WaitAsync, given that it is already outside the foreach scope? How does this work? Or is it not good to use WhenAll in this case? – Emil Jun 25 '15 at 09:21
  • @batmaci: Async tasks are always returned "hot" - they are already in progress. Neither `await` nor `WhenAll` will *start* them. – Stephen Cleary Jun 25 '15 at 11:20
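
A small sketch of what "hot" means in the last comment, assuming a console program. The names `HotTaskDemo` and `WorkAsync` are hypothetical, and `Task.Delay` stands in for real I/O.

```vbnet
Imports System
Imports System.Threading.Tasks

Module HotTaskDemo
    Async Function WorkAsync(name As String) As Task
        ' This line runs synchronously, as part of the call to WorkAsync itself.
        Console.WriteLine(name & " started")
        Await Task.Delay(50)
        Console.WriteLine(name & " finished")
    End Function

    Sub Main()
        ' Calling the async function is what starts the work.
        Dim t1 = WorkAsync("first")
        Dim t2 = WorkAsync("second")

        ' Both "started" lines have already been printed by this point.
        Console.WriteLine("both calls returned; WhenAll not called yet")

        ' WhenAll only waits for the tasks that are already in progress.
        Task.WhenAll(t1, t2).GetAwaiter().GetResult()
    End Sub
End Module
```

One related detail: a LINQ `Select` is lazy, so with `myImages.Select(Function(x) testUrl_async(x))` the function is not called until the sequence is enumerated. Passing the sequence to `Task.WhenAll` enumerates it, and adding `.ToList()` makes the calls happen at that line instead.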
0

You can use Task and SemaphoreSlim like this:

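var maxAllowedConcurrentRequestsCount = 20;
// The initial count must equal the maximum; starting at 0 would make the first wait block forever.
var guard = new SemaphoreSlim(maxAllowedConcurrentRequestsCount, maxAllowedConcurrentRequestsCount);
foreach (var imageUrl in imageUrls)
{
    await guard.WaitAsync();
    try
    {
        // Unwrap flattens the Task<Task<HttpResponseMessage>> that StartNew returns for an async call.
        // Note: awaiting inside the loop means only one request runs at a time.
        var response = await Task.Factory.StartNew(() => myHttpClient.GetAsync(imageUrl)).Unwrap();
        // You can do whatever you like with the response. For example, add the image to your concurrent list.
    }
    finally
    {
        guard.Release();
    }
}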
mehrandvd
  • @batmaci You can check the following link, which describes why to use a loop and ContinueWith instead of async/await in some situations. http://www.thinqlinq.com/Post.aspx/Title/Async-in-loops – mehrandvd Jun 24 '15 at 14:40
  • What happened to the comment from @Panagiotis Kanavos? He mentioned a disadvantage of your method. I was comparing yours with his suggestion. Now I see that he deleted his comment. Is there any big difference between your approach and using QueuedTaskScheduler? – Emil Jun 24 '15 at 15:40
  • That's good too. I like it as it uses Parallel.ForEach, which I think is cool. My code is written at a lower level so you can see the algorithm. – mehrandvd Jun 24 '15 at 16:25
  • @batmaci After a little chat with Stephen I've just used await instead of ContinueWith in the calling loop. – mehrandvd Jun 25 '15 at 07:17
  • Why do you use Task.Factory instead of Task.WhenAll? And does it make any difference whether guard is initialized before the foreach or inside it? – Emil Jun 25 '15 at 09:11
-1

I assume that you are not on the WPF or Windows Forms UI thread. If you are, there will be only one thread working after the await.

As we assume you are not on those threads, the ThreadPool is used to execute the continuations after the await. You can change the number of threads used by the pool with ThreadPool.SetMaxThreads, but I suggest you don't do that and leave it up to .NET to do the right thing. That is usually the best way.

Nitram
  • Actually, I am on WPF, doing this in a ViewModel. What do you mean by "If you are there will be only one thread working after the await."? Do you mean that it will not be multithreaded? If not, why do I see a speed increase? It runs significantly faster. – Emil Jun 24 '15 at 14:29
  • If you are on a WPF thread, await has to synchronize to the WPF `Dispatcher`. Basically it registers the continuation as an action in the dispatcher using `Dispatcher.Invoke`. https://msdn.microsoft.com/en-us/library/hh199416(v=vs.110).aspx – Nitram Jun 24 '15 at 14:31
  • The OP wants to execute multiple GETs concurrently to reduce the total time without exceeding the limit set by the firewall. This isn't about WPF or the thread after `await`. – Panagiotis Kanavos Jun 24 '15 at 14:32
  • Even if the code was run on the ThreadPool, `SetMaxThreads` wouldn't limit the number of connections, which is what actually matters here, not the number of threads. – svick Jun 24 '15 at 21:49
  • @Nitram No, the question asks how to limit the number of concurrent requests. The mention of threads is just an incorrect assumption by the OP and so it should be ignored. (BTW, if you don't use [@reply](http://meta.stackexchange.com/q/43019/130186), I won't be notified of your replies.) – svick Jun 25 '15 at 00:57
  • @svick Sorry for being ignorant, but what is the difference between concurrent requests and multiple threads? Isn't a request made by a thread? If the requests are concurrent, aren't there supposed to be multiple worker threads? – Emil Jun 25 '15 at 08:40
  • @batmaci Not necessarily. Basically, concurrent requests mean that multiple connections to a host are established at the same time. Those connections can be maintained by the same thread. So a concurrent request happens if you open a second connection (and request something from a host) before receiving the answer to the first request and closing the first connection. – Nitram Jun 25 '15 at 08:42
  • @Nitram Ah, I see. You are saying that a thread can't start two connections at the same time, but it can start the second one before the first one is closed. In my case, connections would end so quickly that I only considered the case of multiple requests starting at exactly the same time. – Emil Jun 25 '15 at 08:55
  • @batmaci You underestimate how fast the connections will open in your example. In the way you wrote it, the initial caller thread will open each connection one by one. Once the communication is done, the ThreadPool threads handle everything after the `Await myHttpClient.GetAsync(myImageurl)` concurrently. However, even if your connection is only open for a few milliseconds, the first thread opening the connections will be faster. – Nitram Jun 25 '15 at 08:59