
I have a C# SignalR client (2.2) and ASP.NET MVC SignalR server on Azure. When a new "Entity" is created on the server side, it pushes a simple notification to the client using the following:

public static class EntityHubHelper
{
    private static readonly IHubContext _hubContext = GlobalHost.ConnectionManager.GetHubContext<EntityHub>();

    public static void EntityCreated(IdentityUser user, Entity entity)
    {
        _hubContext.Clients.User(user.UserName).EntityCreated(entity);
    }
}

[Authorize]
public class EntityHub : Hub
{
    // Just tracing overrides for OnConnected/OnReconnected/OnDisconnected
}

Occasionally the client or the server reconnects, which is expected. However, I'm seeing cases where both reconnect (e.g. after restarting the web server) and the client then stops receiving data.

This seems to happen after 1-2 days with no data being pushed: the next push is simply missed.

Our client tracing:

15/08/02 03:57:23 DEBUG SignalR: StateChanged: Connected -> Reconnecting
15/08/02 03:57:28 DEBUG SignalR: Error: System.Net.WebSockets.WebSocketException (0x80004005): Unable to connect to the remote server ---> System.Net.WebException: The remote server returned an error: (500) Internal Server Error.
15/08/02 03:57:31 DEBUG SignalR: Error: System.Net.WebSockets.WebSocketException (0x80004005): Unable to connect to the remote server ---> System.Net.WebException: The remote server returned an error: (500) Internal Server Error.
15/08/02 03:57:47 DEBUG SignalR: StateChanged: Reconnecting -> Connected
15/08/02 03:57:47 INFO SignalR OnReconnected

Our server tracing:

8/2/2015 3:57:57 AM     [SignalR][OnReconnected] Email=correspondinguser@example.com, ConnectionId=ff4e472b-184c-49d4-a662-8b0e26da43e2

I'm using the server defaults for keepalive and disconnect timeout (10s and 30s), and the connection generally uses WebSockets (enabled on Azure; Standard tier, so no connection limits).

I have two questions:

(1) How is the client meant to find out that a server has been restarted in the websocket case (in which case it would lose memory of said client's existence)? Do the server's 10s/30s settings get pushed down during the initial connection, and the client decides that the server is gone after 30s?

(2) How do I debug this situation? Is there some way to prove that the client is actually still receiving keepalives so I know I have some catastrophic problem somewhere else?


1 Answer

After various tests and fixes it looks like the problem was in the IUserIdProvider when mapping from users to connection IDs. Adding client-originated keepalives using SignalR messages showed that the client and server truly had reconnected, and the connection stayed healthy, but messages pushed from server to client were going into a black hole after 1-2 days, potentially with website publishing/appdomain refreshing involved.
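
A minimal sketch of what a client-originated keepalive could look like, not the code from the original project. The class name `ClientKeepalive` and its members are hypothetical; the actual hub call is passed in as a delegate so the sketch stays independent of any particular hub method.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: makes a round-trip call on a fixed interval and
// records whether it succeeded, so a log can show the connection is alive.
public sealed class ClientKeepalive
{
    private readonly Func<Task> _ping;      // the round-trip call (assumed to hit the hub)
    private readonly TimeSpan _interval;    // time between pings
    private readonly Action<string> _log;   // trace sink

    public DateTimeOffset? LastSuccessUtc { get; private set; }
    public int Failures { get; private set; }

    public ClientKeepalive(Func<Task> ping, TimeSpan interval, Action<string> log)
    {
        _ping = ping;
        _interval = interval;
        _log = log;
    }

    // Performs one ping; returns true if the call completed without error.
    public async Task<bool> PingOnceAsync()
    {
        try
        {
            await _ping().ConfigureAwait(false);
            LastSuccessUtc = DateTimeOffset.UtcNow;
            return true;
        }
        catch (Exception ex)
        {
            Failures++;
            _log("Keepalive failed: " + ex.Message);
            return false;
        }
    }

    // Pings repeatedly until the token is cancelled.
    public async Task RunAsync(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            await PingOnceAsync().ConfigureAwait(false);
            try { await Task.Delay(_interval, token).ConfigureAwait(false); }
            catch (OperationCanceledException) { break; }
        }
    }
}
```

With the SignalR 2 .NET client, the delegate could be something like `() => proxy.Invoke("Ping")`, where `proxy` is the `IHubProxy` for the hub and `Ping` is an assumed empty method added to `EntityHub`. Note that this only proves the client-to-server round trip works; it says nothing about whether user-targeted pushes reach the right connection.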

I replaced the IUserIdProvider-based mapping with user-to-connection tracking stored in SQL Azure (various options explained here), based on this user presence sample recommended by @davidfowl in this post, and tailored it to my existing user/auth scheme. However, it needed a few additional changes in PresenceMonitor.cs to improve reliability:

  • I had to increase periodsBeforeConsideringZombie from 3 to 6 since it was removing "zombie" connections at 30s when they wouldn't disconnect until 50s or so. This meant connections would sometimes reconnect somewhere in the 30-50s range and not be tracked in the database.
  • I had to fix the handling of heartbeat-tracked connections that weren't found in the database.
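
The effect of the first change can be sketched as simple arithmetic. This assumes the zombie threshold is the check interval multiplied by periodsBeforeConsideringZombie, with the 10-second timer described in this answer; the `ZombieThreshold` class and its method names are hypothetical and are not taken from the sample.

```csharp
using System;

// Hypothetical illustration of the threshold arithmetic, not the sample's code.
public static class ZombieThreshold
{
    // Threshold = check interval x number of periods (assumed relationship).
    public static TimeSpan For(TimeSpan checkInterval, int periodsBeforeConsideringZombie)
    {
        return TimeSpan.FromTicks(checkInterval.Ticks * periodsBeforeConsideringZombie);
    }

    // A connection counts as a zombie once it has been idle longer than the threshold.
    public static bool IsZombie(DateTimeOffset lastActivity, DateTimeOffset now, TimeSpan threshold)
    {
        return now - lastActivity > threshold;
    }
}
```

Under these assumptions, 3 periods give a 30-second threshold and 6 periods give 60 seconds, so a connection that has been silent for 40 seconds but has not yet disconnected is dropped from the database in the first case and kept in the second.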

The sample has the following code in UserPresence.Check():

// Update the client's last activity
if (connection != null)
{
    connection.LastActivity = DateTimeOffset.UtcNow;
}
else
{
    // We have a connection that isn't tracked in our DB!
    // This should *NEVER* happen
    // Debugger.Launch();
}

However, the situation that apparently should never happen (a heartbeat-tracked connection that isn't found in the database) was somewhat common, say 10% of new connections, even with periodsBeforeConsideringZombie at 6. This is because the hub's OnConnected event can sometimes be slow to fire, so a new connection shows up in the heartbeat list first if the 10-second timer handler happens to run in between.

I used this code in UserPresence instead to give a connection two timer ticks, or between 10s and 20s depending on timer "luck", to fire OnConnected. If it's still not DB-tracked I disconnect it so that the client connects again (handling OnClosed) and isn't a black hole for messages (since I loop DB connections for a user in order to push messages).

private HashSet<string> notInDbReadyToDisconnect = new HashSet<string>();

private void Check()
{
    HashSet<string> notInDbReadyToDisconnectNew = new HashSet<string>();

    ...

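        else
        {
            // REMOVED: // We have a connection that isn't tracked in our DB!
            // REMOVED: // This should *NEVER* happen
            // REMOVED: // Debugger.Launch();
            string format; // message template for tracing this connection
            if (notInDbReadyToDisconnect.Contains(trackedConnection.ConnectionId))
            {
                // Second consecutive tick without a DB record: force a disconnect
                // so the client connects again and gets tracked properly.
                trackedConnection.Disconnect();
                format = "[SignalR][PresenceMonitor] Disconnecting active connection not tracked in DB (#2), ConnectionId={0}";
            }
            else
            {
                // First tick without a DB record: OnConnected may just be late,
                // so remember the ID and check again on the next tick.
                notInDbReadyToDisconnectNew.Add(trackedConnection.ConnectionId);
                format = "[SignalR][PresenceMonitor] Found active connection not tracked in DB (#1), ConnectionId={0}";
            }
        }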

    ...


    notInDbReadyToDisconnect = notInDbReadyToDisconnectNew;

    ...
}

It does the job for a single server, but the HashSet probably needs to be moved to the DB to handle scale-out.
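
For reference, the two-tick grace logic can be isolated from the presence sample as a small in-memory class. This is a sketch only: `UntrackedConnectionGrace` and `Tick` are hypothetical names, and the caller is assumed to supply the IDs of heartbeat-tracked connections that have no DB record on each timer tick.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical single-server helper mirroring the HashSet swap in Check().
public sealed class UntrackedConnectionGrace
{
    // IDs that were untracked (and not yet disconnected) on the previous tick.
    private HashSet<string> _seenLastTick = new HashSet<string>();

    // Takes the IDs that are untracked on this tick and returns those that were
    // also untracked on the previous tick, i.e. the ones to disconnect now.
    public List<string> Tick(IEnumerable<string> untrackedNow)
    {
        var seenThisTick = new HashSet<string>();
        var toDisconnect = new List<string>();

        foreach (var id in untrackedNow)
        {
            if (_seenLastTick.Contains(id))
            {
                toDisconnect.Add(id);   // second strike
            }
            else
            {
                seenThisTick.Add(id);   // first strike, give it one more tick
            }
        }

        // IDs that became tracked in the meantime simply drop out of the set.
        _seenLastTick = seenThisTick;
        return toDisconnect;
    }
}
```

A caller would invoke `Tick` once per timer run and call `Disconnect()` on each returned connection. Because the state lives in process memory, the same scale-out caveat applies as for the HashSet above.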

After all of this, everything is very reliable and my server-push code is still very simple:

public static class EntityHubHelper
{
    private static readonly IHubContext _hubContext = GlobalHost.ConnectionManager.GetHubContext<EntityHub>();

    public static void EntityCreated(User user, Entity entity)
    {
        List<string> connectionIds = user.PushConnections.Select(c => c.ConnectionId).ToList();
        _hubContext.Clients.Clients(connectionIds).EntityCreated(entity);
    }
}