31

I'm implementing a chat room. So far, so good - users can send messages from their browsers via a JS client, and I can use a C# client to do the same thing - these messages get broadcast to other users. Now, I'm trying to implement "online users".

My approach is the following:

  • OnConnected - update the User in the db to be IsOnline = true
  • OnDisconnected - if the User doesn't have any other connections, update the user in the db to be IsOnline = false
  • I'm storing state in the DB because I have to query the db for user thumbnails anyways - this seemed like a simple alternative to working with the Dictionaries in the hub.

The problem I'm encountering is that OnDisconnected doesn't always get called for every client Id - the stale connections are preventing the "if the user doesn't have any other connections" bit from resolving to true, so the user is always "online".

One hacky solution I can think of is to always set the user to offline in the db upon OnDisconnect - but this means that if the user opens two tabs and closes one, they will be "offline". I could then re-set the user to online for every message that gets sent, but this seems like a total waste of processing cycles and still leaves a chunk of time where the user is seen as offline, when they are really online.

I believe that if there was a way to guarantee that OnDisconnected gets called for every client, this problem would go away. It seems like if I leave clients open for a long time (> 10 minutes) and then disconnect, OnDisconnected never gets called. I'll try my best to pinpoint the repro steps and keep this updated.

So - Is this a valid approach to handling online status? If so, what else can be done to ensure that OnDisconnected is firing for every connection, eventually?

This problem worries me because existing Connections will just continue to grow over time, if I'm not mistaken, eventually overflowing due to unhandled state connections.

Code:

I'm using the In-memory approach to groupings.

Sending Messages (C#):

private readonly static ConnectionMapping<string> _chatConnections =
            new ConnectionMapping<string>();
public void SendChatMessage(string key, ChatMessageViewModel message) {
            message.HtmlContent = _compiler.Transform(message.HtmlContent);
            foreach (var connectionId in _chatConnections.GetConnections(key)) {
                Clients.Client(connectionId).addChatMessage(JsonConvert.SerializeObject(message).SanitizeData());
            }
        }

State management:

    public override Task OnConnected() {
        HandleConnection();
        return base.OnConnected();
    }

    public override Task OnDisconnected() {
        HandleConnection(true);
        return base.OnDisconnected();
    }

    public override Task OnReconnected() {
        HandleConnection();
        return base.OnReconnected();
    }

    private void HandleConnection(bool shouldDisconnect = false) {
        if (Context.User == null) return;
        var username = Context.User.Identity.Name;
        var _userService = new UserService();
        var key = username;

        if (shouldDisconnect) {
                _chatConnections.Remove(key, Context.ConnectionId);
                var existingConnections = _chatConnections.GetConnections(key);
                // this is the problem - existingConnections occasionally gets to a point where there's always a connection - as if the OnDisconnected() never got called for that client
                if (!existingConnections.Any()) { // THIS is the issue - existingConnections sometimes contains connections despite there being no open tabs/clients
                    // save status serverside
                    var onlineUserDto = _userService.SetChatStatus(username, false);
                    SendOnlineUserUpdate(_baseUrl, onlineUserDto, false);
                }
        } else {
                if (!_chatConnections.GetConnections(key).Contains(Context.ConnectionId)) {
                    _chatConnections.Add(key, Context.ConnectionId);
                }
                var onlineUserDto = _userService.SetChatStatus(Context.User.Identity.Name, true);
                SendOnlineUserUpdate(_baseUrl, onlineUserDto, true);
                // broadcast to clients
        }
    }

ConnectionMapping:

public class ConnectionMapping<T> {
        private readonly Dictionary<T, HashSet<string>> _connections =
            new Dictionary<T, HashSet<string>>();

        public int Count {
            get {
                return _connections.Count;
            }
        }

        public void Add(T key, string connectionId) {
            lock (_connections) {
                HashSet<string> connections;
                if (!_connections.TryGetValue(key, out connections)) {
                    connections = new HashSet<string>();
                    _connections.Add(key, connections);
                }

                lock (connections) {
                    connections.Add(connectionId);
                }
            }
        }

        public IEnumerable<string> GetConnections(T key) {
            HashSet<string> connections;
            if (_connections.TryGetValue(key, out connections)) {
                return connections.ToList();
            }
            return Enumerable.Empty<string>();
        }

        public void Remove(T key, string connectionId) {
            lock (_connections) {
                HashSet<string> connections;
                if (!_connections.TryGetValue(key, out connections)) {
                    return;
                }

                lock (connections) {
                    connections.Remove(connectionId);

                    if (connections.Count == 0) {
                        _connections.Remove(key);
                    }
                }
            }
        }
    }

Update

Per dfowler's suggestion, an alternative approach would be to implement in-db mapping instead of in-memory, this way more metadata can be used to identify zombified connections. I'm hoping for a solution to the in-memory problem though, instead of re-architect away from a recommended approach that's already implemented.

Amirhossein Mehrvarzi
  • 18,024
  • 7
  • 45
  • 70
SB2055
  • 12,272
  • 32
  • 97
  • 202
  • Have you tried with different configurations for the DisconnectTimeout, KeepAlive and Heatbeat intervals? https://github.com/SignalR/SignalR/wiki/Configuring-SignalR – thepirat000 Jan 11 '14 at 21:45
  • @thepirat000 - yeah I have... this happens on the scale of hours/days too - connections are staying alive in memory indefinitely. – SB2055 Jan 11 '14 at 21:48
  • Don't use database for online/offline detection.. that should be part of the server just do `connections.Count` or `connections.Length` to get the count. Because if your server crashes you'll have many online users when you just start it up.. – SSpoke Jan 12 '14 at 03:55
  • @SSpoke - that's fine, but the problem remains: OnDisconnected() is not getting called when it should be, so connections are persisted. As far as UX goes, I've gained nothing with your suggestion as users will be displayed as online when they're not online. – SB2055 Jan 12 '14 at 04:42
  • @SB2055 when isn't disconnected being called? – davidfowl Jan 12 '14 at 05:49
  • @dfowler - if I have IE, FF, Chrome open with different users/sessions, open up chat, and then wait for a while, say 20-30 mins, then close the browsers, OnDisconnected does not get called. This has happened to me 3-4 times tonight, and I don't know how to debug it. I'm using a worker role and `DateLastActiveInChat` property to clean up in memory+db for now. – SB2055 Jan 12 '14 at 05:54
  • 1
    Try turning tracing on for the transports https://github.com/SignalR/SignalR/blob/master/samples/Microsoft.AspNet.SignalR.Samples/Web.config#L27. It'll show details for specific connection ids as well as lifetime events that get raised or not raised in your case. – davidfowl Jan 12 '14 at 06:03
  • @dfowler - fantastic, will do. – SB2055 Jan 12 '14 at 06:05
  • 1
    You can't count on getting a nice message from the client when they leave; networks can just vanish (and actually do so all the time). Use your heartbeat code to handle disconnects, except when you're _lucky_ enough to get an explicit session-end notification. Don't count on being lucky. – Donal Fellows Jan 12 '14 at 11:50

1 Answers1

21

Try following this sample here:

https://github.com/DamianEdwards/NDCLondon2013/tree/master/UserPresence

davidfowl
  • 37,120
  • 7
  • 93
  • 103
  • I was crossing my fingers in hopes that you'd see this. I'm looking now - what parts exactly? I'm using the in-memory approach to mapping users <> connections. This seems to use a different approach. – SB2055 Jan 12 '14 at 03:34
  • In case a full implementation details would help - here's everything: http://codereview.stackexchange.com/questions/39073/signalr-chat-navbar-implementation – SB2055 Jan 12 '14 at 03:42
  • It's using a database instead of in memory storage to handle restarts + scale out. It's just an expanded example and it's the only way to make it reliable. – davidfowl Jan 12 '14 at 05:23
  • Thanks @dfowler - Is it safe to say that devs shouldn't be depending on OnDisconnected to manage user state then? – SB2055 Jan 12 '14 at 05:35
  • Did you look at the sample? You need to look handle *all* of those events *and* also check presence on a timer with some logic to clean up zombie connections. So the answer is no. – davidfowl Jan 12 '14 at 05:36
  • Yeah, I did. I should have added "solely" to my "on OnDisconnected". I would expect this to handle disconnects reliably, but since that is not the case, if anyone wants to accurately track online users and disconnects, some form of worker role / timer needs to handle cleaning up the zombie connections. Does *this* mean that the in-memory approach is unsafe, because there are zombie connections that will eventually overflow without manual cleanup? – SB2055 Jan 12 '14 at 05:51
  • 2
    The in memory approach has some obvious weaknesses: - If the server restarts you lose all state - If you have multiple servers it'll be inaccurate (as each server will have a different view of who is "online") If the 2 above problems aren't a concern then it should be fine as long as you understand when OnReconnected and OnDisconnected would happen. OnReconnected can fire for e.g. after an app pool/app domain recycle where you lose all of your state and some clients will reconnect and others will disconnect and other will connect. – davidfowl Jan 12 '14 at 06:01
  • Understood. I meant more from an overflow perspective, but the more I think about it, the more I want to move to a more persistent media. While I have you - what are your thoughts on using the in-memory approach, but with Azure Cache instead of an Azure Table? Wouldn't this provide the best of both worlds (fast I/O + scales)? – SB2055 Jan 12 '14 at 06:04
  • 1
    I use azure cache... i think you will still run into problems. I solve this by pinging from the clients and setting a threshold of inactivity. Surefire way although not as real time. – KingOfHypocrites May 13 '14 at 15:42
  • I would add that if you use a pinging approach to combine with IUserProvider. The User Presence project takes a somewhat similar approach although probably the better way to go about it since with pinging you will get needless pings when they have more than one tab open. I'll likely switch to use the User Presence approach once I get some time. – KingOfHypocrites May 13 '14 at 16:05
  • Hi @davidfowl, could you please confirm that zombie clean up logic is still necessary since the changes to OnDisconnected() in https://github.com/SignalR/SignalR/releases/tag/2.1.0? Does it now reliably detect stale clients such that we can avoid needing this logic or is it still required? This task https://github.com/SignalR/SignalR/issues/3115 seems to suggest that it's no longer necessary. – ajbeaven Apr 10 '19 at 20:48
  • @ajbeaven did you find out whether the zombie clean up logic is still necessary? – Fahad S. Ali Jul 04 '21 at 15:24
  • 1
    Imagine the server crashes. What happens then? – davidfowl Jul 06 '21 at 06:43