0

I have a simple class which holds a primary key of which I don't know what type it will be before it runs, as i'm getting the data from COM. It will either be an int or string.

I basically just need to fill up my toUpdateList & toAddList. This was working fine below with not too many records to play around with. However now the mongoDBList returns around 65k records and it's all turned very slow and it's taking 15+ minutes to resolve toUpdateList.

I'm pretty new to C# so I'm likely missing something.

I basically just need to compare one list to another and see if the RecordRevision is higher in the toUpdateList. For the toAddList this ones pretty simple as if it doesn't exist it needs to be added.

Thanks for looking I appreciate it!

class KeyRevision
{
    public dynamic RecordRevision;
    public dynamic PrimaryKey;
}

            List<KeyRevision> potentialUpdateList = new List<KeyRevision>();
            List<KeyRevision> mongoDBList = new List<KeyRevision>();
            List<KeyRevision> toUpdateList = new List<KeyRevision>();
            List<KeyRevision> toAddList = new List<KeyRevision>();

            var sql = env.ExecuteSQL(sqlQuery);

            sql.First();

            // Loop over them and add to array
            do
            {
                if (sql.RecordCount > 0)
                {
                    //Console.WriteLine(sql.GetPropertyValue(primaryKey).ToString() + " : " + sql.RecordRevision);
                    var o = new KeyRevision();
                    o.PrimaryKey = sql.GetPropertyValue(primaryKey);
                    o.RecordRevision = sql.RecordRevision;
                    potentialUpdateList.Add(o);
                }
                sql.Next();
            } while (!sql.EOF);

            // Ask mongo for docs
            var docs = collection1.Find(_ => true).Project("{RecordRevision: 1}").ToList();

            // Get them into our type
            mongoDBList = docs.ConvertAll(x => new KeyRevision()
            {
                PrimaryKey = x.GetValue("_id"),
                RecordRevision = x.GetValue("RecordRevision")
            });

            // Finds which records we need to update
            toUpdateList = potentialUpdateList.Where(x =>
                mongoDBList.Any(y => y.PrimaryKey == x.PrimaryKey && y.RecordRevision < x.RecordRevision)).ToList();

            // Finds the records we need to add
            toAddList = potentialUpdateList.Where(x =>
                mongoDBList.FindIndex(y => y.PrimaryKey == x.PrimaryKey) < 0).ToList();

            Console.WriteLine($"{toUpdateList.Count} need to be updated");
            Console.WriteLine($"{toAddList.Count} need to be updated");
Adam91Holt
  • 1,018
  • 1
  • 14
  • 28
  • 1
    I think there are two seperate problems; 1. Do not fetch all data from MongoDB which contains a RecordRevision - just use mongo db to run the `Finds which records we need to update`. 2. have you tried indexing your data? – daryal Jan 28 '19 at 13:34
  • You're doing very poorly optimized queries here and maybe also doing something that you shouldn't really do (an XY problem) – Fabjan Jan 28 '19 at 13:35
  • @Fabjan Thanks for your input. Could you suggest the best approach? I've spent the last few hours looking into joins and intersects. However the problem I have is when I change something it takes a long time to run, so I've come to seek help! – Adam91Holt Jan 28 '19 at 13:37
  • @Adam91Holt Show us the code where `potentialUpdateList` is populated with values – Fabjan Jan 28 '19 at 13:38
  • @Fabjan I have added that code now. the env variable is basically talking to a COM object which allows me to query the data I need (it's pretty old but I need to do it this way) – Adam91Holt Jan 28 '19 at 13:41
  • @daryal I was doing that initially and will do that for the daily sync as I may only be asking for say 100 docs max at a time. This code is just for initial sync and it needs to look at 3 years worth of records. – Adam91Holt Jan 28 '19 at 13:43
  • Let mongodb do the work for you. See here: https://stackoverflow.com/q/19974216/1736047 – STLDev Jan 28 '19 at 13:44
  • @Adam91Holt Yeah, open mongo management app such as 'RoboMongo'. Crate and run query that will do what you need to do. Return to C# code and use this query that should return exactly the result you're looking for. However if for some reason you *have to* fetch all records I'd suggest using dictionaries and for fast O(1) lookups. You could at least change `mongoDBList` to `Dictionary` and use `mongoDBList.ContainsKey()` – Fabjan Jan 28 '19 at 13:46
  • Thanks @STLDeveloper - however, I would like to try and take the load off the database and think this should be relatively simple to just get a list of what needs to update. As 65k records at 20kb soon adds up :) – Adam91Holt Jan 28 '19 at 13:47
  • 1
    @Fabjan I was already doing something along those lines, however mongo shit a brick when I used the $in operator with so many records like so. // Make array so we only query mongo for these records and not all BsonArray bArray = new BsonArray(); foreach (var rec in potentialUpdateList) { bArray.Add(rec.PrimaryKey); } //var filter = new BsonDocument("_id", new BsonDocument("$in", new BsonArray(bArray))); – Adam91Holt Jan 28 '19 at 13:48
  • Please explain "shit a brick". – nvoigt Jan 28 '19 at 14:44
  • @nvoigt - https://www.urbandictionary.com/define.php?term=shit%20a%20brick – Adam91Holt Jan 28 '19 at 14:47
  • @Adam91Holt How is that term helpful? Was there an actual error message? – nvoigt Jan 28 '19 at 14:50
  • @Fabjan The dictionary approach is like night and day and is perfect for what I'm trying to do. It literally took a second to run now :) Thanks for your input. – Adam91Holt Jan 28 '19 at 14:57
  • @Adam91Holt Nice! It is understandable since dictionary uses fast O(1) lookups and this change has reduced complexity from `O(N2)` to roughly `O(N)`... I'd suggest reading more about dictionary, hashtable, Lookup, O-notation and stuff like this... – Fabjan Jan 28 '19 at 15:09

0 Answers0