3

At least 100000 comparisions was needed when I tried to sort with IComparer<string>.

Why?

public class PeriodComparer: IComparer<string>
{
  int num = 0;

  public int Compare(string x, string y)
  {
    var year1 = int.Parse(x.Substring(2));
    var year2 = int.Parse(y.Substring(2));

    if (year1 < year2)
        return -1;

    if (year1 > year2)
        return 1;

    var season1 = x.Substring(0, 2);

    return season1 == "VT" ? -1 : 1;
  }
}

I tried to sort an array of strings with it like

var strings = new []
    {"VT2010", "HT2010",
    "VT2011", "HT2011",
    "VT2012", "HT2012",
    "VT2013", "HT2013",
    "VT2014", "HT2014",
    "VT2015", "HT2015",
    "VT2016", "HT2016",
    "VT2017", "HT2017"};


var comparer = new PeriodComparer();
var orderedPeriodNames = strings.OrderBy(x => x, comparer).ToList();

expecting the strings to sort first with respect to year and then with respect to VT and HT. (This means that the input in this particular case is already sorted).

However, the execution stalled so I was putting a counter in that Compare function like

public class PeriodComparer: IComparer<string>
{
  int num = 0;

  public int Compare(string x, string y)
  {
    if (++num >= 100000)
    {
      // setting a breakpoint here
    }

    var year1 = int.Parse(x.Substring(2));
    var year2 = int.Parse(y.Substring(2));

    if (year1 < year2)
        return -1;

    if (year1 > year2)
        return 1;

    var season1 = x.Substring(0, 2);

    return season1 == "VT" ? -1 : 1;
  }
}

The breakpoint was hit so at least 100000 comparisions seems to be needed.

Salman A
  • 262,204
  • 82
  • 430
  • 521
Anders Lindén
  • 6,839
  • 11
  • 56
  • 109
  • 1
    that does sound incredibly inefficient - Id add some debug log that writes what its comparing each time to get an idea of what its doing. – BugFinder Feb 06 '18 at 08:58
  • 3
    Your comparer is broken. You should check `season1` vs `season2` as well. Note that the comparer can be called for two equal items. – Ivan Stoev Feb 06 '18 at 09:00
  • 2
    `OrderBy` uses the `Sort` method of `EnumerableSorter`, which implements a QuickSort algorithm. Having more than 100.000 comparisons with only 16 elements seems **way** off. Might be related to you never returning `0` when two elements match, but then again you don't have any duplicates in this case ... – Manfred Radlwimmer Feb 06 '18 at 09:05
  • 2
    Adding `if (x == y) return 0;` at the beginning of the `Compare` method will fix it in your case. – Ivan Stoev Feb 06 '18 at 09:08

4 Answers4

2

If you add a debug statement to the comparer and call it on just two strings, you can see what is going on.

You get this output:

Comparing VT2010 to VT2010
Comparing VT2010 to HT2010
Comparing VT2010 to VT2010

repeated over and over. Apparently it cannot figure out where VT2010 should go in the result, because every time it think it found the correct position and does a final comparison, it turns out that it is too far to the right because still VT2010 < VT2010.

As correctly commented by Ivan Stoev,

Note that the comparer can be called for two equal items.

If you add the line

if(x == y) { return 0; }

at the top of the Compare method, the sorting algorithm will be able to figure out the correct relative order of VT2010 compared to both VT2010 and HT2010.

CompuChip
  • 9,143
  • 4
  • 24
  • 48
2

Your season comparison likely needs a bit of tweaking. Something like:

public class PeriodComparer : IComparer<string>
{
    int num = 0;

    public int Compare(string x, string y)
    {
        if (++num >= 100000)
        {
            Console.WriteLine(num);
        }

        var year1 = int.Parse(x.Substring(2));
        var year2 = int.Parse(y.Substring(2));

        if (year1 < year2)
            return -1;

        if (year1 > year2)
            return 1;

        var season1 = x.Substring(0, 2);
        var season2 = y.Substring(0, 2);

        if (season1 == "VT" && season2 != "VT")
            return -1;
        if (season2 == "VT" && season1 != "VT")
            return 1;

        return StringComparer.InvariantCulture.Compare(season1, season2);
    }
}

This will ensure that sort results are consistent.

To compare, call your existing code with the inputs of "HJ2014" and "HT2014". It will return 1. Now call it with "HT2014" and "HJ2014" - now it will still return 1. This is unexpected.

You have basically said:

"HJ2014" is greater than "HT2014"

and

"HJ2014" is less than "HT2014"

Obviously, both of these can't be true. Thus this "confuses" the sorting algorithm, and causes it to spin its wheels.

Similarly, your code will also claim:

"VT2014" is less than "VT2014"

which is clearly false. This causes issues since OrderBy uses QuickSort under the covers, whereby an entry may be compared to itself.

mjwills
  • 23,389
  • 6
  • 40
  • 63
2

When comparing arbitrary items, say A and B we must ensure that

  A == A
  ...
  whenever A > B then B < A

Please, notice, that in your case these rules are broken; moreover, you never threat strings as being equal; simple example

  var A = "VT2018";

  // Expected 0, Actual -1
  int result = (new PeriodComparer()).Compare(A, A);

Correct accurate (when dealing with public classes we must expect any input) implementation:

  public int Compare(string x, string y) {
    // Special cases: equal strings, nulls
    if (string.Equals(x, y))
      return 0;
    else if (string.Equals(null, x))  // null is smaller than any string
      return -1;
    else if (string.Equals(null, y)) 
      return 1;

    // Suffixes
    // TrimStart('0'): we want "0003" == "3" < "10":  
    string suffixX = x.Length <= 2 ? "" : x.Substring(2).TrimStart('0');
    string suffixY = y.Length <= 2 ? "" : y.Substring(2).TrimStart('0');

    // Natural order: Length first (2000 > 900)...
    if (suffixX.Length > suffixY.Length)
      return 1;
    else if (suffixX.Length < suffixY.Length)
      return -1;

    // ...Lexicograhical next: 2040 > 2030
    int result = string.Compare(suffixX, suffixY);

    if (result != 0)
      return result;

    // Equal Suffixes; compare prefixes
    string prefixX = x.Length <= 2 ? x : x.Substring(0, 2);
    string prefixY = y.Length <= 2 ? y : y.Substring(0, 2);

    if (prefixX == "VT" && prefixY != "VT")
      return -1;
    else if (prefixY == "VT" && prefixX != "VT")
      return 1;

    return string.Compare(prefixX, prefixY);
  } 
Dmitry Bychenko
  • 180,369
  • 20
  • 160
  • 215
0

You're not catering for the case where the 'tie-breaker' comparison (of the substring "VT" vs. "HT") is the same.

Matt
  • 1,648
  • 12
  • 22