0

I want to be able to execute the following query:

select distinct studentname, subject, grade from studenttable;

The student table has the following fields:

studentid
studentname
subject
grade

The linq query I have now is:

var students=dc.Select(s => new {s.studentname, s.subject, s.grade}).Distinct();

dc is the data context.

This query works, but how do I get the student id while still satisfying the distinct condition on the studentname, subject, grade set?

David
  • 10,458
  • 1
  • 28
  • 40
Dan
  • 63
  • 2
  • 9
  • Might the [`GroupBy()`](http://msdn.microsoft.com/en-us/library/bb534492(v=vs.110).aspx) method do what you want? – Conduit Jan 14 '15 at 22:04
  • @DuaneKrause makes an excellent point which raises a question. You are getting the distinct list of names, subjects and grades. But if there are two Bob's both with an A in science, then you've lost who is who. Are you sure a distinct is what you want? – crthompson Jan 14 '15 at 22:17
  • @Dan I look at this issue from another angle. Many students might have the same names. But they are different persons and have different student IDs. For example, many students might be named John Smith in a large college. So I think you should simple use: var students=dc.Select(s => new {s.studentid, s.studentname, s.subject, s.grade}).Distinct(); – Jian Huang Jan 14 '15 at 22:35

3 Answers3

4

The issue here is that you've collected three properties of the necessary data which, while they will suffice to pass through Distinct, don't have any link back to their original data, and any such link would break the default implementation of Distinct.

What you CAN do, however, is to use the overload of distinct which takes an IEqualityComparer. This will allow you to compare for equality only on the desired fields, while running Distinct over the entire collection.

var students = dc
    .AsEnumerable() // In case you're using a linq-to-sql framework, this will ensure the query execute in-memory
    .Distinct(new SpecialtyComparer());

//...

public class SpecialtyComparer : IEqualityComparer<StudentTable>
{
    public int GetHashCode(StudentTable s)
    {
        return s.studentname.GetHashCode()
            && s.subject.GetHashCode()
            && s.grade.GetHashCode();
    }

    public bool Equals(StudentTable s1, StudenTable s2)
    {
        return s1.studentname.Equals(s2.studentname)
            && s1.subject.Equals(s2.subject)
            && s1.grade.Equals(s2.grade);
    }
}
David
  • 10,458
  • 1
  • 28
  • 40
2

I believe your design is broken, but I'll answer your specific question....

I'm assuming you're trying to group by name, subject and grade, and retrieve the first representative student of each group.

In this case, you can group by Tuples. A tuple will give you an Equals and GetHashCode method for free, so can be used in Group operations.

IEnumerable<Student> distinctStudents = students
                                           .AsEnumerable()
                                           .GroupBy(s => Tuple.Create
                                                               (
                                                                   s.studentname, 
                                                                   s.subject, 
                                                                   s.grade
                                                               )
                                                          )
                                           .Select(g => g.First()); /// Returns the first student of each group

Here is a dot net fiddle example: https://dotnetfiddle.net/7K13DJ

Andrew Shepherd
  • 44,254
  • 30
  • 139
  • 205
  • Got an exception with this message: LINQ to Entities does not recognize the method 'System.Tuple`3[System.String,System.String,System.String] Create[String,String,String](System.String, System.String, System.String)' method, and this method cannot be translated into a store expression. – Dan Jan 14 '15 at 22:48
  • @Dan - Ah. That would because LINQ TO Entities would be trying to interpret the method and convert it to SQL. I'll add the same trick Aravol used - use `AsEnumerable` to fetch all of the data and process it in Linq to Objects. – Andrew Shepherd Jan 14 '15 at 22:58
  • @Dan: OK, I've updated it now. (Unlike jsfiddle, the link remains unchanged when we update a dotnetfiddle). – Andrew Shepherd Jan 14 '15 at 23:07
  • Thanks for updating the fiddle. However, the groupby (from your example) and distinct (from what i initially had) are returning a different number of records. Any idea why that might be? – Dan Jan 14 '15 at 23:19
  • @Dan - You would have to generate the distinct lists from both approaches, and see where the differences are. I suspect the Linq provider doesn't handle Distinct very well. I've been burnt by that before: http://stackoverflow.com/questions/1484607/linq-distinct-returning-a-different-value-if-extension-method-used – Andrew Shepherd Jan 14 '15 at 23:35
1

When doing a Distinct() on a list of objects, you are aggregating those rows into a smaller number of rows, discarding any duplicates. So, your result will not have the studentid anymore. To preserve the studentid property, you need to use a GroupBy. This will return to you your key (a student of studentname, subject, grade), and a list of original rows. You can see an example in this question here: (GroupBy and count the unique elements in a List).

You will have a list of studentids to choose from, since there might be many rows with the same studentname, subject, and grade. (Otherwise you would not be doing a distinct -- they would be unique and the tuple { sudentname, subject, grade } would be a natural key). So the question you may need to ask yourself is "Which studentid do I need?"

Community
  • 1
  • 1