Update: It seems I have not been clear enough about what exactly I am asking (and as the question developed over time I also lost track a bit), so here is a tl;dr version:

var test1 = a is byte & b;    // compiles
var test2 = a is byte? & b;   // does not compile
var test3 = a is byte? && b;  // compiles

This means - as I understand - that the ? type modifier has lower precedence (as it's not an operator this might not be the best word) than the & operator, but higher than the && operator. Is it so? Where is this described in the standard?


And the original question:

While trying to figure out the answer to the second puzzle from Jon Skeet's excellent blogpost, A Tale of two puzzles, I faced a problem:

unsafe private void Test<T>(T a, bool b) 
{
    var test1 = a is byte? & b;         // does not compile
    var test2 = a is byte? && b;        // compiles
    var test3 = a is byte ? & b : & b;  // compiles
}

Here I am using an unsafe context as my actual goal requires it (e.g.: the third line), but it is not necessary for reproducing the issue I raise. (However it might have an effect, as it introduces the address-of operator as an alternative for the & symbol.)

The first line does not compile (the others do); it gives the following error message:

Syntax error, ':' expected

This means that in that case the compiler sees the line as

var test1 = (a is byte) ? &b [: missing part that it complains about];

While in the second line it sees it as:

var test2 = (a is byte?) && (b);

I checked operator precedence (here), and the order (from highest to lowest) is the following: &, &&, ?: , so this alone does not explain why the first line does not compile while the second does (Or at least not for me - maybe this is where I am wrong...) Edit: I understand why the second compiles, so please don't concentrate on this in your answers.

My next hunch was that the precedence (if there is such a thing) of the ? type modifier lies somewhere between those of the & and && operators. Can that be so? If not, could someone please explain the exact behavior that I am experiencing? Is the evaluation order of the ? type modifier clearly described somewhere in the standard?

Edit: I am also aware that there is a unary address-of operator (actually that's the trick I am trying to use for the solution...), which plays a role here, but my question still stays the same.

In the same unsafe context these compile happily:

var test1 = true & b;
var test2 = true && b;
// or
var test1 = a is byte & b;
var test2 = a is byte && b;

So I think it must be related to the ? modifier/operator, and not solely to the address-of operator taking precedence (otherwise the two test1 lines would not compile).

P.S.: I know that I can add parentheses to my code, so it will compile, but I want to avoid that:

var test = (a is byte?) & b;   // compiles

Update: I experimented with Roslyn a little, and I thought it might be a good idea to attach the ASTs for the various statements:

var test1 = a is byte & b;

AST for test1 line

var test2 = a is byte? & b;

AST for test2 line

var test3 = a is byte? && b;

AST for test3 line


I want to emphasise that I am not looking for a solution for the original problem in the linked article (of course I am looking for one, but I ask you not to give an answer here please, as I would like to find that out on my own.) Also, please don't comment if I am on a completely wrong track in finding the solution, I only mentioned the puzzle to give some context for my specific problem, and to provide a good-enough excuse for writing code like this.

qqbenq
  • Nice question. For now I give up - the "Grammar ambiguities" section in the language spec. doesn't cover this case. Just one thing you might find interesting is that the internet seems to be pondering about this issue [for ~8 years](http://www.progtown.com/topic15933-grammar-ambiguities-in-c-20.html) ;) – BartoszKP Jun 19 '14 at 13:06
  • This is speculation, but I suspect the issue is that the parser only looks ahead a few tokens, and when it's parsing the `?`, it can only see the `& b` or less. `a is byte ? & b` is ambiguous, and it could either mean a nullable type or a ternary operator. If it could look ahead and see the semicolon, the ambiguity would be resolved. It's probably due to limited [lookahead](http://en.wikipedia.org/wiki/Parsing#Lookahead) and not operator precedence per se. – Kendall Frey Jun 19 '14 at 13:34
  • @KendallFrey Well it's obviously not "operator precedence per se", because `?` in `byte?` is a type modifier, not an operator. I think you're correct, that the problem is much earlier, somewhere more on the lexer level. – BartoszKP Jun 19 '14 at 13:56
  • @BartoszKP The lexer isn't the problem, since in both ambiguous cases the token list is `a,is,byte,?,&,b`. It's most likely the parser (or possibly something a little later in the chain). – Kendall Frey Jun 19 '14 at 14:40
  • @KendallFrey Yes, you're right. Hence the "somewhere more" ;) – BartoszKP Jun 19 '14 at 14:52
  • @qqbenq You might leave out the `unsafe` part as it's irrelevant. Also, talking about *precedence* here is misleading - as noted already, `?` in `byte?` is not an operator, and the word *precedence* is most commonly related to operators. – BartoszKP Jun 19 '14 at 15:13
  • @BartoszKP I think the unsafe context might matter because of the address-of operator, and I understand that the word precedence is not the best choice, but what can you suggest instead? – qqbenq Jun 19 '14 at 15:18
  • @qqbenq It doesn't, because the error is exactly the same without the `unsafe` keyword. Sorry, no better term comes to my head other than "parsing issue" :) – BartoszKP Jun 19 '14 at 20:26

2 Answers

The clue here is the unsafe keyword. There are actually two different & operators: the binary bitwise-AND & operator you're used to, and the unary address-of & operator, which not only has higher precedence but is also evaluated right-to-left, like all unary operators.

This means that &b is evaluated first, and results in a pointer value. The rest of the statement is, as the compiler complains, unparseable. It's either (a is byte?) (address), or (as the compiler tries to parse it) (a is byte) ? (address) and is missing the :.

I get the same compile error when replacing & with + or -, both symbols that can be either unary or binary operators.

The reason the second statement compiles fine is that there isn't a unary, right-to-left high-precedence && operator.

Avner Shahar-Kashtan
  • I think you are on the right track, but it hasn't anything to do with the `unsafe` keyword. Removing it will give the same errors. – Patrick Hofman Jun 19 '14 at 11:30
  • Thanks, but if I change the 'a is byte?' part to simply 'true', then it will happily compile. e.g.: var test = true &b; Shouldn't the unary operator take precedence here too? – qqbenq Jun 19 '14 at 11:31
  • I thought this was the case at first too, but if I change the operator to `+`, which is also both unary and binary, it gives a different error, on the operator itself. "; expected" – Kendall Frey Jun 19 '14 at 11:34
  • @qqbenq I think this is because the `is` operator has a lower precedence than the &. – Avner Shahar-Kashtan Jun 19 '14 at 11:35
  • @KendallFrey I get the same error, "Syntax error, ':' expected", for both & and +. – Avner Shahar-Kashtan Jun 19 '14 at 11:37
  • @AvnerShahar-Kashtan Thanks for your answer (I already +1d it), however I feel it doesn't exactly address my question, as in the case 'a is byte? & b;' the exact precedence is still unclear to me: when is the '?' type modifier evaluated? Why does an operator have higher precedence than that modifier? Where is it documented? – qqbenq Jun 19 '14 at 12:29
  • @qqbenq Indeed, the ? in `byte?` is something different. It's not an operator, but a compiler shorthand for `Nullable`. I'll see if the C# language spec has anything to offer. – Avner Shahar-Kashtan Jun 19 '14 at 17:22
  • @AvnerShahar-Kashtan Yes, I know that, and I already checked the specs, my question is mainly about the evaluation order of the ? type modifier and the different operators. – qqbenq Jun 19 '14 at 18:43

To be honest I am not quite sure whether I should post this as an answer or add this information to the - already quite verbose - question, but I finally found out why it behaves that way. (But I still think it is not explicitly described in the standard, and that it is in fact a limitation of the current implementation of the compiler.)

Also, I am not going to accept my own answer for a while, hoping that someone might be able to give a better one.

I spent a little time with Roslyn, and I debugged through the lexing and parsing of the various statements from this code:

var test1 = a is byte & b;
var test2 = a is byte? & b;
var test3 = a is byte? && b;

The exact syntax trees are already added to the question, so I am not going to repeat them here.

The difference between the statements comes from this part of the compiling process (from LanguageParser.cs):

private TypeSyntax ParseTypeCore(
    bool parentIsParameter,
    bool isOrAs,
    bool expectSizes,
    bool isArrayCreation)
{
    var type = this.ParseUnderlyingType(parentIsParameter);

    if (this.CurrentToken.Kind == SyntaxKind.QuestionToken)
    {
        var resetPoint = this.GetResetPoint();
        try
        {
            var question = this.EatToken();

            // Comment added by me
            // This is where the difference occurs 
            // (as for '&' the IsAnyUnaryExpression() returns true)
            if (isOrAs && (IsTerm() || IsPredefinedType(this.CurrentToken.Kind) || SyntaxFacts.IsAnyUnaryExpression(this.CurrentToken.Kind)))
            {
                this.Reset(ref resetPoint);

                Debug.Assert(type != null);
                return type;
            }

            question = CheckFeatureAvailability(question, MessageID.IDS_FeatureNullable);
            type = syntaxFactory.NullableType(type, question);
        }
        finally
        {
            this.Release(ref resetPoint);
        }
    }

    // Check for pointer types (only if pType is NOT an array type)
    type = this.ParsePointerTypeMods(type);

    // Now check for arrays.
    if (this.IsPossibleRankAndDimensionSpecifier())
    {
        var ranks = this.pool.Allocate<ArrayRankSpecifierSyntax>();
        try
        {
            while (this.IsPossibleRankAndDimensionSpecifier())
            {
                bool unused;
                var rank = this.ParseArrayRankSpecifier(isArrayCreation, expectSizes, out unused);
                ranks.Add(rank);
                expectSizes = false;
            }

            type = syntaxFactory.ArrayType(type, ranks);
        }
        finally
        {
            this.pool.Free(ranks);
        }
    }

    Debug.Assert(type != null);
    return type;
}

The same result occurs for any token following the byte? part for which this function returns anything other than SyntaxKind.None:

public static SyntaxKind GetPrefixUnaryExpression(SyntaxKind token)
{
    switch (token)
    {
        case SyntaxKind.PlusToken:
            return SyntaxKind.UnaryPlusExpression;
        case SyntaxKind.MinusToken:
            return SyntaxKind.UnaryMinusExpression;
        case SyntaxKind.TildeToken:
            return SyntaxKind.BitwiseNotExpression;
        case SyntaxKind.ExclamationToken:
            return SyntaxKind.LogicalNotExpression;
        case SyntaxKind.PlusPlusToken:
            return SyntaxKind.PreIncrementExpression;
        case SyntaxKind.MinusMinusToken:
            return SyntaxKind.PreDecrementExpression;
        case SyntaxKind.AmpersandToken:
            return SyntaxKind.AddressOfExpression;
        case SyntaxKind.AsteriskToken:
            return SyntaxKind.PointerIndirectionExpression;
        default:
            return SyntaxKind.None;
    }
}

So the problem is this: after an is (or an as) operator, when the parser encounters a ? token, it checks whether the next token can be interpreted as a unary operator. If so, it does not consider the possibility that the ? token is a type modifier; it simply returns the type before it and parses the rest accordingly (there are more conditions to be met, but this is the relevant part for my question). The irony is that the & symbol can only be a unary operator in an unsafe context, but this is never taken into account.
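To make that decision easier to see in isolation, here is a minimal toy model of it. This is only a sketch and not Roslyn code: TokenKind, NullableHeuristic and both method names are made up for illustration, the set of unary starters simply mirrors the GetPrefixUnaryExpression switch quoted above, and the IsPredefinedType check is left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token kinds for this sketch (not Roslyn's SyntaxKind).
public enum TokenKind
{
    Identifier, Literal, Ampersand, AmpersandAmpersand, Plus, Minus,
    Tilde, Exclamation, PlusPlus, MinusMinus, Asterisk, Semicolon
}

public static class NullableHeuristic
{
    // Tokens that could begin a prefix unary expression, mirroring the
    // cases of the GetPrefixUnaryExpression switch shown in the answer.
    private static readonly HashSet<TokenKind> UnaryStarters = new HashSet<TokenKind>
    {
        TokenKind.Plus, TokenKind.Minus, TokenKind.Tilde, TokenKind.Exclamation,
        TokenKind.PlusPlus, TokenKind.MinusMinus, TokenKind.Ampersand, TokenKind.Asterisk
    };

    // Simplified stand-in for "IsTerm() || IsAnyUnaryExpression(...)".
    public static bool CanStartTermOrUnary(TokenKind next)
    {
        return next == TokenKind.Identifier
            || next == TokenKind.Literal
            || UnaryStarters.Contains(next);
    }

    // Returns true when the '?' following "x is T" stays part of the type (T?),
    // and false when the parser backs off and leaves '?' for a conditional expression.
    public static bool TreatQuestionAsNullable(bool isOrAs, TokenKind next)
    {
        return !(isOrAs && CanStartTermOrUnary(next));
    }
}
```

In this model, TreatQuestionAsNullable(true, TokenKind.Ampersand) is false (so a is byte? & b ends up being parsed as the start of a conditional expression), while TreatQuestionAsNullable(true, TokenKind.AmpersandAmpersand) is true, which matches the behaviour of the test2 and test3 lines.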

As others have pointed out in the comments, this issue could perhaps be solved by looking ahead a little further: in this particular case the parser could check whether there is a matching : for the ? token and, if not, ignore the possibility of the unary & operator and treat the ? as a type modifier. If I have the time, I will try to implement a workaround and see where it causes even greater problems :) (Luckily there are a lot of tests in the Roslyn solution...)
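As a rough illustration of that lookahead idea, a scan for a matching : could look something like the sketch below. It is a deliberately naive toy working on plain string tokens, not a patch for Roslyn: ColonLookahead and HasMatchingColon are hypothetical names, and it ignores many real-world complications (for example a nested ? that is itself a nullable-type marker, or a : that belongs to a named argument or a label).

```csharp
using System;
using System.Collections.Generic;

public static class ColonLookahead
{
    // tokensAfterQuestion: the tokens that follow the ambiguous '?',
    // up to and including the ';' that ends the statement.
    // Returns true if a ':' that could pair with that '?' is found.
    public static bool HasMatchingColon(IEnumerable<string> tokensAfterQuestion)
    {
        int pendingQuestions = 0; // nested '?' tokens that need their own ':'
        int parenDepth = 0;       // anything inside parentheses is skipped

        foreach (string token in tokensAfterQuestion)
        {
            switch (token)
            {
                case "(":
                    parenDepth++;
                    break;
                case ")":
                    if (parenDepth == 0) return false; // left the enclosing expression
                    parenDepth--;
                    break;
                case "?":
                    if (parenDepth == 0) pendingQuestions++;
                    break;
                case ":":
                    if (parenDepth > 0) break;              // belongs to a parenthesised part
                    if (pendingQuestions == 0) return true; // pairs with the ambiguous '?'
                    pendingQuestions--;                     // pairs with a nested '?'
                    break;
                case ";":
                    return false; // statement ended without a matching ':'
            }
        }
        return false;
    }
}
```

For the tokens after the ? in a is byte? & b; (that is "&", "b", ";") this returns false, so the ? could be treated as a type modifier, whereas for a is byte ? & b : & b; it returns true and the conditional interpretation would be kept.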

Thanks to everyone for their feedback.

qqbenq
  • Great job! I wish everyone had so much determination for analysing, getting to the bottom of an issue and solving it :) – BartoszKP Jul 09 '14 at 20:48