I'm facing an "impossible behavior" situation and need to debug it somehow.
I have two threads, A and B. Thread B's class looks like this:
    public class ThreadBClass
    {
        private static bool shouldStop = false;
        private static object shouldStopLock = new object();
        public static void SetShouldStop()
        {
            lock( shouldStopLock ) {
                shouldStop = true;
            }
        }
        public void Run()
        {
            while( true ) {
                lock( shouldStopLock ) {
                    if( shouldStop ) {
                        break;
                    }
                }
                doStuff(); // this queries the SQL Azure database
            }
        }
    }
Thread A has this:
    var bClass = new ThreadBClass();
    var controlledThread = new Thread( bClass.Run );
Thread B runs Run(), an infinite loop that periodically queries the SQL Azure database.
Now these two threads run inside a Windows Azure web role, and when the role is being stopped the following happens. The role's OnStop() runs in thread A and does this:
    ThreadBClass.SetShouldStop();
    const int count = 20;
    // Wait up to 20 seconds for thread B to exit.
    for( int i = 0; i < count; i++ ) {
        if( !controlledThread.IsAlive ) {
            break;
        }
        Thread.Sleep( 1000 );
    }
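A minimal, self-contained sketch of this stop-and-wait handshake, outside Azure and without a database, could look like the following. The names `Worker` and `StopHandshakeDemo` are hypothetical, the SQL Azure query is replaced by a short `Thread.Sleep` placeholder, and the poll interval is shortened; all of these are assumptions made only so the pattern can be run in isolation.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for ThreadBClass: same flag-and-lock pattern,
// but the SQL Azure query is replaced by a short sleep (an assumption).
public class Worker
{
    private static bool shouldStop = false;
    private static readonly object shouldStopLock = new object();

    public static void SetShouldStop()
    {
        lock (shouldStopLock) { shouldStop = true; }
    }

    public void Run()
    {
        while (true)
        {
            lock (shouldStopLock)
            {
                if (shouldStop) break; // leaves the while loop, releasing the lock
            }
            Thread.Sleep(50); // placeholder for doStuff()
        }
    }
}

public static class StopHandshakeDemo
{
    // Sets the flag, then polls IsAlive up to maxPolls times.
    // Returns true if the worker thread exited within that window.
    public static bool StopAndWait(Thread controlledThread, int maxPolls, int pollMs)
    {
        Worker.SetShouldStop();
        for (int i = 0; i < maxPolls; i++)
        {
            if (!controlledThread.IsAlive) return true;
            Thread.Sleep(pollMs);
        }
        return !controlledThread.IsAlive;
    }

    public static void Main()
    {
        var controlledThread = new Thread(new Worker().Run);
        controlledThread.Start();
        Thread.Sleep(200); // let the loop spin a few times
        bool exited = StopAndWait(controlledThread, 20, 100);
        Console.WriteLine("Worker exited: " + exited);
    }
}
```

Running something like this in a plain console application would at least show whether the flag, the lock, and the `IsAlive` polling interact badly on their own, independent of the role shutdown and the database connection.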
If thread B happens not to be querying the database when the flag is set, everything exits cleanly. But roughly 40% of the time the SQL database query fails with this exception:
    System.Data.SqlClient.SqlException
    Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
       at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
       at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning()
       at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
       at System.Data.SqlClient.TdsParserStateObject.ReadSni(DbAsyncResult asyncResult, TdsParserStateObject stateObj)
       at System.Data.SqlClient.TdsParserStateObject.ReadNetworkPacket()
       at System.Data.SqlClient.TdsParserStateObject.ReadByte()
       at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
       at System.Data.SqlClient.TdsParser.TdsExecuteTransactionManagerRequest(Byte[] buffer, TransactionManagerRequestType request, String transactionName, TransactionManagerIsolationLevel isoLevel, Int32 timeout, SqlInternalTransaction transaction, TdsParserStateObject stateObj, Boolean isDelegateControlRequest)
       at System.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransactionYukon(TransactionRequest transactionRequest, String transactionName, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
       at System.Data.SqlClient.SqlInternalConnection.BeginSqlTransaction(IsolationLevel iso, String transactionName)
       at System.Data.SqlClient.SqlConnection.BeginTransaction(IsolationLevel iso, String transactionName)
       // my code in Run() here
I've tried outputting the ThreadState of each thread. Thread B's state is Running, so it was definitely not aborted.
So the whole situation looks insane to me: one thread acquires a lock for a moment and then polls the other thread's IsAlive, and exactly at that time the SQL query times out in the other thread. And this is reproducible once in a while.
How do I debug this? Are there any events, global variables, or anything else that the .NET runtime exposes that I could use to find out what's actually going on?