I'm trying to build a predictive model for Diabetic Retinopathy Detection. The competition's training dataset consists of high-resolution images that are unevenly distributed across 5 classes: Normal: 25807 images (73.48%); Mild: 2442 images (6.96%); Moderate: 5291 images (15.07%); Severe: 873 images (2.48%); Proliferative: 708 images (2.01%). I use the Keras framework with the Theano backend (for CUDA computations).
For image augmentation I used ImageDataGenerator (code below). I resized the images to 299x299 and split them into 5 folders according to their classes:
train_datagen=ImageDataGenerator(rescale=1./255, rotation_range=40, zoom_range=0.2, horizontal_flip=True, fill_mode="constant", zca_whitening=True)
train_generator=train_datagen.flow_from_directory('data/~huge_data/preprocessed_imgs/', target_size=(299, 299), batch_size=32, class_mode='categorical')
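A note on `zca_whitening=True`: Keras computes the whitening matrix in `ImageDataGenerator.fit()`, which takes an in-memory array of sample images, and `flow_from_directory` does not call it. The sketch below reproduces the idea in plain NumPy on tiny toy data (the helper `zca_matrix`, the `eps` value and the mixing matrix are my own assumptions, not Keras internals). It also computes the size of that matrix for flattened 299x299x3 images, roughly 288 GB as float32, which suggests ZCA whitening is impractical at this resolution.

```python
import numpy as np

def zca_matrix(x, eps=1e-5):
    """ZCA whitening matrix for data x of shape (n_samples, n_features).

    Hypothetical helper: a plain-NumPy sketch of the technique, not Keras's own code.
    """
    x = x - x.mean(axis=0)               # center every feature
    cov = x.T @ x / x.shape[0]           # (n_features, n_features) covariance matrix
    u, s, _ = np.linalg.svd(cov)         # cov = U diag(s) U^T, since cov is symmetric PSD
    return u @ np.diag(1.0 / np.sqrt(s + eps)) @ u.T

# Toy data: 1000 samples of 4 correlated features (mixing matrix chosen arbitrarily).
rng = np.random.default_rng(0)
mix = np.array([[2.0, 0.5, 0.0, 0.0],
                [0.0, 1.0, 0.3, 0.0],
                [0.0, 0.0, 1.5, 0.2],
                [0.1, 0.0, 0.0, 1.0]])
x = rng.normal(size=(1000, 4)) @ mix
w = zca_matrix(x)
xw = (x - x.mean(axis=0)) @ w            # whitened data
print(np.round(np.cov(xw, rowvar=False, bias=True), 3))  # approximately the identity

# The same matrix for flattened 299x299x3 images would be enormous:
n_features = 299 * 299 * 3
print(n_features, "features ->", n_features ** 2 * 4 / 1e9, "GB as float32")
```

With only a few dozen features this is cheap; the cost grows with the square of the number of pixels, which is what makes it a problem here.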
At first, just for testing, I decided to use a simple convolutional model:
model=Sequential()
model.add(Convolution2D(32,3,3, input_shape=(3, 299, 299), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(32, 3, 3, activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(5, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
When fitting the model with the generator, I set class_weight to compensate for the class imbalance: class_weight = {0: 25807., 1: 2442., 2: 5291., 3: 873., 4: 708.}
model.fit_generator(train_generator,
                    samples_per_epoch=2000,
                    nb_epoch=50,
                    verbose=2,
                    callbacks=callbacks_list,
                    class_weight={0: 25807., 1: 2442., 2: 5291., 3: 873., 4: 708.})
Problems:
- The model reports a very high loss together with a high accuracy. Why?
Epoch 1/50 110s - loss: 5147.2669 - acc: 0.7366
Epoch 2/50 105s - loss: 5052.3844 - acc: 0.7302
Epoch 3/50 105s - loss: 5042.0261 - acc: 0.7421
Epoch 4/50 105s - loss: 4986.3544 - acc: 0.7361
Epoch 5/50 105s - loss: 4999.4177 - acc: 0.7361
- The model predicts class '0' for every image:
datagen_2=ImageDataGenerator(rescale=1./255)
val_generator=datagen_2.flow_from_directory('data/color_validation_images/',
                                            target_size=(299,299),
                                            batch_size=100,
                                            class_mode='categorical')
y_predict=model.predict_generator(val_generator,
                                  val_samples=82)
[np.argmax(i) for i in y_predict]
The output is:
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0
Without argmax (partial output):
array([ 9.47651565e-01, 7.30426749e-03, 4.40788604e-02,
6.25302084e-04, 3.39932943e-04], dtype=float32),
array([ 9.51994598e-01, 6.50278665e-03, 4.07058187e-02,
5.17037639e-04, 2.79774162e-04], dtype=float32),
array([ 9.49448049e-01, 6.50656316e-03, 4.32702228e-02,
5.20388770e-04, 2.54814397e-04], dtype=float32),
array([ 9.47873473e-01, 7.13181263e-03, 4.40776311e-02,
6.00705389e-04, 3.16353660e-04], dtype=float32),
array([ 9.53514516e-01, 6.13699574e-03, 3.96034382e-02,
4.82603034e-04, 2.62484333e-04], dtype=float32),
....
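A quick numeric check of both symptoms, as a plain NumPy sketch. It assumes the weighted loss is the per-sample cross-entropy multiplied by `class_weight[true class]` and averaged, and it uses only the five probability rows printed above (the remaining rows are elided). It suggests that always predicting "Normal" already scores about 73.5% accuracy, that every printed row is assigned to class 0, and that weights equal to the class sizes push the loss into the thousands.

```python
import numpy as np

# Class sizes from the question (Normal, Mild, Moderate, Severe, Proliferative).
counts = np.array([25807, 2442, 5291, 873, 708], dtype=float)
shares = counts / counts.sum()

# 1) Always answering "Normal" is right for ~73.5% of the images,
#    about the accuracy reported in every epoch.
majority_baseline_acc = shares.max()

# 2) The first rows of y_predict printed above.
y_predict = np.array([
    [9.47651565e-01, 7.30426749e-03, 4.40788604e-02, 6.25302084e-04, 3.39932943e-04],
    [9.51994598e-01, 6.50278665e-03, 4.07058187e-02, 5.17037639e-04, 2.79774162e-04],
    [9.49448049e-01, 6.50656316e-03, 4.32702228e-02, 5.20388770e-04, 2.54814397e-04],
    [9.47873473e-01, 7.13181263e-03, 4.40776311e-02, 6.00705389e-04, 3.16353660e-04],
    [9.53514516e-01, 6.13699574e-03, 3.96034382e-02, 4.82603034e-04, 2.62484333e-04],
])
predicted = np.argmax(y_predict, axis=1)         # predicted class per image
per_class = np.bincount(predicted, minlength=5)  # number of images assigned to each class

# 3) Assumption: loss = mean over samples of class_weight[true class] * cross-entropy.
#    With weights equal to the class sizes, and row 0 predicted for every image,
#    the expected loss over the data set is:
class_weight = counts
expected_loss = np.sum(shares * class_weight * -np.log(y_predict[0]))

print(round(float(majority_baseline_acc), 4))  # 0.7348
print(per_class)                               # [5 0 0 0 0]
print(round(float(expected_loss)))             # a few thousand, like the logged loss
```

If you want to compare such predictions with the true labels, create `val_generator` with `shuffle=False`; `flow_from_directory` shuffles by default, so the order of `y_predict` would otherwise not match the file order.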
I also tried class_weight='auto'. In this case, the model showed 'predictable' output:
Epoch 1/50 107s - loss: 0.9036 - acc: 0.7381
Epoch 2/50 104s - loss: 0.9333 - acc: 0.7321
Epoch 3/50 105s - loss: 0.8865 - acc: 0.7351
Epoch 4/50 106s - loss: 0.8978 - acc: 0.7351
Epoch 5/50 105s - loss: 0.9158 - acc: 0.7302
But it still doesn't work; images from other classes are still predicted as class 0:
severe_DR=plt.imread('data/~huge_data/preprocessed_imgs/3_Severe/99_left.jpeg')
mild_DR=plt.imread('data/~huge_data/preprocessed_imgs/1_Mild/15_left.jpeg')
moderate_DR=plt.imread('data/~huge_data/preprocessed_imgs/2_Moderate/78_right.jpeg')
model.predict(mild_DR.reshape((1,)+x[1].shape))
array([[ 1., 0., 0., 0., 0.]], dtype=float32)
model.predict(severe_DR.reshape((1,)+x[1].shape))
array([[ 1., 0., 0., 0., 0.]], dtype=float32)
model.predict(moderate_DR.reshape((1,)+x[1].shape))
array([[ 1., 0., 0., 0., 0.]], dtype=float32)
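Two details of this snippet may be worth checking. For a JPEG, `plt.imread` returns an (H, W, 3) array of 0..255 values, while the training generator rescaled by 1/255 and the model was built with Theano's channels-first `input_shape=(3, 299, 299)`. Reshaping (H, W, 3) to (1, 3, H, W) keeps the memory order and so scrambles the pixels instead of moving the channel axis. A minimal NumPy sketch, assuming the file on disk is already 299x299 (the helper name `to_theano_batch` is hypothetical):

```python
import numpy as np

def to_theano_batch(img):
    """Turn one (H, W, 3) image with 0..255 values into a (1, 3, H, W) float batch.

    Hypothetical helper. Assumes the model expects inputs rescaled by 1/255
    and channels-first ordering, as with input_shape=(3, 299, 299) on Theano.
    """
    x = img.astype("float32") / 255.0   # same scaling as ImageDataGenerator(rescale=1./255)
    x = np.transpose(x, (2, 0, 1))      # move channels first; a plain reshape would scramble pixels
    return x[np.newaxis, ...]           # add the batch axis

# Stand-in for plt.imread(...) on a 299x299 RGB JPEG.
img = np.random.default_rng(0).integers(0, 256, size=(299, 299, 3), dtype=np.uint8)
batch = to_theano_batch(img)
print(batch.shape)  # (1, 3, 299, 299)

# For contrast: reshape keeps the memory order, so "channel 0" is not the red plane.
scrambled = img.reshape((1, 3, 299, 299))
print(np.array_equal(scrambled[0, 0], img[:, :, 0]))  # False
```

With the model above this would be called as `model.predict(to_theano_batch(mild_DR))`.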
What have I done wrong?
After Sergii Gryshkevych's answer, I fixed my model: I changed class_weight to {0:1, 1:10.57, 2:4.88, 3:29, 4:35} (roughly the number of images in the largest class, class 0, divided by the number of images in each class). Next, I changed the metric to categorical_accuracy, and I increased the number of layers in the model (like here). The output after 5 epochs is:
Epoch 1/5
500/500 [==============================] - 52s - loss: 5.6944 - categorical_accuracy: 0.1840
Epoch 2/5
500/500 [==============================] - 52s - loss: 6.7357 - categorical_accuracy: 0.2040
Epoch 3/5
500/500 [==============================] - 52s - loss: 6.7373 - categorical_accuracy: 0.0800
Epoch 4/5
500/500 [==============================] - 52s - loss: 6.0311 - categorical_accuracy: 0.0180
Epoch 5/5
500/500 [==============================] - 51s - loss: 4.9924 - categorical_accuracy: 0.0560
Is this correct?
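For reference, the weighting rule described in the update (size of the largest class divided by the size of each class) can be computed directly instead of by hand. This plain-Python sketch gives about 29.56 and 36.45 for the last two classes, slightly above the 29 and 35 used in the run above:

```python
# Image counts per class, from the question.
counts = {0: 25807, 1: 2442, 2: 5291, 3: 873, 4: 708}

# Inverse-frequency weights scaled so the largest class gets weight 1:
# weight[c] = max_count / count[c]
max_count = max(counts.values())
class_weight = {c: max_count / n for c, n in counts.items()}

print({c: round(w, 2) for c, w in class_weight.items()})
# {0: 1.0, 1: 10.57, 2: 4.88, 3: 29.56, 4: 36.45}
```

The resulting dict can be passed as `class_weight=class_weight` to `fit_generator`, in the same way as in the question.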
Is there any way to use quadratic weighted kappa as a metric in Keras?
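Quadratic weighted kappa depends on the full confusion matrix, so a per-batch Keras metric (Keras averages metrics over batches) would only be an approximation. One option is to compute it on the complete set of predictions, for example after `predict_generator`. A NumPy sketch of the standard definition (the function name is my own; scikit-learn's `cohen_kappa_score` with `weights='quadratic'` and all five labels passed via `labels` should give the same value):

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    """Cohen's kappa with quadratic weights for integer labels 0..n_classes-1."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)

    # Observed matrix O[i, j]: number of images with true class i and predicted class j.
    observed = np.zeros((n_classes, n_classes))
    np.add.at(observed, (y_true, y_pred), 1)

    # Expected matrix E: outer product of the two label histograms,
    # scaled to the same total as O.
    hist_true = np.bincount(y_true, minlength=n_classes)
    hist_pred = np.bincount(y_pred, minlength=n_classes)
    expected = np.outer(hist_true, hist_pred) / len(y_true)

    # Quadratic disagreement weights w[i, j] = (i - j)^2 / (n_classes - 1)^2.
    i, j = np.indices((n_classes, n_classes))
    weights = (i - j) ** 2 / (n_classes - 1) ** 2

    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4]))  # 1.0
print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 0, 0, 0, 0]))  # 0.0
```

A model that predicts class 0 for every image gets a kappa of 0 regardless of its accuracy, which is why kappa is more informative than accuracy on data this imbalanced. The function is undefined (division by zero) when both the labels and the predictions contain a single class.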