CNN-Transfer Learning For Dog Breed Classification


This project was done to fulfill the requirements of the Udacity Data Scientist Nanodegree. In this post, I discuss how I performed dog breed classification using transfer learning with CNNs (Convolutional Neural Networks). A CNN is a type of deep neural network commonly used for image classification. A typical CNN architecture consists of convolutional layers, activation functions, pooling layers, normalization layers, and fully connected layers. Transfer learning is a technique in which a model developed for one task is reused as the starting point for another task. It can greatly reduce training time and, when labeled data are limited, often reaches better accuracy than training from scratch. The resulting model could be used to build a web or mobile app that detects dog breeds from images.

Project Overview

The main objective of this project is to classify images of dogs into one of 133 breeds. The project first implements a CNN from scratch and then uses transfer learning to train a second CNN. The best model is then used to predict the dog breed from an input image.

Problem Statement

In this project, we are given real-world images of dogs along with labels describing the breed of the dog in each image. The goal is to train a model on these labeled images that can then determine the breed of the dog in new, unlabeled images.


The dataset used in this project contains 8,351 images of dogs from 133 breeds. The data is split into three folders for the training, validation, and test sets. The load_files function from the scikit-learn library is used to import the datasets.

```python
import os
from glob import glob

import numpy as np
from sklearn.datasets import load_files
from keras.utils import to_categorical  # older Keras versions: keras.utils.np_utils

# define function to load train, test, and validation datasets
def load_dataset(path):
    data = load_files(path)
    dog_files = np.array(data['filenames'])
    # one-hot encode the integer breed labels into 133 columns
    dog_targets = to_categorical(np.array(data['target']), 133)
    return dog_files, dog_targets

# load train, test, and validation datasets
train_files, train_targets = load_dataset('../../../data/dog_images/train')
valid_files, valid_targets = load_dataset('../../../data/dog_images/valid')
test_files, test_targets = load_dataset('../../../data/dog_images/test')

# load list of dog names, assuming folder names like "001.Affenpinscher":
# take the last path component and drop the numeric prefix
# (a fixed slice such as item[20:-1] only works for one particular path length)
dog_names = [os.path.basename(os.path.normpath(item)).split('.', 1)[1]
             for item in sorted(glob("../../../data/dog_images/train/*/"))]

# print statistics about the dataset
print('There are %d total dog categories.' % len(dog_names))
print('There are %d total dog images.\n' % len(np.hstack([train_files, valid_files, test_files])))
print('There are %d training dog images.' % len(train_files))
print('There are %d validation dog images.' % len(valid_files))
print('There are %d test dog images.' % len(test_files))
```
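As a quick illustration of what `to_categorical` produces, here is a minimal NumPy sketch of one-hot encoding and of recovering the class index with `np.argmax`. The helpers `one_hot` and `from_one_hot` are hypothetical names used only for this example.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float array with a single 1.0 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

def from_one_hot(encoded):
    """Recover the integer class index of each row."""
    return np.argmax(encoded, axis=1)

targets = one_hot([0, 2, 132], num_classes=133)
print(targets.shape)                  # (3, 133)
print(from_one_hot(targets).tolist())  # [0, 2, 132]
```

Each row of `train_targets` has this form: 133 columns, with a 1 in the column of the image's breed.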

Evaluation Metrics

For this project, we focus only on accuracy. The goal is simple: we want to see how well we can classify dog breeds, and accuracy (the fraction of test images whose breed is predicted correctly) tells us in a simple, easy-to-understand way how well the model is doing.
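Accuracy is the number of correct predictions divided by the total number of predictions. With one-hot targets like the ones returned by `load_dataset`, it can be computed by comparing `argmax` indices, as in this minimal NumPy sketch (the `accuracy` helper is a hypothetical name, not taken from the original notebook).

```python
import numpy as np

def accuracy(probabilities, one_hot_targets):
    """Fraction of rows whose highest-probability class matches the true class."""
    predicted = np.argmax(probabilities, axis=1)
    actual = np.argmax(one_hot_targets, axis=1)
    return float(np.mean(predicted == actual))

# three toy predictions over four classes; the first two are correct
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.2, 0.5, 0.2, 0.1],
                  [0.1, 0.1, 0.2, 0.6]])
truth = np.eye(4)[[0, 1, 2]]
print(accuracy(probs, truth))  # 0.6666666666666666
```

Because it averages over all images, accuracy weights the more common breeds more heavily, which is worth keeping in mind for an imbalanced dataset.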

Detect Humans

Since we also want to identify the dog breed that a person most resembles, we need a function that detects whether a human face is present in an image. This project uses OpenCV's pre-trained Haar cascade face detector. The image is converted to grayscale before being passed to the face detector.

```python
import cv2
import numpy as np
from glob import glob
import matplotlib.pyplot as plt
%matplotlib inline

# human_files holds the paths to the human images; the post does not show where it
# is loaded, so this glob pattern is a hypothetical placeholder
human_files = np.array(glob('../../../data/lfw/*/*'))

# extract pre-trained face detector
face_cascade = cv2.CascadeClassifier('haarcascades/haarcascade_frontalface_alt.xml')
# load color (BGR) image
img = cv2.imread(human_files[3])
# convert BGR image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# find faces in image
faces = face_cascade.detectMultiScale(gray)
# print number of faces detected in the image
print('Number of faces detected:', len(faces))
# get bounding box for each detected face
for (x, y, w, h) in faces:
    # add bounding box to color image
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
# convert BGR image to RGB for plotting
cv_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# display the image, along with bounding box
plt.imshow(cv_rgb)
plt.show()
```
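Haar features are computed from pixel intensities, which is why the image is converted to grayscale before detection. OpenCV documents its RGB-to-gray conversion as Y = 0.299 R + 0.587 G + 0.114 B. The sketch below reproduces that weighting in NumPy for an image in OpenCV's BGR channel order. It is for illustration only: the project itself uses `cv2.cvtColor`, whose integer rounding may differ slightly from this float version, and `bgr_to_gray` is a hypothetical helper.

```python
import numpy as np

# OpenCV's documented weights for the gray channel, listed in BGR order
BGR_WEIGHTS = np.array([0.114, 0.587, 0.299])  # blue, green, red

def bgr_to_gray(img_bgr):
    """Convert an (H, W, 3) uint8 BGR image to an (H, W) uint8 grayscale image."""
    gray = img_bgr.astype(np.float64) @ BGR_WEIGHTS
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

# a 1x3 image: pure blue, pure green, pure red (BGR order)
pixels = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)
print(bgr_to_gray(pixels).tolist())  # [[29, 150, 76]]
```

Green contributes the most to perceived brightness, so a pure green pixel ends up much lighter in gray than a pure blue one.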

Dog Detector

Similarly, before classifying a dog's breed, we need a dog detector function that determines whether a dog is present in the input image. This project uses a ResNet-50 model pre-trained on ImageNet to detect dogs in images.

```python
from keras.applications.resnet50 import ResNet50

# define ResNet50 model with weights pre-trained on ImageNet
ResNet50_model = ResNet50(weights='imagenet')
```

Keras CNNs expect their input as a 4D tensor of shape (number of images, rows, columns, channels), so each image must first be loaded, resized to 224 × 224 pixels, and converted to an array.

```python
import numpy as np
from keras.preprocessing import image
from tqdm import tqdm

def path_to_tensor(img_path):
    # loads RGB image as PIL.Image.Image type, resized to 224x224
    img = image.load_img(img_path, target_size=(224, 224))
    # convert PIL.Image.Image type to 3D tensor with shape (224, 224, 3)
    x = image.img_to_array(img)
    # convert 3D tensor to 4D tensor with shape (1, 224, 224, 3) and return 4D tensor
    return np.expand_dims(x, axis=0)

def paths_to_tensor(img_paths):
    # stack the per-image tensors into one array of shape (N, 224, 224, 3)
    list_of_tensors = [path_to_tensor(img_path) for img_path in tqdm(img_paths)]
    return np.vstack(list_of_tensors)
```
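To make the shapes concrete without loading any files, the sketch below uses random arrays as stand-ins for decoded images and applies the same `np.expand_dims` and `np.vstack` steps. The dummy data is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# three fake decoded images, each (rows, columns, channels) = (224, 224, 3)
images = [rng.random((224, 224, 3), dtype=np.float32) for _ in range(3)]

# path_to_tensor adds a leading batch axis: (224, 224, 3) -> (1, 224, 224, 3)
tensors = [np.expand_dims(img, axis=0) for img in images]
print(tensors[0].shape)  # (1, 224, 224, 3)

# paths_to_tensor stacks along that axis: -> (3, 224, 224, 3)
batch = np.vstack(tensors)
print(batch.shape)       # (3, 224, 224, 3)
```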

The ResNet50_predict_labels function takes an image path as input and returns the index of the ImageNet category that the pre-trained ResNet-50 model considers most likely. In the ImageNet class dictionary, the categories with indices 151 through 268 (inclusive) are all dog breeds, so the dog_detector function can simply check whether the predicted index falls in that range.

```python
import numpy as np
from keras.applications.resnet50 import preprocess_input

def ResNet50_predict_labels(img_path):
    # returns the index of the most likely ImageNet class for the image at img_path
    img = preprocess_input(path_to_tensor(img_path))
    return np.argmax(ResNet50_model.predict(img))

# returns True if a dog is detected in the image stored at img_path
def dog_detector(img_path):
    prediction = ResNet50_predict_labels(img_path)
    # ImageNet class indices 151 to 268 (inclusive) are dog breeds
    return 151 <= prediction <= 268
```
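The post does not explain what `preprocess_input` does. For ResNet-50, Keras uses its "caffe" preprocessing mode: it flips the channels from RGB to BGR and subtracts the ImageNet mean pixel of each channel, without rescaling. The NumPy sketch below imitates that step and the label-range check so both can be tried without downloading the model. The helper names are hypothetical, and the real Keras function should be used in practice.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by Keras' "caffe" mode
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68])

def caffe_style_preprocess(batch_rgb):
    """Imitate ResNet-50 preprocessing: RGB -> BGR, then subtract channel means."""
    batch_bgr = batch_rgb[..., ::-1].astype(np.float64)
    return batch_bgr - IMAGENET_MEAN_BGR

def is_dog_class(probabilities):
    """True when the top ImageNet class index lies in the dog range 151..268."""
    top = int(np.argmax(probabilities))
    return 151 <= top <= 268

# one pure red RGB pixel in a (1, 1, 1, 3) batch
red = np.array([[[[255.0, 0.0, 0.0]]]])
print(caffe_style_preprocess(red).round(3).ravel().tolist())
# [-103.939, -116.779, 131.32]

probs = np.zeros(1000)
probs[200] = 1.0
print(is_dog_class(probs))  # True
```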

Implementation: CNN Built from Scratch

Model Architecture
The scratch model is a CNN with three convolutional layers followed by max-pooling, a global average pooling layer, and fully connected layers that act as the breed classifier.

Below is my model architecture:

Model Architecture

The full dataset has only 8,351 dog images, which is not enough to train a deep learning model from scratch, and the images are imbalanced across breeds. As a result, the scratch model's performance is very low.
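The post does not list the layer sizes, so the arithmetic below uses a hypothetical configuration: 224 × 224 × 3 input, three 3 × 3 convolutions with 16, 32 and 64 filters and "same" padding, 2 × 2 max-pooling after each, and a single dense output layer. It illustrates why the global average pooling layer keeps the classifier small: each feature map is reduced to one number before the dense layer, instead of flattening every spatial position.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter, as counted by Keras for Conv2D."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    """Weights plus one bias per unit, as counted by Keras for Dense."""
    return (inputs + 1) * units

# hypothetical scratch model: three 3x3 conv layers, each followed by 2x2 max-pooling
size, channels, conv_total = 224, 3, 0
for filters in (16, 32, 64):
    conv_total += conv2d_params(3, channels, filters)  # "same" padding keeps the size
    channels = filters
    size //= 2                                          # 2x2 pooling halves it

gap_head = dense_params(channels, 133)                  # GAP: one value per channel
flatten_head = dense_params(size * size * channels, 133)

print(size, channels)  # 28 64
print(conv_total)      # 23584
print(gap_head)        # 8645
print(flatten_head)    # 6673541
```

Under these assumptions, a flattened head would need roughly 770 times more weights than the GAP head, which is a lot to learn from about 6,700 training images.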

Model performance

Refinement: CNN Built Using Transfer Learning

Since the CNN built from scratch did not perform well, we will leverage one of the pre-trained architectures that Udacity conveniently provides. Given the choice of VGG-19, ResNet-50, InceptionV3, and Xception, I selected the Xception architecture. On top of it, I added a Global Average Pooling (GAP) layer, a dropout layer, and a fully connected dense layer as the output layer.

Xception Model Architecture

The bottleneck features for the Xception network were pre-computed by Udacity and then imported for later use by the transfer learning model.
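Because the Xception base is frozen, its outputs (the bottleneck features) only need to be computed once, and only the small head on top is trained. The NumPy sketch below runs the forward pass of a head like the one described (GAP, dropout, dense softmax) on random stand-in features. The (7, 7, 2048) feature shape, the 0.5 dropout rate and the random weights are assumptions for illustration; in the project, the head is a Keras model trained on the pre-computed features.

```python
import numpy as np

rng = np.random.default_rng(42)

def global_average_pool(features):
    """(N, H, W, C) -> (N, C): average each feature map over its spatial grid."""
    return features.mean(axis=(1, 2))

def dropout(x, rate, training):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training:
        return x  # dropout is only active during training
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# stand-in bottleneck features for 4 images (assumed shape 7x7x2048 per image)
features = rng.random((4, 7, 7, 2048))
weights = rng.normal(scale=0.01, size=(2048, 133))
bias = np.zeros(133)

pooled = global_average_pool(features)              # (4, 2048)
hidden = dropout(pooled, rate=0.5, training=False)  # inference: no-op
probs = softmax(hidden @ weights + bias)            # (4, 133), each row sums to 1
print(probs.shape)
```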

After training the model on the same training set as before and evaluating it on the same test set, we get significantly improved results! Whereas my scratch-made model was only roughly 4% accurate, the Xception-based model reached an accuracy of roughly 83%.

Analyzing the Results

Now that we have built a model using transfer learning, let's test it on some dog images as well as other random images to see how it performs. I only show a few examples here; more can be found in the Jupyter Notebook in my GitHub repository.

Evaluation & Validation

As mentioned above, the scratch-made dog breed classifier was only roughly 4% accurate, whereas the Xception-based model reached an accuracy closer to 83%.

Xception model accuracy

I ran a number of test images, both human and dog, through the model to see how it performed, and it came up with some interesting results. For the dog images I tried, it correctly recognized a dog every time and also predicted the breeds correctly. (You can see these results in the Jupyter Notebook.)

Predicting the Labrador Retriever breed
Predicting the Brittany breed

When an image contains neither a human nor a dog, the app reports that no human or dog was detected. For example, given a picture of a cat, the model does not try to predict a breed, which is the expected behavior.

Neither dog nor human
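The overall decision logic of the app, as described in this post, can be sketched as follows: report the breed if a dog is detected, report the most resembling breed if a human face is detected, and otherwise say that neither was found. The detectors and the breed predictor are passed in as functions (for example the `dog_detector` function above, a face detector built on the Haar cascade, and a predictor based on the Xception model). The name `describe_image`, the check order and the exact messages are hypothetical.

```python
from typing import Callable

def describe_image(img_path: str,
                   dog_detector: Callable[[str], bool],
                   face_detector: Callable[[str], bool],
                   predict_breed: Callable[[str], str]) -> str:
    """Return a message for the image, following the dog / human / neither logic."""
    if dog_detector(img_path):
        return f"Dog detected. Predicted breed: {predict_breed(img_path)}"
    if face_detector(img_path):
        return f"Human detected. Most resembling dog breed: {predict_breed(img_path)}"
    return "Neither a dog nor a human was detected in this image."

# toy stand-ins so the logic can be exercised without any model
def fake_dog(path: str) -> bool:
    return path.startswith("dog")

def fake_face(path: str) -> bool:
    return path.startswith("person")

def fake_breed(path: str) -> str:
    return "Labrador_retriever"

print(describe_image("dog_01.jpg", fake_dog, fake_face, fake_breed))
# Dog detected. Predicted breed: Labrador_retriever
print(describe_image("cat_01.jpg", fake_dog, fake_face, fake_breed))
# Neither a dog nor a human was detected in this image.
```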


The initial model was a CNN built from scratch, which did not work well. It reached an accuracy of only about 3%. That is better than random guessing, which would be right about 1 time in 133 (roughly 0.75%), but still far too low to be useful. I suspect this is because the dataset is relatively small and imbalanced, and the model architecture might not be well designed.


This project took one relatively simple approach to creating a CNN, but there are a number of ways in which we could have improved the effort. Here are three things we could have changed along the way:

  1. More training data: Relatively speaking, we trained our model on a very small number of images. Most of these highly complex architectures are trained on vastly more images than our training set, so additional training data would likely improve accuracy further.
  2. Data augmentation: generating new training examples by applying random transformations, such as flips, shifts, rotations, and zooms, to the existing images. Since our training set is small, augmentation could effectively enlarge it and help the model generalize.
  3. Use of a different optimizer: other optimizers, such as SGD with momentum, Adam, or RMSprop, could be tried, along with tuning the learning rate. Using a different optimizer, or a different evaluation metric to guide model selection, may also improve model performance.
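As an illustration of data augmentation, the NumPy sketch below creates augmented copies of an image by random horizontal flipping and a random shift (pad by reflection, then crop back to the original size). The function name, the shift range and the choice of transforms are assumptions; in practice one would typically use Keras' built-in augmentation tools (for example `ImageDataGenerator` in Keras 2).

```python
import numpy as np

def augment(img, rng, max_shift=16):
    """Return a randomly flipped and shifted copy of an (H, W, C) image."""
    out = img[:, ::-1] if rng.random() < 0.5 else img  # horizontal flip half the time
    h, w = out.shape[:2]
    # reflect-pad the borders, then crop a window of the original size at a random offset
    padded = np.pad(out, ((max_shift, max_shift), (max_shift, max_shift), (0, 0)),
                    mode="reflect")
    dy, dx = rng.integers(0, 2 * max_shift + 1, size=2)
    return padded[dy:dy + h, dx:dx + w]

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
augmented = [augment(image, rng) for _ in range(5)]  # five new training variants
print([a.shape for a in augmented])
```

Each call yields a slightly different view of the same dog, so the model sees more variety without any new labeled images.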

The full working code is available in my GitHub repository. Thanks for reading.
