A 2-Class Keras Convolutional Neural Network Tutorial

Diego Fernandez
Mar 1, 2023

What should a data scientist do if they are confronted with an image classification problem? Just throw a convolutional neural network (CNN) at it!

In this post we will work with Keras, a Python deep learning library known for its user-friendliness. This is the tool I used when learning about neural networks for the first time! The documentation can be found here: https://keras.io/.

I’m going to demonstrate a CNN I developed in a Jupyter notebook. We begin with a dataset of microscopic-scale photos of bacterial aggregation in wastewater. The CNN classifies each image according to the quality of flocculation (clumping). The 2 classes are ‘good’ and ‘high’.

Despite the dirty water talk, we begin with a cleaned-up dataset of 792 labeled images and embark on writing some code.

The process consists of 6 key steps:

1. Importing Libraries and Pasting Necessary Functions

2. Creating directories to store and label the data, then moving the data into train, validate, and test splits accordingly

3. Data Preparation: Setting Parameters and Instantiating ImageDataGenerator Objects and Creating Train/Validate/Test Batches

4. Creating the Simple Sequential CNN Model

5. Training/Fitting the Model

6. Predicting Using the Model: Plotting Confusion Matrix and Calculating Accuracy
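To make step 2 concrete, here is a minimal sketch of how a folder of labeled images could be copied into train/validate/test directories. The function name `split_into_folders`, the folder names, and the 70/15/15 split are my own assumptions for illustration, not the exact code or proportions from the notebook.

```python
import random
import shutil
from pathlib import Path


def split_into_folders(source_dir, dest_dir, classes=("good", "high"),
                       fractions=(0.7, 0.15, 0.15), seed=0):
    """Copy images from source_dir/<class>/ into dest_dir/<split>/<class>/.

    Assumes (hypothetically) one sub-folder per class inside source_dir.
    Returns a dict mapping (split, class) to the number of files copied.
    """
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {}
    for cls in classes:
        files = sorted(p for p in (source_dir / cls).iterdir() if p.is_file())
        rng.shuffle(files)
        n_train = int(len(files) * fractions[0])
        n_valid = int(len(files) * fractions[1])
        splits = {
            "train": files[:n_train],
            "valid": files[n_train:n_train + n_valid],
            "test": files[n_train + n_valid:],  # whatever is left over
        }
        for split, members in splits.items():
            target = dest_dir / split / cls
            target.mkdir(parents=True, exist_ok=True)
            for path in members:
                shutil.copy2(path, target / path.name)
            counts[(split, cls)] = len(members)
    return counts
```

Calling `split_into_folders("raw_images", "data")` would produce `data/train/good`, `data/train/high`, and so on. Splitting each class separately keeps the class proportions roughly the same across the three splits.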

These are broken down in the Jupyter notebook below. The quality of images in Medium is reduced, so please reach out to me if you are interested in a full PDF copy or the original notebook itself.
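For readers who want something to run, steps 4 and 5 might look roughly like the sketch below. The layer sizes, the 64x64 input shape, and the random stand-in data are all assumptions made for illustration, not the architecture or data from the notebook. In practice the batches would come from the image folders rather than from random arrays.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_cnn(input_shape=(64, 64, 3), n_classes=2):
    """Build a small Sequential CNN (illustrative layer sizes, not the original)."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=3, activation="relu", padding="same"),
        layers.MaxPooling2D(pool_size=2),
        layers.Conv2D(64, kernel_size=3, activation="relu", padding="same"),
        layers.MaxPooling2D(pool_size=2),
        layers.Flatten(),
        layers.Dense(n_classes, activation="softmax"),  # one output per class
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# Random stand-in data: 16 fake "images" with one-hot labels for 2 classes.
rng = np.random.default_rng(0)
x = rng.random((16, 64, 64, 3)).astype("float32")
y = keras.utils.to_categorical(rng.integers(0, 2, size=16), num_classes=2)

model = build_cnn()
# epochs=1 means a single pass over the training data.
history = model.fit(x, y, epochs=1, batch_size=8, verbose=0)
```

The `epochs` argument to `fit` is the knob to turn when one pass over the data is not enough.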

In short, this example CNN has poor accuracy because it was trained for only one epoch, meaning the model made a single pass over the training data. This yielded an accuracy of 54%, as seen above.
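As a reminder of how an accuracy figure like this is derived in step 6, here is a small NumPy sketch that builds a 2x2 confusion matrix and computes accuracy from it. The labels and predictions are toy values invented for the example, not results from the flocculation model.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes=2):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def accuracy(cm):
    """Correct predictions sit on the diagonal of the confusion matrix."""
    return np.trace(cm) / cm.sum()


# Toy labels: 0 and 1 stand for the two classes (made-up values).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 1, 0, 1, 0, 1, 1])

cm = confusion_matrix(y_true, y_pred)
print(cm)            # [[3 1]
                     #  [1 3]]
print(accuracy(cm))  # 0.75
```

With a Keras model, `y_pred` would typically come from taking the `argmax` over the predicted class probabilities for each test image.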

Other trials of the same experiment with more epochs, however, yielded significant improvements, with accuracy upwards of 84% on models trained over 13 epochs. In the future, a larger dataset and a better balance between the two classes should further improve model performance.

Diego Fernandez

Data scientist with a background in computational physics.