This article is the first of a two-part series explaining how to build an image recognition neural network. The code below is also available as a Google Colaboratory interactive notebook. I will share the link at the very end of part 2, so that you can run the code on your own. I also recommend reading my article on Machine Learning first, so you are familiar with things like TPUs and interactive notebooks.

DISCLAIMER: Do not rush through this article; take your time to understand the libraries and the code itself. This is not simple stuff we are doing here, even though we will be using the data set from the hello world of classification. At the very end you will get to execute all the code you see here. Do not be impatient; build your knowledge around this code's context so that when you do execute it, you actually know what it does.

Building a simple neural network

So, without further ado, I would like to demonstrate using Cloud TPUs in Google Colab to build a simple neural network classification model that uses the Iris data set to predict the species of a flower. This model uses 4 input features (SepalLength, SepalWidth, PetalLength, PetalWidth) to determine one of the following flower species: Setosa, Versicolor, Virginica. We could do something more exciting, like classifying health conditions based on a smart body scan, or grouping similar commercial properties into categories for real estate listings. But for now we will analyse this hello world of classification so you can grasp the basic idea and get to use some of the tools involved in the process.

We will start by importing all the required libraries. It would be beneficial to at least quickly google the libraries we will be importing shortly; that way you'll be more familiar with the code I will be explaining. Here are some good tutorials for the two most often used libraries, which I definitely recommend you familiarize yourself with: numpy and pandas.
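Here is a minimal sketch of the imports used throughout this tutorial (the exact list in the notebook may differ slightly):

```python
import os                # to read Colab environment variables
import numpy as np       # numerical arrays
import pandas as pd      # CSV loading and DataFrames
import tensorflow as tf  # the neural network itself
```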

Importing TensorFlow

Once we've loaded the libraries, we run a simple check on TensorFlow just to make sure it imported correctly.
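A common way to do this is simply printing the library version:

```python
# If the import failed, this raises a NameError; otherwise it prints
# the TensorFlow version available in the Colab runtime.
print(tf.__version__)
```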

Next, we need to check for the environment variable 'COLAB_TPU_ADDR'. Its presence indicates that a TPU resource is available to us. If this check fails, just go to the "Edit" menu at the top of the notebook and select "Notebook settings". Once there, select TPU as the hardware accelerator, so that the cloud machine Google made available for this session will reconfigure to use a TPU. We will also start a quick TensorFlow session just so that we can check what devices are available for our computations on the allocated machine. The output will show: the name of each device, its type (CPU/GPU/TPU) and, finally, the amount of memory allocated for it.
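A TensorFlow 1.x-style sketch of this check (the wording of the error message is my own):

```python
import os
import tensorflow as tf

if 'COLAB_TPU_ADDR' not in os.environ:
    print('ERROR: No TPU found. Go to "Edit" > "Notebook settings" '
          'and select TPU as the hardware accelerator.')
else:
    tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
    print('TPU address is', tpu_address)

    # Open a session against the TPU worker and list its devices.
    # Each entry shows the device name, its type (CPU/GPU/TPU)
    # and the memory allocated to it.
    with tf.Session(tpu_address) as session:
        for device in session.list_devices():
            print(device)
```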

Below we define parameters required later by the network training and evaluation phases:
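The article does not list the exact values, so the ones below are illustrative placeholders:

```python
# Illustrative training parameters; the names and values here are
# assumptions, not the author's exact configuration.
BATCH_SIZE = 32      # number of examples processed per training step
TRAIN_STEPS = 1000   # total number of training steps to run
```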

Next we specify information about our training and testing data, including the URLs under which we can find the CSV files. We also create an array of column names. This will be used by the pandas library, which we will use to load the CSVs. The column names array will be used to annotate the pandas DataFrame object created from each CSV file. These annotations are purely informational; you can, but don't have to, pass them. More on this in the next step.
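For reference, these are the Iris CSVs hosted by TensorFlow (the same files used in the official TensorFlow Iris tutorial; I am assuming the author used these as well), together with the column names:

```python
TRAIN_URL = 'http://download.tensorflow.org/data/iris_training.csv'
TEST_URL = 'http://download.tensorflow.org/data/iris_test.csv'

# Column names used to annotate the DataFrames; the last column is
# the label (species) we want to predict.
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth',
                    'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']
```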

Pandas helpers

The following function uses pandas helpers to create 2 sets of data. The first set is our training data: the information we will pass to our model so it can learn how to predict the 3 classes of Iris flowers. The second set is basically a smaller version of the first one, but containing different values. The split between training and testing data is usually 80%–20%. Each set describes the parameters of flowers in subset x and the corresponding classes of flowers in subset y. The training process will iterate over our network (which we will configure very soon), passing it the TRAIN data and trying to smart-guess the best weights for the neurons in each layer.

The goal of this exercise is to allow our network/model to predict the classes of given Iris flowers from the test set as accurately as possible. So after each iteration it will test its accuracy against the TEST data, and when its predictions are in an acceptable range it stops, giving us a model that in theory should now be able to predict the classes of Iris flowers for any input parameters from outside of the test/train data sets (how cool is this?).

We also use read_csv, provided by pandas, to create DataFrames. These are basically annotated arrays with a lot of metadata and helper functions. They are data scientists' preferred way of loading CSV files into memory, and here you can read more about the reasons for this.
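A sketch of such a loader, modelled on the official TensorFlow Iris tutorial (the function name and exact structure are my assumption):

```python
def load_data(label_name='Species'):
    """Returns the Iris data as (train_x, train_y), (test_x, test_y)."""
    # Download the CSVs (cached after the first call) and load them
    # into annotated DataFrames using the column names defined above.
    train_path = tf.keras.utils.get_file(TRAIN_URL.split('/')[-1], TRAIN_URL)
    test_path = tf.keras.utils.get_file(TEST_URL.split('/')[-1], TEST_URL)

    train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0)
    train_x, train_y = train, train.pop(label_name)  # features / labels

    test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)
    test_x, test_y = test, test.pop(label_name)

    return (train_x, train_y), (test_x, test_y)

(train_x, train_y), (test_x, test_y) = load_data()
```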

Now we are ready to do the network configuration. This is where we shape its layers, inputs and outputs, a very important step that you are likely to experiment with the most, trying and comparing different configurations. Below we configure a Sequential model/network. It's good for most deep learning problems, but you should also be aware of the Functional model. It's slightly more complex and error-prone but allows more flexibility. Both sequential and functional models have layers that we need to configure, and each layer has its parameters.

As you can see below, we are using the .Dense() function, which basically defines the type of layer being added. Types usually differ in the number of inputs and outputs, as well as the way they are connected. Dense (also called fully connected) means inputs and outputs are fully connected between layers: all neurons in layer n are connected with all neurons in the following layer n+1. At this point it's not important that you know convolutional layers (in Keras you would add them with layers such as Conv2D). But if you are interested, you can see a comparison of a Dense and a Convolutional layer below. A Convolutional layer is often called a filter and is well suited for image processing. More on this here.
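A minimal Keras Sequential sketch for the Iris problem; the layer sizes are illustrative assumptions, not necessarily the exact values from the notebook:

```python
model = tf.keras.Sequential([
    # Hidden layers: every neuron sees all 4 input features,
    # and every neuron connects to all neurons in the next layer.
    tf.keras.layers.Dense(10, activation='relu', input_shape=(4,)),
    tf.keras.layers.Dense(10, activation='relu'),
    # Output layer: one neuron per Iris species; softmax turns the
    # raw outputs into class probabilities.
    tf.keras.layers.Dense(3, activation='softmax'),
])
model.summary()
```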

Final word on ReLU

ReLU is basically a function which eliminates negative values: it returns max(0, x), so negative inputs become 0 and positive inputs pass through unchanged. You can imagine the previous layer providing some negative values to the following one; some neurons may then not activate due to the 0 value set by their ReLU activation function. ReLU and the Sigmoid are two very common activation functions that are well suited for a variety of cases. I will tell you more about them in future articles.
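To make this concrete, here is a tiny numpy sketch of what ReLU does to a batch of values:

```python
import numpy as np

def relu(x):
    # Negative values are replaced with 0; positive values are kept.
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))
# -> [0.  0.  0.  1.5 3. ]
```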

Keeping track of our progress

We won't learn anything more in this already lengthy article, but we've made a lot of progress. We've initialized our environment and prepared the training and test data. We've also configured the network model itself. In the next article we'll compile the model and deal with the training, evaluation and the predictions themselves. For now, I strongly suggest going through the code and dissecting it in your mind, trying to understand what is happening here. Try writing down questions about anything that is unclear. You can either look for answers in part 2 of this series, or google them like I did.

Also, feel free to leave any thoughts and/or questions in the comments. I will try to help as much as I can.
