Machine learning is a hot topic at the moment, but not everyone knows how to use it in practice. Our machine learning blog post series will show you how.

Machine learning has been a popular topic for a while now, but there are still few resources that you can use. That is why we are starting this new series, which will let you gather knowledge on the following topics:

  • What machine learning is and how to use it in practice
  • How to validate your solutions
  • What tools to use when working with machine learning solutions

The posts in the series will be clear and easy to digest, and the next topics will depend on what you are particularly interested in. So if you want more advanced material, just let me know and I will cover it in later parts of the series.

Using machine learning is similar to being a scientist performing experiments. Every day you look for new components or run experiments under various conditions.

Before we start our scientific career, though, let’s prepare the lab. It may be the less interesting part of the job, but that’s where we have to start.

Technologies

To start working with machine learning, you can use any language that has:

  • A module for drawing diagrams (plots)
  • A module with learning algorithms
  • A module for loading and processing data (database connection, CSV files)

In our machine learning guide we are going to use Python and its libraries. Why have we chosen Python? Here is why:

  • The language is easy to learn
  • It can be run on any operating system
  • Jupyter Notebook uses Python as its default language (more about it below)
  • It is popular among academics

Installation of the environment

Installing the Anaconda suite is the easiest way to get an environment to experiment in.

If you have more experience with Python, instead of installing Anaconda you can install the necessary libraries using pip (the commands below can be helpful if you set up a server on which Anaconda can’t be installed):

How to check your installation

To check if the installation process has been completed correctly, run the script you will find here.

Admittedly, running scripts you have found on unknown sites is not the best security practice. But, no risk no fun, as they say. XD
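If you would rather not run an unfamiliar script straight away, a lighter first check is to see whether the libraries can be imported at all. The sketch below is only an illustration: the list of library names is an assumption (the original post does not list them), so adjust it to whatever you actually installed.

```python
import importlib

# Assumption: these are the libraries you installed for the series.
# The list is illustrative; edit it to match your own setup.
LIBRARIES = ["numpy", "scipy", "matplotlib", "sklearn", "pandas"]


def check_libraries(names):
    """Return a dict mapping each module name to its version, or None if missing."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The module could not be imported, so treat it as not installed.
            report[name] = None
        else:
            # Not every module exposes __version__, so fall back to a placeholder.
            report[name] = getattr(module, "__version__", "unknown version")
    return report


if __name__ == "__main__":
    for name, version in check_libraries(LIBRARIES).items():
        status = version if version is not None else "NOT INSTALLED"
        print(f"{name:12} {status}")
```

Any line marked NOT INSTALLED points at a library that still has to be installed before the full script can run.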

To run a script in Python, open the command line and pass the file name to the interpreter (python followed by the file name).

As a result you will see a series of statistics and a window similar to this one:

[Image: the window displayed by the script]

The script you have just run is a program that learns to recognize digits from images. We are not going to analyse it at the moment, as it is too complicated for the beginner level.

Jupyter notebook

One more tool, called Jupyter Notebook, may come in handy. It is installed automatically with the Anaconda suite. If you know your way around Python, here is how to install it using pip.

The tool runs in the browser on a local server. It is a notebook into which you can enter a few types of cells:

  • Executable code, in Python or in other supported languages
  • Markdown documentation
  • Plain text
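To make the cell types more concrete, here is a rough sketch of how a notebook file stores them. A notebook is saved as JSON, and each cell carries a "cell_type" label: "code", "markdown" or "raw" (the plain-text kind). The notebook content below is hand-written and simplified for illustration, and list_cells is a hypothetical helper, not part of Jupyter.

```python
import json

# A hand-written, simplified notebook in the nbformat 4 layout (illustrative content).
NOTEBOOK_JSON = """
{
  "nbformat": 4,
  "nbformat_minor": 4,
  "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# My first experiment"]},
    {"cell_type": "code", "metadata": {}, "execution_count": null,
     "outputs": [], "source": ["print('hello')"]},
    {"cell_type": "raw", "metadata": {}, "source": ["plain text, left untouched"]}
  ]
}
"""


def list_cells(notebook_text):
    """Return (cell_type, source) pairs for every cell in a notebook JSON string."""
    notebook = json.loads(notebook_text)
    # "source" is stored as a list of lines, so join it back into one string.
    return [(cell["cell_type"], "".join(cell["source"])) for cell in notebook["cells"]]


if __name__ == "__main__":
    for cell_type, source in list_cells(NOTEBOOK_JSON):
        print(f"{cell_type:9} | {source}")
```

In everyday work you never edit this JSON by hand; the browser interface does it for you.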

Why is Jupyter so helpful?

  • If you work in a team, other developers can use your diagrams (the way you can work with diagrams is what makes the tool so great)
  • You can combine various data sources
  • You can quickly verify data: run a query, filter, and calculate statistics (median, mean and others)
  • You can easily store it in a version control system, since notebooks are JSON files
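As a small illustration of the quick data verification mentioned in the list, the sketch below filters a tiny data set and calculates the mean and median using only the Python standard library. The sample data, the column names and the summarize helper are all made up for this example; in a notebook you would typically type something like this into a single code cell.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for a real CSV file or database query.
SAMPLE_CSV = """city,temperature
Gdansk,14.5
Gdansk,16.0
Warsaw,18.5
Gdansk,15.5
Warsaw,21.0
"""


def summarize(csv_text, city):
    """Filter rows by city and return the count, mean and median temperature."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [float(row["temperature"]) for row in rows if row["city"] == city]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_CSV, "Gdansk"))
```

Because each cell can be rerun on its own, you can change the filter and see the new statistics immediately.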

You can test Jupyter notebook here.

Optional task

As soon as you have installed Jupyter, create a new notebook. Add a code cell, paste in the contents of plot_rbm_logistic_classification.py and run it. The script opens a window at the end; your task is to display the diagram in the browser, not in that window.

Tip: You can add one more code cell and divide the script into smaller parts. This way you can keep the entire learning code in one cell and the code that draws the diagrams in another, so you will not waste time running the learning process again.

The most boring part is behind us! :) In the next part of the series we are going to deal with the basic theory and do the first experiment. Stay tuned, and if you have any problems or questions – leave a comment.

 

Software Developer and Team Leader at Goyello, constantly challenging his status quo. He wants to change the world and values knowledge gained in practice more than certifications and academic titles.