Machine learning is a hot topic at the moment. However, not everyone knows how to use it in practice. You will learn it in our machine learning blogpost series.
Machine learning has been a popular topic for a while now, but there are still few practical resources you can learn from. That is why we are starting this new series, which will let you gather knowledge on the following topics:
What machine learning is and how to use it in practice
How to validate your solutions
What tools to use when working with machine learning solutions
The posts in the series will be clear and easy to digest, and the topics of later posts will depend on what you are particularly interested in. So if you want to gain more advanced knowledge, just let me know and I will deal with such topics in the subsequent parts of the series.
Using machine learning is similar to being a scientist performing experiments. Every day you look for new components or run experiments under various conditions.
Before we start our scientific career though, let’s prepare the lab. It may be the less interesting part of the job, but that’s where we have to start.
To start using machine learning technologies you can use any language that has:
Learning algorithm module
Module for loading and processing data (database connection, CSV files)
In our machine learning guide we are going to use Python and its libraries. Why have we chosen Python? Here is why:
You can easily pick up the language semantics
It can be run on any operating system
Jupyter Notebook uses Python as a language (more information about it – below)
It is popular among academics
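To make the two ingredients listed earlier (a data-loading module and a learning algorithm module) more concrete, here is a minimal sketch of how they could look in Python. It assumes pandas and scikit-learn are already installed (installation is covered in the next section). The CSV content, the column names and the choice of a decision tree are made up for illustration only.

```python
import io

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Stand-in for a CSV file on disk; the numbers are made up.
csv_text = """hours_studied,passed
1,0
2,0
3,0
8,1
9,1
10,1
"""

# Ingredient 1: a module for loading and processing data.
data = pd.read_csv(io.StringIO(csv_text))
features = data[["hours_studied"]]
labels = data["passed"]

# Ingredient 2: a learning algorithm module.
model = DecisionTreeClassifier(random_state=0)
model.fit(features, labels)

# Ask the trained model about two new, unseen examples.
new_students = pd.DataFrame({"hours_studied": [2, 9]})
predictions = model.predict(new_students)
print(predictions)
```

With a real data set you would pass a file path to pd.read_csv instead of the in-memory text used here.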
Installation of the environment
Installing the Anaconda suite is the easiest way to get an environment you can experiment in.
If you have more experience with Python, instead of installing Anaconda you can install necessary libraries using pip (the commands below can be helpful if you set up a server on which Anaconda can’t be installed):
python -m pip install pandas
python -m pip install scipy
python -m pip install scikit-learn
python -m pip install numpy
python -m pip install matplotlib
How to check your installation
To check if the installation process has been completed correctly, run the script you will find here.
Surely, when it comes to security issues, it’s not the best practice to run scripts you have found on unknown sites. But, no risk no fun, as they say. XD
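If you would rather not run a downloaded script, a quick sanity check is to try importing each library yourself. The sketch below is not the script mentioned above, just a small hand-written alternative. It assumes the five libraries from the pip commands, and the function name check_installation is my own invention.

```python
import importlib

# Import names of the libraries installed above.
# Note: scikit-learn is imported under the name "sklearn".
LIBRARIES = ["pandas", "scipy", "sklearn", "numpy", "matplotlib"]


def check_installation(names=LIBRARIES):
    """Return {import_name: version string, or None if the import fails}."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in check_installation().items():
        print(f"{name:12} {version or 'NOT INSTALLED'}")
```

Any library reported as NOT INSTALLED needs another look at the installation step.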
To run a script in Python, open the command line and pass the file name to the interpreter (python followed by the file name):
As a result you will see a series of statistics and a window with a diagram.
The script is a program that learns to recognise digits from images. We are not going to analyse it at the moment as it is too complicated at the beginner level.
One more tool, called Jupyter Notebook, may come in handy. It is automatically installed with the Anaconda suite. If you know your way around Python, here is how to install it using pip:
python -m pip install jupyter
Jupyter Notebook runs in the browser on a local server. It is a notebook into which you can enter a few types of cells:
Executable Python script (other languages can be used, too).
Why is Jupyter so helpful?
If you work in a team, other developers can use your diagrams (the way you can work with diagrams is what makes the tool so great)
You can combine various data sources
You can quickly verify data: run a query, filter, and calculate statistics (median, mean and others)
You can easily store it in a version control system, because notebooks are JSON files
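To give a feel for the quick data verification mentioned in the list, here is a minimal sketch you could paste into a notebook cell. It assumes pandas is installed. The table and its column names are made up for illustration.

```python
import pandas as pd

# Made-up measurements, purely for illustration.
df = pd.DataFrame(
    {
        "city": ["Gdansk", "Gdansk", "Warsaw", "Warsaw", "Krakow"],
        "temperature": [12.0, 15.0, 18.0, 21.0, 30.0],
    }
)

# Filter: keep only the rows warmer than 14 degrees.
warm = df[df["temperature"] > 14.0]

# Statistics calculated on the filtered rows.
mean_temp = warm["temperature"].mean()
median_temp = warm["temperature"].median()

# Query-like summary: average temperature per city (all rows).
per_city = df.groupby("city")["temperature"].mean()

print(warm)
print("mean:", mean_temp, "median:", median_temp)
print(per_city)
```

In a notebook you can also leave a variable such as warm on the last line of a cell, and Jupyter will render it as a table.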
As soon as you have installed Jupyter, create a new notebook. Add a code cell, paste in the contents of plot_rbm_logistic_classification.py and run it. You should see a window at the end, and your task is to display the diagram in the browser, not in the window.
Tip: You can add one more cell with the script and divide it into smaller parts. In this way you can have the entire learning script in one cell and the script used for drawing diagrams in the other one. You will not waste your time starting the learning process once again.
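The split described in the tip could look roughly like the sketch below. It is not the plot_rbm_logistic_classification.py script. As a stand-in, it assumes a much simpler model (plain logistic regression on the small digits data set bundled with scikit-learn), and the comments mark where one notebook cell ends and the next begins. The %matplotlib inline line is one common way to show figures in the browser. It is a notebook magic rather than plain Python, so it is left commented out here.

```python
# --- Cell 1: learning (the slow part, run it once) ---
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()  # small 8x8 digit images bundled with scikit-learn
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print("test accuracy:", accuracy)

# --- Cell 2: drawing (the fast part, rerun it as often as you like) ---
# %matplotlib inline   <- uncomment inside a Jupyter notebook
import matplotlib.pyplot as plt

# Show the first eight test images with the label the model predicts.
predictions = model.predict(X_test[:8])
fig, axes = plt.subplots(1, 8, figsize=(10, 2))
for ax, image, label in zip(axes, X_test[:8], predictions):
    ax.imshow(image.reshape(8, 8), cmap="gray_r")
    ax.set_title(str(label))
    ax.axis("off")
```

Because the trained model stays in memory after the first cell has run, you can tweak and rerun the second cell without repeating the learning step.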
The most boring part is behind us! :) In the next part of the series we are going to deal with the basic theory and do the first experiment. Stay tuned, and if you have any problems or questions, leave a comment.