Telling the computer what to do is fun, but have you ever wondered whether you could tell it one thing and have it extend that to another? I mean letting the computer handle future cases without waiting for explicit commands from the programmer. In other words, letting the computer make inferences on its own instead of waiting for input from the programmer.
As a software developer for the past three years, I have always given direct instructions to the computer, but then I began to ask whether we could give fewer instructions to the computer and get more output. I began this thought path a few months back, and it has led me to machine learning, the field where you can basically teach a computer to learn.
It is more than simply teaching the computer to learn; it is about setting the computer up to serve as a knowledge sponge so that it accumulates any information that comes its way. Resources about machine learning abound on the internet, but from my own experience, I want to shed a little light on the basic ideas and the most important things to focus on.
What Is Machine Learning?
When you started out in school, you learned things by first being introduced to them and, after a while, by answering questions about them. You basically learned from data.
You probably remember the first time you tried to identify an animal or a letter of the alphabet and got it wrong. It took you one or more tries to fully learn, and from there, you could identify letters and animals reliably.
The same thing happens with machine learning.
Machine learning is teaching computers to learn from data. It basically involves exposing computers to data and allowing the computer to make decisions and suggestions based on the data. The computer is exposed to some letters or to some animals, and after a while, the computer is asked to identify them.
In essence, machine learning sets up a system where the software keeps learning for as long as it is exposed to more data. To learn is to be able to make inferences about new information from previous information.
The examples that are used to teach the computer are called the training set. Each example is known as a training instance. Simply storing information in memory without making inferences from it is not considered learning.
What’s the Difference Between Machine Learning and a Normal Computer Program?
A computer program is usually a set of step-by-step instructions for how a computer should behave. A human determines the exact steps and gives those steps to the computer to execute. As such, the computer does not make new inferences beyond what the programmer has written.
Machine learning, however, involves giving the computer data and a general method for learning from it, and then leaving it to work out the specifics on its own. A human does not spell out every operation of a machine learning program.
Due to this capability, a machine learning spam filter does not wait for a human to tell it how to classify every single email it receives; it learns from the data it was exposed to. If the data change, the program can pick up the change from new examples and adjust automatically.
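Here is a minimal sketch of that idea, assuming a toy set of labeled emails. The messages, labels, and function names are hypothetical and invented for illustration; a real spam filter is far more sophisticated. Notice that no spam rules are written by hand: the program only counts which words show up under each label and scores new messages from those counts.

```python
from collections import Counter

# Hypothetical training set: (email text, label) pairs invented for this sketch.
TRAINING_SET = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]


def train(examples):
    """Count how often each word appears under each label."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts


def classify(counts, text):
    """Pick the label whose training emails contained the message's words more often."""
    words = text.split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["ham"][w] for w in words)
    # Ties (including messages with only unseen words) default to "ham".
    return "spam" if spam_score > ham_score else "ham"


counts = train(TRAINING_SET)
print(classify(counts, "free prize inside"))       # spam
print(classify(counts, "agenda for the meeting"))  # ham

# When the data change, retraining on the new examples changes the behavior.
print(classify(counts, "crypto offer"))            # ham (words never seen before)
updated = train(TRAINING_SET + [("crypto offer today", "spam")])
print(classify(updated, "crypto offer"))           # spam
```

The last two calls show the adjustment described above: nobody edited the classification logic, only the training data grew.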
What Fields Does Machine Learning Apply To?
Machine learning is applicable in fields where tasks are executed based on data. For example, it can be applied to law, since lawyers have to make cases based on data. It can be applied to medicine, since doctors have to prescribe medications based on data about patients. So essentially, machine learning can be applied to any field where data are generated and where experience is based on access to more data.
For software development, machine learning leans toward understanding how programmers write code so that programmers can write less and better code. In the area of infrastructure on which our software runs, machine learning can help improve understanding when the infrastructure has too much load to process and can indicate how that can be improved. Also, areas like design can be greatly improved.
Types of Machine-Learning Systems
Machine learning can be classified based on different paradigms. Some are:
- Human supervision
- Data acquisition rate
- Generalization method
Human Supervision
Under the human supervision paradigm, machine learning can be classified into four categories:
Supervised learning: While teaching kids the alphabet, a teacher shows a letter and gives its name. In other words, the teacher provides both the letter and its label. Since a teacher supervises while this learning is taking place, it is called supervised learning.
In supervised machine learning, the data passed to the computer include the desired solution. The desired solution is called the label. The label serves as the teacher during the learning process.
Supervised learning can be categorized into classification and regression. In classification, the system learns to classify data into one class or another, while in regression, the system learns to produce a number for a certain input.
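To make the two flavors concrete, here is a small sketch using only the Python standard library. The data and function names are hypothetical, and the two techniques (a nearest-centroid classifier and a least-squares line) are simply the easiest ones to show in a few lines, not the only options.

```python
from statistics import mean

# Hypothetical labeled data, invented for this sketch.
# Classification examples: (height in cm, label).
animals = [(25.0, "cat"), (30.0, "cat"), (60.0, "dog"), (70.0, "dog")]
# Regression examples: (size in square meters, price in thousands).
houses = [(50.0, 150.0), (80.0, 240.0), (100.0, 300.0)]


def fit_classifier(examples):
    """Learn one average feature value (a centroid) per label."""
    by_label = {}
    for value, label in examples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def predict_class(centroids, value):
    """Classification: return the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def fit_line(examples):
    """Regression: ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in examples]
    ys = [y for _, y in examples]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in examples) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


centroids = fit_classifier(animals)
print(predict_class(centroids, 28.0))       # cat

slope, intercept = fit_line(houses)
print(round(slope * 90.0 + intercept, 1))   # 270.0
```

The classifier answers with a class name, while the regression answers with a number. In both cases the labels in the training data did the teaching.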
Unsupervised learning: When a teacher is not available to guide and supervise your learning, it is termed unsupervised learning. So in machine learning, it means the data supplied to the computer are not labeled. Usually, the computer finds patterns on its own. Some types of unsupervised learning include clustering, dimensionality reduction (feature extraction), anomaly detection, and association-rule learning.
- Clustering – grouping unlabeled data into similar groups based on certain patterns detected by the machine-learning program.
- Dimensionality reduction – reducing the number of features used in the training examples, often by merging related features into a smaller set. This is often a useful step before passing training examples on to other machine learning algorithms.
- Association-rule learning – digging through large amounts of data and discovering interesting relationships between attributes.
- Anomaly detection – detecting anomalies or outliers in data.
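As an illustration of the first item in the list, here is a rough sketch of clustering with a one-dimensional k-means. The data points, the starting centroids, and the function name are assumptions made for this example; real implementations choose starting points more carefully and work in many dimensions.

```python
def k_means_1d(points, centroids, iterations=10):
    """Group unlabeled numbers around k centroids (simple 1-D k-means)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled data: nobody tells the program there are two groups
# of values, only that it should look for two clusters.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(points, centroids=[1.0, 12.0])
print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

No labels were supplied; the grouping comes entirely from how close the values are to each other.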
Semi-supervised learning: This simply combines supervised and unsupervised learning. It means that some of the data passed to the computer are labeled, while some others are unlabeled. Usually, there is more unlabeled data than labeled data.
Reinforcement learning: Remember when you started learning to play a certain game? The things you learned were the things to avoid so that you did not end your game. You learned things that improved your life span and ran away from things that sought to harm you. It’s the same in reinforcement learning.
The learning environment contains an agent that chooses actions and is rewarded or punished based on the outcome of each action. The agent seeks the best line of action to take at every point, known as the policy. So in the game analogy, the player is the agent, and the strategy the player settles on to avoid being killed is the policy.
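The reward-and-punishment loop can be sketched in a heavily simplified form. The example below assumes a hypothetical game with a single situation and two possible actions, where one action is always punished and the other always rewarded. It uses an epsilon-greedy strategy, which is one common way to balance trying new actions against repeating the best known one. Real reinforcement learning problems have many states and uncertain rewards, so treat this as a toy.

```python
import random

# Hypothetical environment: each action always returns the same reward.
REWARDS = {"left": -1.0, "right": 1.0}


def learn_values(steps=200, epsilon=0.2, seed=0):
    """Estimate the value of each action by trial, reward, and punishment."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    values = {action: 0.0 for action in REWARDS}
    counts = {action: 0 for action in REWARDS}
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.choice(list(REWARDS))
        else:
            # Exploit: take the action that currently looks best.
            action = max(values, key=values.get)
        reward = REWARDS[action]
        counts[action] += 1
        # Move the estimate toward the observed reward (running average).
        values[action] += (reward - values[action]) / counts[action]
    return values


values = learn_values()
# The learned policy for this one-situation game: pick the highest-valued action.
policy = max(values, key=values.get)
print(policy)  # right
```

The agent is never told which action is good. It finds out by acting and observing the reward, which is the essence of the game analogy above.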
Data Acquisition Rate
To learn, you need data, and the rate at which you get and consume data is another way to classify machine learning systems. Under this category, there are batch (one-time) and online (incremental) learning.
Batch or one-time learning: The system is trained in a single batch with all the available data. If new data become available, the old and new data are combined into one batch, and the system is trained again from scratch. This takes a lot of time and resources, so it is typically done offline, hence the alternative name offline learning. After training, the system is deployed again for use.
Online or incremental learning: The system continuously learns from new data rather than from a single batch, as in batch learning. It keeps learning without having to stop and retrain from scratch. The data are fed in individually or in small groups called mini-batches.
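The difference can be sketched with the simplest possible "model": an average. This is an assumption made purely for illustration (the class and function names are hypothetical, and a running average is only a stand-in for a real learner), but the contrast is the same. The batch version needs all the data again whenever something new arrives, while the online version updates itself from each new value and can then discard it.

```python
def batch_mean(values):
    """Batch learning: compute the estimate from the full data set at once."""
    return sum(values) / len(values)


class OnlineMean:
    """Online learning: update the estimate one value at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        # Nudge the current estimate toward the new value.
        self.mean += (value - self.mean) / self.count


old_data = [2.0, 4.0, 6.0]
new_data = [8.0]

# Batch: old and new data are combined and everything is recomputed.
print(batch_mean(old_data + new_data))  # 5.0

# Online: the learner only ever sees one value at a time.
learner = OnlineMean()
for value in old_data:
    learner.update(value)
for value in new_data:
    learner.update(value)
print(learner.mean)  # 5.0
```

Both approaches end with the same estimate here, but the online learner never had to store or revisit the old data.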
Generalization Method
Under the generalization paradigm, machine learning can be divided into instance-based and model-based learning.
Instance-based: Learning is done by comparing new data with data already stored in memory.
Model-based: Rather than comparing new data against stored training examples, we build a model from the examples and use the model to make predictions. This is called making an inference.
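The two approaches can be compared side by side in a short sketch. The data and function names are hypothetical. The instance-based version is a one-nearest-neighbor lookup, and the model-based version learns a single threshold; both were picked for brevity rather than accuracy.

```python
from statistics import mean

# Hypothetical labeled examples: (measurement, label).
examples = [(1.0, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]


def nearest_neighbor(stored_examples, x):
    """Instance-based: compare x with every stored example, copy the closest label."""
    _, label = min(stored_examples, key=lambda ex: abs(ex[0] - x))
    return label


def fit_threshold(training_examples):
    """Model-based: summarize the examples as one number, a decision threshold."""
    small = [x for x, label in training_examples if label == "small"]
    large = [x for x, label in training_examples if label == "large"]
    return (mean(small) + mean(large)) / 2


def predict_with_threshold(threshold, x):
    """Make an inference from the model alone; the examples are no longer needed."""
    return "large" if x > threshold else "small"


# The instance-based learner must keep the examples around at prediction time.
print(nearest_neighbor(examples, 3.0))         # small

# The model-based learner keeps only the fitted threshold.
threshold = fit_threshold(examples)
print(threshold)                               # 5.0
print(predict_with_threshold(threshold, 7.0))  # large
```

The trade-off is that the instance-based learner needs its training data in memory for every prediction, while the model-based learner compresses the data into a model up front.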
The Time to Start Is Now: Integrate Machine Learning Today
Machine learning is all about data and patterns, and if you look at what software development is about, you’ll see it’s the same thing. It is important that software developers begin to look at how they can integrate machine learning into their daily activities. For designers, machine learning helps you better understand how your users interact with your design and how it can be improved.
For software developers, machine learning helps you build intelligent systems. Machine learning makes software development more about building an intelligent machine than a machine that simply expects input all the time from the user. In essence, machine learning is about making intelligent systems, not just garbage in, garbage out systems.
And the time is now for every software developer to understand the beauty of machine learning and how it can be applied to their daily tasks.
Published on Java Code Geeks with permission by Idris Azeez, partner at our JCG program. See the original article here: My Journey Towards Machine Learning
Opinions expressed by Java Code Geeks contributors are their own.