Mastering Classification Algorithms for Machine Learning: Learn how to apply Classification algorithms for effective Machine Learning solutions (English Edition)
About this ebook
The book starts with an introduction to problem-solving in machine learning and subsequently focuses on classification problems. It then explores the Naïve Bayes algorithm, a probabilistic method widely used in industrial applications. The application of Bayes Theorem and underlying assumptions in developing the Naïve Bayes algorithm for classification is also covered. Moving forward, the book centers its attention on the Logistic Regression algorithm, exploring the sigmoid function and its significance in binary classification. The book also covers Decision Trees and discusses the Gini Factor, Entropy, and their use in splitting trees and generating decision leaves. The Random Forest algorithm is also thoroughly explained as a cutting-edge method for classification (and regression). The book concludes by exploring practical applications such as Spam Detection, Customer Segmentation, Disease Classification, Malware Detection in JPEG and ELF Files, Emotion Analysis from Speech, and Image Classification.
By the end of the book, you will become proficient in utilizing classification algorithms for solving complex machine learning problems.
Mastering Classification Algorithms for Machine Learning - Partha Majumdar
CHAPTER 1
Introduction to Machine Learning
Welcome to this book.
In this book, we will explore models for classifying data. We need to classify data for various purposes. For example, from piles of data regarding credit card transactions, we need to find out whether any transaction is fraudulent. So, essentially, we are classifying the data into two classes: good transactions and fraudulent transactions. As another example, from data regarding pictures of food items, we may need to figure out whether a food item would suit a diabetic patient.
Human beings are experts in classification in most situations. However, in the modern world, the volume of data to classify is far too large for humans alone. So, we need machines that can classify as effectively as humans, making it practical to meet the demand.
This book will discuss various models with which machines can effectively classify data. Before we discuss these classification models, we start with a discussion of what machine learning is. We will also explore how machines can be made to learn.
Structure
In this chapter, we will discuss the following topics:
Machine learning
Traditional programming versus programming for machine learning
The learning process of a machine
Kinds of data machines can learn from
Types of machine learning
Supervised learning
Unsupervised learning
Objectives
After reading this chapter, you will be able to differentiate between traditional programming and programming for machine learning. You will also understand the different kinds of problems that can be solved by machine learning.
Machine learning
Neuroscientist Warren S. McCulloch and logician Walter H. Pitts published A Logical Calculus of the Ideas Immanent in Nervous Activity in 1943 in the Bulletin of Mathematical Biophysics, Vol. 5. In this paper, they discussed a mathematical model of neural networks. This was the first attempt to make machines think like the human brain. ¹
What it means to be able to think is a vast subject. We can make a simple abstraction, as shown in Figure 1.2: thinking is a process of collecting data, finding patterns in the data, and making inferences from the patterns.
1 https://www.cse.chalmers.se/~coquand/AUTOMATA/mcp.pdf.
Figure 1.2: Abstraction of how thinking is performed
Let us discuss the process of thinking through an example. Suppose the data provided to us is a massive pile of medicines. On receiving this data, we could find patterns, such as which medicines are similar to each other. We may study the composition of the medicines, their manufacturers, and many other attributes. Based on the patterns we find, we may classify the medicines into groups according to which disease each group cures.
Machine learning is like this. We present data to the machine and sometimes provide information about the data. Based on this data and information, the machine finds patterns and formulates them as its rules. Once the machine has formulated its rules, it can make inferences about new situations.
Machine learning is a branch of Artificial Intelligence (AI). In machine learning, using mathematical modeling on data, a machine is made to learn the patterns in the data without any human intervention.
Traditional programming versus programming for machine learning
Programming for machine learning is different from traditional programming.
In traditional programming, we have data and rules. We apply the rules to the data to get the Output. Refer to Figure 1.3:
Figure 1.3: Traditional Programming
Consider this example from the world of Physics. When we want the computer to calculate the value of momentum, we tell the computer that the formula for momentum is mass multiplied by velocity, and we give the computer the values of mass and velocity. Here, the values of mass and velocity are the Data. To this data, the computer applies the Rule, that is, the formula for momentum, to find the value of momentum for us. The value of momentum calculated by the computer is the Output.
Momentum = Mass * Velocity
Generally written as,
Momentum = mv
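As a minimal sketch in Python, the traditional-programming setup looks like this: the programmer writes the rule, and the computer only applies it to the data.

```python
# Traditional programming: the programmer encodes the Rule, and the
# computer simply applies it to the Data to produce the Output.
def momentum(mass, velocity):
    return mass * velocity  # Rule: Momentum = Mass * Velocity

# Data: mass = 7 kg, velocity = 8 km/h
print(momentum(7, 8))  # Output: 56 (kg * km/h)
```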
In contrast to traditional programming, in machine learning, we supply the computer with Data and Output, and we expect the computer to generate the Rules as shown in Figure 1.4:
Figure 1.4: Programming for Machine Learning
Suppose we had a mechanism to obtain values of momentum from some experiment, and we knew the values of mass and velocity in each of the experiments. Now, if we want the computer to determine the formula for momentum, that would be a machine learning situation. So, we would input the values of mass, velocity, and momentum and ask the machine to determine the formula for calculating momentum.
The learning process of a machine
Let us discuss a simplified view of how machines learn. As you would imagine, the actual process is much more complex.
Consider that we have the following data (refer to Figure 1.5) from an experiment. We ask the machine to provide a relationship between momentum, mass, and velocity.
Figure 1.5: Input to a computer to create a Machine Learning Model
For the machine to build a model, the data scientist must tell the machine what model to make. Generally, the data scientist first tries to understand the data; this step is called Exploratory Data Analysis. In the preceding situation, we have two independent variables, m and v, and one dependent variable, M. We can plot this data on a 2-dimensional chart as shown in Figure 1.6:
Figure 1.6: Scatter plot based on data in Figure 1.5
Let us say that the data scientist decides to create a linear model of the form M = β0 + β1 * m + β2 * v. The machine needs to estimate the values of β0, β1, and β2.
The data scientist provides starting values for β0, β1, and β2. Let us say these values are β0 = 5, β1 = 5, and β2 = 5. Using these values, the machine calculates values for M, as shown in Figure 1.7. We call the value calculated by the machine Mhat.
Figure 1.7: Initial estimates of Momentum (M) made by the machine
If we plot this data, we get the chart shown in Figure 1.8, where the dots are the actual values of M as provided to the machine. The stars are the values of M estimated by the machine:
Figure 1.8: Plot of the machine's initial Momentum (M) estimates
We can see that the machine did not do so well. However, the machine continues: it calculates its error in making the estimates, as shown in Figure 1.9. We see that the machine can overestimate or underestimate, so the error can be negative or positive. Instead of considering the value of the error, we consider the square of the error. Further, we calculate the mean squared error (MSE) across all the data points by averaging the squares of the errors.
Figure 1.9: Computation of error in estimates made by the machine
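The error calculation described above can be sketched in Python. The mass, velocity, and momentum values below are invented for illustration; the actual data in Figure 1.5 is not reproduced here.

```python
import numpy as np

# Hypothetical experimental data (not the values from Figure 1.5)
m = np.array([2.0, 3.0, 5.0, 7.0])     # mass
v = np.array([4.0, 6.0, 3.0, 8.0])     # velocity
M = np.array([8.1, 17.9, 15.2, 55.8])  # measured momentum

# Starting values of the coefficients for M = b0 + b1*m + b2*v
b0, b1, b2 = 5.0, 5.0, 5.0
Mhat = b0 + b1 * m + b2 * v            # the machine's initial estimates

errors = M - Mhat                      # can be negative or positive
mse = np.mean(errors ** 2)             # mean squared error
print(mse)                             # large: the initial guesses are poor
```

With the starting values β0 = β1 = β2 = 5, the squared errors are large, which is exactly why the machine must go on to search for better coefficient values.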
Now, the machine tries other values of β0, β1, and β2 so that the value of the MSE is minimized. After some rounds of calculation, the machine arrives at the values of β0, β1, and β2 shown in Figure 1.10:
Figure 1.10: Estimate of Momentum after minimizing MSE
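The text does not specify how the machine searches for better coefficient values; gradient descent is one common approach. The sketch below uses invented data (not the values from Figure 1.5) and shows the MSE shrinking as the coefficients are updated:

```python
import numpy as np

# Hypothetical data: momentum generated from the physics formula M = m * v
m = np.array([2.0, 3.0, 5.0, 7.0])
v = np.array([4.0, 6.0, 3.0, 8.0])
M = m * v

# Starting values, as in the text
b0, b1, b2 = 5.0, 5.0, 5.0
lr = 0.01  # learning rate
for _ in range(20000):
    Mhat = b0 + b1 * m + b2 * v
    err = Mhat - M
    # Update each coefficient along the negative gradient of the MSE
    b0 -= lr * 2 * err.mean()
    b1 -= lr * 2 * (err * m).mean()
    b2 -= lr * 2 * (err * v).mean()

final_mse = np.mean((b0 + b1 * m + b2 * v - M) ** 2)
print(final_mse)  # much smaller than the starting MSE, but not zero
```

Because momentum is not a linear function of mass and velocity, even the best coefficients for this linear model leave a residual error, which is why the data scientist tries a different model next.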
The estimates, though better, are still not reasonable. So, the data scientist considers another strategy. This time, the data scientist asks the computer to find a relationship between m*v and M; that is, the machine should create an equation of the form M = β0 + β1 * m * v. As in the earlier case, the data scientist gives initial values β0 = 5 and β1 = 5.
The setup is shown in Figure 1.11:
Figure 1.11: New setup
The machine tries to minimize the MSE for this setup and calculates the values of β0 and β1, as shown in Figure 1.12:
Figure 1.12: Estimate of Momentum after minimizing MSE for the new model devised in Figure 1.11
The machine has done much better. Let us plot this data and check (Refer to Figure 1.13):
Figure 1.13: Plot of new estimates made by the machine. The RED crosses are the estimates
So, the machine has given us a formula for calculating momentum based on the data provided to the machine. According to the machine:
Momentum = 2.46214616625777 + 0.991020053873143 * Mass * Velocity
Now, for any new value of Mass and Velocity, say Mass = 7 kg and Velocity = 8 km/h, the machine would say that:
Momentum = 2.46214616625777 + 0.991020053873143 * 7 kg * 8 km/h = 57.95926918 kg * km/h
This is pretty good as, according to the formula from physics, the value of momentum for Mass = 7 kg and Velocity = 8 km/h should be 56 kg * km/h.
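A sketch of this second strategy using scikit-learn follows. The data below is synthetic, generated here from the physics formula with a little noise, so the fitted coefficients will differ from the ones quoted in the book.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic "experimental" data: M = m * v plus small measurement noise
rng = np.random.default_rng(0)
mass = rng.uniform(1, 10, 50)
velocity = rng.uniform(1, 10, 50)
momentum = mass * velocity + rng.normal(0, 0.5, 50)

# Single engineered feature m*v, so the model is M = b0 + b1 * (m * v)
X = (mass * velocity).reshape(-1, 1)
model = LinearRegression().fit(X, momentum)

print(model.intercept_, model.coef_[0])  # intercept near 0, slope near 1
print(model.predict([[7 * 8]])[0])       # near the physics value 56
```

The fitted intercept lands near 0 and the slope near 1, mirroring the book's result that the learned formula approximates Momentum = Mass * Velocity.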
Kinds of data the machines can learn from
From nature, human beings can gather data through the five sense organs. We can see, hear, smell, taste, and feel. Out of these five types of data, human beings have been able to digitize what they see and hear. Likewise, machines can also understand data from images and sounds.
Human beings have created a lot of digital data from various activities we perform. This data is either structured or unstructured.
Structured data is organized in tabular form and follows definite semantics. It is by far the type of data most processed by machines. As of 2022, about 80% of the data machines learn from is structured data. Machines are extremely good with structured data. They are also very useful for working on structured data, as humans fail to cope with its volume. Examples of structured data can be found in any system where transactions are conducted. For example, the data regarding credit card transactions is structured. In a credit card system, millions of transactions are performed daily, and tasks like detecting fraudulent transactions are extremely difficult for human beings. So, here, machines are best suited for the job.
Unstructured data is a more recent phenomenon, which has exploded mainly due to social media. Unstructured data has no definite semantics, so such data must be given some semantic representation before machines can work on it. Over the years, many representations of unstructured data have emerged, allowing machines to work efficiently on such data. Examples of unstructured data include tweets and newspaper articles; images and audio/video clips are also unstructured data.
We can also categorize data as semi-structured, which contains portions of both structured and unstructured data. For example, an email has structure in that it contains structured information regarding the date it was sent, who sent it, whom it was sent to, what the subject is, whether it has attachments, and so on. However, the body of the email contains unstructured data. As machines work well with both structured and unstructured data, they work well with semi-structured data too.
No matter the type of data, it must be understood that machines can only work on numbers. So, any data a machine needs to understand must be presented to it as numbers. In this book, we will discuss various techniques for converting non-numeric data to numbers without losing context, allowing machines to learn from the result. These discussions are spread across the remaining chapters as we work through the different problems to be solved by machines.
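As a small taste of such techniques, here is one common approach, one-hot encoding with pandas. This is only an illustration on a made-up column; later chapters may use different encodings.

```python
import pandas as pd

# A toy categorical column: one-hot encoding replaces the text category
# with one 0/1 indicator column per category value
df = pd.DataFrame({'Cloudy': ['YES', 'NO', 'YES']})
encoded = pd.get_dummies(df, columns=['Cloudy'])
print(encoded)  # columns Cloudy_NO and Cloudy_YES, purely numeric
```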
Types of machine learning
Machine learning can be classified into two main types. They are Supervised learning and Unsupervised learning.
In Supervised learning, we can perform two tasks: regression and classification.
In Unsupervised learning, we can do two tasks: clustering and dimensionality reduction.
There is a special case of Clustering tasks called Anomaly Detection.
Figure 1.14 summarizes all types of machine learning and the tasks that can be performed:
Figure 1.14: Types of machine learning
Some people also consider Reinforcement learning to be a third type of machine learning, while others argue that Reinforcement learning is approximate dynamic programming.
Let us discuss each type of machine learning in more detail. However, this book focuses on the classification task, a type of supervised learning.
Supervised learning
In Supervised learning, the machine is provided data along with labels. The machine learns based on the data and the associated labels and then makes inferences. So, we are providing the machine with prior knowledge, and then after the machine learns from this knowledge, it can make decisions within the boundaries of this provided knowledge.
Labels are the analysis of the data as determined by humans. For example, if we want the machine to learn to differentiate between images of dogs and cats, we need to provide images of dogs and cats to the machine, along with labels stating which are images of dogs and which are images of cats. Suppose we want the machine to predict the marks in an exam. In that case, we need to provide historical data along with labels stating how many marks were obtained under the circumstances described in the data.
The bottom line in Supervised learning is that we provide existing knowledge to the machine and expect the machine to find patterns in the provided knowledge and make rules that the machine can use to answer future questions asked on the same subject.
Let us understand this with an example. Consider that we want the machine to be able to detect spam emails. So, we gather the data as shown in Table 1.1:
Table 1.1 : Example dataset of emails for spam detection
In this example, the dataset in Table 1.1 contains only 6 data points. In real situations, datasets have thousands or even millions of data points. Nevertheless, the dataset contains data in 4 variables: Contains spelling mistakes, Contains the word Urgent, Contains the word ASAP, and Contains a link to click. In machine learning parlance, these variables are called Independent Variables. Each data point has a value for these four variables. In normal circumstances, experts would have studied real emails and gathered these four characteristics for each email. Apart from collecting data regarding the characteristics of the emails, the experts would also assign a label stating whether each email is benign or spam. The variable we refer to as the label is also called the dependent variable in machine learning parlance.
In Supervised learning, the machine forms patterns from the independent variables while considering the associated dependent variable. From the patterns emerges a rule that the machine will use when given new values for the independent variables.
The preceding example is a Classification problem where the machine needs to decide whether an email is benign or spam. This type of Classification problem is called a Binary classification problem, as the machine must decide between two options or classes.
There are classification problems where the machine needs to choose between more than two classes. Such classification problems are called Multi-class classification problems.
Implementation of classification on the example data provided in Table 1.1 is as follows:
import pandas as pd
df = pd.DataFrame([['NO', 'NO', 'NO', 'YES', 'Benign'],
['NO', 'NO', 'NO', 'NO', 'Benign'],
['YES', 'NO', 'YES', 'NO', 'Spam'],
['NO', 'YES', 'YES', 'YES', 'Spam'],
['YES', 'NO', 'NO', 'YES', 'Benign'],
['YES', 'YES', 'YES', 'YES', 'Spam']
],
columns = ['ContainsSpellingMistakes', 'ContainsUrgent', 'ContainsASAP', 'ContainsLink', 'Label']
)
df
ContainsSpellingMistakes ContainsUrgent ContainsASAP ContainsLink Label
0 NO NO NO YES Benign
1 NO NO NO NO Benign
2 YES NO YES NO Spam
3 NO YES YES YES Spam
4 YES NO NO YES Benign
5 YES YES YES YES Spam
X = df.drop('Label', axis = 1, inplace = False)
y = df['Label']
print(X, '\n\n', y)
ContainsSpellingMistakes ContainsUrgent ContainsASAP ContainsLink
0 NO NO NO YES
1 NO NO NO NO
2 YES NO YES NO
3 NO YES YES YES
4 YES NO NO YES
5 YES YES YES YES
0 Benign
1 Benign
2 Spam
3 Spam
4 Benign
5 Spam
Name: Label, dtype: object
from sklearn.preprocessing import LabelEncoder
# Convert all data to numbers
# Note: the same encoder is refit on each column in turn; this works here
# because every column has the same two categories, NO and YES
leX = LabelEncoder()
XL = X.apply(leX.fit_transform)
leY = LabelEncoder()
yL = leY.fit_transform(y)
print(XL, '\n\n', yL)
ContainsSpellingMistakes ContainsUrgent ContainsASAP ContainsLink
0 0 0 0 1
1 0 0 0 0
2 1 0 1 0
3 0 1 1 1
4 1 0 0 1
5 1 1 1 1
[0 0 1 1 0 1]
from sklearn.linear_model import LogisticRegression
# Build Model
lr = LogisticRegression()
lr.fit(XL, yL)
# Prepare Test Data
testData = ['NO', 'YES', 'NO', 'YES']
Xtest = leX.transform(testData)
prediction = lr.predict(Xtest.reshape(1, -1))
print('Prediction =', leY.inverse_transform(prediction))
Prediction = ['Benign']
Take another example. Suppose we have the temperature of a city, say Bengaluru, for every day over many years. We have three attributes, that is, the date, whether it was cloudy on that date, and the temperature on that date, as shown in Table 1.2:
Table 1.2 : Example dataset of temperatures in a city
Suppose we have this data from 01-Jan-2001 till 31-Dec-2015, and we want to know what the temperature will be on 25-Oct-2022. We should be able to predict it using a machine learning system with a Regression model. In Regression problems, we need historical data for all the independent variables and the associated dependent variable. In the example in Table 1.2, the date and whether it was cloudy are the independent variables, or features. From some independent variables, we can derive many more. For example, from the feature date in Table 1.2, we can derive other independent variables like the month and the day of the year. So, instead of using the date itself as an independent variable, we could use the month and the day of the year as our features. Generating independent variables, or features, from existing independent variables is called Feature Engineering (refer to Table 1.3).
Table 1.3 : Feature Engineered dataset of temperatures in a city
The temperature is the dependent variable, or target variable. Given this data, we want the machine to learn the patterns and create a rule. Then, given any future date and whether it is cloudy, the machine should predict the temperature on that day. So, this is also Supervised learning.
A regression implementation on the example data provided in Table 1.2 is as follows:
import pandas as pd
df = pd.DataFrame([['01-01-2001', 'YES', 14.3],
['01-02-2001', 'NO', 13.7],
['01-03-2001', 'NO', 13.6],
['01-04-2001', 'YES', 14.3],
['01-05-2001', 'NO', 14.2],
['01-06-2001', 'YES', 12.8],
['01-07-2001', 'NO', 14.7],
['01-08-2001', 'NO', 11.3],
['01-09-2001', 'NO', 11.7],
['01-10-2001', 'NO', 12.1],
],
columns = ['Date', 'Cloudy', 'Temperature']
)
df
Date Cloudy Temperature
0 01-01-2001 YES 14.3
1 01-02-2001 NO 13.7
2 01-03-2001 NO 13.6
3 01-04-2001 YES 14.3
4 01-05-2001 NO 14.2
5 01-06-2001 YES 12.8
6 01-07-2001 NO 14.7
7 01-08-2001 NO 11.3
8 01-09-2001 NO 11.7
9 01-10-2001 NO 12.1
import datetime
import numpy as np
from sklearn.preprocessing import LabelEncoder
# Feature Engineering
# Get Month and Day of the Year
df['Month'] = pd.to_datetime(df['Date']).dt.month
referenceDate = np.array([datetime.datetime(2001, 1, 1)] * len(df))
# Days elapsed since 01-Jan-2001 (for 2001 dates, the day of the year minus 1)
df['DayOfYear'] = (pd.to_datetime(df['Date']) - referenceDate).dt.days
# Convert Cloudy to numbers
leC = LabelEncoder()
df['Cloudy'] = leC.fit_transform(df['Cloudy'])
df
Date Cloudy Temperature Month DayOfYear
0 01-01-2001