Machine Learning Concepts

Chapter 2

TensorFlow.js enables web developers to build and run machine learning solutions using JavaScript in the web browser. Even if you have never created a machine learning solution before, this chapter introduces the concepts that will enable you to use TensorFlow.js to build machine learning and deep learning solutions for the browser.

Machine learning and deep learning form the bulk of the topics that comprise Artificial Intelligence. The illustration in Figure 2-1a is nothing new, but it is a good segue into the conversation.

If you are confused about the various terms you come across when reading about artificial intelligence (AI), refer to Figure 2-1b. I displayed 'Robotics' and 'Expert Systems' in italics because they do not fall within the scope of our discussion.

Machine Learning versus Deep Learning

Given a dataset required for training a machine learning model, the data scientist does not arbitrarily decide which algorithm to use to solve the problem at hand. This is analogous to a programmer picking a sorting algorithm to sort an array of numbers. She (the programmer) does not blindly choose the sorting mechanism; instead, she looks at the size of the data first and, weighing complexity and efficiency, picks an algorithm that orders the data in the array. Similarly, the data scientist picks an algorithm based on its efficiency and accuracy (we will look at how to calculate the efficiency of a machine learning algorithm later in the chapter).

Deep learning is simply a subset of machine learning, and unless otherwise explicitly stated, the term Machine Learning encompasses Deep Learning, as shown in Figure 2-1. For the same reason, this book is titled ‘Machine Learning with TensorFlow.js’ (and not ‘Deep Learning with TensorFlow.js’), even though the TensorFlow.js library uses deep learning to develop solutions.

Solution Building Blocks

The following sub-sections cover the building blocks of a machine/deep learning solution. These building blocks are captured in the end-to-end solution involving learning from data as shown in Figure 2-2.

Problem

In Figure 2-2, you start with a Problem, which depends on the data you have. Sometimes it is the available data that dictates the problem; in other cases, the problem at hand dictates the data to be collected. The data is collected from public or private sources, or both.

Dataset

Once all the data is collected, it is explored, cleansed, and wrangled to produce a Dataset. While a discussion of Exploratory Data Analysis or EDA (data exploration, cleansing, and wrangling) is beyond the scope of this chapter, and indeed this book, both client-side (math.js) and server-side (Pandas, NumPy, SciPy, etc.) libraries exist for this task. Figure 2-3 shows the various sources from which data is collected to train a machine learning algorithm.

Features

Although not included in Figure 2-2, a Feature is an attribute or field of data that can be passed through a machine learning algorithm to generate an outcome. Features are not the same as columns, since not all columns need to be submitted to an ML algorithm (e.g., Last Name, Gender, etc.). Features typically comprise atomic values that can be classified as either continuous or categorical in nature.

  • Continuous values are attributes or features made up of uninterrupted numeric quantities, such as Price, Weight, Age, etc.

  • Categorical values are attributes or features that contain categories or classes that apply to the rows (records or observations), such as Account Type, Gender, City, etc.

The concept of continuous and categorical values is captured in Figure 2-4 below.

This can be further demonstrated using some example data shown in Figure 2-5.

In the data presented in Figure 2-5 above, _id, latitude_reg, longitude_reg, elevation, and score comprise Continuous values, whereas type, iso_region, municipality, and scheduled_service are Categorical values or features.

Experiment

To come up with an optimal solution, data scientists and/or machine learning engineers apply the dataset to different ML algorithms and compare the outcomes. Running the dataset through a host of algorithms and comparing the results is known as a machine learning Experiment; the algorithm that produces the most accurate and efficient outcome is the one used in the final ML model.

Note A machine learning experiment involves splitting the data into two sets, training and test. The test set is what determines the accuracy of an algorithm.

Outcome

The Outcome of a machine learning experiment is the output with the highest accuracy and efficiency, selected from the results of the different algorithms run against the training data and the value to be predicted from that data.

Note The data required to produce an outcome can come in the following formats, of which only the first, i.e., Tabular, is an example of Structured data; the others are examples of Unstructured data:

  • Text (Sequential): Unstructured data for natural language processing (NLP) tasks.

  • Sound (Sequential): Audio data for classification, speech synthesis, voice recognition, etc.

  • Video (Sequential): Data to analyze video in real-time (streaming) or static formats.

  • Image (Non-sequential): Visual data for object detection or computer vision tasks.

  • Numeric (Sequential and Non-sequential): Data that includes decimal and non-decimal values.

  • Timeseries (Sequential): Data that must appear in a specific sequence to make sense. Timeseries data is numeric.

Tip We will primarily focus on numeric values in our discussion. However, for a machine learning algorithm to act on the information provided to it, the data must first be converted to a numeric value (either continuous or categorical). The conversion from its original form is part of Exploratory Data Analysis (EDA), which also involves data cleaning and munging to bring together data from multiple sources and producers, as shown in Figure 2-6a.
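
To make the conversion idea concrete, here is a minimal JavaScript sketch of one-hot encoding, a common way of turning a categorical feature into numeric values. The categories and the oneHot helper are hypothetical examples of mine, not part of any library.

// Convert a categorical value into an array of 0s and 1s (one-hot encoding)
const categories = ['small_airport', 'heliport', 'seaplane_base'];

function oneHot(value, categories) {
   // 1 at the position of the matching category, 0 everywhere else
   return categories.map((c) => (c === value ? 1 : 0));
}

console.log(oneHot('heliport', categories)); // [0, 1, 0]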

Machine Learning Prologue

If you have dabbled in machine learning before, you may already have some idea of the types of algorithms that exist for the data problems encountered by an enterprise. If you know absolutely nothing about machine learning, Figures 2-7a and 2-7b are good places to start. You can learn more about these problem types and algorithms in the following sections.

Note The data to train a machine learning model can be categorized as follows:

  • Labeled: Refers to data that has been manually (by a human operator) flagged or annotated, e.g., flagging each customer record of a bank to denote whether a loan would be paid off.

  • Unlabeled: Mainly used for the analysis of data or information. Though unlabeled, machine learning algorithms can process and separate the data into multiple clusters or groups depending on its properties, e.g., customer segmentation.

  • Partial: Only some of the data in the dataset is labeled, while the rest is not. Algorithms that train on partially labeled data make it possible to use data that is easy to collect but only partially annotated.

Problem Types

The algorithm to train depends on the type of problem presented to the organization. These machine learning algorithms are covered in more detail in the next section, but the problems the algorithms are designed to address fall into the following broad classes:

Categorization (Classification)

Whenever a predicted outcome can be defined by a categorical variable, the problem can be described as a Classification problem. In other words, the data scientist or machine learning engineer first categorizes or classifies the training data manually as belonging to a particular category, e.g., cat or not cat; cat, dog, or neither; etc. Categorization or Classification is of two types:

  1. Binary Classification: Pick one from two classes or categories, e.g., iPhone or Not iPhone, Aerodynamic or Not Aerodynamic, etc.

  2. Multi-class Classification: Select one from many classes or categories, e.g. emotion classification, identification of animals in a Serengeti ecosystem, etc.
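
As a preview of how these two flavors of classification surface in TensorFlow.js (the library's syntax is covered properly in Chapter 4), the sketch below shows the typical last layer for each; the unit counts are illustrative assumptions, not prescriptions.

// Minimal sketch, assuming TensorFlow.js is loaded as `tf`
// Binary classification: one output neuron with a sigmoid activation
const binaryOutput = tf.layers.dense({ units: 1, activation: 'sigmoid' });

// Multi-class classification (e.g., 5 classes): one neuron per class,
// with a softmax activation so the outputs form a probability distribution
const multiClassOutput = tf.layers.dense({ units: 5, activation: 'softmax' });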

Prediction or Forecasting (Regression)

Regression problems in machine learning are defined by a numeric (continuous variable) predicted outcome. Examples include predicting a stock price based on historic data, or a sports team's final score based on playing history.

Grouping (Clustering)

A Clustering problem in machine learning pertains to an unsupervised learning scenario characterized by predicting the category or group an outcome will fall into, where the categories are not known in advance. This is in sharp contrast to a classification problem, where the data scientist knows the number and nature of the categories before running the machine learning model. Clustering problems are typically used to identify anomalies in data; examples include customer or product segmentation, city planning, natural disaster studies, etc.

Conversion (Embedding)

Embedding problems and algorithms in machine learning are mainly used for creating numeric variables from existing text values. Since machine learning algorithms work better with numerical values than with text, Embedding allows textual columns or features to be converted into continuous or categorical values. Embedding or Conversion in machine learning can also refer to dimensionality reduction, i.e., reducing or eliminating features in a dataset that are not needed.

Creation (Generative)

When machine learning is used to generate an output, i.e., images, audio/video files, text, etc., such problems and algorithms are said to be Generative. This is in contrast with Discriminative modeling, where the desired outcome is to discriminate or classify using a classification model. Examples of generative models include those where machine learning is used to generate outcomes that did not exist before the algorithm was run, e.g., creating new human faces based on facial data, or writing new poetry based on past verses.

Simulation (Reinforcement Learning)

Simulation or Reinforcement Learning is characterized by an agent that performs an action and gets a reward as a result of that action. The reward signals to the agent whether or not the action was successful.

Note Although not mentioned in Figure 2-7b, there are two problem types i.e. ‘Simulation (Reinforcement Learning)’ and ‘Recognition (Detection)’ that use supervised/semi-supervised learning and sequential data. Recognition is used to create recurrent neural networks (RNN) for speech recognition, machine translation, classification, regression, and generative tasks. Both problem types are captured in Figures 2-8a and 2-8b.

The problem types discussed above are illustrated in Figure 2-8a below:

I took the liberty of combining the data types (mentioned earlier in Figure 2-6b) with each problem type in the above diagram.

Tip Regardless of the machine learning problem you are trying to solve, you have to train an algorithm with the available data to create a prediction model. Training an ML model involves splitting the data into two parts, training and test (the training set is larger than the test set); the algorithm or model is first trained with the training data, and then tested for prediction accuracy with the test data. Refer to Figure 6-4 for a visual illustration.
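
A minimal JavaScript sketch of such a split follows, assuming the data has already been shuffled; the 80/20 ratio is a common convention of mine here, not a hard rule.

// Split an array of records into a training set and a test set
function trainTestSplit(data, trainFraction = 0.8) {
   const cutoff = Math.floor(data.length * trainFraction);
   return {
      train: data.slice(0, cutoff), // used to fit the model
      test: data.slice(cutoff)      // held back to measure accuracy
   };
}

const { train, test } = trainTestSplit([...Array(100).keys()]);
console.log(train.length, test.length); // 80 20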

Algorithms

Before we can embark on our TensorFlow.js journey, let us first get a feel for the machine learning algorithms out there. This is, by no stretch of the imagination, an exhaustive list of algorithms. The intention here is to help you understand the categories of machine learning problems you will be faced with, and the types of algorithmic solutions you will be required to come up with.

Note Most existing texts on machine learning classify algorithms as supervised, unsupervised, or semi-supervised. While it is absolutely okay to categorize ML algorithms this way, machine learning problems are rarely simple enough for an algorithm to be selected on the available dataset alone, and significant effort is usually required to wrangle the data to fit the chosen algorithm.

The machine learning algorithm types in Figure 2-7b and the problem types from Figure 2-8a can be combined to form the illustration in Figure 2-8b.

The above diagram can also be accessed on the book's GitHub page, and is referenced again in Chapter 6. It is, however, not a list of all machine learning algorithms. Also note that the algorithm names with an asterisk (*) are the ones modeled using deep learning.

Note To view some problems or use-cases pertaining to machine learning, refer to Figure 6-4a, a diagram similar to the ones above in Figures 2-8a and 2-8b.

Deep Learning Components

Deep learning is a subset of machine learning (see Figure 2-1) in that it uses artificial neural networks (ANNs) to make sense of the data provided to it. Deep learning is composed of the following components:

Neural Networks

Neural networks comprise neurons that transform data to solve a problem. Artificial neural networks (ANNs) borrow their terminology from the workings of the human brain, and that is the extent of their link with the genetic wiring inside the human body. Before continuing the discussion on neural networks, take a look at the illustrations below:

  1. The illustrations in the above three figures are all neural networks, inspired by the human brain.

  2. Each node in the network is called a Neuron, depicted here as a square, although it is also commonly denoted by a circle in neural network diagrams online and offline.

  3. Each neural network has only one input layer and only one output layer.

  4. The total number of layers does not include the input layer; therefore, the input layer is known as Layer 0 in the above diagrams.

  5. Each input in the input layer is called a Feature, whereas each output in the output layer represents a single yield or output of the problem, e.g. the four neurons in the input layer correspond to four features, and the two outputs refer to two yields produced by the model (Figure 2-9c).

  6. A neural network can have zero (Figure 2-9a), one (Figure 2-9b), or N (Figure 2-9c) hidden layers.

  7. If a neural network has more than one hidden layer, the model is said to be a Deep Neural Network, using Deep Learning to produce the output (Figure 2-9c).

  8. If each neuron in a layer of the neural network is connected to every neuron in the previous layer, the current layer is said to be a Dense layer.
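
As a quick preview (TensorFlow.js syntax is covered in Chapter 4), a small deep network like the one in Figure 2-9c might be sketched as follows; the unit counts are illustrative assumptions on my part.

// Minimal sketch, assuming TensorFlow.js is loaded as `tf`
const model = tf.sequential();
// Hidden layer 1: dense, i.e., connected to every input neuron;
// inputShape: [4] corresponds to four input features
model.add(tf.layers.dense({ units: 5, activation: 'relu', inputShape: [4] }));
// Hidden layer 2: connected to every neuron in hidden layer 1
model.add(tf.layers.dense({ units: 5, activation: 'relu' }));
// Output layer: two outputs (yields)
model.add(tf.layers.dense({ units: 2 }));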

Weights

  1. Every neuron connected to the next higher layer has a particular non-zero weight w associated with that connection.

  2. The weighted value of a neuron is equal to its value x times the weight w of its link to the next higher layer.

  3. Not all neurons in a layer are active.

  4. Given the three neurons in a layer in Figure 2-10, i.e., x1, x2, and x3, the output is the sum (also known as the weighted sum) of the value of each neuron in that layer multiplied by its weight, i.e., w1, w2, and w3 (also refer to Listing 2-1 and the short sketch below).
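
To make the weighted sum concrete, here is a minimal JavaScript sketch with made-up neuron values and weights:

// Weighted sum of a layer: x1*w1 + x2*w2 + x3*w3 (values are made up)
const x = [1.0, 2.0, 3.0];   // neuron values
const w = [0.5, -1.0, 0.25]; // weights

const weightedSum = x.reduce((sum, xi, i) => sum + xi * w[i], 0);
console.log(weightedSum); // 0.5 - 2 + 0.75 = -0.75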

Bias

  1. Each layer in a neural network model includes a constant value known as a Bias, which acts as the intercept in a linear equation model.

  2. Each Bias value is associated with exactly one layer in a neural network model.

  3. After adding the Bias value, Figure 2-10 can be redrawn as shown below in Figure 2-11.

Activation Function

  1. Not all neurons in a layer of a neural network model are activated.

  2. The Activation Function determines whether neurons in a layer are fired or not.

  3. Activation Functions can be categorized as follows, and are illustrated in Figure 2-12a:

    1. Binary Step Function

    2. Linear Activation Function

    3. Non-linear Activation Function

      1. Sigmoid

      2. TanH (Hyperbolic Tangent)

      3. ReLU (Rectified Linear Unit)

      4. Leaky ReLU

      5. Parametric ReLU

      6. Softmax

      7. Swish

  4. An activation function exists in every layer of a neural network, but it is the activation function that produces the output of the neural network (the last-layer activation) that determines the prediction result produced by the model.

  5. After adding the Activation Function, Figure 2-11 can be redrawn as illustrated in Figure 2-13.

Note In case you are wondering why Figure 2-13 follows Figure 2-12a, be advised that the letters in diagram numbers were used to group similar illustrations together, and Figures 2-12b, 2-12c, and 2-12d are included in this chapter.

Learning Rate

  1. Learning Rate is a scalar value that tells a model how quickly or slowly it learns.

  2. A small learning rate means that the model takes a long time to learn, and a large learning rate implies that the model learns very quickly.

  3. The trick is to pick a learning rate that is neither too high nor too low.

  4. A learning rate is specified for a model at the time of creation when an optimizer (explained in the next section) is defined. Please note that the learning rate is optional for some optimizers and mandatory for others.

Epochs

  1. Deep learning is an iterative process and each training iteration is known as an epoch.

  2. In other words, an Epoch is one iteration where the model learns from the entire training dataset.

Optimizer

  1. An Optimizer is an algorithm that is used to update the weights on a neural network model.

  2. The optimizer is based on the loss function (explained in the next section).

  3. The various optimizers are listed in Figure 2-12b below:
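
As a hedged preview of how an optimizer and its learning rate are defined in TensorFlow.js (the actual syntax is covered in Chapter 4), the sketch below uses two optimizers the library provides; the learning-rate value is an illustrative assumption.

// Minimal sketch, assuming TensorFlow.js is loaded as `tf`
// SGD requires an explicit learning rate...
const sgd = tf.train.sgd(0.01);
// ...whereas Adam supplies a sensible default if none is given
const adam = tf.train.adam();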

Loss/Cost Function

Before going into the details of the loss function, refer to Figure 2-14.

  1. The Objective Function or criterion has to be either minimized or maximized, depending on whether its value is positive or negative (see Figure 2-14).

  2. When it needs to be minimized, the objective function is also called a loss function or a cost function, and the value of the loss function is known as a loss.

  3. Conversely, when it is negative and needs to be maximized, the objective function or criterion is called a reward function.

  4. Figure 2-12c lists the loss functions used by a neural network model.
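
As one concrete example, here is a minimal JavaScript sketch of mean squared error (MSE), one of the most common loss functions for regression problems; the sample values are made up.

// Mean squared error: the average of (prediction - target)^2
function meanSquaredError(predictions, targets) {
   const total = predictions.reduce(
      (sum, p, i) => sum + (p - targets[i]) ** 2, 0);
   return total / predictions.length;
}

console.log(meanSquaredError([2.5, 0.0], [3.0, -0.5])); // (0.25 + 0.25) / 2 = 0.25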

Metrics

  1. Metrics specify how a neural network model will be evaluated.

  2. The choice of a metric depends on the type of machine learning problem being addressed.

  3. A list of metrics is shown in Figure 2-12d.

Note Do not be troubled if you do not yet understand some of the components of deep learning, or have trouble seeing why a certain building block is needed. All concepts pertaining to TensorFlow.js and machine learning in the web browser will be explained in due time.

Putting the Pieces Together

The components of deep learning and neural networks explained in the previous sections culminate in the following equation for processing each layer in a neural network model.

Y = σ(∑(xi · wi) + b)

where
   x = Neuron Value
   w = Weight
   b = Bias
   σ = Activation Function

Pseudo-Code

Listings 2-1a and 2-1b present the above equation in pseudo-code.

Listing 2-1a: Pseudo-Code for creating a Layer in a Neural Network model
CREATE_LAYER(NEURONS, WEIGHTS, BIAS, ACTIVATION_TYPE)
/* 
   NEURONS = Values of neurons in layer (array), 
   WEIGHTS = Assigned weights of neurons (array), 
   BIAS = Layer bias,
   ACTIVATION_TYPE = Type of activation function to use
*/
BEGIN
   Set NO_OF_NEURONS := LENGTH(NEURONS) /* Number of neurons in layer */
   Set NO_OF_WEIGHTS := LENGTH(WEIGHTS) /* Number of neuron weights  */

   IF NO_OF_NEURONS <> NO_OF_WEIGHTS
      ERROR('Invalid number of weights for the number of neurons in layer.')

   Set LAYER := 0 /* Initialize layer value total to zero */

   FOR I = 1 TO NO_OF_NEURONS /* For each neuron in layer */
   BEGIN
      Set LAYER := LAYER + (NEURONS[I] * WEIGHTS[I])
   END
   
   Set LAYER := LAYER + BIAS /* Add bias to total */
   Set LAYER := ACTIVATION_FUNC(LAYER, ACTIVATION_TYPE) /* Call activation 
                                                                function */

   Return LAYER /* Return calculated value of layer */
END
Listing 2-1b: Pseudo-Code for the Activation Function
ACTIVATION_FUNC(NUMERIC_VAL, ACTIVATION_TYPE, ALPHA=0)
/* 
   NUMERIC_VAL = Numeric input value for layer, 
   ACTIVATION_TYPE = Type of activation to use,
   ALPHA = Optional parameter; Only needed for Parametric ReLU activation
*/
BEGIN
   /* switch..case Statement */
   Case based on ACTIVATION_TYPE
   BEGIN
      CASE 'SIGMOID'
      BEGIN
         Set RESULT := (1 /(1 + CALC_EXPONENTIAL(-1 * NUMERIC_VAL)))
         /* 
            Example of exponential function call in Python
            https://trinket.io/python/6b658abf06 
         */
      END
      CASE 'TANH'
      BEGIN
         Set RESULT := (2 /(1 + CALC_EXPONENTIAL(-2 * NUMERIC_VAL))) -1
      END
      CASE 'RELU'
      BEGIN
         IF NUMERIC_VAL < 0
            Set RESULT := 0
         ELSE
            Set RESULT := NUMERIC_VAL
      END
      CASE 'LEAKY_RELU'
      BEGIN
         IF NUMERIC_VAL < 0
            Set RESULT := 0.01 * NUMERIC_VAL
         ELSE
            Set RESULT := NUMERIC_VAL
      END
      CASE 'PARAMETRIC_RELU' /* Also called Parameterized ReLU */
      BEGIN
         IF NUMERIC_VAL >= 0
            Set RESULT := NUMERIC_VAL
         ELSE
            IF ALPHA = 0 OR ALPHA = NULL
               ERROR('Alpha parameter value expected.')

            IF ALPHA = 0.01
               Set RESULT := ACTIVATION_FUNC(NUMERIC_VAL, 'LEAKY_RELU')
            ELSE
               Set RESULT := ALPHA * NUMERIC_VAL
      END
      CASE 'SOFTMAX'
      BEGIN
         Set TEMP := CALC_EXPONENTIAL(NUMERIC_VAL)
         Set RESULT := TEMP/CALC_SUMMATION(TEMP)
         /* 
            Example of summation function call in Python
            https://trinket.io/python/9e3f38bb25 
         */
      END
      CASE 'SWISH'
      BEGIN
         Set RESULT := NUMERIC_VAL/(1 + CALC_EXPONENTIAL(-1 * NUMERIC_VAL))
      END
      DEFAULT
      BEGIN
         ERROR('Unknown Activation Type.')
      END
   END

   Return RESULT /* Return result of activation function */
END
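
For readers who prefer runnable code, here is a minimal JavaScript sketch of the two listings above; the function names are mine, not a library's, and only two of the activation branches are shown.

// Weighted sum + bias, then activation: a direct translation of Listing 2-1a
function createLayer(neurons, weights, bias, activation) {
   if (neurons.length !== weights.length) {
      throw new Error('Invalid number of weights for the number of neurons.');
   }
   const weightedSum = neurons.reduce((sum, x, i) => sum + x * weights[i], 0);
   return activation(weightedSum + bias);
}

// Two of the activation functions from Listing 2-1b
const sigmoid = (v) => 1 / (1 + Math.exp(-v));
const relu = (v) => (v < 0 ? 0 : v);

console.log(createLayer([1, 2, 3], [0.5, -1, 0.25], 0.5, sigmoid)); // ~0.438
console.log(createLayer([1, 2, 3], [0.5, -1, 0.25], 0.5, relu));    // 0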

Lastly, the pseudo-code in Listing 2-1c makes use of the code in Listings 2-1a and 2-1b to create a neural network model.

Listing 2-1c: Pseudo-Code for creating a Neural Network model
CREATE_NEURAL_NETWORK_MODEL(INPUTS, OUTPUTS, LAYERS)
/* 
   INPUTS = Inputs to the model (array), 
   OUTPUTS = Outputs of the model (array),
   LAYERS = Hidden layers in the model (array)
*/
BEGIN
   INITIALIZE(MODEL)                             /* Initialize empty model */
   Set NO_OF_LAYERS := LENGTH(LAYERS)            /* Get number of layers */

   FOR I = 1 TO NO_OF_LAYERS                     /* For each layer */
   BEGIN
      INITIALIZE(LAYER)                          /* Initialize layer */
      Set LAYER := CREATE_LAYER(                 /* Call CREATE_LAYER */
                      LAYERS[I].NEURONS[],       /* Neurons in layer */
                      LAYERS[I].WEIGHTS[],       /* Weights of neurons */
                      LAYERS[I].BIAS,            /* Layer bias */
                      LAYERS[I].ACTIVATION_TYPE  /* Type of activation */
                   )
      IF I = 1                    /* If first hidden layer of model */
         ADD_MODEL_INPUTS(LAYER)  /* Add model inputs to layer */

      IF I = NO_OF_LAYERS         /* If last hidden layer of model */
         ADD_MODEL_OUTPUTS(LAYER) /* Add model outputs to layer */

      ADD_TO_MODEL(LAYER, MODEL)  /* Add layer to neural network model */
   END

   Return MODEL /* Return neural network model */
END

Tip How these theoretical aspects translate into TensorFlow.js is discussed in Chapter 4, in the section titled ‘TensorFlow.js Syntax’. Also, refer to the JavaScript code snippets in Listings 4-4a, 4-4b, and 4-5 to learn the syntax used by the TensorFlow.js library.

Once a neural network has been created, we can compile it, train it, and predict a result, as shown in the pseudo-code in Listing 2-2 below.

Listing 2-2: Pseudo-Code for processing a model to make a prediction
PROCESS_NEURAL_NETWORK(MODEL, NEW_INPUT, EPOCHS, 
                       OPTIMIZER_STRING, LEARNING_RATE=0, LOSS_STRING)
/* 
   MODEL = A neural network model, 
   NEW_INPUT = A new input for making a prediction using the model,
   EPOCHS = A numeric value representing the number of epochs,
   OPTIMIZER_STRING = A string value representing the optimizer,
   LEARNING_RATE = A numeric learning rate for the model,
   LOSS_STRING = A string value representing the loss
*/
BEGIN
   IF OPTIMIZER_STRING IS INVALID_OPTIMIZER_NAME
      ERROR('Invalid Optimizer Name.')

   IF (OPTIMIZER_STRING = 'SGD' OR 
       OPTIMIZER_STRING = 'MOMENTUM' OR 
       OPTIMIZER_STRING = 'ADAGRAD' OR 
       OPTIMIZER_STRING = 'RMSPROP')
       AND LEARNING_RATE <= 0
          ERROR('Learning Rate expected.')

   Set OPTIMIZER := SET_LEARNING_RATE(OPTIMIZER_STRING, LEARNING_RATE)

   COMPILE(MODEL, OPTIMIZER, LOSS_STRING)
   TRAIN(MODEL, EPOCHS)
   Set RESULT := PREDICT(MODEL, NEW_INPUT)

   Return RESULT /* Return result of prediction */
END
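
For comparison, a hedged TensorFlow.js sketch of the same compile-train-predict sequence might look like the following; the tensor shapes and hyperparameter values are illustrative assumptions, and the real syntax is covered in Chapter 4.

// Minimal sketch, assuming TensorFlow.js is loaded as `tf` and `model`
// was built as in Listing 2-1c
async function processNeuralNetwork(model, newInput) {
   model.compile({
      optimizer: tf.train.sgd(0.01),  // optimizer with a learning rate
      loss: 'meanSquaredError'        // loss function as a string
   });
   // Made-up training data: xs are inputs, ys are the expected outputs
   const xs = tf.tensor2d([[1], [2], [3], [4]]);
   const ys = tf.tensor2d([[2], [4], [6], [8]]);
   await model.fit(xs, ys, { epochs: 100 }); // train for 100 epochs
   return model.predict(newInput);           // predict on a new input
}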

A developer may choose to evaluate the model using test data. The pseudo-code in Listing 2-3 shows how a machine learning model is evaluated. While evaluation is not mandatory for predicting an outcome, it is highly recommended. Also, the compile step is necessary for evaluation, since it validates the model before evaluation and/or prediction.

Listing 2-3: Pseudo-Code for evaluating a model
EVALUATE_NEURAL_NETWORK(MODEL, NEW_INPUT, OPTIMIZER_STRING,  
                        LEARNING_RATE=0, LOSS_STRING)
/* 
   MODEL = A neural network model, 
   NEW_INPUT = A new input for evaluating the model,
   OPTIMIZER_STRING = A string value representing the optimizer,
   LEARNING_RATE = A numeric learning rate for the model,
   LOSS_STRING = A string value representing the loss
*/
BEGIN
   Set OPTIMIZER := SET_LEARNING_RATE(OPTIMIZER_STRING, LEARNING_RATE)

   COMPILE(MODEL, OPTIMIZER, LOSS_STRING)
   Set TEST_RESULT := EVALUATE(MODEL, NEW_INPUT)

   Return TEST_RESULT /* Return the result of the evaluation */
END

Flowcharts

The pseudo-code in each of the above listings is illustrated visually in the flowcharts below.

Note Just to reiterate, many of the above topics are revisited later in the book. It is sufficient to understand these concepts only partially at this stage, as long as you understand the components that make up a deep learning solution.

Note See a larger version of the above diagram below (divided into 3 parts).

Note The flowcharts are meant for understanding the sequence of steps in an algorithm, so there is not necessarily a one-to-one relationship between the pseudo-code and the flowchart diagrams. I followed a programming-like naming convention for the function calls in the flowcharts to make them easier to understand. You are perfectly fine using plain words to define the function calls, as long as you follow the correct sequence of statements in your flowchart diagrams.

Other Concepts

Forward-Propagation and Back-Propagation

For any neural network, say the ones shown in Figures 2-9b and 2-9c, calculations are done from left to right, starting from the inputs, through the hidden layers in sequence, to the outputs. This is known as forward-propagation, as shown in Figure 2-18a below.

However, training a neural network involves multiple iterations and includes updating the weights in the hidden layer(s) once it is determined that the given weights produce an error (i.e., do not produce the desired result or outcome). To update the weights, a reverse approach called backpropagation is taken, where the calculation starts at the output layer and works back up to the input layer, as shown in Figure 2-18b.

Note The above oversimplifies forward- and backward-propagation. The intent here is just to introduce you to the terms. If you want to learn more in depth, I recommend online resources.

Overfitting and Underfitting

The whole point of training a neural network with available data is to make predictions using completely new data (preferably from the same source), and the accuracy of those predictions determines how effective a machine learning model is. Sometimes, though, a model is too accurate in predicting the outcomes of the training data (but not the test and new data) because the data scientist or machine learning engineer has made the model fit the available data too well.

The concept of overfitting is shown in Figure 2-19a, where the training data (depicted using solid-black circles) exactly coincides with the created model (shown using a solid-black line). However, the test data, and any new data that the created model has not seen before, fall completely outside the model.

The opposite of overfitting in machine learning is underfitting, shown in Figure 2-19b, where the model does not even fit the training data used to create it. Such a model does not fit any new data either, leading to inaccuracy and poor performance metrics.

To strike a balance between overfitting and underfitting, or simply to reduce overfitting in the model, the following approaches may be used:

  1. Increase the amount of training data without decreasing the volume of test data by gathering more data values preferably from the same source(s).

  2. Reduce the complexity of the network by decreasing the number of units or neurons, condensing the weight values, and/or reducing the number of layers in the model.

Dropout

To prevent overfitting in a machine learning model, dropout is used. Dropout reduces a model's complexity by switching off certain units or neurons during training; it can apply to the visible (input) layer or any of the hidden layers, but not the output layer. The 'switching off' of neurons is also called 'ignoring', and the neurons to be ignored are selected at random. The neural network model from Figure 2-9b is shown in Figure 2-20 with dropout, where the third neuron in the input layer and the second and fifth neurons in the hidden layer are ignored to limit complexity during one epoch of training; a different, random dropout is used in the other iterations or epochs.
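
In TensorFlow.js, dropout is available as a layer; a hedged sketch follows, where the rate value and unit counts are illustrative assumptions.

// Minimal sketch, assuming TensorFlow.js is loaded as `tf`
const model = tf.sequential();
model.add(tf.layers.dense({ units: 5, activation: 'relu', inputShape: [4] }));
// Randomly ignore 20% of the previous layer's neurons during each
// training step; dropout is inactive at prediction time
model.add(tf.layers.dropout({ rate: 0.2 }));
model.add(tf.layers.dense({ units: 2 }));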

Synopsis

Before I end this chapter, I would like to summarize the components that make up a neural network model in the following table:

Component             Mandatory or Optional    Defined For
-------------------   ----------------------   --------------
Input                 Mandatory                Neural Network
Output                Mandatory                Neural Network
Layer                 Mandatory                Neural Network
Neuron                Mandatory                Layer
Weight                Mandatory                Neuron
Bias                  Mandatory                Layer
Activation Function   Mandatory                Layer
Epoch                 Mandatory                Neural Network
Optimizer             Mandatory                Neural Network
Learning Rate         Optional                 Neural Network
Loss                  Optional                 Neural Network
Metrics               Optional                 Neural Network

Key Takeaways

This chapter looked into the underlying topics of machine learning, deep learning, and neural networks, covering the following:

  • Building blocks for deep learning solutions

  • Problem types and algorithms for machine learning

  • Components of a neural network model
