End-to-End ML Solutions using TensorFlow.js

Chapter 6

The process followed in using TensorFlow.js is illustrated in Figure 5-1 and explained in Chapter 5 in the section titled ‘TensorFlow.js Process’. Step 4 of the process was explained in detail in the last chapter. Steps 1, 2, and 3 are covered in this chapter, whereas step 5 is covered in Chapter 11.

In a server-side machine learning scenario, these steps may be handled by different people or even different teams; in end-to-end development of a solution on the client side in a web browser using TensorFlow.js, the steps are typically performed by a single developer or machine learning engineer.

To train a model that can then be used to make predictions or inferences, a web browser script must first follow the steps below. Each of these steps is shown in Figure 5-1 and explained briefly in Chapter 5.

Script Setup

Although covered in Chapter 3, this section explains the setup in slightly greater detail. To create a web page in Visual Studio Code and use TensorFlow.js, follow these steps:

  1. Create file:

    1. Create a folder in Windows Explorer.

    2. Create a new file by going to File > New File in the top menu in Visual Studio Code.

    3. Save the file by hitting Ctrl+S on the keyboard, navigate to the created folder, specify a file name with the .html extension, and hit the Enter key on your keyboard.

  2. Create new workspace:

    1. Navigate to the created folder in Visual Studio Code.

    2. Create and save a workspace file by going to File > Save Workspace As…, type a name for the workspace file, and click Save.

  3. Generate HTML code for web page:

    1. Generate the HTML code for the file in Visual Studio Code using the HTML5 Boilerplate extension (covered in Chapter 3 in the sub-section titled ‘HTML Boilerplate’) by typing html5-boilerplate and hitting the Enter key.

  4. Add TensorFlow.js references:

    1. Copy and paste or type references to the TensorFlow.js files by using the code in Listing 3-1.

After following the above steps, the Visual Studio Code environment should look like Figure 6-1.

Figure 6-1: An HTML file with references to the TensorFlow.js scripts

Getting the Data

This step is also called data ingestion and comprises two sub-steps, data loading and data preparation, as illustrated in Figure 6-2.

Figure 6-2: TensorFlow.js Data Ingestion

Similar to Figure 5-3 which shows the artifacts involved in the training step, the following illustration in Figure 6-3a shows the artifacts involved in the define step of the TensorFlow.js library.

Figure 6-3a: Artifacts involved in the Define Step of TensorFlow.js

Data can be loaded in a JavaScript machine learning script file either from a Comma Separated Values (CSV) file or as tensors. To prepare the data for training a model, it must be cleaned and/or wrangled and explored. Data exploration is also known as Exploratory Data Analysis (EDA). Each of these steps is explained further in the following text.

Loading the Data

Data is the lifeblood of an effective model. Before a machine learning model can be trained, data has to be loaded. Data comes in several different formats, the most common of which are comma separated values (CSV) files and tensors.

Comma Separated Values (CSV)

You might have seen CSV files (with a .csv extension) even if you do not perform machine learning, since it is a lightweight format to transfer large amounts of information on the Internet. To view a sample CSV file, follow the link to view the World Food Program (WFP) Global Food Prices Database .csv file in Microsoft Excel or a text editor of your choice: https://data.humdata.org/dataset/wfp-food-prices

Note A sample CSV file can be searched for and opened by doing a web search on ‘sample csv file’.

You can load a CSV file by using the JavaScript code in Listing 6-1 below:

Listing 6-1: Loading a CSV file in TensorFlow.js

// Loading a CSV file
async function loadCsvData() {
   const csvUrl = 'https://url/';
   const csvFileName = 'fileName.csv';
   const csvFileUrl = csvUrl + csvFileName;
   const csvDataset = tf.data.csv(csvFileUrl, {
      hasHeader: true,
      columnConfigs: {
         predictionLabelName: {
            isLabel: true
         }
      }
   });
}

In the above code example, a CSV file URL is constructed and the file is loaded using the tf.data.csv function. In columnConfigs, the programmer must replace predictionLabelName with the name of the column to be predicted, that is, the label or outcome of the prediction.
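To see conceptually what a CSV loader does, the following plain-JavaScript sketch parses a small in-memory CSV string (a header row plus numeric rows) into an object of column arrays. The parseCsv helper and the sample data are illustrative assumptions only; tf.data.csv additionally handles quoting, type inference, and streaming for real files.

```javascript
// Parse a small CSV string (header row + numeric rows) into column arrays.
// A simplified illustration of what a CSV loader does under the hood.
function parseCsv(csvText) {
  const lines = csvText.trim().split('\n');
  const headers = lines[0].split(',');
  const columns = {};
  headers.forEach(h => { columns[h] = []; });
  for (const line of lines.slice(1)) {
    line.split(',').forEach((value, i) => {
      columns[headers[i]].push(Number(value)); // convert text to number
    });
  }
  return columns;
}

// Example usage with a tiny in-memory CSV
const csv = 'sqft,price\n1000,200\n1500,300\n2000,400';
const data = parseCsv(csv);
console.log(data.sqft);  // [1000, 1500, 2000]
console.log(data.price); // [200, 300, 400]
```

The column-oriented result mirrors the way the rest of this chapter arranges training data, as arrays per column.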

Tensors

Tensors are covered in greater detail in Chapter 4. In the following simple example, tensors are used to load data in a script file, as shown in Listing 6-2.

Listing 6-2: Loading Tensor Data in TensorFlow.js

// Loading tensor data
async function loadTensorData() {
   const trainingData = {
      columnA: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
      columnB: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
   };
   const testData = {
      columnA: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
      columnB: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
   };
}

In the above code, columnA and columnB can be replaced by more meaningful names.

Note Outside of TensorFlow.js, after data is loaded, a data scientist or machine learning engineer may need to clean, prepare, and explore the loaded data (performing data cleansing or organization, and finding anomalies) before a model can be trained, evaluated, and validated. In a web browser environment using TensorFlow.js, it is assumed that the programmer has already cleaned, organized, and/or explored the data, and model training can begin right away.

Data Conversion to Tensors

Any loaded data must be converted to a tensor before a model can be trained.

Listing 6-3: Convert Loaded Data in TensorFlow.js

// Convert data to tensors
async function loadDataAndConvertToTensors() {
   const trainingData = {
      columnA: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
      columnB: [20, 40, 60, 80, 100, 120, 140, 160, 180, 200]
   };
   const testData = {
      columnA: [1, 2, 3, 4, 5],
      columnB: [20, 40, 60, 80, 100]
   };
   // Convert training data to tensors
   const trainingDataTensors = {
      columnA: tf.tensor2d(trainingData.columnA, [10, 1]),
      columnB: tf.tensor2d(trainingData.columnB, [10, 1])
   };
   // Convert test data to tensors
   const testDataTensors = {
      columnA: tf.tensor2d(testData.columnA, [5, 1]),
      columnB: tf.tensor2d(testData.columnB, [5, 1])
   };
}

The two constants above, trainingDataTensors and testDataTensors, hold the loaded data converted into two-dimensional tensors (tensor2d) by specifying the shape of the tensor to convert to.
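The shape argument passed to tf.tensor2d describes how the flat input array is laid out as rows and columns. The following plain-JavaScript sketch (the toRows helper is hypothetical, used only to illustrate the idea) shows how a flat array maps onto a [rows, cols] shape:

```javascript
// Illustrate how a flat array maps onto a [rows, cols] shape,
// mirroring the shape argument of tf.tensor2d.
function toRows(flat, rows, cols) {
  if (flat.length !== rows * cols) {
    throw new Error('shape does not match the number of values');
  }
  const result = [];
  for (let r = 0; r < rows; r++) {
    result.push(flat.slice(r * cols, (r + 1) * cols)); // one row at a time
  }
  return result;
}

console.log(toRows([1, 2, 3, 4], 4, 1)); // [[1], [2], [3], [4]]
console.log(toRows([1, 2, 3, 4], 2, 2)); // [[1, 2], [3, 4]]
```

A shape of [10, 1], as used in Listing 6-3, therefore turns ten values into ten rows of one value each; a mismatched shape is an error, in this sketch as in TensorFlow.js.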

Other data, such as images, must also be converted to tensors before it can be used to train a model. Image classification, recognition, and related tasks are covered in detail in Chapters 7 and 8, but conversion from image to tensor is covered in this section. See Listing 6-4a for code to convert image data.

Listing 6-4a: Convert an Image to a Tensor

// Convert image to tensor
function convertImageToTensor() {
 let img = document.getElementById('image-element'); // get image element
 let tensorData = tf.browser.fromPixels(img); // convert to tensor
 alert(tensorData); // display tensor
}

Tip ECMAScript 2015 introduced const and let for variable declaration in JavaScript, whereas the previous versions used var. While it still works, you are encouraged to use the new keywords const and let to declare variables since they both offer block-scope in addition to specifying whether a variable can be reassigned; if declared with const, the value cannot be reassigned, whereas let allows the value to be modified later on after it has been initialized.
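The behavior described in the tip above can be demonstrated in a few lines of plain JavaScript (a standalone sketch, unrelated to the image listings in this section):

```javascript
// let allows reassignment; const does not
let counter = 1;
counter = 2; // fine: counter was declared with let

const rate = 0.5;
let constAssignmentFailed = false;
try {
  rate = 0.7; // throws TypeError: assignment to a constant variable
} catch (e) {
  constAssignmentFailed = e instanceof TypeError;
}
console.log(constAssignmentFailed); // true

// Both const and let are block-scoped, unlike var
{
  let inner = 'visible only inside this block';
}
console.log(typeof inner); // 'undefined': inner is not in scope here
```

Note that const prevents reassignment of the binding, not mutation of the value: an array or object declared with const can still have its contents changed.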

In Listing 6-4a, an HTML image is converted into a tensor using the tf.browser.fromPixels JavaScript function. In Listing 6-4b, the same code is used but some additional operations are also performed on the converted data.

Listing 6-4b: Convert an Image to a Tensor and perform some operations

// Convert image to tensor and perform operations
function convertImageToTensor() {
 let img = document.getElementById('image-element');
 let tensorData = tf.browser.fromPixels(img, 4) // convert, using 4 channels
 .resizeNearestNeighbor([100, 100]) // change image size
 .expandDims() // expand tensor rank
 .toFloat() // cast to float32
 .reverse(); // reverse the tensor
}

Tip If you open the HTML file directly from Windows Explorer, the script will not execute and you will see the ‘Failed to execute texImage2D on WebGL2RenderingContext: The image element contains cross-origin data, and may not be loaded.’ error message in the Developer Tools console. This happens because the origin of the embedded image and the address of the HTML page itself differ. Instead, open the HTML file using the HTTP Server extension (refer to the HTTP Server / HTML Review section in Chapter 3).

Note The above code examples do not make use of any other JavaScript library. This has been done for brevity’s sake. You are encouraged to continue to use any libraries like jQuery, Angular, React, etc. since they do not cause any known conflicts with the TensorFlow.js library.

Preparing the Data

Data preparation entails massaging the data so it can be used to train a model. This step is also known as data wrangling, data munging, or data cleansing. The resulting output is a dataset that has undergone transformation to allow a machine learning model to be trained, tested, and validated.

Preparing data offline is a possibility and can speed up the production environment considerably. However, there are scenarios where it makes more sense to process the data online: for example, when the data is updated often, or when it is accessed so infrequently that hosting it in a separate environment for offline use would impose storage and processing costs that are not justified.

Data Cleaning

To clean data, each value must be massaged or wrangled: missing values filled in, numbers rounded or truncated, and/or values eliminated from the output dataset, depending on the requirements.

Problems

Data requires cleaning or cleansing for a number of reasons:

  • Missing values: Values go missing when data is downloaded from an API or exported from a host of data sources.

  • Duplicate or irrelevant values: Entire rows of data appear multiple times violating uniqueness or referential integrity constraints; values or even columns might need to be removed since the data does not pertain to the problem being addressed.

  • Inaccurate or unexpected values: Inaccurate or unexpected values in a column are going to throw the model off-base and will need to be corrected or removed before model training.

  • Invalid formatting or extra space: Values inside a column or field might be valid, but are formatted incorrectly or contain extra spaces within them.

Solutions

Once bad data has been identified, it needs to be rectified using one of the following approaches:

  • Fix/optimize values: One of the solutions for data that is related to the problem at-hand is to fix or optimize the broken values. This can include filling-in or removing the missing values, aggregating the values by merging, combining, or summarizing the data, or by splitting the values.

  • Ignore or Work around: If an error in the data is not serious and does not impact the outcome, it can be ignored. Also, if the data is going to change in the production environment, it can be dealt with at a later stage.

  • Filter or re-generate: There are options to eliminate rows or values if they already exist in the dataset, or do not affect the results if they are removed. Also, re-generating the data is more effective when generating the data is inexpensive and costs less to query the source again than fixing the broken data.
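As a concrete illustration of the fix/optimize approach, the following plain-JavaScript sketch (the function name and sample values are illustrative assumptions) fills missing readings in a numeric column with the mean of the values that are present:

```javascript
// Fill missing (null or NaN) values in a numeric column with the
// column mean: one common way to fix rather than discard incomplete rows.
function fillMissingWithMean(values) {
  const present = values.filter(v => v !== null && !Number.isNaN(v));
  const mean = present.reduce((a, b) => a + b, 0) / present.length;
  return values.map(v => (v === null || Number.isNaN(v)) ? mean : v);
}

const readings = [10, null, 30, null, 50];
console.log(fillMissingWithMean(readings)); // [10, 30, 30, 30, 50]
```

Mean imputation is only one option; depending on the problem, filling with a median, a constant, or simply dropping the affected rows (the filter approach above) may be more appropriate.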

The best way to learn data preparation or data cleaning is by using an actual dataset, so one is used in the example below; but first, the reader is introduced to the commands they will be using. The following JavaScript commands can be used to prepare the data for model training.

  • concat – Concatenates two or more arrays to form a new array. See Listing 6-5a for more details.

  • delete – Removes a field or property from a JavaScript object or JSON string, as shown in Listing 6-5b.

  • filter – Returns a new array containing only the elements that pass a test, effectively removing the unwanted elements, as shown in Listing 6-5c.

  • map – Returns a new array from an input array, as shown in Listing 6-5d.

  • reduce – Transforms an array into an atomic value after aggregation or summarization, as shown in Listing 6-5e.

Listing 6-5a: Concatenate two or more arrays

// Calls the concat function to join two or more arrays
function performConcatenation() {
 let pakistan = ["Islamabad","Karachi","Lahore"];
 let turkey = ["Ankara","Istanbul"];
 let china = ["Beijing","Shanghai","Shenzhen"];
 let pakTurkey = pakistan.concat(turkey);
 document.getElementById("pakTurkey").innerHTML = pakTurkey;
 let pakTurkeyChina = pakistan.concat(turkey,china);
 document.getElementById("pakTurkeyChina").innerHTML = pakTurkeyChina;
}

Listing 6-5b: Removes a field or property from an object

// Calls the delete operator to remove a field or property
function performDeletion() {
 var country = {
 name: "Pakistan",
 capital: "Islamabad",
 location: "South Asia",
 flag: "pakistan.png",
 currency: "Rupees",
 ageInYears: 73,
 rulingParty: "PTI"
 };
 delete country.flag; // remove a property that is not needed in the message
 document.getElementById("message").innerHTML =
 country.name + " was founded "
 + country.ageInYears + " years ago, and is presently ruled by "
 + country.rulingParty + ".";
}

Listing 6-5c: Remove an element from an array

var earthquakeReadings = [5.9, 5.6, 7.2, 6.5, 6.3, 6.7, 5.5, 6, 6.2, 5.7];
function checkHighEarthquake(earthquakeReading) {
 return earthquakeReading > 6;
}
// Calls the filter function to return a new array containing
// only the elements that pass the test
function performRemovalFromArray() {
 alert("High Earthquake Readings: " +
 earthquakeReadings.filter(checkHighEarthquake));
}

Listing 6-5d: Create a new array from an input array

var earthquakeReadings = [5.9, 5.6, 7.2, 6.5, 6.3, 6.7, 5.5, 6, 6.2, 5.7];
// Calls the map function on an array
function performMapOperationOnArray() {
 let rounded = earthquakeReadings.map(Math.round);
 let truncated = earthquakeReadings.map(Math.trunc);
 alert("Rounded Readings: " + rounded);
 alert("Truncated Readings: " + truncated);
}

Listing 6-5e: Create an atomic value from an array

var earthquakeReadings = [5.9, 5.6, 7.2, 6.5, 6.3, 6.7, 5.5, 6, 6.2, 5.7];
function calculateAverage(arrayToUse) {
 return arrayToUse.reduce((a,b)=>a+b)/arrayToUse.length
}
// Calls the reduce function on an array
function performReduceOperationOnArray() {
 alert("Average Reading: " +
 calculateAverage(earthquakeReadings).toFixed(2));
}

Note The toFixed function is used to round the output to two decimal places (notice the 2 passed in the parameter list), and is optional.

Exploratory Data Analysis (EDA)

Part of cleaning up a dataset is its exploration or analysis. While it’s easy to analyze a dataset server-side using Python, R, or another language, it often makes more sense to explore the data on the client, alongside TensorFlow.js, using a language, technology, or framework that the user is familiar with.

Datasets are typically analyzed on the server, using Python as the programming language alongside a library such as pandas. The most basic concept in exploring or analyzing a dataset in data science or machine learning is the DataFrame, which is a memory-resident, flat, two-dimensional view of the data, with rows and columns, and may hold all or a subset of the data contained in a dataset.
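To make the DataFrame concept concrete without any library, a DataFrame can be loosely pictured as an array of row objects with helpers for selecting columns and rows. The sketch below is illustrative only (the sample rows and the head helper are this author's assumptions; libraries like Danfo.js provide far richer functionality):

```javascript
// A rough, dependency-free sketch of the DataFrame idea:
// rows and columns held in memory, with simple selection helpers.
const frame = [
  { year: 2005, disaster: 'Earthquake', magnitude: 7.6 },
  { year: 2010, disaster: 'Flood',      magnitude: null },
  { year: 2015, disaster: 'Heatwave',   magnitude: null }
];

// Select a single column (similar to df['year'] in pandas)
const years = frame.map(row => row.year);
console.log(years); // [2005, 2010, 2015]

// Take the first n rows (similar to df.head(n))
function head(df, n = 5) {
  return df.slice(0, n);
}
console.log(head(frame, 2).length); // 2
```

Real DataFrame libraries add typed columns, missing-value handling, grouping, and statistics on top of this basic row-and-column picture.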

We are going to use a CSV file of natural disasters in Pakistan (downloaded from web: https://query.data.world/s/grbwv7lhfsagoi37py65d3wqjs7n3y) to perform data preparation and EDA using JavaScript code in Listing 6-6. The CSV file has also been made available in the dataset folder on GitHub. The client-side library used in the following code is Danfo.js to create a DataFrame in the web browser. Please note that code in Listing 6-6 assumes that the Danfo.js library has already been added to the page using the script tag.

Listing 6-6: Explore or Analyze a Dataset

// Load the natural disasters CSV file into a Danfo.js DataFrame
// and perform some basic exploration (EDA).
// Note: depending on the Danfo.js version, the loader may be
// named dfd.readCSV or dfd.read_csv.
async function exploreDataset() {
   const csvUrl = 'https://query.data.world/s/grbwv7lhfsagoi37py65d3wqjs7n3y';
   const df = await dfd.readCSV(csvUrl); // dfd is the Danfo.js global object
   df.head().print();     // display the first few rows
   df.describe().print(); // display summary statistics for numeric columns
}

Tip One other way you can learn to prepare a dataset and to use other operations, is to use machinelearn.js, a web-browser based machine learning library that is programmed using JavaScript, just like the TensorFlow.js library. While outside the scope of this text, you can learn more about it and its underlying API by going to its website at https://www.machinelearnjs.com/

Writing Code for Data Ingestion

Refer to Chapter 2 to view a list of the types of data available for machine learning. Regardless of the format of the data, it must be converted to numeric or tensor form for training, testing, and prediction. The following listings demonstrate some common use cases for ingesting data in a web browser.

Listing 6-7a: Loading a local Image using JavaScript

function loadImage() {
 const fileElement = document.getElementById("ImageFile");
 const imgElement = document.getElementById("ImageElement");
 const fileReader = new FileReader();
 fileReader.onload = function() {
 imgElement.src = fileReader.result;
 };
 fileReader.readAsDataURL(fileElement.files[0]);
}

Listing 6-7b: Loading a Video using JavaScript

function loadVideo() {
 const fileElement = document.getElementById("VideoSource");
 const vdoElement = document.getElementById("VideoElement");
 fileElement.src = "../../videos/JF17-sm.mp4";
 vdoElement.load();
 vdoElement.style.display = 'block';
}

Listing 6-7c: Displaying Webcam using JavaScript (Adobe Flash)

// Display computer’s webcam (Flash-based implementation omitted:
// Adobe Flash reached end of life in December 2020)
function displayWebcam() {
}

The code in Listing 6-7c relied on Adobe Flash, which reached end of life in December 2020 and will not work without a Flash player installed in your web browser. Use the JavaScript code below in Listing 6-7d instead if you want to use HTML5 to view your webcam in a web browser.

Listing 6-7d: Displaying Webcam using JavaScript (HTML5)

// Display the computer’s webcam using the HTML5 getUserMedia API
async function displayWebcam() {
 const vdoElement = document.getElementById('VideoElement');
 const stream = await navigator.mediaDevices.getUserMedia({ video: true });
 vdoElement.srcObject = stream;
 vdoElement.play();
}

Tip There are two ways to give a Microsoft Edge web browser access to your webcam in Windows 10, and you may need one or both of them: the Windows privacy settings (Settings > Privacy > Camera) and the browser settings (the camera permission for the site in Edge).

Architecture Definition

A software architecture defines the components that make up a software application. This is in contrast to defining a machine learning architecture, which is synonymous with outlining the machine learning problem and selecting the algorithms that can be used to solve that problem.

Problems

Refer to section titled ‘Problem Types’ in Chapter 2 and the algorithms to address those machine learning problem types in Figure 2-8a. Within an organization, machine learning engineers are expected to frame the problem, the solution to that problem, and the resulting benefits to the enterprise. That is a tall order since it requires knowing all about the data that is available to the ML engineers and/or data scientists, not to mention how that solution would be surfaced to the relevant stakeholders who would make use of the information to make meaningful and sound decisions.

I would recommend you view Figure 6-4a below alongside Figures 2-8a and 2-8b, since it summarizes some of the problems that a machine learning solution can address for the benefit of the organization and its stakeholders.

Figure 6-4a: Typical Machine Learning Problems

Architectures

Once a problem statement has been framed, the architects have to define what the machine learning solution would look like and how it will be made available to its users. A machine learning architecture should address the following three areas:

  1. The machine learning algorithms to solve the problem.

  2. The components or building blocks of the solution.

  3. The overall architecture of the solution as it fits into the larger enterprise application.

Items 2 and 3 above are outside the scope of this book, but the first item pertaining to the machine learning algorithms and use cases is summarized in Figure 6-4b below.

Figure 6-4b: Machine Learning Use Cases

Note The above diagram is an extension of Figures 2-8a, 2-8b, and 2-8c.

Parameter Tuning

TODO

Figure 6-3b: Artifacts involved in the Compile Step of TensorFlow.js

Model Training

Once data has been loaded in a web browser, it is used to train and validate a model. Alternatively, a pre-trained model (even one trained by someone else and using a different technology) can be used for validation.

Splitting the Data

Before a model can be trained, the dataset must first be split into separate pieces: one for training, one for testing, and optionally one for validation. The training and testing datasets are mandatory, but the validation set is optional. Refer to Figure 6-5 below for a visual representation of this description.

Tip I coined the term STEP (Split, Train, Evaluate, Publish) in 2016 to explain the machine learning process after data has been ingested. Regardless of the framework used for model training or testing, and whether training/testing occurs on the server or the client, STEP still holds true today and can be used to describe the stages of a machine learning solution.

Figure 6-5: Splitting the Data

As shown in Figure 6-5, the training dataset is always larger than the test or evaluation dataset. One of the other notable things in the figure above is the validation dataset, which is separate from the training and testing data. All three can be described as follows:

  • Training – Data required for fitting the model to information contained within, and is passed to the Model.fit() statement.

  • Testing/Evaluation – To test or evaluate the performance of a model, the testing data is passed to the Model.evaluate() statement.

  • Validation – Optional in nature, the validation data is passed as an argument to the Model.fit() statement and is used to tune the model’s hyperparameters.
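The split described above can be sketched in plain JavaScript as follows (the splitData helper and the 80/10/10 ratios are illustrative choices, not fixed rules; real code would typically shuffle the examples first):

```javascript
// Split an array of examples into training, validation, and test sets.
// An 80/10/10 split is shown; the ratios can be tuned to the problem.
function splitData(examples, trainRatio = 0.8, validationRatio = 0.1) {
  const trainEnd = Math.floor(examples.length * trainRatio);
  const validationEnd = trainEnd + Math.floor(examples.length * validationRatio);
  return {
    training: examples.slice(0, trainEnd),
    validation: examples.slice(trainEnd, validationEnd),
    testing: examples.slice(validationEnd)
  };
}

const examples = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const { training, validation, testing } = splitData(examples);
console.log(training.length, validation.length, testing.length); // 8 1 1
```

Whatever the ratios, the key property shown in Figure 6-5 holds: the three pieces are disjoint, and the training piece is the largest.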

Training Activities

[todo]

[todo]

Testing and Evaluating the Model

Testing, evaluating, or validating a model determines the accuracy of the trained model using the test portion of the dataset.
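Conceptually, evaluation compares the model’s predictions on the test data against the known labels. The following plain-JavaScript sketch of an accuracy calculation (the names and data are illustrative; Model.evaluate() performs this kind of computation for you) shows the idea for a classification problem:

```javascript
// Compute classification accuracy: the fraction of predictions
// that match the true labels.
function computeAccuracy(predictions, labels) {
  let correct = 0;
  for (let i = 0; i < labels.length; i++) {
    if (predictions[i] === labels[i]) correct++;
  }
  return correct / labels.length;
}

const labels      = [1, 0, 1, 1, 0];
const predictions = [1, 0, 0, 1, 0];
console.log(computeAccuracy(predictions, labels)); // 0.8
```

Accuracy is only one possible evaluation metric; depending on the problem, loss, precision, recall, or mean squared error may be more informative.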

Figure 6-3c: Artifacts involved in the Predict Step of TensorFlow.js

Test or Evaluation Activities

[todo]

Key Takeaways

This chapter describes the steps for end-to-end solution development using the TensorFlow.js library in conjunction with the previous chapter (which covers the training piece), and explains the following:

  • Loading the data from an external source.

  • Cleaning and exploring the loaded data.

  • Picking an architecture based on the loaded data and the machine learning problem to solve.

The steps for deploying the trained model are covered later on in the book.
