Using Public Datasets with TensorFlow.pptx


About This Presentation

Using TensorFlow and public datasets


Slide Content

Using Public Datasets with TensorFlow Datasets (Chapter 4)

There are many different ways of getting the data with which to train a model. The Fashion MNIST dataset is conveniently bundled with Keras, but many public datasets require you to learn lots of different domain-specific skills before you can even begin to consider your model architecture. The goal behind TensorFlow Datasets (TFDS) is to expose datasets in a way that's easy to consume, where all the preprocessing steps of acquiring the data and getting it into TensorFlow-friendly APIs are done for you.
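A rough sketch of what that looks like in practice (assuming the tensorflow-datasets package is installed; fashion_mnist is the TFDS catalog name for Fashion MNIST):

import tensorflow_datasets as tfds

# tfds.load downloads (if needed) and prepares the dataset, returning a
# tf.data.Dataset whose records are dictionaries of features.
# TFDS is a separate install (pip install tensorflow-datasets); it comes
# preinstalled on Google Colab.
mnist_train = tfds.load(name="fashion_mnist", split="train")

for record in mnist_train.take(1):
    print(record["image"].shape, record["label"])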

TFDS builds on this idea, greatly expanding both the number of datasets available and the diversity of dataset types. TensorFlow Datasets is a separate install from TensorFlow, so be sure to install it before trying out any samples! If you are using Google Colab, it's already preinstalled.

TFDS List
The list of available datasets is growing all the time, in categories such as:
Audio: Speech and music data
Image: From simple learning datasets like Horses or Humans up to advanced research datasets for uses such as diabetic retinopathy detection
Object detection: COCO, Open Images, and more
Structured data: Titanic survivors, Amazon reviews, and more
Summarization: News from CNN and the Daily Mail, scientific papers, wikiHow, and more
Text: IMDb reviews, natural language questions, and more
Translate: Various translation training datasets
Video: Moving MNIST, StarCraft, and more
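To browse the catalog from code, TFDS exposes a helper that lists every registered dataset builder (a minimal sketch):

import tensorflow_datasets as tfds

# Prints the names of all datasets currently registered with TFDS.
print(tfds.list_builders())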

Getting Started with TFDS

Data about the dataset is also available using the with_info parameter when loading the dataset, like this:
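A sketch of that call, assuming the fashion_mnist dataset:

import tensorflow_datasets as tfds

# with_info=True returns a second value: a DatasetInfo object describing
# the splits, feature shapes and types, and citation for the dataset.
data, info = tfds.load("fashion_mnist", with_info=True)
print(info)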

Using TFDS with Keras Models
When using TFDS the code is very similar to what we saw in Chapter 2, but with some minor changes. The Keras datasets gave us ndarray types that worked natively in model.fit, but with TFDS we'll need to do a little conversion work:
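One way that conversion can look (a sketch under those assumptions, not necessarily the exact code from the chapter): load the TFDS splits as (image, label) pairs, normalize and batch them, and pass the resulting tf.data.Dataset to model.fit.

import tensorflow as tf
import tensorflow_datasets as tfds

# as_supervised=True returns (image, label) tuples instead of dictionaries.
(train_data, test_data) = tfds.load("fashion_mnist",
                                    split=["train", "test"],
                                    as_supervised=True)

def normalize(image, label):
    # Scale pixel values to [0, 1] for training.
    return tf.cast(image, tf.float32) / 255.0, label

train_data = train_data.map(normalize).batch(32)
test_data = test_data.map(normalize).batch(32)

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_data, epochs=5, validation_data=test_data)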

Horses & Humans from tfds

The Horses or Humans dataset is split into training and test sets, so if you want to do validation of your model while training, you can do so by loading the test split from TFDS as a separate validation set like this:
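A sketch of loading the two splits (the TFDS catalog name is horses_or_humans), with the test split then usable as validation_data in model.fit:

import tensorflow_datasets as tfds

# Separate calls for the training and test splits.
train_data = tfds.load("horses_or_humans", split="train", as_supervised=True)
val_data = tfds.load("horses_or_humans", split="test", as_supervised=True)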

Loading Specific Versions
All datasets stored in TFDS use a MAJOR.MINOR.PATCH numbering system.
If PATCH is updated, the data returned by a call is identical, but the underlying organization may have changed.
If MINOR is updated, the data is still unchanged, except that there may be additional features in each record.
If MAJOR is updated, there may be changes in the format of the records and their placement.
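To pin a particular version you append it to the dataset name when loading; for example (a sketch, and the version string here is illustrative, so check the catalog for what actually exists):

import tensorflow_datasets as tfds

# "3.*.*" requests any MINOR/PATCH release under MAJOR version 3.
mnist = tfds.load("mnist:3.*.*")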

Using Mapping Functions for Augmentation
Earlier we used the augmentation tools available with an ImageDataGenerator to provide the training data for your model. With TFDS, similar augmentation can be achieved by mapping a function over the dataset, as sketched below.
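A minimal sketch using standard tf.image operations (the specific augmentations chosen here are illustrative):

import tensorflow as tf
import tensorflow_datasets as tfds

data = tfds.load("horses_or_humans", split="train", as_supervised=True)

def augment(image, label):
    # Normalize, then apply a random horizontal flip and brightness jitter
    # to each record as it streams through the pipeline.
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    return image, label

train_data = data.map(augment).shuffle(1024).batch(32)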

Using TensorFlow Addons
The TensorFlow Addons library contains even more functions that you can use. Some of the functions used for ImageDataGenerator augmentation (such as rotate) can only be found there, so it's a good idea to check it out.
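For example, rotation can be done with tfa.image.rotate (a sketch; tensorflow-addons is a separate install, e.g. pip install tensorflow-addons):

import tensorflow as tf
import tensorflow_addons as tfa

def rotate_image(image, label):
    # tfa.image.rotate takes the angle in radians; 0.35 rad is roughly 20 degrees.
    image = tf.cast(image, tf.float32) / 255.0
    image = tfa.image.rotate(image, 0.35)
    return image, label

# This function can then be mapped over a dataset just like any other augmentation.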

Using Custom Splits
TFDS lets you define your own splits of the data, and if you're familiar with Python slice notation, you can use that as well. This notation can be summarized as defining your desired slices within square brackets like this: [<start>:<stop>:<step>]
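Applied to TFDS, the slice notation goes inside the split string; for example (a sketch), taking the first 80% of the training records for training and the remainder for validation:

import tensorflow_datasets as tfds

# The slicing API lets you combine or subdivide the named splits.
train_data = tfds.load("horses_or_humans", split="train[:80%]", as_supervised=True)
val_data = tfds.load("horses_or_humans", split="train[80%:]", as_supervised=True)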

The ETL Process
ETL (Extract, Transform, and Load) is the core pattern that TensorFlow uses for training, regardless of scale. We've been exploring small-scale, single-computer model building in this book, but the same technology can be used for large-scale training across multiple machines with massive datasets.

The Extract-Transform-Load (ETL) process is a crucial step in training machine learning models.
Extraction and Transformation: Data extraction and transformation can be performed on any processor, including a CPU. Tasks such as downloading data, unzipping files, and preprocessing records are typically executed on the CPU, and the code for these tasks does not fully exploit GPUs or TPUs, which are designed primarily for parallel computation.

Training Phase: Training a model benefits significantly from GPUs or TPUs, which excel at parallel processing, so it's advantageous to use them during the training phase whenever possible.
Workload Distribution: Where both CPU and GPU/TPU resources are available, it's beneficial to distribute the workload accordingly: the Extract and Transform stages are typically performed on the CPU, leveraging its general-purpose processing, while the Load stage, which involves the actual model training, is best executed on GPUs or TPUs because of their specialized parallel computing capabilities.

Large datasets often require data preparation, including extraction and transformation, to be performed in batches due to their size. In this scenario, while one batch is being prepared, the GPU/TPU remains idle as it awaits the data for training. Once the batch is ready, it can be sent to the GPU/TPU for training, but this leaves the CPU idle until the training process is completed. Subsequently, the CPU starts preparing the next batch, leading to significant idle time in the overall process. This idle time highlights the potential for optimization in the data preparation and training pipeline to minimize resource underutilization and improve overall efficiency.
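tf.data provides the tools to overlap these stages so that neither processor sits idle; a minimal sketch (the buffer and batch sizes here are illustrative):

import tensorflow as tf
import tensorflow_datasets as tfds

data = tfds.load("horses_or_humans", split="train", as_supervised=True)

def preprocess(image, label):
    return tf.cast(image, tf.float32) / 255.0, label

# num_parallel_calls parallelizes the Transform step across CPU cores, and
# prefetch keeps upcoming batches ready while the accelerator trains on the
# current one, overlapping Extract/Transform with Load.
train_data = (data
              .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
              .shuffle(1024)
              .batch(32)
              .prefetch(tf.data.AUTOTUNE))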

[Diagram: CPU/GPU utilization with sequential training vs. with a pipelined ETL process]