What is NumPy?

NumPy stands for Numerical Python. It is a Python package for performing a wide variety of mathematical operations on arrays, and its array operations are typically much faster than the equivalent loops over regular Python lists.
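As a minimal sketch of what "operations on arrays" means, the snippet below applies the same arithmetic to a plain list and to a NumPy array. The variable names and values are illustrative only, and no timing is measured here, since actual speedups depend on array size and hardware.

```python
import numpy as np

values = [1, 2, 3, 4, 5, 6]   # illustrative data, not taken from the walkthrough

# Plain Python: visit the list element by element.
doubled_list = [v * 2 for v in values]

# NumPy: one vectorized expression applied to the whole array at once.
arr = np.array(values)
doubled_arr = arr * 2

print(doubled_list)   # [2, 4, 6, 8, 10, 12]
print(doubled_arr)    # [ 2  4  6  8 10 12]
```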

What can we do with NumPy?

One of the easiest things to do with NumPy is to change the shape of an array:

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

print(a)

b = np.reshape(
               a,     # the array being reshaped
               (2,3)  # dimensions of the new array
              )

print(b) 

c = np.reshape(
               a,    
               (6,1)  
              )

print(c)

[1 2 3 4 5 6]
[[1 2 3]
 [4 5 6]]
[[1]
 [2]
 [3]
 [4]
 [5]
 [6]]
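A reshape only works when the new shape holds exactly the same number of elements. The sketch below is a small illustration of that rule, and of using -1 to let NumPy work out one dimension on its own.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# -1 asks NumPy to infer that dimension from the array's total size.
b = a.reshape(2, -1)      # inferred shape: (2, 3)
c = a.reshape(-1, 1)      # inferred shape: (6, 1)
print(b.shape, c.shape)   # (2, 3) (6, 1)

# A shape whose product is not 6 cannot hold the same elements.
try:
    a.reshape(4, 2)
except ValueError as err:
    print("reshape failed:", err)
```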

Create a two-dimensional array from a nested list.

a_list = [[1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1]]
d = np.array(a_list)  # pass the nested list directly; wrapping it in [] would add a third dimension
d

array([[1, 2, 3, 4, 5, 6],
       [6, 5, 4, 3, 2, 1]])
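A quick way to confirm how many dimensions an array has is to check its ndim and shape attributes. The sketch below compares passing the nested list directly with wrapping it in an extra pair of brackets; the names two_d and three_d are illustrative.

```python
import numpy as np

a_list = [[1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1]]

two_d = np.array(a_list)      # nested list passed directly
three_d = np.array([a_list])  # extra brackets add a leading axis of length 1

print(two_d.ndim, two_d.shape)      # 2 (2, 6)
print(three_d.ndim, three_d.shape)  # 3 (1, 2, 6)
```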

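Accessing elements: individual elements can be retrieved by index, just as with regular Python lists. Because z1 holds random integers, the outputs shown below vary from run to run.

z1 = np.random.randint(10, size=6)  # six random integers from 0 to 9

z1[0]  # Get the element at index 0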

8

z1[0:2]  # Slice out the elements at indices 0 and 1 (the stop index is excluded)

array([2, 8])

z1[-1]  # Get the last element of the array

6
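Since random values change on every run, the indexing rules are easier to check against fixed data. The following sketch uses a hypothetical array z with hand-picked values.

```python
import numpy as np

z = np.array([8, 2, 5, 0, 3, 6])   # hypothetical fixed values, so results are repeatable

first = z[0]     # element at index 0
head = z[0:2]    # slice: indices 0 and 1 (the stop index is excluded)
last = z[-1]     # negative indices count from the end

print(first, head, last)   # 8 [8 2] 6
```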

Using NumPy With Images

from skimage import io
photo = io.imread('san_diego.jpg')
type(photo)

numpy.ndarray

import matplotlib.pyplot as plt
plt.imshow(photo)

photo.shape

(549, 976, 3)

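Flip the image upside down by reversing the order of its rows (photo[:, ::-1] would give a left-to-right mirror image instead)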

plt.imshow(photo[::-1])

<matplotlib.image.AxesImage at 0x7f8c20a3d9d0>

Crop a specific part of the image by slicing a range of rows and a range of columns

plt.imshow(photo[150:400, 675:775])

<matplotlib.image.AxesImage at 0x7f8c1885d520>

Reduce the size of an image by keeping only every second row and column

plt.imshow(photo[::2, ::2])

<matplotlib.image.AxesImage at 0x7f8c18830f70>
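The slicing operations above can be checked without an image file. This sketch uses a small synthetic array as a stand-in for a photo (rows, columns, color channels); the array and variable names are illustrative assumptions.

```python
import numpy as np

# Stand-in for a photo: height 4, width 6, 3 color channels (illustrative values).
img = np.arange(4 * 6 * 3).reshape(4, 6, 3)

upside_down = img[::-1]     # reverse the rows (vertical flip)
mirrored = img[:, ::-1]     # reverse the columns (left-right mirror)
cropped = img[1:3, 2:5]     # rows 1-2, columns 2-4
half_size = img[::2, ::2]   # keep every second row and column

print(cropped.shape)     # (2, 3, 3)
print(half_size.shape)   # (2, 3, 3)
```

In each case the color axis is left untouched, so the result is still a valid image array.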

NumPy math functions can be applied to every pixel value of an image at once

photo
photo_sin = np.sin(photo)
photo_sin

array([[[-0.355  ,  0.8857 ,  0.9946 ],
        [-0.355  ,  0.8857 ,  0.9946 ],
        [-0.355  ,  0.8857 ,  0.9946 ],
        ...,
        [ 0.3467 , -0.9985 , -0.5063 ],
        [ 0.9766 , -0.491  ,  0.452  ],
        [ 0.9766 , -0.491  ,  0.452  ]],

       [[-0.355  ,  0.8857 ,  0.9946 ],
        [-0.355  ,  0.8857 ,  0.9946 ],
        [-0.355  ,  0.8857 ,  0.9946 ],
        ...,
        [ 0.9766 , -0.491  ,  0.452  ],
        [ 0.9766 , -0.491  ,  0.452  ],
        [ 0.9766 , -0.491  ,  0.452  ]],

       [[-0.355  ,  0.8857 ,  0.9946 ],
        [-0.355  ,  0.8857 ,  0.9946 ],
        [-0.355  ,  0.8857 ,  0.9946 ],
        ...,
        [ 0.9766 , -0.491  ,  0.452  ],
        [ 0.7085 ,  0.4678 ,  0.9946 ],
        [ 0.7085 ,  0.4678 ,  0.9946 ]],

       ...,

       [[-0.404  , -0.677  ,  0.869  ],
        [-0.1323 , -0.02655,  0.721  ],
        [ 0.9907 , -0.5586 , -0.46   ],
        ...,
        [-0.906  , -0.9663 , -0.1935 ],
        [-0.846  , -0.305  , -0.8115 ],
        [-0.906  , -0.305  , -0.7905 ]],

       [[ 0.5513 , -0.9854 ,  0.8857 ],
        [ 0.5293 , -0.3877 , -0.9424 ],
        [-0.988  ,  0.774  ,  0.6963 ],
        ...,
        [-0.846  , -0.7393 , -0.1935 ],
        [-0.751  , -0.5215 ,  0.9727 ],
        [-0.2878 , -0.5586 , -0.1935 ]],

       [[-0.751  , -0.305  ,  0.5806 ],
        [-1.     ,  0.987  ,  0.785  ],
        [-0.757  ,  0.0177 , -0.09717],
        ...,
        [-0.846  ,  0.92   , -0.93   ],
        [ 0.6504 , -0.9995 , -0.616  ],
        [-1.     ,  0.6704 , -0.46   ]]], dtype=float16)
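The dtype=float16 at the end of the output above comes from NumPy's type rules: an 8-bit integer input to np.sin produces a 16-bit float result. A minimal sketch, using made-up pixel values, of how to get a higher-precision result by casting first (note that np.sin treats its input as radians):

```python
import numpy as np

pixels = np.array([0, 128, 255], dtype=np.uint8)   # illustrative pixel values

low_precision = np.sin(pixels)                      # uint8 input gives float16 output
high_precision = np.sin(pixels.astype(np.float64))  # cast first for float64 output

print(low_precision.dtype)    # float16
print(high_precision.dtype)   # float64
```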

NumPy Hacks

For your hacks, use matplotlib and NumPy to slice this image to display Waldo. Also find and display one other NumPy function and blog about what it is used for.

photo_a = io.imread('waldo.jpg')
type(photo_a)

plt.imshow(photo_a)

<matplotlib.image.AxesImage at 0x7f8c18782f70>

What is Pandas?

Pandas is an open-source Python package used for data analysis and machine learning. It is built on top of NumPy, which supplies the multidimensional arrays that pandas relies on.

Data can then be manipulated with pandas for tasks such as data cleaning, statistical analysis, and data visualization. Below is an example that displays data as a table using a pandas DataFrame.

import pandas as pd

pd.__version__

'1.4.2'

classes = pd.Series(["Mathematics","Chemistry","Physics","History","Geography","German"])

grades  = pd.Series([90,54,77,22,25])


pd.DataFrame({"Classes": classes, "Grades": grades})
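Note that classes has six entries while grades has only five. The sketch below repeats the same construction and inspects the result, showing that pandas aligns the two Series by index and fills the missing grade with NaN; the name df is an illustrative addition.

```python
import pandas as pd

classes = pd.Series(["Mathematics", "Chemistry", "Physics", "History", "Geography", "German"])
grades = pd.Series([90, 54, 77, 22, 25])   # one entry shorter than classes

df = pd.DataFrame({"Classes": classes, "Grades": grades})

print(df["Grades"].dtype)          # float64, because NaN cannot be stored in an integer column
print(df["Grades"].isna().sum())   # 1 missing grade (index 5, "German")
```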

What is TensorFlow?

TensorFlow is a machine learning platform that provides tools to validate and transform large datasets, along with standard datasets for training machine learning models.

An example of this is the Fashion MNIST (Modified National Institute of Standards and Technology) dataset.

This dataset contains many pictures, each stored as a 28x28 NumPy array.

Setting up

This section trains a neural network model to identify pictures of clothing.
We will use tf.keras, a high-level API for building and training models in TensorFlow.

import tensorflow as tf

# Helper libraries
import numpy as np
import matplotlib.pyplot as plt

print(tf.__version__)

Note: in the run captured here, the import cell above did not print a version number. Importing TensorFlow failed with the error below. This error is commonly reported when the installed NumPy version is incompatible with the installed TensorFlow build. The outputs in the rest of this section are from a run in which the import succeeded.

TypeError: Unable to convert function return value to a Python type! The signature was
	() -> handle

We will use the Fashion MNIST dataset, which contains 70,000 grayscale images in 10 categories.
Here we load the dataset.

fashion_mnist = tf.keras.datasets.fashion_mnist

(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
29515/29515 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26421880/26421880 [==============================] - 2s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
5148/5148 [==============================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4422102/4422102 [==============================] - 0s 0us/step

Loading the dataset returns four NumPy arrays:
The train_images and train_labels arrays are the training set, the data the model learns from.
The test_images and test_labels arrays are the test set, used to check the model's accuracy.

Each image is mapped to a single label. The class names are not included with the dataset, so store them here to use later when plotting images:

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

Let's explore the format of the dataset before training the model. The following shows there are 60,000 images in the training set, with each image represented as 28 x 28 pixels:

train_images.shape

(60000, 28, 28)

Likewise, there are 60,000 labels in the training set:

len(train_labels)

60000

Each label is an integer between 0 and 9:

train_labels

array([9, 0, 0, ..., 3, 0, 5], dtype=uint8)
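To make the integer labels readable, each one can be used as an index into class_names. The sketch below uses only the six label values visible in the truncated output above as a small sample; the variable names are illustrative.

```python
import numpy as np

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# The six labels visible in the output above, used here as a small sample.
labels = np.array([9, 0, 0, 3, 0, 5], dtype=np.uint8)

names = [class_names[i] for i in labels]
print(names)
# ['Ankle boot', 'T-shirt/top', 'T-shirt/top', 'Dress', 'T-shirt/top', 'Sandal']

# Count how often each of the 10 classes appears in this sample.
counts = np.bincount(labels, minlength=10)
print(counts)   # [3 0 0 1 0 1 0 0 0 1]
```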

There are 10,000 images in the test set. Again, each image is represented as 28 x 28 pixels:

test_images.shape

(10000, 28, 28)

And the test set contains 10,000 image labels:

len(test_labels)

10000

Preprocessing data

The data must be preprocessed before training the network. If you inspect the first image in the training set, you will see that the pixel values fall in the range of 0 to 255:

plt.figure()
plt.imshow(train_images[0])
plt.colorbar()
plt.grid(False)
plt.show()

Scale these values to a range of 0 to 1 before feeding them to the neural network model. To do so, divide the values by 255. It's important that the training set and the testing set be preprocessed in the same way:

train_images = train_images / 255.0

test_images = test_images / 255.0
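The effect of dividing by 255.0 can be seen on a tiny array. This is a sketch with a hypothetical 2x2 "image" rather than the real dataset.

```python
import numpy as np

# Hypothetical 2x2 "image" with the same uint8 range as the Fashion MNIST pixels.
raw = np.array([[0, 51], [102, 255]], dtype=np.uint8)

scaled = raw / 255.0   # true division promotes uint8 to float64

print(scaled.dtype)                 # float64
print(scaled.min(), scaled.max())   # 0.0 1.0
```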

To verify that the data is in the correct format and that you're ready to build and train the network, let's display the first 25 images from the training set and display the class name below each image.

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[train_labels[i]])
plt.show()

Building the neural network requires configuring the layers of the model, then compiling the model.

The basic building block of a neural network is the layer. Layers extract representations from the data fed into them. Hopefully, these representations are meaningful for the problem at hand.

Most of deep learning consists of chaining together simple layers. Most layers, such as tf.keras.layers.Dense, have parameters that are learned during training.

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)
])

The first layer in this network, tf.keras.layers.Flatten, transforms the format of the images from a two-dimensional array (of 28 by 28 pixels) to a one-dimensional array (of 28 * 28 = 784 pixels). Think of this layer as unstacking rows of pixels in the image and lining them up. This layer has no parameters to learn; it only reformats the data.

After the pixels are flattened, the network consists of a sequence of two tf.keras.layers.Dense layers. These are densely connected, or fully connected, neural layers. The first Dense layer has 128 nodes (or neurons). The second layer returns a logits array with length of 10. Each node contains a score that indicates the current image belongs to one of the 10 classes.
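What the three layers compute can be sketched in plain NumPy. This is not how Keras implements them internally; it is a shape-level illustration with random stand-in weights (a trained model would have learned values), and every variable name here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed; the weights are random stand-ins

batch = rng.random((5, 28, 28))  # hypothetical batch of 5 scaled images

# Flatten: keep the batch axis, merge the 28x28 pixels into 784 features.
flat = batch.reshape(len(batch), -1)

# Dense(128, relu): a (784, 128) weight matrix plus one bias per node.
w1, b1 = rng.normal(size=(784, 128)) * 0.01, np.zeros(128)
hidden = np.maximum(flat @ w1 + b1, 0.0)   # ReLU zeroes out negative values

# Dense(10): one raw score (logit) per class.
w2, b2 = rng.normal(size=(128, 10)) * 0.01, np.zeros(10)
logits = hidden @ w2 + b2

print(flat.shape, hidden.shape, logits.shape)   # (5, 784) (5, 128) (5, 10)

# Learnable parameters: weights and biases of the two Dense layers.
n_params = w1.size + b1.size + w2.size + b2.size
print(n_params)   # 101770
```

The Flatten step contributes nothing to the parameter count, which matches the description of it as a pure reformatting layer.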

Before the model is ready for training, it needs a few more settings. These are added during the model's compile step:

Loss function: measures how far the model's predictions are from the true labels during training. You want to minimize this function to "steer" the model in the right direction.
Optimizer: determines how the model is updated based on the data it sees and its loss function.
Metrics: used to monitor the training and testing steps. The following example uses accuracy, the fraction of the images that are correctly classified.

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
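The loss used above takes integer labels and raw logits. A minimal NumPy sketch of the same idea follows; the helper function is a hypothetical illustration, not the Keras implementation, and the scores are made up.

```python
import numpy as np

def sparse_categorical_crossentropy(logits, labels):
    """Mean cross-entropy for integer labels, computed from raw logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    picked = log_probs[np.arange(len(labels)), labels]     # log-probability of the true class
    return -picked.mean()

logits = np.array([[2.0, 0.5, 0.1],   # hypothetical scores for 3 classes
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])             # true class index for each row

loss = sparse_categorical_crossentropy(logits, labels)
print(float(loss))
```

The loss shrinks toward 0 as the score of the true class grows relative to the others, which is what the optimizer works toward during training.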

Training

Training the neural network model requires the following steps:

Feed the training data (the train_images and train_labels arrays) to the model.
The model learns to associate images and labels.
Ask the model to make predictions about a test set (the test_images array).
Verify that the predictions match the labels from the test_labels array.

To start training, call the model.fit method, so called because it "fits" the model to the training data:

model.fit(train_images, train_labels, epochs=10)

Epoch 1/10

2023-04-04 11:53:51.070373: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 188160000 exceeds 10% of free system memory.

1875/1875 [==============================] - 11s 6ms/step - loss: 0.4959 - accuracy: 0.8268
Epoch 2/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.3762 - accuracy: 0.8651
Epoch 3/10
1875/1875 [==============================] - 10s 6ms/step - loss: 0.3361 - accuracy: 0.8778
Epoch 4/10
1875/1875 [==============================] - 10s 6ms/step - loss: 0.3130 - accuracy: 0.8860
Epoch 5/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.2946 - accuracy: 0.8903
Epoch 6/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.2816 - accuracy: 0.8958
Epoch 7/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.2679 - accuracy: 0.9011
Epoch 8/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.2578 - accuracy: 0.9027
Epoch 9/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.2494 - accuracy: 0.9062
Epoch 10/10
1875/1875 [==============================] - 11s 6ms/step - loss: 0.2393 - accuracy: 0.9107

<keras.callbacks.History at 0x7f41ac3327f0>
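The 1875 in each epoch line above is the number of batches per epoch, not the number of images. A small arithmetic sketch, assuming the Keras default batch size of 32 (model.fit was called without a batch_size argument):

```python
import math

train_size, test_size = 60000, 10000   # dataset sizes reported earlier in this walkthrough
batch_size = 32                        # Keras default when batch_size is not passed

print(math.ceil(train_size / batch_size))   # 1875
print(math.ceil(test_size / batch_size))    # 313
```

The second figure is the step count that shows up when the 10,000 test images are processed in batches of the same size.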

As the model trains, the loss and accuracy metrics are displayed. This model reaches an accuracy of about 0.91 (or 91%) on the training data.

Next, compare how the model performs on the test dataset:

test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)

print('\nTest accuracy:', test_acc)

313/313 - 1s - loss: 0.3225 - accuracy: 0.8908 - 1s/epoch - 3ms/step

Test accuracy: 0.8907999992370605

It turns out that the accuracy on the test dataset is a little less than the accuracy on the training dataset. This gap between training accuracy and test accuracy represents overfitting. Overfitting is when a machine learning model performs worse on new, previously unseen inputs than on the training data.
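Accuracy is simply the fraction of predictions that match the true labels. The sketch below computes it for a handful of hypothetical labels, then works out the train/test gap from the two accuracies reported above.

```python
import numpy as np

predicted = np.array([9, 2, 1, 1, 6])   # hypothetical predicted labels
actual = np.array([9, 2, 1, 1, 4])      # hypothetical true labels

accuracy = np.mean(predicted == actual)   # fraction of matches
print(accuracy)                           # 0.8

# Gap between the training and test accuracies reported above.
train_acc, test_acc = 0.9107, 0.8908
gap = round(train_acc - test_acc, 4)
print(gap)                                # 0.0199
```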

Predicting Images

With the model trained, you can use it to make predictions about some images. Attach a softmax layer to convert the model's linear outputs (logits) to probabilities, which should be easier to interpret.

probability_model = tf.keras.Sequential([model, 
                                         tf.keras.layers.Softmax()])

predictions = probability_model.predict(test_images)

313/313 [==============================] - 1s 3ms/step
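For reference, softmax itself is a short formula: exponentiate each logit and divide by the row total. A minimal NumPy sketch with made-up logits follows (the helper is illustrative, not the Keras layer):

```python
import numpy as np

def softmax(logits):
    """Convert raw scores to probabilities along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)   # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

logits = np.array([[1.0, 2.0, 3.0]])   # hypothetical scores for 3 classes
probs = softmax(logits)

print(probs.sum(axis=-1))      # each row sums to 1 (up to rounding)
print(probs.argmax(axis=-1))   # [2], the same winner as the largest logit
```

Softmax never changes which class scores highest; it only rescales the scores so that they are positive and sum to 1.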

Here, the model has predicted the label for each image in the testing set. Let's take a look at the first prediction:

predictions[0]

array([1.7737974e-10, 9.8017128e-10, 2.4250555e-08, 2.7087502e-10,
       3.3816602e-11, 7.0955430e-04, 1.5008560e-09, 2.1424549e-02,
       2.1235054e-09, 9.7786587e-01], dtype=float32)

A prediction is an array of 10 numbers. They represent the model's "confidence" that the image corresponds to each of the 10 different articles of clothing. You can see which label has the highest confidence value:

np.argmax(predictions[0])

9

So, the model is most confident that this image is an ankle boot, or class_names[9]. Examining the test label shows that this classification is correct:

test_labels[0]

9
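The same lookup can be reproduced offline from the prediction array printed above. This sketch copies those ten values by hand and also lists the runner-up class; the variable names are illustrative.

```python
import numpy as np

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Values copied from the predictions[0] output shown above.
prediction = np.array([1.7737974e-10, 9.8017128e-10, 2.4250555e-08, 2.7087502e-10,
                       3.3816602e-11, 7.0955430e-04, 1.5008560e-09, 2.1424549e-02,
                       2.1235054e-09, 9.7786587e-01], dtype=np.float32)

best = int(np.argmax(prediction))
print(best, class_names[best])                # 9 Ankle boot
print(round(float(prediction[best]) * 100))   # 98

# Indices of the two highest-scoring classes, best first.
top_two = np.argsort(prediction)[::-1][:2]
print([class_names[i] for i in top_two])      # ['Ankle boot', 'Sneaker']
```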

Graph this to look at the full set of 10 class predictions.

def plot_image(i, predictions_array, true_label, img):
  true_label, img = true_label[i], img[i]
  plt.grid(False)
  plt.xticks([])
  plt.yticks([])

  plt.imshow(img, cmap=plt.cm.binary)

  predicted_label = np.argmax(predictions_array)
  if predicted_label == true_label:
    color = 'blue'
  else:
    color = 'red'

  plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                100*np.max(predictions_array),
                                class_names[true_label]),
                                color=color)

def plot_value_array(i, predictions_array, true_label):
  true_label = true_label[i]
  plt.grid(False)
  plt.xticks(range(10))
  plt.yticks([])
  thisplot = plt.bar(range(10), predictions_array, color="#777777")
  plt.ylim([0, 1])
  predicted_label = np.argmax(predictions_array)

  thisplot[predicted_label].set_color('red')
  thisplot[true_label].set_color('blue')

Let's look at the 0th image, predictions, and prediction array. Correct prediction labels are blue and incorrect prediction labels are red. The number gives the percentage (out of 100) for the predicted label.

i = 0
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plot_image(i, predictions[i], test_labels, test_images)
plt.subplot(1,2,2)
plot_value_array(i, predictions[i],  test_labels)
plt.show()

i = 12
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plot_image(i, predictions[i], test_labels, test_images)
plt.subplot(1,2,2)
plot_value_array(i, predictions[i],  test_labels)
plt.show()

Let's plot several images with their predictions. Note that the model can be wrong even when very confident.

# Color correct predictions in blue and incorrect predictions in red.
num_rows = 5
num_cols = 3
num_images = num_rows*num_cols
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for i in range(num_images):
  plt.subplot(num_rows, 2*num_cols, 2*i+1)
  plot_image(i, predictions[i], test_labels, test_images)
  plt.subplot(num_rows, 2*num_cols, 2*i+2)
  plot_value_array(i, predictions[i], test_labels)
plt.tight_layout()
plt.show()

Finally, use the trained model to make a prediction about a single image.

img = test_images[1]

print(img.shape)

(28, 28)

tf.keras models are optimized to make predictions on a batch, or collection, of examples at once. Accordingly, even though you're using a single image, you need to put it in a batch of one by giving it a leading axis of length 1:

img = (np.expand_dims(img,0))

print(img.shape)

(1, 28, 28)
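np.expand_dims is one of several equivalent ways to add the batch axis. A short sketch on a hypothetical blank image shows three spellings that yield the same shape:

```python
import numpy as np

img = np.zeros((28, 28))         # hypothetical blank image

a = np.expand_dims(img, 0)       # as used in the walkthrough
b = img[np.newaxis, ...]         # equivalent indexing form
c = img.reshape(1, 28, 28)       # equivalent explicit reshape

print(a.shape, b.shape, c.shape)   # (1, 28, 28) (1, 28, 28) (1, 28, 28)
```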

Now predict the correct label for this image:

predictions_single = probability_model.predict(img)

print(predictions_single)

1/1 [==============================] - 0s 52ms/step
[[1.0949210e-05 4.1276347e-11 9.9810290e-01 1.2848138e-10 1.0825287e-03
  1.3133799e-13 8.0366491e-04 1.1366387e-15 3.2978012e-10 1.3686339e-15]]

plot_value_array(1, predictions_single[0], test_labels)
_ = plt.xticks(range(10), class_names, rotation=45)
plt.show()

tf.keras.Model.predict returns a two-dimensional array with one row of predictions for each image in the batch of data. Grab the predictions for our (only) image in the batch:

np.argmax(predictions_single[0])

2

And the model predicts label 2, which is class_names[2] (Pullover), as expected.

Output of the pd.DataFrame example in the pandas section above:

   Classes      Grades
0  Mathematics  90.0
1  Chemistry    54.0
2  Physics      77.0
3  History      22.0
4  Geography    25.0
5  German       NaN