CS 179: LECTURE 15
INTRODUCTION TO CUDNN (CUDA DEEP NEURAL NETS)

LAST TIME
We derived the minibatch stochastic gradient descent algorithm for neural networks: mostly matrix multiplications (to compute each layer's outputs and the weight gradients), plus one other kind of derivative (the derivative of the activation function applied at each layer).

cuBLAS (Basic Linear Algebra Subroutines) already does matrix multiplications for us; cuDNN will take care of these other derivatives for us.

TODAY
Using cuDNN to build deep neural networks.

SETTING UP CUDNN: cudnnHandle_t

Like cuBLAS, you need to maintain a cuDNN library context. Call cudnnCreate(cudnnHandle_t *handle) to initialize the context, and call cudnnDestroy(cudnnHandle_t handle) to clean it up.

HANDLING ERRORS
Almost every function we will talk about today returns a cudnnStatus_t (an enum saying whether a cuDNN call was successful or how it failed). As with standard CUDA, we will provide you with a checkCUDNN(cudnnStatus_t status) wrapper function that parses any error status for you. Make sure you wrap every cuDNN call with this function so you know where and how your code breaks! A minimal sketch of such a wrapper, together with the handle lifecycle, is shown below.
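Below is a minimal sketch of what such a wrapper might look like (the wrapper we actually provide may differ in its details), wrapped around the cudnnCreate/cudnnDestroy lifecycle from the previous slide:

    #include <cstdio>
    #include <cstdlib>
    #include <cudnn.h>

    // Hypothetical sketch of the provided checkCUDNN helper: print the cuDNN
    // error string and bail out if a call did not return CUDNN_STATUS_SUCCESS.
    void checkCUDNN(cudnnStatus_t status) {
        if (status != CUDNN_STATUS_SUCCESS) {
            fprintf(stderr, "cuDNN error: %s\n", cudnnGetErrorString(status));
            exit(EXIT_FAILURE);
        }
    }

    int main() {
        cudnnHandle_t handle;
        checkCUDNN(cudnnCreate(&handle));   // initialize the cuDNN context
        // ... create descriptors and run forward/backward passes here ...
        checkCUDNN(cudnnDestroy(handle));   // clean up the context
        return 0;
    }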

REMINDERS ABOUT CUDA
Pattern of allocate, initialize, free (reiterated here for students who may not be as comfortable with C++).

DATA REPRESENTATION: cudnnTensor_t
For the purposes of cuDNN (and much of machine learning), a tensor is just a multidimensional array: a wrapper around a flattened 3- to 8-dimensional array, used to represent minibatches of data. For now, we will be using flattened 4D arrays to represent each minibatch X.

DATA REPRESENTATION: cudnnTensor_t
Consider the case where each individual training example is just a vector (so the last two axes will have size 1 each). Then X[n,c,0,0] is the value of component c of example n. If the w, h, and c axes have sizes size0, size1, and size2 respectively, then X[n,c,h,w] (pseudocode) is actually X[n*size0*size1*size2 + c*size0*size1 + h*size0 + w].
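As a sanity check, here is a small sketch of that flattening as a helper function (the names sizeC, sizeH, and sizeW are ours, not part of cuDNN):

    // Flattened index of X[n][c][h][w] in CUDNN_TENSOR_NCHW layout, where
    // sizeC, sizeH, sizeW are the extents of the c, h, and w axes.
    inline int nchwIndex(int n, int c, int h, int w,
                         int sizeC, int sizeH, int sizeW) {
        // Same as n*sizeC*sizeH*sizeW + c*sizeH*sizeW + h*sizeW + w.
        return ((n * sizeC + c) * sizeH + h) * sizeW + w;
    }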

DATA REPRESENTATION: cudnnTensor_t
More generally, a single training example may itself be a matrix or a tensor. For example, in a minibatch of RGB images, we may have X[n,c,h,w], where n is the index of an image in the minibatch, c is the channel (R = 0, G = 1, B = 2), and h and w index a pixel (h, w) in the image (h and w are height and width).

DATA REPRESENTATION: cudnnTensorDescriptor_t
Allocate by calling cudnnCreateTensorDescriptor(cudnnTensorDescriptor_t *desc). The ordering of array axes is defined by an enum called cudnnTensorFormat_t (since we are indexing as X[n,c,h,w], we will use CUDNN_TENSOR_NCHW). A cudnnDataType_t specifies the data type of the tensor (we will use CUDNN_DATA_FLOAT).

DATA REPRESENTATION: cudnnTensorDescriptor_t
Initialize by calling cudnnSetTensor4dDescriptor(cudnnTensorDescriptor_t desc, cudnnTensorFormat_t format, cudnnDataType_t dataType, int n, int c, int h, int w). Free by calling cudnnDestroyTensorDescriptor(cudnnTensorDescriptor_t desc).

DATA REPRESENTATION: cudnnTensorDescriptor_t
Get the contents by calling cudnnGetTensor4dDescriptor(cudnnTensorDescriptor_t desc, cudnnDataType_t *dataType, int *n, int *c, int *h, int *w, int *nStr, int *cStr, int *hStr, int *wStr). This is the standard trick of returning values by setting output parameters. Don't worry about the strides nStr, cStr, hStr, wStr. A sketch of the full descriptor lifecycle follows.
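A minimal sketch of the descriptor lifecycle, assuming the checkCUDNN wrapper from earlier and a minibatch of 64 examples with 784 features each (sizes chosen purely for illustration):

    cudnnTensorDescriptor_t desc;
    checkCUDNN(cudnnCreateTensorDescriptor(&desc));

    // Minibatch of n = 64 examples, each a vector of c = 784 floats (h = w = 1).
    checkCUDNN(cudnnSetTensor4dDescriptor(desc, CUDNN_TENSOR_NCHW,
                                          CUDNN_DATA_FLOAT, 64, 784, 1, 1));

    // Read the settings back via output parameters; the strides can be ignored.
    cudnnDataType_t dtype;
    int n, c, h, w, nStr, cStr, hStr, wStr;
    checkCUDNN(cudnnGetTensor4dDescriptor(desc, &dtype, &n, &c, &h, &w,
                                          &nStr, &cStr, &hStr, &wStr));

    checkCUDNN(cudnnDestroyTensorDescriptor(desc));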

RELATION TO ASSIGNMENT 5
Forward pass (Algorithm)
For each minibatch of training examples (each example and its label are a column in matrices X and Y respectively):
For each layer l counting up from 1 to L:
Compute matrix z_l = W_l x_{l-1} (with x_0 = X).
Compute matrix x_l = f(z_l), where f is the layer's activation function.
Our model's prediction is the final activation x_L.

RELATION TO ASSIGNMENT 5
Forward pass (Implementation)
Calculate the expected sizes of the inputs and outputs of each layer and allocate arrays of the appropriate size. Input x_{l-1} has shape (d_{l-1} x batch_size), weight matrix W_l has shape (d_l x d_{l-1}), and outputs z_l and x_l have shape (d_l x batch_size), where d_l is the number of units in layer l.

Initialize tensor descriptors for each z_l and x_l.

RELATION TO ASSIGNMENT 5
Forward pass (Implementation)
Note that cuBLAS puts matrices in column-major order, so the (d_l x batch_size) matrices z_l and x_l will be tensors of shape (batch_size, d_l, 1, 1) in NCHW order. In this assignment, the skeleton code we provide will handle the bias terms for you (this is the extra +b_l term that we've been carrying in z_l = W_l x_{l-1} + b_l this whole time).

Just remember that when we write W_l x_{l-1}, we are implicitly including this bias term!

RELATION TO ASSIGNMENT 5
Backward pass (Algorithm)
Initialize the gradient matrix dz_L = x_L - Y (the softmax/cross-entropy gradient derived later in this lecture).
For each layer l counting down from L to 1:
Calculate the weight gradient dW_l = dz_l x_{l-1}^T.
Calculate dz_{l-1} = (W_l^T dz_l) * f'(z_{l-1}) element-wise, for each l > 1.
Update W_l <- W_l - eta dW_l, where eta is the learning rate.

RELATION TO ASSIGNMENT 5
Backward pass (Implementation)
Each gradient matrix dz_l has the same shape as the input to its corresponding activation, i.e. the same shape as z_l. Have each dz_l share a tensor descriptor with its corresponding z_l. Update each W_l using cuBLAS's GEMM. cuDNN needs the associated tensor descriptor when applying the derivative of the activation/nonlinearity. A sketch of the GEMM-based update appears below.
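For example, a rough sketch of the weight-gradient GEMM and SGD update with cuBLAS; the variable names (dz, x_prev, dW, W, d_out, d_in, batch, learning_rate, cublas) are ours, and the assignment's exact shapes and scaling may differ:

    // Sketch: dW = dz * x_prev^T, then W -= learning_rate * dW, with all
    // matrices stored column-major. dz is d_out x batch, x_prev is d_in x batch,
    // dW and W are d_out x d_in; cublas is an initialized cublasHandle_t.
    // (A 1/batch factor may be folded in elsewhere, depending on how the loss
    // is averaged.)
    const float one = 1.0f, zero = 0.0f, neg_lr = -learning_rate;

    // dW (d_out x d_in) = dz (d_out x batch) * x_prev^T (batch x d_in)
    cublasSgemm(cublas, CUBLAS_OP_N, CUBLAS_OP_T,
                d_out, d_in, batch,
                &one, dz, d_out, x_prev, d_in,
                &zero, dW, d_out);

    // Gradient descent step: W = W + (-learning_rate) * dW over all entries.
    cublasSaxpy(cublas, d_out * d_in, &neg_lr, dW, 1, W, 1);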

ACTIVATION FUNCTIONS: cudnnActivationDescriptor_t
Allocate with cudnnCreateActivationDescriptor(cudnnActivationDescriptor_t *desc). Destroy with cudnnDestroyActivationDescriptor(cudnnActivationDescriptor_t desc).

ACTIVATION FUNCTIONS: cudnnActivationMode_t
An enum that specifies the type of activation to apply after any given layer. Specify it as CUDNN_ACTIVATION_<MODE>, where <MODE> can be SIGMOID, RELU, TANH, CLIPPED_RELU, or ELU (the last two are fancier activations that address some of the issues with ReLU). Use RELU for this assignment.

ACTIVATION FUNCTIONS
Graphs of the activation functions, shown as a reminder.

ACTIVATION FUNCTIONS: cudnnNanPropagation_t
An enum that specifies whether to propagate NaNs. Use CUDNN_PROPAGATE_NAN for this assignment.

ACTIVATION FUNCTIONS: cudnnActivationDescriptor_t
Set with cudnnSetActivationDescriptor(cudnnActivationDescriptor_t desc, cudnnActivationMode_t mode, cudnnNanPropagation_t reluNanOpt, double coef). coef is relevant only for clipped ReLU and ELU activations, so just use 0.0 for this assignment; see the sketch below.
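A minimal sketch of setting up the ReLU activation descriptor used in this assignment (again assuming the checkCUDNN wrapper):

    cudnnActivationDescriptor_t actDesc;
    checkCUDNN(cudnnCreateActivationDescriptor(&actDesc));

    // ReLU activation, propagating NaNs; coef is unused for plain ReLU, pass 0.0.
    checkCUDNN(cudnnSetActivationDescriptor(actDesc, CUDNN_ACTIVATION_RELU,
                                            CUDNN_PROPAGATE_NAN, 0.0));

    // ... use actDesc in cudnnActivationForward / cudnnActivationBackward ...

    checkCUDNN(cudnnDestroyActivationDescriptor(actDesc));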

ACTIVATION FUNCTIONS: cudnnActivationDescriptor_t
Get the contents with cudnnGetActivationDescriptor(cudnnActivationDescriptor_t desc, cudnnActivationMode_t *mode, cudnnNanPropagation_t *reluNanOpt, double *coef). coef is relevant only for clipped ReLU and ELU activations, so just give it a reference to a double for throwaway values.

ACTIVATION FUNCTIONS
Forward pass for an activation: computes the tensor x = alpha[0] * f(z) + beta[0] * x, where f is the chosen activation function. Note: * here means element-wise multiplication.

cudnnActivationForward(cudnnHandle_t handle, cudnnActivationDescriptor_t activationDesc, void *alpha, cudnnTensorDescriptor_t zDesc, void *z, void *beta, cudnnTensorDescriptor_t xDesc, void *x)
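A sketch of a typical call for one layer, with alpha = 1 and beta = 0 so the result simply overwrites x; handle, actDesc, the descriptors, and the device pointers d_z and d_x are names we assume were created earlier:

    // x = 1.0 * relu(z) + 0.0 * x, element-wise over the whole minibatch tensor.
    const float alpha = 1.0f, beta = 0.0f;
    checkCUDNN(cudnnActivationForward(handle, actDesc,
                                      &alpha, zDesc, d_z,
                                      &beta, xDesc, d_x));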

ACTIVATION FUNCTIONS
Backward pass for an activation: computes dz = alpha[0] * f'(z) * dx + beta[0] * dz.

cudnnActivationBackward(cudnnHandle_t handle, cudnnActivationDescriptor_t activationDesc, void *alpha, cudnnTensorDescriptor_t xDesc, void *x, cudnnTensorDescriptor_t dxDesc, void *dx, cudnnTensorDescriptor_t zDesc, void *z, void *beta, cudnnTensorDescriptor_t dzDesc, void *dz)

ACTIVATION FUNCTIONS
Backward pass for an activation: computes dz = alpha[0] * f'(z) * dx + beta[0] * dz. These are element-wise products, not matrix products! Here x is the output of the activation, dx is the derivative of the loss with respect to x, z is the input to the activation, and dz is the tensor the output is accumulated into. A sketch of a typical call follows.
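A sketch under the same assumptions as before, with d_x the activation output, d_dx the gradient with respect to x, d_z the activation input, and d_dz the buffer receiving the gradient with respect to z (all names ours):

    // dz = f'(z) * dx element-wise (alpha = 1, beta = 0, so nothing is accumulated).
    const float alpha = 1.0f, beta = 0.0f;
    checkCUDNN(cudnnActivationBackward(handle, actDesc,
                                       &alpha, xDesc, d_x,
                                       dxDesc, d_dx,
                                       zDesc, d_z,
                                       &beta, dzDesc, d_dz));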

SOFTMAX/CROSS-ENTROPY LOSS
Consider a single training example transformed by the network into a score vector z. The softmax function is softmax(z)_i = exp(z_i) / sum_j exp(z_j). The cross-entropy loss is L = -sum_i y_i log(softmax(z)_i), where y is the one-hot label vector. It gives us a notion of how good our classifier is.

SOFTMAX/CROSS-ENTROPY LOSS
Forward pass: computes the tensor x = alpha[0] * softmax(z) + beta[0] * x.

cudnnSoftmaxForward(cudnnHandle_t handle, cudnnSoftmaxAlgorithm_t alg, cudnnSoftmaxMode_t mode, void *alpha, cudnnTensorDescriptor_t zDesc, void *z, void *beta, cudnnTensorDescriptor_t xDesc, void *x)
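A sketch of a typical call, using the algorithm and mode enums described on the next slides (descriptor and pointer names are ours):

    // x = softmax(z), computed independently for each example (alpha = 1, beta = 0).
    const float alpha = 1.0f, beta = 0.0f;
    checkCUDNN(cudnnSoftmaxForward(handle, CUDNN_SOFTMAX_ACCURATE,
                                   CUDNN_SOFTMAX_MODE_INSTANCE,
                                   &alpha, zDesc, d_z,
                                   &beta, xDesc, d_x));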

SOFTMAX/CROSS-ENTROPY LOSS: cudnnSoftmaxAlgorithm_t
An enum that specifies how to compute the softmax. Use CUDNN_SOFTMAX_ACCURATE for this class (it scales everything by exp(-max_i z_i), i.e. subtracts the maximum input before exponentiating, to avoid overflow). The other options are CUDNN_SOFTMAX_FAST (less numerically stable) and CUDNN_SOFTMAX_LOG (computes the natural log of the softmax function).

SOFTMAX/CROSS-ENTROPY LOSS: cudnnSoftmaxMode_t
An enum that specifies over which data to compute the softmax. CUDNN_SOFTMAX_MODE_INSTANCE does it over the entire input (sum over all c, h, w for a single n in X[n,c,h,w]). CUDNN_SOFTMAX_MODE_CHANNEL does it over each channel (sum over all c for each (n, h, w) triple in X[n,c,h,w]).

Since h and w both have size 1 here, either mode is fine to use.

SOFTMAX/CROSS-ENTROPY LOSS
Backward pass: cuDNN has a built-in function to compute the gradient of the softmax activation on its own. However, when coupled with the cross-entropy loss, we get the following gradient with respect to z: dz = x - y, where x = softmax(z) is our prediction and y is the one-hot label.

This is easier and faster to compute manually! Therefore, you will implement the kernel for this yourself; one possible sketch is shown below.
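A minimal sketch of such a kernel, assuming the predictions x = softmax(z) and the one-hot labels y are float arrays with batch * num_classes entries, and that the loss is averaged over the minibatch (the assignment's exact conventions may differ):

    // dz[i] = (softmax(z)[i] - y[i]) / batch for every entry of the minibatch.
    __global__ void softmaxCrossEntropyGradient(const float *x, const float *y,
                                                float *dz, int total, int batch) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < total)
            dz[i] = (x[i] - y[i]) / batch;
    }

    // Example launch over total = batch * num_classes entries:
    // softmaxCrossEntropyGradient<<<(total + 255) / 256, 256>>>(d_x, d_y, d_dz,
    //                                                           total, batch);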

SOFTMAX WITH OTHER LOSSES
Backward pass: for different losses, use the following function:

cudnnSoftmaxBackward(cudnnHandle_t handle, cudnnSoftmaxAlgorithm_t alg, cudnnSoftmaxMode_t mode, void *alpha, cudnnTensorDescriptor_t xDesc, void *x, cudnnTensorDescriptor_t dxDesc, void *dx, void *beta, cudnnTensorDescriptor_t dzDesc, void *dz)

SOFTMAX WITH OTHER LOSSES
Backward pass: as with other backward functions in cuDNN, this function computes the tensor dz = alpha[0] * (the gradient of the loss with respect to the softmax input, built from x and dx) + beta[0] * dz.

Here x is the output of the softmax function and dx is the derivative of our loss function with respect to x (cuDNN uses them internally to form the gradient). Note that, unlike the backward activation functions, we don't need a z input parameter (where z is the input to the softmax function). A sketch of a typical call follows.
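A sketch under the same naming assumptions as before (d_x: softmax output, d_dx: derivative of the loss with respect to x, d_dz: output buffer):

    // dz = gradient of the loss with respect to the softmax input (alpha = 1, beta = 0).
    const float alpha = 1.0f, beta = 0.0f;
    checkCUDNN(cudnnSoftmaxBackward(handle, CUDNN_SOFTMAX_ACCURATE,
                                    CUDNN_SOFTMAX_MODE_INSTANCE,
                                    &alpha, xDesc, d_x,
                                    dxDesc, d_dx,
                                    &beta, dzDesc, d_dz));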

SUMMARY
Defining data in terms of tensors. Using those tensors as arguments to cuDNN's built-in functions for both the forward and backward passes through a neural network. You can find more details about everything we discussed in NVIDIA's official cuDNN developer guide.

Next week: convolutional neural nets.
