Loss Functions (loss)
Loss function: defines the error between a single training sample's predicted value and its true value.
It measures how much the model's prediction f(x) disagrees with the true value Y. It is a non-negative real-valued function, usually denoted L(Y, f(x)); the smaller the loss, the better the model fits the data.
The loss function is the core of the empirical risk function and an important component of the structural risk function.
A model's structural risk function consists of an empirical risk term plus a regularization term; the function that is actually optimized, combining empirical risk and structural risk, is called the objective function.
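As a concrete illustration, such an objective function can be sketched in TensorFlow as the empirical risk (here, mean squared error) plus an L2 regularization term. This is a minimal sketch: the regularization strength lam is an assumed hyperparameter, not something fixed by the text above.

import tensorflow as tf

def objective(y_true, y_pred, w, lam=0.01):
    # Empirical risk term: average loss over the training samples
    empirical_risk = tf.reduce_mean(tf.square(y_true - y_pred))
    # Structural risk (regularization) term: L2 penalty on the weights;
    # lam is a hypothetical hyperparameter controlling its strength
    regularizer = lam * tf.reduce_sum(tf.square(w))
    return empirical_risk + regularizer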
1. Problem
Irises fall into three species: Iris setosa, Iris versicolor, and Iris virginica.
The task is to determine the species from four measurements: sepal length, sepal width, petal length, and petal width.
2. Approaches
- if/case statements (expert systems): encode human expertise as rules; the computer performs the logical checks and outputs a class.
- Neural network: collect many data pairs of the form (sepal length, sepal width, petal length, petal width, species) to build a dataset. The four measurements (sepal length, sepal width, petal length, petal width) are the input features; the species is the label, which must be annotated by hand. Feed the dataset into a neural network structure -> the network optimizes its parameters to produce a model -> the model reads new input features -> it outputs the predicted class.
3. Neural network preliminaries
The output is y = x*w + b: each input x is multiplied by the weight w on its connection, the products are summed, and the bias b is added. Here the input feature x has shape (1, 4), i.e., 1 row and 4 columns; the output y has shape (1, 3), i.e., 1 row and 3 columns; w has shape (4, 3), i.e., 4 rows and 3 columns; and b has shape (3,), i.e., three bias terms.
A network in which every input x is connected to every output y like this is called a fully connected network.
The weights w and the biases b on the connections are initialized to random values.
Forward propagation: feed in a set of feature values, then use the linear relation y = x*w + b with the randomly initialized w and b to compute the output y.
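A minimal sketch of this forward pass (the sample's feature values below are made up for illustration; w and b are randomly initialized as described above):

import tensorflow as tf

x = tf.constant([[5.8, 4.0, 1.2, 0.2]])              # one sample's features, shape (1, 4)
w = tf.Variable(tf.random.truncated_normal([4, 3]))  # randomly initialized weights, shape (4, 3)
b = tf.Variable(tf.random.truncated_normal([3]))     # randomly initialized biases, shape (3,)
y = tf.matmul(x, w) + b                              # forward propagation: y = x*w + b, shape (1, 3)
print(y)                                             # three raw scores, one per iris species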
Loss function: the gap between the predictions (y) and the ground truth (y_). The loss gives a quantitative measure of how good a particular choice of the parameters w and b is. Mean squared error is one of the most common loss functions:
MSE(y, y_) = (1/n) Σᵢ (yᵢ - y_ᵢ)²
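In TensorFlow the mean squared error can be computed directly; a small sketch with made-up values:

import tensorflow as tf

y_ = tf.constant([1.0, 2.0, 3.0])        # ground truth (made-up values)
y = tf.constant([1.1, 2.2, 2.7])         # predictions
mse = tf.reduce_mean(tf.square(y - y_))  # (0.1^2 + 0.2^2 + 0.3^2) / 3
print(mse)                               # ≈ 0.0467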
The point of introducing a loss function is to find the parameters w and b that minimize it. To do this we use gradient descent. The gradient of the loss function is the vector of its partial derivatives with respect to each parameter, and the direction of gradient descent is the direction in which the loss decreases. Gradient descent therefore moves the parameters along the negative gradient of the loss, seeking its minimum and with it the optimal parameters. The update formulas are:

w = w - lr * ∂loss/∂w
b = b - lr * ∂loss/∂b

Here lr is the learning rate, a hyperparameter that sets the step size of gradient descent. If the learning rate is too small, the parameters update very slowly; if it is too large, the updates may jump over the minimum.
Carrying out these gradient-descent updates is the backpropagation step; the example below makes it concrete. With loss = (w + 1)², the parameter w is updated by w = w - lr * 2(w + 1).
# Optional: to disable oneDNN optimizations, set the environment variable
# TF_ENABLE_ONEDNN_OPTS=0 before importing TensorFlow, e.g.:
# os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"
import os
import tensorflow as tf
w = tf.Variable(tf.constant(5, dtype=tf.float32))
lr = 0.2  # learning rate
epoch = 40

# Top-level loop: iterate over the data for `epoch` rounds. In this example the
# only trainable quantity is the single parameter w, initialized to 5 via
# tf.constant, and we iterate 40 times.
for step in range(epoch):
    # The with-block records on the tape the operations needed to compute gradients
    with tf.GradientTape() as tape:
        loss = tf.square(w + 1)
    # .gradient(target, source) says what to differentiate with respect to what
    grads = tape.gradient(loss, w)
    # .assign_sub subtracts in place: w -= lr * grads, i.e. w = w - lr * grads
    w.assign_sub(lr * grads)
    print("After %s epoch, w is %f, loss is %s" % (step + 1, w.numpy(), loss))

# initial lr: 0.2
# goal: minimize loss, i.e. reach the optimal parameter w = -1
After 1 epoch, w is 2.600000, loss is tf.Tensor(36.0, shape=(), dtype=float32)
After 2 epoch, w is 1.160000, loss is tf.Tensor(12.959999, shape=(), dtype=float32)
After 3 epoch, w is 0.296000, loss is tf.Tensor(4.6655993, shape=(), dtype=float32)
After 4 epoch, w is -0.222400, loss is tf.Tensor(1.679616, shape=(), dtype=float32)
After 5 epoch, w is -0.533440, loss is tf.Tensor(0.60466176, shape=(), dtype=float32)
After 6 epoch, w is -0.720064, loss is tf.Tensor(0.21767823, shape=(), dtype=float32)
After 7 epoch, w is -0.832038, loss is tf.Tensor(0.07836417, shape=(), dtype=float32)
After 8 epoch, w is -0.899223, loss is tf.Tensor(0.028211098, shape=(), dtype=float32)
After 9 epoch, w is -0.939534, loss is tf.Tensor(0.010155998, shape=(), dtype=float32)
After 10 epoch, w is -0.963720, loss is tf.Tensor(0.0036561578, shape=(), dtype=float32)
After 11 epoch, w is -0.978232, loss is tf.Tensor(0.001316215, shape=(), dtype=float32)
After 12 epoch, w is -0.986939, loss is tf.Tensor(0.0004738369, shape=(), dtype=float32)
After 13 epoch, w is -0.992164, loss is tf.Tensor(0.0001705816, shape=(), dtype=float32)
After 14 epoch, w is -0.995298, loss is tf.Tensor(6.140919e-05, shape=(), dtype=float32)
After 15 epoch, w is -0.997179, loss is tf.Tensor(2.210742e-05, shape=(), dtype=float32)
After 16 epoch, w is -0.998307, loss is tf.Tensor(7.958537e-06, shape=(), dtype=float32)
After 17 epoch, w is -0.998984, loss is tf.Tensor(2.8650732e-06, shape=(), dtype=float32)
After 18 epoch, w is -0.999391, loss is tf.Tensor(1.0314506e-06, shape=(), dtype=float32)
After 19 epoch, w is -0.999634, loss is tf.Tensor(3.7129314e-07, shape=(), dtype=float32)
After 20 epoch, w is -0.999781, loss is tf.Tensor(1.3367425e-07, shape=(), dtype=float32)
After 21 epoch, w is -0.999868, loss is tf.Tensor(4.811227e-08, shape=(), dtype=float32)
After 22 epoch, w is -0.999921, loss is tf.Tensor(1.7320417e-08, shape=(), dtype=float32)
After 23 epoch, w is -0.999953, loss is tf.Tensor(6.237233e-09, shape=(), dtype=float32)
After 24 epoch, w is -0.999972, loss is tf.Tensor(2.2454039e-09, shape=(), dtype=float32)
After 25 epoch, w is -0.999983, loss is tf.Tensor(8.083454e-10, shape=(), dtype=float32)
After 26 epoch, w is -0.999990, loss is tf.Tensor(2.9059777e-10, shape=(), dtype=float32)
After 27 epoch, w is -0.999994, loss is tf.Tensor(1.0510348e-10, shape=(), dtype=float32)
After 28 epoch, w is -0.999996, loss is tf.Tensor(3.769074e-11, shape=(), dtype=float32)
After 29 epoch, w is -0.999998, loss is tf.Tensor(1.3656631e-11, shape=(), dtype=float32)
After 30 epoch, w is -0.999999, loss is tf.Tensor(4.863665e-12, shape=(), dtype=float32)
After 31 epoch, w is -0.999999, loss is tf.Tensor(1.7195134e-12, shape=(), dtype=float32)
After 32 epoch, w is -1.000000, loss is tf.Tensor(6.004086e-13, shape=(), dtype=float32)
After 33 epoch, w is -1.000000, loss is tf.Tensor(2.2737368e-13, shape=(), dtype=float32)
After 34 epoch, w is -1.000000, loss is tf.Tensor(8.881784e-14, shape=(), dtype=float32)
After 35 epoch, w is -1.000000, loss is tf.Tensor(3.1974423e-14, shape=(), dtype=float32)
After 36 epoch, w is -1.000000, loss is tf.Tensor(1.4210855e-14, shape=(), dtype=float32)
After 37 epoch, w is -1.000000, loss is tf.Tensor(3.5527137e-15, shape=(), dtype=float32)
After 38 epoch, w is -1.000000, loss is tf.Tensor(3.5527137e-15, shape=(), dtype=float32)
After 39 epoch, w is -1.000000, loss is tf.Tensor(3.5527137e-15, shape=(), dtype=float32)
After 40 epoch, w is -1.000000, loss is tf.Tensor(3.5527137e-15, shape=(), dtype=float32)
A Tensor in TensorFlow represents a tensor: a multidimensional array (multidimensional list), whose number of dimensions is called its rank. A rank-0 tensor is a scalar, a single number such as 123. A rank-1 tensor is a vector, a one-dimensional array such as [1,2,3]. A rank-2 tensor is a matrix, a two-dimensional array with i rows and j columns, where each element is indexed jointly by its row and column number; in [[1,2,3],[4,5,6],[7,8,9]], for example, the element 2 is indexed by row 0, column 1. The rank of a tensor equals the depth of nested square brackets: zero brackets means rank 0, one bracket means rank 1, and so on. Tensors can therefore represent arrays of rank 0 up to rank n.
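A quick sketch of these ranks in code:

import tensorflow as tf

s = tf.constant(123)                                # rank 0: scalar
v = tf.constant([1, 2, 3])                          # rank 1: vector
m = tf.constant([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # rank 2: matrix
print(tf.rank(s).numpy(), tf.rank(v).numpy(), tf.rank(m).numpy())  # 0 1 2
print(m[0, 1].numpy())                              # element 2, indexed by row 0, column 1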
TensorFlow data types include 32-bit integers (tf.int32), 32-bit floats (tf.float32), 64-bit floats (tf.float64), booleans (tf.bool), and strings (tf.string).
Ways to create tensors:
(1) tf.constant(contents, dtype=data_type (optional)): the first argument gives the tensor's contents, the second its data type.
import tensorflow as tf
a = tf.constant([1, 5], dtype=tf.int64)  # create the rank-1 tensor [1,5] with 64-bit integer dtype
print(a)        # print a
print(a.dtype)  # print a's data type
print(a.shape)  # print a's shape
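For reference, the expected output of this cell is:

tf.Tensor([1 5], shape=(2,), dtype=int64)
<dtype: 'int64'>
(2,)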
(2) tf.convert_to_tensor(data, dtype=data_type (optional)): converts data in numpy format into Tensor format.
import tensorflow as tf
import numpy as np
a = np.arange(0, 5)
b = tf.convert_to_tensor(a, dtype=tf.int64)
print("a:", a)
print("b:", b)
(3) tf.zeros(shape) creates a tensor of all zeros, tf.ones(shape) creates a tensor of all ones, and tf.fill(shape, value) creates a tensor filled with the given value. For the shape argument: for one dimension, write the number of elements directly; for two dimensions, use [rows, columns]; for more dimensions, use [n, m, j, ...].
import tensorflow as tf
a = tf.zeros([2,3])
b = tf.ones(4)
c = tf.fill([2,2],9)
print(a)
print(b)
print(c)
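For reference, the expected output of this cell is:

tf.Tensor(
[[0. 0. 0.]
 [0. 0. 0.]], shape=(2, 3), dtype=float32)
tf.Tensor([1. 1. 1. 1.], shape=(4,), dtype=float32)
tf.Tensor(
[[9 9]
 [9 9]], shape=(2, 2), dtype=int32)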
(4) Randomly generated initialization parameters should follow a normal distribution:
- tf.random.normal(shape, mean=mean, stddev=stddev) generates normally distributed random numbers, with mean 0 and standard deviation 1 by default.
- tf.random.truncated_normal(shape, mean=mean, stddev=stddev) generates random numbers from a truncated normal distribution: values falling more than two standard deviations from the mean are discarded and redrawn, so all outputs lie within (mean - 2*stddev, mean + 2*stddev).
import tensorflow as tf
d = tf.random.normal([2, 2], mean=0.5, stddev=1)
print("d:", d)
e = tf.random.truncated_normal([2, 2], mean=0.5, stddev=1)
print("e:", e)
(5) tf.random.uniform(shape, minval=min, maxval=max) generates uniformly distributed random numbers over the half-open interval [minval, maxval), i.e., maxval itself is excluded:
import tensorflow as tf
f = tf.random.uniform([2, 2], minval=0, maxval=1)
print("f:", f)