
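Activation functions are a concept from neural networks.

An activation function acts like a switch on a neuron: it decides whether an input signal is passed on, and in what form.

Many different activation functions have been developed to handle different scenarios. Their purpose is to give the signal different kinds of nonlinearity, so that a neural network can represent richer relationships.

This article surveys more than 40 common activation functions (including a few classic output-layer functions).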

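Notes

For each activation function, this article briefly introduces the concept and typical use cases, gives the mathematical formula, and then implements and visualizes it in Python. The final section compares more than 20 of the most classic activation functions across several dimensions in a table, as a reference for choosing among them.

All code in this article was written in Jupyter Notebook. Interested readers can message the author to get the complete ipynb file.

To keep the implementation of each activation function concise, we first do some setup: importing the Python libraries and defining the plotting functions, as follows: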

# -*- coding: utf-8 -*-

# Import the required libraries
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import expit as sigmoid  # SciPy's sigmoid
import warnings

warnings.filterwarnings("ignore")

# Apply the plot style first, because a style sheet can reset font settings
plt.style.use('seaborn-v0_8')

# Fonts that can render Chinese text; use an ASCII minus sign so it displays with them
plt.rcParams['font.sans-serif'] = ['SimHei', 'Arial Unicode MS', 'DejaVu Sans']
plt.rcParams['axes.unicode_minus'] = False

# Input range shared by the single-function plots
x = np.linspace(-10, 10, 1000)

# Plot one activation function together with its derivative
def plot_activation(func, grad_func, name):
    y = func(x)
    dy = grad_func(x)
    plt.figure(figsize=(8, 5))
    plt.plot(x, y, label=name, linewidth=1.5)
    plt.plot(x, dy, label=f"{name}'s derivative", linestyle='--', linewidth=1.5)
    plt.title(f'{name} Function and Its Derivative')
    plt.legend()
    plt.grid(True)
    plt.axhline(0, color='black', linewidth=0.5)
    plt.axvline(0, color='black', linewidth=0.5)
    plt.show()

# Plot several functions on one figure (used to compare parameter settings)
def plot_activations(functions, x):
    plt.figure(figsize=(10, 7))
    for func, grad_func, name in functions:
        y = func(x)
        dy = grad_func(x)
        plt.plot(x, y, label=name, linewidth=1.5)
        plt.plot(x, dy, label=f"{name}'s derivative", linestyle='--', linewidth=1.5)
    plt.title('Activation Functions and Their Derivatives')
    plt.legend()
    plt.grid(True)
    plt.axhline(0, color='black', linewidth=0.5)
    plt.axvline(0, color='black', linewidth=0.5)
    plt.show()
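Every activation in this article is paired with a hand-written derivative, so it can help to sanity-check each pair numerically before plotting. The following is a minimal sketch using a central finite difference; the helper names `numerical_grad` and `check_grad` are hypothetical and are not part of the original notebook.

```python
import numpy as np

def numerical_grad(func, x, eps=1e-5):
    """Central finite-difference approximation of the elementwise derivative."""
    return (func(x + eps) - func(x - eps)) / (2 * eps)

def check_grad(func, grad_func, x):
    """Largest absolute gap between the analytic and the numerical derivative."""
    return float(np.max(np.abs(grad_func(x) - numerical_grad(func, x))))

# Example with tanh, whose derivative is 1 - tanh(x)^2
xs = np.linspace(-5, 5, 101)
max_gap = check_grad(np.tanh, lambda v: 1 - np.tanh(v) ** 2, xs)
print(max_gap)  # a very small number if the derivative is correct
```

The same check can be pointed at any `(func, grad_func)` pair defined below. Piecewise functions such as ReLU will show a larger gap only at their kinks, where the derivative is not defined.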

Now, let's get started!

Classic Activation Functions

Sigmoid

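Suited to the output layer of binary classification problems, where it squashes the output into the (0, 1) interval so it can be read as a probability. Not recommended for hidden layers: it saturates for large |x|, which easily leads to vanishing gradients.

Formula

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Implementation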

# Note: this definition replaces the sigmoid imported from SciPy above
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)

plot_activation(sigmoid, sigmoid_grad, 'Sigmoid')

Plot

[Figure: Sigmoid function and its derivative]
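To make the vanishing-gradient remark concrete, the sketch below measures how small the Sigmoid derivative is. It peaks at 0.25 (at x = 0) and is close to zero in the saturated tails. As a simplifying assumption made only for illustration, the loop ignores the weights and looks just at the product of activation derivatives across stacked layers.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)

xs = np.linspace(-10, 10, 1001)
peak = sigmoid_grad(xs).max()              # largest slope, reached at x = 0
tail = sigmoid_grad(np.array([10.0]))[0]   # slope deep in the saturated region

# Best-case derivative factor after n stacked sigmoid layers (weights ignored)
for n in (1, 5, 10):
    print(n, peak ** n)
print(peak, tail)
```

Even in the best case the factor shrinks geometrically with depth, which is why Sigmoid is usually kept for the output layer.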

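Tanh (Hyperbolic Tangent)

Tanh's output is zero-centered, which makes gradient updates more balanced in direction and speeds up convergence. It is generally a better choice than Sigmoid for hidden layers and is still used in RNNs in particular. However, it also saturates for large |x|, so it can still suffer from vanishing gradients.

Formula

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

Implementation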

def tanh(x):
    return np.tanh(x)

def tanh_grad(x):
    return 1 - np.tanh(x)**2

plot_activation(tanh, tanh_grad, 'Tanh')

Plot

[Figure: Tanh function and its derivative]
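The zero-centering claim can be checked numerically. The sketch below relies on the standard identity tanh(x) = 2·sigmoid(2x) − 1, meaning Tanh is a rescaled and shifted Sigmoid, and compares the mean output of both functions over a symmetric input range. The variable names are illustrative only.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

xs = np.linspace(-10, 10, 1001)

# Identity: tanh(x) = 2 * sigmoid(2x) - 1
identity_gap = np.max(np.abs(np.tanh(xs) - (2 * sigmoid(2 * xs) - 1)))

# Over a symmetric input range, tanh outputs average to about 0,
# while sigmoid outputs average to about 0.5
tanh_mean = np.tanh(xs).mean()
sigmoid_mean = sigmoid(xs).mean()
print(identity_gap, tanh_mean, sigmoid_mean)
```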

Linear

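Mainly used in the output layer of regression tasks, where it keeps the output as an unbounded real value with no nonlinear transformation.

It is not suitable for hidden layers: with only linear activations, the whole network is equivalent to a single-layer linear model and cannot learn nonlinear features.

It may also appear in some specific models, such as the bottleneck layer of an autoencoder or a policy network.

Formula

$$f(x) = x$$

Implementation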

def linear(x):
    return x

def linear_grad(x):
    return np.ones_like(x)

plot_activation(linear, linear_grad, 'Linear')

Plot

[Figure: Linear function and its derivative]
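The reason a linear activation should not be used in hidden layers can be shown in a few lines. In the sketch below, two layers with an identity activation between them produce the same output as one merged linear layer. The shapes and random weights are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x_in = rng.normal(size=3)

# Two layers with the identity (linear) activation in between
two_layer = W2 @ (W1 @ x_in + b1) + b2

# The same map written as a single linear layer
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x_in + b

print(np.allclose(two_layer, one_layer))
```

However many such layers are stacked, they collapse to a single weight matrix and bias, so depth adds no expressive power without a nonlinearity.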

Softmax

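The standard activation for the output layer of multi-class classification problems: it turns the raw outputs into a probability distribution. It is generally not used as a hidden-layer activation.

Formula

$$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$$

Implementation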

from mpl_toolkits.mplot3d import Axes3D  # only needed by older Matplotlib to register the 3D projection

def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=0, keepdims=True))  # subtract the max for numerical stability
    return exp_x / np.sum(exp_x, axis=0, keepdims=True)

def softmax_grad(x):
    s = softmax(x).reshape(-1, 1)
    return np.diagflat(s) - np.dot(s, s.T)  # Jacobian matrix

# Build a 2-D input grid for visualization.
# Separate names are used so the global x read by plot_activation is not overwritten.
x1 = np.linspace(-10, 10, 100)
x2 = np.linspace(-10, 10, 100)
X1, X2 = np.meshgrid(x1, x2)
inputs = np.vstack([X1.ravel(), X2.ravel()]).T

# Softmax output: keep the first component of each probability vector
outputs = np.array([softmax(p)[0] for p in inputs]).reshape(X1.shape)

# Gradient: keep the first diagonal entry of the Jacobian
gradients = np.array([softmax_grad(p)[0, 0] for p in inputs]).reshape(X1.shape)

fig = plt.figure(figsize=(12, 5))

# 1. Softmax surface
ax1 = fig.add_subplot(121, projection='3d')
ax1.plot_surface(X1, X2, outputs, cmap='viridis', alpha=0.8)
ax1.set_title('Softmax (First Output Dimension)')
ax1.set_xlabel('x1')
ax1.set_ylabel('x2')
ax1.set_zlabel('P(x1)')

# 2. Softmax gradient surface
ax2 = fig.add_subplot(122, projection='3d')
ax2.plot_surface(X1, X2, gradients, cmap='plasma', alpha=0.8)
ax2.set_title('Gradient of Softmax (∂P(x1)/∂x1)')
ax2.set_xlabel('x1')
ax2.set_ylabel('x2')
ax2.set_zlabel('Gradient')

plt.tight_layout()
plt.show()

Plot

[Figure: 3D surfaces over (x1, x2) of the first Softmax output and of its gradient ∂P(x1)/∂x1]
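Three properties of Softmax are easy to confirm on a small vector: the outputs sum to 1, adding a constant to every input leaves the result unchanged (which is why subtracting the maximum is a safe stability trick), and each row of the Jacobian sums to 0. This is a minimal sketch for a single 1-D vector; `softmax_jacobian` and the example logits are illustrative choices, not taken from the original notebook.

```python
import numpy as np

def softmax(z):
    exp_z = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return exp_z / np.sum(exp_z)

def softmax_jacobian(z):
    s = softmax(z)
    return np.diag(s) - np.outer(s, s)  # J[i, j] = s_i * (delta_ij - s_j)

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)

total = probs.sum()                                            # sums to 1
shift_gap = np.max(np.abs(softmax(logits + 1000.0) - probs))   # invariant to a constant shift
row_sums = softmax_jacobian(logits).sum(axis=1)                # each row sums to 0
print(probs, total, shift_gap, row_sums)
```

Without the max subtraction, `np.exp(1002.0)` would overflow to infinity, so the shifted call only works because of that step.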

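ReLU and Its Variants

ReLU (Rectified Linear Unit)

The Rectified Linear Unit is one of the most commonly used activation functions in neural networks. In the usual sense, it refers to the ramp function from mathematics.

Formula

$$\mathrm{ReLU}(x) = \max(0, x)$$

Implementation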

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    return (x > 0).astype(float)

plot_activation(relu, relu_grad, 'ReLU')

Plot

[Figure: ReLU function and its derivative]
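The link to the ramp function can be verified directly, since the ramp has the closed form (x + |x|) / 2. The sketch below compares the two and also reports what share of a symmetric input range ReLU maps to exactly zero. The variable names are illustrative.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

xs = np.linspace(-10, 10, 1001)

# Closed form of the ramp function
ramp = (xs + np.abs(xs)) / 2
ramp_gap = np.max(np.abs(relu(xs) - ramp))

# Share of inputs mapped to exactly zero (about half on a symmetric range)
zero_fraction = np.mean(relu(xs) == 0)
print(ramp_gap, zero_fraction)
```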

ReLU6

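ReLU6 is a bounded version of ReLU whose output is limited to the interval [0, 6].

It is used mainly in mobile and lightweight networks (such as MobileNetV1/V2, and EfficientNet-Lite, which uses ReLU6 in place of Swish). Its bounded output helps keep low-precision inference, such as quantized inference, stable.

It is also sometimes used in reinforcement learning (for example in DQN-style networks) to bound the output range and dampen training fluctuations.

Formula

$$\mathrm{ReLU6}(x) = \min(\max(0, x), 6)$$

Or, piecewise:

$$\mathrm{ReLU6}(x) = \begin{cases} 0, & x \le 0 \\ x, & 0 < x < 6 \\ 6, & x \ge 6 \end{cases}$$

Implementation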

def relu6(x):
    return np.minimum(np.maximum(0, x), 6)

def relu6_grad(x):
    dx = np.zeros_like(x)
    dx[(x > 0) & (x < 6)] = 1  # slope is 1 only inside the open interval (0, 6)
    return dx
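The two ways of writing ReLU6 describe a plain clamp, so the function can also be expressed with `np.clip`. The sketch below checks that equivalence and confirms that the output stays inside [0, 6] however large the input is, which is the property that makes it attractive for low-precision inference. The variable names are illustrative.

```python
import numpy as np

def relu6(x):
    return np.minimum(np.maximum(0, x), 6)

xs = np.linspace(-10, 10, 1001)

# Same function written as a clamp
clip_gap = np.max(np.abs(relu6(xs) - np.clip(xs, 0, 6)))

# The output range is bounded regardless of the input magnitude
out_min, out_max = relu6(xs).min(), relu6(xs).max()
big_input = relu6(np.array([1e6]))[0]
print(clip_gap, out_min, out_max, big_input)
```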
