• First, a caveat: even with exactly the same seed, full reproducibility cannot be guaranteed across PyTorch versions or across hardware platforms. The best we can do is make results reproducible on one specific hardware and software platform.
  • Ensuring reproducibility on a specific platform involves two things: (1) controlling sources of randomness; (2) configuring PyTorch to avoid nondeterministic algorithms for some operations (which may make those operations slower).
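  • Because results are only expected to match on the same stack, it also helps to record the versions each run was produced with; a minimal sketch:
import torch

# Log the platform identifiers that reproducibility depends on.
print("torch:", torch.__version__)
print("CUDA:", torch.version.cuda)                # None on CPU-only builds
print("cuDNN:", torch.backends.cudnn.version())   # None if cuDNN is unavailable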

Controlling sources of randomness

Random number generator

import random

import numpy as np
import torch

def set_random_seed(seed):
    # Random number generators in other libraries:
    # if you use any other library with its own RNG, refer to its
    # documentation to see how to seed it consistently.
    np.random.seed(seed)
    # Python built-in RNG
    random.seed(seed)
    # PyTorch random number generator
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)

set_random_seed(seed=42)

CUDA convolution benchmarking

  • The cuDNN library used by CUDA convolution operations can be another source of nondeterminism. When a cuDNN convolution is called with a new set of size parameters, the benchmark feature runs multiple convolution algorithms, benchmarks them to find the fastest one, and then keeps using that algorithm for all subsequent calls with the same size parameters. Because of benchmarking noise and hardware differences, the benchmark may select different algorithms on different runs.
  • Disabling benchmarking makes cuDNN select an algorithm deterministically, possibly at the cost of performance:
torch.backends.cudnn.benchmark = False

Avoiding nondeterministic algorithms

  • torch.use_deterministic_algorithms(True) forces PyTorch to use deterministic algorithms instead of nondeterministic ones wherever they are available; if an operation is nondeterministic and has no deterministic alternative, it raises an error (see the sketch below).
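  • A minimal sketch of this behavior, following the examples in the PyTorch docs:
import torch

torch.use_deterministic_algorithms(True)

# Ops with a deterministic implementation run as usual:
torch.randn(10).index_copy(0, torch.tensor([0]), torch.randn(1))

# Ops without one raise a RuntimeError, e.g. on CUDA:
# torch.randn(10, device="cuda").kthvalue(1)
# RuntimeError: kthvalue CUDA does not have a deterministic implementation...

# Note: on CUDA 10.2+, deterministic CuBLAS additionally requires setting the
# CUBLAS_WORKSPACE_CONFIG environment variable (e.g. ":4096:8") at startup.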

CUDA convolution determinism

  • Disabling CUDA convolution benchmarking only fixes which algorithm gets selected, but the selected algorithm may itself be nondeterministic. Setting torch.backends.cudnn.deterministic = True guarantees that convolution operations behave deterministically (this setting affects only convolution operations, whereas torch.use_deterministic_algorithms(True) above affects all operations):
torch.backends.cudnn.deterministic = True

CUDA RNN and LSTM

  • In some versions of CUDA, RNNs and LSTM networks may have non-deterministic behavior. See torch.nn.RNN() and torch.nn.LSTM() for details and workarounds.
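  • A sketch of that workaround (the exact variable and value depend on the CUDA version, per those docs; they must be set before any CUDA work starts):
import os

# On CUDA 10.1:
# os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
# On CUDA 10.2 or later (note the leading colon in the value):
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"  # or ":4096:2"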

DataLoader

  • When num_workers > 0, the following setup is needed: DataLoader reseeds its workers according to the "Randomness in multi-process data loading" algorithm. Use worker_init_fn() and a generator to preserve reproducibility (make sure your DataLoader loads samples in the same order on every call):
import random

import numpy as np
import torch
from torch.utils.data import DataLoader

def seed_worker(worker_id):
    # Derive a per-worker seed from the base seed PyTorch assigns this worker.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(0)

DataLoader(
    train_dataset,
    batch_size=batch_size,
    num_workers=num_workers,
    worker_init_fn=seed_worker,
    generator=g,
)
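  • As a quick sanity check (with a hypothetical toy TensorDataset standing in for train_dataset, and seed_worker from above), two identically-seeded loaders should yield batches in the same order:
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(100).float())

def make_loader():
    # Recreate the generator each time so both loaders start from the same state.
    g = torch.Generator()
    g.manual_seed(0)
    return DataLoader(dataset, batch_size=10, shuffle=True, num_workers=2,
                      worker_init_fn=seed_worker, generator=g)

if __name__ == "__main__":  # guard is required on platforms that spawn workers
    first = [batch[0] for batch in make_loader()]
    second = [batch[0] for batch in make_loader()]
    assert all(torch.equal(a, b) for a, b in zip(first, second))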

Summary

import random

import numpy as np
import torch
from torch.utils.data import DataLoader

def set_random_seed(seed):
    # Random number generators in other libraries:
    # if you use any other library with its own RNG, refer to its
    # documentation to see how to seed it consistently.
    np.random.seed(seed)
    # Python built-in RNG
    random.seed(seed)
    # PyTorch random number generator
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)

set_random_seed(seed=42)

# Fix the cuDNN convolution algorithm choice, and make the chosen
# algorithm itself deterministic.
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True

# Optionally, force deterministic algorithms for all operations (see above);
# this raises an error for ops without a deterministic implementation.
# torch.use_deterministic_algorithms(True)

def seed_worker(worker_id):
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(0)

DataLoader(
    train_dataset,
    batch_size=batch_size,
    num_workers=num_workers,
    worker_init_fn=seed_worker,
    generator=g,
)
