• First, a caveat: even with exactly the same seed, full reproducibility cannot be guaranteed across PyTorch versions or across hardware platforms. The best we can do is make results reproducible on one specific hardware and software platform.
  • Ensuring reproducibility on a specific platform involves two things: (1) controlling sources of randomness; (2) configuring PyTorch to avoid nondeterministic algorithms for some operations (which may make those operations slower).
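  • Because results are only expected to match on the same stack, it also helps to record the versions each run was produced with; a minimal sketch:
import torch

# Log the platform identifiers that reproducibility depends on.
print("torch:", torch.__version__)
print("CUDA:", torch.version.cuda)                # None on CPU-only builds
print("cuDNN:", torch.backends.cudnn.version())   # None if cuDNN is unavailable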

Controlling sources of randomness

Random number generator

import random

import numpy as np
import torch

def set_random_seed(seed):
    # Random number generators in other libraries:
    # if you use any other library with its own RNG, refer to its
    # documentation to see how to seed it consistently.
    np.random.seed(seed)
    # Python built-in RNG
    random.seed(seed)
    # PyTorch random number generator
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)

set_random_seed(seed=42)

CUDA convolution benchmarking

  • The cuDNN library used by CUDA convolution operations can be another source of nondeterminism. When a cuDNN convolution is called with a new set of size parameters, the benchmark feature runs multiple convolution algorithms, benchmarks them to find the fastest one, and then keeps using that algorithm for all subsequent calls with the same size parameters. Because of benchmarking noise and hardware differences, the benchmark may select different algorithms on different runs.
  • Disabling benchmarking makes cuDNN select an algorithm deterministically, possibly at the cost of performance:
torch.backends.cudnn.benchmark = False

Avoiding nondeterministic algorithms

  • torch.use_deterministic_algorithms(True) forces PyTorch to use deterministic algorithms instead of nondeterministic ones wherever they are available; if an operation is nondeterministic and has no deterministic alternative, it raises an error (see the sketch below).
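  • A minimal sketch of this behavior, following the examples in the PyTorch docs:
import torch

torch.use_deterministic_algorithms(True)

# Ops with a deterministic implementation run as usual:
torch.randn(10).index_copy(0, torch.tensor([0]), torch.randn(1))

# Ops without one raise a RuntimeError, e.g. on CUDA:
# torch.randn(10, device="cuda").kthvalue(1)
# RuntimeError: kthvalue CUDA does not have a deterministic implementation...

# Note: on CUDA 10.2+, deterministic CuBLAS additionally requires setting the
# CUBLAS_WORKSPACE_CONFIG environment variable (e.g. ":4096:8") at startup.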

CUDA convolution determinism

  • Disabling CUDA convolution benchmarking only fixes which algorithm gets selected, but the selected algorithm may itself be nondeterministic. Setting torch.backends.cudnn.deterministic = True guarantees that convolution operations behave deterministically (this setting affects only convolution operations, whereas torch.use_deterministic_algorithms(True) above affects all operations):
torch.backends.cudnn.deterministic = True

CUDA RNN and LSTM

  • In some versions of CUDA, RNNs and LSTM networks may have non-deterministic behavior. See torch.nn.RNN() and torch.nn.LSTM() for details and workarounds.
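  • A sketch of that workaround (the exact variable and value depend on the CUDA version, per those docs; they must be set before any CUDA work starts):
import os

# On CUDA 10.1:
# os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
# On CUDA 10.2 or later (note the leading colon in the value):
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"  # or ":4096:2"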

DataLoader

  • When num_workers > 0, the following setup is needed: DataLoader reseeds its workers according to the "Randomness in multi-process data loading" algorithm. Use worker_init_fn() and a generator to preserve reproducibility (make sure your DataLoader loads samples in the same order on every call):
import random

import numpy as np
import torch
from torch.utils.data import DataLoader

def seed_worker(worker_id):
    # Derive a per-worker seed from the base seed PyTorch assigns this worker.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(0)

DataLoader(
    train_dataset,
    batch_size=batch_size,
    num_workers=num_workers,
    worker_init_fn=seed_worker,
    generator=g,
)
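  • As a quick sanity check (with a hypothetical toy TensorDataset standing in for train_dataset, and seed_worker from above), two identically-seeded loaders should yield batches in the same order:
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(100).float())

def make_loader():
    # Recreate the generator each time so both loaders start from the same state.
    g = torch.Generator()
    g.manual_seed(0)
    return DataLoader(dataset, batch_size=10, shuffle=True, num_workers=2,
                      worker_init_fn=seed_worker, generator=g)

if __name__ == "__main__":  # guard is required on platforms that spawn workers
    first = [batch[0] for batch in make_loader()]
    second = [batch[0] for batch in make_loader()]
    assert all(torch.equal(a, b) for a, b in zip(first, second))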

Summary

import random

import numpy as np
import torch
from torch.utils.data import DataLoader

def set_random_seed(seed):
    # Random number generators in other libraries:
    # if you use any other library with its own RNG, refer to its
    # documentation to see how to seed it consistently.
    np.random.seed(seed)
    # Python built-in RNG
    random.seed(seed)
    # PyTorch random number generator
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)

set_random_seed(seed=42)

# Fix the cuDNN convolution algorithm choice, and make the chosen
# algorithm itself deterministic.
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.deterministic = True

# Optionally, force deterministic algorithms for all operations (see above);
# this raises an error for ops without a deterministic implementation.
# torch.use_deterministic_algorithms(True)

def seed_worker(worker_id):
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(0)

DataLoader(
    train_dataset,
    batch_size=batch_size,
    num_workers=num_workers,
    worker_init_fn=seed_worker,
    generator=g,
)
