We need random numbers for many applications. For physics simulations or machine learning (ML) model training in particular, we want a reproducible stream of random numbers (I’m not talking about cryptographic applications here), so that a task can be rerun with the same result. This can be achieved by seeding the random number generator. For example, in Python,
>>> import random
>>> random.seed(42)
>>> random.random()
0.6394267984578837
>>> random.random()
0.025010755222666936
>>> random.seed(42)
>>> random.random()
0.6394267984578837
>>> random.random()
0.025010755222666936
the sequence of random numbers after seeding the generator with the same number is identical each time.
But what to use as a seed?
What should we use as a seed value? Well, I hear you scream 42, or 1234, or 1337. Fair enough. Over the years working as a physicist, I’ve encountered a lot of code simulating some part of particle physics detectors (the initial particle reaction, the subsequent decay of short-lived particles, the interaction of particles and the detector material, …). However, the number of different random seeds used in all these independent pieces of software was shockingly small. Everyone’s favorite random seed seemed to be 1234.
Why is it bad?
What’s so bad about using the same seed in many places? Well, consider a random walk, for example a Markov chain Monte Carlo process that searches for an optimal configuration in a large parameter space. You might change the landscape of your parameter space slightly from optimization to optimization, but with the same seed and the same initial parameters, you’ll always get a very similar random walk. You end up with a significant bias towards the particular random walk encoded in your random seed.
The whole point of a random number sequence is to get random numbers. By always using the same random seed, we reduce the random sequence to a mere sequence of numbers. You might as well cycle through a fixed set of numbers.
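To make this concrete, here is a minimal sketch of a one-dimensional random walk. The function name random_walk, the step rule, and the number of steps are illustrative assumptions, not taken from any of the simulation code mentioned above. It only shows that a reused seed reproduces the exact same walk, while a different seed gives a different one.

```python
import random


def random_walk(seed, n_steps=50):
    """Return the positions of a 1D random walk driven by a seeded generator."""
    rng = random.Random(seed)  # local generator, leaves the global state alone
    position = 0
    path = []
    for _ in range(n_steps):
        position += rng.choice((-1, 1))  # step left or right
        path.append(position)
    return path


walk_a = random_walk(1234)
walk_b = random_walk(1234)      # same seed as walk_a
walk_c = random_walk(20230524)  # a different seed

print(walk_a == walk_b)  # the two walks with seed 1234 are identical
print(walk_a == walk_c)  # a different seed gives a different walk
```

Every piece of code that seeds with 1234 explores its problem along the same path as walk_a.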
What to use instead
Whenever we want a random sequence of numbers, we should use a unique random seed. If, and only if, we want the possibility to rerun the code with the exact same result, we should reuse a random seed.
Does this mean we should use a random random seed? This would be ideal, but it’s not the most practical solution. In principle, we don’t need random random seeds. All we need is unique random seeds.
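For comparison, a random random seed could look roughly like the sketch below. It assumes the standard-library secrets module as the source of randomness and a 64-bit seed, both of which are my choices for illustration. One likely reason this is less practical is bookkeeping: the seed only exists at run time, so it has to be recorded somewhere if the run should ever be repeated.

```python
import random
import secrets

# Draw a 64-bit seed from the operating system's entropy source.
seed = secrets.randbits(64)

# Record the seed; without it the run cannot be reproduced later.
print(f"random seed: {seed}")

random.seed(seed)
first_draw = random.random()

# Reseeding with the recorded value replays the same sequence.
random.seed(seed)
replayed_draw = random.random()
print(first_draw == replayed_draw)
```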
The strategy I consider both practical and unique enough is to use the current date as the random seed. If you and your colleagues need more than one random sequence per day (i.e. you need more entropy), append the current time when you write the seed. For example, when you need a random sequence on May 24, 2023, seed the generator with
random.seed(20230524)
And there you go.
>>> random.random()
0.8977306476860809
>>> random.random()
0.895353741835529
>>> random.random()
0.47582576680789956
...
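If you would rather not type the date by hand, the seed can also be built from the clock. This is a sketch of a variant, not the approach above, which writes the seed into the code as a literal. The helper name date_seed and its with_time flag are hypothetical. Because the value is computed at run time, it should be printed or logged so the run can be repeated later.

```python
import datetime
import random


def date_seed(now=None, with_time=False):
    """Build an integer seed from a date, optionally with the time appended."""
    if now is None:
        now = datetime.datetime.now()
    fmt = "%Y%m%d%H%M%S" if with_time else "%Y%m%d"
    return int(now.strftime(fmt))


# Fixed timestamp for illustration; call date_seed() to use the current date.
example = datetime.datetime(2023, 5, 24, 13, 45, 1)
print(date_seed(example))                  # 20230524
print(date_seed(example, with_time=True))  # 20230524134501

seed = date_seed()
print(f"random seed: {seed}")  # keep this value to rerun with the same result
random.seed(seed)
```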