# Choosing Random Serial Numbers for Embedded Products

Serial numbers are crucial for embedded devices that you want to individually track. They are used for uniquely identifying devices for a bunch of reasons, including tracking metadata (location, firmware version, customer, e.t.c), used when talking to a server, for warranty purposes and more. There a few ways to go about generating serial numbers. We’ll first discuss a few different methods, and then dive into how to calculate the probability of a collision when using random serial numbers.

## Non-Random Serial Numbers

If you want to assign you own (and typically write it to non-volatile memory (NVM) like EEPROM or flash) you can go with an incrementing serial number. This works well as long as you have a centralized place to store the last used serial number, and don’t have to generate serial numbers across factories at the same time. The downside is that it’s an extra step in the assembly process, and you may already need to use MAC addresses for others reasons, so now you have two serial numbers to keep track of. Of course, another option across multiple sites is to prefix a unique ID with a site ID.

## Random Serial Numbers

One option is to use randomly generated serial numbers. However, there is a chance that two serial numbers will be the same. Note some devices have “random” addresses but are guaranteed to be unique, e.g. Bluetooth LE public 48-bit MAC addresses. These are typically issued by a regulating authority.

Here are some examples of what typical random numbers used for serial numbers and/or MAC addresses look like (in hexadecimal format):

If the number is truly random, the interesting question becomes “how large do I need the random number range to be so that I have a very low chance of a collision”? This question is also highly relevant to hash functions and the “birthday problem”. It can be answered with a dose of statistics which we’ll cover below.

One way to solve this is to consider probabilities. Let $P(A)$ be the probability that that at least one serial number is the same as another. It is easier to calculate the probability that all serial numbers are unique, and then subtract this from 1 to get the probability that at least one serial number is the same as another. Let’s call the event $P(B)$ the probability that all serial numbers are unique. Then:

P(A) = 1 - P(B) \end{align*}$$Let's assume we have four 8-bit serial numbers which can take on values from 0 to 255. The first serial number can never collide with anything, so it has a probability of being unique of \dfrac{256}{256} (i.e. 1). The second serial number to be picked has a \dfrac{255}{256} chance of being unique since there is 1 number in 256 which would cause a collision. Assuming that was unique (this is called _conditional probability_), the third serial number to be picked has a \dfrac{254}{256} chance of being unique. Continuing on, the fourth serial number to be picked has a \dfrac{253}{256} chance of being unique. Thus the probability that all four serial numbers are unique is:$$\begin{align*} P(B) &= \frac{256}{256} \times \frac{255}{256} \times \frac{254}{256} \times \frac{253}{256} \\ &= 0.977 \\ \end{align*}$$Then simply 1 minus this is the probability that there is a collision:$$\begin{align*} P(A) &= 1 - P(B) \\ &= 1 - 0.977 \\ &= 0.023 \\ \end{align*}$$So there is a 2.3% chance of a collision when generating four random 8-bit serial numbers. ## Generalizing The Equation We can generalize the equation above, for use in applicable real-world scenarios of 1,000+ devices and 32 or 64-bit serial numbers. For n devices and a serial number of b bits, the probability of a collision is:$$\begin{align*} P(A) &= (1 - \frac{1}{2^b})(1 - \frac{2}{2^b}) \ldots (1 - \frac{n-1}{2^b}) \\ \end{align*}$$We can write a simple Python script to calculate this for any n and b: python def calc_p_of_collision_exact(num_of_serial_nums, num_of_bits): """ This is the exact formula for calculating the probability of a collision. However, it is very slow when num_of_serial_nums is > 100,000. :param num_of_serial_nums: The number of serial numbers. :param num_of_bits: The number of bits in the serial number (e.g. 32 for a 32-bit serial number). :return: The probability of a collision as a number in the range [0, 1]. """ probability_unique = 1.0 # You could probably use numpy here to speed things up # (remove the loop and use np.prod() instead) for i in range(1, num_of_serial_nums): probability_unique *= 1 - (i / (2 ** num_of_bits)) return 1 - probability_unique  We can plot this for a more realistic 32-bit serial number and a varying number of devices from 1 to 200,000 (200k): <Image src={import('./_assets/probability-of-collision-32-bit.png')} width="600px">The probability of a collision when generating 32-bit random serial numbers for different numbers of devices.</Image> You can see that we start running into significant issues with 32-bit random serial numbers as soon as we start manufacturing 1000's of devices. At 25,000 devices the probability of a collision is already at about 10%. It seems like we will likely need more bits. What about 64-bit serial numbers? Unfortunately, the equation above gets very computationally expensive to solve for large numbers of n (because it loops over n devices it has O(n) time complexity). And with larger bit serial numbers we want to check larger n so see how capable it is! I tried running the above code for n = 1e9 (1 billion devices) and b = 64 but it took too long to compute. Luckily, there is an approximation we can use which works rather well[^wikipedia-birthday-problem]:$$\begin{align*} P(A) &\approx 1 - e^{-\left(\dfrac{n \times (n-1)}{2 \times 2^b}\right)} \\ \end{align*}$$This approximation is quite good. Due to the lack of any loops, this has O(1) time complexity (assuming the maths operations like multiply, divide and exponent are O(1)). The other nice thing is that this can be calculated by hand. Here is a comparison of the approximation to the exact calculation for 32-bit serial numbers: <Image src={import('./_assets/probability-of-collision-32-bit-approx-vs-exact.png')} width="600px">Comparing the approximate and exact equations for calculating the collision probability for 32-bit serial numbers and a varying number of devices.</Image> Can't see two lines? That's because the approximation is good. If we were diving deep into the maths we could plot the difference between the two. But for engineering purposes, this is proof enough! We can use this approximation to calculate collision probabilities for 64 bits. Below is the probability for 64-bit numbers, up to 1 billion (1e9 devices)! You can see that the probability starts to rise above 0 at about the 1 billion mark. <Image src={import('./_assets/probability-of-collision-64-bit.png')} width="600px">The probability of a collision when generating 64-bit random serial numbers for different numbers of devices from 1 to up to 1 billion.</Image> Here is the Python code for the above equation (if you need it): python def calc_p_of_collision_approx(num_of_serial_nums, num_of_bits): """ This works better when num_of_serial_nums is > 100,000. :param num_of_serial_nums: The number of serial numbers. :param num_of_bits: The number of bits in the serial number (e.g. 32 for a 32-bit serial number). :return: The probability of a collision as a number in the range [0, 1]. """ # From https://en.wikipedia.org/wiki/Birthday_problem and # https://stackoverflow.com/questions/62664761/probability-of-hash-collision return 1 - math.exp(-((float(num_of_serial_nums)*float(num_of_serial_nums - 1)) / (2*(2 ** (float(num_of_bits) + 1)))))  If you want to be even lazier you can equate n \times (n-1) \approx n^2 and simplify the equation above even further with:$$\begin{align*} P(A) &\approx 1 - e^{-\dfrac{n^2}{2 \times 2^b}} \\ \end{align*}$$As a general rule of thumb, the probability of a collision starts rising significantly when the number of devices is about the same as the square root of the number of possible serial numbers. This is because the total number of comparisons of any two serial numbers is given by the combinations formula. The formula is[^calculator-soup-combinations-calculator]:$$\begin{align*} C(n, r) = \dfrac{n!}{r!(n-r)!} \end{align*}$$Where n is the total number of objects in the set and r is the number of items in each selection. For our use case, n is the number of serial numbers and r is 2 (since we are comparing any two serial numbers). We can simply this to:$$\begin{align*} C(n, 2) &= \dfrac{n!}{2!(n-2)!} \\ &= \dfrac{n!}{2!(n-2)!} \\ &= \dfrac{n \times (n-1)}{2} \end{align*}$$The probability of any one comparison colliding is roughly equal to \dfrac{1}{N} where N is the number of possible serial numbers. So the approximate probability of a collision is:$$\begin{align*} P(A) &\approx \frac{n^2}{N} \\ \end{align*}$$This fraction starts moving away from 0 towards 1 when n^2 = N (when the top and bottom of the fraction are in the same order of magnitude), thus when n = \sqrt{N}. <Aside type="tip"> You might be wondering how \dfrac{n!}{2!(n-2)!} simplifies to \dfrac{n \times (n-1)}{2}. There is a handy simplification (which is really useful when dealing with factorials, because most calculators don't like computing n! for n > 1000 or so) which makes the equation more manageable:$$\begin{align*} C(n,\ 2) &= \frac{n!}{2!(n-2)!} \\ &= \frac{n \times (n-1) \times (n-2)!}{2 \times (n-2)!} \\ \end{align*}$$The (n-2)! terms cancel out, leaving you with:$$\begin{align*} \frac{n \times (n-1)}{2} \end{align*}$$</Aside> ## Examples <Aside type="example"> **100,000 Devices, 32-bit or 64-bit Serial Number?** If you were planning on manufacturing 100,000 devices, and you were using a 32-bit randomly generated serial number, what would the probability of a collision be? Let's start with the approximate equation above. Our n is 100,000 and our b is 32:$$\begin{align*} P(A) &= 1 - e^{-\dfrac{n^2}{2 \times 2^b}} \\ &= 1 - e^{-\dfrac{100,000^2}{2 \times 2^{32}}} \\ &= 1 - e^{-\dfrac{1e10}{8.590e9}} \\ &= 1 - e^{-1.164} \\ &= 1 - 0.312 \\ &= 0.688 \\ \end{align*}$$So there is a 69% chance of a collision if generating 32-bit random serial numbers for 100,000 devices! Not good :-O. What if we used a 64-bit serial number instead?$$\begin{align*} P(A) &= 1 - e^{-\dfrac{n^2}{2 \times 2^b}} \\ &= 1 - e^{-\dfrac{100,000^2}{2 \times 2^{64}}} \\ &= 1 - e^{-\dfrac{1e10}{3.689e19}} \\ &= 1 - e^{-2.711e-10} \\ &= 1 - 0.99999999972894945692 \\ &= 2.711e-10 \\ \end{align*}$$This is better! There is a 1 in 3.7 billion (\dfrac{1}{2.711e-10}) chance of a collision if generating 64-bit random serial numbers for 100,000 devices. I don't know about your appetite for risk, but I would be happy with those odds! </Aside> <Aside type="example"> **The Birthday Problem** This is a classic probability problem which is essentially the same problem as what we have been discussing with serial numbers. The problem is as follows: "How many people do you need in a room before you have a 50% chance that two people share the same birthday?". The answer is a someone counter-intuitive 23! Let's double-check 23 is correct by using the same approximate equation above. The number of people in the room is 23, so that is our n. Rather than have a number of bits defining our total available options, we just have 365 days of the year to choose from (ignoring leap years). So we will replace 2^b with 365. Substituting into the general equation:$$\begin{align*} P(A) &= 1 - e^{-\dfrac{n^2 }{2 \times 2^b}} \\ &= 1 - e^{-\dfrac{23^2}{2 \times 365}} \\ &= 1 - e^{-\dfrac{529}{730}} \\ &= 1 - e^{-0.725} \\ &= 1 - 0.484 \\ &= 0.516 \\ \end{align*} Which gives us approx. a 50% chance of collision, as expected. </Aside> [^calculator-soup-combinations-calculator]: Calculator Soup. _Combinations Calculator (nCr)_. Retrieved 2024-06-02, from https://www.calculatorsoup.com/calculators/discretemathematics/combinations.php. [^wikipedia-german-tank-problem]: Wikipedia (2024, Jan 4). _German tank problem_. Retrieved 2024-06-03, from https://en.wikipedia.org/wiki/German_tank_problem. [^wikipedia-birthday-problem]: Wikipedia (2024, Jun 2). _Birthday problem_. Retrieved 2024-06-04, from https://en.wikipedia.org/wiki/Birthday_problem.