[Math Talk #6] Benford's Law (Is it Always True?) [Part 1]

View this thread on: d.buzz | hive.blog | peakd.com | ecency.com
·@mathsolver·
0.000 HBD
[Math Talk #6] Benford's Law (Is it Always True?) [Part 1]
<center> <img src = "https://i.imgsafe.org/ab/ab7925dc24.gif"/> </center>

[1]

# Benford's Law (Part 1, Is it True?)

## 1. What is Benford's Law?

Benford's law, also called Newcomb-Benford's law, law of anomalous numbers, and first-digit law, is an observation about the frequency distribution of leading digits in many real-life sets of numerical data. While Newcomb was inspecting numerous data's including logarithms, he noticed that people were looking up more numbers whose logarithms started with the digit 1. 

Inspired by this inspection, in 1938, American Mathematician [Frank Benford](https://en.wikipedia.org/wiki/Frank_Benford) [2] stated in his paper [<___The Law of Anomalous numbers___>](https://www.scribd.com/document/209534421/The-Law-of-Anomalous-Numbers) [3]  what is called Benford sequence. 

---
#### Benford Sequence.
A sequence of positive numbers <img src="http://latex.codecogs.com/gif.latex?\left\{&space;x_n&space;\right\}" title="\left\{ x_n \right\}" align = "center"/> is said to be __Benford of base <img src="http://latex.codecogs.com/gif.latex?b" title="b" />__ if the probability of observing the first digit of <img src="http://latex.codecogs.com/gif.latex?x_n" title="x_n" align = "center"/> in base <img src="http://latex.codecogs.com/gif.latex?b" title="b" /> is <img src="http://latex.codecogs.com/gif.latex?j" title="j" align = "center"/> is 

<center> <img src="http://latex.codecogs.com/gif.latex?\log_b&space;\left(&space;1&space;&plus;&space;\frac{1}{j}&space;\right&space;)" title="\log_b \left( 1 + \frac{1}{j} \right )" /> </center>

More precisely, we would have 

<center> <img src="http://latex.codecogs.com/gif.latex?\lim_{N&space;\rightarrow&space;\infty}&space;\frac{\&hash;&space;\left\{&space;x_n:&space;n&space;\leq&space;N,&space;\text{&space;first&space;digit&space;of&space;}x_n&space;\text{&space;is&space;}j&space;\right\}}{N}&space;=&space;\log_b&space;\left(&space;1&space;&plus;&space;\frac{1}{j}&space;\right)" title="\lim_{N \rightarrow \infty} \frac{\# \left\{ x_n: n \leq N, \text{ first digit of }x_n \text{ is }j \right\}}{N} = \log_b \left( 1 + \frac{1}{j} \right)" /> </center>

where <img src="http://latex.codecogs.com/gif.latex?j&space;\in&space;\left\{1,&space;2,&space;...,&space;b-1&space;\right\}" title="j \in \left\{0, 1, 2, ..., b-1 \right\}" align = "center"/>

---

The limit implies that the ratio between total number <img src="http://latex.codecogs.com/gif.latex?N" title="N" /> and the number of <img src="http://latex.codecogs.com/gif.latex?x_i" title="x_j" align = "center"/> s having first digit <img src="http://latex.codecogs.com/gif.latex?j" title="j" align = "center"/> where <img src="http://latex.codecogs.com/gif.latex?i" title="i" /> runs over <img src="http://latex.codecogs.com/gif.latex?1" title="1" /> through <img src="http://latex.codecogs.com/gif.latex?N" title="N" /> should converge to <img src="http://latex.codecogs.com/gif.latex?\log_b(1&space;&plus;&space;1/j)" title="\log_b(1 + 1/j)" align = "center"/> as <img src="http://latex.codecogs.com/gif.latex?N&space;\rightarrow&space;\infty" title="N \rightarrow \infty" /> . 

## 2. Where does it Come From?

### 2-1. Intuitive Approach

Many people would think that the probability that any positive number in base  <img src="http://latex.codecogs.com/gif.latex?b" title="b" /> having first digit <img src="http://latex.codecogs.com/gif.latex?j" title="j" align = "center"/> is just 

<center> <img src="http://latex.codecogs.com/gif.latex?\frac{1}{b-1}" title="\frac{1}{b-1}" /> </center>

Let's just examine what it means without any reasoning. It means that we expect equal probability of having each digit <img src="http://latex.codecogs.com/gif.latex?j&space;=&space;1,&space;2,&space;...,&space;b&space;-1" title="j = 1, 2, ..., b -1" align = "center"/> , so that the uniform probability mass function is

<center> <img src="http://latex.codecogs.com/gif.latex?f(j)&space;=&space;\frac{1}{b&space;-&space;1}" title="f(j) = \frac{1}{b - 1}" /> </center>

__Is this true...?__ Well, first let's do some computer experiment.

### 2-2. A Simple Computer Experiment

Take the sequence to be just enumeration of positive integers, <img src="http://latex.codecogs.com/gif.latex?x_i&space;=&space;i" title="x_n = n" align = "center"/> . Define a function

<center> <img src="http://latex.codecogs.com/gif.latex?f_n:&space;\mathbb{N}&space;\rightarrow&space;\mathbb{N},\&space;f_n(N)" title="f_n: \mathbb{N} \rightarrow \mathbb{N},\ f_n(N)" /> </center>

such that 

<center> <img src="http://latex.codecogs.com/gif.latex?f_n(N)&space;=&space;\frac{\&hash;&space;\left\{&space;1\leq&space;j&space;\leq&space;N:&space;\text{&space;first&space;digit&space;of&space;}x_j&space;\text{&space;is&space;}&space;n&space;\right\}}{N}" title="f_n(N) = \# \left\{ 1\leq j \leq N: \text{ first digit of }j \text{ is } n \right\}" /> </center>

Here is the plot of  <img src="http://latex.codecogs.com/gif.latex?f_n(N)" title="f_n(N)" align = "center"/>  when base <img src="http://latex.codecogs.com/gif.latex?b&space;=&space;10" title="b = 10" align = "center"/> (decimal expression) up to <img src="http://latex.codecogs.com/gif.latex?N&space;=&space;10000&space;=&space;10^4" title="N = 10000 = 10^4" /> . 

<center> <img src = "https://i.imgsafe.org/aa/aa68a8d179.png"/> </center>

where 

1. Blue Line corresponds to n = 1

2. Orange Line corresponds to n = 4

3. Green Line corresponds to n = 7

4. Red Line corresponds to n = 9

### 2-3. Analyzing the plot - [4]
The flucutation for each specific <img src="http://latex.codecogs.com/gif.latex?n" title="n" />, is due to the numbers of the form 

<center> <img src="http://latex.codecogs.com/gif.latex?nxxxx" title="nxxxx" /> to <img src="http://latex.codecogs.com/gif.latex?(n&plus;1)xxxx" title="(n+1)xxxx" align = "center"/> </center>

all such numbers are counted, which skyrockets the graph. For example, if <img src="http://latex.codecogs.com/gif.latex?n&space;=&space;1" title="n = 1" />, then numbers of the form 

<center> <img src="http://latex.codecogs.com/gif.latex?1000&space;\leq&space;X&space;\leq&space;1999" title="1000 \leq X \leq 1999" /> </center>

will be added for numerator of <img src="http://latex.codecogs.com/gif.latex?f_1(N)" title="f_1(N)" align = "center"/> .

---

The local maxima and minima of specific graph <img src="http://latex.codecogs.com/gif.latex?f_n(N)" title="f_n(N)" align = "center"/> are special. By direct calculation, <img src="http://latex.codecogs.com/gif.latex?k" title="k" />-th local minima occurs on the number of the form 

<center> <img src="http://latex.codecogs.com/gif.latex?(n-1)\underbrace{99...99}_k" title="(n-1)\underbrace{99...99}_k" /> </center>

and maxima occurs on the number of the form 

<center> <img src="http://latex.codecogs.com/gif.latex?n\underbrace{99...99}_k" title="n\underbrace{99...99}_k" /> </center>

. By direct calculation, minimum and maximum would be 

<center> <img src="http://latex.codecogs.com/gif.latex?f_{\min,&space;n}(k)&space;=&space;\frac{1&space;\cdot&space;10^{k-1}&space;&plus;&space;1\cdot&space;10^{k-2}&space;&plus;&space;...&plus;1}{(n-1)\cdot&space;10^k&space;&plus;&space;9&space;\cdot&space;10^{k-1}&space;&plus;&space;...&space;&plus;&space;9&space;\cdot&space;10&space;&plus;&space;9}&space;=&space;\frac{10^k&space;-&space;1}{9(n&space;\cdot&space;10^k&space;-&space;1)}" title="f_{\min, n}(k) = \frac{1 \cdot 10^{k-1} + 1\cdot 10^{k-2} + ...+1}{(n-1)\cdot 10^k + 9 \cdot 10^{k-1} + ... + 9 \cdot 10 + 9} = \frac{10^k - 1}{9(n \cdot 10^k - 1)}" /> </center>

and 

<center> <img src="http://latex.codecogs.com/gif.latex?f_{\max,&space;n}(k)&space;=&space;\frac{1&space;\cdot&space;10^{k}&space;&plus;&space;1\cdot&space;10^{k-}&space;&plus;&space;...&plus;1}{n\cdot&space;10^k&space;&plus;&space;9&space;\cdot&space;10^{k-1}&space;&plus;&space;...&space;&plus;&space;9&space;\cdot&space;10&space;&plus;&space;9}&space;=&space;\frac{10^{k&plus;1}&space;-&space;1}{9((n&plus;1)&space;\cdot&space;10^k&space;-&space;1)}" title="f_{\max, n}(k) = \frac{1 \cdot 10^{k} + 1\cdot 10^{k-} + ...+1}{n\cdot 10^k + 9 \cdot 10^{k-1} + ... + 9 \cdot 10 + 9} = \frac{10^{k+1} - 1}{9((n+1) \cdot 10^k - 1)}" /></center> 

Letting <img src="http://latex.codecogs.com/gif.latex?k&space;\rightarrow&space;\infty" title="k \rightarrow \infty" /> implies <img src="http://latex.codecogs.com/gif.latex?N&space;\rightarrow&space;\infty" title="N \rightarrow \infty" />, so that 

<center> <img src="http://latex.codecogs.com/gif.latex?f_{\min,&space;n}&space;=&space;\lim_{k&space;\rightarrow&space;\infty}&space;f_{\min,&space;n}(k)&space;=&space;\frac{1}{9n}" title="f_{\min, n} = \lim_{k \rightarrow \infty} f_{\min, n}(k) = \frac{1}{9n}" /> </center>

and 

<center> <img src="http://latex.codecogs.com/gif.latex?f_{\max,&space;n}&space;=&space;\lim_{k&space;\rightarrow&space;\infty}&space;f_{\max,&space;n}(k)&space;=&space;\frac{10}{9(n&plus;1)}" title="f_{\max, n} = \lim_{k \rightarrow \infty} f_{\max, n}(k) = \frac{10}{9(n+1)}" /> </center>

You can see that since the limit of maxima and minima does not match, the limit

<center> <img src="http://latex.codecogs.com/gif.latex?\lim_{N&space;\rightarrow&space;\infty}f_n(N)" title="\lim_{N \rightarrow \infty}f_n(N)" /> </center>

does not exist. From the definition of Benford Sequence in Section 1, we can deduce that 

##### Set of Natural numbers viewed as a trivial sequence is NOT Benford of base 10!
##### Also, any sequence drawn from a single set <img src="http://latex.codecogs.com/gif.latex?\left\{&space;1,&space;2,&space;...,&space;N&space;\right\}" title="\left\{ 1, 2, ..., N \right\}" align = "center"/> can NOT be Benford of base 10. 

<center> <img src = "https://i.imgsafe.org/ab/ab69aecc6b.png" /> </center>

## 3. So what is the Point?

The easiest way to understand the conclusion is to think a a random set of number as not one number but two: you have the number itself and the highest possible number that it could be. Therefore, you have a __potential range__ of possible numbers for __every random number__. Also, for you to have a good data set the potential range should be random for each random number. If the __potential range is the same for each random number, then Benford's law will not correctly predict the leading number__. 

For example, suppose you have a test score of 50 students. If the test score ranges 1 through 100 for all students, you can not expect the test score statistic to follow Benford. 

However, suppose you have an accounting data. Numbers in accounting are highly random, each has different potential range, so you can __expect__ that the numbers appear would roughly follow Benford. 

## 4. What's Next
In the next post, I will give rigorous analysis on Benford sequence, giving some examples that follow Benford's Law. 

## 5. Citations
[1] http://mathworld.wolfram.com/BenfordsLaw.html (only image is used)

[2] https://en.wikipedia.org/wiki/Frank_Benford

[3] https://www.scribd.com/document/209534421/The-Law-of-Anomalous-Numbers

[4] https://www.sciencedirect.com/science/article/pii/S2211379715000728

All the graph plots are done by myself with aid of Python MatPlotLib.
👍 , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,