Empirical math, or how experiments can increase our confidence in mathematical conclusions despite access to formal proofs

Earlier posts: Something about certainty, proofs in math, induction/abduction, Is the summed cubes equal to the squared sum of counting integer series?

The things I’m going to say for math are equally true for logic, so whenever I write “math”, just mentally substitute to “math and logic”.

Here I’m going to be talking about something different than the normal empiricist approach to philosophy of math. Briefly speaking, that approach states that mathematical truths are just like truths in empirical science because they really are knowable in kind of the same way and there is no special math way of knowing things. The knowability can be either thru inductive generalizations (enumerative induction) or thru a coherentist view of epistemic justification (which I adhere to and which is closely related to consilience).

When a sound proof isn’t enough

I asked a mathy friend of mine to find out whether all numbers of the series:

S. 9, 99, 999, 9999, …

are divisible by 3. Let’s call this proposition P. A moment’s reflection should reveal the answer to be yes, but it is not so easy to give a formal proof of why this is the case*.

My mathy friend came up with this:


What he is calling cross sum is a translation of the Danish tværsum, but it looks like it refers to something unrelated. However, I managed to find the English term: digit sum. The recursive version is digital root.

I’m not sure it is quite formatted correctly (e.g. ai should just be a, I think), but the idea is something like this:

  1. Each of the numbers in S can be constructed with the summation equation given, a=9 and for k→∞. E.g. for k=2, summation is n=9*100+9*101+9*102=9+90+900=999.
  2. The 10 modulus 9 is 1, which is just to say that dividing 10 by 9 gives a remainder of 1.
  3. Some equations with digit roots and modulus which aims to show that the digit root of each member of S is the same (?).
  4. Finally, because we know that having a digital root of 9 means a number is divisible by 3, then all members of S are divisible by 3.

I’m not sure why he is using modulus, but presumably there is some strong connection between modulus and digital roots that I’m not familiar with. A reader of this post can probably come up with a better proof. :)

However, let’s assume that we have a proof that we think is sound (valid+has true premises). How certain should we be in our belief that P is true (by which I just mean, what probability should we assign to P being true), given that we think we have a sound proof? The answer isn’t 100%. To see why, we need to consider that uncertainty in a conclusion can come from (at least) three sources: 1) uncertainty inherent in the inference, 2) uncertainty in the premises, and 3) uncertainty about the inference type.

The first source is the one most familiar to scientists: since we (usually) can’t measure the entire population of some class of interest (e.g. humans, some of which are dead!), we need to rely on a subset of the population, which we call a sample. To put it very briefly, statistical testing is concerned with the question of how to infer and how certain we can be about the properties of the population. This source is absent when dealing with deductive arguments about math.

The second source is our certainty in the premises. Assuming a roughly linear transfer of justification from premises (in conjoint form) to conclusion in deductive arguments means that the certainty in a conclusion we derive from a set of premises cannot be stronger than our certainty in our premises. In empirical domains, we are never 100% certain about our premises since these themselves are infested with these sources of uncertainty. However, for math, this brings us back to the question of how certain we are about the assumptions we use in proofs. Generally, these can be deduced from simpler principles until we in the end get down to the foundations of math. This again brings us to the question of which epistemology is right: foundationalism or coherentism (or exotic infinitism, but few take that seriously)? Can we really be 100% certain about the foundations of math? Weren’t some very smart people wrong in the past about this question (some of them must be, since they disagree)? What does this show about how certain we should be?

The third source is our human fallibility in even knowing which structure the argument has. Who has not made countless errors during their lifetime in getting mathematical proofs right? Thus, we have ample empirical evidence of our own failing to correctly identify which arguments are valid and which are not. The same applies to inductive arguments.

The place for experiments in math

If the above is convincing, it means that we cannot be certain about mathematical conclusions, even when we have formal proofs.

Often, we use a suitable counterexample to show that a mathematical claim is false. For instance, if someone claimed that naive set theory is consistent (i.e. has no inconsistencies), we would point the person to Russell’s paradox. However, coming up with this counterexample wasn’t easy, it remained undiscovered for many years. Counterexamples in math are, as far as I know, usually found thru expert intuition. While this sometimes works, it isn’t a good method. A better method is to systematically search for counterexamples.

How could we do that? Well, we could simply try a lot of numbers. This would make it impractical for humans, but not necessarily for computers. Thus, when we have a mathematical claim of the type all Xs are Ys, we can try to disprove it by generating lots of cases and checking whether they have the property that is claimed (i.e. they shouldn’t have X&~Y). Sometimes we can even do this in an exhaustive or locally exhaustive (e.g. try all numbers between 1 and 1000) way. Other times the search space is too large for modern computers, so we would need to use sampling. This of course means that we introduce type 1 uncertainty discussed above.

Still, if continued search as described above fails to disprove a mathematical claim, my contention is that this increases our certainty about the claim, just as it does in regular empirical science. In my conversation with mathy people, they were surprisingly unwilling to accept this conclusion which is why I wrote up this longer blogpost.

Experimental code for members of S

So, are all members of S divisible by 3?

# divisible by 3 ------------------------------------------------------------
depth = 20
results = numeric()
for (i in 1:depth) {
  #repeat 9 i times
  x_rep = rep(9, i)
  #place next to each other, then convert to numeric
  x = str_c(x_rep, collapse = "") %>% as.numeric
  #save modulus 3
  results[i] = x %% 3
#> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

So, the first 20 members of S are divisible by 3. This increases the strength of my belief in P, even if the proof above (or something similar) turns out to work.

My intuitive proof

The reason I said that a moment’s reflection should be sufficient is that the following informal proof springs immediately to my mind:

  1. All members of S are multiples of the first member of S.
    Specifically, the vector of factors to use is, F: 1, 11, 111, 1111, …
  2. If n is divisible by k, then n*i is also divisible of k, where n, k, and i are all non-zero integers.
  3. 9 is divisible by 3.
  4. Thus, all members of S are divisible by 3.

Maybe someone can find a way to prove (1) and (2).

Other R code

Some other R code I wrote for this post, but didn’t end up discussing in the post.

# digital root ------------------------------------------------
digital_root = function(x) {
  #convert to chr
  x_str = as.character(x)
  #split into digits, unlist, convert back to num
  x_split = str_split(x_str, "") %>% unlist %>% as.numeric
  #get sum of digits
  x_sum = sum(x_split)
  #if is has more than 1 digit, call digital sum on result
  #otherwise return result
  if (x_sum > 9) {
  } else {

digital_root(9) == 9
digital_root(99) == 9
digital_root(999) == 9
digital_root(12) == 3
digital_root(123) == 6
digital_root(1234) == 1
# distributive? -----------------------------------------------------------
depth = 100
results = matrix(nrow=depth, ncol=depth)
for (i in 1:depth) {
  for (j in 1:depth) {
    ij = i + j
    results[i, j] = digital_root(ij) == digital_root(i) + digital_root(j)