Friday, June 9, 2017

Reimplementing Plan 9 dump, relaxing backups

Some time has passed since I last used Plan 9 and there are many things I miss: the simplicity, understanding a whole system from the ground up... One of the things I missed most was the backup system. For the user it was amazingly simple. In /dump one could find a directory tree with a directory per year, one per month inside it and so on, with a snapshot of the filesystem at that point in time.
This happened automatically, without user intervention. It enabled me to relax and focus on the job (whatever that was).
While I have used similar features in other systems, they were either too complex (and thus not trustworthy), or I didn't fully understand them, or they required user intervention, or they failed too often, or some combination of all of the above.

The first answer I get when I tell some developer this is "you should be using Git".
Of course, I know of the existence of Git (and other VCSs) and use it, but I find three main problems with them as backup systems. One is that they are not trivial to navigate, so when I am doing something urgent they take cycles away from whatever I am doing. Another is that when you are working with things which are not code, like images for presentations or video, there are problems with binary diffs. Finally, they require intervention from the user: you have to do a push every time you edit a file. Of course, you could do it periodically, but there are still other factors to consider, like simplicity. Adding a layer over Git to turn it into a dump filesystem is too complex, which, again, makes it less trustworthy. Git is nice when you have a distributed team of developers and the complexity pays off. Backups are an entirely different problem.

The most important lesson I learnt from Plan 9's dump is that backups should happen without you doing anything and that you should be able to access and navigate them instantly and without fuss. This way you verify, without even noticing, that the backup is online and working, because you use it as part of your daily work, for example by checking the history of a file's edits or modifications. All of that without giving it a second thought.

Another requirement I learnt the hard way, and which is also important, is for the backup system to be as simple as possible, especially in the part that takes the snapshots, as my data and part of my sanity are going to depend on it. If I can't completely understand the format in which the backups are stored, I end up not trusting the backup system. If parts of a backup are lost, I should be able to understand what is left and piece it back together.

While I was thinking about this, I found this post, which shows that rsync is an ideal candidate for making the snapshots. It is a battle-tested command, in which serious bugs are going to be quite hard to find, and it has support for making the snapshots efficiently, by hard-linking the files that have not changed. I make my snapshots hourly, and for that period not making the dumps too redundant is quite important. Also, there are quite a lot of corner cases in filesystems these days (ACLs, metadata, etc.) which I don't want to deal with or think about if possible. I let rsync take care of all that.

The script to make the snapshot is very simple and, of course, running it hourly in Unix is no problem (before I would have used cron, now systemd).
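My actual script is not reproduced here, but purely as a sketch of the idea (the /main source tree, the last symlink and the minimal error handling are assumptions for the example; the /dump layout follows the path shown below), an hourly snapshot built on rsync's --link-dest could look like this:

package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"time"
)

func main() {
	src := "/main"  // tree to back up; an assumption for the example
	root := "/dump" // where the snapshots live
	now := time.Now()
	// Snapshots are named /dump/YYYY/MMDD/HHMM, as in the example below.
	dst := filepath.Join(root, now.Format("2006"), now.Format("0102"), now.Format("1504"))
	// "last" points at the previous snapshot; an assumed convention.
	last := filepath.Join(root, "last")

	if err := os.MkdirAll(dst, 0755); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// -a preserves metadata; --link-dest hard-links files that are
	// unchanged with respect to the previous snapshot instead of copying them.
	cmd := exec.Command("rsync", "-a", "--link-dest="+last, src, dst)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	os.Remove(last) // the error is ignored: the link may not exist on the first run
	if err := os.Symlink(dst, last); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}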

 Now I have a dump filesystem which I can bind or soft link (if you want to install the whole thing in Linux, it is explained here) to /dump or mount remotely with sshfs. This enables me to do things like

cd /dump/2017/0513/1410/MAIN/paurea/doc/howto

to see that directory as it was on that date, like in Plan 9.

Not having almost any redundancy is a little troubling, because if a file in the dump is overwritten by someone, we may lose it. The clients mount it read-only, so it shouldn't happen, but then neither should most of the accidents the backup is there to protect against. To prevent this, I checkpoint the whole filesystem monthly with this script. It creates a fresh copy without any backwards dependencies, which costs an extra working copy's worth of space.
To do that I use the cp(1) command which, by default, copies hard-linked files as separate files.
I think of dumpme as compression and chkptme as controlled redundancy, two steps in channel coding (hopefully without catastrophic error propagation).

To navigate the dump comfortably, I reimplemented the yesterday(1) and history(1) commands from Plan 9 in Go. They are complete reimplementations from scratch, with the options I find useful.

I called these commands yest and hist, so now you can run

$yest -y=1 /main/MAIN/paurea/doc/howto
/dump/2016/0609/1606/MAIN/paurea/doc/howto

to get the path of the file or directory in the dump a year ago. Some of the features of yesterday(1), like -c to copy yesterday's file, are not there; I may add them later if I find them necessary.

The other command is hist, which gives you the history of a file or directory like history(1), including its diffs if it is a text file.

$ cd  /main/MAIN/paurea/doc/charlas/1.talk

$ hist -c slides.tex

#create    /dump/2017/0510/1605/MAIN/paurea/doc/charlas/1.talk/slides.tex 19718 0777 2017-05-09 17:52:09 +0200 CEST  f
#wstat    /dump/2017/0510/1728/MAIN/paurea/doc/charlas/1.talk/slides.tex 19718 0777 2017-05-10 16:55:52 +0200 CEST  f
#write    /dump/2017/0512/1851/MAIN/paurea/doc/charlas/1.talk/slides.tex 34710 0777 2017-05-12 18:14:37 +0200 CEST  f
#write    /dump/2017/0518/1130/MAIN/paurea/doc/charlas/1.talk/slides.tex 34844 0777 2017-05-17 16:08:08 +0200 CEST  f


Without the -c it returns the incremental diffs of the file. The diff of two files is implemented using diffmatchpatch, which returns a list of edits (insertions, deletions and matches) that I had to convert into line differences like the ones given by diff(1).
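That conversion is the fiddly part. As a rough sketch of the technique only (assuming the sergi/go-diff port of diffmatchpatch; hist's actual code is more careful about chunk boundaries and formatting), a line-level diff of two files can be produced like this:

package main

import (
	"fmt"
	"os"

	"github.com/sergi/go-diff/diffmatchpatch"
)

func main() {
	older, _ := os.ReadFile(os.Args[1]) // errors ignored to keep the sketch short
	newer, _ := os.ReadFile(os.Args[2])

	dmp := diffmatchpatch.New()
	// Map each distinct line to a character so the diff works on whole
	// lines, then map the characters in the result back to lines.
	a, b, lines := dmp.DiffLinesToChars(string(older), string(newer))
	diffs := dmp.DiffCharsToLines(dmp.DiffMain(a, b, false), lines)

	for _, d := range diffs {
		switch d.Type {
		case diffmatchpatch.DiffDelete:
			fmt.Print("< ", d.Text)
		case diffmatchpatch.DiffInsert:
			fmt.Print("> ", d.Text)
		}
	}
}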

One feature I would like to implement in the future is to somehow mark the checkpoints in the filesystem and use some form of scrubbing or error detection with the different (supposedly equal) copies of the files.






Wednesday, June 29, 2016

Supersum? Subproduct?

When I was in school, like all the other kids, I was taught addition, multiplication and, a little bit later, exponentiation. There are several ways to think about all of them; some have been very fruitful in mathematics at large (like groups and geometry) and others less so, like iterated composition (unless you think of arrow composition in category theory as abstracted composition), so many mathematicians don't think about them much.

My plan is to try to write this post with as few prerequisites as possible and to tell a story about some of my explorations in this area. This may trump rigour, precision and formalism. Everything I will write about here can be proven and made completely formal. I will also write the story as if it had been presented to me coherently and clearly, which was, unfortunately, not the case. There were many holes, which I filled in on my own later. Of course, my memory of it is also diffuse after so much time.

The way addition was presented to me in school, was by counting.

$$ a + b = \underbrace{1+1+\cdots}_{a\ times}+ \underbrace{1+1+\cdots}_{b\ times}$$.


Later followed by tables to make calculations easier, of course.


Then they defined multiplication, which was more interesting,

$$ a * b = \underbrace{a+a+a\cdots}_{b\ times}=\underbrace{b+b+b\cdots}_{a\ times}$$.

again, followed by tables.

By looking at the definition, it is not evident that the two ways to write the same multiplication are the same, i.e. that $a*b = b*a$.
We also used another equivalent definition which was geometric, appealing to the intuitive idea of area. This was done both discretely, by arranging dots in a rectangle and counting them, and continuously, with the idea of area (or, for three factors and three dimensions, volume).

The area inside the rectangle is the multiplication of both sides, so the operation has to be commutative. If I remember correctly, this was both the definition of area and a kind of proof of commutativity (this was at school, as I said, the exposition was not very precise).

We learnt about all the properties of both operations. Even if at the time we didn't know the name, we learnt that addition, and multiplication without the $0$, form Abelian groups:
  1. Associative, $(a+b)+c = a+(b+c)$ and $(a*b)*c = a*(b*c)$.
  2. Commutative, because the groups are Abelian, $a+b = b+a$ and $a*b = b*a$.
  3. Identity element, $a+0 = a$ (for multiplication, $a*1 = a$).
  4. For each element there is an inverse, $a-a = 0$ and $a/a = 1$. We write the inverse of $a$ for addition as $-a$ and the inverse of $a$ for multiplication as $1/a$. $0$ has no inverse for multiplication, which is why it is left out; it is also called the absorbing element because $a*0 = 0$ for all $a$.
  5. A group also needs to be closed, the operation with any two numbers cannot go out of the set.
We have one more property, which relates addition and multiplication, the distributive property
$$a*(b+c) = a*b + a*c.$$
 This property is quite intuitive if we draw the operation.

We can use the distributive property to define negative products,
$$a*(b-b) = 0 = a*b + a*(-b),$$
so
$$a*(-b)  = -(a*b)$$
and so on.
 A little later, exponentiation was defined, which also has a definition as an iterative composition
$$ a ^ b = \underbrace{a*a*a\cdots}_{b\ times},$$
where $a$ is called the base and $b$ the exponent.
The first surprise is that the exponentiation is not commutative, so $a^b \neq b^a$ in general.
We learnt about many properties of exponentiation, but I will focus on two,
$$a^{b+c} = a^b*a^c $$
and
$$e^{b\ ln(a)} = a^b$$
where $e$ is a special number, the Euler number, $2.7182818284 \ldots$ and
$ ln(a) $ is the inverse of the exponential operation, the natural logarithm, the logarithm with base $e$, i.e. the solution to $e^x = a$.
Note that I have jumped from the definition of exponentiation by iterated composition (which only works for integer numbers) to something which can be defined for real numbers. Real numbers include fractions, like $0.5$ and numbers with infinite non-repeating digits after the decimal point like $e$ or $\pi$. We spent a year going from one to the other, and this was the first time I was exposed to an interesting proof and a generalization.
With the iterated composition definition, it is easy to find the result with a fractional base, for example
$$0.5^3 = 0.5*0.5*0.5=0.125$$.
The problem appears when we want to use a fractional exponent,
$$3^{0.5} = ??? $$
How do we apply the operation $0.5$ times? The problem here is that the definition is inherently asymmetric (which is why it is surprising that multiplication, which is defined similarly, is commutative).
The approach used to fix the problem is very interesting. We can use the property above,
$a^{b+c} = a^b * a^c$ recursively to find
$(a^b)^c = a^{b*c}$.
We need one more ingredient, negative exponents, but
$$a^{0} = a^{1-1} = a^{1}*a^{-1}$$
and $a^{0} = 1$ so $a^{-1} = mult\_inverse(a)$.
 The inverse element can be used to define the division, $a*mult\_inverse(b) = a/b$ which is the inverse operation of the multiplication.

Why is $a^0 = 1$? This is an intriguing question. It is not evident why applying the operation $0$ times should yield $1$. To see why, we can use again $a^{b+c} = a^b * a^c$:
when $c = 0$, $a^{b+0} = a^b = a^b * 1$. So for everything to work, the identity of multiplication
has to be the result of exponentiating by the identity of addition.

We can finally find the value when the exponents are fractions by solving the equation $a = (a^{1/b})^b$. For example, if we are looking for $a^{0.5}$, we only need to find a number that multiplied by itself is $a$. For more general fractions, we can do it by parts, $a^{b/c} =(a^b)^{1/c}$.
We already know how to do it with negative exponents, and they can be combined. For more general exponents, i.e. real numbers, we can approximate them as much as we want by greater and greater $b$ and $c$ in the fractions $\frac{b}{c}$ (more generally using a limit). What we are doing here is finding the completion of the rational numbers (fractions) which are the real numbers. This is a theme that appears a lot in mathematics.
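To give a small worked example of the by-parts rule (plain arithmetic, just for concreteness),
$$8^{\frac{2}{3}} = (8^2)^{\frac{1}{3}} = 64^{\frac{1}{3}} = 4 = (8^{\frac{1}{3}})^2.$$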
All this gives us a definition, but it is not very satisfactory for calculation. For that, we normally use more powerful tools: infinite sums, which in math are called series.
I am not going to derive them here, but I will write them down anyway, because they will appear later.

First let's define the exponential function as $f(x) = e^x$. This means that to obtain the value of the function, you plug a value into the exponent. For an integer value $x=n$ you can use the definition of $e$, which is the number above (technically, we have to play with derivatives, series and other more advanced tools to define it), and apply the iterated composition definition. For example,
$f(3) = e*e*e$. For fractional or real exponents, you use the approach described above.
We can then define the inverse function of $e^x$, which we have already seen: the natural logarithm, defined by $ln(e^x) = x$ and, conversely,
$e^{ln(x)} = x$.

Euler discovered that the exponential can be written as an infinite series (infinite sum)
$$e^x = 1 + x+ \frac{x^2}{1*2} +\frac{x^3}{1*2*3} + \frac{x^4}{1*2*3*4}+\ldots$$.
We can write $1*2*3*4*5*\cdots*n = n!$, which is called the factorial.
There is a similar series for the logarithm,
$$ln(x) = (x-1) - \frac{(x-1)^2}{2} + \frac{(x-1)^3}{3} - \frac{(x-1)^4}{4}\ldots$$

Even though the series are infinite, the terms keep getting smaller so, by picking a reasonable
number of terms, we get a good enough approximation of the value we are looking for.

We can use the fact that the logarithm is the inverse of the exponential,
$x^y = (e^{ln(x)})^y= e^{ln(x)*y}$,
and the series to calculate exponentials for any base. We can also start from the series; this is an alternative but equivalent approach to defining exponentiation for arbitrary values.
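As a small self-contained illustration (a sketch of mine, with arbitrary names), the two series are already enough to compute a power with a fractional exponent; the logarithm series only converges for $0 < x \le 2$, so the example uses $x = 1.5$:

package main

import (
	"fmt"
	"math"
)

// expSeries approximates e^x with the first n terms of 1 + x + x^2/2! + ...
func expSeries(x float64, n int) float64 {
	sum, term := 1.0, 1.0
	for k := 1; k < n; k++ {
		term *= x / float64(k)
		sum += term
	}
	return sum
}

// lnSeries approximates ln(x) with (x-1) - (x-1)^2/2 + (x-1)^3/3 - ...
func lnSeries(x float64, n int) float64 {
	sum, pow := 0.0, 1.0
	for k := 1; k <= n; k++ {
		pow *= x - 1
		if k%2 == 1 {
			sum += pow / float64(k)
		} else {
			sum -= pow / float64(k)
		}
	}
	return sum
}

func main() {
	x, y := 1.5, 0.5
	// x^y = e^{y*ln(x)}, computed only with the truncated series.
	approx := expSeries(y*lnSeries(x, 60), 30)
	fmt.Println("series:", approx, "library:", math.Pow(x, y)) // both ~1.2247
}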

So, after this rather long introduction, we are in a position to understand what my frame of mind was when I asked my high school teachers the following two questions:
  1. Does it make sense to define operations by iterated composition after the exponentiation? What are their properties?
  2. We have defined a kind of fractional composition for the exponentiation and multiplication (multiplication by a fractional number and a fractional exponent respectively), can we do the same in general and find an operation between addition and multiplication?
The first question was the simpler one and the first that drew my attention. Of course, in the intellectual wasteland that is (or at least used to be) secondary math education in Spain, my teachers didn't help. When I asked them any question about material not in the syllabus, their eyes would glaze over or I would get told off for being weird. Also, there was no internet yet, so I had no way of knowing whether what I was doing had been done before. More concretely, I continued iterating the exponentiation to get a new operation,
$$\underbrace{a^{a^{a^{a\ldots}}}}_{b\ times}.$$
The operation can be interpreted in two ways, one of which is actually a new operation and one of which is not.

The first way to parenthesize,
$$\underbrace{{(({a^a})^a})^a\ldots}_{b\ times}=a^{a^{b-1}},$$
is not a new operation, whereas
$$\underbrace{a^{(a^{(a^{(a\ldots)})})}}_{b\ times}$$
is actually a new operation. We can continue defining an infinite sequence of operations by continuing to iterate further each of them.
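To make the iteration concrete, here is a tiny sketch of mine of the resulting tower of operations for small positive integers, indexed so that level $0$ is multiplication, level $1$ is exponentiation and level $2$ is the new, right-associative operation (the function and its name are just for illustration):

package main

import "fmt"

// hyper iterates the operation one level below it, associating to the
// right: level 0 is multiplication, level 1 exponentiation, level 2 the
// new operation (tetration), and so on.
func hyper(level int, a, b int64) int64 {
	if level == 0 {
		return a * b
	}
	if b == 1 {
		return a
	}
	return hyper(level-1, a, hyper(level, a, b-1))
}

func main() {
	fmt.Println(hyper(0, 2, 3)) // 2*3 = 6
	fmt.Println(hyper(1, 2, 3)) // 2^3 = 8
	fmt.Println(hyper(2, 2, 3)) // 2^(2^2) = 16
	fmt.Println(hyper(2, 3, 3)) // 3^(3^3) = 7625597484987
}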
I wrote down all the properties I found. Later, the internet appeared and I discovered that
my operation was actually called tetration and the following ones are hyperoperations.
So I was late for these discoveries, but nevertheless, I got the pleasure of finding them myself.
There is a lot of work to do in this area still, but this post is about the second question.

The second question was much more difficult to answer. To make it more precise,
if we index the operations using a variant of Knuth's notation, where an arrow indicates the operation and a superscript indicates which hyperoperation we are talking about:
$$a*b = a\uparrow^0 b$$
$$a^b = a\uparrow^1 b$$
Then, what would $a\uparrow^{0.5}b$ be?

I tried different strategies without success.
My first strategy was to try to generalize iterated function composition. To go from addition to multiplication, we are actually composing a function many times. We start with
$$f(x, y) = x + y,$$
and then
$$x*y = f(x, f(x, f(x, \ldots))).$$
Then we iterate for exponentiation and so on.

If we could iterate, say, $0.5$ times, then, we could define an operation between the sum and the multiplication. I will be calling it supersum for now, but I don't know what the right name for it is, hence the title of the post.
There are various places in mathematics where the concept of fractional composition appears in some sense, including multiplication $a*0.5$ and exponentiation $a^{0.5}$. The most notable is in linear transformations. One can define the root of a matrix, either by means of series or by diagonalization, and multiplying by this matrix represents fractional application of the linear transformation.
Could we turn our problem into one of fractional composition?
Trying to apply this approach to our problem in an elegant way is very difficult, at least for me. I got stuck for a long time.
I tried various other strategies, also without success, which I don't even remember.
Then, at some point, while reading about Lie groups, I had my first break.
I had the idea of using the exponential map of Lie groups. The approach is simple. We can use an exponential function to map addition to multiplication
$$a^{(b+c)} = a^b * a^c.$$
I only had to find an exponential for the multiplication $r(x)$ for which the analogous formula would need to hold,
$$r(a*b) = r(a) *r(b).$$ Then I could interpolate somehow between both exponentials.
This does not uniquely define the function we are looking for; for example, the identity would fit the bill.
I had to get the inspiration elsewhere: the Mellin transform. I won't go into depth here, but the Mellin transform is invariant under dilation (multiplication), whereas the Fourier transform is invariant under translation (addition). We can use the functions found in the Mellin transform and they will be the exponentials we seek. The new exponential for multiplication is then $r(x) = 1/x$. Now we need a map which varies from one to the other. It is not going to be continuous, because one of them is not defined at $0$, so we have to "break" the exponential to convert it into $r(x)$.
After a lot of time staring at these functions, I finally got my second break. Enter the Mittag-Leffler function, which plays the role of the exponential for fractional derivatives.

$$E_\alpha(x) = 1 + \frac{x}{\Gamma(\alpha +1)} + \frac{x^2}{\Gamma(2*\alpha +1)}\ldots$$
The gamma function, $\Gamma(\alpha)$, also discovered by Euler, is a continuous generalization of the factorial; indeed, if $n$ is a positive integer,
$\Gamma(n) = (n-1)!$.
For Mittag-Leffler functions, when $\alpha = 1$, we have $E_1(x) = e^x$. When $\alpha = 0$, the function is $E_0(x) = 1/(1-x)$.
$E_0$ is not the $r(x)$ we were looking for, but it is close enough. We can use the function to define the supersum, when $0\leq \alpha \leq 1$,
$$a\circledast_{\alpha} b = E_{\alpha}^{-1}(E_{\alpha}(a)*E_{\alpha}(b))$$
where $E_{\alpha}^{-1}(x)$ is the inverse of $E_{\alpha}(x)$. We leave the proof that it meets all the properties to the reader (it is in the paper linked at the end of the post).
 It has the particular cases
$$a\circledast_1b = a+b = ln(e^a*e^b) $$
and
$$a\circledast_0b = a+b-a*b = c$$
which can be rewritten
$$c-1 = -(a-1)*(b-1),$$
which is a form of multiplication (reversed and shifted, which can be fixed, but it is essentially a multiplication). It gets better if you use $E_{\alpha}(-x)$ and you can shift the neutral element,
but I will leave that for another post. You may want to see an interesting (introductory) video related to these ideas.
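As a quick numerical sanity check of the two particular cases (only a rough sketch of mine, not the construction in the paper: it truncates the series and inverts it by bisection, so it only makes sense for small positive arguments, inside the radius of convergence of $E_0$):

package main

import (
	"fmt"
	"math"
)

// mittagLeffler approximates E_alpha(x) = sum_k x^k / Gamma(k*alpha + 1)
// with a truncated series.
func mittagLeffler(alpha, x float64, terms int) float64 {
	sum := 0.0
	for k := 0; k < terms; k++ {
		lg, _ := math.Lgamma(float64(k)*alpha + 1)
		sum += math.Pow(x, float64(k)) / math.Exp(lg)
	}
	return sum
}

// invert finds t in [lo, hi] with mittagLeffler(alpha, t, terms) = y by
// bisection; valid here because E_alpha is increasing on [0, 1).
func invert(alpha, y, lo, hi float64, terms int) float64 {
	for i := 0; i < 100; i++ {
		mid := (lo + hi) / 2
		if mittagLeffler(alpha, mid, terms) < y {
			lo = mid
		} else {
			hi = mid
		}
	}
	return (lo + hi) / 2
}

// supersum computes E_alpha^{-1}(E_alpha(a) * E_alpha(b)).
func supersum(alpha, a, b float64) float64 {
	const terms = 200
	y := mittagLeffler(alpha, a, terms) * mittagLeffler(alpha, b, terms)
	return invert(alpha, y, 0, 0.99, terms)
}

func main() {
	a, b := 0.2, 0.3
	fmt.Println("alpha=1:", supersum(1, a, b), "expected", a+b)     // ordinary sum
	fmt.Println("alpha=0:", supersum(0, a, b), "expected", a+b-a*b) // shifted product
	fmt.Println("alpha=0.5:", supersum(0.5, a, b))                  // something in between
}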
There are many ways to extend and apply this new group and I am still sieving through the results. For example, we can use it to define a transform (this is what abstract harmonic analysis is about, read more here), but again, this is a story for another post.
I have informally called this group (technically, the Lie group determined by the exponential map defined by the Mittag-Leffler functions applied to the multiplication) the supersum.

What would you name it?

For more details on all this, see the paper I wrote. It is an Accepted Manuscript of an article published by Taylor & Francis in Integral Transforms and Special Functions, available online: http://www.tandfonline.com/doi/full/10.1080/10652469.2016.1267730.





Tuesday, April 26, 2016

Wallis sieve, and lp n-balls III

This is the third installment; do read the previous posts. I left some standing questions in the last post, and I am going to answer at least one of them: can we squeeze an $l_p$ ball's volume out of a generalized Wallis sieve? At the same time I will generalize and simplify the proof from the xkcd post. First, let's generalize the Wallis sieve.

In $d$ dimensions, starting with a $p$-sided hypercube and cutting it appropriately (I leave it to the reader to draw it), we get the partial products,
$$A_n^d = \prod_{k=1}^n \frac{p^d k^{d-1} \Big(k+\frac{d}{p}\Big)} {(pk+1)^d},$$
whose limit can be rewritten as
$$\lim_{n\to\infty} A_n^d = \lim_{n\to\infty} \frac{p^{nd}\Gamma(n+1)^{d-1}\Gamma\Big(n+1+\frac{d}{p}\Big)\Gamma\Big(1+\frac{1}{p}\Big)^d}{p^{nd}\Gamma\Big(1+\frac{d}{p}\Big)\Gamma\Big(n+1+\frac{1}{p}\Big)^d}$$
where I have made use of the Euler gamma function recurrent properties (see previous posts).
Note that the $d$ in $A_n^d$ is an index, not an exponent.
The limit can be separated into two factors (after cancelling the $p^{nd}$),
$$\lim_{n\to\infty} A_n^d = \lim_{n\to\infty}\Bigg[ \frac{\Gamma(n+1)^{d-1}\Gamma\Big(n+1+\frac{d}{p}\Big)}{\Gamma\Big(n+1+\frac{1}{p}\Big)^d} \Bigg]\Bigg[\frac{\Gamma\Big(1+\frac{1}{p}\Big)^d}{\Gamma\Big(1+\frac{d}{p}\Big)}\Bigg].$$
Remember that $\frac{\Gamma(n+a)}{\Gamma(n)}\sim n^a$, so the first term when $n\to\infty$
$$ \frac{\Gamma(n+1)^{d-1}\Gamma\Big(n+1+\frac{d}{p}\Big)}{\Gamma\Big(n+1+\frac{1}{p}\Big)^d} = \frac{\Gamma(n+1)^{d-1}\Gamma\Big(n+1+\frac{d}{p}\Big)}{\Gamma\Big(n+1+\frac{1}{p}\Big)^{d-1}\Gamma\Big(n+1+\frac{1}{p}\Big)} \sim \frac{n^{\frac{d-1}{p}}}{n^{\frac{d-1}{p}}}\sim 1.$$
So we obtain, finally,
$$\lim_{n\to\infty} A_n^d = \frac{\Gamma\Big(1+\frac{1}{p}\Big)^d}{\Gamma\Big(1+\frac{d}{p}\Big)} = V_d^p\Big(\frac{1}{2}\Big).$$
The value of the limit is the volume of the $l_p$ ball, with the hypersphere, $p=2$, as a particular case.
Another remarkable case happens when $p=1$: we obtain the volume of the $d$-dimensional cross-polytope of radius $R=\frac{1}{2}$, which is $\frac{1}{d!}$. The cross-polytope is the generalization of the octahedron to $d$ dimensions and is the dual of the hypercube we started with.
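As a quick numerical sanity check (a sketch of mine, not part of the derivation), the partial products do crawl towards the $l_p$ ball volume; for $d=3$ and $p=2$ all three printed values should be close to $\pi/6$:

package main

import (
	"fmt"
	"math"
)

func main() {
	d, p := 3.0, 2.0
	// Partial product A_n^d of the generalized Wallis sieve.
	prod := 1.0
	for k := 1.0; k <= 1e6; k++ {
		prod *= math.Pow(p, d) * math.Pow(k, d-1) * (k + d/p) / math.Pow(p*k+1, d)
	}
	// Volume of the l_p ball of radius 1/2.
	ball := math.Pow(math.Gamma(1/p+1), d) / math.Gamma(d/p+1)
	fmt.Println(prod, ball, math.Pi/6) // all close to 0.5236
}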

This formula also lets us interpret the volume of various fat Cantor and other Smith–Volterra–Cantor sets.
The questions still standing from the last post are: is there a geometrical interpretation for the intermediate $A_n^d$? And, of course, what more can we learn from this relation between $l_p$ balls and these sets?



Saturday, April 23, 2016

Wallis sieve, and lp n-balls II

This post is a continuation of this one. In it I talked about this video by Matt Parker and some
of its consequences for two dimensions. In the video, he also asserted that cutting off pieces of a cube in the same way as is done in the Wallis sieve, but in three dimensions, would give back the volume of a sphere (see the pretty drawings in the post by Evelyn Lamb).
This opened the question of whether this would happen in higher dimensions. Someone on Twitter (thanks!) pointed me to this post in the xkcd forum, where they prove this fact. I am going to transcribe, dissect and explain that proof. The original ideas are all from the post, and the mistakes are all mine.
First, remember the basic Wallis product formula,

$$\frac{\pi}{4}=\prod_{n=1}^{\infty}\frac{4n(n+1)}{(2n+1)^2}=lim_{n\to\infty} \frac{4^n n! (n+1)!}{(2n+1)!!^2}.$$

Remember that the double factorial is the product $n!! = n(n-2)(n-4)\cdots$ which stops when the terms would cease to be positive, i.e. with $⌈n/2⌉$ terms. An important property of the double factorial is that it can be written in terms of Euler's gamma function,
$$\Gamma\Big(n+\frac{1}{2}\Big) = \frac{(2n-1)!!\sqrt{\pi}}{2^n}.$$
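This identity is easy to check numerically; a small sketch of mine:

package main

import (
	"fmt"
	"math"
)

// doubleFactorial computes n!! for odd n.
func doubleFactorial(n int) float64 {
	r := 1.0
	for k := n; k > 0; k -= 2 {
		r *= float64(k)
	}
	return r
}

func main() {
	for n := 1; n <= 5; n++ {
		lhs := math.Gamma(float64(n) + 0.5)
		rhs := doubleFactorial(2*n-1) * math.Sqrt(math.Pi) / math.Pow(2, float64(n))
		fmt.Println(n, lhs, rhs) // the two columns should agree
	}
}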

Remember also that the volume of an $l_p$ hyperball is
$$V_d^p(R) = \frac{(2\Gamma(\frac{1}{p} + 1)R)^d}{\Gamma(\frac{d}{p} + 1)}.$$
So, taking into account $\Gamma\big(\frac{3}{2}\big)=\frac{\sqrt{\pi}}{2}$, the volume of a $d$-hypersphere, which is the $l_2$ hyperball, is
$$V_d^2(R) = \frac{R^d\pi^{d/2}}{\Gamma(\frac{d}{2} + 1)}.$$

The Wallis product can be written,
$$\frac{\pi}{4}=\prod_{n=1}^{\infty}\Bigg[1-\frac{1}{(2n+1)^2}\Bigg],$$
which is convergent, and we can move around terms (if we are careful), because the series
$$\sum_{n=0}^{\infty}a_n=\sum_{n=0}^{\infty}\frac{1}{(2n+1)^2},$$
is absolutely convergent and we can apply the test for product convergence.

The general product for d dimensions for the Wallis sieve turns out to be
$$\prod_{n=1}^\infty\frac{2^dn^{d-1}(n+\frac{d}{2})}{(2n+1)^d}.$$

For even $d$ we can write this product as
$$lim_{n\to\infty} \frac{2^{nd}(n!)^{d-1}(n+\frac{d}{2})!}{(\frac{d}{2})!((2n+1)!!)^d}.$$
What we have done here is to expand the factors coming from $n+\frac{d}{2}$ into $\frac{(n+\frac{d}{2})!}{(\frac{d}{2})!}$. This is the tricky part where we have assumed $d$ to be even. For odd $d$ we can either rewrite everything in terms of $k$, i.e. $d=2k+1$, and follow a similar approach, or be more general and use the gamma function.

 The trick to calculate the limit is to separate it into the product of two parts, one of which can be identified as a Wallis product, between square brackets,
$$lim_{n\to\infty}  A_n^d = lim_{n\to\infty}  \Bigg[\frac{4^{n-1} n! (n-1)!}{((2n-1)!!)^2}\Bigg]^{d/2}\Bigg(\frac{2^dn^{d/2}(n+\frac{d}{2})!}{(\frac{d}{2})!n!(2n+1)^d}\Bigg),$$
$$lim_{n\to\infty}  A_n^d =\Big(\frac{\pi}{4}\Big)^{d/2}\Bigg(\frac{2^dn^{d/2}(n+\frac{d}{2})!}{(\frac{d}{2})!n!(2n+1)^d}\Bigg).$$

The last part is to show that the right term converges to $\frac{1}{(\frac{d}{2})!}$.
We show it by parts, first, when $n \to \infty$
$$\frac{\Big(n+\frac{d}{2}\Big)!}{n!}=(n+1)(n+2)\cdots(n+\frac{d}{2}) \sim n^{d/2},$$
 so the limit can be rewritten as
$$lim_{n\to\infty}  A_n^d =\Big(\frac{\pi}{4}\Big)^{d/2}\Bigg(\frac{(2n)^d}{(\frac{d}{2})!(2n+1)^d}\Bigg).$$

Finally, for big enough $n$, we can approximate the second term as
$$\frac{(2n)^d}{(\frac{d}{2})!(2n+1)^d}\sim\frac{1}{(\frac{d}{2})!}=\frac{1}{\Gamma(\frac{d}{2}+1)},$$
so
$$lim_{n\to\infty}  A_n^d = \frac{(\frac{\pi}{4})^{d/2}}{\Gamma(\frac{d}{2} + 1)}=V_d^2\Big(\frac{1}{2}\Big).$$

So, now that we have the general formula for $d$ dimensions, the question still stands: can we interpret $A_n^d$ in geometrical terms as we did for $d=2$? What about $l_p$ for $p\neq2$? Stay tuned.






Wednesday, April 20, 2016

Wallis sieve, and lp n-balls

I heard about the Wallis sieve for the first time in this video by Matt Parker, which is fascinating. I instantly recognized the pattern. I thought the relation between the Wallis sieve and the formula for the volume of an lp n-ball would be trivial and well known; it turns out it is neither. Later, I read the blog post in Scientific American by Evelyn Lamb and I still thought it would be easy to relate both. Finally, I sat down, did the work and found that the result is not only surprising, but (at least to me) completely non-obvious, and it has the potential to be very interesting.

The volume of an $l_p$ n-ball (a generalized ball) of radius $R$ is (for more details and the calculation and history of the formula, see Xiafu Wang's paper)
$$V_d^p(R) = \frac{(2\Gamma(\frac{1}{p} + 1)R)^d}{\Gamma(\frac{d}{p} + 1)}.$$

Note that the ball for $l_2$ is a hypersphere of dimension $d$.
This explains the relation between the Euler gamma function and $\pi$, one of my favorites,
 $$\Gamma\Big(\frac{3}{2}\Big) = \frac{\sqrt{\pi}}{2}.$$
At the same time, the gamma function is a generalization of the factorial and satisfies all sorts of recursive formulas similar to the Wallis sieve.

I will refer you to Evelyn Lamb's post for a detailed introduction, but the Wallis sieve can be easily written as a limit using the gamma formula,

$$\frac{\pi}{2} = \prod_{n=1}^{\infty}\Bigg[\frac{(2n)^2}{(2n-1)(2n+1)}\Bigg] = \frac{2\cdot 2\cdot 4\cdot 4\cdot 6\cdot 6\ldots}{1\cdot 3\cdot 3\cdot 5\cdot 5\cdot 7\ldots},$$
 which can be rewritten as a limit,
$$\lim_{n\to\infty} \frac{2^{4n}}{n{{2n}\choose{n}}^2} = \pi \lim_{n\to\infty} \frac{n \Gamma(n)^2}{\Gamma(\frac{1}{2}+n)^2} = \pi.$$
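A quick numerical check of this limit (a sketch of mine, done in log space so the factorials do not overflow):

package main

import (
	"fmt"
	"math"
)

func main() {
	n := 1e6
	lgamma := func(x float64) float64 { v, _ := math.Lgamma(x); return v }
	logBinom := lgamma(2*n+1) - 2*lgamma(n+1) // ln of the central binomial coefficient
	val := math.Exp(4*n*math.Ln2 - math.Log(n) - 2*logBinom)
	fmt.Println(val, math.Pi) // both ~3.14159
}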

We can then apply the gamma duplication formula,

$$\Gamma(z)\Gamma(z+\frac{1}{2})= 2^{1-2z}\sqrt{\pi}\Gamma(2z),$$

and the functional relation,

$$\Gamma(z+1) = z\Gamma(z),$$

to rewrite again the limit,

 $$\pi = \lim_{n\to\infty} n\Bigg[\frac{\Gamma(n)^2}{\Gamma(2n)2^{1-2n}}\Bigg]^2 = \lim_{n\to\infty} \frac{1}{n}\Bigg[\frac{\Gamma(1+n)^2 2^{2n}}{\Gamma(2n+1)}\Bigg]^2,$$
so
$$\pi = \lim_{n\to\infty}\frac{V_2^{\frac{1}{n}}(2^n)^2}{16n}.$$

This is, to say the least, surprising. Instead of hyperspheres and a trivial relationship, we get something which looks like an astroid (the image comes from Wikipedia).

So it is the limit of the square of the volume of this figure as it collapses upon itself, its inner radius getting smaller while the outer radius grows. This result is bizarre and not at all trivial.

The formula can be generalized to higher dimensions (I will play with this the next time I have some free time). The video talks about this, but I have not written their formula down. Also, it would be interesting to see whether fat Cantor sets can be written in terms of hyperballs too.
My conjecture is that it will be related to the taxicab-measure astroid ball, whatever that is.

Edit: fixed a missing n in the denominator in the limit.



Tuesday, April 12, 2016

Swap without temporary space

There is a really bad technical interview question which appears now and then.
It goes like this: “how would you exchange the value of two variables without
using temporary space?” Whenever I hear this question I cringe. It is one of those questions that does not measure anything other than: have you seen this before? Or did you read Hacker's Delight (which, by the way, I wholeheartedly recommend)? And you may be a fine developer and human being and just be unlucky enough never to have seen this trick before.

It gets better, because the question is moot in some programming languages with tuple literals or multiple assignment. For example, in Go the solution is trivial,

a, b = b, a

And you are done with it. Even better, the compiler may generate a swap of registers, which is as efficient as it gets.

In any case, I was chatting about this question with a friend and I remembered some ideas I thought I had read somewhere, maybe in Hacker's Delight, maybe somewhere else. After checking, apparently I hadn't read them in either, so maybe I came up with them myself. In any case, the gist of it is: if you are ever asked this question, you can use matrices to go completely overboard with the answer.

So, say you want to swap two variables and you want to do it without temporary storage. One of the classic ways to do this is,

a = a + b
b = a - b
a = a - b

So how can we describe this in terms of matrices?
Well, each of the assignments is actually the multiplication of the vector
$\begin{bmatrix}a\\ b\end{bmatrix}$ by a matrix and as long as the matrix determinant is not zero, you
don't lose any information.

For example, the first assignment may be written in math,

$a' = a + b$
$b' = 0 + b$

or in matrix form:
$$\begin{bmatrix}a'\\ b'\end{bmatrix} = \begin{bmatrix}1 & 1\\0 & 1\end{bmatrix}\begin{bmatrix}a\\ b\end{bmatrix}$$

So, the three matrices describing the previous assignments are,

$$M = \begin{bmatrix}1 & 1\\0 & 1\end{bmatrix}$$
$$N = \begin{bmatrix}1 & 0\\1 & -1\end{bmatrix}$$
$$R = \begin{bmatrix}1 & -1\\0 & 1\end{bmatrix}$$
The multiplication of these matrices (be careful, the order has to be right)
$$RNM = \begin{bmatrix}1 & -1\\0 & 1\end{bmatrix}\begin{bmatrix}1 & 0\\1 & -1\end{bmatrix}\begin{bmatrix}1 & 1\\0 & 1\end{bmatrix} = \begin{bmatrix}0 & 1\\1 & 0\end{bmatrix}$$

which is the reverse identity, i.e. a swap.

This already works (even if it overflows). In truth, any factorization of the reverse identity into N matrices does the trick. You may even rescale them, for example multiply the first by 2 and the second by 1/2 if you are working in floating point, for more obscurity. Or use reciprocals for integers (another trick from Hacker's Delight).
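If you want to convince yourself (or a stunned interviewer), a few lines of Go multiply the three matrices and show that the product is indeed the reverse identity; the little mat type is only there for the example:

package main

import "fmt"

type mat [2][2]int

// mul multiplies two 2x2 integer matrices.
func mul(a, b mat) mat {
	var c mat
	for i := 0; i < 2; i++ {
		for j := 0; j < 2; j++ {
			for k := 0; k < 2; k++ {
				c[i][j] += a[i][k] * b[k][j]
			}
		}
	}
	return c
}

func main() {
	M := mat{{1, 1}, {0, 1}}  // a = a + b
	N := mat{{1, 0}, {1, -1}} // b = a - b
	R := mat{{1, -1}, {0, 1}} // a = a - b
	fmt.Println(mul(R, mul(N, M))) // [[0 1] [1 0]], the reverse identity
}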

We can go even further and work in $GF(2)$, i.e. with binary bit-by-bit operations.
In this space, the addition is the xor (^) and each number is its own additive inverse,
so the swap above can be written,

a = a ^ b
b = a ^ b
a = a ^ b

You can also write this code in terms of factors of the reverse identity with binary matrices.

The three assignments with xor are probably what the (now completely stunned) interviewer was aiming for.









Tuesday, March 27, 2012

Taking measures and drawing them

I have spent the last couple of months measuring nix, the operating system I am working on. Measuring is hard. Quoting Feynman:

"The first principle is that you must not fool yourself, and you are the easiest person to fool."

When you are measuring complex systems, and the amd64 machine we are measuring is quite complex, it is important to separate the variables that can influence the results. Each possible variable needs to be modified separately, in a different experiment. Each experiment has to be run many, many times. Then, we calculate some statistics and we are done.

Except...

Trying to modify each variable separately is not easy. In many cases, they are not independent. Also, some of them may not be under your control (network traffic, for example). Still, that is the subject for another post. In this one we are interested in calculating simple statistics and drawing them, which should always be the first approximation to a new problem.

Mean and standard deviation

We were taught how to interpret experiment results at school, by calculating the average and the standard deviation. Being a programmer, you may very well think that this is trivial and just do it yourself in your benchmark. Well, it is not difficult, but it takes some care. It is easy to do it wrong (I know from experience).

First, the average needs to be calculated using an online algorithm. An online algorithm for the average (normally called a running average) takes one sample at a time and keeps an average of what it has seen so far.

Why is this important? Why can't I just add all the samples and divide by N?

Because, if you take a lot of samples (which you should) and add them, you will overflow the variable you use as an accumulator.

There is also the question of numerical stability. The representation the computer uses for floating point is not equally precise across the whole range. You have to be careful with the operations you do, so that they do not amplify the errors introduced by the representation. This consideration is especially important when calculating the standard deviation. If you calculate the standard deviation using the classical offline algorithm, you will end up subtracting very big numbers to obtain a small one, one of the worst things to do. Because of this problem, it is very easy to end up taking the square root of a negative number and getting an imaginary standard deviation.

The solution is to use an online algorithm to calculate both the average and the standard deviation, one which is carefully designed not to be unstable. Donald Knuth's Art of Computer Programming is the right resource for this kind of algorithm, and sure enough, we can find one on page 232, Vol. 2, 3rd Edition.


A simple description of the algorithm can be found here:
The gist of it is:

Initialize $M_1 = x_1$ and $S_1 = 0$.

For subsequent $x$'s, use the recurrence formulas

$$M_k = M_{k-1} + \frac{x_k - M_{k-1}}{k},$$
$$S_k = S_{k-1} + (x_k - M_{k-1})(x_k - M_k).$$

For $2 \le k \le n$, the $k$th estimate of the variance is $s^2 = S_k/(k - 1)$.

This is an example of this algorithm written in awk, which I use for files with the results of time; they look like this:

1.34u 34.5s 25.78r
1.34u 33.5s 23.78r
...
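The awk script is not reproduced here; as an illustration of the same recurrence, here is an equivalent sketch in Go (it reads one number per line from standard input, instead of time's three-column format, to keep it short):

package main

import (
	"bufio"
	"fmt"
	"math"
	"os"
	"strconv"
)

func main() {
	// Knuth's online recurrence: one pass, no huge accumulators.
	var n int
	var mean, s float64
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		x, err := strconv.ParseFloat(sc.Text(), 64)
		if err != nil {
			continue // skip lines that are not plain numbers
		}
		n++
		delta := x - mean
		mean += delta / float64(n) // M_k = M_{k-1} + (x_k - M_{k-1})/k
		s += delta * (x - mean)    // S_k = S_{k-1} + (x_k - M_{k-1})(x_k - M_k)
	}
	if n > 1 {
		fmt.Println(mean, math.Sqrt(s/float64(n-1))) // average and standard deviation
	}
}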

So, we already know how to calculate the standard deviation and the average, and that is all: we can draw an error bar diagram and we are done. Not quite. Without going deep into statistics, more can be done as a first approximation to interpreting a group of samples.

Whiskers

The standard deviation tells you nothing about how the samples are really distributed. Yes, it can tell you if they are spread out but not how asymmetric or skewed the distribution is.

If the distribution is not Gaussian or, worse, not symmetrical, and many real distributions are not, the standard deviation is a bad estimate of how spread out the error is. It is better not to make any assumptions.

I prefer to use a box plot or whisker plot. A whiskers diagram draws a box spanning the first and third quartiles (Q1 and Q3), with the median (the second quartile or Q2) in the middle, and two whiskers depicting the error (what the whiskers represent varies among versions of the whisker diagram).


First of all, what is a quartile? The median is the sample that divides the samples into two equal sets (samples bigger than the median and samples smaller):


A quartile is the median of each of the sets which the median creates by dividing the original set in two. This can be applied recursively. If the number of samples in any of these sets is even, the median is defined as the average of the two central samples. For example, for the set:

1 3 4 6 7 8

the median (or second quartile Q2) is (4+6)/2 = 5.

The first quartile Q1 is extracted from the lower set

1 3

and would be (1+3)/2=2 and Q3 would be 7.5.

There are other algorithms where the lowest and highest samples and the median are not used when calculating Q1 and Q3; this is method 1 in the Wikipedia entry.

I normally use method 1, but there is not much difference in practice.
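As an illustration, this small sketch of mine computes the median and the quartiles following the convention of the worked example above, where the central samples used for the median are left out of the two halves (quartile conventions vary, as mentioned):

package main

import (
	"fmt"
	"sort"
)

// median of an already sorted slice.
func median(x []float64) float64 {
	n := len(x)
	if n%2 == 1 {
		return x[n/2]
	}
	return (x[n/2-1] + x[n/2]) / 2
}

// quartiles leaves out the central sample (or pair of samples) that
// defines the median before taking the median of each half.
func quartiles(x []float64) (q1, q2, q3 float64) {
	sort.Float64s(x)
	n := len(x)
	q2 = median(x)
	if n%2 == 1 {
		return median(x[:n/2]), q2, median(x[n/2+1:])
	}
	return median(x[:n/2-1]), q2, median(x[n/2+1:])
}

func main() {
	q1, q2, q3 := quartiles([]float64{1, 3, 4, 6, 7, 8})
	fmt.Println(q1, q2, q3) // 2 5 7.5, as in the example above
}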

Going back to whiskers, what do they represent? Well, it depends on the version of the whiskers diagram. First, there is something called the IQR (inter-quartile range), which is Q3 - Q1. In many versions of the whiskers diagram, the whiskers represent Q1 - 1.5*IQR and Q3 + 1.5*IQR (I will call this version kind A). The other most popular version is that they represent the maximum and the minimum of the whole data (I will call this version kind B). I prefer the second one because, again, it does not make any assumptions about the data.

Anything outside of the whiskers (if there is anything) is called an outlier and sometimes they are drawn in the diagram and sometimes they are not.

In any case, a whisker diagram is very good because at a glance one can deduce

  1. How skewed the data is, by looking at the relative position of the median.
  2. How spread out the data is, by looking at the whiskers.
  3. Whether two groups of samples are clearly different, by seeing where the boxes and the whiskers lie relative to one another.
So, the next question is how to draw these diagrams given the samples.
As an example, see this little script, which calculates the values of the box of a kind A diagram for numbers in the first column of its input. I use it with grap sometimes, when I need complete control.

What if I want to have a very pretty diagram with most of the work done automatically by the machine? In this case, I use R with ggplot2.

First, you need to install the R programming language. This can be done by downloading a binary package for your operating system from the web page. This includes (among other things) two programs of interest to us:

- R, a text interpreter and
- Rscript, which is R but can be used as an interpreter for #! in scripts

Why Rscript is needed and R does not behave like any other interpreter is beyond me, but that is how it is.

In any case, after that, install ggplot2.
Fire up R from the command line and run install.packages like this:

> install.packages("ggplot2")

select the mirror and exit R and you are done.

You are ready to create the first plot. For this, first you need data. I am going to use as an example this file, which has a header and a body and looks something like this:

, Ncores, Time, Scheduler 
X0, 10, 3.84, amp
X0, 10, 3.89, amp
The first line is the header, telling me what the samples stand for. The first column identifies samples to be grouped together; for example, all rows with X0 stand for a group of samples that should go in one box. We are ready to write a script to generate the graph.
We open a file and write this, which can also be downloaded here:
  1. #!/usr/bin/Rscript

  2. data <-read.csv("data")
  3. dframe = data[, c("Ncores", "Time","Scheduler")]

  4. colnames(dframe) = c("Ncores", "Time","Scheduler")

  5. library(ggplot2)

  6. ggplot(dframe, aes(x=factor(Ncores), y=Time, fill=Scheduler)) + geom_boxplot(outlier.size=0, size=0.2)
  7. ggsave(file="data.pdf")
Line 1 is the shebang that makes it a script; remember to use Rscript instead of R.
Line 2 reads the data, line 3 selects which columns we are going to represent and line 4 gives them names.
Then we load ggplot2 (line 5) and plot by calling ggplot and geom_boxplot (line 6).
The second argument of ggplot is the aesthetics, the parameters related to how the graph is displayed: fill, for example, gives the color for the background of each box, which is chosen automatically to be a different one for each possible value. The geom_boxplot parameter describes how outliers should be drawn (I am not depicting them) and ggsave (line 7) writes the plot to a PDF file.

This script plots a whiskers plot of kind A, which is also called a Tukey diagram in honor of its creator (who also coined the term bit and co-created the most widely used FFT algorithm).

If the meaning of the whiskers needs to be changed, things can be more involved. An example can be seen in this script, where the result is not kind A or kind B, but a custom-made one.

We are finally ready to contemplate the result in all its glory (a pdf scalable version can be found here). Of course, this graph is quite baroque, but then again it expresses a lot of information.