a very occasional diary.




360 years later or „Скрещенья ног“ ("the crossing of legs")

In 1896 Paul Gauguin completed Te Arii Vahine (The King’s Wife):

Among the many similar paintings of his Tahitian period, this one, together with a couple of preparatory watercolours, is distinguished by the artificial placement of the legs, which can be characterised in Russian by the equally forced line (quoted in this article's title) from a certain universally acclaimed poem. This strange posture is neither a deficiency nor an artistic whim. It is part of a silent, subtle game played over centuries, where moves are echoes and the reward—some flickering form of immortality:
This is Diana Resting, by Cranach the Elder, 1537. Let me just note the birds and leave the pleasure of finding other clues to the reader. Lucas Cranach (and this is more widely known) himself played a very similar game with Dürer.

By sheer luck, the first painting is just a few kilometres away from me, in the Pushkin Museum.


A curious case of stacks and queues.

When studying computing science we all learn how to convert an expression in the "normal" ("infix", "algebraic") notation to "reverse Polish" notation. For example, the expression "a*b + c*d" is converted to "a b * c d * +". An expression in reverse Polish notation can be seen as a program for a stack automaton:

PUSH a
PUSH b
MUL
PUSH c
PUSH d
MUL
ADD

Where PUSH pushes its argument on the top of the (implicit) stack, while ADD and MUL pop the 2 top elements from the stack, perform the respective operation and push the result back.

For reasons that will become clearer anon, let's re-write this program as

Container c;
c.put(a);
c.put(b);
c.put(c.get() * c.get());
c.put(c);
c.put(d);
c.put(c.get() * c.get());
c.put(c.get() + c.get());

Where Container is the type of stacks, c.put() pushes an element on the top of the stack and c.get() pops and returns the top of the stack. The LIFO discipline of stacks is so widely used (implemented natively on all modern processors, built into programming languages in the form of the call-stack) that one never asks whether a different method of evaluating expressions is possible.

Here is a problem: find a way to translate infix notation into a program for a queue automaton, that is, a program like the one above, but where Container is the type of FIFO queues, with c.put() enqueuing an element at the rear of the queue and c.get() dequeuing at the front. This problem was reportedly solved by Jan L.A. van de Snepscheut sometime during the spring of 1984.

While you are thinking about it, consider the following tree-traversal code (in some abstract imaginary language):

walk(Treenode root) {
        Container todo;
        todo.put(root);
        while (!todo.is_empty()) {
                next = todo.get();
                visit(next);
                for (child in next.children)
                        todo.put(child);
        }
}

Where node.children is the list of the node's children, suitable for iteration by the for loop, and visit() does whatever processing the walk is for.

Convince yourself that if Container is the type of stacks, tree-walk is depth-first. And if Container is the type of queues, tree-walk is breadth-first. Then, convince yourself that a depth-first walk of the parse tree of an infix expression produces the expression in Polish notation (unreversed) and its breadth-first walk produces the expression in "queue notation" (that is, the desired program for a queue automaton). Isn't it marvelous that traversing a parse tree with a stack container gives you the program for stack-based execution and traversing the same tree with a queue container gives you the program for queue-based execution?
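The observation above is easy to machine-check. Here is a sketch in Python (my own illustration, not part of the original argument; the names Node, walk and run are mine): one loop plays both walks, and another plays both automata, with the produced (unreversed) Polish and queue notations read right to left:

```python
from collections import deque

class Node:
    def __init__(self, val, children=()):
        self.val, self.children = val, children

# parse tree of a*b + c*d
tree = Node("+", (Node("*", (Node("a"), Node("b"))),
                  Node("*", (Node("c"), Node("d")))))

def walk(root, fifo):
    # the walk() from the text; Container is a queue if fifo, else a stack
    out, todo = [], deque([root])
    while todo:
        node = todo.popleft() if fifo else todo.pop()
        out.append(node.val)
        # push children right-to-left for the stack, so that they are
        # visited left-to-right
        for child in (node.children if fifo else reversed(node.children)):
            todo.append(child)
    return out

def run(program, fifo, env):
    # stack/queue automaton; the Polish and queue notations produced by
    # the walks are read right to left
    c = deque()
    get = c.popleft if fifo else c.pop
    for tok in reversed(program):
        if tok in ("+", "*"):
            a, b = get(), get()
            c.append(a + b if tok == "+" else a * b)
        else:
            c.append(env[tok])
    return get()

env = {"a": 2, "b": 3, "c": 5, "d": 7}
polish = walk(tree, fifo=False)   # ['+', '*', 'a', 'b', '*', 'c', 'd']
queued = walk(tree, fifo=True)    # ['+', '*', '*', 'a', 'b', 'c', 'd']
assert run(polish, fifo=False, env=env) == 2*3 + 5*7
assert run(queued, fifo=True, env=env) == 2*3 + 5*7
```

Both walks feed both automata from the same tree; only the container discipline differs.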

I feel that there is something deep behind this. A. Stepanov had an intuition (which cost him dearly) that algorithms are defined on algebraic structures. The elegant interconnection between queues and stacks on the one hand, and tree-walks and automaton programs on the other, tells us that the correspondence between algorithms and structures goes in both directions.


Unexpected isomorphism

Since Cantor's "I see it, but I cannot believe it" (1877), we know that the sets \(\mathbb{R}^n\) are isomorphic (equinumerous) for all \(n > 0\). Shocking as this is, over time we learn to live with it, because the bijections between continua of different dimensions are extremely discontinuous, and we assume that if we limit ourselves to any reasonably well-behaved class of maps the isomorphisms will disappear. Will they?

Theorem. Additive groups \(\mathbb{R}^n\) are isomorphic for all \(n > 0\) (and, therefore, isomorphic to the additive group of the complex numbers).

Proof. Each \(\mathbb{R}^n\) is a vector space over the rationals. Assuming the axiom of choice, every vector space has a basis. By simple cardinality considerations, the cardinality of a basis of \(\mathbb{R}^n\) over \(\mathbb{Q}\) is the same as the cardinality of \(\mathbb{R}^n\) itself. Therefore all \(\mathbb{R}^n\) have the same dimension over \(\mathbb{Q}\), and, therefore, are isomorphic as vector spaces and as additive groups. End of proof.

This means that for any \(n, m > 0\) there are bijections \(f : \mathbb{R}^n \to \mathbb{R}^m\) such that \(f(a + b) = f(a) + f(b)\) and, necessarily, \(f(p\cdot a + q\cdot b) = p\cdot f(a) + q\cdot f(b)\) for all rational \(p\) and \(q\).

I feel that this should be highly counter-intuitive to anybody who has internalised Cantor's result, or, maybe, especially to such people. The reason is that intuitively there are many more continuous maps than algebraic homomorphisms between the "same" pair of objects. Indeed, the formula defining continuity has the form \(\forall x\forall\epsilon\exists\delta\forall y P(x, \epsilon, \delta, y)\) (a local property), while homomorphisms are defined by \(\forall x\forall y Q(x, y)\) (a stronger global property). Because of this, topological categories have much denser lattices of sub- and quotient-objects than algebraic ones. From this one would expect that, as there are no isomorphisms (continuous bijections) between continua of different dimensions, there definitely should be no homomorphisms between them. Yet there they are.


Why Go is Not My Favorite Programming Language

Disclaimer: this article shares very little except the title with the classic Why Pascal is Not My Favorite Programming Language. No attempt is made to analyse Go in any systematic fashion. On the contrary, the focus is on one particular, if grave, issue. Moreover, the author happily admits that his experience with Go programming is very limited.

Go is a system programming language, and a large fraction of system software consists of processing incoming requests of some sort, for example:
  • [KERNEL] an OS kernel processes system calls;
  • [SERVER] a server processes requests received over network or IPC;
  • [LIB] a library processes invocations of its entry points.
A distinguishing feature of system software is that it should be resilient against abnormal conditions in the environment, such as network communication failures, storage failures, etc. Of course, there are practical limits to such resilience, and it is very difficult to construct software that would operate correctly in the face of undetected processor or memory failures (although such systems have been built in the past), but it is generally agreed that system software should handle a certain class of failures to be usable as the foundation of a software stack. We argue that Go is not suitable for system programming because it cannot deal with one of the most important failures in this class: memory allocation errors.

Out of the many existing designs of failure handling (exceptions, recovery blocks, etc.), Go exclusively selects explicit error checking, with a simple panic-recovery mechanism. This makes sense, because this is the only design that works in all the use-cases mentioned above. However, memory allocation errors do not produce checkable errors in Go. The language specification does not even mention the possibility of allocation failure, and in the discussions of these issues (see e.g., here and here) Google engineers adamantly refuse to consider adding an interface to intercept memory allocation errors. Instead, various methods to warn the application that memory is "about to be exhausted" are proposed. These methods, of course, only reduce the probability of running out of memory, but never eliminate it (thus making bugs in the error handling code more difficult to test). As one can easily check by running a simple program that allocates all available memory, a memory allocation error results in unconditional program termination, rather than a recoverable panic.

But even if a way to check for allocation errors or recover from them were added, it would not help, because Go often allocates memory behind the scenes, so that there is no point in the program source where a check could be made. For example, memory is allocated whenever a struct is used as an interface:

package main

type foo interface{ f() int }

type bar struct{ v int }

func out(s foo) int { return s.f() - 1 }

func (self bar) f() int { return self.v + 1 }

func main() {
	for {
		out(bar{})
	}
}
The program above contains no explicit memory allocations; still, it allocates a lot of memory. The assembly output (use godbolt.org, for example) for out(bar{}) contains a call to runtime.convT64() (see the source), which calls mallocgc.

func convT64(val uint64) (x unsafe.Pointer) {
	if val == 0 {
		x = unsafe.Pointer(&zeroVal[0])
	} else {
		x = mallocgc(8, uint64Type, false)
		*(*uint64)(x) = val
	}
	return
}

To summarise, the combination of the following reasons makes Go unsuitable for
construction of reliable system software:
  • it is not, in general, possible to guarantee that memory allocation will always succeed. For example, in the [LIB] case, other parts of the process or other processes can exhaust all the available memory. Pre-allocating memory for the worst case is impractical except in the simplest cases;
  • due to the design of the Go runtime and the implementation of fundamental language features like interfaces, it is not possible to reliably check for memory allocation errors;
  • software that can neither prevent nor resolve memory allocation errors is unreliable. For example, a library that, when called, crashes the entire process because some other process allocated all the available memory cannot be used to build reliable software on top of it.


Cographs, cocounts and coconuts.

Abstract: The dual of the familiar construction of the graph of a function is considered. The symmetry between graphs and cographs can be extended to a surprising degree.
Given a function \(f : A \rightarrow B\), the graph of \(f\) is defined as $$f^* = \{(x, f(x)) \mid x \in A\}.$$ In fact, within the ZFC framework, functions are defined as their graphs. A graph is a subset of the Cartesian product \(A \times B\). One might want to associate with \(f\) a dual cograph object: a certain quotient set of the disjoint sum \(A \sqcup B\) which would uniquely identify the function. To understand the structure of the cograph, define the graph of a morphism \(f : A \rightarrow B\) in a category with suitable products as a fibred product:
$$\begin{CD}
f^* @>\pi_2>> B\\
@V \pi_1 V V @VV 1_B V\\
A @>>f> B
\end{CD}$$

In the category of sets this gives the standard definition. The cograph can be defined by a dual construction as a push-out:
$$\begin{CD}
A @>1_A>> A\\
@V f V V @VV j_1 V\\
B @>>j_2> f_*
\end{CD}$$

Expanding this in the category of sets gives the following definition:
$$f_* = (A \sqcup B) / \pi_f,$$
where \(\pi_f\) is the reflexive, symmetric and transitive closure of the relation \(\theta_f\) given by (assuming in the following, without loss of generality, that \(A\) and \(B\) are disjoint)
$$x\overset{\theta_f}{\sim} y \equiv y = f(x)$$
That is, \(A\) is partitioned by \(\pi_f\) into subsets which are inverse images of elements of \(B\), and to each such subset the element of \(B\) which is its image is glued. This is somewhat similar to the mapping cylinder construction in topology.

Some similarities between graphs and cographs are immediately obvious. For graphs: $$\forall x\in A\; \exists! y\in B\; (x, y)\in f^*$$ $$f(x) = \pi_2((\{x\}\times B) \cap f^*)$$ $$f(U) = \{y \mid y = f(x) \wedge x\in U \} = \pi_2((U\times B)\cap f^*)$$ where \(x\in A\) and \(U\subseteq A\). Similarly, for cographs: $$\forall x\in A\; \exists! y\in B\; [x] = [y]$$ $$f(x) = [x] \cap B$$ $$f(U) = (\bigcup [U])\cap B$$ where \([x]\) is the equivalence class of \(x\) w.r.t. \(\pi_f\) and \([U] = \{[x] \mid x \in U\}\).

For inverse images: $$f^{-1}(y) = \pi_1((A \times \{y\}) \cap f^*) = [y] \cap A$$ $$f^{-1}(S) = \pi_1((A \times S) \cap f^*) = (\bigcup [S])\cap A$$ where \(y\in B\) and \(S\subseteq B\).
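In the category of sets the cograph can be computed explicitly. A small sketch (the helper cograph and the sample f are mine, purely illustrative): each equivalence class of \(\pi_f\) is an inverse image \(f^{-1}(y)\) together with the glued point \(y\).

```python
def cograph(f, A, B):
    # quotient of the disjoint sum A ⊔ B by x ~ f(x); elements are
    # tagged to make A and B disjoint, as assumed in the text
    classes = {y: {("B", y)} for y in B}
    for x in A:
        classes[f[x]].add(("A", x))
    return sorted(classes.values(), key=lambda c: sorted(c))

f = {1: "a", 2: "a", 3: "b"}
parts = cograph(f, A={1, 2, 3}, B={"a", "b", "c"})
# each class is f⁻¹(y) ∪ {y}; the class of x ∈ A contains exactly one
# element of B, namely f(x), matching f(x) = [x] ∩ B
```

Note that an element of \(B\) outside the image of \(f\) (here "c") forms a class of its own, which is why the cograph determines \(f\) uniquely.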

A graph can be expressed as $$f^* = \bigcup_{x \in A}\{(x, f(x))\}$$ To write out a similar representation of a cograph, we have to recall some elementary facts about equivalence relations.

Given a set \(A\), let \(Eq(A) \subseteq Rel(A) = P(A \times A)\) be the set of equivalence relations on \(A\). For a relation \(\pi \subseteq A \times A\), we have $$\pi \in Eq(A) \;\; \equiv \;\; \pi^0 = \Delta \subseteq \pi \; \wedge \; \pi^n = \pi, n \in \mathbb{Z}, n \neq 0.$$ To each \(\pi\) corresponds a surjection \(A \twoheadrightarrow A/\pi\). Assuming the axiom of choice (in the form "all epics split"), an endomorphism \(A \twoheadrightarrow A/\pi \rightarrowtail A\) can be assigned (non-uniquely) to \(\pi\). It is easy to check that this gives \(Eq(A) = End(A) / Aut(A)\), where \(End(A)\) and \(Aut(A)\) are the monoid and the group of set endomorphisms and automorphisms respectively, with composition as the operation (\(Aut(A)\) is not, in general, normal in \(End(A)\), so \(Eq(A)\) does not inherit any useful operation from this quotient-set representation).

In addition to the monoid structure (w.r.t. composition) that \(Eq(A)\) inherits from \(Rel(A)\), it is also a lattice with infimum and supremum given by $$\pi \wedge \rho = \pi \cap \rho$$ $$\pi \vee \rho = \mathtt{tr}(\pi \cup \rho) = \bigcup_{n \in \mathbb{N}}(\pi \cup \rho)^n$$

For a subset \(X\subseteq A\) define an equivalence relation \(e(X) = \Delta_A \cup (X\times X)\), so that $$x\overset{e(X)}{\sim} y \equiv x = y \vee \{x, y\} \subseteq X$$ (Intuitively, \(e(X)\) collapses \(X\) to one point.) It is easy to check that $$f_* = \bigvee_{x \in A}e(\{x, f(x)\})$$ which is the desired dual of the graph representation above.


Conatus trimetro iambico.

Aloof as stardust rains
Are memory dim prints
Eliding tensed face
By shadows within of
All conquering space that
Inly trusts each friend
Of madness—whose embrace
Accept with no delays
To take one final look
Before you turn Rose ways.

A rather rare meter in English, but much easier once you let enjambments in.



Licence my roving hands, and let them go,
Before, behind, between, above, below.
O my America! my new-found-land
   -- J. Donne, 1633.
Блуждающим рукам моим дай разрешенье,
Пусти вперед, назад, промеж, и вверх, и вниз,
О дивный новый мир, Америка моя!
The variant reading reeing instead of roving is even better.

I hope that "О дивный новый мир" (O brave new world) is not entirely anachronistic.


Zero everywhere.

Suppose that \(f(x) \neq 0\) for all \(x\in\mathbb{R}\). Take an arbitrary interval \([a, b] \subset \mathbb{R}\), \(a \lt b\), and let us prove that this interval contains a point \(x_0\) such that \(\lim \limits_{x\to x_0} f(x) \neq 0\).

Let \(T_n = |f|^{-1}([\frac{1}{n}, +\infty)) \cap [a, b]\), i.e., \(x \in T_n \equiv |f(x)| \ge \frac{1}{n} \wedge x\in[a, b]\), for \(n > 0\).

If every \(T_n\) is nowhere dense, then their union is meagre and, since \([a, b]\) is a Baire space, cannot be the whole interval \([a, b]\)—a contradiction. Hence some \(T_n\) has an interior point \(x_0 \in T_n\); then \(T_n\) contains \(x_0\) together with some open interval, on which, therefore, \(|f(x)| \ge \frac{1}{n}\), and, consequently, \(|\lim \limits_{x\to x_0} f(x)| \ge \frac{1}{n} \gt 0\).

Note that we proved more than was required, namely, that the set of zeros is everywhere dense. Or, that a function everywhere converging to a continuous one is almost everywhere continuous (replace \(0\) with an arbitrary continuous \(g:\mathbb{R}\to\mathbb{R}\)).

(2014.12.15) A generalisation.

Let \(X\) be a Baire space, \(Y\)—a second-countable Urysohn space and \(f,g : X \to Y\)—arbitrary maps. If \((\forall x\in X)(\lim\limits_{t\to x}f(t) = \lim\limits_{t\to x}g(t))\) then \(f = g\) on a dense subset.

Proof (by contraposition). Suppose that there is an open \(A \subseteq X\), such that \((\forall x\in A)(f(x)\neq g(x))\). Let \(B\) be a countable base of \(Y\).
Define a countable family of subsets of \(A\): \(T_{U,V} = f^{-1}(U) \cap g^{-1}(V) \cap A\), where \(U, V \in B\) (that is, \(U\) and \(V\) are basic open subsets of \(Y\)). For any \(x\in A\), \(f(x)\neq g(x)\), and because \(Y\) is Urysohn, there are \(U, V\in B\), \(cl(U)\cap cl(V) = \varnothing\), \(f(x)\in U\), \(g(x)\in V\), that is, \(x\in T_{U,V}\). Hence
\(\bigcup\limits_{cl(U)\cap cl(V) = \varnothing} T_{U,V} = A\)
Because \(X\), and hence \(A\), are Baire spaces, one of the \(T_{U,V}\) in the union above, say \(T_{U_0, V_0}\), is not nowhere dense. That is, there is an open set \(G\subseteq A\) such that for any \(x_0\in G\) and any open neighbourhood \(S\), \(x_0\in S\), we have \(S \cap T_{U_0,V_0}\neq\varnothing\); that is, there exists an \(x'\in S\) such that \(f(x') \in U_0\) and \(g(x')\in V_0\).
Suppose that \(\lim\limits_{t\to x_0}f(t) = \lim\limits_{t\to x_0}g(t) = y\in Y\). This means that every open neighbourhood of \(y\) intersects \(U_0\), hence \(y\in cl(U_0)\). Similarly \(y\in cl(V_0)\), a contradiction with \(cl(U_0)\cap cl(V_0) = \varnothing\).
PS: for problem 2, the answer is \(k = 2\cdot n - 2\).


Blindsight spot.

I just realised that in any unital ring commutativity of addition follows from distributivity:

\(a + b\)
\(=\)\(-a + a + a + b + b - b\)
\(=\)\(-a + a\cdot(1 + 1) + b\cdot(1 + 1) - b\)
\(=\)\(-a + (a + b)\cdot(1 + 1) - b\)
\(=\)\(-a + (a + b)\cdot 1 + (a + b)\cdot 1 - b\)
\(=\)\(-a + a + b + a + b - b\)
\(=\)\(b + a\)

The same holds for unital modules, algebras, vector spaces, &c. Note that multiplication doesn't even need to be associative. It's amazing how such things can pass unnoticed.


Whither science (А болт мы сделаем деревянным).

I was recently on a committee reviewing papers for a computer science conference. The submissions, from all over the world, were in PDF. One submission was actually a Trojan: an executable masquerading as a PDF file. This by itself is startling evidence of how widespread spying has become (the Trojan came from a respectable university email address), but the really alarming thing is that this was *not* the paper that got the lowest grades. That is, some legitimate submissions had less scientific content than a virus.

A few days later, while perusing proceedings of another conference, which shall remain nameless only for so long, I found something that triggered this post.

A conference on distributed computing must be interesting. And indeed, for an attentive reader, persistent enough to reach the last articles, there is a treasure in store. An impressive proof of how versatile and far-reaching distributed computing is nowadays.

And who do you think publishes this?

Oh, pity.


Parmigianino's faces

An angel from the Madonna with the Long Neck (1535):

An unidentified girl from Antea (1524, survived everything, including Austrian salt mines):

Although one can argue that the resemblance is due to the strictures of the mannerist canon (see the earlobes, for example), this is undoubtedly the same face.

A discovery no less thrilling even though I am definitely not the first to make it.


threads, contexts and doors.

Paul Turner gave a talk about a new threading interface, designed by Google, at this year's Linux Plumbers Conference:

The idea is, very roughly, to implement the ucontext interface with kernel support. This gives the benefits of kernel threads (e.g., SMP support), while retaining the fast context switches of user threads. The switchto_switch(tid) call hands control off to the specified thread without going through the kernel scheduler. This is like swapcontext(3), except that the kernel stack pointer is switched too. Of course, there is still the overhead of the system call and return, but it is not as important as it used to be: the cost of a normal context switch is dominated by the scheduler invocation (with all the associated locking), plus, things like TLB flushes drive the difference between user and kernel context switching further down.

I did something similar (but much more primitive) in 2001. The difference was that in that old implementation one could switch context with a thread running in a different address space, so it was possible to make a "continuation call" to another process. This was done to implement the Solaris doors RPC mechanism on Linux. Because this is an RPC mechanism, arguments have to be passed, so each context switch also performed a little dance to copy the arguments between address spaces.


Weekend affinity.

Groups usually come with homomorphisms, defined as mappings preserving multiplication:
$$f(a\cdot b) = f(a)\cdot f(b)$$
From this definition, notions of subgroup (monomorphism), quotient group (epimorphism, normal subgroup) and the famous isomorphism theorem follow naturally. The category of groups with homomorphisms as arrows has products and sums, equalizers and coequalizers all well-known and with nice properties.

Consider, instead, affine morphisms, which can be defined by the following equivalent conditions (here \(f^{-1}(x)\) is shorthand for \((f(x))^{-1}\)):

  1. \(f(a \cdot b^{-1} \cdot c) = f(a) \cdot f^{-1}(b) \cdot f(c)\)
  2. \(f(a \cdot b) = f(a) \cdot f^{-1}(e) \cdot f(b)\)
  3. \(\exists t. f(a \cdot b) = f(a) \cdot t \cdot f(b)\)

The motivation for this definition is slightly roundabout.

The difference between homomorphism and affine morphism is similar to the difference between a vector subspace and an affine subspace of a vector space. A vector subspace always goes through the origin (for a homomorphism \(f\), \(f(e) = e\)), whereas an affine subspace is translated from the origin (\(f(e) \neq e\) is possible for an affine morphism).

Take points \(f(a)\) and \(f(b)\) in the image of an affine morphism, translate them back to the corresponding "vector subspace" to obtain \(f(a) \cdot f^{-1}(e)\) and \(f(b) \cdot f^{-1}(e)\). If translated points are multiplied and the result is translated back to the affine image, the resulting point should be the same as \(f(a \cdot b)\):
$$f(a \cdot b) = (f(a) \cdot f^{-1}(e)) \cdot (f(b) \cdot f^{-1}(e)) \cdot f(e) = f(a) \cdot f^{-1}(e) \cdot f(b)$$
which gives the definition (2).

(1) => (2) immediately follows by substituting \(e\) for \(b\).
(2) => (3) by substituting \(f^{-1}(e)\) for \(t\).
(3) => (2) by substituting \(e\) for \(a\) and \(b\).
(2) => (1)

\(f(a \cdot b^{-1} \cdot c)\)
\(=\){ (2) for \(a \cdot (b^{-1} \cdot c)\) }
\(f(a) \cdot f^{-1}(e) \cdot f(b^{-1} \cdot c)\)
\(=\){ (2) for \(b^{-1} \cdot c\) }
\(f(a) \cdot f^{-1}(e) \cdot f(b^{-1}) \cdot f^{-1}(e) \cdot f(c)\)
\(=\){ \(e = f(b) \cdot f^{-1}(b)\), working toward creating a sub-expression that can be collapsed by (2) }
\(f(a) \cdot f^{-1}(e) \cdot f(b^{-1}) \cdot f^{-1}(e) \cdot f(b) \cdot f^{-1}(b) \cdot f(c)\)
\(=\){ collapsing \(f(b^{-1}) \cdot f^{-1}(e) \cdot f(b)\) by (2) }
\(f(a) \cdot f^{-1}(e) \cdot f(b^{-1} \cdot b) \cdot f^{-1}(b) \cdot f(c)\)
\(=\){ \(b^{-1} \cdot b = e\) }
\(f(a) \cdot f^{-1}(e) \cdot f(e) \cdot f^{-1}(b) \cdot f(c)\)
\(=\){  \(f^{-1}(e) \cdot f(e) = e\) }
\(f(a) \cdot f^{-1}(b) \cdot f(c)\)

It is easy to check that each homomorphism is an affine morphism (specifically, homomorphisms are exactly affine morphisms with \(f(e) = e\)).
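For a concrete feel, here is a small numerical sanity check (my own sketch, in Python): in the additive group \(\mathbb{Z}_6\) (written additively, so \(f^{-1}(x)\) becomes \(-f(x)\) and \(e = 0\)), the map \(f(x) = 2x + 3\), a homomorphism followed by a translation, satisfies conditions (1) and (2) even though \(f(0) \neq 0\).

```python
n = 6                           # the group Z_6 under addition modulo 6

def f(x):
    # an affine morphism: the homomorphism x -> 2*x followed by the
    # translation by 3, so f(0) = 3 != 0 (not a homomorphism)
    return (2 * x + 3) % n

# condition (2): f(a + b) = f(a) - f(e) + f(b), with e = 0
assert all(f((a + b) % n) == (f(a) - f(0) + f(b)) % n
           for a in range(n) for b in range(n))

# condition (1): f(a - b + c) = f(a) - f(b) + f(c)
assert all(f((a - b + c) % n) == (f(a) - f(b) + f(c)) % n
           for a in range(n) for b in range(n) for c in range(n))
```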

Composition of affine morphisms is affine and hence groups with affine morphisms form a category \(\mathbb{Aff}\).

A subset \(H\) of a group \(G\) is called an affine subgroup of \(G\) if one of the following equivalent conditions holds:
  1. \(\exists h \in G:\forall p, q \in H \rightarrow (p \cdot h^{-1} \cdot q \in H \wedge h \cdot p^{-1} \cdot h \in H)\)
  2. \(\forall p, q, h \in H \rightarrow (p \cdot h^{-1} \cdot q \in H \wedge h \cdot p^{-1} \cdot h \in H)\)
The equivalence (with the proof left as an exercise) means that if any \(h\) "translating" the affine subgroup to a subgroup exists, then any member of the affine subgroup can be used for the translation. In fact, any \(h\) that satisfies (1) belongs to \(H\) (prove it). This matches the situation with affine and vector subspaces: any vector from an affine subspace translates this subspace to the subspace passing through the origin.

Finally for today, consider an affine morphism \(f:G_0\rightarrow G_1\). For \(t\in G_0\) define the kernel:
$$ker_t f = \{g\in G_0 | f(g) = f(t)\}$$
It is easy to check that a kernel is an affine subgroup (take \(t\) as \(h\)). Note that in \(\mathbb{Aff}\) a whole family of subobjects corresponds to a morphism, whereas in \(\mathbb{Grp}\) a morphism has a single kernel.

To be continued: affine quotients, products, sums, free affine groups.


Vale of tears.

Consider a system describing possible computations (e.g., a programming language or a state machine formalism) including interactions with the external world, that is, input and output facilities.

A pair of dual sub-classes of computational elements (values, objects, functions, &c.) can be defined:
  • finitary elements, which are known not to depend on input, and
  • ideal elements, which the output is known not to depend on.
The rest of the elements may depend on input and may affect output. Let's call such elements laic ("temporal" might be a better word).

The class of finitary elements is well known: because they can be computed without input, they can be computed before the program starts, i.e., they correspond to various constants, including static entities like types (in statically typed languages), classes, function bodies and so on. Some languages have powerful finitary computations; for example, C++ template specialisation is Turing complete.

Laic elements are the most usual things like variables and objects.

Ideal elements are less known. They have a long history of use in the area of formal program verification, where they are called ghost or auxiliary variables.

There are obvious restrictions on data and control flow between the various types of elements:

  • a finitary element may depend only on finitary elements;
  • a laic element may depend on laic and finitary elements (e.g., a normal function can take a constant as a parameter, but a constant cannot be initialised with the value of a variable or a function call);
  • an ideal element may depend on any element (e.g., an ideal variable can be assigned the value of a laic variable, but not the other way around).

The most important property of ideal elements is that, because they don't affect observable program behaviour, there is no need to actually compute them! Yet they are useful exactly because of this property: ideal elements are not computed and, hence, are not constrained by the limitations of actual computational environments. For example, an ideal variable can represent an infinite (even uncountable) collection or a real number (a real real number, not an approximation); an ideal function can be defined by transfinite induction or by a formula involving quantifiers.

To use ideal elements, one assumes that they follow normal rules of the language (for example, axiomatic or denotational semantics). This assumption doesn't burden the implementors of the language precisely because the ideal elements are not computed. Under that assumption, one can reason about properties of ideal elements.

As the simplest example, an ideal variable can be used to record the sequence of calls to a certain function:

ideal f_seq = {};
function f(arg) {
        f_seq := f_seq ++ arg;
        ...
}

and then reason about f_seq using whatever method is used to reason about laic elements (e.g., weakest preconditions, Hoare triples or the usual hand-waving), for example, to prove that messages delivered to a receiver were sent by the sender (that is, that deliver_seq is a sub-sequence of send_seq).
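The send/deliver example can be imitated in an executable setting (a Python sketch with hypothetical names send_seq, deliver_seq and channel; in a real program the ghost lists would not be computed at all):

```python
# Ghost ("ideal") state: the program itself never reads these lists,
# so an implementation would be free to omit them entirely.
send_seq, deliver_seq = [], []

channel = []                  # laic state: the in-flight message buffer

def send(msg):
    send_seq.append(msg)      # ghost update: record every sent message
    channel.append(msg)

def deliver():
    msg = channel.pop(0)
    deliver_seq.append(msg)   # ghost update: record every delivery
    return msg

for m in "abc":
    send(m)
deliver()
deliver()

# the property we reason about: deliver_seq is a prefix (in particular,
# a sub-sequence) of send_seq
assert send_seq[:len(deliver_seq)] == deliver_seq
```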

It is interesting that both finitary elements (specifically, static types) and ideal elements help to reason about the behaviour of the laic world sandwiched between them.

Nothing in this short article is new, except for the (obvious) duality between ideal and finitary elements.

Exercise 0: implement linear types by casting laic elements to ideal.
Exercise 1: implement garbage collection similarly.


Side with effete.

More of "I know C better than you!" stuff.

I just discovered (the verb is deeply characteristic) how to write a HAS_SIDE_EFFECTS(expr) macro that
  • doesn't evaluate expr and
  • returns true iff expr has side-effects.
The macro essentially depends on a GCC extension. It is useful, for example, as a sanity check in conditionally compiled code, e.g., in assert(expr).


Zero is overrated

Define a sequence of functions \(P_i:\mathbb{R}^+\rightarrow\mathbb{R}\), \(i\in\mathbb{N}\)
$$P_0(x) = \ln(x)$$
$$P_{n+1}(x) = \int P_{n}(x)\cdot dx$$

I found a beautiful closed formula for \(P_i\) that I haven't seen before.

Integrating by parts, it's easy to calculate first few \(P_i\):
$$P_1(x) = x\cdot\ln x - x$$
$$P_2(x) = \frac{x^2}{2}\cdot\ln x - \frac{3}{4}\cdot x^2$$
$$P_3(x) = \frac{x^3}{6}\cdot\ln x - \frac{11}{36}\cdot x^3$$

which leads to a hypothesis, that
$$P_n = \frac{x^n}{n!}\cdot\ln x - K_n\cdot x^n$$
for certain constants \(K_i\), \(K_1 = 1\).

Again integrating by parts, obtain:
$$K_{n+1} = \frac{1}{n+1}\cdot\Bigg(K_n + \frac{1}{(n+1)!}\Bigg)$$

from where
$$\begin{aligned}
K_n &= \frac{1}{n}\cdot\Bigg(K_{n-1} + \frac{1}{n!}\Bigg)\\
&= \frac{1}{n}\cdot\Bigg(\frac{1}{n!} + \frac{1}{n-1}\cdot\Big(K_{n-2} + \frac{1}{(n-1)!}\Big)\Bigg)\\
&= \frac{1}{n!}\cdot\Bigg(\frac{1}{n} + \frac{1}{n-1}\Bigg) + \frac{1}{n\cdot(n - 1)}\cdot K_{n-2}\\
&= \frac{1}{n!}\cdot\Bigg(\frac{1}{n} + \frac{1}{n - 1} + \frac{1}{n - 2}\Bigg) + \frac{1}{n\cdot(n-1)\cdot(n - 2)}\cdot K_{n-3}\\
&= \cdots\\
&= \frac{1}{n!}\cdot\Bigg(\frac{1}{n} + \frac{1}{n - 1} + \cdots +\frac{1}{n - p + 1}\Bigg) + \frac{1}{n\cdot(n-1)\cdots(n - p + 1)}\cdot K_{n-p}
\end{aligned}$$
Substitute \(p = n - 1\):
$$K_n = \frac{1}{n!}\cdot\Bigg(1 + \frac{1}{2} + \cdots + \frac{1}{n}\Bigg)$$
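The closed form is easy to check numerically against the recurrence (a quick sketch of my own; exact rational arithmetic avoids floating-point noise):

```python
from fractions import Fraction
from math import factorial

# check the closed form K_n = H_n / n! (H_n the nth harmonic number)
# against the recurrence K_{n+1} = (K_n + 1/(n+1)!) / (n+1)
K = Fraction(1)                       # K_1 = 1
for n in range(1, 10):
    H = sum(Fraction(1, k) for k in range(1, n + 1))
    assert K == H / factorial(n)      # closed form holds for K_n
    K = Fraction(1, n + 1) * (K + Fraction(1, factorial(n + 1)))
```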
Substitute this in the hypothesis:
$$P_n = \frac{x^n}{n!}\cdot\Bigg(\ln x - \big(1 + \frac{1}{2} + \cdots + \frac{1}{n}\big)\Bigg)$$
This nicely contains fragments of the exponential series, the \(n\)th harmonic number and, after a diagonalisation, the Euler constant:
$$\lim_{n\to +\infty}\frac{n!}{n^n}\cdot P_n(n) = -\gamma$$

Why are the \(P_i\) interesting at all? Because if one extends them to negative indices as
$$P_n = (-1)^{n-1}\cdot(-n-1)!\cdot x^n, \quad n < 0$$
then the \(m\)th derivative of \(P_n\) is \(P_{n-m}\) for all non-negative \(m\):
$$(\forall n\in\mathbb{Z})(\forall m\in\mathbb{N})\partial^m P_n = P_{n - m}$$
and similarly
$$(\forall n\in\mathbb{Z})(\forall m\in\mathbb{N})\int_m P_n\cdot dx = P_{n + m}$$
where \(\int_m\) denotes the \(m\)-fold repeated integral.

This is in contrast with the powers \(x^n\), \(n \ge 0\), which, under repeated differentiation, eventually pathetically collapse to a constant and then to 0, so that negative powers are not reachable from positive ones and the other way around.

It is interesting which other families \((\phi_i)_{i\in\mathbb{Z}}\) exist such that
$$(\forall m\in\mathbb{N})\partial^m\phi_n = \phi_{n - m}$$
$$(\forall m\in\mathbb{N})\int_m \phi_n\cdot dx = \phi_{n + m}$$
$$(\forall n\neq m)\phi_n \neq const\cdot\phi_m$$
(the latter condition is to avoid degenerate cases)?


Trees in a circle.

Endomorphisms and orbits: a little theory for the left hand.

Abstract: orbits of an endomorphism of a finite set are represented as connected components of a suitably defined graph. Their structure is surprisingly simple: a collection of disjoint trees rooted on the vertices of a unique cycle.

Let \(f:X\rightarrow X\) be an arbitrary mapping of a set \(X\) into itself, i.e., an endomorphism of \(X\). It is well known that if \(f\) is an automorphism (i.e., a bijection), it partitions \(X\) into disjoint orbits. Each orbit consists of the points
$$\ldots f^{-2}(x), f^{-1}(x), x, f(x), f^2(x), \ldots$$
for some \(x\), where \(f^n\) denotes the \(n\)-fold application of \(f\) (\(f^0(x) = x\), \(f^{n+1}(x) = f(f^n(x))\), \(f^{-n}(x) = (f^{-1})^n(x)\)). If \(X\) is finite, every orbit is a cycle consisting of a finite number of points, and \(x = f^{n}(x)\) for some \(n\). If \(X\) is infinite, there may be orbits consisting of an infinite (countable) number of distinct points.

What do the analogues of orbits look like for arbitrary endomorphisms of finite sets?

The usual definition of an orbit does not generalise to endomorphisms in an interesting way, because the resulting sets intersect. Instead, let us adopt the following definition:

A set \(M\), \(M \subseteq X\), is \(f\)-closed if \(x \in M\equiv f(x)\in M\); that is, an \(f\)-closed set contains, together with every point \(x\), its image \(f(x)\) and all its preimages \(f^{-1}(x)\).
Obviously,
The intersection of \(f\)-closed sets \(M\) and \(N\) is \(f\)-closed:
\(x\in M \cap N\)
\(\equiv\){ definition of set intersection }
\(x\in M \wedge x\in N\)
\(\equiv\){ definition of \(f\)-closedness }
\(f(x)\in M \wedge f(x)\in N\)
\(\equiv\){ definition of set intersection }
\(f(x)\in M \cap N\)

An orbit is a non-empty minimal \(f\)-closed set, i.e., a non-empty \(f\)-closed set having no non-empty proper \(f\)-closed subsets.

For automorphisms this definition coincides with the standard one.

Obviously, orbits do not intersect (otherwise their intersection would be an \(f\)-closed subset of each of them, violating the minimality requirement). Moreover, every point belongs to at least one \(f\)-closed set, for example to the whole of \(X\). This means that for every point the intersection of all \(f\)-closed sets containing it is defined. Such an intersection is an orbit. Consequently, \(X\) is partitioned into disjoint orbits.


To the mapping \(f\) we can associate a directed graph (which will remain nameless, being the only graph under consideration), with vertex set \(X\) and arc set \(\{(x, f(x)) | x \in X \}\).

This graph has a remarkable property: every vertex has exactly one outgoing arc. Obviously, any graph with this property corresponds to an endomorphism of its vertex set.

We shall also consider the undirected version of this graph (in which the directions of the arcs are forgotten) and in that case speak of "edges" (rather than arcs), undirected paths and undirected cycles.

All paths and cycles are assumed simple, i.e., passing through any given arc or vertex at most once.

Reformulating the definition of an orbit in terms of the undirected graph, we find that a pair of vertices connected by an edge belongs to the same orbit. It follows that orbits are exactly the connected components of the undirected graph.

Consider an arc \(a\) from a vertex \(x\) to a vertex \(y\) (i.e., suppose that \(y = f(x)\)). For the corresponding edge \(a\) define

$$\sigma(x, a) = +1$$
$$\sigma(y, a) = -1$$

Thus the function \(\sigma\) is defined for every vertex and incident edge, and has the following properties:
If an edge \(a\) connects vertices \(x\) and \(y\), then \(\sigma(x, a) + \sigma(y, a) = 0\) (property (0))

If a vertex \(x\) connects distinct edges \(a\) and \(b\), then \(\sigma(x, a) + \sigma(x, b) \leq 0\) (property (1))

Now we can prove a simple lemma:
If \(a_0, \ldots, a_n\) is an undirected path between vertices \(x\) and \(y\), then \(\sigma(x, a_0) + \sigma(y, a_n) > -2\); i.e., the end edges of a path cannot both be incoming.

Induction on the length of the path.
\(n = 0\). Immediate by property (0).
\(n = k + 1\). Let \(a_0, \ldots, a_k\) be an undirected path between vertices \(x\) and \(z\), and let the edge \(a_{k+1}\) connect \(z\) and \(y\).
\(\sigma(x, a_0) + \sigma(y, a_{k+1})\)
\(=\)\(\sigma(x, a_0) + \sigma(y, a_{k+1}) + \sigma(z, a_k) - \sigma(z, a_k)\)
\(>\){ by the induction hypothesis, \(\sigma(x, a_0) + \sigma(z, a_k) > -2\) }
\(-2 + \sigma(y, a_{k+1}) - \sigma(z, a_k)\)
\(=\)\(-2 + \sigma(y, a_{k+1}) - \sigma(z, a_k) + \sigma(z, a_{k+1}) - \sigma(z, a_{k+1})\)
\(\geq\){ by property (1), \(\sigma(z, a_k) + \sigma(z, a_{k+1}) \leq 0\) }
\(-2 + \sigma(y, a_{k+1}) + \sigma(z, a_{k+1})\)
\(=\){ by property (0), \(\sigma(y, a_{k+1}) + \sigma(z, a_{k+1}) = 0\) }
\(-2\)

Consider directed cycles in our graph. For every vertex \(x\) lying on a directed cycle of \(n\) vertices \((x_i | 0 \leq i < n)\), we have \(f^n(x) = x\). In particular, cycles of length 1 are exactly the fixed points of \(f\). Conversely, the vertex set of any directed cycle of length \(n\) can be described as $$\{f^i(x) | 0 \leq i < n\} = \{f^i(x) | 0 \leq i\} = \{x, f(x), \ldots, f^{n-1}(x)\}$$ where \(x\) can be taken to be any vertex of the cycle.

With the help of the function \(\sigma\), the following useful fact is also easily proved:
Every cycle is directed.

Consider an undirected cycle with edges \(a_i\) connecting vertices \(x_i\) and \(x_{i+1}\), where \(0 \leq i < n\) and \(i+1\) is understood modulo \(n\). Form the sum $$S = \sum_{0 \leq i < n}\sigma(x_i, a_i) + \sigma(x_{i+1}, a_i)$$ By property (0), every summand of this sum is 0. Regrouping the summands, we get $$S = \sum_{0 \leq i < n}\sigma(x_i, a_i) + \sigma(x_i, a_{i-1}) = 0$$ By property (1) (applicable because the cycle is simple), every summand here is at most zero. Since the sum is 0, every summand is 0: $$\sigma(x_i, a_i) + \sigma(x_i, a_{i-1}) = 0$$ That is, every vertex of the cycle has exactly one incoming and one outgoing arc, i.e., the cycle is directed.

From this point on we shall not distinguish between directed and undirected cycles.

The following property is also easily proved:
Cycles having a common vertex coincide.

Indeed, let the cycles \(\{f^i(x) | 0 \leq i\}\) and \(\{f^i(y) | 0 \leq i\}\) have a common point, i.e., \(f^p(x) = f^q(y)\). For an arbitrary point \(f^i(x)\) of the first cycle we have
\(f^i(x)\)
\(=\){ for an arbitrary \(d\), because \(f^n(x) = x\) }
\(f^{i+ d\cdot n}(x)\)
\(=\){ choose \(d\) such that \(i+ d\cdot n > p\), and write \(i + d\cdot n = p + \Delta\) }
\(f^{p + \Delta}(x)\)
\(=\){ \(f^p(x) = f^q(y)\) }
\(f^{q +\Delta}(y) \in \{f^i(y) | 0 \leq i\}\)
Similarly, every point of the second cycle belongs to the first.

From this an interesting property follows:
Every undirected path \(P\) between vertices \(x\) and \(y\) lying on cycles lies entirely within some cycle.

We apply induction on the length of \(P\).

\(n = 0\). Obvious.
\(n = k + 1\). By the lemma, at least one of the end edges of \(P\) is outgoing. Let it be the edge going out of \(x\). This edge connects \(x\) with the vertex \(f(x)\), which also lies on a cycle. Thus the path \(f(x), \ldots, y\) of length \(k\) connects points lying on cycles and hence (by the induction hypothesis) lies entirely within some cycle. That cycle has the point \(f(x)\) in common with the cycle on which \(x\) lies, and therefore the two cycles coincide. Hence the path \(x, \ldots, y\) lies entirely within a cycle.

Thus two distinct cycles cannot be connected by a path. Consequently, every orbit contains at most one cycle.

It is time to use the finiteness of \(X\).
Every orbit of an endomorphism of a finite set contains a cycle.

Indeed, an orbit \(M\) is non-empty by definition. Pick a point \(x\) in it and consider the function \(\phi : \mathbb{N} \rightarrow M\), \(\phi(n) = f^n(x)\). Since \(M\) is finite, \(\phi\) cannot be injective, so there exist distinct natural numbers \(p\) and \(q\) such that \(f^p(x) = f^q(x)\). Consequently, \(f^p(x)\) lies on a cycle of length at most \(|p - q|\).

Fix an orbit and its unique cycle.

Consider the "thinned" graph obtained from the orbit by deleting all the edges of the cycle, and pick the connected component of this graph containing an arbitrarily chosen vertex of the cycle.

Obviously,
  • this component contains no other points of the cycle, because, by the property established above, every path connecting points of the cycle lies within the cycle, and the edges of the cycle have been deleted;
  • this component contains no cycles, because the cycle of the component is unique and its edges have been deleted.
So the component is a connected (by construction) graph without cycles, i.e., a tree, intersecting the cycle in exactly one point, which can be chosen as its root. To each point of the cycle there corresponds such a tree. The trees corresponding to distinct points of the cycle do not intersect (being distinct connected components).

Let us prove that every point \(x\) of the orbit belongs to such a tree. It suffices to show that \(x\) can be connected to some vertex of the cycle by an undirected path not passing through the edges of the cycle (i.e., an undirected path lying in the thinned graph). Since the orbit is connected, there exists an undirected path \(P\) connecting \(x\) with a vertex of the cycle. If this path contains an edge \(a\) of the cycle, it decomposes into a composition of paths \(P = P_0\cdot a\cdot Q_0\), where \(P_0\) connects \(x\) with one of the endpoints of \(a\), i.e., with some point on the cycle. Repeating this process as many times as necessary, we obtain a path \(P_n\) that contains no edges of the cycle (and possibly no edges at all).

So the orbit of an endomorphism of a finite set decomposes into disjoint trees whose roots are attached to the cycle.

Under iterated application of the transformation \(f\), every point first moves along its tree towards the root, merging on the way with other points of the tree. Having reached the root, the point starts moving around the cycle.
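This finite-case picture is easy to probe mechanically. A small Haskell sketch (not part of the original text; `cyclic` and `depth` are names invented here) finds the vertices of the unique cycle of each orbit of an endomorphism of \(\{0,\ldots,n-1\}\), and the number of steps a point needs to reach its cycle:

```haskell
-- vertices of {0 .. n-1} lying on a cycle of the endomorphism f:
-- x is cyclic iff iterating f from x returns to x within n steps
cyclic :: Int -> (Int -> Int) -> [Int]
cyclic n f = [x | x <- [0 .. n - 1], x `elem` tail (take (n + 1) (iterate f x))]

-- distance from a point to the cycle of its orbit
-- (0 for points lying on the cycle itself)
depth :: Int -> (Int -> Int) -> Int -> Int
depth n f x = length (takeWhile (`notElem` cyc) (iterate f x))
  where cyc = cyclic n f
```

For example, for the endomorphism (`div` 2) of \(\{0,\ldots,7\}\) the only cycle is the fixed point 0, and every other point slides down its tree towards it.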


In the case of an infinite set, an orbit can be organised somewhat more intricately.

Instead of cycles, one has to consider trajectories, defined as sequences of points \((f^n(x) | n > 0)\). A cycle is a special case of a trajectory.

Just as in the case of cycles, an undirected path connecting points on trajectories itself lies within some trajectory. However, trajectories having a common point need not coincide. Fortunately, there is a weaker but useful property: if the intersection of trajectories is non-empty, it is itself a trajectory.

As in the finite case, consider a connected component. Take the intersection of all the trajectories contained in it, and suppose that it is non-empty. Then this intersection is a trajectory. Form the thinned graph by deleting all the edges of the intersection trajectory. Since at least one edge has been deleted from every trajectory, no trajectory, and in particular no cycle, remains in the connected component under consideration. The undirected connected component of the thinned graph is therefore a connected acyclic graph, i.e., a tree.

So, when the intersection of the trajectories is non-empty, the orbit decomposes into disjoint trees whose roots are attached to the intersection trajectory.

It remains to consider the case of an empty intersection. An example of this situation is provided by the function
$$f(x, y) = (\lfloor x/2^y\rfloor\cdot 2^y, y+1)$$
\(f\) builds an infinite "tree" starting from the leaves, gluing \(2^y\) adjacent nodes of level \(y\) into one node of level \(y+1\).

Obviously, the whole set \(\mathbb{N}\times\mathbb{N}\) is a single orbit. Every trajectory reaches the axis \(x = 0\) after a finite number of steps; hence any two trajectories intersect, but the intersection of all of them is empty.

Here, however, insomnia recedes and the authors cease the permitted speech.


A macro.

C language is wonderfully profound. After almost 20 years, I still find new ways to (ab-)use it. Here is a little problem:
Define a macro IN(x, set), called as
IN(x, (v0, v1, ..., vn))
that expands into an expression
((x) == (v0) || (x) == (v1) || ... || (x) == (vn))
which evaluates to true iff x is a member of set {v0, ..., vn}, where the type of x and all vs is the same. Alternatively, define a macro IN1(x, ...) that expands to the same expression as IN(x, (...)).

You won't see this post (or maybe you will).

My ISP is blocking my blog. Am I an unwitting beacon of revolution?


Riemann hypothesis, DIY

Continuing with useless exercises, let's look at prime numbers. We shall use Haskell this time.

-- prime numbers: [2,3,5,7,11,13,17,19, ...]
pr :: [Int]
pr = [p | p <- [2 .. ], maximum [d | d <- [1 .. p - 1], p `mod` d == 0] == 1]

The first line, starting with '--', is a comment. The second line declares 'pr' to have the type "list of Int". Type signatures are optional in Haskell. The last line uses a "list comprehension" (yes, it is related to the "set comprehension" used in one of the recent posts) to define 'pr' as those elements of the infinite list [2, 3, 4, 5, ...] (denoted [2 ..]) whose maximal proper divisor is 1. This is a very inefficient, quadratic way to build the list of all primes.
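For comparison, a common faster variant (not from this post; `pr'` is a name invented here) divides each candidate only by the already-found primes not exceeding its square root:

```haskell
-- trial division by previously found primes d with d*d <= p;
-- still simple, but far fewer divisions than the quadratic version
pr' :: [Int]
pr' = 2 : [p | p <- [3, 5 ..],
               all (\d -> p `mod` d /= 0) (takeWhile (\d -> d * d <= p) pr')]
```

The self-reference is safe: testing a candidate p only ever forces primes below its square root, which are already known.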

In addition, define a list of all twin primes:

-- twin primes, primes whose difference is 2
twinp :: [Int]
twinp = [n | n <- pr, (n + 2) `elem` (take (n + 4) pr)]

"x `elem` s" is true iff x is an element of the list s. The ugly '(n + 2) `elem` (take (n + 4) pr)' is needed because, for an infinite s, "x `elem` s" either returns True or never returns: without some additional knowledge about the list, the only way to determine whether x belongs to it is a linear search, which might never terminate. In our case, 'pr' is strictly ascending and hence the search can be limited to some finite prefix of the list ((take n s) returns the first n elements of s). Perhaps there is a way to specify list properties that `elem` can rely on; I don't know, I opened the Haskell specification only an hour ago.
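There is no such mechanism in the standard library, but under the assumption that the list is strictly ascending one can package the bounded search by hand (`elemAsc` is a name invented here):

```haskell
-- membership in an infinite strictly ascending list: it is safe to stop
-- searching as soon as the elements exceed the candidate
elemAsc :: Int -> [Int] -> Bool
elemAsc x s = x `elem` takeWhile (<= x) s
```

With it, twinp could be written as [n | n <- pr, (n + 2) `elemAsc` pr], without the ad hoc take bound.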

Let's do some "visualisation":

-- an (infinite) string containing a '+' for each element of s and '.' for each
-- element not in s, assuming that s is ascending and s !! n > n.
-- dot pr == "..++.+.+...+.+.. ..."
dot :: [Int] -> [Char]
dot s = [ if (x `elem` (take (x + 2) s)) then '+' else '.' | x <- [0 ..]]

For example, (dot pr) is an infinite string containing "+" for each prime number and "." for each composite.

By printing (dot s) in rows, k columns each, some hopefully interesting information about distribution of elements of s can be obtained. First, split s into sub-lists of length k:

-- (dot s), sliced into sub-strings of length k, separated by newlines
sl :: [Int] -> Int -> [[Char]]
sl s k = ['\n' : [ (dot s) !! i | i <- [j*k .. (j+1)*k - 1]] | j <- [0 ..]]

then, concatenate them all:

-- concatenation of slices
outp :: [Int] -> Int -> [Char]
outp s k = concat (sl s k)

A few examples:

putStr (outp pr 6)


immediately one sees that, with the exception of 2 and 3, prime numbers have remainder 1 or 5 when divided by 6. For 7, the picture is

putStr (outp pr 7)


it's easy to see that the "vertical" lines of "+" corresponding to a particular remainder are now (unsurprisingly) replaced with slopes. By changing the width of the picture, one can see how the regular pattern periodically changes its shape:
putStr (outp pr 23)


putStr (outp pr 24)


One of the more striking examples is with twin primes:
putStr (outp twinp 6)


Every pair of twin primes except for (3, 5) has a form (6*n - 1, 6*n + 1).

As Gauss reportedly quipped (when refusing to work on the last Fermat's theorem): "number theory is full of problems whose difficulty is only surpassed by their uselessness".


Cue a key

There is a typical problem with Emacs experienced by people who frequently switch between different keyboard mappings, for example, to work with non-ASCII languages: fundamental Emacs keyboard shortcuts (and one has to invoke them a lot to use Emacs efficiently) all use ASCII keys. For example, when I want to save my work while entering some Russian text, I have to do something like Alt-Space (switch to the US keyboard layout), Control-x-s (an Emacs shortcut to save buffers), Alt-Space (switch back to the Cyrillic keyboard layout). Switching out of and back into a keyboard layout only to tap a short key sequence is really annoying.

Two solutions are usually proposed:
  • Duplicate standard key sequences in other keyboard layouts. For example,
    (global-set-key [(control ?ч) ?ы] 'save-some-buffers)
    expression in .emacs binds Control-ч ы to the same command as Control-x s is usually bound to. This eliminates the need to switch layouts, because ч and x (and ы and s) are located on the same key (in the JCUKEN and QWERTY layouts respectively).
  • Another solution employs the fact that Emacs is a complete operating system and, therefore, has its own keyboard-layout switching mechanism (bound to Control-\ by default). When this mechanism is used, Emacs re-interprets normal keys according to its internal layout, so that typing s inserts ы when in the internal Russian mode, while all command sequences continue to work independently of the layout. The mere idea of having two independent layout-switching mechanisms and two associated keyboard state indicators is clearly ugly beyond words (except for people who use Emacs as their /sbin/init).
Fortunately, there is another way:

; Map Modifier-CyrillicLetter to the underlying Modifier-LatinLetter, so that
; control sequences can be used when keyboard mapping is changed outside of
; Emacs.
; For this to work correctly, .emacs must be encoded in the default coding
; system.
; The pairing of JCUKEN and QWERTY keysyms below is an assumed completion:
; the original snippet applied the lambda to matching Russian and English
; letters in some equivalent way.
(require 'cl-lib)
(cl-mapc
 (lambda (r e) ; R and E are matching Russian and English keysyms
   ; iterate over modifiers
   (mapc (lambda (mod)
           (define-key input-decode-map
             (vector (list mod r)) (vector (list mod e))))
         '(control meta super hyper))
   ; finally, if the Russian key maps nowhere, remap it to the English key
   ; without any modifiers
   (define-key local-function-key-map (vector r) (vector e)))
 "йцукенгшщзхъфывапролджэячсмитьбю"
 "qwertyuiop[]asdfghjkl;'zxcvbnm,.")

(Inspired by a cryptic remark about "Translation Keymaps" in vvv's .emacs.)


Unintentionally yours.


This post is about set theory, which is a framework to reason about collections, elements and membership.

We start with an informal and naïve outline, which is (very loosely) based on the Gödel–Bernays version of set theory. This theory is about sets (collections) which contain elements, which can in turn be sets. When a set \(X\) contains an element \(t\), one writes \(t\in X\).

First, set equality has to be defined:

E0 Axiom of extensionality:
$$X = Y \equiv (t \in X \equiv t \in Y)$$
(Here \(\equiv\) is a logical equivalence, pronounced "if and only if".) This axiom means that sets are equal if and only if they have the same elements. (In this and following formulae, free variables are implicitly universally quantified.)

Subsets are defined by \(X \subseteq Y \equiv (t\in X \Rightarrow t\in Y)\), that is, X is a subset of Y when every element of X is also an element of Y. It's easy to check that \(X = Y \equiv (X\subseteq Y \wedge Y\subseteq X)\), where \(\wedge\) means "and".

Next, a way to build new sets is needed:

E1 Axiom of (extensional) comprehension:
$$t \in \{u \mid P(u)\} \equiv P(t)$$
This axiom introduces a "set-builder notation" \(\{u \mid P(u) \}\) and states that \(\{u \mid P(u) \}\) is exactly the set of all \(t\) such that \(P(t)\) holds.

Now, there is already enough machinery for two famous collections to be immediately constructed: empty set, \(\varnothing = \{t \mid false \}\), which contains no elements, and the collection of all sets: \(U = \{t \mid true \}\), which contains all possible elements.

With these axioms, conventional set operations can be defined:
  • singleton: \(\{t\} = \{u\mid u = t\}\), for each \(t\) a singleton set \(\{t\}\) can be constructed that contains \(t\) as its only element. Note that \(t\in\{t\}\)
  • union: \(X\cup Y = \{t\mid t \in X \vee t \in Y\}\) (\(\vee\) means "or")
  • intersection: \(X\cap Y = \{t\mid t\in X \wedge t\in Y\}\)
  • complement: \(\complement X = \{t \mid t \notin X\}\)
  • (unordered) pair: \(\{t,u\} = \{t\}\cup\{u\}\)
  • ordered pair: \((t,u) = \{t, \{t, u\}\}\)
  • power set: \(\mathfrak{P}(X) = \{t \mid t \subseteq X\}\)

From here, one can build hierarchical sets, representing all traditional mathematical structures, starting with natural numbers:

$$0 = \varnothing, 1 = \{0\}, 2 = \{0, 1\}, \ldots n + 1 = \{0, \ldots, n\}, \ldots$$

then integers, rationals, reals, &c., adding more axioms (of infinity, of foundation, of replacement, &c.) along the way.

It was discovered quite early that this system is not entirely satisfactory. The first defect is that it is impossible to have elements which are not sets themselves. For example, one would like to talk about a "set of all inhabited planets in the Solar system". The elements of this set (planets) are not sets; they are called ur-elements. Unfortunately, the axiom of extensionality makes all ur-elements equal to the empty set. Note that this indicates that the axiom of extensionality doesn't work well with sets that have very few (no) elements. This was never considered a problem, because all sets of interest to mathematics can be constructed without ur-elements.

Another, more serious drawback arises in the area of very large sets: the existence of the set \(\{t\mid t\notin t\}\) directly leads to a contradiction known as Russell's paradox.

Among several methods to deal with this, one separates collections into two types: "smaller" collections, which continue to be called "sets", and "proper classes", collections so large that they cannot be members of any collection. The axiom of comprehension is carefully modified so that the set-builder never produces a collection having some class as its element. In this setup Russell's paradox becomes a theorem: \(\{t\mid t\notin t\}\) is a proper class.


The axiom of extensionality states that sets are equal when they contain the same elements. What would happen if set theory were based on the dual notion of intensional equality (which will be denoted by \(\sim\) to distinguish it from the extensional one), where sets are equal when they are contained in the same collections?

I0 Axiom of intensionality:
$$X \sim Y \equiv (X \in t \equiv Y\in t)$$
This looks bizarre: for any "normal" set \(X\), the collection of all sets containing \(X\) as an element is unmanageably huge. But as a matter of fact, intensional equality is much older than extensional: it is variously known as Leibniz's law, the identity of indiscernibles and, in a less enlightened age, as duck typing.

There is a nice symmetry: while intensional equality myopically confuses small sets (ur-elements), extensional equality cannot tell very large collections (proper classes) from each other, because they are not members of anything and are, therefore, intensionally equal.

The whole extensional theory buildup can be mirrored easily by moving things around \(\in\) sign:

Intensional subsets: \(X \unlhd Y \equiv (X\in t \Rightarrow Y\in t)\)

I1 Axiom of intensional comprehension (incomprehension):
$$[u \mid P(u)]\in t \equiv P(t)$$

And associated operations:
  • uniqum (or should it have been "s-ex-gleton"?): \([t] = [u\mid u \sim t]\); note that \([t]\in t\).
  • intensional union: \(X\triangledown Y = [t\mid X \in t \vee Y \in t]\)
  • intensional intersection: \(X\triangle Y = [t\mid X \in t \wedge Y \in t]\)
  • intensional complement: \(\Game X = [t \mid X \notin t]\)
  • intensional pair: \([t,u] = [t]\triangledown [u]\)
  • intensional ordered pair: \(<t,u> = [t, [t, u]]\)
  • intensional power set: \(\mathfrak{J}(X) = [t \mid X \unlhd t]\)

What do all these things mean? In the extensional world, a set is a container where elements are stored. In the intensional world, a set is a property which other sets might or might not enjoy. If \(t\) has property \(P\), it is written as \(t\in P\). In the traditional notation, \(P\) is called a predicate and \(t\in P\) is written as \(P(t)\). The axiom of intensional equality claims that sets are equal when they have exactly the same properties (quite natural, right?). \(X\) is an intensional subset of \(Y\) when \(Y\) has all properties of \(X\) and perhaps some more (this looks like a nice way to express the LSP). The intensional comprehension \([u \mid P(u)]\) is the set having exactly those properties \(t\) for which \(P(t)\) holds and no other properties. The intensional union of two sets is a set having the properties of either, and their intensional intersection is a set having the properties of both, &c. The uniqum \([P]\) is the set that has property \(P\) and no other properties.

Because the intensional theory is a perfect dual of the extensional one, nothing interesting is obtained by repeating the extensional construction, for example by building "intensional natural numbers" as

$$0' = U, 1' = [0'], 2' = [0', 1'], \ldots (n + 1)' = [0', \ldots, n'], \ldots$$

What is more interesting is how the intensional and extensional twins meet. With some filial affection, it seems:
  • by the uniqum property, \([\varnothing] \in\varnothing\), which contradicts the definition of \(\varnothing\); also
  • the set \([t\mid false]\) is not a member of any set (perhaps it's a proper class) and the set \([t\mid true]\) is a member of every set, which is strange;
  • a set of which a singleton can be formed has a very shallow intensional structure. Indeed:
    \(x \unlhd y\)
    \(\equiv\){ definition of intensional subset }
    \(x\in t \Rightarrow y\in t\)
    \(\Rightarrow\){ substitute \(\{x\}\) for \(t\)}
    \(x\in \{x\} \Rightarrow y\in \{x\}\)
    \(\equiv\){ \(x\in \{x\}\) is true by the singleton property, modus ponens }
    \(y\in \{x\}\)
    \(\equiv\){ the singleton property, again }
    \(x = y\)
To get rid of the contradictions and to allow intensional and extensional sets to coexist peacefully, the domains on which the singleton and uniqum operations are defined must be narrowed. To be continued.


From the traveller diary.

Note to oneself: general anaesthesia is a very efficient cure for jet lag.


An obscure generalisation: a redundancy domain should be transversal (better yet, orthogonal) to a potential failure vector. Hello, Karmarkar!


The Hunt for Addi(c)tive Monster 2.

In the previous post, we were looking for a monster: a nonlinear additive function. We found that such a function is extremely pathological: it is nowhere locally monotone, nowhere continuous and nowhere locally bounded. Worse than that, it's easy to prove that the graph of a monster is dense in \(\mathbb{R}^2\); that is, for every \(x\) and \(y\), an arbitrary neighborhood of \(x\) contains a point that the monster sends arbitrarily close to \(y\).

Recall our attempt to construct a monster. Any additive function is linear on any \(\mathbb{Q}\)-set and is fully determined on this set by the value it has at any single one of its points. Our Dirichlet-function-derived monster failed (or rather fell) because the slopes an additive function has on different \(\mathbb{Q}\)-sets are not independent. Indeed, given that \(f\) has a slope \(k_x\) on the \(\mathbb{Q}\)-set \(x\cdot\mathbb{Q}\) and a slope \(k_y\) on the \(\mathbb{Q}\)-set \(y\cdot\mathbb{Q}\), it has to have the slope \((k_x\cdot x + k_y\cdot y)/(x + y)\) on the \(\mathbb{Q}\)-set \((x + y)\cdot\mathbb{Q}\). This shows a way to construct a monster: one has to find a collection \(B\) of real numbers such that (i) every real number \(x\) can be represented as a sum \(x = \sum_{b\in B}q_b\cdot b\) with rational coefficients \(q_b\), of which only a finite number are non-zero (so that the sum is defined), and (ii) such a representation is unique. Then one can select arbitrary values \(f(b)\) on the elements of \(B\) and take the monster's value on \(x = \sum_{b\in B}q_b\cdot b\) to be \(\sum_{b\in B}q_b\cdot f(b)\), which is well-defined thanks to (ii).

Looks familiar? It should be: the definition of \(B\) is exactly the definition of a basis of a vector space. Real numbers can be added to each other and multiplied by rationals and, therefore, form a vector space over \(\mathbb{Q}\). This space is very different from the usual one-dimensional vector space the real numbers form over \(\mathbb{R}\) (i.e., over themselves).

After a streak of bad and unlikely properties that a monster has, we now have something positive: a monster exists if and only if \(\mathbb{R}\), as a vector space over \(\mathbb{Q}\), has a basis. Does it?

But of course. Any vector space has a basis; this is a general theorem following almost immediately from Zorn's lemma. The basis we are looking for even has a name of its own: a Hamel basis.

At last we have stumbled across a whole family of monsters. Specifically, there exists a set \(B\subseteq\mathbb{R}\) and a function \(q : \mathbb{R}\times B\rightarrow\mathbb{Q}\) such that every real number \(r\) can be uniquely represented as
$$r = \sum_{b\in B}q(r, b)\cdot b$$
where only a finite number of \(q(r, b)\) are non-zero for a given \(r\). From this it immediately follows that \(q(r_0 + r_1, b) = q(r_0, b) + q(r_1, b)\).

Take an arbitrary function \(g : B \rightarrow \mathbb{R}\), and define
$$f(r) = \sum_{b\in B}q(r, b)\cdot g(b)$$
Then $$f(r_0 + r_1) = \sum_{b\in B}q(r_0 + r_1, b)\cdot g(b) = \sum_{b\in B}q(r_0, b)\cdot g(b) + \sum_{b\in B}q(r_1, b)\cdot g(b) = f(r_0) + f(r_1)$$
that is, \(f\) is additive. Intuitively, \(g(b)/b\) is the slope \(f\) has at the \(\mathbb{Q}\)-set \(b\cdot\mathbb{Q}\). \(f\) is linear if and only if \(b\mapsto g(b)/b\) is a constant function; in all other cases \(f\) is a monster. If one takes \(g(b) = 1\) for all \(b\), then
$$f(r) = \sum_{b\in B}q(r, b)$$
is an especially weird monster function: it takes only rational values!

Note that almost all additive functions are, after all, monsters: only a very small subset of them is linear.


The Hunt for Addi(c)tive Monster

This is another spurious post about mathematics that happened instead of something more useful. Let's talk about one of the most common mathematical objects: a function that maps real numbers (\(\mathbb{R}\)) to real numbers. We shall call such a function \(f : \mathbb{R} \rightarrow \mathbb{R}\) additive iff for any real \(x\) and \(y\)

$$f(x + y) = f(x) + f(y)$$

This is a natural and simple condition. Well-known examples of additive functions are provided by the linear functions \(f(x) = k\cdot x\), where \(k\) is called a slope. Are there other, non-linear additive functions? And if so, what do they look like? A bit of thinking convinces one that a non-linear additive function is not trivial to find. In fact, as we shall show below, should such a function exist, it would exhibit extremely weird features. Hence, we shall call a non-linear additive function a monster. In the following, some properties that a monster function has to possess are investigated, until a monster is cornered either out of existence or into an example. Out of misplaced spite we shall use the \(\epsilon\)-\(\delta\) technique in some proofs.

First, some trivial remarks about the collection of all additive functions (which will be denoted \(\mathcal{A}\)).

If \(f : \mathbb{R} \rightarrow \mathbb{R}\) and \(g : \mathbb{R} \rightarrow \mathbb{R}\) are two additive functions and \(\alpha\) is an arbitrary real number, then \(f + g\), \(\alpha\cdot f\) and \(f - g\) are additive. This means that additive functions form a vector space over the field \(\mathbb{R}\), with the constant zero additive function as the zero vector. This vector space is a sub-space of the vector space of all functions from \(\mathbb{R}\) to \(\mathbb{R}\). The product of two additive functions is not in general additive (when is it?), but their composition is:
$$(f\circ g)(x + y) = f(g(x) + g(y)) = f(g(x)) + f(g(y)) = (f\circ g)(x) + (f\circ g)(y)$$
Unfortunately, composition is not compatible with scalars (i.e., \(f\circ(\alpha\cdot g)\) is not in general equal to \(\alpha\cdot(f\circ g)\)), and additive functions are not an algebra over the field \(\mathbb{R}\), but see below.

Looking at an individual additive function \(f : \mathbb{R} \rightarrow \mathbb{R}\), it's easy to see that even if it is not clear that it must be linear everywhere, there are some subsets of \(\mathbb{R}\) on which it obviously has to be linear:

For any real number \(x\) and for any natural number \(n\),
$$f(n\cdot x) = f(\underbrace{x + \cdots + x}_{n}) = n\cdot f(x)$$
that is, the restriction of \(f\) to any subset \(\{n\cdot x \mid n\in\mathbb{N}\}\) is linear and, in particular, \(f\) is linear when restricted to the natural numbers. Specifically, \(f(0) = f(0 + 0) = 2\cdot f(0)\), so \(f(0) = 0\), and from this
$$0 = f(0) = f(x + (-x)) = f(x) + f(-x),\quad\text{i.e.,}\quad f(-x) = -f(x)$$

Combining these results, for any rational number \(q = m/n\),
$$f(q\cdot x) = m\cdot f\Big(\frac{x}{n}\Big) = \frac{m}{n}\cdot n\cdot f\Big(\frac{x}{n}\Big) = \frac{m}{n}\cdot f(x) = q\cdot f(x)$$
That is, \(f\) is linear when restricted to any subset of the form \(x\cdot\mathbb{Q} = \{x\cdot q \mid q\in\mathbb{Q}\}\). We shall call such a subset a \(\mathbb{Q}\)-set.

Notice that we have just proved that composition of additive functions is compatible with multiplication by rational scalars (see above), so that \(\mathcal{A}\) is an algebra over \(\mathbb{Q}\).

Having briefly looked over the landscape of \(\mathcal{A}\), let's start converging on our prey. How bad must a monster function be? A linear function has all the good qualities one might wish for: it is monotonic, continuous, differentiable, smooth; it's even... linear. Which of these does it share with a monster? It's intuitively very unlikely that an additive but non-linear function might happen to be differentiable, so let's start by checking continuity.

Statement 0. If an additive function \(f : \mathbb{R} \rightarrow \mathbb{R}\) is continuous at a point \(x\), then \(f(x) = x\cdot f(1)\).
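(A short argument, using the linearity on rationals established above: for any sequence of rationals \(q_n\) with \(q_n \to x\), continuity at \(x\) gives

$$f(x) = \lim_{n\to\infty} f(q_n) = \lim_{n\to\infty} q_n\cdot f(1) = x\cdot f(1)$$

where the middle equality is the rational linearity \(f(q) = q\cdot f(1)\).)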

That is, an additive function is linear at every point where it is continuous. This means that a monster must be discontinuous in at least one point. Note that the linear functions are precisely the everywhere-continuous additive functions. Can a monster function be discontinuous in a single point only? Or, even, can it be continuous somewhere? It is easy to show that the property of additivity severely constrains the structure of the set on which a function is continuous:

Statement 1. If an additive function \(f : \mathbb{R} \rightarrow \mathbb{R}\) is continuous at a point \(x\), it is linear.

Take an arbitrary non-zero point y. We may assume x \neq 0: if f is continuous at 0, then additivity (f(x_0 + h) = f(x_0) + f(h)) makes it continuous at every point, in particular at x = 1. By statement 0, f(x) = x \cdot f(1). By definition of continuity at x,

\forall \epsilon > 0\ \exists \delta > 0 : |x' - x| < \delta \implies |f(x') - f(x)| < \epsilon

For any natural n, take \epsilon = \frac1n and find a rational number q_n, such that |q_n \cdot y - x| < \delta_n, where \delta_n is the corresponding \delta, which can always be chosen so that \delta_n \leq \frac1n. Such q_n always exists due to the density of rationals. By the choice of q_n we have:

x - \tfrac1n < q_n \cdot y < x + \tfrac1n

and by the sandwich theorem, (q_n) converges: \lim_{n\to\infty} q_n = \frac{x}{y}. Now, again by the choice of q_n, x' = q_n \cdot y satisfies the condition on x' above and

\epsilon = \frac1n > |f(q_n \cdot y) - f(x)| = |q_n \cdot f(y) - x \cdot f(1)|

Taking the (obviously existing) limits of both sides of this inequality, one gets |\frac{x}{y} \cdot f(y) - x \cdot f(1)| = 0, that is,

f(y) = \frac{x \cdot f(1)}{\lim_{n\to\infty}q_n} = \frac{x \cdot f(1)}{x/y} = y \cdot f(1)
The case of y being 0 is trivial.
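The heart of the proof is the possibility of approximating x/y by rationals q_n so that the points q_n \cdot y fall into ever smaller vicinities of x. A quick numerical sketch of this step (the sample values of x and y are my own choice):

```python
from fractions import Fraction

x, y = 1.0, 2 ** 0.5            # a sample point of continuity x and a target y
for n in (10, 100, 1000, 10000):
    # best rational approximation of x/y with denominator at most n
    q = Fraction(x / y).limit_denominator(n)
    print(f"n = {n:5}  q_n = {q}  |q_n * y - x| = {abs(float(q) * y - x):.2e}")
```

As n grows, |q_n \cdot y - x| shrinks, which is exactly what lets the proof squeeze f(q_n \cdot y) = q_n \cdot f(y) against f(x).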
Oh. A monster function cannot be continuous even at a single point—it is discontinuous everywhere. Additive functions are divided into two separate classes: nice, everywhere continuous linear functions and unseemly, everywhere discontinuous monsters. (We still don't know whether the latter class is populated, though.) Our expectations of capturing a monster and placing a trophy in a hall must be adjusted: even if we prove that a monster exists and construct it, any idea of drawing its graph must be abandoned—it's too ugly to be depicted.

Let's think for a moment about how a monster might look. Every additive function is linear on any \mathbb{Q}-set. If it is linear with the same slope on all \mathbb{Q}-sets—it is linear. A monster must have different slopes on at least some of the \mathbb{Q}-sets. In the simplest case there is a single \mathbb{Q}-set with a slope different from the others. There is a famous function (a freshman's nightmare and an examiner's delight) that immediately comes to mind: the Dirichlet function d(x), equal to 1 on rationals and 0 on irrationals. The function g(x) = x \cdot d(x) is the identity (and hence linear) when restricted to the rationals and is constant zero (again linear) when restricted to the irrationals. Looks promising? Unfortunately, g is not additive:

1 = g(1) = g((1 - \sqrt{2}) + \sqrt{2}) \neq g(1 - \sqrt{2}) + g(\sqrt{2}) = 0

Also, g is continuous at 0 and is thus disqualified from monstrousness by statement 1. This shot into the darkness missed. But at least we now see that not only must a monster have different slopes on different \mathbb{Q}-sets, these slopes must also be selected to be consistent with addition.
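The failure of additivity can be checked exactly, not just by eye. Restricting attention to numbers of the form a + b\sqrt{2} with rational a, b (a toy model of mine, where "x is rational" becomes decidable: it means b = 0), a sketch:

```python
from fractions import Fraction as F

# Represent x = a + b*sqrt(2) exactly as a pair (a, b) of rationals;
# such an x is rational iff b == 0.
def d(x):                               # Dirichlet function on the model
    return 1 if x[1] == 0 else 0

def g(x):                               # g(x) = x * d(x)
    return x if d(x) else (F(0), F(0))

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

x = (F(1), F(-1))                       # 1 - sqrt(2): irrational
y = (F(0), F(1))                        # sqrt(2): irrational
# g(x + y) = g(1) = 1, while g(x) + g(y) = 0 + 0 = 0:
assert g(add(x, y)) != add(g(x), g(y))
print("g is not additive:", g(add(x, y)), "!=", add(g(x), g(y)))
```

The two irrational summands are invisible to g, yet their sum is rational and is passed through untouched, which is precisely the inconsistency of slopes mentioned above.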

Let's return to the monster's properties. A monster function is discontinuous everywhere, but how badly discontinuous is it? E.g., a function is locally bounded at every point where it is continuous. A monster is continuous nowhere, but is it still locally bounded anywhere or somewhere? In a way similar to statement 1 it's easy to prove the

Statement 2. If an additive function f : \mathbb{R} \rightarrow \mathbb{R} is bounded on any segment [a, b], a < b, then it is linear.
First, by using that f(-x) = -f(x) and f(x + c) = f(x) + f(c), and by restricting the segment if necessary, one can assume that a = -b, b > 0, and |f(x)| \leq M on [-b, b].
Let's prove that f is continuous at 0; then it will be linear by statement 1. For arbitrary \epsilon > 0, let's select a natural n, such that \frac{M}{n} < \epsilon (this choice is a typical magician's hat trick of proofs), and put \delta = \frac{b}{n+1}. For arbitrary non-zero x from the \delta-vicinity of 0 there is always a rational q, such that n < q < \frac{b}{|x|}. Such q always exists due to the density of rationals, because \frac{b}{|x|} > n + 1. For such q we have:

|f(q \cdot x)| = q \cdot |f(x)| > n \cdot |f(x)|

on the other hand, |q \cdot x| < b, so q \cdot x \in [-b, b] and we have:

|f(q \cdot x)| \leq M

Together these give |f(x)| < \frac{M}{n} < \epsilon (and f(0) = 0), establishing that f is continuous at 0.
This indicates that a monster is not just a bad function, it's a very bad function, one that takes arbitrarily large absolute values in arbitrarily small segments. Too bad. As a byproduct, a monster cannot be monotonic on any segment, because a function monotonic on [a, b] is bounded there: f(a) \leq f(x) \leq f(b) (for an increasing function, similarly for a decreasing one).
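How violently must such a function oscillate? One can see it already in a toy model (my own illustration, defined only on numbers a + b\sqrt{2} with rational a, b, so not yet a true monster on all of \mathbb{R}): f(a + b\sqrt{2}) = a + 3b is additive there, but has slope 3 instead of \sqrt{2} along the \sqrt{2} direction. Feeding it the convergents p/q of \sqrt{2}, which satisfy |q\sqrt{2} - p| < 1/q, produces points arbitrarily close to 0 on which f is arbitrarily large:

```python
SQRT2 = 2 ** 0.5

def f(a, b):                     # f(a + b*sqrt(2)) = a + 3*b: additive in
    return a + 3 * b             # (a, b), but with mismatched slopes

# Convergents p/q of sqrt(2) satisfy |q*sqrt(2) - p| < 1/q, so the
# points x = -p + q*sqrt(2) pile up at 0 while f(x) = -p + 3*q grows.
p, q = 1, 1
for _ in range(8):
    p, q = p + 2 * q, p + q      # next convergent: 3/2, 7/5, 17/12, ...
    x = -p + q * SQRT2
    print(f"|x| = {abs(x):.2e}   f(x) = {-p + 3 * q}")
```

Each iteration roughly halves the distance of x to 0 while f(x) keeps growing like q(3 - \sqrt{2}): unbounded values in shrinking vicinities, exactly the behaviour statement 2 predicts for a monster.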