tag:blogger.com,1999:blog-57992462024-03-14T06:16:18.581+00:00a very occasional diary @ Nikita Danilovnikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.comBlogger88125tag:blogger.com,1999:blog-5799246.post-67340621685272130082024-03-04T19:12:00.001+00:002024-03-04T19:12:29.257+00:00When people used to care about the quality of presentation<p> </p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEj7jYm4YsU9IsXomtoeh31gFhGPh1xPCqGlPBMlZgftELofS66V90CbNn9Zauh3hq6ZQMHe0PcNFvqZa9tzagd8xpOhKOvuiJFw1rMXjlMLMBOWyKodEnK0GGK8Mf8Nngyy61u4YNSD2V3Ed1bPjN7giyxNyUF69WE1ZLAgivJGKHgNUgrbSnBtbA" style="margin-left: 1em; margin-right: 1em;"><img alt="" data-original-height="136" data-original-width="492" height="88" src="https://blogger.googleusercontent.com/img/a/AVvXsEj7jYm4YsU9IsXomtoeh31gFhGPh1xPCqGlPBMlZgftELofS66V90CbNn9Zauh3hq6ZQMHe0PcNFvqZa9tzagd8xpOhKOvuiJFw1rMXjlMLMBOWyKodEnK0GGK8Mf8Nngyy61u4YNSD2V3Ed1bPjN7giyxNyUF69WE1ZLAgivJGKHgNUgrbSnBtbA" width="320" /></a></div>From the Errata to Dijkstra's <a href="https://www.amazon.com/Primer-Algol-Programming-Studies-Processing/dp/0122162501">A Primer of Algol 60 Programming</a>.<p></p>nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com0tag:blogger.com,1999:blog-5799246.post-75166995335829600302022-10-12T03:42:00.014+00:002023-03-14T03:00:26.142+00:00usched: update<div style="text-align: justify;"><b>Update</b> for <a href="https://www.cofault.com/2022/10/usched-stackswap-coroutines-neither.html">the previous post</a> about the stackswap coroutine implementation <a href="https://github.com/nikitadanilov/usched">usched</a>.</div><p style="text-align: justify;">To recap, usched is an experimental (and <i>very</i> simple, 120LOC) coroutine implementation different from stackful and stackless models: coroutines are executed on the native stack of the caller and when a coroutine is about to block, its stack is copied into a 
separately allocated (<i>e.g.</i>, in the heap) buffer. The buffer is copied back onto the native stack when the coroutine is ready to resume.</p>
<p style="text-align: justify;">I added a new scheduler <a href="https://github.com/nikitadanilov/usched/blob/master/ll.c">ll.c</a> that distributes coroutines across multiple native threads and then does lockless scheduling within each thread. In the benchmark (the same as in the previous post), all coroutines in a communicating <i>cycle</i> belong to the same thread.</p>
<p style="text-align: justify;">Results are amazing: usched actually beats compiler-assisted C++ coroutines by a large margin. The horizontal axis is the number of coroutines in the test (logarithmic) and the vertical axis is coroutine wakeup-wait operations per second (1 == 1e8 op/sec).</p>
<p style="text-align: justify;"><a href="https://raw.githubusercontent.com/nikitadanilov/usched/master/c8-darwin.png"><img height="386" src="https://raw.githubusercontent.com/nikitadanilov/usched/master/c8-darwin.png" width="640" /></a></p>
<table style="width: 150%;">
<thead>
<tr>
<th></th>
<th>16</th>
<th>32</th>
<th>64</th>
<th>400</th>
<th>800</th>
<th>4000</th>
<th>8000</th>
<th>40000</th>
<th>80000</th>
<th>400000</th>
<th>800000</th>
<th>4M</th>
<th>8M</th>
</tr>
</thead>
<tbody>
<tr>
<td>GO</td>
<td>0.077</td>
<td>0.127</td>
<td>0.199</td>
<td>0.326</td>
<td>0.323</td>
<td>0.285</td>
<td>0.228</td>
<td>0.142</td>
<td>0.199</td>
<td>0.305</td>
<td>0.303</td>
<td>0.286</td>
<td>0.268</td>
</tr>
<tr>
<td>C++</td>
<td>1.089</td>
<td>1.234</td>
<td>1.344</td>
<td>1.262</td>
<td>1.201</td>
<td>1.159</td>
<td>1.141</td>
<td>1.135</td>
<td>1.163</td>
<td>1.168</td>
<td>1.138</td>
<td>1.076</td>
<td>1.051</td>
</tr>
<tr>
<td>UL</td>
<td>0.560</td>
<td>0.955</td>
<td>1.515</td>
<td>2.047</td>
<td>2.095</td>
<td>2.127</td>
<td>2.148</td>
<td>2.160</td>
<td>2.154</td>
<td>2.020</td>
<td>1.932</td>
<td>1.819</td>
<td>1.811</td>
</tr>
</tbody>
</table>
<p style="text-align: justify;">I only kept the most efficient implementation from every competing class: C++ for stackless, GO for stackful and usched for stackswap. See the full results in <a href="https://github.com/nikitadanilov/usched/blob/master/results.darwin">results.darwin</a>.</p>
nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com0tag:blogger.com,1999:blog-5799246.post-64455996947478243542022-10-07T07:48:00.006+00:002023-03-27T06:44:42.421+00:00Generating Catalan numbers.
Enumerate all binary trees with N nodes, C++20 way: <div><div style="background: #f0f0f0; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><code class="language-c++">#include <memory>
#include <string>
#include <cassert>
#include <iostream>
#include <coroutine>
#include <cppcoro/generator.hpp>
struct tnode;
using tree = std::shared_ptr<tnode>;
struct tnode {
        tree left;
        tree right;
        tnode() {}
        tnode(tree l, tree r) : left(l), right(r) {}
};
auto print(tree t) -> std::string {
        return t ? (std::string{"["} + print(t->left) + " "
                    + print(t->right) + "]") : "*";
}
cppcoro::generator<tree> gen(int n) {
        if (n == 0) {
                co_yield nullptr;
        } else {
                for (int i = 0; i < n; ++i) {
                        for (auto left : gen(i)) {
                                for (auto right : gen(n - i - 1)) {
                                        co_yield tree(new tnode(left, right));
                                }
                        }
                }
        }
}
int main(int argc, char **argv) {
        for (auto t : gen(std::atoi(argv[1]))) {
                std::cout << print(t) << std::endl;
        }
}
</code></pre></div>
Source: <a href="https://github.com/nikitadanilov/bits/blob/master/gen.cpp">gen.cpp</a>. </div><div><br /></div><div>To generate <a href="https://en.wikipedia.org/wiki/Catalan_number">Catalan numbers</a>, do:
<div style="background: rgb(240, 240, 240); border-color: gray; border-image: initial; border-style: solid; border-width: 0.1em 0.1em 0.1em 0.8em; border: solid gray; overflow: auto; padding: 0.2em 0.6em; width: auto;"><pre style="line-height: 125%; margin: 0px;"><code class="language-bash">$ for i in $(seq 0 1000000) ;do ./gen $i | wc -l ;done
1
1
2
5
14
42
132
429
1430
4862
16796
58786
208012
742900
</code></pre></div></div>nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com0tag:blogger.com,1999:blog-5799246.post-13723904056898671472022-10-07T07:07:00.003+00:002023-03-27T06:45:11.160+00:00A Python: it's dictionaries all the way down.<div style="background: #f0f0f0; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><code class="language-python">from collections import defaultdict
def drill():
    return defaultdict(drill)
</code></pre></div>
nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com1tag:blogger.com,1999:blog-5799246.post-45902835839385149262022-10-06T07:18:00.015+00:002023-03-14T02:59:33.630+00:00usched: stackswap coroutines, neither stackful nor stackless<div style="text-align: justify;"><span style="text-align: left;">[Please read the </span><a href="https://www.cofault.com/2022/10/usched-update.html" style="text-align: left;"><b>update</b></a><span style="text-align: left;">.]</span></div><p style="text-align: justify;">This repository (<a href="https://github.com/nikitadanilov/usched">https://github.com/nikitadanilov/usched</a>) contains a simple experimental implementation of
<a href="https://en.wikipedia.org/wiki/Coroutine">coroutines</a>, an alternative to the well-known
"stackless" and "stackful" methods.</p>
<p style="text-align: justify;">The term "coroutine" gradually grew to mean a mechanism where a computation,
which in this context means a chain of nested function calls, can "block" or
"yield" so that the top-most caller can proceed and the computation can later be
resumed at the blocking point with the chain of intermediate function activation
frames preserved.</p>
<p style="text-align: justify;">Prototypical uses of coroutines are lightweight support for potentially blocking
operations (user interaction, IO, networking) and
<a href="https://en.wikipedia.org/wiki/Generator_(computer_programming)"><em>generators</em></a>,
which produce multiple values (see <a href="https://wiki.c2.com/?SameFringeProblem">same fringe
problem</a>).</p>
<p style="text-align: justify;">There are two common coroutine implementation methods:</p>
<ul>
<li><p style="text-align: justify;">a <em>stackful</em> coroutine runs on a separate stack. When a stackful coroutine
blocks, it performs a usual context switch. Historically "coroutines" meant
stackful coroutines. Stackful coroutines are basically little more than usual
threads, and so they can be <em>kernel</em> (supported by the operating system) or
<em>user-space</em> (implemented by a user-space library, also known as <em>green
threads</em>), preemptive or cooperative.</p></li>
<li><p style="text-align: justify;">a <em>stackless</em> coroutine does not use any stack when blocked. In a typical
implementation, instead of using a normal function activation frame on the
stack, the coroutine uses a special activation frame allocated in the heap so
that it can outlive its caller. Using a heap-allocated frame to store all local
variables lends itself naturally to compiler support, but some people are known
to implement stackless coroutines manually via a combination of
pre-processing, library and tricks much worse than <a href="https://en.wikipedia.org/wiki/Protothread">Duff's
device</a>.</p></li>
</ul>
<p style="text-align: justify;">Stackful and stackless are by no means the only possibilities. One of the
earliest languages to feature generators,
<a href="https://en.wikipedia.org/wiki/CLU_(programming_language)">CLU</a>
(<a href="https://pmg.csail.mit.edu/ftp.lcs.mit.edu/pub/pclu/CLU/">distribution</a>), ran
generators on the caller's stack.</p>
<p style="text-align: justify;">usched is in some sense intermediate between stackful and stackless: its
coroutines do not use stack when blocked, nor do they allocate individual
activation frames in the heap.</p>
<p style="text-align: justify;">The following is copied with some abbreviations from
<a href="https://github.com/nikitadanilov/usched/blob/master/usched.c">usched.c</a>.</p>
<h2 id="toc_1" style="text-align: justify;">Overview</h2>
<p style="text-align: justify;">usched: A simple dispatcher for cooperative user-space threads.</p>
<p style="text-align: justify;">A typical implementation of user-space threads allocates a separate stack for
each thread when the thread is created and then dispatches threads (as
decided by the scheduler) through some context switching mechanism, for
example, <code>longjmp()</code>.</p>
<p style="text-align: justify;">In usched all threads (represented by struct ustack) are executed on the same
"native" stack. When a thread is about to block (<code>usched_block()</code>), a memory
buffer for the stack used by this thread is allocated and the stack is copied
to the buffer. After that the part of the stack used by the blocking thread
is discarded (by <code>longjmp()</code>-ing to the base of the stack) and a new thread is
selected. The stack of the selected thread is restored from its buffer and
the thread is resumed by <code>longjmp()</code>-ing to the <code>usched_block()</code> that blocked it.</p>
<p style="text-align: justify;">The focus of this implementation is simplicity: the total size of <code>usched.[ch]</code>
is less than 120LOC, as measured by SLOCCount.</p>
<p style="text-align: justify;">Advantages:</p>
<ul>
<li><p style="text-align: justify;">no need to allocate the maximal possible stack at thread initialisation: the stack
buffer is allocated as needed. It is also possible to free the buffer when the
thread is resumed (not currently implemented);</p></li>
<li><p style="text-align: justify;">a thread that doesn't block has zero overhead: it is executed as a native function
call (through a function pointer) without any context switching;</p></li>
<li><p style="text-align: justify;">because the threads are executed on the stack of the same native underlying
thread, native synchronisation primitives (mutexes, etc.) work, although the
threads share underlying TLS. Of course one cannot use native primitives to
synchronise between usched threads running on the same native thread.</p></li>
</ul>
<p style="text-align: justify;">Disadvantages:</p>
<ul>
<li><p style="text-align: justify;">stack copying introduces overhead (<code>memcpy()</code>) in each context switch;</p></li>
<li><p style="text-align: justify;">because stacks are moved around, addresses on a thread stack are only valid
while the thread is running. This invalidates certain common programming
idioms: other threads and heap cannot store pointers to the stacks, at least
to the stacks of the blocked threads. Note that Go language, and probably
other run-times, maintains a similar invariant.</p></li>
</ul>
<h2 id="toc_2" style="text-align: justify;">Usage</h2>
<p style="text-align: justify;">usched is only a dispatcher and not a scheduler: it blocks and resumes
threads but</p>
<ul>
<li><p style="text-align: justify;">it does not keep track of threads (specifically, allocation and freeing of
struct ustack instances are done elsewhere),</p></li>
<li><p style="text-align: justify;">it implements no scheduling policies.</p></li>
</ul>
<p style="text-align: justify;">These things are left to the user, together with stack buffer allocation and
freeing. The user supplies three call-backs:</p>
<ul>
<li><p style="text-align: justify;"><code>usched::s_next()</code>: the scheduling function. This call-back returns the next
thread to execute. This can be either a new (never before executed) thread
initialised with <code>ustack_init()</code>, or it can be a blocked thread. The user must
keep track of blocked and runnable threads, presumably by providing wrappers
to <code>ustack_init()</code> and <code>ustack_block()</code> that would record thread state changes. It
is up to <code>usched::s_next()</code> to block and wait for events if there are no
runnable threads and all threads are waiting for something;</p></li>
<li><p style="text-align: justify;"><code>usched::s_alloc()</code>: allocates new stack buffer of at least the specified
size. The user has full control over stack buffer allocation. It is possible
to pre-allocate the buffer when the thread is initialised (reducing the cost
of <code>usched_block()</code>), it is possible to cache buffers, etc.;</p></li>
<li><p style="text-align: justify;"><code>usched::s_free()</code>: frees the previously allocated stack buffer.</p></li>
</ul>
<p style="text-align: justify;"><a href="https://github.com/nikitadanilov/usched/blob/master/rr.h">rr.h</a> and
<a href="https://github.com/nikitadanilov/usched/blob/master/rr.c">rr.c</a> provide a
simple "round-robin" scheduler implementing all the call-backs. Use it
carefully: it was only tested with the
<a href="https://github.com/nikitadanilov/usched/blob/master/rmain.c">rmain.c</a>
benchmark.</p>
<h2 id="toc_3" style="text-align: justify;">Pictures!</h2>
<p style="text-align: justify;">The following diagrams show stack management by usched. The stack grows from
right to left.</p>
<p style="text-align: justify;">At the entrance to the dispatcher loop. <code>usched_run(S)</code>:</p><p style="text-align: justify;"><br /></p>
<div><pre><div style="text-align: justify;"> usched_run()</div><code class="language-none"><div style="text-align: justify;">----------------------------------------------+--------------+-------+</div><div style="text-align: justify;"> | buf | anchor | ... |</div><div style="text-align: justify;">----------------------------------------------+--------------+-------+</div><div style="text-align: justify;"> ^</div><div style="text-align: justify;"> |</div><div style="text-align: justify;"> sp = S->s_buf</div></code></pre></div>
<p style="text-align: justify;"><br /></p><p style="text-align: justify;">A new (never before executed) thread <code>U</code> is selected by <code>S->s_next()</code>, <code>launch()</code>
calls the thread startup function <code>U->u_f()</code>:</p>
<div><pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;"> U->u_f() launch() usched_run()</div><code class="language-none"><div style="text-align: justify;"> -----------------------------+---------+-----+--------------+-------+</div><div style="text-align: justify;"> | | pad | buf | anchor | ... |</div><div style="text-align: justify;"> -----------------------------+---------+-----+--------------+-------+</div><div style="text-align: justify;"> ^ ^</div><div style="text-align: justify;"> | |</div><div style="text-align: justify;"> sp U->u_bottom</div></code></pre></div>
<p style="text-align: justify;"><br /></p><p style="text-align: justify;">The thread executes as usual on the stack, until it blocks by calling
<code>usched_block()</code>:</p><p style="text-align: justify;"><br /></p>
<div><pre><div style="text-align: justify;"> usched_block() bar() U->u_f() launch() usched_run()</div><code class="language-none"><div style="text-align: justify;"> ----------+------+-----+-----+---------+-----+--------------+-------+</div><div style="text-align: justify;"> | here | ... | | | pad | buf | anchor | ... |</div><div style="text-align: justify;"> ----------+------+-----+-----+---------+-----+--------------+-------+</div><div style="text-align: justify;"> ^ ^ ^</div><div style="text-align: justify;"> | +-- sp = U->u_cont |</div><div style="text-align: justify;"> | U->u_bottom</div><div style="text-align: justify;"> U->u_top</div></code></pre></div>
<p style="text-align: justify;"><br /></p><p style="text-align: justify;">The stack from <code>U->u_top</code> to <code>U->u_bottom</code> is copied into the stack buffer
<code>U->u_stack</code>, and control returns to <code>usched_run()</code> by <code>longjmp(S->s_buf)</code>:</p>
<div><pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;"> usched_run()</div><code class="language-none"><div style="text-align: justify;">----------------------------------------------+--------------+-------+</div><div style="text-align: justify;"> | buf | anchor | ... |</div><div style="text-align: justify;">----------------------------------------------+--------------+-------+</div><div style="text-align: justify;"> ^</div><div style="text-align: justify;"> |</div><div style="text-align: justify;"> sp = S->s_buf</div></code></pre></div>
<p style="text-align: justify;"><br /></p><p style="text-align: justify;">Next, suppose <code>S->s_next()</code> selects a previously blocked thread <code>V</code> ready to be
resumed. <code>usched_run()</code> calls <code>cont(V)</code>.</p>
<div><pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;"> cont() usched_run()</div><code class="language-none"><div style="text-align: justify;">----------------------------------------+-----+--------------+-------+</div><div style="text-align: justify;"> | | buf | anchor | ... |</div><div style="text-align: justify;">----------------------------------------+-----+--------------+-------+</div><div style="text-align: justify;"> ^</div><div style="text-align: justify;"> |</div><div style="text-align: justify;"> sp</div></code></pre></div>
<p style="text-align: justify;"><code><br /></code></p><p style="text-align: justify;"><code>cont()</code> copies the stack from the buffer to [<code>V->u_top</code>, <code>V->u_bottom</code>]
range. It's important that this <code>memcpy()</code> operation does not overwrite
<code>cont()</code>'s own stack frame; this is why the <code>pad[]</code> array is needed in <code>launch()</code>: it
advances <code>V->u_bottom</code> and gives <code>cont()</code> some space to operate.</p>
<div><pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;"> usched_block() foo() V->u_f() cont() usched_run()</div><code class="language-none"><div style="text-align: justify;">---------+------+-----+-----+--------+--+-----+--------------+-------+</div><div style="text-align: justify;"> | here | ... | | | | | buf | anchor | ... |</div><div style="text-align: justify;">---------+------+-----+-----+--------+--+-----+--------------+-------+</div><div style="text-align: justify;"> ^ ^ ^ ^</div><div style="text-align: justify;"> | +-- V->u_cont | +-- sp</div><div style="text-align: justify;"> | |</div><div style="text-align: justify;"> V->u_top V->u_bottom</div></code></pre></div>
<p style="text-align: justify;"><br /></p><p style="text-align: justify;">Then <code>cont()</code> <code>longjmp()</code>-s to <code>V->u_cont</code>, restoring <code>V</code> execution context:</p>
<div><pre><div style="text-align: justify;"><br /></div><div style="text-align: justify;"> usched_block() foo() V->u_f() cont() usched_run()</div><code class="language-none"><div style="text-align: justify;">---------+------+-----+-----+--------+--+-----+--------------+-------+</div><div style="text-align: justify;"> | here | ... | | | | | buf | anchor | ... |</div><div style="text-align: justify;">---------+------+-----+-----+--------+--+-----+--------------+-------+</div><div style="text-align: justify;"> ^</div><div style="text-align: justify;"> +-- sp = V->u_cont</div></code></pre></div>
<p style="text-align: justify;"><code><br /></code></p><p style="text-align: justify;"><code>V</code> continues its execution as if it returned from <code>usched_block()</code>.</p>
<h2 id="toc_4" style="text-align: justify;">Multiprocessing</h2>
<p style="text-align: justify;">By design, a single instance of <code>struct usched</code> cannot take advantage of
multiple processors, because all its threads are executing within a single
native thread. Multiple instances of <code>struct usched</code> can co-exist within a
single process address space, but a ustack thread created for one instance
cannot be migrated to another. One possible strategy to add support for multiple
processors is to create multiple instances of struct usched and schedule them
(that is, schedule the threads running respective <code>usched_run()</code>-s) to
processors via <code>pthread_setaffinity_np()</code> or similar. See
<a href="https://github.com/nikitadanilov/usched/blob/master/rr.c">rr.c</a> for a
simplistic implementation.</p>
<h2 id="toc_5" style="text-align: justify;">Current limitations</h2>
<ul>
<li><p style="text-align: justify;">the stack is assumed to grow toward lower addresses. This is easy to fix,
if necessary;</p></li>
<li><p style="text-align: justify;">the implementation is not signal-safe. Fixing this can be as easy as
replacing *jmp() calls with their sig*jmp() counterparts. At the moment
signal-based code, like the gperftools profiler library (-lprofiler), would most likely crash
usched;</p></li>
<li><p style="text-align: justify;"><code>usched.c</code> must be compiled without optimisations and with
the <code>-fno-stack-protector</code> option (gcc);</p></li>
<li><p style="text-align: justify;">usched threads are cooperative: a thread will continue to run until it
completes or blocks. Adding preemption (via signal-based timers) is
relatively easy; the actual preemption decision would be relegated to the
external "scheduler" via a new <code>usched::s_preempt()</code> call-back invoked from
a signal handler.</p></li>
</ul>
<h2 id="toc_6" style="text-align: justify;">Notes</h2>
<p style="text-align: justify;"><a href="https://joeduffyblog.com/2015/11/19/asynchronous-everything/">Midori</a> seems to
use a similar method: a coroutine (called <em>activity</em> there) starts on the native
stack. If it needs to block, frames are allocated in the heap (this requires
compiler support) and filled in from the stack, the coroutine runs in these
heap-allocated frames when resumed.</p>
<h2 id="toc_7" style="text-align: justify;">Benchmarks</h2>
<p style="text-align: justify;">usched was benchmarked against a few stackful (go, pthreads) and stackless
(rust, c++ coroutines) implementations. A couple of caveats:</p>
<ul>
<li><p style="text-align: justify;">all benchmarking in general is subject to the reservations voiced by
Hippocrates and usually translated (with the complete reversal of the meaning)
as <em>ars longa, vita brevis</em>, which means: "the <strong>art</strong> [of doctor or tester]
takes <strong>long</strong> time to learn, but the <strong>life</strong> of a student is <strong>brief</strong>,
symptoms are vague, chances of success are doubtful".</p></li>
<li><p style="text-align: justify;">the author is much less than fluent with all the languages and frameworks used
in the benchmarking. It is possible that some of the benchmarking code is
inefficient or just outright wrong. Any comments are appreciated.</p></li>
</ul>
<p style="text-align: justify;">The benchmark tries to measure the efficiency of coroutine switching. It creates
R <em>cycles</em>, N coroutines each. Each cycle performs M <em>rounds</em>, where each round
consists of sending a message across the cycle: a particular coroutine (selected
depending on the round number) sends the message to its right neighbour, all
other coroutines relay the message received from the left to the right, the
round completes when the originator receives the message after it passed through
the entire cycle.</p>
<p style="text-align: justify;">If N == 2, the benchmark is R pairs of processes, ping-ponging M messages within
each pair.</p>
<p style="text-align: justify;">Some benchmarks support two additional parameters: D (additional space in bytes,
artificially consumed by each coroutine in its frame) and P (the number of
native threads used to schedule the coroutines).</p>
<p style="text-align: justify;">The benchmark creates N*R coroutines and sends a total of N*R*M messages,
the latter being proportional to the number of coroutine switches.</p>
<p style="text-align: justify;"><a href="https://github.com/nikitadanilov/usched/blob/master/bench.sh">bench.sh</a> runs
all implementations with the same N, R and M
parameters. <a href="https://github.com/nikitadanilov/usched/blob/master/graph.py">graph.py</a>
plots the results.</p>
<ul style="text-align: left;">
<li><p style="text-align: justify;">POSIX. Source:
<a href="https://github.com/nikitadanilov/usched/blob/master/pmain.c">pmain.c</a>,
binary: pmain. Pthreads-based stackful implementation in C. Uses default
thread attributes. pmain.c also contains emulation of unnamed POSIX semaphores
for Darwin. Plot label: "P". This benchmark crashes with "pmain:
pthread_create: Resource temporarily unavailable" for large values of N*R.</p></li>
<li><p style="text-align: justify;">Go. Source:
<a href="https://github.com/nikitadanilov/usched/blob/master/gmain.go">gmain.go</a>,
binary: gmain. The code is straightforward (it was a pleasure to write). P is
supported via runtime.GOMAXPROCS(). "GO1T" are the results for a single native
thread, "GO" are the results without the restriction on the number of threads.</p></li>
<li><p style="text-align: justify;">Rust. Source:
<a href="https://github.com/nikitadanilov/usched/blob/master/cycle/src/main.rs">cycle/src/main.rs</a>,
binary: cycle/target/release/cycle. Stackless implementation using Rust
builtin <a href="https://rust-lang.github.io/async-book/">async/.await</a>. Label:
"R". It is single-threaded (I haven't figured out how to distribute coroutines
to multiple executors), so it should be compared with GO1T, C++1T and
U1T. Instead of fighting with the Rust borrow checker, I used "unsafe" and
shared data-structures between multiple coroutines much like other benchmarks
do.</p></li>
<li><p style="text-align: justify;">C++. Source:
<a href="https://github.com/nikitadanilov/usched/blob/master/c%2B%2Bmain.cpp">c++main.cpp</a>,
binary: c++main. The current state of coroutine support in C++ is unclear. Is
everybody supposed to directly use <code><coroutine></code> interfaces or one of the
mutually incompatible libraries that provide easier-to-use interfaces on top
of <code><coroutine></code>? This benchmark uses <a href="https://lewissbaker.github.io/">Lewis
Baker</a>'s
<a href="https://github.com/lewissbaker/cppcoro">cppcoro</a> (<a href="https://www.andreasbuhr.de/">Andreas
Buhr</a>'s
<a href="https://github.com/andreasbuhr/cppcoro">fork</a>). Labels: "C++" and "C++1T" (for
single-threaded results).</p></li>
<li><p>usched. Source:
<a href="https://github.com/nikitadanilov/usched/blob/master/rmain.c">rmain.c</a>,
binary: rmain. Based on <code>usched.[ch]</code> and <code>rr.[ch]</code>. This is our main interest,
so we test a few combinations of parameters.</p>
<ul>
<li>Label: "U": the default configuration, round-robin scheduler over 16 native threads;</li>
<li>"U1K": 1000 bytes of additional stack space for each coroutine;</li>
<li>"U10K": 10000 bytes;</li>
<li>"U1T": round-robin over 1 native thread;</li>
<li>"U1TS": round-robin over 1 native thread with pthread locking in
<a href="https://github.com/nikitadanilov/usched/blob/master/rr.c">rr.c</a> compiled
out (<code>-DSINGLE_THREAD</code> compilation option, a separate binary rmain.1t).</li>
<li><strong>Update</strong> "UL": uses "local" scheduler
<a href="https://github.com/nikitadanilov/usched/blob/master/ll.c">ll.c</a>. All
coroutines within a cycle are assigned to the same native thread so that
scheduling between them requires no locking. This demonstrates very high
throughput (comparable to C++), but unfortunately I do not have time right
now to re-do all the measurements consistently. Binary: lmain.</li>
</ul></li>
</ul>
<p style="text-align: justify;"><a href="https://github.com/nikitadanilov/usched/blob/master/bench.sh">bench.sh</a> runs
all benchmarks with N == 2 (message ping-pong) and N == 8. Raw results are in
<a href="https://github.com/nikitadanilov/usched/blob/master/results.linux">results.linux</a>. In
the graphs, the horizontal axis is the number of coroutines (N*R, logarithmic)
and the vertical axis is operations (N*R*M) per second.</p>
<p style="text-align: justify;">Environment: Linux VM, 16 processors, 16GB of memory. Kernel: 4.18.0 (Rocky
Linux).</p>
<p style="text-align: justify;"><a href="https://raw.githubusercontent.com/nikitadanilov/usched/master/c8.png"><img height="467" src="https://raw.githubusercontent.com/nikitadanilov/usched/master/c8.png" width="640" /></a></p>
<table style="width: 150%;">
<thead>
<tr>
<th></th>
<th>16</th>
<th>32</th>
<th>64</th>
<th>400</th>
<th>800</th>
<th>4000</th>
<th>8000</th>
<th>40000</th>
<th>80000</th>
<th>400000</th>
<th>800000</th>
<th>4000000</th>
<th>8000000</th>
</tr>
</thead>
<tbody>
<tr>
<td>POSIX</td>
<td>1.76</td>
<td>3.46</td>
<td>6.39</td>
<td>14.58</td>
<td>14.85</td>
<td>14.70</td>
<td>13.63</td>
<td>9.87</td>
<td>8.02</td>
<td>0.00</td>
<td>0.00</td>
<td>0.00</td>
<td>0.01</td>
</tr>
<tr>
<td>GO</td>
<td>4.14</td>
<td>5.62</td>
<td>7.77</td>
<td>36.74</td>
<td>41.64</td>
<td>49.72</td>
<td>48.24</td>
<td>37.24</td>
<td>43.06</td>
<td>46.31</td>
<td>46.22</td>
<td>46.09</td>
<td>45.95</td>
</tr>
<tr>
<td>GO1T</td>
<td>4.38</td>
<td>4.30</td>
<td>4.27</td>
<td>4.11</td>
<td>3.81</td>
<td>3.53</td>
<td>3.40</td>
<td>3.33</td>
<td>3.43</td>
<td>3.99</td>
<td>3.98</td>
<td>3.95</td>
<td>3.86</td>
</tr>
<tr>
<td>RUST</td>
<td>9.48</td>
<td>8.71</td>
<td>8.69</td>
<td>8.64</td>
<td>8.53</td>
<td>7.85</td>
<td>6.59</td>
<td>4.32</td>
<td>3.80</td>
<td>3.63</td>
<td>3.63</td>
<td>3.83</td>
<td>3.90</td>
</tr>
<tr>
<td>U</td>
<td>17.24</td>
<td>17.27</td>
<td>17.30</td>
<td>25.77</td>
<td>29.99</td>
<td>71.68</td>
<td>77.32</td>
<td>78.92</td>
<td>77.98</td>
<td>80.88</td>
<td>82.09</td>
<td>83.66</td>
<td>82.15</td>
</tr>
<tr>
<td>U1K</td>
<td>16.21</td>
<td>16.29</td>
<td>16.35</td>
<td>25.38</td>
<td>28.41</td>
<td>69.92</td>
<td>75.76</td>
<td>74.31</td>
<td>73.65</td>
<td>76.69</td>
<td>76.75</td>
<td>75.84</td>
<td>76.56</td>
</tr>
<tr>
<td>U10K</td>
<td>9.04</td>
<td>8.96</td>
<td>9.09</td>
<td>20.38</td>
<td>21.69</td>
<td>58.13</td>
<td>60.95</td>
<td>59.66</td>
<td>60.50</td>
<td>61.32</td>
<td>61.71</td>
<td>62.06</td>
<td>62.72</td>
</tr>
<tr>
<td>U1T</td>
<td>17.37</td>
<td>17.31</td>
<td>17.35</td>
<td>17.35</td>
<td>17.36</td>
<td>17.27</td>
<td>17.29</td>
<td>17.14</td>
<td>17.06</td>
<td>16.91</td>
<td>16.91</td>
<td>16.91</td>
<td>16.87</td>
</tr>
<tr>
<td>C++</td>
<td>49.87</td>
<td>67.85</td>
<td>74.94</td>
<td>73.91</td>
<td>73.04</td>
<td>62.48</td>
<td>59.15</td>
<td>57.23</td>
<td>56.48</td>
<td>55.35</td>
<td>55.44</td>
<td>54.02</td>
<td>53.61</td>
</tr>
<tr>
<td>C++1T</td>
<td>97.03</td>
<td>97.38</td>
<td>96.82</td>
<td>96.06</td>
<td>96.58</td>
<td>95.78</td>
<td>94.83</td>
<td>89.83</td>
<td>86.09</td>
<td>80.48</td>
<td>79.37</td>
<td>77.04</td>
<td>77.48</td>
</tr>
<tr>
<td>U1TS</td>
<td>49.53</td>
<td>49.76</td>
<td>49.83</td>
<td>50.16</td>
<td>49.93</td>
<td>48.88</td>
<td>49.75</td>
<td>48.75</td>
<td>47.99</td>
<td>46.48</td>
<td>46.25</td>
<td>45.99</td>
<td>46.12</td>
</tr>
<tr>
<td>UL</td>
<td>76.03</td>
<td>116.63</td>
<td>160.72</td>
<td>169.74</td>
<td>169.99</td>
<td>171.57</td>
<td>170.32</td>
<td>165.82</td>
<td>169.43</td>
<td>174.32</td>
<td>171.55</td>
<td>169.48</td>
<td>170.04</td>
</tr>
</tbody>
</table>
<p style="text-align: justify;">(N == 8) A few notes:</p>
<ul>
<li><p style="text-align: justify;">As mentioned above, the pthreads-based solution crashes at around 50K threads.</p></li>
<li><p style="text-align: justify;">Most single-threaded versions ("GO1T", "RUST" and "U1T") are as stable as a corpse's
body temperature. Rust cools off completely at about 500K
coroutines. Single-threaded C++ ("C++1T"), on the other hand, is the most
performant solution for almost the entire range of measurements; it is only for
coroutine counts higher than 500K that "U" overtakes it.</p></li>
<li><p style="text-align: justify;">It is interesting that a very simple and unoptimised usched fares so well
against heavily optimized C++ and Go run-times. (Again, see the reservations
about the benchmarking.)</p></li>
<li><p style="text-align: justify;">Rust is disappointing: one would hope to get better results from a rich type system combined with compiler support.</p></li>
</ul>
<p style="text-align: justify;"><a href="https://raw.githubusercontent.com/nikitadanilov/usched/master/c2.png"><img height="469" src="https://raw.githubusercontent.com/nikitadanilov/usched/master/c2.png" width="640" /></a></p>
<table style="width: 150%;">
<thead>
<tr>
<th></th>
<th>4</th>
<th>8</th>
<th>16</th>
<th>100</th>
<th>200</th>
<th>1000</th>
<th>2000</th>
<th>10000</th>
<th>20000</th>
<th>100000</th>
<th>200000</th>
<th>1000000</th>
<th>2000000</th>
</tr>
</thead>
<tbody>
<tr>
<td>POSIX</td>
<td>0.56</td>
<td>0.97</td>
<td>1.84</td>
<td>6.36</td>
<td>6.88</td>
<td>7.78</td>
<td>7.82</td>
<td>7.58</td>
<td>7.15</td>
<td>5.34</td>
<td>0.00</td>
<td>0.00</td>
<td>0.00</td>
</tr>
<tr>
<td>GO</td>
<td>7.40</td>
<td>11.03</td>
<td>19.23</td>
<td>40.44</td>
<td>45.79</td>
<td>51.81</td>
<td>52.87</td>
<td>52.77</td>
<td>53.15</td>
<td>53.62</td>
<td>53.22</td>
<td>55.77</td>
<td>56.82</td>
</tr>
<tr>
<td>GO1T</td>
<td>4.54</td>
<td>4.55</td>
<td>4.53</td>
<td>4.53</td>
<td>4.53</td>
<td>4.52</td>
<td>4.52</td>
<td>4.50</td>
<td>4.47</td>
<td>4.36</td>
<td>4.31</td>
<td>4.26</td>
<td>4.26</td>
</tr>
<tr>
<td>RUST</td>
<td>5.68</td>
<td>5.75</td>
<td>5.75</td>
<td>4.74</td>
<td>4.74</td>
<td>4.62</td>
<td>4.46</td>
<td>4.13</td>
<td>3.70</td>
<td>2.81</td>
<td>2.77</td>
<td>2.76</td>
<td>2.73</td>
</tr>
<tr>
<td>U</td>
<td>11.22</td>
<td>11.27</td>
<td>11.26</td>
<td>11.30</td>
<td>7.91</td>
<td>24.66</td>
<td>38.72</td>
<td>35.67</td>
<td>40.60</td>
<td>41.18</td>
<td>42.06</td>
<td>42.96</td>
<td>42.74</td>
</tr>
<tr>
<td>U1K</td>
<td>9.64</td>
<td>9.62</td>
<td>9.65</td>
<td>9.67</td>
<td>7.61</td>
<td>22.14</td>
<td>34.38</td>
<td>31.70</td>
<td>34.54</td>
<td>34.56</td>
<td>34.59</td>
<td>35.47</td>
<td>35.56</td>
</tr>
<tr>
<td>U10K</td>
<td>4.43</td>
<td>4.62</td>
<td>4.50</td>
<td>4.25</td>
<td>5.02</td>
<td>15.79</td>
<td>26.18</td>
<td>25.33</td>
<td>27.60</td>
<td>27.62</td>
<td>27.63</td>
<td>27.72</td>
<td>28.16</td>
</tr>
<tr>
<td>U1T</td>
<td>11.24</td>
<td>11.29</td>
<td>11.34</td>
<td>11.26</td>
<td>11.32</td>
<td>11.30</td>
<td>11.28</td>
<td>11.28</td>
<td>11.22</td>
<td>11.19</td>
<td>11.15</td>
<td>11.13</td>
<td>11.15</td>
</tr>
<tr>
<td>C++</td>
<td>46.33</td>
<td>46.30</td>
<td>63.38</td>
<td>114.30</td>
<td>117.05</td>
<td>114.12</td>
<td>111.36</td>
<td>101.32</td>
<td>100.13</td>
<td>84.30</td>
<td>78.53</td>
<td>72.77</td>
<td>71.00</td>
</tr>
<tr>
<td>C++1T</td>
<td>96.56</td>
<td>96.03</td>
<td>96.37</td>
<td>95.97</td>
<td>95.49</td>
<td>95.68</td>
<td>94.94</td>
<td>92.95</td>
<td>91.23</td>
<td>83.55</td>
<td>80.33</td>
<td>77.22</td>
<td>76.22</td>
</tr>
<tr>
<td>U1TS</td>
<td>19.59</td>
<td>19.66</td>
<td>19.80</td>
<td>19.87</td>
<td>19.89</td>
<td>19.86</td>
<td>19.82</td>
<td>19.72</td>
<td>19.66</td>
<td>19.51</td>
<td>19.45</td>
<td>19.33</td>
<td>19.37</td>
</tr>
<tr>
<td>UL</td>
<td>12.19</td>
<td>23.48</td>
<td>50.39</td>
<td>65.71</td>
<td>67.22</td>
<td>69.17</td>
<td>70.01</td>
<td>70.09</td>
<td>69.36</td>
<td>69.28</td>
<td>69.43</td>
<td>68.83</td>
<td>68.00</td>
</tr>
</tbody>
</table>
<p style="text-align: justify;">(N == 2)</p>
<ul>
<li><p style="text-align: justify;">First, note that the scale is different on the vertical axis.</p></li>
<li><p style="text-align: justify;">Single-threaded benchmarks display roughly the same behaviour (exactly the
same in the "C++1T" case) as with N == 8.</p></li>
<li><p style="text-align: justify;">Go is somewhat better. Perhaps its scheduler is optimised for the message
ping-pong that is usual in channel-based concurrency models?</p></li>
<li><p style="text-align: justify;">usched variants are much worse (50% worse for "U") than with N == 8.</p></li>
<li><p style="text-align: justify;">Rust is disappointing.</p></li>
</ul>
<p style="text-align: justify;">To reproduce:</p>
<div><pre><code class="language-none">$ # install libraries and everything, then...
$ make
$ while : ;do ./bench.sh | tee -a results; sleep 5 ;done # collect enough results, this might take a while...
^C
$ grep -h '^ *[2N],' results | python3 graph.py c2.svg > c2-table.md # create plot for N == 2
$ grep -h '^ *[8N],' results | python3 graph.py c8.svg > c8-table.md # create plot for N == 8</code></pre></div>
<h2 id="toc_8">Conclusion</h2>
<p>Overall, the results are surprisingly good. The difference between "U1T" and
"U1TS" indicates that the locking in
<a href="https://github.com/nikitadanilov/usched/blob/master/rr.c">rr.c</a> affects
performance significantly, and affects it even more with multiple native
threads, when locks are contended across processors. I'll try to produce a more
efficient (perhaps lockless) version of a scheduler as the next step.</p>
nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com0tag:blogger.com,1999:blog-5799246.post-65253142007620485642022-08-22T21:57:00.004+00:002023-03-14T02:39:47.502+00:003-lisp: an infinite tower of meta-circular interpreters.<p><a href="https://github.com/nikitadanilov/3-lisp/blob/master/3-lisp.lisp#L259"><img width="100%" src="https://nikitadanilov.github.io/3-lisp/epigraph.png"/></a></p>
<h2 id="toc_1">Précis</h2>
<p style="text-align: justify;">3-lisp is a dialect of Lisp designed and implemented by <a href="https://en.wikipedia.org/wiki/Brian_Cantwell_Smith">Brian C. Smith</a>
as part of his PhD thesis <a href="https://dspace.mit.edu/handle/1721.1/15961">Procedural Reflection in Programming Languages</a> (what this thesis refers to as "<a href="https://en.wikipedia.org/wiki/Reflective_programming">reflection</a>"
is nowadays more usually called "<a href="https://en.wikipedia.org/wiki/Reification_(computer_science)">reification</a>"). A 3-lisp program is conceptually executed by an interpreter written in 3-lisp that is itself executed by an interpreter written in 3-lisp and so on <em>ad infinitum</em>. This forms a (countably) infinite tower of meta-circular (<em>v.i.</em>) interpreters. A <em>reflective lambda</em> is a function that is executed one tower level above its caller. Reflective lambdas provide a very general language extension mechanism.</p>
The code is <a href="https://github.com/nikitadanilov/3-lisp">here</a>.
<h2 id="toc_2" style="text-align: justify;">Meta-circular interpreters</h2>
<p style="text-align: justify;">An <a href="https://en.wikipedia.org/wiki/Interpreter_(computing)">interpreter</a> is a
program that executes programs written in some programming language.</p>
<p style="text-align: justify;">A <a href="https://en.wikipedia.org/wiki/Meta-circular_evaluator">meta-circular interpreter</a> is an interpreter for a programming language written in that language. Meta-circular interpreters can be used to clarify or define the semantics of the language by reducing the full language to a sub-language in which the interpreter is expressed. Historically, such <em>definitional interpreters</em> became popular within the functional programming community, see the classical <a href="https://surface.syr.edu/cgi/viewcontent.cgi?article=1012&context=lcsmith_other">Definitional interpreters for higher-order programming languages</a>. Certain important techniques were classified and studied in the framework of meta-circular interpretation, for example, <a href="https://en.wikipedia.org/wiki/Continuation-passing_style">continuation passing style</a> can be understood as a mechanism that makes meta-circular interpretation independent of the <a href="https://en.wikipedia.org/wiki/Evaluation_strategy">evaluation strategy</a>: it allows an eager meta-language to interpret a lazy object language and <em>vice versa</em>. As a by-product, a continuation passing style interpreter is essentially a state machine and so can be implemented in hardware, see <a href="https://dspace.mit.edu/handle/1721.1/6334">The Scheme-79 chip</a>. Similarly, <em><a href="https://www.brics.dk/RS/08/4/BRICS-RS-08-4.pdf">de-functionalisation</a></em> of languages with higher-order functions yields first-order interpreters for them. But meta-circular interpreters occur in imperative contexts too, for example, the usual proof of the <a href="https://en.wikipedia.org/wiki/Structured_program_theorem">Böhm–Jacopini theorem</a> (interestingly, it was <a href="https://en.wikipedia.org/wiki/Corrado_B%C3%B6hm">Corrado Böhm</a> who first introduced meta-circular interpreters in his 1954 PhD
thesis) constructs for an Algol-like language a meta-circular interpreter expressed in some goto-less subset of the language and then <a href="https://en.wikipedia.org/wiki/Partial_evaluation">specialises</a> this interpreter for a particular program in the source language.</p>
<p style="text-align: justify;">Given a language with a meta-circular interpreter, suppose that the language is extended with a mechanism to <em>trap</em> to the meta-level. For example, in a lisp-like language, that trap can be a new special form <code>(reflect FORM)</code> that directly executes (rather than interprets) <code>FORM</code> within the interpreter. Smith is mostly interested in reflective (<em>i.e.</em>, reification) powers obtained this way, and it is clear that the meta-level trap provides a very general language extension method: one can add new primitives, data types, flow and sequencing control operators, <em>etc</em>. But if you try to add <code>reflect</code> to an existing LISP meta-circular interpreter (for example, see p. 13 of <a href="https://www.softwarepreservation.org/projects/LISP/book/LISP%201.5%20Programmers%20Manual.pdf">LISP 1.5 Programmers Manual</a>) you'd hit a problem: <code>FORM</code> cannot be executed at the meta-level, because at this level it is not a form, but an <a href="https://en.wikipedia.org/wiki/S-expression">S-expression</a>.</p>
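<p style="text-align: justify;">To make the meta-level trap concrete, here is a minimal sketch in Python (a toy, not 3-lisp; the names <code>interpret</code> and <code>extend</code> are invented for illustration): a tiny interpreter whose <code>reflect</code> form runs a routine one level up, handing it the interpreter's own environment. Note how the sketch dodges the problem just described by passing a <em>meta-level</em> (Python) function rather than an object-level form:</p>
<div><pre><code class="language-python"># Toy interpreter with a meta-level trap (illustration only, not 3-lisp).
# Object programs are nested tuples; ("reflect", f) calls the Python
# function f at the meta-level, passing the interpreter's own state.

def interpret(form, env):
    if isinstance(form, str):                  # variable reference
        return env[form]
    if isinstance(form, int):                  # literal
        return form
    op, *args = form
    if op == "reflect":                        # trap to the meta-level
        return args[0](env, interpret)
    if op == "+":
        return sum(interpret(a, env) for a in args)
    if op == "let":                            # (let name value body)
        name, value, body = args
        return interpret(body, dict(env, **{name: interpret(value, env)}))
    raise ValueError("unknown form: " + repr(op))

# A meta-level routine: it extends interpreter state with a new binding,
# something no object-level form of this toy language can do.
def extend(env, interp):
    env["answer"] = 42
    return 0

prog = ("let", "x", 1, ("+", "x", ("reflect", extend), "answer"))
print(interpret(prog, {}))                     # prints 43</code></pre></div>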
<h2 id="toc_3" style="text-align: justify;">Meta-interpreting machine code</h2>
<p style="text-align: justify;">To understand the nature of the problem, consider a very simple case: the object language is the machine language (or equivalently the assembly language) of some processor. Suppose that the interpreter for the machine code is written in (or, more realistically, compiled to) the same machine language. The interpreter maintains the state of the simulated processor, that is, among other things, registers and memory. Say the object (interpreted) code can access a register, <code>R0</code>; then the interpreter has to keep the contents of this register somewhere, but typically not in <em>its</em> (interpreter's) <code>R0</code>. Similarly, a memory word visible to the interpreted code at an address <code>ADDR</code> is stored by the interpreter at some, generally different, address <code>ADDR'</code> (although, by applying the <a href="https://en.wikipedia.org/wiki/Banach_fixed-point_theorem">contractive mapping theorem</a> and a <em>lot</em> of hand-waving one might argue that there will be at least one word stored at the same address at the object- and meta-levels). Suppose that the interpreted machine language has the usual sub-routine call-return instructions <code>call ADDR</code> and <code>return</code> and is extended with a new instruction <code>reflect ADDR</code> that forces the interpreter to call the sub-routine <code>ADDR</code>. At the very least the interpreter needs to convert <code>ADDR</code> to the matching <code>ADDR'</code>. This might not be enough because, for example, the object-level sub-routine <code>ADDR</code> might not be contiguous at the meta-level, <em>i.e.</em>, it is not guaranteed that if <code>ADDR</code> maps to <code>ADDR'</code> then <code>(ADDR + 1)</code> maps to <code>(ADDR' + 1)</code>. This example demonstrates that a reflective interpreter needs a systematic and efficient way of converting or translating between object- and meta-level representations. 
If such a method is somehow provided, <code>reflect</code> is a very powerful mechanism: by modifying interpreter state and code it can add new instructions, addressing modes, condition bits, branch predictors, <em>etc</em>.</p>
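<p style="text-align: justify;">The set-up above can be sketched in a few lines (a hedged Python toy; <code>run</code>, <code>patch</code> and the instruction set are invented for illustration). Simulated registers and memory live in meta-level dictionaries, so the object-level address <code>ADDR</code> maps to the meta-level location <code>mem[ADDR]</code>, and <code>reflect</code> runs a meta-level routine directly on that state:</p>
<div><pre><code class="language-python"># Toy machine-code interpreter (illustration only). Object-level state is
# kept in meta-level dictionaries: object register "R0" is regs["R0"] and
# object address ADDR is mem[ADDR]: this indexing plays the role of the
# ADDR-to-ADDR' translation discussed above.

def run(program, mem, regs):
    for op, *args in program:
        if op == "load":                   # load Rn, ADDR
            regs[args[0]] = mem[args[1]]
        elif op == "add":                  # add Rn, Rm
            regs[args[0]] += regs[args[1]]
        elif op == "store":                # store Rn, ADDR
            mem[args[1]] = regs[args[0]]
        elif op == "reflect":              # run a meta-level routine on
            args[0](mem, regs)             # the interpreter's own state

# A meta-level routine invoked by "reflect": it patches interpreter state
# in a way no object-level instruction of this toy machine can.
def patch(mem, regs):
    regs["R0"] *= 10

mem = {100: 4, 101: 3}
regs = {"R0": 0, "R1": 0}
run([("load", "R0", 100), ("load", "R1", 101), ("add", "R0", "R1"),
     ("reflect", patch), ("store", "R0", 102)], mem, regs)
print(mem[102])                            # prints 70</code></pre></div>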
<h2 id="toc_4" style="text-align: justify;">N-LISP for a suitable value of N</h2>
<p style="text-align: justify;">In his thesis Prof. Smith analyses what it would take to construct a dialect of LISP for which a faithful reflective meta-circular interpreter is possible. He starts by defining a formal model of computation with an (extremely) rigorous distinction between meta- and object- levels (and, hence, between <a href="https://en.wikipedia.org/wiki/Use%E2%80%93mention_distinction">use and mention</a>). It is then determined that this model cannot be satisfactorily applied to the <em>traditional</em> LISP (which is called <code>1-LISP</code> in the thesis and is mostly based on <a href="https://en.wikipedia.org/wiki/Maclisp">Maclisp</a>). The reason is that LISP's notion of <a href="https://en.wikipedia.org/wiki/Eval#Lisp">evaluation</a> conflates two operations: <a href="https://en.wikipedia.org/wiki/Normal_form_(abstract_rewriting)">normalisation</a> that operates within the level and <a href="https://en.wikipedia.org/wiki/Referent">reference</a> that moves one level down. A dialect of LISP that consistently separates normalisation and reference is called <code>2-LISP</code> (the then new <a href="https://en.wikipedia.org/wiki/Scheme_(programming_language)">Scheme</a> is called <code>LISP-1.75</code>). The definition of <code>2-LISP</code> occupies the bulk of the thesis, which the curious reader should consult for (exciting, believe me) details.</p>
<p style="text-align: justify;">Once <code>2-LISP</code> is constructed, adding the reflective capability to it is relatively straightforward. Meta-level trap takes the form of a special <a href="https://en.wikipedia.org/wiki/Anonymous_function#Lisp">lambda expression</a>:</p>
<div><pre style="text-align: justify;"><code class="language-none">(lambda reflect [ARGS ENV CONT] BODY)</code></pre></div>
<p style="text-align: justify;">When this lambda function is applied (at the object level), the body is directly executed (not interpreted) at the meta-level with <code>ARGS</code> bound to the meta-level representation of the actual parameters, <code>ENV</code> bound to the <em>environment</em> (basically, the list of identifiers and the values they are bound to) and <code>CONT</code> bound to the <a href="https://en.wikipedia.org/wiki/Continuation">continuation</a>. Environment and continuation together represent the <code>3-LISP</code> interpreter state (much like registers and memory represent the machine language interpreter state), this representation goes all the way back to <a href="https://en.wikipedia.org/wiki/SECD_machine">SECD machine</a>, see <a href="https://doi.org/10.1093%2Fcomjnl%2F6.4.308">The Mechanical Evaluation of Expressions</a>.</p>
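<p style="text-align: justify;">As a hedged illustration of how a continuation-passing interpreter can hand its state to a reflective function, here is a Python sketch (not the 3-LISP code; <code>ev</code> and the <code>reflective</code> marker are invented). An ordinary function receives evaluated arguments through the continuation machinery; a function marked reflective instead receives the unevaluated argument forms, the environment and the continuation:</p>
<div><pre><code class="language-python"># A CPS evaluator sketch: a "reflective" function is applied at the
# meta-level to the raw argument forms, the environment and the
# continuation, i.e., to the interpreter's complete state at the call.

def ev(form, env, cont):
    if isinstance(form, str):                       # variable reference
        return cont(env[form])
    if not isinstance(form, tuple):                 # literal
        return cont(form)
    f, *args = form
    def apply_f(fv):
        if getattr(fv, "reflective", False):        # meta-level call: gets
            return fv(args, env, cont)              # forms, env and cont
        vals = []
        def next_arg(i):                            # evaluate args in CPS
            if i == len(args):
                return cont(fv(*vals))
            return ev(args[i], env,
                      lambda v: (vals.append(v), next_arg(i + 1))[1])
        return next_arg(0)
    return ev(f, env, apply_f)

def quote(args, env, cont):       # a reflective lambda: returns the argument
    return cont(args[0])          # form itself, not its value
quote.reflective = True

env = {"inc": lambda x: x + 1, "q": quote}
print(ev(("q", ("inc", 41)), env, lambda v: v))     # prints ('inc', 41)
print(ev(("inc", 41), env, lambda v: v))            # prints 42</code></pre></div>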
<p style="text-align: justify;">Here is the fragment of <code>3-LISP</code> meta-circular interpreter code that handles <code>lambda reflect</code> (together with "ordinary" lambda-s, denoted by <code>lambda simple</code>):</p>
<p style="text-align: justify;"><a href="https://github.com/nikitadanilov/3-lisp/blob/master/3-lisp.lisp#L1570"><img width="100%" src="https://nikitadanilov.github.io/3-lisp/reduce.png" /></a></p>
<h2 id="toc_5">Implementation</h2>
<p style="text-align: justify;">It is of course not possible to run an infinite tower of interpreters directly.</p>
<p style="text-align: justify;"><a href="https://nikitadanilov.github.io/3-lisp/infinity.png"><img width="100%" src="https://nikitadanilov.github.io/3-lisp/infinity.png" /></a></p>
<p style="text-align: justify;">The <code>3-LISP</code> implementation creates a meta-level on demand, when a reflective lambda is invoked. At that moment the state of the meta-level interpreter is synthesised (<em>e.g.</em>, <a href="https://github.com/nikitadanilov/3-lisp/blob/master/3-lisp.lisp#L1586">see</a> <code>make-c1</code> in the listing above). The implementation takes pains to detect when it can drop down to a lower level, which is not entirely simple because a reflective lambda can, instead of returning (that is, invoking the supplied continuation), run a potentially modified version of the <a href="https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop">read-eval-print loop</a> (called <code>READ-NORMALISE-PRINT</code> (<a href="https://github.com/nikitadanilov/3-lisp/blob/master/3-lisp.lisp#L1563">see</a>) in <code>3-LISP</code>), which does not return. There is a lot of non-trivial machinery operating behind the scenes and though the implementation modestly proclaims itself <a href="https://github.com/nikitadanilov/3-lisp/blob/master/3-lisp.lisp#L33">EXTREMELY INEFFICIENT</a> it is, in fact, remarkably fast.</p>
<h2 id="toc_6" style="text-align: justify;">Porting</h2>
<p style="text-align: justify;">I was unable to find a digital copy of the <code>3-LISP</code> sources and so manually retyped them from the appendix of the thesis. The transcription in <a href="https://github.com/nikitadanilov/3-lisp/blob/master/3-lisp.lisp">3-lisp.lisp</a> (2003 lines, 200K characters) preserves the original pagination and character set, see the comments at the top of the file. Transcription was mostly straightforward except for a few places where the PDF is illegible (for example, <a href="https://github.com/nikitadanilov/3-lisp/blob/master/3-lisp.lisp#L396">here</a>), all of which fortunately are within comment blocks.</p>
<p style="text-align: justify;">The sources are in <a href="https://dspace.mit.edu/handle/1721.1/5718">CADR machine</a> dialect of LISP, which, save for some minimal and no longer relevant details, is equivalent to Maclisp.</p>
<p style="text-align: justify;">The <code>3-LISP</code> implementation does not have its own parser or interpreter. Instead, it uses the flexibility built into the Lisp reader (see <a href="https://www.cs.cmu.edu/Groups/AI/html/cltl/clm/node192.html">readtables</a>) to parse, interpret and even compile <code>3-LISP</code> with a very small amount of additional code. Amazingly, this more than 40-year-old code, which uses arcane features like readtable customisation, runs on a modern <a href="https://en.wikipedia.org/wiki/Common_Lisp">Common Lisp</a> platform after a very small set of changes: some functions got renamed (<code>CASEQ</code> to <code>CASE</code>, <code>*CATCH</code> to <code>CATCH</code>, <em>etc</em>.), some functions are missing (<code>MEMQ</code>, <code>FIXP</code>), some signatures changed (<code>TYPEP</code>, <code>BREAK</code>, <code>IF</code>). See <a href="https://github.com/nikitadanilov/3-lisp/blob/master/3-lisp.cl">3-lisp.cl</a> for details.</p>
<p style="text-align: justify;">Unfortunately, the port does not run on <em>all</em> modern Common Lisp implementations, because it relies on the proper support for <a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/Backquote.html">backquotes</a> across recursive reader <a href="https://github.com/nikitadanilov/3-lisp/blob/master/3-lisp.cl#L92">invocations</a>:</p>
<div><pre><code class="language-none">;; Maclisp maintains backquote context across recursive parser
;; invocations. For example in the expression (which happens within defun
;; 3-EXPAND-PAIR)
;;
;; `\(PCONS ~,a ~,d)
;;
;; the backquote is consumed by the top-level activation of READ. Backslash
;; forces the switch to 3-lisp readtable and call to 3-READ to handle the
;; rest of the expression. Within this 3-READ activation, the tilde forces
;; switch back to L=READTABLE and a call to READ to handle ",a". In Maclisp,
;; this second READ activation re-uses the backquote context established by
;; the top-level READ activation. Of all Common Lisp implementations that I
;; tried, only sbcl correctly handles this situation. Lisp Works and clisp
;; complain about "comma outside of backquote". In clisp,
;; clisp-2.49/src/io.d:read_top() explicitly binds BACKQUOTE-LEVEL to nil.</code></pre></div>
<p style="text-align: justify;">Among the Common Lisp implementations I tried, only <a href="https://www.sbcl.org/">sbcl</a> supports it properly. After reading the Common Lisp <a href="http://www.lispworks.com/documentation/common-lisp.html">Hyperspec</a>, I believe that it is Maclisp and sbcl that implement the specification correctly and that the other implementations are faulty.</p>
<h2 id="toc_7" style="text-align: justify;">Conclusion</h2>
<p style="text-align: justify;">Procedural Reflection in Programming Languages is, in spite of its age, a very interesting read. Not only does it contain an implementation of a refreshingly new and bold idea (it is not even immediately obvious that infinite reflective towers can be implemented at all, let alone with any reasonable degree of efficiency), it is based on an interplay between mathematics and programming: the model of computation is proposed and afterward implemented in <code>3-LISP</code>. Because the model is implemented in an actual running program, it has to be specified with extreme precision (which would make <a href="https://en.wikipedia.org/wiki/Alfred_Tarski">Tarski</a> and <a href="https://en.wikipedia.org/wiki/Jan_%C5%81ukasiewicz">Łukasiewicz</a> tremble), and any execution of the <code>3-LISP</code> interpreter validates the model. </p>
nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com2tag:blogger.com,1999:blog-5799246.post-9684649542895444282022-07-26T01:26:00.010+00:002022-07-27T05:34:30.449+00:00Treadmill<div style="text-align: justify;"><em>Treadmill</em> is a "real-time" in-place garbage collection algorithm
designed by H. Baker [<a href="#0">0</a>]. It is simple, elegant, efficient and surprisingly
little known. Speaking of which, Mr. Baker's Wikipedia <a href="https://en.wikipedia.org/wiki/Henry_Baker_(computer_scientist)">page</a>
rivals that of an obscure Roman decadent poet in scarcity of information.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The general situation of garbage collection is that there is a program (called
<em>a mutator</em> in this case) that allocates <em>objects </em>(that will also be called <i>nodes</i>) in a
<em>heap</em>, which is a pool of memory managed by the garbage collector. The
mutator can update objects to point to other earlier allocated objects so that
objects form a graph, possibly with cycles. The mutator can store pointers to
objects in some locations outside of the heap, for example in the stack or in
the registers. These locations are called <em>roots</em>.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The mutator allocates objects, but does not free them explicitly. It is the job
of the garbage collector to return unreachable objects, that is, the objects that
cannot be reached by following pointers from the roots, back to the allocation
pool.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">It is assumed that the collector, by looking at an object, can identify all
pointers to the heap stored in the object and that the collector knows all the
roots. If either of these assumptions does not hold, one needs a
<em>conservative collector</em> that can be implemented as a library for an
uncooperative compiler and run-time (<i>e.g.</i>, <a href="https://en.wikipedia.org/wiki/Boehm_garbage_collector">Boehm garbage
collector</a> for C and C++).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The earliest garbage collectors were part of <a href="https://en.wikipedia.org/wiki/Lisp_(programming_language)">Lisp</a>
run-time. Lisp programs tend to allocate a large number of <a href="https://en.wikipedia.org/wiki/Cons">cons cells</a> and organise them in
complex structures with cycles and sub-structure sharing. In fact, some of the
<a href="https://en.wikipedia.org/wiki/Lisp_machine">Lisp Machines</a> had
garbage collection implemented in hardware and allocated everything including
stack frames and binding environments in the heap. Even processor instructions
were stored as cons cells in the heap.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">To allocate a new object, the mutator calls <tt>alloc()</tt>. Treadmill is
"real-time" because the cost of <tt>alloc()</tt> in terms of processor cycles is
independent of the number of allocated objects and the total size of the heap,
in other words, <tt>alloc()</tt> is O(1) and this constant cost is not
high. This means garbage collection without "stop-the-world" pauses, at least as
long as the mutator does not genuinely exhaust the heap with reachable objects. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Treadmill is "in-place" because the address of an allocated object does not
change. This is in contrast with <em>copying</em> garbage collectors that can
move an object to a new place as part of the collection process (that implies some mechanism of updating the pointers to the moved object).</div><div style="text-align: justify;"><br /></div>
<div style="text-align: justify;">All existing garbage collection algorithms involve some form of scanning of
allocated objects and this scanning is usually described in terms of colours
assigned to objects. In the standard 3-colour scheme (introduced in [<a href="#3">3</a>] together
with the term "mutator"), <b><span style="background-color: black; color: white;">black</span></b> objects have been completely scanned together
with the objects they point to, <b style="background-color: #999999;">gray</b> objects have been scanned, but the
objects they point to are not guaranteed to be scanned and <b>white</b> objects
have not been scanned.
</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">For historical reasons, Baker's papers colour free (un-allocated) objects white
and use black-gray-<a href="https://en.wikipedia.org/wiki/Ecru">ecru</a> instead
of black-gray-white. We stick with <span style="background-color: #c2b280;">ecru</span>, at least to get a chance to learn a fancy word.</div><div style="text-align: justify;"><br /></div><div><div style="text-align: justify;">Consider the simplest case first:</div><ul><li style="text-align: justify;">the heap has a fixed size;</li><li style="text-align: justify;">the mutator is single-threaded;</li><li style="text-align: justify;">allocated objects all have the same size (like cons cells).</li></ul><div style="text-align: justify;">(All these restrictions will be lifted eventually.)</div></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The main idea of treadmill is that all objects in the heap are organised in a
cyclic double-linked list, divided by 4 pointers into 4 segments:</div><div style="text-align: justify;"><br /></div><div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><img height="463" src="https://raw.githubusercontent.com/nikitadanilov/nikitadanilov.github.io/master/treadmill/0-treadmill.png" width="517" /></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Figure 0: treadmill</b></td></tr></tbody></table><div class="separator" style="clear: both; text-align: center;"><br /></div></div><div style="text-align: justify;">Allocation of new objects happens at <tt>free</tt> (clockwise), scan advances at
<tt>scan</tt> (counterclockwise), and not-yet-scanned objects are between
<tt>bottom</tt> and <tt>top</tt> (the latter 2 terms, somewhat confusing for a cyclic
list of objects, are re-used from an earlier paper [<a href="#1">1</a>], where a copying real-time
garbage collector was introduced).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Remarkably, the entire description and the proof of correctness of the Treadmill
algorithm (and of many other similar algorithms) depend on a single invariant:</div><div style="text-align: justify;"><b><br /></b></div><div style="text-align: center;"><b>Invariant</b>: there are no pointers from black to ecru nodes.</div><div style="text-align: justify;"><br /></div><div><div style="text-align: justify;">That is, a black node can contain a pointer to another black node or to a gray
node. A non-black (that is, gray or ecru) node can point to any allocated node: black,
gray or ecru. An ecru node can be reached from a black node only through at least one intermediate gray node.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;"><div>Let's for the time being postpone the explanation of why this invariant is important and instead discuss the usual 2 issues that any invariant introduces: how to establish it and how to maintain it.</div><div><br /></div><div>Establishing is easy:</div><div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><img height="470" src="https://raw.githubusercontent.com/nikitadanilov/nikitadanilov.github.io/master/treadmill/4-init.png" width="584" /></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Figure 1: initial heap state</b></td></tr></tbody></table><div style="text-align: center;"><br /></div></div><div>In the initial state, all objects are un-allocated (white), except for the roots that are gray. The invariant is satisfied trivially because there are no black objects.</div><div><br /></div>After some allocations by the mutator and scanning, the heap looks like the one in
Figure 0. A call to <tt>alloc()</tt> advances the <tt>free</tt> pointer clockwise,
thus moving one object from the <tt>FREE</tt> to the <tt>SCANNED</tt> part of the
heap. There is no need to update double-linked list pointers within the
allocated object and, as we will see, there is no need to change the object
colour. This makes the allocation fast path very quick: just a single pointer
update: <tt>free := free.next</tt>.
<div><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><img height="456" src="https://raw.githubusercontent.com/nikitadanilov/nikitadanilov.github.io/master/treadmill/1-alloc.png" width="538" /></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Figure 2: alloc()</b></td></tr></tbody></table></div><div><br /><br /></div><div><br /></div></div><div style="text-align: justify;">Allocation cannot violate the invariant, because the newly allocated object does
not point to anything. In addition to calls to <tt>alloc()</tt> the mutator can
read pointer fields from nodes it already reached and update fields of reachable
nodes to point to other reachable nodes. There is no pointer arithmetic
(otherwise a conservative collector is needed). A reachable node is either
black, gray or ecru, so it seems, at first sight, that the only way the
mutator can violate the invariant is by setting a field in a black object to
point to an ecru object. This is indeed the case with some collection algorithms
(called "gray mutator algorithms" in [<a href="#2">2</a>]). Such algorithms use
a <em>write barrier</em>, which is a special code inserted by the compiler
before (or instead of) updating a pointer field. The simplest write barrier
prevents a violation of the 3-colour invariant by graying the ecru target object if necessary:</div></div><div><div style="text-align: justify;"><br /></div>
<div style="background: #f0f0f0; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><code class="language-c">writebarrier(obj, field, target) {
        obj.field := target;
        if black(obj) && ecru(target) {
                darken(target);
        }
}

darken(obj) { /* Make an ecru object gray. */
        assert ecru(obj);
        unlink(obj);    /* Remove the object from the treadmill list. */
        link(top, obj); /* Put it back at the tail of the gray list. */
}</code></pre></div>
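The <tt>unlink()</tt> and <tt>link()</tt> operations that <tt>darken()</tt> relies on are both constant-time pointer splices, because every treadmill object lives on a single cyclic doubly-linked list. A minimal C sketch of that list surgery follows; the struct layout and function names are mine, not taken from Baker's paper.

```c
/*
 * Each treadmill object carries two links and sits on one cyclic
 * doubly-linked list; graying an object is two O(1) splices.
 * Illustrative sketch, not Baker's original code.
 */
struct object {
        struct object *next;    /* clockwise neighbour */
        struct object *prev;    /* counter-clockwise neighbour */
};

/* Remove obj from the cyclic list; its own links are left dangling
 * until it is re-linked. */
static void unlink_obj(struct object *obj)
{
        obj->prev->next = obj->next;
        obj->next->prev = obj->prev;
}

/* Insert obj immediately after pos. With pos == top, this places obj at
 * the tail of the gray FRONT section, which is what darken() needs. */
static void link_obj(struct object *pos, struct object *obj)
{
        obj->next = pos->next;
        obj->prev = pos;
        pos->next->prev = obj;
        pos->next = obj;
}
```

Because graying is just these two splices plus a re-interpretation of colour, <tt>darken()</tt> is constant-time regardless of heap size.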
<br/><div style="text-align: justify;">More sophisticated write barriers were studied that make use of the old value
of <tt>obj.field</tt> or are integrated with the virtual memory sub-system; see [<a href="#2">2</a>] for details.
In our case, however, when the mutator reads a pointer field of an object, it
effectively stores the read value in a register (or in a stack frame slot) and
in Treadmill, registers can be black (Treadmill is a "black mutator
algorithm"). That is, the mutator can violate the invariant simply by reading
a pointer to an ecru object into a black register. To prevent this, a <em>read
barrier</em> is needed, executed on every read of a pointer field:</div><div style="text-align: justify;"><br /></div>
<div style="background: #f0f0f0; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><code class="language-c">readbarrier(obj, field) {
        if ecru(obj) {
                darken(obj);
        }
        return obj.field;
}
</code></pre>
</div>
<div><br /></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><img border="0" data-original-height="676" data-original-width="800" height="425" src="https://nikitadanilov.github.io/treadmill/3-barrier.png" width="502" />
</td>
</tr>
<tr><td class="tr-caption" style="text-align: center;"><b>Figure 3: read barrier<br /></b><br /></td></tr></tbody></table>
<span style="text-align: justify;">When a black or gray object is read, the read barrier leaves it in place. When an ecru
object is read, the barrier un-links the object from the treadmill list (effectively
removing it from the TOSCAN section) and re-links it to the treadmill either at
</span><tt style="text-align: justify;">top</tt><span style="text-align: justify;"> or at </span><tt style="text-align: justify;">scan</tt><span style="text-align: justify;">, thus making it gray. This barrier guarantees
that the mutator cannot violate the invariant simply because the mutator never
sees ecru objects (which are grayed by the barrier) and hence cannot store
pointers to them anywhere. If the read barrier is present, the write barrier is not
necessary.</span></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">That's how the invariant is established and maintained by the mutator. We still
haven't discussed how the collector works and where these mysterious ecru
objects come from. The collector is very simple: it has a single entry point:</div><div><div style="text-align: justify;"><br /></div><div style="background: #f0f0f0; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><code class="language-c">advance() { /* Scan the object pointed to by "scan". */
        for field in pointers(scan) {
                if ecru(scan.field) {
                        darken(scan.field);
                }
        }
        scan := scan.prev; /* Make it black. */
}
</code></pre></div>
<tt><div><tt><br /></tt></div></tt><div><tt><tt>advance() </tt></tt>takes the gray object pointed to by <tt>scan</tt>, which is the head of the FRONT list, and grays all ecru objects that this object points to. After that, <tt>scan</tt> is advanced (counterclockwise), effectively moving the scanned object into the SCANNED section and making it black.<br /><tt><br /></tt></div><div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://nikitadanilov.github.io/treadmill/2-scan.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="723" data-original-width="800" height="452" src="https://nikitadanilov.github.io/treadmill/2-scan.png" width="500" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Figure 4: advance()</b></td></tr></tbody></table><br /><tt><br /></tt></div><div><tt><br /></tt></div><div style="text-align: justify;"> It's not important for now how
and when exactly <tt>advance()</tt> is called. What matters is that it blackens
an object while preserving the invariant. </div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Now comes the crucial part. An allocated object only darkens: the mutator
(<tt>readbarrier()</tt>) and the collector (<tt>advance()</tt>) can gray an ecru
object and <tt>advance()</tt> blackens a gray object. There is no way for a
black object to turn gray or for a gray object to turn ecru. Hence, the total
number of allocated non-black objects never increases. But <tt>advance()</tt>
always blackens one object, which means that after some number of calls
(interspersed with arbitrary mutator activity), <tt>advance()</tt> will run out
of objects to process: the FRONT section will be empty and there will be no gray
objects anymore:</div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: justify;"><tbody><tr><td style="text-align: center;"><a href="https://nikitadanilov.github.io/treadmill/5-nogray.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="737" data-original-width="800" height="472" src="https://nikitadanilov.github.io/treadmill/5-nogray.png" width="512" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Figure 5: no gray objects</b></td></tr></tbody></table><br /><div style="text-align: justify;">All roots were originally gray and could only darken, so they are now black. And
an ecru object is reachable from a black object only through a gray object, but
there are no gray objects, so ecru objects are not reachable from roots—they
are garbage. This completes the collection cycle and, in principle, it is
possible to move all ecru objects to the FREE list at that point and start the next
collection cycle. But we can do better. Instead of replenishing
the FREE list, wait until all objects are allocated and the FREE list is empty:</div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://nikitadanilov.github.io/treadmill/6-end.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="751" data-original-width="800" height="457" src="https://nikitadanilov.github.io/treadmill/6-end.png" width="487" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Figure 6: neither gray nor white<br /><br /><br /></b></td></tr></tbody></table><br />
Only black and ecru objects remain. <em>Flip</em> them: swap <tt>top</tt> and
<tt>bottom</tt> pointers and redefine colours: the old black objects are now
ecru and the old ecru objects (remember they are garbage) are now white:<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://nikitadanilov.github.io/treadmill/7-flip.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="769" data-original-width="800" height="457" src="https://nikitadanilov.github.io/treadmill/7-flip.png" width="476" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Figure 7: flip</b></td></tr></tbody></table><br />
The next collection cycle starts: put the roots between <tt>top</tt> and <tt>scan</tt> so that they are the new FRONT:<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://nikitadanilov.github.io/treadmill/8-again.png" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="652" data-original-width="800" height="413" src="https://nikitadanilov.github.io/treadmill/8-again.png" width="506" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><b>Figure 8: new cycle</b></td></tr></tbody></table><br /><div style="text-align: justify;">From this point <tt>alloc()</tt> and <tt>advance()</tt> can continue as before.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Note that <tt>alloc()</tt>, <tt>advance()</tt> and <tt>readbarrier()</tt> do not
actually have to know an object's colour. They only need to be able to tell an ecru
(allocated) object from a non-ecru one, so one bit of information per object is
required. By waiting until the FREE list is empty and re-defining colours, the Treadmill
avoids the need to scan the objects and change their colours at the end of a
collection cycle: it is completely O(1).</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">One piece of the puzzle is still missing: how is it guaranteed that the
collection is completed before the FREE list is empty? If the mutator runs out of
free objects before the collection cycle is completed, then the only option is to
force the cycle to completion by calling <tt>advance()</tt> repeatedly until
there are no more gray objects and then flip, but that's a stop-the-world
situation.
The solution is to call <tt>advance()</tt> from within <tt>alloc()</tt>,
thus guaranteeing scan progress. Baker proved that if <tt>advance()</tt> is called
<tt>k</tt> times for each <tt>alloc()</tt> call, then the algorithm never runs
out of free objects, provided that the total heap size is at least <tt>R*(1 +
1/k)</tt> objects, where <tt>R</tt> is the number of reachable objects.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">This completes the Treadmill description. </div></div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">The algorithm is very flexible. First, the restriction of a single-threaded
mutator is not really important: as long as <tt>alloc()</tt>, <tt>advance()</tt>, <tt>readbarrier()</tt>
and flip are mutually exclusive, no further constraints on concurrency are
necessary. The mutator can be multi-threaded. The collector can be
multi-threaded. <tt>advance()</tt> can be called "synchronously" (from
<tt>alloc()</tt>), explicitly from the mutator code or "asynchronously" from the
dedicated collector threads. A feedback-based method can regulate the frequency
of calls to <tt>advance()</tt> depending on the amount of free and garbage
objects. <tt>alloc()</tt> can penalise heavy-allocating threads, forcing
them to do most of the scanning, <i>etc</i>.</div><div style="text-align: justify;"><br /></div>
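To put numbers on Baker's bound quoted above, here is a throwaway helper (not part of any real collector) that computes the minimal heap size <tt>R*(1 + 1/k)</tt>, rounded up to whole objects:

```c
#include <stddef.h>

/* Minimum heap size (in objects) demanded by Baker's bound R*(1 + 1/k),
 * where R is the maximum number of reachable objects and k is the number
 * of advance() calls per alloc(). Illustrative helper; rounds R/k up. */
static size_t heap_bound(size_t R, size_t k)
{
        return R + (R + k - 1) / k;
}
```

With <tt>k = 1</tt> the heap must be twice the live set; raising <tt>k</tt> to 4 shrinks the headroom to 25%, at the price of four scan steps per allocation.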
<div style="text-align: justify;">Next, when an object is grayed by <tt>darken()</tt>, all that matters is that the
object is placed in the FRONT section. If <tt>darken()</tt> places the object
next to <tt>top</tt>, then FRONT acts as a FIFO queue and the scan proceeds in
breadth-first order. If the object is placed next to <tt>scan</tt>, then the scan
proceeds in depth-first order, which might result in better locality of
reference and better performance of a heap in virtual memory. A multi-threaded collector can use multiple FRONT lists, <i>e.g.</i>, one per core and scan them
concurrently.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">New objects can be added to the heap at any time, by atomically linking them
somewhere in the FREE list. Similarly, a batch of objects can at any moment be
atomically released from the FREE list, with the usual considerations of
fragmentation-avoidance in the lower layer allocator.</div><div style="text-align: justify;"><br /></div><div><div style="text-align: justify;">Support for variable-sized objects requires a separate cyclic list for each size
(plus, perhaps an additional overflow list for very large objects). The
<tt>top</tt>, <tt>bottom</tt>, <tt>scan</tt> and <tt>free</tt> pointers become
arrays of pointers with an element for each size. If arbitrarily large objects
(<i>e.g.</i>, arrays) are supported then atomicity of <tt>advance()</tt> will
require additional work: large objects need to be multi-coloured and will blacken gradually.</div><div style="text-align: justify;"><br /></div><div style="text-align: justify;">Forward and backward links to the cyclic list can be embedded in the object
header or they can be stored separately; the latter might improve cache
utilisation by the scanner.</div>
<h2 style="text-align: justify;"><tt><br /></tt></h2><h2 style="text-align: justify;"><tt>References</tt></h2>
<a name="0"></a>[<b>0</b>] The Treadmill: Real-Time Garbage Collection Without Motion Sickness
<a href="https://dl.acm.org/doi/pdf/10.1145/130854.130862">PDF (subscription)</a>,
<a href="https://www.plover.com/~mjd/misc/hbaker-archive/NoMotionGC.ps.Z">Postscript</a> </div><div><a name="1"></a>[<b>1</b>] List Processing in Real Time on a Serial Computer
<a href="https://dspace.mit.edu/bitstream/handle/1721.1/41976/AI_WP_139.pdf">PDF</a> </div><div><a name="2"></a>[<b>2</b>] The Garbage Collection Handbook. The Art of Automatic Memory Management
<a href="https://gchandbook.org/">gchandbook.org</a> </div><div><a name="3"></a>[<b>3</b>] On-the-Fly Garbage Collection: An Exercise in Cooperation
<a href="https://lamport.azurewebsites.net/pubs/garbage.pdf">PDF</a>
</div>
nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com2tag:blogger.com,1999:blog-5799246.post-72165497160963286202021-02-13T22:55:00.001+00:002021-02-13T23:16:58.532+00:00360 years later or „Скрещенья ног“In 1896 <a href="https://en.wikipedia.org/wiki/Paul_Gauguin">Paul Gauguin</a> completed <i><a href="https://fineartamerica.com/featured/te-arii-vahine-the-king-s-wife-paul-gauguin.html">Te Arii Vahine</a></i> (<i>The King’s Wife</i>):
<center><img width="800" src="https://upload.wikimedia.org/wikipedia/commons/d/df/Gauguin_La_donna_dei_manghi.jpg" border=0></center>
Among the many similar paintings of his Tahitian period, this one, together with a couple of preparatory watercolours, is distinguished by the artificial placement of the legs, which can be characterised in Russian by the equally forced line (quoted in this article's title) from a certain universally acclaimed poem. This strange posture is neither a deficiency nor an artistic whim. It is part of a silent, subtle game played over centuries, where moves are echoes and the reward is some flickering form of immortality:
<center><img width="800" src="https://www.meisterdrucke.uk/kunstwerke/500px/Lucas%20Cranach%20%20the%20Elder%20-%20Diana%20Resting%20or%20The%20Nymph%20of%20the%20Fountain%201537%20%20-%20%28MeisterDrucke-84728%29.jpg" border=0></center>
This is <i><a href="https://www.meisterdrucke.uk/fine-art-prints/Lucas-Cranach-the-Elder/84728/Diana-Resting,-or-The-Nymph-of-the-Fountain,-1537-.html">Diana Resting</a></i>, by <a href="https://en.wikipedia.org/wiki/Lucas_Cranach_the_Elder">Cranach the Elder</a>, 1537. Let me just note the birds and leave the pleasure of finding other clues to the reader. Lucas Cranach (and this is more widely known) himself played a very similar game with <a href="https://en.wikipedia.org/wiki/Albrecht_D%C3%BCrer">Dürer</a>.
<p>By sheer luck, the first painting is just a few kilometres away from me, in the Pushkin Museum.</p>nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com0tag:blogger.com,1999:blog-5799246.post-21257939860329151152020-11-11T08:27:00.001+00:002020-11-11T08:27:27.891+00:00A curious case of stacks and queues.<div align="justify">When studying computing science we all learn how to convert an expression in the "normal" ("<a href="https://en.wikipedia.org/wiki/Infix_notation">infix</a>", "algebraic") notation to "<a href="https://en.wikipedia.org/wiki/Reverse_Polish_notation">reverse Polish</a>" notation. For example, an expression "<code style="background: #e0e0e0">a*b + c*d</code>" is converted to "<code style="background: #e0e0e0">a b * c d * +</code>". An expression in reverse Polish notation can be seen as a program for <a href="https://en.wikipedia.org/wiki/Pushdown_automaton">a stack automaton</a>:
<div style="background: #f0f0f0; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><code class="language-c">PUSH A
PUSH B
MUL
PUSH C
PUSH D
MUL
ADD</code></pre></div>
<p>Where <code style="background: #e0e0e0">PUSH</code> pushes its argument onto the top of the (implicit) stack, while <code style="background: #e0e0e0">ADD</code> and <code style="background: #e0e0e0">MUL</code> pop the two top elements from the stack, perform the respective operation and push the result back.
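The program can be run mechanically. Below is a throwaway C transcription with concrete operand values; the explicit stack and the helper names are mine, purely for illustration.

```c
/* A tiny operand stack, just large enough for the example. */
static int stack[8];
static int top;

static void push(int v) { stack[top++] = v; }
static int  pop(void)   { return stack[--top]; }

/* Execute the stack program above: PUSH a, PUSH b, MUL, PUSH c, PUSH d,
 * MUL, ADD -- i.e. compute a*b + c*d. */
static int run(int a, int b, int c, int d)
{
        push(a);
        push(b);
        push(pop() * pop());
        push(c);
        push(d);
        push(pop() * pop());
        push(pop() + pop());
        return pop();
}
```

For instance, <code style="background: #e0e0e0">run(2, 3, 4, 5)</code> leaves a single value, 26, on the stack, which is 2*3 + 4*5.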
<p>For reasons that will be clearer anon, let's re-write this program as
<div style="background: #f0f0f0; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><code class="language-c">Container c;
c.put(A);
c.put(B);
c.put(c.get() * c.get());
c.put(C);
c.put(D);
c.put(c.get() * c.get());
c.put(c.get() + c.get());</code></pre></div>
<p>Where <code style="background: #e0e0e0">Container</code> is the type of stacks, <code style="background: #e0e0e0">c.put()</code> pushes the element on the top of the stack and <code style="background: #e0e0e0">c.get()</code> pops and returns the top of the stack. The <a href="https://en.wikipedia.org/wiki/LIFO">LIFO</a> discipline of stacks is so widely used (implemented natively on all modern processors, built into programming languages in the form of the call stack) that one never asks whether a different method of evaluating expressions is possible.
<p>Here is a problem: find a way to translate infix notation to a program for a queue automaton, that is, in a program like the one above, but where <code style="background: #e0e0e0">Container</code> is the type of <a href="https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics)">FIFO</a> <a href="https://en.wikipedia.org/wiki/Queue_(abstract_data_type)">queues</a> with <code style="background: #e0e0e0">c.put()</code> enqueuing an element at the rear of the queue and <code style="background: #e0e0e0">c.get()</code> dequeuing at the front. This problem was <a href="https://www.cs.utexas.edu/users/EWD/ewd08xx/EWD887.PDF">reportedly</a> solved by <a href="https://en.wikipedia.org/wiki/Jan_L._A._van_de_Snepscheut">Jan L.A. van de Snepscheut</a> sometime during spring 1984.
<p>While you are thinking about it, consider the following tree-traversal code (in some abstract imaginary language):
<div style="background: #f0f0f0; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><code class="language-c">walk(Treenode root) {
        Container todo;
        todo.put(root);
        while (!todo.is_empty()) {
                next = todo.get();
                visit(next);
                for (child in next.children) {
                        todo.put(child);
                }
        }
}</code></pre></div>
<p>Where <code style="background: #e0e0e0">node.children</code> is the list of the node's children, suitable for iteration by a <code style="background: #e0e0e0">for</code> loop.
<p>Convince yourself that if <code style="background: #e0e0e0">Container</code> is the type of stacks, tree-walk is depth-first. And if <code style="background: #e0e0e0">Container</code> is the type of queues, tree-walk is breadth-first. Then, convince yourself that a depth-first walk of the parse tree of an infix expression produces the expression in Polish notation (unreversed) and its breadth-first walk produces the expression in "queue notation" (that is, the desired program for a queue automaton). Isn't it marvelous that traversing a parse tree with a stack container gives you the program for stack-based execution and traversing the same tree with a queue container gives you the program for queue-based execution?
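The observation is easy to check by machine. Here is a C sketch of the same <code style="background: #e0e0e0">walk()</code>, with the container's discipline selected by a flag; the array-backed container and all names are mine. Note that because the loop puts children left-to-right, the stack variant visits each node's children right-to-left; putting them in reverse order would produce the textbook Polish form <code style="background: #e0e0e0">+ * a b * c d</code>.

```c
/* Parse-tree node: a label and up to two children (illustrative). */
struct tnode {
        char          label;
        int           nr;
        struct tnode *child[2];
};

/* One container, two disciplines: get() pops (stack) or dequeues (queue). */
struct container {
        struct tnode *item[32];
        int           head;
        int           tail;
};

static void put(struct container *c, struct tnode *n)
{
        c->item[c->tail++] = n;
}

static struct tnode *get(struct container *c, int lifo)
{
        return lifo ? c->item[--c->tail] : c->item[c->head++];
}

/* The walk() from the post; visit() is replaced by appending the label. */
static void walk(struct tnode *root, int lifo, char *out)
{
        struct container todo = { .head = 0, .tail = 0 };

        put(&todo, root);
        while (todo.head != todo.tail) {
                struct tnode *next = get(&todo, lifo);

                *out++ = next->label;
                for (int i = 0; i < next->nr; i++)
                        put(&todo, next->child[i]);
        }
        *out = '\0';
}
```

For the parse tree of <code style="background: #e0e0e0">a*b + c*d</code>, the queue walk yields <code style="background: #e0e0e0">+ * * a b c d</code> (the breadth-first "queue notation" order), while the stack walk yields <code style="background: #e0e0e0">+ * d c * b a</code>.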
<p>I feel that there is something deep behind this. <a href="https://en.wikipedia.org/wiki/Alexander_Stepanov">A. Stepanov</a> had an intuition (which cost him <a href="http://www.stlport.org/resources/StepanovUSA.html">dearly</a>) that <em>algorithms are defined on algebraic structures</em>. The elegant interconnection between queues and stacks on the one hand and tree-walks and automaton programs on the other tells us that the correspondence between algorithms and structures goes in both directions.
</div>nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com0tag:blogger.com,1999:blog-5799246.post-51749374821328282542020-10-14T22:27:00.001+00:002020-10-14T22:30:39.450+00:00Unexpected isomorphism<div dir="ltr" style="text-align: left;" trbidi="on"><script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script><br /><div align="justify">
<p>Since Cantor's "I see it, but I cannot believe it" (1877), we know that \(\mathbb{R}^n\) are isomorphic sets for all \(n > 0\). This being as shocking as it is, over time we learn to live with it, because the bijections between continua of different dimensions are extremely discontinuous and we assume that if we limit ourselves to any reasonably well-behaving class of maps the isomorphisms will disappear. Will they?</p>
<p><b>Theorem</b>. Additive groups \(\mathbb{R}^n\) are isomorphic for all \(n > 0\) (and, therefore, isomorphic to the additive group of the complex numbers).</p>
<p><b>Proof</b>. Each \(\mathbb{R}^n\) is a vector space over the rationals. Assuming the axiom of choice, any vector space has a basis. By simple cardinality considerations, the cardinality of a basis of \(\mathbb{R}^n\) over \(\mathbb{Q}\) is the same as the cardinality of \(\mathbb{R}^n\). Therefore all \(\mathbb{R}^n\) have the same dimension over \(\mathbb{Q}\) and hence are isomorphic as vector spaces and as additive groups. <b>End of proof</b>.</p>
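The cardinality step can be written out explicitly. For an infinite-dimensional vector space over a countable field, the cardinality of the space equals the cardinality of any of its bases, so

```latex
% Dimension count over the rationals (sketch): for an infinite-dimensional
% vector space V over a countable field, |V| = dim V, hence
\[
  \dim_{\mathbb{Q}} \mathbb{R}^{n}
  \;=\; \left|\mathbb{R}^{n}\right|
  \;=\; \left(2^{\aleph_{0}}\right)^{n}
  \;=\; 2^{\aleph_{0}}
  \qquad \text{for all } n > 0,
\]
```

and any bijection between bases of \(\mathbb{R}^n\) and \(\mathbb{R}^m\) extends to a \(\mathbb{Q}\)-linear bijection.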
<p>This means that for any \(n, m > 0\) there are bijections \(f : \mathbb{R}^n \to \mathbb{R}^m\) such that \(f(a + b) = f(a) + f(b)\) and, necessarily, \(f(p\cdot a + q\cdot b) = p\cdot f(a) + q\cdot f(b)\) for all rational \(p\) and \(q\).</p>
<p>I feel that this should be highly counter-intuitive for anybody who internalised the Cantor result, or, maybe, especially to such people. The reason is that intuitively there are many more continuous maps than algebraic homomorphisms between the "same" pair of objects. Indeed, the formula defining continuity has the form \(\forall x\forall\epsilon\exists\delta\forall y P(x, \epsilon, \delta, y)\) (a local property), while homomorphisms are defined by \(\forall x\forall y Q(x, y)\) (a stronger global property). Because of this, topological categories have much denser lattices of sub- and quotient-objects than algebraic ones. From this one would expect that as there are no isomorphisms (continuous bijections) between continua of different dimensions, there definitely should be no homomorphisms between them. Yet there they are.
</div>
</div>
nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com0tag:blogger.com,1999:blog-5799246.post-9416899217640585242019-01-27T22:25:00.002+00:002023-03-27T06:50:03.126+00:00Why Go is Not My Favorite Programming Language<div dir="ltr" style="text-align: left;" trbidi="on"><div dir="ltr" style="text-align: left;" trbidi="on"><div style="text-align: justify;"><i>Disclaimer</i>: this article shares very little except the title with the classical <a href="https://www.lysator.liu.se/c/bwk-on-pascal.html">Why Pascal is Not My Favorite Programming Language</a>. No attempt is made to analyse Go in any systematic fashion. To the contrary, the focus is on one particular, if grave, issue. Moreover, the author happily admits that his experience with Go programming is very limited.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Go is a system programming language and a large fraction of system software is processing of incoming <em>requests</em> of some sort, for example:</div><ul><li style="text-align: justify;">[KERNEL] an OS kernel processes system calls;</li>
<li style="text-align: justify;">[SERVER] a server processes requests received over network or IPC;</li>
<li style="text-align: justify;">[LIB] a library processes invocations of its entry points.</li>
</ul><div style="text-align: justify;">A distinguishing feature of system software is that it should be resilient against abnormal conditions in the environment, such as network communication failures, storage failures, etc. Of course, there are practical limits to such resilience, and it is very difficult to construct software that operates correctly in the face of undetected processor or memory failures (although such systems <a href="https://en.wikipedia.org/wiki/Tandem_Computers">were built</a> in the past), but it is generally agreed that system software should handle a certain class of failures to be usable as the foundation of a software stack. We argue that Go is not suitable for system programming because it cannot deal with one of the most important failures in this class: memory allocation errors.</div><br />
<div style="text-align: justify;">Out of many existing designs of failure handling (exceptions, <a href="https://users.ece.cmu.edu/~koopman/des_s99/sw_fault_tolerance/">recovery </a><a href="https://users.ece.cmu.edu/~koopman/des_s99/sw_fault_tolerance/">blocks</a>, <i>etc</i>.) Go exclusively selects explicit error checking with a simple <a href="https://golang.org/ref/spec#Handling_panics">panic-recovery </a>mechanism. This makes sense, because this is the only design that works in all the use-cases mentioned above. However, memory allocation errors do not produce checkable errors in Go. The language specification does not even <a href="https://golang.org/ref/spec#Allocation">mention a possibility of allocation failure</a> and in the discussions of these issues (see e.g., <a href="https://github.com/golang/go/issues/243">here</a> and <a href="https://github.com/golang/go/issues/14162">here</a>) Google engineers adamantly refuse to consider adding an interface to intercept memory allocation errors. Instead, various methods to warn the application that memory is "about to be exhausted" are proposed. These methods, of course, only reduce the probability of running out of memory, but never eliminate it (thus making bugs in the error handling code more difficult to test for). As one can easily check by running a simple program that allocates all available memory, a memory allocation error results in unconditional program termination, rather than a recoverable panic.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">But even if a way to check for allocation errors or recover from them were added, it would not help, because Go often allocates memory behind the scenes, so that there is no point in the program source where a check could be made. For example, memory is allocated whenever a struct is used as an interface:</div><div style="text-align: justify;"><br />
</div><div style="background: #f0f0f0; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%">package main
type foo interface {
        f() int
}

type bar struct {
        v int
}

func out(s foo) int {
        return s.f() - 1
}

func (self bar) f() int {
        return self.v + 1
}

func main() {
        for {
                out(bar{})
        }
}
</pre></div>
<br><div style="text-align: justify;">The program above contains no explicit memory allocations, still, it allocates a lot of memory. The assembly output (use <a href="https://go.godbolt.org/">godbolt.org</a> for example) for <tt>out(bar{})</tt> contains a call to <tt>runtime.convT64()</tt> (see the <a href="https://github.com/golang/go/blob/master/src/runtime/iface.go">source</a>) that calls <a href="https://github.com/golang/go/blob/master/src/runtime/malloc.go" style="text-align: left;">mallocgc</a><span style="text-align: left;">.</span></div><br />
<div style="background: #f0f0f0; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%">func convT64(val uint64) (x unsafe.Pointer) {
        if val == 0 {
                x = unsafe.Pointer(&zeroVal[0])
        } else {
                x = mallocgc(8, uint64Type, false)
                *(*uint64)(x) = val
        }
        return
}
</pre></div><br />
To summarise, the combination of the following reasons makes Go unsuitable for<br />
the construction of reliable system software:<br />
<ul><li>it is not, in general, possible to guarantee that memory allocation will always succeed. For example, in the [LIB] case, other parts of the process or other processes can exhaust all the available memory. Pre-allocating memory for the worst case is impractical except in the simplest cases;</li>
<li>due to the design of Go runtime and the implementation of the fundamental language features like interfaces, it is not possible to reliably check for memory allocation errors;</li>
<li>software that can neither prevent nor resolve memory allocation errors is unreliable. For example, a library that, when called, crashes the entire process because some other process has allocated all available memory cannot be used as a foundation for reliable software.</li>
</ul></div><a href="https://www.blogger.com/null"></a></div>nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com0tag:blogger.com,1999:blog-5799246.post-51177804896984654762018-03-09T21:58:00.001+00:002018-03-09T22:06:44.593+00:00Cographs, cocounts and coconuts.<div dir="ltr" style="text-align: left;" trbidi="on"><script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script><br />
<div style="-webkit-text-stroke-width: 0px; color: black; font-family: Rubik; font-size: medium; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: justify; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;"></div><br />
<blockquote class="tr_bq" style="-webkit-text-stroke-width: 0px; color: black; font-family: Rubik; font-size: medium; font-style: normal; font-variant-caps: normal; font-variant-ligatures: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: justify; text-decoration-color: initial; text-decoration-style: initial; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px;"><div style="margin: 0px;"><span style="font-size: xx-small;"><i>Abstract</i>: The dual of the familiar construction of the graph of a function is considered. The symmetry between graphs and cographs can be extended to a surprising degree.</span></div></blockquote><div style="text-align: justify;">Given a function \(f : A \rightarrow B\), the <em><a href="https://en.wikipedia.org/wiki/Graph_of_a_function">graph of f</a></em> is defined as $$f^* = \{(x, f(x)) \mid x \in A\}.$$ In fact, within the <a href="https://en.wikipedia.org/wiki/Zermelo%E2%80%93Fraenkel_set_theory">ZFC</a> framework, functions are <em>defined</em> as graphs. A graph is a subset of the Cartesian product \(A \times B\). One might want to associate to \(f\) a dual <em>cograph</em> object: a certain quotient set of the disjoint sum \(A \sqcup B\), which would uniquely identify the function. To understand the structure of the cograph, define the graph of a morphism \(f : A \rightarrow B\) in a <a href="https://en.wikipedia.org/wiki/Category_(mathematics)">category</a> with suitable products as a fibred product:<br />
\(\require{AMScd}\)\begin{CD}<br />
f^* @>\pi_2>> B\\<br />
@V \pi_1 V V @VV 1_B V\\<br />
A @>>f> B<br />
\end{CD}In the category of sets this gives the standard definition. The cograph can be defined by a dual construction as a push-out:<br />
\begin{CD}<br />
A @>1_A>> A\\<br />
@V f V V @VV j_1 V\\<br />
B @>>j_2> f_*<br />
\end{CD}Expanding this in the category of sets gives the following definition:<br />
$$f_* = (A \sqcup B) / \pi_f,$$<br />
where \(\pi_f\) is the reflexive transitive closure of a relation \(\theta_f\) given by (assuming in the following, without loss of generality, that \(A\) and \(B\) are disjoint)<br />
$$x\overset{\theta_f}{\sim} y \equiv y = f(x)$$<br />
That is, \(A\) is partitioned by \(\pi_f\) into subsets which are inverse images of elements of \(B\) and to each such subset the element of \(B\) which is its image is glued. This is somewhat similar to the <a href="https://en.wikipedia.org/wiki/Mapping_cylinder">mapping cylinder</a> construction in topology. Some similarities between graphs and cographs are immediately obvious. For graphs: $$\forall x\in A\; \exists! y\in B\; (x, y)\in f^*$$ $$f(x) = \pi_2((\{x\}\times B) \cap f^*)$$ $$f(U) = \{y \mid y = f(x) \wedge x\in U \} = \pi_2((U\times B)\cap f^*)$$ where \(x\in A\) and \(U\subseteq A\). Similarly, for cographs: $$\forall x\in A\; \exists! y\in B\; [x] = [y]$$ $$f(x) = [x] \cap B$$ $$f(U) = (\bigcup [U])\cap B$$ where \([x]\) is the equivalence class of \(x\) w.r.t. \(\pi_f\) and \([U] = \{[x] \mid x \in U\}\). For inverse images: $$f^{-1}(y) = \pi_1((A \times \{y\}) \cap f^*) = [y] \cap A$$ $$f^{-1}(S) = \pi_1((A \times S) \cap f^*) = (\bigcup [S])\cap A$$ where \(y\in B\) and \(S\subseteq B\).<br />
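<br />
As a concrete illustration (a small example added here, not part of the original argument): let \(A = \{1, 2, 3\}\), \(B = \{a, b\}\), \(f(1) = f(2) = a\), \(f(3) = b\). Then $$f^* = \{(1, a), (2, a), (3, b)\}$$ while $$f_* = \{\{1, 2, a\}, \{3, b\}\}$$ as a partition of \(A \sqcup B\): each block is a fibre of \(f\) glued to its image. The formulas above can then be checked directly, e.g., \(f^{-1}(a) = [a] \cap A = \{1, 2\}\) and \(f(\{2, 3\}) = (\bigcup [\{2, 3\}]) \cap B = \{a, b\}\).<br />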
<br />
A graph can be expressed as $$f^* = \bigcup_{x \in A}(x, f(x))$$ To write out a similar representation of a cograph, we have to recall some elementary facts about <a href="https://en.wikipedia.org/wiki/Equivalence_relation">equivalence relations</a>.<br />
<br />
Given a set \(A\), let \(Eq(A) \subseteq Rel(A) = P(A \times A)\) be the set of equivalence relations on \(A\). For a relation \(\pi \subseteq A \times A\), we have $$\pi \in Eq(A) \;\; \equiv \;\; \pi^0 = \Delta \subseteq \pi \; \wedge \; \pi^n = \pi, n \in \mathbb{Z}, n \neq 0.$$ To each \(\pi\) corresponds a surjection \(A \twoheadrightarrow A/\pi\). Assuming the axiom of choice (in the form "<a href="https://math.stackexchange.com/questions/1206013/every-epimorphism-in-sets-is-split-why-is-it-equivalent-to-axiom-of-choice">all epics split</a>"), an endomorphism \(A \twoheadrightarrow A/\pi \rightarrowtail A\) can be assigned (non-uniquely) to \(\pi\). It is easy to check that this gives \(Eq(A) = End(A) / Aut(A)\), where \(End(A)\) and \(Aut(A)\) are the monoid and the group of set endomorphisms and automorphisms respectively, with composition as the operation (\(Aut(A)\) is not, in general, normal in \(End(A)\), so \(Eq(A)\) does not inherit any useful operation from this quotient set representation). In addition to the monoid structure (w.r.t. composition) that \(Eq(A)\) inherits from \(Rel(A)\), it is also a <a href="https://en.wikipedia.org/wiki/Lattice_(order)">lattice</a> with infimum and supremum given by $$\pi \wedge \rho = \pi \cap \rho$$ $$\pi \vee \rho = \mathtt{tr}(\pi \cup \rho) = \bigcup_{n \in \mathbb{N}}(\pi \cup \rho)^n$$ For a subset \(X\subseteq A\) define an equivalence relation \(e(X) = \Delta_A \cup (X\times X)\), so that $$x\overset{e(X)}{\sim} y \equiv x = y \vee \{x, y\} \subseteq X$$ (Intuitively, \(e(X)\) collapses \(X\) to one point.) 
It is easy to check that $$f_* = \bigvee_{x \in A}e(\{x, f(x)\})$$ which is the desired dual of the graph representation above.</div></div>nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com0tag:blogger.com,1999:blog-5799246.post-26403346277935640032017-12-18T22:10:00.000+00:002017-12-18T22:10:45.102+00:00Conatus trimetro iambico.<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="background-color: white; color: #1d2129; font-family: Telex, sans-serif; font-size: 14px; letter-spacing: -0.12px; margin-bottom: 6px;">
<span style="letter-spacing: -0.12px;"><i>Aloof as stardust rains</i></span></div>
<div style="background-color: white; color: #1d2129; font-family: Telex, sans-serif; font-size: 14px; letter-spacing: -0.12px; margin-bottom: 6px;">
<span style="letter-spacing: -0.12px;"><i>Are memory dim prints</i></span></div>
<div style="background-color: white; color: #1d2129; font-family: Telex, sans-serif; font-size: 14px; letter-spacing: -0.12px; margin-bottom: 6px; margin-top: 6px;">
<i>Eliding tensed face</i></div>
<div style="background-color: white; color: #1d2129; font-family: Telex, sans-serif; font-size: 14px; letter-spacing: -0.12px; margin-bottom: 6px; margin-top: 6px;">
<span class="text_exposed_show" style="display: inline; font-family: inherit;"><i>By shadows within of</i></span></div>
<div style="background-color: white; color: #1d2129; font-family: Telex, sans-serif; font-size: 14px; letter-spacing: -0.12px; margin-bottom: 6px; margin-top: 6px;">
<span class="text_exposed_show" style="display: inline; font-family: inherit;"><i>All conquering space that</i></span></div>
<div style="background-color: white; color: #1d2129; font-family: Telex, sans-serif; font-size: 14px; letter-spacing: -0.12px; margin-bottom: 6px; margin-top: 6px;">
<span class="text_exposed_show" style="display: inline; font-family: inherit;"><i>Inly trusts each friend</i></span></div>
<div style="background-color: white; color: #1d2129; font-family: Telex, sans-serif; font-size: 14px; letter-spacing: -0.12px; margin-bottom: 6px; margin-top: 6px;">
<span class="text_exposed_show" style="display: inline; font-family: inherit;"><i>Of madness—whose embrace</i></span></div>
<div style="background-color: white; color: #1d2129; font-family: Telex, sans-serif; font-size: 14px; letter-spacing: -0.12px; margin-bottom: 6px; margin-top: 6px;">
<span class="text_exposed_show" style="display: inline; font-family: inherit;"><i>Accept with no delays</i></span></div>
<div style="background-color: white; color: #1d2129; font-family: Telex, sans-serif; font-size: 14px; letter-spacing: -0.12px; margin-bottom: 6px; margin-top: 6px;">
<span class="text_exposed_show" style="display: inline; font-family: inherit;"><i>To take one final look</i></span></div>
<div style="background-color: white; color: #1d2129; font-family: Telex, sans-serif; font-size: 14px; letter-spacing: -0.12px; margin-bottom: 6px; margin-top: 6px;">
<span class="text_exposed_show" style="display: inline; font-family: inherit;"><i>Before you turn Rose ways.</i></span></div>
<div style="background-color: white; color: #1d2129; font-family: Telex, sans-serif; font-size: 14px; letter-spacing: -0.12px; margin-bottom: 6px; margin-top: 6px;">
<span class="text_exposed_show" style="display: inline; font-family: inherit;"><br /></span></div>
<div class="text_exposed_show" style="background-color: white; color: #1d2129; display: inline; font-family: Telex, sans-serif; font-size: 14px; letter-spacing: -0.12px;">
<div style="font-family: inherit; margin-bottom: 6px;">
A rather rare meter in English, but much easier once you let enjambments in.</div>
</div>
</div>
nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com0tag:blogger.com,1999:blog-5799246.post-54398795409584622482017-09-22T22:05:00.001+00:002017-09-22T23:02:27.905+00:00Translations.<div dir="ltr" style="text-align: left;" trbidi="on">
<blockquote>
Licence my roving hands, and let them go,<br />
Before, behind, between, above, below.<br />
O my America! my new-found-land<br />
<div style="text-align: right;">
-- J. Donne, 1633.</div>
</blockquote>
<blockquote>
Блуждающим рукам моим дай разрешенье,<br />
Пусти вперед, назад, промеж, и вверх, и вниз,<br />
О дивный новый мир, Америка моя!</blockquote>
The variant reading <i>reeing</i> instead of <i>roving</i> is even better.<br />
<br />
I hope that "О дивный новый мир" (<i>O brave new world</i>) is not entirely anachronistic.</div>
nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com0tag:blogger.com,1999:blog-5799246.post-88482507127666714982014-12-13T21:57:00.000+00:002014-12-17T12:33:23.348+00:00Zero everywhere.<div dir="ltr" style="text-align: left;" trbidi="on">
<script src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7sSbGi4zvDY2F4HbR0uDssmUCFc9szMj0KqqXlSlTtAaN8b9aTn5hYY5qbccWhPaaNEUX3GTv5xXURQjkRGrj0ZPC8AWhOpjJ4hvAUJxNLtGmRp1OqOul7pzRc4U-J5FKa_FWdQ/s1600/zero-limit.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7sSbGi4zvDY2F4HbR0uDssmUCFc9szMj0KqqXlSlTtAaN8b9aTn5hYY5qbccWhPaaNEUX3GTv5xXURQjkRGrj0ZPC8AWhOpjJ4hvAUJxNLtGmRp1OqOul7pzRc4U-J5FKa_FWdQ/s1600/zero-limit.jpg" height="76" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
(<a href="https://vk.com/doc177654709_349274647">MIPT student mathematical olympiad</a>, 2013, problem 3)</div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div style="text-align: justify;">
Suppose that \(\forall x\in\mathbb{R}\to f(x) \neq 0\). Take an arbitrary interval \([a, b] \subset \mathbb{R}\), \(a \lt b\), and let us show that this interval contains a point \(x_0\) such that \(\lim \limits_{x\to x_0} f(x) \neq 0\).<br />
<br />
Let \(T_n = |f|^{-1}([\frac{1}{n}, +\infty)) \cap [a, b]\), i.e., \(x \in T_n \equiv |f(x)| \ge \frac{1}{n} \wedge x\in[a, b]\), for \(n > 0\).<br />
<br />
If every \(T_n\) is nowhere dense then, since \([a, b]\) is a <a href="http://en.wikipedia.org/wiki/Baire_space">Baire space</a>, their union has empty interior; but their union is the whole interval \([a, b]\), a contradiction. Hence some \(T_n\) has an interior point \(x_0 \in T_n\); then \(T_n\) contains \(x_0\) together with some open interval, on which therefore \(|f(x)| \ge \frac{1}{n}\), and consequently \(|\lim \limits_{x\to x_0} f(x)| \ge \frac{1}{n} \gt 0\).<br />
<br />
Note that we have proved more than was required, namely, that the set of zeros is everywhere dense. Or, that a function everywhere convergent to a continuous one is almost everywhere continuous (replace \(0\) with an arbitrary continuous \(g:\mathbb{R}\to\mathbb{R}\)).<br />
<br />
(2014.12.15) A generalisation.<br />
<br />
Let \(X\) be a Baire space, \(Y\) a second-countable Urysohn space, and \(f,g : X \to Y\) arbitrary maps. If \((\forall x\in X)(\lim\limits_{t\to x}f(t) = \lim\limits_{t\to x}g(t))\) then \(f = g\) on a dense subset.<br />
<br />
<blockquote class="tr_bq">
Proof (by contraposition). Suppose that there is an open \(A \subseteq X\), such that \((\forall x\in A)(f(x)\neq g(x))\). Let \(B\) be a countable base of \(Y\).</blockquote>
<blockquote class="tr_bq">
Define a countable family of subsets of \(A\): \(T_{U,V} = f^{-1}(U) \cap g^{-1}(V) \cap A\), where \(U, V \in B\) (that is, \(U\) and \(V\) are open subsets of \(Y\)). For any \(x\in A\), \(f(x)\neq g(x)\), and because \(Y\) is Urysohn, there are \(U, V\in B, cl(U)\cap cl(V) = \varnothing, f(x)\in U, g(x)\in V\); that is, \(x\in T_{U,V}\). Hence</blockquote>
<blockquote class="tr_bq" style="text-align: center;">
\(\bigcup\limits_{cl(U)\cap cl(V) = \varnothing} T_{U,V} = A\)</blockquote>
<blockquote class="tr_bq">
Because \(X\) and hence \(A\) are Baire spaces, one of the \(T_{U,V}\) in the union above, say \(T_{U_0, V_0}\), is not nowhere dense. That is, there is an open set \(G\subseteq A\) such that for any \(x_0\in G\) and any open neighbourhood \(S\) of \(x_0\), \(S \cap T_{U_0,V_0}\neq\varnothing\); that is, there exists an \(x'\in S\) such that \(f(x') \in U_0\) and \(g(x')\in V_0\).</blockquote>
<blockquote class="tr_bq">
Suppose that \(\lim\limits_{t\to x_0}f(t) = \lim\limits_{t\to x_0}g(t) = y\in Y\). This means that every open neighbourhood of \(y\) intersects \(U_0\), hence \(y\in cl(U_0)\). Similarly \(y\in cl(V_0)\), which contradicts \(cl(U_0)\cap cl(V_0) = \varnothing\).</blockquote>
</div>
PS: for problem 2, the answer is \(k = 2\cdot n - 2\).</div>
nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com0tag:blogger.com,1999:blog-5799246.post-40469494617582939992014-10-16T20:14:00.000+00:002014-10-16T20:14:58.313+00:00Blindsight spot.<div dir="ltr" style="text-align: justify;" trbidi="on"><script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script><br />
I just realised that in any unital ring commutativity of addition follows from distributivity:<br />
<br />
<table border="0" bordercolor="#ffffff" cellpadding="3" cellspacing="3"><tbody>
<tr><td></td><td>\(a + b\)</td></tr>
<tr><td>\(=\)</td><td>\(-a + a + a + b + b - b\)</td></tr>
<tr><td>\(=\)</td><td>\(-a + a\cdot(1 + 1) + b\cdot(1 + 1) - b\)</td></tr>
<tr><td>\(=\)</td><td>\(-a + (a + b)\cdot(1 + 1) - b\)</td></tr>
<tr><td>\(=\)</td><td>\(-a + (a + b)\cdot 1 + (a + b)\cdot 1 - b\)</td></tr>
<tr><td>\(=\)</td><td>\(-a + a + b + a + b - b\)</td></tr>
<tr><td>\(=\)</td><td>\(b + a\)</td></tr>
</tbody></table><br />
The same holds for unital modules, algebras, vector spaces, <em>&c.</em> Note that multiplication doesn't even need to be associative. It's amazing how such things can pass unnoticed.<br />
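<br />
Unitality is essential, though. A standard counterexample (added for illustration): take any non-abelian group, write its operation additively, and equip it with the trivial multiplication \(a\cdot b = 0\). Both distributive laws hold, since $$a\cdot(b + c) = 0 = a\cdot b + a\cdot c,$$ yet addition is not commutative; the derivation above fails at the steps that require \(a\cdot 1 = a\).<br />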
</div>nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com0tag:blogger.com,1999:blog-5799246.post-90507729989591912502014-05-28T18:08:00.000+00:002014-05-28T18:34:25.189+00:00Whither science (А болт мы сделаем деревянным).<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
I was recently on a committee, reviewing papers for a computer science conference. The submissions, from all over the world, were in PDF. One submission was actually a Trojan: an executable masquerading as a PDF file. This by itself is startling evidence of how widespread spying has become (the Trojan came from a respectable university email address), but the really alarming thing is that this was <b>*not*</b> the paper that got the lowest grades. That is, some legitimate submissions have less scientific content than a virus.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
A few days later, while perusing proceedings of <a href="http://books.google.com/books?id=I7Vn-7fmJckC&pg=PP5">another conference</a>, which shall remain nameless only for so long, I found something that triggered this post.</div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiISEVcKjsHkrjzwt6aXNFizMysYL_FCNGl49RsaBSZnOtKVVmtjDQ2v_VBU-p4sLedG3EO5kcUjLKQg7qrSBgapx2FuShse-8kYHZMlR_irGXVWLRr_xy5xhhJBULJFnuSzjREjQ/s1600/pdcn0.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiISEVcKjsHkrjzwt6aXNFizMysYL_FCNGl49RsaBSZnOtKVVmtjDQ2v_VBU-p4sLedG3EO5kcUjLKQg7qrSBgapx2FuShse-8kYHZMlR_irGXVWLRr_xy5xhhJBULJFnuSzjREjQ/s1600/pdcn0.png" /></a></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: justify;">
A conference on distributed computing must be interesting. And indeed, for an attentive reader persistent enough to reach the last articles, there is a treasure in store: an impressive proof of how versatile and far-reaching distributed computing is nowadays.</div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbOjBoGZW2U8ck4HCpDB-HqLB6LgTNpnOAhBLgKNz3coqUN2wabdDHLJppzELRMPnrTtJtqZrE8cLJ9JGX3V760yr4Z86HN14kwxS4MqcpCdG582F47RuDqlbepeq3_8P6s5a4Vw/s1600/pdcn1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjbOjBoGZW2U8ck4HCpDB-HqLB6LgTNpnOAhBLgKNz3coqUN2wabdDHLJppzELRMPnrTtJtqZrE8cLJ9JGX3V760yr4Z86HN14kwxS4MqcpCdG582F47RuDqlbepeq3_8P6s5a4Vw/s1600/pdcn1.png" /></a></div>
<br />
And who do you think publishes this?<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrVV6hWb9emkl92QZ-l5pXYDpBjHGn9wmyotTuuTFJYuStoMR652QfRx33Fx_ZWkhiZBorA8UAgJGeDyQ9Q-rm5UFPrs48LKK3tZdQ6HfH358RBA6vNBt4Rz6uf0ysXzjUPiS4Fg/s1600/pdcn2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrVV6hWb9emkl92QZ-l5pXYDpBjHGn9wmyotTuuTFJYuStoMR652QfRx33Fx_ZWkhiZBorA8UAgJGeDyQ9Q-rm5UFPrs48LKK3tZdQ6HfH358RBA6vNBt4Rz6uf0ysXzjUPiS4Fg/s1600/pdcn2.png" /></a></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Oh, pity.</div>
</div>
nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com0tag:blogger.com,1999:blog-5799246.post-62043756277000560932014-02-09T12:17:00.001+00:002014-02-10T08:53:37.420+00:00Parmigianino's faces<div dir="ltr" style="text-align: left;" trbidi="on">
An angel from <a href="http://en.wikipedia.org/wiki/Madonna_with_the_Long_Neck" style="font-style: italic;">Madonna with the long neck</a> (1535):<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBDL2NwpQ24c2l6PeTHnsc4HHRz2Agn5aEy145XF1VXfKT5U1epPZficgJ-fzw5KhutEDBmmEXvBJCLwh_mMMs3YBMEZwwfgiCwMMT2KejoaJdDRjBjNRlPBRhFf5Fr26Y7JZByA/s1600/Madonna+dal+collo+lungo.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiBDL2NwpQ24c2l6PeTHnsc4HHRz2Agn5aEy145XF1VXfKT5U1epPZficgJ-fzw5KhutEDBmmEXvBJCLwh_mMMs3YBMEZwwfgiCwMMT2KejoaJdDRjBjNRlPBRhFf5Fr26Y7JZByA/s1600/Madonna+dal+collo+lungo.png" height="320" width="256" /></a></div>
<br />
An unidentified girl from <a href="http://en.wikipedia.org/wiki/Portrait_of_a_Young_Woman_(Parmigianino)"><i>Antea</i></a> (1524, survived everything, including Austrian salt mines):<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhy0deyMAOTi6_NZdSbjGXj9xRLypcoigzEWRxWQ3vJK00EBYwvcGieeHQkIjTAmFHMO4JpbW1QwKqNeLSPxYQxqrVLKB6ufA9So6D4IvU5dWHe6VyisomYTkpVuRjx7odQQfXNPA/s1600/Antea.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhy0deyMAOTi6_NZdSbjGXj9xRLypcoigzEWRxWQ3vJK00EBYwvcGieeHQkIjTAmFHMO4JpbW1QwKqNeLSPxYQxqrVLKB6ufA9So6D4IvU5dWHe6VyisomYTkpVuRjx7odQQfXNPA/s1600/Antea.png" height="342" width="256" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
Although one can argue that the resemblance is due to the strictures of the mannerist canon (see the earlobes, for example), this is undoubtedly the same face.<br />
<br />
A discovery no less thrilling even though I am definitely not the first to make it.</div>
nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com0tag:blogger.com,1999:blog-5799246.post-53095423681154950792013-12-16T12:08:00.003+00:002013-12-16T12:11:22.317+00:00threads, contexts and doors.<div dir="ltr" style="text-align: left;" trbidi="on">
<div align="justify">
<a href="https://plus.google.com/102766894085961969420/posts">Paul Turner</a> gave a <a href="http://www.youtube.com/watch?v=KXuZi9aeGTw">talk</a> about a new threading interface, designed by Google, at this year's <a href="http://www.linuxplumbersconf.org/2013/">Linux Plumbers Conference</a>:<br />
<br />
<iframe allowfullscreen="" frameborder="0" height="315" src="//www.youtube.com/embed/KXuZi9aeGTw" width="420"></iframe><br />
<br />
The idea is, very roughly, to implement the <a href="http://pubs.opengroup.org/onlinepubs/7908799/xsh/ucontext.h.html">ucontext</a> interface with kernel support. This gives the benefits of kernel threads (<i>e.g.</i>, SMP support), while retaining the fast context switches of user threads. A <tt>switchto_switch(tid)</tt> call hands off control to the specified thread without going through the kernel scheduler. This is like <tt>swapcontext(3)</tt>, except that the kernel stack pointer is switched too. Of course, there is still the overhead of the system call and return, but it is not as important as it used to be: the cost of a normal context switch is dominated by the scheduler invocation (with all the associated locking); moreover, things like TLB flushes drive the difference between user and kernel context switching further down.<br />
<br />
I did something similar (but much more primitive) in 2001. The difference was that in that old implementation, one could switch context with a thread running in a different address space, so it was possible to make a "continuation call" to another process. This was done to implement <a href="http://en.wikipedia.org/wiki/Doors_(computing)">Solaris doors</a> RPC mechanism on Linux. Because this is an RPC mechanism, arguments have to be passed, so each context switch also performed a little dance to copy arguments between address spaces.</div>
</div>
nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com0tag:blogger.com,1999:blog-5799246.post-10539151038570212882013-01-19T18:50:00.000+00:002013-07-24T20:59:52.034+00:00Weekend affinity.<div dir="ltr" style="text-align: left;" trbidi="on"><div style="text-align: justify;"><script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script><br />
Groups usually come with <a href="http://en.wikipedia.org/wiki/Group_(mathematics)">homomorphisms</a>, defined as mappings preserving multiplication:<br />
$$f(a\cdot b) = f(a)\cdot f(b)$$<br />
From this definition, notions of subgroup (monomorphism), <a href="http://en.wikipedia.org/wiki/Quotient_group">quotient group</a> (epimorphism, <a href="http://en.wikipedia.org/wiki/Normal_subgroup">normal subgroup</a>) and the famous <a href="http://en.wikipedia.org/wiki/Isomorphism_theorem#First_isomorphism_theorem">isomorphism theorem</a> follow naturally. The <a href="http://en.wikipedia.org/wiki/Category_of_groups">category of groups</a> with homomorphisms as arrows has products and sums, equalizers and coequalizers all well-known and with nice properties.<br />
<br />
Consider, instead, an <em>affine morphism</em>, which can be defined by the following equivalent conditions:<br />
<br />
<ol><li>\(f(a \cdot b^{-1} \cdot c) = f(a) \cdot f^{-1}(b) \cdot f(c)\)</li>
<li>\(f(a \cdot b) = f(a) \cdot f^{-1}(e) \cdot f(b)\)</li>
<li>\(\exists t. f(a \cdot b) = f(a) \cdot t \cdot f(b)\)</li>
</ol><br />
The motivation for this definition is slightly roundabout.<br />
<br />
The difference between a homomorphism and an affine morphism is similar to the difference between a <a href="http://en.wikipedia.org/wiki/Vector_subspace">vector subspace</a> and an <a href="http://en.wikipedia.org/wiki/Affine_space#Affine_subspaces">affine subspace</a> of a vector space. A vector subspace always goes through the origin (for a homomorphism \(f\), \(f(e) = e\)), whereas an affine subspace is translated from the origin (\(f(e) \neq e\) is possible for an affine morphism).<br />
<br />
Take points \(f(a)\) and \(f(b)\) in the image of an affine morphism, translate them back to the corresponding "vector subspace" to obtain \(f(a) \cdot f^{-1}(e)\) and \(f(b) \cdot f^{-1}(e)\). If translated points are multiplied and the result is translated back to the affine image, the resulting point should be the same as \(f(a \cdot b)\):<br />
$$<br />
f(a \cdot b) = (f(a) \cdot f^{-1}(e)) \cdot (f(b) \cdot f^{-1}(e)) \cdot f(e) = f(a) \cdot f^{-1}(e) \cdot f(b)<br />
$$<br />
which gives the definition (2).<br />
<br />
(1) => (2) immediately follows by substituting \(e\) for \(b\).<br />
(2) => (3) by substituting \(f^{-1}(e)\) for \(t\).<br />
(3) => (2) by substituting \(e\) for \(a\) and \(b\).<br />
(2) => (1)<br />
<br />
<table border="0" bordercolor="#ffffff" style="background-color:" cellpadding="3" cellspacing="3"><tr><td></td><td>\(f(a \cdot b^{-1} \cdot c)\)</td></tr>
<tr><td>\(=\)</td><td>{ (2) for \(a \cdot (b^{-1} \cdot c)\) }</td></tr>
<tr><td></td><td>\(f(a) \cdot f^{-1}(e) \cdot f(b^{-1} \cdot c)\)</td></tr>
<tr><td>\(=\)</td><td>{ (2) for \(b^{-1} \cdot c\) }</td></tr>
<tr><td></td><td>\(f(a) \cdot f^{-1}(e) \cdot f(b^{-1}) \cdot f^{-1}(e) \cdot f(c)\)</td></tr>
<tr><td>\(=\)</td><td>{ \(e = f(b) \cdot f^{-1}(b)\), working toward creating a sub-expression that can be collapsed by (2) }</td></tr>
<tr><td></td><td>\(f(a) \cdot f^{-1}(e) \cdot f(b^{-1}) \cdot f^{-1}(e) \cdot f(b) \cdot f^{-1}(b) \cdot f(c)\)</td></tr>
<tr><td>\(=\)</td><td>{ collapsing \(f(b^{-1}) \cdot f^{-1}(e) \cdot f(b)\) by (2) }</td></tr>
<tr><td></td><td>\(f(a) \cdot f^{-1}(e) \cdot f(b^{-1} \cdot b) \cdot f^{-1}(b) \cdot f(c)\)</td></tr>
<tr><td>\(=\)</td><td>{ \(b^{-1} \cdot b = e\) }</td></tr>
<tr><td></td><td>\(f(a) \cdot f^{-1}(e) \cdot f(e) \cdot f^{-1}(b) \cdot f(c)\)</td></tr>
<tr><td>\(=\)</td><td>{ \(f^{-1}(e) \cdot f(e) = e\) }</td></tr>
<tr><td></td><td>\(f(a) \cdot f^{-1}(b) \cdot f(c)\)</td></tr>
</table><br />
It is easy to check that each homomorphism is an affine morphism (specifically, homomorphisms are exactly affine morphisms with \(f(e) = e\)).<br />
<br />
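A simple family of examples (added for illustration): in any group \(G\), the right translation \(f(x) = x \cdot g\) is an affine morphism for every fixed \(g \in G\). Indeed, with \(f^{-1}(b)\) denoting \((f(b))^{-1} = g^{-1} \cdot b^{-1}\) as above, $$f(a) \cdot f^{-1}(b) \cdot f(c) = a \cdot g \cdot g^{-1} \cdot b^{-1} \cdot c \cdot g = f(a \cdot b^{-1} \cdot c),$$ which is condition (1); such an \(f\) is a homomorphism only for \(g = e\).<br />
<br />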
Composition of affine morphisms is affine and hence groups with affine morphisms form a category \(\mathbb{Aff}\).<br />
<br />
A subset of a group \(G\) is called an <em>affine subgroup</em> of \(G\) if one of the following equivalent conditions holds:<br />
<ol><li>\(\exists h \in G:\forall p, q \in H \rightarrow (p \cdot h^{-1} \cdot q \in H \wedge h \cdot p^{-1} \cdot h \in H)\)</li>
<li>\(\forall p, q, h \in H \rightarrow (p \cdot h^{-1} \cdot q \in H \wedge h \cdot p^{-1} \cdot h \in H)\)</li>
</ol>The equivalence (with a proof left as an exercise) means that if any \(h\) "translating" affine subgroup to a subgroup exists, then any member of the affine subgroup can be used for translation. In fact, any \(h\) that satisfies (1) belongs to \(H\) (prove). This matches the situation with affine and vector subspaces: any vector from an affine subspace translates this subspace to the subspace passing through the origin.<br />
<br />
Finally for to-day, consider an affine morphism \(f:G_0\rightarrow G_1\). For \(t\in G_0\) define <em>kernel</em>:<br />
$$ker_t f = \{g\in G_0 | f(g) = f(t)\}$$<br />
It's easy to check that a kernel is an affine subgroup (take \(t\) as \(h\)). Note that in \(\mathbb{Aff}\) a whole family of subobjects corresponds to a morphism, whereas in \(\mathbb{Grp}\) there is <em>the</em> kernel.<br />
<br />
To be continued: affine quotients, products, sums, free affine groups.<br />
</div></div>nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com0tag:blogger.com,1999:blog-5799246.post-7021842604699312102013-01-17T20:11:00.000+00:002013-01-17T22:58:20.784+00:00Vale of tears.<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
Consider a system describing possible computations (<i>e.g.</i>, a programming language or a state machine formalism) including interactions with the external world, that is, input and output facilities.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
A pair of dual sub-classes of computational elements (values, objects, functions, <i>&c</i>.) can be defined:</div>
<div style="text-align: justify;">
</div>
<ul>
<li style="text-align: justify;"><i>finitary</i> elements that are known to not depend on input and</li>
<li style="text-align: justify;"><i>ideal</i> elements that output is known to not depend on.</li>
</ul>
<div>
<div style="text-align: justify;">
The rest of elements may depend on input and may affect output. Let's call such elements <i>laic</i> ("temporal" might be a better word).</div>
</div>
<div>
<div style="text-align: justify;">
<br /></div>
</div>
<div>
<div style="text-align: justify;">
The class of finitary elements is well-known: because they can be computed without input, they can be computed before the program starts, <i>i.e.</i>, they correspond to various constants, including static entities like types (in statically typed languages), classes, function bodies and so on. Some languages have powerful finitary computations, for example, C++ template specialisation is Turing complete.</div>
</div>
<div>
<div style="text-align: justify;">
<br /></div>
</div>
<div>
<div style="text-align: justify;">
Laic elements are the most usual things like variables and objects.</div>
</div>
<div>
<div style="text-align: justify;">
<br /></div>
</div>
<div>
<div style="text-align: justify;">
Ideal elements are less known. They have a long history of use in the area of formal program verification where they are called <i><a href="http://www-sop.inria.fr/everest/personnel/Mariela.Pavlova/ghost.pdf">ghost</a></i> or <i>auxiliary</i> variables. </div>
</div>
<div>
<div style="text-align: justify;">
<br /></div>
</div>
<div>
<div style="text-align: justify;">
There is an obvious restriction of data and control flow between various types of elements:</div>
</div>
<div>
<div style="text-align: justify;">
<br /></div>
</div>
<div>
<ul>
<li style="text-align: justify;">a finitary element may depend only on finitary elements;</li>
</ul>
<ul>
<li style="text-align: justify;">a laic element may depend on laic and finitary elements (<i>e.g.</i>, a normal function can take a constant as a parameter, but a constant cannot be initialised with the value of a variable or function call);</li>
</ul>
<ul>
<li style="text-align: justify;">an ideal element may depend on any element (<i>e.g.</i>, an ideal variable can be assigned the value of a laic variable, but not the other way around).</li>
</ul>
</div>
<div>
<div style="text-align: justify;">
<br /></div>
</div>
<div>
<div>
<div style="text-align: justify;">
The most important property of ideal elements is that, because they don't affect observable program behaviour, there is no need to actually compute them! Yet, they are useful exactly because of this property: ideal elements are not computed and, hence, are not constrained by the limitations of actual computational environments. For example, an ideal variable can represent an infinite (even uncountable) collection or a real number (a real real number, not an approximation); an ideal function can be defined by transfinite induction or by a formula involving quantifiers.</div>
</div>
</div>
<div>
<div style="text-align: justify;">
<br /></div>
</div>
<div>
<div style="text-align: justify;">
To use ideal elements, one assumes that they follow the normal rules of the language (for example, axiomatic or denotational <a href="http://en.wikipedia.org/wiki/Semantics_(computer_science)">semantics</a>). This assumption doesn't burden the implementors of the language precisely because the ideal elements are not computed. Under that assumption, one can reason about the properties of ideal elements.</div>
</div>
<div>
<div style="text-align: justify;">
<br /></div>
</div>
<div>
<div style="text-align: justify;">
As the simplest example, an ideal variable can be used to record the sequence of calls to a certain function:</div>
<div style="text-align: justify;">
<br /></div>
</div>
<pre>ideal f_seq = {};           // ghost: exists only for reasoning, never computed
function f(arg) {
    f_seq := f_seq ++ arg;  // record the call by appending arg to f_seq
    ...
};
</pre>
<br />
<div style="text-align: justify;">
and then reason about f_seq using whatever method is used to reason about laic elements (<i>e.g.</i>, <a href="http://en.wikipedia.org/wiki/Predicate_transformer_semantics">weakest preconditions</a>, <a href="http://en.wikipedia.org/wiki/Hoare_triple">Hoare triples</a> or the usual hand-waving), for example, to prove that messages delivered to a receiver were sent by the sender (that is, that deliver_seq is a sub-sequence of send_seq).</div>
<div style="text-align: justify;">
<br />
It is interesting that both finitary elements (specifically, static types) and ideal elements help to reason about the behaviour of the laic world sandwiched between them.<br />
<br /></div>
<div style="text-align: justify;">
Nothing in this short article is new, except for the (obvious) duality between ideal and finitary elements.<br />
<br />
<i>Exercise</i> 0: implement <a href="http://en.wikipedia.org/wiki/Substructural_type_system#Linear_type_systems">linear types</a> by casting laic elements to ideal.<br />
<i>Exercise</i> 1: implement garbage collection similarly.</div>
</div>
nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com0tag:blogger.com,1999:blog-5799246.post-68657508233311458292012-11-09T18:19:00.000+00:002012-11-09T18:22:13.913+00:00Side with effete.<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
More of "I know C better than you!" stuff.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I just discovered (the verb is deeply characteristic) how to write a <tt>HAS_SIDE_EFFECTS(expr)</tt> macro that </div>
<ul>
<li style="text-align: justify;">doesn't evaluate <tt>expr</tt> and</li>
<li style="text-align: justify;">returns true iff <tt>expr</tt> has side-effects.</li>
</ul>
<div style="text-align: justify;">
The macro essentially depends on a GCC extension. It is useful, for example, as a sanity check in conditionally compiled code, <i>e.g.</i>, in <tt>assert(expr)</tt>.</div>
</div>
nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com0tag:blogger.com,1999:blog-5799246.post-26477635380919414592012-11-03T17:38:00.001+00:002013-07-24T21:00:04.894+00:00Zero is overrated<div dir="ltr" style="text-align: left;" trbidi="on"><div style="text-align: justify;"><script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script><br />
Define a sequence of functions \(P_i:\mathbb{R}^+\rightarrow\mathbb{R}\), \(i\in\mathbb{N}\)<br />
$$P_0(x) = \ln(x)$$<br />
$$P_{n+1}(x) = \int P_{n}(x)\cdot dx$$<br />
<br />
I found a beautiful closed formula for \(P_i\), that I haven't seen before.<br />
<br />
Integrating by parts, it's easy to calculate first few \(P_i\):<br />
$$P_1(x) = x\cdot\ln x - x$$<br />
$$P_2(x) = \frac{x^2}{2}\cdot\ln x - \frac{3}{4}\cdot x^2$$<br />
$$P_3(x) = \frac{x^3}{6}\cdot\ln x - \frac{11}{36}\cdot x^3$$<br />
<br />
which leads to the hypothesis that <br />
$$P_n = \frac{x^n}{n!}\cdot\ln x - K_n\cdot x^n$$<br />
for certain constants \(K_i\), \(K_1 = 1\).<br />
<br />
Again integrating by parts, obtain:<br />
$$K_{n+1} = \frac{1}{n+1}\cdot\Bigg(K_n + \frac{1}{(n+1)!}\Bigg)$$<br />
<br />
from where<br />
<table border="0" bordercolor="#ffffff" cellpadding="3" cellspacing="3"><tbody>
<tr><td>$$K_n$$</td><td>$$=$$</td><td>$$\frac{1}{n}\cdot\Bigg(K_{n-1} + \frac{1}{n!}\Bigg)$$</td></tr>
<tr><td></td><td>$$=$$</td><td>$$\frac{1}{n}\cdot\Bigg(\frac{1}{n!} + \frac{1}{n-1}\cdot\big(K_{n-2} + \frac{1}{(n-1)!}\big)\Bigg)$$</td></tr>
<tr><td></td><td>$$=$$</td><td>$$\frac{1}{n!}\cdot\Bigg(\frac{1}{n} + \frac{1}{n-1}\Bigg) + \frac{1}{n\cdot(n - 1)}\cdot K_{n-2}$$</td></tr>
<tr><td></td><td>$$=$$</td><td>$$\frac{1}{n!}\cdot\Bigg(\frac{1}{n} + \frac{1}{n - 1} + \frac{1}{n - 2}\Bigg) + \frac{1}{n\cdot(n-1)\cdot(n - 2)}\cdot K_{n-3}$$</td></tr>
<tr><td></td><td>$$=$$</td><td>$$\cdots$$</td></tr>
<tr><td></td><td>$$=$$</td><td>$$\frac{1}{n!}\cdot\Bigg(\frac{1}{n} + \frac{1}{n - 1} + \cdots +\frac{1}{n - p + 1}\Bigg) + \frac{1}{n\cdot(n-1)\cdot\cdots\cdot(n - p +1)}\cdot K_{n-p}$$</td></tr>
</tbody></table>Substitute \(p = n - 1\):<br />
$$K_n = \frac{1}{n!}\cdot\Bigg(1 + \frac{1}{2} + \cdots + \frac{1}{n}\Bigg)$$<br />
Substitute this in the hypothesis:<br />
$$P_n = \frac{x^n}{n!}\cdot\Bigg(\ln x - \big(1 + \frac{1}{2} + \cdots + \frac{1}{n}\big)\Bigg)$$<br />
This nicely contains fragments of the exponential, the \(n\)th <a href="http://en.wikipedia.org/wiki/Harmonic_number">harmonic number</a> and, after a diagonalisation, the <a href="http://en.wikipedia.org/wiki/Euler%E2%80%93Mascheroni_constant">Euler–Mascheroni constant</a>:<br />
$$\lim_{n\to +\infty}\frac{n!}{n^n}\cdot P_n(n) = -\gamma$$<br />
<br />
Why are the \(P_i\) interesting at all? Because if one extends them to negative indices as<br />
$$P_n = (-1)^{n-1}\cdot(-n-1)!\cdot x^n$$<br />
then the \(m\)th derivative of \(P_n\) is \(P_{n-m}\) for all non-negative \(m\):<br />
$$(\forall n\in\mathbb{Z})(\forall m\in\mathbb{N})\partial^m P_n = P_{n - m}$$<br />
and similarly<br />
$$(\forall n\in\mathbb{Z})(\forall m\in\mathbb{N})\int_m P_n\cdot dx = P_{n + m}$$<br />
where \(\int_m\) is <a href="http://mathworld.wolfram.com/RepeatedIntegral.html">repeated integral</a>.<br />
<br />
This is in contrast with the powers \(x^n\), \(n \ge 0\), which, under repeated differentiation, eventually pathetically collapse to a constant and then to 0, so that negative powers are not reachable from positive ones and the other way around.<br />
<br />
It is interesting which other families \((\phi_i)_{i\in\mathbb{Z}}\) exist such that<br />
$$(\forall m\in\mathbb{N})\partial^m\phi_n = \phi_{n - m}$$<br />
$$(\forall m\in\mathbb{N})\int_m \phi_n\cdot dx = \phi_{n + m}$$<br />
and<br />
$$(\forall n\neq m)\phi_n \neq const\cdot\phi_m$$<br />
(the latter condition is to avoid degenerate cases)?</div></div>nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com0tag:blogger.com,1999:blog-5799246.post-52557284466678688252012-09-28T20:39:00.000+00:002013-07-24T21:00:16.240+00:00Деревья по кругу.<div dir="ltr" style="text-align: left;" trbidi="on"><div style="text-align: justify;"><script type="text/javascript"
src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script></div><h2 style="text-align: justify;">Эндоморфизмы и орбиты, небольшая теория для левой руки.</h2><blockquote class="tr_bq" style="text-align: justify;"><span style="font-size: xx-small;"><i>Abstract</i>: orbits of an endomorphism of a finite set are represented as connected components of a suitably defined graph. Their structure is surprisingly simple: a collection of disjoint trees rooted on the vertices of a unique cycle.</span></blockquote><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Let \(f:X\rightarrow X\) be an arbitrary mapping of a set \(X\) into itself, i.e., an <a href="http://en.wikipedia.org/wiki/Endomorphism">endomorphism</a> of \(X\). It is well known that if \(f\) is an <a href="http://en.wikipedia.org/wiki/Automorphism">automorphism</a> (i.e., a <a href="http://en.wikipedia.org/wiki/Bijective_map">bijection</a>), it partitions \(X\) into disjoint <a href="http://en.wikipedia.org/wiki/Orbit_(group_theory)#Orbits_and_stabilizers">orbits</a>. Each orbit consists of the points<br />
$$\ldots f^{-2}(x), f^{-1}(x), x, f(x), f^2(x), \ldots$$<br />
for some \(x\), where \(f^n\) denotes the \(n\)-fold application of \(f\) (\(f^0(x) = x\), \(f^{n+1}(x) = f(f^n(x))\), \(f^{-n}(x) = (f^{-1})^n(x)\)). If \(X\) is finite, every orbit is a <em>cycle</em> consisting of a finite number of points, and \(x = f^{n}(x)\) for some \(n\). If \(X\) is infinite, there can be orbits consisting of an infinite (countable) number of distinct points.<br />
<br />
What do the analogues of orbits look like for arbitrary endomorphisms of finite sets?<br />
<br />
The usual definition of an orbit does not generalise to endomorphisms in an interesting way, because the resulting sets intersect. Instead, we adopt the following definition:<br />
<br />
<blockquote class="tr_bq">A set \(M\), \(M \subseteq X\), is \(f\)-closed if \(x \in M\equiv f(x)\in M\), i.e., an \(f\)-closed set contains, together with every point \(x\), its image \(f(x)\) and all its preimages \(f^{-1}(x)\).</blockquote>Obviously, 
<blockquote>the intersection of \(f\)-closed sets \(M\) and \(N\) is \(f\)-closed:<br />
<table border="0" bordercolor="#ffffff" cellpadding="3" cellspacing="3" style="text-align: justify;"><tbody>
<tr><td></td><td>\(x\in M \cap N\)</td></tr>
<tr><td>\(\equiv\)</td><td>{ definition of intersection }</td></tr>
<tr><td></td><td>\(x\in M \wedge x\in N\)</td></tr>
<tr><td>\(\equiv\)</td><td>{ definition of \(f\)-closedness }</td></tr>
<tr><td></td><td>\(f(x)\in M \wedge f(x)\in N\)</td></tr>
<tr><td>\(\equiv\)</td><td>{ definition of intersection }</td></tr>
<tr><td></td><td>\(f(x)\in M \cap N\)</td></tr>
</tbody></table></blockquote><br />
<blockquote>An orbit is a non-empty minimal \(f\)-closed set, i.e., an \(f\)-closed set that has no proper non-empty \(f\)-closed subsets.</blockquote><br />
For automorphisms this definition coincides with the standard one.<br />
<br />
Obviously, distinct orbits do not intersect (otherwise their non-empty intersection would be a proper \(f\)-closed subset of one of them, violating minimality). Besides, every point belongs to at least one \(f\)-closed set, for example to the whole of \(X\). This means that for every point the intersection of all \(f\)-closed sets containing it is defined. Such an intersection is an orbit. Hence \(X\) is partitioned into disjoint orbits.<br />
<br />
<h3>The graph</h3>To the mapping \(f\) we can associate a directed graph (which will remain nameless, being the only one under consideration), with the vertex set \(X\) and the arc set \(\{(x, f(x)) | x \in X \}\).<br />
<br />
This graph has a remarkable property: every vertex has exactly one outgoing arc. Obviously, any graph with this property corresponds to an endomorphism of its vertex set.<br />
<br />
We will also consider the undirected version of the same graph (in which the directions of the arcs are forgotten) and will in that case speak of "edges" (rather than arcs), undirected paths and undirected cycles.<br />
<br />
All paths and cycles are assumed to be simple, i.e., passing through a given arc or vertex at most once.<br />
<br />
Restating the definition of an orbit in terms of the undirected graph, we see that a pair of vertices connected by an edge belongs to the same orbit. It follows that the orbits are exactly the <a href="http://ru.wikipedia.org/wiki/%D0%9A%D0%BE%D0%BC%D0%BF%D0%BE%D0%BD%D0%B5%D0%BD%D1%82%D0%B0_%D1%81%D0%B2%D1%8F%D0%B7%D0%BD%D0%BE%D1%81%D1%82%D0%B8_%D0%B3%D1%80%D0%B0%D1%84%D0%B0">connected components</a> of the undirected graph.<br />
<br />
Consider an arc \(a\) from a vertex \(x\) to a vertex \(y\) (i.e., suppose that \(y = f(x)\)). For the corresponding edge \(a\) define<br />
<br />
$$\sigma(x, a) = +1$$<br />
$$\sigma(y, a) = -1$$<br />
<br />
Thus the function \(\sigma\) is defined for every vertex and every edge incident to it, and has the following properties:<br />
<blockquote>If an edge \(a\) connects vertices \(x\) and \(y\), then \(\sigma(x, a) + \sigma(y, a) = 0\) (property (0)) </blockquote><br />
<blockquote>If distinct edges \(a\) and \(b\) meet at a vertex \(x\), then \(\sigma(x, a) + \sigma(x, b) \leq 0\) (property (1))</blockquote><br />
Now we can prove a simple lemma:<br />
<blockquote>If \(a_0, ... a_n\) is an undirected path between vertices \(x\) and \(y\), then \(\sigma(x, a_0) + \sigma(y, a_n) > -2\), i.e., the end edges of a path cannot both be incoming.<br />
<br />
Induction on the length of the path.<br />
\(n = 1\). Immediate from property (0).<br />
\(n = k + 1\). Let \(a_0, ... a_k\) be an undirected path between vertices \(x\) and \(z\), and let the edge \(a_{k+1}\) connect \(z\) and \(y\).<br />
<table border="0" bordercolor="#ffffff" cellpadding="3" cellspacing="3" style="text-align: justify;"><tbody>
<tr><td></td><td>\(\sigma(x, a_0) + \sigma(y, a_{k+1})\)</td></tr>
<tr><td>\(=\)</td><td>\(\sigma(x, a_0) + \sigma(y, a_{k+1}) + \sigma(z, a_k) - \sigma(z, a_k)\)</td></tr>
<tr><td>\(>\)</td><td>{ by the induction hypothesis, \(\sigma(x, a_0) + \sigma(z, a_k) > -2\) }</td></tr>
<tr><td></td><td>\(-2 + \sigma(y, a_{k+1}) - \sigma(z, a_k)\)</td></tr>
<tr><td>\(=\)</td><td>\(-2 + \sigma(y, a_{k+1}) - \sigma(z, a_k) + \sigma(z, a_{k+1}) - \sigma(z, a_{k+1})\)</td></tr>
<tr><td>\(\geq\)</td><td>{ by property (1), \(\sigma(z, a_k) + \sigma(z, a_{k+1}) \leq 0\) }</td></tr>
<tr><td></td><td>\(-2 + \sigma(y, a_{k+1}) + \sigma(z, a_{k+1})\)</td></tr>
<tr><td>\(=\)</td><td>{ by property (0), \(\sigma(y, a_{k+1}) + \sigma(z, a_{k+1}) = 0\) }</td></tr>
<tr><td></td><td>\(-2\)</td></tr>
</tbody></table></blockquote><h3>Cycles</h3>Consider directed cycles in our graph. For every vertex \(x\) lying on a directed cycle of \(n\) vertices \((x_i | 0 \leq i < n)\) we have \(f^n(x) = x\). In particular, cycles of length 1 are exactly the fixed points of \(f\). Conversely, the vertex set of any directed cycle of length \(n\) can be described as $$\{f^i(x) | 0 \leq i < n\} = \{f^i(x) | 0 \leq i\} = \{x, f(x), \ldots, f^{n-1}(x)\}$$ where any vertex of the cycle can be taken as \(x\). <br />
<br />
With the help of the function \(\sigma\) the following useful fact is also easily proved:<br />
<blockquote>Every cycle is directed.</blockquote><br />
<blockquote>Consider an undirected cycle of edges \(a_i\) connecting vertices \(x_i\) and \(x_{i+1}\), where \(0 \leq i < n\) and \(i+1\) is understood modulo \(n\). Form the sum $$S = \sum_{0 \leq i < n}\sigma(x_i, a_i) + \sigma(x_{i+1}, a_i)$$ By property (0), every summand of this sum is 0. Regrouping the summands, we get $$S = \sum_{0 \leq i < n}\sigma(x_i, a_i) + \sigma(x_i, a_{i-1}) = 0$$ By property (1) (applicable because the cycle is simple), every summand here is at most zero. Since the sum is 0, every summand is 0: $$\sigma(x_i, a_i) + \sigma(x_i, a_{i-1}) = 0$$ That is, exactly one arc enters and one arc leaves every vertex of the cycle, i.e., the cycle is directed. </blockquote><br />
<br />
From this point on we will not distinguish between directed and undirected cycles.<br />
<br />
The following property is also easily proved: <br />
<blockquote>Cycles that have a common vertex coincide.<br />
<br />
Indeed, let the cycles \(\{f^i(x) | 0 \leq i\}\) and \(\{f^i(y) | 0 \leq i\}\) have a common point, i.e., \(f^p(x) = f^q(y)\). For an arbitrary point of the first cycle we have<br />
<table border="0" bordercolor="#ffffff" cellpadding="3" cellspacing="3" style="text-align: justify;"><tbody>
<tr><td></td><td>\(f^i(x)\)</td></tr>
<tr><td>\(=\)</td><td>{ for an arbitrary \(d\), since \(f^n(x) = x\) }</td></tr>
<tr><td></td><td>\(f^{i+ d\cdot n}(x)\)</td></tr>
<tr><td>\(=\)</td><td>{ choose \(d\) such that \(i+ d\cdot n > p\) }</td></tr>
<tr><td></td><td>\(f^{p + \Delta}(x)\)</td></tr>
<tr><td>\(=\)</td><td>\(f^\Delta(f^p(x))\)</td></tr>
<tr><td>\(=\)</td><td>{ \(f^p(x) = f^q(y)\) }</td></tr>
<tr><td></td><td>\(f^{q +\Delta}(y) \in \{f^i(y) | 0 \leq i\}\)</td></tr>
</tbody></table>Similarly, every point of the second cycle belongs to the first.</blockquote><br />
This implies an interesting property:<br />
<blockquote>Every undirected path \(P\) between vertices \(x\) and \(y\) that lie on cycles lies entirely within some cycle.<br />
<br />
We use induction on the length of \(P\).<br />
<br />
\(n = 0\). Obvious.<br />
\(n = k + 1\). By the lemma, at least one of the end edges of \(P\) is outgoing. Suppose it is the edge outgoing from the point \(x\). This edge connects \(x\) with the vertex \(f(x)\), which also lies on a cycle. Thus the path \(f(x), \ldots, y\) of length \(k\) connects points lying on cycles and therefore (by the induction hypothesis) lies entirely within some cycle. This cycle has the point \(f(x)\) in common with the cycle on which \(x\) lies, hence these cycles coincide. Thus the path \(x, \ldots, y\) lies entirely within a cycle.</blockquote><br />
Therefore two different cycles cannot be connected by a path. Consequently, every orbit contains at most one cycle.<br />
<br />
The time has come to use the finiteness of \(X\).<br />
<blockquote>Every orbit of a finite set contains a cycle.<br />
<br />
Indeed, an orbit \(M\) is non-empty by definition. Choose a point \(x\) in it and consider the function \(\phi : \mathbb{N} \rightarrow M\), \(\phi(n) = f^n(x)\). Since \(M\) is finite, \(\phi\) cannot be injective, so there exist natural numbers \(p\) and \(q\) such that \(f^p(x) = f^q(x)\). Consequently, \(f^p(x)\) lies on a cycle of length at most \(|p - q|\).</blockquote><br />
Fix an orbit and its unique cycle.<br />
<br />
Consider the "pruned" graph obtained from the orbit by deleting all the edges of the cycle, and take the connected component of this graph that contains an arbitrary vertex of the cycle.<br />
<br />
Obviously,<br />
<ul><li>this set contains no other points of the cycle, because, by the property established above, every path connecting points of the cycle lies within this cycle, and the edges of the cycle have been deleted;<br />
</li>
<li>this set contains no cycles, because the cycle of the component is unique and its edges have been deleted.</li>
</ul></div><div style="text-align: justify;">Thus the component is a connected (by construction) graph without cycles, i.e., a <a href="http://ru.wikipedia.org/wiki/%D0%94%D0%B5%D1%80%D0%B5%D0%B2%D0%BE_(%D1%82%D0%B5%D0%BE%D1%80%D0%B8%D1%8F_%D0%B3%D1%80%D0%B0%D1%84%D0%BE%D0%B2)">tree</a>, intersecting the cycle in exactly one point, which can be chosen as the root. To each point of the cycle there corresponds such a tree. The trees corresponding to different points of the cycle do not intersect (being distinct connected components).</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Let us prove that every point \(x\) of the orbit belongs to such a tree. It suffices to show that \(x\) can be connected to some vertex of the cycle by an undirected path that does not pass through the edges of the cycle (i.e., an undirected path lying within the pruned graph). Since the orbit is connected, there is an undirected path \(P\) connecting \(x\) with a vertex of the cycle. If this path contains an edge \(a\) of the cycle, it decomposes into a composition of paths \(P = P_0\cdot a\cdot Q_0\). Here \(P_0\) connects \(x\) with one of the endpoints of the edge \(a\), i.e., with some point on the cycle. Repeating this process as many times as necessary, we obtain a path \(P_n\) that contains no edges of the cycle (and possibly no edges at all).</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Thus the orbit of an endomorphism of a finite set decomposes into disjoint trees whose roots are attached to a cycle.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Under repeated application of the transformation \(f\), every point first moves through its tree towards the root, merging along the way with other points of the tree. Having reached the root, the point starts moving around the cycle.</div><div style="text-align: justify;"><br />
</div><h3 style="text-align: justify;">Infinity</h3><div style="text-align: justify;">In the case of an infinite set, an orbit can be structured somewhat more intricately.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Instead of cycles, one has to consider <i>trajectories</i>, defined as the sequences of points \((f^n(x) | n > 0)\). A cycle is a special case of a trajectory.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Just as with cycles, an undirected path connecting points on trajectories itself lies within some trajectory. However, trajectories that have a common point need not coincide. Fortunately, there is a weaker but useful property: if an intersection of trajectories is non-empty, it is itself a trajectory.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">As in the finite case, consider a connected component. Take the intersection of all the trajectories contained in it and suppose that it is non-empty. Then this intersection is a trajectory. Form the pruned graph by deleting all the edges of the intersection trajectory. Since at least one edge has been deleted from every trajectory, not a single trajectory, and in particular not a single cycle, remains in the component under consideration. The undirected connected component of the pruned graph is therefore a connected acyclic graph, i.e., a tree.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Thus, when the intersection of the trajectories is non-empty, the orbit decomposes into disjoint trees whose roots are attached to the intersection trajectory.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">It remains to consider the case of an empty intersection. An example of this situation is provided by the function</div><div style="text-align: justify;">$$f:\mathbb{N}\times\mathbb{N}\rightarrow\mathbb{N}\times\mathbb{N}$$</div><div style="text-align: justify;">$$f(x, y) = (\lfloor x/2^y\rfloor\cdot 2^y, y+1)$$</div><div style="text-align: justify;">\(f\) builds an infinite "tree" starting from the leaves, gluing \(2^y\) adjacent nodes of level \(y\) into one node of level \(y+1\).</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Obviously, the whole set \(\mathbb{N}\times\mathbb{N}\) is a single orbit. Every trajectory reaches the axis \(x = 0\) in a finite number of steps; hence any two trajectories intersect, but the intersection of them all is empty.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;">Here, however, insomnia recedes and the authors cease their permitted speech.</div><div style="text-align: justify;"><br />
</div><div style="text-align: justify;"><br />
</div></div>nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com0tag:blogger.com,1999:blog-5799246.post-45749646622288828802012-05-11T20:20:00.000+00:002012-05-11T20:53:09.193+00:00A macro.<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
<a href="http://en.wikipedia.org/wiki/C_language">C language</a> is wonderfully profound. After almost 20 years, I still find new ways to (ab-)use it. Here is a little problem:<br />
<blockquote>
Define a macro <span style="font-family: 'Courier New', Courier, monospace;">IN(x, set)</span>, called as<br />
<blockquote class="tr_bq">
<span style="font-family: 'Courier New', Courier, monospace;">IN(x, (v0, v1, ..., vn))</span></blockquote>
that expands into an expression<br />
<blockquote class="tr_bq">
<span style="font-family: 'Courier New', Courier, monospace;">((x) == (v0) || (x) == (v1) || ... (x) == (vn))</span> </blockquote>
which evaluates to true <a href="http://en.wikipedia.org/wiki/Iff">iff</a> <span style="font-family: 'Courier New', Courier, monospace;">x</span> is a member of set {<span style="font-family: 'Courier New', Courier, monospace;">v0</span>, ..., <span style="font-family: 'Courier New', Courier, monospace;">vn</span>}, where the type of <span style="font-family: 'Courier New', Courier, monospace;">x</span> and all <span style="font-family: 'Courier New', Courier, monospace;">v</span>s is the same. Alternatively, define a macro <span style="font-family: 'Courier New', Courier, monospace;">IN1(x, ...)</span> that expands to the same expression as <span style="font-family: 'Courier New', Courier, monospace;">IN(x, (...))</span>.</blockquote>
</div>
</div>nikitahttp://www.blogger.com/profile/09403336533089968821noreply@blogger.com2