After a long delay I updated my kernel patches to 2.6.12-rc6. This required installing git and cogito, but it turned out that the time wasn't wasted: these tools beat BitKeeper hands down CPU-wise.
A new version of the patches is uploaded here.
This series includes:
- vm_01-zoneinfo.patch
  Add a /proc/zoneinfo file to display information about memory zones. Useful for analyzing VM behaviour. This was merged into -mm.
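  For illustration, the seq_file show routine for such a file can be as simple as the sketch below; zoneinfo_show() and the exact set of counters printed are my assumptions, not necessarily what the patch does.

    /*
     * Sketch of a per-zone seq_file show routine (names and field
     * selection are assumptions).
     */
    static int zoneinfo_show(struct seq_file *m, void *arg)
    {
            struct zone *zone = arg;

            seq_printf(m, "Node %d, zone %s\n",
                       zone->zone_pgdat->node_id, zone->name);
            seq_printf(m, "  pages free %lu\n", zone->free_pages);
            seq_printf(m, "        min  %lu\n", zone->pages_min);
            seq_printf(m, "        low  %lu\n", zone->pages_low);
            seq_printf(m, "        high %lu\n", zone->pages_high);
            seq_printf(m, "  active     %lu\n", zone->nr_active);
            seq_printf(m, "  inactive   %lu\n", zone->nr_inactive);
            return 0;
    }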
- vm_02-skip-writepage.patch
  Don't call ->writepage() from the VM scanner when a page is met for the first time during a scan.
  A new page flag, PG_skipped, is used for this. The flag is TestSet-ed just before calling ->writepage() and is cleared when the page enters the inactive list.
  One can see this as a „second chance“ algorithm for dirty pages on the inactive list.
  BSD does the same: src/sys/vm/vm_pageout.c:vm_pageout_scan(), PG_WINATCFLS flag.
  The reason behind this is that ->writepages() performs more efficient writeout than ->writepage(). Skipping a page can be conditioned on zone->pressure.
  On the other hand, avoiding ->writepage() increases the amount of scanning performed by kswapd.
  (Possible drawback: executable text pages are evicted earlier.)
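  Schematically, the check in the scanner boils down to the helper below; only PG_skipped and its TestSet/Clear accessors come from the patch, the helper itself is illustrative.

    /*
     * Sketch of the second-chance test for dirty pages; the helper
     * name is an assumption.
     */
    static int should_skip_writepage(struct page *page)
    {
            if (!PageDirty(page))
                    return 0;
            /*
             * First encounter during this scan: mark the page and
             * skip ->writepage(), hoping that ->writepages() flushes
             * it more efficiently in the meantime.
             */
            return !TestSetPageSkipped(page);
    }

  ClearPageSkipped(page) is then called whenever a page is moved onto the inactive list, re-arming the second chance.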
- vm_03-dont-rotate-active-list.patch
  Currently, if a zone is short on free pages, refill_inactive_zone() starts moving pages from the active_list to the inactive_list, rotating the active_list as it goes. That is, pages from the tail of the active_list are transferred to its head, thus destroying LRU ordering exactly when we need it most: when the system is low on free memory and page replacement has to be performed.
  This patch modifies refill_inactive_zone() so that it scans the active_list without rotating it. To achieve this, a special dummy page, zone->scan_page, is maintained for each zone. This page marks the place in the active_list that the scan has reached.
  As an additional bonus, if memory pressure is not high enough to start swapping out mapped pages (reclaim_mapped == 0 in refill_inactive_zone()), unreferenced mapped pages can be left behind zone->scan_page instead of being moved to the head of the active_list. When reclaim_mapped mode is activated, zone->scan_page is reset back to the tail of the active_list so that these pages can be re-scanned.
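  The mechanics of the marker can be pictured as follows; next_page_to_scan() and the locking-free presentation are simplifications, only zone->scan_page itself is from the patch.

    /*
     * Sketch: step the dummy marker from the tail of active_list
     * toward its head, yielding one page at a time without rotating
     * the list (the real code runs under zone->lru_lock).
     */
    static struct page *next_page_to_scan(struct zone *zone)
    {
            struct list_head *prev = zone->scan_page.lru.prev;

            if (prev == &zone->active_list)
                    return NULL;    /* marker reached the head: done */
            /*
             * Move the marker over one page toward the head; pages
             * that are left in place keep their LRU position.
             */
            list_move_tail(&zone->scan_page.lru, prev);
            return list_entry(prev, struct page, lru);
    }

  Resetting the marker for reclaim_mapped mode is then just list_move_tail(&zone->scan_page.lru, &zone->active_list).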
- vm_04-__alloc_pages-inject-failure.patch
  Force artificial failures in the page allocator. I used this to harden some kernel code.
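  The idea is simply to make the allocator fail on purpose every so often near the top of __alloc_pages(); the knob below and all names are assumptions.

    /*
     * Sketch of allocator failure injection; the knob (set e.g. via
     * a sysctl) and its name are assumptions.
     */
    static int fail_nth;            /* 0 disables injection */
    static atomic_t failure_clock = ATOMIC_INIT(0);

    static inline int inject_alloc_failure(unsigned int gfp_mask)
    {
            /*
             * Only fail allocations whose callers may sleep and are
             * therefore prepared to handle failure.
             */
            if (!fail_nth || !(gfp_mask & __GFP_WAIT))
                    return 0;
            return atomic_inc_return(&failure_clock) % fail_nth == 0;
    }

  __alloc_pages() then returns NULL early whenever inject_alloc_failure(gfp_mask) fires, exercising callers' error paths.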
- vm_05-page_referenced-move-dirty.patch
  Transfer dirtiness from the pte to the struct page in page_referenced(). This makes pages dirtied through mmap „visible“ to the file system, which can then write them out through ->writepages() (otherwise such pages are written out via ->writepage() from the tail of the inactive list).
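  In page_referenced_one() terms the transfer is a couple of lines; the wrapper below and the choice of the generic ptep_clear_flush_dirty() helper are my assumptions.

    /*
     * Sketch: called from the pte walk of page_referenced_one(), with
     * the page table mapped; names are assumptions.
     */
    static void move_pte_dirtiness(struct vm_area_struct *vma,
                                   unsigned long address, pte_t *pte,
                                   struct page *page)
    {
            /*
             * If the pte was dirty, transfer dirtiness to the struct
             * page, where the file system's ->writepages() can see it.
             */
            if (ptep_clear_flush_dirty(vma, address, pte))
                    set_page_dirty(page);
    }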
- vm_06-cluster-pageout.patch
  Implement pageout clustering at the VM level.
  With this patch the VM scanner calls pageout_cluster() instead of ->writepage(). pageout_cluster() tries to find a group of dirty pages around the target page, called the „pivot“ page of the cluster. If a group of suitable size is found, ->writepages() is called for it; otherwise pageout_cluster() falls back to ->writepage(). (A sketch follows the list below.)
  This is supposed to help in workloads with significant page-out of file-system pages from the tail of the inactive list (for example, heavy dirtying through mmap), because a file system usually writes multiple pages out more efficiently than single ones. It should also be advantageous for file systems doing delayed allocation, as in that case they will allocate whole extents at once.
  A few points:
  - swap-cache pages are not clustered (although they could be, keyed by page->private rather than page->index);
  - only kswapd does clustering, because the direct reclaim path should be low-latency;
  - this patch adds new fields to struct writeback_control and expects ->writepages() to interpret them. This is needed because pageout_cluster() calls ->writepages() with the pivot page already locked, so ->writepages() is only allowed to trylock the other pages in the cluster.
  Only mpage_writepages() was updated to honor these new fields, but all in-tree ->writepages() implementations seem to call mpage_writepages(). (Except reiser4, of course, for which I'll send a (trivial) patch if necessary.)
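  Schematically, and with all names except pageout_cluster() itself being guesses (in particular, the real patch passes the locked pivot through the new writeback_control fields rather than just nr_to_write), the clustering step could look like this:

    #define CLUSTER_SIZE 16         /* an arbitrary example value */

    /*
     * Sketch of VM-level pageout clustering; the forward-only probe
     * is a simplification.
     */
    static int pageout_cluster(struct page *pivot,
                               struct writeback_control *wbc)
    {
            struct address_space *mapping = pivot->mapping;
            pgoff_t index = pivot->index;
            int nr;

            /*
             * Probe neighbouring indices for dirty pages; only cheap
             * checks here, kswapd must not block on other pages.
             */
            for (nr = 1; nr < CLUSTER_SIZE; nr++) {
                    struct page *page = find_get_page(mapping, index + nr);
                    int dirty = page && PageDirty(page) &&
                                !PageWriteback(page);

                    if (page)
                            page_cache_release(page);
                    if (!dirty)
                            break;
            }
            if (nr > 1) {
                    wbc->nr_to_write = nr;
                    return mapping->a_ops->writepages(mapping, wbc);
            }
            return mapping->a_ops->writepage(pivot, wbc); /* lone page */
    }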
- vm_07-proc-stack.patch
  Export the kernel backtrace in /proc/<pid>/task/<tid>/stack. Useful when debugging deadlocks.
  This somewhat duplicates the functionality of SysRq-T, but is less intrusive to system operation and can be used in scripts.
  Exporting the kernel stack of a thread is probably unsound security-wise. Use with care.
  Instead of adding yet another architecture-specific function to output the thread stack through the seq_file API, the patch introduces an „iterator“
  void do_with_stack(struct task_struct *tsk, int (*actor)(int, void *, void *, void *), void *opaque)
  that has to be implemented by each architecture, so that generic code can iterate over stack frames in an architecture-independent way.
  lib/do_with_stack.c is provided for architectures that don't implement their own. It is based on __builtin_{frame,return}_address().
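  Because __builtin_{frame,return}_address() only accept constant arguments, the generic version has to unroll the walk. The reconstruction below is a sketch: the actor argument order (frame number, frame address, return address, opaque cookie) is a guess, and the builtins can only walk the current task.

    /*
     * Sketch of a generic do_with_stack(); the argument order of the
     * actor is an assumption.
     */
    #define VISIT_FRAME(n, actor, opaque)                               \
            do {                                                        \
                    void *__f = __builtin_frame_address(n);             \
                    if (__f == NULL)                                    \
                            return;         /* ran off the stack */     \
                    if ((*actor)(n, __f,                                \
                                 __builtin_return_address(n), opaque))  \
                            return;         /* actor asked to stop */   \
            } while (0)

    void do_with_stack(struct task_struct *tsk,
                       int (*actor)(int, void *, void *, void *),
                       void *opaque)
    {
            BUG_ON(tsk != current); /* builtins can't walk other tasks */
            VISIT_FRAME(0, actor, opaque);
            VISIT_FRAME(1, actor, opaque);
            VISIT_FRAME(2, actor, opaque);
            VISIT_FRAME(3, actor, opaque);
            /* extend to whatever maximum depth is useful */
    }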
- vm_08-proc-sleep.patch
  Export per-process blocking statistics in /proc/<pid>/task/<tid>/sleep and global sleeping statistics in /proc/sleep. Statistics collection for a given file is activated on the first read of the corresponding /proc file. While collection is on, the current back-trace is built on each context switch (through __builtin_return_address()). For each monitored process there is an LRU list of such back-traces. Useful when trying to understand where elapsed time is spent.
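  The per-trace record might look roughly like the sketch below (all names are mine); the actor shows how the do_with_stack() iterator from vm_07 can feed it.

    #define SLEEP_BT_DEPTH 8

    /* Sketch of the per-back-trace record; names are assumptions. */
    struct sleep_trace {
            struct list_head lru;           /* per-task LRU of traces */
            void *bt[SLEEP_BT_DEPTH];       /* truncated back-trace */
            unsigned long long sleep_ns;    /* total time blocked here */
            unsigned long hits;
    };

    static int record_frame(int nr, void *frame, void *ret, void *opaque)
    {
            struct sleep_trace *trace = opaque;

            if (nr >= SLEEP_BT_DEPTH)
                    return 1;               /* deep enough, stop */
            trace->bt[nr] = ret;
            return 0;
    }

    /* at context switch: do_with_stack(current, record_frame, trace); */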
- vm_09-ll_merge_requests_fn-cleanup.patch
  ll_merge_requests_fn() assigns total_{phys,hw}_segments twice. Fix this and a typo. Merged into -mm.
- vm_0a-deadline-iosched.c-cleanup.patch
  Small cleanup.
rmap-cleanup.patch and WRITEPAGE_ACTIVATE-doc-fix.patch were merged into Linus' tree.