June 11, 2011

About khungtaskd

There is a kernel thread called khungtaskd. I ran a few small experiments to find out what it is and what it can do.

Test environment
  • CentOS 5.6

Experiments

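Confirming what it does
I could not find any information in Japanese, but I did find a page in English.
Loosely translated, it appears to be "a feature that periodically scans for blocked, hung processes and dumps their stacks". In kernel terms, "blocked" here means a task stuck in uninterruptible sleep (state D). (Corrections welcome.)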

    Confirming that it is running
    [owner@localhost ~]$ ps -ef | grep hung
    root       156     7  0 Jun05 ?        00:00:00 [khungtaskd]
    owner    17774 17750  0 21:27 pts/0    00:00:00 grep hung
    [owner@localhost ~]$

    Checking the kernel parameters
    [root@localhost ~]# sysctl -a | grep hung
    kernel.hung_task_warnings = 10
    kernel.hung_task_timeout_secs = 120
    kernel.hung_task_check_count = 32768
    kernel.hung_task_panic = 0
    [root@localhost ~]#
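The same four values can also be read directly from procfs. The following is a minimal sketch, assuming Python 3 is available and that the parameters are exposed as files under /proc/sys/kernel (which is what the sysctl names above map to). The function name `read_hung_task_params` and its `base` argument are made up for this illustration.

```python
from pathlib import Path


def read_hung_task_params(base="/proc/sys/kernel"):
    """Return {file name: integer value} for every hung_task_* entry under base.

    `base` is a parameter only so the function can be pointed at a test
    directory; on a real system the default should be what you want.
    """
    params = {}
    for path in sorted(Path(base).glob("hung_task_*")):
        try:
            params[path.name] = int(path.read_text().strip())
        except (OSError, ValueError):
            # Unreadable or non-numeric entries are skipped in this sketch.
            continue
    return params
```

Calling `read_hung_task_params()` on the machine above should give the same numbers that `sysctl -a | grep hung` printed, keyed by file name (for example `hung_task_timeout_secs`).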

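    Reading the source code

    See http://lxr.linux.no/#linux+v2.6.30/kernel/hung_task.c
    kernel.hung_task_warnings
    Decremented each time a scan reports a hung task; in other words, it is the number of warnings that can still be issued. Once it reaches 0, no further reports are logged. (Probably.)
    kernel.hung_task_timeout_secs
    0: disables the check; 1 or more: the scan interval in seconds, which is also how long a task has to stay blocked before it is reported (default 120)
    kernel.hung_task_check_count
    Upper limit on the number of tasks examined in one scan (not a maximum process ID).
    kernel.hung_task_panic
    0: do not panic (default); 1: panic when a hung task is detected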
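To make the interaction of these parameters easier to picture, here is a toy model of one scan, written from my reading of the v2.6.30 source linked above. It is a simplified sketch, not the kernel code: the `Task` class and `scan` function are invented for illustration, the panic option is left out, and the kernel's exact loop bounds are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    pid: int
    state: str                  # "D" = uninterruptible sleep; "S", "R", ... otherwise
    switch_count: int           # context switches so far
    last_switch_count: int = 0  # value remembered by the previous scan


def scan(tasks, params):
    """Run one scan and return the (name, pid) pairs that would be reported."""
    reported = []
    remaining = params["hung_task_check_count"]
    for task in tasks:
        if remaining <= 0:
            break               # check_count caps how many tasks one scan looks at
        remaining -= 1
        if task.state != "D" or task.switch_count == 0:
            continue            # only tasks in uninterruptible sleep are candidates
        if task.switch_count != task.last_switch_count:
            # The task has run since the previous scan: remember and move on.
            task.last_switch_count = task.switch_count
            continue
        if params["hung_task_warnings"] == 0:
            continue            # warning budget used up, stay silent
        params["hung_task_warnings"] -= 1
        reported.append((task.name, task.pid))
    return reported
```

In this model a task is reported when it is found in state D on two consecutive scans without having been scheduled in between, so the time between scans (hung_task_timeout_secs) is what decides how long a wait has to be before it counts as a hang.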

    Experiment
    I wanted to trigger a hang deliberately, but could not think of a way to do it.
    So I set kernel.hung_task_timeout_secs=1 (as shown below) and waited, expecting that something would eventually trip the check.

    [root@localhost ~]# echo 1 > /proc/sys/kernel/hung_task_timeout_secs
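Since a 1-second timeout is only useful for an experiment, it may be convenient to restore the old value automatically afterwards. The following is a minimal sketch, assuming Python 3 and root privileges; `temporary_timeout` is a hypothetical helper name, and the `path` argument exists only so the function can be tried against an ordinary file.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_timeout(seconds, path="/proc/sys/kernel/hung_task_timeout_secs"):
    """Set the hung-task timeout inside a with-block and restore it on exit."""
    knob = Path(path)
    previous = knob.read_text().strip()   # remember the current setting
    knob.write_text(f"{seconds}\n")       # same effect as: echo N > path
    try:
        yield previous
    finally:
        knob.write_text(f"{previous}\n")  # put the original value back
```

For example, `with temporary_timeout(1): time.sleep(3600)` would run the experiment for an hour and then return to the previous value (120 on the system above).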

    I left it running overnight, and sure enough, plenty of reports showed up.

    Jun 11 20:59:27 localhost kernel: INFO: task kjournald:359 blocked for more than 1 seconds.
    Jun 11 20:59:27 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Jun 11 20:59:27 localhost kernel: kjournald     D 000048EF  2900   359      7           390   348 (L-TLB)
    Jun 11 20:59:27 localhost kernel:        f7c80ed4 00000046 a751e9d4 000048ef c042e51c 00000082 f7d80b60 0000000a
    Jun 11 20:59:27 localhost kernel:        f7c82000 a7548843 000048ef 00029e6f 00000000 f7c8210c c17e7200 f393ee40
    Jun 11 20:59:27 localhost kernel:        0fd00000 c041e388 c18c5a04 c18c59fc 09923ea8 c042d0bb c17e76bc 09923ea8
    Jun 11 20:59:27 localhost kernel: Call Trace:
    Jun 11 20:59:27 localhost kernel:  [<c042e51c>] del_timer+0x62/0x69
    Jun 11 20:59:27 localhost kernel:  [<c041e388>] find_busiest_group+0x177/0x462
    Jun 11 20:59:27 localhost kernel:  [<c042d0bb>] getnstimeofday+0x30/0xb6
    Jun 11 20:59:27 localhost kernel:  [<c061fe36>] io_schedule+0x36/0x59
    Jun 11 20:59:27 localhost kernel:  [<c0479a25>] sync_buffer+0x30/0x33
    Jun 11 20:59:27 localhost kernel:  [<c062000d>] __wait_on_bit+0x33/0x58
    Jun 11 20:59:27 localhost kernel:  [<c04799f5>] sync_buffer+0x0/0x33
    Jun 11 20:59:27 localhost kernel:  [<c04799f5>] sync_buffer+0x0/0x33
    Jun 11 20:59:27 localhost kernel:  [<c0620094>] out_of_line_wait_on_bit+0x62/0x6a
    Jun 11 20:59:27 localhost kernel:  [<c0436bf4>] wake_bit_function+0x0/0x3c
    Jun 11 20:59:27 localhost kernel:  [<c04799a2>] __wait_on_buffer+0x1c/0x1f
    Jun 11 20:59:27 localhost kernel:  [<f8885422>] journal_commit_transaction+0x4be/0xefc [jbd]
    Jun 11 20:59:27 localhost kernel:  [<c042df0b>] lock_timer_base+0x15/0x2f
    Jun 11 20:59:27 localhost kernel:  [<c042df8a>] try_to_del_timer_sync+0x65/0x6c
    Jun 11 20:59:27 localhost kernel:  [<f8888c21>] kjournald+0xa1/0x1c2 [jbd]
    Jun 11 20:59:27 localhost kernel:  [<c0436bc7>] autoremove_wake_function+0x0/0x2d
    Jun 11 20:59:27 localhost kernel:  [<f8888b80>] kjournald+0x0/0x1c2 [jbd]
    Jun 11 20:59:27 localhost kernel:  [<c0436b03>] kthread+0xc0/0xed
    Jun 11 20:59:27 localhost kernel:  [<c0436a43>] kthread+0x0/0xed
    Jun 11 20:59:27 localhost kernel:  [<c0405c87>] kernel_thread_helper+0x7/0x10
    Jun 11 20:59:27 localhost kernel:  =======================

    Jun 11 21:01:02 localhost kernel: INFO: task crond:6060 blocked for more than 1 seconds.
    Jun 11 21:01:02 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Jun 11 21:01:02 localhost kernel: crond         D 00004905  2584  6060   2288                     (NOTLB)
    Jun 11 21:01:02 localhost kernel:        cb17cdcc 00000082 b815344b 00004905 00000082 f7d80b60 f7d80b60 00000006
    Jun 11 21:01:02 localhost kernel:        e1ef3000 b828a8bb 00004905 00137470 00000000 e1ef310c c17e7200 ebc42040
    Jun 11 21:01:02 localhost kernel:        00000000 cb17cdc4 00000000 cb17cdc4 0995240e c042d0bb f394193c 0995240e
    Jun 11 21:01:02 localhost kernel: Call Trace:
    Jun 11 21:01:02 localhost kernel:  [<c042d0bb>] getnstimeofday+0x30/0xb6
    Jun 11 21:01:02 localhost kernel:  [<c061fe36>] io_schedule+0x36/0x59
    Jun 11 21:01:02 localhost kernel:  [<c0459db8>] sync_page+0x0/0x3b
    Jun 11 21:01:02 localhost kernel:  [<c0459df0>] sync_page+0x38/0x3b
    Jun 11 21:01:02 localhost kernel:  [<c061ff48>] __wait_on_bit_lock+0x2a/0x52
    Jun 11 21:01:02 localhost kernel:  [<c0459d33>] __lock_page+0x52/0x59
    Jun 11 21:01:02 localhost kernel:  [<c0436bf4>] wake_bit_function+0x0/0x3c
    Jun 11 21:01:02 localhost kernel:  [<c045a55b>] do_generic_mapping_read+0x1f7/0x380
    Jun 11 21:01:02 localhost kernel:  [<c045af56>] __generic_file_aio_read+0x16a/0x1a3
    Jun 11 21:01:02 localhost kernel:  [<c0459a29>] file_read_actor+0x0/0xd5
    Jun 11 21:01:03 localhost kernel:  [<c045afca>] generic_file_aio_read+0x3b/0x42
    Jun 11 21:01:03 localhost kernel:  [<c0476d87>] do_sync_read+0xb6/0xf1
    Jun 11 21:01:03 localhost kernel:  [<c0436bc7>] autoremove_wake_function+0x0/0x2d
    Jun 11 21:01:03 localhost kernel:  [<c0476cd1>] do_sync_read+0x0/0xf1
    Jun 11 21:01:03 localhost kernel:  [<c0477660>] vfs_read+0x9f/0x141
    Jun 11 21:01:03 localhost kernel:  [<c0477ae6>] sys_read+0x3c/0x63
    Jun 11 21:01:03 localhost kernel:  [<c0404f4b>] syscall_call+0x7/0xb
    Jun 11 21:01:03 localhost kernel:  =======================

    Jun 11 21:13:21 localhost kernel: INFO: task smartd:2498 blocked for more than 1 seconds.
    Jun 11 21:13:21 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Jun 11 21:13:21 localhost kernel: smartd        D 000049B1  2592  2498      1          2503  2409 (NOTLB)
    Jun 11 21:13:21 localhost kernel:        ec3e6c7c 00000086 8ff2a672 000049b1 c084e114 ec3e6cd8 c0575173 0000000a
    Jun 11 21:13:21 localhost kernel:        ec4b1000 8ff4aa79 000049b1 00020407 00000000 ec4b110c c17e7200 f3961200
    Jun 11 21:13:21 localhost kernel:        00000000 00000000 ffffffff f7d7adc0 00000000 00000000 00000000 ffffffff
    Jun 11 21:13:21 localhost kernel: Call Trace:
    Jun 11 21:13:21 localhost kernel:  [<c0575173>] execute_drive_cmd+0x169/0x1ab
    Jun 11 21:13:21 localhost kernel:  [<c061f905>] wait_for_completion+0x6b/0x8f
    Jun 11 21:13:21 localhost kernel:  [<c041f80f>] default_wake_function+0x0/0xc
    Jun 11 21:13:21 localhost kernel:  [<c0575b1b>] ide_do_drive_cmd+0xd7/0xfa
    Jun 11 21:13:21 localhost kernel:  [<c0579ac7>] ide_task_ioctl+0x45/0x6b
    Jun 11 21:13:21 localhost kernel:  [<c04e29ee>] blk_end_sync_rq+0x0/0x1d
    Jun 11 21:13:21 localhost kernel:  [<c0574793>] generic_ide_ioctl+0x257/0x450
    Jun 11 21:13:21 localhost kernel:  [<c057d8eb>] idedisk_ioctl+0x1c/0x20
    Jun 11 21:13:21 localhost kernel:  [<c04e51f3>] blkdev_driver_ioctl+0x4b/0x5b
    Jun 11 21:13:21 localhost kernel:  [<c04e590e>] blkdev_ioctl+0x70b/0x759
    Jun 11 21:13:21 localhost kernel:  [<c04c9c29>] inode_has_perm+0x54/0x5c
    Jun 11 21:13:21 localhost kernel:  [<c06200e8>] mutex_lock+0xb/0x19
    Jun 11 21:13:21 localhost kernel:  [<c057e05e>] idedisk_open+0x38/0xb3
    Jun 11 21:13:21 localhost kernel:  [<c04c95c3>] avc_has_perm+0x3c/0x46
    Jun 11 21:13:21 localhost kernel:  [<c04c9c29>] inode_has_perm+0x54/0x5c
    Jun 11 21:13:21 localhost kernel:  [<c047e7f7>] blkdev_open+0x0/0x44
    Jun 11 21:13:21 localhost kernel:  [<c047e813>] blkdev_open+0x1c/0x44
    Jun 11 21:13:21 localhost kernel:  [<c047595a>] __dentry_open+0xea/0x1ab
    Jun 11 21:13:21 localhost kernel:  [<c047dd12>] block_ioctl+0x13/0x16
    Jun 11 21:13:21 localhost kernel:  [<c047dcff>] block_ioctl+0x0/0x16
    Jun 11 21:13:21 localhost kernel:  [<c0487629>] do_ioctl+0x1c/0x5d
    Jun 11 21:13:21 localhost kernel:  [<c0487bbd>] vfs_ioctl+0x47b/0x4d3
    Jun 11 21:13:21 localhost kernel:  [<c0487c5d>] sys_ioctl+0x48/0x5f
    Jun 11 21:13:21 localhost kernel:  [<c0404f4b>] syscall_call+0x7/0xb
    Jun 11 21:13:21 localhost kernel:  =======================

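    I do not really know how to read the output, but I confirmed that the detection works. In every report the task is in state D, and the call trace passes through a wait function (io_schedule for kjournald and crond, wait_for_completion for smartd).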
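To get an overview without reading every trace by hand, the reports can be pulled out of the log with a small parser. This is a minimal sketch, assuming Python 3 and log lines in the same format as the output above, one message per line; the function name and the dictionary keys are my own choices.

```python
import re

# Matches the first line of a report, e.g.
# "INFO: task crond:6060 blocked for more than 1 seconds."
REPORT_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds\."
)
# Matches a call-trace entry, e.g. "[<c061fe36>] io_schedule+0x36/0x59"
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(?P<symbol>[^\s+]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def parse_hung_task_reports(log_text):
    """Return one dict per hung-task report found in log_text."""
    reports = []
    current = None
    for line in log_text.splitlines():
        header = REPORT_RE.search(line)
        if header:
            current = {
                "name": header.group("name"),
                "pid": int(header.group("pid")),
                "timeout_secs": int(header.group("secs")),
                "frames": [],
            }
            reports.append(current)
            continue
        if "=======" in line:
            current = None      # the separator line closes the current trace
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            current["frames"].append(frame.group("symbol"))
    return reports
```

Fed with the contents of the system log (on this machine probably /var/log/messages, which is an assumption), it returns one entry per report with the task name, PID, timeout, and the symbols from the call trace in the order they were logged, which makes it easy to count which tasks show up most often.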

    That's all.
