On msm8610 there is a debug option, LOCKUP_DETECTOR, which was described earlier. msm8916 adds two more macros: CONFIG_MSM_WATCHDOG_V2 and CONFIG_HARDLOCKUP_DETECTOR_OTHER_CPU. CONFIG_MSM_WATCHDOG_V2 is left for later; this post covers LOCKUP_DETECTOR and CONFIG_HARDLOCKUP_DETECTOR_OTHER_CPU first.
Below we walk through the code to see how the system detects soft lockups and hard lockups, starting from the concrete implementation in Linux (3.10).
After watchdog_enable() runs for the first time, watchdog_timer_fn() is called 4 seconds later, when the hrtimer's sample period expires.
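Where the 4 seconds comes from can be sketched in user-space C. This is only an illustration, assuming the default watchdog_thresh of 10 seconds and the 3.10 relation in which the soft lockup threshold is twice watchdog_thresh and the sample period is one fifth of that threshold. The `_sketch` names are stand-ins made up for this example, not kernel symbols.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Assumed default threshold in seconds (an assumption, not stated in the text). */
static const unsigned int watchdog_thresh_sketch = 10;

/* Soft lockup threshold: assumed to be twice watchdog_thresh, i.e. 20 s. */
static unsigned int softlockup_thresh_sketch(void)
{
	return watchdog_thresh_sketch * 2;
}

/*
 * hrtimer period in nanoseconds: one fifth of the soft lockup threshold,
 * so the timer gets several chances to fire before a lockup is declared.
 */
static uint64_t sample_period_ns_sketch(void)
{
	return softlockup_thresh_sketch() * (NSEC_PER_SEC / 5);
}
```

With these assumptions the period is 20 s / 5 = 4 s, which matches the 4-second delay mentioned above and the 20-second soft lockup limit discussed below.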
static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
{
	/*
	 * watchdog_touch_ts is written in two places:
	 *  - __touch_watchdog() sets it to get_timestamp(), which is defined
	 *    in watchdog.c and returns a value in (roughly) seconds. The
	 *    watchdog thread does this every time it gets to run.
	 *  - touch_softlockup_watchdog() and touch_nmi_watchdog() set it to 0.
	 *    I have not gone through all of their call sites; they are used
	 *    by code that knows it may legitimately hold the CPU for a while.
	 * So touch_ts == 0 means "someone asked for the check to be restarted",
	 * and a non-zero touch_ts is the time the watchdog thread last ran.
	 * How do we decide whether something is really wrong? Read on.
	 */
	unsigned long touch_ts = __this_cpu_read(watchdog_touch_ts);
	struct pt_regs *regs = get_irq_regs();
	int duration;

	pr_info("watchdog_timer_fn started\n");

	/* kick the hardlockup detector */
	watchdog_interrupt_count();

	/* test for hardlockups on the next cpu */
	/* Ignore this for now; how hard lockups on other CPUs are detected is covered later. */
	watchdog_check_hardlockup_other_cpu();

	/* kick the softlockup detector */
	/*
	 * As noted earlier, the per-CPU softlockup_watchdog holds the pointer
	 * to this CPU's watchdog task_struct, so this wakes the watchdog task.
	 */
	wake_up_process(__this_cpu_read(softlockup_watchdog));

	/* .. and repeat */
	/*
	 * Push the expiry of this hrtimer forward: the next expiry is the
	 * current time plus ns_to_ktime(sample_period).
	 */
	hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));

	if (touch_ts == 0) {
		if (unlikely(__this_cpu_read(softlockup_touch_sync))) {
			/*
			 * If the time stamp was touched atomically
			 * make sure the scheduler tick is up to date.
			 */
			__this_cpu_write(softlockup_touch_sync, false);
			sched_clock_tick();
		}

		/* Clear the guest paused flag on watchdog reset */
		kvm_check_and_clear_guest_paused();
		__touch_watchdog();
		return HRTIMER_RESTART;
	}

	/* check for a softlockup
	 * This is done by making sure a high priority task is
	 * being scheduled.  The task touches the watchdog to
	 * indicate it is getting cpu time.  If it hasn't then
	 * this is a good indication some task is hogging the cpu
	 */
	/*
	 * Reaching this point means touch_ts is a real timestamp.
	 * is_softlockup() checks whether more than 20 seconds have passed
	 * since the watchdog thread last updated it. If so, the thread has
	 * not been scheduled in all that time and we enter the if block
	 * below. This path only reports a soft lockup; hard lockups are
	 * detected separately through the interrupt count kicked above.
	 */
	duration = is_softlockup(touch_ts);
	if (unlikely(duration)) {
		/*
		 * If a virtual machine is stopped by the host it can look to
		 * the watchdog like a soft lockup, check to see if the host
		 * stopped the vm before we issue the warning
		 */
		if (kvm_check_and_clear_guest_paused())
			return HRTIMER_RESTART;

		/* only warn once */
		if (__this_cpu_read(soft_watchdog_warn) == true)
			return HRTIMER_RESTART;

		printk(KERN_EMERG "BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
			smp_processor_id(), duration,
			current->comm, task_pid_nr(current));
		print_modules();
		print_irqtrace_events(current);
		if (regs)
			show_regs(regs);
		else
			dump_stack();

		if (softlockup_panic)
			panic("softlockup: hung tasks");
		__this_cpu_write(soft_watchdog_warn, true);
	} else
		__this_cpu_write(soft_watchdog_warn, false);

	/*
	 * Restart the hrtimer. Together with hrtimer_forward_now() above,
	 * this re-arms the timer with the new expiry time.
	 */
	return HRTIMER_RESTART;
}
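The decision watchdog_timer_fn() makes from touch_ts can be modelled in a few lines of user-space C. This is a simplified sketch, not the kernel code: the names `wd_verdict`, `stuck_seconds` and `classify_touch_ts` are invented for illustration, the 20-second threshold is assumed to be twice a default watchdog_thresh of 10, and a plain comparison is used where the kernel would have to cope with counter wraparound.

```c
/* Possible outcomes of one timer callback (hypothetical names). */
enum wd_verdict {
	WD_TOUCHED_RESET,	/* touch_ts == 0: restart the measurement */
	WD_HEALTHY,		/* watchdog thread ran recently enough */
	WD_SOFT_LOCKUP		/* watchdog thread starved for too long */
};

/* Assumed soft lockup threshold in seconds. */
#define SOFTLOCKUP_THRESH_SEC 20UL

/* Seconds the CPU has been stuck, or 0 if still within the threshold. */
static unsigned long stuck_seconds(unsigned long touch_ts, unsigned long now)
{
	if (now > touch_ts + SOFTLOCKUP_THRESH_SEC)
		return now - touch_ts;
	return 0;
}

/* Mirror the order of checks in the callback: zero first, then the age. */
static enum wd_verdict classify_touch_ts(unsigned long touch_ts,
					 unsigned long now)
{
	if (touch_ts == 0)
		return WD_TOUCHED_RESET;
	return stuck_seconds(touch_ts, now) ? WD_SOFT_LOCKUP : WD_HEALTHY;
}
```

For example, a timestamp of 100 checked at time 120 is still healthy, while the same timestamp checked at time 121 is reported as a soft lockup of 21 seconds.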