Interrupts -5- (Softirq)

Softirq

Characteristics

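Softirq is the largest part of the Linux kernel's interrupt bottom-half handling.
Each CPU runs its own ksoftirqd kernel thread, which processes softirqs in process context.
Many existing drivers used tasklets as their interrupt bottom-half handler. Because the tasklet interface is implemented as part of softirq (TASKLET_SOFTIRQ and HI_SOFTIRQ), softirq absorbed those tasklet-based drivers without changes.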

Two contexts

Depending on the situation, softirqs switch between two context modes. However, with the CONFIG_IRQ_FORCED_THREADING kernel option and the “threadirqs” kernel parameter, softirqs can be made to run only in process context.

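irq context
- Also called hardirq.
- On the first invocation, the handler functions are called directly in irq context.
process context
- Also called task context or thread context.
- If processing in irq context runs for 2 ms or longer (or needs more than 10 passes, or a reschedule is requested), the ksoftirqd thread is woken. ksoftirqd runs in process context, and the remaining softirq work is handed to it.
- Once ksoftirqd is awake, it handles all subsequent pending softirqs until it goes back to sleep.
  - When every pending softirq request has been completed, ksoftirqd sleeps again. Later softirq requests are once more processed in irq context.
- The work is handed to a task such as ksoftirqd, instead of continuing in irq context, to prevent starvation. Without this handoff, a flood of interrupts could be processed entirely in irq context, monopolizing the CPU so that ordinary tasks never get to run.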
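The deferral rule can be modeled in a few lines of user-space C. This is a sketch, not kernel code. Time is counted in plain milliseconds instead of jiffies, and the names `may_restart_inline()` and `inline_passes()` are hypothetical. The condition mirrors the restart check in `__do_softirq()`: a 2 ms budget, at most 10 passes, and an early stop on a reschedule request.

```c
#include <assert.h>
#include <stdbool.h>

/* Limits taken from the text: a 2 ms budget and at most 10 passes.
 * Time is plain milliseconds here (assumption); the kernel uses jiffies. */
#define MODEL_SOFTIRQ_TIME_MS 2UL
#define MODEL_SOFTIRQ_RESTART 10

/* Same short-circuit order as the kernel's
 *   time_before(jiffies, end) && !need_resched() && --max_restart
 * Returns true to keep going in irq context, false to hand off to ksoftirqd. */
static bool may_restart_inline(unsigned long now_ms, unsigned long end_ms,
                               bool need_resched, int *restarts_left)
{
    if (!(now_ms < end_ms))          /* time budget used up */
        return false;
    if (need_resched)                /* another task wants this CPU */
        return false;
    return --(*restarts_left) != 0;  /* restart budget */
}

/* Counts the passes that run in irq context when every pass finds new
 * pending work and costs pass_cost_ms; after that, ksoftirqd takes over. */
static int inline_passes(unsigned long pass_cost_ms, bool need_resched)
{
    unsigned long now = 0;
    unsigned long end = now + MODEL_SOFTIRQ_TIME_MS;
    int restarts_left = MODEL_SOFTIRQ_RESTART;
    int passes = 0;

    do {
        passes++;                    /* one walk over the pending bitmask */
        now += pass_cost_ms;
    } while (may_restart_inline(now, end, need_resched, &restarts_left));
    return passes;
}
```

For example, `inline_passes(1, false)` returns 2 because the second pass uses up the 2 ms budget, so any remaining work would go to ksoftirqd. With free passes, `inline_passes(0, false)`, the restart counter stops processing after 10 passes.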

The following figure compares the two contexts in which softirqs run.

Softirq action handlers

Softirq action handlers exist only for a few specific interrupts that must be processed very quickly. They are hard-coded and maintained by the mainline committers.

include/linux/interrupt.h

/* PLEASE, avoid to allocate new softirqs, if you need not _really_ high
   frequency threaded job scheduling. For almost all the purposes
   tasklets are more than enough. F.e. all serial device BHs et
   al. should be converted to tasklets, not to softirqs.
 */

enum
{
        HI_SOFTIRQ=0,
        TIMER_SOFTIRQ,
        NET_TX_SOFTIRQ,
        NET_RX_SOFTIRQ,
        BLOCK_SOFTIRQ,
        IRQ_POLL_SOFTIRQ,
        TASKLET_SOFTIRQ,
        SCHED_SOFTIRQ,
        HRTIMER_SOFTIRQ, /* Unused, but kept as tools rely on the
                            numbering. Sigh! */
        RCU_SOFTIRQ,    /* Preferable RCU should always be the last softirq */

        NR_SOFTIRQS
};

The following figure shows the processing priority of the softirqs and their action handlers. A lower enum value is handled first.

Softirq stack

When irq context processing finishes, irq_exit() is called, and pending softirqs may then continue to be processed in irq context. In this case only, the architecture decides whether a dedicated softirq stack is used or the irq stack continues to be used.

If the HAVE_IRQ_EXIT_ON_IRQ_STACK kernel option is enabled, the irq stack is used as is.
- Systems such as x86, powerpc, sparc and s390 use their own softirq stack.
- Other systems such as arm and arm64 keep using the irq stack through the fallback routine, even without this kernel option.

The following figure compares the stacks used when softirq servicing continues in irq context.

Creating the ksoftirqd threads

spawn_ksoftirqd()

kernel/softirq.c

static __init int spawn_ksoftirqd(void)
{
        cpuhp_setup_state_nocalls(CPUHP_SOFTIRQ_DEAD, "softirq:dead", NULL,
                                  takeover_tasklets);
        BUG_ON(smpboot_register_percpu_thread(&softirq_threads));

        return 0;
}
early_initcall(spawn_ksoftirqd);

Starts ksoftirqd on each CPU.

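Lines 3-4 register takeover_tasklets() to be called whenever a CPU goes offline and reaches the CPUHP_SOFTIRQ_DEAD state.
Line 5 registers the softirq_threads descriptor with the SMP hotplug thread infrastructure, which starts ksoftirqd on each CPU. A registration failure triggers BUG_ON().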

softirq_threads

kernel/softirq.c

static struct smp_hotplug_thread softirq_threads = {
        .store                  = &ksoftirqd,
        .thread_should_run      = ksoftirqd_should_run,
        .thread_fn              = run_ksoftirqd,
        .thread_comm            = "ksoftirqd/%u",
};

The following figure shows how ksoftirqd is started on each CPU.

Migrating tasklets when a CPU goes offline

takeover_tasklets()

kernel/softirq.c

static int takeover_tasklets(unsigned int cpu)
{
        /* CPU is dead, so no lock needed. */
        local_irq_disable();

        /* Find end, append list for that CPU. */
        if (&per_cpu(tasklet_vec, cpu).head != per_cpu(tasklet_vec, cpu).tail) {
                *__this_cpu_read(tasklet_vec.tail) = per_cpu(tasklet_vec, cpu).head;
                __this_cpu_write(tasklet_vec.tail, per_cpu(tasklet_vec, cpu).tail);
                per_cpu(tasklet_vec, cpu).head = NULL;
                per_cpu(tasklet_vec, cpu).tail = &per_cpu(tasklet_vec, cpu).head;
        }
        raise_softirq_irqoff(TASKLET_SOFTIRQ);

        if (&per_cpu(tasklet_hi_vec, cpu).head != per_cpu(tasklet_hi_vec, cpu).tail) {
                *__this_cpu_read(tasklet_hi_vec.tail) = per_cpu(tasklet_hi_vec, cpu).head;
                __this_cpu_write(tasklet_hi_vec.tail, per_cpu(tasklet_hi_vec, cpu).tail);
                per_cpu(tasklet_hi_vec, cpu).head = NULL;
                per_cpu(tasklet_hi_vec, cpu).tail = &per_cpu(tasklet_hi_vec, cpu).head;
        }
        raise_softirq_irqoff(HI_SOFTIRQ);

        local_irq_enable();
        return 0;
}

When @cpu goes offline, its tasklets are moved to the local CPU that is currently running.

Lines 7-13 append the entries on @cpu's tasklet_vec list to the local CPU's list and then raise the tasklet softirq.
Lines 15-21 do the same for @cpu's tasklet_hi_vec list and then raise the hi softirq.
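The splice works because each list keeps a pointer to its last `next` field (`tail`), so a whole list can be appended in O(1). Below is a minimal user-space sketch with simplified, hypothetical types (`struct tasklet` and the `tl_*` helpers). The softirq raised after the splice in the kernel is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for struct tasklet_struct and struct tasklet_head. */
struct tasklet {
    struct tasklet *next;
    int id;
};

struct tasklet_head {
    struct tasklet *head;
    struct tasklet **tail;   /* &head when empty, else &last->next */
};

static void tl_init(struct tasklet_head *l)
{
    l->head = NULL;
    l->tail = &l->head;      /* same initial state softirq_init() sets up */
}

static void tl_append(struct tasklet_head *l, struct tasklet *t)
{
    t->next = NULL;
    *l->tail = t;            /* O(1): write through the tail pointer */
    l->tail = &t->next;
}

/* The splice performed by takeover_tasklets() for one list. */
static void tl_takeover(struct tasklet_head *local, struct tasklet_head *dead)
{
    if (&dead->head != dead->tail) {   /* dead CPU's list is not empty */
        *local->tail = dead->head;     /* hang it after local's last entry */
        local->tail = dead->tail;      /* local now ends where dead ended */
        dead->head = NULL;             /* leave dead as an empty list */
        dead->tail = &dead->head;
    }
}

/* Builds local = {1, 2} and dead = the first dead_count (max 2) of {3, 4},
 * splices, and returns local's ids as decimal digits, e.g. 1234. */
static int takeover_demo(int dead_count)
{
    struct tasklet t[4] = { { NULL, 1 }, { NULL, 2 }, { NULL, 3 }, { NULL, 4 } };
    struct tasklet_head local, dead;
    int code = 0;

    tl_init(&local);
    tl_init(&dead);
    tl_append(&local, &t[0]);
    tl_append(&local, &t[1]);
    for (int i = 0; i < dead_count && i < 2; i++)
        tl_append(&dead, &t[2 + i]);

    tl_takeover(&local, &dead);
    assert(dead.head == NULL && dead.tail == &dead.head);  /* dead reset */
    assert(*local.tail == NULL);                           /* tail still valid */

    for (const struct tasklet *p = local.head; p; p = p->next)
        code = code * 10 + p->id;
    return code;
}
```

`takeover_demo(2)` returns 1234, which shows that the dead CPU's entries keep their order and land after the local ones. `takeover_demo(0)` returns 12, which shows that an empty list is left untouched.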

Registering SMP hotplug threads

Registers kernel threads that run on every online CPU. The following threads use the SMP hotplug thread registration facility:

run_ksoftirqd – “ksoftirqd/<cpu>”
cpu_stopper_thread() – “migration/<cpu>”
rcu_cpu_kthread() – “rcuc/<cpu>”
cpuhp_thread_fun() – “cpuhp/<cpu>”
idle_inject_fn() – “idle_inject/<cpu>”

smpboot_register_percpu_thread()

kernel/smpboot.c

/**
 * smpboot_register_percpu_thread - Register a per_cpu thread related
 *                                          to hotplug
 * @plug_thread:        Hotplug thread descriptor
 *
 * Creates and starts the threads on all online cpus.
 */

int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread)
{
        unsigned int cpu;               
        int ret = 0;

        get_online_cpus();
        mutex_lock(&smpboot_threads_lock);
        for_each_online_cpu(cpu) {
                ret = __smpboot_create_thread(plug_thread, cpu);
                if (ret) {
                        smpboot_destroy_threads(plug_thread);
                        goto out;       
                }
                smpboot_unpark_thread(plug_thread, cpu);
        }
        list_add(&plug_thread->list, &hotplug_threads);
out:
        mutex_unlock(&smpboot_threads_lock);
        put_online_cpus();
        return ret;
}
EXPORT_SYMBOL_GPL(smpboot_register_percpu_thread);

Runs the hotplug thread described by @plug_thread on every online CPU.

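Lines 8-15 create a kernel thread from the thread function in @plug_thread on each online CPU and then unpark it. If creation fails on any CPU, all threads created so far are destroyed and the error is returned.
Line 16 adds the hotplug thread descriptor to the global hotplug_threads list.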
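Registration is all-or-nothing: either every online CPU gets a thread, or everything created so far is torn down. The following is a user-space sketch of that pattern. It assumes a fixed 8-CPU model and uses a boolean per CPU in place of `ht->store`. The names `create_on_cpu()`, `destroy_all()`, `register_on_online()` and `threads_alive()` are hypothetical, and the unpark step is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8                  /* hypothetical CPU count */

static bool thread_on[MODEL_NR_CPUS];    /* stands in for *per_cpu_ptr(ht->store, cpu) */

/* Creation step; fail_cpu simulates an allocation failure on one CPU (-1 = none). */
static int create_on_cpu(int cpu, int fail_cpu)
{
    if (thread_on[cpu])      /* already created, like "if (tsk) return 0;" */
        return 0;
    if (cpu == fail_cpu)
        return -ENOMEM;
    thread_on[cpu] = true;
    return 0;
}

/* Rollback, playing the role of smpboot_destroy_threads(). */
static void destroy_all(void)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        thread_on[cpu] = false;
}

/* All-or-nothing registration over the CPUs set in online_mask. */
static int register_on_online(unsigned int online_mask, int fail_cpu)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        if (!(online_mask & (1u << cpu)))
            continue;
        int ret = create_on_cpu(cpu, fail_cpu);
        if (ret) {
            destroy_all();
            return ret;
        }
    }
    return 0;
}

static int threads_alive(void)
{
    int n = 0;
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        n += thread_on[cpu];
    return n;
}
```

`register_on_online(0x0F, -1)` creates threads on CPUs 0-3. A later `register_on_online(0xFF, 6)` skips the CPUs that already have a thread and fails on CPU 6. It then removes every thread, so `threads_alive()` drops to 0.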

__smpboot_create_thread()

kernel/smpboot.c

static int
__smpboot_create_thread(struct smp_hotplug_thread *ht, unsigned int cpu)
{
        struct task_struct *tsk = *per_cpu_ptr(ht->store, cpu);
        struct smpboot_thread_data *td;

        if (tsk)
                return 0;

        td = kzalloc_node(sizeof(*td), GFP_KERNEL, cpu_to_node(cpu));
        if (!td)
                return -ENOMEM;
        td->cpu = cpu;
        td->ht = ht;

        tsk = kthread_create_on_cpu(smpboot_thread_fn, td, cpu,
                                    ht->thread_comm);
        if (IS_ERR(tsk)) {
                kfree(td);
                return PTR_ERR(tsk);
        }
        /*
         * Park the thread so that it could start right on the CPU
         * when it is available.
         */
        kthread_park(tsk);
        get_task_struct(tsk);
        *per_cpu_ptr(ht->store, cpu) = tsk;
        if (ht->create) {
                /*
                 * Make sure that the task has actually scheduled out
                 * into park position, before calling the create
                 * callback. At least the migration thread callback
                 * requires that the task is off the runqueue.
                 */
                if (!wait_task_inactive(tsk, TASK_PARKED))
                        WARN_ON(1);
                else
                        ht->create(cpu);
        }
        return 0;
}

Creates the hotplug thread described by @ht on @cpu. The new task is named after @ht->thread_comm. Returns 0 on success.

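Lines 4-8 read @cpu's entry in @ht->store into tsk. If it is already set, the function returns immediately.
Lines 10-14 allocate a smpboot_thread_data structure on @cpu's node and fill in the @cpu and @ht arguments.
Lines 16-21 use the thread data @td to create a kernel thread bound to @cpu, named after @ht->thread_comm.
- The thread always starts in smpboot_thread_fn(), which manages the thread's lifetime.
Line 26 puts the new thread into the parked state right away.
Lines 27-28 take a reference on the task and store it in @ht->store.
Lines 29-40 call @ht->create if a function is registered there. They first use wait_task_inactive() to confirm that the task has really scheduled out into the parked state.
- The migration kernel thread registers and uses cpu_stop_create() here.
Line 41 returns 0 for success.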

smpboot_thread_fn()

kernel/smpboot.c

/**
 * smpboot_thread_fn - percpu hotplug thread loop function
 * @data:       thread data pointer
 *
 * Checks for thread stop and park conditions. Calls the necessary
 * setup, cleanup, park and unpark functions for the registered
 * thread.
 *
 * Returns 1 when the thread should exit, 0 otherwise.
 */

static int smpboot_thread_fn(void *data)
{
        struct smpboot_thread_data *td = data;
        struct smp_hotplug_thread *ht = td->ht;

        while (1) {
                set_current_state(TASK_INTERRUPTIBLE);
                preempt_disable();
                if (kthread_should_stop()) {
                        __set_current_state(TASK_RUNNING);
                        preempt_enable();
                        if (ht->cleanup)
                                ht->cleanup(td->cpu, cpu_online(td->cpu));
                        kfree(td);
                        return 0;
                }

                if (kthread_should_park()) {
                        __set_current_state(TASK_RUNNING);
                        preempt_enable();
                        if (ht->park && td->status == HP_THREAD_ACTIVE) {
                                BUG_ON(td->cpu != smp_processor_id());
                                ht->park(td->cpu);
                                td->status = HP_THREAD_PARKED;
                        }
                        kthread_parkme();
                        /* We might have been woken for stop */
                        continue;
                }

                BUG_ON(td->cpu != smp_processor_id());

                /* Check for state change setup */
                switch (td->status) {
                case HP_THREAD_NONE:
                        __set_current_state(TASK_RUNNING);
                        preempt_enable();
                        if (ht->setup)
                                ht->setup(td->cpu);
                        td->status = HP_THREAD_ACTIVE;
                        continue;

                case HP_THREAD_PARKED:
                        __set_current_state(TASK_RUNNING);
                        preempt_enable();
                        if (ht->unpark)
                                ht->unpark(td->cpu);
                        td->status = HP_THREAD_ACTIVE;
                        continue;
                }

                if (!ht->thread_should_run(td->cpu)) {
                        preempt_enable_no_resched();
                        schedule();
                } else {
                        __set_current_state(TASK_RUNNING);
                        preempt_enable();
                        ht->thread_fn(td->cpu);
                }
        }
}

This is the loop function of a per-CPU hotplug thread. It loops forever and calls the registered functions as requested.

Lines 7-8 set the current task state to TASK_INTERRUPTIBLE and disable kernel preemption.
Lines 9-16 handle thread exit when KTHREAD_SHOULD_STOP is set for the current task: they call ht->cleanup if present, free td, and return.
Lines 18-29 handle KTHREAD_SHOULD_PARK. They call ht->park if the thread is active, then park the thread and sleep. When woken, the loop continues.
Lines 34-41 handle the HP_THREAD_NONE status. The task is set to running, preemption is re-enabled, ht->setup is called, and the status becomes HP_THREAD_ACTIVE.
Lines 43-50 handle the HP_THREAD_PARKED status. The task is set to running, preemption is re-enabled, ht->unpark is called, and the status becomes HP_THREAD_ACTIVE.
Lines 53-55 call ht->thread_should_run(). If it returns false, the thread calls schedule() and sleeps.
- For the ksoftirqd kernel thread, ksoftirqd_should_run() checks whether there are softirqs to process.
Lines 56-59 set the task to TASK_RUNNING, re-enable preemption, and call the thread's processing function.
- For the ksoftirqd kernel thread, this is run_ksoftirqd().

kernel/smpboot.c

enum {
        HP_THREAD_NONE = 0,
        HP_THREAD_ACTIVE,
        HP_THREAD_PARKED,
};
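The per-iteration decision in `smpboot_thread_fn()` can be condensed into a small state machine. The sketch below is a user-space model with hypothetical names: the callbacks are reduced to return codes, and the thread flags are plain booleans. `status` becomes `HP_THREAD_PARKED` only when a park callback exists. `softirq_threads` registers none, so ksoftirqd never goes through an unpark step.

```c
#include <assert.h>
#include <stdbool.h>

enum { HP_THREAD_NONE = 0, HP_THREAD_ACTIVE, HP_THREAD_PARKED };  /* as above */

/* What one pass of the loop does; names are hypothetical. */
enum step {
    STEP_EXIT,    /* ht->cleanup(), kfree(td), return */
    STEP_PARK,    /* ht->park() if active, then kthread_parkme() */
    STEP_SETUP,   /* ht->setup(), first run after creation */
    STEP_UNPARK,  /* ht->unpark(), back from a park */
    STEP_SLEEP,   /* thread_should_run() is false: schedule() */
    STEP_RUN,     /* thread_fn(), e.g. run_ksoftirqd() */
};

struct model_thread {
    int status;
    bool has_park_cb;            /* ht->park != NULL (ksoftirqd has none) */
    bool should_stop, should_park, should_run;
};

static enum step loop_once(struct model_thread *t)
{
    if (t->should_stop)
        return STEP_EXIT;
    if (t->should_park) {
        if (t->has_park_cb && t->status == HP_THREAD_ACTIVE)
            t->status = HP_THREAD_PARKED;
        return STEP_PARK;
    }
    switch (t->status) {
    case HP_THREAD_NONE:
        t->status = HP_THREAD_ACTIVE;
        return STEP_SETUP;
    case HP_THREAD_PARKED:
        t->status = HP_THREAD_ACTIVE;
        return STEP_UNPARK;
    }
    return t->should_run ? STEP_RUN : STEP_SLEEP;
}

/* Drives one thread through: created parked -> unpark -> idle -> work ->
 * park -> unpark -> stop, and returns the steps as decimal digits. */
static long lifecycle(bool has_park_cb)
{
    struct model_thread t = { HP_THREAD_NONE, has_park_cb, false, true, false };
    long code = 0;

    code = code * 10 + loop_once(&t);   /* parked at creation */
    t.should_park = false;
    code = code * 10 + loop_once(&t);   /* first unpark: setup */
    code = code * 10 + loop_once(&t);   /* nothing pending */
    t.should_run = true;
    code = code * 10 + loop_once(&t);   /* work to do */
    t.should_run = false;
    t.should_park = true;
    code = code * 10 + loop_once(&t);   /* CPU going down */
    t.should_park = false;
    code = code * 10 + loop_once(&t);   /* CPU back up */
    t.should_stop = true;
    code = code * 10 + loop_once(&t);   /* kthread_stop() */
    return code;
}
```

`lifecycle(true)` yields 1245130 (park, setup, sleep, run, park, unpark, exit). `lifecycle(false)` yields 1245140: without a park callback the status stays active, so after being unparked the thread simply goes back to sleep.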

The following figure shows the processing flow for each SMP hotplug thread state.

ksoftirqd_should_run()

kernel/softirq.c

static int ksoftirqd_should_run(unsigned int cpu)
{
        return local_softirq_pending();
}

Returns whether this CPU has softirqs to process.

ksoftirqd thread

The following figure shows how a softirq request branches to each softirq's handler. raise_softirq() is used to raise a particular softirq.

run_ksoftirqd()

kernel/softirq.c

static void run_ksoftirqd(unsigned int cpu)
{
        local_irq_disable();
        if (local_softirq_pending()) {
                /*
                 * We can safely run softirq on inline stack, as we are not deep
                 * in the task stack here.
                 */
                __do_softirq();
                local_irq_enable();
                cond_resched_rcu_qs();
                return;
        }
        local_irq_enable();
}

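Disables local interrupts and, if any softirqs are pending, runs their handlers with __do_softirq(). It then re-enables interrupts and calls cond_resched_rcu_qs(), which gives the scheduler a chance to run other tasks.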

local_softirq_pending()

include/linux/irq_cpustat.h

#define local_softirq_pending() (__this_cpu_read(local_softirq_pending_ref))

Returns the current CPU's pending softirq bitmask, which is nonzero if there are softirqs to process.

irq_stat

kernel/softirq.c

#ifndef __ARCH_IRQ_STAT
DEFINE_PER_CPU_ALIGNED(irq_cpustat_t, irq_stat);
EXPORT_SYMBOL(irq_stat);
#endif

do_softirq()

kernel/softirq.c

asmlinkage __visible void do_softirq(void)
{
        __u32 pending;
        unsigned long flags;

        if (in_interrupt())
                return;

        local_irq_save(flags);

        pending = local_softirq_pending();

        if (pending && !ksoftirqd_running(pending))
                do_softirq_own_stack();

        local_irq_restore(flags);
}

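Returns at once if already in interrupt context. Otherwise it disables local interrupts. If softirqs are pending and ksoftirqd is not already running, it runs their handlers through do_softirq_own_stack().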

do_softirq_own_stack()

include/linux/interrupt.h

static inline void do_softirq_own_stack(void)
{
        __do_softirq();
}

Calls the handler functions of any pending softirqs.

The code executed here depends on the architecture. On arm and arm64 it is the generic version shown above.

__do_softirq()

kernel/softirq.c

asmlinkage __visible void __softirq_entry __do_softirq(void)
{
        unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
        unsigned long old_flags = current->flags;
        int max_restart = MAX_SOFTIRQ_RESTART;
        struct softirq_action *h;
        bool in_hardirq;
        __u32 pending;
        int softirq_bit;

        /*
         * Mask out PF_MEMALLOC s current task context is borrowed for the
         * softirq. A softirq handled such as network RX might set PF_MEMALLOC
         * again if the socket is related to swap
         */
        current->flags &= ~PF_MEMALLOC;

        pending = local_softirq_pending();
        account_irq_enter_time(current);
                
        __local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET);
        in_hardirq = lockdep_softirq_start();

restart:
        /* Reset the pending bitmask before enabling irqs */
        set_softirq_pending(0);

        local_irq_enable();

        h = softirq_vec;

        while ((softirq_bit = ffs(pending))) {
                unsigned int vec_nr;
                int prev_count;

                h += softirq_bit - 1;

                vec_nr = h - softirq_vec;
                prev_count = preempt_count();

                kstat_incr_softirqs_this_cpu(vec_nr);

                trace_softirq_entry(vec_nr);
                h->action(h);
                trace_softirq_exit(vec_nr);
                if (unlikely(prev_count != preempt_count())) {
                        pr_err("huh, entered softirq %u %s %p with preempt_count %08x, exited with %08x?\n",
                               vec_nr, softirq_to_name[vec_nr], h->action,
                               prev_count, preempt_count());
                        preempt_count_set(prev_count);
                }
                h++;
                pending >>= softirq_bit;
        }

        if (__this_cpu_read(ksoftirqd) == current)
                rcu_bh_qs();
        local_irq_disable();

        pending = local_softirq_pending();
        if (pending) {
                if (time_before(jiffies, end) && !need_resched() &&
                    --max_restart)
                        goto restart;

                wakeup_softirqd();
        }

        lockdep_softirq_end(in_hardirq);
        account_irq_exit_time(current);
        __local_bh_enable(SOFTIRQ_OFFSET);
        WARN_ON_ONCE(in_interrupt());
        tsk_restore_flags(current, old_flags, PF_MEMALLOC);
}

Calls the handler functions of pending softirqs. This routine can be called from either irq context or process context.

Line 3 sets the processing time limit: the jiffies value 2 ms from now (MAX_SOFTIRQ_TIME).
Line 4 saves the current task's flags.
Line 5 limits the number of restarts to 10 (MAX_SOFTIRQ_RESTART).
Line 16 clears PF_MEMALLOC from the current task's flags. Softirq handlers merely borrow the current task context, so they must not dip into emergency memory reserves. NET_RX_SOFTIRQ is the exception. When a socket is involved in swap over the network, its handler must allocate slab memory for packets even under memory pressure, so it sets PF_MEMALLOC again itself.
Line 18 reads the pending softirqs.
Line 19 records the irq entry time for accounting when the CONFIG_IRQ_TIME_ACCOUNTING kernel option is enabled.
Line 21 adds SOFTIRQ_OFFSET to preempt_count so that the kernel cannot be preempted.
Lines 26-28 clear the __softirq_pending flags before enabling irqs, then enable irqs.
Lines 32-38 find the vector number of the highest-priority pending softirq.
- e.g. vec_nr=0 corresponds to softirq_vec[HI_SOFTIRQ]
Line 41 increments the statistics counter of the softirq being handled.
Line 44 calls the softirq handler function.
Lines 46-51 check preempt_count against the value read before the handler ran. If it has changed, an error message is printed and preempt_count is restored to the original value.
Lines 52-53 prepare to handle the next softirq in priority order.
Lines 56-57 check whether the current task is this CPU's ksoftirqd. If it is, they report an RCU-bh quiescent state with rcu_bh_qs().
Line 58 disables irqs.
Lines 60-64 check whether softirqs are still pending. If so, they jump back to the restart: label only when all of the following hold:
- at most 10 passes in total
- less than 2 ms elapsed
- no reschedule requested
Line 66 wakes ksoftirqd to process the rest if softirqs are still pending after one of these limits is hit. If __do_softirq() is already running inside ksoftirqd, this has no effect, because wakeup_softirqd() only wakes a thread that is not already TASK_RUNNING.
Line 70 records the irq exit time for accounting when the CONFIG_IRQ_TIME_ACCOUNTING kernel option is enabled.
Line 71 subtracts SOFTIRQ_OFFSET from preempt_count, lifting the preemption block taken at line 21.
Line 72 prints a warning if still in interrupt context.
Line 73 restores the PF_MEMALLOC bit of the current task's flags.
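The while loop in `__do_softirq()` handles pending softirqs from the lowest number upwards. That is why HI_SOFTIRQ has the highest priority and RCU_SOFTIRQ the lowest. Below is a user-space sketch of just that loop, using POSIX `ffs()`. `dispatch_order()` and `nth_handled()` are hypothetical helpers, and the restart and time-budget logic is left out.

```c
#include <assert.h>
#include <strings.h>   /* ffs() (POSIX) */

/* Softirq numbers from include/linux/interrupt.h as listed in this article. */
enum { HI_SOFTIRQ = 0, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ,
       BLOCK_SOFTIRQ, IRQ_POLL_SOFTIRQ, TASKLET_SOFTIRQ, SCHED_SOFTIRQ,
       HRTIMER_SOFTIRQ, RCU_SOFTIRQ, NR_SOFTIRQS };

/* Walks a pending mask the way __do_softirq() does and records the vector
 * numbers in handling order. Returns how many vectors were visited. */
static int dispatch_order(unsigned int pending, int *order, int max)
{
    int base = 0;      /* plays the role of h - softirq_vec */
    int n = 0;
    int softirq_bit;

    while ((softirq_bit = ffs((int)pending)) && n < max) {
        int vec_nr = base + softirq_bit - 1;   /* h += softirq_bit - 1 */
        order[n++] = vec_nr;                   /* h->action(h) would run here */
        base = vec_nr + 1;                     /* h++ */
        pending >>= softirq_bit;               /* drop the bits already handled */
    }
    return n;
}

/* Returns the k-th vector handled for this mask, or -1 if there is none. */
static int nth_handled(unsigned int pending, int k)
{
    int order[NR_SOFTIRQS];
    int n = dispatch_order(pending, order, NR_SOFTIRQS);

    return (k >= 0 && k < n) ? order[k] : -1;
}
```

Take TIMER_SOFTIRQ and NET_RX_SOFTIRQ pending (mask 0b1010). The first `ffs()` returns 2, so vector 1 (timer) runs. The mask is then shifted right by 2 to 0b10. The next `ffs()` again returns 2, which gives vector 3 (net RX) relative to the advanced base.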

Running softirqs

The following figure shows how a softirq is requested and executed.

When a device's interrupt handler wants work to be processed as a softirq, it calls raise_softirq() to set the softirq pending flag.
irq_exit() is called when leaving irq context. It checks whether any softirqs are pending and, if so, calls invoke_softirq().

Requesting softirq execution

raise_softirq()

kernel/softirq.c

void raise_softirq(unsigned int nr)
{
        unsigned long flags;

        local_irq_save(flags);
        raise_softirq_irqoff(nr);
        local_irq_restore(flags);
}

Requests softirq service @nr by setting the pending bit for softirq @nr, with local interrupts disabled around the operation.

raise_softirq_irqoff()

kernel/softirq.c

/*
 * This function must run with irqs disabled!
 */

inline void raise_softirq_irqoff(unsigned int nr)
{
        __raise_softirq_irqoff(nr);

        /*
         * If we're in an interrupt or softirq, we're done
         * (this also catches softirq-disabled code). We will
         * actually run the softirq once we return from
         * the irq or softirq.
         *
         * Otherwise we wake up ksoftirqd to make sure we
         * schedule the softirq soon.
         */
        if (!in_interrupt())
                wakeup_softirqd();
}

Requests the softirq service with the given number.

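Line 3 sets the pending bit for the requested softirq number.
Lines 14-15 wake the ksoftirqd thread only when not in interrupt context. in_interrupt() is also true in softirq context and in bh-disabled sections.
- As soon as it wakes up, ksoftirqd checks the pending flags and runs the softirqs itself.
- When called from irq context, the softirq runs shortly afterwards instead. irq_exit(), executed at the end of irq context, calls invoke_softirq() and processes the softirq while still in irq context.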
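Below is a user-space model of the decision in `raise_softirq_irqoff()` and `wakeup_softirqd()`. The `cpu_model` struct and the `model_*`/`raise_demo` functions are hypothetical. ksoftirqd itself is not simulated, so the pending bits are never cleared here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state for the model. */
struct cpu_model {
    unsigned int softirq_pending;  /* the per-CPU pending bitmask */
    bool in_interrupt;             /* hardirq, softirq or bh-disabled section */
    bool ksoftirqd_running;        /* ksoftirqd->state == TASK_RUNNING */
    int wakeups;                   /* wake_up_process() calls issued */
};

/* Like wakeup_softirqd(): only wake the thread if it is not running yet. */
static void model_wakeup_softirqd(struct cpu_model *c)
{
    if (!c->ksoftirqd_running) {
        c->ksoftirqd_running = true;
        c->wakeups++;
    }
}

/* Like raise_softirq_irqoff(): set the bit, and wake ksoftirqd only when no
 * interrupt or softirq exit path is going to run the softirq for us. */
static void model_raise(struct cpu_model *c, unsigned int nr)
{
    c->softirq_pending |= 1u << nr;   /* or_softirq_pending(1UL << nr) */
    if (!c->in_interrupt)
        model_wakeup_softirqd(c);
}

/* Raises TIMER_SOFTIRQ (1) and NET_RX_SOFTIRQ (3) `raises` times each. */
static struct cpu_model raise_demo(bool in_interrupt, int raises)
{
    struct cpu_model c = { 0, in_interrupt, false, 0 };

    for (int i = 0; i < raises; i++) {
        model_raise(&c, 1);
        model_raise(&c, 3);
    }
    return c;
}
```

In interrupt context, repeated raises only set bits: `raise_demo(true, 3)` leaves mask 0xA and no wakeups, since irq_exit() will run the work. Outside interrupt context, the first raise wakes ksoftirqd and later raises find it already running, so only one wakeup is issued.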

__raise_softirq_irqoff()

kernel/softirq.c

void __raise_softirq_irqoff(unsigned int nr)
{
        trace_softirq_raise(nr);
        or_softirq_pending(1UL << nr);
}

Sets the pending bit for the requested softirq number.

or_softirq_pending

include/linux/interrupt.h

#define or_softirq_pending(x)  (__this_cpu_or(local_softirq_pending_ref, (x)))

Invoking softirqs

invoke_softirq()

kernel/softirq.c

static inline void invoke_softirq(void)
{
        if (ksoftirqd_running(local_softirq_pending()))
                return;

        if (!force_irqthreads) {
#ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
                /*
                 * We can safely execute softirq on the current stack if
                 * it is the irq stack, because it should be near empty
                 * at this stage.
                 */
                __do_softirq();
#else
                /*
                 * Otherwise, irq_exit() is called on the task stack that can
                 * be potentially deep already. So call softirq in its own stack
                 * to prevent from any overrun.
                 */
                do_softirq_own_stack();
#endif
        } else {
                wakeup_softirqd();
        }
}

Services the pending softirqs. irq_exit() checks whether any softirqs are pending and calls this function if there are.

All pending softirqs are processed. If processing them takes 2 ms or longer, ksoftirqd, which runs in process context, is woken to do the rest instead.

Lines 3-4 return immediately if ksoftirqd, which runs in process context, is already running. It will process the pending softirqs itself.
Lines 6-24 wake ksoftirqd if the “threadirqs” kernel parameter is set (force_irqthreads), so that all pending softirqs are processed in process context. Otherwise the softirq service is called directly in the current irq context.
- On arm and arm64 systems, the irq stack keeps being used even without the CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK kernel option.
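The branches of `invoke_softirq()` can be written as a pure decision function. This sketch uses hypothetical names. It treats `ksoftirqd_running()` as a plain "is ksoftirqd runnable" flag, which is an assumption: some kernel versions also look at which softirqs are pending. The compile-time `#ifdef` is modeled as a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible outcomes of invoke_softirq(); names are hypothetical. */
enum invoke_action {
    INVOKE_SKIP,            /* ksoftirqd already running: let it do the work */
    INVOKE_WAKE_KSOFTIRQD,  /* threadirqs: everything goes to ksoftirqd */
    INVOKE_ON_IRQ_STACK,    /* __do_softirq() on the current (irq) stack */
    INVOKE_OWN_STACK,       /* do_softirq_own_stack() */
};

static enum invoke_action model_invoke(bool ksoftirqd_running,
                                       bool force_irqthreads,
                                       bool have_irq_exit_on_irq_stack)
{
    if (ksoftirqd_running)
        return INVOKE_SKIP;
    if (force_irqthreads)
        return INVOKE_WAKE_KSOFTIRQD;
    /* In the kernel this is #ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK. */
    return have_irq_exit_on_irq_stack ? INVOKE_ON_IRQ_STACK : INVOKE_OWN_STACK;
}
```

On arm and arm64 the `INVOKE_OWN_STACK` outcome still ends up on the irq stack. Their `do_softirq_own_stack()` is the generic fallback, which just calls `__do_softirq()`.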

do_softirq_own_stack()

include/linux/interrupt.h

#ifdef __ARCH_HAS_DO_SOFTIRQ
void do_softirq_own_stack(void);
#else
static inline void do_softirq_own_stack(void)
{
        __do_softirq();
}
#endif

arm and arm64 do not define __ARCH_HAS_DO_SOFTIRQ, so they have no dedicated softirq stack. Softirqs are processed on the irq stack already in use in the current irq context, without switching stacks.

Waking ksoftirqd

wakeup_softirqd()

kernel/softirq.c

/*
 * we cannot loop indefinitely here to avoid userspace starvation,
 * but we also don't want to introduce a worst case 1/HZ latency
 * to the pending events, so lets the scheduler to balance
 * the softirq load for us.
 */

static void wakeup_softirqd(void)
{
        /* Interrupts are disabled: no need to stop preemption */
        struct task_struct *tsk = __this_cpu_read(ksoftirqd);

        if (tsk && tsk->state != TASK_RUNNING)
                wake_up_process(tsk);
}

Wakes the ksoftirqd thread if it is not already running.

Softirq initialization

softirq_init()

kernel/softirq.c

void __init softirq_init(void)
{
        int cpu;

        for_each_possible_cpu(cpu) {
                per_cpu(tasklet_vec, cpu).tail =
                        &per_cpu(tasklet_vec, cpu).head;
                per_cpu(tasklet_hi_vec, cpu).tail =
                        &per_cpu(tasklet_hi_vec, cpu).head;
        }

        open_softirq(TASKLET_SOFTIRQ, tasklet_action);
        open_softirq(HI_SOFTIRQ, tasklet_hi_action);
}

Initializes softirq before it is used.

Lines 5-10 loop over all possible CPUs and initialize the tasklet_vec and tasklet_hi_vec lists as empty, with tail pointing at head.
Line 12 installs the handler for TASKLET_SOFTIRQ.
Line 13 installs the handler for HI_SOFTIRQ.

Registering a softirq handler

open_softirq()

kernel/softirq.c

void open_softirq(int nr, void (*action)(struct softirq_action *))
{
        softirq_vec[nr].action = action;
}

Stores the action handler function in the entry for the requested softirq vector number.
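Registration is just a store into a fixed table indexed by softirq number, and dispatch is an indirect call. The user-space sketch below mirrors `open_softirq()` and the `h->action(h)` call. The `MODEL_*` constants, `open_softirq_model()`, `tasklet_action_model()`, `run_vector()` and `register_and_run()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_SOFTIRQS 10      /* NR_SOFTIRQS in the enum shown earlier */
#define MODEL_TASKLET_SOFTIRQ 6   /* TASKLET_SOFTIRQ in that enum */

struct softirq_action {
    void (*action)(struct softirq_action *h);
};

static struct softirq_action softirq_vec[MODEL_NR_SOFTIRQS];

/* Same body as the kernel's open_softirq(): store the handler in the table. */
static void open_softirq_model(int nr, void (*action)(struct softirq_action *))
{
    assert(nr >= 0 && nr < MODEL_NR_SOFTIRQS);
    softirq_vec[nr].action = action;
}

static int tasklet_runs;   /* stands in for the real tasklet work */

static void tasklet_action_model(struct softirq_action *h)
{
    assert(h == &softirq_vec[MODEL_TASKLET_SOFTIRQ]);  /* handler gets its own entry */
    tasklet_runs++;
}

/* Dispatches one vector as __do_softirq() does with h->action(h).
 * The NULL check is a model addition; the kernel code shown above has none. */
static int run_vector(int nr)
{
    struct softirq_action *h = &softirq_vec[nr];

    if (h->action == NULL)
        return 0;
    h->action(h);
    return 1;
}

/* Registers the tasklet handler and runs its vector `times` times. */
static int register_and_run(int times)
{
    tasklet_runs = 0;
    open_softirq_model(MODEL_TASKLET_SOFTIRQ, tasklet_action_model);
    for (int i = 0; i < times; i++)
        run_vector(MODEL_TASKLET_SOFTIRQ);
    return tasklet_runs;
}
```

`register_and_run(2)` returns 2. `run_vector(1)` returns 0 because nothing was opened for vector 1 in this model.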

Timer Softirq

run_timer_softirq()

kernel/time/timer.c

/*
 * This function runs timers and the timer-tq in bottom half context.
 */

static void run_timer_softirq(struct softirq_action *h)
{
        struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);

        __run_timers(base);
        if (IS_ENABLED(CONFIG_NO_HZ_COMMON))
                __run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
}

Timer softirq handler.

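Lines 3-5 call the registered callbacks of any expired timers in the lowres timer wheel (BASE_STD).
Lines 6-7 apply when NO_HZ is supported (CONFIG_NO_HZ_COMMON). They also call the callbacks of expired timers in the separate wheel for deferrable timers (BASE_DEF).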
