문c Blog

Timer -9- (Tick Device)

2017-03-08 (updated 2020-02-11) 문영일

Tick Device Subsystem

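The Tick Device Subsystem generates the timer scheduling tick at the period set by CONFIG_HZ and runs the following routines that depend on that tick.

jiffies
- Increments the jiffies value. (The cpu in charge of jiffies does this; that cpu can change in order to support nohz.)
timer
- Scans the lowres timer wheel and calls the functions of expired timers.
Scheduler tick
- Updates the run queue load values and calls (*task_tick) of the scheduler class (cfs, rt, deadline, …) of the currently running task.
rcu
- Handles the rcu core.
process account
- Accounts cpu usage time.

The following figure shows the whole generic timer subsystem.

Tick device operating modes

Two modes are used to generate ticks periodically.

Operating in the hz-based periodic mode
- Works with clock event devices that have either the periodic or the oneshot feature.
- On legacy hardware the low-resolution timer does not support oneshot and generates ticks continuously (periodic). To stay compatible with such hardware, the kernel always starts in this mode at boot.
Operating in the nohz-based oneshot mode
- Only available with clock event devices that have a high-resolution timer with the oneshot feature.
- The tick starts out in periodic mode and, once hrtimer is ready, the tick device is switched to oneshot mode. During this transition the tick handler changes in the following order.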
  - tick_handle_periodic() -> clockevents_handle_noop() -> tick_sched_timer()

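The following figure shows how the tick devices subsystem operates.

Per-cpu Tick Device and Tick Broadcast Device

These are the drivers that manage ticks. Depending on the features of its clock event device, each cpu is attached to either the tick cpu device or the tick broadcast device.

Per-cpu Tick device (global per-cpu variable tick_cpu_device)
- Can be backed by clock event devices that are per-cpu timers, power-saving (c3stop) timers, or dummy timers.
Tick broadcast device (global variable tick_broadcast_device)
- Can only be backed by clock event devices that are not per-cpu timers, power-saving (c3stop) timers, or dummy timers.
- If the tick broadcast mode is oneshot, the clock event device must also support the oneshot feature.
- Some systems do not use the tick broadcast device for their nohz implementation. (rpi2 and rpi3 do not use it either.)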

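c3stop (entering a power-saving state that powers down the core, IC, and timer)

For timers built into the arm architecture and controlled through CP15 registers, a cpu that enters the c3 (deep-sleep) power-saving state powers down its core, IC (interrupt controller), and timer. Once the timer is powered down, the cpu cannot wake itself up through that timer to handle its tick, so it needs help from another cpu that is still awake. The built-in arm architecture timer is treated as c3stop by default, so when cpus enter c3 (deep-sleep), at least one cpu has to keep handling ticks, track the tick expiry times on behalf of the sleeping cpus, and notify them. The tick broadcast device is used to wake up a cpu whose tick has expired so that it can handle the tick.

So far, most armv7 and armv8 systems save power with wfi when an individual core is idle, and enter the deep-sleep state when all cores of a cluster are idle. As with suspend, power to the cluster is cut, and depending on the system the cluster's timer and interrupt controller may be powered off as well.
Even a CP15-controlled timer built into the arm architecture stays powered on at all times on systems that were not designed with this kind of power saving. In that case c3stop is set to false (“always-on”), and the scheduling tick is managed with the tick cpu device alone, without needing the tick broadcast device.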

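Timers that use the c3stop feature

x86 LAPIC timer
ARM built-in generic timer
- arm_arch_timer
Others
- mips-gic-timer
- clps711x-timer

c3stop must not be used on SoCs that lack a power management system for power saving. On such systems, if the timer is powered off and back on to save power, the contents of the timer's comparison register are lost. For this reason the “always-on;” property is added to the timer node in the device tree source, and the Linux kernel then disables the c3stop feature. (The timer is treated as one that is always powered and running.)

rpi2, rpi3: use the “always-on;” property so that the broadcast feature is not used.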

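Tick broadcast

When implementing nohz-idle, which stops generating ticks to save power, systems whose timer is powered off together with the cpu when it enters c3 (deep-sleep) need the following broadcast implementation.

Before a cpu with nothing to do enters idle, it sets its own bit so that it receives broadcasts.
- If the broadcast device's mode is periodic, the cpu's local tick device is shut down.
- Even when every cpu has nothing to do, at least one representative cpu must keep running on its tick device instead of the broadcast device.
The representative cpu broadcasts to wake up cpus in the c3 (deep-sleep) state.
- A cpu in c3 is woken by broadcasting through an IPI (Inter-Processor Interrupt).
A cpu that has woken up no longer needs to receive broadcasts, so it clears its bit.
- If the broadcast device's mode is periodic, the cpu's local tick device is programmed again.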
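
The bit handling described above can be sketched in user space as follows. This is only a minimal model, not kernel code: a plain 32-bit word stands in for the kernel cpumask, and the names bc_mask, MODEL_NR_CPUS, and the model_* functions are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8        /* hypothetical cpu count for this model */

/* bit n set: cpu n sleeps deeply and relies on a broadcast to wake up */
static uint32_t bc_mask;

/* cpu is about to enter deep idle: register for broadcast wake-ups */
static void model_broadcast_enter(int cpu)
{
        assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
        bc_mask |= UINT32_C(1) << cpu;
}

/* cpu has woken up: its local timer works again, so drop the bit */
static void model_broadcast_exit(int cpu)
{
        assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
        bc_mask &= ~(UINT32_C(1) << cpu);
}

/* representative cpu: the set of cpus that would receive the IPI */
static uint32_t model_broadcast_targets(int self)
{
        assert(self >= 0 && self < MODEL_NR_CPUS);
        return bc_mask & ~(UINT32_C(1) << self);
}
```

In this model the representative cpu never targets itself, which mirrors the rule that at least one cpu keeps running on its own tick device.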

C-State

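Power-saving states defined by Intel (reference: Everything You Need to Know About the CPU C-States Power Saving Modes)

C0
- Executes instructions at full power.
C1
- Halt state with the internal core clock stopped
- The external clock and ACPI are still running, so interrupts can be handled
C2
- Stop-Grant & Stop-Clock state with the internal and external clocks stopped
- ACPI is still running, so interrupts can be handled
C3
- Caches are flushed. Deep sleep state in which neither the internal nor the external clock is accepted
- Interrupts through ACPI are not accepted either; the cpu can only be woken through a dedicated wake-up device.
- The Linux clock event feature flag CLOCK_EVT_FEAT_C3STOP (c3stop) is named after this state: it marks a timer that stops in C3.
C4/C5
- On Intel Core Duo processors, all cores of the multi-core processor are in deep sleep
C6
- On Intel Core i7, all cores are in an even deeper sleep state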

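The following figure shows how the tick handler is first set.

The timer built into the arm architecture starts in tick periodic mode at boot, so the tick_handle_periodic() handler is selected.

The following figure shows that, after tick_handle_periodic() has been called and the high-resolution timer becomes ready, the tick switches to nohz oneshot mode and the hrtimer_interrupt() handler is selected. Inside hrtimer_interrupt(), the scheduling tick timer function tick_sched_timer() is called.

The following figure shows the tick handlers used by the tick cpu device and the tick broadcast device.

The handler used with hz/nohz-based high resolution is hrtimer_interrupt(), but it manages the expiry of all hrtimers, independent of the tick device. Narrowing the view to the tick device alone, the handler that actually processes the tick is tick_sched_timer().

Registering a tick cpu device or tick broadcast device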

tick_check_new_device()

kernel/time/tick-common.c

/*
 * Check, if the new registered device should be used. Called with
 * clockevents_lock held and interrupts disabled.
 */

void tick_check_new_device(struct clock_event_device *newdev)
{
        struct clock_event_device *curdev;
        struct tick_device *td;
        int cpu;

        cpu = smp_processor_id();
        td = &per_cpu(tick_cpu_device, cpu);
        curdev = td->evtdev;
        
        /* cpu local device ? */
        if (!tick_check_percpu(curdev, newdev, cpu))
                goto out_bc;

        /* Preference decision */
        if (!tick_check_preferred(curdev, newdev))
                goto out_bc;

        if (!try_module_get(newdev->owner))
                return;

        /* 
         * Replace the eventually existing device by the new
         * device. If the current device is the broadcast device, do
         * not give it back to the clockevents layer !
         */
        if (tick_is_broadcast_device(curdev)) {
                clockevents_shutdown(curdev);
                curdev = NULL;
        }
        clockevents_exchange_device(curdev, newdev);
        tick_setup_device(td, newdev, cpu, cpumask_of(cpu));
        if (newdev->features & CLOCK_EVT_FEAT_ONESHOT)
                tick_oneshot_notify();
        return;

out_bc:
        /*
         * Can the new device be used as a broadcast device ?
         */
        tick_install_broadcast_device(newdev);
}

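Checks whether the new clock event device should replace the one currently used by this cpu's tick device, and switches to it when it is preferred (for example, because it has a better rating).

On the first call, the requested clock event device is used as the tick device.
Depending on the case, a registered clock event device may instead operate as the tick broadcast device for the nohz implementation.

Code lines 12~13: checks whether the clock event device used by the current cpu's tick device can be replaced with the new device. If not, jumps to the out_bc label.
Code lines 16~17: compares the existing and new devices and jumps to out_bc unless the new device is preferred (oneshot capability first, then rating).
Code lines 19~20: increments the reference count of the new clock event device's module. If that fails, the function returns.
Code lines 27~30: if the existing device is the broadcast device, shuts it down and does not hand it back to the clockevents layer.
Code lines 31~32: switches the tick device to the new clock event device.
Code lines 33~35: if the new device has the oneshot feature, asynchronously notifies the tick sched state that the clock event device has changed, then returns.
Code lines 37~41: the out_bc label. Checks whether the new clock event device can be used as the broadcast device and installs it if so.

The following figure shows how, when several tick event devices exist for each cpu, the best rated device is selected and the rest are kept on the release list.

Tick device condition checks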

tick_check_percpu()

kernel/time/tick-common.c

static bool tick_check_percpu(struct clock_event_device *curdev,
                              struct clock_event_device *newdev, int cpu)
{
        if (!cpumask_test_cpu(cpu, newdev->cpumask))
                return false;
        if (cpumask_equal(newdev->cpumask, cpumask_of(cpu)))
                return true;
        /* Check if irq affinity can be set */
        if (newdev->irq >= 0 && !irq_can_set_affinity(newdev->irq))
                return false;
        /* Prefer an existing cpu local device */
        if (curdev && cpumask_equal(curdev->cpumask, cpumask_of(cpu)))
                return false;
        return true;
}

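Returns whether the new clock event device can be set as the tick device of the given cpu.

Code lines 4~5: returns false if the requested cpu is not among the cpus that the new clock event device supports.
Code lines 6~7: returns true if the new device is registered per-cpu for exactly the requested cpu.
Code lines 9~10: returns false if the new device has an irq whose affinity cannot be set.
- Drivers flagged with CLOCK_EVT_FEAT_DYNIRQ
  - memory-mapped driver for the generic timer built into the armv7 architecture – “arm,armv7-timer-mem”
  - “st,nomadik-mtu” driver
Code lines 12~13: returns false if the current device is already a per-cpu device of the requested cpu.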

tick_check_preferred()

kernel/time/tick-common.c

static bool tick_check_preferred(struct clock_event_device *curdev,
                                 struct clock_event_device *newdev)
{
        /* Prefer oneshot capable device */
        if (!(newdev->features & CLOCK_EVT_FEAT_ONESHOT)) {
                if (curdev && (curdev->features & CLOCK_EVT_FEAT_ONESHOT))
                        return false;
                if (tick_oneshot_mode_active())
                        return false;
        }

        /*
         * Use the higher rated one, but prefer a CPU local device with a lower
         * rating than a non-CPU local device
         */
        return !curdev ||
                newdev->rating > curdev->rating ||
               !cpumask_equal(curdev->cpumask, newdev->cpumask);
}

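Compares the existing tick device with the new device and returns true if the new device is preferred, judged first by operating capability and then by rating. (oneshot -> rating order)

Code lines 5~7: returns false if the existing tick device has the oneshot feature but the new device does not. (oneshot first)
Code lines 8~9: returns false if the tick device already runs in oneshot mode but the new device lacks the oneshot feature. (oneshot first)
Code lines 16~18: returns true if there is no existing device, if the new device has a higher rating, or if the two devices have different cpumasks (so that a cpu-local device can win over a non-local one despite a lower rating).

The following figure shows cases where the new clock event device is not preferred because it would run in an older (periodic) mode than the existing one or has a lower rating.

Success cases:
- No clock event device is registered as the tick device yet
- The new device has a higher rating than the clock event device registered as the tick device, and its feature level is the same or better (oneshot)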
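
The preference rule can be tried out in user space with the sketch below. It mirrors the logic of tick_check_preferred() shown above, but struct model_dev and model_check_preferred() are hypothetical stand-ins: a bool replaces the CLOCK_EVT_FEAT_ONESHOT feature bit, an unsigned bitmask replaces the cpumask, and the oneshot state of the tick device is passed in as a parameter instead of being read from per-cpu data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* hypothetical, reduced view of a clock event device */
struct model_dev {
        bool oneshot;          /* stands in for CLOCK_EVT_FEAT_ONESHOT */
        int rating;
        unsigned int cpumask;  /* one bit per cpu */
};

/* true when cand should replace cur as the tick device */
static bool model_check_preferred(const struct model_dev *cur,
                                  const struct model_dev *cand,
                                  bool oneshot_mode_active)
{
        assert(cand != NULL);

        /* a device without oneshot never replaces oneshot operation */
        if (!cand->oneshot) {
                if (cur && cur->oneshot)
                        return false;
                if (oneshot_mode_active)
                        return false;
        }

        /* no device yet, higher rating, or a different cpumask */
        return !cur || cand->rating > cur->rating ||
               cur->cpumask != cand->cpumask;
}
```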

tick_oneshot_mode_active()

kernel/time/tick-oneshot.c

/**
 * tick_check_oneshot_mode - check whether the system is in oneshot mode
 *
 * returns 1 when either nohz or highres are enabled. otherwise 0.
 */

int tick_oneshot_mode_active(void)
{
        unsigned long flags;
        int ret;

        local_irq_save(flags);
        ret = __this_cpu_read(tick_cpu_device.mode) == TICKDEV_MODE_ONESHOT;
        local_irq_restore(flags);

        return ret;
}

Returns whether the current cpu's tick device is in oneshot mode.

Assigning the tick device

tick_setup_device()

kernel/time/tick-common.c

/*
 * Setup the tick device
 */

static void tick_setup_device(struct tick_device *td,
                              struct clock_event_device *newdev, int cpu,
                              const struct cpumask *cpumask)
{
        void (*handler)(struct clock_event_device *) = NULL;
        ktime_t next_event = 0;

        /*
         * First device setup ?
         */
        if (!td->evtdev) {
                /*
                 * If no cpu took the do_timer update, assign it to
                 * this cpu:
                 */
                if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
                        tick_do_timer_cpu = cpu;

                        tick_next_period = ktime_get();
                        tick_period = NSEC_PER_SEC / HZ;
#ifdef CONFIG_NO_HZ_FULL
                        /*
                         * The boot CPU may be nohz_full, in which case set
                         * tick_do_timer_boot_cpu so the first housekeeping
                         * secondary that comes up will take do_timer from
                         * us.
                         */
                        if (tick_nohz_full_cpu(cpu))
                                tick_do_timer_boot_cpu = cpu;

                } else if (tick_do_timer_boot_cpu != -1 &&
                                                !tick_nohz_full_cpu(cpu)) {
                        tick_take_do_timer_from_boot();
                        tick_do_timer_boot_cpu = -1;
                        WARN_ON(tick_do_timer_cpu != cpu);
#endif
                }

                /*
                 * Startup in periodic mode first.
                 */
                td->mode = TICKDEV_MODE_PERIODIC;
        } else {
                handler = td->evtdev->event_handler;
                next_event = td->evtdev->next_event;
                td->evtdev->event_handler = clockevents_handle_noop;
        }

        td->evtdev = newdev;

        /*
         * When the device is not per cpu, pin the interrupt to the
         * current cpu:
         */
        if (!cpumask_equal(newdev->cpumask, cpumask))
                irq_set_affinity(newdev->irq, cpumask);

        /*
         * When global broadcasting is active, check if the current
         * device is registered as a placeholder for broadcast mode.
         * This allows us to handle this x86 misfeature in a generic
         * way. This function also returns !=0 when we keep the
         * current active broadcast state for this CPU.
         */
        if (tick_device_uses_broadcast(newdev, cpu))
                return;

        if (td->mode == TICKDEV_MODE_PERIODIC)
                tick_setup_periodic(newdev, 0);
        else
                tick_setup_oneshot(newdev, handler, next_event);
}

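Sets the requested clock event device as the tick device of the current cpu. When the tick device is set for the first time on this cpu, the tick device starts in periodic mode. (It is not switched to oneshot mode until hrtimer has been initialized and is in use.)

Code lines 11~42: if no clock event device has been set for this cpu's tick device yet, sets the mode to periodic. In addition, the very first time a tick device is set on any cpu, the tick period and related values are initialized. (tick_do_timer_cpu, tick_next_period, tick_period)
- tick_do_timer_cpu defaults to TICK_DO_TIMER_BOOT, which means that no cpu has been assigned yet.
Code lines 43~47: if a clock event device is already set and a new one is requested, saves the handler and next event of the existing device so they can be reused, then points the old device's event handler at an empty function.
- If the old event handler fires while the clock event device of this cpu's tick device is being exchanged, it simply does nothing.
Code line 49: attaches the clock event device to the tick device.
Code lines 55~56: if the new clock event device is not cpu-local (per-cpu), sets the irq affinity so that the current cpu handles the new tick device's irq.
- Only clock event devices with the CLOCK_EVT_FEAT_DYNIRQ feature can have their irq routed to the requesting cpu.
- The broadcast interrupt is not pinned to a particular cpu; a cpu can be chosen with irq_set_affinity(). The following drivers use this.
  - memory-mapped driver for the generic timer built into the armv7 or armv8 architecture – “arm,armv7-timer-mem”
  - “st,nomadik-mtu”
Code lines 65~66: if the new clock event device is to be driven by the broadcast device, setup is complete, so the function returns.
Code lines 68~71: sets up the tick device in periodic or oneshot mode according to the tick device's mode.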

tick_nohz_full_cpu()

include/linux/tick.h

static inline bool tick_nohz_full_cpu(int cpu)
{
        if (!tick_nohz_full_enabled())
                return false;

        return cpumask_test_cpu(cpu, tick_nohz_full_mask);
}

Returns whether the given cpu runs in nohz full mode.

tick_nohz_full_enabled()

include/linux/tick.h

static inline bool tick_nohz_full_enabled(void)
{
        if (!context_tracking_is_enabled())
                return false;

        return tick_nohz_full_running;
}

Returns whether nohz full is in operation.

Handing a tick device over to the broadcast device

tick_device_uses_broadcast()

kernel/time/tick-broadcast.c

/*
 * Check, if the device is disfunctional and a place holder, which
 * needs to be handled by the broadcast device.
 */

int tick_device_uses_broadcast(struct clock_event_device *dev, int cpu)
{
        struct clock_event_device *bc = tick_broadcast_device.evtdev;
        unsigned long flags;
        int ret = 0;

        raw_spin_lock_irqsave(&tick_broadcast_lock, flags);

        /*
         * Devices might be registered with both periodic and oneshot
         * mode disabled. This signals, that the device needs to be
         * operated from the broadcast device and is a placeholder for
         * the cpu local device.
         */
        if (!tick_device_is_functional(dev)) {
                dev->event_handler = tick_handle_periodic;
                tick_device_setup_broadcast_func(dev);
                cpumask_set_cpu(cpu, tick_broadcast_mask);
                if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
                        tick_broadcast_start_periodic(bc);
                else
                        tick_broadcast_setup_oneshot(bc);
                ret = 1;
        } else {
                /*
                 * Clear the broadcast bit for this cpu if the
                 * device is not power state affected.
                 */
                if (!(dev->features & CLOCK_EVT_FEAT_C3STOP))
                        cpumask_clear_cpu(cpu, tick_broadcast_mask);
                else
                        tick_device_setup_broadcast_func(dev);

                /*
                 * Clear the broadcast bit if the CPU is not in
                 * periodic broadcast on state.
                 */
                if (!cpumask_test_cpu(cpu, tick_broadcast_on))
                        cpumask_clear_cpu(cpu, tick_broadcast_mask);

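Depending on the state of the tick device, decides as follows whether it should be driven by the broadcast device.

For a dummy tick device, the current cpu is served by the broadcast device. Result = 1
If the broadcast device is in oneshot mode, the current cpu is removed from the broadcast. Result = 0
In periodic mode, the broadcast device is shut down if no cpu is left to broadcast to. Result = whether the current cpu is included in the broadcast

Code lines 15~23: if the clock event device is a dummy without periodic or oneshot capability, sets its handler function and broadcast function, and adds the current cpu to tick_broadcast_mask.
- The handler is set so that it is still called correctly even though the tick device is a dummy.
- The broadcast device stands in for the dummy device and 1 is returned. If the tick broadcast device's mode is periodic it is started in periodic mode; otherwise it is set up in oneshot mode.
Code lines 24~32: if the device lacks the power-saving c3stop feature, removes the requested cpu from tick_broadcast_mask. If it has c3stop, sets the broadcast function on the device.
- Without c3stop the timer stays powered and keeps running.
Code lines 38~39: if the current cpu is not in tick_broadcast_on (used for periodic broadcast), clears the current cpu from tick_broadcast_mask.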

                switch (tick_broadcast_device.mode) {
                case TICKDEV_MODE_ONESHOT:
                        /*
                         * If the system is in oneshot mode we can
                         * unconditionally clear the oneshot mask bit,
                         * because the CPU is running and therefore
                         * not in an idle state which causes the power
                         * state affected device to stop. Let the
                         * caller initialize the device.
                         */
                        tick_broadcast_clear_oneshot(cpu);
                        ret = 0;
                        break;

                case TICKDEV_MODE_PERIODIC:
                        /*
                         * If the system is in periodic mode, check
                         * whether the broadcast device can be
                         * switched off now.
                         */
                        if (cpumask_empty(tick_broadcast_mask) && bc)
                                clockevents_shutdown(bc);
                        /*
                         * If we kept the cpu in the broadcast mask,
                         * tell the caller to leave the per cpu device
                         * in shutdown state. The periodic interrupt
                         * is delivered by the broadcast device, if
                         * the broadcast device exists and is not
                         * hrtimer based.
                         */
                        if (bc && !(bc->features & CLOCK_EVT_FEAT_HRTIMER))
                                ret = cpumask_test_cpu(cpu, tick_broadcast_mask);
                        break;
                default:
                        break;
                }
        }
        raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
        return ret;
}

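Code lines 1~13: if the tick broadcast device operates in oneshot mode, removes the requested cpu from the broadcast targets and returns 0.
- Removes the requested cpu from tick_broadcast_oneshot_mask and tick_broadcast_pending_mask.
Code lines 15~33: if the tick broadcast device operates in periodic mode, returns whether the requested cpu is among the broadcast targets. If there are no broadcast targets left, the device is shut down.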

tick_device_is_functional()

kernel/time/tick-internal.h

/*
 * Check, if the device is functional or a dummy for broadcast
 */
static inline int tick_device_is_functional(struct clock_event_device *dev)
{
        return !(dev->features & CLOCK_EVT_FEAT_DUMMY);
}

Returns true if the clock event device is a functional device, that is, one without the CLOCK_EVT_FEAT_DUMMY flag.

A dummy device has neither the periodic nor the oneshot feature.

tick_device_setup_broadcast_func()

kernel/time/tick-broadcast.c

static void tick_device_setup_broadcast_func(struct clock_event_device *dev)
{
        if (!dev->broadcast)
                dev->broadcast = tick_broadcast;
        if (!dev->broadcast) {
                pr_warn_once("%s depends on broadcast, but no broadcast function available\n",
                             dev->name);
                dev->broadcast = err_broadcast;
        }
}

If no broadcast function is set on the tick device, sets one.

The tick broadcast function raises an IPI (Inter-Processor Interrupt) on the target cpus.

tick_broadcast()

arch/arm/kernel/smp.c

#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
void tick_broadcast(const struct cpumask *mask)
{
        smp_cross_call(mask, IPI_TIMER);
}
#endif

Wakes the target cpus with an IPI (Inter-Processor Interrupt) and asks them to handle IPI_TIMER.

tick_broadcast_start_periodic()

kernel/time/tick-broadcast.c

/*
 * Start the device in periodic mode
 */
static void tick_broadcast_start_periodic(struct clock_event_device *bc)
{
        if (bc)
                tick_setup_periodic(bc, 1);
}

Uses the requested clock event device as a broadcast device in periodic mode to generate ticks.

Setting the tick device mode

Setting the tick device to periodic mode

tick_setup_periodic()

kernel/time/tick-common.c

/*
 * Setup the device for a periodic tick
 */

void tick_setup_periodic(struct clock_event_device *dev, int broadcast)
{
        tick_set_periodic_handler(dev, broadcast);

        /* Broadcast setup ? */
        if (!tick_device_is_functional(dev))
                return;

        if ((dev->features & CLOCK_EVT_FEAT_PERIODIC) &&
            !tick_broadcast_oneshot_active()) {
                clockevents_set_mode(dev, CLOCK_EVT_MODE_PERIODIC);
        } else {
                unsigned long seq;
                ktime_t next; 

                do {
                        seq = read_seqbegin(&jiffies_lock);
                        next = tick_next_period;
                } while (read_seqretry(&jiffies_lock, seq));

                clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);

                for (;;) {
                        if (!clockevents_program_event(dev, next, false))
                                return;
                        next = ktime_add(next, tick_period);
                }
        }
}

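Uses the requested clock event device as a tick device in periodic mode to generate ticks. When called with broadcast=1, the broadcast handler is installed.

Code line 3: sets the periodic handler function. If @broadcast is requested, the periodic handler for broadcast is set instead.
Code lines 6~7: if the clock event device is a dummy device, it is driven by the broadcast device, so the function returns.
Code lines 9~11: if the clock event device has the periodic feature and the tick broadcast device is not in oneshot mode, sets the clock event device to periodic mode.
Code lines 12~21: otherwise reads the next tick expiry (tick_next_period) under the jiffies_lock sequence lock and sets the clock event device to oneshot mode.
Code lines 23~27: programs the clock event device and returns on success. On failure, adds one tick to the expiry and retries in a loop until it succeeds.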
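
The retry loop at the end of the function can be modelled in user space as below. This is a sketch under assumptions: model_program_event() is a hypothetical stand-in for clockevents_program_event() that fails whenever the requested expiry is not in the future, and time is passed in as plain nanosecond values instead of ktime_t.

```c
#include <assert.h>
#include <stdint.h>

/* hypothetical stand-in: programming fails for an expiry that is not
 * strictly in the future */
static int model_program_event(int64_t expires_ns, int64_t now_ns)
{
        return expires_ns > now_ns ? 0 : -1;
}

/* emulate a periodic tick on a oneshot device: keep adding one period
 * until the event can be programmed; returns the programmed expiry */
static int64_t model_arm_next_tick(int64_t next_ns, int64_t period_ns,
                                   int64_t now_ns)
{
        assert(period_ns > 0);

        for (;;) {
                if (model_program_event(next_ns, now_ns) == 0)
                        return next_ns;
                next_ns += period_ns;
        }
}
```

With a 4 ms period (HZ=250), an expiry that already lies 9 ms in the past is pushed forward by three periods before it can be programmed.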

tick_set_periodic_handler()

kernel/time/tick-broadcast.c

/*
 * Set the periodic handler depending on broadcast on/off
 */

void tick_set_periodic_handler(struct clock_event_device *dev, int broadcast)
{
        if (!broadcast)
                dev->event_handler = tick_handle_periodic;
        else
                dev->event_handler = tick_handle_periodic_broadcast;
}

Sets the periodic handler function. If @broadcast is requested, the periodic handler for broadcast is set instead.

Setting the tick device to oneshot mode

tick_setup_oneshot()

kernel/time/tick-oneshot.c

/**
 * tick_setup_oneshot - setup the event device for oneshot mode (hres or nohz)
 */

void tick_setup_oneshot(struct clock_event_device *newdev,
                        void (*handler)(struct clock_event_device *),
                        ktime_t next_event)
{
        newdev->event_handler = handler;
        clockevents_set_mode(newdev, CLOCK_EVT_MODE_ONESHOT);
        clockevents_program_event(newdev, next_event, true);
}

Assigns the handler, sets the clock event device to oneshot mode, and then programs the event.

tick_is_broadcast_device()

kernel/time/tick-broadcast.c

/*
 * Check, if the device is the broadcast device
 */

int tick_is_broadcast_device(struct clock_event_device *dev)
{
        return (dev && tick_broadcast_device.evtdev == dev);
}

Returns whether the requested clock event device is the broadcast device.

Tick device event handlers

Tick device handler -1- (hz based, low-resolution timer)

tick_handle_periodic()

kernel/time/tick-common.c

/*
 * Event handler for periodic ticks
 */

void tick_handle_periodic(struct clock_event_device *dev)
{
        int cpu = smp_processor_id();
        ktime_t next = dev->next_event;

        tick_periodic(cpu);
#if defined(CONFIG_HIGH_RES_TIMERS) || defined(CONFIG_NO_HZ_COMMON)
        /*
         * The cpu might have transitioned to HIGHRES or NOHZ mode via
         * update_process_times() -> run_local_timers() ->
         * hrtimer_run_queues().
         */
        if (dev->event_handler != tick_handle_periodic)
                return;
#endif
        if (dev->mode != CLOCK_EVT_MODE_ONESHOT)
                return;
        for (;;) {
                /*
                 * Setup the next period for devices, which do not have
                 * periodic mode:
                 */
                next = ktime_add(next, tick_period);

                if (!clockevents_program_event(dev, next, false))
                        return;
                /*
                 * Have to be careful here. If we're in oneshot mode,
                 * before we call tick_periodic() in a loop, we need
                 * to be sure we're using a real hardware clocksource.
                 * Otherwise we could get trapped in an infinite
                 * loop, as the tick_periodic() increments jiffies,
                 * which then will increment time, possibly causing
                 * the loop to trigger again and again.
                 */
                if (timekeeping_valid_for_hres())
                        tick_periodic(cpu);
        }
}

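When the tick device runs in periodic mode and its interrupt fires, this function is called and in turn calls tick_periodic(). For a clock event device that emulates the periodic tick in oneshot mode, it also programs the next tick.

Code line 6: handles the per-tick work (scheduling, process accounting, rcu, irq work, posix cpu timers, and so on). In addition, if the current cpu is the one in charge of calling do_timer(), increments jiffies by 1 and updates wall time.
Code lines 13~14: if the handler is no longer the periodic handler (the cpu has switched to highres or nohz mode), returns.
Code lines 16~17: if the clock event device is not in oneshot mode, nothing more needs to be done, so the handler returns.
- When the hardware itself runs in periodic mode, the interrupt fires on every tick by itself, so tick_periodic() is all that is needed and no reprogramming takes place.
Code lines 18~38: adds one tick period to the expiry and programs the oneshot event. If programming fails and the timekeeping clock source has the CLOCK_SOURCE_VALID_FOR_HRES flag, runs tick_periodic() once more to increment jiffies by 1 and update wall time, then repeats.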

tick_periodic()

kernel/time/tick-common.c

/*
 * Periodic tick
 */

static void tick_periodic(int cpu)
{
        if (tick_do_timer_cpu == cpu) {
                write_seqlock(&jiffies_lock);

                /* Keep track of the next tick event */
                tick_next_period = ktime_add(tick_next_period, tick_period);

                do_timer(1);
                write_sequnlock(&jiffies_lock);
                update_wall_time();
        }                          

        update_process_times(user_mode(get_irq_regs()));
        profile_tick(CPU_PROFILING);
}

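Handles the per-tick work (scheduling, process accounting, rcu, irq work, posix cpu timers, and so on). In addition, if the current cpu is the one in charge of calling do_timer(), increments jiffies by 1, calculates the global load, and updates wall time.

Code lines 3~10: if the current cpu is in charge of calling do_timer(), updates the next tick time, increments jiffies by 1, and calculates the global load.
- On 32-bit arm, jiffies is backed by the 64-bit jiffies_64, which cannot be updated in a single access, so it must always be accessed under the sequence lock.
Code line 11: updates wall time.
Code line 14: updates process times and runs the other per-tick routines through update_process_times().
Code line 15: records cpu profile information.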
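
Why a 64-bit counter needs a sequence lock on a 32-bit machine can be illustrated with the following user-space sketch. It is not the kernel's seqlock implementation: it uses C11 atomics with a single writer, stores the counter as two 32-bit halves to imitate a 32-bit cpu, and all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static atomic_uint model_seq;        /* even: stable, odd: write in progress */
static _Atomic uint32_t model_lo;    /* low half of the 64-bit counter */
static _Atomic uint32_t model_hi;    /* high half of the 64-bit counter */

/* single writer: bump the sequence before and after the two-step update */
static void model_jiffies_add(uint32_t ticks)
{
        uint64_t v = ((uint64_t)atomic_load(&model_hi) << 32) |
                     atomic_load(&model_lo);

        v += ticks;
        atomic_fetch_add(&model_seq, 1);             /* now odd */
        atomic_store(&model_lo, (uint32_t)v);
        atomic_store(&model_hi, (uint32_t)(v >> 32));
        atomic_fetch_add(&model_seq, 1);             /* even again */
}

/* reader: retry while a write is in progress or happened in between */
static uint64_t model_jiffies_read(void)
{
        unsigned int s;
        uint64_t v;

        do {
                s = atomic_load(&model_seq);
                v = ((uint64_t)atomic_load(&model_hi) << 32) |
                    atomic_load(&model_lo);
        } while ((s & 1u) || s != atomic_load(&model_seq));

        assert((s & 1u) == 0);   /* value was read from a stable state */
        return v;
}
```

Without the sequence check, a reader could combine the new low half with the old high half when the low half wraps around.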

do_timer()

kernel/time/timekeeping.c

/*              
 * Must hold jiffies_lock
 */

void do_timer(unsigned long ticks)
{
        jiffies_64 += ticks;
        calc_global_load(ticks);
}

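Increments jiffies by the requested number of ticks and calculates the global load.

Global load calculation
- 10 ticks after the cpus have updated their active task counts for the sampling window, updates the 1min, 5min, and 15min load averages in avenrun[].
- References:
  - Scheduler -2- (cpu load & PELT) | 문c
  - Linux kernel load accounting | HAPPY HACKING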
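
The load average update is a fixed-point exponential moving average. The sketch below shows one update step in user space. The constants and the formula follow the kernel's load average code as I recall it and are not quoted in this article, so treat them as assumptions; model_calc_load() is a hypothetical name.

```c
#include <assert.h>

#define FSHIFT   11                 /* bits of fractional precision */
#define FIXED_1  (1UL << FSHIFT)    /* 1.0 in fixed point (2048) */
#define EXP_1    1884UL             /* decay factor for the 1min average */
#define EXP_5    2014UL             /* decay factor for the 5min average */
#define EXP_15   2037UL             /* decay factor for the 15min average */

/* one update step: load and active are fixed-point values,
 * e.g. one runnable task is FIXED_1 */
static unsigned long model_calc_load(unsigned long load, unsigned long exp,
                                     unsigned long active)
{
        unsigned long newload;

        assert(exp < FIXED_1);

        newload = load * exp + active * (FIXED_1 - exp);
        if (active >= load)
                newload += FIXED_1 - 1;   /* round up while the load rises */

        return newload / FIXED_1;
}
```

Starting from an idle system, one permanently runnable task moves the 1min value from 0 to 164/2048 (about 0.08) in the first step, and a fully loaded value of 2048 decays to 1884 after one idle step.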

profile_tick()

kernel/profile.c

void profile_tick(int type)
{
        struct pt_regs *regs = get_irq_regs();

        if (!user_mode(regs) && prof_cpu_mask != NULL &&
            cpumask_test_cpu(smp_processor_id(), prof_cpu_mask))
                profile_hit(type, (void *)profile_pc(regs));
}

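Records cpu profile information.

Code line 3: gets the pointer to the registers (pt_regs) that were saved when the current cpu took the interrupt.
Code lines 5~7: if the interrupt occurred in kernel mode and prof_cpu_mask includes the current cpu, records cpu profile information.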

profile_hit()

include/linux/profile.h

/*
 * Single profiler hit:
 */

static inline void profile_hit(int type, void *ip)
{
        /*
         * Speedup for the common (no profiling enabled) case:
         */
        if (unlikely(prof_on == type))
                profile_hits(type, ip, 1);
}

If the kernel has been set up to profile the requested type, records a profile hit for the ip address.

If the requested type is not the type being profiled, or no profile buffer is set up, nothing is recorded.
- One of cpu, sched, sleep, or kvm profiling can be selected.
- “profile=[schedule,]<number>”
  - Profiles scheduling points
  - <number>
    - step/bucket size given as a power of two, for statistical time-based profiling
- “profile=sleep”
  - Profiles D-state sleeping (ms)
- “profile=kvm”
  - Profiles VM exits
References:
- Profiling the kernel
- kerneltop | LWN.net
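
How the <number> value turns an instruction address into a histogram bucket can be sketched as follows. The article does not show profile_hits(), so this is an assumption based on my recollection of kernel/profile.c: the offset of the address from the start of kernel text is shifted right by the step size and clamped to the last bucket. MODEL_PROF_LEN, model_prof_buffer, and model_profile_hit() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PROF_LEN 16   /* hypothetical number of buckets */

static unsigned int model_prof_buffer[MODEL_PROF_LEN];

/* count one hit for ip and return the bucket index that was used */
static size_t model_profile_hit(uintptr_t text_start, unsigned int prof_shift,
                                uintptr_t ip)
{
        size_t idx;

        assert(ip >= text_start);

        idx = (size_t)((ip - text_start) >> prof_shift);
        if (idx > MODEL_PROF_LEN - 1)
                idx = MODEL_PROF_LEN - 1;   /* clamp to the last bucket */

        model_prof_buffer[idx]++;
        return idx;
}
```

With a shift of 4, every 16 bytes of text share one bucket, so a larger <number> trades resolution for a smaller buffer.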

update_process_times()

kernel/time/timer.c

/*
 * Called from the timer interrupt handler to charge one tick to the current
 * process.  user_tick is 1 if the tick is user time, 0 for system.
 */

void update_process_times(int user_tick)
{
        struct task_struct *p = current;

        /* Note: this timer irq context must be accounted for as well. */
        account_process_tick(p, user_tick);
        run_local_timers();
        rcu_sched_clock_irq(user_tick);
#ifdef CONFIG_IRQ_WORK
        if (in_irq())
                irq_work_tick();
#endif
        scheduler_tick();
        if (IS_ENABLED(CONFIG_POSIX_TIMERS))
                run_posix_cpu_timers();
}

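Bundles the routines that must run on every tick.

Code line 6: accounts the cpu time consumed, split into user, system, and idle time.
Code line 7: calls the functions of expired hrtimers and timers.
- hrtimers are run here only in periodic mode. In oneshot mode they are handled directly by the separate hrtimer_interrupt() handler.
- timers are processed in softirq context, which is raised here.
Code line 8: handles the rcu core.
Code lines 10~11: processes irq_work items that are queued and waiting.
Code line 13: calls the scheduler tick for the scheduler class of the currently running task.
Code lines 14~15: handles posix cpu timers.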

run_local_timers()

kernel/time/timer.c

/*
 * Called by the local, per-CPU timer interrupt on SMP.
 */

void run_local_timers(void)
{
        struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);

        hrtimer_run_queues();
        /* Raise the softirq only if required. */
        if (time_before(jiffies, base->clk)) {
                if (!IS_ENABLED(CONFIG_NO_HZ_COMMON))
                        return;
                /* CPU is awake, so check the deferrable base. */
                base++;
                if (time_before(jiffies, base->clk))
                        return;
        }
        raise_softirq(TIMER_SOFTIRQ);
}

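Processes expired hrtimers and timers.

Code line 5: when called in periodic mode, runs the functions of expired hrtimers waiting in the hrtimer queues.
Code lines 7~14: if every timer up to the current jiffies has already been handled, returns.
Code line 15: raises the softirq so that the timer wheel is processed.

Tick device handler -2- (nohz based, low-resolution timer)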
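
The time_before() check used above has to keep working when jiffies wraps around. The sketch below shows the idea with 32-bit counters in user space: the unsigned difference is interpreted as a signed value. The model_* names are hypothetical, and the conversion of an out-of-range value to int32_t is implementation-defined in C, although it behaves as two's complement on mainstream compilers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* true when a is earlier than b, even across counter wraparound */
static bool model_time_before(uint32_t a, uint32_t b)
{
        return (int32_t)(a - b) < 0;
}

/* ticks remaining until expiry; expiry must not lie in the past */
static uint32_t model_ticks_left(uint32_t now, uint32_t expiry)
{
        assert(!model_time_before(expiry, now));
        return expiry - now;
}
```

A plain `a < b` comparison gives the wrong answer once the counter has wrapped, which is why the signed-difference form is used.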

tick_nohz_handler()

kernel/time/tick-sched.c

/*
 * The nohz low res interrupt handler
 */

static void tick_nohz_handler(struct clock_event_device *dev)
{
        struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
        struct pt_regs *regs = get_irq_regs();
        ktime_t now = ktime_get();

        dev->next_event.tv64 = KTIME_MAX;

        tick_sched_do_timer(now);
        tick_sched_handle(ts, regs);

        /* No need to reprogram if we are running tickless  */
        if (unlikely(ts->tick_stopped))
                return;

        while (tick_nohz_reprogram(ts, now)) {
                now = ktime_get();
                tick_do_update_jiffies64(now);
        }
}

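The tick interrupt handler for nohz with a low-resolution timer.

Code line 9: updates jiffies and adjusts wall time.
Code line 10: handles process accounting and profiling work.
Code lines 13~14: in the unlikely case that the tick has been stopped, returns.
Code lines 16~19: reprograms the tick, repeating on failure.

Tick device handler -3- (hz/nohz based, high-resolution timer)

When the interrupt programmed for the tick through hrtimer fires, hrtimer_interrupt() is called and control reaches the registered tick sched handler.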

tick_sched_timer()

kernel/time/tick-sched.c

/*      
 * We rearm the timer until we get disabled by the idle code.
 * Called with interrupts disabled.
 */

static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
{               
        struct tick_sched *ts =
                container_of(timer, struct tick_sched, sched_timer);
        struct pt_regs *regs = get_irq_regs();
        ktime_t now = ktime_get();

        tick_sched_do_timer(now);

        /*
         * Do not call, when we are not in irq context and have
         * no valid regs pointer
         */
        if (regs)
                tick_sched_handle(ts, regs);
        else
                ts->next_tick = 0;

        /* No need to reprogram if we are in idle or full dynticks mode */
        if (unlikely(ts->tick_stopped))
                return HRTIMER_NORESTART;

        hrtimer_forward(timer, now, tick_period);

        return HRTIMER_RESTART;
}

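This is the tick handler for the hz/nohz-based high-resolution timer.

In code line 8, updates jiffies and adjusts the wall time.
In code lines 14~17, when called from irq context with a valid regs pointer, handles process accounting and profiling work; otherwise clears next_tick.
In code lines 20~21, if the tick has been stopped (idle or full dynticks), returns HRTIMER_NORESTART.
In code lines 23~25, forwards the timer by one tick period and returns HRTIMER_RESTART so that the tick is programmed again.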
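
The rearm step in code lines 23~25 relies on hrtimer_forward(), which moves the expiry forward by whole intervals until it lies after now. The following is a minimal userspace sketch of that arithmetic, assuming plain nanosecond integers instead of ktime_t. model_forward() is a hypothetical name used for illustration only, not the kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the forwarding arithmetic: advance *expires by
 * whole multiples of interval until it is strictly after now.
 * Returns the number of intervals that were added (overruns).
 */
uint64_t model_forward(int64_t *expires, int64_t now, int64_t interval)
{
	int64_t delta;
	uint64_t overruns = 1;

	assert(expires && interval > 0);

	delta = now - *expires;
	if (delta < 0)
		return 0;	/* the timer has not expired yet */

	if (delta >= interval) {
		/* Skip all fully missed intervals in one step. */
		overruns = (uint64_t)(delta / interval);
		*expires += (int64_t)overruns * interval;
		if (*expires > now)
			return overruns;
		overruns++;	/* one more interval is needed to pass now */
	}
	*expires += interval;
	return overruns;
}
```

With an expiry of 1000 ns, now at 3500 ns and a 1000 ns interval, the expiry ends up at 4000 ns and three intervals are reported. In the tick handler the interval is tick_period, so a late tick is realigned to the tick grid rather than drifting.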

Switching the tick device to oneshot mode

Checking oneshot readiness on every periodic tick

While the tick device runs in periodic mode, every tick interrupt goes through update_process_times() -> run_local_timers() -> hrtimer_run_queues(), which handles the hrtimers (for hardirq) used in periodic mode. Inside hrtimer_run_queues(), tick_check_oneshot_change() checks whether the high-resolution hw timer is ready so that the tick mode can be switched to oneshot, and the switch happens there. Once the switch to oneshot mode is done, hrtimers are processed through a different path.

hrtimer_run_queues()

kernel/time/hrtimer.c

/*
 * Called from run_local_timers in hardirq context every jiffy
 */

void hrtimer_run_queues(void)
{
        struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);
        unsigned long flags;
        ktime_t now;

        if (__hrtimer_hres_active(cpu_base))
                return;

        /*
         * This _is_ ugly: We have to check periodically, whether we
         * can switch to highres and / or nohz mode. The clocksource
         * switch happens with xtime_lock held. Notification from
         * there only sets the check bit in the tick_oneshot code,
         * otherwise we might deadlock vs. xtime_lock.
         */
        if (tick_check_oneshot_change(!hrtimer_is_hres_enabled())) {
                hrtimer_switch_to_hres();
                return;
        }

        raw_spin_lock_irqsave(&cpu_base->lock, flags);
        now = hrtimer_update_base(cpu_base);

        if (!ktime_before(now, cpu_base->softirq_expires_next)) {
                cpu_base->softirq_expires_next = KTIME_MAX;
                cpu_base->softirq_activated = 1;
                raise_softirq_irqoff(HRTIMER_SOFTIRQ);
        }

        __hrtimer_run_queues(cpu_base, now, flags, HRTIMER_ACTIVE_HARD);
        raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
}

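Processes pending hrtimers that have expired. This routine only does work while the tick is periodic.

In code lines 7~8, returns if the switch to oneshot mode has already happened and the hres timer is active.
In code lines 17~20, checks whether a switch to oneshot mode is possible. If the hres timer can be used, switches to the oneshot/nohz mode for the hres timer and returns.
- Two modes are supported when switching to oneshot mode:
  - nohz for the low-resolution timer
  - nohz for the high-resolution timer
In code line 23, which is reached only in periodic mode, reads the current hrtimer time of this cpu.
In code lines 25~29, if an hrtimer for softirq has expired, raises the softirq that handles it.
In code line 31, processes the hrtimers for hardirq.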

hrtimer_hres_active()

kernel/time/hrtimer.c

static inline int hrtimer_hres_active(void)
{
        return __hrtimer_hres_active(this_cpu_ptr(&hrtimer_bases));
}

Returns whether the hrtimer of the current cpu is running in high-resolution mode.

/*
 * Is the high resolution mode active ?
 */

static inline int __hrtimer_hres_active(struct hrtimer_cpu_base *cpu_base) 
{
        return IS_ENABLED(CONFIG_HIGH_RES_TIMERS) ?
                cpu_base->hres_active : 0; 
}

Returns whether the hrtimer of @cpu_base is running in high-resolution mode.

hrtimer_is_hres_enabled()

kernel/time/hrtimer.c

/*
 * hrtimer_high_res_enabled - query, if the highres mode is enabled
 */

static inline int hrtimer_is_hres_enabled(void) 
{
        return hrtimer_hres_enabled;
}

Returns whether high-resolution mode hrtimers are enabled.

It is on by default and can be disabled with the “highres=off” kernel parameter.

tick_check_oneshot_change()

kernel/time/tick-sched.c

/**
 * Check, if a change happened, which makes oneshot possible.
 *
 * Called cyclic from the hrtimer softirq (driven by the timer
 * softirq) allow_nohz signals, that we can switch into low-res nohz
 * mode, because high resolution timers are disabled (either compile
 * or runtime). Called with interrupts disabled.
 */

int tick_check_oneshot_change(int allow_nohz)
{
        struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);

        if (!test_and_clear_bit(0, &ts->check_clocks))
                return 0;

        if (ts->nohz_mode != NOHZ_MODE_INACTIVE)
                return 0;

        if (!timekeeping_valid_for_hres() || !tick_is_oneshot_available())
                return 0;

        if (!allow_nohz)
                return 1;

        tick_nohz_switch_to_nohz();
        return 0;
}

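Checks whether a change happened that makes oneshot mode possible.

In code lines 5~6, tests and clears bit 0 of check_clocks in this cpu's tick_sched. If the bit was not set, returns 0.
In code lines 8~9, returns 0 if nohz mode is already active.
In code lines 11~12, returns 0 if the clock source used for timekeeping does not have the valid_for_hres flag set, or if the tick device cannot operate in oneshot mode.
In code lines 14~15, returns 1 if the argument allow_nohz is 0, so that the caller switches to high-resolution mode.
In code lines 17~18, switches to nohz mode for the low-resolution timer, programs the next sched tick using the hrtimer, and returns 0.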
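
The order of the checks in tick_check_oneshot_change() can be summarized as a small decision function. This is a userspace sketch only: the struct, the enum and model_check_oneshot_change() are hypothetical names, and each boolean stands in for the kernel call named in its comment.

```c
#include <assert.h>
#include <stdbool.h>

enum model_action {
	MODEL_STAY_PERIODIC,		/* return 0, nothing changes */
	MODEL_SWITCH_HIGHRES,		/* return 1, caller switches to hres */
	MODEL_SWITCH_LOWRES_NOHZ	/* tick_nohz_switch_to_nohz(), then return 0 */
};

struct model_state {
	bool check_clocks;	/* bit 0 of ts->check_clocks */
	bool nohz_active;	/* ts->nohz_mode != NOHZ_MODE_INACTIVE */
	bool hres_valid;	/* timekeeping_valid_for_hres() */
	bool oneshot_avail;	/* tick_is_oneshot_available() */
};

enum model_action model_check_oneshot_change(struct model_state *s,
					     bool allow_nohz)
{
	bool pending;

	assert(s);

	/* test_and_clear_bit(): the notification is consumed either way. */
	pending = s->check_clocks;
	s->check_clocks = false;

	if (!pending)
		return MODEL_STAY_PERIODIC;
	if (s->nohz_active)
		return MODEL_STAY_PERIODIC;
	if (!s->hres_valid || !s->oneshot_avail)
		return MODEL_STAY_PERIODIC;
	if (!allow_nohz)
		return MODEL_SWITCH_HIGHRES;
	return MODEL_SWITCH_LOWRES_NOHZ;
}
```

hrtimer_run_queues() passes !hrtimer_is_hres_enabled() as allow_nohz, so the low-resolution nohz path is taken only when high-resolution timers are disabled.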

Switching to oneshot mode and low-resolution nohz

tick_nohz_switch_to_nohz()

kernel/time/tick-sched.c

/**
 * tick_nohz_switch_to_nohz - switch to nohz mode
 */

static void tick_nohz_switch_to_nohz(void)
{
        struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
        ktime_t next;

        if (!tick_nohz_enabled)
                return;

        if (tick_switch_to_oneshot(tick_nohz_handler))
                return;

        /*
         * Recycle the hrtimer in ts, so we can share the
         * hrtimer_forward with the highres code.
         */
        hrtimer_init(&ts->sched_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
        /* Get the next period */
        next = tick_init_jiffy_update();

        hrtimer_set_expires(&ts->sched_timer, next);
        hrtimer_forward_now(&ts->sched_timer, tick_period);
        tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);
        tick_nohz_activate(ts, NOHZ_MODE_LOWRES);
}

Switches to oneshot and nohz mode and programs the next sched tick using the hrtimer.

In code lines 6~7, returns if nohz is not enabled.
In code lines 9~10, switches the tick device to oneshot mode and installs the handler. Returns if the switch fails.
In code lines 16~22, programs a tick one jiffy ahead using the hrtimer.
In code line 23, records in the tick sched that nohz is running in low-resolution timer mode.

tick_switch_to_oneshot()

kernel/time/tick-oneshot.c

/**
 * tick_switch_to_oneshot - switch to oneshot mode
 */

int tick_switch_to_oneshot(void (*handler)(struct clock_event_device *))
{
        struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
        struct clock_event_device *dev = td->evtdev;

        if (!dev || !(dev->features & CLOCK_EVT_FEAT_ONESHOT) ||
                    !tick_device_is_functional(dev)) {

                printk(KERN_INFO "Clockevents: "
                       "could not switch to one-shot mode:");
                if (!dev) {
                        printk(" no tick device\n");
                } else {
                        if (!tick_device_is_functional(dev))
                                printk(" %s is not functional.\n", dev->name);
                        else
                                printk(" %s does not support one-shot mode.\n",
                                       dev->name);
                }
                return -EINVAL;
        }

        td->mode = TICKDEV_MODE_ONESHOT;
        dev->event_handler = handler;
        clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);
        tick_broadcast_switch_to_oneshot();
        return 0;
}

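Switches the tick device to oneshot mode and installs the handler.

This function is called from the following two places.
- From tick_init_highres(), called by hrtimer_switch_to_hres(), when the high-resolution timer is needed.
- From tick_nohz_switch_to_nohz(), to switch to nohz mode for dynamic tick handling.

In code lines 6~21, if the clock_event_device is missing, does not support oneshot mode, or is not functional, prints an error message and returns -EINVAL.
In code lines 23~25, sets the tick device mode to oneshot and assigns the event handler passed as the argument to the device. It also has the driver set oneshot mode through the set_mode() hook of the clock_event_device.
- rpi2: arch_timer_set_mode_virt() is called, but unlike shutdown mode, nothing special is configured for oneshot mode.
In code line 26, switches the tick broadcast device to oneshot mode as well.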

tick_broadcast_switch_to_oneshot()

kernel/time/tick-broadcast.c

/*  
 * Select oneshot operating mode for the broadcast device
 */

void tick_broadcast_switch_to_oneshot(void)
{       
        struct clock_event_device *bc;
        unsigned long flags;
        
        raw_spin_lock_irqsave(&tick_broadcast_lock, flags);

        tick_broadcast_device.mode = TICKDEV_MODE_ONESHOT;
        bc = tick_broadcast_device.evtdev;
        if (bc)
                tick_broadcast_setup_oneshot(bc);

        raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
}

Changes the mode of the tick broadcast device to oneshot and, if the clock event device of the tick broadcast device is ready, sets up the next tick.

Switching to oneshot mode and high-resolution nohz

hrtimer_switch_to_hres()

kernel/time/hrtimer.c

/*
 * Switch to high resolution mode
 */

static void hrtimer_switch_to_hres(void)
{
        struct hrtimer_cpu_base *base = this_cpu_ptr(&hrtimer_bases);

        if (tick_init_highres()) {
                pr_warn("Could not switch to high resolution mode on CPU %u\n",
                        base->cpu);
                return;
        }
        base->hres_active = 1;
        hrtimer_resolution = HIGH_RES_NSEC;

        tick_setup_sched_timer();
        /* "Retrigger" the interrupt to get things going */
        retrigger_next_event(NULL);
}

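Sets the hrtimer to high resolution mode and starts the sched tick timer.

In code lines 5~9, switches the tick device to oneshot mode. If the switch fails, prints a warning and returns.
In code lines 10~11, marks the hrtimer base as high resolution and sets hrtimer_resolution to HIGH_RES_NSEC (1 ns).
In code line 13, sets up the sched tick.
In code line 15, reprograms the hrtimer so that the next event is armed.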

tick_init_highres()

kernel/time/tick-oneshot.c

/**
 * tick_init_highres - switch to high resolution mode
 *
 * Called with interrupts disabled.
 */

int tick_init_highres(void)
{
        return tick_switch_to_oneshot(hrtimer_interrupt);
}

Starts the oneshot timer in high-resolution mode and makes the interrupt call hrtimer_interrupt(), the hrtimer interrupt handler callback.

Setting up the sched tick with the high-resolution hrtimer

tick_setup_sched_timer()

kernel/time/tick-sched.c

/**
 * tick_setup_sched_timer - setup the tick emulation timer
 */

void tick_setup_sched_timer(void)
{
        struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
        ktime_t now = ktime_get();

        /*
         * Emulate tick processing via per-CPU hrtimers:
         */
        hrtimer_init(&ts->sched_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
        ts->sched_timer.function = tick_sched_timer;

        /* Get the next period (per-CPU) */
        hrtimer_set_expires(&ts->sched_timer, tick_init_jiffy_update());

        /* Offset the tick to avert jiffies_lock contention. */
        if (sched_skew_tick) {
                u64 offset = ktime_to_ns(tick_period) >> 1;
                do_div(offset, num_possible_cpus());
                offset *= smp_processor_id();
                hrtimer_add_expires_ns(&ts->sched_timer, offset);
        }

        hrtimer_forward(&ts->sched_timer, now, tick_period);
        hrtimer_start_expires(&ts->sched_timer, HRTIMER_MODE_ABS_PINNED_HARD);
        tick_nohz_activate(ts, NOHZ_MODE_HIGHRES);
}

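Starts the tick sched timer using a high-resolution hrtimer, and also sets nohz mode.

In code lines 9~10, initializes the hrtimer used by the tick sched and assigns its handler.
- tick_sched_timer() designates the cpu that manages jiffies and reprograms the tick every time.
In code line 13, bases the expiry of the tick sched hrtimer on the jiffies update time, so that ticks fall on 1 jiffy boundaries.
In code lines 16~21, to avoid jiffies lock contention, computes a per-cpu offset and adds it to the tick expiry.
In code lines 23~24, forwards the hrtimer and then starts it.
In code line 25, sets nohz mode to highres and marks it active.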
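
The per-cpu offset in code lines 16~21 spreads the cpus evenly over the first half of one tick period. The sketch below reproduces that arithmetic in userspace; model_skew_offset_ns() is a hypothetical helper, and the tick period and cpu count are passed in as plain parameters instead of being read from tick_period and num_possible_cpus().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the sched_skew_tick arithmetic:
 * offset = (tick_period / 2) / num_cpus * cpu
 */
uint64_t model_skew_offset_ns(uint64_t tick_period_ns, unsigned int num_cpus,
			      unsigned int cpu)
{
	uint64_t offset;

	assert(num_cpus > 0 && cpu < num_cpus);

	offset = tick_period_ns >> 1;	/* half a tick period */
	offset /= num_cpus;		/* do_div(offset, num_possible_cpus()) */
	offset *= cpu;			/* smp_processor_id() */
	return offset;
}
```

For example, with HZ=1000 (a 1,000,000 ns tick period) and 4 possible cpus, cpu 0 gets no offset, cpu 1 gets 125,000 ns and cpu 3 gets 375,000 ns.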

Programming the oneshot tick

tick_program_event()

kernel/time/tick-oneshot.c

/**
 * tick_program_event
 */

int tick_program_event(ktime_t expires, int force)
{
        struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);

        return clockevents_program_event(dev, expires, force);
}

Programs the clock event device of the current cpu so that a tick fires at the requested expiry time.

Tick broadcast device handlers

tick_handle_periodic_broadcast()

kernel/time/tick-broadcast.c

/*
 * Event handler for periodic broadcast ticks
 */

static void tick_handle_periodic_broadcast(struct clock_event_device *dev)
{
        struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
        bool bc_local;

        raw_spin_lock(&tick_broadcast_lock);

        /* Handle spurious interrupts gracefully */
        if (clockevent_state_shutdown(tick_broadcast_device.evtdev)) {
                raw_spin_unlock(&tick_broadcast_lock);
                return;
        }

        bc_local = tick_do_periodic_broadcast();

        if (clockevent_state_oneshot(dev)) {
                ktime_t next = ktime_add(dev->next_event, tick_period);

                clockevents_program_event(dev, next, true);
        }
        raw_spin_unlock(&tick_broadcast_lock);

        /*
         * We run the handler of the local cpu after dropping
         * tick_broadcast_lock because the handler might deadlock when
         * trying to switch to oneshot mode.
         */
        if (bc_local)
                td->evtdev->event_handler(td->evtdev);
}

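Broadcasts to the cpus that need it and wakes them from idle. If the clock event device is in oneshot state, programs the next tick.

In code lines 9~12, returns if the clock event device used for broadcast is shut down (a spurious interrupt).
In code line 14, broadcasts to the cpus that need it and wakes them from idle.
In code lines 16~20, if the clock event device is in oneshot state, programs the next tick one tick period after the previous event.
In code lines 28~29, if the current cpu was also a broadcast target, calls the event handler of its local tick device after the lock has been released.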

The following figure shows the broadcast that wakes cpu#1~#3, which are in nohz idle.

Tick broadcast device operating in periodic mode

tick_handle_oneshot_broadcast()

kernel/time/tick-broadcast.c

/*
 * Handle oneshot mode broadcasting
 */

static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
{
        struct tick_device *td;
        ktime_t now, next_event;
        int cpu, next_cpu = 0;
        bool bc_local;

        raw_spin_lock(&tick_broadcast_lock);
        dev->next_event = KTIME_MAX;
        next_event = KTIME_MAX;
        cpumask_clear(tmpmask);
        now = ktime_get();
        /* Find all expired events */
        for_each_cpu(cpu, tick_broadcast_oneshot_mask) {
                /*
                 * Required for !SMP because for_each_cpu() reports
                 * unconditionally CPU0 as set on UP kernels.
                 */
                if (!IS_ENABLED(CONFIG_SMP) &&
                    cpumask_empty(tick_broadcast_oneshot_mask))
                        break;

                td = &per_cpu(tick_cpu_device, cpu);
                if (td->evtdev->next_event <= now) {
                        cpumask_set_cpu(cpu, tmpmask);
                        /*
                         * Mark the remote cpu in the pending mask, so
                         * it can avoid reprogramming the cpu local
                         * timer in tick_broadcast_oneshot_control().
                         */
                        cpumask_set_cpu(cpu, tick_broadcast_pending_mask);
                } else if (td->evtdev->next_event < next_event) {
                        next_event = td->evtdev->next_event;
                        next_cpu = cpu;
                }
        }

        /*
         * Remove the current cpu from the pending mask. The event is
         * delivered immediately in tick_do_broadcast() !
         */
        cpumask_clear_cpu(smp_processor_id(), tick_broadcast_pending_mask);

        /* Take care of enforced broadcast requests */
        cpumask_or(tmpmask, tmpmask, tick_broadcast_force_mask);
        cpumask_clear(tick_broadcast_force_mask);

        /*
         * Sanity check. Catch the case where we try to broadcast to
         * offline cpus.
         */
        if (WARN_ON_ONCE(!cpumask_subset(tmpmask, cpu_online_mask)))
                cpumask_and(tmpmask, tmpmask, cpu_online_mask);

        /*
         * Wakeup the cpus which have an expired event.
         */
        bc_local = tick_do_broadcast(tmpmask);

        /*
         * Two reasons for reprogram:
         *
         * - The global event did not expire any CPU local
         * events. This happens in dyntick mode, as the maximum PIT
         * delta is quite small.
         *
         * - There are pending events on sleeping CPUs which were not
         * in the event mask
         */
        if (next_event != KTIME_MAX)
                tick_broadcast_set_event(dev, next_cpu, next_event);

        raw_spin_unlock(&tick_broadcast_lock);

        if (bc_local) {
                td = this_cpu_ptr(&tick_cpu_device);
                td->evtdev->event_handler(td->evtdev);
        }
}

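Finds the cpus whose tick has expired among those that need broadcast, wakes them from idle, and programs the broadcast device for the earliest remaining event.

In code lines 14~36, collects into the tmp mask the cpus whose tick has expired among those that need broadcast, and tracks the earliest event that has not expired yet.
In code lines 42~46, removes the current cpu from the pending mask and adds the force mask to the tmp mask.
In code lines 52~53, removes offline cpus from the tmp mask, leaving only online cpus.
In code line 58, broadcasts to the tmp mask and wakes all of those idle cpus.
In code lines 70~71, if an event remains, switches the clock event device used as the broadcast device to oneshot state and programs it.
In code lines 75~78, if the current cpu was also a target, calls the event handler of its local tick device.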
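
The scan in code lines 14~36 can be modelled with a plain bitmask and an array of per-cpu expiry times. This is a sketch under simplifying assumptions: a 32-bit mask replaces the cpumask, the pending and force masks are left out, and model_scan_oneshot(), MODEL_NR_CPUS and MODEL_TIME_MAX are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS	8
#define MODEL_TIME_MAX	INT64_MAX	/* stands in for KTIME_MAX */

struct model_scan {
	uint32_t expired_mask;	/* cpus to wake now (tmpmask) */
	int64_t next_event;	/* earliest event still in the future */
	int next_cpu;		/* cpu that owns next_event */
};

struct model_scan model_scan_oneshot(uint32_t oneshot_mask,
				     const int64_t next_events[MODEL_NR_CPUS],
				     int64_t now)
{
	struct model_scan r = { 0, MODEL_TIME_MAX, 0 };
	int cpu;

	assert(next_events);

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		/* Only cpus registered for oneshot broadcast are scanned. */
		if (!(oneshot_mask & (1u << cpu)))
			continue;
		if (next_events[cpu] <= now) {
			r.expired_mask |= 1u << cpu;
		} else if (next_events[cpu] < r.next_event) {
			r.next_event = next_events[cpu];
			r.next_cpu = cpu;
		}
	}
	return r;
}
```

If next_event is still MODEL_TIME_MAX after the scan, no sleeping cpu has a future event and the broadcast device does not need to be reprogrammed, which corresponds to the check in code lines 70~71.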

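The following figure shows the broadcast that wakes cpu#1~#3, which are in nohz idle.

Tick broadcast device operating in oneshot mode

The following figure shows three cpus that are idle under nohz being woken by broadcast at each of their expiry times.

As the asterisks show, the programmed ticks are delayed slightly to match the tick time of the cpu that sends the broadcast.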

tick_broadcast_set_event()

kernel/time/tick-broadcast.c

static void tick_broadcast_set_event(struct clock_event_device *bc, int cpu,
                                     ktime_t expires)
{
        if (!clockevent_state_oneshot(bc))
                clockevents_switch_state(bc, CLOCK_EVT_STATE_ONESHOT);

        clockevents_program_event(bc, expires, 1);
        tick_broadcast_set_affinity(bc, cpumask_of(cpu));
}

Switches the clock event device used by the tick broadcast device to oneshot state and programs it. It then sets the irq affinity to the requested cpu.

tick_broadcast_set_affinity()

kernel/time/tick-broadcast.c

/*
 * Set broadcast interrupt affinity
 */

static void tick_broadcast_set_affinity(struct clock_event_device *bc,
                                        const struct cpumask *cpumask)
{
        if (!(bc->features & CLOCK_EVT_FEAT_DYNIRQ))
                return;

        if (cpumask_equal(bc->cpumask, cpumask))
                return;

        bc->cpumask = cpumask;
        irq_set_affinity(bc->irq, bc->cpumask);
}

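Makes the requested cpus receive the interrupt of the clock event device. Nothing is done for clock event devices without the FEAT_DYNIRQ feature.

C3Stop and tick broadcast

On a SoC that implements timer power saving, that is, when c3stop is set or there is no per-cpu timer, a cpu registers itself in a global bitmap every time it enters or leaves deep idle, and another cpu can wake the cpus in deep idle through the broadcast facility.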

cpuidle_setup_broadcast_timer()

drivers/cpuidle/driver.c

/**
 * cpuidle_setup_broadcast_timer - enable/disable the broadcast timer on a cpu
 * @arg: a void pointer used to match the SMP cross call API
 *
 * If @arg is NULL broadcast is disabled otherwise enabled
 *
 * This function is executed per CPU by an SMP cross call.  It's not
 * supposed to be called directly.
 */

static void cpuidle_setup_broadcast_timer(void *arg)
{
        if (arg)
                tick_broadcast_enable();
        else
                tick_broadcast_disable();
}

Turns the tick broadcast feature on or off.

tick_broadcast_enable()

include/linux/tick.h

static inline void tick_broadcast_enable(void)
{
        tick_broadcast_control(TICK_BROADCAST_ON);
}

Turns the tick broadcast feature on.

tick_broadcast_disable()

include/linux/tick.h

static inline void tick_broadcast_disable(void)
{
        tick_broadcast_control(TICK_BROADCAST_OFF);
}

Turns the tick broadcast feature off.

tick_broadcast_control()

kernel/time/tick-broadcast.c

/**
 * tick_broadcast_control - Enable/disable or force broadcast mode
 * @mode:       The selected broadcast mode
 *
 * Called when the system enters a state where affected tick devices
 * might stop. Note: TICK_BROADCAST_FORCE cannot be undone.
 */

void tick_broadcast_control(enum tick_broadcast_mode mode)
{
        struct clock_event_device *bc, *dev;
        struct tick_device *td;
        int cpu, bc_stopped;
        unsigned long flags;

        /* Protects also the local clockevent device. */
        raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
        td = this_cpu_ptr(&tick_cpu_device);
        dev = td->evtdev;

        /*
         * Is the device not affected by the powerstate ?
         */
        if (!dev || !(dev->features & CLOCK_EVT_FEAT_C3STOP))
                goto out;

        if (!tick_device_is_functional(dev))
                goto out;

        cpu = smp_processor_id();
        bc = tick_broadcast_device.evtdev;
        bc_stopped = cpumask_empty(tick_broadcast_mask);

        switch (mode) {
        case TICK_BROADCAST_FORCE:
                tick_broadcast_forced = 1;
                /* fall through */
        case TICK_BROADCAST_ON:
                cpumask_set_cpu(cpu, tick_broadcast_on);
                if (!cpumask_test_and_set_cpu(cpu, tick_broadcast_mask)) {
                        /*
                         * Only shutdown the cpu local device, if:
                         *
                         * - the broadcast device exists
                         * - the broadcast device is not a hrtimer based one
                         * - the broadcast device is in periodic mode to
                         *   avoid a hickup during switch to oneshot mode
                         */
                        if (bc && !(bc->features & CLOCK_EVT_FEAT_HRTIMER) &&
                            tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
                                clockevents_shutdown(dev);
                }
                break;

        case TICK_BROADCAST_OFF:
                if (tick_broadcast_forced)
                        break;
                cpumask_clear_cpu(cpu, tick_broadcast_on);
                if (cpumask_test_and_clear_cpu(cpu, tick_broadcast_mask)) {
                        if (tick_broadcast_device.mode ==
                            TICKDEV_MODE_PERIODIC)
                                tick_setup_periodic(dev, 0);
                }
                break;
        }

        if (bc) {
                if (cpumask_empty(tick_broadcast_mask)) {
                        if (!bc_stopped)
                                clockevents_shutdown(bc);
                } else if (bc_stopped) {
                        if (tick_broadcast_device.mode == TICKDEV_MODE_PERIODIC)
                                tick_broadcast_start_periodic(bc);
                        else
                                tick_broadcast_setup_oneshot(bc);
                }
        }
out:
        raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
}
EXPORT_SYMBOL_GPL(tick_broadcast_control);

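Updates the bitmasks that decide whether broadcast is needed, according to the state change of the current cpu.

In code lines 16~17, returns if the device does not have the c3stop feature.
In code lines 19~20, returns if it is a dummy device that is not functional.
In code lines 30~45, on a broadcast on request, sets the bit for the current cpu in tick_broadcast_on and tick_broadcast_mask. If the bit in tick_broadcast_mask was newly set, a broadcast device exists, it is not hrtimer based, and the tick broadcast mode is periodic, shuts down the local tick device.
In code lines 47~56, on a broadcast off request, clears the bit for the current cpu in tick_broadcast_on and tick_broadcast_mask, unless broadcast was forced. If the bit in tick_broadcast_mask was set and the tick broadcast mode is periodic, sets the local tick device up again to generate periodic ticks.
In code lines 60~62, if no cpu remains to broadcast to and the broadcast device was running, shuts down the broadcast clock event device.
In code lines 63~68, if there are cpus to broadcast to and the broadcast device was stopped, starts it in periodic or oneshot mode according to the tick broadcast mode.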
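
The start/stop logic of the broadcast device in code lines 59~69 depends only on whether tick_broadcast_mask was empty before the update and whether it is empty afterwards. A reduced userspace sketch follows, assuming a 32-bit mask; TICK_BROADCAST_FORCE and the handling of the local tick device are left out, and all model_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum model_bc_action {
	MODEL_BC_NONE,		/* broadcast device left as it is */
	MODEL_BC_START,		/* first cpu joined: start the broadcast device */
	MODEL_BC_SHUTDOWN	/* last cpu left: shut the broadcast device down */
};

/* Hypothetical stand-in for tick_broadcast_mask, one bit per cpu. */
struct model_bc {
	uint32_t mask;
};

enum model_bc_action model_bc_control(struct model_bc *bc, unsigned int cpu,
				      int on)
{
	int was_empty;

	assert(bc && cpu < 32);

	was_empty = (bc->mask == 0);	/* bc_stopped in the kernel code */

	if (on)
		bc->mask |= 1u << cpu;
	else
		bc->mask &= ~(1u << cpu);

	if (bc->mask == 0)
		return was_empty ? MODEL_BC_NONE : MODEL_BC_SHUTDOWN;
	return was_empty ? MODEL_BC_START : MODEL_BC_NONE;
}
```

Turning broadcast on for cpu 1 and then cpu 2 starts the device only once; turning it off for both shuts the device down only when the second cpu leaves.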

The following figure shows the state of the bitmasks when tick broadcast is turned on and off.

Sched tick analysis

Looking again at how a timer interrupt is handled: of the several possible paths, rpi2 uses the following ones.

tick_handle_periodic()
- Inside this routine, the timer wheel is processed through a softirq as bottom-half work. (run_timer_softirq())
hrtimer_interrupt() → __run_hrtimer() → tick_sched_timer()
- The softirq for hrtimers has not been used since kernel 4.2-rc1.

The following figure shows the sched tick first being handled by tick_handle_periodic() and then moving to tick_sched_timer() once the hrtimer is ready.

nohz entry and exit functions

tick_nohz_idle_enter()
tick_nohz_idle_exit()

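On an SMP system, consider when timer interrupts occur in relation to the state of the following components.

The sched tick is programmed slightly skewed on each cpu.
State changes of the tick device that generates the timer interrupt
- One of the SMP cores updates jiffies regularly, so it cannot enter nohz. The other cpus can enter the nohz state.
State changes of the clock event device that programs the timer
- The architected timer built into armv7 & armv8 can only be programmed as oneshot, so the tick has to be programmed again every CONFIG_HZ period.

Structures

tick_device structure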

kernel/time/tick-sched.h

struct tick_device {
        struct clock_event_device *evtdev;
        enum tick_device_mode mode;
};

*evtdev
- The clock event device used as the tick device
mode
- TICKDEV_MODE_PERIODIC
  - A tick device that cannot use nohz mode, or the boot-up phase
- TICKDEV_MODE_ONESHOT
  - A tick device that can use nohz mode (possible)

tick_sched structure

kernel/time/tick-sched.h

/**
 * struct tick_sched - sched tick emulation and no idle tick control/stats
 * @sched_timer:        hrtimer to schedule the periodic tick in high
 *                      resolution mode
 * @check_clocks:       Notification mechanism about clocksource changes
 * @nohz_mode:          Mode - one state of tick_nohz_mode
 * @inidle:             Indicator that the CPU is in the tick idle mode
 * @tick_stopped:       Indicator that the idle tick has been stopped
 * @idle_active:        Indicator that the CPU is actively in the tick idle mode;
 *                      it is resetted during irq handling phases.
 * @do_timer_lst:       CPU was the last one doing do_timer before going idle
 * @got_idle_tick:      Tick timer function has run with @inidle set
 * @last_tick:          Store the last tick expiry time when the tick
 *                      timer is modified for nohz sleeps. This is necessary
 *                      to resume the tick timer operation in the timeline
 *                      when the CPU returns from nohz sleep.
 * @next_tick:          Next tick to be fired when in dynticks mode.
 * @idle_jiffies:       jiffies at the entry to idle for idle time accounting
 * @idle_calls:         Total number of idle calls
 * @idle_sleeps:        Number of idle calls, where the sched tick was stopped
 * @idle_entrytime:     Time when the idle call was entered
 * @idle_waketime:      Time when the idle was interrupted
 * @idle_exittime:      Time when the idle state was left
 * @idle_sleeptime:     Sum of the time slept in idle with sched tick stopped
 * @iowait_sleeptime:   Sum of the time slept in idle with sched tick stopped, with IO outstanding
 * @timer_expires:      Anticipated timer expiration time (in case sched tick is stopped)
 * @timer_expires_base: Base time clock monotonic for @timer_expires
 * @next_timer:         Expiry time of next expiring timer for debugging purpose only
 * @tick_dep_mask:      Tick dependency mask - is set, if someone needs the tick
 */

struct tick_sched {
        struct hrtimer                  sched_timer;
        unsigned long                   check_clocks;
        enum tick_nohz_mode             nohz_mode;

        unsigned int                    inidle          : 1;
        unsigned int                    tick_stopped    : 1;
        unsigned int                    idle_active     : 1;
        unsigned int                    do_timer_last   : 1;
        unsigned int                    got_idle_tick   : 1;

        ktime_t                         last_tick;
        ktime_t                         next_tick;
        unsigned long                   idle_jiffies;
        unsigned long                   idle_calls;
        unsigned long                   idle_sleeps;
        ktime_t                         idle_entrytime;
        ktime_t                         idle_waketime;
        ktime_t                         idle_exittime;
        ktime_t                         idle_sleeptime;
        ktime_t                         iowait_sleeptime;
        unsigned long                   last_jiffies;
        u64                             timer_expires;
        u64                             timer_expires_base;
        u64                             next_timer;
        ktime_t                         idle_expires;
        atomic_t                        tick_dep_mask;
};

sched_timer
- The hrtimer that generates the sched tick once the hres timer is active
check_clocks
- Set to 1 when the clock source changes or the tick device mode changes.
nohz_mode
- NOHZ_MODE_INACTIVE
  - nohz mode is not running
- NOHZ_MODE_LOWRES
  - nohz mode using the low-resolution timer
- NOHZ_MODE_HIGHRES
  - nohz mode using the high-resolution timer
inidle:1
- Whether the cpu has entered nohz idle
- Set to 1 in tick_nohz_idle_enter() and cleared to 0 in tick_nohz_idle_exit().
tick_stopped:1
- Whether the tick is stopped because of nohz (nohz idle & nohz full)
idle_active:1
- Whether the cpu is in tick idle mode
do_timer_last:1
- Whether this cpu was the last one to run do_timer() before going idle
last_tick
- The last programmed tick time (ns)
- When nohz ends and the periodic tick resumes, the tick is forwarded from this saved time.
idle_expires
- The time (ns) at which nohz is expected to expire, recorded on nohz idle entry

References

Timer -1- (Lowres Timer) | 문c
Timer -2- (HRTimer) | 문c
Timer -3- (Clock Sources Subsystem) | 문c
Timer -4- (Clock Sources Watchdog) | 문c
Timer -5- (Clock Events Subsystem) | 문c
Timer -6- (Clock Source & Timer Driver) | 문c
Timer -7- (Sched Clock & Delay Timers) | 문c
Timer -8- (Timecounter) | 문c
Timer -9- (Tick Device) | 문c (current article)
Timer -10- (Timekeeping) | 문c
Timer -11- (Posix Clock & Timers) | 문c
time_init() | 문c
sched_clock_postinit() | 문c
tick_init() | 문c
timekeeping_init() | 문c
calibrate_delay() | 문c

The tick broadcast framework | LWN.net

Timer -5- (Clock Events Subsystem)

2017-03-02 / 2020-02-06, 문영일


The clock events subsystem provides a framework of hw-independent code for programming timers and specifying the handler that runs on expiry. With it, a clock event device driver that controls timer hw is easy to write.

The following figure shows how the clock events subsystem operates.

Timer hw works by counting cycles, while the kernel's clock events subsystem uses monotonic absolute time in nanoseconds (ns).

Clock event device feature types

CLOCK_EVT_FEAT_PERIODIC(0x000001)
- Events occur regularly unless the device is shut down.
CLOCK_EVT_FEAT_ONESHOT(0x000002)
- The timer device is connected to the interrupt controller and a single event can be programmed.
CLOCK_EVT_FEAT_KTIME(0x000004)
- Used together with the CLOCK_EVT_FEAT_HRTIMER feature below.
CLOCK_EVT_FEAT_C3STOP(0x000008)
- The timer has a power-saving feature: it is also powered off when the cpu enters the c3 (deep-sleep) state.
- With nohz, the cpu is woken up by the tick broadcast device.
CLOCK_EVT_FEAT_DUMMY(0x000010)
- Used by the dummy device driver that does nothing, as used for the Local APIC on x86 systems.
CLOCK_EVT_FEAT_DYNIRQ(0x000020)
- The broadcast interrupt is not pinned to a particular cpu; the target cpu can be selected (irq_set_affinity). The following drivers use it.
  - Memory-mapped driver for the generic timer built into the armv7 or armv8 architecture: “arm,armv7-timer-mem”
  - “st,nomadik-mtu”
CLOCK_EVT_FEAT_PERCPU(0x000040)
- A timer type tightly coupled to the arm cortex-a9 architecture; the flag keeps it from being used as the tick broadcast device.
- Used by the “arm,cortex-a9-global-timer” driver.
CLOCK_EVT_FEAT_HRTIMER(0x000080)
- Marks a clock event device that uses an hrtimer rather than an IPI for broadcast.
- kernel/time/tick-broadcast-hrtimer.c: uses the ONESHOT | KTIME | HRTIMER features.
- See: tick: Introduce hrtimer based broadcast
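
The Tick Device article earlier in this series notes that a tick broadcast device must not be a per-cpu, c3stop, or dummy device, and that oneshot broadcast mode needs a oneshot-capable device. Expressed with the feature bits, that rule might look like the sketch below. model_can_broadcast() is a hypothetical name, and only the bits the check needs are defined.

```c
#include <assert.h>
#include <stdbool.h>

/* Feature bits, with the values listed in this section. */
#define CLOCK_EVT_FEAT_PERIODIC	0x000001
#define CLOCK_EVT_FEAT_ONESHOT	0x000002
#define CLOCK_EVT_FEAT_C3STOP	0x000008
#define CLOCK_EVT_FEAT_DUMMY	0x000010
#define CLOCK_EVT_FEAT_PERCPU	0x000040

/*
 * Hypothetical eligibility check for a tick broadcast device.
 * bc_mode_oneshot: the tick broadcast device currently runs in oneshot mode.
 */
bool model_can_broadcast(unsigned int features, bool bc_mode_oneshot)
{
	/* Devices that stop in deep idle, or do nothing, cannot wake others. */
	if (features & (CLOCK_EVT_FEAT_DUMMY | CLOCK_EVT_FEAT_PERCPU |
			CLOCK_EVT_FEAT_C3STOP))
		return false;

	/* Oneshot broadcast needs a device that can be programmed oneshot. */
	if (bc_mode_oneshot && !(features & CLOCK_EVT_FEAT_ONESHOT))
		return false;

	return true;
}
```

A device with PERIODIC | ONESHOT passes in either mode, while ONESHOT | C3STOP, the typical combination for a per-cpu arm timer, is rejected.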

Clock Events configuration and registration -1-

clockevents_config_and_register()

kernel/time/clockevents.c

/**
 * clockevents_config_and_register - Configure and register a clock event device
 * @dev:        device to register
 * @freq:       The clock frequency
 * @min_delta:  The minimum clock ticks to program in oneshot mode
 * @max_delta:  The maximum clock ticks to program in oneshot mode
 *
 * min/max_delta can be 0 for devices which do not support oneshot mode.
 */

void clockevents_config_and_register(struct clock_event_device *dev,
                                     u32 freq, unsigned long min_delta,
                                     unsigned long max_delta)
{
        dev->min_delta_ticks = min_delta; 
        dev->max_delta_ticks = max_delta;
        clockevents_config(dev, freq);
        clockevents_register_device(dev);
}
EXPORT_SYMBOL_GPL(clockevents_config_and_register);

Registers a clock_event_device with the requested frequency so that it operates between the minimum and maximum clock ticks.

e.g. rpi3 & 4: max_delta_ticks=0x7fff_ffff, 54 MHz, with 1000 HZ: the maximum is about 39.7 s
e.g. rpi2: max_delta_ticks=0x7fff_ffff, 19.2 MHz, with 1000 HZ: the maximum is about 111.8 s
- Tick values use a 32-bit long type, and with the limit value 0x7fff_ffff as on rpi2 above, a tick cannot be programmed more than 111.8 s ahead.

clockevents_config()

kernel/time/clockevents.c

void clockevents_config(struct clock_event_device *dev, u32 freq)
{
        u64 sec;

        if (!(dev->features & CLOCK_EVT_FEAT_ONESHOT))
                return;

        /*
         * Calculate the maximum number of seconds we can sleep. Limit
         * to 10 minutes for hardware which can program more than
         * 32bit ticks so we still get reasonable conversion values.
         */
        sec = dev->max_delta_ticks;
        do_div(sec, freq);
        if (!sec)
                sec = 1;
        else if (sec > 600 && dev->max_delta_ticks > UINT_MAX)
                sec = 600;
        
        clockevents_calc_mult_shift(dev, freq, sec);
        dev->min_delta_ns = cev_delta2ns(dev->min_delta_ticks, dev, false);
        dev->max_delta_ns = cev_delta2ns(dev->max_delta_ticks, dev, true);
}

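Computes the mult/shift values and the min_delta_ns/max_delta_ns values of a clock_event_device for the requested frequency.

In code lines 5~6, the function returns if the device does not support oneshot mode.
In code lines 13~18, dividing max_delta_ticks by freq gives the covered time in seconds. A result of 0 is raised to 1 second, and if max_delta_ticks exceeds 32 bits the result is limited to at most 600 seconds.
- Conversion accuracy beyond 10 minutes is not needed, so the range is limited.
In code line 20, the mult and shift values are computed from the frequency and the covered time (sec).
In code lines 21~22, min_delta_ns and max_delta_ns are computed from min_delta_ticks and max_delta_ticks using the computed mult/shift values.
- min_delta_ns is never set below 1000 ns (1 us).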
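The seconds clamp in code lines 13~18 can be tried out in user space. This is a sketch under the assumption that plain 64-bit division stands in for do_div(); the function name max_convert_sec() is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the range selection done in
 * clockevents_config(): how many seconds the mult/shift pair must cover.
 */
static uint64_t max_convert_sec(uint64_t max_delta_ticks, uint32_t freq)
{
        uint64_t sec;

        assert(freq != 0);
        sec = max_delta_ticks / freq;   /* the kernel uses do_div() here */
        if (!sec)
                sec = 1;                /* cover at least one second */
        else if (sec > 600 && max_delta_ticks > UINT32_MAX)
                sec = 600;              /* cap only for counters wider than 32 bits */
        return sec;
}
```

Note that a 32-bit device is never capped: a slow 1 kHz device with max_delta_ticks=0xffffffff keeps its full range, while a device with more than 32 bits of range is limited to 600 seconds.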

The following figure shows the mult/shift and min(max)_delta_ns values being set for a 19.2 MHz clock event device.

clockevents_calc_mult_shift()

include/linux/clockchips.h

static inline void
clockevents_calc_mult_shift(struct clock_event_device *ce, u32 freq, u32 maxsec)
{
        return clocks_calc_mult_shift(&ce->mult, &ce->shift, NSEC_PER_SEC, freq, maxsec);
}

Computes the mult and shift values needed to convert from 1 second (NSEC_PER_SEC) to freq over the requested period (sec).

Example) 1G -> 54 MHz over 39 seconds
- mult=0xdd2_f1aa, shift=32
Example) 1G -> 19.2 MHz over 111 seconds
- mult=0x4ea_4a8c, shift=32

cev_delta2ns()

kernel/time/clockevents.c

static u64 cev_delta2ns(unsigned long latch, struct clock_event_device *evt,
                        bool ismax)
{
        u64 clc = (u64) latch << evt->shift;
        u64 rnd;

        if (WARN_ON(!evt->mult))
                evt->mult = 1;
        rnd = (u64) evt->mult - 1;

        /*
         * Upper bound sanity check. If the backwards conversion is
         * not equal latch, we know that the above shift overflowed.
         */
        if ((clc >> evt->shift) != (u64)latch)
                clc = ~0ULL;

        /*
         * Scaled math oddities:
         *
         * For mult <= (1 << shift) we can safely add mult - 1 to
         * prevent integer rounding loss. So the backwards conversion
         * from nsec to device ticks will be correct.
         *
         * For mult > (1 << shift), i.e. device frequency is > 1GHz we
         * need to be careful. Adding mult - 1 will result in a value
         * which when converted back to device ticks can be larger
         * than latch by up to (mult - 1) >> shift. For the min_delta
         * calculation we still want to apply this in order to stay
         * above the minimum device ticks limit. For the upper limit
         * we would end up with a latch value larger than the upper
         * limit of the device, so we omit the add to stay below the
         * device upper boundary.
         *
         * Also omit the add if it would overflow the u64 boundary.
         */
        if ((~0ULL - clc > rnd) &&
            (!ismax || evt->mult <= (1ULL << evt->shift)))
                clc += rnd;

        do_div(clc, evt->mult);

        /* Deltas less than 1usec are pointless noise */
        return clc > 1000 ? clc : 1000;
}

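Converts a latch (tick) value to nanoseconds. For accuracy, one of the following formulas is used depending on the frequency. The result is never smaller than 1000 ns.

When the frequency is 1 GHz or lower (mult <= 1 << shift), the division by mult is rounded up. This keeps the backward conversion from nsec to device ticks correct.
- ((latch << shift) + mult - 1) / mult = ns
When the frequency exceeds 1 GHz and the upper limit is being computed (ismax), the value is divided by mult as is, so that the result stays below the device's upper boundary. The lower limit is still rounded up.
- (latch << shift) / mult = ns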
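The two formulas can be checked with concrete numbers. The sketch below is a user-space transcription of the logic, assuming plain division instead of do_div() and omitting the WARN_ON for mult == 0; the name delta_to_ns() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of cev_delta2ns(): device ticks -> ns. */
static uint64_t delta_to_ns(uint64_t latch, uint32_t mult, uint32_t shift,
                            bool ismax)
{
        uint64_t clc = latch << shift;
        uint64_t rnd;

        assert(mult != 0 && shift <= 32);
        rnd = (uint64_t)mult - 1;

        /* if shifting back does not give latch, the shift overflowed */
        if ((clc >> shift) != latch)
                clc = UINT64_MAX;

        /* round up, except for the upper limit of devices faster than 1 GHz */
        if ((UINT64_MAX - clc > rnd) &&
            (!ismax || mult <= (1ULL << shift)))
                clc += rnd;

        clc /= mult;

        /* deltas below 1 usec are treated as noise */
        return clc > 1000 ? clc : 1000;
}
```

For a 54 MHz device (mult=0xdd2f1aa, shift=32), 54000 ticks convert to exactly 1000000 ns thanks to the round-up, and a 15-tick latch (about 278 ns) is raised to the 1000 ns floor.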

Clock Events Configuration and Registration -2-

clockevents_register_device()

kernel/time/clockevents.c

/**
 * clockevents_register_device - register a clock event device
 * @dev:        device to register
 */

void clockevents_register_device(struct clock_event_device *dev)
{
        unsigned long flags;

        /* Initialize state to DETACHED */
        clockevent_set_state(dev, CLOCK_EVT_STATE_DETACHED);

        if (!dev->cpumask) {
                WARN_ON(num_possible_cpus() > 1);
                dev->cpumask = cpumask_of(smp_processor_id());
        }

        if (dev->cpumask == cpu_all_mask) {
                WARN(1, "%s cpumask == cpu_all_mask, using cpu_possible_mask instead\n",
                     dev->name);
                dev->cpumask = cpu_possible_mask;
        }

        raw_spin_lock_irqsave(&clockevents_lock, flags);

        list_add(&dev->list, &clockevent_devices);
        tick_check_new_device(dev);
        clockevents_notify_released();

        raw_spin_unlock_irqrestore(&clockevents_lock, flags);
}
EXPORT_SYMBOL_GPL(clockevents_register_device);

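Registers a clock event device.

In code line 6, the state of the clock event device is set to CLOCK_EVT_STATE_DETACHED.
In code lines 8~11, if no cpumask was given, the device is bound to the cpu on which the current task is running (expressed as a cpumask). A warning is emitted if this happens on a system with more than one possible cpu.
In code lines 13~17, if the cpumask is cpu_all_mask, a warning is printed and cpu_possible_mask is assigned instead.
In code line 21, the clock event device is added to the clockevent_devices list.
In code line 22, the new device is checked to see whether it has a better rating than the existing tick device and should replace it.
- On the first call, the requested device is used as the tick device.
- Depending on the case, the registered clock event device may also act as the tick broadcast device for the nohz implementation.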
- See: Timer -9- (Tick Device) | 문c
In code line 23, the clock event devices on the clockevents_released list are removed from it, added to the clockevent_devices list, and then checked to see whether they can be used as a new tick device.

clockevents_notify_released()

kernel/time/clockevents.c

/*
 * Called after a notify add to make devices available which were
 * released from the notifier call.
 */

static void clockevents_notify_released(void)
{
        struct clock_event_device *dev;

        while (!list_empty(&clockevents_released)) {
                dev = list_entry(clockevents_released.next,
                                 struct clock_event_device, list);
                list_del(&dev->list);
                list_add(&dev->list, &clockevent_devices);
                tick_check_new_device(dev);
        }
}

Removes the clock event devices from the clockevents_released list, adds them to the clockevent_devices list, and then checks whether each can be used as a new tick device.

Clock Events Exchange

clockevents_exchange_device()

kernel/time/clockevents.c

/**
 * clockevents_exchange_device - release and request clock devices
 * @old:        device to release (can be NULL)
 * @new:        device to request (can be NULL)
 *
 * Called from various tick functions with clockevents_lock held and
 * interrupts disabled.
 */

void clockevents_exchange_device(struct clock_event_device *old,
                                 struct clock_event_device *new)
{
        /*
         * Caller releases a clock event device. We queue it into the
         * released list and do a notify add later.
         */
        if (old) {
                module_put(old->owner);
                clockevents_switch_state(old, CLOCK_EVT_STATE_DETACHED);
                list_del(&old->list);
                list_add(&old->list, &clockevents_released);
        }

        if (new) {
                BUG_ON(!clockevent_state_detached(new));
                clockevents_shutdown(new);
        }
}

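Releases the old clock event device and requests the new one. The new device must be in the detached state (BUG_ON fires otherwise) and is then put into the shutdown state.

In code lines 8~13, the module reference of the old clock event device is dropped, the device is switched to the detached state, removed from the clockevent_devices list, and added to the clockevents_released list.
In code lines 15~18, the new clock event device is shut down.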
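The interplay between the two lists (a device is parked on clockevents_released during an exchange and moved back to clockevent_devices by clockevents_notify_released()) can be modelled in user space. This is only a sketch: it uses a hand-written singly linked list instead of the kernel's list_head, skips locking, module references and tick_check_new_device(), and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum dev_state { DEV_DETACHED, DEV_SHUTDOWN, DEV_PERIODIC, DEV_ONESHOT };

struct model_dev {
        const char *name;
        enum dev_state state;
        struct model_dev *next;
};

static struct model_dev *devices;   /* models clockevent_devices */
static struct model_dev *released;  /* models clockevents_released */

static void push(struct model_dev **head, struct model_dev *d)
{
        d->next = *head;
        *head = d;
}

static void unlink_dev(struct model_dev **head, struct model_dev *d)
{
        for (struct model_dev **p = head; *p; p = &(*p)->next) {
                if (*p == d) {
                        *p = d->next;
                        d->next = NULL;
                        return;
                }
        }
}

/* models clockevents_exchange_device() */
static void exchange(struct model_dev *old_dev, struct model_dev *new_dev)
{
        if (old_dev) {
                old_dev->state = DEV_DETACHED;
                unlink_dev(&devices, old_dev);
                push(&released, old_dev);      /* parked until notify */
        }
        if (new_dev) {
                assert(new_dev->state == DEV_DETACHED);
                new_dev->state = DEV_SHUTDOWN;
        }
}

/* models clockevents_notify_released(); returns how many devices moved */
static int notify_released(void)
{
        int moved = 0;

        while (released) {
                struct model_dev *d = released;

                released = d->next;
                push(&devices, d);   /* the kernel re-checks the device here */
                moved++;
        }
        return moved;
}
```

The point of the two-step design is that a released device is not lost: after the exchange finishes, it becomes available again as a candidate for another role.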

clockevents_switch_state()

kernel/time/clockevents.c

/**
 * clockevents_switch_state - set the operating state of a clock event device
 * @dev:        device to modify
 * @state:      new state
 *
 * Must be called with interrupts disabled !
 */

void clockevents_switch_state(struct clock_event_device *dev,
                              enum clock_event_state state)
{
        if (clockevent_get_state(dev) != state) {
                if (__clockevents_switch_state(dev, state))
                        return;

                clockevent_set_state(dev, state);

                /*
                 * A nsec2cyc multiplicator of 0 is invalid and we'd crash
                 * on it, so fix it up and emit a warning:
                 */
                if (clockevent_state_oneshot(dev)) {
                        if (WARN_ON(!dev->mult))
                                dev->mult = 1;
                }
        }
}

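Sets the state of the requested clock event device.

In code lines 4~8, when a state different from the current one is requested, the state transition function registered by the device is called; if it succeeds, the state is updated.
In code lines 14~17, if the device is now in the oneshot state and its mult value is 0, a warning is emitted and mult is changed to 1.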

clockevent_set_state()

kernel/time/tick-internal.h

static inline void clockevent_set_state(struct clock_event_device *dev,
                                        enum clock_event_state state)
{
        dev->state_use_accessors = state;
}

Stores the requested state in the clock event device.

__clockevents_switch_state()

kernel/time/clockevents.c

static int __clockevents_switch_state(struct clock_event_device *dev,
                                      enum clock_event_state state)
{
        if (dev->features & CLOCK_EVT_FEAT_DUMMY)
                return 0;

        /* Transition with new state-specific callbacks */
        switch (state) {
        case CLOCK_EVT_STATE_DETACHED:
                /* The clockevent device is getting replaced. Shut it down. */

        case CLOCK_EVT_STATE_SHUTDOWN:
                if (dev->set_state_shutdown)
                        return dev->set_state_shutdown(dev);
                return 0;

        case CLOCK_EVT_STATE_PERIODIC:
                /* Core internal bug */
                if (!(dev->features & CLOCK_EVT_FEAT_PERIODIC))
                        return -ENOSYS;
                if (dev->set_state_periodic)
                        return dev->set_state_periodic(dev);
                return 0;

        case CLOCK_EVT_STATE_ONESHOT:
                /* Core internal bug */
                if (!(dev->features & CLOCK_EVT_FEAT_ONESHOT))
                        return -ENOSYS;
                if (dev->set_state_oneshot)
                        return dev->set_state_oneshot(dev);
                return 0;

        case CLOCK_EVT_STATE_ONESHOT_STOPPED:
                /* Core internal bug */
                if (WARN_ONCE(!clockevent_state_oneshot(dev),
                              "Current state: %d\n",
                              clockevent_get_state(dev)))
                        return -EINVAL;

                if (dev->set_state_oneshot_stopped)
                        return dev->set_state_oneshot_stopped(dev);
                else
                        return -ENOSYS;

        default:
                return -ENOSYS;
        }
}

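Calls the callback function of the new state when a clock event device changes state.

In code lines 4~5, a device with the CLOCK_EVT_FEAT_DUMMY feature has nothing to switch, so success (0) is returned.
In code lines 8~15, when switching to the CLOCK_EVT_STATE_DETACHED or CLOCK_EVT_STATE_SHUTDOWN state, the (*set_state_shutdown) hook function is called.
In code lines 17~23, when switching to the CLOCK_EVT_STATE_PERIODIC state, the (*set_state_periodic) hook function is called. If the device has no periodic feature, -ENOSYS is returned.
In code lines 25~31, when switching to the CLOCK_EVT_STATE_ONESHOT state, the (*set_state_oneshot) hook function is called. If the device has no oneshot feature, -ENOSYS is returned.
In code lines 33~43, when switching to the CLOCK_EVT_STATE_ONESHOT_STOPPED state, the (*set_state_oneshot_stopped) hook function is called. If the device is not currently in the oneshot state, -EINVAL is returned, and if the hook is not implemented, -ENOSYS is returned.
In code lines 45~47, -ENOSYS is returned for any other state.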
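The return-value rules above can be exercised with a small user-space model. It is a sketch only: the feature bit values, type names and the fact that switch_state() returns the error code (the kernel's clockevents_switch_state() returns void) are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical feature bits; the kernel's values may differ. */
#define FEAT_PERIODIC 0x1u
#define FEAT_ONESHOT  0x2u
#define FEAT_DUMMY    0x4u

enum ce_state { ST_DETACHED, ST_SHUTDOWN, ST_PERIODIC, ST_ONESHOT,
                ST_ONESHOT_STOPPED };

struct ce_dev {
        unsigned int features;
        enum ce_state state;
        int (*set_shutdown)(struct ce_dev *);
        int (*set_periodic)(struct ce_dev *);
        int (*set_oneshot)(struct ce_dev *);
        int (*set_oneshot_stopped)(struct ce_dev *);
};

/* trivial driver hook that always succeeds */
static int ok_cb(struct ce_dev *dev) { (void)dev; return 0; }

/* models __clockevents_switch_state() */
static int hw_switch(struct ce_dev *dev, enum ce_state state)
{
        if (dev->features & FEAT_DUMMY)
                return 0;

        switch (state) {
        case ST_DETACHED:       /* a replaced device is simply shut down */
        case ST_SHUTDOWN:
                return dev->set_shutdown ? dev->set_shutdown(dev) : 0;
        case ST_PERIODIC:
                if (!(dev->features & FEAT_PERIODIC))
                        return -ENOSYS;
                return dev->set_periodic ? dev->set_periodic(dev) : 0;
        case ST_ONESHOT:
                if (!(dev->features & FEAT_ONESHOT))
                        return -ENOSYS;
                return dev->set_oneshot ? dev->set_oneshot(dev) : 0;
        case ST_ONESHOT_STOPPED:
                if (dev->state != ST_ONESHOT)
                        return -EINVAL;
                return dev->set_oneshot_stopped ?
                        dev->set_oneshot_stopped(dev) : -ENOSYS;
        default:
                return -ENOSYS;
        }
}

/* models clockevents_switch_state(): the state changes only on success */
static int switch_state(struct ce_dev *dev, enum ce_state state)
{
        int rc = 0;

        if (dev->state != state) {
                rc = hw_switch(dev, state);
                if (!rc)
                        dev->state = state;
        }
        return rc;
}
```

Note the asymmetry the model preserves: a missing shutdown, periodic or oneshot hook counts as success, whereas a missing oneshot_stopped hook is an error.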

clockevents_shutdown()

kernel/time/clockevents.c

/**
 * clockevents_shutdown - shutdown the device and clear next_event
 * @dev:        device to shutdown
 */

void clockevents_shutdown(struct clock_event_device *dev)
{
        clockevents_switch_state(dev, CLOCK_EVT_STATE_SHUTDOWN);
        dev->next_event = KTIME_MAX;
}

Switches the requested clock event device to the shutdown state and sets its next_event to KTIME_MAX.

Clock Events Program Event

clockevents_program_event()

kernel/time/clockevents.c

/**
 * clockevents_program_event - Reprogram the clock event device.
 * @dev:        device to program
 * @expires:    absolute expiry time (monotonic clock)
 * @force:      program minimum delay if expires can not be set
 *
 * Returns 0 on success, -ETIME when the event is in the past.
 */

int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
                              bool force)
{
        unsigned long long clc;
        int64_t delta;
        int rc;

        if (WARN_ON_ONCE(expires < 0))
                return -ETIME;

        dev->next_event = expires;

        if (clockevent_state_shutdown(dev))
                return 0;

        /* We must be in ONESHOT state here */
        WARN_ONCE(!clockevent_state_oneshot(dev), "Current state: %d\n",
                  clockevent_get_state(dev));

        /* Shortcut for clockevent devices that can deal with ktime. */
        if (dev->features & CLOCK_EVT_FEAT_KTIME)
                return dev->set_next_ktime(expires, dev);

        delta = ktime_to_ns(ktime_sub(expires, ktime_get()));
        if (delta <= 0)
                return force ? clockevents_program_min_delta(dev) : -ETIME;

        delta = min(delta, (int64_t) dev->max_delta_ns);
        delta = max(delta, (int64_t) dev->min_delta_ns);

        clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
        rc = dev->set_next_event((unsigned long) clc, dev);

        return (rc && force) ? clockevents_program_min_delta(dev) : rc;
}

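Reprograms the expiry time of the clock event device. With force=1, even if the requested expiry time has already passed, the device is programmed so that the event fires after the minimum delay (min_delta_ns).

In code lines 8~9, if the expiry time is less than 0, a warning is printed and the error value -ETIME is returned.
In code line 11, the expiry time is assigned to the device's next_event.
In code lines 13~14, if the device is in the shutdown state there is nothing more to do, so success (0) is returned.
In code lines 21~22, if the device has the ktime feature, the (*set_next_ktime) hook function is called to set the expiry time of the next event, and its result is returned.
In code lines 24~26, if the requested expiry time is not later than the current time, then depending on force either the device is programmed with min_delta_ns and that result is returned, or -ETIME is returned.
In code lines 28~29, if delta is outside the min_delta_ns ~ max_delta_ns range, it is clamped.
In code lines 31~32, the (*set_next_event) hook function is called with (delta * mult) >> shift to set the expiry time of the event.
In code line 34, if that failed and force=1, the next event is programmed with the min_delta value instead.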
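The clamp-and-convert step in code lines 24~32 can be sketched in user space. The struct and function names below are hypothetical, the current time is passed in instead of calling ktime_get(), and the max_delta_ns value in the usage note is an approximation for a 54 MHz device.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical container for the fields the conversion needs. */
struct ce_limits {
        uint64_t min_delta_ns;
        uint64_t max_delta_ns;
        uint32_t mult;
        uint32_t shift;
};

/*
 * Returns the tick count that would be handed to (*set_next_event),
 * or -ETIME when the expiry is not in the future.
 */
static int64_t delta_to_ticks(const struct ce_limits *l,
                              int64_t expires_ns, int64_t now_ns)
{
        int64_t delta = expires_ns - now_ns;

        assert(l->shift <= 32);
        if (delta <= 0)
                return -ETIME;

        /* clamp into the range the device can be programmed with */
        if (delta > (int64_t)l->max_delta_ns)
                delta = (int64_t)l->max_delta_ns;
        if (delta < (int64_t)l->min_delta_ns)
                delta = (int64_t)l->min_delta_ns;

        /* ns -> device ticks using the scaled multiplication */
        return (int64_t)(((uint64_t)delta * l->mult) >> l->shift);
}
```

With limits of { 1000, 39768215000, 0xdd2f1aa, 32 }, an expiry 1 ms ahead becomes 54000 ticks, and an expiry only 100 ns ahead is raised to the 1000 ns minimum, which is 54 ticks.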

The following figure shows an event being programmed with a minimum delay of 1000us.

clockevents_program_min_delta()

kernel/time/clockevents.c

/**
 * clockevents_program_min_delta - Set clock event device to the minimum delay.
 * @dev:        device to program
 *
 * Returns 0 on success, -ETIME when the retry loop failed.
 */

static int clockevents_program_min_delta(struct clock_event_device *dev)
{
        unsigned long long clc;
        int64_t delta;
        int i;

        for (i = 0;;) {
                delta = dev->min_delta_ns;
                dev->next_event = ktime_add_ns(ktime_get(), delta);

                if (clockevent_state_shutdown(dev))
                        return 0;

                dev->retries++;
                clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
                if (dev->set_next_event((unsigned long) clc, dev) == 0)
                        return 0;

                if (++i > 2) {
                        /*
                         * We tried 3 times to program the device with the
                         * given min_delta_ns. Try to increase the minimum
                         * delta, if that fails as well get out of here.
                         */
                        if (clockevents_increase_min_delta(dev))
                                return -ETIME;
                        i = 0;
                }
        }
}

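Programs the clock event device with the minimum delay (min_delta_ns).

In code lines 7~9, the expiry time of the next event is set to the current time plus min_delta_ns.
In code lines 11~12, if the device is in the shutdown state it cannot be started yet, so the function exits with success (0).
In code lines 14~17, retries is incremented and the next event is programmed with (delta * mult) >> shift; if programming succeeds, the function exits with success (0).
In code lines 19~28, after three failed attempts with the given min_delta_ns, min_delta_ns is increased and programming is tried again.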

clockevents_increase_min_delta()

kernel/time/clockevents.c

/**
 * clockevents_increase_min_delta - raise minimum delta of a clock event device
 * @dev:       device to increase the minimum delta
 *
 * Returns 0 on success, -ETIME when the minimum delta reached the limit.
 */

static int clockevents_increase_min_delta(struct clock_event_device *dev)
{
        /* Nothing to do if we already reached the limit */
        if (dev->min_delta_ns >= MIN_DELTA_LIMIT) {
                printk_deferred(KERN_WARNING
                                "CE: Reprogramming failure. Giving up\n");
                dev->next_event = KTIME_MAX;
                return -ETIME;
        }

        if (dev->min_delta_ns < 5000)
                dev->min_delta_ns = 5000;
        else
                dev->min_delta_ns += dev->min_delta_ns >> 1;

        if (dev->min_delta_ns > MIN_DELTA_LIMIT)
                dev->min_delta_ns = MIN_DELTA_LIMIT;

        printk_deferred(KERN_WARNING
                        "CE: %s increased min_delta_ns to %llu nsec\n",
                        dev->name ? dev->name : "?",
                        (unsigned long long) dev->min_delta_ns);
        return 0;
}

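Increases the minimum delay (min_delta_ns) of the clock event device. Each call raises it, starting from 5 us and growing by a factor of 1.5, up to a maximum of 1 jiffy. After that, -ETIME is returned.

In code lines 4~9, once min_delta_ns has reached the limit (1 jiffy), a warning message is printed and -ETIME is returned.
- MIN_DELTA_LIMIT
  - The value (NSEC_PER_SEC / HZ), which equals the duration of 1 jiffy.
In code lines 11~14, if min_delta_ns is below 5 us it is set to 5 us; otherwise it is increased by 50% of its current value.
In code lines 16~17, if min_delta_ns exceeds the limit it is set to the limit.
In code lines 19~23, a warning message reports that min_delta_ns was increased and success (0) is returned.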
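How the retry loop and the growth rule interact is easiest to see with a fake device. The sketch below assumes HZ=1000 and a made-up device that rejects any delay shorter than hw_min_ns; struct fake_dev and all function names are hypothetical, and the shutdown check and warning messages are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define HZ 1000                                   /* assumed tick rate */
#define MIN_DELTA_LIMIT (1000000000ULL / HZ)      /* 1 jiffy in ns */

struct fake_dev {
        uint64_t min_delta_ns;
        uint64_t hw_min_ns;     /* hypothetical: shortest delay the hw accepts */
        unsigned long retries;
};

/* models clockevents_increase_min_delta(): 5 us first, then +50% per call */
static int increase_min_delta(struct fake_dev *dev)
{
        if (dev->min_delta_ns >= MIN_DELTA_LIMIT)
                return -ETIME;

        if (dev->min_delta_ns < 5000)
                dev->min_delta_ns = 5000;
        else
                dev->min_delta_ns += dev->min_delta_ns >> 1;

        if (dev->min_delta_ns > MIN_DELTA_LIMIT)
                dev->min_delta_ns = MIN_DELTA_LIMIT;
        return 0;
}

/* stand-in for the driver's (*set_next_event) hook */
static int fake_set_next_event(struct fake_dev *dev, uint64_t delta_ns)
{
        return delta_ns >= dev->hw_min_ns ? 0 : -ETIME;
}

/* models clockevents_program_min_delta(): three tries per min_delta_ns */
static int program_min_delta(struct fake_dev *dev)
{
        assert(dev->min_delta_ns > 0);
        for (int i = 0;;) {
                dev->retries++;
                if (fake_set_next_event(dev, dev->min_delta_ns) == 0)
                        return 0;

                if (++i > 2) {
                        if (increase_min_delta(dev))
                                return -ETIME;
                        i = 0;
                }
        }
}
```

Starting from min_delta_ns=1000 with a device that needs at least 10000 ns, the value grows 1000 -> 5000 -> 7500 -> 11250 and programming succeeds on the tenth attempt. A device that needs more than one jiffy ends with -ETIME and min_delta_ns pinned at the limit.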

Structures

clock_event_device structure

include/linux/clockchips.h

/**
 * struct clock_event_device - clock event device descriptor
 * @event_handler:      Assigned by the framework to be called by the low
 *                      level handler of the event source
 * @set_next_event:     set next event function using a clocksource delta
 * @set_next_ktime:     set next event function using a direct ktime value
 * @next_event:         local storage for the next event in oneshot mode
 * @max_delta_ns:       maximum delta value in ns
 * @min_delta_ns:       minimum delta value in ns
 * @mult:               nanosecond to cycles multiplier
 * @shift:              nanoseconds to cycles divisor (power of two)
 * @state_use_accessors:current state of the device, assigned by the core code
 * @features:           features
 * @retries:            number of forced programming retries
 * @set_state_periodic: switch state to periodic
 * @set_state_oneshot:  switch state to oneshot
 * @set_state_oneshot_stopped: switch state to oneshot_stopped
 * @set_state_shutdown: switch state to shutdown
 * @tick_resume:        resume clkevt device
 * @broadcast:          function to broadcast events
 * @min_delta_ticks:    minimum delta value in ticks stored for reconfiguration
 * @max_delta_ticks:    maximum delta value in ticks stored for reconfiguration
 * @name:               ptr to clock event name
 * @rating:             variable to rate clock event devices
 * @irq:                IRQ number (only for non CPU local devices)
 * @bound_on:           Bound on CPU
 * @cpumask:            cpumask to indicate for which CPUs this device works
 * @list:               list head for the management code
 * @owner:              module reference
 */

struct clock_event_device {
        void                    (*event_handler)(struct clock_event_device *);
        int                     (*set_next_event)(unsigned long evt, struct clock_event_device *);
        int                     (*set_next_ktime)(ktime_t expires, struct clock_event_device *);
        ktime_t                 next_event;
        u64                     max_delta_ns;
        u64                     min_delta_ns;
        u32                     mult;
        u32                     shift;
        enum clock_event_state  state_use_accessors;
        unsigned int            features;
        unsigned long           retries;

        int                     (*set_state_periodic)(struct clock_event_device *);
        int                     (*set_state_oneshot)(struct clock_event_device *);
        int                     (*set_state_oneshot_stopped)(struct clock_event_device *);
        int                     (*set_state_shutdown)(struct clock_event_device *);
        int                     (*tick_resume)(struct clock_event_device *);

        void                    (*broadcast)(const struct cpumask *mask);
        void                    (*suspend)(struct clock_event_device *);
        void                    (*resume)(struct clock_event_device *);
        unsigned long           min_delta_ticks;
        unsigned long           max_delta_ticks;

        const char              *name;
        int                     rating;
        int                     irq;
        int                     bound_on;
        const struct cpumask    *cpumask;
        struct list_head        list;
        struct module           *owner;
} ____cacheline_aligned;

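(*event_handler)
- The event handler function to be called on expiry is registered here.
(*set_next_event)
- The clock event driver function that sets the expiry time of the next event is registered here.
(*set_next_ktime)
- The clock event driver function that sets the expiry time of the next event directly as a ktime value is registered here.
next_event
- The ktime value at which the next event expires in oneshot mode
max_delta_ns
- The maximum delay, in nanoseconds, that may be programmed for the next event
min_delta_ns
- The minimum delay, in nanoseconds, that may be programmed for the next event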
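mult
- The multiplier used to convert nanoseconds to device cycles; it is used together with shift.
- cycles = (ns x mult) >> shift
- Example) 2 GHz, where 2 cycles = 1 ns
  - mult=2, shift=0
shift
- The power-of-two divisor (1 << shift) applied after the multiplication when converting nanoseconds to cycles; it is used together with mult.
- Example) 19.2 MHz, where 1 cycle = about 52 ns
  - mult=0x4ea_4a8c, shift=32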
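state_use_accessors
- Operating state of the clock event device
  - CLOCK_EVT_STATE_DETACHED(0)
  - CLOCK_EVT_STATE_SHUTDOWN(1)
  - CLOCK_EVT_STATE_PERIODIC(2)
  - CLOCK_EVT_STATE_ONESHOT(3)
  - CLOCK_EVT_STATE_ONESHOT_STOPPED(4)
features
- Feature flags, described in detail at the beginning of this article
retries
- Number of forced programming retries
(*set_state_periodic)
- Hook function called when switching to the periodic state
(*set_state_oneshot)
- Hook function called when switching to the oneshot state
(*set_state_oneshot_stopped)
- Hook function called when switching to the oneshot stopped state
(*set_state_shutdown)
- Hook function called when switching to the shutdown state
(*tick_resume)
- Hook function called when returning from the suspend state
(*broadcast)
- Hook function that broadcasts events
(*suspend)
- Hook function that supports suspending the timer device
(*resume)
- Hook function that supports resuming a suspended timer device
min_delta_ticks
- The minimum number of ticks that may be programmed for the next event
max_delta_ticks
- The maximum number of ticks that may be programmed for the next event
*name
- Name of the clock event device
rating
- Rating used to rank clock event devices by precision
irq
- irq number (only for devices that are not per-cpu)
bound_on
- The cpu the device is bound to
cpumask
- The cpumask on which this clock event device operates
list
- Links the device into the list that manages clock event devices.
owner
- Module reference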

References

Timer -1- (Lowres Timer) | 문c
Timer -2- (HRTimer) | 문c
Timer -3- (Clock Sources Subsystem) | 문c
Timer -4- (Clock Sources Watchdog) | 문c
Timer -5- (Clock Events Subsystem) | 문c (this article)
Timer -6- (Clock Source & Timer Driver) | 문c
Timer -7- (Sched Clock & Delay Timers) | 문c
Timer -8- (Timecounter) | 문c
Timer -9- (Tick Device) | 문c
Timer -10- (Timekeeping) | 문c
Timer -11- (Posix Clock & Timers) | 문c
time_init() | 문c
sched_clock_postinit() | 문c
tick_init() | 문c
timekeeping_init() | 문c
calibrate_delay() | 문c

Timer -3- (Clock Sources Subsystem)

2017-02-24 / 2022-08-23 문영일

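The clock sources subsystem provides a framework that selects the most accurate and stable clock source among those registered with it and supplies it to the Linux timekeeping subsystem for time management.

Main relationships
- clk(common clock foundation) -> clock sources subsystem -> timekeeping subsystem

The following figure shows how the clock source components interact.

clocksource registration

The following figure shows the process of registering a 19.2 MHz clock source.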

clocksource_register_hz()

include/linux/clocksource.h

static inline int clocksource_register_hz(struct clocksource *cs, u32 hz)
{
        return __clocksource_register_scale(cs, 1, hz);
}

Registers a clock source with the requested frequency in hz.

clocksource_register_khz()

include/linux/clocksource.h

static inline int clocksource_register_khz(struct clocksource *cs, u32 khz)
{
        return __clocksource_register_scale(cs, 1000, khz);
}

Registers a clock source with the requested frequency in khz.

__clocksource_register_scale()

kernel/time/clocksource.c

/**
 * __clocksource_register_scale - Used to install new clocksources
 * @cs:         clocksource to be registered
 * @scale:      Scale factor multiplied against freq to get clocksource hz
 * @freq:       clocksource frequency (cycles per second) divided by scale
 *
 * Returns -EBUSY if registration fails, zero otherwise.
 *
 * This *SHOULD NOT* be called directly! Please use the
 * clocksource_register_hz() or clocksource_register_khz helper functions.
 */

int __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq)
{
        unsigned long flags;

        clocksource_arch_init(cs);

        /* Initialize mult/shift and max_idle_ns */
        __clocksource_update_freq_scale(cs, scale, freq);

        /* Add clocksource to the clocksource list */
        mutex_lock(&clocksource_mutex);

        clocksource_watchdog_lock(&flags);
        clocksource_enqueue(cs);
        clocksource_enqueue_watchdog(cs);
        clocksource_watchdog_unlock(&flags);

        clocksource_select();
        clocksource_select_watchdog(false);
        __clocksource_suspend_select(cs);
        mutex_unlock(&clocksource_mutex);
        return 0;
}
EXPORT_SYMBOL_GPL(__clocksource_register_scale);

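Registers a clock source with the requested scale and frequency (freq).

In code line 5, the architecture-specific clock source initialization routine is called.
- Currently only the x86 architecture uses it.
In code line 8, the clock's mult, shift, maxadj, max_idle_ns and related values are computed from the requested scale and frequency (freq).
In code lines 14~15, the clock source @cs is added to the clock source list and, if needed, to the watchdog list as well.
In code line 18, the best clock source is selected.
In code line 19, the clock source to be used as the watchdog is selected.
In code line 20, @cs is considered as the clock source used during suspend.
- Only clock sources with the “always-on” property qualify.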

__clocksource_update_freq_hz()

include/linux/clocksource.h

static inline void __clocksource_update_freq_hz(struct clocksource *cs, u32 hz)
{
        __clocksource_update_freq_scale(cs, 1, hz);
}

When the frequency of the current clock source changes to the requested hz, recomputes the related mult, shift, maxadj, max_idle_ns and so on.

__clocksource_update_freq_khz()

include/linux/clocksource.h

static inline void __clocksource_update_freq_khz(struct clocksource *cs, u32 khz)
{
        __clocksource_update_freq_scale(cs, 1000, khz);
}

When the frequency of the current clock source changes to the requested khz, recomputes the related mult, shift, maxadj, max_idle_ns and so on.

__clocksource_update_freq_scale()

kernel/time/clocksource.c

/**
 * __clocksource_update_freq_scale - Used update clocksource with new freq
 * @cs:         clocksource to be registered
 * @scale:      Scale factor multiplied against freq to get clocksource hz
 * @freq:       clocksource frequency (cycles per second) divided by scale
 *
 * This should only be called from the clocksource->enable() method.
 *
 * This *SHOULD NOT* be called directly! Please use the
 * __clocksource_update_freq_hz() or __clocksource_update_freq_khz() helper
 * functions.
 */

void __clocksource_update_freq_scale(struct clocksource *cs, u32 scale, u32 freq)
{
        u64 sec;

        /*
         * Default clocksources are *special* and self-define their mult/shift.
         * But, you're not special, so you should specify a freq value.
         */
        if (freq) {
                /*
                 * Calc the maximum number of seconds which we can run before
                 * wrapping around. For clocksources which have a mask > 32-bit
                 * we need to limit the max sleep time to have a good
                 * conversion precision. 10 minutes is still a reasonable
                 * amount. That results in a shift value of 24 for a
                 * clocksource with mask >= 40-bit and f >= 4GHz. That maps to
                 * ~ 0.06ppm granularity for NTP.
                 */
                sec = cs->mask;
                do_div(sec, freq);
                do_div(sec, scale);
                if (!sec)
                        sec = 1;
                else if (sec > 600 && cs->mask > UINT_MAX)
                        sec = 600;

                clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
                                       NSEC_PER_SEC / scale, sec * scale);
        }
        /*
         * Ensure clocksources that have large 'mult' values don't overflow
         * when adjusted.
         */
        cs->maxadj = clocksource_max_adjustment(cs);
        while (freq && ((cs->mult + cs->maxadj < cs->mult)
                || (cs->mult - cs->maxadj > cs->mult))) {
                cs->mult >>= 1;
                cs->shift--;
                cs->maxadj = clocksource_max_adjustment(cs);
        }

        /*
         * Only warn for *special* clocksources that self-define
         * their mult/shift values and don't specify a freq.
         */
        WARN_ONCE(cs->mult + cs->maxadj < cs->mult,
                "timekeeping: Clocksource %s might overflow on 11%% adjustment\n",
                cs->name);

        clocksource_update_max_deferment(cs);

        pr_info("%s: mask: 0x%llx max_cycles: 0x%llx, max_idle_ns: %lld ns\n",
                cs->name, cs->mask, cs->max_cycles, cs->max_idle_ns);
}
EXPORT_SYMBOL_GPL(__clocksource_update_freq_scale);

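Updates the clock source's mult, shift, maxadj, max_idle_ns and related values for the requested scale and frequency (freq).

In code lines 19~21, the clock source's mask value is divided by @freq and @scale to find how many seconds the counter can run before wrapping, and the result is stored in sec.
In code lines 22~25, if sec is less than 1 second it is changed to 1 second. In addition, only when the clock source counter is wider than 32 bits, a sec value above 600 is limited to the maximum of 600 seconds.
- The jiffies clock source uses a 32-bit counter value, so it is computed without the 600-second limit.
- The “arch_sys_counter” clock source uses a 56-bit counter, so it is limited to at most 600 seconds.
In code lines 27~28, the conversion factors mult and shift needed to compute the elapsed time (ns) are obtained from @freq, @scale and the computed seconds (sec).
In code line 34, 11% of the mult value is stored in maxadj to be used as the maximum adjustment of mult.
In code lines 35~40, if mult plus the maximum adjustment overflows, mult is halved and shift is decremented by 1, and the adjustment is recomputed until no overflow occurs.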
In code line 50, the ns range over which the counter does not overflow, reduced by a safety margin (12.5% in older kernels, 50% in recent ones), is stored in max_idle_ns.
- An elapsed time (delta) greater than max_idle_ns indicates that the counter may overflow.
In code lines 52~53, the clock source's mask, max_cycles and max_idle_ns values are printed as information.
- Example) clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xc743ce346, max_idle_ns: 440795203123 ns
- Example) clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
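The overflow guard in code lines 34~40 can be reproduced in user space. This sketch assumes that the maximum adjustment is 11% of mult, as described above; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 11% of mult, computed in 64 bits to avoid intermediate overflow */
static uint32_t max_adjustment(uint32_t mult)
{
        return (uint32_t)(((uint64_t)mult * 11) / 100);
}

/*
 * Hypothetical model of the maxadj loop: halve mult and decrement shift
 * until mult +/- maxadj no longer wraps a 32-bit value.
 * Returns the final maxadj.
 */
static uint32_t fit_mult(uint32_t *mult, uint32_t *shift)
{
        uint32_t maxadj;

        assert(*mult != 0);
        maxadj = max_adjustment(*mult);
        while ((uint32_t)(*mult + maxadj) < *mult ||
               (uint32_t)(*mult - maxadj) > *mult) {
                *mult >>= 1;
                (*shift)--;
                maxadj = max_adjustment(*mult);
        }
        return maxadj;
}
```

Halving mult while decrementing shift leaves the ratio mult / 2^shift unchanged (apart from one lost bit of precision), so the conversion still yields the same nanoseconds while leaving headroom for the adjustment.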

The figure below shows the values that are set when a 54 MHz 56-bit counter and the jiffies 32-bit counter are used.

mult & shift

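The answer to “what conversion ratio is needed to turn a from value into a to value?” is the conversion factor ‘to / from’. Let us apply this to a case that uses frequencies.

Example) How long, in nanoseconds, does one pulse of a counter running at 19.2 MHz take?

In a digital clock system, the frequency in hz is the number of high/low voltage cycles per second. So at 19.2 MHz the voltage toggles high/low 19.2M times per second, and a single pulse takes ‘1 second / 19.2M’. (1 / 19.2M = 52.08333333 ns)

To preserve performance and portability, kernel code does not use floating-point (float) division. mult and shift are used as a substitute for such operations.

First, instead of the real number 1.0, a scaled integer can be used: multiplying by 10 lets the integer 10 stand for 1.0. In a binary system, such scaled integers use powers of two rather than powers of ten. The larger this scale value, the more precision below the decimal point can be represented.

Example) To represent the real number 52.08333333 as an integer with 8 decimal places of precision, it must be multiplied by 10^8=100,000,000. Since binary operations are faster on a computer, a power of two is used instead. Using 2^27=134,217,728, the nearest power of two above 100,000,000, covers 8 decimal places.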

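Let us look at the mult values used for other precisions.

Looking only at the blue line below: with precision shift=24, the integer 2^24=0x100_0000 stands for the real number 1.0, and the real number 52.08333333 converts to the integer 0x3415_5555 (873,813,333).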
- This precision resolves about 7 decimal places.
See: Scheduler -1- (Basic)

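In short, when higher precision is required, the integer that stands for float 1.0 must become larger. This integer base value is what the shift operation will use.

“How does the kernel handle a division of the form x / y?”

When the divisor is a power of two, a right shift can simply replace the division, so the expression is rewritten in the form ‘(x * mult) >> shift’. Because mult may be fractional, it is converted to a scaled integer.

Example) Taking from=19.2M as the counter frequency (freq) and to=1G as the frequency of nanosecond units:

to(1G) / from(19.2M) = 52.08333333, so from must be multiplied by 52.0833333 to become to. In other words, one pulse of from must be counted as 52.0833333 nanoseconds.
The actual computation uses ‘(from * mult) >> shift’, so mult and shift must be derived. First shift is chosen for the required precision, and then mult is not a real number but the factor scaled up by shift bits and stored as an integer.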
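The 19.2 MHz example can be worked through with integer arithmetic only. The sketch below is illustrative; fixed_factor() and scale() are hypothetical helper names, and shift=24 is chosen to match the value quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* factor = to / from, stored as an integer scaled up by 2^shift */
static uint32_t fixed_factor(uint32_t to, uint32_t from, uint32_t shift)
{
        uint64_t v;

        assert(from != 0 && shift <= 32);
        v = ((uint64_t)to << shift) / from;
        assert(v <= UINT32_MAX);        /* must fit into a 32-bit mult */
        return (uint32_t)v;
}

/* x * factor without any division at run time */
static uint64_t scale(uint64_t x, uint32_t mult, uint32_t shift)
{
        return (x * mult) >> shift;
}
```

fixed_factor(1000000000, 19200000, 24) gives 0x34155555, the value quoted earlier. Converting one second's worth of cycles, scale(19200000, 0x34155555, 24), gives 999999999 ns instead of 1000000000: the truncated fraction of mult costs about 1 ns per second at this precision, which is why a larger shift is preferred whenever the value range allows it.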

Computing mult & shift

clocks_calc_mult_shift()

kernel/time/clocksource.c

/**
 * clocks_calc_mult_shift - calculate mult/shift factors for scaled math of clocks
 * @mult:       pointer to mult variable
 * @shift:      pointer to shift variable
 * @from:       frequency to convert from
 * @to:         frequency to convert to
 * @maxsec:     guaranteed runtime conversion range in seconds
 *
 * The function evaluates the shift/mult pair for the scaled math
 * operations of clocksources and clockevents.
 *
 * @to and @from are frequency values in HZ. For clock sources @to is
 * NSEC_PER_SEC == 1GHz and @from is the counter frequency. For clock
 * event @to is the counter frequency and @from is NSEC_PER_SEC.
 *
 * The @maxsec conversion range argument controls the time frame in
 * seconds which must be covered by the runtime conversion with the
 * calculated mult and shift factors. This guarantees that no 64bit
 * overflow happens when the input value of the conversion is
 * multiplied with the calculated mult factor. Larger ranges may
 * reduce the conversion accuracy by chosing smaller mult and shift
 * factors.
 */

void
clocks_calc_mult_shift(u32 *mult, u32 *shift, u32 from, u32 to, u32 maxsec)
{
        u64 tmp;
        u32 sft, sftacc= 32;

        /*
         * Calculate the shift factor which is limiting the conversion
         * range:
         */
        tmp = ((u64)maxsec * from) >> 32;
        while (tmp) {
                tmp >>=1;
                sftacc--;
        }

        /*
         * Find the conversion shift/mult pair which has the best
         * accuracy and fits the maxsec conversion range:
         */
        for (sft = 32; sft > 0; sft--) {
                tmp = (u64) to << sft;
                tmp += from / 2;
                do_div(tmp, from);
                if ((tmp >> sftacc) == 0)
                        break;
        }
        *mult = tmp;
        *shift = sft;
}

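Computes the conversion factors mult and shift used to convert the @from frequency to the @to frequency over a range of at most @maxsec seconds.

The local timer of the armv7 architecture is a 32-bit timer. When the from frequency is multiplied by maxsec into a 64-bit value, the precision (shift) bits can be raised as far as possible, up to a maximum of 32, within the bits that remain in the 64-bit value after subtracting the bits taken by the scaling factor.
@from * factor = @to
- e.g. 54M * factor = 1G
factor = @to / @from
- e.g. factor = 1G / 54M = 18.5185185…
(@from * mult) >> shift = @to

In code lines 11~15, from is multiplied by maxsec and sftacc is derived from the number of leading zero bits of the product.
- sftacc is limited to the range 0 ~ 32. If more than 32 leading zero bits are available, 32 is used. (maximum precision = 32 bits)
- sftacc limits the accuracy of the conversion factor: the smaller it is, the lower the accuracy.
- Raising maxsec, the time span to be covered, reduces the bits available for precision, so accuracy drops.
- sftacc starts at 32 and is decremented (sftacc--) once for every bit by which @from * @maxsec exceeds 32 bits.
In code lines 21~29, to is shifted left by sft, with sft going from 32 down to 1, and divided by from. The loop stops at the first sft for which this quotient shifted right by sftacc is 0, that is, when the quotient fits within sftacc bits; the quotient and sft then become mult and shift. Otherwise sft is decremented and the loop continues.
- Starting with sft at 32 and decreasing, the first value of ((@to << sft) + (@from / 2, for rounding)) / @from that fits within sftacc bits is chosen as mult.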

The following shows the mult value to use in place of factor for each precision shift.

from(54000000) * factor(18.518519) = to(1000000000)
----------------------------------
from(54000000) * mult(         18.518519) >> 0 = to(1000000000)
from(54000000) * mult(         37.037037) >> 1 = to(1000000000)
from(54000000) * mult(         74.074074) >> 2 = to(1000000000)
from(54000000) * mult(        148.148148) >> 3 = to(1000000000)
from(54000000) * mult(        296.296296) >> 4 = to(1000000000)
from(54000000) * mult(        592.592593) >> 5 = to(1000000000)
from(54000000) * mult(       1185.185185) >> 6 = to(1000000000)
from(54000000) * mult(       2370.370370) >> 7 = to(1000000000)
from(54000000) * mult(       4740.740741) >> 8 = to(1000000000)
from(54000000) * mult(       9481.481481) >> 9 = to(1000000000)
from(54000000) * mult(      18962.962963) >> 10 = to(1000000000)
from(54000000) * mult(      37925.925926) >> 11 = to(1000000000)
from(54000000) * mult(      75851.851852) >> 12 = to(1000000000)
from(54000000) * mult(     151703.703704) >> 13 = to(1000000000)
from(54000000) * mult(     303407.407407) >> 14 = to(1000000000)
from(54000000) * mult(     606814.814815) >> 15 = to(1000000000)
from(54000000) * mult(    1213629.629630) >> 16 = to(1000000000)
from(54000000) * mult(    2427259.259259) >> 17 = to(1000000000)
from(54000000) * mult(    4854518.518519) >> 18 = to(1000000000)
from(54000000) * mult(    9709037.037037) >> 19 = to(1000000000)
from(54000000) * mult(   19418074.074074) >> 20 = to(1000000000)
from(54000000) * mult(   38836148.148148) >> 21 = to(1000000000)
from(54000000) * mult(   77672296.296296) >> 22 = to(1000000000)
from(54000000) * mult(  155344592.592593) >> 23 = to(1000000000)
from(54000000) * mult(  310689185.185185) >> 24 = to(1000000000)
from(54000000) * mult(  621378370.370370) >> 25 = to(1000000000)
from(54000000) * mult( 1242756740.740741) >> 26 = to(1000000000)
from(54000000) * mult( 2485513481.481482) >> 27 = to(1000000000)
from(54000000) * mult( 4971026962.962963) >> 28 = to(1000000000)
from(54000000) * mult( 9942053925.925926) >> 29 = to(1000000000)
from(54000000) * mult(19884107851.851852) >> 30 = to(1000000000)
from(54000000) * mult(39768215703.703705) >> 31 = to(1000000000)

The following shows the mult values as 32-bit fixed-point integers (1.0 = 0x1_0000_0000).

from(54000000) * factor(18.518519) = to(1000000000)
----------------------------------
from(54000000) * mult(0x      1284bda12f) >> 0 = to(1000000000)
from(54000000) * mult(0x      25097b425e) >> 1 = to(1000000000)
from(54000000) * mult(0x      4a12f684bc) >> 2 = to(1000000000)
from(54000000) * mult(0x      9425ed0978) >> 3 = to(1000000000)
from(54000000) * mult(0x     1284bda12f0) >> 4 = to(1000000000)
from(54000000) * mult(0x     25097b425e0) >> 5 = to(1000000000)
from(54000000) * mult(0x     4a12f684bc0) >> 6 = to(1000000000)
from(54000000) * mult(0x     9425ed09780) >> 7 = to(1000000000)
from(54000000) * mult(0x    1284bda12f00) >> 8 = to(1000000000)
from(54000000) * mult(0x    25097b425e00) >> 9 = to(1000000000)
from(54000000) * mult(0x    4a12f684bc00) >> 10 = to(1000000000)
from(54000000) * mult(0x    9425ed097800) >> 11 = to(1000000000)
from(54000000) * mult(0x   1284bda12f000) >> 12 = to(1000000000)
from(54000000) * mult(0x   25097b425e000) >> 13 = to(1000000000)
from(54000000) * mult(0x   4a12f684bc000) >> 14 = to(1000000000)
from(54000000) * mult(0x   9425ed0978000) >> 15 = to(1000000000)
from(54000000) * mult(0x  1284bda12f0000) >> 16 = to(1000000000)
from(54000000) * mult(0x  25097b425e0000) >> 17 = to(1000000000)
from(54000000) * mult(0x  4a12f684bc0000) >> 18 = to(1000000000)
from(54000000) * mult(0x  9425ed09780000) >> 19 = to(1000000000)
from(54000000) * mult(0x 1284bda12f00000) >> 20 = to(1000000000)
from(54000000) * mult(0x 25097b425e00000) >> 21 = to(1000000000)
from(54000000) * mult(0x 4a12f684bc00000) >> 22 = to(1000000000)
from(54000000) * mult(0x 9425ed097800000) >> 23 = to(1000000000)
from(54000000) * mult(0x1284bda12f000000) >> 24 = to(1000000000)
from(54000000) * mult(0x25097b425e000000) >> 25 = to(1000000000)
from(54000000) * mult(0x4a12f684bc000000) >> 26 = to(1000000000)
from(54000000) * mult(0x9425ed0978000000) >> 27 = to(1000000000)
from(54000000) * mult(0x1284bda12f0000000) >> 28 = to(1000000000)
from(54000000) * mult(0x25097b425e0000000) >> 29 = to(1000000000)
from(54000000) * mult(0x4a12f684bc0000000) >> 30 = to(1000000000)
from(54000000) * mult(0x9425ed09780000000) >> 31 = to(1000000000)

The following figure shows how mult=0x1284_bda1/shift=24 is computed when from=54MHz, to=1G and maxsec=600 are given.

1) 54M * 600 = 0x7_8B30_C400 exceeds 32 bits by 3 bits, so sftacc=32-3=29.
2) ((1G << 32) + 27M) / @from = 0x12_84bd_a12f exceeds sftacc(29) bits by 8 bits, so mult is that value reduced by 8 bits, and shift is 32-8=24.
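The two steps above can be checked with a user-space transcription of the kernel routine. This is a sketch under stated assumptions: the name `calc_mult_shift` is hypothetical, and plain 64-bit division stands in for the kernel's do_div().

```c
#include <assert.h>
#include <stdint.h>

/* User-space transcription of clocks_calc_mult_shift(); the name is hypothetical. */
static void calc_mult_shift(uint32_t *mult, uint32_t *shift,
                            uint32_t from, uint32_t to, uint32_t maxsec)
{
        uint64_t tmp;
        uint32_t sft, sftacc = 32;

        assert(mult && shift && from != 0);

        /* Every bit of (maxsec * from) above bit 31 costs one bit of accuracy. */
        tmp = ((uint64_t)maxsec * from) >> 32;
        while (tmp) {
                tmp >>= 1;
                sftacc--;
        }

        /* Take the largest shift whose rounded mult still fits in sftacc bits. */
        for (sft = 32; sft > 0; sft--) {
                tmp = (uint64_t)to << sft;
                tmp += from / 2;        /* round to nearest */
                tmp /= from;            /* the kernel uses do_div() here */
                if ((tmp >> sftacc) == 0)
                        break;
        }
        *mult = (uint32_t)tmp;
        *shift = sft;
}
```

Calling it with from=54000000, to=1000000000 and maxsec=600 yields mult=0x1284_bda1 and shift=24, the same pair as in the walkthrough.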

The following figure shows the mult/shift values calculated for several sets of conditions.

clocksource_max_adjustment()

kernel/time/clocksource.c

/**
 * clocksource_max_adjustment- Returns max adjustment amount
 * @cs:         Pointer to clocksource
 *
 */

static u32 clocksource_max_adjustment(struct clocksource *cs)
{
        u64 ret;
        /*
         * We won't try to correct for more than 11% adjustments (110,000 ppm),
         */
        ret = (u64)cs->mult * 11;
        do_div(ret,100);
        return (u32)ret;
}

Returns 11% of mult as the maximum adjustment value.

clocksource_update_max_deferment()

kernel/time/clocksource.c

/**
 * clocksource_update_max_deferment - Updates the clocksource max_idle_ns & max_cycles
 * @cs:         Pointer to clocksource
 *
 */

static inline void clocksource_update_max_deferment(struct clocksource *cs)
{
        cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift, 
                                                cs->maxadj, cs->mask,
                                                &cs->max_cycles);
}

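Computes the maximum ns and cycle values by which a read can be deferred when the cycle counter of the clock source is read and converted to ns.

The following figure shows the maximum deferrable ns and cycle values being computed for a 56-bit clock source.

max_idle_ns is about 440 seconds
max_cycles is about 44 billion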

clocks_calc_max_nsecs()

kernel/time/clocksource.c

/**
 * clocks_calc_max_nsecs - Returns maximum nanoseconds that can be converted
 * @mult:       cycle to nanosecond multiplier
 * @shift:      cycle to nanosecond divisor (power of two)
 * @maxadj:     maximum adjustment value to mult (~11%)
 * @mask:       bitmask for two's complement subtraction of non 64 bit counters
 * @max_cyc:    maximum cycle value before potential overflow (does not include
 *              any safety margin)
 *
 * NOTE: This function includes a safety margin of 50%, in other words, we
 * return half the number of nanoseconds the hardware counter can technically
 * cover. This is done so that we can potentially detect problems caused by
 * delayed timers or bad hardware, which might result in time intervals that
 * are larger than what the math used can handle without overflows.
 */

u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask, u64 *max_cyc)
{
        u64 max_nsecs, max_cycles;

        /*
         * Calculate the maximum number of cycles that we can pass to the
         * cyc2ns() function without overflowing a 64-bit result.
         */
        max_cycles = ULLONG_MAX;
        do_div(max_cycles, mult+maxadj);

        /*
         * The actual maximum number of cycles we can defer the clocksource is
         * determined by the minimum of max_cycles and mask.
         * Note: Here we subtract the maxadj to make sure we don't sleep for
         * too long if there's a large negative adjustment.
         */
        max_cycles = min(max_cycles, mask);
        max_nsecs = clocksource_cyc2ns(max_cycles, mult - maxadj, shift);

        /* return the max_cycles value as well if requested */
        if (max_cyc)
                *max_cyc = max_cycles;

        /* Return 50% of the actual maximum, so we can detect bad values */
        max_nsecs >>= 1;

        return max_nsecs;
}

Returns half of the maximum usable time (ns), computed from the maximum cycle count and mult reduced by the maximum adjustment.

In code lines 9~10, max_cycles is computed as 0xffff_ffff_ffff_ffff / (mult+maxadj).
- The way max_cycles is computed was simplified in kernel v4.1-rc1.
- See: clocksource: Simplify the clocks_calc_max_nsecs() logic (2015, v4.1-rc1)
In code line 18, max_cycles is capped so that it does not exceed mask.
In code line 19, the resulting max_cycles is multiplied by mult minus maxadj and shifted right by shift to obtain the maximum time (ns).
In code lines 22~23, if the output argument @max_cyc is given, max_cycles is written to it.
In code lines 26~28, half of the maximum time (ns) is returned.
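The calculation can be tried out in user space. This is a minimal sketch, assuming the mult=0x1284_bda1/shift=24 pair from the 54 MHz example and a 56-bit counter mask; the function names are hypothetical, and the sum mult + maxadj is widened to 64 bits here, which the kernel code does not do.

```c
#include <assert.h>
#include <stdint.h>

/* 11% of mult, mirroring clocksource_max_adjustment(). */
static uint32_t max_adjustment(uint32_t mult)
{
        return (uint32_t)(((uint64_t)mult * 11) / 100);
}

/* Same arithmetic as clocksource_cyc2ns(). */
static uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
        return (cycles * mult) >> shift;
}

/* User-space transcription of clocks_calc_max_nsecs(); the name is hypothetical. */
static uint64_t calc_max_nsecs(uint32_t mult, uint32_t shift, uint32_t maxadj,
                               uint64_t mask, uint64_t *max_cyc)
{
        uint64_t max_cycles, max_nsecs;

        assert(maxadj < mult);

        /* Most cycles that can be multiplied by (mult + maxadj) within 64 bits. */
        max_cycles = UINT64_MAX / ((uint64_t)mult + maxadj);

        /* The counter cannot count further than its mask. */
        if (max_cycles > mask)
                max_cycles = mask;

        /* Convert with the smallest possible mult so the result is never too long. */
        max_nsecs = cyc2ns(max_cycles, mult - maxadj, shift);

        if (max_cyc)
                *max_cyc = max_cycles;

        return max_nsecs >> 1;  /* 50% safety margin */
}
```

With these inputs the returned idle limit is roughly 440 seconds. Because the 56-bit mask is far larger than the overflow bound, the limit here is set by the 64-bit multiplication and not by the counter width.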

clocksource_cyc2ns()

include/linux/clocksource.h

/**
 * clocksource_cyc2ns - converts clocksource cycles to nanoseconds
 * @cycles:     cycles
 * @mult:       cycle to nanosecond multiplier
 * @shift:      cycle to nanosecond divisor (power of two)
 *
 * Converts clocksource cycles to nanoseconds, using the given @mult and @shift.
 * The code is optimized for performance and is not intended to work
 * with absolute clocksource cycles (as those will easily overflow),
 * but is only intended to be used with relative (delta) clocksource cycles.
 *
 * XXX - This could use some mult_lxl_ll() asm optimization
 */

static inline s64 clocksource_cyc2ns(u64 cycles, u32 mult, u32 shift)
{
        return ((u64) cycles * mult) >> shift;
}

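Returns the nanoseconds obtained by multiplying the cycle value by @mult and shifting right by @shift.

e.g. 1 cycle, mult=0x682_aaab, shift=21
- =52ns
e.g. 256 cycles, mult=0x682_aaab, shift=21
- =13,333ns

The following figure shows an elapsed time of 18ns being computed for cycle=1 with mult=0x1284_bda1 and shift=24.

The following figure shows two clock sources of different resolution after their counters have advanced for 1 second.

The following figure shows, for two clock sources of different resolution, the minimum time that can be handled when the counter advances by 1 cycle.

It shows that the low-resolution clock source based on jiffies has a resolution of 4ms as its smallest unit.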

Clock Source Registration and Selection

clocksource_enqueue()

kernel/time/clocksource.c

/*
 * Enqueue the clocksource sorted by rating
 */

static void clocksource_enqueue(struct clocksource *cs)
{
        struct list_head *entry = &clocksource_list;
        struct clocksource *tmp;
        
        list_for_each_entry(tmp, &clocksource_list, list)
                /* Keep track of the place, where to insert */
                if (tmp->rating >= cs->rating)
                        entry = &tmp->list;
        list_add(&cs->list, entry);
}

Adds the requested clock source to clocksource_list, keeping the list sorted from the highest rating to the lowest. (descending sort)
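The ordering rule can be modeled with a plain singly linked list. This is a sketch only: `struct src` and `src_enqueue` are hypothetical stand-ins, not the kernel's list_head based code, but the insertion point is chosen the same way, after the last entry whose rating is greater than or equal to the new one.

```c
#include <assert.h>
#include <stddef.h>

struct src {                    /* hypothetical stand-in for struct clocksource */
        const char *name;
        int rating;
        struct src *next;
};

/* Insert cs after the last node whose rating is >= cs->rating. */
static void src_enqueue(struct src **head, struct src *cs)
{
        struct src **pos = head;

        assert(head != NULL && cs != NULL);

        /* Skip every entry rated at least as high as the new one. */
        while (*pos && (*pos)->rating >= cs->rating)
                pos = &(*pos)->next;

        cs->next = *pos;
        *pos = cs;
}
```

Entries with equal ratings keep their registration order, since a new entry is placed behind existing ones of the same rating.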

The following figure shows clock sources being registered in sorted order by rating as they are added.

clocksource_select()

kernel/time/clocksource.c

/**
 * clocksource_select - Select the best clocksource available
 *
 * Private function. Must hold clocksource_mutex when called.
 *
 * Select the clocksource with the best rating, or the clocksource,
 * which is selected by userspace override.
 */

static void clocksource_select(void)
{
        __clocksource_select(false);
}

Finds and selects the best clock source, with the currently selected one included among the candidates.

__clocksource_select()

kernel/time/clocksource.c

static void __clocksource_select(bool skipcur)
{
        bool oneshot = tick_oneshot_mode_active();
        struct clocksource *best, *cs;

        /* Find the best suitable clocksource */
        best = clocksource_find_best(oneshot, skipcur);
        if (!best)
                return;
        if (!strlen(override_name))
                goto found;
        /* Check for the override clocksource. */
        list_for_each_entry(cs, &clocksource_list, list) {
                if (skipcur && cs == curr_clocksource)
                        continue;
                if (strcmp(cs->name, override_name) != 0)
                        continue;
                /*
                 * Check to make sure we don't switch to a non-highres
                 * capable clocksource if the tick code is in oneshot
                 * mode (highres or nohz)
                 */
                if (!(cs->flags & CLOCK_SOURCE_VALID_FOR_HRES) && oneshot) {
                        /* Override clocksource cannot be used. */
                        if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
                                pr_warn("Override clocksource %s is not HRT compatible - cannot switch while in HRT/NOHZ mode\n",
                                        cs->name);
                                override_name[0] = 0;
                        } else {
                                /*
                                 * The override cannot be currently verified.
                                 * Deferring to let the watchdog check.
                                 */
                                pr_info("Override clocksource %s is not currently HRT compatible - deferring\n",
                                        cs->name);
                        }                        
                } else
                        /* Override clocksource can be used. */
                        best = cs;
                break;
        }

found:
        if (curr_clocksource != best && !timekeeping_notify(best)) {
                pr_info("Switched to clocksource %s\n", best->name);
                curr_clocksource = best;
        }
}

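Selects the best clock source. When skipcur is true, the currently selected clock source is excluded and the best among the others is selected.

In code line 3, checks whether the tick device of the current cpu (tick_cpu_device) is operating in oneshot mode.
In code lines 7~9, the best clock source is looked up first. If none is found, the function returns.
In code lines 10~11, if no clock source was specified with the "clocksource=" kernel parameter, jumps to the found: label.
In code lines 13~41, the clock source specified with the "clocksource=" kernel parameter is looked up and, if usable, becomes the best clock source.
- If it is not high-resolution capable while the tick is in oneshot mode, it is not used. When it is flagged unstable, a warning is printed and the override name is cleared; otherwise a message is printed and the decision is left to the watchdog.
In code lines 43~47, this is the found: label. If the best clock source differs from the current one and timekeeping_notify() succeeds in switching to it, a message that the clock source has been switched is printed and curr_clocksource is updated.

The following figure shows the "t4" clock source being specified and selected.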

clocksource_find_best()

kernel/time/clocksource.c

static struct clocksource *clocksource_find_best(bool oneshot, bool skipcur) 
{
        struct clocksource *cs;

        if (!finished_booting || list_empty(&clocksource_list))
                return NULL; 

        /*
         * We pick the clocksource with the highest rating. If oneshot
         * mode is active, we pick the highres valid clocksource with
         * the best rating.
         */
        list_for_each_entry(cs, &clocksource_list, list) {
                if (skipcur && cs == curr_clocksource)
                        continue;
                if (oneshot && !(cs->flags & CLOCK_SOURCE_VALID_FOR_HRES))
                        continue;
                return cs;     
        }          
        return NULL;
}

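Finds and returns the best clock source. When oneshot is true, only clock sources with the CLOCK_SOURCE_VALID_FOR_HRES flag are considered. When skipcur is true, the currently selected clock source is excluded.

In code lines 5~6, if boot-time initialization has not finished or no clock source is registered, the function returns NULL.
In code line 13, the clock source list, which is kept in rating order, is walked and the first clock source that meets the conditions is returned. The currently selected clock source (when skipcur is set) and, for a oneshot request, clock sources without the valid_for_hres flag are skipped.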
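The selection rule can be sketched over a small array. This is a minimal model, assuming the array is already sorted by descending rating the way clocksource_list is; `struct src`, `find_best` and `SRC_VALID_FOR_HRES` are hypothetical names, and the boot-completion check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SRC_VALID_FOR_HRES 0x20 /* same bit value as CLOCK_SOURCE_VALID_FOR_HRES */

struct src {                    /* hypothetical stand-in for struct clocksource */
        const char *name;
        int rating;
        unsigned long flags;
};

/* list[] must already be sorted by descending rating. */
static const struct src *find_best(const struct src *list, size_t n,
                                   const struct src *cur,
                                   bool oneshot, bool skipcur)
{
        assert(list != NULL || n == 0);

        for (size_t i = 0; i < n; i++) {
                if (skipcur && &list[i] == cur)
                        continue;       /* the current source is excluded */
                if (oneshot && !(list[i].flags & SRC_VALID_FOR_HRES))
                        continue;       /* oneshot needs a highres-capable source */
                return &list[i];        /* first hit has the highest rating */
        }
        return NULL;
}
```

Because the list is pre-sorted, no rating comparison is needed at lookup time; the first entry that passes both filters is the answer.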

The following figure shows how the best clock source is found depending on whether oneshot is set.

Final Selection of the Best Clock Source

clocksource_done_booting()

kernel/time/clocksource.c

/*
 * clocksource_done_booting - Called near the end of core bootup
 *
 * Hack to avoid lots of clocksource churn at boot time.
 * We use fs_initcall because we want this to start before
 * device_initcall but after subsys_initcall.
 */

static int __init clocksource_done_booting(void)
{
        mutex_lock(&clocksource_mutex);
        curr_clocksource = clocksource_default_clock();
        finished_booting = 1;
        /*
         * Run the watchdog first to eliminate unstable clock sources
         */
        __clocksource_watchdog_kthread();
        clocksource_select();
        mutex_unlock(&clocksource_mutex);
        return 0;
}
fs_initcall(clocksource_done_booting);

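Finds and selects the final best clock source.

In code line 4, jiffies is selected as the default clock source for the time being.
In code line 9, the clocksource watchdog routine is run to weed out unstable clock sources.
In code line 10, the best clock source is found and selected.
In code line 12, returns 0.

Checking the clocksource Status via sysfs

The clocksource of an rpi2 was checked as follows.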

# cd /sys/devices/system/clocksource
# ls 
clocksource0  power  uevent
# ls clocksource0
available_clocksource  power/                 uevent
current_clocksource    subsystem/             unbind_clocksource
# cat clocksource0/available_clocksource
arch_sys_counter
# cat clocksource0/current_clocksource
arch_sys_counter

Structures

struct clocksource

include/linux/clocksource.h

/**
 * struct clocksource - hardware abstraction for a free running counter
 *      Provides mostly state-free accessors to the underlying hardware.
 *      This is the structure used for system time.
 *
 * @name:               ptr to clocksource name
 * @list:               list head for registration
 * @rating:             rating value for selection (higher is better)
 *                      To avoid rating inflation the following
 *                      list should give you a guide as to how
 *                      to assign your clocksource a rating
 *                      1-99: Unfit for real use
 *                              Only available for bootup and testing purposes.
 *                      100-199: Base level usability.
 *                              Functional for real use, but not desired.
 *                      200-299: Good.
 *                              A correct and usable clocksource.
 *                      300-399: Desired.
 *                              A reasonably fast and accurate clocksource.
 *                      400-499: Perfect
 *                              The ideal clocksource. A must-use where
 *                              available.
 * @read:               returns a cycle value, passes clocksource as argument
 * @enable:             optional function to enable the clocksource
 * @disable:            optional function to disable the clocksource
 * @mask:               bitmask for two's complement
 *                      subtraction of non 64 bit counters
 * @mult:               cycle to nanosecond multiplier
 * @shift:              cycle to nanosecond divisor (power of two)
 * @max_idle_ns:        max idle time permitted by the clocksource (nsecs)
 * @maxadj:             maximum adjustment value to mult (~11%)
 * @max_cycles:         maximum safe cycle value which won't overflow on multiplication
 * @flags:              flags describing special properties
 * @archdata:           arch-specific data
 * @suspend:            suspend function for the clocksource, if necessary
 * @resume:             resume function for the clocksource, if necessary
 * @mark_unstable:      Optional function to inform the clocksource driver that
 *                      the watchdog marked the clocksource unstable
 * @owner:              module reference, must be set by clocksource in modules
 *
 * Note: This struct is not used in hotpathes of the timekeeping code
 * because the timekeeper caches the hot path fields in its own data
 * structure, so no line cache alignment is required,
 *
 * The pointer to the clocksource itself is handed to the read
 * callback. If you need extra information there you can wrap struct
 * clocksource into your own struct. Depending on the amount of
 * information you need you should consider to cache line align that
 * structure.
 */

struct clocksource {
        u64 (*read)(struct clocksource *cs);
        u64 mask;
        u32 mult;
        u32 shift;
        u64 max_idle_ns;
        u32 maxadj;
#ifdef CONFIG_ARCH_CLOCKSOURCE_DATA
        struct arch_clocksource_data archdata;
#endif
        u64 max_cycles;
        const char *name;
        struct list_head list;
        int rating;
        int (*enable)(struct clocksource *cs);
        void (*disable)(struct clocksource *cs);
        unsigned long flags;
        void (*suspend)(struct clocksource *cs);
        void (*resume)(struct clocksource *cs);
        void (*mark_unstable)(struct clocksource *cs);
        void (*tick_stable)(struct clocksource *cs);

        /* private: */
#ifdef CONFIG_CLOCKSOURCE_WATCHDOG
        /* Watchdog related data, used by the framework */
        struct list_head wd_list;
        u64 cs_last;
        u64 wd_last;
#endif
        struct module *owner;
};

(*read)
- Reads the cycle value.
mask
- Bitmask of the counter bits that are valid for the timer
  - With a 56-bit counter, mask=0xff_ffff_ffff_ffff
mult
- Multiplier used when converting cycles to nanoseconds
shift
- Right-shift count used when converting cycles to nanoseconds
max_idle_ns
- Maximum idle time in nanoseconds permitted by the clocksource
maxadj
- Maximum adjustment value (up to 11% of mult)
archdata
- Architecture-specific data
max_cycles
- Maximum cycle value that can be safely multiplied without a potential overflow, kept to make clocksource validation easier
- Added in kernel v4.1-rc1.
- See: clocksource: Add 'max_cycles' to 'struct clocksource'
*name
- clocksource name
list
- Linked into the list at registration
rating
- Selection rating; the higher the value, the better.
(*enable)
- Used when the clocksource can be enabled. (optional)
(*disable)
- Used when the clocksource can be disabled. (optional)
flags
- Flags
  - CLOCK_SOURCE_IS_CONTINUOUS(0x01)
  - CLOCK_SOURCE_MUST_VERIFY(0x02)
    - Used by the x86 TSC
  - CLOCK_SOURCE_WATCHDOG(0x10)
  - CLOCK_SOURCE_VALID_FOR_HRES(0x20)
  - CLOCK_SOURCE_UNSTABLE(0x40)
  - CLOCK_SOURCE_SUSPEND_NONSTOP(0x80)
  - CLOCK_SOURCE_RESELECT(0x100)
(*suspend)
- Used when the clocksource can be suspended at suspend time. (optional)
(*resume)
- Used when the clocksource can be resumed at resume time. (optional)
(*mark_unstable)
- Called for a clock source that the watchdog has marked unstable.
- Currently implemented and used by the x86 TSC clock.
- See: sched/clock, clocksource: Add optional cs::mark_unstable() method (2016, v4.11-rc1)
(*tick_stable)
- Called for a clock source that the watchdog has found stable.
- Currently implemented and used by the x86 TSC clock.
- See: x86/tsc, sched/clock, clocksource: Use clocksource watchdog to provide stable sync points (2017, v4.13-rc1)
wd_list
- Watchdog list
cs_last
- Used by the watchdog
- Counter value of the current clock source
wd_last
- Used by the watchdog
- Counter value of the clock source acting as the watchdog
owner
- Module reference that must be set when the clocksource is used from a module

References

Timer -1- (Lowres Timer) | 문c
Timer -2- (HRTimer) | 문c
Timer -3- (Clock Sources Subsystem) | 문c (this post)
Timer -4- (Clock Sources Watchdog) | 문c
Timer -5- (Clock Events Subsystem) | 문c
Timer -6- (Clock Source & Timer Driver) | 문c
Timer -7- (Sched Clock & Delay Timers) | 문c
Timer -8- (Timecounter) | 문c
Timer -9- (Tick Device) | 문c
Timer -10- (Timekeeping) | 문c
Timer -11- (Posix Clock & Timers) | 문c
time_init() | 문c
sched_clock_postinit() | 문c
tick_init() | 문c
timekeeping_init() | 문c
calibrate_delay() | 문c

ARM Architected Timer | kernel.org

hrtimers_init()

2017-02-17 / 2019-12-16 문영일

Merged into the following post:

Timer -2- (HRTimer) | 문c

Common Clock Framework -2- (APIs)

2017-02-16 / 2020-01-08 문영일

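Main APIs

clk_get()
- Looks up the clock to be used.
- Increments the reference counter of the clock provider.
clk_put()
- Releases a clock that is no longer in use.
- Decrements the reference counter of the clock provider.
clk_prepare()
- Prepares the clock, up through its clock sources, so that it can be enabled. This API may sleep.
- Wakes up clocks that were in a power-saving state.
clk_unprepare()
- Undoes the preparation of the clock, up through its clock sources. This API may sleep.
- Puts clocks that support power saving back to sleep.
clk_enable()
- Opens the clock gate from the clock source. It does not sleep, so it can also be used in atomic context.
clk_disable()
- Closes the clock gate from the clock source. It does not sleep, so it can also be used in atomic context.
clk_get_rate()
- Gets the current rate of the clock.
clk_set_rate()
- Sets the rate of the clock.
clk_set_parent()
- Selects the source of a mux clock. (Selects the parent clock.)
clk_get_parent()
- Gets the source of a mux clock. (Gets the parent clock.)

Management Counters

The clock cores keep the following management counters.

ref.refcount
- Reference counter, incremented/decremented as the clock core is used.
prepare_count
- Incremented/decremented as the clock core is prepared/unprepared.
enable_count
- Incremented/decremented as the clock core is enabled/disabled.
protect_count
- Incremented/decremented as the clock core is protected/unprotected.
- While protected, nobody except the exclusive user can change the configuration.

The following figure shows the clock APIs being called and the counters they affect.

For a gated clock, the clock is output only after clk_enable() has been called.

The following figure shows the various counters of hierarchically organized clocks when a device uses a clock.

When a clock API is called, the counters are incremented/decremented all the way up to the top-level clock.

Acquiring/Releasing a Clock

Acquiring a Clock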

clk_get()

drivers/clk/clkdev.c

struct clk *clk_get(struct device *dev, const char *con_id)
{
        const char *dev_id = dev ? dev_name(dev) : NULL;
        struct clk_hw *hw;

        if (dev && dev->of_node) {
                hw = of_clk_get_hw(dev->of_node, 0, con_id);
                if (!IS_ERR(hw) || PTR_ERR(hw) == -EPROBE_DEFER)
                        return clk_hw_create_clk(dev, hw, dev_id, con_id);
        }

        return __clk_get_sys(dev, dev_id, con_id);
}
EXPORT_SYMBOL(clk_get);

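Gets the clock to be used by the device. If @con_id is given, the clock is looked up by the @con_id name; otherwise the clock connected at index 0 of the device's device-tree node is used. The reference counter of the clock core is then incremented by 1.

In code lines 6~10, the clock hw for @con_id, or for index 0, is looked up in the device tree. If there is no error, a clk is allocated, linked to the clock core of that clock hw, and returned; if the lookup was deferred (-EPROBE_DEFER), that error is passed back to the caller.
In code line 12, the clock lookup list is searched by the @con_id name, and a clk for the matching entry is allocated and returned.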

__clk_get_sys()

drivers/clk/clkdev.c

static struct clk *__clk_get_sys(struct device *dev, const char *dev_id,
                                 const char *con_id)
{
        struct clk_hw *hw = clk_find_hw(dev_id, con_id);

        return clk_hw_create_clk(dev, hw, dev_id, con_id);
}

Searches the clock lookup list for the clock hw by the @con_id name, allocates a clk linked to that clock hw, and returns it.

Finding a Clock

clk_find_hw()

drivers/clk/clkdev.c

struct clk_hw *clk_find_hw(const char *dev_id, const char *con_id)
{
        struct clk_lookup *cl;
        struct clk_hw *hw = ERR_PTR(-ENOENT);

        mutex_lock(&clocks_mutex);
        cl = clk_find(dev_id, con_id);
        if (cl)
                hw = cl->clk_hw;
        mutex_unlock(&clocks_mutex);

        return hw;
}

Returns the clock hw in the global clock list that matches the @dev_id and @con_id names. If the search fails, the -ENOENT error is returned.

clk_find()

drivers/clk/clkdev.c

/*
 * Find the correct struct clk for the device and connection ID.
 * We do slightly fuzzy matching here:
 *  An entry with a NULL ID is assumed to be a wildcard.
 *  If an entry has a device ID, it must match
 *  If an entry has a connection ID, it must match
 * Then we take the most specific entry - with the following
 * order of precedence: dev+con > dev only > con only.
 */

static struct clk_lookup *clk_find(const char *dev_id, const char *con_id)
{
        struct clk_lookup *p, *cl = NULL;
        int match, best_found = 0, best_possible = 0;

        if (dev_id)
                best_possible += 2;
        if (con_id)
                best_possible += 1;

        lockdep_assert_held(&clocks_mutex);

        list_for_each_entry(p, &clocks, node) {
                match = 0;
                if (p->dev_id) {
                        if (!dev_id || strcmp(p->dev_id, dev_id))
                                continue;
                        match += 2;
                }
                if (p->con_id) {
                        if (!con_id || strcmp(p->con_id, con_id))
                                continue;
                        match += 1;
                }

                if (match > best_found) {
                        cl = p;
                        if (match != best_possible)
                                best_found = match;
                        else
                                break;
                }
        }
        return cl;
}

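Searches the global clock list by the @dev_id and @con_id names and returns the matching clk_lookup. If the search fails, null is returned.

An entry that carries a dev_id or con_id is skipped unless it matches the one given. Among the remaining entries the best match is chosen in the following order of precedence.
- An entry matching both the @dev_id and @con_id names is returned immediately.
- The first entry matching only the @dev_id name
- The first entry matching only the @con_id name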
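The scoring can be exercised on a small table. This is a minimal sketch, assuming a plain array in place of the kernel's `clocks` list and no locking; `struct lookup` and `find_lookup` are hypothetical names, while the scoring logic follows clk_find().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct lookup {                 /* hypothetical stand-in for struct clk_lookup */
        const char *dev_id;     /* NULL acts as a wildcard */
        const char *con_id;     /* NULL acts as a wildcard */
};

static const struct lookup *find_lookup(const struct lookup *tbl, size_t n,
                                        const char *dev_id, const char *con_id)
{
        const struct lookup *cl = NULL;
        int match, best_found = 0, best_possible = 0;

        assert(tbl != NULL || n == 0);

        if (dev_id)
                best_possible += 2;
        if (con_id)
                best_possible += 1;

        for (size_t i = 0; i < n; i++) {
                const struct lookup *p = &tbl[i];

                match = 0;
                if (p->dev_id) {        /* a device ID in the entry must match */
                        if (!dev_id || strcmp(p->dev_id, dev_id))
                                continue;
                        match += 2;
                }
                if (p->con_id) {        /* a connection ID in the entry must match */
                        if (!con_id || strcmp(p->con_id, con_id))
                                continue;
                        match += 1;
                }
                if (match > best_found) {
                        cl = p;
                        if (match == best_possible)
                                break;  /* cannot do better than this */
                        best_found = match;
                }
        }
        return cl;
}
```

A device match is worth 2 and a connection match 1, which is what gives the dev+con > dev only > con only precedence. Since an entry replaces the current candidate only with a strictly higher score, the earliest entry wins among equals.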

Releasing a Clock

clk_put()

drivers/clk/clkdev.c

void clk_put(struct clk *clk)
{
        __clk_put(clk);
}
EXPORT_SYMBOL(clk_put);

Releases a clock that was in use and decrements the reference counter of its clock core by 1.

__clk_put()

drivers/clk/clkdev.c

void __clk_put(struct clk *clk)
{
        struct module *owner;

        if (!clk || WARN_ON_ONCE(IS_ERR(clk)))
                return;

        clk_prepare_lock();

        /*
         * Before calling clk_put, all calls to clk_rate_exclusive_get() from a
         * given user should be balanced with calls to clk_rate_exclusive_put()
         * and by that same consumer
         */
        if (WARN_ON(clk->exclusive_count)) {
                /* We voiced our concern, let's sanitize the situation */
                clk->core->protect_count -= (clk->exclusive_count - 1);
                clk_core_rate_unprotect(clk->core);
                clk->exclusive_count = 0;
        }

        hlist_del(&clk->clks_node);
        if (clk->min_rate > clk->core->req_rate ||
            clk->max_rate < clk->core->req_rate)
                clk_core_set_rate_nolock(clk->core, clk->core->req_rate);

        owner = clk->core->owner;
        kref_put(&clk->core->ref, __clk_release);

        clk_prepare_unlock();

        module_put(owner);

        free_clk(clk);
}

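Releases a clock that was in use and decrements the reference counter of its clock core by 1.

In code line 8, the global clock prepare lock is acquired, which is required before changing clock configuration.
In code lines 15~20, if the exclusive count used for rate protection is not 0, that is, the consumer left its clk_rate_exclusive_get() calls unbalanced, a warning is printed, the protection is dropped and the count is reset to 0.
In code line 22, the clk is removed from the clock core.
In code lines 23~25, if the req_rate of the clock core lies outside the rate range requested by this clock user, the rate is set again to the req_rate of the clock core.
In code line 28, the reference counter of the clock core is decremented.
In code line 32, the reference counter of the module is decremented.
In code line 34, the allocated clk is freed.

Preparing/Unpreparing a Clock

Preparing a Clock (prepare)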

clk_prepare()

drivers/clk/clk.c

/**
 * clk_prepare - prepare a clock source
 * @clk: the clk being prepared
 *
 * clk_prepare may sleep, which differentiates it from clk_enable.  In a simple
 * case, clk_prepare can be used instead of clk_enable to ungate a clk if the
 * operation may sleep.  One example is a clk which is accessed over I2c.  In
 * the complex case a clk ungate operation may require a fast and a slow part.
 * It is this reason that clk_prepare and clk_enable are not mutually
 * exclusive.  In fact clk_prepare must be called before clk_enable.
 * Returns 0 on success, -EERROR otherwise.
 */

int clk_prepare(struct clk *clk)
{
        if (!clk)
                return 0;

        return clk_core_prepare_lock(clk->core);
}
EXPORT_SYMBOL_GPL(clk_prepare);

Prepares the clock and its parent clock sources all the way up.

clk_core_prepare_lock()

drivers/clk/clk.c

static int clk_core_prepare_lock(struct clk_core *core)
{
        int ret;

        clk_prepare_lock();
        ret = clk_core_prepare(core);
        clk_prepare_unlock();

        return ret;
}

With the lock held, prepares the clock and its parent clock sources all the way up.

clk_core_prepare()

drivers/clk/clk.c

static int clk_core_prepare(struct clk_core *core)
{
        int ret = 0;

        lockdep_assert_held(&prepare_lock);

        if (!core)
                return 0;

        if (core->prepare_count == 0) {
                ret = clk_pm_runtime_get(core);
                if (ret)
                        return ret;

                ret = clk_core_prepare(core->parent);
                if (ret)
                        goto runtime_put;

                trace_clk_prepare(core);

                if (core->ops->prepare)
                        ret = core->ops->prepare(core->hw);

                trace_clk_prepare_complete(core);

                if (ret)
                        goto unprepare;
        }

        core->prepare_count++;

        /*
         * CLK_SET_RATE_GATE is a special case of clock protection
         * Instead of a consumer claiming exclusive rate control, it is
         * actually the provider which prevents any consumer from making any
         * operation which could result in a rate change or rate glitch while
         * the clock is prepared.
         */
        if (core->flags & CLK_SET_RATE_GATE)
                clk_core_rate_protect(core);

        return 0;
unprepare:
        clk_core_unprepare(core->parent);
runtime_put:
        clk_pm_runtime_put(core);
        return ret;
}

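Prepares the clock and its parent clock sources up to the root.

In code lines 7~8, the function returns if no clock is given. Because the function calls itself for the parent, this is also where the recursion ends above the root clock.
In code lines 10~28, if the clock is not currently prepared (prepare_count is 0), the following is done:
- If the clock core's device is suspended by runtime PM, it is woken up.
- This function is called recursively to prepare the parent clock.
- The ops->prepare hook implemented by this clock device is called.
In code line 30, this clock core's prepare_count is incremented by 1.
In code lines 39~40, if this clock core requires its gate to be closed whenever its rate changes (CLK_SET_RATE_GATE), rate protection is set so that the rate cannot be changed while the clock is prepared.
In code line 42, the success value 0 is returned.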
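The reference counting described above can be modeled in user space. The following is a minimal sketch, not the kernel implementation: `struct sim_clk`, `sim_clk_prepare()`, `sim_clk_unprepare()` and `sim_fail_prepare()` are hypothetical names, and locking, runtime PM and rate protection are left out.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct clk_core. */
struct sim_clk {
	const char *name;
	struct sim_clk *parent;
	int prepare_count;
	int (*prepare)(struct sim_clk *clk);	/* optional hook, may be NULL */
};

static void sim_clk_unprepare(struct sim_clk *clk);

/* Example hook that always fails, used to exercise the rollback path. */
static int sim_fail_prepare(struct sim_clk *clk)
{
	(void)clk;
	return -1;
}

/* Prepare clk; on the 0 -> 1 transition every ancestor is prepared first. */
static int sim_clk_prepare(struct sim_clk *clk)
{
	int ret = 0;

	if (!clk)
		return 0;	/* recursion ends above the root */

	if (clk->prepare_count == 0) {
		ret = sim_clk_prepare(clk->parent);
		if (ret)
			return ret;

		if (clk->prepare)
			ret = clk->prepare(clk);
		if (ret) {
			/* give back the reference taken on the parent */
			sim_clk_unprepare(clk->parent);
			return ret;
		}
	}

	clk->prepare_count++;
	return 0;
}

/* Drop one reference; the parent is released only when the count hits 0. */
static void sim_clk_unprepare(struct sim_clk *clk)
{
	if (!clk || clk->prepare_count == 0)
		return;
	if (--clk->prepare_count > 0)
		return;
	sim_clk_unprepare(clk->parent);
}
```

Preparing a leaf twice raises only the leaf's count to 2; the ancestors stay at 1 because the recursion happens only on the first reference.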

Clock release (unprepare)

clk_unprepare()

drivers/clk/clk.c

/**
 * clk_unprepare - undo preparation of a clock source
 * @clk: the clk being unprepared
 *
 * clk_unprepare may sleep, which differentiates it from clk_disable.  In a
 * simple case, clk_unprepare can be used instead of clk_disable to gate a clk
 * if the operation may sleep.  One example is a clk which is accessed over
 * I2c.  In the complex case a clk gate operation may require a fast and a slow
 * part.  It is this reason that clk_unprepare and clk_disable are not mutually
 * exclusive.  In fact clk_disable must be called before clk_unprepare.
 */

void clk_unprepare(struct clk *clk)
{
        if (IS_ERR_OR_NULL(clk))
                return;

        clk_core_unprepare_lock(clk->core);
}
EXPORT_SYMBOL_GPL(clk_unprepare);

Unprepares the clock, including its parent clock sources up the tree.

clk_core_unprepare_lock()

drivers/clk/clk.c

static void clk_core_unprepare_lock(struct clk_core *core)
{
        clk_prepare_lock();
        clk_core_unprepare(core);
        clk_prepare_unlock();
}

Unprepares the clock and its parent clock sources while holding the prepare lock.

clk_core_unprepare()

drivers/clk/clk.c

static void clk_core_unprepare(struct clk_core *core)
{
        lockdep_assert_held(&prepare_lock);

        if (!core)
                return;

        if (WARN(core->prepare_count == 0,
            "%s already unprepared\n", core->name))
                return;

        if (WARN(core->prepare_count == 1 && core->flags & CLK_IS_CRITICAL,
            "Unpreparing critical %s\n", core->name))
                return;

        if (core->flags & CLK_SET_RATE_GATE)
                clk_core_rate_unprotect(core);

        if (--core->prepare_count > 0)
                return;

        WARN(core->enable_count > 0, "Unpreparing enabled %s\n", core->name);

        trace_clk_unprepare(core);

        if (core->ops->unprepare)
                core->ops->unprepare(core->hw);

        clk_pm_runtime_put(core);

        trace_clk_unprepare_complete(core);
        clk_core_unprepare(core->parent);
}

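Unprepares the clock and its parent clock sources up to the root.

In code lines 5~6, the function returns if no clock is given. Because the function calls itself for the parent, this is also where the recursion ends above the root clock.
In code lines 8~10, a clock core that is already unprepared triggers a warning message and the function returns.
In code lines 12~14, for a critical clock core (CLK_IS_CRITICAL), whose output must never be turned off, dropping the last prepare reference triggers a warning message and the function returns.
In code lines 16~17, this is the case where the clock core requires its gate to be closed whenever its rate changes (CLK_SET_RATE_GATE). Because the clock is being unprepared and its gate closed, the rate may be changed again, so the protection taken at prepare time is released.
In code lines 19~20, prepare_count is decremented by 1. The clock core is not actually unprepared until the count reaches 0.
- The parent's count is decremented only after this clock's count has reached 0, because the recursive call comes after this check.
In code lines 26~27, the ops->unprepare hook implemented by the clock device driver is called.
In code line 29, the runtime PM reference is dropped so that a clock core with power-saving support can be suspended.
In code line 32, this function is called to unprepare the parent clock core. The recursion continues toward the root clock core.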

Clock gate enable & disable

Clock gate enable

clk_enable()

drivers/clk/clk.c

/**
 * clk_enable - ungate a clock
 * @clk: the clk being ungated
 *
 * clk_enable must not sleep, which differentiates it from clk_prepare.  In a
 * simple case, clk_enable can be used instead of clk_prepare to ungate a clk
 * if the operation will never sleep.  One example is a SoC-internal clk which
 * is controlled via simple register writes.  In the complex case a clk ungate
 * operation may require a fast and a slow part.  It is this reason that
 * clk_enable and clk_prepare are not mutually exclusive.  In fact clk_prepare
 * must be called before clk_enable.  Returns 0 on success, -EERROR
 * otherwise.
 */

int clk_enable(struct clk *clk)
{
        if (!clk)
                return 0;

        return clk_core_enable_lock(clk->core);
}
EXPORT_SYMBOL_GPL(clk_enable);

Enables the gate of the clock, including its parent clock sources up the tree.

clk_core_enable_lock()

drivers/clk/clk.c

static int clk_core_enable_lock(struct clk_core *core)
{
        unsigned long flags;
        int ret;

        flags = clk_enable_lock();
        ret = clk_core_enable(core);
        clk_enable_unlock(flags);

        return ret;
}

Enables the gates of the clock and its parent clock sources while holding the enable lock.

clk_core_enable()

drivers/clk/clk.c

static int clk_core_enable(struct clk_core *core)
{
        int ret = 0;

        lockdep_assert_held(&enable_lock);

        if (!core)
                return 0;

        if (WARN(core->prepare_count == 0,
            "Enabling unprepared %s\n", core->name))
                return -ESHUTDOWN;

        if (core->enable_count == 0) {
                ret = clk_core_enable(core->parent);

                if (ret)
                        return ret;

                trace_clk_enable_rcuidle(core);

                if (core->ops->enable)
                        ret = core->ops->enable(core->hw);

                trace_clk_enable_complete_rcuidle(core);

                if (ret) {
                        clk_core_disable(core->parent);
                        return ret;
                }
        }

        core->enable_count++;
        return 0;
}

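Enables the gates of the clock and its parent clock sources up to the root.

In code lines 7~8, the function returns if no clock is given. Because the function calls itself for the parent, this is also where the recursion ends above the root clock.
In code lines 10~12, a clock core that has not been prepared yet triggers a warning message and the function returns -ESHUTDOWN.
In code lines 14~31, if the clock is not currently enabled (enable_count is 0), the following is done:
- This function is called recursively to enable the parent clock.
- The ops->enable hook implemented by this clock device is called. If the hook fails, the parent clock is disabled again and the error is returned.
In code line 33, this clock core's enable_count is incremented by 1.
In code line 34, the success value 0 is returned.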
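The "prepare before enable" rule can be illustrated with a small user-space model. This is only a sketch under simplifying assumptions: `struct sim_gate` and `sim_gate_enable()` are hypothetical names, and the spinlock, tracing and ops hooks of the real function are omitted.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model holding just the two counters discussed above. */
struct sim_gate {
	struct sim_gate *parent;
	int prepare_count;
	int enable_count;
};

/* Enable g; the parent chain is enabled on the 0 -> 1 transition. */
static int sim_gate_enable(struct sim_gate *g)
{
	int ret;

	if (!g)
		return 0;	/* recursion ends above the root */

	if (g->prepare_count == 0)
		return -ESHUTDOWN;	/* enabling an unprepared clock */

	if (g->enable_count == 0) {
		ret = sim_gate_enable(g->parent);
		if (ret)
			return ret;
		/* a real driver would ungate its hardware here */
	}

	g->enable_count++;
	return 0;
}
```

In this model a clock whose prepare_count is still 0 is rejected before its parent is touched, so a failed call leaves every enable_count unchanged.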

Clock gate disable

clk_disable()

drivers/clk/clk.c

/**
 * clk_disable - gate a clock
 * @clk: the clk being gated
 *
 * clk_disable must not sleep, which differentiates it from clk_unprepare.  In
 * a simple case, clk_disable can be used instead of clk_unprepare to gate a
 * clk if the operation is fast and will never sleep.  One example is a
 * SoC-internal clk which is controlled via simple register writes.  In the
 * complex case a clk gate operation may require a fast and a slow part.  It is
 * this reason that clk_unprepare and clk_disable are not mutually exclusive.
 * In fact clk_disable must be called before clk_unprepare.
 */

void clk_disable(struct clk *clk)
{
        if (IS_ERR_OR_NULL(clk))
                return;

        clk_core_disable_lock(clk->core);
}
EXPORT_SYMBOL_GPL(clk_disable);

Disables the gate of the clock, including its parent clock sources up the tree.

clk_core_disable_lock()

drivers/clk/clk.c

static void clk_core_disable_lock(struct clk_core *core)
{
        unsigned long flags;

        flags = clk_enable_lock();
        clk_core_disable(core);
        clk_enable_unlock(flags);
}

Disables the gates of the clock and its parent clock sources while holding the enable lock.

clk_core_disable()

drivers/clk/clk.c

static void clk_core_disable(struct clk_core *core)
{
        lockdep_assert_held(&enable_lock);

        if (!core)
                return;

        if (WARN(core->enable_count == 0, "%s already disabled\n", core->name))
                return;

        if (WARN(core->enable_count == 1 && core->flags & CLK_IS_CRITICAL,
            "Disabling critical %s\n", core->name))
                return;

        if (--core->enable_count > 0)
                return;

        trace_clk_disable_rcuidle(core);

        if (core->ops->disable)
                core->ops->disable(core->hw);

        trace_clk_disable_complete_rcuidle(core);

        clk_core_disable(core->parent);
}

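Disables the gates of the clock and its parent clock sources up to the root.

In code lines 5~6, the function returns if no clock is given. Because the function calls itself for the parent, this is also where the recursion ends above the root clock.
In code lines 8~9, a clock core that is not enabled triggers a warning message and the function returns.
In code lines 11~13, a critical clock core, whose output must always be kept running, must not be disabled, so dropping its last enable reference triggers a warning message and the function returns.
In code lines 15~16, enable_count is decremented by 1. The clock core is not actually disabled until the count reaches 0.
- The parent's count is decremented only after this clock's count has reached 0, because the recursive call comes after this check.
In code lines 20~21, the ops->disable hook implemented by this clock device is called.
In code line 25, this function is called recursively to disable the parent clock. The recursion continues toward the root clock core.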
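The interaction between enable_count and a critical clock can be sketched as follows. The names `struct sim_dis_clk` and `sim_clk_disable()` are hypothetical, the `critical` field merely stands in for the CLK_IS_CRITICAL flag, and the boolean return value is an addition of this sketch (the kernel function returns void and warns instead).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a gate with a reference count. */
struct sim_dis_clk {
	struct sim_dis_clk *parent;
	int enable_count;
	bool critical;	/* stands in for CLK_IS_CRITICAL */
	bool hw_on;	/* state of the modeled gate */
};

/* Returns true if the request was accepted, false if it was refused. */
static bool sim_clk_disable(struct sim_dis_clk *clk)
{
	if (!clk)
		return true;	/* recursion ends above the root */

	if (clk->enable_count == 0)
		return false;	/* already disabled */

	if (clk->enable_count == 1 && clk->critical)
		return false;	/* last reference on a critical clock */

	if (--clk->enable_count > 0)
		return true;	/* other users remain, gate stays open */

	clk->hw_on = false;	/* a real driver would gate the hardware */
	sim_clk_disable(clk->parent);
	return true;
}
```

When the leaf's last reference is dropped the request travels to the parent, but a critical parent keeps its final reference and stays on.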

Rate setting

struct clk_rate_request

include/linux/clk-provider.h

/**
 * struct clk_rate_request - Structure encoding the clk constraints that
 * a clock user might require.
 *
 * @rate:               Requested clock rate. This field will be adjusted by
 *                      clock drivers according to hardware capabilities.
 * @min_rate:           Minimum rate imposed by clk users.
 * @max_rate:           Maximum rate imposed by clk users.
 * @best_parent_rate:   The best parent rate a parent can provide to fulfill the
 *                      requested constraints.
 * @best_parent_hw:     The most appropriate parent clock that fulfills the
 *                      requested constraints.
 *
 */

struct clk_rate_request {
        unsigned long rate;
        unsigned long min_rate;
        unsigned long max_rate;
        unsigned long best_parent_rate;
        struct clk_hw *best_parent_hw;
};

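This structure bundles the parameters of the rate-related APIs in order to reduce the number of arguments.

rate
- the requested rate; the clock driver adjusts it to what the hardware can provide
min_rate
- the minimum rate
max_rate
- the maximum rate
best_parent_rate
- the most suitable parent clock rate
*best_parent_hw
- pointer to the hw of the most suitable parent clock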
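How a driver might consume and fill in such a request can be sketched with an integer divider. This is an illustration only: `struct sim_rate_request` mirrors a subset of the fields above, and `sim_divider_determine_rate()` with its "closest rate not above the request" policy is an assumption of this sketch, not the behavior of any particular kernel driver.

```c
/* Hypothetical subset of struct clk_rate_request. */
struct sim_rate_request {
	unsigned long rate;		/* in: requested, out: achievable */
	unsigned long min_rate;
	unsigned long max_rate;
	unsigned long best_parent_rate;	/* parent rate to divide down */
};

/*
 * Try every divider from 1 to max_div and keep the highest resulting rate
 * that does not exceed the request and stays within [min_rate, max_rate].
 * Returns 0 and updates req->rate on success, -1 if nothing fits.
 */
static int sim_divider_determine_rate(struct sim_rate_request *req,
				      unsigned int max_div)
{
	unsigned long best = 0;
	unsigned int div;

	for (div = 1; div <= max_div; div++) {
		unsigned long rate = req->best_parent_rate / div;

		if (rate > req->rate)
			continue;
		if (rate < req->min_rate || rate > req->max_rate)
			continue;
		if (rate > best)
			best = rate;
	}

	if (best == 0)
		return -1;

	req->rate = best;
	return 0;
}
```

With a 24 MHz parent, a 5 MHz request and dividers up to 8, the request comes back adjusted to 4.8 MHz (divider 5).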

The following figure shows only the main functions of the four stages that clk_set_rate() calls to change a rate.

clk_set_rate()

drivers/clk/clk.c

/**
 * clk_set_rate - specify a new rate for clk
 * @clk: the clk whose rate is being changed
 * @rate: the new rate for clk
 *
 * In the simplest case clk_set_rate will only adjust the rate of clk.
 *
 * Setting the CLK_SET_RATE_PARENT flag allows the rate change operation to
 * propagate up to clk's parent; whether or not this happens depends on the
 * outcome of clk's .round_rate implementation.  If *parent_rate is unchanged
 * after calling .round_rate then upstream parent propagation is ignored.  If
 * *parent_rate comes back with a new rate for clk's parent then we propagate
 * up to clk's parent and set its rate.  Upward propagation will continue
 * until either a clk does not support the CLK_SET_RATE_PARENT flag or
 * .round_rate stops requesting changes to clk's parent_rate.
 *
 * Rate changes are accomplished via tree traversal that also recalculates the
 * rates for the clocks and fires off POST_RATE_CHANGE notifiers.
 *
 * Returns 0 on success, -EERROR otherwise.
 */

int clk_set_rate(struct clk *clk, unsigned long rate)
{
        int ret; 

        if (!clk)
                return 0;

        /* prevent racing with updates to the clock topology */
        clk_prepare_lock();

        if (clk->exclusive_count)
                clk_core_rate_unprotect(clk->core);
 
        ret = clk_core_set_rate_nolock(clk->core, rate);

        if (clk->exclusive_count)
                clk_core_rate_protect(clk->core);
        
        clk_prepare_unlock();
                
        return ret;     
} 
EXPORT_SYMBOL_GPL(clk_set_rate);

Requests a change of the clock rate.

For a clock that uses the CLK_SET_RATE_PARENT flag, the rate setting is propagated to its parent clock.

clk_core_set_rate_nolock()

drivers/clk/clk.c

static int clk_core_set_rate_nolock(struct clk_core *core,
                                    unsigned long req_rate)
{
        struct clk_core *top, *fail_clk;
        unsigned long rate = req_rate;
        int ret = 0;

        if (!core)
                return 0;

        rate = clk_core_req_round_rate_nolock(core, req_rate);
        /* bail early if nothing to do */
        if (rate == clk_core_get_rate_nolock(core))
                return 0;

        /* fail on a direct rate set of a protected provider */
        if (clk_core_rate_is_protected(core))
                return -EBUSY;
        /* calculate new rates and get the topmost changed clock */
        top = clk_calc_new_rates(core, rate);
        if (!top)
                return -EINVAL;

        /* notify that we are about to change rates */
        fail_clk = clk_propagate_rate_change(top, PRE_RATE_CHANGE);
        if (fail_clk) {
                pr_debug("%s: failed to set %s rate\n", __func__,
                                fail_clk->name);
                clk_propagate_rate_change(top, ABORT_RATE_CHANGE);
                return -EBUSY;
        }

        /* change the rates */
        clk_change_rate(top);

        core->req_rate = req_rate;
err:
        clk_pm_runtime_put(core);
        return ret;
}

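Sets the rate of the clock. Returns 0 on success. The function first walks upward from the current clock (curr) to the topmost clock core that needs to change (top), computing the new rates on the way, and then recalculates the rates from the top clock down to the lowest related clocks (bottom).

Stage 1 – Check whether a new rate is needed (top <- curr)

In code lines 11~14, the closest rate that the clock hw supports for @req_rate is obtained and compared with the currently configured rate. If the rate does not change, there is nothing to apply and the success value 0 is returned.
In code lines 17~18, -EBUSY is returned if rate changes are protected.
- When a consumer holds the clock core exclusively, other users cannot set its rate.
- A clock core with the CLK_SET_RATE_GATE flag cannot have its rate set while it is prepared and its gate is open.

Stage 2 – Compute the new rate & parent (top <- curr)

In code lines 20~22, starting from this clock core and moving up to the highest clock core that needs to change, the closest new rate that each clock hw supports for @req_rate is computed and stored. The topmost clock core that needs to change is returned.
- For clock cores that use the CLK_SET_RATE_PARENT flag, the rate is decided by their parent clock core.
- Depending on the type of the clock core that decides the rate, the rate is recalculated as follows.
  - Mux type
    - The determine_rate() function calculates which parent clock core must be selected to get closest to the rate requested for the node.
  - Other rate-adjusting types
    - The round_rate() function calculates which ratio must be used to get closest to the rate requested for the node.

Stage 3 – Notification check before applying the new rate & parent (top -> bottom)

In code lines 25~31, PRE_RATE_CHANGE is sent to the clock cores that are about to change so that they can prepare for the new rate. If this fails, ABORT_RATE_CHANGE is sent to roll back.
- When the rate change is propagated up to parent clock cores, the notified callbacks decide whether the change is allowed for the clock core in question.

Stage 4 – Apply the new rate & parent (top -> bottom)

In code line 34, the computed new rates and new parent clocks are applied and notified, from the top clock core down to the last child clock core.
In code line 36, the req_rate of the current clock core is recorded.
In code line 38, the runtime PM reference is dropped so that the device may enter its power-saving state when appropriate.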
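The order of the checks in the four stages can be condensed into a single-clock model. This is a deliberately simplified sketch: `struct sim_rate_clk` and `sim_set_rate()` are hypothetical, the `step` field is an assumed hardware granularity, and the tree walk of stages 2 to 4 is reduced to one clock.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical single clock; step must be non-zero. */
struct sim_rate_clk {
	unsigned long rate;
	unsigned long step;	/* assumed hardware granularity in Hz */
	bool rate_protected;	/* exclusive or CLK_SET_RATE_GATE style protection */
	bool notifier_vetoes;	/* a PRE_RATE_CHANGE listener refuses */
};

static int sim_set_rate(struct sim_rate_clk *clk, unsigned long req_rate)
{
	/* stage 1: round the request, bail out early if nothing changes */
	unsigned long rate = req_rate - (req_rate % clk->step);

	if (rate == clk->rate)
		return 0;

	if (clk->rate_protected)
		return -EBUSY;

	/* stage 2: compute the new rate (trivial for a single clock) */
	if (rate == 0)
		return -EINVAL;

	/* stage 3: ask the listeners before touching anything */
	if (clk->notifier_vetoes)
		return -EBUSY;

	/* stage 4: commit the new rate */
	clk->rate = rate;
	return 0;
}
```

Note that the early return of stage 1 comes before the protection check, so a request that rounds to the current rate succeeds even on a protected clock.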

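The following figure shows the stages the clocks F->G go through when the rate is changed at clock F. (successful case)

The following figure shows a failure case for the same procedure.

Stage 1 – Check whether a new rate is needed

The APIs related to computing the best rate are as follows.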

clk_round_rate()
__clk_round_rate()
- The lock must already be held before calling.
__clk_determine_rate()
- The lock must already be held before calling.
clk_hw_round_rate()

clk_round_rate()

drivers/clk/clk.c

/**
 * clk_round_rate - round the given rate for a clk
 * @clk: the clk for which we are rounding a rate
 * @rate: the rate which is to be rounded
 *
 * Takes in a rate as input and rounds it to a rate that the clk can actually
 * use which is then returned.  If clk doesn't support round_rate operation
 * then the parent rate is returned.
 */

long clk_round_rate(struct clk *clk, unsigned long rate)
{
        struct clk_rate_request req;
        int ret;

        if (!clk)
                return 0;

        clk_prepare_lock();

        if (clk->exclusive_count)
                clk_core_rate_unprotect(clk->core);

        clk_core_get_boundaries(clk->core, &req.min_rate, &req.max_rate);
        req.rate = rate;

        ret = clk_core_round_rate_nolock(clk->core, &req);

        if (clk->exclusive_count)
                clk_core_rate_protect(clk->core);

        clk_prepare_unlock();

        if (ret)
                return ret;

        return req.rate;
}
EXPORT_SYMBOL_GPL(clk_round_rate);

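Returns the closest rate that the clock hw supports for the requested @rate. If the clock does not support the (*round_rate) op, the parent clock's rate is returned.

In code lines 11~12, if this consumer holds the clock exclusively, the protection is released temporarily so that the closest rate can be computed.
In code line 14, the min_rate and max_rate boundary values are obtained from the consumers (clk users) of this clock core.
In code line 17, the closest rate that the clock hw supports for the requested rate is computed. If the clock does not support ops->round_rate, the parent clock's rate is used.
In code lines 19~20, if the clock core is held exclusively, the protection is restored now that the rounding is complete.
In code line 27, the rate value is returned.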

__clk_round_rate()

drivers/clk/clk.c

/**
 * __clk_round_rate - round the given rate for a clk
 * @clk: round the rate of this clock
 * @rate: the rate which is to be rounded
 *
 * Caller must hold prepare_lock.  Useful for clk_ops such as .set_rate
 */
unsigned long __clk_round_rate(struct clk *clk, unsigned long rate)
{
        unsigned long min_rate;
        unsigned long max_rate;

        if (!clk)
                return 0;

        clk_core_get_boundaries(clk->core, &min_rate, &max_rate);

        return clk_core_round_rate_nolock(clk->core, rate, min_rate, max_rate);
}
EXPORT_SYMBOL_GPL(__clk_round_rate);

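Returns the closest rate that the clock hw supports for the requested @rate. If the clock does not support the (*round_rate) op, the parent clock's rate is returned. The caller must hold prepare_lock.

In code line 16, the min_rate and max_rate boundary values are obtained from the consumers (clk users) of this clock core.
In code line 18, the closest rate that the clock hw supports for the requested @rate is returned. If the clock does not support ops->round_rate, the parent clock's rate is returned.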

clk_hw_round_rate()

drivers/clk/clk.c

unsigned long clk_hw_round_rate(struct clk_hw *hw, unsigned long rate)
{
        int ret;
        struct clk_rate_request req;

        clk_core_get_boundaries(hw->core, &req.min_rate, &req.max_rate);
        req.rate = rate;

        ret = clk_core_round_rate_nolock(hw->core, &req);
        if (ret)
                return 0;

        return req.rate;
}
EXPORT_SYMBOL_GPL(clk_hw_round_rate);

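Returns the closest rate that the clock hw supports for the requested @rate. Returns 0 on failure.

In code line 6, the min_rate and max_rate boundary values are obtained from the consumers (clk users) of this clock core.
In code lines 9~11, the rate closest to @rate that the clock hw supports is computed. If this fails, for example because the min_rate and max_rate bounds cannot be satisfied, the failure value 0 is returned.
In code line 13, the determined rate value is returned.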

__clk_determine_rate()

drivers/clk/clk.c

/**
 * __clk_determine_rate - get the closest rate actually supported by a clock
 * @hw: determine the rate of this clock
 * @req: target rate request
 *
 * Useful for clk_ops such as .set_rate and .determine_rate.
 */

int __clk_determine_rate(struct clk_hw *hw, struct clk_rate_request *req)
{
        if (!hw) {
                req->rate = 0;
                return 0;
        }

        return clk_core_round_rate_nolock(hw->core, req);
}
EXPORT_SYMBOL_GPL(__clk_determine_rate);

Finds the closest rate that the clock hw supports for the requested rate and stores it in req->rate. Returns 0 on success.

clk_core_req_round_rate_nolock()

drivers/clk/clk.c

static unsigned long clk_core_req_round_rate_nolock(struct clk_core *core,
                                                     unsigned long req_rate)
{
        int ret, cnt;
        struct clk_rate_request req;

        lockdep_assert_held(&prepare_lock);

        if (!core)
                return 0;

        /* simulate what the rate would be if it could be freely set */
        cnt = clk_core_rate_nuke_protect(core);
        if (cnt < 0)
                return cnt;

        clk_core_get_boundaries(core, &req.min_rate, &req.max_rate);
        req.rate = req_rate;

        ret = clk_core_round_rate_nolock(core, &req);

        /* restore the protection */
        clk_core_rate_restore_protect(core, cnt);

        return ret ? 0 : req.rate;
}

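Returns the rate closest to @req_rate that the clock hw supports. Returns 0 on failure.

In code lines 9~10, 0 is returned if no clock core is given.
In code lines 13~15, if the clock core is protected, the protection is lifted temporarily.
In code line 17, the min_rate and max_rate boundary values are obtained from the consumers (clk users) of this clock core.
In code line 20, the closest rate that the clock hw supports for the request is computed. If the min_rate and max_rate bounds cannot be satisfied, an error (negative) value is obtained.
In code line 23, if the clock core was temporarily unprotected, the protection is restored to its previous state.
In code line 25, the determined rate value is returned. On failure, 0 is returned.

The following figure shows the call sequence used to decide whether a rate change is needed.

min_rate & max_rate boundary calculation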

clk_core_get_boundaries()

drivers/clk/clk.c

static void clk_core_get_boundaries(struct clk_core *core,
                                    unsigned long *min_rate,
                                    unsigned long *max_rate)
{
        struct clk *clk_user;

        lockdep_assert_held(&prepare_lock);

        *min_rate = core->min_rate;
        *max_rate = core->max_rate;

        hlist_for_each_entry(clk_user, &core->clks, clks_node)
                *min_rate = max(*min_rate, clk_user->min_rate);

        hlist_for_each_entry(clk_user, &core->clks, clks_node)
                *max_rate = min(*max_rate, clk_user->max_rate);
}

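Obtains the min_rate and max_rate boundary values from the consumers (clk users) attached to this clock core, starting from the core's own min_rate and max_rate.

min_rate
- the largest of the min_rate values requested by the consumers
max_rate
- the smallest of the max_rate values requested by the consumers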
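The "largest minimum, smallest maximum" rule is an interval intersection. A minimal sketch, assuming the consumers are given as a plain array (`struct sim_consumer` and `sim_get_boundaries()` are hypothetical names; the kernel walks an hlist instead):

```c
#include <stddef.h>

/* Hypothetical per-consumer constraint, like clk->min_rate / clk->max_rate. */
struct sim_consumer {
	unsigned long min_rate;
	unsigned long max_rate;
};

/* Intersect the core's own range with the range of every consumer. */
static void sim_get_boundaries(unsigned long core_min, unsigned long core_max,
			       const struct sim_consumer *users, size_t n,
			       unsigned long *min_rate, unsigned long *max_rate)
{
	size_t i;

	*min_rate = core_min;
	*max_rate = core_max;

	for (i = 0; i < n; i++) {
		if (users[i].min_rate > *min_rate)
			*min_rate = users[i].min_rate;	/* max of the minimums */
		if (users[i].max_rate < *max_rate)
			*max_rate = users[i].max_rate;	/* min of the maximums */
	}
}
```

For consumers requesting [100, 1000], [300, 800] and [200, 900], the resulting boundary is [300, 800]. If the ranges do not overlap, min_rate ends up larger than max_rate and no rate can satisfy every consumer.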

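The following figure shows how the min_rate and max_rate boundary values are obtained from the consumers of a clock. (Note that in the figure min_rate is the larger number compared to max_rate.)

The following two functions are examined in turn.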

clk_core_round_rate_nolock()
clk_core_determine_round_nolock()

clk_core_round_rate_nolock()

drivers/clk/clk.c

static int clk_core_round_rate_nolock(struct clk_core *core,
                                      struct clk_rate_request *req)
{
        lockdep_assert_held(&prepare_lock);

        if (!core) {
                req->rate = 0;
                return 0;
        }

        clk_core_init_rate_req(core, req);

        if (clk_core_can_round(core))
                return clk_core_determine_round_nolock(core, req);
        else if (core->flags & CLK_SET_RATE_PARENT)
                return clk_core_round_rate_nolock(core->parent, req);

        req->rate = core->rate;
        return 0;
}

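Finds the closest rate that the clock hw supports for the rate request, stores it in req->rate, and returns 0 on success.

In code lines 6~9, if there is no clock core (which can happen on a recursive call for a missing parent), 0 is stored in req->rate and the success value 0 is returned.
In code line 11, before the rate request is processed, the best result is initialized with the current parent clock and its rate.
In code lines 13~14, if the clock implements ops->determine_rate or ops->round_rate, the hook is called to compute a rate that fits the hw, and the result is returned.
In code lines 15~16, a clock that uses the CLK_SET_RATE_PARENT flag delegates the rate calculation to its parent clock by calling this function recursively.
In code lines 18~19, for any other clock, the current rate of the clock is used and the success value 0 is returned.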
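The three-way decision above (round locally, delegate to the parent, or keep the current rate) can be sketched as follows. `struct sim_rr_clk` and `sim_round_rate()` are hypothetical; a non-zero `step` stands in for "the clock implements a rounding hook" and rounds down to a multiple of that step.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical clock node for the rounding decision. */
struct sim_rr_clk {
	struct sim_rr_clk *parent;
	unsigned long rate;
	bool set_rate_parent;	/* stands in for CLK_SET_RATE_PARENT */
	unsigned long step;	/* 0: no rounding hook available */
};

static unsigned long sim_round_rate(const struct sim_rr_clk *clk,
				    unsigned long req)
{
	if (!clk)
		return 0;	/* no clock: nothing to offer */

	if (clk->step)
		return req - (req % clk->step);	/* clock can round itself */

	if (clk->set_rate_parent)
		return sim_round_rate(clk->parent, req);	/* ask the parent */

	return clk->rate;	/* fixed: only the current rate is possible */
}
```

A pass-through gate with set_rate_parent set therefore reports whatever its parent can produce, while the same gate without the flag always reports its current rate.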

clk_core_determine_round_nolock()

drivers/clk/clk.c

static int clk_core_determine_round_nolock(struct clk_core *core,
                                           struct clk_rate_request *req)
{
        long rate;

        lockdep_assert_held(&prepare_lock);

        if (!core)
                return 0;

        /*
         * At this point, core protection will be disabled if
         * - if the provider is not protected at all
         * - if the calling consumer is the only one which has exclusivity
         *   over the provider
         */
        if (clk_core_rate_is_protected(core)) {
                req->rate = core->rate;
        } else if (core->ops->determine_rate) {
                return core->ops->determine_rate(core->hw, req);
        } else if (core->ops->round_rate) {
                rate = core->ops->round_rate(core->hw, req->rate,
                                             &req->best_parent_rate);
                if (rate < 0)
                        return rate;

                req->rate = rate;
        } else {
                return -EINVAL;
        }

        return 0;
}

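Finds the closest rate that the clock hw supports for the rate request, stores it in req->rate, and returns 0 on success.

In code lines 8~9, 0 is returned if no clock core is given.
In code lines 17~18, if the clock core is protected, the current rate of the clock core is assigned to req->rate as the result.
In code lines 19~20, the (*determine_rate) hook, as provided by mux-type clocks, is called and its result is returned.
In code lines 21~27, the (*round_rate) hook, as provided by rate-type clocks, is called to obtain the rate, which is stored in req->rate, and the success value 0 is returned. If neither hook exists, -EINVAL is returned.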

Initialization before a rate request

clk_core_init_rate_req()

drivers/clk/clk.c

static void clk_core_init_rate_req(struct clk_core * const core,
                                   struct clk_rate_request *req)
{
        struct clk_core *parent;

        if (WARN_ON(!core || !req))
                return;

        parent = core->parent;
        if (parent) {
                req->best_parent_hw = parent->hw;
                req->best_parent_rate = parent->rate;
        } else {
                req->best_parent_hw = NULL;
                req->best_parent_rate = 0;
        }
}

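Initializes the best result with the current parent clock and its rate.

If the clock has a parent, best_parent_hw and best_parent_rate are set to the current parent's hw and rate; otherwise they are set to NULL and 0.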

clk_core_can_round()

drivers/clk/clk.c

static bool clk_core_can_round(struct clk_core * const core)
{
        return core->ops->determine_rate || core->ops->round_rate;
}

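Returns whether the clock supports rounding, that is, whether it implements determine_rate or round_rate.

The (*determine_rate) hook is typically provided by mux-type clocks.
The (*round_rate) hook is typically provided by rate-type clocks.

Stage 2 – Compute the new rate & parent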

clk_calc_new_rates()

drivers/clk/clk.c

/*
 * calculate the new rates returning the topmost clock that has to be
 * changed.
 */

static struct clk_core *clk_calc_new_rates(struct clk_core *core,
                                           unsigned long rate)
{
        struct clk_core *top = core;
        struct clk_core *old_parent, *parent;
        unsigned long best_parent_rate = 0;
        unsigned long new_rate;
        unsigned long min_rate;
        unsigned long max_rate;
        int p_index = 0;
        long ret;

        /* sanity */
        if (IS_ERR_OR_NULL(core))
                return NULL;

        /* save parent rate, if it exists */
        parent = old_parent = core->parent;
        if (parent)
                best_parent_rate = parent->rate;

        clk_core_get_boundaries(core, &min_rate, &max_rate);

        /* find the closest rate and parent clk/rate */
        if (clk_core_can_round(core)) {
                struct clk_rate_request req;

                req.rate = rate;
                req.min_rate = min_rate;
                req.max_rate = max_rate;

                clk_core_init_rate_req(core, &req);

                ret = clk_core_determine_round_nolock(core, &req);
                if (ret < 0)
                        return NULL;

                best_parent_rate = req.best_parent_rate;
                new_rate = req.rate;
                parent = req.best_parent_hw ? req.best_parent_hw->core : NULL;

                if (new_rate < min_rate || new_rate > max_rate)
                        return NULL;
        } else if (!parent || !(core->flags & CLK_SET_RATE_PARENT)) {
                /* pass-through clock without adjustable parent */
                core->new_rate = core->rate;
                return NULL;
        } else {
                /* pass-through clock with adjustable parent */
                top = clk_calc_new_rates(parent, rate);
                new_rate = parent->new_rate;
                goto out;
        }

        /* some clocks must be gated to change parent */
        if (parent != old_parent &&
            (core->flags & CLK_SET_PARENT_GATE) && core->prepare_count) {
                pr_debug("%s: %s not gated but wants to reparent\n",
                         __func__, core->name);
                return NULL;
        }

        /* try finding the new parent index */
        if (parent && core->num_parents > 1) {
                p_index = clk_fetch_parent_index(core, parent);
                if (p_index < 0) {
                        pr_debug("%s: clk %s can not be parent of clk %s\n",
                                 __func__, parent->name, core->name);
                        return NULL;
                }
        }

        if ((core->flags & CLK_SET_RATE_PARENT) && parent &&
            best_parent_rate != parent->rate)
                top = clk_calc_new_rates(parent, best_parent_rate);

out:
        clk_calc_subtree(core, new_rate, parent, p_index);

        return top;
}

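Computes the new rates from the current clock core up to the highest clock that needs to change, and returns the topmost clock core for which a new rate was computed.

In code line 18, the parent clock core of the requested clock core is looked up and stored in parent and old_parent.
In code lines 19~20, if a parent clock core exists, its rate is stored in best_parent_rate.
In code line 22, the min_rate and max_rate boundary values are obtained from the consumers (clk users) of this clock core.
In code lines 25~43, if a hook for finding the best rate is supported, it is called to obtain the best rate value. NULL is returned if the result falls outside the min_rate and max_rate bounds.
- Reference:
  - drivers/clk/ti/mux.c – __clk_mux_determine_rate()
  - drivers/clk/ti/divider.c – ti_clk_divider_round_rate()
In code lines 44~47, if there is no parent clock or the CLK_SET_RATE_PARENT flag is not set, the rate cannot be adjusted: new_rate is set to the current rate and NULL is returned.
In code lines 48~53, to compute the rate of the parent clock, this function is called recursively with the parent clock and the requested rate, and the parent's new_rate is taken over.
- For gate-type clocks, which have neither ops->determine_rate nor ops->round_rate, the walk moves up through the parents until a clock without the CLK_SET_RATE_PARENT flag is reached.
In code lines 56~61, if the rate request requires a mux-type clock core to switch to a different parent, the clock core has the CLK_SET_PARENT_GATE flag set, and its gate is open (prepared), NULL is returned.
- For a mux clock core that uses the CLK_SET_PARENT_GATE flag, changing the parent clock core fails unless the gate is closed.
In code lines 64~71, if a mux-type clock core has two or more possible parents, the index of the newly chosen parent is looked up. If it cannot be found, NULL is returned.
In code lines 73~75, if the clock core has the CLK_SET_RATE_PARENT flag set and the parent's rate has to change, this function is called recursively with the parent clock and best_parent_rate, moving up the tree to find the highest parent clock whose rate changes.
- With CLK_SET_RATE_PARENT, when the requested rate cannot be produced by this clock's hw alone, the rate of the parent clock is changed as well.
In code lines 77~78, this is the out: label. The rates are recalculated for this clock and all the child clocks connected below it; when the function was called recursively, this starts from the highest parent clock whose rate changes.
In code line 80, the topmost changed clock core is returned.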

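The following figure shows how the closest new rate that the clock hw supports is computed for the requested rate.

The following figure shows the case where a rate change is requested at clock F: the rates are computed from clock F up to D and then recalculated from clock D down to G.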

clk_fetch_parent_index()

drivers/clk/clk.c

static int clk_fetch_parent_index(struct clk_core *core,
                                  struct clk_core *parent)
{
        int i;

        if (!parent)
                return -EINVAL;

        for (i = 0; i < core->num_parents; i++) {
                /* Found it first try! */
                if (core->parents[i].core == parent)
                        return i;

                /* Something else is here, so keep looking */
                if (core->parents[i].core)
                        continue;

                /* Maybe core hasn't been cached but the hw is all we know? */
                if (core->parents[i].hw) {
                        if (core->parents[i].hw == parent->hw)
                                break;

                        /* Didn't match, but we're expecting a clk_hw */
                        continue;
                }

                /* Maybe it hasn't been cached (clk_set_parent() path) */
                if (parent == clk_core_get(core, i))
                        break;

                /* Fallback to comparing globally unique names */
                if (core->parents[i].name &&
                    !strcmp(parent->name, core->parents[i].name))
                        break;
        }

        if (i == core->num_parents)
                return -EINVAL;

        core->parents[i].core = parent;
        return i;
}

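Looks up the index that corresponds to the parent clock core @parent of the current clock core @core. On failure a negative error is returned.

In code lines 9~12, the parents are iterated num_parents times; if an entry is identical to @parent, its index is returned.
In code lines 15~16, an entry that already holds a different clock core is skipped.
In code lines 19~25, if an entry with the same parent hw is found, the loop is left in order to return that index.
In code lines 28~29, if the parent clock resolved for this index is identical to @parent, the loop is left in order to return that index.
In code lines 32~34, the entries are compared by name; if a parent clock core with the same name is found, the loop is left in order to return that index.
In code lines 37~38, -EINVAL is returned if the search failed.
In code lines 40~41, the parent clock core is cached in the parents[] map and the index is returned.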
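The lookup-then-cache pattern can be reduced to its pointer and name comparisons. In this sketch `struct sim_parent_clk`, `struct sim_parent_slot` and `sim_fetch_parent_index()` are hypothetical names, and the hw and clk_core_get() steps of the real function are left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical parent clock, identified by a globally unique name. */
struct sim_parent_clk {
	const char *name;
};

/* Hypothetical entry of a parents[] map: a name plus a cached pointer. */
struct sim_parent_slot {
	const char *name;
	const struct sim_parent_clk *cached;
};

/* Returns the index of parent in slots[], caching it on a name match. */
static int sim_fetch_parent_index(struct sim_parent_slot *slots, int n,
				  const struct sim_parent_clk *parent)
{
	int i;

	if (!parent)
		return -1;

	for (i = 0; i < n; i++) {
		if (slots[i].cached == parent)
			return i;	/* found via the cache */

		if (slots[i].cached)
			continue;	/* slot belongs to another clock */

		if (slots[i].name && !strcmp(slots[i].name, parent->name)) {
			slots[i].cached = parent;	/* remember for next time */
			return i;
		}
	}

	return -1;	/* parent is not a possible parent of this clock */
}
```

The first lookup of a parent pays for the string comparison; later lookups hit the cached pointer immediately.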

Stage 3 – Notification check before applying the new rate & parent

Notification check for rate changes

clk_propagate_rate_change()

drivers/clk/clk.c

/*
 * Notify about rate changes in a subtree. Always walk down the whole tree
 * so that in case of an error we can walk down the whole tree again and
 * abort the change.
 */

static struct clk_core *clk_propagate_rate_change(struct clk_core *core,
                                                  unsigned long event)
{
        struct clk_core *child, *tmp_clk, *fail_clk = NULL;
        int ret = NOTIFY_DONE;

        if (core->rate == core->new_rate)
                return NULL;

        if (core->notifier_count) {
                ret = __clk_notify(core, event, core->rate, core->new_rate);
                if (ret & NOTIFY_STOP_MASK)
                        fail_clk = core;
        }

        hlist_for_each_entry(child, &core->children, child_node) {
                /* Skip children who will be reparented to another clock */
                if (child->new_parent && child->new_parent != core)
                        continue;
                tmp_clk = clk_propagate_rate_change(child, event);
                if (tmp_clk)
                        fail_clk = tmp_clk;
        }

        /* handle the new child who might not be in core->children yet */
        if (core->new_child) {
                tmp_clk = clk_propagate_rate_change(core->new_child, event);
                if (tmp_clk)
                        fail_clk = tmp_clk;
        }

        return fail_clk;
}

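Notifies the rate change to the requested clock core and to every clock core in the subtree connected below it. Returns NULL on success, or the clock core that failed.

In code lines 7~8, if the rate does not change, the success value NULL is returned.
In code lines 10~14, if notifiers are registered for the clock core, the rate change request is notified. If the result contains NOTIFY_BAD or NOTIFY_STOP, this clock core is assigned to fail_clk, which is used as the return value.
In code lines 16~19, the child clock cores are iterated; a child that is about to be reparented to a different clock (its new parent is not the requested clock) is skipped.
In code lines 20~22, the rate change request is notified to the child clock cores. If this fails, the clock core that returned the error is stored in fail_clk, which is used as the return value.
In code lines 26~30, if a clock core is about to be attached to the requested clock core (it will soon appear in children), the rate change request is notified to that new_child clock as well. If this fails, the clock core that returned the error is stored in fail_clk, which is used as the return value.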
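The key property of this walk is that it does not stop at the first failure: the whole subtree is visited so that a later abort pass reaches the same clocks. A user-space sketch with hypothetical names (`struct sim_tree_clk`, `sim_propagate()`), using a first-child/next-sibling tree and a `veto` flag in place of real notifier callbacks:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tree node; children are linked as first child + siblings. */
struct sim_tree_clk {
	const char *name;
	unsigned long rate;
	unsigned long new_rate;
	bool veto;			/* a listener on this clock refuses */
	int notified;			/* how often this clock was notified */
	struct sim_tree_clk *child;
	struct sim_tree_clk *sibling;
};

/* Returns NULL if everyone agreed, otherwise a clock that refused. */
static struct sim_tree_clk *sim_propagate(struct sim_tree_clk *clk)
{
	struct sim_tree_clk *c, *tmp, *fail = NULL;

	if (clk->rate == clk->new_rate)
		return NULL;	/* unchanged subtree: nothing to announce */

	clk->notified++;
	if (clk->veto)
		fail = clk;

	/* keep walking even after a failure so every clock is visited */
	for (c = clk->child; c; c = c->sibling) {
		tmp = sim_propagate(c);
		if (tmp)
			fail = tmp;
	}

	return fail;
}
```

If one of two siblings vetoes, the other sibling is still notified and the vetoing clock is what the caller gets back.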

The following figure shows how the clocks registered for notification are consulted before the change to the new rate.

__clk_notify()

drivers/clk/clk.c

/**
 * __clk_notify - call clk notifier chain
 * @core: clk that is changing rate
 * @msg: clk notifier type (see include/linux/clk.h)
 * @old_rate: old clk rate
 * @new_rate: new clk rate
 *
 * Triggers a notifier call chain on the clk rate-change notification
 * for 'clk'.  Passes a pointer to the struct clk and the previous
 * and current rates to the notifier callback.  Intended to be called by
 * internal clock code only.  Returns NOTIFY_DONE from the last driver
 * called if all went well, or NOTIFY_STOP or NOTIFY_BAD immediately if
 * a driver returns that.
 */

static int __clk_notify(struct clk_core *core, unsigned long msg,
                unsigned long old_rate, unsigned long new_rate)
{
        struct clk_notifier *cn;
        struct clk_notifier_data cnd;
        int ret = NOTIFY_DONE;

        cnd.old_rate = old_rate;
        cnd.new_rate = new_rate;

        list_for_each_entry(cn, &clk_notifier_list, node) {
                if (cn->clk->core == core) {
                        cnd.clk = cn->clk;
                        ret = srcu_notifier_call_chain(&cn->notifier_head, msg,
                                        &cnd);
                        if (ret & NOTIFY_STOP_MASK)
                                return ret;
                }
        }

        return ret;
}

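Delivers the message to every entry on the notifier chain registered for the current clock in the clock notifier list, using an SRCU notifier chain. If all went well, the result of the last callback (normally NOTIFY_DONE) is returned. If a callback returns NOTIFY_STOP or NOTIFY_BAD, that value is returned immediately.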
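The early-exit rule can be illustrated with a small user-space sketch. The NOTIFY_* values below mirror include/linux/notifier.h; everything else (`notifier_fn`, `call_chain()`, `accept_change()`, `reject_change()`) is a hypothetical stand-in for the SRCU notifier chain, which this sketch does not try to reproduce.

```c
#include <assert.h>
#include <stddef.h>

/* Values as defined in include/linux/notifier.h */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

/* Hypothetical callback type: receives only the message. */
typedef int (*notifier_fn)(unsigned long msg);

/* Call each callback in order; stop at the first one that sets the STOP bit. */
int call_chain(const notifier_fn *chain, size_t n, unsigned long msg)
{
    int ret = NOTIFY_DONE;
    size_t i;

    assert(chain != NULL || n == 0);

    for (i = 0; i < n; i++) {
        ret = chain[i](msg);
        if (ret & NOTIFY_STOP_MASK)
            break;
    }

    return ret;
}

int accept_change(unsigned long msg) { (void)msg; return NOTIFY_OK; }
int reject_change(unsigned long msg) { (void)msg; return NOTIFY_BAD; }
```

Because both NOTIFY_BAD and NOTIFY_STOP contain NOTIFY_STOP_MASK, a single bit test is enough to detect either of them.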

Clock rate change notification (notify) registration API

clk_notifier_register()

drivers/clk/clk.c

/***        clk rate change notifiers        ***/

/**
 * clk_notifier_register - add a clk rate change notifier
 * @clk: struct clk * to watch
 * @nb: struct notifier_block * with callback info
 *
 * Request notification when clk's rate changes.  This uses an SRCU
 * notifier because we want it to block and notifier unregistrations are
 * uncommon.  The callbacks associated with the notifier must not
 * re-enter into the clk framework by calling any top-level clk APIs;
 * this will cause a nested prepare_lock mutex.
 *
 * In all notification cases cases (pre, post and abort rate change) the
 * original clock rate is passed to the callback via struct
 * clk_notifier_data.old_rate and the new frequency is passed via struct
 * clk_notifier_data.new_rate.
 *
 * clk_notifier_register() must be called from non-atomic context.
 * Returns -EINVAL if called with null arguments, -ENOMEM upon
 * allocation failure; otherwise, passes along the return value of
 * srcu_notifier_chain_register().
 */

int clk_notifier_register(struct clk *clk, struct notifier_block *nb)
{
        struct clk_notifier *cn;
        int ret = -ENOMEM;

        if (!clk || !nb)
                return -EINVAL;

        clk_prepare_lock();

        /* search the list of notifiers for this clk */
        list_for_each_entry(cn, &clk_notifier_list, node)
                if (cn->clk == clk)
                        break;

        /* if clk wasn't in the notifier list, allocate new clk_notifier */
        if (cn->clk != clk) {
                cn = kzalloc(sizeof(*cn), GFP_KERNEL);
                if (!cn)
                        goto out;

                cn->clk = clk;
                srcu_init_notifier_head(&cn->notifier_head);

                list_add(&cn->node, &clk_notifier_list);
        }

        ret = srcu_notifier_chain_register(&cn->notifier_head, nb);

        clk->core->notifier_count++;

out:
        clk_prepare_unlock();

        return ret;
}
EXPORT_SYMBOL_GPL(clk_notifier_register);

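Registers a notifier_block on the clock's notifier chain.

In code lines 12~14, searches clk_notifier_list for the requested clock.
In code lines 17~26, if it is not found, allocates a clk_notifier structure, fills in the clock information and adds it to clk_notifier_list.
In code line 28, adds the requested notifier_block to the notifier_head member of the clk_notifier structure.
In code line 30, increments the notifier_count of the clock core by 1.

Step 4 - Applying the calculated new rate & parent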

clk_change_rate()

drivers/clk/clk.c -1/2-

/*
 * walk down a subtree and set the new rates notifying the rate
 * change on the way
 */

static void clk_change_rate(struct clk_core *core)
{
        struct clk_core *child;
        struct hlist_node *tmp;
        unsigned long old_rate;
        unsigned long best_parent_rate = 0;
        bool skip_set_rate = false;
        struct clk_core *old_parent;
        struct clk_core *parent = NULL;

        old_rate = core->rate;

        if (core->new_parent) {
                parent = core->new_parent;
                best_parent_rate = core->new_parent->rate;
        } else if (core->parent) {
                parent = core->parent;
                best_parent_rate = core->parent->rate;
        }

        if (clk_pm_runtime_get(core))
                return;

        if (core->flags & CLK_SET_RATE_UNGATE) {
                unsigned long flags;

                clk_core_prepare(core);
                flags = clk_enable_lock();
                clk_core_enable(core);
                clk_enable_unlock(flags);
        }

        if (core->new_parent && core->new_parent != core->parent) {
                old_parent = __clk_set_parent_before(core, core->new_parent);
                trace_clk_set_parent(core, core->new_parent);

                if (core->ops->set_rate_and_parent) {
                        skip_set_rate = true;
                        core->ops->set_rate_and_parent(core->hw, core->new_rate,
                                        best_parent_rate,
                                        core->new_parent_index);
                } else if (core->ops->set_parent) {
                        core->ops->set_parent(core->hw, core->new_parent_index);
                }

                trace_clk_set_parent_complete(core, core->new_parent);
                __clk_set_parent_after(core, core->new_parent, old_parent);
        }

        if (core->flags & CLK_OPS_PARENT_ENABLE)
                clk_core_prepare_enable(parent);

        trace_clk_set_rate(core, core->new_rate);

        if (!skip_set_rate && core->ops->set_rate)
                core->ops->set_rate(core->hw, core->new_rate, best_parent_rate);

        trace_clk_set_rate_complete(core, core->new_rate);

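Applies the calculated new rate and new parent clock from the requested clock core down to the last child clock core, sending notifications along the way.

In code lines 11~19, the current rate of the clock core is saved, and the parent clock that will be in effect after the change is recorded in the following variables:
- parent & best_parent_rate
In code lines 21~22, if the clock supports runtime power management and is suspended, it is woken up. If that fails, the function returns.
In code lines 24~31, for clock hw whose rate can only be changed while the gate is open, the clock is temporarily prepared & enabled.
In code lines 33~48, if the parent clock changes, the following is done:
- Performs the work needed before selecting (switching to) the new parent clock (input source).
- For a clock device driver that can switch the mux and change the rate at the same time (e.g. a pll), calls its ops->set_rate_and_parent hook to program the new parent clock (input clock source) and the new rate in hw.
- Otherwise, for a clock device driver that only supports the mux, calls its ops->set_parent hook to select (switch to) the new parent clock (input clock source) in hw.
- In code line 47, performs the work needed after selecting the new parent clock (input source).
In code lines 50~51, for clock hw that can only be operated while its parent clock is enabled, the parent clock is temporarily prepared & enabled.
In code lines 55~56, unless the rate was already programmed just above, calls the ops->set_rate hook of the clock device driver to set the rate of the clock hw.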
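The choice between the three hooks, and the role of skip_set_rate, can be modelled in a few lines. This is only a sketch under simplifying assumptions: `struct ops`, the `DID_*` bits and `apply_change()` are hypothetical, and the hooks merely record that they ran instead of touching hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of struct clk_ops; any hook may be NULL. */
struct ops {
    void (*set_rate_and_parent)(int *log);
    void (*set_parent)(int *log);
    void (*set_rate)(int *log);
};

enum { DID_COMBINED = 1, DID_PARENT = 2, DID_RATE = 4 };

void do_combined(int *log) { *log |= DID_COMBINED; }
void do_parent(int *log)   { *log |= DID_PARENT; }
void do_rate(int *log)     { *log |= DID_RATE; }

/* Returns a bitmask of the hooks that were invoked. */
int apply_change(const struct ops *ops, bool parent_changes)
{
    bool skip_set_rate = false;
    int log = 0;

    assert(ops != NULL);

    if (parent_changes) {
        if (ops->set_rate_and_parent) {
            /* The rate is programmed together with the parent. */
            skip_set_rate = true;
            ops->set_rate_and_parent(&log);
        } else if (ops->set_parent) {
            ops->set_parent(&log);
        }
    }

    if (!skip_set_rate && ops->set_rate)
        ops->set_rate(&log);

    return log;
}
```

With all three hooks present and a parent change pending, only the combined hook runs. Without a parent change, or without the combined hook, set_rate still runs on its own.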

drivers/clk/clk.c -2/2-

        core->rate = clk_recalc(core, best_parent_rate);

        if (core->flags & CLK_SET_RATE_UNGATE) {
                unsigned long flags;

                flags = clk_enable_lock();
                clk_core_disable(core);
                clk_enable_unlock(flags);
                clk_core_unprepare(core);
        }

        if (core->flags & CLK_OPS_PARENT_ENABLE)
                clk_core_disable_unprepare(parent);

        if (core->notifier_count && old_rate != core->rate)
                __clk_notify(core, POST_RATE_CHANGE, old_rate, core->rate);

        if (core->flags & CLK_RECALC_NEW_RATES)
                (void)clk_calc_new_rates(core, core->new_rate);

        /*
         * Use safe iteration, as change_rate can actually swap parents
         * for certain clock types.
         */
        hlist_for_each_entry_safe(child, tmp, &core->children, child_node) {
                /* Skip children who will be reparented to another clock */
                if (child->new_parent && child->new_parent != core)
                        continue;
                clk_change_rate(child);
        }

        /* handle the new child who might not be in core->children yet */
        if (core->new_child)
                clk_change_rate(core->new_child);

        clk_pm_runtime_put(core);
}

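In code line 1, calls the (*recalc_rate) op of the clock hw and stores the recalculated rate in the current clock core.
In code lines 3~10, if the current clock core was temporarily prepared & enabled, it is disabled & unprepared again.
In code lines 12~13, if the parent clock core was temporarily prepared & enabled, it is disabled & unprepared again.
In code lines 15~16, if notifiers are registered on this clock and the rate actually changed, POST_RATE_CHANGE is sent as the commit notification.
In code lines 18~19, for a clock with the CLK_RECALC_NEW_RATES flag set, the new rates are calculated again, starting at the current clock core and going up to whichever parent clocks need to change.
- On exynos cpus the recalculation is needed to keep the divider from being set incorrectly.
In code lines 25~30, recursively calls this function for each child clock core so that the change is applied down to the last child clock. Children that will be reparented to another clock are skipped.
In code lines 33~34, if a new child clock core is about to be attached, this function is also called recursively for it and for the clock cores below it.
In code line 36, releases the runtime PM reference taken at the start, so the clock can be suspended again if nothing else needs it.

The following figure shows the call flow when the finally calculated new rate is committed or aborted.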

clk_calc_subtree()

drivers/clk/clk.c

static void clk_calc_subtree(struct clk_core *core, unsigned long new_rate,
                             struct clk_core *new_parent, u8 p_index)
{
        struct clk_core *child;

        core->new_rate = new_rate;
        core->new_parent = new_parent;
        core->new_parent_index = p_index;
        /* include clk in new parent's PRE_RATE_CHANGE notifications */
        core->new_child = NULL;
        if (new_parent && new_parent != core->parent)
                new_parent->new_child = core;

        hlist_for_each_entry(child, &core->children, child_node) {
                child->new_rate = clk_recalc(child, new_rate);
                clk_calc_subtree(child, child->new_rate, NULL, 0);
        }
}

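Updates the new rate, new parent and new parent index of the current clock core, and the new rate of every clock core in the subtree below it.

In code lines 6~8, updates new_rate, new_parent and new_parent_index of the current clock core.
In code lines 10~12, new_child of the current clock core is first reset to null. If the parent changes, the current clock core is stored in new_child of the new parent clock, so that it is included in the new parent's PRE_RATE_CHANGE notifications.
In code lines 14~15, loops over the child clocks and recalculates each child's new_rate from the new rate of the current clock core.
In code line 16, calls this function recursively on the child node so that its subtree is calculated as well.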
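A user-space sketch of this speculative calculation follows. It assumes every node is a fixed divider of its parent; `struct node`, `recalc()` and `calc_subtree()` are hypothetical names, and a `div` of 0 models a clock without a recalc hook, which simply follows its parent's rate. Only new_rate is written, so nothing is committed yet.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical model: each node divides its parent's rate by div. */
struct node {
    unsigned long div;        /* 0 means "no recalc hook": follow the parent */
    unsigned long new_rate;   /* speculative rate, not yet applied */
    struct node *children[MAX_CHILDREN];
    size_t nr_children;
};

/* Derive a node's rate from the given parent rate. */
unsigned long recalc(const struct node *n, unsigned long parent_rate)
{
    return n->div ? parent_rate / n->div : parent_rate;
}

/* Record the speculative rate of n and of everything below it. */
void calc_subtree(struct node *n, unsigned long new_rate)
{
    size_t i;

    assert(n != NULL);

    n->new_rate = new_rate;

    for (i = 0; i < n->nr_children; i++) {
        struct node *child = n->children[i];

        calc_subtree(child, recalc(child, new_rate));
    }
}
```

For a 1 MHz root with a divide-by-2 child and a divide-by-4 grandchild, the recorded new rates are 1000000, 500000 and 125000.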

Recalculating the rate of a single clock (recalc)

clk_recalc()

drivers/clk/clk.c

static unsigned long clk_recalc(struct clk_core *core,
                                unsigned long parent_rate)
{
        unsigned long rate = parent_rate;

        if (core->ops->recalc_rate && !clk_pm_runtime_get(core)) {
                rate = core->ops->recalc_rate(core->hw, parent_rate);
                clk_pm_runtime_put(core);
        }
        return rate;
}

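Calls the (*recalc_rate) op of the clock hw with @parent_rate and returns the recalculated rate of the current clock core. If the op is not implemented, @parent_rate is returned as is.

Rate query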

clk_get_rate()

drivers/clk/clk.c

/**
 * clk_get_rate - return the rate of clk
 * @clk: the clk whose rate is being returned
 *
 * Simply returns the cached rate of the clk, unless CLK_GET_RATE_NOCACHE flag
 * is set, which means a recalc_rate will be issued.
 * If clk is NULL then returns 0.
 */

unsigned long clk_get_rate(struct clk *clk)
{
        if (!clk)
                return 0;

        return clk_core_get_rate(clk->core);
}
EXPORT_SYMBOL_GPL(clk_get_rate);

Returns the rate of the clock. Returns 0 if clk is null.

clk_core_get_rate()

drivers/clk/clk.c

static unsigned long clk_core_get_rate(struct clk_core *core)
{
        unsigned long rate;

        clk_prepare_lock();

        if (core && (core->flags & CLK_GET_RATE_NOCACHE))
                __clk_recalc_rates(core, 0);

        rate = clk_core_get_rate_nolock(core);
        clk_prepare_unlock();

        return rate;
}

Returns the rate of the clock core.

A clock core with the CLK_GET_RATE_NOCACHE flag returns a freshly recalculated rate instead of the cached one.

Clock rate recalculation (recalc) - on clock query and parent clock change

__clk_recalc_rates()

drivers/clk/clk.c

/**
 * __clk_recalc_rates
 * @clk: first clk in the subtree
 * @msg: notification type (see include/linux/clk.h)
 *
 * Walks the subtree of clks starting with clk and recalculates rates as it
 * goes.  Note that if a clk does not implement the .recalc_rate callback then
 * it is assumed that the clock will take on the rate of its parent.
 *
 * clk_recalc_rates also propagates the POST_RATE_CHANGE notification,
 * if necessary.
 */
static void __clk_recalc_rates(struct clk_core *core, unsigned long msg)
{
        unsigned long old_rate;
        unsigned long parent_rate = 0;
        struct clk_core *child;

        old_rate = core->rate;

        if (core->parent)
                parent_rate = core->parent->rate;

        core->rate = clk_recalc(core, parent_rate);

        /*
         * ignore NOTIFY_STOP and NOTIFY_BAD return values for POST_RATE_CHANGE
         * & ABORT_RATE_CHANGE notifiers
         */
        if (core->notifier_count && msg)
                __clk_notify(core, msg, old_rate, core->rate);

        hlist_for_each_entry(child, &core->children, child_node)
                __clk_recalc_rates(child, msg);
}

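Recalculates and updates the rate of the requested clock core and of every clock core in the subtree below it. Clock cores that have notifiers registered are sent @msg along the way.

In code lines 7~10, saves the rate of the current clock core in old_rate and stores the rate of the parent clock core in parent_rate.
In code line 12, recalculates the rate of the current clock core from the parent clock rate and applies it.
In code lines 18~19, sends @msg to the clock core if it has notifiers, and ignores the result.
- When this function is called from clk_core_get_rate(), which only queries the rate, @msg is 0, so nothing is notified.
In code lines 21~22, calls this function recursively for the child clock cores.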
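The recursion and the "msg of 0 means no notification" rule can be sketched as follows. The model is an assumption-laden simplification: `struct node` and `recalc_rates()` are hypothetical, every non-root node is a fixed divider, and the root simply keeps its current rate.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical model of a clock node with a fixed divider. */
struct node {
    unsigned long div;        /* must be >= 1 */
    unsigned long rate;
    int has_notifier;         /* nonzero if a notifier is registered */
    int notified;             /* number of messages delivered so far */
    struct node *parent;
    struct node *children[MAX_CHILDREN];
    size_t nr_children;
};

/* Recalculate rates top-down; notify only when msg is nonzero. */
void recalc_rates(struct node *n, unsigned long msg)
{
    size_t i;

    assert(n != NULL && n->div >= 1);

    /* Simplification: a root keeps whatever rate it already has. */
    if (n->parent)
        n->rate = n->parent->rate / n->div;

    if (n->has_notifier && msg)
        n->notified++;

    for (i = 0; i < n->nr_children; i++)
        recalc_rates(n->children[i], msg);
}
```

Calling it with msg set to 0 refreshes the rates silently, which corresponds to the query path; a nonzero msg also counts a delivery on each node that has a notifier.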

clk_core_get_rate_nolock()

drivers/clk/clk.c

static unsigned long clk_core_get_rate_nolock(struct clk_core *core)
{
        if (!core)
                return 0;

        if (!core->num_parents || core->parent)
                return core->rate;

        /*
         * Clk must have a parent because num_parents > 0 but the parent isn't
         * known yet. Best to return 0 as the rate of this clk until we can
         * properly recalc the rate based on the parent's rate.
         */
        return 0;
}

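Returns the rate of the clock core. If the clock core has parents but the current parent is not known yet, 0 is returned.

For a root clock core, num_parents is 0.

Parent clock selection

Changing the parent clock means that the input clock source changes. Note that switching it live, while the clock is not gated, can produce a glitch. To avoid the glitch, close the clock gate, change the parent, and then open the gate again. With the CLK_SET_PARENT_GATE flag set on a clock core, the parent clock can only be changed while the clock is gated (unprepared); otherwise the request is refused.

The following figure shows the call relationships below clk_set_parent().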

clk_set_parent()

drivers/clk/clk.c

/**
 * clk_set_parent - switch the parent of a mux clk
 * @clk: the mux clk whose input we are switching
 * @parent: the new input to clk
 *
 * Re-parent clk to use parent as its new input source.  If clk is in
 * prepared state, the clk will get enabled for the duration of this call. If
 * that's not acceptable for a specific clk (Eg: the consumer can't handle
 * that, the reparenting is glitchy in hardware, etc), use the
 * CLK_SET_PARENT_GATE flag to allow reparenting only when clk is unprepared.
 *
 * After successfully changing clk's parent clk_set_parent will update the
 * clk topology, sysfs topology and propagate rate recalculation via
 * __clk_recalc_rates.
 *
 * Returns 0 on success, -EERROR otherwise.
 */

int clk_set_parent(struct clk *clk, struct clk *parent)
{
        int ret;

        if (!clk)
                return 0;

        clk_prepare_lock();

        if (clk->exclusive_count)
                clk_core_rate_unprotect(clk->core);

        ret = clk_core_set_parent_nolock(clk->core,
                                         parent ? parent->core : NULL);

        if (clk->exclusive_count)
                clk_core_rate_protect(clk->core);

        clk_prepare_unlock();

        return ret;
}
EXPORT_SYMBOL_GPL(clk_set_parent);

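Selects the parent clock core. On success the clock topology changes and the rates are recalculated. Returns 0 on success.

In code lines 10~11, if this consumer holds the clock exclusively (exclusive_count is nonzero), the rate protection is dropped before the parent is set.
In code lines 13~14, selects the parent clock core. (input clock source selection)
In code lines 16~17, if this consumer holds the clock exclusively, the protection is restored now that the parent has been set.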

clk_core_set_parent_nolock()

drivers/clk/clk.c

static int clk_core_set_parent_nolock(struct clk_core *core,
                                      struct clk_core *parent)
{
        int ret = 0;
        int p_index = 0;
        unsigned long p_rate = 0;

        lockdep_assert_held(&prepare_lock);

        if (!core)
                return 0;

        if (core->parent == parent)
                return 0;

        /* verify ops for multi-parent clks */
        if (core->num_parents > 1 && !core->ops->set_parent)
                return -EPERM;

        /* check that we are allowed to re-parent if the clock is in use */
        if ((core->flags & CLK_SET_PARENT_GATE) && core->prepare_count)
                return -EBUSY;

        if (clk_core_rate_is_protected(core))
                return -EBUSY;

        /* try finding the new parent index */
        if (parent) {
                p_index = clk_fetch_parent_index(core, parent);
                if (p_index < 0) {
                        pr_debug("%s: clk %s can not be parent of clk %s\n",
                                        __func__, parent->name, core->name);
                        return p_index;
                }
                p_rate = parent->rate;
        }

        ret = clk_pm_runtime_get(core);
        if (ret)
                return ret;

        /* propagate PRE_RATE_CHANGE notifications */
        ret = __clk_speculate_rates(core, p_rate);

        /* abort if a driver objects */
        if (ret & NOTIFY_STOP_MASK)
                goto runtime_put;

        /* do the re-parent */
        ret = __clk_set_parent(core, parent, p_index);

        /* propagate rate an accuracy recalculation accordingly */
        if (ret) {
                __clk_recalc_rates(core, ABORT_RATE_CHANGE);
        } else {
                __clk_recalc_rates(core, POST_RATE_CHANGE);
                __clk_recalc_accuracies(core);
        }

runtime_put:
        clk_pm_runtime_put(core);

        return ret;
}

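Selects the parent clock (input clock source) core. On success, the rates of all connected child clocks are recalculated and 0 is returned.

In code lines 10~11, if no clock core is given there is nothing to do, so the function returns success (0).
In code lines 13~14, if the requested parent clock (input clock source) is already the current parent, nothing needs to change, so the function returns success (0).
In code lines 17~18, if a mux type clock with two or more parent clocks (input clock sources) has no ops->set_parent hook implemented in its device driver, -EPERM is returned.
In code lines 21~22, if the CLK_SET_PARENT_GATE flag is set, a clock core in the prepared state (its clock is being output) is not allowed to select another parent, in order to avoid a glitch. -EBUSY is returned.
In code lines 24~25, a clock core whose rate is protected returns -EBUSY.
In code lines 28~36, looks up the index and the rate of the parent clock (input clock source).
In code lines 38~40, if the clock supports runtime power management and is suspended, it is woken up.
In code lines 43~47, sends PRE_RATE_CHANGE to the current clock and all child clocks connected below it. If the result is NOTIFY_STOP or NOTIFY_BAD, the function bails out.
In code lines 50~58, selects the parent clock (input clock source). On error, ABORT_RATE_CHANGE is propagated to the nodes below; on success, POST_RATE_CHANGE is propagated instead. The rates are recalculated during the propagation. On success the accuracy is recalculated as well.
In code lines 60~61, at the runtime_put: label, the runtime PM reference is released so that the clock can be suspended again if needed.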
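The order of the early checks can be condensed into a user-space sketch. All names here are hypothetical (`struct clk_model`, `check_reparent()`), and `SKETCH_SET_PARENT_GATE` is an illustrative flag bit rather than the kernel's definition. The function returns 0 when the request may proceed or is a no-op, and a negative errno otherwise.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_SET_PARENT_GATE 0x1UL   /* illustrative bit, not the kernel value */

/* Hypothetical model of the fields the checks look at. */
struct clk_model {
    unsigned long flags;
    int num_parents;
    int has_set_parent;      /* nonzero if the driver implements set_parent */
    int prepare_count;
    int rate_protected;      /* nonzero if a consumer protects the rate */
    const struct clk_model *parent;
};

int check_reparent(const struct clk_model *c, const struct clk_model *new_parent)
{
    if (!c)
        return 0;                        /* nothing to do */

    assert(c->num_parents >= 0);

    if (c->parent == new_parent)
        return 0;                        /* already the current parent */

    if (c->num_parents > 1 && !c->has_set_parent)
        return -EPERM;                   /* mux without a set_parent hook */

    if ((c->flags & SKETCH_SET_PARENT_GATE) && c->prepare_count)
        return -EBUSY;                   /* must be gated to re-parent */

    if (c->rate_protected)
        return -EBUSY;

    return 0;
}
```

The ordering matters in one respect: a request for the parent that is already selected succeeds even on a prepared clock with the gate flag set, because that check comes first.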

__clk_set_parent()

drivers/clk/clk.c

static int __clk_set_parent(struct clk_core *core, struct clk_core *parent,
                            u8 p_index)
{
        unsigned long flags;
        int ret = 0;
        struct clk_core *old_parent;

        old_parent = __clk_set_parent_before(core, parent);

        trace_clk_set_parent(core, parent);

        /* change clock input source */
        if (parent && core->ops->set_parent)
                ret = core->ops->set_parent(core->hw, p_index);

        trace_clk_set_parent_complete(core, parent);

        if (ret) {
                flags = clk_enable_lock();
                clk_reparent(core, old_parent);
                clk_enable_unlock(flags);
                __clk_set_parent_after(core, old_parent, parent);

                return ret;
        }

        __clk_set_parent_after(core, parent, old_parent);

        return 0;
}

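Selects the parent clock (input clock source). Returns 0 on success.

In code line 8, performs the work needed before selecting the input clock source (parent clock).
In code lines 13~25, calls the ops->set_parent hook implemented in the device driver of the mux type clock so that the parent clock (input clock source) is selected in hw. On error, the clock is switched back to its previous parent clock (input clock source).
In code line 27, performs the work needed after selecting the input clock source (parent clock).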

__clk_set_parent_before()

drivers/clk/clk.c

static struct clk_core *__clk_set_parent_before(struct clk_core *core,
                                           struct clk_core *parent)
{
        unsigned long flags;
        struct clk_core *old_parent = core->parent;

        /*
         * 1. enable parents for CLK_OPS_PARENT_ENABLE clock
         *
         * 2. Migrate prepare state between parents and prevent race with
         * clk_enable().
         *
         * If the clock is not prepared, then a race with
         * clk_enable/disable() is impossible since we already have the
         * prepare lock (future calls to clk_enable() need to be preceded by
         * a clk_prepare()).
         *
         * If the clock is prepared, migrate the prepared state to the new
         * parent and also protect against a race with clk_enable() by
         * forcing the clock and the new parent on.  This ensures that all
         * future calls to clk_enable() are practically NOPs with respect to
         * hardware and software states.
         *
         * See also: Comment for clk_set_parent() below.
         */

        /* enable old_parent & parent if CLK_OPS_PARENT_ENABLE is set */
        if (core->flags & CLK_OPS_PARENT_ENABLE) {
                clk_core_prepare_enable(old_parent);
                clk_core_prepare_enable(parent);
        }

        /* migrate prepare count if > 0 */
        if (core->prepare_count) {
                clk_core_prepare_enable(parent);
                clk_core_enable_lock(core);
        }

        /* update the clk tree topology */
        flags = clk_enable_lock();
        clk_reparent(core, parent);
        clk_enable_unlock(flags);

        return old_parent;
}

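Performs the work needed before selecting the input clock source (parent clock). Returns the previous parent clock.

In code lines 28~31, if the current clock core has the CLK_OPS_PARENT_ENABLE flag set, both the previous parent clock and the new parent clock are prepared and enabled.
In code lines 34~37, if the current clock core is in the prepared state, the new parent clock core is prepared and enabled, and the current clock core itself is enabled as well.
In code lines 40~44, updates the clock topology and returns the previous parent clock core.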

__clk_set_parent_after()

drivers/clk/clk.c

static void __clk_set_parent_after(struct clk_core *core,
                                   struct clk_core *parent,
                                   struct clk_core *old_parent)
{
        /*
         * Finish the migration of prepare state and undo the changes done
         * for preventing a race with clk_enable().
         */
        if (core->prepare_count) {
                clk_core_disable_lock(core);
                clk_core_disable_unprepare(old_parent);
        }

        /* re-balance ref counting if CLK_OPS_PARENT_ENABLE is set */
        if (core->flags & CLK_OPS_PARENT_ENABLE) {
                clk_core_disable_unprepare(parent);
                clk_core_disable_unprepare(old_parent);
        }
}

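Performs the work needed after selecting the input clock source (parent clock).

In code lines 9~12, if the clock core is in the prepared state, the clock core is disabled and the previous parent clock is disabled and unprepared. This completes the migration of the prepared state to the new parent.
In code lines 15~18, if the current clock core has the CLK_OPS_PARENT_ENABLE flag set, both the new parent clock and the previous parent clock are disabled and unprepared.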
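How the before/after pair keeps the reference counts balanced can be checked with a small model. This is a sketch under stated assumptions: `struct clk_model` and the `model_*` helpers are hypothetical, locking and the CLK_OPS_PARENT_ENABLE branch are left out, and a count propagates to the parent only on its 0 to 1 and 1 to 0 transitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a clock with reference counts and one parent. */
struct clk_model {
    int prepare_count;
    int enable_count;
    struct clk_model *parent;
};

static void model_prepare(struct clk_model *c)
{
    if (c && c->prepare_count++ == 0)
        model_prepare(c->parent);
}

static void model_unprepare(struct clk_model *c)
{
    if (!c)
        return;
    assert(c->prepare_count > 0);
    if (--c->prepare_count == 0)
        model_unprepare(c->parent);
}

static void model_enable(struct clk_model *c)
{
    if (c && c->enable_count++ == 0)
        model_enable(c->parent);
}

static void model_disable(struct clk_model *c)
{
    if (!c)
        return;
    assert(c->enable_count > 0);
    if (--c->enable_count == 0)
        model_disable(c->parent);
}

/* Re-parent a clock while keeping every count balanced. */
void model_set_parent(struct clk_model *core, struct clk_model *new_parent)
{
    struct clk_model *old_parent = core->parent;

    if (core->prepare_count) {           /* "before" step */
        model_prepare(new_parent);
        model_enable(new_parent);
        model_enable(core);
    }

    core->parent = new_parent;           /* topology update */

    if (core->prepare_count) {           /* "after" step */
        model_disable(core);
        model_disable(old_parent);
        model_unprepare(old_parent);
    }
}
```

For a clock that is prepared but not enabled, the net effect is that the prepare reference moves from the old parent to the new one while all enable counts return to their starting values. For an enabled clock, both references move.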

clk_reparent()

drivers/clk/clk.c

static void clk_reparent(struct clk_core *core, struct clk_core *new_parent)
{
        bool was_orphan = core->orphan;

        hlist_del(&core->child_node);

        if (new_parent) {
                bool becomes_orphan = new_parent->orphan;

                /* avoid duplicate POST_RATE_CHANGE notifications */
                if (new_parent->new_child == core)
                        new_parent->new_child = NULL;

                hlist_add_head(&core->child_node, &new_parent->children);

                if (was_orphan != becomes_orphan)
                        clk_core_update_orphan_status(core, becomes_orphan);
        } else {
                hlist_add_head(&core->child_node, &clk_orphan_list);
                if (!was_orphan)
                        clk_core_update_orphan_status(core, true);
        }

        core->parent = new_parent;
}

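Updates the clock tree topology.

In code line 5, removes the current clock core from the children list of its parent clock core (input clock source).
In code lines 7~17, if a new parent clock (input clock source) is given, the current clock is added to that parent's children. If new_child of the new parent clock was pointing at the current clock, null is stored in new_child. The orphan status is updated if it changes.
In code lines 18~22, if no new parent clock is given, the current clock is added to the orphan list.
In code line 24, updates the parent of the current clock core.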
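The list manipulation can be sketched with plain singly linked lists instead of the kernel's hlist. Everything here is hypothetical (`struct node`, `orphans`, `unlink_from()`, `reparent()`), and unlike the kernel the sketch updates the orphan flag of the moved node only, not of its whole subtree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: children are kept in a singly linked list. */
struct node {
    struct node *parent;
    struct node *first_child;    /* head of this node's children list */
    struct node *next_sibling;
    int orphan;
};

static struct node *orphans;     /* head of the orphan list */

/* Remove n from the list starting at *head, if it is on it. */
static void unlink_from(struct node **head, struct node *n)
{
    while (*head && *head != n)
        head = &(*head)->next_sibling;
    if (*head)
        *head = n->next_sibling;
    n->next_sibling = NULL;
}

/* Move n below new_parent, or onto the orphan list if new_parent is NULL. */
void reparent(struct node *n, struct node *new_parent)
{
    struct node **old_head;
    struct node **new_head;

    assert(n != NULL);

    old_head = n->parent ? &n->parent->first_child : &orphans;
    new_head = new_parent ? &new_parent->first_child : &orphans;

    unlink_from(old_head, n);

    n->next_sibling = *new_head;         /* add at the head of the new list */
    *new_head = n;

    n->orphan = new_parent ? new_parent->orphan : 1;
    n->parent = new_parent;
}
```

A node that loses its parent ends up on the orphan list, and a node attached below an orphan inherits the orphan status.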

Mux clock rate related ops

(*determine_rate) hook function

The function below is used as the (*determine_rate) hook of mux type clocks.

clk_mux_determine_rate()

drivers/clk/clk-mux.c

static int clk_mux_determine_rate(struct clk_hw *hw,
                                  struct clk_rate_request *req)
{
        struct clk_mux *mux = to_clk_mux(hw);

        return clk_mux_determine_rate_flags(hw, req, mux->flags);
}

Calculates the rate closest to the requested rate that the mux clock hw can provide. On success, 0 is returned, the calculated rate is stored in req->rate and req->best_parent_rate, and the parent clock hw that provides this rate is stored in req->best_parent_hw.

clk_mux_determine_rate_flags()

drivers/clk/clk.c

int clk_mux_determine_rate_flags(struct clk_hw *hw,
                                 struct clk_rate_request *req,
                                 unsigned long flags)
{
        struct clk_core *core = hw->core, *parent, *best_parent = NULL;
        int i, num_parents, ret;
        unsigned long best = 0;
        struct clk_rate_request parent_req = *req;

        /* if NO_REPARENT flag set, pass through to current parent */
        if (core->flags & CLK_SET_RATE_NO_REPARENT) {
                parent = core->parent;
                if (core->flags & CLK_SET_RATE_PARENT) {
                        ret = __clk_determine_rate(parent ? parent->hw : NULL,
                                                   &parent_req);
                        if (ret)
                                return ret;

                        best = parent_req.rate;
                } else if (parent) {
                        best = clk_core_get_rate_nolock(parent);
                } else {
                        best = clk_core_get_rate_nolock(core);
                }

                goto out;
        }

        /* find the parent that can provide the fastest rate <= rate */
        num_parents = core->num_parents;
        for (i = 0; i < num_parents; i++) {
                parent = clk_core_get_parent_by_index(core, i);
                if (!parent)
                        continue;

                if (core->flags & CLK_SET_RATE_PARENT) {
                        parent_req = *req;
                        ret = __clk_determine_rate(parent->hw, &parent_req);
                        if (ret)
                                continue;
                } else {
                        parent_req.rate = clk_core_get_rate_nolock(parent);
                }

                if (mux_is_better_rate(req->rate, parent_req.rate,
                                       best, flags)) {
                        best_parent = parent;
                        best = parent_req.rate;
                }
        }

        if (!best_parent)
                return -EINVAL;

out:
        if (best_parent)
                req->best_parent_hw = best_parent->hw;
        req->best_parent_rate = best;
        req->rate = best;

        return 0;
}
EXPORT_SYMBOL_GPL(clk_mux_determine_rate_flags);

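In code line 8, copies the original request into parent_req.
In code lines 11~27, if the clock core must not change its parent clock (input source) on a rate change, the request is passed through using one of the following values:
- If the rate change is to be forwarded to the parent first, the closest rate the parent clock can provide
- Otherwise, the rate of the parent clock
- If there is no parent clock, the rate of the current clock core
In code lines 30~50, iterates over the parent clocks to find the rate closest to the requested rate that the mux clock hw can provide. For each parent one of the following values is evaluated, and the most suitable rate among them is selected:
- If the rate change is to be forwarded to the parent first, the closest rate that parent clock can provide
- Otherwise, the rate of that parent clock
In code lines 52~53, if no parent clock can satisfy the request, -EINVAL is returned.
In code lines 55~61, the out: label is reached once the closest rate has been found. The calculated rate is stored in req->rate and req->best_parent_rate, the parent clock hw that provides it is stored in req->best_parent_hw, and the success value 0 is returned.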

mux type rate setting API

__clk_mux_determine_rate()

drivers/clk/clk.c

/*
 * __clk_mux_determine_rate - clk_ops::determine_rate implementation for a mux type clk
 * @hw: mux type clk to determine rate on
 * @req: rate request, also used to return preferred parent and frequencies
 *
 * Helper for finding best parent to provide a given frequency. This can be used
 * directly as a determine_rate callback (e.g. for a mux), or from a more
 * complex clock that may combine a mux with other operations.
 *
 * Returns: 0 on success, -EERROR value on error
 */

int __clk_mux_determine_rate(struct clk_hw *hw,
                             struct clk_rate_request *req)
{
        return clk_mux_determine_rate_flags(hw, req, 0);
}
EXPORT_SYMBOL_GPL(__clk_mux_determine_rate);

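This function can also be used directly as the ops->determine_rate callback of a mux type clock device driver. It finds the closest rate at or below the requested rate. The calculated best rate is stored in req->rate and req->best_parent_rate, and req->best_parent_hw points to the best parent clock hw. In the code shown above, -EINVAL is returned if no parent qualifies. The default flag value 0 is used.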

__clk_mux_determine_rate_closest()

drivers/clk/clk.c

int __clk_mux_determine_rate_closest(struct clk_hw *hw,
                                     struct clk_rate_request *req)
{
        return clk_mux_determine_rate_flags(hw, req, CLK_MUX_ROUND_CLOSEST);
}
EXPORT_SYMBOL_GPL(__clk_mux_determine_rate_closest);

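Finds the rate closest to the requested rate, whether above or below it. The calculated best rate is stored in req->rate and req->best_parent_rate, and req->best_parent_hw points to the best parent clock hw. In the code shown above, -EINVAL is returned if no parent qualifies. CLK_MUX_ROUND_CLOSEST is used as the flag.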

mux_is_better_rate()

drivers/clk/clk.c

static bool mux_is_better_rate(unsigned long rate, unsigned long now,
                           unsigned long best, unsigned long flags)
{
        if (flags & CLK_MUX_ROUND_CLOSEST)
                return abs(now - rate) < abs(best - rate);

        return now <= rate && now > best;
}

Returns true(1) if now is a better match than best for the rate the mux wants to set. What counts as better depends on whether the CLK_MUX_ROUND_CLOSEST flag is used.

- Not used: true(1) if now does not exceed rate and is higher than best
- Used: true(1) if now is closer to the requested rate than best
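Both rules, and the parent search that relies on them, can be tried out in user space. This is a sketch rather than the kernel code: `SKETCH_ROUND_CLOSEST` stands in for CLK_MUX_ROUND_CLOSEST, `abs_diff()` replaces the kernel's abs() so the unsigned subtraction cannot wrap, and `pick_parent()` is a hypothetical helper that works on a plain array of parent rates.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_ROUND_CLOSEST 0x1UL   /* stands in for CLK_MUX_ROUND_CLOSEST */

static unsigned long abs_diff(unsigned long a, unsigned long b)
{
    return a > b ? a - b : b - a;
}

/* Is "now" a better match for "rate" than the current "best"? */
bool is_better_rate(unsigned long rate, unsigned long now,
                    unsigned long best, unsigned long flags)
{
    if (flags & SKETCH_ROUND_CLOSEST)
        return abs_diff(now, rate) < abs_diff(best, rate);

    return now <= rate && now > best;
}

/*
 * Scan the parent rates and return the index of the best one,
 * or -EINVAL if none qualifies. *best_rate is written on success only.
 */
int pick_parent(const unsigned long *parent_rates, size_t n,
                unsigned long target, unsigned long flags,
                unsigned long *best_rate)
{
    unsigned long best = 0;
    int best_idx = -EINVAL;
    size_t i;

    assert(best_rate != NULL);

    for (i = 0; i < n; i++) {
        if (is_better_rate(target, parent_rates[i], best, flags)) {
            best = parent_rates[i];
            best_idx = (int)i;
        }
    }

    if (best_idx >= 0)
        *best_rate = best;

    return best_idx;
}
```

With parents at 1000000, 500000, 250000 and 125000 Hz and a target of 400000 Hz, the default rule picks 250000 (the highest rate not above the target), while the closest rule picks 500000.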

Fixed Factor clock rate related ops

(*round_rate) hook function

The function below is used as the (*round_rate) hook of fixed factor type clocks.

clk_factor_round_rate()

drivers/clk/clk-fixed-factor.c

static long clk_factor_round_rate(struct clk_hw *hw, unsigned long rate,
                                unsigned long *prate)
{
        struct clk_fixed_factor *fix = to_clk_fixed_factor(hw);

        if (clk_hw_get_flags(hw) & CLK_SET_RATE_PARENT) {
                unsigned long best_parent;

                best_parent = (rate / fix->mult) * fix->div;
                *prate = clk_hw_round_rate(clk_hw_get_parent(hw), best_parent);
        }

        return (*prate / fix->div) * fix->mult;
}

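Calculates the rate closest to the requested @rate that the fixed factor type clock hw can provide. Returns that rate, and stores the rate of the parent clock in *prate.

In code lines 6~11, if the clock core forwards rate changes to its parent (CLK_SET_RATE_PARENT), the parent clock rate that corresponds to the desired rate is calculated, rounded by the parent, and stored in the output argument prate.
- e.g.) The parent clock runs at 1 MHz and the child divider clock applies 1/2, giving 500 kHz. If 600 kHz is wanted, the parent clock is changed to 1.2 MHz if possible.
In code line 13, returns the parent clock rate with the factor ratio applied.
- e.g.) Parent clock rate 10 MHz, fix->div=3, fix->mult=2
  - 10000000 / 3 * 2 = 6666666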
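The two formulas can be checked with integer arithmetic in user space. `factor_rate()` and `factor_parent_rate()` are hypothetical helper names for this sketch; they only reproduce the expressions from the listing, including the truncation caused by dividing before multiplying.

```c
#include <assert.h>

/* Output rate of a fixed-factor clock for a given parent rate. */
unsigned long factor_rate(unsigned long parent_rate,
                          unsigned int mult, unsigned int div)
{
    assert(div != 0);
    return (parent_rate / div) * mult;
}

/* Parent rate to request so that the output approaches the wanted rate. */
unsigned long factor_parent_rate(unsigned long rate,
                                 unsigned int mult, unsigned int div)
{
    assert(mult != 0);
    return (rate / mult) * div;
}
```

For the examples above, a 10 MHz parent with mult=2 and div=3 yields 6666666 Hz, and a wanted output of 600 kHz through a 1/2 factor asks the parent for 1200000 Hz.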

Clock driver sample

Clock device tree

The six clock-related nodes added to the device tree for this sample are as follows.

fooclk1 (fixed-rate)
- Fixed rate of 1 MHz
fooclk2 (divider)
- Divides the 1 MHz of fooclk1 by 1~32.
fooclk3 (divider)
- Divides the clock coming from fooclk2 by 1~8.
- If the wanted rate cannot be reached, the parent clock rate is changed as well.
fooclk4 (divider)
- Divides the clock coming from fooclk2 by 1, 2, 4, 8 or 16.
fooclk5 (mux)
- Selects one of fooclk1, fooclk2, fooclk3 and fooclk4.
foo
- A user device node that can use the five clocks above.

        fooclk1 {
                phandle = <0x8100>;
                clock-output-names = "fooclk1";
                clock-frequency = <1000000>;
                #clock-cells = <0x0>;
                compatible = "fixed-clock";
        };

        fooclk2 {
                phandle = <0x8200>;
                clock-output-names = "fooclk2";
                clocks = <0x8100>;
                #clock-cells = <0x0>;
                compatible = "foo,divider-clock";
                foo,max-div = <32>;
        };

        fooclk3 {
                phandle = <0x8300>;
                clock-output-names = "fooclk3";
                clocks = <0x8200>;
                #clock-cells = <0x0>;
                compatible = "foo,divider-clock";
                foo,max-div = <8>;
                foo,set-rate-parent;
        };

        fooclk4 {
                phandle = <0x8400>;
                clock-output-names = "fooclk4";
                clocks = <0x8200>;
                #clock-cells = <0x0>;
                compatible = "foo,divider-clock";
                foo,max-div = <5>;
                foo,index-power-of-two;
        };

        fooclk5 {
                phandle = <0x8500>;
                clock-output-names = "fooclk5";
                clocks = <0x8100 0x8200 0x8300 0x8400>;
                #clock-cells = <0x0>;
                compatible = "foo,mux-clock";
        };

        foo {
                compatible = "foo,foo";
                clock-names = "fooclk1", "fooclk2", "fooclk3", "fooclk4", "fooclk5";
                clocks = <0x8100 0x8200 0x8300 0x8400 0x8500>;
        };
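
The two divider flavors in the device tree above (plain and foo,index-power-of-two) imply two ways of encoding a divisor in the register. The sketch below is inferred only from the register values printed in the logs later in this article (a divisor of 25 is written as 24 for fooclk2, a divisor of 8 as 3 for fooclk4); the function names are hypothetical and the actual driver source may encode values differently.

```c
#include <assert.h>

/* Hypothetical encoder, inferred from the logged register values.
 * Linear divider:       divisor 1..N is stored as 0..N-1.
 * Power-of-two divider: divisor 2^k is stored as k. */
unsigned int foo_div_to_val(unsigned int div, int power_of_two)
{
    unsigned int val = 0;

    if (!power_of_two)
        return div - 1;

    while ((1u << val) < div)   /* smallest k with 2^k >= div */
        val++;
    return val;
}

/* Hypothetical decoder: the inverse of foo_div_to_val(). */
unsigned int foo_val_to_div(unsigned int val, int power_of_two)
{
    return power_of_two ? (1u << val) : val + 1;
}

/* Usage check against values that appear in the logs. */
void foo_encoding_demo(void)
{
    assert(foo_div_to_val(25, 0) == 24);  /* fooclk2: 1000000 / 25 = 40000 */
    assert(foo_div_to_val(8, 1) == 3);    /* fooclk4: 40000 / 8 = 5000 */
    assert(foo_val_to_div(0, 0) == 1);    /* reset value 0 means 1:1 */
}
```

Under this assumption a register value of 0 decodes to a divisor of 1 in both flavors, which is consistent with all dividers running 1:1 right after initialization.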

Initial rate state

The following figure shows the five clocks right after initialization.

All dividers operate at 1:1, and the mux clock is initialized to input 0.

The five clocks are then prepared and enabled, and the result is checked with the cat /sys/kernel/debug/clk/clk_summary command.

$ insmod clk.ko
clk: loading out-of-tree module taints kernel.
foo: foo_probe
foo: devm_clk_get() clk1
foo: devm_clk_get() clk2
foo: devm_clk_get() clk3
foo: devm_clk_get() clk4
foo: devm_clk_get() clk5
foo: clk_prepare() clk1 rc=0
foo: clk_prepare() clk2 rc=0
foo: clk_prepare() clk3 rc=0
foo: clk_prepare() clk4 rc=0
foo: clk_prepare() clk5 rc=0
foo: clk_enable() clk1 rc=0
foo: clk_enable() clk2 rc=0
foo: clk_enable() clk3 rc=0
foo: clk_enable() clk4 rc=0
foo: clk_enable() clk5 rc=0

$ cat /sys/kernel/debug/clk/clk_summary
                                 enable  prepare  protect                                duty
   clock                          count    count    count        rate   accuracy phase  cycle
---------------------------------------------------------------------------------------------
 fooclk1                              0        0        0     1000000          0     0  50000
    fooclk5                           0        0        0     1000000          0     0  50000
    fooclk2                           0        0        0     1000000          0     0  50000
       fooclk4                        0        0        0     1000000          0     0  50000
       fooclk3                        0        0        0     1000000          0     0  50000
 clk24mhz                             4        5        0    24000000          0     0  50000

Rate setting -1

The following figure shows the state after setting the initial rates of clk2, clk3, and clk4 and selecting input 2 for clk5.

The following shows the process of configuring each clock.

$ echo 40000 > /sys/bus/platform/drivers/foo/foo2
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=25
foo_clk_divider_round_rate: rate=40000, prate=1000000, round=40000
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=25
foo_clk_divider_round_rate: rate=40000, prate=1000000, round=40000
foo_clk_divider_recalc_rate: parent_rate=40000
foo_readl: val=0
foo_clk_divider_recalc_rate: parent_rate=40000, round=40000
foo_clk_divider_recalc_rate: parent_rate=40000
foo_readl: val=0
foo_clk_divider_recalc_rate: parent_rate=40000, round=40000
foo_clk_divider_set_rate: rate=40000, parent_rate=1000000
foo_readl: val=0
foo_writel: val=24
foo_clk_divider_recalc_rate: parent_rate=1000000
foo_readl: val=24
foo_clk_divider_recalc_rate: parent_rate=1000000, round=40000
foo_clk_divider_set_rate: rate=40000, parent_rate=40000
foo_readl: val=0
foo_writel: val=0
foo_clk_divider_recalc_rate: parent_rate=40000
foo_readl: val=0
foo_clk_divider_recalc_rate: parent_rate=40000, round=40000
foo_clk_divider_set_rate: rate=40000, parent_rate=40000
foo_readl: val=0
foo_writel: val=0
foo_clk_divider_recalc_rate: parent_rate=40000
foo_readl: val=0
foo_clk_divider_recalc_rate: parent_rate=40000, round=40000
foo foo: clk_set_rate() clk2 val=40000 rc=0
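
The bestdiv and round values printed by foo_clk_divider_bestdiv and foo_clk_divider_round_rate can be reproduced with round-up integer division. The following is a minimal sketch for the case where the parent rate is fixed; the function names and the round-up behavior are assumptions chosen because they match the logged numbers (for example rate=40000 gives bestdiv=25, and rate=75002 gives bestdiv=14 with round=71429), not a copy of the driver.

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical: pick a divisor in 1..max_div for a fixed parent rate.
 * Rounding the divisor up keeps the resulting rate close to the request
 * without overshooting it by more than the final round-up. */
unsigned long foo_best_div(unsigned long parent_rate, unsigned long rate,
                           unsigned long max_div)
{
    unsigned long div;

    if (rate == 0)
        rate = 1;                       /* avoid division by zero */
    div = DIV_ROUND_UP(parent_rate, rate);
    if (div < 1)
        div = 1;
    if (div > max_div)
        div = max_div;                  /* clamp to the hardware limit */
    return div;
}

/* Hypothetical: the rate the divider would actually produce. */
unsigned long foo_round_rate(unsigned long parent_rate, unsigned long rate,
                             unsigned long max_div)
{
    return DIV_ROUND_UP(parent_rate,
                        foo_best_div(parent_rate, rate, max_div));
}

/* Usage check with fooclk2 (parent 1 MHz, max divisor 32). */
void foo_round_rate_demo(void)
{
    assert(foo_best_div(1000000UL, 40000UL, 32) == 25);
    assert(foo_round_rate(1000000UL, 40000UL, 32) == 40000UL);
}
```

When the request is below what the largest divisor can deliver, the divisor is clamped: a request of 10000 against the 1 MHz parent yields a divisor of 32 and a rate of 31250.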

$ echo 10000 > /sys/bus/platform/drivers/foo/foo3
foo_clk_divider_bestdiv: maxdiv=8
foo_clk_divider_bestdiv: maxdiv2=8
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=32
foo_clk_divider_round_rate: rate=10000, prate=1000000, round=31250
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=32
foo_clk_divider_round_rate: rate=20001, prate=1000000, round=31250
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=32
foo_clk_divider_round_rate: rate=30002, prate=1000000, round=31250
foo_clk_divider_round_rate: rate=10000, prate=40000, round=10000
foo_clk_divider_bestdiv: maxdiv=8
foo_clk_divider_bestdiv: maxdiv2=8
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=32
foo_clk_divider_round_rate: rate=10000, prate=1000000, round=31250
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=32
foo_clk_divider_round_rate: rate=20001, prate=1000000, round=31250
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=32
foo_clk_divider_round_rate: rate=30002, prate=1000000, round=31250
foo_clk_divider_round_rate: rate=10000, prate=40000, round=10000
foo_clk_divider_set_rate: rate=10000, parent_rate=40000
foo_readl: val=0
foo_writel: val=3
foo_clk_divider_recalc_rate: parent_rate=40000
foo_readl: val=3
foo_clk_divider_recalc_rate: parent_rate=40000, round=10000
foo foo: clk_set_rate() clk3 val=10000 rc=0

$ echo 5000 > /sys/bus/platform/drivers/foo/foo4
foo_clk_divider_bestdiv: maxdiv=8
foo_clk_divider_bestdiv: bestdiv=8
foo_clk_divider_round_rate: rate=5000, prate=40000, round=5000
foo_clk_divider_bestdiv: maxdiv=8
foo_clk_divider_bestdiv: bestdiv=8
foo_clk_divider_round_rate: rate=5000, prate=40000, round=5000
foo_clk_divider_set_rate: rate=5000, parent_rate=40000
foo_readl: val=0
foo_writel: val=3
foo_clk_divider_recalc_rate: parent_rate=40000
foo_readl: val=3
foo_clk_divider_recalc_rate: parent_rate=40000, round=5000
foo foo: clk_set_rate() clk4 val=5000 rc=0
foo_clk_divider_bestdiv: maxdiv=8
foo_clk_divider_bestdiv: bestdiv=8
foo_clk_divider_round_rate: rate=5000, prate=40000, round=5000
foo foo: clk_set_rate() clk4 val=5000 rc=0

$ echo 2 > /sys/bus/platform/drivers/foo/foo5
foo_clk_mux_set_parent: index=2
foo_readl: val=0
foo_writel: val=2
foo_clk_mux_set_parent: index2=2, val=2
foo foo: clk_set_parent() val=2, select=clk3 rc=0

$ cat /sys/kernel/debug/clk/clk_summary
                                 enable  prepare  protect                                duty
   clock                          count    count    count        rate   accuracy phase  cycle
---------------------------------------------------------------------------------------------
 fooclk1                              2        2        0     1000000          0     0  50000
    fooclk2                           3        3        0       40000          0     0  50000
       fooclk4                        1        1        0        5000          0     0  50000
       fooclk3                        2        2        0       10000          0     0  50000
          fooclk5                     1        1        0       10000          0     0  50000
 clk24mhz                             4        5        0    24000000          0     0  50000

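Rate setting -2

The following figure shows clk3 after its rate has been changed to 25 kHz.

clk3 cannot produce this rate on its own, because no integer divisor from 1 to 8 turns the 40 kHz of clk2 into 25 kHz, so the rate of its parent clock clk2 was changed as well (40 kHz to 50 kHz). As a result, the rates of clk4 and clk5 were also affected.

The following shows the process of changing the rate of clk3 to 25 kHz.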

$ echo 25000 > /sys/bus/platform/drivers/foo/foo3
foo_clk_divider_bestdiv: maxdiv=8
foo_clk_divider_bestdiv: maxdiv2=8
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=32
foo_clk_divider_round_rate: rate=25000, prate=1000000, round=31250
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=20
foo_clk_divider_round_rate: rate=50001, prate=1000000, round=50000
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=14
foo_clk_divider_round_rate: rate=75002, prate=1000000, round=71429
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=10
foo_clk_divider_round_rate: rate=100003, prate=1000000, round=100000
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=8
foo_clk_divider_round_rate: rate=125004, prate=1000000, round=125000
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=7
foo_clk_divider_round_rate: rate=150005, prate=1000000, round=142858
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=6
foo_clk_divider_round_rate: rate=175006, prate=1000000, round=166667
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=5
foo_clk_divider_round_rate: rate=200007, prate=1000000, round=200000
foo_clk_divider_bestdiv: bestdiv2=2, best_parent_rate=50000
foo_clk_divider_round_rate: rate=25000, prate=50000, round=25000
foo_clk_divider_bestdiv: maxdiv=8
foo_clk_divider_bestdiv: maxdiv2=8
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=32
foo_clk_divider_round_rate: rate=25000, prate=1000000, round=31250
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=20
foo_clk_divider_round_rate: rate=50001, prate=1000000, round=50000
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=14
foo_clk_divider_round_rate: rate=75002, prate=1000000, round=71429
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=10
foo_clk_divider_round_rate: rate=100003, prate=1000000, round=100000
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=8
foo_clk_divider_round_rate: rate=125004, prate=1000000, round=125000
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=7
foo_clk_divider_round_rate: rate=150005, prate=1000000, round=142858
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=6
foo_clk_divider_round_rate: rate=175006, prate=1000000, round=166667
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=5
foo_clk_divider_round_rate: rate=200007, prate=1000000, round=200000
foo_clk_divider_bestdiv: bestdiv2=2, best_parent_rate=50000
foo_clk_divider_round_rate: rate=25000, prate=50000, round=25000
foo_clk_divider_bestdiv: maxdiv=32
foo_clk_divider_bestdiv: bestdiv=20
foo_clk_divider_round_rate: rate=50000, prate=1000000, round=50000
foo_clk_divider_recalc_rate: parent_rate=50000
foo_readl: val=3
foo_clk_divider_recalc_rate: parent_rate=50000, round=6250
foo_clk_divider_recalc_rate: parent_rate=50000
foo_readl: val=3
foo_clk_divider_recalc_rate: parent_rate=50000, round=12500
foo_clk_divider_set_rate: rate=50000, parent_rate=1000000
foo_readl: val=24
foo_writel: val=19
foo_clk_divider_recalc_rate: parent_rate=1000000
foo_readl: val=19
foo_clk_divider_recalc_rate: parent_rate=1000000, round=50000
foo_clk_divider_set_rate: rate=6250, parent_rate=50000
foo_readl: val=3
foo_writel: val=3
foo_clk_divider_recalc_rate: parent_rate=50000
foo_readl: val=3
foo_clk_divider_recalc_rate: parent_rate=50000, round=6250
foo_clk_divider_set_rate: rate=25000, parent_rate=50000
foo_readl: val=3
foo_writel: val=1
foo_clk_divider_recalc_rate: parent_rate=50000
foo_readl: val=1
foo_clk_divider_recalc_rate: parent_rate=50000, round=25000
foo foo: clk_set_rate() clk3 val=25000 rc=0

$ cat /sys/kernel/debug/clk/clk_summary
                                 enable  prepare  protect                                duty
   clock                          count    count    count        rate   accuracy phase  cycle
---------------------------------------------------------------------------------------------
 fooclk1                              2        2        0     1000000          0     0  50000
    fooclk2                           3        3        0       50000          0     0  50000
       fooclk4                        1        1        0        6250          0     0  50000
       fooclk3                        2        2        0       25000          0     0  50000
          fooclk5                     1        1        0       25000          0     0  50000
 clk24mhz                             4        5        0    24000000          0     0  50000
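
The parent search seen in the fooclk3 logs (queries for rate=50001, 75002, 100003, and so on) can be sketched as a loop over candidate divisors. This is a user-space model under stated assumptions: the query rate is taken to be rate * i + (i - 1) because that matches the logged values, the parent is modeled as fooclk2 with a fixed 1 MHz input and a maximum divisor of 32, and all function names are hypothetical. The real clk-foo-dividers.c may differ in detail.

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d)  (((n) + (d) - 1) / (d))
#define MULT_ROUND_UP(r, m) ((r) * (m) + (m) - 1)

/* Hypothetical stand-in for asking the parent divider (fooclk2)
 * which rate it could deliver for a request. Assumes 1 MHz in, div 1..32. */
static unsigned long parent_round_rate(unsigned long rate)
{
    const unsigned long in = 1000000UL, max_div = 32;
    unsigned long div = DIV_ROUND_UP(in, rate ? rate : 1);

    if (div < 1)
        div = 1;
    if (div > max_div)
        div = max_div;
    return DIV_ROUND_UP(in, div);
}

/* Hypothetical: choose the child divisor and, if needed, a new parent rate. */
unsigned long foo_best_div_set_parent(unsigned long rate,
                                      unsigned long cur_parent_rate,
                                      unsigned long max_div,
                                      unsigned long *best_parent_rate)
{
    unsigned long i, best = 0, best_div = max_div;

    *best_parent_rate = parent_round_rate(1);   /* fallback: slowest parent */
    for (i = 1; i <= max_div; i++) {
        unsigned long prate, now;

        if (rate * i == cur_parent_rate) {      /* ideal: parent untouched */
            *best_parent_rate = cur_parent_rate;
            return i;
        }
        prate = parent_round_rate(MULT_ROUND_UP(rate, i));
        now = prate / i;
        if (now <= rate && now > best) {        /* closest without overshoot */
            best = now;
            best_div = i;
            *best_parent_rate = prate;
        }
    }
    return best_div;
}

/* Usage check: 25 kHz from a 40 kHz parent needs the parent moved to 50 kHz. */
void foo_set_parent_demo(void)
{
    unsigned long prate = 0;

    assert(foo_best_div_set_parent(25000UL, 40000UL, 8, &prate) == 2);
    assert(prate == 50000UL);
}
```

With fooclk2 moved to 50000, fooclk4 keeps its divisor of 8 and drops from 5000 to 6250 only in name: 50000 / 8 = 6250, which is the value shown in clk_summary. For the earlier 10000 request the loop returns at i=4 because 10000 * 4 equals the current parent rate of 40000, so the parent is left unchanged.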

Sample clock driver

See: clk-foo-dividers.c & clk-foo-mux.c & clk.c

References

Common Clock Framework -1- (Initialization) | 문c
Common Clock Framework -2- (APIs) | 문c – this article