문c Blog

smp_build_mpidr_hash()

2016-04-12, 2019-04-29 문영일

MPIDR Hash Bits

This section walks through the steps of building and using the MPIDR hash bits.

Step 1: Reading the MPIDR
- a) smp_setup_processor_id() reads the boot cpu's MPIDR value, which encodes its affinity levels, and stores it in __cpu_logical_map[0]. The value stored for a given @cpu is later read back with cpu_logical_map(cpu).
  - The mpidr value holds the id at each affinity level, including the physical cpu id.
  - See: smp_setup_processor_id() | 문c
- b) smp_init_cpus() stores the mpidr values read from the "reg" property of each cpu node in the device tree, or from the ACPI tables, into the __cpu_logical_map[] array.
  - See: smp_init_cpus() | 문c
Step 2: Building the MPIDR hash
- smp_build_mpidr_hash() analyzes the mpidr values gathered from every cpu, one affinity level at a time, and fills in the global mpidr_hash structure. The structure records the shift count for each affinity level, the mask of bits that vary, and the total number of hash bits.
- ARM32 manages up to 3 affinity levels and ARM64 up to 4.
Step 3: Using the MPIDR hash
- The precomputed mpidr_hash is used by the assembly code inside cpu_suspend() and cpu_resume().
  - The mpidr value of the cpu that is going to sleep or resuming is read, and the bits used at each affinity level are shifted right by the amounts stored in mpidr_hash to produce the final value.
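
Step 3 can be sketched in C as follows. This is only an illustration of the shift-and-OR idea; the kernel does this in assembly, and the names mpidr_hash_sketch and mpidr_hash_index are hypothetical, introduced here for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field names follow the ARM32 struct mpidr_hash quoted in this post. */
struct mpidr_hash_sketch {
	uint32_t mask;         /* bits that differ between cpus */
	uint32_t shift_aff[3]; /* right shift applied to each affinity byte */
	uint32_t bits;         /* total number of hash bits */
};

/* Hypothetical helper: turn an MPIDR value into a compact linear index. */
static uint32_t mpidr_hash_index(const struct mpidr_hash_sketch *h,
				 uint32_t mpidr)
{
	uint32_t m;

	assert(h != NULL);
	m = mpidr & h->mask;                 /* drop bits that never change */
	return ((m & 0x0000ffu) >> h->shift_aff[0]) |  /* affinity level 0 */
	       ((m & 0x00ff00u) >> h->shift_aff[1]) |  /* affinity level 1 */
	       ((m & 0xff0000u) >> h->shift_aff[2]);   /* affinity level 2 */
}
```

With the rpi2 values given later in this post (mask=0x03, shifts 0, 6, 14), the mpidr 0xf02 maps to index 2.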

Computing the MPIDR Hash

The following figure shows how mpidr_hash is computed.

smp_build_mpidr_hash() – ARM32

arch/arm/kernel/setup.c

/**
 * smp_build_mpidr_hash - Pre-compute shifts required at each affinity
 *                        level in order to build a linear index from an
 *                        MPIDR value. Resulting algorithm is a collision
 *                        free hash carried out through shifting and ORing
 */

static void __init smp_build_mpidr_hash(void)
{
        u32 i, affinity;
        u32 fs[3], bits[3], ls, mask = 0;
        /*
         * Pre-scan the list of MPIDRS and filter out bits that do
         * not contribute to affinity levels, ie they never toggle.
         */
        for_each_possible_cpu(i)
                mask |= (cpu_logical_map(i) ^ cpu_logical_map(0));
        pr_debug("mask of set bits 0x%x\n", mask);
        /*
         * Find and stash the last and first bit set at all affinity levels to
         * check how many bits are required to represent them.
         */
        for (i = 0; i < 3; i++) {
                affinity = MPIDR_AFFINITY_LEVEL(mask, i);
                /*
                 * Find the MSB bit and LSB bits position
                 * to determine how many bits are required
                 * to express the affinity level.
                 */
                ls = fls(affinity);
                fs[i] = affinity ? ffs(affinity) - 1 : 0;
                bits[i] = ls - fs[i];
        }
        /*
         * An index can be created from the MPIDR by isolating the
         * significant bits at each affinity level and by shifting
         * them in order to compress the 24 bits values space to a
         * compressed set of values. This is equivalent to hashing
         * the MPIDR through shifting and ORing. It is a collision free
         * hash though not minimal since some levels might contain a number
         * of CPUs that is not an exact power of 2 and their bit
         * representation might contain holes, eg MPIDR[7:0] = {0x2, 0x80}.
         */
        mpidr_hash.shift_aff[0] = fs[0];
        mpidr_hash.shift_aff[1] = MPIDR_LEVEL_BITS + fs[1] - bits[0];
        mpidr_hash.shift_aff[2] = 2*MPIDR_LEVEL_BITS + fs[2] -
                                                (bits[1] + bits[0]);
        mpidr_hash.mask = mask;
        mpidr_hash.bits = bits[2] + bits[1] + bits[0];
        pr_debug("MPIDR hash: aff0[%u] aff1[%u] aff2[%u] mask[0x%x] bits[%u]\n",
                                mpidr_hash.shift_aff[0],
                                mpidr_hash.shift_aff[1],
                                mpidr_hash.shift_aff[2],
                                mpidr_hash.mask,
                                mpidr_hash.bits);
        /*
         * 4x is an arbitrary value used to warn on a hash table much bigger
         * than expected on most systems.
         */
        if (mpidr_hash_size() > 4 * num_possible_cpus())
                pr_warn("Large number of MPIDR hash buckets detected\n");
        sync_cache_w(&mpidr_hash);
}

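The mpidr values of all logical cpu ids are read and analyzed across the 3 affinity levels to fill in the global mpidr_hash structure. The resulting mpidr_hash holds the shift values that convert a cpu's mpidr value into a compact index, and it is used by __cpu_suspend(), __cpu_resume() and related code.

In code lines 9~11, every possible cpu is visited, the stored mpidr value of each logical cpu is XORed with that of cpu 0, and the bits that differ are accumulated in mask, which is then printed as debug information.
- Bits that are set identically on every core and never change are filtered out.
In code lines 16~26, each affinity level is visited, the bits that vary at that level are extracted from mask, and the number of hash bits needed for the level is stored in bits[].
- The first 3 affinity levels are visited and the per-level value (0~255) is extracted from mask.
- From the affinity value, fls() gives the number of the last (most significant) set bit and ffs() - 1 gives the number of the first (least significant) set bit.
- e.g. affinity=0xc
  - ls=4
  - fs=2
  - bits=ls-fs=2
In code lines 37~48, the number of bits to shift is computed for each affinity level, and mask and the total number of hash bits are stored. These values are then printed as debug output.
In code line 55, the inner and outer caches are cleaned for the memory occupied by the mpidr_hash object.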

The figure below shows how the global mpidr_hash object is built for a system made up of 2 clusters, 4 cpu cores and 4 virtual cores (hypothetical, not a real system).

4 mpidr hash bits are needed, and the shift counts for the levels are 2, 10 and 19.

The figure below shows how the global mpidr_hash object is built for the rpi2 and exynos-5420 systems.

rpi2: 2 mpidr hash bits are needed, and the shift counts for the levels are 0, 6 and 14.
exynos-5420: 3 mpidr hash bits are needed, and the shift counts for the levels are 0, 6 and 13.
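
The rpi2 and exynos-5420 numbers can be reproduced in user space with a minimal sketch of the same arithmetic. The helper names (fls32, ffs32, build_hash, hash_result) are hypothetical stand-ins. The rpi2 mpidr values are the ones quoted in this post; the exynos-5420 values (0x0~0x3 and 0x100~0x103) are an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEVEL_BITS 8 /* same value as the kernel's MPIDR_LEVEL_BITS */

struct hash_result {
	uint32_t mask;
	uint32_t shift_aff[3];
	uint32_t bits;
};

/* Portable stand-ins for fls()/ffs(): 1-based bit number, 0 when x == 0. */
static uint32_t fls32(uint32_t x)
{
	uint32_t n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

static uint32_t ffs32(uint32_t x)
{
	uint32_t n = 1;

	if (!x)
		return 0;
	while (!(x & 1)) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Same arithmetic as the ARM32 smp_build_mpidr_hash(), on a plain array. */
static struct hash_result build_hash(const uint32_t *mpidr, size_t n)
{
	struct hash_result h = { 0 };
	uint32_t fs[3], bits[3];
	size_t i;

	assert(mpidr != NULL && n > 0);
	for (i = 0; i < n; i++)
		h.mask |= mpidr[i] ^ mpidr[0]; /* keep only bits that toggle */

	for (i = 0; i < 3; i++) {
		uint32_t aff = (h.mask >> (i * LEVEL_BITS)) & 0xff;

		fs[i] = aff ? ffs32(aff) - 1 : 0;
		bits[i] = fls32(aff) - fs[i];
	}

	h.shift_aff[0] = fs[0];
	h.shift_aff[1] = LEVEL_BITS + fs[1] - bits[0];
	h.shift_aff[2] = 2 * LEVEL_BITS + fs[2] - (bits[1] + bits[0]);
	h.bits = bits[2] + bits[1] + bits[0];
	return h;
}
```

Calling build_hash() on { 0xf00, 0xf01, 0xf02, 0xf03 } gives mask=0x3, bits=2 and shifts 0, 6, 14, matching the rpi2 line above.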

smp_build_mpidr_hash() – ARM64

arch/arm64/kernel/setup.c

/**
 * smp_build_mpidr_hash - Pre-compute shifts required at each affinity
 *                        level in order to build a linear index from an
 *                        MPIDR value. Resulting algorithm is a collision
 *                        free hash carried out through shifting and ORing
 */

static void __init smp_build_mpidr_hash(void)
{
        u32 i, affinity, fs[4], bits[4], ls;
        u64 mask = 0;
        /*
         * Pre-scan the list of MPIDRS and filter out bits that do
         * not contribute to affinity levels, ie they never toggle.
         */
        for_each_possible_cpu(i)
                mask |= (cpu_logical_map(i) ^ cpu_logical_map(0));
        pr_debug("mask of set bits %#llx\n", mask);
        /*
         * Find and stash the last and first bit set at all affinity levels to
         * check how many bits are required to represent them.
         */
        for (i = 0; i < 4; i++) {
                affinity = MPIDR_AFFINITY_LEVEL(mask, i);
                /*
                 * Find the MSB bit and LSB bits position
                 * to determine how many bits are required
                 * to express the affinity level.
                 */
                ls = fls(affinity);
                fs[i] = affinity ? ffs(affinity) - 1 : 0;
                bits[i] = ls - fs[i];
        }
        /*
         * An index can be created from the MPIDR_EL1 by isolating the
         * significant bits at each affinity level and by shifting
         * them in order to compress the 32 bits values space to a
         * compressed set of values. This is equivalent to hashing
         * the MPIDR_EL1 through shifting and ORing. It is a collision free
         * hash though not minimal since some levels might contain a number
         * of CPUs that is not an exact power of 2 and their bit
         * representation might contain holes, eg MPIDR_EL1[7:0] = {0x2, 0x80}.
         */
        mpidr_hash.shift_aff[0] = MPIDR_LEVEL_SHIFT(0) + fs[0];
        mpidr_hash.shift_aff[1] = MPIDR_LEVEL_SHIFT(1) + fs[1] - bits[0];
        mpidr_hash.shift_aff[2] = MPIDR_LEVEL_SHIFT(2) + fs[2] -
                                                (bits[1] + bits[0]);
        mpidr_hash.shift_aff[3] = MPIDR_LEVEL_SHIFT(3) +
                                  fs[3] - (bits[2] + bits[1] + bits[0]);
        mpidr_hash.mask = mask;
        mpidr_hash.bits = bits[3] + bits[2] + bits[1] + bits[0];
        pr_debug("MPIDR hash: aff0[%u] aff1[%u] aff2[%u] aff3[%u] mask[%#llx] bits[%u]\n",
                mpidr_hash.shift_aff[0],
                mpidr_hash.shift_aff[1],
                mpidr_hash.shift_aff[2],
                mpidr_hash.shift_aff[3],
                mpidr_hash.mask,
                mpidr_hash.bits);
        /*
         * 4x is an arbitrary value used to warn on a hash table much bigger
         * than expected on most systems.
         */
        if (mpidr_hash_size() > 4 * num_possible_cpus())
                pr_warn("Large number of MPIDR hash buckets detected\n");
}

The ARM64 version uses the same method as ARM32; the only difference is that it manages 4 affinity levels instead of 3.

Cache Sync (clean) – ARM32

sync_cache_w() – ARM32

arch/arm/include/asm/cacheflush.h

#define sync_cache_w(ptr) __sync_cache_range_w(ptr, sizeof *(ptr))

Cleans the cache for the object that ptr points to.

__sync_cache_range_w()

arch/arm/include/asm/cacheflush.h

/*
 * Ensure preceding writes to *p by this CPU are visible to
 * subsequent reads by other CPUs:
 */
static inline void __sync_cache_range_w(volatile void *p, size_t size)
{
        char *_p = (char *)p;

        __cpuc_clean_dcache_area(_p, size);
        outer_clean_range(__pa(_p), __pa(_p + size));
}

Cleans the inner cache and the outer cache for the region of size bytes starting at address p.

arch/arm/include/asm/cacheflush.h

/*
 * There is no __cpuc_clean_dcache_area but we use it anyway for
 * code intent clarity, and alias it to __cpuc_flush_dcache_area.
 */
#define __cpuc_clean_dcache_area __cpuc_flush_dcache_area

arch/arm/include/asm/cacheflush.h

#define __cpuc_flush_dcache_area        cpu_cache.flush_kern_dcache_area

When MULTI_CACHE is selected, the cache handler function is called through the cpu_cache structure.
- rpi2 uses this path as well.

outer_clean_range()

arch/arm/include/asm/outercache.h

/**
 * outer_clean_range - clean dirty outer cache lines
 * @start: starting physical address, inclusive
 * @end: end physical address, exclusive
 */
static inline void outer_clean_range(phys_addr_t start, phys_addr_t end)
{
        if (outer_cache.clean_range)
                outer_cache.clean_range(start, end);
}

Cleans the outer cache from physical address start up to, but not including, end.

#ifdef CONFIG_OUTER_CACHE
struct outer_cache_fns outer_cache __read_mostly;
EXPORT_SYMBOL(outer_cache);
#endif

The global outer_cache is an outer_cache_fns structure that holds the outer cache handler functions.

Structures

mpidr_hash structure – ARM32

arch/arm/include/asm/smp_plat.h

/*      
 * NOTE ! Assembly code relies on the following
 * structure memory layout in order to carry out load
 * multiple from its base address. For more
 * information check arch/arm/kernel/sleep.S
 */     
struct mpidr_hash {
        u32     mask; /* used by sleep.S */
        u32     shift_aff[3]; /* used by sleep.S */
        u32     bits;
};

mask
- Only the bits that vary across the mpidr values of all logical cpus.
- rpi2 e.g.) 0xf00, 0xf01, 0xf02, 0xf03 -> mask=0x03 (only the two LSBs vary)
shift_aff[3]
- Number of bits to shift right at each affinity level when turning an mpidr value into the mpidr hash value
- rpi2 e.g.) mpidr hash bits = 2 bits in total (affinity0=2, affinity1=0, affinity2=0)
  - shift_aff[0]=0, shift_aff[1]=6, shift_aff[2]=14
bits
- Number of mpidr hash bits
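
The NOTE in the structure comment says the assembly code depends on this memory layout. A small sketch of what that implies, assuming a user-space copy of the structure named mpidr_hash_layout (a hypothetical name) with u32 spelled as uint32_t: mask and the three shifts must sit in four consecutive 32-bit words starting at the base address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space copy of the ARM32 layout quoted above. */
struct mpidr_hash_layout {
	uint32_t mask;
	uint32_t shift_aff[3];
	uint32_t bits;
};

/* Compile-time checks: a load-multiple from the base address would see
 * mask followed by shift_aff[0..2], with no padding in between. */
static_assert(offsetof(struct mpidr_hash_layout, mask) == 0,
	      "mask must be the first word");
static_assert(offsetof(struct mpidr_hash_layout, shift_aff) == 4,
	      "shifts must follow mask directly");
static_assert(offsetof(struct mpidr_hash_layout, bits) == 16,
	      "bits comes after the three shifts");
static_assert(sizeof(struct mpidr_hash_layout) == 20,
	      "five 32-bit words, no padding");
```

Reordering the members would still compile as C, but checks like these fail at build time, which is the kind of breakage the NOTE warns about.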

mpidr_hash structure – ARM64

arch/arm64/include/asm/smp_plat.h

struct mpidr_hash {
        u64     mask;
        u32     shift_aff[4];
        u32     bits;
};

Similar to ARM32, except that mask is widened to u64 and the shift_aff[] array grows from 3 to 4 entries so that up to 4 affinity levels can be managed.

outer_cache_fns structure – ARM32 only

arch/arm/include/asm/outercache.h

struct outer_cache_fns {
        void (*inv_range)(unsigned long, unsigned long);
        void (*clean_range)(unsigned long, unsigned long);
        void (*flush_range)(unsigned long, unsigned long);
        void (*flush_all)(void);
        void (*disable)(void);
#ifdef CONFIG_OUTER_CACHE_SYNC
        void (*sync)(void);
#endif
        void (*resume)(void);

        /* This is an ARM L2C thing */
        void (*write_sec)(unsigned long, unsigned);
        void (*configure)(const struct l2x0_regs *);
};

Consists of the outer cache handler functions.
- rpi2: not used.
- A few ARM machines use an l2 or l3 cache as an outer cache.

References

ARM: kernel: build MPIDR hash function data structure

smp_init_cpus()

2016-04-08, 2019-05-08 문영일

SMP (Symmetric Multi-Processor) Operations

The smp_operations structure holds the SMP-specific handlers, and the global smp_ops carries one hook function per operation.

SMP operations fall into three broad types:

- SMP operations that use the machine descriptor (ARM32 only)
- SMP operations for PSCI
- SMP operations that use spin-table

Determining the Boot CPU Operations – ARM64

cpu_read_bootcpu_ops() – ARM64

arch/arm64/include/asm/cpu_ops.h

static inline void __init cpu_read_bootcpu_ops(void)
{
        cpu_read_ops(0);
}

Determines the operations that the boot cpu will use.

cpu_read_ops() – ARM64

arch/arm64/kernel/cpu_ops.c

/*
 * Read a cpu's enable method and record it in cpu_ops.
 */

int __init cpu_read_ops(int cpu)
{
        const char *enable_method = cpu_read_enable_method(cpu);

        if (!enable_method)
                return -ENODEV;

        cpu_ops[cpu] = cpu_get_ops(enable_method);
        if (!cpu_ops[cpu]) {
                pr_warn("Unsupported enable-method: %s\n", enable_method);
                return -EOPNOTSUPP;
        }

        return 0;
}

Determines the operations to use for the requested @cpu. To do so, the enable-method value is read from the device tree or ACPI.
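
The body of cpu_get_ops() is not quoted in this post. A minimal sketch of the idea, assuming it is a lookup by name in a table of supported operations; every identifier below (cpu_ops_sketch, supported_ops, get_ops_sketch) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for a cpu operations structure. */
struct cpu_ops_sketch {
	const char *name; /* matched against the enable-method string */
};

static const struct cpu_ops_sketch spin_table_ops = { "spin-table" };
static const struct cpu_ops_sketch psci_ops_sketch = { "psci" };

/* NULL-terminated table of the methods this sketch supports. */
static const struct cpu_ops_sketch *const supported_ops[] = {
	&spin_table_ops,
	&psci_ops_sketch,
	NULL,
};

/* Return the ops whose name equals enable_method, or NULL if unsupported. */
static const struct cpu_ops_sketch *get_ops_sketch(const char *enable_method)
{
	const struct cpu_ops_sketch *const *ops;

	assert(enable_method != NULL);
	for (ops = supported_ops; *ops; ops++)
		if (strcmp(enable_method, (*ops)->name) == 0)
			return *ops;
	return NULL;
}
```

A NULL result corresponds to the "Unsupported enable-method" warning and the -EOPNOTSUPP return in cpu_read_ops().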

cpu_read_enable_method() – ARM64

arch/arm64/kernel/cpu_ops.c

static const char *__init cpu_read_enable_method(int cpu)
{
        const char *enable_method;

        if (acpi_disabled) {
                struct device_node *dn = of_get_cpu_node(cpu, NULL);

                if (!dn) {
                        if (!cpu)
                                pr_err("Failed to find device node for boot cpu\n");
                        return NULL;
                }

                enable_method = of_get_property(dn, "enable-method", NULL);
                if (!enable_method) {
                        /*
                         * The boot CPU may not have an enable method (e.g.
                         * when spin-table is used for secondaries).
                         * Don't warn spuriously.
                         */
                        if (cpu != 0)
                                pr_err("%pOF: missing enable-method property\n",
                                        dn);
                }
        } else {
                enable_method = acpi_get_enable_method(cpu);
                if (!enable_method) {
                        /*
                         * In ACPI systems the boot CPU does not require
                         * checking the enable method since for some
                         * boot protocol (ie parking protocol) it need not
                         * be initialized. Don't warn spuriously.
                         */
                        if (cpu != 0)
                                pr_err("Unsupported ACPI enable-method\n");
                }
        }

        return enable_method;
}

Looks up the enable-method to use for the requested @cpu. If none is found, null is returned.

In code lines 5~24, the "enable-method" property is read from the cpu node in the device tree.
- This yields "psci" or "spin-table". The rpi3 system uses the value "brcm,bcm2836-smp".
In code lines 25~37, when ACPI is in use, the enable-method is obtained from the acpi tables.
- This yields "psci" or "parking-protocol".

SMP CPU Initialization – ARM64

The following figure shows the initialization flow for the SMP cpus.

smp_init_cpus() – ARM64

arch/arm64/kernel/smp.c

/*
 * Enumerate the possible CPU set from the device tree or ACPI and build the
 * cpu logical map array containing MPIDR values related to logical
 * cpus. Assumes that cpu_logical_map(0) has already been initialized.
 */

void __init smp_init_cpus(void)
{
        int i;

        if (acpi_disabled)
                of_parse_and_init_cpus();
        else
                acpi_parse_and_init_cpus();

        if (cpu_count > nr_cpu_ids)
                pr_warn("Number of cores (%d) exceeds configured maximum of %u - clipping\n",
                        cpu_count, nr_cpu_ids);

        if (!bootcpu_valid) {
                pr_err("missing boot CPU MPIDR, not enabling secondaries\n");
                return;
        }

        /*
         * We need to set the cpu_logical_map entries before enabling
         * the cpus so that cpu processor description entries (DT cpu nodes
         * and ACPI MADT entries) can be retrieved by matching the cpu hwid
         * with entries in cpu_logical_map while initializing the cpus.
         * If the cpu set-up fails, invalidate the cpu_logical_map entry.
         */
        for (i = 1; i < nr_cpu_ids; i++) {
                if (cpu_logical_map(i) != INVALID_HWID) {
                        if (smp_cpu_setup(i))
                                cpu_logical_map(i) = INVALID_HWID;
                }
        }
}

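For the SMP cpus, sets up the logical cpu -> physical cpu mapping and the cpu -> node mapping, and initializes each cpu.

In code lines 5~6, cpu information is read from the cpu nodes in the device tree to set up the logical cpu -> physical cpu mapping and the cpu -> node mapping.
In code lines 7~8, cpu information is read from the ACPI tables to set up the logical cpu -> physical cpu mapping and the cpu -> node mapping.
In code lines 26~31, each cpu is initialized; if the set-up fails, its cpu_logical_map entry is invalidated.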

of_parse_and_init_cpus() – ARM64

arch/arm64/kernel/smp.c

/*
 * Enumerate the possible CPU set from the device tree and build the
 * cpu logical map array containing MPIDR values related to logical
 * cpus. Assumes that cpu_logical_map(0) has already been initialized.
 */

static void __init of_parse_and_init_cpus(void)
{
        struct device_node *dn;

        for_each_of_cpu_node(dn) {
                u64 hwid = of_get_cpu_mpidr(dn);

                if (hwid == INVALID_HWID)
                        goto next;

                if (is_mpidr_duplicate(cpu_count, hwid)) {
                        pr_err("%pOF: duplicate cpu reg properties in the DT\n",
                                dn);
                        goto next;
                }

                /*
                 * The numbering scheme requires that the boot CPU
                 * must be assigned logical id 0. Record it so that
                 * the logical map built from DT is validated and can
                 * be used.
                 */
                if (hwid == cpu_logical_map(0)) {
                        if (bootcpu_valid) {
                                pr_err("%pOF: duplicate boot cpu reg property in DT\n",
                                        dn);
                                goto next;
                        }

                        bootcpu_valid = true;
                        early_map_cpu_to_node(0, of_node_to_nid(dn));

                        /*
                         * cpu_logical_map has already been
                         * initialized and the boot cpu doesn't need
                         * the enable-method so continue without
                         * incrementing cpu.
                         */
                        continue;
                }

                if (cpu_count >= NR_CPUS)
                        goto next;

                pr_debug("cpu logical map 0x%llx\n", hwid);
                cpu_logical_map(cpu_count) = hwid;

                early_map_cpu_to_node(cpu_count, of_node_to_nid(dn));
next:
                cpu_count++;
        }
}

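Reads cpu information from the cpu nodes in the device tree and sets up the logical cpu -> physical cpu mapping and the cpu -> node mapping.

In code lines 5~9, the cpu nodes are visited and the hwid is read from the reg property of each one.
In code lines 11~15, a duplicate hwid causes an error message to be printed and the node to be skipped.
In code lines 23~40, the cpu -> node mapping is set up for the boot cpu, and a check makes sure that only one node describes the boot cpu.
In code lines 42~43, the node is skipped once the number of cpu nodes read from the device tree exceeds the maximum number of cpus configured at compile time.
In code lines 45~46, the hwid is printed as debug information and stored to support the logical cpu -> physical cpu conversion.
In code line 48, the cpu -> node mapping is set up.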
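
The bookkeeping of this loop can be tried out in user space. The sketch below follows the control flow of the listing above on a plain list of hwid values and leaves out the NUMA node mapping. The names cpu_map, map_init, is_duplicate and map_add are hypothetical, and MAX_CPUS and INVALID_ID only stand in for NR_CPUS and INVALID_HWID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS   8          /* stands in for NR_CPUS */
#define INVALID_ID UINT64_MAX /* stands in for INVALID_HWID */

struct cpu_map {
	uint64_t logical[MAX_CPUS]; /* logical cpu -> hwid */
	unsigned int count;         /* next logical id to hand out */
	bool bootcpu_valid;
};

/* Slot 0 is assumed to be filled with the boot cpu's hwid beforehand. */
static void map_init(struct cpu_map *m, uint64_t boot_hwid)
{
	size_t i;

	assert(m != NULL);
	for (i = 0; i < MAX_CPUS; i++)
		m->logical[i] = INVALID_ID;
	m->logical[0] = boot_hwid;
	m->count = 1;
	m->bootcpu_valid = false;
}

/* True if hwid is already stored in logical ids 1..count-1. */
static bool is_duplicate(const struct cpu_map *m, uint64_t hwid)
{
	unsigned int i;

	for (i = 1; i < m->count && i < MAX_CPUS; i++)
		if (m->logical[i] == hwid)
			return true;
	return false;
}

/* Process the hwid of one cpu node, mirroring the loop body above. */
static void map_add(struct cpu_map *m, uint64_t hwid)
{
	if (hwid == INVALID_ID || is_duplicate(m, hwid))
		goto next;
	if (hwid == m->logical[0]) {
		if (m->bootcpu_valid)
			goto next; /* a second node claims to be the boot cpu */
		m->bootcpu_valid = true;
		return;            /* boot cpu keeps id 0, count is unchanged */
	}
	if (m->count >= MAX_CPUS)
		goto next;
	m->logical[m->count] = hwid;
next:
	m->count++;
}
```

One detail this makes visible: a skipped node still advances the counter, so a duplicate entry leaves a hole (an invalid slot) in the logical map instead of being compacted away.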

The following shows the cpu nodes used on Broadcom's Northstar2 chip.

All 4 cores use psci calls to the secure firmware.

        cpus {
                #address-cells = <2>;
                #size-cells = <0>;

                A57_0: cpu@0 {
                        device_type = "cpu";
                        compatible = "arm,cortex-a57", "arm,armv8";
                        reg = <0 0>;
                        enable-method = "psci";
                        next-level-cache = <&CLUSTER0_L2>;
                };

                A57_1: cpu@1 {
                        device_type = "cpu";
                        compatible = "arm,cortex-a57", "arm,armv8";
                        reg = <0 1>;
                        enable-method = "psci";
                        next-level-cache = <&CLUSTER0_L2>;
                };

                A57_2: cpu@2 {
                        device_type = "cpu";
                        compatible = "arm,cortex-a57", "arm,armv8";
                        reg = <0 2>;
                        enable-method = "psci";
                        next-level-cache = <&CLUSTER0_L2>;
                };

                A57_3: cpu@3 {
                        device_type = "cpu";
                        compatible = "arm,cortex-a57", "arm,armv8";
                        reg = <0 3>;
                        enable-method = "psci";
                        next-level-cache = <&CLUSTER0_L2>;
                };

                CLUSTER0_L2: l2-cache@0 {
                        compatible = "cache";
                };
        };

        psci {
                compatible = "arm,psci-1.0";
                method = "smc";
        };
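
With #address-cells = <2>, each reg value above consists of two 32-bit cells. A sketch of how two cells combine into one 64-bit hwid, assuming the usual device tree convention of most significant cell first; the body of of_get_cpu_mpidr() is not quoted in this post, and cells_to_u64 is a hypothetical helper that takes cells already converted to host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine ncells 32-bit cells (most significant first) into one number. */
static uint64_t cells_to_u64(const uint32_t *cells, size_t ncells)
{
	uint64_t v = 0;

	assert(cells != NULL && ncells >= 1 && ncells <= 2);
	while (ncells--)
		v = (v << 32) | *cells++;
	return v;
}
```

For reg = <0 3> this yields hwid 0x3, the value that ends up in cpu_logical_map() for that cpu.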

early_map_cpu_to_node() – ARM64

arch/arm64/mm/numa.c

void __init early_map_cpu_to_node(unsigned int cpu, int nid)
{
        /* fallback to node 0 */
        if (nid < 0 || nid >= MAX_NUMNODES || numa_off)
                nid = 0;

        cpu_to_node_map[cpu] = nid;

        /*
         * We should set the numa node of cpu0 as soon as possible, because it
         * has already been set up online before. cpu_to_node(0) will soon be
         * called.
         */
        if (!cpu)
                set_cpu_numa_node(cpu, nid);
}

Sets @nid for the requested @cpu to support the cpu -> node conversion.

On a NUMA system, cpu 0 is already online at this point, so its numa node is additionally set for the cpu right away.

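Assigning and Initializing the SMP CPU operations – ARM32

This happens midway through setup_arch(), just before smp_init_cpus() is called.

What the global smp_ops is set to depends on whether PSCI is operational.

- When PSCI is operational, smp_ops is set to psci_smp_ops.
- When PSCI is not operational and mdesc->smp exists, smp_ops is set to mdesc->smp.

Assigning the SMP cpu operations

setup_arch() (middle part) – ARM32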

arch/arm/kernel/setup.c

#ifdef CONFIG_SMP
        if (is_smp()) {
                if (!mdesc->smp_init || !mdesc->smp_init()) {
                        if (psci_smp_available())
                                smp_set_ops(&psci_smp_ops);
                        else if (mdesc->smp)
                                smp_set_ops(mdesc->smp);
                }

In code line 3, on an SMP machine, the machine's smp_init() hook runs first if it exists; the PSCI and mdesc->smp choices below are considered only when the smp_init member is null or smp_init() reports failure.
- A) The smp_init() hook lets MCPM (Multi-Cluster Power Management) be selected ahead of PSCI by setting smp_ops to the mcpm_smp_ops object.
  - vexpress e.g.)
    - mdesc->smp_init refers to vexpress_smp_init_ops(), which is called.
      - When a "cci-400" (Cache Coherent Interconnect 400 series) device is found in the DT and its "status" property is "ok", smp_ops is set to the mcpm_smp_ops object.
        
        mcpm_smp_ops.smp_init_cpus = null
In code lines 4~5, if the CONFIG_ARM_PSCI kernel option is set and PSCI is operational
- B) the global smp_ops is set to psci_smp_ops so that PSCI (Power State Coordination Interface) can be used.
  - psci_smp_ops.smp_init_cpus = null
In code lines 6~7, C) when neither MCPM nor PSCI is operational and mdesc->smp exists, smp_set_ops() assigns mdesc->smp to smp_ops.
- e.g.) rpi2:
  - smp_ops is set from bcm2709_smp_ops.
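
The priority order A) -> B) -> C) can be summarized with a small decision function. This is a sketch only: machine_state, ops_choice and choose_smp_ops are hypothetical names, and the real code assigns operation structures instead of returning an enum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ops_choice { OPS_NONE, OPS_FROM_SMP_INIT, OPS_PSCI, OPS_MDESC_SMP };

/* Hypothetical summary of the state that the snippet inspects. */
struct machine_state {
	bool (*smp_init)(void); /* mdesc->smp_init, may be NULL */
	bool psci_available;    /* result of psci_smp_available() */
	bool has_mdesc_smp;     /* mdesc->smp != NULL */
};

static enum ops_choice choose_smp_ops(const struct machine_state *s)
{
	assert(s != NULL);
	if (s->smp_init && s->smp_init())
		return OPS_FROM_SMP_INIT; /* A) e.g. MCPM picked by the hook */
	if (s->psci_available)
		return OPS_PSCI;          /* B) */
	if (s->has_mdesc_smp)
		return OPS_MDESC_SMP;     /* C) */
	return OPS_NONE;                  /* smp_ops is left untouched */
}

/* Example hooks for trying the function out. */
static bool hook_accepts(void) { return true; }
static bool hook_declines(void) { return false; }
```

A hook that returns true wins even when PSCI is available, which is how MCPM takes precedence over PSCI in case A).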

Example of a system that uses mdesc->smp and mdesc->smp_init

arch/arm/mach-vexpress/v2m.c

static const char * const v2m_dt_match[] __initconst = { 
        "arm,vexpress",
        NULL,
};

DT_MACHINE_START(VEXPRESS_DT, "ARM-Versatile Express")
        .dt_compat      = v2m_dt_match,
        .l2c_aux_val    = 0x00400000,
        .l2c_aux_mask   = 0xfe0fffff,
        .smp            = smp_ops(vexpress_smp_dt_ops),
        .smp_init       = smp_init_ops(vexpress_smp_init_ops),
MACHINE_END

DT machine definition for the Versatile Express system
- .smp refers to the global vexpress_smp_dt_ops object
- .smp_init refers to the vexpress_smp_init_ops() function

smp_ops()

#define smp_ops(ops) (&(ops))

vexpress_smp_dt_ops global object

arch/arm/mach-vexpress/platsmp.c

struct smp_operations __initdata vexpress_smp_dt_ops = { 
        .smp_prepare_cpus       = vexpress_smp_dt_prepare_cpus,
        .smp_secondary_init     = versatile_secondary_init,
        .smp_boot_secondary     = versatile_boot_secondary,
#ifdef CONFIG_HOTPLUG_CPU
        .cpu_die                = vexpress_cpu_die,
#endif
};

smp_init_ops()

arch/arm/include/asm/mach/arch.h

#define smp_init_ops(ops) (&(ops))

vexpress_smp_init_ops()

arch/arm/mach-vexpress/platsmp.c

bool __init vexpress_smp_init_ops(void)
{
#ifdef CONFIG_MCPM
        /*
         * The best way to detect a multi-cluster configuration at the moment
         * is to look for the presence of a CCI in the system.
         * Override the default vexpress_smp_ops if so.
         */
        struct device_node *node;
        node = of_find_compatible_node(NULL, NULL, "arm,cci-400");
        if (node && of_device_is_available(node)) {
                mcpm_smp_set_ops();
                return true;
        }
#endif
        return false;
}

CONFIG_MCPM
- Multi-Cluster Power Management, which manages power on a per-cluster basis for big.LITTLE and similar systems.
node = of_find_compatible_node(NULL, NULL, "arm,cci-400");
- Searches all DT nodes for one whose compatible property is "arm,cci-400".
if (node && of_device_is_available(node)) {
- If the node's device is available
  - i.e. its "status" property is "okay" or "ok", or the property is absent
mcpm_smp_set_ops();
- Sets the global smp_ops to mcpm_smp_ops.

The following shows the device tree definition of the "arm,cci-400" Cache Coherent Interconnect 400 series device in vexpress-v2p-ca15_a7.dts.

Two a15 cpus and three a7 cpus make up the big.LITTLE clusters.

        cci@2c090000 {
                compatible = "arm,cci-400";
                #address-cells = <1>;
                #size-cells = <1>;
                reg = <0 0x2c090000 0 0x1000>;
                ranges = <0x0 0x0 0x2c090000 0x10000>;

                cci_control1: slave-if@4000 {
                        compatible = "arm,cci-400-ctrl-if";
                        interface-type = "ace";
                        reg = <0x4000 0x1000>;
                };

                cci_control2: slave-if@5000 {
                        compatible = "arm,cci-400-ctrl-if";
                        interface-type = "ace";
                        reg = <0x5000 0x1000>;
                };
        };

psci_smp_available()

arch/arm/kernel/psci_smp.c

bool __init psci_smp_available(void)
{
        /* is cpu_on available at least? */
        return (psci_ops.cpu_on != NULL);
}

When a function is hooked to psci_ops.cpu_on, PSCI can be considered operational.

mcpm_smp_set_ops()

arch/arm/common/mcpm_platsmp.c

static struct smp_operations __initdata mcpm_smp_ops = {
        .smp_boot_secondary     = mcpm_boot_secondary,
        .smp_secondary_init     = mcpm_secondary_init,
#ifdef CONFIG_HOTPLUG_CPU
        .cpu_kill               = mcpm_cpu_kill,
        .cpu_disable            = mcpm_cpu_disable,
        .cpu_die                = mcpm_cpu_die,
#endif
};

void __init mcpm_smp_set_ops(void)
{
        smp_set_ops(&mcpm_smp_ops);
}

Sets the global smp_ops to the mcpm_smp_ops object.

smp_init_cpus() – ARM32

arch/arm/kernel/smp.c

/* platform specific SMP operations */
void __init smp_init_cpus(void)
{
        if (smp_ops.smp_init_cpus)
                smp_ops.smp_init_cpus();
}

If a function is hooked to smp_ops.smp_init_cpus, it is called to carry out the initialization for that SMP machine.
- It typically configures the SCU (Snoop Control Unit), i.e. the cache coherency hardware, or sets up the cpu possible bitmap.
- exynos e.g.)
  - exynos_smp_init_cpus()
- rpi2 e.g.)
  - bcm2709_smp_init_cpus()

Below, smp_ops has been set from bcm2709_smp_ops, so bcm2709_smp_init_cpus() is the function that gets called.

arch/arm/mach-bcm2709/bcm2709.c

struct smp_operations  bcm2709_smp_ops __initdata = {
        .smp_init_cpus          = bcm2709_smp_init_cpus,
        .smp_prepare_cpus       = bcm2709_smp_prepare_cpus,
        .smp_secondary_init     = bcm2709_secondary_init,
        .smp_boot_secondary     = bcm2709_boot_secondary,
};
#endif

static const char * const bcm2709_compat[] = {
        "brcm,bcm2709",
        "brcm,bcm2708", /* Could use bcm2708 in a pinch */
        NULL
};

MACHINE_START(BCM2709, "BCM2709")
    /* Maintainer: Broadcom Europe Ltd. */
#ifdef CONFIG_SMP
        .smp            = smp_ops(bcm2709_smp_ops),
#endif
        .map_io = bcm2709_map_io,
        .init_irq = bcm2709_init_irq,
        .init_time = bcm2709_timer_init,
        .init_machine = bcm2709_init,
        .init_early = bcm2709_init_early,
        .reserve = board_reserve,
        .restart        = bcm2709_restart,
        .dt_compat = bcm2709_compat,
MACHINE_END

bcm2709_smp_init_cpus()

arch/arm/mach-bcm2709/bcm2709.c

void __init bcm2709_smp_init_cpus(void)
{       
        void secondary_startup(void);
        unsigned int i, ncores;
        
        ncores = 4; // xxx scu_get_core_count(NULL);
        printk("[%s] enter (%x->%x)\n", __FUNCTION__, (unsigned)virt_to_phys((void *)secondary_startup), (unsigned)__io_address(ST_BASE + 0x10));
        printk("[%s] ncores=%d\n", __FUNCTION__, ncores);
    
        for (i = 0; i < ncores; i++) {
                set_cpu_possible(i, true);
                /* enable IRQ (not FIQ) */
                writel(0x1, __io_address(ARM_LOCAL_MAILBOX_INT_CONTROL0 + 0x4 * i));
                //writel(0xf, __io_address(ARM_LOCAL_TIMER_INT_CONTROL0   + 0x4 * i));
        }
        set_smp_cross_call(bcm2835_send_doorbell);
}

ncores is hard-coded to 4.
The cpu possible bit is set for each core number.
writel(0x1, __io_address(ARM_LOCAL_MAILBOX_INT_CONTROL0 + 0x4 * i));
- Enables the mailbox IRQ (not FIQ) for each core
set_smp_cross_call(bcm2835_send_doorbell);
- Sets the global __smp_cross_call to the bcm2835_send_doorbell() function.

set_smp_cross_call()

arch/arm/kernel/smp.c

static void (*__smp_cross_call)(const struct cpumask *, unsigned int);

void __init set_smp_cross_call(void (*fn)(const struct cpumask *, unsigned int))
{
        if (!__smp_cross_call)
                __smp_cross_call = fn; 
}

If the global __smp_cross_call has not been set yet, it is set to fn.

smp_cross_call()

It is called from the following functions.

arch_send_call_function_ipi_mask()
arch_send_wakeup_ipi_mask()
arch_send_call_function_single_ipi()
arch_irq_work_raise()
tick_broadcast()
smp_send_reschedule()
smp_send_stop()

arch/arm/kernel/smp.c

static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
{
        trace_ipi_raise(target, ipi_types[ipinr]);
        __smp_cross_call(target, ipinr);
}

bcm2835_send_doorbell()

arch/arm/kernel/smp.c

static void bcm2835_send_doorbell(const struct cpumask *mask, unsigned int irq) 
{
        int cpu; 
        /*
         * Ensure that stores to Normal memory are visible to the
         * other CPUs before issuing the IPI.
         */
        dsb();

        /* Convert our logical CPU mask into a physical one. */
        for_each_cpu(cpu, mask)
        {    
                /* submit softirq */
                writel(1<<irq, __io_address(ARM_LOCAL_MAILBOX0_SET0 + 0x10 * MPIDR_AFFINITY_LEVEL(cpu_logical_map(cpu), 0)));
        }
}
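
The register address and value written by the doorbell can be worked out separately from the hardware access. In the sketch below, the per-core stride 0x10 and the 1 << irq value come from the listing above; the offset 0x80 for ARM_LOCAL_MAILBOX0_SET0 is an assumption, since its definition is not shown in this post, and doorbell_write and doorbell_for are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed offset of core 0's mailbox 0 "set" register within the local
 * peripheral block; the real definition is not quoted in this post. */
#define MAILBOX0_SET0_OFFSET 0x80u
#define MAILBOX_CORE_STRIDE  0x10u /* from the "0x10 * aff0" term above */

struct doorbell_write {
	uint32_t offset; /* register offset that would be written */
	uint32_t value;  /* one bit per IPI number */
};

/* Work out the write that raises IPI ipinr on the cpu with this mpidr. */
static struct doorbell_write doorbell_for(uint32_t mpidr, unsigned int ipinr)
{
	struct doorbell_write w;
	uint32_t aff0 = mpidr & 0xff; /* affinity level 0 = core number */

	assert(ipinr < 32);
	w.offset = MAILBOX0_SET0_OFFSET + MAILBOX_CORE_STRIDE * aff0;
	w.value = UINT32_C(1) << ipinr;
	return w;
}
```

Under these assumptions, the rpi2 core with mpidr 0xf02 and IPI number 3 gives offset 0xa0 and value 0x8.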

writel()

arch/arm/include/asm/io.h

#define writel(v,c)             ({ __iowmb(); writel_relaxed(v,c); })

#define __iowmb()               wmb()

#define writel_relaxed(v,c)     __raw_writel((__force u32) cpu_to_le32(v),c)

__raw_writel()

arch/arm/include/asm/io.h

static inline void __raw_writel(u32 val, volatile void __iomem *addr)
{
        asm volatile("str %1, %0"
                     : "+Qo" (*(volatile u32 __force *)addr)
                     : "r" (val));
}

CPU-related APIs

get_cpu()

include/linux/smp.h

#define get_cpu()               ({ preempt_disable(); smp_processor_id(); })

Returns the cpu id. Preemption is disabled so that the task cannot move to another cpu.

Every use of this function must be paired with a put_cpu() afterwards to enable preemption again.

put_cpu()

include/linux/smp.h

#define put_cpu()               preempt_enable()

The cpu id is no longer needed, so preemption is enabled again.

smp_processor_id()

include/linux/smp.h

/*
 * smp_processor_id(): get the current CPU ID.
 *
 * if DEBUG_PREEMPT is enabled then we check whether it is
 * used in a preemption-safe way. (smp_processor_id() is safe
 * if it's used in a preemption-off critical section, or in
 * a thread that is bound to the current CPU.)
 *
 * NOTE: raw_smp_processor_id() is for internal use only
 * (smp_processor_id() is the preferred variant), but in rare
 * instances it might also be used to turn off false positives
 * (i.e. smp_processor_id() use that the debugging code reports but
 * which use for some reason is legal). Don't use this to hack around
 * the warning message, as your code might not work under PREEMPT.
 */
#ifdef CONFIG_DEBUG_PREEMPT
  extern unsigned int debug_smp_processor_id(void);
# define smp_processor_id() debug_smp_processor_id()
#else
# define smp_processor_id() raw_smp_processor_id()
#endif

When CONFIG_DEBUG_PREEMPT is enabled, a warning is issued if this function is called while preemption is still enabled. When the option is not enabled, the raw_smp_processor_id() macro is called directly.

raw_smp_processor_id() – ARM32

arch/arm/include/asm/smp.h

#define raw_smp_processor_id() (current_thread_info()->cpu)

Returns the number of the cpu on which the current task is running.

raw_smp_processor_id() – ARM64

arch/arm64/include/asm/smp.h

/*
 * We don't use this_cpu_read(cpu_number) as that has implicit writes to
 * preempt_count, and associated (compiler) barriers, that we'd like to avoid
 * the expense of. If we're preemptible, the value can be stale at use anyway.
 * And we can't use this_cpu_ptr() either, as that winds up recursing back
 * here under CONFIG_DEBUG_PREEMPT=y.
 */
#define raw_smp_processor_id() (*raw_cpu_ptr(&cpu_number))

Reads the cpu id from the per-cpu variable cpu_number.

References

Linux Kernel Power Management (PM) Framework for ARM 64-bit Processors | arm – download
Multi-cluster power management | LWN.net
Linux support for ARM big.LITTLE | LWN.net

psci_dt_init()

2016-04-08 / 2019-06-29 문영일

PSCI(Power State Coordination Interface)

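PSCI is available when it has been selected as the power management interface. A psci function call is carried out by forwarding the request to the hypervisor or to the secure firmware (secure monitor or Trusted Firmware).

Most systems use the psci method, but some use the spin-table method, and certain vendors use a method of their own.
The method is specified in the enable-method property of the cpu node.

The following figure shows three cases in which the Linux kernel makes a psci call to the hypervisor or to the secure firmware.

When a hypervisor and secure firmware are running at the same time, the kernel sends its calls to the hypervisor.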

PSCI functions

The following are the PSCI functions; the set of supported functions differs from version to version.

There are also four functions related to SMCCC.

Function id

PSCI v0.1
- The function ids of the functions above are not fixed (implementation defined), so the id values must be specified in the device tree.
PSCI v0.2
- Each function has a fixed function id.
PSCI v1.0
- Adds two calls to PSCI v0.2: features and system suspend.
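To make "fixed function id" concrete, the sketch below rebuilds a few ids from a base value plus a function number. The base and the 64-bit flag are the values I understand include/uapi/linux/psci.h to use; the helper names `psci_fn32()` / `psci_fn64()` and the enum are made up for this illustration and should be checked against the header before being relied on.

```c
#include <stdint.h>

/* Assumed to match the kernel's include/uapi/linux/psci.h. */
#define MODEL_PSCI_0_2_FN_BASE  0x84000000u
#define MODEL_PSCI_0_2_64BIT    0x40000000u

/* Function numbers (illustrative subset). */
enum model_psci_fn_nr {
        MODEL_NR_PSCI_VERSION  = 0,
        MODEL_NR_CPU_SUSPEND   = 1,
        MODEL_NR_CPU_OFF       = 2,
        MODEL_NR_CPU_ON        = 3,
        MODEL_NR_PSCI_FEATURES = 10,    /* added in v1.0 */
};

/* 32-bit calling convention id for function number n. */
static uint32_t psci_fn32(uint32_t n)
{
        return MODEL_PSCI_0_2_FN_BASE + n;
}

/* 64-bit calling convention id for function number n. */
static uint32_t psci_fn64(uint32_t n)
{
        return MODEL_PSCI_0_2_FN_BASE + MODEL_PSCI_0_2_64BIT + n;
}
```

Under these assumptions CPU_OFF is 0x84000002 in the 32-bit convention and CPU_ON is 0xC4000003 in the 64-bit convention, independent of any device tree property.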

Return values of a call

/* PSCI return values (inclusive of all PSCI versions) */
#define PSCI_RET_SUCCESS                         0
#define PSCI_RET_NOT_SUPPORTED                  -1
#define PSCI_RET_INVALID_PARAMS                 -2
#define PSCI_RET_DENIED                         -3
#define PSCI_RET_ALREADY_ON                     -4
#define PSCI_RET_ON_PENDING                     -5
#define PSCI_RET_INTERNAL_FAILURE               -6
#define PSCI_RET_NOT_PRESENT                    -7
#define PSCI_RET_DISABLED                       -8
#define PSCI_RET_INVALID_ADDRESS                -9
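The handlers shown later pass these values through psci_to_linux_errno() before returning. The block below is a user-space sketch of such a translation, written from memory of the kernel helper; the exact mapping is an assumption and the `model_` prefix marks it as illustrative rather than the kernel's code.

```c
#include <errno.h>

#define PSCI_RET_SUCCESS                 0
#define PSCI_RET_NOT_SUPPORTED          -1
#define PSCI_RET_INVALID_PARAMS         -2
#define PSCI_RET_DENIED                 -3
#define PSCI_RET_INVALID_ADDRESS        -9

/* Translate a PSCI return value into a negative Linux errno (assumed mapping). */
static int model_psci_to_linux_errno(int psci_ret)
{
        switch (psci_ret) {
        case PSCI_RET_SUCCESS:
                return 0;
        case PSCI_RET_NOT_SUPPORTED:
                return -EOPNOTSUPP;
        case PSCI_RET_INVALID_PARAMS:
        case PSCI_RET_INVALID_ADDRESS:
                return -EINVAL;
        case PSCI_RET_DENIED:
                return -EPERM;
        default:
                /* Anything not listed is folded into -EINVAL. */
                return -EINVAL;
        }
}
```

The point of the translation is that callers of psci_ops only ever see ordinary Linux error codes, not firmware-specific numbers.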

SMCCC(SMC Calling Convention)

The SMC Calling Convention defines the calling rules for the SMC and HVC instructions that the kernel uses when it calls the secure firmware or the hypervisor.

SMCCC versions 1.0 and 1.1 are in use.
See: arm64: Add SMCCC v1.1 support and CVE-2017-5715 | LWN.net

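PSCI initialization

On an SMP architecture that supports PSCI, the matching initialization function runs and connects a handler function to each PSCI function. The related global variables are as follows.

The call function is chosen depending on whether PSCI uses secure monitor calls or hypervisor calls.
- Secure Monitor Call
  - The global invoke_psci_fn is set to __invoke_psci_fn_smc().
- Hypervisor Call
  - The global invoke_psci_fn is set to __invoke_psci_fn_hvc().
For PSCI v0.1, the id of each function is read from the device tree.
- The id values read from the device tree properties are stored in the global psci_function_id[].
Each handler in the global psci_ops is pointed at an available PSCI function.
- With psci v0.1, handlers are registered only for the functions specified in the device tree.
- With psci v0.2 and v1.0, all handlers are registered.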

psci_dt_init()

drivers/firmware/psci.c

int __init psci_dt_init(void)
{
        struct device_node *np;
        const struct of_device_id *matched_np;
        psci_initcall_t init_fn;

        np = of_find_matching_node_and_match(NULL, psci_of_match, &matched_np);
        if (!np || !of_device_is_available(np))
                return -ENODEV;

        init_fn = (psci_initcall_t)matched_np->data;
        return init_fn(np);
}

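Prepares the secure monitor call or hypervisor call functions for each PSCI function so that PSCI can be used.

In code lines 7~9, the device tree is searched for a node that matches any of the devices in psci_of_match[]. The node is returned, and the output argument matched_np receives a pointer to the matching of_device_id entry in psci_of_match[]. If no node is found or the node is not available, -ENODEV is returned.
In code lines 11~12, the initialization function for the matched psci version is called.
- It points to either psci_0_1_init() or psci_0_2_init().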

psci_of_match

Holds the compatible names of the devices to search for, together with their initialization functions. For psci v1.0, the psci 0.2 initialization function is reused as is.

drivers/firmware/psci.c

static const struct of_device_id psci_of_match[] __initconst = {
        { .compatible = "arm,psci",     .data = psci_0_1_init},
        { .compatible = "arm,psci-0.2", .data = psci_0_2_init},
        { .compatible = "arm,psci-1.0", .data = psci_0_2_init},
        {},
};
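The dispatch that psci_dt_init() performs over this table can be sketched in user space as a string lookup followed by an indirect call. Everything prefixed `model_` is hypothetical; in particular the stub init functions only return a tag so the chosen path is observable, and -1 merely stands in for -ENODEV.

```c
#include <stddef.h>
#include <string.h>

typedef int (*model_init_fn)(void);

/* Stubs standing in for psci_0_1_init() and psci_0_2_init(). */
static int model_init_0_1(void) { return 1; }
static int model_init_0_2(void) { return 2; }

struct model_match {
        const char *compatible;
        model_init_fn init;
};

/* Same three entries as psci_of_match[], ending with an empty sentinel. */
static const struct model_match model_psci_match[] = {
        { "arm,psci",     model_init_0_1 },
        { "arm,psci-0.2", model_init_0_2 },
        { "arm,psci-1.0", model_init_0_2 },
        { NULL, NULL },
};

/* Returns the init function's result, or -1 when nothing matches. */
static int model_psci_dt_init(const char *node_compatible)
{
        const struct model_match *m;

        for (m = model_psci_match; m->compatible; m++)
                if (!strcmp(m->compatible, node_compatible))
                        return m->init();
        return -1;
}
```

Carrying the init function in the table's data slot is what lets v0.2 and v1.0 share one init path without any version check at the call site.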

PSCI v0.1

The device tree below uses psci v0.1; the corresponding initialization function is psci_0_1_init().

arch/arm/boot/dts/xenvm-4.2.dts

 *
 * Based on ARM Ltd. Versatile Express CoreTile Express (single CPU)
 * Cortex-A15 MPCore (V2P-CA15)
 *
 */

/dts-v1/;

/ {
        model = "XENVM-4.2";
        compatible = "xen,xenvm-4.2", "xen,xenvm";
        interrupt-parent = <&gic>;
        #address-cells = <2>;
        #size-cells = <2>;

        chosen {
                /* this field is going to be adjusted by the hypervisor */
                bootargs = "console=hvc0 root=/dev/xvda";
        };

        cpus {
                #address-cells = <1>;
                #size-cells = <0>;

                cpu@0 {
                        device_type = "cpu";
                        compatible = "arm,cortex-a15";
                        reg = <0>;
                };

                cpu@1 {
                        device_type = "cpu";
                        compatible = "arm,cortex-a15";
                        reg = <1>;
                };
        };

        psci {
                compatible      = "arm,psci";
                method          = "hvc";
                cpu_off         = <1>;
                cpu_on          = <2>;
        };

The compatible property of the psci node specifies the “arm,psci” device.
- For “arm,psci”, the initialization function is psci_0_1_init().

The following figure shows the xenvm-4.2 system using PSCI through the hvc method.

psci_0_1_init()

drivers/firmware/psci.c

/*
 * PSCI < v0.2 get PSCI Function IDs via DT.
 */

static int psci_0_1_init(struct device_node *np)
{
        u32 id; 
        int err;

        err = get_set_conduit_method(np);

        if (err)
                goto out_put_node;

        pr_info("Using PSCI v0.1 Function IDs from DT\n");

        if (!of_property_read_u32(np, "cpu_suspend", &id)) {
                psci_function_id[PSCI_FN_CPU_SUSPEND] = id; 
                psci_ops.cpu_suspend = psci_cpu_suspend;
        }

        if (!of_property_read_u32(np, "cpu_off", &id)) {
                psci_function_id[PSCI_FN_CPU_OFF] = id; 
                psci_ops.cpu_off = psci_cpu_off;
        }

        if (!of_property_read_u32(np, "cpu_on", &id)) {
                psci_function_id[PSCI_FN_CPU_ON] = id; 
                psci_ops.cpu_on = psci_cpu_on;
        }

        if (!of_property_read_u32(np, "migrate", &id)) {
                psci_function_id[PSCI_FN_MIGRATE] = id; 
                psci_ops.migrate = psci_migrate;
        }

out_put_node:
        of_node_put(np);
        return err;
}

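Sets up the call function (smc or hvc) and the functions for psci v0.1.

In code lines 6~9, the “method” property must be set to “hvc” or “smc”; otherwise the function exits with an error.
In code lines 13~31, for each of the properties below that exists, the id value read from the property is stored in psci_function_id[], and the matching member of the psci_ops structure is connected to its handler function.
- “cpu_suspend” property -> psci_cpu_suspend()
- “cpu_off” property -> psci_cpu_off()
- “cpu_on” property -> psci_cpu_on()
- “migrate” property -> psci_migrate()

The following figure shows the call relationships when psci v0.1 is used.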

psci_function_id[] array

The following is the psci_function_id[] array, which holds the per-function ids that are not fixed in PSCI v0.1.

arch/arm/kernel/psci.c

enum psci_function {
        PSCI_FN_CPU_SUSPEND,
        PSCI_FN_CPU_ON,
        PSCI_FN_CPU_OFF,
        PSCI_FN_MIGRATE,
        PSCI_FN_MAX,
};

static u32 psci_function_id[PSCI_FN_MAX];

get_set_conduit_method()

drivers/firmware/psci.c

static int get_set_conduit_method(struct device_node *np)
{
        const char *method;

        pr_info("probing for conduit method from DT.\n");

        if (of_property_read_string(np, "method", &method)) {
                pr_warn("missing \"method\" property\n");
                return -ENXIO;
        }

        if (!strcmp("hvc", method)) {
                set_conduit(PSCI_CONDUIT_HVC);
        } else if (!strcmp("smc", method)) {
                set_conduit(PSCI_CONDUIT_SMC);
        } else {
                pr_warn("invalid \"method\" property: %s\n", method);
                return -EINVAL;
        }
        return 0;
}

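If the psci node has a “method” property, the global invoke_psci_fn is connected to the matching function for the values “hvc” and “smc”; in every other case an error is returned.

In code lines 7~10, if the node has no “method” property, a warning is printed and an error is returned.
In code lines 12~13, if the property value is “hvc”, the hvc psci call function is selected.
In code lines 14~19, if the property value is “smc”, the smc psci call function is selected. Otherwise a warning is printed and -EINVAL is returned.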

set_conduit()

drivers/firmware/psci.c

static void set_conduit(enum psci_conduit conduit)
{
        switch (conduit) {
        case PSCI_CONDUIT_HVC:
                invoke_psci_fn = __invoke_psci_fn_hvc;
                break;
        case PSCI_CONDUIT_SMC:
                invoke_psci_fn = __invoke_psci_fn_smc;
                break;
        default:
                WARN(1, "Unexpected PSCI conduit %d\n", conduit);
        }

        psci_ops.conduit = conduit;
}

Sets the psci call function according to the requested @conduit.
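The combined effect of get_set_conduit_method() and set_conduit() is that one global function pointer ends up aimed at either the smc or the hvc back end. A minimal user-space sketch follows, assuming fake back ends that only record which path was taken; all `model_` names are hypothetical, and a real back end would execute the smc or hvc instruction instead.

```c
#include <stddef.h>
#include <string.h>

enum model_conduit { MODEL_CONDUIT_NONE, MODEL_CONDUIT_SMC, MODEL_CONDUIT_HVC };

typedef unsigned long (*model_psci_fn)(unsigned long, unsigned long,
                                       unsigned long, unsigned long);

static enum model_conduit model_last_used = MODEL_CONDUIT_NONE;
static model_psci_fn model_invoke_psci_fn;

/* Fake smc back end: records the conduit and reports success (0). */
static unsigned long model_invoke_smc(unsigned long fn, unsigned long a0,
                                      unsigned long a1, unsigned long a2)
{
        (void)fn; (void)a0; (void)a1; (void)a2;
        model_last_used = MODEL_CONDUIT_SMC;
        return 0;
}

/* Fake hvc back end: records the conduit and reports success (0). */
static unsigned long model_invoke_hvc(unsigned long fn, unsigned long a0,
                                      unsigned long a1, unsigned long a2)
{
        (void)fn; (void)a0; (void)a1; (void)a2;
        model_last_used = MODEL_CONDUIT_HVC;
        return 0;
}

/* Returns 0 on success, -1 for a method string that is neither hvc nor smc. */
static int model_set_conduit_from_method(const char *method)
{
        if (!strcmp(method, "hvc"))
                model_invoke_psci_fn = model_invoke_hvc;
        else if (!strcmp(method, "smc"))
                model_invoke_psci_fn = model_invoke_smc;
        else
                return -1;
        return 0;
}
```

Once the pointer is set, every PSCI handler can call through it without knowing which conduit the platform uses.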

PSCI v0.2 and later (including v1.0)

The Broadcom northstar2 system below uses PSCI 1.0; the corresponding initialization function is psci_0_2_init().

arch/arm64/boot/dts/broadcom/northstar2/ns2.dtsi

        psci {
                compatible = "arm,psci-1.0";
                method = "smc";
        };

psci_0_2_init()

drivers/firmware/psci.c

/*
 * PSCI init function for PSCI versions >=0.2
 *
 * Probe based on PSCI PSCI_VERSION function
 */

static int __init psci_0_2_init(struct device_node *np)
{
        int err;

        err = get_set_conduit_method(np);

        if (err)
                goto out_put_node;
        /*
         * Starting with v0.2, the PSCI specification introduced a call
         * (PSCI_VERSION) that allows probing the firmware version, so
         * that PSCI function IDs and version specific initialization
         * can be carried out according to the specific version reported
         * by firmware
         */
        err = psci_probe();

out_put_node:
        of_node_put(np);
        return err;
}

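Sets up the call function (smc or hvc) and the functions for psci v0.2 and psci v1.0.

In code lines 5~8, the “method” property must be set to “hvc” or “smc”; otherwise the function exits with an error.
In code line 16, the psci version is probed and the fixed per-function call functions are set.

The following log output shows psci v1.0 being detected and used.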

[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv1.0 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: MIGRATE_INFO_TYPE not supported.

psci_probe()

drivers/firmware/psci.c

/*
 * Probe function for PSCI firmware versions >= 0.2
 */

static int __init psci_probe(void)
{
        u32 ver = psci_get_version();

        pr_info("PSCIv%d.%d detected in firmware.\n",
                        PSCI_VERSION_MAJOR(ver),
                        PSCI_VERSION_MINOR(ver));

        if (PSCI_VERSION_MAJOR(ver) == 0 && PSCI_VERSION_MINOR(ver) < 2) {
                pr_err("Conflicting PSCI version detected.\n");
                return -EINVAL;
        }

        psci_0_2_set_functions();

        psci_init_migrate();

        if (PSCI_VERSION_MAJOR(ver) >= 1) {
                psci_init_smccc();
                psci_init_cpu_suspend();
                psci_init_system_suspend();
        }

        return 0;
}

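Sets the fixed call functions according to the psci version and function.

In code lines 3~12, the psci version is read and printed; if it is lower than psci v0.2, the function exits with -EINVAL.
In code line 14, the per-function call functions for psci v0.2 are set.
In code line 16, whether the Trusted OS supports migration on cpu off is queried and printed.
- Because the Trusted OS may stop working when a particular cpu is turned off, the cpu it is running on has to be remembered.
In code lines 18~22, for psci v1.0 or later, the smccc version is read and the suspend functions are prepared.
- If the SMCCC version function is supported, the SMCCC version is read and whether it is v1.1 is recorded.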

psci_0_2_set_functions()

drivers/firmware/psci.c

static void __init psci_0_2_set_functions(void)
{
        pr_info("Using standard PSCI v0.2 function IDs\n");
        psci_ops.get_version = psci_get_version;

        psci_function_id[PSCI_FN_CPU_SUSPEND] =
                                        PSCI_FN_NATIVE(0_2, CPU_SUSPEND);
        psci_ops.cpu_suspend = psci_cpu_suspend;

        psci_function_id[PSCI_FN_CPU_OFF] = PSCI_0_2_FN_CPU_OFF;
        psci_ops.cpu_off = psci_cpu_off;

        psci_function_id[PSCI_FN_CPU_ON] = PSCI_FN_NATIVE(0_2, CPU_ON);
        psci_ops.cpu_on = psci_cpu_on;

        psci_function_id[PSCI_FN_MIGRATE] = PSCI_FN_NATIVE(0_2, MIGRATE);
        psci_ops.migrate = psci_migrate;

        psci_ops.affinity_info = psci_affinity_info;

        psci_ops.migrate_info_type = psci_migrate_info_type;

        arm_pm_restart = psci_sys_reset;

        pm_power_off = psci_sys_poweroff;
}

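Sets the per-function call functions for psci v0.2.

For compatibility with psci v0.1, the function id values are set for the four functions cpu_suspend, cpu_off, cpu_on and migrate.
The call function of each function is assigned to the corresponding callback in psci_ops.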

psci_init_migrate()

drivers/firmware/psci.c

/*
 * Detect the presence of a resident Trusted OS which may cause CPU_OFF to
 * return DENIED (which would be fatal).
 */

static void __init psci_init_migrate(void)
{
        unsigned long cpuid;
        int type, cpu = -1;

        type = psci_ops.migrate_info_type();

        if (type == PSCI_0_2_TOS_MP) {
                pr_info("Trusted OS migration not required\n");
                return;
        }

        if (type == PSCI_RET_NOT_SUPPORTED) {
                pr_info("MIGRATE_INFO_TYPE not supported.\n");
                return;
        }

        if (type != PSCI_0_2_TOS_UP_MIGRATE &&
            type != PSCI_0_2_TOS_UP_NO_MIGRATE) {
                pr_err("MIGRATE_INFO_TYPE returned unknown type (%d)\n", type);
                return;
        }

        cpuid = psci_migrate_info_up_cpu();
        if (cpuid & ~MPIDR_HWID_BITMASK) {
                pr_warn("MIGRATE_INFO_UP_CPU reported invalid physical ID (0x%lx)\n",
                        cpuid);
                return;
        }

        cpu = get_logical_index(cpuid);
        resident_cpu = cpu >= 0 ? cpu : -1;

        pr_info("Trusted OS resident on physical CPU 0x%lx\n", cpuid);
}

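Queries and prints whether the Trusted OS supports migration on cpu off, and remembers the cpu on which the Trusted OS is running in the global variable resident_cpu.

In code line 6, the migrate info type function is called to obtain the type.
In code lines 8~11, the Trusted OS supports MP, so the kernel does not need to request migration and the function returns.
In code lines 13~16, MIGRATE_INFO_TYPE is not supported, so an informational message is printed and the function returns.
In code lines 18~22, the Trusted OS returned an unknown type, so an error message is printed and the function returns.
In code lines 24~29, the physical cpu id on which the Trusted OS is running is obtained; if it is invalid, a warning is printed and the function returns.
In code lines 31~34, the logical cpu number on which the Trusted OS is running is looked up and stored in resident_cpu, and the physical cpu id is printed.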

PSCI migrate types

These describe whether the Trusted OS supports migration before the kernel turns off a particular cpu. The items shown in orange allow the cpu to be turned off.

PSCI_0_2_TOS_UP_MIGRATE(0)
- The Trusted OS runs on a single core but supports migration requests.
- If the Trusted OS is running on the cpu that is about to be turned off, migration must be requested before the cpu is turned off.
PSCI_0_2_TOS_UP_NO_MIGRATE(1)
- The Trusted OS is pinned to one particular core, so migration must not be requested.
PSCI_0_2_TOS_MP(2)
- The Trusted OS can run on a multi-core system, so no migration is needed.
PSCI_RET_NOT_SUPPORTED(-1)
- The Trusted OS does not support migration requests.
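One way to read the four types is as a decision taken just before a cpu is turned off. The sketch below encodes that reading in user space. It is an assumption, not kernel code: the kernel excerpt in this article only records resident_cpu, so the policy, the `model_` names, the direct-off treatment of NOT_SUPPORTED and the conservative handling of unknown types are all hypothetical.

```c
#define PSCI_0_2_TOS_UP_MIGRATE         0
#define PSCI_0_2_TOS_UP_NO_MIGRATE      1
#define PSCI_0_2_TOS_MP                 2
#define PSCI_RET_NOT_SUPPORTED          -1

enum model_off_action {
        MODEL_OFF_DIRECTLY,     /* cpu can be turned off as is */
        MODEL_MIGRATE_THEN_OFF, /* move the Trusted OS away first */
        MODEL_OFF_REFUSED,      /* cpu must stay on */
};

/*
 * Decide what to do before turning off logical cpu `cpu`, given the
 * migrate type and the cpu the Trusted OS is resident on.
 */
static enum model_off_action model_cpu_off_action(int type, int resident_cpu,
                                                  int cpu)
{
        switch (type) {
        case PSCI_0_2_TOS_MP:
        case PSCI_RET_NOT_SUPPORTED:
                /* No resident cpu is tracked for these types. */
                return MODEL_OFF_DIRECTLY;
        case PSCI_0_2_TOS_UP_MIGRATE:
                return cpu == resident_cpu ? MODEL_MIGRATE_THEN_OFF
                                           : MODEL_OFF_DIRECTLY;
        case PSCI_0_2_TOS_UP_NO_MIGRATE:
                return cpu == resident_cpu ? MODEL_OFF_REFUSED
                                           : MODEL_OFF_DIRECTLY;
        default:
                /* Unknown type: be conservative in this model. */
                return MODEL_OFF_REFUSED;
        }
}
```

Only the cpu on which the Trusted OS is resident is ever affected; every other cpu can be turned off directly regardless of the type.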

psci_init_smccc()

drivers/firmware/psci.c

static void __init psci_init_smccc(void)
{
        u32 ver = ARM_SMCCC_VERSION_1_0;
        int feature;

        feature = psci_features(ARM_SMCCC_VERSION_FUNC_ID);

        if (feature != PSCI_RET_NOT_SUPPORTED) {
                u32 ret;
                ret = invoke_psci_fn(ARM_SMCCC_VERSION_FUNC_ID, 0, 0, 0);
                if (ret == ARM_SMCCC_VERSION_1_1) {
                        psci_ops.smccc_version = SMCCC_VERSION_1_1;
                        ver = ret;
                }
        }

        /*
         * Conveniently, the SMCCC and PSCI versions are encoded the
         * same way. No, this isn't accidental.
         */
        pr_info("SMC Calling Convention v%d.%d\n",
                PSCI_VERSION_MAJOR(ver), PSCI_VERSION_MINOR(ver));

}

If the SMCCC version function is supported, the SMCCC version is read and whether it is v1.1 is recorded.

On Cortex-A57/A72, when the SMCCC version is 1.1, a psci call has to be made to handle the BP hardening workaround.
References
- arm64: Add skeleton to harden the branch predictor against aliasing attacks
- arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support

psci_init_cpu_suspend()

drivers/firmware/psci.c

static void __init psci_init_cpu_suspend(void)
{
        int feature = psci_features(psci_function_id[PSCI_FN_CPU_SUSPEND]);

        if (feature != PSCI_RET_NOT_SUPPORTED)
                psci_cpu_suspend_feature = feature;
}

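Queries whether the cpu suspend function is supported and, if it is, stores the returned feature value in psci_cpu_suspend_feature.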

psci_init_system_suspend()

drivers/firmware/psci.c

static void __init psci_init_system_suspend(void)
{
        int ret;

        if (!IS_ENABLED(CONFIG_SUSPEND))
                return;

        ret = psci_features(PSCI_FN_NATIVE(1_0, SYSTEM_SUSPEND));

        if (ret != PSCI_RET_NOT_SUPPORTED)
                suspend_set_ops(&psci_suspend_ops);
}

Queries whether the system suspend function is supported and, if it is, connects the suspend ops to the psci implementation.

suspend_set_ops()

kernel/power/suspend.c

/**
 * suspend_set_ops - Set the global suspend method table.
 * @ops: Suspend operations to use.
 */
void suspend_set_ops(const struct platform_suspend_ops *ops)
{
        lock_system_sleep();

        suspend_ops = ops;

        if (valid_state(PM_SUSPEND_STANDBY)) {
                mem_sleep_states[PM_SUSPEND_STANDBY] = mem_sleep_labels[PM_SUSPEND_STANDBY];
                pm_states[PM_SUSPEND_STANDBY] = pm_labels[PM_SUSPEND_STANDBY];
                if (mem_sleep_default == PM_SUSPEND_STANDBY)
                        mem_sleep_current = PM_SUSPEND_STANDBY;
        }
        if (valid_state(PM_SUSPEND_MEM)) {
                mem_sleep_states[PM_SUSPEND_MEM] = mem_sleep_labels[PM_SUSPEND_MEM];
                if (mem_sleep_default >= PM_SUSPEND_MEM)
                        mem_sleep_current = PM_SUSPEND_MEM;
        }

        unlock_system_sleep();
}
EXPORT_SYMBOL_GPL(suspend_set_ops);

PSCI function calls

psci_get_version()

drivers/firmware/psci.c

static u32 psci_get_version(void)
{
        return invoke_psci_fn(PSCI_0_2_FN_PSCI_VERSION, 0, 0, 0);
}

Gets the psci version. The lower 16 bits of the result hold the minor version, and the remaining upper bits hold the major version.
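The version layout can be checked with a few lines of C. This is a user-space sketch; the `model_` helper names are made up, and the last helper mirrors the "older than v0.2" test that psci_probe() applies.

```c
#include <stdint.h>

/* Upper 16 bits: major version. */
static unsigned int model_psci_version_major(uint32_t ver)
{
        return ver >> 16;
}

/* Lower 16 bits: minor version. */
static unsigned int model_psci_version_minor(uint32_t ver)
{
        return ver & 0xffffu;
}

/* Non-zero when the version is at least 0.2, as psci_probe() requires. */
static int model_psci_version_ok(uint32_t ver)
{
        return !(model_psci_version_major(ver) == 0 &&
                 model_psci_version_minor(ver) < 2);
}
```

For instance, a firmware that returns 0x00010000 reports PSCI v1.0, and 0x00000002 reports v0.2.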

psci_cpu_suspend()

drivers/firmware/psci.c

static int psci_cpu_suspend(u32 state, unsigned long entry_point)
{
        int err;
        u32 fn;

        fn = psci_function_id[PSCI_FN_CPU_SUSPEND];
        err = invoke_psci_fn(fn, state, entry_point, 0);
        return psci_to_linux_errno(err);
}

Makes a psci call to put the cpu into a power-saving state.

psci_cpu_off()

drivers/firmware/psci.c

static int psci_cpu_off(u32 state)
{
        int err;
        u32 fn;

        fn = psci_function_id[PSCI_FN_CPU_OFF];
        err = invoke_psci_fn(fn, state, 0, 0);
        return psci_to_linux_errno(err);
}

Makes a psci call to turn the cpu off.

psci_cpu_on()

drivers/firmware/psci.c

static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
{
        int err;
        u32 fn;

        fn = psci_function_id[PSCI_FN_CPU_ON];
        err = invoke_psci_fn(fn, cpuid, entry_point, 0);
        return psci_to_linux_errno(err);
}

Makes a psci call to turn a cpu on.

psci_affinity_info()

drivers/firmware/psci.c

static int psci_affinity_info(unsigned long target_affinity,
                unsigned long lowest_affinity_level)
{
        return invoke_psci_fn(PSCI_FN_NATIVE(0_2, AFFINITY_INFO),
                              target_affinity, lowest_affinity_level, 0);
}

Makes a psci call to get the affinity information of a cpu.

psci_migrate()

drivers/firmware/psci.c

static int psci_migrate(unsigned long cpuid)
{
        int err;
        u32 fn;

        fn = psci_function_id[PSCI_FN_MIGRATE];
        err = invoke_psci_fn(fn, cpuid, 0, 0);
        return psci_to_linux_errno(err);
}

Makes a psci call to migrate the secure firmware to another cpu.

psci_migrate_info_type()

drivers/firmware/psci.c

static int psci_migrate_info_type(void)
{
        return invoke_psci_fn(PSCI_0_2_FN_MIGRATE_INFO_TYPE, 0, 0, 0);
}

Makes a psci call to get the cpu migration type of the secure firmware.

psci_migrate_info_up_cpu()

drivers/firmware/psci.c

static unsigned long psci_migrate_info_up_cpu(void)
{
        return invoke_psci_fn(PSCI_FN_NATIVE(0_2, MIGRATE_INFO_UP_CPU),
                              0, 0, 0);
}

Makes a psci call to get the physical cpu id on which the secure firmware is currently running, for the case where it runs on a single core only.

psci_sys_poweroff()

drivers/firmware/psci.c

static void psci_sys_poweroff(void)
{
        invoke_psci_fn(PSCI_0_2_FN_SYSTEM_OFF, 0, 0, 0);
}

Makes a psci call to power the system off.

psci_sys_reset()

drivers/firmware/psci.c

static void psci_sys_reset(enum reboot_mode reboot_mode, const char *cmd)
{
        invoke_psci_fn(PSCI_0_2_FN_SYSTEM_RESET, 0, 0, 0);
}

Makes a psci call to reset the system.

psci_features()

drivers/firmware/psci.c

static int __init psci_features(u32 psci_func_id)
{
        return invoke_psci_fn(PSCI_1_0_FN_PSCI_FEATURES,
                              psci_func_id, 0, 0);
}

Queries whether the requested psci function is supported.

psci_system_suspend()

drivers/firmware/psci.c

static int psci_system_suspend(unsigned long unused)
{
        return invoke_psci_fn(PSCI_FN_NATIVE(1_0, SYSTEM_SUSPEND),
                              __pa_symbol(cpu_resume), 0, 0);
}

Makes a psci call to suspend the system.

PSCI call API

invoke_psci_fn()

typedef unsigned long (psci_fn)(unsigned long, unsigned long,
                                unsigned long, unsigned long);
static psci_fn *invoke_psci_fn;

The global function pointer above is set to the function used for hvc or smc calls.

The first argument is the function id, and the remaining three arguments are passed along with it.

HVC call

__invoke_psci_fn_hvc()

drivers/firmware/psci.c

static unsigned long __invoke_psci_fn_hvc(unsigned long function_id,
                        unsigned long arg0, unsigned long arg1,
                        unsigned long arg2)
{
        struct arm_smccc_res res;

        arm_smccc_hvc(function_id, arg0, arg1, arg2, 0, 0, 0, 0, &res);
        return res.a0;
}

Calls a psci function in the hypervisor through smccc. Eight arguments are passed, and the last four are filled with 0.

arm_smccc_hvc()

include/linux/arm-smccc.h

#define arm_smccc_hvc(...) __arm_smccc_hvc(__VA_ARGS__, NULL)

__arm_smccc_hvc()

include/linux/arm-smccc.h

/**
 * __arm_smccc_hvc() - make HVC calls
 * @a0-a7: arguments passed in registers 0 to 7
 * @res: result values from registers 0 to 3
 * @quirk: points to an arm_smccc_quirk, or NULL when no quirks are required.
 *
 * This function is used to make HVC calls following SMC Calling
 * Convention.  The content of the supplied param are copied to registers 0
 * to 7 prior to the HVC instruction. The return values are updated with
 * the content from register 0 to 3 on return from the HVC instruction.  An
 * optional quirk structure provides vendor specific behavior.
 */

asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1,
                        unsigned long a2, unsigned long a3, unsigned long a4,
                        unsigned long a5, unsigned long a6, unsigned long a7,
                        struct arm_smccc_res *res, struct arm_smccc_quirk *quirk);

arch/arm64/kernel/smccc-call.S

/*
 * void arm_smccc_hvc(unsigned long a0, unsigned long a1, unsigned long a2,
 *                unsigned long a3, unsigned long a4, unsigned long a5,
 *                unsigned long a6, unsigned long a7, struct arm_smccc_res *res,
 *                struct arm_smccc_quirk *quirk)
 */

ENTRY(__arm_smccc_hvc)
        SMCCC   hvc
ENDPROC(__arm_smccc_hvc)
EXPORT_SYMBOL(__arm_smccc_hvc)

SMCCC macro

        .macro SMCCC instr
        .cfi_startproc
        \instr  #0
        ldr     x4, [sp]
        stp     x0, x1, [x4, #ARM_SMCCC_RES_X0_OFFS]
        stp     x2, x3, [x4, #ARM_SMCCC_RES_X2_OFFS]
        ldr     x4, [sp, #8]
        cbz     x4, 1f /* no quirk structure */
        ldr     x9, [x4, #ARM_SMCCC_QUIRK_ID_OFFS]
        cmp     x9, #ARM_SMCCC_QUIRK_QCOM_A6
        b.ne    1f
        str     x6, [x4, ARM_SMCCC_QUIRK_STATE_OFFS]
1:      ret
        .cfi_endproc
        .endm

SMC call

__invoke_psci_fn_smc()

drivers/firmware/psci.c

static unsigned long __invoke_psci_fn_smc(unsigned long function_id,
                        unsigned long arg0, unsigned long arg1,
                        unsigned long arg2)
{
        struct arm_smccc_res res;

        arm_smccc_smc(function_id, arg0, arg1, arg2, 0, 0, 0, 0, &res);
        return res.a0;
}

Calls a psci function in the secure firmware through smccc. Eight arguments are passed, and the last four are filled with 0.

arm_smccc_smc()

include/linux/arm-smccc.h

#define arm_smccc_smc(...) __arm_smccc_smc(__VA_ARGS__, NULL)

__arm_smccc_smc()

include/linux/arm-smccc.h

/**
 * __arm_smccc_smc() - make SMC calls
 * @a0-a7: arguments passed in registers 0 to 7
 * @res: result values from registers 0 to 3
 * @quirk: points to an arm_smccc_quirk, or NULL when no quirks are required.
 *
 * This function is used to make SMC calls following SMC Calling Convention.
 * The content of the supplied param are copied to registers 0 to 7 prior
 * to the SMC instruction. The return values are updated with the content
 * from register 0 to 3 on return from the SMC instruction.  An optional
 * quirk structure provides vendor specific behavior.
 */

asmlinkage void __arm_smccc_smc(unsigned long a0, unsigned long a1,
                        unsigned long a2, unsigned long a3, unsigned long a4,
                        unsigned long a5, unsigned long a6, unsigned long a7,
                        struct arm_smccc_res *res, struct arm_smccc_quirk *quirk);

arch/arm64/kernel/smccc-call.S

/*
 * void arm_smccc_smc(unsigned long a0, unsigned long a1, unsigned long a2,
 *                unsigned long a3, unsigned long a4, unsigned long a5,
 *                unsigned long a6, unsigned long a7, struct arm_smccc_res *res,
 *                struct arm_smccc_quirk *quirk)
 */

ENTRY(__arm_smccc_smc)
        SMCCC   smc
ENDPROC(__arm_smccc_smc)
EXPORT_SYMBOL(__arm_smccc_smc)

References

DTB (of API) | 문c
POWER STATE COORDINATION INTERFACE (PSCI) | arm – download
Firmware interfaces for mitigating cache speculation vulnerabilities System Software on Arm Developer (2018) | arm – download pdf
SMC CALLING CONVENTION System Software on ARM® Platforms (2016) | arm – download pdf
Power State Coordination Interface (PSCI) | kernel.org

arm_dt_init_cpu_maps()

2016-04-08 / 2016-04-12 문영일

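smp_setup_processor_id() already built the logical cpu id array. This function rebuilds it by comparing against the DTB, and sets the global smp_ops when an SMP handler is provided for the particular SMP architecture.

The logical map is built in the order of the reg property values read from the cpu nodes under “/cpus”, with one exception: the physical cpu id of the currently booting cpu is assigned logical cpu id 0.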

arm_dt_init_cpu_maps()

arch/arm/kernel/devtree.c

/*
 * arm_dt_init_cpu_maps - Function retrieves cpu nodes from the device tree
 * and builds the cpu logical map array containing MPIDR values related to
 * logical cpus
 *
 * Updates the cpu possible mask with the number of parsed cpu nodes
 */
void __init arm_dt_init_cpu_maps(void)
{
        /*
         * Temp logical map is initialized with UINT_MAX values that are
         * considered invalid logical map entries since the logical map must
         * contain a list of MPIDR[23:0] values where MPIDR[31:24] must
         * read as 0.
         */
        struct device_node *cpu, *cpus;
        int found_method = 0;
        u32 i, j, cpuidx = 1;
        u32 mpidr = is_smp() ? read_cpuid_mpidr() & MPIDR_HWID_BITMASK : 0;

        u32 tmp_map[NR_CPUS] = { [0 ... NR_CPUS-1] = MPIDR_INVALID };
        bool bootcpu_valid = false;
        cpus = of_find_node_by_path("/cpus");

        if (!cpus)
                return;

        for_each_child_of_node(cpus, cpu) {
                u32 hwid;

                if (of_node_cmp(cpu->type, "cpu"))
                        continue;

                pr_debug(" * %s...\n", cpu->full_name);
                /*
                 * A device tree containing CPU nodes with missing "reg"
                 * properties is considered invalid to build the
                 * cpu_logical_map.
                 */
                if (of_property_read_u32(cpu, "reg", &hwid)) {
                        pr_debug(" * %s missing reg property\n",
                                     cpu->full_name);
                        return;
                }

                /*
                 * 8 MSBs must be set to 0 in the DT since the reg property
                 * defines the MPIDR[23:0].
                 */
                if (hwid & ~MPIDR_HWID_BITMASK)
                        return;

                /*
                 * Duplicate MPIDRs are a recipe for disaster.
                 * Scan all initialized entries and check for
                 * duplicates. If any is found just bail out.
                 * temp values were initialized to UINT_MAX
                 * to avoid matching valid MPIDR[23:0] values.
                 */
                for (j = 0; j < cpuidx; j++)
                        if (WARN(tmp_map[j] == hwid, "Duplicate /cpu reg "
                                                     "properties in the DT\n"))
                                return;

                /*
                 * Build a stashed array of MPIDR values. Numbering scheme
                 * requires that if detected the boot CPU must be assigned
                 * logical id 0. Other CPUs get sequential indexes starting
                 * from 1. If a CPU node with a reg property matching the
                 * boot CPU MPIDR is detected, this is recorded so that the
                 * logical map built from DT is validated and can be used
                 * to override the map created in smp_setup_processor_id().
                 */
                if (hwid == mpidr) {
                        i = 0;
                        bootcpu_valid = true;
                } else {
                        i = cpuidx++;
                }

                if (WARN(cpuidx > nr_cpu_ids, "DT /cpu %u nodes greater than "
                                               "max cores %u, capping them\n",
                                               cpuidx, nr_cpu_ids)) {
                        cpuidx = nr_cpu_ids;
                        break;
                }

                tmp_map[i] = hwid;

                if (!found_method)
                        found_method = set_smp_ops_by_method(cpu);
        }

        /*
         * Fallback to an enable-method in the cpus node if nothing found in
         * a cpu node.
         */
        if (!found_method)
                set_smp_ops_by_method(cpus);

        if (!bootcpu_valid) {
                pr_warn("DT missing boot CPU MPIDR[23:0], fall back to default cpu_logical_map\n");
                return;
        }

        /*
         * Since the boot CPU node contains proper data, and all nodes have
         * a reg property, the DT CPU list can be considered valid and the
         * logical map created in smp_setup_processor_id() can be overridden
         */
        for (i = 0; i < cpuidx; i++) {
                set_cpu_possible(i, true);
                cpu_logical_map(i) = tmp_map[i];
                pr_debug("cpu logical map 0x%x\n", cpu_logical_map(i));
        }
}

u32 mpidr = is_smp() ? read_cpuid_mpidr() & MPIDR_HWID_BITMASK : 0;
- mpidr에 cpu 물리 id를 읽어온다.
  - SMP의 경우 MPIDR의 lsb 24bits를 읽어오고 UP의 경우 0으로 한다.
- MPIDR_HWID_BITMASK=0xFFFFFF
u32 tmp_map[NR_CPUS] = { [0 … NR_CPUS-1] = MPIDR_INVALID };
- 임시 tmp_map[]의 각 값을 MPIDR_INVALID(0xff00_0000)로 초기화한다.
cpus = of_find_node_by_path(“/cpus”);
- “/cpus” 노드를 찾아 발견되지 않으면 함수를 빠져나간다.
for_each_child_of_node(cpus, cpu) {
- “/cpus” 노드의 서브 노드들로 루프를 돈다.
if (of_node_cmp(cpu->type, “cpu”)) continue;
- 노드 타입이 “cpu”가 아닌 경우 skip
if (of_property_read_u32(cpu, “reg”, &hwid)) {
- “reg” 속성이 없는 경우 디버그 메시지를 출력하고 함수를 빠져나간다.
if (hwid & ~MPIDR_HWID_BITMASK) return;
- hwid의 msb 8bits가 값이 있는 경우 함수를 빠져나간다.
for (j = 0; j < cpuidx; j++) if (WARN(tmp_map[j] == hwid, “Duplicate /cpu reg properties in the DT\n”)) return;
- “/cpu” 노드의 reg 속성 값이 중복된 경우 에러 메시지를 출력하고 함수를 빠져나간다.
if (hwid == mpidr) { i = 0; bootcpu_valid = true;
- Compares the hwid read from the DTB with MPIDR[bit23:0]. If they match, this node describes the physical cpu that is currently booting, so i=0 and bootcpu_valid=true are assigned.
} else { i = cpuidx++; }
- If they do not match, i takes the current cpuidx and cpuidx is then incremented by 1.
if (WARN(cpuidx > nr_cpu_ids, "DT /cpu %u nodes greater than max cores %u, capping them\n", cpuidx, nr_cpu_ids)) { cpuidx = nr_cpu_ids; break; }
- If cpuidx exceeds nr_cpu_ids, prints a warning, caps cpuidx at nr_cpu_ids and leaves the loop.
tmp_map[i] = hwid;
- Stores the physical cpu id in tmp_map[i].
if (!found_method) found_method = set_smp_ops_by_method(cpu);
- Runs set_smp_ops_by_method() on the current cpu node until a method has been found.
- rpi2 does not use smp_ops.
if (!found_method) set_smp_ops_by_method(cpus);
- If found_method is still false, runs set_smp_ops_by_method() on the "/cpus" node.
if (!bootcpu_valid) {
- If the boot cpu was not found in the DTB, prints an error message and returns from the function.
for (i = 0; i < cpuidx; i++) {
- Loops cpuidx times.
set_cpu_possible(i, true);
- Sets the possible bit for logical cpu number i.
cpu_logical_map(i) = tmp_map[i];
- Stores the physical cpu id for logical cpu number i.
  - __cpu_logical_map[]
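
The walkthrough above can be condensed into a small user-space sketch. It is not the kernel function: build_logical_map(), HWID_MASK and HWID_INVALID are hypothetical names introduced here, and the device tree is replaced by a plain array of "reg" values. The sketch only shows how the boot cpu is forced into logical slot 0 while the other cpus take slots 1, 2, ... in discovery order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HWID_MASK    0x00FFFFFFu  /* mirrors MPIDR_HWID_BITMASK */
#define HWID_INVALID 0xFF000000u  /* mirrors MPIDR_INVALID */

/*
 * Hypothetical helper: fill map[0..max) from the "reg" values in hwids[].
 * Returns the number of logical cpus, or -1 if the list is rejected.
 */
static int build_logical_map(const uint32_t *hwids, size_t n,
                             uint32_t boot_hwid,
                             uint32_t *map, size_t max)
{
    size_t cpuidx = 1;          /* slot 0 is reserved for the boot cpu */
    bool bootcpu_valid = false;

    assert(map != NULL && max > 0);

    for (size_t k = 0; k < max; k++)
        map[k] = HWID_INVALID;

    for (size_t k = 0; k < n; k++) {
        uint32_t hwid = hwids[k];
        size_t i;

        if (hwid & ~HWID_MASK)
            return -1;          /* upper 8 bits must be zero */

        for (size_t j = 0; j < cpuidx; j++)
            if (map[j] == hwid)
                return -1;      /* duplicate reg value */

        if (hwid == boot_hwid) {
            i = 0;              /* the booting cpu is always logical cpu 0 */
            bootcpu_valid = true;
        } else {
            i = cpuidx++;       /* use the current index, then advance it */
        }

        if (cpuidx > max) {     /* more nodes than cpu slots: cap and stop */
            cpuidx = max;
            break;
        }
        map[i] = hwid;
    }

    return bootcpu_valid ? (int)cpuidx : -1;
}
```

With reg values {1, 0, 2} and a boot cpu whose id is 2, the resulting map is {2, 1, 0}: the boot cpu lands in slot 0 even though its node was listed last.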

set_smp_ops_by_method()

arch/arm/kernel/devtree.c

static int __init set_smp_ops_by_method(struct device_node *node)
{
        const char *method;
        struct of_cpu_method *m = __cpu_method_of_table;

        if (of_property_read_string(node, "enable-method", &method))
                return 0;

        for (; m->method; m++)
                if (!strcmp(m->method, method)) {
                        smp_set_ops(m->ops);
                        return 1;
                }

        return 0;
}

If a particular SMP platform has registered an ops handler, it is copied into the global variable smp_ops.

struct of_cpu_method *m = __cpu_method_of_table;
- Entries can be added to the global __cpu_method_of_table by a particular SMP platform through the CPU_METHOD_OF_DECLARE() macro.
if (of_property_read_string(node, "enable-method", &method)) return 0;
- Returns 0 if the current node has no "enable-method" property.
for (; m->method; m++)
- Walks __cpu_method_of_table from start to end.
  - The last entry of the table has m->method set to null.
if (!strcmp(m->method, method)) { smp_set_ops(m->ops); return 1; }
- If the method string in the table equals the "enable-method" property value, calls smp_set_ops() to set the global smp_ops and returns 1.
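
The lookup described above is a linear scan over a table that ends with an all-zero sentinel entry. A minimal user-space sketch follows. The names demo_ops, demo_cpu_method, demo_table and find_ops() are made up for illustration, and the table is an ordinary array here, whereas the kernel assembles it from a linker section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_ops { int id; };           /* stand-in for struct smp_operations */

struct demo_cpu_method {               /* stand-in for struct of_cpu_method */
    const char *method;
    const struct demo_ops *ops;
};

static const struct demo_ops ops_a = { 1 };
static const struct demo_ops ops_b = { 2 };

/* The all-NULL last entry plays the role of the sentinel. */
static const struct demo_cpu_method demo_table[] = {
    { "vendor-a,smp", &ops_a },
    { "vendor-b,smp", &ops_b },
    { NULL, NULL },
};

/* Return the ops registered for enable_method, or NULL if none match. */
static const struct demo_ops *find_ops(const char *enable_method)
{
    const struct demo_cpu_method *m;

    assert(enable_method != NULL);

    for (m = demo_table; m->method; m++)   /* stop at the sentinel */
        if (!strcmp(m->method, enable_method))
            return m->ops;
    return NULL;
}
```

Because the loop ends on the sentinel rather than on a count, new entries can be appended without any code knowing the table size, which is what lets the kernel collect entries from many source files.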

arch/arm/boot/dts/hip01-ca9x2.dts

/* First 8KB reserved for secondary core boot */
/memreserve/ 0x80000000 0x00002000;

#include "hip01.dtsi"

/ {
        model = "Hisilicon HIP01 Development Board";
        compatible = "hisilicon,hip01-ca9x2", "hisilicon,hip01";

        cpus {
                #address-cells = <1>;
                #size-cells = <0>;
                enable-method = "hisilicon,hip01-smp";

                cpu@0 {
                        device_type = "cpu";
                        compatible = "arm,cortex-a9";
                        reg = <0>;
                };

                cpu@1 {
                        device_type = "cpu";
                        compatible = "arm,cortex-a9";
                        reg = <1>;
                };
        };

For the source that handles the enable-method property value of the "/cpus" node, see the following file.
- arch/arm/mach-hisi/platsmp.c

smp_set_ops()

arch/arm/kernel/smp.c

static struct smp_operations smp_ops;

void __init smp_set_ops(struct smp_operations *ops)
{
        if (ops)
                smp_ops = *ops;
};

Copies the ops passed as the argument into the global smp_ops.

CPU_METHOD_OF_DECLARE()

arch/arm/include/asm/smp.h

#define CPU_METHOD_OF_DECLARE(name, _method, _ops)                      \
        static const struct of_cpu_method __cpu_method_of_table_##name  \
                __used __section(__cpu_method_of_table)                 \
                = { .method = _method, .ops = _ops }

Since rpi does not use the macro above, we trace it through the source declared in the file below.

arch/arm/mach-hisi/platsmp.c

CPU_METHOD_OF_DECLARE(hip01_smp, "hisilicon,hip01-smp", &hip01_smp_ops);

The macro produces structure data like the following statement.
- static const struct of_cpu_method __cpu_method_of_table_hip01_smp __used __section(__cpu_method_of_table) = { .method = "hisilicon,hip01-smp", .ops = &hip01_smp_ops }

arch/arm/mach-hisi/platsmp.c

struct smp_operations hip01_smp_ops __initdata = { 
        .smp_prepare_cpus       = hisi_common_smp_prepare_cpus,
        .smp_boot_secondary     = hip01_boot_secondary,
};

The ops of the hip01_smp system have two functions hooked up.

CPU_METHOD_OF_TABLES()

include/asm-generic/vmlinux.lds.h

#define CPU_METHOD_OF_TABLES()  OF_TABLE(CONFIG_SMP, cpu_method)

With CONFIG_SMP=y, we follow each macro in the chain as it is expanded.
- "#define CONFIG_SMP 1" is generated automatically in the file include/generated/autoconf.h.

#define OF_TABLE(cfg, name)     __OF_TABLE(config_enabled(cfg), name)

__OF_TABLE(config_enabled(1), cpu_method)

#define __OF_TABLE(cfg, name)   ___OF_TABLE(cfg, name)

___OF_TABLE(1, cpu_method)

#define ___OF_TABLE(cfg, name)  _OF_TABLE_##cfg(name)

_OF_TABLE_1(cpu_method)

#define _OF_TABLE_0(name)
#define _OF_TABLE_1(name)                                               \
        . = ALIGN(8);                                                   \
        VMLINUX_SYMBOL(__##name##_of_table) = .;                        \
        *(__##name##_of_table)                                          \
        *(__##name##_of_table_end)

_OF_TABLE_0(cpu_method) expands to nothing.
_OF_TABLE_1(cpu_method) produces the following statements.
- . = ALIGN(8);
- __cpu_method_of_table = .;
- *(__cpu_method_of_table)
- *(__cpu_method_of_table_end)

/* init and exit section handling */
#define INIT_DATA                                                       \
        *(.init.data)                                                   \
        MEM_DISCARD(init.data)                                          \
        KERNEL_CTORS()                                                  \
        MCOUNT_REC()                                                    \
        *(.init.rodata)                                                 \
        FTRACE_EVENTS()                                                 \
        TRACE_SYSCALLS()                                                \
        KPROBE_BLACKLIST()                                              \
        MEM_DISCARD(init.rodata)                                        \
        CLK_OF_TABLES()                                                 \
        RESERVEDMEM_OF_TABLES()                                         \
        CLKSRC_OF_TABLES()                                              \
        IOMMU_OF_TABLES()                                               \
        CPU_METHOD_OF_TABLES()                                          \
        KERNEL_DTB()                                                    \
        IRQCHIP_OF_MATCH_TABLE()                                        \
        EARLYCON_OF_TABLES()

CPU_METHOD_OF_TABLES() appears on line 16 of the INIT_DATA listing above.
- This is where __cpu_method_of_table is placed.

static const struct of_cpu_method __cpu_method_of_table_sentinel
        __used __section(__cpu_method_of_table_end);

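The single empty structure above is placed in the __cpu_method_of_table_end section, so it becomes the last entry of the table.

When the table is scanned, an entry whose method member is null marks the end of the table.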

config_enabled()

include/linux/kconfig.h

/*
 * Getting something that works in C and CPP for an arg that may or may
 * not be defined is tricky.  Here, if we have "#define CONFIG_BOOGER 1"
 * we match on the placeholder define, insert the "0," for arg1 and generate
 * the triplet (0, 1, 0).  Then the last step cherry picks the 2nd arg (a one).
 * When CONFIG_BOOGER is not defined, we generate a (... 1, 0) pair, and when
 * the last step cherry picks the 2nd arg, we get a zero.
 */
#define __ARG_PLACEHOLDER_1 0,
#define config_enabled(cfg) _config_enabled(cfg)
#define _config_enabled(value) __config_enabled(__ARG_PLACEHOLDER_##value)
#define __config_enabled(arg1_or_junk) ___config_enabled(arg1_or_junk 1, 0)
#define ___config_enabled(__ignored, val, ...) val

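This macro evaluates to 1 or 0 depending on the kernel configuration.
- CONFIG_SMP=y -> 1
- CONFIG_SMP not set -> 0
- For a tristate option set to =m, only CONFIG_FOO_MODULE is defined, so config_enabled(CONFIG_FOO) evaluates to 0. IS_ENABLED() is the macro that also covers =m. CONFIG_SMP itself is a bool option.
config_enabled(CONFIG_SMP) goes through the following steps and yields 1.
- _config_enabled(1)
- __config_enabled(__ARG_PLACEHOLDER_1)
- ___config_enabled(0, 1, 0)
- 1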
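
The expansion steps can be tried outside the kernel. The sketch below copies the config_enabled() macros from the listing and adds a hypothetical option pair (CONFIG_DEMO_ON, CONFIG_DEMO_OFF) and a hypothetical PICK() macro. PICK() pastes the 0/1 result onto a macro name in the same way OF_TABLE() selects _OF_TABLE_0 or _OF_TABLE_1. It assumes a GCC- or Clang-compatible preprocessor, since the undefined case calls a variadic macro with no variadic argument.

```c
#include <assert.h>

#define __ARG_PLACEHOLDER_1 0,
#define config_enabled(cfg) _config_enabled(cfg)
#define _config_enabled(value) __config_enabled(__ARG_PLACEHOLDER_##value)
#define __config_enabled(arg1_or_junk) ___config_enabled(arg1_or_junk 1, 0)
#define ___config_enabled(__ignored, val, ...) val

#define CONFIG_DEMO_ON 1        /* hypothetical option set to =y */
/* CONFIG_DEMO_OFF is deliberately left undefined. */

/* Paste the 0/1 result onto a name to pick one of two macro bodies. */
#define PICK(cfg)        PICK_EXPAND(config_enabled(cfg))
#define PICK_EXPAND(v)   PICK_PASTE(v)   /* extra level so v is expanded */
#define PICK_PASTE(v)    PICK_##v
#define PICK_0           "table omitted"
#define PICK_1           "table emitted"

static int demo_on(void)  { return config_enabled(CONFIG_DEMO_ON); }
static int demo_off(void) { return config_enabled(CONFIG_DEMO_OFF); }

static const char *demo_pick_on(void)  { return PICK(CONFIG_DEMO_ON); }
static const char *demo_pick_off(void) { return PICK(CONFIG_DEMO_OFF); }
```

The intermediate PICK_EXPAND() level matters: an argument that sits next to ## is pasted unexpanded, so without it the result would be PICK_config_enabled instead of PICK_0 or PICK_1. The kernel's OF_TABLE(), __OF_TABLE() and ___OF_TABLE() chain exists for the same reason.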

Structures

smp_operations structure

arch/arm/include/asm/smp.h

struct smp_operations {
#ifdef CONFIG_SMP
        /*
         * Setup the set of possible CPUs (via set_cpu_possible)
         */
        void (*smp_init_cpus)(void);
        /*
         * Initialize cpu_possible map, and enable coherency
         */
        void (*smp_prepare_cpus)(unsigned int max_cpus);

        /*
         * Perform platform specific initialisation of the specified CPU.
         */
        void (*smp_secondary_init)(unsigned int cpu);
        /*
         * Boot a secondary CPU, and assign it the specified idle task.
         * This also gives us the initial stack to use for this CPU.
         */
        int  (*smp_boot_secondary)(unsigned int cpu, struct task_struct *idle);
#ifdef CONFIG_HOTPLUG_CPU
        int  (*cpu_kill)(unsigned int cpu);
        void (*cpu_die)(unsigned int cpu);
        int  (*cpu_disable)(unsigned int cpu);
#endif
#endif
};

A particular SMP platform can use the structure above to supply its own handler functions.

of_cpu_method structure

arch/arm/include/asm/smp.h

struct of_cpu_method {
        const char *method;
        struct smp_operations *ops;
};

method
- The enable-method property value of the "/cpus" node
  - e.g. "hisilicon,hip01-smp"
ops
- Points at the smp_operations structure that holds the handlers of that SMP machine.

References

smp_setup_processor_id() | 문c

unflatten_device_tree()

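2016-04-05 / 2021-10-28 문영일

Converting the device tree (FDT) to the expanded format

Each node and property is linked into a tree using the device_node and property structures.
The original DTB binary is not freed, because its strings and values continue to be referenced in place.

Node names

Using the device tree below, we look at the three kinds of node names.

Full path node name
Compact node name
Alias name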

arch/arm64/boot/dts/broadcom/northstar2/ns2.dtsi

/ {
        compatible = "brcm,ns2";
        interrupt-parent = <&gic>;
        #address-cells = <2>;
        #size-cells = <2>;

        cpus {
                #address-cells = <2>;
                #size-cells = <0>;

                A57_0: cpu@0 {
                        device_type = "cpu";
                        compatible = "arm,cortex-a57", "arm,armv8";
                        reg = <0 0>;
                        enable-method = "psci";
                        next-level-cache = <&CLUSTER0_L2>;
                };

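Full path node name

Expressed like a directory path.
- /cpus/cpu@0

Compact node name

Since Device Tree version 0x10, compact node names are widely used instead of full path node names.
- cpu@0
Reference: Device trees I: Are we having fun yet? | LWN.net

Alias name

An alias name can be placed in front of the compact node name. In DTS terms this prefix is a label that other nodes reference as &A57_0; it is distinct from the properties of the /aliases node handled by of_alias_scan().
- A57_0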

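Performing the FDT -> expanded format conversion

When the DTB, which is aligned in word (4-byte) units, is run through the unflatten process by unflatten_device_tree(), each node is converted into a device_node structure. The global of_root points at the root node. The properties of each node are likewise converted into property structures and attached to that node. The result, in which data that existed only in binary form is laid out as a tree of allocated device_node and property structures, is called the expanded format.
Nodes and properties are managed and accessed through the APIs that begin with of_. At boot, the unflattened structures are carved out of a single block obtained from the dt_alloc callback, which here is the memblock-based early_init_dt_alloc_memory_arch().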

unflatten_device_tree()

drivers/of/fdt.c

/**
 * unflatten_device_tree - create tree of device_nodes from flat blob
 *
 * unflattens the device-tree passed by the firmware, creating the
 * tree of struct device_node. It also fills the "name" and "type"
 * pointers of the nodes so the normal device-tree walking functions
 * can be used.
 */

void __init unflatten_device_tree(void)
{
        __unflatten_device_tree(initial_boot_params, NULL, &of_root,
                                early_init_dt_alloc_memory_arch, false);

        /* Get pointer to "/chosen" and "/aliases" nodes for use everywhere */
        of_alias_scan(early_init_dt_alloc_memory_arch);

        unittest_unflatten_overlay_base();
}

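Converts the DTB to the expanded format and initializes the related global variables so that they point at the appropriate nodes.

Code lines 3~4 parse the DTB, a binary laid out in 4-byte units, convert it to the expanded format, and make the global variable of_root point at the result.
Code line 7 adds alias_prop entries to the global aliases_lookup list.
- Sets the global variable of_aliases to point at the "/aliases" node.
- Sets the global variable of_chosen to point at the "/chosen" node.
- Sets the global variable of_stdout to the node that corresponds to the "stdout-path" property value of the "/chosen" node.

The following figure shows the main nodes being pointed to by global variables that begin with of_.

of_stdout
- Points at the /soc/uart@7e201000 node.
- The node referenced by stdout-path in the /chosen node is used as the output device.
of_aliases
- Points at the /aliases node.
aliases_lookup
- Every property in the /aliases node is converted into an alias_prop structure, and those structures are linked on this list.
of_root
- Points at the root node.
of_chosen
- Points at the /chosen node.
of_stdout_options
- Holds the option string (introduced by a ':' character) that was attached to the output node name.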

__unflatten_device_tree()

drivers/of/fdt.c

/**
 * __unflatten_device_tree - create tree of device_nodes from flat blob
 *
 * unflattens a device-tree, creating the
 * tree of struct device_node. It also fills the "name" and "type"
 * pointers of the nodes so the normal device-tree walking functions
 * can be used.
 * @blob: The blob to expand
 * @dad: Parent device node
 * @mynodes: The device_node tree created by the call
 * @dt_alloc: An allocator that provides a virtual address to memory
 * for the resulting tree
 * @detached: if true set OF_DETACHED on @mynodes
 *
 * Returns NULL on failure or the memory chunk containing the unflattened
 * device tree on success.
 */

void *__unflatten_device_tree(const void *blob,
                              struct device_node *dad,
                              struct device_node **mynodes,
                              void *(*dt_alloc)(u64 size, u64 align),
                              bool detached)
{
        int size;
        void *mem;

        pr_debug(" -> unflatten_device_tree()\n");

        if (!blob) {
                pr_debug("No device tree pointer\n");
                return NULL;
        }

        pr_debug("Unflattening device tree:\n");
        pr_debug("magic: %08x\n", fdt_magic(blob));
        pr_debug("size: %08x\n", fdt_totalsize(blob));
        pr_debug("version: %08x\n", fdt_version(blob));

        if (fdt_check_header(blob)) {
                pr_err("Invalid device tree blob header\n");
                return NULL;
        }

        /* First pass, scan for size */
        size = unflatten_dt_nodes(blob, NULL, dad, NULL);
        if (size < 0)
                return NULL;

        size = ALIGN(size, 4);
        pr_debug("  size is %d, allocating...\n", size);

        /* Allocate memory for the expanded device tree */
        mem = dt_alloc(size + 4, __alignof__(struct device_node));
        if (!mem)
                return NULL;

        memset(mem, 0, size);

        *(__be32 *)(mem + size) = cpu_to_be32(0xdeadbeef);

        pr_debug("  unflattening %p...\n", mem);

        /* Second pass, do actual unflattening */
        unflatten_dt_nodes(blob, mem, dad, mynodes);
        if (be32_to_cpup(mem + size) != 0xdeadbeef)
                pr_warn("End of tree marker overwritten: %08x\n",
                        be32_to_cpup(mem + size));

        if (detached && mynodes) {
                of_node_set_flag(*mynodes, OF_DETACHED);
                pr_debug("unflattened tree is detached\n");
        }

        pr_debug(" <- unflatten_device_tree()\n");
        return mem;
}

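Parses the DTB, converts it to the expanded format, and stores the resulting tree in @mynodes (&of_root when called from unflatten_device_tree()).

Code lines 22~25 check the magic number in the first word of the header at the start of the DTB to confirm that the data is a DTB. They also check that the DTB version is in the supported range 0x02 ~ 0x11; otherwise an error is printed and nothing is processed.
Code lines 28~30 pass NULL for @mem so that unflatten_dt_nodes() runs as a dry run: no conversion is done, and only the total size needed for the device_node and property structures that unflattening will create is computed. Code line 32 then aligns that size to a word (4-byte) boundary.
Code lines 36~38 allocate memory through the (*dt_alloc) function passed as an argument. The requested size is the computed size plus 4 bytes for an end marker. The alignment is __alignof__(struct device_node), the natural alignment of the structure (typically 4 bytes on 32-bit ARM and 8 bytes on ARM64, where pointers are 8 bytes wide).
Code line 42 stores 0xdeadbeef in the last 4 bytes of the allocated memory. This value is used to detect writes past the end of the buffer.
Code line 47 parses the DTB and converts it into device_node and property structures.
Code lines 48~50 check whether the marker placed at the end of the allocated memory has been overwritten and print a warning if so.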
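
The two-pass pattern with an end marker can be shown in isolation. This is a minimal sketch under stated assumptions, not the kernel code: expand() is a hypothetical stand-in for unflatten_dt_nodes() that writes fixed-size records, malloc() replaces the dt_alloc callback, and the marker is stored in host byte order rather than big-endian.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD 0xdeadbeefu

/*
 * Hypothetical stand-in for unflatten_dt_nodes(): "expands" n source bytes
 * into records of rec_size bytes. With mem == NULL it only reports the
 * size that would be needed (dry run).
 */
static size_t expand(const unsigned char *src, size_t n, size_t rec_size,
                     unsigned char *mem)
{
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        if (mem)
            memset(mem + used, src[i], rec_size);
        used += rec_size;
    }
    return used;
}

/* Returns the buffer (caller frees) or NULL; *guard_ok reports the check. */
static unsigned char *expand_two_pass(const unsigned char *src, size_t n,
                                      size_t rec_size, int *guard_ok)
{
    uint32_t guard = GUARD, seen;
    unsigned char *mem;
    size_t size;

    assert(guard_ok != NULL);

    size = expand(src, n, rec_size, NULL);      /* pass 1: size only */
    size = (size + 3) & ~(size_t)3;             /* round up to 4 bytes */

    mem = malloc(size + sizeof(guard));         /* room for the marker */
    if (!mem)
        return NULL;
    memset(mem, 0, size);
    memcpy(mem + size, &guard, sizeof(guard));  /* plant the marker */

    expand(src, n, rec_size, mem);              /* pass 2: real work */

    memcpy(&seen, mem + size, sizeof(seen));
    *guard_ok = (seen == GUARD);                /* overrun check */
    return mem;
}
```

The first pass lets the whole tree live in one allocation, which suits early boot when only a simple allocator is available; the marker is a cheap sanity check that both passes agreed on the size.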

The following figure shows a binary DTB being unflattened into the expanded format.

unflatten_dt_nodes()

drivers/of/fdt.c

/**
 * unflatten_dt_nodes - Alloc and populate a device_node from the flat tree
 * @blob: The parent device tree blob
 * @mem: Memory chunk to use for allocating device nodes and properties
 * @dad: Parent struct device_node
 * @nodepp: The device_node tree created by the call
 *
 * It returns the size of unflattened device tree or error code
 */

static int unflatten_dt_nodes(const void *blob,
                              void *mem,
                              struct device_node *dad,
                              struct device_node **nodepp)
{
        struct device_node *root;
        int offset = 0, depth = 0, initial_depth = 0;
#define FDT_MAX_DEPTH   64
        struct device_node *nps[FDT_MAX_DEPTH];
        void *base = mem;
        bool dryrun = !base;

        if (nodepp)
                *nodepp = NULL;

        /*
         * We're unflattening device sub-tree if @dad is valid. There are
         * possibly multiple nodes in the first level of depth. We need
         * set @depth to 1 to make fdt_next_node() happy as it bails
         * immediately when negative @depth is found. Otherwise, the device
         * nodes except the first one won't be unflattened successfully.
         */
        if (dad)
                depth = initial_depth = 1;

        root = dad;
        nps[depth] = dad;

        for (offset = 0;
             offset >= 0 && depth >= initial_depth;
             offset = fdt_next_node(blob, offset, &depth)) {
                if (WARN_ON_ONCE(depth >= FDT_MAX_DEPTH))
                        continue;

                if (!IS_ENABLED(CONFIG_OF_KOBJ) &&
                    !of_fdt_device_is_available(blob, offset))
                        continue;

                if (!populate_node(blob, offset, &mem, nps[depth],
                                   &nps[depth+1], dryrun))
                        return mem - base;

                if (!dryrun && nodepp && !*nodepp)
                        *nodepp = nps[depth+1];
                if (!dryrun && !root)
                        root = nps[depth+1];
        }

        if (offset < 0 && offset != -FDT_ERR_NOTFOUND) {
                pr_err("Error %d processing FDT\n", offset);
                return -EINVAL;
        }

        /*
         * Reverse the child list. Some drivers assumes node order matches .dts
         * node order
         */
        if (!dryrun)
                reverse_nodes(root);

        return mem - base;
}

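Parses the device tree in FDT form and expands it into device nodes. @blob is the start address of the device tree (FDT), and @mem is where the expanded device nodes are stored; NULL requests a size-only dry run. @dad is the parent device node, and the output argument @nodepp receives the first device node created by the call. When called on behalf of unflatten_device_tree(), @dad is null; @nodepp is null on the first pass and &of_root, which points at the root device node, on the second.

Code lines 13~14 first store null in the output argument @nodepp.
Code lines 23~24 start depth and the initial depth at 1, but only when @dad is given.
Code lines 29~33 advance to the next node and obtain its offset, together with the depth of that node. Nodes at depth FDT_MAX_DEPTH or deeper are skipped with a one-time warning.
Code lines 35~37 skip nodes that are not available (a status property other than "okay" or "ok"); this filtering applies only when CONFIG_OF_KOBJ is disabled.
Code lines 39~41 populate the node. If that fails, the size converted so far is returned.
Code lines 43~44 store the current node in @nodepp during the 2nd pass. This happens only once.
Code lines 45~46 make the current node the root during the 2nd pass if no root has been set yet.
Code lines 49~52 return an error if node parsing failed.
Code lines 58~59 reverse the nodes during the 2nd pass.
Code line 61 returns the size converted so far.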
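
The role of the nps[] array, a per-depth stack of parents, is easier to see on a toy tree. In the sketch below, struct node, link_preorder() and MAX_DEPTH are hypothetical names; the nodes arrive in pre-order together with their depth, which is the order in which fdt_next_node() visits them.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 8

struct node {
    const char *name;
    struct node *parent, *child, *sibling;
};

/*
 * Hypothetical helper: link nodes[] (given in pre-order, each with its
 * depth) into a tree. parents[d] holds the parent for nodes at depth d,
 * which is how nps[depth] is used in unflatten_dt_nodes().
 */
static struct node *link_preorder(struct node *nodes, const int *depth,
                                  size_t n)
{
    struct node *parents[MAX_DEPTH + 1] = { NULL };

    assert(nodes != NULL && depth != NULL);

    for (size_t i = 0; i < n; i++) {
        int d = depth[i];
        struct node *np = &nodes[i];
        struct node *dad;

        if (d < 0 || d >= MAX_DEPTH)
            continue;               /* too deep: skip this node */

        dad = parents[d];
        np->parent = dad;
        np->child = NULL;
        np->sibling = NULL;
        if (dad) {                  /* push onto the front of dad's list */
            np->sibling = dad->child;
            dad->child = np;
        }
        parents[d + 1] = np;        /* np is the parent for depth d + 1 */
    }
    return n ? &nodes[0] : NULL;
}
```

Because each child is pushed onto the front of its parent's list, the children come out in reverse of their input order, which is why the kernel follows up with a reversal step.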

Populating device nodes and properties

populate_node()

drivers/of/fdt.c

static bool populate_node(const void *blob,
                          int offset,
                          void **mem,
                          struct device_node *dad,
                          struct device_node **pnp,
                          bool dryrun)
{
        struct device_node *np;
        const char *pathp;
        unsigned int l, allocl;

        pathp = fdt_get_name(blob, offset, &l);
        if (!pathp) {
                *pnp = NULL;
                return false;
        }

        allocl = ++l;

        np = unflatten_dt_alloc(mem, sizeof(struct device_node) + allocl,
                                __alignof__(struct device_node));
        if (!dryrun) {
                char *fn;
                of_node_init(np);
                np->full_name = fn = ((char *)np) + sizeof(*np);

                memcpy(fn, pathp, l);

                if (dad != NULL) {
                        np->parent = dad;
                        np->sibling = dad->child;
                        dad->child = np;
                }
        }

        populate_properties(blob, offset, mem, np, pathp, dryrun);
        if (!dryrun) {
                np->name = of_get_property(np, "name", NULL);
                if (!np->name)
                        np->name = "<NULL>";
        }

        *pnp = np;
        return true;
}

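Parses a node and converts it into a device node. Returns true on success.

Code lines 12~16: if the node name is null, null is stored in the output argument @pnp and false is returned without further processing.
Code lines 20~21 reserve the area in @mem where the node and its name string will be stored.
Code lines 22~34: in the 2nd pass, the node is initialized, its name is set, and the relationships between nodes are linked.
Code lines 36~41 parse the properties and convert them into property structures. If the node has no "name" property, the string "<NULL>" is used as its name.
Code lines 43~44 store the device node in the output argument @pnp and return true.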

populate_properties()

drivers/of/fdt.c -1/2-

static void populate_properties(const void *blob,
                                int offset,
                                void **mem,
                                struct device_node *np,
                                const char *nodename,
                                bool dryrun)
{
        struct property *pp, **pprev = NULL;
        int cur;
        bool has_name = false;

        pprev = &np->properties;
        for (cur = fdt_first_property_offset(blob, offset);
             cur >= 0;
             cur = fdt_next_property_offset(blob, cur)) {
                const __be32 *val;
                const char *pname;
                u32 sz;

                val = fdt_getprop_by_offset(blob, cur, &pname, &sz);
                if (!val) {
                        pr_warn("Cannot locate property at 0x%x\n", cur);
                        continue;
                }

                if (!pname) {
                        pr_warn("Cannot find property name at 0x%x\n", cur);
                        continue;
                }

                if (!strcmp(pname, "name"))
                        has_name = true;

                pp = unflatten_dt_alloc(mem, sizeof(struct property),
                                        __alignof__(struct property));
                if (dryrun)
                        continue;

                /* We accept flattened tree phandles either in
                 * ePAPR-style "phandle" properties, or the
                 * legacy "linux,phandle" properties.  If both
                 * appear and have different values, things
                 * will get weird. Don't do that.
                 */
                if (!strcmp(pname, "phandle") ||
                    !strcmp(pname, "linux,phandle")) {
                        if (!np->phandle)
                                np->phandle = be32_to_cpup(val);
                }

                /* And we process the "ibm,phandle" property
                 * used in pSeries dynamic device tree
                 * stuff
                 */
                if (!strcmp(pname, "ibm,phandle"))
                        np->phandle = be32_to_cpup(val);

                pp->name   = (char *)pname;
                pp->length = sz;
                pp->value  = (__be32 *)val;
                *pprev     = pp;
                pprev      = &pp->next;
        }

drivers/of/fdt.c -2/2-

        /* With version 0x10 we may not have the name property,
         * recreate it here from the unit name if absent
         */
        if (!has_name) {
                const char *p = nodename, *ps = p, *pa = NULL;
                int len;

                while (*p) {
                        if ((*p) == '@')
                                pa = p;
                        else if ((*p) == '/')
                                ps = p + 1;
                        p++;
                }

                if (pa < ps)
                        pa = p;
                len = (pa - ps) + 1;
                pp = unflatten_dt_alloc(mem, sizeof(struct property) + len,
                                        __alignof__(struct property));
                if (!dryrun) {
                        pp->name   = "name";
                        pp->length = len;
                        pp->value  = pp + 1;
                        *pprev     = pp;
                        pprev      = &pp->next;
                        memcpy(pp->value, ps, len - 1);
                        ((char *)pp->value)[len - 1] = 0;
                        pr_debug("fixed up name for %s -> %s\n",
                                 nodename, (char *)pp->value);
                }
        }

        if (!dryrun)
                *pprev = NULL;
}

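The figure below illustrates how the node name is stored as a full path name.

The figure below shows the situation when node a2 is added as a child of node a@1000.

The figure below shows the properties of the cpu@0 node linked together.

The following figure shows a name property being appended at the end when the node has no property called name.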
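
How the missing "name" value is derived from the node name can be sketched as a stand-alone function. name_from_nodename() is a hypothetical helper that follows the same pointer walk as the second half of populate_properties(), but it writes into a caller-supplied buffer and adds an explicit NULL test before comparing pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the node name without its unit address into out[]. Returns the
 * length including the terminating NUL (the "len" of the kernel code),
 * or 0 if out[] is too small.
 */
static size_t name_from_nodename(const char *nodename, char *out,
                                 size_t outsz)
{
    const char *p = nodename, *ps = p, *pa = NULL;
    size_t len;

    assert(nodename != NULL && out != NULL);

    while (*p) {
        if (*p == '@')
            pa = p;             /* last '@' seen so far */
        else if (*p == '/')
            ps = p + 1;         /* start of the last path component */
        p++;
    }
    if (pa == NULL || pa < ps)  /* no '@' in the last component */
        pa = p;

    len = (size_t)(pa - ps) + 1;
    if (len > outsz)
        return 0;
    memcpy(out, ps, len - 1);
    out[len - 1] = '\0';
    return len;
}
```

For "cpu@0" this yields "cpu" with len 4, and for a name without a unit address such as "memory" the whole string is kept.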

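Because populate_node() links each new child at the head of its parent's child list, the children end up in reverse order. reverse_nodes() therefore reverses every child list so that the order matches the DTB.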
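
The effect of head insertion followed by a reversal can be reproduced with a few lines. This is a sketch of the idea only; struct node, add_child() and reverse_children() are illustrative names and not the kernel's reverse_nodes() source.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int id;
    struct node *child;     /* first child */
    struct node *sibling;   /* next child of the same parent */
};

/* Add np at the head of parent's child list, as populate_node() does. */
static void add_child(struct node *parent, struct node *np)
{
    np->sibling = parent->child;
    parent->child = np;
}

/* Reverse every child list below parent, depth first. */
static void reverse_children(struct node *parent)
{
    struct node *child, *next;

    assert(parent != NULL);

    for (child = parent->child; child; child = child->sibling)
        reverse_children(child);

    /* Pop each child off the old list and push it onto the new one. */
    child = parent->child;
    parent->child = NULL;
    while (child) {
        next = child->sibling;
        child->sibling = parent->child;
        parent->child = child;
        child = next;
    }
}
```

Adding children 1, 2, 3 leaves the list as 3, 2, 1; after reverse_children() it reads 1, 2, 3 again. Head insertion keeps node creation O(1) without a tail pointer, and a single reversal at the end restores the source order.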

unflatten_dt_alloc()

drivers/of/fdt.c

static void *unflatten_dt_alloc(void **mem, unsigned long size,
                                       unsigned long align)
{
        void *res;

        *mem = PTR_ALIGN(*mem, align);
        res = *mem;
        *mem += size;

        return res; 
}

Rounds the value of mem up to a multiple of align and returns it, then advances the in/out argument mem by size.
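
The same bump allocation can be written on an integer cursor, which makes the dry run explicit: starting from 0, the final cursor value is the total size required. bump_alloc() is a hypothetical name, and the sketch assumes align is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bump "allocator" working on an integer cursor. Returns the aligned
 * start of the new object and advances *cursor past it. No memory is
 * touched, so a cursor starting at 0 gives a size-only dry run.
 */
static uintptr_t bump_alloc(uintptr_t *cursor, size_t size, size_t align)
{
    uintptr_t res;

    assert(cursor != NULL);
    assert(align != 0 && (align & (align - 1)) == 0);  /* power of two */

    res = (*cursor + (align - 1)) & ~((uintptr_t)align - 1);  /* round up */
    *cursor = res + size;
    return res;
}
```

For example, allocating 5 bytes and then 12 bytes with 8-byte alignment from a cursor of 0 places the objects at offsets 0 and 8 and leaves the cursor at 20, including the 3 bytes of padding between them.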

early_init_dt_alloc_memory_arch()

drivers/of/fdt.c

static void * __init early_init_dt_alloc_memory_arch(u64 size, u64 align)
{
        void *ptr = memblock_alloc(size, align);

        if (!ptr)
                panic("%s: Failed to allocate %llu bytes align=0x%llx\n",
                      __func__, size, align);

        return ptr;
}

Allocates size bytes aligned to align from memblock and returns the virtual address. If the allocation fails, the kernel panics.

Scanning the aliases node

of_alias_scan()

drivers/of/base.c

/**
 * of_alias_scan - Scan all properties of the 'aliases' node
 *
 * The function scans all the properties of the 'aliases' node and populates
 * the global lookup table with the properties.  It returns the
 * number of alias properties found, or an error code in case of failure.
 *
 * @dt_alloc:   An allocator that provides a virtual address to memory
 *              for storing the resulting tree
 */

void of_alias_scan(void * (*dt_alloc)(u64 size, u64 align))
{
        struct property *pp;

        of_aliases = of_find_node_by_path("/aliases");
        of_chosen = of_find_node_by_path("/chosen");
        if (of_chosen == NULL)
                of_chosen = of_find_node_by_path("/chosen@0");

        if (of_chosen) {
                /* linux,stdout-path and /aliases/stdout are for legacy compatibility */
                const char *name = NULL;

                if (of_property_read_string(of_chosen, "stdout-path", &name))
                        of_property_read_string(of_chosen, "linux,stdout-path",
                                                &name);
                if (IS_ENABLED(CONFIG_PPC) && !name)
                        of_property_read_string(of_aliases, "stdout", &name);
                if (name)
                        of_stdout = of_find_node_opts_by_path(name, &of_stdout_options);
        }

        if (!of_aliases)
                return;

        for_each_property_of_node(of_aliases, pp) {
                const char *start = pp->name;
                const char *end = start + strlen(start);
                struct device_node *np;
                struct alias_prop *ap;
                int id, len;

                /* Skip those we do not want to proceed */
                if (!strcmp(pp->name, "name") ||
                    !strcmp(pp->name, "phandle") ||
                    !strcmp(pp->name, "linux,phandle"))
                        continue;

                np = of_find_node_by_path(pp->value);
                if (!np)
                        continue;

                /* walk the alias backwards to extract the id and work out
                 * the 'stem' string */
                while (isdigit(*(end-1)) && end > start)
                        end--;
                len = end - start;

                if (kstrtoint(end, 10, &id) < 0)
                        continue;

                /* Allocate an alias_prop with enough space for the stem */
                ap = dt_alloc(sizeof(*ap) + len + 1, __alignof__(*ap));
                if (!ap)
                        continue;
                memset(ap, 0, sizeof(*ap) + len + 1);
                ap->alias = start;
                of_alias_add(ap, np, id, start, len);
        }
}

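Looks up the "/chosen" node and sets the global variable of_stdout to the node referenced by "stdout-path". Then every property of the "/aliases" node except "name", "phandle" and "linux,phandle" is turned into an alias_prop structure and added to the aliases_lookup list.

Code line 5 finds the "/aliases" node and stores it in the global of_aliases.
Code lines 6~8 find the "/chosen" node and store it in the global of_chosen. If the node is not found, the lookup is retried with "/chosen@0".
Code lines 10~18: when the "/chosen" node exists, its "stdout-path" property is read and the value is stored in name. If that property is absent, "linux,stdout-path" is tried for legacy compatibility. With CONFIG_PPC only, the "stdout" property of the "/aliases" node is tried as a last resort.
Code lines 19~20 look up the node for name (the value found above). If the name string carries options (the string introduced by a ':' character), they are stored in the global of_stdout_options.
Code lines 23~24 return from the function if there is no "/aliases" node.
Code lines 26~31 loop over every property of of_aliases.
Code lines 34~37 skip the properties named "name", "phandle" and "linux,phandle".
Code lines 39~41 look up the node named by the property value and skip the property if no node is found.
Code lines 45~46 walk backwards from the end of the property name over any trailing digits, so that end points at the first digit of the trailing number.
- e.g. "uart0"
  - end points at the '0' character
Code line 47 stores in len the length of the property name without the trailing number.
- e.g. "uart0"
  - len = 4
- e.g. "serial12"
  - len = 6
Code lines 49~50 convert the trailing digits to a decimal integer and store it in id. On error, for instance when the name has no trailing number, the property is skipped.
Code lines 53~55 allocate sizeof(struct alias_prop) + len + 1 bytes through dt_alloc (memblock here). If the allocation fails, the property is skipped.
Code line 56 zeroes the allocated memory.
Code line 57 makes alias point at the property name. Property names in the /aliases node do not start with a '/' character.
- e.g. name = "uart0"
Code line 58 stores np, id and stem and adds the entry to the global aliases_lookup list. The stem string is the property name with the trailing number removed.
- e.g. property name = "uart0", property value = "/soc/uart@7e201000"
  - stem = "uart", id = 0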
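
The split of an alias property name into a stem and a numeric id can be tried in user space. split_alias() below is a hypothetical helper: it uses atoi() where the kernel uses kstrtoint(), copies the stem into a caller-supplied buffer, and tests end > start before reading end[-1].

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split an alias property name such as "serial0" into the stem "serial"
 * and the id 0. Returns 0 on success, or -1 if the name has no trailing
 * number or the stem does not fit in stem[].
 */
static int split_alias(const char *alias, char *stem, size_t stemsz, int *id)
{
    const char *start = alias;
    const char *end = start + strlen(start);
    size_t len;

    assert(alias != NULL && stem != NULL && id != NULL);

    /* Walk backwards over the trailing digits. */
    while (end > start && isdigit((unsigned char)end[-1]))
        end--;
    len = (size_t)(end - start);

    if (*end == '\0' || len + 1 > stemsz)
        return -1;              /* no digits, or buffer too small */

    memcpy(stem, start, len);
    stem[len] = '\0';
    *id = atoi(end);
    return 0;
}
```

Only the trailing run of digits is treated as the id, so "i2c1" splits into the stem "i2c" and id 1; the '2' in the middle stays part of the stem.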

Structures

device_node structure

include/linux/of.h

struct device_node {
        const char *name;
        phandle phandle;
        const char *full_name;
        struct fwnode_handle fwnode;

        struct  property *properties;
        struct  property *deadprops;    /* removed properties */
        struct  device_node *parent;
        struct  device_node *child;
        struct  device_node *sibling;
#if defined(CONFIG_OF_KOBJ)
        struct  kobject kobj;
#endif
        unsigned long _flags;
        void    *data;
#if defined(CONFIG_SPARC)
        unsigned int unique_id;
        struct of_irq_controller *irq_trans;
#endif
};

property structure

include/linux/of.h

struct property {
        char    *name;
        int     length;
        void    *value;
        struct property *next;
#if defined(CONFIG_OF_DYNAMIC) || defined(CONFIG_SPARC)
        unsigned long _flags;
#endif
#if defined(CONFIG_OF_PROMTREE)
        unsigned int unique_id;
#endif
#if defined(CONFIG_OF_KOBJ)
        struct bin_attribute attr;
#endif
};

alias_prop structure

drivers/of/of_private.h

/**
 * struct alias_prop - Alias property in 'aliases' node
 * @link:       List node to link the structure in aliases_lookup list
 * @alias:      Alias property name
 * @np:         Pointer to device_node that the alias stands for
 * @id:         Index value from end of alias name
 * @stem:       Alias string without the index
 *
 * The structure represents one alias property of 'aliases' node as
 * an entry in aliases_lookup list.
 */

struct alias_prop {
        struct list_head link;
        const char *alias;
        struct device_node *np;
        int id;
        char stem[];
};

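link
- Linked-list node
alias
- Alias property name
  - e.g. "uart0"
np
- Points at the node (device_node).
id
- The index number at the end of the alias name
  - e.g. for the alias "serial1", id=1
stem
- The alias name without the index (id)
  - e.g. "uart" for the alias "uart0"

Allocation size of each structure from memblock

device_node
- Each allocation includes extra space for the node name string that full_name points at.
- name, type, data and so on point at strings or values inside the existing DTB.
property
- Each allocation takes the size of the property structure. When a name property that is absent from the DTB has to be created, space for the node name without the unit address is allocated in addition.
alias_prop
- Each allocation includes extra space for the alias name without its index; this is the space used by the stem string.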