문c blog

lockdep

2015-11-30 (updated 2015-12-31), 문영일

lockdep

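Used to debug the locks the kernel uses (spinlock, mutex, semaphore).
Prints a warning when it detects a locking problem such as a possible deadlock, usually before the deadlock actually happens. (The report can be read with dmesg.)
Using ww-mutex (wound/wait mutex), deadlocks can also be avoided.
The lockdep code has also been ported to userspace, so it can debug locks used by userspace programs as well (see Userspace Lockdep in the references).
To use it, enable the lock debugging options when building the kernel.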
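To make the idea concrete, here is a minimal userspace sketch of the kind of check lockdep performs. It is a toy model, not kernel code: lock classes are small integers, an edge a -> b is recorded whenever b is taken while a is held, and a new edge that would close a cycle is reported. The names record_order() and reachable() are hypothetical, and the search is a simple depth-first search (the kernel walks its dependency lists breadth-first).

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy model of lockdep's ordering check (hypothetical, not kernel code).
 * Lock classes are small integers; dep[a][b] means "b was acquired while
 * a was held". */
#define MAX_CLASSES 8

static bool dep[MAX_CLASSES][MAX_CLASSES];

/* Depth-first search: can `to` be reached from `from` along recorded edges? */
static bool reachable(int from, int to, bool seen[MAX_CLASSES])
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_CLASSES; next++)
        if (dep[from][next] && !seen[next] && reachable(next, to, seen))
            return true;
    return false;
}

/* Called when `next` is acquired while `held` is held. The new edge
 * held -> next is unsafe if next already (transitively) leads to held,
 * because the two orderings running concurrently can deadlock. */
static bool record_order(int held, int next)
{
    bool seen[MAX_CLASSES] = { false };

    if (reachable(next, held, seen)) {
        printf("WARNING: possible circular locking dependency: %d -> %d\n",
               held, next);
        return false;
    }
    dep[held][next] = true;
    return true;
}
```

Taking lock 0 then 1, and 1 then 2, is fine; a later attempt to take 0 while holding 2 would close the cycle 0 -> 1 -> 2 -> 0, so it is reported even though no deadlock actually occurred in this run.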

Kernel configuration menu (menuconfig)

Kernel hacking  --->
	Lock Debugging (spinlocks, mutexes, etc...)  --->
		[*] RT Mutex debugging, deadlock detection
		-*- Spinlock and rw-lock debugging: basic checks
		-*- Mutex debugging: basic checks
		[*] Wait/wound mutex debugging: Slowpath testing
		-*- Lock debugging: detect incorrect freeing of live locks
		[*] Lock debugging: prove locking correctness
		[*] Lock usage statistics
		[*] Lock dependency engine debugging
		[*] Sleep inside atomic section checking
		[*] Locking API boot-time self-tests
		<M> torture tests for locking

Entries created under /proc

/proc/lockdep
/proc/lockdep_chains
/proc/lockdep_stat
/proc/locks
/proc/lock_stat

/proc/lockdep

all lock classes:
80a1c55c OPS:      19 FD:   50 BD:    2 +.+...: cgroup_mutex
 -> [80a1c6b0] cgroup_idr_lock
 -> [80a1c5b0] css_set_rwsem
 -> [80a30a74] devcgroup_mutex
 -> [80a1cf3c] freezer_mutex
 -> [80a274ac] kernfs_mutex
(...)

/proc/lockdep_chains

all lock chains:
irq_context: 0
[80a1c55c] cgroup_mutex

irq_context: 0
[80a16a3c] resource_lock
(...)

/proc/lockdep_stat

lock-classes:                         1212 [max: 8191]
 direct dependencies:                  4230 [max: 32768]
 indirect dependencies:               13035
 all direct dependencies:             63105
 dependency chains:                    6166 [max: 65536]
 dependency chain hlocks:             18187 [max: 327680]
 in-hardirq chains:                      29
 in-softirq chains:                     261
 in-process chains:                    4580
 stack-trace entries:                 60133 [max: 524288]
 combined max dependencies:        36006660
 hardirq-safe locks:                     28
 hardirq-unsafe locks:                  471
 softirq-safe locks:                     91
 softirq-unsafe locks:                  415
 irq-safe locks:                         97
 irq-unsafe locks:                      471
 hardirq-read-safe locks:                 3
 hardirq-read-unsafe locks:              80
 softirq-read-safe locks:                 9
 softirq-read-unsafe locks:              77
 irq-read-safe locks:                    10
 irq-read-unsafe locks:                  80
 uncategorized locks:                   130
 unused locks:                            1
 max locking depth:                      15
 max bfs queue depth:                   159
 chain lookup misses:                  6175
 chain lookup hits:                10136672
 cyclic checks:                        4705
 find-mask forwards checks:            1593
 find-mask backwards checks:          31551
 hardirq on events:                 8784803
 hardirq off events:                8784808
 redundant hardirq ons:              356471
 redundant hardirq offs:            6574108
 softirq on events:                  116537
 softirq off events:                 116565
 redundant softirq ons:                   0
 redundant softirq offs:                  0
 debug_locks:                             1
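Most /proc/lockdep_stat lines have the form "name: value" with an optional "[max: limit]". The sketch below parses one such line with sscanf() and computes how close a counter is to its limit. struct stat_entry and the helper names are hypothetical, and the line format is assumed from the output above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for one /proc/lockdep_stat line. */
struct stat_entry {
    char name[64];
    unsigned long value;
    unsigned long max;   /* 0 when the line has no "[max: ...]" part */
};

/* Parse "name: value [max: limit]"; the limit is optional.
 * Returns 1 on success, 0 if the name or the value is missing. */
static int parse_stat_line(const char *line, struct stat_entry *e)
{
    e->max = 0;
    int n = sscanf(line, " %63[^:]: %lu [max: %lu]", e->name, &e->value, &e->max);
    return n >= 2;
}

static int stat_name_is(const struct stat_entry *e, const char *name)
{
    return strcmp(e->name, name) == 0;
}

/* Percentage of the limit in use, or -1.0 when the line has no limit. */
static double stat_usage_pct(const struct stat_entry *e)
{
    return e->max ? 100.0 * (double)e->value / (double)e->max : -1.0;
}
```

For example, "lock-classes: 1212 [max: 8191]" yields about 14.8% of the class limit in use.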

/proc/locks (lists file locks taken with flock()/fcntl(); this file is not specific to lockdep)

1: FLOCK  ADVISORY  WRITE 2057 00:0e:7200 0 EOF
2: FLOCK  ADVISORY  WRITE 2246 00:0e:7097 0 EOF

/proc/lock_stat

lock_stat version 0.4
-----------------------------------------------------------------------------------------------------
               class name    con-bounces    contentions   waittime-min   waittime-max waittime-total 
-----------------------------------------------------------------------------------------------------

 &mapping->i_mmap_rwsem-W:          1630           2186           0.99       64477.45      988876.13 
 &mapping->i_mmap_rwsem-R:             0              0           0.00           0.00           0.00 
 ------------------------
   &mapping->i_mmap_rwsem           1017         [<801343b0>] unlink_file_vma+0x34/0x50
   &mapping->i_mmap_rwsem            319         [<8013448c>] vma_link+0x44/0xbc
   &mapping->i_mmap_rwsem            327         [<80024244>] copy_process.part.44+0x1440/0x17e4
   &mapping->i_mmap_rwsem            523         [<801347cc>] vma_adjust+0x2c8/0x604
 ------------------------
   &mapping->i_mmap_rwsem            274         [<8013448c>] vma_link+0x44/0xbc
   &mapping->i_mmap_rwsem            320         [<80024244>] copy_process.part.44+0x1440/0x17e4
   &mapping->i_mmap_rwsem            603         [<801347cc>] vma_adjust+0x2c8/0x604
   &mapping->i_mmap_rwsem            989         [<801343b0>] unlink_file_vma+0x34/0x50

......................................................................................................

---------------------------------------------------------------------------------------------------------
 waittime-avg    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total   holdtime-avg
---------------------------------------------------------------------------------------------------------

       452.37          37732         174127           2.19       66922.50     2247589.89          12.91
         0.00             12            727          10.57        3269.64       36358.60          50.01
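The -avg columns can be derived from the other columns: in the numbers above, waittime-avg equals waittime-total / contentions (988876.13 / 2186 ≈ 452.37) and holdtime-avg equals holdtime-total / acquisitions (2247589.89 / 174127 ≈ 12.91). A small helper reproducing that arithmetic, with a hypothetical name and the formula inferred from this output:

```c
/* Reproduce an *-avg column of /proc/lock_stat from its total.
 * Assumption (matches the sample output): avg = total / count, where count
 * is "contentions" for wait times and "acquisitions" for hold times.
 * A zero count gives 0, as in the all-zero -R row. */
static double lockstat_avg(double total, unsigned long count)
{
    return count ? total / (double)count : 0.0;
}
```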

References:

Runtime locking correctness validator | kernel.org
How to use lockdep feature in linux kernel for deadlock detection | stackoverflow.com
Userspace Lockdep | LWN.net
[Linux] lockdep: runtime lock dependency checking | F/OSS study

set_task_stack_end_magic()

2015-11-30 (updated 2021-09-07), 문영일

set_task_stack_end_magic()

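Writes a magic value at the end of the very first kernel stack, the one start_kernel() uses while it performs kernel initialization. Kernel stacks created afterwards are allocated from memory, and that memory is freed when their task exits.
The recorded magic value (STACK_END_MAGIC: 0x57AC6E9D) is used to detect kernel stack overflow: if the value has been overwritten, the stack has grown past its end.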

void set_task_stack_end_magic(struct task_struct *tsk)
{
        unsigned long *stackend;

        stackend = end_of_stack(tsk);
        *stackend = STACK_END_MAGIC;    /* for overflow detection */
}

STACK_END_MAGIC=0x57AC_6E9D
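A userspace sketch of how the magic value works, assuming the grows-down layout with CONFIG_THREAD_INFO_IN_TASK, where end_of_stack() is simply task->stack. struct fake_task and the fake_* helpers are hypothetical; the kernel's own check is task_stack_end_corrupted(), which compares *end_of_stack(task) with STACK_END_MAGIC.

```c
#include <stdbool.h>

#define STACK_END_MAGIC 0x57AC6E9DUL

/* Hypothetical stand-in for task_struct: only the stack pointer matters. */
struct fake_task {
    unsigned long *stack;   /* lowest address of the stack area */
};

/* Grows down, thread_info not on the stack: the end is the lowest word. */
static unsigned long *fake_end_of_stack(const struct fake_task *t)
{
    return t->stack;
}

/* Like set_task_stack_end_magic(): mark the end of the stack. */
static void fake_set_stack_end_magic(struct fake_task *t)
{
    *fake_end_of_stack(t) = STACK_END_MAGIC;
}

/* Like task_stack_end_corrupted(): true once the marker was overwritten. */
static bool fake_stack_end_corrupted(const struct fake_task *t)
{
    return *fake_end_of_stack(t) != STACK_END_MAGIC;
}
```

An overflow that runs down past the last usable word overwrites the marker first, so a later check can notice it (after the fact; it does not prevent the overflow).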

end_of_stack()

Case with CONFIG_THREAD_INFO_IN_TASK (mandatory on ARM64)

include/linux/sched/task_stack.h

static inline unsigned long *end_of_stack(const struct task_struct *task)
{
        return task->stack;
}

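Returns the end address of the task's stack. Because the stack grows down and thread_info is no longer on the stack, the end is simply the lowest address, task->stack. The last position of the stack holds the magic value that marks the end of the stack.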

Case without CONFIG_THREAD_INFO_IN_TASK (ARM32)

include/linux/sched/task_stack.h

/*
 * Return the address of the last usable long on the stack.
 *
 * When the stack grows down, this is just above the thread
 * info struct. Going any lower will corrupt the threadinfo.
 *
 * When the stack grows up, this is the highest address.
 * Beyond that position, we corrupt data on the next page.
 */

static inline unsigned long *end_of_stack(struct task_struct *p)
{
#ifdef CONFIG_STACK_GROWSUP
        return (unsigned long *)((unsigned long)task_thread_info(p) + THREAD_SIZE) - 1; 
#else
        return (unsigned long *)(task_thread_info(p) + 1);
#endif
}

Returns a pointer to the last usable unsigned long of the given task's stack. That last position holds the magic value that marks the end of the stack.
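The pointer arithmetic of this version can be checked in userspace. The sketch places a hypothetical, reduced struct fake_thread_info at the base of an 8K area (the ARM32 stack size) and reproduces both branches of end_of_stack().

```c
#define FAKE_THREAD_SIZE 8192   /* stands in for THREAD_SIZE (8K on ARM32) */

/* Hypothetical, reduced thread_info; the real ARM one is much larger. */
struct fake_thread_info {
    unsigned long flags;
    int preempt_count;
};

/* Grows down: the last usable long sits right above thread_info,
 * exactly like "(unsigned long *)(task_thread_info(p) + 1)". */
static unsigned long *end_of_stack_down(struct fake_thread_info *ti)
{
    return (unsigned long *)(ti + 1);
}

/* CONFIG_STACK_GROWSUP: the last long inside the THREAD_SIZE area.
 * The cast binds before "- 1", so this steps back one whole long. */
static unsigned long *end_of_stack_up(struct fake_thread_info *ti)
{
    return (unsigned long *)((unsigned long)ti + FAKE_THREAD_SIZE) - 1;
}
```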

CONFIG_STACK_GROWSUP
- Used when the stack grows upward.
- On arm and arm64 the stack grows downward (the default).
end_of_stack() returns the last address of the task's kernel stack.
- A user task has one kernel stack and one user stack.
- The kernel stack is the stack the kernel uses while executing its own code.
- For example, the kernel stack is used while executing a system call requested by a user application.
- Reference: kernel stack

Kernel stack layout

The kernel stack size differs by architecture.
- arm32
  - 8K (2 pages)
  - There have been efforts to bring it back to a single page, so this may change later.
- arm64
  - 16K (default)
  - 64K (when the page size is 64K and vmap stacks are used)
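These sizes follow from simple shift arithmetic. The sketch models arm64's THREAD_SHIFT selection after arch/arm64/include/asm/memory.h (a minimum shift of 14, raised to PAGE_SHIFT when vmap stacks are used with larger pages) and ARM32's PAGE_SIZE << THREAD_SIZE_ORDER. KASAN, which enlarges the stack, is ignored, and the function names are hypothetical.

```c
/* arm64: 16K minimum (shift 14); with vmap stacks and pages larger than
 * that, the stack is one whole page. KASAN adjustments are ignored here. */
static unsigned int arm64_thread_shift(unsigned int page_shift, int vmap_stack)
{
    const unsigned int min_thread_shift = 14;

    if (vmap_stack && min_thread_shift < page_shift)
        return page_shift;
    return min_thread_shift;
}

/* ARM32: THREAD_SIZE = PAGE_SIZE << THREAD_SIZE_ORDER (order 1 => 2 pages). */
static unsigned long arm32_thread_size(unsigned long page_size, unsigned int order)
{
    return page_size << order;
}
```

With 4K pages arm64 gets 1 << 14 = 16K; with 64K pages and vmap stacks it gets 1 << 16 = 64K; ARM32 gets 4096 << 1 = 8K.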

[Figure: location of the stack-protection magic number in the kernel stack]

On arm64, thread_info is removed from the stack and moved into task_struct, so the magic number sits at the very bottom (lowest address) of the stack.

thread_union

include/linux/sched.h

union thread_union {
#ifndef CONFIG_ARCH_TASK_STRUCT_ON_STACK
        struct task_struct task;
#endif
#ifndef CONFIG_THREAD_INFO_IN_TASK
        struct thread_info thread_info;
#endif
        unsigned long stack[THREAD_SIZE/sizeof(long)];
};

CONFIG_ARCH_TASK_STRUCT_ON_STACK
- Only the ia64 architecture uses this option.
CONFIG_THREAD_INFO_IN_TASK
- Added in kernel v4.9-rc1. With this option, for security, thread_info is removed from the bottom of the stack, where it used to live, and moved to be the first member of task_struct. Because it is the first member, a task_struct pointer and a pointer to its thread_info refer to the same address.
- ARM (32-bit) does not use this option, but ARM64 does.
- Reference: sched/core: Allow putting thread_info into task_struct
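Both layouts can be modeled with reduced, hypothetical structures: without CONFIG_THREAD_INFO_IN_TASK, thread_info overlays the first bytes of the stack array inside the union; with it, thread_info is the first member of task_struct, so a task pointer and its thread_info pointer compare equal.

```c
#include <stddef.h>

#define FAKE_THREAD_SIZE 8192   /* hypothetical stand-in for THREAD_SIZE */

/* Reduced, hypothetical thread_info. */
struct fake_thread_info {
    unsigned long flags;
    int preempt_count;
};

/* !CONFIG_THREAD_INFO_IN_TASK: thread_info overlays the start of the stack. */
union fake_thread_union {
    struct fake_thread_info thread_info;
    unsigned long stack[FAKE_THREAD_SIZE / sizeof(long)];
};

_Static_assert(sizeof(union fake_thread_union) == FAKE_THREAD_SIZE,
               "the union is exactly one stack in size");
_Static_assert(offsetof(union fake_thread_union, stack) == 0,
               "stack and thread_info start at the same address");

/* CONFIG_THREAD_INFO_IN_TASK: thread_info is the first task_struct member. */
struct fake_task_struct {
    struct fake_thread_info thread_info;
    long state;
    void *stack;
};

static struct fake_thread_info *fake_task_thread_info(struct fake_task_struct *task)
{
    return &task->thread_info;   /* same address as task itself */
}
```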

struct task_struct (first part only)

include/linux/sched.h

struct task_struct {
#ifdef CONFIG_THREAD_INFO_IN_TASK
        /*
         * For reasons of header soup (see current_thread_info()), this
         * must be the first element of task_struct.
         */
        struct thread_info              thread_info;
#endif
        volatile long                   state;

        /*
         * This begins the randomizable portion of task_struct. Only
         * scheduling-critical items should be added above here.
         */
        randomized_struct_fields_start

        void                            *stack;
        ...
}

With CONFIG_THREAD_INFO_IN_TASK, the thread information (thread_info) is the very first member.

thread_info – ARM

arch/arm/include/asm/thread_info.h

struct thread_info {
        unsigned long           flags;          /* low level flags */
        int                     preempt_count;  /* 0 => preemptable, <0 => bug */
        mm_segment_t            addr_limit;     /* address limit */
        struct task_struct      *task;          /* main task structure */
        __u32                   cpu;            /* cpu */
        __u32                   cpu_domain;     /* cpu domain */
#ifdef CONFIG_STACKPROTECTOR_PER_TASK
        unsigned long           stack_canary;
#endif
        struct cpu_context_save cpu_context;    /* cpu context */
        __u32                   syscall;        /* syscall number */
        __u8                    used_cp[16];    /* thread used copro */
        unsigned long           tp_value[2];    /* TLS registers */
#ifdef CONFIG_CRUNCH
        struct crunch_state     crunchstate;
#endif
        union fp_state          fpstate __attribute__((aligned(8)));
        union vfp_state         vfpstate;
#ifdef CONFIG_ARM_THUMBEE
        unsigned long           thumbee_state;  /* ThumbEE Handler Base register */
#endif
};

thread_info – ARM64

arch/arm64/include/asm/thread_info.h

struct thread_info {
        unsigned long           flags;          /* low level flags */
        mm_segment_t            addr_limit;     /* address limit */
#ifdef CONFIG_ARM64_SW_TTBR0_PAN
        u64                     ttbr0;          /* saved TTBR0_EL1 */
#endif
        union {
                u64             preempt_count;  /* 0 => preemptible, <0 => bug */
                struct {
#ifdef CONFIG_CPU_BIG_ENDIAN
                        u32     need_resched;
                        u32     count;
#else
                        u32     count;
                        u32     need_resched;
#endif
                } preempt;
        };
};

task_thread_info()

include/linux/sched.h

#ifdef CONFIG_THREAD_INFO_IN_TASK
static inline struct thread_info *task_thread_info(struct task_struct *task)
{
        return &task->thread_info;
}
#elif !defined(__HAVE_THREAD_FUNCTIONS)
# define task_thread_info(task) ((struct thread_info *)(task)->stack)
#endif

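Returns the thread information (struct thread_info) of the given task.

Depending on CONFIG_THREAD_INFO_IN_TASK:
- If it is used, the thread_info embedded in the task descriptor is returned.
- If it is not used, the thread_info at the start of the task's stack (task->stack) is returned.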

thread_union

include/linux/sched.h

union thread_union {
#ifndef CONFIG_ARCH_TASK_STRUCT_ON_STACK
        struct task_struct task;
#endif
#ifndef CONFIG_THREAD_INFO_IN_TASK
        struct thread_info thread_info;
#endif
        unsigned long stack[THREAD_SIZE/sizeof(long)];
};

These are structures that share their address with the stack.

The task_struct member exists only when CONFIG_ARCH_TASK_STRUCT_ON_STACK (default=n) is not enabled.
- Construct init thread stack in the linker script rather than by union (2018, v4.16-rc1)
The thread_info member exists only when CONFIG_THREAD_INFO_IN_TASK (selected by the architecture) is not used. ARM64 always selects this option, so its union has no thread_info member.
- Reference: sched/core: Allow putting thread_info into task_struct (2016, v4.9-rc1)
The stack uses THREAD_SIZE bytes.
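Because the stack area is THREAD_SIZE-aligned and thread_info sits at its base when CONFIG_THREAD_INFO_IN_TASK is not used, ARM32's current_thread_info() can find it by rounding the stack pointer down, essentially sp & ~(THREAD_SIZE - 1). A userspace sketch of that arithmetic; thread_info_base() and the addresses are hypothetical.

```c
#include <stdint.h>

#define FAKE_THREAD_SIZE 8192UL   /* must be a power of two, like THREAD_SIZE */

/* Round a stack pointer down to the THREAD_SIZE-aligned base of its stack,
 * which is where thread_info lives in the !THREAD_INFO_IN_TASK layout. */
static uintptr_t thread_info_base(uintptr_t sp)
{
    return sp & ~(FAKE_THREAD_SIZE - 1);
}
```

Any stack pointer inside the same 8K stack, for example 0x80a01f40 or 0x80a01fff, maps to the same base 0x80a00000, which is why the alignment of the stack matters.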

[Figure: stack layout with and without CONFIG_THREAD_INFO_IN_TASK]

On ARM64 this option is always selected, so thread_info is separated from the stack and embedded in the task descriptor.

[Figure: layout with and without CONFIG_ARCH_TASK_STRUCT_ON_STACK]

When this option is enabled, following the sections laid out by vmlinux.lds.h shows that init_task ends up inside init_stack.

INIT_TASK_DATA()

include/asm-generic/vmlinux.lds.h

#define INIT_TASK_DATA(align)                                           \
        . = ALIGN(align);                                               \
        __start_init_task = .;                                          \
        init_thread_union = .;                                          \
        init_stack = .;                                                 \
        KEEP(*(.data..init_task))                                       \
        KEEP(*(.data..init_thread_info))                                \
        . = __start_init_task + THREAD_SIZE;                            \
        __end_init_task = .;

When init_stack is created, the symbols __start_init_task and init_thread_union are defined at the same address.

When CONFIG_ARCH_TASK_STRUCT_ON_STACK is enabled, init_task is placed in the .data..init_task section through the __init_task_data section macro below.
When CONFIG_THREAD_INFO_IN_TASK is not used, init_thread_info is placed in the .data..init_thread_info section. (ARM64 uses this option, so this section is not used there.)

__init_task_data

include/linux/init_task.h

/* Attach to the init_task data structure for proper alignment */

#ifdef CONFIG_ARCH_TASK_STRUCT_ON_STACK
#define __init_task_data __section(".data..init_task")
#else
#define __init_task_data /**/
#endif

When init_task is defined, this section macro places it in the .data..init_task section; without CONFIG_ARCH_TASK_STRUCT_ON_STACK the macro expands to nothing.

get_current()

arch/arm64/include/asm/current.h

/*
 * We don't use read_sysreg() as we want the compiler to cache the value where
 * possible.
 */

static __always_inline struct task_struct *get_current(void)
{
        unsigned long sp_el0;

        asm ("mrs %0, sp_el0" : "=r" (sp_el0));

        return (struct task_struct *)sp_el0;
}

Returns the current task.

In the kernel (EL1) the sp_el0 register is otherwise unused, so ARM64 uses it to hold a pointer to the current task.
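There is no portable way to reproduce sp_el0 in userspace, so the sketch below is only an analogy: a per-thread pointer plays the role of the dedicated register, is updated at each simulated context switch, and is read back by fake_get_current(). On arm64 the context-switch path likewise loads the next task's pointer into sp_el0. All names here are hypothetical.

```c
/* Hypothetical task descriptor for the analogy. */
struct fake_task {
    int pid;
};

/* Stand-in for sp_el0: one "register" per thread of execution. */
static _Thread_local struct fake_task *fake_sp_el0;

/* Simulated context switch: point the register at the next task. */
static void fake_switch_to(struct fake_task *next)
{
    fake_sp_el0 = next;
}

/* Analogue of get_current(): a single read, no memory walk needed. */
static struct fake_task *fake_get_current(void)
{
    return fake_sp_el0;
}
```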


Macros for function declarations (attributes)

2015-11-30 (updated 2016-03-21), 문영일

__init

__section(.init.text) __cold notrace
- __section(S) __attribute__ ((__section__(#S)))
Places the code in the .init.text section, which can be freed after initialization.

__cold

__attribute__((__cold__))
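Marks a function that is unlikely to be executed.
The function is optimized for size rather than speed.
Branches that lead to calls of cold functions are treated as unlikely, which reduces the need for unlikely() annotations; unlikely() is still used alongside for compatibility with older compilers.
Cold functions can also be grouped together in one part of the text section, which improves the locality, and therefore the cache efficiency, of the remaining hot code.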
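A typical use is an error-reporting path. In this sketch report_error() and checked_div() are hypothetical; the point is that the call to the cold function marks the b == 0 branch as unlikely without an explicit unlikely().

```c
#include <stdio.h>

/* Error path marked cold, as the kernel's __cold does. GCC may optimize it
 * for size and place it in a separate .text.unlikely subsection. */
__attribute__((__cold__)) static int report_error(const char *what)
{
    fprintf(stderr, "error: %s\n", what);
    return -1;
}

static int checked_div(int a, int b, int *out)
{
    if (b == 0)                       /* leads to a cold call: predicted not taken */
        return report_error("division by zero");
    *out = a / b;
    return 0;
}
```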

notrace

__attribute__((no_instrument_function))
Disables profiling instrumentation for the function even when compiling with the -finstrument-functions option.
References:
- Trace and profile function calls with GCC
- Visualizing function calls | JaPa2
- Profiling | 문c

__weak

__attribute__((weak))
Makes the symbol a weak symbol.
At link time, if a strong symbol with the same name is defined elsewhere, the linker uses the strong symbol instead of the weak one.
Reference: GCC Weak Function Attributes
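A minimal sketch of the weak/strong rule. platform_cpu_count() is a hypothetical hook: this file supplies a weak default, and linking in another object file that defines a normal (strong) `int platform_cpu_count(void)` would make every call use that definition instead, without touching this file.

```c
/* Weak default, like a kernel __weak function. With no strong definition
 * elsewhere in the link, callers get this one. */
__attribute__((weak)) int platform_cpu_count(void)
{
    return 1;
}
```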

__attribute_const__

__attribute__((__const__))
The return value depends only on the arguments; the function may not read global memory.
It has no side effects.
Reference: Implications of pure and constant functions | LWN.net
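A function that qualifies for __attribute_const__: its result depends only on its integer argument and it reads no memory, so the compiler may compute ilog2_u32(n) once for repeated calls with the same n. ilog2_u32() is a hypothetical helper (the kernel has its own ilog2()).

```c
/* Floor of log2(x), with 0 mapped to 0. Reads no memory: const-eligible. */
__attribute__((__const__)) static unsigned int ilog2_u32(unsigned int x)
{
    unsigned int r = 0;

    while (x >>= 1)
        r++;
    return r;
}
```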

__pure

__attribute__((pure))
Besides its arguments, the function may read global variables, but only read them.
It has no side effects.
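count_char() reads memory through its pointer argument, so it cannot be const, but it modifies nothing, so it qualifies as __pure: repeated calls with the same arguments and unchanged memory can be merged. The name is hypothetical.

```c
#include <stddef.h>

/* Reads the string but writes nothing: pure, not const. */
__attribute__((__pure__)) static size_t count_char(const char *s, char c)
{
    size_t n = 0;

    for (; *s; s++)
        if (*s == c)
            n++;
    return n;
}
```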

__read_mostly

__attribute__((__section__(".data..read_mostly")))
A section only for data that is mostly read, used to avoid cache line bouncing.
- On SMP machines, grouping such data together keeps it away from frequently written data, which minimizes cache line evictions and improves performance.
- On cache line bouncing, see: Exclusive loads and store | 문c
Reference: Short subjects: kerneloops, read-mostly, and port 80 | LWN.net
The .data..read_mostly section is placed inside RW_DATA_SECTION, which follows RO_DATA_SECTION.

include/asm-generic/vmlinux.lds.h

/*
 * Helper macros to support writing architecture specific
 * linker scripts.
 *
 * A minimal linker scripts has following content:
 * [This is a sample, architectures may have special requiriements]
 *
 * OUTPUT_FORMAT(...)
 * OUTPUT_ARCH(...)
 * ENTRY(...)
 * SECTIONS
 * {
 *      . = START;
 *      __init_begin = .;
 *      HEAD_TEXT_SECTION
 *      INIT_TEXT_SECTION(PAGE_SIZE)
 *      INIT_DATA_SECTION(...)
 *      PERCPU_SECTION(CACHELINE_SIZE)
 *      __init_end = .;
 *
 *      _stext = .;
 *      TEXT_SECTION = 0
 *      _etext = .;
 *
 *      _sdata = .;
 *      RO_DATA_SECTION(PAGE_SIZE)
 *      RW_DATA_SECTION(...)
 *      _edata = .;
 *
 *      EXCEPTION_TABLE(...)
 *      NOTES
 *
 *      BSS_SECTION(0, 0, 0)
 *      _end = .;
 *
 *      STABS_DEBUG
 *      DWARF_DEBUG
 *
 *      DISCARDS                // must be the last
 * }
 *
 * [__init_begin, __init_end] is the init section that may be freed after init
 *      // __init_begin and __init_end should be page aligned, so that we can
 *      // free the whole .init memory
 * [_stext, _etext] is the text section
 * [_sdata, _edata] is the data section
 *
 * Some of the included output section have their own set of constants.
 * Examples are: [__initramfs_start, __initramfs_end] for initramfs and
 *               [__nosave_begin, __nosave_end] for the nosave data
 */

/*
 * Writeable data.
 * All sections are combined in a single .data section.
 * The sections following CONSTRUCTORS are arranged so their
 * typical alignment matches.
 * A cacheline is typical/always less than a PAGE_SIZE so
 * the sections that has this restriction (or similar)
 * is located before the ones requiring PAGE_SIZE alignment.
 * NOSAVE_DATA starts and ends with a PAGE_SIZE alignment which
 * matches the requirement of PAGE_ALIGNED_DATA.
 *
 * use 0 as page_align if page_aligned data is not used */
#define RW_DATA_SECTION(cacheline, pagealigned, inittask)               \
        . = ALIGN(PAGE_SIZE);                                           \
        .data : AT(ADDR(.data) - LOAD_OFFSET) {                         \
                INIT_TASK_DATA(inittask)                                \
                NOSAVE_DATA                                             \
                PAGE_ALIGNED_DATA(pagealigned)                          \
                CACHELINE_ALIGNED_DATA(cacheline)                       \
                READ_MOSTLY_DATA(cacheline)                             \
                DATA_DATA                                               \
                CONSTRUCTORS                                            \
        }

#define READ_MOSTLY_DATA(align)                                         \
        . = ALIGN(align);                                               \
        *(.data..read_mostly)                                           \
        . = ALIGN(align);
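The same attribute can be tried in a userspace program. In this sketch the macro my_read_mostly and the variable debug_level are hypothetical; an ELF target is assumed. In the kernel, READ_MOSTLY_DATA() in the linker script gathers these variables; a normal userspace link just places the section as an extra data section.

```c
/* Hypothetical userspace imitation of the kernel's __read_mostly. */
#define my_read_mostly __attribute__((__section__(".data..read_mostly")))

/* Written rarely (e.g. once at start-up), read often. */
static int debug_level my_read_mostly = 1;

static int get_debug_level(void)
{
    return debug_level;
}
```

Running `objdump -t` on the resulting object should list debug_level in the .data..read_mostly section.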

__used

__attribute__((used))
Tells the compiler to treat the object or function as used, so it is not removed even when nothing references it.
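A short example: without the attribute, an unreferenced static function is normally discarded (and GCC warns with -Wunused-function); with it, the function stays in the object file. kept_helper() is a hypothetical name.

```c
/* Nothing in this file has to call kept_helper(); __used (the kernel's name
 * for this attribute) makes the compiler emit it anyway, e.g. when it is
 * referenced only from inline assembly. */
__attribute__((__used__)) static int kept_helper(void)
{
    return 42;
}
```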

__visible

__attribute__((externally_visible))
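With LTO (Link Time Optimization), the toolchain optimizes across the caller/callee relationship at link time; for example, when it decides a callee is used only once, it inlines the callee into the caller.
The externally_visible attribute keeps the function or object as one complete, externally visible symbol even when linking with LTO, so it is not inlined away or eliminated (useful, for example, for functions referenced only from assembly).
LTO is enabled with -flto; -fwhole-program additionally treats the unit being compiled as the whole program.
Reference: Enable link-time optimization (after switching to avr-gcc 4.5 or greater)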

asmlinkage

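A macro for C functions that are called from assembly code. On x86-32 it adds regparm(0), so arguments are passed on the stack instead of in registers; in the generic definition (used by ARM) it only expands to extern "C" when compiled as C++, and to nothing in C.
Reference: [Linux] asmlinkage – F/OSS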

References

Declaring Attributes of Functions | gcc.gnu.org
Options That Control Optimization | gcc.gnu.org

lockdep_init()

2015-11-30 (updated 2016-02-11), 문영일

lockdep_init()

lockdep
- Short for lock dependency: the kernel facility that monitors and debugs locks, including deadlock detection.

void lockdep_init(void)
{
        int i;

        /*   
         * Some architectures have their own start_kernel()
         * code which calls lockdep_init(), while we also
         * call lockdep_init() from the start_kernel() itself,
         * and we want to initialize the hashes only once:
         */

        if (lockdep_initialized)
                return;

        for (i = 0; i < CLASSHASH_SIZE; i++) 
                INIT_LIST_HEAD(classhash_table + i);

        for (i = 0; i < CHAINHASH_SIZE; i++) 
                INIT_LIST_HEAD(chainhash_table + i);

        lockdep_initialized = 1;
}

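if (lockdep_initialized)
- lockdep_initialized indicates that lockdep_init() has already run. If it is true (1), the initialization code is skipped.
INIT_LIST_HEAD(classhash_table + i);
- Initializes every bucket of the hash table used to look up lockdep classes: CLASSHASH_SIZE (4096) list heads in classhash_table.
- 4096 is the number of hash buckets, not the number of classes; the maximum number of lock classes is reported separately (for example [max: 8191] for lock-classes in /proc/lockdep_stat).
INIT_LIST_HEAD(chainhash_table + i);
- Initializes the CHAINHASH_SIZE (32768) list heads of chainhash_table, the hash table that caches lock dependency chains.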

classhash_table & chainhash_table

/*
 * The lockdep classes are in a hash-table as well, for fast lookup:
 */
static struct list_head classhash_table[CLASSHASH_SIZE];

/*
 * We put the lock dependency chains into a hash-table as well, to cache
 * their existence:
 */
static struct list_head chainhash_table[CHAINHASH_SIZE];

Structure of list_head

struct list_head {
    struct list_head *next, *prev;
};

INIT_LIST_HEAD()

static inline void INIT_LIST_HEAD(struct list_head *list)
{
        list->next = list;
        list->prev = list;
}
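The hash-table setup of lockdep_init() can be reproduced in userspace with these list_head helpers. This is a sketch: list_empty() and list_add() follow the semantics of the kernel's list.h but are reimplemented here, classhash_initialized and classhash_bucket() are hypothetical, and the modulo bucket choice stands in for the kernel's hash of the class key.

```c
#include <stdbool.h>

struct list_head {
    struct list_head *next, *prev;
};

/* An empty list points at itself in both directions. */
static inline void INIT_LIST_HEAD(struct list_head *list)
{
    list->next = list;
    list->prev = list;
}

static inline bool list_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Insert `new` right after `head`. */
static inline void list_add(struct list_head *new, struct list_head *head)
{
    new->next = head->next;
    new->prev = head;
    head->next->prev = new;
    head->next = new;
}

#define CLASSHASH_SIZE 4096   /* 1 << 12 buckets */

static struct list_head classhash_table[CLASSHASH_SIZE];
static int classhash_initialized;   /* hypothetical, mirrors lockdep_initialized */

/* Initialize every bucket exactly once, like lockdep_init(). */
static void classhash_init(void)
{
    if (classhash_initialized)
        return;
    for (int i = 0; i < CLASSHASH_SIZE; i++)
        INIT_LIST_HEAD(classhash_table + i);
    classhash_initialized = 1;
}

/* Hypothetical bucket selection; the kernel hashes the class key instead. */
static struct list_head *classhash_bucket(unsigned long key)
{
    return classhash_table + (key % CLASSHASH_SIZE);
}
```

Calling classhash_init() a second time leaves already inserted entries untouched, which is what the early-return guard in lockdep_init() is for.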

References

Lockdep | 문c

kernel/head.S – v7_flush_dcache_louis:

2015-11-24 (updated 2016-02-11), 문영일

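Nearly identical to the __armv7_mmu_cache_flush: routine used in compressed/head.S.
When asked to flush the data cache, it reads the cache hierarchy information (CLIDR) to find the last cache level it must handle, then flushes from L1 up to that level.
Within each cache level, lines are cleaned and invalidated one by one using the index (set) and way.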

hierarchical cache

Up to ARMv6, the ARM architecture only described an L1 cache.
Starting with ARMv7, the architecture supports multiple cache levels.
ARM calls this multi-level cache a hierarchical cache structure.

v7_flush_dcache_louis()

arch/arm/mm/cache-v7.S

 /*
 *     v7_flush_dcache_louis()
 *
 *     Flush the D-cache up to the Level of Unification Inner Shareable
 *
 *     Corrupted registers: r0-r7, r9-r11 (r6 only in Thumb mode)
 */

ENTRY(v7_flush_dcache_louis)
        dmb                                     @ ensure ordering with previous memory accesses
        mrc     p15, 1, r0, c0, c0, 1           @ read clidr, r0 = clidr
        ALT_SMP(ands    r3, r0, #(7 << 21))     @ extract LoUIS from clidr
        ALT_UP(ands     r3, r0, #(7 << 27))     @ extract LoUU from clidr
#ifdef CONFIG_ARM_ERRATA_643719
        ALT_SMP(mrceq   p15, 0, r2, c0, c0, 0)  @ read main ID register
        ALT_UP(reteq    lr)                     @ LoUU is zero, so nothing to do
        ldreq   r1, =0x410fc090                 @ ID of ARM Cortex A9 r0p?
        biceq   r2, r2, #0x0000000f             @ clear minor revision number
        teqeq   r2, r1                          @ test for errata affected core and if so...
        orreqs  r3, #(1 << 21)                  @   fix LoUIS value (and set flags state to 'ne')
#endif
        ALT_SMP(mov     r3, r3, lsr #20)        @ r3 = LoUIS * 2
        ALT_UP(mov      r3, r3, lsr #26)        @ r3 = LoUU * 2
        reteq   lr                              @ return if level == 0
        mov     r10, #0                         @ r10 (starting level) = 0
        b       flush_levels                    @ start flushing cache levels
ENDPROC(v7_flush_dcache_louis)

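mrc p15, 1, r0, c0, c0, 1
- Reads CLIDR in order to extract LoUU/LoUIS.
ALT_SMP(ands r3, r0, #(7 << 21))
- On SMP systems, extracts the LoUIS field (bits [23:21]) of CLIDR. On UP systems, the ALT_UP variant extracts LoUU (bits [29:27]) instead.
ERRATA_643719
- Fixes up processors that report CLIDR.LoUIS incorrectly.
- On Cortex-A9 revisions before r1p0, LoUIS reads as 0 instead of 1; this code corrects it.
ALT_SMP(mov r3, r3, lsr #20)
- r3: shifts the extracted value right so that it equals LoUIS x 2.
  - This decides up to which cache level the D-cache is flushed. The level is kept doubled because the level field of CSSELR and of the set/way operand starts at bit 1.
reteq lr
- If LoUIS is 0, there is nothing to flush, so the routine returns.
mov r10, #0
- Starts at cache level 0 (L1).
b flush_levels
- Reuses the flush_levels label in the middle of v7_flush_dcache_all().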
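The CLIDR field extraction above can be written in C as follows. Bit positions are the ones the assembly uses (LoUIS [23:21], LoC [26:24], LoUU [29:27], a 3-bit Ctype per level). The sample value 0x09200003 in the test is made up (LoUIS = LoC = LoUU = 1, separate I/D caches at L1), and the function names are hypothetical.

```c
#include <stdint.h>

static unsigned int clidr_louis(uint32_t clidr) { return (clidr >> 21) & 7; }
static unsigned int clidr_loc(uint32_t clidr)   { return (clidr >> 24) & 7; }
static unsigned int clidr_louu(uint32_t clidr)  { return (clidr >> 27) & 7; }

/* Ctype of a 0-based level: 0 none, 1 I only, 2 D only, 3 separate I+D,
 * 4 unified. The assembly computes the shift 3 x level as r10 + r10/2,
 * because r10 holds level x 2. */
static unsigned int clidr_ctype(uint32_t clidr, unsigned int level)
{
    return (clidr >> (3 * level)) & 7;
}

/* The value kept in r3: (clidr & (7 << 21)) >> 20 == LoUIS x 2. */
static unsigned int louis_times_two(uint32_t clidr)
{
    return (clidr & (7u << 21)) >> 20;
}

/* Levels the flush loop actually cleans: ctype >= 2 (has a D or unified
 * cache); "blt skip" skips levels with no cache or an I-cache only. */
static unsigned int data_levels_below(uint32_t clidr, unsigned int limit)
{
    unsigned int n = 0;

    for (unsigned int level = 0; level < limit; level++)
        if (clidr_ctype(clidr, level) >= 2)
            n++;
    return n;
}
```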

v7_flush_dcache_all()

/*
 *      v7_flush_dcache_all()
 *
 *      Flush the whole D-cache.
 *
 *      Corrupted registers: r0-r7, r9-r11 (r6 only in Thumb mode)
 *
 *      - mm    - mm_struct describing address space
 */
ENTRY(v7_flush_dcache_all)
        dmb                                     @ ensure ordering with previous memory accesses
        mrc     p15, 1, r0, c0, c0, 1           @ read clidr
        ands    r3, r0, #0x7000000              @ extract loc from clidr
        mov     r3, r3, lsr #23                 @ left align loc bit field
        beq     finished                        @ if loc is 0, then no need to clean
        mov     r10, #0                         @ start clean at cache level 0

flush_levels:
        add     r2, r10, r10, lsr #1            @ work out 3x current cache level
        mov     r1, r0, lsr r2                  @ extract cache type bits from clidr
        and     r1, r1, #7                      @ mask of the bits for current cache only
        cmp     r1, #2                          @ see what cache we have at this level
        blt     skip                            @ skip if no cache, or just i-cache
#ifdef CONFIG_PREEMPT
        save_and_disable_irqs_notrace r9        @ make cssr&csidr read atomic
#endif
        mcr     p15, 2, r10, c0, c0, 0          @ select current cache level in cssr
        isb                                     @ isb to sych the new cssr&csidr
        mrc     p15, 1, r1, c0, c0, 0           @ read the new csidr
#ifdef CONFIG_PREEMPT
        restore_irqs_notrace r9
#endif
        and     r2, r1, #7                      @ extract the length of the cache lines
        add     r2, r2, #4                      @ add 4 (line length offset)
        ldr     r4, =0x3ff
        ands    r4, r4, r1, lsr #3              @ find maximum number on the way size
        clz     r5, r4                          @ find bit position of way size increment
        ldr     r7, =0x7fff
        ands    r7, r7, r1, lsr #13             @ extract max number of the index size
loop1:
        mov     r9, r7                          @ create working copy of max index
loop2:
 ARM(   orr     r11, r10, r4, lsl r5    )       @ factor way and cache number into r11
 THUMB( lsl     r6, r4, r5              )
 THUMB( orr     r11, r10, r6            )       @ factor way and cache number into r11
 ARM(   orr     r11, r11, r9, lsl r2    )       @ factor index number into r11
 THUMB( lsl     r6, r9, r2              )
 THUMB( orr     r11, r11, r6            )       @ factor index number into r11
        mcr     p15, 0, r11, c7, c14, 2         @ clean & invalidate by set/way
        subs    r9, r9, #1                      @ decrement the index
        bge     loop2
        subs    r4, r4, #1                      @ decrement the way
        bge     loop1
skip:
        add     r10, r10, #2                    @ increment cache number
        cmp     r3, r10
        bgt     flush_levels
finished:
        mov     r10, #0                         @ swith back to cache level 0
        mcr     p15, 2, r10, c0, c0, 0          @ select current cache level in cssr
        dsb     st
        isb
        ret     lr
ENDPROC(v7_flush_dcache_all)

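The logic is almost the same as the D-cache flush in compressed/head.S.
- Only the order of the way and index loops differs: here the way loop (loop1) is outer and the index loop (loop2) is inner.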
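The set/way loop can be followed in C. The sketch decodes CCSIDR the way the assembly does (line-size field + 4, associativity field, clz of the maximum way number, number-of-sets field) and builds the operand for the clean & invalidate by set/way instruction. The CCSIDR value 0x1FE019 in the test is a made-up example describing a 32KB, 4-way cache with 32-byte lines, and all function names are hypothetical.

```c
#include <stdint.h>

struct ccsidr_info {
    unsigned int line_shift;  /* r2: log2(line size in bytes) */
    unsigned int max_way;     /* r4: associativity - 1 */
    unsigned int way_shift;   /* r5: clz(max_way) */
    unsigned int max_set;     /* r7: number of sets - 1 */
};

/* Count leading zeros; 32 for 0, like the ARM CLZ instruction. */
static unsigned int clz32(uint32_t x)
{
    unsigned int n = 0;

    if (x == 0)
        return 32;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

static struct ccsidr_info decode_ccsidr(uint32_t ccsidr)
{
    struct ccsidr_info ci;

    ci.line_shift = (ccsidr & 7) + 4;        /* and r2, r1, #7; add r2, r2, #4 */
    ci.max_way = (ccsidr >> 3) & 0x3ff;      /* ands r4, r4, r1, lsr #3 */
    ci.way_shift = clz32(ci.max_way);        /* clz r5, r4 */
    ci.max_set = (ccsidr >> 13) & 0x7fff;    /* ands r7, r7, r1, lsr #13 */
    return ci;
}

/* level2x is the cache level already multiplied by two (as in r10). */
static uint32_t setway_operand(unsigned int level2x, unsigned int way,
                               unsigned int set, const struct ccsidr_info *ci)
{
    uint32_t op = level2x;

    if (ci->way_shift < 32)                  /* direct-mapped: way is always 0 */
        op |= (uint32_t)way << ci->way_shift;
    op |= (uint32_t)set << ci->line_shift;
    return op;
}

/* Number of set/way operations for one level: ways x sets. */
static unsigned long ops_per_level(const struct ccsidr_info *ci)
{
    return (unsigned long)(ci->max_way + 1) * (ci->max_set + 1);
}
```

For that example cache, the loops issue 4 x 256 = 1024 operations for the level, starting from way 3, set 255 (operand 0xC0001FE0) and counting both down to 0.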