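Before we get to scheduling-domain initialization, we need to cover some basics of how multiple CPUs are brought up. Since the goal is to focus on scheduling domains, only the parts relevant to scheduling are covered here.
A CPU can be in four states: possible, present, online, and active. They are recorded in the variables __cpu_possible_mask, __cpu_present_mask, __cpu_online_mask, and __cpu_active_mask respectively. The following four functions set these states.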

static inline void
set_cpu_possible(unsigned int cpu, bool possible)
{
	if (possible)
		cpumask_set_cpu(cpu, &__cpu_possible_mask);
	else
		cpumask_clear_cpu(cpu, &__cpu_possible_mask);
}

static inline void
set_cpu_present(unsigned int cpu, bool present)
{
	if (present)
		cpumask_set_cpu(cpu, &__cpu_present_mask);
	else
		cpumask_clear_cpu(cpu, &__cpu_present_mask);
}

static inline void
set_cpu_online(unsigned int cpu, bool online)
{
	if (online)
		cpumask_set_cpu(cpu, &__cpu_online_mask);
	else
		cpumask_clear_cpu(cpu, &__cpu_online_mask);
}

static inline void
set_cpu_active(unsigned int cpu, bool active)
{
	if (active)
		cpumask_set_cpu(cpu, &__cpu_active_mask);
	else
		cpumask_clear_cpu(cpu, &__cpu_active_mask);
}
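
Each of these masks is essentially a bitmap with one bit per CPU number. The following is a minimal user-space sketch of what the setters do. The type toy_cpumask, the limit TOY_NR_CPUS, and every toy_ function are hypothetical names made up for illustration; the kernel's real struct cpumask and cpumask_set_cpu() use atomic bit operations and are not reproduced here.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define TOY_NR_CPUS 64
#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define TOY_LONGS ((TOY_NR_CPUS + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)

struct toy_cpumask {
	unsigned long bits[TOY_LONGS];
};

/* Set the bit that belongs to CPU number `cpu`. */
static void toy_cpumask_set_cpu(unsigned int cpu, struct toy_cpumask *m)
{
	m->bits[cpu / TOY_BITS_PER_LONG] |= 1UL << (cpu % TOY_BITS_PER_LONG);
}

/* Clear the bit that belongs to CPU number `cpu`. */
static void toy_cpumask_clear_cpu(unsigned int cpu, struct toy_cpumask *m)
{
	m->bits[cpu / TOY_BITS_PER_LONG] &= ~(1UL << (cpu % TOY_BITS_PER_LONG));
}

/* Report whether the bit for `cpu` is currently set. */
static bool toy_cpumask_test_cpu(unsigned int cpu, const struct toy_cpumask *m)
{
	return (m->bits[cpu / TOY_BITS_PER_LONG] >> (cpu % TOY_BITS_PER_LONG)) & 1UL;
}

/* Same shape as set_cpu_possible() and friends: one bool selects set or clear. */
static void toy_set_cpu(struct toy_cpumask *m, unsigned int cpu, bool on)
{
	if (on)
		toy_cpumask_set_cpu(cpu, m);
	else
		toy_cpumask_clear_cpu(cpu, m);
}
```

With four such bitmaps, one per state, asking whether CPU n is online or active reduces to testing bit n of the corresponding mask.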

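The possible state

possible marks a CPU that may exist. The first CPU to start is called the boot CPU. It must exist, otherwise the system could not boot, so the boot CPU sets its own possible state directly during initialization. For the boot CPU, the present, online, and active states are likewise set directly.
start_kernel->boot_cpu_init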

void __init boot_cpu_init(void)
{
	int cpu = smp_processor_id();

	/* Mark the boot cpu "present", "online" etc for SMP and UP case */
	set_cpu_online(cpu, true);
	set_cpu_active(cpu, true);
	set_cpu_present(cpu, true);
	set_cpu_possible(cpu, true);
}

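The other CPUs, however, must be described in the DTS; otherwise the boot CPU has no way of knowing they exist. Below is the CPU topology description excerpted from arch/arm64/boot/dts/arm/juno.dts in the kernel source: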

	cpus {
		#address-cells = <2>;
		#size-cells = <0>;

		cpu-map {
			cluster0 {
				core0 {
					cpu = <&A57_0>;
				};
				core1 {
					cpu = <&A57_1>;
				};
			};

			cluster1 {
				core0 {
					cpu = <&A53_0>;
				};
				core1 {
					cpu = <&A53_1>;
				};
				core2 {
					cpu = <&A53_2>;
				};
				core3 {
					cpu = <&A53_3>;
				};
			};
		};

		idle-states {
			entry-method = "arm,psci";

			CPU_SLEEP_0: cpu-sleep-0 {
				compatible = "arm,idle-state";
				arm,psci-suspend-param = <0x0010000>;
				local-timer-stop;
				entry-latency-us = <300>;
				exit-latency-us = <1200>;
				min-residency-us = <2000>;
			};

			CLUSTER_SLEEP_0: cluster-sleep-0 {
				compatible = "arm,idle-state";
				arm,psci-suspend-param = <0x1010000>;
				local-timer-stop;
				entry-latency-us = <400>;
				exit-latency-us = <1200>;
				min-residency-us = <2500>;
			};
		};

		A57_0: cpu@0 {
			compatible = "arm,cortex-a57","arm,armv8";
			reg = <0x0 0x0>;
			device_type = "cpu";
			enable-method = "psci";
			next-level-cache = <&A57_L2>;
			clocks = <&scpi_dvfs 0>;
			cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
		};

		A57_1: cpu@1 {
			compatible = "arm,cortex-a57","arm,armv8";
			reg = <0x0 0x1>;
			device_type = "cpu";
			enable-method = "psci";
			next-level-cache = <&A57_L2>;
			clocks = <&scpi_dvfs 0>;
			cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
		};

		A53_0: cpu@100 {
			compatible = "arm,cortex-a53","arm,armv8";
			reg = <0x0 0x100>;
			device_type = "cpu";
			enable-method = "psci";
			next-level-cache = <&A53_L2>;
			clocks = <&scpi_dvfs 1>;
			cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
		};

		A53_1: cpu@101 {
			compatible = "arm,cortex-a53","arm,armv8";
			reg = <0x0 0x101>;
			device_type = "cpu";
			enable-method = "psci";
			next-level-cache = <&A53_L2>;
			clocks = <&scpi_dvfs 1>;
			cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
		};

		A53_2: cpu@102 {
			compatible = "arm,cortex-a53","arm,armv8";
			reg = <0x0 0x102>;
			device_type = "cpu";
			enable-method = "psci";
			next-level-cache = <&A53_L2>;
			clocks = <&scpi_dvfs 1>;
			cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
		};

		A53_3: cpu@103 {
			compatible = "arm,cortex-a53","arm,armv8";
			reg = <0x0 0x103>;
			device_type = "cpu";
			enable-method = "psci";
			next-level-cache = <&A53_L2>;
			clocks = <&scpi_dvfs 1>;
			cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
		};

		A57_L2: l2-cache0 {
			compatible = "cache";
		};

		A53_L2: l2-cache1 {
			compatible = "cache";
		};
	};

This SoC has two clusters: Cluster0 contains two A57 cores and Cluster1 contains four A53 cores. No NUMA nodes or hyper-threading are defined here, so we will not discuss them further.
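
The reg property of each CPU node carries that CPU's hardware ID, which on arm64 follows the MPIDR affinity layout (Aff0 in bits [7:0], Aff1 in bits [15:8], Aff2 in bits [23:16], Aff3 in bits [39:32]). Comparing the reg values with the cpu-map in the excerpt, Aff1 appears to select the cluster and Aff0 the core within it. The sketch below decodes a reg pair under that reading; the toy_ names are hypothetical helpers, not kernel functions.

```c
#include <stdint.h>

/* Affinity fields of a 64-bit MPIDR-style hardware ID. */
struct toy_affinity {
	unsigned int aff0; /* bits [7:0]   */
	unsigned int aff1; /* bits [15:8]  */
	unsigned int aff2; /* bits [23:16] */
	unsigned int aff3; /* bits [39:32] */
};

/* Combine the two DTS reg cells (#address-cells = <2>) into one 64-bit ID. */
static uint64_t toy_hwid_from_reg(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Split a hardware ID into its four affinity levels. */
static struct toy_affinity toy_decode_hwid(uint64_t hwid)
{
	struct toy_affinity a;

	a.aff0 = (unsigned int)(hwid & 0xff);
	a.aff1 = (unsigned int)((hwid >> 8) & 0xff);
	a.aff2 = (unsigned int)((hwid >> 16) & 0xff);
	a.aff3 = (unsigned int)((hwid >> 32) & 0xff);
	return a;
}
```

For example, reg = <0x0 0x102> decodes to Aff1 = 1 and Aff0 = 2, which matches A53_2 being core2 of cluster1 in the cpu-map.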
First, here is the code that sets the possible state of the other CPUs:
setup_arch->smp_init_cpus

void __init smp_init_cpus(void)
{
	int i;

	if (acpi_disabled)
		of_parse_and_init_cpus();
	else
		/*
		 * do a walk of MADT to determine how many CPUs
		 * we have including disabled CPUs, and get information
		 * we need for SMP init
		 */
		acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_INTERRUPT,
				      acpi_parse_gic_cpu_interface, 0);

	if (cpu_count > nr_cpu_ids)
		pr_warn("Number of cores (%d) exceeds configured maximum of %d - clipping\n",
			cpu_count, nr_cpu_ids);

	if (!bootcpu_valid) {
		pr_err("missing boot CPU MPIDR, not enabling secondaries\n");
		return;
	}

	/*
	 * We need to set the cpu_logical_map entries before enabling
	 * the cpus so that cpu processor description entries (DT cpu nodes
	 * and ACPI MADT entries) can be retrieved by matching the cpu hwid
	 * with entries in cpu_logical_map while initializing the cpus.
	 * If the cpu set-up fails, invalidate the cpu_logical_map entry.
	 */
	for (i = 1; i < nr_cpu_ids; i++) {
		if (cpu_logical_map(i) != INVALID_HWID) {
			if (smp_cpu_setup(i))
				cpu_logical_map(i) = INVALID_HWID;
		}
	}
}
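  • of_parse_and_init_cpus parses the DTS to obtain the CPU topology. It stores the hardware ID of each CPU it finds in __cpu_logical_map, an array indexed by CPU number that records hardware IDs. The hardware ID comes from the reg property of each CPU node in the DTS.
  • smp_cpu_setup sets the possible state. The loop walks every supported CPU number; if the hardware ID is not invalid, meaning it was successfully parsed from the DTS in the previous step, smp_cpu_setup is called to mark that CPU possible.
    So the CPUs that correspond to CPU nodes defined in the DTS are marked possible, meaning they may exist. Whether they really exist is something they have to speak up for themselves; the boot CPU's word is not enough.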
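
The two steps above can be modeled in user space roughly as follows. This is only a sketch under simplifying assumptions: the hardware IDs are passed in as a plain array instead of being parsed from a device tree, the boot CPU is assumed to take logical number 0, and all toy_ names and TOY_ constants are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_CPUS 8
#define TOY_INVALID_HWID UINT64_MAX

/* Indexed by logical CPU number, holds the hardware ID (toy __cpu_logical_map). */
static uint64_t toy_logical_map[TOY_NR_CPUS];
static bool toy_possible[TOY_NR_CPUS];

/*
 * Step 1: record the hardware IDs found in the (simulated) DTS.
 * The boot CPU is assumed to get logical number 0; the others are
 * numbered in the order they are found.
 */
static void toy_parse_cpus(const uint64_t *hwids, size_t n, uint64_t boot_hwid)
{
	unsigned int next = 1;
	size_t i;

	for (i = 0; i < TOY_NR_CPUS; i++) {
		toy_logical_map[i] = TOY_INVALID_HWID;
		toy_possible[i] = false;
	}

	for (i = 0; i < n; i++) {
		if (hwids[i] == boot_hwid)
			toy_logical_map[0] = hwids[i];
		else if (next < TOY_NR_CPUS)
			toy_logical_map[next++] = hwids[i];
	}
}

/*
 * Step 2: every secondary CPU with a valid hardware ID becomes possible.
 * The loop starts at 1 because the boot CPU was handled earlier.
 */
static void toy_mark_possible(void)
{
	unsigned int cpu;

	for (cpu = 1; cpu < TOY_NR_CPUS; cpu++) {
		if (toy_logical_map[cpu] != TOY_INVALID_HWID)
			toy_possible[cpu] = true;
	}
}
```

Feeding it the six reg values from the Juno excerpt (0x0, 0x1, 0x100, 0x101, 0x102, 0x103) with 0x0 as the boot CPU leaves logical CPUs 1 to 5 possible and slots 6 and 7 invalid.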

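The present state

possible only says that a CPU may exist, because the DTS defines a node for it; at this point the CPU is not yet in place. As described in an earlier article (https://mp.csdn.net/mdeditor/84845448#), the boot CPU enters the kernel first while the other CPUs are still spinning idly in U-Boot. The boot CPU gives the other CPUs an address and has them jump into the kernel; only then do they count as present.
kernel_init->kernel_init_freeable->smp_prepare_cpus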

void __init smp_prepare_cpus(unsigned int max_cpus)
{
	int err;
	unsigned int cpu;
	unsigned int this_cpu;

	init_cpu_topology();

	this_cpu = smp_processor_id();
	store_cpu_topology(this_cpu);
	numa_store_cpu_info(this_cpu);

	/*
	 * If UP is mandated by "nosmp" (which implies "maxcpus=0"), don't set
	 * secondary CPUs present.
	 */
	if (max_cpus == 0)
		return;

	/*
	 * Initialise the present map (which describes the set of CPUs
	 * actually populated at the present time) and release the
	 * secondaries from the bootloader.
	 */
	for_each_possible_cpu(cpu) {

		if (cpu == smp_processor_id())
			continue;

		if (!cpu_ops[cpu])
			continue;

		err = cpu_ops[cpu]->cpu_prepare(cpu);
		if (err)
			continue;

		set_cpu_present(cpu, true);
		numa_store_cpu_info(cpu);
	}
}
  • cpu_ops[cpu]->cpu_prepare has the other CPU jump into the kernel. After the jump the CPU is still spinning in a loop and not yet doing anything useful.
  • set_cpu_present marks the CPU as present.
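
The control flow of that loop (skip the boot CPU, skip CPUs without an operations table, skip CPUs whose prepare step fails, mark the rest present) can be sketched in user space as below. The struct toy_cpu_ops is a hypothetical, trimmed-down analogue of the kernel's per-CPU operations table, and the two prepare callbacks are stubs invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 4

/* Hypothetical, trimmed-down analogue of a per-CPU operations table. */
struct toy_cpu_ops {
	const char *name;
	int (*cpu_prepare)(unsigned int cpu); /* returns 0 on success */
};

static bool toy_possible[TOY_NR_CPUS];
static bool toy_present[TOY_NR_CPUS];
static const struct toy_cpu_ops *toy_ops[TOY_NR_CPUS];

/* Stub callbacks: one always succeeds, one always fails. */
static int toy_prepare_ok(unsigned int cpu)
{
	(void)cpu;
	return 0;
}

static int toy_prepare_fail(unsigned int cpu)
{
	(void)cpu;
	return -1;
}

static const struct toy_cpu_ops toy_good_ops = { "good", toy_prepare_ok };
static const struct toy_cpu_ops toy_bad_ops = { "bad", toy_prepare_fail };

/* Mark every possible secondary CPU whose prepare step succeeds as present. */
static void toy_prepare_cpus(unsigned int boot_cpu)
{
	unsigned int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (!toy_possible[cpu] || cpu == boot_cpu)
			continue;
		if (!toy_ops[cpu])
			continue;
		if (toy_ops[cpu]->cpu_prepare(cpu))
			continue;
		toy_present[cpu] = true;
	}
}
```

A CPU that is possible but has no operations table, or whose prepare step returns an error, simply never becomes present; nothing is rolled back because nothing had been set for it yet.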

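The online state

present means a CPU has jumped into the kernel but is still spinning in a meaningless loop. Next, the boot CPU has the other CPUs start running their required initialization. At that point they enter the online state: they are up and ready to do work.
The boot CPU starts the other CPUs along this path: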

smp_init->cpu_up->do_cpu_up(cpu, CPUHP_ONLINE)->_cpu_up

bringup_cpu is eventually called to start the other CPU.
The other CPUs come up along this path:

secondary_startup->__secondary_switched->secondary_start_kernel->set_cpu_online

As each CPU comes up, it sets its own online state.
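
On a running Linux system the possible, present, and online masks can be inspected from user space through /sys/devices/system/cpu/possible, /sys/devices/system/cpu/present, and /sys/devices/system/cpu/online, which print ranges such as 0-5. The sketch below counts the CPUs named in such a string. The function names are hypothetical, and the parser is deliberately small rather than a full reimplementation of the kernel's list format.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Count the CPUs named in a cpulist string such as "0-3,5".
 * Returns -1 on malformed input.
 */
static int toy_cpulist_count(const char *s)
{
	int count = 0;

	while (*s && *s != '\n') {
		char *end;
		long lo, hi;

		lo = strtol(s, &end, 10);
		if (end == s || lo < 0)
			return -1;
		hi = lo;
		s = end;

		/* An optional "-N" turns the single number into a range. */
		if (*s == '-') {
			s++;
			hi = strtol(s, &end, 10);
			if (end == s || hi < lo)
				return -1;
			s = end;
		}
		count += (int)(hi - lo + 1);

		if (*s == ',')
			s++;
		else if (*s && *s != '\n')
			return -1;
	}
	return count;
}

/* Read the first line of a sysfs mask file and count its CPUs; -1 on error. */
static int toy_count_cpus_in(const char *path)
{
	char buf[256];
	FILE *f = fopen(path, "r");
	int n = -1;

	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		n = toy_cpulist_count(buf);
	fclose(f);
	return n;
}
```

Calling toy_count_cpus_in("/sys/devices/system/cpu/online") and comparing the result with the value for possible shows how many of the CPUs described to the kernel have actually come online.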

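The active state

When a CPU is active, it has finished all preparation and can take part in process scheduling.

The following path creates the per-CPU kernel threads. The main handler of each thread is cpuhp_thread_fun; cpuhp_should_run only tells the thread whether it has work pending: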

smp_init->cpuhp_threads_init

Once another CPU has finished starting up, it wakes this thread along the following path:

secondary_startup->__secondary_switched->secondary_start_kernel->cpu_startup_entry->cpuhp_online_idle->__cpuhp_kick_ap_work

The woken thread then proceeds as follows:

cpuhp_thread_fun->cpuhp_ap_online->cpuhp_up_callbacks

cpuhp_up_callbacks eventually calls sched_cpu_activate, which marks the CPU as active.
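
The idea behind cpuhp_up_callbacks is to walk an ordered list of hotplug steps and run each step's startup callback until the target state is reached, undoing the steps already taken if one fails. The sketch below is a simplified user-space model of that idea, not the kernel's code: the step table, its ordering, and all toy_ names are hypothetical, with toy_sched_activate standing in for the role sched_cpu_activate plays.

```c
#include <stddef.h>

/* One hotplug step: a startup callback and an optional teardown callback. */
struct toy_step {
	const char *name;
	int (*startup)(unsigned int cpu);   /* returns 0 on success */
	void (*teardown)(unsigned int cpu);
};

static int toy_active[4]; /* toy "active" flag per CPU */

static int toy_ok(unsigned int cpu)
{
	(void)cpu;
	return 0;
}

static int toy_fail(unsigned int cpu)
{
	(void)cpu;
	return -1;
}

static int toy_sched_activate(unsigned int cpu)
{
	toy_active[cpu] = 1;
	return 0;
}

static void toy_sched_deactivate(unsigned int cpu)
{
	toy_active[cpu] = 0;
}

/* Index 0 is a placeholder for "nothing has run yet". */
static const struct toy_step toy_steps[] = {
	{ "offline", NULL, NULL },
	{ "example-driver-online", toy_ok, NULL },
	{ "sched-activate", toy_sched_activate, toy_sched_deactivate },
};

/*
 * Run the startup callbacks for steps *state + 1 .. target in order.
 * On failure, tear down the steps that already ran, in reverse order,
 * and return the error; *state then ends where it started.
 */
static int toy_up_callbacks(const struct toy_step *steps, unsigned int cpu,
			    int *state, int target)
{
	int prev = *state;

	while (*state < target) {
		int ret;

		(*state)++;
		ret = steps[*state].startup ? steps[*state].startup(cpu) : 0;
		if (ret) {
			/* The failed step did not complete; undo the earlier ones. */
			for ((*state)--; *state > prev; (*state)--) {
				if (steps[*state].teardown)
					steps[*state].teardown(cpu);
			}
			return ret;
		}
	}
	return 0;
}
```

Starting from state 0 with target 2, the model runs both steps and leaves the CPU's toy active flag set. Pairing every startup with a teardown is what lets a failed bring-up return the CPU to the state it started from.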

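The CPU boot flow and its state machine can be summarized as follows:

  • The boot CPU starts and sets its own possible, present, online, and active states.
  • The boot CPU parses the DTS to obtain the CPU topology and marks the CPUs that correspond to the DTS CPU nodes as possible.
  • The boot CPU brings the other CPUs into the kernel. They do no meaningful work yet, and the boot CPU marks them as present.
  • The boot CPU releases the other CPUs so they begin their initialization; during startup each of them sets its own online state.
  • After each CPU finishes initializing, it wakes its per-CPU hotplug kernel thread (registered through cpuhp_threads), which sets that CPU's active state.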
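
In the boot flow summarized above, a secondary CPU passes through the four states strictly in order, so each state implies all the earlier ones. A minimal sketch of that ordering as a per-CPU flag word is shown below; the enum and the function toy_advance are hypothetical and only model the ordering, not the kernel's masks.

```c
#include <stdbool.h>

/* One bit per state, in the order a secondary CPU reaches them. */
enum toy_cpu_flag {
	TOY_POSSIBLE = 1u << 0,
	TOY_PRESENT  = 1u << 1,
	TOY_ONLINE   = 1u << 2,
	TOY_ACTIVE   = 1u << 3,
};

/*
 * Enter state `next` only if every earlier state is already set.
 * Returns false and leaves *flags untouched otherwise.
 */
static bool toy_advance(unsigned int *flags, enum toy_cpu_flag next)
{
	/* For a single-bit flag, next - 1 is the mask of all lower bits. */
	unsigned int required = (unsigned int)next - 1u;

	if ((*flags & required) != required)
		return false;
	*flags |= (unsigned int)next;
	return true;
}
```

Trying to mark a CPU online before it is possible and present is rejected, while the sequence possible, present, online, active succeeds step by step.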

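A CPU in the active state is fully ready to take part in process scheduling. Before it does, one more thing has to happen: it must be included in scheduling-domain initialization, which the next article covers in detail.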
