AArch64 Kernel Memory Management.pptx

NguynVnDng579982 · 18 slides · Oct 08, 2025

Slide Content

Memory Mapping and Management in the AArch64 Linux Kernel (Chen Xinyu)

Kernel virtual memory layout
(Figure: layout diagram marking the key boundaries MODULES_VADDR (= VA_START), VMEMMAP_START, __phys_to_virt(memblock_start_of_DRAM()), and KIMAGE_VADDR + TEXT_OFFSET.)

Kernel variables and configuration
- VA_BITS = 48
- VA_START = 0xffff000000000000
- PAGE_OFFSET = 0xffff800000000000
- PHYS_OFFSET = 0x80000000 (i.MX8QM/QXP)
- memblock_start_of_DRAM() = 0x80200000 (passed in by u-boot, obtained from the SCFW)
- KIMAGE_VADDR = MODULES_END = 0xffff000008000000
- TEXT_OFFSET = 0x80000
- #define __phys_to_virt(x) ((unsigned long)((x) - PHYS_OFFSET) | PAGE_OFFSET)
- #define __phys_to_kimg(x) ((unsigned long)((x) + kimage_voffset))
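
A quick sanity check on that arithmetic (plain user-space C, not kernel code; it assumes an LP64 host where unsigned long is 64 bits) plugging the i.MX8 constants above into __phys_to_virt(). __phys_to_kimg() is left out because kimage_voffset is only known at kernel runtime.

    #include <stdio.h>

    #define PAGE_OFFSET 0xffff800000000000UL   /* start of the kernel linear map */
    #define PHYS_OFFSET 0x80000000UL           /* start of physical DRAM         */

    /* Mirrors the __phys_to_virt() definition quoted above. */
    #define __phys_to_virt(x) (((unsigned long)(x) - PHYS_OFFSET) | PAGE_OFFSET)

    int main(void)
    {
        unsigned long dram_start = 0x80200000UL;   /* memblock_start_of_DRAM() */

        printf("PA 0x%lx -> linear VA 0x%lx\n",
               dram_start, __phys_to_virt(dram_start));   /* 0xffff800000200000 */
        return 0;
    }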

DDR memory layout
- The boot-time memblock configuration is passed in by u-boot, which gets the usable memory from the SCFW API sc_rm_get_memreg_info().
- On i.MX8Q there are 5 memblocks; the holes between them are used by ATF, M4_0/1, TEE, SHM, etc.
- On Android Auto Car2 no M4 is used; the DDR memblocks received from u-boot at boot are:
  memblock_add: [0x0000000080200000-0x00000000fdffffff]
  memblock_add: [0x0000000880000000-0x00000008bfffffff]
- Kernel reservations after boot: CMA = 400 MB, SWIOTLB = 96 MB, plus the reserved-memory {} regions from the dts.
(Figure: DDR memory map with BL31/ATF at 0x8000_0000-0x8020_0000, VPU at 0x8400_0000-0x8640_0000, RPMSG/DSP/M4 regions spanning roughly 0x9000_0000-0x9440_0000, SWIOTLB at 0xBFFF_F000-0xC5FF_F000, CMA at 0xC600_0000-0xDFFF_FFFF, BL32/TEE at 0xFE00_0000-0xFFFF_FFFF, and the >4G DDR block at 0x8_8000_0000-0x8_BFFF_FFFF.)

Kernel memory allocation
- kmalloc: for small, physically contiguous chunks, served by SLxB (SLUB/SLAB/SLOB); for sizes above PAGE_SIZE x 2 it calls alloc_pages() directly.
- vmalloc: for virtually contiguous allocations; the backing pages need not be physically contiguous (placed in the vmalloc range of the virtual address space).
- CMA: described on the next page.
- Reserved memory: declared in the dts, reserved via memblock_alloc at boot, and managed by drivers for specific usage (e.g. GPU).
- ION: Android-specific allocator, used for video/camera buffers.
(Figure: allocator stack -- kmalloc and vmalloc sit on SLUB (SLAB/SLOB) and the buddy page allocator, the DMA API sits on CMA, and ION and driver-specific reserved memory sit directly on DDR.)
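
As a reference, a minimal kernel-module-style sketch (the function name is made up, error handling is trimmed, sizes are illustrative) contrasting the allocators listed above:

    #include <linux/slab.h>      /* kmalloc / kfree          */
    #include <linux/vmalloc.h>   /* vmalloc / vfree          */
    #include <linux/gfp.h>       /* alloc_pages / GFP flags  */

    static void alloc_demo(void)
    {
        /* Small, physically contiguous chunk served by the SLUB caches. */
        void *small = kmalloc(256, GFP_KERNEL);

        /* Large, virtually contiguous buffer; backing pages may be scattered. */
        void *big = vmalloc(4 * 1024 * 1024);

        /* Order-2 block (4 pages) taken directly from the buddy allocator. */
        struct page *pages = alloc_pages(GFP_KERNEL, 2);

        if (pages)
            __free_pages(pages, 2);
        vfree(big);      /* vfree(NULL) and kfree(NULL) are safe no-ops */
        kfree(small);
    }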

Contiguous Memory Allocator (CMA)
- Target: allocating big chunks of physically contiguous memory. CMA is integrated with the DMA API.
- How it works:
  - At boot time, cma=<size> of memory is reserved.
  - When the page allocator initializes, the CMA range is released with the MIGRATE_CMA page type, so those pages can be used for movable allocations (normal usage, e.g. anonymous process pages and disk cache) unless the memory is allocated to a device driver (CMA alloc).
  - Migration: allocate a new page, copy the contents of the old page to the new page, update all places where the old page was referenced, and free the old page.
(Figure: memblock gives memory to the page allocator and to CMA; CMA gives movable pages to the page allocator and takes them back on allocation; the DMA API is integrated with CMA and may also use the page allocator.)
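
A hedged driver-side sketch of "CMA through the DMA API"; grab_contig_buffer() and pdev are hypothetical, and with CONFIG_DMA_CMA enabled a request like this is typically carved out of the CMA area after migrating any movable pages that currently occupy it:

    #include <linux/dma-mapping.h>
    #include <linux/platform_device.h>

    static int grab_contig_buffer(struct platform_device *pdev)
    {
        size_t size = 8 * 1024 * 1024;   /* 8 MB, physically contiguous */
        dma_addr_t dma_handle;
        void *cpu_addr;

        /* Served from the CMA area when the request is large and blocking
         * is allowed; falls back to the page allocator otherwise. */
        cpu_addr = dma_alloc_coherent(&pdev->dev, size, &dma_handle, GFP_KERNEL);
        if (!cpu_addr)
            return -ENOMEM;

        /* ... program the device with dma_handle, use cpu_addr from the CPU ... */

        dma_free_coherent(&pdev->dev, size, cpu_addr, dma_handle);
        return 0;
    }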

DMA buffer management
- DMA buffer operations backend (alloc/free/map/mmap/sync):
  - SWIOTLB (software solution, no IOMMU hardware): used by default for all devices, e.g. VPU, ISI, audio on i.MX8.
  - IOMMU (hardware solution, with the ARM SMMU): used by devices that have an iommus node in the dts, e.g. uSDHC, FEC, USB on i.MX8.
- DMA buffer sharing: dma-buf, a uniform mechanism to share DMA buffers across different devices.
  - Example use cases: decoding a video stream into buffers suitable for graphics rendering and display; camera capture into buffers suitable for encoding and rendering.

SWIOTLB (bounce buffers for DMA)
- SWIOTLB slots are allocated at boot within the DMA zone (< 4G).
- The data buffer is allocated by the IP driver; the CPU reads and writes data in the data buffer.
- The driver uses dma_map_page() to create a bounce buffer in low memory for the DMA transfer.
- The CPU uses memcpy to synchronize between the data buffer and the bounce buffer.
(Figure: the DMA IP can only address 32 bits, so data buffers above 4G are bounced: the DMA engine transfers to/from a bounce buffer below 4G, and the CPU syncs it with the real data buffer.)
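
A sketch of that bounce-buffer path, assuming a hypothetical device 'dev' that is limited to 32-bit DMA; it uses dma_map_single(), a thin wrapper around the map_page operation mentioned above:

    #include <linux/dma-mapping.h>

    static void swiotlb_tx_example(struct device *dev, void *buf, size_t len)
    {
        dma_addr_t dma;

        /* The DMA IP can only drive 32-bit addresses, as in the figure. */
        if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)))
            return;

        /* If 'buf' lies above 4G, SWIOTLB transparently picks a bounce
         * slot below 4G and memcpy()s the data into it (DMA_TO_DEVICE). */
        dma = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
        if (dma_mapping_error(dev, dma))
            return;

        /* ... kick off the transfer using 'dma', the bounce-buffer address ... */

        /* For DMA_FROM_DEVICE, unmap/sync is where the bounce data would
         * be copied back into 'buf'. */
        dma_unmap_single(dev, dma, len, DMA_TO_DEVICE);
    }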

IOMMU (SMMU on AArch64)
- An I/O Memory Management Unit (IOMMU) is a hardware component that provides two main functions: I/O translation and device isolation.
- The IOMMU translates memory addresses presented by devices from I/O space (IOVA) to machine space (PA) so that a particular device can access physical memory (TBU/TCU in SMMUv2).
- The same translation function, coupled with access permissions, can limit a device's ability to access specific regions of memory.
- With an IOMMU, an IP module can:
  - access physical memory above the 4 GB boundary;
  - access a buffer that is contiguous in IOVA space without the underlying pages being physically contiguous;
  - reduce memory-management pressure from large contiguous allocations.
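
Most drivers never program the SMMU directly: when the dts has an iommus node, the DMA API drives it transparently. Purely for illustration, here is a sketch of the explicit in-kernel IOMMU API (signatures as in the 4.x/5.x kernels this material targets; newer kernels differ), mapping one physical page at an arbitrary IOVA for a hypothetical device:

    #include <linux/device.h>
    #include <linux/iommu.h>

    static int iommu_map_one_page(struct device *dev, phys_addr_t pa)
    {
        struct iommu_domain *dom;
        unsigned long iova = 0x10000000;   /* arbitrary device-visible address */
        int ret;

        dom = iommu_domain_alloc(dev->bus);    /* new translation context     */
        if (!dom)
            return -ENODEV;

        ret = iommu_attach_device(dom, dev);   /* route the device through it */
        if (ret)
            goto out_free;

        /* Install an IOVA -> PA entry with read/write permission. */
        ret = iommu_map(dom, iova, pa, PAGE_SIZE, IOMMU_READ | IOMMU_WRITE);
        if (!ret) {
            /* ... the device now reaches 'pa' by issuing 'iova' ... */
            iommu_unmap(dom, iova, PAGE_SIZE);
        }

        iommu_detach_device(dom, dev);
    out_free:
        iommu_domain_free(dom);
        return ret;
    }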

DMA buffer operations
- alloc: allocate a buffer from CMA (FORCE_CONTIGUOUS, or when blocking is allowed) or from the page allocator.
- free: free the DMA buffer.
- mmap: map DMA memory previously allocated by dma_alloc_attrs() into user space.
- map/unmap_page:
  - swiotlb: allocate/release a bounce buffer.
  - iommu: allocate/release an IOVA.
- map/unmap_sg: scatterlist map/unmap.
- sync_single/sg_for_cpu/device:
  - swiotlb: copy buffer data between the real data buffer and the bounce buffer.
  - iommu: invalidate/clean the cache.
Basic call flow: driver -> dma_alloc_coherent() -> dma_alloc_attrs() -> ops->alloc() -> swiotlb or iommu backend.
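
A hedged sketch of the alloc and mmap entry points in this flow: dma_alloc_attrs() is what dma_alloc_coherent() expands to, and DMA_ATTR_FORCE_CONTIGUOUS asks the backend for physically contiguous memory (typically CMA) even behind an IOMMU. The wrapper names are illustrative, and attribute support depends on the kernel version and the dma_map_ops in use.

    #include <linux/dma-mapping.h>
    #include <linux/mm.h>

    /* alloc: physically contiguous DMA buffer. */
    static void *alloc_force_contig(struct device *dev, size_t size,
                                    dma_addr_t *dma_handle)
    {
        return dma_alloc_attrs(dev, size, dma_handle, GFP_KERNEL,
                               DMA_ATTR_FORCE_CONTIGUOUS);
    }

    /* mmap: expose the same buffer to user space from a driver's .mmap hook. */
    static int mmap_force_contig(struct device *dev, struct vm_area_struct *vma,
                                 void *cpu_addr, dma_addr_t dma_handle,
                                 size_t size)
    {
        return dma_mmap_attrs(dev, vma, cpu_addr, dma_handle, size,
                              DMA_ATTR_FORCE_CONTIGUOUS);
    }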

DMA buffer sharing (dma-buf API)
- dma_buf_export(): announce the wish to export a buffer.
- dma_buf_fd(): returns an FD associated with the dma_buf object.
- dma_buf_get(): the importing device gets the dma_buf object associated with the FD.
- dma_buf_attach(): the importing device attaches itself to the dma_buf object.
- dma_buf_map_attachment(): the importing device requests access to the buffer so it can do DMA.
- dma_buf_unmap_attachment(): once the DMA access is done, the device tells the exporter that the currently requested access is complete.
- dma_buf_detach(): when it no longer needs the dma_buf object, the importing device tells the exporter of its intent to detach from the current sharing.
- dma_buf_put(): after dma_buf_detach() is called, the reference count of the buffer is dropped by calling this.
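
An importer-side sketch of that call sequence; 'fd' is a dma-buf file descriptor handed over by the exporting driver, the function name is made up, and the two-argument dma_buf_map_attachment() matches the classic API of this kernel generation:

    #include <linux/dma-buf.h>
    #include <linux/dma-mapping.h>
    #include <linux/scatterlist.h>

    static int import_and_map(struct device *dev, int fd)
    {
        struct dma_buf *dmabuf;
        struct dma_buf_attachment *attach;
        struct sg_table *sgt;

        dmabuf = dma_buf_get(fd);               /* FD -> dma_buf object */
        if (IS_ERR(dmabuf))
            return PTR_ERR(dmabuf);

        attach = dma_buf_attach(dmabuf, dev);   /* attach this device   */
        if (IS_ERR(attach)) {
            dma_buf_put(dmabuf);
            return PTR_ERR(attach);
        }

        /* Request access so the device can do DMA on the buffer. */
        sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
        if (IS_ERR(sgt)) {
            dma_buf_detach(dmabuf, attach);
            dma_buf_put(dmabuf);
            return PTR_ERR(sgt);
        }

        /* ... program the device with the addresses in 'sgt' ... */

        dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
        dma_buf_detach(dmabuf, attach);
        dma_buf_put(dmabuf);
        return 0;
    }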

Kernel MM zones (kswapd and zone watermarks)
- Two zones in the i.MX BSP:
  - DMA: its size is determined by max_zone_dma_phys() in arch/arm64/mm/init.c, which returns the top of the first 4 GB. On i.MX8, 0x80000000 - 0xFFFFFFFF is assigned to the DMA zone.
  - NORMAL: DDR above 4 GB is assigned to the NORMAL zone. On i.MX8, 0x880000000 - 0x8BFFFFFFF is assigned to the NORMAL zone.
- Zone watermarks:
  - Free > High: the zone is balanced, kswapd sleeps.
  - Free < Low: kswapd wakes up to reclaim and swap.
  - Free < Min: only GFP_ATOMIC allocations can still proceed.
- Protection: protects the DMA zone from being drained when NORMAL is out of memory; adjusted via /proc/sys/vm/lowmem_reserve_ratio.
- Zone balancing on allocation: when balancing free pages between the two zones, free CMA pages are subtracted from the DMA zone's free total, and zone protection is taken into account.

GPU memory management (kernel allocator pools)
- Virtual pool (contiguous or non-contiguous memory from the buddy allocator). Alloc flags: gcvALLOC_FLAG_CONTIGUOUS | gcvALLOC_FLAG_CMA_PREEMPT | gcvALLOC_FLAG_NON_CONTIGUOUS | gcvALLOC_FLAG_CACHEABLE | gcvALLOC_FLAG_DMABUF_EXPORTABLE | gcvALLOC_FLAG_ALLOC_ON_FAULT
- CMA pool (allocated from CMA on boot). Alloc flags: gcvALLOC_FLAG_CONTIGUOUS | gcvALLOC_FLAG_CMA_PREEMPT | gcvALLOC_FLAG_DMABUF_EXPORTABLE
- Reserved pool (reserved by cmdline or dts). Alloc flags: gcvALLOC_FLAG_LINUX_RESERVED_MEM

GPU memory management (cont.): i.MX gralloc
- Graphic buffers go to the GPU gralloc; video buffers go to ION.
- The GPU gralloc goes through the Vivante (viv) DRM driver.
- Allocation pool order: Reserved -> Virtual -> CMA.
- Surface-layer buffers are allocated directly from the virtual pool; compressed tiled buffers start from the reserved pool.
- ION allocates from the buddy allocator or CMA, depending on the contiguous/cache flags.
(Figure: in user space, gralloc.imx8.so and gralloc_viv.imx8.so sit on libdrm and libion; in the kernel, the viv DRM driver and the ion driver allocate from the buddy allocator, CMA, or reserved memory. Surface-layer formats RGBA(x)8888/888, BGRA8888 and RGB565, and YUV video buffers, are shown in the diagram.)

Customizing according to DDR size
- Issues with small DDR configurations:
  - Memory is a limited resource and every component needs some, e.g. apps/GPU/VPU/IPs/DMA.
  - The DMA zone can only cover memory below 4G (on i.MX8, 0x80000000 - 0xFFFFFFFF is the low 2 GB of DDR), so the DMA zone cannot be larger than 2 GB.
  - Balancing free pages between the two zones in the buddy allocator can add cost when one of them drops below the Low watermark.
  - Free CMA pages are not counted towards DMA zone free pages during balancing, so the CMA size needs to be chosen carefully.
- What to customize:
  - OOM handling must be well configured (on Android, lmkd must be tuned: https://source.android.com/devices/tech/perf/lmkd).
  - CMA size.
  - GPU reserved pool size.
  - DMA zone size (may require hacking the kernel).
  - RAM swap is optional on Android (ZRAM swap is faster than disk swap).

How to avoid using CMA
CMA is not a perfect way to allocate physically contiguous memory. Its downsides:
- CMA pages can be used by the rest of the system through normal page allocation; once they are used up, there is no memory left for contiguous allocation even with page migration.
- Page migration is slow, and can also be blocked by other work such as page-cache writeback or filesystem page locks.
To make sure enough contiguous free memory is available when GPU/VPU/camera and other DMA transfers need it, there are several ways to improve the situation (see the sketch after this list):
- Reserve physical memory per device, see: https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18841683/Linux+Reserved+Memory
  - Plain reserved memory.
  - Reserved memory exposed through the DMA API.
- Use the coherent (atomic) pool for non-blocking DMA allocations.
- Use the coherent (atomic) pool for all DMA allocations.
- For ION, use the carveout heap instead of the CMA heap (disable the CMA heap in the kernel config).
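
A hedged sketch of "reserved memory through the DMA API": it assumes the board dts declares a shared-dma-pool region and points the device node at it with a memory-region property; the (hypothetical) probe function then binds to that region, after which dma_alloc_coherent() is served from the private carveout instead of the global CMA area.

    #include <linux/dma-mapping.h>
    #include <linux/of_reserved_mem.h>
    #include <linux/platform_device.h>
    #include <linux/sizes.h>

    static int my_probe(struct platform_device *pdev)
    {
        dma_addr_t dma;
        void *vaddr;
        int ret;

        /* Attach the reserved-memory region referenced by this device node. */
        ret = of_reserved_mem_device_init(&pdev->dev);
        if (ret)
            return ret;

        /* Allocated from the device's own carveout, not from CMA. */
        vaddr = dma_alloc_coherent(&pdev->dev, SZ_1M, &dma, GFP_KERNEL);
        if (!vaddr) {
            of_reserved_mem_device_release(&pdev->dev);
            return -ENOMEM;
        }

        /* ... use the buffer ... */

        dma_free_coherent(&pdev->dev, SZ_1M, vaddr, dma);
        of_reserved_mem_device_release(&pdev->dev);
        return 0;
    }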

How to avoid using CMA (using the coherent pool)
- Enable the coherent pool: add "coherent_pool=<size>" to the kernel command line. The coherent pool is itself allocated from the default CMA area, so the CMA size must be larger than coherent_pool.
- Use the coherent pool for non-blocking DMA allocations: clear the __GFP_DIRECT_RECLAIM bit in the gfp_t flags (see the sketch below).
- Use the coherent pool for all DMA allocations: hack arch/arm64/mm/dma-mapping.c and remove the gfpflags_allow_blocking() check in the __dma_alloc() function.
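
A minimal sketch of the non-blocking case: GFP_ATOMIC already has __GFP_DIRECT_RECLAIM cleared, so gfpflags_allow_blocking() returns false and the arm64 __dma_alloc() path of this kernel generation satisfies the request from the coherent (atomic) pool. The helper name is illustrative.

    #include <linux/dma-mapping.h>

    static void *alloc_from_atomic_pool(struct device *dev, size_t size,
                                        dma_addr_t *dma_handle)
    {
        /* Non-blocking gfp flags steer the allocation into the
         * coherent_pool= reservation instead of CMA. */
        return dma_alloc_coherent(dev, size, dma_handle, GFP_ATOMIC);
    }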