sbc-bench v0.8.2 Raspberry Pi Zero 2 Rev 1.0 (Thu, 04 Nov 2021 22:55:54 +0000)

Distributor ID: Raspbian
Description:    Raspbian GNU/Linux 10 (buster)
Release:        10
Codename:       buster
Architecture:   armhf

Raspberry Pi ThreadX version:
Oct 29 2021 10:49:08
Copyright (c) 2012 Broadcom
version b8a114e5a9877e91ca8f26d1a5ce904b2ad3cf13 (clean) (release) (start)

ThreadX configuration (/boot/config.txt):
dtparam=audio=on
[pi4]
dtoverlay=vc4-fkms-v3d
max_framebuffers=2
[all]

Actual ThreadX settings:
aphy_params_current=819
arm_freq=1000
arm_freq_min=600
audio_pwm_mode=514
config_hdmi_boost=5
core_freq=400
desired_osc_freq=0x331df0
disable_commandline_tags=2
disable_l2cache=1
display_hdmi_rotate=-1
display_lcd_rotate=-1
dphy_params_current=547
dvfs=3
enable_tvout=1
force_eeprom_read=1
force_pwm_open=1
framebuffer_ignore_alpha=1
framebuffer_swap=1
gpu_freq=300
ignore_lcd=1
init_uart_clock=0x2dc6c00
max_framebuffers=-1
over_voltage_avs=25000
pause_burst_frames=1
program_serial_random=1
sdram_freq=450
total_mem=512
hdmi_force_cec_address:0=65535
hdmi_force_cec_address:1=65535
hdmi_pixel_freq_limit:0=0x9a7ec80

/usr/bin/gcc (Raspbian 8.3.0-6+rpi1) 8.3.0

Uptime: 22:55:54 up 7 min,  1 user,  load average: 1.00, 0.78, 0.38

Linux 5.10.63-v7+ (tranzystorpl)  11/04/21  _armv7l_  (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          20.77    0.00    1.70    0.56    0.00   76.97

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
mmcblk0          11.32       421.16        94.73     193296      43477

              total        used        free      shared  buff/cache   available
Mem:          427Mi        37Mi       194Mi       3.0Mi       195Mi       336Mi
Swap:          99Mi          0B        99Mi

Filename        Type        Size        Used        Priority
/var/swap       file        102396      0           -2

##########################################################################

Checking cpufreq OPP:

Cpufreq OPP: 1000    ThreadX: 1000    Measured: 1000 @ 1.2250V
Cpufreq OPP:  900    ThreadX:  900    Measured:  900 @ 1.2250V
Cpufreq OPP:  800    ThreadX:  800    Measured:  800 @ 1.2250V
Cpufreq OPP:  700    ThreadX:  700    Measured:  700 @ 1.2250V
Cpufreq OPP:  600    ThreadX:  600    Measured:  600 @ 1.2V

##########################################################################

Hardware sensors:

cpu_thermal-virtual-0
temp1:        +41.9°C

rpi_volt-isa-0000
in0:              N/A

##########################################################################

tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     : 1157.0 MB/s (0.7%)
 C copy backwards (32 byte blocks)                    : 1156.7 MB/s
 C copy backwards (64 byte blocks)                    : 1151.2 MB/s
 C copy                                               : 1185.2 MB/s
 C copy prefetched (32 bytes step)                    : 1224.6 MB/s
 C copy prefetched (64 bytes step)                    : 1224.8 MB/s
 C 2-pass copy                                        :  990.6 MB/s
 C 2-pass copy prefetched (32 bytes step)             :  999.9 MB/s
 C 2-pass copy prefetched (64 bytes step)             : 1000.9 MB/s (0.1%)
 C fill                                               : 1638.2 MB/s (0.4%)
 C fill (shuffle within 16 byte blocks)               : 1636.7 MB/s (0.3%)
 C fill (shuffle within 32 byte blocks)               : 1634.1 MB/s
 C fill (shuffle within 64 byte blocks)               : 1627.8 MB/s (0.3%)
 ---
 standard memcpy                                      : 1210.9 MB/s
 standard memset                                      : 1634.7 MB/s (0.5%)
 ---
 NEON read                                            : 2173.6 MB/s (0.5%)
 NEON read prefetched (32 bytes step)                 : 2369.2 MB/s
 NEON read prefetched (64 bytes step)                 : 2369.3 MB/s
 NEON read 2 data streams                             : 1919.4 MB/s
 NEON read 2 data streams prefetched (32 bytes step)  : 1871.0 MB/s (0.1%)
 NEON read 2 data streams prefetched (64 bytes step)  : 1859.7 MB/s
 NEON copy                                            : 1185.9 MB/s (0.5%)
 NEON copy prefetched (32 bytes step)                 : 1225.8 MB/s
 NEON copy prefetched (64 bytes step)                 : 1226.6 MB/s
 NEON unrolled copy                                   : 1186.6 MB/s
 NEON unrolled copy prefetched (32 bytes step)        : 1236.9 MB/s
 NEON unrolled copy prefetched (64 bytes step)        : 1236.6 MB/s
 NEON copy backwards                                  : 1148.9 MB/s (0.7%)
 NEON copy backwards prefetched (32 bytes step)       : 1164.4 MB/s (1.3%)
 NEON copy backwards prefetched (64 bytes step)       : 1163.4 MB/s
 NEON 2-pass copy                                     :  996.5 MB/s
 NEON 2-pass copy prefetched (32 bytes step)          : 1025.3 MB/s (0.2%)
 NEON 2-pass copy prefetched (64 bytes step)          : 1024.1 MB/s (0.1%)
 NEON unrolled 2-pass copy                            :  980.6 MB/s (0.1%)
 NEON unrolled 2-pass copy prefetched (32 bytes step) : 1014.1 MB/s
 NEON unrolled 2-pass copy prefetched (64 bytes step) : 1007.0 MB/s
 NEON fill                                            : 1639.0 MB/s (0.7%)
 NEON fill backwards                                  : 1638.1 MB/s (0.6%)
 VFP copy                                             : 1188.9 MB/s
 VFP 2-pass copy                                      :  975.4 MB/s
 ARM fill (STRD)                                      : 1627.5 MB/s (0.3%)
 ARM fill (STM with 8 registers)                      : 1637.8 MB/s (0.5%)
 ARM fill (STM with 4 registers)                      : 1629.0 MB/s (0.3%)
 ARM copy prefetched (incr pld)                       : 1223.8 MB/s
 ARM copy prefetched (wrap pld)                       : 1218.0 MB/s
 ARM 2-pass copy prefetched (incr pld)                : 1009.1 MB/s
 ARM 2-pass copy prefetched (wrap pld)                : 1001.8 MB/s (0.1%)

==========================================================================
== Framebuffer read tests.                                              ==
==                                                                      ==
== Many ARM devices use a part of the system memory as the framebuffer, ==
== typically mapped as uncached but with write-combining enabled.       ==
== Writes to such framebuffers are quite fast, but reads are much       ==
== slower and very sensitive to the alignment and the selection of      ==
== CPU instructions which are used for accessing memory.                ==
==                                                                      ==
== Many x86 systems allocate the framebuffer in the GPU memory,         ==
== accessible for the CPU via a relatively slow PCI-E bus. Moreover,    ==
== PCI-E is asymmetric and handles reads a lot worse than writes.       ==
==                                                                      ==
== If uncached framebuffer reads are reasonably fast (at least 100 MB/s ==
== or preferably >300 MB/s), then using the shadow framebuffer layer    ==
== is not necessary in Xorg DDX drivers, resulting in a nice overall    ==
== performance improvement. For example, the xf86-video-fbturbo DDX     ==
== uses this trick.                                                     ==
==========================================================================

 NEON read (from framebuffer)                         :   67.9 MB/s
 NEON copy (from framebuffer)                         :   64.3 MB/s
 NEON 2-pass copy (from framebuffer)                  :   66.3 MB/s
 NEON unrolled copy (from framebuffer)                :   67.3 MB/s
 NEON 2-pass unrolled copy (from framebuffer)         :   65.7 MB/s
 VFP copy (from framebuffer)                          :  436.7 MB/s
 VFP 2-pass copy (from framebuffer)                   :  383.8 MB/s (0.2%)
 ARM copy (from framebuffer)                          :  204.3 MB/s
 ARM 2-pass copy (from framebuffer)                   :  220.0 MB/s (0.8%)

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read
      1024 :    0.0 ns          /    0.0 ns
      2048 :    0.0 ns          /    0.0 ns
      4096 :    0.0 ns          /    0.0 ns
      8192 :    0.0 ns          /    0.0 ns
     16384 :    0.0 ns          /    0.0 ns
     32768 :    0.0 ns          /    0.0 ns
     65536 :    6.4 ns          /   11.0 ns
    131072 :    9.9 ns          /   15.7 ns
    262144 :   11.6 ns          /   17.7 ns
    524288 :   12.9 ns          /   19.4 ns
   1048576 :   81.2 ns          /  125.6 ns
   2097152 :  119.0 ns          /  163.2 ns
   4194304 :  143.3 ns          /  184.1 ns
   8388608 :  156.2 ns          /  193.5 ns
  16777216 :  164.7 ns          /  199.8 ns
  33554432 :  170.5 ns          /  204.7 ns
  67108864 :  174.2 ns          /  207.6 ns

##########################################################################

OpenSSL 1.1.1d, built on 10 Sep 2019
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc     24224.63k    34271.98k    38485.25k    39497.39k    39966.04k    40047.96k
aes-128-cbc     24246.15k    34310.40k    38418.43k    39640.41k    39889.58k    39971.50k
aes-192-cbc     22041.26k    29768.92k    32985.26k    33855.83k    33974.95k    34138.79k
aes-192-cbc     22022.92k    29903.77k    32836.95k    33745.92k    34122.41k    33980.42k
aes-256-cbc     20369.97k    26867.37k    29219.75k    30035.97k    30244.86k    30261.25k
aes-256-cbc     20278.27k    26880.62k    29343.91k    29902.51k    30239.40k    30255.79k

##########################################################################

7-Zip (a) [32] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_GB.UTF-8,Utf16=on,HugeFiles=on,32 bits,4 CPUs LE)

LE
CPU Freq:   927   920   987   998   999   999   999   998

RAM size:     427 MB,  # CPU hardware threads:   4
RAM usage:    234 MB,  # Benchmark threads:      4

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:        600   100    584    584  |      13119   100   1120   1119
23:        585   100    597    597  |      12843   100   1112   1111
----------------------------------  | ------------------------------
Avr:             100    590    590  |              100   1116   1115
Tot:             100    853    853

##########################################################################

7-Zip (a) [32] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_GB.UTF-8,Utf16=on,HugeFiles=on,32 bits,4 CPUs LE)

LE
CPU Freq:   969   942   931   998   999   999   998   999

RAM size:     427 MB,  # CPU hardware threads:   4
RAM usage:    234 MB,  # Benchmark threads:      4

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       1777   316    548   1729  |      52191   399   1115   4453
23:       1759   323    555   1793  |      51076   399   1108   4419
----------------------------------  | ------------------------------
Avr:             319    551   1761  |              399   1112   4436
Tot:             359    832   3098

7-Zip (a) [32] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_GB.UTF-8,Utf16=on,HugeFiles=on,32 bits,4 CPUs LE)

LE
CPU Freq:   996   998   999   999   998   998   998   998

RAM size:     427 MB,  # CPU hardware threads:   4
RAM usage:    234 MB,  # Benchmark threads:      4

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       1791   318    547   1742  |      52168   399   1115   4451
23:       1675   307    557   1707  |      51134   399   1109   4424
----------------------------------  | ------------------------------
Avr:             313    552   1725  |              399   1112   4438
Tot:             356    832   3081

7-Zip (a) [32] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_GB.UTF-8,Utf16=on,HugeFiles=on,32 bits,4 CPUs LE)

LE
CPU Freq:   997   998   998   998   998   998   997   998

RAM size:     427 MB,  # CPU hardware threads:   4
RAM usage:    234 MB,  # Benchmark threads:      4

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       1771   316    545   1723  |      52198   399   1115   4453
23:       1708   312    557   1741  |      51118   399   1109   4423
----------------------------------  | ------------------------------
Avr:             314    551   1732  |              399   1112   4438
Tot:             357    832   3085

Compression: 1761,1725,1732
Decompression: 4436,4438,4438
Total: 3098,3081,3085

##########################################################################

Testing clockspeeds again. System health now:

Time        fake/real   load %cpu %sys %usr %nice %io %irq   Temp   VCore
23:06:42: 1000/1000MHz  2.86  77%   1%  75%   0%   0%   0%  56.9°C  1.2250V

Checking cpufreq OPP:

Cpufreq OPP: 1000    ThreadX: 1000    Measured: 1000 @ 1.2250V
Cpufreq OPP:  900    ThreadX:  900    Measured:  900 @ 1.2250V
Cpufreq OPP:  800    ThreadX:  800    Measured:  800 @ 1.2250V
Cpufreq OPP:  700    ThreadX:  700    Measured:  700 @ 1.2250V
Cpufreq OPP:  600    ThreadX:  600    Measured:  600 @ 1.2V

##########################################################################

Hardware sensors:

cpu_thermal-virtual-0
temp1:        +54.8°C

rpi_volt-isa-0000
in0:              N/A

##########################################################################

System health while running tinymembench:

Time        fake/real   load %cpu %sys %usr %nice %io %irq   Temp   VCore
22:55:58: 1000/1000MHz  1.00  23%   1%  20%   0%   0%   0%  42.9°C  1.2250V
22:57:58: 1000/1000MHz  1.00  25%   0%  25%   0%   0%   0%  47.8°C  1.2250V
22:59:58: 1000/1000MHz  1.00  25%   0%  25%   0%   0%   0%  46.2°C  1.2250V
23:01:59: 1000/1000MHz  1.00  25%   0%  24%   0%   0%   0%  46.2°C  1.2250V

System health while running OpenSSL benchmark:

Time        fake/real   load %cpu %sys %usr %nice %io %irq   Temp   VCore
23:02:42: 1000/1000MHz  1.00  23%   0%  22%   0%   0%   0%  45.1°C  1.2250V
23:02:52: 1000/1000MHz  1.00  25%   0%  25%   0%   0%   0%  46.2°C  1.2250V
23:03:02: 1000/1000MHz  1.07  25%   0%  24%   0%   0%   0%  46.2°C  1.2250V
23:03:12: 1000/1000MHz  1.06  25%   0%  25%   0%   0%   0%  46.2°C  1.2250V
23:03:22: 1000/1000MHz  1.05  25%   0%  24%   0%   0%   0%  46.7°C  1.2250V
23:03:32: 1000/1000MHz  1.04  25%   0%  25%   0%   0%   0%  47.2°C  1.2250V
23:03:42: 1000/1000MHz  1.03  25%   0%  25%   0%   0%   0%  46.7°C  1.2250V
23:03:52: 1000/1000MHz  1.03  25%   0%  25%   0%   0%   0%  47.2°C  1.2250V
23:04:02: 1000/1000MHz  1.02  25%   0%  24%   0%   0%   0%  46.7°C  1.2250V
23:04:12: 1000/1000MHz  1.02  25%   0%  24%   0%   0%   0%  47.2°C  1.2250V
23:04:23: 1000/1000MHz  1.02  25%   0%  25%   0%   0%   0%  47.2°C  1.2250V

System health while running 7-zip single core benchmark:

Time        fake/real   load %cpu %sys %usr %nice %io %irq   Temp   VCore
23:04:30: 1000/1000MHz  1.01  24%   0%  22%   0%   0%   0%  47.2°C  1.2250V
23:05:30: 1000/1000MHz  1.88  25%   0%  24%   0%   0%   0%  47.2°C  1.2250V

System health while running 7-zip multi core benchmark:

Time        fake/real   load %cpu %sys %usr %nice %io %irq   Temp   VCore
23:05:42: 1000/1000MHz  2.20  24%   0%  23%   0%   0%   0%  47.2°C  1.2250V
23:06:02: 1000/1000MHz  2.17  76%   1%  75%   0%   0%   0%  53.2°C  1.2250V
23:06:22: 1000/1000MHz  2.41  78%   1%  76%   0%   0%   0%  55.3°C  1.2250V
23:06:42: 1000/1000MHz  2.86  77%   1%  75%   0%   0%   0%  56.9°C  1.2250V

##########################################################################

dmesg output while running the benchmarks:

[  463.019573] process 'local/src/tinymembench/tinymembench' started with executable stack

##########################################################################

Linux 5.10.63-v7+ (tranzystorpl)  11/04/21  _armv7l_  (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          26.68    0.02    0.91    0.24    0.00   72.16

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
mmcblk0           4.96       175.52        48.14     197052      54041

              total        used        free      shared  buff/cache   available
Mem:          427Mi        97Mi       193Mi       0.0Ki       137Mi       281Mi
Swap:          99Mi       8.0Mi        91Mi

Filename        Type        Size        Used        Priority
/var/swap       file        102396      8960        -2

Architecture:        armv7l
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
Vendor ID:           ARM
Model:               4
Model name:          Cortex-A53
Stepping:            r0p4
CPU max MHz:         1000.0000
CPU min MHz:         600.0000
BogoMIPS:            64.00
Flags:               half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32