sbc-bench v0.8.2 Raspberry Pi Zero 2 Rev 1.0 (Thu, 04 Nov 2021 23:22:34 +0000)

Distributor ID: Raspbian
Description:    Raspbian GNU/Linux 10 (buster)
Release:        10
Codename:       buster
Architecture:   armhf

Raspberry Pi ThreadX version:
Oct 29 2021 10:49:08
Copyright (c) 2012 Broadcom
version b8a114e5a9877e91ca8f26d1a5ce904b2ad3cf13 (clean) (release) (start)

ThreadX configuration (/boot/config.txt):
arm_freq=1200
dtparam=audio=on
[pi4]
dtoverlay=vc4-fkms-v3d
max_framebuffers=2
[all]

Actual ThreadX settings:
aphy_params_current=819
arm_freq=1200
arm_freq_min=600
audio_pwm_mode=514
config_hdmi_boost=5
core_freq=400
desired_osc_freq=0x331df0
disable_commandline_tags=2
disable_l2cache=1
display_hdmi_rotate=-1
display_lcd_rotate=-1
dphy_params_current=547
dvfs=3
enable_tvout=1
force_eeprom_read=1
force_pwm_open=1
framebuffer_ignore_alpha=1
framebuffer_swap=1
gpu_freq=300
ignore_lcd=1
init_uart_clock=0x2dc6c00
max_framebuffers=-1
over_voltage_avs=25000
pause_burst_frames=1
program_serial_random=1
sdram_freq=450
total_mem=512
hdmi_force_cec_address:0=65535
hdmi_force_cec_address:1=65535
hdmi_pixel_freq_limit:0=0x9a7ec80

/usr/bin/gcc (Raspbian 8.3.0-6+rpi1) 8.3.0

Uptime: 23:22:34 up 6 min,  1 user,  load average: 0.79, 0.31, 0.13

Linux 5.10.63-v7+ (tranzystorpl)    11/04/21    _armv7l_    (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.28    0.00    2.03    1.13    0.00   92.56

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
mmcblk0          22.68       697.12       232.70     253844      84733

              total        used        free      shared  buff/cache   available
Mem:          427Mi        38Mi       173Mi       0.0Ki       215Mi       336Mi
Swap:          99Mi       5.0Mi        94Mi

Filename                                Type            Size    Used    Priority
/var/swap                               file            102396  5888    -2

##########################################################################

Checking cpufreq OPP:

Cpufreq OPP: 1200    ThreadX: 1200    Measured: 1200 @ 1.2250V
Cpufreq OPP: 1100    ThreadX: 1100    Measured: 1100 @ 1.2250V
Cpufreq OPP: 1000    ThreadX: 1000    Measured: 1000 @ 1.2250V
Cpufreq OPP:  900    ThreadX:  900    Measured:  900 @ 1.2250V
Cpufreq OPP:  800    ThreadX:  800    Measured:  800 @ 1.2250V
Cpufreq OPP:  700    ThreadX:  700    Measured:  700 @ 1.2250V
Cpufreq OPP:  600    ThreadX:  600    Measured:  600 @ 1.2V

##########################################################################

Hardware sensors:

cpu_thermal-virtual-0
temp1:        +43.5°C

rpi_volt-isa-0000
in0:              N/A

##########################################################################

tinymembench v0.4.9 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     :   1264.8 MB/s (0.9%)
 C copy backwards (32 byte blocks)                    :   1260.8 MB/s
 C copy backwards (64 byte blocks)                    :   1260.2 MB/s (0.2%)
 C copy                                               :   1255.0 MB/s (0.4%)
 C copy prefetched (32 bytes step)                    :   1289.8 MB/s
 C copy prefetched (64 bytes step)                    :   1291.0 MB/s
 C 2-pass copy                                        :   1051.3 MB/s
 C 2-pass copy prefetched (32 bytes step)             :   1069.8 MB/s
 C 2-pass copy prefetched (64 bytes step)             :   1071.6 MB/s
 C fill                                               :   1760.3 MB/s (0.2%)
 C fill (shuffle within 16 byte blocks)               :   1760.1 MB/s
 C fill (shuffle within 32 byte blocks)               :   1760.1 MB/s
 C fill (shuffle within 64 byte blocks)               :   1760.4 MB/s
 ---
 standard memcpy                                      :   1282.6 MB/s
 standard memset                                      :   1760.3 MB/s (0.2%)
 ---
 NEON read                                            :   2206.6 MB/s (0.4%)
 NEON read prefetched (32 bytes step)                 :   2390.6 MB/s
 NEON read prefetched (64 bytes step)                 :   2389.9 MB/s
 NEON read 2 data streams                             :   2022.0 MB/s
 NEON read 2 data streams prefetched (32 bytes step)  :   2037.5 MB/s
 NEON read 2 data streams prefetched (64 bytes step)  :   2036.1 MB/s
 NEON copy                                            :   1248.3 MB/s (0.6%)
 NEON copy prefetched (32 bytes step)                 :   1288.7 MB/s
 NEON copy prefetched (64 bytes step)                 :   1286.7 MB/s
 NEON unrolled copy                                   :   1258.6 MB/s
 NEON unrolled copy prefetched (32 bytes step)        :   1282.4 MB/s
 NEON unrolled copy prefetched (64 bytes step)        :   1283.5 MB/s
 NEON copy backwards                                  :   1252.8 MB/s (0.3%)
 NEON copy backwards prefetched (32 bytes step)       :   1286.6 MB/s
 NEON copy backwards prefetched (64 bytes step)       :   1286.5 MB/s
 NEON 2-pass copy                                     :   1065.1 MB/s (0.8%)
 NEON 2-pass copy prefetched (32 bytes step)          :   1093.5 MB/s
 NEON 2-pass copy prefetched (64 bytes step)          :   1098.7 MB/s (0.1%)
 NEON unrolled 2-pass copy                            :   1042.3 MB/s (0.3%)
 NEON unrolled 2-pass copy prefetched (32 bytes step) :   1058.0 MB/s
 NEON unrolled 2-pass copy prefetched (64 bytes step) :   1058.7 MB/s
 NEON fill                                            :   1759.9 MB/s (0.2%)
 NEON fill backwards                                  :   1759.6 MB/s
 VFP copy                                             :   1258.9 MB/s
 VFP 2-pass copy                                      :   1036.4 MB/s
 ARM fill (STRD)                                      :   1757.7 MB/s (0.2%)
 ARM fill (STM with 8 registers)                      :   1759.7 MB/s
 ARM fill (STM with 4 registers)                      :   1759.3 MB/s
 ARM copy prefetched (incr pld)                       :   1287.3 MB/s
 ARM copy prefetched (wrap pld)                       :   1282.4 MB/s
 ARM 2-pass copy prefetched (incr pld)                :   1074.1 MB/s
 ARM 2-pass copy prefetched (wrap pld)                :   1066.0 MB/s

==========================================================================
== Framebuffer read tests.                                              ==
==                                                                      ==
== Many ARM devices use a part of the system memory as the framebuffer, ==
== typically mapped as uncached but with write-combining enabled.       ==
== Writes to such framebuffers are quite fast, but reads are much       ==
== slower and very sensitive to the alignment and the selection of      ==
== CPU instructions which are used for accessing memory.                ==
==                                                                      ==
== Many x86 systems allocate the framebuffer in the GPU memory,         ==
== accessible for the CPU via a relatively slow PCI-E bus. Moreover,    ==
== PCI-E is asymmetric and handles reads a lot worse than writes.       ==
==                                                                      ==
== If uncached framebuffer reads are reasonably fast (at least 100 MB/s ==
== or preferably >300 MB/s), then using the shadow framebuffer layer    ==
== is not necessary in Xorg DDX drivers, resulting in a nice overall    ==
== performance improvement. For example, the xf86-video-fbturbo DDX     ==
== uses this trick.                                                     ==
==========================================================================

 NEON read (from framebuffer)                         :     70.4 MB/s
 NEON copy (from framebuffer)                         :     68.0 MB/s
 NEON 2-pass copy (from framebuffer)                  :     68.9 MB/s
 NEON unrolled copy (from framebuffer)                :     70.1 MB/s
 NEON 2-pass unrolled copy (from framebuffer)         :     68.6 MB/s
 VFP copy (from framebuffer)                          :    454.7 MB/s
 VFP 2-pass copy (from framebuffer)                   :    407.1 MB/s (0.3%)
 ARM copy (from framebuffer)                          :    221.7 MB/s
 ARM 2-pass copy (from framebuffer)                   :    233.5 MB/s (0.2%)

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    5.4 ns          /     9.2 ns
    131072 :    8.2 ns          /    13.1 ns
    262144 :    9.6 ns          /    14.8 ns
    524288 :   10.8 ns          /    16.1 ns
   1048576 :   77.1 ns          /   121.5 ns
   2097152 :  113.5 ns          /   159.1 ns
   4194304 :  138.0 ns          /   178.7 ns
   8388608 :  150.7 ns          /   186.9 ns
  16777216 :  158.7 ns          /   192.5 ns
  33554432 :  163.9 ns          /   197.0 ns
  67108864 :  166.5 ns          /   199.2 ns

##########################################################################

OpenSSL 1.1.1d, built on 10 Sep 2019
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc      29107.62k    41248.21k    46209.62k    47466.15k    48048.81k    47928.66k
aes-128-cbc      29143.23k    41237.25k    46194.09k    47495.85k    47852.20k    48097.96k
aes-192-cbc      26449.30k    35751.34k    39591.77k    40637.10k    40812.54k    40965.46k
aes-192-cbc      26446.50k    35887.51k    39437.40k    40622.42k    40943.62k    40823.47k
aes-256-cbc      24447.40k    32261.12k    35083.52k    36045.14k    36293.29k    36312.41k
aes-256-cbc      24352.45k    32265.77k    35223.38k    35912.70k    36301.48k    36317.87k

##########################################################################

7-Zip (a) [32] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_GB.UTF-8,Utf16=on,HugeFiles=on,32 bits,4 CPUs LE)

LE
CPU Freq:  1124  1105  1196  1198  1199  1198  1198  1199  1198

RAM size:     427 MB,  # CPU hardware threads:   4
RAM usage:    234 MB,  # Benchmark threads:      4

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:        686   100    669    668  |      15651   100   1335   1335
23:        669   100    682    682  |      15303   100   1324   1324
----------------------------------  | ------------------------------
Avr:             100    675    675  |              100   1330   1330
Tot:             100   1003   1002

##########################################################################

7-Zip (a) [32] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_GB.UTF-8,Utf16=on,HugeFiles=on,32 bits,4 CPUs LE)

LE
CPU Freq:  1159  1120  1122  1198  1198  1199  1198  1199  1199

RAM size:     427 MB,  # CPU hardware threads:   4
RAM usage:    234 MB,  # Benchmark threads:      4

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       1798   281    623   1749

7-Zip (a) [32] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_GB.UTF-8,Utf16=on,HugeFiles=on,32 bits,4 CPUs LE)

LE
CPU Freq:  1197  1198  1199  1199  1198  1198  1198  1198  1198

RAM size:     427 MB,  # CPU hardware threads:   4
RAM usage:    234 MB,  # Benchmark threads:      4

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       2056   325    615   2001

7-Zip (a) [32] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_GB.UTF-8,Utf16=on,HugeFiles=on,32 bits,4 CPUs LE)

LE
CPU Freq:  1186  1196  1199  1199  1199  1199  1198  1199  1198

RAM size:     427 MB,  # CPU hardware threads:   4
RAM usage:    234 MB,  # Benchmark threads:      4

                       Compressing  |                  Decompressing
Dict     Speed Usage    R/U Rating  |      Speed Usage    R/U Rating
         KiB/s     %   MIPS   MIPS  |      KiB/s     %   MIPS   MIPS

22:       2049   324    616   1994

Compression:
Decompression:
Total:

##########################################################################

Testing clockspeeds again.
System health now:

Time       fake/real   load %cpu %sys %usr %nice %io %irq   Temp    VCore
23:32:35: 1200/1200MHz  2.53  57%   1%  56%   0%   0%   0%  56.4°C  1.2250V

Checking cpufreq OPP:

Cpufreq OPP: 1200    ThreadX: 1200    Measured: 1200 @ 1.2250V
Cpufreq OPP: 1100    ThreadX: 1100    Measured: 1100 @ 1.2250V
Cpufreq OPP: 1000    ThreadX: 1000    Measured: 1000 @ 1.2250V
Cpufreq OPP:  900    ThreadX:  900    Measured:  900 @ 1.2250V
Cpufreq OPP:  800    ThreadX:  800    Measured:  800 @ 1.2250V
Cpufreq OPP:  700    ThreadX:  700    Measured:  700 @ 1.2250V
Cpufreq OPP:  600    ThreadX:  600    Measured:  600 @ 1.2V

##########################################################################

Hardware sensors:

cpu_thermal-virtual-0
temp1:        +54.8°C

rpi_volt-isa-0000
in0:              N/A

##########################################################################

System health while running tinymembench:

Time       fake/real   load %cpu %sys %usr %nice %io %irq   Temp    VCore
23:22:40: 1200/1200MHz  0.81   7%   2%   4%   0%   1%   0%  45.1°C  1.2250V
23:24:40: 1200/1200MHz  0.98  25%   0%  25%   0%   0%   0%  51.5°C  1.2250V
23:26:40: 1200/1200MHz  1.00  25%   0%  24%   0%   0%   0%  47.2°C  1.2250V
23:28:40: 1200/1200MHz  1.00  25%   0%  25%   0%   0%   0%  48.3°C  1.2250V

System health while running OpenSSL benchmark:

Time       fake/real   load %cpu %sys %usr %nice %io %irq   Temp    VCore
23:29:19: 1200/1200MHz  1.00  16%   0%  15%   0%   0%   0%  48.3°C  1.2250V
23:29:29: 1200/1200MHz  1.00  25%   0%  25%   0%   0%   0%  49.4°C  1.2250V
23:29:40: 1200/1200MHz  1.00  25%   0%  25%   0%   0%   0%  49.9°C  1.2250V
23:29:50: 1200/1200MHz  1.00  25%   0%  25%   0%   0%   0%  50.5°C  1.2250V
23:30:00: 1200/1200MHz  1.00  25%   0%  24%   0%   0%   0%  50.5°C  1.2250V
23:30:10: 1200/1200MHz  1.00  25%   0%  25%   0%   0%   0%  50.5°C  1.2250V
23:30:20: 1200/1200MHz  1.00  25%   0%  24%   0%   0%   0%  50.5°C  1.2250V
23:30:30: 1200/1200MHz  1.00  25%   0%  24%   0%   0%   0%  51.0°C  1.2250V
23:30:40: 1200/1200MHz  1.00  25%   0%  24%   0%   0%   0%  50.5°C  1.2250V
23:30:50: 1200/1200MHz  1.00  25%   0%  24%   0%   0%   0%  51.5°C  1.2250V
23:31:00: 1200/1200MHz  1.00  25%   0%  25%   0%   0%   0%  52.1°C  1.2250V

System health while running 7-zip single core benchmark:

Time       fake/real   load %cpu %sys %usr %nice %io %irq   Temp    VCore
23:31:08: 1200/1200MHz  1.00  17%   0%  16%   0%   0%   0%  51.5°C  1.2250V
23:32:08: 1200/1200MHz  2.38  25%   0%  24%   0%   0%   0%  51.5°C  1.2250V

System health while running 7-zip multi core benchmark:

Time       fake/real   load %cpu %sys %usr %nice %io %irq   Temp    VCore
23:32:15: 1200/1200MHz  2.51  18%   0%  16%   0%   0%   0%  52.1°C  1.2250V
23:32:35: 1200/1200MHz  2.53  57%   1%  56%   0%   0%   0%  56.4°C  1.2250V

##########################################################################

dmesg output while running the benchmarks:

[  369.746053] process 'local/src/tinymembench/tinymembench' started with executable stack

##########################################################################

Linux 5.10.63-v7+ (tranzystorpl)    11/04/21    _armv7l_    (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          18.57    0.01    0.93    0.43    0.00   80.06

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
mmcblk0           8.78       264.39        94.44     260608      93093

              total        used        free      shared  buff/cache   available
Mem:          427Mi        90Mi       177Mi       0.0Ki       160Mi       286Mi
Swap:          99Mi        11Mi        88Mi

Filename                                Type            Size    Used    Priority
/var/swap                               file            102396  12032   -2

Architecture:        armv7l
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
Vendor ID:           ARM
Model:               4
Model name:          Cortex-A53
Stepping:            r0p4
CPU max MHz:         1200.0000
CPU min MHz:         600.0000
BogoMIPS:            76.80
Flags:               half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32