========== onnx ==========
========== onnxsim ==========
========== model-convert ==========
rm -rf /data/output
ax650n npu mode:3
pulsar2 build --input /tmp/datadir/ax650n_profiler/resnet18-e657bb.onnx --output_dir /data/output --config /data/config/bgr_npu3_config.json
Building onnx ---------------------------------------- 100% 0:00:00
Quant Config Table
+-------+------------------+-------------------+-------------+---------------+--------------------------------------------------------------+--------------------+
| Input | Shape            | Dataset Directory | Data Format | Tensor Format | Mean                                                         | Std                |
+-------+------------------+-------------------+-------------+---------------+--------------------------------------------------------------+--------------------+
| input | [1, 3, 182, 278] | input             | Image       | BGR           | [103.93900299072266, 116.77899932861328, 123.68000030517578] | [58.0, 58.0, 58.0] |
+-------+------------------+-------------------+-------------+---------------+--------------------------------------------------------------+--------------------+
Transformer optimize level: 0
32 File(s) Loaded.
[20:25:45] AX LSTM Operation Format Pass Running ... Finished.
[20:25:45] AX Set MixPrecision Pass Running ... Finished.
[20:25:45] AX Refine Operation Config Pass Running ... Finished.
[20:25:45] AX Reset Mul Config Pass Running ... Finished.
[20:25:45] AX Tanh Operation Format Pass Running ... Finished.
[20:25:45] AX Confused Op Refine Pass Running ... Finished.
[20:25:46] AX Quantization Fusion Pass Running ... Finished.
[20:25:46] AX Quantization Simplify Pass Running ... Finished.
[20:25:46] AX Parameter Quantization Pass Running ... Finished.
[20:25:46] AX Runtime Calibration Pass Running ... Finished.
[20:26:04] AX Passive Parameter Quantization Running ... Finished.
[20:26:04] AX Parameter Baking Pass Running ... Finished.
[20:26:04] AX Refine Int Parameter Pass Running ... Finished.
[20:26:04] AX Refine Weight Parameter Pass Running ... Finished.
--------- Network Snapshot ---------
Num of Op: [49]
Num of Quantized Op: [49]
Num of Variable: [93]
Num of Quantized Var: [93]
------- Quantization Snapshot ------
Num of Quant Config: [149]
BAKED: [21]
OVERLAPPED: [74]
ACTIVATED: [32]
SOI: [1]
PASSIVE_BAKED: [21]
Network Quantization Finished.
quant.axmodel export success: /data/output/quant/quant_axmodel.onnx
===>export input/output data to folder: /data/output/quant/debug/test_data_set_0
Building native ---------------------------------------- 100% 0:00:00
tiling op... ---------------------------------------- 34/34 0:00:00
new_ddr_tensor = []
build op... ---------------------------------------- 200/200 0:00:02
add ddr swap... ---------------------------------------- 944/944 0:00:00
calc input dependencies... ---------------------------------------- 1270/1270 0:00:00
calc output dependencies... ---------------------------------------- 1270/1270 0:00:00
assign eu heuristic ---------------------------------------- 1270/1270 0:00:00
assign eu onepass ---------------------------------------- 1270/1270 0:00:00
assign eu greedy ---------------------------------------- 1270/1270 0:00:00
2023-09-19 20:25:43.825 | WARNING | yamain.command.build:fill_default:320 - ignore input csc config because of src_format is AutoColorSpace or src_format and tensor_format are the same
2023-09-19 20:25:44.288 | INFO | yamain.command.build:build:444 - save optimized onnx to [/data/output/frontend/optimized.onnx]
2023-09-19 20:25:44.289 | INFO | yamain.common.util:extract_archive:21 - extract [/data/dataset/imagenet-32-images.tar] to [/data/output/quant/dataset/input]...
Calibration Progress(Phase 1): 0%| | 0/32 [00:00, 'quant_method': 0}
2023-09-19 20:26:05.913 | INFO | yamain.command.load_model:pre_process:456 - tensor: tensor:pre_norm_1, (1, 182, 278, 3), FP32
2023-09-19 20:26:05.913 | INFO | yamain.command.load_model:pre_process:456 - op: op:pre_norm_1, AxNormalize, {'dim': 3, 'mean': [103.93900299072266, 116.77899932861328, 123.68000030517578], 'std': [58.0, 58.0, 58.0]}
2023-09-19 20:26:05.913 | INFO | yamain.command.load_model:pre_process:456 - tensor: tensor:pre_transpose_1, (1, 182, 278, 3), FP32
2023-09-19 20:26:05.913 | INFO | yamain.command.load_model:pre_process:456 - op: op:pre_transpose_1, AxTranspose, {'perm': [0, 3, 1, 2]}
:186: RuntimeWarning: divide by zero encountered in divide
:187: RuntimeWarning: invalid value encountered in divide
2023-09-19 20:26:09.260 | INFO | yasched.test_onepass:results2model:2004 - max_cycle = 4,233,946
2023-09-19 20:26:09.756 | INFO | yamain.command.build:compile_npu_subgraph:1076 - QuantAxModel macs: 27,685,496,576
2023-09-19 20:26:09.756 | INFO | yamain.command.build:compile_npu_subgraph:1084 - use random data as gt input: input, uint8, (1, 182, 278, 3)
2023-09-19 20:26:17.779 | INFO | yamain.command.build:compile_ptq_model:1003 - fuse 1 subgraph(s)
========== copy-to-device ==========
scp /data/output/compiled.axmodel khj@10.1.52.47:~/
========== profiling ==========
ssh khj@10.1.52.47 "sudo /opt/bin/ax_run_model -m compiled.axmodel -w 10 -r 100"
Run AxModel:
model: compiled.axmodel
type: NPU3
vnpu: Disable
affinity: 0b001
repeat: 100
warmup: 10
batch: 1
pulsar2 ver: 1.9 c62d0b64
engine ver: [Axera version]: libax_engine.so V1.27.0_P3_20230627143603 Jun 27 2023 14:58:22 JK 1.1.0
tool ver: 1.0.0
cmm size: 11390772 Bytes
------------------------------------------------------
min = 5.422 ms max = 5.470 ms avg = 5.426 ms
------------------------------------------------------
sudo: unable to resolve host maixbox: Temporary failure in name resolution
rm -rf compiled.axmodel
========== megpeak CPU perf ==========
there are 16 cores, currently use core id :0
Vendor is: Intel, uArch: unknown, frequency: 0Hz
bandwidth: 29.318362 Gbps
vfmadd132ps_avx throughput: 0.105750 ns 151.300568 GFlops latency: 0.841503 ns :
vfmadd132pd_avx throughput: 0.106073 ns 75.419495 GFlops latency: 0.848878 ns :
vpmaddwd_avx2 throughput: 0.116324 ns 206.319443 GFlops latency: 1.060577 ns :
vpaddd_avx2 throughput: 0.070282 ns 113.827446 GFlops latency: 0.211323 ns :
vpand_avx2 throughput: 0.070107 ns 114.111679 GFlops latency: 0.211675 ns :
vpmaddwd_vpaddd_avx2 throughput: 0.170288 ns 187.917419 GFlops latency: 1.356159 ns :
vpackssdw_avx2 throughput: 0.211504 ns 75.648819 GFlops latency: 0.630753 ns :
vpacksswb_avx2 throughput: 0.210157 ns 152.267380 GFlops latency: 0.632621 ns :
vpmaddwd_512 throughput: 0.211212 ns 227.259674 GFlops latency: 1.058113 ns :
vpaddd_512 throughput: 0.105371 ns 151.844269 GFlops latency: 0.211014 ns :
vfmadd132ps_512 throughput: 0.211028 ns 151.638626 GFlops latency: 0.848180 ns :
vpdpbusd_vnni throughput: 0.213610 ns 524.321045 GFlops latency: 1.069391 ns :
mulps_sse throughput: 0.106581 ns 75.060257 GFlops latency: 0.853457 ns :
mulpd_sse throughput: 0.106173 ns 37.674252 GFlops latency: 0.848591 ns :
vfmadd132ps_sse throughput: 0.106421 ns 75.173355 GFlops latency: 0.844646 ns :
vpmaddwd_vpaddd_sse throughput: 0.169533 ns 94.377151 GFlops latency: 1.351309 ns :
========== task ==========
taskid: 1013
receiver: ax650n profiler