This repository was archived by the owner on Jan 28, 2026. It is now read-only.
Modify the registry key:
[HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers\MemoryManager] "SystemPartitionCommitLimitPercentage"
Set it to 0x4b (decimal 75), which allows up to 75% of system memory to be used as shared GPU memory.
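For reference, a minimal .reg file that applies this change (assuming the value is a REG_DWORD, which is the usual type for percentage settings under GraphicsDrivers; a reboot is typically required for it to take effect):

```reg
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers\MemoryManager]
; 0x4b = 75 decimal -> allow up to 75% of system RAM as shared GPU memory
"SystemPartitionCommitLimitPercentage"=dword:0000004b
```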
llama-cli.exe -m ..\models\Qwen3\Qwen3-32B-Q8_0.gguf -p "how to become an expert on GPU driver" -n 2048 -e -ngl 999 --color -c 2500 --temp 0 -no-cnv
Then the following issue occurs:
llama_context: SYCL_Host output buffer size = 0.58 MiB
llama_kv_cache_unified: SYCL0 KV buffer size = 632.00 MiB
llama_kv_cache_unified: size = 632.00 MiB ( 2528 cells, 64 layers, 1 seqs), K (f16): 316.00 MiB, V (f16): 316.00 MiB
llama_context: SYCL0 compute buffer size = 1497.80 MiB
llama_context: SYCL_Host compute buffer size = 73.54 MiB
llama_context: graph nodes = 2054
llama_context: graph splits = 2
common_init_from_params: setting dry_penalty_last_n to ctx_size = 2528
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
could not create a primitive descriptor for a matmul primitive
Exception caught at file:D:\actions-runner\release-cpp-oneapi_2024_2_work\llm.cpp\llm.cpp\llama-cpp-bigdl\ggml\src\ggml-sycl\ggml-sycl.cpp, line:3552, func:operator()
SYCL error: CHECK_TRY_ERROR(op(ctx, src0, src1, dst, src0_dd_i, src1_ddf_i, src1_ddq_i, dst_dd_i, dev[i].split_dim_low, dev[i].split_dim_high, src1_ncols, src1_padded_col_size, stream)): Exception caught in this line of code.
in function ggml_sycl_op_mul_mat at D:\actions-runner\release-cpp-oneapi_2024_2_work\llm.cpp\llm.cpp\llama-cpp-bigdl\ggml\src\ggml-sycl\ggml-sycl.cpp:3552
D:\actions-runner\release-cpp-oneapi_2024_2_work\llm.cpp\llm.cpp\llama-cpp-bigdl\ggml\src\ggml-sycl..\ggml-sycl\common.hpp:127: SYCL error
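As a sanity check, the reported KV buffer sizes are consistent with Qwen3-32B's attention layout. A short sketch, assuming the model-card values of 64 layers, 8 KV heads, and a head dimension of 128 (only the 2528 cells and 64 layers appear in the log itself):

```python
# Cross-check the llama.cpp KV-cache size report against Qwen3-32B's config.
n_cells = 2528      # context cells (from the log: "2528 cells")
n_layers = 64       # transformer layers (from the log: "64 layers")
n_kv_heads = 8      # grouped-query-attention KV heads (model-card assumption)
head_dim = 128      # per-head dimension (model-card assumption)
bytes_f16 = 2       # f16 KV cache stores 2 bytes per element

# One tensor (K or V) per layer: cells x layers x (kv_heads * head_dim) x 2 bytes
per_tensor_bytes = n_cells * n_layers * n_kv_heads * head_dim * bytes_f16
per_tensor_mib = per_tensor_bytes / 2**20

print(f"K: {per_tensor_mib:.2f} MiB, V: {per_tensor_mib:.2f} MiB, "
      f"total: {2 * per_tensor_mib:.2f} MiB")
# -> K: 316.00 MiB, V: 316.00 MiB, total: 632.00 MiB, matching the log
```

This matches the logged "K (f16): 316.00 MiB, V (f16): 316.00 MiB" and the 632.00 MiB total exactly, which suggests the failure is not a KV-cache sizing problem but occurs later, in the oneDNN matmul primitive during the warmup pass.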