



Just beware that Intel GPUs, like AMD ones, take a performance hit when running LLMs because of the CUDA-specific optimizations in frameworks like llama.cpp.
I’ve heard good things about Instander, but it may not satisfy all the requirements of an alternative frontend.
Do you have any tips (or examples) for using quadlets? I tried them but couldn’t wrap my head around them.