Are there any uncensored AI assistants?

monis@ttrpg.network · 2 days ago


De Lancre@lemmy.world · 21 hours ago

Is there a general term for the setting that offloads the model into RAM? I’d love to be able to load larger models.

Ollama does that by default, but prioritizes the GPU over regular RAM and the CPU. In fact, it’s another feature that often doesn’t work, because they can’t fix the damn bug we reported a year ago: mmap. That feature lets you load and use a model directly from disk (albeit incredibly slowly, but it lets you run something like DeepSeek, which weighs ~700 GB, at 1-3 tokens/s at least).

num_gpu lets you specify how many of the model’s layers to load into GPU VRAM; the rest stays in regular RAM.
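
For anyone who wants to try it, here is a rough sketch of passing num_gpu per request through Ollama's local HTTP API. As far as I know the value is a layer count and goes under "options" in the request body, alongside use_mmap. The model name "llama3", the value 20, and the helper names build_request and generate are just placeholders I made up; the URL assumes the default local port.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt, num_gpu, use_mmap=None):
    """Build the JSON body for a non-streaming generate call (hypothetical helper)."""
    options = {"num_gpu": num_gpu}  # number of layers to place in GPU VRAM
    if use_mmap is not None:
        options["use_mmap"] = use_mmap  # only sent when explicitly requested
    return {"model": model, "prompt": prompt, "stream": False, "options": options}


def generate(payload, url=OLLAMA_URL):
    """POST the payload to the server and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# "llama3" and 20 are placeholder values; pick whatever fits your VRAM.
body = build_request("llama3", "Hello", num_gpu=20)
print(json.dumps(body, indent=2))
# print(generate(body))  # needs a running Ollama server with the model pulled
```

Lower num_gpu if you run out of VRAM, or set it to 0 to keep everything on the CPU side. The same parameter can also be set in a Modelfile with a PARAMETER line if you don't want to send it on every request.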