You’d need ollama (local) and custom models from huggingface.
Half of the charm of using ollama is the ability to install models in one command, instead of searching for the correct file format and settings on huggingface.
for example:
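For what it's worth, here's a rough sketch of that "one command" idea against the local Ollama HTTP API (my own illustration, not from the post above): it assumes the default localhost:11434 endpoint and uses llama3 as a stand-in model name; the CLI equivalent is simply `ollama pull llama3` / `ollama run llama3`.

```python
# Sketch: pulling a model through the local Ollama HTTP API.
# Assumes the default endpoint; "llama3" is just a placeholder model name.
import requests

resp = requests.post(
    "http://localhost:11434/api/pull",
    json={"model": "llama3", "stream": False},
    timeout=3600,  # large models can take a while to download
)
resp.raise_for_status()
# With stream disabled, a successful pull returns a single status object
print(resp.json().get("status"))
```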
Isn’t that one also pretty censored? A really uncensored one is usually either built from scratch (Behemoth or Midnight-Miqu, for example) or named accordingly: mixtral-uncensored or llama3-abliterated.
I spotted 2 more. TBF, it’s kinda catchy. I could almost swear I saw someone on e621/YouTube/Twitter/etc. with that nickname.
Lily the Fox
Ella the Wolf
Nellie the Pig
Grizzly Bear
Furry. Furry never changes.


Would be cool if the damn thing stopped crashing each time I copy-paste folders. Like, optimizations are cool, but aren’t they kinda less important?
Disgusting
Do you have more stories like that, perchance?
Ollama does that by default, but prioritizes GPU above regular RAM and CPU. In fact, it’s another feature that often doesn’t work, because they can’t fix the damn bug that we reported a year ago: mmap. That feature lets you load and use a model directly from disk (although it’s incredibly slow, it lets you run something like DeepSeek, which weighs ~700GB, at 1-3 tokens/s). num_gpu lets you specify how many layers to load into GPU VRAM; the rest gets swapped to regular RAM.
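For illustration, here's a rough sketch of how those two knobs can be passed as request options to the local Ollama API (my own example, assuming the default localhost:11434 endpoint and a placeholder model name; both options are passed through to the llama.cpp runner, so exact behaviour can vary between Ollama versions).

```python
# Sketch: setting num_gpu and use_mmap per request via /api/generate.
# "deepseek-r1" is just a placeholder model name here.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1",
        "prompt": "Hello",
        "stream": False,
        "options": {
            "num_gpu": 20,     # how many layers to offload into GPU VRAM
            "use_mmap": True,  # map the model file from disk instead of fully loading it into RAM
        },
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["response"])
```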