You’d need ollama (local) and custom models from huggingface.
Half of the charm of using ollama is the ability to install models in one command, instead of searching for the correct file format and settings on huggingface.
for example:
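For what it's worth, here's a rough sketch of that "one command" idea against the local Ollama HTTP API (my own illustration, not from the post above): it assumes the default localhost:11434 endpoint and uses llama3 as a stand-in model name; the CLI equivalent is simply `ollama pull llama3` / `ollama run llama3`.

```python
# Sketch: pulling a model through the local Ollama HTTP API.
# Assumes the default endpoint; "llama3" is just a placeholder model name.
import requests

resp = requests.post(
    "http://localhost:11434/api/pull",
    json={"model": "llama3", "stream": False},
    timeout=3600,  # large models can take a while to download
)
resp.raise_for_status()
# With stream disabled, a successful pull returns a single status object
print(resp.json().get("status"))
```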
Isn’t that one also pretty censored? A really uncensored one is usually either built from scratch (Behemoth or Midnight-Miqu, for example) or named accordingly: mixtral-uncensored or llama3-abliterated.
I spotted 2 more. TBF, it’s kinda catchy. I could almost swear I saw someone on e621/YouTube/Twitter/etc. with that nickname.
Lily the Fox
Ella the Wolf
Nellie the Pig
Grizzly Bear
Furry. Furry never changes.


Would be cool if the damn thing stopped crashing each time I copy-paste folders. Like, optimizations are cool, but aren’t they kinda less important?
Disgusting
Do you have more stories like that, perchance?
Ollama does that by default, but prioritizes GPU above regular RAM and CPU. In fact, it’s another feature that often doesn’t work, because they can’t fix the damn bug that we reported a year ago: mmap. That feature lets you load and use a model directly from disk (although it’s incredibly slow, it lets you run something like DeepSeek, which weighs ~700GB, at 1-3 tokens/s). num_gpu lets you specify how many layers to load into GPU VRAM; the rest gets swapped to regular RAM.
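For illustration, here's a rough sketch of how those two knobs can be passed as request options to the local Ollama API (my own example, assuming the default localhost:11434 endpoint and a placeholder model name; both options are passed through to the llama.cpp runner, so exact behaviour can vary between Ollama versions).

```python
# Sketch: setting num_gpu and use_mmap per request via /api/generate.
# "deepseek-r1" is just a placeholder model name here.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1",
        "prompt": "Hello",
        "stream": False,
        "options": {
            "num_gpu": 20,     # how many layers to offload into GPU VRAM
            "use_mmap": True,  # map the model file from disk instead of fully loading it into RAM
        },
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["response"])
```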