The problem is simple: consumer motherboards don’t have that many PCIe slots, and consumer CPUs don’t have enough lanes to run 3+ GPUs at full PCIe gen 3 or gen 4 speeds.

My idea is to buy 3-4 computers for cheap, slot a GPU into each, and run them in tandem. I imagine this would require some sort of agent running on each node, with the nodes connected over a 10GbE network; I can get 10GbE running for this project.
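
To make the idea concrete, here's a minimal sketch of the kind of dispatcher I'm picturing, assuming each node runs a stock Ollama instance on its default port (the hostnames and model tag are placeholders):

```python
# Minimal sketch of the "agent per node" idea: each node runs a stock
# Ollama server and a dispatcher round-robins prompts across them.
# This parallelizes requests, not a single model; hostnames and the
# model tag are placeholders.
import itertools
import json
import urllib.request

NODES = ["node1", "node2", "node3", "node4"]  # hypothetical hostnames
ring = itertools.cycle(NODES)

def generate(prompt: str, model: str = "mistral-small:24b") -> str:
    node = next(ring)
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"http://{node}:11434/api/generate",  # Ollama's default API port
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(generate("Why is the sky blue?"))
```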

Does Ollama or any other local AI project support this? A server motherboard and CPU gets expensive very quickly, so this would be a great alternative.
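
From what I've read so far, llama.cpp's RPC backend can split a model's layers across machines like this. A rough sketch of driving it from the head node, assuming every box has llama.cpp built with GGML_RPC=ON and `rpc-server` already listening on the workers (IPs and model path are placeholders):

```python
# Sketch of a multi-node llama.cpp run using its RPC backend.
# Assumes each worker is already running something like:
#   rpc-server --host 0.0.0.0 --port 50052
# The head node then offloads layers to the workers over the network.
import subprocess

workers = ["192.168.1.11:50052", "192.168.1.12:50052",
           "192.168.1.13:50052", "192.168.1.14:50052"]  # placeholder IPs

subprocess.run([
    "llama-cli",
    "-m", "model.gguf",          # placeholder model path
    "--rpc", ",".join(workers),  # spread layers across the RPC workers
    "-ngl", "99",                # offload all layers to the (remote) GPUs
    "-p", "Hello",
])
```

Worth noting that splitting layers over the network adds per-token latency, so the 10GbE link helps but won't behave like on-board PCIe.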

Thanks

  • marauding_gibberish142@lemmy.dbzer0.com (OP) · 3 months ago

    I’m not going to do anything enterprise. I’m not sure why people keep framing it that way when I didn’t even mention it.

    I plan to use 4 GPUs with 16-24GB VRAM each to run smaller 24B models.
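
    Rough VRAM math for that plan (the bits/weight figure assumes a Q4-style quant; the overhead is a guess):

    ```python
    # Back-of-the-envelope VRAM check for one 24B model per GPU.
    # Assumptions: ~4.5 bits/weight (Q4_K_M-style quant) plus a rough
    # allowance for KV cache and runtime buffers.
    params = 24e9
    bits_per_weight = 4.5
    weights_gb = params * bits_per_weight / 8 / 1e9  # ~13.5 GB of weights
    overhead_gb = 2.5                                # KV cache etc. (guess)
    print(f"~{weights_gb + overhead_gb:.0f} GB per GPU")  # tight on 16 GB, fine on 24 GB
    ```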

    • False@lemmy.world · 3 months ago

      I didn’t say you were; I said you’re asking about a topic that enters that area.

    • Xanza@lemm.ee · 3 months ago

      I’m not going to do anything enterprise.

      You are, though. You’re creating a GPU cluster for generative AI, which is an enterprise endeavor…

      • marauding_gibberish142@lemmy.dbzer0.com (OP) · 3 months ago

        Only because PCIe slots and lanes go for a premium on consumer motherboards and CPUs. If I didn’t have to worry about PCIe, I wouldn’t care about a networked AI cluster. But yes, I accept what you’re saying.
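
        To put rough numbers on it: when each model fits on one GPU, a narrow link mostly costs model-load time, assuming ~0.985 GB/s of usable bandwidth per PCIe 3.0 lane:

        ```python
        # Why a narrow PCIe link mostly hurts load time, not generation,
        # once the whole model sits in one GPU's VRAM.
        # Assumption: ~0.985 GB/s usable per PCIe 3.0 lane.
        model_gb = 14         # quantized 24B model, per the earlier estimate
        lanes = 4             # e.g. a chipset x4 slot
        bw_gbs = 0.985 * lanes
        print(f"load time ≈ {model_gb / bw_gbs:.1f} s")  # ≈3.6 s over x4
        ```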