Is anyone actually surprised by this?

  • ayaya@lemdro.id
    16·
    6 months ago

    This is mildly pedantic, but you’re not actually running DeepSeek R1; you’re running a 7B version of Qwen that’s been fine-tuned on DeepSeek R1 outputs. All of the “distilled” models are existing models fine-tuned on R1’s outputs.

      • stink@lemmygrad.ml
        134·
        6 months ago

        If you don’t know what you are doing, please stop trying to act like an expert on the subject.

        • mac@lemm.ee
          54·
          6 months ago

          When did they claim to be an expert??