• MrSulu@lemmy.ml
    English
    arrow-up
    33
    arrow-down
    1
    ·
    2 days ago

    It really is the equivalent of having an unsupervised Cookie Monster rebuild your car engine (yes, two very old and outdated references).

  • TankieTanuki [he/him]@hexbear.net
    English
    arrow-up
    25
    arrow-down
    2
    ·
    2 days ago

    Open-source developer Scott Shambaugh deleted an AI-generated code submission to Matplotlib, which has 130 million users. In response, the AI agent (of unknown ownership) created a blog post publicly lashing out at him.

    wut

  • mindwanderer@feddit.org
    English
    arrow-up
    7
    arrow-down
    10
    ·
    2 days ago

    Could you use AI to evaluate whether the code submissions are actually valuable? It would not work 100% of the time, but it might help. I mean, we are fucked at this point, so you might as well try.

    Fighting fire with fire, I guess.

    Just run it locally and not in a data center. Don't support big tech and help them destroy our planet.

    It is truly interesting how AI is mostly a solution to problems it caused in the first place.

    • brucethemoose@lemmy.world
      arrow-up
      1
      ·
      edit-2
      8 hours ago

      I suspect it would work. But the false positive rate would be really high.

      In other words, they could probably detect sloppy junk reasonably well, but I suspect it would flag too many human PRs to make the automation particularly useful.


      That, and the good-seeming vibe-coded PRs are the ones to worry about. Those are the ones that seem to slot in, but might have an error or general misunderstanding somewhere in them that’s just really hard to detect, as it would be common sense to a human working on the project, but not to an LLM agent.

      As a random specific example, I had a local LLM + Gemini 3.1 fix this issue with a Rimworld mod for me. It was really simple; just changing one line in an XML file.

      But neither of them realized the change was, ultimately, bad practice. They re-defined something inherited from a parent class, which would prevent other mods’ changes in that parent class chain from percolating down to this. Any basic Rimworld modder would know this is a recipe for trouble, but an LLM isn’t cognizant like that and has no clue.
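
      A rough sketch of why that kind of re-definition bites, in Python rather than Rimworld's XML. This is not the game's actual loader; the `defs` table, the `resolve` helper, and all the def names are made up for illustration. It only assumes that a child def falls back to its parent for any field it does not set itself.

```python
# Hypothetical def table: each entry has an optional parent and its own fields.
defs = {
    "BaseGun": {"parent": None, "fields": {"range": 20}},
    # Inherits "range" from BaseGun.
    "Pistol": {"parent": "BaseGun", "fields": {}},
    # Same value today, but re-defined locally instead of inherited.
    "PistolRedefined": {"parent": "BaseGun", "fields": {"range": 20}},
}


def resolve(name, field):
    """Walk up the parent chain and return the first value found for field."""
    while name is not None:
        entry = defs[name]
        if field in entry["fields"]:
            return entry["fields"][field]
        name = entry["parent"]
    raise KeyError(field)


# Before any other mod touches the parent, both children look identical.
before = (resolve("Pistol", "range"), resolve("PistolRedefined", "range"))

# Another (hypothetical) mod patches the parent def.
defs["BaseGun"]["fields"]["range"] = 30

# Only the child that still inherits picks up the change.
after = (resolve("Pistol", "range"), resolve("PistolRedefined", "range"))
print(before, after)
```

      Both children resolve to the same value until something changes the parent; after that, the re-defined one silently keeps the old value, which is exactly the kind of thing that looks fine in review.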

      Now: imagine that, but in a huge PR for a complex codebase.

      It’s just too much to look for. The LLM could make a non-obvious, “inhuman” mistake at any point.

    • IsaamoonKHGDT_6143@lemmy.zip
      English
      arrow-up
      3
      ·
      21 hours ago

      It depends on which AI model you’re using. If it’s an older one, it’s not recommended.

      If it’s a recently released model, then it’s probably acceptable. However, don’t use it for highly critical tasks unless you have a backup or two. Of course, you could also verify the code’s integrity as an extra safety measure.

    • luciferofastora@feddit.org
      arrow-up
      1
      arrow-down
      1
      ·
      13 hours ago

      Ah yes, let’s let the AI qualify AI code submissions.

      At that point, why not automate the whole process? Have an AI guess what kind of software you might be interested in, slop it together, evaluate and criticise it, suggest amendments, evaluate the amendment, include it, build the product, ship it, install it directly to your machine for your convenience, then proceed to operate it for you so you can automate sloppy execution of a sloppy task you never wanted to do, in a sloppy tool you never asked for, for a purpose a random generator slopped together without your input.

      • mindwanderer@feddit.org
        English
        arrow-up
        2
        ·
        8 hours ago

        The problem is that FOSS devs get spammed with the same bugs a thousand times over: found by the same tools and dumped into bug reports with no human oversight, by people who have no clue what they are doing and often don’t offer any suggestion for how to solve the problem.

        My idea was to use AI to identify the 1000 bug reports about the same issue and make sure that you don’t have to read every single one of them. That way you could sort out the spam and reduce the amount of slop the devs have to deal with.
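
        A minimal sketch of the grouping step, with a plain word-overlap (Jaccard) heuristic standing in for the AI model. The function names, the 0.6 threshold, and the sample reports are all invented for illustration; a real setup would need something smarter than word overlap and a human checking the groups.

```python
import re


def tokens(text):
    """Lowercase word tokens of a report, used as a crude fingerprint."""
    return frozenset(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a, b):
    """Share of tokens two reports have in common (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_duplicates(reports, threshold=0.6):
    """Greedy grouping: a report joins the first group whose first member
    is similar enough, otherwise it starts a new group."""
    groups = []           # lists of report indices
    representatives = []  # token set of the first report in each group
    for index, report in enumerate(reports):
        fingerprint = tokens(report)
        for group, rep in zip(groups, representatives):
            if jaccard(fingerprint, rep) >= threshold:
                group.append(index)
                break
        else:
            groups.append([index])
            representatives.append(fingerprint)
    return groups


# Made-up sample reports, not taken from any real tracker.
reports = [
    "Null pointer dereference in parse_config found by scanner",
    "null pointer dereference in parse_config found by scanner tool",
    "Docs typo in the install guide",
]
print(group_duplicates(reports))
```

        With groups like these, a maintainer would only need to read one report per group instead of every copy.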

    • idriss@lemmy.ml
      arrow-up
      16
      arrow-down
      1
      ·
      2 days ago

      Have a really bad dev next to you? Just hire the same dev a second time and tell them they are now a full-time reviewer.

    • ToastedRavioli@midwest.social
      arrow-up
      9
      arrow-down
      1
      ·
      2 days ago

      Good luck getting an AI that’s trained to blow smoke up everyone’s ass to actually critically assess anything, instead of just saying everything is amazing.