Wednesday, 3 June 2020

Is GPT-3 one more step towards artificial general intelligence?

How does the pandemic affect the long-distance race between (and now I borrow Nick Bostrom's metaphor) the galloping stallion which is humanity's technological capability, and the foal on unsteady legs which is humanity's wisdom? I don't know, but it does seem that AI development is the kind of activity that can move on pretty much unhampered by lockdowns etc. Last week, OpenAI announced their stunning natural language processor GPT-3, which is an enormously scaled-up sequel to their amazing and much-discussed GPT-2 last year. Yannic Kilcher at ETH has gone over their paper carefully and offers his insights in a very instructive video lecture:

The attentive viewer will notice that Kilcher, while accepting that GPT-3 does a bunch of impressive-looking things, suggests that what it does is not so much reasoning as pattern matching and copy-paste. This can be interpreted as Kilcher leaning in the direction of a negative answer to the question in the title of the present blog post. I am personally somewhat more inclined towards a positive answer, as is prominent tech blogger gwern:
    GPT-3 is scary because it’s a tiny model compared to what’s possible, with a simple uniform architecture trained in the dumbest way possible (prediction of next text token) on a single impoverished modality (random Internet text dumps) on tiny data (fits on a laptop), and yet, the first version already manifests crazy runtime meta-learning - and the scaling curves still are not bending! [...] In 2010, who would have predicted these enormous models would just develop all these capabilities spontaneously, aside from a few diehard connectionists written off as willfully-deluded old-school fanatics by the rest of the AI community? [...] GPT-3 is hamstrung by its training & data, but just simply training a big model on a lot of data induces meta-learning without even the slightest bit of meta-learning architecture being built in; and in general, training on more and harder tasks creates ever more human-like performance, generalization, and robustness.
The semi-fictional dialogue between an anonymous machine learning researcher, NN, and Scott Alexander, written in connection with last year's release of GPT-2, is worth recalling:
    NN: I still think GPT-2 is a brute-force statistical pattern matcher which blends up the internet and gives you back a slightly unappetizing slurry of it when asked.

    SA: Yeah, well, your mom is a brute-force statistical pattern matcher which blends up the internet and gives you back a slightly unappetizing slurry of it when asked.

1 comment:

  1. If the thorough dissection in the video wasn't enough, you could get informed opinions on this from NLP and cognitive science researchers at Chalmers and GU.
