The Five Best Things: Dec 31, 2020

Deep dive on GPT-3

Hi all, Happy New Year! I hope your 2021 is off to a better start than 2020, and that you were able to catch your breath over the last two weeks. I got thrown off my writing schedule a bit; my kids decided they would not let me sleep for a stretch of a few days.

An all-GPT-3 edition today. If you recall, we’ve talked about the attention mechanism underpinning most large natural language models, which tend to be based on the transformer architecture. An example of such a large model is OpenAI’s GPT-2. Today, let’s dig into its successor, GPT-3, which was released this year to a limited set of folks and caused a huge splash when demos built on it rolled out in June and July.
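
As a quick refresher on that attention mechanism, here’s a minimal NumPy sketch of scaled dot-product self-attention (single head, toy sizes, no masking or learned projections; this is the textbook formula, not OpenAI’s actual code):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V"""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query/key similarities
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                              # weighted sum of values

# Toy example: self-attention over 3 "tokens" with 4-dim embeddings (Q = K = V)
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(x, x, x))
```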

The Five Best Things

  1. GPT-3: What's Hype, What's Real

    • This podcast with Sonal Chokshi and Frank Chen is the best layperson overview of GPT-3 and its implications that I’ve come across. It’s only ~30 minutes, covers the technology and the demos released so far, and discusses whether it passes the Turing test and whether it’s going to be a job killer.

    • Some terminology: GPT-3 (and other large transformer models) are called few-shot or zero-shot learners. Instead of being fine-tuned on task-specific data, they’re shown a few examples (“shots”) of a task in the prompt, or none at all, and pick up the pattern from there. (There’s a prompt sketch after this list if you want to see it concretely.)

  2. Demos of GPT-3

  3. Ben Dickson: The GPT-3 Economy

    • After rolling out the API in June, OpenAI turned off the spigot in October, and Microsoft declared it would be exclusively licensing GPT-3 as part of its ongoing $1B partnership with OpenAI. The pricing model is quite interesting, and Ben Dickson discusses its implications here. Instead of open sourcing the model, OpenAI will charge for access on a subscription basis, limited to companies it deems ethical and to non-harmful uses. It’s unclear how much of this curation control rests with Microsoft. A few demos have already folded because they couldn’t afford the costs.

    • Lambda Labs estimated that a GPT-3-like model would require $4.6M for a single training run; the model likely went through several rounds of hyperparameter tuning, putting total training costs at 5x that amount or more. Post-training, the model also has to be served, at an estimated cost of $100,000 - $150,000/year. Add the research lab’s staff salaries on top and you’re easily looking at a $30M-$100M annual burn rate. Another estimate pegs OpenAI’s margins at ~60x its cloud operating costs. (I’ve reproduced the back-of-envelope math after this list.)

    • Andrew Mayne’s blog covers ways companies can reduce their GPT-3 costs. Ben’s follow-up post presents some interesting potential outcomes of OpenAI’s Microsoft partnership.

  4. Renee DiResta: The Supply of Disinformation Will Soon Be Infinite

    • Renee DiResta, research manager at the Stanford Internet Observatory, wrote a compelling piece on how GPT-3 and copycat technologies will let content farms and spreaders of dis- and mis-information shift even further into overdrive in the near future. When the marginal cost of generating malicious content trends toward zero, it removes any and all friction for bad actors. This position is corroborated by the Middlebury Institute of International Studies.

    • Renee is a complete badass and I highly recommend you follow her work. Her warnings were prescient: a GPT-3-based bot was found posting content and engaging with comments on Reddit, as if it were a real person.

    • Could identity verification on the internet finally be THE killer use case for Blockchain?

  5. Page Street Labs: GPT-3 and A Typology of Hype

    • An excellent framework from Delip Rao at Page Street Labs for assessing emerging technologies surrounded by lots of hype. I especially encourage you to read the summary!

    • If you’re in the mood for an even longer discussion about GPT-3, I suggest Gwern’s May 2020 newsletter. Yann LeCun also gives a measured opinion here.
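
As promised in item 1, here’s a rough sketch of what few-shot prompting looks like against the GPT-3 API. This follows the shape of OpenAI’s Python library during the 2020 private beta; the engine name, parameters, and the model’s output are illustrative, not guaranteed:

```python
import openai  # the 2020-era beta client: pip install openai

openai.api_key = "YOUR_API_KEY"  # beta access required

# Few-shot: the "training" is just a handful of worked examples in the
# prompt. No weights are updated; the model infers the task from the pattern.
prompt = """English: cheese
French: fromage

English: apple
French: pomme

English: bicycle
French:"""

response = openai.Completion.create(
    engine="davinci",   # the largest GPT-3 engine in the beta
    prompt=prompt,
    max_tokens=8,
    temperature=0,
    stop="\n",          # stop at the end of the answer line
)
print(response.choices[0].text.strip())  # hopefully: "vélo"

# Zero-shot is the same call with no examples at all, e.g.
# prompt = "Translate English to French:\nEnglish: bicycle\nFrench:"
```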

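And the back-of-envelope math from item 3, using only the estimates cited above (their numbers, my arithmetic):

```python
# All figures are the estimates cited in item 3, not measurements.
single_run = 4.6e6                 # Lambda Labs: ~$4.6M for one training run
training_total = 5 * single_run    # several tuning rounds => at least 5x
serving_low, serving_high = 100_000, 150_000  # estimated annual serving cost

print(f"Training (incl. tuning): >= ${training_total / 1e6:.0f}M")
print(f"Serving: ${serving_low:,}-${serving_high:,} per year")
# Layer staff salaries on top and a $30M-$100M annual burn rate for the
# lab as a whole is easy to believe.
```
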
Honorable Mentions

Disclaimer: The views and opinions expressed in this post are my own and do not represent my employer.