
What OpenAI’s Codex Actually Does For Programmers


In July 2020, OpenAI made waves with a private beta of its text-generating language model, Generative Pre-trained Transformer 3 (GPT-3). GPT-3 is a deep learning application in the branch of artificial intelligence (AI) known as natural language processing (NLP), which concerns teaching computers to understand and produce human language. GPT-3 can do amazing things:

  • Respond to questions.
  • Write document summaries.
  • Create Shakespeare-like poems.
  • Write articles and customized resumes.
  • Translate text into another language in real time.
  • Anything else that can be imagined with text.

Access to GPT-3 is available via an API. Those who need more direct control over developing their own deep learning models have other options; one is to learn to scale deep learning infrastructure with Spell.ml’s technology.

There is one task GPT-3 was not tuned for: writing source code for programs. OpenAI had plans for that in the form of a GPT-3 descendant named Codex. Codex is a specialized GPT model, trained on publicly available code from GitHub, that can produce Python code from natural language (a function signature and description). GPT-3 was not trained specifically on code; Codex, by contrast, is a 12-billion-parameter model fine-tuned on 159 gigabytes of GitHub code samples.
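To make that concrete, here is a hypothetical prompt of the kind Codex consumes (a function signature plus a docstring), followed by the sort of body the model might produce. The function and its completion are illustrative examples, not actual Codex output:

    # Prompt given to the model: a function signature plus docstring (hypothetical)
    def count_vowels(text: str) -> int:
        """Return the number of vowels (a, e, i, o, u) in `text`, case-insensitively."""
        # --- a plausible machine-generated completion begins here ---
        return sum(1 for ch in text.lower() if ch in "aeiou")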

Expected to be released later this summer (of 2021), Codex solves 28.8 percent of the problems in OpenAI’s HumanEval benchmark on its first attempt. For comparison, GPT-3 solves none at all.

Codex’s success rate improves with supervision and with extra attempts. Codex-S, a version of Codex refined via supervised fine-tuning, raises first-attempt performance to 37.7 percent. And given 100 attempts per problem, Codex produces a correct solution for about 72 percent of coding problems.
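OpenAI quantifies this with the pass@k metric: the probability that at least one of k generated samples for a problem passes all unit tests. The snippet below reproduces the unbiased estimator given in the Codex paper; the sample counts in the usage example are made up for illustration:

    import numpy as np

    def pass_at_k(n: int, c: int, k: int) -> float:
        """Unbiased pass@k estimator from the Codex paper.

        n: total samples generated per problem
        c: number of those samples that pass all unit tests
        k: number of attempts allowed
        """
        if n - c < k:
            return 1.0  # every size-k draw must contain a correct sample
        return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

    # Illustrative numbers: 30 of 200 samples for a problem are correct.
    print(pass_at_k(200, 30, 1))    # ~0.15 (pass@1)
    print(pass_at_k(200, 30, 100))  # close to 1.0 (pass@100)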

OpenAI and GitHub have teamed up to deliver a new tool, GitHub Copilot, that produces source code for programmers. The system, powered by Codex, can furnish code in almost any programming language, although it works best with scripting languages such as Python and JavaScript. OpenAI is expected to release the underlying online service this summer.

Even though Codex cannot reliably solve every coding problem, it could increase productivity among AI developers, who might delegate relatively trivial coding tasks to it. Codex will never replace programmers or AI developers, though; developers will always need to review and revise the code it generates.

Codex’s Training Model

In its paper on developing the Codex training model, OpenAI states, “Our training dataset was collected in May 2020 from 54 million public software repositories hosted on GitHub, containing 179 GB of unique Python files under 1 MB. We filtered out files that were likely auto-generated, had an average line length greater than 100, had a maximum line length greater than 1000, or contained a small percentage of alphanumeric characters. After filtering, our final dataset totaled 159 GB”.
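A minimal sketch of what those filtering rules might look like in code appears below. The 100-character average and 1,000-character maximum line lengths come straight from the quote; the auto-generation marker and the 25 percent alphanumeric cutoff are assumed stand-ins, since the paper does not publish those exact heuristics:

    def keep_python_file(source: str, alnum_cutoff: float = 0.25) -> bool:
        """Apply the quoted filtering rules to one file's contents."""
        lines = source.splitlines()
        if not lines:
            return False
        if "auto-generated" in source.lower():  # crude auto-generation check (assumption)
            return False
        if sum(len(l) for l in lines) / len(lines) > 100:  # average line length
            return False
        if max(len(l) for l in lines) > 1000:  # maximum line length
            return False
        if sum(ch.isalnum() for ch in source) / len(source) < alnum_cutoff:
            return False  # file is mostly non-alphanumeric (threshold is an assumption)
        return True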

Codex works by generating standalone Python functions from function signatures and descriptions (‘docstrings’). These functions are then evaluated for correctness by passing them through unit tests. This contrasts with natural language generation, wherein samples are usually evaluated by heuristics or by human evaluators.
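As a toy version of that evaluation loop, the harness below executes a candidate completion against a fixed prompt and runs two hand-written unit tests. It is a sketch only: real harnesses such as OpenAI’s HumanEval run untrusted generated code in a sandbox, which this example omits:

    PROMPT = '''def count_vowels(text: str) -> int:
        """Return the number of vowels in `text`, case-insensitively."""
    '''

    def functionally_correct(completion: str) -> bool:
        """Execute PROMPT + completion, then run unit tests against the result."""
        namespace = {}
        try:
            exec(PROMPT + completion, namespace)  # NOTE: no sandboxing in this sketch
            func = namespace["count_vowels"]
            assert func("Codex") == 2
            assert func("GPT") == 0
            return True
        except Exception:
            return False

    # One correct and one incorrect candidate completion:
    good = "    return sum(1 for ch in text.lower() if ch in 'aeiou')\n"
    bad = "    return len(text)\n"
    print(functionally_correct(good))  # True
    print(functionally_correct(bad))   # False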

It’s important to note that Codex does not understand programming, no matter how good its output code is. Like all deep learning-based language models, Codex draws on statistical correlations among the code fragments it encountered during training to generate new samples. If a sample passes all unit tests, it is said to have achieved ‘functional correctness,’ the metric OpenAI recommends for assessing code-generating language models like Codex.

Codex uses the contents of the file you are working on as context when generating its output. If your code contains subtle bugs, Codex may suggest code that appears good but is incorrect, OpenAI researchers warn. Codex may also produce deprecated code or code with security vulnerabilities. In some cases, the model may knit together fragments of code it has seen before, even when they don’t mesh correctly.

A Large Language Model

OpenAI’s findings suggest that deep learning is still governed by the ‘no free lunch’ (NFL) theorem: generalization comes at the cost of performance. Language models are more accurate when designed to solve one specific problem; when the problem domain is expanded, performance decreases. GPT-3, the broader language model, was trained with 175 billion parameters (compared to just 12 billion for Codex), yet it cannot generate any correct source code.

Training language models on large datasets can lead to overfitting, where the model becomes good at memorizing and reproducing its training examples but poor at handling new situations. Another problem for deep learning language models is memory span: the text a model generates becomes incoherent as it grows longer. Experiments show that larger neural networks (the core of a deep learning system, loosely modeled on the human brain) maintain coherence over longer spans, which also increases the potential for misuse.

A New AI Programming Partner

While GPT-3 cannot furnish source code, Codex holds the promise of an ‘AI pair programmer’: a tool that writes source code on behalf of the developer, who still has to review and perhaps revise the Codex sample.

The emergence of Codex signifies an increase in the learning capacity of language models. Since Codex is a deep learning model, it can perform tasks it wasn’t explicitly trained for, and with additional custom training it could be tuned to deliver more complex AI solutions.

OpenAI admits Codex still has generalization issues in its machine learning (ML) processes. Codex is better at performing a specific task than a range of tasks; GPT-3, by contrast, is a general language model that can produce high-quality text on various topics, though it can’t write code the way Codex does.
