Date: 2026-02-03

OpenAI has released a new macOS app for Codex, aiming to close the gap with rival agentic coding tools by integrating multi-agent workflows, background automation, and a more flexible interface for complex software development tasks.

Codex Expands Beyond The Command Line

The new Codex app represents OpenAI’s most significant update to its coding tools since Codex first launched as a command-line product last April and later gained a web interface. The macOS app is built to support parallel work by multiple AI agents, reflecting the rapid shift toward agentic software development, where AI systems independently handle substantial portions of coding work.

The launch comes less than two months after OpenAI introduced GPT-5.2-Codex, its most advanced coding model to date, as the company seeks to attract developers currently using competing tools such as Claude Code and Cowork.

Focus On Agentic Workflows

According to OpenAI, the Codex app incorporates practices that have become common among developers experimenting with human-AI collaboration. The app allows multiple agents to operate simultaneously, share state, and apply specialized skills across tasks, enabling workflows that go beyond single-prompt interactions.

The tool also supports background automations that can run on a schedule, placing completed results into a queue for later review. OpenAI said this is intended to let developers offload longer-running tasks while focusing on higher-level design decisions.

Executive Perspective On Model Capability

Speaking during a press call, Sam Altman said the company believes the combination of GPT-5.2-Codex and the new interface significantly improves usability for complex projects.

Altman said GPT-5.2 offers the strongest performance for sophisticated coding tasks but acknowledged that earlier versions were harder to use effectively. He said the macOS app is designed to make that model capability more accessible through a flexible, agent-oriented interface.

Benchmark Results Remain Close

Despite OpenAI’s confidence, available benchmarks suggest a more competitive landscape. GPT-5.2 currently leads TerminalBench, which measures command-line programming performance, but agents built on Google’s Gemini 3 and Anthropic’s Claude Opus have posted scores within the benchmark’s margin of error.

Results from SWE-bench, which evaluates how models fix real-world software bugs, also show no clear performance separation among leading models. Researchers have noted that agentic coding workflows are difficult to benchmark consistently, and user experience can vary widely depending on implementation.

Customization And User Experience

In addition to automation, the Codex app allows users to select different agent personalities, such as pragmatic or empathetic, to better match individual working styles. OpenAI said these options are designed to improve collaboration between developers and AI agents rather than alter the underlying technical output.

The company positions these features as steps toward parity with, or in some cases advantages over, competing Claude-based applications.

Speed As The Core Value Proposition

OpenAI’s central argument for Codex remains speed. Altman said the tool allows developers to move from a blank slate to a sophisticated software project in a matter of hours, with progress limited mainly by how quickly ideas can be articulated.

The macOS app marks OpenAI’s latest attempt to keep pace in a rapidly evolving market where AI-driven software development tools are advancing faster than traditional labs can easily track.


