A Closer Look at Devin, the AI Programmer That Does Not Decide What to Build

In March, an artificial intelligence system called Devin, touted as “the world’s first AI programmer,” made a stunning entrance into the industry. Its abilities quickly drew widespread attention inside and outside the field. Devin was reported to be capable of planning and executing complex engineering tasks involving thousands of decision points while recalling the relevant context at each step. It was also said to learn over time and correct its own mistakes.

Recently, former LangChain employee Andrew Gao revealed new features of the upcoming Devin 2.0. A notable addition is an interactive mode that lets Devin navigate the web more effectively, which is particularly useful when it runs into obstacles such as image CAPTCHAs. Although it is admittedly somewhat slow, the feature is mature enough for basic clicking operations.

Another improvement that has drawn attention is that developers can now edit code directly in a web-based VS Code, which resolves the earlier problem of being unable to step in and edit the code. The new version also adds a “Cookie” feature that lets users log into websites with their own accounts without disclosing passwords to Devin, similar to the mechanism used by PhantomBuster.

Andrew explained the improved features and capabilities of Devin through a practical example. He demonstrated how Devin successfully found the Wingstop restaurant on DoorDash, selected chicken wings, and properly handled various checkboxes.

In addition to being more adept at building websites, Devin now has a “machine snapshot” feature that lets users save Devin’s state, so that they can restart and return to the previous state even if the server is shut down. Devin has also integrated GitHub functionality to perform code commits.

However, Cognition, the company behind Devin, has not officially released these new features. In a recent interview, its founder also stayed silent on the allegations that Devin’s demo was deceptive.

The two most notable moments for Devin were its debut on March 13 and the “fraud” allegations it faced more than two weeks later. At the beginning of last month, a blogger named Carl, who claimed to have 35 years of software engineering experience, conducted a frame-by-frame analysis of Devin’s demo video and raised questions. Carl’s concerns focused on several aspects:

The video suggests that Devin can solve any Upwork task; however, the problem Devin solved did not match the client’s explicit requirements (the client asked for documentation, not code);

When fixing errors in the source code of a GitHub repository, Devin edited a file that did not exist in the actual repository, and the error it fixed would not occur in reality. It is the kind of mistake a human would not make;

In the EC2 section, no coding was actually necessary because the repository’s README file already contained all the instructions needed to complete the task; the code could run with only a single adjustment, even on an older version of the repository.

Clients usually state clearly what they need and expect. Here, the client specifically asked for guidance on how to run the program on an EC2 server rather than for programming work. Devin, without fully understanding the README, failed to realize that all that was needed was to run some pre-made Python scripts. The output shown in the video appears to depict a complex task, with detailed planning and numerous completed checkboxes, but in reality these steps were meaningless and superfluous.

Devin’s code changes are not of good quality; for example, Devin wrote an inefficient file-reading loop instead of using the standard library. The video shows Devin completing the task quickly, and Carl himself completed the requested work in about 30 minutes, yet the timestamps in the chat logs indicate that the task actually took several hours and even extended into the next day. In addition, Devin ran some meaningless shell commands, such as “head -n 5 foo | tail -n 5”.
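
The last two points are easy to check. The following is a minimal Python sketch, not code from Devin’s demo: the helper names head, tail, and read_lines are hypothetical, and the loop Devin actually wrote is not reproduced here. It shows that taking the last five of the first five lines returns the same lines as taking the first five alone, and what reading a file through the standard library typically looks like.

```python
from collections import deque
from itertools import islice


def head(lines, n):
    """Return the first n items, like `head -n <n>` (hypothetical helper)."""
    return list(islice(lines, n))


def tail(lines, n):
    """Return the last n items, like `tail -n <n>` (hypothetical helper)."""
    return list(deque(lines, maxlen=n))


def read_lines(path):
    """Read a text file by iterating over the file object directly.

    The file object already yields one line at a time, so no manual
    character-by-character or index-based loop is needed.
    """
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


if __name__ == "__main__":
    sample = [f"line {i}" for i in range(1, 21)]
    # `head -n 5 foo | tail -n 5` keeps the last five of the first five
    # lines, which is simply the first five lines again.
    assert tail(head(sample, 5), 5) == head(sample, 5)
```

The same holds when the file has fewer than five lines, so the tail step in that pipeline never changes the output.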

In light of this, Carl believes that Cognition Labs exaggerated Devin’s abilities and lied in its video descriptions and tweets, causing confusion and misunderstanding among the audience. He suggests that people should not repeat and spread internet rhetoric without proper investigation. He states, “Almost no artificial intelligence product can continue to meet the expectations of its hype a few weeks after launch.”

Despite strong public demand for Cognition to respond to these doubts, the team has so far offered no explanation. The only hint came in mid-April, in a comment by Scott Wu on Twitter: “Today’s Devin is still far from perfect. Devin indeed works hard, but errors, omissions, or difficulties may occur.”

On May 2, Scott Wu gave an interview that lasted less than 30 minutes. In it, he said that AI will not reduce the number of engineering positions in the future; on the contrary, demand for the role will increase. He gave two reasons. First, engineering demand will grow along with technological progress. Second, Devin is not meant to decide what to do: those who use Devin still need to understand what to build and which problems to solve, which makes the engineer’s work more focused. Scott also emphasized Devin’s potential in DevOps and development environment setup, mentioning in particular database operations and bringing up Kubernetes, which he described as the first time Devin genuinely piqued their interest. Data analysis, he added, is also an excellent use case for Devin.

However, netizens were not impressed with Scott’s performance in the interview, feeling that he did not substantively address the criticisms raised in the earlier video and instead kept evading them. The interview did nothing to increase public confidence in his company. Some viewers even commented sarcastically: “An interviewee duped by cryptocurrency scammers was interviewed by a cryptocurrency scammer.”

Despite the criticism, some netizens strongly support the company: “Seeing so many haters here is just insane. Scott has built a very excellent team and is developing a revolutionary product.” The company reportedly has a team of over 35 people, although publicly available information has not been updated.

Cognition was co-founded by three people: CEO Scott Wu, CTO Steven Hao, and CPO Walden Yan. Scott Wu has loved programming since childhood and is passionate about turning ideas into reality. His talent has shown in various fields; at the age of 14, for example, he impressed audiences with quick and accurate answers to Math Olympiad problems.

Steven Hao has extensive experience in the AI field and worked as a top engineer at Scale AI, a company specializing in training artificial intelligence systems. Walden Yan dropped out of Harvard University and has chosen to say little about it, reportedly to avoid affecting communication with his parents. The founders proudly state that the team includes 10 gold medalists from the International Olympiad in Informatics (IOI).

This technically strong team has won recognition from investors, with Peter Thiel’s Founders Fund leading a $21 million Series A funding round for the company. According to Bloomberg, former Twitter executive Elad Gil also invested in Cognition AI.

Although the specifics of how Cognition achieves its results remain a mystery, Scott Wu says the team has found a unique way to combine large language models (LLMs) such as OpenAI’s GPT-4 with reinforcement learning. However, no detailed official information explains how heavily Devin relies on existing large language models.

Throughout the interview, Scott Wu consistently declined to disclose many details about how Devin works. Cognition’s entire R&D team is tight-lipped about the technical implementation, which adds to the project’s mystique but also leaves the outside world skeptical, because “Talk is cheap. Show me the code.” is a widely accepted view in the IT industry.
