Cognition has revealed Devin, its first fully autonomous AI-powered software engineer.
The company calls itself an “applied AI lab,” with a primary focus on reasoning. Cognition aims to build “AI teammates with capabilities far beyond today’s existing AI tools,” the company said in a recent blog post.
The AI start-up has created Devin to work alongside existing software engineers as a teammate, “ready to build alongside you or independently complete tasks.”
The company claims that Devin will let engineers focus on more creative tasks while helping engineering teams reach their full potential.
Cognition claims that with its “advances in long-term reasoning and planning,” the software is capable of organizing and accomplishing complex tasks that require extensive decision-making.
The software can supposedly recognize relevant context while carrying out tasks, learn new things, and fix its own mistakes.
Cognition has equipped Devin with a range of general developer tools, such as a shell, a code editor, and a browser, within a sandboxed computer environment.
According to Cognition, Devin is meant to be used collaboratively, alongside a user's own work.
Devin provides updates and reports on its progress, is open to feedback, and allows users to collaborate with the software on design choices where necessary.
In a video, Scott Wu, Cognition’s CEO, demonstrates Devin in action by typing a natural-language prompt into the interface, much as one would with many large language models.
The demonstration shows Devin generating a step-by-step plan that outlines how it will tackle the problem.
From there, Devin sets up the project and works through each task using the same general tools that a human software engineer would use.
Features shown in the demo include Devin’s own command line, code editor, and browser.
Devin ran into a problem during the demonstration, but it was able to diagnose the mistake and fix it.
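The plan, execute, and repair cycle shown in the demo can be pictured with a minimal sketch. Everything in it is hypothetical: the names `Step` and `run_plan`, the retry limit, and the simulated failure are illustrative assumptions, not Cognition's implementation, which the company has not published.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One item in a hypothetical step-by-step plan."""
    description: str
    # Receives the attempt number and returns True on success.
    action: Callable[[int], bool]


def run_plan(steps: List[Step], max_attempts: int = 3) -> List[str]:
    """Run each step in order, retrying a failed step up to max_attempts times."""
    log: List[str] = []
    for step in steps:
        for attempt in range(1, max_attempts + 1):
            if step.action(attempt):
                log.append(f"done: {step.description} (attempt {attempt})")
                break
            log.append(f"retry: {step.description} (attempt {attempt} failed)")
        else:
            # No attempt succeeded, so stop and leave the rest of the plan unrun.
            log.append(f"gave up: {step.description}")
            break
    return log


if __name__ == "__main__":
    plan = [
        Step("set up project", lambda attempt: True),
        # Simulates a bug that is fixed on the second try.
        Step("run tests", lambda attempt: attempt >= 2),
    ]
    for line in run_plan(plan):
        print(line)
```

The log returned by `run_plan` stands in for the progress updates described earlier: each step reports whether it succeeded, was retried, or was abandoned.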
Cognition provides various examples of Devin’s capabilities. These include creating and deploying apps end to end, finding and fixing bugs in codebases, and training and refining its own AI model.
The software was tested against SWE-bench, a dataset that assesses a system’s ability to solve real GitHub issues. Devin correctly solved 13.86% of issues end to end, which exceeds the results of some popular large language models, such as Claude 2.
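A percentage like this is simply the share of attempted issues that were fully resolved. The sketch below shows that arithmetic. The function name `resolve_rate` and the counts in the example are hypothetical and chosen only for illustration; they are not Cognition's actual figures, which this article does not report.

```python
def resolve_rate(resolved: int, attempted: int) -> float:
    """Return the percentage of attempted issues that were resolved."""
    if attempted <= 0:
        raise ValueError("attempted must be a positive number of issues")
    if not 0 <= resolved <= attempted:
        raise ValueError("resolved must be between 0 and attempted")
    return 100.0 * resolved / attempted


if __name__ == "__main__":
    # Hypothetical counts: 3 issues resolved out of 20 attempted.
    print(f"{resolve_rate(3, 20):.2f}%")
```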