Scientists build crop-picking robot based on designs dreamt up by ChatGPT

DELFT, Netherlands — What does the perfect farmer look like to artificial intelligence? Researchers have built a practical, working robot based on a design developed by AI. Specifically, a team in the Netherlands used ChatGPT to help improve food production.

The research team from TU Delft and the Swiss technical university EPFL closely followed ChatGPT’s design suggestions, including choosing which crop to harvest and determining the most effective harvesting technique. The result of this collaboration is a tomato-harvesting robot that can gently pluck the fruit from the vine.

The researchers set out to explore the extent to which humans and Large Language Models (LLMs) like ChatGPT can collaborate. To frame the project, the team posed the question, “What are the greatest future challenges for humanity?”

“We wanted ChatGPT to design not just a robot, but one that is actually useful,” explains Cosimo Della Santina, an assistant professor from TU Delft, in a media release.

In their discussions with ChatGPT, they decided to tackle food supply as their challenge, and the idea of creating a tomato-harvesting robot was conceived. The researchers found AI’s design suggestions particularly valuable during the conceptual phase.

“ChatGPT extends the designer’s knowledge to other areas of expertise,” says Francesco Stella, a Ph.D. student from TU Delft. “For example, the chatbot taught us which crop would be most economically valuable to automate.”

The tomato picker robot designed with ChatGPT by researchers from TU Delft and EPFL moves through a testing environment. (credit: © Adrien Buttier / EPFL)

ChatGPT also offered useful suggestions during the implementation phase, advising on specific materials and mechanisms, such as recommending silicone or rubber for the gripper to prevent crushing tomatoes and suggesting a Dynamixel motor as the best choice to power the robot. The resulting creation is a robotic arm capable of harvesting tomatoes. However, Stella acknowledges that the collaboration shifted the engineers’ role, leaving them focused more on technical implementation tasks.

The research team plans to continue utilizing the tomato-harvesting robot in their robotics research. They also plan to further study LLMs to design new robots, focusing particularly on the autonomy of AIs in designing their own physical structures.

A tomato picker robot designed by ChatGPT and researchers from TU Delft and EPFL in a field test together with a researcher. (CREDIT: © Adrien Buttier / EPFL)

“Ultimately an open question for the future of our field is how LLMs can be used to assist robot developers without limiting the creativity and innovation needed for robotics to rise to the challenges of the 21st century,” Stella concludes.

The study is published in the journal Nature Machine Intelligence.

How does ChatGPT work?

According to ChatGPT itself, the program is a language model based on the GPT-4 architecture developed by OpenAI. It is designed to understand and generate human-like responses in a conversational context. The underlying technology, GPT-4, is an advanced iteration of the GPT series and improves upon its predecessors in terms of scale and performance. Here’s an overview of how ChatGPT works:

  1. Pre-training: ChatGPT is pre-trained on a large body of text data from diverse sources like books, articles, and websites. During this phase, the model learns the structure and patterns in human language, such as grammar, syntax, semantics, and even some factual information. However, it is essential to note that the knowledge acquired during pre-training is limited to the information available in the training data, which has a cutoff date.
  2. Fine-tuning: After the pre-training phase, ChatGPT is fine-tuned using a narrower dataset, typically containing conversations or dialogue samples. This dataset may be generated with the help of human reviewers following specific guidelines. The fine-tuning process helps the model learn to generate more contextually relevant and coherent responses in a conversational setting.
  3. Transformer architecture: ChatGPT is based on the transformer architecture, which allows it to efficiently process and generate text. It uses self-attention mechanisms to weigh the importance of words in a given context and to capture long-range dependencies in language. This architecture enables the model to understand and generate complex and contextually appropriate responses.
  4. Tokenization: When a user inputs text, ChatGPT first tokenizes the text into smaller units called tokens. These tokens can represent characters, words, or subwords, depending on the language and tokenization strategy used. The model processes these tokens in parallel, allowing it to generate context-aware responses quickly.
  5. Decoding: After processing the input tokens and generating a context vector, ChatGPT decodes the output by generating a sequence of tokens that form the response. This is typically done using a greedy search, beam search, or other decoding strategies to select the most likely next token based on the model’s predictions.
  6. Interactive conversation: ChatGPT maintains a conversation history to keep track of the context during a dialogue. This history is fed back into the model during each interaction, enabling it to generate contextually coherent responses.

It’s important to note that ChatGPT itself acknowledges its limitations, such as generating incorrect or nonsensical answers, being sensitive to input phrasing, being excessively verbose, or failing to ask clarifying questions for ambiguous queries. OpenAI adds that it continually works on improving these aspects and refining the model to make it more effective and safer for the public to use.

South West News Service writer Dean Murray contributed to this report.
