Weekly reading: AI Agents

Luiz Felipe Mendes
3 min read · Jul 8, 2024


[Image: Matrix-style agents in dark suits and sunglasses working at computer stations in a blue-lit command center. Generated with the prompt: “Create an image of many Matrix agents in front of computers programming”]

The Portuguese version of this content can be found here.

I've decided to write a short weekly article about my readings.

This week I focused on a topic: AI Agents.

I think the definition of an AI agent is still taking shape, but broadly I would say it is a computer program (often with an LLM somewhere in the flow) that can autonomously perform tasks in an environment to achieve certain goals. Anyway, the concept is easier to understand through an example.

Example

Today we can go to ChatGPT or Gemini and request a travel itinerary with a simple prompt like this:

Plan a 2-day itinerary. I’m going to Salt Spring Island in BC, Canada. I’ll arrive on Friday and leave on Sunday.

These LLMs will generate useful responses that certainly help put together a real itinerary. With an AI agent, however, we could do something even more complete.

In this case, the agent could, for example, access a weather forecast API to check if it’s going to rain and suggest visiting a museum instead of a beach.

Additionally, it could access services like Booking.com to suggest the best hotel prices, taking into account the location of the attractions it recommended. In other words, an agent uses LLMs and also accesses other APIs and functions iteratively to generate something more complete and of better quality.
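
To make the idea concrete, here is a minimal sketch of such a loop in Python. Everything in it is hypothetical: get_weather_forecast, search_hotels, and call_llm are stubs standing in for a real weather API, a hotel-search service, and an actual LLM call; the point is only the shape of the flow, with LLM output feeding tool calls and tool results feeding back into the plan.

```python
def get_weather_forecast(location: str, date: str) -> dict:
    """Hypothetical stand-in for a real weather-forecast API."""
    return {"location": location, "date": date, "condition": "rain"}


def search_hotels(location: str, near: str) -> list:
    """Hypothetical stand-in for a hotel-search service (something like Booking.com)."""
    return [{"name": "Harbour Inn", "distance_km": 0.4, "price_cad": 140}]


def call_llm(prompt: str) -> str:
    """Stand-in for an actual LLM call; returns a canned suggestion here."""
    if "rain" in prompt:
        return "the Salt Spring Island museum and Saturday market (rain expected)"
    return "the beach at Vesuvius Bay"


def plan_day(location: str, date: str) -> str:
    forecast = get_weather_forecast(location, date)    # tool call 1: check the weather
    attraction = call_llm(
        f"Suggest one attraction in {location} for {date}; "
        f"the forecast is {forecast['condition']}."
    )
    hotels = search_hotels(location, near=attraction)  # tool call 2: hotels near the suggestion
    return f"{date}: visit {attraction}; stay at {hotels[0]['name']}."


if __name__ == "__main__":
    for day in ["Friday", "Saturday"]:
        print(plan_day("Salt Spring Island, BC", day))
```

A real agent framework would let the LLM decide which tool to call next instead of hard-coding the order, but the iterative "ask, call a tool, refine" pattern is the same.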

[Image: Andrew Ng presenting a slide that contrasts a simple, linear “Non-agentic workflow” with an “Agentic workflow” that iterates through steps like outlining, researching, and revising.]

In this lecture, Andrew Ng details what an agentic workflow is and how it differs from a simple LLM prompt.

He also presents several interesting examples, such as computer vision agents that analyze a video, identify a shark, measure its distance from people, and generate a new video annotated with this information.

Another interesting and practical example is splitting the task of writing code between an agent that writes the code and another that generates tests. The testing agent runs the code and returns feedback so that the first agent can improve it, as in the sketch below.
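
Here is a minimal sketch of that writer/tester loop. The two agents are stubbed out as plain functions (write_code and write_tests are hypothetical stand-ins for LLM calls); the part that matters is that the tester actually runs the code and its error output becomes feedback for the next round.

```python
import subprocess
import sys
from pathlib import Path
from tempfile import TemporaryDirectory


def write_code(task: str, feedback: str = "") -> str:
    """Stand-in for the 'writer' agent; a real one would prompt an LLM with
    the task plus any test feedback from the previous round."""
    return "def add(a, b):\n    return a + b\n"


def write_tests(task: str) -> str:
    """Stand-in for the 'tester' agent, which generates tests for the task."""
    return "from solution import add\nassert add(2, 3) == 5\nprint('tests passed')\n"


def improve_until_green(task: str, max_rounds: int = 3) -> str:
    feedback = ""
    code = ""
    for _ in range(max_rounds):
        code = write_code(task, feedback)
        tests = write_tests(task)
        with TemporaryDirectory() as tmp:
            Path(tmp, "solution.py").write_text(code)
            Path(tmp, "test_solution.py").write_text(tests)
            result = subprocess.run(
                [sys.executable, "test_solution.py"],
                cwd=tmp, capture_output=True, text=True,
            )
        if result.returncode == 0:      # tests pass: accept the code
            return code
        feedback = result.stderr        # tests fail: feed the errors back to the writer
    return code


print(improve_until_green("write an add(a, b) function"))
```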

Andrew Ng comments at the end about the current limitations of agents and how to circumvent them. I highly recommend watching the full lecture.

If you prefer reading instead of watching a video, I recommend this article from MIT Technology Review: What are AI agents?

[Image: three identical figures in vintage gray dresses holding notepads, their faces overlaid with green circuit-board patterns against a blue background.]

The article begins by trying to define what an AI agent is and mentions that many of them are multimodal, capable of processing text, audio, and video. I found it interesting that it points out that the concept of agents is not new; it has gained popularity because LLMs make agents much easier to build, opening up possibilities that were previously very difficult to achieve.

The most interesting part of the article discusses the limitations of agents, especially those powered by LLMs:

- Incomplete reasoning: AI agents cannot yet reason through tasks entirely on their own and still need human supervision.
- Contextualization problems: AI agents have difficulties in fully understanding the context, which can lead to inappropriate responses or actions.
- Task tracking loss: After some time, AI agents may lose track of what they are working on, which can lead to failures in executing ongoing tasks.
- Context window limitations: AI systems have limited context windows, meaning they can only consider a limited amount of data at a time, affecting their ability to remember past interactions or handle longer interactions.

Conclusions

The goal of these posts is to turn my studies and readings into knowledge I can share with others, and to invite feedback that starts a discussion on these topics. If you liked this format, please let me know!

#AI #LLM #aiAgent #MachineLearning #ArtificialIntelligence #innovation


Luiz Felipe Mendes

Co-founded Hekima, an AI company acqui-hired in 2020 by iFood. Head of Data Science at iFood — Recommendation and Search