It is April 2024, and it has been roughly 17 months since we started using LLMs like ChatGPT to help us with code generation and debugging tasks. While this has brought a big boost in productivity, there are still times when the generated code is riddled with bugs and sends us down the good old StackOverflow route.
In this article, I'll give a quick demonstration of how we can address this lack of "verification" using the conversable agents that AutoGen offers.
What is AutoGen?
"AutoGen is a framework that enables development of LLM applications using multiple agents that can converse with each other to solve tasks."
Introducing the LeetCode Problem Solver:
Start by quietly installing autogen:
!pip install pyautogen -q --progress-bar off
I'm using Google Colab, so I entered my OPENAI_API_KEY in the Secrets tab and loaded it securely along with the other modules:
import os
import csv
import autogen
from autogen import Cache
from google.colab import userdata
userdata.get('OPENAI_API_KEY')
I'm using gpt-3.5-turbo only because it's cheaper than gpt-4. If you can afford more expensive experimentation and/or you're doing things more "seriously", you should obviously use a more powerful model.
llm_config = {
"config_list": [{"model": "gpt-3.5-turbo", "api_key": userdata.get('OPENAI_API_KEY')}],
"cache_seed": 0, # seed for reproducibility
"temperature": 0, # temperature to control randomness
}
Now, I'll copy in the problem statement of my favorite LeetCode problem: Two Sum. It's one of the most frequently asked questions in leetcode-style interviews and covers basics like caching with hashmaps and simple equation manipulation.
LEETCODE_QUESTION = """
Title: Two Sum

Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target. You may assume that each input would have exactly one solution, and you may not use the same element twice. You can return the answer in any order.
Example 1:
Input: nums = [2,7,11,15], target = 9
Output: [0,1]
Explanation: Because nums[0] + nums[1] == 9, we return [0, 1].
Example 2:
Input: nums = [3,2,4], target = 6
Output: [1,2]
Example 3:
Input: nums = [3,3], target = 6
Output: [0,1]
Constraints:
2 <= nums.length <= 10^4
-10^9 <= nums[i] <= 10^9
-10^9 <= target <= 10^9
Only one valid answer exists.
Follow-up: Can you come up with an algorithm that is less than O(n^2) time complexity?
"""
Now we can define our two agents. One agent acts as the "assistant" agent that suggests the solution, and the other serves as a proxy for us, the user, and is also responsible for executing the suggested Python code.
# create an AssistantAgent named "assistant"
SYSTEM_MESSAGE = """You are a helpful AI assistant.
Solve tasks using your coding and language skills.
In the following cases, suggest python code (in a python coding block) or shell script (in a sh coding block) for the user to execute.
1. When you need to collect info, use the code to output the info you need, for example, browse or search the web, download/read a file, print the content of a webpage or a file, get the current date/time, check the operating system. After sufficient info is printed and the task is ready to be solved based on your language skill, you can solve the task by yourself.
2. When you need to perform some task with code, use the code to perform the task and output the result. Finish the task smartly.
Solve the task step by step if you need to. If a plan is not provided, explain your plan first. Be clear which step uses code, and which step uses your language skill.
When using code, you must indicate the script type in the code block. The user cannot provide any other feedback or perform any other action beyond executing the code you suggest. The user can't modify your code. So do not suggest incomplete code which requires users to modify. Don't use a code block if it's not intended to be executed by the user.
If you want the user to save the code in a file before executing it, put # filename: <filename> inside the code block as the first line. Don't include multiple code blocks in one response. Do not ask users to copy and paste the result. Instead, use 'print' function for the output when relevant. Check the execution result returned by the user.
If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes. If the error can't be fixed or if the task is not solved even after the code is executed successfully, analyze the problem, revisit your assumption, collect additional info you need, and think of a different approach to try.
When you find an answer, verify the answer carefully. Include verifiable evidence in your response if possible.
Additional requirements:
1. Within the code, add functionality to measure the total run-time of the algorithm in python function using "time" library.
2. Only when the user proxy agent confirms that the Python script ran successfully and the total run-time (printed on stdout console) is less than 50 ms, only then return a concluding message with the word "TERMINATE". Otherwise, repeat the above process with a more optimal solution if it exists.
"""
assistant = autogen.AssistantAgent(
name="assistant",
llm_config=llm_config,
system_message=SYSTEM_MESSAGE
)
# create a UserProxyAgent instance named "user_proxy"
user_proxy = autogen.UserProxyAgent(
name="user_proxy",
human_input_mode="NEVER",
max_consecutive_auto_reply=4,
is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"),
code_execution_config={
"work_dir": "coding",
"use_docker": False,
},
)
I set human_input_mode to "NEVER" because I don't plan to provide any input myself, and max_consecutive_auto_reply to 4 to limit the back-and-forth turns in the conversation. The assistant agent has been instructed to respond with the word "TERMINATE", which tells the UserProxyAgent when to conclude the conversation.
Now it's time for the fun part! We'll kick off the conversation by sending a message from our UserProxyAgent to our assistant.
An added benefit of using AutoGen (even for non-agentic workflows) is that it provides explicit caching capability to help you save API costs during development. Here, I'm caching responses on disk, but you can also integrate Redis for this purpose.
# Use DiskCache as cache
with Cache.disk(cache_seed=7) as cache:
# the assistant receives a message from the user_proxy, which contains the task description
chat_res = user_proxy.initiate_chat(
assistant,
message="""Solve the following leetcode problem and also comment on it's time and space complexity:\n\n""" + LEETCODE_QUESTION
)
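The Redis variant mentioned above follows the same pattern; a hypothetical sketch, assuming a Redis server is reachable at the given redis_url (adjust it to your setup, and note this requires the Redis extra for pyautogen):

```python
# Hypothetical sketch: same caching pattern as Cache.disk, backed by Redis.
# Assumes a running Redis instance; install support with: pip install "pyautogen[redis]"
with Cache.redis(cache_seed=7, redis_url="redis://localhost:6379/0") as cache:
    chat_res = user_proxy.initiate_chat(
        assistant,
        message="""Solve the following leetcode problem and also comment on it's time and space complexity:\n\n""" + LEETCODE_QUESTION
    )
```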
Output:
user_proxy (to assistant):

Solve the following leetcode problem and also comment on it's time and space complexity:
Title: Two Sum
...
Follow-up: Can you come up with an algorithm that is less than O(n^2) time complexity?
--------------------------------------------------------------------------------
assistant (to user_proxy):
To solve this problem efficiently, we can use a dictionary to store the elements we have seen so far along with their indices. While iterating through the array, we can check if the complement of the current element (target - current element) is already in the dictionary. If it is, we have found the pair that sums up to the target.
Here's the Python code to solve the Two Sum problem with a time complexity less than O(n^2):
```python
# filename: two_sum.py
def two_sum(nums, target):
num_dict = {}
for i, num in enumerate(nums):
complement = target - num
if complement in num_dict:
return [num_dict[complement], i]
num_dict[num] = i
# Test the function with the given examples
nums1 = [2, 7, 11, 15]
target1 = 9
print(two_sum(nums1, target1)) # Output: [0, 1]
nums2 = [3, 2, 4]
target2 = 6
print(two_sum(nums2, target2)) # Output: [1, 2]
nums3 = [3, 3]
target3 = 6
print(two_sum(nums3, target3)) # Output: [0, 1]
```
The time complexity of this solution is O(n) because we iterate through the list of numbers only once. The space complexity is also O(n) because we store elements in a dictionary.
You can run this code and check if it provides the correct output. Let me know the total run-time, and if it's less than 50 ms, you can reply with "TERMINATE". If it's not, I can suggest a more optimal solution.
--------------------------------------------------------------------------------
>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
user_proxy (to assistant):
exitcode: 0 (execution succeeded)
Code output:
[0, 1]
[1, 2]
[0, 1]
--------------------------------------------------------------------------------
assistant (to user_proxy):
The Python script ran successfully and provided the correct output for the given examples. Since the total run-time is not provided, I will suggest measuring the total run-time of the algorithm using the "time" library to determine if it's less than 50 ms.
You can add the functionality to measure the total run-time by importing the "time" library and using it to calculate the elapsed time before and after calling the `two_sum` function. Here's how you can modify the code:
```python
# filename: two_sum.py
import time
def two_sum(nums, target):
num_dict = {}
for i, num in enumerate(nums):
complement = target - num
if complement in num_dict:
return [num_dict[complement], i]
num_dict[num] = i
# Test the function with the given examples and measure the total run-time
nums1 = [2, 7, 11, 15]
target1 = 9
start_time = time.time()
print(two_sum(nums1, target1)) # Output: [0, 1]
end_time = time.time()
print("Total run-time:", (end_time - start_time) * 1000, "ms")
nums2 = [3, 2, 4]
target2 = 6
start_time = time.time()
print(two_sum(nums2, target2)) # Output: [1, 2]
end_time = time.time()
print("Total run-time:", (end_time - start_time) * 1000, "ms")
nums3 = [3, 3]
target3 = 6
start_time = time.time()
print(two_sum(nums3, target3)) # Output: [0, 1]
end_time = time.time()
print("Total run-time:", (end_time - start_time) * 1000, "ms")
```
Please run this modified code and provide the total run-time for each example. If the total run-time is less than 50 ms for all examples, you can reply with "TERMINATE".
--------------------------------------------------------------------------------
>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
user_proxy (to assistant):
exitcode: 0 (execution succeeded)
Code output:
[0, 1]
Total run-time: 0.01239776611328125 ms
[1, 2]
Total run-time: 0.00286102294921875 ms
[0, 1]
Total run-time: 0.0016689300537109375 ms
--------------------------------------------------------------------------------
assistant (to user_proxy):
The Python script ran successfully, and the total run-time for each example is significantly less than 50 ms. Therefore, you can reply with "TERMINATE". The algorithm provided an efficient solution with a time complexity of O(n) and a space complexity of O(n).
--------------------------------------------------------------------------------
Here's what's happening:
- The UserProxyAgent asks the Assistant to solve the problem based on the task description.
- The Assistant suggests a solution in a Python block.
- The UserProxyAgent executes the Python code.
- The Assistant reads the console output and responds with a modified solution (with time-measurement functionality. Honestly, I would have expected this modified solution right away, but this behavior can be tuned through prompt engineering or by employing a more powerful LLM).
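As an aside, time.time() has limited resolution for sub-millisecond intervals; time.perf_counter() is generally the more reliable choice for this kind of measurement. A minimal sketch of the same run-time check the system message asks for, using the assistant's hashmap solution:

```python
import time

def two_sum(nums, target):
    # Hashmap solution: O(n) time, O(n) space.
    seen = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i

start = time.perf_counter()
result = two_sum([2, 7, 11, 15], 9)
elapsed_ms = (time.perf_counter() - start) * 1000
print(result, f"{elapsed_ms:.4f} ms")  # result is [0, 1]
assert elapsed_ms < 50  # the threshold the system message asks the agents to enforce
```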
With AutoGen, you can also display the cost of the agentic workflow.
chat_res.cost
{'total_cost': 0,
 'gpt-3.5-turbo-0125': {'cost': 0,
  'prompt_tokens': 14578,
  'completion_tokens': 3460,
  'total_tokens': 18038}}
Final remarks:
So, by using AutoGen's conversable agents:
- We automatically verified that the Python code suggested by the LLM actually works.
- And we created a framework through which the LLM can further respond to syntax or logic errors by reading the console output.
Thanks for reading! Follow me and subscribe to be the first to know when I publish a new article! 🙂
Check out my other articles:
- A Deep Dive into Evaluation in Azure Prompt Flow
- Develop a UI for Azure Prompt Flow with Streamlit
- Build a Custom Chatbot Using Hugging Face Chat UI and Cosmos DB on Azure Kubernetes Service
- Deploy Hugging Face Text Generation Inference on Azure Container Instance