Improve the performance of your generative AI applications with Prompt Optimization on Amazon Bedrock

Prompt engineering refers to the practice of writing instructions to get the desired responses from foundation models (FMs). You might have to spend months experimenting and iterating on your prompts, following the best practices for each model, to achieve your desired output. Furthermore, these prompts are specific to a model and task, and performance is not guaranteed when they are used with a different FM. The manual effort that prompt engineering requires can slow down your ability to test different models.

Today, we are excited to announce the availability of Prompt Optimization on Amazon Bedrock. With this capability, you can now optimize your prompts for several use cases with a single API call or a click of a button on the Amazon Bedrock console.

In this post, we discuss how you can get started with this new feature using an example use case, and we review some performance benchmarks.

Solution overview

At the time of writing, Prompt Optimization for Amazon Bedrock supports Anthropic's Claude 3 Haiku, Claude 3 Sonnet, Claude 3 Opus, and Claude 3.5 Sonnet models, Meta's Llama 3 70B and Llama 3.1 70B models, Mistral's Large model, and Amazon's Titan Text Premier model. Prompt optimization can result in significant improvements for generative AI tasks. We ran example performance benchmarks for several tasks and discuss them later in this post.

In the following sections, we demonstrate how to use the Prompt Optimization feature. For our use case, we want to optimize a prompt that looks at a call or chat transcript and classifies the next best action.
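To make the use case concrete, here is a minimal sketch of what a baseline prompt template and a label check could look like. The template wording, the constant names, and the `normalize_action` helper are illustrative assumptions and are not taken from the post; the double-curly-brace `{{transcript}}` placeholder follows the variable syntax used by Prompt management.

```python
from typing import Optional

# The three labels the post wants the model to choose between.
NEXT_BEST_ACTIONS = ("Wait for customer input", "Assign agent", "Escalate")

# Hypothetical baseline template (wording is an assumption, not the post's prompt).
BASELINE_PROMPT = (
    "Analyze the call or chat transcript below and classify the next best "
    "action as exactly one of: " + ", ".join(NEXT_BEST_ACTIONS) + ".\n\n"
    "Transcript:\n{{transcript}}"
)


def normalize_action(model_output: str) -> Optional[str]:
    """Map free-form model output to one allowed label, or None if ambiguous."""
    text = model_output.strip().lower()
    matches = [action for action in NEXT_BEST_ACTIONS if action.lower() in text]
    # Accept the answer only when exactly one label is mentioned.
    return matches[0] if len(matches) == 1 else None
```

A check like this is useful when comparing the original and optimized prompts, because it separates wrong classifications from answers that are simply in the wrong format.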

Use automatic prompt optimization

To get started with this feature, complete the following steps:

On the Amazon Bedrock console, choose Prompt management in the navigation pane.
Choose Create prompt.
Enter a name and optional description for your prompt, then choose Create.

For User message, enter the prompt template that you want to optimize.

For example, we want to optimize a prompt that looks at a call or chat transcript and classifies the next best action as one of the following:

Wait for customer input
Assign agent
Escalate

The following screenshot shows what our prompt looks like in the prompt builder.

In the Configurations pane, for Generative AI resource, choose Models and choose your preferred model. For this example, we use Anthropic's Claude 3.5 Sonnet.
Choose Optimize.

A pop-up appears indicating that your prompt is being optimized.

When optimization is complete, you should see a side-by-side view of the original and the optimized prompt for your use case.

Add values to your test variables (in this case, transcript) and choose Run.

You can then see the output from the model in the desired format.
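The test-variable step above can be mirrored locally when you want to inspect the exact text a model would receive. The following is a minimal sketch; `render_prompt` is a hypothetical helper, not part of any AWS SDK, and it assumes variables are written as `{{name}}`.

```python
import re


def render_prompt(template: str, variables: dict) -> str:
    """Replace each {{name}} placeholder with its value; fail on missing names."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value supplied for variable '{name}'")
        return str(variables[name])

    # \w+ matches the variable name; optional whitespace inside the braces is allowed.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)
```

For example, `render_prompt("Transcript:\n{{transcript}}", {"transcript": "Customer: my order is late."})` fills in the single `transcript` variable used in this walkthrough.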

As we can see in this example, the optimized prompt is more explicit, with clear instructions on how to process the original transcript provided as a variable. This results in the correct classification, in the required output format. After a prompt has been optimized, it can be deployed into an application by creating a version, which takes a snapshot of its configuration. Multiple versions can be stored so that you can switch between prompt configurations for different use cases. See Prompt management for more details on prompt version control and deployment.
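The same optimization can also be requested programmatically. The sketch below assumes the `optimize_prompt` operation of the boto3 `bedrock-agent-runtime` client and its streamed `analyzePromptEvent` and `optimizedPromptEvent` events; check the request and response shapes against the current SDK documentation before relying on them. The helper names, the default Region, and the model ID shown in the usage note are illustrative assumptions.

```python
def build_optimize_request(prompt_text: str, target_model_id: str) -> dict:
    """Build the keyword arguments for the OptimizePrompt call (assumed shape)."""
    return {
        "input": {"textPrompt": {"text": prompt_text}},
        "targetModelId": target_model_id,
    }


def collect_optimization(events) -> tuple:
    """Walk the event stream and return (analysis_message, optimized_prompt_text)."""
    analysis, optimized = None, None
    for event in events:
        if "analyzePromptEvent" in event:
            analysis = event["analyzePromptEvent"].get("message")
        elif "optimizedPromptEvent" in event:
            optimized = event["optimizedPromptEvent"]["optimizedPrompt"]["textPrompt"]["text"]
    return analysis, optimized


def optimize_prompt(prompt_text: str, target_model_id: str, region_name: str = "us-east-1") -> tuple:
    """Call Amazon Bedrock and return the analysis and the optimized prompt."""
    import boto3  # imported lazily so the helpers above work without AWS access

    client = boto3.client("bedrock-agent-runtime", region_name=region_name)
    response = client.optimize_prompt(**build_optimize_request(prompt_text, target_model_id))
    return collect_optimization(response["optimizedPrompt"])
```

With AWS credentials configured, a call such as `optimize_prompt(my_prompt, "anthropic.claude-3-5-sonnet-20240620-v1:0")` would return the analysis message and the rewritten prompt. The model ID here is an example and should be replaced with one available in your account and Region.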

Performance benchmarks

We ran the Prompt Optimization feature on several open source datasets. We are excited to share the improvements seen in a few important and common use cases that we see our customers working with:

Summarization (XSUM)
RAG-based dialog continuation (DSTC)
Function calling (GLAIVE)

To measure performance improvement with respect to the baseline prompts, we use ROUGE-2 F1 for the summarization use case, HELM-F1 for the dialog continuation use case, and HELM-F1 and JSON matching for function calling. We saw a performance improvement of 18% on the summarization use case, 8% on dialog completion, and 22% on function calling benchmarks. The following table contains the detailed results.

Use case	Original prompt	Optimized prompt	Performance improvement
Summarization	`First, please read the article below.` `{context}` `Now, can you write me an extremely short abstract for it?`	`<task>` `Your task is to provide a concise 1-2 sentence summary of the given text that captures the main points or key information.` `</task><context>` `{context}` `</context><instructions>` `Please read the provided text carefully and thoroughly to understand its content. Then, generate a brief summary in your own words that is much shorter than the original text while still preserving the core ideas and essential details. The summary should be concise yet informative, capturing the essence of the text in just 1-2 sentences.` `</instructions><result_format>` `Summary: [WRITE YOUR 1-2 SENTENCE SUMMARY HERE]` `</result_format>`	18.04%
Dialog continuation	`Functions available:` `{available_functions}` `Examples of calling functions:` `Input:` Functions: [{"name": "calculate_area", "description": "Calculate the area of a shape", "parameters": {"type": "object", "properties": {"shape": {"type": "string", "description": "The type of shape (e.g. rectangle, triangle, circle)"}, "dimensions": {"type": "object", "properties": {"length": {"type": "number", "description": "The length of the shape"}, "width": {"type": "number", "description": "The width of the shape"}, "base": {"type": "number", "description": "The base of the shape"}, "height": {"type": "number", "description": "The height of the shape"}, "radius": {"type": "number", "description": "The radius of the shape"}}}}, "required": ["shape", "dimensions"]}}] `Conversation history: USER: Can you calculate the area of a rectangle with a length of 5 and width of 3?` `Output:` `{"name": "calculate_area", "arguments": {"shape": "rectangle", "dimensions": {"length": 5, "width": 3}}}Input:` `Functions: [{"name": "search_books", "description": "Search for books based on title or author", "parameters": {"type": "object", "properties": {"search_query": {"type": "string", "description": "The title or author to search for"}}, "required": ["search_query"]}}]` `Conversation history: USER: I am looking for books by J.K. Rowling. Can you help me find them?` `Output:` `{"name": "search_books", "arguments": {"search_query": "J.K. Rowling"}}Input:` `Functions: [{"name": "calculate_age", "description": "Calculate the age based on the birthdate", "parameters": {"type": "object", "properties": {"birthdate": {"type": "string", "format": "date", "description": "The birthdate"}}, "required": ["birthdate"]}}]` `Conversation history: USER: Hi, I was born on 1990-05-15. Can you tell me how old I am today?` `Output:` `{"name": "calculate_age", "arguments": {"birthdate": "1990-05-15"}}` `Current chat history:` `{conversation_history}` `Respond to the last message. Call a function if necessary.`	`Task: Respond to the user's message in the given conversation by calling appropriate functions if necessary.` `Instructions:` `1. Review the list of available functions:` `<available_functions>` `{available_functions}` `</available_functions>` `2. Study the examples of how to call these functions:` `<fewshot_examples>` `<example>` `H:` <context>Functions: [{"name": "calculate_area", "description": "Calculate the area of a shape", "parameters": {"type": "object", "properties": {"shape": {"type": "string", "description": "The type of shape (e.g. rectangle, triangle, circle)"}, "dimensions": {"type": "object", "properties": {"length": {"type": "number", "description": "The length of the shape"}, "width": {"type": "number", "description": "The width of the shape"}, "base": {"type": "number", "description": "The base of the shape"}, "height": {"type": "number", "description": "The height of the shape"}, "radius": {"type": "number", "description": "The radius of the shape"}}}}, "required": ["shape", "dimensions"]}}]</context> `<question>USER: Can you calculate the area of a rectangle with a length of 5 and width of 3?</question>` `A:` `<output>{"name": "calculate_area", "arguments": {"shape": "rectangle", "dimensions": {"length": 5, "width": 3}}}</output>` `</example>` `<example>` `H:` `<context>Functions: [{"name": "search_books", "description": "Search for books based on title or author", "parameters": {"type": "object", "properties": {"search_query": {"type": "string", "description": "The title or author to search for"}}, "required": ["search_query"]}}]</context>` `<question>USER: I am looking for books by J.K. Rowling. Can you help me find them?</question>` `A:` `<output>{"name": "search_books", "arguments": {"search_query": "J.K. Rowling"}}</output>` `</example>` `<example>` `H:` `<context>Functions: [{"name": "calculate_age", "description": "Calculate the age based on the birthdate", "parameters": {"type": "object", "properties": {"birthdate": {"type": "string", "format": "date", "description": "The birthdate"}}, "required": ["birthdate"]}}]</context>` `<question>USER: Hi, I was born on 1990-05-15. Can you tell me how old I am today?</question>` `A:` `<output>{"name": "calculate_age", "arguments": {"birthdate": "1990-05-15"}}</output>` `</example>` `</fewshot_examples>` `3. Carefully read the current conversation history:` `<conversation_history>` `{conversation_history}` `</conversation_history>` `4. Analyze the last message from the user and determine if any of the available functions need to be called to provide an appropriate response.` `5. If a function call is necessary, follow the format demonstrated in the examples to invoke the relevant function with the required arguments.` `6. If no function call is needed, provide a direct response to the user's message.` `7. Your response should be concise, relevant, and tailored to the specific context of the conversation.` `8. Enclose your final response in <response></response> tags, without any additional preamble or explanation.` `Provide your response immediately after these instructions, following the specified format.`	8.23%
Function calling	`## Retrieved knowledge snippets` `Domain: Restaurant` `Entity: THE COPPER KETTLE` `Review: My friend Mark took me to the copper kettle to celebrate my promotion. I decided to treat myself to Shepherds Pie. It was not as flavorful as I'd have liked and the consistency was just runny, but the servers were awesome and I enjoyed the view from the patio. I may come back to try the strawberries and cream come time for Wimbledon..` `Highlight: It was not as flavorful as I'd have liked and the consistency was just runny, but the servers were awesome and I enjoyed the view from the patio.` `Domain: Restaurant` `Entity: THE COPPER KETTLE` `Review: Last week, my colleagues and I visited THE COPPER KETTLE that serves British cuisine. We enjoyed a nice view from inside of the restaurant. The atmosphere was enjoyable and the restaurant was located in a nice area. However, the food was mediocre and was served in small portions.` `Highlight: We enjoyed a nice view from inside of the restaurant.` `## Dialog` `user: Howdy, I'm looking for a British restaurant for breakfast.` `agent: There are several British restaurants available. Would you prefer a moderate or expensive price range?` `user: Moderate price range please.` `agent: Five restaurants match your criteria. Four are in Centre area and one is in the West. Which area would you prefer?` `user: I would like the Center of town please.` `agent: How about The Copper Kettle?` `user: Do they offer a good view?` `agent:` `Given the above information from a retrieval augmented generation (RAG) system, respond with the appropriate answer to the question from the user.`	`<task_description>` `You are an advanced question-answering system that utilizes information from a retrieval augmented generation (RAG) system to provide accurate and relevant responses to user queries.` `</task_description><instructions>` `1. Carefully review the provided context information:` `<context>` `Domain: Restaurant` `Entity: THE COPPER KETTLE` `Review: My friend Mark took me to the copper kettle to celebrate my promotion. I decided to treat myself to Shepherds Pie. It was not as flavorful as I'd have liked and the consistency was just runny, but the servers were awesome and I enjoyed the view from the patio. I may come back to try the strawberries and cream come time for Wimbledon..` `Highlight: It was not as flavorful as I'd have liked and the consistency was just runny, but the servers were awesome and I enjoyed the view from the patio.Domain: Restaurant` `Entity: THE COPPER KETTLE` `Review: Last week, my colleagues and I visited THE COPPER KETTLE that serves British cuisine. We enjoyed a nice view from inside of the restaurant. The atmosphere was enjoyable and the restaurant was located in a nice area. However, the food was mediocre and was served in small portions.` `Highlight: We enjoyed a nice view from inside of the restaurant.` `</context>2. Analyze the user's question:` `<question>` `user: Howdy, I'm looking for a British restaurant for breakfast.agent: There are several British restaurants available. Would you prefer a moderate or expensive price range?user: Moderate price range please.agent: Five restaurants match your criteria. Four are in Centre area and one is in the West. Which area would you prefer?user: I would like the Center of town please.agent: How about The Copper Kettle?user: Do they offer a good view?` `agent:` `</question>` `3. Leverage the context information and your knowledge to generate a concise and accurate answer to the user's question.` `4. Ensure your response directly addresses the specific query while incorporating relevant details from the context.` `5. Provide your answer in a clear and easy-to-understand manner, without any unnecessary preamble or explanation.` `</instructions>` `<output_format>` `Answer: [Insert your concise answer here]` `</output_format>` `<example>` `Context:` `The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France. It is named after the engineer Gustave Eiffel, whose company designed and built the tower. Constructed from 1887 to 1889 as the centerpiece of the 1889 World's Fair, it was initially criticized by some of France's leading artists and intellectuals for its design, but it has become a global cultural icon of France and one of the most recognizable structures in the world.` `Question: What is the Eiffel Tower?` `Answer: The Eiffel Tower is a wrought-iron lattice tower in Paris, France, named after its designer Gustave Eiffel, and constructed as the centerpiece of the 1889 World's Fair.` `</example>`	22.03%
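For readers who want to reproduce this kind of comparison on their own data, here is a simplified sketch of ROUGE-2 F1, the metric named for the summarization benchmark. It assumes lowercased whitespace tokenization and omits the stemming and other preprocessing that standard ROUGE packages apply, so its scores will not match an official implementation exactly. The `relative_improvement` helper is one common way to express a gain over a baseline; the post does not state whether its percentages are relative or absolute.

```python
from collections import Counter


def bigrams(text: str) -> Counter:
    """Count adjacent token pairs after lowercasing and whitespace splitting."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f1(candidate: str, reference: str) -> float:
    """F1 over bigram overlap between a candidate and a reference summary."""
    cand, ref = bigrams(candidate), bigrams(reference)
    overlap = sum((cand & ref).values())  # clipped bigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def relative_improvement(baseline: float, optimized: float) -> float:
    """Percentage gain of the optimized score over the baseline (assumed definition)."""
    return (optimized - baseline) / baseline * 100.0
```

Averaging `rouge2_f1` over a test set once with the original prompt and once with the optimized prompt gives the two scores to compare.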

The consistent improvements across different tasks highlight the robustness and effectiveness of Prompt Optimization in enhancing prompt performance for a variety of natural language processing (NLP) tasks. This shows that Prompt Optimization can save you considerable time and effort while achieving better outcomes, because you test models with optimized prompts that implement the best practices for each model.
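The function calling benchmark in this section is scored in part by JSON matching. The post does not define that metric, so the sketch below assumes the simplest reading: the model output must parse as JSON and be structurally equal to the expected call. The `json_match` name is hypothetical.

```python
import json


def json_match(model_output: str, expected_call: dict) -> bool:
    """Return True when the output parses as JSON and equals the expected call."""
    try:
        predicted = json.loads(model_output)
    except json.JSONDecodeError:
        return False  # malformed JSON counts as a miss
    # Dict comparison ignores key order, so formatting differences do not matter.
    return predicted == expected_call
```

Under this definition, a response that names the right function but adds surrounding prose counts as a miss, which is one reason explicit output-format instructions in a prompt can affect the score.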

Conclusion

Prompt Optimization on Amazon Bedrock lets you improve your prompt's performance across a wide range of use cases with a single API call or a few clicks on the Amazon Bedrock console. The substantial improvements demonstrated on open source benchmarks for tasks like summarization, dialog continuation, and function calling underscore this new feature's ability to streamline the prompt engineering process. Prompt Optimization on Amazon Bedrock also makes it easy to test many different models for your generative AI application, following the prompt engineering best practices for each model. The reduced manual effort can greatly accelerate the development of generative AI applications in your organization.

We encourage you to try Prompt Optimization with your own use cases and to reach out to us for feedback and collaboration.

About the authors

Shreyas Subramanian is a Principal Data Scientist who helps customers solve their business challenges with generative AI and deep learning on AWS services. Shreyas has a background in large-scale optimization and ML, and in the use of ML and reinforcement learning to accelerate optimization tasks.

Chris Pecora is a Generative AI Data Scientist at Amazon Web Services. Chris is passionate about building innovative products and solutions while focusing on customer-obsessed science. When not running experiments and keeping up with the latest developments in generative AI, he loves spending time with his kids.

Zheng Yuan Shen is an Applied Scientist at Amazon Bedrock, specializing in foundation models and ML modeling for complex tasks, including natural language and structured data understanding. Zheng Yuan is passionate about using innovative ML solutions to enhance products and services, simplifying the lives of customers through a seamless blend of science and engineering. Interests outside work include sports and cooking.

Shipra Kanoria is a Principal Product Manager at AWS. Shipra is passionate about helping customers solve their most complex problems with the power of machine learning and artificial intelligence. Before joining AWS, Shipra spent over 4 years at Amazon Alexa, launching many productivity-related features on the Alexa voice assistant.
