OpenAI의 GPT 모범 사례: 개요. 더 나은 답변을 얻기 위한 6가지 프롬프트 작성 전략 (Six strategies for getting better results)

9bow · 8월 27, 2023, 7:56오전

OpenAI에는 GPT 모범 사례(GPT Best Practices)라는 문서를 통해 더 나은 답변을 얻기 위한 프롬프트 작성 방법을 설명하고 있습니다.

https://platform.openai.com/docs/guides/gpt-best-practices

이전에 TLDR 뉴스레터에서 ~~(콘텐츠가 부족했는지)~~ GPT 모범 사례를 소개한 것을 보고 번역을 시작했었는데,
문서가 길어서 하다말다 하다보니 벌써 2달이 훌쩍 넘어가 3달이 되어가고 있었네요

주말을 맞이하여 ~~(먼지를 좀 털고)~~ 정리해서 파트별로 하나씩 올려보겠습니다!

시작하기 전에 - GPT 모범 사례 문서 개요

GPT 모범 사례 문서는 6가지 전략(starategy)을 간략히 소개하고, 각 전략별 세부 전술들을 소개하고 있습니다.

이 게시글에서는 6가지 전략들을 간략히 언급하고, 이후 새로운 게시물들로 각 전략별 전술들을 소개하겠습니다.

GPT 모범 사례 / GPT best practices

이 문서에서는 GPT 모델로부터 더 나은 결과를 얻기 위한 전략과 전술을 공유합니다. 여기에 설명된 방법은 때때로 더 큰 효과를 위해 조합하여 사용할 수도 있습니다. 자신에게 가장 적합한 방법을 찾기 위해 다양한 실험을 해보시기 바랍니다.

This guide shares strategies and tactics for getting better results from GPTs. The methods described here can sometimes be deployed in combination for greater effect. We encourage experimentation to find the methods that work best for you.

여기에 설명된 예시 중 일부는 현재 가장 성능이 뛰어난 모델인 gpt-4에서만 작동합니다. 아직 gpt-4에 액세스할 수 없는 경우 대기 명단에 등록해보세요. 일반적으로 어떤 작업에서 GPT 모델이 실패하고 더 성능이 좋은 모델을 사용할 수 있는 경우, 더 성능이 좋은 모델로 다시 시도해 보는 것이 좋습니다.

Some of the examples demonstrated here currently work only with our most capable model, gpt-4. If you don't yet have access to gpt-4 consider joining the waitlist. In general, if you find that a GPT model fails at a task and a more capable model is available, it's often worth trying again with the more capable model.

더 나은 결과를 얻기 위한 6가지 전략 / Six strategies for getting better results

전략 1. 지침(Instruction)을 명확히 작성하세요 / Write clear instructions

GPT 모델은 여러분의 마음을 읽을 수 없습니다. 답변이 너무 길면 간단한 답변을 요청하세요. 답변이 너무 간단하다면 전문가 수준의 문장을 요청하세요. 형식이 마음에 들지 않으면 원하는 형식을 직접 보여주세요. GPT가 원하는 것을 추측하는 일이 적을수록 원하는 것을 얻을 가능성이 높아집니다.

GPTs can’t read your mind. If outputs are too long, ask for brief replies. If outputs are too simple, ask for expert-level writing. If you dislike the format, demonstrate the format you’d like to see. The less GPTs have to guess at what you want, the more likely you’ll get it.

세부 전술 / Tactics:

관련성 높은 답변을 얻기 위해 질문에 세부 정보 포함하기 / Include details in your query to get more relevant answers
모델에게 페르소나를 받아들이도록 요청하기 / Ask the model to adopt a persona
구분 기호를 사용하여 입력이 구분되도록 명확히 표시하기 / Use delimiters to clearly indicate distinct parts of the input
작업을 완료하는 데 필요한 단계들을 명시하기 / Specify the steps required to complete a task
예시를 함께 제공하기 / Provide examples
원하는 결과의 길이를 명시하기 / Specify the desired length of the output

전략 2. 참고 문헌 제공 / Provide reference text

GPT 모델은 종종 자신있게 가짜 답변을 만들어내기도 합니다. 이는 특히 어려운 주제나 인용, URL에 대한 질문을 할 때 그렇습니다. 시험 성적을 높이는데 노트 한 장이 도움이 되는 것처럼, 참고 문헌을 제공하면 GPT 모델이 잘못된 내용을 적게 말하게 하는데 도움이 됩니다.

GPTs can confidently invent fake answers, especially when asked about esoteric topics or for citations and URLs. In the same way that a sheet of notes can help a student do better on a test, providing reference text to GPTs can help in answering with fewer fabrications.

세부 전술 / Tactics:

모델이 (제공한) 문헌을 참고하여 답하도록 지시하기 / Instruct the model to answer using a reference text
모델이 (제공한) 문헌를 인용하여 답하도록 지시하기 / Instruct the model to answer with citations from a reference text

전략 3. 복잡한 작업을 더 간단한 하위 작업들로 나누기 / Split complex tasks into simpler subtasks

소프트웨어 공학에서 복잡한 시스템을 일련의 모듈식 구성 요소로 분해하는 것이 좋은 관행인 것처럼, GPT 모델에게 시키는 작업도 마찬가지입니다. 복잡한 작업은 단순한 작업보다 오류율이 높은 경향이 있습니다. 또한 복잡한 작업은 종종 이전 작업의 출력을 사용하여 이후 작업의 입력을 구성하는 더 간단한 작업의 워크플로로 재정의할 수 있습니다.

Just as it is good practice in software engineering to decompose a complex system into a set of modular components, the same is true of tasks submitted to GPTs. Complex tasks tend to have higher error rates than simpler tasks. Furthermore, complex tasks can often be re-defined as a workflow of simpler tasks in which the outputs of earlier tasks are used to construct the inputs to later tasks.

세부 전술 / Tactics:

인텐트(사용자의 의도) 분류를 사용하여 사용자 질의에 가장 관련성이 높은 지시문 식별하기 / Use intent classification to identify the most relevant instructions for a user query
매우 긴 대화가 필요한 대화 애플리케이션의 경우 이전 대화를 요약하거나 필터링하기 / For dialogue applications that require very long conversations, summarize or filter previous dialogue
긴 문서를 조각별로 요약하고, 다시 조각들을 모아서 전체 내용 요약하기 / Summarize long documents piecewise and construct a full summary recursively

전략 4. GPT 모델에게 "생각할" 시간 주기 / Give GPTs time to "think"

17에 28을 곱하라는 요청을 받으면 즉시 알 수는 없지만 시간이 지나면 해결할 수 있습니다. 마찬가지로, GPT 모델은 시간을 들여 답을 찾는 것보다 바로 답하려고 할 때 더 많은 추론 오류를 범합니다. 답하기 전에 일련의 추론 과정을 거치면 GPT 모델이 보다 안정적으로 정답을 추론하는 데 도움이 될 수 있습니다.

If asked to multiply 17 by 28, you might not know it instantly, but can still work it out with time. Similarly, GPTs make more reasoning errors when trying to answer right away, rather than taking time to work out an answer. Asking for a chain of reasoning before an answer can help GPTs reason their way toward correct answers more reliably.

세부 전술 / Tactics:

결론을 재촉하기 전에 모델 스스로 해결책을 찾도록 하기 / Instruct the model to work out its own solution before rushing to a conclusion
모델의 추론 과정을 숨기기 위해 내면의 독백 또는 일련의 질의문을 사용하기 / Use inner monologue or a sequence of queries to hide the model's reasoning process
모델에게 이전 단계에서 놓친 것이 있는지 물어보기 / Ask the model if it missed anything on previous passes

전략 5. 외부 도구 사용하기 / Use external tools

GPT 모델의 약점을 보완하기 위해 다른 도구의 결과를 사용할 수 있습니다. 예를 들어, 텍스트 검색 시스템을 사용하여 관련있는 문서에 대해서 GPT 모델에 알려줄 수 있습니다. 코드 실행 엔진은 GPT 모델이 계산을 하거나 코드를 실행하는데 도움을 줄 수 있습니다. GPT 모델보다 더 안정적이고 효율적으로 작업을 수행할 수 있는 다른 도구가 있다면, 그 작업을 대신 시키세요.

Compensate for the weaknesses of GPTs by feeding them the outputs of other tools. For example, a text retrieval system can tell GPTs about relevant documents. A code execution engine can help GPTs do math and run code. If a task can be done more reliably or efficiently by a tool rather than by a GPT, offload it to get the best of both.

세부 전술 / Tactics:

임베딩 기반 검색을 사용하여 효율적인 지식 검색 구현하기 / Use embeddings-based search to implement efficient knowledge retrieval
코드를 실행하여 보다 정확한 계산을 수행하거나 외부 API 호출하기 / Use code execution to perform more accurate calculations or call external APIs

전략 6. 체계적으로 변경 사항 테스트하기 / Test changes systematically

성능을 측정할 수 있다면 성능 개선이 더 쉬워집니다. 프롬프트를 수정함으로써 몇몇 개별적인 예제에서는 성능이 향상되지만, 보다 대표성을 띄는 예제들에서는 전반적인 성능이 저하되는 경우가 있습니다. 따라서 변경사항이 전체적인 성능에 긍정적인지를 확인하려면 포괄적인 테스트셋('평가'라고도 함)을 정의해야 할 수 있습니다.

Improving performance is easier if you can measure it. In some cases a modification to a prompt will achieve better performance on a few isolated examples but lead to worse overall performance on a more representative set of examples. Therefore to be sure that a change is net positive to performance it may be necessary to define a comprehensive test suite (also known an as an "eval").

세부 전술 / Tactic:

골드-표준 답변을 참조하여 모델 결과 평가하기 / Evaluate model outputs with reference to gold-standard answers

세부 전술들 / Tactics

앞에서 언급한 각 전략들은 특정 전술로 구체화할 수 있습니다. 각각의 전술은 시도해 볼 수 있는 아이디어를 제공하기 위한 것입니다. 여기에 나온 전략들이 모든 상황들을 포괄하는 것이 아니니, 여기에 제시되지 않은 창의적인 아이디어도 자유롭게 시도해 보세요.

Each of the strategies listed above can be instantiated with specific tactics. These tactics are meant to provide ideas for things to try. They are by no means fully comprehensive, and you should feel free to try creative ideas not represented here.

아래는 6개 전략 뒤에 있는 더 읽어볼만한 자료들입니다. 6가지 전략을 읽어보신 뒤 보시기를 권해드립니다.

기타 리소스 (Other resources)

더 많은 영감을 얻으려면 예제 코드와 다음과 같은 타사 리소스에 대한 링크가 포함된 OpenAI Cookbook을 참조하세요:

For more inspiration, visit the OpenAI Cookbook, which contains example code and also links to third-party resources such as:

Prompting libraries & tools

Prompting guides

Video courses

Papers on advanced prompting to improve reasoning