Follow these best practices when writing, managing, and releasing prompts. Keep in mind the importance of experimentation and iterative development against your objective.

Also note that our best practices have been developed and tailored for the GPT-3.5 and GPT-4 series of large language models provided by OpenAI and Microsoft Azure. Prompt engineering is an emerging field, with active research into the behavior of LLMs. These best practices are informed by the latest research and our own internal testing; however, our understanding of large language models is likely to evolve over time.

Key components

A well-written prompt includes these components:

  • [Role/identity description] - Describe the persona, voice, or perspective that you want the Generative AI model to adopt. Make sure to include any useful contextual information.

  • [Objective] - Succinctly and precisely describe the objective that you want accomplished.

  • [Instructions] - List the instructions that the Generative AI model should follow in clear and exact language, following the best practices listed below. Make sure to include any desired output formatting as well as any desired output constraints.

  • [Example outputs] - Provide full examples to illustrate the output you desire.

Experiment with the order in which you position these components. Some research suggests that instructions at the beginning and end of your prompt are most salient. Keep in mind that—in KnowledgeAI use cases—the matched articles are appended to the end of the prompt. If the instructions aren't being followed, try placing them at the beginning of the prompt, and summarize them briefly at the end too.
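As a rough illustration of the components and ordering described above, here is a minimal Python sketch that assembles a prompt with the instructions near the beginning and an optional brief restatement at the end. The function name, parameter names, and section labels are all hypothetical; they aren't part of any product API, and the layout is only one of the orderings you might try.

```python
def build_prompt(role, objective, instructions, examples, closing_summary=""):
    """Assemble a prompt from the four components, with instructions placed early."""
    parts = [
        role,
        f"Objective: {objective}",
        "--- Instructions ---",
        # Number the instructions so each one is a distinct, ordered item.
        "\n".join(f"{i}. {text}" for i, text in enumerate(instructions, start=1)),
    ]
    if examples:
        parts.append("--- Examples ---")
        parts.extend(examples)
    if closing_summary:
        # Optional short restatement of the instructions at the end of the
        # text you author (assumption: this helps when instructions are ignored).
        parts.append(closing_summary)
    return "\n\n".join(parts)


prompt = build_prompt(
    role="You are a customer service agent for an internet service provider.",
    objective="Answer the user's question using the Knowledge Articles.",
    instructions=["Respond in fewer than 100 words.", "Use a friendly tone."],
    examples=[],
    closing_summary="Remember: fewer than 100 words, friendly tone.",
)
print(prompt)
```

Because the pieces are kept separate, you can reorder them or drop the closing summary between test runs and compare the results.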

Key concepts

The following key concepts inform our best practices.

Emotional prompting can improve performance

Some research has shown that using emotional language, like stressing the importance of a task, can improve performance. For this, try phrases like:

  • It's very important to me
  • You will be penalized
  • Users will be annoyed if you
  • I believe in your abilities!
  • Stay determined and keep moving forward!
  • Your hard work will be rewarded

Try both positive and negative sentiment as you iterate.

Few-shot prompting can improve performance

Incorporating examples into a prompt is called few-shot prompting, and it can be more effective than natural language rules and descriptions alone. Even just one example of a user query and the desired response can have an impact. For example:

Briefly summarize the article according to the provided context. Follow your instructions.

--- Instructions ---
1. Tell a joke about the article
2. Omit the term article
3. Summarize the main points of the article.

Here is an example:
Article: An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica). Apple trees are cultivated worldwide and are the most widely grown species in the genus Malus. The tree originated in Central Asia, where its wild ancestor, Malus sieversii, is still found. Apples have been grown for thousands of years in Asia and Europe and were introduced to North America by European colonists. Apples have religious and mythological significance in many cultures, including Norse, Greek, and European Christian tradition.
Response: An apple is an edible fruit produced by an apple tree. They're cultivated globally, originating in Central Asia. They've been grown for thousands of years, and carry great cultural significance. And now for a joke: why don't apples ever get lonely? Because they hang out in bunches!
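If you keep your examples as data, a prompt shaped like the one above can be assembled programmatically. The following is a minimal Python sketch, not a definitive implementation; `few_shot_prompt` and its parameters are hypothetical names, and the "Article:" / "Response:" labels simply mirror the example above.

```python
def few_shot_prompt(task, examples, query):
    """Build a few-shot prompt from (article, response) example pairs."""
    blocks = [task]
    for article, response in examples:
        blocks.append(
            f"Here is an example:\nArticle: {article}\nResponse: {response}"
        )
    # End with the new article and an open "Response:" for the model to complete.
    blocks.append(f"Article: {query}\nResponse:")
    return "\n\n".join(blocks)


prompt = few_shot_prompt(
    task="Briefly summarize the article according to the provided context.",
    examples=[(
        "An apple is a round, edible fruit produced by an apple tree.",
        "An apple is an edible fruit produced by an apple tree.",
    )],
    query="A pear is a sweet fruit produced by trees of the genus Pyrus.",
)
print(prompt)
```

Keeping the examples in a list makes it easy to test how one, two, or more examples affect the output.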

Chain of Thought (CoT) prompting can improve performance

As discussed in a paper by Google Research (Jason Wei et al., 2023), Chain of Thought (CoT) prompting can significantly improve output by priming the large language model with intermediate reasoning steps.

This can even be achieved in a zero-shot (i.e., no example) manner by including the phrase (or a phrase similar to) “Let’s think step by step.”

Here's an example with no chain of thought:

Question: John is carrying a ball in a cup. John is in the living room. John moves to the bathroom. John sets the cup upside down. Mary picks up the cup. Mary goes to the kitchen. Where is the ball?

In our research at LivePerson, we've found that this can yield the output: The ball is in the kitchen with Mary.

But here's a zero-shot example with chain of thought:

Question: John is carrying a ball in a cup. John is in the living room. John moves to the bathroom. John sets the cup upside down. Mary picks up the cup. Mary goes to the kitchen. Where is the ball?
Let's think step by step.

This can yield the output: The ball is in the bathroom. When John set the cup upside down in the bathroom, the ball would have fallen out of the cup and remained in the bathroom. Therefore, the ball is in the bathroom.

As demonstrated above, even without an example of intermediate reasoning, simply priming the LLM to think step by step can improve its performance on reasoning tasks.
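The zero-shot variant amounts to appending a trigger phrase to the question. Here is a minimal Python sketch of that step; the function and constant names are hypothetical, and the trigger phrase is a parameter so that you can experiment with alternatives.

```python
COT_TRIGGER = "Let's think step by step."


def with_chain_of_thought(question, trigger=COT_TRIGGER):
    """Append a zero-shot chain-of-thought trigger phrase to a question."""
    return f"Question: {question.strip()}\n{trigger}"


print(with_chain_of_thought(
    "John is carrying a ball in a cup. John sets the cup upside down. "
    "Mary picks up the cup. Where is the ball?"
))
```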

If your use case is concrete enough, you can provide labeled examples yourself to build a multi-shot prompt. The aforementioned paper by Google Research (Jason Wei et al., 2023) offers examples. Note that this approach requires crafting labeled examples that are relevant to the reasoning of your specific task.

If you find that CoT prompting isn’t working for you, try experimenting with how you instruct the model to reason. For example, Hao Sun et al. were able to see improvements over the “think step by step” phrase by framing the problem as a panel discussion between experts. (Paper) (Github)

Negations can backfire

Instructing the LLM to not do something can have the effect of distracting the LLM, bringing attention to the behavior that you want to suppress, and actually achieving the opposite effect.

Try using “instead” statements that provide an alternative for the LLM to pay attention to. Or, try action or command statements that can replace the negative statement. Keep in mind both of these strategies when iterating, as different problems with the output may require different strategies:

Technique                              Example
Negation statement                     Do not refer to yourself as ChatGPT!
"Instead" statement                    Instead of referring to yourself as ChatGPT, refer to yourself as LivePersonAI.
Positive action or command statement   Always refer to yourself as LivePersonAI.

Note that the LLM is unlikely to follow every instruction every time. Suppressing a behavior is an iterative, trial-and-error process of reducing how often the undesired behavior occurs.
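When reviewing a long prompt, it can help to locate negation statements so that you can consider rewording them. The following Python sketch is a simple heuristic under stated assumptions: the phrase list is illustrative and not exhaustive, it only matches straight apostrophes, and `find_negations` is a hypothetical helper rather than a product feature.

```python
import re

# Illustrative list of negation phrases (an assumption; extend as needed).
NEGATION_PATTERN = re.compile(
    r"\b(do not|don't|never|must not|should not|shouldn't)\b",
    re.IGNORECASE,
)


def find_negations(prompt):
    """Return (line_number, line) pairs for lines containing a negation phrase."""
    hits = []
    for number, line in enumerate(prompt.splitlines(), start=1):
        if NEGATION_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


sample = (
    "Always refer to yourself as LivePersonAI.\n"
    "Do not refer to yourself as ChatGPT!"
)
for number, line in find_negations(sample):
    print(f"line {number}: {line}")
```

A flagged line isn't necessarily a problem; it's a candidate for an "instead" statement or a positive command.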

Substance and language

  • Ground the prompt: In the context of writing a prompt for an LLM, "grounding" means providing clear and specific details and/or context in the request. It's like giving the model a firm foundation or reference point to understand what you're asking for. Instead of "Tell me about phones", which lacks grounding, say "Please write a short paragraph of 50 to 100 words about the iPhone 15, including its product specs, strengths, and weaknesses." This info grounds the request and helps the LLM generate a more relevant and accurate response.

  • Set exact expectations: Our Generative AI models are highly literal, so it's important to state exactly what you want. If you're looking for a certain format (bullet points, numbered lists, etc.), length (token length, word length, etc.), or type of response, specify that in your prompt. For instance, instead of writing "Tell me about the conversation below", write "Provide a concise summary between 20 and 100 words about the conversation below."

  • Include context up front: If your request depends on specific context like previous information or a specific scenario, place that at the beginning of your prompt. This can be done by preceding your main request with a brief introduction. For example:

    You are a customer service agent that works for XYZ business, an internet service provider. The user is experiencing an Internet outage. Your instructions are below:

  • Stay consistent: Avoid inconsistencies in your instructions. As noted above, our Generative AI models interpret your words very literally. Even small inconsistencies can confuse the model. So, make sure all of your own words are precisely aligned to the goal you want to achieve. For instance, an instruction like "Always respond with less than 100 words", followed later in the prompt by "Always provide as much detail as possible", might be interpreted as contradictory requests.

    Also, avoid interchangeable terms. If there's a section in the prompt called Instructions, then earlier in the prompt use "Always follow the Instructions below", not "Always follow the rules below".

    When using command terms, test different ones to see which ones perform best, for example, "summarize" versus "shorten", "classify" versus "label", and so on.

  • Break down complexity: If you have a multi-faceted set of instructions in your prompt, consider breaking them down into a numbered list or bullet points. This can help guide our Generative AI through the different facets you want addressed. For example, instead of "Tell me about the benefits and risks of x", write:

    List three primary benefits of x.
    Or
    List three potential risks or challenges associated with x.

  • Guide the tone and style: If you have a preference for the tone or style of the answer, make it clear. Asking for "a brief summary of the product specs" will produce a different response than asking for "a professional and engaging summary of the product specs for young adults".

  • Include examples for clarification: This can improve performance. See the discussion of few-shot prompting (including examples) above.

  • Avoid negations: This can backfire. See the discussion of this concept above.

  • Avoid ambiguity: Use simple and unambiguous language. Be straightforward with your instructions rather than overly polite. For instance, avoid polite phrasing such as:

    You should always follow each response with "Is there anything else I can help you with?"

    Instead, shorten and simplify to:

    Always follow each response with "Is there anything else I can help you with?"

    Moreover, try to anticipate how your wording could be interpreted. If a term can have multiple meanings, avoid it in favor of a more exact term, and/or add context to avoid confusion. For example, "Refer to the information below" could refer to anything written below. Specify what you mean: "Refer to the Knowledge Articles section below."

  • Avoid superfluous and redundant instructions: Keep in mind the task that you’re trying to accomplish with the large language model. If the use case doesn’t require many tasks (e.g., all you need to do is summarize the knowledge articles/context), then adding too many instructions or rules, or redundant ones, can introduce noise to the prompt. Noise can increase the frequency of hallucinations and generally degrade performance.

    Additionally, while our Generative AI models can handle lengthy prompts, brevity often leads to clarity. It's better to provide a succinct prompt than an overly verbose one. Lengthy prompts run the risk of added ambiguity or contradictory instructions, both of which will make output responses less reliable.
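When a prompt sets exact expectations, such as the 20-to-100-word summary in the example above, it can be useful to check test outputs against those expectations as you iterate. Here is a minimal Python sketch; `within_word_limits` is a hypothetical helper, and splitting on whitespace is only a rough approximation of a word count.

```python
def within_word_limits(response, min_words=20, max_words=100):
    """Check whether a response's whitespace-separated word count is in range."""
    count = len(response.split())
    return min_words <= count <= max_words


summary = "The customer reported an outage and the agent scheduled a technician visit."
if not within_word_limits(summary):
    print("Summary is outside the requested 20 to 100 word range.")
```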

Punctuation

Proper punctuation helps structure your prompt and communicates your expectations to the model more effectively. It can reduce ambiguity and ensure that the model generates responses that align with your intent. However, keep in mind that excessive or incorrect punctuation might confuse the model, so strike a balance between clarity and simplicity.

  • Delimiters: As recommended in OpenAI's guide, use delimiters to demarcate sections of text to be treated differently.

  • End punctuation: Use appropriate end punctuation, such as periods and question marks, to signal the type of response you expect. For example:

    • Tell me about your baggage policies.
    • What are the main causes of flight cancellations?
  • Commas: Use commas to separate different parts of a prompt or to make the prompt easier to understand. For example:

    • In your response, discuss the benefits, drawbacks, and ethical considerations.
    • Compare and contrast the economic, social, and environmental impacts.
  • Quotation marks: Use quotation marks to indicate that you want the model to treat a specific portion of text as a quoted or verbatim statement. Quotation marks can also be used to refer to something in the prompt. Also, one common convention is to enclose longer strings in triple quotes. For example:

    Succinctly and accurately summarize the conversation in the "text" below:

    text:
    """
    {text input here}
    """

  • Colon: Use a colon to introduce a list, explanation, or elaboration. For example:

    • List three reasons to upgrade a modem:
  • Hyphens and dashes: Use hyphens and dashes appropriately for clarity. For example:

    • Discuss the pre- and post-upgrade differences in system performance.
    • Explain the differences between iPhones - iPhone 14 and iPhone 15 - and Android phones.
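The triple-quote convention from the quotation marks bullet above can be applied when a prompt is built in code. The following Python sketch is illustrative only; the helper names are hypothetical, and replacing triple quotes inside the input is one possible safeguard (an assumption, not a documented requirement) to keep the input from closing the delimiter early.

```python
def wrap_in_triple_quotes(label, text):
    """Wrap text in triple-quote delimiters under a label."""
    # Illustrative safeguard: neutralize triple quotes inside the input itself.
    safe_text = text.replace('"""', "'''")
    return f'{label}:\n"""\n{safe_text}\n"""'


def summarization_prompt(conversation):
    """Build the summarization prompt shown above around a conversation."""
    instruction = (
        'Succinctly and accurately summarize the conversation in the "text" below:'
    )
    return f"{instruction}\n\n{wrap_in_triple_quotes('text', conversation)}"


print(summarization_prompt("Agent: Hello!\nCustomer: My internet is down."))
```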

Capitalization

We're still working to understand the impact of capitalization. We know that capitalized text is tokenized differently by LLMs, so Word, word, and WORD are all tokenized differently. This means that the model can treat each form differently, although we're still researching what those differences are.

Since LLMs are trained on natural language, and since all capital letters are often used in natural language to convey emphasis, our current hypothesis is that a word in all capital letters might have the same effect in a prompt: extra emphasis on that word.

References to matched articles

In prompts, when referring to the articles in the KnowledgeAI knowledge base that matched the consumer's query, refer to them as Knowledge Articles. This is what they're called behind the scenes in the system, so keep this terminology consistent.

See our tip for coaching the LLM to not use this term in its response.

Use of variables

Be judicious when using variables in prompts. Using many variables can increase latency.

Size limit

The Prompt text field has a maximum limit of 5,000 characters. This is to comply with the token limits from our LLM providers. The character limit is enforced by the system.

[Figure: The Prompt Text field within a prompt, with a callout noting the 5,000-character limit]
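If you draft prompts outside of the system, you can check the length before pasting the text into the field. This is a minimal Python sketch; `check_prompt_length` is a hypothetical helper, and it assumes that the system counts characters the same way Python's `len` does, which may not hold for every character.

```python
MAX_PROMPT_CHARS = 5000  # the limit stated above


def check_prompt_length(prompt, limit=MAX_PROMPT_CHARS):
    """Return (fits, remaining), where remaining is negative if over the limit."""
    remaining = limit - len(prompt)
    return remaining >= 0, remaining


fits, remaining = check_prompt_length("You are a customer service agent...")
print(f"fits={fits}, characters remaining={remaining}")
```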

Prompt management

When creating and managing prompts, follow these best practices. They’ll help you to avoid impacting your Generative AI solution in ways that you don’t intend.

  • Name field: Establish, document, and socialize a naming convention to be used by all prompt creators. Consider referencing the environment (Dev, Prod, etc.). Consider including a version number (v1.0, v2.0, etc.).
  • Description field: Enter a meaningful description of the prompt’s purpose. If the prompt has been updated, describe the new changes. Also, list where the prompt is used in your solution, so you can readily identify the impact of making changes to the prompt. Consider identifying the prompt owner here. If the prompt is only used for testing, you might want to mention that. Generally speaking, include any info that you find useful.
  • Optional fields: At the moment, these contain metadata that doesn’t enhance the usability of the system. But we recommend specifying values, so they’re there for the future.
  • Minor changes: These include changes like typo fixes. It’s okay to edit the prompt directly.
  • Major changes: To make these, we recommend duplicating the prompt and testing the changes in the copy first. This avoids impacting your Production solution while you’re testing and verifying. Always test before changing your Production solution.
  • Duplicate feature: Take advantage of this feature. It lets you implement self-managed versioning: Duplicate Prompt A v1.0 to create Prompt A v2.0. Duplicate Prompt A v2.0 to create Prompt A v3.0. And so on. This kind of strategy has two very important benefits: 1) Your Production solution isn’t impacted as you work on new, independent copies of prompts. 2) By keeping versions distinct, it enables you to revert your solution to an earlier version of a prompt if needed.
  • Edit feature: This feature lets you make changes to a prompt. But for safety, we recommend using the duplicate feature for major changes. Always fully test any substantive changes, especially major ones.
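If you track prompt names in a script or spreadsheet export, a naming convention like the one above is easy to validate and increment. Here is a minimal Python sketch that assumes names end in " v<major>.<minor>" (for example, "Prompt A v1.0"); that pattern and the `next_major_version` helper are assumptions for illustration, not something the system enforces.

```python
import re

# Assumed convention: "<base name> v<major>.<minor>", e.g. "Prompt A v1.0".
NAME_PATTERN = re.compile(r"^(?P<base>.+) v(?P<major>\d+)\.(?P<minor>\d+)$")


def next_major_version(name):
    """Return the name to give a duplicate, with the major version incremented."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Name does not follow the convention: {name!r}")
    return f"{match['base']} v{int(match['major']) + 1}.0"


print(next_major_version("Prompt A v1.0"))
```

Rejecting names that don't match the pattern is a deliberate choice here: it surfaces inconsistencies such as a missing "v" early.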

Prompt testing

Even modest changes to prompts can produce very different results, so always test a prompt fully before using it in Production. Take advantage of the following tools:

  • KnowledgeAI’s testing tools: Use these to test the article matching, and to see enriched answers that are generated without any conversation context as input to the LLM. In the results, you can see the articles that were matched, the prompt sent to the LLM service, and the enriched answer that was returned. This tool can help you to tune the performance of the knowledge base. It also gives you some insight into how well the Generative AI piece of your solution is performing.
  • Conversation Builder’s Preview and Conversation Tester: Use either of these tools to fully test your Generative AI solution. Both tools give you a better view into performance because both pass previous turns from the current conversation to the LLM service, not just the matched articles and the prompt. This added context enhances the quality of the enriched answers. So these tools give you the most complete picture.

    Use the Conversation Tester to test the end-to-end flow. With Preview, the conversation only flows between the tool and the underlying bot server. With Conversation Tester, it goes through Conversational Cloud.

Using Generative AI in Conversation Assist? To fully test prompt changes and include the conversation context as input to the LLM, you’ll need to create a messaging bot in Conversation Builder. Configure it to match your Conversation Assist configuration; for example, use the same answer threshold. You can quickly create a messaging bot via a bot template, specifically the Generative AI - Messaging bot template.

Releasing prompt changes

  • Generative AI in Conversation Assist: First test via a Conversation Builder test bot. Then update the prompt configuration in Conversation Assist.
  • Generative AI in Conversation Builder bots: We recommend you take advantage of Conversation Builder’s Release feature to control and manage how prompt changes are made live in your Production bots. First make the updates in a Development or Sandbox bot and test. When you’re ready, push those changes to the Production bot.

External references and guides