# Technical Practice Cases

# 1. Prompt Best Practices

# 1.1 Opening Formula (Define Role + State Problem + Set Goal + Supplement Requirements)

You are a time management trainer [Define Role]. I want to learn about time management [State Problem]. Please produce a PPT outline on time management [Set Goal]. Format the output as Markdown code [Supplement Requirements].
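The four-part formula can be treated as a small template. The sketch below is illustrative only: the function name `build_opening_prompt` and its parameters are assumptions made for this example, not part of any platform API.

```python
def build_opening_prompt(role: str, problem: str, goal: str, requirements: str) -> str:
    """Join the four parts of the opening formula into one prompt string."""
    parts = [
        f"You are {role}.",  # Define Role
        problem,             # State Problem
        goal,                # Set Goal
        requirements,        # Supplement Requirements
    ]
    # Skip empty parts so that an optional section can be left out.
    return " ".join(p.strip() for p in parts if p.strip())


prompt = build_opening_prompt(
    role="a time management trainer",
    problem="I want to learn about time management.",
    goal="Please produce a PPT outline on time management.",
    requirements="Format the output as Markdown code.",
)
print(prompt)
```

Keeping the four parts as separate arguments makes it easy to vary one of them (for example, the output format) while holding the others fixed.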

# 1.2 Few-Shot Prompting

Sometimes a task is hard to describe clearly in words, and giving the model a few examples conveys the intent more easily.

For example, we ask the model to act as a text classifier that performs binary classification on user reviews, with two possible results: Positive Review or Negative Review.

Please classify the user input text as a positive review or a negative review, following the samples below, and output only the label: Positive Review / Negative Review.

Please refer to the following samples:
Sample 1:
User Input: I went to this restaurant last night, and their food and service were amazing. I will definitely come again.
Output: Positive Review
Sample 2:
User Input: This mobile phone has an ultra-long battery life and great camera effects, very satisfied!
Output: Positive Review
Sample 3:
User Input: The courier was delayed for a week, and the packaging was damaged, the experience was extremely poor.
Output: Negative Review
Sample 4:
User Input: The movie has a wonderful plot and the actors' acting is on point, highly recommended!
Output: Positive Review
Sample 5:
User Input: The product quality is terrible, it broke after one use, and the customer service ignored me.
Output: Negative Review

Please answer the following question:
User Input: I dined at this western restaurant last month, it was okay, but not particularly amazing.
Output:
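A prompt like the one above can be assembled from a list of labeled samples. This is a minimal sketch, assuming the samples are kept as `(text, label)` pairs; `build_few_shot_prompt` and `INSTRUCTION` are hypothetical names chosen for this example.

```python
from typing import List, Tuple

INSTRUCTION = (
    "Please classify the user input text as a positive review or a negative review, "
    "and output only the label: Positive Review / Negative Review."
)


def build_few_shot_prompt(samples: List[Tuple[str, str]], query: str) -> str:
    """Build a few-shot classification prompt that ends with an open 'Output:' slot."""
    lines = [INSTRUCTION, "", "Please refer to the following samples:"]
    for i, (text, label) in enumerate(samples, start=1):
        lines.append(f"Sample {i}:")
        lines.append(f"User Input: {text}")
        lines.append(f"Output: {label}")
    # The query uses the same layout as the samples, with the label left blank.
    lines += ["", "Please answer the following question:", f"User Input: {query}", "Output:"]
    return "\n".join(lines)


samples = [
    ("I went to this restaurant last night, and their food and service were amazing.", "Positive Review"),
    ("The product quality is terrible, it broke after one use.", "Negative Review"),
]
prompt = build_few_shot_prompt(
    samples,
    "I dined at this western restaurant last month, it was okay, but not particularly amazing.",
)
print(prompt)
```

Formatting the query exactly like the samples is the point of the technique: the model is nudged to continue the pattern with one of the two labels.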

# 1.3 CoT (Chain of Thought)

CoT (Chain of Thought) is a prompting technique that helps a large model reach more complete and reliable conclusions on complex problems by reasoning through them step by step.

Core Concepts:

  • The core idea of CoT is to guide the model to show its step-by-step reasoning process before generating the final answer.
  • It imitates the way humans solve complex problems: instead of jumping directly to conclusions, it derives answers through a series of intermediate, interpretable logical steps ("chains").
  • These steps usually include: understanding the problem, decomposing the problem, invoking relevant knowledge, performing logical operations or reasoning, integrating information, and drawing conclusions.

# 1.3.1 Zero-Shot

Add the phrase "Let's think step by step" or "Let's analyze and think step by step" to the prompt.
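In code, zero-shot CoT amounts to appending the trigger phrase to the question. A minimal sketch, where `add_zero_shot_cot` is a hypothetical helper name:

```python
COT_TRIGGER = "Let's think step by step."


def add_zero_shot_cot(question: str, trigger: str = COT_TRIGGER) -> str:
    """Append a chain-of-thought trigger phrase on its own line after the question."""
    return f"{question.strip()}\n{trigger}"


print(add_zero_shot_cot("A book costs 50 yuan and is sold at a 20% discount. What is the final price?"))
```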

# 1.3.2 Few-Shot

Provide several examples with detailed reasoning steps in the prompt. Each example shows the complete process of "Question -> Step-by-Step Reasoning -> Final Answer". By observing these examples, the model learns to generate a similar reasoning chain when answering a new question. This is the most commonly used method and usually gives better results.

Please solve the following math problem. I will first give several examples with detailed reasoning steps, then ask you to solve the final problem.

Example 1:
Question: Xiao Ming has 15 apples. He ate 3 of them, then bought 2 bags of apples, with 4 apples in each bag. How many apples does he have now in total?
Reasoning Steps:
1. Initial number of apples: 15.
2. Remaining after eating 3: 15 - 3 = 12.
3. Bought 2 bags, 4 apples each: 2 * 4 = 8.
4. Total number of apples now: remaining apples + newly bought apples = 12 + 8 = 20.
Answer: 20

Example 2:
Question: A swimming pool is 25 meters long, 10 meters wide, and 2 meters deep. If 5 cubic meters of water can be injected per minute, how many minutes does it take to fill the swimming pool?
Reasoning Steps:
1. Calculate the volume of the swimming pool: length * width * depth = 25 meters * 10 meters * 2 meters = 500 cubic meters.
2. Known water injection speed per minute: 5 cubic meters/minute.
3. Required time = total volume / water injection speed = 500 cubic meters / 5 cubic meters/minute = 100 minutes.
Answer: 100

Now, please solve this problem:
Question: A bookstore holds a promotion, all books are sold at a 20% discount. Xiao Li bought a book with an original price of 50 yuan and another book with an original price of 30 yuan. How much did she actually pay?
Reasoning Steps:
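Because the model tends to imitate the sample chains, it is worth checking the arithmetic of each example before using it in a prompt. The sketch below recomputes the two sample answers and the answer a correct chain should reach for the final question; the function names are illustrative only.

```python
from typing import List


def apples_total(initial: int, eaten: int, bags: int, per_bag: int) -> int:
    """Example 1: apples left after eating some, plus the newly bought bags."""
    remaining = initial - eaten
    bought = bags * per_bag
    return remaining + bought


def fill_minutes(length_m: float, width_m: float, depth_m: float, rate_m3_per_min: float) -> float:
    """Example 2: pool volume divided by the filling rate."""
    volume = length_m * width_m * depth_m
    return volume / rate_m3_per_min


def discounted_total(prices: List[int], discount_percent: int) -> float:
    """Final question: total price after a percentage discount on every book."""
    return sum(prices) * (100 - discount_percent) / 100


print(apples_total(15, 3, 2, 4))       # 20
print(fill_minutes(25, 10, 2, 5))      # 100.0
print(discounted_total([50, 30], 20))  # 64.0
```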

If you want to learn more about prompts, the following websites are useful for self-study.

| Name | Website |
| --- | --- |
| Selected Chinese Prompts | https://github.com/langgptai/wonderful-prompts |
| Popular Website System Prompts | https://github.com/jujumilk3/leaked-system-prompts |
| Theoretical Learning - Prompt Engineering Guide | https://www.promptingguide.ai/zh |
| Prompt Engineering Guide | https://learnprompting.org/zh-Hans/docs/introduction |

# 2. Development Practice of Photo-Based Q&A Agent

# 2.1 Case Introduction

With the continuous breakthroughs in large models and multimodal fusion technologies, the unified understanding and content generation of heterogeneous data such as text and images have been widely applied. AI applications such as photo-based Q&A and photo-based product searching have brought convenience to people.

This case mainly implements the photo-based Q&A function. When users encounter various unsolved problems in mathematics, English, etc., they only need to take a photo and upload it to the agent to get an accurate answer. This article will detail how to implement a multimodal agent application with photo-based Q&A function step by step.

# 2.2 Implementation Process

A complete agent product generally goes through four steps from development to application: creation, orchestration, debugging, and release, among which agent orchestration is the core of agent design and implementation.

The key points of the photo-based Q&A function are recognizing the image content and having the large model reason over it to generate a result. The overall workflow is: Start node -> General OCR Large Model node -> Large Model node -> End node.

The specific implementation process is explained step by step below.

# Step 1: Create a Workflow Agent

Open the homepage of the Xingchen Agent Development Platform, click [My Agents] in the left navigation bar to enter the My Agents list page, then click [New Agent] - [Create by Workflow] - [Custom Creation] to enter the agent canvas page. By default, the canvas contains two mandatory nodes: Start and End.

# Step 2: Configure Support for Image Input

Click [+ Add] at the bottom left of the Start node to add an input variable. Set the variable name to "image" and select "Image" as the variable type. Subsequent nodes use this variable name to reference the URL of the image uploaded by the user.

# Step 3: Implementation of Image-Text Recognition

The platform provides an official General OCR Large Model tool that supports image and PDF recognition and can be referenced through a tool node. The operation process is as follows:

  1. Click the [+] sign in the upper right corner of the [Tool] node in the node list on the left side of the canvas to open the tool selection page. Search the official tools for [General OCR Large Model], click the [Add] button to the right of the tool to add it to the canvas, then return to the canvas page.

  2. Connect the Start node and the General OCR Large Model node, and configure the input parameters of the General OCR Large Model node. The General OCR Large Model has three parameters:

| Parameter Name | Type | Parameter Description | Required |
| --- | --- | --- | --- |
| file_url | string | The address of the OCR file to be recognized. Currently supports images and PDFs | Yes |
| ocr_document_page_start | integer | For document data, specify the starting page range for recognition, starting from 0, -1 means no limit | No |
| ocr_document_page_end | integer | For document data, specify the ending page range for recognition, starting from 0, -1 means no limit | No |

For the input parameter "file_url", reference Start/image. For image data, ocr_document_page_start and ocr_document_page_end can be left blank.
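On the platform these parameters are set in the node's configuration panel rather than in code. Purely as an illustration of which values are required and which are optional, the inputs can be pictured as a dictionary; `build_ocr_inputs` and the example URL are hypothetical and not part of the tool.

```python
from typing import Dict, Optional, Union


def build_ocr_inputs(
    file_url: str,
    page_start: Optional[int] = None,
    page_end: Optional[int] = None,
) -> Dict[str, Union[str, int]]:
    """Collect the OCR node inputs, omitting the optional page range when unset."""
    if not file_url:
        raise ValueError("file_url is required")
    inputs: Dict[str, Union[str, int]] = {"file_url": file_url}
    # The page range only matters for documents such as PDFs; images leave it blank.
    if page_start is not None:
        inputs["ocr_document_page_start"] = page_start
    if page_end is not None:
        inputs["ocr_document_page_end"] = page_end
    return inputs


# Image input: only the required parameter is set.
print(build_ocr_inputs("https://example.com/question.png"))
# PDF input: recognition starts at page 0 with no upper limit (-1).
print(build_ocr_inputs("https://example.com/homework.pdf", page_start=0, page_end=-1))
```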

# Step 4: Large Model Reasoning to Generate Answers

Connect a large model node after the General OCR Large Model node to reason and generate answers.

Configure two variables for the large model input parameters: input and image_content:

  • input: Reference the text information input by the user (Start/AGENT_USER_INPUT)
  • image_content: Reference the image OCR result (General OCR Large Model_1/data.content)

Select Spark X1 as the model for generating answers.

The large model prompt configuration is as follows:

//System Prompt:
You are a teacher of all disciplines, please answer students' questions as required.

//User Prompt:
The OCR result of the question picture uploaded by the user is: {{image_content}}. Following the user's requirements: {{input}}, answer the question in a standard form that students can understand.
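The `{{image_content}}` and `{{input}}` placeholders are filled with the values of the node's input variables before the prompt is sent to the model. How the platform performs this substitution internally is not documented here; the sketch below only illustrates the idea with a regular expression, and both `render` and the shortened template are assumptions made for this example.

```python
import re
from typing import Dict

# Shortened version of the user prompt, kept brief for illustration.
USER_PROMPT = "The OCR result of the question picture is: {{image_content}}. User's requirements: {{input}}."


def render(template: str, variables: Dict[str, str]) -> str:
    """Replace each {{name}} placeholder with the matching value from variables."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in variables:
            # Fail loudly instead of sending a half-filled prompt to the model.
            raise KeyError(f"missing prompt variable: {name}")
        return str(variables[name])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", replace, template)


print(render(USER_PROMPT, {"image_content": "Solve: 2x + 3 = 11", "input": "Show every step."}))
```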

The large model node has 2 output variables:

  • REASONING_CONTENT: The thinking process of the large model
  • output: The answer generated by the large model

# Step 5: Output Configuration

Connect the large model node to the End node, and configure the End node as follows:

  • Answer Mode: Keep the default, which returns the answer in the configured format.
  • Output definition references 2 variables:
    • output: Reference the large model output result (Large Model_1/output)
    • reasoning: Reference the thinking process of the large model (Large Model_1/REASONING_CONTENT)
  • Thinking content: Reference the variable {{reasoning}}
  • Answer content: Turn on streaming output and reference the variable {{output}}

The orchestration of the workflow is now complete.
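The data flow through the four nodes can be summarized in a few lines of Python. This is a conceptual sketch only: the platform runs the workflow on its canvas, and `run_workflow`, `fake_ocr`, and `fake_llm` are hypothetical stand-ins for the OCR tool and the large model, not real API calls.

```python
from typing import Callable, Dict

SYSTEM_PROMPT = "You are a teacher of all disciplines, please answer students' questions as required."


def run_workflow(
    image_url: str,
    user_input: str,
    ocr: Callable[[str], str],
    llm: Callable[[str, str], Dict[str, str]],
) -> Dict[str, str]:
    """Start -> OCR -> large model -> End, mirroring the nodes on the canvas."""
    # OCR node: file_url references Start/image.
    image_content = ocr(image_url)
    # Large model node: combines the OCR text with the user's text input.
    user_prompt = (
        f"OCR result of the question picture: {image_content}. "
        f"User's requirements: {user_input}."
    )
    result = llm(SYSTEM_PROMPT, user_prompt)
    # End node: exposes the thinking process and the final answer.
    return {"reasoning": result["REASONING_CONTENT"], "output": result["output"]}


def fake_ocr(url: str) -> str:
    """Stub standing in for the General OCR Large Model tool."""
    return "1 + 1 = ?"


def fake_llm(system_prompt: str, user_prompt: str) -> Dict[str, str]:
    """Stub standing in for the large model node."""
    return {"REASONING_CONTENT": "Add the two numbers.", "output": "2"}


answer = run_workflow("https://example.com/question.png", "Explain the solution.", fake_ocr, fake_llm)
print(answer["output"])
```

Passing the OCR and model steps in as functions keeps the sketch runnable without any network access and makes each node easy to replace when testing.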

# Step 6: Debug the Workflow

Click the [Debug] button in the upper right corner to enter the debugging page, enter a description, upload the picture of the question, and click Send to test the result.

# Step 7: Release the Agent

After debugging the workflow, fill in the workflow name, description, and other information, then click the release button in the upper right corner to release the agent.