DMflow Document Q&A System User Guide
This guide covers the document Q&A system: file upload, syntax settings, and history records, with detailed instructions for each operation. It helps users get started quickly and use the system effectively.
Document Q&A Page
This page provides upload and retrieval functions for files, audio, and dialogues:
- Data Types:
- Files: Supports csv, json, html, docx, pdf formats, with a single file size limit of 5MB.
- Audio: This feature is not yet available.
- Dialogues: Uses csv format, with fields including type (Q or A), sentence (question or answer), and session_id (dialogue group identifier). Click “Dialogue” to download a template. Dialogues with the same session_id will be grouped together as much as possible.
- Document Filtering: Provides a document filtering function to help users quickly find the needed documents.
- Rebuild Status (for sitemap/rss/atom): For documents sourced from sitemap, rss, or atom, the “rebuild status” function is available. Clicking it makes the system re-fetch content whose dates have changed.
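As a rough illustration of the dialogue format described above, the following sketch builds a CSV with the three fields named in this guide (type, sentence, session_id). The column order and the helper name build_dialogue_csv are assumptions made for this example; the template downloaded via “Dialogue” is the authoritative layout.

```python
import csv
import io

# Field names come from this guide; their order in the header is an assumption.
FIELDS = ["type", "sentence", "session_id"]


def build_dialogue_csv(turns):
    """Serialize (type, sentence, session_id) tuples into CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(FIELDS)
    for turn_type, sentence, session_id in turns:
        # The guide allows only Q (question) or A (answer) in the type field.
        if turn_type not in ("Q", "A"):
            raise ValueError(f"type must be 'Q' or 'A', got {turn_type!r}")
        writer.writerow([turn_type, sentence, session_id])
    return buffer.getvalue()


# Rows sharing a session_id belong to the same dialogue group.
csv_text = build_dialogue_csv([
    ("Q", "What are your opening hours?", "s1"),
    ("A", "We are open 9:00 to 18:00 on weekdays.", "s1"),
    ("Q", "Do you ship overseas?", "s2"),
    ("A", "Yes, to most countries.", "s2"),
])
print(csv_text)
```

Keeping each question and its answer under the same session_id is what lets the system group them together.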
Document Q&A Upload
The system supports uploading various document formats for Q&A analysis. Here are the details for each format’s upload method and considerations:
- PDF and DOCX: Directly upload the files. Image formats are not supported.
- CSV:
- Concept of “Group”: Here, “group” refers to how many rows are placed in the same data block (chunk). For example, setting it to 10 means every 10 rows are placed in the same chunk for subsequent Q&A analysis.
- JSON:
- Concept of “Group”: Same as for CSV; it controls the size of data blocks.
- JSON Path Specification: You can use JSON Path to specify the JSON content to extract. This allows for more precise data selection for analysis.
- HTML and URLs:
- HTML: Directly upload HTML files.
- URLs: You can enter a URL for crawling.
- XML (sitemap/rss/atom): If the URL points to an XML format sitemap, rss, or atom file, check the corresponding option, and the system will automatically fetch the links within.
- Support for Dynamic URLs:
- Automatic Crawling (sitemap/rss/atom): Currently does not support automatic crawling of dynamic URLs.
- Single URL Upload: Supports uploading and crawling a single dynamic URL. This is because automatic crawling of dynamic URLs may take too long, exceeding the system’s runtime limit.
- API Upload: You can use the docqa/upload API for uploads, which allows setting a limit on the number of URLs to crawl per file, up to 50 URLs.
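To make the “group” setting for CSV and JSON more concrete, here is a minimal sketch of splitting rows into chunks of a fixed size. It only illustrates the idea; how the system treats the header row or a final partial group is not documented here, so those details (and the function name group_rows) are assumptions.

```python
import csv
import io


def group_rows(rows, group_size):
    """Split rows into consecutive chunks of at most group_size rows."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [rows[i:i + group_size] for i in range(0, len(rows), group_size)]


# Build a small in-memory CSV with 25 data rows (the header is not counted).
sample = "id,question\n" + "\n".join(f"{i},q{i}" for i in range(1, 26))
rows = list(csv.DictReader(io.StringIO(sample)))

chunks = group_rows(rows, 10)
print([len(chunk) for chunk in chunks])  # [10, 10, 5]
```

A larger group keeps more related rows in one chunk, while a smaller group makes each chunk more focused.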
Important Notes for URL Uploads:
- Cloudflare WAF: If the target website has Cloudflare’s WAF (Web Application Firewall) enabled, you may need to temporarily disable it, as the system currently does not set the necessary header values to handle WAF verification.
- robots.txt: Ensure the target website’s robots.txt allows crawling; otherwise, the system will refuse to crawl the page.
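Before submitting a URL, it can be useful to check locally whether its robots.txt would permit crawling. The sketch below uses Python’s standard urllib.robotparser on robots.txt text you have already obtained. The user agent that the system’s crawler actually sends is not stated in this guide, so the wildcard “*” used here is an assumption.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example rules: everything is crawlable except the /private/ section.
ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(ROBOTS, "https://example.com/docs/intro.html"))
print(is_allowed(ROBOTS, "https://example.com/private/notes.html"))
```

If the check fails for a page you need, adjust the site’s robots.txt before uploading the URL.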
Considerations
Before using the document Q&A function, please note the following:
Basic Operation Flow:
- Link to Production Version: If you are creating a document Q&A for the first time, you must link it to the production version. This step is crucial; otherwise, subsequent operations will not function properly.
- Select a Bot: After linking to the production version, you need to select the bot to use. Only the selected bot can access the document Q&A function for that domain.
History Records
This page provides your document Q&A usage records for the past 30 days, helping you track and analyze Q&A performance.
- Session ID List (1):
- This section shows all active Session IDs from the past 30 days.
- Important: Only Session IDs are shown here, not user identities. This is to protect user privacy.
- You can quickly browse records for different Sessions through this list.
- Tag Filtering (2):
- You can use tags to filter and quickly find Q&A records on specific topics or types.
- Charts: A chart (expected to be a pie chart) next to each tag shows the fallback rate for that tag. Fallback refers to situations where the system cannot find an answer in the provided documents and needs to respond in other ways. This chart helps evaluate document completeness and Q&A system performance.
- Select All: If you check “Select All,” the system will ignore Session IDs and display records based only on tags. This feature helps you analyze the performance of each tag from an overall perspective.
- Detailed Information Display (3):
- This section shows detailed information for each Session, including:
- Received Message: The question posed by the user.
- Response Message: The answer given by the system.
- Text Score: The matching score of the relevant text found in the documents. This score can be used as a reference for evaluating answer quality.
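The fallback rate shown next to each tag can be understood as the share of records under that tag where no answer was found in the documents. The following sketch computes that ratio from a list of records. The record structure (tags and fallback keys) is hypothetical and exists only for this illustration; it does not describe an export format of the system.

```python
from collections import defaultdict


def fallback_rate_by_tag(records):
    """Return {tag: fallback_count / total_count} over all records."""
    totals = defaultdict(int)
    fallbacks = defaultdict(int)
    for record in records:
        for tag in record["tags"]:
            totals[tag] += 1
            if record["fallback"]:
                fallbacks[tag] += 1
    return {tag: fallbacks[tag] / totals[tag] for tag in totals}


# Hypothetical history records; a record may carry more than one tag.
records = [
    {"session_id": "s1", "tags": ["billing"], "fallback": False},
    {"session_id": "s1", "tags": ["billing"], "fallback": True},
    {"session_id": "s2", "tags": ["shipping"], "fallback": False},
    {"session_id": "s3", "tags": ["billing", "shipping"], "fallback": True},
]

rates = fallback_rate_by_tag(records)
for tag, rate in sorted(rates.items()):
    print(f"{tag}: {rate:.0%}")
```

A high rate for a tag suggests that the uploaded documents do not yet cover that topic well.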
Syntax Completion (Custom Intent Representation)
The “Syntax Completion” feature allows you to customize intent representation by setting questions (Question) and responses (Response), and adding additional information, to more precisely control the system’s behavior. This feature is mainly used to define response methods for specific scenarios.
Usage:
- Question: Enter the text you want the system to match in this field. When the user’s input matches the text set here, the system will trigger the corresponding response.
- Response: Enter the content the system should reply with in this field.
- Important: If this field is empty, the system will not reply with any message but will still process “Additional Information.” This means you can trigger certain actions using only “Additional Information” without sending any text reply.
- Additional Information (Optional): Here, you can add extra information or instructions to expand the functionality of this intent. For example, you can trigger specific actions, display card-style messages, or send other types of data.
- Supported Types: The dropdown menu lists currently supported types of additional information, such as “Card.”
- Add Card: Click this button to add a card-style message, providing a richer way to present information.
Operational Logic:
When the user’s input matches the content in the “Question” field, the system will perform the following operations:
- If the “Response” field has content, the system will reply with that content and process “Additional Information.”
- If the “Response” field is empty, the system will not reply with any text but will only process “Additional Information.”
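The two cases above can be sketched as a small function. This is only a model of the described behavior: the class SyntaxCompletion, the function handle_input, and the case-insensitive exact comparison are assumptions for illustration, since this guide does not specify how the system decides that an input matches.

```python
from dataclasses import dataclass, field


@dataclass
class SyntaxCompletion:
    question: str
    response: str = ""  # may be left empty
    additional_info: list = field(default_factory=list)


def handle_input(user_input, completions):
    """Return (reply_or_None, additional_info) for the first match, else None."""
    normalized = user_input.strip().lower()
    for completion in completions:
        # Assumed matching rule: case-insensitive equality.
        if normalized == completion.question.strip().lower():
            # An empty Response means no text reply, but the
            # Additional Information is still processed.
            reply = completion.response or None
            return reply, completion.additional_info
    return None


completions = [
    SyntaxCompletion("Order pizza", "", [{"type": "card", "title": "Pizza menu"}]),
    SyntaxCompletion("What is dmflow.chat", "dmflow.chat is me"),
]

print(handle_input("order pizza", completions))
print(handle_input("What is dmflow.chat", completions))
```

The first call yields no text reply but still returns the card, while the second returns the configured reply with no additional information.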
Session Input Matching (Within the Same Session):
Within the same Session, the system will automatically start session input matching. If the user’s subsequent input has a high similarity to previous input, the system will use the previous answer as the main response, and no AI points will be deducted.
Considerations:
- Impact on Context: Session input matching may affect the coherence of the context. Since the system prioritizes using previous answers, it may lead to slight deviations from the current context in some cases.
- Similarity Conditions for Different Languages: Different languages have different standards for judging similarity. This means the same similarity setting may have different effects across languages.
- Multiple Syntax Completions: If you need to set multiple syntax completions for different languages, please create multiple language settings. This is because different languages have different syntax and expressions, requiring separate settings for optimal results.
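Session input matching can be pictured as a per-session cache of earlier inputs and answers. The sketch below uses character-level similarity from Python’s difflib as a stand-in. The similarity measure and threshold the system actually uses are not documented here, so the class SessionAnswerCache, the 0.9 threshold, and the comparison method are all assumptions.

```python
from difflib import SequenceMatcher


class SessionAnswerCache:
    """Reuse a previous answer when a new input in the same session is similar."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed value, not a documented setting
        self._history = {}  # session_id -> list of (input, answer)

    def lookup(self, session_id, user_input):
        """Return a cached answer for a similar earlier input, or None."""
        for previous_input, answer in self._history.get(session_id, []):
            ratio = SequenceMatcher(
                None, previous_input.lower(), user_input.lower()
            ).ratio()
            if ratio >= self.threshold:
                return answer
        return None

    def remember(self, session_id, user_input, answer):
        """Store an input and its answer under the given session."""
        self._history.setdefault(session_id, []).append((user_input, answer))


cache = SessionAnswerCache()
cache.remember("s1", "what is dmflow.chat", "dmflow.chat is me")

print(cache.lookup("s1", "What is dmflow.chat?"))  # similar input, same session
print(cache.lookup("s2", "what is dmflow.chat"))   # different session
```

Because a character-based ratio behaves differently across languages and scripts, a single threshold will not suit every language, which mirrors the consideration about language-specific similarity above.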
Examples:
- Scenario: When a user inputs “order pizza,” display a pizza ordering card.
- Settings:
- Question: Order pizza
- Response: (blank)
- Additional Information: Card (including pizza flavors, sizes, etc.)
- Result: When a user inputs “order pizza,” the system will not reply with any text but will display a pizza ordering card for the user to choose flavors and sizes.
(This is only an example; do not use document Q&A for ordering pizza.)
- Scenario: When a user inputs “what is dmflow.chat,” reply with a fixed answer.
- Settings:
- Question: What is dmflow.chat
- Response: dmflow.chat is me
- Result: When a user inputs “what is dmflow.chat,” the system will reply with “dmflow.chat is me.” If the user inputs “what is dmflow.chat” or a very similar sentence again within the same Session, the system will directly show “dmflow.chat is me” again. This repeat matching applies to every call to the knowledge node’s general function, except for syntax completion, and no additional AI points are deducted when the request does not pass through the LLM node.
Conclusion:
Through this user guide, users should fully understand the various functions and operation methods of the document Q&A system. The system provides diverse file upload methods, flexible syntax settings, and convenient history inquiry functions, helping users more effectively use document data for Q&A interactions. Please read the relevant considerations carefully before use to ensure the system operates normally.