Best Practices for Creating AI Extraction Guidance

Notes
Product: Paige
Best for: Scan Managers, Administrators, Solutions Engineers
Industry: All
Summary: This article outlines best practices for writing AI Extraction Guidance that helps Paige accurately extract and interpret data from documents. By following these guidelines, users can make Paige more flexible, improve extraction accuracy, and streamline document processing.
Paige uses AI Extraction Guidance to interpret data and extract fields accurately. To make this process as effective as possible, provide guidance that is clear but flexible, helping Paige understand the data without enforcing rigid rules.

Paige works best when given broad, natural-language instructions that allow for context and variation. Think of writing AI guidance the same way you’d explain something to a curious learner—clear, straightforward, and adaptable to different situations.


The goal isn’t to be overly strict or technical, but to guide Paige in making the best decisions based on the information it encounters. 

Below are some key best practices to help you create effective AI Extraction Guidance.

1. Provide Context – Explain the Real-World Use of the Data

Giving context helps Paige make more informed decisions by understanding why the data matters. When explaining, consider the following:

  • Purpose: What is the data used for? Is it essential for completing a task, ensuring accuracy, or verifying information?
  • Impact: How does using (or misusing) the data affect the process or outcome? For example, incorrect address data could delay shipments, while missing identity verification could lead to security issues.
  • Examples: Make it concrete—mention scenarios like mailing a letter, processing an order, verifying an account, or calculating a customer’s eligibility for a service.

Providing this context ensures Paige understands not just the data itself but its role in real-world applications.

Lacks Context:

"Extract the address."

Improved Example - Avoiding Delays

"Extract the shipping address accurately to ensure timely delivery. Missing or incorrect details, such as postal codes or apartment numbers, could cause shipping delays, increased costs, or package loss."

2. Use Plain, Natural Language

Write as if you were explaining the concept to someone unfamiliar with it. Avoid jargon or overly precise definitions.

Too Precise:

"This field must contain only a given name, no spaces, and must not exceed 20 characters."

Improved Example:

"This field is usually a person's first name. It might sometimes include a middle initial, but it shouldn't have their full name or last name."

3. Use Keywords and Contextual Cues Effectively

Paige relies on textual cues, including keywords and page numbers, to extract data accurately. Including relevant keywords in your guidance helps Paige recognize patterns, distinguish between similar types of data, and locate information efficiently. Naming clear keywords, along with the page where the data usually appears, increases confidence in the extraction, reduces ambiguity, and improves overall results.

Too Vague:

"Extract the amount from the document."

Improved Example:

"Extract the total invoice amount from the document. Look for keywords such as 'Total Amount,' 'Amount Due,' or 'Grand Total' to identify the correct value."

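Paige itself needs no code, and the following is not how Paige works internally. It is a hypothetical Python sketch of why keyword cues matter: a rule that grabs the first number-like value picks up part of the invoice number, while a rule anchored on labels such as 'Grand Total' finds the intended amount. The function names, sample text, and amount pattern are assumptions made for illustration only.

```python
import re
from typing import Optional

# Illustrative amount pattern (an assumption): digits with optional thousands
# separators and an optional two-digit decimal part, e.g. "1,210.00" or "1250.00".
AMOUNT = re.compile(r"\$?\s?(\d[\d,]*(?:\.\d{2})?)")

# Keyword cues taken from the guidance above.
TOTAL_LABELS = ("total amount", "amount due", "grand total")

# Hypothetical invoice text used only for this example.
SAMPLE = """Invoice INV-2024-123
Subtotal: $1,100.00
Tax: $110.00
Grand Total: $1,210.00"""


def first_amount(text: str) -> Optional[str]:
    """Vague approach: return the first number-like value anywhere in the text."""
    match = AMOUNT.search(text)
    return match.group(1) if match else None


def find_total(text: str) -> Optional[str]:
    """Keyword-guided approach: only consider lines that carry a total label."""
    for line in text.splitlines():
        if any(label in line.lower() for label in TOTAL_LABELS):
            match = AMOUNT.search(line)
            if match:
                return match.group(1)
    return None


print(first_amount(SAMPLE))  # 2024 (a fragment of the invoice number)
print(find_total(SAMPLE))    # 1,210.00
```

The 'Subtotal' line is skipped because it contains none of the listed phrases, which is also why specific keywords work better than a generic word like 'total' (every subtotal contains that word too).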
4. Provide Examples—Even If They Aren’t Always True

Examples help Paige understand what the data typically looks like, even if there are exceptions.

No Guidance:

"Enter a valid phone number."

Improved Example:

"A phone number typically follows formats like ‘555-123-4567’ or ‘(555) 123-4567.’ It may also include a country code, such as ‘+1 555-123-4567’ or ‘+44 20 7946 0958.’ While formatting may vary, phone numbers generally consist of numeric digits and may include spaces, dashes, or parentheses. Avoid extracting non-numeric text or unrelated numbers."

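The difference between a rigid rule and a flexible description can be sketched in Python. This is an illustration only, not part of Paige: strict_phone accepts a single layout, while plausible_phone follows the spirit of the guidance by allowing spaces, dashes, parentheses, and a leading '+' for a country code. Both function names and the 7-digit minimum are assumptions; the 15-digit maximum comes from the ITU-T E.164 numbering plan.

```python
import re


def strict_phone(value: str) -> bool:
    """Rigid rule: accepts only the exact 555-123-4567 layout."""
    return re.fullmatch(r"\d{3}-\d{3}-\d{4}", value) is not None


def plausible_phone(value: str) -> bool:
    """Flexible check that mirrors the guidance: digits, optionally grouped with
    spaces, dashes, or parentheses, and an optional leading '+' country code."""
    value = value.strip()
    if re.fullmatch(r"\+?[\d\s()-]+", value) is None:
        return False  # letters or other symbols: probably not a phone number
    digit_count = len(re.sub(r"\D", "", value))
    # Assumed bounds: E.164 caps numbers at 15 digits; 7 is an illustrative minimum.
    return 7 <= digit_count <= 15


for sample in ["555-123-4567", "(555) 123-4567", "+44 20 7946 0958", "Call me", "2024"]:
    print(f"{sample!r}: strict={strict_phone(sample)}, flexible={plausible_phone(sample)}")
```

The strict rule rejects two of the valid formats listed in the guidance, while the flexible check still rejects plain text and short unrelated numbers such as a year.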
5. Be Expansive Rather Than Restrictive

Being too strict can limit Paige’s ability to make smart judgment calls based on context. Instead of rigid rules, describe what the data usually looks like and how it is commonly used.

Overly Specific Address Format

"An address must always include a street number, street name, city, state, and zip code in that exact order."

Improved Example - Generalized Address Extraction

"Addresses usually include elements like a street number, street name, city, state, and postal code. However, some addresses might use different formats, such as PO Boxes or international styles."

6. Clarify What the Field is Not

It helps to describe what similar but incorrect data might look like, so Paige knows what to avoid.

No Clarification for Common Mistakes:

"Enter a valid invoice number. It must be exactly 10 characters long and contain only numbers."

Improved Example:

"This field is the invoice number, typically following a format like ‘INV-2024-123’ or ‘1005678.’ Invoice numbers often contain numbers and sometimes letters, but they should not be confused with purchase order (PO) numbers or customer IDs."

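The value of describing look-alike fields can be sketched in Python. This is a hypothetical illustration, not Paige's internal logic: the pick_value helper, the 'Label: value' layout, and the sample IDs are invented for the example. Without exclusions, any numbered ID qualifies and the PO number is picked first; naming what the field is not steers the choice to the invoice number.

```python
from typing import Iterable, Optional

# Hypothetical document lines used only for this example.
SAMPLE = """PO Number: 4500012345
Customer ID: C-88231
Invoice Number: INV-2024-123"""


def pick_value(text: str, include: Iterable[str], exclude: Iterable[str] = ()) -> Optional[str]:
    """Return the value of the first 'Label: value' line whose label contains an
    include cue and none of the exclude cues."""
    include, exclude = tuple(include), tuple(exclude)
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        label = label.lower()
        if not sep or any(cue in label for cue in exclude):
            continue  # no label, or a look-alike field the guidance rules out
        if any(cue in label for cue in include):
            return value.strip()
    return None


# Without "what it is not": any numbered ID qualifies, so the PO number wins.
print(pick_value(SAMPLE, include=("number", "id")))  # 4500012345

# With the guidance: look for invoice labels and rule out PO numbers and customer IDs.
print(pick_value(SAMPLE, include=("invoice",),
                 exclude=("po number", "purchase order", "customer id")))  # INV-2024-123
```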
7. Be Specific, But Leave Room for Flexibility

Paige thrives on flexibility. Unlike traditional extraction models that depend on strict definitions, Paige works best when given guidance that allows for variation and context. Words like “usually,” “typically,” and “often” help Paige recognize patterns without being restricted by rigid rules, so it can make sound decisions even when data doesn’t fit a perfect mold.

This approach helps Paige understand the intent behind the data while still allowing for the natural inconsistencies that come with real-world documents.

Too Vague:

"Enter a valid business name."

Improved Example:

"This field is usually the name of a business, like ‘Joe’s Coffee Shop’ or ‘Acme Corp.’ Sometimes people enter their own names here by mistake, but that’s not what this field is for."

By following these best practices, you’ll create guidance that helps Paige make smart, flexible decisions—just like a human would.