Knowledge Sources

Control ingestion quality before answers reach customers

Manage URL and document sources with crawl mode control, source health visibility, and retrieval diagnostics.

Website crawl modes

Choose between indexing a single page or crawling a full documentation section depending on your rollout risk and breadth requirements. Single-page mode gives you precise control over what enters the knowledge base, while deep crawl mode ingests entire doc trees in one operation. Both modes provide real-time status visibility so teams always know where indexing stands.
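The difference between the two modes can be sketched as a single-URL fetch versus a breadth-first walk over linked pages. This is an illustrative sketch only: the `pages` link map stands in for real fetching and link extraction, and the function name is hypothetical, not part of the product's API.

```python
from collections import deque

def crawl(pages, start_url, mode="single"):
    """Return the set of URLs to index for a start URL.

    `pages` maps each URL to the URLs it links to -- a stand-in for
    real fetching and link extraction (hypothetical structure).
    """
    if mode == "single":
        # Single-page mode: index exactly one URL, nothing else.
        return {start_url}
    # Deep crawl mode: breadth-first walk over every linked page.
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        for link in pages.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen
```

Single-page mode keeps the indexed set exactly as small as you choose; deep crawl trades that control for one-shot breadth.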

Single-page mode for controlled ingest

Add one URL at a time to carefully validate content quality before expanding coverage to broader sections.

Deep crawl mode for broad coverage

Crawl entire documentation sections automatically, capturing all linked pages in a single operation for fast initial setup.

Source-level status visibility while indexing

Monitor each source as it moves through its lifecycle states — pending, fetched, or failed — so you can catch ingestion issues immediately.


Document ingestion

Upload documents for parsing, chunking, and embedding so the AI assistant can retrieve knowledge from non-web assets like PDFs and internal guides. The ingestion pipeline automatically splits documents into optimally sized chunks and generates embeddings for semantic search. Each uploaded document is tracked as a document-type source with full lifecycle management.
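The chunking step described above is commonly implemented as fixed-size splitting with overlap, so that no fact is stranded at a chunk boundary. A minimal sketch, assuming character-based sizing (the product's actual chunk sizes and strategy are not documented here):

```python
def chunk_text(text, size=500, overlap=50):
    """Split text into fixed-size chunks with overlap -- a common
    retrieval heuristic; the size/overlap values are assumptions."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, keeping an overlap
    return chunks
```

Each chunk would then be embedded and stored for semantic search; the overlap means adjacent chunks share context, which typically improves retrieval of facts near boundaries.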

Document upload endpoint and parser

Upload files through a dedicated interface that extracts text content and prepares it for knowledge base indexing.

Automatic chunking and embedding

Documents are split into retrieval-optimized chunks and embedded automatically, requiring no manual configuration.

Indexed as document-type sources

Uploaded documents are tracked with the same status, metrics, and controls as crawled web sources for unified management.

Operational diagnostics

Inspect source health and retrieval performance with detailed ranking diagnostics to improve answer grounding quality over time. Each source displays its current status, chunk counts, and retrieval metrics so teams can identify underperforming content. Recrawl controls with cooldown protection let you refresh stale sources without accidentally overwhelming the ingestion pipeline.

Lifecycle states: pending, fetched, failed

Every source shows its current lifecycle state so teams can quickly identify and resolve ingestion failures.

Recrawl with cooldown protection

Refresh outdated sources on demand while built-in cooldown timers prevent accidental rapid repeated crawls.
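The cooldown behavior amounts to a simple elapsed-time gate before a recrawl is accepted. A minimal sketch, assuming a 5-minute window (the real window and function name are product-defined, not documented here):

```python
import time

COOLDOWN_SECONDS = 300  # assumed window; the real value is product-defined

def can_recrawl(last_crawled_at, now=None, cooldown=COOLDOWN_SECONDS):
    """Allow a recrawl only once the cooldown window has elapsed.

    Timestamps are Unix seconds; `now` is injectable for testing.
    """
    now = time.time() if now is None else now
    return (now - last_crawled_at) >= cooldown
```

Rejected requests within the window would surface as the cooldown messages mentioned in the setup steps, rather than silently queuing another crawl.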

Retrieval diagnostics: total matches, best rank, average rank

See how often each source is retrieved and how well it ranks, so you can improve or replace low-performing content.
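The three diagnostics named above can be derived from a per-source log of retrieval ranks. A sketch, assuming the log records the 1-based rank the source achieved in each retrieval where it matched (the event shape is an assumption):

```python
def retrieval_diagnostics(ranks):
    """Summarise retrieval events for one source.

    `ranks` holds the 1-based rank this source achieved in each
    retrieval where it matched (hypothetical event-log shape).
    """
    if not ranks:
        # Never retrieved: a candidate for editing or removal.
        return {"total_matches": 0, "best_rank": None, "average_rank": None}
    return {
        "total_matches": len(ranks),
        "best_rank": min(ranks),                 # lower is better
        "average_rank": sum(ranks) / len(ranks),
    }
```

A source with many matches but a poor average rank is being found yet outcompeted, which usually points to content that needs sharpening rather than removal.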

What it is

Knowledge Sources is the ingestion control layer for website URLs and documents. Every source is tracked with type, status, chunk count, retrieval count, and update timestamps.
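The tracked fields listed above suggest a record shape like the following. This is purely illustrative; the field names and types are assumptions, not the product's actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class KnowledgeSource:
    """Illustrative record shape; all field names are assumptions."""
    location: str                 # URL or uploaded filename
    source_type: str              # "url" or "document"
    status: str = "pending"       # pending -> fetched | failed
    chunk_count: int = 0
    retrieval_count: int = 0
    updated_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
```

Tracking documents and URLs in one shape is what allows the unified management described earlier: the same status, metrics, and controls apply regardless of source type.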

When to use it

  • Launching initial documentation coverage.
  • Adding non-web docs that support teams rely on.
  • Investigating weak retrieval performance by source.

When not to use it

  • For exact policy wording where deterministic output is required.
  • For tiny single facts that are easier to maintain as snippets.
  • For assistant tone/personality controls (use Assistant settings instead).

Setup steps in app

  1. Open Knowledge Sources.
  2. Choose crawl mode: single for one URL or deep for full sections.
  3. Add URL or upload document.
  4. Monitor status transitions: pending, fetched, failed.
  5. Use recrawl for changed sources; respect cooldown messages when rate-limited.
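The status transitions in step 4 can be modeled as a small state machine. The transition table below is an assumption inferred from the states this page names (pending, fetched, failed), not a documented contract:

```python
# Allowed lifecycle transitions (assumed from the states named above).
TRANSITIONS = {
    "pending": {"fetched", "failed"},
    "fetched": {"pending"},   # a recrawl re-queues the source
    "failed": {"pending"},    # retrying after a fix also re-queues it
}

def advance(status, new_status):
    """Validate a status change before recording it."""
    if new_status not in TRANSITIONS.get(status, set()):
        raise ValueError(f"cannot move from {status} to {new_status}")
    return new_status
```

Modeling the lifecycle this way makes invalid transitions (for example, a fetched source jumping straight to failed without a recrawl) fail loudly instead of corrupting status reporting.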

Best use cases

  • Deep crawling product docs after each release.
  • Uploading process documents used by support teams.
  • Using diagnostics to identify low-performing sources for edits.

Common pitfalls and how to avoid them

  • Pitfall: crawling too broad too early. Avoid by piloting with single-page mode first.
  • Pitfall: stale content after doc changes. Avoid with recrawl routines tied to release cadence.
  • Pitfall: ignoring failed statuses. Avoid by reviewing error messages and source health weekly.

Success metrics to track

  • Share of sources in fetched state.
  • Retrieval match volume by source.
  • Average and best match rank trends in diagnostics.
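The first metric above — share of sources in the fetched state — reduces to a simple ratio over source statuses. A minimal sketch (the function name is illustrative):

```python
def fetched_share(statuses):
    """Share of sources currently in the fetched state (0.0 to 1.0)."""
    if not statuses:
        return 0.0
    return statuses.count("fetched") / len(statuses)
```

Tracking this ratio weekly, alongside the diagnostics trends, gives an early signal when ingestion failures start accumulating.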

Which option should you use?

Use this matrix to choose the right knowledge option for each support intent.

Intent | Best option | Why | Link
Exact policy wording | Custom Answers | Deterministic response for high-risk phrasing. | /custom-answers
Short operational facts | Snippets | Fast to publish and update. | /snippets
Multi-step guides | Knowledge Pages | Better for long-form structure and context. | /knowledge-pages
Synced external docs | Knowledge Sources | Best for URL crawl and document ingestion workflows. | /knowledge-sources
Baseline product framing | Product Profile | Keeps answers aligned with core positioning. | /product-profile
Operational constraints and fallback rules | Assistant Contexts | Adds scoped behavioral context without rewriting docs. | /assistant-contexts

Knowledge source FAQs

When should I use single-page versus deep crawl?

Use single-page for controlled experiments or sensitive pages. Use deep crawl for broad docs ingestion when structure is stable.

Can I refresh a source after docs change?

Yes. Recrawl is available per source, with cooldown controls to prevent rapid repeated runs.

How can I verify whether a source is actually being retrieved?

Source detail views include retrieval diagnostics such as match counts, rank quality, and recent retrieval events.

Launch a conversation layer your whole GTM team can use.

Unify grounded AI, web and WhatsApp coverage, human handoff, and conversation analytics without stitching together separate tools.