Think-on-Graph 2.0: Deep and Faithful Large Language Model Reasoning with Knowledge-Guided Retrieval Augmented Generation
Slide Content
Ma et al. "Think-on-Graph 2.0: Deep and Faithful Large Language Model Reasoning with Knowledge-Guided Retrieval Augmented Generation." arXiv preprint arXiv:2407.10805 (2024).
Presenter: Thien Nguyen, Network Science Lab, Dept. of Artificial Intelligence, The Catholic University of Korea. E-mail: [email protected]
1. Introduction
- RAG enhances LLMs by retrieving external knowledge to reduce hallucinations.
- ToG-2 is a tight-coupling hybrid RAG framework that iteratively retrieves from both unstructured texts and structured knowledge graphs (KGs).
- Core idea: alternate between KG-guided context retrieval and context-enhanced graph retrieval for deeper reasoning.
- Advantages:
  - Deepens context retrieval via KG guidance.
  - Enables precise graph retrieval using document contexts.
  - Achieves faithful LLM reasoning through iterative collaboration.
  - Training-free and plug-and-play across LLMs.
2. Problem Statement
Challenges in current RAG:
- Text-based RAG: relies on semantic similarity and misses structured relationships (e.g., between entities such as "Global Financial Crisis" and "The 2008 Recession").
- KG-based RAG: structured but incomplete; lacks detailed contexts beyond the ontology.
- Loose-coupling hybrid RAG: aggregates information from both sources but does not improve the retrieval itself (e.g., fails on multi-step queries needing in-depth details).
Motivation: tight coupling is needed to enable human-like reasoning, integrating fragmented information and structural links for complex tasks.
3. Proposed Solution: The ToG-2 Framework
Core paradigm: KG × Text (tight-coupling hybrid RAG).
- Starts from topic entities extracted from the question.
- Iterative process:
  - Graph retrieval: explore relations on the KG to find candidate entities.
  - Context retrieval: retrieve and rank texts from documents linked to those entities.
  - Pruning: refine the entity set based on context relevance.
  - Reasoning: prompt the LLM with triples and contexts; continue if the knowledge is insufficient.
- Benefits:
  - In-depth retrieval: the KG guides deep text search; texts enable precise KG pruning.
  - Faithful reasoning: heterogeneous knowledge reduces hallucinations.
  - Efficiency: no training; works with any KG and document collection; a KG can be built from the documents if needed.
4. Methodology: Overview
Initialization:
- Extract and link entities from the question using entity linking (e.g., Azure AI).
- Topic prune: the LLM selects the starting entities E⁰_topic.
- Initial context: retrieve the top-K chunks with a dense retrieval model (DRM).
Hybrid knowledge exploration (iterative):
- Knowledge-guided graph search:
  - Relation discovery and prune: the LLM scores relations (0-10); keep the top ones.
  - Entity discovery: find connected entities.
- Knowledge-guided context retrieval: collect contexts from candidate entities; score them with the DRM (using triple sentences).
- Context-based entity prune: exponential-decay scoring; keep the top-W entities for the next iteration.
Reasoning with hybrid knowledge:
- Prompt the LLM with paths, contexts, and clues.
- If the knowledge is sufficient, generate the answer; otherwise summarize clues and continue (up to depth D).
A minimal code sketch of this loop appears below.
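The sketch below summarizes the alternating retrieval-and-reasoning loop in Python. The function and parameter names (`graph_search`, `context_retrieve`, `entity_prune`, `reason`, `max_depth`) are hypothetical placeholders, not the paper's published interface; each step is injected as a callable in the spirit of the framework's plug-and-play design.

```python
from typing import Callable

Triple = tuple[str, str, str]  # (head, relation, tail)

def tog2_loop(
    question: str,
    topic_entities: list[str],
    graph_search: Callable[[str, list[str]], list[Triple]],
    context_retrieve: Callable[[str, list[Triple]], list[str]],
    entity_prune: Callable[[list[Triple], list[str]], list[str]],
    reason: Callable[[str, list[Triple], list[str], str], tuple[bool, str]],
    max_depth: int = 3,
) -> str:
    """Alternate KG-guided graph search and context retrieval, then reason.

    The concrete implementations (LLM prompts, DRM scoring, KG access)
    are supplied by the caller; this function only fixes the control flow.
    """
    clues, answer = "", ""
    for _ in range(max_depth):
        triples = graph_search(question, topic_entities)    # relation discovery + prune
        contexts = context_retrieve(question, triples)      # entity-linked document chunks
        topic_entities = entity_prune(triples, contexts)    # keep top-W entities
        sufficient, answer = reason(question, triples, contexts, clues)
        if sufficient:                                      # LLM judges sufficiency
            return answer
        clues = answer                                      # summarized clues guide the next hop
    return answer  # best effort once maximum depth D is reached
```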
4. Methodology: Initialization Phase
Process:
- Entity linking: extract entities from the question q using tools such as the Azure AI Entity Linking API.
  - Example: "Who was the only other American high schooler besides Lukas Verzbicas to break four minutes in the mile?" → entities: "Lukas Verzbicas", "mile".
- Topic prune (TP): prompt the LLM to select the relevant topic entities E⁰_topic = {e1, e2, ..., eN}, judging each entity's relevance to the question.
- Initial context retrieval: use a dense retrieval model (DRM, e.g., BGE, the BAAI General Embedding) to fetch the top-K document chunks linked to E⁰_topic. If these already suffice to answer, stop; otherwise proceed to exploration.
Key prompt example: "Evaluate entities in the question and select topic entities, scoring 0-10 with explanations."
A toy stand-in for the DRM ranking step follows.
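As an illustrative stand-in for the DRM (the slides name BGE; this bag-of-words cosine ranker is only a toy, not the actual model), the sketch ranks candidate chunks against the question and keeps the top-K:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k_chunks(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Rank chunks against the question and keep the top-K (toy DRM)."""
    q = Counter(question.lower().split())
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(c.lower().split())), reverse=True)
    return ranked[:k]

chunks = [
    "Lukas Verzbicas broke four minutes in the mile as a high schooler.",
    "The mile is a middle-distance running event.",
    "Craig Virgin attended Carl Sandburg High School.",
]
print(top_k_chunks("Who besides Lukas Verzbicas broke four minutes in the mile?", chunks, k=2))
```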
4. Methodology: Hybrid Knowledge Exploration - Graph Search
Process (per iteration i):
1. Relation discovery
   - For each topic entity eᵢⱼ ∈ Eᵢ_topic, find all incident relations in the KG: Edge(eᵢⱼ) = {(relation, head/tail)}.
   - Example: "Craig Virgin" → relations "alma mater", "place of birth".
2. Relation prune (RP)
   - Prompt the LLM to score relations (0-10); keep the top relations by relevance to the question.
   - Modes: individual (per entity) or combined (across all entities; more efficient).
3. Entity discovery
   - Identify new entities reached via the selected relations.
   - Example: "Craig Virgin" → "alma mater" → "Carl Sandburg High School".
Prompt example: "Retrieve %s relations contributing to the question; rate 0-10 with explanations."
A sketch of the relation-prune step follows.
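A minimal sketch of the relation-prune step, assuming a hypothetical prompt layout and a mocked LLM reply; the paper's actual prompt wording differs.

```python
import re

def relation_prune_prompt(question: str, entity: str, relations: list[str]) -> str:
    """Assemble a relation-scoring prompt in the spirit of the slide's
    example ("rate 0-10 with explanations"); the wording is illustrative."""
    lines = "\n".join(f"- {r}" for r in relations)
    return (
        f"Question: {question}\n"
        f"Entity: {entity}\n"
        f"Rate each relation 0-10 for how much it helps answer the question:\n{lines}"
    )

def parse_scores(reply: str) -> dict[str, float]:
    """Parse '- relation: score' lines from the LLM reply."""
    return {m[0].strip(): float(m[1]) for m in re.findall(r"-\s*(.+?):\s*(\d+(?:\.\d+)?)", reply)}

mock_reply = "- alma mater: 9\n- place of birth: 2"   # stands in for a real LLM call
scores = parse_scores(mock_reply)
print(max(scores, key=scores.get))  # 'alma mater' -> leads to "Carl Sandburg High School"
```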
4. Methodology: Hybrid Knowledge Exploration - Context Retrieval
Process (per iteration i):
- Entity-guided context retrieval:
  - Collect the document chunks cᵢⱼ,ₘ for each candidate entity found by graph search.
  - Compute each chunk's relevance: sᵢⱼ,ₘ = DRM(q, [triple sentence : chunk]), where the triple sentence verbalizes the KG triple (e.g., "Craig Virgin attended Carl Sandburg High School").
  - Select the top-K chunks Ctxᵢ for reasoning.
- Context-based entity prune:
  - Score each entity by the DRM ranks of its chunks: Score(eᵢⱼ) = Σₖ sₖ · wₖ · I(chunkₖ ∈ eᵢⱼ), with wₖ = e^(−α·k), where k is the chunk's rank.
  - Select the top-W entities as Eⁱ⁺¹_topic for the next iteration.
- Example: prune irrelevant entities (e.g., "Evan Jager") whose contexts score low; keep high-relevance ones (e.g., "Carl Sandburg High School").
A worked example of the decay-weighted scoring follows.
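The decay-weighted entity scoring can be made concrete. The sketch implements Score(e) = Σₖ sₖ · e^(−α·k) over DRM-ranked chunks; the α value and the chunk scores are illustrative, not taken from the paper.

```python
import math

def entity_scores(ranked: list[tuple[str, float, str]], alpha: float = 0.3) -> dict[str, float]:
    """Context-based entity prune.

    `ranked` holds (chunk_id, drm_score, owning_entity), sorted by DRM score
    descending; chunk at rank k contributes s_k * e^(-alpha * k) to its entity.
    """
    scores: dict[str, float] = {}
    for k, (_, s_k, entity) in enumerate(ranked):
        scores[entity] = scores.get(entity, 0.0) + s_k * math.exp(-alpha * k)
    return scores

ranked = [  # toy DRM output, best chunk first (scores are made up)
    ("c1", 0.92, "Carl Sandburg High School"),
    ("c2", 0.55, "Carl Sandburg High School"),
    ("c3", 0.40, "Evan Jager"),
]
scores = entity_scores(ranked)
top_w = sorted(scores, key=scores.get, reverse=True)[:1]  # keep top-W entities
print(top_w)  # ['Carl Sandburg High School'] — 'Evan Jager' is pruned
```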
4. Methodology: Reasoning with Hybrid Knowledge
Process:
- Prompt the LLM with the gathered reasoning paths (triples), the retrieved contexts Ctxᵢ, and the accumulated clues.
- Sufficiency check: if the LLM judges the hybrid knowledge sufficient, it generates the final answer.
- Otherwise, it summarizes useful clues to guide the next iteration, and exploration continues (up to a maximum depth D).
A sketch of the prompt assembly and sufficiency parsing follows.
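A minimal sketch of the reasoning step, assuming a hypothetical ANSWER/CLUES reply convention for the sufficiency check; the paper's actual prompts and parsing differ.

```python
def reasoning_prompt(question: str, triples: list[tuple[str, str, str]],
                     contexts: list[str], clues: str) -> str:
    """Assemble the hybrid-knowledge reasoning prompt (wording illustrative)."""
    triple_text = "\n".join(f"({h}, {r}, {t})" for h, r, t in triples)
    ctx_text = "\n".join(contexts)
    return (
        f"Question: {question}\n"
        f"Knowledge triples:\n{triple_text}\n"
        f"Contexts:\n{ctx_text}\n"
        f"Clues so far: {clues or 'none'}\n"
        "If the knowledge suffices, reply 'ANSWER: <answer>'.\n"
        "Otherwise reply 'CLUES: <summary of useful hints>'."
    )

def parse_reasoning(reply: str) -> tuple[bool, str]:
    """Return (sufficient, payload): the answer if sufficient, else new clues."""
    if reply.startswith("ANSWER:"):
        return True, reply[len("ANSWER:"):].strip()
    return False, reply.removeprefix("CLUES:").strip()

print(parse_reasoning("ANSWER: Craig Virgin"))  # (True, 'Craig Virgin')
```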
5. Experiments and Results
(The results were presented as figures and tables on the original slides; no text content was extracted.)
6. Conclusions
Summary: ToG-2 advances RAG by tightly integrating KGs and texts for deeper, more faithful LLM reasoning on complex tasks.
Impact: elevates smaller LLMs; achieves SOTA without training; knowledge remains traceable and editable.
Limitations:
- Knowledge source issues: dependent on incomplete or inaccurate KGs and documents; performance degrades on sparse graphs.
- Retrieval limitations: generic dense retrievers (e.g., BGE) struggle with diverse query types, often retrieving irrelevant information without task-specific fine-tuning.
- LLM behaviors: misinterprets ambiguous questions and exhibits overcautious reasoning.