An LLM-powered contract compliance application that uses the advanced RAG method Self-RAG together with a Knowledge Graph for the first time.
It provides the highest accuracy recorded so far for contract compliance in the Oil and Gas industry.
Size: 15.12 MB
Language: en
Added: Jul 02, 2024
Slides: 19 pages
Slide Content
Self-RAG Powered Contract Compliance Application, 30th June 2024
Table of Contents
- Fundamentals of LLM and RAG
- Project Objective, Scope & Benefits
- Data Understanding & Contract Rules
- Methodology
- License Area Findings
- Data Confidentiality Findings
- Accuracy and Findings
- Next Steps and Q&A
What are Large Language Models (LLMs)?
- LLMs are neural networks
- LLMs predict the next word of a given sentence. Example sentence: "Despite multiple complaints the issue remains unresolved" (one part is given to the LLM, the rest is predicted by the LLM)
- LLMs learn from trillions of words
- LLMs store information within themselves
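Next-word prediction can be illustrated with a toy example. The sketch below is not an LLM: it swaps in a simple bigram counter (it only looks at the last word of the prompt), and the training sentences are invented for illustration. It is meant only to show the idea of predicting the next word from text seen during training.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for each word, which words follow it in the training sentences."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def predict_next(follows, prompt):
    """Return the most frequent follower of the prompt's last word, or None."""
    words = prompt.lower().split()
    if not words or words[-1] not in follows:
        return None
    return follows[words[-1]].most_common(1)[0][0]


# Hypothetical training sentences, invented for this illustration only.
corpus = [
    "the issue remains unresolved",
    "the issue remains unresolved despite multiple complaints",
    "the issue remains open",
]
model = train_bigram(corpus)
print(predict_next(model, "Despite multiple complaints the issue remains"))
```

A real LLM conditions on the whole prompt with a large neural network rather than on a single preceding word, but the input/output contract is the same: text in, most likely continuation out.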
LLMs are Large Neural Networks
What is Retrieval Augmented Generation (RAG)?
RAG adds additional information over and above what is stored inside LLMs.
- LLM: Information Set 1
- RAG: Information Set 2
- LLM + RAG: Information Set 1 + 2
LLM-powered output: User -> Prompt -> LLM (Information Set 1) -> Response -> User
RAG-powered output: User -> Prompt -> Retrieved Data (Information Set 2) -> Prompt + Retrieved Data -> LLM (Information Set 1) -> Response* -> User
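The RAG flow described on this slide (prompt, retrieved data, prompt plus retrieved data, response) can be sketched in a few lines. This is a minimal illustration under stated assumptions: retrieval is done by simple word overlap instead of a vector search, `fake_llm` is a hypothetical placeholder for a real model call, and the two documents are invented sample text.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(prompt, documents, top_k=1):
    """Rank documents by word overlap with the prompt (stand-in for vector search)."""
    query = tokenize(prompt)
    ranked = sorted(documents, key=lambda d: len(query & tokenize(d)), reverse=True)
    return ranked[:top_k]


def build_augmented_prompt(prompt, retrieved):
    """Combine the user prompt with the retrieved data (Prompt + Retrieved Data)."""
    context = "\n".join(f"- {d}" for d in retrieved)
    return f"Context:\n{context}\n\nQuestion: {prompt}"


def fake_llm(prompt):
    """Hypothetical placeholder for a model call; it only reports what it received."""
    return f"[model answer based on {len(prompt)} prompt characters]"


# Hypothetical "Information Set 2": text the model has not been trained on.
documents = [
    "The licensed area is Block 14/26b.",
    "The specified data shall not be disclosed without written consent.",
]
user_prompt = "Which block is the licensed area?"
retrieved = retrieve(user_prompt, documents)
augmented = build_augmented_prompt(user_prompt, retrieved)
print(augmented)
print(fake_llm(augmented))
```

The point of the design is that the model's answer can draw on both what it stores internally and what was retrieved for this specific prompt.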
What is Self-RAG?
Self-RAG introduces checks and validations into the RAG method.
- Check if retrieval is needed for the user prompt
- Prompt + Retrieved Data 1 -> Response*
- Prompt + Retrieved Data 2 -> Response
- Check which response is most relevant (suppose Response* is selected as it is the most relevant)
- Check if Response* is useful
Business Problem: Non-Compliant Contracts
Business Problem: Fortune 500 firms doing B2B business have several thousand customer contracts; manually evaluating whether they are compliant is humanly impossible.
Proposed Solution: A Self-RAG and LLM-driven application to automatically identify non-compliant supplier and vendor contracts.
* This talk reviews the application built for Oil and Gas; the solution can be expanded to multiple industries.
Data Understanding: ~2,000 scanned Oil and Gas contracts in PDF format
Session’s Objective Rules and Regulations of Oil and Gas Contracts for Exploration and Extraction
Solution Design
Pipeline 1 (Vector Embeddings): Contracts in Storage -> Contracts Read by Python -> Contract in Image format -> Converted Image to Text -> Contract in Text format -> Contract Split in Parts -> Converted to embeddings -> Stored in Vector Database
Pipeline 2 (LLAMA 2): User -> Prompt Submitted (One clause & Contract in Prompt) -> Converted to embeddings -> Fetch information from database (Vector Database) -> Fetch information from LLM -> Response sent to User
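The "split in parts, convert to embeddings, store in vector database" steps of this design can be sketched as follows. This is an illustration only, not the application's implementation: the hashed bag-of-words `embed` function and the `InMemoryVectorStore` class are hypothetical stand-ins for a real embedding model and a real vector database, and the chunk size and overlap are arbitrary choices.

```python
import hashlib
import math


def split_into_parts(text, size=40, overlap=10):
    """Split text into overlapping windows of `size` words (requires size > overlap)."""
    words = text.split()
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text, dim=64):
    """Toy embedding: hashed bag-of-words, L2-normalised (stand-in for a real model)."""
    vec = [0.0] * dim
    for word in text.lower().split():
        idx = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryVectorStore:
    """Hypothetical minimal vector store: keeps (embedding, text) pairs in a list."""

    def __init__(self):
        self.rows = []

    def add(self, text):
        self.rows.append((embed(text), text))

    def search(self, query, top_k=1):
        """Return the stored texts with the highest cosine similarity to the query."""
        q = embed(query)
        scored = [(sum(a * b for a, b in zip(q, vec)), text) for vec, text in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:top_k]]


# Synthetic 100-word "contract" so the example is self-contained.
contract_text = " ".join(f"word{i}" for i in range(100))
parts = split_into_parts(contract_text)
store = InMemoryVectorStore()
for part in parts:
    store.add(part)
print(len(parts), "parts stored")
```

Overlapping windows are used here so that a clause falling on a boundary still appears whole in at least one part; whether the original application splits contracts this way is not stated on the slide.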
Self-RAG Design
Components: User, Critic LLM, Generator LLM
Step 1: The user prompt is checked: is retrieval required? If no, the LLM response is sent to the user. If yes, Data B*, Data B** and Data B*** are retrieved and each is combined with the prompt (Prompt + Data B*), giving Segment 1, Segment 2 and Segment 3.
Step 2: Identify the most relevant segment among Segment 1, Segment 2 and Segment 3. Is the selected segment useful? If no, retrieve again. If yes, send it to the user.
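The control flow of this design can be sketched as a small loop. This is a sketch under assumptions, not the application's code: every callable passed in (`needs_retrieval`, `retrieve`, `relevance`, `is_useful`, `generate`) is a hypothetical stand-in for a critic or generator LLM call, the `max_rounds` limit is added here to avoid an endless "retrieve again" loop, and falling back to an answer without retrieved data is also an assumption.

```python
def self_rag(prompt, needs_retrieval, retrieve, relevance, is_useful, generate,
             max_rounds=3):
    """Run the Self-RAG checks: retrieval needed? most relevant? useful?"""
    # Step 1: if no retrieval is required, answer directly.
    if not needs_retrieval(prompt):
        return generate(prompt, None)

    rejected = []
    for _ in range(max_rounds):
        segments = [s for s in retrieve(prompt) if s not in rejected]
        if not segments:
            break
        # Step 2: identify the most relevant segment.
        best = max(segments, key=lambda s: relevance(prompt, s))
        if is_useful(prompt, best):
            return generate(prompt, best)
        rejected.append(best)  # "Retrieve again", skipping the rejected segment.

    # Assumption: fall back to an answer without retrieved data.
    return generate(prompt, None)


# Toy stand-ins, invented for illustration; real versions would call LLMs.
SEGMENTS = [
    "Segment about payment terms",
    "Segment stating the licensed area is Block 14/26b",
    "Segment about notices",
]


def toy_needs_retrieval(prompt):
    return "contract" in prompt.lower()


def toy_retrieve(prompt):
    return list(SEGMENTS)


def toy_relevance(prompt, segment):
    return len(set(prompt.lower().split()) & set(segment.lower().split()))


def toy_is_useful(prompt, segment):
    return "block" in segment.lower()


def toy_generate(prompt, segment):
    return f"Answer using: {segment}" if segment else "Answer from model memory only"


answer = self_rag("Which licensed area does the contract cover?", toy_needs_retrieval,
                  toy_retrieve, toy_relevance, toy_is_useful, toy_generate)
print(answer)
```

Passing the checks in as functions keeps the loop independent of any particular model, so the same skeleton works whether the critic and generator are one LLM or two.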
Licensed Area in Contract
A Licensed Area comprises one or more Blocks and their coordinates.
- Block: a portion within a larger area
- Coordinates: give the precise geographical location
The geographical area for which the firm, Tangram Energy Ltd and Summit Exploration and Production Limited, has been given a license for oil and gas extraction and production is referred to as "Block 14/26b". The coordinates of this licensed area are defined by a series of latitude and longitude points, forming a polygon. The coordinates for Block 14/26b are as follows:
58°10'00.000"N 1°00'00.000"W
58°10'00.000"N 0°48'00.000"W
58°00'00.000"N 0°48'00.000"W
58°00'00.000"N 0°50'00.000"W
58°05'00.000"N 0°50'00.000"W
58°05'00.000"N 0°56'00.000"W
58°06'00.000"N 0°56'00.000"W
58°06'00.000"N 1°00'00.000"W
58°10'00.000"N 1°00'00.000"W
These coordinates outline the boundaries of Block 14/26b, which is the licensed area for oil and gas activities as specified in the provided text.
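One way to check extracted coordinates like these is to parse them into decimal degrees and confirm that the polygon closes on its starting point. The sketch below is an illustration, not part of the presented application; the function names are hypothetical, and it assumes values alternate latitude then longitude in the degrees, minutes, seconds format shown above.

```python
import re

# Matches values such as 58°10'00.000"N or 0°48'00.000"W.
DMS = re.compile(r"(\d+)°(\d+)'(\d+(?:\.\d+)?)\"([NSEW])")


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees (S and W negative)."""
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    return -value if hemisphere in "SW" else value


def parse_polygon(text):
    """Return (latitude, longitude) pairs, assuming values alternate lat, lon."""
    values = [dms_to_decimal(*m.groups()) for m in DMS.finditer(text)]
    return list(zip(values[0::2], values[1::2]))


block_14_26b = """
58°10'00.000"N 1°00'00.000"W
58°10'00.000"N 0°48'00.000"W
58°00'00.000"N 0°48'00.000"W
58°00'00.000"N 0°50'00.000"W
58°05'00.000"N 0°50'00.000"W
58°05'00.000"N 0°56'00.000"W
58°06'00.000"N 0°56'00.000"W
58°06'00.000"N 1°00'00.000"W
58°10'00.000"N 1°00'00.000"W
"""

points = parse_polygon(block_14_26b)
is_closed = points[0] == points[-1]
print(len(points), "points, closed polygon:", is_closed)
```

A closure check of this kind is a cheap way to flag contracts where a coordinate was dropped or misread during image-to-text conversion.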
Identify Licensed Area in Contract
RAG captures the License Area in contracts more accurately than prompts alone.
Step 1: Identify License Areas and their coordinates in the contract
Contract | RAG method used? | ChatGPT Output
chrysaor-production-uk-limited-neo-energy-zex-limited-p2521-exploitation | No | License Area not captured in output
chrysaor-production-uk-limited-neo-energy-zex-limited-p2521-exploitation | Yes | All 4 License Areas captured with coordinates
5258-apache-north-sea-limited-p2529-exploitation-license-exploration-pdf | Yes | 2 out of 3 License Areas captured correctly
Step 2: Identify if the License Area is fully present, partially present or not present
Contract | RAG method used? | ChatGPT Output
chrysaor-production-uk-limited-neo-energy-zex-limited-p2521-exploitation | Yes | Gives appropriate answer along with additional information
Data Confidentiality Clause in Contract
Data confidentiality is critical; data, if leaked, can result in economic and political consequences.
"All records, returns, plans, maps, samples, accounts and information (in this clause referred to as "the specified data") which the Licensee is or may from time to time be required to furnish under the provisions of this license shall be supplied at the expense of the Licensee and shall not (except with the consent in writing of the Licensee which shall not be unreasonably withheld) be disclosed to any person not in the service or employment of the OGA or the Crown."
Identify Data Confidentiality Clause in Contract
RAG captures Data Confidentiality details in contracts more accurately than prompts alone.
Step 1: Identify the Data Confidentiality Clause in the contract
Contract | RAG method used? | ChatGPT Output
5263-ithaca-oil-and-gas-limited-p2534-exploitation-license-explorationpdf | No | Only captures that a data confidentiality clause is present, without details
5263-ithaca-oil-and-gas-limited-p2534-exploitation-license-explorationpdf | Yes | Captured details of data confidentiality correctly
Step 2: Identify if the Data Confidentiality Clause is fully present, partially present or not present
Contract | RAG method used? | ChatGPT Output
5263-ithaca-oil-and-gas-limited-p2534-exploitation-license-explorationpdf | Yes | Gives correct answer
Accuracy and Findings
Accuracy: 89%
Results:
- 0.5% of contracts have unclear License Area coordinates
- Data confidentiality is incomplete in 2% of contracts
- 1.5% of contracts have no damage clause
- No commitment to local employment in 3% of contracts
- No clause to protect natural resources in 4% of contracts
Next Steps
- Fine-tune the LLM with 500 contracts
- Evaluate performance improvement
- Secure the application