Session 8 Specialized AI Associate Series: Fundamentals of Model Training

DianaGray10 · 31 slides · Oct 29, 2025

About This Presentation

🚀 Welcome to Session 8 of the AI Associate Developer Series 2025!

In this session, we will discover how to train a model in the UiPath Communications Mining platform. We will go through key concepts, including labels and entities and how they work. We will also provide an overview of the steps in...


Slide Content

Communications Mining
Fundamentals of Model Training

Your Speakers
Tracy Dixon – Centric Consulting, 6x UiPath MVP
Srinivas Kadamati – Proservartner, 4x UiPath MVP

1. How it works: Deployment overview
2. Labels, fields, and metadata
3. Define what model training is
4. Generative Extraction
5. Overview of the model training process:
   a) Taxonomy
   b) Setup
   c) Discover
   d) Explore
   e) Refine & Maintenance
6. Best practices for labels and fields
Agenda

Here’s an overview of the typical journey that your data goes on within the platform:
1. INGEST & STORE – Pre-built connectors (Email Server, Chat Server, CRM, CMS, RPA, BPM, API, MIS) for ingestion into a historic comms store. Proprietary ML segmentation and cleaning engine to clean data.
2. PARSE & COMPREHEND – Proprietary deep learning sentence models extract semantics for data-efficient learning.
3. UNSUPERVISED LEARNING – Proprietary unsupervised learning models identify common intents and constantly search for new ones.
4. DISCOVER & TRAIN – Train bespoke supervised models efficiently in our proprietary active learning engine and interface.
5. REPORT & ANALYZE – Real-time aggregate statistics for meaning-based Management Information and analytics.
6. VALIDATE & DEPLOY – Real-time model validation and model lifecycle management.
The cycle then repeats: RE-TRAIN & RE-PREDICT.
| How It Works: Deployment Overview

Communications auto-triage for insurance underwriters

BEFORE:
• Underwriter receives email request
• Underwriter opens email, reviews
• Underwriter determines message
• Underwriter forwards to appropriate team
• SME receives email request
• SME opens and reviews content
• SME responds to customer
• SME updates customer file
3 days to process – 100% human effort

AFTER:
• Monitors shared inbox to understand context, intent, and sentiment of emails in real time
• Prioritizes and routes request to the relevant SME / triggers automation
• Responds to customer and updates customer file
3 hours to process – 5% human effort, 95% robot

Location: Europe
Customer type: Insurance
Solution: Communications Mining and UiPath Robots
Time taken for work to enter workflow down from 2-3 days to 2 hours
£370k back to the business in year one
91,000 total hours saved in year one

Labels – concepts, themes, and intents. Ex: Change of address request
Fields – structured data points extracted from the text. Ex: policy numbers, dates, trade IDs
Metadata – additional structured information associated with each message.
Properties – user properties, email properties, thread properties, attachment properties
Message (formerly verbatim) – a single unit of freeform text communication, such as an individual email or customer feedback survey
Labels, Fields, and Metadata
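The distinction between labels, fields, and metadata can be sketched as a simple record. This is a hypothetical structure for illustration only, not the platform's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    """One unit of freeform communication (e.g. a single email)."""
    text: str
    labels: list = field(default_factory=list)      # concepts/intents, e.g. "Change of Address"
    fields: dict = field(default_factory=dict)      # extracted data points, e.g. policy numbers
    metadata: dict = field(default_factory=dict)    # structured info attached to the message

msg = Message(
    text="Please update the address on policy 1863325.",
    labels=["Admin Change > Change of Address"],
    fields={"Policy No": "1863325"},
    metadata={"channel": "email", "mailbox": "shared-inbox"},
)
print(msg.labels[0])            # the intent expressed in the text
print(msg.fields["Policy No"])  # a structured data point extracted from the text
```

Labels describe what the message is *about*, fields are values pulled *out of* the text, and metadata travels *alongside* the text without being extracted from it.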

Labels, Fields, and Metadata: General vs Extraction Fields

Labels, Fields, and Metadata Cont…

Comms Mining interprets the message...
Subject: Address change policy No. 1863325
Created: June 29, 8:33pm
From: Usama Ahmed
To: Dylan McDougall, Robert Smith

Hi Dylan,
Further to our emails on 17th May, could you please update the address on the above policy to 20 W 34th St., New York, NY 10001, USA.
My policy is due to renew next week, and I need the address updated before signing the PO.
Feel this is unnecessarily hard.
Thanks,
Usama

… and extracts relevant structured data:
Policy No: 1863325
New Address: 20 W 34th St., New York, NY 10001, USA

Labels describe the concept or intent:
Chaser
Admin Change > Change of Address
Tone > Frustrated
Feedback > Unresponsive

Entities capture data points:
Date of Previous Contact: 17th May
| Example
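The example above can be represented as a single prediction payload. This is an illustrative shape only (the platform's real API schema may differ), and the confidence values are made up for the example:

```python
import json

# Illustrative prediction for the address-change email above; label names
# come from the slide, confidence values are invented for this sketch.
prediction = {
    "labels": [
        {"name": "Chaser", "confidence": 0.97},
        {"name": "Admin Change > Change of Address", "confidence": 0.99},
        {"name": "Tone > Frustrated", "confidence": 0.88},
        {"name": "Feedback > Unresponsive", "confidence": 0.81},
    ],
    "fields": {
        "Policy No": "1863325",
        "New Address": "20 W 34th St., New York, NY 10001, USA",
        "Date of Previous Contact": "17th May",
    },
}

# Keep only labels the model is confident about, e.g. above a 0.9 threshold.
confident = [l["name"] for l in prediction["labels"] if l["confidence"] >= 0.9]
print(json.dumps(confident))
```

Downstream automations typically act only on predictions above a confidence threshold, which is the same idea the platform's confidence scores support.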

What?
Model Training involves creating and training a set of labels (intents/concepts) and fields (structured data points) that are applied to individual messages within the dataset.
How?
This needs a model trainer who is very familiar with the data.
Model Trainers
Play a key role in teaching Communications Mining to understand your data, enabling accurate insights.
Model Training

Model Training

The result of model training is a model that works as expected and meets your business goals. Its success depends on how accurate the model's predictions are.
What Makes a ‘Good’ Model?

Generative Extraction (GenEx) is a cutting-edge feature in UiPath Communications Mining that uses Generative AI to understand complex connections between multiple requests and the data points needed to process them.
How does GenEx help Communications Mining?
• By recognizing relationships
• By using generative models
• By automating multiple requests
Generative Extraction

Overview of the Model Training Process
• There are six stages in the model training process that will help you build your own model.

Overview of the Model Training Process – Taxonomy
• A Communications Mining taxonomy is a collection of all the labels and fields applied to the messages in a dataset.
• Label taxonomy: a label is a summary of an intent or concept expressed within a message.

Overview of the Model Training Process – What Kinds of Labels Make Up a Taxonomy
• Your label taxonomy should contain all the concepts and intents you want to capture in the dataset to meet your specific objectives.
• Typical groups of labels that you may include are:
• Structure of a label taxonomy:

Setup – Data Structure
• Data is structured and stored in a hierarchical manner. It consists of three main components:

Data sources:
• Collections of raw, unlabelled communications data of a similar type, e.g. all emails from a shared mailbox, or a collection of NPS survey responses.
• Individual data sources can be associated with a maximum of 10 datasets.

Datasets: These are made up of:
• 1–20 data sources (of similar type with similar intended purposes), and
• The 'model' that you create when training the platform to understand the data in those sources.

Projects:
• A permissioned storage area within the platform.
• Each dataset and data source belongs to a specific project, which is designated when they are created.
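The hierarchy and its limits can be sketched as a toy object model. This is not platform code, just an illustration of the constraints named above (max 10 datasets per source, 1–20 sources per dataset):

```python
class Project:
    """A permissioned storage area; owns datasets and data sources."""
    def __init__(self, name):
        self.name = name
        self.sources = []
        self.datasets = []

class DataSource:
    """A collection of raw, unlabelled communications of a similar type."""
    MAX_DATASETS = 10  # a source may belong to at most 10 datasets

    def __init__(self, api_name, project):
        self.api_name = api_name
        self.project = project
        self.dataset_count = 0
        project.sources.append(self)

class Dataset:
    """1-20 data sources plus the model trained on them."""
    def __init__(self, name, project, sources):
        if not 1 <= len(sources) <= 20:
            raise ValueError("a dataset needs between 1 and 20 data sources")
        for s in sources:
            if s.dataset_count >= DataSource.MAX_DATASETS:
                raise ValueError(f"{s.api_name} already belongs to 10 datasets")
            s.dataset_count += 1
        self.name, self.project, self.sources = name, project, sources
        project.datasets.append(self)

proj = Project("insurance-demo")
src = DataSource("shared-mailbox-emails", proj)
ds = Dataset("underwriting-triage", proj, [src])
print(len(proj.datasets))  # 1
```

The project name, source name, and dataset name here are invented for the example.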

Creating a New Source
1. Note: requires the ‘Sources Admin’ permission in the relevant project.
2. Navigate to the ‘Sources’ page via the Admin console and click on 'New source’.
3. Select the relevant project and give your source an API name, using hyphens instead of spaces (e.g. zendesk-cs-chats) – the API name is unchangeable once created.
4. Use the title and description boxes to provide more information about the source (not mandatory but recommended; they are editable).
5. Define the sensitive properties, if any. Sensitive properties will only be visible to users with the ‘View sensitive data’ permission.
6. Set the source language and enable translation, if required. Enabling translation requires the 'Create translated sources' permission.
7. Click on ‘Create source’.
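Sources can also be created through the platform's REST API. The sketch below only assembles a request payload without sending it; the endpoint path and body field names are assumptions based on the setup steps above, so check the official Communications Mining API reference before relying on them:

```python
import re

def build_create_source_request(project, api_name, title="", description="",
                                language="en", sensitive_properties=None):
    """Assemble a hypothetical 'create source' request without sending it."""
    # API names use hyphens instead of spaces and cannot be changed later.
    if not re.fullmatch(r"[a-z0-9-]+", api_name):
        raise ValueError("API name should be lowercase with hyphens, e.g. 'zendesk-cs-chats'")
    return {
        "method": "PUT",
        # Hypothetical path shape; consult the API docs for the real one.
        "path": f"/api/v1/sources/{project}/{api_name}",
        "body": {
            "source": {
                "title": title,               # optional but recommended, editable
                "description": description,   # optional but recommended, editable
                "language": language,
                "sensitive_properties": sensitive_properties or [],
            }
        },
    }

req = build_create_source_request("insurance-demo", "zendesk-cs-chats",
                                  title="Zendesk CS chats")
print(req["path"])
```

Validating the API name up front mirrors step 3: because the API name is unchangeable once created, it is worth rejecting malformed names before any request is made.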

Preparing Your Dataset & Impacts of Data Quality
• The dataset should be in .csv format. Below are a few areas where data quality issues can arise that will impact the quality of model performance.
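Before uploading a .csv dataset, simple checks catch the most common quality issues. This is a minimal sketch using only the standard library; the column name `message` is an assumption about your file's layout:

```python
import csv, io

def audit_dataset(csv_text, text_column="message"):
    """Flag common data quality issues before upload: empty messages,
    exact duplicates, and the total row count."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    issues = {"rows": len(rows), "empty": 0, "duplicate": 0}
    seen = set()
    for row in rows:
        text = (row.get(text_column) or "").strip()
        if not text:
            issues["empty"] += 1
        elif text in seen:
            issues["duplicate"] += 1
        else:
            seen.add(text)
    return issues

sample = """message,timestamp
Please update my address,2025-06-29
Please update my address,2025-06-29
,2025-06-30
"""
print(audit_dataset(sample))  # {'rows': 3, 'empty': 1, 'duplicate': 1}
```

Empty and duplicated messages are worth removing before training, since they add no signal and can skew the clusters Discover finds.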

Discover Phase
• Discover is a feature in the platform that uses unsupervised learning to interpret all of your data and group together clusters of similar messages that it believes share similar themes, concepts, or intents.
• It’s the very first step in the model training process.
• Automatic unsupervised learning: it reads and interprets the data without any human training to automatically discover clusters of similar messages and presents them to you in the platform.
• The bulk label functionality is a helpful tool to quickly train the model.
• Generative annotation can be used to predict cluster suggestions with no training data.
• After a significant amount of training has been completed, or after an influx of new data, Discover will find new clusters so we can continue finding interesting things within your data.
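Discover's clustering engine is proprietary, but the underlying idea of grouping similar messages can be illustrated with a toy bag-of-words clusterer. This is not the platform's algorithm, just a sketch of unsupervised grouping by text similarity:

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def cluster(messages, threshold=0.3):
    """Greedy clustering: attach each message to the first cluster whose
    seed message is similar enough, else start a new cluster."""
    bags = [Counter(m.lower().split()) for m in messages]
    clusters = []  # each cluster is a list of message indices; index 0 is the seed
    for i, bag in enumerate(bags):
        for c in clusters:
            if cosine(bags[c[0]], bag) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters

msgs = [
    "please change my address on the policy",
    "address change request for my policy",
    "what is the renewal premium this year",
    "renewal premium question for this year",
]
print(cluster(msgs))  # [[0, 1], [2, 3]]
```

Real systems use learned sentence embeddings rather than raw word counts, but the workflow is the same: the machine proposes groups, and the human reviewer names them, which is what the bulk label functionality speeds up.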

How to Navigate through the Clusters

Guided vs Un-guided Training

Explore Phase
• Explore is the core phase of model training and builds on the training completed in Discover. It is where you provide the bulk of the training examples that your model needs, and is therefore where you will spend the majority of the overall training time.
• Supervised learning: the steps in Explore help to significantly improve the model's overall understanding of your data by building out the training data that it learns from.
• Key objective: the Explore phase provides each label and field in your taxonomy with enough varied and consistent training examples for the model to accurately assess their performance and make accurate predictions at scale.
• Generative annotation can be used to generate label predictions with no training data.

Generative Annotation in Explore

Refine Phase
In this phase of model training, you understandhow your model is performingandrefine ituntil it performs as required to
meet your objectives.
Reviewing your model's performance is the first stage of the Refine phase of model training. This includesreviewing the
model rating,and each of theperformance factors.

Refine and Maintenance Phase
The Refine phase of the model training process involves going through the following steps to help you train a high-performing model.
Review model rating – this step is about checking your Model Rating on the Validation page and seeing where the platform thinks there may be performance issues with your model, as well as guidance on how to address them.
Refine label performance – this step is about taking actions, recommended by the platform, to improve the performance of your labels. These include using the Check label and Missed label training modes, which help you address potential inconsistencies in your labelling, as well as the Teach label mode.
Increase coverage – this step helps ensure that as much of your dataset as possible is covered by meaningful label predictions.
Improve balance – this step helps ensure that your training data is a balanced representation of the dataset as a whole. Improving the balance in the dataset helps to reduce labelling bias and increase the reliability of predictions made.
Improve field performance – this step helps you improve the performance of your entities. This includes using the Check field and Missed field training modes, which help you address potential inconsistencies in your labelling, as well as the Teach general field mode.
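The "Increase coverage" and "Improve balance" steps can be made concrete with two simple metrics. These are illustrative definitions, not the platform's exact formulas:

```python
from collections import Counter

def coverage(predictions, threshold=0.5):
    """Fraction of messages with at least one label predicted above threshold.
    `predictions` is, per message, a list of (label, confidence) pairs."""
    covered = sum(
        1 for labels in predictions
        if any(conf >= threshold for _, conf in labels)
    )
    return covered / len(predictions) if predictions else 0.0

def label_balance(training_labels):
    """Share of training examples per label; a heavily skewed distribution
    suggests labelling bias worth correcting in the Improve balance step."""
    counts = Counter(training_labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

preds = [
    [("Change of Address", 0.92)],
    [("Chaser", 0.41)],           # below threshold: counts as not covered
    [("Renewal", 0.77), ("Chaser", 0.30)],
    [],                           # no predictions at all
]
print(coverage(preds))  # 0.5
print(label_balance(["Renewal", "Renewal", "Renewal", "Chaser"]))
```

Low coverage means many messages receive no meaningful prediction; a lopsided balance (here 75% Renewal) means the model sees far more examples of some intents than others, both of which the Refine steps above address.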

Dashboard for Model Performance

Best Practices for Labels and Fields
• Don’t split words.
• The highlighted field should cover the entire word in question, not just part of it.
• Also, make sure not to include additional spaces at the end of the field.

Best Practices for Labels and Fields
• Don’t partially review fields.
As with labels, it’s important not to partially review your general fields and extraction fields.
• General fields are reviewed at the paragraph level, not the entire message level. This means when reviewing a paragraph for fields, we must review all the fields in the paragraph.
• Extraction fields are reviewed at the message level, not just at the paragraph level. This means when reviewing an entire message for fields, we must review all the fields in the message.