Data Integration Basics: Merging & Joining Data

SafeSoftware 656 views 55 slides Jul 10, 2024
Slide 1
Slide 1 of 55
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53
Slide 54
54
Slide 55
55

About This Presentation

Are you tired of dealing with data trapped in silos? Join our upcoming webinar to learn how to efficiently merge and join disparate datasets, transforming your data integration capabilities. This webinar is designed to empower you with the knowledge and skills needed to efficiently integrate data fr...


Slide Content

Data Integration Basics:
Merging & Joining

Jovita Chan

Technical Support Specialist
Safe Software Inc.
Sara Mak

Technical Support Specialist
Safe Software Inc.

Welcome to Livestorm.
A few ways to engage with us during the webinar:


Audio issues? Click this for 4 simple
troubleshooting steps.

Data silos are often a result of
multiple data formats.
Just because you have different kinds of data,
doesn’t mean they can’t be brought together.

Poll:
What are your biggest
challenge when it comes to
merging and joining data?

Merging and joining disparate
datasets can be streamlined!
FME transformers can do the work for you.

But there are
many paths
you can take
in FME.

Agenda
1Introduction to Merging & Joining
FeatureJoiner & FeatureMerger
DatabaseJoiner
2Merging Attributes of Multiple Datasets
InlineQuerier
SQLExecutor & SQLCreator
3Resources & Next Steps
4Q&A


Agenda

Flash Demo

One platform, two technologies
FME Form FME Flow
Data Movement and transformations
(“ETL”) workflows are built here.
Brings life to FME Form workflows
FME Flow Hosted
Safe Software managed FME Flow
fme.safe.com/platform
FME Enterprise Integration Platform
Safe & FME

Number of supported data types in FME
1995 2000 2005 2010 2015 2020 2023…
10 100 300 500
GIS
CAD
Database
XML
Raster
3D
BIM
Web
Point
Cloud
Cloud
Big
Data
IOT
Gaming
BI
Indoor
Mapping
AR/VR
Generative
AI
Cloud
Native
Tabular
Merge and Join ANY
Type of Data

BIM and Facilities Management Integration
IBM Shared Parameters:
●IBM.Name
●IBM.Description
●...

Tririga
BIM
Integrator
ArcGIS
Indoors

Introduction
to Merging &
Joining

Merging or Joining Data
in FME
When data is merged, the attributes from
each dataset are combined together
based on a common identifier and then
output as a single fused dataset.

This is different from appending data.

Terminology
Append/Union




Merge/Join




Easy Peasy in FME! Todayʼs Focus
Combine Vertically
Combine Horizontally
Append
Merge

Terminology
Append/Union




Merge/Join




Easy Peasy in FME! Todayʼs Focus
Combine Vertically
Combine Horizontally
Append
Merge

Selecting a
Transformer
Here are some of our
favourites joining data…

The Two Categories: SQL and SQL-Free

SQL Transformers for
Joining & Merging
Require knowledge of SQL.
Includes:
●InlineQuerier
●SQLCreator
●SQLExecutor
SQL-Free Transformers for
Joining & Merging
Same functionality of SQL
transformers. Can be set up easily
without any database knowledge.
Includes:
●FeatureJoiner
●FeatureMerger
●FeatureReader
●DatabaseJoiner

Follow the Flow Chart
1) Is all of my data already inside the workspace?
●If yes, follow the INTERNAL path
●If no, follow the EXTERNAL path

Inspect your data and ask yourself:
FME Community: Merging or Joining Spreadsheet or Database Data

Follow the Flow Chart
1) Is all of my data already inside the workspace?
●If yes, follow the INTERNAL path
●If no, follow the EXTERNAL path

Inspect your data and ask yourself:
2) Does the FeatureJoiner for internal or DatabaseJoiner
for external work for my data?
●If yes, Great! End here.
●If no, continue to question 3


FME Community: Merging or Joining Spreadsheet or Database Data

Follow the Flow Chart
1) Is all of my data already inside the workspace?
●If yes, follow the INTERNAL path
●If no, follow the EXTERNAL path

Inspect your data and ask yourself:
2) Does the FeatureJoiner for internal or DatabaseJoiner
for external work for my data?
●If yes, Great! End here.
●If no, continue to question 3


3) Do I know or want to use SQL?
●If yes, see the transformers in the green SQL box
●If no, see the transformers in the blue box
FME Community: Merging or Joining Spreadsheet or Database Data

Demo
FeatureJoiner &
FeatureMerger

FeatureJoiner
Tutorial Documentation

FeatureMerger
Results in a single match by default

For 1:M, generate a list of matches for
each Requestor by enabling:

●Process Duplicate Suppliers
●Generate List
Tutorial: Working with List Attributes

Tutorial Documentation

FeatureJoiner
●Simpler, and performs
faster than FeatureMerger
●SQL-like terminology
●Handles cardinalities easier than the
FeatureMerger (ie. 1:M, M:N and
M:1)
●Supports multiple matches per
source (by default, 1:M results in a
record for each match)
FeatureMerger
●Handling 1:M joins is a bit more work
●Default: Single match is output for
1:M relationship
●Many Suppliers can be merged on to
a single Requestor.
●Generate lists of matches
●Can create geometries!

Demo
DatabaseJoiner

Midstream Join with DatabaseJoiner


Provides the ability to form a join against a database (incl. CSV & Excel) with an existing dataset in the
workspace.
Read in database formats, Excel or CSV
Matching and accumulation options
Let the database do the work by pre-filtering a
subset of the table you’re reading into the
workspace
Tutorial Documentation

DatabaseJoiner
●Non-Blocking - Joins data
as each row flows by
●Connect to and join to database tables
midstream with one easy transformer!
●Can also connect to Excel and CSV
●Select which attributes you want to join
●Explicit cardinality parameters
●Good for quick lookups - caches first
5000 records

FeatureMerger/Joiner
●Blocking - Wait until all data has been
read before doing processing
●All FME readers can be used
○No need for anything to be in a
database
●Cardinality options vary

DatabaseJoiner
●Non-Blocking - Joins data
as each row flows by
●Data is external to the workspace and can
be changed without editing the
workspace
●Can merge in multiple attributes
●Explicit cardinality parameters
●Best if the join data is larger or changing
AttributeValueMapper
●Non-Blocking - Joins data
as each row flows by
●No external dependencies
○Data to lookup is stored in the
workspace
○Can be imported at design time
from an external source
●Only a single attribute can be added
●1:1 lookups only
●Very fast
●Best for small, simple, stable mappings

Midstream Join with FeatureReader


●Simpler midstream option
●Join against any database or
spatial format
●Typically used for spatial joins
(more about spatial joins in
Part II)
●Also used to perform tabular
joins with database tables

Midstream Join with FeatureReader


1.Set the WHERE Clause to:
“<Incoming Attribute>” = ʻ@Value(<Initiator Attribute>)ʼ

2.Set Accumulation Mode to Merge Initiator and Result
Documentation

DatabaseJoiner
●Connect to and join to database tables
midstream with one easy transformer!
●Can also connect to Excel and CSV
●Select which attributes you want to join
●Explicit cardinality parameters
●Very efficient, especially when using a
primed cache
○Automatically done for File-based
joins
FeatureReader
●Can be done midstream which aids in
adding and joining data quickly
●Limited options: Inner Joins only using
WHERE clause
●Does lots of work per input feature
○Only efficient when 1 input results in
large numbers of output

Merging
Attributes of
Multiple
Datasets

InlineQuerier
●Data already loaded, any data source can
be used
●Uses SQL
●Join data in a temporary SQLite database
●Multiple queries in a single transformer
●Great if you ?????? SQL
FeatureJoiner
●Data already loaded, any data source can
be used
●Uses SQL-Free joins
●Easy, approachable
●Great performance

InlineQuerier

Move over FeatureJoiner:
●Use any data source
●Utilizes the power of SQL
●Multiple queries
●Join data in a temporary
SQLite database
●Includes spatial

SQLite Tutorial - An Easy Way to Master SQLite Fast

Demo
InlineQuerier

Tips for InlineQuerier


●Loads a temporary SQLite database - so consolidate all your queries if you can
○Takes some time to load
○Fast once it’s loaded
●Don’t use it for simple tasks, i.e. Filtering

●Use smart configuration to improve
performance
○Only define input columns needed
in the queries

Jovita

InlineQuerier
●Any data source can be
used
●Utilizes the power of
SQL
●Join data in a temporary
SQLite database
●Multiple queries

SQLExecutor
●Used to execute SQL
against a database
●Requires an incoming
feature to trigger the SQL
statement

SQLCreator
●Used to execute SQL
against a database
●No incoming feature
required

SQLExecutor & SQLCreator

Move over InlineQuerier
●Needs a database
●Utilizes the power of SQL
●Includes spatial
●You can use any SQL -
○Select data
○Create indices
○Drop tables

You don’t have so just read data


Let Your Database Do The Work

Demo
SQLExecutor

Tips for SQLExecutor or SQLCreator


●Let your database do the work
○You only load the data you need
●Needs all of your data is in a single database
●SQL - any level of SQL complexity

●Expose attributes if you need them in the workbench

Automating
with FME Flow

Automating
Automate any of your workflows
with apps on FME Flow.



Resources:

Getting Started with FME Flow

Getting Started with FME Flow
Workspace Apps

Getting Started with Gallery Apps

Webinar - Creating No Code Web Apps
with FME Flow

Conclusion

By choosing the right
Transformer, you can merge
and join data more
efficiently.

Summary
Data used in the demos are from the City of Surrey Open Data Site

●FeatureJoiner or InlineQuerier for disparate datasets
●SQLExecutor or SQLCreator if working with databases
●SQL options for SQL experts
●DatabaseJoiner for lookup tables
●Use what you’re comfortable with!

Resources

Resources
●[Knowledge Base] Merging or Joining Spreadsheet or Database Data
●FeatureJoiner Documentation: Choosing a Feature Joining Method section
●[Article] Using the InlineQuerier as a replacement for multiple FeatureMergers
●[Webinar] Strategies for Detecting and Handling Schema Drift
●More at support.safe.com!

Get our Ebook
Spatial Data for the
Enterprise

fme.ly/gzc


Guided learning
experiences at your
fingertips
academy.safe.com


FME Academy
Resources
Check out how-to’s &
demos in the knowledge
base
support.safe.com
Knowledge Base Webinars
Upcoming &
on-demand webinars

safe.com/webinars

Check out
our podcasts
on-demand.
featuring special guest
speakers over at EM360

Resources

Next Steps

We’d love to help you get
started.
Get in touch with us at
[email protected]
Experience the
FME Accelerator
Contact Us
A world where data is not just a
commodity but a catalyst for
real change.
fme.safe.com/accelerator



Next Steps

Claim Your Community Badge &
Dive into the new Community!
●Get community badges for watching
webinars
●community.safe.com
●Today’s code: 73PGGG


Join the Community today!
Next Steps

Q&A

Thank You
Recap of Next Steps

1Join the FME Community
2Contact us
3Experience the FME Accelerator

Please fill out our
webinar survey
Tags