AP-Summary-Aug-09-2022_capabilities .pdf

kcdelllaptop, 26 slides, Aug 22, 2024

About This Presentation

Capabilities of a group of data scientists


Slide Content

Trusted Partner in Advanced Analytics
Services & Solutions
1

About Us
2
Headquarters:
Chicago, Illinois
Center of Excellence:
Kolkata, India
Comprehensive, deep experience in data science services and analytics
solutions. We are a team of master's- and Ph.D.-level data scientists,
data architects and software engineers working under the guidance of senior leadership and
a serial entrepreneur.
Analytics Services & Solution Components
Numerical Data: Big Data, Edge Analytics, Custom Analytics
Unstructured Data: NLP & NLG, Video Analytics
•Health Care
•Manufacturing
•Marketing
•Information Mining
•AIOps
•EdgePlus

Extensive Experience In Data Science Platforms
3
Analytics Platforms
Database
Visual
Environment
Kapacitor

Big Data Analytics on Claim Data, EMR, EHR & Consumer Data
Preliminary Data Source
EMR, Claims, EHR
Data validation and data quality checks, 25+ metrics, dynamic Tableau dashboard
Missing data imputation, updating and assigning taxonomy, identifying swapped taxonomy, etc. using
machine learning algorithms
……
Preparing base tables for multiple products by combining multiple data sources.
This requires extensive data understanding, complex HQL queries and a clear vision of the end result table and its uses.
Tools: HQL, PySpark, Python, SQL, Tableau
4
Solutions
•Physician Network & Physician Performance Score
•Referral Graph using MCMC Simulation
•Patient Path Analysis
•Targeted Marketing Campaign for 65 Diseases
•Hospital Direct Billing Identification & Gap Correction
•Entity Resolution
•Importance of Mid-Level Physicians in the Health System
•Identifying Sole Practitioner Orthopedic Surgery and Physiotherapy Estimation

Edge Analytics for Manufacturing
Automotive Engine Head Quality Monitoring
Objective :
•Eliminate human inspection
•Automated inspection of ALL parts
•Predict future failures
Monitoring Movement of Robot Arms
Objective :
•Monitor all arms in real time
•Detect anomalies in six-armed KUKA
robots in an automotive warehouse
Truck Engine Failure Prediction & Cargo Compartment monitoring in Real time
Objective :
•Monitor various engine parameters during route
•Assess risk of engine failure
•Alert driver in real time
Vibration Analytics: Monitoring Asset Health
Objective :
•Monitor vibration of assets during production
•Predict machine health and time to possible failure
5
Created a library of 100+ user-defined functions (UDFs)
for an edge analytics platform,
including:
•Data preparation
•Data transformation
•Data exploration
•Data compression
•Time Series Models
•Distance measures
•Linear & Non-linear Clustering
•Machine Learning
•Deep Learning
•Ready to use applications

Custom Analytics
6
Targeted Campaign
•65 disease profiles: target lists ranked in deciles by likelihood of response (cross-sell, upsell, patient retention and acquisition)
•Data: claim data, billing data and Experian data
•Techniques: ensemble techniques, rare event prediction
Optimal Marketing Communication
•Health care professional profiling and finding the best channels, best day of the week, content and time of day for optimum response and resource allocation
•Techniques: multi-arm bandit and multi-touch attribution models; linear and logistic regression for optimal weights
Insurance Cost Prediction
•Identifying expensive workers' compensation claims early in their life cycle
•Zero Day model using limited information; Thirty and Ninety Day models using updated data
•Techniques: proprietary variable derivation, logistic regression, rare event prediction
Data science is a horizontal
discipline. We are flexible
enough to solve your specific
business problem
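The multi-arm bandit idea listed under Optimal Marketing Communication can be sketched as a Beta-Bernoulli Thompson sampling simulation, assuming each contact either gets a response or not. This is a minimal sketch only: the channel names and response rates are hypothetical, and the deck does not say which bandit algorithms were actually used.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling over outreach channels.

    true_rates: hypothetical channel -> response probability (hidden from the bandit).
    Returns how many times each channel was chosen.
    """
    rng = random.Random(seed)
    # Beta(1, 1) prior per channel: alpha = 1 + responses, beta = 1 + non-responses
    alpha = {ch: 1 for ch in true_rates}
    beta = {ch: 1 for ch in true_rates}
    pulls = {ch: 0 for ch in true_rates}
    for _ in range(rounds):
        # Sample a plausible response rate for every channel from its posterior
        draws = {ch: rng.betavariate(alpha[ch], beta[ch]) for ch in true_rates}
        chosen = max(draws, key=draws.get)
        pulls[chosen] += 1
        # Simulated HCP response drawn from the hypothetical true rate
        if rng.random() < true_rates[chosen]:
            alpha[chosen] += 1
        else:
            beta[chosen] += 1
    return pulls


if __name__ == "__main__":
    rates = {"email": 0.05, "phone": 0.10, "rep_visit": 0.30}
    print(thompson_sampling(rates, rounds=2000))
```

Over time most contacts go to the channel with the highest estimated response rate, while weaker channels are still tried occasionally; in a live setting the simulated response would be replaced by the observed one.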

Health Bot
•Parse and Triage Patient messages
•Chatbot receives text, audio and video
•Extracts actionable insights
•Sentiment analysis, question and concern identification model (ML model) and patient EMR
•Automated conversation to gather additional information
•Urgency coefficient derived
•Dashboard for easy understanding
Web NLP Engine
•A scalable, unified NLP engine
•Data prep, standard analytics & content matching
•Preprocesses unstructured data
•Sentiment analytics, POS tagging, keyword extraction
•Extracts relevant content from the web using a BERT model
EMR NLP Engine
•Extracts relevant patient data from different EMR PDF formats
•Common Structured Display Format.
•Storage in Patient Database
•Multiprocessing
•Output in Case Manager Dashboard
7
Natural Language Processing

8

Objective:

Creating an interface for users to search Google

Extracting contact details from a collection of URLs by crawling the web pages

Saving the contact details in a file for later inspection

Input

A user-given phrase to search in Google, for example “Oncologists in New York”

The user can enter the number of URLs to be generated in each time interval

The user can enter the interval length, with its unit

The user can enter the iteration time (how long the overall URL search process will run), with its unit

Output

Contact details of physicians from every search-result web page

A CSV file or a database table containing the collected contact details

A log file recording the unsuccessful pages
Web crawling
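The extraction and saving steps of this crawler can be sketched with only the Python standard library. This is a minimal sketch under several assumptions: the Google search and the URL scheduling (URLs per interval, iteration time) are not shown, pages are assumed to be already downloaded into a hypothetical `pages` dictionary (with `None` for failed downloads), and the email and North American phone-number patterns are illustrative regular expressions, not the project's actual rules.

```python
import csv
import re
from html.parser import HTMLParser

# Illustrative patterns only (assumed formats, not the project's rules)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]\d{4}\b")


class _TextCollector(HTMLParser):
    """Collects the text chunks of an HTML page (script/style text included, for simplicity)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_contacts(html):
    """Return sorted unique (emails, phone numbers) found in a page's text."""
    parser = _TextCollector()
    parser.feed(html)
    text = " ".join(parser.chunks)
    return sorted(set(EMAIL_RE.findall(text))), sorted(set(PHONE_RE.findall(text)))


def save_contacts(pages, csv_path, log_path):
    """Write one CSV row per page with contacts; log pages that failed or had none.

    pages: hypothetical dict of URL -> HTML text, or None if the download failed.
    """
    with open(csv_path, "w", newline="") as out, open(log_path, "w") as log:
        writer = csv.writer(out)
        writer.writerow(["url", "emails", "phones"])
        for url, html in pages.items():
            if html is None:
                log.write(f"{url}\tdownload failed\n")
                continue
            emails, phones = extract_contacts(html)
            if not emails and not phones:
                log.write(f"{url}\tno contact details found\n")
                continue
            writer.writerow([url, ";".join(emails), ";".join(phones)])
```

Keeping extraction separate from saving makes it easy to swap the CSV writer for a database insert, which the slide lists as an alternative output.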

Video / Image Analytics
Video Analytics
Mine the complete video data feed, including OCR
(text), object detection, face detection, emotion
detection, action detection, gaze detection, etc.
Structured output for further analysis
Archived videos summarized for user
consumption
Unnecessary and static frames removed for quick
user review
Video file → frames → text, objects, faces, emotions, actions
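One simple way to drop unnecessary, near-duplicate frames before user review is to keep a frame only when it differs enough from the last kept frame. The sketch below assumes frames have already been decoded (for example by a video library) into equal-length, non-empty sequences of grayscale pixel values; the threshold is a hypothetical tuning parameter, and the deck does not describe the method actually used.

```python
def drop_static_frames(frames, threshold=10.0):
    """Return the indices of frames worth keeping.

    frames: list of equal-length, non-empty sequences of grayscale values (0-255).
    threshold: hypothetical mean-absolute-difference cut-off, in intensity units.
    """
    kept = []
    last = None  # last frame that was kept
    for i, frame in enumerate(frames):
        if last is None:
            kept.append(i)  # always keep the first frame
            last = frame
            continue
        # Mean absolute pixel difference against the last kept frame
        mad = sum(abs(a - b) for a, b in zip(frame, last)) / len(frame)
        if mad > threshold:
            kept.append(i)
            last = frame
    return kept


if __name__ == "__main__":
    frames = [[0] * 4, [1] * 4, [50] * 4, [52] * 4, [120] * 4]
    print(drop_static_frames(frames))  # prints [0, 2, 4]
```

Comparing against the last kept frame, rather than the immediately preceding one, stops a slow drift (such as a gradual lighting change) from being discarded one small step at a time.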
9

Analytics Solutions Components
Edge Analytics Platform: IoT
EdgePlus: A streaming edge analytics platform that
can bring cloud-like functionality closer to data-producing
sources and merges real-time capabilities
to integrate IT and OT in a single platform.
AIOps: Intelligent DevOps
A streaming and batch analytics platform providing
intelligent insights into DevOps using advanced
AI/ML techniques.
Intelligent Bot for Patients
Intelligent Patient Classification System & Patient 360
Data Manager to Coordinate Efficient Patient Care
10

11
Recent Use Cases
Custom Analytics
Edge Analytics
* Manufacturing
* Network Health Monitoring
Big Data
* Physician Referral
* Targeted Marketing
Custom Analytics
* Insurance Cost Prediction
* Marketing Channel Optimization
NLP
* Information Mining
* Triaging Patient Communication

12
Engine Head Quality Control for a Large Automobile Manufacturer
Production line: 5 cells, 150+ processes
Cell: 30+ processes
Engine heads: 25015 (Nu Theta), 25017 (Atkinson)
Only 4% of products underwent inspection;
1% had MIP (microscopic inspection).
The requirement was to implement
100% automated inline testing.
Data Complexity
•5 different lines with 5 machines/cells per line
•Engine head 25015 passed through 153 processes
•Engine head 25017 went through 155 processes
•Each cell contained 30+ processes
•100 data points/sec
Edge analytics at the cell level
Automated inspection of ALL
parts and identifying defects at
the earliest possible stage

13
Engine Head Quality Control for a Large Automobile Manufacturer
System Manager (Plant Data Center)
Edge Software
Standard Edge Hardware
Plant Network
Cell
AI/ML Model
Identification of processes and Variables
with greater variability
Differences from control pattern indicated
likely process failure leading to poor engine
head quality
Automated Quality Inspection with
Dashboards providing actionable insights
Automated process failure detection
Accounts for systematic differences due to:
A) Transitions across jobs not being clearly
marked
B) Slight time delays across different engine
heads
Model used: HOT SAX Model for Anomaly
Detection
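HOT SAX (Keogh, Lin and Fu) finds a time-series discord: the subsequence whose distance to its nearest non-overlapping neighbor is largest. It speeds up brute-force search by turning each window into a SAX word, visiting windows with rare words first, comparing each one with same-word windows first, and abandoning a candidate as soon as it has a neighbor closer than the best discord so far. The sketch below is a simplified pure-Python illustration, assuming z-normalized Euclidean distance, a 4-letter alphabet and a single sensor signal; the window length, PAA size and test signal are hypothetical, and this is not the production model.

```python
import math
import random
from collections import defaultdict

# Standard SAX breakpoints for a 4-letter alphabet (N(0,1) quartiles)
BREAKPOINTS = [-0.67, 0.0, 0.67]


def znorm(x):
    """Z-normalize a window; flat windows become all zeros."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    if sd < 1e-8:
        return [0.0] * len(x)
    return [(v - mu) / sd for v in x]


def sax_word(z, w):
    """PAA of a z-normalized window into w segments, mapped to letters a-d."""
    n = len(z)
    letters = []
    for k in range(w):
        seg = z[k * n // w:(k + 1) * n // w]
        mean = sum(seg) / len(seg)
        letters.append("abcd"[sum(mean > b for b in BREAKPOINTS)])
    return "".join(letters)


def dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def hot_sax(series, n, w=4, seed=0):
    """Return (start, nearest-neighbour distance) of the top discord of length n (n >= w)."""
    rng = random.Random(seed)
    m = len(series) - n + 1
    windows = [znorm(series[i:i + n]) for i in range(m)]
    words = [sax_word(z, w) for z in windows]
    buckets = defaultdict(list)
    for i, word in enumerate(words):
        buckets[word].append(i)
    # Outer-loop heuristic: windows with the rarest SAX words first, ties in random order
    outer = sorted(range(m), key=lambda i: (len(buckets[words[i]]), rng.random()))
    best_dist, best_loc = -1.0, None
    for i in outer:
        nn = math.inf
        others = list(range(m))
        rng.shuffle(others)
        # Inner-loop heuristic: same-word windows first, then the rest at random
        for j in buckets[words[i]] + others:
            if abs(i - j) < n:
                continue  # skip trivial (overlapping) matches
            nn = min(nn, dist(windows[i], windows[j]))
            if nn < best_dist:
                break  # early abandon: i cannot be the top discord
        if best_dist < nn < math.inf:
            best_dist, best_loc = nn, i
    return best_loc, best_dist


if __name__ == "__main__":
    # Hypothetical sensor trace: a clean periodic signal with one injected spike at t = 105
    signal = [math.sin(2 * math.pi * t / 20) for t in range(200)]
    signal[105] = 3.0
    print(hot_sax(signal, n=20))
```

The heuristics only change the order of comparisons, so the result is the same discord that exhaustive search would return; they matter because the early-abandon check then prunes most candidates after a few distance computations.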

14
Monitoring Network Performance in Real Time
The main aim was to assess the network performance of each store compared with
1) that store itself (local anomalies) and
2) all other connected stores (global anomalies)
Variables considered:
1) Goodput rate
2) Latency
3) Packet loss
Data from logs, metrics and other sources was ingested, cleaned and prepared for
analysis
Algorithm Used
The modified K-means algorithm identifies clusters of values that contain rare
patterns (<10%), which are treated as anomalous. The thresholds so identified are then
applied to incoming data to find similar patterns. The method learns from previous
batches to check whether a pattern is still rare enough to be classified as anomalous.
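A minimal sketch of the rare-cluster idea, assuming a single metric such as latency and using plain 1-D k-means as a stand-in for the team's modified K-means, whose details are not given: clusters holding fewer than 10% of recent observations are marked anomalous, incoming values are classified by their nearest cluster center, and each new batch is added to a bounded history before refitting. The class and parameter names (`RareClusterDetector`, `k`, `history`) and the sample data are hypothetical.

```python
from collections import deque


def assign(values, centroids):
    """Group values by their nearest centroid (ties go to the lower index)."""
    groups = [[] for _ in centroids]
    for v in values:
        nearest = min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
        groups[nearest].append(v)
    return groups


def kmeans_1d(values, k, iters=50):
    """Plain 1-D k-means with deterministic quantile initialisation."""
    s = sorted(values)
    centroids = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = assign(values, centroids)
        # Empty clusters keep their previous centre
        new = [sum(g) / len(g) if g else centroids[i] for i, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids, assign(values, centroids)


class RareClusterDetector:
    """Flags values whose nearest cluster holds less than rare_frac of recent data."""

    def __init__(self, k=3, rare_frac=0.10, history=1000):
        self.k = k
        self.rare_frac = rare_frac
        self.history = deque(maxlen=history)  # values kept from previous batches

    def update(self, batch):
        """Add a batch and refit, so rarity is re-judged against recent data."""
        self.history.extend(batch)
        values = list(self.history)
        self.centroids, groups = kmeans_1d(values, self.k)
        self.rare = [len(g) / len(values) < self.rare_frac for g in groups]
        return self

    def is_anomalous(self, x):
        nearest = min(range(self.k), key=lambda c: abs(x - self.centroids[c]))
        return self.rare[nearest]


if __name__ == "__main__":
    latency_ms = [20.0] * 60 + [50.0] * 35 + [300.0] * 5  # hypothetical batch
    detector = RareClusterDetector(k=3).update(latency_ms)
    print(detector.is_anomalous(290.0), detector.is_anomalous(22.0))
```

In one dimension, nearest-center assignment is equivalent to thresholds at the midpoints between adjacent centers, which plays the role of the learned thresholds; goodput and packet loss would each need their own detector or a multivariate distance.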
Store-level dashboard & real-time notifications (local & global anomalies) | Cross-store dashboard & real-time notifications (global anomalies)

15
Physician Referral Discovery
•Huge data set with billions of rows
•Same claim reported by different vendors with different IDs
•Missing claims meant the wrong referrer would be assigned to a claim
•Referral data on claims potentially unreliable
•A priori clinical domain knowledge used
•Physician taxonomies were cleaned and used to determine referral priority
•Proprietary algorithms developed for de-duplication and referral assignment
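The de-duplication and referral-assignment algorithms are proprietary and are not shown here. As a generic illustration only, the sketch below merges claims that different vendors report under different IDs by matching on a normalized key; the field names (`patient_id`, `npi`, `service_date`, `procedure_code`, `claim_id`) are hypothetical, and real matching would need fuzzier rules.

```python
def claim_key(claim):
    """Normalized identity of a claim, ignoring vendor-specific IDs (hypothetical fields)."""
    return (
        claim["patient_id"].strip().upper(),
        claim["npi"].strip(),
        claim["service_date"],
        claim["procedure_code"].strip().upper(),
    )


def deduplicate(claims):
    """Keep one record per claim key and remember every vendor ID that reported it."""
    merged = {}
    for claim in claims:
        key = claim_key(claim)
        if key in merged:
            merged[key]["vendor_claim_ids"].append(claim["claim_id"])
        else:
            merged[key] = {**claim, "vendor_claim_ids": [claim["claim_id"]]}
    return list(merged.values())


if __name__ == "__main__":
    claims = [
        {"claim_id": "A-1", "patient_id": "p001", "npi": "1234567890",
         "service_date": "2021-03-04", "procedure_code": "99213"},
        {"claim_id": "B-77", "patient_id": " P001", "npi": "1234567890 ",
         "service_date": "2021-03-04", "procedure_code": "99213"},
        {"claim_id": "A-2", "patient_id": "p002", "npi": "1234567890",
         "service_date": "2021-03-05", "procedure_code": "99214"},
    ]
    for record in deduplicate(claims):
        print(record["patient_id"], record["vendor_claim_ids"])
```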
Market share increased due to:
•Correct identification of physician practices to acquire
•Removal/reduction of referral leakage
•Referral identification algorithm was patented by the client

16
Targeted Marketing for 65 Diseases
•Random advertising led to low ROI and low patient acquisition
•Not understanding patient needs led to patient churn
•65 disease profiles: target lists ranked in deciles by likelihood of response
•Customer profile (Experian database), patient profile and billing data used
•Ensemble models, rare event prediction techniques
•Automatic model updates with changes in patient/consumer profile
•Patient models for cross-sell, upsell and retention: response rate >75%
•Consumer models for acquisition: response rate >65%
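Ranking a target list into deciles by likelihood of response can be sketched as follows, assuming each person already has a model score (the ensemble and rare-event models themselves are not shown). The person IDs and scores are hypothetical.

```python
def assign_deciles(scores):
    """Map person ID -> decile, where 1 is the 10% most likely to respond.

    scores: dict of person ID -> predicted response probability.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    # Rank 0 is the highest score; integer division splits ranks into ten roughly equal bands
    return {pid: rank * 10 // n + 1 for rank, pid in enumerate(ranked)}


if __name__ == "__main__":
    scores = {f"p{i}": i / 20 for i in range(20)}
    deciles = assign_deciles(scores)
    print(deciles["p19"], deciles["p0"])  # prints: 1 10
```

Outreach can then start from decile 1 and move down the list as budget allows.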
Patient Model Performance for Afib:

Workers' Compensation Claims Triaging
17
Identifying Expensive Claims Early in Their Life Cycle
Zero Day model using limited information, with 75% overall accuracy
and 80% recall for the rarest and most expensive claims. Thirty Day and
Ninety Day models using updated data, with more than 85% accuracy
22% savings in total cost, avoided litigation, decreased General
Liability claims
Performance Table for Zero Day Model
Actual \ Predicted | <=$1,500 | $1,500-$50,000 | >$50,000 | Total Injured
<=$1,500 | 72 | 17 | 9 | 98
$1,500-$50,000 | 3 | 12 | 1 | 16
>$50,000 | 2 | 5 | 29 | 36
Total Injured | 77 | 34 | 39 | 150
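The headline figures can be checked against the table above. Reading rows as actual cost bands and columns as predicted bands (the reading under which the numbers agree), overall accuracy is the diagonal total over all 150 claims, and recall for the >$50,000 band is its diagonal count over that row's total. The first band is taken to be the <=$1,500 band named in the column header.

```python
# Zero Day model confusion matrix: rows = actual band, columns = predicted band
LABELS = ["<=$1,500", "$1,500-$50,000", ">$50,000"]
MATRIX = [
    [72, 17, 9],
    [3, 12, 1],
    [2, 5, 29],
]


def accuracy(m):
    """Share of claims on the diagonal (predicted band == actual band)."""
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total


def recall(m, i):
    """Share of actual band-i claims that were predicted as band i."""
    return m[i][i] / sum(m[i])


def precision(m, i):
    """Share of claims predicted as band i that really were band i."""
    return m[i][i] / sum(row[i] for row in m)


if __name__ == "__main__":
    print(f"accuracy: {accuracy(MATRIX):.3f}")
    for i, label in enumerate(LABELS):
        print(f"{label}: recall {recall(MATRIX, i):.3f}, precision {precision(MATRIX, i):.3f}")
```

This gives 113/150 ≈ 75.3% accuracy and 29/36 ≈ 80.6% recall for the >$50,000 band, matching the 75% and 80% quoted above; precision for that band is 29/39 ≈ 74.4%.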

Targeted Marketing to Healthcare Professionals (HCPs)
18
•HCP profiling and finding the best channels, best day of the week, content to be
sent and time of day to approach them
•Allocate resources across different channels, customized for each HCP, for
optimum response
•NBA (next best action) using reinforcement learning with multi-arm bandit models and
multiple algorithms for best-channel predictions
•Multi-touch attribution model using a Shapley value approach to assign
channel contributions
•Optimum resource allocation using linear and logistic regression
approaches to determine optimal weights that maximize profits
•Best channels identified for HCPs clustered according to their propensity
of response
•Optimal resource allocation to ensure maximum new
prescriptions/probability of prescriptions
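A Shapley value approach credits each channel with its average marginal contribution over all orders in which channels could be added: φ_i = Σ over S ⊆ N \ {i} of |S|!(n − |S| − 1)!/n! · (v(S ∪ {i}) − v(S)). A minimal exact-computation sketch, assuming a value function v(S) (for example, conversions observed when exactly the channels in S were used) is already available for every subset of channels; the channel names and numbers are hypothetical. Exact enumeration is exponential in the number of channels, so it only suits a handful of channels.

```python
from itertools import combinations
from math import factorial


def shapley_values(channels, v):
    """Exact Shapley credit per channel.

    v: dict mapping frozenset of channels -> value of that coalition (must include frozenset()).
    """
    n = len(channels)
    phi = {}
    for ch in channels:
        others = [c for c in channels if c != ch]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Weight = |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                total += weight * (v[s | {ch}] - v[s])
        phi[ch] = total
    return phi


# Hypothetical conversions observed for each combination of channels
CHANNELS = ["email", "call", "visit"]
V = {
    frozenset(): 0,
    frozenset({"email"}): 10,
    frozenset({"call"}): 20,
    frozenset({"visit"}): 30,
    frozenset({"email", "call"}): 40,
    frozenset({"email", "visit"}): 50,
    frozenset({"call", "visit"}): 60,
    frozenset({"email", "call", "visit"}): 90,
}

if __name__ == "__main__":
    print(shapley_values(CHANNELS, V))
```

With these numbers the credits come out as 20, 30 and 40 conversions for email, call and visit, which sum to v(all channels) = 90, as Shapley values always do.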

Information Mining & Analytics using NLP
19
Objective :
•Mining patient data from pdf documents (different EMRs)
•Storing them into database in a common structured format
•Analytics and Case Manager dashboard
•Batch processing enabled
Methodology :
•PDF files converted to text using an API
•Pattern-matching and rule-based information retrieval
algorithm used to convert extracted information to a structured
format.
•Technologies used: Python, NLTK, Flask, fuzzywuzzy,
semantic analysis, SNOMED CT database
Output:
Structured information presented in the Care
Manager Dashboard in an actionable format
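The rule-based step can be sketched as below, assuming the PDF has already been converted to text and that fields appear as `Label: value` lines. The deck lists fuzzywuzzy for fuzzy matching; to stay within the standard library this sketch uses `difflib.get_close_matches` instead, so that misspelled or variant labels still map to a common field. The alias table and field names are hypothetical, and SNOMED CT coding is not shown.

```python
import difflib
import re

# Hypothetical label variants seen across EMR layouts -> common structured field
FIELD_ALIASES = {
    "patient name": "patient_name",
    "name": "patient_name",
    "date of birth": "dob",
    "dob": "dob",
    "diagnosis": "diagnosis",
    "primary diagnosis": "diagnosis",
    "medications": "medications",
    "current medications": "medications",
}

# A "Label: value" line, with a label of 2-40 characters and no colon inside it
LINE_RE = re.compile(r"^\s*([^:]{2,40}):\s*(.+?)\s*$")


def extract_fields(text, cutoff=0.8):
    """Map 'Label: value' lines to common fields, tolerating small label variations."""
    record = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # not a labelled line
        label = m.group(1).strip().lower()
        match = difflib.get_close_matches(label, FIELD_ALIASES, n=1, cutoff=cutoff)
        if match:
            # Keep the first value seen for each field
            record.setdefault(FIELD_ALIASES[match[0]], m.group(2))
    return record


if __name__ == "__main__":
    sample = "Patiant Name: John Doe\nDOB: 01/02/1960\nPrimary Diagnosis: Type 2 diabetes\n"
    print(extract_fields(sample))
```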

Triaging Patient Communication
20
•Care coordinator with 200+ patients: thousands of messages and tweets
•Urgent vs. non-urgent
•Scaling the system requires automation
•Built an NLP system to parse patient messages and triage them, identifying
question/content/urgency
•Bot gathers additional information from patients
•Patients receive timely support
•Care coordinator is able to focus on urgent cases

Triaging Patient Communication
21

22
Analytics Solutions Components
•Real-time point data
and video analytics.
•Distributed,
standalone analytics
platform can be
deployed in the cloud or
on the edge.
•Centralized
management and
orchestration of the
analytics pipelines.
•Hardware agnostic.
Centrally managed streaming analytics platform for Video and data

Partnering in finding Solutions
Solve tough problems
together
Build Analytics Pipeline from data
ingestion to visualization
Build deployable
products together
Extend your
team’s capabilities
Bring solutions to
market faster
Flexible, Economical,
and Dependable
Specialty:
“Develop IP focused
Analytics Products”
23

Selected Team Members
24
We are a team of master's- and Ph.D.-level data scientists, data architects and software engineers working under the guidance of senior
leadership and a serial entrepreneur. Our resources range from data curators to deployment engineers, and our business
model is to provide customers with project-based solutions and data science services. Here are selected sample resources:
Solution Architect, Algorithm Developer, Team Lead
M.Sc., M.Phil. in Statistics
He architects analytical solutions, directs the team on implementation,
and is a hands-on developer using various analytics platforms. He
has led a team of data scientists and software engineers in our India
office for several years and has 14 years of total experience
in analytics. He has in-depth hands-on experience in the US health care,
insurance and IoT domains. He has a master's in
Statistics and a Master of Philosophy in Computer Application of
Statistics from Calcutta University, ISI & IIM Calcutta, India.
Director of Research, Algorithm Developer, Team Lead
M.Sc., M.Phil., Ph.D. in Statistics
He leads quantitative research and statistical modelling. He has 18
years of experience in advanced statistical techniques as well as
computational methods for obtaining optimal solutions where
analytical solutions are difficult. He is currently developing
an AIOps product and an IoT-based predictive and prescriptive
management solution for a client with the AnalyticsPlus team. He holds
M.Phil. (2000) and Ph.D. (2007) degrees from Calcutta University and
has been part of the AnalyticsPlus team since its inception.
Data Scientist,
M.Sc. in Statistics
She has 5+ years of experience applying R, SAS, Python,
PySpark, SQL, Tableau, Power BI, Grafana and Kapacitor to
algorithm development, data modelling, statistical learning and
data visualization. She also has hands-on experience applying
several ML/statistical algorithms to real-world problems. She has
worked on AI (artificial intelligence) and IoT (Internet of Things)
projects. She has experience handling EMR, claims and
NPPES data. She is an expert in NLP and NLG.
Data Scientist,
M.Sc. in Statistics
His expertise lies in data science and technology. He has
experience handling EMR data, claims data, NPPES data and
unstructured data. He is proficient in Python, PySpark, SQL, NLP
and Tableau. He has good exposure to web scraping, audio and
image analytics, and the SNOMED CT and RxNorm databases. He is skilled
with REST APIs in Flask, hosting analytics components in AWS or Azure
environments, implementing SageMaker, etc. He holds a master's
degree in Statistics from Burdwan University, India.

Selected Team Members
25
Data Scientist, M.Tech. in Computer Science, Pursuing PhD in Data Science
She has 3+ years of hands-on experience solving real-world problems with AI and ML and creating insights. She has hands-on
experience in streaming data and image analytics, static image analytics and NLP. She recently delivered a project in which she was
responsible for automating an information retrieval system that extracts doctors' contact details from URLs by
scraping web pages using Python. She is currently working on streaming image analytics.
She is also pursuing a Ph.D. in Data Science at the Centre for Computational and Data Sciences, IIT Kharagpur. Her areas of expertise
include machine learning, Python, natural language processing, IoT, web scraping and data visualization.
Data Scientist, M.Sc. in Statistics
He completed his postgraduate degree in Statistics at IIT Kanpur. He has worked for clients on “Survey Analytics” and “Video
Analytics” projects. He completed the KPMG Virtual Internship and the Data@ANZ Virtual Internship programs under InsideSherpa. He has
experience understanding and implementing different statistical, machine learning and deep learning models across
use cases. He gained knowledge from projects such as “Time Series and Forecasting”, “Projects on Business Analytics” and “Projects on
Hand Writing Digit Recognition” under different IITK professors. He is currently working on digital advertising content generation
using ML.
Deployment Engineer, M.Tech. in Computer Science
He is a DevOps engineer with 3+ years of experience deploying, managing and monitoring resources using
Azure and Azure DevOps. He has experience with the containerization platform Docker, container orchestration platforms such as Kubernetes,
CI/CD in Azure DevOps, Git (distributed version control) and Ansible (configuration management).
He has completed his M.Tech. in Computer Science.

Contact Us
26
www.analyticsplus.com
[email protected]