“Optimized Vision Language Models for Intelligent Transportation System Applications,” a Presentation from Nota AI

embeddedvision 186 views 25 slides Jun 24, 2024

About This Presentation

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/optimized-vision-language-models-for-intelligent-transportation-system-applications-a-presentation-from-nota-ai/

Tae-Ho Kim, Co-founder and CTO of Nota AI, presents the “Optimized Vision Language Models ...


Slide Content

Optimized Vision Language
Models for Intelligent
Transportation System
Applications
Tae-Ho Kim
CTO & Co-Founder
Nota Inc.

Introduction
© 2024 Nota Inc.
•This talk explores the challenges in intelligent transportation systems (ITS).
•It shows how vision language models (VLMs) can solve these challenges.
•It also outlines future work.

Identity of Nota AI
Artificial Intelligence: Computer Vision, Natural Language Processing, LLMs, GenAI, STT/TTS
Semiconductors (System): CPU, NPU, TPU, GPU, Embedded Board, Edge
Hardware-aware AI Model Optimization
Nota AI bridges the gap between AI & semiconductors.

Nota AI’s Main Product: NetsPresso
NetsPresso® simplifies AI model optimization for target devices with automated processes.

Platform & Edge Solutions: Elevating Excellence
Using NetsPresso, Nota AI has also built a solutions business in ITS.

Nota AI's Expertise
Nota AI also specializes in GenAI compression.

Intelligent Transportation System (ITS)

Enhancing Traffic Flow: Real-Time Lightweight ITS Solutions
•VRU Safety Solutions: AI-based real-time hazardous situation screening and control system
•Automatic Incident Management System: Real-time incident detection and analysis for safe road conditions
•AI Smart Parking: Real-time analysis of parking occupancy and parking facility utilization
•Smart Intersection System: Real-time incident detection and analysis for safe road conditions

Use Case 1: Smart Intersection System
○Daejeon metropolitan city ITS construction project
○Intersection CCTV AI video analysis (600 channels)
○98% accuracy in traffic volume counting in night and rainy conditions

Use Case 2: AI Safe Crossing

Use Case 3: VRU Safety Solutions

Use Case 4: AI Smart Parking
○UK Milton Keynes stadium outdoor parking management
○USA San Diego outdoor parking management (Caltrans)

Challenges in ITS
•Ill-posed problem: How can we define “road debris”?
•Contextual problem: How can we define “accidents” with object detection?
•Rare data: How can we obtain datasets for rare events?

Ill-posed Problem: Road Debris
Can it be detected by legacy AI models?

Contextual Problem: Accident Detection
Can it be detected by legacy AI models?

Challenges in ITS
•Ill-posed problem: How can we define “road debris”?
•Contextual problem: How can we define “accidents” with object detection?
•Rare data: How can we obtain datasets for rare events?
These challenges require a super-generalized model: a foundation model.

Foundation Model on the Edge: Challenges in Industrial AI
As-is Problem
Pipeline: Input Image → Deep Learning Model → Logic Layer (Kalman filtering…) → Inference Result
•Rule-based algorithms are fragile.
•Logic must be added for each new requirement.
•Errors in object detection/tracking propagate to the logic layers.
•On-site calibration is required.
•The models are vulnerable to data drift.
•Data acquisition for rare events is hard.
•The model composition is complex.
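
The fragility of a hand-written logic layer can be illustrated with a small sketch. Everything here is an assumption made for illustration, not something shown in the slides: the `Detection` type, the `stopped_vehicles` rule, and its thresholds are all hypothetical. The rule flags a vehicle as stopped when its tracked position barely moves for several consecutive frames. A single tracker error (an ID switch) is enough to make the rule miss the event.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Detection:
    track_id: int  # identity assigned by a (hypothetical) upstream tracker
    x: float       # box centre in pixels
    y: float


def stopped_vehicles(frames, min_frames=3, max_move_px=2.0):
    """Rule-based logic layer: a track counts as stopped once it has moved
    at most max_move_px between consecutive frames min_frames times in a row."""
    last_pos = {}
    still_count = {}
    stopped = set()
    for detections in frames:
        for det in detections:
            prev = last_pos.get(det.track_id)
            if prev is not None and hypot(det.x - prev[0], det.y - prev[1]) <= max_move_px:
                still_count[det.track_id] = still_count.get(det.track_id, 0) + 1
            else:
                still_count[det.track_id] = 0
            last_pos[det.track_id] = (det.x, det.y)
            if still_count[det.track_id] >= min_frames:
                stopped.add(det.track_id)
    return stopped


# The same stationary car seen in five frames, tracked correctly as id 7.
clean = [[Detection(7, 100.0, 50.0)] for _ in range(5)]
# Identical images, but the tracker switches the id from 7 to 8 at frame 2.
noisy = [[Detection(7 if i < 2 else 8, 100.0, 50.0)] for i in range(5)]

print(stopped_vehicles(clean))  # the stopped car is reported
print(stopped_vehicles(noisy))  # the ID switch resets the counter; nothing is reported
```

In this sketch the scene is unchanged between the two runs; only the upstream tracking output differs, which is the kind of error propagation the slide describes.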

Foundation Model on the Edge: Challenges in Industrial AI
To-be Features
Pipeline: Input Image → Foundation Model (VLM) → Inference Result
•A VLM can comprehend complex scenes.
•A VLM already encodes a wide range of logic.
•A VLM is robust to data drift.
•A VLM is aware of rare events.
•A VLM needs little or no calibration.
•Still, a VLM does not yet understand video.
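
In the to-be pipeline, the detector and the hand-written logic collapse into a single prompt to a VLM. The following is a minimal sketch under stated assumptions: `ask_vlm(image, prompt)` is a hypothetical callable wrapping whatever VLM runtime is deployed, and the prompt wording is illustrative. Neither is taken from the slides.

```python
from typing import Callable

# Assumed prompt wording; constraining the answer format keeps parsing simple.
PROMPT = (
    "You are monitoring a road camera. "
    "Is there road debris or a traffic accident in this image? "
    "Answer with exactly one word: YES or NO."
)


def frame_has_incident(image: bytes, ask_vlm: Callable[[bytes, str], str]) -> bool:
    """Query the VLM for one frame and map its free-text reply to a boolean."""
    reply = ask_vlm(image, PROMPT).strip().upper()
    return reply.startswith("YES")


# Stand-in for a real model so the sketch runs without any VLM installed.
def fake_vlm(image: bytes, prompt: str) -> str:
    return "Yes, a car has overturned." if image == b"crash" else "No."


print(frame_has_incident(b"crash", fake_vlm))
print(frame_has_incident(b"clear", fake_vlm))
```

Passing the model in as a callable keeps the sketch independent of any particular runtime, so a real model can replace `fake_vlm` without changing the parsing logic.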

Vision Language Model: LLaVA
Panels: LLaVA, Live LLaVA
Source: jetson-ai-lab.com

Working Prototype of LLaVA for Accident Detection

Benchmark on Models
Source: jetson-ai-lab.com

Future Work
•More advanced optimization is required.
•VLMs need to comprehend temporal consistency.
•Domain adaptation might be required for user-specific scenarios.
•A product interface is required.
•Prompt engineering is required for higher performance.
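
Until a VLM handles temporal consistency natively, one common stopgap is to smooth per-frame answers over a sliding window. The sketch below is an illustrative assumption, not the approach described in the talk: the function name `smooth_alerts`, the window size, and the vote threshold are all hypothetical.

```python
from collections import deque


def smooth_alerts(frame_flags, window=5, min_hits=3):
    """Raise an alert for a frame only if at least min_hits of the last
    `window` per-frame answers (including the current one) were positive."""
    recent = deque(maxlen=window)
    alerts = []
    for flag in frame_flags:
        recent.append(bool(flag))
        alerts.append(sum(recent) >= min_hits)
    return alerts


# Per-frame yes/no answers, e.g. from a VLM queried once per frame.
flags = [False, True, False, False, True, True, True, False]
alerts = smooth_alerts(flags)
print(alerts)
```

With these settings the isolated positive at the second frame is suppressed, while the sustained run of positives later in the sequence raises an alert that persists briefly after the last positive frame.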

Conclusion
•Industrial AI is already widely used, but it is limited when the problem is complex and underdetermined.
•Among GenAI models, Vision Language Models (VLMs) can understand complex scenes, so they can handle complex queries and events.
•For example, in ITS, road debris and car accidents are severe problems that legacy AI models could not solve.
•VLMs can address these problems.
•However, VLMs are still computationally expensive, so lightweight VLMs are the required next step.

Visit Nota AI @318!

Thank you for your attention!