History and Application of LLM Leveraging Big Data

dalgual 109 views 35 slides Jul 16, 2024

About This Presentation

Myself
Introduction To Big Data
Scalable Data Intensive Computing
Applications with ML & DL
Summary


Slide Content

Application of LLM Leveraging Big Data. Jongwook Woo, PhD, [email protected], Big Data AI Center (BigDAI), California State University Los Angeles. July 16, 2024, YISS, Yonsei University, Korea.

Contents: Myself; Introduction To Big Data; Scalable Data Intensive Computing; Applications with ML & DL; Summary

Myself. Experience: Professor at California State University Los Angeles since 2002; Director at BigDAI (Big Data AI Center). PhD/MS in 2001/1998: Computer Science and Engineering at USC. BS/MS in 1989/1991: Electronic Engineering, Yonsei University.

Myself: S/W Development Lead http://www.mobygames.com/game/windows/matrix-online/credits

Myself: Collaborations SOFT ZEN

Collaboration with NVidia, Databricks, Oracle, Amazon, CDH, Yonsei using Big Data AI https://www.cloudera.com/more/customers/csula.html

Data Issues. Large-scale data arrives hourly, daily, …, at terabyte (10^12 bytes) and petabyte (10^15 bytes) scale. Because of: IoT (streaming data, sensor data) in SmartX; social computing, smartphones, online games, the web, bioinformatics, … Legacy approach: too expensive to store and process large-scale data.

Data Handling: One Approach. Build a new, more powerful system that is bigger and more expensive.

Data Handling: One Approach. IBM mainframe z15: the T02 model starts at $160,000. https://techcrunch.com/2019/09/12/the-mainframe-business-is-alive-and-well-as-ibm-announces-new-z15/

Data Handling: Another Way, Less Expensive and More Scalable. From the 2017 Korean blockbuster movie "The Fortress" (남한산성). AD 1409 (Year 9 of King Tae-Jong, Chosun Dynasty, Korea), by the Choi family: Choi Hae-san (최해산, 崔海山), whose father was Choi Mu-seon (최무선, 崔茂宣). [Ref] "Joseon's Secret Weapon: Chongtonggi Hwacha" (조선의 비밀 병기: 총통기 화차, 銃筒機火車), written by 도심. Use existing resources without spending more.

H/W: Leveraging Big Data Cluster with GPU

Data Handling: Another Way, Less Expensive and More Scalable. LinkedIn Hadoop Spark cluster: 10,000 nodes with 500 PB of capacity through 2020. https://www.nextplatform.com/2021/09/15/the-endless-pursuit-of-scale-at-linkedin/ https://engineering.linkedin.com/blog/2021/scaling-linkedin-s-hadoop-yarn-cluster-beyond-10-000-nodes

Big Data in Systems. Big Data: Definition Again. An inexpensive platform of distributed parallel computing systems that can store large-scale data and process it in parallel.
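The "store it across many machines, process it in parallel" idea in the definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: threads on one machine stand in for cluster nodes, the function names (`split_into_partitions`, `count_words`, `merge_counts`, `parallel_word_count`) and the sample `reviews` are hypothetical, and a real deployment would use a platform such as Hadoop or Spark rather than this toy code.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Spread records round-robin, as a cluster spreads data blocks over nodes."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    """Map step: each worker counts words in its own partition only."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def merge_counts(partial_counts):
    """Reduce step: combine the per-partition counts into one total."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def parallel_word_count(records, num_workers=4):
    """Partition the data, count each partition concurrently, then merge."""
    partitions = split_into_partitions(records, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    return merge_counts(partial_counts)


# Hypothetical sample data standing in for a large review dataset.
reviews = [
    "great food great service",
    "slow service",
    "great view",
]
word_totals = parallel_word_count(reviews, num_workers=2)
print(word_totals.most_common(3))
```

Because each partition is processed independently, adding more workers (or, on a real cluster, more inexpensive nodes) is what lets this pattern scale without a bigger single machine.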

Interview with Cloudera, Nov 12, 2014: https://www.youtube.com/watch?v=ZvrHxsypeUE

Jams and other traffic incidents reported by users in Dec 2017 – Jan 2018: (Dalyapraz Dauletbak)

Dashboard: COVID-19 & Vaccination. https://www.calstatela.edu/centers/hipic/covid-19-us-ca-confirmed-prediction

Talked about COVID-19 at Arirang TV, 08/27/2020. Invited talk about COVID-19 and the post-pandemic era using Big Data AI, at Arirang TV in Korea.

Big Data Prediction. Big Data Science: how do we predict future trends and patterns from a massive dataset? Deep Learning, Machine Learning, AI.

Deep Learning Example with Images. GAN. Neural Style Transfer with "A Neural Algorithm of Artistic Style" (Gatys et al.). The Bathers, Korea, Yoon-Bok Shin (1758 - ?). Two Young Girls at the Piano, Auguste Renoir, French, 1892.

Deep Learning Example (Cont’d) GAN

Study ML/DL in Big Data AI. Text Processing and Classification of Biz Reviews Data: Amazon Products Ratings/Review, OpenTable. Helpfulness, Transaction Fraud Detection, Traffic. Product Recommendation: Airbnb Listings, Amazon Product. Price Prediction: Used Cars, Flight, Liquor Sales. Community Service: COVID-19 (Confirmed Cases, Vaccine Effects), History books/data.

Example: Text Data of History Books. 『明史』 卷41 志17 地理2 山東 遼東都指揮使司 鐵嶺衞: "Establishing the Cheollyeong Guard (철령위) at Cheollyeong Fortress, on the border with Goryeo" (고려와의 경계인 철령성에 철령위를 설치하다): https://bit.ly/3VefgLl Copy and paste the contents into ChatGPT or Copilot. For example: 다음의 한자를 한국말로 번역해줘 ("Translate the following Chinese characters into Korean"): "朝鮮, 箕子所封國也. 漢以前曰朝鮮. 始爲燕人衞滿所據, 漢武帝平之, 置眞番 · 臨屯 · 樂浪 · 玄菟四郡. 漢末, 有扶餘人高氏據其地, 改國號曰高麗, 又曰高句麗, 居平壤, 即樂浪也. 已, 爲唐所破, 東徙. 後唐時, 王建代高氏, 兼併新羅 · 百濟地, 徙居松岳, 曰東京, 而以平壤爲西京. 其國北鄰契丹, 西則女直, 南曰日本. 元至元中, 西京內屬, 置東寧路總管府, 盡慈嶺爲界."

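Example: Text Data of History Books (Cont’d). 『明史』 卷41 志17 地理2 山東 遼東都指揮使司 鐵嶺衞 says "… 西京內屬, 置東寧路總管府, 盡慈嶺爲界." ChatGPT translates it as "… 왕건이 고씨를 대신하여 신라와 백제의 땅을 흡수하고 성곡에 이주하여 동경 (東京) 이라 하였고, 평양을 서경으로 하였습니다. 그 나라는 북쪽으로는 거란, 서쪽으로는 녀직, 남쪽으로는 일본과 접합니다. 원나라 시대 중에는 서경이 내려앉아 동녕로 총관부를 설치하여 자치선으로 하였고, 전체적으로는 자평령을 경계로 하였습니다." In English, ChatGPT's Korean output reads: "… Wang Geon replaced the Go clan, absorbed the lands of Silla and Baekje, moved to Seonggok and called it the Eastern Capital (東京), and made Pyongyang the Western Capital. The country borders the Khitan to the north, the Jurchen to the west, and Japan to the south. During the Yuan period the Western Capital sank, the Dongnyeong Circuit (東寧路) General Administration was established as an autonomous line, and overall Japyeong Pass was taken as the border."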

Example (Cont’d) 1. Open Baidu and search for "東寧路"

Example (Cont’d) 2. Copy the address of "東寧路" shown in Baidu: 东宁路

Example (Cont’d) 3. Paste the address "东宁路 吉林省辽源市东丰县" into Google Maps

Example (Cont’d) 4. Compare what we have now with what we found on Google Maps
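The lookup steps above are done by hand in a browser, but the two search links can also be built programmatically. The sketch below is an illustration only, not part of the original slides. It assumes Baidu's web search accepts the term in a `wd` query parameter and that Google Maps accepts a search through its documented `api=1&query=` URL form. The helper names `baidu_search_url` and `google_maps_search_url` are hypothetical.

```python
from urllib.parse import urlencode


def baidu_search_url(term):
    """Step 1: link that searches Baidu for a place name (assumes the `wd` parameter)."""
    return "https://www.baidu.com/s?" + urlencode({"wd": term})


def google_maps_search_url(address):
    """Step 3: link that searches Google Maps for an address (assumes the Maps URLs search form)."""
    return "https://www.google.com/maps/search/?" + urlencode({"api": 1, "query": address})


# Place name as written in the history book (traditional characters).
place_name = "東寧路"
# Address copied from the Baidu result in step 2 (simplified characters).
modern_address = "东宁路 吉林省辽源市东丰县"

step1_url = baidu_search_url(place_name)
step3_url = google_maps_search_url(modern_address)
print(step1_url)
print(step3_url)
```

`urlencode` percent-encodes the Chinese characters as UTF-8, so the links stay valid when pasted into a browser. Step 4, comparing the map result with the current understanding of the border, remains a manual judgment.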

NLP Rating Classifying Models in OpenTable: Performance Comparison of the Models
LR has the shortest computation time.
Precision and AUC of DistilBERT are 71.2% and 70.5%.
4-14% better Precision and AUC than traditional models.
Slightly better Precision and Accuracy than BERT: 0.7% and 1%, respectively.

Algorithm    Precision  AUC    Computing Time log (sec)
LR           0.682      0.644  201
RF           0.667      0.524  634
GBT          0.637      0.616  3,141
BERT         0.707      0.694  11,936
DistilBERT   0.712      0.705  7,048

"Comparing NLP Models with LLM Classifying OpenTable Dataset", H. Lin, S. Lee, J. Park, E. Lim, J. Woo, KrAIS 2024
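The comparisons on this slide can be reproduced from the table with simple arithmetic. The sketch below copies the five rows from the table and computes DistilBERT's relative gain over each other model. The relative-gain definition, (new - old) / old, is an assumption here, since the slide does not say how its percentages were derived. The names `results`, `relative_gain`, and `gains` are hypothetical.

```python
# Values copied from the OpenTable comparison table on this slide.
results = {
    "LR": {"precision": 0.682, "auc": 0.644, "seconds": 201},
    "RF": {"precision": 0.667, "auc": 0.524, "seconds": 634},
    "GBT": {"precision": 0.637, "auc": 0.616, "seconds": 3141},
    "BERT": {"precision": 0.707, "auc": 0.694, "seconds": 11936},
    "DistilBERT": {"precision": 0.712, "auc": 0.705, "seconds": 7048},
}


def relative_gain(new, old):
    """Relative improvement of `new` over `old` (assumed definition)."""
    return (new - old) / old


reference = "DistilBERT"
gains = {
    model: {
        metric: relative_gain(results[reference][metric], scores[metric])
        for metric in ("precision", "auc")
    }
    for model, scores in results.items()
    if model != reference
}

# Model with the smallest computing time in the table.
fastest = min(results, key=lambda model: results[model]["seconds"])

for model, gain in gains.items():
    print(f"{reference} vs {model}: precision {gain['precision']:+.1%}, AUC {gain['auc']:+.1%}")
print("Shortest computing time:", fastest)
```

Printing the gains next to the computing times shows the trade-off the slide points at: the transformer models score higher, while LR finishes fastest.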

Summary. Big Data platform for large-scale data. LLMs (ChatGPT, Gemini, Bing): your secretary, translator, advisor.

Questions?