Future of Software Development: Coding for Everyone!
Background | Problem | Challenge | Solution | Future

How to Improve Software Development?
- Improve developers' productivity
- Improve well-being
- Improve the overall software development experience
- Reliability, usability, security, privacy, performance, efficiency, etc.
AIWare Code: Brief Timeline
- 2012-2014: Observed the repetitiveness and statistical predictability of software. [Hindle et al., 2012; Allamanis et al., 2013; Barr et al., 2014]
- 2014-2016: Applied basic statistical models (e.g., n-gram) to predict code properties. [Allamanis and Sutton, 2014; Ray et al., 2016]
- 2017-2020: Advanced statistical models (e.g., RNNs) adapted to code modeling. [Yin and Neubig, 2017; Zhou et al., 2019; Hellendoorn et al., 2020]
- 2020-2022: Transformer-based language models introduced to learn general code representations via large-scale pre-training. [Feng et al., 2020; Guo et al., 2021; Wang et al., 2021]
- 2022-Now: Large Language Models (LLMs) with billions of parameters are pre-trained on trillions of tokens; products deployed.
- 2024: AI engineers: many AI agents for solving a complex task (SWE-Agent / Devin / AutoDev).
AIWare Code: Brief Timeline → Current Trend
- 2022-2024: LLMs with billions of parameters pre-trained on trillions of tokens; products deployed [CodeLlama, GPTs, StarCoder].
- 2024: AI engineers: many AI agents for solving a complex task (SWE-Agent / Devin / AutoDev).
Current trend: (1) smarter models, (2) prompting with LLMs, (3) AI agents.
Benchmarking & Evaluation
- Mostly benchmark-driven development
- Efficiency: how well a model/agent performs on a particular task
- Benchmarks saturate soon; dire need to create new, challenging benchmarks across different SE tasks
Evaluation Strategies
- Efficiency: how well a model/agent performs on a particular task
- Cost: cost of calling LLM APIs
- Latency: how fast a job needs to be resolved (inline editing vs. test-driven bug fixing)
- Chat setting: human in the loop
AI-Powered Software Development
- Industry products: Amazon Q, Tabnine; 3% of code written by ML; ML-powered program fixing, repair, refactoring, etc.
- Huge academic contributions: 500+ papers (https://ml4code.github.io/)
Today's Talk: Code Generation for the Inline Setting
How does the community evaluate code generation tasks? Existing benchmarks are becoming saturated and probably suffer from data-leakage issues.
Do they really understand code?
Duality of Consciousness
- Natural language: "a sorting program"
- Programming language: def bubble_sort(): ...
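The duality can be seen in a single tiny program: one side is the natural-language description "a sorting program", the other is its programming-language realization. A minimal sketch (the function name follows the slide; the body is an assumed standard bubble sort):

```python
# "A sorting program" in natural language; bubble_sort in code.
def bubble_sort(xs):
    xs = list(xs)  # sort a copy, leave the input untouched
    for end in range(len(xs) - 1, 0, -1):
        for i in range(end):
            if xs[i] > xs[i + 1]:
                xs[i], xs[i + 1] = xs[i + 1], xs[i]  # swap out-of-order pair
    return xs

print(bubble_sort([5, 2, 9, 1]))  # → [1, 2, 5, 9]
```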
Lack of Understanding of Program Semantics (GPT-3.5)
Lack of Self-Consistency
Min et al., "Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain." ICLR'24.
LLMs severely lack self-consistency: they do not even properly understand the code and summaries they write themselves.
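The IdentityChain-style check above can be sketched as a round trip: code → summary → code, judged by behavior rather than text. In this sketch, `summarize` and `generate` are hypothetical stand-ins for LLM calls; a self-consistent model should preserve the program's behavior through the round trip.

```python
# Sketch of an IdentityChain-style self-consistency loop. `summarize` and
# `generate` are hypothetical stand-ins for LLM calls.

def summarize(code):
    # Stand-in "model": describe the function in natural language.
    return "a function f(a, b) that returns the larger of two numbers"

def generate(summary):
    # Stand-in "model": regenerate code from the summary.
    return "def f(a, b):\n    return a if a > b else b"

def behaviorally_equal(code1, code2, tests):
    # Judge self-consistency by I/O behavior, not by textual match.
    ns1, ns2 = {}, {}
    exec(code1, ns1)
    exec(code2, ns2)
    return all(ns1["f"](*t) == ns2["f"](*t) for t in tests)

original = "def f(a, b):\n    return max(a, b)"
regenerated = generate(summarize(original))
print(behaviorally_equal(original, regenerated, [(1, 2), (5, 3), (0, 0)]))  # → True
```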
Identifying (Dis)Similar Program Behaviors is Challenging (Ding et al., "Disco." ACL 2022)

Two textually similar versions of FFmpeg's filter16_roberts (libavfilter/vf_convolution.c) behave very differently. The vulnerable version (commit 3650835) declares the accumulators as int, causing an integer overflow (CVE-2021-38094):

    static void filter16_roberts(uint8_t *dstp, int width, float scale,
                                 float delta, int peak, ...) {
        uint16_t *dst = (uint16_t *)dstp;
        int x;
        for (x = 0; x < width; x++) {
            int suma = AV_RN16A( ... );  /* int: may overflow */
            int sumb = AV_RN16A( ... );
            dst[x] = av_clip(sqrtf(suma*suma + sumb*sumb) * scale + delta, 0, peak);
        }
    }

The patched version (commit 99f8d32) is identical except that suma and sumb are declared float. Code-text models cannot distinguish such inherently malicious functionalities (behaviors): textually similar does not mean functionally identical.
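The same textual-vs-behavioral gap can be reproduced in miniature. A hypothetical Python analogue (function names and numbers invented for illustration): a tiny edit that textual-similarity metrics barely notice, yet that changes behavior once an intermediate value wraps around, mimicking the int-vs-float overflow above.

```python
# Two near-identical snippets: the "buggy" one simulates 32-bit wraparound
# with a modulo, the "fixed" one does not. Purely illustrative.
import difflib

buggy = "def clip_sum(a, b):\n    return min((a * a + b * b) % 2**31, 2**15)\n"
fixed = "def clip_sum(a, b):\n    return min((a * a + b * b), 2**15)\n"

# Textual similarity is very high ...
ratio = difflib.SequenceMatcher(None, buggy, fixed).ratio()
print(ratio > 0.85)  # → True

# ... but behavior diverges on large inputs, where the intermediate wraps.
ns_b, ns_f = {}, {}
exec(buggy, ns_b)
exec(fixed, ns_f)
print(ns_b["clip_sum"](3, 4) == ns_f["clip_sum"](3, 4))          # → True (small inputs agree)
print(ns_b["clip_sum"](2**16, 0) == ns_f["clip_sum"](2**16, 0))  # → False (large inputs differ)
```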
CRUXEval [Gu et al.]: Code Reasoning, Understanding, and eXecution Evaluation
- Generates simple, short Python programs that use some Python libraries
- Given the input, predict the output (forward reasoning)
- Given the output, predict an input (backward reasoning)
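A CRUXEval-style task looks like this (a made-up example in the benchmark's spirit, not taken from it): the model sees the function plus one side of an I/O pair and must predict the other side.

```python
# The function under test.
def f(s):
    return s.replace("a", "b").upper()

# Forward reasoning: given the input "banana", predict the output.
assert f("banana") == "BBNBNB"

# Backward reasoning: given the output "CBT", find an input producing it.
assert f("cat") == "CBT"
```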
CRUXEval [Gu et al.]: although the models report decent performance on code generation (on known benchmarks), they struggle to demonstrate code understanding.
How can neuro-symbolic reasoning help build "smarter" models?
"Smarter" models are those that are better at CRUX: code reasoning, understanding & execution.
SemCoder: a CRUX-aware model that is better at code reasoning, understanding & execution.
LLM Training Steps
SemCoder: A Code Execution-Aware Model
- Filter out non-executable code
- Syntax-aware data augmentation
- Align code execution with source code
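The first step, filtering out non-executable code, can be sketched as follows. This is an assumed design, not SemCoder's actual pipeline: each sample is treated as a self-contained Python snippet, and we keep only snippets that both compile and run to completion within a time budget.

```python
# A minimal executability filter for training samples (assumed design).
import subprocess
import sys

def is_executable(snippet, timeout_s=5.0):
    try:
        compile(snippet, "<sample>", "exec")  # cheap syntax check first
        result = subprocess.run([sys.executable, "-c", snippet],
                                capture_output=True, timeout=timeout_s)
        return result.returncode == 0         # runtime errors -> nonzero
    except (SyntaxError, subprocess.TimeoutExpired):
        return False

samples = [
    "print(sum(range(10)))",  # runs fine -> keep
    "def f(:",                # syntax error -> drop
    "1 / 0",                  # raises at runtime -> drop
]
kept = [s for s in samples if is_executable(s)]
print(len(kept))  # → 1
```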
SemCoder Data Augmentation & Pre-Processing
Data Augmentation by OSS-Instruct
Out of 43.1k Python solutions, about 11.6k (26.9%) are non-executable, despite instructions to produce "correct" and "self-contained" code.
Random seeds can yield syntactically or semantically incorrect problems.
SemCoder Data Augmentation & Preprocessing: augment with good-quality code.
Impact of Quality Data Augmentation: better performance can be reached with a good-quality but much smaller training dataset.
Framework for Collecting Execution Data
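One way to collect such execution data in Python is via a tracing hook. This is an assumed design, not SemCoder's actual tracer: `sys.settrace` records each executed line together with the local variables at that point, yielding the program-state transitions used for execution-aware training.

```python
# Collect (line number, local variables) transitions for one call.
import sys

def trace_execution(fn, *args):
    events = []
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is fn.__code__:
            events.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer  # keep tracing inside this frame
    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(None)  # always detach the tracer
    return result, events

def running_max(xs):
    best = xs[0]
    for x in xs[1:]:
        if x > best:
            best = x
    return best

result, trace = trace_execution(running_max, [3, 1, 4])
print(result)          # → 4
print(len(trace) > 0)  # → True
```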
SemCoder: Execution Learning
Forward Monologue
- Execution coverage
- Natural execution order
- Program state transitions
- Final output
Given the execution, we ask GPT to annotate it with natural-language text explaining the execution.
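A forward monologue in the spirit of the slide looks like this (an illustrative example, not taken from SemCoder's data): the comments narrate coverage, the execution order, the state transitions, and the final output.

```python
def double_evens(xs):
    out = []
    for x in xs:
        if x % 2 == 0:
            out.append(x * 2)
    return out

# Forward monologue for double_evens([1, 2, 3, 4]):
#   out starts as [].
#   x = 1: 1 % 2 != 0, the if-branch is skipped, out stays [].
#   x = 2: 2 % 2 == 0, append 4, out becomes [4].
#   x = 3: branch skipped, out stays [4].
#   x = 4: append 8, out becomes [4, 8].
#   The loop ends and the function returns [4, 8].
print(double_evens([1, 2, 3, 4]))  # → [4, 8]
```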
Backward Monologue
- Abstract intermediate constraints: e.g., describe the previous state ([10.5, 8.2, 10.5, 7.1, 8.2]) as "a disordered list with two 10.5s, two 8.2s, and one 7.1"
- Then concretize an input that satisfies the constraints
Given the execution, we ask GPT to annotate it with natural-language text explaining the generated monologue.
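A backward monologue can be sketched as follows (illustrative; the list reuses the slide's example values): reason from a desired output back to abstract constraints on the input, then pick a concrete input satisfying them.

```python
def dedupe_sorted(xs):
    return sorted(set(xs))

# Backward monologue for target output [7.1, 8.2, 10.5]:
#   Abstract constraint: the input must contain exactly the distinct values
#   7.1, 8.2, and 10.5, in any order and with any duplicates.
#   Concrete input satisfying the constraint:
candidate = [10.5, 8.2, 10.5, 7.1, 8.2]
print(dedupe_sorted(candidate))  # → [7.1, 8.2, 10.5]
```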
Monologue Annotation with LLMs
- Generated by an LLM (GPT-3.5-turbo) with rejection sampling
- Use in-context learning to show the kind of reasoning we expect
- To verify a monologue, ask GPT to follow the step-by-step reasoning to generate the input (backward reasoning) and the output (forward reasoning)
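The rejection-sampling loop above can be sketched like this. Everything model-related is a stand-in: `ask_llm` is a hypothetical, deliberately noisy function in place of the real GPT-3.5-turbo call; the verification step keeps a sampled monologue only when replaying its reasoning reproduces the ground truth.

```python
# Rejection sampling for monologue annotation (assumed design).
import random

def ask_llm(prompt, seed):
    # Stand-in for an LLM call: most "samples" are wrong on purpose.
    random.seed(seed)
    return "4" if random.random() < 0.3 else "wrong"

def annotate_with_rejection(prompt, expected, max_tries=20):
    for seed in range(max_tries):
        monologue = ask_llm(prompt, seed)
        if monologue == expected:  # verification: replayed answer agrees
            return monologue
    return None  # give up: no sample survived verification

print(annotate_with_rejection("trace f(2) = ?", "4"))  # → 4
```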
SemCoder: A Code Execution-Aware Model
- Filter out non-executable code
- Syntax-aware data augmentation
- Align code execution with source code
- Feedback helps to debug
CYCLE: Train a Code LM to Self-Refine (Ding et al., 2024. CYCLE: Learning to Self-Refine Code Generation. OOPSLA'24.)
- Step 1: Collect the model's faults
- Step 2: Learning to refine
- Step 3: Iterative self-refinement
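The three steps above can be sketched as a loop. This is illustrative only: `model_generate` is a hypothetical stand-in for the fine-tuned LM, and the real CYCLE system trains on collected faults rather than hard-coding them. The loop generates code, runs the tests, and feeds the failure message back as refinement context until the tests pass.

```python
def model_generate(problem, feedback=None):
    # Stand-in model: the first attempt is buggy; with feedback it is fixed.
    if feedback is None:
        return "def add(a, b):\n    return a - b"  # faulty first draft
    return "def add(a, b):\n    return a + b"      # refined draft

def run_tests(code):
    ns = {}
    exec(code, ns)
    got = ns["add"](2, 3)
    if got != 5:
        return "add(2, 3) returned {}, expected 5".format(got)
    return None  # all tests pass

def self_refine(problem, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        code = model_generate(problem, feedback)
        feedback = run_tests(code)      # step 1: collect the fault
        if feedback is None:            # step 3: stop once tests pass
            return code
    return code

final = self_refine("write add(a, b)")
print(run_tests(final) is None)  # → True
```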
CYCLE: Still Generates Well & Refines Better (Ding et al., 2024. CYCLE: Learning to Self-Refine Code Generation. OOPSLA'24.)
- Past-generation mask: mitigates exact copying of the faulty attempt
- Mixture of data (self-refine samples + NL-to-code samples): learning to both generate and refine
Overall Results of SemCoder
The SemCoder 6.7B model beats some larger open-source models by a margin.
Feedback & refinement show great potential, especially for agent-centric systems. (Ding et al., 2024. CYCLE: Learning to Self-Refine Code Generation. OOPSLA'24.)
Neuro-symbolic reasoning (better data + execution & abstract reasoning) shows great potential for building smarter yet smaller models.
It can be used in more involved SE tasks: (i) input/output prediction, (ii) program repair, (iii) vulnerability prediction, (iv) branch prediction, (v) semantic clone detection.
Semantic-Aware Code Models: The Future!
Background | Problem | Challenge | Solution | Future
- Formal analysis: guarantees for the analysis, but noise-intolerant
- Probabilistic models: scalable and transferable, but no theoretical guarantees
- Code generation going forward: improve trust, developer-feedback-oriented automation, explainability
Acknowledgements: Ira Ceka, Robin Ding, Marcus Min, Alex Mathai, Vikram Nitin, Jinjun Peng, and others.