Future Dreaming 2024 | Artificial intelligence in career guidance: "How is AI being used in career guidance? What should we be careful of?"

OECDEDU | 29 slides | Jun 27, 2024

About This Presentation

This presentation, from OECD Future Dreaming 2024: Career Guidance in the Age of Digital Technologies, looks at artificial intelligence in career guidance: how is AI being used in career guidance, and what should we be careful of? Presented by Chris Percy.



Slide Content

Chris Percy at the OECD Future Dreaming Conference How is AI being used in career guidance? What should we be careful of? 22 May 2024 [email protected] @chris_percy https://www.linkedin.com/in/chris-percy-strategy-advisor/

Speaker intro: Chris Percy PhD

BACKGROUND: Civil servant (secondary education reform); operational experience on secondment to a charity; business experience via strategy consulting; data science in diverse settings.

POLICY: UK policy & expert witness (e.g. Industrial Strategy; Career Strategy; Social Mobility Commission); government training (incl. DfE seminars, Australia, British Council); international consultancy (e.g. World Bank, OECD, ILO).

PRACTICE: Volunteer talks / career management skills; executive coaching; machine learning in public health; embedding wellbeing in guidance; careers chatbot co-founder.

RESEARCH: Career surveys and big data (what improves labour market outcomes); machine learning models & explainable AI; the AI accountability ecosystem.

[email protected] @chris_percy

What is at stake (or Why trust matters)

The floodgates of poor practice are already opening

Practitioner survey (Webb, 2023): 98% want to learn more about, or make more use of, AI tools; there is a strong desire to hear from other practitioners about how they are actually using them.

2024 survey (UK HEI): How familiar are you with LLM tools?

Answer options | Advisers | Students
I have not heard of them | 3% | 17%
I have heard of them, but not used them | 12% | 25%
I have used them a little bit, but don't really understand them | 5% | 13%
I understand how to use them, but don't use them regularly | 43% | 28%
I regularly use them, but not on careers-related topics | 10% | 5%
I regularly use them on careers-related topics, but do not rely on them | 25% | 9%
I rely on them for careers-related topics | 0% | 1%
Other (please specify) | 2% | 3%
Sample size (small UK HEI sample; indicative only) | 40 | 101

With support and funding from Jisc and Arden University.

First of all: What is this tech?

Advice? “I’ve decided I want to change careers and become an accountant. How should I go about this?” Guidance? “I’m feeling under pressure to make a decision about what jobs to apply for, as I come up to finishing my degree in French and German from Leeds University.”

LLMs: a potted recent history (sources: LinkedIn; OpenAI white paper, 2023)

How could generative AI support careers/employability? (the promise…)

CMS sandpit (creativity/brainstorming):
- Provide it a CV (or parts of a CV, or a letter) and ask for ideas on how to improve the language generally
- Provide a CV and a job advert and ask for a *first* draft of a cover letter (the "blank page problem")
- Ask for keywords to use on a CV for a particular career / job advert
- Tell it your activities and ask what transferable skills relate to them
- Help building a personal website or managing a LinkedIn profile; simplify a CV into a 100-word bio

Interview prep:
- Ask it to provide example questions/answers for a standard interview for a given job advert
- Ask it to score and suggest improvements for your answers to the standard questions
- Help researching a company/sector/key trends

Career exploration:
- Provide high-level information and generic advice on what different careers are like, to help someone think about options (the initial stage of career decision making)
- Find adjacent roles/sectors or alternative job titles for something you're interested in
- Effectively a navigation tool over a large corpus of internet text

The Full Monty:
- Simply talk with it as you would a person, perhaps with a few sentences of preamble/caveats to help users understand the tool
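The "blank page problem" use above can be sketched in code. A minimal illustration using only the standard library: the template wording and the `cover_letter_prompt` helper are invented for this example, and the resulting string could be passed to any LLM chat API.

```python
# Sketch: wrap a CV and a job advert into one prompt asking for a
# first-draft cover letter. All wording here is illustrative.

def cover_letter_prompt(cv_text: str, job_advert: str) -> str:
    """Build a prompt asking for a *first draft* the user will then edit."""
    return (
        "You are helping a job seeker past the 'blank page problem'.\n"
        "Write a first-draft cover letter for them to edit and own.\n"
        "Do not invent qualifications that are not in the CV.\n\n"
        f"CV:\n{cv_text}\n\n"
        f"Job advert:\n{job_advert}\n"
    )

prompt = cover_letter_prompt(
    "Five years of retail management; team leadership; stock systems.",
    "Store manager wanted: full time, customer-focused, Leeds city centre.",
)
print(prompt)
```

The explicit "do not invent qualifications" line reflects the hallucination concerns discussed later in the deck.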

A few other AI applications in employability (not just generative AI…)

AI-enriched tools for applications:
- Automated CV feedback as part of tools specialised for CV support
- Send or record yourself doing interviews and get feedback
- Gamification across the pipeline
- Language translation and cross-cultural communication
- Personalised tutors / personalised & adaptive learning platforms
- ML predictive models to find your course/career interests + application success rates

Recruitment support:
- e.g. AI to screen or rank CVs, or AI tests as part of round-1 candidate screening
- Supporting candidates to thrive in such settings

Supercharged LMI:
- UK ONS project to code SICs/SOCs from free-text descriptions in surveys
- AI to analyse job adverts and company websites to better understand trends (currently relies on NLP and coded logic, and misses much unstructured data)

How to deploy an existing LLM for a particular use case, e.g. IAG (in order of increasing complexity + resource/expertise to deploy):
- Direct usage (e.g. zero-shot): just open up a public chatbot and directly ask questions as a possible client
- Prompt engineering: enter some preamble text into the public chatbot to explain what sort of interaction we want, then directly ask questions
- Few-shot learning: provide the chatbot with high-quality examples of a "good IAG interaction" as part of the prompt
- Fine-tuning: develop a large-ish, clean dataset of good materials (e.g. QA'ed IAG sites); set up a training config (e.g. a low learning rate); fine-tune the model and make it accessible
- RLHF (reinforcement learning from human feedback): collect a corpus of scored IAG chats to train a new reward model; update the raw LLM using the reward model (e.g. via PPO)
- Connect to plug-ins: the LLM uses an outside tool for flagged topics (e.g. a web query or database scan). Except for simple cases such as price look-up, plug-in quality is currently low, and it is hard to build a smooth flow between the base bot and the plug-in
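The first three rungs of this ladder (direct usage, prompt engineering, few-shot) differ only in how the prompt is assembled. A minimal sketch using the common system/user/assistant chat-message convention; the preamble text and the worked IAG example are invented for illustration.

```python
# Sketch: assemble a few-shot prompt for an IAG-style chat. A system
# preamble sets the interaction style (the "prompt engineering" rung),
# and example exchanges are prepended before the real question (the
# "few-shot" rung). Example texts are illustrative only.

SYSTEM_PREAMBLE = (
    "You are a careers guidance assistant. Ask about the client's context "
    "before advising, name your sources, and suggest speaking to a "
    "qualified adviser for significant decisions."
)

FEW_SHOT_EXAMPLES = [
    ("I want to change careers and become an accountant. How?",
     "Before suggesting routes, can I ask what you do now and what draws "
     "you to accountancy? That shapes which qualification path fits."),
]

def build_messages(client_question: str) -> list[dict]:
    """Return a chat-message list: preamble, worked examples, real question."""
    messages = [{"role": "system", "content": SYSTEM_PREAMBLE}]
    for question, answer in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": client_question})
    return messages

msgs = build_messages("I'm finishing a French and German degree. What now?")
print(len(msgs))  # system + 2 example turns + 1 real question -> 4
```

The example answer deliberately asks for context first, modelling the guidance behaviour that off-the-shelf bots lack (see the concerns slide below).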

2024 survey view: Are you using / are you aware of any students using LLM technologies for any of the following careers-related topics? [tick all that apply]

Answer options | Advisers | Students
General advice on how to approach careers topics | 26% | 30%
Personalised advice about your/their specific circumstances | 15% | 37%
Identifying what careers might suit you/them | 26% | 47%
Details about job skills requirements and how to access them | 26% | 46%
Data about typical job salaries, progression rates, forecast demand, … | 21% | 44%
Finding specific intern/job vacancies | 18% | 41%
Finding specific education/training opportunities | 10% | 39%
Help drafting CVs, cover letters, or employer introduction emails | 95% | 47%
Help preparing for interview/application processes | 67% | 46%
Providing mock interviews | 26% | 32%
Long-term personalised guidance/mentoring as they progress career planning | 5% | 33%
Other (please specify) | 15% | 4%
Sample size (small UK HEI sample; indicative only) | 39 | 99

Different NLP IAG chatbot models: empower with generative AI?

Public-facing (France), e.g. bob-emploi.fr:
- Chatbot-style interface for searching publicly available data
- Convenient integration of multiple databases in one place
- Nudge tactics to prompt users to engage and be proactive in their job search / course search

Practitioner-facing (UK), e.g. cicichat.co.uk (our one):
- Level 1: Repository of curated, QA-ed info for guidance practitioners, e.g. LMI, trends, courses, vacancies, skills etc.
- Level 2: The professional supports access for a client/class, e.g. introduces the tool, empowers independent use, and continues to help with reflection/action
- Level 3: Integrated public/professional usage, e.g. a public front-end for simple queries + localised referrals

Generative AI for guidance: Concerns

Collective concerns… even in sector-led development:
- Need for transparency that an AI is talking
- Risks of "hallucinations" and a default not to name sources: over-confidence; false facts; omitted context
- Concerns over equity, accessibility, and lack of control over stereotyping in chatbot responses
- How can the bot tell when it should encourage engagement with a professional?
- Concerns over responsible corporate use, e.g. to drive standards up rather than down

Off-the-shelf bots are fluent but have major limits:
- Long, repetitive answer format; mostly remains vague
- Does not ask for context/background
- Makes assumptions about the user
- Does not (gently) challenge the user ("what have you done so far?")
- Just does Q&A; does not drive towards an action plan
- No up-to-date knowledge of policies/LMI

Direct to client? Will users know when to trust vs check it? Will they know good questions to ask it? Less emotional support / fewer lightbulb moments? Less contact with advisers? Will it lead to lower-quality advice or false LMI? …

Misaligned actors? What to do about orgs selling a different product or pushing SEO with careers as a side-hustle and little care for IAG? Will orgs build a cheap, low-quality careers bot without sector knowledge? …

Need for digitally savvy, critical-thinking users?

How we’re approaching it with a UK-tailored chatbot

CiCi: Move carefully with the sector…

2020/21: 60 careers professionals in SuperUser groups shaped the initial design (Derby, Bristol, Newcastle); 94% of practitioners said a chatbot would be a helpful complement to existing careers provision.

2022/23: Continued field testing and R&D with careers practitioners and 5,000+ users.

Shortlisted for the Career Development Institute's National Award for Best Practice Research and Innovation in the Use of Technology, 2021. International recognition in 2023: the International Labour Organisation (ILO) Digital Inventory of Career Guidance Tools, the OECD international case study collection (ODiCY), and Europe's Cedefop publications.

Working with partners, practitioners, and volunteers to get the idea just right.

CiCi the chatbot gives you access to: 26,000+ jobs & skills information profiles; 40,000+ course information entries; 25,000+ full-time & part-time jobs in England, Scotland and Wales; and 1,500+ short inspirational career journey videos.

CV help and drip-fed hints on different topics. Referrals to advisers and local support, which can be tailored to each organisation/service/region. CiCi is able to provide a record of the user journey, giving advisers a head start with interviews.

Ideas for testing & evaluation: building trust (if time allows, or in Q&A)

How to test and build trust for an AI bot? (standalone or integrated with human support)

A possible progression: internal staff, informal testing → internal staff, formal testing → field trial with users/volunteers → if a formal benchmark is exceeded, a sector kite-mark.

There is a need for a blended approach: well-funded, large-scale sector-level research, plus individual orgs/CAs checking it for themselves. Test solo bots as well as bots embedded with CAs.

Using internal staff to evaluate a bot for a given use case

Most informal approach: e.g. have a few staff opt in to play around with the tool and feed back on whether to recommend it for particular uses.

Adding more formal structure: e.g. a management/staff panel decides various assessment protocol features in advance:
- Which staff? e.g. specify a set of staff with diverse backgrounds and representative knowledge of the org/clients
- How thoroughly to evaluate it? e.g. require a minimum amount of time, or a number of specific scenarios to try with the bot
- How to evaluate? e.g. score bot answers against predefined criteria such as accuracy, empathy, completeness, supportiveness
- Who else to engage? e.g. for individual bot testing or a subsequent group assessment discussion: sector experts/researchers, independent careers advisers (i.e. with no financial ties to the organisation), usability or UI experts, current/prospective/former users (without trying to replicate the rigour of a field trial; next slide)
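Scoring bot answers against predefined criteria, as suggested above, can be aggregated very simply. A sketch assuming a 1-5 scale and invented scores; the criteria names are taken from the slide.

```python
# Sketch: average structured staff scores per criterion so the panel
# can compare bot performance across dimensions. The 1-5 scale and
# the scores themselves are illustrative assumptions.
from statistics import mean

CRITERIA = ("accuracy", "empathy", "completeness", "supportiveness")

# One dict of criterion -> score per staff member per scenario.
scores = [
    {"accuracy": 4, "empathy": 2, "completeness": 3, "supportiveness": 3},
    {"accuracy": 5, "empathy": 3, "completeness": 4, "supportiveness": 4},
    {"accuracy": 3, "empathy": 2, "completeness": 3, "supportiveness": 2},
]

summary = {c: round(mean(s[c] for s in scores), 2) for c in CRITERIA}
for criterion, avg in summary.items():
    print(f"{criterion}: {avg}")
```

Even a toy summary like this makes the typical off-the-shelf pattern visible: adequate accuracy scores alongside weaker empathy scores.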

A field trial with users/clients: different factors to decide on
- What services are being compared, e.g. human adviser webchat vs AI bot (could be double-blind); adviser vs adviser + bot (probably requires multi-session clients)
- Outcome, e.g. a standardised career decision-making survey instrument; satisfaction measures; EET status or career progress
- Scope of users, e.g. age range, which circumstances are in/out of scope, how users are recruited (most likely opt-in)
- Number of users in the trial: an initial pilot to get data to drive a power calculation for the sample size of a full trial?
- Other data to collect, e.g. if working with volunteers, decide how possible selection bias might be addressed, e.g. what data exists or could be collected to assess population validity / analyse results
- Blinded design, e.g. volunteer users agree to join the trial but do not know whether they will be assigned to an AI or a human adviser (they need to agree to engage seriously rather than treating it as a Turing test or game; some user data is likely to be excluded on this account); send blinded data to a statistician to analyse
- Transparency/quality of research methods, e.g. pre-registration, peer review of results, academic publication

In practice, budget, operational constraints, and management concerns also shape these factors.
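The power-calculation step above can be illustrated with the standard normal-approximation formula for a two-arm trial with a continuous outcome (e.g. a career decision-making scale). The effect size of 0.5 is an assumption a pilot would refine; the z-values for a 5% two-sided alpha and 80% power are hard-coded for simplicity.

```python
# Sketch: per-group sample size for a two-arm comparison using the
# normal-approximation formula n = 2 * (z_alpha + z_beta)^2 / d^2,
# where d is the standardised effect size (Cohen's d).
import math

def per_group_n(effect_size: float, z_alpha: float = 1.96,
                z_beta: float = 0.84) -> int:
    """n per arm at 5% two-sided alpha, 80% power (z-values hard-coded)."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

print(per_group_n(0.5))  # medium standardised effect -> 63 per arm
```

Smaller, more realistic effects inflate the requirement quickly (a quarter of the effect size needs sixteen times the sample), which is why the slide suggests a pilot before committing to a full trial.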

Giving the bot an exam? Testing performance against a threshold…

What is in the exam?
- A one-shot question: a client background + question
- Q&A or essays modelled on current adviser qualification exams
- Real-life scenarios (e.g. an iterative discussion over a 15-60 minute session)
- A sector kite-mark = some combination of the other methods, conducted at scale and transparently

How to mark?
- Questions with clear right/wrong answers
- A marking rubric similar to essay questions, assessed by human examiner judgement
- A panel of careers advisers who review the scripts (this could be a double-blind test of advisers vs bots, so that the bot is held to a "current practice" standard rather than an idealised standard)

Potential issues?
- Questions/variants need to be "unseen" by the bot, but answers might leak into the training sets for future LLMs or future upgrades to the base model
- It is hard to design questions that capture the range of circumstances, because LLMs do not generalise learning or have baseline social knowledge like advisers
- In practice, advisers hone their skills iteratively and intuitively, learning from more experienced advisers; this is not a fully codified process, so it is hard to assess in an exam
- Real-life scenarios are hard to script in full, since the conversation tree can become large, and designs to "reroute" divergent answers back to a common core script may be artificial and unrealistic. It may be possible to train a bot to act like a user to solve this problem (at the risk of introducing other issues)

While these issues might prevent perfect exam design, a "good enough" design may still be possible.
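The double-blind marking idea above can be sketched as a simple blinding step: scripts from advisers and bots are shuffled under anonymous IDs, with the answer key held back from the marking panel. All labels and script texts here are placeholders.

```python
# Sketch: blind a mixed set of adviser- and bot-written scripts so the
# marking panel cannot tell which is which. The fixed seed makes the
# demo reproducible; a real study would use a sealed random assignment.
import random

scripts = [("adviser", "script A..."), ("bot", "script B..."),
           ("adviser", "script C..."), ("bot", "script D...")]

rng = random.Random(42)
order = list(range(len(scripts)))
rng.shuffle(order)

blinded = {f"S{i + 1}": scripts[j][1] for i, j in enumerate(order)}
key = {f"S{i + 1}": scripts[j][0] for i, j in enumerate(order)}  # held back

print(sorted(blinded))  # markers see only anonymous IDs: S1..S4
```

Markers score the `blinded` scripts against the rubric; only after scoring is the `key` used to compare bot performance against current adviser practice.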


2024 survey view: What sort of evaluation would you want to see to build confidence in LLM technology for careers advice? [tick all that apply]

Answer options | Advisers | Students
Early-version user feedback that has been acted on | 76% | 45%
Endorsement by a panel of professional careers advisers trialling the chatbot | 92% | 48%
Comparison trial in which users rate the bot similarly to webchat careers advice from a professional | 81% | 43%
Comparison trial of user career outcomes, e.g. career confidence, job application success, career satisfaction | 70% | 45%
Sample size (small UK HEI sample; indicative only) | 37 | 82