The Real Python Podcast
Measuring Bias, Toxicity, and Truthfulness in LLMs With Python
How can you measure the quality of a large language ...
Jan 19 2024 1h 15m
Chapter 1 – Introduction (2 mins)
Chapter 2 – Testing characteristics of LLMs with Python (1 min)
Chapter 3 – Background on LLMs (4 mins)
Chapter 4 – Training of models (5 mins)
Chapter 5 – Uncurated sources of training (1 min)
Chapter 6 – Safeguards and prompt engineering (5 mins)
Chapter 7 – TruthfulQA and creating a more strict prompt (2 mins)
Chapter 8 – Information that is out of date (2 mins)
Chapter 9 – WinoBias for evaluating gender stereotypes (2 mins)
Chapter 10 – BOLD dataset for evaluating bias (1 min)
Chapter 11 – Sponsor: Intel (49 sec)
Chapter 12 – Using Hugging Face to start testing with Python (4 mins)
Chapter 13 – Using the transformers package (2 mins)
Chapter 14 – Using langchain for proprietary models (5 mins)
Chapter 15 – Putting the tools together and evaluating (4 mins)
Chapter 16 – Video Course Spotlight (1 min)
Chapter 17 – Assessing toxicity (1 min)
Chapter 18 – Measuring bias (4 mins)
Chapter 19 – Checking the hallucination rate (1 min)
Chapter 20 – LLM leaderboards (1 min)
Chapter 21 – What helped ChatGPT leap forward? (7 mins)
Chapter 22 – Improvements of what is being crawled (1 min)
Chapter 23 – Revisiting agents and RAG (3 mins)
Chapter 24 – ChatGPT plugins and Wolfram-Alpha (2 mins)
Chapter 25 – How can people follow your work online? (1 min)
Chapter 26 – Thanks and goodbye (1 min)