9:00am | Safer, more reliable, more diverse LLMs | Timothy Baldwin (MBZUAI)
The recent surge in generative large language models (LLMs) has created even greater challenges for NLP evaluation. In this talk, I will present a range of LLM evaluation initiatives covering issues including: the multilingual and multicultural capabilities of LLMs; the ability of models to capture different aspects of negation; uncertainty quantification; and model safety.
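As a pointer to what uncertainty quantification can look like in practice, the sketch below estimates a model's uncertainty on a question from the entropy of repeatedly sampled answers. This is a generic sample-based technique, not necessarily the approach covered in the talk, and `sample_answer` is a hypothetical stand-in for an LLM call.

```python
# Sample-based uncertainty estimate: a generic sketch in which
# `sample_answer` is a hypothetical stand-in for sampling an LLM.
import math
import random
from collections import Counter

def sample_answer(question):
    """Placeholder: one sampled answer from a model at temperature > 0."""
    return random.choice(["Paris", "Paris", "Paris", "Lyon"])

def answer_entropy(question, n_samples=50):
    """Entropy of the empirical answer distribution; higher = less certain."""
    counts = Counter(sample_answer(question) for _ in range(n_samples))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(f"uncertainty: {answer_entropy('Capital of France?'):.2f} bits")
```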
9:30am | A Musical View on Multimodal Large Language Models: Hierarchical Modeling, Control, and AI Partnerships | Gus Xia (MBZUAI)
In this presentation, Gus will delve into three pioneering studies focused on developing multimodal language models for music AI. Firstly, he will discuss whole-song generation through cascaded diffusion modeling, which integrates hierarchical music language structures into diffusion models for enhanced sample-efficient learning. Secondly, he will introduce "coco-mulla," the first "ControlNet"-style model that applies content-based controls to large-scale audio generation tasks. Lastly, Gus will present Flute X GPT, which introduces the LAUI (LLM-Agent User Interface) as a potential next-generation Human-Computer Interaction (HCI) paradigm after the Graphical User Interface (GUI). This system capitalizes on a nuanced understanding of both users and tutorial software to create customized music-learning experiences.
10:00am | Multimodal Generative AI and applications to the biomedical domain | Michalis Vazirgiannis (Ecole Polytechnique/MBZUAI)
Graph generative models have recently gained significant interest across application domains. They are commonly used to model social networks, knowledge graphs, molecules, and proteins. In this talk, we will present the potential of graph generative models and recent relevant efforts in the biomedical domain. More specifically, we present a novel architecture that generates medical records as graphs with privacy guarantees, building on a modified graph variational autoencoder (VAE) architecture. Finally, we present ongoing work and research directions for multimodal generative models involving graphs, and applications to molecule generation with LLMs and graphs.
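For readers unfamiliar with graph VAEs, the sketch below shows the general shape of a variational graph autoencoder with an inner-product decoder, in the spirit of the standard VGAE formulation. It is a minimal illustration with hypothetical names, not the privacy-preserving medical-record architecture presented in the talk.

```python
# Minimal variational graph autoencoder (VGAE-style) sketch.
# Illustrative only -- not the privacy-preserving architecture from the talk.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphVAE(nn.Module):
    def __init__(self, n_features, hidden_dim, latent_dim):
        super().__init__()
        self.enc = nn.Linear(n_features, hidden_dim)     # shared GCN-like layer
        self.mu = nn.Linear(hidden_dim, latent_dim)      # mean head
        self.logvar = nn.Linear(hidden_dim, latent_dim)  # log-variance head

    def encode(self, adj, x):
        # One graph-convolution-like step: propagate features over the adjacency.
        h = F.relu(self.enc(adj @ x))
        return self.mu(adj @ h), self.logvar(adj @ h)

    def decode(self, z):
        # Inner-product decoder: edge probability from latent similarity.
        return torch.sigmoid(z @ z.t())

    def forward(self, adj, x):
        mu, logvar = self.encode(adj, x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.decode(z), mu, logvar

def vae_loss(adj_hat, adj, mu, logvar):
    # Reconstruction + KL divergence: the standard VAE objective.
    recon = F.binary_cross_entropy(adj_hat, adj)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

model = GraphVAE(n_features=8, hidden_dim=16, latent_dim=4)
adj = torch.eye(5)        # toy 5-node graph (self-loops only)
x = torch.randn(5, 8)     # toy node features
adj_hat, mu, logvar = model(adj, x)
print(vae_loss(adj_hat, adj, mu, logvar))
```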
10:30am | Coffee Break
11:00am | The Crescendo Effect: Understanding Multi-Turn Jailbreaks in Large Language Models | Ahmed Salem (Microsoft)
Large Language Models (LLMs) are powerful tools capable of generating content that might violate the principles of Responsible AI (RAI). These models are engineered to steer clear of illegal or unethical topics through a process known as alignment, which typically focuses on preventing single-turn jailbreaks: deliberate attacks that attempt to circumvent the safety alignment within a single prompt. In this talk, I will introduce a new category of jailbreaks executed over multiple turns, and showcase a specific multi-turn jailbreak called Crescendo. Crescendo interacts with the model in a seemingly benign manner and gradually leads it to a successful jailbreak. The talk will also cover Crescendomation, an automated tool designed to execute the Crescendo attack. Finally, I will discuss the complexities involved in evaluating jailbreaks and share the results of Crescendomation's assessment and the lessons learned.
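To make the multi-turn setting concrete, here is a minimal sketch of a generic multi-turn probing loop that carries conversation history across turns. The `query_model` client, the refusal heuristic, and the prompt sequence are hypothetical placeholders; this is not Crescendo or Crescendomation itself.

```python
# Generic multi-turn red-teaming loop (illustrative sketch only).
# `query_model` is a hypothetical stand-in for any chat-completion API;
# this is NOT the Crescendomation tool described in the talk.

def query_model(messages):
    """Hypothetical stand-in for a chat-completion call; returns reply text."""
    return "I'm sorry, I can't help with that."  # canned reply for the sketch

def is_refusal(reply):
    """Crude refusal check; real evaluations use far stronger judges."""
    return any(p in reply.lower() for p in ("i can't", "i cannot", "i'm sorry"))

def multi_turn_probe(turns):
    """Send escalating prompts, carrying the full history across turns."""
    history = []
    for prompt in turns:
        history.append({"role": "user", "content": prompt})
        reply = query_model(history)
        history.append({"role": "assistant", "content": reply})
        if is_refusal(reply):
            return history, False  # the model pushed back at this turn
    return history, True  # every turn was answered without refusal

history, completed = multi_turn_probe(["benign opener", "slightly bolder ask"])
print("all turns answered:", completed)
```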
11:30am | Evaluating Linguistic Diversity of Large Language Models | Guokan Shang (MBZUAI France Lab)
Recently, Large Language Models (LLMs) have gained widespread recognition and usage. However, their evaluation predominantly concentrates on task-solving performance. Diverging from this usual emphasis, we focus on linguistic perspectives, specifically linguistic diversity, a fundamentally important but significantly overlooked aspect of language generation. The talk begins with a taxonomy of linguistic diversity evaluation metrics, followed by a presentation of our recent study: "The Curious Decline of Linguistic Diversity: Training Language Models on Synthetic Text". Our research found that training LLMs on predecessor-generated text (i.e., synthetic data produced by previous models) causes a consistent decrease in the lexical, syntactic, and semantic diversity of the model outputs through successive iterations. This decline is particularly notable for tasks demanding high levels of creativity. Our study highlights the need for careful consideration of the long-term effects of such training approaches, particularly concerning the preservation of human linguistic richness in LLM outputs.
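As one concrete instance of the lexical branch of such a taxonomy, the sketch below computes type-token ratio and distinct-n over a set of generations. Both are standard lexical diversity measures; this is an illustrative sketch rather than the study's actual metric suite.

```python
# Two standard lexical diversity measures (illustrative sketch only,
# not the exact metric suite from the study discussed in the talk).

def type_token_ratio(tokens):
    """Unique tokens divided by total tokens: higher = more varied vocabulary."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def distinct_n(tokens, n=2):
    """Fraction of n-grams that are unique (the distinct-n metric)."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

generations = [
    "the cat sat on the mat",
    "the dog sat on the mat",
]
tokens = " ".join(generations).split()
print(f"TTR: {type_token_ratio(tokens):.3f}, distinct-2: {distinct_n(tokens):.3f}")
```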
12:00pm | Lunch
2:00pm | Jais and Jais-chat: Building the World's Best Open Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models | Preslav Nakov (MBZUAI)
I will discuss Jais and Jais-chat, two state-of-the-art Arabic-centric foundation and instruction-tuned open generative large language models (LLMs). The models are based on the GPT-3 decoder-only architecture and are pretrained on a mixture of Arabic and English texts, including source code in various programming languages. The models demonstrate better knowledge and reasoning capabilities in Arabic than previous open Arabic and multilingual models by a sizable margin, based on extensive evaluation. Moreover, they are competitive in English compared to English-centric open models of similar size, despite being trained on much less English data. I will discuss the training, the tuning, the safety alignment, and the evaluation, as well as the lessons we learned. |
2:30pm | Safety and Robustness of Large Models | Karthik Nandakumar (MBZUAI)
While large machine learning models are valuable tools for solving tough problems in many domains, including text, speech, and vision, state-of-the-art large models are vulnerable to numerous security and privacy threats. In the first part of the talk, we will briefly review these threats and identify key unsolved challenges, focusing in particular on adversarial attacks and defense mechanisms. For large generative models, alignment with human values is even more critical to mitigate the risk of unintended consequences. At the same time, care must be taken to ensure that prevalent human biases do not creep into large models. These challenges can be addressed by following well-known cybersecurity practices such as red and blue teaming, with the red team attempting to expose the vulnerabilities of the large models and the blue team devoted to plugging these loopholes. Finally, we focus on the challenge of preserving the privacy of data used in machine learning. We consider the scenario where data generated by multiple organizations or individuals needs to be leveraged to develop large models. While federated learning is typically used in such scenarios, this framework is highly inefficient for large models, necessitating more efficient learning algorithms and collaboration protocols.
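For background on the federated setting mentioned above, here is a minimal sketch of federated averaging (FedAvg), the standard algorithm in which clients train locally and a server averages their weights. The toy least-squares task and all names are illustrative; this is the baseline algorithm, not the more efficient protocols the talk argues for.

```python
# Core of federated averaging (FedAvg): a generic sketch of the standard
# algorithm, not the more efficient protocols discussed in the talk.
import numpy as np

def local_update(weights, data, lr=0.1, steps=5):
    """Hypothetical local training: a few gradient steps on client data."""
    x, y = data
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * x.T @ (x @ w - y) / len(y)  # least-squares gradient
        w -= lr * grad
    return w

def fedavg_round(global_weights, client_datasets):
    """One round: each client trains locally; the server averages weights."""
    client_weights = [local_update(global_weights, d) for d in client_datasets]
    return np.mean(client_weights, axis=0)

# Toy run: three clients, each privately holding samples of y = 2x.
rng = np.random.default_rng(0)
clients = []
for _ in range(3):
    x = rng.normal(size=(20, 1))
    clients.append((x, 2 * x[:, 0]))
w = np.zeros(1)
for _ in range(10):
    w = fedavg_round(w, clients)
print("learned weight:", w)  # approaches 2.0
```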
3:00pm | Understanding Language Models Through the Lens of their Training Data | Nikhil Kandpal (University of Toronto)
The behaviors of language models are non-trivially influenced by their training data, presenting a significant challenge in understanding how properties of training datasets drive model responses. This talk explores methodologies to quantify these influences despite complexities introduced by the black-box nature of language models and large-scale training datasets. I will discuss key findings from recent studies that link high-level model behaviors, such as memorization and fact acquisition, to global characteristics of their training data. Additionally, I will discuss recent advances in techniques like semi-parametric language modeling and retrieval-augmented generation, which offer new ways to trace model outputs back to individual training instances. |
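To illustrate the retrieval-augmented generation idea in the last sentence, the sketch below retrieves the training instance most similar to a query and prepends it to the prompt, making the output traceable to that instance. The `embed` and retrieval functions are toy placeholders under stated assumptions, not a specific system from the talk.

```python
# Minimal retrieval-augmented generation sketch (illustrative only).
# `embed` is a toy placeholder, not a real encoder or API.
import numpy as np

def embed(text):
    """Toy embedding: character-frequency vector (stand-in for a real encoder)."""
    v = np.zeros(128)
    for ch in text.lower():
        v[ord(ch) % 128] += 1
    return v / (np.linalg.norm(v) + 1e-9)

def retrieve(query, corpus, k=1):
    """Return the k corpus entries most similar to the query (cosine)."""
    q = embed(query)
    scores = [(float(embed(doc) @ q), doc) for doc in corpus]
    return [doc for _, doc in sorted(scores, reverse=True)[:k]]

def rag_prompt(query, corpus):
    """Prepend retrieved evidence so outputs are traceable to instances."""
    evidence = retrieve(query, corpus)
    return "Context: " + " ".join(evidence) + "\nQuestion: " + query

corpus = ["Paris is the capital of France.", "Tokyo is the capital of Japan."]
print(rag_prompt("What is the capital of France?", corpus))
```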
3:30pm | Coffee Break
4:00pm-5:00pm | Panel Discussion