Advancing Medium-/Low-resource Language Generation with LLMs: Benchmarking, Personalization, and Model Optimization
By Lemei Zhang

Abstract

Recent advancements in Generative Language Models (GLMs) have significantly improved performance on NLP tasks. However, most state-of-the-art (SOTA) models remain partially closed-source due to business competition and data privacy concerns, and existing benchmarks focus mainly on high-resource languages such as English and Chinese, limiting the development of language models for medium- and low-resource languages. In this presentation, we explore the development of LLMs for low-resource languages, using Norwegian as a case study.

First, we introduce a benchmark dataset for Norwegian generative language models and present insights from a comprehensive empirical analysis based on it. Next, we examine the impact of incorporating copyrighted materials on model performance across various language tasks, emphasizing the role of diverse data sources. Finally, we investigate the capacity of LLMs to generate personalized content and explore the factors affecting personalization from a user-behavior perspective. Preliminary results show that LLMs demonstrate promising abilities in personalized summarization, which can be further enhanced by explicitly incorporating personalization factors through in-context learning. Our findings aim to lay a foundation for more adaptive, user-specific language models, driving advances in medium-/low-resource languages and expanding applications in personalized content delivery.
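As a rough illustration of the in-context personalization setup mentioned above, the Python sketch below assembles a personalized summarization prompt from user-behavior signals such as preferred topics and reading history. The prompt format, profile fields, and example data are assumptions for illustration only, not the actual implementation used in the presented work.

    # Minimal sketch: personalized summarization via in-context learning.
    # All field names and example data are hypothetical.

    def build_personalized_prompt(article, user_profile, examples):
        """Assemble a prompt that exposes personalization factors to the LLM."""
        # Personalization factors derived from user behavior (assumed schema).
        factors = (
            f"Preferred topics: {', '.join(user_profile['topics'])}\n"
            f"Preferred summary length: {user_profile['length']} sentences\n"
            f"Recently read: {'; '.join(user_profile['history'])}"
        )
        # In-context exemplars: (article, summary) pairs for the same user.
        demos = "\n\n".join(
            f"Article: {a}\nPersonalized summary: {s}" for a, s in examples
        )
        return (
            "You are a summarization assistant. Tailor the summary to the user.\n\n"
            f"User profile:\n{factors}\n\n"
            f"{demos}\n\n"
            f"Article: {article}\nPersonalized summary:"
        )

    if __name__ == "__main__":
        profile = {"topics": ["climate", "energy policy"], "length": 3,
                   "history": ["Wind power expansion in Norway"]}
        demos = [("(earlier article text)", "(summary the user liked)")]
        print(build_personalized_prompt("(new article text)", profile, demos))

The resulting prompt string would then be passed to the LLM of choice; the key idea is simply that explicit personalization factors and user-specific exemplars are placed in the context window rather than fine-tuned into the model.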

Biography

Lemei Zhang is a postdoctoral researcher at the Norwegian Research Center for AI Innovation (NorwAI) at NTNU, Norway. Her research topics include natural language processing, recommender systems, and user modeling. Currently, she focuses on the development of foundational language models and their applications to low-resource languages. Her research has appeared in NeurIPS, EMNLP, TACL, TOIS, UMUAI, and ECML-PKDD, among other venues, and she actively serves as a PC member or reviewer for conferences and journals such as AAAI, ACM Computing Surveys, UMUAI, and TWEB.