Pessimistic Off-Policy Optimization for Learning to Rank

dc.contributor.author: Čief, Matej
dc.contributor.author: Kompan, Michal
dc.date.accessioned: 2025-04-04T11:56:32Z
dc.date.available: 2025-04-04T11:56:32Z
dc.date.issued: 2024-10-21
dc.description.abstract: Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy. In recommender systems, this is especially challenging due to the imbalance in logged data: some items are recommended, and thus logged, more frequently than others. The problem is further exacerbated when recommending a list of items, because the action space is combinatorial. To address this challenge, we study pessimistic off-policy optimization for learning to rank. The key idea is to compute lower confidence bounds on the parameters of click models and then return the list with the highest pessimistic estimate of its value. We analyze this approach and show that it is computationally efficient. We study its Bayesian and frequentist variants and overcome the limitation of an unknown prior by incorporating empirical Bayes. To demonstrate the empirical effectiveness of our approach, we compare it to off-policy optimizers that use inverse propensity scores or neglect uncertainty. Our approach outperforms all baselines and is both robust and general.
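The pessimistic idea described in the abstract can be sketched in a few lines. This is a hypothetical illustration, not the paper's algorithm: `lcb` here uses a simple Hoeffding-style frequentist bound on per-item attraction probabilities (the paper also studies Bayesian bounds with empirical-Bayes priors), and `pessimistic_rank` assumes a position-based click model in which the pessimistic list value is maximized by sorting items by their lower confidence bounds. All function names and constants are illustrative assumptions.

```python
import math

def lcb(clicks, views, delta=0.05):
    """Hoeffding-style lower confidence bound on an item's attraction
    probability, estimated from logged clicks and views.
    Illustrative helper -- not the bound used in the paper."""
    if views == 0:
        return 0.0  # no data: maximally pessimistic
    mean = clicks / views
    radius = math.sqrt(math.log(1.0 / delta) / (2.0 * views))
    return max(0.0, mean - radius)

def pessimistic_rank(stats, k, delta=0.05):
    """Return the top-k list under the pessimistic estimates.
    Under a position-based click model with item-level attractions,
    the list with the highest pessimistic value is obtained by
    sorting items by their lower confidence bounds."""
    scored = [(item, lcb(c, v, delta)) for item, (c, v) in stats.items()]
    scored.sort(key=lambda pair: -pair[1])
    return [item for item, _ in scored[:k]]

# Logged (clicks, views) per item: frequently logged items ("a") get
# tight bounds; rarely logged items ("c") are penalized by uncertainty,
# even when their empirical click rate is high.
stats = {"a": (80, 100), "b": (9, 10), "c": (1, 1)}
print(pessimistic_rank(stats, 2))  # → ['a', 'b']
```

Note how item "c", despite a perfect empirical click rate, is excluded: with a single logged view its confidence interval is too wide, which is exactly the imbalance in logged data that pessimism guards against.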
dc.format: text
dc.format.extent: 1896-1903
dc.format.mimetype: application/pdf
dc.identifier.citation: 27TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE. 2024, p. 1896-1903.
dc.identifier.doi: 10.3233/FAIA240703
dc.identifier.isbn: 978-1-64368-548-9
dc.identifier.orcid: 0000-0002-4649-5120
dc.identifier.other: 189891
dc.identifier.researcherid: E-8197-2012
dc.identifier.uri: https://hdl.handle.net/11012/250750
dc.language.iso: en
dc.publisher: IOS Press
dc.relation.ispartof: 27TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE
dc.relation.uri: https://ebooks.iospress.nl/volumearticle/69798
dc.rights: Creative Commons Attribution-NonCommercial 4.0 International
dc.rights.access: openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject: Action spaces
dc.subject: Bayesian
dc.subject: Computationally efficient
dc.subject: Empirical Bayes
dc.subject: Frequentist
dc.subject: Lower confidence bound
dc.subject: Optimizers
dc.subject: Optimizing policies
dc.subject: Policy learning
dc.subject: Policy optimization
dc.title: Pessimistic Off-Policy Optimization for Learning to Rank
dc.type.driver: conferenceObject
dc.type.status: Peer-reviewed
dc.type.version: publishedVersion
sync.item.dbid: VAV-189891
sync.item.dbtype: VAV
sync.item.insts: 2025.04.04 13:56:32
sync.item.modts: 2025.04.04 08:32:02
thesis.grantor: Vysoké učení technické v Brně. Fakulta informačních technologií. Ústav počítačové grafiky a multimédií
Files
Original bundle
Name: FAIA392FAIA240703.pdf
Size: 591.37 KB
Format: Adobe Portable Document Format
Description: file FAIA392FAIA240703.pdf