Pessimistic Off-Policy Optimization for Learning to Rank

dc.contributor.author: Čief, Matej [cs]
dc.contributor.author: Kompan, Michal [cs]
dc.date.issued: 2024-10-21 [cs]
dc.description.abstract: Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy. In recommender systems, this is especially challenging due to the imbalance in logged data: some items are recommended, and thus logged, more frequently than others. This is further exacerbated when recommending a list of items, as the action space is combinatorial. To address this challenge, we study pessimistic off-policy optimization for learning to rank. The key idea is to compute lower confidence bounds on the parameters of click models and then return the list with the highest pessimistic estimate of its value. This approach is computationally efficient, and we analyze it. We study its Bayesian and frequentist variants and overcome the limitation of an unknown prior by incorporating empirical Bayes. To show the empirical effectiveness of our approach, we compare it to off-policy optimizers that use inverse propensity scores or neglect uncertainty. Our approach outperforms all baselines and is both robust and general. [en]
dc.format: text [cs]
dc.format.extent: 1896-1903 [cs]
dc.format.mimetype: application/pdf [cs]
dc.identifier.citation: 27TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE. 2024, p. 1896-1903. [en]
dc.identifier.doi: 10.3233/FAIA240703 [cs]
dc.identifier.isbn: 978-1-64368-548-9 [cs]
dc.identifier.orcid: 0000-0002-4649-5120 [cs]
dc.identifier.other: 189891 [cs]
dc.identifier.researcherid: E-8197-2012 [cs]
dc.identifier.uri: http://hdl.handle.net/11012/250750
dc.language.iso: en [cs]
dc.publisher: IOS Press [cs]
dc.relation.ispartof: 27TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE [cs]
dc.relation.uri: https://ebooks.iospress.nl/volumearticle/69798 [cs]
dc.rights: Creative Commons Attribution-NonCommercial 4.0 International [cs]
dc.rights.access: openAccess [cs]
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/ [cs]
dc.subject: Action spaces [en]
dc.subject: Bayesian [en]
dc.subject: Computationally efficient [en]
dc.subject: Empirical Bayes [en]
dc.subject: Frequentist [en]
dc.subject: Lower confidence bound [en]
dc.subject: Optimizers [en]
dc.subject: Optimizing policies [en]
dc.subject: Policy learning [en]
dc.subject: Policy optimization [en]
dc.title: Pessimistic Off-Policy Optimization for Learning to Rank [en]
dc.type.driver: conferenceObject [en]
dc.type.status: Peer-reviewed [en]
dc.type.version: publishedVersion [en]
sync.item.dbid: VAV-189891 [en]
sync.item.dbtype: VAV [en]
sync.item.insts: 2025.10.14 14:13:21 [en]
sync.item.modts: 2025.10.14 09:42:53 [en]
thesis.grantor: Vysoké učení technické v Brně. Fakulta informačních technologií. Ústav počítačové grafiky a multimédií [cs]
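
A minimal sketch in Python of the selection rule described in the abstract: compute lower confidence bounds (LCBs) on click-model parameters, then return the list with the highest pessimistic estimate of its value. The position-based click model, the Hoeffding-style bound, and all names here (pessimistic_ranking, exam_prob, delta) are illustrative assumptions for this sketch, not the estimators analyzed in the paper, which also covers Bayesian and empirical-Bayes variants.

    import numpy as np

    def pessimistic_ranking(clicks, impressions, exam_prob, K, delta=0.05):
        """Top-K list with the highest pessimistic value under an assumed
        position-based click model (PBM) with known examination probabilities.

        clicks[i], impressions[i]: logged counts for item i (assumed already
        corrected for position bias in this sketch).
        exam_prob[k]: examination probability of position k, decreasing in k.
        """
        n = np.maximum(impressions, 1)   # avoid division by zero for unseen items
        mean = clicks / n                # empirical attraction probability
        # Hoeffding-style lower confidence bound; rarely shown items get a low
        # bound, so uncertainty is penalized rather than rewarded.
        lcb = np.clip(mean - np.sqrt(np.log(1.0 / delta) / (2.0 * n)), 0.0, 1.0)
        # Under the PBM, the pessimistic value of a list is
        # sum_k exam_prob[k] * lcb[item at position k]; with exam_prob
        # decreasing in k, sorting items by their LCB maximizes it.
        top_items = np.argsort(-lcb)[:K]
        value = float(np.sum(np.sort(exam_prob)[::-1][:K] * lcb[top_items]))
        return top_items, value

    # Toy usage: 5 items, recommend a list of K = 3 positions.
    clicks = np.array([40, 5, 1, 0, 30])
    impressions = np.array([200, 20, 2, 0, 300])
    exam_prob = np.array([1.0, 0.6, 0.3])
    print(pessimistic_ranking(clicks, impressions, exam_prob, K=3))

Because maximizing the pessimistic list value reduces to a sort under this model, the rule is cheap to evaluate, which matches the computational-efficiency claim in the abstract.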

Files

Original bundle

Name: FAIA392FAIA240703.pdf
Size: 591.37 KB
Format: Adobe Portable Document Format
Description: file FAIA392FAIA240703.pdf