Pessimistic Off-Policy Optimization for Learning to Rank

dc.contributor.author: Čief, Matej
dc.contributor.author: Kompan, Michal
dc.date.accessioned: 2025-04-04T11:56:32Z
dc.date.available: 2025-04-04T11:56:32Z
dc.date.issued: 2024-10-21
dc.description.abstract: Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy. In recommender systems, this is especially challenging due to the imbalance in logged data: some items are recommended, and thus logged, more frequently than others. The problem is further exacerbated when recommending a list of items, because the action space is combinatorial. To address this challenge, we study pessimistic off-policy optimization for learning to rank. The key idea is to compute lower confidence bounds on the parameters of click models and then return the list with the highest pessimistic estimate of its value. We analyze this approach and show that it is computationally efficient. We study its Bayesian and frequentist variants and overcome the limitation of an unknown prior by incorporating empirical Bayes. To demonstrate the empirical effectiveness of our approach, we compare it to off-policy optimizers that use inverse propensity scores or neglect uncertainty. Our approach outperforms all baselines and is both robust and general.
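The pessimistic idea described in the abstract can be sketched in a few lines. This is a hypothetical illustration, not the paper's algorithm: `lcb` here uses a simple Hoeffding-style frequentist bound on per-item attraction probabilities (the paper also studies Bayesian bounds with empirical-Bayes priors), and `pessimistic_rank` assumes a position-based click model in which the pessimistic list value is maximized by sorting items by their lower confidence bounds. All function names and constants are illustrative assumptions.

```python
import math

def lcb(clicks, views, delta=0.05):
    """Hoeffding-style lower confidence bound on an item's attraction
    probability, estimated from logged clicks and views.
    Illustrative helper -- not the bound used in the paper."""
    if views == 0:
        return 0.0  # no data: maximally pessimistic
    mean = clicks / views
    radius = math.sqrt(math.log(1.0 / delta) / (2.0 * views))
    return max(0.0, mean - radius)

def pessimistic_rank(stats, k, delta=0.05):
    """Return the top-k list under the pessimistic estimates.
    Under a position-based click model with item-level attractions,
    the list with the highest pessimistic value is obtained by
    sorting items by their lower confidence bounds."""
    scored = [(item, lcb(c, v, delta)) for item, (c, v) in stats.items()]
    scored.sort(key=lambda pair: -pair[1])
    return [item for item, _ in scored[:k]]

# Logged (clicks, views) per item: frequently logged items ("a") get
# tight bounds; rarely logged items ("c") are penalized by uncertainty,
# even when their empirical click rate is high.
stats = {"a": (80, 100), "b": (9, 10), "c": (1, 1)}
print(pessimistic_rank(stats, 2))  # → ['a', 'b']
```

Note how item "c", despite a perfect empirical click rate, is excluded: with a single logged view its confidence interval is too wide, which is exactly the imbalance in logged data that pessimism guards against.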
dc.format: text
dc.format.extent: 1896-1903
dc.format.mimetype: application/pdf
dc.identifier.citation: 27TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE. 2024, p. 1896-1903.
dc.identifier.doi: 10.3233/FAIA240703
dc.identifier.isbn: 978-1-64368-548-9
dc.identifier.orcid: 0000-0002-4649-5120
dc.identifier.other: 189891
dc.identifier.researcherid: E-8197-2012
dc.identifier.uri: https://hdl.handle.net/11012/250750
dc.language.iso: en
dc.publisher: IOS Press
dc.relation.ispartof: 27TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE
dc.relation.uri: https://ebooks.iospress.nl/volumearticle/69798
dc.rights: Creative Commons Attribution-NonCommercial 4.0 International
dc.rights.access: openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject: Action spaces
dc.subject: Bayesian
dc.subject: Computationally efficient
dc.subject: Empirical Bayes
dc.subject: Frequentist
dc.subject: Lower confidence bound
dc.subject: Optimizers
dc.subject: Optimizing policies
dc.subject: Policy learning
dc.subject: Policy optimization
dc.title: Pessimistic Off-Policy Optimization for Learning to Rank
dc.type.driver: conferenceObject
dc.type.status: Peer-reviewed
dc.type.version: publishedVersion
sync.item.dbid: VAV-189891
sync.item.dbtype: VAV
sync.item.insts: 2025.04.04 13:56:32
sync.item.modts: 2025.04.04 08:32:02
thesis.grantor: Vysoké učení technické v Brně. Fakulta informačních technologií. Ústav počítačové grafiky a multimédií
Files
Original bundle
Name: FAIA392FAIA240703.pdf
Size: 591.37 KB
Format: Adobe Portable Document Format
Description: file FAIA392FAIA240703.pdf