Pessimistic Off-Policy Optimization for Learning to Rank
dc.contributor.author | Čief, Matej | cs |
dc.contributor.author | Kompan, Michal | cs |
dc.date.accessioned | 2025-04-04T11:56:32Z | |
dc.date.available | 2025-04-04T11:56:32Z | |
dc.date.issued | 2024-10-21 | cs |
dc.description.abstract | Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy. In recommender systems, this is especially challenging due to the imbalance in logged data: some items are recommended, and thus logged, more frequently than others. The challenge is further exacerbated when recommending a list of items, because the action space is combinatorial. To address this challenge, we study pessimistic off-policy optimization for learning to rank. The key idea is to compute lower confidence bounds on the parameters of click models and then return the list with the highest pessimistic estimate of its value. This approach is computationally efficient, and we analyze it. We study its Bayesian and frequentist variants and overcome the limitation of an unknown prior by incorporating empirical Bayes. To show the empirical effectiveness of our approach, we compare it to off-policy optimizers that use inverse propensity scores or neglect uncertainty. Our approach outperforms all baselines and is both robust and general. | en |
dc.format | text | cs |
dc.format.extent | 1896-1903 | cs |
dc.format.mimetype | application/pdf | cs |
dc.identifier.citation | 27th European Conference on Artificial Intelligence (ECAI 2024). 2024, p. 1896-1903. | en |
dc.identifier.doi | 10.3233/FAIA240703 | cs |
dc.identifier.isbn | 978-1-64368-548-9 | cs |
dc.identifier.orcid | 0000-0002-4649-5120 | cs |
dc.identifier.other | 189891 | cs |
dc.identifier.researcherid | E-8197-2012 | cs |
dc.identifier.uri | https://hdl.handle.net/11012/250750 | |
dc.language.iso | en | cs |
dc.publisher | IOS Press | cs |
dc.relation.ispartof | 27th European Conference on Artificial Intelligence (ECAI 2024) | cs |
dc.relation.uri | https://ebooks.iospress.nl/volumearticle/69798 | cs |
dc.rights | Creative Commons Attribution-NonCommercial 4.0 International | cs |
dc.rights.access | openAccess | cs |
dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ | cs |
dc.subject | Action spaces | en |
dc.subject | Bayesian | en |
dc.subject | Computationally efficient | en |
dc.subject | Empirical Bayes | en |
dc.subject | Frequentist | en |
dc.subject | Lower confidence bound | en |
dc.subject | Optimizers | en |
dc.subject | Optimizing policies | en |
dc.subject | Policy learning | en |
dc.subject | Policy optimization | en |
dc.title | Pessimistic Off-Policy Optimization for Learning to Rank | en |
dc.type.driver | conferenceObject | en |
dc.type.status | Peer-reviewed | en |
dc.type.version | publishedVersion | en |
sync.item.dbid | VAV-189891 | en |
sync.item.dbtype | VAV | en |
sync.item.insts | 2025.04.04 13:56:32 | en |
sync.item.modts | 2025.04.04 08:32:02 | en |
thesis.grantor | Vysoké učení technické v Brně. Fakulta informačních technologií. Ústav počítačové grafiky a multimédií | cs |
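
To make the approach described in the abstract concrete, here is a minimal Python sketch, not the authors' implementation. It assumes (my assumptions, for illustration) a position-based click model with known per-position examination probabilities, fits a Beta prior on item attractions by a simple empirical-Bayes method of moments, and ranks items by a lower posterior quantile of their attraction probabilities, i.e., the Bayesian pessimistic variant. The function names `fit_beta_prior` and `pessimistic_ranking` and all parameters are hypothetical.

```python
# A minimal sketch (not the authors' implementation) of pessimistic
# off-policy optimization under a position-based click model (PBM).
# Illustrative assumptions: examination probabilities per position are
# known, item attraction probabilities get Beta posteriors, the prior
# is fit by empirical Bayes (method of moments over observed click
# rates), and the list ranks items by a lower posterior quantile.
import numpy as np
from scipy.stats import beta

def fit_beta_prior(clicks, impressions):
    """Empirical-Bayes Beta prior via method of moments.

    A crude stand-in for the paper's empirical Bayes step: matches the
    mean and variance of observed click rates and ignores the extra
    variance contributed by finite impression counts.
    """
    rates = clicks / impressions
    m, v = rates.mean(), rates.var()
    if v <= 0:  # degenerate logs; fall back to a uniform prior
        return 1.0, 1.0
    s = m * (1.0 - m) / v - 1.0  # prior strength alpha + beta
    return max(m * s, 1e-3), max((1.0 - m) * s, 1e-3)

def pessimistic_ranking(clicks, impressions, exam_probs, k, delta=0.05):
    """Rank items by lower confidence bounds (LCBs) on attraction.

    clicks[i], impressions[i]: logged counts for item i (impressions
    assumed already corrected for examination at the logged position).
    exam_probs: examination probability of each of the k positions.
    delta: tail mass of the one-sided bound; smaller = more pessimism.
    """
    a0, b0 = fit_beta_prior(clicks, impressions)
    a = a0 + clicks                    # posterior successes
    b = b0 + impressions - clicks      # posterior failures
    lcb = beta.ppf(delta, a, b)        # lower posterior quantile
    top = np.argsort(-lcb)[:k]         # highest LCBs first
    # Under PBM the list value is sum_j exam_probs[j] * attraction_j,
    # so pairing larger LCBs with better-examined positions maximizes
    # the pessimistic estimate of the list value.
    value_lcb = float(np.sum(np.sort(exam_probs)[::-1] * lcb[top]))
    return top, value_lcb

# Synthetic logs: item 2 has the best empirical click rate (2/3) but
# almost no data, so pessimism demotes it below the well-logged item 0.
clicks = np.array([40.0, 10.0, 2.0])
impressions = np.array([100.0, 50.0, 3.0])
ranking, value = pessimistic_ranking(clicks, impressions,
                                     exam_probs=np.array([1.0, 0.6]), k=2)
print(ranking, value)
```

Ranking by lower confidence bounds rather than posterior means penalizes items whose logs are sparse, which is exactly the imbalance the abstract describes; a frequentist variant would replace the posterior quantile with a concentration bound such as Hoeffding's.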
Files
Original bundle
- Name: FAIA392FAIA240703.pdf
- Size: 591.37 KB
- Format: Adobe Portable Document Format
- Description: file FAIA392FAIA240703.pdf