Motivating example (figure omitted): a human judge is shown two candidate images side by side and asked, "Which looks more like fire? Left or Right?" Only this kind of pairwise preference feedback is available.
Astudillo, R., Lin, Z. J., Bakshy, E., & Frazier, P. (2023, April). qEUBO: A decision-theoretic acquisition function for preferential Bayesian optimization. In International Conference on Artificial Intelligence and Statistics (pp. 1093-1114). PMLR.
Let $\mathcal{X} \subseteq \mathbb{R}^d$ be a set of points and $g: \mathcal{X} \rightarrow \mathbb{R}$ be a black-box function. Our objective is to find
$$x_{\text{min}} = \arg \min_{x \in \mathcal{X}} g(x)$$
where $g$ is not directly accessible, and queries can only be performed in pairs of points, referred to as duels $[x, x'] \in \mathcal{X} \times \mathcal{X}$.
The feedback is binary, $y \in \{0, 1\}$: $y = 1$ if $x$ is preferred (i.e., $x$ has the lower value) and $y = 0$ otherwise. The goal is to find $x_{\text{min}}$ with the minimal number of queries.
González, Javier, et al. "Preferential bayesian optimization." International Conference on Machine Learning. PMLR, 2017.
Recall
$$p(x \succ x') = \sigma(r(x) - r(x'))$$
where $r(\cdot)$ and $\sigma(\cdot)$ are the latent reward function and the sigmoid function, respectively. Since we want to find $x$ that minimizes $g(\cdot)$, we set $r(x) = -g(x)$. Hence, $p(x \succ x') = \sigma(g(x') - g(x))$.
For any duel $[x, x’]$, we have
$$p(x \succ x') \geq 0.5 \iff g(x) \leq g(x')$$
González, Javier, et al. "Preferential bayesian optimization." International Conference on Machine Learning. PMLR, 2017.
The Copeland score is used to measure how often an option wins in pairwise comparisons against others. The normalized Copeland score is defined as:
$$S(x) = \text{Vol}(\mathcal{X})^{-1} \int_{\mathcal{X}} \mathbb{I}(g(x) \leq g(x')) dx'$$
where the factor $\text{Vol}(\mathcal{X})^{-1}$ normalizes the score to $[0, 1]$, and $\mathbb{I}(\cdot)$ is the indicator function.
Zoghi, M., Karnin, Z. S., Whiteson, S., & De Rijke, M. (2015). Copeland dueling bandits. Advances in neural information processing systems, 28.
Nurmi, H. (1983). Voting procedures: a summary analysis. British Journal of Political Science, 13(2), 181-208.
The normalized Copeland score is neither smooth nor differentiable, which motivates a soft version of it:
$$\begin{aligned} C(x) \propto \int_{\mathcal{X}} \sigma(g(x') - g(x)) dx' = \int_{\mathcal{X}} p(x \succ x') dx' \end{aligned}$$
González, Javier, et al. "Preferential bayesian optimization." International Conference on Machine Learning. PMLR, 2017.
The soft Copeland score can be obtained by integrating the preference probability over all possible duels. Thus, it is possible to learn this function from data by integrating the learned preference function $\pi([x, x']; \mathcal{D}, \theta)$, and we can use Monte Carlo sampling to approximate the integral:
$$\begin{aligned} C(x; \mathcal{D}, \theta) &\propto \int_{\mathcal{X}} \pi([x, x']; \mathcal{D}, \theta) dx' \approx \frac{1}{M} \sum_{i=1}^{M} \pi([x, x_i]; \mathcal{D}, \theta) \end{aligned}$$
The Condorcet winner is a maximizer of the soft Copeland score found by any suitable optimizer: $x_C = \arg \max_{x \in \mathcal{X}} C(x; \mathcal{D}, \theta)$
González, Javier, et al. "Preferential bayesian optimization." International Conference on Machine Learning. PMLR, 2017.
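As a minimal illustration of this Monte Carlo approximation, here is a sketch in Python. The fitted preference model is abstracted as a hypothetical callable `pi(x, x_prime)` returning $\pi([x, x']; \mathcal{D}, \theta)$, and the maximization is done over a random candidate set rather than a full optimizer:

```python
import numpy as np

def soft_copeland(x, pi, bounds, M=1_000, rng=None):
    """Monte Carlo estimate of C(x): mean preference of x over M random opponents x'."""
    rng = np.random.default_rng(rng)
    lo, hi = bounds  # arrays of shape (d,) describing the box X
    opponents = rng.uniform(lo, hi, size=(M, len(lo)))  # x'_1, ..., x'_M ~ Uniform(X)
    return np.mean([pi(x, xp) for xp in opponents])

def condorcet_winner(pi, bounds, n_candidates=500, M=1_000, rng=None):
    """Approximate x_C = argmax_x C(x; D, theta) over a random candidate grid."""
    rng = np.random.default_rng(rng)
    lo, hi = bounds
    candidates = rng.uniform(lo, hi, size=(n_candidates, len(lo)))
    scores = [soft_copeland(x, pi, bounds, M, rng) for x in candidates]
    return candidates[int(np.argmax(scores))]
```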
A Gaussian distribution (i.e. normal distribution) is a continuous probability distribution that is completely described with two parameters (mean $\mu$ and variance $\sigma^2$) with the following density:
$$f(x | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
A multivariate Gaussian distribution is a generalization of the one-dimensional Gaussian distribution to higher dimensions by defining a mean vector $\mu$ and a covariance matrix $\Sigma$.
$$\Sigma = \begin{bmatrix} \sigma_{1,1} & \sigma_{1,2} & \cdots & \sigma_{1,d} \\ \sigma_{2,1} & \sigma_{2,2} & \cdots & \sigma_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{d,1} & \sigma_{d,2} & \cdots & \sigma_{d,d} \end{bmatrix}$$
where $\sigma_{i,j} = \mathbb{E}[(x_i - \mu_i)(x_j - \mu_j)]$ is the covariance between dimensions $i$ and $j$.
Marginalization:
$$P(X) = \sum_Y P(X, Y)$$
Example: Given a joint distribution $P(X, Y)$ as follows:
$$\begin{array}{|c|c|c|} \hline X & Y & P(X, Y) \\ \hline 0 & 0 & 0.1 \\ 0 & 1 & 0.2 \\ 1 & 0 & 0.3 \\ 1 & 1 & 0.4 \\ \hline \end{array}$$
Compute the marginal distribution of $X$.
The marginal distribution of $X$ is obtained by summing over all possible values of $Y$:
$$P(X) = \sum_Y P(X, Y) = \begin{cases} 0.1 + 0.2 = 0.3 & \text{if } X = 0 \\ 0.3 + 0.4 = 0.7 & \text{if } X = 1 \end{cases}$$
Conditioning:
$$P(X | Y) = \frac{P(X, Y)}{P(Y)}$$
Example: Using the same joint distribution $P(X, Y)$ as above, compute the conditional distribution of $X$ given $Y = 1$.
The conditional distribution of $X$ given $Y = 1$ is obtained by dividing the joint probability by the marginal probability $P(Y = 1)$:
$$P(X | Y = 1) = \frac{P(X, Y = 1)}{P(Y = 1)} = \begin{cases} \frac{0.2}{0.2 + 0.4} = \frac{1}{3} & \text{if } X = 0 \\ \frac{0.4}{0.2 + 0.4} = \frac{2}{3} & \text{if } X = 1 \end{cases}$$
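Both worked examples can be checked mechanically; a minimal sketch with NumPy:

```python
import numpy as np

# Joint distribution P(X, Y): rows indexed by X in {0, 1}, columns by Y in {0, 1}
P = np.array([[0.1, 0.2],
              [0.3, 0.4]])

P_X = P.sum(axis=1)                     # marginalization: [0.3, 0.7]
P_X_given_Y1 = P[:, 1] / P[:, 1].sum()  # conditioning:    [1/3, 2/3]

print(P_X, P_X_given_Y1)
```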
When learning a function $f(x)$ without knowing its functional form, we typically have two modeling choices: assume a parametric form and fit its parameters, or place a prior directly over functions, which is the non-parametric route taken by Gaussian processes.
The covariance matrix $\Sigma$ is computed by applying the kernel function $k(x, x')$ to each pair of data points:
$$k: \mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R},\quad \Sigma_{i,j} = \operatorname{Cov}(x_i, x_j) = k(x_i, x_j)$$
Common choices of kernel include the RBF kernel, the periodic kernel, and the linear kernel (see https://distill.pub/2019/visual-exploration-gaussian-processes/ for interactive visualizations of each).
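For concreteness, here is a sketch of these three kernels in their textbook forms (hyperparameter values are illustrative, not prescribed by the source), together with the covariance-matrix construction above:

```python
import numpy as np

def rbf(x, xp, sigma=1.0, ell=1.0):
    """RBF (squared exponential) kernel."""
    return sigma**2 * np.exp(-0.5 * np.sum((x - xp)**2) / ell**2)

def periodic(x, xp, sigma=1.0, ell=1.0, p=1.0):
    """Periodic kernel with period p."""
    return sigma**2 * np.exp(-2.0 * np.sin(np.pi * np.linalg.norm(x - xp) / p)**2 / ell**2)

def linear(x, xp, sigma_b=1.0, sigma_v=1.0, c=0.0):
    """Linear kernel with offset c."""
    return sigma_b**2 + sigma_v**2 * np.dot(x - c, xp - c)

def cov_matrix(X, k):
    """Sigma[i, j] = k(x_i, x_j) for every pair of rows of X."""
    return np.array([[k(xi, xj) for xj in X] for xi in X])

X = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
Sigma = cov_matrix(X, rbf)  # 5x5 covariance matrix under the RBF kernel
```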
Analytical Inference: In case the likelihood is Gaussian, the posterior is also Gaussian, and the exact inference can be done as described earlier. For non-Gaussian likelihoods, the posterior is not Gaussian, and exact inference is intractable (i.e., having no closed-form solution).
Numerical Inference: Variational Inference (approximate) or Markov Chain Monte Carlo (asymptotically exact).
Let us consider a simple example of Gaussian process binary classification. We have a dataset $\mathcal{D} = \{(x_1, y_1), \dots, (x_N, y_N)\}$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$, and we model the labels with a Bernoulli likelihood. With a latent function $f(x)$, the likelihood is given by:
$$p(y_i | x_i) = \sigma(y_i \cdot f(x_i))$$
where $\sigma(\cdot)$ is the sigmoid function.
We first present the Elliptical Slice Sampling (ESS) algorithm for sampling from the posterior distribution of the latent function $f(x)$. Given the current state $f^{(t)}$, ESS draws an auxiliary variable $v \sim \mathcal{N}(0, K)$ and a slice level $u \sim \text{Uniform}(0, 1)$, which defines the log acceptance threshold:
$$\log \pi(f^{(t)}) = \log p(y | f^{(t)}) + \log u$$
Murray, I., Adams, R., & MacKay, D. (2010, March). Elliptical slice sampling. In Proceedings of the thirteenth international conference on artificial intelligence and statistics (pp. 541-548). JMLR Workshop and Conference Proceedings.
It then proposes points on the ellipse passing through $f^{(t)}$ and $v$, parameterized by an angle $\theta$:
$$f^{(t+1)} = f^{(t)} \cos(\theta) + v \sin(\theta)$$
A proposal is accepted once its log likelihood exceeds the threshold, where
$$\log p(y | f^{(t+1)}) = \sum_{i=1}^N \log \sigma(y_i \cdot f^{(t+1)}(x_i))$$
otherwise the angle bracket is shrunk toward $\theta = 0$ and a new angle is drawn.
Murray, I., Adams, R., & MacKay, D. (2010, March). Elliptical slice sampling. In Proceedings of the thirteenth international conference on artificial intelligence and statistics (pp. 541-548). JMLR Workshop and Conference Proceedings.
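The update above can be written compactly in code. Below is a minimal sketch of one ESS transition for the binary classification model of the previous slide (zero-mean GP prior with covariance $K$, labels $y_i \in \{-1, +1\}$); `K_chol` is assumed to be the Cholesky factor of $K$:

```python
import numpy as np

def log_lik(f, y):
    """Bernoulli log likelihood with sigmoid link: sum_i log sigma(y_i * f_i)."""
    return -np.sum(np.log1p(np.exp(-y * f)))

def ess_step(f, K_chol, y, rng):
    """One elliptical slice sampling transition for the latent vector f."""
    v = K_chol @ rng.standard_normal(len(f))       # auxiliary draw v ~ N(0, K)
    log_u = log_lik(f, y) + np.log(rng.uniform())  # slice threshold log pi(f)
    theta = rng.uniform(0.0, 2.0 * np.pi)          # initial angle on the ellipse
    lo, hi = theta - 2.0 * np.pi, theta            # shrinking angle bracket
    while True:
        f_new = f * np.cos(theta) + v * np.sin(theta)
        if log_lik(f_new, y) > log_u:              # accept above the slice level
            return f_new
        if theta < 0.0:                            # otherwise shrink toward 0
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)
```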

Nickisch, H., & Rasmussen, C. E. (2008). Approximations for binary Gaussian process classification. Journal of Machine Learning Research, 9(Oct), 2035-2078.
Comparison of several approximate inference methods in terms of likelihood approximation (figure from Nickisch & Rasmussen, 2008, omitted): we can observe that Expectation Propagation (EP) provides the best approximation to the likelihood.
Nickisch, H., & Rasmussen, C. E. (2008). Approximations for binary Gaussian process classification. Journal of Machine Learning Research, 9(Oct), 2035-2078.
$$P(y_i | f_i) \approx t_i(f_i, \mu_i, \sigma_i^2, Z_i) := Z_i\mathcal{N}(f_i | \mu_i, \sigma_i^2)$$
where $\mu_i$, $\sigma_i^2$, and $Z_i$ will be iteratively optimized.
Minka, T. P. (2013). Expectation propagation for approximate Bayesian inference. arXiv preprint arXiv:1301.2294.
$$p(f | \mathcal{D}) \approx \mathcal{N}(f | m, (K^{-1} + W)^{-1})$$
where $W = \operatorname{diag}(\sigma_1^{-2}, \dots, \sigma_n^{-2})$ and $m = [I - K(K + W^{-1})^{-1}]KW\mu$ with $\mu = (\mu_1, \dots, \mu_n)^{\top}$.
Minka, T. P. (2013). Expectation propagation for approximate Bayesian inference. arXiv preprint arXiv:1301.2294.
Nickisch, H., & Rasmussen, C. E. (2008). Approximations for binary Gaussian process classification. Journal of Machine Learning Research, 9(Oct), 2035-2078.
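Given site parameters $\mu_i, \sigma_i^2$ produced by the EP iterations, assembling the Gaussian posterior above is plain linear algebra; a minimal sketch (note that $m = [I - K(K + W^{-1})^{-1}]KW\mu$ simplifies to $(K^{-1} + W)^{-1}W\mu$ by the Woodbury identity):

```python
import numpy as np

def ep_posterior(K, mu, sigma2):
    """Assemble N(m, (K^{-1} + W)^{-1}) from EP site parameters mu_i, sigma_i^2."""
    W = np.diag(1.0 / sigma2)
    Winv = np.diag(sigma2)
    # Woodbury: (K^{-1} + W)^{-1} = K - K (K + W^{-1})^{-1} K
    Sigma_post = K - K @ np.linalg.solve(K + Winv, K)
    m = Sigma_post @ W @ mu  # equals [I - K(K + W^{-1})^{-1}] K W mu
    return m, Sigma_post
```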
$$\begin{aligned} \log (p(y | x)) &= \log \int p(y | f) p(f | x) df\\ &\approx \log \int \prod_{i=1}^n t_i(f_i, \mu_i, \sigma_i^2, Z_i) p(f | x) df\\ &= \sum_{i=1}^n \log Z_i - \frac{1}{2} \mu^{\top}(K + W^{-1})^{-1} \mu\\ &- \frac{1}{2}\log |K+W^{-1}| - \frac{n}{2}\log 2\pi \end{aligned}$$
Minka, T. P. (2013). Expectation propagation for approximate Bayesian inference. arXiv preprint arXiv:1301.2294.
Nickisch, H., & Rasmussen, C. E. (2008). Approximations for binary Gaussian process classification. Journal of Machine Learning Research, 9(Oct), 2035-2078.
In practice, new duels are queried sequentially. The procedure may therefore be very expensive if we do not have a good acquisition strategy, as we would need to explore many duels before finding the Condorcet winner.
To address this issue, we can formulate this problem as an active preference learning problem, where we can customize the acquisition function to balance between exploration and exploitation, helping to find the Condorcet winner with fewer duels.
In this lecture, we will explore three acquisition functions that can be used to find the Condorcet winner: pure exploration, Copeland Expected Improvement, and Dueling-Thompson sampling.
González, Javier, et al. "Preferential bayesian optimization." International Conference on Machine Learning. PMLR, 2017.
Pure Exploration
$$\begin{aligned} \mathcal{V}[\sigma(f_t) \mid [x_t, x_t'], \mathcal{D}, \theta] &= \int \left( \sigma(f_t) - \mathbb{E}\left[ \sigma(f_t) \right] \right)^2 p(f_t \mid \mathcal{D}, [x_t, x_t'], \theta) df_t\\ &= \int \sigma(f_t)^2 p(f_t \mid \mathcal{D}, [x_t, x_t'], \theta) df_t - \mathbb{E}\left[ \sigma(f_t) \right]^2 \end{aligned}$$
In practice, we can approximate this variance using Monte Carlo sampling.
González, Javier, et al. "Preferential bayesian optimization." International Conference on Machine Learning. PMLR, 2017.
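A minimal sketch of that Monte Carlo approximation, assuming the GP posterior at a candidate duel $[x, x']$ has already been summarized by a mean `mu` and variance `var` for the latent $f$:

```python
import numpy as np

def sigmoid(f):
    return 1.0 / (1.0 + np.exp(-f))

def pure_exploration(mu, var, n_samples=10_000, rng=None):
    """MC estimate of V[sigma(f)] with f | D, [x, x'] ~ N(mu, var)."""
    rng = np.random.default_rng(rng)
    f = rng.normal(mu, np.sqrt(var), size=n_samples)
    return sigmoid(f).var()  # = E[sigma(f)^2] - E[sigma(f)]^2
```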
Copeland Expected Improvement
$$\begin{aligned} A_{\text{CEI}}([x_t, x_t'] \mid \mathcal{D}_t, \theta) &= \pi([x_t, x_t'] \mid \mathcal{D}_t, \theta)(c^*_{x_t} - c^*)\\ &+ \pi([x_t', x_t] \mid \mathcal{D}_t, \theta)(c^*_{x_t'} - c^*) \end{aligned}$$
where $c^*$ is the current maximal soft Copeland score, and $c^*_{x_t}$ (resp. $c^*_{x_t'}$) is the maximal score after augmenting the dataset with the outcome that $x_t$ (resp. $x_t'$) wins the duel.
González, Javier, et al. "Preferential bayesian optimization." International Conference on Machine Learning. PMLR, 2017.
Dueling-Thompson Sampling
González, Javier, et al. "Preferential bayesian optimization." International Conference on Machine Learning. PMLR, 2017.
Step 1 (Selecting $x_t$): first, we sample a function $\tilde{f}$ from the model using continuous Thompson sampling and maximize the soft Copeland score it induces:
$$x_{t} = \arg \max_{x \in \mathcal{X}} \int_{\mathcal{X}} \pi_{\tilde{f}}([x, x']) dx'$$
where $\pi_{\tilde{f}}([x, x']) = \sigma(\tilde{f}([x, x']))$ is the preference probability induced by the sampled function $\tilde{f}$.
González, Javier, et al. "Preferential bayesian optimization." International Conference on Machine Learning. PMLR, 2017.
Step 2 (Selecting $x_t'$): given $x_t$, the point $x_t'$ is selected to maximize the variance of $\sigma(f_t)$ in the direction of $x_t$:
$$x'_{t} = \arg \max_{x'_t \in \mathcal{X}} \mathcal{V}[\sigma(f_t) \mid [x_t, x_t'], \mathcal{D}, \theta]$$
Finally, the selected duel $[x_t, x_t']$ is queried, and the process is repeated until the Condorcet winner is found.
González, Javier, et al. "Preferential bayesian optimization." International Conference on Machine Learning. PMLR, 2017.
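Putting the two steps together, here is a schematic of one Dueling-Thompson round. The model interface (`sample_function`, `variance_of_sigmoid`) is hypothetical and stands in for a fitted GP over duels, and both maximizations are done over random grids for simplicity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dueling_thompson_round(model, bounds, M=256, n_grid=512, rng=None):
    """One round of Dueling-Thompson Sampling (sketch)."""
    rng = np.random.default_rng(rng)
    lo, hi = bounds
    grid = rng.uniform(lo, hi, size=(n_grid, len(lo)))
    opponents = rng.uniform(lo, hi, size=(M, len(lo)))

    # Step 1: Thompson-sample f_tilde and maximize its soft Copeland score.
    f_tilde = model.sample_function()  # hypothetical: one posterior draw over duels
    scores = [np.mean([sigmoid(f_tilde(x, xp)) for xp in opponents]) for x in grid]
    x_t = grid[int(np.argmax(scores))]

    # Step 2: pick x' maximizing V[sigma(f)] among duels containing x_t.
    variances = [model.variance_of_sigmoid(x_t, xp) for xp in grid]
    x_t_prime = grid[int(np.argmax(variances))]

    return x_t, x_t_prime  # the duel [x_t, x_t'] to query next
```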
Figure (omitted): 100 continuous samples of the Copeland score function (grey) in the Forrester example, generated using Thompson sampling. The three plots show the samples obtained once the model has been trained with different numbers of duels (10, 30, and 150). In black, the Copeland function computed using the preference function. The more duels are available, the more exploitation is encouraged in the first element of the duel, as the probability of selecting $x_t$ as the true optimum increases.
González, Javier, et al. "Preferential bayesian optimization." International Conference on Machine Learning. PMLR, 2017.
We now restart with a more general problem, in which each query consists of more than two points and the feedback indicates the preferred one; here $g$ is a latent utility to be maximized.
$$V_t(X) = \mathbb{E}_t \left[ \max_{x \in \mathcal{X}} \mathbb{E}_{t+1} [g(x)] | X_{t+1} = X \right]$$
Astudillo, R., Lin, Z. J., Bakshy, E., & Frazier, P. (2023, April). qEUBO: A decision-theoretic acquisition function for preferential Bayesian optimization. In International Conference on Artificial Intelligence and Statistics (pp. 1093-1114). PMLR.
The goal is to find $X$ that maximizes the expected utility $V_t(X)$.
$$X_t = \arg \max_{X \in \mathcal{X}^q} V_t(X)$$
Since $\max_{x \in \mathcal{X}} \mathbb{E}_{t} [g(x)]$ does not depend on $X$, we can subtract it and equivalently maximize:
$$V_t(X) - \max_{x \in \mathcal{X}} \mathbb{E}_{t} [g(x)] = \mathbb{E}_t \left[ \max_{x \in \mathcal{X}} \mathbb{E}_{t+1} [g(x)] - \max_{x \in \mathcal{X}} \mathbb{E}_{t} [g(x)] \,\Big\vert\, X_{t+1} = X \right]$$
Astudillo, R., Lin, Z. J., Bakshy, E., & Frazier, P. (2023, April). qEUBO: A decision-theoretic acquisition function for preferential Bayesian optimization. In International Conference on Artificial Intelligence and Statistics (pp. 1093-1114). PMLR.
We define the expected utility of the best option (qEUBO) as:
$$A_{qEUBO, t} (X) = \mathbb{E}_t \left[ \max \{ g(x_1), \dots, g(x_q) \} \right]$$
According to Theorem 1 in (Astudillo et al., 2023), we have:
$$\arg \max_{X \in \mathcal{X}^q} A_{qEUBO, t} (X) \subseteq \arg \max_{X \in \mathcal{X}^q} V_t(X)$$
Thus, optimizing the expected utility of the best option (i.e., maximizing $A_{qEUBO, t}$) is sufficient to find the optimal query.
Astudillo, R., Lin, Z. J., Bakshy, E., & Frazier, P. (2023, April). qEUBO: A decision-theoretic acquisition function for preferential Bayesian optimization. In International Conference on Artificial Intelligence and Statistics (pp. 1093-1114). PMLR.
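In practice $A_{qEUBO, t}$ can be estimated by Monte Carlo over joint posterior samples of the utilities; a minimal sketch, where `posterior_sample(X, rng)` is a hypothetical callable returning one joint draw of $(g(x_1), \dots, g(x_q))$ from the current posterior:

```python
import numpy as np

def qeubo(X, posterior_sample, n_samples=512, rng=None):
    """MC estimate of A_qEUBO(X) = E_t[max{g(x_1), ..., g(x_q)}]."""
    rng = np.random.default_rng(rng)
    draws = np.stack([posterior_sample(X, rng) for _ in range(n_samples)])
    return draws.max(axis=1).mean()  # max over the q options, mean over draws
```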
Theorem 1: For noise-free responses, $\arg \max_{X \in \mathcal{X}^q} A_{qEUBO, t} (X) \subseteq \arg \max_{X \in \mathcal{X}^q} V_t(X)$.
Proof Outline:
For any given $X \in \mathcal{X}^q$ and each $i \in \{1, \dots, q\}$,
let $x^+(X, i) \in \arg \max_{x \in \mathcal{X}} \mathbb{E}_t[g(x) \mid (X, i)]$,
and define $X^+(X) = (x^+(X, 1), \dots, x^+(X, q))$.
We claim that:
$$V_t(X) \leq A_{qEUBO, t}(X^+(X)) \tag{1}$$
To see this, let $r(X) \in \{1, \dots, q\}$ denote the (noise-free) response to the query $X$, i.e., the index of the preferred option, so that $g(x_{r(X)}) = \max_{i=1,\dots,q} g(x_i)$. Then:
$$V_t(X) = \sum_{i=1}^{q} P_t(r(X) = i) \mathbb{E}_t[g(x^+(X, i)) | (X, i)]$$
$$\leq \sum_{i=1}^{q} P_t(r(X) = i) \mathbb{E}_t\left[\max_{j=1,\dots,q} g(x^+(X, j)) \mid (X, i)\right]$$
$$= \mathbb{E}_t\left[\max_{j=1,\dots,q} g(x^+(X, j))\right] = A_{qEUBO, t}(X^+(X))$$
On the other hand, for any given $X \in \mathcal{X}^q$, we have:
$$\mathbb{E}_t[g(x_{r(X)}) | (X, r(X))] \leq \max_{x \in \mathcal{X}} \mathbb{E}_t[g(x) | (X, r(X))].$$
Since $g(x_{r(X)}) = \max_{i=1,\dots,q} g(x_i)$, taking expectations over $r(X)$ on both sides, we obtain:
$$A_{qEUBO, t}(X) \leq V_t(X). \tag{2}$$
Now, building on the arguments above, let $X^* \in \arg \max_{X \in \mathcal{X}^q} A_{qEUBO, t}(X)$ and suppose, for the sake of contradiction, that $X^* \notin \arg \max_{X \in \mathcal{X}^q} V_t(X)$.
Then, there exists $\tilde{X} \in \mathcal{X}^q$ such that $V_t(\tilde{X}) > V_t(X^*)$. By the arguments above, we have:
$$A_{qEUBO, t}(X^+(\tilde{X})) \geq V_t(\tilde{X}) > V_t(X^*) \geq A_{qEUBO, t}(X^*),$$
where the first inequality follows from $(1)$, the second from our supposition, and the third from $(2)$. The chain shows $A_{qEUBO, t}(X^+(\tilde{X})) > A_{qEUBO, t}(X^*)$, which contradicts $X^* \in \arg \max_{X \in \mathcal{X}^q} A_{qEUBO, t}(X)$. This contradiction concludes the proof.
When responses are noisy, maximizing $A_{qEUBO, t}$ is no longer equivalent to maximizing $V_t$. However, if the response noise follows the logistic likelihood $L(g(X); \lambda)$ with noise-level parameter $\lambda$, qEUBO can still be highly effective. Each component of $L(g(X); \lambda)$ is a logistic (softmax) function defined as follows.
$$L_i(g(x); \lambda) = \frac{\exp (g(x_i) / \lambda)}{\sum_{j=1}^q \exp (g(x_j) / \lambda)}$$
We denote $V_t$ as $V_t^\lambda$ to make its dependence on $\lambda$ explicit. If $X^* \in \arg \max_{X \in \mathcal{X}^q} A_{qEUBO, t}(X)$, then Theorem 2 states:
$$V_t^{\lambda}(X^*) \geq \max_{X \in \mathcal{X}} V_t^0(X) - \lambda C,$$
where $C = L_W((q - 1)/ e)$, and $L_W$ is the Lambert W function (Corless et al., 1996).
Corless, R. M., Gonnet, G. H., Hare, D. E., Jeffrey, D. J., & Knuth, D. E. (1996). On the Lambert W function. Advances in Computational mathematics, 5, 329-359.
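The constant $C$ is easy to evaluate numerically; a small sketch using SciPy's Lambert W implementation:

```python
import numpy as np
from scipy.special import lambertw

def noise_gap_constant(q):
    """C = L_W((q - 1) / e) from Theorem 2 and Lemma 1."""
    return float(np.real(lambertw((q - 1) / np.e)))

for q in (2, 4, 8):
    print(q, noise_gap_constant(q))  # C grows slowly with the query size q
```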
Lemma 1: For any $s_1, \dots, s_q \in \mathbb{R}$,
$$\sum_{i=1}^q \frac{\exp(s_i/\lambda)}{\sum_{j=1}^q \exp(s_j/\lambda)} s_i \geq \max_{i=1,\dots,q} s_i - \lambda C,$$
where $C = L_W((q - 1)/ e)$ and $L_W$ is the Lambert W function (Corless et al., 1996).
Corless, R. M., Gonnet, G. H., Hare, D. E., Jeffrey, D. J., & Knuth, D. E. (1996). On the Lambert W function. Advances in Computational mathematics, 5, 329-359.
Astudillo, R., Lin, Z. J., Bakshy, E., & Frazier, P. (2023, April). qEUBO: A decision-theoretic acquisition function for preferential Bayesian optimization. In International Conference on Artificial Intelligence and Statistics (pp. 1093-1114). PMLR.
Proof of Lemma 1 We may assume without loss of generality that $\max_{i=1,\dots,q} s_i = s_q$. Let $t_i = (s_q - s_i)/\lambda$ for $i \in \{1,\dots,q-1\}$. After some algebra, we see that the inequality we want to show is equivalent to
$$\sum_{i=1}^{q-1} \frac{t_i \exp(-t_i)}{1 + \sum_{j=1}^{q-1} \exp(-t_j)} \leq C.$$
Astudillo, R., Lin, Z. J., Bakshy, E., & Frazier, P. (2023, April). qEUBO: A decision-theoretic acquisition function for preferential Bayesian optimization. In International Conference on Artificial Intelligence and Statistics (pp. 1093-1114). PMLR.
Proof of Lemma 1 (cont) Thus, it suffices to show that the function $\eta : [0,\infty)^{q-1} \to \mathbb{R}$ given by
$$\eta(t_1, \dots, t_{q-1}) = \sum_{i=1}^{q-1} \frac{t_i \exp(-t_i)}{1 + \sum_{j=1}^{q-1} \exp(-t_j)}$$
is bounded above by $C$.
Viappiani, P. and Boutilier, C. (2010). Optimal Bayesian recommendation sets and myopically optimal choice query sets. Advances in Neural Information Processing Systems, 23.
Astudillo, R., Lin, Z. J., Bakshy, E., & Frazier, P. (2023, April). qEUBO: A decision-theoretic acquisition function for preferential Bayesian optimization. In International Conference on Artificial Intelligence and Statistics (pp. 1093-1114). PMLR.
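As a quick numerical sanity check (not a proof), one can maximize $\eta$ from random starts and compare against $C$:

```python
import numpy as np
from scipy.special import lambertw
from scipy.optimize import minimize

def eta(t):
    """eta(t_1, ..., t_{q-1}) from the proof of Lemma 1 (t_i >= 0)."""
    e = np.exp(-t)
    return np.sum(t * e) / (1.0 + np.sum(e))

q = 4
C = float(np.real(lambertw((q - 1) / np.e)))
rng = np.random.default_rng(0)
best = max(
    -minimize(lambda t: -eta(np.abs(t)), x0, method="Nelder-Mead").fun
    for x0 in rng.uniform(0.0, 3.0, size=(8, q - 1))
)
print(best <= C + 1e-6, best, C)  # the maximum of eta should not exceed C
```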
Lemma 2: $\mathbb{E}_t^{\lambda} [g(x_{r(X)})] \geq A_{qEUBO, t}(X) - \lambda C$ for all $X \in \mathcal{X}^q$.
Note that
$$\mathbb{E}_t^{\lambda}[g(x_{r(X)}) \mid g(x_1), \dots, g(x_q)] = \sum_{i=1}^q \frac{\exp(g(x_i)/\lambda)}{\sum_{j=1}^q \exp(g(x_j)/\lambda)} g(x_i).$$
Thus, Lemma 1 implies that
$$\mathbb{E}_t^{\lambda}[g(x_{r(X)}) \mid g(x_1), \dots, g(x_q)] \geq \max_{i=1,\dots,q} g(x_i) - \lambda C.$$
Taking expectations over both sides of the inequality yields the desired result.
Lemma 3: $V_t^{\lambda}(X) \geq \mathbb{E}_t^{\lambda} [g(x_{r(X)})]$ for all $X \in \mathcal{X}^q$.
Observe that
$$\begin{aligned} V_t^{\lambda}(X) &= \mathbb{E}_t^{\lambda}\left[\max_{x \in \mathcal{X}} \mathbb{E}_t[g(x) \mid (X, r(X))]\right]\\ &\geq \mathbb{E}_t^{\lambda}\left[\mathbb{E}_t[g(x_{r(X)}) \mid (X, r(X))]\right]\\ &= \mathbb{E}_t^{\lambda}[g(x_{r(X)})] \end{aligned}$$
where the last equality follows from the law of iterated expectations.
Recall Theorem 2: if $X^* \in \arg \max_{X\in \mathcal{X}^q} A_{qEUBO, t}(X)$, then:
$$V_t^{\lambda}(X^*) \geq \max_{X \in \mathcal{X}^q} V_t^0(X) - \lambda C$$
Proof of Theorem 2: Let $X^{**} \in \arg \max_{X \in \mathcal{X}^q} V_t^0(X)$. We have the following chain of inequalities:
$$\begin{aligned} V_t^{\lambda}(X^{*}) &\geq \mathbb{E}_t^{\lambda} [g(x_{r(X^*)})] && \text{(Lemma 3)}\\ &\geq A_{qEUBO, t}(X^{*}) - \lambda C && \text{(Lemma 2)}\\ &\geq A_{qEUBO, t}(X^+(X^{**})) - \lambda C && \text{($X^*$ maximizes $A_{qEUBO, t}$)}\\ &\geq V_t^0(X^{**}) - \lambda C = \max_{X \in \mathcal{X}^q} V_t^0(X) - \lambda C && \text{(by $(1)$)}. \end{aligned}$$
Astudillo, R., Lin, Z. J., Bakshy, E., & Frazier, P. (2023, April). qEUBO: A decision-theoretic acquisition function for preferential Bayesian optimization. In International Conference on Artificial Intelligence and Statistics (pp. 1093-1114). PMLR.