In the paper "Representation Compression and Generalization in Deep Neural Networks" (Shwartz-Ziv et al., 2019), the following conjecture is stated.

Conjecture 1. (Informal Version)

With probability $1 - \delta$ over the training data, $m$ i.i.d. samples drawn from the same distribution as a random variable pair $(X, Y)$, the generalization error $\epsilon(f)$ admits a bound of the following form:

$$\epsilon(f) \;\le\; O\!\left(\sqrt{\frac{2^{I(X;\,T_k)} + \log(1/\delta)}{m}}\right),$$

where $f$ is the full model obtained by training and $T_k$ is the output of an intermediate $k$-layer encoder of the model, i.e., the representation obtained after passing the input through the first $k$ layers.
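
To see the qualitative content of the conjecture, here is a minimal sketch in Python that evaluates the right-hand side for a few values of $I(X; T_k)$. It assumes the constant hidden in the $O(\cdot)$ is $1$ and measures the mutual information in bits; both are illustrative assumptions, not choices made in the paper.

```python
import math

def input_compression_bound(mi_bits: float, m: int, delta: float = 0.05) -> float:
    """Evaluate sqrt((2**I(X;T_k) + log(1/delta)) / m).

    mi_bits: mutual information I(X; T_k) in bits (assumed known or estimated).
    m:       number of i.i.d. training samples.
    delta:   confidence parameter; the bound holds with probability 1 - delta.
    """
    return math.sqrt((2 ** mi_bits + math.log(1 / delta)) / m)

# Compressing the representation tightens the bound exponentially:
# each bit removed from I(X; T_k) halves the dominant 2**I(X;T_k) term.
for mi in (20, 15, 10):
    print(f"I(X;T_k) = {mi:2d} bits -> bound ~ {input_compression_bound(mi, m=10**6):.4f}")
```

With $m = 10^6$ samples, the evaluated bound drops from roughly $1.02$ (vacuous) at $20$ bits to about $0.03$ at $10$ bits, which is the sense in which compression of the intermediate representation is conjectured to drive generalization.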