We consider the problem of communicating over a channel that breaks the message block into fragments of random lengths, shuffles them, and deletes a random fraction of the fragments. Such a channel is motivated by applications in molecular data storage and forensics, and we refer to it as the torn-paper channel. We characterize the capacity of this channel under arbitrary fragment length distributions and deletion probabilities. Specifically, we show that the capacity is given by a closed-form expression that can be interpreted as F - A, where F is the coverage fraction, i.e., the fraction of the input codeword that is covered by output fragments, and A is an alignment cost incurred due to the lack of ordering in the output fragments. We then consider a noisy version of the problem, in which the fragments are corrupted by binary symmetric noise. We derive upper and lower bounds on the capacity, both of which can be seen as F - A expressions. These bounds match for specific choices of fragment length distributions, and they are approximately tight when there are not too many short fragments.
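To make the channel model concrete, the sketch below simulates one use of a torn-paper channel on a binary codeword. It tears the input into consecutive fragments, deletes each fragment independently, optionally flips bits with binary symmetric noise, and shuffles the survivors. It also reports the empirical coverage fraction F. Several choices here are illustrative assumptions, not taken from the abstract: the function names `torn_paper_channel` and `geometric_length`, the geometric fragment-length distribution (the result above covers arbitrary distributions), and applying the noise bit by bit after deletion. The sketch does not compute the alignment cost A or the capacity.

```python
import random


def geometric_length(p):
    """Hypothetical length sampler: Geometric(p) on {1, 2, ...}, mean 1/p."""
    def sample(rng):
        k = 1
        while rng.random() >= p:
            k += 1
        return k
    return sample


def torn_paper_channel(x, sample_length, deletion_prob, flip_prob=0.0, rng=None):
    """Simulate one (assumed) torn-paper channel use on a list of bits.

    Returns (fragments, coverage): the surviving fragments in random order
    (their original positions are NOT returned, mirroring the lack of
    ordering at the receiver), and the fraction of input bits they cover.
    """
    rng = rng or random.Random()
    n = len(x)

    # Tear: cut x into consecutive fragments with i.i.d. random lengths.
    fragments = []
    start = 0
    while start < n:
        length = max(1, sample_length(rng))  # guarantee progress
        end = min(n, start + length)         # the last fragment may be truncated
        fragments.append(x[start:end])
        start = end

    kept = []
    covered = 0
    for frag in fragments:
        # Delete: each fragment is lost independently with prob. deletion_prob.
        if rng.random() < deletion_prob:
            continue
        covered += len(frag)
        # Noise (assumed BSC): flip each surviving bit with prob. flip_prob.
        kept.append([b ^ int(rng.random() < flip_prob) for b in frag])

    # Shuffle: the receiver sees the fragments in a uniformly random order.
    rng.shuffle(kept)
    return kept, (covered / n if n else 0.0)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.randint(0, 1) for _ in range(100_000)]
    out, F = torn_paper_channel(x, geometric_length(0.01), deletion_prob=0.2, rng=rng)
    print(len(out), round(F, 3))
```

Because fragments are deleted independently of their lengths in this sketch, the expected coverage fraction is 1 - deletion_prob, so with a long input the printed F should land close to 0.8. The receiver, unlike this simulator, does not know where each fragment came from, which is the source of the alignment cost A.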