Method and system for recovering text from a damaged electronic file
    1.
    Granted invention patent
    Method and system for recovering text from a damaged electronic file (expired)

    Publication No.: US5964885A

    Publication date: 1999-10-12

    Application No.: US891897

    Filing date: 1997-07-14

    IPC classification: G06F17/22 G06F17/27 G06F11/00

    Abstract: Recovering text from a damaged electronic file by scanning an arbitrary stream of bytes and extracting text that is encoded as ASCII or Unicode. A byte of the damaged file is read. The read byte may be interpreted using the ASCII encoding standard. The read byte and the immediately preceding read byte may also be interpreted using the Unicode character encoding standard. The interpreted byte(s) is classified based upon the likelihood that the byte(s) is actually text for the particular character set rather than a control character, damaged data, or an element other than a textual character. The classifications are used to adjust a likelihood counter for each character type. The likelihood counter may be an integer value that indicates the probability that a text run has been detected. A text run is a sequence of bytes that is believed to be undamaged text. Each likelihood counter is then examined to determine whether there is a text run for one of the character types. If there is a text run, then the starting position of the text run is saved. The entire text run is output when the text run ends.
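The scan described in the abstract can be illustrated with a minimal sketch. It is not the patented method: it handles only the ASCII interpretation, and it assumes the strictest possible scoring, in which a text-like byte raises the counter by one and any other byte resets it to zero and ends the current run. The names `is_ascii_text`, `recover_ascii_runs`, and the `threshold` parameter are hypothetical, chosen for this illustration. A Unicode pass would presumably keep a second counter over byte pairs in the same way.

```python
from typing import List, Optional


def is_ascii_text(byte: int) -> bool:
    """Classify one byte: printable ASCII, tab, LF, or CR counts as text (assumed rule)."""
    return 0x20 <= byte <= 0x7E or byte in (0x09, 0x0A, 0x0D)


def recover_ascii_runs(data: bytes, threshold: int = 4) -> List[str]:
    """Return ASCII text runs found in an arbitrary byte stream.

    `threshold` is an assumed value: the counter must reach it before
    a sequence of text-like bytes is accepted as a text run.
    """
    runs: List[str] = []
    counter = 0                      # likelihood counter for the ASCII interpretation
    candidate_start = 0              # first text-like byte since the counter was last zero
    run_start: Optional[int] = None  # saved start of a confirmed text run

    for i, byte in enumerate(data):
        if is_ascii_text(byte):
            if counter == 0:
                candidate_start = i
            counter += 1
            # Once the counter is high enough, save where the run began.
            if run_start is None and counter >= threshold:
                run_start = candidate_start
        else:
            # A non-text byte ends any confirmed run; output the whole run.
            if run_start is not None:
                runs.append(data[run_start:i].decode("ascii"))
                run_start = None
            counter = 0

    # A run that is still open at the end of the stream is output as well.
    if run_start is not None:
        runs.append(data[run_start:].decode("ascii"))
    return runs


if __name__ == "__main__":
    damaged = b"\x00\x01Hello, world\xff\xfeab\x00Recovered text\x00"
    print(recover_ascii_runs(damaged))
```

In this sketch the two-byte fragment `ab` is discarded because the counter never reaches the threshold, while the two longer sequences are reported as text runs. A finer-grained classification, with different adjustments for different kinds of bytes as the abstract suggests, would replace the single reset rule used here.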
