TEXT COMPRESSION WITH PREDICTED CONTINUATIONS

Invention publication

US20240046026A1 TEXT COMPRESSION WITH PREDICTED CONTINUATIONS (under examination, published)

Patent title: TEXT COMPRESSION WITH PREDICTED CONTINUATIONS
Application number: US18488275

Filing date: 2023-10-17
Publication number: US20240046026A1

Publication date: 2024-02-08
Inventors: Ronny LEMPEL, Chenyan XIONG
Applicant: Microsoft Technology Licensing, LLC
Applicant address: US WA Redmond
Patentee: Microsoft Technology Licensing, LLC
Current patentee: Microsoft Technology Licensing, LLC
Current patentee address: US WA Redmond
Main classification: G06F40/126
IPC classification: G06F40/126; G06F40/279; H03M7/30

Abstract:

A method for text compression comprises recognizing a prefix string of one or more text characters preceding a target string of a plurality of text characters to be compressed. The prefix string is provided to a natural language generation (NLG) model configured to output one or more predicted continuations each having an associated rank. If the one or more predicted continuations include a matching predicted continuation relative to the next one or more text characters of the target string, the next one or more text characters are compressed as an NLG-type compressed representation. If no predicted continuations match the next one or more text characters of the target string, a longest matching entry in a compression dictionary is identified. The next one or more text characters of the target string are compressed as a dictionary-type compressed representation that includes the dictionary index value of the longest matching entry.
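The two-path scheme in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything named here is a hypothetical stand-in: `toy_predictor` replaces the NLG model with a fixed lookup table, `DICTIONARY` is a small hand-built compression dictionary, and the `("nlg", rank)` / `("dict", index)` tuples are an assumed token format. The sketch also assumes that the prefix grows with the characters already compressed, and that the decompressor runs the same deterministic predictor, which the abstract does not state.

```python
from typing import Callable, List, Tuple

Token = Tuple[str, int]                 # ("nlg", rank) or ("dict", index)
Predictor = Callable[[str], List[str]]  # prefix -> continuations, best rank first

# Hypothetical dictionary: every single character we expect, plus two words.
DICTIONARY: List[str] = list("abcdefghijklmnopqrstuvwxyz ,") + ["good", "thank"]


def toy_predictor(prefix: str) -> List[str]:
    """Hypothetical stand-in for the NLG model: lookup on the last word."""
    table = {"good": [" morning", " night"], "thank": [" you"]}
    words = prefix.split()
    return table.get(words[-1], []) if words else []


def compress(prefix: str, target: str, predict: Predictor,
             dictionary: List[str]) -> List[Token]:
    tokens: List[Token] = []
    pos = 0
    while pos < len(target):
        rest = target[pos:]
        # NLG path: use the first ranked continuation that matches the text.
        for rank, cont in enumerate(predict(prefix + target[:pos])):
            if cont and rest.startswith(cont):
                tokens.append(("nlg", rank))
                pos += len(cont)
                break
        else:
            # Dictionary path: fall back to the longest matching entry.
            best = -1
            for i, entry in enumerate(dictionary):
                if entry and rest.startswith(entry):
                    if best < 0 or len(entry) > len(dictionary[best]):
                        best = i
            if best < 0:
                raise ValueError(f"no dictionary entry matches {rest[:1]!r}")
            tokens.append(("dict", best))
            pos += len(dictionary[best])
    return tokens


def decompress(prefix: str, tokens: List[Token], predict: Predictor,
               dictionary: List[str]) -> str:
    out = ""
    for kind, value in tokens:
        # Re-running the predictor on the same context recovers the text.
        out += predict(prefix + out)[value] if kind == "nlg" else dictionary[value]
    return out


tokens = compress("Hi, ", "good morning, thank you", toy_predictor, DICTIONARY)
restored = decompress("Hi, ", tokens, toy_predictor, DICTIONARY)
assert restored == "good morning, thank you"
```

In this toy run, " morning" and " you" are each replaced by a single rank value, while "good", "thank", and the punctuation fall back to dictionary indices. Keeping every single character in the dictionary is a design choice of the sketch that guarantees the fallback path always makes progress.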

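IPC classification:

G	Physics
G06	Computing; calculating or counting
G06F	Electric digital data processing (computer systems based on specific computational models: see G06N)
G06F40/00	Handling natural language data (speech analysis or synthesis, speech recognition: see G10L)
G06F40/10	.Text processing (natural language analysis: G06F 40/20; semantic analysis: G06F 40/30; processing or translation of natural language: G06F 40/40)
G06F40/12	..Use of codes for handling textual entities
G06F40/126	...Character encoding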