Hybrid comparison for Unicode text strings consisting primarily of ASCII characters

    Publication No.: US10540425B2

    Publication date: 2020-01-21

    Application No.: US16445139

    Filing date: 2019-06-18

    Abstract: A method compares text strings having Unicode encoding. The method receives a first string $S = s_1 s_2 \ldots s_n$ and a second string $T = t_1 t_2 \ldots t_m$, where $s_1, s_2, \ldots, s_n$ and $t_1, t_2, \ldots, t_m$ are Unicode characters. The method computes a first string weight for the first string $S$ according to a weight function $f$. When $S$ consists entirely of ASCII characters, $f(S) = S$. When $S$ consists of ASCII characters and some accented ASCII characters that are replaceable by ASCII characters, $f(S) = g(s_1)\, g(s_2) \ldots g(s_n)$, where $g(s_i) = s_i$ when $s_i$ is an ASCII character and $g(s_i) = s_i'$ when $s_i$ is an accented ASCII character that is replaceable by the corresponding ASCII character $s_i'$. When $S$ includes one or more non-replaceable non-ASCII characters, the first string weight concatenates an ASCII weight prefix $f_A(S)$ and a Unicode weight suffix $f_U(S)$. The method also computes a second string weight for the second text string $T$. Equality of the strings is tested using the string weights.
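
The three cases of the weight function described in this abstract can be illustrated with a minimal Python sketch. Everything beyond the case structure is an assumption made for illustration: the `REPLACEABLE` table, the function names `g`, `weight`, and `strings_equal`, the up-front NFC normalization, the choice to split the string at the first non-replaceable character, and the use of the raw (partially mapped) remainder as a stand-in for a real Unicode weight suffix. The abstract does not define how $f_A$ and $f_U$ are computed, so this is not the patented implementation.

```python
import unicodedata

# Hypothetical table of "replaceable" accented characters (illustrative only;
# the abstract does not enumerate them).
REPLACEABLE = {
    "\u00e9": "e",  # e with acute
    "\u00e8": "e",  # e with grave
    "\u00f1": "n",  # n with tilde
    "\u00fc": "u",  # u with diaeresis
}


def g(ch):
    """Return ch if it is ASCII, its ASCII replacement if it is in the
    table, or None if it is a non-replaceable non-ASCII character."""
    if ch.isascii():
        return ch
    return REPLACEABLE.get(ch)


def weight(s):
    """Sketch of the weight function f for a single string."""
    # Assumption: compose combining sequences first so that "e" + U+0301
    # is seen as the single replaceable character U+00E9.
    s = unicodedata.normalize("NFC", s)
    mapped = [g(ch) for ch in s]
    # Index of the first non-replaceable character, or len(s) if there is none.
    split = next((i for i, m in enumerate(mapped) if m is None), len(s))
    prefix = "".join(mapped[:split])
    if split == len(s):
        # Cases 1 and 2: pure ASCII, or ASCII plus replaceable characters.
        return prefix
    # Case 3: ASCII weight prefix followed by a stand-in Unicode suffix.
    # A real system would use collation weights here (assumption).
    suffix = "".join(g(ch) or ch for ch in s[split:])
    return prefix + suffix


def strings_equal(s, t):
    """Test equality of two strings by comparing their weights."""
    return weight(s) == weight(t)
```

Under these assumptions, `strings_equal("caf\u00e9", "cafe")` holds because both strings map to the ASCII weight `cafe`, while a string containing a character outside the table takes the prefix-plus-suffix path.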

    Hybrid Comparison for Unicode Text Strings Consisting Primarily of ASCII Characters

    Publication No.: US20190303425A1

    Publication date: 2019-10-03

    Application No.: US16445139

    Filing date: 2019-06-18

    Abstract: A method compares text strings having Unicode encoding. The method receives a first string $S = s_1 s_2 \ldots s_n$ and a second string $T = t_1 t_2 \ldots t_m$, where $s_1, s_2, \ldots, s_n$ and $t_1, t_2, \ldots, t_m$ are Unicode characters. The method computes a first string weight for the first string $S$ according to a weight function $f$. When $S$ consists entirely of ASCII characters, $f(S) = S$. When $S$ consists of ASCII characters and some accented ASCII characters that are replaceable by ASCII characters, $f(S) = g(s_1)\, g(s_2) \ldots g(s_n)$, where $g(s_i) = s_i$ when $s_i$ is an ASCII character and $g(s_i) = s_i'$ when $s_i$ is an accented ASCII character that is replaceable by the corresponding ASCII character $s_i'$. When $S$ includes one or more non-replaceable non-ASCII characters, the first string weight concatenates an ASCII weight prefix $f_A(S)$ and a Unicode weight suffix $f_U(S)$. The method also computes a second string weight for the second text string $T$. Equality of the strings is tested using the string weights.

    Hybrid approach to collating Unicode text strings consisting primarily of ASCII characters

    Publication No.: US10089282B1

    Publication date: 2018-10-02

    Application No.: US15885646

    Filing date: 2018-01-31

    Abstract: Collating text strings having Unicode encoding includes receiving two text strings $S = s_1 s_2 \ldots s_n$ and $T = t_1 t_2 \ldots t_m$. When the two text strings are not identical, there is a smallest positive integer $p$ at which the two text strings differ. The process looks up the characters $s_p$ and $t_p$ in a predefined lookup table. If either of these characters is missing from the lookup table, the collation of the text strings is determined using the standard Unicode comparison of the text strings $s_p s_{p+1} \ldots s_n$ and $t_p t_{p+1} \ldots t_m$. Otherwise, the lookup table assigns weights $v_p$ and $w_p$ to the characters $s_p$ and $t_p$. When $v_p \neq w_p$, these weights define the collation order of the strings $S$ and $T$. When $v_p = w_p$, the collation of $S$ and $T$ is determined recursively using the suffix strings $s_{p+1} \ldots s_n$ and $t_{p+1} \ldots t_m$.
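
The collation procedure in this abstract can be sketched in Python as follows. The sketch makes several assumptions that are not in the abstract: the `WEIGHTS` table (case-insensitive ASCII letters and digits) is hypothetical, `unicode_compare` uses plain code point order as a stand-in for a standard Unicode comparison (a real system would call a collation library), the recursion is written as a loop, and a string that is a proper prefix of the other is assumed to sort first.

```python
import string
from functools import cmp_to_key

# Hypothetical lookup table: digits sort before letters, and upper- and
# lower-case letters share a weight. Illustrative only.
WEIGHTS = {}
for i, ch in enumerate(string.digits + string.ascii_lowercase):
    WEIGHTS[ch] = i
    WEIGHTS[ch.upper()] = i


def unicode_compare(s, t):
    """Stand-in for a standard Unicode comparison (assumption):
    compares by code point order and returns -1, 0, or 1."""
    return (s > t) - (s < t)


def collate(s, t):
    """Return -1, 0, or 1 for the collation order of s and t."""
    i = 0
    n = min(len(s), len(t))
    while True:
        # Advance to the first position p at which the strings differ.
        while i < n and s[i] == t[i]:
            i += 1
        if i == n:
            # One string is a prefix of the other (or they are identical).
            # Assumption: the shorter string sorts first.
            return (len(s) > len(t)) - (len(s) < len(t))
        v, w = WEIGHTS.get(s[i]), WEIGHTS.get(t[i])
        if v is None or w is None:
            # A character is missing from the table: fall back to the
            # Unicode comparison of the suffixes starting at position p.
            return unicode_compare(s[i:], t[i:])
        if v != w:
            return -1 if v < w else 1
        # Equal weights: continue with the suffixes after position p.
        i += 1


words = sorted(["banana", "Apple", "cherry"], key=cmp_to_key(collate))
```

With this table, `words` comes out as `["Apple", "banana", "cherry"]`, because `A` and `a` share a weight that is lower than that of `b`. Only strings containing characters outside the table pay the cost of the fallback comparison, which is the point of a hybrid approach for mostly-ASCII data.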