Title: INDEXING AND SEARCHING IDEOGRAPHIC CHARACTERS ON A NETWORKED SYSTEM OF COMPUTERS
Abstract: The system and method allow the retrieval, indexing and searching of information stored on computers connected by a communications network, where that information comprises ideographic, logographic or pictographic characters that are encoded using two bytes per character. The binary value that encodes a particular character in a given document is converted into hexadecimal text format, which is then prefixed with a predetermined marker character to indicate that it is the hexadecimal value of a double-byte character. That value is then added to a sequential string of such values, one for each such character in the document. The marker characters are then removed from this string, leaving a series of alphanumeric characters separated at set intervals by blank spaces. Each set of characters demarcated by a blank space is then indexed as if it were a standard word such as an English word, albeit a meaningless one. A unique index entry is created for each such word and phrase (up to a predetermined combination of such words) that the search engine encounters. Each entry incorporates positional data pointing to the location, on the networked system of computers, of every occurrence of that word or phrase that the search engine has encountered. Search queries are then met by retrieving the positional data associated with each character or sequence of characters in the query, to determine whether any occurrence of those characters encountered by the search engine meets the user's criteria.
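The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The marker character `_`, the GB2312 codec, the phrase limit of 2, and all function names are assumptions chosen for illustration, since the abstract names none of them. Single-byte characters are skipped here because the abstract only covers double-byte characters.

```python
from collections import defaultdict

MARKER = "_"        # hypothetical marker; the abstract does not name one
CODEC = "gb2312"    # assumed double-byte encoding, for illustration only


def encode_double_byte(text):
    """Return MARKER + 4 hex digits for every two-byte character in text."""
    parts = []
    for ch in text:
        raw = ch.encode(CODEC, errors="ignore")
        if len(raw) == 2:  # skip single-byte and unencodable characters
            parts.append(MARKER + raw.hex().upper())
    return "".join(parts)


def to_tokens(marked):
    """Swap each marker for a blank, leaving space-separated hex 'words'."""
    return marked.replace(MARKER, " ").split()


def build_index(docs, max_phrase=2):
    """Map each hex word, and each phrase of up to max_phrase words,
    to a list of (doc_id, position) pairs."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        tokens = to_tokens(encode_double_byte(text))
        for n in range(1, max_phrase + 1):
            for pos in range(len(tokens) - n + 1):
                index[" ".join(tokens[pos:pos + n])].append((doc_id, pos))
    return index


def search(index, query, max_phrase=2):
    """Return (doc_id, position) pairs where the query characters occur in order."""
    tokens = to_tokens(encode_double_byte(query))
    if not tokens:
        return []
    if len(tokens) <= max_phrase:
        return list(index.get(" ".join(tokens), []))
    # Longer queries: require each later token at the next consecutive position.
    hits = []
    for doc_id, pos in index.get(tokens[0], []):
        if all((doc_id, pos + i) in index.get(tokens[i], [])
               for i in range(1, len(tokens))):
            hits.append((doc_id, pos))
    return hits


docs = {"a": "中文检索", "b": "检索系统"}
index = build_index(docs)
print(to_tokens(encode_double_byte("中文")))
print(search(index, "检索"))
```

Because each character becomes an ordinary alphanumeric token, a word-oriented index can store and look up these tokens without any knowledge of the underlying script. The positional pairs are what allow a multi-character query to be checked for adjacency.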
Publication number: WO2001090930(A1) Publication date: 2001.11.29
Application number: AU2001000612 Filing date: 2001.05.24