Abstract
<p>The system and method allow information stored on computers connected by a communications network to be retrieved, indexed and searched when that information comprises ideographic, logographic or pictographic characters encoded using two bytes per character. The binary value that encodes each such character in a given document is converted into hexadecimal text and prefixed with a predetermined marker character, which indicates that it is the hexadecimal value of a double-byte character. The marked value is then appended to a sequential string of such values, one for each such character in that document. The marker characters are then removed from this string, leaving a series of alphanumeric characters separated at set intervals by blank spaces. Each group of characters delimited by blank spaces is then indexed as if it were an ordinary word, such as an English word, albeit a meaningless one. A unique index entry is created for each such word, and for each phrase of such words (up to a predetermined number of words), that the search engine encounters. Each entry incorporates positional data pointing to the location, within the networked computer system, of every occurrence of that word or phrase the search engine has encountered. A search query is answered by retrieving the positional data associated with each character or sequence of characters in the query and determining whether any encountered occurrence of those characters meets the user's criteria.</p>
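<p>A minimal Python sketch of the pipeline summarized above, under several assumptions not stated in the source: UTF-16-BE stands in for the two-byte encoding, "~" is a hypothetical marker character, dictionary keys such as "doc1" stand in for locations on the network, and phrases are indexed up to a hypothetical limit of two words. All function names are illustrative. Queries longer than the phrase limit are answered here by intersecting single-word postings at consecutive positions, which is one possible strategy and not necessarily the one claimed.</p>

```python
from collections import defaultdict

MARKER = "~"  # hypothetical marker character; the source does not name one


def to_marked_hex(text, encoding="utf-16-be"):
    """Convert each character to a marker-prefixed hex token, joined by spaces.

    Assumes a two-byte encoding; UTF-16-BE is used as a stand-in.
    """
    tokens = []
    for ch in text:
        raw = ch.encode(encoding)
        # Reject characters that do not encode to exactly two bytes
        # (e.g. non-BMP characters under UTF-16-BE).
        if len(raw) != 2:
            raise ValueError(f"{ch!r} is not a two-byte character in {encoding}")
        tokens.append(MARKER + raw.hex().upper())
    return " ".join(tokens)


def strip_markers(marked):
    """Remove the markers, leaving hex 'words' separated by blank spaces."""
    return marked.replace(MARKER, "")


def hex_words(text):
    """Encode, mark, unmark and split a text into its list of hex words."""
    return strip_markers(to_marked_hex(text)).split()


def build_index(docs, max_phrase_len=2):
    """Map each hex word, and each phrase of up to max_phrase_len words,
    to a list of (doc_id, position) postings."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        words = hex_words(text)
        for pos in range(len(words)):
            for n in range(1, max_phrase_len + 1):
                if pos + n <= len(words):
                    index[" ".join(words[pos:pos + n])].append((doc_id, pos))
    return index


def search(index, query, max_phrase_len=2):
    """Return sorted (doc_id, position) pairs where the query sequence starts."""
    words = hex_words(query)
    if not words:
        return []
    # Short queries are looked up directly as a word or phrase entry.
    if len(words) <= max_phrase_len:
        return sorted(index.get(" ".join(words), []))
    # Longer queries: keep first-word hits whose later words appear
    # at the expected consecutive positions in the same document.
    hits = set(index.get(words[0], []))
    for offset, w in enumerate(words[1:], start=1):
        hits &= {(d, p - offset) for d, p in index.get(w, [])}
    return sorted(hits)


if __name__ == "__main__":
    idx = build_index({"doc1": "中文", "doc2": "文中文"})
    print(to_marked_hex("中文"))
    print(search(idx, "中文"))
    print(search(idx, "文中文"))
```

<p>Because every hex token has a fixed width and no internal spaces, an ordinary whitespace tokenizer built for alphabetic text can index it unchanged, which is the point of converting characters into pseudo-words.</p>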