摘要 |
Provided is an indexing system for structured or semi-structured source data comprising a tokenizer for accepting source data and generating tokens representing the source data, the tokens from the tokenization representing the source data in a relational view, where for tokens representing a subset of the source data, the system generates tokens identifying the table and column of the subset of the data in the relational view of the source data, and an index builder for building index structures based on the tokens generated by the tokenizer, the index builder creating indexes which comprise a set of positional indexes for indicating the position of token data in the source data, a set of lexicographical indexes comprising a sort vector index and a join bit index, associated with the sort vector index, a set of data structures mapping between the lexicographical indexes and the positional indexes.
|