Title: Event-level parallel methods and apparatus for XML parsing
Abstract: Embodiments of techniques and systems for parallel XML parsing are described. An event-level XML parser may include a lightweight events partitioning stage, parallel events parsing stages, and a post-processing stage. The events partition may pick out event boundaries using single-instruction, multiple-data instructions to find occurrences of the “<” character, marking event boundaries. Subsequent checking may be performed to help identify other event boundaries, as well as non-boundary instances of the “<” character. During events parsing, unresolved items, such as namespace resolution or matching of start and end elements, may be recorded in structure metadata. This structure metadata may be used during the subsequent post-processing to perform a check of the XML data. If the XML data is well-formed, individual sub-event streams formed by the events parsing processes may be assembled into a flat result event stream structure. Other embodiments may be described and claimed.
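The partitioning stage described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the SIMD scan is replaced here by Python's `str.find`, and the function name `find_event_boundaries` is hypothetical. The sketch only shows the idea of treating each “<” as a candidate boundary and then skipping “<” characters that sit inside comments, CDATA sections, or processing instructions.

```python
def find_event_boundaries(xml: str) -> list[int]:
    """Return offsets of '<' characters that start an XML event.

    Assumption: str.find stands in for the SIMD character scan.
    """
    boundaries = []
    pos = xml.find("<")
    while pos != -1:
        boundaries.append(pos)
        # Constructs whose body may contain '<' as plain character data.
        if xml.startswith("<!--", pos):
            end, skip = xml.find("-->", pos + 4), 3
        elif xml.startswith("<![CDATA[", pos):
            end, skip = xml.find("]]>", pos + 9), 3
        elif xml.startswith("<?", pos):
            end, skip = xml.find("?>", pos + 2), 2
        else:
            # Ordinary start or end tag: resume right after this '<'.
            end, skip = pos, 1
        if end == -1:
            break  # unterminated construct: left for a later stage to report
        pos = xml.find("<", end + skip)
    return boundaries


sample = "<a><!-- x < y --><b><![CDATA[1 < 2]]></b></a>"
print(find_event_boundaries(sample))  # [0, 3, 17, 20, 37, 41]
```

The sample contains eight “<” characters, but only six mark event boundaries; the two inside the comment and the CDATA section are ignored.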
Publication number: US8838626(B2)  Publication date: 2014.09.16
Application number: US200912641234  Filing date: 2009.12.17
Applicant: Intel Corporation  Inventors: Yu Zhiqiang; Fang Yuejian; Zhai Lei; Wang Yun; Wu Zhonghai; Dai Mo
Classification: G06F17/30; G06F17/27; G06F17/22  Main classification: G06F17/30
Agency: Schwabe, Williamson & Wyatt, P.C.  Agent: Schwabe, Williamson & Wyatt, P.C.
Principal claim: 1. A computer-implemented method for parsing XML data, the method comprising: partitioning, by an events partitioning module of a computing device, the XML data into a plurality of XML chunks having a plurality of XML events contained therein, wherein partitioning includes determining a type of event associated with individual XML events of the plurality of XML events and ignoring character data contained within one or more XML events based, at least in part, on the type of event to prevent identification of character data contained within the one or more XML events from being identified as an XML event; parsing, by a plurality of instances of an events parsing module of the computing device, the plurality of chunks in parallel into sub-event streams, wherein parsing includes creating structure metadata to identify unresolved items in the sub-event streams to avoid a parsing error based on the unresolved items, wherein the unresolved items include one or more of an identity of an unresolved start element, an identity of an unresolved end element, or an identity of an unresolved prefix; and generating, by a post processing module of the computing device, a result event stream for the XML data from the sub-event streams, wherein generating the result event stream includes resolving an unresolved end element identified in the structure metadata with a preceding unresolved start element identified in the structure metadata or resolving an unresolved prefix identified in the structure metadata with a namespace of a preceding start element to avert the need to reparse the XML chunks that produced the unresolved item.
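The interplay between per-chunk structure metadata and post-processing in the claim can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the names `ChunkMetadata`, `parse_chunk`, and `resolve` are hypothetical, events are simplified to `("start", name)` and `("end", name)` tuples, and namespace prefix resolution is omitted. Each chunk records the end elements it could not match locally and the start elements it left open; post-processing then matches them across chunks without reparsing any chunk.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkMetadata:
    # End elements with no matching start inside the chunk.
    unresolved_ends: list[str] = field(default_factory=list)
    # Start elements still open when the chunk ends.
    unresolved_starts: list[str] = field(default_factory=list)


def parse_chunk(events: list[tuple[str, str]]) -> ChunkMetadata:
    """Parse one chunk in isolation, deferring cross-chunk matches."""
    meta = ChunkMetadata()
    open_stack: list[str] = []
    for kind, name in events:
        if kind == "start":
            open_stack.append(name)
        elif open_stack:
            if open_stack.pop() != name:
                raise ValueError(f"mismatched end element: {name}")
        else:
            # No local start to match: record instead of failing.
            meta.unresolved_ends.append(name)
    meta.unresolved_starts = open_stack
    return meta


def resolve(metas: list[ChunkMetadata]) -> bool:
    """Match unresolved ends against preceding unresolved starts."""
    pending: list[str] = []
    for meta in metas:
        for name in meta.unresolved_ends:
            if not pending or pending.pop() != name:
                return False
        pending.extend(meta.unresolved_starts)
    return not pending  # every start element must be closed


chunks = [
    [("start", "a"), ("start", "b")],
    [("end", "b"), ("start", "c"), ("end", "c")],
    [("end", "a")],
]
# Each parse_chunk call is independent, so the calls could run in parallel.
metas = [parse_chunk(chunk) for chunk in chunks]
print(resolve(metas))  # True
```

Because `resolve` reads only the metadata, the cost of the well-formedness check depends on the number of unresolved items rather than on the size of the XML chunks.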
Address: Santa Clara CA US