    XML Parser Benchmarks (Part 1). Parsers tested: LIBXML2, Java 1.5 Default SAX, Woodstox, Sun SJSXP, BEA StAX, Javolution, Oracle StAX
    Posted by admin (W3China webmaster)

    XML Parser Benchmarks: Part 1
    By Matthias Farwick, Michael Hafner
    May 10, 2007
    Five years after the introduction of SOAP 1.0, XML parsing is still the main bottleneck in web service performance. In search of components for a high-performance web service security solution, we ran benchmarks for various XML parsers in Java and C. These benchmarks cover event-driven parser models like SAX and StAX, object model parsers like DOM, and also new breeds of XML parsers like Apache's AXIOM, which builds only parts of the document tree in memory.

    Our intention was to find the right components for our high-performance web service security gateway, so that it could run on a small dedicated appliance. The limited resources of such a device brought the C tests into the game, since the Java virtual machine alone already needs a lot of memory. Object model parsers are the most important parser type in the context of web service security because they can be used to alter an XML document in memory. In this first part of a two-part series, we present our benchmark results for event-driven parsers like SAX and StAX, because the object model parsers use them to read documents, so they largely determine the performance of the object model parsers. First, we will give you a quick overview of the XML parser jungle.

    Recap of XML Parser Types
    Generally, there are two types of XML parsers. The first type consists of push- and pull-parsers, which simply read an XML document and return its data and structure (e.g., SAX and StAX). Both are event-driven parsers because they return events that the developer has to handle. Push-parser implementations like SAX (Simple API for XML) return the data of the whole document in one stream and cannot be stopped (short of throwing an exception in Java). Pull-parsers, on the other hand, only return data when they are asked to read the next node in a document. StAX (Streaming API for XML) is a pull-parser specification for Java defined in [URL=http://jcp.org/en/jsr/detail?id=173]JSR 173[/URL].
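
    The difference between the two event-driven styles can be sketched with the SAX and StAX APIs that ship with the JDK. This is only an illustration, not code from the benchmark: the class name, method names, and sample document are made up for this example. With SAX the parser drives and calls back for every element; with StAX the caller drives and can simply stop asking for events.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class PushVsPull {
    // Hypothetical sample document, not taken from the benchmark suite.
    static final String SAMPLE =
            "<order id=\"1\"><item>pen</item><item>ink</item></order>";

    /** Push style: the SAX parser walks the whole document and calls back. */
    static int countElementsWithSax(String xml) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++; // invoked by the parser for every start tag
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return count[0];
    }

    /** Pull style: the caller asks for events and may stop at any point. */
    static String firstElementWithStax(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        // Stop early: no further events are requested.
                        return reader.getLocalName();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("StAX parsing failed", e);
        }
    }
}
```

    Calling PushVsPull.countElementsWithSax(PushVsPull.SAMPLE) visits all three elements, while PushVsPull.firstElementWithStax(PushVsPull.SAMPLE) returns after the root element without requesting the rest of the document.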

    The second type of XML parser is the object model parser (e.g., DOM and Apache AXIOM), which not only reads the data but also constructs an in-memory representation of the document that can be altered. Since DOM parsers mostly use SAX parsers to read in documents, and a SAX parser always pushes the whole document, the object model of a document is always built completely. This is a performance limitation if only data at the beginning of a document needs to be read and altered. New approaches like Apache's AXIOM use StAX pull-parser implementations to overcome this limitation. AXIOM builds the tree representation of a document only up to the last node that was requested. Therefore, it does not need to read the complete document.
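
    To make "can be altered" concrete, here is a minimal sketch using the DOM implementation bundled with the JDK. It is an assumption-laden illustration, not the setup used in the benchmark: the class name, the method name, and the "status" attribute are hypothetical. Note that the whole tree is built before the first change can be made.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class DomAlterSketch {
    /** Parses xml into a DOM tree, adds a "status" attribute to the root, and serializes it. */
    static String addStatus(String xml, String status) {
        try {
            // The complete document is read and the full tree is built here.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Alter the in-memory representation.
            Element root = doc.getDocumentElement();
            root.setAttribute("status", status);

            // Serialize the altered tree back to text with an identity transform.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("DOM processing failed", e);
        }
    }
}
```

    For example, DomAlterSketch.addStatus("<order><item>pen</item></order>", "signed") returns the same document with the extra attribute on the root element.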

    In this first part of the series we will talk about the performance of the reading parsers. Since these parsers are used by the object model parsers to read in the data, we can already make assumptions about the performance of the corresponding object model parsers.

    The Tested Event-Driven Parsers
    [URL=http://www.xmlsoft.org/]LIBXML2[/URL] Stream Pull-Parser + SAX-like 2.6.27 (C): LIBXML2 is a C library that provides several APIs for XML processing and manipulation. Besides a DOM-like implementation it also provides a streaming pull-parser and a SAX-like interface. The latter is used to read in the data for the DOM-like parser.
    [URL=http://java.sun.com/j2se/1.5.0/docs/guide/xml/jaxp/index.html]Java 1.5 Default SAX[/URL] (Java): This parser is the default SAX parser in Java 1.5.
    [URL=http://xircles.codehaus.org/projects/woodstox]Woodstox[/URL] StAX Pull-Parser 3.1.0 (Java): Woodstox is a JSR 173-conforming StAX parser implementation. It was created by the open source community Codehaus and is tightly coupled with its SOAP engine, XFire.
    Sun SJSXP StAX Pull-Parser 1.0-b26 (Java): The SJSXP is Sun's implementation of the JSR 173 StAX specification. It is shipped with the Java 6 SDK.
    [URL=http://dev2dev.bea.com/xml/stax.html]BEA StAX[/URL] implementation 1.1.2 (Java): This is a JSR 173 implementation by BEA.
    [URL=http://javolution.org/]Javolution[/URL] StAX-like implementation 4.0 (Java): Javolution is an open source project that aims to enhance the performance of the Java base library. It provides a StAX-like XML parser that does not fully comply with JSR 173.
    [URL=http://www.oracle.com/technology/tech/xml/xdk/staxpreview.html]Oracle StAX[/URL] implementation XDK 10.1.0.1 (Java): A JSR 173 implementation by Oracle.

    The Test Environment
    The main benchmark tool that we used is a modified version of Sun's XMLTest. It lets you define test suites that describe which parsers are tested with which documents. On execution, it measures how many documents a parser processes in a specified period of time and calculates the throughput per second. The biggest modification was the inclusion of the external C benchmarks in the tests. Those benchmarks were inspired by the [URL=http://xmlbench.sourceforge.net/]xmlbench[/URL] benchmark tool, which is licensed under the GNU General Public License. All C tests in this part of the series were compiled with the GNU GCC compiler. Each Java benchmark was executed with the -server option, which selects the HotSpot server VM. All tests were run on a Fujitsu Siemens S Series notebook with a 1.70 GHz Intel processor and 1 GB RAM.

    The Benchmark Execution
    The aim of our benchmarks was to measure how many documents a parser can process in a given time. Processing means that the parser walks through the whole document, counts the elements and attributes, and sums the length of the text content. This way we were able to verify that each parser performs the same walk through the document. We measured the throughput for 15 seconds after a 5-second warm-up phase for each parser. XMLTest presents the results as bar charts with the throughput per second on the Y axis and the different parsers on the X axis. The XML documents that we used are the purchase order XML files provided with XMLTest. These documents have a maximum depth of 6 and an almost equal number of elements and attributes.
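
    The procedure described above can be approximated in a few lines of Java. The following is a rough sketch, not the XMLTest code: the class, its methods, and the choice of the JDK's default StAX parser are assumptions made for illustration. It walks a document, counts elements and attributes, sums the text length, and reports documents per second after a warm-up phase.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class ThroughputSketch {
    /** Totals gathered during one walk through a document. */
    static final class Counts {
        int elements;
        int attributes;
        int textLength;
    }

    /** Walks the whole document once and gathers the three totals. */
    static Counts walk(XMLInputFactory factory, String xml) {
        Counts counts = new Counts();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    switch (reader.next()) {
                        case XMLStreamConstants.START_ELEMENT:
                            counts.elements++;
                            counts.attributes += reader.getAttributeCount();
                            break;
                        case XMLStreamConstants.CHARACTERS:
                        case XMLStreamConstants.CDATA:
                            counts.textLength += reader.getTextLength();
                            break;
                        default:
                            break; // other event types are not counted
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("StAX parsing failed", e);
        }
        return counts;
    }

    /** Parses xml repeatedly for the given time and returns the number of walks. */
    private static long runFor(XMLInputFactory factory, String xml, long millis) {
        long deadline = System.nanoTime() + millis * 1_000_000L;
        long documents = 0;
        while (System.nanoTime() < deadline) {
            walk(factory, xml);
            documents++;
        }
        return documents;
    }

    /** Warm-up first (result discarded), then the timed measurement. */
    static double documentsPerSecond(String xml, long warmUpMillis, long measureMillis) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        runFor(factory, xml, warmUpMillis);
        long start = System.nanoTime();
        long documents = runFor(factory, xml, measureMillis);
        double seconds = (System.nanoTime() - start) / 1e9;
        return documents / seconds;
    }
}
```

    A call such as ThroughputSketch.documentsPerSecond(xml, 5_000, 15_000) mirrors the 5-second warm-up and 15-second measurement described above. Comparing the Counts returned by different parsers for the same document is one way to check that they performed the same walk.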

    The Event Parser Benchmarks
    First, we will show you the benchmark results for the push-parsers and one StAX implementation.

    Figure 1: Benchmark results for event-driven XML parsers and small documents

    Figure 2: Benchmark results for event-driven XML parsers and large documents

    In Figures 1 and 2 you can see that the LIBXML2 SAX-like parser (red) does much better than all other implementations. This implies that the LIBXML2 object model parser has an advantage over the other implementations because it uses the LIBXML2 SAX parser to read in the documents. Unfortunately, the LIBXML2 SAX-like parser has a very complex interface. In addition, as with most C XML parsers, a great deal of attention has to be paid to memory management. In second place is the Woodstox StAX implementation (yellow). The LIBXML2 stream pull-parser (blue) and the Java 1.5 default SAX implementation (green) show almost even results.




    StAX Parser Benchmark
    Many recent Java XML applications (for example, Apache's AXIOM) let you choose the StAX implementation. Since there are already a handful of StAX implementations out there, we compared their reading performance in the following benchmarks.
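
    As a side note on how such a choice is typically made: the standard javax.xml.stream.XMLInputFactory.newInstance() lookup honors, among other mechanisms, a system property named javax.xml.stream.XMLInputFactory. The small sketch below (class and method names are made up for this example) prints which implementation the lookup resolves to on the current classpath. It is an illustration only and not part of the benchmark setup.

```java
import javax.xml.stream.XMLInputFactory;

final class StaxImplementationSketch {
    /** Returns the class name of the StAX factory that the standard lookup resolves to. */
    static String resolvedImplementation() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("StAX implementation in use: " + resolvedImplementation());
    }
}
```

    Assuming the Woodstox jar is on the classpath, running this with -Djavax.xml.stream.XMLInputFactory=com.ctc.wstx.stax.WstxInputFactory should report the Woodstox factory. Without the property, the output depends on what the JDK and the classpath provide.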


    Figure 3: Benchmark results for the StAX parsers and small documents

    Figure 4: Benchmark results for the StAX parsers and medium-sized documents

    Figure 5: Benchmark results for the StAX parsers and large documents

    Figures 3-5 show the benchmarks of the five different StAX implementations. In all but the last benchmark, the Javolution and Woodstox parsers deliver the best results. The Sun SJSXP lags behind for small documents but outperforms all other parsers for the very large 4 MB XML file. The BEA implementation is slightly better than the SJSXP for small documents, but for XML files bigger than 10 KB it is overtaken by the SJSXP. Oracle's StAX implementation ranks last on the two biggest documents, where it performs on par with the BEA implementation.

    Conclusions
    From the results of the benchmarks we can see that there are big performance differences between the parser implementations. Overall, the SAX-like implementation of LIBXML2 in C performs best in all benchmarks. For most document sizes it had one-third to twice as much throughput as its competitors. This is interesting because, as we will see in the next part of this series, the LIBXML2 DOM implementation in C uses this parser to read in data and therefore already has a performance advantage over the other object model parsers in Java. A clear drawback of this parser is the complexity of its interface. Its callback interface requires handling void pointers and double pointers, which is a far cry from the rather intuitive Java StAX interfaces.

    Javolution and Woodstox are the winners among the StAX parsers. Woodstox has the advantage of being a JSR 173-conforming StAX parser, which makes it usable for more applications.

    In the next part of this series we will look at the results of the object model parser benchmarks and see whether any Java parser can beat the performance of the LIBXML2 object model parser in C. This will lead to our final conclusion about which XML parser to use for our high-performance web service gateway.

    Additional Resources
    [URL=http://jcp.org/en/jsr/detail?id=173]The StAX specification JSR 173[/URL]
    Sun's XMLTest XML parser benchmark tool
    [URL=http://xmlbench.sourceforge.net/]xmlbench[/URL], an XML parser benchmark tool in C
    Sun's [URL=http://java.sun.com/performance/reference/whitepapers/StAX-1_0.pdf]StAX benchmark[/URL] with XMLTest


    Posted: 2007/5/11 10:02:00
     
    小妖漏掉的沙 (reply 2, posted 2009/5/31 17:44:00):
    A good topic, thank you very much. But there are so many XML parsers, and only a small part of them is listed here.
     
    zzh2277 (reply 3, posted 2009/6/5 17:40:00):
    Is there no translated version?
     
    waf17 (reply 4, posted 2010/1/22 15:37:00):
    Uh, so it's the English version, huh~
     
    内蒙小汉2011 (reply 5, posted 2011/2/19 22:38:00):
    Thanks a lot! The libxml2 mentioned here fits my needs, so I'll download it and give it a try. Much appreciated!
     