     * Topic: CambridgeDocs - a Word to XML tool
     Posted by admin (original poster)

    CambridgeDocs Technology Overview:
    Driver for Microsoft Word

    Paragraph Content (text)
    Styles Information (style of text of paragraph and of text runs within paragraphs)
    Formatting information (font, font-color, font-size) of paragraph and of text runs within paragraphs that deviate from "Style" setting
    Paragraph Format Information (leftindent, rightindent, spacebefore, spaceafter)
    Frames (text frames can be extracted as a block-level <FRAME> tag, which has the contents within it). Specific information about the location of the frame on the page (x,y coordinates) can be extracted (if Pagination = true).
    Images (bitmap images within the Word document can be extracted). Specific information about the location of the image on the page (x,y coordinates) can be set (if Pagination = true).
    Superscripted text in-line is extracted and noted, including reference to footnotes.
    WordArt (extracted as WMF files)
    Lists - numbered lists and bulleted lists are identified
    Page Breaks - hard page breaks are inserted as block level items
    Word Fields - fields can be extracted either as plain text or as in-line <FIELD> tags that carry the field code as well as the content of the field.
    Tables - table information is extracted, including background color, column-widths, row-height, colspan, rowspan, table-border (at the level of each cell), including border-color.
    Pagination - pagination can be set to true, in which case the entire document is divided into <PAGE> tags.
    Footnotes and EndNotes can be extracted (they all become endnotes in the XML version of the document and are automatically renumbered)
    Page Headers and Footers can be extracted, as <HEADER> and <FOOTER> elements.
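
To make the element names above more concrete, here is a minimal Python sketch that walks a hand-written sample of paginated output. Only the tag names <PAGE>, <FRAME>, <FIELD>, <HEADER> and <FOOTER> come from the overview. The <DOCUMENT> root, the <P> element, and the x, y and code attributes are assumptions made for illustration and may not match the real output format.

```python
import xml.etree.ElementTree as ET

# Hand-written sample. PAGE, FRAME, FIELD, HEADER and FOOTER are named in the
# overview; DOCUMENT, P and every attribute name here are hypothetical.
SAMPLE = """
<DOCUMENT>
  <PAGE>
    <HEADER>Quarterly Report</HEADER>
    <P>Revenue grew in <FIELD code="DATE">2005</FIELD>.</P>
    <FRAME x="72" y="144"><P>Sidebar text</P></FRAME>
    <FOOTER>Page 1</FOOTER>
  </PAGE>
  <PAGE>
    <HEADER>Quarterly Report</HEADER>
    <P>Second page body.</P>
    <FOOTER>Page 2</FOOTER>
  </PAGE>
</DOCUMENT>
"""

def summarize(xml_text):
    """Return one summary dict per <PAGE> element."""
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.iter("PAGE"):
        pages.append({
            "header": page.findtext("HEADER"),
            "footer": page.findtext("FOOTER"),
            # Frame positions as (x, y) tuples, read from the assumed attributes.
            "frames": [(int(f.get("x")), int(f.get("y"))) for f in page.iter("FRAME")],
            # Field codes found anywhere on the page.
            "fields": [f.get("code") for f in page.iter("FIELD")],
        })
    return pages

PAGES = summarize(SAMPLE)
```

Because each <PAGE> is a self-contained subtree in this sketch, per-page headers, footers and positioned frames can be read without tracking any state between pages.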

    The Microsoft Word driver built by CambridgeDocs is designed to extract as much information as possible from a Microsoft Word file (.doc, .rtf, or other) into XML. This includes the content, the formatting and style information, layout information, and graphics information. We refer to this as "non-lossy" because many of our customers use XML for multi-channel publishing: after converting to XML, they may want to convert again to HTML, PDF, and other formats.

    Depending on your needs, you can set options on or off for specific bits of information.  Our XML conversion also includes a pagination option, which preserves the pagination of the original document (especially useful for pages which have text frames and images positioned exactly on the page).
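
One way to picture these switches is as filters over the extracted tree. The Python sketch below is not the product's API: the function `apply_options`, its keyword names, and the <DOCUMENT> and <P> elements are all hypothetical. Only the <HEADER>, <FOOTER>, <FRAME> and <PAGE> tag names appear in the overview.

```python
import xml.etree.ElementTree as ET

def apply_options(root, headers=True, footers=True, frames=True):
    """Remove elements whose (hypothetical) option is switched off, in place."""
    unwanted = set()
    if not headers:
        unwanted.add("HEADER")
    if not footers:
        unwanted.add("FOOTER")
    if not frames:
        unwanted.add("FRAME")
    # Snapshot the elements first so the tree is not modified while iterating.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in unwanted:
                parent.remove(child)
    return root

doc = ET.fromstring(
    "<DOCUMENT><PAGE><HEADER>Title</HEADER><P>Body</P>"
    "<FRAME><P>Aside</P></FRAME><FOOTER>1</FOOTER></PAGE></DOCUMENT>"
)
apply_options(doc, headers=False, footers=False)
REMAINING = [el.tag for el in doc.iter()]
```

In this example the header and footer are dropped while the frame and body paragraph are kept.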

    Word Driver FAQs

    What XML format can I convert my Word documents into?

    The driver initially converts into ppXML, our "intermediate format". You can then convert into any other XML schema you like, including DocBook, LegalXML, or your own custom DTD/schema, either with an XSLT or with the extraction and transformation rules of the xDoc Converter, our flagship product.
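
In practice this second step would be an XSLT stylesheet or a set of xDoc Converter rules. As a rough stand-in, the Python sketch below uses the standard-library ElementTree module rather than XSLT to map a made-up ppXML fragment onto DocBook's article, title and para elements. The input vocabulary (the <DOCUMENT> root, the <P> element and its style attribute) is an assumption, since the post does not describe the ppXML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: DOCUMENT, P and the "style" attribute are assumed names.
PPXML = """
<DOCUMENT>
  <P style="Heading 1">Installation</P>
  <P style="Normal">Unpack the archive.</P>
  <P style="Normal">Run the installer.</P>
</DOCUMENT>
"""

def to_docbook(ppxml_text):
    """Map styled paragraphs to a DocBook <article> with <title> and <para>."""
    source = ET.fromstring(ppxml_text)
    article = ET.Element("article")
    for p in source.iter("P"):
        # Treat the first "Heading 1" paragraph as the article title.
        is_title = p.get("style") == "Heading 1" and article.find("title") is None
        target = ET.SubElement(article, "title" if is_title else "para")
        # itertext() flattens any inline children into plain text.
        target.text = "".join(p.itertext()).strip()
    return article

DOCBOOK = ET.tostring(to_docbook(PPXML), encoding="unicode")
```

The same style-to-element mapping is what an XSLT template rule would express declaratively. A real conversion would also need to handle sections, lists and tables.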

    What format can I render it into?

    We provide an XSLT that converts it further into XHTML so that it can be viewed in a browser. You can see this in action in the "View as HTML" tab of the RUN/DEBUG window in the xDoc Converter, or by applying the XSLT in the XMLSpy plug-in.

    We also provide an XSLT that transforms ppXML into XSL-FO, which can be used to create PDF files, RTF files, etc.
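
For readers who have not seen XSL formatting objects, the Python sketch below builds the smallest useful document of that kind: one page master, one page sequence, and one block per paragraph. It is only an illustration of the target format, not the stylesheet shipped with the product. The function name `paragraphs_to_fo`, the "A4" master name and the page dimensions are choices made for this example.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"  # standard XSL-FO namespace
ET.register_namespace("fo", FO_NS)

def fo(tag):
    """Return the namespace-qualified name for an fo: element."""
    return f"{{{FO_NS}}}{tag}"

def paragraphs_to_fo(paragraphs):
    """Wrap plain-text paragraphs in a minimal formatting-objects tree."""
    root = ET.Element(fo("root"))
    # Page geometry: a single page master with a body region.
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "A4", "page-height": "29.7cm", "page-width": "21cm",
    })
    ET.SubElement(page, fo("region-body"))
    # Content: one page sequence flowing into the body region.
    seq = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(seq, fo("flow"), {"flow-name": "xsl-region-body"})
    for text in paragraphs:
        block = ET.SubElement(flow, fo("block"), {"space-after": "6pt"})
        block.text = text
    return root

FO_DOC = paragraphs_to_fo(["First paragraph.", "Second paragraph."])
FO_TEXT = ET.tostring(FO_DOC, encoding="unicode")
```

A formatting-objects processor would take the serialized result as input to produce paginated output such as PDF.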

    Can I do a two-way conversion back into Word?

    Yes, you can do a two-way conversion: from Word into XML, and then from XML back into Word using our XSL-FO and RTF rendering capabilities. The xDoc Submit plug-in for Word will have this functionality built in. However, because of some limitations of XSL-FO rendering engines, you may not be able to convert some of the more advanced features of the Word driver both ways.

    Posted: 2005/2/24 0:13:00
     Post 2: zhangshying, 2005/4/25 16:50:00
     Post 3: cxh0926, 2005/5/5 22:18:00