新书推介:《语义网技术体系》
作者:瞿裕忠,胡伟,程龚
   XML论坛     W3CHINA.ORG讨论区     计算机科学论坛     SOAChina论坛     Blog     开放翻译计划     新浪微博  
 
  • 首页
  • 登录
  • 注册
  • 软件下载
  • 资料下载
  • 核心成员
  • 帮助
  •   Add to Google

    >> 搜索引擎, 信息分类与检索, 语义搜索, Lucene, Nutch, GRUB, Larbin, Weka
    [返回] 中文XML论坛 - 专业的XML技术讨论区计算机技术与应用『 Web挖掘技术 』 → What is Web Mining ? 查看新帖用户列表

      发表一个新主题  发表一个新投票  回复主题  (订阅本版) 您是本帖的第 6590 个阅读者浏览上一篇主题  刷新本主题   树形显示贴子 浏览下一篇主题
     * 贴子主题: What is Web Mining ? 举报  打印  推荐  IE收藏夹 
       本主题类别:     
     DavidPotter 帅哥哟,离线,有人找我吗?
      
      
      等级:大三暑假(ITELS考了6.5分!)
      文章:150
      积分:852
      门派:Lilybbs.net
      注册:2006/3/7

    姓名:(无权查看)
    城市:(无权查看)
    院校:(无权查看)
    给DavidPotter发送一个短消息 把DavidPotter加入好友 查看DavidPotter的个人资料 搜索DavidPotter在『 Web挖掘技术 』 的所有贴子 点击这里发送电邮给DavidPotter 引用回复这个贴子 回复这个贴子 查看DavidPotter的博客楼主
    发贴心情 What is Web Mining ?

    quote from http://www.galeas.de/webmining.html
    Web Mining is the extraction of interesting and potentially useful patterns and implicit information from artifacts or activity related to the World­Wide Web. There are roughly three knowledge discovery domains that pertain to web mining: Web Content Mining, Web Structure Mining, and Web Usage Mining. Web content mining is the process of extracting knowledge from the content of documents or their descriptions. Web document text mining, resource discovery based on concepts indexing or agent­based technology may also fall in this category. Web structure mining is the process of inferring knowledge from the World­Wide Web organization and links between references and referents in the Web. Finally, web usage mining, also known as Web Log Mining, is the process of extracting interesting patterns in web access logs.
    Web Content Mining
    Web content mining is an automatic process that goes beyond keyword extraction. Since the content of a text document presents no machine­readable semantic, some approaches have suggested to restructure the document content in a representation that could be exploited by machines. The usual approach to exploit known structure in documents is to use wrappers to map documents to some data model. Techniques using lexicons for content interpretation are yet to come.
    There are two groups of web content mining strategies: Those that directly mine the content of documents and those that improve on the content search of other tools like search engines.


    Web Structure Mining
    World­Wide Web can reveal more information than just the information contained in documents. For example, links pointing to a document indicate the popularity of the document, while links coming out of a document indicate the richness or perhaps the variety of topics covered in the document. This can be compared to bibliographical citations. When a paper is cited often, it ought to be important. The PageRank and CLEVER methods take advantage of this information conveyed by the links to find pertinent web pages. By means of counters, higher levels cumulate the number of artifacts subsumed by the concepts they hold. Counters of hyperlinks, in and out documents, retrace the structure of the web artifacts summarized.


    Web Usage Mining
    Web servers record and accumulate data about user interactions whenever requests for resources are received. Analyzing the web access logs of di#erent web sites
    can help understand the user behaviour and the web structure, thereby improving the design of this colossal collection of resources. There are two main tendencies in Web Usage Mining driven by the applications of the discoveries: General Access Pattern Tracking and Customized Usage Tracking.
    The general access pattern tracking analyzes the web logs to understand access patterns and trends. These analyses can shed light on better structure and grouping of resource providers. Many web analysis tools existd but they are limited and usually unsatisfactory. We have designed a web log data mining tool, WebLogMiner, and proposed techniques for using data mining and OnLine Analytical Processing (OLAP) on treated and transformed web access files. Applying data mining techniques on access logs unveils interesting access patterns that can be used to restructure sites in a more efficient grouping, pinpoint effective advertising locations, and target specific users for specific selling ads.
    Customized usage tracking analyzes individual trends. Its purpose is to customize web sites to users. The information displayed, the depth of the site structure and the format of the resources can all be dynamically customized for each user over time based on their access patterns.
    While it is encouraging and exciting to see the various potential applications of web log file analysis, it is important to know that the success of such applications depends on what and how much valid and reliable knowledge one can discover from the large raw log data. Current web servers store limited information about the accesses. Some scripts custom­tailored for some sites may store additional information. However, for an effective web usage mining, an important cleaning and data transformation step before analysis may be needed.


       收藏   分享  
    顶(0)
      




    ----------------------------------------------
    Don‘t try so hard, the best things come when you least expect them to.

    点击查看用户来源及管理<br>发贴IP:*.*.*.* 2007/4/18 17:41:00
     
     DavidPotter 帅哥哟,离线,有人找我吗?
      
      
      等级:大三暑假(ITELS考了6.5分!)
      文章:150
      积分:852
      门派:Lilybbs.net
      注册:2006/3/7

    姓名:(无权查看)
    城市:(无权查看)
    院校:(无权查看)
    给DavidPotter发送一个短消息 把DavidPotter加入好友 查看DavidPotter的个人资料 搜索DavidPotter在『 Web挖掘技术 』 的所有贴子 点击这里发送电邮给DavidPotter 引用回复这个贴子 回复这个贴子 查看DavidPotter的博客2
    发贴心情 
    开始挣经验,当斑竹...

    ----------------------------------------------
    Don‘t try so hard, the best things come when you least expect them to.

    点击查看用户来源及管理<br>发贴IP:*.*.*.* 2007/4/18 17:44:00
     
     teng_t1986 帅哥哟,离线,有人找我吗?天秤座1986-10-22
      
      
      威望:1
      头衔:智能缔造者
      等级:计算机学士学位(版主)
      文章:368
      积分:2273
      门派:IEEE.ORG.CN
      注册:2006/4/8

    姓名:(无权查看)
    城市:(无权查看)
    院校:(无权查看)
    给teng_t1986发送一个短消息 把teng_t1986加入好友 查看teng_t1986的个人资料 搜索teng_t1986在『 Web挖掘技术 』 的所有贴子 点击这里发送电邮给teng_t1986 访问teng_t1986的主页 引用回复这个贴子 回复这个贴子 查看teng_t1986的博客3
    发贴心情 
    支持David Potter!
    我给你站内邮箱发邮件了:)

    ----------------------------------------------
    书山奋战不觉难,
    一刻光阴莫等闲。
    长路遥遥飞浩志,
    前尘洗却作泥丸。
    粗茶薄被心灯暖,
    明月清窗几案寒。
    欲待桂枝香万里,
    海阔天空俱欢颜。

    My blog:http://hi.baidu.com/tengteng2007

    点击查看用户来源及管理<br>发贴IP:*.*.*.* 2007/5/12 15:10:00
     
     teng_t1986 帅哥哟,离线,有人找我吗?天秤座1986-10-22
      
      
      威望:1
      头衔:智能缔造者
      等级:计算机学士学位(版主)
      文章:368
      积分:2273
      门派:IEEE.ORG.CN
      注册:2006/4/8

    姓名:(无权查看)
    城市:(无权查看)
    院校:(无权查看)
    给teng_t1986发送一个短消息 把teng_t1986加入好友 查看teng_t1986的个人资料 搜索teng_t1986在『 Web挖掘技术 』 的所有贴子 点击这里发送电邮给teng_t1986 访问teng_t1986的主页 引用回复这个贴子 回复这个贴子 查看teng_t1986的博客4
    发贴心情 
    其实你去申请一下应该没问题,这个版现在还没斑竹确实有点不大正常……

    ----------------------------------------------
    书山奋战不觉难,
    一刻光阴莫等闲。
    长路遥遥飞浩志,
    前尘洗却作泥丸。
    粗茶薄被心灯暖,
    明月清窗几案寒。
    欲待桂枝香万里,
    海阔天空俱欢颜。

    My blog:http://hi.baidu.com/tengteng2007

    点击查看用户来源及管理<br>发贴IP:*.*.*.* 2007/5/12 16:33:00
     
     GoogleAdSense天秤座1986-10-22
      
      
      等级:大一新生
      文章:1
      积分:50
      门派:无门无派
      院校:未填写
      注册:2007-01-01
    给Google AdSense发送一个短消息 把Google AdSense加入好友 查看Google AdSense的个人资料 搜索Google AdSense在『 Web挖掘技术 』 的所有贴子 点击这里发送电邮给Google AdSense 访问Google AdSense的主页 引用回复这个贴子 回复这个贴子 查看Google AdSense的博客广告
    2025/8/10 11:18:14

    本主题贴数4,分页: [1]

    管理选项修改tag | 锁定 | 解锁 | 提升 | 删除 | 移动 | 固顶 | 总固顶 | 奖励 | 惩罚 | 发布公告
    W3C Contributing Supporter! W 3 C h i n a ( since 2003 ) 旗 下 站 点
    苏ICP备05006046号《全国人大常委会关于维护互联网安全的决定》《计算机信息网络国际联网安全保护管理办法》
    62.500ms