Web Crawlers: Translated Foreign-Language Reference

Source: user-shared. Date: 2025/5/20 21:40:03

crawling process, even if multithreading is used, will be insufficient for large-scale engines that need to fetch large amounts of data rapidly. When a single centralized crawler is used, all the fetched data passes through a single physical link. Distributing the crawling activity across multiple processes can help build a scalable, easily configurable, and fault-tolerant system. Splitting the load decreases hardware requirements and at the same time increases the overall download speed and reliability. Each task is performed in a fully distributed fashion; that is, no central coordinator exists.
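One way to split the load without a central coordinator is to let every crawler process compute, on its own, which process owns a given host. The sketch below is an illustration only: the text does not say how work is assigned, and the hash-based scheme, the function names `owner_of` and `route`, and the fixed `num_crawlers` parameter are all assumptions.

```python
import hashlib
from urllib.parse import urlsplit


def owner_of(url: str, num_crawlers: int) -> int:
    """Return the index of the crawler process responsible for the URL's host.

    Hypothetical scheme: a stable hash of the host name, so every process
    reaches the same answer without asking a coordinator.
    """
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_crawlers


def route(urls, my_id: int, num_crawlers: int):
    """Split discovered URLs into those this process keeps and those it forwards."""
    mine = []
    forward = {}  # owner index -> list of URLs to send to that peer
    for url in urls:
        owner = owner_of(url, num_crawlers)
        if owner == my_id:
            mine.append(url)
        else:
            forward.setdefault(owner, []).append(url)
    return mine, forward
```

Hashing the host rather than the full URL keeps all pages of one site on the same process. A fixed modulus reassigns most hosts when the number of processes changes, so a real deployment would likely need a more stable partitioning scheme.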

Ⅵ. PROBLEM OF SELECTING MORE “INTERESTING”

A search engine is aware of hot topics because it collects user queries. The crawling process prioritizes URLs according to an importance metric such as similarity (to a driving query), back-link count, PageRank, or combinations and variations of these. Recently, Najork et al. showed that breadth-first search collects high-quality pages first and suggested a variant of PageRank. However, at the moment, search strategies are unable to select exactly the “best” paths because their knowledge is only partial. Because of the enormous amount of information available on the Internet, a total crawl is currently impossible; thus, pruning strategies must be applied. Focused crawling and intelligent crawling are techniques for discovering Web pages relevant to a specific topic or set of topics.
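Prioritizing URLs by an importance metric can be sketched as a priority queue over the crawl frontier. The example below is a minimal illustration, not the method of any particular engine: the `Frontier` class, the `importance` function, and the way it mixes similarity with back-link count (including the `alpha` weight) are assumptions made for the example.

```python
import heapq
import itertools


def importance(backlink_count: int, similarity: float, alpha: float = 0.5) -> float:
    """Hypothetical score mixing query similarity with a squashed back-link count."""
    backlink_term = backlink_count / (1 + backlink_count)  # maps 0..inf into 0..1
    return alpha * similarity + (1.0 - alpha) * backlink_term


class Frontier:
    """URLs waiting to be fetched, highest score first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        # Insertion counter: ties between equal scores are served in FIFO order.
        self._counter = itertools.count()

    def push(self, url: str, score: float) -> bool:
        """Queue a URL once; return False if it was already queued."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so the score is negated to pop the largest first.
        heapq.heappush(self._heap, (-score, next(self._counter), url))
        return True

    def pop(self):
        """Remove and return (url, score) for the most important queued URL."""
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)
```

If every URL is pushed with the same score, the FIFO tie-break makes the frontier behave like a plain queue, which gives breadth-first order.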

CONCLUSION

In this paper we conclude that complete web crawling coverage cannot be achieved, owing to the vast size of the whole WWW and to limited resource availability. Usually some kind of threshold is set up (number of visited URLs, level in the website tree, compliance with a topic, etc.) to limit the crawling process over a selected website. Search engines use this information to store and refresh the most relevant and up-to-date web pages, thus improving the quality of retrieved content while reducing stale content and missing pages.
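Two of the thresholds mentioned above, the number of visited URLs and the level in the website tree, can be sketched as limits on a breadth-first traversal. This is an illustrative sketch under stated assumptions: `bounded_crawl`, `max_pages`, `max_depth`, and the `get_links` callback (which stands in for fetching a page and extracting its links) are hypothetical names, and the topic-compliance threshold is left out.

```python
from collections import deque


def bounded_crawl(start, get_links, max_pages=100, max_depth=3):
    """Breadth-first crawl of one site, stopped by page-count and depth thresholds.

    get_links(url) is assumed to return the URLs linked from that page.
    Returns the URLs in the order they were visited.
    """
    visited = []
    seen = {start}
    queue = deque([(start, 0)])  # (url, depth below the start page)
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth == max_depth:
            continue  # depth threshold reached: do not follow this page's links
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

Passing the link extractor as a callback keeps the sketch free of network code, so the thresholds can be exercised against an in-memory link graph.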
