How a Web Crawler Is Implemented in Java

Source: user-shared. Time: 2025/5/25 18:16:40

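            begin_actionPerformed(event);
        }
    }

    /**
     * Called when the begin or cancel buttons are clicked.
     *
     * @param event The event associated with the button.
     */
    void begin_actionPerformed(java.awt.event.ActionEvent event) {
        if (backgroundThread == null) {
            // The label literal was lost in extraction; "Cancel" is assumed
            // from the Javadoc above.
            begin.setText("Cancel");
            // Reset the counters before the worker starts, so the reset
            // cannot race with the first results.
            goodLinksCount = 0;
            badLinksCount = 0;
            backgroundThread = new Thread(this);
            backgroundThread.start();
        } else {
            spider.cancel();
        }
    }

    /**
     * Perform the background thread operation. This method is the body
     * of the background thread started by begin_actionPerformed.
     */
    public void run() {
        try {
            // Literal lost in extraction; an empty string (clearing the
            // error area) is assumed.
            errors.setText("");
            spider = new Spider(this);
            spider.clear();
            base = new URL(url.getText());
            spider.addURL(base);
            spider.begin();
        } catch (MalformedURLException e) {
            UpdateErrors err = new UpdateErrors();
            // Only "address." survived extraction; "Bad" is assumed.
            err.msg = "Bad address.";
            SwingUtilities.invokeLater(err);
        } finally {
            // Restore the button and the thread reference on every exit
            // path; otherwise a malformed URL leaves the UI stuck in the
            // "running" state.
            Runnable doLater = new Runnable() {
                public void run() {
                    // Literal lost in extraction; "Begin" is assumed.
                    begin.setText("Begin");
                }
            };
            SwingUtilities.invokeLater(doLater);
            backgroundThread = null;
        }
    }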

    /**
     * Called by the spider when a URL is found. It is here
     * that links are validated.
     *
     * @param base The page that the link was found on.
     * @param url The actual link address.
     * @return True if the link is good and on the same host as base.
     */
    public boolean spiderFoundURL(URL base, URL url) {
        UpdateCurrentStats cs = new UpdateCurrentStats();
        cs.msg = url.toString();
        SwingUtilities.invokeLater(cs);

        if (!checkLink(url)) {
            UpdateErrors err = new UpdateErrors();
            // Only "page " survived extraction; the surrounding wording
            // and the trailing newline are assumed.
            err.msg = url + " (on page " + base + ")\n";
            SwingUtilities.invokeLater(err);
            badLinksCount++;
            return false;
        }

        goodLinksCount++;
        // Good links on another host are counted but reported as false.
        return url.getHost().equalsIgnoreCase(base.getHost());
    }

    /**
     * Called when a URL error is found.
     *
     * @param url The URL that resulted in an error.
     */
    public void spiderURLError(URL url) {
    }

    /**
     * Called internally to check whether a link is good.
     *
     * @param url The link that is being checked.
     * @return True if the link was good, false otherwise.
     */
    protected boolean checkLink(URL url) {
        try {
            URLConnection connection = url.openConnection();
            // connect() only opens the connection; it does not look at
            // the response status.
            connection.connect();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /**
     * Called when the spider finds an e-mail address.
     *
     * @param email The email address the spider found.
     */
    public void spiderFoundEMail(String email) {
    }

    /**
     * Internal class used to update the error information
     * in a Thread-Safe way.
     */
    class UpdateErrors implements Runnable {
        public String msg;

        public void run() {
            errors.append(msg);
        }
    }

    /**
     * Used to update the current status information
     * in a "Thread-Safe" way.
     */
    class UpdateCurrentStats implements Runnable {
        public String msg;

        public void run() {
            // Only "Processing: " and "Links: " survived extraction; the
            // leading words of each label are assumed.
            current.setText("Currently Processing: " + msg);
            goodLinksLabel.setText("Good Links: " + goodLinksCount);
            badLinksLabel.setText("Bad Links: " + badLinksCount);
        }
    }
}
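The checkLink method in the listing treats any link whose connection opens as good. For HTTP that means a page answering 404 still counts as a good link, and there is no timeout. The following is a minimal sketch of a stricter check, not the original author's code: the class name LinkCheckSketch, the isGoodStatus policy (2xx and 3xx are good), and the timeout parameter are all assumptions made for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper, not part of the original listing.
class LinkCheckSketch {

    /** Assumed policy: 2xx and 3xx responses are good, everything else is bad. */
    static boolean isGoodStatus(int status) {
        return status >= 200 && status < 400;
    }

    /** Checks a link with a timeout; for HTTP(S) it also inspects the status code. */
    static boolean checkLink(URL url, int timeoutMillis) {
        try {
            URLConnection connection = url.openConnection();
            connection.setConnectTimeout(timeoutMillis);
            connection.setReadTimeout(timeoutMillis);
            if (connection instanceof HttpURLConnection) {
                HttpURLConnection http = (HttpURLConnection) connection;
                try {
                    http.setRequestMethod("HEAD"); // headers only, no body download
                    return isGoodStatus(http.getResponseCode());
                } finally {
                    http.disconnect();
                }
            }
            // Non-HTTP schemes (for example file:) have no status code,
            // so opening and closing the stream is the whole check.
            connection.getInputStream().close();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Convenience overload: a malformed address counts as a bad link. */
    static boolean checkLink(String address, int timeoutMillis) {
        try {
            return checkLink(URI.create(address).toURL(), timeoutMillis);
        } catch (IllegalArgumentException | MalformedURLException e) {
            return false;
        }
    }
}
```

Some servers reject HEAD requests, so a real checker might retry with GET before declaring a link bad; that refinement is left out here to keep the sketch short.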

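In the listing, goodLinksCount and badLinksCount are incremented on the background thread and read inside UpdateCurrentStats.run(), which Swing executes on the event dispatch thread. Their declarations are not shown in this section, but they appear to be plain fields. One possible way to make that cross-thread access explicit is to keep the counters in atomic variables. The sketch below assumes a hypothetical LinkStats class and leaves out the Swing part so that it runs on its own.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical counter holder, not part of the original listing.
class LinkStats {
    private final AtomicInteger good = new AtomicInteger();
    private final AtomicInteger bad = new AtomicInteger();

    void recordGood() { good.incrementAndGet(); }

    void recordBad() { bad.incrementAndGet(); }

    void reset() {
        good.set(0);
        bad.set(0);
    }

    // Reads are safe from any thread, including the event dispatch thread.
    int good() { return good.get(); }

    int bad() { return bad.get(); }

    /** Demo: several worker threads record results at the same time. */
    static LinkStats demo(int threads, int perThread) {
        LinkStats stats = new LinkStats();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    stats.recordGood();
                    if (j % 2 == 0) {
                        stats.recordBad(); // every second link is "bad" in this demo
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return stats;
    }
}
```

In a Swing program the label text would still be set through SwingUtilities.invokeLater, as the listing does; only the source of the numbers changes.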
2. ISpiderReportable.java

import java.net.*;

interface ISpiderReportable {
    public boolean spiderFoundURL(URL base, URL url);

    public void spiderURLError(URL url);

    public void spiderFoundEMail(String email);
}
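To show how the three callbacks fit together without any GUI, here is a minimal sketch of an implementation that only records what it is told. RecordingReporter and its url helper are hypothetical names. The interface is repeated so the block compiles alone. Treating the boolean result as "the spider may follow this link" is an assumption based on the same-host test in the link checker's spiderFoundURL.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Same three callbacks as the interface in this document, repeated for self-containment.
interface ISpiderReportable {
    boolean spiderFoundURL(URL base, URL url);

    void spiderURLError(URL url);

    void spiderFoundEMail(String email);
}

// Hypothetical implementation that stores every callback in memory.
class RecordingReporter implements ISpiderReportable {
    final List<String> internal = new ArrayList<>();
    final List<String> external = new ArrayList<>();
    final List<String> errors = new ArrayList<>();
    final List<String> emails = new ArrayList<>();

    @Override
    public boolean spiderFoundURL(URL base, URL url) {
        boolean sameHost = url.getHost().equalsIgnoreCase(base.getHost());
        (sameHost ? internal : external).add(url.toString());
        return sameHost; // assumed meaning: true lets the spider follow the link
    }

    @Override
    public void spiderURLError(URL url) {
        errors.add(url.toString());
    }

    @Override
    public void spiderFoundEMail(String email) {
        emails.add(email);
    }

    /** Builds a URL without a checked exception, for short demos. */
    static URL url(String address) {
        try {
            return URI.create(address).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

A spider would call these methods as it parses pages; here they can be called by hand, for example `new RecordingReporter().spiderFoundURL(RecordingReporter.url("http://example.com/"), RecordingReporter.url("http://example.com/a"))`.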

3. Spider.java
