Southampton and The Open University. Preface(15)

Advanced Knowledge Technologies is recognised as a leading research programme conducted at some of the foremost informatics departments in Britain. It is also a training ground for a new generation of researchers. To highlight the work of these students, a

differing reliability; information found can even be spam, typos, deliberate misinformation or simply erroneous. When information is contained in textual documents, extracting it requires more sophisticated IE methodologies based on linguistic analysis and methods to ensure a degree of reliability of extracted information.

Information from whatever source is often redundant, i.e. it can be found in different contexts and in different superficial formats. The redundancy of information can in itself be weak proof of its validity [5]. However, we suggest that this alone is not enough: additional information either within or surrounding extracted entities can be used to develop further evidence. Armadillo now uses an evidence-building approach that combines numerous rudimentary techniques (based on SimMetrics, Section 3).
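
The idea of building evidence from several weak signals can be illustrated with a minimal sketch. It is not Armadillo's implementation: Python's standard `difflib` stands in for a SimMetrics string metric, and the function names, the 0.85 matching threshold and the noisy-OR combination rule are all assumptions chosen for illustration.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1].

    difflib is used here as a stand-in for a SimMetrics metric (assumption).
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def redundancy_evidence(candidate: str, mentions: list[str],
                        threshold: float = 0.85) -> float:
    """Fraction of mentions that closely match the candidate.

    A candidate repeated across contexts and surface formats scores higher.
    """
    if not mentions:
        return 0.0
    hits = sum(1 for m in mentions if similarity(candidate, m) >= threshold)
    return hits / len(mentions)


def combined_evidence(scores: list[float]) -> float:
    """Noisy-OR combination of independent weak evidence scores in [0, 1]."""
    remaining_doubt = 1.0
    for score in scores:
        remaining_doubt *= 1.0 - score
    return 1.0 - remaining_doubt
```

Under these assumptions, two independent signals of 0.5 each combine to 0.75, so redundancy alone never has to carry the whole decision.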

4.2 How Armadillo Works

Armadillo learns how best to extract information in the following way:

1. it mines a coherent portion of the repository (e.g. a web site or a class of sites);
2. it integrates information and assigns reliabilities to different sources (e.g. digital libraries, services, web pages). These ratings are used to direct the learning from the repository;
3. it discovers new information in the repository that in turn is rated and used to bootstrap new learning until a stable information base is reached;
4. it stores the harvested information in an RDF knowledge base. The database can then be used to access the extracted information (as detailed later) or to produce indices for document retrieval or annotation.
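
Steps 2 and 3 above can be sketched as a small fixed-point loop. This is a simplified illustration, not the system's actual algorithm: sources are modelled as plain sets of already-extracted facts, a source's reliability is assumed to be the share of its facts that agree with what is currently known, and `bootstrap`, `accept_at` and `max_rounds` are hypothetical names. Step 4 (storage as RDF) is omitted.

```python
from __future__ import annotations


def bootstrap(sources: dict[str, set[str]], seeds: set[str],
              accept_at: float = 0.5,
              max_rounds: int = 20) -> tuple[set[str], dict[str, float]]:
    """Grow a set of known facts until no source contributes anything new."""
    known = set(seeds)
    reliability: dict[str, float] = {}
    for _ in range(max_rounds):
        # Step 2: rate each source by its agreement with the current knowledge.
        reliability = {
            name: len(facts & known) / len(facts) if facts else 0.0
            for name, facts in sources.items()
        }
        # Step 3: take new facts only from sources rated reliable enough.
        new_facts = {
            fact
            for name, facts in sources.items()
            if reliability[name] >= accept_at
            for fact in facts - known
        }
        if not new_facts:
            break  # stable information base reached
        known |= new_facts
    return known, reliability


if __name__ == "__main__":
    sources = {
        "staff_page": {"a", "b", "c"},
        "news_page": {"c", "d"},
        "spam_page": {"y", "z"},
    }
    known, reliability = bootstrap(sources, seeds={"a", "b"})
    print(sorted(known), reliability)
```

In the usage example, `news_page` starts with a reliability of zero and only becomes trusted after `staff_page` has contributed fact `c`, which is the bootstrapping effect the list describes; `spam_page` never agrees with anything known and is never used.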

Armadillo is a data-driven system, typically beginning from rigidly structured, reliable sources using examples provided by either a wrapper, the user or previous data. Armadillo then uses these previously obtained seeds to learn on more complex sources (e.g. free texts).
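
As a rough illustration of how seeds from a structured source can drive learning on free text, the sketch below learns the token that appears immediately to the left of each known seed and then reuses those left contexts to propose new names. The functions, the simple two-word name pattern and the sample text are assumptions made for this example; Armadillo's own learner is considerably more sophisticated.

```python
from __future__ import annotations

import re

# Deliberately naive pattern: two capitalised words (assumption).
NAME = r"[A-Z][a-z]+ [A-Z][a-z]+"


def learn_left_contexts(text: str, seeds: set[str]) -> set[str]:
    """Collect the token that immediately precedes each seed occurrence."""
    contexts = set()
    for seed in seeds:
        for match in re.finditer(r"(\S+) " + re.escape(seed), text):
            contexts.add(match.group(1))
    return contexts


def apply_contexts(text: str, contexts: set[str]) -> set[str]:
    """Propose every name-like string that follows a learned left context."""
    found = set()
    for context in contexts:
        pattern = re.escape(context) + r" (" + NAME + r")"
        for match in re.finditer(pattern, text):
            found.add(match.group(1))
    return found


if __name__ == "__main__":
    text = ("Contact Prof. Anna Rossi for details. The seminar is given by "
            "Dr. Ben Carter. Prof. Clara Novak chairs the committee. "
            "Lecturer David Osei is on leave.")
    seeds = {"Anna Rossi", "Ben Carter"}
    contexts = learn_left_contexts(text, seeds)
    print(apply_contexts(text, contexts) - seeds)
```

With the made-up text above, the seeds yield the contexts `Prof.` and `Dr.`, which in turn surface one new candidate. The name introduced by `Lecturer` is missed because no seed occurred in that context, which is why further bootstrapping rounds are needed.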

To explain the workings of Armadillo in more depth, an example is now detailed.

4.3 The Computer Science Department Application

Consider the following example task of mining web sites of Computer Science Departments to find academics (name, position, home page, email address, telephone number and a list of publications more complete than the one provided by repositories such as Citeseer).
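
The target of this task can be pictured as one record per academic, filled in piece by piece from different sources. The sketch below is only an illustration of that data shape: the class, its field names and the `merge` helper are assumptions for this example and do not reflect the RDF schema Armadillo actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class Academic:
    """One academic, with the attributes listed in the task description."""
    name: str
    position: Optional[str] = None
    home_page: Optional[str] = None
    email: Optional[str] = None
    telephone: Optional[str] = None
    publications: list[str] = field(default_factory=list)

    def missing(self) -> list[str]:
        """Names of attributes that still have to be extracted."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


def merge(primary: Academic, other: Academic) -> Academic:
    """Fill gaps in `primary` from `other`; publications are unioned in order."""
    publications = list(dict.fromkeys(primary.publications + other.publications))
    return Academic(
        name=primary.name,
        position=primary.position or other.position,
        home_page=primary.home_page or other.home_page,
        email=primary.email or other.email,
        telephone=primary.telephone or other.telephone,
        publications=publications,
    )
```

Merging a record built from a staff list with one built from a digital library gives a single entry whose `missing()` list shrinks as more sources are mined.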

Simply discovering who works for a department is more complex than generic Named Entity Recognition (NER), as many irrelevant people's names are mentioned in a site, e.g. those of undergraduate students and secretaries, as well as names of researchers from external sites, all of which are irrelevant for this task.

Armadillo uses a statistical evidence-based looping approach to aid the validation task. Initially a quick list of potential names of people working in the
