Author: 101ms.com  Source: China Paper Download Center (中国论文下载中心)  Updated: 2008-7-22 9:16:04
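, BooleanQuery (Boolean search), and so on. Word segmentation uses the JE word segmenter (JE 分词), a Chinese word-segmentation package built on Lucene. The search engine supports both Chinese and English queries.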
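To make the Boolean search semantics concrete, here is a minimal sketch of must / should / must-not matching over a toy inverted index. It is not Lucene's BooleanQuery implementation and does not reproduce JE segmentation: the function names, the sample documents, and the assumption that Chinese text arrives already segmented by spaces are all illustrative.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased, whitespace-separated term to the ids of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def boolean_search(index, all_docs, must=(), should=(), must_not=()):
    """Return ids that contain every `must` term and no `must_not` term.
    When no `must` term is given, a doc needs at least one `should` term."""
    if must:
        result = set(all_docs)
        for term in must:
            result &= index.get(term.lower(), set())
    else:
        result = set()
        for term in should:
            result |= index.get(term.lower(), set())
    for term in must_not:
        result -= index.get(term.lower(), set())
    return result

# Hypothetical course records; the Chinese terms are assumed to be pre-segmented.
docs = {
    1: "database systems course",
    2: "operating systems course",
    3: "database 原理 课程",
}
index = build_index(docs)
hits = boolean_search(index, docs, must=["database"], must_not=["systems"])
```

Because terms are looked up as opaque strings, the same index serves Chinese and English queries once the text has been segmented.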
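Fig. 5 F-measure of different colleges

5. Conclusion and Outlook

This paper proposes a practical method for building a vertical search engine for course information: use Google to discover search paths along the "school - college - teacher - course" hierarchy, then write wrappers with HTMLParser that extract course metadata by analyzing the structure of web pages.

The wrapper approach requires a separate wrapper for each information source. The authors will study metadata extraction algorithms based on statistical models with better applicability, such as HMM and CRF. They will also use Web 2.0 technologies to continue exploring the concept and implementation of real-time vertical search, and will apply Ajax to improve the course vertical search engine.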
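The wrapper idea can be illustrated with a short sketch. It uses Python's standard html.parser module, not the HTMLParser library named in the paper, and the page layout (table rows whose cells carry class="name" or class="teacher") is a hypothetical example rather than any real course page.

```python
from html.parser import HTMLParser

class CourseWrapper(HTMLParser):
    """Wrapper for one assumed layout: each course is a <tr> whose
    <td> cells are marked class="name" or class="teacher"."""
    FIELDS = {"name", "teacher"}

    def __init__(self):
        super().__init__()
        self.courses = []   # extracted records, one dict per table row
        self._row = None    # record being filled, or None outside a <tr>
        self._field = None  # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = {}
        elif tag == "td" and self._row is not None:
            cls = dict(attrs).get("class")
            if cls in self.FIELDS:
                self._field = cls

    def handle_data(self, data):
        # Collect text only while inside a recognized cell of an open row.
        if self._field is not None and self._row is not None:
            self._row[self._field] = self._row.get(self._field, "") + data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr":
            if self._row:
                self.courses.append(self._row)
            self._row = None
            self._field = None

page = """
<table>
  <tr><td class="name">Databases</td><td class="teacher">Li</td></tr>
  <tr><td class="name">Compilers</td><td class="teacher">Wang</td></tr>
</table>
"""
wrapper = CourseWrapper()
wrapper.feed(page)
courses = wrapper.courses
```

The class names are hard-coded to one layout, which is why a differently structured source would need its own wrapper.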
Vertical Search Engine for Course Information Based On Google Searching Path