On Semantic-Based Intelligent Crime Information Retrieval on the Internet
(Chinese title: 以語意為基礎之網路犯罪資訊搜尋研究)
Chih-Ping Yen (顏志平)
Central Police University
Computer Center
Shyong-Jian Shyu (徐熊健)
Ming Chuan University
Associate Professor, Graduate Institute of Information Management
Abstract (Chinese version, translated)
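Prosecutors and police agencies usually rely on the search engines of portal sites to gather intelligence on Internet crime. Because the precision and recall of such search engines are low, they often return many irrelevant Web pages, and investigators must then spend additional time filtering the results one by one, which is inefficient. This paper therefore applies intelligent algorithms to raise precision and recall and thereby alleviate this problem.
First, using the theory of semantic fields, the synonym relationships between terms are organized into a term database with a hierarchical structure similar to WordNet, and this term database is then used to compare the similarity of Web page contents. The paper derives five similarity measures: "term-frequency weighted similarity", "classification-exponent similarity", "weighted classification-exponent similarity", "error-correction similarity", and "term-frequency reweighting". It compares their strengths and weaknesses, selects the best method, and infers a threshold value.
In addition, the results of this study are compared with those of the 1999 research project by Chih-Cheng Chen (陳志誠), "A High-Precision Crime Information Search System on the Internet" (called the e-Detective system). The evaluation uses the search for Web pages "selling illegal software" on the Internet as its test case. The experiments show that the system proposed in this paper reaches a best F-measure of 0.5581, whereas the earlier system reaches at best 0.2376, so the proposed system clearly performs better.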
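The F-measure quoted above combines precision and recall into a single score. The following is a minimal sketch, assuming the balanced form F = 2PR / (P + R); the abstract does not state which weighting the authors used, and the function name and page identifiers are hypothetical.

```python
def precision_recall_f(retrieved, relevant):
    """Return (precision, recall, F) for retrieved and relevant page ids.

    Assumes the balanced F-measure; the paper may weight P and R differently.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)          # relevant pages actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


# Hypothetical case: 4 pages returned, 3 of them relevant, 6 relevant pages overall.
p, r, f = precision_recall_f({"a", "b", "c", "x"}, {"a", "b", "c", "d", "e", "f"})
```

Because the balanced F-measure is a harmonic mean, it stays low unless both precision and recall are reasonably high, which is why it suits a comparison between two retrieval systems.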
Keywords: search engine, information retrieval, text classification, similarity, precision, recall, semantic field, cybercrime, e-Detective
We usually search the Web with the help of search engines. Because the search
results are imprecise, we often face the problem of too many recommended pages.
Search engines return many irrelevant pages because they merely match the exact
search word(s) the user entered. To cope with this problem, we propose
determining similarity with the help of a knowledge base for a given topic.
This significantly reduces the number of irrelevant pages.
In this research we first apply the theory of semantic fields, in which a term
(concept) forms a term database through its relationships to other concepts.
Based on the term databases, we propose several models for evaluating the
similarity between search concepts and the contents of Web pages: the
model of weighted terms (a modified vector space model), the model of
classified weighted terms, and the exponential model of classified weighted
terms. The last of these is based on the Facet Analysis Method. We
also evaluate similarity with error correction and term reweighting. The
approaches described in this paper are used to construct a search engine that identifies
Web pages advertising pirated compact discs (CDs), which are very difficult to
distinguish from pages advertising legitimate CDs. We further determine
an adequate threshold of term weights for our search purpose as a trade-off
between recall and precision. A comparison of our search results with those of
previous work shows the advantage of this approach.
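As an illustration of the weighted-term idea, the sketch below scores a page against a weighted topic vector using plain cosine similarity from the standard vector space model and then applies a threshold. It is not the authors' modified model: the topic terms, their weights, the threshold value, and the function names are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical topic terms and weights; the paper's term database is not reproduced here.
TOPIC = {"pirated": 3.0, "copy": 2.0, "cd": 1.0, "cheap": 1.0}
THRESHOLD = 0.3  # illustrative cut-off, not the value determined in the paper


def cosine_similarity(topic_weights, page_text):
    """Cosine similarity between a weighted topic vector and a page's term counts."""
    tokens = re.findall(r"[a-z0-9]+", page_text.lower())
    counts = Counter(tokens)
    # Only topic terms contribute to the dot product; Counter returns 0 for absent terms.
    dot = sum(w * counts[t] for t, w in topic_weights.items())
    norm_topic = math.sqrt(sum(w * w for w in topic_weights.values()))
    norm_page = math.sqrt(sum(c * c for c in counts.values()))
    if norm_topic == 0 or norm_page == 0:
        return 0.0
    return dot / (norm_topic * norm_page)


def is_suspect(page_text, threshold=THRESHOLD):
    """Flag a page whose similarity to the topic reaches the threshold."""
    return cosine_similarity(TOPIC, page_text) >= threshold
```

Raising the threshold flags fewer pages, which tends to improve precision at the cost of recall; lowering it has the opposite effect.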
Keywords: Search Engine, Information
Retrieval, Text Classification, Similarity, Precision, Recall, Semantic Field,
Cybercrime, e-Detective