新世代網路Web 2.0在近年來迅速發展,其中Blog的崛起格外的令人矚目。根據創市際市場研究顧問公司,在2008年11月針對台灣Blog使用行為進行調查,發現有32%的受訪者擁有專屬Blog。詢問有專屬部落格的受訪者,主要使用的Blog服務提供平台(BSP, Blog Service Provider)以無名小站(42.9%)最受到網友的青睞。
BSP上的Blog分類係由Blog作家(Blogger)自行分類,但部份Blogger的寫作方向非常多元,也多不會嚴謹的分類,致使Blog及其文章的分類大多徒具形式、而欠缺精確。
以往網路相關的分類研究,大多使用網路連結、回溯引用等的連結聚合度,抑或是使用關鍵詞的詞頻密度作為分類的方式,但其結果大多僅能釐清網站與網站間的關連強度,卻無法作較為確切的分類。
本研究採用較實務的折衷方式,先由BSP的人工分類中,取樣一定數量的Blog做為訓練資料,再依其分類擷取關鍵詞做詞頻分析,經過檢定及測試後,以驗證其分類方法確實能做有效的分類。
Web 2.0 has developed rapidly in recent years, and the rise of blogs has been especially notable. A survey of blog usage in Taiwan, conducted by InsightXplorer in November 2008, found that 32% of respondents had a blog of their own. Among those respondents, Wretch (42.9%) was the most popular blog service provider (BSP).
Blogs hosted on a BSP are categorized by their authors (bloggers) themselves. However, some bloggers write on very diverse topics, and most do not categorize rigorously, so the categories assigned to blogs and their posts are often nominal and imprecise.
Most previous studies of web classification relied either on link aggregation measures, such as hyperlinks and trackbacks, or on keyword frequency density. However, their results mostly reveal only the strength of association between sites and cannot yield a more precise classification.
This study adopts a more practical compromise. We first sample a number of blogs from the BSP's manual categories as training data, then extract keywords for each category and perform a term-frequency analysis. Statistical testing and evaluation are then used to verify that the proposed method classifies blogs effectively.
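
The pipeline described above (labeled training blogs, per-category keywords, term-frequency scoring) can be illustrated with a minimal sketch. It is not the implementation used in this study. It assumes posts are already tokenized (Chinese word segmentation is out of scope here), and the function names `train_keyword_profiles` and `classify`, the `top_k` cutoff, and the toy data are all hypothetical.

```python
from collections import Counter


def train_keyword_profiles(labeled_posts, top_k=5):
    """Build a per-category keyword profile from labeled, pre-tokenized posts.

    labeled_posts: iterable of (category, list_of_tokens) pairs.
    Returns {category: {keyword: relative frequency within that category}},
    keeping only the top_k most frequent tokens of each category.
    """
    counts = {}
    for category, tokens in labeled_posts:
        counts.setdefault(category, Counter()).update(tokens)

    profiles = {}
    for category, counter in counts.items():
        total = sum(counter.values())
        profiles[category] = {
            word: count / total for word, count in counter.most_common(top_k)
        }
    return profiles


def classify(tokens, profiles):
    """Return the category whose keyword profile best matches the tokens.

    The score is the sum of the category's keyword frequencies over the
    tokens of the post; tokens that are not keywords contribute nothing.
    """
    scores = {
        category: sum(profile.get(token, 0.0) for token in tokens)
        for category, profile in profiles.items()
    }
    return max(scores, key=scores.get)


# Toy training data (hypothetical): categories as assigned by bloggers.
training = [
    ("travel", ["hotel", "flight", "beach", "hotel"]),
    ("travel", ["flight", "museum", "beach"]),
    ("food", ["noodle", "restaurant", "dessert", "noodle"]),
    ("food", ["restaurant", "dessert", "tea"]),
]
profiles = train_keyword_profiles(training)
predicted = classify(["beach", "hotel", "tea"], profiles)
```

In this sketch a post that matches no keyword receives a score of zero for every category, so `max` falls back to the first category in the dictionary; a practical system would need an explicit "unclassified" outcome, as well as a held-out test set to evaluate accuracy.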