Building a Fusion Model for Flu Trend Prediction Using Social Web Data

 

張昭憲

Department of Information Management, Tamkang University

151 Yingzhuan Road, Tamsui District, New Taipei City

jschang@mail.tku.edu.tw

 

周書任

Department of Information Management, Tamkang University

151 Yingzhuan Road, Tamsui District, New Taipei City

jason093351@hotmail.com.tw

 

Abstract

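According to World Health Organization (WHO) statistics, influenza causes about 3 million severe cases and 250,000 deaths worldwide each year, and its impact on public welfare and the economy is widely recognized. To monitor influenza activity, national disease control agencies usually compile data from clinical consultation reports, but this can introduce a delay of 1 to 2 weeks, which is clearly too slow. Since online social networks have become part of modern life, collecting data from them and developing prediction methods should make it possible to grasp the current flu situation sooner and reduce its negative impact. Moreover, flu epidemics change rapidly and are hard to predict, but integrating different prediction methods can improve prediction accuracy. This study therefore uses social network data to build an effective method, based on model fusion, for predicting the influenza-like illness (ILI) consultation rate. First, we collect data from different web sources and build datasets by counting occurrences of keyword sets. Next, incorporating the concept of time lag, we use linear regression to build several prediction models with different characteristics. Finally, model fusion combines the predictions of these models to improve overall accuracy and stability.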
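The dataset-construction and lagged-regression steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyword set `FLU_KEYWORDS`, the helpers `weekly_keyword_counts` and `fit_lagged_regression`, the ISO-week bucketing, and the single-signal model form `ILI[t] = a + b * x[t - lag]` fitted by ordinary least squares are all hypothetical choices. Fitting the same form with different web sources or different lags is one way to obtain models with different characteristics.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical keyword set for illustration only; the paper's actual keyword sets are not listed here.
FLU_KEYWORDS = frozenset({"flu", "influenza", "fever", "cough"})


def weekly_keyword_counts(posts, keywords=FLU_KEYWORDS):
    """Count posts per ISO week that mention at least one keyword.

    posts: iterable of (datetime.date, text) pairs.
    Returns a dict mapping (iso_year, iso_week) -> number of matching posts.
    """
    counts = Counter()
    for day, text in posts:
        # Crude tokenization: lowercase alphabetic runs only.
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        if tokens & keywords:
            iso = day.isocalendar()
            counts[(iso[0], iso[1])] += 1
    return dict(counts)


def fit_lagged_regression(x, y, lag=0):
    """Ordinary least squares for y[t] = a + b * x[t - lag].

    x: weekly web signal (e.g. keyword counts); y: weekly ILI rates of equal length.
    lag: weeks by which the web signal is assumed to lead the ILI rate (hypothetical).
    Returns the intercept a and slope b.
    """
    xs = list(x[: len(x) - lag])  # x[t - lag] for t = lag .. n-1
    ys = list(y[lag:])            # y[t] for t = lag .. n-1
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


if __name__ == "__main__":
    posts = [
        (date(2016, 1, 4), "Down with the flu again"),
        (date(2016, 1, 5), "Nice weather today"),
        (date(2016, 1, 12), "Fever and cough all day"),
    ]
    print(weekly_keyword_counts(posts))  # {(2016, 1): 1, (2016, 2): 1}
    # Toy series: a 1-week lag recovers y[t] = 1 + 2 * x[t - 1].
    print(fit_lagged_regression([1, 2, 3, 4, 5], [0, 3, 5, 7, 9], lag=1))  # (1.0, 2.0)
```

A real pipeline would use the actual keyword sets and could add more regressors per model; the single-regressor form is kept here only to make the role of the lag explicit.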

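To verify the effectiveness of the proposed method, we collected more than 1.6 million Twitter posts from England over about 82 weeks (August 2015 to March 2017), together with Google keyword search interest data for the same period, and used them in experiments after preprocessing. Compared with six single prediction models, the proposed method achieved the highest prediction correlation, demonstrating its effectiveness. To examine the stability of each model, the flu data were further divided by epidemic intensity into a "dramatic up-down" region and a "slow up-down" region for analysis. The proposed method achieved the highest correlation in the dramatic region and the second highest in the slow region, whereas the single models performed inconsistently across the two regions, confirming that the proposed method yields more stable predictions. Taken together, these results suggest that the proposed method can provide more effective early warning of influenza and add further lines of defense for epidemic prevention.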
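Comparing models by prediction correlation can be illustrated with Pearson's correlation coefficient between each model's predicted series and the observed ILI series. The text does not state which correlation measure was used, so Pearson's r and the helper names `pearson_r` and `rank_models` are assumptions for this sketch; the series in the usage block are made-up toy values, not results from the study.

```python
import math


def pearson_r(pred, actual):
    """Pearson correlation between a predicted and an observed weekly series."""
    n = len(pred)
    if n != len(actual) or n < 2:
        raise ValueError("series must have equal length >= 2")
    mean_p = sum(pred) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(pred, actual))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual))
    return cov / (sd_p * sd_a)


def rank_models(predictions, actual):
    """Rank models by correlation with the observed ILI series, highest first.

    predictions: dict mapping a model name to its predicted weekly series.
    Returns a list of (name, r) pairs.
    """
    scores = {name: pearson_r(series, actual) for name, series in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]  # toy weekly ILI rates
    candidates = {                   # toy model outputs (hypothetical names)
        "model_a": [1.1, 2.0, 3.2, 3.9],
        "model_b": [2.0, 2.0, 2.0, 2.1],
        "model_c": [4.0, 3.0, 2.0, 1.0],
    }
    for name, r in rank_models(candidates, observed):
        print(f"{name}: r = {r:.3f}")
```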

 

Keywords: Flu Monitoring, Model Fusion, Linear Regression, Social Web

Abstract

According to WHO statistics, influenza causes on average about 3 million severe cases of illness and 250,000 deaths per year, making it a constant and serious threat to people worldwide. However, if the epidemic trend of influenza can be detected in advance, the illness rate can still be reduced effectively. To monitor influenza epidemics, national CDCs usually compile weekly influenza-like illness (ILI) rates from clinical reports submitted by hospitals. However, this process can introduce a delay of about 1 to 2 weeks and may therefore miss information about the peak of an epidemic. To remedy this, new methods are needed to detect influenza epidemics in time. Because social webs have become part of our lives, mining flu-related information from them is a promising way to build prediction methods. In addition, because the flu epidemic trend can change drastically, several prediction methods should be combined to provide more accurate predictions. To this end, this paper develops effective methods for flu trend prediction by mining social web data and applying model fusion. First, we collect web data from different sources. Next, we build various prediction models that take the delay of the epidemic into account. Finally, the generated models are merged by model fusion to increase the accuracy and stability of prediction.
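The final model-fusion step can be sketched as a weighted average of the individual models' weekly predictions. The abstract does not specify the fusion rule, so weighted averaging, the helpers `fuse_predictions` and `weights_from_scores`, and the idea of deriving weights from each model's correlation on past data (negative scores clipped to zero) are assumptions made for illustration; stacking a second-level regression on the model outputs would be another common choice.

```python
def fuse_predictions(predictions, weights=None):
    """Combine several models' weekly predictions by weighted averaging.

    predictions: dict mapping a model name to its list of weekly predictions (equal lengths).
    weights: optional dict mapping a model name to a non-negative weight;
             equal weights (a plain average) are used when omitted.
    """
    names = list(predictions)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    length = len(predictions[names[0]])
    if any(len(predictions[name]) != length for name in names):
        raise ValueError("all prediction series must have the same length")
    # Weighted mean across models, computed week by week.
    return [
        sum(weights[name] * predictions[name][t] for name in names) / total
        for t in range(length)
    ]


def weights_from_scores(scores):
    """Hypothetical weighting rule: each model's validation correlation, clipped at zero."""
    return {name: max(score, 0.0) for name, score in scores.items()}


if __name__ == "__main__":
    # Hypothetical model names and toy predictions.
    preds = {"twitter_lag0": [1.0, 2.0], "google_lag1": [3.0, 4.0]}
    print(fuse_predictions(preds))  # plain average: [2.0, 3.0]
    w = weights_from_scores({"twitter_lag0": 0.9, "google_lag1": 0.3})
    print(fuse_predictions(preds, w))
```

Averaging tends to damp the errors of any single model, which is consistent with the stability motivation given in the abstract, though the paper's actual fusion rule may differ.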

To demonstrate the effectiveness of the proposed method, we collected over 1.6 million Twitter posts from England, together with search statistics for flu-related keywords from Google Trends, for our experiments. Compared with six single prediction models, the proposed method achieves the highest prediction correlation, which demonstrates its effectiveness. To understand the stability of the various models, the data were divided into a "dramatic up-down" region and a "slow up-down" region. The results show that the proposed method has the highest and the second-highest correlation in the two regions, respectively, whereas the single models show inconsistent prediction performance, indicating that the proposed method produces more stable predictions. Based on these results, the proposed method contributes to early warning in influenza surveillance and helps establish additional lines of defense against epidemics.
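The stability analysis splits the weeks into a "dramatic up-down" region and a "slow up-down" region and evaluates each model separately in each. The abstract does not give the splitting rule, so the sketch below assumes, purely for illustration, that a week belongs to the dramatic region when the observed ILI rate changes by at least a chosen `threshold` from the previous week; the threshold and the helpers `split_regions` and `regional_correlation` are hypothetical, and the data are toy values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def split_regions(actual, threshold):
    """Assign weeks 1..n-1 to a 'dramatic' or 'slow' region by week-over-week change.

    Hypothetical rule: |actual[t] - actual[t-1]| >= threshold means 'dramatic'.
    Returns (dramatic_weeks, slow_weeks) as lists of week indices.
    """
    dramatic, slow = [], []
    for t in range(1, len(actual)):
        if abs(actual[t] - actual[t - 1]) >= threshold:
            dramatic.append(t)
        else:
            slow.append(t)
    return dramatic, slow


def regional_correlation(pred, actual, weeks):
    """Correlation between predicted and observed values restricted to the given weeks."""
    return pearson_r([pred[t] for t in weeks], [actual[t] for t in weeks])


if __name__ == "__main__":
    observed = [1.0, 1.2, 3.0, 5.0, 4.9, 2.0, 1.9]   # toy weekly ILI rates
    predicted = [1.1, 1.0, 2.5, 4.6, 5.0, 2.6, 2.0]  # toy model output
    dramatic, slow = split_regions(observed, threshold=1.0)
    print("dramatic weeks:", dramatic, "slow weeks:", slow)
    print("r (dramatic):", regional_correlation(predicted, observed, dramatic))
    print("r (slow):", regional_correlation(predicted, observed, slow))
```

Running the same per-region evaluation for every model makes it possible to see whether a model's ranking holds in both regions, which is the sense of stability the abstract describes.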

 

Keywords: Flu Monitoring, Model Fusion, Linear Regression, Social Web