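Data imbalance has long been an important issue in machine learning. Because conventional machine learning models take error minimization as their main criterion, their predictive performance can degrade considerably when the data are imbalanced. To reduce the influence of imbalanced data on the model, this study develops a hybrid decision model comprising four steps: (1) evaluation of operating performance, (2) data rebalancing, (3) feature selection, and (4) construction of the forecasting model. For the evaluation of corporate operating performance, this study adopts the balanced scorecard (BSC) as its main tool. Traditional evaluation indicators focus only on the financial perspective, and an evaluation tool that looks only at financial performance risks biased decisions. The BSC is a more complete evaluation mechanism because it considers non-financial perspectives as well as the financial one. Its drawback is that it cannot give decision makers a clear target to execute and follow; to mitigate this, the study further applies data envelopment analysis (DEA) to aggregate the BSC perspectives. Imbalanced data greatly weaken a forecasting model, so the study uses SMOTE to rebalance the data; lowering the imbalance ratio gives the forecasting model a more reliable environment in which to be built. Most of the data come from financial documents, which are prone to bias and data-entry errors and may therefore lack credibility. To mitigate this, the study applies a feature selection mechanism to extract the most critical indicators, which are then fed into a support vector machine (SVM) to build the forecasting model of corporate operating performance. Validated on real data, the model showed superior performance and can give managers guidance for subsequent development.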
Imbalanced data is one of the essential issues in the field of machine learning, because conventional machine learning algorithms aim to minimize the forecasting error. If a forecasting model is constructed in an imbalanced data environment, it tends to ignore the minority data and favor the majority data, which leads to biased outcomes. To combat this, this study introduces a fusion decision framework for performance forecasting that involves four steps: (1) performance evaluation, (2) data rebalancing, (3) feature selection, and (4) forecasting model construction. In the first step, the performance evaluation is based on the balanced scorecard (BSC), which converts a firm's strategic vision and goals into practical ratios distributed among four perspectives. One of the critical challenges of the BSC is that it cannot provide specific direction for users. To handle this, data envelopment analysis (DEA) is used to aggregate the BSC perspectives. In the second step, the imbalanced data are rebalanced with the synthetic minority oversampling technique (SMOTE), which reduces the model's tendency to favor the majority data and the biased outcomes that result. Subsequently, the rebalanced data go through a feature selection procedure, because most of the data are gathered from financial reports, which may be contaminated by some degree of error. Feature selection is therefore an essential step in data mining. Finally, the selected features are fed into a support vector machine (SVM) to construct the forecasting model. The model, examined on real cases, is a promising alternative for performance forecasting. Managers can use it as a roadmap for allocating resources where they are most needed, and market participants can consider its implications when adjusting their investment portfolios to maximize profit.
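The data rebalancing step can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is only an illustration under stated assumptions, not the study's implementation. The function names `smote` and `rebalance`, the neighbour count `k`, the fixed seed, and the toy two-feature data are all hypothetical, and the sketch oversamples until the two classes are equal in size, which the abstract does not specify.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    k and seed are illustrative assumptions, not values from the study.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base point.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def rebalance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes have equal size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_needed = (len(X) - len(minority)) - len(minority)
    if n_needed <= 0:
        return list(X), list(y)
    new_points = smote(minority, n_needed, k=k, seed=seed)
    return list(X) + new_points, list(y) + [minority_label] * len(new_points)


# Hypothetical toy data: 8 majority records (label 0), 3 minority records (label 1).
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.3, 0.2], [0.25, 0.3],
     [0.2, 0.2], [0.1, 0.3], [0.3, 0.1], [0.8, 0.9], [0.9, 0.8], [0.85, 0.95]]
y = [0] * 8 + [1] * 3

X_bal, y_bal = rebalance(X, y, minority_label=1, k=2)
print(y_bal.count(0), y_bal.count(1))  # prints "8 8"
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class. In a full pipeline the rebalanced data would then pass to feature selection and the SVM, as described above.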