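Sequence alignment has been studied for many years and is divided into pairwise sequence alignment and multiple sequence alignment. Multiple sequence alignment has been classified as an NP-complete problem, so researchers have proposed many improved methods, a well-known example being the progressive algorithm. This study proposes a segmentation alignment method. Its alignment principle is an improvement based on the matching of consecutive symbols: it searches for the longest common substring occurring in the sequences, uses that substring as the basis for aligning them, and cuts the sequences where the substring aligns.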
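Because the method works by substring search, it can match multiple sequences at the same time. In addition, aligning the sequences on a common substring creates a split point at which they are cut. Once the sequences are divided, each piece is shorter, so fewer symbol comparisons are needed within the pieces, and the alignment is accelerated.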
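The segmentation idea described above can be illustrated with a minimal sketch. It is not the study's implementation: the function names `longest_common_substring` and `segment`, the `min_anchor` threshold, the brute-force search, and the choice to cut at the first occurrence of the anchor in each sequence are all assumptions made for illustration.

```python
def longest_common_substring(seqs):
    """Return the longest string found contiguously in every sequence ('' if none).

    Brute force for clarity: try every substring of the shortest sequence,
    longest first. (Hypothetical helper, not taken from the study.)
    """
    shortest = min(seqs, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in seqs):
                return candidate
    return ""


def segment(seqs, min_anchor=2):
    """Recursively split all sequences at their longest common substring.

    Returns a list of blocks in left-to-right order; each block is a tuple
    holding one piece per input sequence. Anchor blocks hold the shared
    substring; the remaining blocks are the shorter pieces still to be aligned.
    `min_anchor` (an assumed parameter) stops splitting on very short anchors.
    """
    anchor = longest_common_substring(seqs)
    if len(anchor) < min_anchor:
        return [tuple(seqs)]
    # Assumption: cut at the first occurrence of the anchor in each sequence.
    cuts = [s.find(anchor) for s in seqs]
    left = [s[:c] for s, c in zip(seqs, cuts)]
    right = [s[c + len(anchor):] for s, c in zip(seqs, cuts)]
    blocks = []
    if any(left):
        blocks += segment(left, min_anchor)
    blocks.append(tuple([anchor] * len(seqs)))
    if any(right):
        blocks += segment(right, min_anchor)
    return blocks


if __name__ == "__main__":
    for block in segment(["ACGTTGCA", "TTACGTGCA", "ACGTAGCA"]):
        print(block)
```

Each non-anchor block is much shorter than the original sequences, so whichever aligner is applied to it has far fewer symbol comparisons to perform, which is the source of the speed-up the text describes.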
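Finally, sequence similarity scoring usually considers only symbol matches, mismatches, and gaps. From the viewpoint of consecutive symbol matching, however, runs of consecutive matches in an alignment should receive a higher score. This study therefore adds a weighted score for consecutive matches to highlight the difference between consecutive matches and isolated matches.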
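A minimal sketch of such a scoring scheme for one aligned pair follows. The numeric weights, the `run_bonus` parameter, and the rule of adding the bonus to every match that directly follows another match are assumptions chosen for illustration; the study's actual weights are not given here.

```python
def score_alignment(a, b, match=1, mismatch=-1, gap=-2, run_bonus=1):
    """Score two aligned, equal-length strings ('-' marks a gap).

    On top of the usual match/mismatch/gap scores, every match that directly
    follows another match earns `run_bonus`, so a run of consecutive matches
    scores more than the same number of isolated matches.
    All default weights are illustrative assumptions.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    run = 0  # length of the current run of consecutive matches
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            total += gap
            run = 0
        elif x == y:
            total += match
            if run > 0:
                total += run_bonus
            run += 1
        else:
            total += mismatch
            run = 0
    return total


if __name__ == "__main__":
    print(score_alignment("ACGT", "ACGT"))               # four consecutive matches
    print(score_alignment("ACGT", "ACGT", run_bonus=0))  # conventional scoring
```

With `run_bonus=0` the function reduces to conventional match/mismatch/gap scoring, which makes the effect of the consecutive-match weighting easy to compare.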
Sequence alignment, which is divided into pairwise sequence alignment and multiple sequence alignment, has been studied for many years. Multiple sequence alignment is classified as an NP-complete problem. Researchers have therefore proposed many improved methods, such as the progressive algorithm, to reduce the time complexity and the running time of solving the problem.
This research studies a segmentation technique for multiple sequence alignment based on the longest run of continuously matched symbols common to the sequences. Each original sequence is split at this common run into two much shorter subsequences, and the process is repeated on the resulting pieces. The alignment is accelerated because the sequences to be aligned are shorter. For scoring, the research applies a cost-benefit approach in which a run of continuously matched symbols receives a higher score than the sum of the scores of the individual symbol matches.
Compared with the progressive algorithm, the segmentation technique has much shorter actual running times and achieves higher scores in multiple sequence alignment.