Chinese Culture University Institutional Repository (CCUR): Item 987654321/20454


    Please use this identifier to cite or link to this item: https://irlib.pccu.edu.tw/handle/987654321/20454


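    Title: Scheduling Large-Scale Dynamic Web Page Retrieval in a Grid Architecture: The Development of an Integrated Auction Merchandise Search Platform as an Example (網格架構下大量動態網頁擷取之排程-以整合式競標商品搜尋平台之發展為例)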
    Authors: 戴亞筑
    Contributors: Department of Information Management
    Keywords: Dynamic web pages
    Web page retrieval
    Scheduling mechanism
    Grid
    Date: 2010
    Issue Date: 2011-11-28 13:10:59 (UTC+8)
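    Abstract: Dynamic web pages have become more widespread as the internet has developed. Their data changes unpredictably and is time-sensitive. Retrieving a very large number of dynamic web pages takes a long time, and because the pages are time-sensitive, data that cannot be retrieved in time becomes inaccurate and loses its value.
    To keep dynamic web pages and database data synchronized, this study proposes two scheduling mechanisms for retrieving large numbers of dynamic web pages in a grid architecture: time division scheduling and weighted scheduling. Using the idle resources of multiple work nodes in the grid, the schedule orders web pages by priority, so that pages about to expire can be retrieved quickly and repeatedly and the time needed to retrieve dynamic web pages is greatly reduced. We also compare the advantages and disadvantages of the two scheduling mechanisms.
    Taking the development of an integrated auction merchandise search platform as an example, we allocated three completely idle work nodes in a grid architecture to retrieve the auction item pages of the top ten U.S. auction websites in March 2010, which averaged 118,591,011 pages in total. Under time division scheduling with 10 intervals, retrieving all item data takes 5 hours; under weighted scheduling, with appropriate weight allocation and round table settings, it takes 5 hours. Compared with retrieving the data on a single computer, the two mechanisms reduce the time by factors of more than 16 and 80, respectively.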

    As the internet has grown, dynamic web pages have become increasingly common. Their content changes unpredictably and is time-sensitive. Retrieving a large number of such pages takes a long time, and because the data is time-sensitive, a page that is not retrieved promptly yields data that is no longer valid.
    To keep dynamic web pages and the database synchronized, this research proposes two job scheduling mechanisms for a grid framework: time division scheduling and weighted scheduling. Each decides which pages should be retrieved first. The schedules run in a grid environment and use multiple computers to retrieve web pages, which reduces retrieval time and increases the number of pages that can be retrieved. We also compare the advantages and disadvantages of the two mechanisms.
    We take the development of an integrated auction merchandise search platform as an example. In March 2010, the top ten American auction websites listed an average of 118,591,011 merchandise items. With time division scheduling, three work nodes retrieve all of them in five hours, reducing retrieval time by a factor of about sixteen compared with a single computer. With weighted scheduling and a round table setting, retrieval takes five hours and reduces retrieval time by a factor of about eighty.
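
    The abstract does not spell out how time division scheduling works, so the following is only a minimal sketch of one plausible reading: pages are grouped into equal-width intervals by the time left before they expire, and the groups are handed to work nodes soonest-first. The names `Page`, `bucket_by_interval`, and `assign_to_nodes`, the `seconds_left` field, and the round-robin dealing are all assumptions made for illustration and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Page:
    url: str
    seconds_left: float  # assumed field: time until the auction page expires


def bucket_by_interval(
    pages: List[Page], horizon: float, n_intervals: int
) -> List[List[Page]]:
    """Split pages into n_intervals equal-width buckets by time remaining."""
    width = horizon / n_intervals
    buckets: List[List[Page]] = [[] for _ in range(n_intervals)]
    for page in pages:
        # Clamp so expired pages land in bucket 0 and pages at or beyond
        # the horizon land in the last bucket.
        index = min(int(max(page.seconds_left, 0.0) // width), n_intervals - 1)
        buckets[index].append(page)
    return buckets


def assign_to_nodes(
    buckets: List[List[Page]], nodes: List[str]
) -> Dict[str, List[Page]]:
    """Walk buckets soonest-first and deal pages round-robin to the nodes."""
    queues: Dict[str, List[Page]] = {node: [] for node in nodes}
    position = 0
    for bucket in buckets:
        for page in bucket:
            queues[nodes[position % len(nodes)]].append(page)
            position += 1
    return queues
```

    With 10 intervals and three node names, as in the experiment described above, each node's queue begins with the pages closest to expiry, so the most urgent pages are fetched first on every node. A weighted schedule could replace the interval index with a per-page weight, but the abstract gives no details of the weight allocation, so it is not sketched here.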
    Appears in Collections:[Department of Information Management & Graduate Institute of Information Management] Thesis



    All items in CCUR are protected by copyright, with all rights reserved.

