Computer Science



Mining High Sequential Utility Patterns from Uncertain Web access Sequences using the PL-WAP

  • Thu, 05/04/2017 - 11:00am - 1:00pm


MSc Thesis Defense by:

Sravya Vangala

Date:  Thursday, May 4, 2017
Time:  11:00 am – 12:30 pm
Location: 3105, Lambton Tower

Abstract: Web access patterns are generally retrieved from web access sequence databases using sequential pattern mining algorithms such as GSP, WAP, and the PLWAP tree. However, these algorithms do not consider sequential data with quantity information (e.g., the amount of time a user spends on a particular web page or the number of items purchased) or quality information (e.g., the rating of a web page in a website or the price of a purchased item). Nor do they work on uncertain sequential items (e.g., purchased products) whose probability of being present lies between 0 and 1. The item quantity is called internal utility, while the item quality is called external utility. Factoring in the utility and uncertainty of each sequence item provides more product information, which can be beneficial in mining profitable patterns from a company's website. For example, a customer may purchase a bottle of ink more frequently than a printer, but the purchase of a single printer can yield more profit to the business owner than the purchase of multiple bottles of ink.
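The ink and printer example can be made concrete with a small sketch. The profit figures and purchase records below are hypothetical and are not taken from the thesis; the sketch only illustrates how an item can be more frequent yet less profitable once quantity (internal utility) is multiplied by unit profit (external utility).

```python
# Hypothetical unit profits (external utility); illustrative values only.
unit_profit = {"ink": 5.0, "printer": 120.0}

# Hypothetical purchase records: item -> quantity bought (internal utility).
transactions = [
    {"ink": 2},
    {"ink": 1},
    {"ink": 3},
    {"printer": 1},
]


def frequency(item, records):
    """Number of records in which the item appears."""
    return sum(1 for record in records if item in record)


def total_utility(item, records, profits):
    """Sum of quantity * unit profit over all records."""
    return sum(record.get(item, 0) * profits[item] for record in records)


ink_freq = frequency("ink", transactions)
printer_freq = frequency("printer", transactions)
ink_utility = total_utility("ink", transactions, unit_profit)
printer_utility = total_utility("printer", transactions, unit_profit)

print(f"ink:     frequency={ink_freq}, utility={ink_utility}")
print(f"printer: frequency={printer_freq}, utility={printer_utility}")
```

With these assumed numbers, ink appears in three records against one for the printer, yet the single printer sale carries the higher total utility, which is the kind of pattern a purely frequency-based miner would rank lower.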

Most existing algorithms that factor in item utility are designed only for precise databases and do not work on uncertain data. In addition, traditional uncertain sequential pattern algorithms such as U-Apriori, UF-Growth, and U-PLWAP do not include utility measures. In U-PLWAP, the web sequences are derived from weblog data without including the time spent by the user, and the web pages are not associated with any rating. When these two utilities are considered, items with lower existential probability can sometimes be more profitable to the website owner. Among traditional utility-based algorithms, the only one that addresses both uncertainty and high utility is the PHUI-UP algorithm, which treats probability and utility as separate entities: because it applies two different thresholds, the retrieved patterns do not depend on both jointly, and it does not handle web sequence databases. It also relies on a level-wise candidate generation-and-test methodology, which requires several database scans, and it does not mine uncertain web access sequence databases.
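The difference between filtering on probability and utility separately and weighing them together can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the thesis's uncertainty model, so a simple expected utility (probability × time spent × rating) stands in for it, and the `Event` type, the threshold values, and the page data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    page: str
    probability: float  # existential probability in [0, 1]
    time_spent: float   # internal utility, e.g. minutes on the page
    rating: float       # external utility, e.g. page rating


def expected_utility(event):
    """Assumed combination: probability-weighted utility."""
    return event.probability * event.time_spent * event.rating


# Hypothetical events: a likely low-value page and a less likely high-value one.
events = [
    Event("home", probability=0.9, time_spent=1.0, rating=2.0),
    Event("checkout", probability=0.4, time_spent=5.0, rating=4.0),
]

MIN_PROBABILITY = 0.5  # hypothetical threshold
MIN_UTILITY = 5.0      # hypothetical threshold


def passes_two_thresholds(event):
    """Probability and utility are each checked against their own threshold."""
    utility = event.time_spent * event.rating
    return event.probability >= MIN_PROBABILITY and utility >= MIN_UTILITY


def passes_combined(event):
    """A single check on the probability-weighted utility."""
    return expected_utility(event) >= MIN_UTILITY


two_threshold_pages = [e.page for e in events if passes_two_thresholds(e)]
combined_pages = [e.page for e in events if passes_combined(e)]

print("two thresholds:", two_threshold_pages)
print("combined:      ", combined_pages)
```

Under these assumed values, the two-threshold filter rejects both pages, while the combined measure keeps the checkout page despite its lower existential probability, because its utility is high enough to outweigh the uncertainty.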

This thesis proposes the HUU-PLWAP miner, an algorithm for mining uncertain sequential patterns with internal and external utility information using the PLWAP tree approach, which cuts down on the multiple database scans required by level-wise approaches. HUU-PLWAP uses uncertain internal utility values (derived from the sequence uncertainty model) and constant, predefined external utility values to retrieve high utility sequential patterns from uncertain web access sequence databases with the help of the U-PLWAP methodology. Experiments show that HUU-PLWAP is at least 95% faster than U-PLWAP and 75% faster than the PHUI-UP algorithm.

Thesis Committee:
Internal Reader: Dr. Robin Gras
External Reader: Dr. Severien Nkurunziza
Advisor: Dr. Christie Ezeife
Chair: Dr. Luis Rueda
