Service Quality Monitoring in Confined Spaces Through Mining Twitter Data (Aspect Extraction)
This is a Python implementation of aspect extraction in the context of public transport service quality. The repository uses a pre-trained BERT language model to transform multi-label tweets into vector representations. First, we fine-tune the model on a dataset of tweets. Then, using a binary classifier, tweets are classified into semantically related groups, which in our application are service quality aspects.
In this project, two major transport hubs are used as case studies because of their importance in moving large numbers of people.
First, a Twitter dataset comprising more than 32 million tweets is collected. This data is obtained from the Australian Urban Research Infrastructure Network (AURIN). Keywords and spatial proximity to the hubs are used to detect relevant tweets.
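The filtering step above can be sketched as follows. This is a minimal illustration, not the code used in the project: the tweet layout (`text`, `coords`), the keyword list, the hub coordinates, the 500 m radius, and the rule that either a keyword match or proximity is enough are all assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_candidate(tweet, hubs, keywords, radius_m=500.0):
    """Return True if the tweet mentions a keyword or was posted near a hub.

    Assumed (hypothetical) tweet layout:
    {"text": str, "coords": (lat, lon) or None}.
    Keyword matching is a plain case-insensitive substring test, so
    "train" also matches "training".
    """
    text = tweet["text"].lower()
    if any(keyword in text for keyword in keywords):
        return True
    coords = tweet.get("coords")
    if coords is None:
        return False
    return any(haversine_m(*coords, *hub) <= radius_m for hub in hubs)


# Placeholder values, not the hubs or keywords used in the paper.
HUBS = [(-37.80, 144.95)]
KEYWORDS = ["train", "station"]

tweets = [
    {"text": "Delayed train again", "coords": None},
    {"text": "Nice coffee", "coords": (-33.87, 151.21)},
]
candidates = [t for t in tweets if is_candidate(t, HUBS, KEYWORDS)]
```

Whether the two criteria are combined with "or" or with "and" changes recall considerably, so the rule should be checked against the paper before reuse.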
Next, tweets are manually labelled and mapped to different aspects of service quality (SQ): Safety, View, Information, Service Reliability, Comfort, Personnel, and Additional Services. Tweets that do not fall into any of these aspects are considered irrelevant to the SQ of public transport and are therefore discarded (Class -1).
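As an illustration of how these labels could be prepared for a multi-label classifier, the sketch below turns a tweet's aspects into a multi-hot vector and drops Class -1 tweets. The aspect names come from the list above; the row layout `(tweet_id, label)` and the function names are hypothetical and may differ from the released dataset.

```python
ASPECTS = [
    "Safety",
    "View",
    "Information",
    "Service Reliability",
    "Comfort",
    "Personnel",
    "Additional Services",
]
IRRELEVANT = -1  # Class -1: not related to service quality


def encode_labels(aspects):
    """Map a collection of aspect names to a 0/1 vector ordered like ASPECTS."""
    unknown = set(aspects) - set(ASPECTS)
    if unknown:
        raise ValueError(f"unknown aspects: {sorted(unknown)}")
    return [1 if aspect in aspects else 0 for aspect in ASPECTS]


def keep_relevant(rows):
    """Drop rows labelled as irrelevant.

    Assumed (hypothetical) row layout: (tweet_id, label), where label is
    either IRRELEVANT or a list of aspect names.
    """
    return [(tweet_id, label) for tweet_id, label in rows if label != IRRELEVANT]


rows = [
    ("1", ["Safety", "Comfort"]),
    ("2", IRRELEVANT),
    ("3", ["Information"]),
]
encoded = [(tweet_id, encode_labels(label)) for tweet_id, label in keep_relevant(rows)]
```

With this encoding, each position in the vector can serve as the target of one binary decision, which is one common way to handle tweets that carry more than one aspect.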
This dataset contains tweet IDs and their corresponding service quality labels according to the framework presented in our paper. For text processing, you need to collect the textual content of the tweets using the Twitter API. For further details, please refer to our paper:
Rahimi, M.M., Naghizade, E., Stevenson, M., Winter, S., Service Quality Monitoring in Confined Spaces Through Mining Twitter Data. Journal of Spatial Information Science.
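Because the dataset ships tweet IDs rather than text, the labels have to be joined with tweet text retrieved separately. The sketch below shows one way to do that join, assuming the text has already been fetched into a dictionary keyed by tweet ID. The column names `tweet_id` and `label` are hypothetical and should be adjusted to the actual file, and the Twitter API call itself is not shown.

```python
import csv
import io


def attach_text(label_file, texts_by_id):
    """Join labelled tweet IDs with separately retrieved tweet text.

    label_file: open text file with (assumed) columns "tweet_id" and "label".
    texts_by_id: dict mapping tweet ID strings to tweet text.
    Tweets whose text could not be retrieved (for example, deleted tweets)
    are skipped.
    """
    joined = []
    for row in csv.DictReader(label_file):
        text = texts_by_id.get(row["tweet_id"])
        if text is None:
            continue
        joined.append({"tweet_id": row["tweet_id"], "label": row["label"], "text": text})
    return joined


# In-memory stand-in for the label file and the retrieved text.
label_file = io.StringIO("tweet_id,label\n1,Safety\n2,-1\n3,Comfort\n")
texts_by_id = {"1": "platform felt unsafe tonight", "3": "new seats are comfortable"}
dataset = attach_text(label_file, texts_by_id)
```

Tweet IDs are kept as strings here so that long numeric IDs are not altered by integer or float conversion.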