Rivindu Perera


A Multi-Strategy Approach for Location Mining in Tweets: AUT NLP Group Entry for ALTA-2014 Shared Task

Abstract
This paper describes the strategy and the results of a location mining system used for the ALTA-2014 shared task competition. The task required the participants to identify the location mentions in 1003 Twitter test messages given a separate annotated training set of 2000 messages. We present an architecture that uses a basic named entity recognizer in conjunction with various rule-based modules and knowledge infusion to achieve an average F score of 0.747 which won the second place in the competition. We used the pre-trained Stanford NER which gives us an F score of 0.532 and used an ensemble of other techniques to reach the 0.747 value. The other major source of location resolver was the DBpedia location list which was used to identify a large percentage of locations with an individual F-score of 0.935.