Rivindu Perera


The Role of Linked Data in Content Selection

Abstract
This paper explores the suitability of Linked Data as a knowledge source for content selection. Content selection is a crucial subtask of Natural Language Generation that determines which content from a knowledge source is relevant to a communicative goal. The online era has produced extensive amounts of generic knowledge, some of which has been made available as structured knowledge sources for computational natural language processing. This paper proposes a content selection model that uses a generic structured knowledge source, DBpedia, the structured counterpart of the unstructured Wikipedia. The proposed model uses log likelihood to rank content from DBpedia Linked Data by relevance to a communicative goal. We performed experiments with DBpedia as the Linked Data resource and two keyword datasets as communicative goals: keywords extracted from the QALD-2 training dataset were used to optimize parameters, and the QALD-2 testing dataset was used for testing. The results were evaluated against a verbatim-based selection strategy and showed that our model can perform 18.03% better than verbatim selection.

Tags: Natural Language Processing (NLP), Content Selection, Natural Language Generation (NLG), DBpedia, Semantic Web, Linked Open Data