Researchers in every field are being made increasingly aware of the need for their research to have impact. However, often researchers don’t realize where beyond their own field of study their research might have impact. How does one go about finding the documents on the Internet which could be connected conceptually to another document, such as one’s own work or project proposal? Searching for key terms is a good start but often the same kinds of documents come up top again and again and it is difficult to sift through the results to find something different and relevant. Further, documents from other domains or sources (e.g., government policy documents) may use a different vocabulary making them less likely to come up in keyword searches.
In the Text Analytics Group (TagLab) at Sussex, we are developing a system which does four things. First, it automatically identifies key words and phrases for a document or set of documents. Second, it searches the web using queries based on combinations of the key words/phrases and related words. Third, it allows the user to build custom classifiers (using active learning), e.g., for relevance. Finally, it clusters the results with a view to making it easier to identify documents or clusters of documents outside of existing known clusters. The purpose of this session is to teach delegates about the underlying technology and to give delegates the opportunity to play with and evaluate the prototype system, using their own work as input.
Delegates will ideally have a laptop with Google Chrome installed to be able to access the software which is run as a web service. It would also be helpful if delegates brought with them a digital copy of some of their own work (e.g., an academic paper or grant proposal) in raw text format (i.e., ‘.txt’) which can be uploaded and processed by the system. However, we can help with both installation of software and file formatting as required.