Difference between revisions of "CS6093/Projects"

From VistrailsWiki

Revision as of 05:05, 14 February 2012

Matching Entities and News

In this project, you will build a real-time service that matches entities with news. Given a set of entities mentioned in some input text (e.g., tweets), this service will identify and rank a set of relevant news documents. To do this, you will have to complete the following three tasks:

  • 1 Given an input text I (tweets, news, article) with a timestamp, you need to identify the set of entities E (PERSON, LOCATION, ORGANIZATION, MISC) present in I. To find the entities, you can use LbjNerTagger, the tool described in L. Ratinov and D. Roth, "Design Challenges and Misconceptions in Named Entity Recognition", CoNLL 2009, which is freely available: http://cogcomp.cs.illinois.edu/page/download_view/NETagger
  • 2 Given the entities found in the input text and the timestamp, find the related news stories. You can do this by submitting a query to news APIs, such as Google News, Bing News, the NYTimes, and digg.com, in order to obtain the titles, content, links, publisher, and publication date of news that mentions the given entities.

  • 3 Create a method that ranks the retrieved news by relevance. In addition to the actual entities, you should also consider the available metadata, including the input text and other features extracted automatically from the news, e.g., content, news title, publisher.
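
As a starting point for tasks 2 and 3, the following is a minimal sketch in Python, not a prescribed solution. It assumes the entities have already been extracted in task 1 and the news has already been fetched. The names NewsItem, build_query, and rank_news, the quoted OR query syntax, and the entity_weight parameter are all hypothetical choices made for illustration; the score combines TF-IDF cosine similarity between the input text and each news item with the fraction of entities the item mentions. Publisher and timestamp are carried along but not used in the score here.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class NewsItem:
    # Hypothetical record for the fields task 2 asks you to collect.
    title: str
    content: str
    publisher: str = ""
    link: str = ""


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_query(entities):
    """Join entity names into one quoted OR query (the syntax is an assumption;
    each news API defines its own query language)."""
    return " OR ".join('"{}"'.format(e) for e in entities)


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_news(input_text, entities, news, entity_weight=0.5):
    """Return (score, item) pairs sorted by descending score."""
    docs = [tokenize(n.title + " " + n.content) for n in news]
    query = tokenize(input_text)

    # Smoothed inverse document frequency over the news items plus the input text.
    n_docs = len(docs) + 1
    df = Counter(t for d in docs + [query] for t in set(d))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    def tfidf(tokens):
        return {t: c * idf[t] for t, c in Counter(tokens).items()}

    q_vec = tfidf(query)
    scored = []
    for item, doc in zip(news, docs):
        text = (item.title + " " + item.content).lower()
        # Fraction of the extracted entities that appear verbatim in the item.
        hits = sum(1 for e in entities if e.lower() in text)
        overlap = hits / len(entities) if entities else 0.0
        score = cosine(q_vec, tfidf(doc)) + entity_weight * overlap
        scored.append((score, item))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

A call such as rank_news(tweet_text, ["Obama", "Chicago"], fetched_items) returns the items with the best match first. Further features (publisher, closeness of the publication date to the input timestamp) could be added as extra terms in the score.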

Here's a sample of the input for this task, which consists of a set of tweets: Sample Data. (Additional data will be provided.)