Information filtering
Remove redundant or undesired information from an information stream, using semi-automatic or fully automatic methods, before presenting it to human users.
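As a rough illustration of the idea, the sketch below filters a small stream of headlines. It drops items that repeat an earlier one after normalization (redundant) and items that contain a blocked term (undesired). The function names, the sample stream, and the blocked-term list are assumptions made for this example; real filters usually rely on richer models than exact matching.

```python
import re

def normalize(text):
    """Lowercase and drop punctuation so trivial variants compare equal."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

def filter_stream(items, blocked_terms):
    """Yield items that are neither repeats nor contain a blocked term."""
    seen = set()
    blocked = {t.lower() for t in blocked_terms}
    for item in items:
        key = normalize(item)
        if key in seen:
            continue  # redundant: an equivalent item was already passed on
        if blocked & set(key.split()):
            continue  # undesired: contains a blocked term
        seen.add(key)
        yield item

# Hypothetical stream used only for illustration.
stream = [
    "Markets rally on rate cut",
    "markets rally on rate cut!",
    "Win a free prize now",
    "New text mining toolkit released",
]
kept = list(filter_stream(stream, blocked_terms=["prize"]))
print(kept)
```

Here the second headline is dropped as a repeat of the first and the third is dropped because of the blocked term, so only two items reach the user.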
Document summarization
Create a shortened version of a text in order to reduce information overload.
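One simple way to shorten a text is extractive summarization: score each sentence by how frequent its words are in the whole text and keep the highest-scoring sentences. The sketch below assumes that approach. The stopword list, the scoring rule (average word frequency), and the sample text are illustrative choices, not a method prescribed here.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "it", "that", "for", "on"}

def content_words(text):
    """Return lowercase words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def summarize(text, max_sentences=1):
    """Keep the sentences whose words are most frequent in the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in chosen)

text = (
    "Text mining extracts patterns from text. "
    "The weather was pleasant yesterday. "
    "Mining text at scale needs good text tools."
)
summary = summarize(text, max_sentences=1)
print(summary)
```

Averaging the word frequencies keeps long sentences from winning on length alone.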
Document clustering and categorization
Group documents together based on their proximity (as defined by a suitable spatial model) in an unsupervised fashion.
Clustering is an unsupervised technique that does not assume a priori knowledge: data are grouped into categories on the basis of some measure of inherent similarity between instances, in such a way that objects in one cluster are very similar (compactness property) and objects in different clusters are different (separateness property).
Classification is a supervised technique that assigns a class to each data item: an initial training phase runs over a set of human-annotated data, and a subsequent phase applies the learned classifier to the remaining elements. Learning is therefore either supervised or unsupervised. Supervised learning models require a priori knowledge of the classes; techniques such as naive Bayes, regression, decision trees, and support vector machines are used in classification problems. In contrast, unsupervised learning models do not require prior knowledge of the classes and use intrinsic similarities in the data as the measure for clustering. K-means is an example of cluster analysis. Once the clusters are determined, objects are tagged with the cluster they belong to. This is known as labeling. In other words, the labeling is performed automatically; no human is involved.
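To make the unsupervised case concrete, here is a minimal k-means sketch in plain Python. It assumes two-dimensional points and, for reproducibility, takes the first k points as initial centroids (practical implementations usually initialize randomly or with k-means++). The sample points are made up. The returned labels are the automatic cluster tags described above.

```python
import math

def kmeans(points, k, iterations=10):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    # Assumption for determinism: the first k points serve as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: tag each point with the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical data: two visibly separate groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.8, 8.2), (0.9, 1.1), (8.1, 7.9)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

No human-annotated training data is involved: the groups emerge purely from the distances between points, which is what distinguishes this from the supervised classifiers listed above.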
Question answering (QA)
Select relevant document portions to answer users' queries formulated in natural language.
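The selection step can be sketched as passage ranking: compare the words of the question with the words of each candidate passage and return the best match. The example below assumes Jaccard overlap on stopword-filtered word sets as the relevance score. The passages, the stopword list, and the function names are hypothetical, and real QA systems use far stronger retrieval and answer-extraction models.

```python
import re

# Small illustrative stopword list (an assumption made for this sketch).
STOPWORDS = {
    "a", "an", "the", "is", "are", "of", "on", "in", "to",
    "which", "what", "who", "how", "as", "or", "such", "into",
}

def tokens(text):
    """Return the set of lowercase content words in the text."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def best_passage(question, passages):
    """Return the passage with the highest Jaccard overlap with the question."""
    q = tokens(question)

    def score(passage):
        p = tokens(passage)
        union = q | p
        return len(q & p) / len(union) if union else 0.0

    return max(passages, key=score)

# Hypothetical document portions.
passages = [
    "K-means partitions data into k clusters around centroids.",
    "Naive Bayes is a probabilistic classifier based on Bayes' theorem.",
    "Recommender systems suggest items such as songs or movies.",
]
answer = best_passage("Which classifier is based on Bayes' theorem?", passages)
print(answer)
```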
Recommender systems
A form of information filtering, by which interesting information items (e.g., songs, movies, or books) are presented to users based on their profile or their neighbors’ taste, neighborhood being defined by such aspects as geographical proximity, social acquaintance, or common interests.
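A neighbor-based recommender of this kind can be sketched as follows, assuming that "common interests" means overlap between the sets of items two users like. Unseen items are scored by the summed similarity of the neighbors who like them. The user names, the items, and the choice of Jaccard similarity are assumptions for illustration only.

```python
def jaccard(a, b):
    """Similarity of two item sets: shared items divided by all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user, likes, top_n=2):
    """Rank items the user has not seen by the similarity of neighbors who like them."""
    own = likes[user]
    scores = {}
    for other, items in likes.items():
        if other == user:
            continue
        similarity = jaccard(own, items)
        if similarity == 0.0:
            continue  # no common interests, so not a neighbor
        for item in items - own:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]

# Hypothetical user profiles.
likes = {
    "ana": {"Song A", "Movie B", "Book C"},
    "ben": {"Song A", "Movie B", "Book D"},
    "cleo": {"Song A", "Movie E"},
    "dev": {"Book F"},
}
recs = recommend("ana", likes)
print(recs)
```

Swapping the similarity function, for example to one based on geographical distance or social ties, changes how the neighborhood is defined without changing the rest of the procedure.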