3388. Principal Phrase Mining: An Automated Method for Extracting Meaningful Phrases from Text
Invited abstract in session TC-40: Data mining biomedical applications, stream Advances in Stochastic Modelling and Learning Methods.
Tuesday, 12:30-14:00
Room: 96 (building: 306)
Authors (first author is the speaker)
1. Ellie Small, Mathematics and CS, Drew University
Abstract
Technological advances have resulted in an explosion of information in all aspects of society. Much of
this information is unstructured, in the form of text. Because it is unstructured, machine learning and
artificial intelligence techniques cannot be applied to it directly.
The text mining field allows the extraction of frequent words from collections of texts in order to obtain
summary information in a more structured format. Machine learning techniques and artificial
intelligence may then be applied to this structured data.
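As a rough illustration of the frequent-word extraction described above, the following minimal Python sketch counts word tokens across a small collection of texts. The function name `frequent_words`, the tokenizing regular expression, and the sample documents are assumptions made for this example; they are not taken from the talk.

```python
from collections import Counter
import re


def frequent_words(texts, top_n=3):
    """Count lowercase word tokens across all texts and return the most common.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for text in texts:
        # Simplistic tokenizer (an assumption): runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(top_n)


# Made-up sample documents.
docs = [
    "Text mining extracts frequent words from text.",
    "Frequent words summarize text.",
]
top = frequent_words(docs)
print(top)
```

The resulting word/count pairs are the kind of structured summary to which downstream learning methods could then be applied.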
However, as useful as it is to obtain a collection of commonly occurring words from texts, more specific
information may be obtained from texts in the form of commonly occurring phrases.
Despite this need, extracting frequent phrases is not commonly done due to inherent complications, the
most significant being double-counting. Double-counting occurs when words or phrases are counted
when they appear inside longer phrases that themselves are also counted, resulting in a large selection of
mostly meaningless phrases that are frequent only because they occur inside frequent super phrases.
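The double-counting problem can be reproduced with a naive n-gram counter. The sketch below is only an illustration of the issue, not the method presented in the talk; the function name `ngram_counts` and the toy corpus are assumptions made for this example.

```python
from collections import Counter


def ngram_counts(texts, max_len=3):
    """Naively count every contiguous word sequence of 1..max_len words.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts


# Toy corpus (made up): one frequent three-word phrase plus one other phrase.
docs = ["new york city"] * 5 + ["new car"]
counts = ngram_counts(docs)

# "york city" never occurs outside "new york city", yet the naive count
# makes it look exactly as frequent as the full phrase.
print(counts["new york city"], counts["york city"], counts["new york"])
```

In this toy corpus every occurrence of "york city" and "new york" lies inside "new york city", so their counts are inflated purely by the superphrase; a naive frequency threshold would report all three as frequent phrases.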
Several papers have been written on phrase mining that describe solutions to this issue; however, they
either require a list of so-called quality phrases to be available to the extracting process, or they require
human interaction to identify those quality phrases during the process. In addition, those methods are
often very time-consuming.
Keywords
- Analytics and Data Science
- Algorithms
- Artificial Intelligence
Status: accepted