    Extracting Dependency Relations for Opinion Mining

    Intent mining is a special kind of document analysis whose goal is to assess the attitude of the document author with respect to a given subject. Opinion mining is a kind of intent mining where the attitude is a positive or negative opinion. Techniques based on extracting dependency relations have proven more effective for intent mining than traditional bag-of-words approaches. We propose an approach to opinion mining which uses frequent dependency sub-trees as features for classifying documents and extracting opinions. We developed an efficient multi-language dependency parser, usable on large-scale collections, to analyze documents and extract dependency relations. An opinion retrieval system has been built and is being tested on the TREC 2006 Blog Opinion task.

    DeSR Dependency Parser

    DeSR is a Dependency Shift/Reduce parser for multiple languages. It generates dependency parse trees for natural language sentences. The parser has been trained on 24 languages, including those of the 2006 and 2007 CoNLL Shared tasks. Dependency structures are built by scanning the input and deciding at each step whether to perform a shift or to create a dependency between two adjacent tokens. The parser algorithm is deterministic and highly efficient while still achieving state-of-the-art accuracy.
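
The shift-or-attach loop described above can be sketched in a few lines of Python. This is a minimal illustration, not DeSR's actual code: the action names (`SHIFT`, `LEFT`, `RIGHT`), the `parse` and `make_oracle` functions, and the use of gold heads in place of a trained classifier are all assumptions made for the example, and it covers only projective, unlabeled trees.

```python
def parse(tokens, choose_action):
    """Build a head map {dependent: head} by repeatedly applying actions.

    `choose_action` stands in for the trained classifier (an assumption).
    """
    stack, buffer, heads = [], list(range(len(tokens))), {}
    while buffer or len(stack) > 1:
        action = choose_action(stack, buffer, heads)
        if action == "SHIFT" and buffer:
            # Move the next input token onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFT" and len(stack) >= 2:
            # The token below the top becomes a dependent of the top.
            dependent = stack.pop(-2)
            heads[dependent] = stack[-1]
        elif action == "RIGHT" and len(stack) >= 2:
            # The top token becomes a dependent of the token below it.
            dependent = stack.pop()
            heads[dependent] = stack[-1]
        else:
            raise ValueError(f"action {action!r} is not applicable")
    return heads


def make_oracle(gold_heads):
    """Return a decision function that reads actions off a gold tree."""
    def oracle(stack, buffer, heads):
        if len(stack) >= 2:
            below, top = stack[-2], stack[-1]
            if gold_heads[below] == top:
                return "LEFT"
            # Attach `top` only once all of its own dependents are attached.
            top_is_complete = all(
                d in heads for d, h in enumerate(gold_heads) if h == top
            )
            if gold_heads[top] == below and top_is_complete:
                return "RIGHT"
        return "SHIFT"
    return oracle


sentence = ["The", "parser", "builds", "trees"]
gold = [1, 2, -1, 2]  # head index per token; -1 marks the root
print(parse(sentence, make_oracle(gold)))  # {0: 1, 1: 2, 3: 2}
```

Each token is shifted once and attached at most once, so the loop runs in time linear in the sentence length, which is the source of the efficiency the abstract mentions. In a real parser the oracle would only be used to generate training examples for the classifier.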

    Embeddable Common Lisp

    ECL (Embeddable Common-Lisp) is an interpreter/compiler for the Common-Lisp language as described in the X3J13 ANSI specification, featuring CLOS (Common-Lisp Object System), conditions, loops, etc., plus a translator to C, which can produce standalone executables. ECL supports the operating systems Linux, FreeBSD, NetBSD, OpenBSD, Solaris and Windows, running on top of the Intel, Sparc, Alpha, PowerPC and ARM processors.

    WikiExtractor

    WikiExtractor.py is a Python script that extracts and cleans text from a Wikipedia database dump. It requires Python 2.7 but no additional libraries. The current version performs template expansion by preprocessing the whole dump and extracting template definitions. The code provides two performance features: multiprocessing is used to process articles in parallel, and a cache of parsed templates is kept.
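
The two performance features can be illustrated with standard-library tools. The sketch below is not taken from WikiExtractor.py: the `TEMPLATES` table, the function names, and the very simplified template syntax (parameterless `{{name}}` calls only) are hypothetical, and it is written for Python 3 rather than the Python 2.7 the script requires.

```python
import re
from functools import lru_cache
from multiprocessing import Pool

# Hypothetical template definitions, standing in for those collected
# in a preprocessing pass over the dump.
TEMPLATES = {"sic": "[sic]", "sep": " - "}

# Matches parameterless calls such as {{sic}}.
TEMPLATE_CALL = re.compile(r"\{\{([^{}|]+)\}\}")


@lru_cache(maxsize=None)
def expand_template(name):
    """Return a template's expansion; unknown templates expand to nothing."""
    return TEMPLATES.get(name.strip(), "")


def clean_article(wikitext):
    """Expand simple templates, then strip bold/italic quote markup."""
    text = TEMPLATE_CALL.sub(lambda m: expand_template(m.group(1)), wikitext)
    return re.sub(r"'{2,}", "", text)


def extract_all(articles, workers=2):
    """Clean articles in parallel; each worker keeps its own template cache."""
    with Pool(workers) as pool:
        return pool.map(clean_article, articles)


if __name__ == "__main__":
    print(extract_all(["'''Pisa''' is a city{{sic}}.", "A{{sep}}B"]))
```

Because `lru_cache` lives inside each worker process, the cache here is per process rather than shared; whether the real script shares its cache across workers is not stated in the description above.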

    DeepNL

    DeepNL is a Python library for Natural Language Processing tasks based on a Deep Learning neural network architecture. The library currently provides tools for performing part-of-speech tagging, Named Entity tagging and Semantic Role Labeling. DeepNL also provides code for creating word embeddings from text, using either the Language Model approach by [Collobert11], or Hellinger PCA, as in [Lebret14]. It can also create sentiment specific word embeddings from a corpus of annotated Tweets

    IXE at the TREC Terabyte Task

    The TREC Terabyte task provides an opportunity to analyze scalability issues in document retrieval systems. I describe how to overcome some of these issues, and in particular the improvements made to the IXE search engine in order to achieve higher precision while maintaining good retrieval performance. A new algorithm has been introduced to handle OR queries efficiently. A proximity factor is also computed and added to the relevance score obtained by the PL2 document weighting model; several experiments have been performed to tune its parameters. By also tuning other parameters used in relevance ranking, IXE achieved the second-best overall P@10 score, combined with the fastest reported retrieval speed.
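
As a rough illustration of adding a proximity factor to a base relevance score, the sketch below rewards documents in which all query terms occur close together. The abstract does not give IXE's formula, so everything here is an assumption: the factor is defined as the number of distinct query terms divided by the length of the shortest token span containing them all, `alpha` is a hypothetical tuning weight, and `base_score` simply stands in for a score such as the one produced by PL2.

```python
def smallest_window(tokens, terms):
    """Length of the shortest token span containing every term, or None."""
    needed = set(terms)
    if not needed:
        return None
    counts = {}  # occurrences of needed terms inside the current window
    best = None
    left = 0
    for right, token in enumerate(tokens):
        if token in needed:
            counts[token] = counts.get(token, 0) + 1
        # Shrink from the left while the window still covers all terms.
        while len(counts) == len(needed):
            width = right - left + 1
            if best is None or width < best:
                best = width
            head = tokens[left]
            if head in counts:
                counts[head] -= 1
                if counts[head] == 0:
                    del counts[head]
            left += 1
    return best


def proximity_factor(tokens, terms):
    """Illustrative factor in (0, 1]; 0.0 when some term is missing."""
    width = smallest_window(tokens, terms)
    return 0.0 if width is None else len(set(terms)) / width


def combined_score(base_score, tokens, terms, alpha=0.5):
    """Add the weighted proximity factor to a base relevance score."""
    return base_score + alpha * proximity_factor(tokens, terms)


close = "the fast search engine ranks results".split()
apart = "search for a good engine".split()
print(combined_score(3.0, close, ["search", "engine"]))
print(combined_score(3.0, apart, ["search", "engine"]))
```

With the same base score, the document where the two terms are adjacent ends up ranked above the one where they are five tokens apart. Tuning, as mentioned in the abstract, would correspond here to choosing `alpha`.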

    Experiments with a Multilanguage Non-Projective Dependency Parser

    Parsing natural language is an essential step in several applications that involve document analysis, e.g. knowledge extraction, question answering, summarization, filtering. Using Maximum Entropy (Berger et al., 1996) classifiers I built a parser that achieves a throughput of over 200 sentences per second, with a small loss in accuracy of about 2-3%. I extended the Yamada-Matsumoto parser to handle labeled dependencies, trying two approaches: using a single classifier to predict pairs of actions and labels, and using two separate classifiers, one for actions and one for labels. Finally, I extended the repertoire of actions used by the parser in order to handle non-projective relations. Tests on the PDT (Böhmová et al., 2003) show that the added actions are sufficient to handle all cases of non-projectivity.

    DeSR at the Evalita Dependency Parsing Task

    DeSR is a multilingual deterministic shift/reduce dependency parser, capable of handling non-projective dependencies incrementally. It learns from annotated corpora the actions to use for building the parse trees. For the Evalita task DeSR used a second-order multiclass averaged perceptron classifier as a learning algorithm.
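
For readers unfamiliar with the learning algorithm, here is a minimal multiclass averaged perceptron that predicts a parser action from a list of feature strings. It is a generic textbook-style sketch, not DeSR's classifier: it does not attempt the second-order part, it uses naive averaging (summing all weights after every example), and the feature names such as `s0=DET` and the toy training set are invented for illustration.

```python
from collections import defaultdict


class AveragedPerceptron:
    def __init__(self, classes):
        self.classes = list(classes)
        self.weights = defaultdict(float)  # (feature, label) -> weight
        self.totals = defaultdict(float)   # sum of each weight over all steps
        self.steps = 0

    @staticmethod
    def score(features, label, weights):
        return sum(weights.get((f, label), 0.0) for f in features)

    def predict(self, features, weights=None):
        w = self.weights if weights is None else weights
        # Ties are broken in favour of the class listed first.
        return max(self.classes, key=lambda c: self.score(features, c, w))

    def update(self, features, gold):
        guess = self.predict(features)
        if guess != gold:
            # Reward the gold label and penalise the wrong guess.
            for f in features:
                self.weights[(f, gold)] += 1.0
                self.weights[(f, guess)] -= 1.0
        # Naive averaging: simple but O(number of weights) per example.
        self.steps += 1
        for key, value in self.weights.items():
            self.totals[key] += value

    def averaged(self):
        """Weights averaged over every training step."""
        return {k: total / self.steps for k, total in self.totals.items()}


def train(examples, classes, epochs=3):
    model = AveragedPerceptron(classes)
    for _ in range(epochs):
        for features, gold in examples:
            model.update(features, gold)
    return model


# Toy data: s0 = part of speech on top of the stack, b0 = next input token.
examples = [
    (("s0=DET", "b0=NOUN"), "LEFT"),
    (("s0=VERB", "b0=NOUN"), "RIGHT"),
    (("s0=NOUN", "b0=VERB"), "SHIFT"),
]
model = train(examples, ["SHIFT", "LEFT", "RIGHT"])
print(model.predict(("s0=VERB", "b0=NOUN"), model.averaged()))  # RIGHT
```

Predicting with the averaged weights rather than the final ones is what distinguishes the averaged perceptron from the plain one; it tends to reduce the influence of the last few updates.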

    Blog Mining Through Opinionated Words

    Intent mining is a special kind of document analysis whose goal is to assess the attitude of the document author with respect to a given subject. Opinion mining is a kind of intent mining where the attitude is a positive or negative opinion. Most systems tackle the problem with a two-step approach: an information retrieval phase followed by a postprocessing or filtering phase to identify opinionated blogs. We explored a single-stage approach to opinion mining, retrieving opinionated documents ranked with a special ranking function which exploits an index enriched with opinion tags. A set of subjective words is used as tags for identifying opinionated sentences. Subjective words are marked as “opinionated” and are used in the retrieval phase to boost the rank of documents containing them. In indexing the collection, we recovered the relevant content from the blog permalink pages, exploiting HTML metadata about the generator and heuristics to remove irrelevant parts from the body. The index also contains information about the occurrence of opinionated words, extracted from an analysis of WordNet glosses. The experiments compared the precision of normal queries with that of queries which included proximity to an opinionated word as a constraint. The results show a significant improvement in precision for both topic relevance and opinion relevance.
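
The idea of boosting documents in which a query term appears near an opinionated word can be sketched as follows. This is an illustration under stated assumptions, not the ranking function used in the experiments: the tiny `OPINION_WORDS` lexicon, the `window` size and the additive `boost` are all hypothetical, and the sketch scans token lists directly instead of querying an index enriched with opinion tags.

```python
# Hypothetical lexicon; the abstract derives its words from WordNet glosses.
OPINION_WORDS = {"great", "awful", "love", "hate"}


def opinion_hits(tokens, query_terms, window=3, lexicon=OPINION_WORDS):
    """Count query-term occurrences with an opinionated word within `window` tokens."""
    terms = set(query_terms)
    opinion_positions = [i for i, tok in enumerate(tokens) if tok in lexicon]
    hits = 0
    for i, tok in enumerate(tokens):
        if tok in terms and any(abs(i - j) <= window for j in opinion_positions):
            hits += 1
    return hits


def boosted_score(base_score, tokens, query_terms, boost=0.5, window=3):
    """Raise a base relevance score for each opinionated occurrence."""
    return base_score + boost * opinion_hits(tokens, query_terms, window)


post = "i love the new camera but the battery is just fine".split()
print(boosted_score(2.0, post, ["camera"]))   # 2.5
print(boosted_score(2.0, post, ["battery"]))  # 2.0
```

Here "camera" lies three tokens from "love" and receives the boost, while "battery" has no opinionated word nearby and keeps its base score, which mirrors the single-stage approach of ranking topical and opinionated evidence together.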

    Chunking and Dependency Parsing

    Since chunking can be performed efficiently and accurately, it is attractive to use it as a preprocessing step in full parsing stages. We analyze whether providing chunk data to a statistical dependency parser can benefit its accuracy. We present a set of experiments meant first to select a set of features that provide the greatest improvement to a Shift/Reduce dependency parser, then to determine an appropriate feature model. We report on the accuracy gain obtained using features from chunks produced by a statistical chunker as well as from an approximate representation of noun phrases induced directly by the parser. Finally, we analyze the degree of accuracy that such a parser can achieve in chunking compared to a specialized statistical chunker.