Sudden Cardiac Death in Athletes - What Can be Done?
Sudden death in athletes is a rare event, but its impact goes beyond sport. There are many causes of sudden death during exercise. While the responsibility for preventing or treating them lies with us physicians, preparticipation screening is largely ineffective and impractical. Definitive, large-scale prospective research is required in order to design the most cost-effective system for screening athletes. In the meantime, rapid access to defibrillators by trained personnel remains the best possible approach to abort sudden death.
Evaluation of non-cancer risk owing to groundwater fluoride and iron in a semi-arid region near the Indo-Bangladesh international frontier
Recommended from our members
Using social network information in recommender systems
Recommender Systems are used to select online information relevant
to a given user. Traditional (memory based) recommenders explore the user-item rating matrix and make recommendations based on users who have rated similarly or items that have been rated similarly. With the growing popularity of social networks, recommender systems can benefit from combining history of user preferences with information from the social/trust network of users. This thesis explores two techniques of combining user-item rating history with trust network information to make better user-item rating predictions. The first
approach (SCOAL [5]) simultaneously co-clusters and learns separate models
for each co-cluster. The co-clustering is based on the user features as well as
the rating history. This captures the intuition that certain groups of users have similar preferences for certain groups of items. The grouping of certain users is affected by the similarity in the rating behavior and the trust network.
The second graph-based label propagation approach (MAD [27]) works in a transductive setting and propagates ratings of user-item pairs directly on the
user social graph. We evaluate both approaches on two large public data-sets from Epinions.com and Flixster.com.
The thesis is amongst the first to explore the role of distrust in rating prediction. Since distrust is not as transitive as trust, i.e., an enemy's enemy need not be an enemy or a friend, distrust can't directly replace trust in trust
propagation approaches. By using a low dimensional representation of the original trust network in SCOAL, we use distrust as it is and don't propagate it. Using SCOAL, we can pin-point the groups of users and the groups of
items that have the same preference model. Both SCOAL and MAD are able to seamlessly integrate side information such as item-subject and item-author
information into the trust based rating prediction model.
Electrical and Computer Engineerin
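As a rough illustration of the traditional memory-based recommender mentioned in the abstract above (predicting from users who have rated similarly), here is a minimal Python sketch. It is not SCOAL or MAD and uses no trust network; the function names, the cosine similarity measure, and the toy ratings are all assumptions made for illustration.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user with nonzero similarity rated the item.
    """
    numerator = 0.0
    denominator = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = cosine_similarity(ratings[user], other_ratings)
        numerator += weight * other_ratings[item]
        denominator += abs(weight)
    return numerator / denominator if denominator > 0 else None


# Hypothetical toy data: user -> {item: rating}.
toy_ratings = {
    "alice": {"a": 5, "b": 3},
    "bob": {"a": 5, "b": 3, "c": 4},
    "carol": {"a": 1, "b": 5, "c": 2},
}
prediction = predict_rating(toy_ratings, "alice", "c")
```

Because alice's ratings match bob's more closely than carol's, the prediction for item "c" is pulled toward bob's rating. A trust-aware variant could, under further assumptions, blend these similarity weights with trust-network weights.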
Constrained relative entropy minimization with applications to multitask learning
This dissertation addresses probabilistic inference via relative entropy minimization subject to expectation constraints. A canonical representation of the solution is determined without the requirement for convexity of the constraint set, and is given by members of an exponential family. The use of conjugate priors for relative entropy minimization is proposed, and a class of conjugate prior distributions is introduced. An alternative representation of the solution is provided as members of the prior family when the prior distribution is conjugate. It is shown that the solutions can be found by direct optimization with respect to members of such parametric families. Constrained Bayesian inference is recovered as a special case with a specific choice of constraints induced by observed data.
The framework is applied to the development of novel probabilistic models for multitask learning subject to constraints determined by domain expertise. First, a model is developed for multitask learning that jointly learns a low rank weight matrix and the prior covariance structure between different tasks. The multitask learning approach is extended to a class of nonparametric statistical models for transposable data, incorporating side information such as graphs that describe inter-row and inter-column similarity. The resulting model combines a matrix-variate Gaussian process prior with inference subject to nuclear norm expectation constraints. In addition, a novel nonparametric model is proposed for multitask bipartite ranking. The proposed model combines a hierarchical matrix-variate Gaussian process prior with inference subject to ordering constraints and nuclear norm constraints, and is applied to disease gene prioritization. In many of these applications, the solution is found to be unique. Experimental results show substantial performance improvements as compared to strong baseline models.
Electrical and Computer Engineerin
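To make the idea of relative entropy minimization under an expectation constraint concrete, the following Python sketch handles only the textbook finite case: a discrete prior p and a single linear constraint E_q[f] = target. In that case the minimizer has the exponential-family form q(x) proportional to p(x) * exp(lam * f(x)). The function names, the bisection bounds, and the dice example are assumptions for illustration and are not taken from the dissertation.

```python
from math import exp


def tilted(prior, f, lam):
    """Exponentially tilt `prior` by lam * f and renormalize."""
    weights = [p * exp(lam * fx) for p, fx in zip(prior, f)]
    total = sum(weights)
    return [w / total for w in weights]


def expectation(q, f):
    """Expected value of f under the discrete distribution q."""
    return sum(qi * fi for qi, fi in zip(q, f))


def min_relative_entropy(prior, f, target, lo=-20.0, hi=20.0, iters=200):
    """Find q minimizing KL(q || prior) subject to E_q[f] = target.

    The tilted expectation is increasing in lam, so bisection suffices.
    Assumes `target` lies strictly between min(f) and max(f) and that the
    solution's lam falls inside [lo, hi].
    """
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if expectation(tilted(prior, f, mid), f) < target:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2.0
    return tilted(prior, f, lam), lam


# Hypothetical example: a fair-die prior constrained to have mean 4.5.
faces = [1, 2, 3, 4, 5, 6]
uniform = [1.0 / 6.0] * 6
q, lam = min_relative_entropy(uniform, faces, 4.5)
```

The constraint pushes probability mass toward the higher faces while staying as close to the uniform prior as the constraint allows.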
Analysis and classification of drift susceptible chemosensory responses
This report presents machine learning models that can accurately classify gases by analyzing data from an array of 16 sensors. More specifically, the report presents basic decision tree models and advanced ensemble versions. The contribution of this report is to show that basic decision trees perform reasonably well on the gas sensor data; however, their accuracy can be drastically improved by employing ensemble decision tree classifiers. The report presents bagged trees, AdaBoost trees and Random Forest models in addition to basic entropy and Gini based trees. It is shown that ensemble classifiers achieve a very high accuracy of 99% in classifying gases even when the sensor data is drift-ridden. Finally, the report compares the accuracy of all the models developed.
Electrical and Computer Engineerin
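The entropy and Gini criteria named in the abstract above are the standard impurity measures a decision tree uses to score candidate splits. The Python sketch below shows the two measures and the weighted impurity of a binary split; the function names and the gas labels are hypothetical, and this is not the report's implementation.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of the class distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def weighted_impurity(left, right, impurity):
    """Size-weighted impurity of a binary split; lower is better."""
    n = len(left) + len(right)
    return (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)


# Hypothetical labels for four sensor readings.
mixed = ["ethanol", "ethanol", "ammonia", "ammonia"]
perfect_split = weighted_impurity(mixed[:2], mixed[2:], gini)
```

A tree learner evaluates many candidate thresholds per feature and keeps the split with the lowest weighted impurity. Ensemble methods such as bagging and Random Forests train many such trees on resampled data and combine their votes.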
Knowledge transfer techniques for dynamic environments
The expense involved in obtaining class labels for data has led to the emergence of
semi-supervised learning techniques which try to make use of both the labeled and
the unlabeled data to obtain classifiers with better generalization capabilities. Most
existing semi-supervised methods assume that the unlabeled data have the same
underlying distribution as the training data. However, data acquired for actual
problems often suffer from population drift over time or space, and consequently
classifiers learned from existing labeled data tend to become obsolete over time or
extended geographic areas.
In this dissertation, semi-supervised techniques are considered for updating
existing classifiers, while allowing for the possibility of population drift in the incoming data. The proposed techniques make use of meta-information that is not
explicitly provided by the data to aid in semi-supervision.
First, a framework that exploits the contextual information in an existing
hierarchical binary classifier is presented to rapidly construct a new classifier for a
new but related classification problem. The knowledge transfer technique is augmented with active learning to efficiently update the classifier using far fewer data
points than simple semi-supervised methods. The proposed technique is shown to
be well-suited for adapting classifiers, even when there is a significant difference
between the labeled and unlabeled data.
The knowledge transfer approach detailed in this thesis assumes the existence
of a pre-defined hierarchy of classes. However, it is possible that several different
class hierarchies are defined or obtained for the same domain. A maximum likelihood framework is proposed for integrating available hierarchies into a single ‘master
hierarchy’. The taxonomy integration method is shown to result in more natural
mappings between existing taxonomies compared to alternative approaches that do
not exploit the class hierarchy information. A technique that automatically generates n-ary class hierarchies is also presented. The n-ary trees are shown to better
reflect the inter-class relationships and are in general more effective for knowledge
transfer than binary trees.
Focusing on the domain of hyperspectral data, the efficacy of the new techniques is evaluated for the problem of classifying spatially/temporally varying hyperspectral images. The empirical results clearly demonstrate the utility of exploiting
‘contextual’ information for the problem of knowledge transfer in dynamic environments.
Electrical and Computer Engineerin
Robust methods for locating multiple dense regions in complex datasets
In classical clustering, each data point is assigned to at least one cluster. However,
in many real-world problems, only a small subset of the data clusters well, while the
rest shows little or no clustering tendencies. For such situations, this thesis presents
several techniques that cluster only a subset of the data into one or more groupings.
We first develop a very general parametric approach called Bregman Bubble
Clustering that can find multiple dense regions, and can scale to very large datasets.
By using a fast iterative relocation based approach combined with a novel concept
for improving local search called Pressurization, Bregman Bubble Clustering extends
density-based clustering to a much larger set of problems. We also develop a seeding
algorithm that can automatically determine the number of clusters, and make the
results deterministic.
We then describe a more focussed non-parametric alternative called Automated
Hierarchical Density Shaving (Auto-HDS), a framework that consists of a
fast, hierarchical, density-based clustering algorithm and an unsupervised model selection
strategy. Auto-HDS can automatically select between clusters of different
densities, present them in a compact hierarchy, and rank individual clusters using
an innovative stability criterion. The Auto-HDS framework also provides a simple yet
powerful 2-D visualization of the hierarchy of clusters that is useful for further exploring
the dense clusters in high-dimensional datasets. We also developed a robust,
memory efficient, platform independent, and open source Java based implementation
of Auto-HDS called Gene DIVER (Gene Density Interactive Visual Explorer) that
provides interactive clustering capabilities for high-throughput biological datasets.
For problems where finding small dense regions is important, the parametric
approach is applicable to a wide variety of scenarios and is scalable to very large
datasets. On the other hand, Auto-HDS, the non-parametric approach, provides
a powerful visualization, a compact clustering hierarchy, and interactive clustering:
properties that are useful for biologists interested in finding and understanding small
dense clusters of genes. Together, the two approaches greatly extend the scope of
density based clustering in three different dimensions: the diversity of problems that
density-based clustering can now be used with, the expanded capability to quickly
understand and analyze the clusters in the data, and the scale of the problems that
are now within reach of modest computing resources.
Electrical and Computer Engineerin
Learning to rank in supervised and unsupervised settings using convexity and monotonicity
This dissertation addresses the task of learning to rank, both in the supervised and unsupervised settings, by exploiting the interplay of convex functions, monotonic mappings and their fixed points. In the supervised setting of learning to rank, one wishes to learn from examples of correctly ordered items, whereas in the unsupervised setting, one tries to maximize some quantitatively defined characteristic of a "good" ranking. A ranking method selects one permutation from among the combinatorially many permutations defined on the items to rank. Accomplishing this optimally in the supervised setting, with minimal loss in generality, if any, is challenging. In this dissertation this problem is addressed by optimizing, globally and efficiently, a statistically consistent loss functional over the class of compositions of a linear function by an arbitrary, strictly monotonic, separable mapping with large margins. This capability also enables learning the parameters of a generalized linear model with an unknown link function. The method can handle infinite dimensional feature spaces if the corresponding kernel function is known. In the unsupervised setting, a popular ranking approach is link analysis over a graph of recommendations, as exemplified by PageRank. This dissertation shows that PageRank may be viewed as an instance of an unsupervised consensus optimization problem. The dissertation then solves a more general problem of unsupervised consensus over noisy, directed recommendation graphs that have uncertainty over the set of "out" edges that emanate from a vertex. The proposed consensus rank is essentially the PageRank over the expected edge-set, where the expectation is computed over the distribution that achieves the most agreeable consensus.
This consensus is measured geometrically by a suitable Bregman divergence between the consensus rank and the ranks induced by item specific distributions. Real world deployed ranking methods need to be resistant to spam, a particularly sophisticated type of which is link-spam. A popular class of countermeasures "de-spam" the corrupted webgraph by removing abusive pages identified by supervised learning. Since exhaustive detection and neutralization is infeasible, there is a need for ranking functions that can, on one hand, attenuate the effects of link-spam without supervision and on the other hand, counter spam more aggressively when supervision is available. A family of non-linear, iteratively defined monotonic functions is proposed that propagates "rank" and "trust" scores through the webgraph. It relies on non-linearity, monotonicity and Schur-convexity to provide the resistance against spam.
Electrical and Computer Engineerin
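For readers unfamiliar with the link-analysis baseline the abstract above builds on, here is a minimal Python sketch of standard PageRank by power iteration. It does not implement the dissertation's consensus rank or its spam-resistant functions. The function name, the damping factor of 0.85, the iteration count, and the toy graph are assumptions for illustration.

```python
def pagerank(graph, damping=0.85, iters=100):
    """Standard PageRank via power iteration.

    `graph` maps each node to the list of nodes it links to. Every link
    target must also appear as a key. Nodes with no out-links spread their
    rank uniformly over all nodes.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Teleportation term: each node receives a uniform base share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v, out_links in graph.items():
            targets = out_links if out_links else nodes
            share = damping * rank[v] / len(targets)
            for u in targets:
                new_rank[u] += share
        rank = new_rank
    return rank


# Hypothetical four-page web graph.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_graph)
top_page = max(ranks, key=ranks.get)
```

In this toy graph, page "c" receives links from three other pages and ends up with the highest score, while "d", which nothing links to, keeps only the teleportation share.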
The effect of oversampling and undersampling on classifying imbalanced text datasets
Many machine learning classification algorithms assume that the target classes share similar prior probabilities and misclassification costs. However, this is often not the case in the real world. The problem of classification when one class has a much lower prior probability in the training set is called the imbalanced dataset problem. One popular approach to solving the imbalanced dataset problem is to resample the training set. However, few studies in the past have considered resampling algorithms on data sets with high dimensionality. In this thesis, we examine the imbalanced dataset problem in the realm of text classification. Text has the added problems of both sparsity and high dimensionality. We first describe the resampling techniques we use in this thesis, including several resampling techniques we are introducing. After resampling, we classify the data using multinomial naïve Bayes, k nearest neighbor, and SVMs. Finally, we compare the results of our experiments and find that, while the best resampling technique to use is often dataset dependent, certain resampling techniques tend to perform consistently when coupled with certain classifiers.
Electrical and Computer Engineerin
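The two simplest resampling strategies, random oversampling of minority classes and random undersampling of majority classes, can be sketched in Python as follows. These are the plain random variants only, not the new techniques the thesis introduces. The function names, the seed parameter, and the toy corpus are assumptions for illustration.

```python
import random
from collections import defaultdict


def group_by_label(samples):
    """Group (text, label) pairs by their label."""
    groups = defaultdict(list)
    for text, label in samples:
        groups[label].append((text, label))
    return groups


def random_oversample(samples, seed=0):
    """Duplicate random examples until every class matches the largest one."""
    rng = random.Random(seed)
    groups = group_by_label(samples)
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(members)
        # Draw the shortfall with replacement from the same class.
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


def random_undersample(samples, seed=0):
    """Drop random examples until every class matches the smallest one."""
    rng = random.Random(seed)
    groups = group_by_label(samples)
    target = min(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        # Draw without replacement so no example is repeated.
        balanced.extend(rng.sample(members, target))
    return balanced


# Hypothetical imbalanced corpus: six "ham" documents, two "spam" documents.
corpus = [("ham doc %d" % i, "ham") for i in range(6)]
corpus += [("spam doc %d" % i, "spam") for i in range(2)]
oversampled = random_oversample(corpus)
undersampled = random_undersample(corpus)
```

Resampling should be applied to the training split only, so that evaluation still reflects the original class distribution.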
Ethical AI: A Policy Framework to Regulate Bias in Large Language Models
As large language models (LLMs) like ChatGPT become increasingly prevalent, addressing the biases present in these models has become a pressing concern. This thesis investigates the origins of bias in LLMs, exploring how the transformer architecture used in these models can amplify societal biases present in their training data. It also discusses the ethical implications of biased LLM outputs across different sectors, such as healthcare and hiring. The main research question of the thesis is: how do we create a policy framework to regulate bias in LLMs to guide legislation that protects citizens and consumers without inhibiting LLM development? My method is to analyze the current policy landscape around mitigating AI bias, such as New York City's AI hiring law and the EU AI Act, to find crucial aspects that an effective framework needs to address. Drawing from this analysis, the thesis proposes several key aspects that a policy framework to regulate bias in LLMs should focus on. This framework outlines key ethical principles, delineates responsibilities for stakeholders like model providers and data brokers, establishes governance mechanisms, and advocates for risk-based, sector-specific guidelines. Methodologies to quantify bias through techniques like template-based tests and counterfactual evaluations are also discussed. By bridging technical understanding with policy considerations, this research aims to inform the development of fair and equitable LLMs that uphold societal values while fostering responsible innovation.
Plan II Honors Progra
