
    #nowplaying

    This dataset contains a dump of the #nowplaying dataset, which contains so-called listening events of users who publish the music they are currently listening to on Twitter. In particular, this dataset includes tracks that have been tweeted using the hashtags #nowplaying, #listento or #listeningto. For each listening event, we provide the track and artist as well as metadata on the tweet (date sent, user, source). Furthermore, we provide a mapping of tracks to their respective MusicBrainz identifiers. The dataset features a total of 126 million listening events.

    This archive contains nowplaying.csv, the main file, which contains the following fields:

    * user id (each user is identified by a unique hash value)
    * source of the tweet (how it was sent; as provided by the Twitter API)
    * timestamp of the time the tweet underlying the listening event was sent
    * track title
    * artist name
    * MusicBrainz identifier of the recording (cf. https://musicbrainz.org/)

    If you make use of our dataset in a scientific setting, we kindly ask you to cite the following paper:

    Eva Zangerle, Martin Pichl, Wolfgang Gassler, and Günther Specht. 2014. #nowplaying Music Dataset: Extracting Listening Behavior from Twitter. In Proceedings of the First International Workshop on Internet-Scale Multimedia Management (WISMM '14). ACM, New York, NY, USA, 21-26.

    If you have any questions or suggestions regarding the dataset, please do not hesitate to contact Eva Zangerle ([email protected]).

    Hit Song Prediction (Million Song Dataset and Audio Features)

    Hit Song Prediction Dataset

    This dataset is based on the Million Song Dataset (MSD), which contains one million songs that are representative of Western commercial music released between 1922 and 2011. The dataset contains release year information for 515,576 of the MSD songs. Please refer to http://millionsongdataset.com/ for further information on the Million Song Dataset.

    For our hit song prediction experiments, we extract high- and low-level audio features using the Essentia toolkit (cf. https://essentia.upf.edu/). For the high-level features, we use the pre-trained classifiers provided by Essentia. For a detailed description of the features, please visit the Essentia documentation.

    The dataset hence contains:

    * Audio features: the compressed msd_audio_features.tar.gz file contains the low- and high-level features for each track, stored as JSON files. To keep the files manageable, all MSD audio feature files are organized by track identifier, with one folder holding all tracks whose identifiers share the same first letter. For each track, we provide two files: one containing the high-level and one containing the low-level features extracted by Essentia.
    * Billboard data: the folder billboard_data contains two files. msd_bb_matches.csv contains information about the MSD tracks that were also featured in the Billboard Hot 100 charts. Here, we provide the MSD id, Echo Nest id, artist name, track title, release year, peak position in the Billboard charts and the number of weeks in the charts. The second file, msd_bb_non_matches.csv, contains meta-information about the MSD tracks that were not featured in the Billboard Hot 100 and hence were used as negative samples. Here, we provide the MSD id, Echo Nest id, artist name, track title and the release year.

    If you make use of the dataset, please cite the following paper:

    Eva Zangerle, Michael Vötter, Ramona Huber, and Yi-Hsuan Yang. Hit Song Prediction: Leveraging Low- and High-Level Audio Features. In Proceedings of the 20th International Society for Music Information Retrieval Conference 2019 (ISMIR 2019), 2019.

    @inproceedings{zangerle_ismir19,
      title = {{Hit Song Prediction: Leveraging Low- and High-Level Audio Features}},
      author = {Eva Zangerle and Ramona Huber and Michael V\"{o}tter and Yi-Hsuan Yang},
      year = {2019},
      booktitle = {{Proceedings of the 20th International Society for Music Information Retrieval Conference 2019 (ISMIR 2019)}},
    }

    Culture-Aware Music Recommendation Dataset

    LFM-1b dataset extended by acoustic track features and cultural cues describing users

    This dataset is based on the LFM-1b dataset (cf. http://www.cp.jku.at/datasets/LFM-1b/) and extends it with acoustic features describing the tracks as well as country-level cultural aspects describing the users (taken from Hofstede's six-dimension model and the World Happiness Report). To create the dataset, we extract all users for which the original dataset contains country information. We extract the listening events of these users and match the tracks against the Spotify API to retrieve the acoustic features of these tracks (cf. [Spotify Audio Feature Description](https://developer.spotify.com/documentation/web-api/reference/object-model/#audio-features-object)). The final dataset contains only events of users with country information and tracks with acoustic features. These can be matched with the country-level data of the World Happiness Report and Hofstede's cultural dimensions to add cultural and socio-economic aspects for users.

    This new dataset contains:

    * 55,190 users
    * 3,471,884 tracks including acoustic features
    * 351,469,333 listening events of those users for tracks we have obtained acoustic features for
    * Hofstede's cultural dimensions for 47 countries
    * World Happiness Report (WHR) data for 164 countries

    Files

    All files are tab-separated, with no quoting of strings. The dataset contains the following files:

    * acoustic_features_lfm_id.tsv: acoustic features for all tracks in the dataset, identified by their LFM track identifier
    * events.tsv: listening events for all users
    * hofstede.tsv: Hofstede's cultural dimensions
    * users.tsv: user metadata
    * world_happiness_report_2018.tsv: World Happiness Report data

    For further information on the contents of these files, please cf. the Readme file.
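
Since the files are described as tab-separated with no quoting of strings, a reader should treat quote characters as ordinary data. The following is a minimal sketch using Python's standard csv module; the sample rows and their column layout are hypothetical (the real layout is documented in the Readme), and `read_tsv` is an illustrative helper, not part of the dataset.

```python
import csv
import io


def read_tsv(handle):
    """Return all rows of a tab-separated file, with quoting disabled."""
    # QUOTE_NONE makes the reader keep '"' characters as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


# Hypothetical sample rows; the actual columns are described in the Readme.
sample = io.StringIO('1\t"Heroes"\tDavid Bowie\n2\tOne More Time\tDaft Punk\n')
rows = read_tsv(sample)
```

With the default quoting settings, the first title would lose its surrounding quotes; disabling quoting keeps the field exactly as stored.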

    #nowplaying-rs

    The nowplaying-rs dataset features context and content features of listening events. It contains 11.6 million music listening events of 139K users and 346K tracks collected from Twitter. The dataset comes with a rich set of item content features and user context features, as well as timestamps of the listening events. Moreover, some of the user context features imply the cultural origin of the users, and others, such as hashtags, give clues to the emotional state of a user underlying a listening event.

    The dataset contains three files:

    * user_track_hashtag_timestamp.csv contains basic information about each listening event. For each listening event, we provide an id, the user_id, track_id, hashtag and created_at.
    * context_content_features.csv contains all context and content features. For each listening event, we provide the id of the event, user_id, track_id, artist_id, content features regarding the track mentioned in the event (instrumentalness, liveness, speechiness, danceability, valence, loudness, tempo, acousticness, energy, mode, key) and context features regarding the listening event (coordinates (as geoJSON), place (as geoJSON), geo (as geoJSON), tweet_language, created_at, user_lang, time_zone, entities contained in the tweet).
    * sentiment_values.csv contains sentiment information for hashtags. It contains the hashtag itself and the sentiment values gathered via four different sentiment dictionaries: AFINN, Opinion Lexicon, Sentistrength Lexicon and vader. For each of these dictionaries we list the minimum, maximum, sum and average of all sentiments of the tokens of the hashtag (if available, else we list empty values). However, as most hashtags only consist of a single token, these values are equal in most cases. Please note that the lexica are rather diverse and are therefore able to resolve very different terms against a score. Hence, the resulting csv is rather sparse. The file contains the following comma-separated values: <hashtag, vader_min, vader_max, vader_sum, vader_avg, afinn_min, afinn_max, afinn_sum, afinn_avg, ol_min, ol_max, ol_sum, ol_avg, ss_min, ss_max, ss_sum, ss_avg>, where we abbreviate all scores gathered over the Opinion Lexicon with the prefix 'ol'. Similarly, 'ss' stands for SentiStrength.

    Please note that user_track_hashtag_timestamp.csv and context_content_features.csv partly provide the same features. We deliberately chose to do so in order to provide usable files that do not have to be matched and joined with each other to perform, e.g., simple recommendation tasks.

    Please also find the training and test splits for the dataset in this repo. In addition, Asmita Poddar provides prototypical implementations of a context-aware recommender system based on the dataset at https://github.com/asmitapoddar/nowplaying-RS-Music-Reco-FM.

    If you make use of this dataset, please cite the following paper where we describe and experiment with the dataset:

    @inproceedings{smc18,
      title = {#nowplaying-RS: A New Benchmark Dataset for Building Context-Aware Music Recommender Systems},
      author = {Asmita Poddar and Eva Zangerle and Yi-Hsuan Yang},
      url = {http://mac.citi.sinica.edu.tw/~yang/pub/poddar18smc.pdf},
      year = {2018},
      date = {2018-07-04},
      booktitle = {Proceedings of the 15th Sound & Music Computing Conference},
      address = {Limassol, Cyprus},
      note = {code at https://github.com/asmitapoddar/nowplaying-RS-Music-Reco-FM},
      tppubtype = {inproceedings}
    }
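
Because sentiment_values.csv is described as sparse, a reader has to distinguish empty values from real scores. The sketch below builds the 17 column names from the list given above and maps empty fields to None. It assumes the file has no header row and uses plain commas without quoting; both are assumptions, and the sample line is made up for illustration.

```python
import csv
import io

LEXICA = ("vader", "afinn", "ol", "ss")
STATS = ("min", "max", "sum", "avg")
# hashtag followed by min/max/sum/avg for each lexicon, in the documented order
COLUMNS = ["hashtag"] + [f"{lex}_{stat}" for lex in LEXICA for stat in STATS]


def parse_sentiment(handle):
    """Yield (hashtag, scores) pairs; missing or empty scores become None."""
    reader = csv.DictReader(handle, fieldnames=COLUMNS)
    for row in reader:
        scores = {}
        for name in COLUMNS[1:]:
            raw = (row.get(name) or "").strip()
            scores[name] = float(raw) if raw else None
        yield row["hashtag"], scores


# Made-up sample line: no SentiStrength scores available for this hashtag.
sample = io.StringIO("happy,0.5719,0.5719,0.5719,0.5719,3,3,3,3,1,1,1,1,,,,\n")
hashtag, scores = next(parse_sentiment(sample))
```

If the real file does start with a header line, the first yielded row would fail the float conversion, so that line would need to be skipped before parsing.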

    Spotify Playlists Dataset

    This dataset is based on the subset of users in the #nowplaying dataset who publish their #nowplaying tweets via Spotify. In principle, the dataset holds users, their playlists and the tracks contained in these playlists.

    The csv file holding the dataset contains the following columns: "user_id", "artistname", "trackname", "playlistname", where

    * user_id is a hash of the user's Spotify user name
    * artistname is the name of the artist
    * trackname is the title of the track and
    * playlistname is the name of the playlist that contains this track.

    The separator is a comma (,), each entry is enclosed in double quotes, and the escape character is a backslash (\).

    A description of the generation of the dataset and the dataset itself can be found in the following paper:

    Pichl, Martin; Zangerle, Eva; Specht, Günther: "Towards a Context-Aware Music Recommendation Approach: What is Hidden in the Playlist Name?" in 15th IEEE International Conference on Data Mining Workshops (ICDM 2015), pp. 1360-1365, IEEE, Atlantic City, 2015.
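
The stated dialect (comma separator, double-quoted entries, backslash escape) differs from the default CSV convention of doubling quotes, so it has to be configured explicitly. A minimal sketch with Python's standard csv module follows; the sample rows are invented, and `read_playlists` is an illustrative helper that assumes the input contains no header row.

```python
import csv
import io

COLUMNS = ["user_id", "artistname", "trackname", "playlistname"]


def read_playlists(handle):
    """Yield one dict per row using the dialect described for the dataset."""
    reader = csv.reader(
        handle,
        delimiter=",",
        quotechar='"',
        escapechar="\\",    # a backslash escapes the next character
        doublequote=False,  # quotes are escaped, not doubled
    )
    for fields in reader:
        yield dict(zip(COLUMNS, fields))


# Invented rows: an escaped quote and a comma inside a quoted field.
sample = io.StringIO(
    '"u1","AC/DC","Back In Black","Rock \\"Classics\\""\n'
    '"u1","Daft Punk","One More Time","Party, Vol. 1"\n'
)
entries = list(read_playlists(sample))
```

If the actual file begins with a header line naming the columns, skip the first row before building the dicts.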

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first 5 authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
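
The difference between the two counting schemes can be illustrated with a small sketch. It assumes a simplified definition that is not taken from the study: two authors count as co-cited once per citing paper if both appear, within the first `max_authors` positions of any cited work, in that paper's reference list (co-authors of one cited work are included). The data and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists, max_authors):
    """Count co-cited author pairs, considering the first max_authors per cited work."""
    counts = Counter()
    for cited_works in reference_lists:
        authors = set()
        for work_authors in cited_works:
            authors.update(work_authors[:max_authors])
        # Each unordered pair is counted at most once per citing paper.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


# Two citing papers; each cited work is an ordered author list (made-up names).
papers = [
    [["Smith", "Jones"], ["Lee"]],
    [["Smith"], ["Lee", "Jones"]],
]
first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

Under first-author counting only the pair (Lee, Smith) is observed, while counting the first five authors also brings Jones into the co-citation data.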

    PAN23 Multi-Author Writing Style Analysis

    This is the dataset for the shared task on Multi-Author Writing Style Analysis PAN@CLEF2023. Please consult the task's page for further details on the format, the dataset's creation, and links to baselines and utility code.

    Task: We ask participants to solve the following intrinsic style change detection task: for a given text, find all positions of writing style change on the paragraph level (i.e., for each pair of consecutive paragraphs, assess whether there was a style change). The simultaneous change of authorship and topic will be carefully controlled and we will provide participants with datasets of three difficulty levels:

    * Easy: The paragraphs of a document cover a variety of topics, allowing approaches to make use of topic information to detect authorship changes.
    * Medium: The topical variety in a document is small (though still present), forcing the approaches to focus more on style to effectively solve the detection task.
    * Hard: All paragraphs in a document are on the same topic.

    All documents are provided in English and may contain an arbitrary number of style changes. However, style changes may only occur between paragraphs (i.e., a single paragraph is always authored by a single author and contains no style changes).

    Data: To develop and then test your algorithms, three datasets including ground truth information are provided (dataset1 for the easy task, dataset2 for the medium task, and dataset3 for the hard task). Each dataset is split into three parts:

    * training set: Contains 70% of the whole dataset and includes ground truth data. Use this set to develop and train your models.
    * validation set: Contains 15% of the whole dataset and includes ground truth data. Use this set to evaluate and optimize your models.
    * test set: Contains 15% of the whole dataset; no ground truth data is given. This set is used for evaluation.

    You are free to use additional external data for training your models. However, we ask you to make the additional data utilized freely available under a suitable license.

    Versioning: 1.0: initial upload
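
The task definition above implies one binary decision per pair of consecutive paragraphs. As a minimal sketch, the expected labels can be derived from a list of per-paragraph author labels; the author labels and the function name are hypothetical, and the actual file format is described on the task's page, not here.

```python
def style_changes(paragraph_authors):
    """Return one 0/1 label per pair of consecutive paragraphs (1 = author changes)."""
    return [
        int(current != following)
        for current, following in zip(paragraph_authors, paragraph_authors[1:])
    ]


# A made-up document with four paragraphs written by two authors.
labels = style_changes(["A", "A", "B", "A"])
```

A document with n paragraphs therefore yields n - 1 labels, and a single-paragraph document yields an empty list.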

    Variations on the Author

    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourses; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.

    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
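
To make the comparison between the Pearson correlation and the cosine concrete, the sketch below computes both for two made-up cocitation profiles. It illustrates one commonly discussed property, which is not claimed to be the argument of this paper: appending authors with whom neither profile has any cocitations leaves the cosine unchanged but shifts the Pearson correlation. The two other measures mentioned in the abstract are not named there and are not shown.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two cocitation profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def pearson(x, y):
    """Pearson correlation between two cocitation profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Made-up cocitation counts of two authors with four other authors.
x = [4, 1, 3, 0]
y = [1, 4, 0, 3]
# The same profiles after adding four authors with no cocitations at all.
x_padded = x + [0, 0, 0, 0]
y_padded = y + [0, 0, 0, 0]
```

For these profiles the cosine is identical before and after padding, whereas the Pearson correlation moves from -0.8 to 0.0, so the choice of which authors to include changes the Pearson result.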