FOXG1 Regulates PRKAR2B Transcriptionally and Posttranscriptionally via miR200 in the Adult Hippocampus
Rett syndrome is a complex neurodevelopmental disorder that is mainly caused by mutations in MECP2. However, mutations in FOXG1 cause a less frequent form of atypical Rett syndrome, called FOXG1 syndrome. FOXG1 is a key transcription factor crucial for forebrain development, where it maintains the balance between progenitor proliferation and neuronal differentiation. Using genome-wide small RNA sequencing and quantitative proteomics, we identified that FOXG1 affects the biogenesis of miR200b/a/429 and interacts with the ATP-dependent RNA helicase, DDX5/p68. Both FOXG1 and DDX5 associate with the microprocessor complex, whereby DDX5 recruits FOXG1 to DROSHA. RNA-Seq analyses of Foxg1cre/+ hippocampi and N2a cells overexpressing miR200 family members identified cAMP-dependent protein kinase type II-beta regulatory subunit (PRKAR2B) as a target of miR200 in neural cells. PRKAR2B inhibits postsynaptic functions by attenuating protein kinase A (PKA) activity; thus, increased PRKAR2B levels may contribute to neuronal dysfunctions in FOXG1 syndrome. Our data suggest that FOXG1 regulates PRKAR2B expression at both the transcriptional and posttranscriptional levels.
Affiliation: Weise, Stefan C. Institute Of Anatomy And Cell Biology; Germany
Affiliation: Arumugam, Ganeshkumar. Institute Of Anatomy And Cell Biology; Germany
Affiliation: Villarreal, Alejandro. Institute Of Anatomy And Cell Biology; Germany. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay. Instituto de Biología Celular y Neurociencia "Prof. Eduardo de Robertis". Universidad de Buenos Aires. Facultad de Medicina. Instituto de Biología Celular y Neurociencia; Argentina
Affiliation: Videm, Pavankumar. Universität Freiburg Im Breisgau; Germany
Affiliation: Heidrich, Stefanie. Institute Of Anatomy And Cell Biology; Germany
Affiliation: Nebel, Nils. Institute Of Anatomy And Cell Biology; Germany
Affiliation: Dumit, Veronica Ines. Universität Freiburg Im Breisgau; Germany. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Sananbenesi, Farahnaz. Deutsches Zentrum Für Neurodegenerative Erkrankungen E.v.; Germany
Affiliation: Reimann, Viktoria. Albert Ludwigs University Of Freiburg; Germany
Affiliation: Craske, Madeline. Active Motif Incorporation; United States
Affiliation: Schilling, Oliver. Universität Freiburg Im Breisgau; Germany
Affiliation: Hess, Wolfgang R. Universität Freiburg Im Breisgau; Germany
Affiliation: Fischer, Andre. Universitätsmedizin Göttingen; Germany. Deutsches Zentrum Für Neurodegenerative Erkrankungen E.v.; Germany
Affiliation: Backofen, Rolf. Universidad de Copenhagen; Denmark
Affiliation: Vogel, Tanja. Universität Freiburg Im Breisgau; Alemani
Computational analysis and prediction of RNA-protein interactions
This dissertation is about the computational analysis and prediction of RNA-protein interactions. Ribonucleic acids (RNAs) and proteins are both essential for the control of gene expression in our cells. Gene expression is the process by which a functional gene product, namely a protein or an RNA, is produced from a gene, starting with the transcription of an RNA from the gene region on the DNA. Once regarded primarily as a messenger that transmits protein information, RNA has in recent years moved further into the biomedical spotlight, thanks to its increasingly uncovered roles in regulating gene expression. In addition, RNA has showcased its therapeutic potential, as famously demonstrated by the groundbreaking success of RNA vaccines in the COVID-19 pandemic. However, RNAs rarely function on their own: In humans, more than 1,500 different RNA-binding proteins (RBPs) are involved in controlling the various stages of an RNA's life cycle, creating a highly complex regulatory interplay between RNAs and proteins. It is therefore of fundamental importance to study these RNA-protein interactions, in order to deepen our understanding of gene expression. Over the last decade, CLIP-seq has become the dominant experimental method to identify the set of cellular RNA binding sites for an RBP of interest. However, analysing the resulting CLIP-seq data can be challenging, as there are many analysis steps and CLIP-seq protocol variants available, each requiring specific adaptations to the analysis workflow. Consequently, there is a need for analysis guidelines that provide easy access to tools, as well as for the constant improvement of tools and workflows to increase the accuracy of the analysis results. The first set of works included in this thesis (publications P1, P4, and P5) deals with these topics, by providing a review article on CLIP-seq data analysis, as well as two articles on how to further improve CLIP-seq data analysis. 
Publication P1 supplies readers with an overview of tools and protocols, as well as guidelines to conduct a successful analysis, drawing largely from our own experience with analysing CLIP-seq data. Publication P4 demonstrates the issues current binding site identification tools have with CLIP-seq data from RBPs that bind to processed RNAs, and that the integration of RNA processing information improves the resulting binding site quality. On top of this, publication P5 presents Peakhood, the first tool that utilizes RNA processing information in order to increase the quality of RBP binding sites identified from CLIP-seq data. A natural drawback of experimental methods is that a target RNA needs to be sufficiently expressed in the observed cells for an RNA-protein interaction to be detected. Hence, since gene expression is a dynamic process that differs between cell types, time points, and conditions, a CLIP-seq experiment cannot recover the complete set of cellular RBP binding sites. This creates a demand for computational methods which can learn the binding properties of an RBP from existing CLIP-seq data, in order to predict RBP binding sites on any given target RNA. Besides interacting with proteins, RNAs can also interact with other RNAs, further increasing the amount of possible regulatory interactions between RNAs and proteins. In this regard, long non-coding RNAs (lncRNAs), a large class of non-protein-coding RNAs whose functions are still vastly unexplored, have become especially important, as it has been shown that they can engage in RNA-RNA interactions, whose regulatory mechanisms also include RNA-protein interactions. 
As such mechanistic studies are typically slow and expensive, computational tools that combine RNA-protein and RNA-RNA interaction predictions to infer potential mechanisms could be of great help, e.g., by screening a set of target RNAs and proteins and suggesting plausible mechanisms for experimental validation. The second set of works included in this thesis (publications P2 and P3) thus deals with the computational prediction of RNA-protein interactions, RNA-RNA interactions and the functional mechanisms that can be inferred from these interactions. Publication P2 introduces MechRNA, the first tool to infer functional mechanisms of lncRNAs based on their predicted interactions with RBPs and other RNAs, as well as gene expression data. We demonstrated MechRNA's capability to identify formerly described lncRNA mechanisms and experimentally validated one prediction, underlining its value for functional lncRNA studies. Finally, publication P3 presents RNAProt, a flexible and performant RBP binding site prediction tool based on recurrent neural networks. Compared to other popular deep learning methods, RNAProt achieves state-of-the-art predictive performance, as well as superior runtime efficiency. In addition, it is more feature-rich than any other available method, including the support of user-defined predictive features. We further showed that its visualizations agree with known RBP binding preferences, and demonstrated that its additional predictive features can increase the specificity of predictions.
Bioinformatic analysis and online database set-up for circular RNAs in non-small cell lung cancer
Workflow recommendations using deep learning and machine learning tools for exploratory and predictive data analysis in Galaxy
Galaxy is an open-source web platform for scientific data analysis. The Galaxy Europe platform hosts over 3,000 scientific tools for processing and analysing scientific datasets. It also provides access to a large computing cluster comprising thousands of CPU cores, several terabytes (TB) of memory and a few petabytes (PB) of storage for executing those tools on scientific datasets. To promote exploratory and predictive data analysis in Galaxy Europe, the thesis develops a few approaches broadly divided into two parts - (a) create a workflow recommendation system predicting tools using deep learning (DL) to extend data analysis and workflows, and (b) develop tools and infrastructure with robust machine learning (ML) approaches for researchers to perform scalable and reproducible applied machine learning research. Scientific analyses are carried out using several blocks of tools that collectively process datasets stepwise, transforming their raw nature into conclusion-bearing insights. Such blocks of scientific tools are chained together to form workflows in Galaxy. Creating meaningful workflows is a complicated task and requires knowledge of many tools. Therefore, a guidance system that recommends high-quality tools at each step is essential to assist researchers in creating complex scientific workflows. In addition, multiple recommendations at each step pave the way for creating divergent workflows, enabling exploratory data analysis. The Galaxy recommendation system is created by training two deep learning architectures, namely recurrent neural networks (RNN) and Transformers (publications P1 and P2, respectively) on workflows stored in Galaxy Europe. These architectures learn the underlying sequential nature of scientific workflows to recommend the most useful tools. Galaxy workflows are directed acyclic graphs consisting of tool sequences, and these architectures are robust for learning sequential patterns. 
Tool recommendation models have been created by training both architectures on tool sequences following a multi-label, multi-class classification. A Galaxy API has been developed to predict recommendations using trained DL models, taking a tool or a tool sequence as input. These recommendations are displayed in Galaxy using two user interface (UI) integrations. Machine learning methods are widely used for predictive analysis tasks in Bioinformatics, such as the classification of DNA sequences, protein functions and gene expression patterns, biomedical image analysis, drug-response prediction and many more, achieving state-of-the-art accuracy. These tasks using machine learning methods on high-dimensional biological datasets often require enormous computing resources consisting of several CPU cores and GPUs, ample disk space, and high memory. Such large computing resources are readily available only to a few researchers. JupyterLab is a popular program editor for rapidly developing prototypes and end-to-end analyses for machine learning and data science projects. As part of this thesis, it has been integrated into Galaxy as an online tool (available through Galaxy Europe) that can access its large computing cluster. The tool is developed as a Docker container with several machine learning software packages installed. The software packages installed in a Docker container ensure reproducibility and secure execution of Python scripts written in JupyterLab notebooks. In addition, the JupyterLab tool can also be used as a regular Galaxy tool and can be directly integrated into any workflow. Additionally, researchers can perform machine learning model training remotely. 
The resulting model, represented in an open neural network exchange format (ONNX), and other supporting datasets become available in Galaxy. The usage of the JupyterLab tool in Galaxy is demonstrated by two use cases - prediction of infected regions in COVID-19 CT scans and 3D structure of proteins (publication P3). However, researchers with less programming expertise may be unable to utilise this machine learning tool for their predictive tasks, because it requires developing machine learning analysis notebooks inside the JupyterLab tool. Addressing this gap, several methods from Scikit-learn, TensorFlow, and XGBoost have been wrapped and integrated into Galaxy to provide researchers access to UI-based machine learning tools in Galaxy running on its large computing cluster. These Galaxy-ML tools have myriad functions, broadly divided into data preprocessing, classification, regression, and clustering. Researchers can use these tools on Galaxy to create end-to-end machine learning analysis workflows. Several ready-to-use Galaxy Training Network (GTN) tutorials have also been developed to demonstrate the creation of an end-to-end machine learning analysis for researchers. These tutorials showcase the usage of Galaxy-ML tools for reproducing results of two scientific publications to predict chronological human age using RNA-seq and DNA-methylation datasets (publication P4) and the classification of two types of leukaemia, a type of blood cancer, using a gene expression dataset
Developing a workflow management system for fragment-based virtual screening
Drug development is a long, complex and expensive process. In particular, the first step of obtaining an initial list of drug candidates is challenging. Experimental screening, for example using protein-ligand binding assays, is fundamentally limited, and as a result, the concept of virtual screening comes into play. Virtual screening involves the use of in silico experiments such as statistical analyses, protein-ligand docking, and free energy calculations based on molecular dynamics (MD) simulation, in order to predict whether a particular compound is likely to bind to a particular target protein. Often, an initial list of candidates is generated by a fragment-based approach, where a fragment is a small organic compound that can serve as a substructure for a putative drug candidate. Fragments can be found in either an experimental or theoretical manner, and can then be combined, or amended by the addition of other functional groups, in order to produce a list of candidate molecules. There is then a need to determine the likelihood that these candidates bind to the target protein. There are several computer-based methods that can be of service in this task; these methods are not mutually exclusive, but on the contrary are typically used sequentially as well as in parallel. However, they require different amounts and types of computational resources, and careful planning is therefore required to manage resources, organise the software tools as complete workflows, and then to deploy them. To organise and perform the analysis, the scientist can use a workflow management system. Such systems allow multiple tools to be concatenated into a single pipeline, which can then be executed via the command line or a graphical interface. This is more convenient than the tedious execution of individual tools one after the other and helps avoid manual errors. 
For highly complex analyses that require several different software tools with stepwise repetition, such as MD simulations for hundreds of ligands against a single target protein, the use of a workflow management system is the only viable option. Another challenge in virtual screening is reproducibility. In a reproducible scientific work, other scientists must be able to critically evaluate the work by performing the same experiments or simulations themselves and thus verifying the results. The issue of reproducibility has received much attention recently, including in the field of computational chemistry and virtual screening. The use of a workflow management system helps to increase the reproducibility of a study, because the details of all tools run, including their parameters and software versions, are recorded, making the analyses repeatable for other scientists who want to verify the work. The focus of this work was to develop a platform for fragment-based virtual screening based on the Galaxy workflow management system. This platform can be used either through a graphical web-based interface or through the command line - the latter is a useful alternative for complex simulations or analyses that may require additional scripting. In order to make the use of the command line easier, significant contributions were made to Planemo and BioBlend, two Python libraries that allow direct access to Galaxy via the Application Programming Interface (API). In order to demonstrate the utility of the platform developed, two projects were carried out using the developed tools and workflows. First, a study was performed on the T4 lysozyme mutant L99A in complex with benzene using the dcTMD technique as a model system for fragment-protein binding. 
T4L-L99A is a commonly used model system for free energy calculations, and is especially useful as a model for fragment binding, due to the small size of the pocket and the benzene ligand, which is typical for the compounds and pockets generally used in fragment-based screening studies, and the fact that benzene binds rather weakly. Like many MD methods, dcTMD requires the execution of a large number of steps in sequence, and requires the creation of an ensemble of simulations, both features which benefit from the use of a workflow management system. The analysis was able to uncover multiple unbinding pathways, an essential feature of the dcTMD method, and to characterise the thermodynamics and kinetics of several of these. The final results were comparable to experimental benchmarks. Second, a virtual screening was performed with the aim of identifying effective inhibitors of the major protease of the SARS-CoV virus; 53,000 compounds were generated based on 22 non-covalent crystallographic fragments, and their binding ability was analysed sequentially by protein-ligand docking, MMGBSA calculations and dcTMD simulations. Several million docking poses were generated, and scored by experimental validation against the crystallographic fragment structures. Over 200 compounds were then assessed by MMGBSA, followed by a further filtering and execution of a dcTMD workflow for 50 compounds. One fragment, which enforces a conformational change on the protein binding site, was found to confer particularly strong binding ability on derived compounds, and it was shown that particular interactions correlated especially strongly with both MMGBSA and dcTMD scores
Approaches to analysis of chromosome conformation capture data
The three-dimensional structure of the genome is increasingly important in research on the regulatory mechanisms of eukaryotic cells. Expression-based methods like RNA-Seq can show whether a gene is active or inactive; however, they cannot explain why the gene is regulated in this specific way. The chromatin structure, an epigenetic property, is a focus of biomedical researchers seeking to explain the factors involved in this regulation. Enhancer-promoter interactions are one key concept for understanding the regulation of genes. However, without wet-lab techniques providing evidence that the two specific DNA regions containing the enhancer and the promoter interact, such an interaction is only an interpretation of the data. Chromosome conformation capture (3C) is a technique that can capture the spatial closeness of DNA regions; it is essential to mention that an interaction of these regions is only an interpretation of the spatial closeness. 3C and its derivatives like Hi-C are based on a two-dimensional data structure and therefore require, compared to one-dimensional techniques like RNA-Seq or ChIP-Seq, roughly the square of the number of reads for similar coverage. Hi-C is a genome-wide approach and is the method of choice for coarser analysis; however, it lacks high read coverage because of the cost of the protocol. Specialized but cheaper approaches like capture Hi-C or HiChIP fill this gap but require different analysis methods. Furthermore, Hi-C uses up to a million cells to generate one sample, resulting in an aggregated data profile. To overcome this, single-cell Hi-C exists and provides the foundation for analyzing how chromatin structure differs between cell types or cell-cycle stages. The analysis of high-throughput sequencing data requires specialized algorithms and methods. In this dissertation, different approaches to analyzing chromosome conformation capture (3C) data have been developed. 
A particular focus was on the 3C derivatives Hi-C, capture Hi-C, and single-cell Hi-C, for which improved analysis methods, the adaptation to new developments in the wet-lab protocols, improved data exchange options, and a complexity reduction of the analysis pipeline were contributed. The target users of Hi-C data analysis software are biomedical researchers without knowledge of computer science. The software has been distributed via package managers like Conda, and a web server, the Galaxy HiCExplorer, was provided to make the software HiCExplorer, scHiCExplorer, and pyGenomeTracks accessible via the software-as-a-service approach. The developed software is also provided as a Docker container, which addresses software reproducibility and archiving by bundling all dependencies, and enables fast usage in a cloud environment. In this thesis, I developed a chromatin loop detection algorithm based on continuous negative binomial distributions for Hi-C data. Furthermore, algorithms for a differential analysis of TADs or global comparisons like short-to-long range contact ratios have been created. Visualization options have been programmed both for the Hi-C data itself and for integrating Hi-C with other genomic data. Contributions have been made to extend or integrate quality control tools; unique quality control methods for capture Hi-C and single-cell Hi-C have been added. A method to detect and analyze point-to-point interactions at large scale, e.g., enhancer-promoter interactions, in the context of capture Hi-C and HiChIP was designed, including features for significance and differential detection. The major contribution to single-cell Hi-C data was the creation of a specialized file format to improve the interoperability of single-cell Hi-C experiments. A method to cluster high-dimensional single-cell Hi-C data using approximate k-nearest neighbor graphs was implemented
Going Beyond Counting First Authors in Author Co-citation Analysis
The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation
counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings
are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that
only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into
account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
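
The difference between the two counting schemes compared in this abstract can be illustrated with a minimal Python sketch. It is not the study's implementation: the function name `cocitation_counts`, the `max_authors` parameter, the author names, and the toy reference lists are all hypothetical. It also assumes that two authors count as co-cited whenever both appear among the counted authors of works in the same reference list, including co-authors of a single cited work; the study may define this differently.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count how often each pair of authors is cited together.

    citing_papers: one reference list per citing paper; each cited work
        is a list of author names in byline order.
    max_authors: number of leading authors counted per cited work
        (1 = first-author counting, 5 = first five authors).
    """
    counts = Counter()
    for references in citing_papers:
        # Collect the counted authors once per citing paper.
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Every unordered pair of counted authors is co-cited once.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: two citing papers, each citing two works.
papers = [
    [["Adams", "Baker"], ["Clark", "Davis"]],
    [["Clark", "Baker"], ["Evans"]],
]

first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

With first-author counting, Baker never enters the counts because Baker is not the first author of any cited work; counting up to five authors per work brings Baker in and links Baker to Clark in both citing papers.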
