1,720,958 research outputs found

    Beyond Supervised Learning: Applications and Implications of Zero-shot Text Classification

    Full text link
    This dissertation explores the application of zero-shot text classification, a technique for categorizing texts without annotated data in the target domain. A true zero-shot setting breaks with the conventions of the traditional supervised machine learning paradigm, which relies on quantitative in-domain evaluation for optimization, performance measurement, and model selection. The dissertation summarizes existing research to build a theoretical foundation for zero-shot methods, emphasizing efficiency and transparency. It benchmarks selected approaches across various tasks and datasets to understand their general performance, strengths, and weaknesses, mirroring the model selection process. On this foundation, two case studies demonstrate the application of zero-shot text classification. The first applies zero-shot methods to aspect-based sentiment classification of historical German stock market reports. It reveals that although there are qualitative differences between fine-tuned and zero-shot approaches, the aggregated results are not easily distinguishable, which prompts a discussion of the practical implications. The second case study integrates zero-shot text classification into a civil engineering document management system, showing how the flexibility of zero-shot models and the omission of the training process can benefit the development of prototype software, at the cost of unknown performance. These findings indicate that, although zero-shot text classification works for the exemplary cases, the results are not generalizable. Building on the case studies, the dissertation discusses the dilemmas and theoretical considerations that arise from omitting in-domain evaluation when applying zero-shot text classification. It concludes by advocating a broader focus beyond traditional quantitative metrics in order to build trust in zero-shot text classification, highlighting its practical utility as well as the need for further exploration as these technologies evolve.

    Table of contents:
    1 Introduction
        1.1 Problem Context; 1.2 Related Work; 1.3 Research Questions & Contribution; 1.4 Author’s Publications; 1.5 Structure of This Work
    2 Research Context
        2.1 The Current State of Text Classification; 2.2 Efficiency; 2.3 Approaches to Addressing Data Scarcity in Machine Learning; 2.4 Challenges of Recent Developments; 2.5 Model Sizes and Hardware Resources; 2.6 Conclusion
    3 Zero-shot Text Classification
        3.1 Text Classification; 3.2 State-of-the-Art in Text Classification; 3.3 Neural Network Approaches to Data-Efficient Text Classification; 3.4 Zero-shot Text Classification; 3.5 Application; 3.6 Requirements for Zero-shot Models; 3.7 Approaches to Transfer Zero-shot (3.7.1 Terminology; 3.7.2 Similarity-based and Siamese Networks; 3.7.3 Language Model Token Predictions; 3.7.4 Sentence Pair Classification; 3.7.5 Instruction-following Models or Dialog-based Systems); 3.8 Class Name Encoding in Text Classification; 3.9 Approach Selection; 3.10 Conclusion
    4 Model Performance Survey
        4.1 Experiments (4.1.1 Datasets; 4.1.2 Model Selection; 4.1.3 Hypothesis Templates); 4.2 Zero-shot Model Evaluation; 4.3 Dataset Complexity; 4.4 Conclusion
    5 Case Study: Historic German Stock Market Reports
        5.1 Project; 5.2 Motivation; 5.3 Related Work; 5.4 The Corpus and Dataset - Berliner Börsenzeitung (5.4.1 Corpus; 5.4.2 Sentiment Aspects; 5.4.3 Annotations); 5.5 Methodology (5.5.1 Evaluation Approach; 5.5.2 Trained Pipeline; 5.5.3 Zero-shot Pipeline; 5.5.4 Dictionary Pipeline; 5.5.5 Tradeoffs; 5.5.6 Label Space Definitions); 5.6 Evaluation - Comparison of the Pipelines on BBZ (5.6.1 Sentence-based Sentiment; 5.6.2 Aspect-based Sentiment; 5.6.3 Qualitative Evaluation); 5.7 Discussion and Conclusion
    6 Case Study: Document Management in Civil Engineering
        6.1 Project; 6.2 Motivation; 6.3 Related Work; 6.4 The Corpus and Knowledge Graph (6.4.1 Data; 6.4.2 BauGraph – The Knowledge Graph); 6.5 Methodology (6.5.1 Document Insertion Pipeline; 6.5.2 Frontend Integration); 6.6 Discussion and Conclusion
    7 MLMC
        7.1 How it works; 7.2 Motivation; 7.3 Extensions of the Framework; 7.4 Other Projects (7.4.1 Product Classification; 7.4.2 Democracy Monitor; 7.4.3 Climate Change Adaptation Finance); 7.5 Conclusion
    8 Discussion: The Five Dilemmas of Zero-shot
        8.1 On Evaluation; 8.2 The Five Dilemmas of Zero-shot (8.2.1 Dilemma of Evaluation or Are You Working at All?; 8.2.2 Dilemma of Comparison or How Do I Get the Best Model?; 8.2.3 Dilemma of Annotation and Label Definition or Are We Talking about the Same Thing?; 8.2.4 Dilemma of Interpretation or Am I Biased?; 8.2.5 Dilemma of Unsupervised Text Classification or Do I Have to Trust You?); 8.3 Trust in Zero-shot Capabilities; 8.4 Conclusion
    9 Conclusion
        9.1 Summary (9.1.1 RQ1: Strengths and Weaknesses; 9.1.2 RQ2: Application Studies; 9.1.3 RQ3: Implications); 9.2 Final Thoughts & Future Directions
    References
    A Appendix for Survey Chapter
        A.1 Model Selection; A.2 Task-specific Hypothesis Templates; A.3 Fractions of SotA
    B Uncertainty vs. Accuracy
    C Declaration of Authorship
    D Declaration: Use of AI-Tools
    E Bibliographic Data
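
To make the idea of classifying text without annotated target-domain data more concrete, the following is a minimal sketch of a similarity-based zero-shot classifier. It is not the dissertation's method: it swaps in a toy bag-of-words cosine similarity where a real system would likely use neural sentence embeddings, and the function names, label set, and label descriptions are all hypothetical, invented for illustration.

```python
import math
import re
from collections import Counter
from typing import Dict


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def zero_shot_classify(text: str, label_descriptions: Dict[str, str]) -> str:
    """Return the label whose description is most similar to the text.

    No annotated examples are used: the only task-specific input is the
    wording of the label descriptions themselves.
    """
    text_vec = bag_of_words(text)
    scores = {
        label: cosine(text_vec, bag_of_words(description))
        for label, description in label_descriptions.items()
    }
    return max(scores, key=scores.get)


# Hypothetical label descriptions (assumption: not taken from the dissertation).
LABELS = {
    "positive": "prices rose strong demand gains firm market",
    "negative": "prices fell weak demand losses dull market",
}

prediction = zero_shot_classify("Demand was strong and prices rose sharply.", LABELS)
print(prediction)
```

Because the label descriptions are the only task-specific input, changing the label set requires no retraining. The sketch also shows the drawback the abstract points to: without in-domain annotations there is nothing to measure these predictions against.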

    Researching with COLIBRI: Challenges and Potential of the Scholarly Indexing and Analysis of a Digital Collection

    No full text
    Four years ago, on World Children's Day, the DFG-funded digitization project COLIBRI was launched, with Bielefeld University Library as the lead partner. The task: first, to represent German-language children's and young adult literature of the 19th century in its full thematic and literary diversity and, second, to make it accessible to researchers and to others interested in the subject. With nearly 15,000 titles, COLIBRI is now one of the world's largest digital children's book collections. In the current pilot project "Buchkindheiten digital", it provides the data basis for examining and assessing the play and reading behavior of girls and boys between 1801 and 1914. The focus is on the illustrations in the historical children's and young adult books, which are examined through image analysis using digital humanities methods. The presentation outlines, on the one hand, the challenges in the research group's workflow, from automatic image extraction through scene detection with various vision language models (VLMs) to the transfer of the results into a scalable viewing procedure. On the other hand, it outlines the potential that a collection like COLIBRI holds for digital research on children's and young adult literature.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    Full text link
    The present study examines one of the fundamental aspects of author co-citation analysis (ACA): the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based. We compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first of a cited work's authors, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, when the same number of top-ranked authors is selected and analyzed, this picture represents fewer specialties in the research field being studied than the picture produced through traditional first-author co-citation counting. Reasons for these effects are discussed.
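
    The two counting schemes compared in this abstract can be illustrated with a minimal sketch. It is not the study's implementation: the function name `cocitation_counts`, the `max_authors` parameter, and the toy data are hypothetical, and the sketch assumes that an author pair is counted once per citing paper and that co-authors of the same cited work also count as co-cited.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count author co-citations, keeping the first `max_authors` authors of each cited work.

    `citing_papers` is a list of citing papers; each citing paper is a list of
    cited works; each cited work is an ordered list of author names.
    """
    counts = Counter()
    for cited_works in citing_papers:
        # Collect the distinct authors this paper cites under the chosen scheme.
        authors = set()
        for author_list in cited_works:
            authors.update(author_list[:max_authors])
        # Assumption: each unordered pair of distinct authors is counted once
        # per citing paper, including co-authors of the same cited work.
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: two citing papers with placeholder author names.
papers = [
    [["A", "B"], ["C"]],
    [["A", "B"], ["C", "D"]],
]

first_author = cocitation_counts(papers, max_authors=1)  # traditional scheme
first_five = cocitation_counts(papers, max_authors=5)    # first five authors
```

    With first-author counting, only the pair (A, C) is ever observed in this toy data. Counting up to five authors also surfaces pairs that involve B and D, which is the kind of additional signal the abstract attributes to the non-traditional scheme.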

    Variations on the Author

    Full text link
    “Variations on the Author” discusses two of Eduardo Coutinho’s recent films (Um Dia na Vida, from 2010, and Últimas Conversas, posthumously released in 2015) and their contribution to the general question of documentary authorship. The director’s filmography is characterized by a consistent yet self-effacing form of authorial self-inscription: Coutinho often features as an interviewer who, rather than expressing opinions, propels discourse; an interviewer who is good at listening. This mode of self-inscription characterizes him as an author who is not expressive but who is nonetheless markedly present on the screen. In Um Dia na Vida, however, Coutinho is completely absent from the image, while Últimas Conversas, on the contrary, includes a confessional prologue that moves the director from the margins to the center of his films. This article examines the ways in which these works stand out in the filmography of a director who offers new insights into the notion of cinematic authorship.

    Appropriate Similarity Measures for Author Cocitation Analysis

    Full text link
    We provide a number of new insights into the methodological discussion about author cocitation analysis. We first argue that the use of the Pearson correlation for measuring the similarity between authors’ cocitation profiles is not very satisfactory. We then discuss what kind of similarity measures may be used as an alternative to the Pearson correlation. We consider three similarity measures in particular. One is the well-known cosine. The other two similarity measures have not been used before in the bibliometric literature. Finally, we show by means of an example that our findings have a high practical relevance.
    Keywords: information science; Pearson correlation; cosine; similarity measure; author cocitation analysis
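
    A small sketch can make the contrast between the Pearson correlation and the cosine concrete. It illustrates one commonly discussed property, namely sensitivity to authors with whom neither of the two compared authors is co-cited. The abstract does not say that this is the argument the paper makes. The profiles and variable names below are hypothetical.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two co-citation profiles."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson(x, y):
    """Pearson correlation, computed as the cosine of the mean-centered profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical co-citation profiles of two authors over four other authors.
profile_a = [5, 1, 0, 3]
profile_b = [1, 4, 2, 0]

# The same profiles after adding 20 authors with whom neither is co-cited.
padded_a = profile_a + [0] * 20
padded_b = profile_b + [0] * 20

cos_before, cos_after = cosine(profile_a, profile_b), cosine(padded_a, padded_b)
r_before, r_after = pearson(profile_a, profile_b), pearson(padded_a, padded_b)
```

    In this toy example the cosine is unaffected by the added zero entries, whereas the Pearson correlation changes sign from negative to positive, because the zeros shift the means used for centering.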

    Dispelling the Myths Behind First-author Citation Counts

    Full text link
    We conducted a full-scale evaluative citation analysis study of scholars in the XML research field to explore just how much author rankings resulting from different citation counting methods actually differ from each other, and to demonstrate the capability of emerging data and tools on the Web to support more realistic citation counting methods. Our results contest some common arguments for the continued use of first-author citation counts in the evaluation of scholars, such as high correlations between author rankings by first-author citation counts and by other citation counting methods, and the high cost of using more realistic citation counting methods that are not well supported by the ISI databases. It is argued that increasingly available digital full-text research papers make it possible for citation analysis studies to go beyond what the ISI databases have directly supported and to employ more sophisticated methods.

    Author Index

    No full text
    Not provided