
    Football analytics: a bibliometric study about the last decade contributions

    Machine learning and digitization tools have grown exponentially in recent years, and their applications are reflected in many areas of our lives. This article focuses on football (i.e., soccer to Americans), the most practised sport in the world. Driven by the needs of professional teams, analytics tools in football are becoming crucial for helping technical staff, scouting departments, and club management evaluate policies and optimize strategic decisions. In this article we propose an original bibliometric analysis of football analytics in the decade 2010-2020, using the R package Bibliometrix and the well-known bibliographic database SCOPUS. The main goal is to better understand what already exists in the football analytics literature and what does not, in order to help future researchers find new topics or refine existing tools. We first present the distribution of source production, then focus on the most productive research groups and their countries, identify the most active authors, and highlight topic trends over the last ten years through keywords. Finally, three relevant articles that summarize the most important themes are presented.
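The keyword-trend step can be illustrated with a minimal Python sketch. It assumes bibliographic records have already been exported (e.g., from SCOPUS) into dicts with a publication year and a list of author keywords; the field names `year` and `keywords`, the function `keyword_trends`, and the toy records are all hypothetical, and this is not how Bibliometrix itself is implemented.

```python
from collections import Counter, defaultdict


def keyword_trends(records, top_n=3):
    """Count author-keyword occurrences per publication year.

    `records` is assumed (hypothetically) to be a list of dicts with an
    int 'year' and a list of strings under 'keywords'.
    """
    by_year = defaultdict(Counter)
    for rec in records:
        # Normalize case/whitespace; using a set counts each keyword once per record
        kws = {kw.strip().lower() for kw in rec["keywords"]}
        by_year[rec["year"]].update(kws)
    # Most frequent keywords for each year, in chronological order
    return {year: by_year[year].most_common(top_n) for year in sorted(by_year)}


records = [  # hypothetical toy records, not data from the study
    {"year": 2019, "keywords": ["Expected goals", "machine learning"]},
    {"year": 2019, "keywords": ["machine learning", "tracking data"]},
    {"year": 2020, "keywords": ["Machine Learning ", "expected goals"]},
]
print(keyword_trends(records, top_n=2))
```

Comparing the per-year counts across the decade is what turns a flat keyword list into a trend.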

    The higher-order PLS-SEM confirmatory approach for composite indicators of football performance quality

    Supporting the strategic decisions of a football team’s management is becoming crucial. We create new composite indicators to measure performance quality, applying both Confirmatory Tetrad Analysis (CTA) and Confirmatory Composite Analysis (CCA) to a Third-Order Partial Least Squares Structural Equation Model (PLS-SEM). To do this, we used data provided by Electronic Arts (EA) Sports experts and available on the Kaggle data science platform; in particular, the dataset consisted of 29 Key Performance Indices defined by EA Sports experts for the top 5 European leagues. A PLS-SEM was developed for each player role, based on the most recent season, 2021/2022. To improve each model, CTA was applied to evaluate the nature of the constructs (formative or reflective), followed by CCA. The results show that some sub-areas of performance carry different weights depending on the player’s role. As a concurrent and predictive validity analysis, our third-order overall Player Indicator was compared with the existing EA overall rating and with performance-quality proxies such as the player’s market value and wage, showing interesting and consistent relations.
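The idea behind CTA can be sketched by computing tetrads. For four indicators of a single reflective construct, the model implies that each tetrad difference, such as τ = σ12·σ34 − σ13·σ24, vanishes; non-vanishing tetrads point toward a formative specification. This minimal NumPy sketch only computes the three sample tetrads of a 4×4 covariance matrix; it omits the bootstrap significance test used in CTA-PLS, and the loadings and the function name `tetrads` are hypothetical.

```python
import numpy as np


def tetrads(cov):
    """Return the three tetrad differences of a 4x4 covariance matrix."""
    s = np.asarray(cov, dtype=float)
    t1 = s[0, 1] * s[2, 3] - s[0, 2] * s[1, 3]  # tau_1234
    t2 = s[0, 2] * s[3, 1] - s[0, 3] * s[2, 1]  # tau_1342
    t3 = s[0, 3] * s[1, 2] - s[0, 1] * s[3, 2]  # tau_1423
    return np.array([t1, t2, t3])


# Hypothetical one-factor (reflective) model: Sigma = lambda lambda^T + Theta
loadings = np.array([0.8, 0.7, 0.6, 0.9])
theta = np.diag(1 - loadings**2)  # unique variances giving unit-variance indicators
cov_reflective = np.outer(loadings, loadings) + theta
print(tetrads(cov_reflective))  # approximately zero: consistent with a reflective construct
```

Adding an extra covariance between two indicators (a violation of the one-factor structure) makes some tetrads non-zero, which is the signal CTA tests for.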

    STATISTICAL METHODS AND TOOLS FOR FOOTBALL ANALYTICS

    Machine learning and digitization tools have grown exponentially in recent years, and their applications are reflected in many areas of our lives. This thesis focuses on sports analytics, and in particular on football (i.e., soccer to Americans), the most practised sport in the world. Driven by the growing needs of professional clubs, analytics tools in football are becoming crucial for helping technical staff, scouting departments, and club management evaluate policies and optimize strategic decisions. For this reason, several statistical applications have been developed, one for each chapter, each corresponding to a scientific article that has been published or is under review.
The introduction lists the main activities I carried out during my PhD. The first chapter is dedicated to the literature review, conducted through an original bibliometric analysis of football analytics research in the decade 2010-2020. The second chapter examines the Partial Least Squares Structural Equation Modeling (PLS-SEM) framework in depth, in order to build original composite indicators of player performance from data provided by Electronic Arts (EA) experts and available on the Kaggle data science platform; in particular, a hierarchical Third-Order PLS-PM approach was applied to the sofifa Key Performance Indices to compute a composite indicator differentiated by role. In the third chapter, the PLS-SEM model is refined and validated for each role by applying both Confirmatory Tetrad Analysis (CTA) and Confirmatory Composite Analysis (CCA) to EA sofifa data from the most recent football season (2021/2022); the results show that the areas and sub-areas of performance carry different weights depending on the player's role. As a concurrent and predictive validity analysis, the new overall Player Indicator (PI) was compared with a benchmark (the EA overall) and with performance-quality proxies such as players' market value and wage, showing interesting and consistent relations.
In the last chapter, these original composite indicators are introduced as regressors in the expected goals (xG) model to improve its predictive accuracy. The xG model is an emerging tool in football analytics that aims to predict goals and measure the quality of each shot. To this end, a classical logistic (logit) model and an adjusted logistic model were applied under different scenarios of sample-balancing techniques. In particular, some of the composite indicators obtained from the PLS-SEM and some new tracking variables turned out to be significant for the classification model, improving goal prediction accuracy compared with a benchmark.
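Because goals are a small minority of shots, balancing the sample before fitting a logit model matters. The sketch below shows one common balancing technique, random undersampling of the non-goal class to a chosen ratio; it is an assumption for illustration, not necessarily the technique used in the thesis, and the function `undersample`, the ratios, and the simulated data are hypothetical.

```python
import numpy as np


def undersample(X, y, ratio=1.0, seed=0):
    """Randomly drop majority-class rows (y == 0, non-goals) until
    n_majority == ratio * n_minority. Assumes goals (y == 1) are the minority."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    keep_neg = rng.choice(neg, size=n_keep, replace=False)
    idx = np.sort(np.concatenate([pos, keep_neg]))  # keep original row order
    return X[idx], y[idx]


rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))            # hypothetical shot features
y = (rng.random(1000) < 0.1).astype(int)  # roughly 10% goals, hypothetical
for ratio in (1.0, 2.0):                  # e.g. 1:1 and 1:2 balance scenarios
    Xb, yb = undersample(X, y, ratio=ratio)
    print(ratio, int(yb.sum()), int((yb == 0).sum()))
```

Each balanced sample would then be passed to the logit model, so the scenarios differ only in the goal/non-goal ratio the classifier sees.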

    Accuracy and explainability of statistical and machine learning xG models in football

    This study proposes an original approach to interpreting the explanatory variables (features) of the well-known expected goals (xG) model for shot analysis in football. To this end, a new original sample of 7801 shots from Italy’s Serie A (1 binary outcome and 26 features) for the 2022/2023 and 2023/2024 seasons was used, introducing 8 new features of various types that integrate event data, performance data, and tracking data. Specifically, the performance of 8 statistical and machine learning (algorithmic) classifiers was compared. The focus was on two key aspects from the field of explainable Artificial Intelligence (xAI), ‘accuracy’ and ‘explainability’, assessed using appropriate metrics. In terms of accuracy, Binary Regression (BR) with the cloglog link function is the most effective statistical classifier. Among the algorithmic classifiers, xGBoost performs best, though slightly below BR-cloglog. Regarding explainability, the primary contribution to the xG consistently comes from a small set of variables across all classifiers. The most influential features are the proximity to the goal, the shooting angle, and the shooter’s visual angle.
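Two of the most influential features and the cloglog link can be sketched in a few lines of Python. The sketch assumes shot coordinates in metres with the origin at the centre of the goal and x measured out from the goal line, and a regulation goal width of 7.32 m. The shooting angle is the angle subtended by the two posts; the inverse cloglog link maps a linear predictor η to P(goal) = 1 − exp(−exp(η)). The coefficients are placeholders, not estimates from the study.

```python
import math

GOAL_WIDTH = 7.32  # metres, regulation goal width


def shot_distance(x, y):
    """Distance (m) from the shot location to the centre of the goal.
    Assumed coordinates: origin at goal centre, x = distance from goal line."""
    return math.hypot(x, y)


def shot_angle(x, y, width=GOAL_WIDTH):
    """Angle (radians) subtended by the two goalposts at the shot location."""
    half = width / 2
    # Cross and dot products of the vectors from the shot to each post
    return math.atan2(width * x, x**2 + y**2 - half**2)


def cloglog_prob(eta):
    """Inverse complementary log-log link: P(goal) = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


# Hypothetical coefficients for illustration only (not estimates from the study)
b0, b_dist, b_angle = -1.0, -0.1, 1.5
x, y = 11.0, 0.0  # roughly the penalty spot
eta = b0 + b_dist * shot_distance(x, y) + b_angle * shot_angle(x, y)
print(round(math.degrees(shot_angle(x, y)), 1), round(cloglog_prob(eta), 3))
```

Unlike the symmetric logit link, cloglog approaches 1 more sharply than it leaves 0, which can suit a rare-event outcome such as a goal.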

    Temi d'esame con soluzione (Exam papers with solutions)

    Statistics is an important tool for understanding and interpreting the world around us. Used in both academic and professional settings, it plays a fundamental role in the collection, analysis, and interpretation of data. This booklet was created to support students in studying statistics, offering detailed and comprehensive solutions to the exercises and questions set in the most recent exams, in closed-answer test format. (from the Preface)

    Detecting Causal Relations Among Indicators with the CTA Test: Simulations and Applications

    In the context of using structural equation modelling to develop economic and social indicators, the debate over the choice of measurement modes for theoretical constructs has become an important issue, with conceptual and practical implications. The nature of each construct, which can be defined as reflective or formative, is mainly based on theoretical considerations, but confirmatory tetrad analysis (CTA) can support decisions about the model specification. One flexible approach to carrying out CTA involves multiple hypothesis testing, which also provides relevant information on empirical data to guide the construction of composite indicators. This prompts a deeper investigation of the effects of correction methods on decisions derived from tests, with special attention to error control and statistical power. In this study, we explore the properties of six procedures, in particular the well-known Bonferroni and Benjamini–Hochberg corrections, using various simulation scenarios and real applications. We find that, compared with the Benjamini–Hochberg procedure, the Bonferroni correction is too conservative and has lower power, especially with small sample sizes and many manifest variables.
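The difference in conservativeness between the two best-known corrections can be seen in a short NumPy sketch of their textbook definitions: Bonferroni rejects when p ≤ α/m (controlling the family-wise error rate), while the Benjamini–Hochberg step-up procedure rejects the k smallest p-values, where k is the largest rank with p(k) ≤ kα/m (controlling the false discovery rate). The p-values below are hypothetical, not results from the study, and the other four procedures are not shown.

```python
import numpy as np


def bonferroni(pvals, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    p = np.asarray(pvals, dtype=float)
    return p <= alpha / p.size


def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up BH: find the largest k with p_(k) <= k * alpha / m and reject
    the k hypotheses with the smallest p-values (controls the FDR)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.flatnonzero(below).max()  # largest sorted index meeting its threshold
        reject[order[: k + 1]] = True
    return reject


pvals = [0.001, 0.008, 0.012, 0.03, 0.2]  # hypothetical tetrad-test p-values
print(bonferroni(pvals).sum(), benjamini_hochberg(pvals).sum())  # prints: 2 4
```

With the same five p-values, Bonferroni rejects two hypotheses and Benjamini–Hochberg four, which illustrates why Bonferroni tends to have lower power as the number of tests grows.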

    On CTA-PLS corrections applied on sports performance

    This work explores a novel approach for assessing causal directions in measurement models and structural equation models with higher-order constructs. This extension of CTA-PLS incorporates different methods for controlling errors in multiple hypothesis testing, adapting them to the soft modeling context and highlighting their relevance during exploratory model construction. The CTA-PLS corrections method is applied to a second-order construct for performance assessment in sports analytics.

    Training Load, Official Match Locomotor Demand, and Their Association in Top-Class Soccer Players During a Full Competitive Season

    Riboli, A, Nardi, F, Osti, M, Cefis, M, Tesoro, G, and Mazzoni, S. Training load, official match locomotor demand, and their association in top-class soccer players during a full competitive season. J Strength Cond Res 39(2): 249-259, 2025. This study aimed to examine the training load and official match locomotor demands of top-class soccer players during a full competitive season and to evaluate their association. Twenty-five top-class soccer players competing in UEFA international competitions were included. The season was divided into 2 categories: 2 matches (M2) or 3 matches (M3) in 8 days. Starters and nonstarters were classified. Total distance (TD), high-speed running (HSR, 15-20 km·h⁻¹), very high-speed running (VHSR, 20.1-24 km·h⁻¹), sprint (SPR, >24.1 km·h⁻¹), and accelerations/decelerations (Acc + Dec, >3 m·s⁻²) were recorded. Trivial to moderate differences (p > 0.05) were found in M2 and M3, with ∼5 to ∼29% match-to-match variability depending on the metric. Total load (i.e., training plus match loads) was higher (p < 0.05, ES: 0.75/1.61) in starters than in nonstarters, because of a higher match load and no difference in training load. Very high-speed running and SPR accumulated during training sessions were largely to very largely (r = 0.60 to 0.72) associated with TD, HSR, VHSR, and Acc + Dec covered during official matches; VHSR and TD during training were largely to very largely (r = 0.57 and 0.71) associated with SPR and Acc + Dec during official matches. In conclusion, (a) congested periods did not seem to affect official match locomotor performance; (b) practitioners may consider the high week-by-week workload variability when individualizing training prescriptions, especially for nonstarters; and (c) the VHSR and SPR accumulated during training were associated with official match locomotor demands and may be considered for maximizing performance.
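Locomotor metrics such as TD, HSR, VHSR, and SPR are typically derived from a player's speed trace. This minimal NumPy sketch integrates distance per speed band from samples taken at a constant rate. It assumes speeds in km·h⁻¹ and a hypothetical 10 Hz sampling rate, and it treats the bands from the abstract as contiguous half-open intervals so that no sample falls in the small 20-20.1 or 24-24.1 gaps. It is not the authors' processing pipeline; real tracking systems may apply additional filtering, which is omitted here.

```python
import numpy as np

# Speed bands (km/h) from the abstract, treated here as contiguous half-open
# intervals (lo, hi] as an assumption.
BANDS = {"HSR": (15.0, 20.0), "VHSR": (20.0, 24.0), "SPR": (24.0, np.inf)}


def zone_distances(speed_kmh, hz=10):
    """Total distance (m) and distance in each speed band from speeds sampled at `hz`."""
    v = np.asarray(speed_kmh, dtype=float)
    step_m = v / 3.6 / hz  # metres covered during each sample interval
    out = {"TD": float(step_m.sum())}
    for name, (lo, hi) in BANDS.items():
        out[name] = float(step_m[(v > lo) & (v <= hi)].sum())
    return out


# Hypothetical 3-second trace at 10 Hz: 1 s at 18, 22, and 30 km/h
speeds = [18.0] * 10 + [22.0] * 10 + [30.0] * 10
print(zone_distances(speeds, hz=10))
```

Summing these per-session values over training and match days gives the kind of weekly load totals the study correlates with match demands.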