Pakistan Journal of Statistics and Operation Research
Empirical Performance of Nonparametric Regression with Heteroscedasticity
Heteroscedasticity is a well-known violation of an assumption of parametric regression analysis, and the generalized least squares method is typically used to handle it. In this article, we demonstrate the robustness of nonparametric regression in the presence of heteroscedastic errors. Nonparametric regression is a robust method that does not require rigid assumptions about the model. We empirically compared the performance of the generalized least squares method with that of multivariate nonparametric kernel regression, applied with a Gaussian kernel and six bandwidths to China's per capita consumption expenditure. Nonparametric regression with a Bayesian bandwidth was found to perform better on the basis of mean squared error. Simulation results are also presented graphically, examining nonparametric regression with different bandwidths at different levels of heteroscedasticity; the proposed method performed best both in the presence and in the absence of homoscedasticity.
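One common form of multivariate nonparametric kernel regression is the Nadaraya-Watson (local-constant) estimator with a product Gaussian kernel. The sketch below is illustrative only: the function names `nw_estimate` and `loo_mse`, the simulated heteroscedastic data, and the use of leave-one-out error to compare bandwidths are our assumptions, not the authors' code or their Bayesian bandwidth selector.

```python
import math
import random

def gaussian_kernel(u):
    """Standard normal density K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nw_estimate(x0, X, y, h, skip=None):
    """Nadaraya-Watson estimate of E[y | x = x0].

    Uses a product Gaussian kernel with one bandwidth h[j] per regressor.
    If `skip` is an index, that observation is left out (for leave-one-out CV).
    """
    num = den = 0.0
    for i, (xi, yi) in enumerate(zip(X, y)):
        if i == skip:
            continue
        w = 1.0
        for j, hj in enumerate(h):
            w *= gaussian_kernel((xi[j] - x0[j]) / hj) / hj
        num += w * yi
        den += w
    if den == 0.0:
        raise ValueError("all kernel weights underflowed; use a larger bandwidth")
    return num / den

def loo_mse(X, y, h):
    """Leave-one-out mean squared prediction error for bandwidth vector h."""
    errors = [(y[i] - nw_estimate(X[i], X, y, h, skip=i)) ** 2 for i in range(len(y))]
    return sum(errors) / len(errors)

# Hypothetical data: the error standard deviation grows with x1 (heteroscedasticity).
rng = random.Random(1)
X = [(rng.uniform(0.0, 4.0), rng.uniform(0.0, 4.0)) for _ in range(200)]
y = [math.sin(x1) + 0.5 * x2 + rng.gauss(0.0, 0.1 + 0.3 * x1) for x1, x2 in X]

for h in [(0.2, 0.2), (0.5, 0.5), (1.0, 1.0)]:
    print(h, round(loo_mse(X, y, h), 4))
```

Because the estimate is a locally weighted average of y, it needs no model for the error variance: heteroscedasticity affects the estimator's variance, not its form. The bandwidth controls the bias-variance trade-off, which is why the choice of bandwidth drives the comparison.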
Predictive Accuracy of Logistic Regression and Support Vector Machine for Short Interpregnancy Interval
Support vector machine (SVM) is considered a robust machine learning (ML) algorithm. Logistic regression (LR), in contrast, is the most preferred statistical model, especially in the healthcare and medical fields, owing to its interpretability and mathematical foundations. Given these competing characteristics, this study tests the predictive and discriminative strength of both models. Short interpregnancy interval (SIPI) is a global public health issue associated with several feto-maternal complications. This study aims to identify the risk factors of SIPI, to compare the predictive accuracy of LR and SVM, and to compute and compare the feature importance of both models. The study was conducted on 528 Pakistani pregnant women, whose SIPI status was predicted from a number of risk factors. Various evaluation metrics were computed to assess which model is superior. The overall accuracy of LR was 83.14%, with sensitivity, specificity, PPV, and NPV of 81.6%, 85.23%, 84.58%, and 81.82%, respectively; its discriminating strength, examined through the receiver operating characteristic (ROC) curve, was 92.1%. SVM yielded 94.70% accuracy, with sensitivity, specificity, PPV, and NPV of 95.08%, 94.32%, 94.36%, and 95.04%, respectively, and an ROC value of 98.83%. These findings suggest that SVM is the better algorithm for predicting SIPI: all measures of predictive performance, as well as the model fit indices, were better for SVM. Hence, SVM is a comprehensive, interactive, flexible, and accurate ML tool that can predict SIPI from its risk factors better than LR. Further, this ML algorithm is free from certain statistical assumptions required by LR models, such as linearity of the logits, correct model specification, and weak multicollinearity.
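All of the reported measures follow from a 2×2 confusion matrix, plus a ranking-based area under the ROC curve (AUC), which is how we read the reported "discriminating strength" and "ROC value" (an assumption). The sketch below is ours; the counts and scores are hypothetical, not the study's data.

```python
from dataclasses import dataclass

@dataclass
class Confusion:
    """Counts for a binary classifier (positive = SIPI in this setting)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives

def classification_metrics(c):
    """Accuracy, sensitivity, specificity, PPV and NPV from confusion counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
    }

def auc(scores, labels):
    """AUC as the probability a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical counts and scores, for illustration only.
print(classification_metrics(Confusion(tp=8, fp=2, tn=9, fn=1)))
print(auc([0.9, 0.4, 0.5, 0.1], [1, 1, 0, 0]))  # → 0.75
```

Sensitivity and specificity do not depend on how common SIPI is in the sample, whereas PPV and NPV do, so the latter two should be read with the sample's prevalence in mind.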
New highly accurate improvements for single-term approximations of the standard normal distribution function
This paper proposes new, highly accurate, single-term, explicitly invertible approximations for the standard normal distribution function and related functions such as the error function and the quantile function. The proposed approximations build on existing approximations but are much more accurate. Accuracy is measured via the maximum absolute error and the mean absolute error. Some of the proposed approximations are at least five times more accurate than the original ones, and two of them have a maximum absolute error below 1.8 × 10⁻⁴, which is sufficient for most real-world applications. Two real applications are studied to show the applicability of the proposed improvements. These applications show the superiority of one of the proposed approximations over some of the available single-term approximations, even though the latter have smaller maximum absolute errors.
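For context, a classic single-term, explicitly invertible approximation of this kind is Pólya's, Φ(x) ≈ (1/2)[1 + sign(x)·sqrt(1 - exp(-2x²/π))]. The sketch below measures its maximum absolute error against the exact Φ computed from `math.erf` and inverts it in closed form. It only illustrates the type of approximation being improved; it is not one of the paper's new formulas, and we do not claim it is one of the paper's starting points.

```python
import math

def phi_exact(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi_polya(x):
    """Polya's single-term approximation to the standard normal CDF."""
    s = math.sqrt(1.0 - math.exp(-2.0 * x * x / math.pi))
    return 0.5 * (1.0 + math.copysign(s, x))

def quantile_polya(p):
    """Closed-form inverse of phi_polya for 0 < p < 1.

    From t = 2p - 1 and 1 - t^2 = exp(-2x^2/pi): x^2 = -(pi/2) ln(1 - t^2).
    """
    t = 2.0 * p - 1.0
    x = math.sqrt(-0.5 * math.pi * math.log(1.0 - t * t))
    return math.copysign(x, t)

# Maximum absolute error on a fine grid over [-6, 6].
grid = [i / 1000.0 for i in range(-6000, 6001)]
max_abs_err = max(abs(phi_polya(x) - phi_exact(x)) for x in grid)
print(f"max |error| on [-6, 6]: {max_abs_err:.2e}")
print(quantile_polya(0.975))
```

The error of this classic formula is on the order of 10⁻³, roughly an order of magnitude above the 1.8 × 10⁻⁴ bound reported for the best proposed approximations, which shows the scale of improvement at stake.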
Characterizations of the Recently Introduced Discrete Distributions II
Certain characterizations of 19 recently introduced discrete distributions are presented in three directions: (i) based on an appropriate function of the random variable; (ii) in terms of the reverse hazard function; and (iii) in terms of the hazard function. This is a continuation of our previous work with the same title.
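As a textbook illustration of a hazard-function characterization (not one of the 19 distributions treated in the paper), the geometric distribution is the only distribution on {0, 1, 2, ...} with a constant discrete hazard h(k) = P(X = k)/P(X ≥ k). The sketch computes h(k) and the reverse hazard r(k) = P(X = k)/P(X ≤ k) exactly with `fractions.Fraction`; the function names are our own.

```python
from fractions import Fraction

def geometric_pmf(k, p):
    """P(X = k) for X ~ Geometric(p) counting failures before the first success, k = 0, 1, 2, ..."""
    return p * (1 - p) ** k

def hazard(pmf, k):
    """Discrete hazard h(k) = P(X = k) / P(X >= k) on the support {0, 1, 2, ...}."""
    survival = 1 - sum(pmf(j) for j in range(k))  # P(X >= k)
    return pmf(k) / survival

def reverse_hazard(pmf, k):
    """Discrete reverse hazard r(k) = P(X = k) / P(X <= k)."""
    return pmf(k) / sum(pmf(j) for j in range(k + 1))

def pmf_third(k):
    """Geometric pmf with success probability 1/3, kept exact via Fraction."""
    return geometric_pmf(k, Fraction(1, 3))

print([hazard(pmf_third, k) for k in range(4)])          # constant: 1/3 at every k
print([reverse_hazard(pmf_third, k) for k in range(3)])  # → [Fraction(1, 1), Fraction(2, 5), Fraction(4, 19)]
```

Exact rational arithmetic makes the constancy of h(k) an equality rather than a floating-point approximation, which suits checking characterization identities of this kind.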
Modeling Anemia Dynamics Among Women of Reproductive Age Using Topp-Leone Exponentiated Generalized Exponential (TLEG-E) Distribution
Anemia continues to be a significant public health issue, particularly among women aged 15 to 49. To improve the modeling of anemia prevalence, this study introduces the TLEG-E distribution, which offers enhanced flexibility for capturing skewed and heavy-tailed data. The model is applied to country-level data from Pakistan, with global trends from World Bank data serving as a comparative backdrop. The TLEG-E distribution demonstrates superior fit and interpretability compared with traditional models and highlights a declining trend in anemia among Pakistani women, potentially reflecting the impact of health policy reforms and improved access to nutrition. While global prevalence varies widely across regions, the emphasis here is on the methodological advance and its utility for modeling health data. The proposed framework provides a robust statistical foundation for tracking anemia trends and can support more targeted policy interventions. Its adaptability makes it suitable for broader applications in epidemiological research, enabling more precise assessments of public health initiatives across diverse populations.
A New Model for Reliability Value-at-Risk Assessments with Applications, Different Methods for Estimation, Non-parametric Hill Estimator and PORT-VaRq Analysis
This paper introduces a new extension of the exponential distribution tailored for reliability and risk analysis. We incorporate several insurance risk indicators, namely the value-at-risk, tail mean-variance, tail value-at-risk, tail variance, and maximum excess loss, to refine reliability risk assessments. These indicators offer vital insights into the financial consequences of extreme risk events and the potential for substantial losses. To assess these risk indicators, we explore several non-Bayesian estimation techniques: maximum likelihood, ordinary least squares, Anderson-Darling, right-tail Anderson-Darling, and left-tail Anderson-Darling estimation of the second order. Our approach involves a comprehensive simulation study with varying sample sizes, followed by an empirical risk analysis using these methods. We also evaluate the applicability of the new model on two real reliability data sets. We then apply the risk indicators, including the value-at-risk (VaRq), tail mean-variance (TMVq), tail value-at-risk (TVaRq), tail variance (TVq), and maximum excess loss (MELq), to analyze reliability risk using failure (relief) and survival data. Finally, a peaks over a random threshold value-at-risk (PORT-VaRq) analysis of the failure and survival data is presented.
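Empirical counterparts of these indicators can be computed directly from a loss sample. This sketch is our own and rests on stated assumptions: VaR_q is the lower empirical q-quantile; TVaR_q and TV_q are the mean and variance of losses above VaR_q; TMV_q = TVaR_q + δ·TV_q with δ = 0.5; and MEL_q is taken as TVaR_q - VaR_q (the mean excess over VaR). It also includes the Hill estimator of the tail index named in the title, γ̂ = (1/k) Σ_{i=1..k} [ln X_(n-i+1) - ln X_(n-k)]. None of this is the paper's parametric estimation code.

```python
import math
import statistics

def empirical_var(losses, q):
    """Lower empirical q-quantile: the k-th smallest loss with k = ceil(q * n)."""
    xs = sorted(losses)
    k = max(1, math.ceil(q * len(xs)))
    return xs[k - 1]

def risk_indicators(losses, q, delta=0.5):
    """Empirical VaR, TVaR, TV, TMV and MEL at level q (delta = 0.5 is an assumption)."""
    var_q = empirical_var(losses, q)
    tail = [x for x in losses if x > var_q]
    if not tail:
        raise ValueError("no losses exceed VaR_q; lower q")
    tvar = statistics.fmean(tail)      # tail value-at-risk
    tv = statistics.pvariance(tail)    # tail variance
    return {"VaR": var_q, "TVaR": tvar, "TV": tv,
            "TMV": tvar + delta * tv, "MEL": tvar - var_q}

def hill(losses, k):
    """Hill estimator of the extreme value index from the k largest (positive) losses."""
    xs = sorted(losses, reverse=True)
    log_threshold = math.log(xs[k])    # ln X_(n-k)
    return sum(math.log(xs[i]) - log_threshold for i in range(k)) / k

# Hypothetical heavy-tailed sample: exact Pareto quantiles with tail index 0.5.
n = 10_000
pareto = [((n + 1) / i) ** 0.5 for i in range(1, n + 1)]
print(risk_indicators(pareto, 0.95))
print(round(hill(pareto, 500), 3))
```

The Hill estimate depends on k, the number of upper order statistics used; in practice it is usually plotted against k and read off a stable region.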
Comparison of Metaheuristic Algorithms for Maximum Likelihood Estimation of the Transmuted Weibull Distribution with Applications
The Weibull distribution, widely used for its flexibility, often requires generalization to improve its fit to real-world data. The Transmuted Weibull Distribution (TWD) offers enhanced flexibility by incorporating a transmutation parameter. Metaheuristic algorithms have emerged as robust tools for parameter estimation, particularly for probability distributions with complex likelihood functions. This study compares the performance of four metaheuristic algorithms, Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Differential Evolution (DE), and Artificial Bee Colony (ABC), against the traditional Newton-Raphson (NR) algorithm for estimating the parameters of the TWD. Extensive Monte Carlo simulations evaluated the algorithms' efficiency using metrics such as log-likelihood values, bias, mean squared error (MSE), and deficiency. The methods are also applied to real-world datasets to compare their practical utility. Both the simulation and real-data results showed that the metaheuristic algorithms outperformed traditional NR optimization.
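To make the comparison concrete, the sketch below fits the TWD by maximum likelihood with a minimal DE/rand/1/bin optimizer written in pure Python (DE being one of the four metaheuristics named above). The paper's implementation and tuning are not given here, so the population size, F, CR, generation count, and bounds are our assumptions. We use the common transmuted form F(x) = (1 + λ)G(x) - λG(x)², |λ| ≤ 1, where G is the Weibull CDF with shape β and scale σ, so the density is f(x) = g(x)[1 + λ - 2λG(x)]. The data are simulated, not the paper's.

```python
import math
import random

def twd_nll(params, data):
    """Negative log-likelihood of the transmuted Weibull distribution (assumed parameterization)."""
    beta, sigma, lam = params
    total = 0.0
    for x in data:
        z = (x / sigma) ** beta
        G = 1.0 - math.exp(-z)                  # Weibull CDF
        mix = 1.0 + lam - 2.0 * lam * G         # transmutation factor
        if mix <= 0.0:
            return math.inf
        # log f(x) = log g(x) + log(mix), with g the Weibull density
        total += math.log(beta / sigma) + (beta - 1.0) * math.log(x / sigma) - z + math.log(mix)
    return -total

def sample_twd(n, beta, sigma, lam, rng):
    """Inverse-CDF sampling: solve (1 + lam) G - lam G^2 = u for G, then invert the Weibull CDF."""
    out = []
    for _ in range(n):
        u = rng.uniform(1e-12, 1.0 - 1e-12)
        if lam == 0.0:
            G = u
        else:
            G = ((1.0 + lam) - math.sqrt((1.0 + lam) ** 2 - 4.0 * lam * u)) / (2.0 * lam)
        out.append(sigma * (-math.log(1.0 - G)) ** (1.0 / beta))
    return out

def differential_evolution(f, bounds, pop_size=30, F=0.7, CR=0.9, generations=300, seed=0):
    """Minimal DE/rand/1/bin minimizer over a box; returns (best_point, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [f(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)         # at least one coordinate is mutated
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if j == j_rand or rng.random() < CR:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)     # clip back into the box
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_cost = f(trial)
            if trial_cost <= cost[i]:           # greedy one-to-one selection
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=lambda k: cost[k])
    return pop[best], cost[best]

TRUE = (2.0, 1.5, 0.5)                          # hypothetical (beta, sigma, lambda)
BOUNDS = [(0.1, 10.0), (0.1, 10.0), (-1.0, 1.0)]
data = sample_twd(200, *TRUE, random.Random(42))
est, nll_hat = differential_evolution(lambda p: twd_nll(p, data), BOUNDS)
print("estimate (beta, sigma, lambda):", [round(v, 3) for v in est])
print("NLL at estimate:", round(nll_hat, 3), " NLL at truth:", round(twd_nll(TRUE, data), 3))
```

Newton-Raphson needs derivatives and a good starting point; DE needs only likelihood values and box bounds, which is one reason metaheuristics are attractive for awkward likelihoods, at the cost of many more likelihood evaluations.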
The Beta-Weibull-X Family of Distributions with some Properties and Applications to Engineering and Health Data when X ∼ Rayleigh Distribution
Generating new statistical distributions that adequately characterize real-life phenomena, such as those in reliability engineering, meteorology, and the health sciences, is an important concern for researchers. Many complex real-life phenomena are not yet optimally characterized by existing methods, and this study proposes the Beta Weibull-X (BWei-X) family. The Beta Weibull-Rayleigh (BWR) distribution, developed as a member of the family, includes notable distributions in the literature as special cases; its moments and some basic statistical properties were investigated. The parameters of the distribution were estimated by maximum likelihood. Graphical results show that the failure rate can be decreasing, increasing, J-shaped, bathtub-shaped, or inverted bathtub-shaped, making the distribution an attractive tool in diverse areas of application for modeling noisy phenomena with left-skewed, right-skewed, and approximately symmetric features. A systolic blood pressure dataset and an engineering dataset were used to investigate the performance of the model, and the results of the data analysis in R justify the significance of the research.
The Burr Inverse Weibull Model for Risk Analysis Under US Social Security Administration Disability Data Using Peaks Over Random Threshold Method with a Case Study in KSA
This study analyzes real disability insurance data to evaluate extreme risks using advanced statistical tools and metrics. The primary objective is to identify significant events or anomalies in the data and to propose actionable strategies for managing the financial risks associated with disability insurance claims. To this end, we apply a range of indicators, including Value-at-Risk (VaR), Tail-VaR (TVaR), Tail-Mean-Variance (TMV), Tail-Variance (TV), Mean Excess Loss (MXL), Mean of Order P (MOO-P), Optimal Order of P (O-P), and Peaks Over a Random Threshold Value-at-Risk (PORT-VaR), to identify and describe significant events or anomalies in the data. To address these risks, the research explores the application of the Burr inverse Weibull (BIW) model, a well-regarded framework within extreme value theory (EVT). The study provides a structured approach for disability insurance institutions to better manage unexpected and potentially severe financial losses. The dataset comprises n = 2000 anonymized records from the Social Security Administration (SSA) disability insurance system. By analyzing the asymmetric, right-skewed SSA disability insurance data through these indicators, the research offers insights into the behavior of extreme events and long-tailed distributions. The percentage distribution of disability reasons in KSA for 2023 is also considered. Based on this comprehensive risk analysis, practical recommendations are proposed.
Analysis of Two-Dimensional State Markovian Queuing Model with Multiple Vacation, Correlated Servers, Feedback and Catastrophes
This paper investigates a queuing system with multiple vacations, correlated servers, feedback, and catastrophes. Interarrival times follow an exponential distribution with parameter λ, and service times follow a bivariate exponential distribution BVE(μ, μ, ν), where μ is the service time parameter and ν is the correlation parameter. Both servers go on vacation with probability one when there are no units in the system. A Laplace transform approach is used to find the time-dependent solution. The model yields the total expected cost and the total expected profit, and their optimal values are obtained by varying time. The optimal values occur at t = 5 with service rate 2.75 for minimum cost, and at t = 2 with feedback probability 0.55 for maximum profit. These key measures give a deeper understanding of the model's behaviour. Numerical analysis and graphical representations were carried out using Maple software.