
    Accelerating DSS Workloads through Coherence Protocols

    No full text
    In this work, we analyze how a DSS (Decision Support System) workload can be accelerated on a shared-bus shared-memory multiprocessor by adding simple and inexpensive support for a coherence protocol, in order to reduce the overhead produced by thread migration. It is well known that, in this kind of system, the bus is the critical element that can limit the scalability of the machine. Nevertheless, many factors that influence bus utilization have not yet been investigated for this kind of workload, in particular the effects of thread migration. For the sake of completeness, the DSS workload does not run alone on the machine but together with other programs that might be spawned by the DSS application, such as system commands or other software typically running on this kind of machine. Operating system effects are also considered in our evaluation. We analyzed a basic four-processor and a high-end sixteen-processor machine, implementing three different coherence protocols (including MESI and another solution from the literature). We show that even in the four-processor case, the overhead induced by the sharing of private data as a consequence of process migration, namely passive sharing, cannot be neglected. Indeed, the analysis shows that a protocol based on a selective strategy for dealing with private and shared data performs better than protocols that either rely on the detection of migratory access patterns or use a pure Write-Invalidate strategy, like MESI. We varied the architectural parameters to show how passive sharing and other coherence overhead are influenced by different cache choices. Then, we considered the sixteen-processor case, where the effects on performance are more evident. We also conclude that performance can benefit from large caches and cache affinity scheduling. However, even with affinity scheduling, a selective protocol delivers better performance

    OS Effects on Memory Hierarchy of a SMP Multiprocessor Running a DBMS Workload

    No full text
    In this work, we characterized the impact of operating system activities such as process migration on a shared-bus shared-memory multiprocessor running a typical DBMS workload. Our workload was set up using the TPC-D benchmark on the PostgreSQL DBMS. The analysis was performed with an enhanced trace-driven simulation technique that includes the most important operating system activities and analyzes the sharing overhead in detail. We evaluated a basic four-processor and a high-end sixteen-processor machine, implementing MESI and other coherence protocols that deal with the migration of processes and data. Our results show that even in the four-processor case, operating system effects cannot be neglected. In fact, different coherence protocols can more effectively reduce the effects of process migration. The consequences for performance become more important in high-end machines (16 or more processors). In this case, even the small amount of sharing that we found in DBMS applications can become crucial for system performance. Better speedup may be achieved by adopting several alternatives, including a redesign of kernel data structures. Cache affinity is somewhat useful in reducing the effects of migration, but it is not effective under every load condition

    Reducing coherence overhead and boosting performance of high-end SMP multiprocessors running a DSS workload

    No full text
    In this work, we characterized the memory performance - and in particular the impact of coherence overhead and process migration - of a shared-bus shared-memory multiprocessor running a DSS workload. When the number of processors is increased in order to achieve higher computational power, the bus becomes a major bottleneck of such an architecture. We evaluated solutions that can greatly reduce that bottleneck. Database systems are one area where this kind of optimization is important. For this reason, we considered a DSS workload and set up the experiments following the TPC-D specifications on the PostgreSQL DBMS, in order to explore different optimizations on the same kind of workload as evaluated in the literature. In this scenario, we compare possible solutions to boost performance and we show the impact of process migration on coherence overhead. We found that the consequences of coherence overhead and process migration for performance are very important in machines with 16 or more processors. In this case, even a small amount of sharing, as in DSS applications, can become crucial for system performance. Another important result of our analysis concerns the interaction between the coherence protocol and the scheduler. Basic cache affinity scheduling is useful in reducing migration, but it is not effective under every load condition. Specific coherence protocols can help reduce the effects of process migration, especially in situations where the scheduler cannot apply the affinity requirement. Under these conditions, the use of a write-update protocol with a selective invalidation strategy for private data improves performance (and scalability) by about 20% with respect to a classical MESI-based solution. This advantage is about 50% in the case of high cache-to-cache transfer

    Process Migration Effects on Memory Performance of Multiprocessor Web-Server

    No full text
    In this work, we highlight how the memory performance of a Web-Server machine may depend on the sharing induced by process migration. We considered a shared-bus shared-memory multiprocessor as the simplest multiprocessor architecture to be used for accelerating Web-based and commercial applications. It is well known that, in this kind of system, the bus is the critical element that may limit the scalability of the machine. Nevertheless, many factors that influence bus utilization when process migration is permitted have not yet been thoroughly investigated. We analyze a basic four-processor and a high-performance sixteen-processor machine. We show that, even in the four-processor case, the overhead induced by the sharing of private data as a consequence of process migration, namely passive sharing, cannot be neglected. Then, we consider the sixteen-processor case, where the effects on performance are larger. The results show that, even though performance may benefit from larger caches or from cache affinity scheduling, there is still a great amount of passive sharing, in addition to false sharing and active sharing. To limit false sharing overhead, we can adopt a careful design of kernel data structures. Passive sharing can be reduced, or even eliminated, by using appropriate coherence protocols. The evaluation of two such protocols (AMSD and PSCR) shows that we can achieve better processor utilization compared to the MESI case

    Performance analysis of electronic commerce multiprocessor server

    No full text
    In this paper, the performance of an Electronic Commerce server, i.e. a system running Electronic Commerce applications, is evaluated for a shared-bus multiprocessor architecture. In particular, we focused on the memory subsystem design. We analyzed the common case of a system using the MESI coherence protocol to maintain coherency among the processors' private caches. We evaluated the miss ratio and the bus traffic of such a system by varying cache size, number of ways, scheduling policy and number of processors, highlighting the relations with the different types of data sharing generated by the application or the kernel. We found that passive sharing and false sharing are the major sources of coherence overhead in the case of relatively large caches (over 1M-byte size). False sharing is mainly due to kernel data, and can be eliminated by using appropriate data structure design techniques. A scheduling technique such as cache affinity can reduce passive sharing, but it is not effective under every load condition. Thus, a special coherence protocol could be a better solution to completely eliminate passive sharing overhead and boost performance

    Artena. 1. Rapports et études de M.-A. Delsaux, G. Foglia, P. Fontaine, R. Lambrechts, F. Van Wonterghem et autres réunis et présentés par Roger Lambrechts

    No full text
    Van Compernolle Thierry. Artena. 1. Rapports et études de M.-A. Delsaux, G. Foglia, P. Fontaine, R. Lambrechts, F. Van Wonterghem et autres réunis et présentés par Roger Lambrechts. In: L'antiquité classique, Tome 54, 1985. pp. 546-548

    Going Beyond Counting First Authors in Author Co-citation Analysis

    Full text link
    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting - the traditional type that only counts the first one among a cited work's authors on the one hand and a non-traditional type that takes into account the first 5 authors of a cited work on the other hand. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed
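    The two counting schemes compared in the abstract above can be illustrated with a minimal sketch. It is not the study's actual procedure: the function name `cocitation_counts`, the toy author labels, and the choice to count each author pair at most once per citing paper (including co-authors of the same cited work) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists, max_authors=1):
    """Count author co-citations over a set of citing papers.

    reference_lists: one entry per citing paper; each entry is a list of
        cited works, and each cited work is an ordered list of author names.
    max_authors: how many leading authors of each cited work are counted
        (1 = traditional first-author counting, 5 = first five authors).

    Assumption: an author pair is counted at most once per citing paper.
    """
    counts = Counter()
    for cited_works in reference_lists:
        cited_authors = set()
        for authors in cited_works:
            cited_authors.update(authors[:max_authors])
        # Every unordered pair of distinct authors in this reference list
        # is co-cited once by this citing paper.
        for pair in combinations(sorted(cited_authors), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: two citing papers, authors labelled A to D.
toy_reference_lists = [
    [["A", "B"], ["C"]],
    [["A", "D"], ["C", "B"]],
]

first_author = cocitation_counts(toy_reference_lists, max_authors=1)
first_five = cocitation_counts(toy_reference_lists, max_authors=5)
```

    On this toy input, first-author counting only ever sees the pair (A, C), while counting the first five authors also brings B and D into the matrix, which is the kind of difference in coverage the abstract discusses.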

    Row-level algorithm to improve real-time performance of glass tube defect detection in the production phase

    Full text link
    For glass tubes used in pharmaceutical applications, high-quality defect detection is performed by inspection systems based on image processing. Such processing must be fast enough to guarantee real-time inspection and to meet the increasing production rate and quality required by the market. Defect detection is complicated by specific problems of the production process: vibration, rotation and irregularity of the tube. All these aspects prevent the efficient use of known techniques. The authors present an algorithm that decreases the processing time of the defect detection phase. The algorithm is based on a moving average filter working at row level, which minimizes the effects of rotation, vibration, and irregularity of the tube. Luminosity variations due to the tube curvature are removed by the filter, so that a threshold algorithm can be applied. The authors evaluated the algorithm against different solutions taken from the literature. It outperforms all these solutions in processing time, with increased accuracy. Experimental measurements show that the algorithm achieves a throughput gain of 2.6 times with respect to Canny. The authors also develop a methodology to obtain the best values for the algorithm parameters directly at the factory, during the change of production batches
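    The general idea of a row-level moving average followed by a threshold can be sketched as follows. This is only an illustration under stated assumptions, not the authors' algorithm: the function names, the centered window clipped at the row borders, and the default `window` and `threshold` values are all hypothetical.

```python
def moving_average(row, window):
    """Centered moving average of one image row.

    Assumption: the window is clipped at the row borders instead of padded.
    """
    half = window // 2
    smoothed = []
    for i in range(len(row)):
        lo = max(0, i - half)
        hi = min(len(row), i + half + 1)
        smoothed.append(sum(row[lo:hi]) / (hi - lo))
    return smoothed


def detect_defects_in_row(row, window=5, threshold=20.0):
    """Flag pixels that deviate strongly from the local row average.

    The moving average follows slow luminosity changes along the row,
    so only sharp local deviations exceed the threshold.
    """
    baseline = moving_average(row, window)
    return [abs(pixel - base) > threshold for pixel, base in zip(row, baseline)]


def detect_defects(image, window=5, threshold=20.0):
    """Apply the row-level detector independently to every row."""
    return [detect_defects_in_row(row, window, threshold) for row in image]


# Hypothetical row: a slow brightness gradient with one dark pixel at index 5.
toy_row = [100, 102, 104, 106, 108, 40, 112, 114, 116, 118]
toy_mask = detect_defects_in_row(toy_row)
```

    Because each row is processed independently with a short window, the cost is linear in the number of pixels, which fits the abstract's emphasis on processing time. The window and threshold would need tuning on real data.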