2,039 research outputs found
Variants of the Borda count method for combining ranked classifier hypotheses
The Borda count is a simple yet effective method of combining rankings. In pattern recognition, classifiers are often able to return a ranked set of results. Several experiments have been conducted to test the ability of the Borda count and two variant methods to combine these ranked classifier results. By using artificial data, domain-specific results were avoided. The results show the strength of the Borda count when many errors occur in the results, but also show its weakness in case of a limited number of large ranking errors.
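As an illustration of the basic method only (not of the two variants, which the abstract does not specify), a classic Borda count over ranked classifier outputs might be sketched as follows. The function name, the class labels and the alphabetical tie-breaking rule are assumptions made for this example.

```python
from collections import defaultdict


def borda_count(rankings):
    """Combine ranked label lists (best first) using the classic Borda count.

    Each ranking of n labels gives n-1 points to its top label, n-2 to the
    next, and 0 to the last. Ties are broken alphabetically, which is an
    arbitrary choice made here for determinism.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, label in enumerate(ranking):
            scores[label] += n - 1 - position
    return sorted(scores, key=lambda label: (-scores[label], label))


# Three hypothetical classifiers, each returning a ranked list of classes.
rankings = [
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["a", "c", "b"],
]
combined = borda_count(rankings)
print(combined)
```

Because every rank position contributes to the total, a single classifier that places the correct class far down its list can pull that class below a competitor, which is the weakness the abstract mentions.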
Motor unit simulation, on handwriting trajectories, Fortran code
Fortran code with a simulation of motor-unit activity as twitches, tested using Monte Carlo estimation of neuron parameters on a task of simulating handwriting movements. The study itself is unpublished. The files were copied in 1996, a few years after their original creation.
The ideas are from my MSc thesis with Anton van Boxtel at Tilburg University.
https://pubmed.ncbi.nlm.nih.gov/6642529/
Van Boxtel A, Schomaker LR. Motor unit firing rate during static contraction indicated by the surface EMG power spectrum. IEEE Trans Biomed Eng. 1983 Sep;30(9):601-9. doi: 10.1109/tbme.1983.325057. PMID: 6642529.
The Fortran MU ensemble model led to an article on our neuron-interneuron ensemble model:
Lambert R.B. Schomaker (1992)
A neural oscillator-network model of temporal pattern generation,
Human Movement Science, Volume 11, Issues 1–2, 1992, Pages 181-192,
ISSN 0167-9457, https://doi.org/10.1016/0167-9457(92)90059-K.
That study is present on Zenodo under: https://zenodo.org/record/254285
Lillian L. Lambert, Author, Speaker, and Entrepreneur
Analysis of texture and connected-component contours for the automatic identification of writers
Recent advances in 'off-line' writer identification allow for new applications in handwritten text retrieval from archives of scanned historical documents. This paper describes new algorithms for forensic or historical writer identification, using the contours of fragmented connected-components in free-style handwriting. The writer is considered to be characterized by a stochastic pattern generator, producing a family of character fragments (fraglets). Using a codebook of such fraglets from an independent training set, the probability distribution of fraglet contours was computed for an independent test set. Results revealed a high sensitivity of the fraglet histogram in identifying individual writers on the basis of a paragraph of text. Large-scale experiments on the optimal size of Kohonen maps of fraglet contours were performed, showing usable classification rates within a non-critical range of Kohonen map dimensions. The proposed automatic approach bridges the gap between image-statistics approaches and purely knowledge-based manual character-based methods. (c) 2006 Elsevier B.V. All rights reserved
ImUnipen image data set for writer identification (N=208) - vectorial handwriting converted to usable images
==============
Terms of Usage
==============
The ImUnipen data set is intended for non-commercial, scientific use,
and is distributed under the auspices of the Unipen Foundation.
Please always refer to the following paper in IEEE PAMI when using
the ImUnipen data set:
Bulacu, M.; Schomaker, L.
Text-Independent Writer Identification and Verification
Using Textural and Allographic Features
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Volume 29, Issue 4, April 2007 Page(s):701 - 717
The ImUnipen data set is derived from the Unipen (unipen.org)
data set of on-line (i.e., vectorial, xy) handwriting.
The xy-coordinates and a line-generator algorithm are used
to generate a raster image, as if the data were optically scanned.
Contents: for 208 writers, there are two PNG images per writer of
an artificially constructed table of naturally written words (49MByte).
These words are pasted onto a white page. For systematics reasons,
we call such a page a Paragraph, see below.
The file names are organized as (example):
Writ990221.Doc01.Par00.png
Writ990221.Doc01.Par01.png
meaning: writer number 990221, document 01 (there exists only Doc01)
and the image with artificial "paragraph" of isolated words "Par00"
and "Par01".
The Par00 and Par01 images are typically used as the query
and best match in a leave-one-out setting for writer identification.
For instance, Par00 is the query, and Par01 is added to the total set
of all other images as the attractor for an identification search.
For these experiments, word labels are not given in this data set,
on purpose, as the goal is to test recognition-free writer identification
methods.
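The naming scheme and the query/attractor pairing described above can be
sketched as follows. This is a minimal illustration, not part of the
data-set distribution: the regular expression and the function names
parse_name and query_attractor_pairs are assumptions made for this example.

```python
import re
from collections import defaultdict

# Pattern inferred from the example names, e.g. Writ990221.Doc01.Par00.png
NAME_RE = re.compile(r"^Writ(\d+)\.Doc(\d+)\.Par(\d+)\.png$")


def parse_name(filename):
    """Return (writer, document, paragraph) as strings, keeping leading zeros."""
    match = NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return match.groups()


def query_attractor_pairs(filenames):
    """Pair each writer's Par00 image (query) with the Par01 image (attractor)."""
    by_writer = defaultdict(dict)
    for name in filenames:
        writer, _doc, par = parse_name(name)
        by_writer[writer][par] = name
    return [
        (pages["00"], pages["01"])
        for _writer, pages in sorted(by_writer.items())
        if "00" in pages and "01" in pages
    ]


files = ["Writ990221.Doc01.Par00.png", "Writ990221.Doc01.Par01.png"]
pairs = query_attractor_pairs(files)
print(pairs)
```

In an identification run, each query would be compared against all other
images, and the writer counts as identified when the paired attractor is
the best match.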
For a description of the regular
Unipen data set, please visit http://unipen.org
Lambert Schomaker constructed this set in 2005.
MPS Data set with images of medieval charters for handwriting-style based dating of manuscripts
The MPS benchmark data set for handwritten manuscript dating
____________________________________________________________
This data set is collected for the Dutch NWO project:
Medieval Paleographical Scale (MPS)
by Petros Samara
Project website: http://application02.target.rug.nl/monk/Projects/MPS/
Copyright (c) Huygensinstituut, Den Haag, 2016
University of Groningen, 2016.
All rights reserved.
Organisation of the data: Each .tar.gz file contains a number of NetPBM
images. The format was chosen for its simplicity; it also rules out any
lossy compression in the processing chain. The file
names are of the format 'MPS_.ppm', for example, 'MPS1300_0056.ppm'.
Note: the files are not in a separate directory, they will be extracted in place.
However, due to the unique naming, there is no problem extracting them in one
single current (destination) directory.
The actual type of the image can be gray scale (.pgm) or color (.ppm),
in '8-bit DirectClass' according to ImageMagick's 'identify' tool.
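A file name such as the example above could be split into its parts as in
the sketch below. Reading the four digits after 'MPS' as the year and the
digits after the underscore as a running number is an assumption based only
on the single example name; the function name parse_mps_name is hypothetical.

```python
import re

# Assumed layout: MPS<4-digit year>_<running number>.<ppm|pgm>
MPS_RE = re.compile(r"^MPS(\d{4})_(\d+)\.(ppm|pgm)$")


def parse_mps_name(filename):
    """Return (year, number, extension) for a name like 'MPS1300_0056.ppm'."""
    match = MPS_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return int(match.group(1)), int(match.group(2)), match.group(3)


year, number, extension = parse_mps_name("MPS1300_0056.ppm")
print(year, number, extension)
```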
The images were cropped out of larger photographs because of irrelevant
elements such as a Kodak color calibrator and non-text content such as supporting
surface (table) backgrounds, seals (emblems), ribbons, etc.
No effort has been made to obtain a balanced set of samples over years:
the given frequencies of occurrence in archives are used.
There is evidently less data in years before 1375 A.D., while some periods
provide us with ample data for historical reasons (e.g., 1450 A.D.). It
would have been a pity if the scarce years had determined and limited the
size of this data set. Selection criteria for data reduction, whether random
or systematic, would have been arbitrary. In any case, these images were
used in our publications, such that the performance results of
future attempts on manuscript dating can be compared with earlier results.
The performance reached using our algorithms is in
the order of an MAE (mean absolute error) of 10 years.
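For reference, the MAE, read here as the mean absolute error between true
and estimated years, can be computed as in the sketch below. The year values
are made up for illustration and are not results from the data set.

```python
def mean_absolute_error(true_years, predicted_years):
    """Average of |true - predicted| over all samples, in years."""
    if not true_years or len(true_years) != len(predicted_years):
        raise ValueError("need two non-empty lists of equal length")
    total = sum(abs(t - p) for t, p in zip(true_years, predicted_years))
    return total / len(true_years)


# Hypothetical example: three charters and their estimated years.
true_years = [1300, 1375, 1450]
predicted_years = [1310, 1370, 1465]
mae = mean_absolute_error(true_years, predicted_years)
print(mae)
```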
If you have any questions, please contact us:
Sheng He
Petros Samara
Jan Burgers
Lambert Schomaker
Please cite our papers if you use this data set:
[1] Sheng He, Petros Samara, Jan Burgers, Lambert Schomaker.
Image-based historical manuscript dating using contour and stroke fragments.
Pattern Recognition (PR), Vol. 59, pp. 159-171, 2016.
[2] Sheng He, Petros Samara, Jan Burgers, Lambert Schomaker.
Towards style-based dating of historical documents.
International Conference on Frontiers in Handwriting Recognition (ICFHR), Crete, Greece, 2014.
[3] Sheng He, Petros Samara, Jan Burgers, Lambert Schomaker.
Multiple-Label Guided Clustering Algorithm for Historical Document Dating and Localization
IEEE Trans. on Image Processing, Vol. 25(11), Nov. 2016.
http://ieeexplore.ieee.org/document/7551181/
Data are collected thanks to Dutch NWO grant project 380-50-006.
GIWIS - Microsoft Windows (TM) binary setup.exe for writer-identification application
GIWIS v3.1 - beta
Groningen Intelligent Writer Identification System
Documentation
V3.1c
Lambert Schomaker
November 2011
September 2012
The GIWIS program is an exploratory software tool for non-commercial applications in a forensic or paleographic context. No warranties can be given concerning reliability of matching results for handwritten documents. The user is responsible for the collection of statistical reference material for calibration of GIWIS over several years of usage, using his/her own reference collection of handwritten image samples, consisting of minimally several hundreds, preferably thousands of images of extracted, pure-handwriting samples of sufficient and standardized quality.
