CODE and EXECUTABLES
Here is the code used in the paper.
DATASETS: Ornamental Initial Letters and Historical Manuscripts
- Virtual Library Humanist Program (VLHP): a digital library that contains more than 6,000 historical manuscripts online. The collection continues to grow.
- GOLD1 Dataset: contains 578 ornamental initial letters. This dataset is used in the classification experiment in Section 3.1.
- 6405 Annotated Initial Letters: the annotated initial letter dataset pool.
Warning: the size of this dataset is 324 MB.
- Three Books: this dataset is used in Section 5.3.
Warning: the size of this dataset is 262 MB.
- Twenty Books: these historical manuscripts, which contain 6,956 pages, are used in Section 5.6. They date from the 15th and 16th centuries.
Warning: the size of this dataset is 1.13 GB.
POWERFUL CK1 DISTANCE MEASURE
- The CK1 distance measure was first introduced by B. Campana and E. Keogh. Here is the paper [pdf] and [supporting page].
- The CK1 distance measure is able to robustly recognize very subtle distinctions.
Figure 4 : Twelve ornamental initial letters, from three classes that represent the letter S, are clustered using CK1 with complete linkage hierarchical clustering
- The CK1 distance measure is robust enough to handle 'nasty' images.
Figure 12 (top) A clustering of four pairs of the same initial letter. (bottom) The clustering repeated after randomly introducing various "distortions"
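For readers who want a feel for how a compression-based measure of this kind behaves, here is a minimal sketch. It assumes the commonly cited form of CK1, d(x, y) = (mpegSize(x, y) + mpegSize(y, x)) / (mpegSize(x, x) + mpegSize(y, y)) - 1, and it swaps the MPEG-1 video encoder for zlib so that the example runs with only the standard library. The names compressed_size and ck1_like_distance are hypothetical, and the result is not the CK1 values reported in the paper.

```python
import random
import zlib


def compressed_size(first: bytes, second: bytes) -> int:
    """Stand-in for mpegSize(first, second): size of the pair compressed together.

    Assumption: zlib replaces the MPEG-1 encoder so the sketch runs anywhere.
    """
    return len(zlib.compress(first + second, 9))


def ck1_like_distance(x: bytes, y: bytes) -> float:
    """CK1-style ratio: cross-compression cost over self-compression cost, minus 1."""
    cross = compressed_size(x, y) + compressed_size(y, x)
    own = compressed_size(x, x) + compressed_size(y, y)
    return cross / own - 1.0


# Two unrelated "images", represented here as raw byte strings.
rng = random.Random(0)
image_a = bytes(rng.getrandbits(8) for _ in range(1000))
image_b = bytes(rng.getrandbits(8) for _ in range(1000))

print(ck1_like_distance(image_a, image_a))  # identical inputs give exactly 0.0
print(ck1_like_distance(image_a, image_b))  # unrelated inputs give a clearly larger value
```

The ratio is symmetric by construction and is zero when both inputs are identical, which is what makes it usable as a clustering distance.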
ADDITIONAL EXPERIMENTS OMITTED FROM THE SUBMITTED PAPER
- Here we present additional experiments that were omitted from the submitted paper due to limited space.
- The black pixel density distribution for this large collection of annotated initial letters is approximately Gaussian.
Moreover, and more importantly for us, the variance is relatively large.
- The black pixel density distribution for the collection of 6,395 annotated initial letters.
- We can produce generic versions of these curves for any image that has K percent black pixel density.
- top) We can create a generic lower bound for any image that has, say, 45% black pixel density by taking all the examples in our reference library that have this density (here, there are just two, LB_Red and LB_Blue) and taking their minimum values at each location on the X-axis (bottom).
This example is contrived for visual clarity. The set of curves for any particular pixel density tends to be very similar.
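The construction described above can be sketched in a few lines. This is an illustration only: the function names, the toy curve values, and the choice to round density to a whole percent are our assumptions, not details taken from the paper. Images are assumed to be binary, with 1 meaning a black pixel, and all curves are assumed to be sampled at the same X positions.

```python
from collections import defaultdict


def black_pixel_density(image):
    """Percentage of black pixels (value 1) in a binary image, rounded to an integer."""
    pixels = [p for row in image for p in row]
    return round(100 * sum(pixels) / len(pixels))


def generic_lower_bounds(reference_curves):
    """Map each density to the pointwise minimum of all curves with that density.

    reference_curves: list of (density_percent, curve) pairs.
    """
    by_density = defaultdict(list)
    for density, curve in reference_curves:
        by_density[density].append(curve)
    # zip(*curves) walks the curves location by location along the X-axis.
    return {d: [min(vals) for vals in zip(*curves)] for d, curves in by_density.items()}


# Toy values standing in for the two 45% curves in the figure.
lb_red = [0.50, 0.30, 0.40]
lb_blue = [0.40, 0.35, 0.20]
generic = generic_lower_bounds([(45, lb_red), (45, lb_blue)])
print(generic[45])  # [0.4, 0.3, 0.2]
```

Taking the minimum at each location keeps the result a valid lower bound for every reference image of that density, at the cost of being slightly looser than any individual curve.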
- The following figure gives a visual intuition for the search technique.
- An illustration of the search process. top) When the best_so_far is initialized to 1.2, nothing can be pruned. middle) The first item examined reduces the best_so_far to 0.82, pruning 28% of the data, shown in light gray. bottom) After examining a handful more candidates,
a match is found with a distance of 0.37, allowing us to prune 81% of the data.
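The pruning idea in this illustration can be sketched as a generic best-so-far search. The sketch below is not the implementation used in the paper: the function nearest_neighbor, its parameters, and the toy distance and lower bound on plain numbers are all assumptions chosen to keep the example self-contained. The only requirement is that lower_bound never exceeds the true distance.

```python
def nearest_neighbor(query, candidates, distance, lower_bound, best_so_far=float("inf")):
    """Linear scan that skips any candidate whose lower bound cannot beat best_so_far."""
    best_index = None
    pruned = 0
    for index, candidate in enumerate(candidates):
        # The cheap test: if even the lower bound is no better, skip the full distance.
        if lower_bound(query, candidate) >= best_so_far:
            pruned += 1
            continue
        d = distance(query, candidate)
        if d < best_so_far:
            best_so_far, best_index = d, index
    return best_index, best_so_far, pruned


# Toy stand-ins: an "expensive" distance and a cheap bound that is always half of it.
def toy_distance(q, c):
    return abs(q - c)


def toy_lower_bound(q, c):
    return abs(q - c) / 2


result = nearest_neighbor(10, [3, 9, 50, 11, 100, 10.5], toy_distance, toy_lower_bound)
print(result)  # (5, 0.5, 2)
```

As in the figure, each improvement to best_so_far lets more of the remaining candidates be discarded without computing their full distance. Passing a finite starting value, such as the 1.2 in the illustration, simply begins the scan with a threshold already in place.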