Seed Identification of Gramineous Grass Using Local Similarity Pattern and Linear Discriminant Analysis

RESEARCH ARTICLE
Tong Chen, Xin Pan, Yubao Ma, Weihong Yan, Guifang Wu and Zhihong Yu
College of Computer and Information Engineering, Inner Mongolia Agricultural University, Huhhot, Inner Mongolia 010018, P.R. China
Grassland Research Institute of Chinese Academy of Agricultural Sciences, Huhhot, Inner Mongolia 010010, P.R. China
College of Mechanical and Electrical Engineering, Inner Mongolia Agricultural University, Huhhot, Inner Mongolia 010018, P.R. China


INTRODUCTION
With the popularization of computers and the Internet, digital management [1] has developed rapidly in many fields. Grassland, one of the most important renewable resources for human beings, has attracted the attention of researchers working on digitalization. Existing applications mainly monitor grassland biomass [2], loss [3] and desertification [4] by means of remote sensing. By comparison, most images captured by digital cameras have not yet yielded much useful information for grassland digital management [5]. Forage identification is still carried out mainly by experts by hand, which limits efficiency and accuracy. Developing an automatic grass identification technology based on computer vision is therefore of great importance. Gramineous grasses are the main forage in grassland and are highly similar to one another, while seeds are relatively stable organs that are vital for grass reproduction; we therefore investigated seed identification for gramineous grass in this study. An approach integrating local similarity patterns (LSP) and linear discriminant analysis (LDA) was developed, in which textural features are extracted and used as the input of supervised classification for automatic seed identification of gramineous grass.
Research on the classification of plant seeds has made some progress. For example, Long [6] extracted 4 kinds of features: RGB, HOG, Gist and Sketch Token. The features were tested with 3 classifiers, a linear support vector machine (SVM), a radial basis function kernel SVM (RBF SVM) and random forests, under the same conditions, obtaining a top identification rate of 95.27%. Robustness to noise was then tested on random forests, using weed seeds with incomplete rates of 0% (undamaged), 5%, 10%, 15% and 20% as the test set. The results showed no significant difference between random forests and SVM in identification performance. Khunkhett and Remsungnen [7] put forward an automatic classification method based on image segmentation and RGB color features. In their work, rice images were acquired with scanners and then classified into two categories, pure rice seed named Dawk Mali 105 and impure rice. The correct identification rates of qualified rice and pure rice seeds reached 98% and 82%, respectively. Similarly, 9 kinds of Indian wheat seeds were identified from 1080 photos by extracting 131 textural features [8]. A stepwise discrimination method was employed to select the top 50 features, comprising 17 gray-level, 5 GLCM (gray level co-occurrence matrix), 1 GLRM (gray level run-length matrix), 4 LBP (local binary patterns), 13 LSP (local similarity patterns) and 10 LSN (local similarity numbers) features, as the input of LDA (linear discriminant analysis), yielding the best average classification accuracy of 98.15%.
Although the classification procedures for seeds are similar, gramineous grass seeds have their own characteristics compared with the seeds above. Different species of the same family are very similar in color, shape, size and texture. Sometimes even seeds of different families are hard to distinguish. Moreover, in practice the forage seed images were taken outdoors and were subject to variations in wind, illumination, background and position. Hence, we developed a robust identification approach for gramineous seeds, using textural features derived from LSP and histogram statistics as the input of an LDA classifier. Unlike traditional texture descriptors such as HOG and LBP, which are sensitive to noise, LSP is robust to real-world noise [9]. Regarding classifier selection, clustering algorithms based on the traditional Euclidean distance assume that all attributes contribute equally, and therefore sometimes cannot accurately describe the similarity between objects [10]. Linear discriminant analysis (LDA) chooses a projection direction that maximizes the between-class distance and minimizes the within-class distance of the samples in the new subspace by adjusting the components of the weight vector [11]. In this paper, LSP and LDA were therefore integrated to address the high similarity of gramineous grass seeds and obtain better identification results. Fig. (1) shows the flowchart of the proposed approach.

IDENTIFICATION BASED ON LSP AND LDA
LSP (local similarity patterns) was proposed by H. R. Pourreza in 2011 [9]. It is a rotation-invariant operator that builds on a variety of textural operators, with the advantages of simple operation, ease of understanding, robustness to grayscale variations and good identification performance. Compared with LBP, LSP is less sensitive to noise and more powerful in texture analysis. The main difference between LSP and LBP is the selection of the threshold. In LBP, the threshold is the grayscale of the center pixel of the neighborhood, whereas in LSP the threshold can be set flexibly to different values. The main procedure of LSP is to calculate the absolute differences between the pixels of a 3×3 neighborhood and the center pixel. If the difference is greater than a certain SRR, the neighbor pixel is set to 0; otherwise, it is set to 1 [9]. Suppose $g_c$ is the center pixel of the neighborhood and $g_0, g_1, \ldots, g_7$ are the pixels of its neighborhood; then the texture $T$ can be converted into binary form as follows.
\[
T = \big(s(|g_c - g_0| - \mathrm{SRR}),\; s(|g_c - g_1| - \mathrm{SRR}),\; \ldots,\; s(|g_c - g_7| - \mathrm{SRR})\big) \tag{1}
\]
where the binary operation is
\[
s(x) = \begin{cases} 1, & x \le 0 \\ 0, & x > 0 \end{cases}
\]
with $x = |g_c - g_i| - \mathrm{SRR}$ for neighbor $g_i$ ($i = 0, \ldots, 7$). By comparison, LSP is more flexible in feature selection because SRR can be chosen dynamically. When SRR is 0, LSP is equivalent to LBP.
The LSP value is then the sum of the products of $s(x)$ for the 8 neighborhood pixels and their corresponding binary weights, $\mathrm{LSP} = \sum_{i=0}^{7} s(|g_c - g_i| - \mathrm{SRR})\, 2^{i}$. Fig. (2) gives an example in which SRR is 10.
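The thresholding and weighting described above can be illustrated with a minimal sketch. The function name `lsp_code`, the neighbor ordering and the gray values are assumptions made for illustration; they are not the values shown in Fig. (2).

```python
def lsp_code(neighbors, center, srr):
    """Return the LSP code of one 3x3 neighborhood.

    neighbors: the 8 gray values g_0..g_7 in a fixed order (ordering assumed).
    center:    the gray value g_c of the center pixel.
    srr:       the similarity threshold SRR.
    """
    code = 0
    for i, g in enumerate(neighbors):
        # s(x) = 1 when |g_c - g_i| - SRR <= 0, i.e. the neighbor is "similar".
        bit = 1 if abs(center - g) - srr <= 0 else 0
        code += bit * (2 ** i)  # binary weight 2^i for neighbor i
    return code


# Hypothetical gray values, chosen only to exercise the function.
neighbors = [52, 60, 75, 48, 41, 66, 58, 90]
center = 55
code = lsp_code(neighbors, center, srr=10)
print(code, format(code, "08b"))  # 75 01001011
```

Neighbors 0, 1, 3 and 6 lie within 10 gray levels of the center, so the code is 1 + 2 + 8 + 64 = 75.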
Hence, there are 256 possible LSP values, ranging from 0 to 255. Reading the binary values of the 8 neighborhood pixels clockwise from each of the 8 possible starting positions gives 8 decimal values (Fig. 3). The minimum of the 8 values is chosen as the rotation-invariant LSP descriptor of the center pixel. With 8 sampling points in the neighborhood, there are 36 rotation-invariant LSP values altogether.
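The rotation-invariant mapping can be checked with a short sketch. The helper name `rotation_invariant` is hypothetical; the code simply takes the minimum over the 8 circular bit rotations of an 8-bit code and counts the distinct results.

```python
def rotation_invariant(code, bits=8):
    """Map a code to the minimum value over its circular bit rotations."""
    mask = (1 << bits) - 1
    best = code
    for k in range(1, bits):
        # Rotate right by k positions within a word of `bits` bits.
        rotated = ((code >> k) | (code << (bits - k))) & mask
        best = min(best, rotated)
    return best


# All 256 codes collapse to 36 rotation-invariant values.
ri_values = sorted({rotation_invariant(c) for c in range(256)})
print(len(ri_values))                 # 36
print(rotation_invariant(0b01001011))  # 45, i.e. 00101101
```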
In fact, most patterns concentrate on a few values, i.e. the histogram is sparse. For an LSP descriptor, a transition from 0 to 1 or from 1 to 0 is called a jump. If the number of jumps in an LSP descriptor is no more than 2, it is referred to as a uniform pattern. Descriptors with more than 2 jumps are mostly caused by noise and have little statistical meaning [9]. Hence, the number of LSP patterns can be condensed greatly with little loss of useful information. Meanwhile, redundant information containing noise is eliminated and the dimension is reduced as desired. When sampled in the 8-neighborhood of a 3×3 region, 9 of the 36 rotation-invariant LSP descriptors conform to the "uniform" definition: 00000000, 00000001, 00000011, 00000111, 00001111, 00011111, 00111111, 01111111 and 11111111. The remaining 27 non-uniform descriptors are combined into one descriptor, so the uniform LSP histogram contains 10 values. When describing image characteristics, the statistical features of the image histogram, including the mean, standard deviation, smoothness and third moment, can represent textures effectively [12]. Therefore, these 4 image histogram statistics were concatenated to the LSP histogram to form the input of the LDA classifier. The concrete calculation formulas are shown in Table 1.
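A minimal sketch of the 10-bin uniform histogram follows. It assumes that jumps are counted circularly (the last bit is compared with the first) and that each uniform pattern is indexed by its number of 1 bits, which is one convenient rotation-invariant labeling; the function names are hypothetical, and the histogram is left as raw counts because the source does not state a normalization.

```python
def jumps(code, bits=8):
    """Count 0/1 transitions when the bits are read circularly."""
    count = 0
    for i in range(bits):
        a = (code >> i) & 1
        b = (code >> ((i + 1) % bits)) & 1
        if a != b:
            count += 1
    return count


def uniform_bin(code, bits=8):
    """Bin 0..8 for uniform codes (number of 1 bits), bin 9 for all others."""
    if jumps(code, bits) <= 2:
        return bin(code).count("1")
    return bits + 1


def uniform_histogram(codes, bits=8):
    """Raw counts over the 10 bins (9 uniform classes + 1 shared bin)."""
    hist = [0] * (bits + 2)
    for code in codes:
        hist[uniform_bin(code, bits)] += 1
    return hist


# Hypothetical codes: three uniform patterns and two non-uniform ones.
codes = [0b00000000, 0b00000111, 0b11100000, 0b01010101, 0b00010011]
print(uniform_histogram(codes))  # [1, 0, 0, 2, 0, 0, 0, 0, 0, 2]
```

Because a uniform pattern is a single run of 1s on the circle, counting its 1 bits gives the same bin for every rotation, so no explicit minimum-rotation step is needed here.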

Table 1. Formulas of statistical features.
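As a rough illustration of the four histogram statistics, the sketch below uses the definitions commonly given for them in image-processing texts: with $p(z)$ the normalized gray-level histogram, mean $m = \sum z\,p(z)$, standard deviation $\sigma = \sqrt{\sum (z-m)^2 p(z)}$, smoothness $R = 1 - 1/(1+\sigma^2)$ and third moment $\mu_3 = \sum (z-m)^3 p(z)$. That Table 1 uses exactly these forms (for example, without normalizing the variance in the smoothness term) is an assumption, as is the function name.

```python
import math


def histogram_statistics(pixels, levels=256):
    """Mean, standard deviation, smoothness and third moment of a gray-level
    histogram, using commonly cited definitions (assumed, see lead-in)."""
    n = len(pixels)
    counts = [0] * levels
    for z in pixels:
        counts[z] += 1
    p = [c / n for c in counts]  # normalized histogram p(z)

    mean = sum(z * p[z] for z in range(levels))
    variance = sum((z - mean) ** 2 * p[z] for z in range(levels))
    std = math.sqrt(variance)
    smoothness = 1.0 - 1.0 / (1.0 + variance)
    third_moment = sum((z - mean) ** 3 * p[z] for z in range(levels))
    return mean, std, smoothness, third_moment


# Hypothetical 4-pixel image: two pixels at gray level 10, two at 20.
print(histogram_statistics([10, 10, 20, 20]))
```

For this toy input the mean is 15, the standard deviation is 5, and the third moment is 0 because the histogram is symmetric about the mean.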
LDA is a classical supervised learning approach that finds the optimal combination of features for separating two or more classes, with low computational requirements and good classification results. It projects the samples into the space with the best separability [11]. Some advanced extensions of LDA have recently been proposed and widely used in recognition applications such as event-related potentials [13 - 15] and electromyography [16]. They can handle the case where insufficient training samples are available. In our approach, the 10 uniform LSP histogram values and the 4 histogram statistics, 14 features in total, were fed to the LDA classifier for discriminant classification.
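To make the projection idea concrete, here is a toy two-class Fisher discriminant on 2-D samples, where the direction is $w = S_w^{-1}(m_a - m_b)$ and $S_w$ is the within-class scatter. This is a deliberately reduced sketch: the study uses 14 features and up to 12 classes, and the function names and sample points below are hypothetical.

```python
def fisher_direction(class_a, class_b):
    """Two-class Fisher discriminant for 2-D samples.

    Returns the projection vector w and the midpoint threshold between the
    projected class means.
    """
    def mean(samples):
        n = len(samples)
        return [sum(s[0] for s in samples) / n, sum(s[1] for s in samples) / n]

    def scatter(samples, m):
        sxx = sum((s[0] - m[0]) ** 2 for s in samples)
        sxy = sum((s[0] - m[0]) * (s[1] - m[1]) for s in samples)
        syy = sum((s[1] - m[1]) ** 2 for s in samples)
        return sxx, sxy, syy

    ma, mb = mean(class_a), mean(class_b)
    axx, axy, ayy = scatter(class_a, ma)
    bxx, bxy, byy = scatter(class_b, mb)
    # Within-class scatter matrix Sw = [[sxx, sxy], [sxy, syy]].
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy
    det = sxx * syy - sxy * sxy
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # w = Sw^-1 (ma - mb), using the closed-form inverse of a 2x2 matrix.
    w = [(syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det]
    threshold = 0.5 * (w[0] * (ma[0] + mb[0]) + w[1] * (ma[1] + mb[1]))
    return w, threshold


def predict(w, threshold, x):
    """Assign x to class 'a' if its projection lies on the class-a side."""
    return "a" if w[0] * x[0] + w[1] * x[1] > threshold else "b"


# Hypothetical 2-D feature vectors for two classes.
class_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.5)]
class_b = [(6.0, 5.0), (7.0, 8.0), (8.0, 6.0), (7.0, 6.5)]
w, threshold = fisher_direction(class_a, class_b)
print(predict(w, threshold, (2.0, 2.0)), predict(w, threshold, (7.0, 6.0)))  # a b
```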

Image Database
To test the effectiveness of the proposed algorithm for seed identification, we constructed an image database of gramineous grass seeds, which were provided by the Grassland Research Institute of the Chinese Academy of Agricultural Sciences. For efficiency and practicality, 6 seeds were arranged on a black card for each photo taken outdoors.

Image Preprocessing
The main purposes of image preprocessing were to remove noise that might affect the identification result and to extract the region of interest (ROI) from the background. As seen in Fig. (4), the seeds in the original image were oriented inconsistently, with different tilt angles. Therefore, image preprocessing involved the following three steps.
First, the original color image was converted into a binary image and the long axis of each seed was detected. Second, the image was rotated to make the long axis horizontal. Third, the sub-images of the seeds were cropped individually from the original image by removing the redundant background. The whole preprocessing procedure is shown in Fig. (5). In total, 1080 seed images from 12 species of gramineous grass made up the image database for the experiments. Fig. (6) gives some examples from the database, in which each species has 90 seeds and each seed has a single image. All experiments were run on an Intel Dual Core i5-3470 CPU @ 1.60 GHz with 4 GB RAM; the code was written in MATLAB 2011b.
For each species, the 90 seed images were divided equally into a training set and a test set, so that each set contained 45 images. To avoid the impact of sample selection on the experimental results, cross validation was adopted. The 90 samples of each species were divided into 9 subsets; 5 samples at the same positions were selected from each subset as the training set, and the remaining samples formed the test set. For example, when seed images No.1 to No.5 of each subset were chosen for training, the remaining seed images No.6 to No.10 composed the test set. The average identification result over 126 selections was used as the final result, and the standard deviation measured the robustness of the identification experiments.
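The long-axis detection step is not detailed in the text. One common way to estimate the orientation of a binary blob, shown here purely as an assumption, uses second-order central moments: $\theta = \tfrac{1}{2}\,\mathrm{atan2}(2\mu_{11},\ \mu_{20} - \mu_{02})$. The function name and the toy masks are hypothetical.

```python
import math


def long_axis_angle(mask):
    """Angle in degrees between the long axis of the foreground pixels and the
    horizontal axis, measured from +x toward +y (rows increase downward)."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points)
    mu02 = sum((y - cy) ** 2 for _, y in points)
    mu11 = sum((x - cx) * (y - cy) for x, y in points)
    return math.degrees(0.5 * math.atan2(2 * mu11, mu20 - mu02))


# Toy masks: a horizontal bar and a diagonal line of foreground pixels.
horizontal = [[0, 0, 0, 0],
              [1, 1, 1, 1],
              [0, 0, 0, 0]]
diagonal = [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]
print(long_axis_angle(horizontal))  # 0.0
print(long_axis_angle(diagonal))    # approximately 45
```

Rotating the image by the negative of this angle would bring the long axis to horizontal before cropping.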
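The text does not say how the 126 selections were enumerated. One reading that reproduces the number, offered only as an assumption, is that 5 of the 10 positions within each subset are chosen for training (252 choices) and a split and its train/test swap are counted once (126 partitions). The helper `split_indices` is hypothetical.

```python
from itertools import combinations

SUBSETS = 9        # subsets per species
SUBSET_SIZE = 10   # 90 samples / 9 subsets


def split_indices(train_positions):
    """Return (train, test) sample indices for one species, taking the same
    positions from every subset for training."""
    train, test = [], []
    for k in range(SUBSETS):
        for pos in range(SUBSET_SIZE):
            idx = k * SUBSET_SIZE + pos
            (train if pos in train_positions else test).append(idx)
    return train, test


positions = range(SUBSET_SIZE)
splits = list(combinations(positions, 5))
print(len(splits))  # 252 ways to pick 5 training positions out of 10

# Count a split and its train/test swap as one partition (assumed reading).
partitions = {
    frozenset([frozenset(s), frozenset(set(positions) - set(s))]) for s in splits
}
print(len(partitions))  # 126

train, test = split_indices((0, 1, 2, 3, 4))
print(len(train), len(test))  # 45 45
```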

Selection of SRR
Different values of SRR produce different LSP matrices and accordingly lead to different identification results. Fig. (7) shows the relationship between SRR and identification accuracy when SRR ranged from 0 to 8 in LSP (LDA classifier). When SRR was 0, LSP was equivalent to LBP. When SRR was 1, the identification accuracies for 6 seed species and 12 seed species were 97.85% and 91.07%, respectively. As SRR increased further, the identification accuracy declined obviously. Therefore, SRR was set to 1 in the following experiments.

Comparative Experiments
To verify the performance of the algorithm, we compared 3 feature extraction approaches (Histograms of Oriented Gradients (HOG), LBP and LSP) and 2 classifiers (Nearest Neighbor Classifier with Euclidean Distance (NNC+ED) and LDA) in the experiments. The number of sliding steps of HOG [17] was fixed, and the other parameters scaled with the image size. The specific settings were as follows. First, the horizontal and vertical step widths were set to a quarter of the image size in the corresponding directions, L/4 and C/4, and the step number was fixed to 3. Then the gradient direction was divided uniformly into 9 bins. Let bSize and cSize denote the sizes of the block and the cell; bSize and cSize were L/2×C/2 and L/2×C/2, respectively.
Experiment 1 was conducted on 6 species of seeds. HOG, LBP and LSP did not give satisfactory identification performance with the NNC+ED classifier, where LSP achieved the highest identification accuracy of 80.99%. By comparison, all feature extraction algorithms achieved higher accuracies with the LDA classifier, and LSP+LDA yielded the top identification accuracy of 97.85%. Experiment 1 showed that LSP extracted the textural features more precisely than HOG and LBP. Moreover, the LDA classifier was more discriminative than the traditional NNC+ED.
Experiment 2 was conducted on 12 species of seeds. The identification performance of HOG, LBP and LSP declined obviously with the NNC+ED classifier. The identification accuracy of HOG dropped from 77.03% to 46.84%, a gap of 30.19 percentage points. LSP yielded the top identification accuracy of 60.65%, 20.43 percentage points lower than in the first group. The decline mainly came from the 6 additional kinds of similar seeds, which made the identification more difficult. By comparison, the three feature extraction approaches achieved much higher accuracies with the LDA classifier. The identification accuracy of HOG+LDA increased to 67.09%, 20.25 percentage points higher than that of NNC+ED. LSP+LDA yielded the top identification accuracy of 91.07%, 31.09 percentage points higher than that of NNC+ED. It can be concluded that LDA was more discriminative when the identification difficulty increased with more similar species. LSP+LDA achieved more robust identification performance in Experiment 2, with an identification accuracy of 91.07%, only 6.78 percentage points lower than in Experiment 1. The standard deviations of the two experiments were lower than 1%, indicating the stability of the overall experiments and the robustness of the algorithm.
When the number of seed species was relatively small and the textures were less similar, HOG, LBP and LSP all performed well. As the number of seed species and the identification difficulty increased, HOG showed an obvious decline. The main reason is that HOG operates on local square cells and is therefore largely invariant to geometric and photometric deformations, which only show up over relatively large regions. For similar gramineous grass seeds, however, many textural details were ignored when the image was partitioned into local blocks, leading to a relatively poor identification result. By comparison, LBP and LSP still worked well, with accuracies above 90%, because they capture more detail in feature extraction. Moreover, their rotation invariance can cope with rotations of the biological characteristics and surroundings. The top identification accuracy was achieved by LSP+LDA, indicating that LSP was more robust to noise than LBP and that the LDA classifier, which uses category knowledge, was more discriminative than NNC+ED.
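For reference, the NNC+ED baseline amounts to assigning a test sample the label of its closest training sample under Euclidean distance. The sketch below is a minimal illustration; the function names, feature vectors and labels are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_predict(train_features, train_labels, query):
    """Return the label of the training sample closest to `query`."""
    best_label, best_dist = None, math.inf
    for features, label in zip(train_features, train_labels):
        dist = euclidean(features, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical 3-D feature vectors for two species.
train_features = [(0.1, 0.2, 0.7), (0.2, 0.1, 0.7), (0.6, 0.3, 0.1), (0.7, 0.2, 0.1)]
train_labels = ["species_1", "species_1", "species_2", "species_2"]
print(nearest_neighbor_predict(train_features, train_labels, (0.15, 0.15, 0.7)))  # species_1
```

Every feature contributes equally to the distance here, which is the limitation of Euclidean-distance methods noted in the introduction.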
Table 3 lists the identification accuracies of the different combinations. To investigate the overall decline in Experiment 2, we list the average per-category results of the 126 cross-validation experiments using HOG+LDA, LBP+LDA and LSP+LDA in Table 4(a-c). The rows and columns correspond to the input and output species, respectively. For example, in Table 4a, 72.82% of the 45 test samples of species No. I were correctly identified as No. I, and the remaining samples were mistakenly assigned to other species. As can be seen, HOG could not distinguish the seed images well, whether or not they belonged to the same genus. With LBP and LSP, most of the misclassified seeds belonged to the same genus and had very similar textures. In Table 4c, among the 45 test samples of species No. I, 94.76% were correctly classified as No. I and only 5.24% were mistakenly identified as No. XI, showing that LSP+LDA was more capable of describing similar textures.
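A per-species table of this kind is a row-normalized confusion matrix: each row is an input species and each column holds the percentage of its test samples assigned to an output species. The following sketch shows the computation on hypothetical labels; the function name and the toy data are assumptions, not the study's results.

```python
def confusion_percentages(true_labels, predicted_labels, classes):
    """Row-normalized confusion matrix in percent.

    Row i is the input (true) class, column j the output (predicted) class.
    """
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    table = []
    for row in counts:
        total = sum(row)
        table.append([100.0 * v / total if total else 0.0 for v in row])
    return table


# Hypothetical labels: 4 samples of species "I", 2 samples of species "XI".
true_labels = ["I", "I", "I", "I", "XI", "XI"]
predicted = ["I", "I", "I", "XI", "XI", "XI"]
print(confusion_percentages(true_labels, predicted, ["I", "XI"]))
# [[75.0, 25.0], [0.0, 100.0]]
```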