The pattern theory developed by Grenander in the 1970s is a unified mathematical structure for representing, learning, and recognizing patterns encountered in science and engineering. The objects in this theory tend to be rich in complexity and dimensionality, and their patterns are characterized by algebraic structures and probability distributions. Consequently, different types of models exist to tackle tasks involving these patterns, each approaching the problem from a different perspective and producing different kinds of output. For simplicity, models can be grouped into two broad families: generative and discriminative.(1)
Generative Models
Generative models are called “generative” because sampling from them can generate synthetic data points. They produce a probability density model over all the variables in a system and use it to derive classification and regression functions. In these models, a system’s inputs and outputs are represented homogeneously by a joint probability distribution over all variables. (3)
They can learn in semi-supervised or unsupervised settings, where datasets contain input signals but output labels are partially or entirely missing.1 These models can be used for compression, denoising, inpainting, and texture synthesis, and they have also proven helpful in medical diagnosis, genomics, and bioinformatics. Because the applications are so varied, generative models are usually formulated, trained, and evaluated differently depending on the task.(3)
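As a minimal sketch of these ideas (hypothetical, not from the cited sources), assume a toy two-class problem with Gaussian class-conditional densities. The snippet fits the joint model p(x, y) = p(y) p(x | y), classifies via Bayes’ rule, and, because the model is generative, also draws synthetic samples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labeled data: two 1-D classes (hypothetical example data).
x0 = rng.normal(loc=-2.0, scale=1.0, size=200)  # class 0
x1 = rng.normal(loc=+2.0, scale=1.0, size=200)  # class 1
x = np.concatenate([x0, x1])
y = np.concatenate([np.zeros(200, int), np.ones(200, int)])

# Fit the joint model p(x, y) = p(y) * p(x | y):
# class priors p(y) and per-class Gaussian parameters for p(x | y).
priors = np.bincount(y) / len(y)
means = np.array([x[y == k].mean() for k in (0, 1)])
stds = np.array([x[y == k].std() for k in (0, 1)])

def gaussian_pdf(v, mu, sigma):
    return np.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def classify(v):
    # Bayes' rule: argmax_k p(y = k) * p(x = v | y = k).
    joint = priors * gaussian_pdf(v, means, stds)
    return int(np.argmax(joint))

def sample(n):
    # Because the model is generative, we can draw synthetic (x, y) pairs.
    ys = rng.choice([0, 1], size=n, p=priors)
    xs = rng.normal(means[ys], stds[ys])
    return xs, ys

print(classify(1.5))  # most likely class for x = 1.5
print(sample(5))      # five synthetic data points
```

Note that the same fitted joint distribution serves both purposes: classification (via Bayes’ rule) and data generation (via sampling).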
Discriminative Models
Discriminative models estimate posterior probabilities directly, without attempting to model the underlying probability distributions. By focusing only on the given task, they tend to achieve better performance: they optimize a mapping from inputs to outputs, and only the resulting classification boundaries are adjusted.
Here, only the final mapping from input (x) to output (y) matters, and only the final estimate is considered; even estimating a full conditional distribution can be viewed as unnecessary. These models use only the conditional probability of a candidate analysis given the input sentence, so the joint probability can no longer be derived.
The main advantage is that this provides more freedom in defining features, since arbitrary features over the input can be incorporated.4 The downside is that discriminative models usually require numerical optimization techniques that can be computationally demanding, and the parsing problem becomes harder as model complexity grows.4
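As a contrasting sketch (hypothetical data, using logistic regression, one of the discriminative models named below), the snippet models P(y | x) directly and fits its parameters by plain gradient descent, the kind of numerical optimization the text refers to. No density over x is estimated, so the joint p(x, y) cannot be recovered from it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Same style of toy 1-D data as before (hypothetical example data).
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(2, 1, 200)])
y = np.concatenate([np.zeros(200), np.ones(200)])

w, b = 0.0, 0.0  # parameters of P(y = 1 | x) = sigmoid(w * x + b)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient descent on the negative log-likelihood of P(y | x).
lr = 0.1
for _ in range(500):
    p = sigmoid(w * x + b)         # current estimate of P(y = 1 | x)
    grad_w = np.mean((p - y) * x)  # d(NLL)/dw
    grad_b = np.mean(p - y)        # d(NLL)/db
    w -= lr * grad_w
    b -= lr * grad_b

# Only the conditional P(y | x) is available; the model says nothing
# about how likely x itself is, so p(x, y) cannot be derived from it.
print(sigmoid(w * 1.5 + b))  # P(y = 1 | x = 1.5)
```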
Discriminative models have succeeded in various tasks such as image and document classification, biosequence analysis, and time-series prediction.2 Popular examples include Logistic Regression, Support Vector Machines, traditional Neural Networks, Nearest Neighbor, and Conditional Random Fields.
To better illustrate the difference between generative and discriminative models, consider the task of determining the language someone is speaking: the generative approach is to learn each language and determine which language the speech best fits; the discriminative approach is to learn only the linguistic differences between languages, without learning any language in full.
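Here is a minimal sketch of that contrast in code (hypothetical corpora and features, chosen only for illustration). The generative approach fits a character-frequency model per language and asks which model best explains the utterance; the discriminative approach learns only a decision boundary on one distinguishing feature, without modeling either language:

```python
import math
from collections import Counter

# Tiny hypothetical corpora; a real system would use far more data.
english = "the cat sat on the mat and the dog ran away"
spanish = "el gato se sento en la alfombra y el perro corrio"

def char_model(text):
    # Generative approach: estimate p(char | language) per language,
    # with add-one smoothing over the 26-letter alphabet.
    counts = Counter(text.replace(" ", ""))
    total = sum(counts.values())
    return {c: (counts[c] + 1) / (total + 26)
            for c in "abcdefghijklmnopqrstuvwxyz"}

models = {"english": char_model(english), "spanish": char_model(spanish)}

def generative_guess(utterance):
    # Score log p(utterance | language) under each language model and
    # pick the language whose model best "generates" the speech.
    letters = utterance.replace(" ", "")
    scores = {lang: sum(math.log(m[c]) for c in letters)
              for lang, m in models.items()}
    return max(scores, key=scores.get)

def letter_o_rate(text):
    # A single distinguishing feature: relative frequency of "o",
    # which happens to separate these two toy corpora.
    letters = text.replace(" ", "")
    return letters.count("o") / len(letters)

def discriminative_guess(utterance):
    # Discriminative approach: no language model at all, just a decision
    # boundary on the distinguishing feature, placed midway between the
    # two training corpora.
    boundary = (letter_o_rate(english) + letter_o_rate(spanish)) / 2
    return "spanish" if letter_o_rate(utterance) > boundary else "english"

print(generative_guess("el perro"))      # -> spanish (full models compared)
print(discriminative_guess("el perro"))  # -> spanish (boundary only)
```

Both approaches classify the utterance correctly here, but only the generative one could also produce synthetic text in either language; the discriminative one knows nothing beyond the boundary it learned.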