Symmetric positive definite (SPD) matrix-based representations are widely used in many visual recognition tasks. This archive is an effort to keep track of this thriving and novel field of research in computer vision.
Recently, symmetric positive definite (SPD) matrix-based visual representation methods have shown promising performance in various applications such as fine-grained image classification, person re-identification, and ImageNet classification. This page keeps track of recent advances in SPD matrix-based visual representation methods. Please refer to the contact section if you have any queries or suggestions, or if you would like to add your method to the listings.
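As a rough illustration of what an SPD matrix-based representation can look like, the sketch below pools a convolutional feature map into a covariance matrix. It is a minimal example under stated assumptions, not the procedure of any specific method listed on this page: the function name `covariance_pooling`, the `eps` ridge value, and the random 8-channel feature map are all hypothetical choices made for illustration.

```python
import numpy as np

def covariance_pooling(features, eps=1e-5):
    """Pool a (C, H, W) feature map into a C x C SPD matrix (illustrative sketch)."""
    c = features.shape[0]
    x = features.reshape(c, -1)              # C x N, with N = H * W local descriptors
    x = x - x.mean(axis=1, keepdims=True)    # center each channel
    cov = x @ x.T / x.shape[1]               # sample covariance: symmetric, positive semi-definite
    return cov + eps * np.eye(c)             # small ridge (assumed value) makes it strictly positive definite

# Hypothetical input: random values standing in for CNN activations.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 7, 7))
spd = covariance_pooling(feature_map)
print(spd.shape)
```

The representation grows quadratically with the number of channels: a backbone layer with 512 channels would give a 512 x 512 matrix, that is, 262,144 entries.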
Tutorial on Higher-order Statistical Modeling based Deep Convolutional Neural Networks
Tutorial on Second- and Higher-order Representations in Computer Vision
Bilinear CNNs for Fine-grained Visual Recognition
Compact Bilinear Pooling
Kernel Pooling for Convolutional Neural Networks
Low-rank Bilinear Pooling for Fine-Grained Classification
Is Second-order Information Helpful for Large-scale Visual Recognition?
Improved Bilinear Pooling with CNNs
Factorized Bilinear Models for Image Recognition
G2DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition
Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization
Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition
Statistically-motivated Second-order Pooling
DeepKSPD: learning kernel-matrix-based SPD representation for fine-grained image recognition
Second-order Democratic Aggregation
Global Gated Mixture of Second-order Pooling for Improving Deep Convolutional Neural Networks
Local Temporal Bilinear Pooling for Fine-Grained Action Parsing
Global Second-order Pooling Convolutional Networks
Deep Global Generalized Gaussian Networks
Second-order Attention Network for Single Image Super-Resolution
Low-Rank Pairwise Alignment Bilinear Network For Few-Shot Fine-Grained Image Classification
Compact Approximation for Polynomial of Covariance Feature
Learning Neural Bag-of-Matrix-Summarization with Riemannian Network
Fine-Grained Classification via Hierarchical Bilinear Pooling With Aggregated Slack Mask
Learning Deep Bilinear Transformation for Fine-grained Image Representation
Higher-Order Occurrence Pooling for Bags-of-Words: Visual Concept Detection
A Deeper Look at Power Normalizations
A Robust Distance Measure for Similarity-Based Classification on the SPD Manifold
What Deep CNNs Benefit from Global Covariance Pooling: An Optimization Perspective
Revisiting Bilinear Pooling: A Coding Perspective
Power Normalizations in Fine-grained Image, Few-shot Image and Graph Classification
Bilinear CNNs for Fine-grained Visual Recognition
Compact Bilinear Pooling
Sparse Coding for Third-order Super-symmetric Tensor Descriptors with Application to Texture Recognition
Kernel Pooling for Convolutional Neural Networks
Low-rank Bilinear Pooling for Fine-Grained Classification
Improved Bilinear Pooling with CNNs
Factorized Bilinear Models for Image Recognition
G2DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition
Where to Focus: Deep Attention-based Spatially Recurrent Bilinear Networks for Fine-Grained Visual Recognition
Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization
Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition
Grassmann Pooling as Compact Homogeneous Bilinear Pooling for Fine-Grained Visual Classification
Statistically-motivated Second-order Pooling
DeepKSPD: learning kernel-matrix-based SPD representation for fine-grained image recognition
Second-order Democratic Aggregation
Fine-Grained Image Classification With Gaussian Mixture Layer
Learning a Robust Representation via a Deep Network on Symmetric Positive Definite Manifolds
Compact Approximation for Polynomial of Covariance Feature
Fine-Grained Classification via Hierarchical Bilinear Pooling With Aggregated Slack Mask
Learning Deep Bilinear Transformation for Fine-grained Image Representation
ReDro: Efficiently Learning Large-Sized SPD Visual Representation
Is Second-order Information Helpful for Large-scale Visual Recognition?
Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization
Global Gated Mixture of Second-order Pooling for Improving Deep Convolutional Neural Networks
Global Second-order Pooling Convolutional Networks
Deep Global Generalized Gaussian Networks
Tensor Representations via Kernel Linearization for Action Recognition from 3D Skeletons
Higher-Order Pooling of CNN Features via Kernel Linearization for Action Recognition
Local Temporal Bilinear Pooling for Fine-Grained Action Parsing
Approximated Bilinear Modules for Temporal Modeling
Tensor Representations for Action Recognition
Second-Order Non-Local Attention Networks for Person Re-Identification
Mixed High-Order Attention Network for Person Re-Identification
High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification
Second-order Camera-aware Color Transformation for Cross-domain Person Re-identification
Domain Adaptation by Mixture of Alignments of Second- or Higher-Order Scatter Tensors
Museum Exhibit Identification Challenge for Domain Adaptation and Beyond
Domain Adaptation Using Riemannian Geometry of SPD Matrices
Power Normalizing Second-Order Similarity Network for Few-Shot Learning
Few-Shot Learning via Saliency-guided Hallucination of Samples
Few-Shot Object Detection by Second-order Pooling
Few-shot Action Recognition with Permutation-invariant Attention
Adaptive Subspaces for Few-Shot Learning
RAID-G: Robust Estimation of Approximate Infinite Dimensional Gaussian with Application to Material Recognition
A Neural Network Based on SPD Manifold Learning for Skeleton-Based Hand Gesture Recognition
Second-Order Attention Network for Single Image Super-Resolution
Bilinear Attention Networks for Person Retrieval
Factorized Higher-Order CNNs with an Application to Spatio-Temporal Emotion Estimation
DOA-GAN: Dual-Order Attentive Generative Adversarial Network for Image Copy-move Forgery Detection and Localization
Visual-Semantic Matching by Exploring High-Order Attention and Distraction
Second Order enhanced Multi-glimpse Attention in Visual Question Answering
SOFA-Net: Second-Order and First-order Attention Network for Crowd Counting
BARNet: Bilinear Attention Network with Adaptive Receptive Field for Surgical Instrument Segmentation
Non-Local Neural Networks with Grouped Bilinear Attentional Transforms
Please feel free to use the sorting options and search box to efficiently analyze the existing SPD representation-based methods and their results.
The columns from Year through Feature Size summarize each method; the columns from CUB through ImageNet report performance across datasets (in %).

Year | Method | Backbone Model | Classifier | Input Size | Data Augmentation | Optimizer | Feature Size | CUB | Airplane | Cars | MIT | DTD | Food-101 | ImageNet |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2015 | Bilinear CNN | VGG-M/VGG-16 | SVM | 448X448 | Flip | SGD | 262k | 84.1 | 84.1 | 91.3 | -- | -- | -- | -- |
2016 | RAID-G | VGG19 | SVM | 224X224 | -- | -- | -- | 84.0 | -- | -- | -- | 76.4 | -- | -- |
2016 | Compact Bilinear | VGGM/VGG16 | SVM | 448X448 | -- | SGD | 10k | 84.0 | -- | -- | 73.4 | 67.7 | -- | -- |
2017 | Kernel Pooling | VGG16/ResNet50 | SVM | 448X448 | Flip, Crop | SGD | 12.8k/14.3k | 86.2 | 86.9 | 92.4 | -- | -- | 85.5 | -- |
2017 | Low-Rank Bilinear | VGG16/ResNet50 | SVM | 448X448 | Flip, Crop | SGD | 12.8k/14.3k | 84.2 | 87.3 | 90.9 | -- | 65.8 | -- | -- |
2017 | MPN-COV | VGG16/ResNet50 | SVM | 448X448 | Jitter | SGD | 32k | -- | -- | -- | -- | -- | -- | 78.8 |
2017 | Improved BCNN | VGGM/VGG16 | SVM | 448X448 | Flip | SGD | 262k | 85.8 | 88.5 | 92.0 | -- | -- | -- | -- |
2017 | Factorized bilinear | ResNet50 | SVM | 112X112 | Flip, Crop | SGD | 10M | 82.9 | -- | -- | -- | 67.8 | -- | 76.0 |
2017 | G2DeNet | VGG16 | SVM | 448X448 | Flip | SGD | 131k | 87.1 | 89.0 | 92.5 | -- | -- | -- | -- |
2017 | Recurrent Bilinear | VGG16 | SVM | 448X448 | Flip, Shift | SGD | 262k | 89.7 | 88.4 | 93.4 | -- | -- | -- | -- |
2018 | iSQRT-COV | AlexNet/ResNet101 | Softmax | 224X224 | Flip, Crop, Jitter | SGD | 32k | 88.7 | 91.4 | 93.3 | -- | -- | -- | 78.79 |
2018 | Grassmann Pooling | VGG16 | Softmax | 448X448 | Flip | SGD | 4k | 85.8 | 89.8 | 92.8 | -- | -- | 85.7 | -- |
2018 | Hierarchical Bilinear | VGG16 | Softmax | 448X448 | Flip, Crop | SGD | 8k | 87.1 | 90.3 | 93.7 | -- | -- | -- | -- |
2018 | SMSO Pooling | VGG16/ResNet50 | Softmax | 448X448 | Flip | SGD | 2k | 85.8 | -- | -- | 79.7 | 72.5 | -- | -- |
2018 | DeepKSPD | VGG16 | Softmax | 448X448 | Flip | Adam | 262k | 86.5 | 91.5 | 93.2 | 81.0 | -- | -- | -- |
2018 | SoDA | VGG16/ResNet50 | SVM | 448X448 | Flip | SGD | 8k/262k | 85.9 | 87.6 | 91.7 | 84.3 | 76.2 | -- | -- |
2018 | GM-SOP | ResNet18 | Softmax | 64X64 | Flip | SGD | 8k | -- | -- | -- | -- | -- | -- | 67.7* |
2018 | GMNet | VGG16/VGG19 | Softmax | random | Flip | SGD | 400k | 86.3 | 90.5 | 93.5 | -- | -- | -- | -- |
2020 | SOP+SC+SigmE | AlexNet/ResNet50 | Softmax | 224/336/448 | Flip | Adam | -- | -- | -- | -- | 86.3 | -- | 87.5 | -- |
2019 | SPD aggregation Network | VGG16 | Softmax | 224X224 | -- | SGD | 262k | 72.4 | 77.8 | -- | -- | 68.9 | -- | -- |
2019 | Global SoP | ResNet50 | Softmax | 224X224 | Flip | SGD | 2k/32k | -- | -- | -- | -- | -- | -- | 78.9* |
2019 | 3G-Net | ResNet50/ResNet101 | Softmax | 224X224 | Flip | SGD | 32k | -- | -- | -- | -- | -- | -- | 79.1* |
2019 | iPCCP-Net | ResNet50/ResNet101 | Softmax | 448X448 | Flip | SGD | 8k | 88.4 | 91.6 | 94.1 | -- | -- | -- | -- |
2019 | HBPASM | ResNet-34 | Softmax | 448X448 | Flip | SGD | 24k | 86.8 | 91.3 | 93.8 | -- | -- | -- | -- |
2019 | DBTNet | ResNet50/ResNet101 | Softmax | 448X448 | -- | SGD | 2k | 88.1 | 91.6 | 94.5 | -- | -- | -- | -- |
2020 | ReDro | VGG16/ResNet50 | Softmax | 448X448 | Flip | Adam | 262k | 84.3 | 89.2 | 92.2 | 84.0 | -- | -- | -- |
2020 | SOP+SC+Spec. MaxExp(F) | AlexNet/ResNet50 | Softmax | 224/336/448 | Flip | Adam | -- | -- | -- | -- | 86.8 | -- | 88.4 | 77.95 |
* Produced with the ImageNet-1k dataset, while the others use the ImageNet-2012 dataset.
This website uses images and code links that are shared by the original authors. The Bootstrap 4 CSS library and jQuery 3.x were used for the development of this website.
For any queries or suggestions, you may contact us directly at sr801@uowmail.edu.au or leiw@uow.edu.au. Also, please do not hesitate to contact us if you would like to add your method to the listing of the SPD Archive.