I have come across this term on several occasions. This time I saw it while reading the paper on t-SNE, a technique for visualizing high-dimensional data. http://homepage.tudelft.nl/19j49/t-SNE.html
I came to understand manifolds through an example in this lecture: http://www.iis.ee.ic.ac.uk/~tkkim/mlcv/lecture11_manifold.pdf
The idea is that points in a high-dimensional space often inherently lie on a low-dimensional manifold.
A manifold can be viewed as a generalization of a curve: the points may lie on a line, a hyperplane, or any curve, and all of these can be called by a single name, manifold. As per Wikipedia, a manifold is a subset of Euclidean space which is locally the graph of a smooth function.
There are many papers under the name of manifold learning, so a manifold seems to be a generalization of a (linear) hyperplane to non-linear surfaces. [That is what I infer.]
1. PCA is a dimensionality reduction technique that embeds points in a lower-dimensional linear space.
2. MDS is also a linear technique.
3. Techniques like Isomap are non-linear in that they embed points in a non-linear space. What I mean by a non-linear space is that the data points no longer satisfy linear properties and the measure of distance changes.
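To make the linear/non-linear contrast above concrete, here is a minimal sketch (assuming scikit-learn is available; this is my illustration, not from the t-SNE paper) that runs PCA, MDS, and Isomap on the classic "swiss roll" — points that live on a 2-D manifold curled up inside 3-D space:

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import MDS, Isomap

# 500 points in 3-D that actually lie on a curled-up 2-D sheet.
X, color = make_swiss_roll(n_samples=500, random_state=0)

# Linear: PCA projects onto the directions of maximum variance.
X_pca = PCA(n_components=2).fit_transform(X)

# Linear (classical) MDS preserves pairwise Euclidean distances.
X_mds = MDS(n_components=2, random_state=0).fit_transform(X)

# Non-linear: Isomap preserves geodesic distances measured *along* the
# manifold, so the roll is unrolled instead of squashed through its interior.
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

print(X_pca.shape, X_mds.shape, X_iso.shape)
```

Plotting each 2-D embedding colored by `color` shows PCA and MDS flattening the roll while Isomap unrolls it.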
Non-linear space: two simple interpretations
1. The simplest example: for each point in Euclidean space, construct a new space where you use its square.
2. The space is curved.
I came across a website containing a gist of papers and found it pretty useful; some are added from my side.
It contains many seminal papers organized neatly into classes. The ones interesting to me were the codebook-based approaches. Crediting the author, I am adding the papers here and will keep updating the list as I get more of these.
Dimension Reduction, Vocabulary
- J. Winn, A. Criminisi and T. Minka. Object Categorization by Learned Universal Visual Dictionary, ICCV 2005
- S. Savarese, J. Winn and A. Criminisi, Discriminative Object Class Models of Appearance and Shape by Correlatons, CVPR 2006
- F. Jurie and B. Triggs, Creating efficient codebooks for visual recognition, ICCV 2005
- E. Nowak, F. Jurie, and B. Triggs. Sampling strategies for bag-of-features image classification. ECCV, 2006
- F. Moosmann, B. Triggs, and F. Jurie. Randomized clustering forests for building fast and discriminative visual vocabularies. NIPS 2006.
- Eric Nowak and Frédéric Jurie, Learning Visual Similarity Measures for Comparing Never Seen Objects, CVPR 2007
- Jan C. van Gemert, Cor J. Veenman, Arnold W.M. Smeulders, Visual Word Ambiguity, IEEE PAMI, 2010.
- Randomized Clustering Forests for Image Classification, IEEE PAMI, 2009
Now I have understood that parallelizing is an important component of working with huge data. Things were fine while I was working with images, but on shifting to video, things became complicated.
Hence I finally parallelized the component where each descriptor is assigned its nearest neighbor. It was a bottleneck in the system, not because of reading files but because of searching for the nearest neighbor through a kd-tree (that was another optimization).
The time reduces by a factor of 300/90 ≈ 3 with 6-8 Matlab workers (akin to processors).
Matlab has this parfor command that allows you to run loops in parallel, but the loops have to follow a certain structure:
- The loops should be independent of each other.
- No nested looping allowed.
- Variables need to be used without complicated indexing (Matlab restricts how variables inside a parfor loop can be indexed).