What are Manifolds

I have come across this term on several occasions. This time I saw it while reading the t-SNE papers; t-SNE is an approach for visualizing high-dimensional data. http://homepage.tudelft.nl/19j49/t-SNE.html

The way I understood manifolds is through an example in the lecture http://www.iis.ee.ic.ac.uk/~tkkim/mlcv/lecture11_manifold.pdf

Thus, the points in the high-dimensional space inherently lie on a low-dimensional manifold.
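A minimal MATLAB sketch of this idea, assuming made-up data rather than the example from the lecture: 3-D points are generated from only two underlying coordinates, so they all lie on a plane (the simplest, linear kind of manifold). The names coords, basis and X are hypothetical. The singular values of the centred data show how many directions the points really occupy.

```matlab
rng(0);                               % fixed seed so the run is repeatable
n = 500;
coords = rand(n, 2);                  % two intrinsic coordinates per point
basis = [1 0 1; 0 1 1];               % hypothetical plane spanned by two 3-D vectors
X = coords * basis;                   % n-by-3 points, all lying on that plane
Xc = bsxfun(@minus, X, mean(X, 1));   % centre the data
s = svd(Xc);                          % singular values, largest first
disp(s');                             % the third value should be numerically zero
```

Two singular values are clearly non-zero and the third is at round-off level, which is the sense in which the 3-D points are really 2-dimensional.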

A manifold can be viewed as a general curve or surface: points may lie on a line, more generally on a hyperplane, and more generally still on any curve. All of these can be called by the single name manifold. As per Wikipedia, a manifold is a subset of Euclidean space which is locally the graph of a smooth function.

There are many papers under the name manifold learning, so a manifold seems to be a generalization of a (linear) hyperplane. [That is what I infer.]

Some pointers

1. PCA is a dimensionality reduction technique that embeds points in a lower-dimensional linear space.

2. MDS (in its classical form) is also a linear technique.

3. Techniques like Isomap are non-linear in that they embed points that lie in a non-linear space. By a non-linear space I mean that the data points no longer satisfy linear properties and the measure of distance changes.
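To make the change in the measure of distance concrete, here is a minimal MATLAB sketch on an assumed toy data set (points on a half circle, not taken from the lecture or the papers). It compares the straight-line distance between the two end points with the distance measured along the curve, which is the kind of distance Isomap tries to approximate through shortest paths on a neighbourhood graph. All variable names are hypothetical.

```matlab
t = linspace(0, pi, 200)';                  % parameter along the curve
X = [cos(t), sin(t)];                       % 200 points on a half circle in 2-D
d_straight = norm(X(end, :) - X(1, :));     % Euclidean distance between the end points
steps = sqrt(sum(diff(X, 1, 1).^2, 2));     % lengths of the consecutive small steps
d_along = sum(steps);                       % approximate distance along the curve
fprintf('straight line: %.3f, along the curve: %.3f\n', d_straight, d_along);
```

The straight-line distance is about 2, while the distance along the curve is close to pi. A linear method only ever sees the first number.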

Non-linear Space: Two simple interpretations

1. The simplest example: for each point in Euclidean space, you construct a new space whose coordinates are the squares of the original ones.

2. The space is curved.
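The first interpretation can be sketched in MATLAB as follows. This is only an illustration under an assumed example (points on the unit circle, with hypothetical variable names): after squaring the coordinates, the curved set of points satisfies a linear relation in the new space.

```matlab
theta = linspace(0, pi/2, 50)';       % angles in the first quadrant
P = [cos(theta), sin(theta)];         % points on the unit circle (a curve)
Q = P.^2;                             % new space: squared coordinates (u, v)
residual = max(abs(sum(Q, 2) - 1));   % largest deviation from u + v = 1
fprintf('largest deviation from the line u + v = 1: %g\n', residual);
```

Since cos^2 + sin^2 = 1, the deviation is at round-off level: a curve in the original space has become a straight line in the squared space.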

List of Papers

I came across a website containing a list of papers and found it pretty useful. I have added some from my side.

http://www.ifp.illinois.edu/~cao4/reading/patchbib.htm

It contains many seminal papers organized neatly into classes. The ones interesting to me were on codebook-based approaches. With reference to the author, I am adding the papers here and will keep updating the list as I find more of these.

Dimension Reduction, Vocabulary

Parallelizing-1

I have now understood that parallelizing is an important part of working with huge data. Things were fine while I was working with images, but on shifting to video, things became complicated.

Hence I finally parallelized the component where each descriptor is assigned its nearest neighbour. It was a bottleneck in the system, not because of reading files but because of searching for the nearest neighbour through a k-d tree (that was another optimization).

The time reduces by a factor of about 3 (300/90) with 6-8 MATLAB workers (kind of like processors).

MATLAB has a parfor command that allows you to run loops in parallel. But the loops have to follow a certain structure:

  1. The loop iterations should be independent of each other.
  2. No nested parfor loops are allowed.
  3. Variables need to be used without an indexing structure.

e.g. 

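IDX = [];             % accumulator, grown by concatenation rather than by indexing
parfor i = 1:10
    IDX = [IDX i];    % concatenation of this form is accepted by parfor as a reduction
end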
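
For comparison, here is a minimal sketch of how the nearest-neighbour assignment described above could be written with parfor. It is not the actual code from my system: the data are random, the names descriptors, centres and nearest are hypothetical, and a brute-force distance search stands in for the k-d tree. The output is indexed only by the loop variable, so each iteration writes its own element and the iterations stay independent.

```matlab
rng(0);                                  % fixed seed so the run is repeatable
descriptors = rand(1000, 16);            % hypothetical data: 1000 descriptors, 16-D each
centres = rand(50, 16);                  % hypothetical codebook: 50 centres
n = size(descriptors, 1);
nearest = zeros(n, 1);                   % iteration i writes only nearest(i)
parfor i = 1:n
    diffs = bsxfun(@minus, centres, descriptors(i, :));   % offsets to every centre
    [~, idx] = min(sum(diffs.^2, 2));    % index of the closest centre (squared distance)
    nearest(i) = idx;
end
```

If no parallel pool is available, the same loop should still run, just without the speed-up.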