Vincent's Blog: 2012

Saturday, December 15, 2012

Starry Night Mosaic Art

I'm posting here a picture that I generated using so-called mosaic art, a technique used by several artists and designers that consists of depicting an image using smaller units of content. The smaller units are usually other images; remember the poster of the movie The Truman Show? There is software available to generate this type of image; one example is Mazaika. Mosaic art has also been popularized to some extent by artists such as Chuck Close, and by Juan Osborne's amazing text mosaics. I wanted to do a simple experiment that incorporates an additional variable that I haven't seen in previous mosaics.

Coming from a computer vision perspective, my motivation here is that most mosaics that tile images try to match only the color of the image at each location, but they do not capture other characteristics like gradients (a sense of directionality), which are a common feature in most of today's computer vision systems. I took as inspiration The Starry Night by Vincent van Gogh, made available by the Google Art Project at incredible resolutions. I'm including a pretty high-res picture here (click on it), but feel free to explore the Google Art Project version as well.

Click on the image for higher resolution.

For this kind of image, gradients are just as beautiful as colors, and a good mosaic should capture both when tiling images. I used a database of 1 million pictures taken from Flickr to retrieve the closest matching image in both color and gradients for every tile in the composition. I also made sure to add some randomization so that nearby tiles don't get overly similar images, and I only slightly modified the tonality of the resulting picture. I will try to update this post if I find a way to return even better matches regarding color and gradients, or better tiling, but I hope this mosaic conveys the idea of using gradients (a popular cue in many state-of-the-art feature descriptors in computer vision) for creating mosaics.
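To make the idea of matching on both color and gradients concrete, here is a minimal sketch in Python. It is not the code I used for the mosaic: the descriptor (mean color plus a small histogram of gradient orientations), the names descriptor and pick_match, the grad_weight parameter, and the "pick at random among the k closest" rule are all assumptions chosen for illustration.

```python
import math
import random

def descriptor(tile, bins=4):
    """Describe a tile (2-D list of (r, g, b) pixels in 0..255) by its
    mean color (3 values) followed by a normalized histogram of
    unsigned gradient orientations (bins values)."""
    h, w = len(tile), len(tile[0])
    n = h * w
    mean = [sum(px[c] for row in tile for px in row) / (255.0 * n)
            for c in range(3)]
    gray = [[sum(px) / 3.0 for px in row] for row in tile]
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central differences approximate the image gradient.
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag > 0:
                theta = math.atan2(gy, gx) % math.pi  # orientation in [0, pi)
                index = min(int(theta / math.pi * bins), bins - 1)
                hist[index] += mag  # vote weighted by gradient strength
    total = sum(hist)
    if total > 0:
        hist = [v / total for v in hist]
    return mean + hist

def pick_match(target, database, k=5, grad_weight=1.0, rng=random):
    """Return the name of one database entry chosen at random among the
    k entries closest to the target descriptor.
    database is a list of (name, descriptor) pairs."""
    def distance(d):
        color = sum((a - b) ** 2 for a, b in zip(target[:3], d[:3]))
        grad = sum((a - b) ** 2 for a, b in zip(target[3:], d[3:]))
        return color + grad_weight * grad
    ranked = sorted(database, key=lambda item: distance(item[1]))
    return rng.choice(ranked[:k])[0]
```

Two tiles with the same average color but stripes running in different directions get identical color terms and different orientation histograms, which is exactly the case a color-only mosaic cannot tell apart. A linear scan like this would be too slow for a million images; a real system would need some form of indexing.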

Click on the image for higher resolution

Wednesday, July 18, 2012

2300 Jackson Street

The King of Pop, the Greatest Entertainer of All Time: I don't think any superlative can account for the genius of this man born in Gary, Indiana, a small city a few miles south of Chicago. The neighborhood doesn't look special in any way; it looks just like any neighborhood in a suburb. The town is surrounded by big factories, and it doesn't feel like you are getting closer to this seemingly forgotten landmark. At first it feels like there is nobody around, but it only takes a minute or two to start seeing cars stopping and fans getting out to take pictures. This is not the Neverland Ranch or the Hollywood Walk of Fame; this is the true place where you can relate to Michael Joseph Jackson, the child who grew into one of the biggest talents humanity has witnessed.

At the intersection of Jackson Street and Jackson Family Boulevard.
Gary, Indiana.

Whiteboard outside the house. Lots of messages already.

Sunday, May 20, 2012

Visual Attention and Visual Saliency

"Everybody knows what attention is ..."
-William James 1890

This quote is referenced in a research paper from the Visual Computing group at Microsoft Research Asia (MSRA), titled "Learning to Detect a Salient Object". I don't know the exact context of the quote, but it is interesting that somebody said this in 1890 when even today there are many things we don't know about visual attention. Visual attention is particularly interesting in computer vision because in this field we want to teach computers how to recognize things in the visual world, and it seems humans might be taking advantage of things like visual attention in ways computers still aren't.

To actually give some definition of visual attention, I would say that it is the condition by which our vision focuses to a greater or lesser degree on some things within the total amount of information that is perceived. In computer vision there are some research lines closely related to ideas in visual attention. One of them is visual saliency, which could cover, among other things: a) class-independent object detection or proto-object detection (although proto-objects as defined in the visual perception literature might not be directly usable in a practical application), b) detecting salient objects in an image (under the assumption that we humans do not consider all objects to be equally visually important), or c) detecting saliency maps that define regions that are important in an image without explicitly associating them with an object.

The paper from MSRA ("Learning to Detect a Salient Object", CVPR'07) detects a salient object under the assumption that we know a priori that a salient object exists in the image. I believe this assumption holds for a large number of images on the web because that's just the way we think when we capture pictures: we usually focus on something. It is easy to imagine that not only Microsoft but also Google is already using some form of visual saliency to autocrop images from the web for display in search results or to generate thumbnails. But beyond this obvious application there is room for using this kind of technique to improve object detection itself, or at least to avoid trying to detect objects at every possible location within an image.

Sample saliency maps for the top left image used as features in the MSRA paper.
Those maps were generated using my own implementation of their method.

The MSRA paper also introduces the MSRA Salient Object Database, a large collection of images with manually annotated bounding boxes enclosing the salient object in each image. The only thing not included is source code, which is why in 2009, while I was starting graduate school, I decided to implement their method in Matlab [link to source code]. Although my CRF formulation is not exactly the same, I get performance similar to that reported in the original paper (see the slides included at the end of this post). The paper has received considerable attention since it was first published, so although I don't keep track of how many people download my code, I see a lot of traffic coming from Google search. I also didn't run many experiments beyond what is explained in the original paper, but I found somebody using my code who did a more thorough evaluation: a student in the Computer Vision class at the University of Texas at Austin (http://vision.cs.utexas.edu/cv-fall2011/slides/larry-expt.pdf). As I had expected, this method does better than Itti & Koch (an earlier, much simpler approach), but only when it actually detects something, which is most likely to happen in images with a clear single salient object, the kind of images the method was trained on.

Links in this post:
MSRA: Learning to detect a salient object source code:
http://www.cs.stonybrook.edu/~vordonezroma/code.html
Saliency Experiments Slides from the University of Texas Austin:
http://vision.cs.utexas.edu/cv-fall2011/slides/larry-expt.pdf

Wednesday, March 28, 2012

Going South

This is an East Coast story: after a couple of conference deadlines I finally found some time to embark on a road trip from Long Island to Virginia Beach. The stops were Jersey City (NJ), New York City (NY), Newark (NJ), Hamburg (PA), Hightstown (NJ), a pass through corn fields in Delaware (DE), Pocomoke City (MD), Virginia Beach (VA), Richmond (VA), Baltimore (MD), Atlantic City (NJ), Jersey City (NJ), New York City (NY) and back to homey Port Jefferson Station (NY).


The old Google Maps is still a wonderful tool for displaying geographical stories.

Wednesday, February 15, 2012

Simple Audio Synthesizer in Java

This is about a program I wrote in 2005. The objective was to generate sounds that resemble those of real musical instruments. This blog post includes an executable JAR file and the source code. If your computer is Java-ready you can try it right away here: [YASS] (UI in English).

This is what the UI should look like (except this one is the original UI in Spanish).

It uses a bottom-up approach to recreate the sounds of real musical instruments, starting from the most basic constructs: simple sinusoidal functions. More complex wave functions are created by adding sinusoidal waves of different frequencies and modulating the amplitude of the resulting wave with an envelope function. The UI lets you modify the basic wave function by editing the individual sinusoidal waves or by choosing from a preset list of wave functions. It also lets you choose from a preset list of envelope functions. Finally, it lets you choose from a preset list of musical instruments; this option just selects the wave function and envelope function that make the resulting sound resemble a given instrument. I chose these presets using my own judgement, not any machine learning (disclaimer: I don't pride myself on having musical aptitude). And this is all this program can do.
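The recipe of adding sinusoids and then shaping them with an envelope can be sketched in a few lines. This is an illustration written in Python for brevity, not the Java code from the JAR; the sample rate, the function names, the restriction to whole-number multiples of a fundamental, and the linear attack/release envelope are all assumptions.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed value for this sketch)

def additive_wave(fundamental_hz, harmonic_amps, duration_s,
                  sample_rate=SAMPLE_RATE):
    """Sum sinusoids at 1x, 2x, 3x... the fundamental frequency.
    harmonic_amps[k] is the amplitude of harmonic k + 1.
    The result is scaled so every sample stays within [-1, 1]."""
    n = int(duration_s * sample_rate)
    norm = sum(abs(a) for a in harmonic_amps) or 1.0
    samples = []
    for i in range(n):
        t = i / sample_rate
        value = sum(a * math.sin(2 * math.pi * fundamental_hz * (k + 1) * t)
                    for k, a in enumerate(harmonic_amps))
        samples.append(value / norm)
    return samples

def linear_envelope(n, attack=0.1, release=0.3):
    """Piecewise-linear envelope over n samples: ramp up during the
    attack fraction, hold at 1, ramp down to 0 during the release."""
    a = max(1, int(n * attack))
    r = max(1, int(n * release))
    env = []
    for i in range(n):
        if i < a:
            env.append(i / a)
        elif i >= n - r:
            env.append((n - 1 - i) / r)
        else:
            env.append(1.0)
    return env

def synthesize(fundamental_hz, harmonic_amps, duration_s):
    """Multiply the summed wave by the envelope, sample by sample."""
    wave = additive_wave(fundamental_hz, harmonic_amps, duration_s)
    env = linear_envelope(len(wave))
    return [w * e for w, e in zip(wave, env)]
```

For example, synthesize(440.0, [1.0, 0.5, 0.25], 0.5) produces half a second of a tone with three harmonics that fades in and out. In this picture an instrument preset is just a particular list of harmonic amplitudes paired with a particular envelope shape.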

One thing that is most likely wrong is the keyboard: I wouldn't trust the mapping from the keys to the actual tones, although this should be a quick fix. I frankly don't remember where I got the mapping for this keyboard. This document from the University of Tennessee explains how to do the mapping correctly: http://web.eecs.utk.edu/~qi/ece505/project/proj1.pdf.
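For reference, the usual way to map keys to tones is twelve-tone equal temperament, where each semitone multiplies the frequency by the twelfth root of two and the A above middle C is tuned to 440 Hz. The sketch below uses MIDI note numbers (A4 = 69) as the key index. The function names are hypothetical, and I am assuming, not claiming, that this is the same mapping the linked document describes.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def midi_to_hz(midi_note, a4_hz=440.0):
    """Equal-temperament frequency of a MIDI note number (A4 is note 69)."""
    return a4_hz * 2.0 ** ((midi_note - 69) / 12.0)

def note_to_hz(name, octave):
    """Frequency of a named note, e.g. note_to_hz("C", 4) for middle C."""
    midi_note = 12 * (octave + 1) + NOTE_NAMES.index(name)
    return midi_to_hz(midi_note)
```

With this mapping, going up one octave doubles the frequency, which is a quick sanity check for any keyboard table.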
Another thing that I definitely have to credit is Manfred Thole's demo on Fourier synthesis. I clearly took inspiration from his demo for the sinusoidal editor, and I borrowed outright his function to convert integers to the μ-law scale: http://www.thole.org/manfred/fourier/en_idx.html
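For readers unfamiliar with μ-law, the idea is to compress samples logarithmically so that quiet sounds keep more resolution than loud ones. The sketch below shows only the textbook continuous companding formula with μ = 255, applied to samples in [-1, 1]. It is not Manfred Thole's function, which works on integers, and it does not reproduce the bit-exact 8-bit encoding; the names mu_law_compress and mu_law_expand are hypothetical.

```python
import math

MU = 255.0  # the usual value for 8-bit mu-law

def mu_law_compress(x):
    """Map a linear sample x in [-1, 1] to the mu-law scale, also in [-1, 1]:
    sign(x) * ln(1 + MU * |x|) / ln(1 + MU)."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse of mu_law_compress:
    sign(y) * ((1 + MU) ** |y| - 1) / MU."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
```

A quiet sample such as 0.1 is pushed well above 0.1 on the compressed scale, while full-scale samples stay at full scale.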

Finally, I was not sure whether I should post the source code, for two reasons: 1) function definitions, variable names, etc. are all in Spanish and I'm writing my blog in English; 2) I actually lost the original source code and had to decompile the class files inside the JAR file to recover it, then recompile it with the UI in English, thus additionally losing the comments and therefore potentially not acknowledging some sources of inspiration and losing some wisdom for the future. Still, I'm including the source code here in case somebody finds it useful despite cons 1) and 2). One good thing is that it still compiles with the Java SDK 7 even after decompilation: [Yass-src.zip].

Sunday, February 12, 2012

Learning English

In an older post I described how I learned to write in my native language when I was young [Learning Karate by Waxing Cars]; in this post I will explain how I learned to communicate in a second language: English. I started to get interested in this language when I was a 13-14 year old teenager with lots of spare time. I was good at school, but English is something that most of the time you don't get to learn at schools in Ecuador, even with proper motivation. I had decided to learn this language as something that might be useful later, but most importantly because I saw it as a challenge and as something that nobody close to me could do, at least at the time. I had no clue how one teaches oneself a second language, so I took an English-Spanish dictionary and tried to learn every word brute-force style. Disappointed after some time, I tried to learn the grammar rules from another book. Although I was more motivated by my newly acquired ability to create some simple sentences using the few grammar rules I had learned, I was still totally lacking any speaking or listening skills.

ESL videotapes that I used. I just found them using Google, wondering if they are still available somewhere. This is apparently a DVD version but I can't forget the Eagle logo. [Price: $600] [link] I think their approach is remarkable!

This was around 1998, and while in Silicon Valley there was already a newborn Google (maybe still BackRub), I didn't have access to computers at the time. Fortunately I had at home an old VHS player and some ESL videotapes that my mother had gotten some time back from somewhere. My mother never used the tapes herself, but she made sure to photocopy the printed course materials and even went as far as retyping some of the material on a typewriter (I hope I don't get my mother into trouble over some possible case of copyright infringement). Look at the picture in this blog post and its caption. The Speak to Me course is 90% about practicing in front of the screen by repetition; these are not lectures. (Again, another example related to [Learning Karate by Waxing Cars], although this time it took me longer to realize that the attempt had succeeded.) This educational resource was a treasure to me. I went through every one of the more than half a dozen videotapes twice. This time I was very satisfied with the results and was eager to take it to the next level. Unfortunately most private English classes in Ecuador are either very expensive or a well-elaborated scam. My parents just could not afford them. So I forgot about English for a while.

Two years later, in my last year (senior year?) of high school, I was fortunate enough to be assigned a native-level speaker as my English teacher, Ms. Martha Dockter. To my own surprise I realized that I could already engage in casual conversation with her in English without major trouble. I had not met any other English speaker before. Ms. Martha was very helpful in correcting a lot of mistakes in my writing and speech; I learned tons from her. She passed away in 2008 (†). Her obituary reads that she was the daughter of a reverend from Mansfield, Ohio, who travelled to Ecuador with his wife to do missionary work. I'm sure there are many more students like me grateful for her teachings. This is my story of how, in a time before the internet, I got to learn a second language. Finding the educational resources was harder than it is today, but finding the willingness for self-education was just as hard as it is today.

Saturday, January 28, 2012

The Fire Island Lighthouse

Fire Island Lighthouse

Fire Island is a smaller island connected to the south shore of Long Island. I have been there a couple of times already, but so far I have not been able to get there before 3pm, in time to go up to the top of the lighthouse. I have not even made it before 4pm, in time for the museum located in the building beside the lighthouse. Still, there are some nice afternoon views, and more often than not you get to spot deer. At least for some of my friends, this seems to have become the place to go when you are getting bored at home and feel like driving.

An interesting fact is that Fire Island is one of the barrier islands of Long Island and New York City. Another of these islands is Coney Island, to which I also dedicated a blog post some time back, "The New York Aquarium and Coney Island". I will update this entry once I get to enter the lighthouse and the museum (and this might not happen for quite a while); for now I just wanted to share the picture I took with my fancy smartphone.