Image Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition


Solution 1

An alternative approach would be to extract features (keypoints) using the scale-invariant feature transform (SIFT) or Speeded Up Robust Features (SURF).

You can find a nice OpenCV code example in Java, C++, and Python on this page: Features2D + Homography to find a known object

Both algorithms are invariant to scaling and rotation. Since they work with features, you can also handle occlusion (as long as enough keypoints are visible).

(Image: feature-matching example)

Image source: tutorial example

Processing takes a few hundred milliseconds for SIFT; SURF is a bit faster, but neither is suitable for real-time applications. ORB uses FAST, which is weaker regarding rotation invariance.

The original papers

Solution 2

To speed things up, I would take advantage of the fact that you are not asked to find an arbitrary image/object, but specifically one with the Coca-Cola logo. This is significant because this logo is very distinctive, and it should have a characteristic, scale-invariant signature in the frequency domain, particularly in the red channel of RGB. That is to say, the alternating pattern of red-to-white-to-red encountered by a horizontal scan line (trained on a horizontally aligned logo) will have a distinctive "rhythm" as it passes through the central axis of the logo. That rhythm will "speed up" or "slow down" at different scales and orientations, but will remain proportionally equivalent. You could identify/define a few dozen such scanlines, both horizontally and vertically through the logo and several more diagonally, in a starburst pattern. Call these the "signature scan lines."

Signature scan line

Searching for this signature in the target image is a simple matter of scanning the image in horizontal strips. Look for a high-frequency in the red-channel (indicating moving from a red region to a white one), and once found, see if it is followed by one of the frequency rhythms identified in the training session. Once a match is found, you will instantly know the scan-line's orientation and location in the logo (if you keep track of those things during training), so identifying the boundaries of the logo from there is trivial.
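A tiny sketch of how such a signature could be made scale-invariant (my own simplification; the function names are hypothetical): encode the red/not-red run lengths along a scan line and normalise them by the total length, so the same logo produces the same proportions at any scale.

```python
import numpy as np

# Hypothetical "signature scan line" sketch: run-length proportions of
# a red mask along one scan line are invariant to uniform scaling.

def scanline_signature(red_mask_row):
    """red_mask_row: 1-D boolean array, True where the pixel is 'red'."""
    changes = np.flatnonzero(np.diff(red_mask_row.astype(np.int8)))
    edges = np.concatenate(([0], changes + 1, [len(red_mask_row)]))
    runs = np.diff(edges).astype(float)
    return runs / runs.sum()  # scale-invariant run-length proportions

def signatures_match(sig_a, sig_b, tol=0.05):
    # Same number of runs and similar proportions -> same "rhythm".
    if len(sig_a) != len(sig_b):
        return False
    return bool(np.max(np.abs(np.asarray(sig_a) - np.asarray(sig_b))) < tol)
```

The tolerance would absorb the fuzziness and slight perspective distortion mentioned in the question.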

I would be surprised if this weren't a linearly-efficient algorithm, or nearly so. It obviously doesn't address your can-bottle discrimination, but at least you'll have your logos.

(Update: for bottle recognition I would look for coke (the brown liquid) adjacent to the logo -- that is, inside the bottle. Or, in the case of an empty bottle, I would look for a cap, which will always have the same basic shape, size, and distance from the logo and will typically be all white or red. Search for a solid-color elliptical shape where a cap should be, relative to the logo. Not foolproof of course, but your goal here should be to find the easy ones fast.)

(It's been a few years since my image processing days, so I kept this suggestion high-level and conceptual. I think it might slightly approximate how a human eye might operate -- or at least how my brain does!)

Solution 3

Fun problem: when I glanced at your bottle image I thought it was a can too. But, as a human, what I did to tell the difference is that I then noticed it was also a bottle...

So, to tell cans and bottles apart, how about simply scanning for bottles first? If you find one, mask out the label before looking for cans.

Not too hard to implement if you're already doing cans. The real downside is it doubles your processing time. (But thinking ahead to real-world applications, you're going to end up wanting to do bottles anyway ;-)
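A minimal sketch of that flow, with `find_bottles` and `find_cans` standing in for whatever detectors you already have (both names are hypothetical; `find_bottles` is assumed to return `(x, y, w, h)` bounding boxes):

```python
import numpy as np

# Sketch: detect bottles first, mask out each detected bottle's region,
# then run the can detector on what is left.

def detect_cans_only(image, find_bottles, find_cans):
    masked = image.copy()
    for (x, y, w, h) in find_bottles(image):
        masked[y:y + h, x:x + w] = 0  # hide the bottle, label included
    return find_cans(masked)
```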

Solution 4

Isn't it difficult even for humans to distinguish between a bottle and a can in the second image (provided the transparent region of the bottle is hidden)?

They are almost the same except for a very small region: the can narrows slightly at the top, while the bottle's wrapper keeps the same width throughout. A minor difference, right?

The first thing that came to my mind was to check for the red top of the bottle. But that is still a problem if the bottle has no cap, or if the cap is partially hidden (as mentioned above).

The second thing I thought about was the transparency of the bottle. There has been some work on finding transparent objects in an image with OpenCV. Check the links below.

Particularly look at this to see how accurately they detect glass:

See their implementation result:

(Image: glass-detection result)

They say it is the implementation of the paper "A Geodesic Active Contour Framework for Finding Glass" by K. McHenry and J. Ponce, CVPR 2006.

It might help a little in your case, but the problem arises again if the bottle is filled.

So I think you can first search for the transparent body of the bottle, or for a red region laterally connected to two transparent regions, which is obviously the bottle. (Ideally, this would produce an image like the following.)

(Image: segmented bottle, with the label region shown in yellow)

Now you can remove the yellow region, that is, the label of the bottle and run your algorithm to find the can.

Anyway, this solution also has different problems like in the other solutions.

  1. It works only if the bottle is empty. If the bottle is filled, you will instead have to search for a red region between two dark regions (since the Coca-Cola liquid is dark).
  2. It also fails if the transparent part is covered.

But anyway, if none of the above problems occur in the pictures, this seems to be a better way.

Solution 5

I really like Darren Cook's and stacker's answers to this problem. I was in the midst of throwing my thoughts into a comment on those, but I believe my approach is too answer-shaped to not leave here.

In short summary, you've identified an algorithm to determine that a Coca-Cola logo is present at a particular location in space. You're now trying to determine, for arbitrary orientations and arbitrary scaling factors, a heuristic suitable for distinguishing Coca-Cola cans from other objects, inclusive of: bottles, billboards, advertisements, and Coca-Cola paraphernalia all associated with this iconic logo. You didn't call out many of these additional cases in your problem statement, but I feel they're vital to the success of your algorithm.

The secret here is determining what visual features a can contains or, through the negative space, what features are present for other Coke products that are not present for cans. To that end, the current top answer sketches out a basic approach for selecting "can" if and only if "bottle" is not identified, either by the presence of a bottle cap, liquid, or other similar visual heuristics.

The problem is this breaks down. A bottle could, for example, be empty and lack the presence of a cap, leading to a false positive. Or, it could be a partial bottle with additional features mangled, leading again to false detection. Needless to say, this isn't elegant, nor is it effective for our purposes.

To this end, the most correct selection criteria for cans appear to be the following:

  • Is the shape of the object silhouette, as you sketched out in your question, correct? If so, +1.
  • If we assume the presence of natural or artificial light, do we detect a chrome outline on the object that signifies whether it is made of aluminum? If so, +1.
  • Do we determine that the specular properties of the object are correct, relative to our light sources (illustrative video link on light source detection)? If so, +1.
  • Can we determine any other properties about the object that identify it as a can, including, but not limited to, the topological image skew of the logo, the orientation of the object, the juxtaposition of the object (for example, on a planar surface like a table or in the context of other cans), and the presence of a pull tab? If so, for each, +1.

Your classification might then look like the following:

  • For each candidate match, if the presence of a Coca-Cola logo was detected, draw a gray border.
  • For each match over +2, draw a red border.

This visually highlights to the user what was detected, emphasizing weak positives that may, correctly, be detected as mangled cans.
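In code, that additive scheme might look like the following sketch (the predicates are placeholders for the individual checks above, not real implementations):

```python
# Hypothetical sketch of the +1 scoring scheme. Each check is a
# placeholder predicate (silhouette, chrome outline, specularity,
# pull tab, ...) that you would implement separately.

def border_colour(candidate, checks, threshold=2):
    score = sum(1 for check in checks if check(candidate))
    if score > threshold:
        return "red"   # strong evidence: likely a (possibly mangled) can
    return "gray"      # logo detected, but weak supporting evidence
```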

The detection of each property carries a very different time and space complexity, and for each approach, a quick pass through http://dsp.stackexchange.com is more than reasonable for determining the most correct and most efficient algorithm for your purposes. My intent here is, purely and simply, to emphasize that detecting if something is a can by invalidating a small portion of the candidate detection space isn't the most robust or effective solution to this problem, and ideally, you should take the appropriate actions accordingly.

And hey, congrats on the Hacker News posting! On the whole, this is a pretty terrific question worthy of the publicity it received. :)

Author by

Charles Menguy

Passionate about programming, computer science, data-mining and new technologies in general. Always eager to learn, and happy to teach what I know. Software engineer with a background in CS and AI, I'm currently working in the big data space with technologies such as Hadoop. I've joined the StackExchange network in an effort to improve my knowledge and also help others get the answers they're looking for.

Updated on October 01, 2021

Comments

  • Charles Menguy
    Charles Menguy over 2 years

    One of the most interesting projects I've worked on in the past couple of years was a project about image processing. The goal was to develop a system to be able to recognize Coca-Cola 'cans' (note that I'm stressing the word 'cans', you'll see why in a minute). You can see a sample below, with the can recognized in the green rectangle with scale and rotation.

    Template matching

    Some constraints on the project:

    • The background could be very noisy.
    • The can could have any scale or rotation or even orientation (within reasonable limits).
    • The image could have some degree of fuzziness (contours might not be entirely straight).
    • There could be Coca-Cola bottles in the image, and the algorithm should only detect the can!
    • The brightness of the image could vary a lot (so you can't rely "too much" on color detection).
    • The can could be partly hidden on the sides or the middle and possibly partly hidden behind a bottle.
    • There could be no can at all in the image, in which case you had to find nothing and write a message saying so.

    So you could end up with tricky things like this (which in this case had my algorithm totally fail):

    Total fail

    I did this project a while ago, and had a lot of fun doing it, and I had a decent implementation. Here are some details about my implementation:

    Language: Done in C++ using OpenCV library.

    Pre-processing: For the image pre-processing, i.e. transforming the image into a more raw form to feed to the algorithm, I used the following steps:

    1. Changing color domain from RGB to HSV and filtering based on "red" hue, saturation above a certain threshold to avoid orange-like colors, and filtering of low value to avoid dark tones. The end result was a binary black and white image, where all white pixels would represent the pixels that match this threshold. Obviously there is still a lot of crap in the image, but this reduces the number of dimensions you have to work with. Binarized image
    2. Noise filtering using a median filter (replacing each pixel with the median value of its neighbors) to reduce noise.
    3. Using Canny Edge Detection Filter to get the contours of all items after 2 precedent steps. Contour detection

    Algorithm: The algorithm I chose for this task was taken from this awesome book on feature extraction: the Generalized Hough Transform (quite different from the regular Hough Transform). It basically says a few things:

    • You can describe an object in space without knowing its analytical equation (which is the case here).
    • It is resistant to image deformations such as scaling and rotation, as it will basically test your image for every combination of scale factor and rotation factor.
    • It uses a base model (a template) that the algorithm will "learn".
    • Each pixel remaining in the contour image will vote for another pixel which will supposedly be the center (in terms of gravity) of your object, based on what it learned from the model.

    In the end, you end up with a heat map of the votes, for example here all the pixels of the contour of the can will vote for its gravitational center, so you'll have a lot of votes in the same pixel corresponding to the center, and will see a peak in the heat map as below:

    GHT

    Once you have that, a simple threshold-based heuristic can give you the location of the center pixel, from which you can derive the scale and rotation and then plot your little rectangle around it (final scale and rotation factor will obviously be relative to your original template). In theory at least...
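    For intuition, the voting step can be sketched for the translation-only case (my simplification: the real algorithm adds loops over scale and rotation, which is exactly what makes it slow):

```python
import numpy as np

# Translation-only Generalized Hough Transform sketch: each contour
# pixel votes for every centre position consistent with some model
# point; the true centre accumulates votes and shows up as a peak.

def build_r_table(template_points, centre):
    # Displacement from each model contour point to the object's centre.
    return [(centre[0] - x, centre[1] - y) for (x, y) in template_points]

def vote(image_points, r_table, shape):
    acc = np.zeros(shape, dtype=np.int32)
    for (x, y) in image_points:
        for (dx, dy) in r_table:
            cx, cy = x + dx, y + dy
            if 0 <= cx < shape[1] and 0 <= cy < shape[0]:
                acc[cy, cx] += 1
    return acc
```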

    Results: Now, while this approach worked in the basic cases, it was severely lacking in some areas:

    • It is extremely slow! I can't stress this enough. It took almost a full day to process the 30 test images, obviously because I had a very high scaling factor for rotation and translation, since some of the cans were very small.
    • It was completely lost when bottles were in the image, and for some reason almost always found the bottle instead of the can (perhaps because bottles were bigger, and thus had more pixels and more votes).
    • Fuzzy images were also no good, since the votes ended up in pixels at random locations around the center, resulting in a very noisy heat map.
    • Invariance to translation and rotation was achieved, but not to orientation, meaning that a can that was not directly facing the camera lens wasn't recognized.

    Can you help me improve my specific algorithm, using exclusively OpenCV features, to resolve the four specific issues mentioned?

    I hope some people will also learn something out of it as well, after all I think not only people who ask questions should learn. :)

    • ely
      ely about 12 years
      It might be said that this question is more appropriate at dsp.stackexchange.com, or stats.stackexchange.com, and you certainly should consider re-asking at those sites too.
    • ely
      ely about 12 years
      The first thing to do here is to analyze why the different failure cases are happening. E.g., isolate examples of places where bottles win, where the images are fuzzy, etc., and perform some statistical analysis to learn the difference between their Hough representations and the ones you wish it would detect. Some great places to learn about alternative approaches are here and here
    • stacker
      stacker about 12 years
      @linker Wouldn't extracting SIFT or SURF features be much faster than the Hough transformation? Why only detect cans when you could detect more registered objects?
    • ely
      ely about 12 years
      @stacker makes a good point. For speed you want to get cheap-to-compute features, like histograms of oriented gradients. A really naive first approach would be to manually label a bunch of can rectangles in some training images, and use these plus random negative examples to train an SVM or decision-tree classifier. The training will take longer, but the execution on novel images will be much faster. I'm planning to write this method up when I get more free time to include the right references.
    • Charles Menguy
      Charles Menguy about 12 years
      @stacker I did this because the scope of the assignment was specifically aimed at CocaCola cans. I don't know much about SIFT or SURF, but if this algorithm fits this problem I'd love to see a reply on the topic.
    • George Duckett
      George Duckett about 12 years
      How about an approach similar to reCAPTCHA? ;)
    • BlueRaja - Danny Pflughoeft
      BlueRaja - Danny Pflughoeft about 12 years
      Why was this moved from dsp.stackexchange.com? It seems like that site would be an even better fit than stackoverflow o_O
    • Charles Menguy
      Charles Menguy about 12 years
      @GeorgeDuckett reCAPTCHA could be an idea, but you have absolutely no guarantee that the Coca-Cola letters will be visible; they could be totally hidden, partially hidden, or the can could be turned around. And it doesn't solve the problem with the bottle either, since the letters are the same.
    • maniek
      maniek about 12 years
      Have you tried to detect the top or bottom seal of the can? It could be possible to detect it as an edge parallel to the edge of the red area.
    • George Duckett
      George Duckett about 12 years
      I didn't mean character recognition; I meant using the idea of humans doing the finding, i.e. show them 2 pictures, one known and one not known. Was just joking. :-)
    • Abid Rahman K
      Abid Rahman K about 12 years
      Can you add a few more test images to give a better idea?
    • Saeed Amiri
      Saeed Amiri about 12 years
      Did you try it without converting RGB to HSV? I think your bottle problem is in your conversion; actually, you removed your cans in your first step.
    • Charles Menguy
      Charles Menguy about 12 years
      @SaeedAmiri No the conversion to HSV works fine, it's just so I can eliminate some of the stuff that is clearly not red-like. Can and bottle are still there even after passing to HSV; the problem is mainly how to differentiate between the 2 since they have common characteristics.
    • Saeed Amiri
      Saeed Amiri about 12 years
      But your samples seem to say something else. I think in your first sample the images of the can were removed after preprocessing? Would you arrange your samples step by step?
    • Charles Menguy
      Charles Menguy about 12 years
      @SaeedAmiri Oh I see what you mean, images 2 and 3 in my question are not from the same original! In image 3, there was only a can. I could have posted more, but I'm trying to keep the question to a strict minimum. Just assume that the HSV translation correctly keeps both can and bottles (+ some noise all over the image).
    • John John Pichler
      John John Pichler about 12 years
      I have a great attraction for this kind of software. Anyone knows if there are some mature and well established Java Library to do these image recognition?
    • Cashew
      Cashew about 11 years
      @EdPichler openCV just very recently released their Java bindings for their library (as of 2.4.4). So, basically you could use openCV in Java (without all the fuss of doing JNI manually). I tried it and it worked fine (but it's still buggy since it's very recent)
    • Cashew
      Cashew about 11 years
      This is all way over my head, but I was thinking: "why not use OpenCV's GPU module and take advantage of your GPU to speed it up dramatically?" OpenCV basically has a GPU module with algorithms like Hough transforms and whatnot written in CUDA that run on CUDA-enabled GPUs. The great thing here is that there is no need to learn any CUDA. Just import the gpu module and start using it. I hope this helps (it should theoretically increase performance by an order of magnitude or more)
    • ldog
      ldog over 7 years
      This is like an obvious application of a convolutional neural network with scale/rotation invariance.
    • ldog
      ldog over 7 years
      If you use a hough-transform, you should use a faster version of the original algorithm. You can modify the hough-transform to focus on only high probability parameters using methods like RANSAC.
    • Takahiro Waki
      Takahiro Waki over 7 years
      The distinctive red-and-white pattern of the Coca-Cola logo alone is enough to distinguish it.
    • Ofek Shilon
      Ofek Shilon about 6 years
      1337! Please, nobody upvote this question again
    • Fattie
      Fattie over 5 years
      This question should be closed for 5 or 6 different reasons, pls click the "Close" button.
    • sachinruk
      sachinruk over 3 years
      I wonder how this task would be dealt with in 2020 with CNNs going crazy. There are SSD/YOLO/U-Net types of things, but these primarily deal with larger objects. I wonder if there is an equivalent for small day-to-day objects like watches, phones, toys, etc.
    • Keithel
      Keithel over 2 years
      @GeorgeDuckett the processing time per frame would be pretty darn long for that approach. ;)
  • Charles Menguy
    Charles Menguy about 12 years
    Actually I didn't explain that in the post, but for this assignment I was given a set of roughly 30 images, and had to write an algorithm that would match them all in various situations as described. Of course some images were held out to test the algorithm in the end. But I like the idea of Kinect sensors, and I'd love to read more on the topic!
  • Charles Menguy
    Charles Menguy about 12 years
    What would roughly be the size of the training set with a neural network to have satisfying results? What's nice with this method also is that I only need one template to match almost everything.
  • Charles Menguy
    Charles Menguy about 12 years
    Yep I've thought about that too, but didn't have much time to do it. How would you recognize a bottle, since its main part will look like a scaled can? I was thinking of looking for the red plug as well and seeing if it's aligned with the bottle's center, but that doesn't seem very robust.
  • Charles Menguy
    Charles Menguy about 12 years
    Actually no: there is no constraint of size or orientation (well, there is orientation, but I didn't really handle that), so you can have a bottle very far in the background, and a can in the foreground, and the can would be way bigger than the bottle.
  • Charles Menguy
    Charles Menguy about 12 years
    I've also checked that the width-to-height ratio is pretty similar for bottle and can, so that's not really an option either.
  • littleadv
    littleadv about 12 years
    The label ratio (being it a trademark) is the same. So if the (bigger) bottle is slightly further away on the picture, its size will be exactly the same as that of the can.
  • Charles Menguy
    Charles Menguy about 12 years
    Sounds interesting. Does this algorithm also handle orientation invariance (i.e. if the can doesn't directly face the objective of the camera)? This is one of the major points where my algorithm failed.
  • Sharad
    Sharad about 12 years
    Yes exactly that is why I suggest stereo imaging to recover the depth first. By using stereo imaging you can get the depth and then evaluate the actual size by adding the depth info.
  • Sharad
    Sharad about 12 years
    To explain a bit more. Suppose can is at z=0 and bottle at z=-100. Since bottle is far behind it will look smaller. But if I know that the bottle is at z=-100 and can at z=0, then I can calculate the expected size of the can/bottle if both are translated to z=0. So now they are at the same depth and hence I can make decisions based on size.
  • stacker
    stacker about 12 years
    @linker you could take a few pictures of the 3D object by rotating it (around the object's y-axis). Since images of a can look different from the front and back, you would try to find the closest match and estimate its orientation, handling cases where the logo isn't fully visible.
  • sne11ius
    sne11ius about 12 years
    If your set of images is predefined and limited, just hardcode perfect results in your prog ;)
  • Charles Menguy
    Charles Menguy about 12 years
    Yeah if I train on the dataset I'm going to run the algorithm against, sure I'll get perfect results :) But for example for this assignment, the program was tested by the teacher in the end on a set of held out images. I'd like to do something that would be robust and not overfit to the training data.
  • Lukasz Madon
    Lukasz Madon about 12 years
    If there is a red cap (or ring) parallel to the "Coca cola" it is most likely a bottle.
  • siamii
    siamii about 12 years
    @linker How did you train your algorithm for cans? Did you have examples of cans? How about training with examples of bottles?
  • Charles Menguy
    Charles Menguy about 12 years
    The strength of this algorithm is that you only need one template to train on, and then it applies all transformations to match it to other potential cans. I was using a binarized and contour-based version of this template to train, so the only difference between can and bottle would be the plug, but I'm afraid it would bring more false positives since the gravity center would be somewhere on the edge or outside of the bottle. It's worth giving it a try I guess. But that will double my processing time and I'm going to cry ;)
  • Charles Menguy
    Charles Menguy about 12 years
    Like this was being discussed on DSP in the short time when it was moved, some bottles may not have plugs ;) or the plug could be partially hidden.
  • Charles Menguy
    Charles Menguy about 12 years
    Ideally I'd like to apply this algorithm in case of randomly sampled images, where I just have a plain jpeg or png image. Would this algorithm still work in this case, or do I need some special images?
  • Fantastic Mr Fox
    Fantastic Mr Fox about 12 years
    The number of training sets varies, you have to be careful of a few things though: Don't over train, you probably want a test set to show how your accuracy is going. Also the number of training sets will depend on the number of layers you will use.
  • Charles Menguy
    Charles Menguy about 12 years
    Could you provide an example in your answer on how to use it with OpenCV? That would be way cool !
  • Charles Menguy
    Charles Menguy about 12 years
    I like the idea, but it seems like you'd need some really, really good lighting conditions. In the example image with both a can and a bottle, for instance, this seems a bit hard to distinguish.
  • Charles Menguy
    Charles Menguy about 12 years
    Regarding neural networks in the context of shape recognition, do you know if there's something similar in OpenCV? Or would I need to implement my own?
  • tskuzzy
    tskuzzy about 12 years
    In your example, notice how the specularity for the plastic label is much more diffuse than the very bright spots on the can? That's how you can tell.
  • Charles Menguy
    Charles Menguy about 12 years
    I see, which kind of color space representation would you use in this case to capture specularity in your algorithm? This seems quite tough to get in RGB or HSV
  • stacker
    stacker about 12 years
    @linker I tried to run it using your images, but the coke can in the scene image was probably too small (maybe a higher resolution would work) to extract enough features. If you have trouble with the features change the minHessian parameter (Step 1).
  • MSalters
    MSalters about 12 years
    Essentially this is a reasonable direction. I'd phrase it slightly differently: first find all candidates, and then for each candidate determine whether it's a bottle, a can, or something else.
  • Charles Menguy
    Charles Menguy about 12 years
    Excellent post, thanks for the details! Yes I've intentionally reduced the size of the images here to reduce bandwidth, but my original images have much better quality. I'll give this a try; it sounds like exactly what it's made for and much simpler than the GHT.
  • Admin
    Admin about 12 years
    I agree with @stacker - SIFT is an excellent choice. It's very robust against scale and rotation operations. It's somewhat robust against perspective deformation (this can be improved as suggested by stacker: a template database with different perspective views of the desired object). Its Achilles' heel in my experience would be strong lighting variations and very expensive computation. I don't know of any Java implementations. I'm aware of an OpenCV implementation and have used a GPU C++/Windows (SiftGPU) implementation suitable for realtime performance.
  • Agos
    Agos about 12 years
    A note of warning: as much as I love SIFT/SURF and what they have done to me, they are patent encumbered. This might be a problem, depending on a number of conditions including geographic location AFAIK.
  • Charles Menguy
    Charles Menguy about 12 years
    Thanks for the link that looks interesting. Regarding the training, what is the size of the training set that would be reasonable to achieve reasonable results? If you have an implementation even in c# that would be very helpful as well !
  • moooeeeep
    moooeeeep about 12 years
    A potential problem could be that it may generate false positives wherever there is a Coca-Cola logo.
  • Charles Menguy
    Charles Menguy about 12 years
    That's a great suggestion, I especially like the fact that this algorithm should be pretty fast, even if it will probably have many false negatives. One of my hidden goals is to use this detection in real-time for robotics, so that could be a good compromise !
  • kmote
    kmote about 12 years
    Yes, it is often forgotten (in a field characterized by precision) that approximation algorithms are essential for most real-time, real-world-modeling tasks. (I based my thesis on this concept.) Save your time-demanding algorithms for limited regions (to prune false positives). And remember: in robotics you're usually not limited to a single image. Assuming a mobile robot, a fast alg can search dozens of images from different angles in less time than sophisticated algs spend on one, significantly reducing false negatives.
  • MrGomez
    MrGomez about 12 years
    I really like this approach! Unfortunately, it lacks sufficient generalization, as bottles are not the only plausible false positives that may be detected. I've gone ahead and rolled this into an answer, because it was too much to comment on here. :)
  • Charles Menguy
    Charles Menguy about 12 years
    That's an interesting approach which is at least worth a try, I really like your reasoning on the problem
  • Li-aung Yip
    Li-aung Yip about 12 years
    I like the idea of using what amounts to a barcode scanner for extremely fast detection of Coca-Cola logos. +1!
  • Alexis Dufrenoy
    Alexis Dufrenoy about 12 years
    I had the same thought, but I think the silver lining on top of the can changes dramatically depending on the angle of the can on the picture. It can be a straight line or a circle. Maybe he could use both as reference?
  • karlphillip
    karlphillip about 12 years
    The problem of looking for signatures in this case is that if we turn the can to the other side, i.e. hiding the signature, the algorithm will fail to detect the can.
  • karlphillip
    karlphillip about 12 years
    +1 I thought about this and was on my way to implement this approach. However, @linker should share his set of images so we can try to make more educated guesses.
  • Abid Rahman K
    Abid Rahman K about 12 years
    Yeah, I too am thinking it would be good if there were more images.
  • Fantastic Mr Fox
    Fantastic Mr Fox about 12 years
    You will probably want to implement your own. I have not seen anything in openCV.
  • Li-aung Yip
    Li-aung Yip about 12 years
    @karlphillip: If you hide the signature, i.e. the logo, then any method based on looking for the logo is going to fail.
  • karlphillip
    karlphillip about 12 years
    @Li-aungYip I'm aware of that, thank you. English is not my native language. :)
  • Li-aung Yip
    Li-aung Yip about 12 years
    @karlphillip: I think what you may have meant to say is "imagine if you turn the can 90 degrees so that only part of the logo is visible." You can overcome that by taking three scan lines (top, middle and bottom of logo) - if any part of the logo is visible, you can see at least one of these.
  • karlphillip
    karlphillip about 12 years
    @Li-aungYip Nice workaround, but the signature method has other limitations; for instance, if the label of the can is a little bit damaged, or if the can is a little bit smashed, the detection will fail. The reality is that this question is a tough research problem. It's too complex, and its current format invites extended discussion. People don't realize that there are experts researching this type of thing on a daily basis. The problem isn't going to be solved in a SO thread.
  • karlphillip
    karlphillip about 12 years
    @Li-aungYip True, but that doesn't necessarily mean that it is appropriate for this Q&A site. Well, we already discussed this subject a lot on meta; no reason to do it here again. My argument was ditched when a moderator said that if people liked it then we should keep it.
  • Ian
    Ian almost 12 years
    This is kind of what I was thinking: don't rule out particular kinds of false positives; rule in more features of what makes a Coke can. But I'm wondering: what do you do about a squished can? I mean, if you step on a Coke can, it's still a Coke can. But it won't have the same shape anymore. Or is that problem AI-complete?
  • Rui Marques
    Rui Marques over 11 years
    So try OpenCV's ORB or FREAK, which have no patent issues. ORB is much faster than SIFT. ORB is a bit poor with scale and light variations in my experience, but test it yourself.
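    For reference, a minimal ORB matching sketch with OpenCV's Python bindings. The images here are synthetic stand-ins (a noisy patch pasted into a larger frame) so the snippet is self-contained; in practice you would load the logo template and the camera frame instead.

    ```python
    import cv2 as cv
    import numpy as np

    # Synthetic stand-in for a logo template: a noisy patch gives the
    # FAST detector inside ORB plenty of corners to latch onto.
    rng = np.random.default_rng(0)
    logo = rng.integers(0, 256, (120, 120), dtype=np.uint8)
    scene = np.full((400, 400), 128, dtype=np.uint8)
    scene[150:270, 200:320] = logo  # paste the "logo" into the scene

    orb = cv.ORB_create(nfeatures=500)
    kp1, des1 = orb.detectAndCompute(logo, None)
    kp2, des2 = orb.detectAndCompute(scene, None)

    # ORB descriptors are binary strings, so match with Hamming distance.
    bf = cv.BFMatcher(cv.NORM_HAMMING, crossCheck=True)
    matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)
    print(len(matches), "matches")
    ```

    From the matched keypoint pairs you can then estimate a homography (as in the Features2D + Homography tutorial linked in the answer) to locate the logo in the frame.
    
    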
  • mattbasta
    mattbasta over 11 years
    You could make this algorithm recognize the can's shape if you add some additional steps: if your signature is detected, have an array of offsets at which you expect (and expect not) to find the can's red, for a number of intervals along the signature's length. Scan down the signature line, then outwards, testing to see if the pixels are the expected color.
  • mattbasta
    mattbasta over 11 years
    Additional thought: you'd probably need a set of signatures to make the shape detection work, because you can't assume that the can is facing directly towards the camera. You could also run another algorithm to find the midpoint of the face of the can, but that's probably going down a long, dark road ;)
  • Diego Cerdan Puyol
    Diego Cerdan Puyol about 11 years
    Any idea what to Google if I want to build something using this approach?
  • kmote
    kmote about 11 years
    @DiegoCerdanPuyol: that's a pretty wide-open question. What I have described is a fairly elementary application of the field of "Digital Image Processing", for which you will find numerous books on Amazon. Start reading some of that literature, and if you run into a specific roadblock, post a more specific question here on SO (but not in the comments).
  • Diego Cerdan Puyol
    Diego Cerdan Puyol about 11 years
    @kmote What is the knowledge representation format used in "Digital Image Processing" to store the signature in a useful format so I can correlate it on new images?
  • G453
    G453 over 10 years
    How can you accept this as an answer? None of the feature descriptors can differentiate bottles from cans; they are all just view-invariant local pattern descriptors. I agree that SIFT, SURF, ORB, FREAK, etc. can help you with feature matching, but what about the other parts of the question, like occlusions, bottle vs. can, etc.? I hope this is not a complete solution; in fact, if you had GOOGLED your problem, probably the first result would be this answer only.
  • sepdek
    sepdek over 10 years
    @G453 you are absolutely right! Probably he was fascinated by the performance of SIFT and forgot that feature extraction and matching were NOT THE PROBLEM...
  • Rui Marques
    Rui Marques about 10 years
    I also agree with @G453, but I think feature matching is still the best starting point. Regarding occlusions, feature matching will have some degree of robustness; 1/3 of the Cola logo visible might be enough for detection. To filter only the cans, I would probably do something like Back Projection to detect the metal; I see no way around using the can's distinctive color features. Other than that, maybe its geometry.
  • Rui Marques
    Rui Marques about 10 years
    Interesting project but it only applies to your very specific setup.
  • Rui Marques
    Rui Marques about 10 years
    What if the light source was behind the can? I think you would not see the highlight.
  • user391339
    user391339 about 10 years
    To me a very important salient feature is the top part of the can: its metallic color and curvature.
  • Sergey Orshanskiy
    Sergey Orshanskiy almost 10 years
    Wow! I actually thought there were two cans on that picture. I thought that was a can in the bottle... Perhaps you shouldn't blame the algorithm for missing it, only for not detecting the other one.
  • user613326
    user613326 over 9 years
    This is by far the best suggestion, and should be the solution to this question; it's simple and elegant, it breaks the problem down, it found a hack inside the question, a shortcut into how we observe the world. And that's what artificial vision recognition is all about.
  • Stefan
    Stefan about 9 years
    -1: This approach is an ad-hoc "solution" which works with neither other logos nor other object types. Before trying to come up with your own solution, you should really have a look into the literature on logo recognition, object recognition or image classification.
  • kmote
    kmote about 9 years
    @Stefan - I have great respect for the work you've done in logo recognition, but I just wanted to point out that my "solution" does in fact address the OP's specific question. He did not ask for a general solution; he wanted a fast solution satisfying the stated requirements. Sometimes, as experts, we are too quick to jump to the over-engineered solution, when a simple "ad-hoc" approach might get you there with a fraction of the effort. (But in principle, I should point out that my approach is indeed extensible, and could in theory be trained on any number of logos.)
  • NoBugs
    NoBugs almost 9 years
    Wouldn't you still need to check every single row/row group in the whole image, times 360 for every possible rotation, for this "line" identifier? I don't suppose an artificial neural net would help at all in this problem?
  • Octopus
    Octopus over 8 years
    The particular shade of red is mostly subjective and strongly influenced by lighting considerations and white balance. You might be surprised by how much those can change. Consider, for example, this checkerboard illusion.
  • Midhun Harikumar
    Midhun Harikumar about 8 years
    Though I agree with @G453, I also feel that, since the OP did not explore this path, his problem has not become any easier to solve. Since the answer provided has taken the OP in the right direction, I feel it is a good answer. Also, once the OP finds the SIFT descriptor, it's only a matter of time before he figures out how to find the rotation of the object, and hence estimate the location and orientation of the can.
  • crodriguezo
    crodriguezo almost 8 years
    What about the state of the art in object detection? Using a CNN you can extract specific features and train a network to process them to obtain the bounding box and the objectness. However, the most challenging task in this problem is the classification between cans and bottles, and the only thing I can think of to address it is using Structure from Motion to reconstruct the volume and therefore get 3D information.
  • Martijn Courteaux
    Martijn Courteaux over 7 years
    You shouldn't look for high frequencies in the red channel. Deep red = (1, 0, 0) and white = (1, 1, 1), so red doesn't change; it is the other components that change.
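    This point is easy to verify numerically: across a red-to-white edge, the red channel barely moves while green and blue jump, so a scan line looking for the logo's "rhythm" should track those channels instead. A quick check with made-up pixel values:

    ```python
    import numpy as np

    # One scan line crossing from deep red into white, as (R, G, B) triples.
    # Pixel values are invented for illustration.
    line = np.array([[230, 20, 30]] * 5 + [[255, 255, 255]] * 5, dtype=float)

    # Per-channel range along the line: R barely moves, while G and B jump,
    # so the red-to-white "rhythm" lives in the green/blue channels.
    r_range = np.ptp(line[:, 0])  # 255 - 230 = 25
    g_range = np.ptp(line[:, 1])  # 255 - 20  = 235
    b_range = np.ptp(line[:, 2])  # 255 - 30  = 225
    ```
    
    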
  • Ken
    Ken over 6 years
    Suppose we have only the labels for bottles/cans and none of the other distinguishing factors (bottle cap, transparency, can top/bottom): the width of the bottle is different from the width of the can.
  • Fattie
    Fattie over 6 years
    For cans vs. bottles, obviously they are totally different objects; you just train it for two different things. (You might as well say "how do I make it recognize both motorbikes and SUVs as vehicles?!", which is nonsensical.)
  • Fattie
    Fattie over 6 years
    This answer is just a comment ("try using SIFT").
  • Fattie
    Fattie over 6 years
    This "answer" is just a comment at best. On the entire site, this is the "not an answer" with the highest number of upvotes.
  • Fattie
    Fattie over 6 years
    It's hardly worth mentioning but of course this general idea utterly fails on quaternion invariance and scale invariance.
  • kmote
    kmote over 6 years
    @NoBugs: (Apologies for the extremely delayed response!) This requires a single pass through every row of the image, because we're not checking against a single line identifier, but rather a rotation-adjusted collection of identifiers. (See section on "signature scan lines".)
  • kmote
    kmote over 6 years
    @Fattie: Actually, (although I had to look up the word "quaternion"!) I believe both of your assertions are incorrect. If you'll read the description more closely, you will note that this approach is both scale- and rotation-invariant. (And rotation in the z-axis, for the purposes of this approach, is equivalent to scaling.)
  • Fattie
    Fattie over 6 years
    @kmote I see, you're saying that the training lines are actually, say, 30 or so "taken" at different angles. So you are proposing to take (let's say) about 20(?) lines of the original, at perhaps 30 angles, being about 1000. Then you make (let's say) about 100 scan lines SL of the image. So for each SL, the "matching red-white scan" could appear at any scale / any location within SL. So, in a handwavy way, we will solve that particular classic image recognition problem. So you'd do 100,000 of those "linear scale/position-invariant" handwave matches.
  • Fattie
    Fattie over 6 years
    "Wouldn't you still need to check every single row/rows group in the whole image, for every possible [many] rotations, for this "line" identifier" FWIW yes, that's correct. note that "every" rotation is meaningless, you'd just try your best with (say) 20, 50 or 100 rotations. In this scheme, for each scan of the test image (let's say you took 100 of them from top to bottom), you have to try each of those 100 scans with each of the (say) 50 rotations, at each of the (say) 10 or 20 test lines across the logo.
  • Fattie
    Fattie over 6 years
    (continuing that last comment) again noting that each linear line-test in this proposed scheme involves position/scale invariance, which would have to be solved somehow.
  • spillner
    spillner about 6 years
    While researching TLD, I found another user looking for a C# implementation; is there any reason not to put your work on GitHub? stackoverflow.com/questions/29436719/…
  • eyeshield21
    eyeshield21 about 6 years
    N.B. Years later, the link is now dead.
  • Kostas Mouratidis
    Kostas Mouratidis almost 5 years
    OP said there were 30 high-res images, which is probably not the best scenario for training ConvNets. Not only are they too few (even augmented), the high-res part would destroy ConvNets.
  • AlgoRythm
    AlgoRythm over 4 years
    What if a can is placed in front of the logo for the bottle?
  • Hat
    Hat about 4 years
    An update to the link that @Octopus posted: persci.mit.edu/gallery/checkershadow
  • DisappointedByUnaccountableMod
    DisappointedByUnaccountableMod about 4 years
    A perception illusion doesn't affect what your webcam sees, i.e. what your code gets; only how the human eye helpfully(?) fools the brain.
  • artless noise
    artless noise about 3 years
    This fails when they market the 'Coca-Cola' keg. A speaker-dependent, limited-vocabulary task is much easier than speaker-independent, unlimited-vocabulary applications. The same concepts apply to vision.
  • Shaik Ahmad
    Shaik Ahmad almost 3 years
    HALCON is proprietary software; are there any open-source packages which provide similar solutions, other than OpenCV?
  • Darien Pardinas
    Darien Pardinas almost 3 years
    If there were, MVTec would be out of business by now. Their software is really expensive!
  • Vidar
    Vidar about 2 years
    I love the elegance of this - a signature is so simple but genius.