How to detect if image is present on screen?

Solution 1

If you break this down into pieces, they're all pretty simple.

First, you need a screenshot of the app's window as a 2D array of pixels. There are various platform-specific ways to do this, but you didn't mention what platform you're on, so let's just grab the whole screen using PIL:

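from PIL import ImageGrab
screenshot = ImageGrab.grab()  # capture the whole screen as a PIL image
haystack = screenshot.load()   # pixel access object, indexed as haystack[x, y]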

Now, you need to convert your base64 into an image. Taking a quick look at it, it's clearly just an encoded PNG file. So:

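import base64
import io
from PIL import Image

decoded = base64.b64decode(data)  # data holds the base64 text; str.decode('base64') is Python 2 only
image = Image.open(io.BytesIO(decoded))  # io.BytesIO replaces the Python 2 cStringIO
needle = image.load()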
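If you'd rather check the "it's just a PNG" guess than eyeball it, here is a minimal sketch using only the standard library. The helper name looks_like_png is made up for illustration. It relies on the fact that every PNG file starts with the same 8-byte signature.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the fixed first 8 bytes of every PNG file


def looks_like_png(b64_text):
    """Return True if the base64 text decodes to bytes that start with the PNG signature.

    looks_like_png is a hypothetical helper, not part of PIL or the standard library.
    """
    try:
        raw = base64.b64decode(b64_text)
    except (ValueError, TypeError):
        # Not valid base64 at all (binascii.Error is a subclass of ValueError).
        return False
    return raw.startswith(PNG_SIGNATURE)
```

A quick visual tell is that the base64 encoding of a PNG begins with "iVBORw0KGgo".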

Now you've got a 2D array of pixels, and you want to see if it exists in another 2D array. (Strictly speaking, the objects returned by load() are indexed as pixels[x, y] rather than being lists of rows, so you'll use the image's size to drive the loops.) There are faster ways to do this, and using numpy is probably best, but there's also a dumb brute-force way, which is a lot simpler to understand: iterate the rows of haystack; for each one, iterate the columns, and see if you find a run of pixels that matches the first row of needle. If so, keep going through the rest of the rows of needle until you either finish all of needle, in which case you return True, or find a mismatch, in which case you move on to the next candidate position and start again.
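Here is a rough sketch of that brute-force search. Both function names are made up for illustration. It assumes an exact pixel-for-pixel match, so scaling, anti-aliasing or lossy compression in the screenshot would defeat it. It returns the position of the match rather than just True. To keep the search itself free of any PIL dependency, it works on plain lists of rows, and image_to_rows shows one way to get those from a PIL image.

```python
def image_to_rows(img):
    """Turn a PIL image into a list of rows, each row a list of (R, G, B) tuples."""
    rgb = img.convert("RGB")  # same mode for both images, so pixels compare equal
    width, height = rgb.size
    px = rgb.load()           # pixel access object, indexed as px[x, y]
    return [[px[x, y] for x in range(width)] for y in range(height)]


def find_subimage(haystack, needle):
    """Return (x, y) of the first exact occurrence of needle in haystack, or None.

    Both arguments are sequences of rows; each row is a sequence of pixels.
    """
    haystack = [list(row) for row in haystack]
    needle = [list(row) for row in needle]
    if not haystack or not needle or not needle[0]:
        return None
    n_h, n_w = len(needle), len(needle[0])
    h_h, h_w = len(haystack), len(haystack[0])
    # Try every position where the needle could still fit inside the haystack.
    for y in range(h_h - n_h + 1):
        for x in range(h_w - n_w + 1):
            # Compare row by row; all() stops at the first row that differs.
            if all(haystack[y + dy][x:x + n_w] == needle[dy] for dy in range(n_h)):
                return (x, y)
    return None
```

With the screenshot and image objects from above, usage would look something like find_subimage(image_to_rows(screenshot), image_to_rows(image)) is not None. Note that converting to RGB drops any alpha channel in the PNG, which may or may not be what you want. On a full-screen capture this pure-Python loop will be slow, which is the reason numpy was suggested.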

Solution 2

This is probably the best place to start:

http://effbot.org/imagingbook/image.htm

If you don't have access to the image's metadata (file name, type, etc.), what you're trying to do is very difficult, but your pseudocode sounds on point. Essentially, you'll have to build an algorithmic model of the image based on its shapes, lines, size, colors, and so on. Then you'd have to match that model against models already made and indexed in some database. Hope that helps.

Author by

user1251385

Updated on June 23, 2022

Comments

  • user1251385 (almost 2 years ago)

    Here is the image I need to detect: http://s13.postimg.org/wt8qxoco3/image.png

    Here is the base64 representation: http://pastebin.com/raw.php?i=TZQUieWe

    The reason why I'm asking for your help is because this is a complex problem and I am not equipped to solve it. It will probably take me a week to do it by myself.

    Some pseudo-code that I thought about:

    1) Take screenshot of the app and store it as image object.

    2) Convert base64 representation of my image to image object.

    3) Use some sort of algorithm/function to compare both image objects.

    By on screen, I mean in an app. I have the app's window name and the PID.

    To be 100% clear, I need to essentially detect if image1 is inside image2. image1 is the image I gave in the OP. image2 is a screenshot of a window.