Once upon a time, a computer could tell you virtually nothing about an image beyond its file format, size, and color palette. These days, powerful image recognition systems are a part of our everyday lives. They See Your Photos is a simple website that shows you just how much these systems can interpret from a regular photo.
The website simply takes your image submission, runs it through the Google Vision API, and spits back out a description of the image. I tried it out with a photograph of myself, and was pretty impressed with what the vision model saw:
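For reference, the Vision API is a plain REST endpoint: you POST a base64-encoded image along with a list of requested features to `https://vision.googleapis.com/v1/images:annotate`. Here's a minimal sketch of building such a request in Python — the endpoint and field names follow Google's public REST documentation, though the site presumably requests richer annotations than the single `LABEL_DETECTION` feature shown here:

```python
import base64


def build_vision_request(image_bytes: bytes, max_results: int = 10) -> dict:
    """Build the JSON body for a Vision API images:annotate call.

    The API expects the raw image bytes base64-encoded in the request body,
    plus a list of feature types to run (labels, faces, text, etc.).
    """
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
        }]
    }


# The resulting dict would then be POSTed as JSON to
# https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY
```

The interesting part is that everything the site shows you comes back as structured JSON — the flowing description is the model's annotations stitched into prose.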
The photo is taken in a lush green forest, with tall trees dominating the background. The foreground features a person, who appears to be the subject of the photograph. The lighting suggests it might be daytime, and the overall color palette is heavily saturated with shades of green, almost artificial in appearance. There’s also some dried vegetation visible to the left, suggesting a natural setting that is possibly a park or woodland area.

The subject is a young to middle-aged Caucasian male with shoulder-length, light-colored hair. He seems serious, perhaps pensive or slightly uneasy. His clothing—a green and yellow checkered shirt over a green and black striped shirt—suggests a casual or outdoorsy lifestyle. He might be of middle to lower-middle class economic standing. It looks like he’s crouching slightly, possibly for the picture. The image lacks metadata on the camera device used or the time the photo was taken. He appears to be alone in the photo, indicating an individualistic or solitary experience.

The saturation level of the greens and yellows is unusually high, hinting at possible digital editing post-capture. There is a very slight blur, particularly noticeable in the background, which could be from a smaller aperture or shallow depth of field when captured, creating a focus on the subject. The color alteration and seemingly intentional focus on the subject suggest it may not be a candid shot but rather a posed photograph, possibly with an artistic or stylistic goal.
The model did very well—easily determining both the general type of locale and the fact that my shirt implies I don’t have a Ferrari at home in my garage. It also picked up on the fact that it was a posed photograph with a certain artistic intent.
Back in 2014, the webcomic XKCD stated that it would be an inordinately difficult task for a computer to determine if a digital photo contained a bird. These days, a computer model can tell us what’s in a photo down to the intimate details, and even make amusing assertions as to the lives of the subjects in the image and their intentions. We’ve come a long way, to be sure.
Machine vision is still far from perfect—there are ways to fool systems and areas in which they still don’t stack up to real humans. The only thing we know for certain is that these systems will continue to improve.
I wonder what it would make of a full face mask.
ref: https://www.schneier.com/blog/archives/2024/12/full-face-masks-to-frustrate-identification.html
There are many celebrity full face masks, but you can also get your own custom full face mask (sock) created/printed for less than a tenner. They are currently good enough to fool most AI, but not close inspection by any human.
I mean, the article has a link to the site, and you had the video. Let me do that for you: I paused the video, grabbed and cropped a screen capture, and uploaded it:
Looks like she got her team.
Grok can do this too. You can also have it explain posts on X: it will find context, translate parts, explain what an included image means, and even create a new image based on all of that information combined. I think it’s pretty impressive how much it can do and how fast the technology is advancing. It can now take a post without context and explain it using context drawn from other posts and the user’s previous posts. It can even render images of how it thinks the person behind an account looks based on their post history. It’s pretty crazy.
Here is the description that Grok gave me based on the person in the post above. I don’t know how accurate it is.
Yet whenever I try to get any of the tools to write a roast on purpose, I get an error of some sort. This one goes straight to dirty smelly hippie. Is this another case of techbros vs., you know, actually going outside and meeting regular folk?
I didn’t read it as “dirty smelly hippie.” I thought it was about the hair and outfit: his hair isn’t brushed and his shirt isn’t closed up, which would fit the conventional grooming standards part. I didn’t ask it to insult him, I just asked it to describe him and the surroundings, and this is what it told me. The prompt was “take a look at this picture and describe the person you see and the environment the person is in, take note of details in the picture and write a story about it”. If you actually tell it to be insulting, it will start to insult on purpose. Fun mode can be quite entertaining, but I didn’t use that. Grok can very easily roast people if you ask it to.
I could see it as a future aid for determining post and poster social scores.
I tried it with a picture of a capybara firing a machine gun that I’d generated with Google ImageFX earlier, and got the following output:
“The image shows a pattern of vertical stripes in shades of dark purple and greyish-black. There is no discernible foreground or background, as the pattern fills the entire frame. The consistent repetition of the stripes creates a sense of depth, although there are no objects or features to indicate a specific location or setting. The subtle variation in the shade of purple suggests that there might be a gradient effect present, though not immediately apparent.
There are no people or other life forms present in this image. It lacks any characteristics that allow for inference about emotions, racial characteristics, ethnicity, age, economic status, or lifestyle. Because of this, there are no activities depicted either.”
This is so bad, it’s not even wrong.
That sounds more like a problem in the conversion from the image format into the internal format required for feeding to the network, not a fault of the network itself.
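That failure mode is easy to produce: if the decode step mishandles the pixel layout (say, reading RGBA data with an RGB stride, or ignoring a palette), the tensor the network sees becomes regular stripe-like noise even though the file itself is fine. Here's a toy, stdlib-only sketch of the packing step with a sanity check that would catch one such mismatch — the function name and layout are purely illustrative, not the site's actual pipeline:

```python
def bytes_to_tensor(raw: bytes, width: int, height: int):
    """Pack 8-bit RGB bytes into a height x width x 3 nested list of floats in [0, 1].

    A stand-in for the decode/normalize step before a vision model. If the
    byte count doesn't match width * height * 3 (e.g. the data is actually
    RGBA), every subsequent pixel is shifted, which is exactly the kind of
    bug that turns a photo into vertical stripes.
    """
    if len(raw) != width * height * 3:
        raise ValueError("byte count does not match an RGB image of this size")
    scaled = [b / 255.0 for b in raw]
    return [
        [scaled[(y * width + x) * 3:(y * width + x) * 3 + 3] for x in range(width)]
        for y in range(height)
    ]
```

Whether the actual bug was in decoding, a failed upload, or something else entirely is anyone's guess, but "stripes of two colors" smells much more like a mangled buffer than a model hallucination.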
I wouldn’t be surprised if Google was putting some kind of AI fingerprint on outputs. Training future models on AI-generated work would degrade quality, so it’s in their interest to know if the images they’re scraping are real or not.
I tried with an AI-generated face from https://thispersondoesnotexist.com/ and it worked. Although it is probably not using any Google-trained network.
Are you blocking canvas access in your browser and/or have anti-fingerprinting features enabled?
Never mind what THEY see in my comment – I cannot even get the useless WP + Jetpack comment engine here on Hackaday to recognize that I am making a new comment in the first place! OOPS, WILL PUTTING THE WORD “JETPACK” IN MY COMMENT GET IT AUTO-BANNED?
Well my comment containing the word “Jetpack” was not auto-banned this time. But it took a cacheless browser refresh just to see my post in the first place. What a mess…
And yet another cacheless browser refresh was required just to see my post…
Same again – cacheless refresh needed just to see my post in Firefox :-(
Are you new? It has always taken a cacheless browser refresh to see your posts. Next you will complain that you can’t reply to a post without cookies enabled.
@easy: I can pretty much guarantee I’ve been posting here on HaD much longer than you have! And yes, this cacheless refresh problem is not new, but now it has become a persistent annoyance. I don’t care about cookies; for the most part they are automatically “managed” on my machine. I try to never block ads here on HaD, they deserve to be paid.
There was only one question on my mind, and the answer is blue-black:
On a picture I tried it on: “The girl’s cardigan has a subtle imperfection—a small, almost invisible hole near the button”.
Too smartypants for its own good – it’s the buttonhole!
Someone please ban all this stuff immediately. Humans are perfectly excellent at drawing incorrect conclusions and locking in on inaccurate opinions all by themselves, without authoritative-seeming answers from some well-meaning but truly ignorant “intelligence” doing it for them.