Date: 2020-09-26
Maybe you heard about the anger surrounding Twitter’s automatic cropping of images. When users submit pictures that are too tall or too wide for the layout, Twitter automatically crops them to roughly a square. Instead of just picking, say, the largest square that’s closest to the center of the image, they use some “algorithm”, likely a neural network, trained to find people’s faces and make sure they’re cropped in.
The problem is that when a too-tall or too-wide image includes two or more people, and they’ve got different colored skin, the crop picks the lighter face. That’s really offensive, and something’s clearly wrong, but what?
A neural network is really just a mathematical equation, with the input variables being, in these cases, convolutions over the pixels in the image, and training one essentially consists of picking the values for all the coefficients. You do this by applying inputs, seeing how wrong the outputs are, and updating the coefficients to make the answer a little more right. Do this a bazillion times, with a big enough model and dataset, and you can make a machine recognize different breeds of cat.
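To make that loop concrete, here is a minimal sketch. It is not an image network: it fits a toy two-coefficient equation, and the data, learning rate, and step count are all made up for illustration. The shape of the loop (apply an input, measure the error, nudge the coefficients) is the same idea at a vastly smaller scale.

```python
def train(samples, steps=2000, learning_rate=0.01):
    """Fit output = w * x + b by repeated small corrections."""
    w, b = 0.0, 0.0  # the "coefficients", starting from an arbitrary guess
    for _ in range(steps):
        for x, target in samples:
            output = w * x + b       # apply the input
            error = output - target  # see how wrong the output is
            # Nudge each coefficient a little in the direction that
            # reduces the squared error for this sample.
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for this example).
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(samples)
print(w, b)  # both end up very close to the values used to make the data
```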
What went wrong at Twitter? Right now it’s speculation, but my money says it lies with either the training dataset or the coefficient-update step. The problem of including people of all races in the training dataset is so blatantly obvious that we hope that’s not the problem; although getting a representative dataset is hard, it’s known to be hard, and they should be on top of that.
Which means that the issue might be coefficient fitting, and this is where math and culture collide. Imagine that your algorithm just misclassified a cat as an “airplane” or as a “lion”. You need to modify the coefficients so that they move the answer away from this result a bit, and more toward “cat”. Do you move them equally from “airplane” and “lion” or is “airplane” somehow more wrong? To capture this notion of different wrongnesses, you use a loss function that can numerically encapsulate just exactly what it is you want the network to learn, and then you take bigger or smaller steps in the right direction depending on how bad the result was.
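As a rough sketch of what "different wrongnesses" can look like in code, compare a plain cross-entropy loss, which only cares how much probability landed on the right answer, with an expected-cost loss built from a penalty table. The table values here are invented for the example; choosing them is exactly the hard part.

```python
import math

# Hypothetical penalties: COST[true_label][predicted_label].
# Calling a cat a lion is treated as a smaller mistake than calling it an airplane.
COST = {
    "cat": {"cat": 0.0, "lion": 1.0, "airplane": 5.0},
}


def cross_entropy(probs, true_label):
    """Standard loss: every wrong class is equally wrong."""
    return -math.log(probs[true_label])


def expected_cost(probs, true_label):
    """Cost-sensitive loss: weight each predicted class by how bad it is."""
    return sum(p * COST[true_label][label] for label, p in probs.items())


# Two predictions for a picture of a cat, both giving "cat" only 20%.
leans_lion = {"cat": 0.2, "lion": 0.7, "airplane": 0.1}
leans_airplane = {"cat": 0.2, "lion": 0.1, "airplane": 0.7}

for name, probs in [("lion", leans_lion), ("airplane", leans_airplane)]:
    print(name, cross_entropy(probs, "cat"), expected_cost(probs, "cat"))
```

Cross-entropy scores both mistakes identically, while the expected cost is three times larger for the prediction that leans toward "airplane", so the update step would push harder away from it.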
Let that sink in for a second. You need a mathematical equation that summarizes what you want the network to learn. (But not how you want it to learn it. That’s the revolutionary quality of applied neural networks.)
Now imagine, as happened to Google, your algorithm fits “gorilla” to the image of a black person. That’s wrong, but it’s categorically differently wrong from simply fitting “airplane” to the same person. How do you write the loss function that incorporates some penalty for racially offensive results? Ideally, you would want them to never happen, so you could imagine trying to identify all possible insults and assigning those outcomes an infinitely large loss. Which is essentially what Google did — their “workaround” was to stop classifying “gorilla” entirely because the loss incurred by misclassifying a person as a gorilla was so large.
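The "stop classifying it entirely" workaround can be pictured as a filter on the classifier's output. This is only an illustration of the idea, not Google's actual code; the label names, scores, and function are hypothetical.

```python
# Hypothetical blocklist: labels the system is never allowed to emit.
BLOCKED_LABELS = {"gorilla"}


def top_label(scores, blocked=BLOCKED_LABELS):
    """Return the highest-scoring label that is not blocked, or None."""
    allowed = {label: s for label, s in scores.items() if label not in blocked}
    if not allowed:
        return None  # better to say nothing than to say the blocked label
    return max(allowed, key=allowed.get)


# Made-up scores for a single image.
scores = {"gorilla": 0.6, "person": 0.3, "airplane": 0.1}
print(top_label(scores))
```

The price is visible in the code: an actual gorilla can never be labeled correctly again.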
This is a fundamental problem with neural networks — they’re only as good as the data and the loss function. These days, the data has become less of a problem, but getting the loss right is a multi-level game, as these neural network trainwrecks demonstrate. And it’s not as easy as writing an equation that isn’t “racist”, whatever that would mean. The loss function is being asked to encapsulate human sensitivities, navigate around them and quantify them, and eventually weigh the slight risk of making a particularly offensive misclassification against not recognizing certain animals at all.
I’m not sure this problem is solvable, even with tremendously large datasets. (There are mathematical proofs that with infinitely large datasets the model will classify everything correctly, so you needn’t worry. But how close are we to infinity? Are asymptotic proofs relevant?)
Anyway, this problem is bigger than algorithms, or even their writers, being “racist”. It may be a fundamental problem of machine learning, and we’re definitely going to see further permutations of the Twitter fiasco in the future as machine classification is being increasingly asked to respect human dignity.
Maybe they could simply scale images instead of cropping. Yes, that would mean ignorant users who didn’t know how to manage their photos would get relatively useless results, revealing who is good at this modern tech stuff (or old skool framing of photos to produce a desired result) and who stinks. Meritocracy and all that. Someone would probably call that racist as well, but it’s a harder argument to make stick.
I for one ignore posts with horrible spelling or grammar errors if the author is using their native language. If you are illiterate, why should I feel like I’m going to learn something from you?
Or have the AI provide a crop for each detected face and then deliberately select one at random. That way in the instance where there is no single good answer, at least it’s not biased towards one over another. Or provide a visual cue in the cropped image that it *is* actually cropped and in what directions. Then people aren’t misled into thinking that the framing was deliberately chosen by the poster.
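A minimal sketch of the random-selection idea, assuming some face detector has already returned a list of face center points (the detector itself is outside this example, and the function names are made up):

```python
import random


def square_crop_around(center_x, center_y, width, height):
    """Largest square that fits the image, centered as close to the point as possible."""
    side = min(width, height)
    left = min(max(center_x - side // 2, 0), width - side)
    top = min(max(center_y - side // 2, 0), height - side)
    return (left, top, left + side, top + side)


def random_face_crop(face_centers, width, height, rng=random):
    """Pick one detected face uniformly at random and crop around it."""
    if not face_centers:
        # No faces found: fall back to the plain center crop.
        return square_crop_around(width // 2, height // 2, width, height)
    center_x, center_y = rng.choice(face_centers)
    return square_crop_around(center_x, center_y, width, height)


# A tall 400x1200 image with one face near the top and one near the bottom.
faces = [(200, 150), (200, 1050)]
print(random_face_crop(faces, 400, 1200))
```

Each face gets the same chance regardless of how confident the detector was about it, which is the whole point of the suggestion.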
I like this idea, as the problem isn’t that it doesn’t notice the other person. It seems to identify them perfectly if they are on their own; it just has higher confidence or some other reason to bias it towards the white face.
You’d still get lots of complaints about racism with that, I don’t doubt, as there are still fewer ethnic minorities in many cases, so they would get selected less…
> If you are illiterate, why should I feel like I’m going to learn something from you?
Literacy does not reliably correlate with intelligence or ability to contribute things worth learning; to suggest otherwise is pretty messed up.
For example, there are disabilities that impede literacy but do not impede intelligence. I personally know multiple incredibly intelligent people who have extreme difficulty conveying their ideas through text. I would much rather have someone convey more of their intelligent thoughts and ideas with poorer quality writing than force them to spend an unreasonable amount of time correcting unavoidable errors and thereby contributing less to any particular conversation.
Another example is that illiteracy can be the result of a lack of formal education. Some of the greatest insights come from those without (or with limited) formal education.
Yet another example: many languages, including English, have several dialects that, when written or spoken, will appear to have horrible spelling or grammar to those unfamiliar with said dialect.
There is another issue, and that is that there is no reliable way to determine if someone is using their first language or not without asking or having them deliberately divulge that information. This issue can lead to one assuming that poor literacy is not the result of using a second language but some other issue which you find to be indicative of a lack of intelligence.
Disregarding the contributions of these people will limit the diversity of the information sources you avail yourself of. Which is exactly the kind of problem that leads to one having messed up ideas like “literacy reliably correlates with intelligence and worth of intellectual contribution”.
I find a more reliable way to determine if someone has something of value to contribute is whether or not they espouse messed up ideas like the one you just did. That said, you have contributed multiple things of value, which I accept, because I know that absolutely everyone has things of value to contribute. Though some make less effort and fail to act in good faith compared to others.
– Nicely done, very well said.
Call me crazy, but maybe, just maybe, they should make a person crop it. Say, the person posting the tweet? It only takes a few seconds, and the human is aware of a little thing called context. You could even call this “crowdsourcing” to please the investors.
You are probably not crazy but investors will be far less pleased with this workload that is considerably higher than one might think. It’s night and day having an algorithm do this from a cost, time, workflow and theoretically even accuracy perspective. I don’t know that most people fully understand just how many posts are happening per second at this scale.
Agreed with the point of just scaling the photos. If I post a photo, it is up to me to decide what is in it or the proportions/quality/etc, not some out-of-control algorithm.
As for the rest, even we (as humans) are not that good at understanding and quantifying human sensitivities, so those neural network recognition errors should be treated as what they are: just bugs, either from a less-than-good dataset or from imperfect coefficients. Not as something to try to earn victim points from.
Let the user choose which part of the picture to show. Using AI for that is just a waste of resources.
Absolutely this!
+10. Most of the problems we create are just from trying to cater to people’s laziness, usually trying to gain market share while at it.
The problem for Twitter would be that fewer images a second would be uploaded, and they really do want to harvest the geotagging metadata from the EXIF in most smartphone photos left at default settings. They want people to turn off their brains, not force them to think.
“Think of how stupid the average person is, and realize half of them are stupider than that.” – George Carlin
Twitter is not doing anything just to be nice. They are just like Google, Amazon, Facebook, Yahoo, Microsoft, and Apple: stockpiling metadata, because it is cheap to store and in 5 to 100 years’ time will be indescribably valuable. Cradle-to-grave metadata profiles; they will even track your death if they can.
My favorite example from a long time ago was the military training an AI to find tanks in pictures. They got really good at it until they gave the AI new pictures and then it was unpredictable.
Turns out that on the training set, all the pictures with tanks were taken on sunny days and the rest on cloudy days.
This was from a PBS documentary called “The Machine That Changed the World.”
I still watch that show from time to time just to marvel at the things that they got right 30 years ago (and some are now much, much worse than they suggest) and how primitive and backwards they were at the time (their episode on the global online community was about Minitel, and dial-up).
If the users can crop their text to fit the Twitter character limits, they can learn to crop their pictures as well.
Well, I can speculate a few dozen hypotheses that do not attribute ‘racist’ attitudes to their owners, too. And frankly, that is not the issue. Twitter is a company made to make profit; trying to somehow squeeze in some metric of ‘fairness’ or ‘equality’ and be confident in its worth is a challenge of its own.
For example, most users are not African American; the populations of the countries most active on Twitter exemplify this.
Twitter gains profit by engaging users, thus clicking on a (cropped) image is something the platform would like to maximize.
I assume the cropping algorithm is designed (through its loss function) to crop them in a way that maximizes clicking rate. And who is the one who clicks the most? You guessed it: mostly white, non-black users. Hence, we can conclude (assuming the above hypotheses hold to a good extent) that indeed, the problem is far worse than the algorithm, and is ingrained in the institution’s best interests (profit).
So the question is: is it really a technical problem or an institutional one? How is this institution likely to react? Maybe the trend of calling tech “racist” and such is one more way to distract from the fact that these issues exist, have always existed, and are reinforced either by non-racist objective conditions (the fact that African Americans are a minority, and thus that equal grounding is less cost-effective) or by a latent systemic issue that requires a lot more thought and reform than critiquing the seemingly racist behaviour of an algorithm.
I find the automatic cropping of images upon upload very irritating; it often perfectly crops out the most important part of the image, or removes the context. You should always be presented with a crop box or a choose-thumbnail-style window when uploading an image. Why would you ever use AI for that when cropping in a web UI has been commonplace for over a decade?
I’m beginning to see how the Human vs Robot war begins. Our “Natural Intelligence” will always detect some imagined implicit bias in the “Artificial Intelligence”, no matter how unintentional it is.
I can imagine a world with educational AI devices cheap enough for every child to have one, but they’ll have to be smart enough to detect the natural differences in individual thought processes, as well as be able to handle irrational, or emotional responses effectively, and be able to determine causal differences like trauma or mental illness. But it must be able to imbue trust in the student, and be able to detect and explore any perceived flaw, whether real or imagined by the student, and explain itself. It will have to be friend, teacher, and psychologist all in one. Lofty goals.
I like the idea of just letting the user be in charge of what is in the image, and scale it correctly. And if it isn’t scaled right, then twitter just scales it to fit.
Nevertheless:
AI and computer learning are still in their infancy, and the only way for them to grow is to continue to be used and updated as time goes on. So, yes, there are going to be errors as we go through this learning curve; and yes, in this case we need to make it better at recognizing people, animals, plants, and objects, but I also think that people should just get a life and stop applying human bias to a computer. Enjoy the ride and laugh at the mess-ups on the way.
The real problem is, it’s not really an AI at all. It would pick a lemon instead of an avocado, because it’s simply a computer program using contrast.
Considering the fact that a few changes to pixel values can make a NN see a panda as a monkey, I am pretty sure there will always be false classifications, and if you don’t take the Google route and stop your NN from ever producing outputs that might in some cases be racist, I don’t see how this problem can be solved.
By improving classifiers and reducing false positives you can get fewer of those results, but as with any real-world process there will always be noise and randomness, and sooner or later this randomness will get you and will ruin your day.
Maybe it is time to think about whether a computer calling a human a monkey should spark the amount of outrage that it did.
I am not saying the result shouldn’t be looked at to figure out what went wrong, but just in the same way a plane being called a human would be analysed. And calling a human a plane is objectively “much more wrong” than calling a human a monkey (we share a common ancestor after all). Just because one is much more offensive than the other should not make it worse (in terms of loss function).
Right you are, and making the function ‘worse’ on calling a human a monkey than on calling them something stupid, because that mistake is offensive to some, doesn’t help in the long run. You have to feed these networks useful feedback so they tend towards better results effectively. It can’t possibly learn correctly if you keep futzing with it because that near-miss mistake is unacceptable.
Reminds me of the 35mm film industry. They said the same things about their biased films, and how it was scientifically impossible to show dark skin tones correctly, until the chocolate companies complained that the photos of light chocolate were too dark and furniture companies had similar problems with different wood types. They listened to them, solved the color range issues, and released the more expensive dynamic range films. On the box of the other, affordable films that worked best with lighter skin, they wrote “normal skin color”.
Look it up.
I think the “much worse” part of this problem is actually if they are using click-through rates to train their neural network. That is, you select the top two (say) crops, and randomly A/B test them against users. The one with the higher ‘engagement’ (probably not ‘click on image’ because that probably correlates to a *bad* crop, but replies/retweets/amount of time looking at that particular tweet before scrolling past) gets fed back into the network as the ‘better’ crop.
If that sort of algorithm is at work, twitter’s crop is bad precisely because it is reflecting an underlying unconscious racial bias in its readership. That’s pretty hard to correct, and as the youtube recommendation algorithm problems demonstrate, that sort of bias tends to be self-reinforcing.
So in the scale of badness, I ‘hope’ that this is simply inadequate training data on non-white faces. That’s the easiest problem to fix.
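The feedback loop speculated about in this comment can be sketched with a little arithmetic. Everything here is hypothetical: nobody outside Twitter knows whether engagement is used this way, and the audience shares and engagement rates are invented. The sketch only shows that a tiny per-user preference for the familiar, multiplied by an uneven audience, produces a consistent A/B winner.

```python
def expected_engagement(crop_group, audience_share, p_familiar, p_unfamiliar):
    """Average engagement rate for a crop showing a face from crop_group."""
    total = 0.0
    for group, share in audience_share.items():
        rate = p_familiar if group == crop_group else p_unfamiliar
        total += share * rate
    return total


# Hypothetical audience split and engagement rates (11% vs 10%).
audience = {"majority": 0.8, "minority": 0.2}
scores = {
    group: expected_engagement(group, audience, p_familiar=0.11, p_unfamiliar=0.10)
    for group in audience
}
winner = max(scores, key=scores.get)
print(scores, winner)
```

With these numbers, the crop showing the majority-group face wins the comparison on average, and if that result is fed back as training signal, the next round starts from an already tilted model.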
If they are using that sort of feedback from the users to suggest which is the better crop, ‘racism’ is a certainty, as the population you are receiving feedback from isn’t even close to an even split. And people tend to naturally lean towards things that look more familiar, which is perfectly understandable and not at all racist in itself. But it will always bias the results towards whichever face looks most like most of the users…
I’m not even sure if I think that is wrong… It’s not at all racist as an algorithm in that case, nor is it bad that user feedback promotes ‘better’ images. But on the other hand, it’s hard to be different and feel ignored, and it doesn’t help create the appearance of balance or ‘correct’ the users’ subconscious into finding all skin types etc. familiar (which I am not sure is a good idea either: the idea of computers deciding and pushing automatically somehow seems worse to me than if a human is doing it).
Maybe the crop they show you should be selected based on your own profile – which images do you linger over.. So the computer shows you what you apparently want to see even if you are a minority… All that user tracking and preference data that can never be misused… right?…
If they’re using click through rates, that might explain cases where the crop algorithm focused on women’s chests…
Nice title in terms of click bait coefficients. But the article actually invalidates it. All the things mentioned are part of the algorithm. A more accurate title could be “it’s not the algorithm, it’s much worse … It’s the algorithm”.