Giving Stable Diffusion Some Depth

You’ve likely heard quite a bit of buzz over the last few months about Stable Diffusion. The new version (v2) is out, and in addition to the standard image-to-image and text-to-image modes, it includes a depth-image-to-image mode that can be incredibly useful. [Andrew] has a write-up that walks you through using this mode.

The basic idea is that you can feed both an image and its depth map into the model, which lets you control what goes where. Stable Diffusion can be a bit confusing, but we already have some great resources for wrapping your head around it. For the depth input, you can use a depth map from a camera with lidar (many recent phones include one) or have another model (like MiDaS) estimate it from a 2D picture. This becomes powerful when you want to preserve a specific composition, such as an iconic scene from a well-known movie. You can keep the characters’ poses on screen but transform the style of the scene into whatever you wish (as seen above).
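
If you want to try it yourself, the Hugging Face diffusers library wraps this mode in a dedicated pipeline. Below is a minimal sketch; the model name and arguments follow the diffusers documentation at the time of writing, so double-check the current docs before copying.

```python
# Sketch: depth-guided image-to-image with Stable Diffusion v2's depth model.
# If no explicit depth map is passed in, the pipeline estimates one with MiDaS.
import torch
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("iconic_scene.png").convert("RGB")  # hypothetical input

result = pipe(
    prompt="the same scene as a moody watercolor painting",
    image=init_image,
    negative_prompt="blurry, deformed, low quality",
    strength=0.7,  # how aggressively the original RGB content gets repainted
).images[0]
result.save("restyled_scene.png")
```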

We have already covered a technique for generating textures right in Blender, and this new depth information has already been put to work there to make those textures more accurate.

[Justin Alvey] used it to create architectural photos from dollhouse furniture. He estimated depth with the MiDaS model and threw away the RGB content entirely by setting the denoising strength to maximum. The simplified dollhouse furniture was easily recognizable to the model, which helped produce great results.
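
In terms of the pipeline sketched above, that trick amounts to pushing strength all the way to 1.0 so that only the depth structure survives. (This snippet reuses the pipe object from the earlier sketch; the file name and prompt are our own stand-ins.)

```python
# Maximum denoising: none of the original colors survive, only the geometry.
from PIL import Image

dollhouse = Image.open("dollhouse.jpg").convert("RGB")  # hypothetical photo
result = pipe(
    prompt="modern living room interior, architectural photography",
    image=dollhouse,
    strength=1.0,  # discard the RGB content; keep only the estimated depth
).images[0]
```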

The only real downside is that the perspective keeps a rather dollhouse feel; changing the focal length and shooting from farther away helps. Overall, it’s a clever use of what the new model can do. It’s a fast-moving space, so this will likely be out of date in a few months.

 

55 thoughts on “Giving Stable Diffusion Some Depth”

  1. Still no discussion about the source material used to train the machine learning model, and how it affects those whose source material it is.

    Copyright and associated licensing are quite important in our society, and not just for art; the same goes for any other kind of work, be it sound, video, source code, or even the compiled result of said source code.

    That none of these articles seems to even mention the legal ramifications surrounding the gathering of publicly available data for use in generative machine learning applications is frankly a bit concerning.

    There is an exception in the USA and the EU covering data mining for search engine optimization, and this makes it fair use on the grounds that it helps people find the source material, since the source material itself isn’t provided beyond saying where it is and how relevant it is to what one tried to find.

    This fair use exception of copyright is often stated as the reason why any machine learning system can use publicly available data in its training dataset without prior permission. (the EU copyright directive uses a different term for fair use, but in essence it is the same thing with more or less the same exceptions.)

    Yet generative machine learning systems creating content of their own don’t help people find the original; they directly compete with the author of the original. So this ground for fair use seems misapplied here, to say the least. (And ignoring the reason an exception exists, wherever convenient, is quite legally dangerous; by that logic we wouldn’t be far from the castle doctrine allowing indiscriminate murder.)

    At least OpenAI’s/Microsoft’s Copilot system made for GitHub currently has a lawsuit against it, so some discussion is happening on that front. But that is frankly not bringing much light to the issue at large, mainly since Copilot has many cases of making exact copies of people’s source code, comments included, so that is somewhat decent grounds for calling it copyright infringement. Visual art is more nuanced, however.

    The rather distinct difference between:
    1. An ML system used for categorization/analysis of data (mainly content recognition), and
    2. An ML system used to generate content (art/audio/text/code/etc.)
    is something we should probably discuss more often, especially as far as copyright/licensing of source material for the training dataset is concerned.

    Personally, I am of the opinion that any ML system used for content generation should explicitly rely on authorized/licensed data for its training dataset. However, Pandora’s box has already been opened.

    1. I think the copyright violations happen at the point of distributing the training datasets to the public. Any other information gleaned off of the pictures is not subject to copyright as it is not the original pictures nor replications thereof.

      Re-creating (parts of) the original images then becomes a copyright violation again, but mainly if they are distributed to the public. A private copy is of course technically in violation (though exempt for legally downloaded content), but who’s going to check?

      I personally think that copyright is outmoded, obsolete, and ill-conceived in the first place. It is mainly aimed at securing distributor profits, not “rewarding” authors for creating new works, although that is a side effect when the author is established enough to act as the distributor or otherwise secure a significant percentage of the profits. That only happens for the really famous and rich authors; the rest actually suffer, because people are forced to allocate their spending to the few top artists with inflated prices, and so become reluctant to pay any more to the smaller guys.

      I believe that creative effort has its value and price, but that price cannot be determined by the number of copies; otherwise the “value” of any creation is in principle “infinite”, which makes no sense and has no connection to the effort and labor the author put out. The brouhaha about copyright exists simply because everyone wants to gamble on becoming a superstar and earning millions, whether they truly deserve it or not, and the biggest supporters are the labels and distributors that would see their monopoly profits vanish into thin air.

      1. To the question of, “how then do you fairly compensate the authors?”, there is a simple answer. Think about it: how do I keep my favorite artist in business, making more of the stuff I like?

        You pay them money. Doesn’t matter who copies it or who distributes it, you have to pay them money or they go out of business. You subscribe to their Patreon feed or whatever it takes and ignore the whole copyright mess entirely. The artists in turn have a responsibility for accountability, so they’re not pulling in money and “donations” for simply sitting on their asses.

        1. Counterpoint: From what I can tell, most people who are into AI art see it as an ALTERNATIVE to digital artists, not a supplement to their career. Those people will not WANT to pay the artists whose work they crib from. So unless people convince the government to legislate royalties for the use of their art in these datasets, this isn’t going to happen.

          Also, if you think artists are going to “sit on their asses” en masse while being paid for AI art, you are seriously, bafflingly misinformed as to why they got into digital art in the first place.

          1. “From what I can tell, most people who are into AI art see it as an ALTERNATIVE to digital artists”
            This has not been the case in my experience as an active participant in several generative art communities. “AI art” (to use your phrasing) is a new tool that makes creating art more accessible. Realizing a picture you have in mind through SD requires human effort, in as much as capturing a photo does. The more work you put in, and the more knowledge and experience you have in the field, the better result you’ll get out. Photographers haven’t been put out of the job because of smartphone cameras, but the standard expected to be considered “professional” has absolutely been raised.
            If artists choose to integrate this technology into their workflow, it’ll only elevate their potential. If more people become artists, even if they only do it as a hobby, that’s a good thing.
            “Those people will not WANT to pay the artists whose work they crib from.”
            No more than any other artist who used someone else’s art as inspiration. “Cribbing” would fall under plagiarism if they claimed work too heavily “inspired” as their own, and the same standard is true of someone tracing another’s art. Fan art is NOT plagiarism, however, and it is the lifeblood of the indie art community. If you’d like to argue that fan art is unethical, that’s your prerogative. Most don’t consider it that way.

          2. I’m not talking about artists using AI to generate art. I’m talking about in general, because I see many people abuse their fanbases on services like Patreon, where they beg and plead and make up all sorts of excuses and only throw in minimum token effort – basically performing online panhandling. It’s a bit sad to see people fall for that, and there should be better platform models than just “Pay me monthly and I’ll do whatever I want.”

          3. > If more people become artists, even if they only do it as a hobby, that’s a good thing.

            The real (economic) value of anything is the amount of labor that is necessary to produce it. That is, it should only cost as much as is necessary to have someone produce it. AI simply makes decent quality production cheaper by enabling more competition in the field, instead of a few experts who are constantly overworked and priced up to the point that 99.9% of the potential customers simply go away. There’s a lot of latent demand because there’s no point in commissioning art when it costs a thousand dollars a piece.

            There will still be a market for completely original works, concept work, etc. since the AI can’t make what doesn’t already exist.

          4. > The real (economic) value of anything is the amount of labor that is necessary to produce it. That is, it should only cost as much as is necessary to have someone produce it. AI simply makes decent quality production cheaper by enabling more competition in the field, instead of a few experts who are constantly overworked and priced up to the point that 99.9% of the potential customers simply go away.

            You have the wrong idea about the valuation of digital art. If the artist is doing their work as a full- or part-time job, as opposed to being a hobbyist, they have to receive payment that covers the cost of living. At the risk of sounding dramatic, artificially driving down the price of digital art is equivalent to taking food out of their mouths.

            I agree there will still be a demand for human-made art, but it concerns me how cavalier some people are being about the *livelihoods* of these artists, especially when the contention starts with copyright infringement. That includes people who put in “minimum token effort.” If you don’t want people to pay their Patreon fees, then say so, but stealing IP is not a morally correct solution.

          5. > This has not been the case in my experience as an active participant in several generative art communities.

            Well, it’s been my experience in the non-generative art community. I don’t know what to tell you. I mean, I would love it if it weren’t the case, but it can’t just be ignored.

            > Fan art is NOT plagiarism, however, and it is the lifeblood of the indie art community. If you’d like to argue that fan art is unethical, that’s your prerogative. Most don’t consider it that way.

            Fan art doesn’t come at the expense of devaluing the original artwork. AI art does, either by loss of commission, or by directly monetizing the derivative work, or both. I mean, imagine what will happen if we get to the point where people could pump out entire AI-generated episodes of some animated show like SpongeBob SquarePants, and toss them onto YouTube for unsupervised kids to watch instead of the actual show. Nickelodeon would be fully within their right to sue for that, wouldn’t they? So why not start treating AI-generated “fan art” the same way?

          6. “Well, it’s been my experience in the non-generative art community.”
            Perhaps in a community that feels threatened and wronged by another group (whether the feeling is valid or not), the perception of that other group might be a bit biased – that’s all I’m suggesting.
            “AI art does, either by loss of commission, or by directly monetizing the derivative work, or both.”
            The AI doesn’t work for itself. The “loss of commission” is either due to the person who might’ve otherwise commissioned the art doing the work themselves, or another digital artist using generative art. Competition. Fan art by commission is already literally “directly monetizing the derivative work”. Whether it’s by a digital artist using a Cintiq and Photoshop, or a digital artist using Stable Diffusion and Krita, those artists are directly monetizing derivative work.
            Also, knockoff cartoons on YouTube have been a huge issue for years. YouTube “kids” content is rampantly abused with production-line/generated content, and has been since long before generative art became accessible.

          7. > Perhaps in a community that feels threatened and wronged by another group (whether the feeling is valid or not), the perception of that other group might be a bit biased – that’s all I’m suggesting.

            “Biased” and “incorrect” are not synonymous.

            > The “loss of commission” is either due to the person who might’ve otherwise commissioned the art doing the work themselves, or another digital artist using generative art. Competition.

            “Competition” and “intellectual property theft” are not synonymous.

            > Fan art by commission is already literally “directly monetizing the derivative work”.

            Which is illegal without express permission from the copyright owner.

            > Also, knockoff cartoons on YouTube have been a huge issue for years. YouTube “kids” content is rampantly abused with production-line/generated content, and has been since long before generative art became accessible.

            Not just “knockoffs,” I’m talking *visually indistinguishable.* And I’m aware of the problems with Youtube Kids. I am not going to be any more accepting of the situation at the point when AI art is capable of exacerbating the scale of the issue. You’re not answering the point when you say the problem already exists in a different form.

          8. >are not synonymous.
            Is not a counter-argument.

            >Which is illegal without express permission from the copyright owner.
            Just to be clear, are you suggesting that ANY fan art by commission WITHOUT the copyright owner’s permission is illegal?

            >I am not going to be any more accepting of the situation at the point when AI art is capable of exacerbating the scale of the issue. You’re not answering the point when you say the problem already exists in a different form.
            My answer is that the problem isn’t related to the tools involved. It will always be the human committing the violation. If we were to outlaw tools that could be misused, where exactly should we start? Photoshop and other digital photo editing suites vastly simplify the process of removing watermarks or manipulating copyrighted works for derivative use, should they go too? Give it a couple minutes thought and see how many tools we take for granted that CAN be misused, and the implications of trying to build a legal framework to regulate those.

          9. >Is not a counter-argument.

            Neither is yours, that’s my point.

            > Just to be clear, are you suggesting that ANY fan art by commission WITHOUT the copyright owner’s permission is illegal?

            They are only legal if they’re distributed for free. You can’t require people to pay for it. This is established law, look it up.

            > My answer is that the problem isn’t related to the tools involved. It will always be the human committing the violation. If we were to outlaw tools that could be misused, where exactly should we start? Photoshop and other digital photo editing suites vastly simplify the process of removing watermarks or manipulating copyrighted works for derivative use, should they go too? Give it a couple minutes thought and see how many tools we take for granted that CAN be misused, and the implications of trying to build a legal framework to regulate those.

            This is not that difficult a question. It’s not about what the tool can do, it’s about whose work it’s being used on. Stable Diffusion and other algorithms should not be trained against works they don’t have permission from each respective artist to use. It’s not ethical. If the algorithm is “outlawed” AS A CONSEQUENCE of protecting people’s work from unauthorized use (rather than being targeted solely as a tool), then so be it.

          10. >Neither is yours, that’s my point.
            1. My point was simply that you’re opining on the perception of a community you are explicitly not a participant in, and in fact are openly hostile to, and that it might be worth considering whether your opinion is biased.
            2. You seem to have completely misunderstood, or perhaps intentionally misinterpreted, my statement. All I said was that the “lost commissions” would in fact not be lost at all, the work would just be done by someone else. I wasn’t even discussing IP at all, which would remain the same in either case – if the person has permission from the IP holder, it’s legal. If not, it’s not. No matter how the work is done.

            >They are only legal if they’re distributed for free. You can’t require people to pay for it. This is established law, look it up.
            Since you asked, I did. You’re wrong. Even distributing it for free isn’t legal. https://youtu.be/xKBsTUjd910

            >This is not that difficult a question. It’s not about what the tool can do, it’s about whose work it’s being used on. Stable Diffusion and other algorithms should not be trained against works they don’t have permission from each respective artist to use.
            Agree to disagree. Publicly published content is publicly published. Using it in a way that is legal and transformative, and not copying or redistributing it in any way, is not the binary ethical scenario you claim.

          11. > My point was simply that you’re opining on the perception of a community you are explicitly not a participant in, and in fact are openly hostile to, and that it might be worth considering whether your opinion is biased.

            It is biased. I don’t care. If you want to prove me wrong take a survey or something instead of claiming your anecdotal evidence proves that YOUR perception is the correct one. You’re biased too, don’t claim you aren’t.

            > You seem to have completely misunderstood, or perhaps intentionally misinterpreted, my statement. All I said was that the “lost commissions” would in fact not be lost at all, the work would just be done by someone else.

            It’s lost to the person who did the work by hand, and, dare I say it, the person that deserves it more. I am not talking about the entire concept of economic demand, I am looking at individuals. The capitalist idea that a person is more deserving of a certain amount of income only because they can take advantage of someone else’s output doesn’t fly with me, especially when it comes to artistic pursuits.

            > Since you asked, I did. You’re wrong. Even distributing it for free isn’t legal. https://youtu.be/xKBsTUjd910

            Touché. But this doesn’t help your point. You asked if I was “suggesting that ANY fan art by commission WITHOUT the copyright owner’s permission is illegal,” and now we know the answer is yes, so there.

            The majority of fan art passes notice by the copyright holders due to the implicit agreement that unmonetized works don’t get penalties. But artists DO see AI art generated without consent as equivalent to “plagiarism” because generally the idea is to replicate the stylistic qualities of the original works without actually having the artist craft it. If that aforementioned “implicit agreement” isn’t present for AI art, what’s to stop the original artist from suing the people who are either making the derivative works or maintaining the dataset?

            > Agree to disagree. Publicly published content is publicly published. Using it in a way that is legal and transformative, and not copying or redistributing it in any way, is not the binary ethical scenario you claim.

            If the idea is to approximate the aesthetic qualities of the original work, it doesn’t meet my definition for “transformative.” And what I’m arguing is that it SHOULDN’T meet the legal definition either. Not because I’m arguing that IP law should protect “style” as a concept, but because on an ideological level, the intent is to commit forgery and gain income that would otherwise go to the original artist.

          12. Let me clarify something in my last comment since I can’t edit it: When I say “the idea is to approximate the aesthetic qualities of the original work,” I mean specifically “doing so to such a degree that the original artist is seen as ‘replaceable.'” I am aware some fan artists make art which approaches the exact visual qualities of the works they are emulating, but if the idea is to do so as a means of SUPPLANTING the original work, that is what I see as morally wrong.

          13. >You’re biased too, don’t claim you aren’t.
            Of course I am. I’ll happily admit I have an opinion, and not a fact.

            >It’s lost to the person who did the work by hand, and, dare I say it, the person that deserves it more.
            The person who “did the work by hand” never did the work. Do you boycott assembly line products (the device you’re writing your reply on, for instance) because artisans missed out on the opportunity to build the device by hand? Do you boycott agriculture because hunter gatherers missed out on subsistence living?

            Do you boycott digital artists because they didn’t use artisanal paints on a hand-made canvas with a hand-made brush?

            I am not talking about someone commissioning a copy or imitation of someone else’s work, just the act of commissioning a piece of art in general. The “person who did the work by hand” didn’t do anything at all in this scenario – they were simply not commissioned.

            >But this doesn’t help your point.
            Actually, this reinforces my point. The IP violation happens independent of the tool used to do it. It doesn’t matter if someone paints a forgery by hand, by stylus or uses generative art – IP law applies the same to each case. Copying someone else’s work is equally illegal in any case.

            >But artists DO see AI art generated without consent as equivalent to “plagiarism” because generally the idea is to replicate the stylistic qualities of the original works without actually having the artist craft it.
            Generative art isn’t designed to replicate INDIVIDUAL STYLES. Can it be (mis)used to? Absolutely. But the intent is to learn from as many examples as possible to be able to create new output. If all you want to do is copy someone else’s work, you could literally copy the file and redistribute it. You could also trace the art. Both would be easier than accurately reproducing the original piece in Stable Diffusion (img2img is comparable to Photoshop filters; I’m only talking about txt2img, which is the main use case of generative art).

            >what’s to stop the original artist from suing the people who are either making the derivative works or maintaining the dataset?
            For copyright infringement? Nothing, if they violate copyright. For accessing publicly displayed works? The lack of a legal precedent.

            >If the idea is to approximate the aesthetic qualities of the original work
            But that’s not the idea at all. Generative models aren’t designed to replicate their input – as I’ve said before, there’s much easier ways to do that. They are designed to generate new content. That’s why they need big data for their datasets. The more samples used in training, the more diverse the model’s abilities.
            Going back to a tired example I used earlier, it doesn’t learn “Mickey Mouse”; it learns that the string “Mickey Mouse” is associated with rounded, cartoony, anthropomorphic mouse/human characters with black-and-white coloration wearing red clothing. Those are all stylistic elements that can be invoked independently and are the sum of all examples of, for instance, “red clothing” in the training dataset.
            There are techniques like finetunes, embeddings, hypernetworks or LoRAs that can be trained on a smaller set of images to better reproduce a specific style, character, pose, background, etc – but those are additional tools on top of the original generative model and not part of for instance Midjourney or Stable Diffusion itself. In the case someone created a finetune/etc to imitate a specific artist or character, they would be subject to the same laws as any other fan art.

            >the intent is to commit forgery and gain income that would otherwise go to the original artist
            Again, see above. Generative art is not intended to create forgeries or duplicate works from its training dataset – there are far simpler ways to do that – and being generative art doesn’t exempt it from any existing laws on copyright infringement.

          14. >The person who “did the work by hand” never did the work. Do you boycott assembly line products (the device you’re writing your reply on, for instance) because artisans missed out on the opportunity to build the device by hand? Do you boycott agriculture because hunter gatherers missed out on subsistence living?

            I mean, I’m certainly not in favor of sweatshops. There’s plenty about that sort of industry I think needs to be changed. With computers and phones and such, I accept the current situation in so far as I recognize a universal boycott of mass-produced products actually harms my chances of effecting change because of their ubiquity. Art doesn’t have that problem, I don’t think.

            >Do you boycott digital artists because they didn’t use artisanal paints on a hand-made canvas with a hand-made brush?

            >I am not talking about someone commissioning a copy or imitation of someone else’s work, just the act of commissioning a piece of art in general. The “person who did the work by hand” didn’t do anything at all in this scenario – they were simply not commissioned.

            They did all the work that came before, of creating images which compose the dataset, without which the AI would be less- or un-able to function. A digital artist’s work is easier than that of a painter, but they are not so crippled by the mere idea of relying on their own training. Look at the trees, not the forest. The artist who made the original work should be compensated. They were not “commissioned,” but their labor was still what enabled the derivative to be completed.

            >Actually, this reinforces my point. The IP violation happens independent of the tool used to do it. It doesn’t matter if someone paints a forgery by hand, by stylus or uses generative art – IP law applies the same to each case. Copying someone else’s work is equally illegal in any case.

            I’m not following why you think this supports your point at all.

            >Generative art isn’t designed to replicate INDIVIDUAL STYLES. Can it be (mis)used to? Absolutely.

            And one way or another we are going to have to deal with this. Why are you so averse to discussing it? You’re going on and on to me about all the things that are irrelevant to my argument. I am aware that there is some subset of people who use Stable Diffusion or other AI solely on their own work. But when the potential for misuse is INNATE to the technology in a way not shared by any prior medium, including Photoshop, we need to lay down ground rules. You say it’s not designed “to replicate individual styles” but it is designed to replicate. (Your Mickey Mouse example proves that they CANNOT generate “new content” because whatever they make is ONLY that which exists in the dataset, both conceptually and visually. A dataset which consists only of flowery meadows will not output the Millennium Falcon.) There should be protections in place for people who don’t consent to that, and compensation for those that do. The work being public, or the ways in which red clothing constitutes a dataset, are not adequate counter-arguments.

          15. >………….are not adequate counter-arguments.
            If you’re of the opinion that tools that can be misused should be banned, that hill is all yours to do with as you please. I’ve said my piece.

          16. Tools which are INNATELY DESIGNED TO ACCOMMODATE misuse. And for the record, I have not at any point brought up the idea of banning the concept altogether. I have argued that those whose work is used should be compensated, and that any “outlawing” that occurs will be the fault of dataset curators who either willfully or negligently fail to comply with such regulations. I don’t believe this to be an unreasonable request, “big data” and “diversity” notwithstanding, and even so, I don’t see progress in AI development as being more important or morally desirable than protecting artists’ work anyway. In my opinion, your equivocation of the two suggests to me an ulterior motive to which I am very much opposed.

      2. Basically, all the copyright royalties etc. are just bilking. I have to pay money because someone else is playing songs on the radio that I don’t care about, and that someone is paying because I’m listening to music I like, or looking at pictures I enjoy and they don’t. Even the price of bread is ever so slightly higher because the supermarket is playing Christmas jingles that are copyrighted. What’s the point in that?

        1. I won’t say that you are completely incorrect. But a bit short sighted perhaps.

          Copyright isn’t just about art. Or music. Or even video.
          Copyright is also for text, code, software in general, and so forth.

          Undermining the foundation of copyright in one area (for an example art) also undermines the foundation of copyright in other areas. (like software)

          Now, a lot of people don’t in themselves care about copyright, since it “doesn’t” directly bring them anything positive from it.

          But the thing is that most people that are affected by copyright infringement are also the people that put in effort in their respective fields. Enthusiasm doesn’t grow on trees, it grows on dependable means of continuing personal efforts in the thing one is enthusiastic about. And in our capitalistic world, that is money. Be it donations, or sales of services, or royalties, or employment.

          Without enthusiasts, a lot of development will not happen, since most people won’t consider it worthwhile to waste their time on. Especially if their discoveries get absorbed into some dataset and then mindlessly accredited to the AI overlords and their associated curator. (And yes, I couldn’t help but slip that meme in there…)

          The long-term ramifications of machine learning are so far debatable.
          But the general sentiment that generative ML systems using source material without authorization is “fair use” should probably be reconsidered.

          1. I learn from everything I see on the Internet on a daily basis. ML does that in an automated way, so I am not so sure about your copyright claims. We all get inspired by the work of others, and from that inspiration we make new tech, tools, and even art sometimes. Copyright is a very two-edged sword, and it certainly isn’t always used for the right reasons.

          2. >Copyright is also for text, code, software in general, and so forth.

            Copyright started from text. The original point was that publishers were granted a monopoly to print by the king, for which they paid money, hence, “royalties”. The king dictated who printed which book, and which books were not to be printed. Once that system went away, the publishers argued that the original authors had “natural rights” – which of course could be bought for a penny or else the author could not find a publisher to sell any books. It was from the very start a means of publishers to monopolize content – not about encouraging or fairly compensating the authors, and it is still mainly used for the same effect today.

            And, name a software patent that wasn’t about trying to trip your competition and monopolize some trivial idea, like the name of the Recycle Bin in your computer, or round cornered icons in your phone. Software wasn’t even considered copyrightable until cases like Apple getting a court decision to copyright their BIOS to kill mac clones off the market and frustrate attempts at making compatible hardware – again a clear case of legalizing a monopoly to jack up prices. Software patents are a highly contested idea in the first place.

            Of course the ability to monopolize creates “enthusiasm”, because it’s basically a license to break capitalism – which ideally does not have monopolies. The point of capitalism is that there is some need and demand, so prices will rise to the point that someone will do the job, and no more. If you want it, you have to figure out a way to pay for it. Copyrights don’t add anything to this, they merely enable the corporations that own the copyright to extract greater prices than necessary.

          3. Or, things like news outlets making copyright claims over Google scraping their content and bypassing the website. The whole problem is born due to the fact that these outlets depend on ad-revenue from visitors to their site, which they nominally offer “for free”, so they a) depend on Google to direct traffic their way, b) rely on Google to display ads on their sites. It’s their choice of a business model to just throw the content out there, no subscriptions or anything, and expect to get paid as per course.

            The issue is, funding a news site by ads bypasses the normal supply-and-demand calculation by again spreading the cost of the service over a public which largely isn’t even interested in the particular site, isn’t aware of how much they’re paying, or even aware that they’re paying anything at all. There’s no negotiation between the supplier and the consumers, yet the price of bread in your supermarket is ever-so-slightly greater by some never-heard newspaper in Timbuktu that gets ad revenue from Google, while Google sits in the middle and extracts profit out of this transaction. It has created a situation of over-supply where two thousand websites publish the exact same news with no original input and still claim monopoly over it.

            Again, what’s the point?

      3. “I think the copyright violations happen at the point of distributing the training datasets to the public”
        And this is the misunderstanding. Nobody distributes the training dataset. Huggingface keeps tags, metadata and URLs (https://laion.ai/blog/laion-5b/ – read under “Download the data”) of the images publicly available. If someone wants to build a pre-trained model off the laion-5b dataset, they can download the metadata/tags/URLs from huggingface, then download the images from the public-facing internet themselves. This is what happened when stability.ai trained Stable Diffusion.
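
        (To make that concrete: what’s hosted is a table of URLs and captions, not pictures. Here’s a rough sketch of peeking at that metadata with the Hugging Face datasets library – the repo name and column names are from memory and may have changed, so treat them as assumptions.)

        ```python
        # Sketch: the "dataset" is metadata only -- image URLs plus captions.
        from datasets import load_dataset

        meta = load_dataset("laion/laion2B-en", split="train", streaming=True)
        for row in meta.take(3):
            print(row["URL"], row["TEXT"][:60])  # where the image lives, plus caption
        ```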

        Once a model is trained, it DOES NOT CONTAIN ANY IMAGE DATA WHATSOEVER. It contains associations of tokens with noise patterns, and nothing else. If you downloaded stablediffusion2.0.ckpt for instance, none of the training dataset is contained in the pre-trained model.

        IP law doesn’t have an answer to this. Stability.AI accessed public facing web content, trained Stable Diffusion on it, and released the pre-trained model that DOES NOT CONTAIN any copyrighted material.

          1. “At the point when it enters the dataset in the first place.”
            You’re going to have to explain your reasoning on this. Accessing an image posted publicly on the internet by the IP owner isn’t a copyright violation in any jurisdiction I’m aware of. And if the image was posted without the IP owner’s consent, the violation happened when someone posted it without permission.

            The difference between a person using a web browser to click a link and a script loading the URL is not a legal differentiation, and would be a nightmare to legislate (Does a user have to manually click every single image file URL, completely breaking the user experience of the internet? What about content aggregation services like Twitter that show a preview of content on another platform? How much human agency would the law require to determine if the use was legitimate or not?). If you want to make an argument about legislating this, you need to actually consider what a law would require.

          2. > The difference between a person using a web browser to click a link and a script loading the URL is not a legal differentiation

            The difference is intent. That much should be obvious. “Dude” said no original content was being distributed against copyright, but the internal process of using said art, to produce a work visually similar to the original, can still be prosecuted if the person generating the derivative work attempts to make a profit off of it. This is why fan artists are legally obligated to distribute fan art for free, even though in most cases the visual difference is even more apparent compared to AI art.

    2. “fair use on the grounds that it helps people find the source material since the source material itself isn’t provided”
      And here’s where your argument falls down. Diffusion models like SD do not store any of the source material, and the source material is NEVER given to the end user as an output. The laion-5b dataset is roughly 240 TB of 384×384-resolution images. The compiled models are typically between 2 and 8 GB. It is literally impossible for them to contain all the source material, or even a significant fraction of it.
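
      (As a back-of-the-envelope check: laion-5b holds roughly 5 billion images, so a 4 GB checkpoint works out to about 4×10⁹ ÷ 5×10⁹ ≈ 0.8 bytes per training image – not enough to store even a single pixel of each, let alone a thumbnail.)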

      A more accurate comparison is that a pre-trained diffusion model has “learned” that a token is associated with a specific noise-reduction pattern. The tokens for “Mickey Mouse” might be associated with patterns like round shapes, black and white colors, mice, humans, gloves, red suspenders, etc. How CLIP parses tokens is a subject in its own right, but the point I’m trying to make is that the pre-trained model doesn’t know what Mickey Mouse is – it simply associates those tokens with styles, colors and shapes. Reproducing a source image exactly is for all intents and purposes impossible – you could carefully and deliberately prompt one exceedingly similar to the original, but it would never be 1:1 accurate because it is a *new* image (unless the original image is also AI-generated using the same model).
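
      (You can see the token step for yourself: the CLIP tokenizer is openly available. A small sketch using the tokenizer from SD v1’s text encoder – SD v2 uses a newer OpenCLIP variant, but the principle is the same. The point isn’t the specific IDs printed, only that the prompt becomes integers, not images.)

      ```python
      # Sketch: a prompt is reduced to integer tokens before the model sees it.
      from transformers import CLIPTokenizer

      tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
      ids = tok("Mickey Mouse wearing red suspenders")["input_ids"]
      print(ids)                             # a list of integer token IDs
      print(tok.convert_ids_to_tokens(ids))  # the sub-word pieces they map to
      ```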

      I understand the concern about “stealing jobs” and “devaluing work”, but legally and technically this is actually more abstracted than arguing photographers are stealing the work of landscape and portrait artists. IP law is not set up for this and absolutely needs to change, but the people discussing it need to actually understand the technology and not be acting on ignorance or intentional misinformation to create GOOD policies.

      1. I have never stated that the ML system contains the source material.

        What I have stated is that data mining for search engine optimization and content recognition has a fair use exception under US copyright law. (And the EU copyright directive) And yes, Google among others have been sued on this before, and web crawling is in itself “illegal” if a site states that web crawlers aren’t welcome.

        But doing data mining for a generative ML system is likely not falling under the definition of search engine optimization, nor content recognition.

        One could argue, though, that most websites perhaps shouldn’t let a random IP address just access a URL for a file unless said IP address either has a valid session cookie or has otherwise expressed fair dealing. (Especially if the “uninvited” visitor pesters the servers with many requests – but a lot of anti-DDoS services tend to limit this.)

        “I understand the concern about “stealing jobs” and “devaluing work”, but legally and technically this is actually more abstracted than arguing photographers are stealing the work of landscape and portrait artists. IP law is not set up for this and absolutely needs to change, but the people discussing it need to actually understand the technology and not be acting on ignorance or intentional misinformation to create GOOD policies.”

        This paragraph I can strongly agree with.
        A lot of court files I have read on the subject are often highly skewed towards ML systems being the best thing since sliced bread.

        I think this court file is particularly egregious: https://www.uspto.gov/sites/default/files/documents/OpenAI_RFC-84-FR-58141.pdf
        It is skewed in OpenAI’s favor to the point that I wonder if someone bribed the US Patent and Trademark Office. At least it has the benefit of the doubt, since it is from 2019. But not much legal development has happened since.

        At least there is a new lawsuit against GitHub’s Copilot (which is also made by OpenAI), since apparently the open-source crowd doesn’t enjoy seeing Copilot spit out their GPL-licensed code (word for word, comments included) to the commercial sector, among others. (Copilot doesn’t even attempt basic clean-room practices when it comes to “copying” code.)

        1. There are laws on “scraping”, and robots.txt has existed for decades. To the best of my knowledge, in neither the US or EU is the intended use of the scraped data relevant to the legality of it. Whether it’s to create a database for a search engine or train an ML model makes no difference, legally. Copying and distributing copyrighted content is absolutely illegal, accessing and viewing it is not.

          The content used in laion-5b is publicly accessible and scrapable without ignoring robots.txt or putting unnecessary burden on the hosting servers.

          This isn’t a case of the Laion Project or Stability.AI breaking the rules or even being immoral – it’s a case of people publishing their content publicly without understanding the implications of it. You won’t find content stolen from behind paywalls or protected members-only areas in the dataset.

          Also, the file you linked is OpenAI’s response to the USPTO’s “Request for Comments on Intellectual Property Protection for Artificial Intelligence Innovation”. It’s literally OpenAI. Of course it sounds like it was written in OpenAI’s favor. The USPTO made an open request for anybody to submit their comments. You could’ve submitted something too. https://www.federalregister.gov/documents/2019/10/30/2019-23638/request-for-comments-on-intellectual-property-protection-for-artificial-intelligence-innovation

        2. As far as I’m concerned, whatever content you choose to put out there on the wide-open internet is fair game. People simply use boilerplate copyright claims to avoid “abandoning” their works in the legal sense (you may forfeit your copyright if you publish before formally claiming it), but what they are effectively doing is exactly that: throw the hook out for free, then come back to pull a copyright claim if someone takes the bait.

          1. To elaborate: all your original works are automatically copyrighted to you (except when “working for hire”), but if you throw one out there with no indication of copyright, without filing a formal copyright claim in advance, it can be construed as abandonment – as if you had effectively CC0’d it.

            Formal copyright claims take time and money, so people don’t do it for the bulk of their content. So instead of putting a (C) on every picture, websites use boilerplate text like “all content are copyright to X” etc. which is usually incorrect and there’s no way of telling directly which image belongs to whom, or to nobody.

            So, people are doing what is basically abandonment of copyright, and then come back saying they didn’t whenever someone uses their freely published content in any way.

          2. Nope. Copyright is yours when you create content.

            The only way for someone else to legally use your content is to get your permission.

            In other words:

            No explicit permission means you can’t use it.

            All the copyright marks people put on stuff are irrelevant and redundant. There is no requirement to register your content anywhere. You made it; it is yours and no one else’s.

            Anyone using content made by someone else must provide proof that they have permission to use it.

            That’s the way it works – or is intended to work, at least.

            Way too many people think “finders, keepers.” It ain’t so, not by a long shot.

          3. This doesn’t involve copyright in any way. Copyright doesn’t restrict people from accessing PUBLIC content – ONLY DUPLICATING, DISTRIBUTING and MONETIZING it without authorization. At no point does the process of training a diffusion model duplicate, distribute or monetize the copyrighted content. If you put up a mural in public, and someone sees it, that’s not a copyright violation. They can even legally take a photo of it; that still doesn’t involve copyright law. They can paint a painting of your mural. Still not a copyright violation. If they attempted to sell the photos or paintings they made of your work without your authorization, THEN you would be able to argue a case of copyright infringement.

            Copyright is not designed to restrict inspiration. It only applies to replication. Trademark protection extends beyond that to things that could be “confused for” other things, and patent protection is used for protecting concepts. None of this is relevant to publicly accessible content being viewed/consumed by an ML system.

          4. >The only way for someone else to legally use your content is to get your permission.

            There is copyright abandonment, which normally happens by “explicit permission” as you said, but can also happen when the author fails to register copyright AND has published the work without any attempts at claiming copyright over it. In other words, if the author has made no contest of the public use of the work in the past, but suddenly decides to call copyright on someone, they can defend themselves in court by saying that the author has by their inaction permitted the public use of the work and therefore abandoned their copyright.

    3. People are allowed to look at existing material and form concepts, stable diffusion is the same. It’s far more transformative than many other projects which have used the definition to deflect copyright issues.

    4. “Even if generative machine learning systems creating content of their own don’t help people find the original, but rather directly competes with the author of the original.”

      There is a common misconception that AI-generated images rely on ‘an original’, under the apparent assumption that an AI-generated image is made up of snippets of source images in the same way that a human artist photobashes an image from source images. That is not how AI image generation works.

    5. Yeaaaahhhh all of this can just be no.

      Artists are inspired by each other all the time. They copy styles. Recreate styles. Meld styles with their own. All of this is done frequently and often. Shockingly, when an artist looks at a painting and decides to do some of their own work in that style, they don’t have to send a check to the original artist. What justification is there for why a human is allowed to be inspired by something, but an algorithm – which we all agree is not just copying – cannot freely be inspired by that same work?

  2. It’s an interesting topic. The major difficulty for the AI is the fact that it lacks “grounding”.

    In linguistics this refers to the idea that words need to be “grounded” on a lower abstraction level to have any meaning at all – otherwise all words are just circular references to each other. You can look up in a dictionary the meaning of a word, and all you get is a synonym or a paragraph where each word is also pointing to other words in the same dictionary – you’re stuck in a loop because nothing really explains what any of the words mean. The fact that we know anything at all is because we have direct experience of what at least some of the words mean.

    For the AI, every picture it sees is just data. Seeing a picture tagged as “cat”, it does not know what in the picture actually is “cat” – it treats all of the data as relevant: it is not grounded. Having a depth map to separate picture elements is one step on the way to giving the algorithm some sense of what it is actually seeing, in that it can separate objects in an image more easily than by simply comparing a million pictures and finding what’s common between data labeled as “cat”.
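
    (Getting such a depth map is cheap these days. Here’s a sketch of single-image depth estimation with MiDaS via torch.hub, following the MiDaS README – the model and transform names may have changed since.)

    ```python
    # Sketch: estimate a relative depth map from one RGB image with MiDaS.
    import cv2
    import torch

    midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
    midas.eval()
    transforms = torch.hub.load("intel-isl/MiDaS", "transforms")

    img = cv2.cvtColor(cv2.imread("cat.jpg"), cv2.COLOR_BGR2RGB)
    batch = transforms.dpt_transform(img)  # resize + normalize for the DPT model

    with torch.no_grad():
        depth = midas(batch)  # higher values are closer to the camera
    print(depth.shape)        # one depth value per (downscaled) pixel
    ```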

    Does that however work to “ground” the algorithm in the same sense as what we do with words?

    1. A typical way to make a content-recognition ML system is to scan pixel by pixel through the image, looking at the neighboring pixels around the currently scanned one. From there the ML system uses this neighborhood to generate a value for how certain it is that the scanned pixel is on the object it is trained to detect. So one more or less gets a heat map of where the object is in the image.
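
      (What’s described here is essentially a small fully convolutional network. A toy PyTorch sketch – ours, with made-up layer sizes – of a net that turns an image into a per-pixel confidence heat map:)

      ```python
      # Toy per-pixel detector: each output value is the net's confidence that
      # the corresponding input pixel lies on the target object (a heat map).
      import torch
      import torch.nn as nn

      heatmap_net = nn.Sequential(
          nn.Conv2d(3, 16, kernel_size=3, padding=1),   # look at neighboring pixels
          nn.ReLU(),
          nn.Conv2d(16, 16, kernel_size=3, padding=1),  # widen the neighborhood
          nn.ReLU(),
          nn.Conv2d(16, 1, kernel_size=1),              # one confidence per pixel
          nn.Sigmoid(),                                 # squash to the range [0, 1]
      )

      image = torch.rand(1, 3, 64, 64)  # a dummy RGB image (batch, channels, H, W)
      heat = heatmap_net(image)         # shape (1, 1, 64, 64): the heat map
      ```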

      To start with, we can just give it a few hundred images, a good portion of which contain the content while the rest don’t. After a fair bit of training, once it correctly identifies the pictures, one can manually look at the heat map and see if it is on the right track.

      One can likewise manually make a heat map to compare the results to and pick out the ML iterations with the best result and further improve on those.

      But yes. If we just give the ML system a million images, it won’t have a clue what a cat or a dragon is. Without guidance it is just left to its own devices and likely won’t do much of interest at all.

      1. >the object it is trained to detect

        Yes, but that’s assuming the algorithm is already trained to detect “cats”. You may throw any object at it, so wouldn’t that mean you first have to train the algorithm to detect everything under the sun, so that it could then take an arbitrary command to switch from cats to dogs to cars?

        There are many ways of detecting objects, but does that still give the algorithm any understanding of what it is requested to do?

          1. How to train it to detect X (a toy sketch in code follows the steps):

          Step 1.
          Collect a dataset that has assets that contain the object you desire to detect and assets without the object.

          Step 2.
          Make it output a value for how certain it is that the object exists in the image. (A heat map is best for more nuanced results; yes/no answers are rather bland for most content-recognition applications.)

          Step 3.
          Compare its output to what you know is correct. (You have to filter the database manually first. But that isn’t too much manual work.)

          The iterations of the ML system that generate “better” results are the ones you keep, and they go on to the next round.

          Step 4.
          Each round we copy our current “best” contenders and add a bit of noise to their internal network to make them “evolve”, some will get better, some will get worse. So we just redo from step 2.

          At any point from Step 2 onward we can just take a look at the heat map it generates to see if it actually detects a cat, or if it instead detected the noisy texture that gravel tends to have. (However, if all pictures that actually have a cat in them also consistently contain something else, then one’s dataset is fairly poorly made.)

          The way we anchor this system to correctly detect the object is simply by manually tagging the images ahead of time, and by sanity-checking its output occasionally.

          To make it detect more than just 1 thing, then have it output more than just 1 value per pixel, where each value corresponds to a certain thing of your choosing. (This also requires more tags for our dataset. But what value corresponds to what tag is totally up to you as the teacher to choose.)
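
          (Pieced together, those steps form a simple evolution strategy. A toy sketch of our interpretation follows – real systems usually train with gradient descent instead, and every size and constant below is made up.)

          ```python
          # Toy evolution strategy for the per-pixel heat-map detector above.
          import copy
          import torch
          import torch.nn as nn

          def make_net():
              # A tiny detector: RGB image in, per-pixel confidence heat map out.
              return nn.Sequential(
                  nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
                  nn.Conv2d(8, 1, kernel_size=1), nn.Sigmoid(),
              )

          def fitness(net, images, targets):
              # Step 3: compare output to the hand-made heat maps (lower is better).
              with torch.no_grad():
                  return ((net(images) - targets) ** 2).mean().item()

          def mutate(net, scale=0.02):
              # Step 4: copy a contender and add a bit of noise to its weights.
              child = copy.deepcopy(net)
              with torch.no_grad():
                  for p in child.parameters():
                      p.add_(torch.randn_like(p) * scale)
              return child

          # Stand-ins for Step 1's manually tagged dataset (images + heat maps).
          images = torch.rand(8, 3, 64, 64)
          targets = (torch.rand(8, 1, 64, 64) > 0.5).float()

          best = make_net()
          for generation in range(50):  # Steps 2-4, repeated
              candidates = [best] + [mutate(best) for _ in range(10)]
              best = min(candidates, key=lambda n: fitness(n, images, targets))
          ```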

          1. Yes. That much was more or less explained by your previous message.

            It still appears that you have to train it specifically for each and every object it must detect. It won’t just detect “an object” in an arbitrary picture and then understand from the description that this object is referred to as a “cat” – much as we would do when seeing a previously unseen object. The fault of the method is that in order to detect an object at all, it must already know something about the object, which is a chicken-and-egg problem.

            If it could separate objects without knowing anything about them, it would speed up the training for the actual purpose – and here the automatically generated depth map helps. It might be something that is necessary to “ground” the abstract labels to the actual experience of objects rather than going from top level information down to the sensory data (picture) to see anything at all.

            Imagine if you needed to be told what a car is before you could actually see cars passing down the road.

          2. Still waiting for the previous message to go through…

            Anyways, the point was that the system should rather detect an object first and then connect it to the label, than detect objects based on pre-computed parameters associated with the label. Otherwise it is blind unless told what to look for and that makes it cumbersome, in the general case almost impossible, to teach it new objects as the number of potential objects to see becomes too large.

      2. I imagine combining the two methods would turn out quite powerful. Separating objects first allows the ML algorithm to concentrate on the detailed differences between objects with fewer examples given.

        But, without “grounding”, the algorithm still doesn’t understand that a “cat” is not just the statistical commonality between sets of data, just as it doesn’t understand that a peg legged stool turned upside down is actually a boot drying rack. It doesn’t have access to the underlying reality of the object.

        1. Grounding the neural network is indeed important for practical work.

          But remember, it doesn’t really understand what a cat is regardless. All it knows is that a certain pattern means it shall output a certain value on one of its outputs. What that in turn means isn’t important to the network.

          Sometimes the value can be accurate, other times it can be noise. But that generally doesn’t matter.

          Oftentimes one makes it output a heat map, since then we can filter that further. If only a few pixels are detected in the output, we can simply sanitize that away, since we know a few pixels are insufficient for an actual result, even if the network thinks something is there.

          1. Well, as you said, for practical work. What’s the importance of an ML system that can only do “party tricks” by getting it right often enough to fool the onlooker?

            Think about self driving cars for example. Isn’t this what we actually have in place?

          2. As for “grounding”, the idea is that the machine should perceive what a cat IS before it knows it is called a cat. The fact that it could see the object as it is before it knows its abstract meaning is paramount to dependable and accurate machine vision and image generation.

            As pointed out by the original article, by separating the object with a depth map, you begin to avoid all sorts of trivial errors like mangled and missing fingers on a person because the algorithm had a hard time deciding which part of the object actually belongs. This is already well on the way of “grounding” the ML algorithm to the actual experience of seeing.

    2. A good illustration about the point is the famous Gripsholm Castle lion.

      Having been sent a lion pelt, the taxidermists attempted to stuff it, but none of them had actually seen a lion or knew what it was, so they made their best guesses. At least they had some examples appearing in heraldry, so they got it “mostly right”. With only second hand information, would you have done better?

      https://en.wikipedia.org/wiki/Lion_of_Gripsholm_Castle
