AI Sound Effects

A new AI thing launched a few days ago, which can generate sound effects from a text prompt:
https://elevenlabs.io/app/sound-effects

Despite all of my posts criticising the hype of AI, I do try to keep an open mind, while also keeping the ‘critical thinking’ gears engaged.

With any of these AI projects, my first two questions are:
1. what was it trained on? did they have permission?
2. what are the rights for anyone who uses it?

So for Q1: first I hunted through their website and could find no mention whatsoever of what it was trained on, which seems strange. But checking their Twitter feed, I found they had announced: “Thank you to our partners @Shutterstock who provided licensed tracks from their expansive and diverse audio library to help create our model.”

Interesting.
More on that in a second, but for Q2 there is nothing in their terms about copyright. Who owns a sound effect that is generated? This matters because if a film soundtrack is audited, or a DMCA notice is submitted to a production company due to a possible copyright breach, then there has to be proof that a legitimate license exists for the use case. Without it, the sound effects are worthless for professional work.

I eventually found a contact for legal questions and submitted a request:

“I have searched your terms & conditions but cannot find the answer to a very simple question. Can you please advise? When I generate a sound effect with your new AI product, who owns the copyright of the generated sound?”

This morning I got an initial response:

“Thank you for reaching out with your question about copyright for sound effects generated with our AI product. I understand that clear information about ownership is important, and I appreciate your patience as I look into this for you.
 I’m going to escalate this question with our legal department to get the specific details and ensure you have an accurate answer on this as it’s a new feature. I’ll get back to you as soon as possible with the information you need.”

So they launched a product without clear legal info on the rights of use.
Interesting.

While I waited for an answer from the legal team, I kept searching.
First I found this:

Wow ok. So you “can” use it for commercial purposes BUT you are the one who is liable for it.
I kept reading. Then I found this:

Interesting.
While you, the user, are liable to the full extent of the law, they are protected to a maximum of US$100.

Next I thought I’d check out what Shutterstock have to say about AI, since it is their sounds being mashed up by the AI.
With regard to compensation for the people who upload media to their site for licensing, they state:

Wow.
It feels like a pattern is forming: “all care, no responsibility.”

I’ll update this when the legal team clarify the actual use of their “text to sound effects generator”.

Next I thought I’d try it out: how usable is it?
The hype is certainly there in large letters.

Now I don’t know if this is the marketing department having too much coffee, or their example users having very small imaginations, but one thing I can assure you is that no, they cannot generate any sound imaginable. I have mentioned the pathetic hype associated with AI on this forum many times, and this is exactly what I mean. Sure, make marketing claims. But don’t promise the entire world of sound when (a) you can’t and (b) you don’t even have the legal framework resolved.

From my tests I found that the sounds it generated were very low fidelity, worse than the quality of a $200 handheld recorder. And the specificity is non-existent: a very explicit text description might get vaguely close one or two times out of ten. Now I can imagine the typical response is ‘but wait, it will get better’, but that idea has some issues. First, it only gets better with a combination of (a) more data and (b) user feedback. E.g. if you ask for a “brick thrown on the bonnet of a car” and you choose the 10th version, you are training the AI with your feedback. Good luck with that, you’ll need a lot of people with spare time on their hands…
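To make that concrete, here is a minimal sketch of the kind of preference signal that feedback loop depends on. Every name, field and the log file below is made up for illustration; it is not how ElevenLabs (or anyone else) actually collects or uses feedback.

```python
import json
import time
from pathlib import Path

# Hypothetical local log file; not a real ElevenLabs endpoint or format.
FEEDBACK_LOG = Path("sfx_feedback.jsonl")

def log_selection(prompt: str, candidate_ids: list[str], chosen_id: str) -> None:
    """Record which generated variation the user actually kept.

    Aggregated over many users, records like this are the raw material a
    provider would need before "it will get better" can mean anything.
    """
    record = {
        "timestamp": time.time(),
        "prompt": prompt,
        "candidates": candidate_ids,
        "chosen": chosen_id,
    }
    with FEEDBACK_LOG.open("a") as f:
        f.write(json.dumps(record) + "\n")

# Example: the user asked for a brick on a car bonnet and only the 10th try was usable.
log_selection(
    prompt="brick thrown on the bonnet of a car",
    candidate_ids=[f"gen_{i:02d}" for i in range(1, 11)],
    chosen_id="gen_10",
)
```

And the catch is volume: that data only means something once a lot of people have patiently auditioned ten misses to find one near-hit.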

But to that point, what I really discovered, or realised, was that “AI Generation” is incredibly unreliable, and the only thing that makes it useful is a human sifting through the useless crud looking for a gem. And by a gem, I do not mean a pearl or a diamond. I mean a bit of coal, or a stone, or something even remotely useful. Why is that a problem? Well, as the quote goes, “Time is the school in which we learn, Time is the fire in which we burn.” The one thing all humans have in common is that time is their most valuable commodity.

When I think of how a sound effects editor or sound designer works, they have a huge resource right in front of them: their sound library. And when they put a “text prompt” into their sound library app, e.g. a “brick thrown on to a bonnet”, any matches are shown immediately. They audition them & begin working with them. Some sounds require many components, and some of the best components have nothing to do with the first search term at all. People have been putting lion roars etc into explosions for a very long time. We love that!
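For contrast, here is a deliberately simplified sketch of why a library search is such a different proposition. The paths are placeholders, and real library apps such as Soundminer or BaseHead search rich embedded metadata rather than just filenames, but the point stands: the search is instant and every hit is a real recording you can audition immediately.

```python
from pathlib import Path

def search_library(library_root: str, query: str) -> list[Path]:
    """Return sound files whose names contain every term in the query.

    Deliberately simplified: real library tools index embedded metadata
    (descriptions, keywords, recordists) rather than just filenames.
    """
    terms = [t.lower() for t in query.split()]
    hits = [
        path
        for path in Path(library_root).rglob("*.wav")
        if all(term in path.stem.lower() for term in terms)
    ]
    return sorted(hits)

# Example: gather candidate source material for a "brick thrown on to a bonnet".
for hit in search_library("/path/to/sound_library", "car impact"):
    print(hit)
```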

But AI reduces you to someone auditioning sounds from an unreliable, low-resolution source, where even explicit descriptions do not guarantee anything even vaguely close. It really is like the ‘use glue on your pizza’ Google AI search.

I’m sure they will continue receiving VC to progress whatever it is they aim to achieve. But the residual thoughts I have at this stage are:

– it’s a solution looking for a problem. Casual use on their free tier does not keep the lights on. They have to find a paid commercial use.

– AI use like this will die the death of a thousand cuts. Just as there are legal ambulance chasers, there will also be AI copyright infringement lawyers, who will gum up the aspirations of these companies before they ever get to the point of being fully functional. It’s a whole new industry for them, and ironically those ambulance-chasing lawyers will be using AI to do it!

I’ll update this when their team clarify the legality of use…

Ok thanks for coming to my TED talk

_________________________________________________________________________

And today, another:
https://stability.ai/news/introducing-stable-audio-open

“The new model was trained on audio data from FreeSound and the Free Music Archive. This allowed us to create an open audio model while respecting creator rights.”

I have submitted a request to them too, for clear legal guidance on use.

FreeSound terms are here

“License restrictions when publishing new sounds that include/modify/remix other sounds…

I have also submitted a request to Freesound, asking if they are aware & whether it follows allowed use.
And also whether there is an OPT OUT button for FreeSound users.

I also contacted the FREE MUSIC ARCHIVE, asking if they are aware & whether it follows allowed use.

I will update this as soon as I get any answers…

 

_________________________________________________________________________

20240606
Update 1: from Free Music Archive
“We did not give Stability.ai permission.
To be continued.

Team Tribe of Noise”
_________________________________________________________________________

20240606
Update 2:

stable-audio-open:
“All audio files are licensed under CC0, CC BY, or CC Sampling+”

I found this info here:
https://news.ycombinator.com/item?id=40587685#40588214

blargey
If you look at the repo where the model is actually hosted they specify
> All audio files are licensed under CC0, CC BY, or CC Sampling+.
These explicitly permit derivative works and commercial use.
> Attribution for all audio recordings used to train Stable Audio Open 1.0 can be found in this repository.
So it’s not being glossed over, and licenses are being abided by in good faith imo.
I wish they’d just added a sentence to their press release specifying this, though, since I agree it looks suspect if all you have to go by is that one line.
(Link: https://huggingface.co/stabilityai/stable-audio-open-1.0#dat… )

https://huggingface.co/stabilityai/stable-audio-open-1.0#datasets-used

Datasets Used
Our dataset consists of 486492 audio recordings, where 472618 are from Freesound and 13874 are from the Free Music Archive (FMA). All audio files are licensed under CC0, CC BY, or CC Sampling+. This data is used to train our autoencoder and DiT. We use a publicly available pre-trained T5 model (t5-base) for text conditioning.

Attribution
Attribution for all audio recordings used to train Stable Audio Open 1.0 can be found in this repository.
FreeSound attribution [csv]
FMA attribution [csv]

_________________________________________________________________________

20240607
Update 3
Freesound has written a blog post discussing the issue.

5 thoughts on “AI Sound Effects”

  1. nickj.lavigne says:

    Thanks for this article, Tim. I’m a sound editor for tv/film and I enjoyed your thoughts on this topic, and have been following the whole AI SFX/post audio thing myself, as closely as I can. Quality aside (which, after trying ElevenLabs’ generator, I agree, is currently really not great at all), I find the legal aspect very interesting and I really appreciate you looking at both ElevenLabs’ and Shutterstock’s fine print. It’s crazy (though maybe not surprising?) that a company would operate in this field without proper legal frameworks in place.

    • Tim Prebble says:

      The aspect that I find really frustrating is that I can easily imagine some great uses for AI with sound, uses that don’t require harvesting other people’s data. It seems so odd that typing a text prompt is the primary goal for creation: “prompt engineering”, i.e. the least effort possible, which also equates to the least control possible. The results are generic and basic, while they tell us it can generate anything “imaginable”…
      While people can do what they like, and it’s not for me to criticise them, all that concerns me is INPUT and OUTPUT. For input, a new form of licensing is required, just as there are different licenses for use of sound or music in an app compared with in a film. AI training is a unique use case. It needs a unique licensing scheme, with the ability to opt out. Techbros talk about it like it’s all inevitable, but nothing is inevitable.
      Output licenses and copyright dictate how a sound can be used. My output is legitimate and viable due to copyright, as applied to the EULA and MULA licenses. If I used AI on my own corpus, then I should own the copyright of the output. If someone uses third-party sounds as a corpus with AI, then ownership of the output is the question I want answered. It’s surprising that they cannot instantly provide an answer.

    • Tim Prebble says:

      I do mean to test that other generator again, the one that’s trained on FreeSound sounds. All of those vintage sound effects that Craig Smith uploaded will be in the training set, so I suspect adding “vintage” to a prompt might weight the output towards his efforts. It will be interesting to see if that results in anything remotely useable. If not, they have an uphill battle finding enough sound libraries to legitimately license and train on….

  2. sola says:

    I genuinely think it is short-sighted to think that AI text-to-sound generation will not be extremely effective in, say, 5-10 years’ time. It will take a while, and it doesn’t mean human content will not have advantages, but AI-generated sound effects will be used for HUGE swathes of content where absolute top quality is not required. Basically anything outside of large-budget films and games will be using significant amounts of AI-generated sounds.

    • Tim Prebble says:

      How? Most actual work projects require copyright, otherwise the work cannot be monetised & distributed. Free stuff does not pay for the servers and power that AI chews through. Whether it actually becomes usable in reality remains to be seen, same as with music. AI requires a MASSIVE shift in copyright law, which is starting to happen with all these legal cases that will set precedents… But the hype is real!
