Before we get into actual sound editing and sound design, we need to establish some ground rules, to make sure everyone has a clear understanding of the basics and is at the same starting point.
Post-production sound editing and sound design is about working to picture. This means SYNC TO PICTURE is mission-critical. So part of this tutorial will be to establish a basic standard setup and workflow.
Related to this, I have always believed an important part of post-production is avoiding problems before they occur. This can be difficult to achieve when you are just starting out, but always appreciate that it is part of your job: to know what you do not know!
So this tutorial is going to make sure you understand how and why sync is crucial, and how to verify you are working to the correct picture, at the correct speed. This is important BEFORE you start work on a project. But it’s also important during the project and the final delivery.
Making a mistake with this aspect can range from simply being embarrassed (eg on the dub stage, the rerecording mixer notices that the sync of sound XYZ is not tight) to putting all of your work in jeopardy (eg none of your work is in sync with the final picture.)
Another of our aims is to have a REPEATABLE WORKFLOW so we are not inventing our method on each project and can problem solve by retracing our steps. An important part of that is VERSION CONTROL so we can revisit earlier versions of our work.
Another aspect of this tutorial will be philosophical, with the aim of helping you build skills at observing sync and developing a ‘feel’ for sync, an instinct. And it’s important to appreciate how the idea of ‘sync’ in music (eg playing ‘in time’) is very different to sync in sound post. I’ll explain common sync terminology and how to interpret & react to it.
SYNC IN THE REAL WORLD
SYNC takes many forms. If you clap your hands, sync is usually very clearly perceived as being a finite fixed point in time. The sound generated has sharp attack and the visual sync point is clearly viewed. Now compare with a large truck passing on a motorway. The approach, pass and away may have no single specific sync point. Accordingly, we could be even one second out of sync and it still appears normal.
So one of the aims of this tutorial is to encourage you to develop an awareness of sync, as you observe it in daily life. Because reality is the reference that we will always consult.
When I first started working as a sound editor, one of the tasks I was assigned was to record foley, and part of recording foley (or being a foley mixer as per US terminology) is to decide if the footsteps or props sound ‘right’ and also to comment on sync. So again reality is the reference and I consciously started observing the sounds of footsteps when walking around down town, and in different rooms in houses. When you consciously pay attention & notice sounds in the context of reality, you build a perceptual resource as to what sounds ‘normal’ in a real context.
The same is true with sync. You can observe it in the real world. When you slowly close that door, when does it make sound? How about if you slam it? If there is a creak or hinge movement, when does it occur? What if you slowly open the door, like in a creepy horror film? When does the creepy door open actually generate sound?
No one ever says “Picture is out of sync.”
Well, if they do, it is rare. If there is a sync issue, it is almost always sound that is said to be out of sync. While this is partly hierarchical, it’s also based in physics – the speed of light is far faster than the speed of sound. So sometimes in reality we do experience sound out of sync, for example when a plane flies overhead and the source of the sound does not match where we see the plane. This is because light from the plane reaches us at roughly 300,000,000 m/s, whereas its sound travels at about 343 m/s.
As a sound editor, we often have to choose sync. We watch video and place sounds in sync to picture. As with the hand clap example sometimes sync position is very easy & straightforward to ascertain. And sometimes it is far more complex and open to interpretation.
An example where sync is entirely psychological often occurs when editing ambiences for a scene. While many of the elements of an ambience are continuous & run for the entire length of the scene, others are sporadic. Imagine a scene where someone is creeping around the backyard of a suburban house. The sounds of distant city, and wind in trees etc might be continuous. But a distant passing car or siren, or a neighbour’s dog suddenly barking may have no direct visual sync point onscreen, and we place these sounds where we feel is right, to help tell the story and support the drama of the scene.
A SYNC ANECDOTE
As a kid I grew up on a farm that was a half hour drive from the nearest town. Every Thursday my Mum would take us into town for piano lessons & a visit to the library. Sitting in the back of the car, I’d be bored for the half-hour drive but for some odd reason, I remember creating a ritual. Driving up State Highway 1, we would pass crossroads which would join the main highway. And every time we passed one and the powerlines from a crossroad lined up perfectly, I’d click my teeth. For one beat there was perfect synchronicity inside my head. I don’t know why I did this, other than to pass the time. But in hindsight, I was developing a feel for sync. I could see the crossroad approaching and for that one instant when the visuals confirmed, I would quietly create a little sound, in sync. So I had a little flashback when I saw this clever Chemical Brothers video for Star Guitar many years ago, which pursues a directly related sync technique.
There are two forms of sync terminology I’ll briefly outline.
The first is the idea of sync between machines. When I started work in the 90s the sync relationship between machines was always referred to as Master and Slave. Back then we would work to a Umatic video tape, which was the Master. And in the studio there would be a multitrack reel to reel, and a ProTools rig which were both Slaves. If we hit play on the master, the two slave machines would follow with frame accurate sync.
This terminology with an abhorrent history has since been replaced by LEADER and FOLLOWER. So in our studio back in the 90s, the video tape was the sync LEADER, and the tape machine and ProTools were FOLLOWERS. And in this situation sync was absolute i.e. the only purpose for sync was to keep all three machines tightly locked together.
The second form of sync is when we are working to picture, and sync does not feel correct. The common terms used every day to describe the sync error are EARLY and LATE. Take the obvious example of a handclap. We see the handclap, but if we hear the sound AFTER the hands impact then we would say the sound is “LATE.” We see the handclap, but if we hear the sound BEFORE the hands impact, then we would say the sound is “EARLY.”
Apart from getting used to using this terminology, we also need to develop instincts as to HOW late or early the sound is. Does it feel 3 frames late? or 5 frames late? or 10 frames late?
An important consideration when thinking about out of sync sound, is that in reality sound is NEVER early. There is no circumstance when sound travels faster than light. Why does this matter? Well for the entire time of our existence, our psyche has mostly observed sound in sync, with what we see. Occasionally it observes sound arriving late due to distance. But it never experiences sound arriving early. The result of all these years of learning & experience can mean that if we sync a sound and it is early, it is very obviously wrong – it just instantly feels wrong. But if it’s a little late, we may be more inclined to forgive it or not notice it so much.
In the homework associated with this tutorial I will ask you to sync some sounds, and will provide a way to check your sync placement is correct. And then I will ask you to push the sound out of sync by a fixed amount and observe it: 1 frame late, 2 frames late, 3 frames late, 4 frames late etc… And then the same for placing them early. You need to develop your skills and instincts, so that when you see an out of sync sound you instinctively feel ‘that footstep feels 3 frames late.’
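To get a feel for how small these nudges are, here is a minimal sketch that converts a frame offset into milliseconds and samples. It assumes a 23.976fps project (24000/1001) and a 48kHz session; the function name and defaults are mine, purely for illustration, so swap in your own project's frame rate.

```python
from fractions import Fraction

SAMPLE_RATE = 48000  # Hz, assumed 48kHz session


def frame_offset(frames, fps=Fraction(24000, 1001), sample_rate=SAMPLE_RATE):
    """Return (milliseconds, samples) for a nudge of `frames` video frames."""
    seconds = Fraction(frames) / Fraction(fps)
    return float(seconds * 1000), float(seconds * sample_rate)


# Print a small reference table of nudge sizes at the assumed frame rate.
for n in (1, 2, 3, 5, 10):
    ms, samples = frame_offset(n)
    print(f"{n:>2} frames = {ms:6.1f} ms = {samples:8.1f} samples")
```

At true 24fps one frame is exactly 2000 samples at 48kHz, which is a handy number to keep in your head when a DAW shows nudge values in samples.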
Again referring back to recording foley, sometimes I would have to help the performer by providing feedback eg ‘that was good, but your fifth step always feels a little late….’ And sure enough on the next take they would nail it.
Another perceptual observation with regard to sync is related to density.
How many layers of sync can you perceive?
Walter Murch wrote a brilliant article about this very subject, and I strongly suggest you read it. While it primarily discusses mixing, it also provides a great example of sync layers. The entire article is worth a read, available HERE but the second part “Dense Clarity – Clear Density” is where this example from THX1138 is mentioned:
“A case in point: the footsteps of the policemen in the film, who were supposed to be robots made out of six hundred pounds of steel and chrome. During filming, of course, these robots were actors in costume who made the normal sound that anyone would make when they walked. But in the film we wanted them to sound massive, so I built some special metal shoes, fitted with springs and iron plates, and went to the Museum of Natural History in San Francisco at 2am, put them on and recorded lots of separate ‘walk-bys’ in different sonic environments, stalking around like some kind of Frankenstein’s monster.
They sounded great, but I now had to sync all these footsteps up. We would do this differently today – the footsteps would be recorded on what is called a Foley stage, in sync with the picture right from the beginning. But I was young and idealistic – I wanted it to sound right! – and besides we didn’t have the money to go to Los Angeles and rent a Foley stage.
So there I was with my overflowing basket of footsteps, laying them in the film one at a time, like doing embroidery or something. It was going well, but too slowly, and I was afraid I wouldn’t finish in time for the mix. Luckily, one morning at 2am a good fairy came to my rescue in the form of a sudden and accidental realization: that if there was one robot, his footsteps had to be in sync; if there were two robots, also, their footsteps had to be in sync; but if there were three robots, nothing had to be in sync. Or rather, any sync point was as good as any other!
This discovery broke the logjam, and I was able to finish in time for the mix.
But why does something like this happen?
Somehow, it seems that our minds can keep track of one person’s footsteps, or even the footsteps of two people, but with three or more people our minds just give up – there are too many steps happening too quickly. As a result, each footstep is no longer evaluated individually, but rather the group of footsteps is evaluated as a single entity, like a musical chord. If the pace of the steps is roughly correct, and it seems as if they are on the right surface, this is apparently enough. In effect, the mind says “Yes, I see a group of people walking down a corridor and what I hear sounds like a group of people walking down a corridor…..
If you have gotten every single footstep in sync but failed to capture the energy of the group, the space through which they are moving, the surface on which they are walking, and so on, you have made the same kind of mistake that Manet’s student was making. You have paid too much attention to something that the mind is incapable of assimilating anyway, even if it wanted to…..”
ANOTHER SYNC ANECDOTE:
After Film School I moved to Auckland and started working as a trainee sound editor. I have a vivid memory of sitting down at the waterfront in Auckland at lunchtime and watching a new jetty being built. There was a crane quite far away, but I could clearly see the crane slowly raise a battering ram and then release it, driving piles in. And while I could clearly see the battering ram strike the pile, I did not hear the sound ‘in sync’ – there was a delay due to the distance the sound had to travel across the water. So the heavy thump of the pile driver felt about 10 frames late, and I remember asking myself: if this scene was in a film, how would I cut sound for it? Would I put the thumps in sync? or 10 frames late? How would you cut sound for it? (Of course the classic sync cliche is the lightning strike & thunder. I was taught as a kid to count the number of seconds between the flash & the sound, to get an estimate of how far away the storm was)
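The pile driver and the thunder-counting trick are the same arithmetic, and it can be sketched in a few lines. This assumes sound travels at 343 m/s (air at roughly 20°C), treats the light as arriving instantly, and assumes a 24fps frame rate; the function names are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # metres per second, assumed (air at roughly 20°C)


def sound_delay_frames(distance_m, fps=24.0):
    """Frames of delay before a sound made distance_m metres away is heard."""
    return distance_m / SPEED_OF_SOUND * fps


def distance_for_delay(frames, fps=24.0):
    """Distance in metres that would produce a delay of `frames` frames."""
    return frames / fps * SPEED_OF_SOUND


print(f"10 frames late is about {distance_for_delay(10):.0f} m away")
print(f"A source 1 km away is about {sound_delay_frames(1000):.0f} frames late")
```

Under those assumptions a thump that feels 10 frames late puts the crane at roughly 143 metres, which is a useful sanity check when deciding how late to cut a distant sound.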
SYNC x MUSIC
If you come from the world of music, you will know musical forms of sync very well. It is very apparent when someone or some element of the music is ‘out of time.’ And every musician appreciates ‘feel’ and how eg rock has a different feel to reggae. But it will take time and practical experience to learn & appreciate the scope & ramifications of unquantized sync. There is no master tempo with sound.
Years ago I had a brilliant local composer visit my studio. I played him a scene I was working on, which involved slow motion. He made me play the scene to him over and over, as he could not understand what he was hearing. Now this is someone capable of conceiving & composing incredibly complex music for an orchestra, but somehow my sound design was outside his realm of experience. The scene had cadence, clear form and intention but it did not adhere to any musical form. It sync’d to picture beautifully and punctuated key aspects, but if you put a metronome against it, it would not make any musical sense. And if it was music, it only existed for one bar. It did not repeat.
This will become a recurring theme in our discussion of sync. Just as with music, no one wants a robotically precise performance. Sound has ‘feel’ just as music does. But working to picture, sound serves a higher purpose. It has to be in sync, but it also has to feel real, or have a relationship to ‘realness’
Another simple example: if you ever try to edit single footsteps into a performance, you will soon learn what a limited approach that is. A foley artist performs those same footsteps in real-time, in sync to picture but with feel and performance. They inhabit the character onscreen that they are walking, and sync has a continuity and flow to it. They make the bad guy’s footsteps sound menacing etc…
So it’s important to remember sync is not only about direct finite synchronicity.
Sync is also about context.
PROJECT – VERIFY START
Before we get into working with some practical examples of editing sound to picture, we need to work through some material concerns:
– What picture are we working to?
– Who edited it? Using what software?
– What frame rate was the project shot at?
– What frame rate was the project edited at?
– What frame rate is the project being finished to?
– What video will we work to?
Remember our aim is to have a REPEATABLE WORKFLOW so that we aren’t inventing our method every time, and also so we can problem solve by retracing our steps.
It is very important to learn the significance of these questions:
WHY matters as much as HOW.
Why do we care? It’s just boring specs right?
Tell me what spec to use and I can skip the rest of this.
Nope! I believe that is the wrong approach, and here is why.
If I say to you ‘always use X format video’ and that’s all you remember, when the situation changes you won’t have any understanding as to how your approach should also change. We need to understand WHY we use a certain video codec.
Working on professional projects should mean that your questions are very easily answered by the picture editor or post supervisor.
But when starting out as a sound editor it is very likely you will work on some indie projects which may have no budget and limited technical resources. And it may involve collaborating with people with less experience than you have.
If before you start work on any project you do not verify the workflow, then you open yourself to the one criticism that undermines any good work you do: your sound is out of sync!
Now even if it turns out the problem lies with the picture being out of sync, it will be your sound that is wrong.
You absolutely MUST always verify the sync workflow for every single project. While making a call or messaging someone to find the answers might be the more social way of gaining this info, I strongly insist that you do so via email. Email creates a virtual paper trail!
If a problem later occurs, you can refer back to that reply to your email.
As a side note: as a freelancer all of my working life I apply the same logic to invoicing. If I provide a budget for my work and have it approved by the producer, I make sure a copy of that approved budget exists in an email. So when I send my invoice, I make sure I quote the previous email where they confirmed the budget. So I am pre-empting the possibility of a problem occurring by being proactive. That’s post-production!
Working to picture as a sound editor is far more demanding on video playback than that of a composer. For example it is rare for a composer to watch video played backwards, or to varispeed scrub video with sync audio. But these are very common techniques for a sound editor.
Some video formats make playing a video backwards almost impossible, or if possible it makes huge demands on the host computer such that smooth playback is unlikely.
When I started writing this tutorial I already knew which formats I use, but I did some basic research by running a little poll on Twitter:
What video codec are you currently using?
What is your fave app for video conversion?
Based on the answers I could usually guess what kind of work the person does. (Note: just because someone has ‘sound designer’ in their bio does not necessarily mean they work in post-production. It’s now a pretty common term used by musicians, synth programmers and soundware developers who do not work in sound post.)
So WTF is a video codec?
Every video is created using a codec – the codec basically takes continuous video and decides how it should be stored, and whether it uses a form of compression when storing it.
Some codecs are very high resolution with no compression at all, while other codecs excel at compression and reducing the size of the video.
Now we need to differentiate between a delivery codec and a work codec.
An example of a delivery codec is h264, as this uses clever ‘inter-frame’ compression to make the download size small. But h264 also assumes you will be playing the video forwards, as it uses a form of compression known as long GOP compression. Basically, it takes frame 1 of your video, then it looks at frame 2 and it only stores the difference. So when you play your video, frame 1 plays, then the codec assembles frame 2 from the saved differences. Frame 3 may have minimal differences so only a small amount of data is stored.
This works reasonably well when watching a video forwards. But when you play back this video backwards, the codec has to work far harder, as it has to re-assemble each frame from multiple frames it has not played yet. This makes performance slower, less responsive and makes your computer work far harder.
Now let’s compare that approach with a work codec such as ProRes. This codec stores discrete frames, so each frame contains 100% of what’s needed to display the frame. So there is no difference between playing forwards or backwards. Access is instant and sync is accurate and reliable.
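To make the difference concrete, here is a deliberately simplified toy model, not how any real decoder is implemented. It assumes a complete keyframe every 12 frames (a made-up interval), that every other frame only stores differences from the frame before it, and that nothing is cached between seeks. Real h264 is more complicated than this, but the shape of the problem is the same.

```python
def frames_to_decode_long_gop(target, keyframe_interval=12):
    """Frames that must be decoded to show `target` after a cold seek.

    Toy model: we have to start at the nearest earlier keyframe and
    rebuild every difference frame up to and including the target.
    """
    last_keyframe = (target // keyframe_interval) * keyframe_interval
    return target - last_keyframe + 1


def frames_to_decode_intra(target):
    """With discrete frames, every frame stands alone."""
    return 1


def cost_of_playing_backwards(start, count, decoder):
    """Total decodes needed to step backwards through `count` frames."""
    return sum(decoder(frame) for frame in range(start, start - count, -1))


long_gop = cost_of_playing_backwards(23, 12, frames_to_decode_long_gop)
intra = cost_of_playing_backwards(23, 12, frames_to_decode_intra)
print(f"long GOP: {long_gop} decodes, discrete frames: {intra} decodes")
```

In this toy model, stepping backwards through 12 frames costs 78 decodes with long GOP versus 12 with discrete frames, and the gap only widens as the keyframe interval grows.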
Some DAWs specify the codec they prefer, for example ProTools specifies the DNXHD36 codec be used. This means the DAW’s video engine is optimised to work with this codec.
So what do you do if the video you were given is h264?
You convert it! And that’s why in my poll I asked that second question:
“What is your fave app for video conversion?”
CONVERTING VIDEO FORMATS
While it would be easy for me to just say: use app X, there isn’t a single permanent answer. For many, many years I relied on a free app called StreamClip to do all of my video format conversions. But after an OSX update suddenly it no longer worked, and as the developers had abandoned it I had to find a new solution.
While I use Adobe Premiere Pro to edit video and output OMFs etc, it’s a hefty app and I wanted a simple utility app to do my conversions. So from the poll, at the top of the list of suggestions was this app: Shutter Encoder.
If I am given an MP4 video I load it in Shutter Encoder, and I choose an output codec such as DNXHD36, and I output a new version of the video.
(Another handy function of Shutter Encoder is replacing the audio of a video. So eg with my HISSandaROAR videos, I output a final video and OMF from Premiere, then in ProTools edit the OMF and output a final mix. I then open the final video file and the final mix WAV in Shutter Encoder and I select the ‘replace audio’ function. It then deletes the old guide audio and replaces it with my new mix, with the output being a video ready to upload to YouTube.)
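If you are comfortable on the command line, the same kind of conversion can be sketched with ffmpeg. This is only an illustration under assumptions: it presumes ffmpeg is installed, that the source is 1080p at a frame rate DNxHD 36 supports (such as 23.976fps), and the file names are hypothetical. The function only builds the command; it does not run it.

```python
import shlex


def dnxhd36_command(src, dst):
    """Build (but do not run) an ffmpeg command for a DNxHD 36 work copy."""
    return [
        "ffmpeg",
        "-i", src,
        "-c:v", "dnxhd",        # DNxHD video encoder
        "-b:v", "36M",          # 36 Mbit/s, the "36" in DNxHD36
        "-pix_fmt", "yuv422p",  # 8-bit 4:2:2 pixel format
        "-c:a", "pcm_s16le",    # uncompressed 16-bit PCM guide audio
        dst,
    ]


# Hypothetical file names, shown as a copy-and-paste shell command.
print(shlex.join(dnxhd36_command("SHORT FILM FINAL.mp4", "SHORT FILM FINAL_DNxHD36.mov")))
```

DNxHD only accepts certain combinations of frame size, frame rate and bitrate, so if the encoder rejects your file, check the source specs before anything else.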
It’s important to note that some video formats are actually just enclosures and you need to GET INFO to find out the video codec used. Apple QuickTime MOV and Windows AVI are examples of this. The video you are given might be named “SHORT FILM FINAL.mov” but “.mov” tells us nothing about the codec used. If we GET INFO in the Finder or open the video in QuickTime Player and GET INFO we will see what actual codec is used.
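GET INFO is the quick way, but the same check can be scripted. The sketch below assumes ffprobe (part of ffmpeg) is installed; it builds a probe command and parses the JSON that command is expected to return. The sample JSON here is hand-written for illustration, not output from a real file.

```python
import json
from fractions import Fraction


def ffprobe_command(path):
    """Build (but do not run) an ffprobe query for the first video stream."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=codec_name,r_frame_rate",
        "-of", "json",
        path,
    ]


def parse_probe(json_text):
    """Return (codec name, frames per second) from ffprobe JSON output."""
    stream = json.loads(json_text)["streams"][0]
    return stream["codec_name"], float(Fraction(stream["r_frame_rate"]))


# Hand-written example of the JSON shape, for illustration only.
sample = '{"streams": [{"codec_name": "h264", "r_frame_rate": "24000/1001"}]}'
codec, fps = parse_probe(sample)
print(f"codec: {codec}, frame rate: {fps:.3f} fps")
```

Note the frame rate arrives as a fraction such as 24000/1001, which is the exact value behind the rounded 23.976 everyone quotes.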
EXAMPLES OF DELIVERY CODECS (which use inter-frame compression)
EXAMPLES OF WORK CODECS (which use discrete frames)
Note: this is an incomplete list, feel free to comment with codecs that you use.
OK So you’re about to start work on a short film project and the picture editor has emailed asking what you need. What do you say?
We know we need a video. We know if they make an MP4 we can convert it to a format that we can actually work with. But this is our chance to confirm basic specs for the project.
What editing program are they using?
What frame rate are we working at?
What frame rate is the project being finished to?
Sometimes the choice of them providing an MP4 versus requesting a much larger uncompressed video format is about how it will get to you. If the editor is going to upload the video for you they won’t want to deal with a large file size, so they will more likely provide a compressed MP4 video.
If you are going to take a hard drive over to copy the media, they may happily create an uncompressed version. This tended to be the case when working on feature films for me, whereas a short film might well be uploaded.
OK so let’s say the project is being made at 23.976fps
So we want a 1080p MP4 video but what else?
OTHER VIDEO REQUIREMENTS
– BURNT IN TIMECODE
– BURNT IN SECURITY ID
– GUIDE TRACK Dialogue Left
– GUIDE TRACK Music Right
– SMPTE leader at head, correct speed with audio 2 POP
– SMPTE reverse leader, correct speed at tail with audio 2 POP
What are all these? Let me explain:
– BURNT IN TIMECODE
This is the timecode readout onscreen. I prefer the video timecode to start at 01.00.00.00.
On a feature film, the video is usually broken into 20 minute reels, so the Reel 1 video would start at 01.00.00.00, Reel 2 would start at 02.00.00.00 and Reel 3 would start at 03.00.00.00.
But I also appreciate some people prefer that the FFOA is at 01.00.00.00
– BURNT IN SECURITY ID + VERSION #
Usually a transparent security message is embedded on the video.
eg “PROPERTY OF XYZ FILMS – COPY FOR SOUND – TIM PREBBLE – cut 27”
This is to protect both the film company and you, from the film being leaked.
The cut version is also often included.
– GUIDE TRACK Dialogue Left
A video can have two audio tracks, so on the left we want all production dialogue (as a sync reference and so we can play dialogue when working)
– GUIDE TRACK Music Right
On the right we want any temp music or temp FX. We do not want this mixed in with dialogue because we won’t refer to it as often and we need DX isolated from MX.
– SMPTE leader at head, correct speed with audio 2 POP
At the very start of the video, this is the standard film leader with count down 8,7,6,5,4,3 and on 2 there is a flash frame, with an audio ‘pop’ or beep.
– SMPTE reverse leader, correct speed at tail with audio 2 POP
At the end of the video, this is the standard film leader except reversed. Again with an audio ‘pop’ or beep.
All of these elements provide us vital info we will refer to, some constantly eg the burnt-in timecode, but others like the tail pop give us a quick way to check the duration of the video has not changed (eg at the mix you discover a shot has been extended 5F. You can quickly verify this by checking where your tail pop is compared with the new vid)
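The tail pop check is simple arithmetic on timecode, sketched below. It assumes non-drop-frame timecode counted at 24 frames per second (which is also how a 23.976fps project counts frames) written in the HH.MM.SS.FF style used in this tutorial; the timecode values are made up.

```python
def tc_to_frames(tc, fps=24):
    """Convert HH.MM.SS.FF (or HH:MM:SS:FF) non-drop timecode to a frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.replace(":", ".").split("."))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


def tail_pop_shift(work_tc, new_tc, fps=24):
    """Frames by which the tail pop moved between the work video and a new video."""
    return tc_to_frames(new_tc, fps) - tc_to_frames(work_tc, fps)


# Hypothetical tail pop positions in the work video and the mix video.
shift = tail_pop_shift("01.12.30.00", "01.12.30.05")
print(f"Tail pop moved by {shift} frames")
```

A non-zero result tells you the picture has changed length somewhere, even before you go hunting for which shot was altered.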
OTHER TURNOVER REQUIREMENTS
Script
FFOA and LFOA
EDL
AAF or OMF
All original sound materials
What is the significance of these?
1. Script – usually a PDF
2. FFOA = the timecode of the first frame of action
So if our video starts at 01.00.00.00
The 2 POP should be at 01.00.06.00
FFOA should be 01.00.08.00
3. LFOA = the timecode of the last frame of action
This is valuable to know the timecode point as it may be at the end of a fade to black, making it hard to detect.
4. EDL
A picture EDL is useful for a few reasons. First it enables us to see what shot and take was used onscreen. So if the car explodes at 01.12.11.02 we can see what take it was and check all the location recordings for explosion sounds. But it is also useful if we need to do any conforms, to update to a new picture edit. I’ll cover this in more detail in future, but for now check the Letter to the Editor at The Cargo Cult.
5. OMF or AAF
An OMF or AAF is a means of transferring all individual audio clips, in sync and in place as per the picture edit.
6. All original sound materials
This is a copy of all of the production sound recordings, from which we will access any wild sound FX, ambiences, room tones and alternative takes.
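The leader arithmetic in item 2 above can be sketched as follows. It assumes an 8 second leader with the 2 POP two seconds before FFOA, and non-drop-frame timecode counted at 24 frames per second; the function names are hypothetical.

```python
def tc_to_frames(tc, fps=24):
    """Convert HH.MM.SS.FF non-drop timecode to a frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split("."))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


def frames_to_tc(frames, fps=24):
    """Convert a frame count back to HH.MM.SS.FF non-drop timecode."""
    ss, ff = divmod(frames, fps)
    mm, ss = divmod(ss, 60)
    hh, mm = divmod(mm, 60)
    return f"{hh:02d}.{mm:02d}.{ss:02d}.{ff:02d}"


def leader_layout(video_start, fps=24, leader_seconds=8, pop_seconds_before_ffoa=2):
    """Where the 2 POP and FFOA should land for a given video start timecode."""
    ffoa = tc_to_frames(video_start, fps) + leader_seconds * fps
    pop = ffoa - pop_seconds_before_ffoa * fps
    return {"2 POP": frames_to_tc(pop, fps), "FFOA": frames_to_tc(ffoa, fps)}


print(leader_layout("01.00.00.00"))  # video starts on the hour
print(leader_layout("00.59.52.00"))  # FFOA lands on the hour instead
```

The second call covers the other common preference, where the video starts 8 seconds before the hour so that FFOA sits at 01.00.00.00.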
Now this is all pretty standard, but if you’re helping a buddy with a video for Youtube you may get a blank response to some of these. I’ve worked on plenty of short films where a SMPTE leader or 2Pop was never used.
But for example if I had a dollar for every time a picture editor has said ‘I made some changes after we locked picture, but they don’t affect sound’ I would likely have enough to buy a nice bottle of wine. We ask for these things so we can easily verify if the video we are final mixing to, is actually the same for sync as our work video. And if there are late changes, which there often are, that we can efficiently deal with them.
Now it’s important I voice some opinions about the DAW you will use. For film sound editing & sound design ProTools is the industry standard (yes I can hear the groans already)
Don’t bother telling me how great Reaper is, or Nuendo etc… I own a license for Reaper & I know people who love Nuendo so I am not here to diss them. If you prefer to use those then you may not be able to follow the tutorials exactly, but you can still do them & adapt as is possible.
But this work is really not suitable for most music apps. For example, I own a copy of ableton LIVE and have done since version 1, and I love it for music. But ableton LIVE is not a program to do frame accurate sound editing. As an obvious example, it does not even have a timecode timeline available. As a sound editor we (mostly) have no interest in quantising to bars & beats, but we NEED to work in reference to timecode.
Can you edit & design sounds in ableton LIVE? Sure. But like many music apps it simply does not scale i.e. it cannot cope with the level of complexity required for a film soundtrack. In the past when I have said this out loud, some music producers replied angrily saying their music sessions are hugely complex/wide etc… My response: ok. Now imagine that your song lasts for 2 hours, and involves collaboration with a team of ten other sound editors all working in reference to the same video. And that video will go through maybe 50 versions, requiring conforms of all audio, and constant recutting for VFX updates, before picture is locked and a final mix proceeds….
The aim is to learn and use a DAW that you will still be using ten years from now, on feature film projects as a sound editor.
While Reaper is interesting, one of its great attributes is also one of its fundamental flaws. It takes seemingly major work to customise it for everyday use. And that customization is going to be different for every user. There is no consistent UI. Now that might be great if you’re the only person using your studio, but eg on the dub stage I would usually be managing two ProTools rigs, neither of which I own – they belong to the Post facility. And they have to be 100% stable, reliable and predictable as they are used by hundreds of different sound editors. Consistency of UX and UI is essential.
So if you use ProTools, or Nuendo then you should have no issues completing the homework.
If you use Reaper I have no idea how you will get on, as it depends how you have it configured.
If you try to use a music app, I doubt you will get half of the assigned tasks done. You may find ‘work arounds’ to achieve some aspects, but keep in mind that you will have to remember every ‘work around’ as you otherwise will not be creating a REPEATABLE WORKFLOW which is one of our primary aims.
Some people love to dump on ProTools but it’s worth remembering the soundtrack for pretty much every film and TV series you have seen in the last few decades was created using ProTools. If you are a hobbyist you can use your time however you like. But when working professionally and collaboratively with a team, you need reliable systems because those delivery deadlines do not move!
For the purposes of giving you some experience with working to picture, I’ve prepared a short video and a turnover of materials including a PDF with a series of exercises for you to complete.
You will be required to:
– convert the provided MP4 to a useable work codec
– set up a 24bit 48kHz DAW session with timecode timeline matching the video
– import a bunch of provided sounds & manually sync them
– open the OMF and import the tracks
– verify whether your manually sync’d sounds match the OMF (no cheating!)
– follow some instructions in the PDF to nudge some sounds out of sync by fixed amounts & observe.
The download includes:
– PDF with step by step tasks to be done
– video x1 MP4
– sounds x12 – 24 48 WAV
– OMF 24 48
Total = 150MB
Download = 98MB