Audio Technical Requirements


It is the responsibility of the producer to ensure that dialogue is easy to hear and understand by a first-time viewer who is using consumer equipment. Even viewers with slightly impaired hearing must be able to understand what is being said.

SVT receives many complaints about unclear dialogue, especially when background music and effects have been used. Remember, the audience has not seen the programme before transmission and has not seen a script. If background music or sound effects are necessary, the sound mix must be made with great care. Use the so-called ‘interleaving technique’ (the music or sound effect is established in the pause between the spoken words, but is adequately attenuated during dialogue).

Ensure that the background sound is low enough for hearing-impaired viewers to hear clearly what is being said. Speech and dialogue must have the highest priority!

Normal speech must be mixed with an even loudness level throughout the programme. Normal speech should be levelled close to -23 LUFS (0 LU on the relative scale) measured with the Short-term Loudness meter. However, the Programme Loudness target must be fulfilled (except regarding deliberately low loudness) and has priority over the dialogue level recommendation. Regarding dialogue mixing practices, see section 9.2 in EBU Tech 3343.
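The relationship between the absolute and relative loudness scales mentioned above can be sketched as follows. This is only an illustration: the function names are hypothetical, and the only value taken from this document is that -23 LUFS equals 0 LU.

```python
TARGET_LUFS = -23.0  # stated above: -23 LUFS corresponds to 0 LU on the relative scale


def lufs_to_lu(lufs: float) -> float:
    """Convert an absolute loudness reading (LUFS) to the relative scale (LU)."""
    return lufs - TARGET_LUFS


def lu_to_lufs(lu: float) -> float:
    """Convert a relative reading (LU) back to an absolute one (LUFS)."""
    return lu + TARGET_LUFS


print(lufs_to_lu(-23.0))  # 0.0
print(lufs_to_lu(-20.0))  # 3.0, i.e. 3 LU above the target
```

A short-term reading for normal speech would therefore be expected to sit near 0 LU on a meter set to the relative scale.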


All programmes must be mixed to comply with the EBU recommendation EBU R 128.

Programmes mixed according to the old standard will only be accepted by prior agreement with SVT. The old standard for measuring programme audio levels was the EBU Tech 3205 recommendation for Quasi Peak Programme Meters, rendered on the Nordic Scale. 0 dBu corresponded to -18 dBFS and the integration time was set to 10 ms. Typical peak levels for normal speech hovered between 0 and +6 dBu, and the maximum programme peak level did not exceed +11 dBu.
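As a rough illustration of the old alignment, the Nordic Scale figures above can be translated to dBFS with a fixed offset. The helper name is hypothetical. The results are nominal meter equivalents rather than true-peak values, since a quasi-peak meter with a 10 ms integration time does not register very short transients at full level.

```python
ALIGNMENT_OFFSET_DB = -18.0  # stated above: 0 dBu corresponded to -18 dBFS


def dbu_to_dbfs(level_dbu: float) -> float:
    """Translate a Nordic Scale meter reading (dBu) to its nominal dBFS equivalent."""
    return level_dbu + ALIGNMENT_OFFSET_DB


readings = [
    ("normal speech, lower bound", 0.0),   # -> -18 dBFS
    ("normal speech, upper bound", 6.0),   # -> -12 dBFS
    ("maximum programme peak", 11.0),      # -> -7 dBFS
]
for label, dbu in readings:
    print(f"{label}: {dbu:+.0f} dBu = {dbu_to_dbfs(dbu):.0f} dBFS")
```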

The programme metadata element ‘Audio Comments’ must be used to note whether the programme, with prior agreement, has been mixed according to the old “Nordic Scale” standard.

The EBU R 128 loudness terms used in this document, how they are measured, and the delivery requirements are listed in the table below. All programmes must be compliant with the ‘Programme Loudness’ and ‘Maximum True Peak’ requirements. Other parameters are given for guidance only.

[Figures: Loudness diagram; Loudness diagram II]

Guidelines for True Peak Audio Levels

[Table: Guidelines for True Peak Audio Levels]

The table above is only for guidance on the true peak levels of diverse types of audio. At all times dialogue must be distinct and clear.

Metering Requirements

Meters must comply with the specifications in EBU Tech 3341. Programmes must be measured using the EBU Integrated (I) mode and the measurement must be applied to the whole programme (EBU Tech 3343). The optional LFE channel must be excluded from all measurements.

Stereo Audio Requirements

Stereo tracks must carry sound in the A/B (Left/Right) form.

If mono originated sound is used, it must be recorded as dual mono, so that it may be handled exactly as stereo. It must meet all the stereo standards regarding levels, balance and phase.

Stereo Line-Up Tones

The use of line-up tones for File delivery is optional. When used, each stereo audio pair must have either EBU stereo or GLITS line-up tone (not a mix of both). Tone must be 1 kHz (2 kHz is acceptable on M&E channels), sinusoidal, free of distortion and phase coherent between channels.

Audio files of GLITS and EBU stereo tones may be downloaded from the DPP website.

Digital Audio Reference level is defined as 18 dB below the maximum coding value (-18 dBFS).
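To make the reference level concrete, the sketch below converts a dBFS value to a linear peak amplitude and generates a 1 kHz sine at -18 dBFS. It assumes a 48 kHz sample rate and the common convention that a full-scale sine reads 0 dBFS; neither is stated in this section, and the function names are hypothetical. The downloadable line-up tone files remain the authoritative source.

```python
import math


def dbfs_to_amplitude(level_dbfs: float) -> float:
    """Linear peak amplitude (full scale = 1.0) for a sine at the given dBFS level."""
    return 10.0 ** (level_dbfs / 20.0)


def sine_tone(freq_hz: float, level_dbfs: float, duration_s: float,
              sample_rate: int = 48_000) -> list[float]:
    """Generate a mono sine tone as floating-point samples in the range -1.0..1.0."""
    peak = dbfs_to_amplitude(level_dbfs)
    count = int(round(duration_s * sample_rate))
    return [peak * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(count)]


tone = sine_tone(1000.0, -18.0, 0.01)        # 10 ms of 1 kHz at reference level
print(round(dbfs_to_amplitude(-18.0), 4))    # 0.1259
```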

Stereo Phase

Stereo programme audio must be capable of down-mixing to mono without causing any noticeable phase cancellation.
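One simple way to screen for mono-compatibility problems is to compute the zero-lag correlation between the two channels and to inspect the mono sum. The sketch below is an illustration only, not an SVT acceptance test; the function names and the equal-weight mono sum are assumptions.

```python
import math


def phase_correlation(left: list[float], right: list[float]) -> float:
    """Zero-lag normalised correlation: +1 fully in phase, -1 fully out of phase."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den else 0.0


def mono_sum(left: list[float], right: list[float]) -> list[float]:
    """Equal-weight mono down-mix of a stereo pair."""
    return [0.5 * (l + r) for l, r in zip(left, right)]


left = [math.sin(2.0 * math.pi * i / 48) for i in range(480)]
inverted = [-s for s in left]

print(round(phase_correlation(left, left), 3))        # 1.0
print(round(phase_correlation(left, inverted), 3))    # -1.0
print(max(abs(s) for s in mono_sum(left, inverted)))  # 0.0, complete cancellation
```

Sustained negative correlation on programme material suggests that the mono sum will lose level, so such passages are worth auditioning in mono.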

Surround Sound Requirements

Surround sound should be delivered as discrete tracks, i.e. preferably not as ‘Dolby E’.

For programmes carrying surround sound (>2.0) it is optional to deliver an additional stereo (2.0) mix. SVT transmits a stereo audio stream in conjunction with a multichannel audio stream, but it is made from an in-house mix-down of the multichannel audio with Audio Metadata applied – i.e. stereo listeners will receive either a mix-down from the surround channels generated in SVT’s playout chain or a mix-down generated in their receiver.

Surround Line-Up Tones UHD Programmes

Ultra-High Definition (UHD) programmes are not accepted by SVT.

Surround Line-Up Tones HD Programmes

The use of line-up tones for File delivery is optional. When used, all surround tracks must carry BLITS tone, as described in EBU Technical Paper 3304. An audio file of BLITS tone may be downloaded from the DPP website.

AES Sample Timing

Very small timing differences between audio tracks in a surround programme will not be heard unless the stereo down-mix is monitored acoustically. An error of as little as one or two samples between the Left, Right and Centre channels can cause phasing and comb filtering for those listening in stereo.

Timing differences between audio channels must be no more than 0.2 samples (i.e. the timing between each channel of the six audio tracks of a surround sound signal).
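The effect of a small inter-channel offset can be estimated with simple arithmetic. When two equal-level copies of a signal are summed with a delay τ between them, the first cancellation notch falls at 1/(2τ). The sketch below assumes a 48 kHz sample rate, which this section does not state, and uses hypothetical function names. In a real down-mix the copies are rarely at equal level, so the notches are shallower than this idealised case.

```python
def delay_seconds(delay_samples: float, sample_rate: int = 48_000) -> float:
    """Duration of a delay expressed in samples."""
    return delay_samples / sample_rate


def first_notch_hz(delay_samples: float, sample_rate: int = 48_000) -> float:
    """First comb-filter notch when a signal is summed with a delayed copy of itself."""
    return sample_rate / (2.0 * delay_samples)


print(first_notch_hz(1))                    # 24000.0 Hz for a one-sample error
print(first_notch_hz(2))                    # 12000.0 Hz for a two-sample error
print(round(delay_seconds(0.2) * 1e6, 2))   # 4.17 microseconds for 0.2 samples
```

Under these assumptions, a two-sample error already places a notch within the audible range, which is consistent with the tight tolerance above.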

Surround Sound Mixing Requirements

To help programme makers meet their responsibilities, it is important that all transmitted audio can be easily and clearly monitored by both Editorial and Technical staff during the production process.

Dialogue in a Surround Mix

For speech intelligibility reasons, it is preferred to use the centre channel for dialogue, also known as “film style”. Mixing dialogue in the left or right front channels for certain artistic purposes is not precluded. Mixing techniques such as “centre spread” are allowed. In exceptional cases such as music mixing, a “music style” mix, with the singing voice placed mainly in the left and right front channels and just a little of it in the centre channel, could be accepted. Dialogue with almost equal levels in all front channels should be avoided, since it is not down-mix compatible.

When down-mixed to stereo (with down-mix metadata applied), the down-mix must have similar loudness of dialogue in relation to music and effects compared to when listening to the surround mix.

General Mixing Requirements

Viewers of the HD channels listening in stereo (or mono) will hear either a receiver-derived automated down-mix of a surround sound programme using the Dolby Metadata parameters or an in-house derived down-mix. Some HD platforms only transmit AC3 audio, switching between Stereo and Surround. Some HD platforms also include a Stereo stream, which is an automated down-mix derived in-house.

The stereo mix is not transmitted on the Standard Definition channel(s) either. SD channels only transmit an automated down-mix.

The audio parameters controlled by the metadata include: centre and rear down-mix levels, and the extent of any dynamic range control applied. Therefore:

  • it is essential to check the automated down-mix using a monitoring system that applies or simulates the metadata settings. Any external processor (e.g. a Dolby DP570) must be set to apply the programme’s metadata;
  • pre-mixed stereo content should be up-mixed, where appropriate, to match the surround sound to maintain the audio image throughout a surround broadcast. A method of up-mixing approved by the broadcaster must be adopted, which anchors dialogue to the front and disperses effects around the image;
  • up-mixed material must also down-mix to stereo and mono with no audible artefacts. The injudicious use of phase shifting and delay within some up-mixing algorithms may become more noticeable in the subsequent receiver down-mix process, and result in unacceptable down-mixed audio.

For general surround sound (e.g. audience reaction) phase-coherence invariably benefits both the wrap-around effect in 5.1 and the stereo down-mix. Coincident microphone techniques (e.g. crossed-pairs) tend to outperform spaced mono microphones in this context.

Stereo and Centre Channel Monitoring

It is essential that the mono and stereo down-mixes of a surround programme are monitored in at least equal measure to the surround mix. A large majority of viewers will be listening in stereo rather than 5.1.

Consistency of Image

When a surround programme contains mono content interleaved with stereo pre-recorded items, it is important to maintain the consistency of the sound image and prevent the effect of dialogue appearing to jump between Centre Only and Phantom Centre (Left/Right) only.

Dolby Metadata Settings

Programmes must be delivered together with Audio Metadata.

Use the ‘Programme Metadata File’ (an Excel file based on a template provided by SVT), which includes specific Dolby and Loudness metadata items.

Audio Metadata values, including SVT’s subset of Dolby metadata values, must remain constant throughout a programme. For the time being, SVT uses two alternative pre-sets of Dolby Metadata for transmission: one regarding stereo 2.0 and another regarding multichannel 5.1. Where Dolby Digital is used for transmission, the following fixed metadata values are used:

  • Downmix to Lo = L + (C - 3 dB) + (Ls - 3 dB)
  • Downmix to Ro = R + (C - 3 dB) + (Rs - 3 dB)
  • DRC profile = Music Light
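The two down-mix equations above can be sketched in code to preview how a surround mix folds down to stereo. This is a minimal illustration with hypothetical names. It applies only the listed coefficients: the LFE channel is left out because it does not appear in the equations, and no dynamic range control or overall level scaling is applied, so the result may exceed full scale.

```python
MINUS_3_DB = 10.0 ** (-3.0 / 20.0)  # linear gain for -3 dB, about 0.708


def downmix_lo_ro(l: list[float], r: list[float], c: list[float],
                  ls: list[float], rs: list[float]) -> tuple[list[float], list[float]]:
    """Lo = L + (C - 3 dB) + (Ls - 3 dB); Ro = R + (C - 3 dB) + (Rs - 3 dB)."""
    lo = [a + MINUS_3_DB * (b + d) for a, b, d in zip(l, c, ls)]
    ro = [a + MINUS_3_DB * (b + d) for a, b, d in zip(r, c, rs)]
    return lo, ro


# Centre-only dialogue at full scale appears equally in Lo and Ro, 3 dB lower.
lo, ro = downmix_lo_ro([0.0], [0.0], [1.0], [0.0], [0.0])
print(round(lo[0], 4), round(ro[0], 4))  # 0.7079 0.7079
```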

Guidance for Acquired Programmes and Movies

Acquired programmes and movies must be received with metadata according to section 2.7 above. If no metadata exists, SVT assumes the parameters above.

Sound to Vision Synchronisation

The relative timing of sound to vision should not exhibit any perceptible error. Sound must not lead or lag the vision by more than 5 ms.

Audio / Video Sync Markers

The following, regarding sync markers, is optional. To assist in maintaining A/V sync through the post-production process, a ‘sync plop’ should be used which must meet the following conditions:

  • the sync plop must be between timecode 09:59:57:06 and 09:59:57:08;
  • the audio plop must be 1 kHz tone in all channels (82.5 Hz in the LFE channel) at -24 dBFS (-18 dBFS is acceptable for stereo programmes);
  • the duration of the vision flash must be 2 frames to allow it to pass through standards conversion successfully;
  • the duration of the audio plop must be 1 frame, starting on the first frame of the vision flash. It must be synchronous across all audio channels and with the video flash (within ±5 ms).
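The timing figures in the list above can be checked with a little timecode arithmetic. The sketch below assumes 25 frames per second (as in the note that follows), a 48 kHz sample rate, and that the second timecode is the exclusive end of the flash, which matches the stated 2-frame duration. The sample rate, the function name and the variable names are assumptions for illustration.

```python
FRAME_RATE = 25        # frames per second (25i, 25p or 25PsF)
SAMPLE_RATE = 48_000   # assumption: not stated in this section


def timecode_to_frames(timecode: str) -> int:
    """Convert a non-drop-frame HH:MM:SS:FF timecode to a frame count."""
    hh, mm, ss, ff = (int(part) for part in timecode.split(":"))
    return (hh * 3600 + mm * 60 + ss) * FRAME_RATE + ff


flash_start = timecode_to_frames("09:59:57:06")
flash_end = timecode_to_frames("09:59:57:08")    # treated as exclusive
flash_frames = flash_end - flash_start           # 2 frames of vision flash

samples_per_frame = SAMPLE_RATE // FRAME_RATE    # 1920 samples per 40 ms frame
plop_samples = 1 * samples_per_frame             # audio plop lasts 1 frame
tolerance_samples = SAMPLE_RATE * 5 // 1000      # ±5 ms is 240 samples

print(flash_frames, plop_samples, tolerance_samples)  # 2 1920 240
```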

If an end sync plop is used it must be no closer than 10 seconds to the end of the programme and comply with the relevant points above.

Note: The above is applicable in the case of 50 Hz motion portrayal via interlaced video (25i) as well as 25 Hz motion portrayal via progressive scan (25p) or progressive scan segmented frame (25PsF). For 50 Hz motion portrayal via progressive scan video (50p), see the table in Programme Layout for File Delivery.