Dear WCAG Plus Forum community members,
After exploring Criterion 1.2.1 for audio-only and video-only content, let's now move to another crucial requirement for the accessibility of time-based media: Success Criterion 1.2.2: Captions (Prerecorded). This is also a Level A criterion, which underscores its fundamental importance for ensuring inclusion.
What does Criterion 1.2.2 require?
Criterion 1.2.2 states that captions must be provided for all prerecorded audio content in synchronized media (i.e., videos with significant audio, like most videos we watch), except when the media is itself an alternative for text and is clearly labeled as such.
- Captions: These are the textual representation of all audio in the video – both dialogue and significant non-speech sounds (e.g., "[applause]", "[background music]", "[phone ringing]"). They must be synchronized with the audio, and ideally presented so that users can turn them on or off ("closed captions").
Captions are vital for ensuring access to audio content within videos, particularly for:
- Deaf or Hard of Hearing Individuals: For them, captions are the only way to access dialogue and the sound information essential for understanding the video. Without captions, video content with audio is largely inaccessible.
- Individuals with Cognitive or Learning Disabilities: Captions can enhance comprehension, provide visual support to speech, and allow information to be re-read.
- Users in Silent or Noisy Environments: They enable content consumption in contexts where audio cannot be heard (office, public places) or where background noise makes it difficult to hear the audio.
- Non-native speakers: Captions can aid language comprehension.
Closed vs. Open Captions:
- Closed Captions (CC): These are captions that can be turned on or off by the user via the video player. They are the generally recommended format because they give users control over presentation.
- Open Captions: These are captions "burned in" directly into the video and cannot be turned off. Although they provide access and can satisfy 1.2.2, they offer no user control and can be problematic for some users (e.g., by obstructing visual elements). Closed captions are therefore preferred.
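As a minimal sketch, closed captions are typically attached to an HTML5 video with a `<track>` element; the file names below are placeholders, not part of any real project:

```html
<!-- Minimal closed-captions setup. "lecture.mp4" and
     "lecture-captions.vtt" are placeholder file names. -->
<video controls>
  <source src="lecture.mp4" type="video/mp4">
  <!-- kind="captions" indicates dialogue plus significant non-speech
       sounds; the player's CC control lets the user toggle the track -->
  <track kind="captions" src="lecture-captions.vtt"
         srclang="en" label="English" default>
</video>
```

Because the captions live in a separate track file rather than being burned into the pixels, users keep full control over whether and how they are displayed.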
Best Practices for Implementation:
1. Synchronization: Captions must appear on screen in sync with when the words are spoken or significant sounds occur.
2. Accuracy and Completeness: They must faithfully reflect the dialogue and all sounds important for understanding the content.
3. Speaker Identification and Non-Speech Sounds:
   - Speaker Identification: When multiple people are speaking, it's crucial to identify who is speaking (e.g., "Michele: Hello!").
   - Non-Speech Sounds: Include bracketed descriptions for non-verbal sounds (e.g., "[Upbeat music]", "[Laughter]", "[Door slams]").
4. Testing: It's crucial to test captions to ensure they are accurate, synchronized, and easily toggleable across different platforms and devices.
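The conventions above (precise timing, speaker identification, bracketed non-speech sounds) can be illustrated with a short WebVTT fragment; the cue text and timestamps are invented for illustration:

```vtt
WEBVTT

1
00:00:01.000 --> 00:00:04.000
Michele: Hello, and welcome to the session!

2
00:00:04.500 --> 00:00:06.000
[Applause]

3
00:00:06.500 --> 00:00:10.000
Michele: Today we'll look at how captions are authored.
```

The older SRT format follows the same cue structure but uses a comma as the decimal separator in timestamps and has no `WEBVTT` header; most caption editors can convert between the two.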
Common Pitfalls to Avoid:
- Relying Solely on Auto-Generated Captions: Automatic captioning tools (e.g., YouTube's auto-captions) are a good starting point, but they often contain transcription errors and timing inaccuracies, and they fail to identify speakers or non-verbal sounds. They always require human review and correction.
- Inaccurate or Out-of-Sync Captions: Captions that don't match the audio or appear too early/late make content incomprehensible and frustrating.
- Omitting Significant Non-Speech Sounds: The lack of descriptions for crucial sounds (e.g., an alarm, a tone, an explosion) can compromise understanding.
We invite the community to share their experiences:
- What tools or services do you use to create accurate captions?
- Do you have tips on managing captions for long or complex videos?
- What have been your biggest challenges in implementing this criterion?
Warm regards,
Michele (wcgadmfrm)
WCAG Plus Forum Team