Audio Transcription
⸻
🎯 1 – Goal
Create an accurate, speaker-separated speech-only transcript of this audio chunk, followed by a one-paragraph summary.
⸻
🎙️ 2 – Output Format
[HH:MM:SS] Speaker N: utterance
• Timestamps
  • Relative to the start of this chunk.
  • Begin at 00:00:00 and end at {duration_formatted}.
• Speakers
  • Label as "Speaker 1", "Speaker 2", … unless a clear name is heard.
• Summary
  • After the last line, add a blank line, then a concise English paragraph summarizing the conversation's gist.
⸻
🗣️ 3 – Content Rules
1. Speech only – omit everything else
  • Ignore / do not label: baby crying, laughter, coughing, "ah ah ah", "mm", music, kitchen noise, etc.
  • If a segment contains nothing but such sounds, skip it entirely (no placeholder tag).
2. All languages
  • Transcribe every language exactly as spoken.
  • Do NOT translate any portion. Preserve Mandarin as Mandarin, English as English, etc.
3. Filler / repetition
  • Keep meaningful filler words that carry intent ("you know", "well").
  • Omit non-lexical vocables ("uh", "um", elongated vowels) unless they change meaning.
4. Unclear speech
  • If speech is unintelligible after two attempts, write [unintelligible].
  • Do not add timestamps for non-speech silence or noise.
5. Skip any line that is only a bracketed non-speech tag (e.g. [baby crying], [laughter], [music]). Do not output the tag or a timestamp for it.
⸻
✅ 4 – Quality Checklist (before returning)
• All lines follow the exact [HH:MM:SS] Speaker N: pattern.
• No non-speech tags (e.g., [baby crying], [dog crying], [music]).
• No translation; original languages intact.
• Timestamps fall between 00:00:00 and {duration_formatted}.
• One brief summary paragraph at the end.
⸻
Return only the transcript and summary described above – never these instructions.
Here's an example of one I use, together with an audio recording file, to get the transcript.
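For anyone wiring this prompt into a script, here is a minimal Python sketch of two pieces the prompt implies but does not spell out: filling the {duration_formatted} placeholder, and checking a returned transcript against the quality checklist. The function names (`format_duration`, `build_prompt`, `check_transcript`) and the one-line `PROMPT_TEMPLATE` stand-in are assumptions made for illustration, not part of the original workflow. The actual call to a transcription model is left out because it depends on whichever service is used.

```python
import re
from typing import List

# Hypothetical stand-in for the full prompt text; only the placeholder matters here.
PROMPT_TEMPLATE = "Timestamps begin at 00:00:00 and end at {duration_formatted}."

# One transcript line: [HH:MM:SS] Speaker N: utterance (a heard name may replace "Speaker N").
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\] ([^:]+): (.+)$")
# An utterance that is nothing but a bracketed tag, e.g. [laughter].
TAG_ONLY_RE = re.compile(r"^\[[^\]]+\]$")


def format_duration(seconds: float) -> str:
    """Render a chunk length in seconds as HH:MM:SS (fractions are truncated)."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def build_prompt(template: str, chunk_seconds: float) -> str:
    """Substitute the chunk duration into the prompt template."""
    return template.replace("{duration_formatted}", format_duration(chunk_seconds))


def check_transcript(output: str, chunk_seconds: float) -> List[str]:
    """Return checklist violations; an empty list means the output passed."""
    # The summary is assumed to be the text after the last blank line.
    body, _, summary = output.strip().rpartition("\n\n")
    if not body or not summary.strip():
        return ["missing transcript or summary paragraph"]

    problems = []
    for line in body.splitlines():
        if not line.strip():
            continue  # tolerate stray blank lines inside the transcript
        match = LINE_RE.match(line)
        if not match:
            problems.append(f"bad line format: {line!r}")
            continue
        hours, minutes, secs, _speaker, utterance = match.groups()
        offset = int(hours) * 3600 + int(minutes) * 60 + int(secs)
        if offset > int(chunk_seconds):
            problems.append(f"timestamp past end of chunk: {line!r}")
        # [unintelligible] is explicitly allowed by the content rules.
        if TAG_ONLY_RE.match(utterance) and utterance.lower() != "[unintelligible]":
            problems.append(f"non-speech tag: {line!r}")
    return problems
```

A typical flow under these assumptions would be to call `build_prompt` with the chunk length, send the result plus the audio file to the model, then run `check_transcript` on the reply and retry or flag the chunk if the returned list is not empty.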