AudioFile
AudioFile extends File and provides additional methods for working with audio files.
AudioFile instances are created when a DataChain is initialized from storage with the type="audio" parameter.
There are additional models for working with audio files:
- AudioFragment: represents a fragment of an audio file.
These are virtual models that do not create physical files.
Instead, they represent data within the AudioFile they refer to.
To persist that data, use the model's save method, which can write it locally or upload it to a storage service.
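To make the idea of a virtual model concrete, here is a minimal sketch that assumes a fragment is nothing more than a reference to the source file plus a time range. `FragmentRef` and its `loader` callback are hypothetical names, not DataChain's AudioFragment. Creating one performs no I/O, and the underlying file is touched only when `read` is called. The range check in `__post_init__` is also an assumption.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentRef:
    """Hypothetical stand-in for a virtual fragment: a path plus a time range."""

    source: str  # path or URI of the underlying audio file (never opened here)
    start: float  # seconds
    end: float  # seconds

    def __post_init__(self) -> None:
        # Assumed validation: a fragment needs a non-negative, non-empty range.
        if self.start < 0 or self.end <= self.start:
            raise ValueError(f"invalid time range: ({self.start}, {self.end})")

    @property
    def duration(self) -> float:
        return self.end - self.start

    def read(self, loader: Callable[[str, float, float], bytes]) -> bytes:
        # Only at this point is the underlying file actually accessed.
        return loader(self.source, self.start, self.end)
```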
For a complete example of audio processing with DataChain, see Audio-to-Text with Whisper, a speech recognition pipeline that uses AudioFile, AudioFragment, and Audio to chunk audio files and transcribe them.
AudioFile
Bases: File
A data model for handling audio files.
This model inherits from the File model and provides additional functionality
for reading audio files, extracting audio fragments, and splitting audio into
fragments.
Source code in datachain/lib/file.py
get_fragment
get_fragment(start: float, end: float) -> AudioFragment
Returns an audio fragment for the specified time range. It neither downloads the file nor extracts the fragment; instead, it returns a model representing the audio fragment, which can be used to read or save it later.
Parameters:
- start (float) – The start time of the fragment in seconds.
- end (float) – The end time of the fragment in seconds.
Returns:
- AudioFragment (AudioFragment) – A model representing the audio fragment.
get_fragments
get_fragments(
duration: float,
start: float = 0,
end: float | None = None,
) -> Iterator[AudioFragment]
Splits the audio into multiple fragments of a specified duration.
Parameters:
- duration (float) – The duration of each audio fragment in seconds.
- start (float, default: 0) – The starting time in seconds.
- end (float | None, default: None) – The ending time in seconds. If None, the entire remaining audio is processed.
Returns:
- Iterator[AudioFragment] – An iterator yielding audio fragments.
Note
If end is not specified, the number of samples is read from the audio file, which means the audio file needs to be downloaded.
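The splitting described above can be sketched as a generator of (start, end) pairs. `split_time_range` is a hypothetical helper, not the library's implementation; it assumes fragments are consecutive and non-overlapping, and that the last one is shortened so it never extends past `end`. Because the real method must read the file to learn its length when `end` is None, the sketch takes that length as an explicit `total_duration` argument instead.

```python
from __future__ import annotations

from collections.abc import Iterator


def split_time_range(
    total_duration: float,
    duration: float,
    start: float = 0.0,
    end: float | None = None,
) -> Iterator[tuple[float, float]]:
    """Yield consecutive (start, end) pairs of at most `duration` seconds.

    `total_duration` stands in for the length the real method would read
    from the audio file when `end` is None.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    if end is None:
        end = total_duration  # process the entire remaining audio
    position = start
    while position < end:
        fragment_end = min(position + duration, end)  # last piece may be shorter
        yield position, fragment_end
        position = fragment_end
```

For a 10-second file split into 4-second pieces, this yields (0, 4), (4, 8) and a shorter final (8, 10).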
get_info
get_info() -> Audio
Retrieves metadata about the audio file. Where possible, it does not download the whole file and reads only its header. If you only need audio metadata, it may therefore be a good idea to disable caching and prefetching for the UDF.
Returns:
- Audio (Audio) – A model containing audio metadata such as duration, sample rate, channels, and codec details.
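To illustrate why metadata can be obtained from the header alone, the sketch below uses Python's standard `wave` module, which reads frame count, sample rate, and channel count from a WAV header without decoding the audio payload. This only applies to uncompressed WAV and is not how DataChain's get_info is implemented. `wav_info` and `make_silent_wav` are hypothetical helpers, the dictionary keys merely mirror some Audio fields, and treating `samples` as frames per channel is an assumption.

```python
import io
import wave


def wav_info(data: bytes) -> dict:
    """Read basic metadata from WAV bytes using only the header."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()  # computed from the header's data-chunk size
        return {
            "sample_rate": sample_rate,
            "channels": channels,
            "samples": frames,  # assumption: frames (samples per channel)
            "duration": frames / sample_rate if sample_rate > 0 else -1.0,
            "format": "wav",
            # bits per second for uncompressed PCM
            "bit_rate": sample_rate * channels * wav.getsampwidth() * 8,
        }


def make_silent_wav(seconds: float, sample_rate: int = 8000, channels: int = 1) -> bytes:
    """Build an in-memory 16-bit PCM WAV file of silence for the example."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * channels * int(seconds * sample_rate))
    return buffer.getvalue()
```

A canonical 16-bit PCM header written by `wave` is 44 bytes, so `wav_info(make_silent_wav(2.0)[:44])` can report the full 2-second duration even though none of the audio payload is present.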
save
save(
destination: str,
format: str | None = None,
start: float = 0,
end: float | None = None,
client_config: dict | None = None,
) -> AudioFile
Saves the audio file, or extracts a fragment of it, in the specified format.
If destination is a remote path, the audio file will be uploaded
to remote storage.
Parameters:
- destination (str) – Output directory path or URI (e.g. s3://…, gs://…).
- format (str | None, default: None) – Output format ('wav', 'mp3', etc.). Defaults to the source format.
- start (float, default: 0) – Start time in seconds (>= 0). Defaults to 0.
- end (float | None, default: None) – End time in seconds. If None, extracts to the end of the file.
- client_config (dict | None, default: None) – Optional client configuration.
Returns:
- AudioFile (AudioFile) – New audio file with format conversion/extraction applied.
Examples:
audio.save("/path", "mp3")  # Entire file to MP3
audio.save("s3://bucket/path", "wav", start=2.5)  # From 2.5s to end as WAV
audio.save("/path", "flac", start=1, end=3)  # 1-3s fragment as FLAC
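A minimal local-only sketch of the start/end semantics documented for save, using the standard `wave` module. It assumes the source is an uncompressed WAV file and always writes WAV; format conversion (MP3, FLAC, etc.) and uploading to remote storage are out of scope. `save_wav_fragment` is a hypothetical helper that takes an output file path rather than a directory, and is not part of DataChain.

```python
from __future__ import annotations

import wave
from pathlib import Path


def save_wav_fragment(
    source: str | Path,
    output_path: str | Path,
    start: float = 0.0,
    end: float | None = None,
) -> Path:
    """Copy the [start, end) time range of a WAV file into a new WAV file."""
    if start < 0:
        raise ValueError("start must be >= 0")
    with wave.open(str(source), "rb") as src:
        rate = src.getframerate()
        total_frames = src.getnframes()
        first = int(start * rate)
        # end=None means "extract to the end of the file"
        last = total_frames if end is None else min(int(end * rate), total_frames)
        if last <= first:
            raise ValueError("end must be greater than start")
        src.setpos(first)
        frames = src.readframes(last - first)
        nchannels, sampwidth = src.getnchannels(), src.getsampwidth()

    out = Path(output_path)
    with wave.open(str(out), "wb") as dst:
        dst.setnchannels(nchannels)
        dst.setsampwidth(sampwidth)
        dst.setframerate(rate)
        dst.writeframes(frames)  # header is sized for the fragment only
    return out
```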
AudioFragment
Bases: DataModel
A data model for representing an audio fragment.
This model represents a specific fragment within an audio file with defined start and end times. It allows access to individual fragments and provides functionality for reading and saving audio fragments as separate audio files.
Attributes:
- audio (AudioFile) – The audio file containing the audio fragment.
- start (float) – The starting time of the audio fragment in seconds.
- end (float) – The ending time of the audio fragment in seconds.
get_np
Returns the audio fragment as a NumPy array, together with its sample rate.
Returns:
- tuple[ndarray, int] – A tuple containing the audio data as a NumPy array and the sample rate.
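As an illustration of the (array, sample_rate) return shape, this sketch decodes 16-bit PCM WAV bytes into a NumPy array. It assumes NumPy is installed, and it scales samples to float32 in [-1.0, 1.0) with shape (frames, channels); those are choices made for the sketch, since the dtype and shape that get_np actually returns are not stated above. `wav_bytes_to_np` is hypothetical.

```python
import io
import wave

import numpy as np


def wav_bytes_to_np(data: bytes) -> tuple[np.ndarray, int]:
    """Decode 16-bit PCM WAV bytes into (samples, sample_rate)."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # Interleaved little-endian int16 samples -> (frames, channels) array.
    samples = np.frombuffer(raw, dtype="<i2").reshape(-1, channels)
    # Scale to float32 in [-1.0, 1.0).
    return samples.astype(np.float32) / 32768.0, rate
```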
read_bytes
Returns the audio fragment as audio bytes.
Parameters:
- format (str, default: 'wav') – The desired audio format (e.g., 'wav', 'mp3'). Defaults to 'wav'.
Returns:
- bytes (bytes) – The encoded audio fragment as bytes.
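The idea behind read_bytes, encoding only the fragment's time range as standalone audio bytes, can be sketched for WAV with the standard library. `fragment_to_wav_bytes` is a hypothetical helper that assumes an uncompressed WAV source; encoding to compressed formats such as MP3 would require an external encoder and is not shown.

```python
import io
import wave


def fragment_to_wav_bytes(source: bytes, start: float, end: float) -> bytes:
    """Return the [start, end) range of a WAV file as a new, self-contained WAV."""
    with wave.open(io.BytesIO(source), "rb") as src:
        rate = src.getframerate()
        first = int(start * rate)
        last = min(int(end * rate), src.getnframes())
        src.setpos(first)
        frames = src.readframes(max(last - first, 0))
        nchannels, sampwidth = src.getnchannels(), src.getsampwidth()

    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(nchannels)
        dst.setsampwidth(sampwidth)
        dst.setframerate(rate)
        dst.writeframes(frames)  # writes a fresh header sized for this fragment
    return out.getvalue()
```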
save
save(
destination: str,
format: str | None = None,
client_config: dict | None = None,
) -> AudioFile
Saves the audio fragment as a new audio file.
If destination is a remote path, the audio file will be uploaded
to remote storage.
Parameters:
- destination (str) – Output directory path or URI (e.g. s3://…, gs://…).
- format (str | None, default: None) – Output audio format (e.g., 'wav', 'mp3'). If None, inferred from the file extension.
- client_config (dict | None, default: None) – Optional client configuration (e.g. credentials).
Returns:
- AudioFile (AudioFile) – A model representing the saved audio file.
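Beyond the parameter descriptions, the documentation does not spell out how save decides between a local write and an upload, or where an inferred format comes from. The sketch below shows one hypothetical way to resolve both: it treats any destination with a multi-letter URI scheme such as s3:// or gs:// as remote, and falls back to the source file's extension when format is None. `resolve_save_target` is an illustration only, not DataChain code.

```python
from __future__ import annotations

import os
from urllib.parse import urlparse


def resolve_save_target(
    source_path: str, destination: str, format: str | None = None
) -> tuple[bool, str]:
    """Return (is_remote, output_format) for a hypothetical save call."""
    scheme = urlparse(destination).scheme
    # Treat s3://, gs://, etc. as remote; ignore one-letter Windows drive names.
    is_remote = len(scheme) > 1
    if format is None:
        # Fall back to the source file's extension, e.g. "clip.mp3" -> "mp3".
        ext = os.path.splitext(source_path)[1]
        if not ext:
            raise ValueError("format is None and the source has no extension")
        format = ext[1:].lower()
    return is_remote, format
```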
Audio
Bases: DataModel
A data model representing metadata for an audio file.
Attributes:
- sample_rate (int) – The sample rate of the audio (samples per second). Defaults to -1 if unknown.
- channels (int) – The number of audio channels. Defaults to -1 if unknown.
- duration (float) – The total duration of the audio in seconds. Defaults to -1.0 if unknown.
- samples (int) – The total number of samples in the audio. Defaults to -1 if unknown.
- format (str) – The format of the audio file (e.g., 'wav', 'mp3'). Defaults to an empty string.
- codec (str) – The codec used for encoding the audio. Defaults to an empty string.
- bit_rate (int) – The bit rate of the audio in bits per second. Defaults to -1 if unknown.
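The -1 and empty-string defaults above act as "unknown" sentinels. A small sketch of how a consumer might handle them, assuming duration can be derived as samples / sample_rate when both are known (this holds only if samples counts frames per channel, which the attribute description does not specify). `AudioMeta` is a hypothetical stand-in that mirrors the attribute names; it is not the Audio model itself.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AudioMeta:
    """Hypothetical mirror of the Audio attributes and their 'unknown' defaults."""

    sample_rate: int = -1
    channels: int = -1
    duration: float = -1.0
    samples: int = -1
    format: str = ""
    codec: str = ""
    bit_rate: int = -1

    def effective_duration(self) -> float | None:
        """Return a known duration, deriving it from samples if necessary."""
        if self.duration >= 0:
            return self.duration
        if self.samples >= 0 and self.sample_rate > 0:
            # Assumes `samples` counts frames per channel.
            return self.samples / self.sample_rate
        return None  # not enough information to tell
```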
get_channel_name (staticmethod)
Maps a channel index to a meaningful name based on common audio formats.
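A sketch of such a mapping, assuming a few common layouts: mono, stereo, and 5.1 in the order used by WAV channel masks (front left, front right, front center, LFE, back left, back right). `channel_name` and `CHANNEL_LAYOUTS` are hypothetical; the signature, layouts, and labels that get_channel_name actually uses are not listed above.

```python
CHANNEL_LAYOUTS: dict[int, list[str]] = {
    1: ["Mono"],
    2: ["Left", "Right"],
    # 5.1 in WAV channel-mask order: FL, FR, FC, LFE, BL, BR
    6: ["Front Left", "Front Right", "Front Center", "LFE", "Back Left", "Back Right"],
}


def channel_name(num_channels: int, index: int) -> str:
    """Map a channel index to a readable name, falling back to a generic label."""
    names = CHANNEL_LAYOUTS.get(num_channels)
    if names is not None and 0 <= index < len(names):
        return names[index]
    return f"Channel {index + 1}"  # unknown layout: 1-based generic name
```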