Core IO and DSP

Audio processing

load(path[, sr, mono, offset, duration, dtype]) Load an audio file as a floating point time series.
to_mono(y) Force an audio signal down to mono.
resample(y, orig_sr, target_sr[, res_type, …]) Resample a time series from orig_sr to target_sr
get_duration([y, sr, S, n_fft, hop_length, …]) Compute the duration (in seconds) of an audio time series or STFT matrix.
autocorrelate(y[, max_size, axis]) Bounded auto-correlation
zero_crossings(y[, threshold, …]) Find the zero-crossings of a signal y: indices i such that sign(y[i]) != sign(y[i-1]).
clicks([times, frames, sr, hop_length, …]) Return a signal with a click sound placed at each specified time.
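
The zero-crossing definition above can be illustrated with a minimal pure-Python sketch. The helper name `zero_crossing_indices` is hypothetical and is not part of the library; the library's own function may differ in return type, in how it treats the first sample, and in how values near zero are handled (see its threshold parameter).

```python
def zero_crossing_indices(y):
    """Return indices i >= 1 where the sign of y[i] differs from y[i-1].

    Assumption: a sample counts as "negative" only if it is strictly
    below zero, so exact zeros are grouped with the positive samples.
    """
    return [i for i in range(1, len(y)) if (y[i] < 0) != (y[i - 1] < 0)]


signal = [1.0, 0.5, -0.2, -0.7, 0.3, 0.9]
print(zero_crossing_indices(signal))  # [2, 4]
```

The count of such indices divided by the signal length gives a rough zero-crossing rate.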

Spectral representations

stft(y[, n_fft, hop_length, win_length, …]) Short-time Fourier transform (STFT)
istft(stft_matrix[, hop_length, win_length, …]) Inverse short-time Fourier transform (ISTFT).
ifgram(y[, sr, n_fft, hop_length, …]) Compute the instantaneous frequency (as a proportion of the sampling rate) obtained as the time-derivative of the phase of the complex spectrum as described by [R3].
cqt(y[, sr, hop_length, fmin, n_bins, …]) Compute the constant-Q transform of an audio signal.
hybrid_cqt(y[, sr, hop_length, fmin, …]) Compute the hybrid constant-Q transform of an audio signal.
pseudo_cqt(y[, sr, hop_length, fmin, …]) Compute the pseudo constant-Q transform of an audio signal.
fmt(y[, t_min, n_fmt, kind, beta, …]) The fast Mellin transform (FMT) [R5] of a uniformly sampled signal y.
phase_vocoder(D, rate[, hop_length]) Phase vocoder.
magphase(D) Separate a complex-valued spectrogram D into its magnitude (S) and phase (P) components, so that D = S * P.
logamplitude(S[, ref_power, amin, top_db]) Log-scale the amplitude of a spectrogram.
perceptual_weighting(S, frequencies, **kwargs) Perceptual weighting of a power spectrogram.
A_weighting(frequencies[, min_db]) Compute the A-weighting of a set of frequencies.
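
The identity D = S * P stated for magphase can be checked on a single complex value. This is a sketch using only the standard library; `split_mag_phase` is a hypothetical helper, and it assumes the phase component is the unit-modulus complex number exp(j * angle(D)) rather than the angle itself.

```python
import cmath


def split_mag_phase(d):
    """Split one complex spectrogram entry into magnitude and unit phasor."""
    magnitude = abs(d)
    # Unit-modulus phasor carrying only the phase angle of d.
    phasor = cmath.exp(1j * cmath.phase(d))
    return magnitude, phasor


d = complex(3.0, -4.0)
s, p = split_mag_phase(d)
print(s)            # 5.0
print(abs(s * p - d) < 1e-12)  # True
```

Applying the same split to every entry of a complex matrix yields the two components entry by entry.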

Time and frequency conversion

frames_to_samples(frames[, hop_length, n_fft]) Converts frame indices to audio sample indices
frames_to_time(frames[, sr, hop_length, n_fft]) Converts frame counts to time (seconds)
samples_to_frames(samples[, hop_length, n_fft]) Converts sample indices into STFT frames.
samples_to_time(samples[, sr]) Convert sample indices to time (in seconds).
time_to_frames(times[, sr, hop_length, n_fft]) Converts time stamps into STFT frames.
time_to_samples(times[, sr]) Convert timestamps (in seconds) to sample indices.
hz_to_note(frequencies, **kwargs) Convert one or more frequencies (in Hz) to the nearest note names.
hz_to_midi(frequencies) Get the closest MIDI note number(s) for given frequencies
midi_to_hz(notes) Get the frequency (Hz) of MIDI note(s)
midi_to_note(midi[, octave, cents]) Convert one or more MIDI numbers to note strings.
note_to_hz(note, **kwargs) Convert one or more note names to frequency (Hz)
note_to_midi(note[, round_midi]) Convert one or more spelled notes to MIDI number(s).
hz_to_mel(frequencies[, htk]) Convert Hz to Mels
hz_to_octs(frequencies[, A440]) Convert frequencies (Hz) to (fractional) octave numbers.
mel_to_hz(mels[, htk]) Convert mel bin numbers to frequencies
octs_to_hz(octs[, A440]) Convert octave numbers to frequencies.
fft_frequencies([sr, n_fft]) Alternative implementation of np.fft.fftfreq
cqt_frequencies(n_bins, fmin[, …]) Compute the center frequencies of Constant-Q bins.
mel_frequencies([n_mels, fmin, fmax, htk]) Compute the center frequencies of mel bands.
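
Several of the conversions above reduce to short formulas. The sketch below re-derives a few of them in plain Python to show the arithmetic; it is not the library's code. The `_sketch` function names are hypothetical, the default sampling rate of 22050 Hz and hop length of 512 samples are assumptions, and any frame-centering offset controlled by n_fft is ignored.

```python
import math

SR = 22050   # assumed default sampling rate (Hz)
HOP = 512    # assumed default hop length (samples)


def frames_to_samples_sketch(frame, hop_length=HOP):
    """Frame index -> sample index, ignoring any n_fft centering offset."""
    return frame * hop_length


def frames_to_time_sketch(frame, sr=SR, hop_length=HOP):
    """Frame index -> time in seconds."""
    return frames_to_samples_sketch(frame, hop_length) / sr


def midi_to_hz_sketch(note):
    """MIDI note number -> frequency in Hz, with A4 = MIDI 69 = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def hz_to_midi_sketch(freq):
    """Frequency in Hz -> fractional MIDI number; round() gives the nearest note."""
    return 12.0 * math.log2(freq / 440.0) + 69.0


print(frames_to_samples_sketch(10))   # 5120
print(midi_to_hz_sketch(69))          # 440.0
print(round(hz_to_midi_sketch(880.0)))  # 81
```

The time and sample conversions invert each other up to rounding, as do the MIDI and Hz conversions.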

Pitch and tuning

estimate_tuning([y, sr, S, n_fft, …]) Estimate the tuning of an audio time series or spectrogram input.
pitch_tuning(frequencies[, resolution, …]) Given a collection of pitches, estimate its tuning offset (in fractions of a bin) relative to A440=440.0Hz.
piptrack([y, sr, S, n_fft, hop_length, …]) Pitch tracking on thresholded parabolically-interpolated STFT
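
The parabolic interpolation mentioned for piptrack can be sketched with the standard three-point formula: fit a parabola through a spectral peak bin and its two neighbours, then take the vertex as the refined peak position. This is an illustration of the general technique, not necessarily the library's exact implementation, and `parabolic_peak_offset` is a hypothetical helper name.

```python
def parabolic_peak_offset(left, center, right):
    """Offset (in bins) of the parabola vertex relative to the center bin.

    left, center, right are magnitudes at bins k-1, k, k+1.
    """
    denom = left - 2.0 * center + right
    if denom == 0.0:
        # The three points are collinear; no curvature to interpolate.
        return 0.0
    return 0.5 * (left - right) / denom


# Samples of y = -(x - 0.3)**2 at x = -1, 0, 1: the true peak is at x = 0.3.
left, center, right = -(-1 - 0.3) ** 2, -(0 - 0.3) ** 2, -(1 - 0.3) ** 2
offset = parabolic_peak_offset(left, center, right)
print(abs(offset - 0.3) < 1e-9)  # True
```

Adding the offset to the integer bin index k and multiplying by sr / n_fft converts the refined position to a frequency in Hz.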

Deprecated

ifptrack(y[, sr, n_fft, hop_length, fmin, …]) Instantaneous pitch frequency tracking.