Spring AI – OpenAI TTS Example

The ability to convert text into natural-sounding speech has unlocked exciting opportunities, from building voice assistants and narrating content to enhancing accessibility. OpenAI’s Text-to-Speech (TTS) model delivers high-quality voices that integrate smoothly into Java applications using Spring AI.

This article explores how to build a Spring Boot application that leverages OpenAI’s TTS features for speech generation using Spring AI.

1. Project Setup

To start, set up a Spring Boot project and include the required Spring AI dependency in your pom.xml file to access OpenAI’s audio models.

Dependency

		<dependency>
			<groupId>org.springframework.ai</groupId>
			<artifactId>spring-ai-starter-model-openai</artifactId>
		</dependency>

Configuration

Here is the configuration you need to enable OpenAI’s Text-to-Speech capabilities using Spring AI.

src/main/resources/application.properties

spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.audio.speech.options.model=tts-1
# Voice: alloy, echo, fable, onyx, nova or shimmer
spring.ai.openai.audio.speech.options.voice=alloy
# Format: mp3, opus, aac, flac, wav or pcm
spring.ai.openai.audio.speech.options.response-format=mp3
# Speed: 0.25 to 4.0 (1.0 is normal speed)
spring.ai.openai.audio.speech.options.speed=1.0

You must set the OPENAI_API_KEY as an environment variable or provide it via a secrets manager.

  • spring.ai.openai.audio.speech.options.model=tts-1
    This specifies the OpenAI model to use for speech synthesis. tts-1 is OpenAI’s standard TTS model, optimized for low latency; tts-1-hd is the higher-quality alternative. Setting the property explicitly documents which model the application depends on.
  • spring.ai.openai.audio.speech.options.voice=alloy
    The voice setting determines which synthetic voice will be used. OpenAI offers several distinct voice profiles: alloy, echo, fable, onyx, nova, and shimmer.
  • spring.ai.openai.audio.speech.options.response-format=mp3
    This setting defines the audio format for the generated speech. The supported formats are mp3, opus, aac, flac, wav, and pcm.
  • spring.ai.openai.audio.speech.options.speed=1.0
    The speed option controls the playback speed of the synthesized speech. A value of 1.0 means normal speed; lower values (e.g., 0.8) slow the speech down, and higher values speed it up. OpenAI accepts values from 0.25 to 4.0.
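Because these values are plain strings and numbers in a properties file, a typo only surfaces when the first request fails. The following is a minimal sketch of a sanity check for the three settings, using only the JDK. The class TtsSettings and its method are hypothetical and not part of Spring AI; the voice and format names are the ones listed above, and the speed bounds (0.25 to 4.0) are the limits OpenAI documents for its speech endpoint.

```java
import java.util.Set;

// Hypothetical helper (not part of Spring AI): checks TTS settings before they are used.
final class TtsSettings {

    // Voice and format names as listed in the configuration section.
    static final Set<String> VOICES = Set.of("alloy", "echo", "fable", "onyx", "nova", "shimmer");
    static final Set<String> FORMATS = Set.of("mp3", "opus", "aac", "flac", "wav", "pcm");

    // Speed bounds documented by OpenAI for the speech endpoint.
    static final double MIN_SPEED = 0.25;
    static final double MAX_SPEED = 4.0;

    private TtsSettings() { }

    // Returns true only if all three settings are within the allowed values.
    static boolean isValid(String voice, String format, double speed) {
        return voice != null && VOICES.contains(voice)
                && format != null && FORMATS.contains(format)
                && speed >= MIN_SPEED
                && speed <= MAX_SPEED;
    }

    public static void main(String[] args) {
        System.out.println(isValid("alloy", "mp3", 1.0)); // prints true
        System.out.println(isValid("alloy", "mp3", 0.0)); // prints false: below the minimum speed
    }
}
```

A check like this could run at startup so that a misconfigured voice or speed is reported before any audio is requested.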

2. Converting Text to Speech (Static Audio)

To convert a given block of text into spoken audio using OpenAI’s TTS model, we can use the OpenAiAudioSpeechModel class, which implements Spring AI’s SpeechModel interface. The model receives a text prompt and returns a speech response whose output is the generated audio as a byte array.

Below is a simple REST controller that exposes this functionality via an HTTP POST endpoint.

@RestController
@RequestMapping("/api/audio")
public class TextToSpeechController {

    private final OpenAiAudioSpeechModel openAiAudioSpeechModel;

    public TextToSpeechController(OpenAiAudioSpeechModel openAiAudioSpeechModel) {
        this.openAiAudioSpeechModel = openAiAudioSpeechModel;
    }

    @PostMapping
    public byte[] generateSpeech(@RequestBody String inputText) {
        SpeechPrompt prompt = new SpeechPrompt(inputText);
        SpeechResponse response = openAiAudioSpeechModel.call(prompt);
        return response.getResult().getOutput();  // returns the audio byte array
    }
}

In this controller, an instance of OpenAiAudioSpeechModel is injected through the constructor. The generateSpeech() method handles POST requests sent to /api/audio, accepting plain text in the request body. This text is wrapped inside a SpeechPrompt, which is then passed to the call() method of the TTS model.

The result returned from the TTS model contains the generated audio data as a byte array, which is then returned as the response. You can use the curl command-line tool to send a POST request to the /api/audio endpoint with a text payload. The response will be an audio byte stream, which you can save to a file (e.g., output.mp3) for playback.

curl -X POST http://localhost:8080/api/audio \
     -H "Content-Type: text/plain" \
     --data "Hello, this is a demo using Spring AI and OpenAI TTS." \
     --output output.mp3

The result of the request is a newly created file named output.mp3 in your current working directory, containing the synthesized audio for the input text. You can play this file using any standard audio player.
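The same request can be sent from Java instead of curl. The sketch below uses the JDK’s built-in java.net.http client (Java 11 or later). The class name SpeechClient and its method names are hypothetical, and it assumes the application from this section is reachable at the base URL you pass in.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical client for the /api/audio endpoint shown above.
final class SpeechClient {

    private SpeechClient() { }

    // Builds the same request as the curl command: a POST with a text/plain body.
    static HttpRequest buildRequest(String baseUrl, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/audio"))
                .header("Content-Type", "text/plain")
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
                .build();
    }

    // Sends the request and writes the response body (the audio bytes) to the target file.
    static Path saveSpeech(String baseUrl, String text, Path target)
            throws IOException, InterruptedException {
        HttpResponse<Path> response = HttpClient.newHttpClient()
                .send(buildRequest(baseUrl, text), HttpResponse.BodyHandlers.ofFile(target));
        return response.body();
    }
}
```

With the application running locally, calling SpeechClient.saveSpeech("http://localhost:8080", "Hello from Java", Path.of("output.mp3")) should leave the same kind of file that the curl command produces.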

3. Streaming Audio in Real-Time

Streaming audio is ideal for use cases like voice apps, blog narration, or interactive storytelling where real-time playback is needed. Instead of waiting for full audio synthesis, OpenAI’s TTS model with Spring AI streams audio in chunks, enabling low-latency delivery. The following Spring MVC controller uses StreamingSpeechModel to generate and stream MP3 audio using StreamingResponseBody.

@RestController
@RequestMapping("/api/stream-audio")
public class AudioStreamingController {

    private final RestClient.Builder restClientBuilder;

    public AudioStreamingController(RestClient.Builder restClientBuilder) {
        this.restClientBuilder = restClientBuilder;
    }

    @GetMapping(produces = MediaType.APPLICATION_OCTET_STREAM_VALUE)
    public ResponseEntity<StreamingResponseBody> streamAudio(@RequestParam String text) {

        // Build OpenAI API client
        OpenAiAudioApi openAiAudioApi = new OpenAiAudioApi.Builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .restClientBuilder(restClientBuilder)
                .build();

        OpenAiAudioSpeechOptions options = OpenAiAudioSpeechOptions.builder()
                .model(OpenAiAudioApi.TtsModel.TTS_1.value)
                .responseFormat(OpenAiAudioApi.SpeechRequest.AudioResponseFormat.MP3)
                .build();

        // Create the streaming speech model
        StreamingSpeechModel streamingModel = new OpenAiAudioSpeechModel(openAiAudioApi, options);

        Flux<byte[]> audioStream = streamingModel.stream(text);

        StreamingResponseBody body = outputStream -> {
            audioStream.toStream().forEach(bytes -> {
                try {
                    outputStream.write(bytes);
                    outputStream.flush();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        };

        return ResponseEntity.ok()
                .contentType(MediaType.APPLICATION_OCTET_STREAM)
                .body(body);
    }

}

The above AudioStreamingController builds an OpenAiAudioApi client with a configured API key, sets the model (tts-1) and output format, and uses StreamingSpeechModel to stream audio as a Flux<byte[]>. The audio chunks are written to the response using StreamingResponseBody, enabling real-time playback as the data is received.
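On the consuming side, the benefit of streaming only materializes if the client also processes the body as it arrives rather than buffering it. Below is a minimal JDK-only sketch of such a client. StreamingAudioReader and its methods are hypothetical names, and writing to a file stands in for whatever a real player would do with each chunk.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical client for the /api/stream-audio endpoint shown above.
final class StreamingAudioReader {

    private StreamingAudioReader() { }

    // Builds the GET URL; the text query parameter must be URL-encoded.
    static URI streamUri(String baseUrl, String text) {
        String encoded = URLEncoder.encode(text, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/api/stream-audio?text=" + encoded);
    }

    // Copies the audio chunk by chunk as it arrives and returns the number of bytes copied.
    static long copyChunks(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            out.flush();
            total += read;
        }
        return total;
    }

    // Requests the stream and writes it to the target file while it is still being received.
    static long download(String baseUrl, String text, Path target)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(streamUri(baseUrl, text)).GET().build();
        HttpResponse<InputStream> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofInputStream());
        try (InputStream in = response.body(); OutputStream out = Files.newOutputStream(target)) {
            return copyChunks(in, out);
        }
    }
}
```

BodyHandlers.ofInputStream() hands the body over before the response has finished, so copyChunks starts writing as soon as the first bytes are available.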

4. Voice Customization

You can change the voice, response format, and speed dynamically using the OpenAiAudioSpeechOptions builder.

OpenAiAudioSpeechOptions options = OpenAiAudioSpeechOptions.builder()
        .model(OpenAiAudioApi.TtsModel.TTS_1.value)
        .speed(0.9f)
        .voice(OpenAiAudioApi.SpeechRequest.Voice.ALLOY)
        .responseFormat(OpenAiAudioApi.SpeechRequest.AudioResponseFormat.OPUS)
        .build();

This flexibility allows you to offer different voice styles based on user preferences or context.
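One way to drive the builder from user preferences is to keep application defaults in a small value object and overlay whatever a request supplies. The record below is a hypothetical, JDK-only sketch of that idea (Java 16 or later). It is not a Spring AI type, and the resulting values would still have to be passed to the OpenAiAudioSpeechOptions builder.

```java
// Hypothetical holder for speech settings; not part of Spring AI.
record SpeechSettings(String voice, String format, double speed) {

    // Returns a copy in which every non-null override replaces the corresponding default.
    SpeechSettings withOverrides(String voice, String format, Double speed) {
        return new SpeechSettings(
                voice != null ? voice : this.voice,
                format != null ? format : this.format,
                speed != null ? speed : this.speed);
    }

    public static void main(String[] args) {
        SpeechSettings defaults = new SpeechSettings("alloy", "mp3", 1.0);
        // The request asks for a different voice and a slower pace but keeps the default format.
        SpeechSettings perRequest = defaults.withOverrides("nova", null, 0.9);
        System.out.println(perRequest);
    }
}
```

Keeping the defaults immutable means one shared instance can serve all requests, and each request only describes what it wants to change.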

5. Conclusion

This article examined creating a text-to-speech application using Spring AI and OpenAI TTS. It covered both standard and streaming audio generation with OpenAiAudioSpeechModel and StreamingSpeechModel, as well as customizing the output format, speed, and voice. By integrating OpenAI’s TTS capabilities with Spring Boot, developers can build applications that produce natural-sounding speech for use in virtual assistants, content narration, and accessibility features.

6. Download the Source Code

This article explored using Spring AI with OpenAI TTS for text-to-speech functionality in Spring Boot applications.

Download
You can download the full source code of this example here: spring ai openai tts

Omozegie Aziegbe

Omos Aziegbe is a technical writer and web/application developer with a BSc in Computer Science and Software Engineering from the University of Bedfordshire. Specializing in Java enterprise applications with the Jakarta EE framework, Omos also works with HTML5, CSS, and JavaScript for web development. As a freelance web developer, Omos combines technical expertise with research and writing on topics such as software engineering, programming, web application development, computer science, and technology.