How to Use Spring AI to Extract Structured Data from Images
Extracting structured data from images is a requirement in many domains, including scanning receipts and documents, analyzing visual content in surveillance systems, educational tools, and inventory management. With the rise of multimodal AI models like OpenAI’s GPT-4o, it’s now possible to retrieve structured information directly from images using straightforward prompts. This article will show how to extract structured data from an image containing balloons of different colors, using Spring Boot and OpenAI’s GPT-4o through Spring AI.
1. Project Setup
Configure your project to use Java 21 and include the necessary Spring AI dependencies. Below is the essential pom.xml configuration:
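<properties>
    <java.version>21</java.version>
    <spring-ai.version>1.0.0-M6</spring-ai.version>
</properties>
<!-- Import the Spring AI BOM so the starter below resolves to ${spring-ai.version} -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    </dependency>
</dependencies>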
These dependencies provide the components for our image analysis application: spring-boot-starter-web for the REST API and spring-ai-openai-spring-boot-starter for GPT-4o integration.
application.properties
spring.servlet.multipart.max-file-size=10MB
spring.servlet.multipart.max-request-size=10MB
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4o
These properties set a 10MB limit on uploaded files and on the overall request, read the OpenAI API key from the OPENAI_API_KEY environment variable, and select GPT-4o, which accepts image input, as the chat model.
Use Case: Counting Balloons by Color
Let’s say a user uploads a party photo filled with red, blue, and yellow balloons. The system should analyze the image and return how many balloons of each user-specified color are present.
2. Define the Data Models
We will model the AI’s response as a list of color-count pairs and a total.
import java.util.List;

// One entry per requested color
public class BalloonColorCount {
    private String color;
    private int count;
    // Constructor, getters and setters
}

// Full result: the per-color counts plus the overall total
public class BalloonCountSummary {
    private List<BalloonColorCount> colorCounts;
    private int totalCount;
    // Constructor, getters and setters
}
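As a more compact alternative, the same two models could be written as Java records, which Java 21 supports and which Spring AI's own documentation uses as entity targets. The following is only a sketch under that assumption: the validation in the compact constructors and the helper methods `summedCount()` and `totalMatchesSum()` are hypothetical additions, not part of the original example.

```java
import java.util.List;

// Hypothetical record-based variants of the article's model classes.
record BalloonColorCount(String color, int count) {
    BalloonColorCount {
        // Reject impossible values early instead of passing them downstream.
        if (count < 0) {
            throw new IllegalArgumentException("count must not be negative: " + count);
        }
    }
}

record BalloonCountSummary(List<BalloonColorCount> colorCounts, int totalCount) {
    BalloonCountSummary {
        // Defensive copy keeps the summary immutable; a missing list becomes empty.
        colorCounts = colorCounts == null ? List.of() : List.copyOf(colorCounts);
    }

    // Sum of the per-color counts, recomputed locally.
    int summedCount() {
        return colorCounts.stream().mapToInt(BalloonColorCount::count).sum();
    }

    // True when the reported total agrees with the per-color counts.
    boolean totalMatchesSum() {
        return summedCount() == totalCount;
    }
}
```

Because a model can miscount, checking `totalMatchesSum()` before returning the summary is one inexpensive way to spot an inconsistent reply.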
Next, the service below creates an AI prompt, attaches the image, and requests structured data in response.
import java.io.InputStream;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.core.io.InputStreamResource;
import org.springframework.stereotype.Service;
import org.springframework.util.MimeTypeUtils;

@Service
public class BalloonAnalysisService {

    private final ChatClient chatClient;

    public BalloonAnalysisService(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    public BalloonCountSummary countBalloons(InputStream imageStream, String contentType, String userColors) {
        return chatClient.prompt()
            // One text block: text(...) sets the system text rather than appending to it
            .system(systemMessage -> systemMessage.text("""
                Count the number of balloons in the uploaded image.
                Only count balloons in the colors specified by the user.
                Ignore any objects or balloons not in those colors.
                Respond with a JSON object listing each color with its count and a total count.
                """))
            // The user message carries the requested colors plus the image itself
            .user(userMessage -> userMessage
                .text(userColors)
                .media(MimeTypeUtils.parseMimeType(contentType), new InputStreamResource(imageStream)))
            .call()
            // Map the model's JSON reply onto the Java model class
            .entity(BalloonCountSummary.class);
    }
}
The BalloonAnalysisService class uses Spring AI’s ChatClient to send an image and a list of colors to OpenAI’s GPT-4o model, requesting a structured JSON response that counts balloons by color. It builds a system prompt to instruct the AI to focus only on the specified colors and ignore unrelated objects, then attaches the user’s color input and image file to the user prompt. The AI’s response is automatically mapped to a BalloonCountSummary object, allowing easy access to the extracted data in a structured format.
3. REST API to Handle Image Upload
This controller provides an endpoint to upload the image and color filter.
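import java.io.IOException;
import java.io.InputStream;

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

@RestController
@RequestMapping("/api/balloons")
public class BalloonImageController {

    private final BalloonAnalysisService analysisService;

    public BalloonImageController(BalloonAnalysisService analysisService) {
        this.analysisService = analysisService;
    }

    @PostMapping("/count")
    public ResponseEntity<?> countColoredBalloons(
            @RequestParam("colors") String colors,
            @RequestParam("file") MultipartFile file
    ) {
        // try-with-resources closes the upload stream once the analysis returns
        try (InputStream inputStream = file.getInputStream()) {
            BalloonCountSummary summary = analysisService.countBalloons(inputStream, file.getContentType(), colors);
            return ResponseEntity.ok(summary);
        } catch (IOException e) {
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                    .body("Failed to analyze image.");
        }
    }
}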
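The controller passes file.getContentType() and the raw colors string straight to the service. MultipartFile.getContentType() can return null when the client sends no content type, and a null or malformed value would then fail inside MimeTypeUtils.parseMimeType. A minimal sketch of checks that could run before calling the service is shown below; the class `UploadValidation` and its methods `isImageContentType` and `parseColors` are hypothetical helpers, not part of Spring or of the original example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; neither Spring nor Spring AI provides this class.
final class UploadValidation {

    private UploadValidation() {
    }

    // True only for a non-null content type in the image/* family.
    static boolean isImageContentType(String contentType) {
        return contentType != null
                && contentType.toLowerCase(Locale.ROOT).startsWith("image/");
    }

    // Splits "red, Blue ,yellow" into [red, blue, yellow],
    // dropping blank entries and duplicates while keeping the input order.
    static List<String> parseColors(String colors) {
        if (colors == null) {
            return List.of();
        }
        return Arrays.stream(colors.split(","))
                .map(color -> color.trim().toLowerCase(Locale.ROOT))
                .filter(color -> !color.isEmpty())
                .distinct()
                .toList();
    }
}
```

One way to use it, under the same assumptions, is to return a 400 Bad Request from the controller when `isImageContentType` is false or `parseColors` yields an empty list, so that no model call is made for input that cannot succeed.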
Sample Request
Make a POST request to:
POST /api/balloons/count
cURL Request:
curl -X POST http://localhost:8080/api/balloons/count \
  -F "colors=red, blue, yellow" \
  -F "file=@/path/to/balloon-photo.jpg"
Sample Output (JSON)
Here’s what the GPT-4o-backed response might look like:
{
  "colorCounts": [
    { "color": "red", "count": 5 },
    { "color": "blue", "count": 3 },
    { "color": "yellow", "count": 2 }
  ],
  "totalCount": 10
}
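The JSON above is only what a reply might look like. A model may spell a color with different casing, include a color that was not requested, or leave out a color it did not find. The sketch below shows one possible way to normalize the counts against the requested colors before returning them; `CountNormalizer` and its methods are hypothetical names introduced for illustration, and the counts are represented as a plain map to keep the example self-contained.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical post-processing helper, not part of the original example.
final class CountNormalizer {

    private CountNormalizer() {
    }

    // Keeps only the requested colors, in request order, and
    // reports 0 for any requested color the model left out.
    static Map<String, Integer> normalize(List<String> requested, Map<String, Integer> reported) {
        // Merge reported entries that differ only by letter case.
        Map<String, Integer> lowered = new LinkedHashMap<>();
        reported.forEach((color, count) ->
                lowered.merge(color.toLowerCase(Locale.ROOT), count, Integer::sum));

        Map<String, Integer> result = new LinkedHashMap<>();
        for (String color : requested) {
            String key = color.toLowerCase(Locale.ROOT);
            result.put(key, lowered.getOrDefault(key, 0));
        }
        return result;
    }

    // Recomputes the total from the normalized counts.
    static int total(Map<String, Integer> counts) {
        return counts.values().stream().mapToInt(Integer::intValue).sum();
    }
}
```

Recomputing the total locally, rather than trusting the reported totalCount, keeps the total consistent with whatever per-color counts are finally returned.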
4. How It Works
This application leverages Spring AI’s ChatClient to send:
- A system prompt to instruct the model on what task to perform.
- A user prompt specifying what data to extract (colors).
- An image attachment, enabling the multimodal GPT-4o model to reason over both text and visuals.
Spring AI handles the rest: the entity() call adds JSON format instructions derived from the target class to the prompt, then deserializes the model’s JSON reply into that class.
5. Conclusion
In this article, we explored how to use Spring AI and OpenAI’s GPT-4o model to extract structured data from images through a simple REST API. By guiding the model with a clear prompt and attaching an image alongside user-defined input, we were able to automatically receive a structured JSON response and map it directly into a Java object.