
How to Use Spring AI to Extract Structured Data from Images

Extracting structured data from images is a requirement in many domains, including scanning receipts and documents, analyzing visual content in surveillance systems, educational tools, and inventory management. With the rise of multimodal AI models like OpenAI’s GPT-4o, it’s now possible to retrieve structured information directly from images using straightforward prompts. This article will show how to extract structured data from an image containing balloons of different colors, using Spring Boot and OpenAI’s GPT-4o through Spring AI.

1. Project Setup

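Configure your project to use Java 21 and include the necessary Spring AI dependencies. The Spring AI starter is declared without a version, so the Spring AI BOM must be imported for Maven to resolve it. Below is the essential pom.xml configuration:

<properties>
    <java.version>21</java.version>
    <spring-ai.version>1.0.0-M6</spring-ai.version>
</properties>

<!-- Import the Spring AI BOM so the starter below resolves to ${spring-ai.version} -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    </dependency>
</dependencies>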

These dependencies provide the components for our image analysis application: spring-boot-starter-web for the REST API and spring-ai-openai-spring-boot-starter for GPT-4o integration.

application.properties

spring.servlet.multipart.max-file-size=10MB
spring.servlet.multipart.max-request-size=10MB

spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4o

These properties cap uploads at 10MB per file and per request, read the OpenAI API key from the OPENAI_API_KEY environment variable, and select GPT-4o, a model that accepts image input, as the chat model.

Use Case: Counting Balloons by Color

Suppose a user uploads a party photo filled with red, blue, and yellow balloons. The system should analyze the image and return how many balloons of each requested color are present.

2. Define the Data Models

We will model the AI’s response as a list of color-count pairs plus a total. Each class lives in its own file.

public class BalloonColorCount {
    private String color;
    private int count;

    // No-args constructor, getters and setters (needed to map the JSON response)
}

import java.util.List;

public class BalloonCountSummary {
    private List<BalloonColorCount> colorCounts;
    private int totalCount;

    // No-args constructor, getters and setters (needed to map the JSON response)
}
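
As an alternative to hand-written getters and setters, the same shape can be sketched with Java records. The record names below are hypothetical (suffixed with Record so they do not clash with the classes above), and the isConsistent helper is an illustrative addition, not something Spring AI provides. Records should bind from JSON the same way, but verify this against the Spring AI version you use.

```java
import java.util.List;

// Hypothetical record-based variants of the two model classes.
record BalloonColorCountRecord(String color, int count) { }

record BalloonCountSummaryRecord(List<BalloonColorCountRecord> colorCounts, int totalCount) {

    // Compact constructor: treat a missing list as empty and copy it defensively.
    BalloonCountSummaryRecord {
        colorCounts = colorCounts == null ? List.of() : List.copyOf(colorCounts);
    }

    // True when the reported total matches the sum of the per-color counts.
    boolean isConsistent() {
        int sum = colorCounts.stream().mapToInt(BalloonColorCountRecord::count).sum();
        return sum == totalCount;
    }
}
```

A check like isConsistent is useful because the model produces the total and the per-color counts independently, so nothing guarantees that they agree.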

Next, the service below creates an AI prompt, attaches the image, and requests structured data in response.

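import java.io.InputStream;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.core.io.InputStreamResource;
import org.springframework.stereotype.Service;
import org.springframework.util.MimeTypeUtils;

@Service
public class BalloonAnalysisService {

    private final ChatClient chatClient;

    public BalloonAnalysisService(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    public BalloonCountSummary countBalloons(InputStream imageStream, String contentType, String userColors) {
        return chatClient.prompt()
                // Pass all instructions in a single text(...) call: repeated calls on the
                // same spec are expected to replace the previous text, not append to it.
                .system(systemMessage -> systemMessage.text("""
                        Count the number of balloons in the uploaded image.
                        Only count balloons in the colors specified by the user.
                        Ignore any objects or balloons not in those colors.
                        Respond with a JSON object listing each color with its count and a total count.
                        """))
                // The user message carries the requested colors plus the image itself.
                .user(userMessage -> userMessage
                        .text(userColors)
                        .media(MimeTypeUtils.parseMimeType(contentType), new InputStreamResource(imageStream)))
                .call()
                // Convert the model's JSON reply into a BalloonCountSummary.
                .entity(BalloonCountSummary.class);
    }
}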

The BalloonAnalysisService class uses Spring AI’s ChatClient to send an image and a list of colors to OpenAI’s GPT-4o model, requesting a structured JSON response that counts balloons by color. It builds a system prompt to instruct the AI to focus only on the specified colors and ignore unrelated objects, then attaches the user’s color input and image file to the user prompt. The AI’s response is automatically mapped to a BalloonCountSummary object, allowing easy access to the extracted data in a structured format.
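
The prompt asks the model to stick to the requested colors, but nothing enforces that on the way back. The following is a minimal sketch of a post-processing step, assuming the per-color counts have already been copied into a plain Map. BalloonCountSanitizer and its methods are hypothetical helpers, not part of Spring AI or of the service above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper that cleans up model output before it is returned to the client.
final class BalloonCountSanitizer {

    // Splits a string such as "red, blue, yellow" into trimmed, lower-cased, distinct names.
    static List<String> parseColors(String userColors) {
        return Arrays.stream(userColors.split(","))
                .map(c -> c.trim().toLowerCase(Locale.ROOT))
                .filter(c -> !c.isEmpty())
                .distinct()
                .toList();
    }

    // Keeps only requested colors, clamps negative counts to 0, and fills missing colors with 0.
    static Map<String, Integer> sanitize(Map<String, Integer> modelCounts, List<String> requested) {
        Map<String, Integer> normalized = new LinkedHashMap<>();
        modelCounts.forEach((color, count) ->
                normalized.merge(color.trim().toLowerCase(Locale.ROOT), Math.max(0, count), Integer::sum));

        Map<String, Integer> result = new LinkedHashMap<>();
        for (String color : requested) {
            result.put(color, normalized.getOrDefault(color, 0));
        }
        return result;
    }

    // Recomputes the total from the sanitized counts instead of trusting the model's total.
    static int total(Map<String, Integer> counts) {
        return counts.values().stream().mapToInt(Integer::intValue).sum();
    }
}
```

Recomputing the total locally keeps the response internally consistent even when the model's own arithmetic is off.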

3. REST API to Handle Image Upload

This controller provides an endpoint to upload the image and color filter.

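import java.io.IOException;
import java.io.InputStream;

import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

@RestController
@RequestMapping("/api/balloons")
public class BalloonImageController {

    private final BalloonAnalysisService analysisService;

    public BalloonImageController(BalloonAnalysisService analysisService) {
        this.analysisService = analysisService;
    }

    @PostMapping("/count")
    public ResponseEntity<?> countColoredBalloons(
            @RequestParam("colors") String colors,
            @RequestParam("file") MultipartFile file
    ) {
        // Reject empty or non-image uploads up front; the service cannot parse a missing MIME type.
        String contentType = file.getContentType();
        if (file.isEmpty() || contentType == null || !contentType.startsWith("image/")) {
            return ResponseEntity.badRequest().body("Please upload a non-empty image file.");
        }
        try (InputStream inputStream = file.getInputStream()) {
            BalloonCountSummary summary = analysisService.countBalloons(inputStream, contentType, colors);
            return ResponseEntity.ok(summary);
        } catch (IOException e) {
            // Opening or closing the uploaded file's stream failed.
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                    .body("Failed to analyze image.");
        }
    }
}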
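
The content type reported by the client decides how the image is labeled for the model, so it is worth checking before any API call is made. Below is a minimal sketch of an allow-list check. ImageUploadChecks is a hypothetical helper, and the set of formats is an assumption about what a vision model typically accepts, so adjust it to your provider's documentation.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for validating the content type of an uploaded image.
final class ImageUploadChecks {

    // Assumption: formats commonly accepted for image input; adjust for your provider.
    private static final Set<String> SUPPORTED =
            Set.of("image/png", "image/jpeg", "image/webp", "image/gif");

    static boolean isSupportedImage(String contentType) {
        if (contentType == null || contentType.isBlank()) {
            return false;
        }
        // Drop any parameters (for example "; charset=...") before comparing.
        String base = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return SUPPORTED.contains(base);
    }
}
```

A controller could call ImageUploadChecks.isSupportedImage(file.getContentType()) and answer with 400 Bad Request when it returns false, which avoids spending an API call on a file the model cannot read.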

Sample Request

Make a POST request to:

POST /api/balloons/count

cURL Request:

curl -X POST http://localhost:8080/api/balloons/count \
  -F "colors=red, blue, yellow" \
  -F "file=@/path/to/balloon-photo.jpg"
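
The same request can also be built from Java with the JDK's java.net.http API. This is a minimal sketch that assembles the multipart body by hand. BalloonCountRequest and its methods are hypothetical names, and the field names "colors" and "file" mirror the cURL call above.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper that mirrors the cURL request.
final class BalloonCountRequest {

    // Builds a multipart/form-data body with a "colors" text part and a "file" binary part.
    static byte[] multipartBody(String boundary, String colors, String fileName,
                                String fileContentType, byte[] fileBytes) {
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"colors\"\r\n\r\n"
                + colors + "\r\n"
                + "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: " + fileContentType + "\r\n\r\n";

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.writeBytes(head.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(fileBytes);
        // The closing delimiter ends the multipart message.
        out.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    // Wraps the body in a POST request with the matching Content-Type header.
    static HttpRequest build(URI endpoint, String colors, String fileName,
                             String fileContentType, byte[] fileBytes) {
        String boundary = "----balloon" + System.nanoTime();
        byte[] body = multipartBody(boundary, colors, fileName, fileContentType, fileBytes);
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }
}
```

The resulting request can be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), using Files.readAllBytes to load the image from disk.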

Sample Output (JSON)

Here’s what the GPT-4o-backed response might look like:

{
  "colorCounts": [
    { "color": "red", "count": 5 },
    { "color": "blue", "count": 3 },
    { "color": "yellow", "count": 2 }
  ],
  "totalCount": 10
}

4. How It Works

This application leverages Spring AI’s ChatClient to send:

  • A system prompt to instruct the model on what task to perform.
  • A user prompt specifying what data to extract (colors).
  • An image attachment, enabling the multimodal GPT-4o model to reason over both text and visuals.

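Spring AI handles the rest. It serializes the messages and the image into the API request, and when entity(...) is used it adds format instructions derived from the target class to the prompt, then converts the model’s JSON reply into that Java class.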

5. Conclusion

In this article, we explored how to use Spring AI and OpenAI’s GPT-4o model to extract structured data from images through a simple REST API. By guiding the model with a clear prompt and attaching an image alongside user-defined input, we were able to automatically receive a structured JSON response and map it directly into a Java object.

6. Download the Source Code

This article covered how to extract data from images using Spring AI.

You can download the full source code of this example here: spring ai extract data from images

Omozegie Aziegbe

Omos Aziegbe is a technical writer and web/application developer with a BSc in Computer Science and Software Engineering from the University of Bedfordshire. Specializing in Java enterprise applications with the Jakarta EE framework, Omos also works with HTML5, CSS, and JavaScript for web development. As a freelance web developer, Omos combines technical expertise with research and writing on topics such as software engineering, programming, web application development, computer science, and technology.