Hey, It Works! Logo
Hey, It Works!

Tech Blog by Daniel Rosehill

Product label extraction agent for quicker tech inventory population (with: Homebox)

Product label extraction agent for quicker tech inventory population (with: Homebox)

About this time last year I decided that my home inventory system needed a dramatic overhaul.

After too many years of trying out various tech projects, my inventory management system consisted largely of looking hopefully at a large pile of boxes and wondering where on earth the cable I needed was.

I realised that things had hit a breaking point when it was quicker and easier to buy a cable that I knew I had than spend an hour trying to dig around looking for it. Besides being a waste of money, this was unsustainable, so I figured that it was time to put a proper system in place.

Enter picture Homebox, a self-hostable inventory management system that, unlike just about every other such project that has been undertaken to date, was actually designed with the needs of crazy people like me and maybe you in mind. In other words, tech fiends with ridiculously large collections of cables, adapters and all the rest of it.

At the time I thought it was going to be a one-week project maximum. In reality it turned into one of the most grueling projects I've ever undertaken. I've logged over 4,000 items. Yes, I told you I had a lot of stuff.

And although I had no idea at the time how much work it would take to create an inventory of my belongings, ultimately it has proven a very worthwhile experience.

Ironically, the system has allowed me to declutter by identifying duplicates I no longer need. I've been able to donate some tech products to those who were newer at my hobbies. It's made it vastly easier to find whatever I need.

And it's made the entire process of building up a collection of tech things strangely more fulfilling. Because the inventory process forces me to record each item I buy, take a photograph of it and find a place for it. It makes the whole process of curating things much more deliberate. I don't think it's by mistake that I've purchased far less junk and far more things I really love since discovering Homebox.

Picking the right vision-capable LLM for the agent

In 2025, the rapid advance of AI has made literally any tech project seem feasible. However, there is always a balance to be struck between time investment and … where the project sits within your priorities.

I say this because I realise that there is almost certainly far more advanced ways of achieving the functionality I'm about to describe For example, by interacting with an API you could achieve this all programmatically But if you just have occasional need for this kind of workflow using the agent and manually providing the product images might strike the right balance. The agent only takes a few minutes to configure and could be provisioned on ChatGPT or just about any other AI building platform.

I used Diffy.ai. To state the obvious, you'll need to make sure that you're using a large language model with vision capabilities. In the near future, this will probably be almost every model.

The choice of which model to use is up to you, but if you're doing this in batches and at scale, because this workflow doesn't require much in the way of reasoning capabilities at all, I would recommend using a older generation model with vision capability for cost optimisation reasons.

System prompt for label processing

The foundation of AI agents and assistants is a system prompt which takes the model they're provisioned on top of and provides custom instructions to target their behaviour towards achieving a specific workflow, thereby differentiating them from the chat iterations of the models that have taken the world by storm.

Here's my very basic configuration for this agent.

Your purpose is to assist the user by providing a list of detected data points from a product label. 

You can expect that the label provided by the user will be a technical label of some kind, and you should attempt to list all of the following pieces of information if they are available. If a particular piece of information isn't available, you can simply skip it. 

Here are the data points that you should list:

Manufacturer name
Product name
Serial number
Model number
Version number
Power and voltage instructions,
Date of manufacture,
Any other text listed on label. 

Using output format instructions to specify a desired output format

You might want to improve upon this by adding what I call output formatting instructions. This is my terminology to describe the part of the prompt where you direct the assistant to provide the output you're looking for in a specific format.

I commonly use a CSV output format instruction in order to ensure that the model provides the data in CSV format. One trick I've discovered to ensure consistent data formatting between multiple runs is to provide the CSV header row and then write a prompt along the lines of extract the data from this photo to match this CSV structure exactly.

Unless there is a great textbook that I have not read, there is no hard and fast rule about the best way to configure these and some trial and error is advisable. Eventually you tend to get the hang of what specific models adhere well with.

Use Your Bot!

And that's basically all there is to know in order to get this simple workflow up and running. Here's an example of me interacting with my bot. I simply provide it with the product photo image along with a short initiation instruction and it returns the requested details to me.

Another tweak that could be made to the configuration is specifying that the bot must return just with the parameters and not with any text before or after. Sometimes I write this configuration like, you must return only the information I've requested, do not prepend any text to your output or add any text after it.

But this is how the default configuration ran:

Ideas to take this further

This agent suffices for my own relatively basic needs, but if you wanted to take this configuration further, as mentioned, you could script it, ensure a consistent output format using JSON, and then use an API to write the extracted data into your catalogue system.