CV Resume PDF Parsing with Multimodal Vision AI

Created by

Jimleuk

Last update a year ago

How it works

Our candidate's CV/Resume is a PDF downloaded via Google Drive for this demonstration.
The PDF is then converted into a PNG image using a tool called Stirling PDF. Since the hidden prompt has a white font color, it is invisible in the converted image.
The image is then forwarded to a Basic LLM node to process using our multimodal model - in this example, we'll use Google's Gemini 1.5 Pro.
In the Basic LLM node, we'll need to set a User Message with the type of Binary. This allows us to directly send the image file in our request.
Because the model only sees the rendered image, the hidden prompt never reaches it and its response is as expected.
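
Outside of n8n, the last three steps amount to sending the PNG as inline binary data alongside a text prompt. The following is a minimal sketch rather than the workflow's actual implementation: it assumes the public Gemini REST endpoint and its `generateContent` request shape, and the helper names `build_gemini_request` and `ask_gemini` are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: Gemini 1.5 Pro reached through the public Generative Language API.
GEMINI_URL = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-1.5-pro:generateContent"
)


def build_gemini_request(png_bytes: bytes, prompt: str) -> dict:
    """Build a request body with one text part and one inline PNG part."""
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {
                        "inline_data": {
                            "mime_type": "image/png",
                            # Binary image data is sent base64-encoded.
                            "data": base64.b64encode(png_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }


def ask_gemini(png_bytes: bytes, prompt: str, api_key: str) -> str:
    """Send the image and prompt, then return the first candidate's text."""
    request = urllib.request.Request(
        f"{GEMINI_URL}?key={api_key}",
        data=json.dumps(build_gemini_request(png_bytes, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["candidates"][0]["content"]["parts"][0]["text"]
```

Only the pixels of the converted page are sent, so text that is invisible in the image (such as white-on-white instructions) is not part of the request.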

Requirements

Google Gemini API Key. Alternatively, GPT-4 will also work for this use-case.
Stirling PDF or another service which can convert PDFs into images. Note on data privacy: this example uses a public API, so it is recommended that you self-host a private instance of Stirling PDF instead.
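
As a rough illustration of the conversion step against a self-hosted instance, the sketch below posts a PDF as a multipart form and reads back the image. The base URL, the `/api/v1/convert/pdf/img` path and the form field names (`fileInput`, `imageFormat`, `singleOrMultiple`, `colorType`, `dpi`) are assumptions about a typical Stirling PDF setup, so check them against your own instance's API documentation. The helpers `build_multipart` and `pdf_to_png` are hypothetical.

```python
import urllib.request
import uuid

# Assumption: a private Stirling PDF instance running locally on port 8080.
STIRLING_URL = "http://localhost:8080/api/v1/convert/pdf/img"


def build_multipart(fields: dict, file_field: str, filename: str, file_bytes: bytes):
    """Encode plain form fields plus one PDF file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        chunks.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    chunks.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/pdf\r\n\r\n"
        ).encode("utf-8")
    )
    chunks.append(file_bytes)
    chunks.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def pdf_to_png(pdf_bytes: bytes) -> bytes:
    """Ask the (assumed) conversion endpoint for a single PNG of the PDF."""
    fields = {
        "imageFormat": "png",
        "singleOrMultiple": "single",
        "colorType": "color",
        "dpi": "300",
    }
    body, content_type = build_multipart(fields, "fileInput", "cv.pdf", pdf_bytes)
    request = urllib.request.Request(
        STIRLING_URL,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Pointing `STIRLING_URL` at a private instance keeps candidate CVs off third-party servers.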

Customising the workflow

Swap out the manual trigger for another trigger such as a webhook to integrate into your existing services.
This example demonstrates a validation use-case, i.e. "does the candidate look qualified?". You can also try extracting data points instead, such as years of experience or previous companies.
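
For the extraction variant, one possible approach is to ask the model for JSON and parse its reply into a typed record. This is a sketch under stated assumptions: the prompt wording, the key names `years_of_experience` and `previous_companies`, and the helper `parse_candidate_facts` are all hypothetical, and the model is not guaranteed to follow the requested format.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical prompt; adjust the keys to the data points you need.
EXTRACTION_PROMPT = (
    "Read the CV in the attached image and reply with JSON only, using the keys "
    '"years_of_experience" (number) and "previous_companies" (list of strings).'
)

# Models often wrap JSON in a Markdown code fence, so we strip one if present.
FENCE = "`" * 3


@dataclass
class CandidateFacts:
    years_of_experience: float
    previous_companies: List[str]


def parse_candidate_facts(reply_text: str) -> CandidateFacts:
    """Parse the model's reply, tolerating a code fence around the JSON."""
    cleaned = reply_text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line, then everything from the closing fence on.
        _, _, rest = cleaned.partition("\n")
        cleaned = rest.rsplit(FENCE, 1)[0]
    data = json.loads(cleaned)
    return CandidateFacts(
        years_of_experience=float(data["years_of_experience"]),
        previous_companies=[str(name) for name in data["previous_companies"]],
    )
```

A reply that is not valid JSON, or that is missing a key, raises an exception here, which is a reasonable signal to retry the request or route the CV for manual review.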