Here are the steps I have documented for running pdfplumber inside an AWS Lambda function. pdfplumber is very good at extracting content from a PDF, especially structured content such as tables. It does this without any GenAI models. The downside is that it can't cope well with content whose layout varies from one document to the next. But if your PDFs all share the same format and contain well-structured tables, this tool can do a very good job of extracting that information.
mypublicnotes/AWS/Lambda/pdfplumber.md at master · rajrao/mypublicnotes
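To illustrate the kind of table extraction described above, here is a minimal sketch of a Python Lambda handler that uses pdfplumber. It assumes the following, none of which comes from the linked notes:

* pdfplumber is already packaged with the function, for example as a layer.
* The function is triggered by an S3 upload event.
* The handler name and the `rows_to_records` helper are hypothetical.

It relies only on `pdfplumber.open` (which accepts a file-like object), `Page.extract_tables()` and boto3's `get_object`.

```python
import io
import json
import urllib.parse


def rows_to_records(table):
    """Turn one pdfplumber table (a list of rows, each a list of cell
    strings or None) into a list of dicts keyed by the header row.

    Hypothetical helper: it assumes the first row is the header, which
    holds for consistently formatted tables but not for every PDF.
    """
    if not table:
        return []
    header = [(cell or "").strip() for cell in table[0]]
    records = []
    for row in table[1:]:
        # pdfplumber uses None for empty cells; normalise them to "".
        values = [(cell or "").strip() for cell in row]
        records.append(dict(zip(header, values)))
    return records


def lambda_handler(event, context):
    # Imported inside the handler so rows_to_records can be used without
    # these packages installed; a deployed function would normally
    # import them at module level.
    import boto3
    import pdfplumber

    # Assumed trigger: an S3 "object created" notification.
    s3_info = event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    key = urllib.parse.unquote_plus(s3_info["object"]["key"])

    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()

    results = []
    # Open the PDF from memory, so nothing has to be written to /tmp.
    with pdfplumber.open(io.BytesIO(body)) as pdf:
        for page_number, page in enumerate(pdf.pages, start=1):
            for table in page.extract_tables():
                results.append({
                    "page": page_number,
                    "rows": rows_to_records(table),
                })

    return {"statusCode": 200, "body": json.dumps(results)}
```

Because `rows_to_records` treats the first row of every table as its header, this approach only works well on the consistently formatted PDFs that pdfplumber suits. A PDF whose layout shifts between documents would need different handling.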