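I have written code that extracts the text from PDF files using Python and the PyPDF2 library. The code works well for most documents, but sometimes it returns strange characters. I think that's because the PDF has a watermark over the page, so the text is not recognised:

```python
import io
import requests
from PyPDF2 import PdfFileReader  # PyPDF2 1.x API, matching the calls used below

pdf_link = "https://example.com/sample.pdf"  # placeholder; the original link is not shown
read_pdf = PdfFileReader(io.BytesIO(requests.get(pdf_link).content))

pdf_file_text = 'PDF File: ' + pdf_link + '\n\n'
for page in range(read_pdf.getNumPages()):
    page_content = read_pdf.getPage(page).extractText()
    page_content = page_content.replace("\n\n\n", "\n").strip()  # collapse triple newlines
    pdf_file_text += page_content  # append this page's text to the result
```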
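Before fixing the extraction itself, it can help to detect which pages came back garbled. The sketch below is a hypothetical heuristic, not a PyPDF2 feature. The function names `suspicious_ratio` and `looks_garbled` and the 0.2 threshold are assumptions chosen for illustration. It counts control, format, private-use and unassigned Unicode code points, plus the U+FFFD replacement character, as a share of the non-whitespace characters on a page.

```python
import unicodedata

# Hypothetical heuristic (not part of PyPDF2): these Unicode categories are
# treated as signs of extraction debris.
SUSPICIOUS_CATEGORIES = {"Cc", "Cf", "Co", "Cn"}  # control, format, private use, unassigned


def suspicious_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that look like debris."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    bad = sum(
        1
        for c in chars
        # U+FFFD is category "So", so it is checked explicitly.
        if c == "\ufffd" or unicodedata.category(c) in SUSPICIOUS_CATEGORIES
    )
    return bad / len(chars)


def looks_garbled(text: str, threshold: float = 0.2) -> bool:
    """Flag text whose suspicious-character ratio exceeds an assumed threshold."""
    return suspicious_ratio(text) > threshold
```

Inside the extraction loop, `looks_garbled(page_content)` could be used to log or skip suspect pages. It will not catch text that is wrong but made of ordinary letters, for example when a font's character mapping is missing, so treat it as a rough filter only.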