A web application that lets users load PDF or DOC/DOCX files (by file upload or from a URL) and search for words within them. The application also highlights a predefined set of key terms used in financial contexts.
- File upload for PDF and DOC/DOCX files
- Load documents from URLs
- Text extraction from documents
- Word search with highlighting
- Automatic highlighting of predefined key terms
- Mobile responsive design
- Page-by-page document viewing for PDFs
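As a rough illustration of how word search can work together with page-by-page viewing, the sketch below counts case-insensitive matches of a query on each page. It assumes the extracted text is available as one string per page; the names `countMatchesByPage` and `PageMatches` are hypothetical and are not taken from the application's source.

```typescript
// Hypothetical result shape: 1-based page number and the number of hits on it.
interface PageMatches {
  pageNumber: number;
  count: number;
}

// Escape characters that have special meaning in a regular expression,
// so the user's query is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Count case-insensitive occurrences of `query` on each page.
// Pages without any match are left out of the result.
function countMatchesByPage(pages: string[], query: string): PageMatches[] {
  const trimmed = query.trim();
  if (trimmed === "") {
    return [];
  }
  const regex = new RegExp(escapeRegExp(trimmed), "gi");
  const results: PageMatches[] = [];
  pages.forEach((pageText, index) => {
    const matches = pageText.match(regex);
    if (matches !== null) {
      results.push({ pageNumber: index + 1, count: matches.length });
    }
  });
  return results;
}
```

A result list like this could drive a "jump to next match" control, but how the application actually tracks matches is not documented here.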
- Clone the repository:
    git clone https://github.com/shivangidas/doc-viewer.git
    cd doc-viewer
- Install dependencies:
    npm install
- Start the development server:
    npm start
- To deploy to GitHub Pages:
    npm run deploy
doc-viewer/
├── src/
│   ├── App.tsx       # Main application component
│   ├── App.css       # Styles for the application
│   ├── index.tsx     # Entry point
│   └── index.css     # Global styles
├── public/
│   └── index.html    # HTML template
└── package.json      # Dependencies and scripts
- React
- TypeScript
- PDF.js (via react-pdf) for PDF processing
- Mammoth.js for DOCX processing
- GitHub Pages for hosting
The application automatically highlights the following terms:
- breach
- dispute
- litigation
- covenant
- bad debts
- impaired
- impairment
- write off
- qualified
- adverse
- disclaimer of opinion
To modify the list of key terms, edit the KEY_TERMS array in src/App.tsx.
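For orientation, here is a minimal sketch of what a term list and a matching routine might look like. The actual shape of `KEY_TERMS` in App.tsx and the way the application renders highlights are not shown in this README, so the `Segment` type and the `splitByTerms` function below are assumptions made for illustration. The sketch matches whole words only, a design choice that keeps "qualified" from matching inside "unqualified"; the real application may behave differently.

```typescript
// Assumed shape: a plain array of lowercase strings (the real array is in App.tsx).
const KEY_TERMS: string[] = [
  "breach", "dispute", "litigation", "covenant", "bad debts", "impaired",
  "impairment", "write off", "qualified", "adverse", "disclaimer of opinion",
];

// Hypothetical segment type: a piece of text and whether it is a key term.
interface Segment {
  text: string;
  isMatch: boolean;
}

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Split `text` into alternating plain and matching segments.
// Matching is case-insensitive and restricted to whole words.
function splitByTerms(text: string, terms: string[]): Segment[] {
  const usable = terms.filter((t) => t.length > 0);
  if (usable.length === 0) {
    return text.length > 0 ? [{ text: text, isMatch: false }] : [];
  }
  // Longer terms first, so a phrase is preferred over a shorter term
  // that starts at the same position.
  const alternatives = usable
    .slice()
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join("|");
  const regex = new RegExp("\\b(?:" + alternatives + ")\\b", "gi");
  const segments: Segment[] = [];
  let last = 0;
  let m: RegExpExecArray | null;
  while ((m = regex.exec(text)) !== null) {
    if (m.index > last) {
      segments.push({ text: text.slice(last, m.index), isMatch: false });
    }
    segments.push({ text: m[0], isMatch: true });
    last = m.index + m[0].length;
  }
  if (last < text.length) {
    segments.push({ text: text.slice(last), isMatch: false });
  }
  return segments;
}
```

In a React component, each segment with `isMatch` set could be rendered inside a `<mark>` element and the rest as plain text, which avoids building HTML strings by hand.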
- URL-based document loading only works when the remote server allows cross-origin requests (CORS)
- Large files may take a while to process, depending on the client's hardware
- Some complex document formatting may be lost during text extraction
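To make the CORS limitation concrete, the sketch below shows one way a URL loader might report failures. It is not the application's actual code: `loadDocumentFromUrl`, `FetchLike`, and `MinimalResponse` are hypothetical names, and the fetch function is passed in as a parameter so the logic can be exercised without a network. Browsers report a blocked cross-origin request as a generic network error, so the message can only suggest CORS as a likely cause.

```typescript
// Minimal subset of the Fetch API's Response that this sketch relies on.
interface MinimalResponse {
  ok: boolean;
  status: number;
  arrayBuffer(): Promise<ArrayBuffer>;
}

// Hypothetical injected fetch function; in a browser this would wrap fetch.
type FetchLike = (url: string) => Promise<MinimalResponse>;

// Download a document as raw bytes, turning failures into readable errors.
async function loadDocumentFromUrl(
  url: string,
  fetchFn: FetchLike
): Promise<ArrayBuffer> {
  let response: MinimalResponse;
  try {
    response = await fetchFn(url);
  } catch (err) {
    // A rejected fetch means no response was readable at all: the request
    // may have been blocked by CORS, or the network may be unavailable.
    throw new Error(
      "Could not fetch " + url +
      ". The server may not allow cross-origin requests (CORS)."
    );
  }
  if (!response.ok) {
    // The server answered, but with an error status such as 404 or 500.
    throw new Error(
      "Request for " + url + " failed with HTTP status " + response.status + "."
    );
  }
  return response.arrayBuffer();
}
```

In a browser, the second argument could be `(u) => fetch(u)`; the returned bytes would then be handed to whichever PDF or DOCX parser applies.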
MIT License