Describe the bug
When Record Manager is enabled for an S3 Directory Loader and the SourceId Key is left at its default value (source), Flowise stores a different temporary path in metadata.source on every run:
source:"/tmp/s3fileloader-GjaLgi/docs/XXXX.pdf"
source:"/tmp/s3fileloader-fkHJjB/docs/XXXX.pdf"
Because the path changes each time, Record Manager treats the same PDF as a new document and inserts duplicate chunks into the vector database.
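The six-character random suffix in the logged directory names (GjaLgi, fkHJjB) matches the pattern produced by Node's fs.mkdtemp. Assuming the loader downloads each object into a fresh directory created that way (inferred from the paths above, not confirmed against the Flowise source), this sketch shows why the local path, and therefore metadata.source, never repeats between runs. localPathForRun is a hypothetical helper.

```typescript
import { mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical stand-in for the loader's per-run download step (an assumption,
// not Flowise code): create a new temp directory with a fixed prefix and place
// the object under it using its S3 key.
function localPathForRun(s3Key: string): { dir: string; filePath: string } {
  // mkdtempSync appends six random characters to the prefix on every call.
  const dir = mkdtempSync(join(tmpdir(), "s3fileloader-"));
  return { dir, filePath: join(dir, s3Key) };
}

const run1 = localPathForRun("docs/XXXX.pdf");
const run2 = localPathForRun("docs/XXXX.pdf");

console.log(run1.filePath); // random suffix for run 1
console.log(run2.filePath); // a different random suffix for run 2
console.log(run1.filePath === run2.filePath); // false

// Remove the empty directories created above.
rmSync(run1.dir, { recursive: true, force: true });
rmSync(run2.dir, { recursive: true, force: true });
```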
To Reproduce
- Go to Data Sources → New Loader → S3 Directory Loader.
- Configure the bucket/prefix.
- Enable Record Manager and leave SourceId Key as source (default).
- Run the loader (process and upsert).
- Run the loader again on the same data set.
- Inspect the vector store or logs: new chunks are inserted instead of being matched to existing records.
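The steps above can be modelled with a toy record manager. This is a simplified assumption, not Flowise's or LangChain's actual indexing code: each record is keyed by a hash of its content plus its metadata, and remembers its metadata.source value. ToyRecordManager and loadChunks are hypothetical names.

```typescript
import { createHash } from "node:crypto";

interface Chunk {
  pageContent: string;
  metadata: { source: string };
}

// Hypothetical, simplified record manager: a record is keyed by a hash of its
// content plus its metadata, and remembers metadata.source (the default SourceId Key).
class ToyRecordManager {
  private readonly records = new Map<string, string>(); // record hash -> source id
  inserted = 0;

  upsert(chunks: Chunk[]): void {
    for (const chunk of chunks) {
      const hash = createHash("sha256")
        .update(chunk.pageContent)
        .update(JSON.stringify(chunk.metadata))
        .digest("hex");
      if (!this.records.has(hash)) {
        this.records.set(hash, chunk.metadata.source);
        this.inserted += 1; // a new vector would be written to the store here
      }
    }
  }

  sourceIds(): Set<string> {
    return new Set(this.records.values());
  }
}

// Two chunks of the same PDF, tagged with whatever source value the loader produced.
function loadChunks(source: string): Chunk[] {
  return ["chunk one", "chunk two"].map((pageContent) => ({ pageContent, metadata: { source } }));
}

// Current behaviour: each run reports a different temporary path.
const rm = new ToyRecordManager();
rm.upsert(loadChunks("/tmp/s3fileloader-GjaLgi/docs/XXXX.pdf"));
rm.upsert(loadChunks("/tmp/s3fileloader-fkHJjB/docs/XXXX.pdf"));
console.log(rm.inserted, rm.sourceIds().size); // 4 2

// Expected behaviour: both runs report the S3 key.
const stable = new ToyRecordManager();
stable.upsert(loadChunks("docs/XXXX.pdf"));
stable.upsert(loadChunks("docs/XXXX.pdf"));
console.log(stable.inserted, stable.sourceIds().size); // 2 1
```

With a changing path the second run doubles the stored chunks and splits one file across two source ids; with a stable value the second run is a no-op.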
Expected behavior
metadata.source should be stable for a given S3 object (e.g. use the S3 key docs/XXXX.pdf), so Record Manager can recognise existing documents and avoid duplicates.
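One possible fix, sketched under the assumption that the loader still knows the temporary directory it downloaded into, is to strip that directory from the local path before writing metadata.source. stableSource is a hypothetical helper; it uses POSIX path semantics because the paths in this report come from a Linux (docker) container.

```typescript
import { posix } from "node:path";

// Hypothetical helper: recover the S3 key from a downloaded file's local path,
// given the per-run temporary directory it was written to.
function stableSource(localPath: string, tempDir: string): string {
  const key = posix.relative(tempDir, localPath);
  // Reject paths that are not inside tempDir (or are tempDir itself).
  if (key === "" || key === ".." || key.startsWith("../") || posix.isAbsolute(key)) {
    throw new Error(`${localPath} is not inside ${tempDir}`);
  }
  return key;
}

console.log(stableSource("/tmp/s3fileloader-GjaLgi/docs/XXXX.pdf", "/tmp/s3fileloader-GjaLgi")); // docs/XXXX.pdf
console.log(stableSource("/tmp/s3fileloader-fkHJjB/docs/XXXX.pdf", "/tmp/s3fileloader-fkHJjB")); // docs/XXXX.pdf
```

Both runs now yield the same value, which is what Record Manager needs to match existing records.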
Screenshots
N/A
Flow
N/A
Setup
- Installation: docker
- Flowise Version: 3.0.1
- OS: Windows 11
- Browser: Google Chrome
Additional context
- The current behaviour forces users either to supply a custom SourceId Key or to accept data duplication.
- Setting a custom SourceId Key is not a viable workaround: the same value is applied to every document within the S3 folder, which again prevents Record Manager from distinguishing individual files.
- A consistent identifier derived from each object’s S3 key (or an option to choose that behaviour) would let Record Manager properly update existing documents instead of re-inserting them.
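The two requirements above (a value that is stable across runs and unique per file) can be checked side by side. The SourceIdMode option, its three values, and the second object docs/YYYY.pdf are hypothetical and only illustrate the trade-off; they are not existing Flowise settings.

```typescript
// Hypothetical option controlling what the S3 loader writes to metadata.source.
type SourceIdMode = "tempPath" | "s3Key" | "constant";

interface DownloadedObject {
  key: string; // S3 object key, e.g. docs/XXXX.pdf
  localPath: string; // where this run downloaded the object
}

function sourceFor(obj: DownloadedObject, mode: SourceIdMode): string {
  if (mode === "tempPath") return obj.localPath; // current behaviour: changes every run
  if (mode === "s3Key") return obj.key; // stable and unique per object
  return "my-folder"; // one fixed value for every file, like the custom-key workaround
}

// Two runs over the same two objects; each run downloads into a new temp directory.
function simulateRuns(): DownloadedObject[][] {
  const keys = ["docs/XXXX.pdf", "docs/YYYY.pdf"];
  return ["/tmp/s3fileloader-GjaLgi", "/tmp/s3fileloader-fkHJjB"].map((dir) =>
    keys.map((key) => ({ key, localPath: `${dir}/${key}` })),
  );
}

// A usable source id must be the same across runs and different between files.
function evaluate(mode: SourceIdMode): { stableAcrossRuns: boolean; distinctPerFile: boolean } {
  const [first, second] = simulateRuns().map((objs) => objs.map((o) => sourceFor(o, mode)));
  return {
    stableAcrossRuns: first.every((s, i) => s === second[i]),
    distinctPerFile: new Set(first).size === first.length,
  };
}

for (const mode of ["tempPath", "s3Key", "constant"] as const) {
  console.log(mode, evaluate(mode));
}
```

Only the s3Key mode satisfies both requirements: tempPath fails the first, and a fixed value shared by every file fails the second.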