I have a use case where I want to compare incoming data against reference data provided by another service.
What's the best way in PyFlink to fetch that reference data and refresh it regularly (every 1-2 hours)?
Other considerations:
- The reference data may contain hundreds of thousands of records
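
To make the requirement concrete, here is a rough sketch in plain Python of the behaviour I'm after: keep the reference data in memory and reload it once it is older than some interval. All names here (`RefreshingReference`, `fetch_reference_data`, `compare`) are made up for illustration, and the loader is a stand-in for the real call to the other service. In PyFlink I imagine something like this would live inside a `MapFunction`, created in `open()`, but I don't know if that is the recommended approach.

```python
import time


class RefreshingReference:
    """Keeps reference data in memory and reloads it once it is older than ttl_seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader        # callable returning a dict of reference records
        self._ttl = ttl_seconds      # refresh interval, e.g. 3600 for one hour
        self._clock = clock          # injectable so the refresh logic can be tested
        self._data = {}
        self._loaded_at = None       # None means "never loaded"

    def _refresh_if_stale(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._data = self._loader()
            self._loaded_at = now

    def lookup(self, key):
        """Return the reference value for key, or None if there is no such record."""
        self._refresh_if_stale()
        return self._data.get(key)


def fetch_reference_data():
    # Hypothetical stand-in for the request to the other service.
    return {"sensor-1": 10.0, "sensor-2": 20.0}


def compare(record, reference):
    """Compare one incoming (key, value) record against the reference data."""
    key, value = record
    expected = reference.lookup(key)
    matches = expected is not None and value == expected
    return (key, value, expected, matches)


if __name__ == "__main__":
    reference = RefreshingReference(fetch_reference_data, ttl_seconds=3600)
    for record in [("sensor-1", 10.0), ("sensor-2", 5.0), ("sensor-3", 1.0)]:
        print(compare(record, reference))
```

With this approach every parallel task would hold its own full copy of the reference data and reload it independently, which is part of why I'm asking whether there is a better way.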