Hi there,
The requirements look clear and straightforward to implement. That said, it's hard to say much about the data before seeing the target website and its layout. For instance, any precautions against bots, or against automated scraping in general, would affect pretty much everything from top to bottom.
If the amount of data to be collected is large, and since you already prefer Python, I suggest using Scrapy, a framework built specifically for this kind of work. Still, if you'd rather go with requests, that's totally fine as well. As a side note, we'll probably need proxies if you intend to run the crawler with serious parallelism.
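To make the requests option concrete, here's a minimal sketch of what the fetching layer could look like, with optional proxy support wired in from the start. The proxy address and user-agent string are placeholders, not anything final:

```python
import requests

def make_session(proxy=None):
    """Build a requests session; route traffic through `proxy` if given.

    `proxy` is expected as a URL string, e.g. "http://127.0.0.1:8080"
    (placeholder address, not a real endpoint).
    """
    session = requests.Session()
    # A descriptive user-agent; the exact string is up to you.
    session.headers["User-Agent"] = "Mozilla/5.0 (compatible; demo-crawler)"
    if proxy:
        # requests takes a scheme-to-proxy mapping.
        session.proxies = {"http": proxy, "https": proxy}
    return session

def fetch(session, url):
    """Fetch a page, raising on HTTP errors so failures are visible."""
    response = session.get(url, timeout=10)
    response.raise_for_status()
    return response.text
```

For parallelism we'd then rotate a pool of such sessions across workers; with Scrapy, proxies would instead go through its downloader-middleware settings.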
I'm planning to use Python 3, so if you'd rather have Python 2, please let me know. It's hard to commit to a specific deadline, but as a rough estimate it should take a week at most. Thanks.