I need a Python script that takes a file, removes duplicate lines, splits the file into parts of equal size, and then saves each part to a different location.
This script will run in a Windows environment; I have Python installed and can run it from the command prompt.
Here is a step-by-step walkthrough of how it should work.
1) The script looks for the file at a specified location and copies it to a new location to be split. (The original file must not be deleted or modified, because it is updated often with new contents. That is why it should be copied first. The copy should overwrite the file if it already exists in the copy destination.)
2) Duplicate lines are removed from the file.
3) The file is split into X parts with an equal number of lines (or as close to equal as possible).
4) Each part is then saved to a different location.
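The four steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a finished deliverable: every function name is made up for this example, the file is assumed to be UTF-8 text small enough to load into memory, and "duplicate" is taken to mean an exact repeat of an earlier line, with the first occurrence kept in its original position.

```python
import shutil
from pathlib import Path


def copy_source(source, working_copy):
    """Step 1: copy the source file, overwriting any existing working copy."""
    Path(working_copy).parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, working_copy)  # the source is only read, never changed


def read_lines(path):
    """Read the file as a list of lines without line endings (assumes UTF-8)."""
    return Path(path).read_text(encoding="utf-8").splitlines()


def dedupe_lines(lines):
    """Step 2: drop repeated lines, keeping first occurrences in order."""
    return list(dict.fromkeys(lines))


def split_evenly(lines, parts):
    """Step 3: split into `parts` chunks whose lengths differ by at most one."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    base, extra = divmod(len(lines), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks each take one of the leftover lines.
        end = start + base + (1 if i < extra else 0)
        chunks.append(lines[start:end])
        start = end
    return chunks


def write_parts(chunks, destinations):
    """Step 4: write each chunk to its own destination, overwriting old files."""
    if len(chunks) != len(destinations):
        raise ValueError("need exactly one destination per part")
    for chunk, dest in zip(chunks, destinations):
        Path(dest).parent.mkdir(parents=True, exist_ok=True)
        Path(dest).write_text("".join(line + "\n" for line in chunk),
                              encoding="utf-8")
```

With 7 unique lines and 3 parts, this sketch produces chunks of 3, 2 and 2 lines. If there are fewer lines than parts, the trailing files are written empty.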
There should be a config file where I can specify:
1) Location of file to split.
2) Location to copy the file to.
3) Number of seconds to wait before splitting the file again.
4) Number of parts the file should be split into.
5) Locations to save the files to. I should be able to add as many destinations as needed to match the number of files being created. For example, if the file is split into 5 parts, I should be able to specify 5 different locations to save the files. Existing files in those locations should be overwritten.
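One possible shape for such a config file is an INI file read with Python's standard configparser module, which accepts Windows backslash paths without escaping. The sketch below is only a suggestion: the section name, the key names, and the example paths are all invented for illustration, and the small run_periodically helper is a hypothetical way to apply the wait time from item 3.

```python
import configparser
import time

# Hypothetical example config; every path and key name here is made up.
EXAMPLE_CONFIG = r"""
[splitter]
source_file = C:\data\incoming\list.txt
working_copy = C:\data\work\list.txt
interval_seconds = 300
parts = 3
destinations =
    C:\out\a\part.txt
    D:\out\b\part.txt
    E:\out\c\part.txt
"""


def parse_config(text):
    """Parse the INI text and check there is one destination per part."""
    # interpolation=None so that a '%' in a path is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["splitter"]
    destinations = [line.strip()
                    for line in section["destinations"].splitlines()
                    if line.strip()]
    parts = section.getint("parts")
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if len(destinations) != parts:
        raise ValueError("number of destinations must equal parts")
    return {
        "source_file": section["source_file"],
        "working_copy": section["working_copy"],
        "interval_seconds": section.getint("interval_seconds"),
        "parts": parts,
        "destinations": destinations,
    }


def run_periodically(job, interval_seconds, max_runs=None):
    """Call job(), wait interval_seconds, and repeat (forever if max_runs is None)."""
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        if max_runs is None or runs < max_runs:
            time.sleep(interval_seconds)
```

Adding another destination would then mean adding one more indented line under destinations and raising parts by one. In practice the text would come from a file on disk rather than from the EXAMPLE_CONFIG string.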
That is it. Please feel free to ask any questions. The sooner this can be completed the better; turnaround time will factor into who is chosen for the project.
I have written a couple of scripts that do similar things, mostly with very big data files, which can be tricky. Please send me some example files to try. Looking forward to working with you!