Two data engineering/data analytics scenario-based tasks
Task 1 - Process
What I’d like you to come up with is 5-ish slides on the process and steps you would take in the following situation:
A retailer has agreed to do business with our company. We’ve not worked with them previously and do not know what their data is like. The work will involve product advertising on their website. What we need to do is link this advertising back to their sales data (which will share a common userid with the advertising data) and report in two areas:
[login to view URL] reporting including sales uplift
[login to view URL] operational reports
The data structures are as follows:
PageViewID (which page it was shown on)
As guidance, I’m not looking for code. What I want to see is a high-level set of steps you would go through on receipt of such a dataset (which may include questions about it), and consideration of the objectives that we’d be trying to achieve. An Entity Relationship Diagram should be part of this, along with a target data architecture. A further consideration is that we may want to do this for other retailers in a similar position, so repeatability and scalability are important.
Task 2 - Skills Test
What I’d like from you is an approach to cleansing/filtering streaming data. I’d like to see one (possibly two) approach(es) including:
Reasoning for choosing an approach
Considerations that go into making a decision (including risks and GDPR, if appropriate)
Relevant Technical Data Flow Architecture
The hypothetical scenario is as follows:
A third-party tech partner company provides us with an advertising PaaS. From their platform they will provide us with Impression and Click data via an AWS Kinesis stream. They have informed us that they’re unable to filter the data to just our instance of the platform, so the stream will also contain impressions and clicks from other platform users, which they’ve asked us to remove from our dataset. We have a defined list of users that should be used as a whitelist for the data filtering.
Again, this should be no more than 5 slides, but should be a bit more detailed than the previous task.
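For illustration only, the whitelist filter described in the scenario could be sketched roughly as below. This is a minimal sketch under stated assumptions, not a required deliverable: it assumes each record arrives as a JSON-encoded payload, that the user identifier sits in a field called userid, and that the whitelist fits in memory. The function name, the sample user IDs, and the field name are all hypothetical. Reading from the Kinesis stream itself is left out so the logic stays self-contained.

```python
import json
from typing import Iterable, Iterator

# Hypothetical whitelist; in practice it would be loaded from a managed store.
ALLOWED_USERS = frozenset({"u-001", "u-002"})


def filter_events(
    raw_records: Iterable[bytes],
    allowed: frozenset = ALLOWED_USERS,
    user_field: str = "userid",  # assumed field name, not confirmed by the brief
) -> Iterator[dict]:
    """Yield only the decoded events whose user is on the whitelist."""
    for raw in raw_records:
        try:
            event = json.loads(raw)
        except ValueError:
            # Malformed payload: skipped here; a real pipeline might route
            # it to a dead-letter store for inspection instead.
            continue
        if not isinstance(event, dict):
            continue
        user = event.get(user_field)
        # Anything not explicitly whitelisted is dropped before storage.
        if isinstance(user, str) and user in allowed:
            yield event


if __name__ == "__main__":
    sample = [
        b'{"userid": "u-001", "type": "click"}',
        b'{"userid": "u-999", "type": "impression"}',
        b"not json",
    ]
    for kept in filter_events(sample):
        print(kept)
```

Dropping non-whitelisted records before anything is persisted is one possible design choice. It means other platform users' data is never stored on our side, which is likely relevant to the GDPR considerations the task asks about.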