When we talk about data roles, there is often some confusion between the Data Scientist and Data Engineer roles. Let’s find some info about them!
According to one definition, a Data Engineer “builds systems that consolidate, store, and retrieve data from the various applications and systems created by software engineers. (A software engineer builds applications and systems).”
On the other hand, a Data Scientist “builds analysis on top of data. This may come in the form of a one-off analysis for a team trying to better understand customer behavior, or a machine learning algorithm that is then implemented into the code base by software engineers and data engineers.”
A more recent article by Karlijn Willems, dated February 23rd, 2017, says that “The data engineer is someone who develops, constructs, tests and maintains architectures, such as databases and large-scale processing systems. The data scientist, on the other hand, is someone who cleans, massages, and organizes (big) data…
The data engineers will need to recommend and sometimes implement ways to improve data reliability, efficiency, and quality. To do so, they will need to employ a variety of languages and tools to marry systems together or try to hunt down opportunities to acquire new data from other systems so that the system-specific codes, for example, can become information in further processing by data scientists… the data engineering team will need to develop data set processes for data modeling, mining, and production.”
And what about the data scientist? Ed Jones says that a “data scientist tends to be more mathematically focused – concentrating on providing an insight into future patterns identified from past and current data. If one takes the two words literally – ‘science’ implying knowledge gained through systematic study; ‘data’ being an information set of qualitative or quantitative variables – a data scientist can therefore be defined literally as one who systematically studies the organisation and property of information…
A data scientist must:
- Look at data with a mathematical mind-set. Learning skills such as machine learning, data mining, data analysis and statistics are crucial. A data scientist will need to interpret and represent data mathematically.
- Use a common language to access, explore and model data. Knowledge of a statistical programming language will be critical. Languages like R, Python or MATLAB, and a database querying language like SQL are some of the most popular skills in demand. Data extraction, exploration and hypothesis testing are central to the data science practice.
- Develop strong computer science and software engineering backgrounds. This involves developing a skill set which could include Java, C++ or knowledge of algorithms and Hadoop. These skills will be used to leverage data to architect systems.”
In my opinion, this infographic is the best image for explaining some of the data roles: