Terms of employment: Contract
Project duration: Four years
Duration of contract: one year with possible extension
Place: Ethiopian Public Health Institute, Addis Ababa
Reporting to: National Data Management Center for health (NDMC)
The National Data Management Center for health (NDMC) at the Ethiopian Public Health Institute (EPHI) is responsible for centrally archiving health and health-related data, processing and managing health research, applying robust data analytic techniques, synthesizing evidence, and ensuring evidence utilization for decision making by the Federal Ministry of Health (FMoH) and other relevant stakeholders at local, sub-national, national, and international levels. NDMC has a collaborative partnership with the Institute for Health Metrics and Evaluation (IHME) at the University of Washington and has established a Burden of Disease (BoD) Unit. The BoD Unit is responsible for mapping, collecting, reviewing, and archiving available health and health-related data in the country, and for producing national and subnational burden of disease estimates collaboratively with the Global Burden of Disease (GBD) Study centered at IHME, covering population and demography, mortality, and risk factors for a range of communicable diseases, non-communicable diseases, maternal, newborn and child health, nutrition, and injuries. The unit also creates platforms for translating BoD evidence into decisions and policy at national and subnational levels. The NDMC is looking for high-caliber staff for this collaborative project.
Roles and responsibilities
· Apply different mathematical modeling and implementation techniques to data archived at EPHI.
· Solve computational and analytic challenges by investigating the data, understanding the root questions, and coming up with alternative measurement strategies.
· Implement code solutions in order to answer analytic questions, perform diagnostics on results, and test and assess methods.
· Work under the senior data analyst and senior biostatistician to create, maintain, and update databases containing health data from multiple sources such as surveys, vital registration systems, administrative records, and published studies relevant to NDMC research priorities.
· Carry out routine and complex computational processes and statistical modeling that are central to generating estimates of key indicators as guided by NDMC senior data specialist/biostatistician/health economist/public health experts.
· Execute queries on databases and resolve intricate questions in order to respond to the needs of senior researchers and other stakeholders.
· Bring together data, analytic engines, and data visualizations in one seamless computational process.
· Use protocols to identify problems with datasets and routine computational processes, rectify issues, and systematize data for analyses.
· Transform and format datasets for ongoing analyses.
· Catalogue and incorporate datasets into databases.
· Develop and implement algorithms to assess data quality.
· Code and re-code data contained within various databases to identify patterns, compiling Excel spreadsheets and using Visual Basic for Applications.
· Analyze data accurately and present results clearly.
· Perform data mining using state-of-the-art methods.
· Process, cleanse, and verify the integrity of data used for analysis.
· Develop and maintain databases, reports, and maps.
· Organize, manipulate, and retrieve archived data for reporting, analysis, and presentation purposes.
· Extract data for analysis using standard NDMC protocols, concepts, practices, and procedures.
· Work in an agile, sprint-based software development environment for fast delivery.
· Develop, maintain, and use version control and project management tools.
· Produce software solutions by strictly following the general guidelines set in each software development life cycle.
· Develop test scripts for new and existing systems, and use different software testing tools.
· Develop and use procedures, packages, and libraries for building automated data preprocessing algorithms in accordance with standard mechanisms.
· Work with and optimize R and Python packages commonly used in developing data preprocessing models, predictive models, and model testing programs.
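As an illustration of the data-quality and preprocessing work described above, the sketch below profiles a small dataset for missing values and duplicate records. This is a minimal example only, not NDMC tooling; the sample data and function name are hypothetical.

```python
import csv
import io

def profile_dataset(rows):
    """Compute simple data-quality metrics for a list of record dicts:
    per-column missing-value rates and the number of duplicate rows."""
    n = len(rows)
    columns = rows[0].keys() if rows else []
    missing = {
        col: sum(1 for r in rows if r.get(col) in (None, "", "NA")) / n
        for col in columns
    }
    # Duplicates = total rows minus the number of distinct rows.
    duplicates = n - len({tuple(sorted(r.items())) for r in rows})
    return {"rows": n, "missing_rate": missing, "duplicates": duplicates}

# Hypothetical extract: one missing value and one duplicated record.
raw = "region,cases\nAddis Ababa,120\nOromia,\nAddis Ababa,120\n"
records = list(csv.DictReader(io.StringIO(raw)))
report = profile_dataset(records)  # 3 rows, 1 duplicate, 1/3 missing 'cases'
```

A routine like this would typically run before any statistical modeling, so that systematic gaps in a source are caught early rather than propagated into estimates.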
Desired skills and experiences
· Computer programming skills, familiarity with database systems such as MySQL and Oracle, and experience developing code in R, Python, SQL, or another programming language.
· Interest in health data analytics, computation and data science
· Demonstrated self-motivation, ability to absorb detailed information, flexibility, and ability to thrive in a fast-paced, energetic, highly creative and entrepreneurial environment.
· Ability to learn new information quickly and to apply analytic skills to better understand complex information in a systematic way.
· Strong quantitative and computational aptitude.
· Robust problem-solving skills, along with strong familiarity with data warehousing, data mining, and data mapping.
· Capable of presenting and interpreting results
· Expertise in core Python and R concepts, including data structures, OOP, variables and data types, file handling, and data pre-processing (data cleaning, missing value handling, feature or predictor selection methods such as Cox algorithms and the Boruta library, and imbalanced data handling); a good grasp of web frameworks and object-relational mapping; a solid understanding of data science packages (NumPy, pandas, Matplotlib, scikit-learn, caret, randomForest, etc.); machine learning and AI concepts (data mining, neural networks for advanced classification, etc.); and MVC and MVT architectural concepts.
· Familiarity with the R mice package and methods such as Multivariate Imputation by Chained Equations (MICE) for imputing missing values.
· At least some experience implementing techniques for removing irrelevant or redundant features from a dataset and selecting the important features to feed into machine learning predictive or descriptive models, preferably Recursive Feature Elimination with Logistic Regression in Python and the Boruta package (which uses a Random Forest mechanism) in R, along with an understanding of the principles of predictor ranking methods.
· Knowledge of Python or R implementations of sampling mechanisms used to handle imbalanced datasets, preferably including the SMOTE, ROSE, and DMwR R packages, and some knowledge of testing these implementations for better performance of trained models.
· Experience developing predictive models for binary responses, including Logistic Regression, Random Forest, Survival Analysis, Neural Networks, Naïve Bayes, Decision Trees, Support Vector Machines, and K-Nearest Neighbors, using Python and R packages, along with some knowledge of assessing model accuracy using mechanisms such as ROC curves.
· Some knowledge of tools and packages used in developing web-scraping applications.
· Familiarity with common existing tools and templates used in data extraction.
· BSc degree in Computer Engineering, Software Engineering, or Computer Science, with 0–4 years' experience.
· GPA of 3.25 or above.
· Thesis work or participation in any other project involving big data analytics, or data processing and visualization work.
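The accuracy-testing skill mentioned above (assessing binary classifiers with ROC) can be sketched with a from-scratch AUC computation. This is an illustrative implementation of the standard rank-statistic formulation, not a specific NDMC tool; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-statistic (Mann-Whitney)
    formulation: the probability that a random positive example is
    scored higher than a random negative one, counting ties as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities from a binary classifier.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_score)  # -> 0.75
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect ranking, which is why ROC analysis is a common yardstick when comparing the predictive models listed above.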
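The imbalanced-data bullet above refers to oversampling methods in the SMOTE family. The sketch below shows SMOTE's core idea, interpolating between a minority-class point and one of its nearest neighbours, in plain Python under hypothetical 2-D data; real work would use a maintained implementation such as the R packages named above.

```python
import random

def smote_like(minority, k=2, n_new=4, seed=0):
    """Generate synthetic minority samples by interpolating between a
    point and one of its k nearest neighbours (SMOTE's core idea)."""
    rng = random.Random(seed)

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, nb)))
    return synthetic

# Hypothetical 2-D minority-class points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
new_points = smote_like(minority)
```

Because each synthetic point lies on a segment between two existing minority points, the oversampled class stays within the region the minority data already occupies rather than introducing arbitrary noise.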