Highly effective Data Scientist/Data Engineer with over 25 years of experience. I have extensive Spark experience in both PySpark and Scala, as well as in MLlib.
TECHNICAL SKILLS
• Big Data (Hortonworks and Cloudera) – Spark (PySpark, Scala), Kafka, Hive, Impala, NiFi, HDFS, Sqoop, Ranger, YARN, Solr, SAM, Schema Registry, Superset
• Languages – Python, Scala, R, JavaScript
• Data Visualization – Tableau, PowerBI, OBIEE, DOMO
• Splunk & ELK Stack – Elasticsearch, Logstash, Filebeat, Kibana
• AWS – S3, Redshift, DynamoDB, Athena, Kinesis, EMR, Aurora, Glue
• Azure – Data Warehouse, Polybase, SQL Server, HDInsight, SSIS, SSAS
• ETL – Informatica PowerCenter, Informatica Big Data Management (BDM), SSIS
• Data Science & Engineering – R / RStudio / SparkR (packages: dplyr, ggplot2, stringr, plyr, caret, SparkR, NLP, tibble, TensorFlow, curl), Python / PySpark, MLlib
• Libraries – MLlib, NumPy, SciPy, Pandas, Matplotlib, Seaborn, scikit-learn
• DBMS / OLAP – Oracle, SQL Server, Teradata, MySQL, Postgres, Essbase, SSAS
• Data Modelling – Kimball, Data Vault
• Machine Learning / Statistical Modelling – Linear Regression, Logistic Regression, Classification and Regression Trees, Naive Bayes, K-Nearest Neighbors