· Develop complex SQL scripts for data analysis and extraction; develop and maintain programs as required for the ETL process.
· Design and implement distributed data processing pipelines using Spark, Hive, Sqoop, Python, and other tools and languages prevalent in the Hadoop ecosystem; design and implement end-to-end solutions (see the sketch after this set of responsibilities).
· Build utilities, user-defined functions, and frameworks to better enable data flow patterns.
· Research, evaluate, and utilize new technologies/tools/frameworks centered around Hadoop and other elements in the Big Data space.
· Define and build data acquisition and consumption strategies.
· Build and incorporate automated unit tests and participate in integration testing efforts.
· Work with teams to resolve operational and performance issues.
· Work with architecture/engineering leads and other teams to ensure quality solutions are implemented and engineering best practices are defined and adhered to.
· Assist in the development and training of the IT department.
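To give a flavor of the pipeline work described above, here is a minimal PySpark sketch of an end-to-end batch job, not a prescribed implementation; the table and column names (raw_events, event_date, amount, daily_event_summary) are hypothetical placeholders.

```python
# Minimal PySpark sketch of an end-to-end batch pipeline: read a Hive
# table, apply a transformation, and write partitioned results back.
# All table and column names here are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("etl-sketch")
    .enableHiveSupport()  # needed to read/write Hive tables
    .getOrCreate()
)

# Extract: read the source Hive table.
events = spark.table("raw_events")

# Transform: drop bad rows and aggregate per day.
daily = (
    events
    .filter(F.col("amount").isNotNull())
    .groupBy("event_date")
    .agg(
        F.sum("amount").alias("total_amount"),
        F.count("*").alias("event_count"),
    )
)

# Load: write the result back as a partitioned Hive table.
(
    daily.write
    .mode("overwrite")
    .partitionBy("event_date")
    .saveAsTable("daily_event_summary")
)
```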
· Requires a four-year degree in Computer Science/Information Technology, Computer/Electrical Engineering, or a related discipline.
· Hands-on experience with big data tools such as Hadoop, Spark, Kafka, Hive, and Sqoop.
· Expertise in at least one programming language such as Python, Java, or Scala; Python is preferred.
· Experience with shell scripting.
· Expertise in SQL, including nested queries, stored procedures, and data modeling.
· MySQL/Oracle and/or NoSQL experience, with the ability to develop, tune, and debug complex SQL/NoSQL applications.
· Experience with different data stores such as HBase, Cassandra, MongoDB, or Neo4j, and with query layers such as GraphQL.
· Hands-on experience with data pipelines and the ELK stack (Elasticsearch, Logstash, Kibana).
· Understanding of data pipeline deployment, either in the cloud or on-premises.
· Good understanding of data streaming tools such as Kafka or RabbitMQ.
· Strong written and verbal communication skills.
· Ability to work both independently and as part of a team.
· Solid experience with Spark, including the different Spark APIs, Spark SQL, and Spark Streaming; OR
· Hands-on experience with the Spark Python, Java, or Scala APIs and with configuring Spark jobs; OR
· Solid experience with Hive, including Hue, joins, partitions, and buckets (see the Spark SQL sketch at the end of this list).
· Familiarity with cloud technologies such as AWS S3, Amazon Redshift, AWS EMR, AWS RDS, or similar preferred.
· Hands-on experience creating dashboards using Tableau, Spark, or Power BI preferred.
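To illustrate the Spark SQL, Hive, and nested-query items above, a minimal sketch assuming PySpark with Hive support; the table and column names (orders, customer_id, order_total, region) are hypothetical placeholders. It creates a partitioned, bucketed table and runs a correlated (nested) query.

```python
# Minimal Spark SQL sketch: a partitioned, bucketed table plus a
# nested (correlated) query. All names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("spark-sql-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Partitions prune files by region at read time; buckets cluster rows
# by customer_id so joins and aggregations on that key shuffle less.
spark.sql("""
    CREATE TABLE IF NOT EXISTS orders (
        customer_id BIGINT,
        order_total DOUBLE,
        region STRING
    )
    USING PARQUET
    PARTITIONED BY (region)
    CLUSTERED BY (customer_id) INTO 8 BUCKETS
""")

# Nested query: customers whose order exceeds their region's average.
big_spenders = spark.sql("""
    SELECT o.customer_id, o.region, o.order_total
    FROM orders o
    WHERE o.order_total > (
        SELECT AVG(order_total)
        FROM orders
        WHERE region = o.region
    )
""")
big_spenders.show()
```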