Key Skills
Spark Core/Streaming with the Scala programming language; solid grasp of data structures, algorithms, data transformation, data ingestion, and optimization techniques; good understanding of Big Data technologies (Hadoop, MapReduce, Kafka, Cassandra); strong logical and problem-solving skills.
Responsibilities
Create and maintain Scala/Spark jobs for data transformation and aggregation, ranging from simple to complex transformations involving structured and unstructured data.
Produce comprehensive unit tests for Spark transformations, helper functions, and performance optimization methods.
Develop robust data processing pipelines and architect data storage and management systems.
Define scalable calculation logic for both interactive and batch use cases.
Collaborate closely with infrastructure and data teams to perform complex data analysis.
Work within a unique and challenging big data ecosystem, emphasizing storage efficiency, data security, privacy, query scalability and performance, expandability, and flexibility.
Contribute to building a big data platform capable of processing and managing exabytes of data, ensuring efficient access to the data.
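The first responsibility above, a Scala/Spark job that transforms and aggregates data, might look like the following minimal sketch. The input path, column names (`userId`, `amount`), and output location are illustrative assumptions, and the code assumes a Spark 2.x+ distribution on the classpath:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailySpendJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-spend")
      .getOrCreate()

    // Hypothetical semi-structured input; schema is inferred from the JSON.
    val events = spark.read.json("s3://example-bucket/events/")

    // Simple transformation + aggregation: total positive spend per user.
    val totals = events
      .filter(col("amount") > 0)
      .groupBy(col("userId"))
      .agg(sum(col("amount")).as("totalSpend"))

    totals.write.mode("overwrite").parquet("s3://example-bucket/out/daily_spend/")
    spark.stop()
  }
}
```

Real jobs in this role would add unit tests around the transformation logic (e.g. by factoring the filter/groupBy step into a function that takes and returns a `DataFrame`) and tuning such as partitioning the output by date.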
Requirements
Minimum 5 years of professional experience in Big Data platforms and building large-scale distributed systems with high availability.
Proficiency in developing Spark applications using the Spark RDD, Spark SQL, Spark GraphX, Spark Streaming, Spark MLlib, and DataFrame APIs.
Comprehensive understanding of Spark's advantages, Spark workflows, writing Spark jobs, query tuning, and performance optimization.
Strong foundation in Data Structures, Algorithms, and extensive hands-on experience with the Scala programming language. Demonstrated investigative and problem-solving skills.
Expertise in Data Ingestion, Optimization Techniques, and designing/developing data transformation and aggregation pipelines.
Experience working with leading-edge Big Data storage systems and technologies such as Hadoop, HDFS, AWS S3, AWS Lambda, Storm/Heron, Cassandra, Apache Kafka, Solr/ElasticSearch, MongoDB, DynamoDB, Postgres, and/or MySQL.
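To make the RDD-vs-DataFrame distinction in the requirements concrete, here is a hedged sketch expressing the same word count both ways. The logic and names are a standalone illustration, assuming a local Spark session:

```scala
import org.apache.spark.sql.SparkSession

object WordCountBothApis {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("word-count")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val lines = Seq("spark makes big data simple", "big data at scale")

    // RDD API: low-level functional transformations, manually keyed.
    val rddCounts = spark.sparkContext.parallelize(lines)
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // DataFrame API: declarative and optimized by the Catalyst planner.
    val dfCounts = lines.toDF("line")
      .selectExpr("explode(split(line, ' ')) as word")
      .groupBy("word")
      .count()

    rddCounts.collect().foreach(println)
    dfCounts.show()
    spark.stop()
  }
}
```

Candidates with the experience described above should be comfortable choosing between the two: the RDD API gives fine-grained control, while the DataFrame/Spark SQL path usually wins on performance because of query optimization.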