Sameer Enjapuri
I'm a Data Engineer/Analyst
I am a detail-oriented, results-driven professional with five years of experience in Data Engineering and Analytics. I design, build, and optimize data pipelines for efficient extract, transform, and load (ETL) processes. I communicate well and am comfortable working with stakeholders across disciplines, with a focus on seamless data integration.
Certifications
Skills & Abilities
Work Experience
Graduate Research Assistant
Contributed to the Tampa Bay E-Insights report, which examines the economic performance of the Tampa Bay region relative to 19 comparable Metropolitan Statistical Areas (MSAs). Developed comprehensive Tableau reports that analyze key areas such as economic outcomes, affordability, and the talent pipeline. These efforts provided critical insights for policy recommendations.
Data Engineer/Analyst
Analyzed data streams, reducing processing time by 40%. Engineered real-time data integration solutions with Confluent Kafka, improving streaming efficiency by 25%. Designed interactive Tableau dashboards for predictive analytics, enhancing decision-making by 30%.
Data Engineer/Analyst
Engineered and implemented a real-time Change Data Capture (CDC) solution using Confluent Kafka, increasing data streaming efficiency by 25% and enabling continuous analytics. The system provided timely insights into market trends and customer behaviors.
Data Engineer/Analyst
Managed ETL pipelines in Informatica PowerCenter, achieving a 95% success rate in transitioning workflows to Kafka. Optimized database performance by 15% using advanced data modeling techniques. Applied business intelligence principles for accurate reporting and insights.
Projects
Real-Time Data Streaming Pipeline
Developed an end-to-end Data Engineering project using Spark, Kafka, Airflow, Docker, Cassandra, and Python. The project involved fetching data from an API via scheduled scripts, sending it to Kafka using Airflow, processing it with Spark Structured Streaming, and storing it in Cassandra. All components were containerized with Docker for seamless integration and scalability, resulting in a 30% reduction in data processing time.
Data Engineering YouTube Analysis
Designed a scalable data pipeline using AWS Glue ETL to transform over 100k records, securely stored in an S3 bucket. Employed Lambda functions to clean and preprocess JSON data, converting it into optimized Parquet format, and automated the process with a file load trigger, reducing manual work by 10%. Used QuickSight to analyze engagement metrics, identifying top-performing videos and enhancing marketing strategies.
Movie and TV Show Clustering and Recommendation System
Conducted exploratory data analysis on Netflix content, revealing key trends such as the dominance of movies over TV shows and the significant increase in content produced in the United States over time. Developed and optimized clustering algorithms (K-Means, Agglomerative) and a content-based recommender system using cosine similarity, providing 10 personalized show recommendations based on user preferences.
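The content-based recommendation step can be illustrated with a minimal sketch. It assumes each title is represented by a short text description turned into a bag-of-words vector; the function names, the toy catalog, and the use of raw word counts are illustrative assumptions, not the project's actual code or features.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a description into a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description is similar to nothing
    return dot / (norm_a * norm_b)


def recommend(title, catalog, k=10):
    """Return up to k titles most similar to `title`, best match first."""
    query = vectorize(catalog[title])
    scores = [
        (other, cosine_similarity(query, vectorize(description)))
        for other, description in catalog.items()
        if other != title
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:k]]


# Hypothetical toy catalog, for illustration only.
catalog = {
    "Show A": "crime drama detective city",
    "Show B": "detective crime thriller",
    "Show C": "baking competition comedy",
}

top_matches = recommend("Show A", catalog, k=10)
```

With k=10, the function returns at most ten titles, matching the ten recommendations described above; a production version would more likely use TF-IDF weights over richer metadata than raw word counts.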
Get in Touch
My Address
14308 Wedgewood Ct
Tampa, FL 33613
enjapurisameer@gmail.com
senjapuri@usf.edu
Contact
+1 813-389-8792