Photo Calendar Recommendation System

Mar 1, 2016

Data Extraction Pipeline Architecture

Project Overview (March 2016 – December 2016)

At Know Center Research, I developed a machine learning system for a digital photo printing company. The system intelligently assigned user-uploaded images to appropriate calendar months, significantly enhancing the user experience by automating photo selection.

Quick Highlights:

End-to-end ML pipeline analyzing image metadata
Technologies: Java, Spring Boot, RabbitMQ, Apache Solr, PySpark, Weka
Achieved substantial accuracy improvements over manual/random assignments
Scalable solution suitable for production workloads

Technical Challenges

Key challenges addressed included:

Designing a scalable microservices architecture for image processing
Extracting and analyzing relevant image metadata efficiently
Developing accurate predictive models for assigning images to calendar months
Establishing a robust, scalable data pipeline

Technologies & Methods

The solution leveraged various technologies and approaches:

Architecture: Microservices using Java 8, Spring Boot, Swagger, and RabbitMQ
Data Storage: Apache Solr for efficient metadata indexing and retrieval
Machine Learning: Weka (J48, Random Forest, Naive Bayes classifiers)
Data Processing: PySpark, HDFS, Hadoop, and Python for large-scale analysis and visualization
CI/CD: Jenkins, Maven, Git, JUnit, and integration testing for quality assurance
Development: IntelliJ IDEA, Agile methodology with Scrum

Results & Impact

Significantly improved accuracy in month assignment compared to manual/random assignments
Robust handling of images with incomplete metadata
Enhanced user experience through streamlined calendar creation
Delivered a scalable, reliable system capable of production deployment

Personal Contribution

As the primary developer on this project, I:

Designed and implemented the data extraction pipeline
Conducted feature engineering and classifier evaluations using Weka
Performed large-scale data analysis using PySpark and Hadoop
Developed scalable microservices architecture adhering to industry best practices
Created comprehensive test suites (JUnit, integration tests)
Authored detailed project documentation

Tomislav Đuričić

Researcher / Machine Learning Engineer / Software Engineer

My research interests include social-based recommender systems, graph neural networks and user modeling.