
Dynamic CNN Models For Fashion Recommendation


Fashion in the age of Instagram is changing the way in which clothing items are presented and even the way they are designed [1]. Millions of users post photos of their "outfit of the day" (#ootd), receive questions from other users about the outfits, and market the fashion brands they wear. A fashion look on Instagram can attract many customers to the brands featured in the post [2]. Sponsors invest in the outfit details of their fashion influencers not just to impress followers, but as an investment to attract more customers to purchase the outfits. With the appropriate hashtags, fashion retailers increase their findability and incorporate more followers into their business. Instagram is also a powerful platform for analyzing lifestyle trends: its analytics system uses advanced AI and Big Data technologies to process the massive amounts of data created by users, along with their search preferences, in order to feed its advertising systems [3]. This in turn yields great returns for brands actively looking to be found by users with certain interests.

As the quantity and variety of data increase, this personalization task becomes more complicated. Hence, more accurate detection of users' favorite styles and brands can open up novel possibilities for enhancing online shopping recommendations and advertisements on Instagram. Although characterized as an image-sharing platform, Instagram also hosts large volumes of unstructured user-generated text. Specifically, an Instagram post can be associated with an image caption written by the author of the post, with comments written by other users, and with hashtags in the image that refer to other users or brands. In the context of fashion recommendation, we performed a case study on Instagram posts in the fashion community and found that clues about the image contents can be found in the associated text. For instance, the author of the post might describe the contents of the image, or users might comment about details in the image.

A. Deep Learning For Large-Scale Multi-Labeled Image Classification

With the continuous increase in the amounts of visual and textual data, the need for scalable classification methods in machine learning is becoming increasingly important. Many approaches proposed for this purpose rely on label space compression, e.g., [8] and [9]. The main idea in such approaches is to perform predictions in a lower-dimensional label space and then recover the labels in their original, higher dimension. A recent example of a label compression solution in a generic empirical risk minimization (ERM) framework is [9]. Other approaches rely on feature space reduction, such as [10], which applies Canonical Correlation Analysis for dimensionality reduction. Focusing on multi-label classification with deep neural networks, [11] and [12] are examples of work that improved pairwise ranking for multi-label image classification, where the objective is to maximize the ranks of positive labels. An example of scalable neural network design is proposed in [13] for achieving fast classification of large, high-dimensional data. The classification speed in their model is achieved by adding an extra layer that groups learned features into large clusters and activating only one or a few clusters.
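To make the pairwise ranking objective concrete, the sketch below shows one common hinge-style formulation in which every negative label is pushed at least a margin below every positive label. This is a minimal PyTorch illustration of the general idea, not the exact losses of [11] or [12]; the function name, `margin` value, and tensor shapes are our assumptions.

```python
import torch

def pairwise_ranking_loss(scores, targets, margin=1.0):
    """Hinge-style pairwise ranking loss for multi-label classification.

    scores:  (batch, num_labels) raw label scores from the network
    targets: (batch, num_labels) binary ground-truth indicator matrix
    Each negative label is penalized if it is not ranked at least
    `margin` below every positive label of the same image.
    """
    pos_mask = targets.bool()
    losses = []
    for s, pos in zip(scores, pos_mask):
        pos_scores = s[pos]    # scores of labels present in the image
        neg_scores = s[~pos]   # scores of absent labels
        if pos_scores.numel() == 0 or neg_scores.numel() == 0:
            continue
        # all (positive, negative) pairwise margin violations
        diff = margin - (pos_scores.unsqueeze(1) - neg_scores.unsqueeze(0))
        losses.append(torch.clamp(diff, min=0).mean())
    if not losses:
        return scores.sum() * 0.0  # no valid pairs in the batch
    return torch.stack(losses).mean()

# toy usage
scores = torch.randn(4, 10, requires_grad=True)
targets = torch.randint(0, 2, (4, 10)).float()
loss = pairwise_ranking_loss(scores, targets)
loss.backward()
```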

B. Pruning Convolutional Neural Networks For Efficient Resource Usage

Generally, pruning algorithms can be classified into two groups: one group calculates the sensitivity of the network error to the removal of certain parameters or feature maps, while the other modifies the error function in a way that rewards the network for choosing an efficient combination of parameters [14]. In practice, pruning is performed iteratively: the weights/feature maps are ranked according to how much they contribute, and the low-ranking ones are removed. The network is then fine-tuned and, depending on the resulting loss, the process can be repeated [6]. One challenge for pruning methods is that they do not necessarily detect correlated elements. Thus, the choice of neuron-ranking criterion and the pruning rate are critical, as poor choices can cause a large drop in accuracy and a longer training time for the network to recover. Neurons can be ranked by the L1/L2 norm of their weights, their mean activations, the number of times a neuron was nonzero, and other criteria. Different strategies were proposed in [15] for pruning weights, and in [16] for pruning entire filters (feature maps). Usually, ranking methods are used to prune filters, after which the change in the cost function on the training set is observed. In our approach, by contrast, we dynamically choose parameter connections based on the category identified from text, which serves as our criterion for deciding the importance of connections.
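As an illustration of the ranking step, the following sketch ranks the filters of a convolutional layer by the L1 norm of their weights and zeroes out the lowest-ranked fraction, in the spirit of magnitude-based filter pruning [16]. It is a minimal PyTorch sketch under assumptions of our own: the `prune_ratio` value is arbitrary, and filters are masked to zero rather than physically removed from the architecture.

```python
import torch
import torch.nn as nn

def prune_filters_by_l1(conv: nn.Conv2d, prune_ratio: float = 0.3):
    """Zero out the conv filters with the smallest L1 norms.

    One pass of magnitude-based filter pruning: filters (output
    channels) are ranked by the L1 norm of their weights and the
    lowest-ranked fraction is masked to zero. In practice the
    network is fine-tuned afterwards and the rank-prune-finetune
    cycle may be repeated.
    """
    with torch.no_grad():
        # L1 norm of each filter: sum over (in_channels, kH, kW)
        norms = conv.weight.abs().sum(dim=(1, 2, 3))
        n_prune = int(prune_ratio * norms.numel())
        _, prune_idx = torch.topk(norms, n_prune, largest=False)
        conv.weight[prune_idx] = 0.0
        if conv.bias is not None:
            conv.bias[prune_idx] = 0.0
    return prune_idx

# toy usage on a single layer
layer = nn.Conv2d(64, 128, kernel_size=3)
pruned = prune_filters_by_l1(layer, prune_ratio=0.25)
print(f"pruned {pruned.numel()} of {layer.out_channels} filters")
```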

We propose two novel techniques that incorporate social media textual content to support visual classification in a dynamic way. DynamicPruning and DynamicLayers are dynamic frameworks in which multiple classification layers exist and connections are activated dynamically based upon the text mined from the image. Our experiments show the improvements achieved by our DynamicPruning and DynamicLayers models in the average precision and recall of image classification. Network size might be a factor that affects the DynamicLayers algorithm: we noticed that DynamicLayers and DynamicPruning behaved similarly for VGG16, but DynamicLayers performed better for the ResNet architecture. The average validation time per batch was higher for DynamicLayers and DynamicPruning when compared to the base architectures. We assume this is related to the processing required to choose the dynamic ranges. However, we plan to analyze the network's behavior beyond 20 epochs to see the effect of longer training on reducing the validation time; the network might need more time to adapt to the patterns in the data and the changes in prediction scope. The same explanation applies to the average loss. Interestingly, the average loss was lower for ResNet architectures than for VGG16.
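To illustrate the idea of activating connections based on the category mined from text, the sketch below gates a classifier head so that only the output connections belonging to the detected category are evaluated. This is only a schematic sketch of the dynamic-activation concept behind DynamicPruning/DynamicLayers, not the paper's exact model; the class name, the category-to-label map, and all layer sizes are hypothetical.

```python
import torch
import torch.nn as nn

class CategoryGatedHead(nn.Module):
    """Illustrative classifier head gated by a text-mined category.

    Each category (e.g. 'dresses', 'shoes') is mapped to the subset
    of labels it covers; at inference, only the connections feeding
    that subset are evaluated, i.e. the active parameter connections
    are chosen dynamically from the identified category.
    """
    def __init__(self, in_features, num_labels, category_to_labels):
        super().__init__()
        self.fc = nn.Linear(in_features, num_labels)
        self.category_to_labels = category_to_labels  # dict: str -> label indices

    def forward(self, features, category):
        idx = torch.tensor(self.category_to_labels[category])
        # evaluate only the weight rows for this category's labels
        w = self.fc.weight[idx]
        b = self.fc.bias[idx]
        return features @ w.t() + b, idx

# toy usage with a hypothetical category-to-label map
cat2lab = {"dresses": [0, 1, 2], "shoes": [3, 4, 5, 6]}
head = CategoryGatedHead(in_features=512, num_labels=7, category_to_labels=cat2lab)
feats = torch.randn(2, 512)
scores, active = head(feats, category="shoes")  # scores only for shoe labels
```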
