Data Analysis - Clustering

Project # 116026

7 bids | Budget: up to 2,500 ILS | Bid range: 90-300 ILS/hour | Average bid: 162 ILS/hour

Posted: 14:35, 24 Jun., 2017

Ends: 16:00, 3 Aug., 2017

I'm interested in clustering movies.
I would like the clusters to indicate which movies “go together” in people’s preferences.

To do this, I want to implement the PIVOT algorithm:
https://en.wikipedia.org/wiki/Correlation_clustering
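The core of PIVOT (also called KwikCluster) can be sketched as follows. This is a minimal Python sketch, not a prescribed implementation: the function name `pivot` and the input format are assumptions. It expects the + marks to be precomputed as a symmetric dict mapping each movie to its set of + neighbors, and it treats every other pair as −. Taking the first still-unclustered movie in a random permutation is equivalent to choosing a uniformly random pivot at each step.

```python
import random
from typing import Dict, Hashable, List, Set


def pivot(nodes: List[Hashable], positive: Dict[Hashable, Set[Hashable]],
          seed: int = 0) -> List[Set[Hashable]]:
    """Cluster `nodes` with PIVOT; positive[v] holds v's + neighbors."""
    rng = random.Random(seed)
    order = list(nodes)
    rng.shuffle(order)          # random order in which pivots are tried
    remaining = set(order)      # movies not yet assigned to a cluster
    clusters: List[Set[Hashable]] = []
    for v in order:
        if v not in remaining:
            continue            # already absorbed by an earlier pivot
        # The pivot takes every still-unclustered + neighbor with it.
        cluster = {v} | (positive.get(v, set()) & remaining)
        remaining -= cluster
        clusters.append(cluster)
    return clusters
```

When the + edges form disjoint cliques, PIVOT recovers exactly those cliques whatever the random order; otherwise the result depends on the seed, so comparing several runs is worthwhile.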

This involves reading the movie dataset, deciding for each pair of movies whether it should be marked + (the movies go together) or − (they do not), and running the PIVOT algorithm with these marks as input.
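Marking the pairs could look like the sketch below. The actual + / − rule is defined in the attached file, which is not reproduced here, so the rule in this code is a hypothetical placeholder: a pair is marked + when the two movies are watched together more often than independence would predict, P(a and b) > P(a)·P(b). The names `mark_pairs` and `viewers` (movie ID mapped to the set of user IDs who watched it) are illustrative.

```python
from itertools import combinations
from typing import Dict, Hashable, Set


def mark_pairs(viewers: Dict[Hashable, Set[int]],
               n_users: int) -> Dict[Hashable, Set[Hashable]]:
    """Return each movie's set of + partners; all other pairs are -.

    HYPOTHETICAL rule (the real one is in the attached file):
    + when P(a and b) > P(a) * P(b).
    """
    positive: Dict[Hashable, Set[Hashable]] = {m: set() for m in viewers}
    for a, b in combinations(viewers, 2):
        both = len(viewers[a] & viewers[b])
        # Both sides multiplied by n_users**2, so the test uses exact integers.
        if both * n_users > len(viewers[a]) * len(viewers[b]):
            positive[a].add(b)
            positive[b].add(a)
    return positive
```

This loop visits every pair of movies, so its cost grows quadratically with the number of movies; for a few thousand movies that is several million set intersections.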

Next, I want to build an algorithm that improves on the results of the PIVOT algorithm by using the probabilities associated with each pair of movies (the probability that a random person watches one or both of them).
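The pair probabilities mentioned above can be estimated as empirical frequencies over all users. This sketch assumes that "watched" means "has a rating in the dataset" and that every user is equally likely to be the random person; the function name `pair_probabilities` is hypothetical. The probability of watching one or both follows from inclusion-exclusion.

```python
from typing import Dict, Set


def pair_probabilities(viewers_a: Set[int], viewers_b: Set[int],
                       n_users: int) -> Dict[str, float]:
    """Empirical probabilities for one pair of movies a and b."""
    p_a = len(viewers_a) / n_users
    p_b = len(viewers_b) / n_users
    p_both = len(viewers_a & viewers_b) / n_users
    p_either = p_a + p_b - p_both       # one or both (inclusion-exclusion)
    return {
        "a": p_a,
        "b": p_b,
        "both": p_both,
        "either": p_either,
        "exactly_one": p_either - p_both,
    }
```

For example, with 4 users where movie a was watched by users {1, 2} and movie b by {2, 3}, the probability of watching both is 0.25 and of watching one or both is 0.75.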
The improvement can be an entirely different algorithm, a step that takes the output of the PIVOT algorithm and improves it, a change to its input, or anything else you think might help.
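One possible improvement of the "take the PIVOT output and improve it" kind is a greedy local search. This is only an illustration, not the method the project prescribes. It assumes a hypothetical per-pair similarity s(a, b) in [0, 1] derived from the pair probabilities (for instance P(both) / P(one or both)) and uses a common weighted form of correlation clustering: a pair placed in the same cluster costs 1 − s(a, b), and a pair split apart costs s(a, b). All function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Hashable, List, Set

Sim = Callable[[Hashable, Hashable], float]


def disagreement_cost(clusters: List[Set[Hashable]], s: Sim) -> float:
    """Same-cluster pairs pay 1 - s(a, b); split pairs pay s(a, b)."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    total = 0.0
    for a, b in combinations(label, 2):
        total += (1 - s(a, b)) if label[a] == label[b] else s(a, b)
    return total


def _placement_score(v: Hashable, members: Set[Hashable], s: Sim) -> float:
    # v's share of the cost when it sits with `members`, up to a constant
    # that does not depend on the choice: |others| - 2 * sum of s(u, v).
    others = members - {v}
    return len(others) - 2 * sum(s(u, v) for u in others)


def local_search(clusters: List[Set[Hashable]], s: Sim,
                 max_rounds: int = 20, eps: float = 1e-12) -> List[Set[Hashable]]:
    """Move single movies between clusters while the total cost drops."""
    clusters = [set(c) for c in clusters if c]  # work on a copy
    for _ in range(max_rounds):
        improved = False
        for v in [v for c in clusters for v in c]:
            home = next(c for c in clusters if v in c)
            best, best_score = home, _placement_score(v, home, s)
            for c in clusters:
                if c is not home:
                    score = _placement_score(v, c, s)
                    if score < best_score - eps:
                        best, best_score = c, score
            if best_score > eps:
                best = None  # a new singleton cluster scores 0, which is lower
            if best is home:
                continue
            home.discard(v)
            if best is None:
                clusters.append({v})
            else:
                best.add(v)
            improved = True
        clusters = [c for c in clusters if c]  # drop clusters emptied by moves
        if not improved:
            break
    return clusters
```

Every accepted move strictly lowers the cost, so the search always ends in a local optimum. Its input is any clustering given as a list of sets, such as the output of PIVOT, and comparing `disagreement_cost` before and after shows how much the refinement helped.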

The dataset is MovieLens 1M, which can be downloaded from https://grouplens.org/datasets/movielens/1m/.
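In the MovieLens 1M archive, ratings.dat stores one rating per line in the form UserID::MovieID::Rating::Timestamp. The sketch below reads those lines into a map from each movie to the set of users who rated it. It assumes that rating a movie counts as watching it; the attached file may define this differently. The function name `load_viewers` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def load_viewers(lines: Iterable[str]) -> Tuple[Dict[int, Set[int]], int]:
    """Return (movie -> set of user IDs who rated it, number of users)."""
    viewers: Dict[int, Set[int]] = defaultdict(set)
    users: Set[int] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, movie, _rating, _timestamp = line.split("::")
        viewers[int(movie)].add(int(user))
        users.add(int(user))
    return dict(viewers), len(users)
```

Because it accepts any iterable of lines, it can be given an open file handle for ratings.dat or, for testing, a short list of sample lines.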

All the details are explained in the attached file.