Clustering problems #25
Comments
@silvia1993: I have a similar question, indeed. I am looking for a better architecture to use with the DCC losses, because all the datasets (MNIST, YTF, Coil100, and YaleB) are toy datasets, and the current fully connected or convolutional architectures will not be enough for 227x227 RGB images. @shahsohil: Do you have any recommendations for ImageNet-like images? Did you experiment on them?
@sumeromer, @silvia1993 I had another problem, somewhat similar to yours: I always got one very dominant cluster containing most of the data, plus many singleton clusters or clusters with only a few examples. Did that happen to you?
It also happened to me. It's kind of like overfitting: all data points end up clustered into a single group. I don't know if that makes sense, since overfitting is mostly a term for supervised algorithms.
Hello,
thank you very much for sharing your project!
I'm trying to apply this algorithm to a set of RGB images (cartoons): 2344 samples of dimension [227,227,3] spread over 7 classes. The algorithm is not able to cluster the images correctly; in the end I get ~0.2 ACC with 1220 clusters. I carefully read all the issues resolved in this repository but could not solve my problem, so I list each step I did below to get feedback about a possible mistake:
1. I built my dataset with "make_data.py" using normalization to [-1,1]. At the end I have testdata.mat and traindata.mat. Each row in these matrices is the concatenation of the three channels, i.e. [R,G,B] -> [51529,51529,51529] (51529 = 227x227). Taking testdata.mat and traindata.mat together, I have a 2344x154587 matrix.
2. Next I ran "pretraining.py" with --batch_size=256, --niter=1831 (to get 200 epochs, as suggested), --step=733 (to get 80 epochs, as suggested), --lr=0.01 (since my samples are higher-dimensional than the other datasets used with this framework, I thought this could be a good choice), and --dim=10 (see the sanity-check sketch after this list).
3. With the checkpoint_4.pth.tar file obtained in step 2, I extracted the features of the dataset, obtaining "pretrained.pkl".
4. I constructed the graph on the original data using "edge_construction.py" with --algo knn, --k 10, --samples 2344, and got the "pretrained.mat" file.
5. After that I ran "copyGraph.py" to obtain the final "pretrained.mat" file.
6. Finally I ran "DCC.py", leaving all the default values.
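For reference, here is a minimal sanity-check sketch of the data layout in step 1 and the --niter/--step arithmetic in step 2. The x/127.5 - 1 scaling is only an assumed way to map 8-bit inputs to [-1,1]; the exact normalization in make_data.py may differ.

```python
# Sanity-check sketch, not the repo's code.
import numpy as np

# Step 1: one 227x227 RGB image flattened to a single row [R..., G..., B...]
img = np.random.randint(0, 256, size=(227, 227, 3), dtype=np.uint8)
row = np.concatenate([img[:, :, c].ravel() for c in range(3)])
row = row.astype(np.float32) / 127.5 - 1.0     # assumed scaling to [-1, 1]
assert row.shape == (3 * 227 * 227,)           # 154587 values per sample

# Step 2: 200 and 80 epochs expressed as SGD iterations for 2344 samples
n_samples, batch_size = 2344, 256
iters_per_epoch = n_samples / batch_size       # ~9.16
print(200 * iters_per_epoch)                   # 1831.25 -> --niter=1831
print(80 * iters_per_epoch)                    # 732.5   -> --step=733
```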
I also tried a higher k (k=20) and mknn instead of knn, but things do not seem to change.
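For what it's worth, here is a minimal sketch of the difference between a kNN graph and a mutual-kNN (mknn) graph on the extracted features. It is only an illustration using sklearn, not the procedure in edge_construction.py; the random feature matrix stands in for the real 2344x10 features.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_edges(features, k=10, mutual=False):
    # k+1 neighbours because each point is returned as its own nearest neighbour
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(features)
    _, idx = nbrs.kneighbors(features)             # idx has shape (N, k+1)
    neighbor_sets = [set(row[1:]) for row in idx]  # drop the self-edge

    edges = set()
    for i, neigh in enumerate(neighbor_sets):
        for j in neigh:
            # plain kNN keeps every neighbour pair;
            # mknn keeps an edge only if i and j select each other
            if not mutual or i in neighbor_sets[j]:
                edges.add((min(i, j), max(i, j)))
    return edges

# Toy stand-in for the 2344x10 feature matrix from pretrained.pkl
X = np.random.rand(2344, 10).astype(np.float32)
print(len(build_edges(X, k=10, mutual=False)))   # kNN graph
print(len(build_edges(X, k=10, mutual=True)))    # mknn graph is sparser
```

The mknn graph is sparser and less likely to connect different classes through hub points, which is why it can behave quite differently from plain kNN on features that are not yet well separated.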
Do you have any idea why the algorithm does not work properly with my data?