DBSCAN do not free memory #18
Comments
It is possible that I didn't handle the reference counts in the Python wrapper correctly. Can you share a small example that I can use to reproduce the issue?
This seems to reproduce it:
import dbscan
import numpy as np
import sys
f = np.random.rand(5000000, 5)
for i in range(100):
    q = dbscan.DBSCAN(f)
sys.getrefcount(q)
sys.getrefcount(q[0])
Does the following build fix the issue for you?
pip install git+https://github.com/anivegesana/dbscan-python@memleak |
Good day, I'm sorry for the late response. I ran your code and I see the same memory leak. The outputs of sys.getrefcount(q) and sys.getrefcount(q[0]) are 2 and 3. I also tried pip install git+https://github.com/anivegesana/dbscan-python@memleak and still see the same memory leak. |
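A note for anyone reading those numbers: sys.getrefcount generally reports a higher value than you might expect, because the call itself temporarily holds a reference to its argument, so absolute values are easy to misread. The following is a minimal sketch of comparing counts relatively instead. `Payload` is a hypothetical stand-in for the arrays returned by dbscan.DBSCAN, not part of the library.

```python
import sys


class Payload:
    """Hypothetical stand-in for an array returned by a C extension."""


obj = Payload()
baseline = sys.getrefcount(obj)

holder = [obj]                       # one extra strong reference
with_holder = sys.getrefcount(obj)

del holder                           # release that extra reference
after_release = sys.getrefcount(obj)

# Differences are more informative than the absolute values.
print(with_holder - baseline, after_release - baseline)
```

If a wrapper returns an object whose count stays one higher than an equivalent object created in pure Python, that is a hint (not proof) that a reference was taken and never released.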
Hey @ShJacub, I think I need some help reproducing the memory leak on my new branch. This is the code snippet that I am currently running, along with its output.
>>> import gc
>>> import sys
>>> import numpy as np
>>> import dbscan
>>> dbscan.__version__
'0.0.12.dev1+gc993316.d20230420'
>>> x = np.random.rand(5000000, 5)
>>> r = dbscan.DBSCAN(x)
>>> sys.getrefcount(r)
2
>>> sys.getrefcount(r[0])
2
>>> sys.getrefcount(r[1])
2
>>> import weakref
>>> g = weakref.ref(r[0])
>>> del r
>>> g()
array([0, 0, 0, ..., 0, 0, 0], dtype=int32)
>>> gc.collect()
480
>>> g()
Thank you for pointing out the memory leak. I need to be a little bit more careful when reading Python C API documentation since information about borrowed and owned references isn't always in an obvious place. 😅 |
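The weakref check in the session above can be wrapped in a small helper. This is a sketch under stated assumptions: `Payload`, `is_freed_after_release`, and `leaky_factory` are hypothetical names for illustration, and the behaviour described relies on CPython's reference counting. A weak reference does not keep its target alive, so if it still resolves after the last known strong reference is dropped and gc.collect() has run, something else is holding the object.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a result array; supports weak references."""


def is_freed_after_release(factory):
    """Create an object, drop our only reference, and report whether it died."""
    obj = factory()
    probe = weakref.ref(obj)   # does not keep obj alive
    del obj                    # drop the only strong reference we hold
    gc.collect()               # also break any reference cycles
    return probe() is None


_leaked = []


def leaky_factory():
    """Simulates a wrapper that keeps an extra reference it never releases."""
    obj = Payload()
    _leaked.append(obj)
    return obj


print(is_freed_after_release(Payload))
print(is_freed_after_release(leaky_factory))
```

The same helper could be pointed at a lambda that returns `dbscan.DBSCAN(x)[0]`, since NumPy arrays support weak references.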
Is it possible to solve this problem? |
Yes, it is. Just need some more information. Can you run the code that I shared in the previous comment and share the output? Also, can you share the OS, Python version, and NumPy version that you are using? |
Ubuntu 20.04.6 LTS. I ran the code above. These are the outputs:
|
Sorry about the delay. Was a little bit busy at work for a couple of days and didn't have a chance to take a look at this. It seems like the version of the dbscan-python library doesn't match up. You have the production version (0.0.12) and I have the version with the fix (0.0.12.dev1+gc993316.d20230420). Perhaps it failed to compile on your machine? I will build a wheel tonight for you to try out. For some reason, the version name on the wheel is messed up, but it should mostly work fine.
curl 'https://drive.google.com/uc?export=download&id=1Wrglr9Xo9dyDiD9ngPLcPjI9sFCiYLIJ' -o "dbscan-0.1.dev90+g6a8f3e3-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
pip install "dbscan-0.1.dev90+g6a8f3e3-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl" --force-reinstall --no-deps |
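To rule out a version mismatch like the one described here, it may help to ask the packaging metadata which build is actually installed, independently of `dbscan.__version__`. A minimal sketch, assuming Python 3.8 or newer and that the distribution is named `dbscan` (as the wheel filename suggests); `installed_version` is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed version, or None if the package is not installed.
print(installed_version("dbscan"))
```

If this prints the released version rather than a dev version after installing from the branch, the source build most likely did not replace the existing install.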
Hello, I have encountered a similar memory problem, but in C++. When I call the DBSCAN function inside a for loop, memory usage keeps increasing until the program crashes.
|
I ran into a similar problem in some code I wrote a while back (not in DBSCAN) and here was its solution: seung-lab/mapbuffer@3746bd8 I suspect the issue has something to do with not releasing Xobj or X. You may want to call: |
Oh yes. You are absolutely right! |
Any updates on this? Memory usage keeps growing in Python when it is run continuously. |
Update: This is what worked for me. Since I only needed the labels and not the core mask, I changed the return statement, and I haven't seen any memory leak since then for my use case. Instead of returning: I am returning: After making these changes, I ran pip install -e . in the repo. |
Hey Yuki
Thanks a lot for this. Is it possible to publish a new pip package with the
fixes? That would be very useful.
Gunjan
…On Wed, 19 Jun 2024 at 10:36, yuki-inaho ***@***.***> wrote:
JFYI, I have addressed the memory leak issue by implementing the following
changes. You can see the commits that were made to resolve the issue in the
forked repository:
Commit 1
<yuki-inaho@1bb13f3>
Commit 2
<yuki-inaho@748040c>
We confirmed the improvement by running tests outlined in the following
notebook:
Memory Leak Check Notebook
<https://github.com/yuki-inaho/dbscan_comparison/blob/main/check_memory_leaks.ipynb>
|
Thanks for your fast DBSCAN implementation. I have a problem: each call to dbscan.DBSCAN(x) consumes additional memory that is not freed. If I call dbscan.DBSCAN(x) n times, it consumes n*V memory, where V is the memory used by a single dbscan.DBSCAN(x) call.
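The n*V pattern described above can be checked with a small harness. This is a sketch, not a definitive measurement: `traced_growth`, `leaky`, and `clean` are hypothetical names, and tracemalloc only sees allocations made through Python's allocators (which includes NumPy array buffers), so memory taken with plain malloc or new inside a C++ extension would need a process-level measure such as resident set size instead.

```python
import tracemalloc


def traced_growth(fn, repeats):
    """Return bytes still allocated after each call to fn, relative to the start."""
    tracemalloc.start()
    start, _ = tracemalloc.get_traced_memory()
    growth = []
    for _ in range(repeats):
        fn()
        current, _ = tracemalloc.get_traced_memory()
        growth.append(current - start)
    tracemalloc.stop()
    return growth


_kept = []


def leaky():
    """Simulates a call whose result is never released (n*V growth)."""
    _kept.append(bytearray(1_000_000))


def clean():
    """Allocates the same amount, but it is freed when the call returns."""
    bytearray(1_000_000)


print(traced_growth(leaky, 5))   # grows by roughly 1 MB per call
print(traced_growth(clean, 5))   # stays roughly flat
```

Replacing `leaky` with a lambda that calls `dbscan.DBSCAN(x)` and discards the result would show whether traced memory keeps climbing across calls.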