Having difficulties to run the Docker Service Mode #8
Comments
Hi @Gi7w0rm

Thanks for reaching out to us. I verified the issue and you are absolutely correct: there is indeed a problem with the service mode. The issue appears to be connected to the Redpanda installation.

Here is the reference to the code where the URLs are consumed:
Here is the reference to the code where the consumer is configured:

As it turns out, the timeout of the Redpanda consumer no longer works as expected. After trying various Redpanda versions, we decided to replace the Redpanda installation with a Kafka & Zookeeper installation. The service mode now runs Kafka & Zookeeper in a Docker container, similar to the previous architecture.

Note that the input modules are configured to download new URLs every 5 minutes. Therefore the input module will wait 5 minutes after start and then request the URLs for the first time. This was implemented to avoid overloading the APIs used. Referenced in the code:

The scanned URLs are also stored in batches. You can configure the batch size in the config.yml file. Due to these implementations it can take up to 10 minutes until you receive results in the dashboard. If you want to verify that the crawler is running before URLs can be seen, you can check the logfiles under

Let us know if that works for you.

Regarding the errors you noticed at startup: there seems to be a problem with the ClamAV installation. I would suggest rebuilding the installation, which must be done anyway due to the Kafka changes, and watching out for exceptions during the ClamAV installation. The Dockerfile should install all needed ClamAV packages and run freshclam as well.

Also let me know if this works, or if you can share logs which show the installation process of the Docker image.

Best regards
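The batching behaviour described above (scanned URLs accumulating until a full batch is written out, which is why the dashboard can lag by several minutes) can be sketched as follows. This is a minimal illustration, not the actual SubCrawl code: the names `UrlBatcher`, `store`, and `batch_size` are hypothetical, with `batch_size` standing in for the value read from config.yml.

```python
# Minimal sketch of batched URL storage (hypothetical, not the actual
# SubCrawl implementation). Scanned URL results are buffered in memory
# and only persisted once a full batch has accumulated, which is why
# results can take a while to appear in the dashboard.

class UrlBatcher:
    def __init__(self, store, batch_size=100):
        # 'store' is any callable that persists a list of results,
        # e.g. a database writer. 'batch_size' would come from config.yml.
        self.store = store
        self.batch_size = batch_size
        self.buffer = []

    def add(self, url_result):
        self.buffer.append(url_result)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Persist whatever is buffered, even a partial batch
        # (useful on shutdown so nothing is lost).
        if self.buffer:
            self.store(list(self.buffer))
            self.buffer.clear()


if __name__ == "__main__":
    stored_batches = []
    batcher = UrlBatcher(stored_batches.append, batch_size=3)
    for url in ["http://a", "http://b", "http://c", "http://d"]:
        batcher.add(url)
    # Only the first full batch of 3 was stored; "http://d" is still buffered.
    print(len(stored_batches), len(batcher.buffer))  # 1 1
```

With a batch size of, say, 100 and URLs arriving slowly, a run can easily take minutes before the first batch is flushed, which matches the "up to 10 minutes" behaviour described above.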
Hey Patrick,

On this note I would like to advise you that the Dockerfile is missing the commands:

I am not sure if the way Docker works makes these obsolete, but on a normal Ubuntu system the update command errors out with

After deleting the line:

However, trying to run SubCrawl service mode puts some errors into the logs at
See the output of all 3 error logs here:

I have also uploaded a log which shows the whole stdout output of my SubCrawl service mode from start to the first registered hosts to crawl:

Another issue I see is that the service mode jumps from 0 to
in a second. As the dashboard does not seem to update with every scanned URL but only after a successful run, and the logs do not record any progress in the default mode, I think it would be nice to have some kind of progress logging in the default

If you need any further info, feel free to ask; I will provide what I can to help.

Best regards, and wishing you a nice start of the week
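The kind of progress logging suggested here could look roughly like the sketch below. This is a hypothetical illustration, not an existing SubCrawl feature: `crawl_with_progress`, `scan`, and the `log_every` parameter are all made-up names for the example.

```python
import logging

logger = logging.getLogger("subcrawl.progress")

def crawl_with_progress(urls, scan, log_every=50):
    """Scan each URL and emit a progress line every 'log_every' URLs.

    'scan' is a placeholder for whatever per-URL work the crawler does;
    'log_every' is a hypothetical knob, not an existing SubCrawl option.
    """
    total = len(urls)
    results = []
    for i, url in enumerate(urls, start=1):
        results.append(scan(url))
        # Log periodically, and always on the final URL, so long runs
        # show visible progress instead of jumping straight to done.
        if i % log_every == 0 or i == total:
            logger.info("processed %d/%d URLs", i, total)
    return results
```

Emitting a line every N URLs (rather than per URL) keeps the logfile from blowing up while still showing that the crawler is alive between dashboard updates.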
Hi Chris,

I'll check the suggestion regarding ClamAV. You still seem to get the ClamAV errors during the build process. Maybe you could add those commands to the Dockerfile and let me know if you still receive the ClamAV errors during the build. I'm not sure why I don't get such errors; I'll try to reproduce them on my system as well.

The errors in

The jump from 0 to

By default you get the information about which modules are loaded, how many URLs are being scanned, and when the data is stored in the database. As this service mode should be running in the background, and you most likely shouldn't care about the log output but only about the dashboard and the web application, I don't want to blow up the logfile. To see more information you can change the loglevel in the configuration file as well.

Best regards
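Wiring the loglevel up to the configuration file could look roughly like this. It is only a sketch: the config is shown as a plain dict standing in for the parsed config.yml, and the `loglevel` key name is an assumption for illustration.

```python
import logging

def configure_logging(config):
    """Set the root logger level from a parsed configuration mapping.

    'config' stands in for the parsed config.yml; the 'loglevel' key
    name is an assumption made for this example. Unknown or missing
    values fall back to INFO.
    """
    level_name = str(config.get("loglevel", "INFO")).upper()
    level = getattr(logging, level_name, logging.INFO)
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    return level
```

With this pattern, switching `loglevel` from `INFO` to `DEBUG` in the configuration surfaces the more verbose per-URL output without code changes.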
Hey @stoerchl

To make it short: systemd and all related commands are not enabled inside Docker environments. As such, adding these commands will make the Docker build fail. I have to admit I am rather surprised myself as to why you are not seeing these issues while I have them appearing every time, especially as I am running something very close to a vanilla Ubuntu here.

As to the errors in

Thanks for pointing out the batch sizes and the dashboard again :)

Best regards
Hey @jstrosch,
It's me again.
As I told you some days ago, I am having trouble getting the Service Mode of subcrawl to run.
I am still using the following system:
which in turn is run in VMware Workstation 16 Pro
As suggested in the README.md, I am trying the following to get it running:
Once starting up, I can actually access the Dashboard at
127.0.0.1:8000 as expected.
However, no matter how long I wait, no domains get scanned and nothing changes in the dashboard at all.
I am seeing following output in my terminal:
To me it kind of looks like I am indeed receiving and logging data.
However, it's not getting where it should be?
(Keep in mind, we had to change the config.py to scan other domains; could this issue also apply here?)
Also, I am observing some errors at startup:
To be honest, this doesn't look good, but it does not seem to stop the docker-compose up --build command.
I hope we can find a solution here, as I would love to test out an import module I wrote for SubCrawl.
Best regards
Chris