Wv scraper #496
base: main
Conversation
@Ash1R, I am still seeing MI being deleted from the repo, I think. Do you see this same thing on the Files tab?
```python
companydone = False
row = []
for k in range(len(data)):
    if data[k][0] is not None:
```
Why is it necessary to use `range` in the loop here? Can you not simply do something more like `for row in data`?
Each company's data is contained in two consecutive rows, with some blank rows in between these company row-pairs. Alternative company names and addresses are stored on the second row. I used `range` so I can access the second row with an index of `k + 1`. I did use `range` unnecessarily later on, so I removed that.
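A minimal sketch of the row-pairing pattern described here (the column layout and helper name are assumptions for illustration, not the PR's actual code):

```python
def parse_company_pairs(data):
    """Collect (primary_row, detail_row) pairs from the extracted table,
    skipping the blank separator rows between company row-pairs."""
    pairs = []
    k = 0
    while k < len(data):
        if data[k][0] is not None:
            primary = data[k]  # company name and related fields
            # alternative names/addresses live on the following row
            detail = data[k + 1] if k + 1 < len(data) else None
            pairs.append((primary, detail))
            k += 2  # a company occupies two consecutive rows
        else:
            k += 1  # blank separator row
    return pairs

rows = [
    ["Acme Corp", "120"],
    ["aka Acme Co", "123 Main St"],
    [None, None],
    ["Beta LLC", "45"],
    ["alt Beta", "9 Oak Ave"],
]
pairs = parse_company_pairs(rows)
# → [(['Acme Corp', '120'], ['aka Acme Co', '123 Main St']),
#    (['Beta LLC', '45'], ['alt Beta', '9 Oak Ave'])]
```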
Triggering tests by closing and reopening.
mypy is flagging some type errors:
@Ash1R, I think I see maybe an easy way to work around the mypy type conflict and also make this a bit more readable, something like:
Then start folding those changes into the flagged rows.
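The reviewer's suggested code isn't preserved in this thread. As a hedged illustration only (not the reviewer's actual suggestion), one common way to resolve a mypy conflict over `Optional` table cells, while making the row handling more readable, is to narrow `None` values in a single helper:

```python
from typing import Optional

def clean_cell(value: Optional[str]) -> str:
    """Narrow an Optional cell to str so mypy stops flagging downstream uses."""
    return value.strip() if value is not None else ""

# Hypothetical example row; real rows come from the extracted PDF tables.
raw_row = ["Acme Corp ", None, " Charleston, WV"]
row = [clean_cell(cell) for cell in raw_row]
# → ['Acme Corp', '', 'Charleston, WV']
```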
The landing page has perhaps been killed off. This is the closest I could find, and I can't guarantee it'll be updated the same way from notice to notice: https://workforcewv.org/about-us/ I have not tried checking whether this scraper works with that PDF.
This is for issue #375, for West Virginia.
It was a large PDF on their workforce site; pdfplumber extracted the tables pretty well, although there were some irregularities in the PDF (for example, some tables had boxes listing the specific sites of the layoffs).
There are a couple of errors on their end (switched-up values), but nothing too significant.