
Wv scraper #496

Open. Wants to merge 11 commits into main.
Conversation

Ash1R (Contributor) commented Oct 28, 2022

This is for issue #375, for West Virginia.
The notices were in a large PDF on their workforce site. pdfplumber extracted the tables pretty well, although there were some irregularities in the PDF (for example, some tables had boxes that gave the specific sites of the layoffs).
There are a couple of errors on their end (switched-up values), but nothing too significant.
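For reference, the core of that extraction step looks roughly like this (a minimal sketch, assuming a placeholder local path; the real scraper also downloads and caches the PDF first):

import pdfplumber

# Hypothetical local path; the actual scraper fetches the PDF from the
# WorkForce West Virginia site and caches it before parsing.
pdf_path = "wv_warn.pdf"

rows = []
with pdfplumber.open(pdf_path) as pdf:
    for page in pdf.pages:
        # extract_tables() returns each table as a list of rows,
        # where every cell is a str or None.
        for table in page.extract_tables():
            rows.extend(table)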

warn/scrapers/mi.py: four outdated review comments (resolved)
palewire (Contributor) commented Dec 5, 2022

@Ash1R, I am still seeing MI being deleted from the repo, I think. Do you see the same thing on the Files tab?

https://github.com/biglocalnews/warn-scraper/pull/496/files
[Screenshot from 2022-12-05 10-39-36]

warn/scrapers/wv.py: outdated review comment (resolved)
companydone = False
row = []
for k in range(len(data)):
    if data[k][0] is not None:
Contributor: Why is it necessary to use range in the loop here? Can you not simply do something more like for row in data?

Ash1R (Contributor, Author): Each company's data is contained in two consecutive rows, with some blank rows between these company row-pairs. Alternative company names and addresses are stored on the second row. I used range so I can access the second row with an index of k + 1. I had also used range unnecessarily later on, so I removed that.
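A sketch of that row-pairing idea, assuming illustrative column positions and field names (not the scraper's actual schema):

# data: rows extracted by pdfplumber; each cell is a str or None.
# Each company spans two consecutive rows, with blank separator rows
# between the pairs.
records = []
k = 0
while k < len(data) - 1:
    if data[k][0] is None:
        k += 1  # skip a blank separator row
        continue
    first, second = data[k], data[k + 1]
    records.append(
        {
            "company": first[0],
            "alternate_name_or_address": second[0],  # hypothetical field
        }
    )
    k += 2  # consume the whole row-pair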

stucka (Contributor) commented Aug 21, 2023

Triggering tests by closing and reopening.

stucka closed this Aug 21, 2023
stucka reopened this Aug 21, 2023
stucka (Contributor) commented Aug 21, 2023

mypy is flagging some type errors:
warn/scrapers/wv.py:65: error: Item "None" of "Optional[str]" has no attribute "strip" [union-attr]
warn/scrapers/wv.py:66: error: Item "None" of "Optional[str]" has no attribute "strip" [union-attr]
warn/scrapers/wv.py:68: error: Item "None" of "Optional[str]" has no attribute "strip" [union-attr]
warn/scrapers/wv.py:72: error: Item "None" of "Optional[str]" has no attribute "strip" [union-attr]
warn/scrapers/wv.py:74: error: Item "None" of "Optional[str]" has no attribute "strip" [union-attr]
warn/scrapers/wv.py:75: error: Item "None" of "Optional[str]" has no attribute "strip" [union-attr]
warn/scrapers/la.py:170: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs [annotation-unchecked]

stucka (Contributor) commented Aug 21, 2023

@Ash1R, I think I see maybe an easy way to work around the mypy type conflict and also make this a bit more readable, something like:

if not data[k][0]:
    rowkey = None
else:
    rowkey = data[k][0].strip()

Then start folding those changes into the flagged rows, like if rowkey in header_whitelist:, and then keep working on down. The last bit might be more readable as elif ((not rowkey) and (k != 0)) ... ?
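Pulled together, that suggestion might look something like this (a sketch only; the whitelist contents and column layout are hypothetical, and header_whitelist and the k != 0 guard come from the comments above):

header_whitelist = {"Company", "Notice Date"}  # hypothetical header values

for k, row in enumerate(data):
    # mypy-safe: only call .strip() when the cell is a non-empty str.
    rowkey = row[0].strip() if row[0] else None
    if rowkey in header_whitelist:
        continue  # repeated page header
    elif (not rowkey) and (k != 0):
        continue  # blank separator row
    # ... process the data row here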

stucka (Contributor) commented Sep 22, 2023

The landing page has perhaps been killed off. This is the closest I could find, and I can't guarantee it would be updated in the same way, notice after notice: https://workforcewv.org/about-us/

I have not tried seeing if this scraper works with that PDF.
