Option to send either raw HTML or body.innerText (default) #56

Open
eklem opened this issue Aug 22, 2019 · 4 comments
Comments

@eklem
Owner

eklem commented Aug 22, 2019

Up for debate. Today only the text from body is grabbed. A better extraction process for this could live in the back end (which can run in the browser too, but with more room for code than a bookmarklet).

Then it can be up to the receiving end to do extraction of text. Daq-proc could have a cheerio processor included.

@eklem eklem self-assigned this Aug 22, 2019
@eklem eklem changed the title from "Send body HTML instead of body.innerText" to "Send body HTML instead of body.innerText?" Aug 22, 2019
@eklem
Owner Author

eklem commented Aug 23, 2019

Make it a switch so the user can choose. Not sure how to do this without writing code for every endpoint service.

@eklem
Owner Author

eklem commented Aug 25, 2019

To avoid breaking everything, make body.innerText the default, so re-added bookmarklets behave the same.

@eklem eklem changed the title from "Send body HTML instead of body.innerText?" to "Option to send either full HTML or body.innerText" Aug 25, 2019
@eklem eklem changed the title from "Option to send either full HTML or body.innerText" to "Option to send either raw HTML or body.innerText (default)" Aug 25, 2019
@eklem
Owner Author

eklem commented Sep 18, 2019

This is what I'll try:

When reading
Check if the rawHTML key exists. If it does, check its value: true means get the raw HTML, false means get body.innerText. If it doesn't exist or is not set, get body.innerText. This way, the IndexedDB data of old bookmarklets will work with the new code.

When writing / creating
rawHTML defaults to true. This will be the most common case for a search engine: document processing happens when the search engine reads the data from where it's stored. At that point it's easier to create elaborate document processors than in the bookmarklet's code.
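
A minimal sketch of the read and write rules above, not the actual bookmarklet code. Only the `rawHTML` key comes from this thread. The function names (`shouldSendRawHTML`, `extractContent`, `createSettings`) are hypothetical, and using `documentElement.outerHTML` as the "raw HTML" is an assumption. The document is passed in as a parameter so the logic can be tried outside a browser with a stand-in object.

```javascript
// Reading: decide what to send based on the stored settings.
// A missing or unset rawHTML key falls back to body.innerText,
// so settings saved by older bookmarklets keep their behaviour.
function shouldSendRawHTML (settings) {
  return Boolean(settings) && settings.rawHTML === true
}

// `doc` is the page's `document` in a bookmarklet.
// Assumption: "raw HTML" means the full serialized document.
function extractContent (doc, settings) {
  return shouldSendRawHTML(settings)
    ? doc.documentElement.outerHTML
    : doc.body.innerText
}

// Writing / creating: new settings default to rawHTML: true,
// unless the caller overrides it.
function createSettings (overrides) {
  return Object.assign({ rawHTML: true }, overrides)
}

// Stand-in for `document`, only for trying the logic outside a browser.
const fakeDoc = {
  documentElement: { outerHTML: '<html><body><p>Hi</p></body></html>' },
  body: { innerText: 'Hi' }
}

const oldSettings = {} // stored before the rawHTML key existed
const newSettings = createSettings() // { rawHTML: true }

const oldResult = extractContent(fakeDoc, oldSettings) // 'Hi'
const newResult = extractContent(fakeDoc, newSettings) // the full HTML string
```

Keeping the fallback in the read path, not in a data migration, means nothing stored by existing bookmarklets has to be touched.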

@eklem
Owner Author

eklem commented Jul 31, 2020

This goes really well with daq-proc, since cheerio is now a part of it. It will allow a lot more logic in how content is extracted from a page.
