Option to send either raw HTML or body.innerText (default) #56

eklem · 2019-08-22T19:55:26Z

Up for debate. Today, only the text from body is grabbed, but it would be better to have the extraction process in the back end (the browser could do that too, but it takes more code space than a bookmarklet has room for).

Then it can be up to the receiving end to extract the text. Daq-proc could include a cheerio processor.

eklem · 2019-08-23T08:25:50Z

Make it a switch so the user can choose. Not sure how to do this without writing code for every endpoint service.

eklem · 2019-08-25T16:27:59Z

To avoid breaking existing setups, make body.innerText the default, so re-added bookmarklets behave the same.

eklem · 2019-09-18T08:18:42Z

This is what I'll try:

When reading
Check if the rawHTML key exists. If it does, check whether it is true (get raw HTML) or false (get body.innerText). If it doesn't exist or is not set, get body.innerText. This way, the IndexedDB data of old bookmarklets will keep working with the new code.
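A minimal sketch of the reading rule, assuming the stored settings come back from IndexedDB as a plain object and the code runs in the page, so `document` is available. `getPageContent` is a hypothetical name, and reading "raw HTML" as `document.documentElement.outerHTML` (the full page) is an assumption; `document.body.innerHTML` would be the body-only variant.

```javascript
// Hypothetical helper: picks what the bookmarklet sends, based on stored settings.
// Assumption: settings is the plain object read back from IndexedDB (or undefined).
function getPageContent(settings, doc) {
  // Only an explicit rawHTML === true selects HTML. A missing key, an unset
  // value or false all fall back to body.innerText, so settings saved by
  // old bookmarklets keep their current behavior.
  const useRawHTML = settings != null && settings.rawHTML === true;
  if (useRawHTML) {
    // Assumption: "raw HTML" means the whole page markup.
    return doc.documentElement.outerHTML;
  }
  return doc.body.innerText;
}

// In the bookmarklet itself this would be called as:
// const content = getPageContent(storedSettings, document);
```

Comparing with `=== true` instead of relying on truthiness also means a stray value like the string "false" can't accidentally switch an old setup over to HTML.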

When writing / creating
rawHTML defaults to true. This will be the most common case for a search engine: document processing happens when the search engine reads the data from where it's stored. At that point it's easier to create elaborate document processors than in the bookmarklet's code.
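The creating side, as a sketch: a hypothetical `createSettings` that stores `rawHTML: true` unless the user explicitly switches it off. Any other fields passed in (for example an endpoint URL) are assumptions about what the bookmarklet stores and are simply copied through.

```javascript
// Hypothetical helper: builds the settings object for a newly created
// bookmarklet before it is written to IndexedDB.
function createSettings(options = {}) {
  // Copy any other (hypothetical) settings through unchanged.
  const settings = { ...options };
  // New bookmarklets default to sending raw HTML; only an explicit
  // false switches back to body.innerText.
  settings.rawHTML = options.rawHTML !== false;
  return settings;
}
```

Because the key is always written explicitly, new settings never hit the missing-key fallback that old bookmarklets rely on.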

eklem · 2020-07-31T12:18:49Z

This goes really well with daq-proc, since cheerio is now a part of it. It will allow much more logic in how content is extracted from a page.

eklem self-assigned this Aug 22, 2019

eklem added the enhancement label Aug 22, 2019

eklem changed the title ~~Send body HTML instead of body.innerText~~ Send body HTML instead of body.innerText? Aug 22, 2019

eklem changed the title ~~Send body HTML instead of body.innerText?~~ Option to send either full HTML or body.innerText Aug 25, 2019

eklem changed the title ~~Option to send either full HTML or body.innerText~~ Option to send either raw HTML or body.innerText (default) Aug 25, 2019
