Add config file and logic #1

Open
wants to merge 1 commit into
base: main
23 changes: 22 additions & 1 deletion index.mjs
@@ -1,5 +1,6 @@
import fetch from 'node-fetch'
import { load } from 'cheerio'
import links from './links.json' assert { type: 'json' }
@styfle (Owner) commented on Dec 16, 2022:

I didn't realize this feature was already available, although it prints a warning every time: https://nodejs.org/api/esm.html#json-modules

Owner:

I think we should just call it config.mjs to avoid the warning. The user can't even configure this as it's written.
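
One way to read the suggestion above is that the entries from links.json move into a plain JavaScript module, which needs no import assertion and so does not go through the experimental JSON-modules path. A minimal sketch of what such a file could contain follows; the name `config` and the file layout are assumptions, not something the PR defines.

```javascript
// Hypothetical contents of config.mjs (not part of this PR).
// In the real module the declaration would be exported,
// e.g. `export const config = [...]`.
const config = [
  { url: 'https://www.youtube.com', body: "This video isn't available", status: 200 },
  { url: 'https://www.twitter.com', body: "this page doesn't exist", status: 200 },
  { url: 'https://vercel.com/docs', status: 302 },
];
```

With the export in place, index.mjs could use `import { config } from './config.mjs'` and loop over `config` instead of `links.config`.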


export async function check(url = new URL('http://example.com'), referer, seen = new Set(), depth = 10) {
url.hash = ''; // since hash is client-side only, we remove it in order to avoid duplicate requests
@@ -8,6 +9,26 @@ export async function check(url = new URL('http://example.com'), referer, seen =
}
seen.add(url.href);
const res = await fetch(url, { headers: { 'user-agent': 'npm:links-awakening' } }).catch(e => e);
const html = res.ok ? await res.text() : null;

// deal with config cases
Owner:

Suggested change (delete this line):
// deal with config cases

for (const link of links.config) {
  if (url.href.includes(link.url)) { // TODO: this check needs precision in url comparison (www vs no www), etc
@styfle (Owner) commented on Dec 16, 2022:

At the very least, we need startsWith. But in the future we will probably need something like this new feature, the URL Pattern API: https://developer.mozilla.org/en-US/docs/Web/API/URL_Pattern_API
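
As an illustration of the stricter comparison discussed above, here is a minimal sketch that uses only the standard `URL` class rather than the URL Pattern API. The helper names `stripWww` and `matchesConfigUrl` are hypothetical, and treating `www.` and bare hosts as equivalent is an assumption taken from the TODO in the diff.

```javascript
// Hypothetical helpers, not part of the PR.

// Treat "www.example.com" and "example.com" as the same host.
function stripWww(hostname) {
  return hostname.startsWith('www.') ? hostname.slice(4) : hostname;
}

// Returns true when `url` (a URL object) falls under `configUrl` (a string):
// same protocol, same host ignoring a leading "www.", same port, and a
// pathname that equals the configured path or sits below it.
function matchesConfigUrl(url, configUrl) {
  const target = new URL(configUrl);
  if (url.protocol !== target.protocol) return false;
  if (stripWww(url.hostname) !== stripWww(target.hostname)) return false;
  if (url.port !== target.port) return false;
  // Compare on a segment boundary so "/docs" does not match "/docs-old".
  const prefix = target.pathname.endsWith('/') ? target.pathname : target.pathname + '/';
  return url.pathname === target.pathname || url.pathname.startsWith(prefix);
}
```

Unlike `url.href.includes(link.url)`, this does not match a page on another host that merely carries the configured URL in its query string.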

    if (res.status !== link.status) {
      console.log(`🟨 ${url.href} differs from the expected value in the config (status: ${res.status}, referer: ${referer})`);
      return;
    } else if (link.status === 200) {
      if (html.includes(link.body)) {
        console.log(`🟨 ${url.href} final content is no longer available (status: ${res.status}, referer: ${referer})`);
        return;
      }
    } else {
      console.log(`🟨 ${url.href} is preset to have status: ${res.status}`);
      return;
    }
  }
}
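
The branches in the loop above can be read as a small decision function. The sketch below restates them as a pure function so the outcomes are easier to follow and test; the name `evaluateConfigEntry` and the returned labels are hypothetical, and the type guards on `html` and `link.body` are an added assumption, not behavior of the PR.

```javascript
// Hypothetical restatement of the config branch, not part of the PR.
// `link` is one config entry, `status` is the response status (or
// undefined when the fetch failed), and `html` is the body text or null.
function evaluateConfigEntry(link, status, html) {
  if (status !== link.status) {
    return 'status-mismatch'; // "differs from the expected value in the config"
  }
  if (link.status === 200) {
    const hasMarker =
      typeof link.body === 'string' &&
      typeof html === 'string' &&
      html.includes(link.body);
    // A 200 page containing the marker text is treated as gone.
    return hasMarker ? 'content-unavailable' : 'no-verdict';
  }
  return 'expected-status'; // "is preset to have status"
}
```

In the PR, the first, second, and last outcomes log a 🟨 line and return, while the `'no-verdict'` case continues the loop and eventually falls through to the regular `res.ok` handling.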

if (res.ok) {
console.log(`✅ ${url.href}`);
} else if (res.status) {
@@ -20,7 +41,7 @@ export async function check(url = new URL('http://example.com'), referer, seen =
if (depth === 0) {
return;
}
-const html = await res.text();
+
const $ = load(html);
const hrefs = $('a[href]').map((_, el) => el.attribs.href).get();
for (const href of hrefs) {
18 changes: 18 additions & 0 deletions links.json
@@ -0,0 +1,18 @@
{
  "config": [
    {
      "url": "https://www.youtube.com",
      "body": "This video isn't available",
      "status": 200
    },
    {
      "url": "https://www.twitter.com",
      "body": "this page doesn't exist",
      "status": 200
    },
    {
      "url": "https://vercel.com/docs",
      "status": 302
    }
  ]
}