elron
July 6, 2022, 8:48am
1
site: mevorach
Problem:
I'm trying to use headless Chrome to get a website’s title and return it as JSON.
The code works locally but when I deploy it, I get an error:
Jul 6, 10:50:51 AM: 11918208 INFO error [Error: ENOENT: no such file or directory, open '/var/task/netlify/bin/chromium.br'] {
errno: -2,
code: 'ENOENT',
syscall: 'open',
path: '/var/task/netlify/bin/chromium.br'
}
Function link:
https://mevorach.netlify.app/.netlify/functions/get-title
Function log:
10:50:51 AM 11918208 INFO spawning chrome headless
10:50:51 AM 11918208 INFO error [Error: ENOENT: no such file or directory, open '/var/task/netlify/bin/chromium.br'] {
errno: -2,
code: 'ENOENT',
syscall: 'open',
path: '/var/task/netlify/bin/chromium.br'
}
10:50:51 AM 11918208 Duration: 11.16 ms Memory Usage: 65 MB Init Duration: 269.48 ms
Original code taken from here
Another code that influenced me
my function code:
import chromium from 'chrome-aws-lambda';
import puppeteer from 'puppeteer-core';
exports.handler = async (event, context) => {
  let theTitle = null;
  let browser = null;
  console.log('spawning chrome headless');
  try {
    // Path of the Chromium binary shipped with chrome-aws-lambda
    const executablePath = await chromium.executablePath;
    // setup
    browser = await puppeteer.launch({
      args: chromium.args,
      executablePath: executablePath,
      headless: chromium.headless
    });
    // Do stuff with headless chrome
    const page = await browser.newPage();
    const targetUrl = 'https://neegzar.com';
    // Go to the page and wait for it to finish loading
    await page.goto(targetUrl, {
      waitUntil: ['domcontentloaded', 'networkidle0']
    });
    await page.waitForSelector('#phenomic');
    theTitle = await page.title();
    console.log('done on page', theTitle);
  } catch (error) {
    console.log('error', error);
    // An async handler can return the response directly; no callback needed.
    // Error.message is non-enumerable, so serialize it explicitly.
    return {
      statusCode: 500,
      body: JSON.stringify({
        error: error.message
      })
    };
  } finally {
    // close browser
    if (browser !== null) {
      await browser.close();
    }
  }
  return {
    statusCode: 200,
    body: JSON.stringify({
      title: theTitle
    })
  };
};
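A side note on error responses in a handler like this one: `JSON.stringify` only includes enumerable own properties, and an `Error`'s `message` and `stack` are non-enumerable, so they are dropped from the body. A minimal sketch of one way around that, where `serializeError` is a hypothetical helper name and not part of any library:

```javascript
// Hypothetical helper: copy the fields worth returning from an Error
// into a plain object that JSON.stringify can serialize.
function serializeError(error) {
  const { name, message, code } = error;
  // `code` (e.g. 'ENOENT') only exists on some errors, so add it conditionally.
  return { name, message, ...(code !== undefined && { code }) };
}

// Building a 500 response body with it:
function errorResponse(error) {
  return {
    statusCode: 500,
    body: JSON.stringify({ error: serializeError(error) })
  };
}
```

The function log still gets the full error through `console.log`, so the response body only needs enough detail to identify the failure.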
Instead of using Chrome in Lambda and making the function extensively bloated, isn’t it better to accomplish this with a simple `fetch` and regex?
Just wanting to know what exactly is the requirement before suggesting the best path. I’ve seen Chrome grow out of lambda size limits, so it won’t be long until you can no longer update your function, which is why I wonder if you can somehow skip it altogether.
elron
July 8, 2022, 2:21pm
3
Right, I didn’t explain my intention.
The intention is to screencast (video) a website and turn it into MP4, and save it to AWS S3.
The code above is just a demonstration of using Chrome in Netlify functions.
Great. Thanks for that.
Based on the provided information, it appears that `chromium.br` comes from here: GitHub - alixaxel/chrome-aws-lambda: Chromium Binary for AWS Lambda and Google Cloud Functions
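The `ENOENT` in the log means that file is not at the path the function looks for it in the deployed bundle. One way to confirm what actually shipped is to log it from inside the function. This is a rough diagnostic sketch using only Node's `fs` and `path` modules; `describeBundledFile` is a hypothetical helper name, and the path in the usage comment is copied from the error message.

```javascript
const fs = require('fs');
const path = require('path');

function isDirectory(p) {
  try {
    return fs.statSync(p).isDirectory();
  } catch {
    return false; // path does not exist or cannot be read
  }
}

// Hypothetical helper: report whether a file exists and, if it does not,
// list the contents of the nearest parent directory that does exist.
function describeBundledFile(filePath) {
  if (fs.existsSync(filePath)) {
    return { found: true, size: fs.statSync(filePath).size };
  }
  let dir = path.dirname(filePath);
  // Walk up until an existing directory (or the filesystem root) is reached.
  while (!isDirectory(dir) && dir !== path.dirname(dir)) {
    dir = path.dirname(dir);
  }
  return { found: false, nearestExistingDir: dir, entries: fs.readdirSync(dir) };
}

// Usage inside the handler, before launching the browser:
// console.log(describeBundledFile('/var/task/netlify/bin/chromium.br'));
```

If `nearestExistingDir` comes back as `/var/task/netlify` with no `bin` entry, the binary was left out when the function was bundled, which would point at the bundling configuration and not at the function code.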
Checking more, it appears that this was previously reported there:
opened 04:22PM - 08 Apr 22 UTC
I'm using `chrome-aws-lambda` and `playwright-core` just like the example. When I use it in a serverless function, I get the error `"Error: ENOENT: no such file or directory, open '/var/task/src/scrapper/f_service/bin/chromium.br'\"`.
I also tried adding it in a layer with both libraries, but it still fails.
This is my code:
``` typescript
import * as playwright from 'playwright-core';
import chromium from 'chrome-aws-lambda';

export const createBrowserContext: () => Promise<
  playwright.BrowserContext
> = async () => {
  const browser = await playwright.chromium.launch({
    args: chromium.args,
    executablePath: await chromium.executablePath,
    headless: chromium.headless,
  });
  return await browser.newContext();
};
```
This is what my directory looks like:
```
.
├── layers
│   └── chromium
│       └── nodejs
├── resources
└── src
    ├── scrapper
    │   ├── freelancer
    │   │   ├── basic
    │   │   ├── check-service-state
    │   │   ├── record
    │   │   ├── emit-invoice
    │   │   ├── generate-vep
    │   │   ├── get-invoices
    │   │   ├── get-sale-points
    │   │   └── get-veps
    │   ├── workers
    │   │   ├── debt
    │   │   ├── basic
    │   │   ├── check-service-state
    │   │   └── generate-vep
    │   └── validate-credentials
    └── utils
```
My system
```
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
Vendor ID: AuthenticAMD
CPU family: 23
Model: 8
Model name: AMD Ryzen 5 2600X Six-Core Processor
Distributor ID: Ubuntu
Description: Ubuntu 21.10
Release: 21.10
```
The AWS Lambda OS is the default.