Algolia integration with Netlify - Build triggering and what to crawl

I want to add Algolia search to my SSG site. Currently I just upload local files to Netlify using the CLI (the old Go version still works fine). Since these uploads don't create a build, they wouldn't trigger Algolia search to begin, so to get builds occurring I will upload my blog files to a GitHub repo. Now when I make a change to those blog files a build will be created and will trigger Algolia search, but… I don't want my blog files searched; I want another folder of documents (newsletters) searched that resides on another Netlify site. Is this possible? I've put the newsletters on a separate site because GitHub doesn't want tons of documents/images in its repos, I've been told.

So what am I missing? Where does everyone store the documents they want indexed by Algolia? It seems they must be in a repo somewhere to generate a build trigger, but my docs can't be there; there are too many and they're not suitable for a repo.

@play4sale I presume what you’re referring to is the Algolia plugin for Netlify?

I’ve never used it, but after a quick google and reading some of the info, it looks like it’s powered by a crawler.

So my understanding is that it’s not iterating through the files of your build to produce an index, but crawling the public facing pages of the site, regardless of where they’re actually hosted.

The overview for the crawler specifically mentions…

If your data is hosted on a variety of sites and in a variety of formats, it can be daunting to centralize your data or upload your records to Algolia. The Algolia Crawler simplifies this process.

Thanks. So if I have a link in my blog that points to my other Netlify domain, I'm wondering whether the content of that linked HTML page will be indexed? It seems to say yes? Do you agree?

I can’t say, since I’ve not used it and have read only a small portion of the documentation.

If I were you I’d just keep reading the documentation and give it a try.

At a wild guess, it’d come down to configuration. Crawlers don’t tend to follow every link they encounter, only what they’re told to crawl; otherwise they’d end up crawling the entire web.
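Going by the Crawler documentation, that scoping lives in the crawler config: only the sites you list get fetched. A rough sketch, where the domain, index name, and selectors are all placeholders I've made up, not anything from your setup:

```javascript
// Hypothetical Algolia Crawler config sketch (shape per the Crawler docs).
// All names and URLs below are placeholders.
new Crawler({
  appId: "YOUR_APP_ID",
  apiKey: "YOUR_API_KEY",
  // Only the sites listed here are crawled; pages merely linked from
  // them on other domains are ignored unless also listed.
  startUrls: ["https://newsletters-example.netlify.app/"],
  actions: [
    {
      indexName: "newsletters",
      // Restrict record extraction to the newsletter site's pages.
      pathsToMatch: ["https://newsletters-example.netlify.app/**"],
      recordExtractor: ({ url, $ }) => {
        // $ is a Cheerio handle on the fetched page.
        return [
          {
            objectID: url.href,
            title: $("title").text(),
            content: $("body").text().slice(0, 5000),
          },
        ];
      },
    },
  ],
});
```

So in principle you could point `startUrls` at the second Netlify site regardless of which repo triggers the build, but do verify against the docs since I haven't run this myself.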

I’ve just been reading about GitHub file limits. It seems there is no limit when uploading via the command line. The max size is 25MB for browser uploads. My newsletters are rarely over 10MB, though there are about 72 of them. Maybe I should just try uploading them to GitHub.
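As a sanity check before pushing, something like this would flag any file over the 25MB browser-upload threshold (the folder and file names here are just placeholders for illustration):

```shell
# Create sample files purely to demonstrate (placeholder names).
mkdir -p newsletters
truncate -s 30M newsletters/big.pdf       # over the 25 MB browser-upload limit
truncate -s 1M  newsletters/jan-2020.pdf  # typical newsletter size

# Print any files larger than 25 MB under the newsletters folder.
find newsletters -type f -size +25M
```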

I’m not sure what kind of large files you’re storing, but you may also want to refer to:

I just read that large files are defined as larger than 100MB; Git LFS should then be used, but my files are nowhere near that size.