Tjatse/node-readability: Scrape/crawl articles from any site automatically. Make any web page readable, whether it is in Chinese or English.
- Readability algorithm based on Arc90's.
- Scrape articles from any page (automatically).
- Make any web page readable, whether it is in Chinese or English.
- Features
- Performance
- Installation
- Usage
- Debug
- Score Rule
- Extract Selectors
- Image Fallback
- Threshold
- Customize Settings
- Output
- Notes
How it works
In my case, the spider processes about 1.5 million documents per day. The maximum crawling speed is 1.2k documents per minute and the average is 1k per minute. Memory usage is about 200 MB on each spider kernel. Accuracy is about 90%, and the remaining 10% can be fixed by customizing Score Rules or Selectors. It's better than any other readability module.
Server info:
- 20M fibre-optic bandwidth
- 8 × Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
- 32 GB memory
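
As a quick sanity check on the throughput figures quoted above, the per-minute rates can be converted into daily totals. This is only a sketch: it assumes the spider runs around the clock, and the names `docsPerDay`, `avgPerMinute`, and `peakPerMinute` are hypothetical, not part of the module.

```javascript
// Per-minute crawl rates quoted in the performance notes.
const avgPerMinute = 1000;  // average rate
const peakPerMinute = 1200; // maximum rate

// Assumption: the spider runs continuously, 24 hours a day.
const MINUTES_PER_DAY = 24 * 60;

// Convert a per-minute rate into documents per day.
function docsPerDay(ratePerMinute) {
  return ratePerMinute * MINUTES_PER_DAY;
}

const avgPerDay = docsPerDay(avgPerMinute);
const peakPerDay = docsPerDay(peakPerMinute);

console.log(`average: ${avgPerDay} docs/day`); // 1440000
console.log(`peak:    ${peakPerDay} docs/day`); // 1728000
```

The average rate works out to roughly 1.44 million documents per day, which is consistent with the quoted figure of about 1.5 million.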