

Apache Nutch
Apache Nutch is a highly extensible and scalable open source web crawler software project.
What is Apache Nutch?
Nutch is written entirely in Java, but it stores its data in language-independent formats. It has a highly modular architecture that lets developers write plug-ins for media-type parsing, data retrieval, querying and clustering.
The fetcher ("robot" or "web crawler") has been written from scratch specifically for this project.
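To make the plug-in idea more concrete, here is a minimal, self-contained Java sketch of how a crawler can run a chain of pluggable URL filters. It is loosely modelled on the notion of a URL-filter plug-in, but every name in it (PluginSketch, UrlFilterPlugin, SchemeFilter, ImageSuffixFilter, FilterChain) is hypothetical and is not Nutch's actual API. In Nutch itself, plug-ins are declared and enabled through configuration rather than wired together in code as shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of a plug-in chain; these types are NOT Nutch's real API.
public class PluginSketch {

    /** A pluggable URL filter: returns the (possibly rewritten) URL, or null to reject it. */
    public interface UrlFilterPlugin {
        String filter(String url);
    }

    /** Example plug-in: rejects URLs that do not use http or https. */
    public static class SchemeFilter implements UrlFilterPlugin {
        @Override
        public String filter(String url) {
            String lower = url.toLowerCase(Locale.ROOT);
            return (lower.startsWith("http://") || lower.startsWith("https://")) ? url : null;
        }
    }

    /** Example plug-in: rejects URLs that end in a common image extension. */
    public static class ImageSuffixFilter implements UrlFilterPlugin {
        private static final String[] SUFFIXES = {".png", ".jpg", ".jpeg", ".gif"};

        @Override
        public String filter(String url) {
            String lower = url.toLowerCase(Locale.ROOT);
            for (String suffix : SUFFIXES) {
                if (lower.endsWith(suffix)) {
                    return null;
                }
            }
            return url;
        }
    }

    /** Runs registered plug-ins in order; the first one that returns null rejects the URL. */
    public static class FilterChain {
        private final List<UrlFilterPlugin> plugins = new ArrayList<>();

        public FilterChain register(UrlFilterPlugin plugin) {
            plugins.add(plugin);
            return this; // allow fluent registration
        }

        public String apply(String url) {
            String current = url;
            for (UrlFilterPlugin plugin : plugins) {
                current = plugin.filter(current);
                if (current == null) {
                    return null; // rejected; later plug-ins are skipped
                }
            }
            return current;
        }
    }

    public static void main(String[] args) {
        FilterChain chain = new FilterChain()
                .register(new SchemeFilter())
                .register(new ImageSuffixFilter());
        System.out.println(chain.apply("https://nutch.apache.org/"));   // prints https://nutch.apache.org/
        System.out.println(chain.apply("ftp://example.org/file"));       // prints null (wrong scheme)
        System.out.println(chain.apply("http://example.org/logo.png"));  // prints null (image URL)
    }
}
```

Because each filter sits behind a small interface, a new crawling rule can be added by registering another class without changing the chain itself. That is the general benefit a modular plug-in design is meant to provide.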



