Apache Nutch Alternatives
Web Scraping Tools and other similar apps like Apache Nutch

Apache Nutch is described as 'Highly extensible and scalable open source web crawler software project' and is a Web Scraping tool. There are more than 10 alternatives to Apache Nutch for a variety of platforms, including Windows, Linux, Mac, Web-based and BSD apps. The best Apache Nutch alternative is Scrapy, which is both free and Open Source. Other great apps like Apache Nutch are Lookyloo, Flyscrape, Mixnode and Crawlbase.


Alternatives list

  1. Scrapy
     104 likes

    Scrapy is a free and open-source web-crawling framework written in Python. Originally designed for web scraping, it can also be used to extract data using APIs or as a general-purpose web crawler. It was developed and is maintained by Zyte (formerly Scrapinghub), a web-scraping...

    105 Scrapy alternatives

    Cost / License

    • Free
    • Open Source

    Platforms

    • Mac
    • Windows
    • Linux
    • BSD
     
  2. Lookyloo
     4 likes

    Lookyloo is a web interface that lets users capture a web page and then displays a tree of the domains that call each other.

    Cost / License

    • Free
    • Open Source

    Platforms

    • Windows
    • Linux
    • Online
     
  3. Flyscrape
     6 likes

    Flyscrape is a standalone and scriptable web scraper that combines the speed of Go with the flexibility of JavaScript, so you can focus on data extraction rather than request juggling.

    Cost / License

    Platforms

    • Mac
    • Windows
    • Linux
     
  4. Mixnode
     38 likes

    Mixnode is a fast, flexible, massively scalable platform to extract and analyze data from the web.

    Cost / License

    • Paid
    • Proprietary

    Platforms

    • Online
     
  5. Crawlbase
     3 likes

    Crawlbase, formerly ProxyCrawl, helps you stay anonymous while crawling the web: web crawling protection the way it should be.

    82 Crawlbase alternatives

    Cost / License

    • Freemium
    • Proprietary

    Platforms

    • Online
     
  6. StormCrawler
     2 likes

    StormCrawler is an open-source SDK for building distributed web crawlers with Apache Storm. The project is released under the Apache License v2 and consists of a collection of reusable resources and components, written mostly in Java.

    Cost / License

    Platforms

    • Mac
    • Windows
    • Linux
     
  7. Heritrix
     5 likes

    Heritrix is an open-source, extensible web crawler designed for large-scale, archival-quality web archiving. It preserves digital artifacts; supports modular plugins, distributed crawling, detailed monitoring, and scheduling; and exports data in standardized formats for preservation.

    Cost / License

    • Free
    • Open Source

    Platforms

    • Mac
    • Windows
    • Linux
     
  8. Scraperr
     1 like

    Scraperr is a self-hosted web application that lets users scrape data from web pages by specifying elements via XPath. Users submit URLs and the corresponding elements to scrape, and the results are displayed in a table.

    Cost / License

    • Free
    • Open Source (MIT)

    Platforms

    • Self-Hosted
     
  9. Kaddara

    Kaddara is a platform for professionals who need fresh leads to run their business and whose business is affected by how competitors operate.

    Cost / License

    • Paid
    • Proprietary

    Platforms

    • Software as a Service (SaaS)
     
  10. ACHE Crawler
     2 likes

    ACHE is a web crawler for domain-specific search.

    Cost / License

    • Free
    • Open Source

    Platforms

    • Mac
    • Windows
    • Linux
     
10 of 10 Apache Nutch alternatives