GitHub - s0rg/crawley: The unix-way web crawler
crawley
Crawls web pages and prints any link it can find.
features
- fast html SAX-parser (powered by x/net/html)
- js/css lexical parsers (powered by tdewolff/parse): extract api endpoints from js code and url() properties
- small (below 1500 SLOC), idiomatic, 100% test-covered codebase
- grabs most useful resource urls (pics, videos, audios, forms, etc.)
- found urls are streamed to stdout and guaranteed to be unique (with fragments omitted)
- configurable scan depth (limited by the starting host and path, 0 by default)
- can be polite: crawl rules and sitemaps from robots.txt
- brute mode: scan html comments for urls (this can lead to bogus results)
- makes use of HTTP_PROXY / HTTPS_PROXY environment values and handles proxy auth (use HTTP_PROXY="socks5://127.0.0.1:1080/" crawley for socks5)
- directory-only scan mode (aka fast-scan)
- user-defined cookies, in curl-compatible format (e.g. -cookie "ONE=1; TWO=2" -cookie "ITS=ME" -cookie @cookie-file)
- user-defined headers, same as curl: -header "ONE: 1" -header "TWO: 2" -header @headers-file
- tag filter: specify which tags to crawl (single: -tag a -tag form, multiple: -tag a,form, or mixed)
- url ignore: skip urls containing any of the given substrings (e.g. -ignore logout)
- subdomain support: allows depth crawling of subdomains as well (e.g. crawley http://some-test.site will also be able to crawl http://www.some-test.site)
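The uniqueness and subdomain rules in the feature list can be pictured with a small Go sketch. It is not crawley's code: `normalize`, `inScope` and `dedup` are hypothetical names, a slice stands in for the stdout stream, and treating any host that ends in "." plus the start host as a subdomain is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalize parses raw and drops the #fragment, so that
// "http://a.site/page#top" and "http://a.site/page" compare equal.
func normalize(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	u.Fragment = ""
	u.RawFragment = ""
	return u.String(), nil
}

// inScope reports whether host belongs to the crawl: either the start host
// itself or, when subdomains is true, any subdomain of it (assumed rule).
func inScope(start, host string, subdomains bool) bool {
	if host == start {
		return true
	}
	return subdomains && strings.HasSuffix(host, "."+start)
}

// dedup returns the normalized urls from in, in order, dropping duplicates
// (after fragment removal) and skipping urls that fail to parse.
func dedup(in []string) []string {
	seen := make(map[string]struct{})
	var out []string
	for _, raw := range in {
		n, err := normalize(raw)
		if err != nil {
			continue
		}
		if _, ok := seen[n]; ok {
			continue
		}
		seen[n] = struct{}{}
		out = append(out, n)
	}
	return out
}

func main() {
	for _, u := range dedup([]string{
		"http://some-test.site/a#top",
		"http://some-test.site/a",
		"http://some-test.site/b",
	}) {
		fmt.Println(u)
	}
	fmt.Println(inScope("some-test.site", "www.some-test.site", true))  // true
	fmt.Println(inScope("some-test.site", "www.some-test.site", false)) // false
}
```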
examples
# print all links from the first page:
crawley http://some-test.site
# print all js files and api endpoints:
crawley -depth -1 -tag script -js http://some-test.site
# print all endpoints from js:
crawley -js http://some-test.site/app.js
# download all png images from the site:
crawley -depth -1 -tag img http://some-test.site | grep '\.png$' | wget -i -
# fast directory traversal:
crawley -headless -delay 0 -depth -1 -dirs only http://some-test.site
installation
- binaries / deb / rpm for Linux, FreeBSD, macOS and Windows.
- archlinux: you can use your favourite AUR helper to install it, e.g. paru -S crawley-bin.
usage
crawley [flags] url
possible flags with default values:
-all
scan all known sources (js/css/...)
-brute
scan html comments
-cookie value
extra cookies for request, can be used multiple times, accept files with '@'-prefix
-css
scan css for urls
-delay duration
per-request delay (0 - disable) (default 150ms)
-depth int
scan depth (set -1 for unlimited)
-dirs string
policy for non-resource urls: show / hide / only (default "show")
-header value
extra headers for request, can be used multiple times, accept files with '@'-prefix
-headless
disable pre-flight HEAD requests
-ignore value
patterns (in urls) to be ignored in crawl process
-js
scan js code for endpoints
-proxy-auth string
credentials for proxy: user:password
-robots string
policy for robots.txt: ignore / crawl / respect (default "ignore")
-silent
suppress info and error messages in stderr
-skip-ssl
skip ssl verification
-subdomains
support subdomains (e.g. if www.domain.com found, recurse over it)
-tag value
tags filter, single or comma-separated tag names
-timeout duration
request timeout (min: 1 second, max: 10 minutes) (default 5s)
-user-agent string
user-agent string
-version
show version
-workers int
number of workers (default - number of CPU cores)
flags autocompletion
Crawley can handle flag autocompletion in bash and zsh via complete:
complete -C "/full-path-to/bin/crawley" crawley
license