walker

Seamlessly fetch paginated data from any source. Simple and high performance API scraping included!
Walker simplifies the process of fetching paginated data from any data source. With Walker, you can easily configure the start position and count of documents to fetch, depending on your needs. Additionally, Walker supports parallel processing, allowing you to fetch data more efficiently and at a faster rate.
The real purpose of the library is to provide a solution for walking through the pagination of API endpoints. With `NewApiWalker`, you can fetch data from any paginated API endpoint and process it concurrently. You can also create your own custom walker to fit your specific use case.
Features
- Provides a walker for paginating through API endpoints, making API scraping straightforward.
- Supports `cursor` and `offset` pagination strategies.
- Fetches and processes data concurrently without any extra effort.
- Total fetch count limiting.
- Rate limiting.
Examples
Basic Usage
```go
package main

import (
	"fmt"

	"github.com/cyucelen/walker"
)

func source(start, fetchCount int) ([]int, error) {
	return []int{start, fetchCount}, nil
}

func sink(result []int, stop func()) error {
	fmt.Println(result)
	return nil
}

func main() {
	walker.New(source, sink).Walk()
}
```
Output:

```
[0 10]
[1 10]
[4 10]
[2 10]
[3 10]
[5 10]
[8 10]
[9 10]
[7 10]
[6 10]
...
```

...and so on, to infinity.
- The `source` function receives `start` as the page number and `fetchCount` as the number of documents. Use these values to fetch data from your source.
- The `sink` function receives the result you returned from `source` and a `stop` function. Save your results in this function, and call `stop` when your results tell you there is nothing more to fetch; otherwise sourcing continues forever unless a limit is provided.
- Beware that ordering is not guaranteed, since the `source` and `sink` functions are called concurrently.
Walking through the pagination of API endpoints
Fetching all the breweries from Open Brewery DB:
```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"

	"github.com/cyucelen/walker"
)

func buildRequest(start, fetchCount int) (*http.Request, error) {
	url := fmt.Sprintf("https://api.openbrewerydb.org/breweries?page=%d&per_page=%d", start, fetchCount)
	return http.NewRequest(http.MethodGet, url, http.NoBody)
}

func sink(res *http.Response, stop func()) error {
	defer res.Body.Close()

	var payload []map[string]any
	if err := json.NewDecoder(res.Body).Decode(&payload); err != nil {
		return err
	}
	if len(payload) == 0 {
		stop() // an empty page means there is nothing left to fetch
		return nil
	}
	return saveBreweries(payload) // persistence helper, implementation omitted
}

func main() {
	walker.NewApiWalker(http.DefaultClient, buildRequest, sink).Walk()
}
```
To create an API walker, you just need to provide:

- a `RequestBuilder` function to create an HTTP request from the provided values
- a `sink` function to process the HTTP response

Check the examples for more use cases.
Configuration
| Option | Description | Default | Available Values |
|---|---|---|---|
| `WithPagination` | Defines the pagination strategy | `walker.OffsetPagination{}` | `walker.OffsetPagination{}`, `walker.CursorPagination{}` |
| `WithMaxBatchSize` | Defines the maximum number of documents to fetch per batch | `10` | `int` |
| `WithParallelism` | Defines the number of workers running the provided source | `runtime.NumCPU()` | `int` |
| `WithLimiter` | Defines a limit on the total document count, stopping once it is reached | `walker.InfiniteLimiter()` | `walker.InfiniteLimiter()`, `walker.ConstantLimiter(int)` |
| `WithRateLimit` | Defines a rate limit as a count per duration | unlimited | `(int, time.Duration)` |
| `WithContext` | Defines the context | `context.Background()` | `context.Context` |
Contribution
I would like to accept any contributions that make `walker` better and more feature-rich. Feel free to contribute with your use case!