Krawl is a web crawling library written in Go, designed to efficiently crawl and extract data from web pages.
To use krawl in your Go project, follow these steps:
- Ensure you have Go installed on your system. krawl requires Go version 1.16 or higher. You can check your current Go version with `go version`.
- Inside your Go project, add krawl as a module dependency: `go get github.com/AgentGino/krawl`
- Import krawl in the Go files where you need it: `import "github.com/AgentGino/krawl"`
A quick example of how to use the library in your Go application:
```go
package main

import (
	"encoding/json"
	"fmt"
	"os"

	"github.com/AgentGino/krawl"
)

// SaveToJSON writes the crawl results to a JSON file with indentation.
func SaveToJSON(data []*krawl.RunnerOutput, filename string) error {
	fileData, err := json.MarshalIndent(data, "", " ")
	if err != nil {
		return err
	}
	return os.WriteFile(filename, fileData, 0644)
}

func main() {
	// Crawl go.dev, following only pages whose paths match "doc/.*".
	allData, err := krawl.Runner(
		krawl.RunnerInput{
			StartUrl:     "https://go.dev",
			PathPatterns: []string{"doc/.*"},
		},
	)
	if err != nil {
		panic(err)
	}

	// Saving data
	if err := SaveToJSON(allData, "output.json"); err != nil {
		fmt.Printf("Error while saving to JSON file: %v\n", err)
		return
	}
	fmt.Println(" ✔️ Data saved to output.json")
}
```
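Assuming the snippet above is saved as `main.go` in a module that already lists krawl in its `go.mod`, it can be run with `go run main.go`. As a quick sanity check after a run, the saved file can be read back and the number of crawled records printed; the sketch below assumes only that `output.json` holds the JSON-encoded `[]*krawl.RunnerOutput` slice written by the example:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"

	"github.com/AgentGino/krawl"
)

func main() {
	// Read back the file produced by the crawl example above.
	raw, err := os.ReadFile("output.json")
	if err != nil {
		panic(err)
	}

	// Decode into the same slice type the crawler returned.
	var results []*krawl.RunnerOutput
	if err := json.Unmarshal(raw, &results); err != nil {
		panic(err)
	}

	fmt.Printf("Loaded %d crawled records from output.json\n", len(results))
}
```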
- Concurrent web crawling capabilities (coming soon).
- Respects the robots.txt file (see the example below).
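For reference, `robots.txt` is a plain-text file served at a site's root that tells crawlers which paths they may fetch; a crawler that respects it skips the disallowed paths. A minimal illustrative example (not tied to any particular site):

```
User-agent: *
Disallow: /admin/
Allow: /
```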
This project is licensed under the Apache 2.0 License - see the LICENSE.md file for details.