improved docs

stefanodvx 2025-04-24 00:25:04 +02:00
parent a24d9348bf
commit 491867d7e4
8 changed files with 232 additions and 145 deletions

CONFIGURATION.md

@ -0,0 +1,18 @@
# configuration
the `ext-cfg.yaml` file allows you to set custom options for each extractor. this is useful for advanced configuration of the bot, mostly related to network settings.
> [!NOTE]
> options set here override the global configuration in the `.env` file. for example, you can set a global proxy in `.env` and override it for specific extractors in `ext-cfg.yaml`.
## structure
the file uses yaml format. each top-level key is the name of an extractor. under each extractor, you can define options supported by that extractor, for example:
```yaml
instagram:
  edge_proxy_url: https://example.com
  impersonate: true
```
## available options
* `http_proxy` | `https_proxy`: the http(s) proxy to use for this extractor. see [proxying](README.md#proxying) for more information.
* `no_proxy`: the domains that should not be proxied for this extractor.
* `edge_proxy_url`: the url of the edge proxy to use for this extractor. see [edge proxy](EDGEPROXY.md) for more information.
* `impersonate`: whether to impersonate a browser (chrome) for this extractor's requests. this is useful for platforms that only work with a specific browser's fingerprint.
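judging from `GetHTTPClient` and `NewClientFromConfig` in this commit, the proxy options are resolved with a simple precedence: `edge_proxy_url` wins, then an explicit `http_proxy`/`https_proxy`, and otherwise the base transport falls back to the environment proxy variables through `http.ProxyFromEnvironment`. the sketch below restates only that decision; `extractorConfig` and `proxyMode` are hypothetical names, not the bot's API.

```go
package main

// extractorConfig is a hypothetical stand-in for the proxy-related
// fields of models.ExtractorConfig that are read from ext-cfg.yaml.
type extractorConfig struct {
	HTTPProxy    string
	HTTPSProxy   string
	EdgeProxyURL string
}

// proxyMode reports which proxy path an extractor would take.
func proxyMode(cfg extractorConfig) string {
	switch {
	case cfg.EdgeProxyURL != "":
		// edge proxy takes priority; http(s) proxy fields are not consulted
		return "edge"
	case cfg.HTTPProxy != "" || cfg.HTTPSProxy != "":
		// per-extractor proxy replaces the global proxy for this extractor
		return "extractor"
	default:
		// HTTP_PROXY / HTTPS_PROXY / NO_PROXY from the environment apply
		return "environment"
	}
}
```

so for the `instagram` example above, `http_proxy` or `https_proxy` would have no effect as long as `edge_proxy_url` is set.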

EDGEPROXY.md

@ -0,0 +1,41 @@
# edge proxy
edge proxy is an optional feature that routes some extractor requests through a custom proxy endpoint instead of a classic http/https proxy. this is useful if you want to centralize or control the traffic of certain platforms through your own proxy service, for example to bypass geo-restrictions or to add caching, logging, or other customizations.
## configuration
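edge proxy is configured per extractor in the `ext-cfg.yaml` file, through the `edge_proxy_url` option, for each extractor that supports it.
in the example below, `instagram_share` goes through an edge proxy, while `reddit` uses a regular https proxy for comparison: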
```yaml
instagram_share:
  edge_proxy_url: https://example.com
reddit:
  https_proxy: https://example.com
```
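in this commit, `EdgeProxyClient.Do` builds the request address by plain concatenation, `c.proxyURL + "?url=" + url.QueryEscape(target)`, which produces a malformed address if `edge_proxy_url` already carries a query string (for example an access token). a minimal alternative, assuming you want to keep such parameters, is to let `net/url` merge them; `buildEdgeProxyURL` is a hypothetical helper, not part of the codebase.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildEdgeProxyURL returns edgeProxyURL with target attached as the "url"
// query parameter. Existing query parameters on edgeProxyURL are kept.
func buildEdgeProxyURL(edgeProxyURL, target string) (string, error) {
	u, err := url.Parse(edgeProxyURL)
	if err != nil {
		return "", fmt.Errorf("invalid edge proxy url: %w", err)
	}
	q := u.Query()
	q.Set("url", target)    // Encode escapes the value like url.QueryEscape
	u.RawQuery = q.Encode() // note: Encode sorts parameters by key
	return u.String(), nil
}
```

for an `edge_proxy_url` without a query string, this yields the same address as the concatenation in `Do`, so a server reading `r.URL.Query().Get("url")` sees no difference.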
## response format
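when an extractor has `edge_proxy_url` set, the bot sends each request to that url with the same method, headers, and body, passing the original target url as the url-encoded `url` query parameter. the edge proxy must respond with a JSON object in the following format (see [models.EdgeProxyResponse](models/edgeproxy.go)):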
```json
{
  "url": "https://example.com/resource",
  "status_code": 200,
  "text": "response body",
  "headers": {
    "Content-Type": "application/json"
  },
  "cookies": [
    "cookie1=value1; Path=/; HttpOnly",
    "cookie2=value2; Path=/"
  ]
}
```
## http proxy vs edge proxy
the main difference is that an http proxy is a standard proxy that forwards requests and responses, while an edge proxy is your own service: it receives the target url, performs the request itself, and returns the result as json, so it can modify requests and responses in any way you want.
## notes
* edge proxy is intended for advanced use and is not required for most users.
* this feature is experimental and may change in the future.
* you can find the full implementation of the edge proxy in [util/edgeproxy.go](util/edgeproxy.go).


@ -1,7 +1,7 @@
# govd
a telegram bot for downloading media from various platforms
a telegram bot for downloading media from various platforms.
this project draws significant inspiration from [yt-dlp](https://github.com/yt-dlp/yt-dlp)
this project draws significant inspiration from [yt-dlp](https://github.com/yt-dlp/yt-dlp).
- official instance: [@govd_bot](https://t.me/govd_bot)
- support group: [govdsupport](https://t.me/govdsupport)
@ -12,7 +12,7 @@ this project draws significant inspiration from [yt-dlp](https://github.com/yt-d
* [installation](#installation)
* [build](#build)
* [docker](#docker-recommended)
* [options](#options)
* [configuration](#configuration)
* [authentication](#authentication)
* [proxying](#proxying)
* [todo](#todo)
@ -30,7 +30,7 @@ this project draws significant inspiration from [yt-dlp](https://github.com/yt-d
> [!NOTE]
> there's no official support for windows yet. if you want to run the bot on it, please follow [docker installation](#docker-recommended).
1. clone the repository
1. clone the repository:
```bash
git clone https://github.com/govdbot/govd.git && cd govd
```
@ -68,7 +68,9 @@ this project draws significant inspiration from [yt-dlp](https://github.com/yt-d
docker compose up -d
```
# options
# configuration
you can configure the bot using the `.env` file. here are the available options:
| variable | description | default |
|-------------------------------|----------------------------------------------|---------------------------------------|
| DB_HOST | database host | localhost |
@ -87,15 +89,16 @@ this project draws significant inspiration from [yt-dlp](https://github.com/yt-d
| REPO_URL | project repository url | https://github.com/govdbot/govd |
| PROFILER_PORT | port for profiler http server (pprof) | 0 _(disabled)_ |
you can configure options for specific extractors with the `ext-cfg.yaml` file. documentation is not available yet, but you can check the source code for more information.
you can configure options for specific extractors with the `ext-cfg.yaml` file ([learn more](CONFIGURATION.md)).
> [!IMPORTANT]
> to avoid telegram's file size limits, you should host your own telegram bot api server and set the `BOT_API_URL` variable accordingly. the public bot instance currently runs on a bot api fork, [tdlight-telegram-bot-api](https://github.com/tdlight-team/tdlight-telegram-bot-api), but you can use the official bot api server too.
# proxying
there are two types of proxying available: http and edge.
there are two types of proxying available:
* **http proxy**: a standard http proxy that routes requests through a proxy server. set the `HTTP_PROXY` and `HTTPS_PROXY` environment variables to use it (SOCKS5 is supported too).
* **edge proxy**: a custom endpoint that performs requests on the bot's behalf. currently, you can only set this proxy in the `ext-cfg.yaml` file. this is useful for routing requests through a specific server or service. however, this feature is not fully implemented yet.
* **edge proxy**: a custom endpoint that performs requests on the bot's behalf. currently, you can only set this proxy in the `ext-cfg.yaml` file ([learn more](EDGEPROXY.md)).
> [!TIP]
> by setting the `NO_PROXY` environment variable, you can specify domains that should not be proxied.


@ -1,6 +1,6 @@
package models
type ProxyResponse struct {
type EdgeProxyResponse struct {
	URL        string            `json:"url"`
	StatusCode int               `json:"status_code"`
	Text       string            `json:"text"`


@ -39,4 +39,5 @@ type ExtractorConfig struct {
	HTTPSProxy   string `yaml:"https_proxy"`
	NoProxy      string `yaml:"no_proxy"`
	EdgeProxyURL string `yaml:"edge_proxy_url"`
	Impersonate  bool   `yaml:"impersonate"`
}

util/edgeproxy.go

@ -0,0 +1,141 @@
package util
import (
	"bytes"
	"fmt"
	"govd/models"
	"io"
	"net/http"
	"net/url"
	"strconv"
	"time"
	"github.com/bytedance/sonic"
)
type EdgeProxyClient struct {
	client   *http.Client
	proxyURL string
}
func NewEdgeProxyFromConfig(cfg *models.ExtractorConfig) *EdgeProxyClient {
	var baseClient *http.Client
	if cfg.Impersonate {
		baseClient = NewChromeClient()
	} else {
		baseClient = &http.Client{
			Transport: GetBaseTransport(),
			Timeout:   60 * time.Second,
		}
	}
	return &EdgeProxyClient{
		client:   baseClient,
		proxyURL: cfg.EdgeProxyURL,
	}
}
func NewEdgeProxy(
	proxyURL string,
) *EdgeProxyClient {
	return &EdgeProxyClient{
		client: &http.Client{
			Transport: GetBaseTransport(),
			Timeout:   60 * time.Second,
		},
		proxyURL: proxyURL,
	}
}
func (c *EdgeProxyClient) Do(req *http.Request) (*http.Response, error) {
	if c.proxyURL == "" {
		return nil, fmt.Errorf("proxy URL is not set")
	}
	targetURL := req.URL.String()
	encodedURL := url.QueryEscape(targetURL)
	proxyURLWithParam := c.proxyURL + "?url=" + encodedURL
	bodyBytes, err := readRequestBody(req)
	if err != nil {
		return nil, err
	}
	proxyReq, err := http.NewRequest(
		req.Method,
		proxyURLWithParam,
		bytes.NewBuffer(bodyBytes),
	)
	if err != nil {
		return nil, fmt.Errorf("error creating proxy request: %w", err)
	}
	copyHeaders(req.Header, proxyReq.Header)
	proxyResp, err := c.client.Do(proxyReq)
	if err != nil {
		return nil, fmt.Errorf("proxy request failed: %w", err)
	}
	defer proxyResp.Body.Close()
	return parseProxyResponse(proxyResp, req)
}
func readRequestBody(req *http.Request) ([]byte, error) {
	if req.Body == nil {
		return nil, nil
	}
	bodyBytes, err := io.ReadAll(req.Body)
	if err != nil {
		return nil, fmt.Errorf("error reading request body: %w", err)
	}
	req.Body.Close()
	req.Body = io.NopCloser(bytes.NewBuffer(bodyBytes))
	return bodyBytes, nil
}
func copyHeaders(source, destination http.Header) {
	for name, values := range source {
		for _, value := range values {
			destination.Add(name, value)
		}
	}
}
func parseProxyResponse(proxyResp *http.Response, originalReq *http.Request) (*http.Response, error) {
	body, err := io.ReadAll(proxyResp.Body)
	if err != nil {
		return nil, fmt.Errorf("error reading proxy response: %w", err)
	}
	var response models.EdgeProxyResponse
	if err := sonic.ConfigFastest.Unmarshal(body, &response); err != nil {
		return nil, fmt.Errorf("error parsing proxy response: %w", err)
	}
	resp := &http.Response{
		StatusCode: response.StatusCode,
		Status:     strconv.Itoa(response.StatusCode) + " " + http.StatusText(response.StatusCode),
		Body:       io.NopCloser(bytes.NewBufferString(response.Text)),
		Header:     make(http.Header),
		Request:    originalReq,
	}
	parsedResponseURL, err := url.Parse(response.URL)
	if err != nil {
		return nil, fmt.Errorf("error parsing response URL: %w", err)
	}
	resp.Request.URL = parsedResponseURL
	for name, value := range response.Headers {
		resp.Header.Set(name, value)
	}
	for _, cookie := range response.Cookies {
		resp.Header.Add("Set-Cookie", cookie)
	}
	return resp, nil
}


@ -3,6 +3,7 @@ package util
import (
	"crypto/tls"
	"net/http"
	"time"
)
func ChromeClientHelloSpec() *tls.ClientHelloInfo {
@ -72,13 +73,12 @@ func NewChromeClient() *http.Client {
		Renegotiation: tls.RenegotiateNever,
	}
	transport := &http.Transport{
		TLSClientConfig: tlsConfig,
		// chrome enables HTTP/2
		ForceAttemptHTTP2: true,
	}
	transport := GetBaseTransport()
	transport.TLSClientConfig = tlsConfig
	// chrome uses HTTP/2, but it's enabled by default in base transport
	return &http.Client{
		Transport: transport,
		Timeout:   60 * time.Second,
	}
}


@ -1,11 +1,8 @@
package util
import (
	"bytes"
	"fmt"
	"govd/config"
	"govd/models"
	"io"
	"log"
	"net"
	"net/http"
"net/http"
@ -13,8 +10,6 @@ import (
	"strings"
	"sync"
	"time"
	"github.com/bytedance/sonic"
)
var (
@ -26,14 +21,14 @@ var (
func GetDefaultHTTPClient() *http.Client {
	defaultClientOnce.Do(func() {
		defaultClient = &http.Client{
			Transport: createBaseTransport(),
			Transport: GetBaseTransport(),
			Timeout:   60 * time.Second,
		}
	})
	return defaultClient
}
func createBaseTransport() *http.Transport {
func GetBaseTransport() *http.Transport {
	return &http.Transport{
		Proxy: http.ProxyFromEnvironment,
		DialContext: (&net.Dialer{
@ -65,26 +60,27 @@ func GetHTTPClient(extractor string) models.HTTPClient {
	var client models.HTTPClient
	if cfg.EdgeProxyURL != "" {
		client = NewEdgeProxyClient(cfg.EdgeProxyURL)
		client = NewEdgeProxyFromConfig(cfg)
	} else {
		client = createClientWithProxy(cfg)
		client = NewClientFromConfig(cfg)
	}
	extractorClients[extractor] = client
	return client
}
func createClientWithProxy(cfg *models.ExtractorConfig) *http.Client {
	transport := createBaseTransport()
func NewClientFromConfig(cfg *models.ExtractorConfig) *http.Client {
	var baseClient *http.Client
	if cfg.Impersonate {
		baseClient = NewChromeClient()
	} else {
		baseClient = GetDefaultHTTPClient()
	}
	transport := GetBaseTransport()
	if cfg.HTTPProxy != "" || cfg.HTTPSProxy != "" {
		configureProxyTransport(transport, cfg)
	}
	return &http.Client{
		Transport: transport,
		Timeout:   60 * time.Second,
	}
	baseClient.Transport = transport
	return baseClient
}
func configureProxyTransport(
@ -100,20 +96,16 @@ func configureProxyTransport(
			log.Printf("warning: invalid HTTP proxy URL '%s': %v\n", cfg.HTTPProxy, err)
		}
	}
	if cfg.HTTPSProxy != "" {
		httpsProxyURL, err = url.Parse(cfg.HTTPSProxy)
		if err != nil {
			log.Printf("warning: invalid HTTPS proxy URL '%s': %v\n", cfg.HTTPSProxy, err)
		}
	}
	if httpProxyURL == nil && httpsProxyURL == nil {
		return
	}
	noProxyList := parseNoProxyList(cfg.NoProxy)
	transport.Proxy = func(req *http.Request) (*url.URL, error) {
		if shouldBypassProxy(req.URL.Hostname(), noProxyList) {
			return nil, nil
@ -155,112 +147,3 @@ func shouldBypassProxy(host string, noProxyList []string) bool {
	}
	return false
}
type EdgeProxyClient struct {
	client   *http.Client
	proxyURL string
}
func NewEdgeProxyClient(proxyURL string) *EdgeProxyClient {
	return &EdgeProxyClient{
		client: &http.Client{
			Transport: createBaseTransport(),
			Timeout:   60 * time.Second,
		},
		proxyURL: proxyURL,
	}
}
func (c *EdgeProxyClient) Do(req *http.Request) (*http.Response, error) {
	if c.proxyURL == "" {
		return nil, fmt.Errorf("proxy URL is not set")
	}
	targetURL := req.URL.String()
	encodedURL := url.QueryEscape(targetURL)
	proxyURLWithParam := c.proxyURL + "?url=" + encodedURL
	bodyBytes, err := readRequestBody(req)
	if err != nil {
		return nil, err
	}
	proxyReq, err := http.NewRequest(
		req.Method,
		proxyURLWithParam,
		bytes.NewBuffer(bodyBytes),
	)
	if err != nil {
		return nil, fmt.Errorf("error creating proxy request: %w", err)
	}
	copyHeaders(req.Header, proxyReq.Header)
	proxyResp, err := c.client.Do(proxyReq)
	if err != nil {
		return nil, fmt.Errorf("proxy request failed: %w", err)
	}
	defer proxyResp.Body.Close()
	return parseProxyResponse(proxyResp, req)
}
func readRequestBody(req *http.Request) ([]byte, error) {
	if req.Body == nil {
		return nil, nil
	}
	bodyBytes, err := io.ReadAll(req.Body)
	if err != nil {
		return nil, fmt.Errorf("error reading request body: %w", err)
	}
	req.Body.Close()
	req.Body = io.NopCloser(bytes.NewBuffer(bodyBytes))
	return bodyBytes, nil
}
func copyHeaders(source, destination http.Header) {
	for name, values := range source {
		for _, value := range values {
			destination.Add(name, value)
		}
	}
}
func parseProxyResponse(proxyResp *http.Response, originalReq *http.Request) (*http.Response, error) {
	body, err := io.ReadAll(proxyResp.Body)
	if err != nil {
		return nil, fmt.Errorf("error reading proxy response: %w", err)
	}
	var response models.ProxyResponse
	if err := sonic.ConfigFastest.Unmarshal(body, &response); err != nil {
		return nil, fmt.Errorf("error parsing proxy response: %w", err)
	}
	resp := &http.Response{
		StatusCode: response.StatusCode,
		Status:     fmt.Sprintf("%d %s", response.StatusCode, http.StatusText(response.StatusCode)),
		Body:       io.NopCloser(bytes.NewBufferString(response.Text)),
		Header:     make(http.Header),
		Request:    originalReq,
	}
	parsedResponseURL, err := url.Parse(response.URL)
	if err != nil {
		return nil, fmt.Errorf("error parsing response URL: %w", err)
	}
	resp.Request.URL = parsedResponseURL
	for name, value := range response.Headers {
		resp.Header.Set(name, value)
	}
	for _, cookie := range response.Cookies {
		resp.Header.Add("Set-Cookie", cookie)
	}
	return resp, nil
}