Merge branch 'main' of https://github.com/govdbot/govd
All checks were successful
Build and deploy / build-and-push-image (push) Successful in 8m15s
commit 7664d01a58
8 changed files with 232 additions and 145 deletions
CONFIGURATION.md (Normal file, 18 changed lines)

```diff
@@ -0,0 +1,18 @@
+# configuration
+the `ext-cfg.yaml` file allows you to set custom options for each extractor. this is useful for advanced configuration of the bot, mostly related to network settings.
+> [!NOTE]
+> this configuration will override the global configuration. this is useful in case you want to set a global proxy in the `.env` file and then override it for specific extractors in the `ext-cfg.yaml` file.
+
+## structure
+the file uses yaml format. each top-level key is the name of an extractor. under each extractor, you can define options supported by that extractor, for example:
+```yaml
+instagram:
+  edge_proxy_url: https://example.com
+  impersonate: true
+```
+
+## available options
+* `http_proxy` | `https_proxy`: the http(s) proxy to use for this extractor. see [proxying](README.md#proxying) for more information.
+* `no_proxy`: the domains that should not be proxied for this extractor.
+* `edge_proxy_url`: the url of the edge proxy to use for this extractor. see [edge proxy](EDGEPROXY.md) for more information.
+* `impersonate`: whether to impersonate chrome. this is useful for extractors that require specific browsers' fingerprints to work.
```
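The override rule in the note above can be made concrete with a small Go sketch. It mirrors the `models.ExtractorConfig` fields visible later in this commit and assumes a simple field-by-field rule: a value set for an extractor in `ext-cfg.yaml` wins, an empty value falls back to the global `.env` value, and an extractor with no entry inherits the global settings. The helper names `resolveConfig` and `firstNonEmpty` are hypothetical; the bot's actual resolution happens in `util/http.go` (`NewClientFromConfig`, `configureProxyTransport`) and may differ in detail.

```go
package main

// ExtractorConfig mirrors the fields of models.ExtractorConfig shown in this
// commit; it is redeclared here only so the sketch compiles on its own.
type ExtractorConfig struct {
	HTTPProxy    string `yaml:"http_proxy"`
	HTTPSProxy   string `yaml:"https_proxy"`
	NoProxy      string `yaml:"no_proxy"`
	EdgeProxyURL string `yaml:"edge_proxy_url"`
	Impersonate  bool   `yaml:"impersonate"`
}

// firstNonEmpty returns override when it is set and fallback otherwise.
func firstNonEmpty(override, fallback string) string {
	if override != "" {
		return override
	}
	return fallback
}

// resolveConfig is a hypothetical helper. It assumes a field-by-field rule:
// non-empty proxy values from ext-cfg.yaml replace the global (.env) ones,
// empty values fall back to them, and extractors without an entry inherit
// the global settings unchanged.
func resolveConfig(global ExtractorConfig, perExtractor map[string]ExtractorConfig, name string) ExtractorConfig {
	ext, ok := perExtractor[name]
	if !ok {
		return global
	}
	return ExtractorConfig{
		HTTPProxy:  firstNonEmpty(ext.HTTPProxy, global.HTTPProxy),
		HTTPSProxy: firstNonEmpty(ext.HTTPSProxy, global.HTTPSProxy),
		NoProxy:    firstNonEmpty(ext.NoProxy, global.NoProxy),
		// the README says the edge proxy can only be set in ext-cfg.yaml;
		// impersonate is assumed to be per-extractor only as well
		EdgeProxyURL: ext.EdgeProxyURL,
		Impersonate:  ext.Impersonate,
	}
}
```

Decoded by a YAML library that honors the `yaml` tags, the example above would correspond to `map[string]ExtractorConfig{"instagram": {EdgeProxyURL: "https://example.com", Impersonate: true}}`.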
EDGEPROXY.md (Normal file, 41 changed lines)

```diff
@@ -0,0 +1,41 @@
+# edge proxy
+edge proxy is an optional feature that allows routing some extractor requests through a custom proxy endpoint, instead of a classic http/https proxy. this is useful if you want to centralize or control the traffic of certain platforms via your own proxy service, for example to bypass geo-restrictions, add caching, logging, or other customizations.
+
+## configuration
+edge proxy is configured via the `ext-cfg.yaml` file.
+you can set the proxy url for each extractor that supports it.
+example:
+
+```yaml
+instagram_share:
+  edge_proxy_url: https://example.com
+
+reddit:
+  https_proxy: https://example.com
+```
+
+## response format
+the edge proxy must respond with a JSON object in the following format (see [`models.EdgeProxyResponse`](models/edgeproxy.go)).
+
+```json
+{
+  "url": "https://example.com/resource",
+  "status_code": 200,
+  "text": "response body",
+  "headers": {
+    "Content-Type": "application/json"
+  },
+  "cookies": [
+    "cookie1=value1; Path=/; HttpOnly",
+    "cookie2=value2; Path=/"
+  ]
+}
+```
+
+## http proxy vs edge proxy
+the main difference between http proxy and edge proxy is that http proxy is a standard proxy that forwards requests and responses, while edge proxy is a custom proxy that can modify the requests and responses in any way you want.
+
+## notes
+* edge proxy is for advanced use and not required for most users.
+* this feature is experimental and may change in the future.
+* you can check the full implementation of the edge proxy in [`util/edgeproxy.go`](util/edgeproxy.go).
```
README.md (19 changed lines)

```diff
@@ -1,7 +1,7 @@
 # govd
-a telegram bot for downloading media from various platforms
+a telegram bot for downloading media from various platforms.
 
-this project draws significant inspiration from [yt-dlp](https://github.com/yt-dlp/yt-dlp)
+this project draws significant inspiration from [yt-dlp](https://github.com/yt-dlp/yt-dlp).
 
 - official instance: [@govd_bot](https://t.me/govd_bot)
 - support group: [govdsupport](https://t.me/govdsupport)
```
```diff
@@ -12,7 +12,7 @@ this project draws significant inspiration from [yt-dlp](https://github.com/yt-d
 * [installation](#installation)
 * [build](#build)
 * [docker](#docker-recommended)
-* [options](#options)
+* [configuration](#configuration)
 * [authentication](#authentication)
 * [proxying](#proxying)
 * [todo](#todo)
```
````diff
@@ -30,7 +30,7 @@ this project draws significant inspiration from [yt-dlp](https://github.com/yt-d
 > [!NOTE]
 > there's no official support for windows yet. if you want to run the bot on it, please follow [docker installation](#docker-recommended).
 
-1. clone the repository
+1. clone the repository:
 ```bash
 git clone https://github.com/govdbot/govd.git && cd govd
 ```
````
````diff
@@ -68,7 +68,9 @@ this project draws significant inspiration from [yt-dlp](https://github.com/yt-d
 docker compose up -d
 ```
 
-# options
+# configuration
+you can configure the bot using the `.env` file. here are the available options:
+
 | variable | description | default |
 |-------------------------------|----------------------------------------------|---------------------------------------|
 | DB_HOST | database host | localhost |
````
```diff
@@ -87,15 +89,16 @@ this project draws significant inspiration from [yt-dlp](https://github.com/yt-d
 | REPO_URL | project repository url | https://github.com/govdbot/govd |
 | PROFILER_PORT | port for profiler http server (pprof) | 0 _(disabled)_ |
 
-you can configure specific extractors options with `ext-cfg.yaml` file. documentation is not available yet, but you can check the source code for more information.
+you can configure extractor-specific options with the `ext-cfg.yaml` file ([learn more](CONFIGURATION.md)).
 
 > [!IMPORTANT]
 > to avoid file size limits, you should host your own telegram botapi and set the `BOT_API_URL` variable accordingly. the public bot instance currently runs on a botapi fork, [tdlight-telegram-bot-api](https://github.com/tdlight-team/tdlight-telegram-bot-api), but you can use the official botapi client too.
 
 # proxying
-there are two types of proxying available: http and edge.
+there are two types of proxying available:
 * **http proxy**: this is a standard http proxy that can be used to route requests through a proxy server. you can set the `HTTP_PROXY` and `HTTPS_PROXY` environment variables to use this feature (SOCKS5 is supported too).
-* **edge proxy**: this is a custom proxy that is used to route requests through a specific url. currenrly, you can only set this proxy with `ext-cfg.yaml` file. this is useful for routing requests through a specific server or service. however, this feature is not totally implemented yet.
+* **edge proxy**: this is a custom proxy that is used to route requests through a specific url. currently, you can only set this proxy with the `ext-cfg.yaml` file ([learn more](EDGEPROXY.md)).
 
 > [!TIP]
 > by setting the `NO_PROXY` environment variable, you can specify domains that should not be proxied.
+
```
```diff
@@ -1,6 +1,6 @@
 package models
 
-type ProxyResponse struct {
+type EdgeProxyResponse struct {
 	URL        string            `json:"url"`
 	StatusCode int               `json:"status_code"`
 	Text       string            `json:"text"`
```
```diff
@@ -39,4 +39,5 @@ type ExtractorConfig struct {
 	HTTPSProxy   string `yaml:"https_proxy"`
 	NoProxy      string `yaml:"no_proxy"`
 	EdgeProxyURL string `yaml:"edge_proxy_url"`
+	Impersonate  bool   `yaml:"impersonate"`
 }
```
util/edgeproxy.go (Normal file, 141 changed lines)

```diff
@@ -0,0 +1,141 @@
+package util
+
+import (
+	"bytes"
+	"fmt"
+	"govd/models"
+	"io"
+	"net/http"
+	"net/url"
+	"strconv"
+	"time"
+
+	"github.com/bytedance/sonic"
+)
+
+type EdgeProxyClient struct {
+	client   *http.Client
+	proxyURL string
+}
+
+func NewEdgeProxyFromConfig(cfg *models.ExtractorConfig) *EdgeProxyClient {
+	var baseClient *http.Client
+	if cfg.Impersonate {
+		baseClient = NewChromeClient()
+	} else {
+		baseClient = &http.Client{
+			Transport: GetBaseTransport(),
+			Timeout:   60 * time.Second,
+		}
+	}
+	return &EdgeProxyClient{
+		client:   baseClient,
+		proxyURL: cfg.EdgeProxyURL,
+	}
+}
+
+func NewEdgeProxy(
+	proxyURL string,
+) *EdgeProxyClient {
+	return &EdgeProxyClient{
+		client: &http.Client{
+			Transport: GetBaseTransport(),
+			Timeout:   60 * time.Second,
+		},
+		proxyURL: proxyURL,
+	}
+}
+
+func (c *EdgeProxyClient) Do(req *http.Request) (*http.Response, error) {
+	if c.proxyURL == "" {
+		return nil, fmt.Errorf("proxy URL is not set")
+	}
+
+	targetURL := req.URL.String()
+	encodedURL := url.QueryEscape(targetURL)
+	proxyURLWithParam := c.proxyURL + "?url=" + encodedURL
+
+	bodyBytes, err := readRequestBody(req)
+	if err != nil {
+		return nil, err
+	}
+
+	proxyReq, err := http.NewRequest(
+		req.Method,
+		proxyURLWithParam,
+		bytes.NewBuffer(bodyBytes),
+	)
+	if err != nil {
+		return nil, fmt.Errorf("error creating proxy request: %w", err)
+	}
+
+	copyHeaders(req.Header, proxyReq.Header)
+
+	proxyResp, err := c.client.Do(proxyReq)
+	if err != nil {
+		return nil, fmt.Errorf("proxy request failed: %w", err)
+	}
+	defer proxyResp.Body.Close()
+
+	return parseProxyResponse(proxyResp, req)
+}
+
+func readRequestBody(req *http.Request) ([]byte, error) {
+	if req.Body == nil {
+		return nil, nil
+	}
+
+	bodyBytes, err := io.ReadAll(req.Body)
+	if err != nil {
+		return nil, fmt.Errorf("error reading request body: %w", err)
+	}
+
+	req.Body.Close()
+	req.Body = io.NopCloser(bytes.NewBuffer(bodyBytes))
+
+	return bodyBytes, nil
+}
+
+func copyHeaders(source, destination http.Header) {
+	for name, values := range source {
+		for _, value := range values {
+			destination.Add(name, value)
+		}
+	}
+}
+
+func parseProxyResponse(proxyResp *http.Response, originalReq *http.Request) (*http.Response, error) {
+	body, err := io.ReadAll(proxyResp.Body)
+	if err != nil {
+		return nil, fmt.Errorf("error reading proxy response: %w", err)
+	}
+
+	var response models.EdgeProxyResponse
+	if err := sonic.ConfigFastest.Unmarshal(body, &response); err != nil {
+		return nil, fmt.Errorf("error parsing proxy response: %w", err)
+	}
+
+	resp := &http.Response{
+		StatusCode: response.StatusCode,
+		Status:     strconv.Itoa(response.StatusCode) + " " + http.StatusText(response.StatusCode),
+		Body:       io.NopCloser(bytes.NewBufferString(response.Text)),
+		Header:     make(http.Header),
+		Request:    originalReq,
+	}
+
+	parsedResponseURL, err := url.Parse(response.URL)
+	if err != nil {
+		return nil, fmt.Errorf("error parsing response URL: %w", err)
+	}
+	resp.Request.URL = parsedResponseURL
+
+	for name, value := range response.Headers {
+		resp.Header.Set(name, value)
+	}
+
+	for _, cookie := range response.Cookies {
+		resp.Header.Add("Set-Cookie", cookie)
+	}
+
+	return resp, nil
+}
```
```diff
@@ -3,6 +3,7 @@ package util
 import (
 	"crypto/tls"
 	"net/http"
 	"time"
 )
 
 func ChromeClientHelloSpec() *tls.ClientHelloInfo {
@@ -72,13 +73,12 @@ func NewChromeClient() *http.Client {
 		Renegotiation: tls.RenegotiateNever,
 	}
 
-	transport := &http.Transport{
-		TLSClientConfig: tlsConfig,
-		// chrome enables HTTP/2
-		ForceAttemptHTTP2: true,
-	}
+	transport := GetBaseTransport()
+	transport.TLSClientConfig = tlsConfig
+	// chrome uses HTTP/2, but it's enabled by default in base transport
+
 	return &http.Client{
 		Transport: transport,
 		Timeout:   60 * time.Second,
 	}
 }
```
util/http.go (145 changed lines)

```diff
@@ -1,11 +1,8 @@
 package util
 
 import (
-	"bytes"
-	"fmt"
 	"govd/config"
 	"govd/models"
-	"io"
 	"log"
 	"net"
 	"net/http"
```
```diff
@@ -13,8 +10,6 @@ import (
 	"strings"
 	"sync"
 	"time"
-
-	"github.com/bytedance/sonic"
 )
 
 var (
```
```diff
@@ -26,14 +21,14 @@ var (
 func GetDefaultHTTPClient() *http.Client {
 	defaultClientOnce.Do(func() {
 		defaultClient = &http.Client{
-			Transport: createBaseTransport(),
+			Transport: GetBaseTransport(),
 			Timeout:   60 * time.Second,
 		}
 	})
 	return defaultClient
 }
 
-func createBaseTransport() *http.Transport {
+func GetBaseTransport() *http.Transport {
 	return &http.Transport{
 		Proxy: http.ProxyFromEnvironment,
 		DialContext: (&net.Dialer{
```
```diff
@@ -65,26 +60,27 @@ func GetHTTPClient(extractor string) models.HTTPClient {
 	var client models.HTTPClient
 
 	if cfg.EdgeProxyURL != "" {
-		client = NewEdgeProxyClient(cfg.EdgeProxyURL)
+		client = NewEdgeProxyFromConfig(cfg)
 	} else {
-		client = createClientWithProxy(cfg)
+		client = NewClientFromConfig(cfg)
 	}
 
 	extractorClients[extractor] = client
 	return client
 }
 
-func createClientWithProxy(cfg *models.ExtractorConfig) *http.Client {
-	transport := createBaseTransport()
-
+func NewClientFromConfig(cfg *models.ExtractorConfig) *http.Client {
+	var baseClient *http.Client
+	if cfg.Impersonate {
+		baseClient = NewChromeClient()
+	} else {
+		baseClient = GetDefaultHTTPClient()
+	}
+	transport := GetBaseTransport()
 	if cfg.HTTPProxy != "" || cfg.HTTPSProxy != "" {
 		configureProxyTransport(transport, cfg)
 	}
 
-	return &http.Client{
-		Transport: transport,
-		Timeout:   60 * time.Second,
-	}
+	baseClient.Transport = transport
+	return baseClient
 }
 
 func configureProxyTransport(
```
```diff
@@ -100,20 +96,16 @@ func configureProxyTransport(
 			log.Printf("warning: invalid HTTP proxy URL '%s': %v\n", cfg.HTTPProxy, err)
 		}
 	}
 
 	if cfg.HTTPSProxy != "" {
 		httpsProxyURL, err = url.Parse(cfg.HTTPSProxy)
 		if err != nil {
 			log.Printf("warning: invalid HTTPS proxy URL '%s': %v\n", cfg.HTTPSProxy, err)
 		}
 	}
 
-	if httpProxyURL == nil && httpsProxyURL == nil {
-		return
-	}
-
 	noProxyList := parseNoProxyList(cfg.NoProxy)
 
 	transport.Proxy = func(req *http.Request) (*url.URL, error) {
 		if shouldBypassProxy(req.URL.Hostname(), noProxyList) {
 			return nil, nil
```
```diff
@@ -155,112 +147,3 @@ func shouldBypassProxy(host string, noProxyList []string) bool {
 	}
 	return false
 }
-
-type EdgeProxyClient struct {
-	client   *http.Client
-	proxyURL string
-}
-
-func NewEdgeProxyClient(proxyURL string) *EdgeProxyClient {
-	return &EdgeProxyClient{
-		client: &http.Client{
-			Transport: createBaseTransport(),
-			Timeout:   60 * time.Second,
-		},
-		proxyURL: proxyURL,
-	}
-}
-
-func (c *EdgeProxyClient) Do(req *http.Request) (*http.Response, error) {
-	if c.proxyURL == "" {
-		return nil, fmt.Errorf("proxy URL is not set")
-	}
-
-	targetURL := req.URL.String()
-	encodedURL := url.QueryEscape(targetURL)
-	proxyURLWithParam := c.proxyURL + "?url=" + encodedURL
-
-	bodyBytes, err := readRequestBody(req)
-	if err != nil {
-		return nil, err
-	}
-
-	proxyReq, err := http.NewRequest(
-		req.Method,
-		proxyURLWithParam,
-		bytes.NewBuffer(bodyBytes),
-	)
-	if err != nil {
-		return nil, fmt.Errorf("error creating proxy request: %w", err)
-	}
-
-	copyHeaders(req.Header, proxyReq.Header)
-
-	proxyResp, err := c.client.Do(proxyReq)
-	if err != nil {
-		return nil, fmt.Errorf("proxy request failed: %w", err)
-	}
-	defer proxyResp.Body.Close()
-
-	return parseProxyResponse(proxyResp, req)
-}
-
-func readRequestBody(req *http.Request) ([]byte, error) {
-	if req.Body == nil {
-		return nil, nil
-	}
-
-	bodyBytes, err := io.ReadAll(req.Body)
-	if err != nil {
-		return nil, fmt.Errorf("error reading request body: %w", err)
-	}
-
-	req.Body.Close()
-	req.Body = io.NopCloser(bytes.NewBuffer(bodyBytes))
-
-	return bodyBytes, nil
-}
-
-func copyHeaders(source, destination http.Header) {
-	for name, values := range source {
-		for _, value := range values {
-			destination.Add(name, value)
-		}
-	}
-}
-
-func parseProxyResponse(proxyResp *http.Response, originalReq *http.Request) (*http.Response, error) {
-	body, err := io.ReadAll(proxyResp.Body)
-	if err != nil {
-		return nil, fmt.Errorf("error reading proxy response: %w", err)
-	}
-
-	var response models.ProxyResponse
-	if err := sonic.ConfigFastest.Unmarshal(body, &response); err != nil {
-		return nil, fmt.Errorf("error parsing proxy response: %w", err)
-	}
-
-	resp := &http.Response{
-		StatusCode: response.StatusCode,
-		Status:     fmt.Sprintf("%d %s", response.StatusCode, http.StatusText(response.StatusCode)),
-		Body:       io.NopCloser(bytes.NewBufferString(response.Text)),
-		Header:     make(http.Header),
-		Request:    originalReq,
-	}
-
-	parsedResponseURL, err := url.Parse(response.URL)
-	if err != nil {
-		return nil, fmt.Errorf("error parsing response URL: %w", err)
-	}
-	resp.Request.URL = parsedResponseURL
-
-	for name, value := range response.Headers {
-		resp.Header.Set(name, value)
-	}
-
-	for _, cookie := range response.Cookies {
-		resp.Header.Add("Set-Cookie", cookie)
-	}
-
-	return resp, nil
-}
```