This repository has been archived by the owner on Sep 11, 2020. It is now read-only.

As a server Shortcuts/Optimisations? #627

Open
freman opened this issue Oct 25, 2017 · 3 comments

freman commented Oct 25, 2017

Hi

I'm trying to write a caching proxy server that keeps local copies of the repos from GitHub (and other places) that we use heavily, but I'm running into a performance issue.

I'm more than happy to concede that we won't beat GitHub for speed, but I'm finding this to be a great deal slower.

This is a greatly simplified version of what I'm running in the main codebase:

package main

import (
	"compress/gzip"
	"io"
	"net/http"
	"os"
	"path"
	"strings"

	"gopkg.in/src-d/go-git.v4/plumbing/protocol/packp"
	"gopkg.in/src-d/go-git.v4/plumbing/transport"
	"gopkg.in/src-d/go-git.v4/plumbing/transport/server"
)

func main() {
	// Repositories are looked up relative to the working directory.
	wd, _ := os.Getwd()
	// Errors are ignored throughout to keep the example short.
	http.ListenAndServe(":8822", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Cache-Control", "no-cache")

		// The first path segment names the repository, e.g. /aws-sdk-go/info/refs.
		s := strings.SplitN(strings.TrimLeft(r.URL.Path, "/"), "/", 2)
		ep, _ := transport.NewEndpoint(path.Join("file://", wd, s[0]))
		ups, _ := server.DefaultServer.NewUploadPackSession(ep, nil)

		// Reference discovery: reply with the advertised refs.
		if strings.Contains(r.URL.Path, "info") {
			advs, _ := ups.AdvertisedReferences()
			advs.Prefix = [][]byte{
				[]byte("# service=git-upload-pack"),
				[]byte(""),
			}
			w.Header().Set("Content-Type", "application/x-git-upload-pack-advertisement")
			advs.Encode(w)
			return
		}

		// Otherwise this is the upload-pack request itself; the body may be gzipped.
		defer r.Body.Close()
		var rdr io.ReadCloser = r.Body

		if r.Header.Get("Content-Encoding") == "gzip" {
			rdr, _ = gzip.NewReader(r.Body)
		}

		// Decode the client's request, then build and stream the packfile.
		upakreq := packp.NewUploadPackRequest()
		upakreq.Decode(rdr)

		up, _ := ups.UploadPack(r.Context(), upakreq)
		w.Header().Set("Content-Type", "application/x-git-upload-pack-result")
		up.Encode(w)
	}))
}

If you clone the aws/aws-sdk-go repo into the same directory as this Go file:

git clone --quiet --mirror https://github.com/aws/aws-sdk-go aws-sdk-go

and then start the server with go run main.go, you can run the following tests:

$ time git clone --quiet https://github.com/aws/aws-sdk-go

real	0m23.816s
user	0m4.047s
sys	0m1.289s
$ time git clone --quiet http://localhost:8822/aws-sdk-go

real	2m18.718s
user	0m3.793s
sys	0m1.165s

The time cost here is entirely in cloning from scratch; pulling seems plenty fast.

I did profile my code and found that it spent most of its time in encoder.go.
[image: profiler output]
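For anyone wanting to reproduce a profile like this, here is a minimal sketch of one way to capture a CPU profile with the standard runtime/pprof package. The original post does not say how the profile was taken, so this is an assumption; profileCPU and busyWork are hypothetical names, and busyWork stands in for the real workload (for example, serving one clone).

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// profileCPU runs fn while recording a CPU profile to the file at path.
// (Hypothetical helper, not part of go-git.)
func profileCPU(path string, fn func()) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		return err
	}
	// Deferred calls run in reverse order, so the profile is flushed
	// before the file is closed.
	defer pprof.StopCPUProfile()
	fn()
	return nil
}

// busyWork is a placeholder for the code being measured.
func busyWork() int {
	sum := 0
	for i := 0; i < 50000000; i++ {
		sum += i % 7
	}
	return sum
}

func main() {
	if err := profileCPU("cpu.prof", func() { busyWork() }); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("wrote cpu.prof; inspect it with: go tool pprof cpu.prof")
}
```

In a real server the function passed to profileCPU would wrap whatever serves the clone, and go tool pprof would then show which functions dominate.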

Can anyone think of a way to shortcut this process for a clone, if not optimise the code?

If, instead of using git clone --mirror to create the aws-sdk-go directory, I use go-git to clone it, I even get pre-packed files:

./objects/pack/pack-41174c775d8b7f517d5db3c20d52b0e5379fe9de.idx
./objects/pack/pack-41174c775d8b7f517d5db3c20d52b0e5379fe9de.pack

Perhaps for a fresh clone I can just ship that?

freman (Author) commented Oct 25, 2017

Out of curiosity, I tested piping to and from git-upload-pack:

$ time git clone --quiet http://localhost:8822/aws-sdk-go

real	0m3.545s
user	0m3.573s
sys	0m0.790s

This is the result I was kinda hoping for, but I'd still be happy with GitHub-ish speed.
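The code used for that test is not shown, so the following is only a sketch of how piping to and from git-upload-pack could look, assuming git is on the PATH and the repositories are bare clones under a single root directory. The names pktLine, repoDir and handler are hypothetical; the --stateless-rpc and --advertise-refs flags are the ones git's own smart HTTP backend relies on.

```go
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// pktLine frames s as a Git pkt-line: four hex digits giving the total
// length (payload plus the 4-byte header), followed by the payload.
func pktLine(s string) string {
	return fmt.Sprintf("%04x%s", len(s)+4, s)
}

// repoDir maps a URL path such as "/aws-sdk-go/info/refs" to a directory
// under root, using only the first path segment.
func repoDir(root, urlPath string) (string, bool) {
	name := strings.SplitN(strings.TrimLeft(urlPath, "/"), "/", 2)[0]
	if name == "" || name == "." || name == ".." {
		return "", false
	}
	return filepath.Join(root, name), true
}

// handler serves fetches and clones only, by delegating to git upload-pack.
func handler(root string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		dir, ok := repoDir(root, r.URL.Path)
		if !ok {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Cache-Control", "no-cache")

		// Reference discovery: service header, flush packet, then the refs.
		if strings.HasSuffix(r.URL.Path, "/info/refs") {
			w.Header().Set("Content-Type", "application/x-git-upload-pack-advertisement")
			io.WriteString(w, pktLine("# service=git-upload-pack\n")+"0000")
			cmd := exec.CommandContext(r.Context(), "git", "upload-pack",
				"--stateless-rpc", "--advertise-refs", dir)
			cmd.Stdout = w
			if err := cmd.Run(); err != nil {
				log.Printf("advertise-refs: %v", err)
			}
			return
		}

		// Pack negotiation: the request body goes to git's stdin.
		var body io.Reader = r.Body
		if r.Header.Get("Content-Encoding") == "gzip" {
			zr, err := gzip.NewReader(r.Body)
			if err != nil {
				http.Error(w, err.Error(), http.StatusBadRequest)
				return
			}
			defer zr.Close()
			body = zr
		}
		w.Header().Set("Content-Type", "application/x-git-upload-pack-result")
		cmd := exec.CommandContext(r.Context(), "git", "upload-pack", "--stateless-rpc", dir)
		cmd.Stdin, cmd.Stdout = body, w
		if err := cmd.Run(); err != nil {
			log.Printf("upload-pack: %v", err)
		}
	})
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: go run main.go <directory containing bare repositories>")
		return
	}
	log.Fatal(http.ListenAndServe(":8822", handler(os.Args[1])))
}
```

Run from the directory holding the mirror, this would be started with go run main.go . and cloned from with git clone http://localhost:8822/aws-sdk-go. Delegating to git lets it reuse the deltas already stored in the on-disk packfile, which is presumably why this path is so much faster.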

mcuadros (Contributor) commented Nov 2, 2017

The problem is that the packfile and all of its deltas are being calculated, and this is an expensive operation.

mcuadros (Contributor) commented Dec 20, 2017

Just a bit more information about the evolution of the problem:

The baseline is a local git server serving the example repository, aws-sdk-go, started with git daemon --verbose --base-path=/tmp --export-all /tmp/aws-sdk-go

git clone

Baseline, git daemon (0:04.80elapsed)

time git clone git://localhost/aws-sdk-go
Cloning into 'aws-sdk-go'...
remote: Counting objects: 43180, done.
remote: Compressing objects: 100% (13799/13799), done.
remote: Total 43180 (delta 25368), reused 43176 (delta 25366)
Receiving objects: 100% (43180/43180), 47.17 MiB | 29.56 MiB/s, done.
Resolving deltas: 100% (25368/25368), done.
9.01user 0.40system 0:04.80elapsed 196%CPU (0avgtext+0avgdata 142104maxresident)k
816inputs+251424outputs (0major+48256minor)pagefaults 0swaps

After #697 (1:10.97elapsed)

time git clone http://localhost:8080/aws-sdk-go
Cloning into 'aws-sdk-go'...
Receiving objects: 100% (43180/43180), 43.85 MiB | 3.61 MiB/s, done.
Resolving deltas: 100% (26823/26823), done.
9.35user 0.50system 1:10.97elapsed 13%CPU (0avgtext+0avgdata 159668maxresident)k
0inputs+0outputs (0major+50427minor)pagefaults 0swaps

Before #697 (2:28.55elapsed)

time git clone http://localhost:8080/aws-sdk-go
Cloning into 'aws-sdk-go'...
Receiving objects: 100% (43180/43180), 55.80 MiB | 3.65 MiB/s, done.
Resolving deltas: 100% (24464/24464), done.
9.82user 0.63system 2:28.55elapsed 7%CPU (0avgtext+0avgdata 135676maxresident)k
0inputs+0outputs (0major+36211minor)pagefaults 0swaps

git fetch origin v0.6.0

Baseline, git daemon (0:00.98elapsed):

time git fetch origin v0.6.0
remote: Counting objects: 12247, done.
remote: Compressing objects: 100% (4166/4166), done.
remote: Total 12247 (delta 6741), reused 12233 (delta 6741)
Receiving objects: 100% (12247/12247), 8.89 MiB | 27.17 MiB/s, done.
Resolving deltas: 100% (6741/6741), done.
From git://localhost/aws-sdk-go
 * tag               v0.6.0     -> FETCH_HEAD
1.52user 0.09system 0:00.98elapsed 165%CPU (0avgtext+0avgdata 9148maxresident)k
0inputs+18896outputs (0major+4389minor)pagefaults 0swaps

After #697 (0:11.95elapsed)

time git fetch origin v0.6.0
Receiving objects: 100% (12247/12247), 8.13 MiB | 2.62 MiB/s, done.
Resolving deltas: 100% (7324/7324), done.
From http://localhost:8080/aws-sdk-go
 * tag               v0.6.0     -> FETCH_HEAD
1.65user 0.14system 0:11.95elapsed 15%CPU (0avgtext+0avgdata 9888maxresident)k
0inputs+0outputs (0major+5590minor)pagefaults 0swaps

Before #697 (0:19.10elapsed)

time git fetch origin v0.6.0
Receiving objects: 100% (12247/12247), 8.38 MiB | 3.01 MiB/s, done.
Resolving deltas: 100% (6967/6967), done.
From http://localhost:8080/aws-sdk-go
 * tag               v0.6.0     -> FETCH_HEAD
1.63user 0.08system 0:19.10elapsed 8%CPU (0avgtext+0avgdata 12980maxresident)k
0inputs+0outputs (0major+5480minor)pagefaults 0swaps
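To put the elapsed times above side by side, here is a small sketch that turns them into ratios. The helper names seconds and ratio are made up for illustration; the numbers are copied from the runs above. By this arithmetic #697 makes the clone about 2.1x faster and the fetch about 1.6x faster, while both remain more than ten times slower than git daemon.

```go
package main

import "fmt"

// seconds converts an elapsed time in the m:ss.ss form printed by time
// into seconds.
func seconds(m int, s float64) float64 {
	return float64(m)*60 + s
}

// ratio reports how many times longer slow took than fast.
func ratio(slow, fast float64) float64 {
	return slow / fast
}

func main() {
	// Elapsed times copied from the measurements above.
	runs := []struct {
		name                    string
		baseline, after, before float64
	}{
		{"git clone", seconds(0, 4.80), seconds(1, 10.97), seconds(2, 28.55)},
		{"git fetch origin v0.6.0", seconds(0, 0.98), seconds(0, 11.95), seconds(0, 19.10)},
	}
	for _, r := range runs {
		fmt.Printf("%s: #697 speedup %.2fx, still %.1fx slower than git daemon\n",
			r.name, ratio(r.before, r.after), ratio(r.after, r.baseline))
	}
}
```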

smola mentioned this issue Jul 2, 2018