Pipe out the archives directly from git-archive(1) Using a crude for i in $(seq 1 10); do curl -w '%{time_connect}:%{time_starttransfer}:%{time_total}\n' http://127.0.0.1:5001/~nabijaczleweli/linux/archive/HEAD.tar.gz --output /dev/null >> timing; done where the HEAD was at the v3.0 tag of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git yielded 0.001489:45.133830:46.793669 0.001673:27.780585:29.399757 0.001416:27.351536:29.024689 0.001388:27.486558:29.180511 0.002239:27.299490:28.925065 0.001342:27.041805:28.740544 0.001558:27.465697:29.030950 0.001546:28.010680:29.604439 0.001819:27.551466:29.148465 0.001145:27.407098:29.040098 0.001493:27.597439:29.094110 0.001772:27.429221:29.095903 before this and 0.001991:0.285543:28.874766 0.001626:0.186180:28.290034 0.001195:0.196463:28.570427 0.001632:0.182806:29.050415 0.001598:0.184604:29.399892 0.001398:0.192858:29.184659 0.001458:0.186850:29.141446 0.002366:0.194297:28.997083 0.001390:0.184061:29.152253 0.001932:0.219032:29.557687 0.001435:0.182397:28.982165 after; stripping the obvious outlier at the top, this averages out to 0.001581 :27.4928704545455 :29.1167755454545 0.00163827272727273:0.199553727272727:29.018257 Note the *27.5 seconds* to first byte, as gzip was writing to tmpfs first (qemu-system-x86_64 -enable-kvm -smp 6 -m 4g -drive format=raw -device virtio-blk-pci,drive=rootfs -net nic,model=virtio-net-pci on a two-Xeon E5645 @ 2.4GHz + six-16GB HMT42GR7AFR4A-PB @ 1600MT/s host, gzip was at 100% CPU, git hovered around 22%), but low overall impact on total transfer time. However, the cURL output also reveals another set of data: % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 92.5M 0 92.5M 0 0 3223k 0 --:--:-- 0:00:29 --:--:-- 22.2M 100 92.5M 0 92.5M 0 0 3265k 0 --:--:-- 0:00:29 --:--:-- 24.4M 100 92.5M 0 92.5M 0 0 3247k 0 --:--:-- 0:00:29 --:--:-- 23.4M 100 92.5M 0 92.5M 0 0 3276k 0 --:--:-- 0:00:28 --:--:-- 25.1M 100 92.5M 0 92.5M 0 0 3297k 0 --:--:-- 0:00:28 --:--:-- 20.5M 100 92.5M 0 92.5M 0 0 3264k 0 --:--:-- 0:00:29 --:--:-- 19.3M 100 92.5M 0 92.5M 0 0 3201k 0 --:--:-- 0:00:29 --:--:-- 21.2M 100 92.5M 0 92.5M 0 0 3251k 0 --:--:-- 0:00:29 --:--:-- 23.6M 100 92.5M 0 92.5M 0 0 3263k 0 --:--:-- 0:00:29 --:--:-- 24.3M 100 92.5M 0 92.5M 0 0 3257k 0 --:--:-- 0:00:29 --:--:-- 19.0M before and % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 92.5M 0 92.5M 0 0 3349k 0 --:--:-- 0:00:28 --:--:-- 3134k 100 92.5M 0 92.5M 0 0 3317k 0 --:--:-- 0:00:28 --:--:-- 3478k 100 92.5M 0 92.5M 0 0 3262k 0 --:--:-- 0:00:29 --:--:-- 3322k 100 92.5M 0 92.5M 0 0 3223k 0 --:--:-- 0:00:29 --:--:-- 3174k 100 92.5M 0 92.5M 0 0 3247k 0 --:--:-- 0:00:29 --:--:-- 3369k 100 92.5M 0 92.5M 0 0 3252k 0 --:--:-- 0:00:29 --:--:-- 3132k 100 92.5M 0 92.5M 0 0 3268k 0 --:--:-- 0:00:28 --:--:-- 3305k 100 92.5M 0 92.5M 0 0 3250k 0 --:--:-- 0:00:29 --:--:-- 3216k 100 92.5M 0 92.5M 0 0 3206k 0 --:--:-- 0:00:29 --:--:-- 3084k 100 92.5M 0 92.5M 0 0 3269k 0 --:--:-- 0:00:28 --:--:-- 3164k after, and the speed in the after run was relatively constant, but, rather predictably, the before speed just filled the SSH tunnel, so I'm expecting huge differences for users across slow links, with changes to the time equation from 92.5MB/3MBs^-1 + 92.5MB/link_speed to 92.5MB/max(3MBs^-1, link_speed) The one potential downside of this approach is that we can no longer return a 500 if git returns non-0, but I doubt that's a common occurrence
1 files changed, 11 insertions(+), 29 deletions(-) M gitsrht/blueprints/repo.py
M gitsrht/blueprints/repo.py => gitsrht/blueprints/repo.py +11 -29
@@ 288,36 288,18 @@ def archive(owner, repo, ref): if not isinstance(commit, pygit2.Commit): abort(404) path = f"/tmp/{commit.id.hex}{binascii.hexlify(os.urandom(8))}.tar.gz" try: args = [ "git", "--git-dir", repo.path, "archive", "--format=tar.gz", "--prefix", f"{repo.name}-{ref}/", "-o", path, ref ] subp = subprocess.run(args, timeout=30, stdout=sys.stdout, stderr=sys.stderr) except: try: os.unlink(path) except: pass raise if subp.returncode != 0: try: os.unlink(path) except: pass return "Error preparing archive", 500 args = [ "git", "--git-dir", repo.path, "archive", "--format=tar.gz", "--prefix", f"{repo.name}-{ref}/", ref ] subp = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=sys.stderr) f = open(path, "rb") os.unlink(path) return send_file(f, mimetype="application/tar+gzip", as_attachment=True, attachment_filename=f"{repo.name}-{ref}.tar.gz") return send_file(subp.stdout, mimetype="application/tar+gzip", as_attachment=True, attachment_filename=f"{repo.name}-{ref}.tar.gz") class _AnnotatedRef: def __init__(self, repo, ref):