Python's packaging is still a mess.

I need to download source distributions so I can build them later on a remote system. Python's package manager, pip, turns out to be useless for this. It has no programmatic API and can't handle this simple task even when run as a subprocess. Even when I ask it not to download a package's dependencies, it still collects the package's metadata. For source distributions that means executing the package's build code, which means installing build dependencies, which means wasting a ton of time.


All that to download somewhere between a few KB and a few MB of data. So I need to do it manually, but the package index API, while technically "simple" (it's in the name), requires me to parse HTML, which is slow and really uncomfortable to do with the stdlib parser (or any other popular parser I've seen).

Oh, and did I mention that the API is so bare that it doesn't even contain *any* metadata? Yes, really. Not only does it provide no information about dependencies (you have to download the package for that), it also lacks version, format, and platform data! You have to extract all of that from the file names, and I've yet to see where that's standardized (there's a spec, but it's just a draft and nobody follows it).

Try to make sense of this:

I imagine the only "standard" is whatever pip does, which means digging through its sources and copying its implementation, because it doesn't expose a function for parsing file names (or anything else, for that matter).

Or you can try to guess how it works on your own. Because that's definitely how one should tackle this problem. By figuring out the pattern and parsing it with a half-assed regex, right? I mean, how else would you figure out which file to download?
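
For illustration, this is roughly what that guesswork looks like. It's a minimal sketch, not what pip does: the extension list and the function `guess_sdist_version` are my own assumptions, and the only part taken from a spec is the name normalization rule from PEP 503. Since both the name and the version may contain hyphens, it tries every hyphen as the boundary and keeps the split whose name part matches the project being looked up.

```python
import re

# Extensions commonly seen on source distributions; an assumption, not a spec.
SDIST_EXTENSIONS = (".tar.gz", ".tar.bz2", ".tar.xz", ".tgz", ".zip")


def normalize_name(name):
    """Normalize a project name using the rule given in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def guess_sdist_version(filename, project):
    """Guess the version from an sdist file name; return None if unsure."""
    for ext in SDIST_EXTENSIONS:
        if filename.lower().endswith(ext):
            stem = filename[: -len(ext)]
            break
    else:
        return None  # not an extension we recognize as an sdist

    # Try each hyphen as the name/version boundary and accept the first
    # split whose name part normalizes to the project we asked for.
    wanted = normalize_name(project)
    for match in re.finditer("-", stem):
        name, version = stem[: match.start()], stem[match.end():]
        if normalize_name(name) == wanted and version:
            return version
    return None
```

Calling `guess_sdist_version("my_package-1.2.3.tar.gz", "my-package")` gives `"1.2.3"`, while a wheel file name gives `None`. It's still a heuristic: nothing here checks that the version part is actually a valid version.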

I forgot PEP 503 does actually provide two kinds of metadata. One is a Python version constraint provided in the `data-requires-python` attribute of the HTML `a` element pointing to a specific file. There are also checksums of the files. Where? Of course in the fragment part of the URL pointing to the file (the part after `#`). That's it.
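
Pulling those two bits out with the stdlib parser could look something like this. It's a sketch under assumptions: the class `SimpleIndexParser`, the helper `parse_project_page`, and the dictionary layout are names I made up, and it assumes the fragment has the form `<hashname>=<hexdigest>`.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urldefrag, urljoin, urlsplit


class SimpleIndexParser(HTMLParser):
    """Collect file links from a PEP 503 project page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.files = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # Links may be relative; the checksum sits in the URL fragment.
        url, fragment = urldefrag(urljoin(self.base_url, href))
        hash_name, _, digest = fragment.partition("=")
        self.files.append({
            "url": url,
            "filename": unquote(urlsplit(url).path.rsplit("/", 1)[-1]),
            "hash": (hash_name, digest) if digest else None,
            # html.parser has already unescaped entities such as "&gt;".
            "requires_python": attrs.get("data-requires-python"),
        })


def parse_project_page(html, base_url):
    """Return one dict per file link found in the page."""
    parser = SimpleIndexParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.files
```

Feed it the HTML of a project page together with the URL it was fetched from, and each entry has the absolute file URL, the file name, the `(hash_name, digest)` pair if there was one, and the raw `data-requires-python` string. Everything else still has to come from the file name.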

There are also other APIs that PyPI provides that pip probably uses. The problem is that you might not always be using PyPI. That's the case for me and my package index supports none of the fancy APIs. PEP 503 is the lowest common denominator and that's kind of what we're stuck with.

It's also worth mentioning that, IIRC, PyPI's APIs don't have full coverage of all packages. So for some packages you might get complete information in an easily digestible format, while for others some of it might be missing.
