<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
    <title>Blog</title>
    <link rel="self" type="application/atom+xml" href="/atom.xml"/>
    <link rel="alternate" type="text/html" href="/"/>
    <generator uri="https://www.getzola.org/">Zola</generator>
    <updated>2026-03-12T00:00:00+00:00</updated>
    <id>/atom.xml</id>
    <entry xml:lang="en">
        <title>What makes uv slow?</title>
        <published>2026-03-12T00:00:00+00:00</published>
        <updated>2026-03-12T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Konstantin Schütze
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="/uv-is-slow/"/>
        <id>/uv-is-slow/</id>
        
        <content type="html" xml:base="/uv-is-slow/">&lt;p&gt;A lot has been said about uv being fast, but not about how and when uv is slow. Let&#x27;s talk about both.&lt;&#x2F;p&gt;
&lt;p&gt;The sections are in no particular order, except the first two (overview and benchmarks) and the last one (tooling).&lt;&#x2F;p&gt;
&lt;h2 id=&quot;overview&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#overview&quot; aria-label=&quot;Anchor link for: overview&quot;&gt;Overview&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;We get a lot of questions about whether there&#x27;s some sort of hidden magic or shortcuts we&#x27;re taking, but in practice it&#x27;s trying optimizations that worked well for others, plus a loop of &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;mstange&#x2F;samply&quot;&gt;profile&lt;&#x2F;a&gt;, &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;jyn.dev&#x2F;i-m-just-having-fun&#x2F;&quot;&gt;poke at the code&lt;&#x2F;a&gt;, &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;sharkdp&#x2F;hyperfine&quot;&gt;benchmark&lt;&#x2F;a&gt;, PR, repeat.&lt;&#x2F;p&gt;
&lt;p&gt;If you&#x27;re mainly interested in the resolver algorithm, head over to the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.astral.sh&#x2F;uv&#x2F;reference&#x2F;internals&#x2F;resolver&#x2F;&quot;&gt;uv internals docs&lt;&#x2F;a&gt;, the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;nex3.medium.com&#x2F;pubgrub-2fb6470504f&quot;&gt;pubgrub introduction&lt;&#x2F;a&gt; and the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;pubgrub-rs-guide.pages.dev&#x2F;internals&#x2F;overview&quot;&gt;pubgrub-rs guide&lt;&#x2F;a&gt;. They cover the subject in depth; I&#x27;ll only touch on it briefly here.&lt;&#x2F;p&gt;
&lt;p&gt;In general, what&#x27;s slow is always roughly the same, from high-level to low-level:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Invoking external (build) tools&lt;&#x2F;li&gt;
&lt;li&gt;Network requests&lt;&#x2F;li&gt;
&lt;li&gt;CPU-heavy code, such as (de)compression, hashing, parsing and constraint solving&lt;&#x2F;li&gt;
&lt;li&gt;Disk reads and writes&lt;&#x2F;li&gt;
&lt;li&gt;Single-threaded and synchronous code&lt;&#x2F;li&gt;
&lt;li&gt;Allocating, memory copies and syscalls&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Optimizations happen roughly in that order: For example, the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;pythonwheels.com&#x2F;&quot;&gt;transition from source distributions to wheels&lt;&#x2F;a&gt; removed the need to invoke build tools each time, then caching wheels massively reduced network IO, an unzipped cache layout reduced the decompression burden, and reflinking&#x2F;hardlinking turns file reads&#x2F;writes into link reads&#x2F;writes. Each of those is sped up proportionally by multithreading and IO concurrency, while local optimizations such as avoiding allocations, memoization and faster hashmaps remove the remaining bottlenecks throughout. uv uses existing heavily optimized implementations of staple CPU-heavy algorithms ((de)compression, hashing, http parsing, etc.)&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-24-1&quot;&gt;&lt;a href=&quot;#fn-24&quot;&gt;1&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;; classic &quot;Big O&quot; optimizations are rare. The one exception is pubgrub, which exists as a library outside uv.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;benchmarks&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#benchmarks&quot; aria-label=&quot;Anchor link for: benchmarks&quot;&gt;Benchmarks&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Throughout this article, I&#x27;m using three main benchmarks: airflow, boto3 and homeassistant.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;apache&#x2F;airflow&#x2F;&quot;&gt;Apache Airflow&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; is a massive workspace with 126 &lt;code&gt;apache-*&lt;&#x2F;code&gt; packages and 892 total packages in the resolved workspace. Resolving its dependencies requires some backtracking and includes some source distributions. It&#x27;s useful as the biggest real example that&#x27;s freely available.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;boto&#x2F;boto3&quot;&gt;boto3&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; is the official interface for AWS. It&#x27;s one of the most downloaded python packages, and creates a new release every time AWS changes. boto3 depends on botocore, which depends on a recent urllib3 version. When resolving with an old urllib3 version, such as &lt;code&gt;urllib3&amp;lt;1.25.4 boto3&lt;&#x2F;code&gt;, the resolver needs to do a lot of backtracking through boto3 and botocore versions to find a boto3 version that depends on a botocore version that is compatible with &lt;code&gt;urllib3&amp;lt;1.25.4&lt;&#x2F;code&gt;. This is something that happens to users when they depend on one package that can use data stored on S3 and another one that requires a specific urllib3 range, so we adopted it as a backtracking and parallel-networking benchmark&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-19-1&quot;&gt;&lt;a href=&quot;#fn-19&quot;&gt;2&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.home-assistant.io&#x2F;&quot;&gt;Homeassistant&lt;&#x2F;a&gt;&lt;&#x2F;strong&gt; is a home automation platform that can connect to just about every smart home device there is. Supporting all those diverse protocols requires a &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;home-assistant&#x2F;core&#x2F;blob&#x2F;dev&#x2F;requirements_all.txt&quot;&gt;large number of dependencies&lt;&#x2F;a&gt;, and &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;developers.home-assistant.io&#x2F;blog&#x2F;2024&#x2F;04&#x2F;03&#x2F;build-images-with-uv&#x2F;&quot;&gt;install performance matters a lot&lt;&#x2F;a&gt;. Despite the large number of dependencies, it&#x27;s not particularly interesting as a resolution: the dependency tree is rather flat and &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;home-assistant&#x2F;core&#x2F;blob&#x2F;bbe1bf14aeec2315002133ee1da0cf394d7d7594&#x2F;script&#x2F;gen_requirements_all.py&quot;&gt;heavily constrained&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;If you have cases that are particularly slow (other than source distribution builds), please let me know!&lt;&#x2F;p&gt;
&lt;p&gt;To give you a sense of the numbers, let&#x27;s compare a Linux desktop (fiber), a Raspberry Pi 4B (fiber) and a Windows laptop (wifi).&lt;&#x2F;p&gt;
&lt;p&gt;A cold cache &lt;code&gt;uv sync&lt;&#x2F;code&gt; in airflow takes 54s on my desktop and 10.5min on the Raspberry Pi. It&#x27;s dominated by building two large packages; part of that are two parallel builds (krb5 and gssapi) that take 44s and 22s respectively, the bottleneck for the whole process. After the installation, &lt;code&gt;uv run python -V&lt;&#x2F;code&gt; takes 171ms on my desktop and 1.1s on the Raspberry Pi.&lt;&#x2F;p&gt;
&lt;p&gt;Plotly is ideal as a small installation-speed benchmark: It&#x27;s a popular package with a lot of files, but only two dependencies, all three with platform-independent builds. Installing plotly with a cold cache takes 310ms on the desktop, 2.3s on the Raspberry Pi and 4.8s on the Windows laptop. With a warm cache, we get to skip the first phase with network requests and the download, and it goes down to 40ms on the desktop, 120ms on the Raspberry Pi and 900ms on the Windows laptop. Windows is slow due to NTFS; on a ReFS dev drive it takes only 450ms. &lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-25-1&quot;&gt;&lt;a href=&quot;#fn-25&quot;&gt;3&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;toml-parsing&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#toml-parsing&quot; aria-label=&quot;Anchor link for: toml-parsing&quot;&gt;Toml parsing&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;TOML is a file format that&#x27;s easy to write, doesn&#x27;t have a lot of complexity and is compatible with a lot of JSON tooling such as jsonschema. uv chose TOML for &lt;code&gt;uv.lock&lt;&#x2F;code&gt; (just like &lt;code&gt;poetry.lock&lt;&#x2F;code&gt; and &lt;code&gt;pylock.toml&lt;&#x2F;code&gt;) for being human-readable and, more importantly, for generating good diffs when updating dependencies. Those are important for auditing dependency changes, such as through dependabot or renovate. &lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-21-1&quot;&gt;&lt;a href=&quot;#fn-21&quot;&gt;4&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;&lt;&#x2F;p&gt;
&lt;p&gt;While JSON parsers can do &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;serde-rs&#x2F;json-benchmark&quot;&gt;1GB&#x2F;s parsing on a 10 year old laptop&lt;&#x2F;a&gt;, Rust&#x27;s toml parser unfortunately is &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;epage.github.io&#x2F;blog&#x2F;2025&#x2F;07&#x2F;toml-09&#x2F;&quot;&gt;slow&lt;&#x2F;a&gt;&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-1-1&quot;&gt;&lt;a href=&quot;#fn-1&quot;&gt;5&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;. For a large project such as apache airflow&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-6-1&quot;&gt;&lt;a href=&quot;#fn-6&quot;&gt;6&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;, it takes uv ~30ms just to read the lockfile. For comparison, &lt;code&gt;print(&quot;hello world :3&quot;)&lt;&#x2F;code&gt; takes 10-15ms&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-11-1&quot;&gt;&lt;a href=&quot;#fn-11&quot;&gt;7&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a href=&quot;&#x2F;uv-is-slow&#x2F;airflow-spans.svg&quot;&gt;&lt;img src=&quot;&#x2F;uv-is-slow&#x2F;airflow-spans.png&quot; alt=&quot;An export from tracing-durations-export showing that uv.lock reading takes a lot of time, and then all the workspace pyproject.toml take a lot of time&quot; &#x2F;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Figure: &lt;code&gt;uv run python -V&lt;&#x2F;code&gt; in a fully installed airflow checkout. The x-axis is time, each blue bar represents a span of the kind named on the left. In this case, the top bar is parsing &lt;code&gt;uv.lock&lt;&#x2F;code&gt;, while the many small bars are parsing the workspace member &lt;code&gt;pyproject.toml&lt;&#x2F;code&gt; files to decide whether the lock needs to be updated (satisfies). Click the image for a (large) svg with interactive tooltips showing the path for each parsed file.&lt;&#x2F;p&gt;
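&lt;p&gt;To make the cost concrete, here&#x27;s a minimal sketch of lockfile-style parsing with the &lt;code&gt;toml&lt;&#x2F;code&gt; and &lt;code&gt;serde&lt;&#x2F;code&gt; crates; the schema is a toy stand-in, not uv&#x27;s actual lockfile format:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;&#x2F;&#x2F; A toy lockfile schema (not uv&#x27;s real one); the parse cost scales with
&#x2F;&#x2F; the size of the document, regardless of how little of it you extract.
use serde::Deserialize;

#[derive(Deserialize)]
struct Lock {
    version: u64,
    #[serde(default, rename = &quot;package&quot;)]
    packages: Vec&amp;lt;Package&amp;gt;,
}

#[derive(Deserialize)]
#[allow(dead_code)]
struct Package {
    name: String,
    version: String,
}

fn main() -&amp;gt; Result&amp;lt;(), toml::de::Error&amp;gt; {
    let text = r#&quot;
version = 1

[[package]]
name = &quot;anyio&quot;
version = &quot;4.3.0&quot;
&quot;#;
    let lock: Lock = toml::from_str(text)?;
    println!(&quot;{} packages, lockfile version {}&quot;, lock.packages.len(), lock.version);
    Ok(())
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;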
&lt;p&gt;Obviously, you should also avoid &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;astral-sh&#x2F;uv&#x2F;pull&#x2F;18311&quot;&gt;parsing &lt;code&gt;pyproject.toml&lt;&#x2F;code&gt; 131 times&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;But even if you parse every &lt;code&gt;pyproject.toml&lt;&#x2F;code&gt; only once, large workspaces such as airflow have an overhead: &lt;code&gt;uv run&lt;&#x2F;code&gt; needs to check whether the lockfile is fresh or whether we need to resolve and install changed dependencies, and that needs to look at each local package and parse its &lt;code&gt;pyproject.toml&lt;&#x2F;code&gt;. Non-local dependencies either have static metadata (registry dependencies), are only updated on demand (git dependencies) or follow refreshing rules (http dependencies). For airflow, it takes ~20ms just to check whether the lockfile is satisfied, almost entirely for workspace member parsing (the &quot;satisfies&quot; bar in the plot above).&lt;&#x2F;p&gt;
&lt;p&gt;For registry, git and http (direct link to a file) dependencies, uv stores the metadata in a &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;rkyv.org&#x2F;&quot;&gt;zero-copy format&lt;&#x2F;a&gt;. With rkyv, it only needs to do minimal validations against undefined behavior plus some pointer rewriting; the data on disk is in the same format as the in-memory representation, no parsing involved. Loading those is fast (though not entirely memory efficient at the moment) in cases where uv does need to resolve, and not only to validate.&lt;&#x2F;p&gt;
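&lt;p&gt;A toy sketch of the pattern, assuming rkyv 0.7&#x27;s &lt;code&gt;check_archived_root&lt;&#x2F;code&gt; API (with its validation feature) and a simplified schema rather than uv&#x27;s real types:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;&#x2F;&#x2F; Zero-copy metadata access with rkyv: the buffer is validated once, then
&#x2F;&#x2F; the archived data is used in place, without field-by-field parsing.
use rkyv::{Archive, Deserialize, Serialize};

#[derive(Archive, Serialize, Deserialize)]
#[archive(check_bytes)]
struct Metadata {
    name: String,
    requires_dist: Vec&amp;lt;String&amp;gt;,
}

fn main() {
    let metadata = Metadata {
        name: &quot;example&quot;.to_string(),
        requires_dist: vec![&quot;anyio&amp;gt;=4&quot;.to_string()],
    };
    &#x2F;&#x2F; Write side: serialize once into an aligned buffer for the cache.
    let bytes = rkyv::to_bytes::&amp;lt;_, 256&amp;gt;(&amp;amp;metadata).unwrap();

    &#x2F;&#x2F; Read side: validate against undefined behavior, then use directly.
    let archived = rkyv::check_archived_root::&amp;lt;Metadata&amp;gt;(&amp;amp;bytes).unwrap();
    println!(
        &quot;{}: {} requirements&quot;,
        archived.name.as_str(),
        archived.requires_dist.len()
    );
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;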
&lt;h2 id=&quot;version-representation&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#version-representation&quot; aria-label=&quot;Anchor link for: version-representation&quot;&gt;Version representation&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Surprisingly, a big bottleneck is the type used for representing package versions. As the index page for a package gives us all versions of that package, and all clauses in the resolver use versions as literals, we create and copy a lot of versions, so much that with a naive version type, benchmarks are dominated by version operations. This isn&#x27;t only the case for uv, but also &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;iscinumpy.dev&#x2F;post&#x2F;packaging-faster&#x2F;&quot;&gt;a bottleneck for pip and packaging&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;uv uses an &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;astral-sh&#x2F;uv&#x2F;pull&#x2F;789&quot;&gt;optimized version parser and small version representation&lt;&#x2F;a&gt;. The version type is internally an enum (tagged union)&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-22-1&quot;&gt;&lt;a href=&quot;#fn-22&quot;&gt;8&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; of a small type for a subset of versions that fits into 8 plus 8 bytes&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-3-1&quot;&gt;&lt;a href=&quot;#fn-3&quot;&gt;9&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; and a large, heap-allocated type that can represent any version. The small type can notably represent three-part versions with components &amp;lt;255 and an optional prerelease. Its bytes are ordered the same way versions are, and copying it is a 16 byte copy. The large variant supports the full feature set of &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;peps.python.org&#x2F;pep-0440&#x2F;&quot;&gt;PEP 440&lt;&#x2F;a&gt;, parts of which require separate heap allocations, such as the strings in a local version. In the version enum, the large version is wrapped in an &lt;code&gt;Arc&lt;&#x2F;code&gt;, Rust&#x27;s basic reference counting type, which itself is only a pointer to the heap allocation holding the reference count and the payload. Copying &lt;code&gt;Arc&lt;&#x2F;code&gt;s is generally fast, it&#x27;s an atomic counter increment, but it can become notable at resolver scale.&lt;&#x2F;p&gt;
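&lt;p&gt;A sketch of the shape of this representation (illustrative only, not uv&#x27;s actual definition):&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;use std::sync::Arc;

&#x2F;&#x2F; Illustrative shape of the two-variant version type (not uv&#x27;s real code).
#[allow(dead_code)]
enum Version {
    &#x2F;&#x2F; 16 bytes: release segments and an optional prerelease packed so that
    &#x2F;&#x2F; byte order matches PEP 440 order, making comparison and copying cheap.
    Small(u64, u64),
    &#x2F;&#x2F; Full PEP 440 versions; the Arc makes clones a refcount increment
    &#x2F;&#x2F; instead of a deep copy of the heap-allocated parts.
    Large(Arc&amp;lt;VersionFull&amp;gt;),
}

#[allow(dead_code)]
struct VersionFull {
    epoch: u64,
    release: Vec&amp;lt;u64&amp;gt;,
    pre: Option&amp;lt;(Prerelease, u64)&amp;gt;,
    post: Option&amp;lt;u64&amp;gt;,
    dev: Option&amp;lt;u64&amp;gt;,
    local: Vec&amp;lt;String&amp;gt;,
}

#[allow(dead_code)]
enum Prerelease {
    Alpha,
    Beta,
    Rc,
}

fn main() {
    println!(&quot;version enum: {} bytes&quot;, std::mem::size_of::&amp;lt;Version&amp;gt;());
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;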
&lt;p&gt;Resolutions that include packages with a huge number of releases that don&#x27;t fit into the small version, such as dev releases with numbers too large for it, make uv slow. For example, concluding and reporting that &lt;code&gt;tensorflow-io-nightly&lt;&#x2F;code&gt; can&#x27;t be used with python 3.14 takes 500ms, as this package has a huge number of releases whose dev versions are all too large for the small representation. (Finding a valid solution on python 3.10 takes 40ms.)&lt;&#x2F;p&gt;
&lt;p&gt;Removing the small version optimization makes resolving boto3 2.1x slower (490ms -&amp;gt; 1030ms). Additionally replacing the &lt;code&gt;Arc&lt;&#x2F;code&gt; in the full version with a &lt;code&gt;Box&lt;&#x2F;code&gt; makes it 4.7x slower (490ms -&amp;gt; 2270ms). Notably, removing the &lt;code&gt;Arc&lt;&#x2F;code&gt; optimization without removing the small version has almost no effect: almost all versions fit into the small variant.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;object type=&quot;image&#x2F;svg+xml&quot; alt=&quot;A flamegraph showing the baseline&quot; data=&quot;boto3-uv-0-10-12.svg&quot; width=&quot;100%&quot;&gt;&lt;&#x2F;object&gt;
&lt;object type=&quot;image&#x2F;svg+xml&quot; alt=&quot;A flamegraph showing increased amounts of `Version::cmp_slow`&quot; data=&quot;boto3-uv-no-version-small.svg&quot; width=&quot;100%&quot;&gt;&lt;&#x2F;object&gt;
&lt;object type=&quot;image&#x2F;svg+xml&quot; alt=&quot;A flamegraph showing increased amounts of `Version::cmp_slow`, `Drop` and `Box::clone`&quot; data=&quot;boto3-uv-no-version-small-no-version-arc.svg&quot; width=&quot;100%&quot;&gt;&lt;&#x2F;object&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Figure: A performance ablation of uv resolving boto3. The first flamegraph is the optimized version. In the second, the small version type is disabled and the version comparisons become notable. In the third one, the large version additionally doesn&#x27;t use an &lt;code&gt;Arc&lt;&#x2F;code&gt; anymore, and version cloning and dropping start to dominate. Note that while they have the same width, they represent increasing amounts of time. TODO: Cut to 1% frames&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;&#x2F;uv-is-slow&#x2F;no_version_optimization_boto3.png&quot; alt=&quot;A screenshot from samply with the inverted stack trace active, showing a lot of uv_pep440 functions&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Figure: The samply inverted stack trace of resolving boto3 without the small version optimization in uv&#x27;s version implementation, &lt;code&gt;uv_pep440&lt;&#x2F;code&gt;. It shows the functions that take the most time on their own (flamegraph leaves), rather than the total time of each.&lt;&#x2F;p&gt;
&lt;p&gt;TODO: EXAMPLES and DATA for the outlier from the ecosystem test&lt;&#x2F;p&gt;
&lt;p&gt;TODO: samply plot from the ecosystem test&lt;&#x2F;p&gt;
&lt;p&gt;Another important packaging type is platform markers, whose arbitrary boolean expressions uv represents with &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;astral-sh&#x2F;uv&#x2F;pull&#x2F;5898&quot;&gt;algebraic decision diagrams&lt;&#x2F;a&gt;. Those are technically an SMT solver, parallel to pubgrub being a SAT solver. This is one of the few places with a classic algorithms-and-data-structures optimization. The marker representation is canonical and deterministic, and it allows queries such as whether one marker expression is a subset of another.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;missing-wheels-and-dynamic-metadata&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#missing-wheels-and-dynamic-metadata&quot; aria-label=&quot;Anchor link for: missing-wheels-and-dynamic-metadata&quot;&gt;Missing wheels and dynamic metadata&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Python packages come in two forms, source distributions and (built) wheels. If any wheel exists, uv can use the dependency information from the wheel during resolution, which is always statically encoded&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-12-1&quot;&gt;&lt;a href=&quot;#fn-12&quot;&gt;10&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;&lt;sup&gt;,&lt;&#x2F;sup&gt;&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-13-1&quot;&gt;&lt;a href=&quot;#fn-13&quot;&gt;11&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;. When there&#x27;s no wheel, modern source distributions (&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;peps.python.org&#x2F;pep-0643&#x2F;&quot;&gt;PEP 643&lt;&#x2F;a&gt;, core metadata v2.2) can specify static dependency information. If they don&#x27;t, or use a core metadata version that is too old, getting this information from a source distribution is a complex process that requires downloading the entire source distribution, setting up a separate venv with an independently resolved set of build dependencies and invoking python hooks. This alone can be slower than an entire resolution with static metadata. If the source distribution requires building native code, it&#x27;s even slower.&lt;&#x2F;p&gt;
&lt;p&gt;During resolution, uv may have to do this process for several versions of a package as it backtracks through finding a suitable version. During installation, it can use cached builds, but installing on a cold cache system needs to build all missing wheels, which is several times to orders of magnitude slower than the wheel installation process.&lt;&#x2F;p&gt;
&lt;p&gt;TODO: An example is resolving airflow, ...&lt;&#x2F;p&gt;
&lt;h2 id=&quot;missing-index-protocols&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#missing-index-protocols&quot; aria-label=&quot;Anchor link for: missing-index-protocols&quot;&gt;Missing index protocols&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Another potential reason for slowness is that the registry&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-14-1&quot;&gt;&lt;a href=&quot;#fn-14&quot;&gt;12&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; neither implements &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;peps.python.org&#x2F;pep-0658&#x2F;&quot;&gt;PEP 658&lt;&#x2F;a&gt; (Static Distribution Metadata in the Simple Repository API) nor supports &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;developer.mozilla.org&#x2F;en-US&#x2F;docs&#x2F;Web&#x2F;HTTP&#x2F;Guides&#x2F;Range_requests&quot;&gt;http range requests&lt;&#x2F;a&gt;, an &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;web.archive.org&#x2F;web&#x2F;20120213004500&#x2F;https:&#x2F;&#x2F;www.codeproject.com&#x2F;articles&#x2F;8688&#x2F;extracting-files-from-a-remote-zip-archive&quot;&gt;old trick&lt;&#x2F;a&gt; that allows reading the metadata file from a wheel without downloading the whole wheel. When both are missing, uv has to download the whole file for each package, and in the bad case that we&#x27;re backtracking on torch, we download hundreds of megabytes repeatedly.&lt;&#x2F;p&gt;
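&lt;p&gt;For illustration, a sketch of the range request trick with reqwest&#x27;s blocking client and a hypothetical URL; uv&#x27;s real implementation is async and additionally has to parse the zip structures out of the fetched tail:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;&#x2F;&#x2F; Fetch only the tail of a remote wheel: enough to locate the zip central
&#x2F;&#x2F; directory and read the METADATA entry without downloading the archive.
use reqwest::blocking::Client;

fn main() -&amp;gt; Result&amp;lt;(), Box&amp;lt;dyn std::error::Error&amp;gt;&amp;gt; {
    let client = Client::new();
    let resp = client
        .get(&quot;https:&#x2F;&#x2F;files.example.org&#x2F;example-1.0-py3-none-any.whl&quot;)
        .header(&quot;Range&quot;, &quot;bytes=-65536&quot;) &#x2F;&#x2F; the last 64 KiB of the file
        .send()?;
    &#x2F;&#x2F; 206 Partial Content means the server honors ranges; a plain 200 means
    &#x2F;&#x2F; it ignored the header and sent the whole file, so the trick is moot.
    println!(&quot;status: {}&quot;, resp.status());
    let tail = resp.bytes()?;
    println!(&quot;fetched {} bytes of the archive tail&quot;, tail.len());
    Ok(())
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;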
&lt;h2 id=&quot;compression&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#compression&quot; aria-label=&quot;Anchor link for: compression&quot;&gt;Compression&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Wheels are zip archives compressed with the default DEFLATE algorithm, while source distributions are gzip-compressed tar archives. DEFLATE (1990) and gzip (1992) both produce larger files and slower decompression than modern compression formats such as zstandard (2015). While uv uses a streaming &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;astral-sh&#x2F;rs-async-zip&quot;&gt;zip&lt;&#x2F;a&gt; and &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;astral-sh&#x2F;tokio-tar&quot;&gt;tar&lt;&#x2F;a&gt; decoder to combine downloading and unzipping into the cache, at least the larger download sizes are notable. The slower decompression may not be noticeable as it&#x27;s usually still faster than the download itself, but it leads to unnecessarily high CPU load.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;hashing&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#hashing&quot; aria-label=&quot;Anchor link for: hashing&quot;&gt;Hashing&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;uv needs to check the hashes of all artifacts it downloads to ensure they haven&#x27;t been manipulated&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-4-1&quot;&gt;&lt;a href=&quot;#fn-4&quot;&gt;13&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;. Using SHA-256, this is notably slow. While cryptographic hashing is inherently CPU-intensive, a different hash algorithm such as blake3 would be faster. uv currently follows SHA-256 being the ecosystem standard and widely available from registries.&lt;&#x2F;p&gt;
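&lt;p&gt;What that check amounts to, sketched with the &lt;code&gt;sha2&lt;&#x2F;code&gt; crate and a hypothetical file path:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;&#x2F;&#x2F; Stream a downloaded artifact through SHA-256 with a fixed-size buffer,
&#x2F;&#x2F; then compare the digest against the expected hash from the lockfile.
use sha2::{Digest, Sha256};
use std::fs::File;
use std::io::{BufReader, Read};

fn main() -&amp;gt; std::io::Result&amp;lt;()&amp;gt; {
    &#x2F;&#x2F; Hypothetical path to a downloaded wheel.
    let file = File::open(&quot;example-1.0-py3-none-any.whl&quot;)?;
    let mut reader = BufReader::new(file);
    let mut hasher = Sha256::new();
    let mut buf = [0u8; 64 * 1024];
    loop {
        let n = reader.read(&amp;amp;mut buf)?;
        if n == 0 {
            break;
        }
        hasher.update(&amp;amp;buf[..n]);
    }
    println!(&quot;sha256: {:x}&quot;, hasher.finalize());
    Ok(())
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;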
&lt;h2 id=&quot;virtual-environments&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#virtual-environments&quot; aria-label=&quot;Anchor link for: virtual-environments&quot;&gt;Virtual environments&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Installing python packages involves copying all the packages into a directory for each installation. Even if you reflink them (also known as copy-on-write, supported on the default mac filesystem, some linux filesystems and rarely on windows) or hardlink them, and even if you do this with a thread per package, you still have to walk the whole source tree and copy it. There is no fundamental reason why python can&#x27;t use packages from a centralized store, as other ecosystems do.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a href=&quot;&#x2F;uv-is-slow&#x2F;transformers-torch-spans.svg&quot;&gt;&lt;img src=&quot;&#x2F;uv-is-slow&#x2F;transformers-torch-spans.png&quot; alt=&quot;An export from tracing-durations-export showing that solve takes 10ms and installing (multithreaded) 75ms&quot; &#x2F;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Figure: &lt;code&gt;uv pip install transformers[torch]&lt;&#x2F;code&gt; into a fresh venv from a warm cache on linux (hardlinking). Each blue bar in the link step is a package, where the long 70ms bar is installing torch. &quot;solve&quot; is the dependency resolution step, and each &quot;link_wheel_files&quot; is installing a package from the cache. The source svg with more details is linked.&lt;&#x2F;p&gt;
&lt;p&gt;TODO: use a &lt;code&gt;uv sync&lt;&#x2F;code&gt; plot, prune it and display the svg&lt;&#x2F;p&gt;
&lt;p&gt;All those packages in the cache are already unpacked and in the correct general shape for python to import, the installation process largely duplicates the existing tree.&lt;&#x2F;p&gt;
&lt;p&gt;uv recently switched to trying &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;astral-sh&#x2F;uv&#x2F;pull&#x2F;18117&quot;&gt;reflinking by default on linux&lt;&#x2F;a&gt;, which is faster on the filesystems that support it (many servers and CI machines use ZFS or XFS), unless it&#x27;s a filesystem where somehow &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;astral-sh&#x2F;uv&#x2F;issues&#x2F;18259&quot;&gt;reflinking is slower than hardlinking&lt;&#x2F;a&gt;. There isn&#x27;t a known way to detect which methods are supported without trying them, nor to know which one is fastest.&lt;&#x2F;p&gt;
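&lt;p&gt;A sketch of the try-then-fall-back pattern, covering only hardlink vs. copy (real reflink support needs platform-specific calls, e.g. &lt;code&gt;FICLONE&lt;&#x2F;code&gt; on linux, and uv&#x27;s actual fallback chain is more involved):&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;use std::fs;
use std::io;
use std::path::Path;

&#x2F;&#x2F; Try the cheap method first and fall back to a full copy when the
&#x2F;&#x2F; filesystem refuses, e.g. with EXDEV for a cross-device link.
fn link_or_copy(src: &amp;amp;Path, dst: &amp;amp;Path) -&amp;gt; io::Result&amp;lt;()&amp;gt; {
    match fs::hard_link(src, dst) {
        Ok(()) =&amp;gt; Ok(()),
        Err(_) =&amp;gt; fs::copy(src, dst).map(|_| ()),
    }
}

fn main() -&amp;gt; io::Result&amp;lt;()&amp;gt; {
    &#x2F;&#x2F; Hypothetical paths: one cached wheel file into a fresh venv.
    link_or_copy(
        Path::new(&quot;cache&#x2F;wheels&#x2F;example&#x2F;module.py&quot;),
        Path::new(&quot;.venv&#x2F;lib&#x2F;python3.12&#x2F;site-packages&#x2F;module.py&quot;),
    )
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;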
&lt;p&gt;Accepting venvs as a given, uv is comparatively slower than bun building &lt;code&gt;node_modules&lt;&#x2F;code&gt; inside of a package.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;backtracking&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#backtracking&quot; aria-label=&quot;Anchor link for: backtracking&quot;&gt;Backtracking&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;uv uses pubgrub for dependency resolution. Check out &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.astral.sh&#x2F;uv&#x2F;reference&#x2F;internals&#x2F;resolver&#x2F;&quot;&gt;this page&lt;&#x2F;a&gt; and the linked pubgrub docs on how uv&#x27;s solver works - it&#x27;s too complex to repeat here. The talk &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=gSKTfG1GXYQ&quot;&gt;uv: an extremely fast package manager&lt;&#x2F;a&gt; also has a great explanation, including some of the algorithmic and implementation optimizations. Pubgrub is responsible both for uv being fast in complex backtracking scenarios and for the helpful error messages about failed resolutions&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-15-1&quot;&gt;&lt;a href=&quot;#fn-15&quot;&gt;14&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;In a basic pubgrub implementation, once you choose a package, you try versions from most preferred to least preferred (newest to oldest), only discarding a version if no solution with it is possible. A bad decision early on can lead to trying a lot of versions, only to fail at the same conflict over and over again. uv extends the basic pubgrub algorithm to record which packages are involved in a conflict, and when two packages cause too much backtracking, it switches their priorities and manually backtracks to before both of them, allowing pubgrub to explore a different part of the solution space. There is a detailed write-up of &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;astral-sh&#x2F;uv&#x2F;issues&#x2F;8157&quot;&gt;the problem&lt;&#x2F;a&gt; and &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;astral-sh&#x2F;uv&#x2F;pull&#x2F;9843&quot;&gt;the solution&lt;&#x2F;a&gt;. This however &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;astral-sh&#x2F;uv&#x2F;issues&#x2F;12060&quot;&gt;doesn&#x27;t work&lt;&#x2F;a&gt; if the conflict is behind an indirection.&lt;&#x2F;p&gt;
&lt;p&gt;uv has to be somewhat conservative when it comes to package priorities and heuristics, to ensure the robustness of the solution and to keep churn minimal, e.g. from updating a single package. A faster resolver could always pick packages and versions in the order in which it gets metadata, but that would mean running &lt;code&gt;uv pip compile&lt;&#x2F;code&gt; or &lt;code&gt;uv lock&lt;&#x2F;code&gt; twice with no new version could give different results. Instead, uv prioritizes based on constraint wideness and the order of discovery, which is deterministic given that the registry doesn&#x27;t change (or an exclude-newer cutoff is set).&lt;&#x2F;p&gt;
&lt;p&gt;In the case from above, where there&#x27;s a &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;astral-sh&#x2F;uv&#x2F;pull&#x2F;15969&quot;&gt;missing optimization&lt;&#x2F;a&gt;, we ended up &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;astral-sh&#x2F;uv&#x2F;issues&#x2F;12060&quot;&gt;not shipping&lt;&#x2F;a&gt; it. It would have been a behavior change in the resolver, and the library that initially motivated the change had solved the problem with better dependency bounds in the meantime. When resolving the top 15k pypi packages by downloads with &lt;code&gt;uv pip compile&lt;&#x2F;code&gt;, the slowest ones take ~160ms on my machine, without particularly strong outliers. How well dependency structures worked in old tools, most notably pip, has shaped which subset of the theoretically SAT-hard space packages actually occupy. A resolver can be fast by handling those patterns, and while it&#x27;s easy to craft stress-test examples, I&#x27;m not aware of any real-world examples where a warm cache uv resolve is notably slow (if you have any, let me know!).&lt;&#x2F;p&gt;
&lt;h2 id=&quot;simple-api&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#simple-api&quot; aria-label=&quot;Anchor link for: simple-api&quot;&gt;Simple api&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;The repository API for python packaging evolved from user-facing HTML pages of code archives to download, which installers started to scrape. While there&#x27;s now a proper &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;peps.python.org&#x2F;pep-0691&#x2F;&quot;&gt;json interface&lt;&#x2F;a&gt;, it&#x27;s still fundamentally a list of filenames. That&#x27;s kinda slow to parse (TODO: numbers).&lt;&#x2F;p&gt;
&lt;p&gt;More fundamentally, it means that for each version the resolver tries, it needs to do another request to fetch the dependencies of that version. The information a package manager actually needs is small, in most cases it&#x27;s only the concatenated contents of &lt;code&gt;project.dependencies&lt;&#x2F;code&gt; and &lt;code&gt;project.optional-dependencies&lt;&#x2F;code&gt;, yet we have to pay a separate http roundtrip each time. On top of that, each response is separate in uv&#x27;s cache, so even with a warm cache we pay extra for loading the metadata for each version.&lt;&#x2F;p&gt;
&lt;p&gt;TODO: perf of parsing a large index page&lt;&#x2F;p&gt;
&lt;p&gt;TODO: In uv&#x27;s cache we save the responses raw, which is much larger than they&#x27;d need to be, which gives uv such a big memory footprint, even through rkyv.&lt;&#x2F;p&gt;
&lt;p&gt;TODO: boto3 pic&lt;&#x2F;p&gt;
&lt;h2 id=&quot;multithreading-is-hard&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#multithreading-is-hard&quot; aria-label=&quot;Anchor link for: multithreading-is-hard&quot;&gt;Multithreading is hard&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Scheduling diverse network, IO and CPU loads is hard. It&#x27;s not intuitive that you need to &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;astral-sh&#x2F;uv&#x2F;pull&#x2F;3627&quot;&gt;put the resolver into its own thread&lt;&#x2F;a&gt; even when it&#x27;s largely IO-bound: the resolver&#x27;s CPU-bound parts otherwise make network IO slower. There&#x27;s also convenience vs. performance: uv likely isn&#x27;t faster by using async&#x2F;await than it would be with threaded IO, but there&#x27;s a large async ecosystem and async abstractions combine well. There&#x27;s a hard-to-quantify tokio scheduling overhead visible in uv. Still, uv uses a mixture of regular futures on tokio, &lt;code&gt;spawn_blocking&lt;&#x2F;code&gt; for CPU-intensive work on the tokio threadpool, a pool of python subprocesses for bytecode compilation and rayon for parallel unzipping and installation.&lt;&#x2F;p&gt;
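&lt;p&gt;A minimal sketch of the &lt;code&gt;spawn_blocking&lt;&#x2F;code&gt; pattern, with SHA-256 hashing standing in for the CPU-heavy work:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;&#x2F;&#x2F; CPU-heavy work goes to tokio&#x27;s blocking threadpool so it doesn&#x27;t stall
&#x2F;&#x2F; the async worker threads that drive the network futures.
use sha2::{Digest, Sha256};

#[tokio::main]
async fn main() {
    let payload = vec![0u8; 64 * 1024 * 1024]; &#x2F;&#x2F; stand-in for a download
    let digest = tokio::task::spawn_blocking(move || {
        &#x2F;&#x2F; Runs on the blocking threadpool, not on the async workers.
        let mut hasher = Sha256::new();
        hasher.update(&amp;amp;payload);
        hasher.finalize()
    })
    .await
    .expect(&quot;blocking task panicked&quot;);
    println!(&quot;sha256: {digest:x}&quot;);
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;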
&lt;p&gt;There&#x27;s also a lot of file locking and atomic moves involved to ensure that uv processes running in parallel can&#x27;t break one another.&lt;&#x2F;p&gt;
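&lt;p&gt;The core of the atomic-move pattern, sketched (real code would use a unique temporary name next to the target and handle cross-filesystem moves):&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;use std::fs;
use std::io::Write;
use std::path::Path;

&#x2F;&#x2F; Write to a temporary file, flush it, then rename over the target: on the
&#x2F;&#x2F; same filesystem the rename is atomic, so a parallel reader sees either
&#x2F;&#x2F; the old file or the complete new one, never a half-written state.
fn atomic_write(path: &amp;amp;Path, data: &amp;amp;[u8]) -&amp;gt; std::io::Result&amp;lt;()&amp;gt; {
    let tmp = path.with_extension(&quot;tmp&quot;); &#x2F;&#x2F; simplified: should be unique
    let mut file = fs::File::create(&amp;amp;tmp)?;
    file.write_all(data)?;
    file.sync_all()?; &#x2F;&#x2F; flush to disk before the rename makes it visible
    fs::rename(&amp;amp;tmp, path)
}

fn main() -&amp;gt; std::io::Result&amp;lt;()&amp;gt; {
    atomic_write(Path::new(&quot;cache-entry.bin&quot;), b&quot;payload&quot;)
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;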
&lt;p&gt;On the upside, concurrency is key to uv&#x27;s performance. We can simulate a single-threaded uv with mostly serial network requests by setting the concurrency limits for network, builds and installs to 1 each&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-23-1&quot;&gt;&lt;a href=&quot;#fn-23&quot;&gt;15&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;TODO: Benchmarks of concurrency 1&lt;&#x2F;p&gt;
&lt;h2 id=&quot;prefetching&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#prefetching&quot; aria-label=&quot;Anchor link for: prefetching&quot;&gt;Prefetching&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;h2 id=&quot;unreliable-network&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#unreliable-network&quot; aria-label=&quot;Anchor link for: unreliable-network&quot;&gt;Unreliable network&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Some environments, such as github actions, have (as of writing) unreliable networking. If a download fails mid-request, which happens a noticeable amount of the time, uv has to start the download anew, either because it can&#x27;t resume downloads (another slowness) or because the request wasn&#x27;t streamed (only for small requests though, where this matters less). Similarly, if a server replies with intermittent 4xx or 5xx codes, we have to retry and&#x2F;or wait for the server to recover.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;lack-of-batch-refresh&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#lack-of-batch-refresh&quot; aria-label=&quot;Anchor link for: lack-of-batch-refresh&quot;&gt;Lack of batch refresh&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;If you &lt;code&gt;uv pip install …&lt;&#x2F;code&gt;, uv caches (for pypi) all responses for 10min, following pypi&#x27;s caching headers, then it has to revalidate. But there&#x27;s no endpoint to ask whether all packages are fresh, so uv needs to send &lt;em&gt;n&lt;&#x2F;em&gt; revalidation requests for &lt;em&gt;n&lt;&#x2F;em&gt; packages. Those ideally all return 304 not modified, but if &lt;em&gt;n&lt;&#x2F;em&gt; is larger than the network parallelism (50 by default), it takes a multiple of the roundtrip time. This applies similarly to &lt;code&gt;uv lock&lt;&#x2F;code&gt; with &lt;code&gt;--upgrade&lt;&#x2F;code&gt; or &lt;code&gt;uv add&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
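&lt;p&gt;Each of those revalidations is a conditional request along these lines (a blocking sketch with a made-up ETag; the Accept header is PEP 691&#x27;s JSON content type):&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;use reqwest::blocking::Client;

fn main() -&amp;gt; Result&amp;lt;(), Box&amp;lt;dyn std::error::Error&amp;gt;&amp;gt; {
    let client = Client::new();
    let cached_etag = &quot;\&quot;abc123\&quot;&quot;; &#x2F;&#x2F; stored from the previous response
    let resp = client
        .get(&quot;https:&#x2F;&#x2F;pypi.org&#x2F;simple&#x2F;boto3&#x2F;&quot;)
        .header(&quot;Accept&quot;, &quot;application&#x2F;vnd.pypi.simple.v1+json&quot;)
        .header(&quot;If-None-Match&quot;, cached_etag)
        .send()?;
    if resp.status() == reqwest::StatusCode::NOT_MODIFIED {
        &#x2F;&#x2F; 304: the cached index page is still fresh, no body transferred.
        println!(&quot;cache hit, reuse stored metadata&quot;);
    } else {
        println!(&quot;index changed, re-read {} bytes&quot;, resp.bytes()?.len());
    }
    Ok(())
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;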
&lt;h2 id=&quot;forking&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#forking&quot; aria-label=&quot;Anchor link for: forking&quot;&gt;Forking&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Platform markers in Python allow dependencies to differ between platforms, so a cross-platform resolution has to fork whenever requirements are split by markers and solve each fork separately.&lt;&#x2F;p&gt;
&lt;p&gt;Forking is also not the most efficient inside uv: while most forks reuse the solution of the previous fork through version preferences, each fork clones and rebuilds the entire resolver state.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;fat-wheels&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#fat-wheels&quot; aria-label=&quot;Anchor link for: fat-wheels&quot;&gt;Fat wheels&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Some wheels need different compiled artifacts for platform differences that can&#x27;t be expressed in wheel tags. Numpy functions get compiled for different sets of CPU extensions, introduced in different CPU generations, which are merged into one big binary with runtime dispatch (&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;numpy.org&#x2F;neps&#x2F;nep-0038-SIMD-optimizations.html&quot;&gt;NEP 38&lt;&#x2F;a&gt;, &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;numpy.org&#x2F;doc&#x2F;stable&#x2F;reference&#x2F;simd&#x2F;index.html&quot;&gt;living docs&lt;&#x2F;a&gt;). Torch similarly compiles its nvidia gpu code for a number of gpu architectures (streaming multiprocessor architecture, not cuda) and merges all of them into a binary that&#x27;s somewhere between 100MB and 2GB. This is hugely wasteful, as there&#x27;s no support for selecting only specific &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;astral.sh&#x2F;blog&#x2F;wheel-variants&quot;&gt;wheel variants&lt;&#x2F;a&gt; yet. Projects without the resources of numpy or torch generally build against the oldest supported hardware only (at least for pypi), so uv installs code that&#x27;s slower than necessary at runtime (or worse, fails on old machines).&lt;&#x2F;p&gt;
&lt;h2 id=&quot;system-and-abi-dependencies&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#system-and-abi-dependencies&quot; aria-label=&quot;Anchor link for: system-and-abi-dependencies&quot;&gt;System and abi dependencies&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Some packages, such as psycopg2, mysqlclient, flash-attn, a number of scientific packages and a number of hardware integrations, don&#x27;t ship wheels because they depend on system headers or on the specific version of an already installed library. The most popular packages now &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;pythonwheels.com&#x2F;&quot;&gt;~all ship wheels&lt;&#x2F;a&gt;, but the remaining non-wheel packages are a noted pain for many users, both for performance and for setting up the build environment outside uv.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;allocating-memory&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#allocating-memory&quot; aria-label=&quot;Anchor link for: allocating-memory&quot;&gt;Allocating memory&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;uv puts a lot of effort into avoiding unnecessary allocations, such as passing around references by default, using &lt;code&gt;Box&lt;&#x2F;code&gt; to constrain type sizes, using &lt;code&gt;Arc&lt;&#x2F;code&gt; for shared references, using &lt;code&gt;Cow&lt;&#x2F;code&gt; to only allocate if an object was changed and using small string optimizations&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-7-1&quot;&gt;&lt;a href=&quot;#fn-7&quot;&gt;16&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;&lt;sup&gt;,&lt;&#x2F;sup&gt;&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-8-1&quot;&gt;&lt;a href=&quot;#fn-8&quot;&gt;17&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Contrary to its reputation, Rust doesn&#x27;t default to being good about memory usage and allocations. A modern GC does a lot to offset allocation costs and bakes those optimizations into the language, while avoiding clones in Rust requires ceremony throughout the code. In python, as a counterexample, all objects are reference counted, while Rust requires changing the type to &lt;code&gt;Arc&lt;&#x2F;code&gt; throughout. Python has a free list of objects that gets reused, and Java features a generational garbage collector to deal with short-lived and long-lived objects, while in Rust you need to build or bring your own arena. Some types, such as &lt;code&gt;Dist&lt;&#x2F;code&gt;, could also be smaller. In Rust, all objects include their children directly by default (not by reference), and enums have the size of their largest variant, recursively. This requires well-placed &lt;code&gt;Box&lt;&#x2F;code&gt; types to keep type sizes in check, including for error types (which determine the size of the entire return type) and for futures. What Rust does provide is precise control over allocations and memory layout, allowing us to pick the optimal representation in critical paths. uv for example has assertions about type sizes in critical locations.&lt;&#x2F;p&gt;
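&lt;p&gt;Such assertions can live at compile time; a sketch with a toy type (the 64-byte budget is made up for illustration):&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;use std::sync::Arc;

&#x2F;&#x2F; A toy stand-in for a hot-path type like Dist.
#[allow(dead_code)]
enum Dist {
    Registry { name: Arc&amp;lt;str&amp;gt;, size: u64 },
    Url { url: Box&amp;lt;str&amp;gt; },
}

&#x2F;&#x2F; Fails the build (not a test run) if the type ever grows past the budget.
const _: () = assert!(std::mem::size_of::&amp;lt;Dist&amp;gt;() &amp;lt;= 64);

fn main() {
    println!(&quot;Dist is {} bytes&quot;, std::mem::size_of::&amp;lt;Dist&amp;gt;());
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;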
&lt;p&gt;Rust uses the system allocator by default, which is slow, on musl doubly so compared to glibc, so uv switches to &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;astral-sh&#x2F;uv&#x2F;pull&#x2F;399&quot;&gt;jemalloc and mimalloc&lt;&#x2F;a&gt;. I bet an allocator optimized for Rust instead of C&#x2F;C++ usage patterns would be a further speedup.&lt;&#x2F;p&gt;
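&lt;p&gt;Swapping the global allocator is a few lines; a sketch assuming the &lt;code&gt;mimalloc&lt;&#x2F;code&gt; crate:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;use mimalloc::MiMalloc;

&#x2F;&#x2F; Every heap allocation in the program now goes through mimalloc instead
&#x2F;&#x2F; of the system allocator.
#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;

fn main() {
    let strings: Vec&amp;lt;String&amp;gt; = (0..1000).map(|i| i.to_string()).collect();
    println!(&quot;{} allocations exercised&quot;, strings.len());
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;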
&lt;p&gt;An area where uv isn&#x27;t optimized much yet is memory usage. For example, uv takes 110MB just to say that the lockfile is fresh, with a total of 10.7 GB and 8.8 million allocations for a warm cache &lt;code&gt;uv run&lt;&#x2F;code&gt; in airflow:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;codspeed.io&#x2F;astral-sh&#x2F;uv&#x2F;runs&#x2F;69b71b6a1f4400c035e70c47?uri=exec_harness%3A%3Atarget%2Fprofiling%2Fuv%2520run%2520-p%25203.12%2520-v%2520--project%2520..%2Fairflow%2520python%2520-V&amp;amp;runnerMode=Memory&quot;&gt;&lt;img src=&quot;&#x2F;uv-is-slow&#x2F;airflow-uv-run.png&quot; alt=&quot;A screenshot of memory usage over time peaking at 110MB from the linked site&quot; &#x2F;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Figure: A local codspeed memory profiler run for warm cache &lt;code&gt;uv run&lt;&#x2F;code&gt; with the full results linked.&lt;&#x2F;p&gt;
&lt;p&gt;On the flipside, the reports about OOM errors we could confirm were caused by source distribution builds, not by uv itself.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;freeing-memory&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#freeing-memory&quot; aria-label=&quot;Anchor link for: freeing-memory&quot;&gt;Freeing memory&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Rust uses the &lt;code&gt;Drop&lt;&#x2F;code&gt; trait to recursively free data structures and to run pre-&lt;code&gt;free&lt;&#x2F;code&gt; hooks, such as refcount decreases. To deallocate an object or a collection, &lt;code&gt;drop()&lt;&#x2F;code&gt; on the object or collection first needs to call &lt;code&gt;drop()&lt;&#x2F;code&gt; on all of its members. This can be noticeable, for example with a large number of &lt;code&gt;Arc&lt;&#x2F;code&gt;&#x27;d versions, where each drop is an atomic refcount decrease plus a check whether the count reached zero and the wrapped object needs to be deallocated, especially combined with large data structures such as the distributions and incompatibilities built up during a resolution.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;&#x2F;uv-is-slow&#x2F;boto3.png&quot; alt=&quot;A screenshot of the firefox profiler showing a samply profile&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Figure: The samply profile for resolving &lt;code&gt;urllib3&amp;lt;1.25.4 boto3&lt;&#x2F;code&gt; with &lt;code&gt;uv pip compile&lt;&#x2F;code&gt; (10x for better data). boto3 is a backtracking-heavy benchmark that builds up a large database of incompatibilities in the resolver. The highlighted block is a recursive drop after a successful resolution. &lt;br&#x2F;&gt;
Command: &lt;code&gt;samply record --rate 5000 --iteration-count 10 --reuse-threads .&#x2F;uv-main pip compile boto3.in&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
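&lt;p&gt;A general mitigation for this pattern, sketched here as a technique rather than something uv necessarily does, is handing the structure to another thread and letting it die there:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code class=&quot;language-rust&quot;&gt;use std::sync::Arc;
use std::thread;

fn main() {
    &#x2F;&#x2F; Stand-in for resolver state: a million refcounted values to drop.
    let state: Vec&amp;lt;Arc&amp;lt;String&amp;gt;&amp;gt; = (0..1_000_000)
        .map(|i| Arc::new(i.to_string()))
        .collect();

    &#x2F;&#x2F; Dropping on a background thread moves the recursive refcount
    &#x2F;&#x2F; decrements off the critical path; the result can be printed first.
    let cleanup = thread::spawn(move || drop(state));
    println!(&quot;resolution output ready&quot;);
    cleanup.join().unwrap();
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;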
&lt;h2 id=&quot;tooling&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#tooling&quot; aria-label=&quot;Anchor link for: tooling&quot;&gt;Tooling&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;This isn&#x27;t a performance problem; this section is about all the great tools that make these investigations possible. &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;bheisler&#x2F;criterion.rs&quot;&gt;criterion&lt;&#x2F;a&gt; is our benchmark harness and &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;codspeed.io&#x2F;&quot;&gt;CodSpeed&lt;&#x2F;a&gt; provides our continuous benchmarking. I want to shout them out for being great with feedback, implementing a lot of features we need and often fixing problems same-day. Their web interface even has a flamegraph diff tool&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-9-1&quot;&gt;&lt;a href=&quot;#fn-9&quot;&gt;18&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;. Below, you can see a small regression from a &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;astral-sh&#x2F;uv&#x2F;pull&#x2F;18373&quot;&gt;correctness fix&lt;&#x2F;a&gt; we merged:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;&#x2F;uv-is-slow&#x2F;codspeed-resolve_warm_airflow.png&quot; alt=&quot;A CodSpeed chart showing a small regression in the warm airflow resolve benchmark&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;For local profiling, I use &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;mstange&#x2F;samply&quot;&gt;samply&lt;&#x2F;a&gt;, which uses the firefox profiler UI, and for benchmarking, I use &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;sharkdp&#x2F;hyperfine&quot;&gt;hyperfine&lt;&#x2F;a&gt;, which very helpfully includes standard errors, so you can tell what&#x27;s below the noise threshold and what&#x27;s real.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a href=&quot;&#x2F;uv-is-slow&#x2F;resolve_warm_airflow_criterion&#x2F;index.html&quot;&gt;&lt;img src=&quot;&#x2F;uv-is-slow&#x2F;criterion-airflow.png&quot; alt=&quot;A screenshot showing the comparative part in the linked criterion support&quot; &#x2F;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Figure: A criterion report from a noisy machine, causing the two outliers. Criterion features a t-test, which helps tell whether something is actually faster, even for benchmarks with high variance. hyperfine can do t-tests through a separate &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;sharkdp&#x2F;hyperfine&#x2F;blob&#x2F;master&#x2F;scripts&#x2F;welch_ttest.py&quot;&gt;script&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn-24&quot;&gt;
&lt;p&gt;A mature library ecosystem is a requirement for adding a CPU-intensive operation to a packaging standard. For example, a (previous) blocker for zstandard was a missing std implementation. &lt;a href=&quot;#fr-24-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-19&quot;&gt;
&lt;p&gt;While uv is technically a SAT solver, the cases that exist in practice are very specifically shaped, often driven by what earlier tools supported or struggled with. It&#x27;s important to pick realistic examples, rather than artificial or stress-test benchmarks that don&#x27;t represent real workloads. See also the talk &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=eh3VME3opnE&quot;&gt;Gotta Go Fast&lt;&#x2F;a&gt;. &lt;a href=&quot;#fr-19-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-25&quot;&gt;
&lt;p&gt;NTFS is mysteriously slow for a lot of things, including uv&#x27;s own test suite, but I&#x27;ve never found an explanation why. &lt;a href=&quot;#fr-25-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-21&quot;&gt;
&lt;p&gt;Bun was using a binary lockfile for maximum performance, but &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;bun.com&#x2F;blog&#x2F;bun-lock-text-lockfile&quot;&gt;migrated to a text-based format&lt;&#x2F;a&gt; as a binary file is tricky to review, merge conflicts are hard to resolve and tooling can&#x27;t easily read it. &lt;a href=&quot;#fr-21-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;The toml and toml_edit crates are optimized for features, such as preserving spans during parsing for excellent error messages, and style- and comment-preserving rewrites of files. They just don&#x27;t target a performance audience; most people aren&#x27;t bound by parsing configuration files. We use both the great error messages for configuration errors and the rewriting features in editing commands such as &lt;code&gt;uv add&lt;&#x2F;code&gt;, &lt;code&gt;uv remove&lt;&#x2F;code&gt; and &lt;code&gt;uv version&lt;&#x2F;code&gt;. &lt;a href=&quot;#fr-1-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-6&quot;&gt;
&lt;p&gt;Benchmarked as of airflow &lt;code&gt;7fa400745ac7aebc7cc4ec21d3a047e9fb258310&lt;&#x2F;code&gt; on Ubuntu 24.04. You can get some reproducibility with &lt;code&gt;--exclude-newer 2026-03-15&lt;&#x2F;code&gt;, but as new distributions are published, the size of the server responses increases, which slightly changes the numbers. For scale, the warm cache boto3 benchmark takes 420ms on my desktop machine and 3s on a Raspberry Pi 4B. &lt;a href=&quot;#fr-6-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-11&quot;&gt;
&lt;p&gt;As a general note, the numbers in this post aren&#x27;t overly exact, and sometimes even show some divergence between different runs of the same benchmark. Due to the diverse kinds of resources uv uses (network (pypi, cloudflare CDN, github, other internal and external registries), disk, CPU (singlethreaded), CPU (multithreaded), CPU-accelerated code such as SHA-256, different concurrency frameworks), bottlenecks vary. It doesn&#x27;t make too much sense to optimize for getting exact low-noise numbers on a specific machine. When developing speedups, we focus on determining whether something is a speedup beyond the noise threshold (there are enough of those that we usually don&#x27;t need more sensitive benchmarking) or whether there&#x27;s another indicator for an optimization, for example a flamegraph. The numbers here are for giving a sense of scale. For example, the profiling build I use for benchmarking (&lt;code&gt;cargo build --profile profiling --features tracing-durations-export&lt;&#x2F;code&gt;) is 12% slower than a regular release build (&lt;code&gt;cargo build --release&lt;&#x2F;code&gt;). There are some tricks to reduce noise for low-percentage performance optimizations, such as ensuring a quiet machine, setting the CPU governor to performance and using &lt;code&gt;taskset&lt;&#x2F;code&gt; to pin the benchmark to performance cores, but they may not even be realistic when users aren&#x27;t setting these options in practice. &lt;a href=&quot;#fr-11-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-22&quot;&gt;
&lt;p&gt;In rust, enum variants can have different members. In python terms, this would be a version protocol with the small and the large representation as classes implementing it. &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;manishearth.github.io&#x2F;blog&#x2F;2017&#x2F;03&#x2F;04&#x2F;what-are-sum-product-and-pi-types&#x2F;&quot;&gt;What Are Sum, Product, and Pi Types?&lt;&#x2F;a&gt; is a great introduction. &lt;a href=&quot;#fr-22-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;It should really just be 8 bytes, leaving a bit for the enum discriminant; another thing that makes uv slow: &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;astral-sh&#x2F;uv&#x2F;blob&#x2F;7cd6738def40106b1d0ac9eaefc233fcffedc9de&#x2F;crates&#x2F;uv-pep440&#x2F;src&#x2F;version.rs#L1144-L1154&quot;&gt;https:&#x2F;&#x2F;github.com&#x2F;astral-sh&#x2F;uv&#x2F;blob&#x2F;7cd6738def40106b1d0ac9eaefc233fcffedc9de&#x2F;crates&#x2F;uv-pep440&#x2F;src&#x2F;version.rs#L1144-L1154&lt;&#x2F;a&gt; &lt;a href=&quot;#fr-3-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-12&quot;&gt;
&lt;p&gt;If wheels could have dynamic metadata that changes depending on the host, it wouldn&#x27;t be possible to build lockfiles, and it would require running code to determine this on installation, which is a security and stability risk. Dynamic metadata here specifically means that you need to run code to get the real values; it does not refer to e.g. platform-specific dependencies, which the package manager can evaluate with its own (or vendored) code, since platform markers are a well-defined expression language. &lt;a href=&quot;#fr-12-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-13&quot;&gt;
&lt;p&gt;There&#x27;s some debate over whether all wheels of a release need to have the same metadata, and whether source distributions need to build into wheels of that same metadata. While it&#x27;s not technically required in any spec yet, it is explicitly required by both poetry and uv - both fail when this assumption is violated - and it is respected in the ecosystem. I&#x27;m only aware of two high-profile cases, old releases of tensorflow and torch, which accidentally shipped different metadata due to the complexity of their build systems. It&#x27;s technically possible to work around wheels having different metadata, though it&#x27;s unclear why this should be done, as on one hand platform markers exist precisely to have a single dependency specification for all platforms, and on the other, it would be a huge overhead in terms of network requests and resolver complexity. The other property is that source distributions are required to build into the same metadata as wheels (if any) or alternatively into the same metadata each time. This is required by every tool that uses lockfiles: You can&#x27;t install from a lockfile if a package has a new or mismatching dependency at install time that was never considered in the lock. Even pip behaves inconsistently when this property is violated. &lt;a href=&quot;#fr-13-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-14&quot;&gt;
&lt;p&gt;Index and registry are synonyms. &lt;a href=&quot;#fr-14-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;This prevents against attacks where the server that serves the file has been corrupted since the file was audited. It does not replace the part where you need to trust the author of the binary, but it makes this a one time trust, not a continuous trust that the infra is never taken over. &lt;a href=&quot;#fr-4-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-15&quot;&gt;
&lt;p&gt;Shout-out to &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;nex-3.com&#x2F;&quot;&gt;Natalie Weizenbaum&lt;&#x2F;a&gt; for developing pubgrub and writing about it, and to &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;matthieu.pizenberg.fr&#x2F;&quot;&gt;Matthieu Pizenberg&lt;&#x2F;a&gt; and &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;Eh2406&quot;&gt;Jacob Finkelman&lt;&#x2F;a&gt; for writing and maintaining the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;pubgrub-rs&#x2F;pubgrub&quot;&gt;pubgrub crate&lt;&#x2F;a&gt;. &lt;a href=&quot;#fr-15-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-23&quot;&gt;
&lt;p&gt;This technically still allows concurrency between the resolver and a single network request as well as a single build, but it shows the effect sufficiently. &lt;a href=&quot;#fr-23-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-7&quot;&gt;
&lt;p&gt;See e.g. &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;swatinem.de&#x2F;blog&#x2F;smallstring-opt&#x2F;&quot;&gt;https:&#x2F;&#x2F;swatinem.de&#x2F;blog&#x2F;smallstring-opt&#x2F;&lt;&#x2F;a&gt;, &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;mcyoung.xyz&#x2F;2023&#x2F;08&#x2F;09&#x2F;yarns&#x2F;&quot;&gt;https:&#x2F;&#x2F;mcyoung.xyz&#x2F;2023&#x2F;08&#x2F;09&#x2F;yarns&#x2F;&lt;&#x2F;a&gt; and many more under the category &quot;small string optimizations&quot;. &lt;a href=&quot;#fr-7-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-8&quot;&gt;
&lt;p&gt;Arguably, we lose a bunch of free performance and memory because &lt;code&gt;String&lt;&#x2F;code&gt; and &lt;code&gt;PathBuf&lt;&#x2F;code&gt; are mutable by default, when we really need &lt;code&gt;Box&amp;lt;str&amp;gt;&lt;&#x2F;code&gt; and &lt;code&gt;Box&amp;lt;Path&amp;gt;&lt;&#x2F;code&gt; instead. &lt;code&gt;String&lt;&#x2F;code&gt; and &lt;code&gt;PathBuf&lt;&#x2F;code&gt; are three words (pointers) wide, and may over-allocate. We almost never modify string or paths in place, and definitely not in hot loops. We would likely also profit from defaulting to small string optimizations, most strings in uv are small-ish and i don&#x27;t think we require the array-ness often if at all (OS APIs require conversions to C string or Windows datatypes anyway). &lt;a href=&quot;#fr-8-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-9&quot;&gt;
&lt;p&gt;They support instruction counting in addition to walltime benchmarks, which should give more accurate results, especially in CI. For uv, there were too many spurious regressions, so we had to deactivate it. Writing this article, i think i figured out why: &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;astral-sh&#x2F;uv&#x2F;pull&#x2F;18487&quot;&gt;https:&#x2F;&#x2F;github.com&#x2F;astral-sh&#x2F;uv&#x2F;pull&#x2F;18487&lt;&#x2F;a&gt;. Fingers crossed we get the instruction counting benchmark back as the most reliable one. &lt;a href=&quot;#fr-9-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;&#x2F;section&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Reimplementing PEP 440</title>
        <published>2022-12-01T00:00:00+00:00</published>
        <updated>2022-12-01T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Konstantin Schütze
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="/reimplementing-pep-440/"/>
        <id>/reimplementing-pep-440/</id>
        
        <content type="html" xml:base="/reimplementing-pep-440/">&lt;p&gt;I&#x27;ve reimplemented &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;peps.python.org&#x2F;pep-0440&#x2F;&quot;&gt;PEP 440&lt;&#x2F;a&gt;, the python version standard, for &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;konstin&#x2F;poc-monotrail&quot;&gt;monotrail&lt;&#x2F;a&gt;: &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;konstin&#x2F;pep440-rs&quot;&gt;pep440-rs&lt;&#x2F;a&gt;. Did you now that &lt;code&gt;1a1.dev3.post1+deadbeef&lt;&#x2F;code&gt; is a valid python version, there&#x27;s not only &lt;code&gt;==&lt;&#x2F;code&gt; but also &lt;code&gt;===&lt;&#x2F;code&gt; and that version specifiers are context sensitive?&lt;&#x2F;p&gt;
&lt;p&gt;Let&#x27;s start with the normal stuff: There are basic version numbers with dots in between (like &lt;code&gt;2.3.1&lt;&#x2F;code&gt;) and optionally alpha&#x2F;beta&#x2F;release candidate suffixes (canonically &lt;code&gt;2.3.1b1&lt;&#x2F;code&gt;, but conveniently lenient so &lt;code&gt;2.3.1-beta.1&lt;&#x2F;code&gt; also works). For dependencies, there are operators for minimums and maximums, separated by commas, such as &lt;code&gt;&amp;gt;=2.5.1,&amp;lt;3&lt;&#x2F;code&gt;. You can of course also select a specific prerelease (e.g. &lt;code&gt;1.1a1&lt;&#x2F;code&gt; being matched by &lt;code&gt;==1.1a1&lt;&#x2F;code&gt;) and maybe you&#x27;ve also seen constraints like &lt;code&gt;1.2.*&lt;&#x2F;code&gt;. But below the clear semver-y surface lie many demons of the old.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;&#x2F;reimplementing-pep-440&#x2F;gloomy-forrest.jpg&quot; alt=&quot;A gloomy forest, one where the demons would hide&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;em&gt;Photography by &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;unsplash.com&#x2F;photos&#x2F;4IqzgSMrgMk&quot;&gt;Norbert Buduczki&lt;&#x2F;a&gt;&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;It all starts with the part of the version that&#x27;s hidden in the default: The epoch. By default it&#x27;s zero, but if you want to switch versioning systems, you can add the new epoch with an exclamation mark like &lt;code&gt;1!4.2.0&lt;&#x2F;code&gt;. Since version ordering is defined as a total order, &lt;code&gt;2020.1&lt;&#x2F;code&gt; &amp;lt; &lt;code&gt;1!0.1.0&lt;&#x2F;code&gt;, but also for some reason &lt;code&gt;&amp;lt;1!0.1.0&lt;&#x2F;code&gt; matches &lt;code&gt;2020.1&lt;&#x2F;code&gt; and &lt;code&gt;&amp;gt;2020.1&lt;&#x2F;code&gt; matches &lt;code&gt;1!0.1.0&lt;&#x2F;code&gt; (the specifiers are not a total order normally, I don&#x27;t know why it doesn&#x27;t specify to never match across epochs). Being a mere mortal, I have never witnessed the turning of an epoch myself, but the feature remains part of The Old Code.&lt;&#x2F;p&gt;
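&lt;p&gt;A quick check with pypa&#x2F;packaging confirms both behaviors:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;from packaging.specifiers import SpecifierSet
from packaging.version import Version

# The epoch dominates the total order...
assert Version(&amp;quot;2020.1&amp;quot;) &amp;lt; Version(&amp;quot;1!0.1.0&amp;quot;)
# ...and specifiers happily compare across epochs, too.
assert Version(&amp;quot;2020.1&amp;quot;) in SpecifierSet(&amp;quot;&amp;lt;1!0.1.0&amp;quot;)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;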
&lt;p&gt;You can also add &lt;code&gt;.dev&lt;&#x2F;code&gt; and &lt;code&gt;.post&lt;&#x2F;code&gt; with some number to all versions, e.g. &lt;code&gt;1.0.0.dev1&lt;&#x2F;code&gt; or &lt;code&gt;1.0.0.post1&lt;&#x2F;code&gt;. Or just combine them and do &lt;code&gt;1.0.0.post1.dev1&lt;&#x2F;code&gt;, which is a developmental release of a post-release. That of course doesn&#x27;t stop at final releases, you can now do &lt;code&gt;1.0.0a1.post1.dev1&lt;&#x2F;code&gt; to have a developmental release of a post release of a prerelease (in canonical form alpha&#x2F;beta&#x2F;rc don&#x27;t have a dot, but dev and post do, while in PEP 440, dev releases are also sometimes included with the prereleases). If you sort them, obviously &lt;code&gt;1.0.0.dev1&lt;&#x2F;code&gt; &amp;lt; &lt;code&gt;1.0.0&lt;&#x2F;code&gt; &amp;lt; &lt;code&gt;1.0.0.post1&lt;&#x2F;code&gt;, and &lt;code&gt;1.0.0a1.dev1&lt;&#x2F;code&gt; &amp;lt; &lt;code&gt;1.0.0a1&lt;&#x2F;code&gt; &amp;lt; &lt;code&gt;1.0.0a1.post1&lt;&#x2F;code&gt; &amp;lt; &lt;code&gt;1.0.0&lt;&#x2F;code&gt;. But dev releases of the final version are sorted lower than any prerelease version, so suddenly we have &lt;code&gt;1.0.0.dev1&lt;&#x2F;code&gt; &amp;lt; &lt;code&gt;1.0.0a1.dev1&lt;&#x2F;code&gt; &amp;lt; &lt;code&gt;1.0.0a1&lt;&#x2F;code&gt; &amp;lt; &lt;code&gt;1.0.0&lt;&#x2F;code&gt;, while also having &lt;code&gt;1.0b2&lt;&#x2F;code&gt; &amp;lt; &lt;code&gt;1.0b2.post345.dev456&lt;&#x2F;code&gt;. That is, the try-out release for 1.0 proper is considered older than the try-out release for the 1.0 alpha. The only sensible way to implement this sorting is to make a five-tuple, mapping each version to (pre-releasity, pre-number, post-number with None as smallest, dev-number with None as largest, local version), and let tuple-sorting sort out the rest. Even &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;pypa&#x2F;packaging&#x2F;blob&#x2F;e404434105723a184967b080fc31c05ba69406c6&#x2F;packaging&#x2F;version.py#L503-L563&quot;&gt;pypa&#x2F;packaging uses tuple logic feat. ±infinity&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
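&lt;p&gt;Here&#x27;s a minimal sketch of that tuple trick, covering only the pre&#x2F;post&#x2F;dev part and using hand-rolled stand-ins for packaging&#x27;s ±infinity sentinels:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;import functools

@functools.total_ordering
class _Smallest:
    &amp;quot;&amp;quot;&amp;quot;Compares below everything else (packaging uses NegativeInfinity for this).&amp;quot;&amp;quot;&amp;quot;
    def __eq__(self, other): return isinstance(other, _Smallest)
    def __lt__(self, other): return not isinstance(other, _Smallest)

@functools.total_ordering
class _Largest:
    &amp;quot;&amp;quot;&amp;quot;Compares above everything else.&amp;quot;&amp;quot;&amp;quot;
    def __eq__(self, other): return isinstance(other, _Largest)
    def __gt__(self, other): return not isinstance(other, _Largest)

SMALLEST, LARGEST = _Smallest(), _Largest()

def cmp_key(release, pre, post, dev):
    &amp;quot;&amp;quot;&amp;quot;pre is e.g. (&amp;quot;a&amp;quot;, 1) for a1; post and dev are plain numbers or None.&amp;quot;&amp;quot;&amp;quot;
    if pre is None and post is None and dev is not None:
        pre_key = SMALLEST  # a dev of a final release sorts below any prerelease
    elif pre is None:
        pre_key = LARGEST   # a final release sorts above its own prereleases
    else:
        pre_key = pre
    post_key = SMALLEST if post is None else post
    dev_key = LARGEST if dev is None else dev
    return (release, pre_key, post_key, dev_key)

# 1.0.0.dev1 &amp;lt; 1.0.0a1.dev1 &amp;lt; 1.0.0a1 &amp;lt; 1.0.0a1.post1 &amp;lt; 1.0.0 &amp;lt; 1.0.0.post1
keys = [
    cmp_key((1, 0, 0), None, None, 1),               # 1.0.0.dev1
    cmp_key((1, 0, 0), (&amp;quot;a&amp;quot;, 1), None, 1),     # 1.0.0a1.dev1
    cmp_key((1, 0, 0), (&amp;quot;a&amp;quot;, 1), None, None),  # 1.0.0a1
    cmp_key((1, 0, 0), (&amp;quot;a&amp;quot;, 1), 1, None),     # 1.0.0a1.post1
    cmp_key((1, 0, 0), None, None, None),            # 1.0.0
    cmp_key((1, 0, 0), None, 1, None),               # 1.0.0.post1
]
assert keys == sorted(keys)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;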
&lt;p&gt;Matching versions with specifiers such as &lt;code&gt;&amp;gt;=1.2.0&lt;&#x2F;code&gt; or &lt;code&gt;&amp;lt;2.0.0&lt;&#x2F;code&gt; is tricky because PEP 440 says &quot;Pre-releases of any kind, including developmental releases, are implicitly excluded from all version specifiers, unless they are already present on the system, explicitly requested by the user, or if the only available version that satisfies the version specifier is a pre-release&quot;. That&#x27;s really fuzzy, and it also means that whether a single version matches a specifier depends on the environment, something I &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;pypa&#x2F;packaging&#x2F;issues&#x2F;617&quot;&gt;confused myself with&lt;&#x2F;a&gt;. pypa effectively says that, when using the library, you must state alongside the specifier whether you want to match prereleases (which here again include dev releases) or not. The consequence is that when you say &lt;code&gt;~=2.2&lt;&#x2F;code&gt; but there&#x27;s only a &lt;code&gt;2.2.1a1&lt;&#x2F;code&gt;, it will pick that alpha version (but not &lt;code&gt;2.2a1&lt;&#x2F;code&gt;, which never matches).&lt;&#x2F;p&gt;
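&lt;p&gt;With pypa&#x2F;packaging, that looks like this:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;from packaging.specifiers import SpecifierSet

spec = SpecifierSet(&amp;quot;~=2.2&amp;quot;)
assert &amp;quot;2.2.1a1&amp;quot; not in spec                       # prereleases are excluded by default
assert spec.contains(&amp;quot;2.2.1a1&amp;quot;, prereleases=True)  # ...unless explicitly requested
# filter() implements the &amp;quot;only a prerelease satisfies it&amp;quot; fallback:
assert list(spec.filter([&amp;quot;2.2.1a1&amp;quot;])) == [&amp;quot;2.2.1a1&amp;quot;]
assert list(spec.filter([&amp;quot;2.2a1&amp;quot;])) == []          # 2.2a1 &amp;lt; 2.2, never matches
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;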
&lt;p&gt;There are also local versions, which can be added with a &lt;code&gt;+&lt;&#x2F;code&gt; after the regular version, such as &lt;code&gt;3.4.0+my.local.123.version&lt;&#x2F;code&gt;. The &lt;code&gt;123&lt;&#x2F;code&gt; is going to get ordered as a number, everything that can&#x27;t be parsed as a number will get ordered as a string. Information about usage is sparse, apparently linux distributions use it to tag their python packages. Those also exist &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;semver.org&#x2F;#spec-item-10&quot;&gt;in semver&lt;&#x2F;a&gt;, but more reasonably as &quot;build metadata&quot;: &quot;Build metadata MUST be ignored when determining version precedence. Thus two versions that differ only in the build metadata, have the same precedence&quot;.&lt;&#x2F;p&gt;
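&lt;p&gt;For example, pypa&#x2F;packaging orders the local segments piecewise:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;from packaging.version import Version

# Numeric segments compare as numbers, everything else as strings.
assert Version(&amp;quot;1.0+foo.10&amp;quot;) &amp;gt; Version(&amp;quot;1.0+foo.9&amp;quot;)
assert Version(&amp;quot;1.0+abc&amp;quot;) &amp;lt; Version(&amp;quot;1.0+abd&amp;quot;)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;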
&lt;p&gt;Finally, there&#x27;s also &lt;code&gt;===&lt;&#x2F;code&gt;, &quot;Arbitrary equality&quot;, which is advertised as &quot;simple string equality operations&quot; that &quot;do not take into account any of the semantic information&quot;. pypa&#x2F;packaging has a test that &lt;code&gt;===lolwat&lt;&#x2F;code&gt; parses with the comment &quot;=== is an escape hatch in PEP 440&quot;.&lt;&#x2F;p&gt;
&lt;hr &#x2F;&gt;
&lt;p&gt;For those wondering why python&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-1-1&quot;&gt;&lt;a href=&quot;#fn-1&quot;&gt;1&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; didn&#x27;t pick a sane standard like semver to begin with, the basic syntax format for writing things like &lt;code&gt;&amp;gt;1.0, !=1.3.4, &amp;lt;2.0&lt;&#x2F;code&gt; was written down in &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;peps.python.org&#x2F;pep-0314&#x2F;#requires-multiple-use&quot;&gt;PEP 314&lt;&#x2F;a&gt; in 2003 (!) &lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-2-1&quot;&gt;&lt;a href=&quot;#fn-2&quot;&gt;2&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;. &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;peps.python.org&#x2F;pep-0386&#x2F;&quot;&gt;PEP 386&lt;&#x2F;a&gt;, the first python version standard, was written in 2009, &quot;codifying existing practices&quot; and its successor and current standard &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;peps.python.org&#x2F;pep-0440&#x2F;&quot;&gt;PEP 440&lt;&#x2F;a&gt; in 2013. In comparison, the first commit to npm was made in 2010, semver v1.0.0 was published in 2011, v2.0.0 in 2013, npm inc. was founded in 2014 and cargo had its first commit also in 2014. So python has a hard time doing modern packaging because they were trying to do modern packaging before it was being invented.&lt;&#x2F;p&gt;
&lt;p&gt;I still believe that (a) bringing in features from semver and tools such as poetry, cargo and npm would greatly benefit the python ecosystem and (b) python packaging isn&#x27;t doomed to stay in its current state. While e.g. pypi&#x27;s backend will have to handle everything that ever used to be legal, i believe that the ecosystem at large can and must migrate to better tools and standards. This is largely informed by having to deal with a lot of the breakages of the current state of python packaging and trying to support friends and colleagues.&lt;&#x2F;p&gt;
&lt;p&gt;The easiest is probably to deprecate &lt;code&gt;===&lt;&#x2F;code&gt;, even PEP 440 soft-deprecates it with &quot;Use of this operator is heavily discouraged and tooling MAY display a warning when it is used&quot;.&lt;&#x2F;p&gt;
&lt;p&gt;For epochs, i haven&#x27;t seen them used even a single time. To try to make this at least a bit empirical i ran two queries on the pypi bigquery data&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-3-1&quot;&gt;&lt;a href=&quot;#fn-3&quot;&gt;3&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; &lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-4-1&quot;&gt;&lt;a href=&quot;#fn-4&quot;&gt;4&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;, with the result that in one month there were 19,699,031,713 downloads, 40,281 of which for versions specifying an epoch, that&#x27;s 0.0002%.&lt;&#x2F;p&gt;
&lt;p&gt;Post releases can be replaced with publishing a new patch release or a one-higher pre-release. Historically, it was a good idea: you could specify &lt;code&gt;1.2.3&lt;&#x2F;code&gt;, and if the author messed up &lt;code&gt;1.2.3&lt;&#x2F;code&gt; and had to publish fixup wheels, you&#x27;d be directly moved to the fixup. But nowadays you want lock files, where this doesn&#x27;t work anymore, and it also interacts weirdly with yanking. This applies especially to post releases of prereleases, which PEP 440 acknowledges: &quot;Creating post-releases of pre-releases is strongly discouraged, as it makes the version identifier difficult to parse for human readers. In general, it is substantially clearer to simply create a new pre-release by incrementing the numeric component&quot;.&lt;&#x2F;p&gt;
&lt;p&gt;Dev versions on prereleases also seem strange to me (just publish a higher prerelease instead, a test-release of a test-release is kinda redundant). For dev versions of final releases there are certainly workflows that benefit from them&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-5-1&quot;&gt;&lt;a href=&quot;#fn-5&quot;&gt;5&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;, even though other ecosystems do fine without special casing &lt;code&gt;.dev&lt;&#x2F;code&gt;. The main problem is the strange semantics, and while PEP 440 defends this as a &quot;far more logical sort order&quot;, i strongly disagree: this was and is super confusing, and the implementation is a mess, too. When removing dev (and ideally also post) releases at least for alpha&#x2F;beta&#x2F;rc versions, the semantics would become intuitive again, with dev releases simply being prereleases one level below alpha releases&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-6-1&quot;&gt;&lt;a href=&quot;#fn-6&quot;&gt;6&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;For local versions, the semver-style &quot;purely informative and no semantics&quot; definition would be imho more reasonable; i unfortunately can&#x27;t tell if de-semanticizing local versions would break anything (as in, is anybody currently depending on the fact that &lt;code&gt;1.0+foo.10&lt;&#x2F;code&gt; has precedence over &lt;code&gt;1.0+foo.9&lt;&#x2F;code&gt;).&lt;&#x2F;p&gt;
&lt;p&gt;Given that &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;pip.pypa.io&#x2F;en&#x2F;stable&#x2F;topics&#x2F;dependency-resolution&#x2F;#backtracking&quot;&gt;pip now has a backtracking dependency resolver&lt;&#x2F;a&gt;, i think we can simplify the spec a lot by separating it into three parts: One part that defines the version number schema and precedence (a total order as it currently is), one part that translates operators such as &lt;code&gt;~=&lt;&#x2F;code&gt; into normal &lt;code&gt;&amp;gt;&lt;&#x2F;code&gt;&#x2F;&lt;code&gt;=&lt;&#x2F;code&gt;&#x2F;&lt;code&gt;&amp;lt;&lt;&#x2F;code&gt; sets that directly translate to the version order, and one part that specifies the rules for resolvers, that is, when they are allowed to pick which prerelease. The latter isn&#x27;t well-defined as of PEP 440, but imho we should agree about this across the ecosystem&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-7-1&quot;&gt;&lt;a href=&quot;#fn-7&quot;&gt;7&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;. See e.g. &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;npm&#x2F;node-semver#prerelease-tags&quot;&gt;node on prereleases&lt;&#x2F;a&gt; and &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;doc.rust-lang.org&#x2F;cargo&#x2F;reference&#x2F;resolver.html#pre-releases&quot;&gt;cargo on prereleases&lt;&#x2F;a&gt;&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-8-1&quot;&gt;&lt;a href=&quot;#fn-8&quot;&gt;8&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;. I particularly like the node&#x2F;npm rule &quot;If a version has a prerelease tag (for example, &lt;code&gt;1.2.3-alpha.3&lt;&#x2F;code&gt;) then it will only be allowed to satisfy comparator sets if at least one comparator with the same &lt;code&gt;[major, minor, patch]&lt;&#x2F;code&gt; tuple also has a prerelease tag&quot;. For comparison, firefox estimates a reading time of 16–20 minutes for the semver spec, but 57–73 minutes for PEP 440.&lt;&#x2F;p&gt;
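&lt;p&gt;The middle part is mostly mechanical; here&#x27;s a sketch of desugaring the compatible-release operator (glossing over the subtlety that &lt;code&gt;==2.5.*&lt;&#x2F;code&gt; is a prefix match, which treats prereleases like &lt;code&gt;2.6.0a1&lt;&#x2F;code&gt; differently than &lt;code&gt;&amp;lt;2.6&lt;&#x2F;code&gt; does):&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;def desugar_compatible(version: str) -&amp;gt; str:
    &amp;quot;&amp;quot;&amp;quot;Rewrite ~=X.Y(.Z) into plain bounds (a sketch, not full PEP 440).&amp;quot;&amp;quot;&amp;quot;
    parts = version.split(&amp;quot;.&amp;quot;)
    assert len(parts) &amp;gt;= 2, &amp;quot;~= requires at least two release segments&amp;quot;
    upper = parts[:-1]
    upper[-1] = str(int(upper[-1]) + 1)
    return f&amp;quot;&amp;gt;={version},&amp;lt;{&amp;#39;.&amp;#39;.join(upper)}&amp;quot;

assert desugar_compatible(&amp;quot;2.5.1&amp;quot;) == &amp;quot;&amp;gt;=2.5.1,&amp;lt;2.6&amp;quot;
assert desugar_compatible(&amp;quot;2.5&amp;quot;) == &amp;quot;&amp;gt;=2.5,&amp;lt;3&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;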
&lt;p&gt;For all changes, there would need to be long announcement and deprecation periods with a specific focus on helping people migrate their workflows. For the deprecation period, tools should print big red warnings whenever they encounter something broken. Speaking of announcements, there&#x27;s really a lack of an official pypa communication channel! An official blog for announcements on deprecations, changes, releases, and (proposed) PEP status changes together with a community aggregator like &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;this-week-in-rust.org&quot;&gt;This Week in Rust&lt;&#x2F;a&gt; would be extremely helpful over the current word-of-mouth-in-twitter-replies-and-buried-github-issues system.&lt;&#x2F;p&gt;
&lt;p&gt;Two features that would be great to add are the caret operator (&lt;code&gt;^&lt;&#x2F;code&gt;) and the tilde operator (&lt;code&gt;~&lt;&#x2F;code&gt;) from semver. Nowadays semver is arguably the most popular version scheme even in python, and for most packages you want &lt;code&gt;^1.2.3&lt;&#x2F;code&gt;, while for the remainder (including calver projects that treat the last digit as a semver-like patch version) &lt;code&gt;~1.8&lt;&#x2F;code&gt; will do the right thing. I&#x27;d like to add them to pep440-rs eventually, but i&#x27;m neither sure about the exact semantics yet nor how to let users switch between PEP 440-only specifiers and the modern superset.&lt;&#x2F;p&gt;
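&lt;p&gt;As a strawman, here&#x27;s what cargo-style caret desugaring could look like; the semantics are hypothetical, this is exactly the part i&#x27;m unsure about:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;def caret_bounds(version: str) -&amp;gt; str:
    &amp;quot;&amp;quot;&amp;quot;Hypothetical cargo-style caret: keep the leftmost non-zero digit fixed.&amp;quot;&amp;quot;&amp;quot;
    parts = [int(p) for p in version.split(&amp;quot;.&amp;quot;)]
    # bump at the leftmost non-zero component (or the last one if all are zero)
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[: idx + 1]
    upper[-1] += 1
    return f&amp;quot;&amp;gt;={version},&amp;lt;{&amp;#39;.&amp;#39;.join(map(str, upper))}&amp;quot;

assert caret_bounds(&amp;quot;1.2.3&amp;quot;) == &amp;quot;&amp;gt;=1.2.3,&amp;lt;2&amp;quot;
assert caret_bounds(&amp;quot;0.2.3&amp;quot;) == &amp;quot;&amp;gt;=0.2.3,&amp;lt;0.3&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;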
&lt;p&gt;Next Up: PEP 508&lt;&#x2F;p&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;Well, technically not python as the python interpreter but pypa as the vague group of people who make the packaging PEPs. Python itself didn&#x27;t even have a concept of package versions at all until &lt;code&gt;importlib.metadata&lt;&#x2F;code&gt; brought optional reading of a version string to the standard library, and the language itself still doesn&#x27;t have a concept of packages but merely one of modules. When you &lt;code&gt;import foo&lt;&#x2F;code&gt; it effectively just asks &lt;code&gt;sys.meta_path&lt;&#x2F;code&gt; if anyone can import foo, which will check if any location in &lt;code&gt;sys.path&lt;&#x2F;code&gt; has a &lt;code&gt;foo&lt;&#x2F;code&gt; module, but this has no relation to packaging. If you ask stdlib&#x27;s &lt;code&gt;importlib.metadata&lt;&#x2F;code&gt; for an installed package version, it &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;python&#x2F;cpython&#x2F;blob&#x2F;8af04cdef202364541540ed67e204b71e2e759d0&#x2F;Lib&#x2F;importlib&#x2F;metadata&#x2F;__init__.py#L362-L413&quot;&gt;really just asks &lt;code&gt;sys.meta_path&lt;&#x2F;code&gt; with a different method if anyone optionally wants to tell it about the package version&lt;&#x2F;a&gt;, which by default will just look for &lt;code&gt;.dist-info&lt;&#x2F;code&gt; folders in your &lt;code&gt;sys.path&lt;&#x2F;code&gt;. &lt;a href=&quot;#fr-1-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;If you ever wondered why wheel metadata is in some archaic e-mail-headers RFC 822&lt;&#x2F;p&gt;
&lt;div style=&quot;text-align: center&quot;&gt;&lt;pre&gt;STANDARD FOR THE FORMAT OF
ARPA INTERNET TEXT MESSAGES&lt;&#x2F;pre&gt;&lt;&#x2F;div&gt;
&lt;p&gt;that&#x27;s because it was &lt;a href=&quot;https:&#x2F;&#x2F;peps.python.org&#x2F;pep-0241&#x2F;&quot;&gt;picked in 2001&lt;&#x2F;a&gt;. Even XML 1.0 was &lt;a href=&quot;https:&#x2F;&#x2F;www.w3.org&#x2F;TR&#x2F;1998&#x2F;REC-xml-19980210.html&quot;&gt;published just 3 years prior&lt;&#x2F;a&gt;. I&#x27;m still very much in favor of &lt;a href=&quot;https:&#x2F;&#x2F;peps.python.org&#x2F;pep-0566&#x2F;#json-compatible-metadata&quot;&gt;migrating to a JSON or TOML format&lt;&#x2F;a&gt; such as &lt;code&gt;pkg-info.json&lt;&#x2F;code&gt; or editing &lt;code&gt;pyproject.toml&lt;&#x2F;code&gt; similar to what cargo does, but that&#x27;s for another time.&lt;&#x2F;p&gt;
 &lt;a href=&quot;#fr-2-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;I ran this on 2022-11-29 and the queries were&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SELECT COUNT(*)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;FROM bigquery-public-data.pypi.file_downloads&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE timestamp BETWEEN&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  TIMESTAMP(DATETIME_SUB(CURRENT_DATETIME(), INTERVAL 1 MONTH))&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND TIMESTAMP(CURRENT_DATETIME())&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;and&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;SELECT COUNT(*)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;FROM bigquery-public-data.pypi.file_downloads&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;WHERE timestamp BETWEEN&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  TIMESTAMP(DATETIME_SUB(CURRENT_DATETIME(), INTERVAL 1 MONTH))&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND TIMESTAMP(CURRENT_DATETIME())&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;  AND CONTAINS_SUBSTR(file.version, &amp;#39;!&amp;#39;)&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt; &lt;a href=&quot;#fr-3-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;Blessed be whoever came up with the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;warehouse.pypa.io&#x2F;api-reference&#x2F;bigquery-datasets.html&quot;&gt;bigquery datasets for pypi&lt;&#x2F;a&gt; &lt;a href=&quot;#fr-4-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-5&quot;&gt;
&lt;p&gt;E.g. some people want to build &lt;code&gt;{Major}.{Minor}.{Patch}.dev{YYYY}{MM}{DD}{MonotonicallyIncreasingDailyBuildNumber}&lt;&#x2F;code&gt; in their CI workflows. Local versions are used to indicate when linux distributions did some downstream patching, so you can directly tell when you&#x27;re looking at a distro-patched install. &lt;a href=&quot;#fr-5-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-6&quot;&gt;
&lt;p&gt;I&#x27;m still not sure if they provide any benefit over just using alpha versions, but once they behave like normal prereleases their implementation and cognitive overhead is near zero so backwards compatibility is way more significant. Note that semver relies on &quot;alpha&quot;, &quot;beta&quot; and &quot;rc&quot; being alphabetically ordered, while we need to make &quot;dev&quot; lowest manually, otoh semver also allows any random stuff for prereleases and uses the same duck-typed logic for comparing them as PEP 440 uses for local versions. &lt;a href=&quot;#fr-6-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-7&quot;&gt;
&lt;p&gt;Consider the case where a user adds a library &lt;code&gt;A&lt;&#x2F;code&gt; from pypi that has multiple transitive dependencies on &lt;code&gt;B&lt;&#x2F;code&gt;, some with prereleases in their specifiers and some without. It would be bad for the authors of &lt;code&gt;A&lt;&#x2F;code&gt; if they couldn&#x27;t clearly reason about which prereleases of &lt;code&gt;B&lt;&#x2F;code&gt; might or might not be picked, independent of which tool the user uses. &lt;a href=&quot;#fr-7-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-8&quot;&gt;
&lt;p&gt;According to the python survey results, those are the two most popular other package managers in use
&lt;img src=&quot;&#x2F;reimplementing-pep-440&#x2F;most-popular-package-managers.png&quot; alt=&quot;Plot showing bars on how much other package managers are being used, with docker, npm, cargo and yarn on top&quot; &#x2F;&gt;
RubyGems &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;guides.rubygems.org&#x2F;patterns&#x2F;&quot;&gt;states&lt;&#x2F;a&gt; &quot;The RubyGems team urges gem developers to follow the Semantic Versioning standard for their gem&#x27;s versions. The RubyGems library itself does not enforce a strict versioning policy, but using an &quot;irrational&quot; policy will only be a disservice to those in the community who use your gems&quot;, but i couldn&#x27;t find any details on what versions and operators are allowed.
Composer on the other hand is very much like python (&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;getcomposer.org&#x2F;doc&#x2F;04-schema.md#version&quot;&gt;docs&lt;&#x2F;a&gt;): &quot;This must follow the format of X.Y.Z or vX.Y.Z with an optional suffix of -dev, -patch (-p), -alpha (-a), -beta (-b) or -RC.&quot;, where dev is below alpha. It also seems to allow &lt;code&gt;1.2.*&lt;&#x2F;code&gt; but i couldn&#x27;t find any more documentation on what&#x27;s allowed and what the semantics are except that they apparently &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;composer&#x2F;composer&#x2F;blob&#x2F;bd6a5019b3bf5edf13640522796f54accaad789e&#x2F;src&#x2F;Composer&#x2F;Platform&#x2F;Version.php#L63-L69&quot;&gt;transform prereleases to a version digit&lt;&#x2F;a&gt; &lt;a href=&quot;#fr-8-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;&#x2F;section&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>A dive into packaging native python extensions</title>
        <published>2018-07-21T00:00:00+00:00</published>
        <updated>2018-07-21T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Konstantin Schütze
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="/a-dive-into-packaging-native-python-extensions/"/>
        <id>/a-dive-into-packaging-native-python-extensions/</id>
        
        <content type="html" xml:base="/a-dive-into-packaging-native-python-extensions/">&lt;p&gt;&lt;em&gt;The complete guide to building you own native wheel from scratch&lt;&#x2F;em&gt;&lt;&#x2F;p&gt;
&lt;p&gt;There are cases where you want to extend python with native code, e.g. for scientific computing (numpy, scipy), database connectors (mysqlclient, psycopg2) or UI (pygobject, pyqt). For cpython this is traditionally done in C&#x2F;C++, but you can also use the C api from D (&lt;a rel=&quot;external&quot; href=&quot;http:&#x2F;&#x2F;www.dsource.org&#x2F;projects&#x2F;pyd&quot;&gt;pyd&lt;&#x2F;a&gt;), go (&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;hackernoon.com&#x2F;extending-python-3-in-go-78f3a69552ac&quot;&gt;cffi&lt;&#x2F;a&gt;) or rust (&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;blog.sentry.io&#x2F;2016&#x2F;10&#x2F;19&#x2F;fixing-python-performance-with-rust.html&quot;&gt;cffi&lt;&#x2F;a&gt; or &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;pyo3&#x2F;pyo3&quot;&gt;pyo3&lt;&#x2F;a&gt;).&lt;&#x2F;p&gt;
&lt;p&gt;Distributing those extensions is a big problem. Until recently, the only viable option was to write special plugins for setuptools, e.g. &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;getsentry&#x2F;milksnake&quot;&gt;milksnake&lt;&#x2F;a&gt; for cffi or &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;PyO3&#x2F;setuptools-rust&quot;&gt;setuptools-rust&lt;&#x2F;a&gt; for pyo3. Inspired by the new &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;snarky.ca&#x2F;clarifying-pep-518&#x2F;&quot;&gt;pyproject.toml&lt;&#x2F;a&gt;, I wanted to get rid of the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.python.org&#x2F;dev&#x2F;peps&#x2F;pep-0518&#x2F;#rationale&quot;&gt;flaws&lt;&#x2F;a&gt; and &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;blog.ionelmc.ro&#x2F;2015&#x2F;02&#x2F;24&#x2F;the-problem-with-packaging-in-python&#x2F;&quot;&gt;resulting pain&lt;&#x2F;a&gt; of setuptools. So I set out to write &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;pyo3&#x2F;pyo3-pack&quot;&gt;pyo3-pack&lt;&#x2F;a&gt;, which aims at making packaging and publishing native python modules in rust as easy as &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;rustwasm&#x2F;wasm-pack&quot;&gt;wasm-pack&lt;&#x2F;a&gt; makes it for javascript.&lt;&#x2F;p&gt;
&lt;p&gt;It turns out that writing such a tool is relatively easy (less than a thousand lines of rust to get from source to wheel). The hard part is to find out what you need to do in the first place. The documentation is scattered across different and partially outdated tutorials, PEPs, stack overflow answers, references, examples and source code; I sometimes even had to resort to reverse engineering. So I decided to write down everything I learned about native wheels, which eventually became this blog post.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-good-parts&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#the-good-parts&quot; aria-label=&quot;Anchor link for: the-good-parts&quot;&gt;The good parts&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;The official tutorial on native modules, &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.python.org&#x2F;3&#x2F;extending&#x2F;&quot;&gt;Extending Python with C or C++&lt;&#x2F;a&gt;, is a good introduction to the core concepts of native modules: The header files, the calling conventions, GC, the object protocol and error handling. It only shows building for C&#x2F;C++ with distutils (the predecessor to setuptools) though, omits the officially blessed manylinux, and lacks an explanation of the abi and linking options (more on those later).&lt;&#x2F;p&gt;
&lt;p&gt;For the daily work, the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.python.org&#x2F;3&#x2F;c-api&#x2F;index.html&quot;&gt;Python&#x2F;C API Reference Manual&lt;&#x2F;a&gt; is often much better. It also has some explanation for the ABI.&lt;&#x2F;p&gt;
&lt;p&gt;For the rest of the post I&#x27;ll assume that you have built your native python module as a shared library (e.g. one exporting a &lt;code&gt;PyInit_&amp;lt;modname&amp;gt;&lt;&#x2F;code&gt; function for python 3) with your technology of choice.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;metadata&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#metadata&quot; aria-label=&quot;Anchor link for: metadata&quot;&gt;Metadata&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Each python package, whether it is &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;packaging.python.org&#x2F;discussions&#x2F;wheel-vs-egg&#x2F;&quot;&gt;an egg or a wheel&lt;&#x2F;a&gt; or a source archive, is described by structured metadata, which contains fields required for pip to work and informational fields used e.g. for pypi.&lt;&#x2F;p&gt;
&lt;p&gt;There are five versions for the metadata of python packages: 1.0 (&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.python.org&#x2F;dev&#x2F;peps&#x2F;pep-0241&#x2F;&quot;&gt;PEP 241&lt;&#x2F;a&gt;), 1.1 (&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.python.org&#x2F;dev&#x2F;peps&#x2F;pep-0314&#x2F;&quot;&gt;PEP 314&lt;&#x2F;a&gt;), 1.2 (&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.python.org&#x2F;dev&#x2F;peps&#x2F;pep-0345&#x2F;&quot;&gt;PEP 345&lt;&#x2F;a&gt;), 2.0 (&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.python.org&#x2F;dev&#x2F;peps&#x2F;pep-0426&quot;&gt;PEP 426&lt;&#x2F;a&gt;) and 2.1 (&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.python.org&#x2F;dev&#x2F;peps&#x2F;pep-0566&#x2F;&quot;&gt;PEP 566&lt;&#x2F;a&gt;).&lt;&#x2F;p&gt;
&lt;p&gt;2.0 was an attempt to replace the key-value structure of the metadata with a json-like structure. This could have been a big improvement, but was withdrawn (and is not accepted by pypi or pip) since it would have been too big a breakage. This is why the current version is called 2.1, even though it is backwards compatible with 1.0.&lt;&#x2F;p&gt;
&lt;p&gt;The current specification can be found at &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;packaging.python.org&#x2F;specifications&#x2F;core-metadata&#x2F;&quot;&gt;PyPA&#x27;s Core metadata specifications page&lt;&#x2F;a&gt;, which is pretty self-explanatory and worth reading.&lt;&#x2F;p&gt;
&lt;p&gt;N.B.: https:&#x2F;&#x2F;www.pypa.io&#x2F;en&#x2F;latest&#x2F;roadmap&#x2F; is completely outdated as it still features Metadata 2.0 as part of the roadmap. https:&#x2F;&#x2F;packaging.python.org&#x2F;specifications&#x2F;core-metadata&#x2F;#description is misleading since you must not use the RFC 822 format in the metadata for the pypi upload (see the section on uploading), and for the METADATA file inside the wheel you can just put the description in the body, i.e. after all the keys.&lt;&#x2F;p&gt;
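&lt;p&gt;As an illustration, here&#x27;s what a minimal METADATA file could look like (real field names from the core metadata spec, made-up values):&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;Metadata-Version: 2.1
Name: steinlaus
Version: 1.0.0
Summary: A native example module

The description is simply the body, i.e. everything after the keys.
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;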
&lt;h2 id=&quot;tags-and-naming&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#tags-and-naming&quot; aria-label=&quot;Anchor link for: tags-and-naming&quot;&gt;Tags and naming&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Native modules need to specify with which platforms and python interpreters they are compatible. Python has two coexisting standards with slightly different syntax: &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.python.org&#x2F;dev&#x2F;peps&#x2F;pep-0425&#x2F;&quot;&gt;PEP 425&lt;&#x2F;a&gt; for packages and &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.python.org&#x2F;dev&#x2F;peps&#x2F;pep-3149&#x2F;&quot;&gt;PEP 3149&lt;&#x2F;a&gt; for shared libraries. Both are based on abi tags, so let&#x27;s discuss them first.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;the-cpython-abi&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#the-cpython-abi&quot; aria-label=&quot;Anchor link for: the-cpython-abi&quot;&gt;The cpython ABI&lt;&#x2F;a&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;The cpython abi is composed of the major and minor version of cpython and a set of abiflags, which are determined by compiler flags. According to &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.python.org&#x2F;dev&#x2F;peps&#x2F;pep-3149&#x2F;#proposal&quot;&gt;PEP 3149&lt;&#x2F;a&gt; there are three such compile time options we need to consider (at least for linux and mac):&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;d&lt;&#x2F;code&gt;: &lt;code&gt;--with-pydebug&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;m&lt;&#x2F;code&gt;: &lt;code&gt;--with-pymalloc&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;u&lt;&#x2F;code&gt;: &lt;code&gt;--with-wide-unicode&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;For practical purposes, &lt;code&gt;d&lt;&#x2F;code&gt; is irrelevant, &lt;code&gt;m&lt;&#x2F;code&gt; is always set and &lt;code&gt;u&lt;&#x2F;code&gt; may or may not be set - more on &lt;code&gt;u&lt;&#x2F;code&gt; below. The tag for this abi is &lt;code&gt;cp{major}{minor}{abiflags}&lt;&#x2F;code&gt; or &lt;code&gt;cpython-{major}{minor}{abiflags}&lt;&#x2F;code&gt;. My python 3.6 installation is for example &lt;code&gt;cp36m&lt;&#x2F;code&gt; and &lt;code&gt;cpython-36m&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;code&gt;u&lt;&#x2F;code&gt; or wide-unicode flag is about the representation of unicode characters (&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.joelonsoftware.com&#x2F;2003&#x2F;10&#x2F;08&#x2F;the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses&#x2F;&quot;&gt;introductory article&lt;&#x2F;a&gt;). Initially, python unicode characters were fixed to two bytes (UCS-2), meaning that any 3 or 4 byte characters were not representable. This changed with &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.python.org&#x2F;dev&#x2F;peps&#x2F;pep-0261&#x2F;&quot;&gt;PEP 261&lt;&#x2F;a&gt;, which added optional support for wide unicode characters (UCS-4) to python2. The choice between UCS-2 and UCS-4 was made a compile time option, creating the abi without &quot;u&quot; for UCS-2 and one with &quot;u&quot; for UCS-4 (ignoring the option to completely disable unicode). In &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.python.org&#x2F;3&#x2F;whatsnew&#x2F;3.3.html#pep-393&quot;&gt;python 3.3&lt;&#x2F;a&gt; this was replaced by a system that determines the representation at runtime described in &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.python.org&#x2F;dev&#x2F;peps&#x2F;pep-0393&#x2F;&quot;&gt;PEP 393&lt;&#x2F;a&gt;, removing the &quot;u&quot; flag from the abi. This means that the wide-unicode option is only relevant for backwards compatibility with python 2.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;the-stable-abi&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#the-stable-abi&quot; aria-label=&quot;Anchor link for: the-stable-abi&quot;&gt;The stable abi&lt;&#x2F;a&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;There are obviously some big drawbacks from having tons of different abis which you all need to support and build and test, so &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.python.org&#x2F;dev&#x2F;peps&#x2F;pep-0384&#x2F;&quot;&gt;PEP 384&lt;&#x2F;a&gt; introduced the &quot;stable abi&quot; in python 3.2. This abi, with the tag &lt;code&gt;abi3&lt;&#x2F;code&gt;, contains a subset of the full abi that is forward compatible with all future 3.x releases of cpython. In the header files, everything that is not part of the stable abi is gated with &lt;code&gt;#if !defined(Py_LIMITED_API)&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;The stable abi is extended from time to time, meaning that you can require the stable abi and a minimum version. In the header files this is done by setting &lt;code&gt;Py_LIMITED_API&lt;&#x2F;code&gt; to the minimum supported python version in the &lt;code&gt;PY_VERSION_HEX&lt;&#x2F;code&gt; format as described in the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.python.org&#x2F;3&#x2F;c-api&#x2F;stable.html&quot;&gt;documentation&lt;&#x2F;a&gt;. In the headers this is checked e.g. with &lt;code&gt;#if !defined(Py_LIMITED_API) || Py_LIMITED_API+0 &amp;gt;= 0x03030000&lt;&#x2F;code&gt; for a function that was added to the stable abi in python 3.3.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;sysconfig&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#sysconfig&quot; aria-label=&quot;Anchor link for: sysconfig&quot;&gt;Sysconfig&lt;&#x2F;a&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;In the initial version of this post, I wrote about getting the required information about the interpreter through sysconfig. But it turned out that sysconfig behaves inconsistently across python versions and operating systems. E.g. the &lt;code&gt;VERSION&lt;&#x2F;code&gt; field on linux is in the format &lt;code&gt;{major}.{minor}&lt;&#x2F;code&gt;, while it is &lt;code&gt;{major}{minor}&lt;&#x2F;code&gt; on windows (both with python 3.7). There&#x27;s also &lt;code&gt;EXT_SUFFIX&lt;&#x2F;code&gt;, which tells you the complete extension of the library filename on linux (e.g. &lt;code&gt;&quot;.cpython-35m-x86_64-linux-gnu.so&quot;&lt;&#x2F;code&gt;), but on windows it&#x27;s just &lt;code&gt;.pyd&lt;&#x2F;code&gt;. I&#x27;ve collected a few samples in &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;PyO3&#x2F;pyo3-pack&#x2F;tree&#x2F;master&#x2F;sysconfig&quot;&gt;a folder in the pyo3-pack repo&lt;&#x2F;a&gt;. You&#x27;ll find more of those weird cases in there.&lt;&#x2F;p&gt;
&lt;p&gt;I&#x27;m currently using the following snippet with &lt;code&gt;python -c&lt;&#x2F;code&gt; and do the logic and sanity checks in rust.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #81A1C1;&quot;&gt;import&lt;&#x2F;span&gt;&lt;span&gt; sysconfig&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #81A1C1;&quot;&gt;import&lt;&#x2F;span&gt;&lt;span&gt; sys&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #81A1C1;&quot;&gt;import&lt;&#x2F;span&gt;&lt;span&gt; json&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #88C0D0;&quot;&gt;print&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span&gt;json&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #88C0D0;&quot;&gt;dumps&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;({&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;    &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #A3BE8C;&quot;&gt;major&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;&amp;quot;:&lt;&#x2F;span&gt;&lt;span&gt; sys&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span&gt;version_info&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span&gt;major&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;    &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #A3BE8C;&quot;&gt;minor&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;&amp;quot;:&lt;&#x2F;span&gt;&lt;span&gt; sys&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span&gt;version_info&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span&gt;minor&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;    &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #A3BE8C;&quot;&gt;abiflags&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;&amp;quot;:&lt;&#x2F;span&gt;&lt;span&gt; sysconfig&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #88C0D0;&quot;&gt;get_config_var&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;(&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #A3BE8C;&quot;&gt;ABIFLAGS&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;&amp;quot;),&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;    &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #A3BE8C;&quot;&gt;m&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;&amp;quot;:&lt;&#x2F;span&gt;&lt;span&gt; sysconfig&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #88C0D0;&quot;&gt;get_config_var&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;(&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #A3BE8C;&quot;&gt;WITH_PYMALLOC&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;&amp;quot;)&lt;&#x2F;span&gt;&lt;span style=&quot;color: #81A1C1;&quot;&gt; ==&lt;&#x2F;span&gt;&lt;span style=&quot;color: #B48EAD;&quot;&gt; 1&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;    &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #A3BE8C;&quot;&gt;u&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;&amp;quot;:&lt;&#x2F;span&gt;&lt;span&gt; sysconfig&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #88C0D0;&quot;&gt;get_config_var&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;(&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #A3BE8C;&quot;&gt;Py_UNICODE_SIZE&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;&amp;quot;)&lt;&#x2F;span&gt;&lt;span style=&quot;color: #81A1C1;&quot;&gt; ==&lt;&#x2F;span&gt;&lt;span style=&quot;color: #B48EAD;&quot;&gt; 4&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;    &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #A3BE8C;&quot;&gt;d&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;&amp;quot;:&lt;&#x2F;span&gt;&lt;span&gt; sysconfig&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #88C0D0;&quot;&gt;get_config_var&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;(&amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #A3BE8C;&quot;&gt;Py_DEBUG&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;&amp;quot;)&lt;&#x2F;span&gt;&lt;span style=&quot;color: #81A1C1;&quot;&gt; ==&lt;&#x2F;span&gt;&lt;span style=&quot;color: #B48EAD;&quot;&gt; 1&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #616E88;&quot;&gt;    # This one isn&amp;#39;t technically necessary, but still very useful for sanity checks&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;    &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #A3BE8C;&quot;&gt;platform&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;&amp;quot;:&lt;&#x2F;span&gt;&lt;span&gt; sys&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span&gt;platform&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;,&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;}))&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This is then deserialized into the equivalent of the following python 3.7 code:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;@&lt;&#x2F;span&gt;&lt;span style=&quot;color: #D08770;&quot;&gt;dataclass&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #81A1C1;&quot;&gt;class&lt;&#x2F;span&gt;&lt;span style=&quot;color: #8FBCBB;&quot;&gt; Interpreter&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    major&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #88C0D0;&quot;&gt; int&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    minor&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span style=&quot;color: #88C0D0;&quot;&gt; int&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    abiflags&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;:&lt;&#x2F;span&gt;&lt;span&gt; Optional&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;[&lt;&#x2F;span&gt;&lt;span style=&quot;color: #88C0D0;&quot;&gt;str&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If you still want to use sysconfig, the easiest way is through &lt;code&gt;python -m sysconfig&lt;&#x2F;code&gt;. As seen above, you can use &lt;code&gt;WITH_PYMALLOC&lt;&#x2F;code&gt; (1 means &lt;code&gt;m&lt;&#x2F;code&gt;), &lt;code&gt;Py_UNICODE_SIZE&lt;&#x2F;code&gt; (4 means &lt;code&gt;u&lt;&#x2F;code&gt;) and &lt;code&gt;Py_DEBUG&lt;&#x2F;code&gt; (1 would mean &lt;code&gt;d&lt;&#x2F;code&gt;) for the python 2 abiflags.&lt;&#x2F;p&gt;
&lt;p&gt;To get the flags in machine-readable form as a &lt;code&gt;Dict[str, Union[str, int]]&lt;&#x2F;code&gt;, use:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;shellscript&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #88C0D0;&quot;&gt;python&lt;&#x2F;span&gt;&lt;span style=&quot;color: #A3BE8C;&quot;&gt; -c&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt; &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #A3BE8C;&quot;&gt;import json, sysconfig; print(json.dumps(sysconfig.get_config_vars()))&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;For a &lt;code&gt;Dict[str, str]&lt;&#x2F;code&gt;, use:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;shellscript&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #88C0D0;&quot;&gt;python&lt;&#x2F;span&gt;&lt;span style=&quot;color: #A3BE8C;&quot;&gt; -c&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt; &amp;quot;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #A3BE8C;&quot;&gt;import json, sysconfig; print(json.dumps({k:str(v) for k, v in sysconfig.get_config_vars().items()}))&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;&amp;quot;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h2 id=&quot;naming-shared-libraries&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#naming-shared-libraries&quot; aria-label=&quot;Anchor link for: naming-shared-libraries&quot;&gt;Naming shared libraries&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.python.org&#x2F;dev&#x2F;peps&#x2F;pep-3149&#x2F;&quot;&gt;PEP 3149&lt;&#x2F;a&gt; defines that shared libraries will get a tag between the file name and the extension, separated by a dot. It tells you that this tag needs to include at least the implementation (i.e. cpython) with its major and minor version. It also shows &lt;code&gt;.cpython-32mu.so&lt;&#x2F;code&gt; as an example of such a file extension, from which we can derive &lt;code&gt;.cpython-{major}{minor}{abiflags}.so&lt;&#x2F;code&gt; as a template.&lt;&#x2F;p&gt;
&lt;p&gt;This sounds nice, but is extremely misleading if not plainly wrong in reality.&lt;&#x2F;p&gt;
&lt;p&gt;From picking apart other native libraries and trial and error with filenames I figured the following:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Python 2.7 - 3.2 doesn&#x27;t have any abi tags.&lt;&#x2F;li&gt;
&lt;li&gt;Python 3.2 - 3.4 actually uses the scheme &lt;code&gt;.cpython-{major}{minor}{abiflags}.so&lt;&#x2F;code&gt; for POSIX (i.e. linux and mac), but accepts files without a tag. Windows still doesn&#x27;t use tags.&lt;&#x2F;li&gt;
&lt;li&gt;Python 3.5+ uses a new scheme with the platform included, which is now also used for windows. 3.5+ also accepts files without any tag, but not those with a 3.2 - 3.4 style tag.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The only place the new, 3.5+ schema has ever been announced was the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.python.org&#x2F;3&#x2F;whatsnew&#x2F;3.5.html#build-and-c-api-changes&quot;&gt;python 3.5 release notes&lt;&#x2F;a&gt;. But rejoice, even those are wrong. (I tried googling both the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.google.com&#x2F;search?q=cpython-%3Cmajor%3E%3Cminor%3Em-%3Carchitecture%3E-%3Cos%3E.pyd&quot;&gt;wrong&lt;&#x2F;a&gt; and the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.google.com&#x2F;search?q=cpython-%3Cmajor%3E%3Cminor%3Em-%3Carchitecture%3E-%3Cos%3E.so&quot;&gt;correct&lt;&#x2F;a&gt; version, but it really seems to be only in those release notes)&lt;&#x2F;p&gt;
&lt;p&gt;For 3.5+, I found that the following is what&#x27;s actually working (and also what setuptools produces):&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Linux&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Template: &lt;code&gt;.cpython-{major}{minor}{abiflags}-{architecture}-{os}.so&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;code&gt;architecture&lt;&#x2F;code&gt; is either &lt;code&gt;i386&lt;&#x2F;code&gt; or &lt;code&gt;x86_64&lt;&#x2F;code&gt;, and &lt;code&gt;os&lt;&#x2F;code&gt; is &lt;code&gt;linux-gnu&lt;&#x2F;code&gt;. The release notes state that the file extension is &lt;code&gt;.pyd&lt;&#x2F;code&gt;, which is wrong and doesn&#x27;t work in practice. Also note that os has an internal minus, breaking the general rule of separating parts of the tag with a minus.&lt;&#x2F;p&gt;
&lt;p&gt;Example: &lt;code&gt;steinlaus.cpython-35m-x86_64-linux-gnu.so&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Mac OS&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Template: &lt;code&gt;.cpython-{major}{minor}{abiflags}-darwin.so&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Example: &lt;code&gt;steinlaus.cpython-35m-darwin.so&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Windows&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Template: &lt;code&gt;{name}.cp{major}{minor}-{platform}.pyd&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The platform is either &lt;code&gt;win_amd64&lt;&#x2F;code&gt; or &lt;code&gt;win32&lt;&#x2F;code&gt;. .pyd files are just renamed .dll files, which is confirmed in the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.python.org&#x2F;3&#x2F;faq&#x2F;windows.html#is-a-pyd-file-the-same-as-a-dll&quot;&gt;official windows FAQ&lt;&#x2F;a&gt; (which is otherwise extremely outdated)&lt;&#x2F;p&gt;
&lt;p&gt;Example: &lt;code&gt;steinlaus.cp35-win_amd64.pyd&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
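&lt;p&gt;If you want to check what your own interpreter actually accepts, &lt;code&gt;importlib.machinery&lt;&#x2F;code&gt; spells it out:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;import importlib.machinery

# All filename suffixes the running interpreter will import as extension modules,
# e.g. [&amp;#39;.cpython-36m-x86_64-linux-gnu.so&amp;#39;, &amp;#39;.abi3.so&amp;#39;, &amp;#39;.so&amp;#39;] on linux
print(importlib.machinery.EXTENSION_SUFFIXES)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;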
&lt;h2 id=&quot;naming-wheels&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#naming-wheels&quot; aria-label=&quot;Anchor link for: naming-wheels&quot;&gt;Naming wheels&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;The documentation for defining wheels is much better than the one for naming .so files, with most parts being specified in &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.python.org&#x2F;dev&#x2F;peps&#x2F;pep-0425&#x2F;&quot;&gt;PEP 425&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;The official schema from that PEP is &lt;code&gt;{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl&lt;&#x2F;code&gt;, which is used for all python versions. The distribution is your package&#x27;s name, escaped with &lt;code&gt;re.sub(r&quot;[^\w\d.]+&quot;, &quot;_&quot;, distribution, flags=re.UNICODE)&lt;&#x2F;code&gt; (note that the flags must be passed as &lt;code&gt;flags=&lt;&#x2F;code&gt;; passed positionally they&#x27;d end up as the count argument). We can ignore and skip the build tag, the python tag for our case is &lt;code&gt;cp{major}{minor}{abiflags}&lt;&#x2F;code&gt;, the abi tag is either the python tag, &lt;code&gt;abi3&lt;&#x2F;code&gt; or &lt;code&gt;none&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;For the platform tag it states that &quot;The platform tag is simply distutils.util.get_platform() with all hyphens - and periods . replaced with underscore _.&quot; This is unfortunate, since the output of &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;docs.python.org&#x2F;3.7&#x2F;distutils&#x2F;apiref.html#distutils.util.get_platform&quot;&gt;distutils.util.get_platform()&lt;&#x2F;a&gt; isn&#x27;t specified, so we need to reverse engineer. Looking only at 32-bit and 64-bit x86, we have either &lt;code&gt;win_amd64&lt;&#x2F;code&gt; or &lt;code&gt;win32&lt;&#x2F;code&gt; for windows. For linux, we have &lt;code&gt;linux_i686&lt;&#x2F;code&gt; or &lt;code&gt;linux_x86_64&lt;&#x2F;code&gt;, even though in practice we must use either &lt;code&gt;manylinux1_i686&lt;&#x2F;code&gt; or &lt;code&gt;manylinux1_x86_64&lt;&#x2F;code&gt; as described in the manylinux paragraph below. For mac the tag used by setuptools is &lt;code&gt;macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64&lt;&#x2F;code&gt; for whatever reason.&lt;&#x2F;p&gt;
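&lt;p&gt;Putting the parts together, here&#x27;s a sketch of assembling a wheel name (build tag omitted):&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;import re

def wheel_name(distribution, version, python_tag, abi_tag, platform_tag):
    &amp;quot;&amp;quot;&amp;quot;Assemble a wheel filename according to PEP 425 (without a build tag).&amp;quot;&amp;quot;&amp;quot;
    distribution = re.sub(r&amp;quot;[^\w\d.]+&amp;quot;, &amp;quot;_&amp;quot;, distribution, flags=re.UNICODE)
    return f&amp;quot;{distribution}-{version}-{python_tag}-{abi_tag}-{platform_tag}.whl&amp;quot;

assert (wheel_name(&amp;quot;steinlaus&amp;quot;, &amp;quot;1.0.0&amp;quot;, &amp;quot;cp36&amp;quot;, &amp;quot;cp36m&amp;quot;, &amp;quot;win_amd64&amp;quot;)
        == &amp;quot;steinlaus-1.0.0-cp36-cp36m-win_amd64.whl&amp;quot;)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;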
&lt;p&gt;Examples:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;steinlaus-1.0.0-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;steinlaus-1.0.0-cp36-cp36m-manylinux1_x86_64.whl&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;steinlaus-1.0.0-cp36-cp36m-win_amd64.whl&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;h3 id=&quot;manylinux&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#manylinux&quot; aria-label=&quot;Anchor link for: manylinux&quot;&gt;Manylinux&lt;&#x2F;a&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;Libraries and binaries on Linux are traditionally (for better or worse) dynamically linked to libraries in &lt;code&gt;$LD_LIBRARY_PATH&lt;&#x2F;code&gt;, which are installed through the system&#x27;s package manager. Native modules could require arbitrary versions of arbitrary libraries, but can&#x27;t guarantee they are installed on the target machine, leading to linker errors when importing. To avoid such incompatibilities, &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.python.org&#x2F;dev&#x2F;peps&#x2F;pep-0513&#x2F;&quot;&gt;PEP 513&lt;&#x2F;a&gt; specifies a target &lt;code&gt;manylinux1&lt;&#x2F;code&gt; which contains only a set of old versions of libraries that can be found on basically every Linux. (This is an extremely short summary of the rationale in the PEP)&lt;&#x2F;p&gt;
&lt;p&gt;Wheels for the &lt;code&gt;manylinux1&lt;&#x2F;code&gt; target must be built in the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;pypa&#x2F;manylinux&quot;&gt;manylinux1 docker container&lt;&#x2F;a&gt;. This container is based on CentOS 5, i.e. some very old Linux. Using this docker image is the only officially blessed way to build for the linux target in general. Pypi only accepts wheels with the &lt;code&gt;manylinux1&lt;&#x2F;code&gt; tag and rejects those with a &lt;code&gt;linux&lt;&#x2F;code&gt; tag. A slightly more modern target, &lt;code&gt;manylinux2010&lt;&#x2F;code&gt;, is currently being worked on as a successor to manylinux1 (&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.python.org&#x2F;dev&#x2F;peps&#x2F;pep-0571&#x2F;&quot;&gt;PEP 571 - The manylinux2010 Platform Tag&lt;&#x2F;a&gt;, &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;pypa&#x2F;manylinux&#x2F;issues&#x2F;179&quot;&gt;tracking issue&lt;&#x2F;a&gt;).&lt;&#x2F;p&gt;
&lt;p&gt;manylinux is accompanied by a tool called &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;pypa&#x2F;auditwheel&quot;&gt;auditwheel&lt;&#x2F;a&gt; that checks the library and then &quot;awards&quot; the manylinux1 tag. Afaik this is not checked by pypi, so it&#x27;s possible to lie about that check.&lt;&#x2F;p&gt;
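&lt;p&gt;For reference, the two relevant subcommands look roughly like this; the wheel filename is a made-up example:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;# Check which platform tag a wheel qualifies for
auditwheel show steinlaus-1.0.0-cp36-cp36m-linux_x86_64.whl
# Bundle external shared libraries into the wheel and apply the manylinux1 tag
auditwheel repair steinlaus-1.0.0-cp36-cp36m-linux_x86_64.whl&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;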
&lt;p&gt;By default rust only links very few system libraries, which are a subset of the manylinux1 target. This means that pyo3-pack only needs to check that the constraints are met and we can otherwise totally skip the whole ancient-docker-mess.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-internals-of-a-binary-wheel&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#the-internals-of-a-binary-wheel&quot; aria-label=&quot;Anchor link for: the-internals-of-a-binary-wheel&quot;&gt;The internals of a (binary) wheel&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;While there are alternative ways to install python packages, using wheels with pip is (for good reasons) the officially blessed one, so for pyo3-pack I&#x27;ve only looked into building wheels. They are specified in &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.python.org&#x2F;dev&#x2F;peps&#x2F;pep-0427&#x2F;&quot;&gt;PEP 427&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Wheels are generally just zip files with a &lt;code&gt;.whl&lt;&#x2F;code&gt; extension. Python distributions come in two flavors: bdist and sdist. Wheels are bdists (&quot;built distributions&quot;), i.e. pre-built packages. Their installation is mostly just unpacking the archive. They specify the compatible python version(s), an abi and a platform. sdists (&quot;source distributions&quot;), covered in their own section below, contain all the sources including your &lt;code&gt;setup.py&lt;&#x2F;code&gt; or &lt;code&gt;pyproject.toml&lt;&#x2F;code&gt;, so to install them, they need to be built first.&lt;&#x2F;p&gt;
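&lt;p&gt;Since a wheel is an ordinary zip archive, the standard library is enough to look inside one; a minimal sketch using the example filename from above:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;import zipfile

# A wheel unpacks like any other zip archive
with zipfile.ZipFile(&quot;steinlaus-1.0.0-cp36-cp36m-win_amd64.whl&quot;) as wheel:
    for name in wheel.namelist():
        print(name)&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;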
&lt;p&gt;Every wheel contains a &lt;code&gt;{distribution}-{version}.dist-info&lt;&#x2F;code&gt; folder with the following files inside it, where &lt;code&gt;{distribution}&lt;&#x2F;code&gt; is again the name with the underscore-escapes.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;WHEEL&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Wheel-Version: 1.0&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Generator: pyo3-pack ({version})&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Root-Is-Purelib: false&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Tag: {python tag}-{abi tag}-{platform tag}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;&#x2F;li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;METADATA&lt;&#x2F;code&gt;: This file contains the metadata as described above. Since metadata 2.1, you can (and want to) put the description in the body of the file, separated from the key-value pairs by a blank line. The only required keys are &lt;code&gt;Metadata-Version&lt;&#x2F;code&gt;, &lt;code&gt;Name&lt;&#x2F;code&gt; and &lt;code&gt;Version&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Metadata-Version: 2.1&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Name: {name}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Version: {version}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;Summary: {summary or UNKNOWN}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;{description &#x2F; content of readme.md}&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;&#x2F;li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;RECORD&lt;&#x2F;code&gt;: This file contains checksums and sizes of all files. Each line contains a file path, a hash and the size of the file in bytes, separated by commas like the following (see the sketch after this list):&lt;&#x2F;p&gt;
&lt;p&gt;&lt;code&gt;path&#x2F;to&#x2F;file,sha256=HASH-AS-URLSAFE-BASE64-NOPAD,1234&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The only exception is the record file itself, for which hash and size are left blank:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;code&gt;{name}-{version}.dist-info&#x2F;RECORD,,&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The exact format is described in &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.python.org&#x2F;dev&#x2F;peps&#x2F;pep-0376&#x2F;#record&quot;&gt;PEP 376&lt;&#x2F;a&gt;, while &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.python.org&#x2F;dev&#x2F;peps&#x2F;pep-0427&#x2F;&quot;&gt;PEP 427&lt;&#x2F;a&gt; adds that the hashing algorithm must be &quot;sha256 or better&quot;.&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;entry_points.txt&lt;&#x2F;code&gt;: This file isn&#x27;t specified in any PEP, but in the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;packaging.python.org&#x2F;specifications&#x2F;entry-points&#x2F;&quot;&gt;Entry points specification&lt;&#x2F;a&gt;. It contains sections with key-value pairs in the ini format. While it can do more, the interesting part is a section called &lt;code&gt;console_scripts&lt;&#x2F;code&gt;. This section lists functions that should be exposed as shell commands. The keys are the commands, while the values specify which function to call. Pip creates these scripts, small wrappers around the functions, when installing the package. The values have the structure &lt;code&gt;some.module.path:object.attr&lt;&#x2F;code&gt;. E.g. poetry defines&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;ini&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;[&lt;&#x2F;span&gt;&lt;span&gt;console_scripts&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;]&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #81A1C1;&quot;&gt;poetry&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;=&lt;&#x2F;span&gt;&lt;span&gt;poetry.console:main&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;which pip translates to&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #616E88;&quot;&gt;#!&#x2F;usr&#x2F;bin&#x2F;python3&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #616E88;&quot;&gt;# -*- coding: utf-8 -*-&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #81A1C1;&quot;&gt;import&lt;&#x2F;span&gt;&lt;span&gt; re&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #81A1C1;&quot;&gt;import&lt;&#x2F;span&gt;&lt;span&gt; sys&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #81A1C1;&quot;&gt;from&lt;&#x2F;span&gt;&lt;span&gt; poetry&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span&gt;console&lt;&#x2F;span&gt;&lt;span style=&quot;color: #81A1C1;&quot;&gt; import&lt;&#x2F;span&gt;&lt;span&gt; main&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #81A1C1;&quot;&gt;if&lt;&#x2F;span&gt;&lt;span&gt; __name__&lt;&#x2F;span&gt;&lt;span style=&quot;color: #81A1C1;&quot;&gt; ==&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt; &amp;#39;&lt;&#x2F;span&gt;&lt;span style=&quot;color: #A3BE8C;&quot;&gt;__main__&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;&amp;#39;:&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    sys&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span&gt;argv&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;[&lt;&#x2F;span&gt;&lt;span style=&quot;color: #B48EAD;&quot;&gt;0&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;]&lt;&#x2F;span&gt;&lt;span style=&quot;color: #81A1C1;&quot;&gt; =&lt;&#x2F;span&gt;&lt;span&gt; re&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #88C0D0;&quot;&gt;sub&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #81A1C1;&quot;&gt;r&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;&amp;#39;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #EBCB8B;&quot;&gt;-script\.pyw&lt;&#x2F;span&gt;&lt;span style=&quot;color: #81A1C1;&quot;&gt;?|&lt;&#x2F;span&gt;&lt;span style=&quot;color: #EBCB8B;&quot;&gt;\.exe&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;)&lt;&#x2F;span&gt;&lt;span style=&quot;color: #81A1C1;&quot;&gt;?&lt;&#x2F;span&gt;&lt;span style=&quot;color: #EBCB8B;&quot;&gt;$&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;&amp;#39;, &amp;#39;&amp;#39;,&lt;&#x2F;span&gt;&lt;span&gt; sys&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span&gt;argv&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;[&lt;&#x2F;span&gt;&lt;span style=&quot;color: #B48EAD;&quot;&gt;0&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;])&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    sys&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;.&lt;&#x2F;span&gt;&lt;span style=&quot;color: #88C0D0;&quot;&gt;exit&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;(&lt;&#x2F;span&gt;&lt;span style=&quot;color: #88C0D0;&quot;&gt;main&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt;())&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;&lt;&#x2F;li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;top_level.txt&lt;&#x2F;code&gt;: Setuptools also adds this file, which contains only the name of your package. This is part of the (PEP-less) egg format, the predecessor of wheels, as described in &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;pypa&#x2F;setuptools&#x2F;blob&#x2F;master&#x2F;docs&#x2F;formats.txt&quot;&gt;The Internal Structure of Python Eggs&lt;&#x2F;a&gt;. This file is not documented and not needed for wheels and therefore not added by other packagers such as poetry. (Interestingly enough, the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;pypa&#x2F;wheel&quot;&gt;wheel&lt;&#x2F;a&gt; repository, which adds the &lt;code&gt;bdist_wheel&lt;&#x2F;code&gt; command to setuptools, does not even contain the string &lt;code&gt;top_level.txt&lt;&#x2F;code&gt;.)&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
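&lt;p&gt;Here&#x27;s a minimal sketch of producing a &lt;code&gt;RECORD&lt;&#x2F;code&gt; line as described above; the function name and file path are made up for illustration:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;import base64
import hashlib
import os

def record_line(path):
    # Hash the file with sha256 and encode the digest as urlsafe base64,
    # stripping the padding as required for RECORD entries
    with open(path, &quot;rb&quot;) as file:
        digest = hashlib.sha256(file.read()).digest()
    hash_b64 = base64.urlsafe_b64encode(digest).rstrip(b&quot;=&quot;).decode()
    return &quot;{},sha256={},{}&quot;.format(path, hash_b64, os.path.getsize(path))

print(record_line(&quot;steinlaus&#x2F;__init__.py&quot;))&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;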
&lt;p&gt;For the actual package, you have two options:&lt;&#x2F;p&gt;
&lt;p&gt;If you only need to package one shared library, you put it at the top level of the zip. The shared library must be named according to the rules described above, where the basename must be the name of the module.&lt;&#x2F;p&gt;
&lt;p&gt;Example:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;├── get_fourtytwo-1.6.8.dist-info&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;│   ├── METADATA&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;│   ├── RECORD&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;│   └── WHEEL&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;└── get_fourtytwo.cpython-36m-x86_64-linux-gnu.so&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;For any wheels containing python files, whether they have native components or not, the top level module is a python module. This means a directory at the top level with the name of the module and an &lt;code&gt;__init__.py&lt;&#x2F;code&gt; inside that directory. Inside this directory, the same rules as for any other python project apply. Native modules work the same way as a pure python single-file module, only that the filenames end with &lt;code&gt;.so&lt;&#x2F;code&gt; or &lt;code&gt;.pyd&lt;&#x2F;code&gt; instead of &lt;code&gt;.py&lt;&#x2F;code&gt;. Take a look at numpy&#x27;s wheels for a complex, real world scenario.&lt;&#x2F;p&gt;
&lt;p&gt;Example:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;.&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;├── get_fourtytwo&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;│   ├── __init__.py&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;│   ├── native_fourtytwo.cpython-36m-x86_64-linux-gnu.so&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;│   └── python_fourtytwo.py&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;└── get_fourtytwo-1.6.8.dist-info&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    ├── METADATA&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    ├── RECORD&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;    └── WHEEL&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;... where &lt;code&gt;__init__.py&lt;&#x2F;code&gt; contains&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #81A1C1;&quot;&gt;from&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt; .&lt;&#x2F;span&gt;&lt;span&gt;native_fourtytwo&lt;&#x2F;span&gt;&lt;span style=&quot;color: #81A1C1;&quot;&gt; import&lt;&#x2F;span&gt;&lt;span&gt; native_class&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span style=&quot;color: #81A1C1;&quot;&gt;from&lt;&#x2F;span&gt;&lt;span style=&quot;color: #ECEFF4;&quot;&gt; .&lt;&#x2F;span&gt;&lt;span&gt;python_fourtytwo&lt;&#x2F;span&gt;&lt;span style=&quot;color: #81A1C1;&quot;&gt; import&lt;&#x2F;span&gt;&lt;span&gt; some_class&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Besides the presented wheel 1.0 format, there is also &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.python.org&#x2F;dev&#x2F;peps&#x2F;pep-0491&#x2F;&quot;&gt;PEP 491&lt;&#x2F;a&gt;, which defines a &quot;Wheel 1.9&quot; format. It is officially in draft status, but it seems completely abandoned, with no mention on the mailing list or in the relevant github repos. The PEP doesn&#x27;t explain why version 1.9 should follow version 1.0.&lt;&#x2F;p&gt;
&lt;p&gt;Note that you can lie to pypi about the metadata. I actually ran into a case where a .tar.gz was uploaded as 3.0.{date}, while the installed package identified itself as 3.0.dev0, which didn&#x27;t exist on pypi. This effectively broke pip freeze.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;source-distributions&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#source-distributions&quot; aria-label=&quot;Anchor link for: source-distributions&quot;&gt;Source distributions&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Source distributions, sdist for short, are special source archives that can be built and installed with pip. They are used e.g. when there are no wheels for the current platform&#x2F;abi, and as the base for building debian or fedora packages. While they have existed for a long time, they are formally specified in &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.python.org&#x2F;dev&#x2F;peps&#x2F;pep-0517&quot;&gt;PEP 517&lt;&#x2F;a&gt;. This PEP differentiates between a source tree, which would be the git repository, and a source distribution, which this paragraph is about.&lt;&#x2F;p&gt;
&lt;p&gt;A source distribution is a .tar.gz archive. It is explicitly stated that zip archives are not allowed anymore, even though the PEP mentions &lt;code&gt;lxml-3.4.4.zip&lt;&#x2F;code&gt; as an example in the beginning. The filename is &lt;code&gt;{name}-{version}.tar.gz&lt;&#x2F;code&gt;. The archive contains one folder, which is named &lt;code&gt;{name}-{version}&lt;&#x2F;code&gt;. This folder contains the required source, a setup.py and&#x2F;or a pyproject.toml, and a file called &lt;code&gt;PKG-INFO&lt;&#x2F;code&gt;, which is identical to the &lt;code&gt;METADATA&lt;&#x2F;code&gt; file in wheels.&lt;&#x2F;p&gt;
&lt;p&gt;Example:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;plain&quot;&gt;&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;foobar-0.11.2&#x2F;&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;├── foobar&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;│   ├── __init__.py&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;│   └── main.py&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;├── LICENSE&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;├── PKG-INFO&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;├── pyproject.toml&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;
&lt;span class=&quot;giallo-l&quot;&gt;&lt;span&gt;└── setup.py&lt;&#x2F;span&gt;&lt;&#x2F;span&gt;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
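&lt;p&gt;Packing such an archive needs nothing but the standard library; a minimal sketch using the example names from above:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;import tarfile

# The archive must be called {name}-{version}.tar.gz and contain
# a single {name}-{version} folder at the top level
with tarfile.open(&quot;foobar-0.11.2.tar.gz&quot;, &quot;w:gz&quot;) as archive:
    archive.add(&quot;foobar-0.11.2&quot;)&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;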
&lt;p&gt;If the archive contains a pyproject.toml with a &lt;code&gt;[build-system]&lt;&#x2F;code&gt; section that specifies the list of packages required for building in &lt;code&gt;requires&lt;&#x2F;code&gt; and the path to a build backend object in &lt;code&gt;build-backend&lt;&#x2F;code&gt;, this backend should be called by pip to build the source distribution into a wheel. pip 10.0.1, which is the latest version as of this writing, refuses to install such source distributions, stating &quot;This version of pip does not implement PEP 517 so it cannot build a wheel without &#x27;setuptools&#x27; and &#x27;wheel&#x27;.&quot;. We can therefore skip any further details about the build backend because we can&#x27;t use it yet anyway.&lt;&#x2F;p&gt;
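&lt;p&gt;Such a section would look roughly like the following; the setuptools backend is shown only to illustrate the format:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;toml&quot;&gt;[build-system]
requires = [&quot;setuptools&quot;, &quot;wheel&quot;]
build-backend = &quot;setuptools.build_meta&quot;&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;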
&lt;p&gt;Without a pyproject.toml with those entries, pip executes the &lt;code&gt;setup.py&lt;&#x2F;code&gt; in the directory, meaning that currently the only way to support source distributions is to use setuptools, which is exactly what I wanted to avoid. This means no sdist in custom packagers for now.&lt;&#x2F;p&gt;
&lt;p&gt;As a side note, both flit and poetry already implement the PEP 517 interface (&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;takluyver&#x2F;flit&#x2F;blob&#x2F;0514a13b2172aa717f065f4531b173bb06663057&#x2F;flit&#x2F;buildapi.py&quot;&gt;buildapi.py in flit&lt;&#x2F;a&gt; and &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;sdispater&#x2F;poetry&#x2F;blob&#x2F;1b7492e5659a43c0a05f10c6d60e90603c6c3406&#x2F;poetry&#x2F;masonry&#x2F;api.py&quot;&gt;api.py in poetry&lt;&#x2F;a&gt;) and add a pyproject.toml to the archive. But as they omit the &lt;code&gt;[build-system]&lt;&#x2F;code&gt; section, pip instead uses the setup.py they also create.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;finding-python-interpreters&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#finding-python-interpreters&quot; aria-label=&quot;Anchor link for: finding-python-interpreters&quot;&gt;Finding python interpreters&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;It&#x27;s convenient for building and essential for testing to find the installed python versions. On linux and mac, you can check which python binaries are in &lt;code&gt;PATH&lt;&#x2F;code&gt; and then use the snippet from above to get the version and abiflags. (Or you can use a fixed list of 2.7 and 3.5+ and just try each, because there&#x27;s no good library for working with &lt;code&gt;PATH&lt;&#x2F;code&gt; yet.)&lt;&#x2F;p&gt;
&lt;p&gt;For windows, every python version is just called &lt;code&gt;python.exe&lt;&#x2F;code&gt;. Fortunately, there is a launcher called &lt;code&gt;py&lt;&#x2F;code&gt;. With &lt;code&gt;-0&lt;&#x2F;code&gt; (but not &lt;code&gt;--list&lt;&#x2F;code&gt;, even if the help says otherwise) it will list all known versions, which you can launch with &lt;code&gt;py -{version}&lt;&#x2F;code&gt;. It&#x27;s then easy to get the path of the actual interpreter with &lt;code&gt;py -{version} -c &quot;import sys; print(sys.executable)&quot;&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
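&lt;p&gt;A rough sketch of this discovery; since the output format of &lt;code&gt;py -0&lt;&#x2F;code&gt; isn&#x27;t specified either, the parsing here is an assumption:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;import subprocess

def windows_interpreters():
    # `py -0` prints one known version per line, e.g. &quot; -3.6-64 *&quot;;
    # depending on the launcher version the list may go to stderr
    output = subprocess.check_output(
        [&quot;py&quot;, &quot;-0&quot;], stderr=subprocess.STDOUT, universal_newlines=True
    )
    versions = [
        line.strip().split()[0].lstrip(&quot;-&quot;)
        for line in output.splitlines()
        if line.strip().startswith(&quot;-&quot;)
    ]
    # Ask each version for the path of its actual interpreter
    return {
        version: subprocess.check_output(
            [&quot;py&quot;, &quot;-&quot; + version, &quot;-c&quot;, &quot;import sys; print(sys.executable)&quot;],
            universal_newlines=True,
        ).strip()
        for version in versions
    }&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;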
&lt;h2 id=&quot;contemporary-legacy-uploading&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#contemporary-legacy-uploading&quot; aria-label=&quot;Anchor link for: contemporary-legacy-uploading&quot;&gt;Contemporary legacy uploading&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Now that we&#x27;ve got our wheel built, we also want to publish it, i.e. upload it to pypi (which is now powered by a software called warehouse). It turns out that the api to upload packages is called the &quot;legacy api&quot;, even though there&#x27;s no newer api for uploads (there is a json api, but it only supports reading package metadata). The upload part of the legacy api had no documentation other than &quot;use twine&quot;, so I read through the source of poetry&#x27;s uploader, warehouse&#x27;s endpoint and warehouse&#x27;s tests to figure out how to use that api. Eventually I wrote a &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;pypa&#x2F;warehouse&#x2F;pull&#x2F;4080&quot;&gt;pull request&lt;&#x2F;a&gt; to warehouse &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;warehouse.readthedocs.io&#x2F;api-reference&#x2F;legacy&#x2F;#upload-api&quot;&gt;documenting that api&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
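&lt;p&gt;From what I pieced together, an upload is a single multipart POST against the legacy endpoint. The following is a rough sketch with made-up credentials and only the most important fields; the full field list is in the documentation linked above:&lt;&#x2F;p&gt;
&lt;pre class=&quot;giallo&quot; style=&quot;color: #D8DEE9; background-color: #2E3440;&quot;&gt;&lt;code data-lang=&quot;python&quot;&gt;import requests

# Hypothetical wheel and credentials; hash digests and the remaining
# metadata fields are omitted for brevity
with open(&quot;steinlaus-1.0.0-cp36-cp36m-win_amd64.whl&quot;, &quot;rb&quot;) as wheel:
    response = requests.post(
        &quot;https:&#x2F;&#x2F;upload.pypi.org&#x2F;legacy&#x2F;&quot;,
        data={
            &quot;:action&quot;: &quot;file_upload&quot;,
            &quot;protocol_version&quot;: &quot;1&quot;,
            &quot;name&quot;: &quot;steinlaus&quot;,
            &quot;version&quot;: &quot;1.0.0&quot;,
            &quot;filetype&quot;: &quot;bdist_wheel&quot;,
            &quot;pyversion&quot;: &quot;cp36&quot;,
            &quot;metadata_version&quot;: &quot;2.1&quot;,
        },
        files={&quot;content&quot;: wheel},
        auth=(&quot;username&quot;, &quot;password&quot;),
    )
response.raise_for_status()&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;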
&lt;h2 id=&quot;errors&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#errors&quot; aria-label=&quot;Anchor link for: errors&quot;&gt;Errors&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;As mentioned in the preface, all the information presented here is assembled from many different sources of varying quality and up-to-dateness, with some parts being reverse-engineered. So if you find any errors or missing parts, please ping me (konstin@mailbox.org, konstin on github, @konstinx on twitter).&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
<title>Meine Stadt Transparent, Part 2: The Technology</title>
        <published>2017-12-20T12:00:00+02:00</published>
        <updated>2017-12-20T12:00:00+02:00</updated>
        
        <author>
          <name>
            
              Konstantin Schütze
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="/meine-stadt-transparent-teil-2/"/>
        <id>/meine-stadt-transparent-teil-2/</id>
        
<content type="html" xml:base="/meine-stadt-transparent-teil-2/">&lt;p&gt;For almost four months, &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;hoessl.eu&quot;&gt;Tobias Hößl&lt;&#x2F;a&gt; and I have been developing Meine Stadt Transparent (&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;meine-stadt-transparent.de&#x2F;&quot;&gt;demo&lt;&#x2F;a&gt;, &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;meine-stadt-transparent&#x2F;meine-stadt-transparent&quot;&gt;GitHub&lt;&#x2F;a&gt;). The &lt;a href=&quot;&#x2F;meine-stadt-transparent-teil-1&#x2F;&quot;&gt;first part&lt;&#x2F;a&gt; covered how the project came about and why we are doing it. This part is about the current status and the technical details.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;ziele&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#ziele&quot; aria-label=&quot;Anchor link for: ziele&quot;&gt;Goals&lt;&#x2F;a&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;As described in part 1, there are already plenty of commercial systems, which are mainly aimed at administration employees and city councilors but are barely usable for citizens. We want to fill the gap with a citizen-friendly system. Put differently, we want to create &lt;em&gt;accessibility for everyone&lt;&#x2F;em&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Understandable language&lt;&#x2F;strong&gt;: Existing systems are often a mix of incomprehensible abbreviations and German word monsters like &quot;Beschlussvollzugkontrolle&quot;, &quot;Ratsinformationssystem&quot; and &quot;Anliegenmanagement&quot; that most users first have to google. This makes the topic of city politics even less sexy than it already is. We therefore avoid abbreviations, jargon and tech speak as far as possible.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Googleability&lt;&#x2F;strong&gt;: Information that the search doesn&#x27;t surface might as well not exist. Of course you can &lt;a rel=&quot;external&quot; href=&quot;http:&#x2F;&#x2F;www.zeit.de&#x2F;digital&#x2F;datenschutz&#x2F;2017-11&#x2F;rathaus-gemeinde-daten-ratsinformationssystem-hack&quot;&gt;still find hidden data&lt;&#x2F;a&gt;, but that is time-consuming and often fails simply because you don&#x27;t know &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.abgeordnetenwatch.de&#x2F;blog&#x2F;2016-01-22&#x2F;wir-veroffentlichen-die-liste-mit-allen-gutachten-des-wissenschaftlichen-dienstes&quot;&gt;what exists in the first place&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Likability&lt;&#x2F;strong&gt;: Politics is one of those topics that are very important but also very boring. That makes it all the more important that people actually &lt;em&gt;want&lt;&#x2F;em&gt; to use a website about it. Technical perfection is all well and good, but if a website is as interesting as a progress bar, hardly anyone will spend time there voluntarily. That doesn&#x27;t mean we stuff the site with achievements, calendar mottos and animations, but a few &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.reddit.com&#x2F;r&#x2F;aww&#x2F;&quot;&gt;cat pictures&lt;&#x2F;a&gt; and &lt;span title=&quot;Konami, sloths and MLP may appear in them&quot;&gt;easter eggs&lt;&#x2F;span&gt; are fine.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Accessibility&lt;&#x2F;strong&gt;: Politics affects everyone. Unfortunately, the documents are usually only published as pdf, which we can&#x27;t change at the moment. We therefore try to keep at least the site itself as accessible as possible using html aria tags and linters.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Sustainability&lt;&#x2F;strong&gt;: If software isn&#x27;t built in a way that allows it to evolve and adapt to new technical developments, it becomes outdated and gradually loses its usability.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;&#x2F;meine-stadt-transparent-teil-2&#x2F;cat-content.jpg&quot; alt=&quot;Likability through cat content (stock photo)&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Likability through cat content (stock photo)&lt;&#x2F;p&gt;
&lt;h3 id=&quot;frameworks-und-bibliotheken&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#frameworks-und-bibliotheken&quot; aria-label=&quot;Anchor link for: frameworks-und-bibliotheken&quot;&gt;Frameworks and libraries&lt;&#x2F;a&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;To achieve these goals, we need a whole stack of technology.&lt;&#x2F;p&gt;
&lt;p&gt;The backend is written in python 3 with django. We use mariadb (and partly sqlite) as the database and elasticsearch for search. The data is synchronized via django-elasticsearch-dsl, and the server is gunicorn behind an nginx. For the frontend we use Bootstrap 4 in django templates, plus some javascript in an npm-es6-babel-webpack pipeline.&lt;&#x2F;p&gt;
&lt;p&gt;We initially considered using a javascript framework, but decided against it. We both had prior experience with Angular, but for various reasons didn&#x27;t want to use it, and neither vue.js nor React. For most other frameworks, you have to assume that they will drown in the fast-moving javascript world and are therefore unsuitable for a long-term project, as the Stack Overflow blog &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;stackoverflow.blog&#x2F;2018&#x2F;01&#x2F;11&#x2F;brutal-lifecycle-javascript-frameworks&#x2F;&quot;&gt;has shown very vividly&lt;&#x2F;a&gt;. (Still a relevant read: &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;hackernoon.com&#x2F;how-it-feels-to-learn-javascript-in-2016-d3a717dd577f&quot;&gt;How it feels to learn JavaScript in 2016&lt;&#x2F;a&gt;). Personally, I would prefer to use &lt;a rel=&quot;external&quot; href=&quot;http:&#x2F;&#x2F;elm-lang.org&#x2F;&quot;&gt;elm&lt;&#x2F;a&gt; or &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.hellorust.com&#x2F;setup&#x2F;wasm-target&#x2F;&quot;&gt;rust&#x2F;webassembly&lt;&#x2F;a&gt;, but unfortunately neither is anywhere near mature enough.&lt;&#x2F;p&gt;
&lt;p&gt;After 3 months with Bootstrap, I&#x27;m still happy with our decision; Bootstrap&#x27;s documentation is excellent, the ready-made classes save a lot of work, and for two non-designers the site doesn&#x27;t look bad. The design can later be adapted through themes, which is useful e.g. for adapting the site to different cities. We also don&#x27;t need framework bindings for javascript libraries.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;die-live-demo&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#die-live-demo&quot; aria-label=&quot;Anchor link for: die-live-demo&quot;&gt;The live demo&lt;&#x2F;a&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;We have a live demo at &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;meine-stadt-transparent.de&#x2F;&quot;&gt;meine-stadt-transparent.de&lt;&#x2F;a&gt; that shows real data from the city of Jülich.&lt;&#x2F;p&gt;
&lt;p&gt;Thanks to the great &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;logsol&#x2F;Github-Auto-Deploy&quot;&gt;Github Auto Deploy&lt;&#x2F;a&gt;, the demo is always up to date. That also means that a bad commit can (and currently is allowed to) break the site.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;der-datenimport&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#der-datenimport&quot; aria-label=&quot;Anchor link for: der-datenimport&quot;&gt;The data import&lt;&#x2F;a&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;The site is built so that, with a bit of your own python code, you can in principle import arbitrary data. But to give the project real practical use rather than just a hypothetical purpose, we wrote an importer for OParl endpoints. &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;oparl.org&#x2F;&quot;&gt;OParl&lt;&#x2F;a&gt; specifies an API through which data from German council information systems (Ratsinformationssysteme, RIS) can be exported over a REST interface in a unified json format. OParl 1.0 was developed entirely by volunteers, but is now offered or being implemented by four large RIS vendors. Through Open.NRW, there is now &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.openpr.de&#x2F;news&#x2F;982972&#x2F;21-Kommunen-erfolgreich-mit-OParl-gestartet.html&quot;&gt;official OParl support in 21 municipalities using Sternberg SD.NET&lt;&#x2F;a&gt;. We use one of those 21 municipalities, the small town of Jülich, for our demo site.&lt;&#x2F;p&gt;
&lt;p&gt;Writing the importer was more complex than expected, and we hit complicated &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;meine-stadt-transparent&#x2F;meine-stadt-transparent&#x2F;issues&#x2F;15&quot;&gt;bugs&lt;&#x2F;a&gt; and &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;meine-stadt-transparent&#x2F;meine-stadt-transparent&#x2F;issues&#x2F;22&quot;&gt;other problems&lt;&#x2F;a&gt;. One of the bigger problems was and is the integration of &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;OParl&#x2F;liboparl&quot;&gt;liboparl&lt;&#x2F;a&gt;, since integrating gnome libraries like liboparl into python works much worse than expected. (In practice this means we have to use python-gobject&#x2F;gi, which only works with the system python and a symbolic link into the virtualenv. The undocumented installation via pip works in theory, but in practice &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;bugzilla.gnome.org&#x2F;show_bug.cgi?id=784428&quot;&gt;it doesn&#x27;t&lt;&#x2F;a&gt;.)&lt;&#x2F;p&gt;
&lt;h2 id=&quot;die-installation&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#die-installation&quot; aria-label=&quot;Anchor link for: die-installation&quot;&gt;The installation&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Meine Stadt Transparent should be as easy as possible to set up. For this, we have a quick-start guide in the readme that uses docker compose to set up, start and connect all required services (currently mariadb, elasticsearch and django). Due to &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;docker&#x2F;compose&#x2F;issues&#x2F;4305#issuecomment-305378202&quot;&gt;poor&lt;&#x2F;a&gt; &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;docker&#x2F;compose&#x2F;issues&#x2F;4305#issuecomment-308690795&quot;&gt;management&lt;&#x2F;a&gt; in Docker and the data import, some manual work remains, but we keep it as small as possible.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;ausblick&quot;&gt;&lt;a class=&quot;zola-anchor&quot; href=&quot;#ausblick&quot; aria-label=&quot;Anchor link for: ausblick&quot;&gt;Outlook&lt;&#x2F;a&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;We have about two months left until demo day on February 28. The main tasks until then are an OParl export, a change history for all important objects, and some internal restructuring.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
<title>Meine Stadt Transparent, Part 1: The Story So Far</title>
        <published>2017-12-20T00:00:00+00:00</published>
        <updated>2017-12-20T00:00:00+00:00</updated>
        
        <author>
          <name>
            
              Konstantin Schütze
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="/meine-stadt-transparent-teil-1/"/>
        <id>/meine-stadt-transparent-teil-1/</id>
        
<content type="html" xml:base="/meine-stadt-transparent-teil-1/">&lt;p&gt;For almost four months, &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;hoessl.eu&quot;&gt;Tobias Hößl&lt;&#x2F;a&gt; and I have been developing &lt;em&gt;Meine Stadt Transparent&lt;&#x2F;em&gt; (&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;meine-stadt-transparent.de&#x2F;&quot;&gt;demo&lt;&#x2F;a&gt;, &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;meine-stadt-transparent&#x2F;meine-stadt-transparent&quot;&gt;GitHub&lt;&#x2F;a&gt;). This first blog post is about how the project came about and why we are doing it. The second part will then cover the actual project and its technical details.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;&#x2F;meine-stadt-transparent-teil-1&#x2F;Aktenstapel-CC0.jpg&quot; alt=&quot;A stack of files, CC0, https:&#x2F;&#x2F;pixabay.com&#x2F;de&#x2F;dateien-papier-b%C3%BCro-papierkram-1614223&#x2F;&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;City councils, district committees and municipal councils have long produced large amounts of paper. The Munich city council and its 25 district committees alone, for example, published over 15,000 documents with more than 64,000 pages in 2015.&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-1-1&quot;&gt;&lt;a href=&quot;#fn-1&quot;&gt;1&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; To get a grip on this number of documents, a large part of the council administration was digitized in the 2000s. For this, special web portals were built, called &lt;em&gt;Ratsinformationssysteme&lt;&#x2F;em&gt; (council information systems). While cities, especially in the early days, relied on in-house or custom-built solutions, off-the-shelf systems and a handful of vendors have since established themselves. Thanks to such ready-made systems, more and more small municipalities and municipal associations now also run a council information system.&lt;&#x2F;p&gt;
&lt;p&gt;Their public interface is often the only way to get hold of important documents. The affairs of your own city or municipal council concern you more than you might expect: they decide e.g. on building projects, schools and daycare centers, public transport, and support for a multitude of clubs and initiatives. Unfortunately, the websites are useless as a source of information for citizens: the interfaces are completely outdated, full of bureaucratic jargon and riddled with technical errors. The biggest problem, however, is usually the lack of a search that actually finds anything. Because of these problems, many important public documents are effectively inaccessible to the public.&lt;&#x2F;p&gt;
&lt;p&gt;People in various cities recognized this problem and tried to build better websites on their own, almost always without support from the administrations. Much of this work happened under the umbrella of the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;codefor.de&#x2F;&quot;&gt;Open Knowledge Labs&lt;&#x2F;a&gt;. Unfortunately, most of these projects were never finished and eventually fell dormant. Two projects, however, were successful and exist to this day.&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-2-1&quot;&gt;&lt;a href=&quot;#fn-2&quot;&gt;2&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt;&lt;&#x2F;p&gt;
&lt;p&gt;One is &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;politik-bei-uns.de&#x2F;&quot;&gt;Politik bei Uns&lt;&#x2F;a&gt;, which was originally launched in 2012 as a platform for the Ruhr area.&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-3-1&quot;&gt;&lt;a href=&quot;#fn-3&quot;&gt;3&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; By now you can search the documents of Cologne, Bochum and the Berlin districts, among others. The main goal is to make as many cities as possible searchable and thereby accessible to citizens (especially via google and co.). The official data is processed automatically, e.g. by extracting the text from pdfs and recognizing addresses. (&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;beta.politik-bei-uns.de&#x2F;&quot;&gt;Version 2&lt;&#x2F;a&gt; of Politik bei Uns is now being developed. But that&#x27;s a story of its own.)&lt;&#x2F;p&gt;
&lt;p&gt;The other project is &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.muenchen-transparent.de&#x2F;&quot;&gt;München Transparent&lt;&#x2F;a&gt;. It started as a better interface for the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.ris-muenchen.de&quot;&gt;Munich council information system&lt;&#x2F;a&gt; that, like Politik bei Uns, processes the data and makes it searchable. It not only mirrors the data of the official system, but also adds extra information such as regulations, understandable explanations and geodata. The highlight of the system, though, are the notifications that inform users by e-mail when there is news in their neighborhood or on topics they have subscribed to.&lt;&#x2F;p&gt;
&lt;p&gt;The largest part was developed by Tobias Hößl, first under the name &quot;OpenRIS&quot;, then as &quot;Ratsinformant&quot;. I joined the project at the end of 2014 and extended and improved the site. In February 2015, we then properly launched the site as München Transparent. By now, even employees of the city and some city councilors use München Transparent instead of the in-house system.&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-4-1&quot;&gt;&lt;a href=&quot;#fn-4&quot;&gt;4&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; We have also been cooperating well with the city for quite a while, for which we are very grateful.&lt;sup class=&quot;footnote-reference&quot; id=&quot;fr-5-1&quot;&gt;&lt;a href=&quot;#fn-5&quot;&gt;5&lt;&#x2F;a&gt;&lt;&#x2F;sup&gt; Distributed responsibilities and lengthy processes in a city administration naturally make many things more complicated than in a hobby project.&lt;&#x2F;p&gt;
&lt;table&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.muenchen-transparent.de&#x2F;&quot;&gt;&lt;img src=&quot;MT-Startseite.png&quot; alt=&quot;The home page of München Transparent&quot;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;td&gt;
&lt;td&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.ris-muenchen.de&#x2F;RII&#x2F;RII&#x2F;ris_startseite.jsp&quot;&gt;&lt;img src=&quot;RIS-Startseite.png&quot; alt=&quot;The home page of the official Munich council information system&quot;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.muenchen-transparent.de&#x2F;suche?suchbegriff=freifunk&quot;&gt;&lt;img src=&quot;MT-Suche.png&quot; alt=&quot;The search on München Transparent&quot;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;td&gt;
&lt;td&gt;&lt;a href=&quot;https:&#x2F;&#x2F;www.ris-muenchen.de&#x2F;RII&#x2F;RII&#x2F;ris_suche.jsp&quot;&gt;&lt;img src=&quot;RIS-Suche.png&quot; alt=&quot;The search on the official Munich council information system&quot;&gt;&lt;&#x2F;a&gt;&lt;&#x2F;td&gt;
&lt;&#x2F;tr&gt;
&lt;&#x2F;table&gt;
&lt;p&gt;On the left is München Transparent, on the right the official Munich council information system, each linking to the corresponding pages.&lt;&#x2F;p&gt;
&lt;p&gt;Both projects use so-called &lt;em&gt;scrapers&lt;&#x2F;em&gt; to read the information from the official sites. These are more or less small programs that request every page of the interface, extract the interesting data and store the results in a database. From this database, the new website is then built.&lt;&#x2F;p&gt;
&lt;p&gt;Scraping gets you the data you need, but it is a rather absurd exercise: you put a lot of work into a program that, in effect, reconstructs the database behind a website by visiting every single page of that website. It would be much simpler to get the data directly in machine-readable form, ideally in the same format for every city. This is why &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;oparl.org&quot;&gt;OParl&lt;&#x2F;a&gt; was created. OParl is a machine-readable interface that can be added to existing council information systems with little effort. It maps the data of different cities onto a common representation and specifies how this data can be copied efficiently.&lt;&#x2F;p&gt;
&lt;p&gt;Through München Transparent, we eventually became aware of OParl, which at the time was still quite a construction site with repeatedly postponed release dates. Since I wanted such an interface for München Transparent (and because I still had too much time back then), I started contributing to OParl. In July of this year, we released the first stable version 1.0, which several large vendors are currently building into their council information systems. Politik bei Uns, München Transparent and 21 municipalities in North Rhine-Westphalia already have the interface. A version 1.1 with bug fixes and an English translation are in the works.&lt;&#x2F;p&gt;
&lt;p&gt;At München Transparent, we received several requests asking whether we couldn&#x27;t build something like it for other cities too; after all, the interface is already done and you would only need to put another city&#x27;s data behind it. Unfortunately, many parts of the code and of our data model are heavily tailored to Munich. On top of that come other technical problems: the framework, the technical foundation of the website, is outdated, there are no tests for quality assurance, and the code carries a lot of legacy baggage. Luckily this doesn&#x27;t hurt our interface, but it means we can&#x27;t simply transfer it to other cities, and in the long run this legacy prevents sensible further development.&lt;&#x2F;p&gt;
&lt;p&gt;That&#x27;s why I had long had the idea of rewriting the whole thing from scratch with the lessons learned so far. With a normal daily routine, however, you simply don&#x27;t have the time, and so it remained nothing but a nice idea for a long time. That changed with the Prototype Fund. The &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;prototypefund.de&quot;&gt;Prototype Fund&lt;&#x2F;a&gt; is an initiative of the &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;okfn.de&#x2F;&quot;&gt;Open Knowledge Foundation Deutschland&lt;&#x2F;a&gt; in which projects can apply for funding of (at the time) up to 30,000€ over 6 months. The money comes from the Federal Ministry of Education and Research. In the second round in March 2017, I &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;prototypefund.de&#x2F;project&#x2F;open-source-ratsinformationssystem&#x2F;&quot;&gt;applied&lt;&#x2F;a&gt; with the idea of developing a new, user-friendly council information system, working title &quot;Open Source Ratsinformationssystem&quot;. Unlike München Transparent, it should be deployable all over Germany and stand on a solid technical foundation. This has since become the project &quot;Meine Stadt Transparent&quot;, which I have been working on together with Tobias Hößl for almost four months. I&#x27;ll write about where the project stands and what is still planned in the second part.&lt;&#x2F;p&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn-1&quot;&gt;
&lt;p&gt;The numbers come from the München Transparent database &lt;a href=&quot;#fr-1-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-2&quot;&gt;
&lt;p&gt;There are of course other projects that successfully enrich their council information system, such as &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;offenesdresden&#x2F;dresden-ratsinfo&quot;&gt;Dresden Ratsinfo&lt;&#x2F;a&gt;. Here, however, I only mean projects that build a largely complete interface usable by citizens. &lt;a href=&quot;#fr-2-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-3&quot;&gt;
&lt;p&gt;&lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;openruhr.de&#x2F;2012&#x2F;06&#x2F;22&#x2F;openruhr-offene-daten-fuer-das-ruhrgebiet&#x2F;&quot;&gt;OpenRuhr – offene Daten für das Ruhrgebiet&lt;&#x2F;a&gt; &lt;a href=&quot;#fr-3-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-4&quot;&gt;
&lt;p&gt;Motions and inquiries often refer to other documents, and München Transparent is frequently linked in them, as a &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.muenchen-transparent.de&#x2F;suche?suchbegriff=muenchen-transparent.de&quot;&gt;meta search&lt;&#x2F;a&gt; on München Transparent shows. &lt;a href=&quot;#fr-4-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li id=&quot;fn-5&quot;&gt;
&lt;p&gt;Thanks to a &lt;a rel=&quot;external&quot; href=&quot;https:&#x2F;&#x2F;www.muenchen-transparent.de&#x2F;antraege&#x2F;3757806&quot;&gt;city council motion&lt;&#x2F;a&gt;, this cooperation also has a political mandate. &lt;a href=&quot;#fr-5-1&quot;&gt;↩&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;&#x2F;section&gt;
</content>
        
    </entry>
</feed>
