diff --git a/techpulse/posts/2024-03-12-llm-coding-assistants.md b/techpulse/posts/2024-03-12-llm-coding-assistants.md new file mode 100644 index 0000000..7071a16 --- /dev/null +++ b/techpulse/posts/2024-03-12-llm-coding-assistants.md @@ -0,0 +1,67 @@ +--- +title: "The Real Impact of AI Coding Assistants on Developer Productivity" +created: 2024-03-12 09:00 +author: Raj Patel +keywords: AI coding assistants, GitHub Copilot, developer productivity, code quality, security vulnerabilities +description: A study of 500 developers reveals a 40% productivity gain from AI coding tools — but the picture is more complicated than that number suggests. +--- + +![AI Coding Tools in Practice](assets/images/ai-coding.jpg) + +The claim had been circulating for months before anyone tested it rigorously: AI coding assistants make developers significantly more productive. GitHub cited a 55% productivity increase in one controlled study. Other vendors published numbers ranging from 30% to 70%. The figures were eye-catching enough that engineering managers had started asking their teams to adopt tools they barely understood. + +We wanted to know what the numbers looked like in practice, with real codebases and real deadlines. Over three months, TechPulse conducted a study with 500 developers across 40 companies — from eight-person startups to engineering organisations with several thousand employees. The headline number is real: developers using AI coding assistants completed assigned tasks approximately 40% faster than developers working without them. But the story behind that number is considerably more complicated. + +## What the 40% Number Actually Measures + +The productivity gain is real, but it is narrow. The tasks where AI assistants shine are tasks that involve writing code that follows patterns the model has seen many times: implementing a standard CRUD endpoint, writing unit tests for a function, converting data between formats, generating boilerplate for a new module. 
These are real tasks that occupy real developer time, and getting through them faster is genuinely valuable. + +The 40% improvement collapses significantly on tasks that require architectural reasoning, debugging complex interactions, or working with novel or unusual codebases. Several engineering leads we interviewed noted that junior developers using AI assistance were completing simple tasks quickly but struggling more with integration and debugging — skills that develop partly through the friction of writing code by hand. + +"The tool makes the easy stuff faster," one senior engineer at a fintech company told us. "The hard stuff is still hard. Sometimes it's harder, because the AI has generated three hundred lines of plausible-looking code that has subtle bugs in it, and now I have to find them." + +## Code Quality Concerns + +Every organisation in our study that had been using AI coding tools for more than six months reported concerns about code quality. The pattern was consistent: AI-generated code tends to pass automated tests (partly because AI tools are good at writing tests to match the code they just wrote), but it tends to have more subtle architectural issues, more duplication, and higher cyclomatic complexity than code written by experienced developers from scratch. + +We reviewed 3,000 pull requests across six companies that had adopted AI coding tools, comparing them against a baseline period before adoption. Code review times increased by 23% on average, and the fraction of pull requests that required significant rework before merging increased from 18% to 29%. Engineering managers who had expected AI tools to reduce code review burden found the opposite. + +One particularly striking finding: AI tools generated code that cited non-existent library functions in approximately 4% of completions — a phenomenon the AI community calls "hallucination" but that engineers working with production code describe less charitably. 
In most cases this was caught during compilation or testing, but not always. + +## Security Vulnerabilities in AI-Generated Code + +The security picture is the most concerning finding in our study. A research team at Stanford published a paper in 2023 showing that developers using GitHub Copilot were more likely to introduce security vulnerabilities than developers without assistance. Our study found similar patterns. + +Working with a security consultancy, we reviewed AI-generated code across fifteen repositories and identified security issues at a rate roughly 1.8x higher than the baseline codebases. The most common issues were SQL injection vulnerabilities, insecure random number generation, improper input validation, and hardcoded credentials — all classic beginner-level security errors that experienced developers have learned to avoid. + +The problem is not simply that AI generates insecure code. It is that AI generates insecure code that looks plausible and confident, which is harder to catch than obviously amateurish code. Several CTOs we interviewed noted that they had tightened security review requirements after adopting AI coding tools, which partially offset the productivity gains. + +"We're faster at getting to review, but review itself is more expensive," one CTO said. "The net is positive, but not as positive as the raw productivity numbers suggest." + +## Developer Dependency and Skill Development + +Four of the five most senior developers we interviewed — people with 15 or more years of experience — expressed concern about what AI coding tools are doing to skill development at the junior level. The concern is not luddite: all of them use the tools themselves. The concern is structural. + +Learning to code involves making mistakes, finding bugs, building mental models through failure. AI tools smooth over that friction. 
Junior developers who use AI assistants heavily may be writing more code per hour than their predecessors, but they may also be building shallower mental models of how that code actually works. Several engineering leads reported that junior developers who had learned primarily through AI-assisted coding had difficulty debugging issues that the AI could not solve — which is to say, the hard cases. + +"They can write a React component, but they don't really know what's happening inside it," one engineering manager told us. "Five years ago, a junior developer who couldn't explain what they'd written would be a red flag. Now it's become normal, and I'm not sure that's good." + +## Practical Recommendations + +Our recommendation for engineering organisations is neither uncritical adoption nor reflexive rejection. AI coding assistants are real productivity tools with real limitations. Used thoughtfully, they save time on repetitive tasks and reduce the cognitive cost of context-switching. Used carelessly, they create security debt and development practices that don't scale. + +Specific recommendations based on our study: + +**Do not remove mandatory code review.** The temptation to treat AI-generated code as more reliable than it is will cost you in the medium term. + +**Invest in security review tooling.** Static analysis and SAST tools should be running on all AI-generated code, and security training should be updated to cover AI-specific vulnerability patterns. + +**Think carefully about junior developer onboarding.** The productivity gains are lowest for junior developers, and the skill development concerns are highest. Consider structured periods of work without AI assistance. + +**Track quality metrics, not just velocity.** If your only measurement is pull request throughput, you will optimise for throughput. Track defect rates, review times, and rework rates as well. + +The 40% productivity gain is real. So are the costs. 
Engineering organisations that acknowledge both are better positioned to capture the benefits while managing the risks. + +--- + +*Study methodology: 500 developers across 40 companies, surveyed December 2023 through February 2024. Task completion data collected via controlled task assignments; code quality data collected via pull request analysis with developer consent. Full methodology available on request.* diff --git a/techpulse/posts/2024-05-20-open-source-sustainability.md b/techpulse/posts/2024-05-20-open-source-sustainability.md new file mode 100644 index 0000000..6ada037 --- /dev/null +++ b/techpulse/posts/2024-05-20-open-source-sustainability.md @@ -0,0 +1,61 @@ +--- +title: "Open Source Sustainability Crisis: Who Pays for the Infrastructure?" +created: 2024-05-20 14:00 +author: Clara Winthorpe +keywords: open source, sustainability, xz backdoor, OpenSSF, Sovereign Tech Fund, funding +description: The xz backdoor incident exposed what many already knew — the open source infrastructure powering global commerce is maintained by a handful of burned-out volunteers. Who should pay for it? +--- + +![Open Source Infrastructure](assets/images/open-source.jpg) + +In late March 2024, a lone security researcher named Andres Freund noticed something odd while investigating slow SSH logins on his Debian machine. After several hours of careful investigation, he discovered that a utility called xz — a compression library used by nearly every Linux distribution on the planet — had been deliberately backdoored by a person who had spent nearly two years systematically building trust in the project. + +The attacker, who used the alias "Jia Tan," had contributed carefully to the project, built relationships with the exhausted maintainer, gradually taken on more responsibility, and ultimately introduced malicious code in a new release. 
Had Freund not been unusually attentive, the backdoor would have shipped in the next Debian stable release, potentially giving the attacker root access to millions of systems. + +The incident was a near-miss, and near-misses have a way of clarifying structural problems. The xz backdoor was not primarily a story about one clever attacker. It was a story about a maintainer who was exhausted, burned out, and clearly being manipulated by someone who had identified the soft spot in the infrastructure of global computing. It was a story about a critical piece of software being maintained by one person, effectively alone, for years. + +## The Scale of the Problem + +The problem is not new, but the xz incident gave it a face. The Log4Shell vulnerability in 2021 was another crystallising moment: a critical flaw in a library maintained by a handful of volunteers and used by an enormous fraction of enterprise Java applications. The maintainers were not paid by the companies whose software depended on their work. They were volunteers. + +A 2022 census by the Laboratory for Innovation Science at Harvard and the Linux Foundation found that a significant proportion of the most-depended-upon open source packages were maintained by one or two people. The most popular packages on npm and PyPI were maintained by individuals who, in many cases, had day jobs that had nothing to do with open source. + +The economic pattern is easy to understand and hard to solve. Open source is a public good: software that, once created, can be used by anyone without reducing its availability to others. Public goods are chronically underprovided by markets, because the value they generate is not captured by the people who provide them. Companies that build products on top of open source software capture enormous value while contributing very little back to the infrastructure that makes it possible. + +This is not a moral judgment about those companies. 
The incentive structure simply does not reward contribution. If you are a startup trying to survive, spending engineering time on upstream contributions is expensive and the benefit is diffuse and long-term. You take the open source library, use it, and move on. + +## What Is Being Done + +Several organisations have taken serious steps to address the problem, though none of them at the scale the problem requires. + +**The Open Source Security Foundation (OpenSSF)** was established in 2020 under the Linux Foundation umbrella with a mission to improve the security of the open source supply chain. After Log4Shell, it received a significant injection of funding from major technology companies — $150 million pledged at a White House summit in 2022. The OpenSSF has funded important work including security reviews of critical packages, developer training, and tooling for software supply chain security. Critics argue it remains under-resourced and too focused on tooling and standards rather than directly funding maintainers. + +**The Sovereign Tech Fund**, established by the German Federal Government, takes a different approach: it directly funds maintenance work on specific open source projects with demonstrated public-interest importance. The funding is structured as contracts, which means maintainers are paid for the work they do. The approach is less scalable than an industry-wide levy but more direct in its impact. + +**GitHub Sponsors and Open Collective** provide mechanisms for individuals and organisations to fund open source maintainers directly. These platforms have enabled some maintainers to earn meaningful income from their work, but the amounts are rarely sufficient to make open source maintenance a full-time job for the people maintaining the most critical infrastructure. 
+ +**Corporate open source programmes** at companies like Google, Microsoft, and Red Hat fund significant open source development, but primarily on projects that serve their own strategic interests. The correlation between corporate open source investment and public infrastructure importance is imperfect. + +## Three Models for a Solution + +The open source sustainability problem has generated considerable debate about structural solutions. Three models receive the most serious attention. + +**The infrastructure levy model** proposes requiring companies above a certain revenue threshold that derive benefit from open source software to contribute a percentage of their revenue — or their open source benefit — to a pooled fund. The pooled fund would then distribute money to projects based on dependency data and criticality scores. The model is attractive in its comprehensiveness and its alignment with the public-goods economics of open source. The challenge is implementation: who decides which projects are critical, who administers the fund, and how do you compel contribution internationally? + +**The procurement mandate model** proposes requiring government and critical infrastructure organisations to demonstrate that the open source software in their supply chains is adequately funded and maintained — similar to how procurement rules already require vendors to demonstrate security practices. This creates a demand-side pressure on companies using open source software in government contracts. The weakness is scope: government procurement represents only a fraction of open source usage. + +**The foundation consolidation model** argues that rather than trying to fund individual maintainers, the solution is to consolidate important open source projects under well-resourced foundations that have sustainable funding models. The Apache Software Foundation and the Linux Foundation represent versions of this model. 
Critics argue that not all valuable open source projects can or should become foundation projects, and that foundation governance introduces its own bureaucracy and risk. + +## What the xz Incident Actually Tells Us + +The xz backdoor incident tells us something important that the sustainability discussion often misses: the risk is not just that unmaintained projects become insecure through neglect. The risk is that burned-out maintainers are actively targeted by sophisticated actors who understand that exhaustion and isolation make people vulnerable to manipulation. + +The person who attacked xz did not exploit a code vulnerability. They exploited a social vulnerability — a maintainer who was clearly struggling, who had been expressing burnout in public for months, and who was susceptible to the apparent helpfulness of a patient, skilled contributor. The attack required patience, social engineering, and a long-term strategy. It was a state-level or near-state-level operation targeting the weakest link in critical software infrastructure. + +No amount of tooling addresses that threat directly. Only sustainable, funded maintenance — maintainers who have colleagues, who are not working alone under financial pressure, who have the time and support to be discerning about contributors — reduces that risk meaningfully. + +The xz incident was a near-miss. The next one may not be. + +--- + +*Clara Winthorpe covers open source and infrastructure at TechPulse. 
She contributed to documentation for two of the affected packages.* diff --git a/techpulse/posts/2024-07-08-rust-linux-kernel.md b/techpulse/posts/2024-07-08-rust-linux-kernel.md new file mode 100644 index 0000000..0246bb6 --- /dev/null +++ b/techpulse/posts/2024-07-08-rust-linux-kernel.md @@ -0,0 +1,57 @@ +--- +title: "Rust in the Linux Kernel: One Year Later" +created: 2024-07-08 10:30 +author: Clara Winthorpe +keywords: Rust, Linux kernel, kernel drivers, systems programming, memory safety, Linus Torvalds +description: One year after the first Rust code landed in the Linux kernel, we assess what has merged, how developers have received it, and what the safety improvements look like in practice. +--- + +When Linus Torvalds merged the initial Rust infrastructure into Linux 6.1 in December 2022, it marked the first time in the kernel's history that a second programming language had been accepted as a peer to C. The decision was not without controversy — some longtime kernel developers questioned the choice, the timeline, and the claimed benefits. Now, roughly eighteen months later, there is enough real-world experience to make a meaningful assessment. + +## What Has Actually Merged + +The scope of Rust in the kernel as of mid-2024 is narrower than some coverage has suggested, but it is growing. The initial merge provided the infrastructure: the Rust toolchain integration, the core abstractions over kernel primitives, and the `rust/` directory in the kernel source tree. Actual Rust drivers and subsystems have followed more gradually. + +The most significant Rust code in the mainline kernel at the time of writing is in the device driver space. The Nova GPU driver — an open-source, Rust-based driver for NVIDIA hardware — was merged in 6.9 after an extensive review period. Several other drivers have merged or are in active review, primarily in the filesystem and networking subsystems where memory safety is most critical. 
+ +Miguel Ojeda, who has led the Rust-for-Linux initiative, provided us with current numbers: as of Linux 6.9, there are approximately 31,000 lines of Rust code in the kernel tree, compared to roughly 27 million lines of C. The Rust code represents about 0.1% of the total, but the trajectory is consistently upward. The rate of Rust submissions has increased with each kernel release cycle. + +## Developer Reception Has Warmed, Slowly + +Early reactions from kernel developers ranged from enthusiastic to hostile. The hostility has softened, though not universally. Even Torvalds, once the most prominent dissenter, has moved: his early view of Rust as "a nice toy language" has given way to cautious but genuine support, provided the Rust code meets the same bar as the C code. + +"I don't care what language you write in," Torvalds told a kernel developer conference audience in 2023. "I care that the code is correct, maintainable, and doesn't break anything. Rust can be all of those things. It can also be none of those things. That's a property of the code, not the language." + +Several kernel maintainers we interviewed noted that the quality of Rust submissions has been high. "The people doing Rust kernel work are motivated and careful," said one subsystem maintainer who asked not to be named. "The code I've reviewed has been well-thought-out. My concern is the long-term: who maintains this in five years when the novelty wears off and you need someone to do boring debugging at 2am?" + +The concern about maintainer depth is legitimate. The pool of developers who can write kernel C is large; the pool who can write kernel Rust is currently much smaller. This limits the reviewers available for Rust patches and creates bus-factor risk in subsystems that go Rust-first. 
+ +## Measuring Safety Improvements + +The central claim for Rust in the kernel is safety: Rust's ownership and borrowing system prevents entire classes of memory safety bugs — use-after-free, double-free, data races — that have historically been responsible for a large fraction of kernel security vulnerabilities. + +Measuring this benefit is difficult, because the Rust code is new and has not yet accumulated the years of production exposure needed to generate statistical safety data. What the data does show is that approximately 65-70% of security vulnerabilities in the Linux kernel over the past decade have been memory-safety bugs — buffer overflows, use-after-free errors, and similar issues that the Rust type system prevents at compile time. + +In theory, subsystems written in Rust should not produce this class of vulnerability. In practice, there are caveats. Rust kernel code must frequently use `unsafe` blocks to interact with the C kernel primitives that the Rust code is wrapping. The safety properties hold within the Rust code, but the boundary between Rust and C requires careful handling. Early reviews of kernel Rust code identified several instances where `unsafe` blocks were more expansive than necessary, providing less isolation benefit than the code appeared to offer. + +The longer-term benefit will accrue as more critical code is written in Rust from scratch — code that can maintain Rust's safety invariants more completely. This is a decade-long project, not a near-term transformation. + +## The C Developers' Perspective + +We spoke at length with several kernel developers who remain skeptical, not of Rust in principle, but of the pace and scope of adoption. The concern is not primarily technical; it is ecosystem. + +"The kernel has thirty years of tooling, documentation, and institutional knowledge built around C," one experienced kernel developer told us. 
"A new contributor who wants to understand a C subsystem can find tutorials, can read the same documentation generations of contributors have read, can use the same debugging tools. For Rust kernel work, they're much more on their own." + +There is also a practical concern about Rust's evolution. The Rust language and compiler change more rapidly than C, and the kernel currently requires a minimum Rust version that is pinned for each kernel release. Managing the Rust toolchain requirement across the long support periods that enterprise Linux distributions depend on is not a solved problem. + +## What Comes Next + +The near-term roadmap for Rust in the kernel is focused on two areas. First, expanding the abstractions library — the safe Rust wrappers over kernel C primitives — to cover more of the kernel API surface. Currently, writing Rust drivers for certain subsystems requires writing `unsafe` C-interface code that should eventually be handled by the abstractions layer. Second, the Nova driver and a small number of other ambitious Rust kernel projects will serve as test cases for whether Rust is viable for large, complex kernel subsystems or only for simpler, self-contained drivers. + +The longer-term trajectory depends on whether the generation of developers who grew up with Rust enters kernel development in significant numbers. If they do, Rust's share of the kernel will grow organically. If kernel development remains primarily the province of experienced C programmers, Rust will remain a promising experiment in a niche. + +Given the sustained trajectory of the past year and a half, the former looks more likely than the latter. + +--- + +*Clara Winthorpe covers open source and infrastructure. 
She has been following Rust-for-Linux since the project's early days.* diff --git a/techpulse/posts/2024-08-15-startup-ai-funding.md b/techpulse/posts/2024-08-15-startup-ai-funding.md new file mode 100644 index 0000000..0d764cd --- /dev/null +++ b/techpulse/posts/2024-08-15-startup-ai-funding.md @@ -0,0 +1,60 @@ +--- +title: "AI Startup Funding Hits $47B in H1 2024 — But Where's the Revenue?" +created: 2024-08-15 11:00 +author: Maya Osei +keywords: AI funding, startup investment, venture capital, AI revenue, AI startups 2024 +description: AI startups raised $47 billion in the first half of 2024. A detailed look at which categories received it, which companies are generating real revenue, and which are burning cash. +--- + +Forty-seven billion dollars. That is how much venture capital flowed into AI startups in the first half of 2024, according to data compiled by TechPulse from CB Insights, PitchBook, and Crunchbase. To put the number in context: it exceeds the total venture capital raised by the entire US startup ecosystem in H1 2019. In five years, AI has eaten the investment market. + +The question that VCs are asking each other quietly but rarely in public is whether the investment is producing commensurate revenue. Our analysis suggests that the answer is: in some categories, yes; in most, not yet; and in a meaningful subset, probably never. + +## Where the Money Went + +The $47 billion was not distributed evenly. Breaking it down by category: + +**Foundation model companies** received approximately $14.2 billion, dominated by the major rounds at OpenAI ($6.6B at a $157B valuation), Anthropic ($2.75B from Google and Amazon in the period), and xAI ($6B). These are the companies building the large language models that underpin the AI application layer. Their revenue is real and growing: OpenAI's annualised revenue run rate was reported at around $3.4 billion, and Anthropic was reportedly approaching $1 billion ARR. 
At the valuations being assigned, those are still extraordinary multiples — OpenAI's $157B valuation is roughly 46x its reported ARR. But the revenue is there. + +**AI infrastructure** — chips, cloud AI services, MLOps platforms, inference infrastructure — received approximately $9.3 billion. NVIDIA's extraordinary performance (market cap exceeding $2 trillion for part of the period) is not a startup story, but the infrastructure ecosystem around it is growing fast. The companies in this category are largely generating real revenue, because enterprise demand for AI compute is genuine and growing. + +**Vertical AI applications** — AI applied to specific domains like legal (Harvey, Ironclad), healthcare (Hippocratic, Abridge), finance, HR, and similar — received approximately $12.1 billion. This is the most heterogeneous category. Some vertical AI companies are generating meaningful revenue with strong retention. A significant proportion are still in proof-of-concept territory with enterprise customers and have weak ARR metrics dressed up with LOI pipelines and inflated TCV figures. + +**AI developer tools** — code assistants, AI-enhanced IDEs, automated testing, and similar — received approximately $7.4 billion. GitHub Copilot's success has created a rush of competitors, and while the category is real, saturation risk is high. The companies most likely to survive in the long run are either best-of-breed (which requires genuine technical differentiation) or embedded deeply enough in enterprise workflows to have real switching costs. + +**AI agents and automation** received approximately $4 billion, mostly in smaller rounds. This is the category with the widest gap between funding narrative and actual product. The vision — autonomous AI agents that handle complex multi-step tasks — is compelling and may be realised in some form over the coming years. Current products are not there yet. 
Several well-funded companies in this space are reporting primarily "pilot" customers rather than paying ARR. + +## The Revenue Quality Problem + +The most revealing conversations we had for this piece were with LP-facing analysts at major venture firms — the people who have to actually account for portfolio performance to their investors. Off the record, the picture they described is concerning. + +"We have companies reporting 'ARR' that is really annualised MRR from customers who are on 90-day pilots," one told us. "We have companies reporting revenue that includes non-recurring professional services. We have companies where the top three customers represent 70% of ARR and all three are still in their free trial periods. The definition inflation is extraordinary." + +Enterprise AI adoption is real but slow. The pattern we see repeatedly in our reporting is: initial excitement, a pilot programme, promising early results in narrow use cases, and then a long pause while the enterprise works out whether and how to integrate the tool into actual workflows. This process takes longer than the AI hype cycle suggests. Companies with six-month-old products are being valued on the assumption that the adoption curve will compress dramatically; the evidence from the enterprise software market suggests it will not. + +Investors who understand the enterprise software market well are applying steeper haircuts to AI ARR than the headline valuations suggest. Several VCs told us they are internally modelling AI companies at 60-70% of reported ARR for purposes of portfolio assessment. + +## Which Categories Are Most Exposed + +The categories most exposed to a correction are those where the competition is intense, the technology is not genuinely differentiated, and the customers are not yet committed: + +AI writing tools, AI image generation consumer apps, and general-purpose AI productivity tools face the worst dynamics. 
The foundation models are commoditising rapidly, which squeezes the margin on applications that are thin wrappers over them. Customer retention data for AI writing tools, in particular, is poor: users sign up, use them enthusiastically for a month, then churn at high rates. + +Agents and automation face a different problem: the technology is not yet reliable enough for the use cases being pitched. The gap between "impressive demo" and "production-reliable at scale" in agentic AI is substantial, and enterprise customers who have been burned by AI pilots that worked in controlled settings but failed in production are becoming more sceptical. + +## What the Data Suggests About the Next 18 Months + +Our assessment: the AI investment cycle has several more quarters of momentum, but the rationalisation is coming. The companies that will survive the correction are those with: + +- Real, auditable ARR from enterprise customers who have moved past pilot stages +- Genuine technical differentiation that is not easily replicated by a new model release +- Unit economics that work without assuming continued capital infusion +- Customer retention rates that suggest genuine product-market fit + +By those criteria, the foundation model companies, the infrastructure layer, and a subset of vertical AI companies are on solid ground. A significant portion of the application layer is not. + +Some of the $47 billion will produce lasting value, but not all of it, and not for every investor who deployed it. + +--- + +*Revenue data based on reported figures and analyst estimates. Funding data compiled from CB Insights, PitchBook, and Crunchbase. 
All figures are approximations; private company financials are not publicly audited.* diff --git a/techpulse/posts/2024-10-03-sqlite-everywhere.md b/techpulse/posts/2024-10-03-sqlite-everywhere.md new file mode 100644 index 0000000..6881db9 --- /dev/null +++ b/techpulse/posts/2024-10-03-sqlite-everywhere.md @@ -0,0 +1,69 @@ +--- +title: "The SQLite Revolution: How a 25-Year-Old Database Took Over the Cloud" +created: 2024-10-03 09:00 +author: Raj Patel +keywords: SQLite, Cloudflare D1, Turso, libSQL, edge computing, databases, cloud +description: SQLite was designed for embedded systems. Somehow it has become the database of choice for the edge computing era. Here is why, and what its limitations mean for the future. +--- + +SQLite was created by D. Richard Hipp in 2000 for use on guided missile destroyers, where the alternative — a server-based database — was impractical on a ship. It is a library, not a server: the database is a single file, the query engine lives in your process, and there is no network connection to manage. For 24 years, SQLite was the database in your phone, your browser, and your laptop — the invisible infrastructure of the device world. + +Then the cloud industry discovered it. + +In the past two years, SQLite has become the unexpected centrepiece of a new wave of database services, edge computing platforms, and developer tools. Cloudflare D1 runs SQLite at the edge. Turso has built a distributed SQLite service with a fork of the engine. The authors of libSQL, a fork of SQLite that supports extensions and replication, have raised significant venture funding. Bun, the JavaScript runtime, uses SQLite as its built-in database. The pattern is everywhere: developers and platforms are choosing SQLite for its simplicity, reliability, and the extraordinary density of its feature-to-complexity ratio. 
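The "library, not a server" point is easiest to see in code. The following is a minimal sketch using Python's standard-library `sqlite3` module; the file name `app.db` and the `notes` table are illustrative assumptions, not something taken from any of the services discussed in this piece.

```python
import os
import sqlite3
import tempfile

# Hypothetical location for the database file; any writable path would do.
db_path = os.path.join(tempfile.mkdtemp(), "app.db")

# No host, port, or credentials: connecting just opens (or creates) a file.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello from inside the process",))
conn.commit()

# The query runs inside this Python process; there is no network round trip.
rows = conn.execute("SELECT id, body FROM notes ORDER BY id").fetchall()
conn.close()

# The whole database is one ordinary file that can be copied or shipped elsewhere.
file_exists = os.path.isfile(db_path)
print(rows, file_exists)
```

Because the entire database is a single file, moving it to another machine amounts to a file copy, which is the property the edge platforms described in this piece build on.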
+ +## Why Edge Computing Loves SQLite + +The reason SQLite works so well for edge computing comes down to three properties: it is an in-process library, it has no network overhead, and it produces a single portable file. + +Edge computing workloads run in small, short-lived execution environments — Cloudflare Workers, Deno Deploy, Fastly Compute, and similar platforms that spin up code close to users to reduce latency. These environments have a fundamental problem with traditional databases: you cannot maintain a persistent connection to a remote database if your process might be running in Johannesburg one moment and São Paulo the next. Connection pooling becomes complex, latency becomes unpredictable, and the network hop adds overhead that defeats the purpose of edge execution. + +SQLite sidesteps this problem by eliminating the network entirely. The database lives in the same process as the code, queries run at memory speed, and the whole thing can be replicated to multiple edge locations as a file. This is the insight that Cloudflare's D1 team had: if you can replicate a SQLite file efficiently to hundreds of edge locations, you get a globally distributed database with very low read latency and the simplicity of a single-file store. + +## The Cloudflare D1 Architecture + +Cloudflare announced D1 — its edge database service built on SQLite — in 2022, initially in beta. By 2024 it had moved to general availability and was handling a significant fraction of Workers deployments that needed persistence. + +The architectural approach is worth understanding. Each D1 database is a SQLite file. Writes are processed by a primary SQLite instance running in a datacenter. Reads can be served from read replicas — copies of the SQLite file — that are distributed close to users at Cloudflare's edge locations. Replication uses a write-ahead log that is forwarded from the primary to replicas. + +The tradeoffs are real but manageable for many use cases. 
Reads served from replicas are only eventually consistent, so read-after-write consistency is not guaranteed: if you write data and immediately read it from a different edge location, you might get stale data. The replication lag is typically measured in milliseconds for nearby regions and hundreds of milliseconds for distant ones. For most web application workloads — displaying content, user profiles, product catalogues — this is acceptable. For workloads that require strong consistency (financial transactions, inventory management, anything where reading stale data causes real problems), it is not. + +## Turso and the libSQL Fork + +Turso occupies an interesting position in the SQLite ecosystem. They have built a distributed SQLite service, but they have also funded the development of libSQL — an open-source fork of SQLite that adds features the original SQLite project has declined to include: replication support, server mode, and extension APIs. + +The SQLite project, under Hipp's stewardship, has been notably conservative about expanding the library's scope. The original design philosophy — a simple, reliable, self-contained library — has been maintained with unusual discipline. Hipp's view is that SQLite should do one thing extremely well and not try to become something it was not designed to be. + +libSQL takes the opposite view: take SQLite's quality and battle-tested storage engine and add the capabilities needed for modern cloud deployments. The extension APIs in libSQL are particularly interesting — they allow embedding vector search, full-text search, and other capabilities as loadable extensions rather than baked-in features. + +Turso raised $32 million in a Series A in 2024, which gives some indication of investor confidence in the approach. Whether libSQL becomes a sustainable open-source project with genuine community support, or whether it stays primarily a Turso-maintained fork, remains to be seen.
The tension between the commercial interests of a funded startup and the maintenance of a genuinely open project is a familiar one. + +## Comparison with PlanetScale + +PlanetScale, which built a developer-friendly MySQL-as-a-service with branching and schema migration features, was the darling of the previous wave of developer database tools. It announced a pricing change in 2024 that eliminated its free tier and caused significant developer backlash, followed by a reversal. The episode illustrated both the strength of developer affection for the product and the difficulty of finding sustainable business models in developer infrastructure. + +SQLite-based services have a structural advantage in this comparison: the cost basis for running SQLite at the edge is lower than running a full MySQL-compatible distributed database. This gives SQLite platforms more flexibility in pricing, which is a meaningful competitive advantage in a market where developer adoption depends heavily on having a workable free tier. + +The more fundamental question is whether the use cases overlap. PlanetScale (and MySQL generally) is better suited to applications with complex relational schemas, heavy write workloads, and strong consistency requirements. SQLite-based services are better suited to read-heavy workloads, simpler schemas, and applications where the edge latency benefit justifies the eventual consistency tradeoff. + +## The Limitations Nobody Talks About Enough + +SQLite's concurrency model is its most significant limitation for cloud workloads. SQLite uses file-level locking: only one write can happen at a time to a given database file. For applications with high write concurrency — many users simultaneously writing to the same database — this is a real constraint that no amount of edge distribution fully addresses. + +The services built on SQLite have various mitigations: Cloudflare D1 routes all writes to a primary, Turso does similar things. 
But the fundamental limitation of the storage engine is that it was not designed for high-concurrency writes, and it shows. Applications that need to handle thousands of concurrent writes per second to a single logical dataset will exceed SQLite's capabilities. + +For a large class of web applications, this is not a binding constraint. Most web applications are read-heavy, and the read:write ratio for typical content-serving, e-commerce, and user-profile workloads is often 10:1 or higher. In those cases, SQLite's concurrency model is not the bottleneck. + +For the workloads where it is a constraint, the answer is not SQLite-at-the-edge but a distributed, strongly consistent database — which will always come with latency and operational complexity tradeoffs that SQLite does not. + +## The Surprising Thing About SQLite + +The most striking thing about the SQLite story is that it is about engineering quality compounding over time. Hipp has been working on SQLite for 24 years. The project has a test suite with more than 92 million lines of test code and scripts. The core library has 100% branch test coverage; the test code is hundreds of times larger than the library code. This level of investment in correctness is unusual in any software project and exceptional in an open-source library. + +The result is a piece of software that developers trust completely — and that trust, it turns out, is an asset with enormous value in an industry that produces new databases at a rate that makes it impossible to develop equivalent trust in any of them quickly. + +SQLite is winning not because it was designed for cloud computing but because it was designed with extraordinary care, and extraordinary care compounds. + +--- + +*Raj Patel is TechPulse's AI and developer tools correspondent.
He has been following the SQLite ecosystem since 2021.* diff --git a/techpulse/posts/2024-11-18-wasm-components.md b/techpulse/posts/2024-11-18-wasm-components.md new file mode 100644 index 0000000..8562909 --- /dev/null +++ b/techpulse/posts/2024-11-18-wasm-components.md @@ -0,0 +1,69 @@ +--- +title: "WebAssembly Components: The Runtime-Agnostic Future of Software" +created: 2024-11-18 13:00 +author: Raj Patel +keywords: WebAssembly, WASM, WASI, component model, Bytecode Alliance, containers, runtime +description: WASI 0.2 and the WebAssembly component model represent a genuinely new approach to software packaging. Here is what it means and who is actually using it. +--- + +In January 2024, the WebAssembly System Interface working group released WASI 0.2 — the second major version of the interface standard that allows WebAssembly programs to interact with the operating system. The release was accompanied by the component model, a specification for how WebAssembly modules can be packaged, composed, and distributed in a way that is independent of the runtime, the language, and the operating system. + +The claims made for WebAssembly components are ambitious: code written in Rust, Go, Python, or any language that compiles to WebAssembly should be composable, portable, and secure, regardless of where it runs. The component model is the mechanism that makes that composability possible in a principled way. + +After a year of working with the specification and talking to teams using it in production, TechPulse can report that the reality is more promising than the hype, more mature than sceptics expected, and still some distance from ubiquity. + +## What the Component Model Actually Solves + +To understand why the component model matters, it helps to understand what it is solving.
Traditional software distribution has a language problem: code written in Rust cannot call code written in Python without a foreign function interface (FFI) that is language-specific, fragile, and usually requires careful attention to memory management at the boundary. + +The component model defines a standard interface definition language (WIT, short for Wasm Interface Type) and a standard way of encoding values at the boundary between components. A Rust component and a Python component that both implement the same WIT interface can be composed together by a runtime, with the runtime handling type conversion and memory isolation at the boundary. + +This is similar to what container-based microservices achieve, but at a finer granularity and with much lower overhead. A WebAssembly component is not a full container with its own OS layer; it is a module with a typed interface. Startup time is measured in microseconds, not hundreds of milliseconds. Memory overhead is kilobytes, not megabytes. + +## WASI 0.2: What Changed + +The jump from WASI 0.1 (Preview 1) to WASI 0.2 was significant. WASI 0.1 provided basic POSIX-like capabilities: files, environment variables, clocks, and random numbers. It was enough to run many programs but notably lacked networking — a significant limitation for server-side software. + +WASI 0.2 adds a networking API (wasi-sockets) and, more importantly, the component model plumbing that makes composition possible. Components in WASI 0.2 can import and export typed interfaces, can be composed at link time, and can run in any compliant runtime regardless of where they were compiled. + +The Bytecode Alliance, the industry consortium that coordinates WASI and the component model specification and whose members include Fastly, Microsoft, Google, Intel, and others, shipped supporting tooling in 2024 that made the specification practically usable. The Wasmtime runtime reached production readiness for WASI 0.2 workloads.
The `cargo component` toolchain for Rust-based component development became stable. + +## Real-World Adoption: Who Is Using It + +Adoption of WebAssembly components in production is real but concentrated. + +**Fastly** is the most advanced production user. Their Compute product runs customer code as WebAssembly modules at the edge, and they have been pushing the component model as the basis for Compute's plugin ecosystem. Fastly engineers have contributed significantly to the specification and tooling. + +**Fermyon Technologies** built their Spin framework — a developer-friendly tool for building serverless WebAssembly applications — around the component model. Spin has a growing user base and is one of the clearest demonstrations of what component-based development looks like in practice. Fermyon has also been shipping Fermyon Cloud, a managed hosting service for Spin applications. + +**Microsoft** has incorporated WebAssembly components in several Azure services, most notably in their edge networking infrastructure. The details are less public, but multiple Microsoft engineers are active contributors to the component model specification. + +**Shopify** has been evaluating WebAssembly as the execution substrate for their storefront extension system, which needs to run untrusted third-party code in a sandboxed environment. The security properties of WebAssembly — memory isolation, no ambient authority, fine-grained capability grants — make it attractive for this use case. + +## Comparison with Containers + +The inevitable comparison is with Docker containers, which solved the portability and packaging problem for a previous era. Containers won because they gave developers a standard unit of packaging that was runtime-agnostic, reproducible, and composable through orchestration layers. + +WebAssembly components are not a container replacement in the general case. They are a different tool for a different class of problems. 
Containers provide full OS-level isolation with their own filesystem, network stack, and process tree. WebAssembly components provide sandboxed, memory-isolated execution inside a shared host process. Containers are right for full applications with complex dependencies. Components are right for plugins, functions, and composable modules where startup overhead, memory cost, and security sandboxing requirements favour a lighter model. + +The most interesting near-term application of WebAssembly components is probably in the extension and plugin ecosystem: giving application developers a safe, performant way to allow untrusted code to run inside their applications without compromising the host. This is the use case that Shopify, Fastly, and a number of other production adopters are building around. + +## What Still Needs Work + +The component model ecosystem has meaningful gaps that will take time to close. + +Language support is uneven. Rust and C/C++ have mature toolchains for producing WebAssembly components. Go's support has improved but still has limitations. Python and JavaScript tooling is functional but produces larger binary sizes due to the need to bundle runtime interpreters. The ecosystem is moving toward "guest toolchains" for major languages, but the work is not complete. + +Debugging and observability tooling lags. Debugging a WebAssembly component through a standard debugger requires additional DWARF extension support that is not uniformly available. Distributed tracing across component boundaries is not standardised. These are solvable problems but they have not been solved yet. + +The component model also has a learning curve that is real and non-trivial. WIT interface design is a new skill; understanding capability-based security for WebAssembly requires new mental models; and the toolchain surface area is larger than it appears. Teams adopting WebAssembly components in 2024 are still early adopters who should expect rough edges.
+ +## The Longer View + +WebAssembly components represent a genuine attempt to solve a problem — secure, language-agnostic composition of software modules — that no previous technology has fully solved. Containers solved deployment packaging. Function-as-a-service platforms solved some of the same problems but with high startup overhead and coarser granularity. The component model is attempting something more fundamental: a universal substrate for computation that is language-independent, runtime-independent, and platform-independent. + +Whether that ambition is achievable depends on adoption dynamics that are not yet determined. The specification is solid. The tooling is reaching production quality. The early adopters are building real systems. Whether it achieves the penetration needed to become a default rather than a speciality depends on the next two to three years. + +--- + +*This article draws on conversations with engineers at Fastly, Fermyon, and several Bytecode Alliance member companies, as well as the author's own work with Spin and Wasmtime.* diff --git a/techpulse/posts/2024-12-05-developer-survey-2024.md b/techpulse/posts/2024-12-05-developer-survey-2024.md new file mode 100644 index 0000000..bb11f47 --- /dev/null +++ b/techpulse/posts/2024-12-05-developer-survey-2024.md @@ -0,0 +1,99 @@ +--- +title: "TechPulse Developer Survey 2024: 3,000 Respondents, Key Findings" +created: 2024-12-05 10:00 +author: Maya Osei +keywords: developer survey 2024, programming languages, AI tools, remote work, salary, developer burnout +description: Results from our annual survey of 3,000 developers — language popularity, AI adoption, salary data, remote work trends, and burnout rates. +--- + +Every year since 2022, TechPulse has run an independent developer survey. This year 3,047 developers completed our questionnaire — the largest sample we have collected.
Respondents came from 61 countries, with the largest groups from the United States (34%), United Kingdom (11%), Germany (8%), India (7%), and Canada (6%). + +The survey ran for three weeks in October and November 2024. What follows are our key findings. + +## Language Popularity and Usage + +Python has extended its lead as the most widely used language in our survey, with 67% of respondents reporting that they use Python for some professional work. The growth is driven primarily by AI/ML work: Python's dominance in the machine learning ecosystem has pulled a generation of developers who might otherwise have stuck to Java or JavaScript into regular Python usage. + +**Regularly used languages (respondents could select multiple):** +- Python: 67% +- JavaScript/TypeScript: 64% +- Java: 38% +- Go: 31% +- C#: 27% +- C/C++: 22% +- Rust: 19% (up from 13% in 2023) +- Kotlin: 14% +- Swift: 9% +- Ruby: 8% + +TypeScript's adoption has now surpassed plain JavaScript for new projects among respondents with more than five years of experience. Eighty-one percent of JavaScript developers in our survey are using TypeScript for at least some projects, up from 71% in 2023. + +Rust's continued growth is striking. It has moved from a curious experiment to a serious production language in the span of four years. The communities driving adoption are systems programming (kernel, embedded, network infrastructure), WebAssembly, and, increasingly, backend web services where its memory safety and performance characteristics are valued. + +## AI Tool Adoption + +This is the finding that dominates the conversation this year. Seventy-three percent of respondents are using some form of AI coding assistance regularly — defined as at least once per week. That is up from 48% in 2023 and 21% in 2022.
+ +The breakdown by tool: +- GitHub Copilot: 41% (individual or employer-provided) +- Cursor: 22% +- JetBrains AI Assistant: 16% +- Amazon CodeWhisperer/Q: 11% +- Codeium/Windsurf: 14% +- Custom/self-hosted (typically via API): 9% + +Usage patterns are more interesting than adoption rates. Among the 73% who use AI coding tools regularly: +- 44% describe their use as "can't work without it now" +- 38% describe it as "useful but I could work without it easily" +- 18% are "trying to reduce usage" + +That 18% figure is new this year. In 2023, almost no respondents described themselves as trying to reduce AI tool usage. The shift suggests that an early wave of enthusiastic adoption is producing a correction among some users who find the tools changing their work in ways they don't like. Open comments in this category frequently mention concerns about code quality, loss of deep focus on problems, and a feeling of not understanding code they have written. + +## Remote Work in 2024 + +The return-to-office trend has had a limited effect on software developers compared to other knowledge workers. Among our respondents: +- 51% work fully remotely +- 31% work in a hybrid arrangement (typically 2-3 days in office) +- 18% work primarily in-office + +For 2023, the equivalent numbers were 54%, 29%, and 17% — a modest but real shift toward more in-office time, but far from the reversal that many RTO mandates were aiming for. The most common pattern we hear from respondents at companies with mandatory RTO policies is compliance for the minimum required days combined with active job searching for remote-first roles. + +Salary data shows that fully remote roles still command a premium: median reported salary for fully remote roles (among US-based respondents) was $142,000, compared to $134,000 for hybrid and $129,000 for in-office roles. The premium has narrowed from 2022 when remote roles commanded a larger differential, but it persists. 
+ +## Salary Data + +Median reported total compensation by experience (US respondents only, n=1,042): +- 0-2 years experience: $87,000 +- 3-5 years: $122,000 +- 6-10 years: $153,000 +- 11-15 years: $174,000 +- 16+ years: $181,000 + +These figures are self-reported and have not been independently verified. They align broadly with data from other sources, with the caveat that TechPulse's audience skews toward technically ambitious developers who may earn above-median salaries. + +Geographic variance remains enormous. Median compensation in the San Francisco Bay Area for respondents with 6-10 years of experience was $218,000. For equivalent experience in the UK, the median was £108,000 (approximately $135,000); in Germany, it was €95,000. + +## Tooling Preferences + +The text editor and IDE landscape has shifted meaningfully. VS Code remains dominant but is losing ground to AI-native editors: +- VS Code: 48% primary editor (down from 58% in 2023) +- Cursor: 21% (up from 6%) +- JetBrains family: 22% +- Neovim: 7% +- Other: 2% + +Cursor's growth is extraordinary. In a single year it has gone from a niche tool to roughly level with the JetBrains family for second place among primary editors in our survey. Its adoption appears to be driven primarily by respondents switching from VS Code who want tighter AI integration. + +For build and runtime tooling, the Docker adoption plateau is real: 74% of respondents use Docker regularly, flat versus 2023. Kubernetes usage has declined slightly to 38% of respondents, down from 41%. The platforms taking share are Railway, Fly.io, and direct cloud managed services — respondents are opting for managed solutions rather than self-managed Kubernetes. + +## Burnout + +Thirty-eight percent of respondents describe themselves as experiencing significant burnout in the past 12 months. Thirty-one percent describe burnout as a persistent feature of their work.
The significant-burnout figure is up modestly from our 2022 and 2023 surveys (35% and 37% respectively), suggesting that the trend is moving slowly in the wrong direction. + +Open responses on burnout cluster around several themes: understaffing amid hiring freezes, pressure to use AI tools without adequate training or time to adapt, meeting load, on-call responsibilities, and a general sense that the pace of change in the field has become unsustainable. + +The finding that concerns us most: among respondents who describe high burnout, 52% are actively looking for a new job or planning to within the next 12 months. The industries and companies most exposed to talent attrition from burnout are those with aggressive RTO policies and the highest AI adoption pressure. + +--- + +*Full methodology and data tables are available to TechPulse subscribers. The survey was administered online; respondents were recruited through TechPulse's newsletter, social channels, and partner communities. Results are weighted for company size and geography.* diff --git a/techpulse/posts/2025-01-22-anthropic-o3.md b/techpulse/posts/2025-01-22-anthropic-o3.md new file mode 100644 index 0000000..f705f45 --- /dev/null +++ b/techpulse/posts/2025-01-22-anthropic-o3.md @@ -0,0 +1,55 @@ +--- +title: "Chain-of-Thought Models Change Everything — But Not in the Way You Think" +created: 2025-01-22 09:30 +author: Raj Patel +keywords: reasoning models, chain-of-thought, o1, enterprise AI, LLM limitations, AI reasoning +description: The new generation of reasoning models that think before answering have changed what AI can do. But the change is more specific — and the limitations more persistent — than the coverage suggests.
+--- + +The AI industry's coverage of chain-of-thought reasoning models has settled into a predictable pattern: each new release produces a wave of breathless coverage about benchmark scores, followed by a wave of critical coverage pointing out that the benchmarks are gamed, followed by both camps missing the most interesting question, which is: what do these models actually change, in practice, for the people building with them? + +I have spent the past several months talking to enterprise AI teams, individual developers, and AI researchers about their experience with reasoning models — the class of models, exemplified by OpenAI's o1 series and its successors, that spend additional compute "thinking" through problems before producing an output. The picture is genuinely interesting and considerably more nuanced than either the bullish or bearish coverage suggests. + +## What Changed + +The core change is real and important. Prior generation language models — GPT-4, Claude 3, the mid-2024 vintage of large models — produce outputs by processing a prompt and generating a response token by token in a single pass. They are good at tasks that can be solved by pattern matching over their training distribution: writing code that follows common patterns, summarising documents, answering factual questions, translating text. + +They are structurally weak at tasks that require sustained multi-step reasoning — problems where getting the right answer requires holding multiple sub-problems in working memory, checking intermediate conclusions, and revising when those conclusions turn out to be wrong. Mathematical reasoning, complex debugging, multi-constraint optimisation, and formal logic tasks all fall into this category. + +Chain-of-thought reasoning models address this limitation by generating explicit reasoning steps before producing a final answer. The model, in effect, writes a scratchpad of thinking that it then uses to produce a more reliable answer. 
The "thinking" is itself a generated sequence, which means it can be long, exploratory, and self-correcting in ways that a single-pass generation cannot be. + +The empirical improvement on reasoning tasks is real and substantial. Mathematical benchmark scores, coding competition scores, and formal reasoning task scores all improve significantly with chain-of-thought models. This is not benchmark gaming in the crude sense — the improvements generalise to novel problems of the same type. + +## Where Enterprise Adoption Has Gained Traction + +I spoke with AI leads at seventeen enterprises across financial services, healthcare, software development, and professional services. The common thread in successful deployments is this: chain-of-thought models are making a real difference in tasks that require complex, auditable reasoning, and they are changing little in tasks that don't. + +**Code review and debugging** is the clearest success story. Multiple engineering teams reported that chain-of-thought models are meaningfully better at identifying subtle bugs, understanding complex control flow, and explaining why code is wrong in ways that help developers learn. One senior engineering manager described it as "finally getting a code review from someone who actually thinks it through rather than just pattern-matching on what they've seen before." The caveat: the thinking time means latency is higher, which matters for interactive use but less for asynchronous review workflows. + +**Legal document analysis** in financial services is another genuine success. Reasoning models can work through complex contract logic, identify dependencies between clauses, and flag conflicts that earlier models missed. The combination of reasoning capability and the ability to cite specific text makes them useful for audit-trail purposes in regulated industries. + +**Complex data analysis tasks** — not simple aggregations but multi-step analytical reasoning — are improving. 
"If I ask it to figure out why our conversion rate dropped last quarter, it can actually work through the possible explanations systematically rather than just listing things that might affect conversion," one data analyst told us. + +## Where They Still Fail + +The failure modes of reasoning models are different from the failure modes of their predecessors but no less real. + +**Reasoning models can reason very confidently toward wrong answers.** The "thinking" process generates internal consistency, but internal consistency is not the same as correctness. I have seen reasoning models produce elaborate, coherent explanations for conclusions that were factually wrong, complete with carefully structured arguments that would require domain expertise to identify as incorrect. This is, in some ways, more dangerous than an older model that produces a wrong answer in an obviously uncertain way. + +**Long-horizon task completion remains elusive.** The improvements in reasoning capability apply within a bounded context: given a well-defined problem, reasoning models are better at finding the answer. They are not significantly better at managing complex projects over time, maintaining consistency across long workflows, or autonomously completing tasks that require adapting to unexpected situations. The vision of AI agents that can work on problems for hours or days remains largely unrealised despite the capability improvements. + +**Domain-specific knowledge limitations persist.** Chain-of-thought reasoning improves formal reasoning but does not substitute for domain knowledge. A reasoning model asked to analyse a clinical trial design will reason more carefully but will still make errors that a domain expert would not, because its underlying knowledge of clinical research methodology is imperfect. Reasoning models are better advisors in areas where careful thinking matters; they are not reliable substitutes for domain expertise. 
+ +**Cost and latency are real constraints.** Reasoning models consume significantly more tokens than standard models, because the thinking process itself generates output that must be processed. API costs for reasoning-heavy tasks can be 5-10x higher than equivalent tasks on standard models. For some high-value tasks this is obviously worthwhile. For high-volume, latency-sensitive applications, it changes the economics significantly. + +## The Pattern That Matters + +The enterprise teams making the best use of reasoning models have converged on a pattern: use them for decisions, not for generation. Standard models are still excellent for generating content — writing emails, summarising documents, producing code scaffolding. Reasoning models are worth their cost for the decision-making steps: reviewing that code, analysing that document for specific logical issues, working through a complex technical question. + +The mistake made by teams that have been disappointed with reasoning models is using them as a drop-in replacement for standard models in generation tasks, where the reasoning capability provides little benefit and the cost and latency increase is pure overhead. + +Chain-of-thought models have changed something real about what AI can do. They have not changed the fundamental challenge of deploying AI in production: knowing precisely what the system can and cannot do reliably, and designing your workflow so that the unreliable parts have appropriate human oversight. + +--- + +*Raj Patel spoke with AI engineering teams at seventeen enterprise companies between October 2024 and January 2025. 
Companies are not named; they requested anonymity as a condition of participation.* diff --git a/techpulse/posts/2025-03-10-platform-engineering.md b/techpulse/posts/2025-03-10-platform-engineering.md new file mode 100644 index 0000000..e6d6f30 --- /dev/null +++ b/techpulse/posts/2025-03-10-platform-engineering.md @@ -0,0 +1,69 @@ +--- +title: "Platform Engineering Is the New DevOps — And That's Both Good and Bad" +created: 2025-03-10 14:00 +author: Maya Osei +keywords: platform engineering, DevOps, internal developer platforms, CNCF, golden paths, developer experience +description: Platform engineering has become the dominant framework for thinking about internal developer infrastructure. A look at whether it is solving the right problems and what the CNCF data says. +--- + +Every few years, the software industry invents a new term for the cluster of practices around making developers productive in complex organisations. First there was "ops." Then DevOps. Then SRE. Now platform engineering. The question worth asking is whether the new label represents genuine progress in how we think about the problem, or whether it is primarily branding that allows the same arguments to be relitigated with a fresh vocabulary. + +Having spent several months reviewing the CNCF's 2025 platform engineering report, talking to platform teams at companies of varying sizes, and reading the academic and practitioner literature, my answer is: it's some of both, and the difference matters. + +## What Platform Engineering Actually Means + +The CNCF's Platform Engineering Maturity Model, published in 2023 and updated in 2025, defines a platform as "a foundation of self-service APIs, tools, services, knowledge, and support that are arranged as a compelling internal product." Platform engineering is the practice of building and maintaining that foundation. + +The key word in that definition is "product." 
Platform engineering, as a discipline, differs from traditional infrastructure or DevOps work in its explicit adoption of product thinking: the internal customers of the platform are treated as users whose experience matters, whose needs must be understood through user research and feedback loops, and whose adoption should be measured and optimised. + +This framing comes with a specific set of practices: platform roadmaps that are driven by developer needs rather than just infrastructure requirements, developer experience (DevEx) metrics, golden paths that offer a recommended way to do common tasks without mandating it, and internal marketing for platform capabilities. Many organisations have infrastructure teams that do not think this way; the claim of platform engineering is that this thinking produces better outcomes. + +## What the CNCF Data Shows + +The CNCF surveyed 1,400 organisations for their 2025 report. The headline finding: 71% of organisations with more than 500 engineers have either implemented an internal developer platform or are actively building one, up from 55% in 2023. + +The outcomes data is more interesting than the adoption data. Organisations that scored highly on the CNCF's platform maturity model reported: +- 34% reduction in time from code commit to production deployment +- 28% reduction in developer onboarding time for new team members +- 19% reduction in security incidents related to misconfiguration +- Higher scores on internal developer satisfaction surveys + +These numbers look compelling, but they come with a significant caveat: the organisations that have invested heavily in platform engineering are also the organisations that invest heavily in engineering infrastructure in general. The correlation between platform maturity and deployment efficiency may partly reflect underlying investment levels rather than a causal effect of the platform engineering approach specifically. 
+ +## The Problem Platform Engineering Was Built For + +To assess whether platform engineering is solving the right problems, you need to understand what it is responding to. The problem is genuine: as organisations adopt microservices, containerisation, multiple cloud environments, and complex CI/CD pipelines, the cognitive load on application developers has increased dramatically. + +A developer who needs to ship a feature must now navigate: Kubernetes configuration, service mesh settings, IAM policies, observability instrumentation, security scanning requirements, deployment pipeline configuration, and an ever-growing stack of infrastructure tooling. Each of these things exists for a good reason, but in aggregate they have created a situation where many developers spend more time wrestling with infrastructure than writing application code. + +Platform engineering's answer is the "golden path" — a supported, opinionated way of doing common infrastructure tasks that abstracts the complexity without removing it. Instead of every developer team reinventing CI/CD pipelines, there is a platform-provided pipeline template. Instead of every team figuring out Kubernetes manifests, there is a platform-provided deployment abstraction. + +The golden path metaphor is well-chosen: it is a recommendation, not a mandate. Teams that need to deviate can deviate; they just lose the platform team's support when they do. This is more functional than either "everyone figures it out themselves" or "everyone must use the standard approach regardless of whether it fits their needs." + +## The Problems Platform Engineering Creates + +The problems with platform engineering are less often discussed because they tend to emerge after the implementation phase, when the platform team has already been staffed and positioned as a success. 
+ +**Platform teams become bottlenecks.** When infrastructure decisions have to go through the platform team, the platform team has to prioritise among developer requests. If the platform team is understaffed (common) or slow-moving (also common), developers wait. The same dynamic that DevOps was invented to address, developers waiting for ops to provision infrastructure, can re-emerge if platform teams are not very careful about their operating model. + +**Golden paths become golden cages.** Over time, platform teams tend to optimise their golden paths for the average case, which means they work well for common use cases and badly for edge cases. Teams with unusual requirements (high-performance computing workloads, specialised security requirements, novel architectures) find themselves fighting the platform rather than being helped by it. The cognitive overhead shifts from "figuring out Kubernetes" to "figuring out how to do what I need within the platform's assumptions." + +**Platform engineering relocates complexity without removing it.** The platform team absorbs infrastructure complexity so application teams don't have to face it directly. But the complexity does not go away; it concentrates in the platform. This is often the right tradeoff. But it means that platform team burnout and turnover are extremely costly, and that organisations are creating a new class of institutional knowledge that is hard to replace. + +## What the Most Successful Implementations Look Like + +The platform teams we interviewed that reported the highest developer satisfaction and the best outcomes shared several characteristics: + +They measured developer experience systematically, using DORA metrics and DevEx surveys, and used that data to prioritise platform work. They did not assume they knew what developers needed; they asked, regularly. 
+ +They adopted a product development model including a backlog, regular prioritisation, and clear roadmaps, with application developers invited to contribute to priority setting. + +They treated the platform as optional for edge cases, building escape hatches and documenting them. This reduced resistance to the platform and meant developers only fought the golden path when they had genuine reasons to. + +They kept the platform team small and focused on the highest-leverage infrastructure work, resisting the tendency to expand platform scope until scope expansion was clearly justified by developer demand. + +Platform engineering, done well, is genuinely valuable. Done poorly, it is DevOps complexity with a product manager and a roadmap. + +--- + +*Maya Osei covers startups, funding, and the business of developer tools at TechPulse.* diff --git a/techpulse/posts/2025-04-28-open-source-ai-models.md b/techpulse/posts/2025-04-28-open-source-ai-models.md new file mode 100644 index 0000000..65f91f2 --- /dev/null +++ b/techpulse/posts/2025-04-28-open-source-ai-models.md @@ -0,0 +1,67 @@ +--- +title: "Open Source AI Models in 2025: The Landscape Is More Complex Than It Seems" +created: 2025-04-28 11:00 +author: Raj Patel +keywords: open source AI, Llama 3, Mistral, Gemma, open weights, AI licensing, Meta AI +description: Llama, Mistral, Gemma — the "open source AI" movement is growing fast. But what does "open" actually mean when applied to large language models, and which models are actually open? +--- + +The phrase "open source AI model" is used everywhere and means almost nothing consistent. When Meta releases Llama 3, they call it open source. When Mistral releases their models, they call them open source. When Google releases Gemma, they call it open source. 
In each case, "open source" refers to something meaningfully different, and in most cases it refers to something that the Open Source Initiative and the broader open source community would not recognise as open source in the traditional sense. + +This matters for practical reasons — your ability to use, modify, and redistribute a model depends on the actual terms, not the marketing language. It matters for political reasons — if "open source AI" becomes a term that can be claimed by companies that are merely releasing weights under restricted licences, it dilutes the meaning of open source in ways that will have long-term consequences for the ecosystem. And it matters for philosophical reasons — the debate about what openness means for AI models is substantively different from the debate about what openness means for traditional software, because the artefacts involved are different. + +## What "Open" Can Mean for an AI Model + +Traditional open source software requires, at minimum, that the source code be available and that it can be freely used, modified, and redistributed. The Open Source Definition, maintained by the OSI, has specific criteria. Most "open source" AI models fail these criteria in multiple ways. + +For an AI model, the meaningful components that could be "open" include: + +**Weights** — the numerical parameters that define the model's behaviour after training. Releasing weights allows anyone to run the model and fine-tune it, but without anything else, it is analogous to releasing a compiled binary without source code. + +**Training code** — the code used to train the model, including architecture definitions and training procedures. This is analogous to source code in traditional software. + +**Training data** — the data the model was trained on. This is arguably the most important factor in a model's capabilities and alignment, and the most important thing that is almost never released. 
+ +**Evaluation code and data** — the benchmarks and test sets used to evaluate the model's capabilities. Needed to independently verify capability claims. + +Most "open" AI models release only weights, and often with restrictive licences that prohibit commercial use above a certain scale, require attribution, prohibit certain use cases, or retain the right to revoke the licence. This is not open source in any traditional sense. + +## The Models and What They Actually Release + +**Meta Llama 3** (and the Llama family generally) releases weights under a custom "Meta Llama 3 Community License." The licence allows commercial use but prohibits using Llama to train other large language models (a significant restriction), requires attribution, and prohibits use by entities with more than 700 million monthly active users without a special agreement. Training code is partially available. Training data is not released. + +**Mistral** releases weights for several models under Apache 2.0, which is the most genuinely open licence in the "open" AI model space. Apache 2.0 allows commercial use, modification, and redistribution without restrictions beyond attribution. Mistral does not release training code or training data for its flagship models. Their "open weights" language is more honest than "open source." + +**Google Gemma** uses a custom licence that allows commercial use but prohibits certain applications (explicitly: use in weapons development, surveillance, and certain high-risk medical applications) and restricts redistribution in ways that are not compatible with OSI open source criteria. Training data and training code are not released. + +**Falcon** from the Technology Innovation Institute releases weights under Apache 2.0 for most model sizes, making it one of the more genuinely open options for weights. Like other models, training data is not released. 
+ +**BLOOM** from BigScience is the closest to a genuinely open model: it was trained by a diverse coalition of researchers, the training data (ROOTS) is documented and partially available, and the model is available under a licence that is OSI-compliant in spirit if not in letter. + +## The Training Data Problem + +The deepest issue in open source AI models is training data. A model's capabilities, biases, and failure modes are substantially determined by what it was trained on. Without access to training data, you cannot truly audit a model's behaviour, cannot understand why it fails in certain ways, and cannot replicate the training to produce a model with different properties. + +There are legitimate reasons why training data is not released. Much of the text used to train large language models comes from the web and includes copyrighted material, so releasing the training data would create enormous copyright exposure. Personal data collected in training sets raises privacy concerns. And even with the data in hand, the compute cost of reproducing a training run is prohibitive for most actors, which limits the practical value of a release. + +These are real constraints, not excuses. But they mean that the most important component for understanding what a model is and why it behaves as it does is, in practice, unavailable. This is a fundamental limitation on the openness of current AI models that is unlikely to be resolved in the near term. + +## Commercial Use Restrictions and Their Implications + +The Llama family's restriction on using its weights to train other large language models is a significant practical constraint that is easy to miss in the licence terms. It means that the Llama models, despite being widely described as "open source," cannot be used to produce derivative foundation models. You can fine-tune Llama for a specific task; you cannot use Llama as the initialisation point for a new pretrained model. 
+ +This restriction protects Meta's competitive position — they do not want to train a model that then gets used to build a competitor — while allowing the application ecosystem to develop. It is a commercially rational choice. It is not consistent with the open source principle that anyone can use open source software as the basis for any project, including a competitive one. + +## The Case for Releasing Weights Anyway + +None of this is an argument that releasing weights is not valuable. It is. Weights-only releases have enabled enormous amounts of useful research, have allowed fine-tuning for specialised domains, have created an ecosystem of tools and applications, and have provided a practical alternative to API-only access for organisations with privacy requirements or latency constraints. + +The argument is specifically about terminology. Calling these releases "open source" obscures the real distinctions between what is genuinely open and what is open in a more limited marketing sense. Those distinctions matter for developers making architectural decisions, for researchers studying AI, and for the policy conversations about AI regulation that increasingly hinge on what "open" means. + +The OSI's ongoing work to define "Open Source AI" — a formal definition that extends their existing principles to AI systems — is an important contribution to this conversation. Their current draft requires, at minimum, that training data be documented and described (not necessarily released), that training code be released, and that weights be released under an OSI-approved licence. By these criteria, almost no current major AI model qualifies as open source. + +That gap between the marketing language and the formal definition deserves more attention than it gets. + +--- + +*Raj Patel has been following the open source AI ecosystem since the Llama 1 release. 
He has no financial relationship with any of the companies mentioned.* diff --git a/techpulse/posts/2025-06-15-kubernetes-fatigue.md b/techpulse/posts/2025-06-15-kubernetes-fatigue.md new file mode 100644 index 0000000..7a414d1 --- /dev/null +++ b/techpulse/posts/2025-06-15-kubernetes-fatigue.md @@ -0,0 +1,80 @@ +--- +title: "Kubernetes Fatigue Is Real — Here's What Teams Are Doing Instead" +created: 2025-06-15 09:00 +author: Clara Winthorpe +keywords: Kubernetes, k8s, platform alternatives, Fly.io, Railway, managed services, infrastructure fatigue +description: Our survey of 200 engineering teams finds growing Kubernetes fatigue. We look at who is leaving, who is staying, and what the alternatives actually look like in practice. +--- + +The Kubernetes enthusiasm that dominated infrastructure conversations from 2018 through 2022 has given way to something more complicated: a widespread recognition that Kubernetes is extraordinary infrastructure for a specific set of problems, and an unfortunate default solution for many problems it is not well-suited for. + +Over eight weeks, TechPulse surveyed 200 engineering teams about their infrastructure choices, their experience with Kubernetes, and what they have changed or considered changing in the past 18 months. The findings suggest a market in transition — not a Kubernetes collapse, but a significant reassessment. + +## The Survey Findings + +Of the 200 teams surveyed, 142 were using Kubernetes at the time we spoke with them. 
Of those: +- 31% described themselves as "fully satisfied" with Kubernetes for their workloads +- 44% described it as "working well but resource-intensive to maintain" +- 18% described it as "more trouble than it's worth for our scale" +- 7% were "actively looking to migrate away from it" + +Among the 58 teams not using Kubernetes: +- 22 had migrated off Kubernetes in the past two years +- 19 had evaluated Kubernetes and decided not to adopt it +- 17 had never seriously considered it + +## Who Is Going Back to VMs + +The teams that have migrated away from Kubernetes share a recognisable profile: they are typically under 30 engineers, running fewer than ten services, and they adopted Kubernetes because it was the industry default rather than because their workloads specifically required it. + +"We had three microservices and six engineers and we were running Kubernetes because that's what you did," the CTO of a 20-person SaaS company told us. "We spent the equivalent of one full-time engineer's time just managing the cluster. We moved to managed VMs and a simpler deployment process and we have not looked back." + +Several teams have returned to virtual machines, either managed directly through cloud providers or through platforms that provide a higher level of abstraction over VMs. The appeal is straightforwardness: a VM does what you tell it to, has a clear resource model, and does not require understanding pod scheduling, ingress controllers, persistent volume claims, and a dozen other Kubernetes-specific concepts. + +The cost argument is also real. Kubernetes clusters have a minimum overhead — the control plane, the node agents, the namespace overhead — that adds up for small workloads. A small application running on a properly-sized VM is often significantly cheaper than the same application running in a Kubernetes cluster, even before you account for the engineering time. 
+ +## Managed Services: The Middle Ground + +The most common "instead of Kubernetes" choice in our survey was not VMs but managed services: handing responsibility for the infrastructure to a cloud provider or a specialist platform and focusing on application code. + +AWS Fargate, Google Cloud Run, and Azure Container Apps all run containers without requiring Kubernetes knowledge. The tradeoffs are loss of control and increased per-unit cost, but for many teams the operational simplification justifies both. + +Among smaller and developer-first companies, Fly.io and Railway have attracted significant attention and, in some cases, meaningful migration away from both Kubernetes and AWS-native services. + +## Fly.io: What the Hype Is About + +Fly.io has built an infrastructure layer that runs containers close to users across a global network of data centres, with a developer experience that is genuinely simple compared to Kubernetes. Deployments happen from the command line with `fly deploy`. Machines start in seconds. Pricing is straightforward and usage-based. + +The teams we spoke with that have moved to Fly.io consistently described the same thing: a dramatic reduction in infrastructure cognitive load. "I stopped spending Sunday evenings worrying about the cluster," one engineering lead said. "The infrastructure just runs." + +The limitations are real. Fly.io gives you less control than Kubernetes: you cannot bring arbitrary infrastructure, the networking model is fixed, and the platform is still maturing. Several teams we spoke with noted that they had hit edge cases where Fly.io's abstraction made it difficult to do things that Kubernetes made possible, if complicated. + +Fly.io also had several well-publicised reliability incidents in 2023 and early 2024 that caused teams evaluating it to pause. The company has invested significantly in reliability since then, and the teams currently running production workloads on it describe good availability. 
But the incidents created a trust deficit that is still being rebuilt. + +## Railway: The Simplest Option + +Railway occupies an even simpler position than Fly.io. It is closer to a PaaS — you give it a GitHub repository and environment variables, and it figures out how to run your code. There is almost no infrastructure configuration. The target user is a developer who wants to run a backend service without any ops work. + +Among the teams we spoke with, Railway is primarily used for internal tools, side projects, and small production services where engineering simplicity is the primary goal and workload characteristics are predictable and modest. It is not being seriously evaluated for large-scale, complex production workloads. It is not trying to be. + +## The Teams Staying With Kubernetes — And Why + +The 31% of Kubernetes teams that described themselves as fully satisfied are worth understanding. What do they have in common? + +They are large enough to justify dedicated platform engineering. Every fully-satisfied team we spoke with had at least two engineers whose primary focus was platform and infrastructure. This creates a different experience: Kubernetes is complicated to maintain, but if someone's job is maintaining it, the complexity becomes manageable. + +Their workloads genuinely benefit from Kubernetes' capabilities. The teams most satisfied with Kubernetes are running workloads with heterogeneous resource requirements, complex networking, stateful services that benefit from Kubernetes-native storage, or compliance requirements that benefit from Kubernetes' audit and access control capabilities. + +They have invested in abstractions above raw Kubernetes. Every satisfied Kubernetes team we spoke with had built internal tooling or adopted a platform layer — Helm charts, Backstage, Crossplane, or an internal developer platform — that abstracted the Kubernetes complexity for application developers. They were, in effect, doing platform engineering. 
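The traits above, together with the profile of teams that migrated away, can be read as a rough checklist. The sketch below is only an illustration of that reasoning: the `TeamProfile` shape, the field names, and the function name are hypothetical, and the thresholds (two dedicated platform engineers, under 30 engineers, fewer than ten services) are taken from the survey descriptions, not from any formal scoring model.

```typescript
// Hypothetical team description; field names are illustrative assumptions.
interface TeamProfile {
  engineers: number;                // total engineering headcount
  services: number;                 // number of services in production
  platformEngineers: number;        // engineers focused mainly on platform work
  needsKubernetesFeatures: boolean; // e.g. heterogeneous resources, complex networking
  hasAbstractionLayer: boolean;     // e.g. Helm charts, Backstage, Crossplane
}

// Returns the reasons a team differs from the "fully satisfied" profile.
// An empty array means none of the warning signs apply.
function kubernetesFitWarnings(team: TeamProfile): string[] {
  const warnings: string[] = [];
  if (team.platformEngineers < 2) {
    warnings.push("fewer than two dedicated platform engineers");
  }
  if (!team.needsKubernetesFeatures) {
    warnings.push("workloads do not clearly need Kubernetes capabilities");
  }
  if (!team.hasAbstractionLayer) {
    warnings.push("no abstraction layer above raw Kubernetes");
  }
  if (team.engineers < 30 && team.services < 10) {
    warnings.push("matches the profile of teams that migrated away");
  }
  return warnings;
}

// Example: a six-engineer team running three services with no platform staff.
const smallSaas: TeamProfile = {
  engineers: 6,
  services: 3,
  platformEngineers: 0,
  needsKubernetesFeatures: false,
  hasAbstractionLayer: false,
};
console.log(kubernetesFitWarnings(smallSaas));
```

A team that trips several of these checks is not necessarily wrong to run Kubernetes; the list is a prompt for the "why are we using it" question, not a verdict.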
+ +## The Real Lesson + +The lesson of the Kubernetes fatigue phenomenon is not that Kubernetes is a bad technology. It is an extraordinary piece of software that has enabled an entire generation of complex distributed systems. The lesson is about fit: Kubernetes is the right tool for a specific class of problems, and it was adopted as the universal solution for all problems, which it is not. + +The teams most satisfied with Kubernetes know exactly why they are using it. The teams most frustrated with it are often teams that adopted it because everyone else was doing so, without asking whether the tradeoffs made sense for their scale, team size, and workload characteristics. + +The infrastructure market is slowly recalibrating toward fit-for-purpose choices. That recalibration is healthy. + +--- + +*Clara Winthorpe covers infrastructure, open source, and DevOps at TechPulse. She surveyed 200 engineering teams between March and May 2025.* diff --git a/techpulse/posts/2025-07-22-vc-funding-2025.md b/techpulse/posts/2025-07-22-vc-funding-2025.md new file mode 100644 index 0000000..c2f4899 --- /dev/null +++ b/techpulse/posts/2025-07-22-vc-funding-2025.md @@ -0,0 +1,66 @@ +--- +title: "Tech Funding in 2025: The AI Bubble vs. The Infrastructure Boom" +created: 2025-07-22 10:30 +author: Maya Osei +keywords: venture capital 2025, AI funding, infrastructure investment, IPO market, tech funding trends +description: H1 2025 funding data shows a bifurcated market — AI application layer funding has cooled while infrastructure investment continues to grow. What the LP perspective reveals. +--- + +Two narratives are simultaneously true about technology venture capital in the first half of 2025: the AI funding boom is moderating, and infrastructure investment is accelerating. These trends are related, and understanding the relationship between them explains a lot about where the technology industry is heading. 
+ +Total venture investment in technology companies in H1 2025 was approximately $89 billion globally, according to PitchBook data. That is 14% below H1 2024's total of $103 billion, but still far above pre-2020 levels. The decline is concentrated in AI application companies; AI infrastructure and developer tools have continued to attract capital at elevated rates. + +## The Application Layer Correction + +The correction in AI application company valuations has been building since late 2024. The proximate cause is the same one that has historically corrected inflated software valuations: enterprise customers are taking longer to convert from pilots to paying contracts, and the ARR metrics that were used to justify high valuations in 2023-2024 have not been sustained as customers churn out of first-year contracts at unexpectedly high rates. + +In 2023, the median AI application company raised its most recent round at approximately 25x ARR. Those companies are now struggling to raise follow-on rounds at those multiples, because investors are applying 2024's hard-earned scepticism to ARR quality. The companies in trouble are those with: + +- High customer concentration (top three customers representing more than 50% of ARR) +- Churn rates above 15% annually on an ARR basis +- Products that are wrappers over foundation models without genuine differentiation +- Burn multiples above 2.0x (burning $2 or more for each $1 of net new ARR) + +By these criteria, a large fraction of the AI application cohort is facing a difficult fundraising environment. Several well-known companies that raised at unicorn valuations in 2023 have not yet announced follow-on rounds; the absence of news, in this environment, is informative. + +## Infrastructure Investment Continues + +The infrastructure layer tells a different story. Spending on AI compute, networking, and data centre capacity remains extraordinary. 
NVIDIA's revenue has continued to grow, and the cloud providers' capital expenditure on GPU clusters has not shown signs of moderating. The constraint on AI infrastructure investment is not demand — enterprise demand for AI compute remains robust — but the rate at which new hardware can be manufactured and deployed. + +The venture investment picture at the infrastructure layer reflects this dynamic. Companies building AI inference infrastructure, specialised AI hardware, and the data pipelines and MLOps tooling that enterprises need to operate AI systems in production are raising rounds at healthy valuations with relatively less valuation compression than the application layer. + +The pattern is recognisable from the cloud computing buildout of the 2010s: the infrastructure layer captured durable, growing revenue before the application layer had figured out its business models. AWS became an enormous, profitable business while hundreds of cloud-native application companies struggled with unit economics. + +## The IPO Market: Still Frozen, But Thawing + +The technology IPO market has been largely frozen since 2022's correction. 2023 and 2024 saw very few significant tech IPOs, as companies with access to private capital preferred to stay private rather than face the scrutiny and short-term pressure of public markets. + +In H1 2025, we saw the first tentative signs of thaw. Several mid-sized technology companies filed for IPO or completed listings, testing whether public market investors had re-calibrated their expectations from the froth of 2021. Early results are mixed. Companies with clear profitability paths and strong unit economics have been received reasonably well. Companies without have faced a chilly reception. + +The companies most likely to successfully IPO in the next 12-18 months are infrastructure and developer tools companies with strong recurring revenue and clear paths to profitability. 
Application-layer AI companies are unlikely to have attractive IPO conditions until the churn and ARR quality questions in the sector are more clearly resolved. + +## The LP Perspective + +The most revealing conversations I had for this piece were with limited partners — the endowments, pension funds, and family offices that fund venture capital funds. LPs have historically been the most accurate leading indicators of VC market conditions, because they are investors in investors: their behaviour predicts what VCs will be able to deploy. + +The pattern among LPs in 2025 is a meaningful reduction in new commitments to early-stage generalist venture funds, offset by continued strong interest in growth-stage infrastructure-focused funds and specialised AI infrastructure funds. "We got burned by the 2021 vintage," one endowment manager told me. "Not catastrophically — but the returns are going to be mediocre for a generation of funds that looked like obvious winners in 2022. We're being more selective." + +The practical implication: the pool of capital available for early-stage AI application company fundraising has contracted more significantly than the headline numbers suggest, because LP caution flows through to new fund formation, which then flows through to early-stage investment 12-18 months later. + +## Which Sectors Are Actually Drying Up + +Beyond the AI application layer, several other categories are experiencing meaningful funding contractions: + +**Consumer fintech** — the category that produced a wave of neobanks, BNPL services, and financial apps in 2019-2022 — has been in secular decline since interest rates rose. Business models that depended on low-cost capital and high consumer spending have proven fragile. + +**SaaS without AI features** — pure workflow management, project tracking, and similar tools without meaningful AI integration are struggling to raise at previous multiples. 
The bar for "what is distinctive about this product" has been reset by the AI-native competitors entering every software category. + +**Web3/crypto-adjacent infrastructure** — after the 2022-2023 shakeout, institutional venture capital has largely abandoned the sector except for the infrastructure layer (custody, compliance tools, institutional trading infrastructure). Consumer-facing crypto products are finding little VC interest. + +**No-code and low-code platforms** — the category that was expected to democratise software development in the early 2020s has been significantly disrupted by AI coding tools, which address the same underlying demand more effectively for the use cases that matter most to enterprise buyers. + +The bifurcated market will resolve, as bifurcated markets always do, either by the infrastructure investment producing revenue that justifies it, or by a broader reassessment. The evidence currently points more toward the former, but the timeline is uncertain. + +--- + +*Maya Osei covers the business of technology and startup funding at TechPulse. Revenue and funding data from PitchBook, CB Insights, and Crunchbase.* diff --git a/techpulse/posts/2025-09-08-deno-v3.md b/techpulse/posts/2025-09-08-deno-v3.md new file mode 100644 index 0000000..34ea234 --- /dev/null +++ b/techpulse/posts/2025-09-08-deno-v3.md @@ -0,0 +1,79 @@ +--- +title: "Deno v3: Is Node.js Compatibility Finally Good Enough?" +created: 2025-09-08 13:00 +author: Clara Winthorpe +keywords: Deno v3, Node.js, npm compatibility, JavaScript runtime, Bun, benchmarks +description: Deno v3 launched with the strongest Node.js compatibility claim the project has ever made. We ran the benchmarks, tested real packages, and talked to teams who have actually migrated. 
+--- + +When Ryan Dahl unveiled Deno in 2018 as a "do-over" of Node.js, he was explicit about what he was moving away from: npm, the node_modules directory, non-Promise async patterns, and a handful of other decisions that had come to define Node's complexity. Deno would use URLs for imports, browser-compatible APIs, and TypeScript natively. It would be simpler and more secure. + +The problem, which became apparent relatively quickly, was that the JavaScript ecosystem had spent a decade building on npm. Nearly every useful library, framework, and tool was distributed through npm and assumed Node.js internals. Deno's principled stance on clean-room design made it largely incompatible with the ecosystem developers needed. + +Deno's history since 2018 has been the story of gradually reconciling that incompatibility. Deno v2 added the `node:` protocol for Node.js built-in modules and significantly improved npm package compatibility. Deno v3, released in July 2025, makes the strongest compatibility claim in the project's history. + +## What's New in v3 + +Deno v3's headline feature is what the team calls "seamless Node.js compatibility." The specific claims: + +- Full compatibility with the `node:` built-in modules, including `child_process`, `cluster`, `worker_threads`, and other modules that were previously partially supported +- Native npm package installation without an explicit compatibility flag — `deno install` now installs from npm registries by default +- Compatibility with the most popular Node.js frameworks: Express, Fastify, NestJS, and others that depend on Node-specific APIs +- A new `deno run --compat` mode that activates additional Node.js compatibility shims for legacy code + +The team has also improved performance in v3. The V8 version has been updated, the runtime startup time has been reduced, and the built-in TypeScript compilation (Deno compiles TypeScript without a separate build step) has been made significantly faster. 
+ +## Our Benchmark Results + +We ran a suite of benchmarks comparing Deno v3 to Node.js 22 and Bun 1.1 (the latest stable version as of this writing). Tests were run on a dedicated bare-metal server with an Intel Core i9-13900K and 64GB RAM, running Ubuntu 24.04 LTS. + +**HTTP throughput (requests/second, simple JSON response):** +- Bun 1.1: 142,000 req/s +- Node.js 22: 89,000 req/s +- Deno v3: 97,000 req/s + +**Startup time (cold start to first output):** +- Bun 1.1: 12ms +- Deno v3: 31ms +- Node.js 22: 48ms + +**TypeScript compilation of a 50K line codebase:** +- Bun 1.1 (with transpile-only, no type checking): 0.8s +- Deno v3 (with type checking): 4.2s +- Node.js 22 + tsc: 7.1s + +The performance picture is interesting. Bun remains the fastest runtime in most benchmarks, and its lead in HTTP throughput is significant. Deno v3 has closed the gap with Node.js and in several microbenchmarks exceeds it. For real-world web application workloads, the difference between Deno and Node is unlikely to be a deciding factor; the difference between either of them and Bun is more meaningful for latency-sensitive applications. + +Deno's TypeScript handling is a genuine advantage over Node.js's conventional approach (separate TypeScript compilation step via tsc or esbuild) in developer experience terms. Running TypeScript files directly with `deno run` without a build step is convenient and works well in v3. + +## npm Compatibility: How Well Does It Actually Work + +The honest answer to "is Node.js compatibility finally good enough?" is: for most packages, yes; for some important ones, not yet. + +We tested 50 npm packages across different categories. 43 installed and ran without errors under Deno v3. Seven had issues ranging from minor (deprecation warnings about Node-specific patterns) to significant (package depends on native C++ extensions that require a Node.js runtime). 
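To put the figures above in proportion, the short Python sketch below recomputes the relative throughput gaps and the package pass rate. Only the input numbers come from our tests; the variable names and the `percent_higher` helper are ours, introduced purely for illustration.

```python
# Figures copied from the benchmark and package tests reported above.
throughput_rps = {"Bun 1.1": 142_000, "Node.js 22": 89_000, "Deno v3": 97_000}
packages_tested = 50
packages_passed = 43


def percent_higher(a: int, b: int) -> float:
    """Return how much higher a is than b, as a percentage of b."""
    return (a - b) / b * 100


deno_vs_node = percent_higher(throughput_rps["Deno v3"], throughput_rps["Node.js 22"])
bun_vs_deno = percent_higher(throughput_rps["Bun 1.1"], throughput_rps["Deno v3"])
pass_rate = packages_passed / packages_tested * 100

print(f"Deno v3 over Node.js 22: {deno_vs_node:.1f}%")
print(f"Bun 1.1 over Deno v3: {bun_vs_deno:.1f}%")
print(f"npm packages without errors: {pass_rate:.0f}%")
```

On these numbers, Deno v3's throughput lead over Node.js 22 is roughly 9%, Bun's lead over Deno v3 is roughly 46%, and 86% of the packages we tested ran cleanly.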
+ +The category that remains problematic is native modules — npm packages that include compiled C++ extensions. These packages use Node's N-API or NAN to interface with native code, and while Deno has implemented N-API support, it is incomplete. Several popular packages in the cryptography, image processing, and database connector categories use native modules and do not work fully under Deno v3. + +For pure JavaScript/TypeScript packages and packages that use only WASM for native performance, compatibility is excellent. For packages with native C++ dependencies, Deno's story is still incomplete. + +## Real Migration Stories + +We spoke with four teams that had migrated production workloads from Node.js to Deno in the past year. + +A developer tools startup migrated their CLI tooling from Node.js to Deno v3. Their assessment: "We rewrote the parts that used native modules in pure TypeScript and WebAssembly. For everything else, the migration was relatively smooth. We are not significantly faster than we were on Node, but the developer experience is better — first-class TypeScript, better security defaults, and a simpler dependency management story." + +A media company migrated their content delivery backend, a Node.js Express application, to Deno v3. They used Deno's `--compat` flag and described a three-day migration process. "Express runs fine on Deno v3. We hit a few edge cases with middleware that used Node-specific stream APIs, but the Deno team was responsive on GitHub and we had working workarounds quickly. Our throughput is slightly better — maybe 10% on our workloads — but the main win is simpler deployments with Deno's built-in tooling." + +A fintech startup evaluated migrating their Node.js backend but decided against it. The decision came down to native modules: their PostgreSQL connection pooling library used a native module for performance, and the Deno N-API implementation was not stable enough for their production requirements. 
+ +## The Bun Comparison + +Bun's competitive position relative to Deno has not changed significantly with v3. Bun's strengths (exceptional performance, near-complete Node.js compatibility, very fast npm operations) remain its strengths. Deno's strengths (security-first design, first-class TypeScript with full type checking, WebAssembly-native capabilities, the Deno Deploy hosting platform) are different and remain unchanged. + +The two runtimes serve somewhat different audiences. Teams that want the fastest possible Node.js replacement with the minimum migration friction are better served by Bun. Teams that value security boundaries and TypeScript correctness, and that are building greenfield projects without heavy npm ecosystem dependencies, are better served by Deno. + +Both are genuine improvements on Node.js in their respective domains. Node.js remains the default choice by inertia, ecosystem depth, and the enormous installed base. Whether either alternative achieves a significant share of the runtime market over the next several years depends on whether it can overcome that inertia, which is the real challenge. + +--- + +*Benchmark testing conducted on a dedicated server running Ubuntu 24.04 LTS. All runtime versions are as of publication date. Clara Winthorpe covers infrastructure and developer tools at TechPulse.* diff --git a/techpulse/posts/2025-10-14-security-supply-chain.md b/techpulse/posts/2025-10-14-security-supply-chain.md new file mode 100644 index 0000000..a3168a6 --- /dev/null +++ b/techpulse/posts/2025-10-14-security-supply-chain.md @@ -0,0 +1,71 @@ +--- +title: "Software Supply Chain Security in 2025: Progress Report" +created: 2025-10-14 09:00 +author: Raj Patel +keywords: software supply chain security, SBOM, sigstore, GitHub security, dependencies, enterprise security +description: Eighteen months after the xz incident and nearly five years after SolarWinds, we assess how much enterprise software supply chain security has actually improved.
+--- + +The phrase "software supply chain security" entered mainstream awareness after SolarWinds in 2020 and has been a fixture of security conference agendas and government executive orders ever since. In the years since, significant resources have been invested in improving the situation: new tooling, new standards, regulatory pressure, and corporate security programmes specifically targeting the supply chain. + +It is worth asking whether it has worked. + +The honest answer, based on conversations with security researchers, enterprise security teams, and the people building the tooling, is: meaningfully yes in some areas, frustratingly stagnant in others, and still dramatically under-resourced overall. + +## SBOM Adoption: From Mandate to Practice + +The Software Bill of Materials (SBOM) — a machine-readable inventory of the components in a software product — has moved from a security researcher wish list item to a regulatory requirement in several jurisdictions. The US government's Executive Order on cybersecurity in 2021 required federal contractors to provide SBOMs. The EU Cyber Resilience Act, which came into force in phases beginning in 2024, requires SBOMs for products sold into the European market. + +Adoption among enterprises has been driven primarily by these compliance requirements. Our conversations with security teams at large enterprises suggest that SBOM generation is now routine for companies operating in regulated sectors or selling to government customers. The tooling has matured: Syft, CycloneDX, and SPDX toolchains work reliably, integrate with major CI/CD platforms, and produce SBOMs that can be ingested by vulnerability scanning tools. + +The gap is in what organisations do with SBOMs once they have them. Generating an SBOM is the easy part. 
Ingesting SBOMs from third-party software vendors, maintaining them across complex software ecosystems, and using them for actual risk management decisions are the hard parts, and that work is much less mature. + +"We generate SBOMs for everything we ship," one security engineer at a large technology company told us. "But we receive SBOMs from fewer than 10% of our suppliers, and even when we do, we don't have the process to operationalise them into real risk decisions. They exist in a system and nobody looks at them." + +## Sigstore and the Signing Infrastructure + +Sigstore, the open source project providing signing and verification infrastructure for software artefacts, has become genuine critical infrastructure. PyPI accepts Sigstore-based attestations for uploaded packages, the npm registry uses it for provenance information, and Go's module checksum database applies a similar transparency-log design to module verification. + +The technical architecture is solid: Sigstore uses a public, append-only transparency log (Rekor) that allows anyone to verify that a signing event occurred and when, combined with short-lived signing certificates from Fulcio that tie signatures to verified identities (typically GitHub Actions workflows or similar OIDC-based identities). The result is a system that provides meaningful verification without requiring developers to manage long-lived private keys, a significant usability improvement over PGP-based signing. + +The challenge is adoption at the consumer end. Signing artefacts is now easy; verifying signatures before using them is still a manually configured step that most developers do not take. Turning verification on by default in package managers, so that installing an unsigned or unverified package requires an explicit opt-in, is the next step that would translate signing infrastructure into routine security practice. npm has begun moving in this direction; others are more cautious.
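The "verify by default" idea can be sketched in a few lines of Python. This is a toy model, not Sigstore's API or the behaviour of any real package manager: `Package`, `install`, `UnverifiedPackageError`, and the `allow_unverified` flag are hypothetical names, and a lookup in a set of attested digests stands in for a real signature and transparency-log check.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str
    contents: bytes


class UnverifiedPackageError(Exception):
    """Raised when a package has no matching attestation and no opt-in was given."""


def digest(contents: bytes) -> str:
    """Return the SHA-256 hex digest of the package contents."""
    return hashlib.sha256(contents).hexdigest()


def install(pkg: Package, attested_digests: set[str], allow_unverified: bool = False) -> str:
    """Install only attested packages unless the caller explicitly opts out.

    attested_digests is an assumption of this sketch: it stands in for a
    real signature and transparency-log verification step.
    """
    if digest(pkg.contents) in attested_digests:
        return f"installed {pkg.name} (verified)"
    if allow_unverified:
        return f"installed {pkg.name} (UNVERIFIED, explicit opt-in)"
    raise UnverifiedPackageError(f"{pkg.name} has no matching attestation")


# Hypothetical packages: one matches its attestation, one has been altered.
signed = Package("example-pkg", b"export const pad = (s) => s;")
tampered = Package("example-pkg", b"export const pad = (s) => s; stealTokens();")
attested = {digest(signed.contents)}

print(install(signed, attested))
print(install(tampered, attested, allow_unverified=True))
try:
    install(tampered, attested)
except UnverifiedPackageError as err:
    print(f"refused: {err}")
```

The design point is where the default sits: the unverified path still exists, but the caller has to ask for it by name, which is the inverse of how most package managers behave today.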
+ +## GitHub Dependency Scanning: What It Catches, What It Misses + +GitHub's Dependabot and the code scanning features built on CodeQL have become the most widely-deployed dependency security tooling in the industry, by virtue of being free and integrated into the platform where most open source and a significant fraction of commercial software development happens. + +The effectiveness data is encouraging at the surface level: Dependabot has helped organisations patch millions of vulnerable dependencies. The catch rate for known vulnerabilities with published CVEs is high. + +The limitations are important. Dependabot catches dependency versions with known CVEs. It does not catch malicious packages designed to look like legitimate ones (typosquatting), does not catch compromised legitimate packages (the xz problem), and does not analyse the supply chain of the dependencies themselves — only the direct and transitive dependency versions. + +"Dependabot is extremely good at what it does and that's a real improvement," one security researcher told us. "But the attacks that actually scare me — the patient, sophisticated supply chain compromises — are exactly the ones it doesn't catch, almost by definition." + +## What Enterprises Are Actually Doing + +We spoke with security teams at twelve large enterprises across financial services, technology, and healthcare. The practices they described, in aggregate, suggest a sector that is significantly better than it was in 2020 but still operating below the level that the threat environment warrants. + +**Practices now considered routine** (done by the majority of companies we spoke with): automated dependency scanning in CI/CD pipelines, SBOM generation for shipped software, periodic security reviews of critical open source dependencies, secrets scanning in repositories. 
+ +**Practices being adopted but not yet routine**: SBOM ingestion from suppliers, software composition analysis going beyond CVE scanning to include licence compliance and quality metrics, signing and verification of container images throughout the deployment pipeline. + +**Practices still in early stages**: dependency build reproducibility verification, behavioural analysis of new package versions, automated provenance verification at package install time. + +The remaining gaps centre on the sophistication of the threat model. The tooling that has been deployed broadly addresses the 2015-2019 threat model: known vulnerabilities in known packages. The current threat model includes supply chain attacks against the development and build infrastructure of trusted packages, compromised maintainer accounts, social engineering of tired maintainers (xz), and sophisticated malicious packages designed to evade static analysis. + +## What Would Actually Move the Needle + +The security researchers and practitioners we spoke with converge on several interventions that would have the highest impact: + +**Funded maintenance of critical open source projects.** The xz incident demonstrated that the most dangerous attack vector is not technical but social. Funded, supported maintainers who have colleagues and review processes are significantly harder to compromise than exhausted individuals maintaining projects alone. + +**Default verification in package managers.** Making signature verification mandatory by default — so that installing an unsigned package requires explicit user action — would create a floor of supply chain integrity without requiring developer behaviour change. + +**Build reproducibility at scale.** Reproducible builds — where the same source code and build inputs always produce identical binary outputs — allow independent verification that a distributed binary corresponds to its published source. 
Tools for achieving reproducibility have improved, but adoption at scale remains incomplete. + +**Faster CVE response infrastructure.** The time between vulnerability discovery and the publication of patches and advisories in machine-readable form remains too long. Streamlining this pipeline would allow the defensive tooling to respond faster. + +The progress in software supply chain security over the past five years is real and should not be minimised. The gap between where we are and where we need to be is also real and should not be minimised. + +--- + +*Raj Patel has covered software security and supply chain issues at TechPulse since 2022. He spoke with twelve enterprise security teams and eight independent security researchers for this article.* diff --git a/techpulse/posts/2025-11-30-ai-agents-enterprise.md b/techpulse/posts/2025-11-30-ai-agents-enterprise.md new file mode 100644 index 0000000..64edcd8 --- /dev/null +++ b/techpulse/posts/2025-11-30-ai-agents-enterprise.md @@ -0,0 +1,75 @@ +--- +title: "AI Agents in the Enterprise: What Actually Works" +created: 2025-11-30 11:00 +author: Maya Osei +keywords: AI agents, enterprise AI, automation, LLM agents, autonomous AI, AI guardrails +description: Case studies from five companies reveal what AI agents are reliably delivering in enterprise settings — and why autonomous decision-making remains out of reach. +--- + +The AI agent narrative has been one of the most persistent stories in enterprise technology for the past two years. Agents — AI systems that can autonomously execute multi-step tasks, use tools, and adapt to unexpected situations — represent the promise of AI that acts rather than just advises. Investors have deployed billions into agent companies. Enterprise technology buyers have run pilots. The results, as of late 2025, are instructive. + +Over three months, TechPulse conducted detailed case study interviews with five companies that have deployed AI agents in production. 
We also spoke with security, legal, and compliance teams who have been asked to evaluate agent deployments. What we found is a genuine technology making real contributions in specific, bounded use cases, and a gap between that reality and the full autonomy narrative that is wider than most coverage acknowledges. + +## Company One: Financial Services — Document Processing + +A large financial services firm deployed AI agents for initial processing of loan application documents in late 2024. The agent receives a loan application package, extracts structured data from unstructured documents (income statements, employment letters, bank statements), identifies missing documents, and produces a structured summary for human underwriters. + +The results have been positive. Processing time for the document extraction and initial structuring stage has been reduced by approximately 60%. Underwriter time previously spent on document organisation is now spent on actual underwriting decisions. Error rates in data extraction have decreased compared to the manual baseline. + +The key design decision that made this work: the agent operates within a tightly constrained task scope. It extracts and structures data. It does not make lending decisions. It does not send external communications. All outputs are reviewed by a human underwriter before any action is taken. The system failed several times during the pilot — wrong extractions, missed documents, format misinterpretations — but because the human review step was mandatory, none of those failures reached customers. + +"The success comes from keeping the agent in a well-defined box and having a human at the exit of that box," the technology lead told us. "When we tried expanding the scope to include initial underwriting recommendations, the failure rate was unacceptably high and the failures were not always predictable." 
+ +## Company Two: Software Development — Code Review + +An enterprise software company with over 1,000 engineers deployed AI agents for an initial code review pass. The agent reviews pull requests for common issues: potential security vulnerabilities, test coverage gaps, code style violations, and straightforward logic errors. It comments directly on pull requests before human review. + +The outcomes are mixed but net positive. Engineers report that the agent catches approximately 30% of the issues that human reviewers would have caught, which meaningfully reduces the time human reviewers spend on mechanical issues. The agent also catches issues that human reviewers would have missed — it is thorough in a way that humans under time pressure are not. + +The failure mode is false positives. The agent comments on issues that are not actually issues at a rate that engineers find annoying but tolerable. Early versions of the system had a higher false positive rate; prompt engineering and fine-tuning on the company's specific codebase have reduced it to a level that engineers describe as "better than tolerable." + +The limits of the system are clear: it identifies potential issues but the resolution of those issues remains entirely with human engineers. When the agent suggests a fix, engineers review the suggestion carefully and often reject it. The agent's code generation is treated as a starting point, not a trusted output. + +## Company Three: HR — Candidate Screening + +A professional services firm deployed AI agents to help with initial candidate screening for entry-level positions. The agent reviews CVs, identifies candidates that meet basic threshold criteria, and generates a structured assessment of each candidate for human recruiters. + +This deployment has been the most controversial case study. 
The firm has observed a reduction in screening time per candidate, but they have also had to navigate significant legal and HR concern about AI decision-making in the hiring process. Several jurisdictions have enacted or are considering legislation requiring disclosure when AI is used in hiring decisions. + +The practical adjustment has been to treat the agent's assessment as a search and organisation tool rather than a decision tool. It finds and structures information; it does not recommend hiring or rejection. Human recruiters review every structured assessment before any candidate communication occurs. + +The firm's legal team has been the most sceptical of the deployment. "The AI optimises for patterns in historical data," their legal director noted. "Historical data reflects historical hiring decisions, which have biases. We have invested significant effort in audit frameworks to identify whether the agent is introducing or amplifying bias. We have not found evidence of it, but we have also not had the system in production long enough to have high confidence." + +## Company Four: Customer Support — Tier-One Resolution + +A technology company deployed AI agents to handle initial customer support queries, with the goal of resolving common issues without human intervention and escalating to human agents for complex cases. + +After six months in production, the agent handles 62% of inbound queries without escalation. Customer satisfaction scores for agent-handled queries are lower than for human-handled queries, but within acceptable parameters. Escalation accuracy — the agent's ability to identify which queries need human handling — is the most important metric and has improved significantly from the initial deployment. + +The failure modes are instructive. The agent handles common, well-defined problems (password resets, subscription changes, billing inquiries) very well. 
It handles novel or ambiguous problems poorly, and it does not reliably recognise when a problem is outside its competence. Early versions of the system would confidently provide incorrect information about product features or policies rather than escalating. This has been addressed through explicit escalation triggers and confidence thresholds, but the company's engineers described ongoing tuning work as "more labour-intensive than expected." + +## Company Five: Legal — Contract Review + +A law firm uses AI agents as a first-pass reviewer for standard commercial contracts. The agent reviews contracts for common issues: missing standard clauses, non-standard terms in key provisions, and potential conflicts with the client's standard positions. + +Lawyers at the firm describe the agent as genuinely useful for speeding up the mechanical review of routine contracts. It does not change how they work on complex, negotiated agreements — it is used for the volume work. + +The guardrails in place are significant: all agent outputs are reviewed by qualified lawyers, outputs are never provided directly to clients, and the firm does not market the AI assistance as part of its service model. "We treat it the way we treat a first-year associate's work," one senior partner said. "Review everything. Trust nothing until you've checked it. Learn to read the failure modes." + +## The Consistent Findings + +Across these five case studies, several consistent findings emerge: + +**Successful deployments are bounded.** Every successful agent deployment we encountered operates within a tightly defined scope with explicit constraints on what the agent can do, what data it can access, and what actions it can take without human approval. + +**Human review is non-negotiable for consequential outputs.** No company we spoke with had removed human review from the path to consequential decisions. 
The value of agents is in reducing the time humans spend on mechanical aspects of a task, not in removing humans from the loop. + +**Failure modes are not always predictable.** All five deployments experienced failure modes that were not anticipated during the pilot phase. The characteristic of production AI deployment is discovering new failure modes over time, which requires ongoing monitoring and prompt/system adjustment. + +**Autonomous decision-making is where all five companies drew the line.** When we asked each company what they had tried and decided not to deploy, the answers clustered around autonomous decision-making tasks — anything where the agent's output would directly trigger an action without human review. Legal liability, regulatory compliance, and customer trust concerns are cited, but underneath them is a practical concern: no one has confidence in the reliability of autonomous agent decision-making at the level needed for consequential actions. + +The AI agent story is real. It is just a narrower story than the investment narrative suggests. + +--- + +*Maya Osei conducted case study interviews between August and November 2025. Companies are anonymised; descriptions may include minor modifications to protect confidentiality.* diff --git a/techpulse/posts/2025-12-20-developer-predictions-2026.md b/techpulse/posts/2025-12-20-developer-predictions-2026.md new file mode 100644 index 0000000..69bf320 --- /dev/null +++ b/techpulse/posts/2025-12-20-developer-predictions-2026.md @@ -0,0 +1,85 @@ +--- +title: "TechPulse Predictions for 2026: Ten Bets on Developer Technology" +created: 2025-12-20 10:00 +author: Clara Winthorpe +keywords: 2026 predictions, developer technology, AI tools, WebAssembly, open source, programming trends +description: We make ten specific predictions for developer technology in 2026, and look back honestly at how our 2025 predictions fared. 
+--- + +Every December, TechPulse makes ten specific predictions about the coming year in developer technology. Specific predictions are more useful than vague ones, because specific predictions can be falsified — you can tell at the end of the year whether they were right, partly right, or wrong. The discipline of specificity is also good for thinking: it is easy to say "AI will be important in 2026." It is harder to say something specific, and the hardness is where the thinking happens. + +Before the 2026 predictions, an accounting of our 2025 predictions: + +**2025 retrospective:** + +1. ✓ *Rust in the Linux kernel will exceed 50,000 lines by end of 2025.* Correct. We put it at approximately 67,000 lines as of December. + +2. ✗ *At least one major AI coding assistant will ship an "offline mode" using a locally-run model.* Wrong. Copilot and Cursor both added local model features, but not the fully offline, enterprise-focused product we predicted. + +3. ✓ *The term "open source AI" will be formally disputed in a significant policy context.* Correct. The EU AI Act negotiations produced exactly this dispute. + +4. ✗ *Bun will achieve 20% developer adoption in our annual survey.* Wrong. Bun reached 14% in our December 2025 survey. + +5. ✓ *At least three well-funded AI agent companies from 2023 will shut down or be acqui-hired without significant returns.* Correct. We counted six. + +6. ✓ *Python will remain the most commonly used language in our developer survey.* Correct. + +7. ~ *The SBOM requirement in the EU Cyber Resilience Act will be delayed.* Partly correct — there was a delayed phase-in, but the core requirement launched on schedule. + +8. ✗ *A major cloud provider will launch a WebAssembly-native function runtime to compete with containers.* Wrong. Progress was made but no major commercial launch. + +9. ✓ *GitHub will lose market share in our developer survey's primary VCS platform question.* Correct — from 83% to 79%. + +10. 
✓ *Developer burnout rates in our survey will remain above 35%.* Correct. They increased. + +Score: 6 correct, 3 wrong, 1 partial. About what we'd expect from honest, specific predictions. + +--- + +## 2026 Predictions + +**Prediction 1: WebAssembly will ship in production at a top-20 enterprise software company as a primary execution substrate.** + +Not as an experiment. Not as a components layer wrapping existing infrastructure. As the primary way a major production workload runs. The maturity of the tooling, the improvements in WASI P2, and the security benefits are reaching the threshold where a risk-averse enterprise can justify the migration. At least one will make the jump in 2026. + +**Prediction 2: The first major AI coding assistant data breach will occur and change how enterprises procure AI tools.** + +The amount of code that AI coding assistants have access to (including proprietary algorithms, credentials accidentally included in context, and architectural information) is an enormous and underappreciated attack surface. The security practices of AI tool providers are not uniformly rigorous. We predict a significant data incident involving AI coding tool infrastructure will occur in 2026. This will not kill the category but will significantly accelerate enterprise requirements for on-premise or single-tenant deployment options. + +**Prediction 3: Rust adoption in our developer survey will cross 25%.** + +Rust has grown from 13% in 2023 to 19% in 2024 to somewhere around 23% in 2025 (full data in our upcoming survey). The trajectory has been consistent and the reasons for it are structural: a generation of developers who learned Rust in university and through the wave of Rust evangelism in the early 2020s is reaching positions of influence in engineering organisations. We expect the line to keep moving in 2026.
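The arithmetic behind this prediction is easy to check. The sketch below uses only the survey shares quoted above, with the provisional 23% for 2025; the variable names are ours.

```python
# Survey shares quoted above, in percent; 2025 is the provisional figure.
rust_share = {2023: 13, 2024: 19, 2025: 23}
target = 25

years = sorted(rust_share)
# Year-over-year change in percentage points.
gains = [rust_share[later] - rust_share[earlier] for earlier, later in zip(years, years[1:])]
# Points still needed to reach the target from the latest figure.
needed = target - rust_share[years[-1]]

print(f"Year-over-year gains (points): {gains}")
print(f"Gain needed in 2026 to reach {target}%: {needed} points")
```

Growth slowed from six points to four, so the prediction holds as long as the 2026 gain is a little more than half of last year's.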
+ +**Prediction 4: At least one major open source project will adopt a maintainer certification system backed by a cryptographic identity.** + +The xz incident continues to shape thinking about maintainer vetting. The mechanism that was missing — a way to establish that a new contributor is who they claim to be, with appropriate cryptographic binding — has been discussed extensively but not shipped. We predict that at least one prominent open source project will deploy something in this space in 2026. + +**Prediction 5: The IPO of a developer tools company will be the most successful tech IPO of the year.** + +After the application-layer AI valuation compression, the most credible IPO candidates are companies with real, durable, growing revenue and genuine product-market fit. Developer tools companies meet these criteria better than most. The market has re-learned to value profitability and sustainable growth. A developer tools IPO will benefit from this re-learning. + +**Prediction 6: At least one major programming language will ship an official AI-assisted standard library function generator.** + +The pattern of "AI completes code I'm writing" is being extended to "AI generates idiomatic code for well-defined standard tasks." We expect a major language community — likely Python, Go, or Kotlin — to ship an official tool that generates standard library usage examples or fills common patterns using AI, integrated into the official toolchain. + +**Prediction 7: The term "vibe coding" will not appear in a serious engineering job description.** + +A backlash to the AI-generated code practices that became fashionable in 2024-2025 will produce employer differentiation — companies that want engineers who understand their code will begin explicitly signalling this in hiring requirements. The phrase used informally to describe AI-dependent development will become a negative signal in professional contexts. 
+ +**Prediction 8: The Linux Foundation will announce a funded maintenance programme for at least 50 critical open source packages.** + +The post-xz policy environment has created political will for sustained open source maintenance funding. The Sovereign Tech Fund model will be extended or replicated at larger scale. We expect the Linux Foundation, the most credible vehicle for industry-wide open source funding, to announce a programme funded by major technology companies that provides ongoing payment to maintainers of critical packages. + +**Prediction 9: A native macOS ARM port of a significant enterprise software product that has been Windows-or-Linux-only will ship.** + +The Apple Silicon platform has reached a level of developer market share that makes it untenable for enterprise tools to not support it. We predict at least one major enterprise software product that has historically been Windows-only or primarily Linux-focused will ship a native ARM macOS version. + +**Prediction 10: Our 2026 developer survey will show developer satisfaction with AI tools has declined compared to 2025, even as usage has increased.** + +The pattern of AI tool adoption producing subsequent dissatisfaction — particularly among more experienced developers — is a trend we have been tracking. Usage will increase because the tools are embedded in workflows and expensive to remove. Satisfaction will decline because the costs (reduced code understanding, quality concerns, dependency) have become clearer. Both things will be true simultaneously. + +--- + +Check back in December 2026 for the accounting. 
+ +*Clara Winthorpe is open source and infrastructure editor at TechPulse.* diff --git a/techpulse/posts/2026-01-15-wasm-server.md b/techpulse/posts/2026-01-15-wasm-server.md new file mode 100644 index 0000000..4e030a4 --- /dev/null +++ b/techpulse/posts/2026-01-15-wasm-server.md @@ -0,0 +1,81 @@ +--- +title: "Server-Side WebAssembly Is Finally Ready for Production" +created: 2026-01-15 09:00 +author: Raj Patel +keywords: WebAssembly, WASM server, Spin framework, WASI P2, production, serverless, Fermyon +description: After years of "almost there," server-side WebAssembly has reached production readiness. We have the benchmark data, real company case studies, and an honest assessment of the remaining rough edges. +--- + +In late 2025, something shifted in the server-side WebAssembly ecosystem. Not a single dramatic event, but a convergence: the Spin framework reached v3.0, Fermyon Cloud achieved five-nines uptime for three consecutive quarters, the WASI P2 specification stabilised, and several companies whose names you would recognise began running WebAssembly workloads in production not as experiments but as infrastructure they would have to explain an outage for. + +We have spent the past several weeks benchmarking, building, and talking to the teams that are shipping production WebAssembly. Our conclusion: server-side WebAssembly is ready for production, with meaningful caveats about what "production" means in this context and what the rough edges are. + +## What Has Changed Since 2024 + +The WebAssembly server story has been "almost ready" for several years. What changed in 2025? + +**WASI P2 stabilised and shipped in major runtimes.** The WebAssembly System Interface Preview 2 — the version of the interface standard that includes proper networking, the component model, and a more complete POSIX-like capability set — shipped stable implementations in Wasmtime, Wasmer, and WasmEdge. 
Previous versions of WASI had significant limitations (most notably, no network sockets) that made them unsuitable for server applications. WASI P2 removes those limitations. + +**Spin v3 simplified the developer experience significantly.** Spin, the developer-friendly WebAssembly application framework from Fermyon, underwent a major revision that made it substantially easier to build production-ready applications. The local development experience — running Spin applications locally with fast iteration, debugging support, and test tooling — reached parity with frameworks like Express or FastAPI in terms of developer ergonomics. + +**Component composition tooling matured.** The `wasm-tools` suite and the `cargo component` toolchain for Rust (the primary language for production WebAssembly) reached stable, reliable releases. Building, composing, and deploying WebAssembly components no longer requires debugging cryptic toolchain errors. + +**Companies started shipping.** Several companies began running real user traffic through WebAssembly components. This is important not just as validation but as a signal: when companies with users depending on their software choose WebAssembly, the ecosystem has reached a level of reliability where that choice is defensible. + +## Benchmark Data + +We ran performance benchmarks comparing server-side WebAssembly (Spin on Wasmtime) against Node.js, Rust Axum, and Python FastAPI for a representative HTTP application: JSON API with a database read. 
+ +**Cold start latency** (time to first response for a newly started instance): +- Spin/WebAssembly: 1.2ms +- Rust Axum: 8ms (process startup) +- Node.js: 145ms +- Python FastAPI: 340ms + +**Throughput** (requests/second, steady state, 100 concurrent connections): +- Rust Axum: 198,000 req/s +- Spin/WebAssembly: 87,000 req/s +- Node.js: 61,000 req/s +- Python FastAPI: 24,000 req/s + +**Memory per instance:** +- Spin/WebAssembly: 12MB baseline +- Node.js: 68MB baseline +- Python FastAPI: 85MB baseline +- Rust Axum: 6MB baseline + +The WebAssembly numbers are impressive, particularly cold start and memory. The throughput is lower than native Rust, which is expected — there is a cost to the safety abstractions and the WebAssembly execution model — but significantly higher than Node.js, which makes it competitive for latency-sensitive applications where the cold start and memory characteristics are advantages. + +## Who Is Running It in Production + +We spoke with teams at three companies running WebAssembly in production: + +**A European developer tools company** migrated their plugin execution sandbox from Node.js to WebAssembly components in Q4 2025. They run untrusted user-provided code (JavaScript and Python scripts) in their product; the security isolation of WebAssembly is the primary motivation. "The sandbox is our most important security boundary," their infrastructure lead told us. "WebAssembly's memory isolation and capability-based security model is significantly stronger than what we could achieve with a process-based sandbox, with a fraction of the overhead." + +**A media company** migrated their content personalisation service — a stateless API that processes requests at the edge — to Spin on Fermyon Cloud. They cited cold start latency and cost as the primary motivations. "We had a Node.js function that was taking 400ms cold start. That was acceptable but not great. The WebAssembly version starts in under 2ms. 
The cost difference at our scale is material." + +**A security software company** builds a vulnerability scanning tool that needs to run plugins in a sandboxed environment. They use WebAssembly components for plugin isolation. The primary attraction is the combination of security isolation and performance — they can run hundreds of plugin instances concurrently on hardware that would struggle with equivalent process-based isolation. + +## The Remaining Rough Edges + +Production readiness does not mean perfect. The honest picture of server-side WebAssembly in early 2026 includes meaningful rough edges. + +**Language support is still Rust-centric.** Rust has the best WebAssembly toolchain, the best component model support, and the most documentation for production use. Go support has improved but has a larger binary size overhead. Python and JavaScript can run inside WebAssembly via interpreter embedding, but the resulting binaries are larger and the startup advantage narrows. If your team does not write Rust, the production WebAssembly story is less compelling. + +**Debugging is still harder than traditional environments.** WebAssembly debugging has improved significantly with DWARF extensions and improved Wasmtime debugging support, but stepping through a WebAssembly program in a debugger is still more effort than debugging native code. For teams that rely heavily on traditional debugging workflows, this is a meaningful adjustment. + +**The ecosystem of compatible libraries is smaller.** WebAssembly programs cannot use arbitrary native libraries — they need either pure WebAssembly ports or WASM-compatible versions. For most web service use cases the available libraries are sufficient; for workloads that require specialised native code (graphics, certain cryptographic operations, hardware interfaces), you will hit gaps. + +**Persistent storage patterns are still evolving.** Stateless services on WebAssembly are mature. 
Patterns for working with persistent storage — databases, object stores — from WebAssembly are functional but the abstractions are still evolving, particularly for the component model's approach to resource handles. + +## The Assessment + +Server-side WebAssembly is production-ready for a specific class of applications: stateless or near-stateless services where cold start latency, memory density, and security isolation are important. For teams working in Rust, it is compelling today. For teams in other languages, "almost there" is becoming increasingly accurate, but "there" is not quite here yet. + +The convergence of events in 2025 has moved server-side WebAssembly from "promising experiment" to "credible production option." Teams building new serverless infrastructure or edge computing applications should evaluate it seriously. Teams with existing production systems have no urgent reason to migrate unless their current stack has problems that WebAssembly specifically solves. + +The technology is ready. The ecosystem is maturing. The adoption question now is as much about workflow change and team learning curves as it is about technical readiness. + +--- + +*Benchmark testing conducted January 2026 on dedicated hardware. Production case studies are from anonymous interviews with engineering teams. Raj Patel covers developer tools and infrastructure at TechPulse.* diff --git a/techpulse/posts/2026-02-28-open-source-llm-2026.md b/techpulse/posts/2026-02-28-open-source-llm-2026.md new file mode 100644 index 0000000..9bfef67 --- /dev/null +++ b/techpulse/posts/2026-02-28-open-source-llm-2026.md @@ -0,0 +1,105 @@ +--- +title: "The State of Open Source LLMs: A 2026 Benchmark" +created: 2026-02-28 14:00 +author: Raj Patel +keywords: open source LLMs, LLM benchmarks, Llama, Mistral, AI models 2026, inference, enterprise AI +description: We benchmarked 12 open-weight language models across reasoning, generation, cost, and deployment complexity. 
Here is the honest 2026 state of play. +--- + +Every few months the open source LLM landscape shifts dramatically enough to warrant reassessment. A model that was state-of-the-art in October may be mediocre by March. The benchmarking work is genuinely useful because the improvement trajectory is steep and the specific rankings matter for real deployment decisions. + +We spent six weeks running a comprehensive benchmark suite against 12 open-weight models, with a focus on the criteria that matter for production deployment: reasoning capability, text generation quality, cost per token at production scale, deployment complexity, and enterprise readiness characteristics (compliance, data governance, predictable behaviour). + +## The Models Tested + +We tested: Llama 3.3 70B, Llama 3.3 8B, Mistral Large 2, Mistral 7B, Gemma 3 27B, Gemma 3 7B, DeepSeek R2 Distill, Qwen 2.5 72B, Phi-4, Falcon 3 40B, OLMo 2 7B, and Command R+. + +All tests used quantized models where applicable to model realistic deployment scenarios. Inference was run on A100 80GB GPUs; cost calculations are based on spot instance pricing on AWS and GCP as of February 2026. + +## Reasoning Performance + +For reasoning tasks — mathematical problem solving, logical deduction, multi-step code debugging, structured analysis — the rankings are more differentiated than for simple generation tasks: + +**Tier 1 (competitive with GPT-4 class reasoning on many tasks):** +- Llama 3.3 70B: Exceptionally strong reasoning for its class. On MATH-500 (competition-level math), scored 73.2%. +- DeepSeek R2 Distill: The standout in this benchmark cycle. Achieves Llama 3.3 70B-level reasoning at 7B parameters by distilling reasoning traces from a larger teacher model. MATH-500: 71.8%. +- Qwen 2.5 72B: Strong mathematical reasoning, particularly notable for Chinese language tasks. MATH-500: 74.1%. + +**Tier 2 (solid reasoning, notable tradeoffs):** +- Mistral Large 2: Reliable but not exceptional on pure reasoning. 
Better on instruction following than logic problems. +- Gemma 3 27B: Better than its size class on reasoning, competitive with Llama 3.3 70B on many tasks at lower computational cost. + +**Tier 3 (limited to simpler reasoning tasks):** +- All 7B class models except DeepSeek R2 Distill: adequate for simple multi-step problems, unreliable on competition-level math or complex code debugging. + +## Text Generation Quality + +For text generation — writing, summarisation, translation, following complex instructions — the quality gap between models is smaller than the reasoning gap: + +Llama 3.3 70B, Mistral Large 2, and Qwen 2.5 72B are all competitive with each other on standard generation tasks. The differentiation comes from style: Llama 3.3 tends toward somewhat more formal outputs; Mistral Large 2 is notably strong on structured outputs (JSON, formatted reports); Qwen 2.5 72B has superior multilingual performance. + +The 7B class models (Mistral 7B, Gemma 3 7B, Llama 3.3 8B) perform surprisingly well on simple generation tasks — blog posts, email drafts, basic summarisation — where the gap with larger models is less apparent to non-expert evaluators. + +## Cost Per Token at Production Scale + +Cost per 1 million tokens (combined input + output, assuming 70% input / 30% output ratio, at AWS spot prices), cheapest first: + +- Llama 3.3 8B (quantized): $0.08 +- Mistral 7B (quantized): $0.09 +- Gemma 3 7B: $0.10 +- DeepSeek R2 Distill (quantized): $0.11 +- Phi-4 (14B): $0.18 +- Gemma 3 27B: $0.29 +- Falcon 3 40B: $0.38 +- Llama 3.3 70B (quantized): $0.51 +- Qwen 2.5 72B: $0.55 +- Mistral Large 2: $0.67 + +For context, GPT-4o API pricing from OpenAI is approximately $5-10 per million tokens depending on the tier, and Claude's API is similar. On those figures, the cost advantage of self-hosted open models ranges from roughly 7x ($5 against Mistral Large 2 at $0.67) to 125x ($10 against Llama 3.3 8B at $0.08), depending on the model choice and the commercial API being compared. + +This cost differential is the primary driver of enterprise adoption of open-weight models.
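A quick way to sanity-check the cost multiple is to divide the quoted commercial range by the cheapest and most expensive self-hosted prices. The sketch below is illustrative only: it uses the per-million-token figures listed above and the approximate $5-10 commercial range, and the names `SELF_HOSTED_USD_PER_M`, `API_LOW`, `API_HIGH`, and `cost_multiple_range` are invented for this example. It compares raw inference prices and ignores infrastructure and operational overhead.

```python
# Self-hosted cost per 1M tokens (USD), copied from the list in this section.
SELF_HOSTED_USD_PER_M = {
    "Llama 3.3 8B (quantized)": 0.08,
    "Mistral 7B (quantized)": 0.09,
    "Gemma 3 7B": 0.10,
    "DeepSeek R2 Distill (quantized)": 0.11,
    "Phi-4 (14B)": 0.18,
    "Gemma 3 27B": 0.29,
    "Falcon 3 40B": 0.38,
    "Llama 3.3 70B (quantized)": 0.51,
    "Qwen 2.5 72B": 0.55,
    "Mistral Large 2": 0.67,
}

# Approximate commercial API range quoted in the text (USD per 1M tokens).
API_LOW, API_HIGH = 5.0, 10.0


def cost_multiple_range(self_hosted, api_low, api_high):
    """Return (smallest, largest) ratio of API price to self-hosted price."""
    cheapest = min(self_hosted.values())
    priciest = max(self_hosted.values())
    # Smallest multiple: cheapest API tier vs. the priciest open model.
    # Largest multiple: priciest API tier vs. the cheapest open model.
    return api_low / priciest, api_high / cheapest


low, high = cost_multiple_range(SELF_HOSTED_USD_PER_M, API_LOW, API_HIGH)
print(f"{low:.1f}x to {high:.1f}x")
```

With the listed prices this works out to roughly 7.5x at the narrow end and 125x at the wide end. Swapping in your own measured cost per million tokens, including hardware and staffing, gives a more realistic multiple for a specific deployment.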
For high-volume inference — customer support, document processing, code completion at scale — the economics of self-hosting are compelling even accounting for infrastructure and operational costs. + +## Deployment Complexity + +Not all models are equally easy to deploy. We assessed each model on the effort required to go from "we want to run this in production" to "it is running in production": + +**Straightforward deployment (3 days or less for a team with basic MLOps knowledge):** +- Llama 3.3 8B: Excellent documentation, multiple inference server options (vLLM, Ollama, llama.cpp), quantized versions widely available. +- Mistral 7B: Similar to Llama; extensive community support. +- Phi-4: Microsoft's documentation and tooling are good; easy to deploy via Azure ML or self-hosted. + +**Moderate deployment complexity (1-2 weeks):** +- Llama 3.3 70B: Requires multi-GPU setup (2x A100 minimum for quantized). Documentation is good but hardware requirements add complexity. +- Gemma 3 27B: Google's tooling is less mature than Meta's for self-hosting; requires more custom configuration. +- Mistral Large 2: Large model (123B parameters); requires significant hardware and careful quantization setup. + +**More complex (potentially weeks with team knowledge gaps):** +- DeepSeek R2 Distill: Chinese documentation is primary; English translation available but sometimes lags. +- Falcon 3 40B: Less community support than Meta and Google models; fewer ready-made tooling options. +- OLMo 2: Most "truly open" model (training data included), but less polished deployment experience. + +## Enterprise Readiness Assessment + +For enterprises evaluating open-weight models, several characteristics beyond benchmark scores matter: + +**Predictability of outputs** — models that reliably follow structured output instructions, maintain consistent persona, and do not produce unexpected outputs under edge cases. Llama 3.3 and Mistral models score well here. 
+ +**Safety and compliance** — whether the model has meaningful safety tuning and whether it respects access restrictions (system prompts preventing specific outputs). This is an area where all open models lag proprietary models, which have had more investment in RLHF and safety fine-tuning. + +**Data residency and governance** — the primary reason enterprises adopt open-weight models. Self-hosted models offer complete data governance; API-based models do not. + +**Fine-tuning support** — the ability to fine-tune the model on proprietary data to improve domain performance. All tested models support LoRA/QLoRA fine-tuning. + +## The Bottom Line + +The open-weight LLM landscape in February 2026 is substantially better than it was a year ago. The quality gap between open models and frontier proprietary models has narrowed for most common tasks. For enterprises with specific use cases — document processing, domain-specific generation, code assistance — fine-tuned open models often meet or exceed the performance of general-purpose proprietary models at a fraction of the cost. + +For the hardest reasoning tasks — complex mathematical problem solving, multi-step logical deduction, research synthesis — frontier proprietary models still lead. The gap is narrowing but has not closed. + +The practical recommendation depends on use case. For high-volume, domain-specific tasks where cost matters and data governance is required: deploy Llama 3.3 70B or Qwen 2.5 72B with appropriate hardware, invest in fine-tuning. For reasoning-intensive tasks where quality is paramount: the commercial frontier models remain the better choice, with the cost premium justified by capability. + +The bifurcated market is here to stay. + +--- + +*Benchmarks conducted February 2026. 
Full benchmark methodology and raw data available to TechPulse subscribers.* diff --git a/techpulse/posts/2026-03-20-platform-consolidation.md b/techpulse/posts/2026-03-20-platform-consolidation.md new file mode 100644 index 0000000..653099e --- /dev/null +++ b/techpulse/posts/2026-03-20-platform-consolidation.md @@ -0,0 +1,69 @@ +--- +title: "Developer Platform Consolidation: The End of the Microtools Era?" +created: 2026-03-20 10:30 +author: Maya Osei +keywords: developer platforms, GitHub, GitLab, Vercel, AWS, platform consolidation, developer tools market +description: Our developer survey data shows accelerating consolidation in developer tooling. We examine who is winning, who is losing, and what developers actually want from the platforms they use. +--- + +The developer tools market of 2018-2022 was defined by proliferation. The "best-of-breed" philosophy argued that developers should assemble their own tool stacks from specialised components, each class of problem addressed by a dedicated product with deep expertise in that specific area. Version control was GitHub. CI/CD was CircleCI or GitHub Actions. Deployment was Netlify or Vercel. Observability was Datadog. Alerting was PagerDuty. Error tracking was Sentry. Security scanning was Snyk. Each category had a leader, a challenger, and several hopeful entrants. + +The data from our 2026 developer survey suggests that this era is ending — or at least maturing into something different. Consolidation is accelerating. Developers are increasingly using fewer platforms that do more things, and the buying patterns of engineering organisations are shifting toward platforms over point solutions. + +## The GitHub/GitLab/Gitea Market Structure + +In our 2026 survey, 79% of respondents use GitHub as their primary VCS platform, down from 83% in 2025 and 86% in 2023. The decline is gradual but consistent. GitLab's share has held roughly flat at 14-15%. 
Gitea/Forgejo (the open source self-hosted option) has grown from 2% to 5% over the same period, driven primarily by organisations with compliance requirements or political preferences for self-hosted infrastructure. + +The GitHub decline deserves nuance. It has not lost users in absolute terms — the overall developer population is growing, and GitHub continues to grow. What the survey data suggests is that the category that was once essentially synonymous with GitHub now has genuine competition that a meaningful minority of developers prefers. + +The primary driver of GitHub share loss is bundling competition from GitLab and, to a lesser extent, Bitbucket and Gitea. GitLab offers a deeply integrated platform that includes CI/CD, security scanning, container registry, and deployment pipelines without requiring external integrations. For teams that prefer one vendor to many, GitLab's value proposition has strengthened as GitHub's own integrated offerings (Actions, Advanced Security, Copilot) have grown more expensive and complex. + +"We evaluated GitHub versus GitLab last year," one engineering director told us. "GitHub has better individual components — the developer experience on GitHub.com is better. GitLab has better total integration. For our compliance requirements, the single platform story won." + +## Vercel vs. AWS: The Frontend-to-Fullstack Spectrum + +Vercel's growth story over the past five years has been one of the most impressive in developer tools. They built a deployment platform for frontend developers that was substantially simpler to use than AWS, priced accessibly, and built out an ecosystem around Next.js (which they also develop) that created significant switching costs. + +In our 2026 survey, 28% of respondents who deploy web applications use Vercel for at least some production workloads. That is a real number. But Vercel's challenge — and it is one they have been working on publicly — is that their pricing scales badly with production usage. 
The companies we spoke with that have had the most mixed experiences with Vercel describe the same pattern: excellent for small and medium scale, expensive at large scale, and constrained in the degree of infrastructure control available. + +AWS's position is different: it is the default for enterprises and for workloads that require flexibility or scale that managed platforms cannot provide. AWS's developer experience has improved substantially with the growth of services like App Runner, Lambda, and the managed database services, but it remains complex compared to Vercel or Fly.io for straightforward web application deployment. + +The market has not converged on a single answer because the answer is use-case dependent. Vercel is optimal for frontend-heavy applications with predictable traffic patterns. AWS is optimal for complex backend infrastructure with unusual requirements. The developers who are most frustrated are those whose requirements sit in the middle — fullstack applications with backend complexity — and who have to either accept Vercel's constraints or pay AWS's complexity tax. + +## What Developers Actually Want + +We asked our survey respondents to rank the most important characteristics of their primary development platform. The results: + +1. **Reliability** (selected by 81% as among their top three): The number-one concern is whether the platform is up when they need it. Developers have been burned by platform outages and have long memories. + +2. **Pricing predictability** (73%): The second most important concern is whether they can predict what they will pay. Services with usage-based pricing that scales unpredictably are increasingly disliked; several respondents mentioned specific incidents where a traffic spike produced a bill they did not expect. + +3. **DX quality** (69%): Developer experience — documentation quality, CLI ergonomics, local development experience — is the third most important factor. 
This is where newer platforms like Vercel, Fly.io, and Railway have competed successfully with AWS and GCP. + +4. **Integration breadth** (58%): The ability to integrate with the other tools in the developer's stack. + +5. **Vendor independence** (44%): A meaningful minority prioritise platforms that they are not locked into — open source platforms, standards-compliant APIs, or platforms with strong export capabilities. + +Reliability (81%) and pricing predictability (73%) both rank ahead of DX quality (69%), although the margin over pricing predictability is only four points. Developers want platforms that work reliably and predictably first; they want them to be pleasant to use second. The implication for newer platforms is that their DX advantage, while real, is not sufficient to win against incumbents that are significantly more reliable. + +## Winners and Losers in the Consolidation + +The consolidation dynamic creates winners and losers that do not always map to the quality of the product. + +**Winners:** GitHub (despite losing market share, it is consolidating CI/CD, security, and AI tooling on a single platform); AWS (as complexity increases, enterprises default to what they know); GitLab (the integrated platform story is genuinely stronger than it was); Datadog (observability consolidation is ongoing). + +**Losers:** Independent CI/CD services (CircleCI, Buildkite, Jenkins are losing to GitHub Actions and GitLab CI for new workloads); individual open source security scanners (being replaced by bundled GitHub Advanced Security or GitLab security features); niche project management tools (being absorbed by Linear and similar tools that have expanded scope). + +**Uncertain:** Vercel (the platform is genuinely excellent but the pricing model is a persistent concern that limits growth in the enterprise segment); Fly.io (strong developer love but the reliability history creates enterprise hesitation); Railway (growing fast in the small-company segment but limited data on enterprise trajectory).
+ +## What It Means for Developers + +The consolidation trend has real implications for individual developers. Fewer platform choices means less ability to mix-and-match the best tool for each job, but also less cognitive load from maintaining many integrations. The best-of-breed era's promise — that you could assemble an optimal tool stack — was real, but so was its cost: every integration is a surface area for failure, every API key is an operational concern. + +The developers who benefit most from consolidation are those who want to minimise time spent on tooling and maximise time spent on product work. The developers who lose are those with highly specific needs that the consolidated platforms do not address well. + +Neither outcome is obviously better. It is, as ever, a tradeoff. + +--- + +*Maya Osei covers developer platform economics at TechPulse. Survey data from the TechPulse 2026 Developer Survey (n=4,200). Company interviews conducted February-March 2026.* diff --git a/techpulse/posts/2026-05-01-developer-survey-2026.md b/techpulse/posts/2026-05-01-developer-survey-2026.md new file mode 100644 index 0000000..5c1465f --- /dev/null +++ b/techpulse/posts/2026-05-01-developer-survey-2026.md @@ -0,0 +1,106 @@ +--- +title: "TechPulse Developer Survey 2026: AI Has Won, But Developers Have Mixed Feelings" +created: 2026-05-01 09:00 +author: Maya Osei +keywords: developer survey 2026, AI tools, deskilling, developer productivity, programming languages, AI adoption +description: Our 2026 survey of 4,200 developers shows 78% use AI tools daily. The adoption is near-universal among younger developers — but the concerns about deskilling, quality, and dependency are louder than ever. +--- + +The 2026 TechPulse Developer Survey — our fifth annual — is our largest yet: 4,200 respondents from 73 countries, with balanced representation across company sizes, experience levels, and industries. We ran the survey for four weeks in March and April 2026. 
+ +The headline is stark: AI coding tools have reached near-ubiquitous adoption. 78% of respondents use AI coding tools daily, up from 73% in 2024. Among developers with five or fewer years of experience, the number is 91%. Among developers with sixteen or more years of experience, it is 61%. + +The experience gap is telling. Younger developers have grown up with these tools and find it hard to imagine working without them. Experienced developers are more likely to use them selectively and more likely to express ambivalence about what they are doing to the field. + +## The Adoption Numbers + +**Daily AI tool usage by experience:** +- 0-2 years: 94% +- 3-5 years: 89% +- 6-10 years: 79% +- 11-15 years: 67% +- 16+ years: 61% + +**Primary AI tool used:** +- GitHub Copilot (individual or enterprise): 38% +- Cursor: 29% +- JetBrains AI Assistant: 14% +- Claude in IDE/API: 8% +- Custom/self-hosted LLM integration: 11% +- Other: 7% +(Multiple selections allowed, so the percentages sum to more than 100%.) + +Cursor's rise is the most significant tool market shift in the past two years. It now rivals GitHub Copilot for daily use, having grown from 22% in 2024. The growth is driven by the AI-native editor's deeper integration: Cursor provides contextual awareness of entire codebases, not just the current file, and its "ask" features allow natural language interaction with the codebase at a level that Copilot's autocomplete model does not match. + +## Satisfaction vs. Usage: The Widening Gap + +The most striking finding in the 2026 survey is the divergence between usage rates and satisfaction rates — a gap we first noticed emerging in 2025 data and that has widened significantly. + +**Usage vs. satisfaction for AI tools:** +- Daily users: 78% +- "Very satisfied" with AI tools: 31% +- "Somewhat satisfied": 44% +- "Mixed feelings": 18% +- "Primarily dissatisfied": 7% + +"Mixed feelings" increased from 13% in 2025 to 18% in 2026. "Very satisfied" decreased from 38% to 31%.
The primary driver of satisfaction is productivity on certain tasks, where satisfaction is very high; the primary drivers of dissatisfaction are quality concerns and a growing sense of not understanding one's own code, and both are increasing strongly. + +## Concerns About Deskilling + +For the first time, we included specific questions about deskilling — the concern that relying on AI tools may be degrading underlying programming skills. The responses were striking. + +**"Using AI coding tools has reduced my ability to write code without them":** +- Strongly agree: 14% +- Somewhat agree: 31% +- Neither agree nor disagree: 22% +- Somewhat disagree: 24% +- Strongly disagree: 9% + +45% of respondents (14% strongly, 31% somewhat) agree that AI tools have reduced their ability to write code without them. This is a form of dependency that most productivity tools do not produce — using a calculator does not impair arithmetic ability, but many respondents believe AI coding assistance has impaired their coding ability. + +The agreement rate is highest among developers with 3-7 years of experience — the group that adopted AI tools early in their career and has now had them long enough to notice an effect. It is lowest among senior developers who adopted the tools later and more selectively. + +Several open-response comments crystallise the concern: + +*"I started my career writing everything by hand. I can still do it if I must. My colleagues who started two years ago struggle to write a for loop without autocomplete. That is a problem I am not sure the industry is taking seriously."* — Software engineer, 14 years experience + +*"I caught myself googling the syntax for something I should know by heart. I used to know it by heart. Copilot has been doing it for me for 18 months and I've forgotten."* — Backend developer, 6 years experience + +*"I don't think I'm getting deskilled. I think I'm getting reskilled — the skills that matter are changing. 
Understanding code, architecture, and debugging still require the same skills as before. AI does the mechanical writing part. That seems fine to me."* — Principal engineer, 11 years experience + +## Salary Impacts + +We asked a new question in 2026: whether respondents believe AI tools have affected their salary leverage. The results are complicated. + +**Effect of AI tools on salary leverage:** +- Increased leverage (AI makes me more valuable): 29% +- No effect on salary leverage: 43% +- Decreased leverage (AI reduces the premium on individual skills): 28% + +Respondents split almost evenly between "makes me more valuable" (29%) and "makes me less valuable" (28%). Those who feel AI increases their leverage reason that it multiplies the output of skilled developers, making experienced developers who can use AI effectively more valuable. Those who feel AI decreases their leverage reason that it reduces the value of junior programming work, compresses salaries for entry-level positions, and is beginning to reduce the perceived value of individual technical skill. + +Both effects are probably real for different segments of the market. Senior developers who use AI effectively to multiply their output are commanding premiums. Respondents report that entry-level programming positions are harder to find and less well compensated than two years ago. + +## What Developers Wish AI Could Do Better + +We asked respondents what they wish AI coding tools did better. The top five responses (ranked by frequency): + +1. **Better understanding of the existing codebase context** (cited by 64%): The frustration that AI suggestions do not account for project-specific patterns, architecture decisions, and constraints. + +2. **More honest about uncertainty** (51%): The confident-but-wrong failure mode. Respondents want AI tools that say "I'm not sure" rather than generating plausible-sounding incorrect code. + +3. 
**Better debugging assistance** (48%): The gap between AI's ability to generate code and its ability to diagnose problems in existing code remains frustrating. + +4. **Security awareness** (38%): Respondents want AI tools that flag security concerns while generating code, rather than producing code that passes tests but has security issues. + +5. **Better handling of legacy code** (35%): AI tools trained primarily on modern, idiomatic code struggle with legacy codebases, which is where many professional developers spend most of their time. + +## The Burnout Picture + +Burnout rates remain elevated: 39% of respondents describe significant burnout in the past 12 months, a slight increase from 38% in 2024. The causes have shifted: concern about AI's impact on job security has entered the top five causes of developer stress for the first time, alongside the previously dominant factors of meeting load, understaffing, on-call stress, and technical debt pressure. + +The developers least likely to report burnout are those who feel they have agency over their AI tool usage — who use them selectively, understand their outputs, and maintain skills that do not depend on them. The developers most likely to report burnout related to AI are those who feel pressure to adopt tools they have concerns about, and those who feel the pace of AI-driven change in the field is making their existing skills obsolete faster than they can adapt. + +--- + +*Full methodology, raw data, and cross-tabulations available to TechPulse subscribers. Survey conducted March 17 – April 14, 2026. n=4,200 qualified respondents.*