I am approaching AI with caution. Shiny things don't generally excite me.
Just this week I installed cursor, the AI-assisted VSCode-like IDE. I am working on a side project and decided to give it a try.
I am blown away.
I can describe the feature I want built, and it generates changes and additions that get me 90% there, within 15 or so seconds. I take those changes, and carefully review them, as if I was doing a code review of a super-junior programmer. Sometimes when I don't like the approach it took, I ask it to change the code, and it obliges and returns something closer to my vision.
Finally, once it is implemented, I manually test the new functionality. Afterward, I ask it to generate a set of automated test cases. Again, I review them carefully, both from the perspective of correctness, and suitability. It over-tests on things that don't matter and I throw away a part of the code it generates. What stays behind is on-point.
It has sped up my ability to write software and tests tremendously. Since I know what I want, I can describe it well. It generates code quickly, and I can spend my time reviewing and correcting. I don't need to type as much. It turns my abstract ideas into reasonably decent code in record time.
Another example. I wanted to instrument my app with Posthog events. First, I went through the code and added "# TODO add Posthog event" in all the places I wanted to record events. Next, I asked cursor to add the instrumentation code in those places. With some manual copy-and-pasting and lots of small edits, I instrumented a small app in <10 minutes.
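The marker step described above can be sketched as a small script that lists every place still waiting for instrumentation. This is only an illustration: the marker text comes from the post, but the function name `find_markers` and the choice to scan `*.py` files are assumptions, and nothing here calls the Posthog API.

```python
from pathlib import Path

MARKER = "# TODO add Posthog event"  # marker text taken from the post


def find_markers(root, marker=MARKER):
    """Return (path, line_number, line_text) for each line containing the marker."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):  # assumption: a Python codebase
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if marker in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling `find_markers(".")` before and after the AI pass gives a quick checklist of which spots were handled and which were missed.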
We are at the point where AI writes code for us and we can blindly accept it. We are at a point where AI can take care of a lot of the dreary busy typing work.
I sincerely worry about a future when most people act in this same manner.
You have - for now - sufficient experience and understanding to be able to review the AI's code and decide if it was doing what you wanted it to. But what about when you've spent months just blindly accepting what the AI tells you? Are you going to be familiar enough with the project anymore to catch its little mistakes? Or worse, what about the new generation of coders who are growing up with these tools, who NEVER had the expertise required to be able to evaluate AI-generated code, because they never had to learn it, never had to truly internalize it?
In the article, I describe a less glowing experience with coding tools than it sounds like you've had, but I'm also envisioning a more complex use case: getting into the meat of some you-specific business logic it hasn't seen, not common code it's been exposed to thousands of times. That's where it tends to fall apart the most, in ways that are hard to detect and with serious consequences. If you haven't run into that yet, I'd be interested to know if you do some day. (And also to know if you don't, though, to be honest! Strong opinions, loosely held, and all that.)
If we keep at this LLM-does-all-our-hard-work for us, we’re going to end up with some kind of Warhammer 40k tech-priest-blessing-the-magic-machines level of understanding, where nobody actually understands anything, and we’re technologically stunted, but hey at least we don’t have the warp to contend with and some shareholders got rich at our expense.
You and I seem to live in very different worlds. The one I live and work in is full of over confident devs that have no actual IT education and mostly just copy and modify what they find on the internet. The average level of IT people I see daily is down right shocking and I'm quite confident that OP's workflow might be better for these people in the long run.
It's going to be very funny in the next few years when Accenture et al charge the government billions for a simple Java crud website thing that's entirely GPT-generated, and it'll still take 3 years and not be functional. Ironically, it'll be of better quality than they'd deliver otherwise.
> The one I live and work in is full of over confident devs that have no actual IT education and mostly just copy and modify what they find on the internet.
Too many get into the field solely due to promises of large paychecks, not due to the intellectual curiosity that drives real devs.
I actually do think this is a legitimate concern, but at the same time I feel like when higher-level languages were introduced people likely experienced a similar dilemma: you just let the compiler generate the code for you without actually knowing what you're running on the CPU?
Definitely something to tread carefully with, but it's also likely an inevitable aspect of progressing software development capabilities.
Place and routing compilers used in semiconductor design are not. Ironically, simulated annealing is the typical mechanism and is by any appropriate definition, imo, a type of AI.
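For readers unfamiliar with the technique named above, here is a minimal sketch of simulated annealing on a toy placement problem. It is not how any real place-and-route tool works: the one-dimensional layout, the `wire_length` cost, the linear cooling schedule, and every name are assumptions made for illustration.

```python
import math
import random


def wire_length(order, nets):
    """Total distance between connected cells when cells sit in a single row."""
    position = {cell: i for i, cell in enumerate(order)}
    return sum(abs(position[a] - position[b]) for a, b in nets)


def anneal(cells, nets, steps=5000, start_temp=5.0, seed=0):
    """Simulated annealing over cell orderings; returns the best ordering seen."""
    rng = random.Random(seed)  # seeded so a given run is repeatable
    current = list(cells)
    current_cost = wire_length(current, nets)
    best, best_cost = current[:], current_cost
    for step in range(steps):
        temp = start_temp * (1 - step / steps) + 1e-9  # linear cooling
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]  # propose a swap
        new_cost = wire_length(current, nets)
        delta = new_cost - current_cost
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / temp), which shrinks as temp falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current_cost = new_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        else:
            current[i], current[j] = current[j], current[i]  # undo the swap
    return best, best_cost
```

The random acceptance of worse moves is what makes the search non-deterministic unless the seed is fixed, which is the property the comment is pointing at.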
Whatever you do in your life using devices that run software is proof that these tools are effective for continuing to scale complexity.
Annoying to use also ;)
I take it you haven't seen the world of HTML cleaners [1]?
The concept of glueing together text until it has the correct appearance isn't new to software. The scale at which it's happening is certainly increasing but we already had plenty of problems from the existing system. Kansas certainly didn't develop their website [2] using an LLM.
IMO, the real problem with software is the lack of a warranty. It really shouldn't matter how the software is made, just the qualities it has. But without a warranty it does matter, because how it's made affects the qualities it has and you want the software to actually work even if it's not promised to.
> I take it you haven't seen the world of HTML cleaners [1]?
Are you seriously comparing deterministic code formatters to nondeterministic LLMs? This isn't just a change of scale because it is qualitatively different.
> Kansas certainly didn't develop their website [2] using an LLM.
Just because the software industry has a problem with incompetence doesn't mean we should be reaching for a tool that regularly hallucinates nonsense.
> IMO, the real problem with software is the lack of a warranty.
You will never get a warranty from an LLM because it is inherently nondeterministic. This is actually a fantastic argument _not_ to use LLMs for anything important including generating program text for software.
> It really shouldn't matter how the software is made
It does matter regardless of warranty or the qualities of the software because programs ought to be written to be read by humans first and machines second if you care about maintaining them. Until we create a tool that actually understands things, we will have to grapple with the problem of maintaining software that is written and read by humans.
This seems a little silly to me. It was already possible for a script kiddie to kludge together something they didn’t understand: copying code snippets from stack overflow, etc. And yet, developers continue to write finely crafted code that they understand at depth. Making this process easier for the script kiddies doesn’t prevent experts from existing and the market from realizing these experts are necessary to a well run software business.
Nothing prevents you from asking an LLM to explain a snippet of code. And then ask it to explain deeper. And then finally doing some quick googling to validate the answers seem correct.
Blindly accepting code used to happen all the time, people copy pasted from stack overflow.
Yes, but copy/paste from stack overflow was a meme that was discouraged. Now we've got people proudly proclaiming they haven't written a line of code in months because AI does everything for them.
>And then finally doing some quick googling to validate the answers seem correct.
There will come a time when there won't be anyone writing information to check against. It'll be AI all the way down. Or at least it will be difficult to discern what's AI or what isn't.
And this is the major problem. People will blindly trust the output of AI because it appears to be amazing, this is how mistakes slip in. It might not be a big deal with the app you're working on, but in a banking app or medical equipment this can have a huge impact.
I feel like I’m being gaslit about these AI code tools. I’ve got the paid copilot through work and I’ve just about never had it do anything useful ever.
I’m working on a reasonably large rails app and it can’t seem to answer any questions about anything, or even auto fill the names of methods defined in the app. Instead it just makes up names that seem plausible. It’s literally worse than the built in auto suggestions of vs code, because at least those are confirmed to be real names from the code.
Maybe these tools work well on a blank project where you are building basic login forms or something. But certainly not on an established code base.
I'm in the same boat. I've tried a few of these tools and the output's generally been terrible to useless big and small. It's made up plausible-sounding but non-existent methods on the popular framework we use, something which it should have plenty of context and examples on.
Dealing with the output is about the same as dealing with a code review for an extremely junior employee... who didn't even run and verify their code was functional before sending it for a code review.
Except here's the problem. Even for intermediate developers, I'm essentially always in a situation where the process of explaining the problem, providing feedback on a potential solution, answering questions, reviewing code and providing feedback, etc takes more time out of my day than it would for me to just _write the damn code myself_.
And it's much more difficult for me to explain the solution in English than in code--I basically already have the code in my head, now I'm going through a translation step to turn it into English.
All adding AI has done is take the part of my job that is "think about problem, come up with solution, type code in" and make it into something with way more steps, all of which are lossy as far as translating my original intent to working code.
I get we all have different experiences and all that, but as I said... same boat. From _my_ experiences this is so far from useful that hearing people rant and rave about the productivity gains makes me feel like an insane person. I can't even _fathom_ how this would be helpful. How can I not be seeing it?
The biggest lie in all of LLMs is that they’ll work out of the box and you don’t need to take time to learn them.
I find Copilot autocomplete invaluable as a productivity boost, but that’s because I’ve now spent over two years learning how to best use it!
“And it's much more difficult for me to explain the solution in English than in code--I basically already have the code in my head, now I'm going through a translation step to turn it into English.”
If that’s the case, don’t prompt them in English. Prompt them in code (or pseudo-code) and get them to turn that into code that’s more likely to be finished and working.
I do that all the time: many of my LLM prompts are the signature of a function or a half-written piece of code where I add “finish this” at the end.
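As a rough illustration of what such a prompt might look like, here is a sketch that pairs a half-written function with a short instruction. The `median` stub and the `build_prompt` helper are made up for this example; they are not taken from any tool's API, and the resulting text would simply be pasted into whichever chat or editor you use.

```python
# A half-written function used as the prompt body (hypothetical example).
STUB = '''\
def median(values: list[float]) -> float:
    """Return the median of a non-empty list without modifying it."""
    ordered = sorted(values)
'''


def build_prompt(stub, instruction="finish this"):
    """Join a piece of unfinished code with a short instruction at the end."""
    return f"{stub.rstrip()}\n\n{instruction}\n"


prompt = build_prompt(STUB)
```

The signature, type hints, and docstring carry most of the intent, so the English part of the prompt can stay very short.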
You bring up a good point! These tools are useless if you can't prompt them effectively.
I am decent at explaining what I want in English. I have coded and managed developers for long enough to include tips on how I want something implemented. So far, I am nothing short of amazed. The tools are nowhere near perfect, but they do provide a non-trivial boost in my productivity. I feel like I did when I first used an IDE.
> Except here's the problem. Even for intermediate developers, I'm essentially always in a situation where the process of explaining the problem, providing feedback on a potential solution, answering questions, reviewing code and providing feedback, etc takes more time out of my day than it would for me to just _write the damn code myself_.
Exactly. And I’ve been telling myself “keep doing that, it lets them teach, otherwise they will never level up and be able to comfortably and reliably work on this codebase without much hand holding. This will pay off”. Which I still think is true to a degree, although less so with every year.
At least with the humans I work with it’s _possible_ and I can occasionally find some evidence that it _could_ be true to hang on to. I’m expending extra effort, but I’m helping another human being and _maybe_ eventually making my own life easier.
What’s the payoff for doing this with an LLM? Even if it can learn, why not let someone else do it and try again next year and see if it’s leveled up yet?
For me, AI is super helpful with one-off scripts, which I happen to write quite often when doing research. Just yesterday, I had to check my assumptions are true about a certain aspect of our live system and all I had was a large file which had to be parsed. I asked ChatGPT to write a script which parses the data and presents it in a certain way. I don't trust ChatGPT 100%, so I reviewed the script and checked it returned correct outputs on a subset of data. It's something which I'd do to the script anyway if I wrote it myself, but it saved me like 20 minutes of typing and debugging the code. I was in a hurry because we had an incident that had to be resolved as soon as possible. I haven't tried it on proper codebases (and I think it's just not possible at this moment) but for quick scripts which automate research in an ad hoc manner, it's been super useful for me.
Another case is prototyping. A few weeks ago I made a prototype to show to the stakeholders, and it was generally way faster than if I wrote it myself.
It’s writing most of my code now. Even if it’s existing code you can feed in the 1-2 files in question and iterate on them. Works quite well as long as you break it down a bit.
It’s not gaslighting: the latest versions of GPT, Claude, and Llama have gotten quite good.
These tools must be absolutely massively better than whatever Microsoft has then because I’ve found that GitHub copilot provides negative value, I’d be more productive just turning it off rather than auditing its incorrect answers hoping one day it’s as good as people market it as.
> These tools must be absolutely massively better than whatever Microsoft has then
I haven't used anything from Microsoft (including Copilot) so not sure how it compares, but compared to any local model I've been able to load, and various other remote 3rd party ones (like Claude), no one comes near to GPT4 from OpenAI, especially for coding. Maybe give that a try if you can.
It still produces overly verbose code and doesn't really think about structure well (kind of like a junior programmer), but with good prompting you can kind of address that somewhat.
Probably these services are so tuned (not as in "fine-tuned" ML style) to each individual user that it's hard to get any sort of collective sense of what works and what doesn't. Not having any transparency whatsoever into how they tune the model for individual users doesn't help either.
My employer blocks ChatGPT at work and we are forced to use Copilot. It's trash. I use Google docs to communicate with GPT on my personal device. GPT is so much better. Copilot reminds me of GPT3. Plausible, but wrong all the time. GPT 4o and o1 are pretty much bang on most of the time.
My experience is anecdotal, based on a sample size of one. I'm not writing to convince, but to share. Please take a look at my resume to see my background, so you can weigh what I write.
I tried cursor because a technically-minded product manager colleague of mine managed to build a damned solid MVP of an AI chat agent with it. He is not a programmer, but knows enough to kick the can until things work. I figured if it worked for him, I might invest an hour of my time to check it out.
I went in with a one-hour time box to install cursor and implement a single trivial feature. My app is not very sophisticated - mostly a bunch of setup flows and CRUD. However, there are some non-trivial things which I would expect to have documented in a wiki if I was building this with a team.
Cursor did really well. It generated code that was close to working. It figured out those not-obvious bits as well and the changes it made kept them in mind. This is something I would not expect from a junior dev, had I not explained those cross-dependencies to them (mostly keeping state synchronized according to business rule across different entities).
It did a poor job of applying those changes to my files. It would not add the code it generated in the right places and would mess things up along the way. I felt I was wrestling with it a bit too much for my liking. But once I figured this out I started hand-applying its changes and reviewing them as I incorporated them into my code. This workflow was beautiful.
It was as if I sent a one paragraph description of the change I want, and received a text file with code snippets and instructions where to apply them.
I ended up spending four hours with cursor and giving it more and more sophisticated changes and larger features to implement. This is the first AI tool I tried where I gave it access to my codebase. I picked cursor because I've heard mixed reviews about others, and my time is valuable. It did not disappoint.
I can imagine it will trip up on a larger codebase. These tools are really young still. I don't know about other AI tools, and am planning on giving them a whirl in the near future.
That sounds almost like the complete opposite of my experience and I'm also working in a big Rails app. I wonder how our experiences can be so diametrically different.
What kind of things are you using it for? I’ve tried asking it things about the app and it only gives me generic answers that could apply to any app. I’ve tried asking it why certain things changed after a rails update and it gives me generic troubleshooting advice that could apply to anything. I’ve tried getting it to generate tests and it makes up names for things or generally gets it wrong.
OP here. I am explicitly NOT blindly trusting the output of the AI. I am treating it as a suspicious set of code written by an inexperienced developer. Doing full code review on it.
What you are saying will occasionally happen, but mistakes already happen today.
Standards for quality, client expectations, competition for market share, all those are not going to go down just because there's a new tool that helps in creating software.
New tools bring with them new ways to make errors, it's always been that way and the world hasn't ended yet...
I was in the newspaper field a year or two before desktop publishing took off, then a few years into that evolution. Rooms full of people and Linotype/Compugraphic equipment were replaced by one Mac and a printer.
I shot film cameras for years, and we had a darkroom, darkroom staff, and a film/proofsheet/print workflow. One digital camera later and that was all gone.
Before me publications were produced with hot lead.
Not much more than reviewing the code of any average dev who doesn't bother doing their due diligence. At least with an AI I immediately get an answer with "Oh yes, you're right, sorry for the oversight" and a fix. Instead of some bullshit explanation to try to convince me that their crappy code is following the specs and has no issues.
That said, I'm deeply saddened by the fact that I won't be passing on a craft I spent two decades refining.
I think there are two types of developers: those who are most excited about building things, and those who are most excited about the craft of programming.
If I can build things faster, then I'm happy to spend most of my time reviewing AI code. That doesn't mean that I never write code. Some things the AI is worse at, or need to be exactly right and it's faster to do them manually.
> I think there are two types of developers: those who are most excited about building things, and those who are most excited about the craft of programming.
Love this. You hit the nail right on the head.
I don't know if I fit into one or the other. However, I do know that at times I feel like one, and at other times, the other.
If I am writing another new app and need to build a slew of CRUD code, I don't care about the craft. I mean, I don't want sloppy code, but I do not get joy out of writing what is _almost_ boilerplate. I still want it to reflect my style, but I don't want to type it all out. I already know how it all works in my head. The faster I get it into an IDE the better. Cursor (the AI IDE) allowed me to do this much faster than I would have by hand.
Then there are times when I do want to craft something beautiful. I had one part of this project where I needed to build a scheduler and I had very specific things I wanted it to do. I tried twice to describe what I wanted, but the AI tool did not do it. It built a working piece of code, but I could not get it to grasp the nuance.
I sat down and wrote the code for the scheduler, but then had to deal with a bunch of edge cases. I took this code, gave it to the AI and told it to implement those edge cases. After reviewing and iterating on it, I had exactly what I wanted.
I think we could see a lot of these AI code tools start to pivot towards product folks for just this reason. They aren't meant for the people who find craft in what they do.
That's essentially what many hands-on engineering managers or staff engineers do today. They spend significant portions of their day reviewing code from more junior team members.
Reviewing and modifying code is more engaging than typing out the solution that is fully formed in my head. If the AI creates something close to what I have in my head from the description I gave it, I can work with it to get it even closer. I can also hand-edit it.
"I say your civilization, because as soon as we started thinking for you it really became our civilization, which is of course what this is all about." - Agent Smith
"Once men turned their thinking over to machines in the hope that this would set them free. But that only permitted other men with machines to enslave them." - Dune
But I think that quote is a pretty gross mischaracterization of the parent comment.
I similarly am a big fan of Cursor. But I don't "turn [my] thinking over to machines". Even though I review every piece of code it generates and make sure I understand it, it still saves me a ton of time. Heck, some of the most value I get from Cursor isn't even it generating code for me, it's getting to ask questions about a very large codebase with many maintainers where I'm unfamiliar with large chunks. E.g. asking questions like "I would like to do X, are there any places in this codebase that already do this?"
I'm also skeptical of LLMs ever being able to live up to their hype ("AGI is coming sooooon!!!!"), but I still find them to be useful tools in context that can save me a lot of time.
I use it for simple tasks where spotting a mistake is easy. Like writing a language binding for a REST API. It's a bunch of methods that look very similar, with simple bodies. But it saves quite some work.
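To show the kind of code meant here, a minimal sketch of such a binding follows. Everything in it is hypothetical: the `ExampleClient` class, the `/users` endpoints, and the base URL are invented, and `_request` only builds the method and URL rather than sending anything.

```python
from urllib.parse import quote, urlencode


class ExampleClient:
    """Hypothetical binding: each method maps to one HTTP verb and path."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")

    def _request(self, method, path, **params):
        # Build the request description only; a real binding would send it.
        url = f"{self.base_url}{path}"
        if params:
            url += "?" + urlencode(params)
        return method, url

    def list_users(self, page=1):
        return self._request("GET", "/users", page=page)

    def get_user(self, user_id):
        return self._request("GET", f"/users/{quote(str(user_id), safe='')}")

    def delete_user(self, user_id):
        return self._request("DELETE", f"/users/{quote(str(user_id), safe='')}")
```

Because every method has the same shape, a wrong verb or path in a generated method stands out quickly during review.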
Or getting keywords to read about from a field I know nothing about, like caching with zfs. Now I know what things to put in google to learn more to get to articles like this one
https://klarasystems.com/articles/openzfs-all-about-l2arc/ which for some reason doesn't appear in top google results for "zfs caching" for me
If you are another "waterboy" doing crud applications, the problem has been solved a long time ago.
What I mean by that is, the "waterboy" (crud "developer") is going to fetch the water (sql query in the database), then bring the water (Clown Bob layer) to the UI...
The size of your Clown Bob layer may vary from one company to another...
This has been solved a long time ago. It has been a well-paid clerk job that is about to come to an end.
If you are doing pretty much anything else, the AI is pathetically incapable of doing any piece of code that makes sense.
Another great example, yesterday, I wanted to know if VanillaOs was using systemD or not. I did scroll through their frontpage but I didn't see anything, so I tried the AI Chat from duckduckgo. This is a frontend for AI chatbots that includes ChatGPT, Llama, Claude and another one...
I started my question with: "can you tell me if VanillaOS is using runit as the init system?"... I initially wanted to ask if it was using systemd, but I didn't want to _suggest_ systemd at first.
And of course, all of them told me: "Yeah!! It's using runit!".
Then for all of them I replied, without any facts in hand: "but why on their website they are mentioning to use systemctl to manage the services then?".
And... of course! All of them answered: "Ooouppsss, my mistake, VanillaOS uses systemD, blablabla"....
So at the end, I still don't know which init VanillaOS is using.
If you are trusting the AI as you seem to do, I wish you the best of luck my friend... I just hope you will realize the damage you are doing to yourself by "stopping" coding and letting something else do the job. That skill, my friend, is easily lost with time; don't let it evaporate from your brain for some vaporware people are trying to sell you.
> We are at the point where AI writes code for us and we can blindly accept it.
I’m waiting for the day we’ll get the first major breach because someone did exactly that. This is not a case of “if”, it is very much a “when”. I’ve seen enough buggy LLM-generated code and enough people blindly accepting it to be confident in that assertion.
This would indeed be the best way around. The code reviews might even be better - currently, there's little time for them and we often have only one person in the team with much knowledge in the relevant language/framework/application, so reviews are often just "looks OK to me".
It's not quite the same, but I'm reminded of seeing a documentary decades ago which (IIRC) mentioned that a factor in air accidents had been the autopilot flying the plane and human pilots monitoring it. Having humans fly and the computer warn them of potential issues was apparently safer.
> Now, if you could switch it around so that I write the code, and the AI reviews it, that would be something.
I'm sort of doing that. I'm working on a personal project in a new language and asking Claude for help debugging and refactoring. Also, when I don't know how to create a feature, I might ask it to do so for me, but I might instead ask it for hints and an overview so I can enjoy working out the code myself.