11
Mechanics and Construction / Re: My Advanced Realistic Humanoid Robot Project
« Last post by artbyrobot1 on February 18, 2026, 01:54:11 AM »So I was randomly watching YouTube recently and saw a George Hotz past broadcast entitled "George Hotz | Programming | Welcome to Gas Town and the future of Computer Use | Agentic AI | Part 1". It kind of shocked me to see this. I was under the impression for some years now that vibecoding was for total noob programmers who were incompetent and horrible coders that were happy to throw together some absolute trash bug filled code with LLM help and call it done. And that were they ever to need to go back and fix the code and remove bugs, it would be more trouble than it was worth due to the horrible code the LLM output. That it would be easier to rewrite it all from scratch at that point than to try to fix its bugs. I'd been watching this space closely though. That was the consensus I thought was at hand. But then this video threw a monkey wrench into that consensus. Here, a world renowned god-tier coder in George Hotz was using agentic AI and saying its the future of coding. Kind of blew my mind. I then deep dove into agentic AI, how agents work, what is the agentic loop, what are agent tools, what are agent skills, what is good vibe coding practice, what are good vs bad vibe coding methodologies, how much does it cost, etc etc. I discovered something called Claude Code and Codex and OpenCode and OpenClaw and people raving about these tools or warning about them and making fun of them. It is a incredibly polarizing topic with people adamantly for and against it all. I even listened to a 11.5 hour long audiobook called Vibe Coding Audiobook by Gene Kim, Steve Yegge. It was a huge deep dive for me.
In the end I drew several conclusions or hypothesis and chose the following stances: I think full blown vibe coding infrastructure low level important lengthy and complex coding projects is not a good idea now and maybe ever. But I think LLM assisted coding can do it. Vibe coding where you don't even really look at the code at all and trust the LLM and agentic AI using the LLM completely is probably only doable for very simple things and will be buggy and ugly code in many ways. Not useful for 99% of my goals. Maybe that can work for front-end web UIs but nothing intensive and low level like computer vision, game engines, AI dev, making an OS, making speech synthesis or speech recognition engines, SLAM, etc etc. It would be horrible for that IMO.
So for my goals, how can I best use LLM assisted coding? Well, my chosen LLM at this time for this is Chatgpt. How will I use it? Well I can use it to co-develop plans and algorithm workflows for modules or sections of code one step at a time. Working through it all. Then once that is done and saved, we can go step by step together writing the code or editing existing code as needed to bring about successful steps one at a time and test it thoroughly as we go. Now in some ways I had already been asking Chatgpt some questions while I code and getting SOME coding help as I code when I used it a few years back as a stack exchange substitute or ask a friend type assistant. At that time I didn't have it producing much code for me though. You can see videos of me coding this way on my YouTube robotics playlist. But I am now planning to take LLM use for coding assistance to a whole other level in a novel way I came up with. So on my IDE of choice, Dev C++ 4.9.9.2, there are no copilot AIs or w/e to code along with you and those are mostly autocomplete AIs anyways - they don't code for you. So that's not what I want. I also don't want to use any other IDEs for various reasons we won't go into here. So what I decided to do is just make my web browser full-screen and stay on the chatgpt chat interface THE WHOLE TIME I am coding. No alt tabbing, no terminal uses, no IDEs. Just stay on chatgpt web browser and that's it. For the most part. That's kind of the aim. So how can I code like this and test the code and whatnot then? Well basically chatgpt and I came up with the idea of creating a program that we called "The Orchestrator" that would run on my PC and read the chat I'm having with chatgpt and read the code that chatgpt writes or wants to change in my codebase and it would then validate that code change request or code snippet addition, make sure it fits our desired formatting and coding style, make sure it doesn't break any rule or screw up anything, possibly query another LLM AI with a submission of that code change and a copy of the surrounding code module it is changing and ask if it will break anything, if it will do what it claims it will do, etc etc etc. And after a extensive series of validation steps, the Orchestrator will either report back that the code was faulty with a reason why it was faulty or it will determine the code was good and passed checks. It will report back via a popup window message and/or possibly optionally text to speech so I hear it talking to me verbally which might be kind of cool. It would populate my clipboard of my mouse and tell me to paste its response into chatgpt. Which I would then do and hit enter to submit the response. Chatgpt would then see what it screwed up and rewrite it after apologizing or w/e. This would rinse repeat until the Orchestrator approved the code. Upon approving it, it would bring up a popup window UI that would act as a IDE containing my codebase file we are editing, the ability to scroll up and down to read my code, etc. It would highlight in green the new code chatgpt was recommending we add within my codebase, giving a green background there. Or if it was a code change chatgpt had written, it would split the screen of that section into two halves. On the left half would be the original code being modified. On the right half would be the changed code proposal with the actual changes highlighted with a red background. I would read both at that point, make any modifications I wanted right there in that window, and hit ok if I approved. So then my job besides co-planning with chatgpt and talking back and forth till I like the plan with the code steps is to read and approve all code changes before they go in and make any final tweaks. Ideally I no longer actually write the code, I just help plan and submit approvals. I have to understand the codebase deeply the whole time, I give all approvals, and I at no point let the LLM just go haywire on my codebase. It's relative to full automation with a local AI agent a slow and methodical process where I retain full understanding and control the whole time. This I feel comfortable with. And this setup I described is my planning coding workflow. It does not have me using the terminal nor an IDE. Its like a custom kind of weird LLM chat and popup windows weird IDE kind of "thing".
Advantages of this approach is much less brainpower needed looking up APIs to find what I want to use or manually typing all code. This also saves on keystrokes which saves my fingers from overworking and prevents any risk of carpal tunnel and whatnot. It should greatly speed up my coding development compared to manual typing especially once all the features I want the Orchestrator to have are all working.
Note: when a new session starts, I will have a big context dump notify chatgpt of its role and of our plans and short-term and long term goals and keyword trigger commands it can give the Orchestrator to get things rolling and have it go out and find files or directory structures etc from the codebase to help chatgpt navigate the codebase. Together with Chatgpt the Orchestrator will give a context snapshot to catch chatgpt up to date on what the code does so far for the module we are working on, what steps were done recently, what next steps are, etc.
Note: as of right now the way chatgpt talks to the Orchestrator is by me manually copying the browser text output of chatgpt. The Orchestrator polls the mouse clipboard to read these text inputs and parses them and acts on them accordingly. But soon I hope to develop a way for the Orchestrator to scrape the DOM of the browser to read the output of chatgpt directly or create a browser extension that outputs that as a file or as a web socket communication to the Orchestrator or perhaps eventually even have the Orchestrator just use computer vision to literally read the pixels of the screen to read what chatgpt is saying and get its input that way just as my own eyes do. Not sure what direction I'm going to go on that front quite yet but that's coming soon I feel. That will make it even easier to use.
Note: unlike most local AI agents that have to use a local LLM (costs alot to have a good one as far as hardware which is super inflated right now as well in cost) or use a cloud LLM API (charge based on tokens used or a monthly subscription - heavy use gets extremely expensive with guys like Yegge saying they are spending like $3k/mth on tokens WOW!), I chose to just use free Chatgpt as my LLM hookup. And that's good enough. But going that route, not using an API, I can't brute force problems as fast or as parallelized as some AI agents are doing but that is fine because I'm not having AI write my entire program and beat its head against a wall trying to fix its own bugs for ages and work through its own spaghetti code making stupid mistakes but eventually after a million tokens are used it finds its way like a blind man reaching around bumping into stuff in the dark, I don't need big brute forcing like those full automated AI agents are doing. So I don't need massive token use and fast calls, etc. Since I am myself co-coding, reading everything, preventing any bad code changes from making it through into my codebase, I don't need the brute forcing stuff then. And my usage can still fall into normal expected use category and not get rate limited or get me banned. I'm still the only one interacting with my mouse and keyboard into chatgpt. All inputs are manually done by a human user. So yeah, with my system, the LLM co-piloting is FREE which is important for me on my tight budget.
So anyways, chatgpt and I developed this system of using this Orchestrator as a 3rd copilot. Chatgpt is my copilot and the Orchestrator is our 3rd copilot. The Orchestrator extends Chatgpt's abilities to leave the browser and do stuff online or with my file system etc as its assistant just like local AI agents do. So it is loosely inspired by the same concept as a local AI agent. I am excited about this tool/workflow. I think it will vastly speed up my overall developer speed while still giving me fine grained control. So the downsides of automated code production associated with vibecoding don't really apply here since I'm not taking myself out of the loop so much like people warn about.
So to make this Orchestrator, chatgpt highly recommended we use Python. I've never coded in Python before and was under the impression its much slower than C/C++ so no good for anything in my robotics projects. However, since this is basically just a sort of weird LLM assisted coding assistant thing we are making, I figured it's probably fine for that use case. So I gave in and downloaded an old version Python that runs on my machine. Within a few days I, with chatgpt's help, got Python installed and working and learned how to use a text file as my Python code which I edit in notepad on windows. I use a .bat file to launch that orchestrator.py file and it launches and works. We've got it polling the clipboard and taking action when valid clipboard changes come in, doing some simple things chatgpt tells it to do successfully. I am also working on implementing a boot sequence thing where chatgpt commands it to run a boot sequence which has it open a window where it asks me to select what coding project I want to work on today from a list we store in a txt file and I press a letter on my keyboard to select a coding project from that list. It then pulls up a .txt context file about that coding project from the Orchestrator directory containing these project specific context files and it puts that file plus a list of trigger keywords and instructions chatgpt can use into the mouse clipboard and tells me to paste in the context. I then paste that in and chatgpt is told to basically use a series of command trigger words and commands to essentially open up my codebase of that particular coding project, explore the files, explore the directory, explore the various sections of code, and find out where we left off and see what the next steps are on the coding to do list for that project if any. If none, it will work with me to create the next steps of a todo list for that code project. We then can start to code it.
So far the orchestrator.py is about 500 lines of Python code. It's all bug free and working great. I took the time to carefully read it all and understand it all and now carefully read and understand every proposed change chatgpt writes for it. Only then do I add it and test it. But eventually, my aim is to have the Orchestrator do the reading of the original code and the proposed changes or additions and validate and filter it before I personally read and approve change or addition requests. The idea there is that my brainpower is limited and I only have so much fuel in the tank mentally for a given session. So the Orchestrator will see if chatgpt is doing something clearly dumb in its code change or addition request and catch that and have me paste the problem back to chatgpt by just bringing a popup window saying "paste this to chatgpt" and I just paste it. No thinking needed. And chatgpt corrects itself and tries again. Only once Orchestrator approves do I get a popup window showing the code change and old code and highlighted changes or updates text for me to read carefully and approve or reject. By taking my brain out of the loop for the dumb mistakes, the Orchestrator shields me from petty code corrections mostly and filters out the obvious dumb LLM mistakes so I don't have to. This saves me frustration and brainpower so I only see the good stuff that makes it through the filter. That makes the system more friction-less and leaves less waste of my time nonsense to deal with so I don't get so frustrated with dumb LLM code outputs. And the idea to possibly send it off to Gemini or w/e with their API free tier usage to verify it if it was a complicated change the Orchestrator wants a second opinion on should be icing on the cake and all of this happens behind the scenes while I wait.
I will say this: for me to get as far as I did with the Orchestrator in just a few days as I did is a marvel for me. I am not that fast but with Chatgpt's guidance I am so much faster. I found a 5.5 hour tutorial on how to install Python, get it going in a IDE, create projects, etc etc and watched some of it and realized I didn't need to do any of that. Chatgpt had me up and running Python in under an hour as someone who never even used it before and we were already having working code within the hour. It was amazing. Super fast for me. Chatgpt still makes dumb mistakes and bad code often, but with the Orchestrator and its guardrails and a tightly designed context window and guidance, plus all the filtration and validation steps, this can be a amazing semi-automated coding workflow.
So that's where I'm at. It's exciting stuff. This doesn't mean I'm stopping the electronics development. I plan to do this stuff in parallel with that so updates will still come on that. But the whole wait to code AT ALL until all electronics are done I just could not stick to anymore. It's been like 3-4 years since I last worked on coding for the robot project and I just couldn't hold off anymore. I think coding some then working on electronics some then coding some is fine. A bit all over the place but so be it. I got caught up in the AI assisted coding hype and just couldn't help myself jumping in and figuring out how I can best use AI to help my coding adventures in a way that fits my needs best. I couldn't wait anymore. PLUS, if I spend many more years NOT coding, I'd start to get somewhat concerned that what I already coded and learned and still remember would start to fade more and more from memory and I'd basically have no clue of what I was trying to do if I wait too long. I wanted to get back into my code before I have no mental architecture skeleton left in memory which would make things harder for me to get back into it. It's a lot of things to keep in my mind for so long without actively spending time in the codebase. So yeah that's my other excuse for why I'm going back into it "prematurely."
Note: my first impressions of Python were very, very good. I was amazed at how much it achieved by so little lines of code. Its 500 lines of code it has so far would have taken like 3,000 lines of C/C++ code I feel. It is crazy. It's very intuitive to use. I already just get it. Although chatgpt is doing the heavy lifting so I'm not actually typing it out myself. But it doesn't matter as long as I understand what it's all doing and how then I can keep its architecture clear and consistent and avoid bugs.
In the end I drew several conclusions or hypothesis and chose the following stances: I think full blown vibe coding infrastructure low level important lengthy and complex coding projects is not a good idea now and maybe ever. But I think LLM assisted coding can do it. Vibe coding where you don't even really look at the code at all and trust the LLM and agentic AI using the LLM completely is probably only doable for very simple things and will be buggy and ugly code in many ways. Not useful for 99% of my goals. Maybe that can work for front-end web UIs but nothing intensive and low level like computer vision, game engines, AI dev, making an OS, making speech synthesis or speech recognition engines, SLAM, etc etc. It would be horrible for that IMO.
So for my goals, how can I best use LLM assisted coding? Well, my chosen LLM at this time for this is Chatgpt. How will I use it? Well I can use it to co-develop plans and algorithm workflows for modules or sections of code one step at a time. Working through it all. Then once that is done and saved, we can go step by step together writing the code or editing existing code as needed to bring about successful steps one at a time and test it thoroughly as we go. Now in some ways I had already been asking Chatgpt some questions while I code and getting SOME coding help as I code when I used it a few years back as a stack exchange substitute or ask a friend type assistant. At that time I didn't have it producing much code for me though. You can see videos of me coding this way on my YouTube robotics playlist. But I am now planning to take LLM use for coding assistance to a whole other level in a novel way I came up with. So on my IDE of choice, Dev C++ 4.9.9.2, there are no copilot AIs or w/e to code along with you and those are mostly autocomplete AIs anyways - they don't code for you. So that's not what I want. I also don't want to use any other IDEs for various reasons we won't go into here. So what I decided to do is just make my web browser full-screen and stay on the chatgpt chat interface THE WHOLE TIME I am coding. No alt tabbing, no terminal uses, no IDEs. Just stay on chatgpt web browser and that's it. For the most part. That's kind of the aim. So how can I code like this and test the code and whatnot then? Well basically chatgpt and I came up with the idea of creating a program that we called "The Orchestrator" that would run on my PC and read the chat I'm having with chatgpt and read the code that chatgpt writes or wants to change in my codebase and it would then validate that code change request or code snippet addition, make sure it fits our desired formatting and coding style, make sure it doesn't break any rule or screw up anything, possibly query another LLM AI with a submission of that code change and a copy of the surrounding code module it is changing and ask if it will break anything, if it will do what it claims it will do, etc etc etc. And after a extensive series of validation steps, the Orchestrator will either report back that the code was faulty with a reason why it was faulty or it will determine the code was good and passed checks. It will report back via a popup window message and/or possibly optionally text to speech so I hear it talking to me verbally which might be kind of cool. It would populate my clipboard of my mouse and tell me to paste its response into chatgpt. Which I would then do and hit enter to submit the response. Chatgpt would then see what it screwed up and rewrite it after apologizing or w/e. This would rinse repeat until the Orchestrator approved the code. Upon approving it, it would bring up a popup window UI that would act as a IDE containing my codebase file we are editing, the ability to scroll up and down to read my code, etc. It would highlight in green the new code chatgpt was recommending we add within my codebase, giving a green background there. Or if it was a code change chatgpt had written, it would split the screen of that section into two halves. On the left half would be the original code being modified. On the right half would be the changed code proposal with the actual changes highlighted with a red background. I would read both at that point, make any modifications I wanted right there in that window, and hit ok if I approved. So then my job besides co-planning with chatgpt and talking back and forth till I like the plan with the code steps is to read and approve all code changes before they go in and make any final tweaks. Ideally I no longer actually write the code, I just help plan and submit approvals. I have to understand the codebase deeply the whole time, I give all approvals, and I at no point let the LLM just go haywire on my codebase. It's relative to full automation with a local AI agent a slow and methodical process where I retain full understanding and control the whole time. This I feel comfortable with. And this setup I described is my planning coding workflow. It does not have me using the terminal nor an IDE. Its like a custom kind of weird LLM chat and popup windows weird IDE kind of "thing".
Advantages of this approach is much less brainpower needed looking up APIs to find what I want to use or manually typing all code. This also saves on keystrokes which saves my fingers from overworking and prevents any risk of carpal tunnel and whatnot. It should greatly speed up my coding development compared to manual typing especially once all the features I want the Orchestrator to have are all working.
Note: when a new session starts, I will have a big context dump notify chatgpt of its role and of our plans and short-term and long term goals and keyword trigger commands it can give the Orchestrator to get things rolling and have it go out and find files or directory structures etc from the codebase to help chatgpt navigate the codebase. Together with Chatgpt the Orchestrator will give a context snapshot to catch chatgpt up to date on what the code does so far for the module we are working on, what steps were done recently, what next steps are, etc.
Note: as of right now the way chatgpt talks to the Orchestrator is by me manually copying the browser text output of chatgpt. The Orchestrator polls the mouse clipboard to read these text inputs and parses them and acts on them accordingly. But soon I hope to develop a way for the Orchestrator to scrape the DOM of the browser to read the output of chatgpt directly or create a browser extension that outputs that as a file or as a web socket communication to the Orchestrator or perhaps eventually even have the Orchestrator just use computer vision to literally read the pixels of the screen to read what chatgpt is saying and get its input that way just as my own eyes do. Not sure what direction I'm going to go on that front quite yet but that's coming soon I feel. That will make it even easier to use.
Note: unlike most local AI agents that have to use a local LLM (costs alot to have a good one as far as hardware which is super inflated right now as well in cost) or use a cloud LLM API (charge based on tokens used or a monthly subscription - heavy use gets extremely expensive with guys like Yegge saying they are spending like $3k/mth on tokens WOW!), I chose to just use free Chatgpt as my LLM hookup. And that's good enough. But going that route, not using an API, I can't brute force problems as fast or as parallelized as some AI agents are doing but that is fine because I'm not having AI write my entire program and beat its head against a wall trying to fix its own bugs for ages and work through its own spaghetti code making stupid mistakes but eventually after a million tokens are used it finds its way like a blind man reaching around bumping into stuff in the dark, I don't need big brute forcing like those full automated AI agents are doing. So I don't need massive token use and fast calls, etc. Since I am myself co-coding, reading everything, preventing any bad code changes from making it through into my codebase, I don't need the brute forcing stuff then. And my usage can still fall into normal expected use category and not get rate limited or get me banned. I'm still the only one interacting with my mouse and keyboard into chatgpt. All inputs are manually done by a human user. So yeah, with my system, the LLM co-piloting is FREE which is important for me on my tight budget.
So anyways, chatgpt and I developed this system of using this Orchestrator as a 3rd copilot. Chatgpt is my copilot and the Orchestrator is our 3rd copilot. The Orchestrator extends Chatgpt's abilities to leave the browser and do stuff online or with my file system etc as its assistant just like local AI agents do. So it is loosely inspired by the same concept as a local AI agent. I am excited about this tool/workflow. I think it will vastly speed up my overall developer speed while still giving me fine grained control. So the downsides of automated code production associated with vibecoding don't really apply here since I'm not taking myself out of the loop so much like people warn about.
So to make this Orchestrator, chatgpt highly recommended we use Python. I've never coded in Python before and was under the impression its much slower than C/C++ so no good for anything in my robotics projects. However, since this is basically just a sort of weird LLM assisted coding assistant thing we are making, I figured it's probably fine for that use case. So I gave in and downloaded an old version Python that runs on my machine. Within a few days I, with chatgpt's help, got Python installed and working and learned how to use a text file as my Python code which I edit in notepad on windows. I use a .bat file to launch that orchestrator.py file and it launches and works. We've got it polling the clipboard and taking action when valid clipboard changes come in, doing some simple things chatgpt tells it to do successfully. I am also working on implementing a boot sequence thing where chatgpt commands it to run a boot sequence which has it open a window where it asks me to select what coding project I want to work on today from a list we store in a txt file and I press a letter on my keyboard to select a coding project from that list. It then pulls up a .txt context file about that coding project from the Orchestrator directory containing these project specific context files and it puts that file plus a list of trigger keywords and instructions chatgpt can use into the mouse clipboard and tells me to paste in the context. I then paste that in and chatgpt is told to basically use a series of command trigger words and commands to essentially open up my codebase of that particular coding project, explore the files, explore the directory, explore the various sections of code, and find out where we left off and see what the next steps are on the coding to do list for that project if any. If none, it will work with me to create the next steps of a todo list for that code project. We then can start to code it.
So far the orchestrator.py is about 500 lines of Python code. It's all bug free and working great. I took the time to carefully read it all and understand it all and now carefully read and understand every proposed change chatgpt writes for it. Only then do I add it and test it. But eventually, my aim is to have the Orchestrator do the reading of the original code and the proposed changes or additions and validate and filter it before I personally read and approve change or addition requests. The idea there is that my brainpower is limited and I only have so much fuel in the tank mentally for a given session. So the Orchestrator will see if chatgpt is doing something clearly dumb in its code change or addition request and catch that and have me paste the problem back to chatgpt by just bringing a popup window saying "paste this to chatgpt" and I just paste it. No thinking needed. And chatgpt corrects itself and tries again. Only once Orchestrator approves do I get a popup window showing the code change and old code and highlighted changes or updates text for me to read carefully and approve or reject. By taking my brain out of the loop for the dumb mistakes, the Orchestrator shields me from petty code corrections mostly and filters out the obvious dumb LLM mistakes so I don't have to. This saves me frustration and brainpower so I only see the good stuff that makes it through the filter. That makes the system more friction-less and leaves less waste of my time nonsense to deal with so I don't get so frustrated with dumb LLM code outputs. And the idea to possibly send it off to Gemini or w/e with their API free tier usage to verify it if it was a complicated change the Orchestrator wants a second opinion on should be icing on the cake and all of this happens behind the scenes while I wait.
I will say this: for me to get as far as I did with the Orchestrator in just a few days as I did is a marvel for me. I am not that fast but with Chatgpt's guidance I am so much faster. I found a 5.5 hour tutorial on how to install Python, get it going in a IDE, create projects, etc etc and watched some of it and realized I didn't need to do any of that. Chatgpt had me up and running Python in under an hour as someone who never even used it before and we were already having working code within the hour. It was amazing. Super fast for me. Chatgpt still makes dumb mistakes and bad code often, but with the Orchestrator and its guardrails and a tightly designed context window and guidance, plus all the filtration and validation steps, this can be a amazing semi-automated coding workflow.
So that's where I'm at. It's exciting stuff. This doesn't mean I'm stopping the electronics development. I plan to do this stuff in parallel with that so updates will still come on that. But the whole wait to code AT ALL until all electronics are done I just could not stick to anymore. It's been like 3-4 years since I last worked on coding for the robot project and I just couldn't hold off anymore. I think coding some then working on electronics some then coding some is fine. A bit all over the place but so be it. I got caught up in the AI assisted coding hype and just couldn't help myself jumping in and figuring out how I can best use AI to help my coding adventures in a way that fits my needs best. I couldn't wait anymore. PLUS, if I spend many more years NOT coding, I'd start to get somewhat concerned that what I already coded and learned and still remember would start to fade more and more from memory and I'd basically have no clue of what I was trying to do if I wait too long. I wanted to get back into my code before I have no mental architecture skeleton left in memory which would make things harder for me to get back into it. It's a lot of things to keep in my mind for so long without actively spending time in the codebase. So yeah that's my other excuse for why I'm going back into it "prematurely."
Note: my first impressions of Python were very, very good. I was amazed at how much it achieved by so little lines of code. Its 500 lines of code it has so far would have taken like 3,000 lines of C/C++ code I feel. It is crazy. It's very intuitive to use. I already just get it. Although chatgpt is doing the heavy lifting so I'm not actually typing it out myself. But it doesn't matter as long as I understand what it's all doing and how then I can keep its architecture clear and consistent and avoid bugs.
Recent Posts


