I'm starting to realize that all the things I wanted back in late 2022 are now slowly coming to fruition. At first, I was baffled and frustrated by the knowledge cutoff, and that was fixed (at least with most models) by allowing them to connect to the internet to look stuff up in real time. I wanted to draw, and now they can draw pretty well without screwing up so much. And, I really wanted to be able to do things with agents. I think we're getting all of those little layers. The rollout hasn't been as fast as I would have liked for personal reasons, but it's still probably way too fast for safety reasons.
This is encouraging news, and I am sure we'll see everyone's agents becoming more useful over the coming months.
For sure!
The kinks are slowly getting ironed out, so we're finally into "This stuff mostly works" territory with agents. Give Genspark a try if you have some tasks that lend themselves well to multiple models being orchestrated that way. Would be interesting to see what you think!
I just made a note and will definitely report back if/when I get a few Genspark miles in. Thanks!
Oh the demo with South Park, I was so looking forward to hearing the SP voices. Played the finished video and the animations were looping, 4s is not enough, and the voices...well at least it included "they killed Kenny". Still impressive.
Oh for sure, the South Park cartoon is a rather underwhelming example in terms of the finished product!
What is impressive to me is the agent's ability to call on these models and orchestrate them competently behind the scenes.
Because putting all of that together involved analyzing the news, extracting key insights, turning that into a viable script, breaking that down into clip-level scenes based on known limitations, feeding orders to the video model and the voice model to create the clips and the audio, stitching the results together, and finally creating the interactive HTML site around the end result.
That's the big takeaway here: The system works, and as the individual components get better (longer, more coherent video clips, more realistic voices), so will the output.
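For anyone curious what that chain of steps might look like in code, here is a minimal sketch. It is not how Genspark actually works internally (that isn't public in this thread): every function name, the `Job` class, and the 4-second clip limit are assumptions made up for illustration, and each stage is a stub that just records a placeholder result instead of calling a real model.

```python
from dataclasses import dataclass, field

MAX_CLIP_SECONDS = 4  # assumed per-clip limit of the (hypothetical) video model


@dataclass
class Job:
    """State handed from one stage to the next."""
    topic: str
    artifacts: dict = field(default_factory=dict)
    log: list = field(default_factory=list)


# Each stage below is a hypothetical stub standing in for a real model or tool call.
def analyze_news(job):
    job.artifacts["insights"] = [
        f"first insight about {job.topic}",
        f"second insight about {job.topic}",
    ]


def write_script(job):
    job.artifacts["script"] = [f"Narrator: {i}" for i in job.artifacts["insights"]]


def split_into_scenes(job):
    # One scene per script line, capped at the assumed clip length.
    job.artifacts["scenes"] = [
        {"line": line, "seconds": MAX_CLIP_SECONDS} for line in job.artifacts["script"]
    ]


def generate_clips(job):
    job.artifacts["clips"] = [f"clip_{n}.mp4" for n, _ in enumerate(job.artifacts["scenes"])]


def generate_audio(job):
    job.artifacts["audio"] = [f"voice_{n}.wav" for n, _ in enumerate(job.artifacts["scenes"])]


def stitch(job):
    # Pair every clip with its matching audio track.
    job.artifacts["video"] = list(zip(job.artifacts["clips"], job.artifacts["audio"]))


def build_site(job):
    job.artifacts["site_html"] = f"<html><body><h1>{job.topic}</h1></body></html>"


STAGES = [analyze_news, write_script, split_into_scenes,
          generate_clips, generate_audio, stitch, build_site]


def run_pipeline(topic, stages=STAGES):
    """Run the stages in order, logging each one as it completes."""
    job = Job(topic)
    for stage in stages:
        stage(job)
        job.log.append(stage.__name__)
    return job


if __name__ == "__main__":
    result = run_pipeline("AI news")
    print(result.log)
```

The point of the sketch is the shape: each stage only reads what earlier stages produced, so swapping in a better video or voice model would improve the output without touching the rest of the chain.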
Back in ye olden days circa 2022, citizen developer initiatives and automation/orchestration tools were popping, with Zapier leading the way along with the one I worked on, MSFT Power Automate. So for example, you could create no-code/low-code workflows with a visual builder to automate a response to a chat by firing off an email, putting an entry in an .xls, or more sophisticated stuff like a database call or form.
This is some next level sh#t. AI agents are doing this and more and Zapier has got an AI builder and of course Power Automate has Copilot. I wonder who wins out though - standalone agents like this Genspark or ones baked into established stacks. Monotone man got me with the cooking/influencer reel. Maybe rough now, but dayum, just DAYUM.
Yeah, we've got lots of good workflow automation platforms these days, also including Make and n8n. But I guess agents that can bypass all of that noise and just "be you" are the dream, eh?
I'd say if standalone agents become competent enough to bypass the entire back-end architecture and just navigate customer-facing interfaces naturally, this will be a clear win for most average users (no need to fiddle by cobbling together different tools, just tell Jarvis what you need done).
We're definitely not there yet, and Genspark doesn't do any stuff like button clicking or adding to cart on sites. But for the use cases it's designed for, it's pretty solid!
Very compelling! It does not seem like any coding is needed here. Is that the case? Will check it out this weekend!
Correct, everything runs out of the box inside Genspark's own environment.
One caveat that I just mentioned in a separate Note: Super Agent can't help with use cases that require “physical” interactions with a screen like clicking on buttons and adding things to shopping carts. But for anything that can be handled via search, information synthesis, and its internal toolkit of genAI models, it’s great!
signed up!
I tried asking it to find some pickup soccer games for me today and it definitely searched the internet thoroughly, but it did not find the meetup i was hoping it would. https://www.genspark.ai/agents?id=c78ee040-7bfc-478f-b1cc-b351356454b5
i did ask for a specific cat leash from a non amazon location and it came up with some decent options though!
will need to test it out more to have an informed opinion on this.
on a more general note,
why do AI labs seem to have tunnel vision in trying to automate these tasks with agents?
"Plan Travel to San Diego & AI Call for Me to Make Reservation
AI Call for Me to Make Restaurant Reservation"
Is there a real market for these types of AI? maybe i am old fashioned, but i actually like doing these mundane things as it adds to my enjoyment of said future activity.
Ha, I know - all of their use cases are about grocery shopping and trip planning. Which, yes, is perhaps relatable to some people who struggle with booking trips or don't enjoy it, but I kind of feel we haven't reached the level of robustness and confidence in these agents to fully outsource this in the first place.
To me, Genspark really shines where you need to do some solid research, reason about the findings, and then output a nice visual user-facing interface that lets you navigate that information. The best use cases for my needs were of that kind. It's really good at coding up something that's pleasant to interact with and contains whatever you asked it to find.
Definitely still a few steps away from a full-fledged universal agent for now though.
Hey Daniel,
Your post almost perfectly summarises how I responded to GenSpark's release. I watched the demo video and was super uninspired by both the demos and the presenter, so I basically wrote it off as just another one of the dozens of agent platforms crowding the "nothing to see here" space.
Based on your post here though, would you consider doing some kind of live session for your community? I get it that you wouldn't call yourself a power user of the platform yet, but you are at least a few steps ahead of the rest of us.
I think it would be awesome to see you do some demonstrations and play with the tool live, and for us to be able to ask questions and explore together.
I for one would pay for that. Is that something you'd consider?
To me this is super motivating as I've been waiting for the first accessible and practical agent platform to come along (both for me personally, and to refer to my clients). Based on your description, it sounds like this could be it.
Hey Mark,
Exactly, I had the same "Meh, next" feeling when I first saw it, too.
But what clicked for me isn't that Genspark does anything particularly out of the ordinary, but that it appears to do things more consistently and reliably than I'm used to. (Not 100% flawlessly, mind you, but definitely with a solid hit rate.)
I'm not a power user by any means for sure, but I'd consider doing a live Q&A session. In fact, a general Q&A has been on my list for a while, but perhaps a Genspark-focused one would be a good way to test the waters.
Having said that, I'm just about to leave on an Easter holiday with my family, so realistically it wouldn't be before late April. Let me know if that'd still be relevant.
In the meantime, I'd say Proxy and Super Agent are my best experiences with AI agents thus far. Both have free tiers, too, so give them a look. I'd love to hear what your initial impressions are!
It seems a little weird that you'd prioritise Easter with your family over a webinar for a guy you've never met who lives on the other side of the world, Daniel. But so be it.
(I would definitely pay for a session like this in late April)
For now, one big question... does GenSpark have a "watch me do this thing, and learn from it and do it whenever I ask you to do it" function? (Probably better named).
Listen, we've shared some nice moments. Substack comments, LinkedIn posts, the lot. But sometimes people grow apart, Mark. It's not you, it's me. There are plenty of AI Substackers in the sea who'd be lucky to webinar for you.
But cool beans, let's aim for a late April agent deep dive.
You're thinking something like Excel Macros but for an AI agent, eh? The answer is no. At least not yet. Genspark doesn't take over your computer or in fact have any visible browser/computer-like interface. It runs in a virtual environment and does its browsing/research behind the scenes. What you see as the user is best compared to OpenAI's Deep Research: exposed thinking steps and links to the sources/pages it sees along with the output of other agents it calls upon (image, video, speech, etc.).
So in that sense, it's not an agent that mimics web browsing directly, so its usefulness to anyone who's after a more direct control is likely quite limited.
For my use cases, this is actually preferable though: I'd much rather let an agent go away, do its thing, and come back to me with a solid finished product than watch OpenAI's Operator awkwardly scroll around a website, fumbling buttons and meekly asking for permission to add a dozen eggs to my shopping cart.
But try the free version, as I think it'll give you a very good feel for what you can (and can't) expect rather quickly.
Based on my profoundly deep AI incompetency, I predict the following for this AI software.
1) It will do amazing things and offer you a wide range of options for processing data in interesting, creative and productive ways.
2) The signup and/or login feature won't work, and you won't be able to fix that by chatting with the bot.
3) Scroll bars may be missing.
4) Web design by monkeys and armadillos.
5) Cynical snotty people not praised for their genius.
Based on my limited reading comprehension skills, I predict that the above comment was about unicorns, leprechauns, and their impact on climate change.