AI’s Impact on Link Building | KGMID SEO (James Dooley Interviews Dan Petrovic)

James Dooley: Hi, today I’m joined by Dan Petrovic, and the topic of conversation today is how AI has affected link building strategies for SEO.

Dan Petrovic: Hey James, how are we doing? You all right?

James Dooley: Doing well. So with regards to link building then, what’s changed now that artificial intelligence is upon us? What do you think has changed with link building strategies?

Dan Petrovic: Yeah, look, I have a lot to say about this topic. I’ve presented on link building for many years. I’ve stood on stage in front of very large audiences and told them to clean up their act and do better. So I’d like to give a little bit of history and maybe highlight where link building always fails.

Dan Petrovic: Link building always comes as a sort of afterthought in the SEO process, and you’re always trying to make it fit the strategy you already have. Right? So you start with, okay, we’ve got this thing we want to rank for. The page is already done. That’s finished. We need to get links for it somehow. And we’re just trying to force a square peg into a round hole. We’re trying to take the content, put it somewhere else, and then force the links to exist on that page. You know what I’m talking about. You’ve done links.

Dan Petrovic: We’ve been doing this for so long that the people who accept our links are now aware of what we’re doing and they ask for money. But not only that, they’re fitting our silly narrative of one link for yourself or your client and two links to make it look natural. The most ridiculous thing I’ve ever heard. One to Wikipedia and one to some .gov website to make it look natural. So guess what you’re doing when you use the one-plus-two formula. You’re putting a target on your link and making it super obvious: hello, I’m the only commercial link on this page and these two are fillers.

Dan Petrovic: I got up on stage, I think it was in Munich, and I said to people, this is what’s wrong at the moment, this is what I found. If I can spot your links, so can Google. Nothing changed. People just keep doing the same thing. And those who accept our links now have policies that mirror that. They’re parroting it back to us. They’re saying one link for yourself and two natural-looking links.

Dan Petrovic: And I was really furious about the whole thing, because we ruined it for everybody. We trained the bloggers to expect that as well. So what did I do? Let’s get back into AI. And actually I’m going to go down to the machine learning level now.

Dan Petrovic: TechCrunch, Mashable, Wired. I basically took the top 10 biggest blogs in the world, regardless of topic, just by volume and readership, and I reviewed their link integrations. Just an ad hoc look at everything. And one thing stood out for me straight away. Holy cow. 12 links on a page, 24 links on a page, 50 links on a page. Wow. When you go to those spammy guest post farms, it’s one link, two links, three links, maybe four or five. That’s already an immediately obvious signal.

Dan Petrovic: But I thought, what if I could train a model to think about links the same way these top-level, highest-quality blogs in the world think about links, and how they link out naturally? It took me a couple of months. I scraped all of them. I scraped TechCrunch, gigabytes of data. I pre-processed everything, cleaned up the text, extracted it sentence by sentence, and I marked up every location where a link existed, down to the character count. And I would mark everything: this is a link, this is a link, this is a link. So basically I ended up with gigabytes of content with markup where links used to be. It doesn’t matter where the link goes, just that it’s a link.
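
As a rough illustration of that markup step (not Dan’s actual pipeline), here is a minimal sketch that extracts paragraph text from scraped HTML and records the character spans covered by anchor text. BeautifulSoup and the output shape are assumed choices for the example.

```python
# Minimal sketch: pull plain paragraph text out of scraped HTML and record the
# character spans that were wrapped in a link. Library choice (BeautifulSoup)
# and data shape are assumptions for illustration.
from bs4 import BeautifulSoup

def extract_link_spans(html: str):
    """Return (paragraph_text, [(start, end), ...]) pairs, one per paragraph."""
    soup = BeautifulSoup(html, "html.parser")
    examples = []
    for p in soup.find_all("p"):
        text, spans, cursor = "", [], 0
        for chunk in p.find_all(string=True):          # text nodes in document order
            piece = str(chunk)
            inside_link = any(parent.name == "a" for parent in chunk.parents)
            if inside_link and piece.strip():
                spans.append((cursor, cursor + len(piece)))
            text += piece
            cursor += len(piece)
        if text.strip():
            examples.append((text, spans))
    return examples

print(extract_link_spans("<p>Read the <a href='#'>full report</a> here.</p>"))
# [('Read the full report here.', [(9, 20)])]
```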

Dan Petrovic: So I pre-processed the data and I took a small off-the-shelf pre-trained model. I think it was Microsoft’s DeBERTa v2 or v3. And I fine-tuned that model using token classification. Token classification is not sequence classification. Sequence classification is positive sentiment, negative sentiment. Token classification goes down to the granularity of a single token. So basically it predicts the spans in the text which are more likely to be links than not.
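
As a rough illustration (not Dan’s actual training setup), loading a DeBERTa v3 checkpoint for binary token classification with the Hugging Face transformers library looks something like this; the checkpoint name and label scheme are assumptions.

```python
# Minimal sketch: a DeBERTa v3 checkpoint set up for binary token classification
# (label 1 = part of a link span, 0 = plain text). Checkpoint and labels are
# illustrative assumptions, not the original configuration.
from transformers import AutoTokenizer, AutoModelForTokenClassification

checkpoint = "microsoft/deberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)   # needs the sentencepiece package
model = AutoModelForTokenClassification.from_pretrained(
    checkpoint,
    num_labels=2,
    id2label={0: "O", 1: "LINK"},
    label2id={"O": 0, "LINK": 1},
)
```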

Dan Petrovic: So in my pre-processing I marked all the non-link text as zeros and all the link text as ones. That went into the model. The model converted it into token IDs. I did my padding and batching. The machine in the background processed everything. I trained for a couple of days. Voila: a model that’s intuitive about links on the web.
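
One step glossed over here is aligning the character-level 0/1 markup with the tokenizer’s subword tokens. A minimal sketch of that alignment, using a fast tokenizer’s offset mapping, might look like this; the approach is my assumption, not his exact pipeline. Batches of these encodings can then go into a standard training loop or Trainer.

```python
# Minimal sketch: convert character-level link spans into per-token 0/1 labels
# via the fast tokenizer's offset mapping. Special/padding tokens get -100 so
# the loss ignores them. Continues from the tokenizer sketched above.
def encode_example(text, link_spans, tokenizer, max_length=512):
    enc = tokenizer(text, truncation=True, max_length=max_length,
                    return_offsets_mapping=True)
    labels = []
    for start, end in enc["offset_mapping"]:
        if start == end:                                   # special token
            labels.append(-100)
        elif any(start >= s and end <= e for s, e in link_spans):
            labels.append(1)                               # token lies inside a link span
        else:
            labels.append(0)
    enc["labels"] = labels
    enc.pop("offset_mapping")
    return enc
```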

Dan Petrovic: So now I feed it a blank page of text, no links, no markup, no HTML, nothing, just plain text. It will paint, with great precision, where a link falls, as learned from the best of the best of the web and how they link out naturally.
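
As a rough illustration of that inference step, here is a minimal sketch that runs a fine-tuned token classification model over plain text and recovers the character spans predicted as links; it assumes the Hugging Face setup sketched above.

```python
# Minimal sketch: predict link spans on plain text with the fine-tuned model,
# then merge consecutive "LINK" tokens back into character spans.
import torch

def predict_link_spans(text, model, tokenizer):
    enc = tokenizer(text, return_tensors="pt", truncation=True,
                    return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0].tolist()
    with torch.no_grad():
        logits = model(**enc).logits[0]                    # (seq_len, num_labels)
    preds = logits.argmax(dim=-1).tolist()
    spans, current = [], None
    for (start, end), label in zip(offsets, preds):
        if start == end:                                   # skip special tokens
            continue
        if label == 1:
            current = [start, end] if current is None else [current[0], end]
        elif current is not None:
            spans.append(tuple(current))
            current = None
    if current is not None:
        spans.append(tuple(current))
    return [(s, e, text[s:e]) for s, e in spans]
```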

Dan Petrovic: So how can you use this? How can you use AI to improve link building? Two things. One, you’re writing an editorial piece and you’re trying to come up with ways to integrate your links. This will already paint the spots where links fit in naturally. So when you’re trying to think about where to put the link on this page, put the link there. If there’s no nice place, rewrite your content, reprocess it in the model, paint it again, and pick the best spots. So that’s sort of a link planning stage. And then you integrate that, you do your outreach, and you place the links. You can also apply it to all the links you’ve already built in the past.

Dan Petrovic: You can run the processing and text extraction on all the linking pages in your link profile. You basically process your entire link profile, run the analysis with this model, it’s called LinkBERT, and do the predictions of where the links naturally fit in that narrative. And you can do the scoring: did I pick the same spot that the model picked?
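
As a rough illustration of that scoring idea, here is a minimal sketch that checks whether each existing link overlaps a span the model predicts as a natural link location; the simple overlap metric is an assumption for illustration, not the original methodology.

```python
# Minimal sketch: what fraction of the page's actual link spans land on spots the
# model also considers natural link locations?
def placement_agreement(actual_spans, predicted_spans):
    """actual_spans / predicted_spans: lists of (start, end) character spans."""
    if not actual_spans:
        return 0.0

    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    hits = sum(any(overlaps(a, p) for p in predicted_spans) for a in actual_spans)
    return hits / len(actual_spans)

print(placement_agreement([(9, 20)], [(8, 22), (40, 55)]))   # 1.0
```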

Dan Petrovic: So that’s your first level of research: just checking where the links fit naturally. The second thing is, since we’re talking about AI and links, I have another model. The second model is called Penguin and its job is to spot your link. The sole purpose of the model is to see who wanted the link on that page. It effectively acts as a member of Google’s webspam team: it goes and visits the page and reviews all the links. Is there one that’s obviously there for commercial purposes? Who wanted a link on this page? If it cannot detect one, it says I don’t know. And if it can, it flags the link, and it flags the filler links, the ones used to make it look natural.

Dan Petrovic: And I’ve been doing link profile analysis with this for two years now, and the model outperforms human link builders on link detection. I’m excited about this and nobody actually knows it. This is the first time I’m talking about it.

Dan Petrovic: I have an agentic flow in place now that takes a piece of text and tries to integrate the links in a certain way, and then the Penguin algorithm tries to break it. If it fails, it goes back in the loop, and it cycles until you can fit the link in such a way that it fools my link spam model. Basically, I have a writer and rewriter plus an evaluator going in an agentic loop, constantly looping.
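
As a rough sketch of the loop shape described here (not the actual system), the control flow looks something like the following; `integrate_link` and `flag_paid_link` stand in for the writer and evaluator models and are hypothetical placeholders.

```python
# Minimal sketch of a writer/evaluator loop: a writer proposes a link integration,
# a detector tries to flag it, and the cycle repeats until the placement passes or
# the iteration budget runs out. The two callables are hypothetical placeholders.
def fit_link(article_text, target_url, integrate_link, flag_paid_link,
             max_iterations=100):
    draft, feedback = article_text, None
    for _ in range(max_iterations):
        draft = integrate_link(draft, target_url, feedback)   # writer / rewriter
        verdict = flag_paid_link(draft, target_url)           # evaluator ("Penguin")
        if not verdict.flagged:
            return draft                                      # placement passes
        feedback = verdict.reason                             # feed the critique back in
    return None   # could not make it fit: the takeaway is "don't place the link"
```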

Dan Petrovic: I tried to fit a link into one of mine. I wrote an article, I pretended I was posting it on Moz.com, and I said, I want the link to this page to be in that article. Make it work. It went through 10 iterations, 20 iterations, 50 iterations, 100 iterations, and it couldn’t make it work. My writer model, my link integrator model, my link builder model could never find a way to fool the judge.

Dan Petrovic: And okay, so I want to leave this with everyone listening. If that’s the case, if you cannot make it fit, don’t do it. Don’t make that link.

James Dooley: So you’re saying relevance there is mightily important, because otherwise you’re just trying to push, like you always say, a square into a circle and it’s just not going to fit, so you shouldn’t force it. So almost less is more: going with quality as opposed to just trying to force it.

James Dooley: I’ve got another question for you then. Forget about the actual link, well, it’s related to link building and AI. How important now is an implied link rather than a physical link being put on the page, like an unlinked mention, a branded mention, or whatever? Has that become more important with AI or less important, with regards to link building or corroboration?

Dan Petrovic: For ranking purposes it doesn’t really matter. For training purposes it does matter. But where I find the most utility is an interesting behaviour. It’s really cool that you mentioned that.

Dan Petrovic: There’s an interesting behaviour where, if you’re a well-known brand, here we go, going back to branding, if you’re a really well-known brand and you have a mention on somebody’s website that doesn’t have a link, Gemini in AI Mode will fill it in with a link.

James Dooley: Really?

Dan Petrovic: Yeah.

James Dooley: I didn’t know that.

Dan Petrovic: It’s like a gift.

James Dooley: Yeah. So basically, you know, you’re like Nike, Adidas, Under Armour, and then if it’s familiar with those brands, it’ll just link them up even if they’re just mentioned but not linked.

James Dooley: But this comes back down to branding again, and whether they’re familiar with it and have the confidence and clarity to know exactly who that brand is. Would it only do that if the brand has a KGMID as a known entity? Have you looked into whether they do it for companies that might not have a knowledge panel?

Dan Petrovic: If you’re not a known entity, it’s not going to happen. And I suspect you also have to be a source in the grounding. Not necessarily in that exact spot, I’m saying it will fill that spot, but you have to be a source in the grounding, because Gemini is obsessed with preventing hallucinations.

Dan Petrovic: Not Gemini, Gemini is a model. I should say the Gemini app, or AI Mode, or AI Overviews. They’ve had some recent embarrassments with glue and rocks, giving poor advice, poor health advice to people. So they’re a little bit paranoid now, and I think that’s the reason they’re grounding everything with multiple sources.

Dan Petrovic: So to prevent hallucinations they only rely on things that are already in the grounding sources. So if you’re not in the grounding sources, if you’re not authoritative, I don’t think there’s a chance you’re going to get that gift of an AI Mode result with a link in there for you to click on. That I haven’t seen yet.

James Dooley: Some people might be watching this now and asking, what is a KGMID? It stands for Knowledge Graph Machine ID. And you mentioned there that you need to be a source. For anyone with a real, genuine business that isn’t a source at present, what’s the easiest way to build that authority and brand? Because a key takeaway from every single one of these episodes has been brand, brand, brand, brand, and everything seems to relate back to brand: the trust signals that come with a brand, the confidence that comes with a brand, the clarity that comes with a brand. How does someone turn a real business into being a source?

Dan Petrovic: Invent a time machine, go back, I don’t know, seven years, and edit Freebase.

Dan Petrovic: Before the acquisition. I’m glad I spammed Freebase when I did, because I got in where I wanted to get in. But was it Freebase? Am I getting that right?

James Dooley: What we use in the UK is Crunchbase, it’s a massive site.

Dan Petrovic: Not Crunchbase. It was the entity database that Google acquired. I’m pretty sure, I think it was Freebase. Yeah.

James Dooley: Freebase. Yeah.

Dan Petrovic: I could be wrong, it was a long time ago that they did that acquisition. Jokes aside, time machines and everything, if you want to see how all this works, Google actually has a proper system of entities, not just for the knowledge graph and knowledge panels. They actually have all the entities mapped out.

Dan Petrovic: And I even have an extension that helps with this. You can go on a Google search results page and hit that extension to see who is a known entity, and it gives you the entity ID from Google’s knowledge graph. And I also have, let me see if I can find it.

James Dooley: Is that pulling in from the Knowledge Graph API within Google?

Dan Petrovic: Yeah. It just looks at the rendered source of the page and finds that.

Dan Petrovic: Basically, on dejan.ai, one of the many tools I have listed there is Google Entities. So you can basically do a search. You can just look up a name or a brand or a product and see if there’s an entity in Google’s knowledge graph for it. That’s basically your proof that you’re a registered known quantity with Google within the knowledge graph. That’s Google’s MID, machine ID.
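
For anyone who wants to check this programmatically, Google’s public Knowledge Graph Search API returns entity IDs (MIDs) for a query. A minimal sketch follows; you need your own API key, and the printed fields are illustrative.

```python
# Minimal sketch: look up a name or brand in Google's public Knowledge Graph
# Search API and print the entity IDs (MIDs) it returns. Requires your own API key.
import requests

def kg_lookup(query: str, api_key: str, limit: int = 5):
    resp = requests.get(
        "https://kgsearch.googleapis.com/v1/entities:search",
        params={"query": query, "key": api_key, "limit": limit},
        timeout=10,
    )
    resp.raise_for_status()
    for item in resp.json().get("itemListElement", []):
        result = item.get("result", {})
        # the "@id" field looks like "kg:/m/..." and the "/m/..." part is the MID
        print(result.get("@id"), "|", result.get("name"),
              "|", result.get("description", ""))

# kg_lookup("Nike", api_key="YOUR_API_KEY")
```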

James Dooley: Yeah.

Dan Petrovic: Why is that relevant? That sort of logic and reasoning runs throughout Google’s systems. If you look at the Vertex documentation, whether you’re doing custom search or general Google search, MIDs are always there and you can ground with them. They have a complete knowledge graph of all the known entities.

Dan Petrovic: Now, there is no way to just download all of that and map things out, because it’s proprietary. You can get the old-school version, frozen in time from when Freebase was snapshotted. But there is an alternative. I’m wondering if I can think of it, because I was working on it just recently, trying to map out all the entities in it.

Dan Petrovic: There is, yeah, maybe we’ll sync up after the call and I’ll send you the link. The name escapes me, but it’s a pretty well-known entity database.

James Dooley: Yeah, we’ll put the link in the description. Send it to me in a bit and we’ll put the link in the description.

James Dooley: But for me, with regards to link building for AI and things like that, I know we’re talking a little bit about known entities and being a source or having a KGMID. Everything around our business model now comes back down to more than just ranking, which is where it used to be. We used to be obsessed with just ranking in Google.

James Dooley: And obviously we realised many years back that to rank in Google you want brand, social media, real traffic, engagement and everything else that comes with it. The second part is the knowledge graph: trying to improve that confidence score in the knowledge graph for confidence and clarity. And the third is the LLMs: trying to be not just cited but recommended in the LLMs.

James Dooley: And I think if, within everything you do with your link building strategies, you can align it to help the confidence score for who you are and what you do in the knowledge graph, try to corroborate and get the framing for the LLMs, and then also get the rankings, those three falling in line with each other is kind of what we’re doing with our link building strategies nowadays. Is there anything else related to improving link building for AI?

Dan Petrovic: Before I answer that: Wikidata. Wikidata, yeah, yeah, yeah. Get on it.

Dan Petrovic: Basically, I did something really important. What I’ll do is a quick screen share just to show you what I’ve found. Basically, I took all the Wikidata entities, embedded the known entities, and drew a parallel by looking at the semantic similarity between the Gemini model and its little cousin Gemma.

Dan Petrovic: And I found they’re basically in the same semantic space. The figures are different, but when you rotate the embeddings, when you mix things up, they always converge on the same semantic thing. And I think there’s something about Wikidata, even if it’s not verbatim from Google’s knowledge graph, that’s of really, really strong utility for SEOs looking to gain an edge not just in SEO but also in AI visibility.
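
For readers curious what “rotating the embeddings” can mean in practice, here is a minimal sketch of one standard way to compare two embedding spaces over the same set of entities, an orthogonal Procrustes alignment followed by cosine similarity. The models, data and metric are placeholders; this is not the original analysis.

```python
# Minimal sketch: align two embedding spaces for the same entities with an
# orthogonal rotation (Procrustes), then measure how similar they look afterwards.
import numpy as np
from scipy.linalg import orthogonal_procrustes

def aligned_similarity(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """emb_a, emb_b: (n_entities, dim) matrices, row i = the same entity in both models."""
    a = emb_a - emb_a.mean(axis=0)
    b = emb_b - emb_b.mean(axis=0)
    rotation, _ = orthogonal_procrustes(a, b)      # best rotation mapping a onto b
    a_rotated = a @ rotation
    cos = np.sum(a_rotated * b, axis=1) / (
        np.linalg.norm(a_rotated, axis=1) * np.linalg.norm(b, axis=1))
    return float(cos.mean())                        # closer to 1 = same semantic layout
```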

Dan Petrovic: I would seriously, I’m glad I didn’t forget about this. So yeah, seriously check out Wikidata. It’s a great resource.

James Dooley: The only thing I would say on that, for anyone who’s watching this, is make certain you don’t go out creating a Wikidata account and editing it yourself if you don’t have some sort of footprint of who you are online. I would start building up who you are online first. Ideally get an entity home, so something like having a jamesdooley.com and wrapping that up. I mean, schema helps to pull everything together, but try to pull that together first.

James Dooley: Otherwise, I know a lot of people that have tried to create them and actually had them deleted. And once you try to create it again, it becomes hard. It’s almost like trying to create yourself a Wikipedia page before you actually deserve a Wikipedia page. It’s the same with Wikidata. A lot of people have tried to create an entry and had it deleted.

Dan Petrovic: Yeah, it’s not going to work. I refer to it as a resource for understanding the current makeup of the entities, because it’s not just Google. There are other systems, and those systems will use this both as training data and as sort of a crutch to lean on for grounding the models. So I think this is an important resource.

Dan Petrovic: It didn’t cross my mind that I could try to inject my own entry in there, because I thought there had to be a parallel, an actual Wikipedia page, for it to work.

James Dooley: There doesn’t need to be a Wikipedia page for it. It helps to have a Wikipedia page, but you can inject your own information. So if I’ve got a new brand, I can create that new brand or business and get it a Wikidata kind of entry. I need to be connecting it, ideally, with other entities.

James Dooley: So if I say James Dooley is the founder of Petrovic SEO, as an example, and I’m saying that’s a business, because now it has connections with me, which is an entity, then it works better. Whereas if it’s just you and you don’t have any sort of relevance online, it becomes a lot more difficult. You need to connect the entities. It’s almost like nodes and edges. You need to be connecting those relationships together, and the more connections you have on the web, the more likely it is to stick.

James Dooley: And what I would say is, rather than this just being a hack where everyone should go and add themselves to Wikidata, the point is that so many people don’t add themselves to Wikidata, and it’s so important to do it, as long as you are a genuine business and you have got those connections and things like that. But yeah, if one hasn’t been created, then go in and create one for sure.

James Dooley: Getting that in there is huge, because a lot of the time that in turn triggers a knowledge panel, especially for an individual. It can trigger it if you go and author a book, or even be on podcasts like this; you can get an IMDb profile and things like that, and all of that adds to the confidence and clarity score of who Dan Petrovic is. And repeating who you are and what you do builds the confidence score. It’s its own little algorithm, just on the knowledge graph and who people are, that I think Gemini is going to be leaning on more and more in time.

Dan Petrovic: It’s very similar to Google’s internal knowledge graph. You mentioned graph; I actually built the full graph.

James Dooley: Really?

Dan Petrovic: Yeah, the full graph. So the whole of Wikidata. So I’m showing my age where I couldn’t recall the name. To be fair, it is 8:00 p.m. and I’m done, mentally foggy already.

Dan Petrovic: So what I did is I basically downloaded the whole dataset. I extracted the label information and then built up the full undirected knowledge graph, where I treat the text label of each entity as a node, and obviously I’ve got edges. But there was some data clean-up in there, because for each label you have multiple language versions as well.
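
As a rough illustration of that extraction step (not his actual 68 GB SQLite build), here is a minimal sketch that streams the public Wikidata JSON dump, keeps English labels, and adds an undirected edge for every item-to-item claim; networkx and the streaming details are my assumptions.

```python
# Minimal sketch: stream the Wikidata JSON dump (one entity per line), keep the
# English label of each entity, and connect entities that reference each other.
import gzip
import json
import networkx as nx

def build_label_graph(dump_path, max_entities=None):
    graph, labels = nx.Graph(), {}
    with gzip.open(dump_path, "rt", encoding="utf-8") as fh:
        for n, line in enumerate(fh):
            line = line.strip().rstrip(",")
            if line in ("[", "]") or not line:
                continue
            entity = json.loads(line)
            labels[entity["id"]] = entity.get("labels", {}).get("en", {}) \
                                         .get("value", entity["id"])
            for statements in entity.get("claims", {}).values():
                for st in statements:
                    snak = st.get("mainsnak", {})
                    if snak.get("datatype") == "wikibase-item" and "datavalue" in snak:
                        graph.add_edge(entity["id"],
                                       snak["datavalue"]["value"]["id"])
            if max_entities and n >= max_entities:
                break
    # relabel Q-ids to their English text labels where known (collisions merge nodes)
    return nx.relabel_nodes(graph, labels)
```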

James Dooley: Yeah.

Dan Petrovic: So then you have to think about how to treat that, and this and that, but I’m not going to go into the details. I’m just reading from my screen now: it’s a 68 GB file. So it’s a SQLite database with full connectivity, and the screen share you saw earlier was the vector embeddings I did of the entire knowledge graph.

James Dooley: Yeah.

Dan Petrovic: So I now have a semantic search engine where, if I type in Rand Fishkin for example, it’ll give me high cosine similarities towards SEO but low cosine similarities towards cake making. Right.

Dan Petrovic: So basically what this gives you is a window of insight, and these embeddings are generated by Gemini. So this is how Gemini thinks about brands. You can basically put your brand in as a search term and it will return the concepts most aligned with that brand in the semantic space of Google’s embedding model, the same technology behind Gemini and AI search. Just think about the utility of that.
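
As a rough illustration of that lookup, here is a minimal sketch that ranks precomputed entity embeddings by cosine similarity to a query vector; how the vectors are produced (for example with a Gemini embedding model) is left outside the sketch and is an assumption.

```python
# Minimal sketch: given precomputed entity embeddings (label -> vector), rank every
# entity by cosine similarity to a query vector and return the top matches.
import numpy as np

def top_matches(query_vec, entity_vectors: dict, k: int = 10):
    labels = list(entity_vectors)
    matrix = np.vstack([entity_vectors[label] for label in labels])
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    q = np.asarray(query_vec, dtype=float)
    q = q / np.linalg.norm(q)
    scores = matrix @ q                      # cosine similarity per entity
    order = np.argsort(scores)[::-1][:k]
    return [(labels[i], float(scores[i])) for i in order]
```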

James Dooley: Yeah, it’s unbelievable.

Dan Petrovic: It’s great for keyword research, great for clustering, great for keyword classification. It’s good for keyword gap analysis, content ideas, link building. It’s just insane that this data is free and available for us to use. But if it wasn’t for AI, I would never have been able to implement this. So I’m just super grateful that we live post AI revolution, where we can do all these things.

James Dooley: It’s crazy how nearly every single episode we’ve done, and this one is about link building for AI, kind of comes back down to building brand again. It is genuinely getting yourself in the knowledge graph. In my opinion it’s only brands that are in there, or obviously there are individuals, people, businesses and things like that. But the more confidence it has in who you are, the better. Dan Petrovic, it’s been an absolute pleasure.

James Dooley: We hope you liked the video on link building and what has changed in the AI era. I strongly recommend checking out a couple of the links in the description. There’s one about the future of SEO and another, over 45 minutes long, about how to optimise for the LLMs: ChatGPT, Gemini, Perplexity, and all the other AI platforms out there. Dan, it’s been an absolute pleasure. Thank you very much.

Dan Petrovic: Thanks, James.
