
Technical SEO Checklist
Crawling and Indexation
The first thing to look at during a technical audit is how your site is crawled and indexed by search engines. After all, if pages cannot be crawled, they will not be indexed (with few exceptions), and pages that are not in the index cannot rank.
Go through the Page Indexing Report in Google Search Console
The most accurate and reliable way to assess your website's indexing is to review the Page Indexing report in Google Search Console.
Look at the list of indexed pages and check which pages are in the index. See whether it contains filtering or sorting pages, test pages, or other pages you don't want indexed.
Also, look at pages that have been excluded.
Not all statuses in the Excluded Pages report are a problem. You should not focus your attention on all excluded pages, but only on those where Google’s behavior does not match your intentions.
In the table below, you can see the statuses that tend to require attention and deeper analysis:
Status | What it means |
---|---|
Redirect error | Google was unable to follow the URL due to redirect issues. |
Server error (5xx) | The server returned a 5xx error. |
Discovered – currently not indexed | Google knows about the page but has not crawled it yet. Often indicates crawl budget issues. |
Crawled – currently not indexed | Google visited the page but chose not to index it. Usually indicates low page quality. |
Duplicate without user-selected canonical | Google considers the page a duplicate, but you did not specify a canonical. |
Duplicate, Google chose different canonical than user | Google ignored the canonical you specified. |
Soft 404 | The page looks “empty” or “not found” but returns a 200 OK status. |
The other statuses probably do not signal any problems. However, these reports are also worth reviewing to make sure that the pages have not been removed, redirected, canonicalized, or blocked from indexing by mistake.
Status | What it means |
---|---|
Alternate page with proper canonical tag | Google correctly acknowledged the canonical you specified. |
Blocked by robots.txt | Google cannot crawl the page. |
Excluded by ‘noindex’ tag | The page has the noindex directive. |
Not found (404) | The page does not exist. |
Blocked due to unauthorized request (401) / Blocked due to access forbidden (403) | The page requires authorization or access is forbidden. |
Page with redirect | The page redirects to another URL. |
Blocked due to other 4xx issue | The page is inaccessible due to a 4xx error not covered by the statuses above (e.g., 410). |
In the Google Help Center, you can find a comprehensive description of the Page Indexing report, including examples of issues and a detailed explanation of each status.
Screaming Frog can also help with analyzing pages that are indexed or excluded from the index. To do this, you need to connect the Google Search Console API before starting the site crawl.
To connect, go to Configuration -> API Access -> Google Search Console. Click on Sign in with Google and follow the instructions.

Source: Screaming Frog
Once connected, enable URL inspection, and you can also enable the option to ignore indexing inspection for URLs that cannot be indexed.

Source: Screaming Frog
You will then be able to see and compare the status of each page according to Search Console (the way Google sees it) and its actual status as determined during the crawl process.

Source: Screaming Frog
Please note that the URL Inspection API is limited to 2,000 URLs per property per day, so this method is more suitable for small sites.
Check what is in your sitemap.xml
Sitemap.xml is an XML file that provides search engine crawlers with a list of pages on a site, as well as (optionally) information about their last modification date, update frequency, and recommended crawl priority.
It is usually placed at the root of the site, for example: https://example.com/sitemap.xml. Sitemap.xml helps search engines find new or updated pages faster. In addition, the inclusion of a page in this file is one of the signals for determining the canonical version of a page, albeit a weak one.

Source: e-commerce sport store
The sitemap.xml file is particularly useful for:
- new sites with few external links;
- large sites with many pages;
- sites with a lot of media content;
- news sites that are updated frequently.
Sitemap.xml should contain all the pages you want to index.
You can use Screaming Frog or other crawlers to analyze the pages included in sitemap.xml. In Screaming Frog, the sitemap can be crawled separately in List Mode, or it can be included in a regular site crawl. To do this, go to Configuration -> Spider -> Crawl, enable XML sitemap crawling, and add the absolute URLs of the sitemaps you want to crawl.
It is not recommended to use various online services for generating a Sitemap, as they may only generate a static sitemap that will not be automatically updated. The optimal option is to generate the sitemap.xml using plugins for the CMS on which the site is running, or to write a custom script that generates the sitemap according to specified conditions and automatically updates it when changes are made to the site.
When generating the sitemap.xml, make sure your file complies with the sitemap.xml protocol. You can use various online validators for this, such as https://www.xml-sitemaps.com/validate-xml-sitemap.html.
Is it necessary to include all the tags listed in the protocol? Not always. For example, Google only considers the <loc> and <lastmod> tags. Make sure the date in the <lastmod> tag is accurate; if Google detects attempts to manipulate it, it may start ignoring this tag.
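To make this concrete, here is a minimal sketch of a sitemap.xml file that follows the protocol; the URLs and dates are hypothetical:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per indexable page; <lastmod> should reflect the real last change -->
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-01-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/blog/technical-seo-checklist/</loc>
    <lastmod>2025-01-10</lastmod>
  </url>
</urlset>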
Ensure there are no mistakes in robots.txt
The robots.txt file is the first place a search bot looks before crawling a site. It determines which sections of the site can or cannot be crawled, and, as a result, which pages will be indexed by search engines. It should always be located at https://example.com/robots.txt.
This file is a tool for managing crawling (not indexing!) of the site. Some pages, even if they are blocked in robots.txt, can still be indexed (usually if there are internal or external links to them). Such pages (indexed despite being blocked in robots.txt) can be seen in Google Search Console in the report “Indexed, though blocked by robots.txt”.

Source: Search Console
Here’s what to be sure to check regarding the robots.txt file as part of a technical SEO audit:
- Availability of the file
The file should be accessible at https://example.com/robots.txt and return a 200 OK status. Its absence, fetch errors, redirects (301, 302), or error responses (403, 404) can prevent search engines from correctly understanding the site's crawling rules.
- Syntax and correctness
Check that the file structure follows the standard. Example of a basic template:
- User-agent: *
- Disallow: /admin/
- Allow: /public/
- Sitemap: https://example.com/sitemap.xml

Source: nike.com
- Disallow and Allow directives
Check that important pages are not accidentally disallowed, e.g.:
- Home (/)
- Product Cards (/product/)
- Blog or articles (/blog/, /articles/)
A common mistake is blocking images, styles, and scripts along with administrative folders. In such cases, you should specify that even though the folder is blocked, certain file types should remain open for crawling. This often happens on WordPress sites when the entire user-content folder is blocked with Disallow: /wp-content/.
In this case, only files of certain formats can be opened for crawling (a combined fragment is shown after the list):
- Allow: /wp-content/uploads/*.css
- Allow: /wp-content/uploads/*.js
- Allow: /wp-content/uploads/*.jpeg
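Assembled into a single robots.txt fragment, the result might look like the sketch below; adjust the paths to your own setup, since CSS and JS files often live in theme and plugin folders as well:
User-agent: *
# Block the user-content folder as a whole
Disallow: /wp-content/
# ...but keep specific file types open for crawling
Allow: /wp-content/uploads/*.css
Allow: /wp-content/uploads/*.js
Allow: /wp-content/uploads/*.jpeg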
To validate your robots.txt and test the directives you are going to add, you can use this tool.
- Check compatibility with other directives
Errors often occur when robots.txt conflicts with:
- the meta tag <meta name="robots" content="noindex">
- canonical
For example, if a page is open in robots.txt but blocked via noindex, it will be crawled, but will not get into the index. This is acceptable, but it is important that it is done intentionally.
Another common problem is combining in-code instructions for bots with blocking the same page in robots.txt. Search engine robots do not crawl pages blocked in robots.txt, so they never see the tags specified in the code, such as a canonical. As a result, such a canonical is simply ignored.
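A simple illustration of this conflict, with hypothetical URLs: the canonical below is never seen because the page itself is blocked from crawling.
# robots.txt
User-agent: *
Disallow: /old-page/

<!-- In the <head> of https://example.com/old-page/ : Googlebot cannot fetch this page, so the tag below is never read -->
<link rel="canonical" href="https://example.com/new-page/" />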
Check your internal linking
One of the key tasks of a technical audit is to ensure that the site’s internal linking works correctly. This means that all internal links must lead to real, existing pages that are open for indexing, return a 200 OK status code, do not contain redirects, and, most importantly, do not point to pages with 4xx/5xx errors. At first glance, this may seem like a minor detail, but in practice, even incorrect internal links can negatively affect:
- The efficiency of website crawling by search engines,
- The distribution of internal SEO weight (PageRank),
- User experience.
The first step in the analysis is to check all internal links for errors. It is especially important to identify broken links that lead to pages with a 404, 410, or other errors (such as 403, 500).
Below is a table with the main types of errors that can occur in internal links, their meaning, and recommended actions to fix them.
Error Type | What it means | What to do |
---|---|---|
404 | Page not found | Remove the link or replace it with a working one |
403 | Access forbidden | Check access settings |
301/302 | Redirect | Update the link to the final URL |
5xx | Server error | Check the server or CMS |
It is also important to analyze the depth of page hierarchy, meaning to determine at what level and how many clicks away from the homepage the key content is located. It is preferable for important pages to be no deeper than the third level — this increases their accessibility for both search engines and users.
One of the key elements of analysis is identifying “orphaned” pages — those that have no internal links pointing to them. Even if these pages are included in the sitemap, the lack of internal links makes them less accessible.
Additionally, it is important to analyze anchor texts, the clickable words and phrases of your links. They should be relevant and meaningful, as anchor text helps search engines understand the context of the linked page.
Analyze the crawl statistics
Crawl Statistics analysis is a way to understand how Googlebot interacts with a site: which pages are crawled, how frequently, and how this affects SEO. This data is available in Google Search Console → Settings → Crawl Statistics. In the table below, you can see the most common issues you can identify in this report:
Issue | What to look for in the report | Possible causes |
---|---|---|
Sharp decrease in crawling | Fewer crawls per day | Accessibility issues, incorrect settings in robots.txt, blocks, 5xx errors |
Many 4xx and 5xx errors | Errors in URLs | Deleted pages, broken links, server issues |
Response time increased | >1 second — a warning sign | Hosting problems, server overload |
Many 3xx redirects | Redirects instead of direct URLs | Incorrect redirects, redirect chains, a large number of internal links with redirects |
CSS/JS not crawled | They are missing from the statistics | Blocked by robots.txt |
Additionally, server logs can be analyzed. They allow you to see the actual requests from search bots (not only Googlebot but also Bingbot, YandexBot, and others), rather than just aggregated data from Google Search Console.
This is an advanced, “raw” diagnostic method that requires a significant amount of time. To visualize the data, you can use tools such as GoAccess (open source) or the Screaming Frog Log File Analyser.
Implement structured data
Structured data is a special markup format on a webpage that helps search engines understand the content of the page more accurately and deeply. It serves as a “hint” for Google and other search engines about what exactly is on the page — an article, product, recipe, review, video, etc. While it is not an official ranking signal, it indirectly affects rankings by improving how search engines understand the page.
The main standard used for structured data on websites is Schema.org. There are other protocols, such as Open Graph, but they are mainly used by social networks.
Schema.org is a collaborative project by Google, Microsoft, Yahoo, and Yandex, created to develop and maintain a unified standard for structured data on the web.
Schema.org includes hundreds of entity types, with the most commonly used listed in the table below:
Category | Entity (@type) | Purpose |
---|---|---|
Content and Pages | Article | An article or news content |
Content and Pages | BlogPosting | A blog post |
Content and Pages | NewsArticle | A news article for Google News |
Content and Pages | FAQPage | A Frequently Asked Questions (FAQ) page |
Content and Pages | HowTo | A step-by-step guide |
Content and Pages | WebPage | General information about a webpage |
Products and Offers | Product | Product description |
Products and Offers | Offer | Price offer |
Products and Offers | AggregateOffer | Price range for a product from different sellers |
Reviews and Ratings | Review | A review of a product or service |
Reviews and Ratings | Rating | A numerical rating (often within a Review) |
Reviews and Ratings | AggregateRating | Average rating based on multiple reviews |
Organizations and People | Organization | A description of a company or brand |
Organizations and People | LocalBusiness | A local business with contact information and schedule |
Organizations and People | Person | A person (e.g., article author, speaker, etc.) |
Events | Event | An online or offline event |
Navigation and Structure | BreadcrumbList | Breadcrumbs navigation |
Navigation and Structure | SiteNavigationElement | Main menu items |
Multimedia | VideoObject | Video with metadata (for video snippets) |
Multimedia | ImageObject | Image with description |
Education and Jobs | Course | An online course or training program |
Education and Jobs | JobPosting | Job vacancy (for Google for Jobs) |
It is recommended to implement structured data in the JSON-LD format. This block is placed in the <head> or <body> of the HTML document, but it is not displayed to the user — it is read by search bots. All major search engines, such as Google, Bing, and Yahoo, support this format. An example of JSON-LD code is shown below:
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "What is JSON-LD?",
  "author": {
    "@type": "Person",
    "name": "John Smith"
  },
  "datePublished": "2025-12-01"
}
</script>
When implementing structured data, follow the Schema.org protocol and use the Schema.org validator to check the correctness of your markup. Some structured data types can also enable rich snippets in Google search results.
Note that Google's requirements for structured data for rich snippets differ slightly from the Schema.org standard: often, more fields need to be specified than the Schema.org protocol requires. So, if you want to earn a rich snippet, follow Google's structured data guidelines. You can check the implementation using Google's Rich Results Test.
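As an illustration, here is a sketch of Product markup with the offer and rating fields Google typically expects for product rich results; the product, URLs, and values are hypothetical:
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Running Shoe",
  "image": "https://example.com/images/shoe.jpg",
  "description": "Lightweight running shoe for daily training.",
  "sku": "SHOE-001",
  "offers": {
    "@type": "Offer",
    "url": "https://example.com/product/running-shoe/",
    "price": "89.99",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.6",
    "reviewCount": "128"
  }
}
</script>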
There are also many microdata generators, but they can only create static code that will not be updated with content changes on the page. Ensuring that the information in the microdata matches what is visible to users on the page is part of Google’s requirements for structured data. If the policy regarding structured data is violated, the page may lose all rich snippets and, in some cases, face manual penalties. Therefore, make sure your microdata is auto-generated and auto-updated.
Content
As part of a technical SEO audit, it is important to evaluate the basic content characteristics: from the structure of headings and meta tags to the presence of alt attributes for images and potential duplicate pages. These elements directly affect both indexing and how search engines perceive the site.
Test your website for full duplicates
Full duplicates occur when identical content is accessible through different URLs on the site. Duplicates dilute ranking signals between URLs and can harm your site's rankings.
The most common types of full duplicates are:
- Accessibility via both HTTP and HTTPS
- Accessibility with and without WWW
- Accessibility with or without a trailing slash
- Accessibility of URLs in uppercase and lowercase
- The page is accessible with file extensions like .html, .htm, .php, .aspx, and without them
- Parameters that do not change the page content, such as UTM tags
- Identical content under different URLs. For example, a product is listed in two categories, accessible via two different URLs. Or the product page accessible with and without the category in the URL.
- Test versions of the site (DEV domain used for development).
To find page duplicates related to URL variations, test the URLs manually and check the server response code for those URL variants. You can use any tool to check the server response codes, such as https://httpstatus.io/. Enter the URL variations and check their accessibility.

Source: httpstatus.io – test of a client's website
To fix issues with variations in HTTP/HTTPS, www/without-www, with/without slash, upper/lower-case, and the accessibility of pages with extensions like .html, .htm, .php, .aspx, and without them, it is necessary to set up a 301 redirect to the preferred version.
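As an illustration, here is a sketch of such redirects in an Apache .htaccess file, assuming HTTPS without www is the preferred version (on nginx, the equivalent is a server block with return 301):
RewriteEngine On
# Redirect HTTP to HTTPS
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
# Redirect www to non-www (assumes non-www is the canonical host)
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^ https://%1%{REQUEST_URI} [L,R=301]
In practice, it is worth combining these conditions so that any variant redirects to the preferred URL in a single hop rather than through a chain.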
When duplicates are found due to the availability of identical content by adding or removing parts of the URL (for example, a product is available in two categories), it is best to reconsider the URL structure and the site structure. For UTM and other parameters, canonicalization can also be a solution. However, it’s important to note that Google treats the canonical tag as a recommendation, and the final decision on which URL to choose remains with Google.
If a test version of the site is found in the Google index, it should be blocked from indexing, and a request for its removal should be sent through Google Search Console.
Resolve partial page duplicates
Partial page duplicates occur when two or more pages on the site contain very similar, but not completely identical content. The most common types of partial duplicates are:
- Sorting pages
- Filter pages
- Pagination pages
- Pages with similar products (e.g., products differ only by color)
- Multiple versions of the site in the same language, but for different regions (e.g., three English sites for the USA, UK, and Australia).
Of course, every site is unique, and during a technical audit, you may identify other cases of duplicated content that require specific solutions. However, the examples above are the most common.
Partial duplicates are typically found during site crawling with various crawlers. Such pages often have repeating URL parameters and may share the same title and H1 as the main category pages.
To eliminate partial duplicates, you cannot set up a redirect, as these pages are needed for the site’s functionality. Below, we will discuss methods for dealing with partial duplicates.
Sorting and Filtering Pages
These pages can be blocked from crawling in the robots.txt file, which helps preserve crawl budget; however, blocked URLs may still end up in the index, especially if links point to them.
You can also block them via the <meta name="robots" content="noindex, nofollow" /> directive, which will prevent these pages from being indexed but will not tell Google that they should not be crawled.
The best approach in this case is to use JavaScript to update the content on the page when the user applies sorting or filters, without generating additional URLs and links to filtering or sorting pages.
Product Variants Available at Different URLs
Ideally, all product variants should be combined on one page, where the user can select the desired color or size via JavaScript without changing the URL. However, if a separate page is used for each variant, a canonical link to the main product page should be specified. Keep in mind, though, that Google may ignore the canonical set by the user, as mentioned earlier.
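For instance, a hypothetical color variant page could point to the main product page like this:
<!-- In the <head> of https://example.com/product/shirt-blue/ -->
<link rel="canonical" href="https://example.com/product/shirt/" />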
Pagination Pages
Pagination pages should not be blocked from indexing. To ensure that Google treats the first page of the category as the main one (a short sketch follows this list):
- Only include the first page in the sitemap.xml file.
- Add a link to the main category page on all pagination pages.
- Add page numbers to the title and H1 of the pagination pages. For example, “White Shirts – Page 2.”
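A minimal sketch of what page 2 of a hypothetical category might contain:
<!-- https://example.com/white-shirts/page-2/ -->
<title>White Shirts – Page 2 | Example Store</title>
...
<h1>White Shirts – Page 2</h1>
<!-- A link back to the main category page -->
<a href="https://example.com/white-shirts/">View all White Shirts</a>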
Pages available in one language but for different regions
In this case, hreflang attributes need to be used. They tell search engines which language and regional version of a page to show users based on their language preferences and location.
There are several ways to implement Hreflang attributes:
- In HTTP headers
- Via tags in the <head> section
- Via tags in sitemap.xml
The easiest method to implement is through tags in the <head> section.
Here are the rules that hreflang attributes implemented via tags in the <head> section should meet:
- The attribute should have the following format: <link rel="alternate" hreflang="language-code-country-code" href="url-of-page" />
- Language and country codes should be valid. To choose the valid code for each language version, please see this page.
- Each language version must list itself as well as all other language versions in its hreflang attributes. This means every page must carry the same set of hreflang attributes.
- Links in hreflang attributes should be absolute and indexable.
An example of the code:
<link rel="alternate" href="https://example.com/en-us/page" hreflang="en-us" />
<link rel="alternate" href="https://example.com/en-gb/page" hreflang="en-gb" />
<link rel="alternate" href="https://example.com/en-us/page" hreflang="x-default" />
Check titles, H1s, H2s, and descriptions for duplicates
Although titles, descriptions, and H1-H6 headers are related to on-page SEO, their analysis within a technical audit can be useful for detecting duplicates.
To analyze them, you can use any crawler that collects these tags.
When duplicate titles, H1-H6 tags, and descriptions are found, analyze the page data and identify the cause of the duplication. This can be due to the availability of the site via both HTTP and HTTPS, duplication of the main category tags on filter pages, or simply a human mistake where these tags were incorrectly filled out.
Optimize alt attributes for images
The alt attribute is an HTML attribute used inside the <img> tag, like this: <img src="image.jpg" alt="Description of image">. Its main purpose is to provide a text description of the image content. This text is shown if the image fails to load and is read aloud by screen readers to assist visually impaired users. Proper, descriptive alt text can help your images rank in image search and improve the overall relevance of the page.
If your website has a lot of visual content, optimizing alt attributes is a more important step than for classic websites that rely on text content.
Crawlers like Screaming Frog, Ahrefs, and Semrush analyze alt attributes, and you can get data about missing or empty alt attributes from them.
You can read more about writing descriptive alt attributes in Google's official documentation.
Website speed, mobile, and user-friendliness
Use the HTTPS protocol
Using the secure HTTPS protocol is essential to ensure the security of data transmission between the user and the server. It not only increases user trust but also has a positive impact on SEO. To check for HTTPS, simply look at the browser’s address bar — a padlock icon should appear.
For a detailed analysis, you can use the SSL Labs service, which will provide a full report on the status of the SSL certificate and identify any potential issues.
It is also important to ensure there is no mixed content — HTTP resources on HTTPS pages. For this analysis, you can use the HTTPS report in Google Search Console, which will show URLs with both HTTP and HTTPS.

Source: Google Search Console, client website
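To illustrate what mixed content looks like, a hypothetical snippet: the first tag triggers a mixed content warning on an HTTPS page, the second is the fix.
<!-- Mixed content: an HTTP resource loaded on an HTTPS page -->
<img src="http://example.com/images/banner.jpg" alt="Banner">
<!-- Fixed: the same resource served over HTTPS -->
<img src="https://example.com/images/banner.jpg" alt="Banner">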
Improve Core Web Vitals
Core Web Vitals is a set of metrics proposed by Google to assess the quality of user experience on a website. These metrics focus on loading speed, interactivity, and visual stability of content on the page. They include three key indicators:
Metric | Description | Optimal Value |
---|---|---|
Largest Contentful Paint (LCP) | Measures the load time of the largest visible element on the page (e.g., image or text). | Less than 2.5 seconds |
Interaction to Next Paint (INP) | Measures how quickly the page responds to user interactions throughout the visit; INP replaced First Input Delay (FID) as a Core Web Vital in March 2024. | Less than 200 milliseconds |
Cumulative Layout Shift (CLS) | Assesses the visual stability of the page, i.e., how much elements move during page load. | Less than 0.1 |
Data collected from real users can be viewed in the Core Web Vitals report in Search Console (aggregated data) or in PageSpeed Insights (for individual tests). While working on Core Web Vitals, keep in mind that you need to identify which problems have the biggest influence on the CWV metrics. For example, when optimizing LCP, you need to determine which of its four parts (TTFB, load delay, load time, or render delay) contributes most to the high LCP value.
In the example below, you can see that there is no need to focus on optimizing TTFB or load time; instead, the effort should go into improving load delay and then render delay.

Source: pagespeed.web.dev – test of the nike.com website (shown as an example; domain blurred)
Ensure your website is mobile-friendly
Mobile-friendliness has become a crucial factor since 2018 when Google shifted to a mobile-first indexing approach. This means that Google now primarily uses the mobile version of a website for ranking and indexing, rather than the desktop version.
In Google Search Console, you can test your pages by clicking “Test Live URL” in the URL Inspection tool and see how the smartphone Googlebot sees them.
Compress images
Image optimization aimed at compressing them without losing quality helps speed up the loading of the website, especially if there is a lot of graphic content on the pages.
Online tools such as TinyPNG or Squoosh can be used to compress images. It’s also worth checking if modern image formats, such as WebP, are being used, as they can significantly reduce file size.
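A sketch of serving WebP with a fallback through the <picture> element; the file paths are hypothetical, and browsers without WebP support will load the JPEG instead:
<picture>
  <source srcset="/images/hero.webp" type="image/webp">
  <img src="/images/hero.jpg" alt="Hero image" width="1200" height="600">
</picture>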
Use CDN for international websites
Using a CDN makes sense if your website serves a wide range of geographically distant regions.
A CDN (Content Delivery Network) distributes the site’s content across servers located closer to users, reducing latency during loading. You can check for CDN usage by examining HTTP request headers in the browser’s developer tools (Network tab), where references to the CDN provider, such as Cloudflare or Akamai, may appear. There are also online tools for testing CDN. CDN configuration is typically done through the hosting panel or CMS.
Use caching
Caching allows browsers and proxy servers to store copies of resources, reducing server load and speeding up loading on subsequent visits. You can check caching correctness in the browser’s developer tools — in the Network section, look at the Cache-Control, Expires, and ETag headers. Google PageSpeed Insights also provides recommendations for caching. It is important that static resources (images, scripts, styles) have proper caching settings, and the server should have the corresponding rules configured (e.g., in .htaccess or nginx configuration). To check caching, you can use online services like GiftOfSpeed.
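For example, a sketch of long-lived caching rules for static assets in an Apache .htaccess file; adjust the types and lifetimes to your site, and on nginx use the expires directive instead:
<IfModule mod_expires.c>
  ExpiresActive On
  # Static assets that rarely change can be cached for a long time
  ExpiresByType image/webp "access plus 1 year"
  ExpiresByType image/jpeg "access plus 1 year"
  ExpiresByType text/css "access plus 1 month"
  ExpiresByType application/javascript "access plus 1 month"
</IfModule>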
Conclusion
A technical audit of a website is not a one-time procedure, but an ongoing process that requires regular attention to the technical factors that can impact its performance and visibility. As each website is unique, the specific focus and frequency of checks will vary. This checklist for a technical SEO audit will help you to ensure that you haven’t forgotten anything important.