+------------------------+
| Metadata               |
+----------+-------------+
| Template | guides      |
+----------+-------------+
| Image    | ![][image0] |
+----------+-------------+
| Category | Build       |
+----------+-------------+

+-------------------+
| Section Metadata  |
+---------+---------+
| style   | content |
+---------+---------+

![][image0]

# Indexing

Adobe Experience Manager can keep an index of all published pages in a particular section of your website. This index is commonly used to build lists and feeds, and to enable search and filtering for your pages or content fragments.

When you use Google Drive or SharePoint as the backend, AEM keeps this index in a spreadsheet and makes it available as JSON. See [Spreadsheets and JSON](https://main--helix-website--adobe.hlx.page/developer/spreadsheets) for more information.

## Setting up an initial index with the Index Admin Tool

The easiest way to create and manage your query index is via the **[Index Admin tool](https://tools.aem.live/tools/index-admin/index.html)**.

1. Open the Index Admin tool in your browser.
2. Enter your organization and site to connect to your project.
3. Click **Add Index** to create an initial index configuration.
4. Enter a **Name** for your index.
5. Locate the **Properties** section.
6. Add each property you want extracted from the rendered HTML page, for example `title`, `image`, `description`, or `lastModified`.
7. When you’re done, click **Save**.

The following table summarizes the available properties and where each one is extracted from.

+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table                                                                                                                                                     |
+===========================================================================================================================================================+
| +----------------+--------------------------------------------------------------------------------------------------------------------------------------+ |
| | Name           | Description                                                                                                                          | |
| +================+======================================================================================================================================+ |
| | `author`       | Returns the content of the meta tag named `author` in the `head` element.                                                            | |
| +----------------+--------------------------------------------------------------------------------------------------------------------------------------+ |
| | `title`        | Returns the content of the `og:title` meta property in the `head` element.                                                           | |
| +----------------+--------------------------------------------------------------------------------------------------------------------------------------+ |
| | `date`         | Returns the content of the meta tag named `publication-date` in the `head` element.                                                  | |
| +----------------+--------------------------------------------------------------------------------------------------------------------------------------+ |
| | `image`        | Returns the content of the `og:image` meta property in the `head` element.                                                           | |
| +----------------+--------------------------------------------------------------------------------------------------------------------------------------+ |
| | `category`     | Returns the content of the meta tag named `category` in the `head` element.                                                          | |
| +----------------+--------------------------------------------------------------------------------------------------------------------------------------+ |
| | `tags`         | Returns the content of the meta tag named `article:tag` in the `head` element as an array.                                           | |
| |                |                                                                                                                                      | |
| |                | See the document [Spreadsheets and JSON](https://www.hlx.live/developer/spreadsheets#arrays) for more information on array-handling. | |
| +----------------+--------------------------------------------------------------------------------------------------------------------------------------+ |
| | `description`  | Returns the content of the meta tag named `description` in the `head` element.                                                       | |
| +----------------+--------------------------------------------------------------------------------------------------------------------------------------+ |
| | `robots`       | Returns the content of the meta tag named `robots` in the `head` element.                                                            | |
| +----------------+--------------------------------------------------------------------------------------------------------------------------------------+ |
| | `lastModified` | Returns the value of the `Last-Modified` response header for the document.                                                           | |
| +----------------+--------------------------------------------------------------------------------------------------------------------------------------+ |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
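To make the table above more concrete, here is a minimal sketch of how the `meta`-based properties could be read from a page's HTML. It is an illustration only, not the indexer's actual implementation: `META_SOURCES`, `HeadMetaParser`, and `extract_record` are hypothetical names, the sample HTML is made up, and `tags` and `lastModified` are left out because they need array handling and a response header respectively.

```python
from html.parser import HTMLParser

# Hypothetical mapping from index property name to the (attribute, value)
# pair identifying the <meta> tag it is read from, following the table above.
META_SOURCES = {
    "author": ("name", "author"),
    "title": ("property", "og:title"),
    "date": ("name", "publication-date"),
    "image": ("property", "og:image"),
    "category": ("name", "category"),
    "description": ("name", "description"),
    "robots": ("name", "robots"),
}


class HeadMetaParser(HTMLParser):
    """Collects the content of the <meta> tags listed in META_SOURCES.

    For brevity this sketch does not check that the tag sits inside <head>.
    """

    def __init__(self):
        super().__init__()
        self.record = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        for prop, (attr_name, attr_value) in META_SOURCES.items():
            if attributes.get(attr_name) == attr_value:
                self.record[prop] = attributes.get("content", "")


def extract_record(html: str) -> dict:
    """Return the index-like record found in the given HTML string."""
    parser = HeadMetaParser()
    parser.feed(html)
    return parser.record


# Made-up sample page for illustration.
sample = """
<html><head>
  <meta property="og:title" content="Hello World">
  <meta name="description" content="A sample page">
  <meta name="robots" content="noindex">
</head><body></body></html>
"""
print(extract_record(sample))
```

Properties whose `meta` tag is missing from the page are simply absent from the resulting record in this sketch.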

## Reindexing

After setting up or updating your properties, click **Reindex** in the Index Admin tool. This triggers a full reindex of your content against the new configuration.

Pages are indexed when they are published. To remove a page from the index, unpublish it.

## Setting up an initial index via Admin API

You can also create an index using the Admin API. For details, see [Update Indexing Configuration](https://www.aem.live/docs/config-service-setup#update-indexing-configuration).

## Troubleshooting

### Check your index

The Admin Service provides an API endpoint that returns the index representation of a page. Given your organization, site, branch, and the resource path of the page, the endpoint is:

`https://admin.hlx.page/index/<org>/<site>/<branch>/<path>`

The JSON response contains a `data` node that holds the index representation of the page.
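As a small convenience, the endpoint pattern above can be assembled programmatically. The following is a sketch only: `index_check_url` is a hypothetical helper, and the organization, site, and path values are placeholders rather than a real project.

```python
from urllib.parse import quote

ADMIN_INDEX_BASE = "https://admin.hlx.page/index"


def index_check_url(org: str, site: str, branch: str, path: str) -> str:
    """Build the index-check URL for a page (hypothetical helper)."""
    # Drop the leading slash so the path joins without a double slash,
    # and percent-encode anything that is not URL-safe.
    resource = quote(path.lstrip("/"), safe="/")
    return f"{ADMIN_INDEX_BASE}/{quote(org)}/{quote(site)}/{quote(branch)}/{resource}"


# Placeholder values for illustration.
print(index_check_url("acme", "website", "main", "/blog/first-post"))
```

You can open the resulting URL in a browser or request it with any HTTP client to inspect the response.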

### Debug your index configuration

The AEM CLI can print the index record whenever you change your query configuration, which helps you find the correct CSS selectors:

`$ aem up --print-index`

See the [AEM CLI GitHub documentation](https://github.com/adobe/helix-cli) for more information, and watch this [video](https://www.hlx.live/media_15501e6b07101255cb6ef99077f218666809ec354.mp4) to learn more about the feature.

### Inspect the audit log

Using the [Log Viewer Tool](https://tools.aem.live/tools/log-viewer/index.html), you can check whether the indexer reports any errors related to your configuration. Filter the logs by `Indexer`. If you see lines marked with a red dot, expand them to inspect the reported error.

## Custom index definitions

See the [Indexing reference](https://www.aem.live/docs/indexing-reference) for the full syntax of index definitions, including extraction functions and examples.

## Omitting published pages from the index

A common use case is to exclude pages whose `robots` metadata contains `noindex`. For SharePoint and Google Drive content sources, you can filter those pages out by defining a `FILTER` expression in a sheet called `helix-default` that is based on the `raw_index` sheet. With BYOM content sources, it is not possible to omit pages from the index. Instead, you can filter them client-side based on the value of their `robots` property.

Note that a sitemap always ignores published pages with `noindex`, provided a column named `robots` exists in the index source.
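The client-side filtering mentioned above for BYOM content sources could look like the following sketch. It assumes each index record is a dictionary with an optional `robots` string; the function names and the sample records are hypothetical.

```python
def is_indexable(record: dict) -> bool:
    """Return False when the record's robots value contains noindex."""
    # The robots value may hold several comma-separated directives,
    # for example "noindex, nofollow".
    directives = [d.strip().lower() for d in str(record.get("robots", "")).split(",")]
    return "noindex" not in directives


def visible_records(records: list) -> list:
    """Keep only the records that should appear in lists or search results."""
    return [record for record in records if is_indexable(record)]


# Made-up index records for illustration.
records = [
    {"path": "/blog/first-post", "robots": ""},
    {"path": "/drafts/internal", "robots": "noindex, nofollow"},
    {"path": "/blog/second-post"},
]
print([record["path"] for record in visible_records(records)])
```

Records without a `robots` value are treated as indexable in this sketch.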

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Pagination (Contained)                                                                                                                                                |
+--------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| :icon-arrow: Previous                                                    | Up Next :icon-arrow:                                                                       |
|                                                                          |                                                                                            |
| ### [Forms](https://main--helix-website--adobe.hlx.page/developer/forms) | ### [Keeping it 100](https://main--helix-website--adobe.hlx.page/developer/keeping-it-100) |
+--------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+

[image0]: https://main--helix-website--adobe.aem.page/media_154896ddb0d10ee236adc3592217d30238ede804c.jpg#width=1103&height=828
