Frank Corso

How To Use OpenSearch in Python To Index and Search Data

Published on

A feature or system that many apps and websites use is search. Whether it’s searching a blog, searching a dataset, or being able to filter and search in a report, search is a critical part of many tools.

For simple and small use cases, searching within a database can be enough. But, for projects where the searching might be fairly complex or the data might be very large, you might want to look into using a search engine.

OpenSearch is one of the most popular search tools and, in this article, I will explore how to use it with Python.

What Is OpenSearch?

OpenSearch started as an open-source fork of Elasticsearch and has continued to build out its own features and tools. Both OpenSearch and Elasticsearch are powerful suites of tools for managing and searching data.

They both use a query domain-specific language (DSL) that lets you write queries with filters, sorting, aggregation, fuzzy matching, and more. Additionally, query string searching is built on Apache Lucene, a powerful search library.

Tools such as OpenSearch and Elasticsearch make it easy to set up search functionality in your applications. For example, you can add searching within your SaaS app so users can search across their data. Or, you can use it as an internal layer in places where querying database data might be inefficient.

There are many ways to use OpenSearch, including managed services, such as Amazon OpenSearch Service, or self-hosted instances.

Getting Started With OpenSearch

First, you will need to install the OpenSearch Python client using:

pip install opensearch-py

Next, you will need to run an OpenSearch instance.

For trying it out and testing locally, you can use the OpenSearch Docker image using:

docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=SomeRandomPassword123" opensearchproject/opensearch:latest

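Once the container is running, you can set up the Python client to connect to it. There are many parameters the client accepts, but I normally only use a few. The following snippet shows the SSL settings I use with the default OpenSearch Docker image, which serves HTTPS with a self-signed certificate. For anything beyond local testing, you would want to turn certificate verification back on.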

from opensearchpy import OpenSearch

host = 'localhost'
port = 9200
auth = ('admin', 'SomeRandomPassword123')

client = OpenSearch(
    hosts = [{'host': host, 'port': port}],
    http_compress = True, # enables gzip compression for request bodies
    http_auth = auth,
    use_ssl = True,
    verify_certs = False,
    ssl_assert_hostname = False,
    ssl_show_warn = False
)
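If you would rather not hardcode the password, here is a minimal sketch that builds the same client arguments from environment variables. The function name and the variable names (OPENSEARCH_HOST, OPENSEARCH_PORT, OPENSEARCH_USER, OPENSEARCH_PASSWORD) are my own assumptions, not something OpenSearch defines.

```python
import os


def client_settings(env=None):
    """Build keyword arguments for OpenSearch(...) from environment variables.

    The variable names below are hypothetical; rename them to fit your setup.
    """
    env = os.environ if env is None else env
    return {
        'hosts': [{
            'host': env.get('OPENSEARCH_HOST', 'localhost'),
            'port': int(env.get('OPENSEARCH_PORT', '9200')),
        }],
        'http_auth': (
            env.get('OPENSEARCH_USER', 'admin'),
            env['OPENSEARCH_PASSWORD'],  # no default, so a missing password raises KeyError
        ),
        'http_compress': True,
        'use_ssl': True,
        'verify_certs': False,  # matches the local Docker setup above only
        'ssl_assert_hostname': False,
        'ssl_show_warn': False,
    }
```

With opensearch-py installed, the client can then be created with OpenSearch(**client_settings()).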

Creating an Index

Within OpenSearch and Elasticsearch, you can create different types of indexes. These indexes are independent collections of documents that can be searched. Depending on your needs, you might have just a single index or could have many.

For example, if you have a blog, you might have a single index for posts where each document represents a single post. Or, you might have a larger blog with user functionality, so you might have indexes for users, posts, and comments.

To set up an index, we can use the following code:

index_body = {
  'settings': {
    'index': {
      'number_of_shards': 4
    }
  }
}
client.indices.create(index='blogs-posts', body=index_body)

For indexes, OpenSearch uses shards as the basic unit of storage. The documents in an index are distributed across its shards.

Changing the number of shards after an index is created isn’t trivial, so you’ll want to explore how to identify the right number of shards for your use case. A single shard is often enough for small projects and for testing purposes. For larger indexes, OpenSearch’s guidance is to aim for shards of roughly 10-50 GB.

OpenSearch has a good overview guide on shard sizes: Optimize OpenSearch index shard sizes.
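As a rough back-of-the-envelope check, the size guidance above can be turned into a small calculation. This is only a sketch: the function name is hypothetical, and the 30 GB default is simply a midpoint I picked from the 10-50 GB range, not an official recommendation.

```python
import math


def estimate_primary_shards(expected_index_gb, target_shard_gb=30):
    """Estimate how many primary shards keep each shard near the target size."""
    if expected_index_gb <= 0 or target_shard_gb <= 0:
        raise ValueError('sizes must be positive')
    # Round up so that no shard exceeds the target, and never go below one shard.
    return max(1, math.ceil(expected_index_gb / target_shard_gb))
```

For example, estimate_primary_shards(120) gives 4, while a 5 GB index stays at a single shard. Treat the result as a starting point and adjust based on your own workload.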

Indexing Documents

Now that we have an index, we can start adding documents to it. Documents are represented as JSON objects and can contain any data you want.

When indexing, you will specify the index name, the document, and, optionally, an ID so you can update or delete the document later. Normally, I set the ID to match the record’s primary key in the database.

document = {
    'title': 'My First Post',
    'content': 'An example blog post'
}
client.index(
    index = 'blogs-posts',
    body = document,
    refresh = True, # Forces a refresh so the document is searchable right away
    id = 1
)
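Since the ID is what lets you update a document later, here is a minimal sketch of a partial update using the client's update method. The helper name update_fields is hypothetical; it takes the client as an argument so it works with the client created earlier.

```python
def update_fields(client, index_name, doc_id, changes):
    """Partially update one document, leaving its other fields untouched."""
    return client.update(
        index=index_name,
        id=doc_id,
        body={'doc': changes},  # 'doc' holds only the fields to merge in
        refresh=True,           # make the change searchable right away
    )
```

For example, update_fields(client, 'blogs-posts', 1, {'title': 'My Updated Post'}) would change only the title of the document indexed above.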

So, we can add a few initial, example documents to our index for testing:

documents = [
    {
      'title': 'Creating A Scatterplot Chart In Seaborn',
      'description': 'Scatterplots are a great way to visualize relationships between different dimensions. Learn how to create them in Python using Seaborn.',
      'author': 'Frank Corso',
      'year': 2026
    },
    {
      'title': 'How to Automate Kaggle Dataset Updates Using the Python API',
      'description': 'Maintaining datasets on Kaggle? This tutorial walks you through automating dataset updates using the Kaggle Python API. You\'ll learn how to authenticate, format the required data metadata file, and upload new versions programmatically—perfect for integrating into your data pipelines.',
      'author': 'Frank Corso',
      'year': 2025
    },
    {
      'title': 'Using OpenSearch',
      'description': 'Learn how to use OpenSearch.',
      'author': 'Frank Corso',
      'year': 2026
    }
]

# enumerate() gives each document an ID matching its position in the list (0, 1, 2)
for post_id, post in enumerate(documents):
    client.index(
        index = 'blogs-posts',
        body = post,
        refresh = True,
        id = post_id
    )
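Indexing one document per request is fine for a handful of posts. For larger batches, opensearch-py ships a bulk helper that sends many documents in one request. Here is a minimal sketch of preparing the actions for it; the generator name build_bulk_actions is hypothetical.

```python
def build_bulk_actions(index_name, docs, start_id=0):
    """Yield one bulk action per document, using list position as the ID."""
    for doc_id, doc in enumerate(docs, start=start_id):
        yield {
            '_index': index_name,
            '_id': doc_id,
            '_source': doc,
        }
```

The actions can then be passed to the helper with from opensearchpy import helpers followed by helpers.bulk(client, build_bulk_actions('blogs-posts', documents)).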

Searching Documents

Now that we have some documents, it’s time to start searching for them. OpenSearch uses a query DSL to define the search criteria, which we can set up using a dictionary.

The dictionary can have several fields, but the most common ones are:

  • query - The query itself with its own fields and parameters
  • size - The number of results to return
  • sort (optional) - The order in which to sort the results

Within the query field, we use a dictionary for the query “type” and its parameters. The most commonly used type is “match”, which looks in the field you specify for the word(s) you provide.

To set the query type, we use it as the key in the dictionary and then set the value to be a dictionary with the field to search in and the word(s) to search for.

For example, if we want to search for 5 titles containing the word “OpenSearch”, we can use the following Python code:

query = {
    'size': 5,
    'query': {
      'match': { # This key is the query type being used.
          'title': 'OpenSearch' # The key is the field to search in, and the value is the word to search for
      }
    }
}

results = client.search(
    body = query,
    index = 'blogs-posts'
)

results['hits']['hits']

The search returns a dictionary with several fields, but the most important is the nested hits field, which contains a list of matches. Each match includes the document’s ID and the document itself within the _source field. The hits list will look like this:

[
  {
    "_index": "blogs-posts",
    "_id": "2",
    "_score": 0.43321696,
    "_source": {
        "title": "Using OpenSearch",
        "description": "Learn how to use OpenSearch.",
        "author": "Frank Corso",
        "year": 2026
    }
  }
]
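In application code, you usually only need the ID, the score, and the document from each hit. Here is a minimal sketch of a helper that flattens the response into that shape. The name extract_hits and the output keys are my own choices.

```python
def extract_hits(results):
    """Return a simplified list of matches from a search response."""
    return [
        {
            'id': hit['_id'],
            'score': hit.get('_score'),  # can be None when sorting by a field
            'document': hit['_source'],
        }
        for hit in results['hits']['hits']
    ]
```

Calling extract_hits(results) on the response above would give a one-item list whose document has the title "Using OpenSearch".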

Now, you will often need to sort the results by a specific field. So, we can add a sort field to the query dictionary.

In the following example, I am searching for posts with the word “Python” in the description and sorting them by year.

query = {
    'size': 5,
    'query': {
      'match': {
          'description': 'Python' 
      }
    },
    'sort': [
        {
            'year': {
                'order': 'asc'
            }
        }
    ]
}

results = client.search(
    body = query,
    index = 'blogs-posts'
)
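The sort list can hold more than one entry, and later entries break ties in earlier ones. As a sketch, this hypothetical helper sorts newest first and then by relevance within the same year. It assumes the year field from the sample documents.

```python
def newest_first_query(field, text, size=5):
    """Build a match query sorted by year (descending), then by relevance."""
    return {
        'size': size,
        'query': {
            'match': {field: text}
        },
        'sort': [
            {'year': {'order': 'desc'}},  # newest posts first
            '_score',                     # ties within a year ordered by relevance
        ],
    }
```

It can be passed to the client the same way as the other queries: client.search(body=newest_first_query('description', 'Python'), index='blogs-posts').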

OpenSearch also supports fuzzy matching, which allows for typos and misspellings. We can enable this using the fuzziness parameter which can accept several values but the easiest is AUTO.

In the following example, I am searching for posts with the word “Ptyhon” in the description. This will still return the same posts from the previous example, thanks to fuzzy matching.

Note that when we add parameters to a field, its value becomes a dictionary, and the search text moves into that dictionary’s query key.

query = {
    'size': 5,
    'query': {
      'match': {
          'description': {
              'query': 'Ptyhon',
              'fuzziness': 'AUTO'
          }
      }
    }
}

results = client.search(
    body = query,
    index = 'blogs-posts'
)
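To get a feel for what AUTO does, here is a small sketch that mirrors its documented default thresholds: very short terms must match exactly, medium terms allow one edit, and longer terms allow two. This is only an illustration of the rule, not code that OpenSearch itself runs, and the function name is hypothetical.

```python
def auto_fuzziness_edits(term):
    """Return the number of edits AUTO fuzziness allows for a term, by length."""
    length = len(term)
    if length <= 2:
        return 0  # 1-2 characters: must match exactly
    if length <= 5:
        return 1  # 3-5 characters: one edit allowed
    return 2      # 6 or more characters: two edits allowed
```

"Ptyhon" has six characters, so up to two edits are allowed, which is more than enough to cover the swapped letters.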

Something to keep in mind is that the query string here can have multiple words, but the default is to combine them with an OR operator. For example, if you search for “Python Kaggle”, OpenSearch will return documents that contain “Python” or “Kaggle”.

We can change this to an AND operator by adding the operator parameter. Then, OpenSearch will only return results that contain both words.

query = {
    'size': 5,
    'query': {
      'match': {
          'description': {
              'query': 'Ptyhon Kaggle',
              'fuzziness': 'AUTO',
              'operator': 'AND'
          }
      }
    }
}

results = client.search(
    body = query,
    index = 'blogs-posts'
)
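Since the shape of the match clause changes depending on whether you pass extra parameters, a small builder can keep that detail in one place. This is a minimal sketch with a hypothetical name, match_clause, covering only the two parameters used in this article.

```python
def match_clause(field, text, fuzziness=None, operator=None):
    """Build a match clause, using the expanded form only when options are set."""
    options = {}
    if fuzziness is not None:
        options['fuzziness'] = fuzziness
    if operator is not None:
        options['operator'] = operator
    if not options:
        # Short form: the field maps directly to the search text.
        return {'match': {field: text}}
    # Expanded form: the search text moves under the 'query' key.
    return {'match': {field: {'query': text, **options}}}
```

The query above could then be written as {'size': 5, 'query': match_clause('description', 'Ptyhon Kaggle', fuzziness='AUTO', operator='AND')}.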

Searching a single field works well, but you will often want to search across many fields in a document instead of just one, such as in a global search feature. To do this, we can change our query type to multi_match.

Multi-match allows us to specify multiple fields to search in and will return results that match any of the fields.

query = {
    'size': 5,
    'query': {
      'multi_match': {
          'query': 'Ptyhon',
          'fields': ['title', 'description'],
          'fuzziness': 'AUTO'
      },
    }
}

results = client.search(
    body = query,
    index = 'blogs-posts'
)
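In a global search, a match in the title is often more meaningful than one in the description. The fields list accepts a boost suffix, such as title^2, to weight a field more heavily. Boosting is not covered above, so treat this as an optional sketch; the helper name and the boost values are assumptions.

```python
def multi_match_query(text, field_boosts, size=5, fuzziness='AUTO'):
    """Build a multi_match query from a mapping of field name to boost."""
    fields = [
        name if boost == 1 else f'{name}^{boost}'  # e.g. 'title^2'
        for name, boost in field_boosts.items()
    ]
    return {
        'size': size,
        'query': {
            'multi_match': {
                'query': text,
                'fields': fields,
                'fuzziness': fuzziness,
            }
        },
    }
```

For example, multi_match_query('Ptyhon', {'title': 2, 'description': 1}) searches both fields but counts title matches twice as much.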

Deleting Documents

As your app creates and updates data, you may need to delete a document from the index. To do this, we can use the delete method as shown below.

client.delete(
    index = 'blogs-posts',
    id = '2'
)
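Deleting an ID that is not in the index raises an error, which can be awkward when your app is not sure whether a record was ever indexed. Here is a minimal sketch that checks first using the client's exists method. The helper name is hypothetical, and the check and the delete are two separate requests, so it is not safe against concurrent deletes.

```python
def delete_if_present(client, index_name, doc_id):
    """Delete a document only if it exists; return True when something was deleted."""
    if not client.exists(index=index_name, id=doc_id):
        return False
    client.delete(index=index_name, id=doc_id)
    return True
```

For example, delete_if_present(client, 'blogs-posts', '2') returns True the first time and False if called again.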

Deleting the Index

Finally, after testing or in rare cases in production, you can delete the index using the indices.delete method as shown below.

client.indices.delete(
    index = 'blogs-posts'
)

Next Steps

Great! Now you know the basics of using OpenSearch with Python. From here, you can explore some more advanced features and use cases, such as: