PyMongo: converting BSON to a dict is not working

My code:

import os

import bson
from bson.codec_options import CodecOptions

bson_codec_options = CodecOptions(
    datetime_conversion='DATETIME_AUTO',
    tz_aware=True,
    unicode_decode_error_handler='ignore'
)
if __name__ == '__main__':
    current_dir = os.getcwd()
    bson_files = []

    for file in os.listdir(current_dir):
        current_item = os.path.join(current_dir, file)
        if os.path.isfile(current_item) and file.endswith(".bson"):
            bson_files.append(file)

    for file in bson_files:
        
        with open(os.path.join(current_dir, file), 'rb') as fcache:
            data = bson.decode(fcache.read(), codec_options=bson_codec_options)
            print(data)

I am trying to convert BSON to JSON, but I get this error:

File "C:\PyProjects\zulip\.venv\Lib\site-packages\bson\__init__.py", line 1094, in decode
return cast("Union[dict[str, Any], _DocumentType]", _bson_to_dict(data, opts))
                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'str' object cannot be interpreted as an integer

Has anyone encountered this error before? How did you fix it?

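The error does not come from the contents of the file. Malformed or truncated BSON normally makes bson.decode raise bson.errors.InvalidBSON, not TypeError. The message "'str' object cannot be interpreted as an integer" means that an option which must be an integer was given a string, and the only candidate in your code is datetime_conversion='DATETIME_AUTO'. CodecOptions expects a member of the integer-backed DatetimeConversion enum there, i.e. DatetimeConversion.DATETIME_AUTO. As your traceback shows, the string is accepted when the options object is built and only fails later, inside decode, when the decoder tries to read it as a number.

A separate point to keep in mind: bson.decode decodes exactly one BSON document. Files that contain several documents concatenated back to back (for example, collection dumps written by mongodump) need bson.decode_all or one of the iterator helpers instead.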
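The wording of the message is itself a clue. CPython raises exactly this TypeError whenever a string reaches code that needs an integer, whatever the library. The sketch below reproduces it with the standard library only. Mode is a hypothetical stand-in for an integer-backed enum such as DatetimeConversion, and its value is chosen purely for illustration.

```python
import enum
import operator


class Mode(enum.IntEnum):
    """Hypothetical stand-in for an integer-backed options enum."""
    AUTO = 4  # illustrative value, not taken from any library


def as_integer(value):
    """Convert value the way integer-only code does; strings are rejected."""
    return operator.index(value)


def describe(value):
    """Return the converted integer, or the TypeError text if conversion fails."""
    try:
        return as_integer(value)
    except TypeError as exc:
        return str(exc)


print(describe(Mode.AUTO))  # the enum member converts cleanly
print(describe("AUTO"))     # the string fails with the message from the traceback
```

The enum member passes because it is an int underneath, while the string with the same name does not. That is why the same-looking option value can work or fail depending on how it is spelled.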

Here’s how you can resolve the issue:


Updated Code

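import os
import bson
from bson.codec_options import CodecOptions, DatetimeConversion

bson_codec_options = CodecOptions(
    # Must be an enum member; the string 'DATETIME_AUTO' triggers the TypeError
    datetime_conversion=DatetimeConversion.DATETIME_AUTO,
    tz_aware=True,
    unicode_decode_error_handler='ignore'
)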

if __name__ == '__main__':
    current_dir = os.getcwd()
    bson_files = []

    # Collect all BSON files in the current directory
    for file in os.listdir(current_dir):
        current_item = os.path.join(current_dir, file)
        if os.path.isfile(current_item) and file.endswith(".bson"):
            bson_files.append(file)

    # Decode BSON files
    for file in bson_files:
        with open(os.path.join(current_dir, file), 'rb') as fcache:
            bson_data = fcache.read()  # Read the raw binary data

            try:
                # Use bson.decode() to handle a single BSON document
                document = bson.decode(bson_data, codec_options=bson_codec_options)
                print(document)
            except Exception as e:
                print(f"Error decoding file {file}: {e}")

Explanation of Fixes

  1. Enum for datetime_conversion:
    datetime_conversion must be DatetimeConversion.DATETIME_AUTO (imported from bson.codec_options), not the string 'DATETIME_AUTO'. This is the change that removes the TypeError. bson.decode() itself was already being given the right kind of input, namely the raw bytes of the file.
  2. Error Handling:
    Added a try block to catch any decoding issues for individual files and log errors without crashing the entire script.
  3. Assumption of Single Document:
    If your BSON files contain a single document, the above code should work. However, if your files contain multiple BSON documents or a custom format (e.g., a BSON array), you will need to process the binary data accordingly (see below).

For BSON Files Containing Multiple Documents

If your BSON files contain multiple documents (for example, collection dumps written by mongodump), decode them with bson.decode_all, which returns a list of documents. It accepts the same codec_options argument as bson.decode:

import bson

# Decode multiple BSON documents from binary data
with open('your_bson_file.bson', 'rb') as f:
    bson_data = f.read()
    try:
        documents = bson.decode_all(bson_data)
        for doc in documents:
            print(doc)
    except Exception as e:
        print(f"Error decoding multiple documents: {e}")
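bson.decode_all needs the whole file in memory. For large dumps PyMongo also provides bson.decode_file_iter, which yields documents one at a time from an open file. To show what such a reader has to do, here is a standard-library sketch that splits a stream into raw documents. It is not PyMongo's implementation, and iter_raw_documents is a hypothetical helper name. It relies only on the BSON framing rule that every document starts with its total size as a little-endian int32 and ends with a zero byte.

```python
import io
import struct


def iter_raw_documents(stream):
    """Yield each BSON document in a binary stream as raw bytes."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of file
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        (size,) = struct.unpack("<i", header)
        if size < 5:
            # 4 size bytes + 1 terminator is the smallest possible document
            raise ValueError(f"impossible document size: {size}")
        body = stream.read(size - 4)
        if len(body) != size - 4 or body[-1] != 0:
            raise ValueError("truncated or unterminated document")
        yield header + body


# Two hand-built documents: {} and {"a": 1} (int32 value).
empty_doc = b"\x05\x00\x00\x00\x00"
a_is_one = b"\x0c\x00\x00\x00\x10a\x00\x01\x00\x00\x00\x00"
chunks = list(iter_raw_documents(io.BytesIO(empty_doc + a_is_one)))
print(len(chunks))
```

Each yielded chunk is a complete single document, so it could be passed to bson.decode on its own. With a real file, open it in 'rb' mode and pass the file object instead of the BytesIO used here.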

Common Causes of Decoding Errors

  1. Wrong Option Type: A string where CodecOptions expects an enum member, such as datetime_conversion='DATETIME_AUTO', produces the TypeError shown in the question.
  2. Improper File Format: The .bson file might not actually be BSON-encoded. Verify the format by inspecting the file or ensuring it originates from a BSON-compliant source (e.g., MongoDB). This normally surfaces as bson.errors.InvalidBSON rather than TypeError.
  3. Partial Reads: Open the file in binary mode ('rb') and read it completely. Text mode or a truncated read corrupts the data.
  4. Invalid Content: Corrupted files, or non-BSON files renamed with a .bson extension, also fail with InvalidBSON.

Debugging Tips

  • Inspect the File: Use a hex editor or a tool like xxd to inspect the binary content of the .bson file and verify its structure.
  • Validate the Source: Ensure the .bson files were created using a BSON-compliant encoder (e.g., MongoDB).
  • Test with Smaller Files: Create a small test BSON document, encode it using bson.encode(), and then try decoding it with your script.
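Before reaching for a hex editor, a quick programmatic check of the first length prefix can already say a lot. The following is a rough sketch using only the standard library. classify_bson_bytes is a hypothetical helper, and its verdicts are heuristics based on the BSON framing (little-endian int32 size first, zero byte last), not a full validation.

```python
import struct


def classify_bson_bytes(data):
    """Guess the layout of raw bytes from the first BSON length prefix."""
    if len(data) < 5:
        return "too short to be BSON"
    # The first four bytes declare the size of the first document.
    (declared,) = struct.unpack_from("<i", data, 0)
    if declared < 5 or declared > len(data):
        return "not BSON or truncated"
    # The last byte of the first document must be the zero terminator.
    if data[declared - 1] != 0:
        return "not BSON or truncated"
    if declared == len(data):
        return "single document"
    return "several documents or trailing data"


print(classify_bson_bytes(b"\x05\x00\x00\x00\x00"))
print(classify_bson_bytes(b'{"a": 1}'))
```

A result of "single document" suggests bson.decode is appropriate, while "several documents or trailing data" points towards bson.decode_all. A JSON or text file renamed to .bson typically lands in "not BSON or truncated" because its first four bytes decode to an implausible size.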

If you’re still facing issues, provide more details about the source of the .bson file or its expected structure.