Python client logic for connecting the echogarden web socket #50

Open
iAmIlluminati opened this issue May 5, 2024 · 1 comment
Labels
question Further information is requested

Comments

@iAmIlluminati

I am trying to run alignment. The Node client works, but I am having trouble doing the same thing in Python. I tried sending the audio as base64 data, which broke, and I couldn't pass raw bytes because they can't be serialized to JSON.

import asyncio
import websockets
import json
from pydub import AudioSegment


async def align_audio():
    async with websockets.connect('ws://localhost:45054') as ws:
        print('Connected to the WebSocket server')

        audio = AudioSegment.from_file(
            './audio-files/event_24ce4d18-feb0-4de6-acf4-0eac5ae9ff03.mp3', format='mp3')

        raw_audio = audio.raw_data
        sample_rate = audio.frame_rate
        channels = audio.channels

        request = {
            'messageType': 'AlignmentRequest',
            'requestId': 'some-unique-request-id',
            'input': {
                'audioChannels': [raw_audio],
                'sampleRate': sample_rate
            },
            'transcript': 'Watson, we have a most intriguing case on our hands. The Tinderbox Killer has struck again, leaving behind his chilling signature - a small wooden matchbox containing a single match.',
            'options': {
                'language': 'en-US'
            }
        }

        await ws.send(json.dumps(request))

        response = await ws.recv()
        alignment_result = json.loads(response)
        print('Alignment result:', alignment_result)

        with open('alignment_result.json', 'w') as file:
            json.dump(alignment_result, file)
            print('Alignment result saved to alignment_result.json')

        print('Disconnected from the WebSocket server')

if __name__ == '__main__':
    asyncio.run(align_audio())

@rotemdan
Member

rotemdan commented May 5, 2024

The server documentation says:

The protocol is based on binary WebSocket messages, for both request and response objects. Messages are encoded using the MessagePack encoding scheme.

The MessagePack page includes details about how to encode and decode it in many languages, including Python. For Python, it links to this library:

https://github.com/msgpack/msgpack-python

msgpack-python gives this basic usage example:

>>> import msgpack
>>> msgpack.packb([1, 2, 3])
'\x93\x01\x02\x03'
>>> msgpack.unpackb(_)
[1, 2, 3]
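
To tie this back to the error in the question, here is a minimal sketch of why `json.dumps` rejects the audio bytes while MessagePack accepts them. The `payload` dict is a made-up example, not the server's request format. The `use_bin_type` and `raw` arguments are passed explicitly so that older msgpack-python releases behave the same way as current ones.

```python
import json

import msgpack

# Hypothetical payload: a few raw bytes standing in for audio data.
payload = {"audio": b"\x00\x01\x02\x03", "sampleRate": 16000}

# JSON has no binary type, so the standard encoder rejects bytes values.
try:
    json.dumps(payload)
    json_ok = True
except TypeError:
    json_ok = False

# MessagePack has a native binary type, so the same dict packs directly.
packed = msgpack.packb(payload, use_bin_type=True)

# raw=False decodes MessagePack strings back to str and leaves binary as bytes.
restored = msgpack.unpackb(packed, raw=False)

print("JSON accepted the payload:", json_ok)
print("MessagePack round trip matches:", restored == payload)
```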

Edit: some extra information

Apply the MessagePack encoding over your entire message, and in the same way, decode the entire binary message sent back from the WebSocket using MessagePack.
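
As a rough sketch of what that could look like in Python: the field names below are copied from the code in the question and are not checked against the server documentation, and passing the encoded file bytes directly as `input` is an assumption about what the server accepts. The helper names (`encode_message`, `decode_message`, `build_alignment_request`) and the file path in the usage section are made up for illustration.

```python
import asyncio
import uuid

import msgpack


def encode_message(message: dict) -> bytes:
    """Pack the whole request object into one MessagePack payload."""
    return msgpack.packb(message, use_bin_type=True)


def decode_message(data: bytes) -> dict:
    """Unpack one binary WebSocket message into a Python object."""
    return msgpack.unpackb(data, raw=False)


def build_alignment_request(audio_bytes: bytes, transcript: str) -> dict:
    # Field names are taken from the question, not from the server docs.
    # Sending the encoded file bytes as 'input' is an assumption.
    return {
        'messageType': 'AlignmentRequest',
        'requestId': str(uuid.uuid4()),
        'input': audio_bytes,
        'transcript': transcript,
        'options': {'language': 'en-US'},
    }


async def align_audio(path: str, transcript: str) -> dict:
    # Imported here so the helpers above can be used without the
    # network dependency installed.
    import websockets

    with open(path, 'rb') as f:
        audio_bytes = f.read()

    async with websockets.connect('ws://localhost:45054') as ws:
        request = build_alignment_request(audio_bytes, transcript)
        # Sending a bytes object makes websockets emit a binary frame.
        await ws.send(encode_message(request))
        # The reply is assumed to arrive as a single binary message.
        return decode_message(await ws.recv())


if __name__ == '__main__':
    # Hypothetical path and transcript; replace with your own.
    result = asyncio.run(align_audio('./audio.mp3', 'Hello world.'))
    print(result)
```

The point is that nothing is serialized field by field: the whole dict goes through `msgpack.packb` once, and the whole reply goes through `msgpack.unpackb` once.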

I chose to use a full binary encoding of the entire message since it is more flexible and easier to implement than serializing and deserializing individual object properties. It's also more uniform: all WebSocket messages are binary, so there's no need to handle text messages at all.

rotemdan added the question label on Jul 4, 2024