chore: add speech to text button integration with MEAI #334

Open. Wants to merge 2 commits into base: master.
@@ -0,0 +1,16 @@
# SpeechToTextIntegration Demo

This project demonstrates how to integrate the Telerik UI for Blazor `SpeechToTextButton` component with a transcription model, such as OpenAI's `whisper-1`, through the `ISpeechToTextClient` abstraction from `Microsoft.Extensions.AI`. It provides a simple Blazor UI that records audio in the browser and transcribes it to text, showing how to connect the UI component to a backend speech-to-text service.

## Main Purpose
- **Showcase**: Illustrates how to use the Telerik `SpeechToTextButton` in a Blazor application.
- **Integration**: Demonstrates sending recorded audio to a transcription model (e.g., OpenAI Whisper) and displaying the transcribed text in the UI.
- **Extensibility**: Serves as a starting point for integrating other speech-to-text models or services.

## Configuration Notes
- **Model Registration**: The setup for registering a transcription model (such as OpenAI Whisper or others) may vary. Refer to the specific model's documentation for registration and authentication steps.
- **Audio Recording**: The requirements for the recorded audio (file size, type, encoding, etc.) depend on the chosen transcription model. The sample records with the browser's `MediaRecorder`, whose container and codec vary by browser, so verify that the format actually produced matches the model's expected input.
- **Customization**: You may need to adjust the audio recording logic or backend integration to support different models or to optimize for accuracy and performance.
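
As a rough illustration of the audio recording note above, the sketch below picks a recording format and checks the payload size before it is sent for transcription. The helper names, the candidate list, and the 25 MB limit are assumptions made for this example and are not part of the sample; replace them with the values documented for your model.

```javascript
// Candidate formats in order of preference (an assumption; reorder them
// to match what your transcription model accepts).
const CANDIDATE_TYPES = [
  "audio/webm;codecs=opus",
  "audio/webm",
  "audio/ogg;codecs=opus",
  "audio/mp4",
];

// Placeholder upload limit; use the limit documented for your model.
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024;

// Returns the first supported candidate, or "" so the browser can fall
// back to its default. `isTypeSupported` is injected so the helper also
// runs outside a browser.
function pickRecordingMimeType(candidates, isTypeSupported) {
  return candidates.find((type) => isTypeSupported(type)) ?? "";
}

// Rejects empty recordings and recordings above the upload limit.
function isWithinUploadLimit(byteLength, maxBytes = MAX_UPLOAD_BYTES) {
  return byteLength > 0 && byteLength <= maxBytes;
}
```

In a page, the predicate would typically be `(type) => MediaRecorder.isTypeSupported(type)`, and a non-empty result can be passed to the recorder as `new MediaRecorder(stream, { mimeType })`.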

---
For more details, see the source code and comments in the `Home.razor` component.
@@ -0,0 +1,31 @@

Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio Version 17
VisualStudioVersion = 17.14.36109.1 d17.14
MinimumVisualStudioVersion = 10.0.40219.1
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "SpeechToTextIntegration", "SpeechToTextIntegration\SpeechToTextIntegration.csproj", "{3F2BEC52-4F23-42C6-8791-3DC6CA813DB1}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Telerik.Blazor", "..\..\..\..\blazor\Telerik.Blazor\Telerik.Blazor.csproj", "{AF9263B3-0FD2-6644-74FE-84A802165E95}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Release|Any CPU = Release|Any CPU
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{3F2BEC52-4F23-42C6-8791-3DC6CA813DB1}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{3F2BEC52-4F23-42C6-8791-3DC6CA813DB1}.Debug|Any CPU.Build.0 = Debug|Any CPU
{3F2BEC52-4F23-42C6-8791-3DC6CA813DB1}.Release|Any CPU.ActiveCfg = Release|Any CPU
{3F2BEC52-4F23-42C6-8791-3DC6CA813DB1}.Release|Any CPU.Build.0 = Release|Any CPU
{AF9263B3-0FD2-6644-74FE-84A802165E95}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{AF9263B3-0FD2-6644-74FE-84A802165E95}.Debug|Any CPU.Build.0 = Debug|Any CPU
{AF9263B3-0FD2-6644-74FE-84A802165E95}.Release|Any CPU.ActiveCfg = Release|Any CPU
{AF9263B3-0FD2-6644-74FE-84A802165E95}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {1E0CB172-1F2C-4A5B-8DC3-67C1D8A23B53}
EndGlobalSection
EndGlobal
@@ -0,0 +1,22 @@
<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <base href="/" />
    <link rel="stylesheet" href="bootstrap/bootstrap.min.css" />
    <link rel="stylesheet" href="app.css" />
    <link rel="stylesheet" href="SpeechToTextIntegration.styles.css" />
    <link rel="icon" type="image/png" href="favicon.png" />
    <link rel="stylesheet" href="_content/Telerik.UI.for.Blazor/css/kendo-theme-default/all.css" />
    <script src="_content/Telerik.UI.for.Blazor/js/telerik-blazor.js" defer></script>
    <HeadOutlet @rendermode="InteractiveServer" />
</head>

<body>
    <Routes @rendermode="InteractiveServer" />
    <script src="_framework/blazor.web.js"></script>
</body>

</html>
@@ -0,0 +1,15 @@
@inherits LayoutComponentBase

<div class="page">
    <main>
        <article class="content px-4">
            @Body
        </article>
    </main>
</div>

<div id="blazor-error-ui">
    An unhandled error has occurred.
    <a href="" class="reload">Reload</a>
    <a class="dismiss">🗙</a>
</div>
@@ -0,0 +1,96 @@
.page {
    position: relative;
    display: flex;
    flex-direction: column;
}

main {
    flex: 1;
}

.sidebar {
    background-image: linear-gradient(180deg, rgb(5, 39, 103) 0%, #3a0647 70%);
}

.top-row {
    background-color: #f7f7f7;
    border-bottom: 1px solid #d6d5d5;
    justify-content: flex-end;
    height: 3.5rem;
    display: flex;
    align-items: center;
}

.top-row ::deep a, .top-row ::deep .btn-link {
    white-space: nowrap;
    margin-left: 1.5rem;
    text-decoration: none;
}

.top-row ::deep a:hover, .top-row ::deep .btn-link:hover {
    text-decoration: underline;
}

.top-row ::deep a:first-child {
    overflow: hidden;
    text-overflow: ellipsis;
}

@media (max-width: 640.98px) {
    .top-row {
        justify-content: space-between;
    }

    .top-row ::deep a, .top-row ::deep .btn-link {
        margin-left: 0;
    }
}

@media (min-width: 641px) {
    .page {
        flex-direction: row;
    }

    .sidebar {
        width: 250px;
        height: 100vh;
        position: sticky;
        top: 0;
    }

    .top-row {
        position: sticky;
        top: 0;
        z-index: 1;
    }

    .top-row.auth ::deep a:first-child {
        flex: 1;
        text-align: right;
        width: 0;
    }

    .top-row, article {
        padding-left: 2rem !important;
        padding-right: 1.5rem !important;
    }
}

#blazor-error-ui {
    background: lightyellow;
    bottom: 0;
    box-shadow: 0 -1px 2px rgba(0, 0, 0, 0.2);
    display: none;
    left: 0;
    padding: 0.6rem 1.25rem 0.7rem 1.25rem;
    position: fixed;
    width: 100%;
    z-index: 1000;
}

#blazor-error-ui .dismiss {
    cursor: pointer;
    position: absolute;
    right: 0.75rem;
    top: 0.5rem;
}
@@ -0,0 +1,190 @@
@page "/"

@using Microsoft.Extensions.AI

@implements IDisposable

@inject IJSRuntime JSRuntime
@inject ISpeechToTextClient SpeechToTextClient

<TelerikTextArea @bind-Value="@TextValue"
                 Width="300px"
                 ShowSuffixSeparator="false">
    <TextAreaSuffixTemplate>
        <span class="k-spacer"></span>
        <TelerikSpeechToTextButton OnStart="@OnStartHandler"
                                   OnEnd="@OnEndHandler"
                                   FillMode="@ThemeConstants.Button.FillMode.Flat"
                                   IntegrationMode="@SpeechToTextButtonIntegrationMode.None">
        </TelerikSpeechToTextButton>
    </TextAreaSuffixTemplate>
</TelerikTextArea>

@code {
    private string TextValue { get; set; } = string.Empty;
    private DotNetObjectReference<Home>? dotNetObjectReference;

    // The button is in IntegrationMode.None, so recording is driven by the JS helpers below.
    private async Task OnStartHandler()
    {
        await JSRuntime.InvokeVoidAsync("speechRecognitionStarted");
    }

    private async Task OnEndHandler()
    {
        await JSRuntime.InvokeVoidAsync("speechRecognitionEnded");
    }

    protected override async Task OnAfterRenderAsync(bool firstRender)
    {
        if (firstRender)
        {
            try
            {
                await JSRuntime.InvokeVoidAsync("initializeSpeechToTextButton");

                // Give the script a reference so it can call OnRecordedAudio.
                dotNetObjectReference = DotNetObjectReference.Create(this);

                await JSRuntime.InvokeVoidAsync("setDotNetObjectReference", dotNetObjectReference);
            }
            catch (Exception ex)
            {
                Console.Error.WriteLine($"JSInterop failed: {ex.Message}");
            }
        }

        await base.OnAfterRenderAsync(firstRender);
    }

    // Called from JavaScript when a recording has finished.
    [JSInvokable("OnRecordedAudio")]
    public async Task OnRecordedAudio(byte[] audioBytes)
    {
        if (audioBytes == null || audioBytes.Length == 0)
        {
            return;
        }

        try
        {
            using var stream = new MemoryStream(audioBytes);

            await GetSpeechToTextResponse(stream);
        }
        catch (Exception e)
        {
            Console.Error.WriteLine($"Transcription failed: {e.Message}");
        }
    }

    private async Task GetSpeechToTextResponse(MemoryStream stream)
    {
        var response = await SpeechToTextClient.GetTextAsync(stream);
        TextValue = response.Text;
        await InvokeAsync(StateHasChanged);
    }

    // Release the interop reference so the component is not kept alive after it is removed.
    public void Dispose()
    {
        dotNetObjectReference?.Dispose();
    }
}

<script>
    // Function to initialize the speechToTextButton object
    window.initializeSpeechToTextButton = function() {
        console.log("Initializing speechToTextButton object...");

        // Create a dedicated object for speech-to-text functionality
        window.speechToTextButton = {
            // Properties
            mediaRecorder: null,
            recordingAborted: false,
            audioChunks: [],
            stream: null,

            // Methods
            bindMediaRecorderEvents() {
                console.log("Binding media recorder events...");
                this.mediaRecorder.onstart = () => this.onStart();
                this.mediaRecorder.ondataavailable = (e) => this.audioChunks.push(e.data);
                this.mediaRecorder.onstop = async () => {
                    if (this.mediaRecorder) {
                        // Label the blob with the recorder's actual MIME type;
                        // browsers typically record WebM, Ogg, or MP4 rather than WAV.
                        const audioBlob = new Blob(this.audioChunks, { type: this.mediaRecorder.mimeType });
                        const arrayBuffer = await audioBlob.arrayBuffer();
                        const uint8Array = new Uint8Array(arrayBuffer);
                        // Call back to Blazor with the recorded audio data
                        try {
                            if (window.dotNetObjectReference) {
                                await window.dotNetObjectReference.invokeMethodAsync("OnRecordedAudio", uint8Array);
                            } else {
                                console.warn("dotNetObjectReference is not set.");
                            }
                        } catch (error) {
                            console.error("Error calling OnRecordedAudio:", error);
                        }
                        this.audioChunks = [];
                        this.unbindMediaRecorderEvents();
                        this.onEnd();
                    }
                };
            },

            unbindMediaRecorderEvents() {
                console.log("Unbinding media recorder events...");
                if (this.stream) {
                    this.stream.getTracks().forEach(track => track.stop());
                    this.stream = null;
                }
                if (this.mediaRecorder) {
                    this.mediaRecorder.onstart = null;
                    this.mediaRecorder.ondataavailable = null;
                    this.mediaRecorder.onstop = null;
                    this.mediaRecorder.onerror = null;
                    if (this.mediaRecorder.stream) {
                        this.mediaRecorder.stream.getTracks().forEach(track => track.stop());
                    }
                    this.mediaRecorder = null;
                }
            },

            async startMediaRecorder() {
                console.log("Starting media recorder...");
                this.recordingAborted = false;
                this.audioChunks = [];
                try {
                    this.stream = await navigator.mediaDevices.getUserMedia({ audio: true });
                } catch (error) {
                    // Microphone permission was denied or no input device is available.
                    this.recordingAborted = true;
                    console.error("Could not access the microphone:", error);
                    return;
                }
                this.mediaRecorder = new MediaRecorder(this.stream);
                this.bindMediaRecorderEvents();
                this.mediaRecorder.start();
            },

            async stopMediaRecorder() {
                console.log("Stopping media recorder...");
                if (this.mediaRecorder && this.mediaRecorder.state !== 'inactive') {
                    this.mediaRecorder.stop();
                }
            },

            // Event callbacks
            onStart() {
                // add any additional logic here if necessary
                console.log("Media recorder started");
            },

            onEnd() {
                // add any additional logic here if necessary
                console.log("Media recorder ended");
            },

            // Public API methods
            async speechRecognitionStarted() {
                console.log("Speech recognition started - called from Blazor");
                await this.startMediaRecorder();
            },

            async speechRecognitionEnded() {
                console.log("Speech recognition ended - called from Blazor");
                await this.stopMediaRecorder();
            },
        };

        // Expose the API methods to window for Blazor interop
        window.speechRecognitionStarted = () => window.speechToTextButton.speechRecognitionStarted();
        window.speechRecognitionEnded = () => window.speechToTextButton.speechRecognitionEnded();
        window.setDotNetObjectReference = (value) => window.dotNetObjectReference = value;

        console.log("speechToTextButton object initialized successfully");
    };

</script>
@@ -0,0 +1,6 @@
<Router AppAssembly="typeof(Program).Assembly">
    <Found Context="routeData">
        <RouteView RouteData="routeData" DefaultLayout="typeof(Layout.MainLayout)" />
        <FocusOnNavigate RouteData="routeData" Selector="h1" />
    </Found>
</Router>
@@ -0,0 +1,13 @@
@using System.Net.Http
@using System.Net.Http.Json
@using Microsoft.AspNetCore.Components.Forms
@using Microsoft.AspNetCore.Components.Routing
@using Microsoft.AspNetCore.Components.Web
@using static Microsoft.AspNetCore.Components.Web.RenderMode
@using Microsoft.AspNetCore.Components.Web.Virtualization
@using Microsoft.JSInterop
@using SpeechToTextIntegration
@using SpeechToTextIntegration.Components

@using Telerik.Blazor
@using Telerik.Blazor.Components