VoxScreen – Frequently Asked Questions

What does the software do?

VoxScreen ingests live audio and outputs real-time, AI-generated text to screen with a customisable view (font size and colour, background colour, text positioning, and external screen output via HDMI).
It uses AI to generate real-time closed-caption overlays for live-to-screen or live-to-stream use, as well as translations, providing an accessibility and inclusivity solution for events, government, and educational needs.

What is VoxScreen's primary use?

VoxScreen is designed for live and hybrid events that need to generate and display live transcription on screen (in the event space) and online (via live streaming), with the option of saving caption files to disk for future use.

How can VoxScreen help us stay compliant?

Compliance and inclusion are major considerations for many companies and organisations during live and hybrid events.

VoxScreen helps in these areas by broadening inclusivity and supporting compliance during live events. Speech-to-text (STT) technology helps organisations stay compliant by ensuring accurate record-keeping, improving accessibility, facilitating legal compliance, enhancing data security, and supporting quality assurance. By leveraging STT, organisations can meet regulatory requirements more efficiently and effectively, reducing the risk of non-compliance.

What control over the generated text does VoxScreen allow?

You have full control of the text positioning (by adjusting padding), the font and size selection, and the font colour and background colour.

The app operator has a “preview screen” showing how the text appears on the external output.

What is WER (Word Error Rate)?

Word Error Rate (WER) is a metric used to evaluate the accuracy of automatic speech recognition (ASR) systems. It measures the number of errors in the transcription produced by the ASR system compared to a reference transcription.
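
As a rough illustration of how the metric works, WER is commonly calculated as (substitutions + deletions + insertions) divided by the number of words in the reference transcription. The Python sketch below shows one way to compute it using a word-level edit distance. It assumes simple lowercasing and whitespace splitting, and the function name word_error_rate is hypothetical; it is not part of VoxScreen and is not how VoxScreen's own figures were measured.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level edit distance / reference word count."""
    # Assumption: lowercase + whitespace split is enough normalisation here.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown box jumps over lazy dog"
# One substitution (fox -> box) and one deletion (the) over 9 reference words.
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")  # WER: 22.2%
```

Because inserted words also count as errors, WER can exceed 100% when the transcription contains many more words than the reference.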

With the large model, VoxScreen achieves a WER of around 5.6%.

We have tested the models using live event recordings, and accuracy can reach 95-98%.

Major cloud providers achieve a WER below 25%, although 15% or less is now the more widely accepted target.

General Benchmarks:

Commercial ASR Systems: High-quality commercial ASR systems from major tech companies (like Google, Microsoft, IBM, and Amazon) typically achieve WERs in the range of 5% to 10% for clear, well-spoken audio in ideal conditions.
State-of-the-Art Models: The best state-of-the-art models in controlled environments, such as those used in research and competitions, can achieve WERs as low as 2% to 4%.
Conversational Speech: For more challenging conditions, such as conversational speech with background noise, overlapping speakers, or strong accents, a WER below 15% to 20% is considered very good.

Can I download a demo version for testing?

Yes, both the PC and Mac versions can be downloaded from our website. After installation they run in “demo mode”, with some limitations but full ASR functionality.

Registering the app with a license code switches the software to “full production mode” with no limitations.

Do you provide discounts for multiple-license purchases?

Yes, for larger organisations that require multiple licenses we provide discounts and direct support. Please e-mail customer support and we can arrange a meeting to discuss your needs and work out a package and discount rate that suits your organisation.

Will there be any updates to the app?

Yes, we are constantly working on improvements and next-generation releases. All fixes and improvements to the current version will be sent to existing customers as updates. Next-generation releases will be free for 1-year license clients and heavily discounted for “lifetime license” clients.

Can I talk to a customer manager?

Yes, we are always happy to chat with our clients. Please get in touch via our support e-mail and we can arrange a call or a meeting to discuss any questions you might have.