Software development: transcribe speech to text using public APIs

1. Task

You are asked to write a program that:

- takes an audio file as input,

- chops it into clips at sentence boundaries,

- sends these audio clips one by one to three different public speech recognition services,

- saves each audio clip, together with the text transcribed by each of the above services, into a MySQL database with these columns:

timestamp, length, audio clip, Google, Baidu, iFlyTek, flag


timestamp (4-byte time): start time of the audio clip in the original audio file

length (4-byte integer): audio clip length in milliseconds

audio clip (binary): 16-bit, 16 kHz, single-channel PCM

Google transcription (text): utf-8

Baidu transcription (text): utf-8

iFlyTek transcription (text): utf-8

flag (integer): 0 if all three transcriptions are the same, 1 if exactly two match, 2 if all three differ.
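To make the length and flag columns concrete, here is a minimal sketch in Python (the same logic ports directly to Java or Kotlin on Android). The function names are hypothetical, and the `normalize` rule (ignore case and extra whitespace before comparing) is an assumption; the posting does not say how strictly two transcriptions must match.

```python
SAMPLE_RATE_HZ = 16_000   # 16 kHz, per the clip format above
BYTES_PER_SAMPLE = 2      # 16-bit, single channel


def clip_length_ms(pcm: bytes) -> int:
    """Length of a raw 16-bit 16 kHz mono PCM clip in milliseconds."""
    samples = len(pcm) // BYTES_PER_SAMPLE
    return samples * 1000 // SAMPLE_RATE_HZ


def normalize(text: str) -> str:
    """Assumed comparison rule: ignore case and extra whitespace."""
    return " ".join(text.lower().split())


def agreement_flag(google: str, baidu: str, iflytek: str) -> int:
    """0 if all three match, 1 if exactly two match, 2 if all differ."""
    distinct = {normalize(google), normalize(baidu), normalize(iflytek)}
    # 1 distinct text -> 0, 2 distinct texts -> 1, 3 distinct texts -> 2
    return len(distinct) - 1
```

Counting distinct texts avoids writing out the three pairwise comparisons by hand.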

2. Audio Source

The audio may be in mp3, m4a, aac, ogg, or wma format. It is extracted from YouTube videos; our target content is educational lectures.

One example is this YouTube video: [login to view URL]

You can extract the audio with [login to view URL]

The downloadable mp3 result is at [login to view URL]

You can use this for any YouTube content.
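The clips have to end up as 16-bit, 16 kHz, single-channel PCM, so the downloaded file needs converting at some point. The sketch below assumes this is done on a desktop with the `ffmpeg` command-line tool before the files are copied to the device; that tool choice and the function names are assumptions, not part of the requirements.

```python
import subprocess
from typing import List


def build_ffmpeg_command(src_path: str, dst_path: str) -> List[str]:
    """Build an ffmpeg call that writes headerless 16-bit 16 kHz mono PCM."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", src_path,          # input: mp3/m4a/aac/ogg/wma
        "-ac", "1",              # single channel
        "-ar", "16000",          # 16 kHz sample rate
        "-acodec", "pcm_s16le",  # 16-bit little-endian samples
        "-f", "s16le",           # raw output, no container header
        dst_path,
    ]


def convert_to_pcm(src_path: str, dst_path: str) -> None:
    """Run the conversion; requires ffmpeg to be installed and on PATH."""
    subprocess.run(build_ffmpeg_command(src_path, dst_path), check=True)
```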

3. Audio Segmentation

If you view an audio file in a waveform tool (there are many out there), you can see the separation between silence and voice. Some silences are merely word boundaries, or even just syllable boundaries. The rule we ask you to implement is: cut where either the silence is long enough, or the "sentence" is already 7 seconds long. In the latter case, chop at the locally longest silence gap.
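One possible reading of this rule, sketched in Python. It takes a per-frame silent/voiced list as input and returns the frame indexes at which to cut. The 10 ms frame size and the 400 ms "long enough" threshold are assumptions to be tuned; only the 7-second limit comes from the rule above. The function names are hypothetical.

```python
from typing import List, Tuple


def silence_runs(silent: List[bool]) -> List[Tuple[int, int]]:
    """Return (start_frame, length_in_frames) for each run of silent frames."""
    runs, start = [], None
    for i, is_silent in enumerate(silent):
        if is_silent and start is None:
            start = i
        elif not is_silent and start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(silent) - start))
    return runs


def cut_points(silent: List[bool], frame_ms: int = 10,
               min_gap_ms: int = 400, max_sentence_ms: int = 7000) -> List[int]:
    """Frame indexes at which to cut the audio into sentence clips."""
    min_gap = min_gap_ms // frame_ms
    max_len = max_sentence_ms // frame_ms
    total = len(silent)
    cuts: List[int] = []
    seg_start = 0
    pending: List[Tuple[int, int]] = []  # short gaps inside the current sentence

    def split_long(until: int) -> None:
        """While the sentence exceeds the limit, cut at its longest short gap."""
        nonlocal seg_start, pending
        while pending and until - seg_start > max_len:
            start, length = max(pending, key=lambda run: run[1])
            cut = start + length // 2
            cuts.append(cut)
            seg_start = cut
            pending = [run for run in pending if run[0] > cut]

    for start, length in silence_runs(silent):
        if start == 0 or start + length == total:
            continue  # leading/trailing silence does not separate sentences
        split_long(start)
        if length >= min_gap:            # rule 1: the silence is long enough
            mid = start + length // 2
            cuts.append(mid)
            seg_start, pending = mid, []
        else:
            pending.append((start, length))
    split_long(total)                    # rule 2 for the final sentence
    return cuts
```

Multiplying a cut index by `frame_ms` gives the clip's start time in milliseconds. Cutting in the middle of a gap, rather than at its edge, is a design choice that leaves a little silence on both sides of each clip.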

I see sentence boundary identification as the most challenging part for anyone not familiar with audio signal processing, which is why I have outlined the logic above. Still, the next question is how to actually measure "silence".
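One common way to measure "silence" is short-time energy: split the PCM into short frames and call a frame silent when its RMS amplitude falls below a threshold. The Python sketch below illustrates that idea only; it is not necessarily one of the methods on the page referenced in this section, and the 10 ms frame size and the threshold of 500 (on the 16-bit scale) are assumptions that need tuning on real lectures.

```python
import math
import struct
from typing import List

SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit samples


def frame_rms(pcm: bytes, frame_ms: int = 10) -> List[float]:
    """RMS amplitude of each complete frame of 16-bit little-endian mono PCM."""
    samples_per_frame = SAMPLE_RATE_HZ * frame_ms // 1000
    frame_bytes = samples_per_frame * BYTES_PER_SAMPLE
    values = []
    for offset in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        frame = pcm[offset:offset + frame_bytes]
        samples = struct.unpack(f"<{samples_per_frame}h", frame)
        mean_square = sum(s * s for s in samples) / samples_per_frame
        values.append(math.sqrt(mean_square))
    return values


def silent_frames(pcm: bytes, frame_ms: int = 10,
                  threshold: float = 500.0) -> List[bool]:
    """True for every frame whose RMS is below the (assumed) threshold."""
    return [rms < threshold for rms in frame_rms(pcm, frame_ms)]
```

A fixed threshold is the simplest choice; a threshold derived from the recording's own noise floor would likely be more robust across lectures.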

Please follow the methods listed on this page: [login to view URL]

As one of the acceptance criteria for this project, we will randomly select 50 audio clips (using a random number generator on the Internet), listen to them, and confirm that the sentence boundary error rate is less than 5%.
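The arithmetic behind this check, as a small Python sketch. The local random generator stands in for the online one, and counting one error per badly cut clip is an assumption about how the error rate is measured; the function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence

SAMPLE_SIZE = 50
MAX_ERROR_RATE = 0.05


def pick_clips(clip_ids: Sequence[int], seed: Optional[int] = None) -> List[int]:
    """Draw up to 50 distinct clip ids at random."""
    rng = random.Random(seed)
    return rng.sample(list(clip_ids), min(SAMPLE_SIZE, len(clip_ids)))


def passes_acceptance(bad_clips: int, sampled: int = SAMPLE_SIZE) -> bool:
    """True if the share of badly cut clips is strictly below 5%."""
    return bad_clips / sampled < MAX_ERROR_RATE
```

With 50 clips, this means at most 2 badly cut clips (4%) pass, while 3 (6%) fail.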

4. Speech Recognition

The three speech recognition engines are:


Google:

[login to view URL]


A Python wrapper for the Baidu Yuyin API

[login to view URL]

[login to view URL]

iFlyTek (Xunfei):

Integrate iflytek SDK to Implement Chinese Voice Recognition in AOSP [login to view URL]

Note that integration with all three of the above speech recognition engines is required. That is, you need to do three integrations, each with its own complications, such as applying for a free account and obtaining tokens/keys.
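Since each clip goes through three separate integrations, it may help to hide them behind one common call shape. The Python sketch below shows only that structure: the recognizers are placeholders, and no real Google, Baidu, or iFlyTek API call is shown or implied. Recording a failed call as an empty transcription is an assumption.

```python
from typing import Callable, Dict

# Hypothetical common shape: raw PCM bytes in, UTF-8 text out.
Recognizer = Callable[[bytes], str]


def transcribe_clip(pcm: bytes, recognizers: Dict[str, Recognizer]) -> Dict[str, str]:
    """Send one clip to every recognizer and collect the text per service."""
    results: Dict[str, str] = {}
    for name, recognize in recognizers.items():
        try:
            results[name] = recognize(pcm)
        except Exception:
            # Assumption: a failed call is stored as an empty transcription
            # so that one service's outage does not lose the whole clip.
            results[name] = ""
    return results
```

Usage would look like `transcribe_clip(clip, {"google": g, "baidu": b, "iflytek": x})`, where `g`, `b`, and `x` are whatever wrappers end up around the real services.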

For both Baidu and iFlyTek, you are encouraged to use Google Translate, as much of the content is in Chinese.

Both Google and Baidu offer simple REST APIs, which you can call from essentially any platform and language. The iFlyTek API, however, is really an SDK, and the best example I found is the Android version given above. Taken together, your only practical choice is an Android application.

5. Implementation

We are open to suggestions, but given the above, we expect a pure Android APK implementation.

I will first push/copy several extracted and converted audio files onto an Android phone or tablet, then run your Android APK and get the results as a corresponding set of files, either in a MySQL database or simply in CSV format. I will then pull/copy these files back to my computer.
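For the CSV option, one row per clip with the seven columns from section 1 would be enough. A minimal Python sketch follows; storing the clip as a file name in the `audio_clip` column, rather than embedding the binary audio, is an assumption, as are the column spellings.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

FIELDS = ["timestamp", "length", "audio_clip", "google", "baidu", "iflytek", "flag"]


def write_rows(rows: Iterable[Dict[str, object]], out: TextIO) -> None:
    """Write a header line followed by one CSV row per clip."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_rows([{
        "timestamp": 0, "length": 2250, "audio_clip": "clip_0001.pcm",
        "google": "你好", "baidu": "你好", "iflytek": "您好", "flag": 1,
    }], buffer)
    print(buffer.getvalue())
```

When writing to a real file, open it with `encoding="utf-8"` and `newline=""` so the Chinese text and the line endings survive intact.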

You shall provide a way for me to go to a randomly chosen clip, play its audio, read the transcribed text, and place it into, say, the Google web service to see the results.
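Raw PCM has no header, so most players will not open a stored clip directly. One simple way to make a clip playable is to wrap it in a WAV container, sketched here with Python's standard `wave` module. This is an illustration of the idea, not a required design, and the function name is hypothetical.

```python
import io
import wave


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw 16-bit 16 kHz mono PCM in a WAV container for playback."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)      # single channel
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(16000)  # 16 kHz
        wav.writeframes(pcm)
    return buffer.getvalue()
```

The resulting bytes can be written to a `.wav` file and opened in any audio player.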

Skills: Android, Audio Services, Software Development, Statistical Analysis, Web Services

About the Employer:
( 1 review ) Cupertino, United States

Project ID: #11391108

16 freelancers are bidding an average of $561 for this job


Dear Sir, Trust us we can do this project as we had done similar project of our clients. We want to work with you and build a healthy longterm relationship so please contact us and discuss for this project before fi...

$749 USD in 35 days
(225 reviews)

I want to discuss this project with you further, let me know the best suitable time for you to schedule the meeting, Feel free to message me at any time, i used to be online 14 hrs in a day on this website so probably...

$773 USD in 10 days
(11 reviews)

Hello sir, I am from Vietnam, I have 8 year in Software Industry with many running product. I have experience in web application with many large sites, also Java enterprise applications and mobile applications. My...

$250 USD in 10 days
(81 reviews)

Hello Sir, Hope you are fine there. We are having good experience with Mobile App projects and the reason we came across here to give the best output to your project with supreme quality. We have developed...

$555 USD in 10 days
(20 reviews)

Hi there, I’d like to be considered for your job position. I’m a Software Developer with a strong background developing web application. I can turn your requirement in a way that represents your brand and appeals to...

$555 USD in 10 days
(14 reviews)

do u have any api in mind to implement ?

$555 USD in 5 days
(8 reviews)

I am a person with strong Analytical ability in Mathematics / Statistics/Economics/Finance having BSc. (specialized in statistics), MBA (specialized in Finance), MSc. (specialized in Financial Mathematics). On time d...

$250 USD in 10 days
(11 reviews)

I'm interested, but no project description so I don't know what to write here. Message me back with info if you up for it. Cheers, Alek

$555 USD in 10 days
(1 review)

hi, i'm 15+ years experienced with strong knowledge of API / Payment Gateway integration and php with mysql, Plugin development and designing see my few portfolio **CSM and eCom** Magento / WooCommerce /...

$526 USD in 10 days
(3 reviews)

hi I am writing to you since I am interested in the job posting. Working with PHP, HTML/CSS and MYSQL, I have completed a number of projects that are relevant to the skills required in this job posting. They in...

$555 USD in 10 days
(1 review)

Hello, Professional developers with similar expertise here. We are posting our bid as an expression of interest and appreciate further discussion in private message board. We are waiting for your message to communi...

$526 USD in 10 days
(1 review)

Hi, I have worked on speech recognition app that detect frequency of speech & count the number of silence in the speech. I am comfortable to customize the same app to detect silence and split the audio file into c...

$750 USD in 10 days
(1 review)
$555 USD in 10 days
(0 reviews)

A proposal has not yet been provided

$500 USD in 8 days
(0 reviews)

I can make your project a great success. I'm 31 year old talented PHP and open-source developer. I have 9 years of experience in Server Administration and Web Application Development. I'm expert in web application cust...

$487 USD in 13 days
(0 reviews)

Hi.I am a Visual Basic .NET programmer with 10 years experience. I read the description of your project and I am interested. Read a short version of my CV and see screenshots+videos of softwares i made> [login to view URL]...

$833 USD in 10 days
(0 reviews)