Spoken Japanese gives every syllable roughly equal weight, with no strong stress on any one of them. However, this is not speech; this is singing, and the melody decides the pitch. Just sing it the way the singer does.
Usually, songs are sung one syllable per beat, though naturally there are exceptions. Syllables generally end in a vowel or in the syllabic nasal "n."
There are five vowel sounds in the Japanese language: a, i, u, e, and o.
Double and triple vowels are sung as two or three syllables, respectively. A double 'o' sound is usually written as 'ou', and a double 'e' sound as 'ei'. For example, "toukyou" (Tokyo) is sung over four beats as to-u-kyo-u.
The Japanese sound usually transcribed as "r" is really halfway between "r" and "l." As people speak, the sound drifts sometimes closer to "r" and sometimes closer to "l," and the same is true in singing. It doesn't really matter which way the sound drifts. To produce an authentic-sounding Japanese "r," start to say "l" but stop your tongue just short of the roof of your mouth, so that it flaps in empty space.
The English "f" is formed by bringing the lower lip up against the upper front teeth, but the Japanese "f" (the consonant in fu, which some romanization systems write as hu) is formed with both lips, brought close together as if you were about to blow out a candle.
The syllabic nasal "n" differs from the "n" in na, ni, nu, ne, and no: it can only come at the end of a syllable, and it is held as long as any other syllable. In songs, it is often sung on its own as one note, separate from the other syllables. The syllabic nasal is also pronounced as "m" before "b," "p," or "m"; as "ng" before "k" or "g"; and as simple nasalization (like talking with a stuffy nose) before a vowel. For example, "shinbun" (newspaper) is pronounced "shimbun."
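The assimilation rules for the syllabic nasal are mechanical enough to write down as a small lookup. The sketch below is hypothetical (`syllabic_n_sound` is not from any library) and assumes Hepburn-style romanization, where the first letter of the following syllable decides the sound. Contexts the rules above don't mention, such as the end of a word or before "s" or "t," simply fall back to a plain "n" as a simplification.

```python
from typing import Optional

# Letters that change how the syllabic nasal is pronounced (per the rules above).
LIP_CONSONANTS = ("b", "p", "m")
BACK_CONSONANTS = ("k", "g")
VOWELS = ("a", "i", "u", "e", "o")


def syllabic_n_sound(next_letter: Optional[str]) -> str:
    """Describe how the syllabic nasal "n" sounds before `next_letter`.

    `next_letter` is the first letter of the following syllable in
    Hepburn-style romaji, or None if the "n" ends the word.
    Hypothetical helper: any case the rules don't cover returns a plain "n".
    """
    if next_letter is None:
        return "n"  # simplification: word-final "n" is not covered by the rules
    c = next_letter.lower()
    if c in LIP_CONSONANTS:
        return "m"  # e.g. shinbun is pronounced "shimbun"
    if c in BACK_CONSONANTS:
        return "ng"  # e.g. genki: the "n" sounds like the "ng" in "sing"
    if c in VOWELS:
        return "nasalized vowel"  # stuffy-nose nasalization, no full "n"
    return "n"
```

For "shinbun," calling `syllabic_n_sound("b")` returns "m", matching the "shimbun" pronunciation.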
To sing a double consonant, extend the preceding vowel over the next beat.
For example, "waratte" ("please smile" or "please laugh") is sung over four beats as wa-ra-a-te. It is not sung "wa-ra-t-te," because you can't sing a "t" by itself and sound cool.
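The beat rules in this section (one beat per syllable, one beat for each vowel of a double vowel, a beat of its own for the syllabic nasal, and an extended vowel for a double consonant) can be combined into a small splitter. This is a minimal sketch, assuming lowercase Hepburn-style romaji for a single word, with an apostrophe after a syllabic "n" that comes before a vowel (as in "ren'ai"). The name `sung_beats` is hypothetical, and long vowels written with macrons (such as "ō") are not handled.

```python
from typing import List

VOWELS = "aiueo"


def sung_beats(word: str) -> List[str]:
    """Split a romanized Japanese word into the beats it is sung on.

    Assumes lowercase Hepburn-style romaji such as "waratte" or "toukyou".
    """
    word = word.lower()
    beats: List[str] = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == "'":
            # Separator after a syllabic "n"; it has no sound of its own.
            i += 1
        elif ch in VOWELS:
            # A vowel with no consonant before it is its own beat,
            # so double vowels such as "ou" or "ei" take two beats.
            beats.append(ch)
            i += 1
        elif ch == "n" and (nxt == "" or nxt not in VOWELS + "y"):
            # Syllabic nasal: held for a full beat on its own.
            beats.append("n")
            i += 1
        elif (
            (nxt == ch or (ch == "t" and nxt == "c"))  # "tt", "kk", ... or Hepburn "tch"
            and beats
            and beats[-1][-1] in VOWELS
        ):
            # Double consonant: extend the preceding vowel over this beat
            # instead of singing the consonant by itself.
            beats.append(beats[-1][-1])
            i += 1
        else:
            # Ordinary syllable: consonant(s) up to and including the next vowel.
            j = i
            while j < len(word) and word[j] not in VOWELS:
                j += 1
            beats.append(word[i : j + 1])
            i = j + 1
    return beats


if __name__ == "__main__":
    for w in ("waratte", "toukyou", "shinbun", "matcha"):
        print(w, "->", "-".join(sung_beats(w)))
    # waratte -> wa-ra-a-te
    # toukyou -> to-u-kyo-u
    # shinbun -> shi-n-bu-n
    # matcha -> ma-a-cha
```

Because a double consonant reuses the previous vowel, "waratte" comes out as wa-ra-a-te rather than wa-ra-t-te.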