eVC compiler FP support on SH4

ict, posted 2023-10-29 7:41 PM
Factorite (Junior)
Posts: 39
Location: United States

Hello,

I've recently added an SH4 device to my collection and I've always wondered about how well supported that architecture actually was in H/PCs...

The chips used in the likes of the Aero 8000, HPW-600 and others are fully featured parts that include an FPU, which was a very unusual feature for an H/PC, or really any other embedded device, at that time. But did the Microsoft eVC compilers ever actually utilize it when building for an SH4 target? And are there any applications in general that run noticeably faster on an SH4 than on later SH3/VR4121 devices? I'd imagine that the (also unusual) superscalar core would help with general performance at least a bit.

Thanks for reading!

stingraze, posted 2023-10-31 2:14 AM
Subscribers, H/PC Vanguard
Posts: 3,692
Location: Japan

I don't know much about how the FPU is used in a Windows CE environment, but it might be interesting to run some sort of benchmark to see the difference between SH-based machines and ARM/MIPS machines.

A good starting point may be to compile this for Windows CE using eVT:

"a collection of a few simple fpu-intensive benchmarks"
https://kluge.in-chemnitz.de/docs/notes/benchmark.php

Source Code on GitHub
"This benchmark measures the speed of individual arithmetic operations such as add, mul, div, sqrt, sin, etc."
https://github.com/ttk592/bench

-stingraze

Edited by stingraze 2023-10-31 2:27 AM

ict, posted 2023-10-31 4:37 AM

This rocks, thanks. I was thinking of rolling my own benchmark to test it but wasn't sure how I'd implement it. I might have to try setting up a development VM using the tutorial and building executables based on these. I might also play with some device-hosted languages at some point, like Pocket Scheme or PocketC, to see if they show any differences compared with a Jornada 600. I'll update the thread if I find out anything interesting.

stingraze, posted 2023-10-31 5:00 AM

Great! Good luck!
-stingraze

stingraze, posted 2023-10-31 12:48 PM

I found another simple sample in C that may be of help. Converting to C++ shouldn’t be too hard.
https://github.com/jzawodn/arm-neon-vfp-test/blob/master/test.c

ict, posted 2023-11-05 1:08 AM

I haven't gotten around to setting up a development environment to try those C sources yet, but out of curiosity I took a look at the most recent snippet you sent and tried to implement something similar using Pocket Scheme on a Jornada 690 (133 MHz SH3) and a Hitachi ePlate (128 MHz SH4), to see if the latter would show any boost in performance. I ended up with this Scheme procedure to approximate it:

(define (muldiv-loop x y iterations)
  (let ((a 0))
    (do ((i 0 (+ i 1)))
        ((= i iterations) i)
      (set! a (* x y))
      (set! a (/ x y)))
    a))


Looping this with x = 1000.0 and y = 2000.0 for 100,000 iterations runs about 20% faster on the SH4 (~7.2s) than on the SH3 (~8.9s). That's a pretty good boost, but it seems to me like you'd see even more if the FPU were being utilized, so this may just come from the other architectural enhancements of the SH4. I'm not sure how FP is implemented in Pocket Scheme, so it doesn't say much about whether eVC properly supports the FPU, but it's still interesting to look at, at least...
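
For reference, here is a small C sketch of the arithmetic behind those timings. It only uses the times and clock speeds quoted in this post; the function names are made up for illustration, and the per-clock figure assumes both chips run at their nominal speed with nothing else differing between the devices, which is a big assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Speedup factor: how many times faster the new run is than the baseline. */
double speedup(double baseline_s, double new_s)
{
    assert(baseline_s > 0.0 && new_s > 0.0);
    return baseline_s / new_s;
}

/* Rough cost of a run in millions of cycles: seconds times nominal MHz. */
double mega_cycles(double seconds, double clock_mhz)
{
    return seconds * clock_mhz;
}

/* Prints the comparison for the timings reported in this post. */
void print_comparison(void)
{
    const double sh3_s = 8.9, sh3_mhz = 133.0;  /* Jornada 690 */
    const double sh4_s = 7.2, sh4_mhz = 128.0;  /* Hitachi ePlate */

    printf("wall-clock speedup: %.2fx\n", speedup(sh3_s, sh4_s));
    printf("time saved:         %.0f%%\n", 100.0 * (1.0 - sh4_s / sh3_s));
    printf("per-clock speedup:  %.2fx\n",
           speedup(mega_cycles(sh3_s, sh3_mhz), mega_cycles(sh4_s, sh4_mhz)));
}
```

Calling print_comparison() from main gives roughly 1.24x wall-clock, 19% less time, and about 1.28x once the slightly lower SH4 clock is factored in.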

stingraze, posted 2023-11-05 3:04 AM

Very interesting.
A 133 MHz SH3 vs. a 128 MHz SH4, and the SH4 was about 20% faster.

It may indeed be that floating point is being processed faster with the FPU.
It might be interesting to test other types in C as well, like int or even char, to see whether the FPU is really the cause of the speedup or whether it is something else.
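
A minimal sketch of what such a comparison could look like in C, mirroring the multiply-then-divide loop from the Scheme test. The function names are hypothetical, and this is untested on eVC: it relies on the standard clock() for timing, which the Windows CE C runtime may not provide, in which case something like GetTickCount() would have to stand in. The volatile variables are there so the compiler does not fold the constant arithmetic or drop the loop entirely.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Floating-point version: multiply, then divide, n times. */
double muldiv_double(double x, double y, long n)
{
    volatile double vx = x, vy = y, a = 0.0;
    long i;
    for (i = 0; i < n; i++) {
        a = vx * vy;
        a = vx / vy;
    }
    return a;
}

/* Integer version of the same loop, for comparison. */
long muldiv_int(long x, long y, long n)
{
    volatile long vx = x, vy = y, a = 0;
    long i;
    assert(y != 0);  /* avoid integer division by zero */
    for (i = 0; i < n; i++) {
        a = vx * vy;
        a = vx / vy;
    }
    return a;
}

/* Seconds of processor time elapsed since an earlier clock() reading. */
double seconds_since(clock_t start)
{
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Times both loops for n iterations and prints the results. */
void run_comparison(long n)
{
    clock_t t;

    t = clock();
    muldiv_double(1000.0, 2000.0, n);
    printf("double: %.3f s\n", seconds_since(t));

    t = clock();
    muldiv_int(2000L, 1000L, n);
    printf("long:   %.3f s\n", seconds_since(t));
}
```

If the SH4's advantage over the SH3 turns out to be much larger for the double loop than for the long loop, that would point at the FPU; if both improve by a similar amount, the gain is more likely from the rest of the core. A native loop is far faster than the interpreted one, so n would probably need to be in the millions to get readable timings.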

I think there is PocketC. Not sure if an SH3 program will run on SH4, but here's the link:
https://www.hpcfactor.com/scl/350/Landware/PocketC/version_b1.29

-stingraze

Edited by stingraze 2023-11-05 3:07 AM