added mulhu and mulhs CRT routines#645

Open
ZERICO2005 wants to merge 3 commits into master from add_crt_mulhu

@ZERICO2005 ZERICO2005 (Contributor) commented Sep 30, 2025

Added multiply-high signed/unsigned routines. These can be used to optimize division by a constant. Only __smulhu is optimized so far; the others are not yet well optimized. They use exactly the same calling convention as the regular multiplication routines, and they can be optimized further in later PRs.

__smulhu   :         HL = ((uint32_t)         HL * (uint32_t)      BC) >> 16
__imulhu   :        UHL = ((uint48_t)        UHL * (uint48_t)     UBC) >> 24
__lmulhu   :      E:UHL = ((uint64_t)      E:UHL * (uint64_t)   A:UBC) >> 32
__i48mulhu :    UDE:UHL = ((uint96_t)    UDE:UHL * (uint96_t) UIY:UBC) >> 48
__llmulhu  : BC:UDE:UHL = ((uint128_t)BC:UDE:UHL * (uint128_t) (SP64)) >> 64

__smulhs   :         HL = ((int32_t)          HL * (int32_t)       BC) >> 16
__imulhs   :        UHL = ((int48_t)         UHL * (int48_t)      UBC) >> 24
__lmulhs   :      E:UHL = ((int64_t)       E:UHL * (int64_t)    A:UBC) >> 32
__i48mulhs :    UDE:UHL = ((int96_t)     UDE:UHL * (int96_t)  UIY:UBC) >> 48
__llmulhs  : BC:UDE:UHL = ((int128_t) BC:UDE:UHL * (int128_t)  (SP64)) >> 64
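For reference, part of the table above can be written as plain C. This is a minimal sketch, not the CRT's code: the ref_* names and floor_shr are hypothetical, 24-bit registers are modelled as the low 24 bits of a uint32_t, and only __smulhu, __imulhu, __lmulhu, __smulhs and __lmulhs are covered, because the 48-bit and 64-bit variants need 96-bit and 128-bit intermediates that standard C lacks. The signed versions round toward negative infinity, which is what taking the high half of a two's complement product means.

```c
#include <stdint.h>

/* Hypothetical C models of the table above (not part of the CRT). */

/* Floor division by 2^k: an arithmetic right shift that does not rely on
   implementation-defined behaviour for negative values. */
static int64_t floor_shr(int64_t p, unsigned k) {
    int64_t d = (int64_t)1 << k;
    int64_t q = p / d;          /* C division truncates toward zero */
    if (p % d < 0) q -= 1;      /* adjust so the result rounds toward -infinity */
    return q;
}

/* __smulhu: HL = ((uint32_t)HL * (uint32_t)BC) >> 16 */
static uint16_t ref_smulhu(uint16_t hl, uint16_t bc) {
    return (uint16_t)(((uint32_t)hl * bc) >> 16);
}

/* __imulhu: UHL = ((uint48_t)UHL * (uint48_t)UBC) >> 24, 24-bit operands */
static uint32_t ref_imulhu(uint32_t uhl, uint32_t ubc) {
    uint64_t p = (uint64_t)(uhl & 0xFFFFFF) * (ubc & 0xFFFFFF);
    return (uint32_t)(p >> 24);
}

/* __lmulhu: E:UHL = ((uint64_t)E:UHL * (uint64_t)A:UBC) >> 32 */
static uint32_t ref_lmulhu(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* __smulhs: HL = ((int32_t)HL * (int32_t)BC) >> 16 */
static int16_t ref_smulhs(int16_t hl, int16_t bc) {
    return (int16_t)floor_shr((int64_t)hl * bc, 16);
}

/* __lmulhs: E:UHL = ((int64_t)E:UHL * (int64_t)A:UBC) >> 32 */
static int32_t ref_lmulhs(int32_t a, int32_t b) {
    return (int32_t)floor_shr((int64_t)a * b, 32);
}
```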

__smulhu   :  32 bytes |  33F +  12R +   9W +  17
__imulhu   : 117 bytes | 118F +  39R +  38W +  37
__lmulhu   : 1 call to __llmulu
__i48mulhu :  93 bytes | 902F + 246R + 182W + 344
__llmulhu  : (disables interrupts to use exx) slightly faster than 2 calls to __llmulu
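As a sketch of the division-by-a-constant use mentioned in the description: an unsigned 16-bit x / 10 can be computed with one multiply-high plus a shift, because 0xCCCD = ⌈2^19 / 10⌉ is close enough to 2^19 / 10 that the result is exact for every 16-bit x. The helper names are hypothetical; mulhu16 models what __smulhu returns, and the constant is the usual magic number for this divisor rather than something taken from the PR.

```c
#include <stdint.h>

/* Hypothetical model of __smulhu: high 16 bits of a 16x16 unsigned product. */
static uint16_t mulhu16(uint16_t a, uint16_t b) {
    return (uint16_t)(((uint32_t)a * b) >> 16);
}

/* x / 10 for any 16-bit x. 0xCCCD = ceil(2^19 / 10), so x / 10 equals
   (x * 0xCCCD) >> 19: the multiply-high supplies the first 16 bits of
   the shift and the final >> 3 supplies the remaining 3. */
static uint16_t div10_u16(uint16_t x) {
    return (uint16_t)(mulhu16(x, 0xCCCD) >> 3);
}
```

The same pattern should carry over to the wider routines, with __imulhu or __lmulhu taking the place of mulhu16 and a magic constant chosen for the wider operand width.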

__bmulhu was not added since it is just mlt bc \ ld a, b (and the 8-bit calling convention is not well defined).
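For comparison, the 8-bit case really is just two instructions: mlt bc replaces BC with the unsigned 8x8 product B * C, and ld a, b keeps its high byte. A C model of what ends up in A, under a hypothetical name:

```c
#include <stdint.h>

/* Hypothetical model of `mlt bc \ ld a, b` (not a CRT routine). */
static uint8_t bmulhu_model(uint8_t b, uint8_t c) {
    uint16_t bc = (uint16_t)(b * c);  /* mlt bc: BC = B * C, unsigned */
    return (uint8_t)(bc >> 8);        /* ld a, b: keep the high byte */
}
```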

@ZERICO2005 ZERICO2005 marked this pull request as draft September 30, 2025 03:02
@ZERICO2005 ZERICO2005 added the crt label Oct 10, 2025
@ZERICO2005 ZERICO2005 changed the title added mulhu CRT routines added mulhu and mulhs CRT routines Oct 13, 2025
@ZERICO2005 (Contributor, Author)

I just converted this branch/PR from FASMG to GAS, so it would be helpful to know whether I did it correctly.

@ZERICO2005 ZERICO2005 deployed to Autotester April 1, 2026 18:17 with GitHub Actions
Comment on lines +16 to +24
push hl
lea hl, iy + 0
add hl, hl
sbc a, a

ld hl, $800000
add hl, de
pop hl
rla
Contributor

You can save 2F using this. It reverses the order of the bits rotated into A, but that can be solved by using or a, a \ call m for the first subtraction instead of rrca \ call c.

Suggested change (current lines, then the proposed replacement):
push hl
lea hl, iy + 0
add hl, hl
sbc a, a
ld hl, $800000
add hl, de
pop hl
rla

push de
ex de, hl
add hl, hl
sbc a, a
lea hl, iy + 0
add hl, hl
rla
ex de, hl
pop de
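A C model of the sign packing may make the bit reversal easier to follow. It assumes, which the thread does not state, that DE and IY hold the upper halves of the two operands, so bit 23 of each is an operand's sign. 24-bit registers are modelled as uint32_t, all names are hypothetical, and only the first sign test is modelled because the rest of the routine is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign (bit 23) of a 24-bit register value held in a uint32_t. */
static bool sign24(uint32_t r) { return (r >> 23) & 1; }

/* Current code: A = (sign(IY) ? 0xFE : 0x00) | sign(DE) */
static uint8_t pack_original(uint32_t de, uint32_t iy) {
    uint8_t a = sign24(iy) ? 0xFF : 0x00;  /* lea hl, iy + 0 \ add hl, hl \ sbc a, a */
    bool carry = sign24(de);               /* ld hl, $800000 \ add hl, de */
    return (uint8_t)((a << 1) | carry);    /* rla */
}

/* Suggested code: A = (sign(DE) ? 0xFE : 0x00) | sign(IY) */
static uint8_t pack_proposed(uint32_t de, uint32_t iy) {
    uint8_t a = sign24(de) ? 0xFF : 0x00;  /* ex de, hl \ add hl, hl \ sbc a, a */
    bool carry = sign24(iy);               /* lea hl, iy + 0 \ add hl, hl */
    return (uint8_t)((a << 1) | carry);    /* rla */
}

/* First test, current code: rrca moves bit 0 into carry, then call c. */
static bool first_test_original(uint8_t a) { return (a & 0x01) != 0; }

/* First test, suggested code: or a, a copies bit 7 into S, then call m. */
static bool first_test_proposed(uint8_t a) { return (a & 0x80) != 0; }
```

In both versions the first subtraction is taken exactly when DE is negative; only the bit that carries that sign moves from bit 0 to bit 7.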

Comment on lines +9 to +28
__smulhs:
push bc
push hl
call __smulhu

; if (BC < 0) { result -= HL; }
bit 7, b
pop bc
jr z, .L.positive_hl
or a, a
sbc hl, bc
.L.positive_hl:

; if (HL < 0) { result -= BC; }
bit 7, b
pop bc
ret z
or a, a
sbc hl, bc
ret
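The two corrections in __smulhs follow from writing each signed 16-bit operand as its unsigned bit pattern minus 2^16 when it is negative. The extra cross terms are multiples of 2^16, so they leave the low half alone and change the high half only by subtracting the other operand. A minimal C check of that identity, with hypothetical helper names; note that the assembly additionally appears to rely on __smulhu leaving BC intact, since it tests bit 7, b after the call.

```c
#include <stdint.h>

/* Hypothetical model of __smulhu (16x16 -> high 16 bits, unsigned). */
static uint16_t mulhu16(uint16_t a, uint16_t b) {
    return (uint16_t)(((uint32_t)a * b) >> 16);
}

/* Signed high half derived from the unsigned one, mirroring __smulhs.
   All arithmetic wraps modulo 2^16. */
static uint16_t mulhs16_from_mulhu(uint16_t hl, uint16_t bc) {
    uint16_t r = mulhu16(hl, bc);
    if (bc & 0x8000) r = (uint16_t)(r - hl);  /* if (BC < 0) { result -= HL; } */
    if (hl & 0x8000) r = (uint16_t)(r - bc);  /* if (HL < 0) { result -= BC; } */
    return r;
}

/* Direct reference: high 16 bits of the exact signed product. */
static uint16_t mulhs16_direct(uint16_t hl, uint16_t bc) {
    int32_t a = (hl & 0x8000) ? (int32_t)hl - 0x10000 : (int32_t)hl;
    int32_t b = (bc & 0x8000) ? (int32_t)bc - 0x10000 : (int32_t)bc;
    uint32_t u = (uint32_t)((int64_t)a * b);  /* two's complement bits of the product */
    return (uint16_t)(u >> 16);
}
```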
Contributor

This saves 3R+3W on the stack:

Suggested change (current lines, then the proposed replacement):
__smulhs:
push bc
push hl
call __smulhu
; if (BC < 0) { result -= HL; }
bit 7, b
pop bc
jr z, .L.positive_hl
or a, a
sbc hl, bc
.L.positive_hl:
; if (HL < 0) { result -= BC; }
bit 7, b
pop bc
ret z
or a, a
sbc hl, bc
ret

__smulhs:
push de
ld d, h
ld e, l
call __smulhu
; if (BC < 0) { result -= HL; }
bit 7, b
jr z, .L.positive_hl
or a, a
sbc hl, de
.L.positive_hl:
; if (HL < 0) { result -= BC; }
bit 7, d
pop de
ret z
or a, a
sbc hl, bc
ret
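The 3R+3W figure can be checked by counting stack traffic, assuming ADL mode, where each push or pop of a 24-bit register pair writes or reads 3 bytes. The call into __smulhu and the final ret are identical in both versions, so they are left out, and ld d, h \ ld e, l are register-only moves.

```latex
\begin{aligned}
\text{current:}\quad & \underbrace{\texttt{push bc},\ \texttt{push hl}}_{2 \times 3W}
  + \underbrace{\texttt{pop bc},\ \texttt{pop bc}}_{2 \times 3R} = 6W + 6R \\
\text{suggested:}\quad & \underbrace{\texttt{push de}}_{3W} + \underbrace{\texttt{pop de}}_{3R} = 3W + 3R \\
\text{saved:}\quad & (6W + 6R) - (3W + 3R) = 3R + 3W
\end{aligned}
```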
