vmMgr.c VM_RAM not mapped if VRAM_USE_FTL==0 #88

Open
parisseb opened this issue Nov 23, 2022 · 13 comments

Comments

@parisseb
Contributor

parisseb commented Nov 23, 2022

The code in vmMgr_init() should be
mapList_AddPartitionMap(MAP_PART_RAWFLASH, PERM_R, VM_ROM_BASE, FLASH_SYSTEM_BLOCK * 64, VM_ROM_SIZE);

mapList_AddPartitionMap(MAP_PART_FTL, PERM_R | PERM_W, VM_RAM_BASE, 0, VM_RAM_SIZE);

#if (VMRAM_USE_FTL == 0)

uint32_t paddr = (uint32_t)vm_ram_none_ftl;
for (uint32_t vaddr = VM_RAM_BASE; vaddr < (VM_RAM_BASE + VM_RAM_SIZE_NONE_FTL); vaddr += PAGE_SIZE) {
    mmu_map_page(vaddr, paddr, AP_SYSRW_USRRW, VM_CACHE_ENABLE, VM_BUFFER_ENABLE);
    paddr += PAGE_SIZE;
}
mmu_invalidate_tlb();
//

#endif

So that we can use two kinds of ram at the same time: not swapped and swapped. I think that the 1 bit per pixel screen buffer (4K) should never be swapped, leaving 264K for cache. SystemConfig.h would look like
#define USE_TINY_PAGE (1)
#define VMRAM_USE_FTL (0)

#define SEG_SIZE 1048576

#if VMRAM_USE_FTL
#if USE_TINY_PAGE
#ifdef ENABLE_AUIDIOOUT
#define NUM_CACHEPAGE ( 200 ) // 273 * 1 = 273 KB
#else
#define NUM_CACHEPAGE ( 268 ) // 273 * 1 = 273 KB
#endif
#else
#define NUM_CACHEPAGE ( 79 ) // 79 * 4 = 316 KB
#endif
#else
#if USE_TINY_PAGE
#define NUM_CACHEPAGE ( 264 )
#define VM_RAM_SIZE_NONE_FTL ( 4 * 1024 )
#else
#define NUM_CACHEPAGE ( 32 )
#define VM_RAM_SIZE_NONE_FTL ( 168 * 1024 )
#endif
#endif

The loader script Scripts/sys_ld.script would be:
MEMORY { vmRAM (rwx) : ORIGIN = 0x02040000, LENGTH = 2M vmROM (rx ) : ORIGIN = 0x00100000, LENGTH = 6M }
Changes in kcasporing_gl.c:
char *screen_1bpp = (char *)0x02000000;
and avoid initialization of virtual_screen in 1 bit per pixel mode:
if (!khicas_1bpp) memset(virtual_screen, COLOR_WHITE, VIR_LCD_PIX_H * VIR_LCD_PIX_W);

Now integrate(1/(x^4+1)) takes 0.93s (normal mode) or 0.54s (fast CPU). By comparison, on the Casio monochrome it takes 0.34s. For plot(sin(x)) in fast CPU mode, it's 0.06s vs 0.15s on the Casio. If we can spare RAM in OSLoader's own use, it should be possible to improve the integrate benchmark!
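
To make the effect of the proposed loop concrete, here is a minimal host-side sketch of it. It is not the OS code: `fake_mmu_map_page()` is a hypothetical stand-in that only records each mapping, and `PAGE_SIZE` (1K tiny pages) and `VM_RAM_BASE` (taken from the `screen_1bpp` address) are assumptions chosen for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumed values for illustration; the real ones live in the OS headers. */
#define PAGE_SIZE            1024u        /* tiny page, USE_TINY_PAGE == 1 */
#define VM_RAM_BASE          0x02000000u  /* assumed from screen_1bpp */
#define VM_RAM_SIZE_NONE_FTL (4u * 1024u)

#define MAX_MAPPINGS 16

struct mapping {
    uint32_t vaddr;
    uint32_t paddr;
};

static struct mapping g_map[MAX_MAPPINGS];
static size_t g_map_count;

/* Hypothetical stand-in for mmu_map_page(): records the pair, touches no hardware. */
void fake_mmu_map_page(uint32_t vaddr, uint32_t paddr)
{
    if (g_map_count < MAX_MAPPINGS) {
        g_map[g_map_count].vaddr = vaddr;
        g_map[g_map_count].paddr = paddr;
        g_map_count++;
    }
}

/* Same loop shape as the proposed vmMgr_init() change.
 * Maps the never-swapped region page by page and returns the page count. */
size_t map_unswapped_region(uint32_t phys_base)
{
    uint32_t paddr = phys_base;

    g_map_count = 0;
    for (uint32_t vaddr = VM_RAM_BASE;
         vaddr < VM_RAM_BASE + VM_RAM_SIZE_NONE_FTL;
         vaddr += PAGE_SIZE) {
        fake_mmu_map_page(vaddr, paddr);
        paddr += PAGE_SIZE;
    }
    return g_map_count;
}
```

Under these assumptions a 4K region yields four 1K mappings, each virtual page backed by the physical page at the same offset in `vm_ram_none_ftl`.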

@yuuki410
Contributor

yuuki410 commented Nov 23, 2022

Sorry, I can't read your post. Is the following text what you originally intended?
@parisseb


The code in vmMgr_init() should be

    mapList_AddPartitionMap(MAP_PART_RAWFLASH, PERM_R, VM_ROM_BASE, FLASH_SYSTEM_BLOCK * 64, VM_ROM_SIZE);

    mapList_AddPartitionMap(MAP_PART_FTL, PERM_R | PERM_W, VM_RAM_BASE, 0, VM_RAM_SIZE);
#if (VMRAM_USE_FTL == 0)

    uint32_t paddr = (uint32_t)vm_ram_none_ftl;
    for (uint32_t vaddr = VM_RAM_BASE; vaddr < (VM_RAM_BASE + VM_RAM_SIZE_NONE_FTL); vaddr += PAGE_SIZE) {
        mmu_map_page(vaddr, paddr, AP_SYSRW_USRRW, VM_CACHE_ENABLE, VM_BUFFER_ENABLE);
        paddr += PAGE_SIZE;
    }
    mmu_invalidate_tlb();
    //

#endif

So that we can use two kinds of ram at the same time: not swapped and swapped. I think that the 1 bit per pixel screen buffer (4K) should never be swapped, leaving 264K for cache. SystemConfig.h would look like

#define USE_TINY_PAGE       (1)
#define VMRAM_USE_FTL       (0)

#define SEG_SIZE            1048576


#if VMRAM_USE_FTL
    #if USE_TINY_PAGE
        #ifdef ENABLE_AUIDIOOUT
            #define NUM_CACHEPAGE             ( 200 ) // 273 * 1 = 273 KB
        #else
            #define NUM_CACHEPAGE             ( 268 ) // 273 * 1 = 273 KB
        #endif
    #else
        #define NUM_CACHEPAGE             ( 79 ) // 79 * 4 = 316 KB
    #endif
#else
    #if USE_TINY_PAGE
        #define NUM_CACHEPAGE             ( 264 )
        #define VM_RAM_SIZE_NONE_FTL      ( 4 * 1024 )
    #else
        #define NUM_CACHEPAGE             ( 32 )
        #define VM_RAM_SIZE_NONE_FTL      ( 168 * 1024 )
    #endif
#endif

The loader script Scripts/sys_ld.script would be:

MEMORY
{
  vmRAM    (rwx) : ORIGIN = 0x02040000, LENGTH = 2M
  vmROM    (rx ) : ORIGIN = 0x00100000, LENGTH = 6M
}

Changes in kcasporing_gl.c:

char *screen_1bpp = (char *)0x02000000;

and avoid initialization of virtual_screen in 1 bit per pixel mode:

  if (!khicas_1bpp)
    memset(virtual_screen, COLOR_WHITE, VIR_LCD_PIX_H * VIR_LCD_PIX_W);

Now integrate(1/(x^4+1)) takes 0.93s (normal mode) or 0.54s (fast CPU). By comparison, on the Casio monochrome it takes 0.34s. For plot(sin(x)) in fast CPU mode, it's 0.06s vs 0.15s on the Casio. If we can spare RAM in OSLoader's own use, it should be possible to improve the integrate benchmark!



@parisseb
Contributor Author

parisseb commented Nov 23, 2022

Yes, I tried but could not make GitHub render the code correctly, and I don't know why (I have added the vmMgr.c file to my giac39.tgz archive). Maybe we could discuss further optimizations on a phpBB forum like https://tiplanet.org/forum/viewforum.php?f=70 ?

I think we could spare a few K in OSLoader. For example in msc_disk.c, the variables
uint8_t MSCRBuffer[2048] __aligned(4);
uint8_t MSCWRBuf[2048] __aligned(4);
are not used at all, and that's 4K. Unfortunately, commenting out these two lines does not change the RAM available for cache, because the L1PTE variable that follows is 16K aligned. But it should be possible to reorder the object files at link time so that this aligned variable does not leave an unused area.
[Update] For example, renaming vmMgr.c to 0vmMgr.c makes its object file link before mmu.c.
There is currently a potential for releasing about 6K. With a careful study of the OSLoader code, perhaps some buffers could be optimized.
I would really like to be able to run KhiCAS with almost no RAM page swapping. This would improve benchmarks as well as flash lifetime.
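
The alignment effect described above can be sketched with a toy model of sequential placement. The sizes and start address below are hypothetical, not taken from the real link map; only the 16K alignment of L1PTE comes from the thread.

```c
#include <stdint.h>
#include <stddef.h>

struct object {
    const char *name;
    uint32_t size;
    uint32_t align;   /* must be a power of two */
};

/* Round addr up to the next multiple of align. */
uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1u) & ~(align - 1u);
}

/* Place objects one after another, honouring each alignment,
 * and return the first free address after the last object. */
uint32_t layout_end(const struct object *objs, size_t n, uint32_t start)
{
    uint32_t addr = start;

    for (size_t i = 0; i < n; i++) {
        addr = align_up(addr, objs[i].align);
        addr += objs[i].size;
    }
    return addr;
}

/* Hypothetical sizes: 5K of small buffers and an 8196-byte, 16K-aligned table. */
static const struct object small_first[] = {
    { "small buffers", 5u * 1024u, 4u },
    { "L1PTE",         8196u,      16384u },
};

/* Same order with 4K of buffers removed: the table is still pushed to 16K. */
static const struct object small_first_trimmed[] = {
    { "small buffers", 1u * 1024u, 4u },
    { "L1PTE",         8196u,      16384u },
};

/* Aligned table first: no padding is needed in front of it. */
static const struct object table_first[] = {
    { "L1PTE",         8196u,      16384u },
    { "small buffers", 5u * 1024u, 4u },
};
```

Starting at 0x20000, both small-first layouts end at 0x26004, so trimming 4K changes nothing, while the table-first layout ends at 0x23404. This is only a model of the idea; the real linker also groups objects by section and input order.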

@parisseb
Contributor Author

parisseb commented Nov 23, 2022

Found another unused buffer: pcWriteBuffer in OSLoader/start.c, 5K.

@parisseb
Contributor Author

Can we share a common buffer for page_save_wr_buf, page_save_rd_buf (VmMgr/vmMgr.c) and data_page_buffer (LowLevelAPI/llapi.c)? Potential saving 4K.
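
One way to picture the saving is a `union`, sketched below. The 2K size per buffer is an assumption chosen so that the arithmetic matches the 4K figure; the real sizes are in the source files. Sharing is only safe if the three buffers are never live at the same time, which would need to be checked in the OS code.

```c
#include <stdint.h>
#include <stddef.h>

#define SCRATCH_SIZE 2048u  /* assumed size of each buffer */

/* Separate buffers: each one costs SCRATCH_SIZE bytes of RAM. */
struct separate_buffers {
    uint8_t page_save_wr_buf[SCRATCH_SIZE];
    uint8_t page_save_rd_buf[SCRATCH_SIZE];
    uint8_t data_page_buffer[SCRATCH_SIZE];
};

/* Shared storage: all three names alias the same bytes.
 * Valid only if no two of them hold live data at the same time. */
union shared_buffer {
    uint8_t page_save_wr_buf[SCRATCH_SIZE];
    uint8_t page_save_rd_buf[SCRATCH_SIZE];
    uint8_t data_page_buffer[SCRATCH_SIZE];
};

/* RAM recovered by replacing the three buffers with one shared area. */
size_t bytes_saved_by_sharing(void)
{
    return sizeof(struct separate_buffers) - sizeof(union shared_buffer);
}
```

With three 2K buffers, the separate layout takes 6K and the shared one 2K.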

@parisseb
Contributor Author

Potential alignment optimizations?

  • pMtdinfo (almost 2K)
  • dcd_data (1.5K)
  • PageBuffer (almost 2K) explicit alignment

@parisseb
Contributor Author

Is L1PTE_NUM really #define L1PTE_NUM (2049)? Setting it to 2048 would save almost 4K between L1PTE and L2PTE.

@Repeerc
Collaborator

Repeerc commented Nov 23, 2022

The peripheral registers are at address 0x80000000, and we need to set "PTE_LOC[0x800]" to map this segment into the virtual address space for the drivers, which is why we defined PTE_LOC[2049]. In fact, most of the area (PTE_LOC[49] to PTE_LOC[2048], about 8KB in total) is redundant and could probably be used.

@Repeerc
Collaborator

Repeerc commented Nov 23, 2022

I forgot to remove MSCRBuffer and MSCWRBuf. They were initially for small-sector USB transfers, but I don't need them any more.

@parisseb
Contributor Author

If you look at the loader output, the addresses of L1PTE and L2PTE differ by 12K because L2PTE is 4K aligned. In other words, the 2049th entry is responsible for 4K of additional RAM use. If it's unavoidable, maybe it's possible to use the remaining 4K-4 bytes for something else.
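
The arithmetic behind the 12K distance can be sketched as follows, assuming 4-byte first-level descriptors and a table that starts on a 4K boundary (L1PTE is 16K aligned, so that holds). The function names are illustrative only.

```c
#include <stdint.h>

#define PTE_ENTRY_BYTES 4u     /* one first-level descriptor is a 32-bit word */
#define L2PTE_ALIGN     4096u  /* L2PTE is 4K aligned */

/* Round value up to the next multiple of align (power of two). */
uint32_t round_up_pow2(uint32_t value, uint32_t align)
{
    return (value + align - 1u) & ~(align - 1u);
}

/* Distance from the start of L1PTE to the first address where a
 * 4K-aligned L2PTE can be placed right after it. */
uint32_t l1_footprint(uint32_t entries)
{
    return round_up_pow2(entries * PTE_ENTRY_BYTES, L2PTE_ALIGN);
}

/* Bytes lost to padding between the end of L1PTE and L2PTE. */
uint32_t l1_padding(uint32_t entries)
{
    return l1_footprint(entries) - entries * PTE_ENTRY_BYTES;
}
```

With 2049 entries the table is 8196 bytes and L2PTE lands 12K later, leaving 4092 bytes (4K-4) of padding; with 2048 entries the table is exactly 8K and there is no padding.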

@parisseb
Contributor Author

parisseb commented Nov 23, 2022

I have found a way to move PageFaultQueue and mapList from bss to data and save 1K: just initialize them to {0} or 0. Then move the definition of L2PTE to the beginning of vmMgr.c, renamed 0vmMgr.c, and the loader orders the RAM much better:

00021444 g     O .bss	00000004 faultAddress
00022000 g     O .bss	00001000 vm_ram_none_ftl
00024000 g     O .bss	0000d000 L2PTE
00031000 g     O .bss	00042000 CachePage
00074000 g     O .bss	00002004 L1PTE
00076004 g     O .bss	00000004 vm_svc_stack_address

00076394 g       .bss	00000000 __HEAP_START

I increased the non-swappable usable RAM by 4K to 8K and the cache by 4 pages to 268 pages, and the rom heap start address is lower than before.
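
The symbol listing above can be checked for unused space with a small sketch like this one. The addresses and sizes are copied from the listing; the helper is illustrative and assumes that no unlisted symbol sits between the listed ones.

```c
#include <stdint.h>
#include <stddef.h>

struct symbol {
    const char *name;
    uint32_t addr;
    uint32_t size;
};

/* Values copied from the .bss listing, sorted by address. */
static const struct symbol bss_symbols[] = {
    { "vm_ram_none_ftl", 0x00022000u, 0x00001000u },
    { "L2PTE",           0x00024000u, 0x0000d000u },
    { "CachePage",       0x00031000u, 0x00042000u },
    { "L1PTE",           0x00074000u, 0x00002004u },
};

/* Bytes between the end of symbol i and the start of symbol i + 1.
 * The caller must ensure that i + 1 is a valid index. */
uint32_t gap_after(const struct symbol *syms, size_t i)
{
    return syms[i + 1].addr - (syms[i].addr + syms[i].size);
}

/* Number of 1K cache pages that fit in the CachePage symbol. */
uint32_t cache_pages(void)
{
    return bss_symbols[2].size / 1024u;
}
```

On these numbers there is a 4K gap after vm_ram_none_ftl and another 4K gap after CachePage (264 pages of 1K), which would be consistent with growing the non-swapped area to 8K and the cache to 268 pages without moving the other symbols.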

@Repeerc
Collaborator

Repeerc commented Nov 25, 2022

I moved the L1 page table to a separate space supported by the chip (the default first-level page table, DFLPT), which saves 8KB of memory. I also trimmed some useless buffers, and now we have about 300KB of physical memory.

[Image: DFLPT]

Here is the new code:
https://github.com/Repeerc/ExistOS-For-HP39GII
or compare views:
Repeerc@0e13c69?diff=split

I have tested turning memory swapping off and it looks like the UI written by LvGL is difficult to run (maybe we need a simple UI), but it is sufficient for KhiCAS to run, so I set it up to enter KhiCAS immediately after startup (https://github.com/Repeerc/ExistOS-For-HP39GII/blob/main/System/main.c#L1160-L1178).

[Image: vm2]

@parisseb
Contributor Author

Great!
It's probably possible to spare another 4K: MPTE_Table is 78 bytes and there is an empty gap of 4K-78 bytes between it and L2PTE.
Now we should trim VRAM usage: 168K out of 270K are already in use. There are three full-screen buffers (disp_buf_1, vrambuf and full_screen_buf); that's 96K instead of 32K if we share them. These 64K could be reinjected in NUMPAGES for ROM.
(Another option is to enable partial RAM swap like I did before your changes with a few dedicated areas for 1bpp screen and KhiCAS fast alloc).
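
The buffer figures can be reproduced with a short calculation. The 256x128 geometry and 8 bits per pixel are assumptions picked because they match the 32K per buffer quoted here and the 4K 1bpp buffer mentioned earlier in the thread; the real dimensions are defined by VIR_LCD_PIX_W and VIR_LCD_PIX_H.

```c
#include <stddef.h>

#define LCD_W 256u  /* assumed width in pixels */
#define LCD_H 128u  /* assumed height in pixels */

/* Size in bytes of one full-screen buffer at the given colour depth. */
size_t framebuffer_bytes(unsigned bits_per_pixel)
{
    return (size_t)LCD_W * LCD_H * bits_per_pixel / 8u;
}

/* RAM recovered if `buffers` separate full-screen buffers are
 * replaced by a single shared one. */
size_t saved_by_sharing(size_t buffers, unsigned bits_per_pixel)
{
    if (buffers == 0) {
        return 0;
    }
    return (buffers - 1u) * framebuffer_bytes(bits_per_pixel);
}
```

Under these assumptions one 8bpp buffer is 32K, three are 96K, and sharing them frees 64K; the same screen at 1bpp needs 4K.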

@parisseb
Contributor Author

Unfortunately, my attempts to boot the calculator with this new configuration failed.
With a few additional tricks, I now have 288K of RAM available: 32K for giac reserved areas that are never swapped and 256K for the ROM and RAM swap. The archive https://www-fourier.univ-grenoble-alpes.fr/~parisse/hp39/giac39.tgz has a README file explaining all these changes and a changes.tgz with the modified files.

One of the changes I made is F3 detection at boot time in OSLoader/start.c: if F3 is pressed, display No system. This way one can reflash a calculator even if System ends up in a System panic. I also had problems with USB MSC mode, which did not work until I exchanged the Views and mode string displays in the source code (start.c); then it resumed working. No idea why, perhaps a problem with my calculator...

I'm now confident that the RAM swap is minimal inside KhiCAS, so the lifetime of the flash should not be affected by swapping. I will now stop looking at the OS and concentrate on KhiCAS itself.
